Re: [PHP] Re: languages and PHP



At 2:01 PM -0500 9/27/07, Edward Vermillion wrote:
So back to my original question, what breaks if you're *expecting* UTF-8 and you don't *get* UTF-8?

Ed

Isn't UTF-8 the big fish here?

Sure, there's UTF-16 and larger, but everything else is a subset of UTF-8, is it not?

So, what's the problem if you get a character defined by ISO -- it's still within the UTF-8 super-group, right?

The only problem I see here is whether the user has the character set needed to display the glyph correctly -- or am I off on something else that you guys aren't even discussing?
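
For what it's worth, here is a quick way to poke at the question. This is only a sketch, in Python rather than PHP, and the helper name is_valid_utf8 and the sample word are made up for illustration. It assumes the "character defined by ISO" arrives as raw ISO-8859-1 bytes. It suggests the answer depends on the bytes, not the glyph: the same character is stored as different bytes in the two encodings, so a strict UTF-8 reader rejects the ISO-8859-1 bytes, and reading UTF-8 bytes as ISO-8859-1 yields the wrong characters.

```python
# Sketch only: the same text encoded two ways.
text = "caf\u00e9"  # "cafe" with an acute accent on the final e

latin1_bytes = text.encode("iso-8859-1")  # e-acute is the single byte 0xE9
utf8_bytes = text.encode("utf-8")         # e-acute is the two bytes 0xC3 0xA9


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Expecting UTF-8 but getting ISO-8859-1: the lone 0xE9 byte is malformed.
print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False

# The reverse mistake does not fail, it silently produces different text.
misread = utf8_bytes.decode("iso-8859-1")
print(misread == text)  # False
```

Plain ASCII text passes either way, since both encodings store those characters as the same single bytes, which is probably why the problem often goes unnoticed until an accented character shows up.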

Cheers,

tedd


--
-------
http://sperling.com http://ancientstones.com http://earthstones.com