Re: Encoding/characterset/font family confusion



Willem Bogaerts wrote:

..., and I catch you, I'm going to punish you by making you peel onions
for 6 months in a submarine. I swear I will.

Good luck with the onions!

And one more thing: IT'S NOT THAT HARD.

I completely disagree. The theory is not hard at all, but the difference
between strings and texts is one that I have never seen made on the
web. Encodings simply are not stored with the strings themselves, and
that makes it almost impossible to know how a given string should be read.
It is also really hard to find out which programs translate between
encodings and which don't. MySQL alone has far too many encoding settings,
and they are counter-intuitive at best. Also, the complete lack of proper
escaping options makes it even more difficult, unless you want to escape
the characters by turning them into HTML entities.
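
To illustrate the point that an encoding is not attached to the bytes of a
string, here is a minimal sketch in Python (chosen only for illustration, the
thread itself is about PHP and MySQL). The sample word and variable names are
made up. The same byte sequence gives different text depending on which
encoding the reader assumes:

```python
# Illustrative sample text; "\u00e9" is the letter e with an acute accent.
text = "caf\u00e9"

# Encode once as UTF-8. The resulting bytes carry no label saying "UTF-8".
utf8_bytes = text.encode("utf-8")

# A reader that assumes Latin-1 turns each byte into one character,
# so the two-byte sequence for the accented letter becomes two characters.
as_latin1 = utf8_bytes.decode("latin-1")

# A reader that assumes UTF-8 gets the original text back.
as_utf8 = utf8_bytes.decode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
```

Nothing in `utf8_bytes` tells a program which of the two readings is the
intended one. That information has to travel separately, for example in a
header or a connection setting.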

Best regards,

Yes, I completely agree.
After reading through a few pages about UTF-8, and after discovering that the
length of a string in PHP is something completely different from what I want
to see, I decided NOT to use UTF-8 (yet).
Maybe when PHP6 is out, and debugged, and I switch my server to PHP6,
well... maybe then.
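
The length mismatch can be sketched as follows. PHP's strlen() counts bytes,
not characters, so a UTF-8 string with accented letters reports a larger
number than expected. The sketch below uses Python only to show the
arithmetic; the sample text and variable names are invented for illustration:

```python
# Illustrative sample: ten characters, two of them accented.
text = "na\u00efve caf\u00e9"

# Number of characters (code points) in the text.
char_count = len(text)

# Number of bytes when stored as UTF-8: each accented letter takes two bytes.
byte_count_utf8 = len(text.encode("utf-8"))

# Number of bytes when stored as Latin-1: always one byte per character.
byte_count_latin1 = len(text.encode("latin-1"))

print(char_count, byte_count_utf8, byte_count_latin1)  # 10 12 10
```

With Latin-1 the byte count and the character count are always equal, which
is why byte-oriented string functions keep working there. With UTF-8 they
differ as soon as a non-ASCII character appears.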

I'll stick to LATIN1 for now. That is something I understand, and something
I can use my existing function library with, without a headache.

If my app becomes so popular that people outside Europe want it, I'll dive
into Unicode again. ;-)

Thanks for your time!

Regards,
Erwin Moller