Re: System.WCh_Cnv



places where that might be needed, like text rendering, don't work on a per
code point basis anyway...

Exactly. And that is wrong, and I want to fix it.

So I'm quite happy with UTF-8 and plain strings.

I am more or less happy with this too [1], but I think we can do better. With UTF-8 in strings, the two abstractions (code points and encodings) are too entangled for my taste. Strictly speaking, you cannot use the standard string operations. You can, of course, but you must fiddle with the encoding: you are searching not for a code point but for a particular encoded octet sequence. Instead I want to be able to write things like

for I in Str'Range loop
   if Str (I) = Euro_Sign then
      ...
   end if;
end loop;

I cannot do that with UTF-8 in strings. Note that Wide_Wide_String is of little help here, because of the endianness issue. But it might be a good idea to base Unico on Wide_Wide_String for closeness to the standard.
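
To make the contrast concrete, here is a minimal sketch, assuming an Ada 2012 compiler that provides Ada.Strings.UTF_Encoding. The names Find_Euro, Euro_UTF8, Text and Decoded are made up for illustration. Wide_Wide_String appears only as a stand-in for a type indexed by code point; the sketch does not address the endianness concern, and it is not a proposal for how Unico should look.

```ada
with Ada.Text_IO;
with Ada.Strings.Fixed;
with Ada.Strings.UTF_Encoding.Wide_Wide_Strings;

procedure Find_Euro is
   package UTF renames Ada.Strings.UTF_Encoding;

   --  U+20AC EURO SIGN as a single code point ...
   Euro_Sign : constant Wide_Wide_Character :=
     Wide_Wide_Character'Val (16#20AC#);

   --  ... and as its three-octet UTF-8 encoding (E2 82 AC).
   Euro_UTF8 : constant UTF.UTF_8_String :=
     Character'Val (16#E2#) & Character'Val (16#82#) & Character'Val (16#AC#);

   --  Illustrative sample text, held as UTF-8 octets in a plain String.
   Text : constant UTF.UTF_8_String := "5 " & Euro_UTF8;

   --  The same text decoded to one element per code point.
   Decoded : constant Wide_Wide_String :=
     UTF.Wide_Wide_Strings.Decode (Text);
begin
   --  With UTF-8 in a String, the search is for an octet sequence,
   --  i.e. for a particular encoding, and the result is an octet index.
   Ada.Text_IO.Put_Line
     ("Octet index:"
      & Natural'Image (Ada.Strings.Fixed.Index (Text, Euro_UTF8)));

   --  With one element per code point, the comparison is the one
   --  written above: element against character.
   for I in Decoded'Range loop
      if Decoded (I) = Euro_Sign then
         Ada.Text_IO.Put_Line ("Code point index:" & Integer'Image (I));
      end if;
   end loop;
end Find_Euro;
```

The first search has to know how the euro sign is encoded; the loop only has to know the character.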

[1] What makes me happy about UTF-8 is that it seems to have become a de facto default, common denominator encoding.
