Re: Another Cross Platform Delphi Thought



Zoren Lendry wrote:

Allen Bauer (CodeGear) wrote:
WideChar is a UTF-16 "character" which could be a high or low
surrogate. This is the format used by the underlying OS. Rather
than forcing expensive conversions all over the place,
UnicodeString will be UTF-16 encoded. Unless you're dealing with
data from old Phoenician texts, most things will not trip into
surrogates.

Aah, so the "char" section of WideChar is a bit misleading -- it
really just refers to the word size subsection of a Unicode string,
and then you potentially have just one more layer to climb up to get
what is normally thought of as a "char".

In the strictest sense, yes. At one point it was the same as a Unicode
Char. Since then, Unicode.org has extended the range of code points
up to $10FFFF, which necessitated the introduction of UTF-16.
Since Windows NT had already been shipping for a while, MS moved to
UTF-16 so that they wouldn't have to add "yet another layer of APIs."
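To make the surrogate mechanism concrete, here is a minimal sketch of how one code point maps to UTF-16 code units. It is written in Python rather than Delphi, and the function name to_utf16_units is made up for illustration; it is not part of any CodeGear or Windows API. Code points below $10000 fit in one 16-bit unit, and anything above is split into a high and a low surrogate.

```python
def to_utf16_units(code_point: int) -> list:
    """Return the 16-bit UTF-16 code units for a single Unicode code point.

    Hypothetical helper for illustration only.
    """
    # Surrogate values themselves ($D800..$DFFF) are not valid code points
    # to encode, and nothing above $10FFFF exists in Unicode.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")

    # Basic Multilingual Plane: one code unit, identical to the code point.
    if code_point < 0x10000:
        return [code_point]

    # Supplementary planes: subtract $10000, leaving a 20-bit offset,
    # then put the top 10 bits in the high surrogate and the bottom
    # 10 bits in the low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


if __name__ == "__main__":
    # U+10900 is PHOENICIAN LETTER ALF, which lies outside the BMP.
    print([hex(u) for u in to_utf16_units(0x10900)])
```

A single WideChar can hold either half of such a pair, which is why it is a code unit rather than a full character.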

Will there be a function to simply get the CharLength(AUnicodeString)
-- which knows about the surrogates?

Don't know yet. Possibly.
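For reference, a surrogate-aware length function of the kind asked about could look roughly like the following. This is only a sketch in Python, not Delphi and not a planned RTL routine; the names code_point_length and utf16_units are invented here. It assumes the string is available as a sequence of 16-bit code units and counts each well-formed surrogate pair as one character, treating an unpaired surrogate as one character on its own.

```python
def utf16_units(text: str) -> list:
    """Split a Python string into its UTF-16 code units (hypothetical helper)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def code_point_length(units) -> int:
    """Count code points in a sequence of UTF-16 code units.

    A high surrogate ($D800..$DBFF) followed by a low surrogate
    ($DC00..$DFFF) counts once; everything else counts per unit.
    """
    count = 0
    i = 0
    n = len(units)
    while i < n:
        is_pair = (
            0xD800 <= units[i] <= 0xDBFF
            and i + 1 < n
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        i += 2 if is_pair else 1
        count += 1
    return count


if __name__ == "__main__":
    sample = "A\U00010900B"  # 'A', PHOENICIAN LETTER ALF, 'B'
    units = utf16_units(sample)
    print(len(units), code_point_length(units))
```

Note that this counts code points, not user-perceived characters: combining marks would still be counted separately.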

Loren sZendre

BTW, as a historical linguist, I just may be dealing with old
Phoenician someday!

I challenge you to find a font that contains those code-points :-)...
Processing text and rendering text for display and/or print are two
different things. You may be able to process the text, but I doubt
there would be very many fonts that will allow you to render the glyphs.

--
Allen Bauer
CodeGear
Chief Scientist
http://blogs.borland.com/abauer