Re: Unicode in Delphi: just deprecate WideString/WideChar
- From: Eric Grange <egrangeNO@xxxxxxxxxxxxxxx>
- Date: Tue, 05 Sep 2006 08:03:34 +0200
UTF8 is how we're currently dealing with Unicode here, essentially because it's the only efficient string currently in Delphi, that said...
> There might be some problems when trying to use the index [] or doing copy commands on returning partial UTF8 characters, but these are relatively minor.
These aren't so minor; actually, they're the reason a very large proportion of string-manipulating .Net applications out there aren't Unicode capable, and only deal well with UCS-2.
It happens not only in the form of trimming/cutting at the wrong place, but also under the assumption that System.Char can hold a character, and then, for instance, using it as a function parameter.
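To make both failure modes concrete, here is a minimal sketch (in Python rather than Delphi or .Net, purely for illustration). The sample string and the helper name `safe_truncate_utf8` are my own assumptions, not anything from an existing library. It shows that a blind byte cut can land inside a UTF8 character, that a character outside the BMP takes two UTF16 code units (so a single 16-bit char cannot hold it), and one possible way to back up to a character boundary before cutting.

```python
# Sample text: 'a' (1 byte in UTF-8), the euro sign (3 bytes), U+1D11E musical G clef (4 bytes).
s = "a\u20ac\U0001D11E"

utf8 = s.encode("utf-8")
# Number of 16-bit code units in UTF-16: the G clef needs a surrogate pair.
utf16_units = len(s.encode("utf-16-le")) // 2


def naive_cut_is_valid(data: bytes, max_bytes: int) -> bool:
    """Return True if a blind byte cut still decodes as UTF-8."""
    try:
        data[:max_bytes].decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def safe_truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Keep at most max_bytes bytes, backing up so no character is split.

    Assumes `data` is valid UTF-8 (hypothetical helper, for illustration).
    """
    if max_bytes >= len(data):
        return data
    end = max(max_bytes, 0)
    # Continuation bytes have the bit pattern 10xxxxxx. If the byte just
    # after the cut is one, the cut is inside a character: step back.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

With this sample, `naive_cut_is_valid(utf8, 2)` is false because the cut lands inside the euro sign, while `safe_truncate_utf8(utf8, 2)` falls back to just `b"a"`.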
> Mostly all these are used to parse strings, and the parsing tokens are usually in the 0..127 range (so 1 byte = 1 utf8 char).
Don't underestimate the legions of developers who, when they don't find the function they need in a 30-second search, proceed to implement their own...
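For the parsing case specifically, the point about tokens in the 0..127 range does hold: in UTF8 every byte of a multi-byte character is 128 or above, so an ASCII separator byte can never appear inside one. A rough sketch of that, in Python for illustration only (the function name `split_fields` and the `;` separator are assumptions of mine):

```python
def split_fields(utf8_line: bytes, sep: bytes = b";") -> list:
    """Split raw UTF-8 bytes on a single ASCII separator, then decode each field.

    Safe because lead and continuation bytes of multi-byte UTF-8
    characters are always >= 0x80, so they never equal an ASCII byte.
    """
    if len(sep) != 1 or sep[0] >= 0x80:
        raise ValueError("separator must be a single ASCII byte")
    return [field.decode("utf-8") for field in utf8_line.split(sep)]


# Fields containing 2-, 3- and 4-byte characters survive a byte-level split.
line = "caf\u00e9;\u65e5\u672c\u8a9e;\U0001D11E".encode("utf-8")
fields = split_fields(line)
```

This only covers splitting on ASCII tokens; it says nothing about home-grown code that indexes or trims by position, which is where the trouble described above comes from.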
Granted, with UTF8 at least every developer will expect variable-length characters, and will take precautions (that most don't take when dealing with UTF16), so given a choice between full WideString, full UTF and UTF-8 only, I would pick UTF8. For the Delphi side, that's good, but...
> I really don't see any need to support UTF16.
...Windows interfaces are in UTF16, so you have to convert to UTF16 and back every time you call them. Conversion is reasonably fast, but it results in the need to wrap every call.
Right now for us and UTF8, this is a necessity arising from the lack of Unicode support in Delphi, but IMO for a "Unicode-compliant Delphi" it would be quite a shame not to be able to have and use UTF8/UTF16/UTF32 string types directly.
The other side of the coin is being able to expose DLLs and interfaces with UTF16 string parameters to other applications.
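To illustrate what "wrap every call" means in practice, here is a small sketch, again in Python for illustration only. `fake_wide_api` is a made-up stand-in for a UTF16 system call (it just upper-cases its input); it is not a real Windows function, and `call_with_utf8` is a hypothetical wrapper name.

```python
def fake_wide_api(utf16le: bytes) -> bytes:
    """Hypothetical stand-in for an API that takes and returns UTF-16LE text."""
    return utf16le.decode("utf-16-le").upper().encode("utf-16-le")


def call_with_utf8(utf8_arg: bytes) -> bytes:
    """Wrapper an application holding UTF-8 strings needs around each such call."""
    # UTF-8 -> UTF-16 on the way in ...
    wide_in = utf8_arg.decode("utf-8").encode("utf-16-le")
    wide_out = fake_wide_api(wide_in)
    # ... and UTF-16 -> UTF-8 on the way out.
    return wide_out.decode("utf-16-le").encode("utf-8")


result = call_with_utf8("h\u00e9llo".encode("utf-8"))
```

Each conversion is cheap on its own; the cost is that every call into the UTF16 world needs a wrapper of this shape, which a native UTF16 string type would avoid.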
> Almost all big text documents around nowadays are in UTF8, simply because
> it is almost always more economical. The only languages where there might
> be a slight size increase of UTF8 vs. UTF16 are Japanese and Chinese.
From what we encountered (the largest strings were XML), UTF8 is still more compact thanks to the legions of tags and other XML bits (which are all below ASCII 128).
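A quick way to check this kind of claim on your own data is to compare encoded sizes directly. The sketch below is Python for illustration only, and the sample text and XML wrapper are invented, not taken from our documents. It measures a short Japanese string on its own and then wrapped in a couple of ASCII tags.

```python
def sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count without BOM) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Eight Japanese characters, each 3 bytes in UTF-8 and 2 bytes in UTF-16.
plain = "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"
# The same text inside some made-up XML markup (all ASCII).
xml = '<item id="1"><name>' + plain + "</name></item>"

plain_utf8, plain_utf16 = sizes(plain)
xml_utf8, xml_utf16 = sizes(xml)
```

For the bare text UTF16 is smaller, but once the ASCII markup is included the UTF8 total comes out smaller, since each tag character costs one byte in UTF8 and two in UTF16. How the balance falls in practice depends on the ratio of markup to text.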
Eric