Re: Unicode Delphi Win32 - which approach




4. What about the Char type? I guess it will no longer be a fixed-width
set of bytes, unless we look at this type a little more pragmatically
and just retain the "old" meaning of Char = Byte.
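Here is a small sketch of why a one-byte Char stops being a whole character. It is written in Python only for brevity and is not Delphi API. It measures the same strings in code points, UTF-16 code units and UTF-8 bytes, and shows that a single byte taken from a UTF-8 buffer can be just part of a character. The helper name `unit_counts` is hypothetical.

```python
def unit_counts(s: str) -> dict:
    """Length of `s` measured in three different 'Char' units."""
    return {
        "code points": len(s),                            # Python str indexes code points
        "UTF-16 units": len(s.encode("utf-16-le")) // 2,  # 2 bytes per code unit
        "UTF-8 bytes": len(s.encode("utf-8")),
    }

word = "Gr\u00fc\u00dfe"   # "Grüße": 5 characters
emoji = "\U0001F600"       # one character outside the Basic Multilingual Plane

for text in (word, emoji):
    print(repr(text), unit_counts(text))

# With Char = Byte, indexing a UTF-8 buffer can land inside a character:
raw = word.encode("utf-8")
print(hex(raw[2]))         # 0xc3, the lead byte of the two-byte sequence for "ü"
```

With UTF-8 the byte count drifts away from the character count as soon as non-ASCII text appears. Even UTF-16 needs two units for characters outside the BMP.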

5. All old source/.dpr/text .dfm files will be converted to Unicode
(UTF-8/16) when opened in Unicode Delphi.
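The following is a rough sketch of such a one-time conversion, again in Python. It assumes the legacy file was saved in the Windows-1252 ANSI code page, although the real code page depends on the locale the file was written under. It writes UTF-8 with a BOM. `convert_legacy_source` is a hypothetical helper and not part of any Delphi tooling.

```python
import codecs

def convert_legacy_source(data: bytes, legacy_codepage: str = "cp1252") -> bytes:
    """Re-encode an ANSI source file as UTF-8 with a BOM.

    Files that already start with a UTF-8 BOM are returned unchanged, so
    running the conversion twice is harmless. Raises UnicodeDecodeError if
    the bytes are not valid in the assumed legacy code page.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data
    text = data.decode(legacy_codepage)   # ANSI bytes -> Unicode text
    return text.encode("utf-8-sig")       # "utf-8-sig" prepends the BOM EF BB BF
```

For example, `convert_legacy_source(b"caf\xe9")` yields `b"\xef\xbb\xbfcaf\xc3\xa9"`. The BOM lets a later reader distinguish the converted file from an untouched ANSI one.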

Ok. So what's your take on all this?

Personally I'd probably go for UTF-8. It just seems like the simplest
and best solution. But that's just my opinion.


Some things off the top of my head:

What are, in your opinion, the disadvantages of string (:= UTF-16) compared with string (:= UTF-8)?

Because we are mainly on Windows (at least for the time being), I'd prefer a UTF-16 encoding, since the Win32 "W" API functions already expect UTF-16 strings. It seems the more strategic approach, but I don't know how much work this implies in the internals of the VCL.

Endianness: use the Windows-native one (little-endian). Check for a BOM/endianness only in clearly specified I/O routines, for example LoadFromStream/SaveToStream, where the data may come from a foreign (i.e. non-Windows) source.
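A minimal sketch of that policy follows, written in Python purely for illustration. Only the two stream routines look at the byte order mark, and everything else works with already-decoded text. `load_text` and `save_text` are hypothetical stand-ins for LoadFromStream/SaveToStream and are not the VCL methods. Treating BOM-less input as little-endian UTF-16 is an assumption, and a BOM-less UTF-8 file would need a different default.

```python
import codecs
from typing import Tuple

# BOMs this sketch recognises. UTF-32 is deliberately ignored: its
# little-endian BOM (FF FE 00 00) would be mistaken for UTF-16LE here.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def detect_bom(data: bytes, default: str = "utf-16-le") -> Tuple[str, int]:
    """Return (encoding, BOM length). Without a BOM, assume the
    Windows-native little-endian UTF-16."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return default, 0

def load_text(data: bytes) -> str:
    """Hypothetical LoadFromStream analogue: honour a BOM if present."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding)

def save_text(text: str, big_endian: bool = False) -> bytes:
    """Hypothetical SaveToStream analogue: always write a BOM so that
    readers on other platforms can tell the byte order."""
    if big_endian:
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Keeping BOM handling at the I/O boundary means in-memory strings never carry a BOM and never need byte swapping.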

IMHO, because UTF-16 uses a more uniform number of bytes per character (2 for most of the range), sorting will be easier and faster, but perhaps that's only my opinion.
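The byte-count part of this argument is easy to check. The Python sketch below is illustrative only, and `byte_widths` and the sample list are made up for the example. It prints how many bytes a few characters need in each encoding. It also shows one caveat for sorting: ordinal comparison of UTF-16 code units diverges from code-point order once surrogate pairs are involved, whereas UTF-8 byte order does not.

```python
SAMPLES = [
    ("A", "Latin letter U+0041"),
    ("\u00e9", "Latin-1 letter U+00E9"),
    ("\u0416", "Cyrillic letter U+0416"),
    ("\u4e2d", "CJK ideograph U+4E2D"),
    ("\U0001F600", "emoji U+1F600 (outside the BMP)"),
]

def byte_widths(ch: str) -> tuple:
    """(UTF-8 bytes, UTF-16 bytes) needed to store one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch, desc in SAMPLES:
    u8, u16 = byte_widths(ch)
    print(f"{desc:32} UTF-8: {u8}  UTF-16: {u16}")

# Sorting caveat: binary comparison of UTF-16 code units does not always
# follow code-point order, while UTF-8 byte comparison does. Big-endian
# bytes are used so that byte order equals code-unit order.
a, b = "\uff5e", "\U0001F600"                         # U+FF5E < U+1F600
print(a < b)                                          # True  (code points)
print(a.encode("utf-8") < b.encode("utf-8"))          # True  (UTF-8 bytes)
print(a.encode("utf-16-be") < b.encode("utf-16-be"))  # False (UTF-16 units)
```

UTF-16 is indeed a flat 2 bytes across the Basic Multilingual Plane. However, a binary UTF-16 sort that must agree with code-point (or UTF-8) order needs a fix-up for surrogates.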

As an aside, Java and Mac OS X also use UTF-16, and on the Linux side Qt uses it too. It seems that it will be the future.

Also consider that some declarations already use WideString, in the database connectivity area for example. How do you plan to cope with that? (For me, it isn't a problem to leave it as is and unify it incrementally with the new string type.)

my 2c,

m. th.
