Re: Unicode Support



wolfgang kern wrote:
> "Chewy509" asked about UTF-source-code:
>
> Even the idea may not find many friends, ..

Hi Wolfgang,

That seems to be what's happening... Great in theory, but trying to
make it happen, well that's another story.

> *tool size may explode if several UTF16-sets needs support

Memory and disk usage, yes. But (keeping this to assemblers/compilers)
the only parts really affected should be the tokeniser, the parsing
routines, and the hashing algorithm.
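
On the hashing side, a rough sketch of what I mean: if the source is
kept as UTF-8, a byte-wise hash doesn't need to know anything about
characters at all. I've used FNV-1a here purely as an example;
fnv1a_32 and symbol_hash are names made up for illustration, not taken
from any existing assembler.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a over raw bytes."""
    h = 0x811C9DC5                          # FNV offset basis
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, kept to 32 bits
    return h


def symbol_hash(name: str) -> int:
    # Assumption: symbols are stored as UTF-8. The hash only ever sees
    # bytes, so ASCII and non-ASCII names take exactly the same path.
    return fnv1a_32(name.encode("utf-8"))
```

For a pure ASCII name the UTF-8 bytes are the ASCII bytes, so existing
symbols would hash exactly as they do today.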

Even limiting where the full range of Unicode characters is allowed,
to say strings and comments, would make a transition a lot easier.
E.g. make the restriction that labels may only contain
'a..z,A..Z,_,0..9' and numbers only '0..9', directives/operands remain
as Intel/AMD defined, and strings can contain the full Unicode set.

Would doing this make a great difference to the actual internals of any
assembler or compiler?
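
To make the restriction above concrete, here is a minimal tokeniser
sketch in Python. The rules are my assumptions for illustration, not
any real assembler's syntax: ';' starts a comment, strings are
double-quoted, labels may not start with a digit, and the punctuation
set is arbitrary. TOKEN_RE and tokenise are made-up names.

```python
import re

# Assumed rules:
#   label   : a..z, A..Z, _, 0..9 (ASCII only, not starting with a digit)
#   number  : 0..9 only ([0-9], because \d would also match non-ASCII digits)
#   string  : double-quoted, any Unicode character inside
#   comment : ';' to end of line, any Unicode character inside
TOKEN_RE = re.compile(r'''
    (?P<space>[ \t]+)
  | (?P<comment>;.*)
  | (?P<string>"[^"\n]*")
  | (?P<number>[0-9]+)
  | (?P<label>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<punct>[,:\[\]+\-*])
''', re.VERBOSE)


def tokenise(line):
    """Split one source line into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)
        if m is None:
            # Anything outside the rules above, e.g. a non-ASCII
            # character in a label, is rejected here.
            raise ValueError(f"unexpected character {line[pos]!r} at column {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With these rules, tokenise('msg: db "héllo wörld", 0 ; 日本語') goes
through, while a label such as 'größe' raises ValueError at the 'ö'.
Only the string and comment patterns ever see non-ASCII input; the
label and number rules are the same as for a plain ASCII tokeniser.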

> *loss on source-portability and general readability

Unicode support:
MacOS - yes
Linux and most modern UNIX - yes (UTF-8), even GNU Emacs supports
UTF-8.
Windows - yes

Even notepad.exe in Windows XP supports Unicode! (both UTF-8 and
UTF-16)

> *Net/NG-bandwidth
> (my news-reader use 'Courier New, text-only', so I'd see only
> garbage)

I'll give you that one.

> ..my new disassembler got separated decoder/interpreter routines,
> so it is already prepared to output any syntax with any
> character-set.
> At the moment it just output ASCII, UTF could be added easy.
>
> But I think English is the proper language for technicians and
> programmers. Without a global communication standard we had to
> rely on Babel-fish, and its name include 'Babylonian confusion' ;)

:D. I can definitely see the point on the last one!

Darran (aka Chewy509).
