Unicode Support
- From: Chewy509@xxxxxxxxxxxxxxxx
- Date: 18 Apr 2005 21:51:24 -0700
Hi Everyone,
Something that has been bugging me since I started on my own
compiler/assembler is Unicode support. Not the APIs or libraries, or
how Unicode works (it's quite simple once you get your head around it),
but the fact that most assemblers (if not all of them), and most HLL
compilers I've used, still require the source code to be 8-bit text:
ASCII plus code pages.
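To make the code-page problem concrete, here is a small Python illustration (the label and the two code pages are just examples I picked). The same bytes spell a Russian label on a Windows-1251 system but turn into accented Latin letters on a Windows-1252 system, and nothing in an 8-bit source file records which code page was intended:

```python
# A legacy 8-bit source file stores raw bytes; their meaning depends on
# whichever code page the reader happens to assume.
label = "шнур"                      # a Russian label, as typed by the author
raw = label.encode("cp1251")        # saved on a Windows-1251 (Cyrillic) system
as_cyrillic = raw.decode("cp1251")  # read back with the same code page
as_western = raw.decode("cp1252")   # read on a Windows-1252 (Western) system

print(raw)          # b'\xf8\xed\xf3\xf0'
print(as_cyrillic)  # шнур
print(as_western)   # øíóð
```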
So, Randy, Rene, and others: are there any plans to accept source code
in UTF-16 in your compilers/assemblers? That is, to allow symbols/labels
to contain non-ASCII characters, and to make it easier to write Unicode
strings directly in the source file?
For example, I'd like to be able to write source code like this:
<code = FASM>
org 100h
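старт:                        ; "start"
mov ah, 9                     ; DOS function 09h (print "$"-terminated string) goes in AH
mov dx, шнур                  ; DS:DX -> string to print
int 21h
ret
шнур du "여보세요 세계"        ; Korean: "Hello, world"
db "$"
шнур2 du "Γειάσου κόσμος"     ; Greek: "Hello, world"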
db "$"
</code>
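If the assembler decoded the source as Unicode, `du` could emit real UTF-16 code units for those strings. A rough Python sketch of what that would mean; `du_words` is a hypothetical helper, and little-endian UTF-16 output is my assumption about how such a `du` would behave, not a description of what FASM does today:

```python
def du_words(text: str) -> list[int]:
    """Hypothetical: the 16-bit words a Unicode-aware `du` would emit (UTF-16LE)."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

korean = du_words("여보세요 세계")
print([hex(w) for w in korean])
# ['0xc5ec', '0xbcf4', '0xc138', '0xc694', '0x20', '0xc138', '0xacc4']

# A character outside the BMP (U+1F600) needs two words: a surrogate pair.
print([hex(w) for w in du_words("\U0001F600")])  # ['0xd83d', '0xde00']
```

Every Hangul syllable here fits in one word, but anything beyond U+FFFF costs two, which is where the BMP question further down starts to matter.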
Obviously all directives and operands should remain as they are (in
English, as defined by Intel/AMD), but it would be nice to have true
support for user-defined labels and strings.
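One way a lexer could keep the instruction set in ASCII while accepting Unicode labels is to check reserved words first and then fall back to Unicode character categories. A minimal Python sketch; the `RESERVED` set is a tiny hypothetical subset, and the letter/digit/mark rule is my own assumption (loosely modelled on Unicode identifier conventions), not any assembler's actual rule:

```python
import unicodedata

# Hypothetical: a tiny subset of the reserved ASCII words (mnemonics,
# registers, directives). A real assembler would have hundreds.
RESERVED = {"org", "mov", "int", "ret", "db", "du", "ax", "ah", "dx"}

def is_label_start(ch: str) -> bool:
    # '_' or a letter from any script (categories Lu, Ll, Lt, Lm, Lo).
    return ch == "_" or unicodedata.category(ch).startswith("L")

def is_label_continue(ch: str) -> bool:
    # After the first character, also allow decimal digits from any script
    # (Nd) and combining marks (Mn, Mc).
    return is_label_start(ch) or unicodedata.category(ch) in {"Nd", "Mn", "Mc"}

def classify(word: str) -> str:
    """Return 'keyword', 'label' or 'invalid' for one bare word."""
    if word.lower() in RESERVED:  # keywords matched case-insensitively (assumption)
        return "keyword"
    if word and is_label_start(word[0]) and all(is_label_continue(c) for c in word[1:]):
        return "label"
    return "invalid"

for w in ["mov", "старт", "шнур2", "2шнур", "여보세요"]:
    print(w, classify(w))
```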
<mini rant>
It's 2005 and most modern OSes support Unicode, so why do the base
tools we use still insist on ASCII source code? We all want the
"viva asm revolution" to happen, but IMHO one thing we are lacking is
UTF-16 support for source code. Would it give asm a leg up on the
common HLLs? I don't know, but it would make asm more accessible to
users around the world.
</rant>
PS. If your assembler already supports UTF-16 source code, I would be
very interested in hearing about the challenges you ran into while
implementing Unicode support. In particular, did you limit numbers to
the Western 0..9 digits, or did you also allow digits from other
scripts, e.g. Arabic-Indic and many of the Asian sets? Did you limit
the valid range of label characters to the BMP (the first 64K code
points only), or did you allow the full range (U+0000..U+10FFFF, about
1.1 million code points)? How did you handle compatibility characters
and combining characters (i.e. normalization)? And what about UTF-8 vs
UTF-16 vs UTF-32?
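Several of these questions can at least be sketched with Python's standard `unicodedata` and `codecs` modules. This is my assumption of one reasonable approach, not a description of any existing assembler, and the helper names are hypothetical: any decimal digit (category Nd) maps to its value, so Arabic-Indic digits parse like 0..9; labels are compared after NFC normalization (or NFKC, if compatibility characters such as the "ﬁ" ligature should also fold together); a byte order mark, when present, tells UTF-8, UTF-16 and UTF-32 apart; and a BMP-only policy is a simple range check:

```python
import codecs
import unicodedata

def parse_decimal(literal: str) -> int:
    """Parse a decimal literal written with digits from any script (category Nd)."""
    value = 0
    for ch in literal:
        digit = unicodedata.decimal(ch, None)  # None if ch is not a decimal digit
        if digit is None:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + digit
    return value

def label_key(name: str, compat: bool = False) -> str:
    """Canonical form for comparing labels: NFC, or NFKC to fold compatibility chars."""
    return unicodedata.normalize("NFKC" if compat else "NFC", name)

def within_bmp(name: str) -> bool:
    """True if every character fits in a single UTF-16 unit (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in name)

# Check the UTF-32 BOMs first: the UTF-32LE BOM begins with the UTF-16LE one.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def decode_source(data: bytes) -> str:
    """Decode a source file using its BOM if it has one, else assume UTF-8."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode("utf-8")

print(parse_decimal("٤٢"))                          # 42 (Arabic-Indic digits)
print(label_key("e\u0301") == label_key("\u00e9"))  # True
print(decode_source("старт:".encode("utf-16")))     # старт:
```

Falling back to UTF-8 when there is no BOM keeps plain ASCII sources working unchanged, since ASCII is a subset of UTF-8.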
PPS. I know the DOS API doesn't support Unicode strings; I just used
it for the example.
PPPS. The full Unicode 4.1 spec can be downloaded as PDFs from
www.unicode.org.
PPPPS. I use jEdit as my preferred text editor. (It's pure Java, so it
should run on any Java-enabled platform, and it supports UTF-16 natively.)