Re: Unicode Support
- From: randyhyde@xxxxxxxxxxxxx
- Date: 19 Apr 2005 09:09:45 -0700
Chewy509@xxxxxxxxxxxxxxxx wrote:
> Hi Everyone,
>
> Something that has been bugging me, since I started on my own
> compiler/assembler is unicode support. Not the API's or libraries, or
> how unicode works (it's quite simple once you get your head around
> it), but the fact that most assemblers (if not all assemblers), and
> most HLL Compilers (that I've used) still require the source code to
> be 8bit ASCII. (and the use of code-pages).
For good reason. Most popular *editing* tools support this data format.
>
> So for Randy, Rene, etc, are there any plans to allow source code in
> UTF-16 format for your compilers/assemblers? eg to allow
> symbols/labels to contain non-ASCII characters, and allow easier
> unicode string support from within the source file itself?
Again, the availability of editing tools (i.e., the lack thereof)
creates a problem here. Sure, RosAsm could do this as RosAsm users are
*tied* to that editor, but for generic tools (most other assemblers),
you're limited by the editing and other text processing tools available
to you.
And even if you find a decent editor, all of a sudden all of the other
text processing tools you're using don't work.
Is there really a need to put unicode characters into identifiers?
Sure, non-English (non-Roman) characters in identifiers might be cool
in some countries, but I don't see that dramatically improving the
usability of an assembler (or other programming language).
BTW, keep in mind that GoAsm supports Unicode in this manner today. So
people who *absolutely* need this facility (e.g., Edgar, aka "Donkey")
have a tool they can use already. Why steal GoAsm's thunder?
>
> eg to be able to support source code like this:
>
> <code = FASM>
> org 100h
>
> старт:
> mov ax, 9
> mov dx, шнур
> int 21h
> ret
>
> шнур du "여보세요 세계"
> db "$"
> шнур2 du "Γειάσου κόσμος"
> db "$"
> </code>
>
> Obviously all directives and operands should remain as they are (in
> english as defined by Intel/AMD), but would be nice to have true
> support for user-defined labels and strings.
You can do the above with resources, easy enough, in any existing
assembler. Granted, it's not as convenient, but it works.
>
> <mini rant>
> Since we are now in 2005, and most modern OS's support unicode, why
> are the base tools we use still insisting on ASCII source code?
Because those base tools cooperate with a lot of secondary tools, that
also use ASCII. Getting them all to change overnight isn't going to
happen.
And there is the issue of *all* programming languages, not just
assembly. When all the major HLLs support Unicode in the manner you
describe, when all editors support it, etc., etc., I think you'll find
that assemblers are going to support it as well.
Just keep in mind, there are some *significant* costs associated with
using Unicode that dramatically affect compiler performance. Beyond the
obvious "source files are larger" issue, things like case
insensitivity can get real nasty in Unicode. Issues like "does this
character belong in a particular set of characters" get ugly. And, of
course, there are issues like hashing functions and what-not that have
to be rethought.
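To make the case-insensitivity and hashing point concrete, here is a
rough sketch in Python (not taken from any actual assembler). The
function name identifier_key is hypothetical, and the choice of NFC
normalization plus case folding is an assumption about how a
symbol table *might* canonicalize names, not a description of what
GoAsm or any other tool does.

```python
import unicodedata

def identifier_key(name: str) -> str:
    """Hypothetical symbol-table key for case-insensitive lookup.

    Assumption: normalize to NFC, case-fold, then normalize again,
    because case folding can produce un-normalized sequences.
    """
    folded = unicodedata.normalize("NFC", name).casefold()
    return unicodedata.normalize("NFC", folded)

# Two spellings of the same visible character: precomposed e-acute
# versus "e" followed by a combining acute accent.
composed = "\u00e9"
decomposed = "e\u0301"

# A plain string compare (and therefore a plain hash lookup) treats
# them as two different identifiers.
print(composed == decomposed)                                   # False
print(identifier_key(composed) == identifier_key(decomposed))   # True

# Simple lowercasing is not enough for case-insensitive matching:
# German sharp s only matches "SS" after full case folding.
print("Straße".lower() == "STRASSE".lower())                    # False
print(identifier_key("Straße") == identifier_key("STRASSE"))    # True
```

Every identifier lookup now pays for normalization and folding before
the hash is even computed, which is the kind of cost an ASCII-only
symbol table never sees.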
> We all
> want the "viva asm revolution" to happen, but one thing IMHO that we
> are lacking is UTF-16 support for sourcecode.
GoAsm has Unicode support already. And does a *good* job of it. If
Unicode were all that was holding the "viva asm revolution" back, it
would be happening today, with GoAsm leading the way.
> Would it give a one-up on
> common HLL's. Well I don't know, but it will make asm more accessible
> to more global users around the world.
> </rant>
Again, it's not just the assembler. It's all the supporting tools.
GoAsm succeeds in this area because Jeremy provides a complete suite of
tools.
>
> PS. If your assembler already supports UTF-16 based source code, I
> would be deeply interested in hearing about some of the challenges in
> implementing unicode support. In particular, did you limit numbers to
> the western 0..9 figures, or did you allow other numbers to be
> included, eg arabic, many of the asian sets, etc. Did you limit to
> valid range of characters to the BMP (the first 64K characters only),
> or did you allow for the full range of characters (1024K characters)
> for labels. How did you handle compatible encodings, and combining
> characters? What about UTF-8 vs UTF-16 vs UTF-32?
I think you're starting to get the idea. :-)
Think of the bugs that are going to wind up in your code because you
didn't consider certain character combinations.
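A few of those questions can be poked at directly. The following is
only an illustrative Python sketch; the helper names encoded_sizes and
is_ascii_digit are made up for this example, and restricting numeric
literals to ASCII digits is just one possible policy, not a
recommendation from any existing assembler.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_sizes(text: str) -> dict:
    """Byte count of `text` in each encoding (no byte-order mark)."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}

def is_ascii_digit(ch: str) -> bool:
    """Hypothetical lexer rule: accept only the digits 0..9."""
    return ch.isascii() and ch.isdigit()

# Plain ASCII source doubles in UTF-16 and quadruples in UTF-32.
print(encoded_sizes("mov ax, 9"))

# A Cyrillic label takes two bytes per character in UTF-8 and UTF-16.
print(encoded_sizes("старт"))

# A character outside the BMP needs a surrogate pair in UTF-16,
# so "one 16-bit unit per character" does not hold.
clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
print(encoded_sizes(clef))

# ARABIC-INDIC DIGIT THREE is a digit as far as Unicode is concerned,
# so the lexer has to decide whether it may appear in a number.
arabic_three = "\u0663"
print(arabic_three.isdigit(), is_ascii_digit(arabic_three))

# A base letter plus a combining mark is two code points but one
# visible character, which matters for column counts in error messages.
print(len("e\u0301"))
```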
> PPPPS. I use jEdit as my preferred text editor. (It's pure java so
> should run on any java enabled platform, and supports UTF-16
> natively).
And what of people who want to use a different editor, that is not
Unicode enabled? Therein lies the big problem.
Cheers,
Randy Hyde