Re: GAS-style syntax issue...




"Rod Pemberton" <do_not_have@xxxxxxxxxxxxx> wrote in message
news:gm1nhc$atb$1@xxxxxxxxxxxxxxxxxxxxx
"cr88192" <cr88192@xxxxxxxxxxx> wrote in message
news:gm1la9$9sq$1@xxxxxxxxxxxxxxxxxxxxx
'q' wasn't in the info I found...

but, alas, what info I could find was rather fragmentary, and even some of
the more official documentation was far from comprehensive on some topics...


Oh, um, I was using the raw sources for GAS...

The ones I'm using were from GNU binutils 2.15 package (DJGPP version).
binutl-2.15\include\opcode\i386.h
binutl-2.15\gas\config\tc-i386.h
binutl-2.15\gas\config\tc-i386.c

The roughly equivalent files for the current (stock) GNU binutils 2.19
package are:
binutils-2.19\opcodes\i386-opc.tbl
binutils-2.19\gas\config\tc-i386.h
binutils-2.19\gas\config\tc-i386.c

The first file lists the instructions and has fields that list which types
of suffixes each instruction uses, as well as the CPU generation,
addressing modes, etc. The other files have info on code directives,
prefixes, etc.

HTH,


that is a place to look I guess, though I would have to go get the sources...

but, oh well, no real rush. I just felt like beating together limited GAS
support (for my existing assembler), in case I want to load assembler input
generated by GCC, or in case other people/code don't want to generate
Intel-style syntax...

but, alas, the issue becomes a little more hairy than a few simple parser
hacks.
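
one concrete reason the hacks get hairy: GAS-style mnemonics usually carry an
operand-size suffix (b, w, l, q for 8, 16, 32, 64 bits), but some plain
mnemonics already end in one of those letters ("call" ends in 'l', "sub" ends
in 'b'), so the suffix can't just be chopped off blindly. a rough sketch in
Python, assuming a hypothetical BASE_MNEMONICS table (a real assembler would
take this from its opcode table, along with which suffixes each instruction
actually allows):

```python
# AT&T/GAS operand-size suffixes and the operand width they select.
SUFFIX_BITS = {"b": 8, "w": 16, "l": 32, "q": 64}

# Hypothetical, deliberately tiny mnemonic table, for illustration only.
BASE_MNEMONICS = {"mov", "add", "sub", "mul", "push", "pop", "call"}


def split_suffix(mnemonic):
    """Return (base_mnemonic, operand_bits or None) for a GAS-style mnemonic."""
    m = mnemonic.lower()
    # A known base mnemonic wins, so "call" is not read as "cal" + 'l'.
    if m in BASE_MNEMONICS:
        return m, None
    # Otherwise accept a trailing size suffix only if the rest is known.
    if len(m) > 1 and m[-1] in SUFFIX_BITS and m[:-1] in BASE_MNEMONICS:
        return m[:-1], SUFFIX_BITS[m[-1]]
    raise ValueError("unknown mnemonic: " + mnemonic)
```

with this table, split_suffix("movq") gives ("mov", 64), while
split_suffix("call") stays ("call", None).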

I guess it is an issue right up there with making the assembler UTF-8
friendly (currently, it only really supports ASCII, and trying to use any
extended chars is likely to make it misbehave...).


I guess it is my oddity: it seems like a much better idea to me to make code
UTF-8 friendly, rather than going and storing all text and strings as
UTF-16... (UTF-16 just wastes too much memory IMO, especially when the vast
majority of text is ASCII... as I see it, UTF-16 would only really be
justified "in general" when non-ASCII characters are the majority, such as
people naming all of their variables in Chinese or Kana+Kanji. even for
Cyrillic, Greek, or Arabic text, UTF-8 doesn't really use any more space
than UTF-16, since the basic letters take 2 bytes either way...).

so, as I see it, since many of the world's languages use either ASCII or
else alphabets which fit in the 0x0080-0x07FF range (Latin, Latin+Accents,
Greek, Cyrillic), UTF-8 will provide the best general space usage. even for
lots of CJK text, it is only a 50% inflation over UTF-16 (3 bytes per
character rather than 2), which would mean that there would have to be a
very large amount of CJK text in the system in order to outweigh the space
savings for most everything else...
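
the size argument is easy to check. a small Python sketch (the sample strings
and the helper name encoded_sizes are just made up for illustration; UTF-16 is
encoded without a BOM so only the character data is counted):

```python
def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for the given string."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Illustrative samples only, one per script class discussed above.
samples = {
    "ASCII": "mov eax, ebx",
    "Cyrillic": "привет",
    "CJK": "漢字かな",
}

for name, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(name, utf8_len, utf16_len)
```

for these samples, the ASCII line is 12 bytes in UTF-8 vs 24 in UTF-16, the
Cyrillic word is 12 vs 12, and the CJK sample is 12 vs 8.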

although, I guess there is the concern with some people over the efficiency
of treating strings like arrays...

or such...
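
the concern about treating strings like arrays is that, in UTF-8, finding the
Nth character means scanning from the start rather than doing a single index
operation. a minimal sketch of that scan, assuming well-formed UTF-8 input
(codepoint_offset is a made-up helper name):

```python
def codepoint_offset(data, index):
    """Return the byte offset of the index-th code point in UTF-8 bytes."""
    count = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            if count == index:
                return offset
            count += 1
    raise IndexError("code point index out of range")


data = "aπ漢b".encode("utf-8")  # 1-, 2-, 3- and 1-byte characters
print([codepoint_offset(data, i) for i in range(4)])
```

this is O(n) per lookup, which is the cost people worry about; code that
mostly walks strings sequentially doesn't really pay it.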



Rod Pemberton



