Re: AT&T or Intel syntax?



On Jul 1, 6:00 pm, Frank Kotler <spamt...@xxxxxxxxxx> wrote:
r...@xxxxxxxxxx wrote:

...

You cannot take NASM source code, without major
modification, and feed it into an Intel-syntax accepting compiler.

For what values of "major"? Nasm code requires far less modification
than, say, Gas or HLA. At what point does it become "completely different"?

True, Gas and HLA require more modification. OTOH, no one is calling
GAS' and HLA's syntaxes "Intel Syntax".


...

Take, for example, how an Intel Syntax
assembler uses square brackets (generally as the addition operator,
not to provide semantic information concerning the use of an
addressing mode).

Okay, let's take that, for example. Intel has a perfectly good "+"
operator. Is it not a principle of language design to do a thing only
one way? Is this not a "flaw", then?

Yes, it is a flaw.
And this is an important point I should raise here: I don't
particularly believe that "Intel Syntax" is the end-all, be-all, of
assembly language syntaxes. Intel did some brilliant things when
designing the syntax of their assembler, and they did some bone-headed
things. Let's consider the one thing near and dear to every FASM,
NASM, and TASM Ideal-mode user's heart: those brackets. In Intel
syntax, brackets are overloaded in some really bad ways. Consider:

[expr1][expr2] means expr1+expr2. This expression has nothing to do
with memory. [someMemory][2] is the same as someMemory[2] and is the
same as someMemory+2. This accesses memory if and only if "someMemory"
is a defined memory location (or a text equate that expands to one).
[1][2] is equivalent to 3 and does not reference a memory location.
[1][2+3] is equivalent to 6, etc. These last two bracketed expressions
are *constants* that will invoke the immediate addressing mode if
used in an appropriate operand position. They do not access memory.
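A minimal Python sketch of the constant case just described, for illustration only (it is not how any real assembler evaluates operands): each bracketed group is treated as one summand, so "[1][2]" becomes 1 + 2. The helper names are hypothetical, and only +, -, and * are accepted inside a group; symbols, registers, and segment prefixes are deliberately out of scope.

```python
import ast
import operator

# Arithmetic operators allowed inside a single bracketed group.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _eval_const(node: ast.AST) -> int:
    """Evaluate an integer constant expression built from +, -, and *."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_const(node.left), _eval_const(node.right))
    raise ValueError("not a constant expression")


def eval_bracket_sum(text: str) -> int:
    """Sum adjacent [expr] groups, e.g. "[1][2]" -> 1 + 2.

    Models only the constant case: no symbols, registers, or segment
    prefixes, so the result is always an immediate value.
    """
    text = "".join(text.split())  # "[1] [2]" and "[1][2]" are the same
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected one or more [expr] groups")
    groups = text[1:-1].split("][")
    return sum(_eval_const(ast.parse(g, mode="eval").body) for g in groups)


print(eval_bracket_sum("[1][2]"))    # 3
print(eval_bracket_sum("[1][2+3]"))  # 6
```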

Special case #1: [reg] (reg = 16-bit register in a 16-bit segment,
32-bit register in a 32-bit segment, or 64-bit register in a 64-bit
segment). Access the memory location pointed at by the register. Here,
the brackets serve a different purpose than the one above (they force
a memory addressing mode). Note, however, that if you have
[reg][constexpr] the brackets *also* serve as the addition operator.
This overloading is not a particularly good idea from a language
design point of view.

Special case #2: segreg:[constexpr] (e.g., ds:[0]). Now the brackets
do access memory. Again, a bad idea to overload the bracket operator
in this manner.

What Intel should have done: Well, #1 on the priority list is that
they should have used a consistent syntax. Brackets should either be a
summation operator (or "index" operator, if you prefer) or they should
have been used to access memory (e.g., TASM "Ideal" mode). Overloading
them in this manner is a bit confusing and is one of the reasons other
assemblers (e.g., TASM "Ideal" mode) have deviated from Intel Syntax.
And this is another important point: using the brackets to specify
memory addressing, as TASM has done in Ideal Mode, is a *deviation*
from Intel Syntax.

Personally, I feel that Intel already had the syntactical tools at
their disposal to force memory references: "<type> PTR". For
example, if you really wanted to access memory location zero, you
should be able to specify something like "DWORD PTR ds:0". Of course,
it would have been a bit more verbose to specify things like "byte ptr
[eax]" all over the place, but if that's a problem to the language
designers, then they should have considered a better way to specify
type coercion. (Consider the NS32K assembler syntax, for a moment;
they use a ":b", ":w", or ":d" suffix to specify type information).

As long as you (and others) have brought up HLA, I would point out
that brackets in HLA aren't used with quite the same "wild abandon" as
in the Intel syntax, but HLA *does* use them in a manner consistent
with Intel Syntax (a subset of the Intel syntax, in fact). HLA doesn't
allow you to specify a constant expression using brackets (e.g.,
"[1][2]"), but it does allow most of the Intel forms when using the
brackets as an index or indirect operator (that is, following a memory
address or surrounding a register for indirect operations). Though you
would rarely be able to take a full HLA statement and compile it as-is
under an Intel syntax assembler (exception: implied addressing mode
instructions), the syntax for indexed addressing modes *is*
transportable to an Intel assembler. This isn't a perfect translation
(HLA is a subset so you cannot go the other direction, and things like
type coercion and constant syntax are different), but it *is* closer
to Intel syntax than most of the assemblers that are claiming to be
"Intel Syntax" assemblers.





mov eax, [table + 4 + ecx * 4]

Okay with Nasm, I think Intel will eat it.

mov eax, table[4][ecx * 4]

I think that's okay with Intel, *not* okay with Nasm.

Of course, to this we can add:

table[ecx*4+4]

and several other variations as well.
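To see that these spellings describe one operand, it helps to reduce each to its effective-address parts. The Python sketch below does that under simplifying assumptions (32-bit register names only, decimal displacements, "+" between terms, index terms written as reg*scale); effective_address is a hypothetical helper, not taken from any assembler.

```python
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def effective_address(operand: str):
    """Reduce an Intel-style memory operand to (symbol, base, index, scale, disp)."""
    s = operand.replace(" ", "").lower()
    # In Intel syntax "[" and "][" act as addition: table[4][ecx*4] == table+4+ecx*4
    s = s.replace("][", "+").replace("[", "+").replace("]", "")
    symbol = base = index = None
    scale, disp = 1, 0
    for term in filter(None, s.split("+")):
        if term.isdigit():
            disp += int(term)                 # constant displacement
        elif "*" in term:
            reg, factor = term.split("*")     # assumes reg*scale order
            index, scale = reg, int(factor)
        elif term in REGISTERS:
            base = term                       # unscaled register
        else:
            symbol = term                     # variable such as "table"
    return symbol, base, index, scale, disp


for form in ("[table + 4 + ecx * 4]", "table[4][ecx * 4]", "table[ecx*4+4]"):
    print(form, "->", effective_address(form))
```

All three forms should print the same tuple, ('table', None, 'ecx', 4, 4), even though each one is produced by a different rule in a grammar for the operand.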


So it is possible to write code that is more, or less, "completely
different".

And if you look at the productions that correspond to the grammar, you
will find that they are more or less "completely different".



Of course, there are all the semantic issues (above and beyond syntax)
to consider as well. For example, Intel's x86 assembly language
supports type checking -- a hallmark of NASM is that it does not.

Type checking, or as I prefer to call it, "type remembering", makes a
difference in syntax, even between "like Intel syntax, but simpler"
assemblers like Fasm and Nasm.

Technically, one could argue that semantics and syntax are equivalent.
The reason people make a difference between the two is because some
grammar systems are incapable of handling certain concepts, such as
"type remembering". However, this is more a function of trying to
specify a non-context-free language using a context-free grammar. CFGs
cannot handle certain constructions like type checking. However, a
*real* grammar for the language could be specified using something
more powerful than a CFG, and that grammar could capture all the type
information. So forgive me if I gave the impression that semantic
information is *not* represented by the grammar for "Intel Syntax". It
most certainly is. However, if I were to provide a CFG (say, in BNF
form), it could not appropriately capture the important semantic
information. This would make, say, the Intel grammar and the NASM
grammar appear more similar than they really are. For example, a
statement like "mov eax, memory" might look like this in BNF form:

Intel:
stmt ::= "MOV" <reg32> "," <identifier>

NASM:
stmt ::= "MOV" <reg32> "," "[" <identifier> "]"

(I've enclosed terminal symbols in quotes and non-terminal symbols in
angle brackets).

Now one could argue that these productions are hardly "completely
different", particularly as Intel's syntax also has a production that
looks like this:

stmt ::= "MOV" <reg32> "," "[" <identifier> "]"

(because the brackets around an expression, with nothing else, imply
"+0"). However, a CFG is generally not capable of manipulating the
semantic information associated with <identifier> that differentiates
the meanings of this statement. For example, "mov reg,
[someconstexpr]" has a completely different meaning in Intel syntax
than it does in NASM syntax.

If we were to specify the grammar correctly, using some
non-context-free grammar, then the two productions would be
significantly different.
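The difference can be sketched as a tiny operand classifier in Python. It assumes a symbol table that records whether each name is a memory variable or a constant, and it encodes only the rules stated in this thread: in Intel syntax a lone bracketed expression merely adds zero, so the symbol's classification decides; in NASM, brackets always mean memory. The function and table names are hypothetical.

```python
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def classify(operand: str, symbols: dict, dialect: str) -> str:
    """Return "register", "memory", or "immediate" for a source operand.

    symbols maps a name to "memory" (a variable) or "constant" (an equate).
    dialect is "intel" or "nasm".
    """
    text = operand.replace(" ", "").lower()
    bracketed = text.startswith("[") and text.endswith("]")
    name = text.strip("[]")
    if name in REGISTERS:
        # [reg] is a memory reference in both dialects; reg alone is not.
        return "memory" if bracketed else "register"
    if dialect == "nasm":
        # Brackets always dereference; a bare name is its value or address.
        return "memory" if bracketed else "immediate"
    # Intel: "[x]" is just x + 0, so what the symbol *is* decides the mode.
    return "memory" if symbols.get(name) == "memory" else "immediate"


symbols = {"memory": "memory", "someconstexpr": "constant"}
for op in ("[someconstexpr]", "memory", "[memory]"):
    print(op, classify(op, symbols, "intel"), classify(op, symbols, "nasm"))
```

The Intel branch cannot reach a decision without consulting the symbol table, which is exactly the context a plain CFG production has no access to.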



foo dd 42

mov al, [foo] ; okay with Nasm - possibly an error!
mov al, byte [foo] ; for Fasm - Nasm will accept it

If you're trying to say that there are some subset expressions that
*are* common between these two assemblers and Intel syntax, keep in
mind that you're preaching to the choir. It just so happens that I'm
one of the few people on this planet who has written a program that
generates MASM, TASM, FASM, and GAS (using "Intel Syntax") output from
a single compiler. Absolutely I've exploited lots of tricks with
respect to "holes" in the various grammars in order to simplify the
code generation process. However, I'm also *painfully* aware of the
differences between all these assemblers (even between MASM and TASM
[non-ideal mode], both of which I consider to be reasonably "Intel
Syntax" compatible). Go back and look at the HLA source code some
time. Check out all the tests for "if( assembler == XXXX)" that appear
in the code.


inc [foo] ; Okay with Fasm - Nasm complains
inc byte [foo] ; Okay with either - Nasm *needs* it
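A rough Python model of the four example lines above. It assumes one simplified rule: a "type remembering" assembler (Fasm here) records that foo was declared with dd, while NASM takes the size only from a register operand or an explicit size keyword. The diagnostic strings are invented for the sketch and are not the assemblers' real error messages.

```python
REG_SIZES = {"al": 1, "ax": 2, "eax": 4}
SIZE_KEYWORDS = {"byte": 1, "word": 2, "dword": 4}


def memory_size(operand, symbols, remembers_types):
    """Size in bytes of a memory operand, or None if it cannot be known."""
    first = operand.split()[0]
    if first in SIZE_KEYWORDS:              # explicit override: "byte [foo]"
        return SIZE_KEYWORDS[first]
    if remembers_types:                     # look up the declared size
        return symbols.get(operand.strip("[] "))
    return None


def check_mov(reg, operand, symbols, remembers_types):
    size = memory_size(operand, symbols, remembers_types)
    if size is not None and size != REG_SIZES[reg]:
        return "size mismatch"
    return "ok"                             # otherwise the register decides


def check_inc(operand, symbols, remembers_types):
    if memory_size(operand, symbols, remembers_types) is None:
        return "size not specified"
    return "ok"


symbols = {"foo": 4}                        # foo dd 42
for remembers, name in ((True, "Fasm"), (False, "NASM")):
    print(name,
          check_mov("al", "[foo]", symbols, remembers),
          check_mov("al", "byte [foo]", symbols, remembers),
          check_inc("[foo]", symbols, remembers),
          check_inc("byte [foo]", symbols, remembers))
```

For Fasm this reports a size mismatch for "mov al, [foo]" and accepts "inc [foo]"; for NASM it is the other way around, matching the comments on those lines.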

On the other hand, both NASM and MASM require a size-coercion prefix
(e.g. byte [ptr]) when the data size cannot be inferred from the
operands - although admittedly MASM uses the "ptr" and NASM doesn't.

You do understand that this makes the syntaxes different, right?

%define offset
%define ptr

Yes, this is certainly a trick I've used in HLA's code generation (for
FASM, different syntax for the macros, of course). But if you try and
compile this code with MASM, it breaks (interestingly enough, I
believe ASM86 uses a macro syntax that is closer to NASM than MASM
[MASM's macro facilities came from MS' Z80 assembler, IIRC]).
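What the two empty %define lines accomplish can be approximated in Python with a regular expression that deletes the whole words offset and ptr, roughly as NASM's preprocessor does when each is defined with an empty body. This is only a sketch: it ignores strings, comments, and every other preprocessor rule, and expand_empty_defines is a hypothetical name.

```python
import re

# Whole-word, case-sensitive matches, like the two %define lines above.
_EMPTY_MACROS = re.compile(r"\b(?:offset|ptr)\b")


def expand_empty_defines(line: str) -> str:
    """Delete 'offset' and 'ptr' tokens, then collapse leftover spaces."""
    return " ".join(_EMPTY_MACROS.sub("", line).split())


print(expand_empty_defines("mov eax, dword ptr [ebx]"))  # mov eax, dword [ebx]
print(expand_empty_defines("mov esi, offset msg"))       # mov esi, msg
```

Because %define is case-sensitive, MASM source written as "DWORD PTR" would slip through both the macro and this sketch; NASM's %idefine is the case-insensitive variant.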

Nonetheless, synthesizing the grammar in this way is a bit of a cheat.
If you're going to allow that, then I can claim that HLA is "Intel
Syntax" compatible because HLA's "regular expression" macros let me
specify limited context-free grammars (though enough to create an
"intel-syntax-like" grammar). I still wouldn't claim it to be "Intel
Syntax" because that's not how you would use HLA. Just like sticking
"PTR" everywhere is not how you would use NASM. NASM's designers
dropped a lot of that cruft for a reason, and AFAICT, most NASM users
agree with the decisions they made. And this is the point I keep
coming back to -- why are people so hung up on calling something
"Intel Syntax" when they obviously despise Intel Syntax so much?



goes a long way towards getting Nasm to assemble Masm code - especially
if the Masm code *uses* the optional square brackets, which quite a lot
of it does (compatibility with "Ideal syntax"?). We can't count on it,
unfortunately. I don't think trying to do it Nasm->Masm would work as well.

Not at all. You may have noticed (though this is more for semantic
reasons) that I abandoned the NASM output from HLA v1.x. I could *not*
coerce the Intel/MASM syntax that HLA produces to yield something that
NASM would accept without a major rewrite of the code generator. FASM
and GAS/Intel were bad enough.



Furthermore, MASM is *far* more capable of inferring operand size than
is NASM. Granted, this could be argued to be a semantic issue rather
than a syntactical issue, but it *is* one of the major differences
between MASM and NASM.

If by "inferring", you mean something more than what I mean by "type
remembering", I'm not sure what you mean... Masm allows something like:

assume ebx: byte ptr
inc [ebx]

You mean like that? I'd view that as "being told" and "remembering"
rather than as "inferring"... If that's not what you're thinking of,
got an example?

Nothing quite so exotic.

memory dd ?

inc memory

MASM (or any Intel syntax assembler) stores the type information for
memory in the symbol table (as semantic information) and references
that information when it encounters "inc memory". From the symbol's
name, it infers type and classification (i.e., "memory" vs
"constant").

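A minimal Python sketch of that symbol-table lookup, framed as the kind of translation a MASM-to-NASM converter has to perform: record the size implied by simple "name db/dw/dd" declarations, then rewrite "inc memory" with the size and brackets NASM needs spelled out. It assumes one-operand instructions only and leaves any name it does not know (equates, for instance) untouched; build_symbols and to_nasm are hypothetical helpers.

```python
import re

SIZE_OF = {"db": "byte", "dw": "word", "dd": "dword"}
_DECL = re.compile(r"\s*(\w+)\s+(db|dw|dd)\b", re.IGNORECASE)


def build_symbols(source_lines):
    """Map each declared variable name to its size keyword."""
    table = {}
    for line in source_lines:
        m = _DECL.match(line)
        if m:
            table[m.group(1)] = SIZE_OF[m.group(2).lower()]
    return table


def to_nasm(instr, table):
    """Rewrite 'inc memory' as 'inc dword [memory]' using remembered types."""
    mnemonic, operand = instr.split(None, 1)
    operand = operand.strip()
    if operand in table:                    # a memory variable: size is known
        return f"{mnemonic} {table[operand]} [{operand}]"
    return instr                            # unknown name: leave untouched


table = build_symbols(["memory dd ?", "count equ 10"])
print(to_nasm("inc memory", table))   # inc dword [memory]
print(to_nasm("push count", table))   # push count
```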


Getting Masm syntax into something Nasm will assemble isn't "that bad".

But you're exploiting holes in the Intel Syntax grammar to pull this
off. The fact that you *can* stick brackets around a memory expression
(to force an addition by zero) in MASM does *not* imply that the
grammars are the same or are even close.


Going Nasm->Masm would be much harder, due to the indeterminate nature
of Masm's syntax.

There is nothing "indeterminate" (that is, ambiguous) about MASM's
syntax. If there were something ambiguous, then nondeterminism would
exist and MASM would generate different code on different compiles of
the same source code.


I would make that argument for MASM vs. Intel's ASM86, I certainly
would not make such a statement for NASM vs. Intel.

Seems an arbitrary distinction. Which version of Masm?

Prior to v6.
After MASM v6.0, Microsoft added a *lot* of things to the language
that went well beyond Intel Syntax (e.g., the HLL stuff). One could
argue that the subset of Intel Syntax still exists, and therefore it's
still fair to call MASM an "Intel Syntax" assembler, and I wouldn't
argue with them about this, but for two syntaxes to be compatible, you
must be able to recognize all the same programs with the push-down
automata (parsers) for those languages, and the two CFGs for the
languages should be able to generate all the same programs. MASM v6.0
fails because it recognizes more programs and the CFG generates more
programs than the comparable Intel Syntax versions. Most people don't
have a problem with this superset capability, so I don't argue about
it.




I have Nasm and
Masm 4.0 running under dosemu. I have ASM86, I "assume" it'll run under
dosemu. I *could* do some experiments - quantify it in MegaPTRs or
something. Seems like a waste of time, to me. You want ASM86? I've got
some old AoA16 example code, I think. Do you think ASM86 will assemble it?

Pretty close.
There will be some differences, of course, because we're talking about
implementations (which are rarely correct) versus the actual languages
themselves.

We needn't get quite so exotic to see the problems. Compare TASM 4.0
and MASM 5.1 (ignoring TASM "Ideal" mode). With a little care, it's
certainly possible to write programs that compile properly with either
assembler (for example, I always made sure that the UCR stdlib for
80x86 Assembly Language Programmers compiled on both assemblers). This
is easy enough to do without all the fancy macros and auxiliary syntax
you tried to employ with your NASM examples. Being pedantic, the two
syntaxes are *not* equivalent (especially as TASM *does* have that
ideal mode that creates a superset language). But I don't have a
problem with the argument that MASM and TASM are both "the same
language". Ignoring Ideal mode, the differences between them aren't a
whole lot worse than what you find with other language implementations
(e.g., different C compilers). What the NASM/FASM/whomever people are
trying to say is, "Look, with a lot of care we can make this Java
program look like C, so Java is 'C syntax compatible.'" I'm not buying
that argument.



And that is my big
gripe, people are misusing the term "Intel Syntax" and trying to make
it mean something that it doesn't.

The observable fact is that "some" people, I would venture to say "many"
people, maybe even "most" people, use the term to mean "not AT&T
syntax". Until you've gone around and corrected them all to use the term
the way it's defined over at the University, you'd sleep better if you
just accepted the fact that sometimes "Intel syntax" *doesn't* mean
"will assemble with ASM86".

Why do you think I keep posting on this subject every time it comes
up?
Again, just because the common person *thinks* a date ending with "0"
begins a new decade (century, etc....) doesn't make it so.




The real sad part is most people
who do this turn around and claim that their "new" syntax is
"improved". If that's the case, why would they want to associate
themselves with the older, "bad", syntax?

We could call it "Ideal syntax" :)

And you'd get the same argument :-) Though less commonly used and
certainly not as formally defined, "Ideal Syntax" still has a very
specific meaning.

Why not just use "NASM syntax" or "FASM syntax"?
After all, that's what they really are. Why try to imply something
that isn't true about the language? Gee, if operand ordering were all
that is important, then I could claim HLA to be "GAS syntax
compatible". :-)


hLater,
Randy Hyde
