Re: [Luxasm-devel] The "Unified Model"

From: C (blackmarlin_at_asean-mail.com)
Date: 05/21/04


Date: 21 May 2004 12:10:20 -0700

Frank Kotler <fbkotler@comcast.net> wrote in message news:<40AC8D4D.97CCE7E1@comcast.net>...
> Let's get this out of Jeremy's thread...
>
> Yeah, I did see your "unified model" post... Here it is, for C and
> others who didn't see it.

Thanks, the archive on srcfrog is not updating so I am not seeing
whatever is posted there.

> Seems like an interesting idea - I can't say I like the resulting
> syntax...

Not only is it an interesting idea -- but I think 95% of it could
be done using the _current_ design of the macro system and some
tricky macros. The other 5% by either modifying the grouping syntax
proposal, some of the compiler's core, or adding some extensions to
the directives -- I will address what and where momentarily.

> I worry that you may try to take the idea
> too far... seems to me that a "procedure" is a fundamentally different
> thing than a "segment/section" or "structure/record/object"...

Yes -- I think the #section directive will need to be retained,
though (as Luxasm does not recognise either a structure or a
procedure) the other groupings could be part of the standard
macro library. I have written parts of the macros in the following
code, though the specifics of the syntax may change before the final
release.

[snip]

> For the unicode support, I think that's a good idea. If we follow
> your "everything's a macro" idea, even "Egyptian directives" might
> be possible. But we don't need to actually implement this immediately
> (if we get over-ambitious, we'll never get done).

Yes -- I have notes on some extensions to the directives which
would allow this, though to get them to work would require a
complete rewrite of the tokeniser module (scan.asm). As a result
I am not implementing them in the NASM code.

The idea is to add directives to reconfigure the scanner's table,
something like...

#letter "A" - "Z", "a" - "z", "@" ;; define labels (ranges)
#comment ";", "//" ;; define comment starts
#string "'", "`", '"' ;; define string
#space 0x20, 0x0D ;; white space (space, carriage return)
#newline 0x0A ;; new line (line feed)

This would not be difficult to implement, though I do not want
to do too much too soon (as you say). [Also we will have to
caution against 'second version bloat' -- see "The Mythical
Man Month" for details on that.]
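
To make the table idea concrete, here is a rough sketch in Python
(not Luxasm). Everything in it is an assumption made for illustration:
the ScannerTable class, its method names and the token format are
invented here and are not taken from scan.asm. It only shows that
character classes can live in a table which directives update at run
time.

```python
# Hypothetical sketch: a scanner whose character classes can be reconfigured,
# loosely mirroring the proposed #letter / #space / #newline directives.
# None of these names exist in Luxasm; they are invented for illustration.

LETTER, SPACE, NEWLINE, OTHER = "letter", "space", "newline", "other"


class ScannerTable:
    def __init__(self):
        self.classes = {}  # maps a single character to its class

    def define(self, cls, *items):
        """Each item is one character or a (first, last) inclusive range."""
        for item in items:
            if isinstance(item, tuple):
                first, last = item
                for code in range(ord(first), ord(last) + 1):
                    self.classes[chr(code)] = cls
            else:
                self.classes[item] = cls

    def classify(self, ch):
        return self.classes.get(ch, OTHER)

    def tokens(self, text):
        """Split text into runs of LETTER characters; NEWLINE becomes '\\n'."""
        out, word = [], ""
        for ch in text:
            cls = self.classify(ch)
            if cls == LETTER:
                word += ch
                continue
            if word:
                out.append(word)
                word = ""
            if cls == NEWLINE:
                out.append("\n")
            elif cls == OTHER:
                out.append(ch)  # unclassified characters pass through singly
        if word:
            out.append(word)
        return out


table = ScannerTable()
table.define(LETTER, ("A", "Z"), ("a", "z"), "@", "{", "}")
table.define(SPACE, " ", "\r")
table.define(NEWLINE, "\n")
```

Calling table.define(NEWLINE, ";") afterwards would make ";" end a
line, which is the kind of redefinition the directives are meant to
allow.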

[snip]

> Elisabeth Stone wrote:

[snip]

> > So, basically, we can look over to C (the language, not the person ;) and
> > borrow their syntax (at least for the moment to make examples...but I choose
> > it for "familiar" and it actually kind of fits real well into what I'm
> > talking about ;)...we can have an "open curly bracket" - "{" - to start a
> > group and a "close curly bracket" - "}" - to end a group...

The "{" and "}" characters are defined as standard letters -- this
allows you to do some funky things with the #define. Eg.

#macro if
        // implementation of 'if'
#define {, \\
#define }, #end#if
#end#macro

Indeed, you could define them to call a macro which defines them
to something else, and so on...

[snip]

> > So, putting this together - and using semi-colons to end statements rather
> > than newline, just to space it all out neatly in the source code (not part
> > of the idea itself, actually, but I throw this in to have neat source code
> > examples to show you ;) - we can have something like the following:

I dislike the ';' as an 'end of line' marker -- seems a waste of
typing -- though with the #newline directive proposal you could
redefine it as such.

Nevertheless, most of the following could be processed with
the #parse and #token directives to do what you propose (you
just need to be able to tell them where to start processing).

With some syntax changes you could get the following to work...

SECTION Data
{
        STRUCT FirstStructure
        {
                FirstMember dd ?
                SecondMember dw ?
        }
}

This could be done with

#equate db.__size, 1
#equate dw.__size, 2
#equate dd.__size, 4

#macro SECTION, 1
 #if ##1 === Data
  #section .data
 #else
  #error "nasty error"
 #end#if
 #token ;; read next line into macro parameters
 #if ##0 <> 1 || !( ##1 === { )
  #error "nasty error"
 #end#if
 #push SECTION
 #assign #.last, }
 #assign }, END_SECTION
#end#macro

#macro STRUCT, 1
 #assign ##n, ##1
 #token
 #if ##0 <> 1 || !( ##1 === { )
  #error "nasty error"
 #end#if
 #assign ##p, 0
 #token ;; read first member line
 #while !( ##1 === } )
  #equate #( ##n "." ##1 )#, ##p
  #assign ##p, ##p + #( ##2 ".__size" )#
  #assign #( ##n "." ##1 ".__init" )#, #&3
  #token ;; read next member line (or the closing brace)
 #end#while
 #equate #( ##n ".__size" )#, ##p
#end#macro

#macro END_SECTION
 #assign }, #.last
 #pull SECTION
#end#macro

(Actually I would not bother with the SECTION macro
at all and just create a 'variable' macro which
shoves all variable definitions in the .data section
and a 'constant' macro which does similar but puts
stuff in the .rodata section -- this is what I have
done in the NASM macros I am using to write Luxasm.)
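
The bookkeeping a STRUCT-style macro has to do is small: walk the
members, record each one's offset, and add up the sizes. A rough
Python sketch of just that arithmetic follows. The function name
layout and the dictionary-based symbol table are assumptions for
illustration; the sizes are the usual x86 widths for db, dw and dd.

```python
# Hypothetical sketch of the offset bookkeeping a STRUCT-style macro performs.
# The function and symbol-table shape are invented for illustration.

TYPE_SIZES = {"db": 1, "dw": 2, "dd": 4}  # bytes, standard x86 data widths


def layout(struct_name, members):
    """members: list of (member_name, type_name). Returns a symbol table."""
    symbols = {}
    offset = 0
    for member_name, type_name in members:
        # Each member's label is its byte offset from the structure start.
        symbols[f"{struct_name}.{member_name}"] = offset
        offset += TYPE_SIZES[type_name]
    # The running total after the last member is the structure size.
    symbols[f"{struct_name}.__size"] = offset
    return symbols


symbols = layout("FirstStructure", [("FirstMember", "dd"), ("SecondMember", "dw")])
```

No padding or alignment is applied here; members are simply packed in
declaration order.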

[snip]

> > And this is the trick with this "unified" scheme...note that "{" and "}" are
> > used for numerous different purposes in the above but there's just _one_
> > "unified" way of grouping things together...

Yes, though (as "{" and "}" are considered as standard letters)
this _can_ be done via macros -- it would save messing with the
'if' .. 'end_if' macros I currently use, and I could make it work
_nearly_ identically to how C works... eg.

        if eax = ebx
                mov eax, ecx
        end_if

could be replaced (in Luxasm) with

        if eax = ebx
        {
                mov eax, ecx
        }

Just a matter of adding a few lines to the macros (though the Ada
style 'if' .. 'end_if' _do_ make it easier to track down errors when
too many or too few terminators have been used; this can be very
tricky in C).
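
To illustrate why named terminators help with error tracking, here is
a rough Python sketch of a checker built on a context stack (much like
what #push and #pull give the macros). The function name, the keyword
table and the message wording are all invented for this example.

```python
# Hypothetical sketch: checking named block terminators with a context stack.
# Keywords and messages are invented for illustration.

def check_blocks(lines):
    """Return a list of error strings for mismatched openers and closers."""
    closers = {"end_if": "if", "end_procedure": "procedure"}
    openers = set(closers.values())
    stack, errors = [], []
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue
        word = words[0]
        if word in openers:
            stack.append((word, number))
        elif word in closers:
            if not stack:
                errors.append(f"line {number}: {word} without opener")
                continue
            opener, opened_at = stack.pop()
            if opener != closers[word]:
                # A named closer says which kind of block it expected.
                errors.append(
                    f"line {number}: {word} closes {opener} "
                    f"opened at line {opened_at}"
                )
    for opener, opened_at in stack:
        errors.append(f"line {opened_at}: {opener} never closed")
    return errors
```

With a bare "}" the checker could only say that the counts differ;
with 'end_if' and 'end_procedure' it can point at the block that was
left open.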

[snip]

> > Although, often, we will want to invoke our "group" in places _without_
> > giving it a name...so, for this, a special "exception" is introduced...

Often? Sections -- no (naming is needed on the ELF format), Procedures
-- no (you need to reference them), Structure instances -- very rare
(again need to reference).

Also how is the assembler to tell what type of grouping you want? A
section is _very_ different from a procedure and a procedure quite
different from a structure -- I would not want to necessitate too much
lookahead, and why should we stop someone declaring code in a data
section or vice versa? (This is a major problem with your proposed
syntax.)

The macros would be a little tricky, but you could define one to
do ...

structure MyStructure
{
        r db ?
        g db ?
        b db ?
        _pad db 0
}

MyStructureInstance1: MyStructure 0, 1, 2 ;; initialised
MyStructureInstance2: MyStructure ;; uninitialised

[snip]

> > StackMacros:
> > {
> > #var SIZE_OF_STACK; // Compile-time variable only!
> >
> > CreateStackFrame(Size):
> > {
> > #assign SIZE_OF_STACK = Size;
> >
> > push ebp;
> > mov ebp, esp;

Using a frame pointer -- what a waste of a register ;-)
Have a look at the 'procedure', 'locals', 'return' and
'end_procedure' macros in the NASM source for Luxasm -- though
far more complex than this, they generate much better code, using
esp to reference locals and parameters. (These macros can be
further improved by using features new to Luxasm.)

> > #if (SIZE_OF_STACK > 0)

You can currently do...
        #if Size > 0
                (no need to #assign it).

> > {
> > sub esp, SIZE_OF_STACK;
> > }
> > }
> > KillStackFrame:
> > {
> > #if (SIZE_OF_STACK > 0)
> > {
> > add esp, SIZE_OF_STACK;
> > }
> > mov esp, ebp;
> > pop ebp;
> >
> > #invalidate SIZE_OF_STACK;
> >
> > /* The above makes the use of SIZE_OF_STACK
> > invalid until re-assigned by "CreateStackFrame"
> > once more...stops the macros being used
> > "out-of-order" and an error happens, if used
> > wrongly ;)... */
> > }
> > }

You could do something like...

#macro procedure
#push PROCEDURE
#equate ##p, ##0-1
#equate #.loci, #@
#equate #.local, 0
#assign #.name, #( ##1 "." ##p )#
#.name:
        // setup stack parameters here
#end#macro

#macro local
 #if #context === PROCEDURE || #.loci != #@
  #if #.local == 0
   sub esp, #.frame
  #end#if
  #iterate ##0
   #equate #.local, #.local + #( #type ##sz ".__size" )#
   #assign #( #.name ".__local." ##1 )#, [ esp + #.frame - #.local ]
   #rotate 1
  #end#iterate
 #else
  #error "nasty error"
 #end#if
#end#macro

#macro end_procedure
 #equate #.frame, #.local
 #if #.local > 0
  add esp, #.frame
 #end#if
 ret
 #pull PROCEDURE
#end#macro
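
The arithmetic behind the 'local' macro (each local lives at
esp + frame - running total) can be sketched in Python as below. The
function name and the size table are assumptions for illustration,
and the offsets only hold while esp stays where the 'sub esp' left it
(any push in the body would shift them).

```python
# Hypothetical sketch of esp-relative local addressing without a frame pointer.
# Names are invented; only the offset arithmetic follows the macro above.

SIZES = {"byte": 1, "word": 2, "dword": 4}  # bytes


def local_offsets(local_vars):
    """local_vars: list of (name, type). Returns (frame_size, {name: offset})."""
    frame = sum(SIZES[type_name] for _, type_name in local_vars)
    offsets, used = {}, 0
    for name, type_name in local_vars:
        used += SIZES[type_name]
        # Mirrors [ esp + frame - running_total ] from the 'local' macro.
        offsets[name] = frame - used
    return frame, offsets


frame, offsets = local_offsets([("myLocal", "dword"), ("myLocal2", "dword")])
```

For the two dword locals this gives a frame of 8 bytes, with myLocal
at [esp + 4] and myLocal2 at [esp + 0].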

[snip]

> > COMPLEX_EQUATE:
> > {
> > #ifndef (DEBUGGING)
> > {
> > #writefile "current address = ", $;

#writefile is a nice idea -- though the question is 'which file'?
(I assume writing will be done at compile time -- if so it could
be a useful way to output info for debuggers.)

> > "Debugging string";
> > }
> > #else
> > {
> > "";
> > }
> > }
> >
> > CodeSection:
> > {
> > :Main:
> > {
> > :CreateStackFrame(0);
> >
> > mov eax, ebx;
> > call Procedure;
> >
> > :KillStackFrame;
> > ret;
> > }
> >
> > :Procedure:
> > {
> > :CreateStackFrame(8);
> >
> > mov eax, CONSTANT_DECLARATION;
> > mov ebx, edx;
> >
> > :KillStackFrame;
> > ret;
> > }
> > }

With the current design idea you would do something like...

#include "control.lxl" ;; control flow macro library
#include "datatype.lxl" ;; standard data structures

procedure start
{
        mov eax, ebx
        call Procedure
}

procedure Procedure
{
        local myLocal:dword, myLocal2:dword
        mov esp->myLocal, eax
        mov ebx, eax
}
        
Which I think you will find is much neater. (I got close
to achieving this with NASM.)
  
> > :Program:
> > {
> > :CodeSection;
> > :DataSection;
> > }
> >

I cannot see the need for this section, especially as
defining everything as a macro can cause problems.

[snip]

> > I'm creating structure definitions, sections, ordering the sections in the
> > program, macros, constants, equates, metamorphic structure definitions (!),
> > "macro groups", etc., etc....also, note that I could use this for creating
> > objects in OOP, just as easily, or adding data typing onto variables (if we
> > look at the above, the "MyFirstStructure:MY_STRUCTURE;" line almost looks
> > like a Pascal typed data declaration...note that it _isn't_, though, as the
> > assembler checks _NOTHING_ (it can't...to the assembler, a "group" is just a
> > "group"..."groups" are "typeless", exactly like variables in NASM ;)...but

The current design is similar to this: instead of actually checking types,
the assembler merely records the type as a label or size. This can be read
and set using the #type directive. (Considering C has a weak type system,
then Luxasm has a puny type system :-) .) But to get strong typing you could
do...

#macro add, 2 ;; redefine add instruction
 #if #type ##1 === #type ##2 ;; check types match
  #=add ##1, ##2
 #else
  #warn "Type mismatch, use cast"
 #end#if
#end#macro

#macro mov, 2
 #if #register ##1
  #type ##1, #type ##2
  #=mov ##1, ##2 ;; emit the real instruction, as #=add does above
 #else
  #error "Should not redefine type of a label, use cast"
 #end#if
#end#macro

Then just set up initial types...

#type eax, dword
#type ebx, *byte
#type ecx, *MyStructure
#type edx, *MyClass

        ...to set up the registers' type ids. (Most of this is designed to
occur in macros.)
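
The same record-and-compare scheme, sketched in Python to show the
intended behaviour. The TypeTracker class and its methods are invented
for this example; the assembler itself would do this with #type inside
macros, and here the "emitted instruction" is just a string.

```python
# Hypothetical sketch of type tracking on registers: the type is only a
# recorded tag, compared on 'add' and propagated on 'mov'. Names are invented.

REGISTERS = {"eax", "ebx", "ecx", "edx"}


class TypeTracker:
    def __init__(self, initial):
        self.types = dict(initial)  # operand name -> type tag
        self.warnings = []

    def add(self, dst, src):
        """Emit 'add' only when the recorded types match."""
        if self.types.get(dst) == self.types.get(src):
            return f"add {dst}, {src}"
        self.warnings.append(f"Type mismatch: {dst} vs {src}, use cast")
        return None

    def mov(self, dst, src):
        """A register takes on the type of whatever is moved into it."""
        if dst not in REGISTERS:
            raise ValueError("Should not redefine type of a label, use cast")
        self.types[dst] = self.types.get(src)
        return f"mov {dst}, {src}"


tracker = TypeTracker({"eax": "dword", "ebx": "*byte"})
```

Here tracker.add("eax", "ebx") is refused until a tracker.mov("eax",
"ebx") has given eax the same tag as ebx.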

[snip]

> > Another nice idea is to be able to do something like:
> >
> > video.asm:
> > ---------------------------------------
> >
> > // Ignore that this is DOS code rather than Linux
> > // code...the only reason is that it makes an easier
> > // example
> >
> > InitVideo:
> > {
> > mov ax, 0013h;
> > int 10h;
> > }
> >
> > FreeVideo:
> > {
> > mov ax, 0003h;
> > int 10h;
> > }

Current syntax would be...

#macro InitVideo, 0
        mov ax, 0013h
        int 10h
#end#macro

#macro FreeVideo, 0
        mov ax, 0003h
        int 10h
#end#macro

> > ---------------------------------------
> >
> > main.asm:
> > ---------------------------------------
> >
> > :Main:
> > {
> > :InitVideo;
> >
> > call DrawGraphics;
> >
> > :FreeVideo;
> > }

And again

#include "control.lxl"

procedure start
        InitVideo
        call DrawGraphics
        FreeVideo
end_procedure

This is virtually identical to NASM, though I
like your idea of using "{" and "}" as a way
to save a little typing. I think this could
be added to the macro library with little
difficulty. Such as...

procedure start
{
        InitVideo
        call DrawGraphics
        FreeVideo
}

Or even...

procedure start
{
        InitVideo
        {
                call DrawGraphics
        }
}

> > ---------------------------------------
> >
> > Note that the above are _macros_, not procedures...there is no "CALL" or
> > "RET" instructions and "InitVideo" and "FreeVideo" are _macros_ "expanded
> > inline"...but, as we have this kind of "always in macro mode" feature, you
> > can _develop_ your code in a modular procedure-like way (nice and
> > simple...all the "video" stuff is in "video.asm" ;)...but you're not forced
> > to define your macros in the same source file...
> >
> > This might seem slightly unremarkable..._UNTIL_ you try doing it with any
> > other assembler of your choice - MASM, HLA, RosAsm (well, certainly not as
> > it doesn't even have "modules" ;), NASM, etc. - because all of those
> > assemblers have their output strictly tied to the source layout...

You could do...

#define procedure, #macro
#define end_procedure, #end#macro

Which would have a similar effect.

> > The most contravertial idea, though, to heap on top of this is to actually
> > NOT define any instruction set whatsoever..."Huh?!? An assembler without an
> > instruction set?!"...yes, indeed...instead, you include a "x86.inc" include
> > file which defines a bunch of _macros_ that cover the x86 instruction set

Currently Luxasm 'loads' the instruction set and registers during the
initialisation stage (instead of having them hardcoded into the parser --
unlike most other assemblers). Therefore using a processor definition
file would be rather simple -- all you would have to do is load the file
then instead of calling 'Encode.load_instructions' call (something like)
'Encode.load_external_instructions' which would read the file instead of
the internal data -- this could be activated by a parameter. The only
problem is that 'unloading' instructions is kind of difficult (due to the
way the label lookup system works), so only one cpu could be used per
source file without major alterations to the compiler itself.
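
As a rough Python sketch of that switch between internal and external
tables: the one-line-per-instruction text format below ("mnemonic
followed by hex opcode bytes") is purely an assumption made up for the
example, as are the Python function bodies; only the two function
names echo the ones mentioned above.

```python
# Hypothetical sketch: choose between a built-in instruction table and one
# read from a definition file. The file format here is invented.

INTERNAL_TABLE = {"ret": bytes([0xC3]), "nop": bytes([0x90])}


def load_instructions():
    """Return a copy of the built-in table."""
    return dict(INTERNAL_TABLE)


def load_external_instructions(text):
    """Parse lines of the form 'mnemonic HEX [HEX ...]'; ';' starts a comment."""
    table = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        mnemonic, *opcode = line.split()
        table[mnemonic.lower()] = bytes(int(byte, 16) for byte in opcode)
    return table


def load(definition_text=None):
    """Use the external definition when one is supplied (e.g. via a parameter)."""
    if definition_text is None:
        return load_instructions()
    return load_external_instructions(definition_text)
```

The table is built once and never unloaded, which matches the
one-cpu-per-source-file limitation described above.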

A fully generic version would be even more difficult though -- as the ModRM
instruction format is a right pain without support from the compiler itself.
I guess other CISC processors are likely to have similarly difficult to encode
formats which would need special support too.

Also there is a '__modrm' instruction which can be used in macros to generate
instructions using the ModRM format which are not directly supported by the
assembler's internal definitions. (Though altering the internal tables
themselves would probably be easier than writing the macro.)
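
For anyone who has not fought with it, the ModRM byte packs three
fields: mod in bits 7-6, reg in bits 5-3 and r/m in bits 2-0. A small
Python sketch of the register-to-register case follows. The function
names are invented for the example and this is not how '__modrm' is
implemented; the memory forms (SIB byte, displacements) are left out,
and they are where the real pain is.

```python
# Sketch of ModRM encoding for the register-direct case only (mod = 3).
# Function names are invented; register numbers and opcode 0x89 are the
# standard x86 values.

REG_CODES = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
             "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod, reg, rm):
    """Pack mod (2 bits), reg (3 bits) and r/m (3 bits) into one byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field out of range")
    return (mod << 6) | (reg << 3) | rm


def encode_mov_reg_reg(dst, src):
    """MOV r/m32, r32 is opcode 0x89: reg holds the source, r/m the destination."""
    return bytes([0x89, modrm(3, REG_CODES[src], REG_CODES[dst])])


encoded = encode_mov_reg_reg("eax", "ebx")  # mov eax, ebx
```

In 32-bit mode 'mov eax, ebx' comes out as the two bytes 89 D8.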

[snip]

> > ...and those who ignored HLLs and stuck with ASM
> > tended to have done this because they were the "right, I'm going to program
> > like I learnt how to do in 1927 and NOTHING is ever going to budge me into
> > using some other approach" type...

<pedantic> Stored programme computers had not been invented in 1927
</pedantic> :-)

I really do not care whether I use assembly or a HLL -- it is just that
_every_ HLL I have ever used has had limitations which stop me from
writing code the way I want. Assembler never has those limitations and
thus, in my opinion, is superior to HLLs -- its only problem is the
time it takes to write a programme (reinventing the wheel and all).
Therefore I want a programming tool which does not restrict me (like HLLs
do) but which does not take an age to write anything with (as pure
assembly does). With a suitable macro library, Luxasm could well be that
tool.

[snip]

> > I don't mind, though, going with C's more "NASM-like" method and dispensing
> > with the radical "mad ideas" about "unifying" and all that...

I think Luxasm is already more powerful than you expected. Though the
syntax is not like what you proposed, the macro system can be used to give
the programme similar semantics to what you seem to be getting at. Indeed,
you could make Luxasm parse the source of many other assemblers and large
parts of HLLs such as C, Pascal or BASIC given the right macros (if the text
redefinition directives are included then the entirety of these languages
could be processed). Having said this, the macros would be quite complex
and likely to take several hundred kbytes to implement, but Luxasm's
compile time syntax is, I believe, Turing complete so it is possible.

C
2004-05-21


