Re: [Luxasm-devel] The "Unified Model"

From: Frank Kotler (fbkotler_at_comcast.net)
Date: 05/20/04


Date: Thu, 20 May 2004 10:51:10 GMT

Let's get this out of Jeremy's thread...

Yeah, I did see your "unified model" post... Here it is, for C and
others who didn't see it. Seems like an interesting idea - I can't say I
like the resulting syntax... I worry that you may try to take the idea
too far... seems to me that a "procedure" is a fundamentally different
thing than a "segment/section" or "structure/record/object"... But
keeping in mind the "a group's a group" idea might allow us to reuse
some "generic group-handling" code for the things that are "just a
group"...

For the unicode support, I think that's a good idea. If we follow your
"everything's a macro" idea, even "Egyptian directives" might be
possible. But we don't need to actually implement this immediately (if
we get over-ambitious, we'll never get done). The important thing is to
not lock ourselves out of it! You worry a lot about C's code getting
ahead of the "master plan"... if it needs to be modified to fit in with
what we want, it can be modified. If you see any spots where a
"placeholder" needs to be inserted, speak up! (I can see that this might
be a bigger concern for the "Unified Model" idea.)

Regarding the use of "not free" toolkits... let's say that I don't accept
Mr. Stallman's position that freedom mandated by legal force is the only
kind of freedom there is... Even Xlib ain't GPL...

Sorry for the "top post"...

Best,
Frank

Elisabeth Stone wrote:
>
> Hi,
>
> I had originally wanted to talk about this idea _before_ the assembler
> had work done on it...and was discussing first the "foundation" ideas with
> Frank...when C appeared with his "scanner" and kind of rendered what I'd
> wanted to say a touch moot...as this is about the general syntax approach
> the assembler could take and C's code has already decided on that kind of
> thing (without consultation - hmmph! - hence all my stuff about "please
> communicate before proceeding" in the other posts ;)...but, well, even if
> now too late for inclusion or anything and to be consigned to the dustbin, I
> might as well talk about it anyway...just to get it off my chest that I told
> everyone, as I'd intended to do...
>
> The basic idea is deceptively simple, really...
>
> What's a section / segment? A grouping of code and / or data...what's a
> structure? A grouping of code and / or data...what's an object? A grouping
> of code and / or data...what's a procedure? A grouping of code and / or
> data...what's a macro? A grouping of code and / or data...
>
> Hmmm, spotting a pattern here yet?
>
> Yes, these are all groupings of data (where "code", of course, is merely
> data in a special format that the CPU interprets as instructions and that
> the assembler provides handy "mnemonics" for to make "data entry" of these
> instructions a breeze ;)...they are groupings that serve different logical
> purposes, of course...but - and here's the rub - most assemblers treat them
> all as completely different, incompatible things with their own set of
> directives and so forth...an explosion of directives to duplicate what is
> just "grouping" again and again in numerous completely incompatible ways?
> Exactly, kind of dumb...also, as the assembler is taking upon itself to kind
> of "type" a "section" as being a different "type" of grouping from the
> "procedure" type of grouping, this is actually a touch "anti-assembly", to
> use Rene's terminology...if these things only differ in logical purpose then
> it's up to the _programmer_ to _use_ the various "groupings" as they see
> fit, not for the assembler to dictate what they are...
>
> Hence, instead, we can "unify" all these things (and much more) into a
> single set of typeless "grouping" directives...these directives can be used
> to group together any kinds of code and / or data...they are kind of more
> about "grouping" symbols inside the assembler's symbol table - letting the
> assembler know these things go hand-in-hand as a "group" - than in any way
> trying to dictate how these things should be used by the programmer
> exactly...a weird way of NASM-ifying much of the pointless "red tape" crap
> other assemblers have without actually throwing out the usefulness of being
> able to do stuff like, well, "group" things...
>
> So, basically, we can look over to C (the language, not the person ;) and
> borrow their syntax (at least for the moment to make examples...but I choose
> it for "familiar" and it actually kind of fits real well into what I'm
> talking about ;)...we can have an "open curly bracket" - "{" - to start a
> group and a "close curly bracket" - "}" - to end a group...then the
> assembler simply treats the entire group of data as a "block" (exactly like,
> in C, an "if" statement expects a single statement after it...but, by using
> the curly brackets, you can put lots of statements together into a "block"
> and put that one "block" as the target of the "if" or "else" or whatever
> ;)...how do we name this "group"? Well, stick with the same old tried and
> tested way of any assembler of an identifier followed by a colon...
>
> So, putting this together - and using semi-colons to end statements rather
> than newline, just to space it all out neatly in the source code (not part
> of the idea itself, actually, but I throw this in to have neat source code
> examples to show you ;) - we can have something like the following:
>
> ---------------------------------------
>
> DataSection:
> {
> FirstStructure:
> {
> FirstMember dd ?
> SecondMember dw ?
> ThirdMember dw ?
> FourthMember db ?
> }
>
> SecondStructure:
> {
> FirstMember db ?
> SecondMember db ?
> ThirdMember dd ?
> }
>
> VariableA dw ?
> VariableB db ?
> }
>
> CodeSection:
> {
> ReadOnlyString db "Hello, World!", 0
>
> FirstProcedure:
> {
> mov eax, ebx;
> mov ebx, edx;
> mov edx, eax;
>
> ret;
> }
>
> SecondProcedure:
> {
> mov eax, ebx;
> mov ebx, edx;
> mov edx, eax;
>
> ret;
> }
> }
>
> ---------------------------------------
>
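
A minimal sketch of the one generic "group" handler this implies, written in Python rather than in any LuxAsm syntax. The names `Group` and `parse_groups` are made up for illustration, and the tokenising is deliberately naive: it assumes each label, "{" and "}" sits on a line of its own, as in the example above. The only point is that a single routine copes with "sections", "structures" and "procedures" alike, because it never asks which is which.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A typeless group: a name plus an ordered list of statements and sub-groups."""
    name: str
    items: list = field(default_factory=list)


def parse_groups(source: str) -> Group:
    """Parse 'Name:' / '{' / '}' lines into nested Group objects.

    Hypothetical helper: assumes each label, '{' and '}' is on its own line.
    """
    root = Group("<root>")
    stack = [root]          # innermost open group is stack[-1]
    pending = None          # label seen, waiting to learn whether '{' follows

    def flush_label():
        # A label not followed by '{' is kept as an ordinary statement.
        nonlocal pending
        if pending is not None:
            stack[-1].items.append(pending + ":")
            pending = None

    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            if pending is None:
                raise SyntaxError("'{' without a preceding label")
            group = Group(pending)
            pending = None
            stack[-1].items.append(group)
            stack.append(group)
        elif line == "}":
            flush_label()
            if len(stack) == 1:
                raise SyntaxError("unbalanced '}'")
            stack.pop()
        elif line.endswith(":"):
            flush_label()
            pending = line[:-1]
        else:
            flush_label()
            stack[-1].items.append(line.rstrip(";"))
    flush_label()
    if len(stack) != 1:
        raise SyntaxError("missing '}'")
    return root


SOURCE = """
DataSection:
{
    FirstStructure:
    {
        FirstMember dd ?
        SecondMember dw ?
    }
    VariableA dw ?
}
CodeSection:
{
    FirstProcedure:
    {
        mov eax, ebx;
        ret;
    }
}
"""

tree = parse_groups(SOURCE)
print([item.name for item in tree.items])
```

Nothing in `Group` records a kind; whether a node is treated as a section or a structure would be decided by whatever later pass makes use of it.
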
> And this is the trick with this "unified" scheme...note that "{" and "}" are
> used for numerous different purposes in the above but there's just _one_
> "unified" way of grouping things together...no need for "section" directives
> and "structure" directives and "proc" directives and so on and so
> forth...all of which are incompatible with one another and so on...kind of
> refreshing to see things like "sections" and "procedures" and "structures"
> without all that "red tape" crud hanging around it, eh? Yup, the above does
> all _work_ (in fact, it's capable of _more_ than even the most "red taped"
> assembler manages ;) and, yup, there's not one piece of "red tape" here at
> all (also, to be noted: you don't have to lay out your program in this
> way...you can lay it out any way you like...this is simply a "tool" for
> the programmer to use to lay out code and "group" things...unlike the "red
> tape" assemblers - which is a _deliberate point_ here - there's absolutely
> NOTHING "prescriptive" about this at all...they are all the same thing
> ("unified") so the assembler itself doesn't know if a "group" is a procedure
> or a structure or whatever...it doesn't need to know, in fact...so, you can
> put anything anywhere...but the feature allows the programmer to _choose_ to
> lay out their code in a neat "hierarchical" way with sections, structures,
> procedures, etc. (and other "block"-like things...it's amazing just how many
> things are actually just "groups" when you get down to it and ignore the
> pointless "red tape" directives and useless jargon...an "object" in OOP, for
> example, is nothing but a "group" of code with corresponding data, normally
> held apart in non-OOP programming ;)...
>
> Instead, the way the "unified model" here works is just to provide the
> programmer with a generic method of "grouping" things and then _they_ choose
> how to use it...and it is the _USE_ of these groups that make them
> "sections" or "structures" or "procedures" or _WHATEVER_ (because this is
> kind of the point: the assembler no longer "dictates" which is which...the
> _programmer_ does this...and the shoe is, in fact, on the other foot...the
> programmer gets to do whatever they like but can use this device of
> "grouping" things to _inform_ the assembler what's happening ;)...
>
> There's nothing really "anti-assembly" in this idea...all it's doing is
> providing support for grouping a bunch of code and data into a
> "block"...that's all...nothing else is attached or implied...
>
> One more addition, though, to consider is that "macros" are also groupings
> of code and / or data and, also, most "structure" declarations just "define
> the type" but don't create an instance of it unless instructed to do so (the
> above is a C-like "struct"...but most assemblers actually do a "typedef
> struct" - just define the type but don't create the structure at that
> point - for their "structure support" ;)...well, we can "unify" this too, if
> we realise that this is, in fact, the same thing yet again...yes, most
> things actually turn out to be "the same thing yet again" but, amazingly,
> I've not seen anyone else out there who's realised it like I have here...the
> "structure" doesn't get created where it's defined, the "macro" doesn't get
> created where it's defined...they get defined by another command which
> "invokes" them at different points (possibly multiple times ;) elsewhere in
> the program...the difference between the two? Only that we "expect" to only
> have data variables in a "structure definition" and that we "expect" to have
> code inside a "macro definition"...
>
> So, we can "unify" this behaviour too into our "unified model" with a
> simple little change...groups are _defined_ by the curly brackets but they
> don't _appear_ where they are defined (hence, in the code now following, the
> "{" and "}" just _define_ a group, it doesn't actually appear where it's
> defined necessarily unless the ":" is used to do that :)...instead, we now
> use a more generalised ":" colon operator which equates the right-hand
> "block" to the left-hand "symbol" (which may be a single statement or a
> "group"...just like with C's "if" statements :)...
>
> Although, often, we will want to invoke our "group" in places _without_
> giving it a name...so, for this, a special "exception" is introduced...if
> you use the colon ":" equating operator without a left-hand symbol (it's the
> first non-white-space character in that line ;) then you are equating it to
> the line it's on...you can see this as being exactly like "Identifier:{ }"
> but that as we don't want to actually give it a name (a "nameless group"
> like "nameless structure or nameless unions" in C or something ;), then it's
> the same format but the name is missing ":{ }"...there's a kind of implied
> "name" at the beginning of each line that you can "equate" your "groups"
> to...kind of like there's an implied "line number" at the start of each line
> that you can equate to...
>
> Anyway, what am I talking about? Well, something like the following:
>
> ---------------------------------------
>
> MY_STRUCTURE(a, b, c):
> {
> #ifndef a { a = ? };
> #ifndef b { b = ? };
> #ifndef c { c = ? };
>
> FirstMember dw a
> SecondMember db b
> ThirdMember dd c
> }
>
> DataSection:
> {
> // With initialisers:
> //
> :MyFirstStructure :MY_STRUCTURE(1, 2, 3);
>
> // Without initialisers:
> //
> // (the macros in the structure definition
> // are used to presume "?" when no parameters
> // given...see above...and note that by having
> // "groups" defined so loosely, I am able to
> // add in macros or even code inside a
> // "structure" with no problems...the assembler
> // doesn't know it's a "structure"...this is
> // defined solely by _USAGE_ and the programmer ;)...
> //
> :MySecondStructure :MY_STRUCTURE;
> }
>
> /* Note, as the "grouping" stuff _DOESN'T_ specify what the "group"
> is for - this is entirely down to the programmer and the _USAGE_
> they make of it - then I'm taking advantage of this to create
> a kind of "OOP macro" below..."grouping" a bunch of macros to do
> with the stack together into "StackMacros"...you can "invent" your
> own kinds of "groups" that have no equivalent in other assemblers...
> this one is some kind of "MacroGroup" or something :)... */
>
> StackMacros:
> {
> #var SIZE_OF_STACK; // Compile-time variable only!
>
> CreateStackFrame(Size):
> {
> #assign SIZE_OF_STACK = Size;
>
> push ebp;
> mov ebp, esp;
> #if (SIZE_OF_STACK > 0)
> {
> sub esp, SIZE_OF_STACK;
> }
> }
>
> KillStackFrame:
> {
> #if (SIZE_OF_STACK > 0)
> {
> add esp, SIZE_OF_STACK;
> }
> mov esp, ebp;
> pop ebp;
>
> #invalidate SIZE_OF_STACK;
>
> /* The above makes the use of SIZE_OF_STACK
> invalid until re-assigned by "CreateStackFrame"
> once more...stops the macros being used
> "out-of-order" and an error happens, if used
> wrongly ;)... */
> }
> }
>
> // Simple equate:
> //
> CONSTANT_DECLARATION: 36;
>
> /* Note how the "unified model" looks at things...you could also
> just as easily create "complex equates" that use macros inside
> a "group" to check other variables that the equate is different
> in different places...again, the assembler does NOT force any
> particular kind of usage on "groups"...it's _ALL_ down to the
> programmer and how they make _USE_ of that "group" in their
> code...see below for more "complex" example :)... */
>
> COMPLEX_EQUATE:
> {
> #ifdef (DEBUGGING)
> {
> #writefile "current address = ", $;
> "Debugging string";
> }
> #else
> {
> "";
> }
> }
>
> CodeSection:
> {
> :Main:
> {
> :CreateStackFrame(0);
>
> mov eax, ebx;
> call Procedure;
>
> :KillStackFrame;
> ret;
> }
>
> :Procedure:
> {
> :CreateStackFrame(8);
>
> mov eax, CONSTANT_DECLARATION;
> mov ebx, edx;
>
> :KillStackFrame;
> ret;
> }
> }
>
> :Program:
> {
> :CodeSection;
> :DataSection;
> }
>
> ---------------------------------------
>
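
To make the "#var" / "#assign" / "#invalidate" behaviour of the StackMacros group concrete, here is a rough Python model of it. Everything in it (`CompileTimeVar`, `create_stack_frame`, `kill_stack_frame`) is a hypothetical stand-in, not a proposed implementation. It only shows how invalidating the compile-time variable turns an out-of-order macro use into an error.

```python
class CompileTimeVar:
    """Hypothetical model of a '#var': exists only while assembling."""

    def __init__(self, name):
        self.name = name
        self._value = None
        self._valid = False

    def assign(self, value):
        # Models '#assign NAME = value'.
        self._value = value
        self._valid = True

    def invalidate(self):
        # Models '#invalidate NAME': any later read is an error.
        self._valid = False

    @property
    def value(self):
        if not self._valid:
            raise RuntimeError(f"{self.name} used while invalid")
        return self._value


SIZE_OF_STACK = CompileTimeVar("SIZE_OF_STACK")


def create_stack_frame(size):
    """Emit the lines the CreateStackFrame(Size) group would expand to."""
    SIZE_OF_STACK.assign(size)
    lines = ["push ebp", "mov ebp, esp"]
    if SIZE_OF_STACK.value > 0:
        lines.append(f"sub esp, {SIZE_OF_STACK.value}")
    return lines


def kill_stack_frame():
    """Emit the lines the KillStackFrame group would expand to, then invalidate."""
    lines = []
    if SIZE_OF_STACK.value > 0:
        lines.append(f"add esp, {SIZE_OF_STACK.value}")
    lines += ["mov esp, ebp", "pop ebp"]
    SIZE_OF_STACK.invalidate()
    return lines


print(create_stack_frame(8))
print(kill_stack_frame())
```

Calling `kill_stack_frame()` a second time without an intervening `create_stack_frame(...)` raises `RuntimeError`, which is the "used out-of-order" error the comment in the example describes.
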
> In the immortal words of Keanu Reeves, "Woah!!" ;)...
>
> Four "operators" defined - ";" to end statements, ":" to equate "symbols"
> together and the "{" / "}" open and close pair - under the "unified model"
> here only...but just look at all the different things possible because they
> are "unified"! As I've always said, "the whole is greater than the sum of
> its parts" is a solid _fact_, not just a saying ;)...
>
> I'm creating structure definitions, sections, ordering the sections in the
> program, macros, constants, equates, metamorphic structure definitions (!),
> "macro groups", etc., etc....also, note that I could use this for creating
> objects in OOP, just as easily, or adding data typing onto variables (if we
> look at the above, the "MyFirstStructure:MY_STRUCTURE;" line almost looks
> like a Pascal typed data declaration...note that it _isn't_, though, as the
> assembler checks _NOTHING_ (it can't...to the assembler, a "group" is just a
> "group"..."groups" are "typeless", exactly like variables in NASM ;)...but
> if we added in some "#if" compile-time macros to test that only "compatible"
> types are being used together - and an "#error" if they are not - then you
> could manually add "data typing", if you wanted to...NOTE, of course, that
> this is just a demonstration of what is _possible_ with the "unified
> model"...I know most ASM programmers don't want to do that "typing" crap and
> you don't have to do it...but it's that powerful, versatile and flexible
> that _four_ operators and a bunch of assembler macros can add on any kind of
> "data typing" - loose, strict, absurdly strict, non-existent, etc. - that
> you like :)...
>
> You're thinking: "oh my goodness! It's HLA using C syntax rather than
> Pascal!!!" right now, aren't you?
>
> But, no! There is not a single "anti-assembly" thing in the above at
> all...not one...if it looks a touch "C" then that's just because I borrowed
> the C syntax style (but I do think that this actually makes the most elegant
> and "familiar" style :) and the effect of "unification" in being able to
> bring out an unbelievable amount of facilities and features - HLA thinks
> it's got a lot? No contest! - from just doing _ONE_ thing of adding a
> generic set of "grouping" directives...kind of see why I keep telling people
> you can do much more _together_ than apart? Here's a demonstration of how
> just one small, tiny bit of "unification" can propel a still strictly
> NASM-like assembler into the stratosphere with its "features"...
>
> And the above is NOT enforced at all (despite how it looks, I'm remaining
> very "NASM-like" to the core here...it's only the addition of a completely
> _TYPELESS_ "grouping" facility...to which the assembler has no "types" or
> "checks" or anything :)...the whole point about this strategy is that it's
> the _programmer_ who defines what is what by _USAGE_ and only by
> _USAGE_...the assembler doesn't "check" any of the above, it is simply
> outputting it according to the directions given by the operators...but those
> "operators" are defined in such a way as to _detach_ the logic from layout
> more or less completely...
>
> This is equally valid:
>
> ---------------------------------------
>
> :Main:
> {
> mov ebx, eax;
> mov ecx, ebx;
> ret;
> }
>
> ---------------------------------------
>
> Just writing it all as one procedure with global variables or
> whatever...don't even really need to put the statement into a "group" either
> but it just looks neater to me...
>
> Another nice idea is to be able to do something like:
>
> video.asm:
> ---------------------------------------
>
> // Ignore that this is DOS code rather than Linux
> // code...the only reason is that it makes an easier
> // example
>
> InitVideo:
> {
> mov ax, 0013h;
> int 10h;
> }
>
> FreeVideo:
> {
> mov ax, 0003h;
> int 10h;
> }
>
> ---------------------------------------
>
> main.asm:
> ---------------------------------------
>
> :Main:
> {
> :InitVideo;
>
> call DrawGraphics;
>
> :FreeVideo;
> }
>
> ---------------------------------------
>
> Note that the above are _macros_, not procedures...there are no "CALL" or
> "RET" instructions and "InitVideo" and "FreeVideo" are _macros_ "expanded
> inline"...but, as we have this kind of "always in macro mode" feature, you
> can _develop_ your code in a modular procedure-like way (nice and
> simple...all the "video" stuff is in "video.asm" ;)...but you're not forced
> to define your macros in the same source file...
>
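
A small Python sketch of that inline expansion, under the assumption that the assembler keeps every defined group in one table regardless of which source file it came from. The `GROUPS` dict stands in for what "video.asm" would contribute, and `expand` is an invented name. Each ":Name;" line is replaced by the body of the group, so no CALL or RET is ever emitted.

```python
import re

# Hypothetical group table: group name -> body lines.
# In the post these definitions live in video.asm; here they are just a dict.
GROUPS = {
    "InitVideo": ["mov ax, 0013h", "int 10h"],
    "FreeVideo": ["mov ax, 0003h", "int 10h"],
}

INVOKE = re.compile(r"^:(\w+);?$")   # matches a ':Name;' invocation line


def expand(lines, groups, _active=()):
    """Replace every ':Name;' line with the recursively expanded body of that group."""
    out = []
    for line in lines:
        line = line.strip()
        match = INVOKE.match(line)
        if match is None:
            # Ordinary statement: keep it, minus the trailing ';'.
            out.append(line.rstrip(";"))
            continue
        name = match.group(1)
        if name in _active:
            raise RecursionError(f"group {name!r} invokes itself")
        if name not in groups:
            raise KeyError(f"undefined group {name!r}")
        out.extend(expand(groups[name], groups, _active + (name,)))
    return out


MAIN = [":InitVideo;", "call DrawGraphics;", ":FreeVideo;"]
print("\n".join(expand(MAIN, GROUPS)))
```

The only CALL left in the output is the one to `DrawGraphics`, which was written as a real call in "main.asm".
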
> This might seem slightly unremarkable..._UNTIL_ you try doing it with any
> other assembler of your choice - MASM, HLA, RosAsm (well, certainly not as
> it doesn't even have "modules" ;), NASM, etc. - because all of those
> assemblers have their output strictly tied to the source layout...I've
> deliberately _undone_ this for the "unified model"...and there's a whole lot
> of magical things that are made possible by that...the above can be
> developed totally "modular" with a clear procedure-like format in separate
> source files but it doesn't cost you the need to use "CALL / RET"...for
> something like "InitVideo" and "FreeVideo" - which is why I chose these -
> you only need to do these things once...we're creating them as macros for
> _convenience_ (to lay it all out nicely in the source code with "video" in
> one file, "mainline" code in another ;), not because these are actually
> going to be used more than once...
>
> To be honest, playing around with this "unified model" sometimes feels a
> touch like "art"...you know, painting a picture rather than
> programming...and that's the good feeling I tend to miss from programming
> increasingly ;)...
>
> The most controversial idea, though, to heap on top of this is to actually
> NOT define any instruction set whatsoever..."Huh?!? An assembler without an
> instruction set?!"...yes, indeed...instead, you include an "x86.inc" include
> file which defines a bunch of _macros_ that cover the x86 instruction set
> (and if you define a "CPU" variable to "486" then only the instructions up
> to a '486 are included by the include file...some "conditional assembly"
> macros there, basically..."#if (CPU <= 486) {}" ;)...if you then wanted
> to add on new instructions, then just add them into the include file...use a
> different include file if you want 68K mnemonics (then you could have
> Herbert's mnemonical style ;)...create your "generic" instruction set (or
> use Java's one ;) and then an include file just relates it to an
> x86...blah-blah-blah...
>
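
As a rough illustration of "the instruction set is just an include file", here is a Python sketch in which the assembler core knows no mnemonics at all and everything comes from a table. `X86_INC`, `load_instruction_set` and `assemble` are invented names, the CPU levels (86, 486, 586) are an assumed numbering, and only a few operand-free encodings are listed so the table stays tiny.

```python
# Hypothetical contents of an "x86.inc"-style table:
# mnemonic -> (minimum CPU level, encoded bytes).
X86_INC = {
    "nop":       (86,  bytes([0x90])),
    "ret":       (86,  bytes([0xC3])),        # near return
    "bswap eax": (486, bytes([0x0F, 0xC8])),  # BSWAP arrived with the 486
    "rdtsc":     (586, bytes([0x0F, 0x31])),  # RDTSC arrived with the Pentium
}


def load_instruction_set(table, cpu):
    """Keep only entries available on `cpu`; models the conditional assembly in the include."""
    return {m: code for m, (min_cpu, code) in table.items() if min_cpu <= cpu}


def assemble(lines, instruction_set):
    """Look each statement up in the table; the core itself knows no instructions."""
    out = bytearray()
    for line in lines:
        # Normalise case and whitespace, drop the trailing ';'.
        key = " ".join(line.strip().rstrip(";").lower().split())
        if key not in instruction_set:
            raise ValueError(f"unknown instruction for this CPU: {line!r}")
        out += instruction_set[key]
    return bytes(out)


i486 = load_instruction_set(X86_INC, 486)
print(assemble(["bswap eax;", "ret;"], i486).hex())
```

Swapping in a different table (68K mnemonics, say) would change what the assembler accepts without touching `assemble` itself, which is the cross-assembler angle.
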
> Not a feature directly useful for Linux ASM coding, admittedly (had the idea
> before LuxAsm was around ;) but allows the assembler to be re-defined as a
> "cross-assembler" too and easily expanded for new CPUs like "Pentium 27 with
> super-hyper-threading" or moved to any 64-bit successor that may show up
> later...
>
> Most assemblers can't really do this because having even the instruction set
> itself as just a bunch of "macros" would take a really long, long time to
> assemble big files...but if you're using my "fast food assembly" method, of
> course, then "time" is no longer any kind of big worry whatsoever anymore
> (see? More than just about beating Rene in some "speed trials" against MASM
> ;)...if you're doing it _while the code is being typed_ then there's oodles
> and oodles of time...a machine can certainly easily stay miles ahead in
> processing the code than any human is capable of typing (I mean, there's not
> too many "10,000 words per second" typists out there, is there? But a well
> written compiler could probably manage those kinds of speeds happily..."fast
> food assembly" doesn't wait to get to work on assembling the source code -
> assembles it into memory buffers - and, thus, we completely eliminate the
> usual "how long do I have to wait for my assembler to compile things?"
> problem because, in short, you never have to _wait_...ever...the assembler
> dives straight into it while you're typing and developing the code...it can
> dump out some "symbols" file or whatever to the disk too, so that it can
> pick up the code between sessions - or you can hand the file over when
> handing the source code over to someone else so they can - pick it up
> exactly where you left off last time ;)...
>
> Oh, yes, I'm dumping a whole lot of radical mad ideas on you in one go, I
> appreciate...
>
> But you know the old thing "we're all using OSes and programming methods
> developed in the '60s and '70s!" thing? Well, this is probably the kind of
> stuff that they _should_ have thought up in the '80s and '90s but never did
> because software, for some weird reason (HLLs probably, I reckon, but it
> corresponds with when they showed up and ruined everything!! ;), just
> stopped dead in its tracks...the "culture shock" of things like "fast food
> assembly" and "the unified model" wouldn't be around, if only software
> hadn't just _STOPPED_ developing...and it stopped developing as it should
> have, really, because HLLs came along and invented its own little "abstract
> universes" in which software has been pissing around doing nothing useful
> for the last few decades...and those who ignored HLLs and stuck with ASM
> tended to have done this because they were the "right, I'm going to program
> like I learnt how to do in 1927 and NOTHING is ever going to budge me into
> using some other approach" type...hence, HLLs are increasingly useless
> because all they can do is "abstract" more and more (slowing and bloating
> and mismanaging resources more and more and more) while assembly language
> hasn't really been touched by anyone - except Randy and that "Terse" guy -
> and assembly language _support_ hasn't come off the command line until,
> fair's fair, Rene dragged it kicking and screaming into a GUI
> interface...it's why I find Randy and Rene's "feud" kind of weird and
> bizarre because they actually share more in common - both wanting to
> actually _progress_ ASM (one the actual ASM language, one the ASM support
> tools surrounding it ;) - which should actually unite them because no-one
> else has been bothered to do that for a few decades...I suppose they perhaps
> _both_ want the "crown" of being the one that "made ASM popular and useable
> again!"...well, fine...while they argue about it, we can use that
> distraction to improve ASM tool support and the ASM language style, when
> they're not looking, and steal the "glory" out from under them! That'll
> teach them to fight instead of co-operate! ;)...
>
> Nah, just kidding...couldn't give a crap about "glory" or I wouldn't just
> throw all my ideas out there for anyone to pick up and use or join "open
> source" projects and that kind of thing...I want to see _ASM_ made good and
> strong..._that_ is my reward in being able to use my favourite language all
> the time to do good software _in reasonable times_..."fast food assembly" is
> only _part_ of the overall "evil mad scientist" plans I have to make ASM
> _fast_, _smooth_, _seamless_, _easy_, _able to deal with the large
> scale_...and, ultimately, a more than viable _alternative_ to, say, using C
> or C++ (if one looks at it all objectively - especially with modern OSes and
> HLA giving a small "insight" into what is possible - then it's NOT really
> that far off it now...and developers _do_ choose C / C++ already, as more
> than reasonable options...just got to get the "support" to HLL acceptable
> levels, break down the "myths" surrounding ASM and here comes "assembly
> rebirth"...on Linux, of course, not Windows! ;)...*evil laughter* muahahaha!
> :)
>
> I don't mind, though, going with C's more "NASM-like" method and dispensing
> with the radical "mad ideas" about "unifying" and all that...this can always
> be something I look at "post-LuxAsm" for a newer assembly tool again or
> something...but, well, you know the line: I believe people should be free to
> choose...but a choice is not a choice unless it's _INFORMED_...here is the
> basic information, therefore...
>
> Although, of course, what am I talking about? "Unification"...one could
> "unify" all our "mad ideas" together, I'm sure...as you can see from the
> above, I'm rather good at that kind of thing because I've just done it to
> all the other assemblers with, ooh, _four_ "operators" defined only ;)...
>
> And, yes, C, I'm talking to you!! Please respond for once!!! You're the most
> important person to talk with about this stuff and you're the one who's
> usually saying least whenever I make a post!! Don't be shy! You can tell me
> about what you think of the UTF-8 idea too while you're at it, for example
> ;)...
>
> Beth :)
