Re: from elsewhere, an assembler




"Wolfgang Kern" <nowhere@xxxxxxxxxxx> wrote in message
news:ev8bai$rsi$3@xxxxxxxxxxxxxxxxxxxxxxxx

Hi "cr88192",

[..]
First, I'd keep the semicolon as the "old standard" comment
delimiter and use the separator "|" between instructions.

yes, I just don't like '|' this way, and may want to use it for ...

Ok.

but, yes, semicolon serves as both comment and separator...

Would be hard for most programmers to distinguish between
' ;' and '; ' or ';' ...

How about helping it with formatting?
ie: have comments at a defined TAB-stop and
use an ALT-";"-Key to tell it's a comment.


errm, I have no intent at this point for any kind of specialized editor. I
tend to use notepad for everything externally, and even internal to my apps
(I use custom gui rendering code), I use a general-purpose text-editor
widget (has an interface very similar to notepad, albeit lacking any menus
or 'specific' features).

in time this may be used for inline/runtime code editing, but I don't know
(don't exactly do much coding inside my apps).

some people have complained about and demeaned notepad before, but personally I
feel it is one of the best general purpose text editors (at least when
set to use a nice fixed-width font; presently I use fixedsys).

most of the time, it is fine, if a few times some features would have been
nice (ie: a more capable/intelligent find/replace feature).

I also like how it is fairly light on memory and windows resources, so I can
have a good number of them running (nice though would be a notepad like
editor where each window did not use any windows resources, and was capable
of partly sorting open windows by category).

dunno though, good enough (many others I have looked at typically seem
either overly specialized or wonky).

better find/replace would be nice, syntax highlighting maybe (don't find
much need personally), heavy resource/mem use or some wonky/incapable
interface, no...

notepad also comes by default with windows, which is worth something by
itself.

I dislike it though when other people have used, ie, my laptop, and then went
and resized the window or changed the font (I also like having each window
at a fixed 80x25 layout, odd as that is, and have thought before that it would
be nice if I could lock the size...).

for some tasks though I will use other editors (such as vi or emacs). on
linux for gui-based editing, I have usually used gedit or kedit (and often
vi from the shell for small tasks).

reminds me of at one time, my mom was taking some linux-certification
course, and needed to be assisted with using VI. seems many people have a
surprisingly difficult time with this. I just don't like using vi for any
larger tasks (a linux clone of the old dos 'edit' would have been nice...).


as noted elsewhere, I already have a good mass of code (maybe 5-10 kloc
or so) that depends on this particular feature (2 different JIT
backends),
and it would be too much of a hassle to go and modify it.

Often things gotta change on users demand ... :)


note, I still intend this primarily for backend/autogenerated code,
reasoning mostly that assembler now exists in a state of great decline,
ie, where everyone and their dog knows Java, many know C, and only
some know assembler...


note: this kind of character overloading is very common in HLLs,
which is what I am most used to.

I hope you try to write an assembler!


I write it as I write it, with whatever seems to IMO make sense.
an assembler exists as an assembler, since it is the lowest reasonable level
for code generation (one moves up to the level of bytecode, and is limited
to whatever the JIT implements, and one moves lower than assembler, well
then they have an ugly mess...).


[..]
macros:
would be a hassle to implement (sensible, maybe, if there will be a lot
of human-written code, but very optional for machine-written code,
where adding a few synthetic utility ops may make sense).

I see.

[REX and INC reg]

Just one flag is required to either produce the two or the one byte code.


yes, that is what was done eventually (actually, it involved a flag check
and a special case in the encode-'op <reg>' function).

it involves actually changing the opcode mnemonic index (internally, each
mnemonic is given a number used prior to locating the correct form of the
opcode during assembly).
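
A minimal sketch of the flag check (in Python, not the assembler's actual code; `REG32` and `encode_inc_reg32` are made-up names). It only assumes the well-known encodings: the one-byte `40+r` form outside long mode, and the two-byte `FF /0` form in long mode, where `40..4F` are taken by REX prefixes. The real assembler switches the mnemonic index instead of branching like this.

```python
# Hypothetical sketch: pick the encoding for "inc <reg32>" from a mode flag.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_inc_reg32(reg: str, long_mode: bool) -> bytes:
    """Return machine code for `inc reg`.

    Outside 64-bit mode the one-byte form 40+r is valid. In 64-bit mode
    the bytes 40..4F are REX prefixes, so the two-byte FF /0 form is used.
    """
    r = REG32[reg]
    if long_mode:
        modrm = 0xC0 | (0 << 3) | r   # mod=11 (register), /0 selects INC, rm=r
        return bytes([0xFF, modrm])
    return bytes([0x40 + r])
```

For example, `encode_inc_reg32("eax", False)` gives the single byte `40`, while `encode_inc_reg32("ebx", True)` gives `FF C3`.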


some other cases are similar.
inc word [esi]
inc dword [esi]
is how it is done at present in my case (ptr is optional/ignored).

Because I'm a lazy typist, I use the shorter INCb INCw INCd INCq
similar to MOVSb/w/d/q, but your way is more portable.


yeah.


note that often duplicating opcodes with different names leads to
inflation in the listing files (they are regarded as completely
different opcodes, and are duplicated accordingly).

Sure. And also the duplicated instructions like 8B C3 vs. 89 D8.
Here I always recommend to use the Direction-bit as a LOAD/store
indicator, so only MOV [mem],reg should use the '89' form.


yes, in my case I have reg,rm forms generally precede rm,reg forms, so they
are higher precedence.
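
Roughly how "first matching form wins" could look, as a Python sketch (the table layout and the names `MOV_FORMS`, `reg`, `mem`, `encode_mov` are invented for illustration and are not the assembler's real listing format). Only plain `[reg]` memory operands are handled, and esp/ebp are left out to avoid the SIB and disp32 special cases.

```python
# Hypothetical sketch: forms are tried in table order, first match wins.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esi": 6, "edi": 7}

def reg(name):
    return ("reg", REG32[name])

def mem(name):
    return ("mem", REG32[name])   # plain [reg] addressing only

# (destination kind, source kind) -> opcode; reg,rm is listed before rm,reg.
MOV_FORMS = [
    (("reg", "rm"), 0x8B),   # MOV r32, r/m32
    (("rm", "reg"), 0x89),   # MOV r/m32, r32
]

def kind_matches(kind, operand):
    if kind == "rm":
        return operand[0] in ("reg", "mem")
    return operand[0] == kind

def encode_mov(dst, src):
    for (dkind, skind), opcode in MOV_FORMS:
        if kind_matches(dkind, dst) and kind_matches(skind, src):
            # The "reg" side goes into ModRM.reg, the "rm" side into ModRM.rm.
            reg_op, rm_op = (dst, src) if dkind == "reg" else (src, dst)
            mod = 0b11 if rm_op[0] == "reg" else 0b00
            return bytes([opcode, (mod << 6) | (reg_op[1] << 3) | rm_op[1]])
    raise ValueError("no matching form")
```

With this ordering `mov eax, ebx` comes out as `8B C3`, and the `89` form is only reached when the destination is memory, e.g. `mov [esi], eax` gives `89 06`.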

similarities like this are not exploited (or exploitable) by the assembler,
so each is completely separate wrt the listing (apart from being listed
under the same mnemonic).


some cases have been handled this way though, namely where it was
ambiguous (ie: my current assembler can't figure it out).

thus:
movzx and movsx
now have alternative forms:
movzxw and movsxw
could possibly also add:
movzxb and movsxb
as alternatives to the originals (for clarity).

Yes, the CPU instructions are different for MOVZXb and MOVZXw/d/q
So it sounds logical to add these optional to the syntax.


yeah.
dunno what others have done here, I just noticed that, "oh crap", the only
distinction was a difference in the size of the right-hand memory operand. my
assembler can't handle this one, so I split it off...
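
To make the ambiguity concrete, a small Python sketch (the names `MOVX_SECOND_BYTE` and `encode_movx` are hypothetical; treating plain `movzx`/`movsx` as the byte-source form is an assumption here, not something stated above). The opcode bytes themselves are the standard ones: `0F B6`/`0F B7` for movzx and `0F BE`/`0F BF` for movsx with an 8 or 16 bit source.

```python
# Hypothetical sketch: the source size is carried by the mnemonic, so the
# encoder never has to inspect the size of the memory operand.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esi": 6, "edi": 7}

MOVX_SECOND_BYTE = {
    "movzx": 0xB6, "movzxb": 0xB6,   # zero-extend from 8 bits (assumed default)
    "movzxw": 0xB7,                  # zero-extend from 16 bits
    "movsx": 0xBE, "movsxb": 0xBE,   # sign-extend from 8 bits (assumed default)
    "movsxw": 0xBF,                  # sign-extend from 16 bits
}

def encode_movx(mnemonic: str, dst: str, base: str) -> bytes:
    """Encode `<mnemonic> dst, [base]` for a 32-bit destination register."""
    modrm = (0b00 << 6) | (REG32[dst] << 3) | REG32[base]   # mod=00: [base]
    return bytes([0x0F, MOVX_SECOND_BYTE[mnemonic], modrm])
```

So `movzx eax, [esi]` and `movzxw eax, [esi]` differ only in the second opcode byte: `0F B6 06` versus `0F B7 06`.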

actually, as it is my assembler also can't handle 3-arg opcodes (so
they have been generally omitted).

I have at times considered partly rewriting this part of my assembler (both
the listing-translation tool and the opcode matching), so that each argument
is fully qualified (size and type), vs as it is where they are only partly
qualified (a single 'size' field is used for the whole opcode).

in this case, the current size field or similar would probably be reused as
an argument (allowing some 3-operand forms and funky combinations of fixed
regs and sizes, as found in some opcodes).
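
One way the fully qualified version might be laid out, sketched in Python (`Arg`, `Form`, `FORMS` and `find_form` are illustrative names, and the three table rows are just examples, not entries from the actual listing). Each argument carries its own kind and size, so forms that differ only in the source size, or that take three operands, can sit side by side.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Arg:
    kind: str   # "reg", "mem", "imm", or "rm" (pattern only: reg or mem)
    size: int   # width in bits

@dataclass(frozen=True)
class Form:
    mnemonic: str
    args: tuple
    opcode: bytes

# Example rows only; a real table would be generated from the listing.
FORMS = [
    Form("movzx", (Arg("reg", 32), Arg("rm", 8)), b"\x0f\xb6"),
    Form("movzx", (Arg("reg", 32), Arg("rm", 16)), b"\x0f\xb7"),
    Form("imul", (Arg("reg", 32), Arg("rm", 32), Arg("imm", 8)), b"\x6b"),
]

def arg_matches(pattern: Arg, actual: Arg) -> bool:
    if pattern.kind == "rm":
        kind_ok = actual.kind in ("reg", "mem")
    else:
        kind_ok = pattern.kind == actual.kind
    return kind_ok and pattern.size == actual.size

def find_form(mnemonic: str, args: list) -> Form:
    """Return the first form whose mnemonic, arity and per-argument types match."""
    for form in FORMS:
        if (form.mnemonic == mnemonic and len(form.args) == len(args)
                and all(arg_matches(p, a) for p, a in zip(form.args, args))):
            return form
    raise LookupError("no form for " + mnemonic)
```

Here `find_form("movzx", [Arg("reg", 32), Arg("mem", 16)])` picks the `0F B7` row without needing a separate `movzxw` mnemonic.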


btw:(64-bit mode)
I optionally use 'Zp' in my disassembler to indicate the inherent
ZeroPage Addressing, where the upper 32 bits are quietly zeroed.
But for an assembler this is a 'just know, don't care' issue.


I am not sure, I am not familiar with this one...


[..]
Mmh? Compile in memory?
Where else? :)

You mean immediate compilation with prototype opcode?
where the programmer can immediately see code-size and format.
This is an interesting attempt as it would help for better
performing coding styles in general.

actually, I mean that, I directly compile/assemble the code, and run
it where it is (vs storing it in object files and passing it off to
the linker). this is why it is needed to auto-link against the host
app, so that eventually I may be able to run dynamically compiled code
just like statically compiled code (apart from the fact that, sadly,
anything pruned out by the traditional linker is not directly usable).

Yeah, good for immediate test and debug, but will be tricky to avoid
run-time compiled delays.


yeah.

then again, I have some experience writing dynamic compilers for script
languages. doing all this magic for mixed dynamic and statically compiled
code is in a way a reasonable next step.

at present, once code is assembled it is more or less frozen in place, so
should work about the same as normal statically-compiled code (except that
on windows I am currently running in memory grabbed from malloc of all
places...).

a linux port may need to use mmap, so that I can explicitly get
read/write/execute memory...
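
For what it is worth, the mmap idea can be tried out from Python on linux (a sketch only; `run_machine_code` is a made-up helper, and it assumes an x86 machine and a system that permits anonymous read/write/execute mappings). It uses the standard `mmap` and `ctypes` modules.

```python
import ctypes
import mmap

def run_machine_code(code: bytes) -> int:
    """Copy `code` into an anonymous rwx mapping and call it as `int f(void)`."""
    size = max(len(code), mmap.PAGESIZE)
    buf = mmap.mmap(-1, size,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(code)
    # A ctypes view of the mapping is used only to obtain its address.
    view = ctypes.c_char.from_buffer(buf)
    func = ctypes.CFUNCTYPE(ctypes.c_int)(ctypes.addressof(view))
    return func()
```

Passing the six bytes `B8 2A 00 00 00 C3` (`mov eax, 42` / `ret`) should hand back 42 on an x86 linux box; on other architectures those bytes are not valid code and must not be run.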


as such, I am considering specialized object file, and possibly library
loading, where it may be possible to pull new code/data from libraries
as needed (or to simply just link the whole big mass into memory).
at least if I am using this with statically compiled versions of the
same libs, the static versions should get precedence (so I am not ending
up with mixed duplicated and non-duplicated state).

I wrote my 'libs' as self-relocating modules, so they will run
anywhere in memory without linking-tools and relocating.
The address given by mem_alloc for loading it is already the
link address and there are no delaying relocate needs at all.


yes.

however, these libs may well be masses of code compiled by GCC, and at least
for PE/COFF it refuses to generate PIC (actually, it claims that PIC is the
default, but from disassembly, obviously not). can't see why PIC is
supposedly not possible with COFF (one is only lacking a GOT). of course,
this could be argued to not be truly PIC, but only a hybrid (since, static
linking would still be needed in producing the lib).


oh well, I am currently thinking I will have to go to using a hash-chaining
scheme rather than a hash-caching scheme, because otherwise linking will
become an O(n^2) operation, and with the current size of the libs I am
considering, that could become horrible...
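
A hash-chaining symbol table could be as small as this Python sketch (the class and method names are hypothetical, and the "first definition wins" rule is only an assumption chosen to line up with static versions taking precedence). Every symbol lives in the chain of its bucket, so a lookup only walks that one chain instead of falling back to a scan of all symbols.

```python
class SymbolTable:
    """Hypothetical chained hash table mapping symbol names to addresses."""

    def __init__(self, nbuckets: int = 1024):
        self.buckets = [[] for _ in range(nbuckets)]

    def _chain(self, name: str) -> list:
        return self.buckets[hash(name) % len(self.buckets)]

    def define(self, name: str, addr: int) -> bool:
        """Add a symbol; an earlier definition of the same name is kept."""
        chain = self._chain(name)
        for existing, _ in chain:
            if existing == name:
                return False
        chain.append((name, addr))
        return True

    def lookup(self, name: str):
        """Return the address for `name`, or None if it is undefined."""
        for existing, addr in self._chain(name):
            if existing == name:
                return addr
        return None
```

Resolving n references against such a table costs roughly n short chain walks, as long as the bucket count keeps up with the number of symbols.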


[about your list]
I've seen it on CLAX, now I think I know its purpose...

the listing is used in my assembler, to autogenerate the tables needed
for doing assembly.
[..]
Yes, I wrote my disassembler in a similar way, just swap the
tables to work with another CPU-family...


yes, except I use a single table for everything (all archs, all modes).

that is why certain things are represented with letters, namely so that they
can be handled in a mode-specific way.


however, with some more recent changes now the address-size byte is
inserted automatically in some cases, so it works differently than
the way it is done in the listings (potentially, some esoteric
situations could result in duplicate address bytes, which would be bad).

Oh yes, the '67'-override was always a problem for assemblers,
so not too many work it out the correct way or support it at all.
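
The duplicate-prefix worry can be guarded against in one place, as in this Python sketch (`with_prefixes` and its parameters are invented names; it only covers 16/32-bit code, where `67` flips the address size and `66` the operand size). Prefixes written explicitly and prefixes inserted automatically are merged, and each one is emitted at most once.

```python
ADDR_SIZE = 0x67   # address-size override prefix
OPER_SIZE = 0x66   # operand-size override prefix

def with_prefixes(explicit, mode_bits, addr_bits, oper_bits, body: bytes) -> bytes:
    """Prepend size prefixes to `body`, emitting each prefix at most once.

    `explicit` holds prefixes written in the source (e.g. from a16/a32);
    the rest are added when the requested size differs from the mode.
    Only 16- and 32-bit modes are considered in this sketch.
    """
    wanted = list(explicit)
    if addr_bits != mode_bits:
        wanted.append(ADDR_SIZE)
    if oper_bits != mode_bits:
        wanted.append(OPER_SIZE)
    prefixes = []
    for p in wanted:
        if p not in prefixes:   # drop duplicates, keep first-seen order
            prefixes.append(p)
    return bytes(prefixes) + body
```

In 32-bit code, `mov eax, [bx+si]` with an explicit a16 then yields `67 8B 00` rather than `67 67 8B 00`.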

Have you planned to allow mixed code assembly?
ie: use16(32,64)
Here you'll need all allowed mix of prefix bytes available.


mixed 16/32 bit code should work, at least in theory (beat against this
recently, but I am unsure as to whether or not I will ever have much reason
to use this).

past this, I am much less certain (mixing 64-bit long-mode code with 16 or
32 bit code, could be horrible, dunno what the hell the CPU does here).

of course, it is always possible to simply tell the assembler to use a
different size:

section .text

bits 32 (or .a32)
....

bits 64
....

but this is different...


[..]
at present the assembler should do something sensible (and in these
particular cases, the a16 or a32 prefix is simply redundant).
may clean this up eventually...

I'd keep it alive, just in case...


yeah.
this prefix was borrowed from nasm anyways.


__
wolfgang




