Re: COM/EXE header problem



opexoc@xxxxxxxxx wrote:
Hi. I was wondering what exactly is different between COM and EXE files. I
know that a COM file is a raw piece of data, but EXE files contain
a header, for example.

Yup! That's the difference.

And that's where my problem begins...
I wrote a simple piece of assembler code:

start:
mov ax,0x4c00
int 0x21
string db "helllo"

I assembled it to an .obj file and then linked it into an EXE file.

And... the linker didn't complain??? I would expect a linker to warn about "no stack" and "no entrypoint". It will probably link the thing into a valid(?) executable, but it really isn't right. Some assemblers let you use any label you want for an entrypoint, and the "end start" directive tells the assembler which label it is. Nasm uses the special symbol "..start" to indicate the entrypoint. (This information is passed to the linker in the .obj file, and the linker puts it in the .exe header.) You want exactly one entrypoint per program.

You probably want to specify a stack, also. Nasm doesn't care what you call your segments in "-f obj" format (most other output formats know ".text", ".data" and ".bss" - sometimes others). Usually, you'll see "segment stack stack". The first "stack" is just a name - could be "segment frank stack" just as easily - but the second "stack" is a segment "attribute", and has to be there.

segment code
..start:
mov ax, data
mov ds, ax
; you may also want to do "mov es, ax" here.

; curiously, the example in the Nasm manual
; shows loading ss and sp here, too. You *don't*
; need to do this - dos takes care of it!

mov ax, 4C00h
int 21h

segment data
msg db "hello"

segment stack stack
resw 64

Assemble that with "nasm -f obj myfile.asm", and it'll produce "myfile.obj", which you can link with... whatever you're using... the command line will differ, depending on which.

There's not much point in using a linkable format if you've only got one module, though. With more than one, you can do:

;---------------------
; t1.asm
; nasm -f obj t1.asm

extern sayhi

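segment code
..start:
mov ax, data
mov ds, ax
call sayhi
mov ax, 4C00h
int 21h

; "data" has to be declared in this module too, or Nasm
; won't know the name here - the linker combines it with
; the "data" segment in t2.asm
segment data

segment stack stack
resw 64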
;----------------------

;----------------------
; t2.asm
; nasm -f obj t2.asm

global sayhi

segment code
sayhi:
mov ah, 9
mov dx, msg
int 21h
ret

segment data
msg db "hello$"
;----------------------

Now you can link them together with:

link t1.obj t2.obj

or similar, depending on linker. *That's* the point of a linkable object format. You can also do "link myfile.obj somelib.lib".

Then I used a hex viewer to look at what is in this file...
and I see that it begins with the letters "MZ", and after that only two
or three bytes are nonzero up to address 200h. My code
begins there (at 200h). So I have some questions.

Why is this header so empty (so many zero bytes)?

In a more complicated file, more of the bytes would be used.

Does it always happen ?

Depends, to an extent, on the linker, and what you tell it to do. The minimal header is 28 bytes, I think. Most linkers add more... yours apparently pads the header out to 200h (512) bytes, which is why your code starts there.

What is the purpose of this header?

Tells the loader what kind of executable format it is (no "MZ" sig in a .com file, "MZ" in a dos .exe, "MZ" and "PE" in a Windows executable), where the entrypoint is, how much memory it needs, how much stack... And there's a list of addresses that need to be "relocated" (the loader adds the segment it actually loaded the program at to the word at each one).
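
To make those fields concrete, here is a minimal sketch of an MZ header written out by hand and assembled with "-f bin", so no linker is involved. The file name, the label names, the 32-byte header size and the memory/stack values are all choices made for this illustration, not what any particular linker produces; check the field layout against an MZ header reference before relying on it.

;---------------------
; tiny.asm  (hypothetical name)
; nasm -f bin tiny.asm -o tiny.exe

bits 16
org 0

header:
db "MZ"                              ; signature the loader checks
dw (file_end - header) % 512         ; bytes used in the last 512-byte page
dw (file_end - header + 511) / 512   ; file size in 512-byte pages
dw 0                                 ; number of relocation entries (none)
dw (image - header) / 16             ; header size in 16-byte paragraphs
dw 16                                ; minimum extra memory, in paragraphs
dw 0FFFFh                            ; maximum extra memory, in paragraphs
dw 0                                 ; initial ss, relative to load segment
dw 100h                              ; initial sp
dw 0                                 ; checksum (left at 0 here)
dw entry - image                     ; initial ip
dw 0                                 ; initial cs, relative to load segment
dw 1Ch                               ; file offset of the relocation table
dw 0                                 ; overlay number
times 32 - ($ - header) db 0         ; pad the header to a paragraph boundary

image:                               ; only this part is copied into memory
entry:
mov ax, 4C00h
int 21h
file_end:
;----------------------

The fixed part is the 28 bytes up to the overlay number. A linker that pads the header to 200h would just leave the rest as zeros, which fits what the hex viewer shows for a program this small.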

Curiously, the dos loader doesn't care if the *name* is .com or .exe - it goes *entirely* by the "MZ" signature (or not). "command.com" has been an .exe for ages - they still call it .com for "historical reasons".

Is this header loaded into memory ?

Well... yes and no. The loader needs to read it into memory, but it doesn't become part of your program. So for practical purposes, "no".

- there is something quite weird here,
because when I use debug.exe it looks like only the raw code is loaded
into memory: this "pure" code begins at offset 0000.

That sounds right.

But what does a COM file look like?

Much simpler - it's exactly the code you write, nothing more.

When I use a hex viewer I can see only raw code,
but when I use debug.exe I see that this code is loaded at
offset 100h.
So I have some questions.

Why does this code have 100h offset ?

The loader loads it at an offset of 100h into the segment it chooses. It sets cs, ds, es, ss to this segment. (so you don't need to - can't - do the "mov ax, data"/"mov ds, ax" bit). We need to inform Nasm that our code will be loaded at 100h, so start a .com file with "org 100h".
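
A minimal sketch of what that looks like, assuming Nasm's flat binary output ("-f bin"); the file name is made up for illustration:

;---------------------
; hello.asm  (hypothetical name)
; nasm -f bin hello.asm -o hello.com

org 100h            ; the loader puts us at offset 100h in the segment

mov ah, 9           ; dos "print $-terminated string"
mov dx, msg         ; ds already equals cs, so no "mov ds, ax" needed
int 21h
mov ax, 4C00h       ; exit with return code 0
int 21h

msg db "hello$"
;----------------------

No segments, no "..start", no stack declaration: execution simply begins at the first byte of the file.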

- I figured out that these 100h
bytes are taken up by some kind of header
for COM files, which appears when the program is loaded into memory.

Right. It's called the "Program Segment Prefix" or "PSP". There's some information in there that might be of interest... later. An .exe has a PSP, too. When dos loads an MZ .exe, ds and es are pointed to the PSP - that's why you have to load ds with your data segment in an .exe.
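
If you do want to get at the PSP from an .exe, one way (a sketch only, with a made-up variable name "psp_seg") is to copy ds somewhere before you overwrite it:

;---------------------
; psp.asm  (hypothetical name)
; nasm -f obj psp.asm

segment code
..start:
mov bx, ds          ; on entry, ds (and es) = PSP segment
mov ax, data
mov ds, ax          ; now ds = our data segment
mov [psp_seg], bx   ; keep the PSP segment for later
; es still points at the PSP here, since we haven't touched it
mov ax, 4C00h
int 21h

segment data
psp_seg dw 0

segment stack stack
resw 64
;----------------------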

Can you help me to understand this (I think my main problem is telling
the header in the file apart from the header in memory)? I have read
some material about it, but I still can't understand. I will be very glad
if someone can give me some insight into this.

There's info on the MZ header on Ben Lunt's site (Hi Ben!):

http://www.frontiernet.net/~fys/exehdr.htm

Poke around the rest of that site, too - lots of good info!

Donkey has given you some links to info about the "PE" header (which starts off like an "MZ" header). You might want to study that instead. You don't want to spend *too* much time learning dos. It's "dead", they say. (of course, they say assembly language is "dead", too, and we know it's not so! :)

Best,
Frank