Re: COM/EXE header problem
- From: Frank Kotler <fbkotler@xxxxxxxxxxx>
- Date: Sun, 02 Jul 2006 23:22:02 -0400
opexoc@xxxxxxxxx wrote:
Hi. I was wondering what is exactly different in COM and EXE files. I
know that COM file is some raw piece of data, but EXE files contain
some header for example.
Yup! That's the difference.
And there begins some problem for me...
I create some simple piece of assembler code :
start:
mov ax,0x4c00
int 0x21
string db "helllo"
I compiled it as obj. file and then linked it to the EXE file.
And... the linker didn't complain??? I would expect a linker to warn about "no stack" and "no entrypoint". It will probably link the thing into a valid(?) executable, but it really isn't right. Some assemblers let you use any label you want for an entrypoint, and the "end start" directive tells which label it is. Nasm uses the special symbol "..start" to indicate the entrypoint. (This information is passed to the linker in the .obj file, and the linker puts it in the .exe header.) You want exactly one entrypoint per program.
You probably want to specify a stack, also. Nasm doesn't care what you call your segments in "-f obj" format (most other output formats know ".text", ".data" and ".bss" - sometimes others). Usually, you'll see "segment stack stack". The first "stack" is just a name - could be "segment frank stack" just as easily - but the second "stack" is a segment "attribute", and has to be there.
segment code
..start:
mov ax, data
mov ds, ax
; you may also want to do "mov es, ax" here.
; curiously, the example in the Nasm manual
; shows loading ss and sp here, too. You *don't*
; need to do this - dos takes care of it!
mov ax, 4C00h
int 21h
segment data
msg db "hello"
segment stack stack
resw 64
Assemble that with "nasm -f obj myfile.asm", and it'll produce "myfile.obj", which you can link with... whatever you're using... the command line will differ, depending on which.
It's pointless to use a linkable format if you've only got one module, though. With more than one, you can do:
;---------------------
; t1.asm
; nasm -f obj t1.asm
extern sayhi
segment code
..start:
mov ax, data
mov ds, ax
call sayhi
mov ax, 4C00h
int 21h
segment data
; empty here - declared so Nasm knows the name "data" in this module
segment stack stack
resw 64
;----------------------
;----------------------
; t2.asm
; nasm -f obj t2.asm
global sayhi
segment code
sayhi:
mov ah, 9
mov dx, msg
int 21h
ret
segment data
msg db "hello$"
;----------------------
Now you can link them together with:
link t1.obj t2.obj
or similar, depending on linker. *That's* the point of a linkable object format. You can also do "link myfile.obj somelib.lib".
and I used some hex view program to watch what is in this file...
and I see that on begin of this files are "MZ" letters and nextly two
or three bytes which are not equal to zero up to 200h address. There is
begun my code (on 200h). So I have a questions.
Why this header is so empty ( some many zero bytes ) ?
In a more complicated file, more of the bytes would be used.
Does it always happen ?
Depends, to an extent, on the linker, and what you tell it to do. The minimal header is 28 bytes, I think. Most linkers add more... yours evidently pads it out to 512 bytes, which is why your code starts at 200h.
What is a purpose of this header ?
Tells the loader what kind of executable format it is (no "MZ" sig in a .com file, "MZ" in a dos .exe, "MZ" and "PE" in a Windows executable), where the entrypoint is, how much memory it needs, how much stack... And there's a list of addresses that need to be "relocated" (the segment where the program actually got loaded is added to each one).
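If it helps to see those fields laid out, here's a rough sketch of the fixed part of the MZ header as a Nasm "struc". The name "mzhdr" and the field names are just my own labels for illustration, not anything official. Check the offsets against a proper reference before relying on them.

```nasm
; mzhdr.inc - layout of the fixed part of a dos MZ .exe header
; (names are made up for illustration; all fields are 16-bit words)
struc mzhdr
    .signature  resw 1  ; 00h  "MZ"
    .lastpage   resw 1  ; 02h  bytes used in the last 512-byte page
    .pages      resw 1  ; 04h  number of 512-byte pages in the file
    .nreloc     resw 1  ; 06h  number of relocation entries
    .hdrparas   resw 1  ; 08h  header size in 16-byte paragraphs
    .minalloc   resw 1  ; 0Ah  minimum extra paragraphs needed
    .maxalloc   resw 1  ; 0Ch  maximum extra paragraphs wanted
    .ss         resw 1  ; 0Eh  initial ss (relative to load segment)
    .sp         resw 1  ; 10h  initial sp
    .checksum   resw 1  ; 12h  checksum (usually ignored)
    .ip         resw 1  ; 14h  initial ip - the entrypoint
    .cs         resw 1  ; 16h  initial cs (relative to load segment)
    .reloctab   resw 1  ; 18h  file offset of the relocation table
    .overlay    resw 1  ; 1Ah  overlay number
endstruc                ; mzhdr_size = 28 (1Ch)
```

That's 14 words, or 28 bytes. If your code starts at 200h in the file, I'd expect the word at offset 08h to be 20h (32 paragraphs * 16 = 512 = 200h), but look in your hex viewer to be sure.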
Curiously, the dos loader doesn't care if the *name* is .com or .exe - it goes *entirely* by the "MZ" signature (or not). "command.com" has been an .exe for ages - they still call it .com for "historical reasons".
Is this header loaded into memory ?
Well... yes and no. The loader needs to read it into memory, but it doesn't become part of your program. So for practical purposes, "no".
- there is a quite weird thing
because when I use debug.exe it looks like that only raw code is loaded
into memory because this "pure" code begins on 0000 offset.
That sounds right.
but, how does matter of COM file look like ?
Much simpler - it's exactly the code you write, nothing more.
when I use hex view program I can see only raw code.
but when I attempt to use debug.exe I see that this code is loaded into
100h offset.
So there are some questions.
Why does this code have 100h offset ?
The loader loads it at an offset of 100h into the segment it chooses. It sets cs, ds, es, ss to this segment. (so you don't need to - can't - do the "mov ax, data"/"mov ds, ax" bit). We need to inform Nasm that our code will be loaded at 100h, so start a .com file with "org 100h".
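For comparison, a minimal .com version might look something like this. It's only a sketch, and the filename "hello.asm" is just my choice for illustration. Note there's no linker step: "-f bin" writes the raw bytes directly.

```nasm
; hello.asm - a minimal .com program
; nasm -f bin hello.asm -o hello.com
org 100h                ; dos loads us at offset 100h, right after the PSP

    mov ah, 9           ; dos "print string" - wants a '$'-terminated string at ds:dx
    mov dx, msg         ; ds already points to our one-and-only segment
    int 21h
    mov ax, 4C00h       ; exit to dos, return code 0
    int 21h

msg db "hello$"
```

Without the "org 100h", Nasm would calculate the address of "msg" as if the file started at offset 0, and you'd print garbage out of the PSP instead.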
- I figured out that this 100h
bytes is taken by some header
of COM files, which appears when the program is loaded into memory.
Right. It's called the "Program Segment Prefix" or "PSP". There's some information in there that might be of interest... later. An .exe has a PSP, too. When dos loads an MZ .exe, ds and es are pointed to the PSP - that's why you have to load ds with your data segment in an .exe.
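As one example of something useful in the PSP: the command line "tail" (whatever you typed after the program name) lives at offset 80h, a length byte followed by the text and a carriage return. A rough sketch of a .com file that prints it back, assuming the tail doesn't itself contain a '$' (the name "tail.asm" is made up for illustration):

```nasm
; tail.asm - print the command tail from the PSP
; nasm -f bin tail.asm -o tail.com
org 100h

    mov bl, [80h]               ; PSP offset 80h: length of the command tail
    xor bh, bh                  ; bx = length
    mov byte [81h + bx], '$'    ; overwrite the trailing CR so int 21h/9 knows where to stop
    mov ah, 9                   ; dos "print string"
    mov dx, 81h                 ; the tail text starts at PSP offset 81h
    int 21h
    mov ax, 4C00h               ; exit to dos
    int 21h
```

This works as-is in a .com file because ds already points at the PSP. In an .exe, you'd have to read it before loading ds with your data segment (or use es).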
Can you help me to understand this ( i think my main problem is to
differ the header in file and header in memory ) ? I readed some piece
of papers about it, but still I can't understand. I will be very glad
if some can get insight to this.
There's info on the MZ header on Ben Lunt's site (Hi Ben!):
http://www.frontiernet.net/~fys/exehdr.htm
Poke around the rest of that site, too - lots of good info!
Donkey has given you some links to info about the "PE" header (which starts off like an "MZ" header). You might want to study that instead. You don't want to spend *too* much time learning dos. It's "dead", they say. (of course, they say assembly language is "dead", too, and we know it's not so! :)
Best,
Frank