Re: COM/EXE header problem
- From: opexoc@xxxxxxxxx
- Date: 3 Jul 2006 02:55:09 -0700
Frank Kotler wrote:
opexoc@xxxxxxxxx wrote:
Hi. I was wondering what exactly the difference is between COM and EXE
files. I know that a COM file is just a raw piece of data, but EXE files
contain a header, for example.
Yup! That's the difference.
And that is where my problem begins...
I wrote a simple piece of assembler code:
start:
mov ax,0x4c00
int 0x21
string db "hello"
I assembled it into an .obj file and then linked that into an EXE file.
And... the linker didn't complain??? I would expect a linker to warn
about "no stack" and "no entrypoint". Probably will link the thing into
a valid(?) executable, but it really isn't right. Some assemblers use
any label you want for an entrypoint, and the "end start" directive
tells which label is the entrypoint. Nasm uses the special symbol
"..start" to indicate the entrypoint. (this information is passed to the
linker in the .obj header, which puts it in that .exe header) You want
exactly one entrypoint per program.
You probably want to specify a stack, also. Nasm doesn't care what you
call your segments, in "-f obj" format (most other output formats know
".text", ".data" and ".bss" - sometimes others). Usually, you'll see
"segment stack stack". The first "stack" is just a name - could be
"segment frank stack" just as easily - but the second "stack" is a
segment "attribute", and has to be there.
segment code
..start:
mov ax, data
mov ds, ax
; you may also want to do "mov es, ax" here.
; curiously, the example in the Nasm manual
; shows loading ss and sp here, too. You *don't*
; need to do this - dos takes care of it!
mov ax, 4C00h
int 21h
segment data
msg db "hello"
segment stack stack
resw 64
Assemble that with "nasm -f obj myfile.asm", and it'll produce
"myfile.obj", which you can link with... whatever you're using... the
command line will differ, depending on which.
But it's pointless to use a linkable format if you've only got one
module. But you can do:
;---------------------
; t1.asm
; nasm -f obj t1.asm
extern sayhi
segment code
..start:
mov ax, data
mov ds, ax
call sayhi
mov ax, 4C00h
int 21h
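segment data
; empty in this module, but it must be declared here so that
; "mov ax, data" resolves; the linker combines it with the
; data segment in t2.asm
segment stack stack
resw 64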
;----------------------
;----------------------
; t2.asm
; nasm -f obj t2.asm
global sayhi
segment code
sayhi:
mov ah, 9
mov dx, msg
int 21h
ret
segment data
msg db "hello$"
;----------------------
Now you can link them together with:
link t1.obj t2.obj
or similar, depending on linker. *That's* the point of a linkable object
format. You can also do "link myfile.obj somelib.lib".
Then I used a hex viewer to look at what is in this file...
and I see that the file begins with the letters "MZ", followed by only two
or three non-zero bytes up to address 200h. My code begins there
(at 200h). So I have some questions.
Why is this header so empty (so many zero bytes)?
In a more complicated file, more of the bytes would be used.
Does this always happen?
Depends, to an extent, on the linker, and what you tell it to do. The
minimal header is 28 bytes, I think. Most linkers add more...
What is the purpose of this header?
Tells the loader what kind of executable format it is (no "MZ" sig in a
.com file, "MZ" in a dos .exe. "MZ" and "PE" in a Windows executable),
where the entrypoint is, how much memory it needs, how much stack... And
there's a list of addresses that need to be "relocated" (the segment the
program was loaded at is added to each one).
Curiously, the dos loader doesn't care if the *name* is .com or .exe -
it goes *entirely* by the "MZ" signature (or not). "command.com" has
been an .exe for ages - they still call it .com for "historical reasons".
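To make the header fields concrete, here is a minimal Python sketch that reads the 28-byte MZ header and works out where the code starts in the file. The e_* field names follow the usual convention for this header and are only labels for the sketch; the sample bytes are a made-up header resembling the one described above (code at 200h), not a dump of a real file.

```python
import struct

# Field names follow the common e_* convention; they are labels for this
# sketch, not something defined in this thread.
MZ_FIELDS = (
    "e_magic", "e_cblp", "e_cp", "e_crlc", "e_cparhdr", "e_minalloc",
    "e_maxalloc", "e_ss", "e_sp", "e_csum", "e_ip", "e_cs",
    "e_lfarlc", "e_ovno",
)
MZ_FORMAT = "<2s13H"  # 2-byte signature + 13 little-endian words = 28 bytes


def parse_mz_header(data):
    """Return the 28-byte MZ header as a dict, or None for a non-MZ file."""
    if len(data) < struct.calcsize(MZ_FORMAT) or data[:2] != b"MZ":
        return None  # no "MZ" signature: the loader treats it as a .com image
    return dict(zip(MZ_FIELDS, struct.unpack_from(MZ_FORMAT, data)))


def code_offset_in_file(header):
    """Header size in bytes: e_cparhdr counts 16-byte paragraphs."""
    return header["e_cparhdr"] * 16


# Hypothetical file: header padded with zeros to 200h, then 5 bytes of code.
code = b"\xb8\x00\x4c\xcd\x21"  # mov ax,4C00h / int 21h
sample = struct.pack(MZ_FORMAT, b"MZ", 5, 2, 0, 0x20, 0, 0xFFFF,
                     0, 0x80, 0, 0, 0, 0x1C, 0) + bytes(0x200 - 28) + code
hdr = parse_mz_header(sample)
print(hex(code_offset_in_file(hdr)))  # 0x200
print(parse_mz_header(code))          # None
```

With a header of 20h paragraphs the code lands at file offset 200h, which is why everything between the 28 used bytes and 200h is zero padding.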
Is this header loaded into memory?
Well... yes and no. The loader needs to read it into memory, but it
doesn't become part of your program. So for practical purposes, "no".
- there is one quite weird thing:
when I use debug.exe it looks as if only the raw code is loaded
into memory, because this "pure" code begins at offset 0000.
That sounds right.
But what does a COM file look like?
Much simpler - it's exactly the code you write, nothing more.
When I use a hex viewer I can see only raw code,
but when I use debug.exe I see that this code is loaded at
offset 100h.
So I have some questions.
Why is this code at offset 100h?
The loader loads it at an offset of 100h into the segment it chooses. It
sets cs, ds, es, ss to this segment. (so you don't need to - can't - do
the "mov ax, data"/"mov ds, ax" bit). We need to inform Nasm that our
code will be loaded at 100h, so start a .com file with "org 100h".
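As a rough illustration of what "loaded at an offset of 100h" means, the following Python sketch does the real-mode address arithmetic. The segment value 1234h is invented for the example (the loader picks the real one), and com_label_offset is a hypothetical helper that mimics what "org 100h" tells the assembler.

```python
def linear(segment, offset):
    """Real-mode physical address: segment * 16 + offset."""
    return (segment << 4) + offset


def com_label_offset(file_offset, org=0x100):
    """Offset the assembler must use for the byte at file_offset in a .com."""
    return org + file_offset


seg = 0x1234  # hypothetical segment chosen by the loader
print(hex(linear(seg, 0x0000)))  # 0x12340, start of the segment
print(hex(linear(seg, 0x0100)))  # 0x12440, first byte of the .com file
print(hex(com_label_offset(5)))  # 0x105, a label 5 bytes into the file
```

Without "org 100h" the assembler would compute the last value as 5, and every data reference in the program would be off by 100h.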
- I figured out that these 100h
bytes are taken up by some header
for COM files, which appears when the program is loaded into memory.
Right. It's called the "Program Segment Prefix" or "PSP". There's some
information in there that might be of interest... later. An .exe has a
PSP, too. When dos loads an MZ .exe, ds and es are pointed to the PSP -
that's why you have to load ds with your data segment in an .exe.
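A small Python sketch of how the initial registers of an MZ .exe relate to the PSP, assuming the usual case where the program image is loaded directly after the 100h-byte PSP. The function name, the segment 1000h and the header values are all made up for illustration; e_cs, e_ip, e_ss and e_sp stand for the corresponding fields of the MZ header.

```python
PSP_PARAGRAPHS = 0x10  # the PSP is 100h bytes = 10h paragraphs of 16 bytes


def exe_initial_registers(psp_seg, e_cs, e_ip, e_ss, e_sp):
    """Registers at program entry, assuming the image follows the PSP."""
    start_seg = psp_seg + PSP_PARAGRAPHS  # first segment of the loaded image
    return {
        "ds": psp_seg,  # ds and es point at the PSP, not at your data
        "es": psp_seg,
        "cs": (start_seg + e_cs) & 0xFFFF,
        "ip": e_ip,
        "ss": (start_seg + e_ss) & 0xFFFF,
        "sp": e_sp,
    }


# Hypothetical values: PSP at 1000h, code first, stack one paragraph later.
regs = exe_initial_registers(0x1000, e_cs=0, e_ip=0, e_ss=1, e_sp=0x80)
print({name: hex(value) for name, value in regs.items()})
```

In this sketch cs is 10h paragraphs above the PSP segment, so the code shows up at offset 0000 in its own segment while the PSP sits at offset 0000 of the segment in ds and es.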
Can you help me to understand this (I think my main problem is telling
the header in the file apart from the header in memory)? I have read a few
papers about it, but I still can't understand. I would be very glad
if someone could give me some insight into this.
There's info on the MZ header on Ben Lunt's site (Hi Ben!):
http://www.frontiernet.net/~fys/exehdr.htm
Poke around the rest of that site, too - lots of good info!
Donkey has given you some links to info about the "PE" header (which
starts off like an "MZ" header). You might want to study that instead.
You don't want to spend *too* much time learning dos. It's "dead", they
say. (of course, they say assembly language is "dead", too, and we know
it's not so! :)
Best,
Frank
I am very glad that there are still people who want to help others. I
think I understand this quite well now (the main principles), but I
am still wondering about the PSP. It looks like the PSP occupies a
different segment than the "pure" code of the program, and the only
information about where it is comes from the DS and ES registers, which
are set just after the code has been loaded into memory. Am I right?
(If so, I think the PSP is at offset 0000 relative to its segment.)