Re: COM/EXE header problem




Frank Kotler wrote:
opexoc@xxxxxxxxx wrote:
Hi. I was wondering what exactly the difference is between COM and EXE
files. I know that a COM file is just a raw piece of data, but EXE files
contain a header, for example.

Yup! That's the difference.

And that is where my problem begins...
I wrote a simple piece of assembler code:

start:
mov ax,0x4c00
int 0x21
string db "helllo"

I assembled it to an .obj file and then linked it into an EXE file.

And... the linker didn't complain??? I would expect a linker to warn
about "no stack" and "no entrypoint". It will probably link the thing
into a valid(?) executable, but it really isn't right. Some assemblers
let you use any label you want for an entrypoint, and the "end start"
directive tells which label is the entrypoint. Nasm uses the special
symbol "..start" to indicate the entrypoint. (This information is passed
to the linker in the .obj file, and the linker puts it in the .exe
header.) You want exactly one entrypoint per program.

You probably want to specify a stack, also. In "-f obj" format, Nasm
doesn't care what you call your segments (most other output formats know
".text", ".data" and ".bss", and sometimes others). Usually, you'll see
"segment stack stack". The first "stack" is just a name - could be
"segment frank stack" just as easily - but the second "stack" is a
segment "attribute", and has to be there.

segment code
..start:
mov ax, data
mov ds, ax
; you may also want to do "mov es, ax" here.

; curiously, the example in the Nasm manual
; shows loading ss and sp here, too. You *don't*
; need to do this - dos takes care of it!

mov ax, 4C00h
int 21h

segment data
msg db "hello"

segment stack stack
resw 64

Assemble that with "nasm -f obj myfile.asm", and it'll produce
"myfile.obj", which you can link with... whatever you're using... the
command line will differ, depending on which.

It's pretty pointless to use a linkable format if you've only got one
module, though. With more than one, you can do:

;---------------------
; t1.asm
; nasm -f obj t1.asm

extern sayhi

segment code
..start:
mov ax, data
mov ds, ax
call sayhi
mov ax, 4C00h
int 21h

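segment data
; empty in this module, but declared so that "data" is a known
; symbol here; the linker combines it with "data" in t2.asm

segment stack stack
resw 64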
;----------------------

;----------------------
; t2.asm
; nasm -f obj t2.asm

global sayhi

segment code
sayhi:
mov ah, 9
mov dx, msg
int 21h
ret

segment data
msg db "hello$"
;----------------------

Now you can link them together with:

link t1.obj t2.obj

or similar, depending on linker. *That's* the point of a linkable object
format. You can also do "link myfile.obj somelib.lib".

Then I used a hex viewer to look at what is in this file... and I see
that the file begins with the letters "MZ", followed by only two or
three nonzero bytes up to address 200h. My code begins there (at 200h).
So I have some questions.

Why is this header so empty (so many zero bytes)?

In a more complicated file, more of the bytes would be used.

Does this always happen?

Depends, to an extent, on the linker, and what you tell it to do. The
minimal header is 28 bytes, I think. Most linkers add more...

What is the purpose of this header?

It tells the loader what kind of executable format it is (no "MZ" sig in
a .com file, "MZ" in a dos .exe, "MZ" and "PE" in a Windows executable),
where the entrypoint is, how much memory it needs, how much stack... And
there's a list of addresses that need to be "relocated" (some amount is
added to each one).
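
As a rough sketch of what sits in that header, here is the fixed 28-byte
part laid out as a Nasm "struc". The field names are my own labels for
illustration (they're not from any official include file); the offsets
follow the commonly documented MZ layout.

```nasm
; mzhdr.inc - illustrative layout only, field names are assumptions
struc MZHDR
    .signature  resw 1  ; 00h  "MZ" (4Dh, 5Ah)
    .lastpage   resw 1  ; 02h  bytes used in the last 512-byte page
    .pages      resw 1  ; 04h  number of 512-byte pages in the file
    .nreloc     resw 1  ; 06h  number of relocation entries
    .hdrparas   resw 1  ; 08h  header size in 16-byte paragraphs
    .minalloc   resw 1  ; 0Ah  minimum extra paragraphs needed
    .maxalloc   resw 1  ; 0Ch  maximum extra paragraphs wanted
    .init_ss    resw 1  ; 0Eh  initial ss, relative to the load segment
    .init_sp    resw 1  ; 10h  initial sp
    .checksum   resw 1  ; 12h  checksum (usually ignored)
    .init_ip    resw 1  ; 14h  initial ip (the entrypoint offset)
    .init_cs    resw 1  ; 16h  initial cs, relative to the load segment
    .reloctab   resw 1  ; 18h  file offset of the relocation table
    .overlay    resw 1  ; 1Ah  overlay number
endstruc                ; MZHDR_size is 1Ch = 28 bytes
```

The header size field counts 16-byte paragraphs, so a value of 20h there
would put the start of the code at 20h * 16 = 200h in the file, which
matches what the hex viewer showed.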

Curiously, the dos loader doesn't care if the *name* is .com or .exe -
it goes *entirely* by the "MZ" signature (or not). "command.com" has
been an .exe for ages - they still call it .com for "historical reasons".

Is this header loaded into memory?

Well... yes and no. The loader needs to read it into memory, but it
doesn't become part of your program. So for practical purposes, "no".

- there is something quite weird here,
because when I use debug.exe it looks as if only the raw code is loaded
into memory: this "pure" code begins at offset 0000.

That sounds right.

But what does a COM file look like?

Much simpler - it's exactly the code you write, nothing more.

When I use a hex viewer I can see only raw code,
but when I use debug.exe I see that this code is loaded at
offset 100h.
So I have some questions.

Why is this code at offset 100h?

The loader loads it at an offset of 100h into the segment it chooses. It
sets cs, ds, es and ss to this segment (so you don't need to - can't - do
the "mov ax, data"/"mov ds, ax" bit). We need to inform Nasm that our
code will be loaded at 100h, so start a .com file with "org 100h".
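
A minimal sketch of a complete .com program, assuming Nasm's flat binary
output ("-f bin"); the file name hello.asm is just an example:

```nasm
; hello.asm
; nasm -f bin hello.asm -o hello.com

org 100h            ; the loader puts us at offset 100h, after the PSP

    mov ah, 9       ; dos function 09h: print '$'-terminated string
    mov dx, msg     ; ds already equals cs, so no segment setup needed
    int 21h

    mov ax, 4C00h   ; dos function 4Ch: exit with return code 0
    int 21h

msg db "hello$"
```

No segments, no stack declaration, no "..start" - execution simply begins
at the first byte of the file.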

- I figured out that these 100h
bytes are taken up by some kind of header
for COM files, which appears when the program is loaded into memory.

Right. It's called the "Program Segment Prefix" or "PSP". There's some
information in there that might be of interest... later. An .exe has a
PSP, too. When dos loads an MZ .exe, ds and es are pointed to the PSP -
that's why you have to load ds with your data segment in an .exe.
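
As a small illustration, here is a sketch of an .exe that loads ds with
its own data segment but leaves es alone, so es still points to the PSP.
It reads the byte at offset 80h of the PSP, which holds the length of the
command tail, and hands it back as the return code. The file name psp.asm
and the label "taillen" are just examples.

```nasm
; psp.asm
; nasm -f obj psp.asm

segment code
..start:
    mov ax, data
    mov ds, ax          ; ds -> our data; es still -> the PSP
    mov al, [es:80h]    ; PSP offset 80h: length of the command tail
    mov [taillen], al   ; keep a copy in our own data segment
    mov ah, 4Ch         ; exit, with the tail length as return code
    int 21h

segment data
taillen db 0

segment stack stack
    resw 64
```

If you need the PSP later and want to reuse es for something else, save
its value somewhere first.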

Can you help me to understand this? (I think my main problem is telling
the header in the file apart from the header in memory.) I have read some
papers about it, but I still can't understand. I would be very glad
if someone could give me some insight into this.

There's info on the MZ header on Ben Lunt's site (Hi Ben!):

http://www.frontiernet.net/~fys/exehdr.htm

Poke around the rest of that site, too - lots of good info!

Donkey has given you some links to info about the "PE" header (which
starts off like an "MZ" header). You might want to study that instead.
You don't want to spend *too* much time learning dos. It's "dead", they
say. (of course, they say assembly language is "dead", too, and we know
it's not so! :)

Best,
Frank

I am very glad that there are still people who want to help others. I
think that I understand this quite well now (the main principles), but I
am still wondering about this PSP. It looks as if the PSP occupies a
different segment than the "pure" code of the program, and the only
information about where it is comes from the DS and ES registers, which
are set just after the code is loaded into memory. Am I right? (If so, I
think that this PSP is at offset 0000 relative to its segment.)
