Re: Buffers in Assembly (NASM)
- From: Frank Kotler <spamtrap@xxxxxxxxxx>
- Date: Sat, 19 Jul 2008 15:28:21 GMT
bwaichu@xxxxxxxxx wrote:
> I'm trying to better understand data structures in assembly. I know I
> can create a zero filled buffer in the bss section in NASM with this:
> buffer: times 64 db 0
Hi Brian,
Well... In -f obj (OMF) output format, Nasm will tolerate this - since it doesn't know what ".bss" means. In (all?) other formats, this will generate a warning (64 of 'em, actually) "attempt to initialize memory in a nobits section: ignored". In an uninitialized (.bss) section, there's "nothing there", so it would be "conceptually impossible" for Nasm to zero it. That would be the right way to do it in an initialized (.data) section.
In executable formats *other* than dos .com files, the loader - not Nasm - zeros the .bss section. Some "standard"... the "Intel ABI" I think, requires it... I think... I'm not sure whether we're "supposed to" count on this or not. Seems to me like an "error" to ASSume anything about memory that we haven't explicitly initialized.
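For comparison, C makes this promise at the language level: an object with static storage duration and no initializer starts out as zero, which is why a compiler can put it in .bss and lean on the loader (or startup code) to clear it. A minimal sketch of checking that; the names `scratch`, `bss_is_zero` and `check_bss` are made up for illustration, and the ".bss" placement is an assumption about a typical toolchain, not something the C standard states:

```c
#include <assert.h>
#include <stddef.h>

/* No initializer: a typical toolchain places this in .bss (assumption).
   The C standard only promises that it starts out as all zeros. */
static unsigned char scratch[64];

/* Returns 1 if every byte of scratch is zero, 0 otherwise. */
int bss_is_zero(void)
{
    size_t i;

    for (i = 0; i < sizeof scratch; i++) {
        if (scratch[i] != 0)
            return 0;
    }
    return 1;
}

/* Aborts the program if the buffer is not zero-filled. */
void check_bss(void)
{
    assert(bss_is_zero());
}
```

None of this applies to hand-written assembly, where there is no startup code unless you link some in, so explicitly initializing whatever you depend on remains the cautious choice.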
In a dos .com file, the .bss section is truly uninitialized - whatever "garbage" was there, stays there. We can demonstrate this:
org 100h
section .bss
answer resb 1
section .text
cmp byte [answer], 42
jz printyes
mov al, 'N'
int 29h
jmp common
printyes:
mov al, 'Y'
int 29h
common:
mov byte [answer], 42
ret
The first time you run this, it will (probably) print 'N'. Run it again - without doing anything to mess with memory - and it'll print 'Y'. (Untested, but that's the way I remember it.)
I'm a little unclear what Nasm does with "resb" (and friends) in -f obj output format. It *seems* that if your uninitialized data is collected at the end of your source, it does not become part of the on-disk file. If initialized data follows it, Nasm silently zero-fills it, and it *does* add to the file size. It may depend on the linker, as well.
[if anyone knows/remembers details of the OMF format, the nasm-development team is looking for "verification" of a proposed patch]
> And I know I can create a buffer on the stack like this:
> sub esp, 64
> mov ebx, esp ;save the start point of the buffer
> But how do I zero out the buffer on the stack? In C, I would just do
> something like:
> char buffer[64] = {0};
> What's the equivalent in assembly using NASM?
Call memset? That's the way C does it...
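Spelled out in C, the explicit call gives the same bytes as the initializer. A minimal sketch, not a claim about what any particular compiler emits; `BUFSIZE`, `memset_matches_initializer` and `check_memset` are hypothetical names used only for this illustration:

```c
#include <assert.h>
#include <string.h>

#define BUFSIZE 64   /* same size as the buffer in the question */

/* Returns 1 if a buffer cleared with memset() holds the same bytes as
   one declared with "= {0}", 0 otherwise. */
int memset_matches_initializer(void)
{
    char by_init[BUFSIZE] = {0};   /* the C form from the question */
    char by_memset[BUFSIZE];       /* starts out as stack garbage */

    memset(by_memset, 0, sizeof by_memset);
    return memcmp(by_init, by_memset, BUFSIZE) == 0;
}

/* Aborts the program if the two forms ever disagree. */
void check_memset(void)
{
    assert(memset_matches_initializer());
}
```

From assembly the call takes the same three arguments: destination address, fill byte, and byte count.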
I'd probably do, depending on mood...
BUFSIZ equ 64 ; damn well *better* be a multiple of 4!!!
....
sub esp, BUFSIZ
mov ebx, esp ; you said to...
mov ecx, BUFSIZ - 4
xor eax, eax
.my_memset:
mov [esp + ecx], eax
sub ecx, byte 4
jns .my_memset
....
I'm not saying that's a "good" way to do it - or even right (untested... I should know better...), but I'd probably do something like that... Possibly something involving "rep stosd"...
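The countdown loop can be modelled in C to check the index arithmetic. This is only a sketch of the same idea, with a hypothetical function name `zero_dwords`; it assumes, as the assembly does, that the size is a non-zero multiple of 4:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C model of the countdown loop: store a zero dword at byte offset
   nbytes-4, then nbytes-8, and so on down to offset 0. */
void zero_dwords(uint32_t *base, size_t nbytes)
{
    long off;   /* signed, so it can drop below zero the way ecx does */

    /* Same precondition as the assembly: a non-zero multiple of 4. */
    assert(nbytes >= 4 && nbytes % 4 == 0);

    /* "jns" keeps looping while the offset is still non-negative. */
    for (off = (long)nbytes - 4; off >= 0; off -= 4)
        base[off / 4] = 0;   /* mov [esp + ecx], eax */
}
```

With a 64-byte buffer the first store lands at offset 60 and the last at offset 0; the next subtraction gives -4, which sets the sign flag and ends the loop.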
How would gcc handle it? Depends on version - and switches - I'm sure. Here's *one* way gcc does it:
(suppose I'd better post the source... this is just "junk" that I added the "={0}" to...)
#include <stdio.h>
#include <unistd.h>
int main()
{
    char name[80] = {0};
    int name_len;

    printf ("Please tell me your name? ");
    /* fflush(stdout); */
    name_len = read (0, name, 79);
    name[name_len - 1] = 0;
    /* gets (name); */ /* this *does* flush stdout! */
    /* we know better than to use gets(), right? */
    printf ("Hello, %s! Welcome to Linux Assembly!\n", name);
    return 0;
}
Here's what "objdump -d" thinks of "main":
080483d0 <main>:
80483d0: 8d 4c 24 04 lea 0x4(%esp),%ecx
80483d4: 83 e4 f0 and $0xfffffff0,%esp
80483d7: ff 71 fc pushl -0x4(%ecx)
Hmmm... Align the stack and "re-push" the return address?
80483da: 55 push %ebp
80483db: 89 e5 mov %esp,%ebp
80483dd: 83 ec 68 sub $0x68,%esp
Note that this is more than the 0x50 bytes in our buffer.
80483e0: 89 5d fc mov %ebx,-0x4(%ebp)
Save caller's reg? Instead of "push ebx"?
80483e3: 8d 5d a8 lea -0x58(%ebp),%ebx
Address of our buffer?
80483e6: 89 4d f8 mov %ecx,-0x8(%ebp)
Lemme see... ecx was initial esp + 4... address of "argc"??? WTF???
80483e9: 89 1c 24 mov %ebx,(%esp)
80483ec: c7 44 24 08 50 00 00 movl $0x50,0x8(%esp)
80483f3: 00
80483f4: c7 44 24 04 00 00 00 movl $0x0,0x4(%esp)
80483fb: 00
Here, I believe we're "pushing without push" the parameters for memset onto the stack.
80483fc: e8 c7 fe ff ff call 80482c8 <memset@plt>
... and that's how we zero the buffer...
8048401: c7 04 24 54 85 04 08 movl $0x8048554,(%esp)
8048408: e8 eb fe ff ff call 80482f8 <printf@plt>
"not-push" the (static) address of "please tell me..." and print it
804840d: 89 5c 24 04 mov %ebx,0x4(%esp)
8048411: c7 44 24 08 4f 00 00 movl $0x4f,0x8(%esp)
8048418: 00
8048419: c7 04 24 00 00 00 00 movl $0x0,(%esp)
8048420: e8 c3 fe ff ff call 80482e8 <read@plt>
"not-push" our buffer address (in ebx), length (-1), and "stdin", call read...
8048425: c6 44 05 a7 00 movb $0x0,-0x59(%ebp,%eax,1)
Zero-terminate the string. The buffer starts at -0x58(%ebp), so -0x59(%ebp,%eax,1) is "name[name_len - 1]".
804842a: 89 5c 24 04 mov %ebx,0x4(%esp)
804842e: c7 04 24 70 85 04 08 movl $0x8048570,(%esp)
8048435: e8 be fe ff ff call 80482f8 <printf@plt>
"not-push" our buffer, format string, call printf...
804843a: 8b 4d f8 mov -0x8(%ebp),%ecx
Get that "address of argc" back into ecx...
804843d: 31 c0 xor %eax,%eax
"return 0".
804843f: 8b 5d fc mov -0x4(%ebp),%ebx
Restore caller's reg?
8048442: 89 ec mov %ebp,%esp
Get back our "aligned stack"...
8048444: 5d pop %ebp
Restore caller's reg.
8048445: 8d 61 fc lea -0x4(%ecx),%esp
Restore our "original stack".
8048448: c3 ret
Whew!
Why? 'Cause that's the way the compiler writer wanted it, I guess. Maybe I should have looked at something with *just* that buffer in it...
Best,
Frank