Re: ESI as string iterator
- From: Frank Kotler <fbkotler@xxxxxxxxxxx>
- Date: Sun, 12 Feb 2006 13:03:02 -0500
James Daughtry wrote:
> Can I not push a character pointed to by ESI onto the stack?
Well... no. Not just the one character. But it'll work: the character you want will just have some company.
Pushes come in two sizes, word and dword. Nasm's "push byte" syntax is potentially misleading. It confuses the hell out of non-Nasm users - they think we're butchering the stack royally! Not to worry, there's no instruction that pushes a byte. There *is* a form of "push immediate" which *stores* the operand as a byte, but it's "sign extended" to a word or dword, and that is what gets pushed. Most assemblers, if they see "push 0", will automatically notice that 0 is within signed byte range (-128 thru +127) and generate the short form. Nasm, in its infinite wisdom, defaults to the long form, using four (or two, in "bits 16") bytes to store the operand. "push byte" gives us the short form. This applies *only* to immediate operands. There really isn't a "push byte [esi]" instruction, which is why Nasm rejects it. Nasm's not messin' with ya.
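To make the short/long distinction concrete, here is a minimal C sketch of how an assembler might encode "push immediate" in 32-bit code. The helper names (fits_imm8, sign_extend_imm8, encode_push_imm) are hypothetical. The opcodes are the standard IA-32 encodings: 6A ib is the short form, which stores one byte that the CPU sign-extends, and 68 id is the long form with a 4-byte immediate. Both forms push a full dword.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: does v fit in a signed byte (-128 thru +127)? */
static int fits_imm8(int32_t v)
{
    return v >= -128 && v <= 127;
}

/* What the CPU does with the stored byte: sign-extend it to a dword. */
static int32_t sign_extend_imm8(uint8_t b)
{
    return b < 0x80 ? (int32_t)b : (int32_t)b - 256;
}

/* Hypothetical encoder for "push imm" in 32-bit code. It picks the short
   form whenever the value fits, as "push byte" does in Nasm.
     short form: 6A ib  (2 bytes)
     long form:  68 id  (5 bytes, immediate stored little-endian)
   Writes the bytes into out and returns how many were written. */
static size_t encode_push_imm(int32_t v, uint8_t out[5])
{
    if (fits_imm8(v)) {
        out[0] = 0x6A;
        out[1] = (uint8_t)v;              /* low byte; CPU sign-extends it */
        return 2;
    }
    uint32_t u = (uint32_t)v;
    out[0] = 0x68;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(u >> (8 * i));  /* little-endian immediate */
    return 5;
}
```

So "push 0" can be 6A 00 (short) or 68 00 00 00 00 (long). Both leave the same dword on the stack; the short form just takes fewer bytes of code.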
You can get smarter code from Nasm by using the "-O" switch (uppercase 'o', not zero!). "-O1" will get you the signed byte form of instructions that have one (add, adc,... most of the "arithmetic" instructions, and "push immediate"), if the operand fits. A larger "n" in "-On" allows up to n passes, which lets Nasm optimize jump displacements. "-O2" or "-O3", as you're probably used to using, are *not* enough ("-O2" currently produces bad code, silently, at least sometimes... maybe always). I usually use "-O999" if I use "-O" at all. More won't hurt: it's a *maximum* number of extra passes, and Nasm quits when it's done. I don't think you need to worry about it at this stage...
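Why would extra passes matter for jumps? Here is a toy C sketch of generic jump sizing (often called branch relaxation) under simplifying assumptions. It is not Nasm's actual implementation, and the Insn/relax_jumps names are hypothetical. Only unconditional jmp is modeled, with the short form EB rel8 (2 bytes) and the near form E9 rel32 (5 bytes). Every jump starts out short, and any jump whose displacement doesn't fit in a signed byte gets widened. Widening one jump moves later code, which can push another jump out of range, so the process repeats until nothing changes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a code stream, for illustration only. */
typedef struct {
    int is_jump;     /* 1 = unconditional jmp, 0 = any other instruction */
    int size;        /* bytes; for jumps this is set by relax_jumps */
    size_t target;   /* for jumps: index of the instruction jumped to (n = end) */
} Insn;

/* offsets[i] = start of instruction i; offsets[n] = total length. */
static void layout(const Insn *code, size_t n, long *offsets)
{
    offsets[0] = 0;
    for (size_t i = 0; i < n; i++)
        offsets[i + 1] = offsets[i] + code[i].size;
}

/* Start every jmp short (2 bytes) and widen out-of-range ones to near
   (5 bytes), repeating for at most max_passes passes. Returns the number of
   passes needed to settle, or -1 if sizes were still changing when the limit
   was hit. offsets must have room for n + 1 entries. */
static int relax_jumps(Insn *code, size_t n, long *offsets, int max_passes)
{
    for (size_t i = 0; i < n; i++)
        if (code[i].is_jump)
            code[i].size = 2;
    for (int pass = 1; pass <= max_passes; pass++) {
        int changed = 0;
        layout(code, n, offsets);
        for (size_t i = 0; i < n; i++) {
            if (!code[i].is_jump || code[i].size == 5)
                continue;   /* jumps only ever grow, so near ones stay near */
            /* rel8 is measured from the end of the jump instruction */
            long disp = offsets[code[i].target] - (offsets[i] + code[i].size);
            if (disp < -128 || disp > 127) {
                code[i].size = 5;
                changed = 1;
            }
        }
        if (!changed)
            return pass;
    }
    return -1;
}
```

Because jumps in this sketch only ever grow, the loop always settles eventually. A pass limit that's too small just means it stops before the sizes are final.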
> section .data
> mytest: db 'This is a test',10,0
> section .text
> global _main
> _main:
> mov esi,mytest
> push byte [esi]
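I think if you change this to "push dword [esi]", it'll work. x86 is little-endian, so the character at [esi] lands in the low byte of the pushed dword. "_putchar" converts its int argument to unsigned char, so it only uses that one byte and ignores the other three. Then "inc esi" or "add esi, 1" (or "add esi, byte 1" if you want the short form - again, I'd ignore this for now) gets you to the next character.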
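A small C model of why "push dword [esi]" works with putchar. A 4-byte little-endian read at a character puts that character in the low byte, and putchar writes its argument converted to unsigned char, which is that low byte. The last function is the ESI-as-iterator loop written in C. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read 4 bytes at p as a little-endian dword, the way x86 sees
   "dword [esi]". The first byte in memory becomes the low byte. */
static uint32_t load_dword_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* putchar(c) writes (unsigned char)c: only the low byte survives. */
static unsigned char low_byte(uint32_t v)
{
    return (unsigned char)(v & 0xFF);
}

/* The loop in C: walk a pointer through the string until the terminating 0,
   handing each character to putchar. Returns the number of characters. */
static size_t print_string(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;  /* plays the role of esi */
    size_t n = 0;
    while (*p != 0) {
        putchar(*p);   /* push the character, call _putchar, add esp,4 */
        p++;           /* inc esi */
        n++;
    }
    return n;
}
```

For "This is a test", the first dword read is 0x73696854 ('s','i','h','T' from high byte to low byte), and its low byte is 'T'.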
It occurs to me that there's a potential problem here... If the end of the string were butted up against the end of "valid" memory, "dword [esi]" might be trying to access memory that isn't "there", causing a segfault (or "bus error", perhaps). I don't know the "right" way to avoid this - pad the string with some extra zeros, I suppose(?). Anyway, it's not going to happen with this code...
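The padding idea can be sketched in C. The helper name pad_for_dword_reads is hypothetical. It copies the string, then adds the terminator plus three more zero bytes, so a 4-byte read starting at any character, including the terminator, stays inside the buffer. A common alternative, not mentioned above, is to read just the one byte and zero-extend it. In Nasm that's "movzx eax, byte [esi]" followed by "push eax", which never touches anything past the character. load_byte_zx below is the C analogue.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy s into a new zero-filled buffer of
   strlen(s) + 1 + 3 bytes (terminator plus 3 bytes of padding), so a dword
   read at any index up to the terminator stays in bounds. Stores the buffer
   size in *size_out. Caller frees. Returns NULL if allocation fails. */
static unsigned char *pad_for_dword_reads(const char *s, size_t *size_out)
{
    size_t len = strlen(s);
    size_t size = len + 1 + 3;
    unsigned char *buf = calloc(size, 1);   /* calloc zero-fills */
    if (buf == NULL)
        return NULL;
    memcpy(buf, s, len);
    *size_out = size;
    return buf;
}

/* Alternative: fetch one byte and zero-extend it to 32 bits, like
   "movzx eax, byte [esi]". Reads nothing past p. */
static uint32_t load_byte_zx(const unsigned char *p)
{
    return (uint32_t)*p;
}
```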
> add esp,4
> mov eax,0
> ret
> ; end _main
> I get an error telling me "invalid combination of opcode and operands".
> Under the assumption that I still don't understand addressing in
> assembly, I tried it without the size specification and square
> brackets:
> section .data
> mytest: db 'This is a test',10,0
> section .text
> global _main
> extern _putchar
> _main:
> mov esi,mytest
> push esi
> call _putchar
> add esp,4
> mov eax,0
> ret
> ; end _main
> But I don't get any output at all. :-P
This is going to push the address (offset) of your string, not a character. The data is probably loaded aligned to... 4K?... so the low byte of the address is likely zero. Since "_putchar" only uses the low byte of what you pass it, you won't see any output. If you had a few strings, so that the low byte of the address happened to be a printable character, you'd see output - but not what you want.
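Here is a C sketch of what putchar ends up printing when it's handed an address. The function names and the two example addresses are made up for illustration. aligned_alloc (C11) stands in for a page-aligned data section.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* "push esi" passes the address itself. putchar keeps only the low byte
   of the value it's given, so this is the byte that actually gets written. */
static unsigned char byte_putchar_sees(uintptr_t pushed_value)
{
    return (unsigned char)(pushed_value & 0xFF);
}

/* Convenience wrapper for real pointers. */
static unsigned char byte_putchar_sees_ptr(const void *p)
{
    return byte_putchar_sees((uintptr_t)p);
}
```

An address aligned to 4K (0x1000) always has a low byte of 0x00, which shows up as nothing. A hypothetical string at 0x00403041 would print 'A' (0x41), which is output, but not the output you wanted.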
Incidentally, since you alter esi, you really should push it at the beginning of "main" and pop it before the "ret". Again, the registers the "Intel ABI" requires a function to preserve are ebx, esi, edi, ebp, and esp. This applies to libc and the Windows API (but not to DOS or Linux interrupts). If you're calling them, you can expect 'em to return those registers unaltered. If they're calling you ("main" is called), they expect you to return 'em as you found 'em. The other registers (eax, ecx, edx) you may alter, and you can expect 'em to be trashed by anything you call. The result or status is returned in eax. If the return type is "void" - you're not returning anything meaningful - eax is "scratch", too.
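The rule can be modeled in a few lines of C. Take a snapshot of the registers on entry and another on return, then flag any preserved register that changed. The enum, the callee_saved table and abi_violations are hypothetical names. The register list is the one given above, which matches the usual 32-bit C calling conventions (cdecl, and stdcall for the Windows API).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 32-bit x86 general registers. */
enum Reg { EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, NREGS };

/* 1 = the caller expects this register back unchanged. */
static const int callee_saved[NREGS] = {
    [EBX] = 1, [ESI] = 1, [EDI] = 1, [EBP] = 1, [ESP] = 1
};

/* Returns a bitmask (1u << reg) of preserved registers that differ
   between entry and return. Changes to eax, ecx and edx are allowed. */
static unsigned abi_violations(const uint32_t before[NREGS],
                               const uint32_t after[NREGS])
{
    unsigned mask = 0;
    for (int r = 0; r < NREGS; r++)
        if (callee_saved[r] && before[r] != after[r])
            mask |= 1u << r;
    return mask;
}
```

In this model, a "main" that does "mov esi,mytest" and never restores esi would be flagged on the ESI bit, even if it happens to work with a particular C library.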
When Jeff Duntemann was preparing the second edition of "Assembly Language Step By Step", he ran across a problem which he posted here: a Linux example that interfaced with C. He'd altered ebx and not saved/restored it. This worked okay on older versions of Linux/libc, where he apparently developed it, but mysteriously segfaulted on newer versions. Saving/restoring ebx solved it. So... while your code may work with an altered esi, it isn't really "right", and might fail with other implementations of C. This *doesn't* mean that you shouldn't use these registers, just put 'em back the way you found 'em. Some people use "pushad"/"popad" for this purpose - "overkill", perhaps, but it does no harm, as long as you set eax for your return value *after* the "popad", since "popad" restores eax too.
Disclaimer: This is "AFAIK". I'm not really very good with either C or the Windows APIs. (What's the difference between "putchar" and "putc", if any?)
Best,
Frank