Re: Cost of calling a standard library function

From: Frank Kotler (fbkotler_at_comcast.net)
Date: 03/04/04


Date: Thu, 04 Mar 2004 01:30:30 GMT

Beth wrote:

>> sub ecx 1
>
>
> Why "sub ecx 1" rather than "dec ecx"?

If I understand the explanations others have provided, "dec" affects
the flags in a "complicated" manner (it leaves the carry flag alone, so
it's only a partial flags update), and "sub ecx, 1" is preferred "on
some processors"... I wonder how "lea ecx, [ecx - 1]" would stack up?
It wouldn't set flags at all, so a "cmp" would be needed... probably not...

> It would be shorter (just a
> single byte opcode :) and doesn't need the access for the immediate of
> "1" stored inside the instruction (stored as byte or dword?

That's an interesting question.

00000000 81E901000000 sub ecx,0x1
00000006 83E901 sub ecx,byte +0x1

Nasm's default behavior is to emit the long form on "sub ecx, 1", but
allow the short form to be selected by "sub ecx, byte 1". "Most
assemblers" give the short form "if it'll fit". Dunno what RosAsm does
with it. (a "good" assembler should give the user *some* way to specify
- doesn't much matter what it is - IMO)
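
For what it's worth, here is a rough sketch of the "short form if it'll
fit" rule, using only the two encodings from the listing above. The name
encode_sub_ecx and its allow_short switch are made up for illustration;
this is not how Nasm or any other assembler is actually written.

```python
import struct

def encode_sub_ecx(imm, allow_short=True):
    """Return the machine code bytes for "sub ecx, imm" (hypothetical helper).

    allow_short=True  -> pick the imm8 form (83 E9 ib) when imm fits
                         in a signed byte
    allow_short=False -> always emit the imm32 form (81 E9 id)
    """
    if allow_short and -128 <= imm <= 127:
        # Short form: opcode 83, ModRM E9, one sign-extended immediate byte.
        return bytes([0x83, 0xE9]) + struct.pack("<b", imm)
    # Long form: opcode 81, ModRM E9, four little-endian immediate bytes.
    return bytes([0x81, 0xE9]) + struct.pack("<I", imm & 0xFFFFFFFF)

if __name__ == "__main__":
    print(encode_sub_ecx(1).hex())                     # short form
    print(encode_sub_ecx(1, allow_short=False).hex())  # long form
```

The switch is there only to make the point that the user gets a say in
which form comes out.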

I recently encountered some code in a macro file for Nasm that emitted
"byte" if the operand was less than 254 (!). For the record, a "signed
byte" is -128 thru +127... You *can* "push byte 255" (Nasm will warn,
but do it), but what's on your stack is "-1", which in 32-bits is *not*
the same as 255! As long as you're *sure* that the upper bits will be
ignored, you can use it...
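
To make the 255-versus-"-1" point concrete, here is a small sketch of
the range check and of what sign extension does to a byte immediate. The
function names are invented for the example; it models the arithmetic
only, not anything Nasm does internally.

```python
def fits_signed_byte(value):
    """True if value can be stored in a signed 8-bit immediate."""
    return -128 <= value <= 127

def sign_extend_byte(value):
    """Interpret the low 8 bits of value as a signed byte."""
    low = value & 0xFF
    return low - 0x100 if low & 0x80 else low

def as_dword(value):
    """Show a (possibly negative) value as an unsigned 32-bit pattern."""
    return value & 0xFFFFFFFF

if __name__ == "__main__":
    for v in (127, 128, 253, 255):
        extended = sign_extend_byte(v)
        print(v, fits_signed_byte(v), extended, hex(as_dword(extended)))
```

So a "less than 254" test lets through everything from 128 to 253, and
each of those comes back from sign extension as a negative number.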

...
[alignment]
>>>>inserts the needed number of NOPs to reach the desired boundary.
>
>
> If the number of NOPs is greater than 4, the first two NOPs are
> replaced by a short JMP.

It's interesting that Gas uses a number of longer "do nothing"
instructions for alignment padding. ("nop" decodes as "xchg eax, eax")

            8DB42600000000 lea esi,[esi+0x0]
            8DB600000000 lea esi,[esi+0x0]
            8D742600 lea esi,[esi+0x0]
            8D7600 lea esi,[esi+0x0]
            8D36 lea esi,[esi]
            90 nop

... and issues the "jmp" only for 14 byte or greater padding... Faster,
I guess... (this exists as "palign" in Nasm 0.98.24p1 and in no other
version, AFAIK...)
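
A rough sketch of the idea, assuming a simple "biggest piece that still
fits" rule: work out how many bytes are needed to reach the boundary,
then fill them from the byte sequences listed above. The names and the
greedy choice are mine, for illustration only; I don't claim this is the
selection Gas really makes, and it leaves out the "jmp" for long pads.

```python
# Filler sequences copied from the listing above, keyed by length in bytes.
PADS = {
    7: bytes.fromhex("8DB42600000000"),  # lea esi,[esi+0x0]
    6: bytes.fromhex("8DB600000000"),    # lea esi,[esi+0x0]
    4: bytes.fromhex("8D742600"),        # lea esi,[esi+0x0]
    3: bytes.fromhex("8D7600"),          # lea esi,[esi+0x0]
    2: bytes.fromhex("8D36"),            # lea esi,[esi]
    1: bytes.fromhex("90"),              # nop
}

def padding_needed(offset, boundary):
    """Bytes required to move offset up to the next multiple of boundary."""
    return (-offset) % boundary

def fill_padding(count):
    """Build count bytes of filler, longest available sequence first."""
    out = bytearray()
    while count > 0:
        size = max(s for s in PADS if s <= count)
        out += PADS[size]
        count -= size
    return bytes(out)

if __name__ == "__main__":
    need = padding_needed(0x25, 16)
    print(need, fill_padding(need).hex())
```

Fewer, longer instructions mean less for the decoder to chew through
than a string of single-byte NOPs, which is presumably the point.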

Best,
Frank