Re: Kernel Calling Conventions



"Rod Pemberton" <spamtrap@xxxxxxxxxx> wrote in message
news:dvksfo$bob$1@xxxxxxxxxxxxxxxxxxxx
"Kroll" <spamtrap@xxxxxxxxxx> wrote in message
news:1142802844.123840.209440@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Greetings all,
I was reading the "FreeBSD Assembly Language Programming" tutorial
(http://www.int80h.org/bsdasm/), when I came across something that
piqued my curiosity.

The C calling convention is touted as being more convenient than, and
superior to, the calling convention used by Linux and Microsoft of
passing arguments within registers. I was wondering if this is just a
bias on the part of the writer or what.

What are the actual advantages of pushing the arguments onto the
stack as opposed to passing them via registers?

I don't think one is necessarily better than the other. For the C
language, I'd say that the C calling convention is best. But, for
assembly programming, I think passing in registers is the more
convenient.

If you're trying to design a one-size-fits-all calling convention, the
main consideration is the number of GPRs. x86 only has 6 usable GPRs
(5 for PIC) during a call, so if you pass in registers most of the time
the callee will have to immediately spill them; you might as well let
the caller spill them and improve OOO performance. AMD64 has 14 usable
GPRs, so the default calling convention is to use registers and not the
stack. This isn't surprising.

In assembly, of course, you can customize the calling convention for
every function, so you can determine what's most efficient in each case.

Syscalls have special requirements due to the challenges in having the
kernel access the user-mode stack, which is presumably why Linux uses
registers for syscalls when user code sticks with the stack (on x86 at
least).
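
To illustrate the syscall point, here is a minimal C sketch, assuming
Linux with glibc (or another libc that provides syscall()). The helper
name raw_getpid is hypothetical. The caller passes arguments to
syscall() using the ordinary C convention; the library wrapper then
places the syscall number and arguments in the registers the kernel
expects before trapping into it.

```c
/* Sketch only: assumes Linux and a libc that provides syscall(). */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for our PID through the raw
   syscall interface instead of the usual getpid() wrapper. */
long raw_getpid(void)
{
    /* The syscall number is an ordinary C argument here; the libc
       wrapper moves it into a register before entering the kernel. */
    return syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(); the only
difference is which layer loads the registers.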

Leaving them on the stack doesn't require you to save any registers,
which may be an advantage. Also, you can always access data on
the stack via the stack pointer. But, if you're doing anything other
than an extremely simple operation on the data, you'll want to move
them off the stack and into registers for speed.

It'll cost you a couple of cycles to pull the stack data back from L1
cache, but there are probably just as many times you'll want to push
the data from a register onto the stack to free the register. There are
always cases where the general rule doesn't provide the best
performance, and that's why assembly survives.

When programming in C, one does not usually have direct access to
the registers. And, when you do, you can't usually tell which ones are
in use by the C compiler or tell the C compiler to "reoptimize" your
assembly code.

Some C compilers will let you specify generic registers in inline asm
and will optimize the register usage for you. If you have to use
particular registers, most compilers will make a note of that and
optimize their register use around your asm code.
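
As a rough sketch of what "generic registers" looks like in practice,
here is GCC-style extended asm (also accepted by Clang). The function
name add_u32 is made up for illustration, and the asm path is only
compiled on x86; other targets fall back to plain C. The "r"
constraints ask the compiler to pick whichever general-purpose
registers suit it, so it can allocate around the asm statement.

```c
#include <stdint.h>

/* Hypothetical helper: add two 32-bit values. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r" (a): any general-purpose register, read and written.
       "r"  (b): any general-purpose register, read only.
       "cc":     tells the compiler the flags are modified.
       AT&T syntax: addl src, dst. */
    __asm__ ("addl %1, %0" : "+r" (a) : "r" (b) : "cc");
    return a;
#else
    /* Portable fallback for compilers or targets without this asm. */
    return a + b;
#endif
}
```

If a particular register were required, a specific constraint (for
example "a" for EAX on x86) or an entry in the clobber list would tell
the compiler to keep its own values out of that register.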

Also, a large amount of the speed that the OpenWatcom compiler has
over the DJGPP GCC-based compiler is due to the fact that the OW
compiler makes heavy use of the registers for both its register and
stack calling conventions. DJGPP, on the other hand, is heavy on
stack usage, makes poor use of the registers, and frequently uses
low-code-density instructions. I think it does the latter on the
assumption that faster instructions are faster, without considering the
effects of code bloat on the reloading of the caches.

One thing to note: if you want your code to be compatible with code
compiled by other people, you have to follow the convention that code
uses, which will nearly always be the platform's standard ABI. If you
don't like the choice the ABI authors made, you can override the
calling convention, provided you compile both the caller and callee
code.
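
One way this override can look, as a sketch assuming GCC or Clang: on
32-bit x86 the regparm attribute asks for up to three integer arguments
to be passed in EAX, EDX and ECX instead of on the stack. The macro
name REGCALL and the function sum3 are hypothetical. On other targets
the macro expands to nothing and the platform default applies.

```c
/* Assumption: GCC or Clang. regparm only matters on 32-bit x86. */
#if defined(__GNUC__) && defined(__i386__)
#define REGCALL __attribute__((regparm(3)))
#else
#define REGCALL   /* no override: use the platform's default ABI */
#endif

/* The declaration must carry the attribute so that every caller
   compiled against it uses the same convention as the callee. */
REGCALL int sum3(int a, int b, int c);

REGCALL int sum3(int a, int b, int c)
{
    return a + b + c;
}
```

This only works because both sides see the same declaration; a caller
built against a prototype without the attribute would pass the
arguments on the stack and the callee would read garbage.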

GCC (and thus DJGPP) always defaults to the official ABI. If OpenWatcom
does not, calling external code (or having it call you) will require a
lot of hacks unless that external code was also compiled with OW.

I won't dispute that GCC's register allocation and instruction choice
may not be as good as OW's. ICC is definitely better. Then again, GCC
explicitly puts portability over performance, so it's not surprising a
compiler that works only on one or two archs turns out better code than
one that works on 20+.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin

