Re: Kernel Calling Conventions
- From: "Sprunk, Stephen" <spamtrap@xxxxxxxxxx>
- Date: Thu, 23 Mar 2006 13:45:32 -0500
"Rod Pemberton" <spamtrap@xxxxxxxxxx> wrote in message
news:dvksfo$bob$1@xxxxxxxxxxxxxxxxxxxx
"Kroll" <spamtrap@xxxxxxxxxx> wrote in message
news:1142802844.123840.209440@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Greetings all,
I was reading the "FreeBSD Assembly Language Programming" tutorial
(http://www.int80h.org/bsdasm/), when I came across something that
piqued my curiosity.
The C calling convention is touted as being more convenient than, and
superior to, the calling convention used by Linux and Microsoft of
passing arguments in registers. I was wondering if this is just
bias on the part of the writer or what.
What are the actual advantages of pushing the arguments onto the
stack as opposed to passing them via registers?
I don't think one is necessarily better than the other. For the C
language, I'd say that the C calling convention is best. But, for
assembly programming, I think passing in registers is more
convenient.
If you're trying to design a one-size-fits-all calling convention, the
main consideration is the number of GPRs. x86 only has 6 usable GPRs (5
for PIC) during a call, so if you pass in registers, most of the time
the callee will have to immediately spill them; you might as well let
the caller spill them and improve out-of-order (OOO) performance. AMD64
has 14 usable GPRs, so the default calling convention is to use
registers and not the stack. This isn't surprising.
In assembly, of course, you can customize the calling convention for
every function, so you can determine what's most efficient in each case.
Syscalls have special requirements due to the challenges in having the
kernel access the user-mode stack, which is presumably why Linux uses
registers for syscalls when user code sticks with the stack (on x86 at
least).
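As a rough illustration of the syscall side, here is a minimal sketch
assuming Linux with glibc. The libc syscall() wrapper is itself called
with the normal C convention; it then loads the syscall number and
arguments into the registers the kernel expects before trapping. The
function names raw_getpid and raw_write are made up for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our PID without going through
 * the getpid() library function. syscall() moves SYS_getpid into the
 * syscall-number register and traps into the kernel. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Hypothetical helper: write(2) through the generic wrapper. The three
 * arguments reach the kernel in registers, not on the user stack.
 * Returns the byte count, or -1 with errno set on failure. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hi\n", 3) should behave like write(1, "hi\n", 3);
which registers are used underneath depends on the architecture.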
Leaving them on the stack doesn't require you to save any registers
which may be an advantage. Also, you can always access data on
the stack via the stack pointer. But, if you're doing anything other than
an extremely simple operation on the data, you'll want to move them
off the stack and into registers for speed.
It'll cost you a couple of cycles to pull the stack data back from L1
cache, but there are probably just as many times you'll want to push
the data from a register onto the stack to free the register. There are
always cases where the general rule doesn't provide the best
performance, and that's why assembly survives.
When programming in C, one does not usually have direct access to
the registers. And, when you do, you can't usually tell which ones are
in use by the C compiler or tell the C compiler to "reoptimize" your
assembly code.
Some C compilers will let you specify generic registers in inline asm
and will optimize the register usage for you. If you have to use
particular registers, most compilers will make a note of that and
optimize their register use around your asm code.
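A small sketch of both cases, assuming GCC or Clang extended asm on x86
(other targets take the plain C fallback). The function names are
invented for illustration. The "r" constraint lets the compiler pick any
general register; the "a" and "d" constraints pin operands to EAX and
EDX, and the compiler allocates everything else around them.

```c
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
#define HAVE_X86_GNU_ASM 1
#else
#define HAVE_X86_GNU_ASM 0
#endif

/* Generic registers: the compiler chooses where a and b live and
 * substitutes the chosen register names for %0 and %1. */
int add_any_reg(int a, int b)
{
#if HAVE_X86_GNU_ASM
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
#else
    a += b; /* portable fallback */
#endif
    return a;
}

/* Particular registers: MUL takes one operand in EAX and writes the
 * 64-bit product to EDX:EAX, so those two are named explicitly.
 * Returns the high 32 bits of a * b. */
unsigned mul_high_fixed_reg(unsigned a, unsigned b)
{
#if HAVE_X86_GNU_ASM
    unsigned hi;
    __asm__("mull %2" : "+a"(a), "=d"(hi) : "r"(b) : "cc");
    return hi;
#else
    return (unsigned)(((unsigned long long)a * b) >> 32);
#endif
}
```

Compiling with -S and reading the output shows which registers the
compiler actually picked for the "r" operands.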
Also, a large amount of the speed that the OpenWatcom compiler has
over the DJGPP GCC-based compiler is due to the fact that the OW
compiler makes heavy use of the registers, for both its register and
stack calling conventions. DJGPP, on the other hand, is heavy on
stack usage, makes poor use of the registers, and frequently uses
low code density instructions. I think it does the latter on the
assumption that faster instructions are faster without considering the
effects of code bloat on the reloading of the caches.
One thing to note is that if you want your code to be compatible with
code compiled by other people, you have to use the same calling
convention they did, which will nearly always be the platform's
standard ABI. If you don't like the choice the ABI authors made, you
can override the calling convention, provided you compile both the
caller and callee code.
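For example, GCC on 32-bit x86 lets you override the convention per
function with the regparm attribute, which passes up to three integer
arguments in EAX, EDX and ECX instead of on the stack. The sketch below
assumes GCC; REG_CALL, sum3 and call_sum3 are names made up for this
example, and the macro expands to nothing on other targets so the code
still builds with the default ABI.

```c
/* Use register passing only where the attribute is meaningful. */
#if defined(__GNUC__) && defined(__i386__)
#define REG_CALL __attribute__((regparm(3)))
#else
#define REG_CALL /* default ABI on other targets */
#endif

/* The attribute is part of the declaration, so every caller that sees
 * this prototype uses the same convention as the definition. */
int REG_CALL sum3(int a, int b, int c);

int REG_CALL sum3(int a, int b, int c)
{
    return a + b + c;
}

/* Caller compiled against the same prototype: it and the callee agree
 * on where the arguments are. */
int call_sum3(void)
{
    return sum3(1, 2, 3);
}
```

Code that calls sum3 without seeing the attributed prototype would pass
the arguments in the wrong place, which is the compatibility problem
described above.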
GCC (and thus DJGPP) always defaults to the official ABI. If OpenWatcom
does not, calling external code (or having it call you) will require a
lot of hacks unless that external code was also compiled with OW.
I won't dispute that GCC's register allocation and instruction choice
may not be as good as OW's. ICC is definitely better. Then again, GCC
explicitly puts portability over performance, so it's not surprising a
compiler that works only on one or two archs turns out better code than
one that works on 20+.
S
--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin