Re: Stack and performance
- From: Nathan Moore <spamtrap@xxxxxxxxxx>
- Date: Fri, 1 Apr 2005 20:20:07 +0000 (UTC)
c0d1f1ed wrote:
Hi all,
I am totally puzzled by a weird phenomenon in my code. My debug version is twice as fast as my release version (Visual C++)! What's even stranger: adding an offset to the stack pointer solves the problem. Adding a 64-byte offset results in serious performance degradation again.
Is it possible that a function executes two times slower when the stack starts at another address? The function I'm using is quite big and takes 99% of execution time. The stack frame is ~600 bytes and 16-byte aligned. My Pentium M has 64-byte L1 cache lines but it's 8-way associative and 32 kB so I see no reason for cache thrashing effects.
Any possible explanations?
Thanks,
Nicolas Capens
Two things come to mind, though they might not really matter:
First, if the data on the stack is moved around so that the important stuff fits comfortably in the fewest cache lines, that could account for some performance boost. It may be that the stack frame of the important function starts out (almost) cache-line aligned in the debug version, while in the release version it starts out with an important piece of data as the last thing in an otherwise unused cache line. Other cache issues might also benefit from rearranging the declarations or some of the other code. I'm not sure how conscious VC++ is of cache issues, but even so it could be missing something that rearranging would cure.
Similar issues may arise from virtual memory and paging.
Paging and cache issues seem to jump up and bite hard without much warning in some instances. If either of these is the case, both versions of the program should have similar performance when other nontrivial programs are ACTIVE at the same time, because they will cause paging and cache evictions that should make the other effects non-issues.
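To put rough numbers on the cache-line idea, here is a minimal sketch in C. It assumes the 64-byte line size and ~600-byte frame quoted in the question; the function name lines_spanned is made up for illustration, and it only counts lines, it says nothing about which data in the frame is actually hot.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed line size: the 64-byte L1 lines mentioned in the question. */
enum { CACHE_LINE = 64 };

/* Hypothetical helper: number of cache lines touched by `size` bytes
   starting at address `addr`. */
size_t lines_spanned(uintptr_t addr, size_t size)
{
    if (size == 0)
        return 0;
    uintptr_t first = addr / CACHE_LINE;               /* line holding the first byte */
    uintptr_t last  = (addr + size - 1) / CACHE_LINE;  /* line holding the last byte  */
    return (size_t)(last - first + 1);
}
```

With a 600-byte frame, lines_spanned(0, 600) is 10 but lines_spanned(48, 600) is 11, so a 16-byte-aligned frame can pick up one extra line depending on where it starts. That alone is unlikely to explain a 2x difference, but it shows how a small shift changes the layout.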
The other thing is the frame pointer (ebp). It is almost always used when compiling for debugging, but is sometimes used as an extra general-purpose register when compiling without debug settings. I'm not the most knowledgeable on such things; I didn't think large offsets from the stack pointer were a big issue on x86, though on some architectures they are. You could force the use of the frame pointer by calling alloca(0), if the compiler isn't smart enough to optimise it away, or if you are smart enough to pass it a 0 that the compiler can't determine is a 0 (an external global variable set to 0, or something like that).
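A rough sketch of the alloca trick, in C. The names frame_pad and hot_function are invented for illustration, and the loop merely stands in for the real work. Whether a given compiler actually keeps ebp as a frame pointer here is an assumption to check in the generated assembly, not something this code guarantees.

```c
#include <stddef.h>
#if defined(_MSC_VER)
#include <malloc.h>          /* MSVC declares _alloca here */
#define alloca _alloca
#else
#include <alloca.h>          /* glibc and most Unix-like systems */
#endif

/* Hypothetical global that is always 0. It is volatile so the compiler
   must read it at run time and cannot fold the alloca size to a constant. */
volatile size_t frame_pad = 0;

/* Hypothetical stand-in for the big function under test. */
long hot_function(const int *data, size_t n)
{
    /* A run-time-sized alloca makes the stack pointer vary, which usually
       leads the compiler to address locals through a frame pointer. */
    volatile char *pad = (volatile char *)alloca(frame_pad);
    (void)pad;

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}
```

If the compiler has a switch for it, that is probably the cleaner test: Visual C++ has /Oy- to turn off frame-pointer omission, and GCC has -fno-omit-frame-pointer.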
Test and see.
Nathan