Simple (I think) question I'm having the most impossible time with...
- From: "ldb" <spamtrap@xxxxxxxxxx>
- Date: 15 Nov 2005 13:02:37 -0800
Essentially, I am writing a matrix-type operation in assembly for pure
speed. I've been reading the optimization guides, experimenting, and
making steady progress in cutting down the original naive code into a
fast routine. I went ahead and moved everything into aligned SSE
instructions, which resulted in a pretty large improvement.
Essentially, I load an SSE register with four floats, do my math,
store it, repeat. Things improved very quickly. So I've moved on to
memory/cache optimization.
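
For illustration, a minimal sketch of such a load/compute/store loop, written with C intrinsics rather than hand-written assembly. The multiply-add and the name scale_bias_stream are placeholders (the actual math is not given here), and the sketch assumes both buffers are 16-byte aligned and the length is a multiple of four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: _mm_load_ps, _mm_mul_ps, _mm_stream_ps, ... */

/* Hypothetical kernel: out[i] = in[i] * scale + bias, four floats at a time.
   Assumes in/out are 16-byte aligned and n is a multiple of 4. */
static void scale_bias_stream(const float *in, float *out, size_t n,
                              float scale, float bias)
{
    assert(((uintptr_t)in  & 15u) == 0);
    assert(((uintptr_t)out & 15u) == 0);
    assert(n % 4 == 0);

    const __m128 vscale = _mm_set1_ps(scale);
    const __m128 vbias  = _mm_set1_ps(bias);

    for (size_t i = 0; i < n; i += 4) {
        __m128 v = _mm_load_ps(in + i);                /* aligned load (typically MOVAPS) */
        v = _mm_add_ps(_mm_mul_ps(v, vscale), vbias);  /* placeholder math */
        _mm_stream_ps(out + i, v);                     /* non-temporal store (MOVNTPS) */
    }
    _mm_sfence();  /* make the streaming stores globally visible before returning */
}
```

Swapping _mm_stream_ps for _mm_store_ps (an ordinary aligned store) in the same loop is one way to compare the two store types without touching the rest of the code.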
My 'test' program takes a very large array (~16 MB), does some math to
it, and puts it into a similarly sized output array. I run this
subroutine 1000 times or so, to amortize the cost of reading in the
data and printing outputs, etc. The total run-time is ~60 seconds or
so.
Running under Linux, with the time command, I end up with nearly 50% of
the execution time in kernel mode. I surmise that this cannot be
optimal. From my own (naive) attempts to diagnose the culprit, I've
come to the conclusion that a very large percentage (50+%) of the time
in this loop is spent on the store instruction. That guess came from
a lot of comment-out voodoo, with my own understanding that a lot of
parallelism is lost when you comment out functionally independent
instructions.
So, my question is, what can I do to improve SSE store instructions? I
am currently using the MOVNTPS instruction. Is there any way to improve
this situation of doing a single store per iteration? Is there some
sort of prefetch that would help? Or, perhaps, is something at a
higher level in the memory hierarchy causing this
delay... perhaps page faults or TLB issues? I'm not really sure how to
even localize the problem to a particular area of memory. Or, for that
matter, I'm not really sure anything _is_ wrong... 50% of the execution
time being in kernel mode seems extremely poor to me, but maybe that's
how it's supposed to be?
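
One possible way to narrow down the page-fault guess is to snapshot the process's resource usage before and after a single pass of the routine and compare the deltas. The sketch below uses the POSIX getrusage() call; the usage_snapshot struct and the helper names are made up for illustration, and it only separates user time, kernel time, and fault counts (it says nothing about TLB misses).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper type: CPU time split and page-fault counts so far. */
typedef struct {
    double user_s;   /* seconds spent in user mode */
    double sys_s;    /* seconds spent in kernel mode */
    long   minflt;   /* page faults serviced without I/O */
    long   majflt;   /* page faults that required I/O */
} usage_snapshot;

static usage_snapshot snapshot_usage(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);  /* not expected to fail for RUSAGE_SELF with a valid pointer */
    (void)rc;

    usage_snapshot s;
    s.user_s = (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6;
    s.sys_s  = (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
    s.minflt = ru.ru_minflt;
    s.majflt = ru.ru_majflt;
    return s;
}

/* Print what changed between two snapshots taken around a region of code. */
static void report_delta(const char *label, usage_snapshot before,
                         usage_snapshot after)
{
    printf("%s: user %.3fs, sys %.3fs, minor faults %ld, major faults %ld\n",
           label,
           after.user_s - before.user_s,
           after.sys_s - before.sys_s,
           after.minflt - before.minflt,
           after.majflt - before.majflt);
}
```

Calling snapshot_usage() before and after the first pass and again around a later pass would show whether the kernel time and faults are concentrated in the first touch of the arrays or recur on every pass.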