How long does it take to execute an MMX instruction?
- From: spamtrap@xxxxxxxxxx
- Date: 9 Jul 2006 09:23:04 -0700
Hello group,
I need to develop a highly optimized MMX-based image processing
algorithm. The Intel Optimization Manual gives worst-case instruction
timings, but it appears that the actual timings may vary from
execution to execution. That may not be a significant problem if you
are not trying to squeeze every bit of performance out of your
application. If extreme performance is the primary goal, though, you
need to use every available technique to speed up the calculation. The
main gain should come from pairing instructions in the U and V
execution pipes, and here is the biggest contradiction, which I don't
know how to overcome. Any instruction with a memory operand may incur
a penalty of one or two cycles even on an L1 cache hit. Let's say we
plan the instruction pairing on the assumption that the data arrives
one cycle later. The schedule then looks like this:
movq mm0,Variable ; 1U
paddw mm3,mm2 ; 1V
paddw mm6,mm7 ; 2U
psrlw mm5,3 ; 2V
paddw mm3,mm0 ; 3U - pitfall
psrlw mm2,3 ; 3V
This snippet only demonstrates the issue. The subsequent code depends
heavily on the value of the mm0 register. If the load takes longer
than the one planned cycle, the instruction marked 3U stalls for an
additional clock cycle, which wrecks the whole calculation chain:
cycle 4 may already have its own pair of instructions, and these may
not pair with the delayed 3U addition. I understand that planning for
the worst-case latency may help, but data that arrives early, in
conjunction with out-of-order execution, will cause the same type of
issue.
Another issue is optimizing for both the Pentium 4 and the Pentium M.
The Pentium M, in general, has latencies one clock cycle shorter than
the Pentium 4. As a result, code optimized for the Pentium 4 will run
almost two times slower on the Pentium M because of broken instruction
pairing.
Is there any bullet-proof strategy that would help to overcome the
issue described above?
With best regards,
Vladimir S. Mirgorodsky