Re: SSE2 half as fast as it should be?
- From: André Kempe <spamtrap@xxxxxxxxxx>
- Date: Sat, 22 Apr 2006 10:07:39 +0200
spamtrap@xxxxxxxxxx wrote:
> Of course this stuff only works if sse, x86, and mmx registers do not
> affect each other's throughputs, but from what I read I think they are
> independent.
> Please let me know if you know of any throughput dependencies between
> x86, sse2 and mmx. Also, let me know if any 64-bit instructions
> actually decode to 2 micro ops, because that would really destroy
> things. If this is a unique endeavor then hopefully it will get
> published.
you should have a look at Agner Fog's "How to optimize for the Pentium family ..." ( http://www.agner.org/assem/ ). there he explains the sharing of execution units between mmx and sse(2/3), instruction latency and lots of other stuff.
personally i do not think it is worth the effort to split execution between mmx and sse registers. the register sizes differ ( 64 vs. 128 bit ), which makes it hard to keep data aligned as required and forces you to write a second execution path. the result is code that is much harder to maintain.
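to make the alignment point concrete, here is a minimal sketch in plain C (no intrinsics). it only computes how a byte range would split into an unaligned head, an aligned body and a leftover tail for a given vector width. the names split_t and split_for_alignment are made up for this illustration, and the widths of 8 and 16 bytes simply stand for the mmx and sse register sizes mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: how a byte range splits for one vector width. */
typedef struct {
    size_t head; /* leading bytes before the first aligned address  */
    size_t body; /* bytes that whole, aligned vectors can cover      */
    size_t tail; /* leftover bytes after the last full vector        */
} split_t;

/* addr:  start address of the buffer
 * len:   buffer length in bytes
 * width: vector width in bytes, must be a power of two (8 or 16 here) */
static split_t split_for_alignment(uintptr_t addr, size_t len, size_t width)
{
    split_t s;
    size_t misalign;

    assert(width != 0 && (width & (width - 1)) == 0);

    misalign = (size_t)(addr & (width - 1));
    s.head = misalign ? width - misalign : 0;
    if (s.head > len)
        s.head = len;                       /* buffer ends before alignment */
    s.body = ((len - s.head) / width) * width;
    s.tail = len - s.head - s.body;
    return s;
}
```

for a 100-byte buffer starting at address 0x1008, a width of 8 gives head 0, body 96, tail 4, while a width of 16 gives head 8, body 80, tail 12. so the same buffer needs different head and tail handling on each path, which is part of the extra bookkeeping a mixed mmx/sse loop would have to carry.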
and using too much code within a loop will render all your careful decoder-throughput optimizations useless. once decoded, the micro-ops are kept in the trace cache. if you use too many instructions in a loop, this cache gets thrashed, and the loop has to be decoded again on each and every iteration.
and these highly elaborate optimizations will make your code highly dependent on a specific processor. optimum performance is also affected by cache sizes, the size of a cache line and so on. all this you'd have to keep in mind when writing your code. whenever you run your code on a different machine, well, you're going to have a problem.
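one common way to limit that problem (not something the original poster proposed, just a sketch) is to keep a plain fallback routine next to the tuned one and pick between them once at startup. in the C below everything is hypothetical: cpu_info_t stands in for whatever a real CPUID query would fill in, and sum_unrolled is only a placeholder for a processor-specific path, not real sse2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*sum_fn)(const uint8_t *, size_t);

/* Hypothetical CPU description; a real program would fill it via CPUID. */
typedef struct {
    int has_sse2;
} cpu_info_t;

/* Portable fallback: works on any machine. */
static uint32_t sum_generic(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    size_t i;
    for (i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Placeholder for a tuned path: 4-way unrolled, same result as above. */
static uint32_t sum_unrolled(const uint8_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; i++)      /* remaining 0..3 bytes */
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}

/* Choose the implementation once, based on the detected features. */
static sum_fn select_sum(const cpu_info_t *cpu)
{
    assert(cpu != NULL);
    return cpu->has_sse2 ? sum_unrolled : sum_generic;
}
```

the caller stores the pointer returned by select_sum and uses it everywhere. this does not remove the maintenance cost, since every tuned path still has to be written and tested, but the code at least keeps running correctly on a machine it was not tuned for.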
greetings,
andre