SSE2 advice

From: Phil Carmody (thefatphil_demunged_at_yahoo.co.uk)
Date: 01/25/04

I have absolutely no experience of programming for the more
modern x86 features, and would like a few handy hints, please.

I'm going to be doing lots of 32-bit unsigned multiplies,
and fortunately I want to do them in blocks of 4.
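
For reference, here is a minimal scalar C model of what one PMULUDQ computes. It views each 128-bit operand as four unsigned 32-bit lanes, multiplies only lanes 0 and 2 (the low half of each 64-bit quadword), and writes back two full 64-bit products. A block of four 32x32->64 multiplies therefore takes two PMULUDQs, as in both sequences below. The name pmuludq_model is hypothetical. This sketches the instruction's semantics, not a way to call it.

```c
#include <stdint.h>

/* Scalar model of a single PMULUDQ on 128-bit operands.
   Only 32-bit lanes 0 and 2 are used; lanes 1 and 3 are ignored.
   Each product is a full unsigned 64-bit result. */
static void pmuludq_model(const uint32_t a[4], const uint32_t b[4],
                          uint64_t out[2])
{
    out[0] = (uint64_t)a[0] * b[0];
    out[1] = (uint64_t)a[2] * b[2];
}
```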

Which approach is to be preferred: loading all values
into registers, or multiplying by values in memory?

e.g.
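movdqa  xmm0, [a64]     ; load all four operands (aligned)
movdqa  xmm1, [b64]
movdqa  xmm2, [i64]
movdqa  xmm3, [j64]
pmuludq xmm0, xmm1      ; 2 products: dwords 0 and 2 -> 64-bit
pmuludq xmm2, xmm3      ; 2 more, 4 in total
movdqa  [c64], xmm0     ; store the 64-bit results
movdqa  [k64], xmm2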

versus:
movdqa  xmm0, [a64]     ; load only the first operands
movdqa  xmm2, [i64]
pmuludq xmm0, [b64]     ; second operand read from memory (m128)
pmuludq xmm2, [j64]
movdqa  [c64], xmm0
movdqa  [k64], xmm2
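
In C, the same two sequences can be written with the SSE2 intrinsics from <emmintrin.h>, where _mm_mul_epu32 is the intrinsic for PMULUDQ. This is a rough sketch, assuming an x86 target with SSE2 and a C11 compiler. The function names are hypothetical, and the comments show which line of the assembly each statement corresponds to, not necessarily what the compiler emits. At this level, the compiler, not the source, makes the register-versus-memory-operand choice.

```c
#include <stdint.h>
#include <emmintrin.h>   /* SSE2: _mm_load_si128, _mm_mul_epu32, _mm_store_si128 */

/* Assumed layout: each 16-byte block holds two uint64_t slots, with the
   32-bit factor in the low half of each slot. All pointers must be
   16-byte aligned. */

/* First sequence: every operand is loaded into a register explicitly. */
static void mul_blocks_regs(const uint64_t *a, const uint64_t *b,
                            const uint64_t *i, const uint64_t *j,
                            uint64_t *c, uint64_t *k)
{
    __m128i x0 = _mm_load_si128((const __m128i *)a);   /* movdqa xmm0, [a64] */
    __m128i x1 = _mm_load_si128((const __m128i *)b);   /* movdqa xmm1, [b64] */
    __m128i x2 = _mm_load_si128((const __m128i *)i);   /* movdqa xmm2, [i64] */
    __m128i x3 = _mm_load_si128((const __m128i *)j);   /* movdqa xmm3, [j64] */
    x0 = _mm_mul_epu32(x0, x1);                        /* pmuludq xmm0, xmm1 */
    x2 = _mm_mul_epu32(x2, x3);                        /* pmuludq xmm2, xmm3 */
    _mm_store_si128((__m128i *)c, x0);                 /* movdqa [c64], xmm0 */
    _mm_store_si128((__m128i *)k, x2);                 /* movdqa [k64], xmm2 */
}

/* Second sequence: the loads of b and j are written inside the multiply,
   which optimising compilers commonly fold into pmuludq's m128 operand. */
static void mul_blocks_mem(const uint64_t *a, const uint64_t *b,
                           const uint64_t *i, const uint64_t *j,
                           uint64_t *c, uint64_t *k)
{
    __m128i x0 = _mm_load_si128((const __m128i *)a);   /* movdqa xmm0, [a64] */
    __m128i x2 = _mm_load_si128((const __m128i *)i);   /* movdqa xmm2, [i64] */
    x0 = _mm_mul_epu32(x0, _mm_load_si128((const __m128i *)b)); /* pmuludq xmm0, [b64] */
    x2 = _mm_mul_epu32(x2, _mm_load_si128((const __m128i *)j)); /* pmuludq xmm2, [j64] */
    _mm_store_si128((__m128i *)c, x0);                 /* movdqa [c64], xmm0 */
    _mm_store_si128((__m128i *)k, x2);                 /* movdqa [k64], xmm2 */
}
```

One caveat on the memory-operand form: with the legacy (non-VEX) SSE encoding, an m128 source to PMULUDQ must be 16-byte aligned or the instruction faults. The second sequence therefore relies on b64 and j64 being aligned, just as MOVDQA already requires of every address it touches.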

I intend to interleave this with FP calculations and integer
housekeeping code, so in reality the multiplies probably won't be
packed as tightly as that. I'd rather use the latter (m128
operand), as I expect to unroll the whole loop a fair bit, and
using only half as many registers looks like it will make that
easier. Everything can be assumed to be in the L1 cache, if that
matters.
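
Here is a sketch of the unrolled memory-operand form, under the same assumptions: SSE2, C11, 16-byte-aligned arrays, and 32-bit factors in the low halves of 64-bit slots. mul_array and its parameters are hypothetical. The b operand is only ever used directly as the multiply's source, so each iteration names just two vector values. Actual register allocation is still the compiler's call.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* c[q] = lo32(a[q]) * lo32(b[q]) for q in [0, n).
   Requires n to be a multiple of 4 and all three arrays to be 16-byte
   aligned. The loop is unrolled by two, so each iteration issues two
   independent PMULUDQs (four 32x32->64 products). */
static void mul_array(const uint64_t *a, const uint64_t *b,
                      uint64_t *c, size_t n)
{
    for (size_t q = 0; q < n; q += 4) {
        __m128i x0 = _mm_load_si128((const __m128i *)(a + q));
        __m128i x1 = _mm_load_si128((const __m128i *)(a + q + 2));
        /* b is consumed straight from memory by the multiply */
        x0 = _mm_mul_epu32(x0, _mm_load_si128((const __m128i *)(b + q)));
        x1 = _mm_mul_epu32(x1, _mm_load_si128((const __m128i *)(b + q + 2)));
        _mm_store_si128((__m128i *)(c + q), x0);
        _mm_store_si128((__m128i *)(c + q + 2), x1);
    }
}
```

The upper 32 bits of each input slot are ignored by PMULUDQ, so they need not be cleared before the call.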

Cheers,

Phil

-- 
Unpatched IE vulnerability: Alexa Related Privacy Disclosure
Description: Unintended disclosure of private information when 
             using the Related feature
Reference: http://www.secunia.com/advisories/8955/
Reference: http://www.imilly.com/alexa.htm