Which is faster?



octMove:
movdqa xmm1, [edi]

Blah, blah, blah

movdqa [edi], xmm1
add edi, 16

loop octMove


or


octMove:
movdqa xmm1, [edi]
add edi, 16

Blah, blah, blah

movdqa [edi-16], xmm1

loop octMove


I know there are latency issues with using an address register just
after setting it. Does the loop soak that up?
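
For reference, this is roughly how I would time either version. It is
only a sketch under assumptions: NASM-style syntax, 32-bit code, edi
pointing at a 16-byte-aligned buffer, and BLOCKS as a made-up nonzero
constant giving the number of 16-byte blocks. The paddd with xmm0 is
just a stand-in for the real work, and esi is an arbitrary scratch
register. rdtsc is not serializing, so the count is approximate and
only means much over a large number of iterations.

```asm
; Sketch only. BLOCKS, the paddd stand-in, and the use of esi are assumptions.
        mov     ecx, BLOCKS         ; loop decrements ecx until it reaches zero
        rdtsc                       ; edx:eax = time-stamp counter (clobbers eax, edx)
        mov     esi, eax            ; keep the low 32 bits of the start count
octMove:
        movdqa  xmm1, [edi]         ; aligned 16-byte load
        add     edi, 16             ; advance the pointer early (second version)
        paddd   xmm1, xmm0          ; stand-in for the real work
        movdqa  [edi-16], xmm1      ; store back to the block just loaded
        loop    octMove
        rdtsc                       ; read the counter again
        sub     eax, esi            ; eax = elapsed count, low 32 bits only
```

Swapping the loop body for the first version and comparing the two
counts over the same buffer should show whether the ordering matters on
a given CPU.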

Thanks!

-- Rich Fife --
