LDDQU vs. MOVDQU
- From: spamtrap@xxxxxxxxxx
- Date: Fri, 28 Oct 2005 23:46:42 +0000 (UTC)
LDDQU is the load unaligned op in SSE3 that has the same interface as
the old MOVDQU of SSE2.
Questions:
1. Is this purely an implementation detail? If so, why have a
distinctly different op rather than "upgrade" the older op when SSE3
came out?
2. All the (sparse) online docs say don't use LDDQU in a store-load
forwarding situation, use MOVDQU instead. I presume that if the intent
is to do pure streaming i.e. reading from x and storing into distinctly
different y (fire and forget), then LDDQU is the appropriate op?
3. The (sparse) online docs also say that LDDQU works better across
cache lines because it is 2 aligned loads + a realign, rather than 2
partial loads like MOVDQU. Why?
4. When I use LDDQU in a streaming sequential load, do I end up
with double the number of memory accesses (due to the implicit 2
aligned loads), or is the Intel wizardry savvy enough to factor out the
repeated loads?
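To make questions 3 and 4 concrete, here is a minimal scalar C sketch of the "2 aligned loads + a realign" scheme as the docs describe it. It is only a software model for counting loads and says nothing about what the hardware actually does inside LDDQU. All names (load_aligned, realign, copy_naive, copy_streaming, aligned_loads) are made up for illustration, and the source buffer is assumed to extend at least one full 16-byte block past the last byte read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC = 16 };  /* bytes per vector, as in an SSE register */

/* Hypothetical counter: number of aligned block loads performed. */
static unsigned long aligned_loads;

/* Stand-in for an aligned 16-byte load of block `blk` from `base`. */
static void load_aligned(uint8_t out[VEC], const uint8_t *base, size_t blk)
{
    memcpy(out, base + blk * VEC, VEC);
    ++aligned_loads;
}

/* Realign: take bytes shift..shift+15 of the 32-byte concatenation lo:hi. */
static void realign(uint8_t out[VEC], const uint8_t lo[VEC],
                    const uint8_t hi[VEC], size_t shift)
{
    assert(shift < VEC);
    for (size_t i = 0; i < VEC; ++i)
        out[i] = (i + shift < VEC) ? lo[i + shift] : hi[i + shift - VEC];
}

/* Naive: every unaligned vector costs two aligned loads plus a realign. */
static void copy_naive(uint8_t *dst, const uint8_t *base,
                       size_t offset, size_t nvec)
{
    size_t blk = offset / VEC, shift = offset % VEC;
    for (size_t v = 0; v < nvec; ++v) {
        uint8_t lo[VEC], hi[VEC];
        load_aligned(lo, base, blk + v);
        load_aligned(hi, base, blk + v + 1);
        realign(dst + v * VEC, lo, hi, shift);
    }
}

/* Streaming: the high block of one step is reused as the low block
   of the next, so only one new aligned load is needed per vector. */
static void copy_streaming(uint8_t *dst, const uint8_t *base,
                           size_t offset, size_t nvec)
{
    size_t blk = offset / VEC, shift = offset % VEC;
    uint8_t lo[VEC], hi[VEC];
    if (nvec == 0)
        return;
    load_aligned(lo, base, blk);
    for (size_t v = 0; v < nvec; ++v) {
        load_aligned(hi, base, blk + v + 1);
        realign(dst + v * VEC, lo, hi, shift);
        memcpy(lo, hi, VEC);  /* carry the block over to the next step */
    }
}
```

In this model, copying N unaligned vectors takes 2N aligned loads the naive way and N + 1 when the shared block is carried over. Whether LDDQU in a sequential loop behaves like the first or the second case is exactly what question 4 asks.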
I'm implementing cross-platform unaligned SIMD loads in macstl and want
to do The Right Thing (TM).
http://www.pixelglow.com/macstl/
Cheers
Glen Low, Pixelglow Software
www.pixelglow.com