Re: Combining two MMX registers into one SSE register?



Maarten Kronenburg wrote:

> Also possible should be:
>
> MOVQ2DQ xmm0, mm0
> MOVQ2DQ xmm1, mm1
> MOVLHPS xmm0, xmm1
>
> See
> http://developer.intel.com/design/Pentium4/documentation.htm
> Manuals.
> Perhaps MOVLHPS is faster because why have the instruction otherwise?
> Maarten.

Yes, it might be, especially if you want to operate on the xmm registers
later as a vector of four floats. I guess MOVLHPS tags the upper qword as
float type, while it leaves the unchanged lower qword without an explicit
{int, float, double} type.
For later use as integers, I think PUNPCKLQDQ is appropriate.
If the xmm register only serves as temporary MMX storage, it probably
does not matter.
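
To make the two variants concrete, here is a rough sketch in C with
intrinsics (my own illustration, not taken from either manual). The helper
names make_m64, combine_punpcklqdq, combine_movlhps and combine_paths_agree
are made up for this post. I am assuming an x86 compiler with SSE2 and
__m64 support (e.g. GCC or Clang), and the compiler is free to pick a
different instruction than the one named in each comment.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also brings in the MMX and SSE headers */
#include <stdint.h>

/* Hypothetical helper: build an MMX value from a 64-bit integer. */
static __m64 make_m64(uint64_t v)
{
    return _mm_set_pi32((int)(uint32_t)(v >> 32), (int)(uint32_t)v);
}

/* Integer-domain variant: low qword = lo, high qword = hi. */
static __m128i combine_punpcklqdq(__m64 lo, __m64 hi)
{
    __m128i x0 = _mm_movpi64_epi64(lo);   /* MOVQ2DQ xmm0, mm0 */
    __m128i x1 = _mm_movpi64_epi64(hi);   /* MOVQ2DQ xmm1, mm1 */
    return _mm_unpacklo_epi64(x0, x1);    /* PUNPCKLQDQ xmm0, xmm1 */
}

/* Float-domain variant: same bits, combined through MOVLHPS. */
static __m128i combine_movlhps(__m64 lo, __m64 hi)
{
    __m128 x0 = _mm_castsi128_ps(_mm_movpi64_epi64(lo));
    __m128 x1 = _mm_castsi128_ps(_mm_movpi64_epi64(hi));
    return _mm_castps_si128(_mm_movelh_ps(x0, x1));  /* MOVLHPS xmm0, xmm1 */
}

/* Returns 1 if both variants yield hi:lo, 0 otherwise. */
static int combine_paths_agree(uint64_t lo, uint64_t hi)
{
    __m64 m0 = make_m64(lo);
    __m64 m1 = make_m64(hi);
    __m128i a = combine_punpcklqdq(m0, m1);
    __m128i b = combine_movlhps(m0, m1);
    uint64_t qa[2], qb[2];

    _mm_empty();  /* EMMS: leave MMX state before any x87 code runs */
    _mm_storeu_si128((__m128i *)qa, a);
    _mm_storeu_si128((__m128i *)qb, b);
    return qa[0] == lo && qa[1] == hi && qb[0] == lo && qb[1] == hi;
}

/* Small self-check with arbitrary test values. */
static void self_check(void)
{
    assert(combine_paths_agree(0x1111111122222222ULL, 0x3333333344444444ULL));
}
```

Both variants give the same 128 bits, so the choice only matters for
whatever type tracking or domain handling the CPU does internally. That
part cannot be seen from C; you would have to measure it on the target
processor.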

See:

Software Optimization Guide for
AMD Athlon™ 64 and AMD Opteron™ Processors

Appendix E, SSE and SSE2 Optimizations (p. 351)
E.1, SSE and SSE2 Instruction and Data Types (p. 353)

- Gerd

