SSE help please....

From: Asfand Yar Qazi (im_not_giving_it_here_at_i_hate_spam.com)
Date: 04/13/04


Date: Tue, 13 Apr 2004 03:27:10 +0100

Hi,

I'm messing around with the built-in vector operations in GCC 3.3.3
(__builtin_ia32_addps, etc.) that generate SSE instructions. I'm not
experienced in these matters, so please forgive me if I say something silly.

I use the term 'slot' as follows: a 128-bit SSE (XMM) register is made of
4 adjacent 32-bit _slots_.

I was wondering if there is a way to do the following with SSE (SSE1 on a
Pentium 3, if anyone's wondering):

Add all 4 32-bit floats in an SSE register and store the sum in a 32-bit
slot in some SSE register. (I'd like to implement matrix multiplication
like this.)
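
For reference, one way this horizontal add is commonly done with SSE1-only
instructions is sketched below. It is an illustration rather than the only
approach: it assumes the <xmmintrin.h> intrinsics (which GCC maps onto the
__builtin_ia32_* functions) instead of the raw builtins, and the names
hsum_ps and hsum4 are made up for this example.

```c
#include <xmmintrin.h>

/* Sum the four float slots of v; the total ends up in slot 0. */
float hsum_ps(__m128 v)
{
    /* v = [a, b, c, d], slot 0 first */
    __m128 hi   = _mm_movehl_ps(v, v);      /* [c, d, c, d]        */
    __m128 sum2 = _mm_add_ps(v, hi);        /* [a+c, b+d, ., .]    */
    /* broadcast slot 1 (b+d) so it can be added to slot 0 */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));
    __m128 tot  = _mm_add_ss(sum2, odd);    /* slot 0 = (a+c)+(b+d) */

    float out;
    _mm_store_ss(&out, tot);                /* copy slot 0 to memory */
    return out;
}

/* Hypothetical convenience wrapper: a goes into slot 0, d into slot 3. */
float hsum4(float a, float b, float c, float d)
{
    return hsum_ps(_mm_setr_ps(a, b, c, d));
}
```

For a dot product (one element of a matrix product), the same reduction
would follow a single _mm_mul_ps of a row and a column. Note that the
floats are added pairwise, so rounding can differ slightly from a plain
left-to-right scalar sum.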

What is the difference between the movups and movaps instructions?
movaps deals with packed data, movups makes no assumption on alignment
(apparently). What does that mean? GCC's builtin functions for
both instructions gave the same result.
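
For reference, both instructions move four packed single-precision floats.
The difference is the memory operand: movaps requires the address to be a
multiple of 16 bytes and faults otherwise, while movups accepts any
address. When the data happens to be 16-byte aligned, the two give
identical results, which is presumably what was seen here. A minimal
sketch, assuming the <xmmintrin.h> intrinsics _mm_load_ps (movaps) and
_mm_loadu_ps (movups); is_aligned16, load4, first_slot and
demo_offset_load are names invented for this example:

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Returns 1 when p lies on a 16-byte boundary, as movaps requires. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Load 4 floats from p, using the aligned form only when it is safe. */
__m128 load4(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p)    /* movaps */
                           : _mm_loadu_ps(p);  /* movups */
}

/* Extract slot 0 of v. */
float first_slot(__m128 v)
{
    float out;
    _mm_store_ss(&out, v);
    return out;
}

/* Load 4 floats starting at buf[offset] (offset 0..4) and return slot 0.
   buf itself is forced onto a 16-byte boundary with a GCC attribute. */
float demo_offset_load(int offset)
{
    static float buf[8] __attribute__((aligned(16))) =
        { 0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f };
    return first_slot(load4(buf + offset));
}
```

With offset 0 or 4 the address is aligned and either load works; with
offset 1, 2 or 3 only the movups form is safe, and calling _mm_load_ps
directly on that address would be expected to crash.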

Thanks for your patience,
        Asfand Yar

--
http://www.it-is-truth.org/

