Re: Intel SSE sucks dogshit for 3D graphics

From: Brandon J. Van Every (try_vanevery_at_mycompanyname_at_yahoo.com)
Date: Mon, 26 Jan 2004 09:18:41 +0000 (UTC)

hutch-- wrote:
> Brandon,
>
> I have had a read through the postings and there are a number of
> things that probably need to be addressed. As long as you are feeding
> the code through a compiler, you have an additional layer to evaluate
> to track down why you are not getting the performance you require.
>
> What I would be inclined to do is start with an assembler module, as
> long as your C compiler allows it, and code a full integer version
> of what you require. Do all of the normal optimisations on it:
> align labels where you can, arrange conditional jumps to suit the
> normal backward-taken branch prediction, etc., until you get it as
> fast as you can.

Yeah yeah... not the problem.

> When you have it optimised to the PIII you are using, you have a
> reasonable reference to try MMX or XMM code using the instructions you
> want to try out. I have generally found that memory bandwidth is the
> final limitation that makes a good algo slower than it should be, but
> SSE does have a few write instructions that avoid cache pollution
> which usually help.

Those didn't help in my case. Can't quite quote you chapter and verse on
whether they should have; it's been a while and the problem is not fresh in
my mind.
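A minimal C sketch of the workflow hutch describes (write a plain reference version first, then an SSE version, and check that they agree), using one of the cache-bypassing SSE write instructions he mentions. The scale-and-add kernel, the function names and N are hypothetical illustrations, not code from this thread. It assumes the SSE intrinsics in <xmmintrin.h>, where _mm_stream_ps compiles to MOVNTPS (a non-temporal store) and _mm_sfence to SFENCE; on 32-bit x86 you may need -msse. Whether streaming stores actually help depends on whether the output is re-read soon, which is consistent with them not helping in Brandon's case.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; also provides _mm_malloc/_mm_free */

#define N 1024  /* hypothetical element count; the SSE loop needs a multiple of 4 */

/* Scalar reference version: out[i] = a[i] * s + b[i]. */
static void scale_add_ref(float *out, const float *a, const float *b,
                          float s, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * s + b[i];
}

/* SSE version: 4 floats per iteration, results written with _mm_stream_ps
   (MOVNTPS), a non-temporal store that avoids polluting the cache.
   All pointers must be 16-byte aligned and n a multiple of 4. */
static void scale_add_sse(float *out, const float *a, const float *b,
                          float s, size_t n)
{
    const __m128 vs = _mm_set1_ps(s);
    for (size_t i = 0; i < n; i += 4) {
        __m128 r = _mm_add_ps(_mm_mul_ps(_mm_load_ps(a + i), vs),
                              _mm_load_ps(b + i));
        _mm_stream_ps(out + i, r);
    }
    _mm_sfence();  /* order the streaming stores before later memory accesses */
}

/* Runs both versions; returns the number of differing elements
   (0 means the SSE version matches the reference), or -1 if allocation fails. */
int compare_ref_and_sse(void)
{
    float *a   = _mm_malloc(N * sizeof(float), 16);
    float *b   = _mm_malloc(N * sizeof(float), 16);
    float *ref = _mm_malloc(N * sizeof(float), 16);
    float *sse = _mm_malloc(N * sizeof(float), 16);
    int mismatches = -1;

    if (a && b && ref && sse) {
        /* Inputs chosen so every product and sum is exact in float,
           which makes an exact equality check meaningful. */
        for (size_t i = 0; i < N; ++i) {
            a[i] = (float)i;
            b[i] = 1.0f;
        }

        scale_add_ref(ref, a, b, 0.5f, N);
        scale_add_sse(sse, a, b, 0.5f, N);

        mismatches = 0;
        for (size_t i = 0; i < N; ++i)
            if (ref[i] != sse[i])
                ++mismatches;
    }

    _mm_free(a);
    _mm_free(b);
    _mm_free(ref);
    _mm_free(sse);
    return mismatches;
}
```

Calling compare_ref_and_sse() from a test harness and expecting 0 is the "reasonable reference" check; timing the two kernels separately is what tells you whether the SSE and streaming-store version is worth keeping.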

> I used to develop on a PIII 600 and found it was piggishly hard to get
> optimum code that also worked well on other processors, but after
> working on a PIV for some time, I found there is a bit more info around
> to improve the averages across more processors. I was more pissed off
> with the change in shifts, rotations and LEA from the PIII to the PIV,
> as it required a different set of optimisations to get right, and
> those do not work as well on older processors.

I've decided that if I'm coding ASM, I've lost the forest for the sake of
the trees. And I'm speaking quite beyond SSE concerns.

-- 
Cheers,                     www.indiegamedesign.com
Brandon Van Every           Seattle, WA
20% of the world is real.
80% is gobbledygook we make up inside our own heads.

