Intel SSE sucks dog*** for 3D graphics
From: Brandon J. Van Every (try_vanevery_at_mycompanyname_at_yahoo.com)
Date: 01/23/04
- Next message: Brandon J. Van Every: "convenient list syntax + SSE support?"
- Previous message: Matt Taylor: "Re: How to measure my application speed ?"
- Next in thread: Jack Klein: "Re: Intel SSE sucks dog*** for 3D graphics"
- Reply: Jack Klein: "Re: Intel SSE sucks dog*** for 3D graphics"
- Reply: George Buyanovsky: "Re: Intel SSE sucks dog*** for 3D graphics"
- Reply: Matt Taylor: "Re: Intel SSE sucks dog*** for 3D graphics"
Date: Fri, 23 Jan 2004 22:51:45 +0000 (UTC)
I want to know if any of you have found Intel SSE to be of real benefit in
3D graphics applications. I'm happy to be proven wrong, and happy to hear
that I've missed something.
I say, it's a lousy, braindead set of instructions for 3D graphics. Why?
Because they're high latency. Because it's hard to get at individual scalar
fields, and that's often needed in computational geometry problems. Because
SSE and SSE2 lack a "sum of scalar fields" instruction. You have to do a
lot of gymnastics to perform a dot product, and that means a lot of latency.
Dot products are typically used for culling tests, which means I want the
answer *now*. Not 20 cycles from now. Also, I usually don't have a lot to
do while waiting for the results of my dot product. That's just how the 3D
graphics cookie crumbles in my experience. You can't mask the long
latencies of SSE when there isn't all that much to do for any one test.
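For anyone who hasn't written it out, the gymnastics look roughly like the sketch below: a multiply, then two shuffle-and-add rounds before the sum lands in the low field. This is a minimal illustration in C intrinsics, not a tuned routine. The name dot4_sse is made up for this example, and the scalar fallback is there only so the file still builds on a compiler without SSE.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE1 intrinsics */
#endif

/* Dot product of two (x y z w) vectors using only SSE1 instructions.
   dot4_sse is a hypothetical helper name, not part of any library. */
static float dot4_sse(const float a[4], const float b[4])
{
#if defined(__SSE__)
    __m128 va   = _mm_loadu_ps(a);
    __m128 vb   = _mm_loadu_ps(b);
    __m128 prod = _mm_mul_ps(va, vb);   /* (p0, p1, p2, p3), one product per field */

    /* Swap neighbours inside each pair and add:
       sums = (p0+p1, p0+p1, p2+p3, p2+p3) */
    __m128 shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 sums = _mm_add_ps(prod, shuf);

    /* Bring the upper pair down to lane 0 and add once more:
       lane 0 = p0+p1+p2+p3 */
    shuf = _mm_movehl_ps(shuf, sums);
    sums = _mm_add_ss(sums, shuf);

    return _mm_cvtss_f32(sums);         /* read the low field as a scalar */
#else
    /* Fallback for compilers without SSE; same result, no SIMD. */
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Every step after the multiply depends on the one before it, so the latencies stack up in series and there is nothing in a single test to overlap them with.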
Prescott New Instructions (PNI) will partially amend this situation with the
horizontal add instruction, HADDPS. You'll need 2 of these in succession to
perform a sum of scalar fields. Still high latency, just not as high. Anyone
want to take bets on what the latency of a single HADDPS will be? I bet
it's not 4, but 6 cycles. Plus, you won't be able to rely on the existence
of PNI for a loooooong time. You'll still be punting with SSE.
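To see why it takes 2 of them, here is a plain-C model of what one HADDPS produces, going by the documented behaviour: adjacent pairs of the first operand fill the low two fields, adjacent pairs of the second fill the high two. It is a sketch of the semantics only and says nothing about cycle counts. The names haddps_model and hsum_two_hadds are invented for this example; the real intrinsic is _mm_hadd_ps in pmmintrin.h.

```c
/* Plain-C model of one HADDPS:
   out = (a0+a1, a2+a3, b0+b1, b2+b3).
   Hypothetical helper; results are computed first so out may alias a or b. */
static void haddps_model(const float a[4], const float b[4], float out[4])
{
    float r0 = a[0] + a[1];
    float r1 = a[2] + a[3];
    float r2 = b[0] + b[1];
    float r3 = b[2] + b[3];
    out[0] = r0; out[1] = r1; out[2] = r2; out[3] = r3;
}

/* Sum of all four fields of v using two horizontal adds in succession. */
static float hsum_two_hadds(const float v[4])
{
    float t[4], s[4];
    haddps_model(v, v, t);   /* t = (v0+v1, v2+v3, v0+v1, v2+v3): pair sums only */
    haddps_model(t, t, s);   /* s = full sum replicated in every field */
    return s[0];
}
```

After the first pass you only have two partial sums; the second pass, which has to wait on the first, finishes the job.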
I think SSE was designed for image processing and multimedia, i.e. codecs.
Problems where you've got a lot of highly redundant calculations to perform
on streams of pixels. It wasn't designed for computational geometry. CG
problems have a lot of interaction between vector and scalar
representations, and a proper architecture would make it easier to get at
the fields of a vector to perform various tests.
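As a concrete illustration of the field access problem: with SSE1 intrinsics only the low field converts directly to a scalar, so reading any other field means shuffling it into the low position first (or storing the whole register to memory). A minimal sketch, with made-up helper names vec_x and vec_z, and a scalar fallback only so it builds without SSE:

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE1 intrinsics */
#endif

/* Read the x field of an (x y z w) vector: the cheap case. */
static float vec_x(const float v[4])
{
#if defined(__SSE__)
    __m128 r = _mm_loadu_ps(v);
    return _mm_cvtss_f32(r);            /* low field converts directly */
#else
    return v[0];
#endif
}

/* Read the z field: needs an extra shuffle to bring field 2 down to lane 0. */
static float vec_z(const float v[4])
{
#if defined(__SSE__)
    __m128 r = _mm_loadu_ps(v);
    __m128 z = _mm_shuffle_ps(r, r, _MM_SHUFFLE(2, 2, 2, 2));  /* (z z z z) */
    return _mm_cvtss_f32(z);
#else
    return v[2];
#endif
}
```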
Anyways, before you rant about what a *** I am for pointing out these flaws
(well it may be too late, you may be already counter-ranting about how
wonderful SSE is), let's bottom line this. I want to hear about your
*TANGIBLE BENCHMARKS* where SSE sped up your 3D application. I want to hear
what kind of problem you were tackling, and what you were doing before that
was going slow. Because if you had bad code before, and SSE wasn't what
turned it into good code, let's hear about that devil of detail. Rewriting
things can usually improve performance, because your algorithm, approach, or
idiom has more thought behind it. But let's hear how rewriting in *SSE*
improved your situation.
FWIW, my tangible benchmarks didn't do ***. All I got were latencies and
hassles about which register I was currently using. I was better off with
the cheesy x87 FPU.
I found an impressive set of lies about SSE at
www.intel.com/update/departments/software/sw03011.pdf
"Using SSE and SSE2: Misconceptions and Reality"
Impressive, because the truth is told in each gripe, then Intel labels each
gripe a "misconception," then proceeds to spin a "reality" that's utter
bull***. Like, telling me that instead of doing all my vector processing
in (x y z w) form that I should do (x x x x) (y y y y) (z z z z) (w w w w),
because Intel adds are vertical. Like I'm going to rewrite my entire app
with discontiguous memory for Intel's benefit! Hey assholes, why didn't you
build a proper horizontal add capability into your fucking chip?? "Oh, no
problem, just rewrite all your code in our idioms because we screwed up.
After all, doing everything in a clunky, Intel-specific way is a virtue, it
sells more of our CPUs!"
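For reference, the (x x x x) (y y y y) (z z z z) arrangement being argued about looks roughly like this in C. It is a sketch under assumptions, not anything taken from Intel's paper: the Points4 struct, the plane_dist4 name, and the choice of a point-versus-plane test are all invented for illustration. Four points are tested at once and every add is vertical, so no horizontal sum is needed, at the cost of the data layout change.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE1 intrinsics */
#endif

/* Four 3D points in structure-of-arrays form (hypothetical layout). */
typedef struct {
    float x[4];
    float y[4];
    float z[4];
} Points4;

/* Signed distance n.p + d of four points to one plane, written to out[0..3].
   plane_dist4 is a hypothetical helper name. */
static void plane_dist4(const Points4 *p, const float n[3], float d, float out[4])
{
#if defined(__SSE__)
    /* Broadcast each plane coefficient across all four fields. */
    __m128 nx = _mm_set1_ps(n[0]);
    __m128 ny = _mm_set1_ps(n[1]);
    __m128 nz = _mm_set1_ps(n[2]);

    /* Vertical multiplies and adds only: field i handles point i. */
    __m128 r = _mm_mul_ps(_mm_loadu_ps(p->x), nx);
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(p->y), ny));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_loadu_ps(p->z), nz));
    r = _mm_add_ps(r, _mm_set1_ps(d));

    _mm_storeu_ps(out, r);
#else
    /* Fallback for compilers without SSE; same arithmetic order per point. */
    int i;
    for (i = 0; i < 4; ++i)
        out[i] = p->x[i] * n[0] + p->y[i] * n[1] + p->z[i] * n[2] + d;
#endif
}
```

Note that this only pays off when there really are four independent points to test together, which is exactly the batching that a one-off culling test doesn't have.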
--
Cheers, www.indiegamedesign.com
Brandon Van Every Seattle, WA
"The pioneer is the one with the arrows in his back."
- anonymous entrepreneur