Re: SSE2
From: Phil Carmody (thefatphil_demunged_at_yahoo.co.uk)
Date: 01/27/04
- Next message: Bryan Parkoff: "RDTSC Is not Accuracy"
- Previous message: vivek: "Invalid instructions in 64bit code (x86-64/AMD64)"
- In reply to: Matt Taylor: "Re: SSE2"
Date: Tue, 27 Jan 2004 17:13:19 +0000 (UTC)
"Matt Taylor" <para@tampabay.rr.com> writes:
> > I don't like halving the throughput by looking only at 64-bit MMX.
> > However, if SSE2, MMX and FP can all happily co-exist, then I might
> > try to have MMX, FPU, and int units taking on some of the mults if
> > it appears that the SSE2 unit is making the other units idle.
>
> SSE 2 is a possibility although it has higher latency. I usually favor MMX
> because the underlying implementation favors it. Perhaps with SSE 2 it may
> be worthwhile to pack/unpack to do 4 simultaneous 32x32 multiplies using
> pmullw/pmulhuw.
According to http://www.cen.uiuc.edu/~cjiang/reference/index.htm
pmullw and pmulhuw do 4 (MMX) or 8 (SSE2) 16*16->16 multiplies,
not 32*32->32 ones. One of the things that I'm trying to do is minimise
the number of data-movement instructions (unpacks, shuffles, as well as
movs), and forming a 32-bit result from 16*16->16 multiplies would, I
think, be too much effort.
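To make the cost concrete, here is a rough sketch in C with SSE2 intrinsics, not code from this thread. pmullw and pmulhuw each return one 16-bit half of every 16*16 product, so it takes two unpacks to reassemble full 32-bit products, and a 32*32 multiply built from 16-bit pieces needs four such partial products plus shifts and adds. The function names mul16x16_to_32 and mul32_via_16 and the HAVE_SSE2 macro are made up for illustration; a plain scalar loop is used when SSE2 is not available.

```c
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1   /* hypothetical helper macro, not a standard one */
#endif

/* Eight 16*16 multiplies, reassembled into eight full 32-bit products. */
static void mul16x16_to_32(const uint16_t x[8], const uint16_t y[8],
                           uint32_t out[8])
{
#ifdef HAVE_SSE2
    __m128i vx = _mm_loadu_si128((const __m128i *)x);
    __m128i vy = _mm_loadu_si128((const __m128i *)y);
    __m128i lo = _mm_mullo_epi16(vx, vy);  /* pmullw: low 16 bits of each product */
    __m128i hi = _mm_mulhi_epu16(vx, vy);  /* pmulhuw: high 16 bits, unsigned */
    /* punpcklwd / punpckhwd interleave the halves back into 32-bit lanes */
    _mm_storeu_si128((__m128i *)&out[0], _mm_unpacklo_epi16(lo, hi));
    _mm_storeu_si128((__m128i *)&out[4], _mm_unpackhi_epi16(lo, hi));
#else
    for (int i = 0; i < 8; i++)
        out[i] = (uint32_t)x[i] * y[i];
#endif
}

/* Scalar model of a 32*32->64 multiply built only from 16*16->32 pieces:
   four partial products, two shifts and two adds per result. */
static uint64_t mul32_via_16(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    uint64_t ll = (uint64_t)al * bl;
    uint64_t lh = (uint64_t)al * bh;
    uint64_t hl = (uint64_t)ah * bl;
    uint64_t hh = (uint64_t)ah * bh;
    return ll + ((lh + hl) << 16) + (hh << 32);
}
```

In a SIMD version of mul32_via_16, every one of those partial products would also need its operands shuffled into the right lanes first, which is the movement overhead in question.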
> The best way to find out really is to time both sequences.
I don't own a machine on which I can run such sequences, alas.
> > When I've got code that actually works, I'll post it here for forensics.
> > (May be a while, I don't have a machine I can test on!)
> >
> > Does anyone have any good ideas about
> > if(a<0) a+=b;
> > for 64 bit values, such that a is already in [63-0] of XMMn, and b isn't?
> > (I might just let the 32-bit int unit do this stage rather than idling.)
>
> Ah, if only there were a pcmpltq. If you already have a in an SSE register,
> I would keep it there. The latency of adc combined with moving data between
> register files is much worse than a pcmpltd with appropriate logic to handle
> 64-bits.
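One way the "compare plus logic" idea might look, as an untested-on-real-workloads sketch in C with SSE2 intrinsics: compare each dword against zero (the _mm_cmplt_epi32 intrinsic, which compilers emit as pcmpgtd with swapped operands), copy the high dword's result across the whole 64-bit lane with pshufd, AND that mask with b, and add with paddq. The names cond_add, cond_add_sse2, cond_add_scalar and the HAVE_SSE2 macro are invented here, and b is assumed to be loaded into an XMM register with movq first.

```c
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1   /* hypothetical helper macro, not a standard one */
#endif

/* Scalar reference for "if (a < 0) a += b" without a branch. */
static int64_t cond_add_scalar(int64_t a, int64_t b)
{
    /* mask is all ones when a is negative, zero otherwise */
    uint64_t mask = (uint64_t)0 - ((uint64_t)a >> 63);
    return (int64_t)((uint64_t)a + ((uint64_t)b & mask));
}

#ifdef HAVE_SSE2
/* Same operation on each 64-bit lane of an XMM register. */
static __m128i cond_add_sse2(__m128i a, __m128i b)
{
    /* per-dword test "lane < 0": all ones where the dword is negative */
    __m128i neg  = _mm_cmplt_epi32(a, _mm_setzero_si128());
    /* only the high dword carries the 64-bit sign, so replicate
       dword 1 over lane 0 and dword 3 over lane 1 (pshufd) */
    __m128i mask = _mm_shuffle_epi32(neg, _MM_SHUFFLE(3, 3, 1, 1));
    /* a + (b & mask), with a 64-bit wrapping add (paddq) */
    return _mm_add_epi64(a, _mm_and_si128(b, mask));
}
#endif

/* Convenience wrapper: a and b go through the low 64 bits of an XMM register. */
static int64_t cond_add(int64_t a, int64_t b)
{
#ifdef HAVE_SSE2
    int64_t r;
    __m128i va = _mm_loadl_epi64((const __m128i *)&a);  /* movq */
    __m128i vb = _mm_loadl_epi64((const __m128i *)&b);  /* movq */
    _mm_storel_epi64((__m128i *)&r, cond_add_sse2(va, vb));
    return r;
#else
    return cond_add_scalar(a, b);
#endif
}
```

That is four SIMD instructions once b is in a register, with no trip through the integer register file. Whether it actually beats the integer-unit route would still need timing on real hardware.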
OK, I'll see if I can scrape some magical cure-all instruction out of
the depths of that reference. I remember in the past seeing rants about
how Intel's SIMD instruction sets were half-baked, and had lots of
gaping holes, and that AltiVec was far more complete. Perhaps they were
right: I keep finding instructions that don't exist. Why, for example,
does a 4-way float*float instruction exist, but not a 4-way int*int?
The latter would be like a gift from above presently...
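For what it is worth, a 4-way 32*32->32 multiply can be pieced together in SSE2 from pmuludq, which multiplies the even-numbered dwords of its operands into two 64-bit products. The sketch below, in C with SSE2 intrinsics, is one common way to do that and is only an illustration: the names mul4x32, mullo32_sse2 and HAVE_SSE2 are made up, and the two shifts, two shuffles and one unpack it needs are exactly the kind of movement overhead complained about above. (A single-instruction version, pmulld, only arrived later with SSE4.1.)

```c
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1   /* hypothetical helper macro, not a standard one */
#endif

#ifdef HAVE_SSE2
/* Low 32 bits of each of the four dword products a[i] * b[i]. */
static __m128i mullo32_sse2(__m128i a, __m128i b)
{
    /* pmuludq: dwords 0 and 2 -> two 64-bit products */
    __m128i even = _mm_mul_epu32(a, b);
    /* shift dwords 1 and 3 down into the even slots, multiply again */
    __m128i odd  = _mm_mul_epu32(_mm_srli_epi64(a, 32),
                                 _mm_srli_epi64(b, 32));
    /* gather the low dword of each 64-bit product into dwords 0 and 1 */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
    /* interleave back to the original order: p0, p1, p2, p3 */
    return _mm_unpacklo_epi32(even, odd);
}
#endif

/* Array wrapper, with a scalar loop when SSE2 is not available. */
static void mul4x32(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
#ifdef HAVE_SSE2
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, mullo32_sse2(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i];   /* unsigned multiply wraps modulo 2^32 */
#endif
}
```

The low 32 bits of the product are the same for signed and unsigned operands, so the unsigned pmuludq is enough when only a 32-bit result is wanted.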
Phil
-- Unpatched IE vulnerability: document.domain parent DNS resolver Description: Improper duality check leading to firewall breach Published: July 29 2002 Reference: http://online.securityfocus.com/archive/1/284908/2002-07-27/2002-08-02/0