Re: Fixed Point Square Root Improvements?
From: Matt (spamtrap_at_crayne.org)
Date: 11/26/04
- Next message: Chris Williams: "Re: Fixed Point Square Root Improvements?"
- Previous message: Mark Gibson : "Re: [Clax86list] Re: Linux Assembler"
- In reply to: Chris Williams: "Fixed Point Square Root Improvements?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Fri, 26 Nov 2004 04:06:50 +0000 (UTC)
"Chris Williams" <spamtrap@crayne.org> wrote in message
news:da271e02.0411242014.31d3f168@posting.google.com...
[...]
> The C version appears to work about 2.2 times slower than C's sqrt()
> on a float and the assembly at about 1.5.
> Was just wondering if anyone had any recommendations? I am relatively
> pleased to have gotten that close to the standard library, but
> certainly wouldn't complain if I could get more speed and/or more
> decimal bits with a different approach.
I poked around not too long ago in search of the exact same. I have a
non-LUT solution which is about 2-2.5 times slower than sqrt() on my Athlon
with gcc -O3 -finline-functions. The actual speed depends on the data, and
2.5 seems to be the worst case. While poking around I also found an integer
sqrt() which ranges from 1.0 to 1.5 times slower than C's sqrt() function. I
have not taken the time to fully understand it, and thus I have no
fixed-point version. All of the algorithms in my code are credited to the
following page: http://www.azillionmonkeys.com/qed/sqroot.html
I compared Newton-Rhapson to these algorithms also, and I seem to recall
Newton-Rhapson being slower on average, but I'm not 100% certain. For 32-bit
values, you need 5 iterations of Newton-Rhapson to guarantee convergence.
Note that all the algorithms I am discussing give full precision. My
implementation of the square root functions follow:
// Size of mantissa in bits.
#define MANTISSA (10)
unsigned long lzc(unsigned long x)
{
#ifdef __GNUC__
asm("bsrl %1, %0" : "=r" (x) : "r" (x));
#else
_asm
{
mov eax, x
bsr eax, eax
mov x, eax
}
#endif /* __GNUC__ */
return x;
}
// Algorithm is based on polynomial arithmetic:
// 1) Compute initial guess using 2^(msb_index / 2)
// 2) Iterate over remaining bits
// 3) Set bit to 1 if (1 << bit_index) < (x - guess^2)
fix fixsqrt(fix x)
{
fix guess, gsqr, bsqr, gxb;
unsigned int i, bitmask;
if (x == 0)
return 0;
// Compute guess
i = (((lzc(x) - MANTISSA) / 2) + MANTISSA);
guess = 1 << i;
// Compute mask containing next bit to be checked
bitmask = (1 << i) >> 1;
// Cache guess^2, bitmask^2, and 2*guess*bitmask for fast polynomial
// eval.
gsqr = guess << (i - MANTISSA); // guess * guess
gxb = guess << (i - MANTISSA); // guess * bitmask * 2
bsqr = (bitmask << (i - MANTISSA)) >> 1; // bitmask * bitmask
// Determine remaining bits
do
{
// (guess + bitmask)^2 == guess^2 + 2*guess*bitmask + bitmask^2
// == gsqr + gxb + bsqr
if ((gsqr + gxb + bsqr) <= x)
{
guess |= bitmask;
gsqr += (gxb + bsqr);
gxb += (bsqr << 1);
}
bitmask >>= 1;
bsqr >>= 2;
gxb >>= 1;
}
while(bitmask != 0);
return guess;
}
// The integer square root function. Copied directly off the webpage;
// seems to pick 1 bit per iteration, but I haven't looked closely at it.
int isqrt(int x)
{
unsigned long temp, g=0, b, bshft;
bshft = lzc(x) / 2;
b = 1 << bshft;
do {
if (x >= (temp = (((g<<1)+b)<<bshft--))) {
g += b;
x -= temp;
}
} while (b >>= 1);
return g;
}
-Matt
- Next message: Chris Williams: "Re: Fixed Point Square Root Improvements?"
- Previous message: Mark Gibson : "Re: [Clax86list] Re: Linux Assembler"
- In reply to: Chris Williams: "Fixed Point Square Root Improvements?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|