Re: Fixed Point Square Root Improvements?

From: Matt (spamtrap_at_crayne.org)
Date: 11/26/04


Date: Fri, 26 Nov 2004 04:06:50 +0000 (UTC)


"Chris Williams" <spamtrap@crayne.org> wrote in message
news:da271e02.0411242014.31d3f168@posting.google.com...
[...]
> The C version appears to work about 2.2 times slower than C's sqrt()
> on a float and the assembly at about 1.5.
> Was just wondering if anyone had any recommendations? I am relatively
> pleased to have gotten that close to the standard library, but
> certainly wouldn't complain if I could get more speed and/or more
> decimal bits with a different approach.

I poked around not too long ago in search of the exact same. I have a
non-LUT solution which is about 2-2.5 times slower than sqrt() on my Athlon
with gcc -O3 -finline-functions. The actual speed depends on the data, and
2.5 seems to be the worst case. While poking around I also found an integer
sqrt() which ranges from 1.0 to 1.5 times slower than C's sqrt() function. I
have not taken the time to fully understand it, and thus I have no
fixed-point version. All of the algorithms in my code are credited to the
following page: http://www.azillionmonkeys.com/qed/sqroot.html

I compared Newton-Rhapson to these algorithms also, and I seem to recall
Newton-Rhapson being slower on average, but I'm not 100% certain. For 32-bit
values, you need 5 iterations of Newton-Rhapson to guarantee convergence.

Note that all the algorithms I am discussing give full precision. My
implementation of the square root functions follow:

// Size of mantissa in bits.
#define MANTISSA (10)

unsigned long lzc(unsigned long x)
{
#ifdef __GNUC__
 asm("bsrl %1, %0" : "=r" (x) : "r" (x));
#else
 _asm
 {
  mov eax, x
  bsr eax, eax
  mov x, eax
 }
#endif /* __GNUC__ */
 return x;
}

// Algorithm is based on polynomial arithmetic:
// 1) Compute initial guess using 2^(msb_index / 2)
// 2) Iterate over remaining bits
// 3) Set bit to 1 if (1 << bit_index) < (x - guess^2)
fix fixsqrt(fix x)
{
 fix guess, gsqr, bsqr, gxb;
 unsigned int i, bitmask;

 if (x == 0)
  return 0;

 // Compute guess
 i = (((lzc(x) - MANTISSA) / 2) + MANTISSA);
 guess = 1 << i;

 // Compute mask containing next bit to be checked
 bitmask = (1 << i) >> 1;

 // Cache guess^2, bitmask^2, and 2*guess*bitmask for fast polynomial
 // eval.
 gsqr = guess << (i - MANTISSA); // guess * guess
 gxb = guess << (i - MANTISSA); // guess * bitmask * 2
 bsqr = (bitmask << (i - MANTISSA)) >> 1; // bitmask * bitmask

 // Determine remaining bits
 do
 {
  // (guess + bitmask)^2 == guess^2 + 2*guess*bitmask + bitmask^2
  // == gsqr + gxb + bsqr
  if ((gsqr + gxb + bsqr) <= x)
  {
   guess |= bitmask;
   gsqr += (gxb + bsqr);
   gxb += (bsqr << 1);
  }

  bitmask >>= 1;
  bsqr >>= 2;
  gxb >>= 1;
 }
 while(bitmask != 0);

 return guess;
}

// The integer square root function. Copied directly off the webpage;
// seems to pick 1 bit per iteration, but I haven't looked closely at it.
int isqrt(int x)
{
  unsigned long temp, g=0, b, bshft;
  bshft = lzc(x) / 2;
  b = 1 << bshft;
  do {
    if (x >= (temp = (((g<<1)+b)<<bshft--))) {
      g += b;
      x -= temp;
    }
  } while (b >>= 1);
  return g;
}

-Matt



Relevant Pages

  • Re: initialization time?
    ... These too can upset your measurement. ... algorithms, on which was 5% faster after 20 iterations, but was 10 ... times slower in the first 2. ... algorithm unless in real life you run this code 20+ times per run? ...
    (comp.lang.java.help)
  • Re: complexity of numerical software
    ... > which I would classify with discrete or combinatorics algorithms. ... Cody's celefunt suite of accuracy tests for complex elementary ... nonnegative normalized short floating-point numbers x, SQRT ... output error, epsilon, maybe gives you some hope of ...
    (sci.math.symbolic)
  • Re: Cascading different algorithms?
    ... >Is the idea that you would have SQRT states and they would be running ... that quantum computers may be easier to build than predicted. ... there is doubt developing new QC resistant algorithms. ...
    (sci.crypt)

Loading