Re: Any progress yet? (was Re: Fast pi program?)
- From: mike3 <mike4ty4@xxxxxxxxx>
- Date: Sun, 02 Sep 2007 11:26:20 -0700
On Aug 31, 1:52 pm, mike3 <mike4...@xxxxxxxxx> wrote:
On Aug 31, 12:43 am, "Alf P. Steinbach" <al...@xxxxxxxx> wrote:
* mike3:
On Aug 30, 11:49 pm, "Alf P. Steinbach" <al...@xxxxxxxx> wrote:
I'd had to really delve into this to give specific advice (and Things Take Time).
For now, looking only at the [pib26.c] main program source, it seems the
program is actually /calculating/ in base 26^4. Perhaps that's a choice
found after measuring the cost of calculating in a binary base and then
converting to base 26, or perhaps it's not? If number system conversion
requires quadratic time (does it?) perhaps this is best, but if not,
perhaps a change to internal binary system computation, with final
conversion to base 26, would really speed things up? Just a wild guess.
It does calculate in base 26^4. You are right. To compute in base 10
you need to compute 2x the digits then convert to 26 (since the
algorithms work best with powers of 2 numeric sizes),
Uh, no, I suggested a binary base. In particular, e.g. base 2^32, or,
since it seems the program is doing 64-bit calculations, base 2^64.
Base 2^32 would /reduce/ the number of digits by a factor of about 1.7
compared to base 26^4; base 2^64 would reduce the number of digits by
twice that; and both would perhaps (I don't know) also avoid costly
per-digit remainder operations in all arbitrary precision operations.
But as I wrote quoted above, I don't know the cost of the final radix
conversion (I suspect it would take seconds to google but that's your
work! :-)).
Sorry but that won't do either. At base 2^32 the "multiplication
pyramid"
overflows very easily. If you pack 4 elements and you use 8 primes
for the modulo math in the NTT (FFT done in a finite field. The
complex
numbers approximated with floating point is even worse.), the "pyramid
size" goes up to like 256 bits, but the primes only handle around 248
or so. You'd have to use some "weird" base that doesn't fit nice on
byte boundaries and that just makes the programming worse (shifts
and masks start appearing in add/sub routines, etc.).
You can use the following formula to determine how many bits
the FFT must be capable of representing:
Pyramid Size = lg(nine^2) + lg(FFT Size)
Here, "nine = base - 1", so-named since 9 = 10-1.
If we want to pack 1 digit per prime (otherwise we start to
decrease the effective base) the effective Base = base^8,
so then we have lg(base^16) in there approximately. That
limits us to lg(base^16) + lg(FFT Size). With FFT Size = 16777216,
and Pyramid Size = 248, the base is
Base = sixteenth root(2^(Pyramid Size - lg(FFT Size)))
= sixteenth root(2^(248 - 24))
= sixteenth root(2^224)
= 2^(224/16)
= 2^14
= 16,384.
If we pack only numprimes/2 = 4 like we do with base 456,976
then the maximum base goes to
Base = eighth root(2^224)
= 2^(224/8)
= 2^28
= 268435456.
That's not much better than 5 base 26 digits per element. (it's
between 5 and 6). Not much of a gain.
So yes, it does cancel the performance gain if there was any.
Any answers then, anyone?
.
- Prev by Date: Re: Efficient interval representation
- Next by Date: Re: How to get kids to start programming? ...
- Previous by thread: Re: Efficient interval representation
- Next by thread: Re: How to get kids to start programming? ...
- Index(es):
Relevant Pages
|
Loading