Re: Nth Prime
From: James Van Buskirk (not_valid_at_comcast.net)
Date: 04/03/04
- Next message: The Wannabee: "Re: newbe about API"
- Previous message: Nathan C. Baker: "Re: newbe about API"
- In reply to: Bx.C: "Re: Nth Prime"
- Next in thread: Phil Carmody: "Re: Nth Prime"
- Reply: Phil Carmody: "Re: Nth Prime"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Fri, 02 Apr 2004 23:37:00 GMT
"Bx.C" <invalid-email-address@invalid.shiragajin> wrote in message
news:IVibc.436$tT3.413@bignews6.bellsouth.net...
> hmmm.. WOW!! this is interesting... if done properly... a 64K segment can
record primes between
> 3 and 1M+1... we know 2 is prime and 1 and 0 are not... and no other even
numbers are prime...
The 'if done properly' part is the hard part to pin down. The
sieve of eratosthenes comes up every now and then in clax, and
there are quite a few things you can do to speed it up. Bit
compression, as you note, is one of them, but there are several
different choices. You propose mod 2 compression, which is the
simplest form, but I have had success with mod 30 compression
and others advocate even mod 210 (or higher.) The reason my
code stops at mod 30 is that, since phi(30) = 8, for each
prime to sieve each block of memory there are 8 possible residues
for the prime to have mod 30 as well as 8 possible quotients mod
30 for the first hit in the current buffer for a total of 64
possible loops. Writing out these loops with their different
prologs and epilogs (I call this a loop explosion) results in
a body of code that starts to get comparable in size to the
instruction cache in various processors. Code that is bigger
than instruction cache is hardly going to be fast.
Given the compression factor, you then have to choose your
buffer sizes. If you try running a sieve table out to 1.0e9
you will find that it will be slow because after the smallest
primes, most of your sieve table hits will be cache misses.
To get around this problem, what you do is to define a small
buffer size and sieve only one buffer at a time before moving
on to the next. The limiting case of this is often posted in
newsgroups where a naive soul proposes to determine whether
each number is prime by dividing it by all primes less than
its square root before moving on to test the next number.
Rather of a slow process, but the number being checked for
primality stays in cache throughout! With buffering we are
checking perhaps a million or so numbers more or less
simultaneously and they stay in cache until we want to test
the next set. At a certain point primes get big enough that
the penalty of reading the next prime from the growing database
and setting up the loop to sieve the current buffer with that
prime becomes appreciable and it becomes necessary to create
a superbuffer that fits in L2 cache and holds several of the
smaller buffers. I tried even a third-level buffer to go
with the L3 cache on my 21164, but it seems that TLB
thrashing prevents one from using the 2 MB of L3 usefully
for this problem.
Also you need to have paint-roller code for updating 64 bits
of the table at a time for the smallest primes which haven't
been eliminated by compression in the first paragraph. You
wouldn't think this is so important, but recall that the
smallest primes get the most hits in the table and also that
the very smallest primes can hit twice in the same byte.
You can't have well pipelined code when sieve table updates
are forced to be sequential by this possibility.
So you have 5 classes of primes according to size: the ones
eliminated by compression, the paint-roller primes, the L1
cache primes, the L2 cache primes, and the primes that are
so big that they only get at most one hit in L2 cache. These
last primes are problematic in that just looking at them is
in itself an L2 cache miss so it is necessary to keep them
in a data structure such that the next such prime to be
touched is consecutive in memory with the previous one. Of
course this requires motion for each touch because who is the
successor of a given prime changes on each go-round. It's
absolutely necessary to get this last class of primes flying
if you want to do things like check Goldbach's conjecture or
various moduli of Chebyshev's bias out to integer overflow.
I have some code where all this is implemented fully except
the largest set only partially, because it seems useful to
further subdivide it into sets to save on memory. Unfortunately
the HLL compiler I was using was a beta that expired so I
don't even recall at this point which version of the code
that I have on my hard disk is the currently working one. As a
check to see whether your sieve code is working efficiently, it
should be able to count all primes less than 1.0e9 in under
a second.
-- write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, & 6.0134700243160014d-154/),(/'x'/)); end
- Next message: The Wannabee: "Re: newbe about API"
- Previous message: Nathan C. Baker: "Re: newbe about API"
- In reply to: Bx.C: "Re: Nth Prime"
- Next in thread: Phil Carmody: "Re: Nth Prime"
- Reply: Phil Carmody: "Re: Nth Prime"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|
|