# Re: greatest multiple algorithm

"Phil Carmody" <thefatphil_demunged@xxxxxxxxxxx> wrote in message
news:878x9o4jgy.fsf@xxxxxxxxxxxxxxxxxxxxxxx
"Rod Pemberton" <spamtrap@xxxxxxxxxx> writes:
"Phil Carmody" <thefatphil_demunged@xxxxxxxxxxx> wrote in message

Why?

Because I'm interested in speed. And I'm always talking in
terms of 10^12-10^16. Anything that's just 10^9 or 10^10
is just a toy and not of interest to me.

As I've pointed out, each has advantages and limitations. For
integers upto the largest cpu supported integer type, modulo is faster
on
modern machines.

In practice, it seems that the pure bit sieve (although using
Bernstein or Galway's quadratic form (Atkin) sieve rather than
Eratosthenes) should remain optimal until well beyond 10^20,
and after that trial division makes no sense at all.

This could be used to reduce the time needed to setup the
lower or smaller primes in bit sieve. The bit sieve becomes
progressively
slower with larger sieve sizes due to the large number of memory
accesses.
That can be reduced by a significant factor using the technique
documented on TOS's web page. Basically you throw the primes into
a simplified priority queue. Only look at the primes that are
relevant to each segment you're sieving at the time. (James -
did yours do this too - I've not looked at your code for a long
time, and barely speak ForTran.)

However, the bit sieve has the advantage that it isn't limited to
finding
primes that fit within the cpu supported integer type.

bit sieve
using single bits to represent odd numbers
finds primes upto memory_in_bytes*8*2
requires no modulo computation
requires many memory accesses
slows down based on size memory used
limited to available memory

You can use a wheel to decrease the memory requirements. Most
fast sieves use the 8 residues mod 30 as their wheel. That
almost halves the memory requirements. Even a wheel of 6 saves
you a third.

One question is how much does the extra logic needed to skip bits slow
down
the algorithm versus the gain in computable range. For example, I found
that skipping multiples of 5 for the trial division algortihm below
("wheel
of 30" ?)

Yup.

slowed the algorithm down alot.

It's possible. It takes a lot of care to use a mod 30 wheel
effectively. TOS and DJB have done it, I almost never care
beyond the trivial mod 6 wheel.

The comparison and conditional
were much slower than the modulo.

Often efficient wheels are implemented by unrolling loops,
and creating 8 different versions of the code for the 8
different stepping patterns (James' and TOS's optimised
code does this, I'm sure (and it doesn't apply to an Atkin
sieve such as DJB's)).

[...]
Forgot to mention another one:

test potential primes with GCD composed of all lower primes
modulo with "primed" GCD
very fast
"primed" GCD exceeds cpu integer size rapidly (i.e., prime
47...)

Hmm, maybe I should try floats on that one just to see how far I can
get it
to run without error...

That technique goes all the way back to Euclid. It's very useful
in many situations. If you look in the GMP source for their
probable primality test you'll see them do a couple of GCDs with
products of primes before anything else.

Honestly, from my experiences, I wasn't really sold on the idea that a bit
sieve could be very fast. Looking at other sieves on the Internet didn't
change my attitude any. That was until I saw John Moyer's webpage. I'm not
sure how his compares to TOS, DJB, or James', but the performance numbers he
gives for his "wheel of 30" indicate that it's a "screamer." The one thing
I noticed right away was that he wasn't storing multiples of 2,3,5 - just
the other 8 possible primes per set of 30 - nice compact encoding...

http://www.rsok.com/~jrm/printprimes.html

Rod Pemberton

.

## Relevant Pages

• Re: greatest multiple algorithm
... Bernstein or Galway's quadratic form sieve rather than ... lower or smaller primes in bit sieve. ... slower with larger sieve sizes due to the large number of memory accesses. ... You can use a wheel to decrease the memory requirements. ...
(comp.lang.asm.x86)
• Re: greatest multiple algorithm
... array stores found primes ... modulo with found primes in array to find new primes ... consumes large amounts of memory ... That doesn't sound like a sieve, ...
(comp.lang.asm.x86)
• Re: greatest multiple algorithm
... modulo with found primes in array to find new primes ... That doesn't sound like a sieve, ... integers upto the largest cpu supported integer type, modulo is faster on ... slower with larger sieve sizes due to the large number of memory accesses. ...
(comp.lang.asm.x86)
• Re: [Clax86list] Can WXP handle 4 GB of Physical Memory
... determine if huge chunks of memory are able to speed up a sieve ... using L1 cache. ... Since there are about TEN times as many primes between ... sieve checks for primes up to 10**18. ...
(comp.lang.asm.x86)
• Re: Programs ro evaluate speed and size of L1 / L2 Cache
... Sieve the 8 different totients independently. ... a comparison for the main Atkin sieve (which has wheel 60). ... you could run a few hundred small primes over the 1 ... I tried the Vista speech recognition by running the tutorial. ...
(comp.lang.asm.x86)