Re: (!?) need fast algorithm for computing factorials

From: Calum (calum.bulk_at_ntlworld.com)
Date: 11/11/03


Date: Tue, 11 Nov 2003 10:54:17 +0000

This got me thinking. How easy is it to produce the prime factors of n! ?

As I discovered, this is surprisingly fast. The algorithm I wrote is
based upon a prime number sieve. 10000! contains only prime factors
less than or equal to 10000. It turns out that

10000! = 2^9995 * 3^4996 * 5^2499 * 7^1665 * 11^998 * 13^832 * 17^624 *
19^554 * ... * 9973

Looking at the power of 5 indicates this number will have 2499 trailing
zeros (base 10). Calculating x^y takes O(lg y) multiplications (by
repeated squaring), instead of the original O(y). Doing this reduces
the problem from 9998 multiplications to 3338.

But then I discovered an algorithm that reduces the number of
multiplications to 1522. It relies on the fact that the prime powers
are monotonically decreasing. I can't explain it very well, but some
code is at http://visula.org/calum/fact2.cpp

I'd never really considered that factorial could be optimized much, or
indeed that anyone would want to...

Calum Grant

CBFalconer wrote:
> Richard Heathfield wrote:
>
>>Adam Russell wrote:
>>
>>
>>>I am looking for the fastest algorithm for computing factorials
>>>(ex: 4!=4*3*2*1). This is for a programming contest I am
>>>thinking of entering. The program to be used is Labview and it
>>>needs to compute up to 10000!.
>>
>>http://www.rjgh.co.uk/prg/c/misc/f10000.c can do this quite
>>quickly. Just supply it with the appropriate configuration file
>>which you can find at http://www.rjgh.co.uk/prg/c/mist/f10000.txt
>>- like this:
>>
>>Linux:
>>./f10000 < f10000.txt
>>Windows:
>>f10000 < f10000.txt
>>
>>On my rather slow system, it completes in about 0.02 CPU seconds.
>>
>>
>>>I realize that these numbers will be too large for simple
>>>representation, and I think I can handle that. But I believe
>>>that simply multiplying each and every number probably is not
>>>the most efficient algorithm so would like advice on any
>>>numeric methods that might be quicker.
>>
>>Consider gathering multiples of 5 and multiples of 2. Make as
>>many pairs of these as you can, and don't bother multiplying
>>them. Just add that many 0s to the output at the end.
>>
>>The number 10000 has almost 2500 zeros at the end (it's a 35660
>>digit number when represented in decimal). That's a lot of
>>multiplying you can avoid doing.
>>
>>If you're allowed to do this in hex, even greater savings can
>>be made, since you can avoid multiplying by an /awful/ lot of
>>numbers.
>
>
> Along those lines, I did this about a year ago:
>
> /* compute factorials, extended range
> on a 32 bit machine this can reach fact(15) without
> unusual output formats. With the prime table shown
> overflow occurs at 101.
>
> Public domain, by C.B. Falconer.
> */
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <limits.h>
>
> /* 2 and 5 are handled separately
> Placing 2 at the end attempts to preserve such factors
> for use with the 5 factor and exponential notation
> */
> static unsigned char primes[] = {3,7,11,13,17,19,23,29,31,37,
> 41,43,47,53,57,59,61,67,71,
> /* add further primes here -->*/
> 2,5,0};
> static unsigned int primect[sizeof primes]; /* = {0} */
>
> static
> unsigned long int fact(unsigned int n, unsigned int *zeroes)
> {
> unsigned long val;
> unsigned int i, j, k;
>
> #define OFLOW ((ULONG_MAX / j) < val)
>
> /* This is a crude mechanism for passing back values */
> for (i = 0; i < sizeof primes; i++) primect[i] = 0;
>
> for (i = 1, val = 1UL, *zeroes = 0; i <= n; i++) {
> j = i;
> /* extract exponent of 10 */
> while ((0 == (j % 5)) && (!(val & 1))) {
> j /= 5; val /= 2;
> (*zeroes)++;
> }
> /* Now try to avoid any overflows */
> k = 0;
> while (primes[k] && OFLOW) {
> /* remove factors primes[k] */
> while (0 == (val % primes[k]) && OFLOW) {
> val /= primes[k];
> ++primect[k];
> }
> while (0 == (j % primes[k]) && OFLOW) {
> j /= primes[k];
> ++primect[k];
> }
> k++;
> }
>
> /* Did we succeed in the avoidance */
> if (OFLOW) {
> #if DEBUG
> fprintf(stderr, "Overflow at %u, %lue%u * %u\n",
> i, val, *zeroes, j);
> #endif
> val = 0;
> break;
> }
> val *= j;
> }
> return val;
> } /* fact */
>
> /* ---------------- */
>
> int main(int argc, char *argv[])
> {
> unsigned int x, zeroes;
> unsigned long f;
>
> if ((2 == argc) && (1 == sscanf(argv[1], "%u", &x))) {
> if (!(f = fact(x, &zeroes))) {
> fputs("Overflow\n", stderr);
> return EXIT_FAILURE;
> }
>
> printf("Factorial(%u) == %lu", x, f);
> if (zeroes) printf("e%u", zeroes);
> for (x = 0; primes[x]; x++) {
> if (primect[x]) {
> printf(" * pow(%d,%d)", primes[x], primect[x]);
> }
> }
> putchar('\n');
> return 0;
> }
> fputs("Usage: fact n\n", stderr);
> return EXIT_FAILURE;
> } /* main */
>



Relevant Pages

  • Re: Adds and computational complexity
    ... Looking for a an effective and efficient adaptive filtering ... algorithm, I stumbled across something called the Fast Euclidean ... multiplications since we already know x*x' and xhas ... DSP chips are specialized chips that perform ...
    (comp.soft-sys.matlab)
  • Re: fft
    ... Split-radix and radix-2 require at least N/4 real trigonometric constants ... algorithm requires 19 real constants vs. 15 for ordinary conjugate- ... implemented with 3 additions and 3 multiplications, ... trick would apply there since you don't have complex multiplications, ...
    (comp.dsp)
  • [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly
    ... Also add a CRYPTO_ALG_DMA algorithm capability flag to permit or deny the use ... If kernel data is going to be accessed directly, then CRYPTO_ALG_DMA must, for ... struct scatterlist *dst, struct scatterlist *src, ... unsigned int min_keysize; ...
    (Linux-Kernel)
  • [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly [try #2]
    ... Also add a CRYPTO_ALG_DMA algorithm capability flag to permit or deny the use ... If kernel data is going to be accessed directly, then CRYPTO_ALG_DMA must, for ... struct scatterlist *dst, struct scatterlist *src, ... unsigned int min_keysize; ...
    (Linux-Kernel)
  • Re: division problem
    ... unsigned int dividend: ... division instructions to accomplish a larger division, ... then go on to the next digit. ... The classic algorithm assumes that the machine has a division instruction ...
    (comp.lang.c)