Re: random number problem

From: Paul Hsieh (qed_at_pobox.com)
Date: 07/12/04


Date: 12 Jul 2004 03:17:15 -0700


"copx" <invalid@invalid.com> wrote:
> <Jens.Toerring@physik.fu-berlin.de> schrieb:
> > copx <invalid@invalid.com> wrote:
> [snip]
> > > random_int = (rand() * max) + min
>
> > > And I think this additional step destroys the uniformity..
> >
> > Why should it do that? It's a simple linear transformation.
> > If you have uniformly distributed points drawn on an elastic
> > band and then stretch it they stay uniformly ditributed.
> > And a simple multiplication does nothing else (and adding an
> > offset doesn't change anything).
>
> Really? Well, I guess you're right because I really don't
> know sh*t about mathematics (I'm just a hobby coder)
> and you sound like you know what you're talking about.

First of all, to have this nice linear property you really do need a
floating point random number generator. The C language doesn't have
one of those. If you do the integer transformation as suggested
(whether or not you use the upper bits) your distribution will be
wrong if it does not divide evenly into (RAND_MAX + 1).

On my C compiler RAND_MAX is defined to the very tiny number: 32767.
Since you say you shy away from math, lets just prove it with a
program:

=============================================================================

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#ifdef USE_MERSENNE_TWISTER
#include <limits.h>
#include "mt.h"

void srand_TEST (unsigned long s) {
        init_genrand (s);
}

unsigned long rand_TEST (void) {
        return genrand_int32 ();
}

#define RAND_MAX_TEST ULONG_MAX

#else

#define srand_TEST srand
#define rand_TEST rand
#define RAND_MAX_TEST RAND_MAX

#endif

#define AVERAGE_SAMPLES_PER_BUCKET (1000)

int main () {
int i, d = 0, n, minb, maxb;
static unsigned long * bucket;
double scale, a2;

        scanf ("%d", &d);
        if (d <= 0) {
                fprintf (stderr, "Enter a positive integer\n");
                exit (-1);
        }

        scale = d / (RAND_MAX_TEST + 1.0);

        bucket = (unsigned long *) malloc (sizeof (unsigned long) * d);
        for (i=0; i < d; i++) {
                bucket[i] = 0;
        }
 
        srand_TEST (time (NULL) + clock () + d);

        n = d * AVERAGE_SAMPLES_PER_BUCKET;

        for (i=0; i < n; i++) {
                int r = rand_TEST () * scale;
                bucket[r]++;
        }

        /* Some statistical tests */

        minb = n;
        maxb = 0;

        a2 = 0;
        for (i=0; i < d; i++) {
                if (minb > bucket[i]) minb = bucket[i];
                if (maxb < bucket[i]) maxb = bucket[i];
                a2 += ((double)bucket[i]) * bucket[i];
        }

        a2 = a2 / d - (AVERAGE_SAMPLES_PER_BUCKET*AVERAGE_SAMPLES_PER_BUCKET);

        printf ("min bucket: %d\n", minb);
        printf ("max bucket: %d\n", maxb);
        printf ("last bucket: %d\n", bucket[d-1]);
        printf ("variance: %g (supposed to be about %d)\n", a2,
            AVERAGE_SAMPLES_PER_BUCKET);

        free (bucket);

        return 0;
}

=============================================================================

For most small numbers (less then 1000) you enter, when you run the
program, the important number that it outputs, the variance, is around
1000, which is what theory says it should be for a uniform
distribution. But as soon as you start trying out numbers in excess
of 1000 you will see some disturbing results in the variance (actually
it depends on how close the number is to one substantially divisible
into 32768 -- compare the result of 4096 with 4000). It just
increases and increases, until you approach the value of RAND_MAX
itself, when it finally converges back to about 1000. Numbers beyond
RAND_MAX are clearly useless.

If you want a *serious* random number generator, without thinking too
hard about it, use the "Mersenne Twister" which can be located here:

    http://www.math.sci.hiroshima-u.ac.jp/%7Em-mat/MT/emt.html

It passes the test above for much larger integer values, as well as
coming with a real number random generator. If you want to *study*
random number generators, I suggest you do a search in google groups
for "George Marsaglia".

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/


Relevant Pages

  • Re: issues with statistical test suite from http://csrc.nist.gov/rng/
    ... A distribution, or data set, is symmetric if it looks the ... >> A generator is bad when it fails a test that has a good mathematical ... The most common criteria are lack of correlation and even ... Since some of the best minds in mathematics can't come up with ...
    (sci.crypt)
  • Re: issues with statistical test suite from http://csrc.nist.gov/rng/
    ... >> ses is the standard error of skewness. ... >distribution differs from the expected one. ... >Unfortunately I can't know if a generator under test is good or bad if it ... Kurtosis could be ...
    (sci.crypt)
  • Re: question on entity generation/termination
    ... use the event based random number generator block. ... be the distribution parameters, and using those you can generate random ... The EML block is a time-based block ... The parameter of the event-based random number ...
    (comp.soft-sys.matlab)
  • Re: How to test a distribution for uniformity?
    ... > observations occured is roughly uniform. ... > distribution of observation times differs significantly from ... I am using bins (those are my 45 minute ... I wonder is there such a thing as a chi-square test which is adjusted to ...
    (sci.stat.math)
  • Re: How to test a pseudo random prime number generator?
    ... number generators. ... This method is efficient but does not produce a uniform distribution ... approximate distribution of gaps. ...
    (sci.math)