Re: Choose k random lines from file

From: Arthur J. O'Dwyer (ajo_at_nospam.andrew.cmu.edu)
Date: 05/31/04


Date: Mon, 31 May 2004 13:01:24 -0400 (EDT)


On Mon, 31 May 2004, Gerry Quinn wrote:
>
> gerryq@DELETETHISindigo.ie says...
> > ajo@nospam.andrew.cmu.edu says...
> > > I don't like to use potentially-infinite procedures when there
> > > are algorithms available, no matter how fast they are. Call it
> > > an ideological position. ;)
>
> Something interesting just occurred to me - if you want an even
> distribution you may *need* a non-deterministic algorithm, because the
> alternative will be an infinite, or at least horribly long one!
>
> How so? Well, suppose you have seventeen pages, and you want to choose
> one at random, but your RNG has a period that's a power of two, or
> anyway some number that's not a multiple of seventeen.

  It's not the period that would be the problem; it's the range.
Suppose our RNG (doesn't even have to be a PRNG, but any RNG) gives
us numbers in the range 0..2^N-1. And we want a number in the range
0..16 (seventeen choices total). There's no algorithm to give us
that number without a bias. None. There are procedures much less
wasteful than the canonical

    again:
        R = rand();
        if (R >= (RAND_MAX/17)*17) goto again;
        return (R % 17);

[which I see you showed below],
but there's no way to get an unbiased number with a range of M
out of an unbiased generator with a range of N, if M and N are
coprime. Or something like that. I'm not going to prove my
wild assertions on a federal holiday! ;)

> To get an equally good deterministic algorithm, I suppose you could call
> rand() seventeen times and add them. To be honest I'm not 100% sure
> that this is valid, but anyway if you have a lot of pages the amount of
> calculation rises sharply...

  Not valid. That gets a random number in the range 0..(17*RAND_MAX),
all right, but it's no longer unbiased. There are a lot of numbers
clustered around 8.5*RAND_MAX, and only one way to get 0. So just
taking the sum of 17 rand()s and modding by 17 doesn't get you an
unbiased distribution either.

> Creating a PRNG with a period of 17 has been suggested.

  Has it? :) s/period/range/ and you have a point that I don't
recall being made in this thread; but I'm not sure such a beast
exists for any given prime. Sounds like a job for Knuth to me!

-Arthur



Relevant Pages

  • Re: Choose k random lines from file
    ... > Suppose our RNG ... > that number without a bias. ... > unbiased distribution either. ... distribution for the sum. ...
    (comp.programming)
  • Re: Bad news for Block Ciphers?
    ... >> Finally it is clear to me now that there is no bias in AES whatsoever. ... > of a bad RNG, does that make his "bad RNG" a selection function on the ... > Has he published the exact experimental setup he used, ...
    (sci.crypt)
  • Re: Bad news for Block Ciphers?
    ... > Finally it is clear to me now that there is no bias in AES whatsoever. ... of a bad RNG, does that make his "bad RNG" a selection function on the ... Has he published the exact experimental setup he used, ...
    (sci.crypt)
  • Re: Bad news for Block Ciphers?
    ... >of a bad RNG, does that make his "bad RNG" a selection function on the ... and the apparent bias would be much larger than expected. ... see 530,000 ones and 470,000 zeros. ...
    (sci.crypt)