Re: Choose k random lines from file
From: Arthur J. O'Dwyer (ajo_at_nospam.andrew.cmu.edu)
Date: 05/31/04
- Next message: Nick Landsberg: "Re: Choose k random lines from file"
- Previous message: Steve: "Re: Extended Backus Naur Form (EBNF) Notation"
- In reply to: Gerry Quinn: "Re: Choose k random lines from file"
- Next in thread: Nick Landsberg: "Re: Choose k random lines from file"
- Reply: Nick Landsberg: "Re: Choose k random lines from file"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Mon, 31 May 2004 13:01:24 -0400 (EDT)
On Mon, 31 May 2004, Gerry Quinn wrote:
>
> gerryq@DELETETHISindigo.ie says...
> > ajo@nospam.andrew.cmu.edu says...
> > > I don't like to use potentially-infinite procedures when there
> > > are algorithms available, no matter how fast they are. Call it
> > > an ideological position. ;)
>
> Something interesting just occurred to me - if you want an even
> distribution you may *need* a non-deterministic algorithm, because the
> alternative will be an infinite, or at least horribly long one!
>
> How so? Well, suppose you have seventeen pages, and you want to choose
> one at random, but your RNG has a period that's a power of two, or
> anyway some number that's not a multiple of seventeen.
It's not the period that would be the problem; it's the range.
Suppose our RNG (doesn't even have to be a PRNG, but any RNG) gives
us numbers in the range 0..2^N-1. And we want a number in the range
0..16 (seventeen choices total). There's no algorithm to give us
that number without a bias. None. There are procedures much less
wasteful than the canonical
again:
R = rand();
if (R >= (RAND_MAX/17)*17) goto again;
return (R % 17);
[which I see you showed below],
but there's no way to get an unbiased number with a range of M
out of an unbiased generator with a range of N, if M and N are
coprime. Or something like that. I'm not going to prove my
wild assertions on a federal holiday! ;)
> To get an equally good deterministic algorithm, I suppose you could call
> rand() seventeen times and add them. To be honest I'm not 100% sure
> that this is valid, but anyway if you have a lot of pages the amount of
> calculation rises sharply...
Not valid. That gets a random number in the range 0..(17*RAND_MAX),
all right, but it's no longer unbiased. There are a lot of numbers
clustered around 8.5*RAND_MAX, and only one way to get 0. So just
taking the sum of 17 rand()s and modding by 17 doesn't get you an
unbiased distribution either.
> Creating a PRNG with a period of 17 has been suggested.
Has it? :) s/period/range/ and you have a point that I don't
recall being made in this thread; but I'm not sure such a beast
exists for any given prime. Sounds like a job for Knuth to me!
-Arthur
- Next message: Nick Landsberg: "Re: Choose k random lines from file"
- Previous message: Steve: "Re: Extended Backus Naur Form (EBNF) Notation"
- In reply to: Gerry Quinn: "Re: Choose k random lines from file"
- Next in thread: Nick Landsberg: "Re: Choose k random lines from file"
- Reply: Nick Landsberg: "Re: Choose k random lines from file"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|