Re: bootstrapping in Perl



On 9/26/07, Pedro Soto <pedrosoto2007@xxxxxxxxx> wrote:

I need to draw a subsample with replacement from a large distribution of
data. Say my large sample has 10000 items: I need to get 100 of them out
of the 10000 and repeat the procedure n times (that's what I call
bootstrapping).

Perl can easily select 100 items at random from 10000, as many times
as you need.

I am using Perl's srand function to generate random
numbers in order to do the resampling at 'random'.

It's rare to need to use srand(). You probably want just plain rand().
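
As a rough sketch of the difference (the @data list and the seed value
here are made up for illustration): rand() seeds itself the first time
it is called, so an explicit srand() is mainly useful when you want a
repeatable sequence, for example while testing.

```perl
use strict;
use warnings;

my @data = (1 .. 10);    # hypothetical stand-in for your real data

# rand @data returns a fractional number in [0, scalar(@data));
# using it as an array index truncates it to an integer.
my $pick = $data[rand @data];
print "picked $pick\n";

# Calling srand with a fixed seed (42 is arbitrary) makes the
# sequence of random numbers repeatable.
srand(42);
my @first = map { int(rand(1000)) } 1 .. 5;
srand(42);
my @second = map { int(rand(1000)) } 1 .. 5;
print "same sequence both times\n" if "@first" eq "@second";
```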

The problem is that the
distribution of the original data (10000) does not follow a Gaussian
distribution and therefore I am not sure if using only this function
(srand) in Perl would be enough, because the numbers of the large
distribution won't have the same probability of being selected.

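The probability of an item being selected by rand() shouldn't normally
depend upon the item itself. Each item is equally likely to be chosen,
so the shape of the distribution (Gaussian or not) makes no difference.
This code pulls 100 samples at random, without replacement, from a list
(@source) of at least that many items, but the items themselves don't
have any influence on the selection.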

my $samples_needed = 100;
die "Not enough data" if @source < $samples_needed;
my $count = 0;
my @samples;    # starts off empty
foreach my $item (@source) {
    # Keep the $count-th item with probability $samples_needed / $count
    next if $samples_needed / (++$count) <= rand;
    if (@samples < $samples_needed) {
        push @samples, $item;    # still filling the first 100 slots
    } else {
        $samples[rand @samples] = $item;    # random index
    }
}
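
If you really do need sampling with replacement, as the question
describes, note that the loop above never picks the same element of
@source twice. A minimal sketch of a with-replacement version, repeated
a few times, might look like the following. The resample() helper, the
made-up @source data, and the choice of five replicates are all
assumptions for illustration, not part of any library.

```perl
use strict;
use warnings;

# Draw $size items from @$data with replacement: every draw picks
# uniformly from the whole list, so an item can appear more than once.
sub resample {
    my ($data, $size) = @_;
    return map { $data->[rand @$data] } 1 .. $size;
}

# Hypothetical, deliberately non-Gaussian data (squares of 1..10000).
my @source = map { $_ * $_ } 1 .. 10_000;
my $replicates = 5;    # the "n times" from the question

for my $rep (1 .. $replicates) {
    my @sample = resample(\@source, 100);
    my $sum = 0;
    $sum += $_ for @sample;
    printf "replicate %d: mean = %.1f\n", $rep, $sum / @sample;
}
```

Each replicate is drawn independently from the full list, so you can
compute whatever statistic you need per replicate (the mean here is
only an example).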

Does this get you any closer to a solution? Good luck with it!

--Tom Phoenix
Stonehenge Perl Training