Re: bootstrapping in Perl
- From: tom@xxxxxxxxxxxxxx (Tom Phoenix)
- Date: Wed, 26 Sep 2007 09:43:15 -0700
On 9/26/07, Pedro Soto <pedrosoto2007@xxxxxxxxx> wrote:
> I need to derive a subsample with replacement from a large distribution of
> data. Say my large sample is 10000: I need to get 100 data points out of the
> 10000 and repeat the procedure n times (that's what I call
> bootstrapping).
Perl can easily select 100 items at random from 10000, as many times
as you need.
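For the with-replacement case, a minimal sketch might look like the following. The names @source, resample, and $n_resamples are placeholders for illustration, and the data is made up. Each draw picks an index independently, so the same item can turn up more than once in a subsample.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real data set of 10000 values.
my @source = map { $_ * 0.5 } 1 .. 10_000;

# Draw $size items from the referenced array, with replacement.
sub resample {
    my ($data, $size) = @_;
    # rand(@$data) returns a float in [0, number of items);
    # int() truncates it to a valid array index.
    return map { $data->[ int(rand(@$data)) ] } 1 .. $size;
}

my $n_resamples = 5;    # the "n times"; pick whatever you need
my @subsamples;
for (1 .. $n_resamples) {
    push @subsamples, [ resample(\@source, 100) ];
}

printf "%d subsamples of %d items each\n",
    scalar @subsamples, scalar @{ $subsamples[0] };
```

Because the choice is made by position in the array, the shape of the data's distribution plays no part in which items get picked.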
> I am using Perl's srand function to generate random
> numbers in order to do the resampling at random.
It's rare to need to call srand() yourself, since Perl seeds the
generator automatically the first time rand() is called. You probably
want just plain rand().
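The usual reason to call srand() explicitly is reproducibility: with a fixed seed, the same sequence of rand() values comes back, which can help when debugging a resampling script. A small sketch (the seed value 42 is arbitrary):

```perl
use strict;
use warnings;

srand(42);                             # fixed seed, arbitrary value
my @first = map { rand() } 1 .. 3;

srand(42);                             # reseed with the same value
my @second = map { rand() } 1 .. 3;

# The two lists hold the same three numbers.
print "repeatable\n" if "@first" eq "@second";
```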
> The problem is that the
> distribution of the original data (10000) does not follow a Gaussian
> distribution, and therefore I am not sure if using only this function
> (srand) in Perl would be enough, because the numbers of the large
> distribution won't have the same probability of being selected.
The probability of an item being selected by rand() shouldn't normally
depend upon the item itself. This code pulls 100 samples at random,
without replacement, from a list (@source) of at least that many items
in a single pass, but the items themselves don't have any influence on
the selection.
    my $samples_needed = 100;
    die "Not enough data" if @source < $samples_needed;
    my $count = 0;     # items seen so far
    my @samples;       # starts off empty
    foreach my $item (@source) {
        # keep this item with probability $samples_needed / $count
        next if $samples_needed / (++$count) <= rand;
        if (@samples < $samples_needed) {
            push @samples, $item;               # still filling up
        } else {
            $samples[rand @samples] = $item;    # random index
        }
    }
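To reuse the loop above, it could be wrapped in a subroutine. This is only a sketch: reservoir_sample is a made-up name, and the integer data is a placeholder. Each position in the source is considered exactly once, so the result is a sample without replacement.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the same selection loop.
sub reservoir_sample {
    my ($source, $samples_needed) = @_;
    die "Not enough data\n" if @$source < $samples_needed;
    my $count = 0;
    my @samples;
    foreach my $item (@$source) {
        # Keep this item with probability $samples_needed / $count.
        next if $samples_needed / (++$count) <= rand();
        if (@samples < $samples_needed) {
            push @samples, $item;
        } else {
            # Overwrite a randomly chosen slot.
            $samples[ int(rand(@samples)) ] = $item;
        }
    }
    return @samples;
}

my @source  = (1 .. 10_000);    # placeholder data, all values distinct
my @samples = reservoir_sample(\@source, 100);

# Count how many distinct values ended up in the sample.
my %seen;
$seen{$_}++ for @samples;
printf "%d samples, %d distinct\n", scalar @samples, scalar keys %seen;
```

With distinct input values, the two counts printed are always equal, which is a quick way to see that no source item is selected twice.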
Does this get you any closer to a solution? Good luck with it!
--Tom Phoenix
Stonehenge Perl Training