Re: Is there a better way to simulate randomly choosing from a weighted set?



In article <1141032840.306704.88070@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
yay_frogs@xxxxxxxxx writes
> [I originally posted this to comp.lang.c++ since that is the language
> I'm working in, but then I realized that my problem isn't language
> specific and this would probably be a better group.]
>
> Here is my problem: suppose there are, say, five events with these
> probabilities:
>
> event1 0.7
> event2 0.1
> event3 0.1
> event4 0.05
> event5 0.05
>
> Note that the sum of the probabilities is 1.0. I would like a function
> that simulates these events and returns an int to indicate which event
> occurred: the function should statistically return 1 about 70% of the
> time, 2 about 10% of the time, and so on.
>
> I have figured out a way to do this, but I suspect my way is
> suboptimal.
>
> I build a vector of five elements that looks like:
>
> ( 0.05, 0.05+0.05, 0.05+0.05+0.1, 0.05+0.05+0.1+0.1,
> 0.05+0.05+0.1+0.1+0.7 )
> = ( 0.05, 0.1, 0.2, 0.3, 1.0 )
>
> I then generate a random float in the interval 0.0 ... 1.0, and if the
> random float is in the range 0 to 0.05, I return event 5, and if the
> random float is in the range 0.05-0.1, I return event 4, and so on.
> (Actually, I should test for event 1 first since it is most common, but
> I'm too lazy to re-type my example vector above.)
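
For reference, the poster's cumulative-table scheme might look like this in Python (the problem is language-neutral, so Python is used for all the sketches here; `build_cumulative` and `pick_event` are hypothetical names). It assumes the weights are listed in event order rather than reversed as in the vector above. Using `bisect` replaces the linear scan with a binary search, so each draw costs O(log n) instead of O(n).

```python
import bisect
import itertools
import random

def build_cumulative(probs):
    """Running totals of the weights, in event order (about 0.7, 0.8, 0.9, 0.95, 1.0 here)."""
    return list(itertools.accumulate(probs))

def pick_event(cumulative, rng=random):
    """Return a 1-based event number using one uniform draw."""
    # Scale by the last total so rounding error in the running sums does no harm.
    u = rng.random() * cumulative[-1]
    # Index of the first running total strictly greater than u.
    i = bisect.bisect_right(cumulative, u)
    return min(i, len(cumulative) - 1) + 1   # guard in case u rounds up to the total

probs = [0.7, 0.1, 0.1, 0.05, 0.05]
cumulative = build_cumulative(probs)
rng = random.Random(1)
counts = [0] * len(probs)
for _ in range(100_000):
    counts[pick_event(cumulative, rng) - 1] += 1
```

`counts` should come out close to 70000, 10000, 10000, 5000, 5000.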


Knuth Vol II (Seminumerical Algorithms) section 3.4.1 Exercise 7

Here's what I remember without rereading:
Given the five possible outcomes above, build a table as follows:
Case A generate event 1 with prob 3/4 and event 5 with prob 1/4
Case B generate event 1 with prob 3/4 and event 4 with prob 1/4
Case C generate event 1 with prob 1/2 and event 3 with prob 1/2
Case D generate event 1 with prob 1/2 and event 2 with prob 1/2
Case E generate event 1
Now choose a case at random, giving all cases equal probability, and
then follow the instructions for that case. Event 5 happens with prob
0.2 / 4 = 0.05, as required, and so on; event 1, for instance, happens
with prob 0.2 * (3/4 + 3/4 + 1/2 + 1/2 + 1) = 0.2 * 3.5 = 0.7.
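
The five-case table can be checked mechanically. In this Python sketch each case is a hypothetical (event, chance of that event, otherwise-event) triple, an encoding chosen for illustration rather than taken from Knuth. `exact_distribution` uses exact fractions to add up each event's share, and `draw` does the two-step choice: a uniformly random case, then that case's biased coin.

```python
from fractions import Fraction
import random

# Cases A..E from the post: (event, chance of that event, otherwise-event).
TABLE = [
    (5, Fraction(1, 4), 1),  # Case A
    (4, Fraction(1, 4), 1),  # Case B
    (3, Fraction(1, 2), 1),  # Case C
    (2, Fraction(1, 2), 1),  # Case D
    (1, Fraction(1, 1), 1),  # Case E: always event 1
]

def exact_distribution(table):
    """Probability of each event when every case is picked with prob 1/len(table)."""
    n = len(table)
    dist = {}
    for event, keep, alias in table:
        dist[event] = dist.get(event, 0) + keep / n
        dist[alias] = dist.get(alias, 0) + (1 - keep) / n
    return dist

def draw(table, rng=random):
    """Pick a case uniformly, then flip that case's biased coin."""
    event, keep, alias = rng.choice(table)
    return event if rng.random() < keep else alias
```

`exact_distribution(TABLE)` maps event 1 to `Fraction(7, 10)`, events 2 and 3 to `Fraction(1, 10)`, and events 4 and 5 to `Fraction(1, 20)`, matching the original probabilities.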

You can always build such a table as follows:

You start off with n events to place and n cells (cases) to fill, each
cell being chosen with prob 1/n. If all n events have prob 1/n, you
build a table in which each cell generates a single event. If not, at
least one event has prob < 1/n and at least one has prob > 1/n. Take one
of your n cells and make it responsible for all of the probability of
one event with prob < 1/n. Use whatever is left over in that cell to
reduce the probability still required for one event with prob > 1/n.
You now have n-1 events to worry about and n-1 cells left to play with,
which together must generate (n-1)/n of probability-stuff. So either all
n-1 remaining events need exactly 1/n each, or at least one needs less
than 1/n (and another more), so you can repeat the exercise until you
have filled all the cells, at which point every event has been generated
with exactly its required probability.
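
One step of the argument, written out. The notation is introduced here for illustration: m is the number of cells still empty, q_i is what event i still needs (so the remaining q_i sum to m/n; initially m = n and q_i is the original probability), s is an event with q_s < 1/n and l one with q_l > 1/n. Such a pair must exist unless every q_i equals 1/n, because the m remaining q_i average exactly 1/n.

```latex
\begin{aligned}
P(s \mid \text{cell}) &= n\,q_s, \qquad P(l \mid \text{cell}) = 1 - n\,q_s, \\
q_l' &= q_l - \Bigl(\frac{1}{n} - q_s\Bigr) > q_s \ge 0, \\
\sum_{i \ne s} q_i' &= \frac{m}{n} - q_s - \Bigl(\frac{1}{n} - q_s\Bigr) = \frac{m-1}{n}.
\end{aligned}
```

With the poster's numbers (n = 5) and l always event 1, taking s = 5, 4, 3, 2 in turn moves q_1 from 0.7 to 0.55, 0.4, 0.3 and 0.2 = 1/5, and the four cells built along the way are exactly cases A to D, with case E left for event 1.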

Note that the table-building process described here is linear in the
size of the input data (you don't have to bother keeping stuff sorted by
weight: just keep pools for items < 1/n, > 1/n, and = 1/n). So this is
about as efficient as you might hope for general input data, even if you
only want to generate a single random item, and once the table exists
each further item costs only one uniform choice of cell plus one biased
coin flip.
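
A minimal Python sketch of the whole method, assuming the weights are given as a list that sums to (approximately) 1. `build_alias_table` fills one cell per step from small/large pools as described, with no sorting, so setup is O(n); `sample` then needs one uniform cell choice and one biased coin flip per draw. The function names, 0-based event numbers, and the (keep, alias) array layout are choices made for this illustration; the same idea is widely known as Walker's alias method. Floating-point leftovers are tidied up by treating any event still in a pool at the end as needing exactly 1/n.

```python
import random

def build_alias_table(probs):
    """Return (keep, alias): cell i yields event i with probability keep[i],
    otherwise event alias[i]. Events are 0-based here. O(n) time."""
    n = len(probs)
    scaled = [p * n for p in probs]          # 1.0 now means "exactly 1/n"
    keep = [1.0] * n
    alias = list(range(n))
    small = [i for i, q in enumerate(scaled) if q < 1.0]
    large = [i for i, q in enumerate(scaled) if q >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        keep[s] = scaled[s]                  # cell s: event s with prob n*q_s ...
        alias[s] = l                         # ... otherwise the large event l
        scaled[l] -= 1.0 - scaled[s]         # l's remaining requirement shrinks
        (small if scaled[l] < 1.0 else large).append(l)
    # Anything left needs 1/n up to rounding error: give it a cell of its own.
    for i in small + large:
        keep[i] = 1.0
    return keep, alias

def sample(keep, alias, rng=random):
    """One draw in O(1): uniform cell, then a biased coin."""
    i = rng.randrange(len(keep))
    return i if rng.random() < keep[i] else alias[i]
```

For the poster's weights, `build_alias_table([0.7, 0.1, 0.1, 0.05, 0.05])` gives keep = [1.0, 0.5, 0.5, 0.25, 0.25] with every alias equal to 0 (event 1 in the post's numbering), i.e. cases E, D, C, B, A. The sketch folds the "= 1/n" pool into the large one; pairing an exactly-1/n event with a small one is still a valid step, it just moves that event into the small pool afterwards.
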
--
A.G.McDowell