Re: Is there a better way to simulate randomly choosing from a weighted set?




yay_frogs@xxxxxxxxx wrote:
[I originally posted this to comp.lang.c++ since that is the language
I'm working in, but then I realized that my problem isn't language
specific and this would probably be a better group.]

Here is my problem: suppose there are, say, five events with these
probabilities:

event1 0.7
event2 0.1
event3 0.1
event4 0.05
event5 0.05

Note that sum of the probabilities is 1.0. I would like a function that
simulates these events and returns an int to indicate which event
occurred: the function should statistically return 1 about 70% of the
time, 2 about 10% of the time, and so on.

I have figured out a way to do this, but I suspect my way is
suboptimal.

I build a vector of five elements that looks like:

( 0.05, 0.05+0.05, 0.05+0.05+0.1, 0.05+0.05+0.1+0.1,
0.05+0.05+0.1+0.1+0.7 )
= ( 0.05, 0.1, 0.2, 0.3, 1.0 )

I then generate a random float in the interval 0.0 ... 1.0, and if the
random float is in the range 0 to 0.05, I return event 5, and if the
random float is in the range 0.05-0.1, I return event 4, and so on.
(Actually, I should test for event 1 first since it is most common, but
I'm too lazy to re-type my example vector above.)

If you will be sampling many times from the same discrete
distribution (and the number of elements is large), you can
find the element more efficiently by binary searching the
cumulative probabilities. (It's also possible to do in constant
time using a technique of Warren Smith.)

For my real problem, I have to deal with many different cases where the
number of events to consider constantly varies, and I suspect there has
to be a better way than building a vector to represent the different
ranges a random variable can fall in and then seeing which range it
falls in.

So is there a better way?

It depends on the number of elements and how often you
will be sampling from the same (or nearby) discrete
distribution.

If the discrete distribution changes slowly, you can maintain
the cumulative probabilities in a binary tree structure
(each node stores the total weight beneath it) and sample
and update in logarithmic time.

.



Relevant Pages

  • Re: Mersenne Twister -- A Revised C++ Implementation
    ... >> Stephen Howe ... IMHO you also "calculate" numbers following a distribution, ... > Now you are changing it to cumulative probabilities. ... To be exact he is talking about cumulative distributions (CDF) and not ...
    (comp.lang.cpp)
  • Re: chi-square two sample test
    ... Your description of your context is exactly what ... a one-sample chi-squared test is for. ... How your distribution is defined makes no difference at all. ... cumulative probabilities from that fitted distribution, ...
    (comp.soft-sys.matlab)
  • Re: random numbers according to user defined distribution ??
    ... contain the distribution I need :-( ... def pseudonorm: ... """Return a random float in the range ... arbitrary random distribution. ...
    (comp.lang.python)
  • Re: Mersenne Twister -- A Revised C++ Implementation
    ... >> That is not a distribution. ... >> Now you are changing it to cumulative probabilities. ... Generating random numbers with a given distribution has a well-established ...
    (comp.lang.cpp)