Sampling a population
Robert Kern
robert.kern at gmail.com
Fri Jun 2 16:03:16 EDT 2006
Brian Quinlan wrote:
> The fastest algorithm that I have been able to devise for doing so is:
> O(n * log(len(lst))). Can anyone think or a solution with a better time
> complexity? If not, is there an obviously better way to do this
> (assuming n is big and the list size is small).
numpy.searchsorted() can do all of the n lookups in C.
# Untested.
import numpy as np
def draw(n, lst):
ps = np.cumsum([0.0] + [x[1] for x in lst])
# watch for floating point errors here. It's likely that the last
# item won't be quite 1.0
r = np.random.random(n) # n psuedorandom numbers uniform on [0, 1)
indices = np.searchsorted(ps, r)
# now do what you like with indices, which is an array of indices into
# lst.
Ed Schofield has an implementation of an algorithm by Marsaglia[1] which turns
sampling into a fast table lookup. If your probabilities have limited precision
(2**-30 or so rather than the full double precision 2**-52 or so), then this
might be an attractive option.
[1] http://www.jstatsoft.org/v11/i03/v11i03.pdf
The implementation is in the scipy sandbox currently, but I don't think it
builds painlessly at the moment.
http://svn.scipy.org/svn/scipy/trunk/Lib/sandbox/montecarlo/
Ask on one of the scipy mailing lists if you need help building it.
http://www.scipy.org/Mailing_Lists
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
More information about the Python-list
mailing list