sampling from frequency distribution / histogram without replacement

duncan smith duncan at invalid.invalid
Mon Jan 14 15:11:56 EST 2019


Hello,
      Just checking to see if anyone has attacked this problem before
for cases where the population size is unfeasibly large. i.e. The number
of categories is manageable, but the sum of the frequencies, N,
precludes simple solutions such as creating a list, shuffling it and
using the first n items to populate the sample (frequency distribution /
histogram).

I note that numpy.random.hypergeometric will allow me to generate a
sample when I only have two categories, and that I could probably
implement some kind of iterative / partitioning approach calling this
repeatedly. But before I do I thought I'd ask if anyone has tackled this
before. Can't find much on the web. Cheers.

Duncan



More information about the Python-list mailing list