sampling from frequency distribution / histogram without replacement

Mon Jan 14 17:59:38 EST 2019

duncan smith wrote:
> Hello,
>       Just checking to see if anyone has attacked this problem before
> for cases where the population size is unfeasibly large.

The fastest way I know of is to create a list of cumulative
frequencies, then generate uniformly distributed numbers and
use a binary search to find where they fall in the list.
That's O(log n) per sample in the size of the list once it's
been set up.

-- 
Greg