random.sample with large weighted sample-sets?

Ned Batchelder ned at nedbatchelder.com
Sun Feb 16 09:32:54 EST 2014


On 2/16/14 9:22 AM, Tim Chase wrote:
> 3) you meant to write "(10, 'apple')" rather than 0.  With my original
> example code, a 0-probability shouldn't ever show up in the sampling,
> where it looks like it might when using this sample code. In my
> particular use case, I can limit/ensure that 0-probability items never
> appear in the list, filtering them upon loading.

Terry didn't state this explicitly, but he restructured your data to 
have cumulative probabilities.

You had:

   data = (
     ("apple", 20),
     ("orange", 50),
     ("grape", 30),
     )

He turned it into:

   data = [
     (0, 'apple'),
     (0+20, 'orange'),
     (0+20+50, 'grape'),
   ]

Each number is the cumulative probability (in percent) up to but not
including the item, so apple covers 0 up to 20, orange covers 20 up to 70,
and grape covers 70 up to 100.
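
To make that concrete, here is a minimal sketch of one way to build the
cumulative form from the original data and pick an item from it. This is
not Terry's code; the names starts, pick and weighted_choice are
made up for illustration, and using bisect for the lookup is my
assumption about a reasonable approach.

```python
import bisect
import itertools
import random

# The original (name, weight) pairs from the thread.
data = (
    ("apple", 20),
    ("orange", 50),
    ("grape", 30),
)

names = [name for name, _ in data]
weights = [weight for _, weight in data]

# Running totals including each item: [20, 70, 100].
running = list(itertools.accumulate(weights))
total = running[-1]

# Shift by one so each entry is the total *before* the item: [0, 20, 70].
starts = [0] + running[:-1]
cumulative = list(zip(starts, names))


def pick(r):
    """Return the name whose range [start, next start) contains r.

    Hypothetical helper; r is expected to satisfy 0 <= r < total.
    """
    # bisect_right finds the first start greater than r, so the entry
    # just before that position is the one whose range contains r.
    return names[bisect.bisect_right(starts, r) - 1]


def weighted_choice():
    """Pick one name at random, in proportion to the weights."""
    return pick(random.random() * total)
```

Calling pick(19.9) gives 'apple' and pick(20) gives 'orange', which is
the "up to but not including" boundary described above.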

-- 
Ned Batchelder, http://nedbatchelder.com



