Random from Dictionary

Mon Oct 22 08:31:48 EDT 2001

Martin von Loewis <loewis at informatik.hu-berlin.de> writes:

> "Tim Peters" <tim.one at home.com> writes:
> 
> > Season to taste.  In 2.2 you could use a more memory-efficient scheme via
> > iterating over d directly:
> > 
> > def random_from_dict(d, x):
> >     "Return and remove x random (k, v) pairs from d."
> >     from random import random
> >     n = float(len(d))  # float() so x/n later doesn't truncate to 0
> >     if not 1 <= x <= n:
> >         raise ValueError("go away")
> >     result = []
> >     for k in d:
> >         # Choose this item with probability x/n.
> >         if random() <= x/n:
> >             result.append((k, d[k]))
> >             x -= 1
> >             if x == 0:
> >                 break
> >         n -= 1
> >     for k, v in result:
> >         del d[k]
> >     return result
> 
> Does that give each key in the dictionary the same probability of
> being picked (assuming random() distributes uniformly)? 

Sort of.  

Item i has a x/n probability of being picked.
If picked item i+1 has a (x-1)/(n-1) probability of being picked
else item i+1 has a x/(n-1) probability of being picked

thus the total probability of item i+1 being picked is

(x/n)*(x-1)/(n-1)+(1-(x/n))*x/(n-1)
(x^2-x)/(n^2-n)+(x-(x^2/n))/(n-1)
(x^2-x)/(n^2-n)+(xn-x^2)/(n^2-n)
(x^2-x^2-x+xn)/(n^2-n)
(xn-x)/(n^2-n)
x(n-1)/n(n-1)
x/n

But having proved that you are only letting me assume that random() is
uniform.  There is some rounding error in the floats that will make
the above proof not quite true.  Though if you are that sensitive to
non-uniformity perhaps you should rethink the problem.  (You can't
avoid this, of course.  Presuming it were uniform, random() would
produce one of 2**52 floats, which can't be distributed uniformly
across a set of 5 elements...)

-- 
Christopher A. Craig <com-nospam at ccraig.org>