Multiple disjoint sample sets?

Roy Smith roy at panix.com
Fri Jan 11 09:15:29 EST 2013


I have a list of items.  I need to generate n samples of k unique items 
each.  Not only must each sample contain no repeats, but the samples 
must also be disjoint (i.e. no item appears in more than one 
sample).

random.sample(items, k) will satisfy the first constraint, but not the 
second.  Should I just do random.sample(items, k*n), and then split the 
resulting big list into n pieces?  Or is there some more efficient way?

Typical values:

len(items) = 5,000,000
n = 10
k = 100,000
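
For reference, the sample-then-split idea from the question can be 
sketched as follows.  This is only a minimal sketch, not a benchmarked 
answer: the function name disjoint_samples and the rng parameter are 
made up for illustration.  It assumes items is a sequence (list or 
tuple) whose elements are themselves unique, since random.sample picks 
distinct positions rather than distinct values, and that 
n * k <= len(items).

```python
import random


def disjoint_samples(items, n, k, rng=random):
    """Return n lists of k items each, with no position of items used twice.

    Hypothetical helper: `rng` may be the random module or a
    random.Random instance (both provide .sample()).
    """
    if n * k > len(items):
        raise ValueError("need n * k <= len(items)")
    # One draw without replacement covering all n samples at once.
    pool = rng.sample(items, n * k)
    # Cut the draw into n consecutive, non-overlapping chunks of length k.
    return [pool[i * k:(i + 1) * k] for i in range(n)]
```

With the typical values above, disjoint_samples(items, 10, 100000) 
amounts to a single draw of 1,000,000 items out of 5,000,000.  The 
documentation for random.sample states that the result is in selection 
order, so every sub-slice is itself a valid random sample; that is what 
makes cutting the big list into consecutive pieces sound.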


