Multiple disjoint sample sets?

Dave Angel d at davea.name
Fri Jan 11 10:14:15 EST 2013


On 01/11/2013 09:36 AM, MRAB wrote:
> On 2013-01-11 14:15, Roy Smith wrote:
>> I have a list of items.  I need to generate n samples of k unique items
>> each.  I not only want each sample set to have no repeats, but I also
>> want to make sure the sets are disjoint (i.e. no item repeated between
>> sets).
>>
>> random.sample(items, k) will satisfy the first constraint, but not the
>> second.  Should I just do random.sample(items, k*n), and then split the
>> resulting big list into n pieces?  Or is there some more efficient way?
>>
>> Typical values:
>>
>> len(items) = 5,000,000
>> n = 10
>> k = 100,000
>>
> I don't know how efficient it would be, but couldn't you shuffle the
> list and then use slicing to get the samples?

I like that answer best, but just to offer another choice...

You start with a (presumably unique) list of items.  After collecting
your first sample, you could subtract them from the list, and use the
smaller list for the next sample.

One way is to convert list to set, subtract, then convert back to list.



-- 

DaveA




More information about the Python-list mailing list