[Python-ideas] random.sample should work better with iterators
Franklin? Lee
leewangzhong+python at gmail.com
Wed Jun 27 12:58:14 EDT 2018
On Wed, Jun 27, 2018 at 3:11 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Tue, 26 Jun 2018 23:52:55 -0500
> Tim Peters <tim.peters at gmail.com> wrote:
>>
>> In Python today, the easiest way to spell Abe's intent is, e.g.,
>>
>> >>> from heapq import nlargest # or nsmallest - doesn't matter
>> >>> from random import random
>> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
>> [75260, 45880, 99486, 13478]
>> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
>> [31732, 72288, 26584, 72672]
>> >>> nlargest(4, (i for i in range(100000)), key=lambda x: random())
>> [14180, 86084, 22639, 2004]
>>
>> That also arranges to preserve `sample()'s promise that all sub-slices of
>> the result are valid random samples too (because `nlargest` sorts by the
>> randomly generated keys before returning the list).
>
> How could slicing return an invalid random sample?
If the sample isn't randomly ordered. For example, if the k selected
items are returned in sorted order, the first element of the result is
always the minimum of the sample, so the sub-slice result[:1] is not a
uniform random sample of size 1.
from random import shuffle

def sample(population, k):
    population = list(population)
    shuffle(population)
    return sorted(population[:k])  # No, don't sort!
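To make the bias concrete, here is a small demo (a sketch; the name
`sorted_sample` is made up for illustration). It runs the buggy,
sorting version many times and shows that the first and last positions
of the result have very different expected values, even though every
position of a valid sample should look the same:

```python
from random import shuffle

def sorted_sample(population, k):
    # Deliberately buggy: sorting throws away the random order of the picks.
    population = list(population)
    shuffle(population)
    return sorted(population[:k])

# Because the result is sorted, the sub-slice [:1] is always the minimum
# of the k picks, so it is biased low instead of being a uniform sample.
n = 10_000
first = [sorted_sample(range(10), 3)[0] for _ in range(n)]
last = [sorted_sample(range(10), 3)[-1] for _ in range(n)]
print(sum(first) / n, sum(last) / n)  # about 1.75 and 7.25, not 4.5 each
```

Dropping the sorted() call (returning population[:k] directly) keeps the
shuffled order, and then every sub-slice is itself a valid random sample.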