[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)

Tim Peters report at bugs.python.org
Fri Jul 17 16:58:14 EDT 2020


Tim Peters <tim at python.org> added the comment:

Julia's randsubseq() doesn't allow specifying the _size_ of the output. It picks each input element independently with probability p, so the output can be of any size from 0 through the input's size (with mean output length p*length(A)). Reservoir sampling is simply irrelevant to that, although they almost certainly use a form of skip-generation internally.
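For contrast with fixed-size sampling such as Python's `random.sample(population, k)`, here is a minimal sketch of the semantics described above. `randsubseq_naive` is a hypothetical name, not part of Python's `random` module, and it is not claimed to match Julia's implementation. It tests each element once, so the output length is Binomial(N, p) and the RNG is called N times whatever p is.

```python
import random


def randsubseq_naive(items, p, rng=random):
    """Keep each element independently with probability p.

    Hypothetical Python analogue of Julia's randsubseq(A, p); not a real
    Python API. The result preserves input order, and its length can be
    anything from 0 through len(items), with mean p * len(items).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # rng.random() is in [0.0, 1.0): p == 0 keeps nothing, p == 1 keeps all.
    # Exactly one RNG call per input element.
    return [x for x in items if rng.random() < p]
```

Note that there is no `k` parameter: the caller controls only the inclusion probability, never the output size.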

The quoted docs don't make much sense: for any fixed p, O(p*N) = O(N). I'm guessing they're trying to say that the mean number of times the RNG is called is p*N.
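To make the "mean number of RNG calls is p*N" reading concrete, here is a minimal sketch of one skip-generation technique, assuming a sequence that supports `len()` and indexing. The helper `randsubseq_skip` is hypothetical, and nothing here is claimed to be what Julia actually does. Instead of testing every element, it draws the geometric gap to the next kept element, so it makes one RNG call per kept element plus one final call that overshoots the end.

```python
import math
import random


def randsubseq_skip(seq, p, rng=random):
    """Keep each element of seq independently with probability p.

    Hypothetical helper, not part of Python's random module.
    Returns (result, rng_calls). For 0 < p < 1, rng_calls is exactly
    len(result) + 1, so its mean is p * len(seq) + 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 0.0:
        return [], 0
    if p == 1.0:
        return list(seq), 0
    n = len(seq)
    log_q = math.log1p(-p)      # log(1 - p), strictly negative
    result = []
    calls = 0
    i = -1                      # index of the last kept element
    while True:
        calls += 1
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        # Elements skipped before the next kept one:
        # P(skip >= k) = (1 - p)**k, i.e. Geometric(p) on {0, 1, 2, ...}.
        skip = math.log(u) / log_q
        if skip >= n - i - 1:   # next kept index would be >= n
            break
        i += 1 + int(skip)
        result.append(seq[i])
    return result, calls
```

The output has the same distribution as testing each element with probability p, but the work scales with the number of elements kept rather than with N. That is still O(N) for fixed p, which is why "O(p*N)" says nothing beyond O(N).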

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41311>
_______________________________________

