[issue41311] Add a function to get a random sample from an iterable (reservoir sampling)
Tim Peters
report at bugs.python.org
Thu Jul 16 22:40:06 EDT 2020
Tim Peters <tim at python.org> added the comment:
Thanks! That explanation really helps explain where "geometric distribution" comes from. Although why it keeps taking k'th roots remains a mystery to me ;-)
Speaking of which, the two instances of
exp(log(random())/k)
are numerically suspect. Better written as
random()**(1/k)
The underlying `pow()` implementation will, in effect, compute log(random()) with extra bits of precision for internal use. Writing log(random()) explicitly instead rounds the logarithm to a 53-bit double before exp() ever sees it. Not to mention that it's more _obvious_ to write a k'th root as a k'th root. Note: the 1/k can then be computed once, outside the loop.
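A minimal sketch of the two spellings side by side, for anyone who wants to look at the difference directly. The function names, the value of k, and the seed are made up for illustration and are not from the patch under review; how much the two results differ (if at all) will depend on the platform's libm.

```python
import math
import random


def kth_root_via_exp_log(u: float, k: int) -> float:
    """k'th root spelled as exp(log(u)/k); log(u) is rounded to a double first."""
    return math.exp(math.log(u) / k)


def kth_root_via_pow(u: float, inv_k: float) -> float:
    """k'th root spelled as a power; inv_k is 1/k, computed once by the caller."""
    return u ** inv_k


k = 10                      # hypothetical reservoir size
inv_k = 1 / k               # hoisted out of the loop, as suggested above
rng = random.Random(12345)  # arbitrary seed, only to make runs repeatable

for _ in range(5):
    u = 1.0 - rng.random()  # in (0, 1], so log(u) is always defined
    a = kth_root_via_exp_log(u, k)
    b = kth_root_via_pow(u, inv_k)
    print(f"u={u:.17g}  exp/log={a:.17g}  pow={b:.17g}  diff={a - b:.3g}")
```

Any differences should show up only in the last bit or so; the point is that the power form gives the library a chance to keep extra precision internally.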
Perhaps worse is
log(1-W)
which should be written
log1p(-W)
instead. W is between 0 and 1, and the closer it is to 0, the more of its trailing bits are lost in computing 1-W. It's the purpose of log1p to combat this very problem.
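A small sketch of the cancellation, assuming nothing beyond the `math` module. The helper names and the sample values of W are illustrative only.

```python
import math


def log_one_minus_naive(w: float) -> float:
    """log(1 - w): the subtraction rounds away the low bits of a small w."""
    return math.log(1.0 - w)


def log_one_minus_stable(w: float) -> float:
    """log1p(-w): computes log(1 - w) without forming 1 - w first."""
    return math.log1p(-w)


for w in (1e-3, 1e-10, 1e-17, 1e-20):
    naive = log_one_minus_naive(w)
    stable = log_one_minus_stable(w)
    print(f"W={w:g}  log(1-W)={naive:.17g}  log1p(-W)={stable:.17g}")
```

Once W is small enough that 1-W rounds to exactly 1.0 (W = 1e-20 is one such value), the naive form returns 0.0, which is the worst case for any code that then divides by the result. log1p(-W) still returns a value close to -W.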
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41311>
_______________________________________