[issue44080] Bias in random.choices(long_sequence)

Sun May 9 13:58:19 EDT 2021

Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:

This is a known and intentional design decision.  It isn't just a speed issue.  Because the weights can be floats, floating point arithmetic is involved from the outset and some round-off is unavoidable.  To keep the method internally consistent, the same technique is used even when the weights aren't specified:

    >>> from random import choices, seed
    >>> seed(8675309**3)
    >>> s = choices('abcdefg', k=20)
    >>> seed(8675309**3)
    >>> t = choices('abcdefg', [0.7] * 7, k=20)
    >>> s == t
    True
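
As a rough illustration of why the two calls above line up, here is a minimal sketch of the float-based technique, assuming each selection consumes exactly one random() draw. The helper names pick_unweighted and pick_weighted are hypothetical and are not the library's API; they only mirror the idea described in the comment.

```python
from bisect import bisect
from itertools import accumulate
from math import floor
from random import Random

def pick_unweighted(rng, n, k):
    # One float draw per selection, scaled to an index in range(n).
    return [floor(rng.random() * n) for _ in range(k)]

def pick_weighted(rng, weights, k):
    # One float draw per selection, located within the cumulative weights.
    cum = list(accumulate(weights))
    total = cum[-1]
    hi = len(cum) - 1
    return [bisect(cum, rng.random() * total, 0, hi) for _ in range(k)]

# Same seed, same number of float draws, so the index streams can be compared.
a = pick_unweighted(Random(12345), 7, 20)
b = pick_weighted(Random(12345), [1.0] * 7, 20)
print(a == b)
```

With weights of exactly 1.0 the cumulative sums are exact integers, so the two index lists should agree. With weights such as 0.7 the cumulative sums carry round-off, so agreement is typical rather than guaranteed.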

FWIW, this is documented:
"""
For a given seed, the choices() function with equal weighting typically produces a different sequence than repeated calls to choice(). The algorithm used by choices() uses floating point arithmetic for internal consistency and speed. The algorithm used by choice() defaults to integer arithmetic with repeated selections to avoid small biases from round-off error.
"""
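
To make the contrast in that passage concrete, here is a hedged sketch of integer selection by rejection ("repeated selections") next to the float-scaling approach. The helper randbelow is a hypothetical name for illustration. The sketch assumes random() returns a multiple of 2**-53, which holds for CPython's default generator, and uses a deliberately extreme n so the round-off bias becomes visible.

```python
from math import floor
from random import Random

def randbelow(rng, n):
    # Draw just enough bits to cover n, and retry until the value is below n.
    # Every integer in range(n) is equally likely; no floats are involved.
    k = n.bit_length()
    r = rng.getrandbits(k)
    while r >= n:
        r = rng.getrandbits(k)
    return r

rng = Random(2021)
n = 2 ** 54  # more index values than random() has distinct outputs

# Float scaling: a 53-bit float times 2**54 is always an even integer,
# so odd indices can never be selected.
float_picks = [floor(rng.random() * n) for _ in range(1000)]

# Integer rejection sampling: odd and even indices are both reachable.
int_picks = [randbelow(rng, n) for _ in range(1000)]

print(all(i % 2 == 0 for i in float_picks))
print(any(i % 2 == 1 for i in int_picks))
```

For populations of ordinary size the float approach has a far smaller bias than this example shows, which is the trade-off the documentation describes.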

----------
assignee:  -> rhettinger
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue44080>
_______________________________________