random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

Paul Moore p.f.moore at gmail.com
Mon May 14 08:49:12 EDT 2018


On 14 May 2018 at 13:27, Chris Angelico <rosuav at gmail.com> wrote:
> On Mon, May 14, 2018 at 9:59 PM, Paul Moore <p.f.moore at gmail.com> wrote:
>> The problem is that supplying cum_weights allows the code to run in
>> O(log n) by using bisection. This is significantly faster on large
>> populations. Adding a test that the cumulative weights are
>> nondecreasing would add an O(n) step to the code.
>>
>
> Hang on - are the 'n' and 'log n' there referring to the same n?

Yes. The number of elements in the sample population (which is the
same as the number of entries in the weights/cum_weights arrays). See
https://github.com/python/cpython/blob/master/Lib/random.py#L382 for
details, but basically calculating cum_weights from weights costs
O(n), and locating the right index into the population by doing a
bisection search (bisect.bisect) on the cum_weights sequence costs
O(log n). Using the cum_weights argument rather than the weights
argument skips the O(n) step.

If it's possible to check that cum_weights is nondecreasing in O(log
n) time (either directly here, or in bisect.bisect), then the check
wouldn't affect the algorithmic complexity of that case (it would
affect the constants, but I assume we don't care too much about that).
But I don't know of a way of doing that.

Improving the documentation is of course free of runtime cost. And
making it clear that "you should only use cum_weights if you know what
you're doing, and in particular it doesn't cost you O(n) to work them
out" would seem entirely reasonable to me.

Paul



More information about the Python-list mailing list