random.choices() Suggest that the code confirm that cum_weights sequence is in ascending order

Mon May 14 15:03:38 EDT 2018

forgot to edit the subject.  Sorry.
  paul c.

On Mon, May 14, 2018 at 12:02 PM, Paul <tallpaul at gmail.com> wrote:

> Hello all,
>    Thanks for the thoughtful (and non-snarky) replies.
>
> First, a suggestion for a documentation change:
>
> To this paragraph:
>
> *If neither weights nor cum_weights are specified, selections are made
> with equal probability. If a weights sequence is supplied, it must be the
> same length as the population sequence. It is a TypeError
> <https://docs.python.org/3/library/exceptions.html#TypeError> to specify
> both weights and cum_weights.*
> Add this sentence:
>
> "A cum_weights sequence, if supplied, must be in strictly-ascending order,
> else incorrect results will be (silently) returned."
>
> Secondly, about the cost of verifying the sequence:
>
> 1) I understand the added cost of verifying the sequence.  However, this
> appears to be a one-time cost.  E.G., if I submit this,
>
> random.choices(lm,cum_weights=[25,26,36,46,136],k=400
>
> then the code will do an O(n log n) operation 400 times.
>
> If verification was added, then the the code would do an O(n log n)
> operation 400 times, plus an O(n) operation done *one* time.   So, I'm not
> sure that this would be a significant efficiency hit (except in rare cases).
>
> 2) Paul Moore wrote:
>
> > So the people who *really* need cum_weights are those
>
> > who have the cumulative weights already, and cannot
>
> > afford an O(n)precalculation step.
>
>
> I agree that with the "already have the cum_weights" argument.  Based on
> my point #1, I'm not convinced about the "can't afford" argument.
>
> 3) A minor point.  The documentation also says: "so supplying the
> cumulative weights saves work."  However, this is work done (once, as noted
> above) by a computer rather than work done (even if aided by a a computer)
> by a human, so I'd vote for having the computer do it. :)
>
>
> To conclude, I would still lean slightly toward having the code enforce
> the 'strictly-ascending sequence' requirement.  However, given that a)
> improving the documentation is much more doable and that, b) in some cases,
> the addition of an order O(n) step might be significant, I'd be more than
> happy if the documentation could be improved (as suggested).
>
> thanks
>   Paul Czyzewki
>
> PS.  I see the issue which steven.daprano opened.  Thanks, Steven.
> However, I'm not sure what's appropriate in terms of updating that issue,
> or even if I have permission to update it, so I'd appreciate if someone
> would add this response to the issue. Thanks.
>
>
>
>> From: Paul Moore <p.f.moore at gmail.com>
>> To: "Steven D'Aprano" <steve+comp.lang.python at pearwood.info>
>> Cc: Python <python-list at python.org>
>> Bcc:
>> Date: Mon, 14 May 2018 14:35:34 +0100
>> Subject: Re: random.choices() Suggest that the code confirm that
>> cum_weights sequence is in ascending order
>> On 14 May 2018 at 14:07, Steven D'Aprano
>> <steve+comp.lang.python at pearwood.info> wrote:
>> > On Mon, 14 May 2018 12:59:28 +0100, Paul Moore wrote:
>> >
>> >> The problem is that supplying cum_weights allows the code to run in
>> >> O(log n) by using bisection. This is significantly faster on large
>> >> populations. Adding a test that the cumulative weights are
>> nondecreasing
>> >> would add an O(n) step to the code.
>> >>
>> >> So while I understand the OP's problem, I don't think it's soluble
>> >> without making the cum_weights argument useless in practice.
>> >
>> > How does O(N) make it "useless"? There are lots of O(N) algorithms, even
>> > O(N**2) and O(2**N) which are nevertheless still useful.
>>
>> Well, I've never seen an actual use case for this argument (I can't
>> think of a case where I'd even have cumulative weights rather than
>> weights, and obviously calculating the cumulative weights from the
>> actual weights is what we're trying to avoid). And if you have
>> cum_weights and O(n) is fine, then calculating weights from
>> cum_weights is acceptable (although pointless, as it simply duplicates
>> work). So the people who *really* need cum_weights are those who have
>> the cumulative weights already, and cannot afford an O(n)
>> precalculation step.
>>
>> But yes, clearly in itself an O(n) algorithm isn't useless. And
>> agreed, in most cases whether random.choices() is O(n) or O(log n) is
>> irrelevant in practice.
>>
>> Paul
>>
>