Python-list Digest, Vol 176, Issue 16

Paul tallpaul at gmail.com
Mon May 14 15:02:33 EDT 2018


Hello all,
   Thanks for the thoughtful (and non-snarky) replies.

First, a suggestion for a documentation change:

To this paragraph:

*If neither weights nor cum_weights are specified, selections are made with
equal probability. If a weights sequence is supplied, it must be the same
length as the population sequence. It is a TypeError
<https://docs.python.org/3/library/exceptions.html#TypeError> to specify
both weights and cum_weights.*
Add this sentence:

"A cum_weights sequence, if supplied, must be in strictly-ascending order,
else incorrect results will be (silently) returned."

Secondly, about the cost of verifying the sequence:

1) I understand the added cost of verifying the sequence.  However, this
appears to be a one-time cost.  E.G., if I submit this,

random.choices(lm,cum_weights=[25,26,36,46,136],k=400

then the code will do an O(n log n) operation 400 times.

If verification was added, then the the code would do an O(n log n)
operation 400 times, plus an O(n) operation done *one* time.   So, I'm not
sure that this would be a significant efficiency hit (except in rare cases).

2) Paul Moore wrote:

> So the people who *really* need cum_weights are those

> who have the cumulative weights already, and cannot

> afford an O(n)precalculation step.


I agree that with the "already have the cum_weights" argument.  Based on my
point #1, I'm not convinced about the "can't afford" argument.

3) A minor point.  The documentation also says: "so supplying the
cumulative weights saves work."  However, this is work done (once, as noted
above) by a computer rather than work done (even if aided by a a computer)
by a human, so I'd vote for having the computer do it. :)


To conclude, I would still lean slightly toward having the code enforce the
'strictly-ascending sequence' requirement.  However, given that a)
improving the documentation is much more doable and that, b) in some cases,
the addition of an order O(n) step might be significant, I'd be more than
happy if the documentation could be improved (as suggested).

thanks
  Paul Czyzewki

PS.  I see the issue which steven.daprano opened.  Thanks, Steven.
However, I'm not sure what's appropriate in terms of updating that issue,
or even if I have permission to update it, so I'd appreciate if someone
would add this response to the issue. Thanks.



> From: Paul Moore <p.f.moore at gmail.com>
> To: "Steven D'Aprano" <steve+comp.lang.python at pearwood.info>
> Cc: Python <python-list at python.org>
> Bcc:
> Date: Mon, 14 May 2018 14:35:34 +0100
> Subject: Re: random.choices() Suggest that the code confirm that
> cum_weights sequence is in ascending order
> On 14 May 2018 at 14:07, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
> > On Mon, 14 May 2018 12:59:28 +0100, Paul Moore wrote:
> >
> >> The problem is that supplying cum_weights allows the code to run in
> >> O(log n) by using bisection. This is significantly faster on large
> >> populations. Adding a test that the cumulative weights are nondecreasing
> >> would add an O(n) step to the code.
> >>
> >> So while I understand the OP's problem, I don't think it's soluble
> >> without making the cum_weights argument useless in practice.
> >
> > How does O(N) make it "useless"? There are lots of O(N) algorithms, even
> > O(N**2) and O(2**N) which are nevertheless still useful.
>
> Well, I've never seen an actual use case for this argument (I can't
> think of a case where I'd even have cumulative weights rather than
> weights, and obviously calculating the cumulative weights from the
> actual weights is what we're trying to avoid). And if you have
> cum_weights and O(n) is fine, then calculating weights from
> cum_weights is acceptable (although pointless, as it simply duplicates
> work). So the people who *really* need cum_weights are those who have
> the cumulative weights already, and cannot afford an O(n)
> precalculation step.
>
> But yes, clearly in itself an O(n) algorithm isn't useless. And
> agreed, in most cases whether random.choices() is O(n) or O(log n) is
> irrelevant in practice.
>
> Paul
>



More information about the Python-list mailing list