[SciPy-Dev] SciPy-Dev Digest, Vol 193, Issue 12

rlucas7 at vt.edu rlucas7 at vt.edu
Fri Nov 15 20:54:33 EST 2019


> On Fri, Nov 15, 2019 at 4:01 PM Andrew Reed <reed at cs.unc.edu> wrote:
> 
>> I imagine that implementing a new method for rvs will probably break
>> the repeatability of results from previous versions of SciPy, and I'm not
>> sure if this is a distribution that warrants optimization.
>> 
> 
> I don't think we've ever made such guarantees for the `rvs()` distribution
> methods. In any case, moving from the inefficient default inversion to a
> reasonably efficient sampling algorithm would win the argument over
> stability.

I’m not sure about the implementation details, but you would probably still want to “check” some extreme parameter cases to make sure things don’t break down because of numerical overflow/underflow, at least not more than with the current implementation.
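As one concrete extreme-parameter check, here is a sketch that assumes the new sampler needs a geometric success probability p = 1 - exp(-a) (the helper names are hypothetical). For tiny `a`, computing p as `1 - np.exp(-a)` rounds to exactly 0.0, which is not a valid geometric parameter, while `np.expm1` keeps the small value. Tiny `a` also makes the geometric draws themselves of order 1/p, which can exceed the int64 range, so that regime probably deserves its own test.

```python
import numpy as np

def success_prob_naive(a):
    # Hypothetical helper: p = 1 - exp(-a) computed directly;
    # for tiny a, exp(-a) rounds to 1.0 and p collapses to 0.0
    return 1.0 - np.exp(-a)

def success_prob_stable(a):
    # Hypothetical helper: same quantity via expm1, accurate when a is near 0
    return -np.expm1(-a)

for a in (1e-20, 1e-8, 1.0, 50.0, 1000.0):
    print(f"a={a:g}  naive={success_prob_naive(a):.3g}  stable={success_prob_stable(a):.3g}")
```

At the other extreme (large `a`), both versions round p to 1.0, which is still a valid geometric parameter and puts all the mass at 0, as the discrete Laplace should.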

I think the rvs() unit tests assume a single uniform draw for each sampled value. If you are using something like the difference of 2 exponential random variables with the same rate/scale, you’ll need to turn off those tests, IIRC. Last I checked, there are some existing examples of that in the tests.
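A distribution-level test sidesteps the one-uniform-per-value assumption, because it only compares sample frequencies with the target pmf. The sketch below is not SciPy's test code: `dlaplace_rvs_geometric` is a hypothetical helper that samples the discrete Laplace as the difference of two geometric variables (the discrete analogue of the difference-of-exponentials idea), and the check pools the tails so every bin has a reasonable expected count before calling `scipy.stats.chisquare`.

```python
import numpy as np
from scipy import stats

def dlaplace_rvs_geometric(a, size, rng):
    # Hypothetical sampler: difference of two iid geometric(p) draws, p = 1 - exp(-a)
    p = -np.expm1(-a)
    return rng.geometric(p, size=size) - rng.geometric(p, size=size)

rng = np.random.default_rng(12345)
a, n, K = 0.8, 100_000, 8
samples = dlaplace_rvs_geometric(a, n, rng)

# Bins k = -K..K; values beyond +/-K are pooled into the two end bins
ks = np.arange(-K, K + 1)
observed = np.bincount(np.clip(samples, -K, K) + K, minlength=ks.size)

probs = stats.dlaplace.pmf(ks, a)
probs[0] = stats.dlaplace.cdf(-K, a)      # P(X <= -K)
probs[-1] = stats.dlaplace.sf(K - 1, a)   # P(X >= K)
expected = n * probs

stat, pvalue = stats.chisquare(observed, expected)
print(f"chi2 = {stat:.2f}, p-value = {pvalue:.3f}")
```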

Hope it helps. 
-Lucas Roberts

> On Nov 15, 2019, at 8:17 PM, scipy-dev-request at python.org wrote:
> 
> Send SciPy-Dev mailing list submissions to
>    scipy-dev at python.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
>    https://mail.python.org/mailman/listinfo/scipy-dev
> or, via email, send a message with subject or body 'help' to
>    scipy-dev-request at python.org
> 
> You can reach the person managing the list at
>    scipy-dev-owner at python.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of SciPy-Dev digest..."
> 
> 
> Today's Topics:
> 
>   1. Optimization to stats.dlaplace.rvs (Andrew Reed)
>   2. Re: Optimization to stats.dlaplace.rvs (Evgeni Burovski)
>   3. Re: Optimization to stats.dlaplace.rvs (Robert Kern)
>   4. Re: Adding logsoftmax function to scipy.special (Ralf Gommers)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Fri, 15 Nov 2019 16:00:45 -0500
> From: Andrew Reed <reed at cs.unc.edu>
> To: scipy-dev at python.org
> Subject: [SciPy-Dev] Optimization to stats.dlaplace.rvs
> Message-ID:
>    <CAL7O2ZvdAHqJWu8RaMkuUNskxhWp6a40cDMAOrQXVVhGTyQg_Q at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> All,
> 
> Some of you may have seen a message I sent to the NumPy mailing list about
> adding a two-sided geometric distribution and/or the comments on my PR on
> GitHub:
> https://mail.python.org/pipermail/numpy-discussion/2019-November/080223.html
> https://github.com/numpy/numpy/pull/14890
> 
> Bottom line: rather than adding it as a distribution to NumPy, it was
> suggested that I look into adding it to stats.dlaplace.rvs (which currently
> uses inversion of the CDF), and I was provided with some code to get me started.
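For readers who haven't seen the NumPy thread, here is a sketch of the kind of sampler being discussed, written against the public NumPy/SciPy API rather than SciPy's internals. `dlaplace_rvs_geometric` is a hypothetical name, and this is not necessarily the code that was suggested. It relies on the identity that if G1 and G2 are independent geometric variables on {1, 2, ...} with success probability p = 1 - e^{-a}, then G1 - G2 has pmf tanh(a/2) e^{-a|k|}, which is the pmf documented for `scipy.stats.dlaplace`.

```python
import numpy as np
from scipy import stats

def dlaplace_rvs_geometric(a, size=None, rng=None):
    """Sketch: discrete Laplace variates as a difference of two geometrics.

    Hypothetical helper (not SciPy API). Assumes a > 0; `rng` may be a seed
    or a numpy.random.Generator.
    """
    rng = np.random.default_rng(rng)
    p = -np.expm1(-np.asarray(a, dtype=float))  # p = 1 - exp(-a), accurate for small a
    g1 = rng.geometric(p, size=size)            # support {1, 2, ...}
    g2 = rng.geometric(p, size=size)
    return g1 - g2                              # the +1 offsets cancel: support is all integers

x = dlaplace_rvs_geometric(0.8, size=200_000, rng=2019)
print(x.mean(), x.var())           # sample moments
print(stats.dlaplace.stats(0.8))   # theoretical mean (0) and variance
```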
> 
> I have been able to add the suggested code, with only minor tweaks, to
> SciPy.  A few tests with timeit seem to confirm that the new code provides
> a speedup of about 250% on my machine.  Furthermore, the default rvs
> function would get killed when I tried to generate 100 million samples,
> whereas this new code can generate at least 100 million samples (I get a
> MemoryError on my VM when I try to go any higher).
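A rough way to reproduce this kind of timeit comparison, as a sketch: the baseline below is an explicit inverse-CDF sampler built from `stats.dlaplace.ppf`, because depending on the SciPy version installed, `stats.dlaplace.rvs` itself may already use a faster algorithm. The sample size is kept small here, and timings depend heavily on the machine and library versions.

```python
import timeit

import numpy as np
from scipy import stats

a = 0.8
n = 200_000
rng = np.random.default_rng(0)

def via_inversion():
    # Baseline: one uniform per variate pushed through the inverse CDF (ppf)
    return stats.dlaplace.ppf(rng.random(n), a)

def via_geometric_difference():
    # Candidate: difference of two geometric draws with p = 1 - exp(-a)
    p = -np.expm1(-a)
    return rng.geometric(p, size=n) - rng.geometric(p, size=n)

for name, fn in [("inversion", via_inversion),
                 ("geometric difference", via_geometric_difference)]:
    best = min(timeit.repeat(fn, number=1, repeat=5))
    print(f"{name:>20}: {best * 1e3:.1f} ms for {n:,} variates")
```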
> 
> I think I'm at the point now where I need to start working through some
> broadcasting errors, but before I do, I wanted to gauge the potential
> interest in these improvements.
> 
> I imagine that implementing a new method for rvs will probably break the
> repeatability of results from previous versions of SciPy, and I'm not sure
> if this is a distribution that warrants optimization.
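The repeatability concern can be made concrete with a small sketch (both helper functions are hypothetical, and a `numpy.random.Generator` is assumed). A fixed seed reproduces the stream for a given algorithm, but because inversion uses one uniform per variate while the geometric-difference approach uses two geometric draws, the same seed yields different values after switching algorithms, even though both follow the same distribution.

```python
import numpy as np
from scipy import stats

def sample_inversion(a, size, seed):
    # Hypothetical helper: one uniform per variate through the inverse CDF
    rng = np.random.default_rng(seed)
    return stats.dlaplace.ppf(rng.random(size), a)

def sample_geometric_difference(a, size, seed):
    # Hypothetical helper: two geometric draws per variate, p = 1 - exp(-a)
    rng = np.random.default_rng(seed)
    p = -np.expm1(-a)
    return rng.geometric(p, size=size) - rng.geometric(p, size=size)

seed, a, size = 12345, 0.8, 8

# Same algorithm and same seed: identical output
same = np.array_equal(sample_geometric_difference(a, size, seed),
                      sample_geometric_difference(a, size, seed))
print("reproducible within one algorithm:", same)

# Same seed, different algorithm: generally different values
print(sample_inversion(a, size, seed))
print(sample_geometric_difference(a, size, seed))
```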
> 
> Thanks,
> Andrew
> 
> ------------------------------
> 
> Message: 2
> Date: Sat, 16 Nov 2019 01:06:23 +0300
> From: Evgeni Burovski <evgeny.burovskiy at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Optimization to stats.dlaplace.rvs
> Message-ID:
>    <CAMRo0ivaKjdZPXN=1xiiDGPo9=F=akFRBGkX=f5=5aTQL9XaqQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Yes, an efficient implementation of dlaplace._rvs would be in scope, I'd
> think.
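To illustrate where such a method would live, here is a sketch of an `rv_discrete` subclass that supplies its own `_rvs`. It assumes a SciPy release in which `_rvs` receives `size` and `random_state` as keyword arguments (older releases used a different signature), and `two_sided_geometric_gen` is a hypothetical stand-in, not SciPy's actual `dlaplace` class. The pmf is the same tanh(a/2) e^{-a|k|} documented for `scipy.stats.dlaplace`.

```python
import numpy as np
from scipy import stats

class two_sided_geometric_gen(stats.rv_discrete):
    """Hypothetical discrete Laplace with a geometric-difference sampler."""

    def _argcheck(self, a):
        return a > 0

    def _pmf(self, k, a):
        return np.tanh(a / 2.0) * np.exp(-a * np.abs(k))

    def _rvs(self, a, size=None, random_state=None):
        # Overrides the generic inverse-CDF fallback with two geometric draws
        p = -np.expm1(-a)
        return (random_state.geometric(p, size=size)
                - random_state.geometric(p, size=size))

# Support is all integers, so the lower bound is -inf
two_sided_geometric = two_sided_geometric_gen(a=-np.inf, name="two_sided_geometric")

rng = np.random.default_rng(7)
print(two_sided_geometric.rvs(0.8, size=10, random_state=rng))
```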
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Fri, 15 Nov 2019 17:13:44 -0500
> From: Robert Kern <robert.kern at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Optimization to stats.dlaplace.rvs
> Message-ID:
>    <CAF6FJisoCA=42TS64SdQiyKDwjrw+rE_QF+St0Z5ZRYeAZdhGg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> On Fri, Nov 15, 2019 at 4:01 PM Andrew Reed <reed at cs.unc.edu> wrote:
> 
>> I imagine that implementing a new method for rvs will probably break
>> the repeatability of results from previous versions of SciPy, and I'm not
>> sure if this is a distribution that warrants optimization.
>> 
> 
> I don't think we've ever made such guarantees for the `rvs()` distribution
> methods. In any case, moving from the inefficient default inversion to a
> reasonably efficient sampling algorithm would win the argument over
> stability. If we were tweaking an existing efficient algorithm, maybe you
> could argue more in favor of stability, but as you point out, this is a
> GO/NO-GO kind of improvement.
> 
> -- 
> Robert Kern
> 
> ------------------------------
> 
> Message: 4
> Date: Fri, 15 Nov 2019 17:16:35 -0800
> From: Ralf Gommers <ralf.gommers at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Adding logsoftmax function to scipy.special
> Message-ID:
>    <CABL7CQh9MJH8=QmroERcOvgz_TmuueC5oNUZKT9LBV2Aah0Oyw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Takuya,
> 
> 
>> On Tue, Nov 12, 2019 at 9:05 PM Takuya Koumura <koumura at cycentum.com> wrote:
>> 
>> Hello,
>> 
>> I raised a GitHub issue (#11058) and it was suggested that I post it to scipy-dev.
>> 
>> I'm considering sending a PR to add a logsoftmax function to scipy.special.
>> Before that, I would like to hear your opinion (partly because it's my
>> first time sending a PR to SciPy).
>> 
> 
> Welcome! Thanks for proposing that. logsoftmax is fairly popular at least
> in deep learning, so it makes sense I think, and we already have a bunch of
> other log* functions in scipy.special.
> 
> I noticed that both PyTorch and Tensorflow name this function `log_softmax`
> rather than `logsoftmax`. The latter would be a little more consistent with
> other functions (although we also have `special.log_ndtr`), while the
> former is consistent with other implementations of the same functionality.
> I'd be okay with either, with a slight preference for `log_softmax`.
> 
> 
>> I would like to implement logsoftmax(x) as x - logsumexp(x). Actually,
>> special.softmax(x) = np.exp(x - logsumexp(x)), so it is trivial for those who
>> read the source code of softmax, but I think including logsoftmax as a
>> separate function will be useful for other users. Logsoftmax is more
>> accurate with inputs that make softmax saturate, e.g., when x = [1000, 0],
>> np.log(softmax(x)) = [0, -Inf] (possibly depending on the floating-point
>> precision), while logsoftmax(x) = [0, -1000].
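The proposed formulation fits in a few lines. The sketch below uses a hypothetical name, `log_softmax_sketch`, so it is not mistaken for any SciPy function, and it reproduces the saturating example from the message using the existing `scipy.special.logsumexp` and `scipy.special.softmax`.

```python
import numpy as np
from scipy.special import logsumexp, softmax

def log_softmax_sketch(x, axis=None):
    # Hypothetical helper: log(softmax(x)) = x - logsumexp(x),
    # evaluated without ever forming softmax(x) itself
    x = np.asarray(x, dtype=float)
    return x - logsumexp(x, axis=axis, keepdims=True)

x = np.array([1000.0, 0.0])
with np.errstate(divide="ignore"):
    naive = np.log(softmax(x))   # softmax(x)[1] underflows to 0.0, so its log is -inf
stable = log_softmax_sketch(x)   # keeps the finite value -1000 in the second entry
print(naive)
print(stable)
```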
>> 
>> I am planning to add the new function at the bottom of
>> special/_logsumexp.py, after the softmax function, and to add some unit
>> tests in special/test/test_logsumexp.py. If you have any comments, I'd
>> appreciate them.
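As a rough idea of what such unit tests might check, here is a pytest-style sketch. It redefines the hypothetical `log_softmax_sketch` helper so it runs on its own, and it covers agreement with `np.log(softmax(x))` for moderate inputs, invariance to adding a constant, the saturating case, and normalization after exponentiating.

```python
import numpy as np
from numpy.testing import assert_allclose
from scipy.special import logsumexp, softmax

def log_softmax_sketch(x, axis=None):
    # Hypothetical helper: the x - logsumexp(x) formulation
    x = np.asarray(x, dtype=float)
    return x - logsumexp(x, axis=axis, keepdims=True)

def test_matches_log_of_softmax():
    # For moderate inputs both routes should agree closely
    x = np.array([[0.5, -1.0, 2.0], [3.0, 3.0, -4.0]])
    assert_allclose(log_softmax_sketch(x, axis=1),
                    np.log(softmax(x, axis=1)), rtol=1e-10)

def test_shift_invariance():
    # Adding the same constant to every input must not change the result
    x = np.array([1.0, 2.0, 3.0])
    assert_allclose(log_softmax_sketch(x + 1e3), log_softmax_sketch(x), atol=1e-12)

def test_saturating_input():
    # The case from the proposal, where log(softmax(x)) would give -inf
    assert_allclose(log_softmax_sketch([1000.0, 0.0]), [0.0, -1000.0])

def test_exponentiates_to_probabilities():
    out = log_softmax_sketch(np.linspace(-5, 5, 11))
    assert_allclose(np.exp(out).sum(), 1.0, rtol=1e-12)
```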
>> 
> 
> That seems like a good place.
> 
> Cheers,
> Ralf
> 
> 
>> Best wishes,
>> --
>> Takuya KOUMURA
>> koumura at cycentum.com
>> 
> 
> ------------------------------
> 
> Subject: Digest Footer
> 
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
> 
> 
> ------------------------------
> 
> End of SciPy-Dev Digest, Vol 193, Issue 12
> ******************************************

