[SciPy-Dev] SciPy-Dev Digest, Vol 193, Issue 12
rlucas7 at vt.edu
Fri Nov 15 20:54:33 EST 2019
> On Fri, Nov 15, 2019 at 4:01 PM Andrew Reed <reed at cs.unc.edu> wrote
>
>> I imagine that implementing a new method for rvs will probably break
>> the repeatability of previous versions of SciPy, and I'm not sure if this
>> is a distribution that warrants optimization.
>>
>
> I don't think we've ever made such guarantees for the `rvs()` distribution
> methods. In any case, moving from the inefficient default inversion to a
> reasonably efficient sampling algorithm would win the argument over
> stability.
I'm not sure about the implementation, but you would probably still want to check some extreme parameter cases to make sure things don't break down because of numerical overflow or underflow, at least no more than the current code does.
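As a rough illustration of the kind of extreme-parameter check meant here: the sketch below assumes the new sampler needs a success probability p = 1 - exp(-a) for the dlaplace shape parameter a. That parameterisation and the helper names are assumptions for illustration only; the proposed code is not shown in this thread.

```python
import numpy as np


def geometric_p_naive(a):
    # Hypothetical helper: direct formula, loses all precision for tiny a.
    return 1.0 - np.exp(-a)


def geometric_p_stable(a):
    # Hypothetical helper: same quantity via expm1, accurate for tiny a.
    return -np.expm1(-a)


tiny_a = 1e-20
naive_p = geometric_p_naive(tiny_a)    # exp(-1e-20) rounds to 1.0, so p collapses to 0.0
stable_p = geometric_p_stable(tiny_a)  # stays close to 1e-20

# For very large a, exp(-a) underflows to 0 and p saturates at 1.
large_p = geometric_p_stable(800.0)
```

A success probability of exactly zero is not a valid parameter for a geometric sampler, so the tiny-a case is the one worth a dedicated test.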
I think the rvs() unit tests assume a single uniform draw for each sampled value. If you use something like the difference of two exponential random variables with the same rate/scale, you'll need to turn off those tests, IIRC. Last I checked, there were some existing examples of that in the tests.
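To make the "two draws per sample" point concrete, here is a minimal sketch of the discrete analogue: a dlaplace variate written as the difference of two iid geometric draws. This is not necessarily the code proposed in the PR; the function name, seed, and shape value are made up for illustration.

```python
import numpy as np
from scipy import stats


def dlaplace_rvs_sketch(a, size, rng):
    """Hypothetical sampler: difference of two iid geometric variates."""
    # Success probability p = 1 - exp(-a), computed via expm1 for accuracy.
    p = -np.expm1(-a)
    # Generator.geometric has support {1, 2, ...}; the shift cancels in the
    # difference. Note that each output value consumes two random draws.
    return rng.geometric(p, size=size) - rng.geometric(p, size=size)


rng = np.random.default_rng(12345)
a = 0.8
sample = dlaplace_rvs_sketch(a, 200_000, rng)

# Compare empirical frequencies with the reference pmf near the mode.
ks = np.arange(-3, 4)
empirical = np.array([(sample == k).mean() for k in ks])
expected = stats.dlaplace.pmf(ks, a)
max_abs_err = np.abs(empirical - expected).max()
```

Because every variate uses two draws from the generator, any test that assumes one uniform per sample would not line up with a sampler like this.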
Hope it helps.
-Lucas Roberts
> On Nov 15, 2019, at 8:17 PM, scipy-dev-request at python.org wrote:
>
> Today's Topics:
>
> 1. Optimization to stats.dlaplace.rvs (Andrew Reed)
> 2. Re: Optimization to stats.dlaplace.rvs (Evgeni Burovski)
> 3. Re: Optimization to stats.dlaplace.rvs (Robert Kern)
> 4. Re: Adding logsoftmax function to scipy.special (Ralf Gommers)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 15 Nov 2019 16:00:45 -0500
> From: Andrew Reed <reed at cs.unc.edu>
> To: scipy-dev at python.org
> Subject: [SciPy-Dev] Optimization to stats.dlaplace.rvs
> Message-ID:
> <CAL7O2ZvdAHqJWu8RaMkuUNskxhWp6a40cDMAOrQXVVhGTyQg_Q at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> All,
>
> Some of you may have seen a message I sent to the NumPy mailing list about
> adding a two-sided geometric distribution and/or the comments to my PR on
> Github:
> https://mail.python.org/pipermail/numpy-discussion/2019-November/080223.html
> https://github.com/numpy/numpy/pull/14890
>
> Bottom line, rather than add it as a distribution to NumPy, it was
> suggested that I look into adding it to stats.dlaplace.rvs (which currently
> uses the inverted CDF) and I was provided with some code to get me started.
>
> I have been able to add the suggested code, with only minor tweaks, to
> SciPy. A few tests with timeit seem to confirm that the new code provides
> a speedup of about 250% on my machine. Furthermore, the default rvs
> function would get killed when I tried to generate 100 million samples,
> whereas this new code can generate at least 100 million samples (I get a
> MemoryError on my VM when I try to go any higher).
>
> I think I'm at the point now where I need to start working through some
> broadcasting errors, but before I do, I wanted to gauge the potential
> interest in these improvements.
>
> I imagine that implementing a new method for rvs will probably break the
> repeatability of previous versions of SciPy, and I'm not sure if this is a
> distribution that warrants optimization.
>
> Thanks,
> Andrew
>
> ------------------------------
>
> Message: 2
> Date: Sat, 16 Nov 2019 01:06:23 +0300
> From: Evgeni Burovski <evgeny.burovskiy at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Optimization to stats.dlaplace.rvs
> Message-ID:
> <CAMRo0ivaKjdZPXN=1xiiDGPo9=F=akFRBGkX=f5=5aTQL9XaqQ at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Yes, an efficient implementation of dlaplace._rvs would be in scope, I'd
> think.
>
> On Sat, Nov 16, 2019, 0:01 Andrew Reed <reed at cs.unc.edu> wrote:
>
>> All,
>>
>> Some of you may have seen a message I sent to the NumPy mailing list about
>> adding a two-sided geometric distribution and/or the comments to my PR on
>> Github:
>>
>> https://mail.python.org/pipermail/numpy-discussion/2019-November/080223.html
>> https://github.com/numpy/numpy/pull/14890
>>
>> Bottom line, rather than add it as a distribution to NumPy, it was
>> suggested that I look into adding it to stats.dlaplace.rvs (which currently
>> uses the inverted CDF) and I was provided with some code to get me started.
>>
>> I have been able to add the suggested code, with only minor tweaks, to
>> SciPy. A few tests with timeit seem to confirm that the new code provides
>> a speedup of about 250% on my machine. Furthermore, the default rvs
>> function would get killed when I tried to generate 100 million samples,
>> whereas this new code can generate at least 100 million samples (I get a
>> MemoryError on my VM when I try to go any higher).
>>
>> I think I'm at the point now where I need to start working through some
>> broadcasting errors, but before I do, I wanted to gauge the potential
>> interest in these improvements.
>>
>> I imagine that implementing a new method for rvs will probably break
>> the repeatability of previous versions of SciPy, and I'm not sure if this
>> is a distribution that warrants optimization.
>>
>> Thanks,
>> Andrew
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 15 Nov 2019 17:13:44 -0500
> From: Robert Kern <robert.kern at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Optimization to stats.dlaplace.rvs
> Message-ID:
> <CAF6FJisoCA=42TS64SdQiyKDwjrw+rE_QF+St0Z5ZRYeAZdhGg at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Fri, Nov 15, 2019 at 4:01 PM Andrew Reed <reed at cs.unc.edu> wrote
>
>> I imagine that implementing a new method for rvs will probably break
>> the repeatability of previous versions of SciPy, and I'm not sure if this
>> is a distribution that warrants optimization.
>>
>
> I don't think we've ever made such guarantees for the `rvs()` distribution
> methods. In any case, moving from the inefficient default inversion to a
> reasonably efficient sampling algorithm would win the argument over
> stability. Tweaking an existing efficient algorithm, maybe you could argue
> more in favor of stability, but as you point out, this is a GO/NO-GO kind
> of improvement.
>
> --
> Robert Kern
>
> ------------------------------
>
> Message: 4
> Date: Fri, 15 Nov 2019 17:16:35 -0800
> From: Ralf Gommers <ralf.gommers at gmail.com>
> To: SciPy Developers List <scipy-dev at python.org>
> Subject: Re: [SciPy-Dev] Adding logsoftmax function to scipy.special
> Message-ID:
> <CABL7CQh9MJH8=QmroERcOvgz_TmuueC5oNUZKT9LBV2Aah0Oyw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Takuya,
>
>
>> On Tue, Nov 12, 2019 at 9:05 PM Takuya Koumura <koumura at cycentum.com> wrote:
>>
>> Hello,
>>
>> I raised a GitHub issue (#11058) and it was suggested that I post it to scipy-dev.
>>
>> I'm considering sending a PR to add a logsoftmax function to scipy.special.
>> Before that, I would like to hear your opinion (partly because it's my
>> first time sending a PR to scipy).
>>
>
> Welcome! Thanks for proposing that. logsoftmax is fairly popular at least
> in deep learning, so it makes sense I think, and we already have a bunch of
> other log* functions in scipy.special.
>
> I noticed that both PyTorch and Tensorflow name this function `log_softmax`
> rather than `logsoftmax`. The latter would be a little more consistent with
> other functions (although we also have `special.log_ndtr`), while the
> former is consistent with other implementations of the same functionality.
> I'd be okay with either, with a slight preference for `log_softmax`.
>
>
>> I would like to implement logsoftmax(x) as x-logsumexp(x). Actually,
>> special.softmax(x) = np.exp(x-logsumexp(x)), so it is trivial for those who
>> read the source code of softmax, but I think including logsoftmax as a
>> separate function will be useful for other users. Logsoftmax is more
>> accurate with inputs that make softmax saturate, eg: When x=[1000, 0],
>> np.log(softmax(x))=[0, -Inf] (maybe depending on the floating point
>> precision), while logsoftmax(x)=[0, -1000].
>>
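A quick sketch of the x - logsumexp(x) formulation described above, using the x = [1000, 0] example from the message. The function name `log_softmax_sketch` and the `axis` handling are illustrative assumptions, not the API being proposed for scipy.special.

```python
import numpy as np
from scipy.special import logsumexp, softmax


def log_softmax_sketch(x, axis=None):
    """Hypothetical helper: log(softmax(x)) computed as x - logsumexp(x)."""
    x = np.asarray(x, dtype=float)
    # keepdims=True lets the normaliser broadcast back against x.
    return x - logsumexp(x, axis=axis, keepdims=True)


x = np.array([1000.0, 0.0])

# Naive route: softmax saturates to [1, 0], so the log of the second entry is -inf.
with np.errstate(divide="ignore"):
    naive = np.log(softmax(x))

# Stable route: the subtraction happens in log space, so -1000 survives.
stable = log_softmax_sketch(x)
```

The naive route loses the second entry as soon as exp(-1000) underflows to zero, which is the saturation case the message describes.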
>> I am planning to add the new function at the bottom of
>> special/_logsumexp.py following the softmax function, and add some unit
>> tests in special/test/test_logsumexp.py. If you have comments, I'd
>> appreciate any.
>>
>
> That seems like a good place.
>
> Cheers,
> Ralf
>
>
>> Best wishes,
>> --
>> Takuya KOUMURA
>> koumura at cycentum.com
>
> ------------------------------
>
> End of SciPy-Dev Digest, Vol 193, Issue 12
> ******************************************