[SciPy-Dev] GSoC: Integrate library UNU.RAN into scipy.stats

Christoph Baumgarten christoph.baumgarten at gmail.com
Fri Apr 2 15:44:24 EDT 2021


Hi Tirth,

Great to hear that you are interested in the project! My main goal would be
to add the "universal" random variate generation methods to SciPy, e.g. PINV
and TDR (see the UNU.RAN User Manual:
<http://statmath.wu.ac.at/software/unuran/doc/unuran.html#Methods_005ffor_005fCONT>).
At the moment, SciPy has just one such function (see the SciPy v1.6.2
Reference Guide:
<https://docs.scipy.org/doc/scipy/reference/stats.html#random-variate-generation>),
and it is very basic (I implemented it a while ago). Such functionality is
very useful in many situations; see e.g. "OverflowError when sampling from
some handmade stats distributions", scipy/scipy issue #13051:
<https://github.com/scipy/scipy/issues/13051>. So the API would rather
look like
name_of_sampling_method(pdf / cdf, parameters of the sampling method).
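
To make that API shape concrete, here is a minimal pure-Python sketch of a
function that takes a pdf plus method parameters and returns variates. It is
only an illustration under stated assumptions: the name
sample_by_numerical_inversion and its parameters (a, b, n_grid, rng) are
hypothetical, and the algorithm is a plain tabulated inversion with linear
interpolation, not PINV, TDR, or any other UNU.RAN method.

```python
import bisect
import math
import random


def sample_by_numerical_inversion(pdf, a, b, size, n_grid=2000, rng=None):
    """Hypothetical helper: draw `size` variates from a pdf supported on [a, b].

    The pdf does not need to be normalized.
    """
    rng = rng or random.Random()
    # Tabulate the pdf on an equidistant grid over [a, b].
    step = (b - a) / n_grid
    xs = [a + i * step for i in range(n_grid + 1)]
    ys = [pdf(x) for x in xs]
    # Cumulative trapezoidal integration gives an unnormalized CDF table.
    cdf = [0.0]
    for i in range(n_grid):
        cdf.append(cdf[-1] + 0.5 * (ys[i] + ys[i + 1]) * step)
    total = cdf[-1]
    samples = []
    for _ in range(size):
        # Inversion: pick a uniform level and find where the CDF reaches it.
        target = rng.random() * total
        j = bisect.bisect_right(cdf, target) - 1
        j = min(max(j, 0), n_grid - 1)
        width = cdf[j + 1] - cdf[j]
        # Interpolate linearly inside the grid cell.
        frac = (target - cdf[j]) / width if width > 0 else 0.0
        samples.append(xs[j] + frac * step)
    return samples


# Usage: an unnormalized standard normal density truncated to [-5, 5].
draws = sample_by_numerical_inversion(
    lambda x: math.exp(-0.5 * x * x), -5.0, 5.0, size=1000
)
```

The point of the sketch is the calling convention: the user supplies only a
density and a few method parameters, and no distribution object is involved.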

Whether one should add a keyword to distribution.rvs(...) that lets the
user choose the sampling method might be a question for a follow-up
project. It would also be quite time-consuming, since one needs to verify
which method is appropriate for each distribution. A simpler task could
be to check whether the rvs method of a specific distribution could be
overridden with the corresponding method in UNU.RAN (see the UNU.RAN User
Manual:
<http://statmath.wu.ac.at/software/unuran/doc/unuran.html#Stddist>).
For example, geninvgauss in SciPy relies on a Python implementation of a
rejection method / ratio-of-uniforms (RoU), and the implementation in
UNU.RAN (gig / gig2) might be faster. Distributions with slow ppf methods
that rely on special functions would also be natural candidates. But that
would be of lower priority for me.
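
For readers unfamiliar with the ratio-of-uniforms idea mentioned above, here
is a minimal pure-Python sketch. It is not the geninvgauss code in SciPy and
not the gig / gig2 code in UNU.RAN; the function name ratio_of_uniforms and
its parameters are hypothetical. The caller is assumed to supply valid
bounds: umax >= sup sqrt(pdf(x)), and vmin / vmax bounding x * sqrt(pdf(x))
from below / above.

```python
import math
import random


def ratio_of_uniforms(pdf, umax, vmin, vmax, size, rng=None):
    """Hypothetical helper: sample from an (unnormalized) pdf via ratio-of-uniforms."""
    rng = rng or random.Random()
    samples = []
    while len(samples) < size:
        # Propose a point uniformly in the rectangle [0, umax] x [vmin, vmax].
        u = rng.random() * umax
        v = vmin + rng.random() * (vmax - vmin)
        if u == 0.0:
            continue  # avoid division by zero
        x = v / u
        # Accept if (u, v) lies in the region {u^2 <= pdf(v / u)}.
        if u * u <= pdf(x):
            samples.append(x)
    return samples


# Usage: unnormalized standard normal density exp(-x^2 / 2).
# sup sqrt(pdf) = 1, and x * sqrt(pdf(x)) is bounded by +/- sqrt(2 / e).
bound = math.sqrt(2.0 / math.e)
draws = ratio_of_uniforms(
    lambda x: math.exp(-0.5 * x * x), 1.0, -bound, bound, size=1000
)
```

Each accepted point costs a Python-level loop iteration and a pdf evaluation,
which illustrates why a compiled implementation might be faster.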

I hope this helps. Feel free to reach out if you have more questions.

Christoph

On Fri, Apr 2, 2021 at 6:00 PM <scipy-dev-request at python.org> wrote:

> Today's Topics:
>
>    1. Multivariate non-central hypergeometric distributions
>       (Wallenius' and Fisher's) (Ivan Naydonov)
>    2. GSoC: Integrate library UNU.RAN into scipy.stats (Tirth Patel)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 2 Apr 2021 00:05:10 +0200
> From: Ivan Naydonov <samogot at gmail.com>
> To: scipy-dev at python.org
> Subject: [SciPy-Dev] Multivariate non-central hypergeometric
>         distributions (Wallenius' and Fisher's)
> Message-ID: <CAMJZOa0xemJR7WCjcMcEyGUKvqMqPF6YDtLxbb+r2e4d1k-HDA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi everyone.
>
> Univariate versions of non-central hypergeometric distributions based
> on Agner Fog's BiasedUrn C++ code were added recently (in
> https://github.com/scipy/scipy/pull/13330). C++ code added in that PR
> already contains the implementation of multivariate versions of the same
> distributions. As far as I understand, the only things needed for the
> multivariate distributions to work are a Python wrapper and probably some
> tests.
>
> Is anyone interested in adding them? If not, I might get to it myself later
> this month, but as I haven't made any SciPy contributions yet and am not
> familiar with the codebase, I will need much more time to ramp up than an
> experienced contributor :)
>
> --
> Regards,
> Ivan Naydonov
>
> ------------------------------
>
> Message: 2
> Date: Fri, 2 Apr 2021 19:19:26 +0530
> From: Tirth Patel <tirthasheshpatel at gmail.com>
> To: scipy-dev <scipy-dev at python.org>
> Subject: [SciPy-Dev] GSoC: Integrate library UNU.RAN into scipy.stats
> Message-ID:
>         <CABpuv38XtcJWOT6kskF_Rv3T=_0iSoNCVr7gtnupL0kGQixfWg at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi all,
>
> I would like to participate in GSoC this year and found this project
> very interesting!
>
> TL;DR: I have a few questions regarding the project:
>   - Should the user interface be a separate Python submodule
> (inside `scipy.stats`), or should it be an extension of the `rvs`
> method?
>   - Should the UNU.RAN C library be included as a submodule within SciPy
> (e.g. gh-12043) or be cloned from a separate GitHub submodule (e.g.
> gh-13328)?
>
> About Me
> ********
> I am Tirth (@tirthasheshpatel on GitHub), a third-year computer
> science undergrad student. I am quite familiar with Cython and a lot
> of my college courses make use of C. I have a good knowledge of
> probability theory and statistics.
>
> Open Source work: I participated in GSoC with the PyMC team last
> year. I have been a contributor to SciPy since May 2020 and recently
> became a maintainer.
>
> About Project
> *************
> I have a question about the project. Should the user interface be a
> separate Python submodule inside `scipy.stats`, like this:
>
>     import scipy.stats as stats
>
>     # sample 1000 variates from a normal distribution
>     # with mean 0 and std 1.5; let UNU.RAN choose the method
>     rvs = stats.random.normal(0., 1.5, size=1000, method='auto')
>
>     # draw 100 samples from the beta distribution using the TDR method
>     beta_rvs = stats.random.beta(1, 2, size=100, method='tdr')
>
>     # the existing `rvs` method remains unaffected
>     norm_rvs = stats.norm.rvs(0, 1.5, size=1000)
>
> Or should it serve as an extension of the `rvs` method:
>
>     from scipy.stats import norm, beta
>
>     # something like this:
>     # method = None => same behaviour as previous versions
>     # method = 'auto' => use UNU.RAN and let it choose the method
>     rvs = norm.rvs(0, 1.5, size=1000, method='auto')
>
>     beta_rvs = beta.rvs(1, 2, size=100, method='tdr')
>
> Also, should the UNU.RAN C library be included as a submodule within SciPy
> (e.g. gh-12043) or be cloned from a separate GitHub submodule (e.g.
> gh-13328)?
>
>
> --
> Kind Regards,
> Tirth
>
>