[SciPy-Dev] GSoC: Integrate library UNU.RAN into scipy.stats

Hanno Klemm h.klemm at gmx.de
Sun Apr 4 11:51:41 EDT 2021


Hi Christoph, Tirth,

This sounds like an interesting project. However, when I look at the UNU.RAN documentation, it seems to be licensed under the GPL. I always thought that the GPL is incompatible with SciPy’s license?

Kind regards,
Hanno

> On 2. Apr 2021, at 21:44, Christoph Baumgarten <christoph.baumgarten at gmail.com> wrote:
> 
> 
> 
> Hi Tirth,
> 
> great to hear that you are interested in the project! My main goal would be to add the "universal" random variate generation methods to SciPy, e.g. PINV and TDR (see the UNU.RAN User Manual, wu.ac.at). At the moment, SciPy has just one such function (see "Statistical functions (scipy.stats)" in the SciPy v1.6.2 Reference Guide), and it is very basic (I implemented it a while ago). Such functionality is very useful in many situations; see e.g. scipy/scipy issue #13051 on GitHub, "OverflowError when sampling from some handmade stats distributions". So the API would rather be name_of_sampling_method(pdf / cdf, parameters of the sampling method).
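
A minimal sketch of this API shape, for illustration only: a generic sampler that takes an (unnormalized) pdf plus the parameters of the sampling method, here a plain ratio-of-uniforms rejection scheme in pure Python. The name ratio_of_uniforms_sample and its signature are hypothetical; this is neither UNU.RAN's interface nor the existing SciPy function mentioned above.

```python
import math
import random


def ratio_of_uniforms_sample(pdf, umax, vmin, vmax, size, rng=None):
    """Draw `size` variates from an (unnormalized) density `pdf`.

    Hypothetical helper: (u, v) is drawn uniformly from the rectangle
    [0, umax] x [vmin, vmax] and accepted when u**2 <= pdf(v / u);
    the accepted ratio v / u is then distributed according to `pdf`.
    The caller must supply bounds with umax >= sup sqrt(pdf(x)) and
    vmin <= x * sqrt(pdf(x)) <= vmax for all x.
    """
    rng = rng or random.Random()
    out = []
    while len(out) < size:
        u = rng.uniform(0.0, umax)
        v = rng.uniform(vmin, vmax)
        # Skip u == 0 to avoid dividing by zero.
        if u > 0.0 and u * u <= pdf(v / u):
            out.append(v / u)
    return out


def normal_pdf(x):
    # Standard normal density without its normalizing constant.
    return math.exp(-0.5 * x * x)


# For exp(-x**2 / 2): sup sqrt(pdf) = 1 and sup |x| * sqrt(pdf) = sqrt(2 / e).
vbound = math.sqrt(2.0 / math.e)
samples = ratio_of_uniforms_sample(
    normal_pdf, 1.0, -vbound, vbound, size=10000, rng=random.Random(12345)
)
```

The point of the shape is that the caller only hands over a density and a few method parameters; nothing in the sampler is specific to the normal distribution.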
> 
> Whether one should add a keyword to distribution.rvs(...) that allows the user to choose the sampling method might be a question for a follow-up project. This would also be quite time-consuming, since one needs to verify which method is appropriate for a given distribution. A simpler task could be to check whether the rvs method of a specific distribution could be overridden with the corresponding method in UNU.RAN (see the UNU.RAN User Manual, wu.ac.at). For example, geninvgauss in SciPy relies on a Python implementation of a rejection method / ratio-of-uniforms (RoU), and the implementation in UNU.RAN (gig / gig2) might be faster. Distributions with slow ppf methods that rely on special functions would also be natural candidates. But that would be of lower priority for me.
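
To illustrate why a slow ppf makes a distribution a natural candidate: the cost of inverting the cdf can be paid once in a setup step, after which each draw is a table lookup. The sketch below uses plain linear interpolation on a fixed grid, which is a much cruder stand-in for UNU.RAN's PINV (polynomial interpolation based inversion). The function names and the truncation interval [a, b] are assumptions made for this example.

```python
import bisect
import math
import random


def build_inverse_cdf_table(cdf, a, b, n_points=2001):
    """Tabulate `cdf` on an even grid over [a, b] (hypothetical helper).

    The cdf values are rescaled so that the table runs from exactly 0
    to exactly 1, i.e. the distribution is truncated to [a, b].
    """
    xs = [a + (b - a) * i / (n_points - 1) for i in range(n_points)]
    cs = [cdf(x) for x in xs]
    c0, c1 = cs[0], cs[-1]
    us = [(c - c0) / (c1 - c0) for c in cs]
    return xs, us


def sample_from_table(xs, us, size, rng):
    """Inverse-transform sampling by linear interpolation in the table."""
    out = []
    for _ in range(size):
        u = rng.random()  # uniform on [0, 1)
        # First grid index whose cdf value is strictly greater than u,
        # so that us[i - 1] <= u < us[i].
        i = bisect.bisect_right(us, u)
        t = (u - us[i - 1]) / (us[i] - us[i - 1])
        out.append(xs[i - 1] + t * (xs[i] - xs[i - 1]))
    return out


def expon_cdf(x):
    # cdf of the exponential distribution with rate 1.
    return 1.0 - math.exp(-x)


xs, us = build_inverse_cdf_table(expon_cdf, 0.0, 30.0)
draws = sample_from_table(xs, us, size=20000, rng=random.Random(2021))
```

The cdf is evaluated only n_points times during setup, no matter how many variates are drawn afterwards.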
> 
> I hope it helps. Feel free to reach out if you have more questions.
> 
> Christoph
> 
>> On Fri, Apr 2, 2021 at 6:00 PM <scipy-dev-request at python.org> wrote:
>> 
>> Today's Topics:
>> 
>>    1. Multivariate non-central hypergeometric distributions
>>       (Wallenius' and Fisher's) (Ivan Naydonov)
>>    2. GSoC: Integrate library UNU.RAN into scipy.stats (Tirth Patel)
>> 
>> 
>> ----------------------------------------------------------------------
>> 
>> Message: 1
>> Date: Fri, 2 Apr 2021 00:05:10 +0200
>> From: Ivan Naydonov <samogot at gmail.com>
>> To: scipy-dev at python.org
>> Subject: [SciPy-Dev] Multivariate non-central hypergeometric
>>         distributions (Wallenius' and Fisher's)
>> Message-ID:
>>         <CAMJZOa0xemJR7WCjcMcEyGUKvqMqPF6YDtLxbb+r2e4d1k-HDA at mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>> 
>> Hi everyone.
>> 
>> Univariate versions of non-central hypergeometric distributions based
>> on Agner Fog's BiasedUrn C++ code were added recently (in
>> https://github.com/scipy/scipy/pull/13330). C++ code added in that PR
>> already contains the implementation of multivariate versions of the same
>> distributions. As far as I understand, the only things needed for
>> multivariate distributions to work are Python wrapper and probably some
>> tests.
>> 
>> Is anyone interested in adding them? If not, I might get to it myself later
>> this month, but as I haven't made any scipy contributions yet and am not
>> familiar with the codebase, I will need much more time to ramp up than an
>> experienced contributor :)
>> 
>> --
>> Regards,
>> Ivan Naydonov
>> 
>> ------------------------------
>> 
>> Message: 2
>> Date: Fri, 2 Apr 2021 19:19:26 +0530
>> From: Tirth Patel <tirthasheshpatel at gmail.com>
>> To: scipy-dev <scipy-dev at python.org>
>> Subject: [SciPy-Dev] GSoC: Integrate library UNU.RAN into scipy.stats
>> Message-ID:
>>         <CABpuv38XtcJWOT6kskF_Rv3T=_0iSoNCVr7gtnupL0kGQixfWg at mail.gmail.com>
>> Content-Type: text/plain; charset="UTF-8"
>> 
>> Hi all,
>> 
>> I would like to participate in GSoC this year and found this project
>> very interesting!
>> 
>> TL;DR: I have a few questions regarding the project:
>>   - Should the user interface be a separate Python submodule
>> (inside `scipy.stats`), or should it be an extension of the `rvs`
>> method?
>>   - Should the UNU.RAN C library be included as a submodule within SciPy
>> (e.g. gh-12043) or be cloned from a separate GitHub submodule (e.g.
>> gh-13328)?
>> 
>> About Me
>> ********
>> I am Tirth (@tirthasheshpatel on GitHub), a third-year computer
>> science undergrad student. I am quite familiar with Cython and a lot
>> of my college courses make use of C. I have a good knowledge of
>> probability theory and statistics.
>> 
>> Open Source work: I participated in GSoC with the PyMC team last
>> year. I have been a contributor to SciPy since May 2020 and recently
>> became a maintainer.
>> 
>> About Project
>> *************
>> I had a question about the project. Should the user interface be a
>> separate Python submodule inside `scipy.stats`? For example:
>> 
>>     import scipy.stats as stats
>> 
>>     # sample 1000 variates from a normal distribution
>>     # with mean 0 and std 1.5. Let UNU.RAN choose the method
>>     rvs = stats.random.normal(0., 1.5, size=1000, method='auto')
>> 
>>     # sample 100 samples from the beta distribution using TDR method
>>     beta_rvs = stats.random.beta(1, 2, size=100, method='tdr')
>> 
>>     # the `rvs` method remains unaffected.
>>     norm_rvs = stats.norm.rvs(0, 1.5, size=1000)
>> 
>> Or does it serve as an extension of the `rvs` method:
>> 
>>     from scipy.stats import norm, beta
>> 
>>     # something like this:
>>     # method = None => same behaviour as previous versions
>>     # method = 'auto' => use UNU.RAN and let it choose the method
>>     rvs = norm.rvs(0, 1.5, size=1000, method='auto')
>> 
>>     beta_rvs = beta.rvs(1, 2, size=100, method='tdr')
>> 
>> Also, should the UNU.RAN C library be included as a submodule within SciPy
>> (e.g. gh-12043) or be cloned from a separate GitHub submodule (e.g.
>> gh-13328)?
>> 
>> 
>> --
>> Kind Regards,
>> Tirth
>> 
>> 
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev