[SciPy-Dev] GSoC: Integrating library UNU.RAN in SciPy

Ralf Gommers ralf.gommers at gmail.com
Tue Jun 8 13:43:41 EDT 2021


On Tue, Jun 8, 2021 at 6:53 PM Tirth Patel <tirthasheshpatel at gmail.com>
wrote:

> Hi all,
>
> I have been working on integrating UNU.RAN in scipy.stats for a few weeks
> on my fork (https://github.com/tirthasheshpatel/scipy/pull/6) with my
> mentors, Christoph and Nicholas. The design is still fairly basic but is
> almost ready for initial reviews. Also, it builds on Windows, macOS, and
> Linux. Tests pass with an exception of unrelated failures on Linux. I am
> planning to propose a PR on SciPy this week but before I do that, I thought
> it would be a good idea to summarize the progress here.
>

Thanks for the nice overview Tirth!


> The methods I propose to add and details regarding the parameters they
> take can be found in this excel sheet:
> https://docs.google.com/spreadsheets/d/1r36HypXwpit7sHt9YAe3K7yPL7HNNuQjTkbVYQIJ4xI/edit?usp=sharing.
> Feel free to comment on the sheet if you have any suggestions.
>
> The API looks something like this:
>
>     import numpy as np
>     from scipy.stats import TransformedDensityRejection
>
>     pdf = lambda x: 1 - x*x
>     dpdf = lambda x: -2*x
>
>     urng = np.random.default_rng(123)
>     rng = TransformedDensityRejection(pdf, dpdf, domain=(-1,1), seed=urng)
>
>     # sample a 100,000 samples from the distribution
>     rvs = rng.rvs(100_000)
>
>     # evaluate the upper bound of the PPF at some points in the domain
>     u = np.linspace(0, 1, num=1000)
>     ppf_hat = rng.ppf_hat(u)
>
> Internal API
> ------------
>
> For Python callbacks, I have used the `ccallback` API and used
> setjmp/longjmp for error handling. Using `ctypes` gives similar performance
> but I wasn't able to figure out a way to handle errors occurring in UNU.RAN
> and in the callbacks. Here is the C code I wrote to acquire and release the
> callbacks:
> https://github.com/tirthasheshpatel/scipy/blob/unuran_c_thunk/scipy/stats/unuran/unuran_callback.h
>
> For more details on the internal design of the API, I have attempted to
> create a few flowcharts on draw.io but they are very much a WIP:
> https://drive.google.com/file/d/1TH70SSvvc5eF6-YmDO8kFNvLqKaGROW-/view?usp=sharing
>
> Other Notes
> -----------
>
> I think NumPy 1.19 introduced a stable interface for its Generator API.
> Before that the RandomState API was stable. But RandomState doesn't have a
> Cython Interface and its Python interface is very slow to be practically
> useful. For instance, it takes around 7ms to sample 100,000 samples from
> the normal distribution using the TDR method on NumPy >= 1.19 while it
> takes around 911ms on NumPy == 1.18 which is around 130 times slower! Is
> there a plan to drop 1.16 soon and can we use the unstable Generator API
> from 1.17 onwards or would it too unsafe? Maybe this discussion isn't
> suited here but I just thought to put it out as a note.
>

We can drop NumPy 1.16 right now. I'm not sure if the 1.17 C API for
numpy.random is usable - it was either missing some features or not present
at all. What the recently added biasedurn does is a conditional compile -
only use that C API for NumPy >= 1.19. If the performance on <1.19 isn't
completely unusable, that may be a good option?

Cheers,
Ralf


>
> Anyways, please feel free to comment on the design, naming convention, etc
> and suggest changes/ideas that would be good to incorporate before
> proposing a PR on SciPy. Thanks for reading this!
>
>
> --
> Kind Regards,
> Tirth Patel
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scipy-dev/attachments/20210608/418bfb12/attachment-0001.html>


More information about the SciPy-Dev mailing list