[SciPy-Dev] GSoC: Integrating library UNU.RAN in SciPy

Tirth Patel tirthasheshpatel at gmail.com
Wed Jun 9 06:15:19 EDT 2021


Hi Ralf,

On Tue, Jun 8, 2021 at 11:14 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Tue, Jun 8, 2021 at 6:53 PM Tirth Patel <tirthasheshpatel at gmail.com>
> wrote:
>
>> Hi all,
>>
>> I have been working on integrating UNU.RAN in scipy.stats for a few weeks
>> on my fork (https://github.com/tirthasheshpatel/scipy/pull/6) with my
>> mentors, Christoph and Nicholas. The design is still fairly basic but is
>> almost ready for initial reviews. Also, it builds on Windows, macOS, and
>> Linux. Tests pass with an exception of unrelated failures on Linux. I am
>> planning to propose a PR on SciPy this week but before I do that, I thought
>> it would be a good idea to summarize the progress here.
>>
>
> Thanks for the nice overview Tirth!
>
>
>> The methods I propose to add and details regarding the parameters they
>> take can be found in this excel sheet:
>> https://docs.google.com/spreadsheets/d/1r36HypXwpit7sHt9YAe3K7yPL7HNNuQjTkbVYQIJ4xI/edit?usp=sharing.
>> Feel free to comment on the sheet if you have any suggestions.
>>
>> The API looks something like this:
>>
>>     import numpy as np
>>     from scipy.stats import TransformedDensityRejection
>>
>>     pdf = lambda x: 1 - x*x
>>     dpdf = lambda x: -2*x
>>
>>     urng = np.random.default_rng(123)
>>     rng = TransformedDensityRejection(pdf, dpdf, domain=(-1,1), seed=urng)
>>
>>     # sample a 100,000 samples from the distribution
>>     rvs = rng.rvs(100_000)
>>
>>     # evaluate the upper bound of the PPF at some points in the domain
>>     u = np.linspace(0, 1, num=1000)
>>     ppf_hat = rng.ppf_hat(u)
>>
>> Internal API
>> ------------
>>
>> For Python callbacks, I have used the `ccallback` API and used
>> setjmp/longjmp for error handling. Using `ctypes` gives similar performance
>> but I wasn't able to figure out a way to handle errors occurring in UNU.RAN
>> and in the callbacks. Here is the C code I wrote to acquire and release the
>> callbacks:
>> https://github.com/tirthasheshpatel/scipy/blob/unuran_c_thunk/scipy/stats/unuran/unuran_callback.h
>>
>> For more details on the internal design of the API, I have attempted to
>> create a few flowcharts on draw.io but they are very much a WIP:
>> https://drive.google.com/file/d/1TH70SSvvc5eF6-YmDO8kFNvLqKaGROW-/view?usp=sharing
>>
>> Other Notes
>> -----------
>>
>> I think NumPy 1.19 introduced a stable interface for its Generator API.
>> Before that the RandomState API was stable. But RandomState doesn't have a
>> Cython Interface and its Python interface is very slow to be practically
>> useful. For instance, it takes around 7ms to sample 100,000 samples from
>> the normal distribution using the TDR method on NumPy >= 1.19 while it
>> takes around 911ms on NumPy == 1.18 which is around 130 times slower! Is
>> there a plan to drop 1.16 soon and can we use the unstable Generator API
>> from 1.17 onwards or would it too unsafe? Maybe this discussion isn't
>> suited here but I just thought to put it out as a note.
>>
>
> We can drop NumPy 1.16 right now. I'm not sure if the 1.17 C API for
> numpy.random is usable - it was either missing some features or not present
> at all.
>

Nice to hear that we don't need to support 1.16 now! With that, I think
there is a possibility of using the Cython API of NumPy BitGenerator to
speed things up. I checked out a few releases on NumPy and found out
BitGenerator was added in 1.17 with a Cython API. All we need is the
`next_double` and `state` member of the `bitgen_t` object which are present
from 1.17 onwards. The only difference is that 1.17 contains the `bitgen_t`
in `numpy.random.common` module while it was shifted to `numpy.random` from
1.18 onwards. I don't know if there are any known bugs in 1.17 and 1.18
before becoming stable in 1.19. If not, we might be able to use the
unstable NumPy Generator in 1.17 and 1.18 for our purpose. What do you
think?

What the recently added biasedurn does is a conditional compile - only use
> that C API for NumPy >= 1.19. If the performance on <1.19 isn't completely
> unusable, that may be a good option?
>

Yes, that's what I do right now. But I am just worried if the performance
on <1.19 is too slow to practically rely on. Anyways, thanks for looking
into this!

Cheers,
> Ralf
>
>
>>
>> Anyways, please feel free to comment on the design, naming convention,
>> etc and suggest changes/ideas that would be good to incorporate before
>> proposing a PR on SciPy. Thanks for reading this!
>>
>>
>> --
>> Kind Regards,
>> Tirth Patel
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at python.org
>> https://mail.python.org/mailman/listinfo/scipy-dev
>>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at python.org
> https://mail.python.org/mailman/listinfo/scipy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.python.org/pipermail/scipy-dev/attachments/20210609/c7bd9e17/attachment-0001.html>


More information about the SciPy-Dev mailing list