[SciPy-Dev] Adding a convenient method to create ufuncs for scipy.stats

Ralf Gommers ralf.gommers at gmail.com
Sat Mar 25 21:21:07 EDT 2017


On Fri, Mar 17, 2017 at 5:20 PM, Warren Weckesser <
warren.weckesser at gmail.com> wrote:

>
>
> On Thu, Mar 16, 2017 at 4:14 PM, Robert Kern <robert.kern at gmail.com>
> wrote:
>
>> On Thu, Mar 16, 2017 at 12:39 PM, Warren Weckesser <
>> warren.weckesser at gmail.com> wrote:
>> >
>> > On Thu, Mar 16, 2017 at 3:19 PM, Warren Weckesser <
>> warren.weckesser at gmail.com> wrote:
>> >>
>> >> I'm working on an update to the Frechet distribution in scipy.stats
>> (see https://github.com/scipy/scipy/issues/3258 and
>> https://github.com/scipy/scipy/pull/3275).
>> >>
>> >> Instead of jumping through the "lazy_where" hoops that are required for
>> conditional computations, it would be much easier to create a ufunc for the
>> standard PDF, CDF and possibly other required functions.  Easier, that is,
>> if I use the ufunc generation tools that we have over in scipy.special.
>> Would there be any objections to this?  We already have quite a few
>> functions for probability distributions in scipy.special:
>> https://docs.scipy.org/doc/scipy/reference/special.html#raw-statistical-functions
>> >>
>> >> I wouldn't mind creating ufuncs for some of the other distributions,
>> too.  A ufunc implementation is more efficient, simplifies the code in
>> scipy.stats, and automatically handles broadcasting.
>> >>
>> >> I'm bringing this up here to see if anyone has any objections to the
>> expansion of the statistical functions in scipy.special.
>> >>
>> >> Warren
>> >
>> > In my previous email, the heading hints at an alternative that I didn't
>> mention in the text.  The question implied in the heading is: what do folks
>> think about adding ufunc generation tools to scipy.stats, instead of
>> generating the ufuncs in scipy.special.  There are a lot of conditional
>> computations in scipy.stats that would benefit from being implemented as
>> ufuncs, but probably don't need to be public functions.  So instead of
>> adding more functions to scipy.special, perhaps we could add code in
>> scipy.stats for generating ufuncs, many of which would be private.  Of
>> course, we could just generate private ufuncs in scipy.special, and only
>> use them in scipy.stats.
>>
>
Switching from lazywhere to ufuncs looks good. If the ufuncs remain
private, I don't have much of a preference for where they live. Putting
the ufunc generation machinery in scipy._lib may be useful long term for
other purposes as well; in that case these ufuncs could live in
scipy.stats.
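
To make the comparison concrete, here is a minimal sketch in Python, not
the code from the PR. It assumes the standard Frechet PDF
f(x; c) = c * x**(-c-1) * exp(-x**(-c)) for x > 0 and 0 otherwise. The
names lazy_where, frechet_pdf_masked and frechet_pdf_ufunc are
hypothetical, and np.frompyfunc is only a pure-Python stand-in for a
compiled ufunc: it gives the broadcasting behaviour but none of the speed.

```python
import math

import numpy as np


def lazy_where(cond, arrays, f, fillvalue):
    """Evaluate f only where cond is True; use fillvalue elsewhere.

    Hypothetical stand-in for the private masking helper in scipy.
    """
    # Broadcast the condition and all arguments to a common shape.
    cond, *arrays = np.broadcast_arrays(cond, *arrays)
    out = np.full(cond.shape, fillvalue, dtype=float)
    # Apply f to the selected elements only, then scatter them back.
    out[cond] = f(*(a[cond] for a in arrays))
    return out


def frechet_pdf_masked(x, c):
    """Masking style: the formula sits in a callback, far from the condition."""
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    return lazy_where(
        x > 0, (x, c),
        lambda x, c: c * x ** (-c - 1) * np.exp(-x ** (-c)),
        0.0,
    )


def _frechet_pdf_scalar(x, c):
    """Scalar kernel: the conditional is an ordinary if statement."""
    if x <= 0:
        return 0.0
    return c * x ** (-c - 1) * math.exp(-x ** (-c))


# frompyfunc wraps the scalar kernel so that it broadcasts like a ufunc.
_frechet_pdf_obj = np.frompyfunc(_frechet_pdf_scalar, 2, 1)


def frechet_pdf_ufunc(x, c):
    # frompyfunc returns object arrays, so convert the result to float.
    return np.asarray(_frechet_pdf_obj(x, c), dtype=float)
```

Both versions return the same values for broadcastable x and c. In the
second one the conditional lives inside the scalar kernel, which is the
part that would be written in Cython, C or C++ for a real ufunc.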


>
>> +1 for adding more standard PDF/CDF functions to scipy.special
>> as needed.
>>
>
If we did that for all or most distributions, that would be several
hundred more functions. I don't think those should all be added to the
scipy.special namespace; it would become too large. Admittedly it's
already a mess, but let's not make it worse.

Not sure this is too relevant though - we first need to decide on public vs
private. Currently I don't see the point in exposing frechet_pdf and
frechet_cdf as public functions. They'll get very limited use, and they
don't add much over using stats.frechet.pdf/cdf. So my vote is for keeping
them private.
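
As a rough illustration of what keeping them private could look like,
here is a toy sketch. The class Frechet and the helpers _frechet_pdf and
_frechet_cdf are hypothetical names, np.vectorize stands in for a real
compiled ufunc, and the formulas assume the standard Frechet distribution
with shape parameter c.

```python
import math

import numpy as np


def _pdf_scalar(x, c):
    # Standard Frechet PDF; zero outside the support x > 0.
    return c * x ** (-c - 1) * math.exp(-x ** (-c)) if x > 0 else 0.0


def _cdf_scalar(x, c):
    # Standard Frechet CDF; zero outside the support x > 0.
    return math.exp(-x ** (-c)) if x > 0 else 0.0


# Private, underscore-prefixed helpers: not part of any public namespace.
_frechet_pdf = np.vectorize(_pdf_scalar, otypes=[float])
_frechet_cdf = np.vectorize(_cdf_scalar, otypes=[float])


class Frechet:
    """Toy stand-in for a distribution object; only pdf and cdf are public."""

    def pdf(self, x, c):
        return _frechet_pdf(x, c)

    def cdf(self, x, c):
        return _frechet_cdf(x, c)


frechet = Frechet()
```

Users would call frechet.pdf(x, c) and frechet.cdf(x, c) as before, and
the helpers could later be renamed or moved without any API change.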

Ralf



>
>> There's already precedent for putting statistics-related but not
>> distribution-related ufuncs into scipy.special, specifically for the
>> conditional operations, e.g. boxcox(). On the other hand, if the functions
>> you are thinking of would not be part of the public API, then I'd prefer to
>> implement them in scipy.stats instead of scipy.special.
>>
>> What work do you think is entailed in implementing the ufuncs in
>> scipy.stats? Is there infrastructure we need to duplicate? Can we abstract
>> out the build infrastructure to a common place? I haven't looked at the
>> build details for scipy.special in some time.
>>
>
>
> The code that generates the ufunc boilerplate code is in
> scipy/special/generate_ufuncs.py.  It generates the appropriate wrapper
> code for a core scalar function that is written in Cython, C or C++.  I
> just submitted a pull request (https://github.com/scipy/scipy/pull/7190,
> still WIP) in which I wrote the core distribution functions for the Frechet
> distribution in Cython, added the signature information to the big honkin'
> FUNCS string in generate_ufuncs.py, added placeholders for the docstrings
> in add_newdocs.py, and then used the ufuncs in the implementation of the
> `frechet` class in stats.
>
> For the moment, the Frechet distribution ufuncs are in scipy.special, and
> they are private, but a trivial change will make them public, if there is
> interest.  I don't have a strong opinion either way, but as you say, there
> is a precedent for including them as public functions in scipy.special.
> If we start converting existing distribution implementations (which I think
> would be a good thing for the stats code), we'll end up with a *lot* more
> functions being added somewhere.
>
> Warren
>
>
>
>
>>
>> --
>> Robert Kern
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> https://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>>
>