[SciPy-Dev] Scipy 1.0 roadmap - stats

josef.pktd at gmail.com
Sun Sep 22 06:46:31 EDT 2013


On Sun, Sep 22, 2013 at 4:52 AM, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
>
> On Sun, Sep 22, 2013 at 7:52 AM, Christopher Jordan-Squire <cjordan1 at uw.edu>
> wrote:
>>
>> For scipy stats, is there anything on the table regarding somehow
>> unifying the sampling in numpy.random and the distributions in
>> scipy.stats? I'm specifically thinking of two issues:
>>
>> (1) There's a lot of duplication between numpy.random and scipy.stats
>> but with different interfaces. This seems like something that ideally
>> would be reduced.
>
>
> numpy.random only provides sampling and only has about half the
> distributions of scipy.stats. Sampling is really only a small part of what
> scipy.stats provides (pdf, cdf, moments, fitting a distribution, etc.). So
> I'm not bothered by that duplication. If we wanted to reduce it, I think it
> would have to be removed from numpy, which doesn't sound like a good idea.

There is no code duplication, so it has never bothered me either.
The scipy.stats distributions have quite a bit more overhead than
numpy.random if you call the random number generation repeatedly
instead of requesting one big array.
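
A rough way to see that overhead is to time the two calling patterns
against each other. This is only a sketch: the sample size and the
helper function names are made up for the illustration, and the actual
timings depend on the machine and on the numpy and scipy versions.

```python
import timeit

import numpy as np
from scipy import stats

N_DRAWS = 500  # arbitrary size chosen for this illustration


def scipy_one_at_a_time():
    # One rvs call per draw: the per-call overhead is paid N_DRAWS times.
    return np.array([stats.norm.rvs(loc=0.0, scale=1.0) for _ in range(N_DRAWS)])


def scipy_one_big_array():
    # A single rvs call that returns all draws at once.
    return stats.norm.rvs(loc=0.0, scale=1.0, size=N_DRAWS)


def numpy_one_at_a_time():
    # Same loop as the first function, but calling numpy.random directly.
    return np.array([np.random.normal(0.0, 1.0) for _ in range(N_DRAWS)])


for func in (scipy_one_at_a_time, scipy_one_big_array, numpy_one_at_a_time):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__:22s} {seconds:.4f} s")
```

All three functions return an array of the same shape; only the cost of
getting there differs.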

There are some naming inconsistencies between scipy.stats and
numpy.random, but I have never looked for them systematically, and
there is no open issue. The distributions in scipy.stats are restricted
in their choice of parameterization because of the generic use of loc
and scale.
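
The lognormal distribution is one example where both the name and the
parameterization differ. The sketch below assumes the documented
correspondence: numpy.random takes the mean and sigma of log(x), while
scipy.stats.lognorm takes sigma as the shape parameter s and exp(mean)
as the generic scale. The values of mu and sigma are arbitrary.

```python
import numpy as np
from scipy import stats

mu, sigma = 1.0, 0.5  # mean and std of log(x); arbitrary example values

# numpy.random: parameterized directly by the underlying normal.
rng = np.random.RandomState(0)
samples = rng.lognormal(mean=mu, sigma=sigma, size=100000)

# scipy.stats: sigma becomes the shape parameter, exp(mu) the scale.
dist = stats.lognorm(s=sigma, loc=0.0, scale=np.exp(mu))

print("sample median:", np.median(samples))
print("dist median:  ", dist.median())
print("sample mean:  ", samples.mean())
print("dist mean:    ", dist.mean())
```

The sample statistics and the distribution's moments should agree up to
sampling noise.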


One issue that should be added to the roadmap is fixing the
broadcasting of loc and scale in the random number generation (rvs)
of scipy.stats. I don't see a way to fix this in a backwards compatible
way.

---

Some functions, like nanmean, nanstd and others, can be removed from
scipy.stats because they will be available in numpy (once scipy
requires a minimum version of numpy that contains them).
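
For reference, a small sketch of the numpy versions, assuming numpy 1.8
or later where np.nanmean and np.nanstd are available. The defaults are
not necessarily the same as in the scipy.stats functions, so axis and
ddof are worth checking when switching over. The input array is made up.

```python
import numpy as np

x = np.array([1.0, 2.0, np.nan, 4.0])  # made-up data with one missing value
clean = x[~np.isnan(x)]                # the same data with the nan dropped

# The plain reductions propagate the nan ...
plain_mean = np.mean(x)

# ... while the nan-aware versions ignore it.
nan_mean = np.nanmean(x)
nan_std = np.nanstd(x, ddof=1)  # ddof=1 requests the sample standard deviation

print("mean:   ", plain_mean)
print("nanmean:", nan_mean)
print("nanstd: ", nan_std)
```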

>
>>
>> (2) The interface for the distributions in scipy.stats seems to
>> explicitly be for scalar random variables, so there are no multivariate
>> normals, multinomials, Dirichlet, Wishart, etc. Instead the sampling
>> is in numpy.random, and the pdfs aren't there.
>
>
> Two days ago PR-2726 was merged, which adds a multivariate normal
> distribution. Others can be added. IIRC there has been an enhancement ticket
> for Wishart somewhere, and there's a Python implementation floating around.
>
> Cheers,
> Ralf
>
>>
>> Has this been discussed elsewhere?

No, all the discussions for scipy.stats are in GitHub issues
(including PRs) or on the mailing list.

Josef

>>
>> On Sat, Sep 21, 2013 at 8:03 PM, Blake Griffith
>> <blake.a.griffith at gmail.com> wrote:
>> >
>> >> sparse
>> >> ``````
>> >>
>> >> Don't emulate np.matrix behavior, drop 2-D?
>> >
>> >
>> > What is meant by this? Emulate np.array instead?
>> >
>> > _______________________________________________
>> > SciPy-Dev mailing list
>> > SciPy-Dev at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/scipy-dev
>> >
>
>
>


