[SciPy-Dev] Rationale behind *_gen and *_frozen in _multivariate.py

Evgeni Burovski evgeny.burovskiy at gmail.com
Wed Jul 27 09:15:10 EDT 2016


On Wed, Jul 27, 2016 at 1:44 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Jul 27, 2016 at 12:36 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Wed, Jul 27, 2016 at 7:09 AM, Lukas Drude <mail at lukas-drude.de> wrote:
>> > Hello Scipy,
>> >
>> > I would like to implement additional distributions (at least locally for
>> > now).
>> >
>> > To do so, I looked at scipy/stats/_multivariate.py and would like to
>> > understand the rationale behind the *_gen and *_frozen classes.
>> >
>> > - Are the frozen-classes used to avoid parameter checks during run time?
>> > - Why is, e.g., line 1360 dirichlet = dirichlet_gen() [1]? It seems like an
>> > object is created during the import, although it appears to the user as if
>> > scipy.stats.dirichlet was a module and scipy.stats.dirichlet.pdf() was a
>> > function of just that module?
>>
>> > I do not want to change the scipy code. I would just like to know what the
>> > benefits are.
>> >
>> > With best regards
>> > Lukas
>> >
>> > [1]
>> >
>> > https://github.com/scipy/scipy/blob/ffaebc9e684e5bd23bbd3d5234c27a71369990b7/scipy/stats/_multivariate.py#L1360
>>
>> some history in the following, Evgeni knows better the recent changes
>>
>> The original implementation of the distributions was mostly
>> "functional". Classes are used as namespace and to make implementation
>> easier, but users only used a single global instance of the
>> distribution classes.
>>
>> Because it is only a single global instance it cannot keep state, i.e.
>> store intermediate results and parameters as attributes. This was a
>> headache and source of bugs when state spilled over in the global
>> instance from one use to the next.
>
> That's not quite right, I don't think. Only the multivariate distributions,
> which are quite new, store intermediate results. No global state was ever
> stored in the "unfrozen" distribution instances. Storing intermediate
> results was not a consideration in adding frozen distributions.
>
> When the distributions were first designed, Python did not have
> classmethods. So the API `norm.pdf(x, loc, scale)` would not have been
> possible if `norm` were a class. You had to make an instance of a class to
> get callable methods. At first, this was the only API provided. At the time,
> scipy definitely had a bias against using objects in its API (i.e. forcing
> users to construct objects, not just using pre-existing instances).
> Object-orientation was seen as an unnecessary complication for scientific
> programmers. Things are different now.
>
> However, this API was sometimes inconvenient because one would always have
> to pass around the distribution and the arguments separately, making it hard
> to write generic code. Frozen distributions were added to bind the
> parameters to the distribution so that one could just pass around a single
> object. Now, you can write generic code that just accepts a single frozen
> distribution object and call `dist.pdf(x)` without the code needing to know
> anything about which distribution is being used or its parameters.
>
> --
> Robert Kern
>
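To make the point about generic code concrete, here is a minimal pure-Python
sketch of the pattern. It is a toy, not SciPy's implementation: `expon_gen`,
`frozen`, and `log_likelihood` are names made up for illustration, and only
the standard library is used.

```python
import math


class expon_gen:
    """Toy stand-in for a *_gen class (hypothetical, not SciPy's code)."""

    def pdf(self, x, loc=0.0, scale=1.0):
        # Exponential density with explicit location and scale parameters.
        z = (x - loc) / scale
        return math.exp(-z) / scale if z >= 0 else 0.0

    def __call__(self, loc=0.0, scale=1.0):
        # "Freezing": bind the parameters and hand back a single object.
        return frozen(self, loc, scale)


class frozen:
    """Toy frozen distribution: a distribution plus its bound parameters."""

    def __init__(self, dist, loc, scale):
        self.dist, self.loc, self.scale = dist, loc, scale

    def pdf(self, x):
        return self.dist.pdf(x, self.loc, self.scale)


# One instance is created at import time, so `expon.pdf(...)` reads like a
# module-level function even though `expon` is an object.
expon = expon_gen()


def log_likelihood(dist, data):
    # Generic code: it needs only `dist.pdf(x)`, not the family or parameters.
    return sum(math.log(dist.pdf(x)) for x in data)


rv = expon(scale=2.0)
data = [0.5, 1.0, 3.0]
ll = log_likelihood(rv, data)
```

With the unfrozen API, `log_likelihood` would have to accept the distribution
and its `loc`/`scale` separately and forward them on every call.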


Re: storing intermediate results.
This was discussed a while ago in
https://github.com/scipy/scipy/issues/2823
My conclusion, at least, was that it is not worth doing, at least not
generically: https://github.com/scipy/scipy/issues/2823#issuecomment-23806104
It's perfectly possible that one can do better and that there is a way :-).

At the moment, a frozen distribution holds an instance separate from
the global one.
(https://github.com/scipy/scipy/blob/master/scipy/stats/_distn_infrastructure.py#L429)
E.g.

In [32]: from scipy.stats import gamma

In [33]: gamma.shapes
Out[33]: 'a'

In [34]: rv = gamma(a=1)

In [35]: rv.dist
Out[35]: <scipy.stats._continuous_distns.gamma_gen at 0x7f9706647710>

In [36]: gamma
Out[36]: <scipy.stats._continuous_distns.gamma_gen at 0x7f9706d11e50>

In [37]: rv.dist is gamma
Out[37]: False

So that one can

* use a separate random_state for drawing variates:

In [38]: gamma.random_state
Out[38]: <mtrand.RandomState at 0x7f970d834e10>

In [39]: rv.random_state
Out[39]: <mtrand.RandomState at 0x7f970d834e10>      # same!

In [40]: rv.random_state = 123

In [41]: rv.random_state
Out[41]: <mtrand.RandomState at 0x7f97141084d0>      # different

In [42]: gamma.random_state
Out[42]: <mtrand.RandomState at 0x7f970d834e10>     # intact

* monkey-patch the instance methods to store intermediates if desired.
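
Both points can be sketched without SciPy. The following is a toy model only:
`gamma_gen`, `frozen`, `_lognorm`, and `cached_lognorm` are hypothetical names,
and the real classes differ in detail. It assumes, as in the session above,
that a frozen object owns a private instance but shares the random state until
one is set explicitly.

```python
import math
import random


class gamma_gen:
    """Toy stand-in for a distribution class (hypothetical)."""

    def __init__(self):
        self.random_state = random.Random()

    def _lognorm(self, a):
        # Intermediate result that depends only on the shape parameter.
        return math.lgamma(a)

    def logpdf(self, x, a):
        return (a - 1.0) * math.log(x) - x - self._lognorm(a)

    def rvs(self, a):
        return self.random_state.gammavariate(a, 1.0)

    def __call__(self, a):
        return frozen(self, a)


class frozen:
    def __init__(self, dist, a):
        # A fresh instance of the same class, separate from the global one,
        # that shares the global random state until told otherwise.
        self.dist = type(dist)()
        self.dist.random_state = dist.random_state
        self.a = a

    @property
    def random_state(self):
        return self.dist.random_state

    @random_state.setter
    def random_state(self, seed):
        self.dist.random_state = random.Random(seed)

    def logpdf(self, x):
        return self.dist.logpdf(x, self.a)

    def rvs(self):
        return self.dist.rvs(self.a)


gamma = gamma_gen()          # the single global instance
rv = gamma(a=1)
shared = gamma.random_state

rv.random_state = 123        # replaces the generator on rv.dist only
first = rv.rvs()
rv.random_state = 123
second = rv.rvs()            # same seed, so the same variate as `first`

# Monkey-patch the private instance to store an intermediate result.
cache = {}
plain_lognorm = rv.dist._lognorm   # bound method of rv's own instance


def cached_lognorm(a):
    if a not in cache:
        cache[a] = plain_lognorm(a)
    return cache[a]


rv.dist._lognorm = cached_lognorm  # instance attribute shadows the method
values = [rv.logpdf(0.5), rv.logpdf(2.0)]
```

Because the patch lands on `rv.dist` and not on `gamma`, other users of the
global instance never see the cache or the reseeded generator.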


