[SciPy-Dev] Rationale behind *_gen and *_frozen in _multivariate.py

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Jul 27 09:04:30 EDT 2016


On Wed, Jul 27, 2016 at 8:44 AM, Robert Kern <robert.kern at gmail.com> wrote:
> On Wed, Jul 27, 2016 at 12:36 PM, <josef.pktd at gmail.com> wrote:
>>
>> On Wed, Jul 27, 2016 at 7:09 AM, Lukas Drude <mail at lukas-drude.de> wrote:
>> > Hello Scipy,
>> >
>> > I would like to implement additional distributions (at least locally for
>> > now).
>> >
>> > To do so, I looked at scipy/stats/_multivariate.py and would like to
>> > understand the rationale behind the *_gen and *_frozen classes.
>> >
>> > - Are the frozen-classes used to avoid parameter checks during run time?
>> > - Why is, e.g., in line 1360, dirichlet = dirichlet_gen() [1]? It seems
>> > like an object is created during the import, although it appears to the
>> > user as if scipy.stats.dirichlet were a module and
>> > scipy.stats.dirichlet.pdf() were a function of just that module.
>>
>> > I do not want to change the scipy code. I would just like to know what
>> > the benefits are.
>> >
>> > With best regards
>> > Lukas
>> >
>> > [1]
>> >
>> > https://github.com/scipy/scipy/blob/ffaebc9e684e5bd23bbd3d5234c27a71369990b7/scipy/stats/_multivariate.py#L1360
>>
>> Some history follows; Evgeni knows the recent changes better.
>>
>> The original implementation of the distributions was mostly
>> "functional". Classes are used as namespace and to make implementation
>> easier, but users only used a single global instance of the
>> distribution classes.
>>
>> Because it is only a single global instance it cannot keep state, i.e.
>> store intermediate results and parameters as attributes. This was a
>> headache and source of bugs when state spilled over in the global
>> instance from one use to the next.
>
> That's not quite right, I don't think. Only the multivariate distributions,
> which are quite new, store intermediate results. No global state was ever
> stored in the "unfrozen" distribution instances. Storing intermediate
> results was not a consideration in adding frozen distributions.

It took me a few months to figure out why the distributions sometimes
produced different, i.e. wrong, results, and to fix those bugs. Using
attributes and state might not have been the plan, but it was and is
in the actual implementation.
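
A toy sketch of that failure mode, for illustration only. The class and
names below (ToyNormalGen, toy_normal) are hypothetical; this is not the
scipy.stats code and not a reconstruction of any specific bug. It assumes
a method that stashes its arguments on the one shared instance:

```python
import math


class ToyNormalGen:
    """Hypothetical distribution class (an assumption, not SciPy code)."""

    def pdf(self, x, loc=0.0, scale=1.0):
        # Problem pattern: parameters are stored on the shared instance
        # so that a helper method can read them back.
        self._loc = loc
        self._scale = scale
        return self._pdf(x)

    def _pdf(self, x):
        z = (x - self._loc) / self._scale
        return math.exp(-0.5 * z * z) / (self._scale * math.sqrt(2.0 * math.pi))

    def mean(self, loc=None):
        # Falls back to whatever an earlier, unrelated call left behind.
        return self._loc if loc is None else loc


# A single module-level instance, shared by every caller.
toy_normal = ToyNormalGen()

toy_normal.pdf(0.0, loc=5.0, scale=2.0)
stale = toy_normal.mean()  # 5.0, leaked from the pdf call above
```

Because every caller shares toy_normal, the result of mean() depends on
which call happened to run before it.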

(And it's the source of my allergy to the possibility of stale state
in statsmodels.)

>
> When the distributions were first designed, Python did not have
> classmethods. So the API `norm.pdf(x, loc, scale)` would not have been
> possible if `norm` were a class. You had to make an instance of a class to
> get callable methods. At first, this was the only API provided. At the time,
> scipy definitely had a bias against using objects in its API (i.e. forcing
> users to construct objects, not just using pre-existing instances).
> Object-orientation was seen as an unnecessary complication for scientific
> programmers. Things are different now.
>
> However, this API was sometimes inconvenient because one would always have
> to pass around the distribution and the arguments separately, making it hard
> to write generic code. Frozen distributions were added to bind the
> parameters to the distribution so that one could just pass around a single
> object. Now, you can write generic code that just accepts a single frozen
> distribution object and call `dist.pdf(x)` without the code needing to know
> anything about which distribution is being used or its parameters.

This clarifies but doesn't contradict my comments.

"frozen" was then the object-oriented backdoor for users that didn't
want to know about objects.

Times have fortunately changed away from the MATLAB/Fortran tradition,
except for Julia where developers and users still prefer Greek and
one-letter names and no classes. :)


Norm(loc, scale).pdf(x) is much more work than norm.pdf(x, loc, scale)
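
A minimal sketch of how the two calling styles can coexist, assuming
hypothetical names (normal_gen, normal_frozen, normal, log_likelihood)
that only mirror the *_gen / *_frozen naming in _multivariate.py. It is
not the scipy implementation, just the shape of the pattern: one stateless
module-level instance for the functional API, and a small frozen object
that binds the parameters.

```python
import math


class normal_gen:
    """Stateless "generator" class; one instance serves as the public API."""

    def pdf(self, x, loc=0.0, scale=1.0):
        # Functional style: parameters are passed on every call and
        # nothing is stored on the instance.
        z = (x - loc) / scale
        return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))

    def __call__(self, loc=0.0, scale=1.0):
        # Calling the instance "freezes" the parameters.
        return normal_frozen(self, loc, scale)


class normal_frozen:
    """Binds parameters to a distribution; state lives here, per object."""

    def __init__(self, dist, loc, scale):
        self._dist = dist
        self.loc = loc
        self.scale = scale

    def pdf(self, x):
        return self._dist.pdf(x, self.loc, self.scale)


# Created once at import time, so users can write normal.pdf(...)
# as if normal were a module.
normal = normal_gen()


def log_likelihood(dist, data):
    # Generic code: accepts any frozen object that has a pdf(x) method.
    return sum(math.log(dist.pdf(x)) for x in data)


functional = normal.pdf(0.5, 1.0, 2.0)
frozen = normal(1.0, 2.0).pdf(0.5)
total = log_likelihood(normal(1.0, 2.0), [0.5, 1.0, 1.5])
```

Both spellings evaluate the same density; the frozen form is what lets
log_likelihood stay ignorant of which distribution it was handed.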

Josef


>
> --
> Robert Kern
>
>


