[SciPy-Dev] Rationale behind *_gen and *_frozen in _multivariate.py

Robert Kern robert.kern at gmail.com
Wed Jul 27 08:44:10 EDT 2016


On Wed, Jul 27, 2016 at 12:36 PM, <josef.pktd at gmail.com> wrote:
>
> On Wed, Jul 27, 2016 at 7:09 AM, Lukas Drude <mail at lukas-drude.de> wrote:
> > Hello Scipy,
> >
> > I would like to implement additional distributions (at least locally for
> > now).
> >
> > To do so, I looked at scipy/stats/_multivariate.py and would like to
> > understand the rationale behind the *_gen and *_frozen classes.
> >
> > - Are the frozen-classes used to avoid parameter checks during run time?
> > - Why is i.e. in line 1360 dirichlet = dirichlet_gen() [1]? It seems like an
> > object is created during the import although it appears to the user as if
> > scipy.stats.dirichlet was a module and scipy.stats.dirichlet.pdf() was a
> > function of just that module?
>
> > I do not want to change the scipy code. I would just like to know, what the
> > benefits are.
> >
> > With best regards
> > Lukas
> >
> > [1] https://github.com/scipy/scipy/blob/ffaebc9e684e5bd23bbd3d5234c27a71369990b7/scipy/stats/_multivariate.py#L1360
>
> some history in the following, Evgeni knows better the recent changes
>
> The original implementation of the distributions was mostly
> "functional". Classes are used as namespace and to make implementation
> easier, but users only used a single global instance of the
> distribution classes.
>
> Because it is only a single global instance it cannot keep state, i.e.
> store intermediate results and parameters as attributes. This was a
> headache and source of bugs when state spilled over in the global
> instance from one use to the next.

That's not quite right, I don't think. Only the multivariate distributions,
which are quite new, store intermediate results. No global state was ever
stored in the "unfrozen" distribution instances. Storing intermediate
results was not a consideration in adding frozen distributions.

When the distributions were first designed, Python did not have
classmethods. So the API `norm.pdf(x, loc, scale)` would not have been
possible if `norm` were a class. You had to make an instance of a class to
get callable methods. At first, this was the only API provided. At the
time, scipy definitely had a bias against using objects in its API (i.e.
forcing users to construct objects, not just using pre-existing instances).
Object-orientation was seen as an unnecessary complication for scientific
programmers. Things are different now.
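
A minimal sketch of that pattern, using a toy class rather than scipy's
actual code (the names `toy_norm_gen` and `toy_norm` are made up for
illustration): the module creates one instance at import time, and users
call its bound methods as if they were plain functions in a namespace.

```python
import math


class toy_norm_gen:
    """Toy stand-in for a *_gen class; not scipy's implementation."""

    def pdf(self, x, loc=0.0, scale=1.0):
        # Normal density evaluated at x for the given loc and scale.
        z = (x - loc) / scale
        return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


# Created once at import time; users never construct the object themselves.
toy_norm = toy_norm_gen()

# Reads like a function in a namespace, but is a bound method call.
density = toy_norm.pdf(0.5, loc=0.0, scale=2.0)
```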

However, this API was sometimes inconvenient because one would always have
to pass around the distribution and the arguments separately, making it
hard to write generic code. Frozen distributions were added to bind the
parameters to the distribution so that one could just pass around a single
object. Now, you can write generic code that just accepts a single frozen
distribution object and calls `dist.pdf(x)` without needing to know
anything about which distribution is being used or its parameters.
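
As a rough sketch of the idea (with toy, hypothetical classes such as
`toy_expon_gen` and `toy_frozen`, not scipy's code): freezing just stores
the distribution and its parameters together, so generic code such as the
made-up `log_likelihood` helper only ever sees `dist.pdf(x)`.

```python
import math


class toy_expon_gen:
    """Toy exponential distribution; hypothetical, for illustration only."""

    def pdf(self, x, scale=1.0):
        # Exponential density; zero for negative x.
        return math.exp(-x / scale) / scale if x >= 0 else 0.0

    def __call__(self, **params):
        # Calling the instance "freezes" it with the given parameters.
        return toy_frozen(self, params)


class toy_frozen:
    """Binds a distribution instance to one fixed set of parameters."""

    def __init__(self, dist, params):
        self.dist = dist
        self.params = params

    def pdf(self, x):
        # Forward to the unfrozen method with the stored parameters.
        return self.dist.pdf(x, **self.params)


toy_expon = toy_expon_gen()


def log_likelihood(dist, data):
    # Generic: works with any object that has a one-argument pdf().
    return sum(math.log(dist.pdf(x)) for x in data)


frozen = toy_expon(scale=2.0)
ll = log_likelihood(frozen, [0.5, 1.0, 3.0])
```

With scipy itself, the same helper should accept a real frozen
distribution, e.g. `log_likelihood(scipy.stats.norm(loc=0, scale=2), data)`.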

--
Robert Kern