[SciPy-user] scipy.stats rv objects from data

Fri May 2 18:37:50 EDT 2008

2008/5/2 Erik Tollerud <erik.tollerud at gmail.com>:

>  My interpretation of this is that it is using the normal distribution
>  - I want a distribution that is a smoothed/interpolated version of the
>  discrete distribution I generated above.  I take this to mean there's
>  no built-in utility to do this, so I just have to make my own - this
>  seems like a useful thing for data analysis, though, so I may submit
>  it later to be added to SVN.

Well, this is tricky. You need to decide what it is you want from your
distribution: the *way* in which you smooth your distribution function
can make a tremendous difference to the results you get. If you're
doing statistics, coping with this can be a real challenge.

There is one family of techniques, called kernel density estimators,
for "smoothing" distributions in a controlled and statistically
well-understood fashion. scipy provides one as scipy.stats.gaussian_kde. They
are used specifically for reconstructing a distribution when you have
a collection of samples drawn from it; the resulting pdf is a sum of
Gaussians, one centered at each sample. The width is automatically
chosen based on the samples.
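
As a rough illustration of the kernel density approach, here is a minimal
sketch using scipy.stats.gaussian_kde. The sample data are made up for the
example (normal draws standing in for real measurements), and the default
bandwidth rule is used as-is; whether that bandwidth suits a given data set
is something to check rather than assume.

```python
import numpy as np
from scipy import stats

# Made-up sample data standing in for real measurements (assumption).
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=0.5, size=500)

# Gaussian KDE: one Gaussian per sample, bandwidth picked from the data
# (Scott's rule is the default).
kde = stats.gaussian_kde(samples)

# Evaluate the smoothed pdf on a grid.
grid = np.linspace(0.0, 4.0, 201)
density = kde(grid)

# Total probability mass; should be very close to 1.
mass = kde.integrate_box_1d(-np.inf, np.inf)

# Draw new points from the smoothed distribution; result has shape (1, n).
new_draws = kde.resample(1000)
```

The object returned by gaussian_kde is not a full scipy.stats distribution
(no ppf, for instance), but it covers evaluating the pdf, integrating it
over an interval, and resampling.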

You can also, of course, construct an arbitrary pdf as a function, and
then use numerical integration on it. (If you use splines it can be
integrated even more efficiently, though I don't know that you can
ensure that they are everywhere positive; I'd be inclined to
interpolate the log instead, but then you lose easy integration.) I
don't know whether scipy.stats allows you to construct a distribution
object, with all its standard methods, from a given pdf, but this
would be a very useful feature.
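
One possible way to get such an object is to subclass
scipy.stats.rv_continuous and supply _pdf (and optionally _cdf). The sketch
below assumes a density tabulated on a grid and interpolated with a cubic
spline; the tabulated values, the class name TabulatedDist, and the choice
of spline are all made up for illustration. Nothing here forces the spline
to stay positive between grid points, so that caveat still applies.

```python
import numpy as np
from scipy import integrate, interpolate, stats

# Hypothetical unnormalized density tabulated on a grid (illustration only).
x = np.linspace(-3.0, 3.0, 61)
y = np.exp(-0.5 * x**2) * (1.0 + 0.5 * np.sin(2.0 * x))

# Cubic spline through the tabulated points; splines integrate exactly.
spline = interpolate.InterpolatedUnivariateSpline(x, y, k=3)
norm = spline.integral(x[0], x[-1])  # normalization constant


class TabulatedDist(stats.rv_continuous):
    """Distribution whose pdf is the normalized spline above."""

    def _pdf(self, t):
        return spline(t) / norm

    def _cdf(self, t):
        # Use the spline's own antiderivative instead of generic quadrature.
        t = np.asarray(t, dtype=float)
        vals = [spline.integral(x[0], ti) for ti in t.ravel()]
        return np.array(vals).reshape(t.shape) / norm


# a and b set the support to the tabulated range.
dist = TabulatedDist(a=x[0], b=x[-1], name="tabulated")

# Check: the pdf should integrate to about 1 over the support.
total, _ = integrate.quad(dist.pdf, x[0], x[-1])

# The generic machinery supplies the remaining methods.
median = dist.ppf(0.5)
draws = dist.rvs(size=5)
```

If _cdf is left out, rv_continuous falls back to integrating _pdf
numerically, which is slower but needs nothing beyond the pdf itself.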

Anne