[SciPy-user] scipy.stats rv objects from data
Anne Archibald
peridot.faceted at gmail.com
Fri May 2 18:37:50 EDT 2008
2008/5/2 Erik Tollerud <erik.tollerud at gmail.com>:
> My interpretation of this is that it is using the normal distribution
> - I want a distribution that is a smoothed/interpolated version of the
> discrete distribution I generated above. I take this to mean there's
> no built-in utility to do this, so I just have to make my own - this
> seems like a useful thing for data analysis, though, so I may submit
> it later to be added to SVN.
Well, this is tricky. You need to decide what it is you want from your
distribution: the *way* in which you smooth your distribution function
can make a tremendous difference to the results you get. If you're
doing statistics, coping with this can be a real challenge.
There is one family of techniques, called kernel density estimators,
for "smoothing" distributions in a controlled and statistically
well-understood fashion. Scipy implements one of them as
scipy.stats.gaussian_kde. They are used specifically for
reconstructing a distribution when you have a collection of samples
drawn from it; the resulting pdf is a sum of Gaussians, one centered
at each sample. The width is automatically chosen based on the
samples.
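
As a rough illustration of the kernel density approach, here is a
minimal sketch using scipy.stats.gaussian_kde. The sample data are
made up for the example (drawn from a normal distribution with an
arbitrary mean and width); with real data you would pass your own
array of samples instead.

```python
import numpy as np
from scipy import stats

# Made-up sample data, standing in for whatever measurements you have.
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.5, size=500)

# Build the estimator; the bandwidth is chosen from the data by default.
kde = stats.gaussian_kde(samples)

# Evaluate the smoothed pdf on a grid.
xs = np.linspace(-4.0, 8.0, 200)
pdf_vals = kde(xs)

# Probability mass in an interval, and over the whole real line.
prob = kde.integrate_box_1d(0.0, 4.0)
total = kde.integrate_box_1d(-np.inf, np.inf)

# Draw new samples from the smoothed distribution; shape is (1, 1000).
new_samples = kde.resample(1000)
```

Note that the result is a gaussian_kde object, not a scipy.stats
distribution object, so it has its own smaller set of methods
(evaluation, integration over an interval, resampling).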
You can also, of course, construct an arbitrary pdf as a function, and
then use numerical integration on it. (If you use splines it can be
integrated even more efficiently, though I don't know that you can
ensure that they are everywhere positive; I'd be inclined to
interpolate the log instead, but then you lose easy integration.) I
don't know whether scipy.stats allows you to construct a distribution
object, with all its standard methods, from a given pdf, but this
would be a very useful feature.
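
For what it's worth, one way this can be sketched is to subclass
scipy.stats.rv_continuous and override _pdf; the generic methods (cdf,
ppf, rvs) then fall back on numerical integration and root finding,
which is slow but works. The sketch below interpolates the log of a
tabulated density with a cubic spline so the pdf stays positive, and
normalizes it with scipy.integrate.quad. The grid, the tabulated
heights, and the class name LogSplineDist are all made up for the
example.

```python
import numpy as np
from scipy import integrate, interpolate, stats

# Hypothetical tabulated density values (for example, histogram heights
# at bin centers). They must be strictly positive so the log is defined.
grid = np.linspace(-3.0, 3.0, 13)
heights = np.exp(-0.5 * grid**2) + 0.3 * np.exp(-0.5 * ((grid - 1.5) / 0.5) ** 2)

# Interpolate the log of the density; exponentiating keeps the pdf positive.
log_spline = interpolate.CubicSpline(grid, np.log(heights))

def unnormalized_pdf(x):
    return np.exp(log_spline(x))

# Restrict the support to the tabulated range and normalize numerically.
A, B = float(grid[0]), float(grid[-1])
norm_const, _ = integrate.quad(unnormalized_pdf, A, B)

class LogSplineDist(stats.rv_continuous):
    """Distribution defined only by its pdf; other methods are generic."""

    def _pdf(self, x):
        return unnormalized_pdf(x) / norm_const

dist = LogSplineDist(a=A, b=B, name="log_spline")

median = dist.ppf(0.5)                    # root finding on the numerical cdf
mass_left = dist.cdf(0.0)                 # numerical integration of _pdf
draws = dist.rvs(size=5, random_state=0)  # slow: inverts the cdf per draw
```

If speed matters, overriding _cdf (and possibly _ppf) with something
precomputed would avoid repeating the numerical integration on every
call.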
Anne