[SciPy-User] Probability Density Estimation
Zachary Pincus
zachary.pincus at yale.edu
Tue Apr 5 16:18:26 EDT 2011
On Apr 5, 2011, at 4:12 PM, josef.pktd at gmail.com wrote:
> Here is a recipe for subclassing scipy.stats.gaussian_kde to set the
> bandwidth manually:
>
> http://mail.scipy.org/pipermail/scipy-user/2010-January/023877.html
> (I have misplaced the file right now, which happens twice a year.)
>
> As an alternative, if they are really outliers, then you could try to
> identify them and remove them from the dataset before running the kde.
> But maybe they are not outliers and you have a distribution with heavy
> tails.
Or, better, remove them from the dataset before calculating the
bandwidth, but add them back for the actual density estimation. Or
(effectively the same procedure) substitute a robust covariance
estimator for the calls to numpy.cov (or whatever it is in there) --
see, e.g., the minimum covariance determinant (MCD) method. (Very easy
in 1D -- I have code for that special case but not the general case.)
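
A minimal 1D sketch of the first suggestion (bandwidth from the inliers,
density from all points), assuming a SciPy release in which gaussian_kde
accepts a scalar bw_method. The helper name robust_bandwidth_kde, the
cutoff parameter, and the median/MAD outlier rule are illustrative
assumptions; the MAD rule is a simple stand-in and is not the MCD method.
It also assumes the data are not so heavily tied that the MAD is zero.

```python
import numpy as np
from scipy.stats import gaussian_kde


def robust_bandwidth_kde(data, cutoff=3.0):
    """1D KDE over all points, with the bandwidth set from inliers only.

    Hypothetical helper: name, cutoff, and outlier rule are assumptions.
    """
    data = np.asarray(data, dtype=float)

    # Flag outliers with a median/MAD rule (a stand-in for MCD).
    median = np.median(data)
    mad_scale = 1.4826 * np.median(np.abs(data - median))
    inliers = data[np.abs(data - median) <= cutoff * mad_scale]

    # Scott's rule applied to the inliers gives the target kernel width.
    target_width = inliers.std(ddof=1) * len(inliers) ** (-1.0 / 5.0)

    # gaussian_kde multiplies a scalar bw_method by the standard deviation
    # of the full dataset, so divide that out to get the target width.
    factor = target_width / data.std(ddof=1)
    return gaussian_kde(data, bw_method=factor)


# Synthetic example: a standard normal sample plus three gross outliers.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.0, 1.0, 500), [40.0, 55.0, -60.0]])

plain = gaussian_kde(sample)           # bandwidth inflated by the outliers
robust = robust_bandwidth_kde(sample)  # bandwidth from the inliers only
```

Both estimates still place kernels on the outliers; only the kernel width
differs, so the robust version keeps the bulk of the density from being
oversmoothed.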
Zach