[SciPy-User] Probability Density Estimation

Zachary Pincus zachary.pincus at yale.edu
Tue Apr 5 16:18:26 EDT 2011


On Apr 5, 2011, at 4:12 PM, josef.pktd at gmail.com wrote:

> Here is a recipe for subclassing scipy.stats.gaussian_kde to set the
> bandwidth manually:
>
> http://mail.scipy.org/pipermail/scipy-user/2010-January/023877.html
> (I have misplaced the file right now, which happens twice a year.)
>
> As an alternative, if they really are outliers, then you could try to
> identify them and remove them from the dataset before running the kde.
> But maybe they are not outliers and you have a distribution with heavy
> tails.

Or, better, remove them from the dataset before calculating the
bandwidth, but add them back for the actual density estimation. Or
(effectively the same procedure) substitute a robust covariance
estimator for the calls to numpy.cov (or whatever gaussian_kde uses
internally); look, e.g., at the MCD (minimum covariance determinant)
method. This is very easy in 1D: I have code for that special case,
but not for the general case.
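
To make the 1D idea concrete, here is a minimal sketch and not the code
mentioned above. It rests on several assumptions: the median absolute
deviation (MAD) stands in for MCD as the robust scale estimate, the
bandwidth follows Scott's rule (scale * n**(-1/5)), and the names
robust_bandwidth and kde_1d and the sample values are made up for
illustration. The bandwidth is computed robustly, but every point,
outliers included, still contributes a kernel to the density. It uses
only the standard library so it runs without SciPy.

```python
import math
import statistics


def robust_bandwidth(data):
    """Scott's-rule bandwidth with a MAD-based scale in place of the std."""
    n = len(data)
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    # 1.4826 * MAD estimates the standard deviation for Gaussian data
    # while being largely insensitive to a few extreme points.
    scale = 1.4826 * mad
    return scale * n ** (-1.0 / 5.0)


def kde_1d(data, bandwidth):
    """Return a Gaussian KDE built on *all* of data with a fixed bandwidth."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical sample: a tight cluster plus two extreme points.
sample = [-1.2, -0.7, -0.4, -0.1, 0.0, 0.2, 0.5, 0.8, 1.1, 40.0, 55.0]

h_robust = robust_bandwidth(sample)
# Non-robust Scott's-rule bandwidth, for comparison only.
h_plain = statistics.stdev(sample) * len(sample) ** (-1.0 / 5.0)

density = kde_1d(sample, h_robust)  # outliers are kept in the estimate
```

With the two extreme points present, h_plain is inflated by the sample
standard deviation and oversmooths the main cluster, while h_robust
stays on the scale of the cluster. Note that the MAD is zero when more
than half of the values are identical, so real code would need a
fallback for that case. In newer SciPy versions, gaussian_kde accepts a
bw_method argument, so subclassing is no longer the only way to set the
bandwidth by hand.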

Zach


