[SciPy-User] Probability Density Estimation

Fri Apr 29 16:06:46 EDT 2011

On Fri, Apr 29, 2011 at 3:01 PM, Hans Georg Schaathun
<hg+scipy at schaathun.net> wrote:
> Dear all,
>
> this is a bit overdue; thanks a lot to everyone who helped me
> with my KDE trouble three weeks ago.
>
> On Wed, Apr 06, 2011 at 11:47:26AM -0400, josef.pktd at gmail.com wrote:
>> If you have or find some results and are willing to share, then I will
>> be very interested.
>
> The solution to my problem was to make sure that I use the same
> bandwidth for all the KDE estimations relating to the same
> variable, i.e. for P(X), P(X|Y=1) and P(X|Y=0).  Now the results
> seem plausible, even though I did not get the results I was hoping
> for :-)
>
> I attach a monkey-patched and annotated version of gaussian_kde making
> it easier to choose the bandwidth.  The main reason I monkey-patched
> rather than inherit is that I needed to annotate it to understand what
> I was doing.  The comments are intended for pylit and sphinx, but do
> not work as well as I would have hoped.
>
> The second attachment is a derived class supporting entropy estimation.
>
> I have not done any really serious testing, and may have made mistakes.
>
> Any feedback would be welcome.

Thanks,

I need to play with it to see where it differs from the scipy version or
my older version (apart from the exclude trimming).
I haven't yet seen how you set the bandwidth from the 2 samples for use in
each individual estimate, but it sounds like an interesting idea that I
need to check for my case in mutual information.
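
To make the shared-bandwidth idea concrete, here is a minimal sketch for
1-D data. It does not reproduce the attached code: it assumes a scipy
version where gaussian_kde accepts bw_method, and it assumes the common
bandwidth is taken from the pooled sample with Scott's rule. The data and
the helper name kde_with_bandwidth are made up for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Synthetic, purely illustrative data: binary Y and a continuous X
# whose mean depends on Y.
y = rng.integers(0, 2, size=500)
x = rng.normal(loc=y.astype(float), scale=1.0)


def kde_with_bandwidth(sample, h):
    """Hypothetical helper: 1-D Gaussian KDE with kernel std equal to h.

    A scalar bw_method is used by gaussian_kde as a factor that multiplies
    the sample standard deviation (ddof=1), so h is divided by that std.
    """
    return gaussian_kde(sample, bw_method=h / sample.std(ddof=1))


# Assumption: choose one bandwidth from the pooled sample (Scott's rule,
# the gaussian_kde default) and reuse it for the conditional estimates.
pooled = gaussian_kde(x)
h = pooled.factor * x.std(ddof=1)

kde_x = kde_with_bandwidth(x, h)            # P(X)
kde_x1 = kde_with_bandwidth(x[y == 1], h)   # P(X | Y=1)
kde_x0 = kde_with_bandwidth(x[y == 0], h)   # P(X | Y=0)

# All three estimators now use the same kernel width.
widths = [float(np.sqrt(k.covariance[0, 0])) for k in (kde_x, kde_x1, kde_x0)]
```

Without the rescaling, each gaussian_kde would pick its own bandwidth from
its own subsample size and spread, which is what made the three estimates
inconsistent with each other.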

You didn't specify a license; it would be good to add one so that readers
know what they are allowed to do with the code.

And as an observation: I find the literate programming comments pretty
distracting (at least when reading it in the browser without a highlighting
editor, and with the wrong indentation for my taste) and would prefer numpy
style docstrings, e.g. I didn't find a description of exclude.

Some details I'm not sure about when reading it, for example:

idx = [ i for i in xrange(self.n) if B[:,i].all() ]
isn't this the same as
idx = np.nonzero(B.all(0))
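
A quick check on a small made-up mask (B and n here are stand-ins, not the
values from the attached class; range replaces xrange so it runs on
Python 3). The two forms select the same columns, with one difference:
np.nonzero returns a tuple of index arrays, so the array itself is element
[0] of the result.

```python
import numpy as np

# Hypothetical boolean mask: rows are dimensions, columns are data points.
B = np.array([[True, False, True, True],
              [True, True, False, True]])
n = B.shape[1]

# Loop version: keep column i only if every row is True there.
idx_loop = [i for i in range(n) if B[:, i].all()]

# Vectorized version: reduce over axis 0, then take the nonzero positions.
# np.nonzero returns a tuple (one array per dimension), hence the [0].
idx_vec = np.nonzero(B.all(axis=0))[0]

print(idx_loop)  # [0, 3]
print(idx_vec)   # [0 3]
```

Either result can be used to index the columns of a dataset; the tuple
returned by np.nonzero also works directly as an index into a 1-D array.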

ahmadlin looks like it would be a nice extension to scipy.stats also
as a standalone function (like your contEntropy), if the automatic
bandwidth choice can be made robust enough.

Was the outlier trimming enough to solve your problem with estimating
a kernel density, or did you also try other kernels?

Josef

> --
> :-- Hans Georg