[SciPy-Dev] scipy.stats.kde

josef.pktd at gmail.com
Fri Aug 27 15:39:27 EDT 2010


On Fri, Aug 27, 2010 at 3:27 PM, Sam Birch <sam.m.birch at gmail.com> wrote:
>> Bandwidth selection is a hotly debated topic, at least in one
>> dimension, so perhaps not just different methods but tools for
>> diagnosing bandwidth selection problems would be nice - at the least,
>> it should be made straightforward to vary the bandwidth (e.g. to plot
>> the KDE with a range of different bandwidth values).
>
> Well by allowing them to use a custom bandwidth matrix they can vary it
> themselves, no?
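
(A side note: even with the current scipy.stats.gaussian_kde you can already sweep the
bandwidth by overriding covariance_factor in a small subclass. A rough, untested sketch;
the class name FixedFactorKDE and the factor values are just placeholders, not proposed API:

import numpy as np
from scipy import stats

class FixedFactorKDE(stats.gaussian_kde):
    """gaussian_kde with a user-chosen scalar bandwidth factor."""
    def __init__(self, dataset, factor):
        self._factor = factor                  # replaces Scott's rule below
        stats.gaussian_kde.__init__(self, dataset)

    def covariance_factor(self):
        # called by _compute_covariance(); default would be Scott's factor
        return self._factor

data = np.random.standard_normal(200)
grid = np.linspace(-4.0, 4.0, 401)
for factor in (0.1, 0.3, 1.0):
    density = FixedFactorKDE(data, factor)(grid)
    # e.g. plot(grid, density) to compare under- and over-smoothing

That only varies a scalar factor, though; a full bandwidth matrix would still need new code.)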
>
>> At the other end of the spectrum, for very dense KDEs, on the circle I
>> found it extremely convenient to use Fourier transforms to carry out
>> the convolution of kernel with points. In particular, I represented
>> the KDE in terms of its Fourier coefficients, so that an inverse FFT
>> immediately gave me the KDE evaluated on a grid (or, with some
>> fiddling, integrated over the bins of a histogram). I don't know
>> whether this is a useful optimization for KDEs on the line or in
>> higher dimensions, since there's the problem of wrapping.
>
> That sounds very interesting. Sorry if I'm being dense (or just wrong, or
> both), but do you convolve post-FFT or before? If before, why does it
> make it easier?

And also: do you grid the initial points first?
I think it sounds similar to what Skipper was trying at some point.
From the paper it sounded like it's expensive to construct the initial
points, but then much cheaper to evaluate the KDE at many points
because the FFT is used for the actual convolution.
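
To make the idea concrete for myself, I imagine the binned/FFT version for a periodic
1-D domain roughly like the untested sketch below; the name fft_kde_circle, the grid
size, and the wrapped-Gaussian kernel are only illustrative, not what Anne actually used:

import numpy as np

def fft_kde_circle(angles, bandwidth, n_grid=256):
    """Rough KDE on the circle [0, 2*pi): bin the samples once, then get
    the density on the whole grid from one circular FFT convolution."""
    # binning the samples is the (comparatively) expensive setup step
    counts, edges = np.histogram(np.asarray(angles) % (2 * np.pi),
                                 bins=n_grid, range=(0.0, 2 * np.pi))
    grid = edges[:-1]
    # wrapped Gaussian kernel sampled on the same grid, centred at angle 0
    dist = np.minimum(grid, 2 * np.pi - grid)
    kernel = np.exp(-0.5 * (dist / bandwidth) ** 2)
    # multiplying the FFTs gives the circular convolution of counts and kernel
    density = np.fft.irfft(np.fft.rfft(counts) * np.fft.rfft(kernel), n_grid)
    # normalize so the density integrates to one over the circle
    density /= density.sum() * (2 * np.pi / n_grid)
    return grid, density

After the setup, evaluating on the whole grid is essentially one FFT; the wrapping
that makes this clean on the circle is exactly what gets awkward on the line.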

Josef

> -Sam
> On Fri, Aug 27, 2010 at 2:48 PM, Anne Archibald <aarchiba at physics.mcgill.ca>
> wrote:
>>
>> My only experience with KDEs has been on the circle, where there seems
>> to be little or no literature and the constraints are rather
>> different.
>>
>> On 27 August 2010 14:38,  <josef.pktd at gmail.com> wrote:
>> > On Fri, Aug 27, 2010 at 2:17 PM, Sam Birch <sam.m.birch at gmail.com>
>> > wrote:
>> >> Hi all,
>> >> I was thinking of renovating the kernel density estimation package
>> >> (although no promises; I'm leaving for college tomorrow morning!).
>> >> I was wondering:
>> >> a) whether anyone had started code in that direction
>> >
>> > Mike Crowe wrote code for kernel regression and Skipper started a 1D
>> > kernel density estimator in scikits.statsmodels; both cover a larger
>> > number of kernels.
>> >
>> > I don't think I have seen any higher-dimensional kernel density
>> > estimation in Python besides scipy.stats.kde. The Gaussian KDE in
>> > scipy.stats is tied to the underlying Fortran code for the
>> > multivariate normal cdf.
>> > It's not clear to me what other n-dimensional KDEs would require or
>> > whether they would fit well with the current code.
>> >
>> > One extension that Robert also mentioned in the past is adaptive
>> > kernels, which I also haven't seen in Python yet.
>> >
>> >> b) what people want in it
>> >> I was thinking (as an ideal, not necessarily a goal):
>> >> - Support for more than Gaussian kernels (e.g. custom, uniform,
>> >> Epanechnikov, triangular, quartic, cosine, etc.)
>> >> - More options for bandwidth selection (custom bandwidth matrices,
>> >> AMISE optimization, cross-validation, etc.)
>> >
>> > definitely yes, I don't think they are even available for 1D yet.
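
(For the 1-D case, likelihood cross-validation is at least easy to sketch. This is only
a naive O(n^2) illustration; loo_log_likelihood and the bandwidth grid are made up, not
a proposed implementation:

import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with bandwidth h."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diff = x[:, None] - x[None, :]
    k = np.exp(-0.5 * (diff / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)               # drop the left-out point itself
    return np.log(k.sum(axis=1) / (n - 1)).sum()

x = np.random.standard_normal(200)
bandwidths = np.linspace(0.05, 1.0, 40)
h_cv = bandwidths[np.argmax([loo_log_likelihood(x, h) for h in bandwidths])]

A real version would need something smarter than forming all pairwise distances.)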
>>
>> Bandwidth selection is a hotly debated topic, at least in one
>> dimension, so perhaps not just different methods but tools for
>> diagnosing bandwidth selection problems would be nice - at the least,
>> it should be made straightforward to vary the bandwidth (e.g. to plot
>> the KDE with a range of different bandwidth values).
>>
>> >> - Assorted conveniences: automatically generate the mesh, limit the
>> >> kernel's support for speed
>> >
>> > Using scipy.spatial to limit the number of neighbors in a bounded
>> > support kernel might be a good idea.
>>
>> Simply using it to find the neighbors that need to be used should
>> speed things up. There may also be some shortcuts for
>> unbounded-support kernels (no point adding a Gaussian a hundred sigma
>> away if there are points nearby).
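
(Something along these lines, maybe: a rough sketch of an Epanechnikov KDE that asks
scipy.spatial for the samples inside the kernel support. The name epanechnikov_kde and
the brute-force loop over query points are only for illustration:

import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma

def epanechnikov_kde(points, query, h):
    """d-dimensional KDE with a radially symmetric Epanechnikov kernel,
    visiting only the samples within distance h of each query point."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]              # n samples in one dimension
    query = np.asarray(query, dtype=float)
    if query.ndim == 1:
        query = query[:, None]
    n, d = points.shape
    tree = cKDTree(points)
    density = np.zeros(len(query))
    for i, x in enumerate(query):
        idx = tree.query_ball_point(x, r=h)    # only samples inside the support
        if idx:
            u2 = np.sum((points[idx] - x) ** 2, axis=1) / h ** 2
            density[i] = np.sum(1.0 - u2)
    # kernel normalization: (d + 2) / (2 * volume of the unit d-ball)
    unit_ball_volume = np.pi ** (0.5 * d) / gamma(0.5 * d + 1.0)
    return (d + 2.0) / (2.0 * unit_ball_volume) * density / (n * h ** d)

For the unbounded-support shortcut Anne mentions, the same neighbor query with a radius
of a few bandwidths would already capture everything above floating-point noise.)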
>>
>> At the other end of the spectrum, for very dense KDEs, on the circle I
>> found it extremely convenient to use Fourier transforms to carry out
>> the convolution of kernel with points. In particular, I represented
>> the KDE in terms of its Fourier coefficients, so that an inverse FFT
>> immediately gave me the KDE evaluated on a grid (or, with some
>> fiddling, integrated over the bins of a histogram). I don't know
>> whether this is a useful optimization for KDEs on the line or in
>> higher dimensions, since there's the problem of wrapping.
>>
>> Anne
>>
>> > (just some thoughts on the topic)
>> >
>> > Josef
>> >
>> >> So, thoughts anyone? I figure it's better to over-specify and then
>> >> under-produce, so don't hold back.
>> >> Thanks,
>> >> Sam


