[SciPy-Dev] Expanding Scipy's KDE functionality

Daniel Smith smith.daniel.br at gmail.com
Fri Jan 25 15:32:35 EST 2013


> As for your other part, I will have to think about it, but essentially I came to the conclusion
> that the bandwidth estimation would require a sparser grid than the density estimation. In
> some tests, a grid of 2^10 elements seems plenty (i.e. I get 4 significant digits compared to 2^14)
> and computation time falls from ~250ms to ~15ms using a dataset with 1000 samples. And 15
> ms to compute the bandwidth is perfectly acceptable for me. Now, if you have an adaptive
> method that can perform similarly, that would be awesome. The bandwidth can then be used in
> any context in which it makes sense.

I obviously did not write clearly enough. My apologies. You're
absolutely right that the density estimation can use a sparser grid.
The algorithm I was describing was designed to calculate the bandwidth
estimate with the minimal grid necessary. You still need a density
estimate, but not necessarily on the 2^14-point grid I have set as the
default. I will code that idea up to make it clearer.
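
To illustrate the grid-size point, here is a minimal sketch of a DCT-based
fixed-point bandwidth selector in the style of Botev, Grotowski and Kroese
(2010). That this matches the selector discussed in the thread is an
assumption, and the function names, the 10% padding of the data range, and
the root bracket [0, 0.1] are illustrative choices, not taken from the code
under discussion.

```python
import numpy as np
from scipy.fft import dct
from scipy.optimize import brentq


def _fixed_point(t, n_samples, k_sq, coef_sq):
    """Return t - gamma(t); the root is the squared bandwidth on a unit interval."""
    ell = 7
    # Estimate of the squared norm of the ell-th density derivative.
    f = 2 * np.pi ** (2 * ell) * np.sum(
        k_sq ** ell * coef_sq * np.exp(-k_sq * np.pi ** 2 * t))
    # Plug-in recursion down to the second derivative (s = 6, 5, 4, 3, 2).
    for s in range(ell - 1, 1, -1):
        k0 = np.prod(np.arange(1, 2 * s, 2)) / np.sqrt(2 * np.pi)
        const = (1 + 0.5 ** (s + 0.5)) / 3
        time = (2 * const * k0 / n_samples / f) ** (2 / (3 + 2 * s))
        f = 2 * np.pi ** (2 * s) * np.sum(
            k_sq ** s * coef_sq * np.exp(-k_sq * np.pi ** 2 * time))
    return t - (2 * n_samples * np.sqrt(np.pi) * f) ** (-2 / 5)


def fixed_point_bandwidth(data, n_grid=2 ** 10):
    """Hypothetical helper: Gaussian bandwidth from data binned on n_grid cells."""
    data = np.asarray(data, dtype=float)
    n = data.size
    span = data.max() - data.min()
    # Assumed padding: extend the range by 10% on each side.
    lo, hi = data.min() - span / 10, data.max() + span / 10
    counts, _ = np.histogram(data, bins=n_grid, range=(lo, hi))
    coef = dct(counts / n, type=2)           # unnormalized DCT-II of bin weights
    k_sq = np.arange(1, n_grid, dtype=float) ** 2
    coef_sq = (coef[1:] / 2) ** 2
    # Assumed bracket; brentq raises ValueError if it contains no sign change.
    t_star = brentq(_fixed_point, 0.0, 0.1, args=(n, k_sq, coef_sq))
    return np.sqrt(t_star) * (hi - lo)       # rescale to data units


rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
bw_coarse = fixed_point_bandwidth(sample, n_grid=2 ** 10)
bw_fine = fixed_point_bandwidth(sample, n_grid=2 ** 14)
print(bw_coarse, bw_fine)
```

The two printed values can be compared directly to see how little the
bandwidth depends on the grid resolution for a sample of this size.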

Thank you very much for your code.

> It would be useful, for me and maybe to others, if you could use
> github to keep track of the different versions (your repo or gists).

> I would like to see how the boundary and periodicity are affected by
> the different fft and dct, since I bump into this also in other areas.

Future updates will go to the same GitHub link I sent earlier. I
haven't written any new code. I simply generated some exponentially
distributed data and set MIN to 0. My periodic idea involves a
different kernel, so it will take a moment to map out. I haven't
played with the code Barbier de Reuille Pierre posted yet. I will add
the distribution-generating code to that same git repository.
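
For the boundary question, a rough sketch of the kind of comparison that
could be made: exponentially distributed data are binned on a grid whose
lower edge (MIN) is 0, then smoothed with a Gaussian kernel once through the
DCT (which implies reflection at the edges) and once through the FFT (which
implies a periodic domain). The fixed bandwidth of 0.2, the grid size, and
all variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=1000)

lo, hi = 0.0, data.max()        # lower edge pinned at 0, the true boundary
n_grid = 2 ** 10
width = (hi - lo) / n_grid
counts, edges = np.histogram(data, bins=n_grid, range=(lo, hi))
density = counts / (counts.sum() * width)   # histogram normalized to integrate to 1
centers = edges[:-1] + width / 2

bw = 0.2                        # illustrative fixed Gaussian bandwidth

# DCT route: cosine modes cos(pi*k*(x - lo)/(hi - lo)) reflect at both edges.
k = np.arange(n_grid)
coef = dct(density, type=2, norm="ortho")
coef *= np.exp(-0.5 * (np.pi * k * bw / (hi - lo)) ** 2)
dens_dct = idct(coef, type=2, norm="ortho")

# FFT route: complex exponentials wrap the right edge around to the left edge.
freq = np.fft.rfftfreq(n_grid, d=width)     # cycles per unit of x
spec = np.fft.rfft(density) * np.exp(-0.5 * (2 * np.pi * freq * bw) ** 2)
dens_fft = np.fft.irfft(spec, n=n_grid)

print("estimate at x = %.4f: DCT %.3f, FFT %.3f"
      % (centers[0], dens_dct[0], dens_fft[0]))
```

Near x = 0 the periodic version mixes in the nearly empty right tail, so it
sits well below the reflected version; both still integrate to 1 over the
grid.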

Daniel


