[SciPy-Dev] Cover trees for nearest neighbors in general metric space

Ralf Gommers ralf.gommers at googlemail.com
Sat Mar 10 17:18:53 EST 2012


On Sat, Mar 10, 2012 at 10:26 PM, Patrick Varilly <patvarilly at gmail.com>wrote:

> On Sat, Mar 10, 2012 at 8:00 PM, Ralf Gommers <ralf.gommers at googlemail.com
> > wrote:
>
>>
>>
>> On Fri, Mar 9, 2012 at 2:23 AM, Patrick Varilly <patvarilly at gmail.com>wrote:
>>
>>> Dear all,
>>>
>>> Following up from the conversation with Emanuele Olivetti and Jake
>>> VanderPlas, I've implemented [0] a drop-in replacement for
>>> scipy.spatial.kdtree that uses cover trees instead of kd-trees to answer
>>> nearest neighbor queries in a general metric space.  To me, this is useful
>>> for finding nearby particles in a 3D periodic box in the context of
>>> molecular simulations, but I'm sure it's more generally useful.  It
>>> addresses the same problem that Jake's BallTrees code addresses in
>>> scikit-learn, but I've done my best to reproduce the API of
>>> scipy.spatial.kdtree in order to make this code mostly painless to use.  In
>>> particular, kd-tree's useful function for finding all the points in one
>>> tree that are neighbors of every point in another tree (ironically,
>>> "query_ball_tree") is also implemented here for cover trees.  I modified
>>> kdtree's extensive unit test to use cover trees, and the code passes it.
>>>
>>
>> Out of interest, are you planning to propose this for inclusion in scipy
>> or scikit-learn once it's done, or keep it as a standalone package?
>>
>
> Eventually, I'd like to propose it for inclusion in scipy, since this
> functionality is not exclusive to machine learning.  I'm using it for
> molecular simulations, and didn't even know that nearest-neighbor queries
> were useful in machine learning!  I would never have though of looking in
> scikit-learn for this.  But I would like to address the two outstanding
> issues (vectorized distances and Cython implementation) before proposing it
> for inclusion.  On that same vein, I don't know why BallTree is in
> scikit-learn and not scipy, for the same reasons.
>

Resistance to adding more C++ code IIRC.


> Finally, in reading carefully through kdtree.py, I found a clear bug in
>>> the code, whereby the "eps" parameter for approximate queries doesn't get
>>> forwarded from the externally visible function "query" to the internal
>>> function "__query" that does the work.
>>>
>>
>> Could you open a ticket for that with a little more detail? Or if you
>> feel like it, a pull request would be even better:)\
>>
>
> I sent in a pull request for it.  It's the first time I do this, so
> apologies in advance if I somehow screwed it up.
>

No apologies necessary - it all looks good. Thanks for doing that.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20120310/f2a93a3b/attachment.html>


More information about the SciPy-Dev mailing list