[Numpy-discussion] Proposal: scipy.spatial

Anne Archibald peridot.faceted at gmail.com
Tue Sep 30 17:31:17 EDT 2008


2008/9/30 Peter <numpy-discussion at maubp.freeserve.co.uk>:
> On Tue, Sep 30, 2008 at 5:10 AM, Christopher Barker
> <Chris.Barker at noaa.gov> wrote:
>>
>> Anne Archibald wrote:
>>> I suggest the creation of
>>> a new submodule of scipy, scipy.spatial,
>>
>> +1
>>
>> Here's one to consider:
>> http://pypi.python.org/pypi/Rtree
>> and perhaps other stuff from:
>> http://trac.gispython.org/spatialindex/wiki
>> which I think is LGPL -- can scipy use that?
>
> There is also a KDTree module in Biopython (which is under a BSD/MIT
> style licence),
> http://biopython.org/SRC/biopython/Bio/KDTree/
>
> The current version is in C, there is an older version available in
> the CVS history in C++ too,
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/KDTree/?cvsroot=biopython

I think the profusion of different implementations is an argument for
including this in scipy. I think it is also an argument for providing
a standard interface with (at least potentially) several different
implementations. At the moment, that proposed interface looks like:

T = KDTree(data)

distances, indices = T.query(xs) # single nearest neighbor

distances, indices = T.query(xs, k=10) # ten nearest neighbors

distances, indices = T.query(xs, k=None, distance_upper_bound=1.0) # all within 1 of x

In the first two cases, missing neighbors are represented with an
infinite distance and an invalid index. In the last case, distances
and indices are both either lists (if there's only one query point) or
object arrays of lists (if there are many query points). If only one
neighbor is requested, the array does not have a dimension of length 1
in the which-neighbor position. If (potentially) many neighbors are
returned, they are sorted by distance, nearest first.
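
To make the return conventions above concrete, here is a minimal brute-force sketch in Python. It is not the proposed KDTree implementation. The class name BruteForceTree is hypothetical, and three choices are assumptions made only for illustration: using len(data) as the "invalid index", treating distance_upper_bound as inclusive, and returning plain Python lists where the proposal describes arrays.

```python
import math


class BruteForceTree:
    """Hypothetical stand-in for the proposed KDTree: linear scan, no tree."""

    def __init__(self, data):
        self.data = [tuple(p) for p in data]
        self.n = len(self.data)

    def _query_one(self, x, k, distance_upper_bound):
        # (distance, index) pairs within the bound, nearest first.
        hits = sorted((math.dist(x, p), i) for i, p in enumerate(self.data))
        hits = [(d, i) for d, i in hits if d <= distance_upper_bound]
        if k is None:
            # "All within the bound": variable-length lists.
            return [d for d, _ in hits], [i for _, i in hits]
        hits = hits[:k]
        # Missing neighbors: infinite distance, invalid index (assumed: n).
        hits += [(math.inf, self.n)] * (k - len(hits))
        distances = [d for d, _ in hits]
        indices = [i for _, i in hits]
        if k == 1:
            # No length-1 "which neighbor" dimension for a single neighbor.
            return distances[0], indices[0]
        return distances, indices

    def query(self, xs, k=1, distance_upper_bound=math.inf):
        # A single point is a sequence of numbers; otherwise xs is a
        # sequence of points and results are collected per query point.
        if isinstance(xs[0], (int, float)):
            return self._query_one(xs, k, distance_upper_bound)
        results = [self._query_one(x, k, distance_upper_bound) for x in xs]
        return [r[0] for r in results], [r[1] for r in results]


T = BruteForceTree([(0, 0), (1, 0), (0, 2), (5, 5)])
print(T.query((0, 0)))                                      # single nearest neighbor
print(T.query((0, 0), k=3, distance_upper_bound=1.5))       # padded with inf
print(T.query([(0, 0), (5, 5)], k=None, distance_upper_bound=1.5))
```

With k=3 and a bound of 1.5, only two points qualify, so the third slot is padded with an infinite distance and the out-of-range index. With k=None, each query point gets its own list, which is why the proposal suggests object arrays of lists when there are many query points.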

What do you think of this interface?

It may make sense to provide additional kinds of query - nearest
neighbors between two trees, for example - some of which would be
available only for some implementations.

Anne


