[SciPy-dev] Proposal: scipy.spatial

Nathan Bell wnbell at gmail.com
Tue Sep 30 00:40:04 EDT 2008


On Mon, Sep 29, 2008 at 11:24 PM, Anne Archibald
<peridot.faceted at gmail.com> wrote:

Thanks for taking the initiative on scipy.spatial Anne.  I have no
doubt that it will become an important and widely used part of scipy.

Have you considered starting a "spatial" branch off the scipy trunk?

>
> I am particularly concerned about providing a convenient return
> format. The natural return from a query is a list of neighbors, since
> it may have variable length (there may not be r elements in the tree,
> or you might have supplied a maximum distance which doesn't contain r
> points). For a single query, it's simple to return a python list
> (should it be sorted? currently it's a heap). But if you want to
> vectorize the process, storing the results in an array becomes
> cumbersome. One option is an object array full of lists; another,
> which, I currently use, is an array with nonexistent points marked
> with an infinite distance and an invalid index. A third would be to
> return masked arrays. How do you recommend handling this variable (or
> potentially-variable) sized output?
>

I'd go with an array of lists for now.  If you support an "all points
in a given sphere" query, then you'll need something that can
accommodate the potentially large size imbalance among query results.

FWIW you could encode this information in a sparse matrix.  For
instance, the lil_matrix format is essentially an array of lists with
entries of the form (column index,value).  I don't know if this sort
of approach would simplify matters or not.

>
> Research suggests that at least in high dimension a more geometric
> balancing criterion can produce vastly better results. I used the
> "sliding midpoint" rule, which doesn't allow such a numerical
> balancing but does ensure that you don't have too many long skinny
> cells (since a sphere tends to cut very many of these).

I will defer to your judgment here.  My experience, and the paper I
referenced, is oriented towards low dimensional cases that arise in
graphics applications.

-- 
Nathan Bell wnbell at gmail.com
http://graphics.cs.uiuc.edu/~wnbell/



More information about the SciPy-Dev mailing list