[SciPy-dev] [Numpy-discussion] Proposal: scipy.spatial

Wed Oct 1 13:56:22 EDT 2008

On Wed, Oct 1, 2008 at 12:53 PM, Prabhu Ramachandran
<prabhu at aero.iitb.ac.in> wrote:
>
> I thought this was a discussion on a good API for such an algorithm
> (sorry if I misunderstood!).  Given that context, if this is a general
> purpose algorithm I'd rather not assume that the number of results is
> typically going to be small.

However, you *can* assume log(k) of a k-nearest neighbor query is
going to be small.

>
> If there is no need for sorting and there are a large number of nearest
> neighbor queries (in my case there are O(N) nearest neighbor queries),
> this quickly adds up to an unnecessary O(N) inefficiency that I'd like
> to avoid.
>

Although you might not sort a k-nearest neighbor query, but as Anne
has said, you end up needing a sorted structure (like a priority
queue) to perform the query itself anyway.

The only case where not sorting makes sense is an "in sphere" query
where there's no need to prioritize the results.  In this case, I'd
still argue that the overhead of sorting isn't likely to matter.

>
> I don't see why the addition of a sort keyword will double your unit
> tests.  Another option would be to return unsorted results and have a
> separate sort method if it makes testing easier.
>

It would be more profitable to offload the queries to native code.
That is a much larger speedup than the O(log k) sorting factor.
Interpreter overhead is going to dominate any pure-python hierarchy
traversal.

-- 
Nathan Bell wnbell at gmail.com
http://graphics.cs.uiuc.edu/~wnbell/