[SciPy-Dev] cKDTree

Jacob Vanderplas jakevdp at cs.washington.edu
Tue Sep 6 14:08:41 EDT 2016


As for adding more functionality along those lines, I would advocate
creation of a new package or perhaps a scikit. As we've seen with
scikit-learn, useful features can filter-up into scipy (e.g.
sparse.csgraph) and the development of new features within an independent
package is *much* easier than development from within a scipy PR.
   Jake

 Jake VanderPlas
 Senior Data Science Fellow
 Director of Research in Physical Sciences
 University of Washington eScience Institute

On Tue, Sep 6, 2016 at 11:03 AM, Sylvain Corlay <sylvain.corlay at gmail.com>
wrote:

> Would you guys consider in scope for scipy to have implementation of
> faster nearest neighbor search methods than KdTree?
>
> Some methods are fairly simple... e.g principal axis tree which use the
> principal direction of the dataset to split the dataset into smaller
> subsets.  As soon as intrinsic dimensionality is significantly smaller than
> the dimension of the space, it is significantly faster.
>
> Besides, only having to compute the (an approximate) principal axis is
> much faster than doing an actual PCA.
>
> On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas <
> jakevdp at cs.washington.edu> wrote:
>
>> From my own casual benchmarks, the new scipy cKDTree is much faster than
>> any of the scikit-learn options, though it still only supports axis-aligned
>> euclidean-like metrics (where sklearn's BallTree supports dozens of
>> additional metrics). The cKDTree also has a limited range of query types
>> compared to scikit-learn's trees,
>>    Jake
>>
>>  Jake VanderPlas
>>  Senior Data Science Fellow
>>  Director of Research in Physical Sciences
>>  University of Washington eScience Institute
>>
>> On Mon, Sep 5, 2016 at 12:46 AM, Daπid <davidmenhur at gmail.com> wrote:
>>
>>> On 4 September 2016 at 23:00, Robert Lucente <rlucente at pipeline.com>
>>> wrote:
>>> > Please note that I am a newbie and just a lurker.
>>> >
>>> > I noticed in a recent email that cKDTree was mentioned.
>>> >
>>> > Q: What is the relationship if any between SciPy an scikit-learn when
>>> it comes to cKDTree?
>>> >
>>> > The reason that I ask are the following 2 links
>>> >
>>> > https://jakevdp.github.io/blog/2013/04/29/benchmarking-neare
>>> st-neighbor-searches-in-python/
>>> >
>>> > https://github.com/scikit-learn/scikit-learn/issues/3682
>>>
>>> Note that these benchmarks are from 2013 and 2014. Scipy's KDTree has
>>> seen its performance recently improved, twice. Scikit's last update to
>>> its KDTree was in 2015. So, we need to run the benchmarks again.
>>>
>>> /David.
>>> _______________________________________________
>>> SciPy-Dev mailing list
>>> SciPy-Dev at scipy.org
>>> https://mail.scipy.org/mailman/listinfo/scipy-dev
>>>
>>
>>
>> _______________________________________________
>> SciPy-Dev mailing list
>> SciPy-Dev at scipy.org
>> https://mail.scipy.org/mailman/listinfo/scipy-dev
>>
>>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> https://mail.scipy.org/mailman/listinfo/scipy-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-dev/attachments/20160906/1a422031/attachment.html>


More information about the SciPy-Dev mailing list