[scikit-learn] Nearest neighbor search with 2 distance measures

Jacob Vanderplas jakevdp at cs.washington.edu
Tue Aug 1 13:59:21 EDT 2017


On Tue, Aug 1, 2017 at 10:50 AM, Rohin Kumar <yrohinkumar at gmail.com> wrote:

> I started with KD-tree but after realising it doesn't support custom
> metrics (I don't know the reason for this - would be nice feature)
>

The scikit-learn KD-tree doesn't support custom metrics because it utilizes
relatively strong assumptions about the form of the metric when
constructing the tree. The Ball Tree makes fewer assumptions, which is why
it can support arbitrary metrics. It would in principal be possible to
create a KD Tree that supports custom *axis-aligned* metrics, but again I
think that would be too specialized for inclusion in scikit-learn.

One project you might check out is cykdtree:
https://pypi.python.org/pypi/cykdtree
I'm not certain whether it supports the queries you need, but I would bet
the team behind that would be willing to work toward these sorts of
specialized queries if they don't already exist.

   Jake



> I shifted to BallTree and was looking for a 2 metric based categorisation.
> After looking around, the best I could find at most were brute-force
> methods written in python (had my own version too) or better optimised ones
> in C or FORTRAN. The closest one was halotools which again works with
> euclidean metric. For now, I will try to get my work done with 2 different
> BallTrees iteratively in bins. If I find a better option will try to post
> an update.
>
> Regards,
> Rohin.
>
>
> On Tue, Aug 1, 2017 at 10:55 PM, Jacob Vanderplas <
> jakevdp at cs.washington.edu> wrote:
>
>> Hi Rohin,
>> Ah, I see. I don't think a BallTree is the right data structure for an
>> anisotropic N-point query, because it fundamentally assumes spherical
>> symmetry of the metric. You may be able to do something like this with a
>> specialized KD-tree, but scikit-learn doesn't support this, and I don't
>> imagine that it ever will given the very specialized nature of the
>> application.
>>
>> I'm certain someone has written efficient code for this operation in the
>> astronomy community, but I don't know of any good Python package to
>> recommend for this – I'd suggest googling for keywords and seeing where
>> that gets you.
>>
>> Thanks,
>>   Jake
>>
>>  Jake VanderPlas
>>  Senior Data Science Fellow
>>  Director of Open Software
>>  University of Washington eScience Institute
>>
>> On Tue, Aug 1, 2017 at 6:15 AM, Rohin Kumar <yrohinkumar at gmail.com>
>> wrote:
>>
>>> Since you seem to be from Astrophysics/Cosmology background (I am
>>> assuming you are jakevdp - the creator of astroML - if you are - I am
>>> lucky!), I can explain my application scenario. I am trying to calculate
>>> the anisotropic two-point correlation function something like done in
>>> rp_pi_tpcf
>>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.rp_pi_tpcf.html#halotools.mock_observables.rp_pi_tpcf>
>>>  or s_mu_tpcf
>>> <http://halotools.readthedocs.io/en/latest/api/halotools.mock_observables.s_mu_tpcf.html#halotools.mock_observables.s_mu_tpcf>
>>> using pairs (DD,DR,RR) calculated from BallTree.two_point_correlation
>>>
>>> In halotools (http://halotools.readthedocs.
>>> io/en/latest/function_usage/mock_observables_functions.html) it is
>>> implemented using rectangular grids. I could calculate 2pcf with custom
>>> metrics using one variable with BallTree as done in astroML. I intend to
>>> find the anisotropic counter part.
>>>
>>> Thanks & Regards,
>>> Rohin
>>>
>>>
>>> On Tue, Aug 1, 2017 at 5:18 PM, Rohin Kumar <yrohinkumar at gmail.com>
>>> wrote:
>>>
>>>> Dear Jake,
>>>>
>>>> Thanks for your response. I meant to group/count pairs in boxes (using
>>>> two arrays simultaneously-hence needing 2 metrics) instead of one distance
>>>> array as the binning parameter. I don't know if the algorithm supports such
>>>> a thing. For now, I am proceeding with your suggestion of two ball trees at
>>>> huge computational cost. I hope I am able to frame my question properly.
>>>>
>>>> Thanks & Regards,
>>>> Rohin.
>>>>
>>>>
>>>>
>>>> On Mon, Jul 31, 2017 at 8:16 PM, Jacob Vanderplas <
>>>> jakevdp at cs.washington.edu> wrote:
>>>>
>>>>> On Sun, Jul 30, 2017 at 11:18 AM, Rohin Kumar <yrohinkumar at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> *update*
>>>>>>
>>>>>> May be it doesn't have to be done at the tree creation level. It
>>>>>> could be using loops and creating two different balltrees. Something like
>>>>>>
>>>>>> tree1=BallTree(X,metric='metric1') #for x-z plane
>>>>>> tree2=BallTree(X,metric='metric2') #for y-z plane
>>>>>>
>>>>>> And then calculate correlation functions in a loop to get
>>>>>> tpcf(X,r1,r2) using tree1.two_point_correlation(X,r1) and
>>>>>> tree2.two_point_correlation(X,r2)
>>>>>>
>>>>>
>>>>> Hi Rohin,
>>>>> It's not exactly clear to me what you wish the tree to do with the two
>>>>> different metrics, but in any case the ball tree only supports one metric
>>>>> at a time. If you can construct your desired result from two ball trees
>>>>> each with its own metric, then that's probably the best way to proceed,
>>>>>    Jake
>>>>>
>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> scikit-learn mailing list
>>>>>> scikit-learn at python.org
>>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> scikit-learn mailing list
>>>>> scikit-learn at python.org
>>>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>>>
>>>>>
>>>>
>>>
>>> _______________________________________________
>>> scikit-learn mailing list
>>> scikit-learn at python.org
>>> https://mail.python.org/mailman/listinfo/scikit-learn
>>>
>>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn at python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scikit-learn/attachments/20170801/969160c2/attachment-0001.html>


More information about the scikit-learn mailing list