[Numpy-discussion] Robust Sorting of Points

josef.pktd at gmail.com josef.pktd at gmail.com
Sun Oct 27 16:26:04 EDT 2013


On Sun, Oct 27, 2013 at 4:22 PM,  <josef.pktd at gmail.com> wrote:
> On Sun, Oct 27, 2013 at 3:22 PM, Freddie Witherden
> <freddie at witherden.org> wrote:
>> On 27/10/13 18:54, Daniele Nicolodi wrote:
>>> On 27/10/2013 19:42, Freddie Witherden wrote:
>>>> On 27/10/13 18:35, Nathaniel Smith wrote:
>>>>> On Sun, Oct 27, 2013 at 6:28 PM, Freddie Witherden
>>>>> <freddie at witherden.org> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This is a question which has been bugging me for a while.  I have an (N,
>>>>>> 3) array where N ~ 16 of points.  These points are all unique and
>>>>>> separated by a reasonable distance.
>>>>>>
>>>>>> I wish to sort these points into a canonical order in a fashion which is
>>>>>> robust against small perturbations.  In other words changing any
>>>>>> component of any of the points by an epsilon ~ 1e-12 should not affect
>>>>>> the resulting sorted order.
>>>>>
>>>>> I don't understand how this is possible even in principle.
>>>>>
>>>>> Say your points are
>>>>>
>>>>>  a = [0, 0, 0]
>>>>>  b = [0, 0, 1e-12]
>>>>>
>>>>> According to your criterion, either a or b should go first -- I don't
>>>>> know which. Let's say our canonical ordering decides that a goes
>>>>> first. But if you perturb both of them, then you have
>>>>>
>>>>>  a = [0, 0, 1e-12]
>>>>>  b = [0, 0, 0]
>>>>>
>>>>> And now your requirement says that a still has to go first. But if a
>>>>> goes first this time, then b had to go first the last time, by
>>>>> symmetry. Thus your criterion is self-contradictory...?
>>>>
>>>> Not exactly; in your case the distance between a and b is of the order
>>>> epislon.  However, my points are "all unique and separated by a
>>>> reasonable distance."  This requires at least one of the components of
>>>> any two points to differ in all instances, permitting an ordering to be
>>>> defined.  (Where if epislon ~ 1e-12 the minimum instance between any two
>>>> points is of order ~ 1e-6.)
>>>
>>> Do you mean that all you points are distributed around some fixed points
>>> in your space?  In this case, it looks like what you are looking for is
>>> categorization or clustering and not sorting.  Once you perform
>>> clustering, you can simply define an arbitrary order in which report the
>>> content of each cluster.  If this is not the case, the problem that
>>> Nathaniel highlishts is still present.
>>
>> I am not entirely sure what you mean here.  If x is my array of points
>> of size (16, 3) then I am guarenteeing that
>>
>>   np.min(scipy.spatial.distance.pdist(x)) >= 1e-6
>>
>> In this instance I am unsure how the issue highlighted by Nathaniel
>> might arise.  Of course it is (very) possible that I am missing
>> something, however, I believe under the terms of this constraint that it
>> is always possible to define an order with which to iterate through the
>> points which is invarient to shuffling of the points and small
>> pertubations of the components.
>
>
> If the epsilon or scale depends on the column, then, I think, divmod
> should work to cut off the noise
>
>>>> my_array[np.lexsort(divmod(my_array, [1e-1, 1e-12, 1])[0].T)]
> array([[-0.5       ,  0.        ,  1.41421356],
>        [ 0.5       ,  0.        ,  1.41421356]])

>>> my_array = np.array([[-0.5, 0, 2**0.5], [-0.5, 1e-16, 2**0.5], [-0.5, -1e-16, 2**0.5],[0.5, 0, 2**0.5 - 1e-15]])
>>> my_array[np.lexsort(divmod(my_array, [1e-1, 1e-12, 1])[0].T)]
array([[ -5.00000000e-01,  -1.00000000e-16,   1.41421356e+00],
       [ -5.00000000e-01,   0.00000000e+00,   1.41421356e+00],
       [ -5.00000000e-01,   1.00000000e-16,   1.41421356e+00],
       [  5.00000000e-01,   0.00000000e+00,   1.41421356e+00]])


>
> Josef
>
>
>>
>> Regards, Freddie.
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>



More information about the NumPy-Discussion mailing list