Sorting NaNs

Chris Angelico rosuav at gmail.com
Sat Jun 2 05:42:08 EDT 2018


On Sat, Jun 2, 2018 at 7:31 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 2 June 2018 at 08:32, Peter J. Holzer <hjp-python at hjp.at> wrote:
>> Browsing through older messages I came upon a thread discussing the
>> treatment of NaNs by median(). Since you have to (partially) sort the
>> values to compute the median, I played around with sorted():
>>
>> Python 3.5.3 (default, Jan 19 2017, 14:11:04)
>> [GCC 6.3.0 20170118] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>
>>>>> sorted([3, 5, float('NaN'), 1])
>> [3, 5, nan, 1]
>>
>> What? Does NaN cause sorted to return the original list?
>>
>>>>> sorted([3, 5, float('NaN'), 1, 0.5])
>> [3, 5, nan, 0.5, 1]
>>
>> Nope. Does it partition the list into sublists, which are sorted
>> individually?
>>
>>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5])
>> [-8, 0.5, 1, 3, 5, nan]
>>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33])
>> [-8, 0.5, 1, 3, 5, nan, 33]
>>
>> Also nope. It looks like NaNs just mess up sorting in an unpredictable
>> way. Is this the intended behaviour or just an accident of
>> implementation? (I think it's the latter: I can see how a sort algorithm
>> which doesn't treat NaN specially would produce such results.)
>
> I'd simply assume it's the result of two factors:
>
> 1. The behaviour of comparisons involving NaN values is weird (not
> undefined, as far as I know NaN behaviour is very well defined, but
> violates a number of normally fundamental properties of comparisons)
> 2. The precise behaviour of the sort algorithm use by Python is not
> mandated by the language.
>
> A consequence of (1) is that there is no meaningful definition of "a
> sorted list of numbers" if that list includes NaN, as the definition
> "being sorted" relies on properties like "only one of a < b and b < a
> is true" (precisely which properties depend on how you define "sorted"
> which no-one normally cares about because under normal assumptions
> they are all equivalent). A list including NaNs therefore cannot be
> permuted into a sorted order (which is the basic language-mandated
> detail of sorted() - there are others, but this is what matters here).
>
> So call it an accident of implementation of you like. Or "sorting a
> list with NaNs in it is meaningless" if you prefer. Or "undefined
> behaviour" if you're a fan of the language in the C standard.

Sorting a list of uncomparable objects is meaningless (though not
undefined). I believe that a list of floats with some NaNs in it
should have all the other values sort correctly, with the NaNs doing
whatever they feel like doing (they're like cats - have you ever tried
to sort an array of cats?); but that's hard to do with the vanilla
sorting operation. If you want some sort of predictable behaviour,
it's not too hard; all you need is a comparison function that puts all
the NaNs together:

>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33], key=lambda x: (x!=x, x))
[-8, 0.5, 1, 3, 5, 33, nan]

Or if you'd rather shove them all to the beginning:

>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33], key=lambda x: (x==x, x))
[nan, -8, 0.5, 1, 3, 5, 33]

With that change, you have a guarantee that all finite and infinite
values will be correctly sorted, and the NaNs will be grouped (but in
no particular order within that group).

ChrisA



More information about the Python-list mailing list