Sorting NaNs

Paul Moore p.f.moore at gmail.com
Sat Jun 2 05:31:37 EDT 2018


On 2 June 2018 at 08:32, Peter J. Holzer <hjp-python at hjp.at> wrote:
> Browsing through older messages I came upon a thread discussing the
> treatment of NaNs by median(). Since you have to (partially) sort the
> values to compute the median, I played around with sorted():
>
> Python 3.5.3 (default, Jan 19 2017, 14:11:04)
> [GCC 6.3.0 20170118] on linux
> Type "help", "copyright", "credits" or "license" for more information.
>
>>>> sorted([3, 5, float('NaN'), 1])
> [3, 5, nan, 1]
>
> What? Does NaN cause sorted to return the original list?
>
>>>> sorted([3, 5, float('NaN'), 1, 0.5])
> [3, 5, nan, 0.5, 1]
>
> Nope. Does it partition the list into sublists, which are sorted
> individually?
>
>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5])
> [-8, 0.5, 1, 3, 5, nan]
>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33])
> [-8, 0.5, 1, 3, 5, nan, 33]
>
> Also nope. It looks like NaNs just mess up sorting in an unpredictable
> way. Is this the intended behaviour or just an accident of
> implementation? (I think it's the latter: I can see how a sort algorithm
> which doesn't treat NaN specially would produce such results.)

I'd simply assume it's the result of two factors:

1. The behaviour of comparisons involving NaN values is weird (not
undefined, as far as I know NaN behaviour is very well defined, but
violates a number of normally fundamental properties of comparisons)
2. The precise behaviour of the sort algorithm use by Python is not
mandated by the language.

A consequence of (1) is that there is no meaningful definition of "a
sorted list of numbers" if that list includes NaN, as the definition
"being sorted" relies on properties like "only one of a < b and b < a
is true" (precisely which properties depend on how you define "sorted"
which no-one normally cares about because under normal assumptions
they are all equivalent). A list including NaNs therefore cannot be
permuted into a sorted order (which is the basic language-mandated
detail of sorted() - there are others, but this is what matters here).

So call it an accident of implementation of you like. Or "sorting a
list with NaNs in it is meaningless" if you prefer. Or "undefined
behaviour" if you're a fan of the language in the C standard.

Paul



More information about the Python-list mailing list