[New-bugs-announce] [issue36095] Better NaN sorting.
Brandt Bucher
report at bugs.python.org
Sat Feb 23 14:39:17 EST 2019
New submission from Brandt Bucher <brandtbucher at gmail.com>:
Sorting sequences containing NaN values produces an incompletely sorted result. Further, because of the complexity of the timsort, this incomplete sort often silently produces unintuitive, unstable-seeming results that are extremely sensitive to the ordering of the inputs:
>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2.0, 3, nan, 2]
>>> sorted(reversed([3, 1, 2, float('nan'), 2.0, 2, 2.0]))
[1, 2.0, 2, 2.0, nan, 2, 3]
The patch I have provided addresses these issues, including for lists containing nested lists/tuples with NaN values. Specifically, it stably sorts NaNs to the end of the list with no changes to the timsort itself (just the element-wise comparison functions):
>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2, 2.0, 3, nan]
>>> sorted([[3], [1], [2], [float('nan')], [2.0], [2], [2.0]])
[[1], [2], [2.0], [2], [2.0], [3], [nan]]
It also includes a new regression test for this behavior.
Some other benefits to this patch:
* These changes generally result in a sorting performance improvement across data types. The largest increases here are for nested lists, since we add a new unsafe_list_compare function. Other speed increases are due to safe_object_compare's delegation to unsafe comparison functions for objects of the same type. Specifically, the speed impact (positive is faster, negative is slower) is between:
* -3% and +3% (10 elements, no PGO)
* 0% and +4% (10 elements, PGO)
* 0% and +9% (1000 elements, no PGO)
* -1% and +9% (1000 elements, PGO)
* The current weird NaN-sorting behavior is not documented, so this is not a breaking change.
* IEEE754 compliance is maintained. The result is still a stable (arguably, more stable), nondecreasing ordering of the original list.
----------
components: Interpreter Core
messages: 336401
nosy: brandtbucher
priority: normal
severity: normal
status: open
title: Better NaN sorting.
type: behavior
versions: Python 3.8
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36095>
_______________________________________
More information about the New-bugs-announce
mailing list