[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann
report at bugs.python.org
Thu Oct 21 19:19:13 EDT 2021
Stefan Pochmann <stefan.pochmann at gmail.com> added the comment:
I see you mentioned that PyObject_RichCompareBool(..., Py_EQ) might be faster for example because it checks identity. Why not do an identity check before the ms->tuple_elem_compare calls? Too expensive and rarely successful?
> Extending the idea to positions beyond the first is dubious on those grounds: if the first tuple positions in fact often match, then the optimization has already backfired. Time to admit defeat then, not double down on failed arrogance ;-)
I don't see that. First and second positions are quite different.
For example I sorted a million tuples where both first and second element are randomly chosen from a population of 10,000. So their amounts of duplication were the same. But these are the statistics from sorting:
- First position: 18,603,981 equality comparisons, 29.87% equal
- Second position: 5,556,203 equality comparisons, 0.09% equal
Many first positions matched, almost no second positions matched.
One more idea: What do you think of sorting lists of tuples (or tuple keys) in two stages?
1) Sort the list only by first position (comparing with only one tuple_elem_compare).
2) Identify equal-first-position-runs (with tuple_elem_compare) and sort each run independently (comparing only positions 1+).
Some experiments I did with this looked good, not sure if too off-topic to post here...
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45530>
_______________________________________
More information about the Python-bugs-list
mailing list