[issue45530] Improve listobject.c's unsafe_tuple_compare()

Thu Oct 21 19:19:13 EDT 2021

Stefan Pochmann <stefan.pochmann at gmail.com> added the comment:

I see you mentioned that PyObject_RichCompareBool(..., Py_EQ) might be faster for example because it checks identity. Why not do an identity check before the ms->tuple_elem_compare calls? Too expensive and rarely successful?

> Extending the idea to positions beyond the first is dubious on those grounds: if the first tuple positions in fact often match, then the optimization has already backfired. Time to admit defeat then, not double down on failed arrogance ;-)

I don't see that. First and second positions are quite different.

For example I sorted a million tuples where both first and second element are randomly chosen from a population of 10,000. So their amounts of duplication were the same. But these are the statistics from sorting:
- First position:   18,603,981 equality comparisons, 29.87% equal
- Second position:   5,556,203 equality comparisons,  0.09% equal
Many first positions matched, almost no second positions matched.

One more idea: What do you think of sorting lists of tuples (or tuple keys) in two stages?
1) Sort the list only by first position (comparing with only one tuple_elem_compare).
2) Identify equal-first-position-runs (with tuple_elem_compare) and sort each run independently (comparing only positions 1+).
Some experiments I did with this looked good, not sure if too off-topic to post here...

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue45530>
_______________________________________