[Python-ideas] Sorted lists

Steven D'Aprano steve at pearwood.info
Sun Apr 7 22:32:05 EDT 2019


There are quite a few important algorithms which require lists to be 
sorted. For example, the bisect module, and for statistics median and 
other quantiles.

Sorting a list is potentially expensive: while Timsort is very 
efficient, it is still ultimately an O(N log N) algorithm which limits 
how efficient it can be. Checking whether a list is sorted is O(N).

What if we could check that lists were sorted in constant time?

Proposal: let's give lists a dunder flag, __issorted__, that tracks 
whether the list is definitely sorted or not:

- Empty lists, or lists with a single item, are created with
  __issorted__ = True; lists with two or more items are created
  with the flag set to False.

- Appending or inserting items sets the flag to False.

- Deleting or popping items doesn't change the flag.

- Reversing the list doesn't change the flag.

- Sorting it sets the flag to True. (The sort method should NOT
  assume the list is sorted just because the flag is set.)

Functions that require the list to be sorted can use the flag as a 
quick check:

    if not alist.__issorted__:
        alist.sort()
    ...

The flag will be writable, so that functions such as those in 
bisect can mark that they have kept the sorted invariant:


    bisect.insort(alist, x)
    assert alist.__issorted__


Being writable, the flag is advisory, not a guarantee, and "consenting 
adults" applies. People can misuse the flag:

    alist = [1, 4, 2, 0, 5]
    alist.__issorted__ = True

but then they have nobody to blame but themselves if they shoot 
themselves in the foot. That's no worse than the situation we have now, 
were you might pass an unsorted list to bisect.

The flag doesn't guarantee that the list is sorted the way you want 
(e.g. biggest to smallest, by some key, etc) only that it has been 
sorted. Its up to the user to ensure they sort it the right way:

    # Don't do this and expect it to work!
    alist.sort(key=random.random)
    bisect.insort(alist, 1)


If you really want to be sure about the state of the list, you have to 
make a copy and sort it. But that's no different from the situation 
right now. But for those willing to assume "consenting adults", you 
might trust the flag and avoid sorting.

Downsides:

- Every list grows an extra attribute; however, given that lists are
  already quite big data structures and are often over-allocated, I 
  don't think this will matter much.

- insert(), append(), extend(), __setitem__() will be a tiny bit 
  slower due to the need to set the flag.



Thoughts?




-- 
Steven


More information about the Python-ideas mailing list