[Python-ideas] Fwd: Add sorted (ordered) containers

Grant Jenks grant.jenks at gmail.com
Thu Oct 13 20:25:20 EDT 2016


On Thu, Oct 13, 2016 at 1:36 PM, Марк Коренберг <socketpair at gmail.com> wrote:
>
> I mean mutable containers that are always sorted when iterating over them.
>
> See http://bugs.python.org/issue28433
>
> for example:
>
> * SortedSet (sorted unique elements, implemented using (rb?)tree instead of hash)
> * SortedList (sorted elements, the same as SortedSet, but without uniquiness constraint) - actually a (rb?)tree, not a list (i.e. not an array)
> * SortedDict (sorted by key when interating) - like C++'s ordered_map

Can you share more about your use cases for these containers? What are
you making?

Nick Coghlan gave an answer to this question on StackOverflow at
http://stackoverflow.com/a/5958960/232571 The answer kind of boils
down to "there should be one obvious way to do it" and existing Python
features like lists, sorted, bisect, and heapq cover many use cases.

I wrote the answer that is now the second highest rated for that
question. I've noticed that the upvotes have been accumulating at a
slightly higher rate than Nick's answer. I think that reflects an
increase in interest and maybe gradual tide change of opinion.

> There are many implementations in the net, like:
>
> http://www.grantjenks.com/docs/sortedcontainers

That's my project. I am the primary developer of the SortedContainers project.

You may also be interested in the
[SortedCollections](http://www.grantjenks.com/docs/sortedcollections/)
module which builds atop SortedContainers with data types like
ValueSortedDict and ItemSortedDict. Because it's pure-Python,
SortedContainers offers a lot of opportunity for
extension/customization. That's also made it easier for the API to
adapt/expand over time.

> I think it should be one standardized implementation of such containers in CPython.
>
> Instead of trees, implementation may use SkipList structure, but this is just implementation details.
>
> Such structres imply fast insertion and deletion, ability to iterate, and also memory efficiency.

I gave a talk at PyCon 2016 about Python Sorted Collections[1] that's
worth watching. The first third discusses about six different
implementations with different strengths and weaknesses. The choice of
data type is more than implementation details. One of the biggest
issues is the various tradeoffs of data types like blists, rbtrees,
etc.

I have been meaning to discuss sorted collections further with Raymond
Hettinger (author of the Python collections module). We spoke after
the PyCon talk and wanted to continue the conversation. But I had a
busy summer and just a few weeks ago welcomed my first son into the
world. So realistically that conversation won't happen until 2017.

[1]: http://www.grantjenks.com/docs/sortedcontainers/pycon-2016-talk.html

> I recommend to read thorough review articles written by Andrew Barnert:
>
> http://stupidpythonideas.blogspot.com/2013/07/sorted-collections-in-stdlib.html
>
> http://stupidpythonideas.blogspot.com/2014/04/sortedcontainers.html

One of Andrew Barnert's conclusions is that SortedContainers could not
scale. I did a pretty rigorous performance analysis and benchmarking
at http://www.grantjenks.com/docs/sortedcontainers/performance-scale.html
Short answer: I scaled SortedContainers up through ten billion
elements, well past the memory limits of most machines.


More information about the Python-ideas mailing list