sorteddict PEP proposal [started off as orderedict]

Raymond Hettinger python at rcn.com
Wed Sep 26 13:59:52 EDT 2007


[Mark Summerfield]
> Below is a PEP proposal for a sorteddict. It arises out of a
> discussion on this list that began a few weeks ago with the subject of
> "An ordered dictionary for the Python library?"

It is worth remembering that the long sought after datatype was a dict
that could loop over keys in *insertion* order, not based on some
function of the key itself.


> If there is positive feedback I will submit the PEP to the reviewers,
> so if you think it is a good idea please say so.

As one who was submitted many PEPs, I can advise that the quickest way
to irrevocably kill an idea is to submit a PEP to early (we almost
didn't get genexps because the PEP was submitted pre-maturely).
Instead, I advise posting an ASPN recipe so the details can be
hammered-out and a fan club can be grown.

For this particular idea, I am -1 on inclusion in the standard
library.  The PEP focuses mainly on implementation but is weakest in
the rationale section.  Its advantages over "for k in sorted(d)" are
miniscule.  To be considered for the collections module, there needs
to be compelling use case advantages either in terms of usability or
performance.

After grepping through my accumulated code base for "sorted", I see
remarkably few places where sortdict would have be preferred and in
those few cases, the advantage was so slight that the code would not
be worth changing.  The majority of dictionary sorts are characterized
by fully building a dictionary and then sorting it. In those cases,
sortdict merely offers a different spelling and adds a small
performance cost during key insertion.  The sortdict has some
potential in the comparatively rare case of a long lived dictionary
where periodic sorted key requests are interleaved with operations
that mutate the dictionary.  In those cases, the separate tracking of
added/removed keys may save some sorting effort; however, there is an
offsetting cost of many __hash__/__cmp__ calls in the list
comprehension update and the modest savings was in all cases
trivialized by cost of whatever was initiating the dictionary
mutations (user inputs or web updates).

Also, there were cases when the sortdict was at a distinct
disadvantage.  In those cases, there were multiple key functions
(key=record.name or key=record.date or key=record.size).  There
standard approach using sorted() creates three separate sorted lists
and only one dict.  In contrast, the sortdict approach would need
three sortdicts each with their own dict and sorted list.

IMHO, the collections module should be saved for something much more
powerful and generic like a dictionary that generates a callback
whenever it is mutated.  A datatype like that would have it much
easier for you to implement something like this sortdict or the long
sought after dict that remembers its key insertion order.


Raymond




More information about the Python-list mailing list