sorteddict PEP proposal [started off as orderedict]

Duncan Booth duncan.booth at invalid.invalid
Wed Sep 26 04:31:15 EDT 2007


Mark Summerfield <m.n.summerfield at googlemail.com> wrote:

> As you've no doubt realised, this particular implementation gives best
> performance when the pattern of use is: lots of edits, lots of
> lookups, ..., and gives worst performance when the pattern of use is:
> edit, lookup, edit, lookup (in which case using a dict and sorted() is
> probably better).
> 
> So there is lots of scope for someone to do a version that has good
> performance for all patterns of use:-)

I that's the point though: you can't write one implementation that has good 
performance for all patterns of use: you have to either compromise on 
performance somewhere, or use an implementation specifically matched to 
your use case.

For example, if the use case was some updates followed by iterating over 
the first few keys only, it may be faster to use a heap for iteration.

I suspect that for many use patterns you could improve performance 
significantly by having two levels of invalidation for the the key list: in 
__setitem__, if the key already exists you don't need to discard the list 
in __keycache (although if a key or cmp function was provided it may no 
longer be sorted). In that case it might be worthwhile keeping __keycache 
but flagging it as needing to be sorted again next time it is used. Given 
that Python's sort is very fast on sorted or nearly sorted lists this could 
provide a worthwhile speedup for cases where the values change but the keys 
don't change often.




More information about the Python-list mailing list