sorteddict PEP proposal [started off as orderedict]

Paul Hankin paul.hankin at gmail.com
Wed Sep 26 07:14:57 EDT 2007


On Sep 26, 9:31 am, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> Mark Summerfield <m.n.summerfi... at googlemail.com> wrote:
> > As you've no doubt realised, this particular implementation gives best
> > performance when the pattern of use is: lots of edits, lots of
> > lookups, ..., and gives worst performance when the pattern of use is:
> > edit, lookup, edit, lookup (in which case using a dict and sorted() is
> > probably better).
>
> > So there is lots of scope for someone to do a version that has good
> > performance for all patterns of use:-)
>
> I that's the point though: you can't write one implementation that has good
> performance for all patterns of use: you have to either compromise on
> performance somewhere, or use an implementation specifically matched to
> your use case.
>
> For example, if the use case was some updates followed by iterating over
> the first few keys only, it may be faster to use a heap for iteration.
>
> I suspect that for many use patterns you could improve performance
> significantly by having two levels of invalidation for the the key list: in
> __setitem__, if the key already exists you don't need to discard the list
> in __keycache (although if a key or cmp function was provided it may no
> longer be sorted). In that case it might be worthwhile keeping __keycache
> but flagging it as needing to be sorted again next time it is used. Given
> that Python's sort is very fast on sorted or nearly sorted lists this could
> provide a worthwhile speedup for cases where the values change but the keys
> don't change often.

More flexibly, keep a set of inserted keys that haven't yet been
included in the sorted list, and a set of deleted keys that haven't
yet been removed from the sorted list. The cache is invalid if either
of these sets are empty - and to make it valid you can choose what to
do based on the sizes of the two sets (and the sorted list). For
instance, if there's just been one insertion you're probably better
doing an insertion rather than a full resort. Of course, there's a few
nasty cases here but it's always possible to just throw away the
sorted list and reconstruct it from the dict keys when things get too
hairy (eg, the user deletes a key that's in the inserted-but-not-yet-
sorted set).

--
Paul Hankin




More information about the Python-list mailing list