sorteddict PEP proposal [started off as orderedict]

Tue Sep 25 14:58:36 EDT 2007

Paul Hankin wrote:
> On Sep 25, 12:51 pm, Mark Summerfield <m.n.summerfi... at googlemail.com>
> wrote:
>> On 25 Sep, 12:19, Paul Hankin <paul.han... at gmail.com> wrote:
>>
>>
>>
>>> Recall sorted...
>>>     sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted
>>> list
>>> So why not construct sorteddicts using the same idea of sortedness?
>>> sorteddict((mapping | sequence | nothing), cmp=None, key=None,
>>> reverse=None)
>>> Then we can specify the exact behaviour of sorteddicts: it's the same
>>> as a regular dict, except when the order of keys is important they're
>>> ordered as if they were sorted by sorted(keys, cmp=sd._cmp,
>>> key=sd._key, reverse=sd._reverse). Here I imagine cmp, key and reverse
>>> are stored in the new sorteddict as attributes _cmp, _key, _reverse -
>>> obviously the actual implementation may differ.
>>> This has the benefit of making sorteddict's behaviour explicit and
>>> easy to understand, and doesn't introduce a new sorting API when we
>>> already have a perfectly decent one.
>>> The only problem here is the **kwargs form of the dict constructor
>>> doesn't translate as we're using keyword args to pass in the sort
>>> criterion. Perhaps someone has an idea of how this can be solved. If
>>> nothing else, this constructor could be dropped for sorteddicts, and
>>> sorteddict(dict(**kwargs), cmp=..., key=..., reverse=...) when that
>>> behaviour is wanted.
>>> I don't like the integral indexing idea or the slicing: indexing and
>>> slicing are already part of python and it's bad to have different
>>> syntax for the same concept on sorteddicts. I'd say it's not an
>>> important enough for sorteddicts anyway.
>>> --
>>> Paul Hankin
>> This makes a lot of sense to me. But how do you envisage it would be
>> implemented?
> 
> Here's a first go. Sorting occurs when the keys are iterated over,
> making it fast (almost as a dict) for construction, insertion, and
> deletion, but slow if you're iterating a lot. You should look at some
> use cases to decide if this approach is best, or if a sorted
> datastructure should be used instead, but my instinct is that this is
> a decent approach. Certainly, you're unlikely to get a simpler
> implementation :)
> 
> class sorteddict(dict):
>     "A sorted dictionary"
>     def __init__(self, arg=None, cmp=None, key=None, reverse=False):
>         if arg:
>             super(sorteddict, self).__init__(arg)
>         else:
>             super(sorteddict, self).__init__()
>         self._cmp = cmp
>         self._key = key
>         self._reverse = reverse
>     def keys(self):
>         return sorted(super(sorteddict, self).keys(), cmp=self._cmp,
>             key=self._key, reverse=self._reverse)
>     def iter_keys(self):
>         return (s for s in self.keys())
>     def items(self):
>         return [(key, self[key]) for key in self.keys()]
>     def iter_items(self):
>         return ((key, self[key]) for key in self.keys())
>     def values(self):
>         return [self[key] for key in self.keys()]
>     def iter_values(self):
>         return (self[key] for key in self.keys())
>     def __str__(self):
>         return '{' + ', '.join('%s: %s' % (repr(k), repr(v))
>             for k, v in self.iter_items()) + '}'
>     def __repr__(self):
>         return str(self)
>     def __iter__(self):
>         return self.iter_keys()

With this is the implementation, I'm definitely -1. Not because it's a 
bad implementation, but because if the iteration is always doing a sort, 
then there's no reason for a separate data structure. Just use sorted() 
directly on a regular dict. That has the advantage of being much more 
obvious about where the expensive operations are::

     for key in sorted(d, ...):
         ... whatever you want to do ...

IMHO, the only reason to introduce a sorteddict() is if it minimizes the 
cost of sorting the keys.

STeVe