sorteddict PEP proposal [started off as orderedict]

Duncan Booth duncan.booth at invalid.invalid
Tue Sep 25 06:51:42 EDT 2007


Mark Summerfield <m.n.summerfield at googlemail.com> wrote:

>     When handling collections of key-value data, it is often
>     convenient to access the data in key order.  One way to do this is
>     to use an unsorted data structure (e.g., a Python dict), and then
>     sort the keys as needed. Another way is to use a sorted data
>     structure (e.g., a sorteddict as proposed in this PEP), and simply
>     access the keys, knowing that they are always in sorted order.

Please define 'sorted order' and make sure it is customisable.

>     keys(firstindex : int = None, secondindex : int = None) ->
>  list of keys

I don't really see the point of this, but general support for slicing
might be interesting: 

   sd[start:end] -> list of values for all keys such that start <= key <
   end (using the appropriate definition of 'sorted order'). 


>     key(index : int) -> value
> 
>     item(index : int) -> (key, value)
> 
>     value(index : int) -> key

I'm confused: the key method returns a value and the value method
returns a key??? 

> 
>     set_value(index : int, value)
> 
>     delete(index : int)

All of those can of course be replaced with a single method that returns
the key at a specified index and then using that key. Yes that would be
less efficient, but I'd rather have as close to a standard dictionary
interface as possible. 

> Examples
> 
>     To keep a collection of filenames on a case-insensitive file
>     system in sorted order, we could use code like this:
> 
>         files = collections.sorteddict.sorteddict()
>         for name in os.listdir("."):
>             files[name.lower()] = name
> 
>     We can add new filenames at any time, safe in the knowledge that
>     whenever we call files.values(), we will get the files in
>     case-insensitive alphabetical order.

I don't see this as a terribly useful example since you don't explain 
why we need to keep filenames in case-insensitive alphabetical order at 
all. If we want to print a list of filenames we can sort them once when 
printing them, why do we need to keep them 'always sorted'?

>     To be able to iterate over a collection of data items in various
>     predetermined orders, we could maintain two or more sorteddicts:
> 
>         itemsByDate = collections.sorteddict.sorteddict()
>         itemsByName = collections.sorteddict.sorteddict()
>         itemsBySize = collections.sorteddict.sorteddict()
>         for item in listOfItems:
>             itemsByDate["%s\t%17X" % (item.date, id(item))] = item
>             itemsByName["%s\t%17X" % (item.name, id(item))] = item
>             itemsBySize["%s\t%17X" % (item.size, id(item))] = item

I think perl might make you use strings as keys, fortunately Python 
doesn't. What I really don't like about this is that you've copied the 
date, so if you mutate item.date the itemsByDate are no longer sorted.

Again it sounds like sorting for printing will be less overhead that 
keeping them always sorted. You need to come up with examples where 
there is an actual benefit to not sorting for output.

>     Here we have used the item IDs to ensure key uniqueness.

If you are going to do this then it really does need to be:

   itemsByDate = sorteddict(items, key=lambda self, k: self[k].date)

so that you can maintain sorted order when mutating an item. How the 
dictionary knows that its contents have been mutated is another question 
entirely.

> 
>     Now we can iterate in date or name order, for example:
> 
>         for item in itemsByDate:
>             print item.name, item.date

Which we can do just as well with an ordinary dictionary:

   for item in sorted(items, key=byDate):
       ...

given an appropriate definition of byDate.

The cases I can think of where a sorteddict might be useful would be 
things like collecting web server statistics and maintaining a regularly 
updated display of the top n pages. You need a combination of factors 
for this to be useful: frequent updates, and frequent extraction of 
sorted elements + the requirement to access elements as a dictionary.

Also, none of your use cases target any of the additional fripperies you 
threw into your proposal.



More information about the Python-list mailing list