sorteddict PEP proposal [started off as orderedict]

Mark Summerfield m.n.summerfield at googlemail.com
Tue Sep 25 04:53:22 EDT 2007


Hi,

Below is a PEP proposal for a sorteddict. It arises out of a
discussion on this list that began a few weeks ago with the subject of
"An ordered dictionary for the Python library?", and a similar one on
the P3K mailing list with the subject "ordered dict for p3k
collections?".

If there is positive feedback I will submit the PEP to the reviewers,
so if you think it is a good idea please say so. (I'm sure that if you
_don't_ like it you'll tell me anyway:-)

PEP: XXX
Title: Sorted Dictionary
Version: $Revision$
Last-Modified: $Date$
Author: Mark Summerfield
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 25-Sep-2007
Python-Version: 2.6
Post-History:


Abstract

    This PEP proposes the addition of a sorted dictionary class to the
    standard library's collections module.


Rationale

    When handling collections of key-value data, it is often
convenient to
    access the data in key order.  One way to do this is to use an
unsorted
    data structure (e.g., a Python dict), and then sort the keys as
needed.
    Another way is to use a sorted data structure (e.g., a sorteddict
as
    proposed in this PEP), and simply access the keys, knowing that
they
    are always in sorted order.

    Both approaches make sense in the right circumstances, but Python
    currently only supports the first approach out of the box.  A
    sorteddict never needs sorting and is ideal for creating indexes
to
    data where the keys are based on some property of the data.
Adding a
    sorteddict to the collections module would not add significantly
to the
    size of Python's standard library, but would provide a very useful
data
    structure.


Specification

    The proposed sorteddict has the same API to the builtin dict, so
can be
    used as a drop-in replacement.  The only behavioural difference
being
    that any list returned by the sorteddict (whether of keys, values,
or
    items) will be in key order, and similarly any iterator returned
will
    iterate in key order.

    In addition, the keys() method has two optional arguments:

    keys(firstindex : int = None, secondindex : int = None) -> list of
keys

    The parameter names aren't nice, but using say "start" and "end"
would
    be misleading since the intention is for the parameters to work
like
    they do in range(), e.g.

        sd.keys()       # returns a list of all the sorteddict's keys
        sd.keys(10)     # returns a list of the first 10 keys
        sd.keys(19, 35) # returns a list of the 19th-34th keys
inclusive

    If an out of range index is given, an IndexError exception is
raised.

    Since the sorteddict's data is always kept in key order, indexes
    (integer offsets) into the sorteddict make sense.  Five additional
    methods are proposed to take advantage of this:

    key(index : int) -> value

    item(index : int) -> (key, value)

    value(index : int) -> key

    set_value(index : int, value)

    delete(index : int)

    Items and values can still be accessed using the key, e.g.,
sd[key],
    since all the dict's methods are available.


Examples

    To keep a collection of filenames on a case-insensitive file
system in
    sorted order, we could use code like this:

        files = collections.sorteddict.sorteddict()
        for name in os.listdir("."):
            files[name.lower()] = name

    We can add new filenames at any time, safe in the knowledge that
    whenever we call files.values(), we will get the files in
    case-insensitive alphabetical order.

    To be able to iterate over a collection of data items in various
    predetermined orders, we could maintain two or more sorteddicts:

        itemsByDate = collections.sorteddict.sorteddict()
        itemsByName = collections.sorteddict.sorteddict()
        itemsBySize = collections.sorteddict.sorteddict()
        for item in listOfItems:
            itemsByDate["%s\t%17X" % (item.date, id(item))] = item
            itemsByName["%s\t%17X" % (item.name, id(item))] = item
            itemsBySize["%s\t%17X" % (item.size, id(item))] = item

    Here we have used the item IDs to ensure key uniqueness.

    Now we can iterate in date or name order, for example:

        for item in itemsByDate:
            print item.name, item.date


Implementation

    A pure Python implementation is available at:

    http://pypi.python.org/pypi/sorteddict


Copyright

    This document has been placed in the public domain.




More information about the Python-list mailing list