An ordered dictionary for the Python library?

Mark Summerfield m.n.summerfield at googlemail.com
Wed Sep 12 13:07:13 EDT 2007


On 12 Sep, 15:04, Michele Simionato <michele.simion... at gmail.com>
wrote:
> On Sep 12, 3:54 pm, Mark Summerfield <m.n.summerfi... at googlemail.com>
> wrote:
>
>
>
> > On 12 Sep, 13:46, Michele Simionato <michele.simion... at gmail.com>
>
> > Actually I meant by key order, so insertion order doesn't matter at
> > all. If you need a dictionary-like data structure that respects
> > insertion order you could use a list of (key, value) tuples.
>
> > Another respondent asked about use cases.
>
> > I have found time and again that I needed (key, value) pairs where the
> > key is some string that provides a representation for human readers
> > and the value is some internal key (e.g., database ID) that the system
> > uses internally. In these cases I often need to present the user with
> > a list of items from which to choose and want to present them in
> > sorted order. Naturally, I could use an ordinary dict and then do
> > this:
>
> > for string in sorted(d.keys()):
> >     process(string)
>
> > But what happens when I need to do this a *lot* and when the number of
> > items is hundreds or a few thousands? I'm having to sort again and
> > again, since it is often the case that the items in the list changes
> > during the runtime of the application. So my solution in C++ is to use
> > an ordered dictionary (a map in C++ jargon), which in Python means I
> > can simply write:
>
> > for string in od.keys():
> >     process(string)
>
> For your use case I would wrap a list [(key, value)] with a dict-like
> object and I would use the bisect module in the standard library to
> keep
> the inner list ordered.
>
>  M.S.

Actually, in my own ordereddict I wrap a dict and keep the keys in
order using the bisect module.
This works fine, but I imagine that a C-based implementation based on
B*trees or skiplists would be a lot faster.

In response to some of the following emails, I note one responder says
such a class is trivial to implement. Yes, and so presumably is
defaultdict, but the latter is in the standard library. One of the
"problems" of Python is that the range of skills of its users varies
from casual "amateur" programmers to guru professionals and everywhere
inbetween. Sure at somewhere on that continuum implementing *anything*
is "trivial", but for some users it is worthwhile to have it out of
the box.

What I like about ordered dictionaries is the ability to have sorted
data without sorting. It lets me have multiple sort orders. For
example, in some programs I give the user the choice of how they want
their data ordered and can provide such orderings without sorting,
simply by maintaining a set of parallel data structures with keys
being ordered and values being pointers (object references in Python)
to the relevant values. This costs in memory (but not much since no
values are duplicated and the values are usually large the keys
usually small).

Psuedocode example:

class Person:
    def __init__(self, name, payrollno, grade, dept):
        self.name = name
        # etc

For each person created:
    personFromName["%s\t%012d" % (person.name.lower(),
person.payrollno)] = person
    personFromPayrollNo[person.payrollno] = person
    personFromGrade["%s\t%012d" % (person.grade, person.payrollno)] =
person
    personFromDept["%s\t%012d" % (person.dept, person.payrollno)] =
person

So now I have four orderings with no sorting. (I have to disambiguate
to avoid duplicates and to provide subordering.)

If I have 10,000 people I would potentially have to resort 10,000
instances whenever the user chose a new sort order: this way I just
have to iterate over the right ordered dictionary.

So I do feel that there are good use cases, and I think that for some
Python users having this in the library would be a genuine benefit.

Most of the use cases I envisage involve lots of insertions at start
up, relatively few during runtime, and lots of iterations over the
data as the user changes their view of it.

Of course Python does have an ordered dictionary, it is just not
necessarily always installed. You can use the bsdbd3 module's btopen()
method and pass None as filename to get an in-memory Btree, but I
believe (not sure) that there may be length limits on the keys.




More information about the Python-list mailing list