PEP 372 -- Adding an ordered directory to collections

Matimus mccredie at gmail.com
Wed Jun 18 19:20:59 EDT 2008


On Jun 16, 1:37 am, Armin Ronacher <armin.ronac... at active-4.com>
wrote:
> Abstract
> ========
>
> This PEP proposes an ordered dictionary as a new data structure for
> the ``collections`` module, called "odict" in this PEP for short.  The
> proposed API incorporates the experiences gained from working with
> similar implementations that exist in various real-world applications
> and other programming languages.
>
> Rationale
> =========
>
> In current Python versions, the widely used built-in dict type does
> not specify an order for the key/value pairs stored.  This makes it
> hard to use dictionaries as data storage for some specific use cases.
>
> Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
> certain order on iteration.  In those languages, and existing Python
> ordered-dict implementations, the ordering of items is defined by the
> time of insertion of the key.  New keys are appended at the end, keys
> that are overwritten and not moved.
>
> The following example shows the behavior for simple assignments:
>
> >>> d = odict()
> >>> d['parrot'] = 'dead'
> >>> d['penguin'] = 'exploded'
> >>> d.items()
>
> [('parrot', 'dead'), ('penguin', 'exploded')]
>
> That the ordering is preserved makes an odict useful for a couple of
> situations:
>
> - XML/HTML processing libraries currently drop the ordering of
>   attributes, use a list instead of a dict which makes filtering
>   cumbersome, or implement their own ordered dictionary.  This affects
>   ElementTree, html5lib, Genshi and many more libraries.
>
> - There are many ordererd dict implementations in various libraries
>   and applications, most of them subtly incompatible with each other.
>   Furthermore, subclassing dict is a non-trivial task and many
>   implementations don't override all the methods properly which can
>   lead to unexpected results.
>
>   Additionally, many ordered dicts are implemented in an inefficient
>   way, making many operations more complex then they have to be.
>
> - PEP 3115 allows metaclasses to change the mapping object used for
>   the class body.  An ordered dict could be used to create ordered
>   member declarations similar to C structs.  This could be useful, for
>   example, for future ``ctypes`` releases as well as ORMs that define
>   database tables as classes, like the one the Django framework ships.
>   Django currently uses an ugly hack to restore the ordering of
>   members in database models.
>
> - Code ported from other programming languages such as PHP often
>   depends on a ordered dict.  Having an implementation of an
>   ordering-preserving dictionary in the standard library could ease
>   the transition and improve the compatibility of different libraries.
>
> Ordered Dict API
> ================
>
> The ordered dict API would be mostly compatible with dict and existing
> ordered dicts.  (Note: this PEP refers to the Python 2.x dictionary
> API; the transfer to the 3.x API is trivial.)
>
> The constructor and ``update()`` both accept iterables of tuples as
> well as mappings like a dict does.  The ordering however is preserved
> for the first case:
>
> >>> d = odict([('a', 'b'), ('c', 'd')])
> >>> d.update({'foo': 'bar'})
> >>> d
>
> collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])
>
> If ordered dicts are updated from regular dicts, the ordering of new
> keys is of course undefined again unless ``sort()`` is called.
>
> All iteration methods as well as ``keys()``, ``values()`` and
> ``items()`` return the values ordered by the the time the key-value
> pair was inserted:
>
> >>> d['spam'] = 'eggs'
> >>> d.keys()
>
> ['a', 'c', 'foo', 'spam']>>> d.values()
>
> ['b', 'd', 'bar', 'eggs']>>> d.items()
>
> [('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]
>
> New methods not available on dict:
>
>     ``odict.byindex(index)``
>
>         Index-based lookup is supported by ``byindex()`` which returns
>         the key/value pair for an index, that is, the "position" of a
>         key in the ordered dict.  0 is the first key/value pair, -1
>         the last.
>
>         >>> d.byindex(2)
>         ('foo', 'bar')
>
>     ``odict.sort(cmp=None, key=None, reverse=False)``
>
>         Sorts the odict in place by cmp or key.  This works exactly
>         like ``list.sort()``, but the comparison functions are passed
>         a key/value tuple, not only the value.
>
>         >>> d = odict([(42, 1), (1, 4), (23, 7)])
>         >>> d.sort()
>         >>> d
>         collections.odict([(1, 4), (23, 7), (42, 1)])
>
>     ``odict.reverse()``
>
>         Reverses the odict in place.
>
>     ``odict.__reverse__()``
>
>         Supports reverse iteration by key.
>
> Questions and Answers
> =====================
>
> What happens if an existing key is reassigned?
>
>     The key is not moved but assigned a new value in place.  This is
>     consistent with existing implementations and allows subclasses to
>     change the behavior easily::
>
>         class movingcollections.odict):
>             def __setitem__(self, key, value):
>                 self.pop(key, None)
>                 odict.__setitem__(self, key, value)
>
> What happens if keys appear multiple times in the list passed to the
> constructor?
>
>     The same as for regular dicts: The latter item overrides the
>     former.  This has the side-effect that the position of the first
>     key is used because the key is actually overwritten:
>
>     >>> odict([('a', 1), ('b', 2), ('a', 3)])
>     collections.odict([('a', 3), ('b', 2)])
>
>     This behavior is consistent with existing implementations in
>     Python, the PHP array and the hashmap in Ruby 1.9.
>
> Why is there no ``odict.insert()``?
>
>     There are few situations where you really want to insert a key at
>     an specified index.  To avoid API complication, the proposed
>     solution for this situation is creating a list of items,
>     manipulating that and converting it back into an odict:
>
>     >>> d = odict([('a', 42), ('b', 23), ('c', 19)])
>     >>> l = d.items()
>     >>> l.insert(1, ('x', 0))
>     >>> odict(l)
>     collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)])
>
> Example Implementation
> ======================
>
> A poorly performing example implementation of the odict written in
> Python is available:
>
>     `odict.py <http://dev.pocoo.org/hg/sandbox/raw-file/tip/
> odict.py>`_
>
> The version for ``collections`` should be implemented in C and use a
> linked list internally.
>
> Other implementations of ordered dicts in various Python projects or
> standalone libraries, that inspired the API proposed here, are:
>
> - `odict in Babel`_
> - `OrderedDict in Django`_
> - `The odict module`_
> - `ordereddict`_ (a C implementation of the odict module)
> - `StableDict`_
> - `Armin Rigo's OrderedDict`_
>
> .. _odict in Babel:http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
> .. _OrderedDict in Django:
>    http://code.djangoproject.com/browser/django/trunk/django/utils/datas...
> .. _The odict module:http://www.voidspace.org.uk/python/odict.html
> .. _ordereddict:http://www.xs4all.nl/~anthon/Python/ordereddict/
> .. _StableDict:http://pypi.python.org/pypi/StableDict/0.2
> .. _Armin Rigo's OrderedDict:http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py
>
> Future Directions
> =================
>
> With the availability of an ordered dict in the standard library,
> other libraries may take advantage of that.  For example, ElementTree
> could return odicts in the future that retain the attribute ordering
> of the source file.
>
> Copyright
> =========
>
> This document has been placed in the public domain.

In Python 3.0 dict.items(), dict.keys() and dict.values() return set-
like views of the dictionary. http://www.python.org/dev/peps/pep-3106/.
You should consider doing the same thing for odict. Of course, the
views would be list-like, not set-like.

At the very least, I would like to see a discussion about it.

Matt



More information about the Python-list mailing list