PEP 372 -- Adding an ordered directory to collections
Matimus
mccredie at gmail.com
Wed Jun 18 19:20:59 EDT 2008
On Jun 16, 1:37 am, Armin Ronacher <armin.ronac... at active-4.com>
wrote:
> Abstract
> ========
>
> This PEP proposes an ordered dictionary as a new data structure for
> the ``collections`` module, called "odict" in this PEP for short. The
> proposed API incorporates the experiences gained from working with
> similar implementations that exist in various real-world applications
> and other programming languages.
>
> Rationale
> =========
>
> In current Python versions, the widely used built-in dict type does
> not specify an order for the key/value pairs stored. This makes it
> hard to use dictionaries as data storage for some specific use cases.
>
> Some dynamic programming languages like PHP and Ruby 1.9 guarantee a
> certain order on iteration. In those languages, and existing Python
> ordered-dict implementations, the ordering of items is defined by the
> time of insertion of the key. New keys are appended at the end, keys
> that are overwritten and not moved.
>
> The following example shows the behavior for simple assignments:
>
> >>> d = odict()
> >>> d['parrot'] = 'dead'
> >>> d['penguin'] = 'exploded'
> >>> d.items()
>
> [('parrot', 'dead'), ('penguin', 'exploded')]
>
> That the ordering is preserved makes an odict useful for a couple of
> situations:
>
> - XML/HTML processing libraries currently drop the ordering of
> attributes, use a list instead of a dict which makes filtering
> cumbersome, or implement their own ordered dictionary. This affects
> ElementTree, html5lib, Genshi and many more libraries.
>
> - There are many ordererd dict implementations in various libraries
> and applications, most of them subtly incompatible with each other.
> Furthermore, subclassing dict is a non-trivial task and many
> implementations don't override all the methods properly which can
> lead to unexpected results.
>
> Additionally, many ordered dicts are implemented in an inefficient
> way, making many operations more complex then they have to be.
>
> - PEP 3115 allows metaclasses to change the mapping object used for
> the class body. An ordered dict could be used to create ordered
> member declarations similar to C structs. This could be useful, for
> example, for future ``ctypes`` releases as well as ORMs that define
> database tables as classes, like the one the Django framework ships.
> Django currently uses an ugly hack to restore the ordering of
> members in database models.
>
> - Code ported from other programming languages such as PHP often
> depends on a ordered dict. Having an implementation of an
> ordering-preserving dictionary in the standard library could ease
> the transition and improve the compatibility of different libraries.
>
> Ordered Dict API
> ================
>
> The ordered dict API would be mostly compatible with dict and existing
> ordered dicts. (Note: this PEP refers to the Python 2.x dictionary
> API; the transfer to the 3.x API is trivial.)
>
> The constructor and ``update()`` both accept iterables of tuples as
> well as mappings like a dict does. The ordering however is preserved
> for the first case:
>
> >>> d = odict([('a', 'b'), ('c', 'd')])
> >>> d.update({'foo': 'bar'})
> >>> d
>
> collections.odict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])
>
> If ordered dicts are updated from regular dicts, the ordering of new
> keys is of course undefined again unless ``sort()`` is called.
>
> All iteration methods as well as ``keys()``, ``values()`` and
> ``items()`` return the values ordered by the the time the key-value
> pair was inserted:
>
> >>> d['spam'] = 'eggs'
> >>> d.keys()
>
> ['a', 'c', 'foo', 'spam']>>> d.values()
>
> ['b', 'd', 'bar', 'eggs']>>> d.items()
>
> [('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]
>
> New methods not available on dict:
>
> ``odict.byindex(index)``
>
> Index-based lookup is supported by ``byindex()`` which returns
> the key/value pair for an index, that is, the "position" of a
> key in the ordered dict. 0 is the first key/value pair, -1
> the last.
>
> >>> d.byindex(2)
> ('foo', 'bar')
>
> ``odict.sort(cmp=None, key=None, reverse=False)``
>
> Sorts the odict in place by cmp or key. This works exactly
> like ``list.sort()``, but the comparison functions are passed
> a key/value tuple, not only the value.
>
> >>> d = odict([(42, 1), (1, 4), (23, 7)])
> >>> d.sort()
> >>> d
> collections.odict([(1, 4), (23, 7), (42, 1)])
>
> ``odict.reverse()``
>
> Reverses the odict in place.
>
> ``odict.__reverse__()``
>
> Supports reverse iteration by key.
>
> Questions and Answers
> =====================
>
> What happens if an existing key is reassigned?
>
> The key is not moved but assigned a new value in place. This is
> consistent with existing implementations and allows subclasses to
> change the behavior easily::
>
> class movingcollections.odict):
> def __setitem__(self, key, value):
> self.pop(key, None)
> odict.__setitem__(self, key, value)
>
> What happens if keys appear multiple times in the list passed to the
> constructor?
>
> The same as for regular dicts: The latter item overrides the
> former. This has the side-effect that the position of the first
> key is used because the key is actually overwritten:
>
> >>> odict([('a', 1), ('b', 2), ('a', 3)])
> collections.odict([('a', 3), ('b', 2)])
>
> This behavior is consistent with existing implementations in
> Python, the PHP array and the hashmap in Ruby 1.9.
>
> Why is there no ``odict.insert()``?
>
> There are few situations where you really want to insert a key at
> an specified index. To avoid API complication, the proposed
> solution for this situation is creating a list of items,
> manipulating that and converting it back into an odict:
>
> >>> d = odict([('a', 42), ('b', 23), ('c', 19)])
> >>> l = d.items()
> >>> l.insert(1, ('x', 0))
> >>> odict(l)
> collections.odict([('a', 42), ('x', 0), ('b', 23), ('c', 19)])
>
> Example Implementation
> ======================
>
> A poorly performing example implementation of the odict written in
> Python is available:
>
> `odict.py <http://dev.pocoo.org/hg/sandbox/raw-file/tip/
> odict.py>`_
>
> The version for ``collections`` should be implemented in C and use a
> linked list internally.
>
> Other implementations of ordered dicts in various Python projects or
> standalone libraries, that inspired the API proposed here, are:
>
> - `odict in Babel`_
> - `OrderedDict in Django`_
> - `The odict module`_
> - `ordereddict`_ (a C implementation of the odict module)
> - `StableDict`_
> - `Armin Rigo's OrderedDict`_
>
> .. _odict in Babel:http://babel.edgewall.org/browser/trunk/babel/util.py?rev=374#L178
> .. _OrderedDict in Django:
> http://code.djangoproject.com/browser/django/trunk/django/utils/datas...
> .. _The odict module:http://www.voidspace.org.uk/python/odict.html
> .. _ordereddict:http://www.xs4all.nl/~anthon/Python/ordereddict/
> .. _StableDict:http://pypi.python.org/pypi/StableDict/0.2
> .. _Armin Rigo's OrderedDict:http://codespeak.net/svn/user/arigo/hack/pyfuse/OrderedDict.py
>
> Future Directions
> =================
>
> With the availability of an ordered dict in the standard library,
> other libraries may take advantage of that. For example, ElementTree
> could return odicts in the future that retain the attribute ordering
> of the source file.
>
> Copyright
> =========
>
> This document has been placed in the public domain.
In Python 3.0 dict.items(), dict.keys() and dict.values() return set-
like views of the dictionary. http://www.python.org/dev/peps/pep-3106/.
You should consider doing the same thing for odict. Of course, the
views would be list-like, not set-like.
At the very least, I would like to see a discussion about it.
Matt
More information about the Python-list
mailing list