collections.chain

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Sat Oct 25 14:56:24 EDT 2008


Several languages like Java, C# etc have a List type in the std lib.
Python has a built-in list(), it's implemented as array dynamic on the
right.

Not too much time ago Hettinger has added a collections.deque (its C
code is really nice), that compared to list() allows a faster append
on the right and a much faster prepend on the left. It's implemented
as a double linked list of fixed-sized blocks. All blocks but the
first and last are fully filled, so such blocks don't need to store
their length, and the data structure just needs to store the length of
the first and last block.

In the C++ STL I think the deque can be implemented as a dynamic array
of pointers, that point to the start of each fixed-size block. This
allows a faster access of items, because you need two lookups and you
don't need to follow the linked list (plus maybe a module operation).
I don't know why the collections.deque uses a double linked list,
maybe because it allows a simpler design (in the dynamic arrays of
pointers you have to manage them as a circular array, so you need the
module or an if).

A double-ended queue covers lot of usages of a linked list, but not
all of them. So if enough Python programmers feel the need of a
(light) data structure that allows O(1) removal and add of items in
any point, then such data structure can be created. The name can be
"chain", because it's easy, short, it means the right thing, and
"list" is already taken.

Its implementation can be a normal double linked list. But on modern
CPUs they can be not much efficient, so there are few strategies to
improve that:
http://en.wikipedia.org/wiki/CDR_coding
http://en.wikipedia.org/wiki/Unrolled_linked_list
I think an unrolled double linked list is a fit implementation for
this purpose. This data structure is quite similar to the
collections.deque, but each block has to keep the number of items it
contains (note: if experiments show that such overhead in memory and
speed is little enough, then most of the C code of the deque may even
be thrown away, using the chain to implement it).

Are enough Python programmers interested in such chain data structure?
Can the typical Python programs enjoy the use of it? I presume it's
not very important, but sometimes I have found a use for it.

If enough people are interested in this data structure, then I think
there's a problem to be solved. How to manage references into the
chain itself. You need references, otherwise many operations become
O(n), and this makes the chain quite less useful.

A simple solution is to create another independent object like
chainptr, that's essentially a pointer to an item of the chain, it can
also become nil/Nil/Null/null, I presume... If you have two chainptr
that point the n-th item, and you remove the n-th item, then the
second chainptr now doesn't point to the n+1-th item (this is true for
python list, where pointers are integer numbers), it's a broken
pointer. Etc. Such low-level problems are probably out of place in
Python.

A way to avoid those problems is to make the pointers part of the
chain object itself, so they are kept consistent. I presume there must
be some way to add/remove such indexes dynamically, but I presume in
Python this is not too much a problem, but such kind of management
slows down the data structure a little. Every operation has to control
the state of all defined indexes, but I presume their number is usally
low (one, two) so it may not be a problem.

I don't have ideas for a good API yet, but if the pointers are part of
the chain, the syntax may becomes something like:

from collections import chain
d = chain("ABCD")
d.addindex() # creates p0 that points A
d.p0.next()
d.p0.next() # now p0 point C
d.addindex(d.p0) # now p1 point C
d.p0.delete() # now p0 and p1 point D (or nothing?)

Well, that's ugly. But first of all it's important to see if a chain i
useful, if the answer is positive, then I can look for a decent API.

Bye,
bearophile



More information about the Python-list mailing list