[Python-Dev] deque alternative (was: Linked lists)

Josiah Carlson jcarlson at uci.edu
Mon Dec 26 20:04:15 CET 2005


> [restoring context and attributions lost in the original]
> 
> [Tim Peters]
> >>>> Like Phillip Eby, I use 2-tuples for this when I feel the need
> >>>> (usually during a backtracking graph search, to keep track of paths
> >>>> back to the root in a space-efficient way), and am happy with that.
> 
> [Josiah Carlson]
> >>> Then there's the whole Python list with append() and pop().  It
> >>> takes a method resolution and function call, but at least in
> >>> Python 2.3, it is a hair faster...
> 
> [Martin Blais]
> >> Josiah, that's beside the point, because it is not space-efficient,
> >> the list is actually a dynamic array/vector, and I would expect an
> >> efficient array implementation to grow exponentially.  One of the
> >> reasons we're talking about lists is to avoid exactly this problem...
> 
> [James Y Knight]
> > Okay, now *that* reason is a pretty crazy one. Consider what you're
> > saying here.

[Tim Peters]
> I'm sure he did ;-)  Consider a very simple graph, a skinny tree
> rooted at `A` that branches once "at the end", represented by a dict
> of adjacency lists:

Are you sure?

> A "flat" list or tuple would certainly be more space-efficient up to
> this point.  But when the graph branches, the 2-tuple representation
> allows *sharing* the common path suffix (which may be very long!):
...
> If there's an N-way branch, using 2-tuples allows recording the N new
> paths back to the root with incremental memory burden N *
> some_constant.  You can't do this kind of sharing with a flat list or
> tuple, and the incremental memory burden at an N-way branch zooms to
> (N + some_other_constant) * (the amount of memory needed to record the
> path from the branch point back to the root), due to duplicating the
> back-to-the-root info N times.   The potential memory saving from
> using 2-tuples instead is unbounded.
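
To make the sharing concrete (a toy example of my own, not from Tim's
message; the names are made up):

    # Paths as nested 2-tuples ("cons cells"), with None terminating at the
    # root.  Extending a path never copies it, so an N-way branch costs only
    # N small tuples -- the long common suffix is a single shared object.

    def extend(node, path):
        return (node, path)            # O(1) time and space

    common = None
    for node in 'ABCD':                # the long skinny chain back to the root
        common = extend(node, common)

    # branch N ways: each new path is one extra 2-tuple over the shared suffix
    branches = [extend(leaf, common) for leaf in 'XYZ']

    assert all(p[1] is common for p in branches)   # shared, not copied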

But one doesn't _need_ to use flat lists.  If one combines cons cells
with Python lists, you can get the space efficiency of lists together
with the reusability of the desired cons cells (or tuples, as the case
may be).  How?  With (i, l) pairs, where i marks the prefix of the
Python list l that is the desired portion of whatever l represents.
You toss one of those anywhere in your 'flat list', and you have your
stored/saved path for a couple dozen bytes.  If you are not careful,
you can end up storing/saving paths that would be stored more
efficiently by copying their contents, but at that point we are talking
about a constant-factor blowup, not the (potentially) exponential
blowup of storing all paths as copies, and we can always copy to be
more efficient if necessary.
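
Roughly like this (a quick sketch with hypothetical names; the caveat is
that the shared list must not later be truncated below a saved prefix):

    # One flat Python list holds the long common path; a saved path is just
    # (i, l): a prefix length plus a reference to the shared list.

    path = list('ABCDEFGH')        # grown in place with append()

    saved_a = (3, path)            # remembers the prefix A, B, C
    saved_b = (6, path)            # remembers the prefix A..F

    path.append('I')               # keep extending the shared list

    def materialize(saved):
        i, l = saved
        return l[:i]               # copy only if/when the path is needed

    assert materialize(saved_a) == ['A', 'B', 'C']
    assert saved_a[1] is saved_b[1]    # both share the same underlying list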

I have personally used this trick to construct a union-find data
structure in which all unions are reversible regardless of find
operations. [1]
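
(Not the exact structure I built -- just to illustrate reversible unions
in general.  This sketch does union by rank with an explicit undo log and
omits path compression, so each find is O(log n) rather than the
alpha(n,n) bound mentioned in [1].)

    class ReversibleUnionFind:
        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n
            self.history = []              # log of (child_root, new_root, old_rank)

        def find(self, x):
            while self.parent[x] != x:     # no path compression: find mutates nothing
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                self.history.append(None)  # record a no-op so undo stays in sync
                return
            if self.rank[ra] > self.rank[rb]:
                ra, rb = rb, ra            # attach the shallower tree under the deeper
            self.history.append((ra, rb, self.rank[rb]))
            self.parent[ra] = rb
            if self.rank[ra] == self.rank[rb]:
                self.rank[rb] += 1

        def undo(self):
            entry = self.history.pop()     # roll back the most recent union
            if entry is not None:
                child, root, old_rank = entry
                self.parent[child] = child
                self.rank[root] = old_rank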


> > So, the list will generally be 1/8th of its size overallocated, or
> > 112 elements plus 9 words overhead is 121 words. Better than any cons-
> > linked list could be, and *insanely better* than a cons-linked list
> > would be in python.
> 
> You seem to be ignoring possibilities for sharing across lists, and
> such sharing is natural in many graph algorithms.

Not necessarily so, as I have pointed out above.  You said it yourself:
practicality beats purity.  While using cons cells is pure, just as
using only flat lists is pure, flat lists are in practice smaller than
cons cells in certain cases (by a factor of ~4), and non-flat lists can
be smaller than cons cells in the rest of the cases.

 - Josiah

[1] If you remember, the amortized running time of a sequence of n union
and find operations on a union-find data structure is O(n*alpha(n,n)),
where alpha(n,n) is never larger than 5 in practice.  The space overhead
of using non-flat lists as shared paths provides sufficient information
to offer reversible union operations in amortized O(n*alpha(n,n)) space.


