list.pop(0) vs. collections.dequeue

Mon Jan 25 23:31:07 EST 2010

Steve Howell <showell30 at yahoo.com> writes:
> I haven't profiled deque vs. list, but I think you are correct about
> pop() possibly being a red herring....
> For really large lists, I suppose memmove() would eventually start to
> become a bottleneck, but it's brutally fast when it just moves a
> couple kilobytes of data around.

One way to think of Python is as a scripting wrapper around a bunch of C
functions, rather than as a full-fledged programming language.  Viewed
that way, list operations like pop(0) are essentially constant time
unless the list is quite large.  By that I mean you can implement
classic structures like doubly-linked lists using Python tuples, but
even though inserting into the middle of them is theoretically O(1), the
memmove's of the native list operations will be much faster in practice.
Programs dealing with large lists (more than a few thousand elements)
are obviously different and if your program is using such large lists,
you have to plan a little differently when writing the code.

>> I've often run applications with millions of lists
> I bet even in your application, the amount of memory consumed by the
> PyListObjects themselves is greatly dwarfed by other objects, notably
> the list elements themselves

Such lists often would just one element or even be empty.  For example,
you might have a dictionary mapping names to addresses.  Most people
have just one address, but some might have no address, and a few might
have more than one address, so you would have a list of addresses for
each name.  Of course the dictionary slots in that example would also
use space.

> Well, I am not trying to solve problems wastefully here.  CPU cycles
> are also scarce, so it seems wasteful to do an O(N) memmove that could
> be avoided by storing an extra pointer per list. 

Realistically the CPython interpreter is so slow that the memmove is
unnoticable, and Python (at least CPython) just isn't all that
conductive to writing fast code.  It makes up for this in programmer
productivity for the many sorts of problems in which moderate speed
is acceptable.

> Thanks for your patience in responding to me, despite the needlessly
> abrasive tone of my earlier postings.  

I wondered whether you might have come over from the Lisp newsgroups,
which are pretty brutal.  We try to be friendlier here (not that we're
always successful).  Anyway, welcome.

>   1) Summarize all this discussion and my lessons learned in some kind
> of document.  It does not have to be a PEP per se, but I could provide
> a useful service to the community by listing pros/cons/etc.

I suppose that can't hurt, but there are probably other areas (multicore
parallelism is a perennial one) of much higher community interest.

http://wiki.python.org/moin/ is probably a good place to put such
a document.

>   2) I would still advocate for removing the warning against list.pop
> (0) from the tutorial.  I agree with Steven D'Aprano that docs really
> should avoid describing implementation details in many instances

On general principles I agree with Alex Stepanov that the running time
of a function should be part of its interface (nobody wants to use a
stack of popping an element takes quadratic time) and therefore should
be stated in the docs.  Python just has a weird incongruence between the
interpreter layer and the C layer, combined with a library well-evolved
for everyday problem sizes, so the traditional asymptotic approach to
algorithm selection often doesn't give the best practical choice.

I don't feel like looking up what the tutorial says about pop(0), but if
it just warns against it without qualification, it should probably
be updated.