list.pop(0) vs. collections.dequeue

Sat Jan 23 02:43:28 EST 2010

* Steve Howell:
> On Jan 22, 7:09 pm, Roy Smith <r... at panix.com> wrote:
>> In article
>> <3ac173bd-4124-434d-b726-0b9baaeec... at 36g2000yqu.googlegroups.com>,
>>  Steve Howell <showel... at yahoo.com> wrote:
>>
>>> In my case I'm not really concerned about giving the memory back right
>>> away, it's more about keeping my code simple.  Once I'm done with an
>>> element, I do just want to pop it off and keep all the simplicity of
>>> the lists, but I am just concerned enough speed that for a 1000
>>> element list, I don't want to be doing 1000 memmoves for an average of
>>> 500 pointers, which effectively moves about a million bytes around for
>>> no reason.
>> If you really want to pop all the elements from a long list, reverse the
>> list and pop them off the end.  Popping every element off the beginning of
>> the list is O(n^2), as you pointed out.  Reversing the list is O(n), and
>> each pop after that is O(1), so the overall complexity is O(n).
> 
> I really want to use list *normally* with all its perfectly good
> semantics and reasonable implementation, except for its blind spot
> with respect to popping the first element off the list.  The whole
> reason I use CPython vs. C in the first place is that CPython
> programmers can generally program basic data structures better than I
> can.  But list.pop(0) is the exception.  And, with the possible
> exception of dicts, lists are the most fundamental data structures
> that Python has.

Having optimized list.pop() for first element, then you would have a blind spot 
with respect to insertion at the front... Which could then be optimized for the 
cases where there is some buffer space at the front, left over from a pop. I 
think it would just be harder to understand and harder to explain. And I think 
that due to that, as usual people would build their own elaborate 
"explanations", with erroneous conclusions, and would then use it in inefficient 
ways (like, popping from the middle or inserting at the front).

I think the "harder to use correctly" is the strongest argument against the 
optimization, but there is also a non-obvious *memory overhead*...

Having popped just one element at the front, in order for the implementation to 
not /accumulate/ unused memory, that is, in order to avoid an ongoing /memory 
leak/, extending the buffer to accomodate e.g. an append() can no longer be done 
as a (on relevant systems) constant time reallocation (letting the OS do its 
virtual memory paging magic). Instead, a new buffer would have to be allocated 
and all data copied over. One could still have amortized constant time for 
appends by using a doubling buffer (which is the strategy used in C++ 
'std::vector'), but at the cost of wasting some memory, a percentage...

The implementation as a pure array is easy to understand and use correctly, and 
doesn't waste memory.

But it would IMHO have been better if it wasn't called "list", which brings in 
the wrong associations for someone used to other languages. The name does 
matter. E.g. I don't think this discussion about a pop optimization would have 
been there except for the name, which makes that optimization sound reasonable. 
Perhaps some standard alternative name could be devised. Like, "array" would 
have been nice, except that that's already grabbed by the array module.

> I know Python's number one concern will never be speed, but if Python
> makes an O(1) operation into an unnecessarily O(N) operation for no
> good reasons other than "it's too complicated, " or it "adds another
> pointer to the structure," or "it adds another conditional check to
> list_ass_slice for operations that aren't popping off the top," I
> think it's reasonable to challenge the design philosophy.

See above. In addition to being more difficult /to use/ correctly, that is, 
being much easier to misunderstand, it incurs a memory overhead  --  not the one 
directly from the pop optimization, but by the requirements of buffer extension. 
Essentially, as discussed above, it would then have to use a doubling buffer.

Cheers & hth.,

- Alf