list.pop(0) vs. collections.dequeue

Sat Jan 23 10:54:26 EST 2010

On Sat, 23 Jan 2010 09:57:04 -0500, Roy Smith wrote:

> In article <hje979$kc9$1 at news.eternal-september.org>,
>  "Alf P. Steinbach" <alfps at start.no> wrote:
> 
>> But it would IMHO have been better if it wasn't called "list", which
>> brings in the wrong associations for someone used to other languages.
> 
> +1.
> 
> When I first started using Python (back in the 1.4 days), I assumed a
> list was a singly-linked list.

Why would you do that? I can think of at least eight different 
implementations of the abstract list data structure:

constant-size array
variable-size array
variable-size array with amortised O(1) appends
singly-linked list
singly-linked list with CDR coding
doubly-linked list
skip list
indexable skip list

One can reasonably disregard constant-sized arrays as a possibility, 
given that Python lists aren't fixed size (pity the poor Pascal and 
Fortran coders who are stuck with static arrays!), but the rest are all 
reasonable possibilities. Why assume one specific implementation in the 
absence of documentation promising certain performance characteristics?

> Oddly enough, I was going to write in the above paragraph, "like a C++
> STL list", until I happened to glance at the STL docs and refreshed my
> memory that an STL list is doubly-linked.  Which just goes to show that
> making assumptions based on names is a bad idea.

Exactly :)

> So, we're right back to my statement earlier in this thread that the
> docs are deficient in that they describe behavior with no hint about
> cost. Given that, it should be no surprise that users make incorrect
> assumptions about cost.

There are quite a few problems with having the documentation specify cost:

(1) Who is going to do it? Any volunteers?

(2) Big-oh notation can be misleading, especially for naive users, or 
those whose intuition for what's fast has been shaped by other languages. 
Big-oh doesn't tell you whether something is fast or slow, only how it 
scales -- and sometimes not even then.

(3) Having documented a particular performance, that discourages 
implementation changes. Any would-be patch or new implementation not only 
has to consider that the functional behaviour doesn't change, but that 
the performance doesn't either.

In practice the Python developers are unlikely to make an implementation 
change which leads to radically worse performance, particularly for 
critical types like list and dict. But in other cases, they might choose 
to change big-oh behaviour, and not wish to be tied down by documentation 
of the cost of operations.

(4) How much detail is necessary? What about degenerate cases? E.g. dict 
lookup in CPython is typically O(1) amortised, but if all the keys hash 
to the same value, it falls to O(N).

(5) Should the language guarantee such degenerate behaviour? Who decides 
which costs are guaranteed and which are not?

(6) Such performance guarantees should be implementation specific, not 
language specific. CPython is only one implementation of the language out 
of many.

-- 
Steven