[Python-Dev] collections module

Kurt B. Kaiser kbk at shore.net
Sat Jan 10 02:58:56 EST 2004


"Tim Peters" <tim.one at comcast.net> writes:

> [Guido]
>> You misunderstood -- the *fields* I was talking about are
>> counters or pointers that keep track of the overallocation, and 1 or
>> at most 2 is enough.
>
> One would be enough.  To avoid breaking huge mounds of code all over the
> place, I think the existing ob_item should "float", to point always to the
> current element "at index 0".  So ob_item would change when inserting or
> deleting "at the left end".  The only new thing you need then is a field
> recording the number of unused slots "left-side" slots, between the start of
> the memory actually allocated and the slot ob_item points at; that field
> would have to be added or subtracted to/from ob_item in the obvious ways
> when allocating or deallocating the raw memory, but *most* of listobject.c
> could ignore it, and none of the PyList macros would need to change (in
> particular, the speed of indexed access wouldn't change).
>
> The overallocation strategy gets more complicated, though, and less
> effective for prepending.  If you run out of room when prepending, it's not
> enough just to ask realloc() for more memory, you also have to memmove() the
> entire vector, to "make room at the left end".  In previous lives I've found
> that "appending at the right" is as efficient as it is in practice largely
> because most non-trivial C realloc() calls end up extending the vector
> in-place, not needing to copy anything.  The O() behavior is the same if
> each boost in size has to move everything that already existed, but the
> constant factor goes up more than just measurably <wink>.

Aren't array-based implementations of queue usually based on circular buffers?

Head and tail pointers plus a count are sufficient.  Then a push is:
check count (resize if necessary), write, adjust head pointer, adjust
count.  A pull is similar, using the tail pointer.  Constant time.
The resize is a little tricky because the gap between the tail and
head pointers is what needs to grow.

The current implementation of Queue (with what appears to me to be a
listpop(v, 0) for get() ) involves an internal shift of the whole
pointer vector, O(n), for every read of the queue.  The Cookbook
examples mentioned alleviate this but still with excess copying or
unbounded memory usage.  IMHO it's better done in listobject.

I'm not up to speed on the reasons why ob_item needs to float but
perhaps there is a straightforward indirect relationship to the head
pointer.

Why not have listobject grow deque functionality using circular buffer
techniques?  That way you get high performance queues and deques while
avoiding the name collision with Queue.

put(x) == insert(0, x) 
get() == pop()
put_tail(x) == append(x) == push(x) (?? this must have been discussed))
get_head() == pop(0)

All list operations would preserve the head and tail pointers.

-- 
KBK



More information about the Python-Dev mailing list