list.pop(0) vs. collections.deque

Steve Howell showell30 at yahoo.com
Mon Jan 25 13:00:03 EST 2010


On Jan 24, 10:07 pm, Steven D'Aprano
<ste... at REMOVE.THIS.cybersource.com.au> wrote:
> On Sun, 24 Jan 2010 20:12:11 -0800, Steve Howell wrote:
> >> > The most ambitious proposal is to fix the memory manager itself to
> >> > allow the release of memory from the start of the chunk.
>
> >> That's inappropriate given the memory fragmentation it would cause.
>
> > Bullshit.  Memory managers consolidate free memory chunks all the time.
> > That's their job.
>
> So let me get this straight...
>
> You've complained that Python's list.pop(0) is lame because it moves
> memory around. And your solution to that is to have the memory manager
> move the memory around instead?
>
> Perhaps I'm missing something, but I don't see the advantage here. At
> best, you consolidate all those moves you wanted to avoid and do them all
> at once instead of a few at a time. At worst, you get a situation where
> the application periodically, and unpredictably, grinds to a halt while
> the memory manager tries to defrag all those lists.
>

You are misunderstanding what I meant, because I did not explain it
very well.  When you release memory from the front of the list, and the
memory immediately before it is also free, the memory manager can
consolidate the two chunks into one free chunk.

There is no rational scenario where the memory manager grinds to a
halt trying to "defrag all those lists."

Of course, once the list gets fully garbage collected, the entire
chunk of memory is freed up.
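
Just to make the idea concrete, here is a rough user-space Python
analogue (a made-up class, not the C-level change under discussion):
pop the front by advancing an offset, and only give the dead space
back once enough of it has accumulated.

# Hypothetical illustration only -- a Python stand-in for "release the
# front lazily", not CPython's actual list implementation.
class LazyFrontList(object):
    def __init__(self, items=()):
        self._items = list(items)
        self._head = 0                     # index of the logical first element

    def pop_front(self):
        item = self._items[self._head]
        self._items[self._head] = None     # drop the reference right away
        self._head += 1
        # Compact only once half the buffer is dead space, roughly
        # analogous to giving the freed front back to the allocator.
        if self._head * 2 >= len(self._items):
            del self._items[:self._head]
            self._head = 0
        return item

    def __len__(self):
        return len(self._items) - self._head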


> >> Your approach of snarling against list is not persuading anyone that
> >> list needs to be changed, because most everyone is satisfied with the
> >> existing solution.
>
> > Please provide evidence of that.  I am pretty sure that everybody who
> > chooses alternatives to Python would disagree.
>
> Do you honestly believe that "everybody" who prefers another language
> over Python does so because they dislike the performance of list.pop(0)?
>

No, I don't believe any statement that makes gross generalizations, so
I also don't believe that "most everyone is satisfied with the existing
solution."

> >> You might change approaches and discuss deque, what's wrong with it,
> >> and whether it can be fixed.  Getting a change approved for deque is
> >> probably much easier than getting one approved for list, just because
> >> nowhere near as many things depend on deque's performance.
>
> > Again...I am not looking to improve deque, which is a perfectly valid
> > data structure for a limited set of problems.
>
> >> And when discussing performance in this context, additive constants do
> >> matter.
>
> > Wrong again.  Operations that mutate lists are already expensive, and a
> > few checks to see if unreleased memory can be reclaimed are totally
> > NEGLIGIBLE.
>
> Popping from the end of the list isn't expensive. Reversing lists is
> relatively cheap. In-place modifications are very cheap.
>

I am talking in relative terms here.  I am saying that checking a
single flag in C code isn't going to significantly slow down any
operation that calls list_resize().  Delete operations already do a
memmove, and insert operations already have to decide whether to
optimistically over-allocate memory when creating the new list
element.

Regarding the extra use of memory, I addressed this in my prior
posting.
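
For a sense of the relative costs at the Python level, here is a small
timing sketch (illustrative only; the numbers depend on the machine and
on the arbitrary n chosen here) comparing pop(0) on a list, which shifts
every remaining element, against popleft() on a deque, which does not:

from timeit import timeit

setup = "from collections import deque; n = 100000"

t_list = timeit("while lst: lst.pop(0)",
                setup=setup + "; lst = list(range(n))", number=1)
t_deque = timeit("while d: d.popleft()",
                 setup=setup + "; d = deque(range(n))", number=1)

print("list.pop(0):     %.3f s" % t_list)
print("deque.popleft(): %.3f s" % t_deque)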

Here is the code for list_resize:

static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
	PyObject **items;
	size_t new_allocated;
	Py_ssize_t allocated = self->allocated;

	/* Bypass realloc() when a previous overallocation is large enough
	   to accommodate the newsize.  If the newsize falls lower than half
	   the allocated size, then proceed with the realloc() to shrink the
	   list.
	*/
	if (allocated >= newsize && newsize >= (allocated >> 1)) {
		assert(self->ob_item != NULL || newsize == 0);
		Py_SIZE(self) = newsize;
		return 0;
	}

	/* This over-allocates proportional to the list size, making room
	 * for additional growth.  The over-allocation is mild, but is
	 * enough to give linear-time amortized behavior over a long
	 * sequence of appends() in the presence of a poorly-performing
	 * system realloc().
	 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
	 */
	new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

	/* check for integer overflow */
	if (new_allocated > PY_SIZE_MAX - newsize) {
		PyErr_NoMemory();
		return -1;
	} else {
		new_allocated += newsize;
	}

	if (newsize == 0)
		new_allocated = 0;
	items = self->ob_item;
	if (new_allocated <= ((~(size_t)0) / sizeof(PyObject *)))
		PyMem_RESIZE(items, PyObject *, new_allocated);
	else
		items = NULL;
	if (items == NULL) {
		PyErr_NoMemory();
		return -1;
	}
	self->ob_item = items;
	Py_SIZE(self) = newsize;
	self->allocated = new_allocated;
	return 0;
}
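
As an aside, the growth pattern in the comment above can be reproduced
with a few lines of Python (a sketch that replays the resize rule, not
CPython itself):

def growth_pattern(appends):
    allocated = 0
    sizes = [0]
    for newsize in range(1, appends + 1):
        if newsize > allocated:   # room exhausted: take the over-allocation path
            allocated = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
            sizes.append(allocated)
    return sizes

print(growth_pattern(80))   # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]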


Here is the code for list_ass_slice:

static int
list_ass_slice(PyListObject *a, Py_ssize_t ilow, Py_ssize_t ihigh,
	       PyObject *v)
{
	/* Because [X]DECREF can recursively invoke list operations on
	   this list, we must postpone all [X]DECREF activity until
	   after the list is back in its canonical shape.  Therefore
	   we must allocate an additional array, 'recycle', into which
	   we temporarily copy the items that are deleted from the
	   list. :-( */
	PyObject *recycle_on_stack[8];
	PyObject **recycle = recycle_on_stack; /* will allocate more if needed */
	PyObject **item;
	PyObject **vitem = NULL;
	PyObject *v_as_SF = NULL; /* PySequence_Fast(v) */
	Py_ssize_t n; /* # of elements in replacement list */
	Py_ssize_t norig; /* # of elements in list getting replaced */
	Py_ssize_t d; /* Change in size */
	Py_ssize_t k;
	size_t s;
	int result = -1;	/* guilty until proved innocent */
#define b ((PyListObject *)v)
	if (v == NULL)
		n = 0;
	else {
		if (a == b) {
			/* Special case "a[i:j] = a" -- copy b first */
			v = list_slice(b, 0, Py_SIZE(b));
			if (v == NULL)
				return result;
			result = list_ass_slice(a, ilow, ihigh, v);
			Py_DECREF(v);
			return result;
		}
		v_as_SF = PySequence_Fast(v, "can only assign an iterable");
		if(v_as_SF == NULL)
			goto Error;
		n = PySequence_Fast_GET_SIZE(v_as_SF);
		vitem = PySequence_Fast_ITEMS(v_as_SF);
	}
	if (ilow < 0)
		ilow = 0;
	else if (ilow > Py_SIZE(a))
		ilow = Py_SIZE(a);

	if (ihigh < ilow)
		ihigh = ilow;
	else if (ihigh > Py_SIZE(a))
		ihigh = Py_SIZE(a);

	norig = ihigh - ilow;
	assert(norig >= 0);
	d = n - norig;
	if (Py_SIZE(a) + d == 0) {
		Py_XDECREF(v_as_SF);
		return list_clear(a);
	}
	item = a->ob_item;
	/* recycle the items that we are about to remove */
	s = norig * sizeof(PyObject *);
	if (s > sizeof(recycle_on_stack)) {
		recycle = (PyObject **)PyMem_MALLOC(s);
		if (recycle == NULL) {
			PyErr_NoMemory();
			goto Error;
		}
	}
	memcpy(recycle, &item[ilow], s);

	if (d < 0) { /* Delete -d items */
		memmove(&item[ihigh+d], &item[ihigh],
			(Py_SIZE(a) - ihigh)*sizeof(PyObject *));
		list_resize(a, Py_SIZE(a) + d);
		item = a->ob_item;
	}
	else if (d > 0) { /* Insert d items */
		k = Py_SIZE(a);
		if (list_resize(a, k+d) < 0)
			goto Error;
		item = a->ob_item;
		memmove(&item[ihigh+d], &item[ihigh],
			(k - ihigh)*sizeof(PyObject *));
	}
	for (k = 0; k < n; k++, ilow++) {
		PyObject *w = vitem[k];
		Py_XINCREF(w);
		item[ilow] = w;
	}
	for (k = norig - 1; k >= 0; --k)
		Py_XDECREF(recycle[k]);
	result = 0;
 Error:
	if (recycle != recycle_on_stack)
		PyMem_FREE(recycle);
	Py_XDECREF(v_as_SF);
	return result;
#undef b
}
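
For reference, these are the kinds of Python-level operations that end
up in list_ass_slice() (and in this version of CPython, pop() from
anywhere but the end goes through it too):

lst = list(range(10))
del lst[0]             # list_ass_slice(a, 0, 1, NULL): memmove shifts the tail left
lst[2:5] = ['x']       # d < 0 branch: three items replaced by one, tail shifts left
lst[1:1] = ['a', 'b']  # d > 0 branch: list_resize() grows, then the tail shifts right
lst.pop(0)             # also lands in the d < 0 delete path
print(lst)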



