[Python-Dev] {}.popitem() (was Re: {}.first[key,value,item] ...)

Guido van Rossum guido@python.org
Fri, 08 Dec 2000 13:43:39 -0500


After the last round of discussion, I was left with the idea that the
best thing we could do to help destructive iteration is to introduce a
{}.popitem() that returns an arbitrary (key, value) pair and deletes
it.  I wrote about this:

> > One more concern: if you repeatedly remove the *first* item, the hash
> > table will start looking lobsided.  Since we don't resize the hash
> > table on deletes, maybe picking an item at random (but not using an
> > expensive random generator!) would be better.

and Tim replied:

> Which is the reason SETL doesn't specify *which* set item is removed:  if
> you always start looking at "the front" of a dict that's being consumed, the
> dict fills with turds without shrinking, you skip over them again and again,
> and consuming the entire dict is still quadratic time.
> 
> Unfortunately, while using a random start point is almost always quicker
> than that, the expected time for consuming the whole dict remains quadratic.
> 
> The clearest way around that is to save a per-dict search finger, recording
> where the last search left off.  Start from its current value.  Failure if
> it wraps around.  This is linear time in non-pathological cases (a
> pathological case is one in which it isn't linear time <wink>).
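[Editorial note: Tim's point can be illustrated with a toy simulation -- this is
hypothetical illustration code, not the actual CPython implementation. Consuming
a fully occupied table of n slots by always scanning from the front costs
n*(n+1)/2 probes, because the holes left behind are re-skipped on every pop;
resuming from a search finger costs only n probes total.]

```python
def consume(table, use_finger):
    """Empty a toy hash table, counting slot probes.

    Each pop scans forward (wrapping around) for a non-empty slot,
    starting either from slot 0 or from where the last pop left off.
    """
    probes = 0
    finger = 0
    n = len(table)
    remaining = sum(1 for slot in table if slot is not None)
    while remaining:
        i = finger if use_finger else 0
        while table[i] is None:
            probes += 1          # skipping over a hole
            i = (i + 1) % n
        probes += 1              # the probe that found a live entry
        table[i] = None          # "pop" it, leaving a hole behind
        finger = (i + 1) % n
        remaining -= 1
    return probes

n = 64
front_probes = consume([object()] * n, use_finger=False)
finger_probes = consume([object()] * n, use_finger=True)
# front_probes == n*(n+1)//2 (quadratic); finger_probes == n (linear)
```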

I've implemented this, except I use a static variable for the finger
instead of a per-dict finger.  I'm concerned about adding 4-8 extra
bytes to each dict object for a feature that most dictionaries never
need.  So, instead, I use a single shared finger.  This works just as
well as long as only a single dictionary is being consumed.  For
multiple dictionaries (either used by the same thread or in different
threads), it'll work almost as well, although it's possible to make up
a pathological example that would behave quadratically.

An easy pathological example is to call popitem() for two identical
dictionaries in lock step.
(With a single shared finger, each dictionary's pop advances the finger past
slots that are still occupied in the *other* dictionary, so both keep
re-scanning entries the finger has skipped.)

Comments please!  We could:

- Live with the pathological cases.

- Forget the whole thing; and then also forget about firstkey()
  etc., which have the same problem, only worse.

- Fix the algorithm. Maybe jumping criss-cross through the hash table
  like lookdict does would improve that; but I don't understand the
  math used for that ("Cycle through GF(2^n)-{0}" ???).
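[Editorial note: the "Cycle through GF(2^n)-{0}" comment refers to the fact
that repeatedly multiplying by x modulo a primitive polynomial over GF(2)
visits every nonzero n-bit value exactly once before repeating.  A toy
illustration for n=4 with the primitive polynomial x^4 + x + 1 (0b10011),
not the actual lookdict code:]

```python
def gf2n_cycle(n=4, poly=0b10011, start=1):
    """Multiply by x mod a primitive polynomial over GF(2),
    collecting the values visited until the sequence repeats."""
    seen = []
    x = start
    while True:
        seen.append(x)
        x <<= 1                  # multiply by x
        if x & (1 << n):
            x ^= poly            # reduce modulo the polynomial
        if x == start:
            break
    return seen

cycle = gf2n_cycle()
# cycle visits all 15 nonzero 4-bit values exactly once
```

Because the polynomial is primitive, x generates the full multiplicative
group of GF(2^4), which is why such a probe sequence hits every table slot.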

I've placed a patch on SourceForge:

http://sourceforge.net/patch/?func=detailpatch&patch_id=102733&group_id=5470

The algorithm is:

static PyObject *
dict_popitem(dictobject *mp, PyObject *args)
{
	static int finger = 0;
	int i;
	dictentry *ep;
	PyObject *res;

	if (!PyArg_NoArgs(args))
		return NULL;
	if (mp->ma_used == 0) {
		PyErr_SetString(PyExc_KeyError,
				"popitem(): dictionary is empty");
		return NULL;
	}
	i = finger;
	if (i >= mp->ma_size)
		i = 0;
	while ((ep = &mp->ma_table[i])->me_value == NULL) {
		i++;
		if (i >= mp->ma_size)
			i = 0;
	}
	finger = i+1;
	res = PyTuple_New(2);
	if (res != NULL) {
		PyTuple_SET_ITEM(res, 0, ep->me_key);
		PyTuple_SET_ITEM(res, 1, ep->me_value);
		Py_INCREF(dummy);
		ep->me_key = dummy;
		ep->me_value = NULL;
		mp->ma_used--;
	}
	return res;
}
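[Editorial note: for reference, the intended Python-level usage is destructive
iteration, along these lines (a sketch, assuming the patch is applied; in
modern Python, dict.popitem() exists and behaves this way):]

```python
d = {"a": 1, "b": 2, "c": 3}
seen = {}
while d:
    key, value = d.popitem()   # removes and returns an arbitrary pair
    seen[key] = value
# d is now empty; every pair was visited exactly once
```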

--Guido van Rossum (home page: http://www.python.org/~guido/)