[Python-Dev] Re: test_bsddb blocks testing popitem - reason

Mon Oct 27 16:56:48 EST 2003

On Mon, Oct 27, 2003 at 11:25:16AM +0100, Alex Martelli wrote:
> I still don't quite see how the lock ends up being "held", but, don't mind
> me -- the intricacy of mixins and wrappings and generators and delegations
> in those modules is making my head spin anyway, so it's definitely not
> surprising that I can't quite see what's going on.

BerkeleyDB internally always grabs a read lock (I believe at the page
level; I don't think BerkeleyDB does record locking) for any database read
when opened with the DB_THREAD | DB_INIT_LOCK flags.  I believe the problem
is that a DBCursor object holds this lock for as long as it is open/exists.
Other reads can go on happily, but writes must wait for the read lock
to be released before they can proceed.

> > How do python dictionaries deal with modifications to the dictionary
> > intermixed with iteration?
> 
> In general, Python doesn't deal well with modifications to any
> iterable in the course of a loop using an iterator on that iterable.
> 
> The one kind of "modification during the loop" that does work is:
> 
> for k in somedict:
>     somedict[k] = ...whatever...
> 
> i.e. one can change the values corresponding to keys, but not
> change the set of keys in any way -- any changes to the set of
> keys can cause unending loops or other such misbehavior (not
> deadlocks nor crashes, though...).
> 
> However, on a real Python dict,
>     k, v = thedict.iteritems().next()
> doesn't constitute "a loop" -- the iterator object returned by
> the iteritems call is dropped since there are no outstanding
> references to it right after this statement.  So, following up
> with
>     del thedict[k]
> is quite all right -- the dictionary isn't being "looped on" at
> that time.
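
For reference, the dict behaviour described in the quoted text can be
checked with a short script.  This is only a sketch: it uses the Python 3
spelling next(iter(d.items())) in place of the Python 2 idiom quoted
above, the dict contents are made up, and the RuntimeError on a key-set
change is what current CPython does.

```python
# Illustrative data; the names are not from bsddb.
thedict = {"a": 1, "b": 2, "c": 3}

# Take one item from a throwaway iterator.  The iterator is not kept
# anywhere, so the dict is not being "looped on" afterwards.
k, v = next(iter(thedict.items()))
del thedict[k]          # fine: no live iteration in progress

# Changing values (not keys) during a loop is also fine.
for key in thedict:
    thedict[key] = thedict[key] * 10

# Changing the key set during a loop is the unsupported case.
# Current CPython notices the size change and raises RuntimeError.
try:
    for key in thedict:
        del thedict[key]
    size_change_detected = False
except RuntimeError:
    size_change_detected = True
```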

What about the behaviour of multiple iterators over the same dict being
used at once (either interleaved or from multiple threads; it shouldn't
matter)?  I expect that works fine in Python.
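
A quick check of the interleaved case, as a sketch with a made-up dict
(it does not exercise threads):

```python
d = {"a": 1, "b": 2, "c": 3}

# Two independent iterators over the same dict, advanced alternately.
it1 = iter(d)
it2 = iter(d)
seen = []
for _ in range(len(d)):
    seen.append(next(it1))
    seen.append(next(it2))

# Each iterator keeps its own position, so both walk the full key set.
keys_from_it1 = seen[0::2]
keys_from_it2 = seen[1::2]
```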

This is something the _DBWithCursor iteration interface does not currently
support due to its use of a single DBCursor internally.

_DBWithCursor is currently written such that the cursor is never closed
once created.  This leaves plenty of potential for deadlock even in
single-threaded apps.  One solution is to rework _DBWithCursor into a
_DBThatUsesCursorsSafely: each iterator creates its own cursor in an
internal pool, and any non-cursor method that writes to the db first
saves each cursor's current() position and destroys all the cursors, so
that the iterators can reopen and reposition them afterwards.
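
A rough sketch of that idea follows.  Everything in it is hypothetical:
FakeDB and FakeCursor are toy stand-ins (not the real bsddb objects),
the toy raises where the real library would block, and SafeCursorDB and
KeyIter are made-up names.  It only illustrates the save, close and
reposition steps under those assumptions.

```python
import bisect

class FakeDB:
    """Toy store: an open cursor is treated as holding a read lock."""
    def __init__(self, items):
        self.data = dict(items)
        self.open_cursors = 0

    def delete(self, key):
        if self.open_cursors:
            # The real library would block here; the toy raises instead.
            raise RuntimeError("write would block on a cursor's read lock")
        del self.data[key]

class FakeCursor:
    """Toy cursor over the sorted keys; returns (key, value) or None."""
    def __init__(self, db):
        self.db, self.key = db, None
        db.open_cursors += 1

    def _at(self, keys, i):
        self.key = keys[i] if i < len(keys) else None
        return self.current()

    def first(self):
        return self._at(sorted(self.db.data), 0)
    def set_range(self, key):  # smallest key >= the given key
        keys = sorted(self.db.data)
        return self._at(keys, bisect.bisect_left(keys, key))
    def next(self):
        keys = sorted(self.db.data)
        return self._at(keys, bisect.bisect_right(keys, self.key))

    def current(self):
        return None if self.key is None else (self.key, self.db.data[self.key])

    def close(self):
        self.db.open_cursors -= 1

class KeyIter:
    """Iterator that owns one cursor and can survive losing it."""
    def __init__(self, owner):
        self.owner, self.cursor = owner, None
        self.saved_key, self.done = None, False
        owner.iters.append(self)

    def __iter__(self):
        return self

    def __next__(self):
        if self.done:
            raise StopIteration
        if self.cursor is None:  # first use, or closed by a write
            self.cursor = FakeCursor(self.owner.db)
            if self.saved_key is None:
                rec = self.cursor.first()
            else:
                rec = self.cursor.set_range(self.saved_key)
                if rec is not None and rec[0] == self.saved_key:
                    rec = self.cursor.next()  # that key was already returned
        else:
            rec = self.cursor.next()
        if rec is None:
            self.done = True
            self.release()
            self.owner.iters.remove(self)
            raise StopIteration
        return rec[0]

    def release(self):
        if self.cursor is not None:
            rec = self.cursor.current()
            self.saved_key = rec[0] if rec else self.saved_key
            self.cursor.close()
            self.cursor = None

class SafeCursorDB:
    """Hypothetical stand-in for _DBThatUsesCursorsSafely."""
    def __init__(self, db):
        self.db, self.iters = db, []

    def __iter__(self):
        return KeyIter(self)

    def __delitem__(self, key):
        for it in self.iters:  # save positions, drop every read lock
            it.release()
        self.db.delete(key)

store = SafeCursorDB(FakeDB({"a": 1, "b": 2, "c": 3, "d": 4}))
it1, it2 = iter(store), iter(store)
seen1 = [next(it1), next(it1)]
seen2 = [next(it2)]
del store["c"]            # closes both cursors, then writes
seen1 += list(it1)        # it1 reopens and resumes after "b"
seen2 += list(it2)        # it2 reopens and resumes after "a"
```

Repositioning with set_range means an iterator whose saved key was itself
deleted simply resumes at the next larger key.  Iterators that are
abandoned half way stay in the pool in this sketch; a real version would
need to deal with that (weak references, perhaps).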

> Given that in bsddb's case that iteritems() first [and only]
> next() boils down to a self.first() which in turn does a 
> self.dbc.first() I _still_ don't see exactly what's holding the
> lock.  But the simplest fix would appear to be in __delitem__,
> i.e., if we have a cursor we should delete through it:
> 
>     def __delitem__(self, key):
>         self._checkOpen()
>         if self.dbc is not None:
>             self.dbc.set(key)
>             self.dbc.delete()
>         else:
>             del self.db[key]
> 
> ...but this doesn't in fact remove the deadlock on the
> unit-test for popitem, which just confirms I don't really
> grasp what's going on, yet!-)

Hmm.  I would've expected your __delitem__ to work.  Regardless, using the
debugger I can stop the deadlock from occurring if I do "self.dbc.close();
self.dbc = None" just before popitem's "del self[k]".
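
As a toy model of that workaround (not the real bsddb code): ToyDB,
ToyCursor and Wrapper below are made-up stand-ins, the popitem body is an
assumption about its shape, and the toy raises where the real library
would block.

```python
class ToyDB:
    """Toy model: any open cursor is treated as holding a read lock."""
    def __init__(self, items):
        self.data = dict(items)
        self.open_cursors = 0

    def cursor(self):
        return ToyCursor(self)

    def __delitem__(self, key):
        if self.open_cursors:
            # The real library would block here; the toy raises instead.
            raise RuntimeError("delete would block on a cursor's read lock")
        del self.data[key]


class ToyCursor:
    def __init__(self, db):
        self.db = db
        db.open_cursors += 1

    def first(self):
        key = min(self.db.data)
        return key, self.db.data[key]

    def close(self):
        self.db.open_cursors -= 1


class Wrapper:
    """Hypothetical stand-in for _DBWithCursor; only popitem is modelled."""
    def __init__(self, db):
        self.db = db
        self.dbc = None

    def first(self):
        if self.dbc is None:
            self.dbc = self.db.cursor()   # never closed again by default
        return self.dbc.first()

    def popitem(self, close_cursor_first):
        k, v = self.first()
        if close_cursor_first and self.dbc is not None:
            self.dbc.close()              # the step tried in the debugger
            self.dbc = None
        del self.db[k]
        return k, v


w = Wrapper(ToyDB({"a": 1, "b": 2}))
try:
    w.popitem(close_cursor_first=False)
    blocked = False
except RuntimeError:
    blocked = True                        # cursor still open: write refused

popped = w.popitem(close_cursor_first=True)
```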

Greg