"RuntimeError: dictionary changed size during iteration" ; Good atomic copy operations?

Sun Mar 12 20:56:37 EST 2006

[robert]
> In very rare cases a program crashes (hard to reproduce) :
>
> * several threads work on an object tree with dicts etc. in it. Items
> are added and deleted, there is iteration over .keys(), ... The threads
> are "good" in the sense that this core data structure is changed only by
> atomic operations, so the data structure is always consistent as far as
> the application is concerned. Only the change operations on the dicts
> and lists themselves seem to cause problems at the Python level.
>
> * one thread periodically pickle-dumps the tree to a file:
>    >>> cPickle.dump(obj, f)
>
> "RuntimeError: dictionary changed size during iteration" is raised by
> .dump (or a similar "... list changed ..." error).
>
> What can I do about this to get a stable pickle dump without risking
> execution errors or, even worse, errors in the pickled file?
>
> Is a copy.deepcopy (-> "cPickle.dump(copy.deepcopy(obj), f)") an
> atomic operation with a guarantee not to fail?

No.  It is non-atomic.

It seems that your application design intrinsically incorporates a race
condition -- even if deepcopying and pickling were atomic, there would
be no guarantee whether the pickle dump occurs before or after another
thread modifies the structure.  While that design smells of a rat, it
may be that your apps can accept a dump of any consistent state and
that possibly concurrent transactions may be randomly included or
excluded without affecting the result.

Python's traditional recommendation is to put all access to a resource
in one thread and to have other threads communicate their transaction
requests via the Queue module.  Getting results back was either done
through other Queues or by passing data through a memory location
unique to each thread.  The latter approach has become trivially simple
with the advent of Py2.4's thread-local variables.

Thinking about future directions for Python threading, I wonder if
there is a way to expose the GIL (or simply impose a temporary
moratorium on thread switches) so that it becomes easy to introduce
atomicity when needed:

   # hypothetical API, as wondered about above
   gil.acquire(BLOCK=True)
   try:
      # do some transaction that needs to be atomic
      pass
   finally:
      gil.release()
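Python has no `gil` object like the one above. The closest approximation available today is a cooperative lock. It is a different technique: threads still switch freely, but no thread can enter a section guarded by the lock while another thread holds it, and it only works if every thread that touches the tree agrees to take the lock. A minimal sketch in Python 3; `tree_lock`, `add`, `remove`, `move`, and `dump` are hypothetical names for illustration.

```python
import io
import pickle
import threading

# Hypothetical application-wide lock. The scheme only works if every
# thread that reads or writes the tree holds it; an RLock lets one
# thread nest guarded sections without deadlocking itself.
tree_lock = threading.RLock()


def add(tree, key, value):
    with tree_lock:
        tree[key] = value


def remove(tree, key):
    with tree_lock:
        tree.pop(key, None)


def move(tree, old, new):
    # One transaction built from two smaller ones: the outer "with" keeps
    # other threads out between add() and remove().
    with tree_lock:
        add(tree, new, tree[old])
        remove(tree, old)


def dump(tree, f):
    # Writers block until the pickler has finished iterating the tree,
    # so the dump reflects a single consistent state.
    with tree_lock:
        pickle.dump(tree, f)


tree = {"a": 1}
move(tree, "a", "b")
buf = io.BytesIO()
dump(tree, buf)
print(pickle.loads(buf.getvalue()))  # {'b': 1}
```

Using an `RLock` rather than a plain `Lock` is what lets `move` call `add` and `remove` while it already holds the lock.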

> Or can I only retry several times in case of a RuntimeError? (Which would
> seem like odd gambling to me; retry how often?)

Since the app doesn't seem to care when the dump occurs, it might be
natural to put it in a while-loop that continuously retries until it
succeeds; however, you still run the risk that other threads may never
leave the object alone long enough to dump completely.
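If you do go the retry route, bounding the number of attempts turns "may never finish" into an explicit error you can log. The sketch below (Python 3) also serializes into memory with `pickle.dumps` and only then writes a temporary file and renames it over the target, so a `RuntimeError` midway never leaves a half-written pickle on disk; the in-memory step and the rename are additions of this sketch, not part of the advice above. The function `dump_with_retry` and its `attempts`, `delay`, and `dumps` parameters are hypothetical. A clean pass only means the pickler detected no change while it ran; it does not prove that no other thread modified the tree during the dump.

```python
import os
import pickle
import tempfile
import time


def dump_with_retry(obj, path, attempts=10, delay=0.01, dumps=pickle.dumps):
    """Pickle `obj` to `path`, retrying on RuntimeError up to `attempts` times.

    Returns the attempt number that succeeded; re-raises the last
    RuntimeError if every attempt collided with a concurrent change.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            data = dumps(obj)        # serialize completely in memory first
        except RuntimeError:         # e.g. "dictionary changed size during iteration"
            if attempt == attempts:
                raise
            time.sleep(delay)        # let the writers move on, then retry
            continue
        # Only a complete pickle reaches the disk: write a temp file in the
        # same directory, then rename it over `path` (atomic on POSIX).
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
            os.replace(tmp, path)
        except BaseException:
            os.unlink(tmp)           # don't leave the temp file behind
            raise
        return attempt
```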

Raymond