Are python objects thread-safe?

Aaron Brady castironpi at gmail.com
Mon Dec 22 18:36:15 EST 2008


On Dec 22, 2:59 am, Duncan Booth <duncan.bo... at invalid.invalid> wrote:
> RajNewbie <raj.indian... at gmail.com> wrote:
> > Say, I have two threads, updating the same dictionary object - but for
> > different parameters:
> > Please find an example below:
> > a = {'file1Data': '',
> >      'file2Data': ''}
>
> > Now, I send it to two different threads, both of which are looping
> > infinitely:
> > In thread1:
> > a['file1Data'] = open(filename1).read()
> >           and
> > in thread2:
> > a['file2Data'] = open(filename2).read()
>
> > My question is  - is this object threadsafe? - since we are working on
> > two different parameters in the object. Or should I have to block the
> > whole object?
>
> It depends exactly what you mean by 'threadsafe'. The GIL will guarantee
> that you can't screw up Python's internal data structures: so your
> dictionary always remains a valid dictionary rather than a pile of bits.
>
> However, when you dig a bit deeper, it makes very few guarantees at the
> Python level. Individual bytecode instructions are not guaranteed
> atomic: for example, any assignment (including setting a new value into
> the dictionary) could overwrite an existing value and the value which is
> overwritten may have a destructor written in Python. If that happens you
> can get context switches within the assignment.

Th.1   Th.2
a=X
       a=Y
a=Z

You are saying that if 'a=Z' interrupts 'a=Y' at the wrong time, the
destructor for 'X' or 'Y' might not get called.  Correct?  In serial
flow, the destructor for 'X' is called, then the one for 'Y'.
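
For concreteness, here is roughly how I picture it (a rough sketch with
a made-up 'Noisy' class; the only point is that __del__ is Python code,
so a thread switch can land inside the assignment that drops the old
value):

import threading

class Noisy(object):
    def __del__(self):
        # Pure-Python destructor: the interpreter may switch threads
        # between its bytecodes, i.e. while the assignment that dropped
        # the last reference is still 'in progress'.
        print 'finalizing', id(self)

a = {'file1Data': Noisy()}

def writer():
    for _ in xrange(1000):
        a['file1Data'] = Noisy()    # the old value's __del__ runs here

threads = [threading.Thread(target=writer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()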

> Other nasty things can happen if you use dictionaries from multiple
> threads. You cannot add or remove a dictionary key while iterating over
> a dictionary. This isn't normally a big issue, but as soon as you try to
> share the dictionary between threads you'll have to be careful never to
> iterate through it.

These aren't documented, IIRC.  Did you just discover them by trial
and error?
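
For what it's worth, the iteration restriction shows up even without
threads; a quick sketch:

d = {'file1Data': '', 'file2Data': ''}

# Adding or removing a key while iterating raises
# 'RuntimeError: dictionary changed size during iteration'.
for key in d:
    d['file3Data'] = ''

With a second thread doing the insertion it merely becomes
nondeterministic, which is the nasty part.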

> You will probably find it less error prone in the long run if you get
> your threads to write (key,value) tuples into a queue which the
> consuming thread can read and use to update the dictionary.
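
Something along these lines, I take it (a rough sketch; the key names
and file names are placeholders):

import threading
import Queue

a = {'file1Data': '', 'file2Data': ''}
updates = Queue.Queue()

def reader(key, filename):
    # Producer: never touches the dict, just posts a (key, value) pair.
    updates.put((key, open(filename).read()))

def consumer():
    # The one thread that owns the dict; no locking needed.
    while True:
        item = updates.get()
        if item is None:             # sentinel: shut down
            break
        key, value = item
        a[key] = value

worker = threading.Thread(target=consumer)
worker.start()
readers = [threading.Thread(target=reader, args=('file1Data', 'filename1')),
           threading.Thread(target=reader, args=('file2Data', 'filename2'))]
for t in readers:
    t.start()
for t in readers:
    t.join()
updates.put(None)                    # tell the consumer to stop
worker.join()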

Perhaps there's a general data structure which can honor 'fire-and-
forget' method calls in serial.

a= async( {} )
a[0]= X
a[0]= Y

-->
obj_queue[a].put( a.__setitem__, 0, X )
obj_queue[a].put( a.__setitem__, 0, Y )

If you need the return value, you'll need to block.

print a[0]
-->
res= obj_queue[a].put( a.__getitem__, 0 )
res.wait()
print res.value

Or you can use a Condition object.  But you can also delegate the
print farther down the line of processing:

obj_queue[a].link( print ).link( a.__getitem__, 0 )

(As you can see, the author (I) finds it a more interesting problem to
get required information in the right places at the right times in
execution.  The actual implementation is left to the reader; I'm
merely claiming that there exists a consistent one taking the above
instructions to be sufficient givens.)
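
Roughly, the fire-and-forget core might look like the bare-bones sketch
below (no 'link' chaining yet, and the helper names are arbitrary):

import threading
import Queue

class AsyncResult(object):
    # A waitable slot for a return value.
    def __init__(self):
        self._done = threading.Event()
        self.value = None
    def set(self, value):
        self.value = value
        self._done.set()
    def wait(self):
        self._done.wait()
        return self.value

class async(object):
    # Wraps an object; every call is queued and executed serially
    # by a single worker thread.
    def __init__(self, obj):
        self._obj = obj
        self._queue = Queue.Queue()
        worker = threading.Thread(target=self._run)
        worker.setDaemon(True)
        worker.start()
    def _run(self):
        while True:
            func, args, res = self._queue.get()
            res.set(func(*args))
    def put(self, func, *args):
        res = AsyncResult()
        self._queue.put((func, args, res))
        return res
    def __setitem__(self, key, value):      # fire-and-forget
        self.put(self._obj.__setitem__, key, value)
    def __getitem__(self, key):             # block for the answer
        return self.put(self._obj.__getitem__, key).wait()

a = async({})
a[0] = 'X'
a[0] = 'Y'
print a[0]                                  # prints Y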


