WeakValueDict and threadsafety

Darren Dale dsdale24 at gmail.com
Sat Dec 10 12:56:38 EST 2011


On Dec 10, 11:19 am, Duncan Booth <duncan.bo... at invalid.invalid>
wrote:
> Darren Dale <dsdal... at gmail.com> wrote:
> > I'm concerned that this is not actually thread-safe. When I no longer
> > hold strong references to an instance of data, at some point the
> > garbage collector will kick in and remove that entry from my registry.
> > How can I ensure the garbage collection process does not modify the
> > registry while I'm holding the lock?
>
> You can't, but it shouldn't matter.
>
> So long as you have a strong reference in 'data' that particular object
> will continue to exist. Other entries in 'registry' might disappear while
> you are holding your lock but that shouldn't matter to you.
>
> What is concerning though is that you are using `id(data)` as the key and
> then presumably storing that separately as your `oid` value. If the
> lifetime of the value stored as `oid` exceeds the lifetime of the strong
> references to `data` then you might get a new data value created with the
> same id as some previous value.
>
> In other words I think there's a problem here, but nothing to do with the
> lock.

Thank you for the considered response. In reality, I am not using
id(data). I took that from the example in the documentation at
python.org in order to illustrate the basic approach, but it looks
like I introduced an error in the code. It should read:

def get_data(oid):
    with reglock:
        data = registry.get(oid, None)
        if data is None:
            data = make_data(oid)
            registry[oid] = data
    return data

Does that look better? I am actually working on the h5py project
(bindings to HDF5), and the oid is an HDF5 object identifier.
make_data(oid) creates a proxy object that stores a strong reference
to oid.
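
For readers who want to run the pattern above, here is a minimal self-contained sketch. The Proxy class and this make_data are hypothetical stand-ins, not h5py code, and the integer oid is only a placeholder for a real HDF5 identifier.

```python
import threading
import weakref

registry = weakref.WeakValueDictionary()
reglock = threading.Lock()


class Proxy:
    """Hypothetical stand-in for the proxy object built by make_data()."""

    def __init__(self, oid):
        self.oid = oid  # the proxy keeps a strong reference to its identifier


def make_data(oid):
    # Assumption: the real make_data() builds an h5py proxy; this one is a stub.
    return Proxy(oid)


def get_data(oid):
    # Look up and, if needed, create the proxy under one lock so that two
    # threads cannot both create a proxy for the same oid.
    with reglock:
        data = registry.get(oid, None)
        if data is None:
            data = make_data(oid)
            registry[oid] = data
    return data


a = get_data(42)
b = get_data(42)
print(a is b)  # True: the second call finds the proxy created by the first
```

The lock here only serializes calls to get_data. It does not cover the removal callback that WeakValueDictionary runs when a value is collected, which is the gap described next.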

My concern is that the garbage collector is modifying the dictionary
underlying WeakValueDictionary at the same time that my multithreaded
code is trying to access it, producing a race condition. This morning
I wrote a synchronized version of WeakValueDictionary (actually
implemented in Cython):

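from weakref import KeyedRef, ref

class _Registry:

    def __cinit__(self):
        # Callback invoked with the dead KeyedRef once a value is collected.
        # It holds only a weak reference to the registry, avoiding a cycle.
        def remove(wr, selfref=ref(self)):
            self = selfref()
            if self is not None:
                self._delitem(wr.key)
        self._remove = remove
        self._data = {}
        # FastRLock: lock class from the Cython module (definition not shown).
        self._lock = FastRLock()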

    __hash__ = None

    def __setitem__(self, key, val):
        with self._lock:
            self._data[key] = KeyedRef(val, self._remove, key)

    def _delitem(self, key):
        with self._lock:
            del self._data[key]

    def get(self, key, default=None):
        with self._lock:
            try:
                wr = self._data[key]
            except KeyError:
                return default
            else:
                o = wr()
                if o is None:
                    return default
                else:
                    return o

Now that I am using this _Registry class instead of
WeakValueDictionary, my test scripts and my actual program are no
longer producing segfaults.
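
The same idea can be sketched in pure Python for experimentation. This is not the Cython class above: the name SyncRegistry is made up, threading.RLock is swapped in for FastRLock, and _delitem takes the dead reference as an extra argument so it only deletes the entry it was called for. That last check is an added assumption, guarding against the key having been reassigned in the meantime.

```python
import threading
from weakref import KeyedRef, ref


class SyncRegistry:
    """Pure-Python sketch of a lock-protected weak-value mapping."""

    def __init__(self):
        def remove(wr, selfref=ref(self)):
            # Runs when a stored value is collected; wr is the dead KeyedRef.
            self = selfref()
            if self is not None:
                self._delitem(wr.key, wr)
        self._remove = remove
        self._data = {}
        # Re-entrant lock: the callback may fire in a thread that already
        # holds the lock (for example, while inside __setitem__).
        self._lock = threading.RLock()

    def __setitem__(self, key, val):
        with self._lock:
            self._data[key] = KeyedRef(val, self._remove, key)

    def _delitem(self, key, wr):
        with self._lock:
            # Drop the entry only if it still holds this dead reference.
            if self._data.get(key) is wr:
                del self._data[key]

    def get(self, key, default=None):
        with self._lock:
            wr = self._data.get(key)
            o = wr() if wr is not None else None
            return default if o is None else o


class Proxy:
    """Hypothetical value type; any weak-referenceable object works."""

    def __init__(self, oid):
        self.oid = oid


reg = SyncRegistry()
p = Proxy(7)
reg[7] = p
print(reg.get(7) is p)  # True while p is strongly referenced
del p                   # once the proxy is collected, reg.get(7) returns None
```

A re-entrant lock matters because the removal callback runs in whichever thread drops the last strong reference, and that thread may already be inside one of the registry's own methods.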


