Thread safe object cache without locking

Timo timo.linna at gmail.com
Thu Sep 15 10:09:55 EDT 2005


I'm trying to make a thread safe object cache without locking.

The objects are cached by the id of the data dict given in __new__.
Objects are removed from the cache as soon as they are no longer
referenced. The type of the data must be a Python dict (comes from an
external source).

Here's what I've got so far:

import time
from weakref import ref

cache = {} # {id(data): ref_to_Foo(data), ...}

class Foo(object):

    __slots__ = ('__weakref__', '_data')

    def __new__(cls, data):
        new_self = object.__new__(cls)

        def remove(wr, _cache=cache, _data=data):
            try:
                del _cache[id(_data)]
            except KeyError:
                pass # should never happen

        while 1:
            self = cache.setdefault(id(data), ref(new_self, remove))()
            try:
                self._data = data
                return self # this is new_self or a cached instance
            except AttributeError:
                # happens rarely: got a 'dead' cached instance
                time.sleep(0) # small delay before retrying

# Usage example:
data = {'a':1, 'b':2}
a = Foo(data)
b = Foo(data)
assert a is b

I've got couple of assumptions based on I think the code is working
correctly:

a) Python's dict is thread safe, i.e. setdefault can't replace an
existing value until the key has been removed - no matter how many
threads are calling it simultaneously. Additionally, only one thread
succeeds in setting the new value.

b) Unreferenced ref(new_self, remove) objects are garbage collected
without the callbacks being called.

c) The remove() keeps a reference to the data dict. That should prevent
Python from reusing the object id too soon.

Are these assumptions correct? Do you have ideas for improvements
(especially regarding speed)? I'm restricted to Py2.3+ compatibility.

Thanks,
    Timo




More information about the Python-list mailing list