Threading question: am I doing this right?

Robert Latest boblatest at yahoo.com
Thu Feb 24 07:08:50 EST 2022


I have a multi-threaded application (a web service) where several threads need
data from an external database. That data is quite a lot, but it is almost
always the same. Between incoming requests, timestamped records get added to
the DB.

So I decided to keep an in-memory cache of the DB records that only gets
"topped up" with the most recent records on each request:


    from threading import Lock, Thread


    class MyCache():
        def __init__(self):
            self.cache = None
            self.cache_lock = Lock()

        def _update(self):
            # query_external_database() is defined elsewhere and returns a
            # list of the records added since the last query
            new_records = query_external_database()
            if self.cache is None:
                self.cache = new_records
            else:
                self.cache.extend(new_records)

        def get_data(self):
            # Every request waits here until the running update has finished
            with self.cache_lock:
                self._update()

            return self.cache

    my_cache = MyCache() # module level


This works, but even those "small" queries can sometimes hang for a long time,
causing incoming requests to pile up at the "with self.cache_lock" block.
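A minimal way to reproduce the pile-up in isolation, assuming a hypothetical slow_query() that just sleeps in place of query_external_database(). LockedCache mirrors the first MyCache but takes the query as a parameter, and simulate() (also an illustrative name) fires several requests at once and records how long each one waited. Because the query runs while the lock is held, the waits grow in steps of roughly one query duration.

```python
import threading
import time


class LockedCache:
    """Mirrors the first MyCache, with the query passed in for testing."""

    def __init__(self, query):
        self._query = query
        self.cache = None
        self.cache_lock = threading.Lock()

    def get_data(self):
        with self.cache_lock:  # every request queues here while a query runs
            new_records = self._query()
            if self.cache is None:
                self.cache = new_records
            else:
                self.cache.extend(new_records)
        return self.cache


def simulate(n_requests=4, delay=0.2):
    """Fire n_requests concurrent get_data() calls; return sorted wait times."""

    def slow_query():  # hypothetical stand-in for query_external_database()
        time.sleep(delay)
        return [time.monotonic()]

    cache = LockedCache(slow_query)
    waits = []
    waits_lock = threading.Lock()

    def request():
        start = time.monotonic()
        cache.get_data()
        with waits_lock:
            waits.append(time.monotonic() - start)

    threads = [threading.Thread(target=request) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(waits)


if __name__ == "__main__":
    for w in simulate():
        print(f"request served after {w:.2f} s")
```

With four requests and a 0.2 s query, the last request is served only after about 0.8 s, even though its own query took 0.2 s.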

Since it is better to quickly serve the client with slightly outdated data than
not at all, I came up with the "impatient" solution below. The idea is that an
incoming request triggers an update query in another thread, waits for a short
timeout for that thread to finish and then returns either updated or old data.

    from threading import Lock, Thread


    class MyCache():
        def __init__(self):
            self.cache = None
            self.thread_lock = Lock()  # guards self.update_thread
            self.update_thread = None

        def _update(self):
            # query_external_database() is defined elsewhere
            new_records = query_external_database()
            if self.cache is None:
                self.cache = new_records
            else:
                self.cache.extend(new_records)

        def get_data(self):
            if self.cache is None:
                timeout = 10 # allow more time to get initial batch of data
            else:
                timeout = 0.5
            with self.thread_lock:
                # Start a new update only if none is currently running
                if self.update_thread is None or not self.update_thread.is_alive():
                    self.update_thread = Thread(target=self._update)
                    self.update_thread.start()
                    # Wait briefly; if the query is still running, serve old data
                    self.update_thread.join(timeout)

            return self.cache

    my_cache = MyCache() # module level
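For comparison, a sketch of the same "start an update, wait briefly, serve whatever is there" pattern using concurrent.futures from the standard library instead of a hand-managed Thread. ImpatientCache, its constructor parameters and slow_query() are illustrative names, and the query is passed in rather than assumed to be a global. A single-worker ThreadPoolExecutor keeps updates to one at a time, and concurrent.futures.wait(..., timeout=...) returns once the timeout expires whether or not the update has finished. The update logic deliberately mirrors MyCache._update(), extend() included.

```python
import concurrent.futures
import threading
import time


class ImpatientCache:
    def __init__(self, query, initial_timeout=10.0, timeout=0.5):
        self._query = query  # stand-in for query_external_database()
        self._initial_timeout = initial_timeout
        self._timeout = timeout
        self.cache = None
        self._lock = threading.Lock()  # guards self._pending only
        # One worker thread: at most one update runs at any time.
        self._executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def _update(self):
        new_records = self._query()
        if self.cache is None:
            self.cache = new_records
        else:
            self.cache.extend(new_records)

    def get_data(self):
        timeout = self._initial_timeout if self.cache is None else self._timeout
        started = None
        with self._lock:
            # Start a new update only if none is running, as in MyCache.
            if self._pending is None or self._pending.done():
                self._pending = self._executor.submit(self._update)
                started = self._pending
        if started is not None:
            # Wait outside the lock; wait() gives up after `timeout` seconds
            # and leaves the update running in the background.
            concurrent.futures.wait([started], timeout=timeout)
        return self.cache


if __name__ == "__main__":
    def slow_query():  # hypothetical query that takes 2 s per call
        time.sleep(2)
        return [time.time()]

    cache = ImpatientCache(slow_query)
    print(len(cache.get_data()))  # waits for the first batch (up to 10 s)
    print(len(cache.get_data()))  # returns after ~0.5 s with the old data
```

One difference from the Thread version: the wait happens outside the lock, so a request that arrives while another is waiting does not queue behind it. Also, an exception raised by the query is stored on the future and is not reported unless something calls started.exception() or started.result(); with a plain Thread, threading.excepthook at least prints the traceback.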

My question is: is my approach solid? Am I forgetting something? For
instance, I believe that I don't need another lock to guard self.cache.extend(),
because _update() can only ever run in one thread at a time. But maybe I'm
overlooking something.
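On the extend() point: with a single writer, two updates never interleave, but get_data() hands every caller the same list object, and a later _update() extends that object in place, so a request still iterating over its result can see new records appear mid-loop. For comparison, a minimal copy-on-write sketch, assuming callers only read the returned data; SnapshotCache, update() and the injected query callable are illustrative names, not part of the code above. Each update builds a new tuple and rebinds the attribute in a single assignment, so readers (in CPython, at least) get either the old or the new snapshot, never one that is changing underneath them.

```python
import threading


class SnapshotCache:
    """Copy-on-write cache: readers get an immutable snapshot."""

    def __init__(self, query):
        self._query = query  # hypothetical stand-in for query_external_database
        self._snapshot = ()  # a tuple, so callers cannot mutate it either
        self._update_lock = threading.Lock()  # keeps updates one at a time

    def update(self):
        with self._update_lock:
            new_records = self._query()
            # Build a new tuple, then rebind the attribute in one step;
            # tuples that callers already hold are left untouched.
            self._snapshot = self._snapshot + tuple(new_records)

    def get_data(self):
        # Readers take no lock: they see either the old or the new tuple.
        return self._snapshot
```

The cost is a full copy of the cached records on every update, which matters when, as here, the data is large; whether that beats handing each request its own copy, or holding a lock while readers iterate, depends on how often updates and reads happen.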