Thread safety issue (I think) with defaultdict

Israel Brewster israel at ravnalaska.net
Wed Nov 1 14:53:20 EDT 2017


On Nov 1, 2017, at 9:58 AM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> 
> On Tue, Oct 31, 2017 at 11:38 AM, Israel Brewster <israel at ravnalaska.net> wrote:
>> A question that has arisen before (for example, here: https://mail.python.org/pipermail/python-list/2010-January/565497.html) is the question of "is defaultdict thread safe", with the answer generally being a conditional "yes", with the condition being what is used as the default value: apparently default values of python types, such as list, are thread safe,
> 
> I would not rely on this. It might be true for current versions of
> CPython, but I don't think there's any general guarantee and you could
> run into trouble with other implementations.

Right, completely agreed. Kinda feels "dirty" to rely on things like this to me.

> 
>> [...]
> 
> [...] You could use a regular dict and just check if
> the key is present, perhaps with the additional argument to .get() to
> return a default value.

True. Using defaultdict simply saves having to stick the same default in every call to get(). DRY principle and all. That said, see below - I don't think the defaultdict is the issue.
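
For illustration, a minimal sketch of the two approaches side by side. The key name is made up; the point is only that get() needs the default repeated at each call and leaves the dict untouched, while a defaultdict lookup inserts the missing key as a side effect:

```python
from collections import defaultdict

# Hypothetical containers; "flight-1" is a made-up key for illustration.
plain = {}
auto = defaultdict(list)

# Plain dict: the default is repeated at every call site, and a
# missing key is NOT inserted by .get().
missing_plain = plain.get("flight-1", [])

# defaultdict: the default is declared once, but looking up a missing
# key with [] inserts it into the dictionary.
missing_auto = auto["flight-1"]

print(missing_plain, "flight-1" in plain)
print(missing_auto, "flight-1" in auto)
```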

> 
> Individual lookups and updates of ordinary dicts are atomic (at least
> in CPython). A lookup followed by an update is not, and this would be
> true for defaultdict as well.
> 
>> [...]
>> 1) Is this what it means to NOT be thread safe? I was thinking of race conditions where individual values may get updated wrong, but this apparently is overwriting the entire dictionary.
> 
> No, a thread-safety issue would be something like this:
> 
>    account[user] = account[user] + 1
> 
> where the value of account[user] could potentially change between the
> time it is looked up and the time it is set again.

That's what I thought - changing values/different values from expected, not missing values.
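
To make that concrete, here is a rough sketch of how I understand the usual fix for that kind of race: hold a lock across the whole read-modify-write. The names (account_lock, deposit, "alice") are invented for the example and are not from my actual code:

```python
import threading
from collections import defaultdict

account = defaultdict(int)
account_lock = threading.Lock()  # hypothetical lock guarding `account`


def deposit(user, times):
    for _ in range(times):
        # Hold the lock across the read AND the write, so no other
        # thread can change account[user] in between.
        with account_lock:
            account[user] = account[user] + 1


threads = [
    threading.Thread(target=deposit, args=("alice", 10_000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account["alice"])
```

With the lock held around each increment, the four threads together add exactly 4 * 10,000 to the counter.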

All that said, I just had a bit of an epiphany: the main thread is actually a Flask app, running through UWSGI with multiple *processes*, and using the flask-uwsgi-websocket plugin, which further uses greenlets. So what I thought was simply a separate thread was, in reality, running in a completely separate *process*. I'm sure that makes a difference. So what's actually happening here is the following:

1) the main python process starts, which initializes the dictionary (since it is at a global level)
2) uwsgi launches off a bunch of child worker processes (10 to be exact, each of which is set up with 10 gevent threads)
3a) a client connects (web socket connection to be exact). This connection is handled by an arbitrary worker, and an arbitrary green thread within that worker, based on UWSGI algorithms.
3b) This connection triggers launching of a *true* thread (using the python threading library) which, presumably, is now a child thread of that arbitrary uwsgi worker. <== BAD THING, I would think
4) The client makes a request for the list, which is handled by a DIFFERENT (presumably) arbitrary worker process and green thread.

So the end result is that the thread that "updates" the dictionary and the thread that initially *populates* the dictionary are actually running in different processes. In fact, any given request could be in yet another process, which would seem to indicate that all bets are off as to what data is seen.
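
A small Unix-only sketch of the effect, using os.fork() directly as a stand-in for what I assume the uwsgi workers amount to (this is not how uwsgi is actually invoked, and the dictionary contents are made up). The child gets its own copy of the module-level dictionary, so its update never shows up in the parent:

```python
import json
import os

# Module-level dictionary, initialized before any child is forked.
data = {"flights": ["initial"]}


def run_in_child():
    """Fork, update `data` in the child, and return the child's view of it."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: modify its own copy and report it back.
        try:
            os.close(read_fd)
            data["flights"].append("added-in-child")
            os.write(write_fd, json.dumps(data).encode())
            os.close(write_fd)
        finally:
            os._exit(0)  # never fall through into the parent's code
    # Parent process: read what the child saw, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_view = json.loads(pipe.read())
    os.waitpid(pid, 0)
    return child_view


child_view = run_in_child()
print("child sees: ", child_view)
print("parent sees:", data)
```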

Now that I've thought through what is really happening, I think I need to re-architect things a bit here. For one thing, the update thread should be launched from the main process, not an arbitrary UWSGI worker. I had launched it from the client connection because there is no point in having it running if there is no one connected, but I may need to launch it from the __init__.py file instead. For another thing, since this dictionary will need to be accessed from arbitrary worker processes, I'm thinking I may need to move it to some sort of external storage, such as a redis database. Oy, I made my life complicated :-)
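
A rough sketch of what the external-storage idea might look like, assuming the redis-py package and its hset()/hgetall() calls. The hash name "records", the helper names, and the record layout are all invented for illustration, and the client is passed in so that nothing here depends on a running server:

```python
import json

RECORDS_KEY = "records"  # hypothetical name of the Redis hash


def save_record(client, record_id, record):
    """Store one record as JSON in a shared hash (called by the updater)."""
    client.hset(RECORDS_KEY, record_id, json.dumps(record))


def load_records(client):
    """Return every record as a plain dict (called by any worker)."""
    result = {}
    for key, value in client.hgetall(RECORDS_KEY).items():
        # redis-py returns bytes unless decode_responses=True was set.
        if isinstance(key, bytes):
            key = key.decode()
        if isinstance(value, bytes):
            value = value.decode()
        result[key] = json.loads(value)
    return result


# Assumed wiring (needs the redis-py package and a reachable server):
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   save_record(client, "N123", {"status": "en route"})
#   print(load_records(client))
```

Since every worker process would talk to the same server, it would no longer matter which process handles a given request.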

> That said it's not
> obvious to me what your problem actually is.
