Thread safety issue (I think) with defaultdict

Ian Kelly ian.g.kelly at gmail.com
Wed Nov 1 13:58:29 EDT 2017


On Tue, Oct 31, 2017 at 11:38 AM, Israel Brewster <israel at ravnalaska.net> wrote:
> A question that has arisen before (for example, here: https://mail.python.org/pipermail/python-list/2010-January/565497.html) is the question of "is defaultdict thread safe", with the answer generally being a conditional "yes", with the condition being what is used as the default value: apparently default values of python types, such as list, are thread safe,

I would not rely on this. It might be true for current versions of
CPython, but I don't think there's any general guarantee and you could
run into trouble with other implementations.

> whereas more complicated constructs, such as lambdas, make it not thread safe. In my situation, I'm using a lambda, specifically:
>
> lambda: datetime.min
>
> So presumably *not* thread safe.
>
> My goal is to have a dictionary of aircraft and when they were last "seen", with datetime.min being effectively "never". When a data point comes in for a given aircraft, the data point will be compared with the value in the defaultdict for that aircraft, and if the timestamp on that data point is newer than what is in the defaultdict, the defaultdict will get updated with the value from the datapoint (not necessarily current timestamp, but rather the value from the datapoint). Note that data points do not necessarily arrive in chronological order (for various reasons not applicable here, it's just the way it is), thus the need for the comparison.

Since you're going to immediately replace the default value with an
actual value, it's not clear to me what the purpose of using a
defaultdict is here. You could use a regular dict and just check if
the key is present, perhaps with the additional argument to .get() to
return a default value.
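
As a rough sketch of what I mean (the names last_seen and
record_sighting are made up for illustration, and I'm assuming the
aircraft are keyed by some string identifier):

```python
from datetime import datetime

# Hypothetical names, for illustration only.
last_seen = {}  # aircraft id -> timestamp of the newest data point


def record_sighting(aircraft, timestamp):
    """Store timestamp only if it is newer than what we already have."""
    # .get() returns datetime.min when the aircraft has never been seen,
    # without inserting anything into the dict.
    if timestamp > last_seen.get(aircraft, datetime.min):
        last_seen[aircraft] = timestamp


record_sighting("N123AB", datetime(2017, 10, 31, 12, 0))
record_sighting("N123AB", datetime(2017, 10, 31, 11, 0))  # older, ignored
```

Unlike a defaultdict, merely looking up an unknown aircraft this way
never adds a key to the dict.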

Individual lookups and updates of ordinary dicts are atomic (at least
in CPython). A lookup followed by an update is not, and this would be
true for defaultdict as well.
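
If more than one thread can end up doing the compare-and-update, one
way to make the pair atomic is to hold a lock around both steps. A
minimal sketch, again with made-up names (last_seen, last_seen_lock,
record_sighting):

```python
import threading
from datetime import datetime

# Hypothetical names, for illustration only.
last_seen = {}
last_seen_lock = threading.Lock()


def record_sighting(aircraft, timestamp):
    """Compare and update as one step, guarded by the lock."""
    with last_seen_lock:
        # No other thread holding this lock can change last_seen
        # between the lookup and the assignment.
        if timestamp > last_seen.get(aircraft, datetime.min):
            last_seen[aircraft] = timestamp


# Several threads reporting the same aircraft in arbitrary order.
threads = [
    threading.Thread(
        target=record_sighting,
        args=("N123AB", datetime(2017, 10, 31, hour)),
    )
    for hour in (9, 14, 11)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

This only helps if every piece of code that writes to the dict takes
the same lock.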

> When the program first starts up, two things happen:
>
> 1) a thread is started that watches for incoming data points and updates the dictionary as per above, and
> 2) the dictionary should get an initial population (in the main thread) from hard storage.
>
> The behavior I'm seeing, however, is that when step 2 happens (which generally happens before the thread gets any updates), the dictionary gets populated with 56 entries, as expected. However, none of those entries are visible when the thread runs. It's as though the thread is getting a separate copy of the dictionary, although debugging says that is not the case - printing the variable from each location shows the same address for the object.
>
> So my questions are:
>
> 1) Is this what it means to NOT be thread safe? I was thinking of race conditions where individual values may get updated wrong, but this apparently is overwriting the entire dictionary.

No, a thread-safety issue would be something like this:

    account[user] = account[user] + 1

where the value of account[user] could potentially change between the
time it is looked up and the time it is set again. That said, it's not
obvious to me what your problem actually is.
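
To make the account example concrete, here is the unlucky
interleaving written out by hand in a single thread, so it is
deterministic. The "thread A" and "thread B" labels are only
comments, and the starting balance is invented for illustration:

```python
# Hypothetical data, for illustration only.
account = {"alice": 0}
user = "alice"

# Thread A evaluates the right-hand side of its statement...
a_value = account[user] + 1

# ...thread B runs its whole statement in between...
account[user] = account[user] + 1

# ...and thread A stores its now-stale result.
account[user] = a_value

print(account[user])  # 1, not 2: thread B's increment is lost
```

Both threads "added one", but only one increment survives. That kind
of lost update is what a thread-safety bug usually looks like.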


