Thread safety issue (I think) with defaultdict

Israel Brewster israel at ravnalaska.net
Fri Nov 3 14:12:39 EDT 2017


-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------




> On Nov 3, 2017, at 7:11 AM, Rhodri James <rhodri at kynesim.co.uk> wrote:
> 
> On 03/11/17 14:50, Chris Angelico wrote:
>> On Fri, Nov 3, 2017 at 10:26 PM, Rhodri James <rhodri at kynesim.co.uk> wrote:
>>> On 02/11/17 20:24, Chris Angelico wrote:
>>>> 
>>>> Thank you. I've had this argument with many people, smart people (like
>>>> Steven), people who haven't grokked that all concurrency has costs -
>>>> that threads aren't magically more dangerous than other options.
>>> 
>>> 
>>> I'm with Steven.  To be fair, the danger with threads is that most people
>>> don't understand thread-safety, and in particular don't understand either
>>> that they have a responsibility to ensure that shared data access is done
>>> properly or what the cost of that is.  I've seen far too much thread-based
>>> code over the years that would have been markedly less buggy and not much
>>> slower if it had been written sequentially.
>> Yes, but what you're seeing is that *concurrent* code is more
>> complicated than *sequential* code. Would the code in question have
>> been less buggy if it had used multiprocessing instead of
>> multithreading? What if it used explicit yield points?
> 
> My experience with situations where I can do a reasonable comparison is limited, but the answer appears to be "Yes".
> Multiprocessing
>> brings with it a whole lot of extra complications around moving data
>> around.
> 
> People generally understand how to move data around, and the mistakes are usually pretty obvious when they happen.  

I think the existence of this thread indicates otherwise :-) This mistake was far from obvious, and clearly I didn't understand properly how to move data around *between processes*. Unless you are just saying I am ignorant or something? :-)

> People may not understand how to move data around efficiently, but that's a separate argument.
> 
> Multithreading brings with it a whole lot of extra
>> complications around NOT moving data around.
> 
> I think this involves more subtle bugs that are harder to spot.  

Again, the existence of this thread indicates otherwise. This bug was quite subtle and hard to spot. It was only when I started looking at how many times a given piece of code was called (specifically, the part that handled data coming in for which there wasn't an entry in the dictionary) that I spotted the problem. If I hadn't had logging in place in that code block, I would have never realized it wasn't working as intended. You don't get much more subtle than that. And, furthermore, it only existed because I *wasn't* using threads. This bug simply doesn't exist in a threaded model, only in a multiprocessing model. Yes, the *explanation* of the bug is simple enough - each process "sees" a different value, since memory isn't shared - but the bug in my code was neither obvious nor easy to spot, at least until you knew what was happening.
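
For readers following along, here is a minimal sketch of the failure mode described above. It is not the actual code from this thread: the names `counts`, `record` and `run_demo` are made up for illustration, and the "fork" start method is assumed (Unix only). The child process updates its own copy of the defaultdict, and the parent's copy never sees the change.

```python
import multiprocessing
from collections import defaultdict

# Hypothetical module-level state, standing in for the shared dictionary.
counts = defaultdict(int)


def record(key, queue):
    # Runs in the child process: this only touches the child's copy of counts.
    counts[key] += 1
    # Send the child's view back so the parent can compare it with its own.
    queue.put(dict(counts))


def run_demo():
    # Assumption: the "fork" start method is available (Unix only). Other
    # start methods also give each process its own memory, but they need the
    # usual `if __name__ == "__main__":` guard around process creation.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    child = ctx.Process(target=record, args=("flight", queue))
    child.start()
    child_view = queue.get()   # read before join() so the child can exit
    child.join()
    parent_view = dict(counts)  # the parent's dictionary was never modified
    return child_view, parent_view


if __name__ == "__main__":
    child_view, parent_view = run_demo()
    print("child sees: ", child_view)
    print("parent sees:", parent_view)
```

No exception is raised and nothing looks wrong in either process on its own, which is why only something like a call count or a log line gives the problem away.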

> People seem to find it harder to reason about atomicity and realising that widely separated pieces of code may interact unexpectedly.
> 
> Yield points bring with
>> them the risk of locking another thread out unexpectedly (particularly
>> since certain system calls aren't async-friendly on certain OSes).
> 
> I've got to admit I find coroutines straightforward, but I did cut my teeth on a cooperative OS.  It certainly makes the atomicity issues easier to deal with.

I still can't claim to understand them. Threads? No problem. Obviously I'm still lacking some understanding of how data works in the multiprocessing model, however.

> 
> All
>> three models have their pitfalls.
> 
> Assuredly.  I just think threads are soggier and hard to light^W^W^W^W^W prone to subtler and more mysterious-looking bugs.

And yet, this thread exists because of a subtle and mysterious-looking bug with multiple *processes* that doesn't exist with multiple *threads*. Thus the point - threads are no *worse* - just different - than any other concurrency model.
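
For contrast, a sketch of the same update done with threads, again with made-up names (`record`, `run_demo`) rather than anything from the real code. All threads live in one process and therefore mutate the same defaultdict, so the update is visible everywhere; the cost is that the read-modify-write has to be protected with a lock.

```python
import threading
from collections import defaultdict


def record(counts, lock, key):
    # All threads share one address space, so this is the same dictionary
    # object in every thread. The lock makes the read-modify-write atomic.
    with lock:
        counts[key] += 1


def run_demo(workers=4):
    counts = defaultdict(int)
    lock = threading.Lock()
    threads = [
        threading.Thread(target=record, args=(counts, lock, "flight"))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Every increment landed in the one shared dictionary.
    return dict(counts)


if __name__ == "__main__":
    print(run_demo())
```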

> 
> -- 
> Rhodri James *-* Kynesim Ltd
> -- 
> https://mail.python.org/mailman/listinfo/python-list


