Thread safety issue (I think) with defaultdict

Steve D'Aprano steve+python at pearwood.info
Fri Nov 3 11:45:14 EDT 2017


On Sat, 4 Nov 2017 01:50 am, Chris Angelico wrote:

> On Fri, Nov 3, 2017 at 10:26 PM, Rhodri James <rhodri at kynesim.co.uk> wrote:

>> I'm with Steven.  To be fair, the danger with threads is that most people
>> don't understand thread-safety, and in particular don't understand either
>> that they have a responsibility to ensure that shared data access is done
>> properly or what the cost of that is.  I've seen far too much thread-based
>> code over the years that would have been markedly less buggy and not much
>> slower if it had been written sequentially.
> 
> Yes, but what you're seeing is that *concurrent* code is more
> complicated than *sequential* code. Would the code in question have
> been less buggy if it had used multiprocessing instead of
> multithreading? 

Maybe.

There's no way to be sure unless you actually compare a threading
implementation with a processing implementation -- and they have to
be "equally good, for the style" implementations. No fair comparing the
multiprocessing equivalent of "Stooge Sort" with the threading equivalent
of "Quick Sort", and concluding that threading is better.

However, we can predict which is likely to be less buggy by reasoning from
general principles. And the general principle is that shared data tends, all
else being equal, to lead to more bugs than no shared data. Roughly speaking,
the more data is shared, the more bugs.
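The classic illustration of shared data going wrong is an unsynchronized
counter. A minimal sketch (the function and its parameters are mine, not from
the thread) -- even under CPython's GIL, `counter += 1` is a read-modify-write
and is not atomic, so without the lock concurrent increments can interleave
and lose updates:

```python
import threading

def count_with_lock(n_threads=4, n_iters=50_000):
    # Shared mutable state: the single source of the bug class under
    # discussion. The lock serializes the read-modify-write; remove it
    # and the final count can come up short.
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(n_iters):
            with lock:
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock the result is always n_threads * n_iters; without it, sometimes
less, which is exactly the kind of intermittent bug that makes threaded code
hard to trust.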

I don't know if there are any hard scientific studies on this, but experience
and anecdote strongly suggest it is true. Programming is not yet a fully
evidence-based discipline.

For example, most of us accept "global variables considered harmful". With few
exceptions, the use of application-wide global variables to communicate
between functions is harmful and leads to problems. This isn't because of any
sort of mystical or magical malignity from global variables. It is because
the use of global variables adds coupling between otherwise distant parts of
the code, and that adds complexity, and the more complex code is, the more
likely we mere humans are to screw it up.
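To make that coupling concrete, a tiny sketch (the names here are invented
for illustration): a function that reads a global depends on every distant
piece of code that might mutate it, while a function that takes the value as
a parameter declares its dependency in its signature.

```python
# Coupled through a global: any code anywhere that rebinds `config`
# silently changes the behaviour of every function that reads it.
config = {"verbose": True}

def log(msg):
    if config["verbose"]:       # hidden dependency on distant code
        return f"LOG: {msg}"
    return ""

# Uncoupled version: the dependency is explicit and local.
def log_explicit(msg, verbose):
    if verbose:
        return f"LOG: {msg}"
    return ""
```

The second form is trivially testable in isolation; the first can only be
tested by arranging global state first.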

So, all else being equal, which is likely to have more bugs?


1. Multiprocessing code with very little coupling between processes; or

2. Threaded code with shared data and hence higher coupling between threads?


Obviously the *best* threaded code will have fewer bugs than the *worst*
multiprocessing code, but my heuristic is that, in general, the average
application using threading is likely to be more highly coupled, hence more
complicated, than the equivalent using multiprocessing.
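The heuristic is really about coupling rather than the mechanism: even
threaded code can be made loosely coupled by sharing nothing mutable and
communicating only through queues, multiprocessing-style. A minimal sketch
(names and structure are mine, not from the thread):

```python
import threading
import queue

def sum_of_squares(items, n_workers=4):
    # Workers share no mutable state; all communication goes through
    # the two queues, so there is nothing to lock.
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            item = tasks.get()
            if item is None:        # sentinel: no more work
                return
            results.put(item * item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)             # one sentinel per worker
    for t in threads:
        t.join()

    total = 0
    while not results.empty():
        total += results.get()
    return total
```

Structurally this is the same "little coupling between workers" shape as the
multiprocessing version; the bugs that remain are in the plumbing (sentinels,
joins), not in races over shared data.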

(Async is too new, and too confusing to me, for me to have an opinion on it
yet, but I lean slightly towards the position that deterministic
task-switching is probably better than non-deterministic.)



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
