What kind of "thread safe" are deques actually?

Jack Dangler tdldev at gmail.com
Wed Mar 29 14:14:48 EDT 2023


On 3/29/23 13:13, Chris Angelico wrote:
> On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdldev at gmail.com> wrote:
>>
>> On 3/29/23 02:08, Chris Angelico wrote:
>>> On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
>>> <python-list at python.org> wrote:
>>>> On 28/03/23 2:25 pm, Travis Griggs wrote:
>>>>> Interestingly, the error only started showing up when I switched to running statistics.mean() on one of these, instead of the statistics.median() I had been using. Apparently the kind of iteration done in mean() is more conflict-prone than the one in median()?
>>>> It may be a matter of whether the GIL is held or not. I had a look
>>>> at the source for deque, and it doesn't seem to explicitly do
>>>> anything about locking, it just relies on the GIL.
>>>>
>>>> So maybe statistics.median() is implemented in C and statistics.mean()
>>>> in Python, or something like that?
>>>>
>>> Both functions are implemented in Python, but median() starts out with
>>> this notable line:
>>>
>>>       data = sorted(data)
>>>
>>> which gives back a copy, iterated over rapidly in C. All subsequent
>>> work is done on that copy.
>>>
>>> I believe the same effect could be had with mean() by taking a
>>> snapshot using list(q) first (the source code for the sorted()
>>> function begins by calling PySequence_List).
>>>
>>> In any case, it makes *conceptual* sense to do your analysis on a copy
>>> of the queue, thus ensuring that your stats are stable. The other
>>> threads can keep going while you do your calculations, even if that
>>> means changing the queue.
>>>
>>> ChrisA
>> Sorry for any injected confusion here, but that line "data =
>> sorted(data)" appears as though it takes the value of the variable named
>> _data_, sorts it and returns it to the same variable store, so no copy
>> would be created. Am I missing something there?
> The variable name "data" is the parameter to median(), so it's
> whatever you ask for the median of. (I didn't make that obvious in my
> previous post - an excess of brevity on my part.)
>
> The sorted() function, UNlike list.sort(), returns a sorted copy of
> what it's given. I delved into the CPython source code for that, and
> it begins with the PySequence_List call to (effectively) call
> list(data) to get a copy of it. It ought to be a thread-safe copy due
> to holding the GIL the entire time. I'm not sure what would happen in
> a GIL-free world but most likely the lock on the input object would
> still ensure thread safety.
>
> ChrisA
Aah - thanks, Chris! That makes much more sense.
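
For the archives, here is a minimal sketch of the two points above: sorted() hands back a new list and leaves its argument alone, and taking a list() snapshot of a shared deque before calling statistics.mean() keeps the iteration away from the live container. The names snapshot_mean, writer and demo are made up for illustration, and the claim that the copy cannot be interrupted assumes CPython with the GIL, as discussed in the thread; it is not a guarantee for other implementations or GIL-free builds.

```python
import statistics
import threading
from collections import deque


def snapshot_mean(d):
    """Return the mean of a point-in-time copy of d, or None if it is empty.

    Assumption: on CPython with the GIL, list(d) copies the deque in one
    C-level call, so other Python threads do not get to mutate d mid-copy.
    """
    snap = list(d)  # the copy; all further work happens on this list
    return statistics.mean(snap) if snap else None


def writer(d, count):
    # Hypothetical producer thread: keeps appending to the shared deque.
    for n in range(count):
        d.append(n % 100)


def demo():
    q = deque(maxlen=1000)  # hypothetical shared buffer
    t = threading.Thread(target=writer, args=(q, 100_000))
    t.start()
    while t.is_alive():
        # Iterating q directly here (statistics.mean(q)) is the pattern that
        # can raise "RuntimeError: deque mutated during iteration".
        snapshot_mean(q)
    t.join()
    print("final mean:", snapshot_mean(q))


# sorted() versus rebinding: the name now refers to a new list, while the
# original deque object is left untouched.
original = deque([3, 1, 2])
data = original
data = sorted(data)
print(data, list(original))

if __name__ == "__main__":
    demo()
```

The snapshot costs one extra copy per calculation, which seems a fair price for stats that do not shift underneath you while the other threads keep appending.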

