What kind of "thread safe" are deque's actually?

Chris Angelico rosuav at gmail.com
Wed Mar 29 13:13:12 EDT 2023


On Thu, 30 Mar 2023 at 01:52, Jack Dangler <tdldev at gmail.com> wrote:
>
>
> On 3/29/23 02:08, Chris Angelico wrote:
> > On Wed, 29 Mar 2023 at 16:56, Greg Ewing via Python-list
> > <python-list at python.org> wrote:
> >> On 28/03/23 2:25 pm, Travis Griggs wrote:
> >>> Interestingly the error also only started showing up when I switched from running a statistics.mean() on one of these, instead of what I had been using, a statistics.median(). Apparently the kind of iteration done in a mean, is more conflict prone than a median?
> >> It may be a matter of whether the GIL is held or not. I had a look
> >> at the source for deque, and it doesn't seem to explicitly do
> >> anything about locking, it just relies on the GIL.
> >>
> >> So maybe statistics.median() is implemented in C and statistics.mean()
> >> in Python, or something like that?
> >>
> > Both functions are implemented in Python, but median() starts out with
> > this notable line:
> >
> >      data = sorted(data)
> >
> > which gives back a copy, iterated over rapidly in C. All subsequent
> > work is done on that copy.
> >
> > The same effect could, I believe, be had with mean() by taking a
> > snapshot using list(q) first (the source code for the sorted()
> > function begins by calling PySequence_List).
> >
> > In any case, it makes *conceptual* sense to do your analysis on a copy
> > of the queue, thus ensuring that your stats are stable. The other
> > threads can keep going while you do your calculations, even if that
> > means changing the queue.
> >
> > ChrisA
> Sorry for any injected confusion here, but that line "data =
> sorted(data)" appears as though it takes the value of the variable named
> _data_, sorts it and returns it to the same variable store, so no copy
> would be created. Am I missing something there?

The variable name "data" is the parameter to median(), so it's
whatever you ask for the median of. (I didn't make that obvious in my
previous post - an excess of brevity on my part.)
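
To make the rebinding concrete, here is a minimal sketch. The function
median_like() is a hypothetical stand-in written for this example, not the
actual statistics.median() source; it only mimics the opening
"data = sorted(data)" step. The caller's deque is left untouched because
only the local name is rebound to a new list.

```python
from collections import deque


def median_like(data):
    # Hypothetical stand-in for statistics.median(): the local name
    # "data" is rebound to a brand-new sorted list. The object the
    # caller passed in is not modified.
    data = sorted(data)
    n = len(data)
    mid = n // 2
    if n % 2:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


q = deque([3, 1, 2])
result = median_like(q)

# The caller's object is still a deque, still in its original order.
unchanged = list(q)
```

After this runs, q is still the same deque holding 3, 1, 2 in that order;
all the sorting happened on the copy inside the function.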

The sorted() function, UNlike list.sort(), returns a sorted copy of
what it's given. I delved into the CPython source code for that, and
it begins by calling PySequence_List, which is (effectively) list(data),
to get a copy of it. It ought to be a thread-safe copy, since the GIL
is held the entire time. I'm not sure what would happen in a GIL-free
world, but most likely the lock on the input object would still ensure
thread safety.
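
A rough sketch of the difference, and of the snapshot idea applied to
mean(). This is single-threaded on purpose so it behaves the same every
time: the append inside the loop is only a stand-in for another thread
writing to the deque, and the variable names are made up for the example.

```python
import statistics
from collections import deque

# sorted() builds a new list; list.sort() sorts in place and returns None.
original = [3, 1, 2]
sorted_copy = sorted(original)
same_object = sorted_copy is original      # False: it is a copy
sort_return = original.sort()              # None: sorted in place

# Changing a deque while iterating over it in Python code raises
# RuntimeError. The append below stands in for another thread's write.
busy = deque([10, 20, 30])
caught = None
try:
    for item in busy:
        busy.append(item)
except RuntimeError as exc:
    caught = exc

# Taking a snapshot first means the stats are computed on a stable copy,
# whatever happens to the deque afterwards.
q = deque([1, 2, 3, 4])
snapshot = list(q)
q.append(100)                              # later write, not in the snapshot
mean_of_snapshot = statistics.mean(snapshot)
```

Whether list(q) stays safe against a concurrent writer depends on the
copy happening without the GIL being released, as described above, so
treat this as an illustration of the idea rather than a guarantee.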

ChrisA

