From chris.jerdonek at gmail.com  Mon Feb 12 23:10:32 2018
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Mon, 12 Feb 2018 20:10:32 -0800
Subject: [Async-sig] avoiding accidentally calling blocking code
Message-ID: <CAOTb1wd=7X-kAae8viC4cHgNc4MU9H74orTLZszsfTZihJy-nQ@mail.gmail.com>

Hi,

This is a general Python async question I've had that I haven't seen a
discussion of. Do people know of any techniques other than manually
inspecting code line by line to find out if you're accidentally making
a raw call to a blocking function in a coroutine?

Related to this, again, outside of inspecting the external source
code, is there any way to know if a function you're calling (e.g. in
someone else's library or in the standard lib) has the potential of
blocking?

What would be the way of detecting something like this if it made its
way into "production"? What would be some of the symptoms?

--Chris

From dimaqq at gmail.com  Tue Feb 13 00:07:24 2018
From: dimaqq at gmail.com (Dima Tisnek)
Date: Tue, 13 Feb 2018 13:07:24 +0800
Subject: [Async-sig] avoiding accidentally calling blocking code
In-Reply-To: <CAOTb1wd=7X-kAae8viC4cHgNc4MU9H74orTLZszsfTZihJy-nQ@mail.gmail.com>
References: <CAOTb1wd=7X-kAae8viC4cHgNc4MU9H74orTLZszsfTZihJy-nQ@mail.gmail.com>
Message-ID: <CAGGBzX+9yGkAHqDEpt2t-vTsjgQB-_HuY_MYwOzx1eR6uMaSiQ@mail.gmail.com>

Let me try to answer the question behind the question.

Like any code validation, it's a healthy mix of:
* unit tests [perhaps with
mock.patch_all_known_blocking_calls(side_effect=Exception)]
* good judgement [open("foo").read() technically blocks, but only a
problem on network filesystems]
* monitoring prod [at least to ensure that tests correspond to prod]
* peer review
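
A minimal sketch of the unit-test idea above, assuming only the standard library. Note that mock.patch_all_known_blocking_calls() does not exist; the helper forbid_known_blocking_calls(), the BlockingCallError class and the (deliberately tiny) list of targets are hypothetical names made up for illustration:

```python
import asyncio
import contextlib
import time
from unittest import mock

# Hypothetical and far from complete: extend with whatever your code must avoid.
KNOWN_BLOCKING_CALLS = ["time.sleep", "socket.gethostbyname"]


class BlockingCallError(RuntimeError):
    """Raised when a patched blocking function is called."""


@contextlib.contextmanager
def forbid_known_blocking_calls(targets=KNOWN_BLOCKING_CALLS):
    # Replace every listed function with a mock that raises when called.
    # Only lookups through the module attribute are affected; code that did
    # `from time import sleep` before patching keeps the real function.
    with contextlib.ExitStack() as stack:
        for target in targets:
            stack.enter_context(
                mock.patch(target, side_effect=BlockingCallError(target))
            )
        yield


async def careless_coroutine():
    time.sleep(0.01)  # raw blocking call inside a coroutine


def check_is_caught():
    # Returns the name of the offending call, or None if nothing was caught.
    with forbid_known_blocking_calls():
        try:
            asyncio.run(careless_coroutine())
        except BlockingCallError as exc:
            return str(exc)
    return None
```

The weak spot is the list itself: anything not listed (or reached through C extensions) passes silently.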


Specific issues you've raised:

re: detecting blocking calls ahead of time.
That's a hard problem (insert quip about Turing completeness).
Perhaps it can be alleviated by function annotations, but that's
dangerously close to the Java way...

re: detecting blocking calls at run time.
Let's say there's a per-thread global "blocking lock" that's managed
by the event loop executor: it's in the "allowed" state when the
executor has no runnables and wishes to select/poll/epoll, and in the
"denied" state when user code is running.
Some calls can be caught (e.g. check fd fcntl bits before any recv/send/...).
Some calls can never be caught (anything via ctypes or extensions in general).
Case in point: what do you propose to do about socket.gethostbyname()?
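
A rough sketch of the "check fd fcntl bits" part only, assuming a Unix system. The names is_nonblocking() and checked_recv() are hypothetical, and the check only helps for calls that are actually routed through such a wrapper:

```python
import fcntl
import os
import socket


def is_nonblocking(fd):
    # Read the file status flags and test the O_NONBLOCK bit.
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return bool(flags & os.O_NONBLOCK)


def checked_recv(sock, nbytes):
    # Refuse to call recv() on a socket that could block the event loop.
    if not is_nonblocking(sock.fileno()):
        raise RuntimeError("recv() on a blocking socket")
    return sock.recv(nbytes)
```

This does nothing for calls such as socket.gethostbyname(), which have no file descriptor to inspect beforehand.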

re: after the fact.
hack1. run a separate thread/process that inspects the event loop
thread's `wchan` value (Linux)
hack2. same via taskstats
hack3. detect event loop lag (false positives are CPU-bound tasks and
an overloaded system, but I think you'd like to detect those too)
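
hack3 could be sketched roughly as follows: a task sleeps for a fixed interval and reports how much later than requested it was woken up. The names monitor_loop_lag() and demo(), and the interval/threshold values, are made up for illustration:

```python
import asyncio
import time


async def monitor_loop_lag(interval, threshold, report):
    # Sleep for `interval` seconds over and over; any overshoot beyond
    # `threshold` means something kept the loop from running on time.
    loop = asyncio.get_running_loop()
    while True:
        started = loop.time()
        await asyncio.sleep(interval)
        lag = loop.time() - started - interval
        if lag > threshold:
            report(lag)


async def demo():
    lags = []
    monitor = asyncio.create_task(monitor_loop_lag(0.05, 0.05, lags.append))
    await asyncio.sleep(0)    # let the monitor start its first sleep
    time.sleep(0.2)           # simulated accidental blocking call
    await asyncio.sleep(0.1)  # let the monitor wake up and report
    monitor.cancel()
    try:
        await monitor
    except asyncio.CancelledError:
        pass
    return lags
```

In production, `report` would feed a metric or a log line rather than a list.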


From andrew.svetlov at gmail.com  Tue Feb 13 02:24:36 2018
From: andrew.svetlov at gmail.com (Andrew Svetlov)
Date: Tue, 13 Feb 2018 07:24:36 +0000
Subject: [Async-sig] avoiding accidentally calling blocking code
In-Reply-To: <CAGGBzX+9yGkAHqDEpt2t-vTsjgQB-_HuY_MYwOzx1eR6uMaSiQ@mail.gmail.com>
References: <CAOTb1wd=7X-kAae8viC4cHgNc4MU9H74orTLZszsfTZihJy-nQ@mail.gmail.com>
 <CAGGBzX+9yGkAHqDEpt2t-vTsjgQB-_HuY_MYwOzx1eR6uMaSiQ@mail.gmail.com>
Message-ID: <CAL3CFcV27EwAy92SJ60cCPd_wJE2JKk9i-HgwgTk=u_cBe2Lrg@mail.gmail.com>

loop.slow_callback_duration + PYTHONASYNCIODEBUG=1
Does it work for you?
Do you need extra info?
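
For reference, a small sketch of how the two settings combine. Passing debug=True to asyncio.run() has the same effect on the loop as setting PYTHONASYNCIODEBUG=1; the ListHandler class and the 0.05 threshold are only illustrative choices:

```python
import asyncio
import logging
import time


class ListHandler(logging.Handler):
    """Collects formatted log messages so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


async def blocks_briefly():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # seconds; the default is 0.1
    time.sleep(0.2)                     # blocking call inside a coroutine


def run_with_debug():
    handler = ListHandler()
    logger = logging.getLogger("asyncio")
    logger.addHandler(handler)
    try:
        # In debug mode the loop logs a warning for every callback or
        # task step that runs longer than slow_callback_duration.
        asyncio.run(blocks_briefly(), debug=True)
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

The warning text looks like "Executing <Task ...> took 0.200 seconds" and is emitted on the "asyncio" logger.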

-- 
Thanks,
Andrew Svetlov

From chris.jerdonek at gmail.com  Wed Feb 14 03:42:20 2018
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Wed, 14 Feb 2018 00:42:20 -0800
Subject: [Async-sig] avoiding accidentally calling blocking code
In-Reply-To: <CAL3CFcV27EwAy92SJ60cCPd_wJE2JKk9i-HgwgTk=u_cBe2Lrg@mail.gmail.com>
References: <CAOTb1wd=7X-kAae8viC4cHgNc4MU9H74orTLZszsfTZihJy-nQ@mail.gmail.com>
 <CAGGBzX+9yGkAHqDEpt2t-vTsjgQB-_HuY_MYwOzx1eR6uMaSiQ@mail.gmail.com>
 <CAL3CFcV27EwAy92SJ60cCPd_wJE2JKk9i-HgwgTk=u_cBe2Lrg@mail.gmail.com>
Message-ID: <CAOTb1wevtMH3DFgtH2A22ujU+zw=hWPNJ=w3xs8=W-x0SMKkbA@mail.gmail.com>

Thanks, Dima and Andrew, for your suggestions.

Re: loop.slow_callback_duration, that's interesting. I didn't know
about that. However, it doesn't seem like that alone would work to
catch many operations that are normally fast, but strictly speaking
blocking. For example, it seems like simple disk I/O operations
wouldn't be caught even with slow_callback_duration set to a small
value. Dima suggests that such calls are okay. Is there a consensus on
that?

Dima's mock.patch_all_known_blocking_calls() is an interesting idea
and seems like it would work for the case I mentioned. Has anyone
started writing such a method (e.g. for certain standard lib modules)?

--Chris



From andrew.svetlov at gmail.com  Wed Feb 14 04:41:15 2018
From: andrew.svetlov at gmail.com (Andrew Svetlov)
Date: Wed, 14 Feb 2018 09:41:15 +0000
Subject: [Async-sig] avoiding accidentally calling blocking code
In-Reply-To: <CAOTb1wevtMH3DFgtH2A22ujU+zw=hWPNJ=w3xs8=W-x0SMKkbA@mail.gmail.com>
References: <CAOTb1wd=7X-kAae8viC4cHgNc4MU9H74orTLZszsfTZihJy-nQ@mail.gmail.com>
 <CAGGBzX+9yGkAHqDEpt2t-vTsjgQB-_HuY_MYwOzx1eR6uMaSiQ@mail.gmail.com>
 <CAL3CFcV27EwAy92SJ60cCPd_wJE2JKk9i-HgwgTk=u_cBe2Lrg@mail.gmail.com>
 <CAOTb1wevtMH3DFgtH2A22ujU+zw=hWPNJ=w3xs8=W-x0SMKkbA@mail.gmail.com>
Message-ID: <CAL3CFcWDQO9rgD9Y6YF4iHFeK8b6PZ786m2ygV5PmXWsb-ohAA@mail.gmail.com>

Mocking everything sounds promising, but the problem is in the word *everything*.
slow_callback_duration is a viable compromise.
Personally, I don't care about fast blocking IO as long as it is really very fast.

-- 
Thanks,
Andrew Svetlov

From njs at pobox.com  Wed Feb 14 05:18:33 2018
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 14 Feb 2018 02:18:33 -0800
Subject: [Async-sig] avoiding accidentally calling blocking code
In-Reply-To: <CAOTb1wevtMH3DFgtH2A22ujU+zw=hWPNJ=w3xs8=W-x0SMKkbA@mail.gmail.com>
References: <CAOTb1wd=7X-kAae8viC4cHgNc4MU9H74orTLZszsfTZihJy-nQ@mail.gmail.com>
 <CAGGBzX+9yGkAHqDEpt2t-vTsjgQB-_HuY_MYwOzx1eR6uMaSiQ@mail.gmail.com>
 <CAL3CFcV27EwAy92SJ60cCPd_wJE2JKk9i-HgwgTk=u_cBe2Lrg@mail.gmail.com>
 <CAOTb1wevtMH3DFgtH2A22ujU+zw=hWPNJ=w3xs8=W-x0SMKkbA@mail.gmail.com>
Message-ID: <CAPJVwBk+U+jx6wmKsBdHqHird0=q8eLWM=xcHen1L3h6=NPHHQ@mail.gmail.com>

On Wed, Feb 14, 2018 at 12:42 AM, Chris Jerdonek
<chris.jerdonek at gmail.com> wrote:
> Thanks, Dima and Andrew, for your suggestions.
>
> Re: loop.slow_callback_duration, that's interesting. I didn't know
> about that. However, it doesn't seem like that alone would work to
> catch many operations that are normally fast, but strictly speaking
> blocking. For example, it seems like simple disk I/O operations
> wouldn't be caught even with slow_callback_duration set to a small
> value. Dima suggests that such calls are okay. Is there a consensus on
> that?

It's not generally possible to avoid occasional arbitrary blocking,
e.g. due to the GC running, the OS scheduler, page faults, etc.
Basically, blocking becomes a problem when it leaves other
tasks stuck waiting when they could be getting useful work done.
If callbacks are finishing quickly then this isn't happening, so
slow_callback_duration is checking for exactly the right thing.

Where it might fall down is for operations that are only occasionally
slow, so they slip past your testing. E.g. if your disk is fast when
testing on your developer machine, but then in production you run on
some high-occupancy cloud host and a noisy neighbor starts pounding
the disk and suddenly your disk latencies shoot up.

> Dima's mock.patch_all_known_blocking_calls() is an interesting idea
> and seems like it would work for the case I mentioned. Has anyone
> started writing such a method (e.g. for certain standard lib modules)?

The implementation of gevent.monkey.patch_all() is probably not too
far from what you want.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org