From chris.jerdonek at gmail.com Wed Aug 2 19:30:06 2017 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Wed, 2 Aug 2017 16:30:06 -0700 Subject: [Async-sig] question re: loop.shutdown_asyncgens() In-Reply-To: References: <37549fdf-0c2d-482a-8ebb-317600e66117@Spark> <1aeafffe-d40c-4007-9a08-15d0cdadb2ce@Spark> Message-ID: On Fri, Jul 28, 2017 at 1:58 PM, Yury Selivanov wrote: > On Jul 28, 2017, 4:57 PM -0400, Chris Jerdonek , wrote: >> Thanks! I'll try to find time to propose a PR. >> >> Also, for suggestions around the new API, would you prefer that be posted to PR #465, or can it be done here? > > I think we can discuss it here, but up to you. > So here are some of my thoughts related to exposing an API for asyncio.run() and friends: I think it would be helpful if, in addition to asyncio.run(), the API made some attempt to expose building blocks (where natural) so that if the user wants to do something slightly different from what run() supports, they don't need to copy from the internal implementation. I'm including one suggestion for this below. Related to this, I'd like to ask that the following two use cases be contemplated / relatively easy to support through the API. It doesn't need to be through the top-level functions like run(), but maybe by using the building blocks: 1) creating loops "on-the-fly" in different threads, like I asked about in this thread: https://mail.python.org/pipermail/async-sig/2017-July/000348.html The PR #465 discussion currently seems to be leaning away from supporting this in any way. 2) creating fresh loops for individual unittest TestCase methods / subTests, without mandating a specific approach. Different approaches that I think should be supportable through the API include user-implemented TestCase method decorators that create and destroy a loop using the API, as well as doing this through TestCase's setUp() and tearDown(). One "building block" that I think should be exposed is a context manager for managing the lifetime of a loop. It could look something like this: @contextmanager def new_loop(debug=False): loop = events.new_event_loop() try: events.set_event_loop(loop) if debug: loop.set_debug(True) yield loop finally: _cleanup(loop) Both run() and run_forever() in the latest patch posted on PR #465 can be implemented in terms of this context manager without changing the logic: https://github.com/python/asyncio/pull/465/commits/275072a2fbae0c98619597536d85a65bd72d706b Even if it was decided not to make this context manager public, I think the implementations of run() / run_forever() / etc. would benefit by making the commonalities between them clearer, etc. Thanks, --Chris From chris.jerdonek at gmail.com Sun Aug 6 00:41:42 2017 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Sat, 5 Aug 2017 21:41:42 -0700 Subject: [Async-sig] pattern for handling interrupt signals in asyncio Message-ID: I want to share a pattern I came up with for handling interrupt signals in asyncio to see if you had any feedback (ways to make it easier, similar approaches, etc). I wanted something that was easy to check and reason about. I'm already familiar with some of the pitfalls in handling signals, for example as described in Nathaniel's Control-C blog post announced here: https://mail.python.org/pipermail/async-sig/2017-April/thread.html The basic idea is to create a Future to run alongside the main coroutine whose only purpose is to "catch" the signal. 
And then call-- asyncio.wait(futures, return_when=asyncio.FIRST_COMPLETED) When a signal is received, both tasks stop, and then you have access to the main task (which will be pending) for things like cleanup and inspection. One advantage of this approach is that it lets you put all your cleanup logic in the main program instead of putting some of it in the signal handler. You also don't need to worry about things like handling KeyboardInterrupt at arbitrary points in your code. I'm including the code at bottom. On the topic of asyncio.run() that I mentioned in an earlier email [1], it doesn't look like the run() API posted in PR #465 [2] has hooks to support what I'm describing (but I could be wrong). So maybe this is another use case that the future API should contemplate. --Chris [1] https://mail.python.org/pipermail/async-sig/2017-August/000373.html [2] https://github.com/python/asyncio/pull/465 import asyncio import io import signal def _cleanup(loop): try: loop.run_until_complete(loop.shutdown_asyncgens()) finally: loop.close() def handle_sigint(future): future.set_result(signal.SIGINT) async def run(): print('running...') await asyncio.sleep(1000000) def get_message(sig, task): stream = io.StringIO() task.print_stack(file=stream) traceback = stream.getvalue() return f'interrupted by {sig.name}:\n{traceback}' def main(coro): loop = asyncio.new_event_loop() try: # This is made truthy if the loop is interrupted by a signal. interrupted = [] future = asyncio.Future(loop=loop) future.add_done_callback(lambda future: interrupted.append(1)) loop.add_signal_handler(signal.SIGINT, handle_sigint, future) futures = [future, coro] future = asyncio.wait(futures, return_when=asyncio.FIRST_COMPLETED) done, pending = loop.run_until_complete(future) if interrupted: # Do whatever cleanup you want here and/or get the stacktrace # of the interrupted main task. sig = done.pop().result() task = pending.pop() msg = get_message(sig, task) task.cancel() raise KeyboardInterrupt(msg) finally: _cleanup(loop) main(run()) Below is what the code above outputs if you run it and then press Control-C: running... ^CTraceback (most recent call last): File "test-signal.py", line 54, in main(run()) File "test-signal.py", line 49, in main raise KeyboardInterrupt(msg) KeyboardInterrupt: interrupted by SIGINT: Stack for wait_for=()]>> (most recent call last): File "test-signal.py", line 17, in run await asyncio.sleep(1000000) From chris.jerdonek at gmail.com Sun Aug 6 15:04:59 2017 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Sun, 6 Aug 2017 12:04:59 -0700 Subject: [Async-sig] pattern for handling interrupt signals in asyncio In-Reply-To: References: Message-ID: On Sat, Aug 5, 2017 at 9:41 PM, Chris Jerdonek wrote: > I want to share a pattern I came up with for handling interrupt > signals in asyncio to see if you had any feedback (ways to make it > easier, similar approaches, etc). > Just after sending this email, I learned that approaches like this can't work in general since they can't interrupt a "tight loop," and that changes to asyncio are needed. There are some discussions about this on GitHub: The signal solution is indeed nicer, because it ensures that interrupts are > treated as regular asyncio events, but it means you can't interrupt code > that's stuck in a tight CPU loop (e.g. while True: pass), and it requires > more sophistication from users. (from: https://github.com/python/asyncio/pull/305#issuecomment-168541045 ) This is a big no-no. 
In the first version of uvloop I did exactly this -- > handle SIGINT and let the loop to handle it asynchronously. It was > completely unusable. Turns out people write tight loops quite frequently, > and inability to stop your Python program with Ctrl-C is something they > aren't prepared to handle at all. (from: https://github.com/python/asyncio/issues/341#issuecomment-236443331 ) The current open issue is here: https://github.com/python/asyncio/issues/341 --Chris > I wanted something that was easy to check and reason about. I'm > already familiar with some of the pitfalls in handling signals, for > example as described in Nathaniel's Control-C blog post announced > here: > https://mail.python.org/pipermail/async-sig/2017-April/thread.html > > The basic idea is to create a Future to run alongside the main > coroutine whose only purpose is to "catch" the signal. And then call-- > > asyncio.wait(futures, return_when=asyncio.FIRST_COMPLETED) > > When a signal is received, both tasks stop, and then you have access > to the main task (which will be pending) for things like cleanup and > inspection. > > One advantage of this approach is that it lets you put all your > cleanup logic in the main program instead of putting some of it in the > signal handler. You also don't need to worry about things like > handling KeyboardInterrupt at arbitrary points in your code. > > I'm including the code at bottom. > > On the topic of asyncio.run() that I mentioned in an earlier email > [1], it doesn't look like the run() API posted in PR #465 [2] has > hooks to support what I'm describing (but I could be wrong). So maybe > this is another use case that the future API should contemplate. > > --Chris > > [1] https://mail.python.org/pipermail/async-sig/2017-August/000373.html > [2] https://github.com/python/asyncio/pull/465 > > > import asyncio > import io > import signal > > def _cleanup(loop): > try: > loop.run_until_complete(loop.shutdown_asyncgens()) > finally: > loop.close() > > def handle_sigint(future): > future.set_result(signal.SIGINT) > > async def run(): > print('running...') > await asyncio.sleep(1000000) > > def get_message(sig, task): > stream = io.StringIO() > task.print_stack(file=stream) > traceback = stream.getvalue() > return f'interrupted by {sig.name}:\n{traceback}' > > def main(coro): > loop = asyncio.new_event_loop() > > try: > # This is made truthy if the loop is interrupted by a signal. > interrupted = [] > > future = asyncio.Future(loop=loop) > future.add_done_callback(lambda future: interrupted.append(1)) > > loop.add_signal_handler(signal.SIGINT, handle_sigint, future) > > futures = [future, coro] > future = asyncio.wait(futures, return_when=asyncio.FIRST_ > COMPLETED) > done, pending = loop.run_until_complete(future) > > if interrupted: > # Do whatever cleanup you want here and/or get the stacktrace > # of the interrupted main task. > sig = done.pop().result() > task = pending.pop() > msg = get_message(sig, task) > > task.cancel() > raise KeyboardInterrupt(msg) > > finally: > _cleanup(loop) > > main(run()) > > > Below is what the code above outputs if you run it and then press > Control-C: > > running... 
> ^CTraceback (most recent call last):
>   File "test-signal.py", line 54, in
>     main(run())
>   File "test-signal.py", line 49, in main
>     raise KeyboardInterrupt(msg)
> KeyboardInterrupt: interrupted by SIGINT:
> Stack for
> wait_for= 0x10fe9b9a8>()]>> (most recent call last):
>   File "test-signal.py", line 17, in run
>     await asyncio.sleep(1000000)

From pfreixes at gmail.com  Sun Aug 6 18:57:07 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Mon, 7 Aug 2017 00:57:07 +0200
Subject: [Async-sig] Feedback, loop.load() function
Message-ID:

Hi guys,

I would appreciate any feedback about the idea of implementing a new
load function to ask about how saturated your reactor is.

I have a proof of concept [1] of how the load function might be
implemented in the asyncio Python loop.

The idea is to provide a method that can be used to ask about the load
of the reactor at a specific time. This implementation returns the load
taking into account the last 60 seconds, but it could easily return the
5-minute and 15-minute ones, among others.

This method can help services built on top of asyncio to implement
back-pressure mechanisms that take into account a metric coming from
the loop, instead of inferring the load using other metrics provided by
external agents such as CPU usage, load average, and others.

Alternatives exist nowadays for other languages that address this
situation using the lag of a scheduled callback, produced by saturated
reactors. The best-known implementation is toobusy [2], a Node.js
implementation.

IMHO the solution provided by toobusy has a strong dependency on the
hardware, needing to tune the maximum allowed lag in terms of
milliseconds [3]. In the POC presented, the user can use an exact value
meaning the percentage of the load, e.g. 0.9.

Any comment would be appreciated.

[1] https://github.com/pfreixes/cpython/commit/5fef3cae043abd62165ce40b181286e18f5fb19c
[2] https://www.npmjs.com/package/toobusy
[3] https://www.npmjs.com/package/toobusy#tunable-parameters
--
--pau
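As a concrete illustration of the scheduler-lag approach mentioned above -- this sketch is not part of the proof of concept, and all names in it are made up -- a toobusy-style monitor for an asyncio loop can periodically schedule a callback and record how late the loop actually runs it:

import asyncio

CHECK_INTERVAL = 0.05  # seconds between scheduled ticks
MAX_LAG = 0.07         # tunable lag threshold, as in toobusy

class LagMonitor:
    """Schedules a periodic callback and records how late it fires."""

    def __init__(self, loop):
        self._loop = loop
        self.lag = 0.0
        self._schedule()

    def _schedule(self):
        self._deadline = self._loop.time() + CHECK_INTERVAL
        self._loop.call_later(CHECK_INTERVAL, self._tick)

    def _tick(self):
        # On a saturated loop the callback runs well past its deadline.
        self.lag = max(0.0, self._loop.time() - self._deadline)
        self._schedule()

    def too_busy(self):
        return self.lag > MAX_LAG

A request handler would then check too_busy() before accepting new work. The loop.load() proof of concept replaces the hardware-dependent MAX_LAG threshold with a 0..1 load percentage.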
From chris.jerdonek at gmail.com  Fri Aug 11 05:41:20 2017
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Fri, 11 Aug 2017 02:41:20 -0700
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

On Sun, Aug 6, 2017 at 3:57 PM, Pau Freixes wrote:
> Hi guys,
>
> I would appreciate any feedback about the idea of implementing a new
> load function to ask about how saturated your reactor is.

Hi,

Would it be possible for you to rephrase what you've done in terms of asyncio terminology? From what I can tell, "reactor" isn't a term used in the asyncio docs or code base.

It might also improve the readability of your asyncio patch to use asyncio terminology in the code comments, doc strings, etc.

--Chris

From njs at pobox.com  Fri Aug 11 14:04:43 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 11 Aug 2017 11:04:43 -0700
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

It looks like your "load average" is computing something very different than the traditional Unix "load average". If I'm reading right, yours is a measure of what percentage of the time the loop spent sleeping waiting for I/O, taken over the last 60 ticks of a 1 second timer (so generally slightly longer than 60 seconds). The traditional Unix load average is an exponentially weighted moving average of the length of the run queue.

Is one of those definitions better for your goal of detecting when to shed load? I don't know. But calling them the same thing is pretty confusing :-). The Unix version also has the nice property that it can actually go above 1; yours doesn't distinguish between a service whose load is at exactly 100% of capacity and barely keeping up, versus one that's at 200% of capacity and melting down. But for load shedding maybe you always want your tripwire to be below that anyway.

More broadly we might ask what's the best possible metric for this purpose -- how do we judge? A nice thing about the JavaScript library you mention is that scheduling delay is a real thing that directly impacts quality of service -- it's more of an "end to end" measure in a sense. Of course, if you really want an end to end measure you can do things like instrument your actual logic, see how fast you're replying to HTTP requests or whatever, which is even more valid but creates complications because some requests are supposed to take longer than others, etc. I don't know which design goals are important for real operations.
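For reference, the exponentially weighted moving average that the Unix load average uses can be written in a few lines. This sketch is illustrative only and is not code from either implementation:

import math

class LoadAverage:
    """Unix-style load average: an EWMA of the run-queue length."""

    def __init__(self, window=60.0, interval=5.0):
        # One sample every `interval` seconds, decayed over `window`.
        self._decay = math.exp(-interval / window)
        self.value = 0.0

    def sample(self, runnable):
        # Blend the current run-queue length into the running average.
        self.value = self.value * self._decay + runnable * (1.0 - self._decay)

Because the run-queue length can exceed the number of CPUs, the resulting value can go above 1, which is the property Nathaniel points out above.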
From manu.mirandad at gmail.com  Sat Aug 12 16:24:16 2017
From: manu.mirandad at gmail.com (manuel miranda)
Date: Sat, 12 Aug 2017 20:24:16 +0000
Subject: [Async-sig] Async-sig Digest, Vol 14, Issue 5
In-Reply-To:
References:
Message-ID:

I have one comment regarding what Nathaniel Smith said:

"""
Of course, if you really want an end to end measure you can do things like instrument your actual logic, see how fast you're replying to HTTP requests or whatever, which is even more valid but creates complications because some requests are supposed to take longer than others, etc.
"""

You can't always use HTTP requests or other metrics to measure how busy your worker is. Some examples of invalid metrics:

- Your service can depend on external services that may be the ones making you slow. In this case, scaling up because of a spike in your HTTP response time doesn't help; it's a waste of resources.
- You can't use metrics like CPU, memory, etc. Of course, they may mean something, but how do you know if it's your worker that is using the CPU or any other random process -- triggered because an admin connected manually, Redis is also on the same machine, and a long list of "please don't do that in PROD" items :)

So, IMO, how busy the loop is (I don't know what the correct metric name is here) is specific to that worker, and it will tell you that your service is dying because it is receiving too many asyncio.Tasks to be served within what you consider a normal time window. For example, if you have an API where more than 1 second of response time is not acceptable, and that loop metric is stable above 2 seconds, you know you have to do something (scale up, improve something, etc.).

From pfreixes at gmail.com  Sun Aug 13 06:36:36 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Sun, 13 Aug 2017 12:36:36 +0200
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

> Would it be possible for you to rephrase what you've done in terms of
> asyncio terminology? From what I can tell, "reactor" isn't a term used
> in the asyncio docs or code base.

s/reactor/loop. In any case, I will make sure that all of my comments are consistent with the current Python documentation.

> It might also improve the
> readability of your asyncio patch to use asyncio terminology in the
> code comments, doc strings, etc.

It is just a POC to gather some feedback, but I appreciate your comment and will take it into account for new code.

--
--pau

From pfreixes at gmail.com  Sun Aug 13 06:54:11 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Sun, 13 Aug 2017 12:54:11 +0200
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

> It looks like your "load average" is computing something very different than
> the traditional Unix "load average". If I'm reading right, yours is a
> measure of what percentage of the time the loop spent sleeping waiting for
> I/O, taken over the last 60 ticks of a 1 second timer (so generally slightly
> longer than 60 seconds). The traditional Unix load average is an
> exponentially weighted moving average of the length of the run queue.

The implementation proposed wants to expose the load of the loop, having a direct metric that comes from the loop instead of using an external metric such as CPU usage, load average, and others.

Yes, the load average uses a decay function based on the length of the run queue for those processes that are using or waiting for a CPU; this gives us extra information about how overloaded our system is if we compare it with the CPU load.

In the case presented, the load of the loop is roughly equivalent to the load of the CPU, and it does not have the ability to inform you about how overloaded your loop is once it has reached 100%.

> Is one of those definitions better for your goal of detecting when to shed
> load? I don't know. But calling them the same thing is pretty confusing :-).
> The Unix version also has the nice property that it can actually go above 1;
> yours doesn't distinguish between a service whose load is at exactly 100% of
> capacity and barely keeping up, versus one that's at 200% of capacity and
> melting down. But for load shedding maybe you always want your tripwire to
> be below that anyway.

Well, I partially disagree with this. The load definition has its equivalent in computing in other metrics with a closed range, such as the CPU one. I never intended to align the load of the loop with the load average; I just used the concept as an example of a metric that might be used to check how loaded your system is.

> More broadly we might ask what's the best possible metric for this purpose --
> how do we judge? A nice thing about the JavaScript library you mention is
> that scheduling delay is a real thing that directly impacts the quality of
> service -- it's more of an "end to end" measure in a sense. Of course, if you
> really want an end to end measure you can do things like instrument your
> actual logic, see how fast you're replying to HTTP requests or whatever,
> which is even more valid but creates complications because some requests are
> supposed to take longer than others, etc. I don't know which design goals
> are important for real operations.

Here is the key for me, something I should have based my rationale on: how good is the presented way of measuring the load of your asynchronous system compared with the toobusy one? What can we achieve with this metric?

I will work on that as the base of my rationale for the proposed change. Then, once the rationale is accepted, the implementation is peanuts :)

--
--pau

From pfreixes at gmail.com  Sun Aug 20 19:27:32 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Mon, 21 Aug 2017 01:27:32 +0200
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

Hi,

I have a second implementation of the load function [1]; the previous one was too naive. The main changes compared with the previous one are:

1) It uses a decay function instead of a vector of samples.
2) The load stands for the percentage of CPU used, giving a global view of how many CPU resources are still unused.
3) The update is done at each _LOAD_FREQ - default 1 second - without using a scheduler callback.
4) Many corner cases are fixed.

The performance impact introduced in the default loop implemented in Python is approx. 3% when running a trivial program that has no application overhead [2]. Therefore, in real applications with at least some footprint introduced by the application, this performance impact should be negligible.

As an example of how the load method can be used, the following code [3] runs the loop using different ratios of coroutines per second, where each coroutine has a CPU cost of 0.02 seconds, giving an expected maximum throughput of 50 coroutines per second. In this example, each coroutine first asks for the load of the system before starting to consume CPU; if the load is higher than 0.9, the coroutine returns without doing anything.

The following snippet shows the execution output:

Load reached for 10.0 coros/seq: 0.20594804872227002, abandoned 0/100
reseting load....
Load reached for 20.0 coros/seq: 0.40599215789994814, abandoned 0/200
reseting load....
Load reached for 40.0 coros/seq: 0.8055964270483202, abandoned 0/400
reseting load....
Load reached for 80.0 coros/seq: 0.9390106831339007, abandoned 450/800
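A shedding coroutine along the lines described above might look roughly like the following sketch. Here loop.load() is the method added by the proof-of-concept patch [1]; everything else is illustrative and not taken from the gist:

import time

LOAD_LIMIT = 0.9
CPU_COST = 0.02  # seconds of CPU work per coroutine

async def worker(loop, stats):
    # Ask the loop for its load first, and shed the work when saturated.
    if loop.load() > LOAD_LIMIT:
        stats['abandoned'] += 1
        return
    deadline = time.monotonic() + CPU_COST
    while time.monotonic() < deadline:  # simulate the CPU cost
        pass
    stats['completed'] += 1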
The program runs, as was said, at different levels of throughput, printing at the end of each level the load reached by the system and the number of coroutines that were abandoned out of the overall number. As can be seen, once the last test begins to run and the load of the system reaches 0.90, it is able to reduce the pressure at the application level -- in this case just returning without doing anything, but in other environments performing the proper fallback. As you can notice, the load values seem to be aligned with the expected values, bearing in mind that the maximum throughput would be 50 coroutines per second.

The current implementation tries to return the load taking into account the global CPU resources; therefore, other processes competing for the same CPU used by the loop are also considered. It gives the developer a reliable metric that can be used in environments where the CPU is shared by other processes.

What is missing? Investigating how hard it would be to implement this feature in libuv [4] to make it available in uvloop. Giving more information about the differences between this implementation and the toobusy one [5]. Testing the performance impact in real applications.

I would like to get more feedback from you. And, if you believe that this implementation has some chance of becoming part of the CPython repo, what would be the next steps that I should take?

[1] https://github.com/pfreixes/cpython/commit/ac07fef5af51746c7311494f21b0f067c772a2bf
[2] https://gist.github.com/pfreixes/233fd8c6a6ec82f2cde4688a2976bf2d
[3] https://gist.github.com/pfreixes/fd26c36391b33056b7efd525e4690aef
[4] http://docs.libuv.org/en/v1.x/
[5] https://github.com/lloyd/node-toobusy

--
--pau
From yselivanov at gmail.com  Mon Aug 21 12:15:21 2017
From: yselivanov at gmail.com (Yury Selivanov)
Date: Mon, 21 Aug 2017 12:15:21 -0400
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

Hi Pau,

I personally don't think we need this in asyncio. While the function has a relatively low overhead, it's still an overhead, it's a couple more syscalls on each loop iteration, and it's a bit of a technical debt.

The bar for adding new functionality to asyncio is very high, and I don't see a good rationale for adding this function. Is it for debugging purposes? Or for profiling live applications? If it's the latter, then there are so many other things we want to see, and some of them are protocol-specific.

If we want to add some tracing/profiling functions there should be a way to disable them, otherwise the performance of event loops like uvloop will degrade, and I'm not sure that all of its users want to pay a price for something they won't ever be using. All of this just adds to the complexity.

OTOH, event loops in asyncio are pluggable. You can just subclass the asyncio event loop and add your method. The asyncio code base is very stable, so you won't need to fix your code frequently.

Thanks,
Yury
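The subclassing route Yury mentions could look roughly like the sketch below; the actual load bookkeeping is elided, and the class names are made up:

import asyncio

class LoadTrackingLoop(asyncio.SelectorEventLoop):
    """Selector event loop extended with an application-defined load()."""

    def __init__(self):
        super().__init__()
        self._sleeping = 0.0  # whatever accounting the metric needs

    def load(self):
        # Return the load metric the application wants to expose.
        raise NotImplementedError

class LoadTrackingPolicy(asyncio.DefaultEventLoopPolicy):
    def new_event_loop(self):
        return LoadTrackingLoop()

asyncio.set_event_loop_policy(LoadTrackingPolicy())

After installing the policy, asyncio.new_event_loop() hands out loops that carry the extra method, without any change to asyncio itself.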
From njs at pobox.com  Mon Aug 21 19:35:05 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 21 Aug 2017 16:35:05 -0700
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

On Mon, Aug 21, 2017 at 9:15 AM, Yury Selivanov wrote:
> Hi Pau,
>
> I personally don't think we need this in asyncio. While the function has a
> relatively low overhead, it's still an overhead, it's a couple more syscalls
> on each loop iteration, and it's a bit of a technical debt.
>
> The bar for adding new functionality to asyncio is very high, and I don't
> see a good rationale for adding this function. Is it for debugging
> purposes? Or for profiling live applications? If it's the latter, then
> there are so many other things we want to see, and some of them are
> protocol-specific.

One approach would be to add a generic instrumentation API.
Trio has something like this, that I think is expressive enough to let Pau implement their busyness checking as a library: https://trio.readthedocs.io/en/latest/reference-core.html#instrument-api This has several advantages over subclassing: multiple libraries can define their own instrumentation without interfering with each other, you don't have to redefine the instrumentation for every loop implementation, and you don't have to hook up the instrumentation when setting up the loop, e.g. you could just do something like: import toobusy with toobusy.Monitor(loop) as monitor: if monitor.toobusy(): ... -n -- Nathaniel J. Smith -- https://vorpus.org From pfreixes at gmail.com Tue Aug 22 07:55:01 2017 From: pfreixes at gmail.com (Pau Freixes) Date: Tue, 22 Aug 2017 13:55:01 +0200 Subject: [Async-sig] Feedback, loop.load() function In-Reply-To: References: Message-ID: Hi, > I personally don't think we need this in asyncio. While the function has a > relatively low overhead, it's still an overhead, it's a couple more syscalls > on each loop iteration, and it's a bit of a technical debt. > > The bar for adding new functionality to asyncio is very high, and I don't > see a good rationale for adding this function. Is it for debugging > purposes? Or for profiling live applications? If it's the latter, then > there are so many other things we want to see, and some of them are > protocol-specific. Let's try to evolve the rationale and put some extra links. The load of an Asyncio loop can be at some point easily inferred using the sleeping time vs the overall time, this information brings us to understand how to saturate is the loop with a metric that informs you how many CPU resources are being used, or most important how many CPU resources left. How helpful can be this method? In our organization, e use back-pressure at the application layer of our REST microservices architecture. It allows us to prevent overloading the services. Once the back pressures kicks in we can scale horizontally our services to cope the current load. This is already implemented for other languages and we are currently working on how to implement it with the aiohttp(asyncio) stack. For more info about this technique these articles [1] [2] We are not the first ones running microservices at scale, and this pattern has been implemented by other organizations. I would like to mention the Google case [2]. From that link I would like to bold the following paragraph: """ A better solution is to measure capacity directly in available resources. For example, you may have a total of 500 CPU cores and 1 TB of memory reserved for a given service in a given datacenter. Naturally, it works much better to use those numbers directly to model a datacenter's capacity. We often speak about the cost of a request to refer to a normalized measure of how much CPU time it has consumed (over different CPU architectures, with consideration of performance differences). In a majority of cases (although certainly not in all), we've found that simply using CPU consumption as the signal for provisioning works well """ >From my understanding, the comment is pretty aligned with the implementation proposal for the Asyncio loop Having, as a result, a way to measure if there are enough resources to cope the ongoing metric. 
How helpful can this method be? In our organization, we use back-pressure at the application layer of our REST microservices architecture. It allows us to prevent overloading the services. Once the back-pressure kicks in, we can scale our services horizontally to cope with the current load. This is already implemented for other languages, and we are currently working out how to implement it with the aiohttp (asyncio) stack. For more info about this technique, see these articles [1] [2].

We are not the first ones running microservices at scale, and this pattern has been implemented by other organizations. I would like to mention the Google case [3]. From that link I would like to highlight the following paragraph:

"""
A better solution is to measure capacity directly in available resources. For example, you may have a total of 500 CPU cores and 1 TB of memory reserved for a given service in a given datacenter. Naturally, it works much better to use those numbers directly to model a datacenter's capacity. We often speak about the cost of a request to refer to a normalized measure of how much CPU time it has consumed (over different CPU architectures, with consideration of performance differences). In a majority of cases (although certainly not in all), we've found that simply using CPU consumption as the signal for provisioning works well
"""

From my understanding, this comment is pretty well aligned with the implementation proposed for the asyncio loop, giving, as a result, a way to measure whether there are enough resources to cope with the ongoing load.

[1] https://dzone.com/articles/applying-back-pressure-when
[2] http://engineering.voxer.com/2013/09/16/backpressure-in-nodejs/
[3] https://landing.google.com/sre/book/chapters/handling-overload.html

> If we want to add some tracing/profiling functions there should be a way to
> disable them, otherwise the performance of event loops like uvloop will
> degrade, and I'm not sure that all of its users want to pay a price for
> something they won't ever be using. All of this just adds to the
> complexity.

The goal will be to have an implementation with no performance impact for real applications. I'm still not sure if this is achievable with uvloop; I would like to start working on this as soon as possible, since having the proper numbers and knowing how feasible it is to implement this in libuv will help to get the proper answer. If in the end there is no way to make it negligible, then I would agree that a way to switch it on or off is needed.

From pfreixes at gmail.com  Tue Aug 22 08:36:55 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Tue, 22 Aug 2017 14:36:55 +0200
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

> One approach would be to add a generic instrumentation API. Trio has
> something like this, that I think is expressive enough to let Pau
> implement their busyness checking as a library:
>
> https://trio.readthedocs.io/en/latest/reference-core.html#instrument-api
>
> This has several advantages over subclassing: multiple libraries can
> define their own instrumentation without interfering with each other,
> you don't have to redefine the instrumentation for every loop
> implementation, and you don't have to hook up the instrumentation when
> setting up the loop, e.g. you could just do something like:
>
> import toobusy
> with toobusy.Monitor(loop) as monitor:
>     if monitor.toobusy():
>         ...

It would also help other loops to meet the same contract, making them compatible with already-implemented instruments. Maybe the major concern here is the performance penalty -- do you have some numbers about how negligible it is to have all of these signals available to be used?

--
--pau

From pfreixes at gmail.com  Tue Aug 22 19:31:38 2017
From: pfreixes at gmail.com (Pau Freixes)
Date: Wed, 23 Aug 2017 01:31:38 +0200
Subject: [Async-sig] Feedback, loop.load() function
In-Reply-To:
References:
Message-ID:

I had an idea based on the Trio implementation: trying to move the load method implementation out of the loop scope while avoiding the subclassing.

My primary concern with the instrumentation was the performance impact. I ran some tests in an experimental branch, instrumenting 4 events of the _run_once function. The idea was to see how the performance degrades even when no listeners/instruments are attached. My gut feeling said that just the call to emit the event, plus the check to see whether there are listeners, would already have a noticeable performance cost.

The numbers were pretty clear: the default asyncio loop loses about 10% of its performance, degrading from 30K coroutines per second to 27K on my laptop. Therefore, it looks like we are facing a blocker that would need to be addressed first.

How might Python address this situation? Basically, by implementing a new object type that provides the same interface as a Signal. Many languages have their own implementation, such as Java [1] and C# [2], or they have libraries that support Signals or Events.
Indeed, Python has plenty of them implemented as independent modules; the best known are Django signals [3] and Blinker [4].

From my understanding, the problem here is the language itself, having the interpreter behind the scenes. What I would propose here is a new Python object called Signal, implemented as a new type in the CPython API, that would allow developers to write the following snippet with a chance to speed up the code -- or at least not be penalized when there are no observers listening for a specific event.

class Foo:
    def __init__(self):
        self._cb = []

    def add_listener(self, cb):
        self._cb.append(cb)

    def _signal(self, msg):
        for cb in self._cb:
            cb(msg)

    def run(self):
        self._signal("start")
        self._signal("stop")

f = Foo()
f.run()

def print_signal(msg):
    print(msg)

f.add_listener(print_signal)
f.run()

This previous snippet might be implemented using the new object type called signal, as in the next snippet:

class Foo:
    def __init__(self):
        self.start = signal()
        self.stop = signal()

    def run(self):
        self.start.send()
        self.stop.send()

f = Foo()
f.run()

def start():
    print("start")

def stop():
    print("stop")

f.start.connect(start)
f.stop.connect(stop)
f.run()

This kind of pattern is spread across many frameworks [5], and they might get the benefit for free thanks to this new type of object.

I'm wondering whether there have been other attempts in the past to implement this kind of object in the CPython API. Is there some resistance to modifying or adding code to CPython when the same thing can be implemented as a Python module? Is the simple fact that it would become a must for an internal feature enough to consider it?

Thoughts?

[1] http://docs.oracle.com/javase/tutorial/uiswing/events/index.html
[2] https://msdn.microsoft.com/en-us/library/aa645739(v=vs.71).aspx
[3] https://docs.djangoproject.com/en/1.11/topics/signals/
[4] http://pythonhosted.org/blinker/
[5] http://flask.pocoo.org/docs/0.12/signals/

--
--pau
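One way to quantify the no-listener overhead that motivates the proposal is to time the emit fast path directly; a rough, illustrative sketch using only the standard library:

import timeit

class Signal:
    def __init__(self):
        self._listeners = []

    def connect(self, cb):
        self._listeners.append(cb)

    def send(self):
        # Fast path: with no listeners this is one attribute lookup and
        # a truthiness check, but in pure Python it is still a method call.
        if self._listeners:
            for cb in self._listeners:
                cb()

sig = Signal()
# Cost per no-listener emit; a C-level Signal type could shrink this further,
# which is what the 10% figure above is about.
print(timeit.timeit(sig.send))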