[Python-ideas] Protecting finally clauses of interruptions

Wed Apr 4 18:59:57 CEST 2012

On 2012-04-04, at 4:04 AM, Paul Colomiets wrote:

> Hi,
> 
> On Wed, Apr 4, 2012 at 4:23 AM, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
>> On 2012-04-03, at 3:22 PM, Paul Colomiets wrote:
>>> (Although, I don't know how `yield from` changes working with
>>> yield-based coroutines, maybe its behavior is quite different)
>>> 
>>> For greenlets the situation is a bit different, as Python knows the
>>> stack there, but you still need to traverse it (or, as Andrew
>>> mentioned, you can just propagate a flag).
>> 
>> Why traverse?  Why propagate?  As I explained in my previous posts
>> here, you need to protect only the top-stack coroutines in the
>> timeouts or trampoline execution queues.  You should illustrate
>> your logic with a clearer example - say three or four coroutines
>> that call each other, plus a glimpse of how your trampoline works.
>> But I'm not sure that is really necessary.
>> 
> 
> Here is a more detailed version of the previous example (although still simplified):
> 
> @coroutine
> def add_money(user_id, money):
>    yield redis_lock(user_id)
>    try:
>        yield redis_incr('user:'+user_id+':money', money)
>    finally:
>        yield redis_unlock(user_id)
> 
> # this one is crucial to show the point of discussion;
> # the other functions are similar:
> @coroutine
> def redis_unlock(lock):
>    # yields back when socket is ready for writing
>    yield redis_socket.wait_write()
>    cmd = ('DEL user:'+lock+'\n').encode('ascii')
>    redis_socket.write(cmd)  # should be loop here, actually
>    yield redis_socket.wait_read()
>    result = redis_socket.read(1024)  # here loop too
>    assert result == 'OK\n'
> 
> When the trampoline gets a coroutine from the `next()` or `send()` method,
> it puts it on top of the stack and doesn't dispatch the original one until
> the topmost one has exited.
> 
> The point is that if a timeout arrives inside the `redis_unlock` function, we
> must wait until the finally clause of `add_money` is finished.
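
For reference, my reading of the trampoline you describe is roughly the
sketch below.  It is only an illustration under my own assumptions: `run`
is a hypothetical name, plain generators stand in for your @coroutine
objects, non-generator values are echoed straight back in place of real
I/O, exception propagation is left out, and it relies on generators being
able to return values (PEP 380).

```python
import types


def run(root):
    """Hypothetical trampoline: drive `root` and any generators it yields."""
    stack = [root]   # the topmost generator is the only one being dispatched
    value = None     # value to send into the topmost generator
    while stack:
        try:
            yielded = stack[-1].send(value)
        except StopIteration as stop:
            # The topmost generator exited; resume its caller with the result.
            stack.pop()
            value = stop.value
            continue
        if isinstance(yielded, types.GeneratorType):
            # A nested coroutine: put it on top of the stack and start it.
            stack.append(yielded)
            value = None
        else:
            # Stand-in for an I/O wait: hand the yielded value straight back.
            value = yielded
    return value
```

With this shape the caller is never resumed while a callee is still on top
of the stack, which is the property the discussion below depends on.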

How can it "arrive" inside "redis_unlock"?  Let's assume you called
"add_money" as such:

yield add_money(1, 10).with_timeout(10)

Then it's the 'add_money' coroutine that should be in the timeouts queue/tree!
It is 'add_money' specifically that should be interrupted when your 10s timeout
expires.  And if 'add_money' is in its 'finally' statement, you simply postpone
its interruption, meaning that 'redis_unlock' will finish its execution cleanly.

Again, I'm not sure how exactly you manage your timeouts.  The way I do it,
simplified: I have a timeouts heapq with pointers to those coroutines
that were *explicitly* executed with a timeout.  So I'm protecting only
the coroutines in that queue, because only they can be interrupted.  And
the coroutines they call are protected *automatically*.
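
A minimal sketch of such a timeouts heap, to make the idea concrete.  The
class and method names are hypothetical and this is not our framework's
code; it only shows that the heap holds just the coroutines that were
started with a timeout, ordered by deadline.

```python
import heapq
import itertools


class TimeoutQueue:
    """Hypothetical heap of (deadline, seq, coroutine) entries."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so two entries with equal deadlines never compare
        # the coroutine objects themselves.
        self._seq = itertools.count()

    def add(self, coro, deadline):
        """Register a coroutine that was explicitly run with a timeout."""
        heapq.heappush(self._heap, (deadline, next(self._seq), coro))

    def pop_expired(self, now):
        """Remove and return the coroutines whose deadline has passed."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, coro = heapq.heappop(self._heap)
            expired.append(coro)
        return expired
```

Only the entries returned by `pop_expired` are candidates for interruption;
anything they call never appears in the heap at all.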

If you do it differently, can you please elaborate on how your scheduler
is actually designed?

>>> 
>>> The whole point of using a coroutine library is to not
>>> have a thread pool. Could you describe your use case
>>> in more detail?
>> 
>> Well, our company has been using coroutines for about 2.5 years
>> now (the framework is not yet open-sourced).  And in our practice
>> a threadpool is really handy, as it allows you to:
>> 
>> - use non-asynchronous libraries, which you don't want to
>> monkeypatch with greensockets (or are even unable to monkeypatch)
>> 
> 
> And we rewrite them in python. It seems to be more useful.

Sometimes you can't afford the luxury ;)

> 
>> - wrap some functions that are usually very fast, but once in
>> a while may take some time.  And sometimes you don't want to
>> offload them to a separate process
>> 
> 
> Ack.
> 
>> - and yes, do DNS lookups if you don't have a compiled cpython
>> extension that wraps c-ares or something alike.
>> 
> 
> Maybe let's propose an asynchronous DNS library for Python?
> We have the same problem, although we do not resolve hosts at
> runtime (only at startup), so a synchronous one is good enough
> for our purposes.
> 
>> Please let's avoid shifting further discussion to proving or
>> disproving the necessity of threadpools.
> 
> Agreed.
> 
>> They are being actively used and there is a demand for
>> (more or less) graceful threads interruption or abortion.
>> 
> 
> Given these use cases, what stops you from making explicit
> interruption points?
> 
>> 
>> Please write a PEP and we'll continue discussion from that
>> point.  Hopefully, it will get more attention than this thread.
>> 
> 
> I don't see the point in writing a PEP until I have an idea
> of what the PEP should propose. If you have one, you can do it. Again

OK, point taken.  Please give me a couple of days to at least
come up with a summary document.  I still don't like your
solution because it works directly with frames.  With the
upcoming PyPy support for Python 3, I don't think I want
to lose the JIT support.

I also want to take a look at the new PyPy continuations.

Ideally, as I proposed earlier, we should introduce some
sort of interruption protocol -- method 'interrupt()', with 
perhaps a callback.

> you want to implement thread interruption, and that's not
> my point, there is another thread for that.

We have two requests: the ability to safely interrupt a Python
function or generator (1), and the ability to safely interrupt
Python's threads (2).  Both (1) and (2) share the same
requirement of safe 'finally' statements.  To me, both
features are similar enough to come up with a single
solution, rather than inventing different approaches.

> On Wed, Apr 4, 2012 at 3:03 AM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> 
>> I don't think a frame flag on its own is quite enough.
>> You don't just want to prevent interruptions while in
>> a finally block, you want to defer them until the finally
>> counter gets back to zero. Making the interrupter sleep
>> and try again in that situation is rather ugly.

That's the second reason I don't like your proposal.

def foo():
   try:
      ..
   finally:
      yield unlock()
   # <--- the ideal point to interrupt foo

   f = open('a', 'w')
   # what if we interrupt it here?
   try:
      ..
   finally:
      f.close()

>> So perhaps there could also be a callback that gets
>> invoked when the counter goes down to zero.
> 
> Do you mean put a callback in a frame, which gets
> executed at the next bytecode just like a signal handler,
> except it waits until the finally clause has executed?
> 
> It would work, except it may have a slight performance
> impact on each bytecode. But I'm not sure if it will
> be noticeable.

That's essentially the way we currently do it.  We transform the
coroutine's __code__ object, turning:

def a():
   try:
      # code1
   finally:
      # code2

into:

def a():
   __self__ = __get_current_coroutine()
   try:
      # code1
   finally:
      __self__.enter_finally()
      try:
         # code2
      finally:
         __self__.exit_finally()

'enter_finally' and 'exit_finally' maintain the internal counter
of finally blocks.  If a coroutine needs to be interrupted, we check
that counter.  If it is 0 - throw in a special exception.  If not - 
wait till it becomes 0 and throw the exception in 'exit_finally'.

Works flawlessly, but with the high cost of having to patch
__code__ objects.
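
A rough sketch of the counter logic described above, without the __code__
patching.  The names `FinallyGuard`, `Interrupted`, `interrupt` and
`interrupt_pending` are hypothetical, chosen only for this illustration;
`enter_finally` and `exit_finally` are the two hooks mentioned earlier.

```python
class Interrupted(Exception):
    """Hypothetical 'special exception' delivered to an interrupted coroutine."""


class FinallyGuard:
    """Tracks how many 'finally' blocks a coroutine is currently inside."""

    def __init__(self):
        self.finally_depth = 0
        self.interrupt_pending = False

    def enter_finally(self):
        self.finally_depth += 1

    def exit_finally(self):
        self.finally_depth -= 1
        # A deferred interruption is delivered as soon as the outermost
        # 'finally' block has completed.
        if self.finally_depth == 0 and self.interrupt_pending:
            self.interrupt_pending = False
            raise Interrupted()

    def interrupt(self):
        """Return True if the scheduler may throw Interrupted right now.

        Otherwise remember the request so that exit_finally() raises it.
        """
        if self.finally_depth == 0:
            return True
        self.interrupt_pending = True
        return False
```

In this sketch the scheduler would call `interrupt()` when a timeout fires
and throw `Interrupted` into the coroutine itself only when that call
returns True.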

-
Yury