Reducing "yield from" overhead in recursive generators

Rathmann rathmann.os at gmail.com
Tue Mar 22 18:44:04 EDT 2022


> On 19 Mar 2022, at 03:07, Greg Ewing <greg.... at canterbury.ac.nz> wrote:
>
> On 19/03/22 9:40 am, Rathmann wrote:
>> The other challenge/question would be to see if there is a way to implement
>> this basic approach with lower overhead.
>
> My original implementation of yield-from didn't have this problem,
> because it didn't enter any of the intermediate frames -- it just
> ran down a chain of C pointers to find the currently active one.
>
> At some point this was changed, I think so that all the frames
> would show up in the traceback in the event of an exception.
> This was evidently seen as more important than having efficient
> resumption of nested generators.

So that sounds like the original CPython implementation had an
O(depth) component, but with a lower constant factor than the current
version?  (Of course, since Python limits recursion depth, speaking of
O(n) is not quite right.)

Prioritizing stack traces over performance seems consistent with the
well known blog post on tail recursion elimination.  This is one of
the reasons I was motivated to try to attack this at the
library/application level -- when the programmer invokes a
transformation like I am proposing, they know that it is going to
affect the stack trace.

I am still hoping to find a way to implement the transformed
function/driver combination with lower overhead.  This is a little
tricky since the performance of the approaches is very dependent on
the Python implementation.  Below are some timings for 2.7 and 3.8
CPython and PyPy. For 2.7, for the untransformed version I used a loop
of yield statements, and timed with time.clock(), for 3.8, it is
"yield from" and time.perf_counter().  All times are for traversing a
balanced tree of 8 million nodes on X86_64 Linux.

3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0]
For depth= 22 nodes= 8388607 iters= 1 
  Elapsed time (yield from): 6.840768865999053 driver: 6.702329244999419

3.6.9 (7.3.1+dfsg-4, Apr 22 2020, 05:15:29)
[PyPy 7.3.1 with GCC 9.3.0]
For depth= 22 nodes= 8388607 iters= 1 
  Elapsed time (yield from): 4.037276787999872 driver: 2.3724582960003318

2.7.18 (default, Mar  8 2021, 13:02:45) 
[GCC 9.3.0]
For depth= 22 nodes= 8388607 iters= 1 
  Elapsed time (yield loop): 8.863244 driver: 13.788312

2.7.13 (7.3.1+dfsg-2, Apr 21 2020, 05:05:41)
[PyPy 7.3.1 with GCC 9.3.0]
For depth= 22 nodes= 8388607 iters= 1 
  Elapsed time (yield loop): 8.802712562 driver: 2.034388533

The most extreme variation was 2.7 pypy, where the driver was 4 times
faster.  (Of course, 2.7 is less used these days.)


More information about the Python-list mailing list