[Python-ideas] Idea: Compressing the stack on the fly

Andrew Barnert abarnert at yahoo.com
Tue May 28 09:30:51 CEST 2013


On May 27, 2013, at 7:05, Chris Angelico <rosuav at gmail.com> wrote:

> On Mon, May 27, 2013 at 11:52 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> 
>> On 5/27/2013 9:30 AM, Chris Angelico wrote:
>>> What can tail recursion do that can't be done by reassigning to the
>>> function parameters and 'goto' back to the top?
>> 
>> That style can't handle mutually recursive procedures, or the extreme case:
>> a state machine implemented with N functions, each of which calls the next
>> state function at the end.  Tail-call elimination isn't simply about
>> noticing recursive calls.  It's about noticing that a function ends with a
>> function call, and not burning another stack frame in the process.
> 
> Ahh, gotcha. Of course. Mutual recursion would be a bit more of a
> problem to the compressor, too, though; it'd have to recognize a
> pattern that spans multiple stack frames.
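To make the case Ned describes concrete, here is a sketch (all names are illustrative, not from the original post) of a two-function state machine whose transitions are tail calls. Because CPython does no tail-call elimination, every transition burns a stack frame, so a long input overflows the stack even though the computation is conceptually a loop:

```python
def state_even(n):
    if n == 0:
        return "even"
    return state_odd(n - 1)      # tail call: nothing left to do in this frame

def state_odd(n):
    if n == 0:
        return "odd"
    return state_even(n - 1)     # tail call back into the other function

print(state_even(100))           # fine: only 100 nested frames
try:
    state_even(100_000)          # far past the default recursion limit
except RecursionError:
    print("stack overflow")
```

The "reassign the parameters and goto the top" trick works inside a single function, but here the goto target alternates between two functions, which is exactly the pattern a per-function rewrite cannot express.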

That's exactly why I suggested that it would be more interesting in a PyPy-style tracing JIT than in a static compiler. At runtime, you can detect that your traced fast path has a FUNC opcode that calls a function whose body is already part of the trace, and turn that into code that stashes just enough information to generate stack frames if needed, then do a goto. In simple cases, that stashing could amount to no more than incrementing a counter; in complex cases, I think it might be closely related to stashing enough info to create generator states if you fall off the fast path.

That should be much simpler than trying to detect patterns that will lead to tail calls. And it means you only make the effort when it makes a difference.

But, most importantly, it means you can benefit even in non-tail-call cases, like the naive recursion in the original post. Even when you can't eliminate recursion, you will still "compress" the state. And it might also let you optimize chains of generator yields or coroutine sends with not much extra effort (which might even be a basis for optimizing out explicit coroutine trampolines, which would be very cool).

This is pretty different from traditional TCO, but given that TCO isn't relevant to the OP's proposal or to his example, I don't think that's a problem.
