[Python-ideas] Exposing flat bytecode representation to optimizers

Andrew Barnert abarnert at yahoo.com
Fri Feb 5 21:37:29 EST 2016


On Feb 5, 2016, at 15:54, Victor Stinner <victor.stinner at gmail.com> wrote:
> 
> 2016-02-05 20:58 GMT+01:00 Andrew Barnert via Python-ideas
> <python-ideas at python.org>:
>> It would break the optimizer APIs, but `PyCode_Optimize` isn't public
> 
> Sadly, PyCode_Optimize() is public and part of the stable ABI.

Oops, you're right--I thought it was inside the Py_LIMITED_API check, but it's after the #endif. 

Anyway, it's not documented anywhere--no mention in the C API docs, and no explanation of what it does in the header (not even the usual one-liner comment). So, I think if you break that in PEP 511, hopefully nobody will complain :)

> But I'm
> ok to take the responsibility of breaking it if we implement the PEP
> 511 since we will probably have a much better API to optimize AST and
> bytecode, and I would prefer that no one directly uses
> PyCode_Optimize()!

Great!

>> and the API proposed by PEP 511 is public, but that PEP isn't even finalized, much less accepted yet.
> 
> To be clear: the PEP 511 is a draft still stuck in python-ideas, it's
> not ready for a review on python-dev. I still expect changes on the
> API.
> 
> I'm very interested in your feedback on the bytecode transformer API :-)

As I mentioned before, I think it probably has to take and return a complete code object, not just the four pieces of the code object that the peephole optimizer happens to need.
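
For concreteness, here's a rough Python sketch (the name is hypothetical, and this isn't the actual PEP 511 API) of a transformer that takes and returns a whole code object. Even a do-nothing transformer has to carry every field through the CodeType constructor (shown here with the 3.5 signature):

    import types

    def identity_transformer(code):
        # Hypothetical whole-code-object transformer: takes a complete code
        # object and returns one. This one just rebuilds an identical code
        # object, to show how many fields have to travel with it.
        return types.CodeType(
            code.co_argcount, code.co_kwonlyargcount, code.co_nlocals,
            code.co_stacksize, code.co_flags, code.co_code, code.co_consts,
            code.co_names, code.co_varnames, code.co_filename, code.co_name,
            code.co_firstlineno, code.co_lnotab, code.co_freevars,
            code.co_cellvars)

(A real transformer would also have to recurse into the code objects nested in co_consts.)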

But meanwhile:

>>  [^3]: The compiler doesn't actually have exactly what the optimizers would want, but it's pretty close: it has a linked list of block objects, each of which has an array of instruction objects, with jump targets being pointers to blocks. That's actually even better to work with, but too complicated to expose to optimizers.
> 
> The current implementation of the peephole optimizer has complex
> code to update jumps.

Yes, and it has to be complex, even though it punts on all the hard cases. :)

> IMHO it would be simpler if the code would be
> blocks of instructions.

Sure. But I didn't want to expose the compiler's internal blocks concept to the optimizer; that seems way too brittle.

We could design and build a new optimizer-friendly concept of blocks that's independent of the compiler's (as it sounds like you've done in the past), or something simpler like MRAB's labels, or some other "macro-assembler"-style feature that nobody's yet thought of in this discussion.

If we really need to do that, I think it's seriously worth looking at incorporating byteplay, porting it to C, and providing a C API for it, rather than designing something new. It hews pretty closely to the design of the dis module, which is already in the stdlib; a variety of people have been using it for years; and I think experience shows that it makes transforming code objects, including inserting and moving ops, easy enough. It also takes care of other annoying things, like getting all 17 arguments to the code constructor right.

But I think Serhiy's much simpler suggestion may be good enough: making NOP removal part of the repack process means that a simple optimizer (like the existing peephole optimizer) never has to renumber jumps or lnotab, and using a flat array of fixed-size instructions means that even if more complicated optimizers do have to renumber, it's easy instead of complex. I won't really know how his idea pans out until I build it and play with it, but I'm hopeful.
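
To make that concrete, here's a toy sketch (made-up opcode numbers, not real CPython code) of how a repack step could drop NOPs and renumber jump targets in one pass over a flat list of fixed-size (opcode, argument) pairs, so the optimizer itself never has to touch offsets:

    NOP, JUMP = 9, 110        # made-up opcode numbers for this example
    JUMPS = {JUMP}            # ops whose argument is an instruction index

    def repack(instructions):
        # Map each old instruction index to its index after NOP removal; a
        # jump that targeted a removed NOP lands on the next real instruction.
        new_index, mapping = 0, {}
        for old_index, (op, arg) in enumerate(instructions):
            mapping[old_index] = new_index
            if op != NOP:
                new_index += 1
        # Emit the surviving instructions, retargeting jumps as we go.
        return [(op, mapping[arg] if op in JUMPS else arg)
                for op, arg in instructions if op != NOP]

    # An "optimizer" that wants to delete instruction 1 just overwrites it
    # with a NOP; the jump at index 0 that targeted index 2 is renumbered
    # to 1 by repack().
    print(repack([(JUMP, 2), (NOP, 0), (100, 0)]))   # [(110, 1), (100, 0)]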

At any rate, whatever format we expose (unpacked bytecode, a macro-assembler-ish format, the internal compiler blocks, etc.) obviously determines the API for PEP 511 bytecode processors.

> My plan is to implement most optimizations at the AST level and only
> optimize jumps at the "bytecode" level.

Sure, you've already explained (and Serhiy recently re-explained) that most of what the peephole optimizer does, and much of what you'd want it to do but it doesn't yet do, can be done at the AST level. But there will still be a few things that have to be done with bytecode. (Also, consider that things like decorators have to work on bytecode. And that, while your const optimization can be done at the AST level if we add a new AST node, that wouldn't help for anyone who had the same idea and just wanted to experiment with it, or deploy it locally, etc., without patching the compiler and the ast module.)
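
As a toy illustration of that last point, here's roughly what the "experiment with it locally" route looks like today: a decorator that runs some bytecode transform over a single function's code object, with no compiler or ast changes needed. (The transform here is just an identity placeholder.)

    def optimized(transform):
        # Apply a user-supplied code-object transformer to one function.
        def decorator(func):
            func.__code__ = transform(func.__code__)
            return func
        return decorator

    @optimized(lambda code: code)    # identity transform as a placeholder
    def example():
        return 1 + 2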

> Do you know if byteplay and other projects have a similar design?

The design is pretty well documented, but the short version is that it's label-based: each original jump target is given a label, and you can add new labels anywhere (or give them nice names). At the end, when you convert a byteplay.Code object back to a code object, it maps those labels to offsets.
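
This isn't byteplay's actual API, just a toy sketch of the label idea: jump arguments stay as symbolic Label objects while you edit, and only the final assembly pass turns them into numeric positions.

    class Label:
        pass

    JUMP = 110                       # made-up opcode number

    def assemble(items):
        # 'items' mixes Label objects (marking positions) with (op, arg)
        # pairs whose arg may be a Label; return plain (op, index) pairs.
        offsets, pc = {}, 0
        for item in items:
            if isinstance(item, Label):
                offsets[item] = pc   # a label marks the next instruction
            else:
                pc += 1
        out = []
        for item in items:
            if isinstance(item, Label):
                continue
            op, arg = item
            out.append((op, offsets[arg] if isinstance(arg, Label) else arg))
        return out

    target = Label()
    print(assemble([(JUMP, target), (1, 0), target, (2, 0)]))
    # -> [(110, 2), (1, 0), (2, 0)]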

> It
> would be nice to find the root common structure to be a generic API.

I don't know if there's anything common between labels, blocks, generalized trees, ... that's worth capturing here. We just want to find the simplest structure that works.

>> Flattening it would be trivial. Or, if that's too expensive, we could do something almost as simple and much cheaper: convert it in-line to a deque-like linked list of arrays, with jump targets being indices or pointers into that. Or we could just expose the list of blocks as-is, as an opaque thing with a mutable-deque-of-instructions API around it.
> 
> For the PEP 511 we can imagine multiple levels for code tranformers:
> block of instructions, flatten bytecode, AST, etc.

I don't think anyone will ever want to write transformers at all possible levels. If you can access source bytes, source text, token stream, AST, and some friendlier version of bytecode, when would you want to access the ASDL, or the unfriendly bytecode, or any other stage in the process?

> The cost of the transformation is to convert from internal CPython
> objects to Python objects, and then convert the result of the code
> transformer back to internal CPython objects.

For bytecode, the compiler is already creating Python bytes and list objects to pass to the optimizer. (The AST is another story.)


