[Python-ideas] PEP 511: API for code transformers

Andrew Barnert abarnert at yahoo.com
Fri Jan 15 17:57:09 EST 2016


Sent from my iPhone
> On Jan 15, 2016, at 15:14, Victor Stinner <victor.stinner at gmail.com> wrote:
> 
> Wow, giant emails (as mine, ok).

Well, this is a big idea, so it needs a big breakfast. I mean a big email. :) But fortunately, you had great answers to most of my points, which means I can snip them out of this reply and make it not quite as giant.
> 
> 2016-01-15 20:41 GMT+01:00 Andrew Barnert <abarnert at yahoo.com>:
>> * You can register transformers in any order, and they're run in the order specified, first all the AST transformers, then all the code transformers. That's very weird; it seems like it would be conceptually simpler to have a list of AST transformers, then a separate list of code transformers.
> 
> The goal is to have a short optimizer tag. I'm not sure yet that it
> makes sense, but I would like to be able to transform AST and bytecode
> in a single code transformer.

But that doesn't work as soon as there are even two of them: the bytecode #0 no longer runs after ast #0, but after ast #1; similarly, bytecode #1 no longer runs after ast #1, but after bytecode #0. So, it seems like whatever benefits you get by keeping them coupled will be illusory.

> I prefer to add a single get/set
> function to sys, instead of two (4 new functions).

That's a good point. (I suppose you could have a pair of get/set functions that each set multiple lists instead of one, but that isn't really any simpler than multiple get/set functions...)

>> It seems like the only advantage to require attaching them to a class is to associate each one with a name
> 
> I started with a function, but it's a little bit weird to set a name
> attribute to a function (func.name = "fat").

It looks a lot less weird with a decorator `@transform('fat')` that sets it for you.
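Something like this (an untested sketch; `transform` is a hypothetical helper, and `name` is the attribute the PEP says it reads for the optimizer tag):

```python
# Hypothetical sketch: a decorator that attaches the `name` attribute
# PEP 511 reads for the short optimizer tag, so a plain function can
# serve as a transformer without hand-setting attributes.
def transform(name):
    def deco(func):
        func.name = name  # the short optimizer tag, e.g. "fat"
        return func
    return deco

@transform('fat')
def fat_transformer(code, context):
    # A no-op transformer: return the tree/bytecode unchanged.
    return code
```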

> Moreover, it's convenient
> to store some data in the object. In fatoptimizer, I store the
> configuration. Even in the most simple AST transformer example of the
> PEP, the constructor creates an object:
> https://www.python.org/dev/peps/pep-0511/#id1
> 
> It may be possible to use functions, but classes are just more
> "natural" in Python.

In general, sure. But for data that isn't accessible from outside, and only needs to be used in a single call, a simple function (with the option of wrapping data in a closure) can be simpler. That's why so many decorators are functions that return a closure, not classes that build an object with a __call__ method.
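To make the comparison concrete, here are the two styles side by side (a toy example I just made up, not anything from fatoptimizer):

```python
import functools

# Closure style: the configuration (n) lives in the enclosing scope.
def repeat(n):
    def deco(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(n):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return deco

# Class style: the same configuration stored on an object with __call__.
class Repeat:
    def __init__(self, n):
        self.n = n
    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(self.n):
                result = func(*args, **kwargs)
            return result
        return wrapper
```

Both do the same job; the class only earns its keep once the stored state needs to be inspected or mutated from outside.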

But more specifically to this case, after looking over your examples, maybe the class makes sense here.

>> * There are other reasons to write AST and bytecode transformations besides optimization. MacroPy, which you mentioned, is an obvious example. But also, playing with new ideas for Python is a lot easier if you can do most of it with a simple hook that only makes you deal with the level you care about, rather than hacking up everything from the grammar to the interpreter. So, that's an additional benefit you might want to mention in your proposal.
> 
> I wrote "A preprocessor has various and different usages." Maybe I can
> elaborate :-)

Sure. It's just a matter of emphasis, and whether more of it would help sell your idea or not. From the other big reply you got, maybe it would even hurt selling it... So, your call.

> It looks like it is possible to "implement" f-string (PEP 498) using
> macros. I think that it's a good example of experimenting evolutions
> of the language (without having to modify the C code which is much
> more complex, Yury Selivanov may want to share his experience here for
> this async/await PEP ;-)).

I did an experiment last year where I tried to add the same feature two ways (Haskell-style operator partials, so you can write `(* 2)` instead of `lambda x: x * 2` or `rpartial(mul, 2)` or whatever). First, I did all the steps to add it "for real", from the grammar through to the code generator. Second, I added a quick grammar hack to create a noop AST node, then did everything else in Python with an import hook--preprocessing the text to get the noop nodes, then preprocessing the AST to turn those into nodes that do the intended semantics. As you might expect, the second version took a lot less time, required debugging a lot fewer segfaults, etc. And if your proposal removed the need for the import hook, it would be even simpler (and cleaner, too).
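The AST half of that kind of experiment is only a few lines with ast.NodeTransformer. Here's a toy version (not from the actual experiment, and using the modern ast.Constant node, so 3.8+) that rewrites one node kind and recompiles:

```python
import ast

class SwapConstant(ast.NodeTransformer):
    """Toy AST transformer: rewrite the constant 1 into 2."""
    def visit_Constant(self, node):
        if node.value == 1:
            return ast.copy_location(ast.Constant(2), node)
        return node

source = "def f():\n    return 1\n"
tree = ast.fix_missing_locations(SwapConstant().visit(ast.parse(source)))
ns = {}
exec(compile(tree, "<toy>", "exec"), ns)
assert ns["f"]() == 2
```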

>> * It might be useful to have an API that handled bytes and text (and tokens, but that requires refactoring the token stream API, which is a separate project) as well as AST and bytecode.
>> (...)
>> Is there a reason you can't add text_transformer as well?
> 
> I don't know this part of the compiler.
> 
> Does Python already has an API to manipulate tokens, etc.? What about
> other Python implementations?

Well, Python does have an API to manipulate tokens, but it involves manually tokenizing the text, modifying the token stream, untokenizing it back to text, and then parsing and compiling the result, which is far from ideal. (In fact, in some cases you even need to encode back to bytes.) There's an open enhancement issue to make it easier to write token processors.
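For reference, the round trip looks roughly like this today (a minimal sketch; note that I cheat by using a same-length rename so the recorded token positions stay valid and untokenize can reproduce the spacing):

```python
import io
import tokenize

source = "x = 1  # keep this comment\n"

# Tokenize, edit the token stream, then untokenize back to text.
tokens = []
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type == tokenize.NAME and tok.string == "x":
        # Same-length replacement keeps the recorded start/end columns
        # valid, so untokenize can reconstruct the original spacing.
        tok = tok._replace(string="y")
    tokens.append(tok)

result = tokenize.untokenize(tokens)
assert result == "y = 1  # keep this comment\n"
```

Anything that changes token lengths means fixing up positions by hand, which is exactly the pain the open enhancement issue is about.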

But don't worry about that part for now. A text preprocessor step should be very easy to add, and useful on its own (and it opens the door for adding a token preprocessor between text and AST in the future when that becomes feasible).

I also mentioned a bytes preprocessor, which could munge the bytes before the decoding to text. But that seems a lot less useful. (Maybe if you needed an alternative to the coding-declaration syntax for some reason?) I only included it because it's another layer you can hook in an import hook today, so it seems like if it is left out, that should be an intentional decision, not just something nobody thought about.

> I proposed AST transformers because it's already commonly used in the wild.

Text preprocessors are also used in the wild. IIRC, Guido mentioned having written one that turns Python 3-style annotations into something that compiles as legal Python 2.7 (although he later abandoned it, because it turned out to be too hard to integrate with their other Python 2 tools).

(Token preprocessors are not used much in the wild, because it's painful to write them, nor are bytes preprocessors, because they're not that useful.)

> The Hy language uses its own parser and emits Python AST. Why not
> using this design?

By the same token, why not use your own code generator and emit Python bytecode, instead of just preprocessing ASTs?

If you're making a radical change, that makes sense. But for most uses, where you only want to make a small change on top of the normal processing, it makes a lot more sense to just hook the normal processing than to completely reproduce everything it does.

>> Even if that's out of scope, a paragraph explaining how to use byteplay with a code_transformer, and why it isn't integrated into the proposal, might be helpful.
> 
> byteplay doesn't seem to be maintained anymore. Last commit in 2010...

There's a byteplay3 fork, which is maintained. But it doesn't support 3.5 yet. (As I mentioned, it's usually a few months to a few years behind each new Python release. Which is one reason integrating parts of it into the core might be nice. The dis module changes in 3.4 were basically integrating part of byteplay, and that part has paid off--the code in dis is automatically up to date with the compiler. There may be more you could do here. But probably it's out of scope for your project.)

> IHMO you can do the same than byteplay on the AST with much simpler
> code.

If that's really true, then you shouldn't include code_transformers in the PEP at all. You're just making things more complicated, in multiple ways, to enable a feature you don't think anyone will ever need.

However, based on my own experience, I think code transformers _are_ sometimes useful, but they usually require something like byteplay. Even just something as simple as removing an unnecessary jump instruction requires recomputing the arguments of every other jump; something like merging two finally blocks would be a nightmare to do manually.
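You can see why with the dis module: every instruction lives at an absolute offset, and jump arguments are expressed in terms of those offsets (directly or as relative deltas, depending on the opcode and Python version), so deleting even one instruction shifts everything after it:

```python
import dis

def f(x):
    if x:
        return 1
    return 2

# Each instruction has an absolute offset, and jumps target those
# offsets, so removing any instruction means recomputing the argument
# of every jump that crosses the removed position.
for ins in dis.get_instructions(f):
    print(ins.offset, ins.opname, ins.argrepr)
```

byteplay's whole job is to abstract those offsets away into labels so you can insert and delete instructions freely.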

>> * One thing I've always wanted is a way to write decorators that transform at the AST level. But code objects only have bytecode and source;
> 
> You should take a look at MacroPy,

Yes, I love MacroPy. But it doesn't provide the functionality I'm asking about here. (It _might_ be possible to write a macro that stores the AST on each function object; I haven't tried.)

Anyway, the reason I bring it up is that it's trivial to write a decorator that byteplay-hacks a function after compilation, and not much harder to write one that text-hacks the source and recompiles it, but taking the AST and recompiling it is more painful. Since your proposal is about making similar things easier in other cases, it could be nice to do that here as well. But, as I said at the top, I realize some of these ideas are out of scope; some of them are more about getting a definite "yeah, that might be cool but it's out of scope" as opposed to not knowing whether it had even been considered.  
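For contrast, the "trivial" post-compilation hack looks something like this on current Pythons (a sketch using CodeType.replace, which needs 3.8+; swapping a constant stands in for a real bytecode transform):

```python
import types

def swap_const(old, new):
    """Sketch of a decorator that hacks the finished code object:
    swap one constant, the kind of edit that is easy after compilation
    (unlike getting back to the AST)."""
    def deco(func):
        consts = tuple(new if c == old else c for c in func.__code__.co_consts)
        return types.FunctionType(
            func.__code__.replace(co_consts=consts),
            func.__globals__,
            func.__name__,
            func.__defaults__,
            func.__closure__,
        )
    return deco

@swap_const(1, 2)
def answer():
    return 1
```

Nothing like that one-liner exists for "give me the AST this function was compiled from", which is the gap I'm asking about.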

> Modifying and recompiling the code at runtime (using AST, something
> higher level than bytecode) sounds like a Lisp feature and like JIT
> compiler, two cool stuff ;)

Well, part of the point of Lisp is that there is only one step--effectively, your source bytes are your AST. Python has to decode, tokenize, and parse to get to the AST. But being able to start there instead of repeating that work would give us the best of both worlds (as easy to do stuff as Lisp, but as readable as Python). 

