[Python-ideas] History on proposals for Macros?

Mon Mar 30 02:36:00 CEST 2015

> On Mar 29, 2015, at 10:21, Ron Adam <ron3200 at gmail.com> wrote:
> 
>> On 03/28/2015 09:51 PM, Steven D'Aprano wrote:
>> But we might be able to rescue this proposal by dropping the requirement
>> that the compiler knows when to pass the syntax tree and when to
>> evaluate it. Suppose instead we had a lightweight syntax for generating
>> the AST plus grabbing the current context:
>> 
>>     x = 23
>>     spam(x + 1, !(x+1))  #  macro syntax !( ... )
>> 
>> 
>> Now the programmer is responsible for deciding when to use an AST and
>> when to evaluate it, not the compiler, and "macros" become regular
>> functions which just happen to expect an AST as their argument.
> 
> Something related to this that I've wanted to experiment with, but which is hard to do in Python, is being able to split a function's signature and body and use them independently. (But in a well defined way.)

Almost everything you're asking for is already there.

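A function object contains, among other things, a sequence of closure cells, a reference to its global namespace, default parameter values, and a code object. (The local environment is not part of the function; it belongs to the frame created for each call.)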

A code object contains, among other things, parameter names, a count of locals, and a bytecode string.

You can see the attributes of these objects at runtime, and the inspect module docs describe what they mean. You can also construct these objects at runtime by using their constructors (you have to use types.FunctionType and types.CodeType; the built-in help can show you the parameters).
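
A minimal sketch of that kind of poking around, assuming CPython 3. The names make_adder, add, and clone are made up for illustration; the attributes and types.FunctionType are the real ones described in the inspect and types docs.

```python
import types

def make_adder(n):
    def add(x, y=10):
        total = x + y + n
        return total
    return add

add = make_adder(1)
code = add.__code__

print(code.co_varnames)                  # parameter names first, then other locals
print(code.co_argcount, code.co_nlocals) # number of positional params, number of locals
print(add.__defaults__)                  # default parameter values
print(add.__closure__[0].cell_contents)  # the value held in the cell for n

# Build a second function object from the same pieces.
clone = types.FunctionType(
    code, add.__globals__, "clone", add.__defaults__, add.__closure__
)
print(clone(1))
```

The clone shares the code object and the closure cell with the original; only the function object itself is new.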

You can also compile source code (or an AST) to a code object with the compile function.

You can call a code object with the exec function, which takes a namespace (or, optionally, separate local and global namespaces--and, in a slightly hacky way, you can also override the builtin namespace).
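
For example, here is a rough sketch of compiling a fragment from source and from an AST and running it against explicit namespaces. The "<fragment>" filenames and the variable names are arbitrary choices of mine.

```python
import ast

source = "x += 1"

# Compile from source text and run it twice against one namespace dict.
code = compile(source, "<fragment>", "exec")
ns = {"x": 0}
exec(code, ns)
exec(code, ns)
print(ns["x"])

# Compile from an AST instead, and pass separate globals and locals.
tree = ast.parse(source)
code_from_ast = compile(tree, "<fragment-ast>", "exec")
globs, locs = {}, {"x": 41}
exec(code_from_ast, globs, locs)
print(locs["x"])
```

Note that this only works because module-level code looks names up by name in the dicts you pass; code compiled as a function body behaves differently.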

There are also Signature objects in the inspect module, but they're not "live" usable objects, they're nicely-organized-for-human-use representations of the signature of a function. So practically you'd use a dummy function object or just a dict or something, and create a new function from its attributes/members and the new code.
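
As a sketch of the "dummy function or just a dict" idea, assuming that exec-mode code objects are an acceptable stand-in for the proposed "code" objects: a Signature can bind arguments into a mapping, and that mapping can then serve as the namespace for separately compiled bodies. The names sig_x, inc_x, and dec_x follow the quoted proposal; everything else is illustrative.

```python
import inspect

def sig_x(x):
    pass  # dummy function that only carries the signature

sig = inspect.signature(sig_x)
bound = sig.bind(0)              # matches arguments to parameters
names = dict(bound.arguments)    # a plain namespace dict: {'x': 0}

inc_x = compile("x += 1", "<inc_x>", "exec")
dec_x = compile("x -= 1", "<dec_x>", "exec")

exec(inc_x, {}, names)
exec(inc_x, {}, names)
exec(dec_x, {}, names)
print(names["x"])
```

This is the dictionary-based version, so it has none of the macro-like behavior; it only shows that the signature and body pieces can be handled separately today.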

So, except for minor spelling differences, that's exactly what you're asking for, and it's already there.

But if I've guessed right about _why_ you want this, it doesn't do what you'd want it to, and I don't think there's any way it could.

Bytecode accesses locals (including arguments), constants, and closure cells by index from the frame object, not by name from a locals dict (although the frame has one of those as well, in case you want to debug or introspect, or call locals()). So, when you call a function, Python sets up the frame object, matching positional and keyword arguments (and default values in the function object) up to parameters and building up the sequence of locals. The frame is also essential for returning and uncaught exceptions (it has a back pointer to the calling frame).
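
You can see the index-based access with the dis module. A small sketch (the function f is just an example; exact opcode names vary between CPython versions, so I only filter on "FAST"):

```python
import dis

def f(a, b):
    c = a + b
    return c

code = f.__code__
print(code.co_varnames)  # the slot order: index 0 is a, 1 is b, 2 is c

# The "fast" opcodes address locals by slot index (ins.arg),
# and dis resolves that index back to a name (ins.argval) for display.
for ins in dis.get_instructions(f):
    if "FAST" in ins.opname:
        print(ins.opname, ins.arg, ins.argval)
```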

The big thing you can't do directly is to create new closure cells programmatically from Python. The compiler has to know which of your locals will be used as closure variables by any embedded functions; it then stores these specially within your code object, so the MAKE_CLOSURE bytecode that creates a function object out of each embedded function can create matching closure cells to store in the embedded function object. This is the part that you need to add into what Python already has, and I'm not sure there's a clean way to do it.
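
To see what the compiler records, compare co_cellvars on the outer code object with co_freevars on the inner one. This is only an illustrative sketch; outer and inner are made-up names.

```python
def outer():
    x = 0
    y = 0  # never used by inner, so it stays an ordinary local

    def inner():
        return x

    return inner

inner = outer()

print(outer.__code__.co_cellvars)          # locals of outer that need cells
print(inner.__code__.co_freevars)          # names inner expects to find in cells
print(inner.__closure__[0].cell_contents)  # the current value stored in the cell
```

Only x gets a cell, because the compiler saw inner refer to it; y never shows up in either tuple.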

But you really should learn how all the existing stuff works (the inspect docs, the dis module, and the help for the constructors in the types module are actually sufficient for this, without having to read the C source code, in 3.4 and later) and find out for yourself, because if I'm wrong, you may come up with something cool. (Plus, it's fun and useful to learn about.)

That only gets you through the first half of your message--enough to make inc_x work as a local function (well, almost--without a "nonlocal x" statement it's going to compile x as a local variable rather than a closure variable, and however you execute it, you're just going to get UnboundLocalError).
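
A quick way to confirm this, written as an ordinary nested function rather than with the proposed decorators:

```python
def outer():
    x = 0

    def inc_x():
        # No "nonlocal x": the assignment makes x a local of inc_x.
        x += 1

    code = inc_x.__code__
    try:
        inc_x()
    except UnboundLocalError:
        outcome = "UnboundLocalError"
    else:
        outcome = "ok"
    return code.co_varnames, code.co_freevars, outcome

result = outer()
print(result)
```

The code object lists x among its own locals and has no free variables at all, so there is no cell for it to share with outer.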

What about the second part, where you execute code in an existing frame?

That's even trickier. A frame already has its complete lists of locals, cells, and constants at construction time. If your code objects never used any new locals or constants, and never touched any nonlocal variables that weren't already touched by the calling function, all you'd need is some kind of "relocation" step that remaps the indices compiled into the bytecode onto the indices in the calling frame. There's enough info in the code objects to do the mapping. For actually creating the relocated bytecode from the original, you'll want something like the byteplay module, which unfortunately doesn't exist for 3.4 (although I have an incomplete port that might be good enough to play with if you're interested).

You can almost get away with the "no new locals or nonlocal cells" part, but "no new constants" is pretty restrictive. For example, if you compile inc_x into a fragment that can be executed inline, the number 1 is going to be constant #0 in its code object. And now, you try to "relocate" it to run in a frame with a different code object, and (unless that different code object happened to refer to 1 as a constant as well) there's nothing to match it up to.
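
Here is a sketch of the mismatch. I use 1000 instead of 1 only because I'm assuming some CPython versions may encode very small integers directly in the bytecode rather than in co_consts; the point is the same either way. The names fragment and host are hypothetical.

```python
# The fragment we would like to run inline.
fragment = compile("x += 1000", "<inc_x>", "exec")

# A calling function whose own code never mentions that constant.
def host():
    x = 0
    return x

print(fragment.co_consts)              # the fragment's own constant table
print(host.__code__.co_consts)         # the host's constant table
print(1000 in host.__code__.co_consts) # nothing for the fragment to map onto
```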

And again, I don't see a way around this without an even more drastic rearchitecting of how Python frames work--but again, I think it's worth looking for yourself in hopes that I'm wrong.

There's another problem: every function body compiled to code ends with a return bytecode. If you didn't write one explicitly, you get the equivalent of "return None". That means that, unless you solve that in some way, executing a fragment inline is always going to return from the calling function. And what do you want to do about explicit return? Or break and continue?
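
You can check the implicit return with dis. A small sketch (the exact opcode name differs between CPython versions, so I only look at its prefix):

```python
import dis

def inc(ns):
    ns["x"] += 1   # no explicit return statement

last = list(dis.get_instructions(inc))[-1]
print(last.opname)        # some RETURN_* opcode
print(inc({"x": 0}))      # the implicit return value

# A fragment compiled in "exec" mode ends the same way.
fragment = compile("x += 1", "<fragment>", "exec")
frag_last = list(dis.get_instructions(fragment))[-1]
print(frag_last.opname)
```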

Of the two approaches, I think the first one seems cleaner. If you can make a closure cell out of x and then wrap inc_x's code in a normal closure that references it, that still feels like Python. (And having to make "nonlocal x" explicit seems like a good thing, not a limitation.) Fixing up fragments to run in a different frame, and modifying frames at runtime to allow them to be fixed up, seems a lot hackier. And the whole return issue is pretty serious, too.
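
For comparison, this is roughly what the closure version looks like when written by hand in today's Python, with sig_x acting as the thing that creates the cell. The get_x helper is my addition so the value can be read back.

```python
def sig_x(x):
    def inc_x():
        nonlocal x
        x += 1

    def dec_x():
        nonlocal x
        x -= 1

    def get_x():
        return x

    return inc_x, dec_x, get_x

inc_x, dec_x, get_x = sig_x(0)
inc_x()
inc_x()
dec_x()
print(get_x())

# Both bodies share one and the same closure cell for x.
print(inc_x.__closure__[0] is dec_x.__closure__[0])
```

What's missing from the language is a way to attach bodies like inc_x to that cell after the fact, instead of having to define them inside sig_x.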

One last possibility to consider is something between the two: a different kind of object, defined differently, like a proc in Ruby (which is defined with a block rather than a def) might solve some of the problems with either approach. And stealing from Ruby again, procs have their own "mini-frames"; there's a two-level stack where every function stack frame has a proc stack frame, which allows a solution to the return-value problem that wouldn't be available with either closures or fragments. (However, note that the return-value problem is much more serious in Ruby, where everything is supposed to be an expression, with a value; in Python you can just say "fragment calls are statements, so they don't have values" if that's what you want.)

One last note inline:

> A signature object could have a default body that returns the closure.
> 
> And a body (or code) could have a default signature that *takes* a namespace.
> 
> 
> Then a function becomes ...
> 
>     code(sig(...))    <--->   function(...)
> 
> 
> 
> The separate parts could be created with a decorator.
> 
>    @signature
>    def sig_x(x): pass
> 
>    @code
>    def inc_x(): x += 1
> 
>    @code
>    def dec_x(): x -= 1
> 
> 
> In most cases it's best to think of applying code bodies to namespaces.
> 
>    names = sig_x(0)
>    inc_x(names)
>    dec_x(names)
> 
> That is nicer than continuations as each code block is a well defined unit that executes to completion and doesn't require suspending the frame.
> 
> (Yes, it can be done with dictionaries, but that wouldn't give the macro-like functionality (see below) this would.  And there may be other benefits to having it at a lower, more efficient level.)
> 
> 
> To allow macro like ability a code block needs to be executable in the current scope.  That can be done just by doing...
> 
>    code(locals())      #  Dependable?
> 
> 
> And sugar to do that could be...
> 
>    if x < 10:
>       ^^ inc_x   #just an example syntax.
>    else:
>       ^^ dec_x   # Note the ^^ looks like the M in Macro. ;-)
> 
> 
> Possibly the decorators could be used with lambda directly to get inline functionality.
> 
>    code(lambda : x + 1)

This is a very different thing from what you were doing above. A function that modifies a closure cell's value, like inc_x, can't be written as a lambda (because assignments are statements). And this lambda is completely pointless if you're going to use it in a context where you ignore its return value (like the way you used inc_x above). So, I'm not sure what you're trying to do here, but I think you may have another problem to solve on top of the ones I already mentioned.
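
The "can't be written as a lambda" part is easy to verify. The compiles helper below is just something I made up for the check:

```python
def compiles(src):
    """Return True if src is a valid expression, False on SyntaxError."""
    try:
        compile(src, "<test>", "eval")
    except SyntaxError:
        return False
    return True

print(compiles("lambda: x + 1"))   # an expression body is fine
print(compiles("lambda: x += 1"))  # augmented assignment is a statement
```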

> And a bit of sugar to shorten the common uses if needed.
> 
>    spam(x + 1, code(lambda : x + 1))
> 
>    spam(x + 1, ^^: x + 1)
> 
> 
> Cheers,
>   Ron
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/