[Python-ideas] If branch merging

Wed Jun 10 05:27:38 CEST 2015

On Wed, Jun 10, 2015 at 11:58 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
>> How hard would it be to hack the bytecode compiler to treat two names
>> as distinct despite appearing the same?
>
> Here's a quick&dirty idea that might work: Basically, just gensym a name like .0 for the second e (as is done for comprehensions), compile as normal, then rename the .0 back to e in the code attributes.
>

That's something like what I was thinking of, yeah.

> The problem is how to make this interact with all kinds of other stuff. What if someone calls locals()?

Ow, that one I have no idea about. Hmm. That could be majorly
problematic; if you call locals() inside the inner scope, and then use
that dictionary outside it, you should expect it to work. This would
be hard.

> What if the outer e was nonlocal or global?

The inner e will always get its magic name, and it doesn't matter what
the outer e is. That's exactly the same as would happen if there were
no shadowing:

>>> def f(x):
...     global e
...     try: 1/x
...     except ZeroDivisionError as e: pass
...     return e**x
...
>>> e=2.718281828
>>> f(3)
20.085536913011932
>>> f(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in f
NameError: name 'e' is not defined

If x is nonzero, the except clause doesn't happen, and no shadowing
happens. With this theory, the same would happen if x is zero - the
"as e" would effectively be "as <e.0>" or whatever the magic name is,
and then "e**x" would use the global e.

It would have to be an error to use a global or nonlocal statement
*inside* the as-governed block:

def f(x):
    try:
        whatever
    except Exception as e:
        global e # SyntaxError

I can't imagine that this would be a problem to anyone. The rule is
that "as X" makes X into a statement-local name, and that's
incompatible with a global declaration.
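
For comparison, a small check of what the compiler does today with a
global statement inside an except block. This is only a sketch: it
assumes CPython 3.6 or later, where declaring a name global after it
has already been bound in the same function is rejected at compile
time. The file name "<example>" is just a placeholder.

```python
# Assumption: CPython 3.6+, where "global e" after "as e" has already
# bound e in the same function is a compile-time SyntaxError.
source = """
def f(x):
    try:
        1 / x
    except Exception as e:
        global e
"""

try:
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError as err:
    rejected = True
    print("compiler said:", err.msg)

print("rejected:", rejected)
```

So the proposed rule would not make any currently valid code illegal
on those versions; the reason for the error would just change.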

> What if either e is referenced by an inner function?

I don't know about internals and how hard it'd be, but I would expect
that the as-name propagation should continue into the function. A
quick check with dis.dis() suggests that CPython uses the
LOAD_DEREF/STORE_DEREF bytecodes to work with nonlocals, so those
might have to become scope-aware too. (It would be based on
definition, not call, so it should be able to be compiled in somehow,
but I can't say for sure.)
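
The dis check mentioned above can be reproduced along these lines. The
functions outer and inner are made up for the illustration, and the
exact opcode list varies between CPython versions, so the sketch only
filters for opcodes with DEREF in their name.

```python
import dis


def outer():
    e = 2.718

    def inner():
        nonlocal e
        e = e + 1      # reads and rebinds the enclosing function's e
        return e

    return inner


inner = outer()

# Collect the opcodes that go through a closure cell rather than a
# plain local slot.
opnames = [ins.opname for ins in dis.get_instructions(inner)]
deref_ops = [name for name in opnames if "DEREF" in name]

print(deref_ops)
print(inner.__code__.co_freevars)   # names inner takes from outer
print(outer.__code__.co_cellvars)   # names outer shares with inner
```

The cell is chosen when the function is compiled, which is why an
as-scoped name would need to be resolved at definition time as well.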

> What if another statement re-rebinds e inside the first statement?

As in, something like this?

def f(x):
    e = 2.718
    try: 1/0
    except Exception as e:
        e = 1
    print(e)

The "e = 1" would assign to <e.0>, because it's in a scope where the
local name e translates into that. Any use of that name, whether
rebinding or referencing, will use the inner scope. But I would expect
this sort of thing to be unusual.

> What if you do this inside a class (or at top level)?

At top level, it would presumably have to create another global. If
you call a function from inside that block, it won't see your
semi-local, though I'm not sure what happens if you _define_ a
function inside a block like that:

with open("spam.log", "a") as logfile:
    def log(x):
        logfile.write(x)

Given that this example wouldn't work anyway (the file would get
closed before the function gets called), and I can't think of any
non-trivial examples where you'd actually want this, I can't call what
ought to happen.
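
To show why the example would not work today, here is a
self-contained variant. It swaps the real file for an io.StringIO
(an assumption made purely so the snippet needs no file on disk); a
closed StringIO refuses writes in the same way a closed file does.

```python
import io

with io.StringIO() as logfile:
    def log(x):
        logfile.write(x)

    log("inside the block\n")   # fine, the stream is still open

# The name logfile is still bound here, but the object is closed.
try:
    log("after the block\n")
    outcome = "written"
except ValueError:
    outcome = "ValueError"

print(outcome)
```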

> I think for a quick hack to play with this, you don't have to worry about any of those issues; just say that's illegal, and whatever happens (even a segfault) is your own fault for trying it. But for a real implementation, I'm not even sure what the rules should be, much less how to implement them.
>

Sure, for a quick-and-dirty hack. I think some of those will stay
illegal long-term too.

> (I'm guessing the implementation could either involve having a stack of symbol tables, or tagging things at the AST level while we've still got a tree and using that info in the last step, but I think there's still a problem telling the machinery how to set up closure cells to link inner functions' free variables.)
>

I have no idea about the CPython internals, but my broad thinking is
something like this: You start with an empty stack, and add to it
whenever you hit an "as" clause. Whenever you look up a name, you
proceed through the stack from newest to oldest; if you find the name,
you use the mangled name from that stack entry. Otherwise, you use the
same handling as current.
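
A rough model of that lookup in plain Python, not in terms of the real
compiler. Everything here is hypothetical: the AsScopeStack class, its
method names, and the "<e.0>" style of mangled name are invented for
the illustration and do not correspond to anything in CPython.

```python
class AsScopeStack:
    """Hypothetical compile-time helper tracking names bound by 'as'."""

    def __init__(self):
        self._stack = []     # (name, mangled name) pairs, newest last
        self._counter = 0    # keeps every mangled name unique

    def push(self, name):
        """Enter an 'as NAME' block and return its mangled name."""
        mangled = f"<{name}.{self._counter}>"
        self._counter += 1
        self._stack.append((name, mangled))
        return mangled

    def pop(self):
        """Leave the innermost 'as' block."""
        return self._stack.pop()

    def resolve(self, name):
        """Search newest to oldest; fall back to the unmangled name."""
        for original, mangled in reversed(self._stack):
            if original == name:
                return mangled
        return name


scopes = AsScopeStack()
outside = scopes.resolve("e")    # no 'as' block yet
scopes.push("e")                 # except ... as e:
inside = scopes.resolve("e")
scopes.push("e")                 # a nested 'as e' shadows the first
nested = scopes.resolve("e")
other = scopes.resolve("x")      # untouched names pass through
scopes.pop()
scopes.pop()
after = scopes.resolve("e")      # the outer e is visible again

print(outside, inside, nested, other, after)
```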

> Also, all of this assumes that none of the machinery, even for tracebacks and debugging, cares about the name of the variable, just its index. Is that true?
>

I'm not entirely sure, but I think that tracebacks etc will start with
the index and then look it up. Having duplicate names in co_varnames
would allow them to look correct. Can someone confirm?

>> Example:
>>
>> def f(x):
>>    e = 2.718281828
>>    try:
>>        return e/x
>>    except ZeroDivisionError as e:
>>        raise ContrivedCodeException from e
>>
>> Currently, f.__code__.co_varnames is ('x', 'e'), and all the
>> references to e are working with slot 1; imagine if, instead,
>> co_varnames were ('x', 'e', 'e') and the last two lines used slot 2
>> instead. Then the final act of the except clause would be to unbind
>> its local name e (slot 2), and then any code after the except block
>> would use slot 1 for e, and the original value would "reappear".
>
> I don't think that "unbind" is a real step that needs to happen. The names have to get mapped to slot numbers at compile time anyway, so if all code outside of the except clause was compiled to LOAD_FAST 1 instead of LOAD_FAST 2, it doesn't matter that slot 2 has the same name. The only thing you need to do is the existing implicit "del e" on slot 2. (If you somehow managed to do another LOAD_FAST 2 after that, it would just be an UnboundLocalError, which is fine. But no code outside the except clause can compile to that anyway, unless there's a bug in your idea of its implementation or someone does some byteplay stuff).
>

The unbind is there to prevent a reference loop from causing problems.
And yes, it's effectively the implicit "del e" on slot 2.
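
For reference, the current single-slot behaviour can be observed
directly. This sketch is a variation on the quoted example (the except
body is replaced with pass and a trailing return is added, both changes
made only for the demonstration).

```python
def f(x):
    e = 2.718281828
    try:
        return e / x
    except ZeroDivisionError as e:
        pass
    # The implicit "del e" at the end of the except block has already
    # run by this point, and it hit the only slot there is for e.
    return e


varnames = f.__code__.co_varnames
print(varnames)

try:
    f(0)
    result = "returned"
except UnboundLocalError:
    result = "UnboundLocalError"

print(result)
```

With a separate slot for the as-name, the final "return e" would see
the original 2.718281828 again instead of failing.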

ChrisA