[Python-Dev] PEP 563: Postponed Evaluation of Annotations

Nick Coghlan ncoghlan at gmail.com
Mon Nov 6 00:55:12 EST 2017


On 6 November 2017 at 14:40, Lukasz Langa <lukasz at langa.pl> wrote:
> On 4 Nov, 2017, at 6:32 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> The only workaround I can see for that breakage is that instead of
>> using strings, we could instead define a new "thunk" type that
>> consists of two things:
>
>> 1. A code object to be run with eval()
>> 2. A dictionary mapping from variable names to closure cells (or None
>> for not yet resolved references to globals and builtins)
>
> This is intriguing.
>
> 1. Would that only be used for type annotations? Any other interesting
> things we could do with them?

Yes, they'd have the potential to replace strings for at least some
data analysis use cases, where passing in lambdas is too awkward
syntactically, since you have to spell out all the parameters.

The pandas.DataFrame.query operation is a reasonable example of that
kind of thing: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html
(Not an exact match, since pandas uses a Python-like expression
language rather than Python itself)

Right now, folks tend to use strings for this kind of use case, which
has the same performance problem that pre-f-string string formatting
does: it defers the expression parsing and compilation step until
runtime, rather than being able to do it once and then cache the
result in __pycache__.
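
A quick sketch of that cost difference (the function names here are my
own invention, purely for illustration): a string defers parsing and
compilation to every evaluation, while a precompiled code object pays
that cost once, the way a cached __pycache__ entry would:

```python
expr = "a + b * 2"

def eval_string(ns):
    # Parses *and* compiles the expression on every single call
    return eval(expr, ns)

# Compile once up front; only the evaluation step is repeated afterwards
code = compile(expr, "<expr>", "eval")

def eval_compiled(ns):
    return eval(code, ns)

ns = {"a": 1, "b": 2}
assert eval_string(ns) == eval_compiled(ns) == 5
```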

> 2. It feels to me like that would make annotations *heavier* at runtime
> instead of leaner, since now we're forcing the relevant closures to stay in
> memory.

Cells are pretty cheap (they're just a couple of pointers), and if
they're references to module or class attributes, the object
referenced by the cell would have remained alive regardless.

Even for nonlocal variable references (which a solely string-based
approach would disallow), the referenced objects will already be
getting kept alive anyway by way of the typing machinery.

> 3. This form of lazy evaluation seems pretty implicit to me for the reader.
> Peter Ludemann's example of magic logging.debug() is a case in point here.

One of the biggest advantages though is that just like functions, all
of the necessary logic for doing the delayed evaluation can be
captured in a __call__ method, rather than via elaborate instructions
on how to appropriately invoke eval() based on knowledge of where the
annotation came from.

This is especially important if typing gets taken out of the standard
library: you'll need a replacement for typing.get_type_hints() in PEP
563, and a thunk.__call__() method would be a good spelling for that.
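
As a rough sketch of the shape I have in mind (my own strawman, not a
concrete proposed API), the thunk would pair an eagerly compiled code
object with the namespaces needed to evaluate it later:

```python
class Thunk:
    def __init__(self, source, globalns, localns=None):
        # Compile eagerly, so syntax errors surface at definition time
        self._code = compile(source, "<annotation>", "eval")
        self._globals = globalns
        self._locals = localns if localns is not None else {}

    def __call__(self):
        # Delayed evaluation, using ordinary name resolution rules
        return eval(self._code, self._globals, self._locals)

# A get_type_hints() replacement could then simply call the thunk:
annotation = Thunk("tuple", globals())
assert annotation() is tuple
```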

> All in all, unless somebody else is ready to step up and write the PEP on
> this subject (and its implementation) right now, I think this idea will miss
> Python 3.7.

As long as we don't argue for that being an adequate excuse to rush
into "We're using plain strings with ill-defined name resolution
semantics because we couldn't be bothered coming up with a proper
thunk-based design to evaluate", I'd be fine with that. None of this
is urgent, and it's mainly of interest to large organisations that
will see a direct economic benefit from implementing it, so the entire
idea can easily be delayed to 3.8 if they're not prepared to fund a
proper evaluation of the available design options over the next 3
months.

Python's name resolution rules are already ridiculously complicated,
and PEP 563 is proposing to make them *even worse*, purely for the
sake of an optional feature primarily of interest to large enterprise
users. If delayed evaluation of type annotations is deemed important
enough to burden every future Pythonista with learning a second set of
name resolution semantics purely for type annotations, then it's
important enough to postpone implementing it until someone invests the
time in coming up with a competing thunk-based alternative that is
able to rely entirely on the *existing* name resolution semantics.

Exploring that potential thunk-based approach a bit further:

1. We'd continue to eagerly compile annotations (as we do today), but
treat them like a nested class body with a single expression. Unlike
an implicit lambda, this compilation mode will allow the resulting
code object to be used with the two-argument form of the exec builtin
2. That code object would be the main item stored on the thunk object
3. If __classcell__ is defined in the current namespace and names from
the current namespace are referenced, then that can be captured on the
thunk, giving its __call__ method access to any class attributes
needed for name resolution
4. Closure references would be captured automatically, but class
bodies already allow locals to override nonlocals (for compatibility
with pre-populated namespaces returned from __prepare__)
5. A thunk's __globals__ reference would be implicitly captured the
same way it is for a regular function
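
The key property in point 1 can be shown with today's builtins (this
is just an illustration of the compilation mode, not the proposed
machinery): compiling the expression in "eval" mode produces dynamic
name lookups, so a class namespace can be supplied as the locals
mapping at evaluation time, unlike a lambda, whose body would skip the
class scope entirely:

```python
# Compiled this way, "field" becomes a dynamic (LOAD_NAME) lookup
code = compile("field * 2", "<annotation>", "eval")

class_ns = {"field": 21}  # stand-in for a captured class namespace
assert eval(code, {}, class_ns) == 42
```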

That's enough to leave nested classes as the main problematic case,
since they can't see each other's attributes by default, and the
proposed injected locals semantics in PEP 563 don't get this right
either (they only account for MRO-based name resolution, not lexical
nesting, even though the PEP claims the latter is expected to work).

To illustrate the problem:

```
>>> class C:
...     field = 1
...     class D:
...         field2 = field
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in C
  File "<stdin>", line 4, in D
NameError: name 'field' is not defined
```

The __class__ ref used for zero-arg super support doesn't currently
solve this problem, as right now, it only extends a single level - the
inner class definition hides the outer one from method implementations
(and deliberately so).
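
You can see that single-level behaviour directly:

```python
class Outer:
    class Inner:
        def which(self):
            # __class__ (the cell behind zero-arg super()) resolves to
            # the *innermost* enclosing class, so it offers no path
            # from a nested class's methods back to the outer class
            return __class__

assert Outer.Inner().which() is Outer.Inner
```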

There are two main ways of resolving this, with the simplest being to
say that type annotations still need to be resolvable using normal
closure semantics. That is, the nested class example in the PEP would
be changed as follows:

```
# C is defined at module or function scope, not inside another class
class C:
    field = 'c_field'

    def method(self, arg: field) -> None:  # this is OK
        ...

    def method2(self, arg: C.field) -> None:  # this is OK
        ...

    class D:
        field2 = 'd_field'
        def method(self, arg: C.field) -> C.D.field2:  # this is OK
            ...

        def method2(self, arg: C.field) -> field2:  # this is OK
            ...

        def method3(self, arg: field) -> field2:  # this fails (can't find 'field')
            ...

        def method4(self, arg: C.field) -> D.field2:  # this fails (can't find 'D')
            ...

```

This means the compiler needs to be involved at least enough to
capture references to classes that aren't defined at the top level of
a module.

If you *don't* use existing closure semantics to solve it, then you'd
instead need to either update the compiler to capture a stack of
__class__ references, or else reverse engineer something based on
__qualname__. However, the latter approach wouldn't work for classes
defined inside a function (since there's no navigation path from the
module namespace back down to the individual classes - you *need* a
cell reference in that case).
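
Concretely:

```python
def make():
    class Inner:
        pass
    return Inner

cls = make()
# The "<locals>" component isn't a real attribute, so there is no
# navigation path from the module namespace down to the class object;
# only a captured cell reference could reach Inner from here
assert cls.__qualname__ == "make.<locals>.Inner"
```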

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

