[Python-ideas] PEP 511: Add a check function to decide if a "language extension" code transformer should be used or not

Wed Jan 27 10:39:10 EST 2016

Hi,

Thank you for all the feedback on my PEP 511. It looks like the current
blocker is the unclear status of "language extensions": code
transformers which deliberately change the Python semantics. I would
like to discuss how we should register them. I think that the PEP 511
must discuss "language extensions" even if it doesn't have to propose
a solution to make their usage easier. It's an obvious usage of code
transformers. If possible, I would like to find a compromise to
support them, but make it explicit that they change the Python
semantics.

By the way, I discussed with Joseph Jevnik who wrote codetransformer
(bytecode transformer) and lazy_python (AST transformer). He wrote me:

"One concern that I have though is that transformers are registered
globally. I think that the decorators in codetransformer do a good job
of signalling to reader the scope of some new code generation."

Currently, PEP 511 doesn't provide a way to register a code
transformer and have it used only under some conditions. For example,
if fatoptimizer is registered, all .pyc files will be called
file.cpython-36.fat-0.pyc even if fatoptimizer is disabled.

I propose to change the design of sys.set_code_transformers() so that
it works more like a registry, similar to the codecs registry
(codecs.register) but not identical (details below). One difference is that
the codecs registry uses a mapping (codec name => codec functions),
whereas sys.set_code_transformers() uses an ordered sequence (list) of
code transformers. A sequence is used because multiple code
transformers can be applied sequentially on a single .py file.
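
A rough sketch of why an ordered sequence matters. Everything here is
hypothetical and only for illustration: the Tagger class, its
transform() method and the apply_transformers() helper are not part of
the PEP 511 API, and the transformers work on source strings just to
keep the example short.

```python
# Hypothetical sketch: transformers kept in a list and applied in order.
class Tagger:
    """Toy transformer that appends a marker comment to the source."""

    def __init__(self, name):
        self.name = name

    def transform(self, source):
        return f"{source}  # {self.name}"


def apply_transformers(transformers, source):
    # Each transformer receives the output of the previous one,
    # so the order of the list is significant.
    for transformer in transformers:
        source = transformer.transform(source)
    return source


first_then_second = apply_transformers([Tagger("a"), Tagger("b")], "x = 1")
second_then_first = apply_transformers([Tagger("b"), Tagger("a")], "x = 1")
```

With a mapping (name => transformer) there would be no defined order,
so the two results above could not be told apart.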

Petr Viktorin wrote that language extensions "target specific modules,
with which they're closely coupled: The modules won't run without the
transformer. And with other modules, the transformer either does
nothing (as with MacroPy, hopefully), or would fail altogether (as
with Hy). So, they would benefit from specific packages opting in. The
effects of enabling them globally range from inefficiency (MacroPy) to
failures or needing workarounds (Hy)."

Problem (A): the solutions proposed below don't make code transformers
mandatory. If some code *requires* a code transformer and the code
transformer is not registered, Python doesn't complain. Do you think
that it is a real issue in practice? For MacroPy, it's not a problem
in practice since functions must be decorated using a decorator from
the macropy package. If importing macropy fails, the module cannot be
imported.

Problem (B): the solutions proposed below add markers to ask to enable a
specific code transformer, but a code transformer can decide to always
modify the Python semantics without using such a marker. According to
Nick Coghlan, code transformers changing the Python semantics *must*
require a marker in the code using them. IMHO it's the responsibility
of the author of the code transformer to use markers, not the
responsibility of Python.

Code transformers should maybe return a flag telling if they changed
the code or not. I prefer a flag rather than comparing the output to
the input, since the comparison can be expensive, especially for a
deep AST tree. Example:

class Optimizer:
    def ast_optimizer(self, tree, context):
        modified = False
        # ... transform tree; set modified = True on any change ...
        return modified, tree

*modified* must be True if tree was modified.
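
A minimal, runnable sketch of the flag idea, using only the standard
ast module. The ConstantFolder class is a toy written for this example
(it only folds "<int> + <int>") and is not taken from fatoptimizer; the
(modified, tree) return value follows the proposal above, not an
existing API.

```python
import ast


class ConstantFolder(ast.NodeTransformer):
    """Toy transformer: folds `<int> + <int>` and records any change."""

    def __init__(self):
        self.modified = False

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested operands first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and isinstance(node.left.value, int)
                and isinstance(node.right.value, int)):
            self.modified = True
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


class Optimizer:
    def ast_optimizer(self, tree, context):
        folder = ConstantFolder()
        tree = folder.visit(tree)
        ast.fix_missing_locations(tree)
        # The flag is set during the traversal, so no tree comparison is needed.
        return folder.modified, tree


changed, new_tree = Optimizer().ast_optimizer(ast.parse("x = 1 + 2"), None)
unchanged, _ = Optimizer().ast_optimizer(ast.parse("x = y"), None)
```

The flag costs one attribute write per change, whereas comparing the
input and output trees would require a second full traversal.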

There are several options to decide if a code transformer must be used
on a specific source file.

(1) Add check_code() and check_ast() functions to code transformers.
The code transformer is responsible for deciding whether it wants to
transform the code or not. Python doesn't use the code transformer if
the check method returns False.

Examples:

* MacroPy can search for the "import macropy" statement (or "from
macropy import ...") in the AST tree
* fatoptimizer can search for "__fatoptimizer__ = {'enabled': False}"
in the code: if this variable is found, the optimizer is completely
skipped
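
The two checks above could look roughly like this. This is only a
sketch under assumptions: the check_ast(tree, context) signature, the
MacroTransformer class and the fat_enabled() helper are hypothetical,
and neither MacroPy nor fatoptimizer is claimed to work this way.

```python
import ast


class MacroTransformer:
    """Hypothetical transformer that opts in only if macropy is imported."""

    def check_ast(self, tree, context):
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                # "import macropy" or "import macropy.something"
                if any(alias.name.split(".")[0] == "macropy"
                       for alias in node.names):
                    return True
            elif isinstance(node, ast.ImportFrom):
                # "from macropy import ..." (absolute imports only;
                # node.module is None for "from . import x")
                module = node.module or ""
                if node.level == 0 and module.split(".")[0] == "macropy":
                    return True
        return False


def fat_enabled(tree):
    """Return False if the module sets __fatoptimizer__ = {'enabled': False}."""
    for node in tree.body:  # only module-level assignments are considered
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id == "__fatoptimizer__"):
            try:
                config = ast.literal_eval(node.value)
            except (ValueError, TypeError):
                continue  # not a plain literal: ignore it
            if isinstance(config, dict) and config.get("enabled") is False:
                return False
    return True
```

Both checks only read the tree, so they stay cheap compared to running
the transformer itself.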

(2) Petr proposed to extend importlib to pass a code transformer when
importing a module.

    importlib.util.import_with_transformer(
        'mypackage.specialmodule', MyTransformer())

IMHO this option is too specific: it's restricted to importlib
(py_compile, compileall and the interactive interpreter don't have the
feature). I also dislike the API.

(3) Petr also proposed "a special flag in packages":

    __transformers_for_submodules__ = [MyTransformer()]

I don't like having to get access to MyTransformer. PEP 511
mentions a use case where the transformed code is run *without*
registering the transformer. But this issue can easily be fixed by
using a string to identify the transformer in the registry (ex:
"fat") rather than its class.

I'm not sure that putting a flag on the package (package/__init__.py?)
is a good idea. I would prefer to enable language extensions on
individual files to restrict their scope.

(4) Sjoerd Job Postmus proposed something similar, but using a comment,
and applicable to any source file rather than only to packages:

    #:Transformers modname.TransformerClassName, modname.OtherTransformerClassName

The problem is that comments are not stored in the AST tree. I would
prefer to use AST to decide if an AST transformer should be used or
not.
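
To illustrate the point with the standard library: ast.parse() drops
the comment, so a check based on the AST cannot see it, whereas the
tokenize module can. The find_transformer_comment() helper below is
hypothetical and only parses the "#:Transformers" syntax shown above.

```python
import ast
import io
import tokenize

SOURCE = "#:Transformers modname.TransformerClassName\nx = 1\n"

# The comment does not appear anywhere in the AST.
dumped = ast.dump(ast.parse(SOURCE))


def find_transformer_comment(source):
    """Collect transformer names from '#:Transformers a, b' comments."""
    prefix = "#:Transformers"
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.startswith(prefix):
            spec = tok.string[len(prefix):]
            names.extend(n.strip() for n in spec.split(",") if n.strip())
    return names


found = find_transformer_comment(SOURCE)
```

A comment-based marker therefore needs a separate pass over the tokens
before (or instead of) the AST check.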

Note: I'm not really motivated to extend the AST to start to include
comments, or even code formatting (spaces, newlines, etc.).
https://pypi.python.org/pypi/redbaron/ can be used if you want to
transform a .py file without touching the format. But I don't think
that the AST should go in this direction. I prefer to keep the AST simple.

(5) Nick proposed (indirectly) to use a different filename (don't use
".py") for language extensions.

This option works with my option (1): the context contains the
filename, which can be used to decide whether to enable the code
transformer.
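
A minimal sketch of such a filename-based check. It assumes that the
context object exposes a "filename" attribute and that the check method
has the check_ast(tree, context) signature discussed in option (1); the
".pyx1" extension and the ExtensionTransformer class are made up for
the example.

```python
from types import SimpleNamespace


class ExtensionTransformer:
    """Hypothetical language extension enabled only for some file names."""

    extensions = (".pyx1",)  # made-up extension, for illustration only

    def check_ast(self, tree, context):
        # str.endswith() accepts a tuple of suffixes.
        return context.filename.endswith(self.extensions)


transformer = ExtensionTransformer()
# SimpleNamespace stands in for the real context object.
enabled = transformer.check_ast(None, SimpleNamespace(filename="mod.pyx1"))
skipped = transformer.check_ast(None, SimpleNamespace(filename="mod.py"))
```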

I understand that the code transformer must also install an importlib
hook to search for filenames other than .py files. Am I right?

(6) Nick proposed (indirectly) to use an encoding cookie "which are
visible as a comment in the module header".

Again, I dislike this option because comments are not stored in AST.

Victor