[Python-Dev] advice needed: best approach to enabling "metamodules"?

Nathaniel Smith njs at pobox.com
Sat Nov 29 20:37:41 CET 2014


On Sat, Nov 29, 2014 at 4:21 AM, Guido van Rossum <guido at python.org> wrote:
> Are these really all our options? All of them sound like hacks, none of them
> sound like anything the language (or even the CPython implementation) should
> sanction. Have I missed the discussion where the use cases and constraints
> were analyzed and all other approaches were rejected? (I might have some
> half-baked ideas, but I feel I should read up on the past discussion first,
> and they are probably more fit for python-ideas than for python-dev. Plus
> I'm just writing this email because I'm procrastinating on the type hinting
> PEP. :-)

The previous discussions I was referring to are here:
  http://thread.gmane.org/gmane.comp.python.ideas/29487/focus=29555
  http://thread.gmane.org/gmane.comp.python.ideas/29788

There might well be other options; these are just the best ones I
could think of :-). The constraints are pretty tight, though:
- The "new module" object (whatever it is) should have a __dict__ that
aliases the original module globals(). I can elaborate on this if my
original email wasn't enough, but hopefully it's obvious that making
two copies of the same namespace and then trying to keep them in sync
at the very least smells bad :-).
- The "new module" object has to be a subtype of ModuleType, because there
are lots of places that do isinstance(x, ModuleType) checks (notably
-- but not only -- reload()). Since a major goal here is to make it
possible to do cleaner deprecations, it would be really unfortunate if
switching an existing package to use the metamodule support itself
broke things :-).
- Lookups in the normal case should have no additional performance
overhead, because module lookups are extremely extremely common. (So
this rules out dict proxies and tricks like that -- we really need
'new_module.__dict__ is globals()' to be true.)
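The first two constraints can be phrased as a concrete check. The helper below, `meets_metamodule_constraints`, is a hypothetical name used only for illustration. It reads "aliases globals()" as identity of the two dict objects, which is also what the third constraint needs for zero-overhead lookups:

```python
import types


def meets_metamodule_constraints(candidate, original_globals):
    """Hypothetical helper: check whether `candidate` could replace the
    module whose globals() dict is `original_globals`."""
    # Second constraint: isinstance(x, ModuleType) checks elsewhere must pass.
    if not isinstance(candidate, types.ModuleType):
        return False
    # First and third constraints: the very same dict object, not a copy or proxy.
    return candidate.__dict__ is original_globals


plain = types.ModuleType("demo")
# A namespace object holding the same names, but not a module and not the same dict.
proxy = types.SimpleNamespace(**plain.__dict__)

print(meets_metamodule_constraints(plain, plain.__dict__))
print(meets_metamodule_constraints(proxy, plain.__dict__))
print(meets_metamodule_constraints(types.ModuleType("demo"), plain.__dict__))
```

Only the first call passes: the namespace object fails the isinstance check, and a fresh module fails because it has its own dict.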

AFAICT there are three logically possible strategies for satisfying
that first constraint:
(a) convert the original module object into the type we want, in-place
(b) create a new module object that acts like the original module object
(c) somehow arrange for our special type to be used from the start

My options 1 and 2 are means of accomplishing (a), and my options 3
and 4 are means of accomplishing (b) while working around the
behavioural quirks of module objects (as required by the second
constraint).
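To see why the first constraint bites for strategy (b), here is a minimal sketch of the naive approach, assuming a hypothetical `MyModuleSubtype` class. The new object passes the isinstance check, but copying the namespace leaves two dicts that drift apart as soon as module-level code writes to the original globals():

```python
import types


class MyModuleSubtype(types.ModuleType):
    """Hypothetical metamodule class, used only for illustration."""


original = types.ModuleType("demo")
original.x = 1

# Naive strategy (b): build a new object of the subtype and copy the namespace.
replacement = MyModuleSubtype("demo")
replacement.__dict__.update(original.__dict__)

# Module-level code keeps writing to the original globals() dict...
original.__dict__["x"] = 2

# ...so the copy goes stale: two namespaces that would have to be synced by hand.
print(replacement.x)
print(replacement.__dict__ is original.__dict__)
```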

The python-ideas thread did also consider several methods of
implementing strategy (c), but they're messy enough that I left them
out here. The problem is that somehow we have to execute code to
create the new subtype *before* we have an entry in sys.modules for
the package that contains the code for the subtype. So one option
would be to add a new rule, that if a file pkgname/__new__.py exists,
then this is executed first and is required to set up
sys.modules["pkgname"] before we exec pkgname/__init__.py. So
pkgname/__new__.py might look like:

    import sys
    from pkgname._metamodule import MyModuleSubtype
    sys.modules[__name__] = MyModuleSubtype(__name__, docstring)

This runs into a lot of problems though. To start with, the 'from
pkgname._metamodule ...' line is an infinite loop, because importing
pkgname._metamodule first requires importing pkgname, and this is the
very code that is supposed to create sys.modules["pkgname"]. It's not
clear where the
globals dict for executing __new__.py comes from (who defines
__name__? Currently that's done by ModuleType.__init__). It only works
for packages, not modules. The need to provide the docstring here,
before __init__.py is even read, is weird. It adds extra stat() calls
to every package lookup. And, the biggest showstopper IMHO: AFAICT
it's impossible to write a polyfill to support this code on old python
versions, so it's useless to any package which needs to keep
compatibility with 2.7 (or even 3.4). Sure, you can backport the whole
import system like importlib2, but telling everyone that they need to
replace every 'import numpy' with 'import importlib2; import numpy' is
a total non-starter.
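For concreteness, the proposed __new__.py rule can be simulated in memory. `load_with_new_hook` below is a hypothetical helper, not part of importlib, and it takes source strings instead of files. It makes two of the problems above visible: the caller has to invent the globals dict (including `__name__`) for the __new__ code, and the subtype has to be defined inline rather than imported from the package it is creating:

```python
import sys
import types


def load_with_new_hook(name, new_src, init_src):
    """Hypothetical simulation of the proposed rule: run the __new__.py
    source first, require it to install sys.modules[name], then exec the
    __init__.py source into that module's namespace."""
    # Open question from the text: who defines __name__ here? We just fake it.
    new_globals = {"__name__": name}
    exec(new_src, new_globals)
    module = sys.modules.get(name)
    if not isinstance(module, types.ModuleType):
        raise ImportError(f"__new__ code did not install sys.modules[{name!r}]")
    exec(init_src, module.__dict__)
    return module


# The subtype is defined inline: importing it from the package itself
# would require the package to exist already.
NEW_SRC = '''
import sys
import types

class MyModuleSubtype(types.ModuleType):
    pass

sys.modules[__name__] = MyModuleSubtype(__name__, "docstring supplied up front")
'''

INIT_SRC = "answer = 42\n"

mod = load_with_new_hook("demo_pkg", NEW_SRC, INIT_SRC)
print(type(mod).__name__, mod.__doc__, mod.answer)
```

A real version would have to live inside the import machinery itself, which is why it cannot be polyfilled for older Python versions.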

So, yeah, those 4 options are really the only plausible ones I know of.

Option 1 and option 3 are pretty nice at the language level! Most
Python objects allow assignment to __class__ and __dict__, and both
PyPy and Jython at least do support __class__ assignment. Really the
only downside with Option 1 is that actually implementing it requires
attention from someone with deep knowledge of typeobject.c.
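For comparison, here is what Option 1 looks like from the user's side. This is a sketch that assumes Python 3.5 or later, where CPython accepts `__class__` assignment on module objects. `DeprecatingModule`, `old_name`, and `new_name` are hypothetical names: the subtype serves a deprecated alias with a warning, while the in-place conversion keeps both the isinstance check and the `__dict__` identity intact:

```python
import types
import warnings


class DeprecatingModule(types.ModuleType):
    """Hypothetical metamodule: serves a deprecated alias with a warning."""

    def __getattr__(self, name):
        # Only called after normal lookup fails; existing names are found
        # in the module's __dict__ first.
        if name == "old_name":  # hypothetical deprecated attribute
            warnings.warn("old_name is deprecated; use new_name",
                          DeprecationWarning, stacklevel=2)
            return self.new_name
        raise AttributeError(f"module {self.__name__!r} has no attribute {name!r}")


mod = types.ModuleType("demo")
mod.new_name = 1
namespace = mod.__dict__

# Option 1: convert the existing module object in place.
mod.__class__ = DeprecatingModule

print(isinstance(mod, types.ModuleType))  # isinstance checks still pass
print(mod.__dict__ is namespace)          # globals() is not copied
```

Because `__getattr__` is only consulted after the normal lookup fails, existing names are still found in the module's own dict, and code running inside the module reads its globals directly without going through the subtype at all.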

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

