[Python-ideas] metamodules (was: Re: Idea to support lazy loaded names.)

Nathaniel Smith njs at pobox.com
Thu Oct 9 05:24:21 CEST 2014


On Tue, Oct 7, 2014 at 5:12 AM, Andrew Barnert
<abarnert at yahoo.com.dmarc.invalid> wrote:
> On Oct 6, 2014, at 20:51, Terry Reedy <tjreedy at udel.edu> wrote:
>
>> So a 'solution' might be to make modules be instances (but with no __new__ or __init__) of a module metaclass, so that module dicts could act like class dicts with respect to descriptors.  I have no idea how much code this would break ;-).
>
> Didn't we just have this discussion a few weeks ago, in the context of making lazy loading of subpackages easier to implement?

Yeah, and having had some time to think about that discussion and do
some prototyping, I'm going to argue below that allowing assignment to
module instances'  __class__ really is the best path forward.

(For those who find the below TLDR, check this out instead:
https://github.com/njsmith/metamodule)

> IIRC, the not-obviously-unreasonable options suggested were:

Great list! I've rearranged a bit to make my argument clearer.

> 1) Analogy with __new__: For packages only, if there's a __new__.py, that gets executed first. If it "returns" (not sure how that was defined) an instance of a subclass of ModuleType, that instance is used to run __init__.py instead of a normal module instance.

This is very similar to the current approach of having __init__.py
reassign sys.modules[__name__]. The advantages are:
- It gives a way to enforce the rule that you have to do this
assignment as the very first thing inside your module, before allowing
arbitrary code to run (e.g. by importing other modules which might
recursively import your module in turn, and access
sys.modules[__name__] before you've modified it).
- Because __new__.py would run *before* __init__.py, it avoids the
headache of having to juggle two module objects, one of whose
__dict__s is already being used as the execution environment for the
code that is trying to do the switcheroo.

But:
- It's a pretty complicated way to accomplish the stated goals.
- The restriction to packages is unfortunate.
- The backcompat story is terrible -- faking __new__.py support in old
versions of Python would be really difficult, and the main reason I
care about this stuff in the first place is that I want to be able to,
e.g., deprecate module attributes that are in the public API of old,
widely-used packages. It will be many years before such packages can
require 3.5.

So I think we should probably try to make the existing
sys.modules[__name__] = ... strategy work before considering this.

> 2) Analogy with metaclass= (or with 2.x __metaclass__): If a module (or a package's __init__.py) does some new syntax or magic comment before any non-comment code, it can specify a custom type in place of ModuleType (not sure how that type gets imported and made available).

I don't see any way to solve this import problem you refer to at the
end -- in most cases the code implementing the metamodule type will be
defined *inside* the module/package which wants to use the metamodule,
so we have a chicken-and-egg problem.

> 4) Make it easier to write import hooks for this special purpose.

This has the same problem as the previous -- who imports the importer?

> 5) Make it easier for a module to construct a ModuleType-subclass instance with a copy of the same dict and replace itself in sys.modules.

So, trying to *copy* the dict is just not going to work. Consider the
package foo, with a foo/__init__.py that looks like:

orig_dict = sys.modules[__name__].__dict__
sys.modules[__name__] = MyModule(__name__, __doc__)
a = 1
from .submod import b
c = 3
sys.modules[__name__].__dict__.update(orig_dict)

and where foo/submod.py looks like:

import foo
b = foo.a + 1
def c():
    return foo.a + 2

This won't work, because at the time we import .submod,
sys.modules["foo"].__dict__ does not contain an entry for "a" -- only
the original module's dict has that.

There are a bunch of different ways we could try writing our
__init__.py. We might try putting the sys.module assignment at the
end:

a = 1
from .submod import b, c
d = 4
orig_dict = sys.modules[__name__].__dict__
sys.modules[__name__] = MyModule(__name__, __doc__)
sys.modules[__name__].__dict__.update(orig_dict)

Now when .submod re-imports the top-level module, it ends up with a
reference to the original module object, which has an "a" entry, so
the definition of "b" works. But now .submod.foo will continue to
refer to the original module object, even after we substitute in the
metamodule object. If we do 'foo.a = 5' later on, then foo.c() will
continue to use the original binding of 'a'; this mutation will be
invisible to it.
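
To make the staleness concrete, here's roughly what a user of this
version of the package would observe (a sketch, assuming the
__init__.py and submod.py shown above):

import foo

foo.a = 5        # rebinds "a" on the new metamodule object only
print(foo.c())   # still prints 3, not 7 -- submod's "foo" is the
                 # *original* module object, whose dict still has a == 1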

I guess the only way to make it work in this case is to do multiple
copies, one before every nested import:

orig_dict = sys.modules[__name__].__dict__
sys.modules[__name__] = MyModule(__name__, __doc__)
a = 1
sys.modules[__name__].__dict__.update(orig_dict)
from .submod import b, c
d = 4
sys.modules[__name__].__dict__.update(orig_dict)

...but this is incredibly ugly and error-prone.

What we really want to do instead is to make our new metamodule object
refer directly to the original module's __dict__:

orig_dict = sys.modules[__name__].__dict__
sys.modules[__name__] = MyModule(__name__, __doc__)
sys.modules[__name__].__dict__ = orig_dict
a = 1
from .submod import b, c
d = 4

That way they will always be in sync. This looks like it should work
great! But it has a few problems:

- Trying to assign to a module's __dict__ attribute raises "TypeError:
readonly attribute".
- So I actually implemented a fix for that, and ran into a new
problem: modules take jealous ownership of their __dict__. In
particular, they assume that when they are deallocated they should
wipe their dict clean
(https://www.python.org/doc/essays/cleanup/). Obviously this is bad
for us, because we are still using that dict! (See the short
illustration after this list.)
- Also, in modern Python module objects contain more state besides
__dict__ -- in particular, PEP 3121-related state. There's no public
API to get at this.
- Possibly future versions of Python will add more state fields again; who knows.
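
Here's a short illustration of the first two problems (a sketch; the
exact exception raised for the read-only __dict__ has varied across
CPython versions):

import types

orig = types.ModuleType("demo")
orig.a = 1

class MyModule(types.ModuleType):
    pass

meta = MyModule("demo", None)
try:
    # problem 1: module __dict__ is read-only
    meta.__dict__ = orig.__dict__
except (TypeError, AttributeError) as exc:
    print(exc)

# problem 2: even on a patched interpreter that allowed the assignment,
# deallocating "orig" wipes its dict clean (the cleanup behaviour
# described in the essay linked above) -- and "meta" would then be
# sharing exactly that wiped-out dict.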

The easiest way to solve all these problems is to *swap* all of the
internal fields between the old module object and the new metamodule
object. This can be done hackishly using ctypes; this requires knowing
about CPython's struct layouts, but that's okay for prototyping and
for backwards compatibility hacks (which only have to support specific
known versions). To do it non-hackishly, I was at first thinking that
we should provide an official API for swapping module object states.
But then I realized that at that point, we're basically back to...

> 3) Analogy with re-classing object instances: Just allow modules to set __class__ during execution (or after, if you want). Variations on this include allowing that for all non-heap types, or even getting rid of the concept of non-heap types.

...this proposal after all. And allowing __class__ assignment on
modules strikes me as more aesthetic than having a
sys.swap_module_contents function.
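
For comparison, here's roughly what the earlier foo/__init__.py boils
down to under this approach (a sketch, assuming module __class__
assignment is allowed; FooModule and the deprecated "old_a" attribute
are purely illustrative):

# foo/__init__.py
import sys
import types
import warnings

class FooModule(types.ModuleType):
    @property
    def old_a(self):
        # hypothetical deprecated alias for "a", for illustration only
        warnings.warn("foo.old_a is deprecated; use foo.a",
                      DeprecationWarning, stacklevel=2)
        return self.a

# One module object, one __dict__: nothing to juggle, copy, or keep in
# sync, and .submod's "import foo" sees the very same object.
sys.modules[__name__].__class__ = FooModule

a = 1
from .submod import b, c
d = 4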

I implemented a prototype of this functionality here:
    https://github.com/njsmith/metamodule

The implementation is here:
    https://github.com/njsmith/metamodule/blob/master/metamodule.py

That file has 3 parts:
- A fancy metamodule class that handles implicit-imports and
warn-on-attribute-access.
- A utility function for setting up a metamodule; it tries __class__
assignment, and if that doesn't work falls back on ctypes hackery.
- The aforementioned ctypes hackery. It's pretty ugly, but if we add
__class__ assignment then it will become unnecessary on future Python
versions, woohoo!
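
For flavour, the first of those parts boils down to something like the
sketch below -- this is *not* the actual metamodule.py code, just the
general shape of a lazy-import + warn-on-access module subclass (all
the specific names here are made up):

import importlib
import types
import warnings

class FancyModule(types.ModuleType):
    # hypothetical configuration
    _lazy_submodules = {"submod"}
    _deprecated = {"old_a": "a"}

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        if name in self._lazy_submodules:
            submodule = importlib.import_module("." + name, self.__name__)
            setattr(self, name, submodule)   # cache for next time
            return submodule
        if name in self._deprecated:
            replacement = self._deprecated[name]
            warnings.warn("%s.%s is deprecated; use %s.%s instead"
                          % (self.__name__, name, self.__name__, replacement),
                          DeprecationWarning, stacklevel=2)
            return getattr(self, replacement)
        raise AttributeError(name)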

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

