[Import-SIG] Proto-PEP: Redesigning extension module loading

Tue Feb 24 22:49:33 CET 2015

On Mon, Feb 23, 2015 at 4:16 PM, Brett Cannon <brett at python.org> wrote:
> I mostly have grammar/typo comments and one suggestion to minimize the
> number of ways of initializing a module by not letting PyModuleCreate_* do
> that step on its own.

Thanks for the corrections! Lesson learned, I'll use a spell checker next time.

...
>> The PyModuleCreate function
>> ---------------------------
>>
>> This PyModuleCreate function is used to implement "loader.create_module"
>> defined in PEP 451.
>>
>> By exporting the "PyModuleCreate_modulename" symbol, an extension module
>> indicates that it uses a custom module object.
>>
>> This prevents loading the extension in a pre-created module,
>> but gives greater flexibility in allowing a custom C-level layout
>> of the module object.
>>
>> The "module_spec" argument receives a "ModuleSpec" instance, as defined in
>> PEP 451.
>>
>> When called, this function must create and return a module object.
>>
>> If "PyModuleExec_module" is undefined, this function must also initialize
>> the module; see PyModuleExec_module for details on initialization.
>
>
> Why conflate module creation with initialization? If one is going to have
> initialization code then it can't be difficult to factor out into a
> PyModuleExec_* function, so I don't see a good reason to support only
> defining PyModuleCreate_*.

Right. Originally, to me, Exec seemed to not be very useful when
Create is specified, because reload support for extension modules
isn't very useful (unless you're Cython and want to emulate Python
modules as well as possible). But given the fact that you can't safely
call user code from Create, it does make sense to always require Exec,
so people aren't tempted to take shortcuts.
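
As a rough Python-level analogy of the Create/Exec split (not the
proposed C API), the sketch below uses the PEP 451 loader protocol.
CustomModule, SplitLoader and the module name "demo_split" are
hypothetical names made up for illustration. Creation only allocates the
module object; all initialization lives in the exec step, which can
therefore be run again on the same object, as a reload would do.

```python
import importlib.abc
import importlib.util
import types


class CustomModule(types.ModuleType):
    """Hypothetical custom module type (stands in for a custom C layout)."""


class SplitLoader(importlib.abc.Loader):
    """Illustrative loader that keeps creation and initialization apart."""

    def __init__(self):
        self.exec_calls = 0

    def create_module(self, spec):
        # Create step: only build the object; no user code runs here.
        return CustomModule(spec.name)

    def exec_module(self, module):
        # Exec step: add functions and module globals.
        self.exec_calls += 1
        module.answer = 42
        module.greet = lambda: "hello"


loader = SplitLoader()
spec = importlib.util.spec_from_loader("demo_split", loader)

# module_from_spec() calls loader.create_module(spec) and sets __spec__ etc.
module = importlib.util.module_from_spec(spec)

# First initialization.
loader.exec_module(module)

# Running the exec step again on the same object approximates a reload.
loader.exec_module(module)
```

Because nothing is initialized in create_module, the second exec_module
call needs no special casing: the module object is reused and only its
contents are rebuilt.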

It does stretch the __new__/__init__ parallel Nick mentioned. But
while that parallel was a good stepping stone to get to this
design, I don't think it is too useful for explaining how the design
works.
I feel that people who know what __new__ can do and why it is
necessary should have no problem understanding a module creation hook
without relating to classes. Classes make me think about inheritance,
which doesn't apply here. Most __init__s don't register methods or
class constants, but Exec should add functions and module globals.

So I plan to drop the Create-only option, and to not mention the
__new__/__init__ parallel.
Nick, does that sound reasonable to you?

...
>> In this scheme, it is not possible to create a module with C-level state,
>> which would be able to exec itself in any externally provided module
>> object,
>> short of putting PyCapsules in the module dict.
>>
>> The proposal repurposes PyModule_SetDocString, PyModule_AddObject,
>> PyModule_AddIntMacro et al. to work on any object.
>> Would it be better to have these in the PyObject namespace?
>
>
> No. They are setting explicit attributes that are meant only for modules so
> it's more generalization than is necessary to rename them.

OK.
I will add PyModule_AddCapsule and PyModule_GetCapsule as simple
helpers for C-level state.

>> We should expose some kind of API in importlib.util (or a better place?)
>> that
>> can be used to check that a module works with reloading and
>> subinterpreters.
>
>
> What would such an API actually check to verify that a module could be
> reloaded?

Obviously we can't check for static state or object leakage between
subinterpreters.
By using the new API, you promise that the extension does support
reloading and subinterpreters. This promise will be prominently stated
in the docs, and use of the new API is what this function would check.
For the old API, PyModule_Create with m_size>=0 can be used to support
subinterpreters. But I don't think the language in the docs is strong
enough to say that m_size>=0 is a promise of such support.
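
To make the "check the promise, not the behavior" idea concrete, here is
a minimal Python sketch. It is not the proposed importlib.util API: the
function name declares_reload_support is hypothetical, and it uses the
presence of a PEP 451 exec_module on the loader as a stand-in for "this
module was built with the new API". Like the real check would, it cannot
detect static state or objects leaking between subinterpreters.

```python
import importlib.machinery
import types


def declares_reload_support(module):
    """Hypothetical helper: report whether the module's loader opts in.

    Only inspects what the loader declares; it does not verify that
    re-executing the module is actually safe.
    """
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None)
    return callable(getattr(loader, "exec_module", None))


class NewStyleLoader:
    """Illustrative loader with separate create and exec steps."""

    def create_module(self, spec):
        return None  # use the default module type

    def exec_module(self, module):
        module.ready = True


class LegacyLoader:
    """Illustrative loader offering only the old single-step entry point."""

    def load_module(self, fullname):
        raise ImportError(fullname)


new_style = types.ModuleType("new_style")
new_style.__spec__ = importlib.machinery.ModuleSpec("new_style", NewStyleLoader())

legacy = types.ModuleType("legacy")
legacy.__spec__ = importlib.machinery.ModuleSpec("legacy", LegacyLoader())

bare = types.ModuleType("bare")  # __spec__ is None on a bare module

results = [declares_reload_support(m) for m in (new_style, legacy, bare)]
```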