[Import-SIG] Running C extension modules using -m switch

Mon May 22 10:51:55 EDT 2017

On 05/22/2017 11:33 AM, Petr Viktorin wrote:
> On 05/20/2017 06:36 AM, Nick Coghlan wrote:
>> On 19 May 2017 at 21:43, Petr Viktorin <encukou at gmail.com> wrote:
>>> On 05/19/2017 12:24 PM, Nick Coghlan wrote:
>>>>
>>>> On 18 May 2017 at 22:50,  <gmarcel.plch at gmail.com> wrote:
>>>>>
>>>>> Greetings,
>>>>>
>>>>> This has already been sent to python-ideas, but since I got no
>>>>> response, I'm re-sending it to this SIG. I would welcome any
>>>>> comments.
>>>>
>>>>
>>> ...
>>>>>
>>>>>
>>>>> This new method calls into the _imp module, which executes the module
>>>>> as a script.
>>>>> I can see two ways of doing this. Both expect that the module uses PEP
>>>>> 489 multi-phase initialization.
>>>>
>>>>
>>>> The main reason I didn't immediately reply is that I had a vague
>>>> recollection of thinking this could be done *without* a new method on
>>>> loaders, but I needed to refresh my memory of our plans in that
>>>> regard.
>>>>
>>>> I've now done that, and I'm pretty sure the unwritten plan was to
>>>> change runpy to do something like the following:
>>>>
>>>>       spec = importlib.util.find_spec(modname)
>>>>       created = spec.loader.create_module(spec)
>>>>       if created is not None:
>>>>           raise RuntimeError(
>>>>               "Cannot use customised module instance as __main__")
>>>>       # main_mod is the existing __main__ module object
>>>>       spec.loader.exec_module(main_mod)
>>>>
>>>> That's oversimplified quite a bit, but it gives the general idea.
>>>
>>>
>>> The problem here is that for extension modules,
>>> `spec.loader.create_module()` returns None.
>>
>> I'm guessing this was meant to be "doesn't return None". I thought I
>> was forgetting something, and that would be it :)
>>
>>> It can't: the PyModuleDef is attached to the returned module, and
>>> that's where the Py_mod_exec function is stored. This is unlike with
>>> source modules, where the code is always looked up by module name.
>>>
>>> So I see these ways to make things work:
>>> - Make spec.loader.create_module() return None if Py_mod_create is
>>>   missing, and either store the PyModuleDef on the loader (which
>>>   doesn't really fit in with how importlib works), or re-load it from
>>>   the .so every time (which seems wasteful and hacky).
>>> - Make exec_module take two modules – the module in whose namespace to
>>>   run, and the module whose code should run. Or make it take a module
>>>   and a spec of a different module. This would be an API change,
>>>   affecting all third-party loaders, so it's out.
>>> - Add a new loader method taking two modules (or module and spec) as
>>>   above
>>> - Add a new loader method to explicitly run as main
>>
>> As a third variant on the last two options: add a new optional
>> "exec_in_namespace" method - that could potentially be useful for
>> generalising reload and lazy loading support, as well as making it
>> easier for pdb, profile, etc, to support non-traditional modules.
> 
> That won't be possible, since Py_mod_exec expects a module argument.
> The main extra thing a module has in addition to its namespace dict is 
> the C-level module state (which I don't think is handled in the current 
> PoC – Marcel, can you add that?)
> In the face of C module state, I think asking extension authors to 
> always handle reloading correctly is too much. But the other use cases 
> should be possible.
> 

Marcel had to leave for the day. To prevent losing a PyCon sprint day, 
I've fixed up his latest PoC and pushed it here:

Branch: https://github.com/encukou/cpython/tree/main_c_modules_namespace
Diff: 
https://github.com/encukou/cpython/compare/master...encukou:main_c_modules_namespace?expand=1

These changes get rid of Py_mod_main, and add an optional exec_in_module 
method to loaders. This method initializes a given module using a given 
spec.
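
To illustrate the idea, here is a rough pure-Python sketch of how runpy
might use such an optional method. It is not the code from the branch:
the StringLoader class and the run_as_main helper are made up for this
example, and the exec_in_module(spec, module) argument order is an
assumption.

```python
import importlib.abc
import importlib.util
import types


class StringLoader(importlib.abc.Loader):
    """Hypothetical toy loader whose module code is a source string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # None asks the import machinery for a default module object.
        return None

    def exec_module(self, module):
        # Normal import: run the code in the module it belongs to.
        self.exec_in_module(module.__spec__, module)

    def exec_in_module(self, spec, module):
        # Assumed signature: initialize `module` with the code that
        # `spec` describes, even if `module` was created elsewhere.
        code = compile(self.source, spec.name, "exec")
        exec(code, module.__dict__)


def run_as_main(spec):
    """Sketch of a runpy-like helper (hypothetical, not runpy's API)."""
    loader = spec.loader
    if not hasattr(loader, "exec_in_module"):
        raise RuntimeError(f"{spec.name!r} cannot be run as __main__")
    main_mod = types.ModuleType("__main__")
    main_mod.__spec__ = spec
    loader.exec_in_module(spec, main_mod)
    return main_mod


spec = importlib.util.spec_from_loader(
    "demo", StringLoader("ran_as = __name__\n"))
main_mod = run_as_main(spec)
print(main_mod.ran_as)
```

Loaders without the method are simply rejected here, so existing
third-party loaders would not need to change.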

Does this approach look good?