[Import-SIG] PEP 489: Redesigning extension module loading

Wed Mar 25 13:11:44 CET 2015

On 25 March 2015 at 02:34, Petr Viktorin <encukou at gmail.com> wrote:
> I'll share my notes on an API with PEP 384-style slots, before attempting to
> write it out in PEP language.
>
> I struggled to find a good name for the "PyType_Spec" equivalent, since
> ModuleDef and ModuleSpec are both taken, but then I realized that, if the
> docstring is put in a slot, I just need an array of slots...

Because we're looking for an exported symbol, I think there's value in
having a more clearly defined top level structure rather than just an
array.

PyModule_Export or PyModule_Declare come to mind, with a preference
for the former (since we're exporting a module definition for CPython
to import):

typedef struct PyModule_Export {
  const char* doc;
  PyModule_Slot *slots; /* terminated by slot==0. */
} PyModule_Export;

I prefer this mostly because it's easier to document and hence to
understand - you can cover the process of creating the overall module
in relation to PyModule_Export, while PyModule_Slot docs can focus on
defining the *content* of the module.

Having the docstring as the only expected field helps suggest that
modules should at least define that much. Unlike types, we can leave
the name out by default, as it will usually be implied by the file
name (as is the case with Python modules).
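
To make the "top level structure plus slot array" idea concrete, here is a
minimal standalone sketch of how a loader might look up a slot in an exported
definition. It deliberately does not include Python.h: the struct layouts,
the Py_mod_* IDs, the "value" field name, find_slot() and the demo data are
all assumptions made for illustration, not an existing CPython API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot IDs, following the numbering floated in this thread. */
enum {
    Py_mod_create = 2,
    Py_mod_methods = 3,
    Py_mod_statedef = 4,
    Py_mod_exec = 5
};

/* Mock definitions for illustration only. */
typedef struct PyModule_Slot {
    int slot;     /* slot ID; 0 terminates the array */
    void *value;  /* slot payload, interpreted according to the ID */
} PyModule_Slot;

typedef struct PyModule_Export {
    const char *doc;
    PyModule_Slot *slots;  /* terminated by slot == 0 */
} PyModule_Export;

/* Return the payload of the first slot with the given ID, or NULL if the
 * export does not define that slot. */
static void *find_slot(const PyModule_Export *exp, int slot_id)
{
    const PyModule_Slot *s;

    assert(slot_id != 0);  /* 0 is the terminator, never a real slot */
    if (exp == NULL || exp->slots == NULL)
        return NULL;
    for (s = exp->slots; s->slot != 0; s++) {
        if (s->slot == slot_id)
            return s->value;
    }
    return NULL;
}

/* Demo data: a placeholder object stands in for a real method table. */
static int demo_methods_placeholder;

static PyModule_Slot demo_slots[] = {
    {Py_mod_methods, &demo_methods_placeholder},
    {0, NULL}
};

static PyModule_Export PyModule_Export_demo = {
    "A demo module",
    demo_slots
};
```

With this shape, the docstring is available directly from the exported
struct, and everything optional is discovered by scanning the slot array.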

You've sold me on the idea of using a slots based API, though.
However, the PEP's going to need to spend a bit more time on how to
map this to the existing PyModule_Create API for modules that also
want to support older versions of Python, while using the new system
on 3.5+.

> Does the following look reasonable?
>
> in moduleobject.h:
>
> typedef struct PyModule_Slot {
>     int slot;
>     void *pfunc;
> } PyModule_Slot;

"pfunc" doesn't fit in this case, so I think a more generic field name
like "value" would be needed.

> typedef struct PyModule_StateDef {
>     int size;
>     traverseproc m_traverse;
>     inquiry m_clear;
>     freefunc m_free;
> } PyModule_StateDef;
>
> #define Py_m_doc 1
> #define Py_m_create 2
> #define Py_m_methods 3
> #define Py_m_statedef 4
> #define Py_m_exec 5

Py_mod_*, perhaps?

I'm also wondering if "exec" should move to be an "m_init" method in
PyModule_StateDef, rather than an independent slot, replacing it with
a PyType_Spec "types" slot as suggested below.
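
As a rough illustration of what moving "exec" into the state definition could
look like, the following standalone sketch allocates the per-module state and
runs an init hook on it. Everything here is hypothetical: the m_init field,
the use of size_t for the size, and create_state() are assumptions, and the
traverse/clear hooks are left out so the sketch compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Example per-module state for a hypothetical "spam" module. */
typedef struct spam_state_t {
    long answer;
} spam_state_t;

/* Mock state definition with an init hook in place of a separate exec slot. */
typedef struct PyModule_StateDef {
    size_t size;
    int (*m_init)(void *state);   /* returns 0 on success, -1 on failure */
    void (*m_free)(void *state);  /* releases resources held by the state;
                                     may be NULL */
} PyModule_StateDef;

static int spam_state_init(void *state)
{
    ((spam_state_t *)state)->answer = 42;
    return 0;
}

static PyModule_StateDef spam_statedef = {
    sizeof(spam_state_t),
    spam_state_init,
    NULL
};

/* Allocate zeroed state and run the init hook. Returns NULL if allocation
 * or initialization fails; on success the caller owns the memory. */
static void *create_state(const PyModule_StateDef *def)
{
    void *state;

    assert(def != NULL);
    state = calloc(1, def->size);
    if (state == NULL)
        return NULL;
    if (def->m_init != NULL && def->m_init(state) != 0) {
        if (def->m_free != NULL)
            def->m_free(state);
        free(state);
        return NULL;
    }
    return state;
}
```

One consequence of this arrangement is that initialization code always has
direct C-level access to the state it is populating.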

> in the extension:
>
> static PyMethodDef spam_methods[] = {
>     {"demo", (PyCFunction)spam_demo,  ...},
>     {NULL, NULL}
> };
>
> static PyModule_StateDef spam_statedef[] = {
>     sizeof(spam_state_t),
>     spam_state_traverse,
>     spam_state_clear,
>     spam_state_free
>     /* any of those three can be NULL if not needed */
> };
>
> static PyModule_Slot spam_slots[] = {
>     {Py_m_methods, spam_methods},
>     {Py_m_statedef, spam_statedef},
>     {Py_m_exec, spam_exec},
>     {0, NULL}
> };

PyModule_Export PyModule_Export_spam = {
    PyDoc_STR("A spammy module"),
    spam_slots
};

>
> PyModuleDesc *PyModuleInit_spam {
>     return spam_slots;
> }

I suspect this is a holdover from an earlier iteration of the design.

>
> There is both a Create and Exec slot, among others – anyone can choose what
> they need.
>
> If you set the Py_m_create slot, then you can't also set Py_m_statedef. All the
> other items are honored (including name and doc, which will be set by the
> module machinery – but name might not match).
>
> The exec method is tied to the module; it's only called on modules created
> from the description (or ones that look as if they were, in runpy's case).
> It is called only once for each module; reload()ing an extension module will
> only reset import-related attributes (as it does now).

That sounds reasonable to me.

> If you don't set Py_m_create, you'll be able to run the module with python
> -m.
>
>
> For non-ASCII module names: the X in PyModuleGetDesc_X will be in punycode
> (s/-/_/), PyModuleDesc.name in UTF-8, and filename in the filesystem
> encoding.

Adjusted appropriately for exporting a PyModule_Export struct, agreed.
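
For illustration, the symbol lookup name could be derived along these lines.
This is only a sketch under stated assumptions: the "PyModule_Export_" prefix
follows the struct naming suggested above, export_symbol_name() is a
hypothetical helper, and the input is assumed to be already punycode-encoded
(the encoding step itself is not shown).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Build the exported symbol name for an already-encoded module name by
 * prepending the prefix and replacing each '-' with '_', since '-' is not
 * valid in a C identifier. Returns 0 on success, -1 if buf is too small. */
static int export_symbol_name(const char *encoded_name,
                              char *buf, size_t buflen)
{
    static const char prefix[] = "PyModule_Export_";
    const size_t plen = sizeof(prefix) - 1;
    size_t nlen, i;

    assert(encoded_name != NULL && buf != NULL);
    nlen = strlen(encoded_name);
    if (plen + nlen + 1 > buflen)
        return -1;
    memcpy(buf, prefix, plen);
    for (i = 0; i < nlen; i++)
        buf[plen + i] = (encoded_name[i] == '-') ? '_' : encoded_name[i];
    buf[plen + nlen] = '\0';
    return 0;
}
```

A plain ASCII name such as "spam" passes through unchanged, giving
"PyModule_Export_spam", while an encoded name such as "maana-pta" would map
to "PyModule_Export_maana_pta".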

> I've thought about supporting multiple modules per extension, but I don't
> see a clear way to do that. The standard ModuleSpec machinery assumes one
> module per file, and it's not straightforward to get around that. To load
> more modules from an extension, you'd need a custom finder or loader anyway.
> So I'm going to implement helpers needed to load a module given an arbitrary
> PyModuleDesc, and leave implementing multi-mod support to people who need it
> for now.
> So, an "inittab" is out for now.

Symlinks should work for making the same binary file importable under
different names in simple cases, and more complex cases are likely to
need a custom finder and loader anyway.

> Perhaps a slot for automatically adding classes (from array of PyType_Spec)
> would help PyType_Spec adoption.

Perhaps this one would be worth including in the initial proposal to
help make it clear why we decided the slots based design was
worthwhile?

> And then a slot adding string/int/... constants from arrays of name/value
> would mean most modules wouldn't need an exec function.

For those cases, I think the module internally is likely to want fast
C level access to the relevant constants - this note is the one that
inspired my suggestion of moving the "exec" link into the statedef
slot.

> And an "inittab" slot should be possible for package-style extensions.
> I'll leave these ideas out for now, but possibilities for extending are
> there.

If I recall correctly, there's actually a longstanding RFE somewhere
for builtin packages that this change may eventually be able to help
with. I think it involved embedding the full Qt libraries.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia