[Import-SIG] PEP 489: Redesigning extension module loading

Wed Mar 25 14:36:42 CET 2015

On 03/25/2015 01:11 PM, Nick Coghlan wrote:
> On 25 March 2015 at 02:34, Petr Viktorin <encukou at gmail.com> wrote:
>> I'll share my notes on an API with PEP 384-style slots, before attempting to
>> write it out in PEP language.
>>
>> I struggled to find a good name for the "PyType_Spec" equivalent, since
>> ModuleDef and ModuleSpec are both taken, but then I realized that, if the
>> docstring is put in a slot, I just need an array of slots...
>
> Because we're looking for an exported symbol, I think there's value in
> having a more clearly defined top level structure rather than just an
> array.

OK.
I'm not sure about cross-platform support for exporting data, rather
than functions, from shared libraries, so I kept the hook as a function.
Perhaps I'm being too paranoid here?
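
To make the function-hook variant concrete, here is a rough sketch. It is
only an illustration under assumptions: the PyModule_Slot layout uses the
"value" field name suggested later in this thread, the Py_mod_doc number
is taken from my notes, and "example" is a made-up module name. None of
this is existing CPython API.

```c
#include <stddef.h>

/* Stand-in for the slot type discussed in this thread (an assumption,
 * not an existing CPython definition). */
typedef struct PyModule_Slot {
    int slot;
    void *value;
} PyModule_Slot;

#define Py_mod_doc 1

/* The slot array stays private to the extension; slot == 0 terminates it. */
static PyModule_Slot example_slots[] = {
    {Py_mod_doc, (void *)"Example module"},
    {0, NULL}
};

/* The only exported symbol is a function, so the loader needs nothing
 * beyond the function lookup it already does for PyInit_* hooks. */
PyModule_Slot *PyModuleExport_example(void)
{
    return example_slots;
}
```

The data-symbol alternative would export the array itself under that
name, which saves the wrapper function but depends on the platform's
support for looking up data symbols.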

> PyModule_Export or PyModule_Declare come to mind, with a preference
> for the former (since we're exporting a module definition for CPython
> to import)

That's the name I was looking for, thanks!

> typedef struct PyModule_Export {
>    const char* doc;
>    PyModule_Slot *slots; /* terminated by slot==0. */
> } PyModule_Export;
>
> I prefer this mostly because it's easier to document and hence to
> understand - you can cover the process of creating the overall module
> in relation to PyModule_Export, while PyModule_Slot docs can focus on
> defining the *content* of the module.

I don't think this is a problem. I can document module creation in terms
of the PyModuleExport_<modulename> symbol, and then say that it's an array
of PyModule_Slot in the appropriate section.

> Having the docstring as the only expected field helps suggest that
> modules should at least define that much. Unlike types, we can leave
> the name out by default, as it will usually be implied by the file
> name (as is the case with Python modules).

The downside is that it's additional boilerplate. PyType_Spec has a 
bunch of mandatory int fields, but here everything is a pointer.

Also, does the docstring always need to be specified (as a constant)? I 
think some internal modules are fine without a docstring (see _hashlib, 
_multiprocessing, _elementtree, _sqlite3, ...).

But if you're convinced a separate PyModule_Export structure is better, 
I won't fight.

> You've sold me on the idea of using a slots based API, though.
> However, the PEP's going to need to spend a bit more time on how to
> map this to the existing PyModule_Create API for modules that also
> want to support older versions of Python, while using the new system
> on 3.5+.

Agreed.

>> Does the following look reasonable?
>>
>> in moduleobject.h:
>>
>> typedef struct PyModule_Slot{
>>      int slot;
>>      void *pfunc;
>> } PyModuleDesc_Slot;
>
> "pfunc" doesn't fit in this case, so I think a more generic field name
> like "value" would be needed.
>
>> typedef struct PyModule_StateDef {
>>      int size;
>>      traverseproc m_traverse;
>>      inquiry m_clear;
>>      freefunc m_free;
>> } PyModule_StateDef;
>>
>> #define Py_m_doc 1
>> #define Py_m_create 2
>> #define Py_m_methods 3
>> #define Py_m_statedef 4
>> #define Py_m_exec 5
>
> Py_mod_*, perhaps?

Sure.

> I'm also wondering if "exec" should move to be an "m_init" method in
> PyModule_StateDef, rather than an independent slot, replacing it with
> a PyType_Spec "types" slot as suggested below.

No. Sometimes the exec function doesn't need C state. It can work with
just the module dict, for example to export some methods conditionally, or
to export objects that aren't methods, classes, or anything else that has
a dedicated slot.
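
A rough sketch of what I mean, using stand-in types rather than real
CPython ones: FakeModule, example_exec and run_exec_slots are
hypothetical names, and the exec signature is an assumption. The point
is only that the loader can call an exec slot when no state definition
is present at all.

```c
#include <stddef.h>

typedef struct PyModule_Slot {
    int slot;
    void *value;
} PyModule_Slot;

#define Py_mod_doc  1
#define Py_mod_exec 5

/* Stand-in for a module object: it only counts exported names. */
typedef struct FakeModule {
    int exported;
} FakeModule;

typedef int (*exec_fn)(FakeModule *module);

/* An exec hook that touches only the module, never any C-level state. */
static int example_exec(FakeModule *module)
{
    module->exported += 1;
    return 0;
}

/* Walk the slot array and run every exec slot; stop at the first failure.
 * Storing a function pointer in a void * mirrors what PyType_Slot does. */
static int run_exec_slots(const PyModule_Slot *slots, FakeModule *module)
{
    for (const PyModule_Slot *s = slots; s->slot != 0; s++) {
        if (s->slot == Py_mod_exec) {
            exec_fn fn = (exec_fn)s->value;
            if (fn(module) < 0) {
                return -1;
            }
        }
    }
    return 0;
}

/* Note that there is no statedef slot here. */
static PyModule_Slot example_slots[] = {
    {Py_mod_doc, (void *)"Example module"},
    {Py_mod_exec, (void *)example_exec},
    {0, NULL}
};
```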

[...]
>> I've thought about supporting multiple modules per extension, but I don't
>> see a clear way to do that. The standard ModuleSpec machinery assumes one
>> module per file, and it's not straightforward to get around that. To load
>> more modules from an extension, you'd need a custom finder or loader anyway.
>> So I'm going to implement helpers needed to load a module given an arbitrary
>> PyModuleDesc, and leave implementing multi-mod support to people who need it
>> for now.
>> So, an "inittab" is out for now.
>
> Symlinks should work for making the same binary file importable under
> different names in simple cases, and more complex cases are likely to
> need a custom finder and loader anyway.
>
>> Perhaps a slot for automatically adding classes (from array of PyType_Spec)
>> would help PyType_Spec adoption.
>
> Perhaps this one would be worth including in the initial proposal to
> help make it clear why we decided the slots based design was
> worthwhile?
>
>> And then a slot adding string/int/... constants from arrays of name/value
>> would mean most modules wouldn't need an exec function.
>
> For those cases, I think the module internally is likely to want fast
> C level access to the relevant constants - this note is the one that
> inspired my suggestion of moving the "exec" link into the statedef
> slot.

This is for wrapping constants that are already known at the C level.
For example _ssl has a long list of these calls:
     PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                             PY_SSL_ERROR_ZERO_RETURN);
     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_READ",
                             PY_SSL_ERROR_WANT_READ);
     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_WRITE",
                             PY_SSL_ERROR_WANT_WRITE);
     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_X509_LOOKUP",
                             PY_SSL_ERROR_WANT_X509_LOOKUP);
     PyModule_AddIntConstant(m, "SSL_ERROR_SYSCALL",
                             PY_SSL_ERROR_SYSCALL);
     PyModule_AddIntConstant(m, "SSL_ERROR_SSL",
                             PY_SSL_ERROR_SSL);
     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_CONNECT",
                             PY_SSL_ERROR_WANT_CONNECT);

... and so on. Many modules don't check the return values of these calls.
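
A constants slot could be driven by a name/value table along these
lines. This is a sketch only: IntConstant, add_int_constants, FakeModule
and fake_add are hypothetical names, and the numeric values are
placeholders rather than the real PY_SSL_ERROR_* values. In CPython the
"add" callback would be a thin wrapper around PyModule_AddIntConstant,
which returns -1 on error and 0 on success.

```c
#include <stddef.h>

typedef struct IntConstant {
    const char *name;   /* NULL marks the end of the table */
    long value;
} IntConstant;

typedef int (*add_int_fn)(void *module, const char *name, long value);

/* Add every constant in the table, checking each call; the first
 * failure is propagated to the caller. */
static int add_int_constants(void *module, const IntConstant *table,
                             add_int_fn add)
{
    for (const IntConstant *c = table; c->name != NULL; c++) {
        if (add(module, c->name, c->value) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Stand-in "module" with a fixed capacity, so a failure can be shown. */
#define FAKE_CAPACITY 4
typedef struct FakeModule {
    const char *names[FAKE_CAPACITY];
    long values[FAKE_CAPACITY];
    size_t used;
} FakeModule;

static int fake_add(void *module, const char *name, long value)
{
    FakeModule *m = module;
    if (m->used == FAKE_CAPACITY) {
        return -1;      /* plays the role of an allocation failure */
    }
    m->names[m->used] = name;
    m->values[m->used] = value;
    m->used++;
    return 0;
}

/* Placeholder values, for illustration only. */
static const IntConstant ssl_error_constants[] = {
    {"SSL_ERROR_ZERO_RETURN", 1},
    {"SSL_ERROR_WANT_READ", 2},
    {"SSL_ERROR_WANT_WRITE", 3},
    {NULL, 0}
};
```

With a table like this, the error check lives in one place instead of
being repeated (or forgotten) after every call.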

>> And an "inittab" slot should be possible for package-style extensions.
>> I'll leave these ideas out for now, but possibilities for extending are
>> there.
>
> If I recall correctly, there's actually a longstanding RFE somewhere
> for builtin packages that this change may eventually be able to help
> with. It was something embedding the full Qt libraries I think.

There are probably more use cases, but let's stick to the basics for now.