From ericsnowcurrently at gmail.com Sun May 3 00:22:31 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 2 May 2015 16:22:31 -0600
Subject: [Import-SIG] an old idea: getting rid of __init__.py
Message-ID:

When namespace packages were under discussion I remember we were seriously considering eliminating the requirement of __init__.py for *all* packages. Well, I stumbled onto the following post from Guido predating namespace packages by several years:

https://mail.python.org/pipermail/python-dev/2006-April/064400.html

Food for thought. :)

-eric

p.s. I haven't yet read through the thread, but I expect the conversation dragged out long enough that the proposal lost steam.

From solipsis at pitrou.net Sun May 3 00:41:07 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 3 May 2015 00:41:07 +0200
Subject: [Import-SIG] an old idea: getting rid of __init__.py
References:
Message-ID: <20150503004107.568b4089@fsol>

On Sat, 2 May 2015 16:22:31 -0600
Eric Snow wrote:
> When namespace packages were under discussion I remember we were
> seriously considering eliminating the requirement of __init__.py for
> *all* packages. Well, I stumbled onto the following post from Guido
> predating namespace packages by several years:

Well, I've already been bitten by Python mistaking a directory for a "namespace package", just because of its simple existence. I wouldn't want things to get any more annoying.

The argument that __init__.py is confusing to beginners is a bit arbitrary; not requiring any __init__.py makes for situations that are just as confusing.

Regards

Antoine.

>
> https://mail.python.org/pipermail/python-dev/2006-April/064400.html
>
> Food for thought. :)
>
> -eric
>
> p.s. I haven't yet read through the thread, but I expect the
> conversation dragged out long enough that the proposal lost steam.
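Antoine's point can be reproduced with a short script. It assumes Python 3.3 or later, where implicit namespace packages exist. The directory names and the `import_from` helper are made up for illustration. The script compares a bare directory, which imports as a namespace package, with a directory that contains __init__.py.

```python
import importlib
import os
import sys
import tempfile


def import_from(root, name):
    """Import *name* with *root* temporarily at the front of sys.path."""
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()  # the directories were just created
        return importlib.import_module(name)
    finally:
        sys.path.remove(root)


with tempfile.TemporaryDirectory() as root:
    # A directory with no __init__.py still imports: as a namespace package.
    os.mkdir(os.path.join(root, "plain_dir"))
    ns = import_from(root, "plain_dir")

    # Adding __init__.py makes it a regular ("directory module") package.
    pkg_dir = os.path.join(root, "regular_pkg")
    os.mkdir(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    regular = import_from(root, "regular_pkg")

    print(getattr(ns, "__file__", None))             # None: no file behind it
    print(regular.__file__.endswith("__init__.py"))  # True
```

Nothing in the bare directory marks it as a package. It becomes importable simply because it exists on sys.path, which is the behaviour Antoine describes.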
From ncoghlan at gmail.com Tue May 5 09:49:19 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 5 May 2015 17:49:19 +1000
Subject: [Import-SIG] an old idea: getting rid of __init__.py
In-Reply-To:
References:
Message-ID:

On 3 May 2015 at 08:22, Eric Snow wrote:
> When namespace packages were under discussion I remember we were
> seriously considering eliminating the requirement of __init__.py for
> *all* packages.

Which is what we effectively did. You only need an __init__.py now if you:

a) want module level attributes, rather than only subpackages;
b) want to run other code at package import time; or
c) want to forcibly close the package to further extension in other directories.

As Antoine notes, the implicit nature of magically scanning directories for subpackages trades away comprehensibility for the sake of convenience. Its main advantage is actually "that's the way other languages handle import namespacing".

The kind of traditional package created by adding __init__.py could be described as being more akin to a "directory module" than it is to a pure namespace package (certainly "directory module" is an accurate description of former single-file modules like unittest, which go out of their way to hide the fact that they're now implemented across multiple files).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From encukou at gmail.com Thu May 7 17:35:02 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 07 May 2015 17:35:02 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4
Message-ID: <554B8626.8000709@gmail.com>

Hello!

Based on previous discussions, particularly the lack of objections to repurposing ModuleDef.m_reload, I've sent an updated version of PEP 489 to the editors. I'm including a copy below.
The implementation is nearly finished, with several things missing:
- Support for non-Linuxy platforms
- PyImport_Inittab, see below
- Documentation
- porting "xx" and "xxsubtype" modules (but "xxlimited" is done)

The changes from the last update are:
- PyModuleExport -> PyModuleDef (which brings us down to two slot types, create & exec)
- Removed "singleton modules"
- Stated that PyModule_Create, PyState_FindModule, PyState_AddModule, PyState_RemoveModule will not work on slots-based modules.
- Added a section on C-level callbacks
- Clarified that if PyModuleExport_* returns NULL, it's as if it wasn't defined (i.e. falls back to PyInit)
- Added API functions: PyModule_FromDefAndSpec, PyModule_ExecDef
- Added PyModule_AddMethods and PyModule_AddDocstring helpers
- Added PyMODEXPORT_FUNC macro for x-platform declarations of the export function
- Added summary of API changes
- Added example code for a backwards-compatible module
- Changed modules ported in the initial implementation to "array" and "xx*"
- Changed ImportErrors to SystemErrors in cases where the module is badly written (and to mirror what PyInit does now)
- Several typo fixes and clarifications

Some further thoughts:

The docstring and methods are initialized in the creation step, rather than exec. I don't think it's important enough to do this in exec, and this way the implementation is easier (with respect to NULL slots, and backwards compatibility with PyInit-based modules where Exec is a no-op).

As I was implementing this, I ran into PyImport_Inittab. I'll need to add a similar list of PyModuleDefs.
And now for the PEP:

--

PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin, Stefan Behnel, Nick Coghlan
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:

Abstract
========

This PEP proposes a redesign of the way in which extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension authors to only define features they need, and to allow future additions to extension module declarations.

Extension modules are created in a two-step process, fitting better into the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API.

The proposal also allows extension modules with non-ASCII names.

Motivation
==========

Python modules and extension modules are not set up in the same way. For Python modules, the module is created and set up first, then the module code is executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and passed to the relevant hooks.

For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialization.
The initialization function is not passed the ModuleSpec, or any information it contains, such as the __file__ or fully-qualified name. This hinders relative imports and resource loading.

In Python 3, modules are also not added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to re-import it and thus run into an infinite loop when it executes the module init function again. Without the fully qualified module name (FQMN), it is not trivial to correctly add the module to sys.modules either. This is specifically a problem for Cython-generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of __file__ and __name__ information hinders the compilation of "__init__.py" modules, i.e. packages, especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or interpreter reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps backwards compatibility, which should reduce pressure and give extension authors adequate time to consider these issues when porting.

The current process
===================

Currently, extension modules export an initialization function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return either NULL in the case of an exception, or a fully initialized module object. The function receives no arguments, so it has no way of knowing about its import context.
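A pure-Python module already gets the things that, per the Motivation section, an extension's init function lacks. Its spec is available, and it is registered in sys.modules before its code runs, so a self-import returns the partially initialized module instead of looping. The short demonstration below writes a throwaway module to a temporary directory. The module name `context_demo` is made up for this example.

```python
import importlib
import os
import sys
import tempfile

# Source of a throwaway module that inspects its own import context.
SOURCE = """
import sys
SPEC_NAME = __spec__.name
IN_SYS_MODULES = __name__ in sys.modules
import context_demo  # re-import while still executing: no infinite loop
SELF_IMPORT_IS_SELF = context_demo is sys.modules[__name__]
"""

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "context_demo.py"), "w") as f:
        f.write(SOURCE)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        mod = importlib.import_module("context_demo")
    finally:
        sys.path.remove(root)

print(mod.SPEC_NAME, mod.IN_SYS_MODULES, mod.SELF_IMPORT_IS_SELF)
# context_demo True True
```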
During its execution, the module init function creates a module object based on a PyModuleDef struct. It then continues to initialize it by adding attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded, and when a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe as it relies on the module init function creating its own module object first, but this assumption usually holds in practice.

The proposal
============

The current extension module initialization will be deprecated in favor of a new initialization scheme. Since the current scheme will continue to be available, existing code will continue to work unchanged, including binary compatibility.

Extension modules that support the new initialization scheme must export the public symbol "PyModuleExport_<modulename>", where "modulename" is the name of the module. (For modules with non-ASCII names the symbol name is slightly different, see "Export Hook Name" below.)

If defined, this symbol must resolve to a C function with the following signature::

    PyModuleDef* (*PyModuleExportFunction)(void)

For cross-platform compatibility, the function should be declared as::

    PyMODEXPORT_FUNC PyModuleExport_<modulename>(void)

The function must return a pointer to a PyModuleDef structure. This structure must be available for the lifetime of the module created from it; usually, it will be declared statically.

Alternatively, this function can return NULL, in which case it is as if the symbol was not defined; see the "Legacy Init" section.

The PyModuleDef structure will be changed to contain a list of slots, similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure (which would introduce additional supporting functions and per-module storage), the currently unused m_reload pointer of PyModuleDef will be changed to hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of PyModuleDef_Slot structures, terminated by a slot with id set to 0 (i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided. New Python versions may introduce new slot IDs, but slot IDs will never be recycled. Slots may get deprecated, but will continue to be supported throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.

When using the new import mechanism, m_size must not be negative. Also, the *m_name* field of PyModuleDef will not be used during importing; the module name will be taken from the ModuleSpec.

Module Creation
---------------

Module creation (that is, the implementation of ExecutionLoader.create_module) is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses. The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451, and the PyModuleDef structure.
It should return a new module object, or set an error and return NULL.

This function is not responsible for setting import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of types.ModuleType. Any type can be used, as long as it supports setting and getting attributes, including at least the import-related attributes. However, only ModuleType instances support module-specific functionality such as per-module state.

Note that when this function is called, the module's entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively) may lead to an infinite loop. Extension authors are advised to keep Py_mod_create minimal, and in particular to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal module object using PyModule_New. The name is taken from *spec*.

Post-creation steps
...................

If the Py_mod_create function returns an instance of types.ModuleType (or subclass), or if a Py_mod_create slot is not present, the import machinery will do the following steps after the module is created:

* If *m_size* is specified, per-module state is allocated and made accessible through PyModule_GetState
* The PyModuleDef is associated with the module, making it accessible to PyModule_GetDef, and enabling the m_traverse, m_clear and m_free hooks.
* The docstring is set from m_doc.
* The module's functions are initialized from m_methods.

If the Py_mod_create function does not return a module subclass, then m_size must be 0 or negative, and m_traverse, m_clear and m_free must all be NULL. Otherwise, SystemError is raised.
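The creation rules above can be sketched as a toy Python model of how a loader might walk the slots array and perform the create step. This is not CPython's implementation. `SimpleNamespace` objects stand in for PyModuleDef and ModuleSpec, `None` plays the role of NULL, and m_methods handling is left out. The numeric slot IDs are assumptions, since the text only requires them to be unique and nonzero.

```python
import types
from types import SimpleNamespace

# Assumed slot IDs for illustration; only their uniqueness matters here.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def parse_slots(slots):
    """Split a {0, NULL}-terminated slot list into (create, exec_funcs)."""
    create, exec_funcs = None, []
    for slot_id, value in slots or ():
        if slot_id == 0:                      # terminator
            break
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if value is None:
            raise SystemError(f"NULL value for slot {slot_id}")
        if slot_id == PY_MOD_CREATE:
            if create is not None:
                raise SystemError("multiple Py_mod_create slots")
            create = value
        else:
            exec_funcs.append(value)          # kept in array order
    return create, exec_funcs


def create_module(spec, moddef):
    """Model of the create step; m_methods handling is omitted."""
    create, _ = parse_slots(moddef.m_slots)
    if create is None:
        # Like PyModule_New: the name comes from the spec, not m_name.
        module = types.ModuleType(spec.name)
    else:
        module = create(spec, moddef)
    if isinstance(module, types.ModuleType):
        module.__doc__ = moddef.m_doc         # post-creation step
    elif moddef.m_size > 0:
        raise SystemError("non-module objects cannot have per-module state")
    return module


spam_def = SimpleNamespace(
    m_name="ignored", m_doc="Utilities for cooking spam", m_size=0,
    m_slots=[(PY_MOD_EXEC, lambda module: 0), (0, None)],
)
spam = create_module(SimpleNamespace(name="spam"), spam_def)
print(spam.__name__, "|", spam.__doc__)  # spam | Utilities for cooking spam
```

Exec slots are collected but not run here, because execution belongs to the separate exec step.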
Module Execution
----------------

Module execution -- that is, the implementation of ExecutionLoader.exec_module -- is governed by "execution slots". This PEP only adds one, Py_mod_exec, but others may be added in the future.

Execution slots may be specified multiple times, and are processed in the order they appear in the slots array. When using the default import machinery, they are processed after import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) are set and the module is added to sys.modules.

The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to setting the module's initial attributes. The "module" argument receives the module object to initialize.

If the Py_mod_exec function replaces the module's entry in sys.modules, the new object will be used and returned by the importlib machinery. (This mirrors the behavior of Python modules. Note that for extensions, implementing Py_mod_create is usually a better solution for the use cases this serves.)

The function must return ``0`` on success, or, on error, set an exception and return ``-1``.

Legacy Init
-----------

If the PyModuleExport function is not defined, or if it returns NULL, the import machinery will try to initialize the module using the "PyInit_<modulename>" hook, as described in PEP 3121.

If the PyModuleExport function is defined and returns a non-NULL PyModuleDef, the PyInit function will be ignored. Modules requiring compatibility with previous versions of CPython may implement the PyInit function in addition to the new hook.

Modules using the legacy init API will be initialized entirely in the Loader.create_module step; Loader.exec_module will be a no-op.
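The lookup order described in this section can be summarized with a small Python model. A dictionary of callables stands in for the symbols a loader would find in the shared library, for example via dlsym() or GetProcAddress, and `None` stands in for NULL. The ImportError raised when neither hook exists is an assumption, since the text does not specify that case. Only ASCII module names are handled.

```python
def choose_init_hook(symbols, name):
    """Return ("new", moddef) or ("legacy", module) for ASCII module *name*.

    *symbols* maps exported symbol names to zero-argument callables.
    """
    export = symbols.get("PyModuleExport_" + name)
    if export is not None:
        moddef = export()
        if moddef is not None:
            # New scheme: PyInit_<name> is ignored even if it exists.
            return "new", moddef
        # Returning NULL is treated as if the export hook were not defined.
    init = symbols.get("PyInit_" + name)
    if init is None:
        raise ImportError(f"no initialization hook for {name!r}")  # assumed
    # Legacy scheme: everything happens at create time; exec is a no-op.
    return "legacy", init()


both = {"PyModuleExport_spam": lambda: "spam_def",
        "PyInit_spam": lambda: "legacy spam module"}
declined = {"PyModuleExport_spam": lambda: None,
            "PyInit_spam": lambda: "legacy spam module"}
print(choose_init_hook(both, "spam"))      # ('new', 'spam_def')
print(choose_init_hook(declined, "spam"))  # ('legacy', 'legacy spam module')
```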
A module that supports older CPython versions can be coded as::

    #define Py_LIMITED_API
    #include <Python.h>

    static int spam_exec(PyObject *module) {
        if (PyModule_AddStringConstant(module, "food", "spam") < 0) {
            return -1;  /* exception is already set */
        }
        return 0;
    }

    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
        spam_slots,                                 /* m_slots */
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    /* New hook: used by CPython versions that implement this PEP. */
    PyMODEXPORT_FUNC
    PyModuleExport_spam(void) {
        return &spam_def;
    }

    /* Legacy hook: only called by older CPython versions. */
    PyMODINIT_FUNC
    PyInit_spam(void) {
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    }

Note that this must be *compiled* on a new CPython version, but the resulting shared library will be backwards compatible. (Source-level compatibility is possible with preprocessor directives.)

If a Py_mod_create slot is used, PyInit should call its function instead of PyModule_Create. Keep in mind that the ModuleSpec object is not available in the legacy init scheme.

Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. The mechanism is designed to make this easy, but care is still required on the part of the extension author. No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept in either the module dict, or in the module object's storage reachable by PyModule_GetState. A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.
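The rule of thumb above has a rough Python-level analogy. Assume, for the analogy only, that two module objects built from one definition behave like the same extension loaded into two interpreters. State kept on each module object stays separate, while anything held at "static" level is shared by every copy. The `exec_spam` function and the `SHARED_STATIC` list are hypothetical names used for illustration.

```python
import importlib.machinery
import importlib.util

SHARED_STATIC = []  # plays the role of C `static` data: shared by every copy


def exec_spam(module):
    """Hypothetical exec step: per-module state lives on the module object."""
    module.calls = []

    def bump():
        module.calls.append(1)     # per-module: isolated between copies
        SHARED_STATIC.append(1)    # static: leaks across copies
        return len(module.calls)

    module.bump = bump
    return 0


spec = importlib.machinery.ModuleSpec("spam", loader=None)
first = importlib.util.module_from_spec(spec)
second = importlib.util.module_from_spec(spec)
for module in (first, second):
    exec_spam(module)

first.bump()
first.bump()
second.bump()
print(len(first.calls), len(second.calls), len(SHARED_STATIC))  # 2 1 3
```

The closure receives its module explicitly, which mirrors the requirement that functions needing module state get a reference to the module object rather than looking it up globally.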
Behavior of existing module creation functions
----------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure with a non-NULL m_slots pointer. The function doesn't have access to the ModuleSpec object necessary for "new style" module creation.

Likewise, for modules created from a PyModuleDef with a non-NULL m_slots pointer, the PyState_FindModule function will return NULL, and PyState_AddModule and PyState_RemoveModule will fail with SystemError. PyState registration is disabled because multiple module objects may be created from the same PyModuleDef.

Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs access to module-level state (including functions, classes or exceptions defined at the module level) must receive a reference to the module object (or the particular object it needs), either directly or indirectly. This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for the new mechanism to be useful to all modules. Proper fixes have been discussed on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment, not good candidates for porting to the new mechanism.

New Functions
-------------

A new function and macro will be added to implement module creation.
These are similar to PyModule_Create and PyModule_Create2, except they take an additional ModuleSpec argument, and handle module definitions with non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function will be added to run "execution slots" on a module::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Additionally, two helpers will be added for setting the docstring and methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)

Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names must be encoded to form the PyModuleExport hook name.

For ASCII module names, the import hook is named PyModuleExport_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named PyModuleExportU_<modulename>, where the name is encoded using CPython's "punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix), with hyphens ("-") replaced by underscores ("_").

In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyModuleExport' + suffix

Examples:

=============  ===========================
Module name    Export hook name
=============  ===========================
spam           PyModuleExport_spam
lančmít        PyModuleExportU_lanmt_2sa6t
スパム         PyModuleExportU_zck5b2b
=============  ===========================

Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and LoadLibraryEx on Windows), it is not generally possible to load a modified library after it has changed on disk.
Use cases for reloading other than trying out a new version of the module are too rare to require all module authors to keep reloading in mind. If reload-like functionality is needed, authors can export a dedicated function for it.

Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can export additional PyModuleExport* symbols besides the one that corresponds to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules, not to *find* them.

Given the filesystem location of a shared library and a module name, a module may be loaded with::

    import importlib.machinery
    import importlib.util

    def load_extension_module(name, path):
        # The function name is illustrative; the body is the loading recipe.
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module

On platforms that support symbolic links, these may be used to install one library under multiple names, exposing all exported modules to normal import machinery.

Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmoduleexport`` will be created. The library will export several additional modules using the mechanism described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use the old API indefinitely (or until the old API is removed).

The ``array`` and ``xx*`` modules will be converted to the new API as part of the initial implementation.

API Changes and Additions
-------------------------

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions

New macros:

* PyMODEXPORT_FUNC
* Py_mod_create
* Py_mod_exec

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.
Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384, allows later extensions.

Some extension modules export many constants; for example _ssl has a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef, would reduce boilerplate, and provide free error-checking which is often missing.

String constants and types can be handled similarly. (Note that non-default bases for types cannot be portably specified statically; this case would need a Py_mod_exec function that runs before the slots are added. The free error-checking would still be beneficial, though.)

Another possibility is providing a "main" function that would be run when the module is given to Python's -m switch. For this to work, the runpy module will need to be modified to take advantage of ModuleSpec-based loading introduced in PEP 451. Also, it will be necessary to add a mechanism for setting up a module according to slots it wasn't originally defined with.

Implementation
==============

A work-in-progress implementation is available in a GitHub repository [#gh-repo]_; a patchset is at [#gh-patch]_.

Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_ had a "PyInit_modulename" hook that would create a module class, whose ``__init__`` would be then called to create the module. This proposal did not correspond to the (then nonexistent) PEP 451, where module creation and initialization is broken into distinct steps. It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype implementation [#nicks-prototype]_. At this time PEP 451 was still not implemented, so the prototype does not use ModuleSpec.
The original version of this PEP used Create and Exec hooks, and allowed loading into arbitrary pre-constructed objects with Exec hook. The proposal made extension module initialization closer to how Python modules are initialized, but it was later recognized that this isn't an important goal. The current PEP describes a simpler solution.

References
==========

.. [#lazy_import_concerns]
   https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492]
   http://tools.ietf.org/html/rfc3492

.. [#gh-repo]
   https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch]
   https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion]
   https://mail.python.org/pipermail/import-sig/2015-April/000959.html

Copyright
=========

This document has been placed in the public domain.

From encukou at gmail.com Wed May 13 16:31:26 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 13 May 2015 16:31:26 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4
In-Reply-To: <554B8626.8000709@gmail.com>
References: <554B8626.8000709@gmail.com>
Message-ID:

On Thu, May 7, 2015 at 5:35 PM, Petr Viktorin wrote:
> Hello!
>
> Based on previous discussions, particularly the lacks of objections to
> repurposing ModuleDef.m_reload, I've sent an updated version of PEP 489
> to the editors. I'm including a copy below.
>
> The implementation is nearly finished, with several things missing:
> - Support for non-Linuxy platforms
> - PyImport_Inittab, see below
> - Documentation
> - porting "xx" and "xxsubtype" modules (but "xxlimited" is done)
[...]
>
>
> Some further thoughts:
>
> The docstring and methods are initialized in the creation step, rather
> than exec. I don't think it's important enough to do this in exec, and
> this way the implementation is easier (with respect to NULL slots, and
> backwards compatibility with PyInit-based modules where Exec is a no-op).
>
> As I was implementing this, I ran into PyImport_Inittab. I'll need to
> add a similar list of PyModuleDefs.

And here I'm somewhat stumped; can someone help me find the right direction?

There's a tool called freeze, which (among other things) generates the PyImport_Inittab, in the file config.c which looks a bit like this:

    extern PyObject* PyInit__thread(void);
    extern PyObject* PyInit__signal(void);
    [... and so on for the other modules ...]

    struct _inittab _PyImport_Inittab[] = {
        {"_thread", PyInit__thread},
        {"_signal", PyInit__signal},
        [... and so on for the other modules ...]
    };

This file is generated just from a list of module names, without loading them. So, it can't easily determine whether a module uses PyInit_*, or PyModuleExport_*. But it needs to choose the hook name correctly, otherwise the program will fail to link.

I can see three solutions for this problem.

I could modify freeze to inspect the modules somehow. I'm wary of writing platform-specific code for such an edge case, though, and I'm not sure if freeze always has access to the modules it processes, rather than just their names.

I could introduce some way to specify which hook is used out-of-band. But that's just passing the problem on to users, not solving it. Also, freeze is pretty minimal and I'm vaguely aware of third-party tools that do something similar (cx_freeze, py2exe, py2app); I might need to coordinate with them.

Or, I could keep the "PyInit_*" hook name, and allow it to return PyModuleDef instead of a module. This is obviously a hack, and would force me to get back down to the drawing board, but considering the options it seems best to explore this option.
(PyInit_* and PyModuleExport_* signatures are technically compatible, since a PyModuleDef is a PyObject)

I'd welcome your thoughts.

From ncoghlan at gmail.com Wed May 13 18:04:24 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 May 2015 02:04:24 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4
In-Reply-To:
References: <554B8626.8000709@gmail.com>
Message-ID:

On 14 May 2015 at 00:31, Petr Viktorin wrote:
> Or, I could keep the "PyInit_*" hook name, and allow it to return
> PyModuleDef instead of a module. This is obviously a hack, and would
> force me to get back down to the drawing board, but considering the
> options it seems best to explore this option.
> (PyInit_* and PyModuleExport_* signatures are technically compatible,
> since a PyModuleDef is a PyObject)
>
> I'd welcome your thoughts.

Would it be feasible to go with a model where _PyImport_inittab continues to be based on the legacy extension module initialisation system for the time being? That would mean implementing PyInit_* would remain required rather than optional for 3.5, but lots of folks are going to want to provide it anyway for compatibility with 3.4 and earlier.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From encukou at gmail.com Thu May 14 10:10:51 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 14 May 2015 10:10:51 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4
In-Reply-To:
References: <554B8626.8000709@gmail.com>
Message-ID:

On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan wrote:
> On 14 May 2015 at 00:31, Petr Viktorin wrote:
>> Or, I could keep the "PyInit_*" hook name, and allow it to return
>> PyModuleDef instead of a module. This is obviously a hack, and would
>> force me to get back down to the drawing board, but considering the
>> options it seems best to explore this option.
>> (PyInit_* and PyModuleExport_* signatures are technically compatible,
>> since a PyModuleDef is a PyObject)
>>
>> I'd welcome your thoughts.
>
> Would it be feasible to go with a model where _PyImport_inittab
> continues to be based on the legacy extension module initialisation
> system for the time being? That would mean implementing PyInit_* would
> remain required rather than optional for 3.5, but lots of folks are
> going to want to provide it anyway for compatibility with 3.4 and
> earlier.

That doesn't really solve the problem, just delays it until we decide that PyInit_* is really optional.

It would mean you couldn't take advantage of the improvements in PEP 489 (create/exec split and ModuleSpec). You'd just write more boilerplate for no benefit (except small stuff like non-ASCII module names).

What might be worse, it would mean that modules would have different behavior depending on whether they're frozen or not, which would probably result in subtle bugs you'd only find when creating frozen binaries.

From ncoghlan at gmail.com Thu May 14 10:48:45 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 May 2015 18:48:45 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4
In-Reply-To:
References: <554B8626.8000709@gmail.com>
Message-ID:

On 14 May 2015 at 18:10, Petr Viktorin wrote:
> On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan wrote:
>> On 14 May 2015 at 00:31, Petr Viktorin wrote:
>>> Or, I could keep the "PyInit_*" hook name, and allow it to return
>>> PyModuleDef instead of a module. This is obviously a hack, and would
>>> force me to get back down to the drawing board, but considering the
>>> options it seems best to explore this option.
>>> (PyInit_* and PyModuleExport_* signatures are technically compatible,
>>> since a PyModuleDef is a PyObject)
>>>
>>> I'd welcome your thoughts.
>> >> Would it be feasible to go with a model where _PyImport_inittab >> continues to be based on the legacy extension module initialisation >> system for the time being? That would mean implementing PyInit_* would >> remain required rather than optional for 3.5, but lots of folks are >> going to want to provide it anyway for compatibility with 3.4 and >> earlier. > > That doesn't really solve the problem, just delays it until we decide > that PyInit_* is really optional. Yeah, I was seeing if you thought a "buy more time to think about it further" approach might be viable here. I think you're right that we need a better answer up front, though. > It would mean you couldn't take advantage of the improvements in PEP > 489 (create/exec split and ModuleSpec). You'd just write more > boilerplate for no benefit (except small stuff like non-ASCII module > names). > > What might be worse, it would mean that modules would have different > behavior depending on whether they're frozen or not, which would > probably result in subtle bugs you'd only find when creating frozen > binaries. Looking at https://hg.python.org/cpython/file/default/Tools/freeze/makeconfig.py, I'm thinking your "out-of-band" option may be a reasonable way to go, with a corresponding tweak to the semantics of https://docs.python.org/3/c-api/import.html#c._inittab to permit (initfunc) to be a pointer to a PyInit_* function OR to a PyModuleExport_* function. We'd then have to determine which was which at runtime when processing the inittab internally, by checking whether or not the result of the call was a PyModuleDef or not. 
For the inittab generation side, freeze would need to be updated to: * allow builtin modules to be specifically nominated as "initialised modules" or "defined modules" * allow the default handling of builtin modules not nominated as one or the other to be configured * for backwards compatibility, builtin modules would be treated as initialised modules by default If you had a new module that was export only, you'd get a link time error looking for the init function that didn't exist if you didn't explicitly flag it as a "defined module". Similarly, if you switched the default to be defined modules, you'd get a link time error for a legacy module that didn't support the new API. Does that approach sound plausible to you? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Thu May 14 14:38:43 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 14 May 2015 14:38:43 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4 In-Reply-To: References: <554B8626.8000709@gmail.com> Message-ID: On Thu, May 14, 2015 at 10:48 AM, Nick Coghlan wrote: > On 14 May 2015 at 18:10, Petr Viktorin wrote: >> On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan wrote: >>> On 14 May 2015 at 00:31, Petr Viktorin wrote: >>>> Or, I could keep the "PyInit_*" hook name, and allow it to return >>>> PyModuleDef instead of a module. This is obviously a hack, and would >>>> force me to get back down to the drawing board, but considering the >>>> options it seems best to explore this option. >>>> (PyInit_* and PyModuleExport_* signatures are technically compatible, >>>> since a PyModuleDef is a PyObject) >>>> >>>> I'd welcome your thoughts. >>> >>> Would it be feasible to go with a model where _PyImport_inittab >>> continues to be based on the legacy extension module initialisation >>> system for the time being? 
That would mean implementing PyInit_* would >>> remain required rather than optional for 3.5, but lots of folks are >>> going to want to provide it anyway for compatibility with 3.4 and >>> earlier. >> >> That doesn't really solve the problem, just delays it until we decide >> that PyInit_* is really optional. > > Yeah, I was seeing if you thought a "buy more time to think about it > further" approach might be viable here. I think you're right that we > need a better answer up front, though. > >> It would mean you couldn't take advantage of the improvements in PEP >> 489 (create/exec split and ModuleSpec). You'd just write more >> boilerplate for no benefit (except small stuff like non-ASCII module >> names). >> >> What might be worse, it would mean that modules would have different >> behavior depending on whether they're frozen or not, which would >> probably result in subtle bugs you'd only find when creating frozen >> binaries. > > Looking at https://hg.python.org/cpython/file/default/Tools/freeze/makeconfig.py, > I'm thinking your "out-of-band" option may be a reasonable way to go, > with a corresponding tweak to the semantics of > https://docs.python.org/3/c-api/import.html#c._inittab to permit > (initfunc) to be a pointer to a PyInit_* function OR to a > PyModuleExport_* function. > > We'd then have to determine which was which at runtime when processing > the inittab internally, by checking whether or not the result of the > call was a PyModuleDef or not. That would work, but I don't see much of an advantage over allowing PyInit_* itself to return either module or PyModuleDef. 
> For the inittab generation side, freeze would need to be updated to: > > * allow builtin modules to be specifically nominated as "initialised > modules" or "defined modules" > * allow the default handling of builtin modules not nominated as one > or the other to be configured > * for backwards compatibility, builtin modules would be treated as > initialised modules by default > > If you had a new module that was export only, you'd get a link time > error looking for the init function that didn't exist if you didn't > explicitly flag it as a "defined module". Similarly, if you switched > the default to be defined modules, you'd get a link time error for a > legacy module that didn't support the new API. > > Does that approach sound plausible to you? I think the "initialized" vs. "exported" distinction is an implementation detail of the module, and this would expose it too much. According to its README, freeze "[parses] the program (and all its modules) and scans the generated byte code for IMPORT instructions". I think py2exe does something similar. The end users of such tools would need to designate which modules use init vs. export. Allowing PyInit to optionally return PyModuleDef is a bit of a hack, but it keeps the details isolated between the module and the import machinery. PyModuleDef is a PyObject, so the PyInit signature matches. Just the PyInit name is a bit misleading :( I think I have a favorite direction now. (Sorry for asking for directions and then wanting to ignore them! The discussion is helpful.) Somewhat related: any thoughts on the legacy init example code [0]? You asked for an example like this; is it what you had in mind? If you compile this with a PEP-489 Python with the stable API, the .so can be used with older Pythons as well. I now think it's a bit silly: it would be enough to use #ifdef: define either PyModuleExport or PyInit, depending on the Python version. 
This won't do if you're targetting the stable API, but in that case you can't use any of the new PEP 489 features anyway, so it's enough to only define PyInit. Or is there something I missed? [0] https://www.python.org/dev/peps/pep-0489/#legacy-init From ncoghlan at gmail.com Thu May 14 18:45:54 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 15 May 2015 02:45:54 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4 In-Reply-To: References: <554B8626.8000709@gmail.com> Message-ID: On 14 May 2015 at 22:38, Petr Viktorin wrote: > I think the "initialized" vs. "exported" distinction is an > implementation detail of the module, and this would expose it too > much. > According to its README, freeze "[parses] the program (and all its > modules) and scans the generated byte code for IMPORT instructions". I > think py2exe does something similar. The end users of such tools would > need to designate which modules use init vs. export. > > Allowing PyInit to optionally return PyModuleDef is a bit of a hack, > but it keeps the details isolated between the module and the import > machinery. > PyModuleDef is a PyObject, so the PyInit signature matches. Just the > PyInit name is a bit misleading :( Agreed it makes the name of PyInit_* a bit misleading, but also agreed that it sounds like a good trick for making this work in a way that can handle _PyImport_inittab appropriately. In terms of documenting it in a way that lets the hook name still make sense, perhaps we can refer to returning PyModuleDef as "multi-phase initialisation"? That is: - initialise the module definition - create the module object - execute the module body If you *don't* return a module definition, then the import system will assume single phase initialisation. > I think I have a favorite direction now. (Sorry for asking for > directions and then wanting to ignore them! The discussion is > helpful.) 
I find that seeing a suggestion I don't like often sparks new ideas as I attempt to figure out why I don't like it :) > Somewhat related: any thoughts on the legacy init example code [0]? > You asked for an example like this; is it what you had in mind? If you > compile this with a PEP-489 Python with the stable API, the .so can be > used with older Pythons as well. > I now think it's a bit silly: it would be enough to use #ifdef: define > either PyModuleExport or PyInit, depending on the Python version. > This won't do if you're targetting the stable API, but in that case > you can't use any of the new PEP 489 features anyway, so it's enough > to only define PyInit. > Or is there something I missed? I think the idea above makes it mandatory to use "#ifdef" to request multi-phase initialisation on 3.5+ and single-phase initialisation on earlier versions. An example of the relevant incantations might still be useful though. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Thu May 14 21:04:35 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 14 May 2015 21:04:35 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4 In-Reply-To: References: <554B8626.8000709@gmail.com> Message-ID: On Thu, May 14, 2015 at 6:45 PM, Nick Coghlan wrote: > On 14 May 2015 at 22:38, Petr Viktorin wrote: >> Allowing PyInit to optionally return PyModuleDef is a bit of a hack, >> but it keeps the details isolated between the module and the import >> machinery. >> PyModuleDef is a PyObject, so the PyInit signature matches. Just the >> PyInit name is a bit misleading :( > > Agreed it makes the name of PyInit_* a bit misleading, but also agreed > that it sounds like a good trick for making this work in a way that > can handle _PyImport_inittab appropriately. 
> > In terms of documenting it in a way that lets the hook name still make > sense, perhaps we can refer to returning PyModuleDef as "multi-phase > initialisation"? That is: > > - initialise the module definition > - create the module object > - execute the module body Yes! That'll even make a much better name for the PEP; currently it reads like "yet another change". (I hope I can rename a PEP once submitted?) >> Somewhat related: any thoughts on the legacy init example code [0]? >> You asked for an example like this; is it what you had in mind? If you >> compile this with a PEP-489 Python with the stable API, the .so can be >> used with older Pythons as well. >> I now think it's a bit silly: it would be enough to use #ifdef: define >> either PyModuleExport or PyInit, depending on the Python version. >> This won't do if you're targetting the stable API, but in that case >> you can't use any of the new PEP 489 features anyway, so it's enough >> to only define PyInit. >> Or is there something I missed? > > I think the idea above makes it mandatory to use "#ifdef" to request > multi-phase initialisation on 3.5+ and single-phase initialisation on > earlier versions. An example of the relevant incantations might still > be useful though. Definitely. From ncoghlan at gmail.com Fri May 15 08:10:02 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 15 May 2015 16:10:02 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading; version 4 In-Reply-To: References: <554B8626.8000709@gmail.com> Message-ID: On 15 May 2015 05:04, "Petr Viktorin" wrote: > > On Thu, May 14, 2015 at 6:45 PM, Nick Coghlan wrote: > > On 14 May 2015 at 22:38, Petr Viktorin wrote: > >> Allowing PyInit to optionally return PyModuleDef is a bit of a hack, > >> but it keeps the details isolated between the module and the import > >> machinery. > >> PyModuleDef is a PyObject, so the PyInit signature matches. 
Just the > >> PyInit name is a bit misleading :( > > > > Agreed it makes the name of PyInit_* a bit misleading, but also agreed > > that it sounds like a good trick for making this work in a way that > > can handle _PyImport_inittab appropriately. > > > > In terms of documenting it in a way that lets the hook name still make > > sense, perhaps we can refer to returning PyModuleDef as "multi-phase > > initialisation"? That is: > > > > - initialise the module definition > > - create the module object > > - execute the module body > > Yes! That'll even make a much better name for the PEP; currently it > reads like "yet another change". > (I hope I can rename a PEP once submitted?) Yes, renaming is fine. That's one of the advantages of using PEP numbers in their permanent URLs, rather than their names. Cheers, Nick. P.S. I think this change makes this PEP another fine example of why reference implementations are such an important part of the process - they usually uncover issues and implications that *nobody* had thought of yet :) From encukou at gmail.com Mon May 18 16:02:37 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 18 May 2015 16:02:37 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 Message-ID: <5559F0FD.3080704@gmail.com> Hello! I've sent the latest update of PEP 489 to the editors. I am quite happy with how it turned out, and I don't expect too many further changes. In this iteration, PyModuleExport is removed, and instead PyInit can return a PyModuleDef instead of an initialized module. This means you can again derive the hook name from the module name, which is necessary for PyImport_Inittab and its supporting code, and the freeze tool. The mechanism the PEP introduces is now called "multi-phase initialization", and the PEP is renamed to reflect that. Thanks Nick for the discussion, and the name!
The new PEP also mentions built-in modules, which will also support multi-phase init. Per-module state is now allocated at the beginning of the execute step; the presence of the state pointer is checked to prevent re-running exec on reload. Also, docstrings and methods from the PyModuleDef are always added to whatever create returns, even if it's not a PyModule (sub)type. The implementation [0] should be complete and tested now. It is at the point of needing a second pair of eyes :) I have made the changes for non-Linux platforms, but I have no way to test them. Documentation still remains to be written.

[0] https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

The PEP should be live soon; in the meantime, here is the text:

PEP: 489
Title: Multi-phase extension module initialization
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin, Stefan Behnel, Nick Coghlan
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:

Abstract
========

This PEP proposes a redesign of the way in which built-in and extension modules interact with the import machinery. This interaction was last revised for Python 3.0 in PEP 3121, but that revision did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave; specifically, to hook into the ModuleSpec-based loading mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension authors to define only the features they need, and to allow future additions to extension module declarations.

Extension modules are created in a two-step process, fitting better into the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.
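The two-step process is the same create/exec protocol that PEP 451 defines for Python-level loaders. As a rough analogy only (pure Python, not the C API this PEP defines), the sketch below uses a hypothetical `TwoPhaseLoader`. Its `create_module` plays the role of a Py_mod_create hook and its `exec_module` plays the role of a Py_mod_exec hook. Between the two steps, `importlib.util.module_from_spec` sets the import-related attributes.

```python
import importlib.abc
import importlib.util
import types


class TwoPhaseLoader(importlib.abc.Loader):
    """Hypothetical loader illustrating the PEP 451 create/exec split."""

    def __init__(self):
        self.calls = []  # records the order in which the phases run

    def create_module(self, spec):
        # Analogous to Py_mod_create: build the object only; __name__,
        # __loader__, __spec__ etc. are set afterwards by the machinery.
        self.calls.append("create")
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        # Analogous to Py_mod_exec: populate an already-created module.
        self.calls.append("exec")
        module.food = "spam"


loader = TwoPhaseLoader()
spec = importlib.util.spec_from_loader("spam_demo", loader)
module = importlib.util.module_from_spec(spec)  # runs create_module
loader.exec_module(module)                       # runs exec_module
print(loader.calls, module.__name__, module.food)
```

This split is what lets the machinery insert bookkeeping, such as setting attributes and registering in sys.modules, between creating a module and running its body.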
Extension modules can safely store arbitrary C-level per-module state in the module. This state is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API.

The proposal also allows extension modules with non-ASCII names.

Motivation
==========

Python modules and extension modules are not set up in the same way. For Python modules, the module object is created and set up first, and then the module code is executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and is passed to the relevant hooks.

For extensions (i.e. shared libraries) and built-in modules, the module init function is executed straight away and does both the creation and the initialization. The initialization function is not passed the ModuleSpec, or any of the information it contains, such as the __file__ or the fully-qualified name. This hinders relative imports and resource loading.

In Python 3, modules are also not added to sys.modules while the init function runs. As a result, a (potentially transitive) re-import of the module will really try to import it again, and thus run into an infinite loop when it executes the module init function again. Without access to the fully-qualified module name, it is not trivial to correctly add the module to sys.modules either. This is specifically a problem for Cython-generated modules, whose module init code is often as complex as that of any 'regular' Python module. Also, the lack of __file__ and __name__ information hinders the compilation of "__init__.py" modules, i.e. packages, especially when relative imports are used at module init time.

Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or interpreter reloading. While it is possible to support these features with the current infrastructure, it is neither easy nor efficient.
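The sys.modules point is easiest to see with Python modules, which already get this behavior. The following is a pure-Python sketch, not CPython's extension loader. A hypothetical loader's execution step re-imports its own module. The default machinery inserts the module into sys.modules before calling `exec_module`, so the nested import returns the partially initialized module instead of recursing. The module name `selfref_demo` and both classes are illustrative assumptions.

```python
import importlib
import importlib.abc
import importlib.util
import sys

DEMO_NAME = "selfref_demo"  # hypothetical module name used only here


class SelfImportingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # Re-import our own module while it is still executing. Spec-based
        # loading has already placed it in sys.modules, so this returns the
        # same object instead of starting a second, recursive load.
        again = importlib.import_module(module.__name__)
        module.saw_itself = again is module


class DemoFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == DEMO_NAME:
            return importlib.util.spec_from_loader(fullname, SelfImportingLoader())
        return None


sys.meta_path.insert(0, DemoFinder())
try:
    demo = importlib.import_module(DEMO_NAME)
finally:
    # Remove the demo finder so it does not affect later imports.
    sys.meta_path[:] = [f for f in sys.meta_path if not isinstance(f, DemoFinder)]

print(demo.saw_itself)  # True
```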
Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps backwards compatibility, which should reduce pressure and give extension authors adequate time to consider these issues when porting.

The current process
===================

Currently, extension and built-in modules export an initialization function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return a fully initialized module object. The function receives no arguments, so it has no way of knowing about its import context.

During its execution, the module init function creates a module object based on a PyModuleDef object. It then continues to initialize it by adding attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded. When a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe, as it relies on the module init function creating its own module object first, but this assumption usually holds in practice.

The proposal
============

The initialization function (PyInit_modulename) will be allowed to return a pointer to a PyModuleDef object. The import machinery will be in charge of constructing the module object, calling hooks provided in the PyModuleDef in the relevant phases of initialization (as described below).

This multi-phase initialization is an additional possibility. Single-phase initialization, the current practice of returning a fully initialized module object, will still be accepted, so existing code will work unchanged, including binary compatibility.
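The machinery tells the two kinds of result apart by their type. The Python model below sketches that decision only; it is not CPython's implementation. `ModuleDef`, `load_from_init` and the two sample init functions are hypothetical stand-ins for PyModuleDef, the loader and PyInit_* hooks. Rejecting any other return value with SystemError is a modeling choice.

```python
import sys
import types
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ModuleDef:
    """Hypothetical stand-in for a C-level PyModuleDef."""
    doc: Optional[str] = None
    exec_slots: List[Callable[[types.ModuleType], None]] = field(default_factory=list)


def load_from_init(name, init_func):
    result = init_func()
    if isinstance(result, types.ModuleType):
        # Single-phase: the hook already created and filled in the module.
        sys.modules[name] = result
        return result
    if isinstance(result, ModuleDef):
        # Multi-phase: the machinery creates the module using the name from
        # the spec, registers it, then runs each exec slot in order.
        module = types.ModuleType(name, result.doc)
        sys.modules[name] = module
        for exec_slot in result.exec_slots:
            exec_slot(module)
        return module
    raise SystemError(f"init function for {name!r} returned {type(result).__name__}")


def PyInit_legacy():  # hypothetical single-phase hook
    module = types.ModuleType("legacy")
    module.food = "eggs"
    return module


def PyInit_spam():  # hypothetical multi-phase hook
    def spam_exec(module):
        module.food = "spam"
    return ModuleDef(doc="Utilities for cooking spam", exec_slots=[spam_exec])


legacy = load_from_init("legacy", PyInit_legacy)
spam = load_from_init("pkg.spam", PyInit_spam)
print(legacy.food, spam.__name__, spam.food)  # eggs pkg.spam spam
```

In the multi-phase case, the module's name comes from the loader (`"pkg.spam"`), not from the definition. This mirrors how the PEP takes the name from the ModuleSpec rather than from *m_name*.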
The PyModuleDef structure will be changed to contain a list of slots, similar to PEP 384's PyType_Spec for types. To keep binary compatibility, and to avoid introducing a new structure (which would require additional supporting functions and per-module storage), the currently unused m_reload pointer of PyModuleDef will be changed to hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of PyModuleDef_Slot structures, terminated by a slot with id set to 0 (i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided. New Python versions may introduce new slot IDs, but slot IDs will never be recycled. Slots may get deprecated, but will continue to be supported throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.

When using multi-phase initialization, the *m_name* field of PyModuleDef will not be used during importing; the module name will be taken from the ModuleSpec.

To prevent crashes when the module is loaded in older versions of Python, the PyModuleDef object must be initialized using the newly added PyModuleDef_Init function. For example, an extension module "example" would be exported as::

    static PyModuleDef example_def = {...};  /* fields elided */

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModuleDef_Init(&example_def);
    }

The PyModuleDef object must be available for the lifetime of the module created from it --
usually, it will be declared statically.

Module Creation Phase
---------------------

Creation of the module object -- that is, the implementation of ExecutionLoader.create_module -- is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses. The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451, and the PyModuleDef structure. It should return a new module object, or set an error and return NULL.

This function is not responsible for setting import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of types.ModuleType. Any type can be used, as long as it supports setting and getting attributes, including at least the import-related attributes. However, only ModuleType instances support module-specific functionality such as per-module state.

Note that when this function is called, the module's entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively) may lead to an infinite loop. Extension authors are advised to keep Py_mod_create minimal, and in particular not to call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal module object using PyModule_New. The name is taken from *spec*.

Post-creation steps
...................
If the Py_mod_create function returns an instance of types.ModuleType or a subclass (or if a Py_mod_create slot is not present), the import machinery will associate the PyModuleDef with the module. This makes the definition accessible to PyModule_GetDef and enables the m_traverse, m_clear and m_free hooks.

If the Py_mod_create function does not return an instance of types.ModuleType or a subclass, then m_size must be 0, and m_traverse, m_clear and m_free must all be NULL. Otherwise, SystemError is raised.

Additionally, initial attributes specified in the PyModuleDef are set on the module object, regardless of its type:

* The docstring is set from m_doc, if non-NULL.
* The module's functions are initialized from m_methods, if any.

Module Execution Phase
----------------------

Module execution -- that is, the implementation of ExecutionLoader.exec_module -- is governed by "execution slots". This PEP only adds one, Py_mod_exec, but others may be added in the future.

Execution slots may be specified multiple times, and are processed in the order they appear in the slots array. When using the default import machinery, they are processed after the module is added to sys.modules and the import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) are set.

Pre-Execution steps
...................

Before processing the execution slots, per-module state is allocated for the module. From this point on, per-module state is accessible through PyModule_GetState.

The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to setting the module's initial attributes. The "module" argument receives the module object to initialize.

If the Py_mod_exec function replaces the module's entry in sys.modules, the new object will be used and returned by the importlib machinery. (This mirrors the behavior of Python modules.
Note that implementing Py_mod_create is usually a better solution for the use cases this serves.)

The function must return ``0`` on success, or, on error, set an exception and return ``-1``.

Legacy Init
-----------

The backwards-compatible single-phase initialization continues to be supported. In this scheme, the PyInit function returns a fully initialized module rather than a PyModuleDef object. In this case, the PyInit hook implements the creation phase, and the execution phase is a no-op.

Modules that need to work unchanged on older versions of Python should not use multi-phase initialization, because the benefits it brings can't be back-ported. Nevertheless, here is an example of a module that supports multi-phase initialization, and falls back to single-phase when compiled for an older version of CPython::

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        /* On failure an exception is already set; report it with -1. */
        if (PyModule_AddStringConstant(module, "food", "spam") < 0) {
            return -1;
        }
        return 0;
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,                                       /* m_reload */
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        return PyModuleDef_Init(&spam_def);
    #else
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }

Built-In modules
----------------

Any extension module can be used as a built-in module by linking it into the executable, and including it in the inittab (either at runtime with PyImport_AppendInittab, or at configuration time, using tools like *freeze*).
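Built-in modules linked in this way already take part in the ModuleSpec-based import machinery, which is what this PEP builds on. The snippet below is a quick, CPython-specific way to observe this from Python. It assumes a standard CPython build, where `sys` is compiled into the interpreter.

```python
import importlib.machinery
import importlib.util
import sys

# Names of all modules compiled into this interpreter binary.
builtin_names = sys.builtin_module_names
print("sys" in builtin_names)  # True on CPython

# Built-in modules still get a ModuleSpec; their loader is BuiltinImporter.
spec = importlib.util.find_spec("sys")
print(spec.origin, spec.loader is importlib.machinery.BuiltinImporter)
```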
So that extension modules can still be used as built-in modules, all changes to extension module loading introduced in this PEP will also apply to built-in modules. The only exception is non-ASCII module names, explained below.

Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. The mechanism is designed to make this easy, but care is still required on the part of the extension author. No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept either in the module dict, or in the module object's storage reachable by PyModule_GetState. A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.

Functions incompatible with multi-phase initialization
------------------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure with a non-NULL *m_slots* pointer. The function doesn't have access to the ModuleSpec object necessary for multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*. PyState registration is disabled because multiple module objects may be created from the same PyModuleDef.

Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs access to module-level state (including functions, classes or exceptions defined at the module level) must receive a reference to the module object (or the particular object it needs), either directly or indirectly.
Passing such a reference is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom data set at callback registration

Fixing these cases is outside the scope of this PEP, but will be needed for the new mechanism to be useful to all modules. Proper fixes have been discussed on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment, not good candidates for porting to the new mechanism.

New Functions
-------------

A new function and macro implementing the module creation phase will be added. These are similar to PyModule_Create and PyModule_Create2, except they take an additional ModuleSpec argument, and handle module definitions with non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function implementing the module execution phase will be added. This allocates per-module state (if not allocated already), and *always* processes execution slots. The import machinery calls this function when a module is executed, unless the module is being reloaded::

    int PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object. This idempotent function fills in the type, refcount, and module index. It returns its argument cast to PyObject*, so it can be returned directly from a PyInit function::
It returns its argument cast to PyObject*, so it can be returned directly from a PyInit function:: PyObject * PyModuleDef_Init(PyModuleDef *); Additionally, two helpers will be added for setting the docstring and methods on a module:: int PyModule_SetDocString(PyObject *, const char *) int PyModule_AddFunctions(PyObject *, PyMethodDef *) Export Hook Name ---------------- As portable C identifiers are limited to ASCII, module names must be encoded to form the PyInit hook name. For ASCII module names, the import hook is named PyInit_, where is the name of the module. For module names containing non-ASCII characters, the import hook is named PyInitU_, where the name is encoded using CPython's "punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix), with hyphens ("-") replaced by underscores ("_"). In Python:: def export_hook_name(name): try: suffix = b'_' + name.encode('ascii') except UnicodeEncodeError: suffix = b'U_' + name.encode('punycode').replace(b'-', b'_') return b'PyInit' + suffix Examples: ============= =================== Module name Init hook name ============= =================== spam PyInit_spam lan?m?t PyInitU_lanmt_2sa6t ??? PyInitU_zck5b2b ============= =================== For modules with non-ASCII names, single-phase initialization is not supported. In the initial implementation of this PEP, built-in modules with non-ASCII names will not be supported. Module Reloading ---------------- Reloading an extension module using importlib.reload() will continue to have no effect, except re-setting import-related attributes. Due to limitations in shared library loading (both dlopen on POSIX and LoadModuleEx on Windows), it is not generally possible to load a modified library after it has changed on disk. Use cases for reloading other than trying out a new version of the module are too rare to require all module authors to keep reloading in mind. If reload-like functionality is needed, authors can export a dedicated function for it. 
Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can
export additional PyInit* symbols besides the one that corresponds
to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules,
but not to *find* them.

Given the filesystem location of a shared library and a module name,
a module may be loaded with::

    import importlib.machinery
    import importlib.util

    def load_extension_module(name, path):
        # The PyInit hook is looked up from `name`, so it does not have
        # to match the library's filename.
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module

On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmultiphase`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use single-phase
initialization indefinitely (or until it is no longer supported).

The ``array`` and ``xx*`` modules will be converted to use multi-phase
initialization as part of the initial implementation.


Summary of API Changes and Additions
------------------------------------

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions
* PyModuleDef_Init

New macros:

* Py_mod_create
* Py_mod_exec

New types:

* PyModuleDef_Type will be exposed

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.
Some extension modules export many constants; for example _ssl has
a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.

String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)

Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.


Implementation
==============

A work-in-progress implementation is available in a GitHub repository
[#gh-repo]_; a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would then be called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization is broken into distinct steps.
It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At that time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python modules
are initialized, but it was later recognized that this isn't an important goal.
The current PEP describes a simpler solution.

A further iteration used a "PyModuleExport" hook as an alternative to PyInit,
where PyInit was used for the existing scheme, and PyModuleExport for
multi-phase initialization. However, not being able to determine the hook name
from the module name complicated automatic generation of PyImport_Inittab
by tools like freeze. Keeping only the PyInit hook name, even if it's not
entirely appropriate for exporting a definition, yielded a much simpler solution.


References
==========

.. [#lazy_import_concerns]
   https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492]
   http://tools.ietf.org/html/rfc3492

.. [#gh-repo]
   https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch]
   https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion]
   https://mail.python.org/pipermail/import-sig/2015-April/000959.html


Copyright
=========

This document has been placed in the public domain.

From solipsis at pitrou.net Mon May 18 16:51:03 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 May 2015 16:51:03 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 References: <5559F0FD.3080704@gmail.com> Message-ID: <20150518165103.34c9ed20@fsol> Hi, On Mon, 18 May 2015 16:02:37 +0200 Petr Viktorin wrote: > > I've sent the latest update of PEP 489 to the editors. I am quite happy > with how it turned out, and I don't expect too many further changes. I'm surprised the PEP still mentions PyModule_GetState.
Shouldn't it be discouraged in favour of custom module object fields? Regards Antoine. From encukou at gmail.com Mon May 18 17:07:20 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 18 May 2015 17:07:20 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <20150518165103.34c9ed20@fsol> References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> Message-ID: On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou wrote: > > Hi, > > On Mon, 18 May 2015 16:02:37 +0200 > Petr Viktorin wrote: >> >> I've sent the latest update of PEP 489 to the editors. I am quite happy >> with how it turned out, and I don't expect too many further changes. > > I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be > discouraged in favour of custom module object fields? No, it's the other way around -- we want to discourage using custom module subclasses; most modules should just customize the exec phase. From solipsis at pitrou.net Mon May 18 17:15:07 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 May 2015 17:15:07 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> Message-ID: <20150518171507.6f711718@fsol> On Mon, 18 May 2015 17:07:20 +0200 Petr Viktorin wrote: > On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou wrote: > > > > Hi, > > > > On Mon, 18 May 2015 16:02:37 +0200 > > Petr Viktorin wrote: > >> > >> I've sent the latest update of PEP 489 to the editors. I am quite happy > >> with how it turned out, and I don't expect too many further changes. > > > > I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be > > discouraged in favour of custom module object fields? > > No, it's the other way around -- we want to discourage using custom > module subclasses; most modules should just customize the exec phase. Can you explain why? 
The module state mechanism has turned out to be cumbersome and inefficient, and is the main reason why PEP 3121 conversions of many stdlib modules have been deferred or abandoned. A fast, easy way to access module "state" without defining global variables at the C level is required. Regards Antoine. From encukou at gmail.com Mon May 18 17:32:13 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 18 May 2015 17:32:13 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <20150518171507.6f711718@fsol> References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> Message-ID: On Mon, May 18, 2015 at 5:15 PM, Antoine Pitrou wrote: > On Mon, 18 May 2015 17:07:20 +0200 > Petr Viktorin wrote: >> On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou wrote: >> > >> > Hi, >> > >> > On Mon, 18 May 2015 16:02:37 +0200 >> > Petr Viktorin wrote: >> >> >> >> I've sent the latest update of PEP 489 to the editors. I am quite happy >> >> with how it turned out, and I don't expect too many further changes. >> > >> > I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be >> > discouraged in favour of custom module object fields? >> >> No, it's the other way around -- we want to discourage using custom >> module subclasses; most modules should just customize the exec phase. > > Can you explain why? The module state mechanism has turned out to be > cumbersome and inefficient, and is the main reason why PEP 3121 > conversions of many stdlib modules have been deferred or abandoned. One reason against custom module subclasses is that it won't be easy to support "python -m" for them (see https://mail.python.org/pipermail/import-sig/2015-March/000923.html) Nick, can you give some others? Preferring real module objects is something I remember from our early discussions. > A fast, easy way to access module "state" without defining global > variables at the C level is required. 
You can have a custom subclass, or you can use per-module state, or put a capsule in the module dict. This PEP doesn't add a fourth better way, but I don't think that's really in its scope ("The goal is [...] bringing extension modules closer to the way Python modules behave"). What it does do, with slots, is provide a mechanism to add such a better way in the future, relatively painlessly. From ncoghlan at gmail.com Mon May 18 17:55:19 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 19 May 2015 01:55:19 +1000 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> Message-ID: On 19 May 2015 01:32, "Petr Viktorin" wrote: > > On Mon, May 18, 2015 at 5:15 PM, Antoine Pitrou wrote: > > On Mon, 18 May 2015 17:07:20 +0200 > > Petr Viktorin wrote: > >> On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou wrote: > >> > > >> > Hi, > >> > > >> > On Mon, 18 May 2015 16:02:37 +0200 > >> > Petr Viktorin wrote: > >> >> > >> >> I've sent the latest update of PEP 489 to the editors. I am quite happy > >> >> with how it turned out, and I don't expect too many further changes. > >> > > >> > I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be > >> > discouraged in favour of custom module object fields? > >> > >> No, it's the other way around -- we want to discourage using custom > >> module subclasses; most modules should just customize the exec phase. > > > > Can you explain why? The module state mechanism has turned out to be > > cumbersome and inefficient, and is the main reason why PEP 3121 > > conversions of many stdlib modules have been deferred or abandoned. > > One reason against custom module subclasses is that it won't be easy > to support "python -m" for them (see > https://mail.python.org/pipermail/import-sig/2015-March/000923.html) > Nick, can you give some others? 
Preferring real module objects is > something I remember from our early discussions. I thought you talked me out of that somewhere along the line? My recollection at this point is that I was originally wanting the use of the Create slot to be compatible with runpy, but didn't actually have a compelling reason for why we should accept that as a design constraint. > > A fast, easy way to access module "state" without defining global > > variables at the C level is required. > > You can have a custom subclass, or you can use per-module state, or > put a capsule in the module dict. > This PEP doesn't add a fourth better way, but I don't think that's > really in its scope ("The goal is [...] bringing extension modules > closer to the way Python modules behave"). What it does do, with > slots, is provide a mechanism to add such a better way in the future, > relatively painlessly. Right, I think there's still a problem worth solving there, but I don't think this specific PEP needs to solve it directly. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon May 18 17:58:03 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 May 2015 17:58:03 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> Message-ID: <20150518175803.03a1e0cf@fsol> On Mon, 18 May 2015 17:32:13 +0200 Petr Viktorin wrote: > > > A fast, easy way to access module "state" without defining global > > variables at the C level is required. > > You can have a custom subclass, or you can use per-module state, or > put a capsule in the module dict. The latter two are cumbersome and inefficient. Only custom subclasses can make things easy and fast at the C level. Which is why I'm surprised that you seem to be encouraging, or not discouraging, the "module state" API. 
Regards Antoine. From encukou at gmail.com Mon May 18 18:27:50 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 18 May 2015 18:27:50 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <20150518175803.03a1e0cf@fsol> References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> <20150518175803.03a1e0cf@fsol> Message-ID: On Mon, May 18, 2015 at 5:58 PM, Antoine Pitrou wrote: > On Mon, 18 May 2015 17:32:13 +0200 > Petr Viktorin wrote: >> >> > A fast, easy way to access module "state" without defining global >> > variables at the C level is required. >> >> You can have a custom subclass, or you can use per-module state, or >> put a capsule in the module dict. > > The latter two are cumbersome and inefficient. Only custom subclasses > can make things easy and fast at the C level. With per-module state, you need a one-liner macro, and a pointer dereference at runtime. Is that too cumbersome and inefficient, or am I missing something? The PEP still supports custom subclasses, for cases where you need easy and fast module state. > Which is why I'm surprised that you seem to be encouraging, or not > discouraging, the "module state" API. No, I'm not discouraging it. The PEP makes sure it continues to work. Should there be another PEP to deprecate it? 
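Petr's remark that the PEP still supports custom subclasses maps onto the PEP 451 loader protocol that the PEP hooks into: a loader's create_module plays roughly the role of a Py_mod_create slot, and exec_module that of Py_mod_exec. The Python sketch below is only an analogy, not the C mechanism; CounterModule, CounterLoader and load are hypothetical names. It shows a module subclass whose state lives in ordinary instance attributes, and that two module objects built by the same loader keep separate state.

```python
import importlib.abc
import importlib.util
import types


class CounterModule(types.ModuleType):
    """Hypothetical module subclass keeping its state as instance attributes."""

    def __init__(self, name):
        super().__init__(name)
        self.call_count = 0  # belongs to this module object only


class CounterLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Roughly the role of a Py_mod_create slot: build the module object.
        return CounterModule(spec.name)

    def exec_module(self, module):
        # Roughly the role of a Py_mod_exec slot: populate the module.
        def tick():
            module.call_count += 1  # reaches the state through its own module
            return module.call_count
        module.tick = tick


def load(name):
    spec = importlib.util.spec_from_loader(name, CounterLoader())
    module = importlib.util.module_from_spec(spec)  # calls create_module
    spec.loader.exec_module(module)
    return module


first = load("counter_demo")
second = load("counter_demo")
first.tick()
first.tick()
second.tick()
print(first.call_count, second.call_count)  # 2 1
```

Because tick closes over the module it was created for, it never needs to look its module up, which is exactly the part the C-level discussion below is concerned with.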
From solipsis at pitrou.net Mon May 18 18:57:13 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 May 2015 18:57:13 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> <20150518175803.03a1e0cf@fsol> Message-ID: <20150518185713.0c07a4c8@fsol> On Mon, 18 May 2015 18:27:50 +0200 Petr Viktorin wrote: > On Mon, May 18, 2015 at 5:58 PM, Antoine Pitrou wrote: > > On Mon, 18 May 2015 17:32:13 +0200 > > Petr Viktorin wrote: > >> > >> > A fast, easy way to access module "state" without defining global > >> > variables at the C level is required. > >> > >> You can have a custom subclass, or you can use per-module state, or > >> put a capsule in the module dict. > > > > The latter two are cumbersome and inefficient. Only custom subclasses > > can make things easy and fast at the C level. > > With per-module state, you need a one-liner macro, and a pointer > dereference at runtime. Is that too cumbersome and inefficient, or am > I missing something? The main problem is the PyState_FindModule() function. It's not terribly efficient, and most of all you have to check its return value for NULL. Regards Antoine. 
From encukou at gmail.com Mon May 18 19:06:53 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 18 May 2015 19:06:53 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <20150518185713.0c07a4c8@fsol> References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> <20150518175803.03a1e0cf@fsol> <20150518185713.0c07a4c8@fsol> Message-ID: On Mon, May 18, 2015 at 6:57 PM, Antoine Pitrou wrote: > On Mon, 18 May 2015 18:27:50 +0200 > Petr Viktorin wrote: >> On Mon, May 18, 2015 at 5:58 PM, Antoine Pitrou wrote: >> > On Mon, 18 May 2015 17:32:13 +0200 >> > Petr Viktorin wrote: >> >> >> >> > A fast, easy way to access module "state" without defining global >> >> > variables at the C level is required. >> >> >> >> You can have a custom subclass, or you can use per-module state, or >> >> put a capsule in the module dict. >> > >> > The latter two are cumbersome and inefficient. Only custom subclasses >> > can make things easy and fast at the C level. >> >> With per-module state, you need a one-liner macro, and a pointer >> dereference at runtime. Is that too cumbersome and inefficient, or am >> I missing something? > > The main problem is the PyState_FindModule() function. It's not > terribly efficient, and most of all you have to check its return value > for NULL. Ah, but that one is orthogonal to per-module state. PyState_FindModule is concerned with finding "the" module corresponding to a given PyModuleDef in a given interpreter. The problem it attempts to solve is that the module can't easily be passed around to all the places that need it. You'd actually have the exact same problem with a custom subclass -- it's finding the module instance that's the problem, not getting data from it. The PEP actually discourages PyState_FindModule quite strongly: this family of functions just doesn't work with modules initialized with multi-phase init.
The PEP tells you that if you need PyState_FindModule, we're sorry, and you should stick to the old way of doing things until we solve the problem (and then it links to preliminary discussion about the solution, which is out of its scope). https://www.python.org/dev/peps/pep-0489/#functions-incompatible-with-multi-phase-initialization https://www.python.org/dev/peps/pep-0489/#module-state-and-c-level-callbacks From solipsis at pitrou.net Mon May 18 19:17:21 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 May 2015 19:17:21 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> <20150518175803.03a1e0cf@fsol> <20150518185713.0c07a4c8@fsol> Message-ID: <20150518191721.7bbde78f@fsol> On Mon, 18 May 2015 19:06:53 +0200 Petr Viktorin wrote: > > > > The main problem is the PyState_FindModule() function. It's not > > terribly efficient, and most of all you have to check its return value > > for NULL. > > Ah, but that one is orthogonal to per-module state. The > PyState_FindModule is concerned with finding "the" module > corresponding to a given PyModuleDef in a given interpreter. > The problem it attempts to solve is that the module can't easily be > passed around to all the places that need it. You'd actually have the > exact same problem with a custom subclass -- it's finding the module > instance that's the problem, not getting data from it. That's a fair point. But it means the PEP won't help those stdlib modules which haven't been converted to PEP 3121, then. Regards Antoine. 
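Petr's point that the difficulty lies in locating the module instance, rather than in reading data from it, can be modeled in a few lines of Python. The sketch assumes one definition object can give rise to two module objects, and uses a dictionary keyed by definition as a rough stand-in for the per-definition lookup PyState_FindModule performs. All names (SPAM_DEF, find_module and so on) are hypothetical, and this is not how CPython implements the C function.

```python
import types

# Hypothetical stand-in for one static PyModuleDef.
SPAM_DEF = object()

# Rough stand-in for the lookup behind PyState_FindModule: one entry per definition.
_modules_by_def = {}


def create_from_def(definition, name):
    module = types.ModuleType(name)
    module.label = name
    _modules_by_def[definition] = module  # a later module overwrites the earlier entry
    return module


def find_module(definition):
    """Analogue of PyState_FindModule; may return None, as the C function may return NULL."""
    return _modules_by_def.get(definition)


def label_via_lookup():
    # A callback that was not handed its module and has to look it up.
    module = find_module(SPAM_DEF)
    if module is None:
        raise RuntimeError("module not found")
    return module.label


def make_bound_callback(module):
    # A callback that received its module explicitly, e.g. as registration data.
    return lambda: module.label


first = create_from_def(SPAM_DEF, "spam_copy_1")
first_callback = make_bound_callback(first)
second = create_from_def(SPAM_DEF, "spam_copy_2")

print(first_callback())    # spam_copy_1
print(label_via_lookup())  # spam_copy_2, although the caller may have meant spam_copy_1
```

The bound callback is unaffected by the second module; the lookup-based one silently switches to it. Passing the module explicitly is what the "callbacks can receive custom data" condition in the PEP amounts to.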
From encukou at gmail.com Mon May 18 19:35:57 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 18 May 2015 19:35:57 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <20150518191721.7bbde78f@fsol> References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> <20150518175803.03a1e0cf@fsol> <20150518185713.0c07a4c8@fsol> <20150518191721.7bbde78f@fsol> Message-ID: On Mon, May 18, 2015 at 7:17 PM, Antoine Pitrou wrote: > On Mon, 18 May 2015 19:06:53 +0200 > Petr Viktorin wrote: >> > >> > The main problem is the PyState_FindModule() function. It's not >> > terribly efficient, and most of all you have to check its return value >> > for NULL. >> >> Ah, but that one is orthogonal to per-module state. The >> PyState_FindModule is concerned with finding "the" module >> corresponding to a given PyModuleDef in a given interpreter. >> The problem it attempts to solve is that the module can't easily be >> passed around to all the places that need it. You'd actually have the >> exact same problem with a custom subclass -- it's finding the module >> instance that's the problem, not getting data from it. > > That's a fair point. But it means the PEP won't help those stdlib > modules which haven't been converted to PEP 3121, then. Correct. This is not the PEP you're looking for. Originally we did want to solve this problem, and I guess wording that suggests it's solved might still be around. Is that the case? Should I clarify that the problem is not yet solved? As the author, it's easy for me to lose track of the big picture. 
From solipsis at pitrou.net Mon May 18 19:42:02 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 18 May 2015 19:42:02 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> <20150518175803.03a1e0cf@fsol> <20150518185713.0c07a4c8@fsol> <20150518191721.7bbde78f@fsol> Message-ID: <20150518194202.394abe0a@fsol> On Mon, 18 May 2015 19:35:57 +0200 Petr Viktorin wrote: > > Correct. This is not the PEP you're looking for. > > Originally we did want to solve this problem, and I guess wording that > suggests it's solved might still be around. Is that the case? Should I > clarify that the problem is not yet solved? The following wording in the PEP: """This PEP proposes a redesign of the way in which built-in and extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121 , but did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451 .""" suggests that it will indeed help overcome the issues with PEP 3121. It turns out it doesn't, except in one specific case (i.e. Cython). Regards Antoine. From encukou at gmail.com Mon May 18 19:49:43 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 18 May 2015 19:49:43 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <20150518194202.394abe0a@fsol> References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol> <20150518171507.6f711718@fsol> <20150518175803.03a1e0cf@fsol> <20150518185713.0c07a4c8@fsol> <20150518191721.7bbde78f@fsol> <20150518194202.394abe0a@fsol> Message-ID: On Mon, May 18, 2015 at 7:42 PM, Antoine Pitrou wrote: > On Mon, 18 May 2015 19:35:57 +0200 > Petr Viktorin wrote: >> >> Correct. 
This is not the PEP you're looking for. >> >> Originally we did want to solve this problem, and I guess wording that >> suggests it's solved might still be around. Is that the case? Should I >> clarify that the problem is not yet solved? > > The following wording in the PEP: > > """This PEP proposes a redesign of the way in which built-in and > extension modules interact with the import machinery. This was last > revised for Python 3.0 in PEP 3121 , but did not solve all problems at > the time. The goal is to solve them by bringing extension modules > closer to the way Python modules behave; specifically to hook into the > ModuleSpec-based loading mechanism introduced in PEP 451 .""" > > suggests that it will indeed help overcome the issues with PEP 3121. It > turns out it doesn't, except in one specific case (i.e. Cython). Ah, the abstract. My eyes must have glazed over, and I didn't expand "PEP 3121" when re-reading it. I'll reword this. Thanks for noticing, and sorry for the confusion! From ericsnowcurrently at gmail.com Tue May 19 02:07:57 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 18 May 2015 18:07:57 -0600 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <5559F0FD.3080704@gmail.com> References: <5559F0FD.3080704@gmail.com> Message-ID: Thanks for working on this, Petr (et al.). Sorry I've missed the previous discussion. Comments are in-line. -eric On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin wrote: > [snip] > > Furthermore, the majority of currently existing extension modules has > problems with sub-interpreter support and/or interpreter reloading, and, > while > it is possible with the current infrastructure to support these > features, it is neither easy nor efficient. > Addressing these issues was the goal of PEP 3121, but many extensions, > including some in the standard library, took the least-effort approach > to porting to Python 3, leaving these issues unresolved. 
> This PEP keeps backwards compatibility, which should reduce pressure and > give > extension authors adequate time to consider these issues when porting. So just to be sure I understand, now PyModuleDef.m_slots will unambiguously indicate whether or not an extension module is compliant, right? > [snip] > > The proposal > ============ This section should include an indication of how the loader (and perhaps finder) will change for builtin, frozen, and extension modules. It may help to describe the proposal up front by how the loader implementation would look if it were somehow implemented in Python code. The subsequent sections sometimes indicate where different things take place, but an explicit outline (as Python code) would make the entire flow really obvious. Putting that toward the beginning of this section would help clearly set the stage for the rest of the proposal. > [snip] > Unknown slot IDs will cause the import to fail with SystemError. Was there any consideration made for just ignoring unknown slot IDs? My gut reaction is that you have it the right way, but I can still imagine use cases for custom slots that PyModuleDef_Init wouldn't know about. > > When using multi-phase initialization, the *m_name* field of PyModuleDef > will > not be used during importing; the module name will be taken from the > ModuleSpec. So m_name will be strictly ignored by PyModuleDef_Init? > > To prevent crashes when the module is loaded in older versions of Python, > the PyModuleDef object must be initialized using the newly added > PyModuleDef_Init function. > For example, an extension module "example" would be exported as:: > > static PyModuleDef example_def = {...} > > PyMODINIT_FUNC > PyInit_example(void) > { > return PyModuleDef_Init(&example_def); > } This example is helpful. :)
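Eric's request for an outline of the loader "as Python code" could look roughly like the following. This is a model written for illustration, not CPython's implementation: ModuleDef, the slot ID constants and load_from_init_result are hypothetical. It encodes the rules quoted in this thread: the PyInit hook returns either a finished module (single-phase) or a definition (multi-phase), unknown slot IDs fail with SystemError, and the module name comes from the spec rather than from m_name.

```python
import types
from dataclasses import dataclass, field
from importlib.machinery import ModuleSpec
from typing import Callable, List, Optional, Tuple

# Hypothetical slot IDs standing in for the C macros Py_mod_create / Py_mod_exec.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2


@dataclass
class ModuleDef:
    """Hypothetical Python stand-in for a PyModuleDef with m_slots."""
    name: str                    # like m_name: not used for the module's name here
    doc: Optional[str] = None    # like m_doc
    slots: List[Tuple[int, Callable]] = field(default_factory=list)


def load_from_init_result(init_result, spec):
    """Model of how a loader could handle whatever PyInit_<name> returned."""
    if isinstance(init_result, types.ModuleType):
        return init_result  # single-phase init: PyInit already built the module

    definition = init_result  # multi-phase init: PyInit returned a definition
    for slot_id, _ in definition.slots:
        if slot_id not in (PY_MOD_CREATE, PY_MOD_EXEC):
            raise SystemError("unknown slot ID %r in %s" % (slot_id, spec.name))

    # Creation phase: a create slot may build the object; otherwise a plain module.
    creators = [fn for slot_id, fn in definition.slots if slot_id == PY_MOD_CREATE]
    if len(creators) > 1:
        raise SystemError("more than one create slot")  # this model's choice
    if creators:
        module = creators[0](spec, definition)
    else:
        module = types.ModuleType(spec.name)  # the name comes from the spec
    if definition.doc is not None:
        module.__doc__ = definition.doc

    # Execution phase: every exec slot runs in order; 0 means success, as in C.
    for slot_id, fn in definition.slots:
        if slot_id == PY_MOD_EXEC and fn(module) != 0:
            raise SystemError("execution slot failed for %s" % spec.name)
    return module


def spam_exec(module):
    module.food = "spam"
    return 0


spam_def = ModuleDef(name="ignored_name", doc="Utilities for cooking spam",
                     slots=[(PY_MOD_EXEC, spam_exec)])
spam = load_from_init_result(spam_def, ModuleSpec("spam", None))
print(spam.__name__, spam.food)  # spam spam
```

The model leaves out per-module state allocation and the attribute setup that importlib performs around these two phases.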
How easily will this be a source of mysterious errors-at-a-distance? > [snip] > However, only ModuleType instances support module-specific functionality > such as per-module state. This is a pretty important point. Presumably this constraints later behavior and precedes all functionality related to per-module state. > [snip] > Extension authors are advised to keep Py_mod_create minimal, an in > particular > to not call user code from it. This is a pretty important point as well. We'll need to make sure this is sufficiently clear in the documentation. Would it make sense to provide helpers for common cases, to encourage extension authors to keep the create function minimal? > [snip] > > If PyModuleExec replaces the module's entry in sys.modules, > the new object will be used and returned by importlib machinery. Just to be sure, something like "mod = sys.modules[modname]" is done before each execution slot. In other words, the result of the previous execution slot should be used for the next one. > (This mirrors the behavior of Python modules. Note that implementing > Py_mod_create is usually a better solution for the use cases this serves.) Could you elaborate? What are those use cases and why would Py_mod_create be better? > [snip] > > Modules that need to work unchanged on older versions of Python should not > use multi-phase initialization, because the benefits it brings can't be > back-ported. Given your example below, "should not" seems a bit strong to me. In fact, what are the objections to encouraging the approach from the example? 
> Nevertheless, here is an example of a module that supports multi-phase > initialization, and falls back to single-phase when compiled for an older > version of CPython:: > > #include > > static int spam_exec(PyObject *module) { > PyModule_AddStringConstant(module, "food", "spam"); > return 0; > } > > #ifdef Py_mod_exec > static PyModuleDef_Slot spam_slots[] = { > {Py_mod_exec, spam_exec}, > {0, NULL} > }; > #endif > > static PyModuleDef spam_def = { > PyModuleDef_HEAD_INIT, /* m_base */ > "spam", /* m_name */ > PyDoc_STR("Utilities for cooking spam"), /* m_doc */ > 0, /* m_size */ > NULL, /* m_methods */ > #ifdef Py_mod_exec > spam_slots, /* m_slots */ > #else > NULL, > #endif > NULL, /* m_traverse */ > NULL, /* m_clear */ > NULL, /* m_free */ > }; > > PyMODINIT_FUNC > PyInit_spam(void) { > #ifdef Py_mod_exec > return PyModuleDef_Init(&spam_def); > #else > PyObject *module; > module = PyModule_Create(&spam_def); > if (module == NULL) return NULL; > if (spam_exec(module) != 0) { > Py_DECREF(module); > return NULL; > } > return module; > #endif > } > This example is really helpful! > [snip] > > Subinterpreters and Interpreter Reloading > ----------------------------------------- > > Extensions using the new initialization scheme are expected to support > subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. Presumably this support is explicitly and completely defined in the subsequent sentences. Is it really just keeping "hidden" module state encapsulated on the module object? If not then it may make sense to enumerate the requirements better for the sake of extension module authors. > The mechanism is designed to make this easy, but care is still required > on the part of the extension author. > No user-defined functions, methods, or instances may leak to different > interpreters. > To achieve this, all module-level state should be kept in either the module > dict, or in the module object's storage reachable by PyModule_GetState. 
Is this programmatically enforceable? Is there any mechanism for easily copying module state? How about sharing some state between subinterpreters? How much room is there for letting extension module authors define how their module behaves across multiple interpreters or across multiple Initialize/Finalize cycles? > A simple rule of thumb is: Do not define any static data, except > built-in types > with no mutable or user-settable class attributes. This is another one of those points that needs to be crystal clear in the docs. > As a rule of thumb, modules that rely on PyState_FindModule are, at the > moment, > not good candidates for porting to the new mechanism. Are there any plans for a follow-up effort to help with this case? > [snip] > > Module Reloading > ---------------- > > Reloading an extension module using importlib.reload() will continue to > have no effect, except re-setting import-related attributes. > > Due to limitations in shared library loading (both dlopen on POSIX and > LoadLibraryEx on Windows), it is not generally possible to load > a modified library after it has changed on disk. > > Use cases for reloading other than trying out a new version of the module > are too rare to require all module authors to keep reloading in mind. > If reload-like functionality is needed, authors can export a dedicated > function for it. Keep in mind the semantics of reload for pure Python modules. The module is executed into the existing namespace, overwriting the loaded namespace but leaving non-colliding attributes alone. While the semantics for reloading an extension/builtin/frozen module are currently basic (i.e. a no-op), there may well be room to support reload behavior that mirrors that of pure Python modules without needing to reload an SO file. I would expect either the behavior of exec to get repeated (tricky due to "hidden" module state?) or for there to be a "reload" slot that would mirror Py_mod_exec.
At the same time, one may argue that reloading modules is not
something to encourage. :)

>
>
> Multiple modules in one library
> -------------------------------
>
> To support multiple Python modules in one shared library, the library can
> export additional PyInit* symbols besides the one that corresponds
> to the library's filename.
>
> Note that this mechanism can currently only be used to *load* extra modules,
> but not to *find* them.

What do you mean by "currently"?

It may also be worth tying the above statement with the following
text, since the following appears to be an explanation of how to
address the "finder" caveat.

>
> Given the filesystem location of a shared library and a module name,
> a module may be loaded with::
>
>     import importlib.machinery
>     import importlib.util
>     loader = importlib.machinery.ExtensionFileLoader(name, path)
>     spec = importlib.util.spec_from_loader(name, loader)
>     module = importlib.util.module_from_spec(spec)
>     loader.exec_module(module)
>     return module
>
> On platforms that support symbolic links, these may be used to install one
> library under multiple names, exposing all exported modules to normal
> import machinery.
>
>
> Testing and initial implementations
> -----------------------------------
>
> For testing, a new built-in module ``_testmultiphase`` will be created.
> The library will export several additional modules using the mechanism
> described in "Multiple modules in one library".
>
> The ``_testcapi`` module will be unchanged, and will use single-phase
> initialization indefinitely (or until it is no longer supported).
>
> The ``array`` and ``xx*`` modules will be converted to use multi-phase
> initialization as part of the initial implementation.

What do you mean by "initial implementation"? Will it be done
differently in a later implementation?
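The quoted loading snippet ends with a bare `return`, so it only works inside a function body. Below is a minimal sketch of such a function; the name `load_from_path` is hypothetical. It uses `importlib.util.spec_from_file_location()`, which picks the loader class from the file suffix: an `ExtensionFileLoader(name, path)` for a shared library, a `SourceFileLoader` for a `.py` file. That lets the usage example exercise the same create-then-exec sequence with a plain Python file. As the PEP text says, this *loads* the module but does not make it *findable*: nothing is added to `sys.modules`.

```python
import importlib.util
import pathlib
import sys
import tempfile


def load_from_path(name, path):
    """Load module `name` from the file at `path` without registering it
    in sys.modules (it is loaded, but the normal finders cannot find it)."""
    # The loader class is chosen from the file suffix: ExtensionFileLoader
    # for extension suffixes (.so, .pyd, ...), SourceFileLoader for .py.
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None:
        raise ImportError(f"no loader for {path!r}", name=name)
    module = importlib.util.module_from_spec(spec)  # creation step
    spec.loader.exec_module(module)                  # execution step
    return module


# Usage with a plain Python file standing in for a shared library.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "some_library.py"
    path.write_text("ANSWER = 42\n")
    demo = load_from_path("other_name", str(path))

print(demo.__name__, demo.ANSWER, "other_name" in sys.modules)
```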
> > > Summary of API Changes and Additions > ------------------------------------ > > New functions: > > * PyModule_FromDefAndSpec (macro) > * PyModule_FromDefAndSpec2 > * PyModule_ExecDef > * PyModule_SetDocString > * PyModule_AddFunctions > * PyModuleDef_Init > > New macros: > > * Py_mod_create > * Py_mod_exec > > New types: > > * PyModuleDef_Type will be exposed > > New structures: > > * PyModuleDef_Slot > > PyModuleDef.m_reload changes to PyModuleDef.m_slots. This section is missing any explanation of the impact on Python/import.c, on the _imp/imp module, and on the 3 finders/loaders in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension). > > > Possible Future Extensions > ========================== > > The slots mechanism, inspired by PyType_Slot from PEP 384, > allows later extensions. > > Some extension modules exports many constants; for example _ssl has > a long list of calls in the form:: > > PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN", > PY_SSL_ERROR_ZERO_RETURN); > > Converting this to a declarative list, similar to PyMethodDef, > would reduce boilerplate, and provide free error-checking which > is often missing. Great idea, including as it applies to other constants and types. > > String constants and types can be handled similarly. > (Note that non-default bases for types cannot be portably specified > statically; this case would need a Py_mod_exec function that runs > before the slots are added. The free error-checking would still be > beneficial, though.) This implies to me that now is the time to ensure that this PEP appropriately accommodates that need. It would be unfortunate if we had to later hack in some extra API to accommodate a use case we already know about. Better if we made sure the currently proposed changes could accommodate the need, even if the implementation of that part were not part of this PEP. 
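The declarative-constants idea can be modelled in a few lines of Python. This is only a sketch of the concept, not a proposed C API: the helper `add_constants` and the constant names are hypothetical. The point is that one routine validates every entry (bad names, duplicates) instead of each `PyModule_AddIntConstant`-style call site having to remember its own error check.

```python
import types


def add_constants(module, table):
    """Hypothetical helper: apply a declarative table of (name, value)
    pairs to `module`, validating every entry in one place."""
    for name, value in table:
        if not isinstance(name, str) or not name.isidentifier():
            raise ValueError(f"invalid constant name: {name!r}")
        if name in vars(module):
            raise ValueError(f"constant defined twice: {name}")
        setattr(module, name, value)


# Hypothetical table, the Python analogue of a static array in C.
SPAM_CONSTANTS = (
    ("SPAM_LEVEL_LOW", 1),
    ("SPAM_LEVEL_HIGH", 2),
)

spam = types.ModuleType("spam")
add_constants(spam, SPAM_CONSTANTS)

try:
    add_constants(spam, [("SPAM_LEVEL_LOW", 3)])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```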
> > Another possibility is providing a "main" function that would be run > when the module is given to Python's -m switch. > For this to work, the runpy module will need to be modified to take > advantage of ModuleSpec-based loading introduced in PEP 451. I'll point out that the pure-Python equivalent has been proposed on a number of occasions and been rejected every time. However, in the case of extension modules it is more justifiable. If extension modules gain such a mechanism then it may be a justification for doing something similar in Python. > Also, it will be necessary to add a mechanism for setting up a module > according to slots it wasn't originally defined with. What does this mean? > > > Implementation > ============== > > Work-in-progress implementation is available in a Github repository > [#gh-repo]_; > a patchset is at [#gh-patch]_. I'll have to take a look. > [snip] From ncoghlan at gmail.com Tue May 19 05:51:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 19 May 2015 13:51:22 +1000 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> Message-ID: On 19 May 2015 at 10:07, Eric Snow wrote: > On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin wrote: >> [snip] >> >> Furthermore, the majority of currently existing extension modules has >> problems with sub-interpreter support and/or interpreter reloading, and, >> while >> it is possible with the current infrastructure to support these >> features, it is neither easy nor efficient. >> Addressing these issues was the goal of PEP 3121, but many extensions, >> including some in the standard library, took the least-effort approach >> to porting to Python 3, leaving these issues unresolved. >> This PEP keeps backwards compatibility, which should reduce pressure and >> give >> extension authors adequate time to consider these issues when porting. 
> > So just be to sure I understand, now PyModuleDef.m_slots will > unambiguously indicate whether or not an extension module is > compliant, right? I'm not sure what you mean by "compliant". A non-NULL m_slots will indicate usage of multi-phase initialisation, so it at least indicates *intent* to correctly support subinterpreters et al. Actual delivery on that promise is still a different question :) >> [snip] >> >> The proposal >> ============ > > This section should include an indication of how the loader (and > perhaps finder) will change for builtin, frozen, and extension > modules. It may help to describe the proposal up front by how the > loader implementation would look if it were somehow implemented in > Python code. The subsequent sections sometimes indicate where > different things take place, but an explicit outline (as Python code) > would make the entire flow really obvious. Putting that toward the > beginning of this section would help clearly set the stage for the > rest of the proposal. +1 for a pseudo-code overview of the loader implementation. > >> [snip] >> Unknown slot IDs will cause the import to fail with SystemError. > > Was there any consideration made for just ignoring unknown slot IDs? > My gut reaction is that you have it the right way, but I can still > imagine use cases for custom slots that PyModuleDef_Init wouldn't know > about. The "known slots only, all other slot IDs are reserved for future use" slot semantics were copied directly from PyType_FromSpec in PEP 384. Since it's just a numeric slot ID, you'd run a high risk of conflicts if you allowed for custom extensions. If folks want to do more clever things, they'll need to use the create or exec slot to stash them on the module object, rather than storing them in the module definition. >> The PyModuleDef object must be available for the lifetime of the module >> created >> from it; usually, it will be declared statically.
> > How easily will this be a source of mysterious errors-at-a-distance? It shouldn't be any worse than static type definitions, and normal reference counting semantics should keep it alive regardless. >> [snip] >> Extension authors are advised to keep Py_mod_create minimal, an in >> particular >> to not call user code from it. > > This is a pretty important point as well. We'll need to make sure > this is sufficiently clear in the documentation. Would it make sense > to provide helpers for common cases, to encourage extension authors to > keep the create function minimal? The main encouragement is to not handcode your extension modules at all, and let something like Cython or SWIG take care of the boilerplate :) >> [snip] >> >> If PyModuleExec replaces the module's entry in sys.modules, >> the new object will be used and returned by importlib machinery. > > Just to be sure, something like "mod = sys.modules[modname]" is done > before each execution slot. In other words, the result of the > previous execution slot should be used for the next one. That's not the original intent of this paragraph - rather, it is referring to the existing behaviour of the import machinery. However, I agree that now we're allowing the Py_mod_exec slot to be supplied multiple times, we should also be updating the module reference between slot invocations. I also think the PEP could do with a brief mention of the additional modularity this approach brings at the C level - rather than having to jam everything into one function, an extension module can easily break up its initialisation into multiple steps, and it's technically even possible to share common steps between different modules. >> (This mirrors the behavior of Python modules. Note that implementing >> Py_mod_create is usually a better solution for the use cases this serves.) > > Could you elaborate? What are those use cases and why would > Py_mod_create be better?
Rather than replacing the implicitly created normal module during Py_mod_exec (which is the only option available to Python modules), PEP 489 lets you define the Py_mod_create slot to override the module object creation directly. Outside conversion of a Python module that manipulates sys.modules to an extension module with Cython, there's no real reason to use the "replacing yourself in sys.modules" option over using Py_mod_create directly. >> [snip] >> >> Modules that need to work unchanged on older versions of Python should not >> use multi-phase initialization, because the benefits it brings can't be >> back-ported. > > Given your example below, "should not" seems a bit strong to me. In > fact, what are the objections to encouraging the approach from the > example? Agreed, "should not" is probably too strong here. On the other hand, preserving compatibility with older Python versions in a module that has been updated to rely on multi-phase initialization is likely to be a matter of "graceful degradation", rather than being able to reproduce comparable functionality (which I believe may have been the point Petr was trying to convey). I expect Cython and SWIG may be able to manage that through appropriate use of #ifdef's in the generated code, but doing it by hand is likely to be painful, hence the potential benefits of just sticking with single-phase initialisation for the time being. >> [snip] >> >> Subinterpreters and Interpreter Reloading >> ----------------------------------------- >> >> Extensions using the new initialization scheme are expected to support >> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. > > Presumably this support is explicitly and completely defined in the > subsequent sentences. Is it really just keeping "hidden" module state > encapsulated on the module object? If not then it may make sense to > enumerate the requirements better for the sake of extension module > authors. 
I'd actually like to have a better way of doing scenario testing for extension modules (subinterpreters, multiple initialize/finalize cycles, freezing), but I'm not sure this PEP is the best place to define that. Perhaps we could do a PyPI project that was a tox-based test battery for this kind of thing? >> The mechanism is designed to make this easy, but care is still required >> on the part of the extension author. >> No user-defined functions, methods, or instances may leak to different >> interpreters. >> To achieve this, all module-level state should be kept in either the module >> dict, or in the module object's storage reachable by PyModule_GetState. > > Is this programmatically enforceable? Is there any mechanism for > easily copying module state? How about sharing some state between > subinterpreters? How much room is there for letting extension module > authors define how their module behaves across multiple interpreters > or across multiple Initialize/Finalize cycles? It's not programmatically enforceable, hence the idea above of finding a way to make it easier for people to test that their extension modules are importable across multiple Python versions and deployment scenarios. >> As a rule of thumb, modules that rely on PyState_FindModule are, at the >> moment, >> not good candidates for porting to the new mechanism. > > Are there any plans for a follow-up effort to help with this case? The problem here is that the PEP 3121 module state approach provides storage on a *per-interpreter* basis, that is then shared amongst all module instances created from a given module definition. This means that when _PyImport_FindExtensionObject (see https://hg.python.org/cpython/file/fc2eed9fc2d0/Python/import.c#l518) reinitialises an extension module, the state is shared between the two instances.
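The difference between state keyed to the module *definition* and state owned by the module *object* can be illustrated with a small Python model. This is a hypothetical simulation, not CPython code: `SPAM_DEF` stands in for a static `PyModuleDef`, a dictionary keyed by it stands in for lookups that go through the definition (as `PyState_FindModule` does), and a per-object dictionary stands in for storage attached to each module object.

```python
import types

SPAM_DEF = object()  # hypothetical stand-in for one static PyModuleDef

# Model A (hypothetical): one state slot per definition, per interpreter.
_state_by_def = {}


def make_module_def_keyed(defn):
    module = types.ModuleType("spam")
    _state_by_def[defn] = {"cache": []}        # a second instance overwrites the slot
    module.get_state = lambda: _state_by_def[defn]
    return module


# Model B (hypothetical): state owned by each module object.
def make_module_per_object():
    module = types.ModuleType("spam")
    own_state = {"cache": []}
    module.get_state = lambda: own_state
    return module


# Two instances from the same definition, as after deleting the module from
# sys.modules and importing it again while keeping the old reference.
old, new = make_module_def_keyed(SPAM_DEF), make_module_def_keyed(SPAM_DEF)
old.get_state()["cache"].append("written via old")
collides = new.get_state() is old.get_state()   # both see one shared slot

a, b = make_module_per_object(), make_module_per_object()
a.get_state()["cache"].append("written via a")
isolated = b.get_state()["cache"] == []          # b is unaffected
```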
When PEP 3121 was written, this was not seen as a problem, since the expectation was that the behaviour would only be triggered by multiple interpreter level initialize/finalize cycles. One key scenario we missed at the time was "deleting an extension module from sys.modules and importing it a second time, while retaining a local reference for later restoration". Under PEP 3121, the two instances collide on their state storage, as we have two simultaneously existing module objects created in the same interpreter from the same module definition. PEP 489 would inherit that same problem if you tried to use it with the PyState_* APIs, so it simply doesn't allow them at all. (Earlier versions of the PEP allowed it with an "EXPORT_SINGLETON" slot that would disallow reimporting entirely, which we took out in favour of "just keep using the existing initialisation model in those cases for the time being") For pure Python code, we don't have this problem, since the interpreter takes care of providing a properly scoped globals() reference to *all* functions defined in that module, regardless of whether they're module level functions or method definitions on a class. At the C level, we don't have that, as only module level functions get a module reference passed in - methods only get a reference to their class instance, without a reference to the module globals, and delayed callbacks can be a problem as well. The best improved API we could likely offer at this point is a convenience API for looking up a module in *sys.modules* based on a PyModuleDef instance, and updating PEP 489 to write the as-imported module name into the returned PyModuleDef structure. That's probably not a bad way to go, given that PEP 489 currently *ignores* the m_name slot - flipping it around to be a *writable* slot would be a way to let extension modules know dynamically how to look themselves up in sys.modules. 
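For pure Python code the contrast is easy to observe. In the sketch below (the module name `state_demo_mod` is made up, and a writable temporary directory is assumed), a saved reference and a fresh re-import each keep their own globals, while a function that looks itself up via `sys.modules[__name__]` sees whichever module is registered at call time, which is the limitation of a name-based lookup described here.

```python
import importlib
import pathlib
import sys
import tempfile

SOURCE = """
import sys

counter = 0

def bump():
    global counter            # always this module object's own globals
    counter += 1
    return counter

def lookup_self():
    return sys.modules[__name__]   # whatever is registered under my name now
"""

NAME = "state_demo_mod"  # hypothetical module name

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, f"{NAME}.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        saved = importlib.import_module(NAME)
        del sys.modules[NAME]                  # force a second, independent import
        fresh = importlib.import_module(NAME)
        saved.bump()
        saved.bump()
        fresh.bump()
        counts = (saved.counter, fresh.counter)
        seen_while_replaced = saved.lookup_self()
        sys.modules[NAME] = saved              # restore the original module
        seen_after_restore = saved.lookup_self()
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(NAME, None)
```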
The new lookup API would then be the moral equivalent of Python code doing "mod = sys.modules[__name__]". With this approach, actively *using* multiple references to a given module at the same time would still break (since you'll always get the module currently in sys.modules, even if that isn't the one you expected), but the "save-and-restore" model needed for certain kinds of testing and potentially other scenarios would work correctly. >> Module Reloading >> ---------------- >> >> Reloading an extension module using importlib.reload() will continue to >> have no effect, except re-setting import-related attributes. >> >> Due to limitations in shared library loading (both dlopen on POSIX and >> LoadModuleEx on Windows), it is not generally possible to load >> a modified library after it has changed on disk. >> >> Use cases for reloading other than trying out a new version of the module >> are too rare to require all module authors to keep reloading in mind. >> If reload-like functionality is needed, authors can export a dedicated >> function for it. > > Keep in mind the semantics of reload for pure Python modules. The > module is executed into the existing namespace, overwriting the loaded > namespace but leaving non-colliding attributes alone. While the > semantics for reloading an extension/builtin/frozen module are > currently basic (i.e. a no-op), there may well be room to support > reload behavior that mirrors that of pure Python modules without > needing to reload an SO file. I would expect either the behavior of > exec to get repeated (tricky due to "hidden" module state?) or for > there to be a "reload" slot that would mirror Py_mod_exec. We considered this, and decided it was fairly pointless, since you can't modify the extension module code. 
The one case I see where it potentially makes sense is a "transitive reload", where the extension module retrieves and caches attributes from another pure Python module at import time, and that extension module has been reloaded. It may also make a difference in the context of utilities like https://docs.python.org/3/library/test.html#test.support.import_fresh_module, where we manipulate the import system state to control how conditional imports are handled. > At the same time, one may argue that reloading modules is not > something to encourage. :) There's a reason import_fresh_module has never made it out of test.support :) >> Multiple modules in one library >> ------------------------------- >> >> To support multiple Python modules in one shared library, the library can >> export additional PyInit* symbols besides the one that corresponds >> to the library's filename. >> >> Note that this mechanism can currently only be used to *load* extra modules, >> but not to *find* them. > > What do you mean by "currently"? It's a limitation of the way the existing finders work, rather than an inherent limitation of the import system as a whole. > It may also be worth tying the above statement with the following > text, since the following appears to be an explanation of how to > address the "finder" caveat. Agreed that this could be clearer. >> Testing and initial implementations >> ----------------------------------- >> >> For testing, a new built-in module ``_testmultiphase`` will be created. >> The library will export several additional modules using the mechanism >> described in "Multiple modules in one library". >> >> The ``_testcapi`` module will be unchanged, and will use single-phase >> initialization indefinitely (or until it is no longer supported). >> >> The ``array`` and ``xx*`` modules will be converted to use multi-phase >> initialization as part of the initial implementation. > > What do you mean by "initial implementation"? 
Will it be done > differently in a later implementation? These modules will be converted in the reference implementation, other modules won't be. >> String constants and types can be handled similarly. >> (Note that non-default bases for types cannot be portably specified >> statically; this case would need a Py_mod_exec function that runs >> before the slots are added. The free error-checking would still be >> beneficial, though.) > > This implies to me that now is the time to ensure that this PEP > appropriately accommodates that need. It would be unfortunate if we > had to later hack in some extra API to accommodate a use case we > already know about. Better if we made sure the currently proposed > changes could accommodate the need, even if the implementation of that > part were not part of this PEP. This would be a new kind of execution slot, so the PEP already accommodates these possible future extensions. >> Another possibility is providing a "main" function that would be run >> when the module is given to Python's -m switch. >> For this to work, the runpy module will need to be modified to take >> advantage of ModuleSpec-based loading introduced in PEP 451. > > I'll point out that the pure-Python equivalent has been proposed on a > number of occasions and been rejected every time. However, in the > case of extension modules it is more justifiable. If extension > modules gain such a mechanism then it may be a justification for doing > something similar in Python. > >> Also, it will be necessary to add a mechanism for setting up a module >> according to slots it wasn't originally defined with. > > What does this mean? When you use the -m switch, you always run in the builtin __main__ module namespace, and runpy fiddles with __main__.__spec__ to match the details of the module passed to the switch. That's not currently a trick we can manage when the "thing to run" is an extension module. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Tue May 19 13:06:31 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 19 May 2015 13:06:31 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> Message-ID: <555B1937.5020001@gmail.com> On 05/19/2015 05:51 AM, Nick Coghlan wrote: > On 19 May 2015 at 10:07, Eric Snow wrote: >> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin wrote: >>> [snip] >>> >>> Furthermore, the majority of currently existing extension modules has >>> problems with sub-interpreter support and/or interpreter reloading, and, >>> while >>> it is possible with the current infrastructure to support these >>> features, it is neither easy nor efficient. >>> Addressing these issues was the goal of PEP 3121, but many extensions, >>> including some in the standard library, took the least-effort approach >>> to porting to Python 3, leaving these issues unresolved. >>> This PEP keeps backwards compatibility, which should reduce pressure and >>> give >>> extension authors adequate time to consider these issues when porting. >> >> So just be to sure I understand, now PyModuleDef.m_slots will >> unambiguously indicate whether or not an extension module is >> compliant, right? > > I'm not sure what you mean by "compliant". A non-NULL m_slots will > indicate usage of multi-phase initialisation, so it at least indicates > *intent* to correctly support subinterpreters et al. Actual delivery > on that promise is still a different question :) Yes, non-NULL m_slots means the module is compliant. If it's not, it's a bug in the *module* (i.e. compliance is not *just* a matter of setting m_slots). This will be explained in the docs. >>> [snip] >>> >>> The proposal >>> ============ >> >> This section should include an indication of how the loader (and >> perhaps finder) will change for builtin, frozen, and extension >> modules.
It may help to describe the proposal up front by how the >> loader implementation would look if it were somehow implemented in >> Python code. The subsequent sections sometimes indicate where >> different things take place, but an explicit outline (as Python code) >> would make the entire flow really obvious. Putting that toward the >> beginning of this section would help clearly set the stage for the >> rest of the proposal. > > +1 for a pseudo-code overview of the loader implementation. OK. Along with a link to PEP 451 code [*], it should make things clearer. [*] https://www.python.org/dev/peps/pep-0451/#how-loading-will-work >>> [snip] >>> Unknown slot IDs will cause the import to fail with SystemError. >> >> Was there any consideration made for just ignoring unknown slot IDs? >> My gut reaction is that you have it the right way, but I can still >> imagine use cases for custom slots that PyModuleDef_Init wouldn't know >> about. > > The "known slots only, all other slot IDs are reserved for future use" > slot semantics were copied directly from PyType_FromSpec in PEP 384. > Since it's just a numeric slot ID, you'd run a high risk of conflicts > if you allowed for custom extensions. > > If folks want to do more clever things, they'll need to use the create > or exec slot to stash them on the module object, rather than storing > them in the module definition. Right, if you need custom behavior, put it in a function and use the provided hook. (If you need custom "slots" on PyModuleDef for some reason, use a PyModuleDef subclass -- but I can't see where it would be helpful.) Ignoring unknown slot IDs would mean letting errors go unnoticed. (Technicality: PyModuleDef_Init doesn't care about slots; PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will raise the errors.) >> When using multi-phase initialization, the *m_name* field of PyModuleDef >> will >> not be used during importing; the module name will be taken from the >> ModuleSpec.
> > So m_name will be strictly ignored by PyModuleDef_Init? Yes. The name is useful for introspection, but the import machinery will use the name provided by the ModuleSpec. (Technicality: again, PyModuleDef_Init doesn't touch names at all. PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will ignore the name from the def.) >>> The PyModuleDef object must be available for the lifetime of the module >>> created >>> from it; usually, it will be declared statically. >> >> How easily will this be a source of mysterious errors-at-a-distance? > > It shouldn't be any worse than static type definitions, and normal > reference counting semantics should keep it alive regardless. It's the same as the current behavior (PEP 3121), where a PyModuleDef is stored in the module, and if you let it die, PyModule_GetState will give you an invalid pointer. It's just that in PEP 489, the import machinery itself uses the def, so you actually get to feel the pain if you deallocate it. All in all, this should not be a problem in practice; the PEP specifies what'll happen if you go off doing exotic things. (For example, Cython might run into this if it tries implementing a reloading scheme we talked about earlier in the thread, and even then it shouldn't be a major source of mysterious errors.) Normal mortals will be OK. >> [snip] >> However, only ModuleType instances support module-specific functionality >> such as per-module state. > > This is a pretty important point. Presumably this constraints later > behavior and precedes all functionality related to per-module state. Yes. Module objects support more module-like behavior than other objects. What you can and cannot use should be clear from the API. I'll clarify a bit more what functionality depends on using a PyModule_Type (or subclass) instance. One thing I see I forgot to add is that execution slots are looked up via PyModule_GetDef, so they won't be processed on non-module objects.
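Python's own loader protocol from PEP 451 has the same two-step shape, and it offers a convenient way to see what returning a `ModuleType` subclass from the creation step looks like: the result is still a genuine module, so the usual import-related attributes (`__spec__`, `__loader__`, ...) are set on it as for any other module. The sketch below is a Python analogy only, not the C machinery; `SpamModule`, `SpamLoader` and the module name `spam_demo` are hypothetical.

```python
import importlib.abc
import importlib.util
import types


class SpamModule(types.ModuleType):
    """A module subclass: still a genuine module object."""

    def describe(self):
        return f"{self.__name__} serves {self.food}"


class SpamLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Creation step, comparable in spirit to a Py_mod_create slot.
        # Returning None would make importlib create a plain ModuleType.
        return SpamModule(spec.name)

    def exec_module(self, module):
        # Execution step, comparable in spirit to a Py_mod_exec slot.
        module.food = "spam"


spec = importlib.util.spec_from_loader("spam_demo", SpamLoader())
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
print(mod.describe())
```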
It's a very good idea to use a module subclass rather than a completely custom object. The docs will need to strongly recommend this. >>> [snip] >>> Extension authors are advised to keep Py_mod_create minimal, an in >>> particular >>> to not call user code from it. >> >> This is a pretty important point as well. We'll need to make sure >> this is sufficiently clear in the documentation. Would it make sense >> to provide helpers for common cases, to encourage extension authors to >> keep the create function minimal? > > The main encouragement is to not handcode your extension modules at > all, and let something like Cython or SWIG take care of the > boilerplate :) Yes, Cython should be default. For hand-written modules, the common case should be not defining create at all. >>> [snip] >>> >>> If PyModuleExec replaces the module's entry in sys.modules, >>> the new object will be used and returned by importlib machinery. >> >> Just to be sure, something like "mod = sys.modules[modname]" is done >> before each execution slot. In other words, the result of the >> previous execution slot should be used for the next one. > > That's not the original intent of this paragraph - rather, it is > referring to the existing behaviour of the import machinery. > > However, I agree that now we're allowing the Py_mod_exec slot to be > supplied multiple times, we should also be updating the module > reference between slot invocations. No, that won't work. It's possible (via direct calls to the import machinery) to load a module without adding it to sys.modules. The behavior should be clear (when you think about it) after I include the loader implementation pseudocode. 
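The loader pseudocode requested earlier in the thread might look roughly like the following Python model. It is a hypothetical sketch assembled from the behaviour described in this thread, not the PEP's reference implementation: `ModuleDefModel`, `MOD_CREATE`/`MOD_EXEC` and the helper functions are made-up names, exec functions signal failure by raising rather than by returning -1, and rejecting duplicate create slots is this sketch's own choice. It captures the points discussed above: unknown slot IDs raise `SystemError`, the name comes from the spec rather than `m_name`, every exec slot runs in order on the same object without consulting `sys.modules`, and exec slots are skipped for non-module objects.

```python
import types

# Hypothetical slot IDs standing in for Py_mod_create and Py_mod_exec.
MOD_CREATE = 1
MOD_EXEC = 2
KNOWN_SLOTS = {MOD_CREATE, MOD_EXEC}


class ModuleDefModel:
    """Hypothetical stand-in for a PyModuleDef with a non-NULL m_slots."""

    def __init__(self, name, slots):
        self.name = name      # informational only; the spec's name is used
        self.slots = slots    # sequence of (slot_id, function) pairs


def _check_slots(defn):
    for slot_id, _ in defn.slots:
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"module {defn.name}: unknown slot ID {slot_id}")


def create_from_def(defn, spec_name):
    """Creation phase: use the create slot if present, else a plain module."""
    _check_slots(defn)
    creators = [func for slot_id, func in defn.slots if slot_id == MOD_CREATE]
    if len(creators) > 1:
        # This sketch's choice: refuse ambiguous creation.
        raise SystemError(f"module {spec_name}: multiple create slots")
    if creators:
        return creators[0](spec_name, defn)
    return types.ModuleType(spec_name)


def exec_def(module, defn):
    """Execution phase: run every exec slot, in order, on the same object."""
    _check_slots(defn)
    if not isinstance(module, types.ModuleType):
        return                # exec slots are only processed on module objects
    for slot_id, func in defn.slots:
        if slot_id == MOD_EXEC:
            func(module)      # an exception here aborts the import


def load(defn, spec_name):
    module = create_from_def(defn, spec_name)
    exec_def(module, defn)
    return module


# Usage: two exec slots, no create slot.
def add_food(module):
    module.food = "spam"


def add_side(module):
    module.side = "eggs with " + module.food   # sees the first slot's work


spam = load(ModuleDefModel("spam", [(MOD_EXEC, add_food), (MOD_EXEC, add_side)]),
            "pkg.spam")

try:
    load(ModuleDefModel("bad", [(99, add_food)]), "bad")
    unknown_rejected = False
except SystemError:
    unknown_rejected = True
```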
> I also think the PEP could do with a brief mention of the additional > modularity this approach brings at the C level - rather than having to > jam everything into one function, an extension module can easily break > up its initialisation into multiple steps, and its technically even > possible to share common steps between different modules. Eh, I think it's better to create one function that calls the parts, which was always possible, and works just as well. Repeating slots is allowed because it would be an unnecessary bother to check for duplicates. It's not a feature to advertise, the PEP just specifies that in the weird edge case, the intuitive thing will happen. (I did have a useful future use case for repeated slots, but the current PEP allows a better and more obvious solution so I'll not even mention it again.) Still, the steps are processed in a loop from a single function (PyModule_ExecDef), and that function operates on a module object -- it doesn't know about sys.modules and can't easily check if you replaced the module somewhere. >>> (This mirrors the behavior of Python modules. Note that implementing >>> Py_mod_create is usually a better solution for the use cases this serves.) >> >> Could you elaborate? What are those use cases and why would >> Py_mod_create be better? > > Rather than replacing the implicitly created normal module during > Py_mod_exec (which is the only option available to Python modules), > PEP 489 lets you define the Py_mod_create slot to override the module > object creation directly. > > Outside conversion of a Python module that manipulates sys.modules to > an extension module with Cython, there's no real reason to use the > "replacing yourself in sys.modules" option over using Py_mod_create > directly. Yes. The workaround you need to use in Python modules is possible for extensions, but there's no reason to use it. I'll try to make it clearer that it's an unnecessary workaround. 
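For reference, this is the Python-module workaround under discussion: a module that replaces its own `sys.modules` entry while executing, after which the import system hands back the replacement. A minimal sketch, assuming a writable temporary directory; the module name `self_replacing_mod` is hypothetical. In an extension module, a `Py_mod_create` slot could return the desired object directly instead.

```python
import importlib
import pathlib
import sys
import tempfile

SOURCE = """
import sys
import types


class _Replacement(types.ModuleType):
    answer = 42


sys.modules[__name__] = _Replacement(__name__)  # swap ourselves out during exec
"""

NAME = "self_replacing_mod"  # hypothetical module name

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, f"{NAME}.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        imported = importlib.import_module(NAME)   # returns the sys.modules entry
    finally:
        sys.path.remove(tmp)
        sys.modules.pop(NAME, None)

print(type(imported).__name__, imported.answer)
```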
>>> [snip] >>> >>> Modules that need to work unchanged on older versions of Python should not >>> use multi-phase initialization, because the benefits it brings can't be >>> back-ported. >> >> Given your example below, "should not" seems a bit strong to me. In >> fact, what are the objections to encouraging the approach from the >> example? > > Agreed, "should not" is probably too strong here. On the other hand, > preserving compatibility with older Python versions in a module that > has been updated to rely on multi-phase initialization is likely to be > a matter of "graceful degradation", rather than being able to > reproduce comparable functionality (which I believe may have been the > point Petr was trying to convey). My point is that if you need graceful degradation, your best bet is to stick with single-phase init. Then you'll have one code path that works the same on all versions. If you *need* the features of multi-phase init, you need to remove support for Pythons that don't have it. If you need both backwards compatibility and multi-phase init, you essentially need to create two modules (with shared contents), and make sure they end up in the same state after they're loaded. > I expect Cython and SWIG may be able to manage that through > appropriate use of #ifdef's in the generated code, but doing it by > hand is likely to be painful, hence the potential benefits of just > sticking with single-phase initialisation for the time being. Yes, code generators are in a position to create two versions of the module, and select one using #ifdef. The example in the PEP is helpful for other reasons than encouraging #ifdef: it shows what needs to change when porting. Think of it as a diff :) >>> [snip] >>> >>> Subinterpreters and Interpreter Reloading >>> ----------------------------------------- >>> >>> Extensions using the new initialization scheme are expected to support >>> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
>> >> Presumably this support is explicitly and completely defined in the >> subsequent sentences. Is it really just keeping "hidden" module state >> encapsulated on the module object? If not then it may make sense to >> enumerate the requirements better for the sake of extension module >> authors. It is explained in the docs; see "Bugs and caveats" here: https://docs.python.org/3/c-api/init.html#sub-interpreter-support I'll add a link to that page. > I'd actually like to have a better way of doing scenario testing for > extension modules (subinterpreters, multiple initialize/finalize > cycles, freezing), but I'm not sure this PEP is the best place to > define that. Perhaps we could do a PyPI project that was a tox-based > test battery for this kind of thing? I think that's the wrong place to start. Currently, sub-interpreter support is buried away in a docs chapter about Python initialization/finalization, so a typical extension author won't even notice it. We need to first make it *possible* to support subinterpreters easily and correctly (so that Cython can do it), and to document it prominently in the "writing extensions" part of the docs, not only in "extending Python". This PEP does part of the first step, and the docs for it (which aren't written yet) will do the second step. After that, it could make sense to provide a tool for testing this. >>> The mechanism is designed to make this easy, but care is still required >>> on the part of the extension author. >>> No user-defined functions, methods, or instances may leak to different >>> interpreters. >>> To achieve this, all module-level state should be kept in either the module >>> dict, or in the module object's storage reachable by PyModule_GetState. >> >> Is this programmatically enforceable? No. (I believe you could even prove this formally.) >> Is there any mechanism for easily copying module state? No. This would be impossible to provide in the general case. It's the responsibility of your C code.
That said, if you need to copy module state, chances are your design could use some rethinking. >> How about sharing some state between subinterpreters? The PyCapsule API was designed for this. >> How much room is there for letting extension module >> authors define how their module behaves across multiple interpreters >> or across multiple Initialize/Finalize cycles? Technically, you have all the freedom you want. But if I embed Python into my project/library, I'd want multiple sub-interpreters completely isolated by default. If I use two libraries that each embed Python into my app, I definitely want them isolated. So the PEP tries to make it easy to keep multiple interpreters isolated. > It's not programmatically enforceable, hence the idea above of finding > a way to make it easier for people to test their extension modules are > importable across multiple Python versions and deployment scenarios. > >>> As a rule of thumb, modules that rely on PyState_FindModule are, at the >>> moment, >>> not good candidates for porting to the new mechanism. >> >> Are there any plans for a follow-up effort to help with this case? See the link in the PEP for initial discussion. > The problem here is that the PEP 3121 module state approach provides > storage on a *per-interpreter* basis, that is then shared amongst all > module instances created from a given module definition. > > This means that when _PyImport_FindExtensionObject (see > https://hg.python.org/cpython/file/fc2eed9fc2d0/Python/import.c#l518) > reinitialises an extension module, the state is shared between the two > instances. When PEP 3121 was written, this was not seen as a problem, > since the expectation was that the behaviour would only be triggered > by multiple interpreter level initialize/finalize cycles. > > One key scenario we missed at the time was "deleting an extension > module from sys.modules and importing it a second time, while > retaining a local reference for later restoration".
Under PEP 3121, > the two instances collide on their state storage, as we have two > simultaneously existing module objects created in the same interpreter > from the same module definition. PEP 489 would inherit that same > problem if you tried to use it with the PyState_* APIs, so it simply > doesn't allow them at all. (Earlier versions of the PEP allowed it > with an "EXPORT_SINGLETON" slot that would disallow reimporting > entirely, which we took out in favour of "just keep using the existing > initialisation model in those cases for the time being") > > For pure Python code, we don't have this problem, since the > interpreter takes care of providing a properly scoped globals() > reference to *all* functions defined in that module, regardless of > whether they're module level functions or method definitions on a > class. At the C level, we don't have that, as only module level > functions get a module reference passed in - methods only get a > reference to their class instance, without a reference to the module > globals, and delayed callbacks can be a problem as well. > > The best improved API we could likely offer at this point is a > convenience API for looking up a module in *sys.modules* based on a > PyModuleDef instance, and updating PEP 489 to write the as-imported > module name into the returned PyModuleDef structure. That's probably > not a bad way to go, given that PEP 489 currently *ignores* the m_name > slot - flipping it around to be a *writable* slot would be a way to > let extension modules know dynamically how to look themselves up in > sys.modules. > > The new lookup API would then be the moral equivalent of Python code > doing "mod = sys.modules[__name__]". 
With this approach, actively > *using* multiple references to a given module at the same time would > still break (since you'll always get the module currently in > sys.modules, even if that isn't the one you expected), but the > "save-and-restore" model needed for certain kinds of testing and > potentially other scenarios would work correctly. I still think providing the module to classes is a better idea than a lookup API, but that's going out of scope here. >>> Module Reloading >>> ---------------- >>> >>> Reloading an extension module using importlib.reload() will continue to >>> have no effect, except re-setting import-related attributes. >>> >>> Due to limitations in shared library loading (both dlopen on POSIX and >>> LoadModuleEx on Windows), it is not generally possible to load >>> a modified library after it has changed on disk. >>> >>> Use cases for reloading other than trying out a new version of the module >>> are too rare to require all module authors to keep reloading in mind. >>> If reload-like functionality is needed, authors can export a dedicated >>> function for it. >> >> Keep in mind the semantics of reload for pure Python modules. The >> module is executed into the existing namespace, overwriting the loaded >> namespace but leaving non-colliding attributes alone. While the >> semantics for reloading an extension/builtin/frozen module are >> currently basic (i.e. a no-op), there may well be room to support >> reload behavior that mirrors that of pure Python modules without >> needing to reload an SO file. I would expect either the behavior of >> exec to get repeated (tricky due to "hidden" module state?) or for >> there to be a "reload" slot that would mirror Py_mod_exec. > > We considered this, and decided it was fairly pointless, since you > can't modify the extension module code. 
The one case I see where it > potentially makes sense is a "transitive reload", where the extension > module retrieves and caches attributes from another pure Python module > at import time, and that extension module has been reloaded. > > It may also make a difference in the context of utilities like > https://docs.python.org/3/library/test.html#test.support.import_fresh_module, > where we manipulate the import system state to control how conditional > imports are handled. > >> At the same time, one may argue that reloading modules is not >> something to encourage. :) > > There's a reason import_fresh_module has never made it out of test.support :) Right. Implementation-wise, it would actually be much easier to support reload than to make it a no-op. But then C module authors would need to think about this edge case, which might be tricky to get right, would not be likely to get test coverage, and is generally not useful anyway. If it turns out to be useful, it would be very simple to add an explicit reload slot in the future. >>> Multiple modules in one library >>> ------------------------------- >>> >>> To support multiple Python modules in one shared library, the library can >>> export additional PyInit* symbols besides the one that corresponds >>> to the library's filename. >>> >>> Note that this mechanism can currently only be used to *load* extra modules, >>> but not to *find* them. >> >> What do you mean by "currently"? > > It's a limitation of the way the existing finders work, rather than an > inherent limitation of the import system as a whole. > >> It may also be worth tying the above statement with the following >> text, since the following appears to be an explanation of how to >> address the "finder" caveat. > > Agreed that this could be clearer. OK, I'll clarify.
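Because the existing finders only locate a module whose name matches the library's filename, an extra module exported from the same library has to be loaded by pointing a loader at that file explicitly. A minimal sketch, assuming the library at `path` exports a PyInit symbol for the requested name; `spec_for_extra_module` and `load_extra_module` are hypothetical helpers, while `ExtensionFileLoader`, `spec_from_file_location` and `module_from_spec` are standard importlib APIs.

```python
import importlib.machinery
import importlib.util


def spec_for_extra_module(name, path):
    """Build a spec for module `name` whose init function lives in the
    shared library at `path`, even though that file is named after a
    different module (hypothetical helper)."""
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    return importlib.util.spec_from_file_location(name, path, loader=loader)


def load_extra_module(name, path):
    """Create and execute the extra module. CPython's extension loader
    picks the PyInit_* symbol from the last component of `name`."""
    spec = spec_for_extra_module(name, path)
    module = importlib.util.module_from_spec(spec)  # create phase
    spec.loader.exec_module(module)                 # exec phase
    return module
```

A call would look like `load_extra_module("mypkg._extra", "/path/to/mypkg/_core.so")`, where both names are placeholders; registering the result in sys.modules, if wanted, is left to the caller.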
>> Summary of API Changes and Additions >> ------------------------------------ >> >> New functions: >> >> * PyModule_FromDefAndSpec (macro) >> * PyModule_FromDefAndSpec2 >> * PyModule_ExecDef >> * PyModule_SetDocString >> * PyModule_AddFunctions >> * PyModuleDef_Init >> >> New macros: >> >> * Py_mod_create >> * Py_mod_exec >> >> New types: >> >> * PyModuleDef_Type will be exposed >> >> New structures: >> >> * PyModuleDef_Slot >> >> PyModuleDef.m_reload changes to PyModuleDef.m_slots. > > This section is missing any explanation of the impact on > Python/import.c, on the _imp/imp module, and on the 3 finders/loaders > in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension). I'll add a summary. The internal _imp module will have backwards incompatible changes -- functions will be added and removed as necessary. That's what the underscore means :) The deprecated imp module will get a backwards compatibility shim for anything it imported from _imp that got removed. importlib will stay backwards compatible. Python/import.c and Python/importdl.* will be rewritten entirely. See the patches (linked from the PEP) for details. From encukou at gmail.com Tue May 19 16:55:04 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 19 May 2015 16:55:04 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <555B1937.5020001@gmail.com> References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> Message-ID: <555B4EC8.3020002@gmail.com> On 05/19/2015 01:06 PM, Petr Viktorin wrote: > On 05/19/2015 05:51 AM, Nick Coghlan wrote: >> On 19 May 2015 at 10:07, Eric Snow wrote: >>> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin wrote: [snip] >>>> >>>> The proposal >>>> ============ >>> >>> This section should include an indication of how the loader (and >>> perhaps finder) will change for builtin, frozen, and extension >>> modules. 
It may help to describe the proposal up front by how the >>> loader implementation would look if it were somehow implemented in >>> Python code. The subsequent sections sometimes indicate where >>> different things take place, but an explicit outline (as Python code) >>> would make the entire flow really obvious. Putting that toward the >>> beginning of this section would help clearly set the stage for the >>> rest of the proposal. >> >> +1 for a pseudo-code overview of the loader implementation.

Here is an overview of how the modified importers will operate. Details such as logging or handling of errors and invalid states are left out, and C code is presented with a concise Python-like syntax. The framework that calls the importers is explained in PEP 451 [#pep-0451-loading]_.

importlib/_bootstrap.py:

    class BuiltinImporter:
        def create_module(self, spec):
            return _imp.create_builtin(spec)

        def exec_module(self, module):
            _imp.exec_dynamic(module)

        def load_module(self, name):
            # use a backwards compatibility shim
            return _load_module_shim(self, name)

importlib/_bootstrap_external.py:

    class ExtensionFileLoader:
        def create_module(self, spec):
            return _imp.create_dynamic(spec)

        def exec_module(self, module):
            _imp.exec_dynamic(module)

        def load_module(self, name):
            # use a backwards compatibility shim
            return _load_module_shim(self, name)

Python/import.c (the _imp module):

    def create_dynamic(spec):
        name = spec.name
        path = spec.origin

        # Find an already loaded module that used single-phase init.
        # For multi-phase initialization, mod is NULL, so a new module
        # is always created.
        mod = _PyImport_FindExtensionObject(name, name)
        if mod:
            return mod

        return _PyImport_LoadDynamicModuleWithSpec(spec)

    def exec_dynamic(module):
        def = PyModule_GetDef(module)
        state = PyModule_GetState(module)
        if state is NULL:
            PyModule_ExecDef(module, def)

    def create_builtin(spec):
        name = spec.name

        # Find an already loaded module that used single-phase init.
        # For multi-phase initialization, mod is NULL, so a new module
        # is always created.
        mod = _PyImport_FindExtensionObject(name, name)
        if mod:
            return mod

        for initname, initfunc in PyImport_Inittab:
            if name == initname:
                m = initfunc()
                if isinstance(m, PyModuleDef):
                    def = m
                    return PyModule_FromDefAndSpec(def, spec)
                else:
                    # fall back to single-phase initialization
                    module = m
                    _PyImport_FixupExtensionObject(module, name, name)
                    return module

Python/importdl.c:

    def _PyImport_LoadDynamicModuleWithSpec(spec):
        path = spec.origin
        package, dot, name = spec.name.rpartition('.')

        # see the "Non-ASCII module names" section for export_hook_name
        hook_name = export_hook_name(name)

        # call platform-specific function for loading exported function
        # from shared library
        exportfunc = _find_shared_funcptr(hook_name, path)

        m = exportfunc()
        if isinstance(m, PyModuleDef):
            def = m
            return PyModule_FromDefAndSpec(def, spec)

        module = m

        # fall back to single-phase initialization
        ....

Objects/moduleobject.c:

    def PyModule_FromDefAndSpec(def, spec):
        name = spec.name
        create = None
        for slot, value in def.m_slots:
            if slot == Py_mod_create:
                create = value
        if create:
            m = create(spec, def)
        else:
            m = PyModule_New(name)

        if isinstance(m, types.ModuleType):
            m.md_state = NULL
            m.md_def = def

        if def.m_methods:
            PyModule_AddFunctions(m, def.m_methods)
        if def.m_doc:
            PyModule_SetDocString(m, def.m_doc)
        return m

    def PyModule_ExecDef(module, def):
        if isinstance(module, types.ModuleType):
            if module.md_state is NULL:
                # allocate a block of zeroed-out memory
                module.md_state = _alloc(def.m_size)

        if def.m_slots is NULL:
            return

        for slot, value in def.m_slots:
            if slot == Py_mod_exec:
                value(module)

From ericsnowcurrently at gmail.com Wed May 20 01:56:34 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 19 May 2015 17:56:34 -0600 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> Message-ID: On Mon, May 18, 2015 at 9:51 PM, Nick
Coghlan wrote: > On 19 May 2015 at 10:07, Eric Snow wrote: [snip] >> Was there any consideration made for just ignoring unknown slot IDs? >> My gut reaction is that you have it the right way, but I can still >> imagine use cases for custom slots that PyModuleDef_Init wouldn't know >> about. > > The "known slots only, all other slot IDs are reserved for future use" > slot semantics were copied directly from PyType_FromSpec in PEP 384. > Since it's just a numeric slot ID, you'd run a high risk of conflicts > if you allowed for custom extensions. > > If folks want to do more clever things, they'll need to use the create > or exec slot to stash them on the module object, rather than storing > them in the module definition. Makes sense. This does remind me of something I wanted to ask. Would it make sense to leverage ModuleSpec.loader_state? If I recall correctly, we added loader_state with extension modules in mind. > >>> The PyModuleDef object must be available for the lifetime of the module >>> created >>> from it; usually, it will be declared statically. >> >> How easily will this be a source of mysterious errors-at-a-distance? > > It shouldn't be any worse than static type definitions, and normal > reference counting semantics should keep it alive regardless. Got it. > >>> [snip] >>> Extension authors are advised to keep Py_mod_create minimal, and in >>> particular >>> to not call user code from it. >> >> This is a pretty important point as well. We'll need to make sure >> this is sufficiently clear in the documentation. Would it make sense >> to provide helpers for common cases, to encourage extension authors to >> keep the create function minimal? > > The main encouragement is to not handcode your extension modules at > all, and let something like Cython or SWIG take care of the > boilerplate :) Hey, I tried to make something happen over on python-ideas! :) Some folks just don't want to go far enough. [snip] >> Could you elaborate?
What are those use cases and why would >> Py_mod_create be better? > > Rather than replacing the implicitly created normal module during > Py_mod_exec (which is the only option available to Python modules), > PEP 489 lets you define the Py_mod_create slot to override the module > object creation directly. > > Outside conversion of a Python module that manipulates sys.modules to > an extension module with Cython, there's no real reason to use the > "replacing yourself in sys.modules" option over using Py_mod_create > directly. Ah, I got it. We just want to ensure we match Python module behavior, where there is no module-defined create step. This would seem even more important with tools like Cython that convert Python modules into C extensions, even if the appropriate solution for a C extension module would be a different approach (e.g. Py_mod_create). [snip] >> Given your example below, "should not" seems a bit strong to me. In >> fact, what are the objections to encouraging the approach from the >> example? > > Agreed, "should not" is probably too strong here. On the other hand, > preserving compatibility with older Python versions in a module that > has been updated to rely on multi-phase initialization is likely to be > a matter of "graceful degradation", rather than being able to > reproduce comparable functionality (which I believe may have been the > point Petr was trying to convey). Understood. This section could stand to be clarified then. > > I expect Cython and SWIG may be able to manage that through > appropriate use of #ifdef's in the generated code, but doing it by > hand is likely to be painful, hence the potential benefits of just > sticking with single-phase initialisation for the time being. Hmm. The example made it look relatively straight-forward. Regardless, it's not a big deal. 
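The create/exec split that Py_mod_create and Py_mod_exec give C extensions already exists for Python-level loaders through PEP 451. A minimal sketch, in which `TwoPhaseLoader`, `CountingModule` and `load` are hypothetical names: `create_module` returns a custom module object (roughly the role Py_mod_create plays), and `exec_module` populates it (roughly the role of Py_mod_exec), so nothing has to be swapped in sys.modules afterwards.

```python
import importlib.abc
import importlib.util
import types


class CountingModule(types.ModuleType):
    """Custom module type handed out by the 'create' phase (illustrative)."""
    exec_count = 0


class TwoPhaseLoader(importlib.abc.Loader):
    """PEP 451 loader: create_module builds the object, exec_module fills it."""

    def create_module(self, spec):
        # Analogous to Py_mod_create: decide what object the module is.
        return CountingModule(spec.name)

    def exec_module(self, module):
        # Analogous to Py_mod_exec: run initialization on that object.
        module.exec_count += 1
        module.greeting = f"hello from {module.__name__}"


def load(name):
    """Run both phases by hand, without registering in sys.modules."""
    spec = importlib.util.spec_from_loader(name, TwoPhaseLoader())
    module = importlib.util.module_from_spec(spec)  # phase 1: create
    spec.loader.exec_module(module)                 # phase 2: exec
    return module


if __name__ == "__main__":
    m = load("two_phase_demo")
    print(type(m).__name__, m.greeting)  # CountingModule hello from two_phase_demo
```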
> >>> [snip] >>> >>> Subinterpreters and Interpreter Reloading >>> ----------------------------------------- >>> >>> Extensions using the new initialization scheme are expected to support >>> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. >> >> Presumably this support is explicitly and completely defined in the >> subsequent sentences. Is it really just keeping "hidden" module state >> encapsulated on the module object? If not then it may make sense to >> enumerate the requirements better for the sake of extension module >> authors. > > I'd actually like to have a better way of doing scenario testing for > extension modules (subinterpreters, multiple initialize/finalize > cycles, freezing), but I'm not sure this PEP is the best place to > define that. Perhaps we could do a PyPI project that was a tox-based > test battery for this kind of thing? Interesting idea. I think that a lot of folks would find that useful. It feels a bit like some of the work Dave Malcolm did with validating extension modules. > >>> The mechanism is designed to make this easy, but care is still required >>> on the part of the extension author. >>> No user-defined functions, methods, or instances may leak to different >>> interpreters. >>> To achieve this, all module-level state should be kept in either the module >>> dict, or in the module object's storage reachable by PyModule_GetState. >> >> Is this programmatically enforceable? Is there any mechanism for >> easily copying module state? How about sharing some state between >> subinterpreters? How much room is there for letting extension module >> authors define how their module behaves across multiple interpreters >> or across multiple Initialize/Finalize cycles? > > It's not programmatically enforceable, hence the idea above of finding > a way to make it easier for people to test their extension modules are > importable across multiple Python versions and deployment scenarios. That's what I figured.
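Pure-Python modules already behave the way the PEP asks extensions to behave: each module object carries its own namespace, so two instances of the "same" module never share state. A minimal sketch of the save-and-restore scenario raised in this thread (drop a module from sys.modules, import it again, keep the old reference); the module name `state_demo` and its `bump` function are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module whose only state is a module-level counter.
SOURCE = """
_calls = 0  # lives in this module object's own __dict__

def bump():
    global _calls
    _calls += 1
    return _calls
"""


def import_twice(name="state_demo"):
    """Import `name`, forget it, import it again; return both objects."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, name + ".py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            first = importlib.import_module(name)
            del sys.modules[name]  # keep our reference, drop the cached one
            second = importlib.import_module(name)
            return first, second
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


if __name__ == "__main__":
    old, new = import_twice()
    old.bump()
    old.bump()
    print(old._calls, new._calls, new.bump())  # 2 0 1
```

In an extension module, state kept in C static variables would instead be shared by both module objects; keeping all state in the module dict or in PyModule_GetState storage, as the PEP requires, is what gives extensions the same isolation.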
> >>> As a rule of thumb, modules that rely on PyState_FindModule are, at the >>> moment, >>> not good candidates for porting to the new mechanism. >> >> Are there any plans for a follow-up effort to help with this case? > > The problem here is that the PEP 3121 module state approach provides > storage on a *per-interpreter* basis, that is then shared amongst all > module instances created from a given module definition. You mean a form of interpreter-local storage? Also, the module definition is effectively global right? > > This means that when _PyImport_FindExtensionObject (see > https://hg.python.org/cpython/file/fc2eed9fc2d0/Python/import.c#l518) > reinitialises an extension module, the state is shared between the two > instances. When PEP 3121 was written, this was not seen as a problem, > since the expectation was that the behaviour would only be triggered > by multiple interpreter level initialize/finalize cycles. > > One key scenario we missed at the time was "deleting an extension > module from sys.modules and importing it a second time, while > retaining a local reference for later restoration". Under PEP 3121, > the two instances collide on their state storage, as we have two > simultaneously existing module objects created in the same interpreter > from the same module definition. PEP 489 would inherit that same > problem if you tried to use it with the PyState_* APIs, so it simply > doesn't allow them at all. (Earlier versions of the PEP allowed it > with an "EXPORT_SINGLETON" slot that would disallow reimporting > entirely, which we took out in favour of "just keep using the existing > initialisation model in those cases for the time being") That seems reasonable. > > For pure Python code, we don't have this problem, since the > interpreter takes care of providing a properly scoped globals() > reference to *all* functions defined in that module, regardless of > whether they're module level functions or method definitions on a > class. 
At the C level, we don't have that, as only module level > functions get a module reference passed in - methods only get a > reference to their class instance, without a reference to the module > globals, and delayed callbacks can be a problem as well. Yuck. Is this something we could fix? Is __module__ not set on all functions? > > The best improved API we could likely offer at this point is a > convenience API for looking up a module in *sys.modules* based on a > PyModuleDef instance, and updating PEP 489 to write the as-imported > module name into the returned PyModuleDef structure. That's probably > not a bad way to go, given that PEP 489 currently *ignores* the m_name > slot - flipping it around to be a *writable* slot would be a way to > let extension modules know dynamically how to look themselves up in > sys.modules. That sounds useful. > > The new lookup API would then be the moral equivalent of Python code > doing "mod = sys.modules[__name__]". With this approach, actively > *using* multiple references to a given module at the same time would > still break (since you'll always get the module currently in > sys.modules, even if that isn't the one you expected), but the > "save-and-restore" model needed for certain kinds of testing and > potentially other scenarios would work correctly. Right, though I would expect there to be trouble if the replacement module didn't support the module state API in the expected way. > >>> Module Reloading >>> ---------------- >>> >>> Reloading an extension module using importlib.reload() will continue to >>> have no effect, except re-setting import-related attributes. >>> >>> Due to limitations in shared library loading (both dlopen on POSIX and >>> LoadModuleEx on Windows), it is not generally possible to load >>> a modified library after it has changed on disk. >>> >>> Use cases for reloading other than trying out a new version of the module >>> are too rare to require all module authors to keep reloading in mind. 
>>> If reload-like functionality is needed, authors can export a dedicated >>> function for it. >> >> Keep in mind the semantics of reload for pure Python modules. The >> module is executed into the existing namespace, overwriting the loaded >> namespace but leaving non-colliding attributes alone. While the >> semantics for reloading an extension/builtin/frozen module are >> currently basic (i.e. a no-op), there may well be room to support >> reload behavior that mirrors that of pure Python modules without >> needing to reload an SO file. I would expect either the behavior of >> exec to get repeated (tricky due to "hidden" module state?) or for >> there to be a "reload" slot that would mirror Py_mod_exec. > > We considered this, and decided it was fairly pointless, since you > can't modify the extension module code. The one case I see where it > potentially makes sense is a "transitive reload", where the extension > module retrieves and caches attributes from another pure Python module > at import time, and that extension module has been reloaded. The reload approach specified in the PEP seems satisfactory at this point. > > It may also make a difference in the context of utilities like > https://docs.python.org/3/library/test.html#test.support.import_fresh_module, > where we manipulate the import system state to control how conditional > imports are handled. > >> At the same time, one may argue that reloading modules is not >> something to encourage. :) > > There's a reason import_fresh_module has never made it out of test.support :) > >>> Multiple modules in one library >>> ------------------------------- >>> >>> To support multiple Python modules in one shared library, the library can >>> export additional PyInit* symbols besides the one that corresponds >>> to the library's filename. >>> >>> Note that this mechanism can currently only be used to *load* extra modules, >>> but not to *find* them. >> >> What do you mean by "currently"? 
> > It's a limitation of the way the existing finders work, rather than an > inherent limitation of the import system as a whole. Ah. It sounded like the PEP was leading to some future solution to resolve that. > >> It may also be worth tying the above statement with the following >> text, since the following appears to be an explanation of how to >> address the "finder" caveat. > > Agreed that this could be clearer. > >>> Testing and initial implementations >>> ----------------------------------- >>> >>> For testing, a new built-in module ``_testmultiphase`` will be created. >>> The library will export several additional modules using the mechanism >>> described in "Multiple modules in one library". >>> >>> The ``_testcapi`` module will be unchanged, and will use single-phase >>> initialization indefinitely (or until it is no longer supported). >>> >>> The ``array`` and ``xx*`` modules will be converted to use multi-phase >>> initialization as part of the initial implementation. >> >> What do you mean by "initial implementation"? Will it be done >> differently in a later implementation? > > These modules will be converted in the reference implementation, other > modules won't be. That's what I thought. The use of the word "initial" threw me off. > >>> String constants and types can be handled similarly. >>> (Note that non-default bases for types cannot be portably specified >>> statically; this case would need a Py_mod_exec function that runs >>> before the slots are added. The free error-checking would still be >>> beneficial, though.) >> >> This implies to me that now is the time to ensure that this PEP >> appropriately accommodates that need. It would be unfortunate if we >> had to later hack in some extra API to accommodate a use case we >> already know about. Better if we made sure the currently proposed >> changes could accommodate the need, even if the implementation of that >> part were not part of this PEP. 
> > This would be a new kind of execution slot, so the PEP already > accommodates these possible future extensions. Sounds good. The explanation made it sound like a mechanism would be required that could not be handled via a slot. > >>> Another possibility is providing a "main" function that would be run >>> when the module is given to Python's -m switch. >>> For this to work, the runpy module will need to be modified to take >>> advantage of ModuleSpec-based loading introduced in PEP 451. >> >> I'll point out that the pure-Python equivalent has been proposed on a >> number of occasions and been rejected every time. However, in the >> case of extension modules it is more justifiable. If extension >> modules gain such a mechanism then it may be a justification for doing >> something similar in Python. >> >>> Also, it will be necessary to add a mechanism for setting up a module >>> according to slots it wasn't originally defined with. >> >> What does this mean? > > When you use the -m switch, you always run in the builtin __main__ > module namespace, and runpy fiddles with __main__.__spec__ to match > the details of the module passed to the switch. That's not currently a > trick we can manage when the "thing to run" is an extension module. I see now. 
-eric From ericsnowcurrently at gmail.com Wed May 20 02:22:47 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 19 May 2015 18:22:47 -0600 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <555B1937.5020001@gmail.com> References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> Message-ID: On Tue, May 19, 2015 at 5:06 AM, Petr Viktorin wrote: > On 05/19/2015 05:51 AM, Nick Coghlan wrote: >> On 19 May 2015 at 10:07, Eric Snow wrote: >>> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin wrote: >>>> [snip] >>>> >>>> Furthermore, the majority of currently existing extension modules has >>>> problems with sub-interpreter support and/or interpreter reloading, and, >>>> while >>>> it is possible with the current infrastructure to support these >>>> features, it is neither easy nor efficient. >>>> Addressing these issues was the goal of PEP 3121, but many extensions, >>>> including some in the standard library, took the least-effort approach >>>> to porting to Python 3, leaving these issues unresolved. >>>> This PEP keeps backwards compatibility, which should reduce pressure and >>>> give >>>> extension authors adequate time to consider these issues when porting. >>> >>> So just to be sure I understand, now PyModuleDef.m_slots will >>> unambiguously indicate whether or not an extension module is >>> compliant, right? >> >> I'm not sure what you mean by "compliant". A non-NULL m_slots will >> indicate usage of multi-phase initialisation, so it at least indicates >> *intent* to correctly support subinterpreters et al. Actual delivery >> on that promise is still a different question :) > > Yes, non-NULL m_slots means the module is compliant. If it's not, it's a > bug in the *module* (i.e. compliance is not *just* a matter of > setting m_slots). > This will be explained in the docs. Perfect.
> >>>> [snip] >>>> >>>> The proposal >>>> ============ >>> >>> This section should include an indication of how the loader (and >>> perhaps finder) will change for builtin, frozen, and extension >>> modules. It may help to describe the proposal up front by how the >>> loader implementation would look if it were somehow implemented in >>> Python code. The subsequent sections sometimes indicate where >>> different things take place, but an explicit outline (as Python code) >>> would make the entire flow really obvious. Putting that toward the >>> beginning of this section would help clearly set the stage for the >>> rest of the proposal. >> >> +1 for a pseudo-code overview of the loader implementation. > > OK. Along with a link to PEP 451 code [*], it should make things clearer. > [*] https://www.python.org/dev/peps/pep-0451/#how-loading-will-work Sounds good. > >>>> [snip] >>>> Unknown slot IDs will cause the import to fail with SystemError. >>> >>> Was there any consideration made for just ignoring unknown slot IDs? >>> My gut reaction is that you have it the right way, but I can still >>> imagine use cases for custom slots that PyModuleDef_Init wouldn't know >>> about. >> >> The "known slots only, all other slot IDs are reserved for future use" >> slot semantics were copied directly from PyType_FromSpec in PEP 384. >> Since it's just a numeric slot ID, you'd run a high risk of conflicts >> if you allowed for custom extensions. >> >> If folks want to do more clever things, they'll need to use the create >> or exec slot to stash them on the module object, rather than storing >> them in the module definition. > > Right, if you need custom behavior, put it in a function and use the > provided hook. (If you need custom "slots" on PyModuleDef for some > reason, use a PyModuleDef subclass -- but I can't see where it would be > helpful.) > Ignoring unknown slot IDs would mean letting errors go unnoticed. This is reasonable. Thanks. 
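The "known slots only" rule discussed above can be modelled in a few lines of Python. This is a toy model, not CPython's implementation: the slot ID values, `check_slots` and `run_exec_slots` are illustrative. It rejects unknown IDs with `SystemError` instead of silently ignoring them, and it runs repeated exec slots in order on the same module object.

```python
from types import ModuleType
from typing import Callable, Sequence, Tuple

# Illustrative slot IDs for this model only.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}

Slot = Tuple[int, Callable]


def check_slots(slots: Sequence[Slot]) -> None:
    """Fail loudly on any slot ID this model does not know about."""
    for slot_id, _ in slots:
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")


def run_exec_slots(module: ModuleType, slots: Sequence[Slot]) -> None:
    """Validate all slots first, then run each exec slot in order on the same module."""
    check_slots(slots)
    for slot_id, func in slots:
        if slot_id == PY_MOD_EXEC:
            func(module)


def add_answer(module: ModuleType) -> None:
    module.answer = 42


def add_double(module: ModuleType) -> None:
    module.double = module.answer * 2  # relies on the previous exec slot


demo = ModuleType("demo")
run_exec_slots(demo, [(PY_MOD_EXEC, add_answer), (PY_MOD_EXEC, add_double)])
print(demo.double)  # 84

try:
    run_exec_slots(ModuleType("bad"), [(PY_MOD_EXEC, add_answer), (99, add_answer)])
except SystemError as exc:
    error_message = str(exc)
print(error_message)  # unknown slot ID 99
```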
> > (Technicality: PyModuleDef_Init doesn't care about slots; > PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will raise the > errors.) > >>> When using multi-phase initialization, the *m_name* field of PyModuleDef >>> will >>> not be used during importing; the module name will be taken from the >>> ModuleSpec. >> >> So m_name will be strictly ignored by PyModuleDef_Init? > > Yes. The name is useful for introspection, but the import machinery will > use the name provided by the ModuleSpec. Okay. > > (Technicality: again, PyModuleDef_Init doesn't touch names at all. > PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will ignore > the name from the def.) > >>>> The PyModuleDef object must be available for the lifetime of the module >>>> created >>>> from it; usually, it will be declared statically. >>> >>> How easily will this be a source of mysterious errors-at-a-distance? >> >> It shouldn't be any worse than static type definitions, and normal >> reference counting semantics should keep it alive regardless. > > It's the same as the current behavior (PEP 3121), where a > PyModuleDef is stored in the module, and if you let it die, > PyModule_GetState will give you an invalid pointer. It's just that in > PEP 489, the import machinery itself uses the def, so you actually get to > feel the pain if you deallocate it. > All in all, this should not be a problem in practice; the PEP specifies > what'll happen if you go off doing exotic things. (For example, Cython > might run into this if it tries implementing a reloading scheme we > talked about earlier in the thread, and even then it shouldn't be a > major source of mysterious errors.) Normal mortals will be OK. Thanks for explaining. I'm less concerned now. > >>> [snip] >>> However, only ModuleType instances support module-specific functionality >>> such as per-module state. >> >> This is a pretty important point.
Presumably this constrains later >> behavior and precedes all functionality related to per-module state. > > Yes. Module objects support more module-like behavior than other > objects. What you can and cannot use should be clear from the API. I'll > clarify a bit more what functionality depends on using a PyModule_Type > (or subclass) instance. > One thing I see I forgot to add is that execution slots are looked up > via PyModule_GetDef, so they won't be processed on non-module objects. Okay. That makes sense now. > > It's a very good idea to use a module subclass rather than a completely > custom object. The docs will need to strongly recommend this. Agreed. And the docs should also be clear on how non-module objects are basically ignored, slot-wise. > >>>> [snip] >>>> Extension authors are advised to keep Py_mod_create minimal, and in >>>> particular >>>> to not call user code from it. >>> >>> This is a pretty important point as well. We'll need to make sure >>> this is sufficiently clear in the documentation. Would it make sense >>> to provide helpers for common cases, to encourage extension authors to >>> keep the create function minimal? >> >> The main encouragement is to not handcode your extension modules at >> all, and let something like Cython or SWIG take care of the >> boilerplate :) > > Yes, Cython should be the default. For hand-written modules, the common case > should be not defining create at all. The docs should be explicit about this. > >>>> [snip] >>>> >>>> If PyModuleExec replaces the module's entry in sys.modules, >>>> the new object will be used and returned by importlib machinery. >>> >>> Just to be sure, something like "mod = sys.modules[modname]" is done >>> before each execution slot. In other words, the result of the >>> previous execution slot should be used for the next one. >> >> That's not the original intent of this paragraph - rather, it is >> referring to the existing behaviour of the import machinery.
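The "existing behaviour of the import machinery" referred to here can be observed with a small meta path finder. In this sketch `SwapLoader`, `SwapFinder` and the module name `swap_demo` are hypothetical. The loader's exec step replaces its own `sys.modules` entry, and `importlib.import_module` returns the replacement because the import system re-reads `sys.modules` after `exec_module` finishes.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class SwapLoader(importlib.abc.Loader):
    """Hypothetical loader whose exec step replaces its own sys.modules entry."""

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        replacement = types.SimpleNamespace(origin="replacement")
        sys.modules[module.__name__] = replacement


class SwapFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == "swap_demo":
            return importlib.util.spec_from_loader(fullname, SwapLoader())
        return None


finder = SwapFinder()
sys.meta_path.insert(0, finder)
try:
    mod = importlib.import_module("swap_demo")
finally:
    # Undo the demo's changes to global import state.
    sys.meta_path.remove(finder)
    sys.modules.pop("swap_demo", None)

print(type(mod).__name__, mod.origin)  # SimpleNamespace replacement
```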
>> >> However, I agree that now we're allowing the Py_mod_exec slot to be >> supplied multiple times, we should also be updating the module >> reference between slot invocations. > > No, that won't work. It's possible (via direct calls to the import > machinery) to load a module without adding it to sys.modules. What direct calls do you mean? I would not expect any such mechanism to work properly with extension modules. > The behavior should be clear (when you think about it) after I include > the loader implementation pseudocode. Okay. > >> I also think the PEP could do with a brief mention of the additional >> modularity this approach brings at the C level - rather than having to >> jam everything into one function, an extension module can easily break >> up its initialisation into multiple steps, and it's technically even >> possible to share common steps between different modules. > > Eh, I think it's better to create one function that calls the parts, > which was always possible, and works just as well. > Repeating slots is allowed because it would be an unnecessary bother to > check for duplicates. It's not a feature to advertise; the PEP just > specifies that in the weird edge case, the intuitive thing will happen. Be that as it may, I think it would be a mistake to treat support for multiple exec slots as a second-class citizen in the design. Personally I find it an appealing feature. > > (I did have a useful future use case for repeated slots, but the current > PEP allows a better and more obvious solution so I'll not even mention > it again.) > > Still, the steps are processed in a loop from a single function > (PyModule_ExecDef), and that function operates on a module object -- it > doesn't know about sys.modules and can't easily check if you replaced > the module somewhere. I would consider this approach to be a mistake as well. The approach should stay consistent with the semantics of the whole import system, where sys.modules is checked directly.
Unfortunately, that ship has already sailed. > >>>> (This mirrors the behavior of Python modules. Note that implementing >>>> Py_mod_create is usually a better solution for the use cases this serves.) >>> >>> Could you elaborate? What are those use cases and why would >>> Py_mod_create be better? >> >> Rather than replacing the implicitly created normal module during >> Py_mod_exec (which is the only option available to Python modules), >> PEP 489 lets you define the Py_mod_create slot to override the module >> object creation directly. >> >> Outside conversion of a Python module that manipulates sys.modules to >> an extension module with Cython, there's no real reason to use the >> "replacing yourself in sys.modules" option over using Py_mod_create >> directly. > > Yes. The workaround you need to use in Python modules is possible for > extensions, but there's no reason to use it. I'll try to make it clearer > that it's an unnecessary workaround. Thank you. > >>>> [snip] >>>> >>>> Modules that need to work unchanged on older versions of Python should not >>>> use multi-phase initialization, because the benefits it brings can't be >>>> back-ported. >>> >>> Given your example below, "should not" seems a bit strong to me. In >>> fact, what are the objections to encouraging the approach from the >>> example? >> >> Agreed, "should not" is probably too strong here. On the other hand, >> preserving compatibility with older Python versions in a module that >> has been updated to rely on multi-phase initialization is likely to be >> a matter of "graceful degradation", rather than being able to >> reproduce comparable functionality (which I believe may have been the >> point Petr was trying to convey). > > My point is that if you need graceful degradation, your best bet is to > stick with single-phase init. Then you'll have one code path that works > the same on all versions. 
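For Python-level loaders, PEP 451's `Loader.create_module()` plays roughly the role that `Py_mod_create` plays for extensions. The loader picks the module object's type up front instead of swapping `sys.modules` entries during execution. The sketch below uses hypothetical names (`CountingModule`, `SubclassLoader`, `subclass_demo`) and returns a `ModuleType` subclass. This follows the advice earlier in the thread to prefer module subclasses over arbitrary objects.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class CountingModule(types.ModuleType):
    """Hypothetical ModuleType subclass: still a real module, with one extra property."""

    @property
    def attr_count(self) -> int:
        # Count public (non-dunder) attributes set on this module.
        return len([k for k in vars(self) if not k.startswith("__")])


class SubclassLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Counterpart of Py_mod_create: decide the module object's type up front.
        return CountingModule(spec.name)

    def exec_module(self, module):
        module.greeting = "hello"


class SubclassFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == "subclass_demo":
            return importlib.util.spec_from_loader(fullname, SubclassLoader())
        return None


finder = SubclassFinder()
sys.meta_path.insert(0, finder)
try:
    mod = importlib.import_module("subclass_demo")
finally:
    sys.meta_path.remove(finder)
    sys.modules.pop("subclass_demo", None)

print(type(mod).__name__, isinstance(mod, types.ModuleType), mod.attr_count)  # CountingModule True 1
```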
> If you *need* the features of multi-phase init, you need to remove > support for Pythons that don't have it. > If you need both backwards compatibility and multi-phase init, you > essentially need to create two modules (with shared contents), and make > sure they end up in the same state after they're loaded. > >> I expect Cython and SWIG may be able to manage that through >> appropriate use of #ifdef's in the generated code, but doing it by >> hand is likely to be painful, hence the potential benefits of just >> sticking with single-phase initialisation for the time being. > > Yes, code generators are in a position to create two versions of the > module, and select one using #ifdef. > > The example in the PEP is helpful for other reasons than encouraging > #ifdef: it shows what needs to change when porting. Think of it as a diff :) It may be worth being more clear about that. > >>>> [snip] >>>> >>>> Subinterpreters and Interpreter Reloading >>>> ----------------------------------------- >>>> >>>> Extensions using the new initialization scheme are expected to support >>>> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. >>> >>> Presumably this support is explicitly and completely defined in the >>> subsequent sentences. Is it really just keeping "hidden" module state >>> encapsulated on the module object? If not then it may make sense to >>> enumerate the requirements better for the sake of extension module >>> authors. > > It is explained in the docs, see "Bugs and caveats" here: > https://docs.python.org/3/c-api/init.html#sub-interpreter-support > I'll add a link to that page. Cool. > >> I'd actually like to have a better way of doing scenario testing for >> extension modules (subinterpreters, multiple initialize/finalize >> cycles, freezing), but I'm not sure this PEP is the best place to >> define that. Perhaps we could do a PyPI project that was a tox-based >> test battery for this kind of thing?
> > I think that's the wrong place to start. Currently, sub-interpreter > support is buried away in a docs chapter about Python > initialization/finalization, so a typical extension author won't even > notice it. We need to first make it *possible* to support > subinterpreters easily and correctly (so that Cython can do it), and to > document it prominently in the "writing extensions" part of the docs, > not only in "extending Python". Then, > This PEP does part of the first step, and the docs for it (which aren't > written yet) will do the second step. > After that, it could make sense to provide a tool for testing this. There's nothing about the docs that precludes putting testing helpers up on PyPI though. However, I'm definitely +1 on improving the docs. > >>>> The mechanism is designed to make this easy, but care is still required >>>> on the part of the extension author. >>>> No user-defined functions, methods, or instances may leak to different >>>> interpreters. >>>> To achieve this, all module-level state should be kept in either the module >>>> dict, or in the module object's storage reachable by PyModule_GetState. >>> >>> Is this programmatically enforceable? > > No. (I believe you could even prove this formally.) > >>> Is there any mechanism for easily copying module state? > > No. This would be impossible to provide in the general case. It's the > responsibility of your C code. > That said, if you need to copy module state, chances are your design > could use some rethinking. > >>> How about sharing some state between subinterpreters? > > The PyCapsule API was designed for this. I'm simply thinking in terms of the options we have for a PEP I'm working on that will facilitate passing objects between subinterpreters and even possibly sharing some state between them. Currently it will be practically necessary to exclude extension modules from any such mechanism. 
So I was wondering if there would be a way to allow extension module authors to define how at least some of the module's data could be shared between subinterpreters. > >>> How much room is there for letting extension module >>> authors define how their module behaves across multiple interpreters >>> or across multiple Initialize/Finalize cycles? > > Technically, you have all the freedom you want. But if I embed Python > into my project/library, I'd want multiple sub-interpreters completely > isolated by default. If I use two libraries that each embed Python into > my app, I definitely want them isolated. > So the PEP tries to make it easy to keep multiple interpreters isolated. As I just noted, I'm looking at making use of subinterpreters for a different use case where it *does* make sense to effectively share objects between them. [snip] >>> At the same time, one may argue that reloading modules is not >>> something to encourage. :) >> >> There's a reason import_fresh_module has never made it out of test.support :) > > Right. Implementation-wise, it would actually be much easier to support > reload rather than make it a no-op. But then C module authors would need > to think about this edge case, which might be tricky to get right, would > not be likely to get test coverage, and is generally not useful anyway. > > If it turns out to be useful, it would be very simple to add an explicit > reload slot in the future. Agreed. [snip] >> This section is missing any explanation of the impact on >> Python/import.c, on the _imp/imp module, and on the 3 finders/loaders >> in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension). > > I'll add a summary. > > The internal _imp module will have backwards incompatible changes -- > functions will be added and removed as necessary. That's what the > underscore means :) Be careful with that assumption. We've had plenty of experiences where the assumption became unreliable.
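The reload no-op described above comes from `exec_dynamic` checking the module state before executing anything, as in Petr's loader pseudocode later in the thread. Below is a toy Python model of that check. `ToyModuleDef`, `ToyExtensionModule` and this `exec_dynamic` are hypothetical stand-ins for the C structures and functions, and a `bytearray` stands in for the zeroed per-module state block.

```python
from dataclasses import dataclass, field
from types import ModuleType, SimpleNamespace
from typing import Callable, List, Optional


@dataclass
class ToyModuleDef:
    """Toy stand-in for PyModuleDef; only the fields this model needs."""
    name: str
    m_size: int = 0
    exec_slots: List[Callable[[ModuleType], None]] = field(default_factory=list)


class ToyExtensionModule(ModuleType):
    """Module object carrying the md_def/md_state fields used in the pseudocode."""

    def __init__(self, name: str, mdef: ToyModuleDef):
        super().__init__(name)
        self.md_def = mdef
        self.md_state: Optional[bytearray] = None


def exec_dynamic(module: object) -> None:
    """Model of _imp.exec_dynamic: run the exec slots at most once per module object."""
    if not isinstance(module, ToyExtensionModule):
        return  # not created from a def: no slots to process
    if module.md_state is not None:
        return  # state already allocated: a reload lands here and does nothing
    module.md_state = bytearray(module.md_def.m_size)  # zeroed per-module state
    for func in module.md_def.exec_slots:
        func(module)


calls = []
spam_def = ToyModuleDef("spam", m_size=8, exec_slots=[lambda m: calls.append(m.__name__)])
spam = ToyExtensionModule("spam", spam_def)
exec_dynamic(spam)               # first execution: allocates state, runs the slot
exec_dynamic(spam)               # e.g. triggered by a reload: skipped
exec_dynamic(SimpleNamespace())  # non-module object: ignored
print(calls, len(spam.md_state))  # ['spam'] 8
```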
> The deprecated imp module will get a backwards compatibility shim for > anything it imported from _imp that got removed. > importlib will stay backwards compatible. > > Python/import.c and Python/importdl.* will be rewritten entirely. > See the patches (linked from the PEP) for details. > -eric From ericsnowcurrently at gmail.com Wed May 20 02:33:03 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 19 May 2015 18:33:03 -0600 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <555B4B4A.5000902@redhat.com> References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555B4B4A.5000902@redhat.com> Message-ID: On Tue, May 19, 2015 at 8:40 AM, Petr Viktorin wrote: > Here is an overview of how the modified importers will operate. > Details such as logging or handling of errors and invalid states > are left out, and C code is presented with a concise Python-like syntax. > > The framework that calls the importers is explained in PEP 451 > [#pep-0451-loading]_. I know. I wrote that PEP. :) > > importlib/_bootstrap.py: > > class BuiltinImporter: > def create_module(self, spec): > return _imp.create_builtin(spec) > > def exec_module(self, module): > _imp.exec_dynamic(module) > > def load_module(self, name): > # use a backwards compatibility shim > return _load_module_shim(self, name) Won't frozen modules be likewise affected? > > importlib/_bootstrap_external.py: > > class ExtensionFileLoader: > def create_module(self, spec): > return _imp.create_dynamic(spec) > > def exec_module(self, module): > _imp.exec_dynamic(module) > > def load_module(self, name): > # use a backwards compatibility shim > return _load_module_shim(self, name) > > Python/import.c (the _imp module): > > def create_dynamic(spec): > name = spec.name > path = spec.origin > > # Find an already loaded module that used single-phase init. > # For multi-phase initialization, mod is NULL, so a new module > # is always created.
> mod = _PyImport_FindExtensionObject(name, name) > if mod: > return mod > > return _PyImport_LoadDynamicModuleWithSpec(spec) > > def exec_dynamic(module): > def = PyModule_GetDef(module) This is the point where custom module types get ignored, right? > state = PyModule_GetState(module) > if state is NULL: > PyModule_ExecDef(module, def) Ah, it is idempotent. > > def create_builtin(spec): > name = spec.name > > # Find an already loaded module that used single-phase init. > # For multi-phase initialization, mod is NULL, so a new module > # is always created. > mod = _PyImport_FindExtensionObject(name, name) > if mod: > return mod > > for initname, initfunc in PyImport_Inittab: > if name == initname: > m = initfunc() > if isinstance(m, PyModuleDef): > def = m > return PyModule_FromDefAndSpec(def, spec) > else: > # fall back to single-phase initialization > module = m > _PyImport_FixupExtensionObject(module, name, name) > return module > > Python/importdl.c: > > def _PyImport_LoadDynamicModuleWithSpec(spec): > path = spec.origin > package, dot, name = spec.name.rpartition('.') > > # see the "Non-ASCII module names" section for export_hook_name > hook_name = export_hook_name(name) > > # call platform-specific function for loading exported function > # from shared library > exportfunc = _find_shared_funcptr(hook_name, path) > > m = exportfunc() > if isinstance(m, PyModuleDef): > def = m > return PyModule_FromDefAndSpec(def, spec) > > module = m > > # fall back to single-phase initialization > .... 
> > Objects/moduleobject.c: > > def PyModule_FromDefAndSpec(def, spec): > name = spec.name > create = None > for slot, value in def.m_slots: > if slot == Py_mod_create: > create = value > if create: > m = create(spec, def) > else: > m = PyModule_New(name) > > if isinstance(m, types.ModuleType): > m.md_state = NULL > m.md_def = def > > if def.m_methods: > PyModule_AddFunctions(m, def.m_methods) > if def.m_doc: > PyModule_SetDocString(m, def.m_doc) > return m > > def PyModule_ExecDef(module, def): > if isinstance(module, types.ModuleType): > if module.md_state is NULL: > # allocate a block of zeroed-out memory > module.md_state = _alloc(def.m_size) > > if def.m_slots is NULL: > return > > for slot, value in def.m_slots: > if slot == Py_mod_exec: > value(module) > > It may also be worth outlining how PyModuleDef_Init will work. -eric From encukou at gmail.com Wed May 20 10:41:30 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 20 May 2015 10:41:30 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555B4B4A.5000902@redhat.com> Message-ID: <555C48BA.4080204@gmail.com> On 05/20/2015 02:33 AM, Eric Snow wrote: > On Tue, May 19, 2015 at 8:40 AM, Petr Viktorin wrote: >> Here is an overview of how the modified importers will operate. >> Details such as logging or handling of errors and invalid states >> are left out, and C code is presented with a concise Python-like syntax. >> >> The framework that calls the importers is explained in PEP 451 >> [#pep-0451-loading]_. > > I know. I wrote that PEP. :) > >> >> importlib/_bootstrap.py: >> >> class BuiltinImporter: >> def create_module(self, spec): >> return _imp.create_builtin(spec) >> >> def exec_module(self, module): >> _imp.exec_dynamic(module) >> >> def load_module(self, name): >> # use a backwards compatibility shim >> return _load_module_shim(self, name) > > Won't frozen modules be likewise affected?
No, frozen modules are Python source, just not loaded from a file. [...] >> Python/import.c (the _imp module): >> >> def create_dynamic(spec): >> name = spec.name >> path = spec.origin >> >> # Find an already loaded module that used single-phase init. >> # For multi-phase initialization, mod is NULL, so a new module >> # is always created. >> mod = _PyImport_FindExtensionObject(name, name) >> if mod: >> return mod >> >> return _PyImport_LoadDynamicModuleWithSpec(spec) >> >> def exec_dynamic(module): >> def = PyModule_GetDef(module) > > This is the point where custom module types get ignored, right? Yes. The actual code has a check for non-modules, to skip exec_dynamic rather than have PyModule_GetDef raise. I'll add this to the overview to make things clearer. >> state = PyModule_GetState(module) >> if state is NULL: >> PyModule_ExecDef(module, def) > > Ah, it is idempotent. Yes, this is the part that disables reload(). [...] > It may also be worth outlining how PyModuleDef_Init will work. That's hard to do in Python syntax, since most of what it does is ensure the def is a valid PyObject. I'll explain it in a different section. It's a very small, idempotent function: PyObject* PyModuleDef_Init(struct PyModuleDef* def) { if (def->m_base.m_index == 0) { max_module_number++; Py_REFCNT(def) = 1; Py_TYPE(def) = &PyModuleDef_Type; def->m_base.m_index = max_module_number; } return (PyObject*)def; } The code is lifted straight from PyModule_Create2. The m_index is bookkeeping for PyState_FindModule, so it's unused for modules with multi-phase init, but I didn't want to break the invariant that it's set up together with Py_TYPE.
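The answer above, that frozen modules are compiled Python rather than native code, can be checked from Python. `FrozenImporter.get_code()` returns a code object, while `BuiltinImporter.get_code()` returns `None`. This sketch assumes CPython, where `_frozen_importlib` is always frozen (it bootstraps importlib) and `sys` is always built in.

```python
import types
from importlib.machinery import BuiltinImporter, FrozenImporter

# A frozen module: Python bytecode stored inside the interpreter binary.
frozen_spec = FrozenImporter.find_spec("_frozen_importlib")
frozen_code = FrozenImporter.get_code("_frozen_importlib")

# A built-in module: native code compiled into the interpreter, no code object.
builtin_spec = BuiltinImporter.find_spec("sys")
builtin_code = BuiltinImporter.get_code("sys")

print(frozen_spec.origin, isinstance(frozen_code, types.CodeType))  # frozen True
print(builtin_spec.origin, builtin_code)  # built-in None
```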
-- Petr Viktorin From encukou at gmail.com Wed May 20 12:55:37 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 20 May 2015 12:55:37 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> Message-ID: <555C6829.60901@gmail.com> On 05/20/2015 02:22 AM, Eric Snow wrote: > On Tue, May 19, 2015 at 5:06 AM, Petr Viktorin wrote: >> On 05/19/2015 05:51 AM, Nick Coghlan wrote: >>> On 19 May 2015 at 10:07, Eric Snow wrote: >>>> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin wrote: [snip] >>>>> >>>>> If PyModuleExec replaces the module's entry in sys.modules, >>>>> the new object will be used and returned by importlib machinery. >>>> >>>> Just to be sure, something like "mod = sys.modules[modname]" is done >>>> before each execution slot. In other words, the result of the >>>> previous execution slot should be used for the next one. >>> >>> That's not the original intent of this paragraph - rather, it is >>> referring to the existing behaviour of the import machinery. >>> >>> However, I agree that now we're allowing the Py_mod_exec slot to be >>> supplied multiple times, we should also be updating the module >>> reference between slot invocations. >> >> No, that won't work. It's possible (via direct calls to the import >> machinery) to load a module without adding it to sys.modules. > > What direct calls do you mean? I would not expect any such mechanism > to work properly with extension modules. Reimplement without the sys.modules parts. The point is that exec_module doesn't a priori depend on the module being in sys.modules, which I think is a good thing. >> The behavior should be clear (when you think about it) after I include >> the loader implementation pseudocode. > > Okay. 
> >> >>> I also think the PEP could do with a brief mention of the additional >>> modularity this approach brings at the C level - rather than having to >>> jam everything into one function, an extension module can easily break >>> up its initialisation into multiple steps, and it's technically even >>> possible to share common steps between different modules. >> >> Eh, I think it's better to create one function that calls the parts, >> which was always possible, and works just as well. >> Repeating slots is allowed because it would be an unnecessary bother to >> check for duplicates. It's not a feature to advertise; the PEP just >> specifies that in the weird edge case, the intuitive thing will happen. > > Be that as it may, I think it would be a mistake to treat support for > multiple exec slots as a second-class citizen in the design. > Personally I find it an appealing feature. It's there, but I won't advertise it too much in the docs. >> (I did have a useful future use case for repeated slots, but the current >> PEP allows a better and more obvious solution so I'll not even mention >> it again.) >> >> Still, the steps are processed in a loop from a single function >> (PyModule_ExecDef), and that function operates on a module object -- it >> doesn't know about sys.modules and can't easily check if you replaced >> the module somewhere. > > I would consider this approach to be a mistake as well. The approach > should stay consistent with the semantics of the whole import system, > where sys.modules is checked directly. Unfortunately, that ship has > already sailed. It's the loader that checks sys.modules, *after* exec_module is called. No other implementation of exec_module checks sys.modules in the middle of its operation. So I think the semantics are consistent. [snip] >>>>> >>>>> Modules that need to work unchanged on older versions of Python should not >>>>> use multi-phase initialization, because the benefits it brings can't be >>>>> back-ported.
>>>> >>>> Given your example below, "should not" seems a bit strong to me. In >>>> fact, what are the objections to encouraging the approach from the >>>> example? >>> >>> Agreed, "should not" is probably too strong here. On the other hand, >>> preserving compatibility with older Python versions in a module that >>> has been updated to rely on multi-phase initialization is likely to be >>> a matter of "graceful degradation", rather than being able to >>> reproduce comparable functionality (which I believe may have been the >>> point Petr was trying to convey). >> >> My point is that if you need graceful degradation, your best bet is to >> stick with single-phase init. Then you'll have one code path that works >> the same on all versions. >> If you *need* the features of multi-phase init, you need to remove >> support for Pythons that don't have it. >> If you need both backwards compatibility and multi-phase init, you >> essentially need to create two modules (with shared contents), and make >> sure they end up in the same state after they're loaded. >> >>> I expect Cython and SWIG may be able to manage that through >>> appropriate use of #ifdef's in the generated code, but doing it by >>> hand is likely to be painful, hence the potential benefits of just >>> sticking with single-phase initialisation for the time being. >> >> Yes, code generators are in a position to create two versions of the >> module, and select one using #ifdef. >> >> The example in the PEP is helpful for other reasons than encouraging >> #ifdef: it shows what needs to change when porting. Think of it as a diff :) > > It may be worth being more clear about that. OK [snip] >>>>> The mechanism is designed to make this easy, but care is still required >>>>> on the part of the extension author. >>>>> No user-defined functions, methods, or instances may leak to different >>>>> interpreters.
>>>>> To achieve this, all module-level state should be kept in either the module >>>>> dict, or in the module object's storage reachable by PyModule_GetState. >>>> >>>> Is this programmatically enforceable? >> >> No. (I believe you could even prove this formally.) >> >>>> Is there any mechanism for easily copying module state? >> >> No. This would be impossible to provide in the general case. It's the >> responsibility of your C code. >> That said, if you need to copy module state, chances are your design >> could use some rethinking. >> >>>> How about sharing some state between subinterpreters? >> >> The PyCapsule API was designed for this. > > I'm simply thinking in terms of the options we have for a PEP I'm > working on that will facilitate passing objects between > subinterpreters and even possibly sharing some state between them. > Currently it will be practically necessary to exclude extension > modules from any such mechanism. So I was wondering if there would be > a way to allow extension module authors to define how at least some of > the module's data could be shared between subinterpreters. You should be able to put that info in slots. It's hard to speculate without knowing specifics, though. >>>> How much room is there for letting extension module >>>> authors define how their module behaves across multiple interpreters >>>> or across multiple Initialize/Finalize cycles? >> >> Technically, you have all the freedom you want. But if I embed Python >> into my project/library, I'd want multiple sub-interpreters completely >> isolated by default. If I use two libraries that each embed Python into >> my app, I definitely want them isolated. >> So the PEP tries to make it easy to keep multiple interpreters isolated. > > As I just noted, I'm looking at making use of subinterpreters for a > different use case where it *does* make sense to effectively share > objects between them. OK. This PEP isn't designed for that, but it should offer enough extensibility. 
[snip] >>> This section is missing any explanation of the impact on >>> Python/import.c, on the _imp/imp module, and on the 3 finders/loaders >>> in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension). >> >> I'll add a summary. >> >> The internal _imp module will have backwards incompatible changes -- >> functions will be added and removed as necessary. That's what the >> underscore means :) > > Be careful with that assumption. We've had plenty of experiences > where the assumption became unreliable. That's why I provide backcompat shims for undocumented, deprecated functions in "imp". But _imp is just too low-level to do that easily. From encukou at gmail.com Wed May 20 13:08:53 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 20 May 2015 13:08:53 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> Message-ID: <555C6B45.9070001@gmail.com> On 05/20/2015 01:56 AM, Eric Snow wrote: > On Mon, May 18, 2015 at 9:51 PM, Nick Coghlan wrote: >> On 19 May 2015 at 10:07, Eric Snow wrote: > [snip] >>> Was there any consideration made for just ignoring unknown slot IDs? >>> My gut reaction is that you have it the right way, but I can still >>> imagine use cases for custom slots that PyModuleDef_Init wouldn't know >>> about. >> >> The "known slots only, all other slot IDs are reserved for future use" >> slot semantics were copied directly from PyType_FromSpec in PEP 384. >> Since it's just a numeric slot ID, you'd run a high risk of conflicts >> if you allowed for custom extensions. >> >> If folks want to do more clever things, they'll need to use the create >> or exec slot to stash them on the module object, rather than storing >> them in the module definition. > > Makes sense. This does remind me of something I wanted to ask. Would > it make sense to leverage ModuleSpec.loader_state?
If I recall > correctly, we added loader_state with extension modules in mind. I don't think we want to go out of our way to support non-module objects. Module subclasses should cover any needed functionality, and they will support slots. >>>> [snip] >>>> Extension authors are advised to keep Py_mod_create minimal, an in >>>> particular >>>> to not call user code from it. >>> >>> This is a pretty important point as well. We'll need to make sure >>> this is sufficiently clear in the documentation. Would it make sense >>> to provide helpers for common cases, to encourage extension authors to >>> keep the create function minimal? >> >> The main encouragement is to not handcode your extension modules at >> all, and let something like Cython or SWIG take care of the >> boilerplate :) > > Hey, I tried to make something happen over on python-ideas! :) Some > folks just don't want to go far enough. Yeah, as someone who's trying to get Python3 porting patches to Samba, I can tell you some upstreams really, really, really don't like rewriting their code. >>>> As a rule of thumb, modules that rely on PyState_FindModule are, at the >>>> moment, >>>> not good candidates for porting to the new mechanism. >>> >>> Are there any plans for a follow-up effort to help with this case? >> >> The problem here is that the PEP 3121 module state approach provides >> storage on a *per-interpreter* basis, that is then shared amongst all >> module instances created from a given module definition. > > You mean a form of interpreter-local storage? Also, the module > definition is effectively global right? The PyModuleDef is global and static, but you can create any number of module objects from it. Each interpreter gets its own module object, with state specific to the module object. (And with a custom finder/loader you can make multiple modules from the same def within one interpreter. 
>> For pure Python code, we don't have this problem, since the
>> interpreter takes care of providing a properly scoped globals()
>> reference to *all* functions defined in that module, regardless of
>> whether they're module level functions or method definitions on a
>> class. At the C level, we don't have that, as only module level
>> functions get a module reference passed in - methods only get a
>> reference to their class instance, without a reference to the module
>> globals, and delayed callbacks can be a problem as well.
>
> Yuck. Is this something we could fix? Is __module__ not set on all functions?

The module object is not stored on classes, so methods don't have access to it. I want a fix for that to be my next PEP :)

--
Petr Viktorin

From encukou at gmail.com  Wed May 20 13:34:04 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 20 May 2015 13:34:04 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 6
Message-ID:

Hello,
Based mainly on comments by Eric Snow, I've sent another update to PEP 489. See the diff at https://hg.python.org/peps/rev/aad7a39a695b

Here is a copy for your convenience:

PEP: 489
Title: Multi-phase extension module initialization
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin , Stefan Behnel , Nick Coghlan
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which built-in and extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve import-related problems by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451.
This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension authors to only define features they need, and to allow future additions to extension module declarations.

Extension modules are created in a two-step process, fitting better into the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API.

The proposal also allows extension modules with non-ASCII names.

Not all problems tackled in PEP 3121 are solved in this proposal. In particular, problems with run-time module lookup (PyState_FindModule) are left to a future PEP.


Motivation
==========

Python modules and extension modules are not set up in the same way. For Python modules, the module object is created and set up first, then the module code is executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and passed to the relevant hooks.

For extensions (i.e. shared libraries) and built-in modules, the module init function is executed straight away and does both the creation and initialization. The initialization function is not passed the ModuleSpec, or any information it contains, such as the __file__ or fully-qualified name. This hinders relative imports and resource loading.

In Py3, modules are also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to re-import it and thus run into an infinite loop when it executes the module init function again. Without access to the fully-qualified module name, it is not trivial to correctly add the module to sys.modules either.
This is specifically a problem for Cython-generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of __file__ and __name__ information hinders the compilation of "__init__.py" modules, i.e. packages, especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or interpreter reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps backwards compatibility, which should reduce pressure and give extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension and built-in modules export an initialization function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return a fully initialized module object. The function receives no arguments, so it has no way of knowing about its import context.

During its execution, the module init function creates a module object based on a PyModuleDef object. It then continues to initialize it by adding attributes to the module dict, creating types, etc.

In the back, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded, and when a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe as it relies on the module init function creating its own module object first, but this assumption usually holds in practice.
The proposal
============

The initialization function (PyInit_modulename) will be allowed to return a pointer to a PyModuleDef object. The import machinery will be in charge of constructing the module object, calling hooks provided in the PyModuleDef in the relevant phases of initialization (as described below).

This multi-phase initialization is an additional possibility. Single-phase initialization, the current practice of returning a fully initialized module object, will still be accepted, so existing code will work unchanged, including binary compatibility.

The PyModuleDef structure will be changed to contain a list of slots, similarly to PEP 384's PyType_Spec for types. To keep binary compatibility, and avoid needing to introduce a new structure (which would introduce additional supporting functions and per-module storage), the currently unused m_reload pointer of PyModuleDef will be changed to hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of PyModuleDef_Slot structures, terminated by a slot with id set to 0 (i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided. New Python versions may introduce new slot IDs, but slot IDs will never be recycled. Slots may get deprecated, but will continue to be supported throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.
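As a rough illustration of the slot-array rules above, the following pure-Python model walks a ``{0, NULL}``-terminated list of ``(slot, value)`` pairs. It is a sketch under stated assumptions, not CPython's implementation: ``parse_slots`` is a hypothetical name, the numeric IDs given to Py_mod_create and Py_mod_exec are placeholders, and the duplicate-create check mirrors the rule stated for Py_mod_create.

```python
# Placeholder slot IDs for illustration; the real values come from CPython's headers.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def parse_slots(slots):
    """Split a {0, NULL}-terminated slot list into (create, exec_functions).

    `slots` is a list of (slot_id, value) pairs standing in for the C array,
    or None for a NULL m_slots pointer.
    """
    create = None
    exec_functions = []
    if slots is None:
        return create, exec_functions
    for slot_id, value in slots:
        if slot_id == 0:  # the {0, NULL} terminator ends the array
            break
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if slot_id == PY_MOD_CREATE:
            if create is not None:
                raise SystemError("multiple Py_mod_create slots")
            create = value
        else:
            exec_functions.append(value)  # exec slots keep their array order
    return create, exec_functions


# Example: two exec slots, run in the order they appear in the array.
order = []
create, exec_functions = parse_slots([
    (PY_MOD_EXEC, lambda module: order.append("first")),
    (PY_MOD_EXEC, lambda module: order.append("second")),
    (0, None),
])
for exec_function in exec_functions:
    exec_function(None)  # a real exec function would receive the module object
```

Rejecting unknown IDs outright, rather than skipping them, is what keeps the numeric ID space safe for future slots.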
When using multi-phase initialization, the *m_name* field of PyModuleDef will not be used during importing; the module name will be taken from the ModuleSpec.

To prevent crashes when the module is loaded in older versions of Python, the PyModuleDef object must be initialized using the newly added PyModuleDef_Init function. This sets the object type (which cannot be done statically on certain compilers), refcount, and internal bookkeeping data (m_index). For example, an extension module "example" would be exported as::

    static PyModuleDef example_def = {...};

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModuleDef_Init(&example_def);
    }

The PyModuleDef object must be available for the lifetime of the module created from it -- usually, it will be declared statically.

Pseudo-code Overview
--------------------

Here is an overview of how the modified importers will operate. Details such as logging or handling of errors and invalid states are left out, and C code is presented with a concise Python-like syntax.

The framework that calls the importers is explained in PEP 451 [#pep-0451-loading]_.

::

    importlib/_bootstrap.py:

        class BuiltinImporter:
            def create_module(self, spec):
                return _imp.create_builtin(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                return _load_module_shim(self, name)

    importlib/_bootstrap_external.py:

        class ExtensionFileLoader:
            def create_module(self, spec):
                return _imp.create_dynamic(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                return _load_module_shim(self, name)

    Python/import.c (the _imp module):

        def create_dynamic(spec):
            name = spec.name
            path = spec.origin

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            return _PyImport_LoadDynamicModuleWithSpec(spec)

        def exec_dynamic(module):
            if not isinstance(module, types.ModuleType):
                # non-modules are skipped -- PyModule_GetDef fails on them
                return

            def = PyModule_GetDef(module)
            state = PyModule_GetState(module)
            if state is NULL:
                PyModule_ExecDef(module, def)

        def create_builtin(spec):
            name = spec.name

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            for initname, initfunc in PyImport_Inittab:
                if name == initname:
                    m = initfunc()
                    if isinstance(m, PyModuleDef):
                        def = m
                        return PyModule_FromDefAndSpec(def, spec)
                    else:
                        # fall back to single-phase initialization
                        module = m
                        _PyImport_FixupExtensionObject(module, name, name)
                        return module

    Python/importdl.c:

        def _PyImport_LoadDynamicModuleWithSpec(spec):
            path = spec.origin
            package, dot, name = spec.name.rpartition('.')

            # see the "Export Hook Name" section for export_hook_name
            hook_name = export_hook_name(name)

            # call platform-specific function for loading exported function
            # from shared library
            exportfunc = _find_shared_funcptr(hook_name, path)

            m = exportfunc()
            if isinstance(m, PyModuleDef):
                def = m
                return PyModule_FromDefAndSpec(def, spec)

            module = m

            # fall back to single-phase initialization
            ....

    Objects/moduleobject.c:

        def PyModule_FromDefAndSpec(def, spec):
            name = spec.name
            create = None
            for slot, value in def.m_slots:
                if slot == Py_mod_create:
                    create = value
            if create:
                m = create(spec, def)
            else:
                m = PyModule_New(name)

            if isinstance(m, types.ModuleType):
                m.md_state = None
                m.md_def = def

            if def.m_methods:
                PyModule_AddFunctions(m, def.m_methods)
            if def.m_doc:
                PyModule_SetDocString(m, def.m_doc)

            return m

        def PyModule_ExecDef(module, def):
            if isinstance(module, types.ModuleType):
                if module.md_state is NULL:
                    # allocate a block of zeroed-out memory
                    module.md_state = _alloc(def.m_size)

            if def.m_slots is NULL:
                return

            for slot, value in def.m_slots:
                if slot == Py_mod_exec:
                    value(module)


Module Creation Phase
---------------------

Creation of the module object -- that is, the implementation of ExecutionLoader.create_module -- is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses. The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451, and the PyModuleDef structure. It should return a new module object, or set an error and return NULL.

This function is not responsible for setting import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of types.ModuleType. Any type can be used, as long as it supports setting and getting attributes, including at least the import-related attributes. However, only ModuleType instances support module-specific functionality such as per-module state.

Note that when this function is called, the module's entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively) may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in particular to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal module object using PyModule_New. The name is taken from *spec*.


Post-creation steps
...................

If the Py_mod_create function returns an instance of types.ModuleType or a subclass (or if a Py_mod_create slot is not present), the import machinery will associate the PyModuleDef with the module. This also makes the PyModuleDef accessible to the execution phase, the PyModule_GetDef function, and garbage collection routines (traverse, clear, free).

If the Py_mod_create function does not return a module subclass, then m_size must be 0, and m_traverse, m_clear and m_free must all be NULL. Otherwise, SystemError is raised.

Additionally, initial attributes specified in the PyModuleDef are set on the module object, regardless of its type:

* The docstring is set from m_doc, if non-NULL.
* The module's functions are initialized from m_methods, if any.


Module Execution Phase
----------------------

Module execution -- that is, the implementation of ExecutionLoader.exec_module -- is governed by "execution slots". This PEP only adds one, Py_mod_exec, but others may be added in the future.

The execution phase is done on the PyModuleDef associated with the module object. For objects that are not a subclass of PyModule_Type (for which PyModule_GetDef would fail), the execution phase is skipped.

Execution slots may be specified multiple times, and are processed in the order they appear in the slots array. When using the default import machinery, they are processed after import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) are set and the module is added to sys.modules.
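The PEP 451 create/exec protocol that these two phases plug into can be exercised from pure Python. The sketch below is an analogy, not the C API: the finder, the loader and the module name ``pep489_demo`` are all hypothetical. Its ``create_module`` returns a ``types.SimpleNamespace`` instead of a module (as a Py_mod_create function is allowed to), and its ``exec_module`` records what the import machinery has already done by the time execution runs.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types

observed = {}


class NamespaceLoader(importlib.abc.Loader):
    """Hypothetical loader split into a create phase and an exec phase."""

    def create_module(self, spec):
        # Like a Py_mod_create function, return a non-module object;
        # it only needs to support setting and getting attributes.
        return types.SimpleNamespace()

    def exec_module(self, module):
        # Like Py_mod_exec: record the state the machinery set up beforehand.
        observed["name"] = module.__name__
        observed["loader_is_self"] = module.__loader__ is self
        observed["in_sys_modules"] = sys.modules.get(module.__name__) is module
        module.food = "spam"


class DemoFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only knows about one module name."""

    def find_spec(self, fullname, path, target=None):
        if fullname != "pep489_demo":
            return None
        return importlib.util.spec_from_loader(fullname, NamespaceLoader())


finder = DemoFinder()
sys.meta_path.insert(0, finder)
try:
    demo = importlib.import_module("pep489_demo")
finally:
    # Undo the global changes made for the demonstration.
    sys.meta_path.remove(finder)
    sys.modules.pop("pep489_demo", None)
```

With the default machinery, ``observed`` should show that ``__name__`` and ``__loader__`` were already set, and that the object was already registered in sys.modules, when ``exec_module`` ran. The C-level restrictions on non-module objects (m_size of 0, no GC hooks) have no counterpart here.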
Pre-Execution steps
-------------------

Before processing the execution slots, per-module state is allocated for the module. From this point on, per-module state is accessible through PyModule_GetState.


The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to setting the module's initial attributes. The "module" argument receives the module object to initialize.

If the Py_mod_exec function replaces the module's entry in sys.modules, the new object will be used and returned by importlib machinery. (This mirrors the behavior of Python modules. Note that implementing Py_mod_create is usually a better solution for the use cases this serves.)

The function must return ``0`` on success, or, on error, set an exception and return ``-1``.


Legacy Init
-----------

The backwards-compatible single-phase initialization continues to be supported. In this scheme, the PyInit function returns a fully initialized module rather than a PyModuleDef object. In this case, the PyInit hook implements the creation phase, and the execution phase is a no-op.

Modules that need to work unchanged on older versions of Python should stick to single-phase initialization, because the benefits it brings can't be back-ported. Here is an example of a module that supports multi-phase initialization, and falls back to single-phase when compiled for an older version of CPython.
It is included mainly as an illustration of the changes needed to enable multi-phase init::

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        if (PyModule_AddStringConstant(module, "food", "spam") < 0) {
            return -1;
        }
        return 0;
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        return PyModuleDef_Init(&spam_def);
    #else
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }


Built-In modules
----------------

Any extension module can be used as a built-in module by linking it into the executable, and including it in the inittab (either at runtime with PyImport_AppendInittab, or at configuration time, using tools like *freeze*).

To keep this possibility, all changes to extension module loading introduced in this PEP will also apply to built-in modules. The only exception is non-ASCII module names, explained below.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly, avoiding the issues mentioned in Python documentation [#subinterpreter-docs]_. The mechanism is designed to make this easy, but care is still required on the part of the extension author. No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept in either the module dict, or in the module object's storage reachable by PyModule_GetState.
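A loose pure-Python analogy may help show why state belongs on the module object. In the sketch below, two module objects are created from one spec, standing in for two interpreters that each instantiate the same PyModuleDef; the loader, the module name and ``SHARED_LOG`` are hypothetical, and no real subinterpreters are involved. State created in the exec step stays private to each module object, while an object held at "static" (process-wide) scope is visible from every copy.

```python
import importlib.abc
import importlib.util

# Analogue of C static data: one object shared by every module instance.
SHARED_LOG = []


class StateDemoLoader(importlib.abc.Loader):
    """Hypothetical loader used only to create several module objects."""

    def create_module(self, spec):
        return None  # fall back to the default module type

    def exec_module(self, module):
        # Per-module state: a fresh object for every module instance.
        module.calls = []
        # Anti-pattern: a reference to process-wide "static" state.
        module.shared = SHARED_LOG


spec = importlib.util.spec_from_loader("pep489_state_demo", StateDemoLoader())

# Two module objects from one spec (not registered in sys.modules).
first = importlib.util.module_from_spec(spec)
spec.loader.exec_module(first)
second = importlib.util.module_from_spec(spec)
spec.loader.exec_module(second)

first.calls.append("first")
first.shared.append("first")
```

After this runs, ``second.calls`` is still empty, but ``second.shared`` already contains the value appended through ``first``, which is the kind of cross-instance leak the rule above is meant to prevent.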
A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.


Functions incompatible with multi-phase initialization
------------------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure with a non-NULL *m_slots* pointer. The function doesn't have access to the ModuleSpec object necessary for multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*. PyState registration is disabled because multiple module objects may be created from the same PyModuleDef.


Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs access to module-level state (including functions, classes or exceptions defined at the module level) must receive a reference to the module object (or the particular object it needs), either directly or indirectly. This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for the new mechanism to be useful to all modules. Proper fixes have been discussed on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment, not good candidates for porting to the new mechanism.


New Functions
-------------

A new function and macro implementing the module creation phase will be added.
These are similar to PyModule_Create and PyModule_Create2, except they take an additional ModuleSpec argument, and handle module definitions with non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function implementing the module execution phase will be added. This allocates per-module state (if not allocated already), and *always* processes execution slots. The import machinery calls this method when a module is executed, unless the module is being reloaded::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object. This idempotent function fills in the type, refcount, and module index. It returns its argument cast to PyObject*, so it can be returned directly from a PyInit function::

    PyObject * PyModuleDef_Init(PyModuleDef *);

Additionally, two helpers will be added for setting the docstring and methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)


Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names must be encoded to form the PyInit hook name.

For ASCII module names, the import hook is named PyInit_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named PyInitU_<encodedname>, where the name is encoded using CPython's "punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix), with hyphens ("-") replaced by underscores ("_").

In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix

Examples:

=============  ===================
Module name    Init hook name
=============  ===================
spam           PyInit_spam
lančmít        PyInitU_lanmt_2sa6t
スパム         PyInitU_zck5b2b
=============  ===================

For modules with non-ASCII names, single-phase initialization is not supported.

In the initial implementation of this PEP, built-in modules with non-ASCII names will not be supported.


Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and LoadLibraryEx on Windows), it is not generally possible to load a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module are too rare to require all module authors to keep reloading in mind. If reload-like functionality is needed, authors can export a dedicated function for it.


Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can export additional PyInit* symbols besides the one that corresponds to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules, but not to *find* them. (This is a limitation of the loader mechanism, which this PEP does not try to modify.) To work around the lack of a suitable finder, code like the following can be used::

    import importlib.machinery
    import importlib.util

    def load_extra_module(name, path):
        # Load the module exported as PyInit_<name> from the library at path.
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module

On platforms that support symbolic links, these may be used to install one library under multiple names, exposing all exported modules to normal import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmultiphase`` will be created. The library will export several additional modules using the mechanism described in "Multiple modules in one library".
The ``_testcapi`` module will be unchanged, and will use single-phase initialization indefinitely (or until it is no longer supported).

The ``array`` and ``xx*`` modules will be converted to use multi-phase initialization as part of the initial implementation.


Summary of API Changes and Additions
------------------------------------

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions
* PyModuleDef_Init

New macros:

* Py_mod_create
* Py_mod_exec

New types:

* PyModuleDef_Type will be exposed

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.

The internal ``_imp`` module will have backwards incompatible changes: ``create_builtin``, ``create_dynamic``, and ``exec_dynamic`` will be added; ``init_builtin`` and ``load_dynamic`` will be removed.

The undocumented functions ``imp.load_dynamic`` and ``imp.init_builtin`` will be replaced by backwards-compatible shims.


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384, allows later extensions.

Some extension modules export many constants; for example _ssl has a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef, would reduce boilerplate, and provide free error-checking which is often missing.

String constants and types can be handled similarly. (Note that non-default bases for types cannot be portably specified statically; this case would need a Py_mod_exec function that runs before the slots are added. The free error-checking would still be beneficial, though.)

Another possibility is providing a "main" function that would be run when the module is given to Python's -m switch. For this to work, the runpy module will need to be modified to take advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module according to slots it wasn't originally defined with.


Implementation
==============

Work-in-progress implementation is available in a Github repository [#gh-repo]_; a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_ had a "PyInit_modulename" hook that would create a module class, whose ``__init__`` would be then called to create the module. This proposal did not correspond to the (then nonexistent) PEP 451, where module creation and initialization is broken into distinct steps. It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype implementation [#nicks-prototype]_. At this time PEP 451 was still not implemented, so the prototype does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed loading into arbitrary pre-constructed objects with Exec hook. The proposal made extension module initialization closer to how Python modules are initialized, but it was later recognized that this isn't an important goal. The current PEP describes a simpler solution.

A further iteration used a "PyModuleExport" hook as an alternative to PyInit, where PyInit was used for the existing scheme, and PyModuleExport for multi-phase. However, not being able to determine the hook name based on module name complicated automatic generation of PyImport_Inittab by tools like freeze. Keeping only the PyInit hook name, even if it's not entirely appropriate for exporting a definition, yielded a much simpler solution.


References
==========

.. [#pep-0451-attributes] https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep] https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype] https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492] http://tools.ietf.org/html/rfc3492

.. [#gh-repo] https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch] https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion] https://mail.python.org/pipermail/import-sig/2015-April/000959.html

.. [#pep-0451-loading] https://www.python.org/dev/peps/pep-0451/#how-loading-will-work

.. [#subinterpreter-docs] https://docs.python.org/3/c-api/init.html#sub-interpreter-support


Copyright
=========

This document has been placed in the public domain.

From ericsnowcurrently at gmail.com  Wed May 20 16:07:52 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 May 2015 08:07:52 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5
In-Reply-To: <555C47CD.4060406@redhat.com>
References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555B4B4A.5000902@redhat.com> <555C47CD.4060406@redhat.com>
Message-ID:

On Wed, May 20, 2015 at 2:37 AM, Petr Viktorin wrote:
> On 05/20/2015 02:33 AM, Eric Snow wrote:
[snip]
>> Won't frozen modules be likewise affected?
>
> No, frozen modules are Python source, just not loaded from a file.

Isn't the mechanism similar to builtins? Regardless, I was hopeful that we could fix FrozenImporter at the same time that we fixed BuiltinImporter.

[snip]
>> It may also be worth outlining how PyModuleDef_Init will work.
>
> That's hard to do in Python syntax, since most of what it does is ensure
> the def is a valid PyObject. I'll explain it in a different section.
> It's a very small, idempotent function:
>
> PyObject*
> PyModuleDef_Init(struct PyModuleDef* def)
> {
>     if (def->m_base.m_index == 0) {
>         max_module_number++;
>         Py_REFCNT(def) = 1;
>         Py_TYPE(def) = &PyModuleDef_Type;
>         def->m_base.m_index = max_module_number;
>     }
>     return (PyObject*)def;
> }
>
> The code is lifted straight from PyModule_Create2.
>
> The m_index is bookkeeping for PyState_FindModule, so it's unused
> for modules with multi-phase init, but I didn't want to break the
> invariant that it's set up together with Py_TYPE.

Okay. Thanks for the explanation. So really PyModuleDef_Init does some bookkeeping and that's it.

-eric

From ericsnowcurrently at gmail.com  Wed May 20 16:56:33 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 May 2015 08:56:33 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5
In-Reply-To: <555C6829.60901@gmail.com>
References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555C6829.60901@gmail.com>
Message-ID:

On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin wrote:
> On 05/20/2015 02:22 AM, Eric Snow wrote:
>> On Tue, May 19, 2015 at 5:06 AM, Petr Viktorin wrote:
[snip]
>>> No, that won't work. It's possible (via direct calls to the import
>>> machinery) to load a module without adding it to sys.modules.
>>
>> What direct calls do you mean? I would not expect any such mechanism
>> to work properly with extension modules.
>
> Reimplement
>
> without the sys.modules parts.

You mean someone could do so? Sure, they could. Python has a philosophy of not stopping you from doing what is usually the wrong thing because sometimes it is the right thing for you. As we say, we're all consenting adults. In this case, we expect that folks will use the import system (or importlib) to import modules. If they do it manually then they are responsible for satisfying the semantics of the import system or risk bugs.

One of the key goals of PEP 451 was to leave certain semantics up to the import machinery rather than requiring all finder/loader authors to implement the behavior. This includes a number of tricky parts like the sys.modules handling.

> The point is that exec_module doesn't a priori depend on the module
> being in sys.modules, which I think is a good thing.
Well, there's an explicit specification about how sys.modules is used during loading. For post-exec sys.modules lookup specifically, see https://docs.python.org/3.5//reference/import.html#id2. The note in the language reference says that it is an implementation detail. However, keep in mind that this PEP is a CPython-specific proposal. That said, I'm only -0 on not matching the sys.modules lookup behavior of module loading. It could be okay if we were to document the behavior clearly. My concern is with having different semantics even if it only relates to a remote corner case. It may be a corner case that someone will rely on. [snip] >> Be that as it may, I think it would be a mistake to treat support for >> multiple exec slots as a second-class citizen in the design. >> Personally I find it an appealing feature. > > It's there, but I'll not advertise it too much in the docs. I'm okay with that. It's not like we're precluding promoting the behavior later. :) [snip] >>> Still, the steps are processed in a loop from a single function >>> (PyModule_ExecDef), and that function operates on a module object -- it >>> doesn't know about sys.modules and can't easily check if you replaced >>> the module somewhere. >> >> I would consider this approach to be a mistake as well. The approach >> should stay consistent with the semantics of the whole import system, >> where sys.modules is checked directly. Unfortunately, that ship has >> already sailed. > > It's the loader that checks sys.modules, *after* exec_module is called. Not the loader. It's the import machinery that does it. See importlib._bootstrap._exec. > No other implementation of exec_module checks sys.modules in the middle > of its operation. So I think the semantics are consistent. I was thinking of each exec slot as a parallel to Loader.exec_module. Thus I was expecting the same sys.modules lookup behavior that you get during module loading.
That's why I would expect the module to get updated to sys.modules[spec.name] after each exec slot runs. At the moment I'm still -0 on not matching the sys.modules lookup semantics. However, like I said above, I can be convinced otherwise. [snip] >> I'm simply thinking in terms of the options we have for a PEP I'm >> working on that will facilitate passing objects between >> subinterpreters and even possibly sharing some state between them. >> Currently it will be practically necessary to exclude extension >> modules from any such mechanism. So I was wondering if there would be >> a way to allow extension module authors to define how at least some of >> the module's data could be shared between subinterpreters. > > You should be able to put that info in slots. It's hard to speculate > without knowing specifics, though. I'm sure you're right about slots so we should be fine. We can cross that bridge later. :) [snip] >> As I just noted, I'm looking at making use of subinterpreters for a >> different use case where it *does* make sense to effectively share >> objects between them. > > OK. This PEP isn't designed for that, but it should offer enough > extensibility. Right. [snip] >>> The internal _imp module will have backwards incompatible changes -- >>> functions will be added and removed as necessary. That's what the >>> underscore means :) >> >> Be careful with that assumption. We've had plenty of experiences >> where the assumption became unreliable. > > That's why I provide backcompat shims for undocumented, deprecated > functions in "imp". But _imp is just too low-level to do that easily. I'm okay with that, particularly since the _imp module is relatively new.
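The post-exec lookup debated above (the import machinery, not the loader, re-reads `sys.modules[spec.name]` once `exec_module` returns) can be observed from pure Python. This is a minimal sketch assuming CPython's importlib behaviour as described in the language reference; `ToyFinder`, `ReplacingLoader` and the module name `toy_replaced` are hypothetical names made up for the illustration.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class ReplacingLoader(importlib.abc.Loader):
    """Hypothetical loader whose exec_module swaps the sys.modules entry."""

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        module.phase = "original"
        # Inject a different object under the same name, the way a
        # Python module can replace itself while it executes.
        replacement = types.ModuleType(module.__name__)
        replacement.phase = "replacement"
        sys.modules[module.__name__] = replacement


class ToyFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only knows the name 'toy_replaced'."""

    def find_spec(self, fullname, path, target=None):
        if fullname == "toy_replaced":
            return importlib.util.spec_from_loader(fullname, ReplacingLoader())
        return None


finder = ToyFinder()
sys.meta_path.insert(0, finder)
try:
    mod = importlib.import_module("toy_replaced")
finally:
    sys.meta_path.remove(finder)

print(mod.phase)                           # replacement
print(sys.modules["toy_replaced"] is mod)  # True
```

Because the replacement is fetched after execution finishes, the import returns the injected object even though `exec_module` itself never read `sys.modules`.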
-eric From ericsnowcurrently at gmail.com Wed May 20 17:14:37 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 20 May 2015 09:14:37 -0600 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: <555C6B45.9070001@gmail.com> References: <5559F0FD.3080704@gmail.com> <555C6B45.9070001@gmail.com> Message-ID: On Wed, May 20, 2015 at 5:08 AM, Petr Viktorin wrote: > On 05/20/2015 01:56 AM, Eric Snow wrote: >> Makes sense. This does remind me of something I wanted to ask. Would >> it make sense to leverage ModuleSpec.loader_state? If I recall >> correctly, we added loader_state with extension modules in mind. > > I don't think we want to go out of our way to support non-module > objects. Module subclasses should cover any needed functionality, and > they will support slots. Sorry I wasn't clear. ModuleSpec.loader_state isn't related to non-module objects or module subclasses. It's a mechanism by which finders can pass some loader-specific info to the loader. It could also be used to maintain some initial module state separately from the module. As I said, I thought we added loader_state with extension modules in mind, so I figured I'd ask. [snip] >> Hey, I tried to make something happen over on python-ideas! :) Some >> folks just don't want to go far enough. > > Yeah, as someone who's trying to get Python3 porting patches to Samba, I > can tell you some upstreams really, really, really don't like rewriting > their code. Sure. I'm not advocating for folks to rewrite their extension modules. Rather I want the docs to be more active in encouraging the use of tools like Cython. I think the discussion on python-ideas could still be resolved favorably. Mostly I had other things to do so I didn't move things forward. :) [snip] >> Yuck. Is this something we could fix? Is __module__ not set on all functions? > > The module object is not stored on classes, so methods don't have access > to it.
Do classes defined in an extension module not have a __module__ attribute (holding the module name)? > I want a fix for that to be my next PEP :) Cool! It may be good to have an explicit section in this PEP about possible follow-up features (e.g. "Out of Scope"). Also, it would be a good idea to have an explicit section in the PEP about backward-compatibility. (Pretty sure there wasn't one.) This is an important aspect of every PEP and should be clearly communicated, even if just to say there is no backward-incompatibility. Such a section is also a good place to clearly indicate what extension authors need to do to adapt to the new feature. -eric From ncoghlan at gmail.com Thu May 21 00:16:54 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 21 May 2015 08:16:54 +1000 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555C6829.60901@gmail.com> Message-ID: On 21 May 2015 at 00:56, Eric Snow wrote: > On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin wrote: >> The point is that exec_module doesn't a priori depend on the module >> being in sys.modules, which I think is a good thing. > > Well, there's an explicit specification about how sys.modules is used > during loading. For post-exec sys.modules lookup specifically, > https://docs.python.org/3.5//reference/import.html#id2. The note in > the language reference says that it is an implementation detail. > However, keep in mind that this PEP is a CPython-specific proposal. > > That said, I'm only -0 on not matching the sys.modules lookup behavior > of module loading. It could be okay if we were to document the > behavior clearly. My concern is with having different semantics even > if it only relates to a remote corner case. It may be a corner case > that someone will rely on. We *will* match the semantics for the *overall* loading process. 
What Petr is saying is that *while* executing the "execution slots", they'll all receive the object returned by Py_mod_create (or the automatically created module if that slot is not defined), rather than any replacement injected into sys.modules. There's no Python-level parallel for that "multiple execution slots" behaviour, so it makes sense to define the semantics based on simplicity of implementation and the fact we want to encourage the use of Py_mod_create for extension modules over sys.modules injection. >> No other implementation of exec_module checks sys.modules in the middle >> of its operation. So I think the semantics are consistent. > > I was thinking of each exec slot as a parallel to Loader.exec_module. > Thus I was expecting the same sys.modules lookup behavior that you get > during module loading. That's why I would expect the module to get > updated to sys.modules[spec.name] after each exec slot runs. I changed my mind when Petr posted the clarification that this is really just a matter of iterating over the defined slots in the loader's exec_module method, and calling any of them that are defined as execution slots (for the time being, just Py_mod_exec). The entirety of a Python module runs in the same module namespace, regardless of what is done with sys.modules, so having all execution slots called with the same object is the extension module equivalent. Cheers, Nick.
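Nick's summary can be modelled in Python. The loader's `exec_module` walks the slot list and hands every execution slot the same object, while the import as a whole still returns whatever ends up in `sys.modules`. This is an analogy only, not CPython's C implementation. `PY_MOD_EXEC`, `SlotLoader`, `SlotFinder` and `slot_demo` are hypothetical names, and raising `SystemError` on a non-zero return is this sketch's stand-in for the C convention of setting an exception and returning -1.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types

PY_MOD_EXEC = 2  # hypothetical stand-in for the C-level slot ID
seen = []        # records which object each execution slot received


def exec_first(module):
    seen.append(module)
    module.food = "spam"
    # Replace the sys.modules entry part-way through execution.
    sys.modules[module.__name__] = types.ModuleType(module.__name__)
    return 0


def exec_second(module):
    seen.append(module)  # still receives the original object
    module.drink = "tea"
    return 0


class SlotLoader(importlib.abc.Loader):
    """Hypothetical loader mimicking the PEP's execution-slot loop."""

    slots = [(PY_MOD_EXEC, exec_first), (PY_MOD_EXEC, exec_second), (0, None)]

    def create_module(self, spec):
        return None

    def exec_module(self, module):
        for slot_id, func in self.slots:
            if slot_id == 0:  # models the {0, NULL} terminator
                break
            if slot_id == PY_MOD_EXEC and func(module) != 0:
                raise SystemError("execution slot failed")


class SlotFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == "slot_demo":
            return importlib.util.spec_from_loader(fullname, SlotLoader())
        return None


finder = SlotFinder()
sys.meta_path.insert(0, finder)
try:
    result = importlib.import_module("slot_demo")
finally:
    sys.meta_path.remove(finder)

print(seen[0] is seen[1])  # True: both slots got the same module
print(result is seen[0])   # False: the import returns the replacement
```

Both behaviours Nick describes show up together: the slots share one object, and the overall load still honours the `sys.modules` replacement.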
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ericsnowcurrently at gmail.com Thu May 21 00:39:32 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 20 May 2015 16:39:32 -0600 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555C6829.60901@gmail.com> Message-ID: On Wed, May 20, 2015 at 4:16 PM, Nick Coghlan wrote: > On 21 May 2015 at 00:56, Eric Snow wrote: >> On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin wrote: >>> The point is that exec_module doesn't a priori depend on the module >>> being in sys.modules, which I think is a good thing. >> >> Well, there's an explicit specification about how sys.modules is used >> during loading. For post-exec sys.modules lookup specifically, >> https://docs.python.org/3.5//reference/import.html#id2. The note in >> the language reference says that it is an implementation detail. >> However, keep in mind that this PEP is a CPython-specific proposal. >> >> That said, I'm only -0 on not matching the sys.modules lookup behavior >> of module loading. It could be okay if we were to document the >> behavior clearly. My concern is with having different semantics even >> if it only relates to a remote corner case. It may be a corner case >> that someone will rely on. > > We *will* match the semantics for the *overall* loading process. What > Petr is saying is that *while* executing the "execution slots", > they'll all receive the object returned by Py_mod_create (or the > automatically created module if that slot is not defined), rather than > any replacement injected into sys.modules. > > There's no Python level parallel for that "multiple execution slots" > behaviour, so it makes sense to define the semantics based on > simplicity of implementaiton and the fact we want to encourage the use > of Py_mod_create for extension modules over sys.modules injection. 
I was thinking along those same lines. I'm okay with that rationale. The PEP should be updated to clarify this point and its rationale. > >>> No other implementation of exec_module checks sys.modules in the middle >>> of its operation. So I think the semantics are consistent. >> >> I was thinking of each exec slot as a parallel to Loader.exec_module. >> Thus I was expecting the same sys.modules lookup behavior that you get >> during module loading. That's why I would expect the module to get >> updated to sys.modules[spec.name] after each exec slot runs. > > I changed my mind when Petr posted the clarification that this is > really just a matter of iterating over the defined slots in the > loader's exec_module method, and calling any of them that are defined > as execution slots (for the time, just Py_mod_exec). > > The entirety of a Python module runs in the same module namespace, > regardless of what is done with sys.modules, so having all execution > slots called with the same object is the extension module equivalent. Sounds good. Thanks for clarifying. -eric From ericsnowcurrently at gmail.com Wed May 20 23:47:29 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 20 May 2015 15:47:29 -0600 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 6 In-Reply-To: References: Message-ID: FYI, Nick asked if I would be willing to be BDFL-Delegate for this PEP and Guido has given the okay. I've added myself to the PEP's header. I'll try to make a decision soon (in time to land the patch before the feature freeze), but I also must be confident about the pronouncement. -eric On Wed, May 20, 2015 at 5:34 AM, Petr Viktorin wrote: > Hello, > Based mainly on comments by Eric Snow, I've sent another update to PEP 489. 
>
> See the diff at https://hg.python.org/peps/rev/aad7a39a695b
>
> Here is a copy for your convenience:
>
> PEP: 489
> Title: Multi-phase extension module initialization
> Version: $Revision$
> Last-Modified: $Date$
> Author: Petr Viktorin ,
>         Stefan Behnel ,
>         Nick Coghlan
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 11-Aug-2013
> Python-Version: 3.5
> Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes a redesign of the way in which built-in and extension modules
> interact with the import machinery. This was last revised for Python 3.0 in PEP
> 3121, but did not solve all problems at the time. The goal is to solve
> import-related problems by bringing extension modules closer to the way Python
> modules behave; specifically to hook into the ModuleSpec-based loading
> mechanism introduced in PEP 451.
>
> This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension
> authors to only define features they need, and to allow future additions
> to extension module declarations.
>
> Extension modules are created in a two-step process, fitting better into
> the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.
>
> Extension modules can safely store arbitrary C-level per-module state in
> the module that is covered by normal garbage collection and supports
> reloading and sub-interpreters.
> Extension authors are encouraged to take these issues into account
> when using the new API.
>
> The proposal also allows extension modules with non-ASCII names.
>
> Not all problems tackled in PEP 3121 are solved in this proposal.
> In particular, problems with run-time module lookup (PyState_FindModule)
> are left to a future PEP.
>
>
> Motivation
> ==========
>
> Python modules and extension modules are not being set up in the same way.
> For Python modules, the module object is created and set up first, then the
> module code is executed (PEP 302).
> A ModuleSpec object (PEP 451) is used to hold information about the module,
> and passed to the relevant hooks.
>
> For extensions (i.e. shared libraries) and built-in modules, the module
> init function is executed straight away and does both the creation and
> initialization. The initialization function is not passed the ModuleSpec,
> or any information it contains, such as the __file__ or fully-qualified
> name. This hinders relative imports and resource loading.
>
> In Py3, modules are also not added to sys.modules, which means that a
> (potentially transitive) re-import of the module will really try to re-import
> it and thus run into an infinite loop when it executes the module init function
> again. Without access to the fully-qualified module name, it is not trivial to
> correctly add the module to sys.modules either.
> This is specifically a problem for Cython-generated modules, for which it's
> not uncommon that the module init code has the same level of complexity as
> that of any 'regular' Python module. Also, the lack of __file__ and __name__
> information hinders the compilation of "__init__.py" modules, i.e. packages,
> especially when relative imports are being used at module init time.
>
> Furthermore, the majority of currently existing extension modules have
> problems with sub-interpreter support and/or interpreter reloading, and, while
> it is possible with the current infrastructure to support these
> features, it is neither easy nor efficient.
> Addressing these issues was the goal of PEP 3121, but many extensions,
> including some in the standard library, took the least-effort approach
> to porting to Python 3, leaving these issues unresolved.
> This PEP keeps backwards compatibility, which should reduce pressure and give
> extension authors adequate time to consider these issues when porting.
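For contrast with the extension-module problem described above, here is a sketch of why a Python-level loader does not loop on a transitive re-import. The import machinery stores the new module in `sys.modules` before calling `exec_module`, so the nested import gets the same, partially initialized object back. It assumes CPython's importlib; `SelfImportingLoader`, `SelfFinder` and `self_import_demo` are hypothetical names used only here.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class SelfImportingLoader(importlib.abc.Loader):
    """Hypothetical loader whose module re-imports itself while executing."""

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        # The machinery has already stored `module` in sys.modules, so this
        # nested import finds it instead of starting a second load.
        again = importlib.import_module(module.__name__)
        module.reimport_was_same = again is module
        module.done = True


class SelfFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == "self_import_demo":
            return importlib.util.spec_from_loader(fullname, SelfImportingLoader())
        return None


finder = SelfFinder()
sys.meta_path.insert(0, finder)
try:
    mod = importlib.import_module("self_import_demo")
finally:
    sys.meta_path.remove(finder)

print(mod.reimport_was_same)  # True
```

An old-style extension init function has no equivalent of that early `sys.modules` entry, which is why the same pattern can recurse there.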
> > > The current process > =================== > > Currently, extension and built-in modules export an initialization function > named "PyInit_modulename", named after the file name of the shared library. > This function is executed by the import machinery and must return a fully > initialized module object. > The function receives no arguments, so it has no way of knowing about its > import context. > > During its execution, the module init function creates a module object > based on a PyModuleDef object. It then continues to initialize it by adding > attributes to the module dict, creating types, etc. > > In the back, the shared library loader keeps a note of the fully qualified > module name of the last module that it loaded, and when a module gets > created that has a matching name, this global variable is used to determine > the fully qualified name of the module object. This is not entirely safe as it > relies on the module init function creating its own module object first, > but this assumption usually holds in practice. > > > The proposal > ============ > > The initialization function (PyInit_modulename) will be allowed to return > a pointer to a PyModuleDef object. The import machinery will be in charge > of constructing the module object, calling hooks provided in the PyModuleDef > in the relevant phases of initialization (as described below). > > This multi-phase initialization is an additional possibility. Single-phase > initialization, the current practice of returning a fully initialized module > object, will still be accepted, so existing code will work unchanged, > including binary compatibility. > > The PyModuleDef structure will be changed to contain a list of slots, > similarly to PEP 384's PyType_Spec for types. 
> To keep binary compatibility, and avoid needing to introduce a new structure > (which would introduce additional supporting functions and per-module storage), > the currently unused m_reload pointer of PyModuleDef will be changed to > hold the slots. The structures are defined as:: > > typedef struct { > int slot; > void *value; > } PyModuleDef_Slot; > > typedef struct PyModuleDef { > PyModuleDef_Base m_base; > const char* m_name; > const char* m_doc; > Py_ssize_t m_size; > PyMethodDef *m_methods; > PyModuleDef_Slot *m_slots; /* changed from `inquiry m_reload;` */ > traverseproc m_traverse; > inquiry m_clear; > freefunc m_free; > } PyModuleDef; > > The *m_slots* member must be either NULL, or point to an array of > PyModuleDef_Slot structures, terminated by a slot with id set to 0 > (i.e. ``{0, NULL}``). > > To specify a slot, a unique slot ID must be provided. > New Python versions may introduce new slot IDs, but slot IDs will never be > recycled. Slots may get deprecated, but will continue to be supported > throughout Python 3.x. > > A slot's value pointer may not be NULL, unless specified otherwise in the > slot's documentation. > > The following slots are currently available, and described later: > > * Py_mod_create > * Py_mod_exec > > Unknown slot IDs will cause the import to fail with SystemError. > > When using multi-phase initialization, the *m_name* field of PyModuleDef will > not be used during importing; the module name will be taken from the ModuleSpec. > > To prevent crashes when the module is loaded in older versions of Python, > the PyModuleDef object must be initialized using the newly added > PyModuleDef_Init function. This sets the object type (which cannot be done > statically on certain compilers), refcount, and internal bookkeeping data > (m_index). 
> For example, an extension module "example" would be exported as::
>
>     static PyModuleDef example_def = {...};
>
>     PyMODINIT_FUNC
>     PyInit_example(void)
>     {
>         return PyModuleDef_Init(&example_def);
>     }
>
> The PyModuleDef object must be available for the lifetime of the module created
> from it; usually, it will be declared statically.
>
> Pseudo-code Overview
> --------------------
>
> Here is an overview of how the modified importers will operate.
> Details such as logging or handling of errors and invalid states
> are left out, and C code is presented with a concise Python-like syntax.
>
> The framework that calls the importers is explained in PEP 451
> [#pep-0451-loading]_.
>
> ::
>
>     importlib/_bootstrap.py:
>
>         class BuiltinImporter:
>             def create_module(self, spec):
>                 return _imp.create_builtin(spec)
>
>             def exec_module(self, module):
>                 _imp.exec_dynamic(module)
>
>             def load_module(self, name):
>                 # use a backwards compatibility shim
>                 return _load_module_shim(self, name)
>
>     importlib/_bootstrap_external.py:
>
>         class ExtensionFileLoader:
>             def create_module(self, spec):
>                 return _imp.create_dynamic(spec)
>
>             def exec_module(self, module):
>                 _imp.exec_dynamic(module)
>
>             def load_module(self, name):
>                 # use a backwards compatibility shim
>                 return _load_module_shim(self, name)
>
>     Python/import.c (the _imp module):
>
>         def create_dynamic(spec):
>             name = spec.name
>             path = spec.origin
>
>             # Find an already loaded module that used single-phase init.
>             # For multi-phase initialization, mod is NULL, so a new module
>             # is always created.
> mod = _PyImport_FindExtensionObject(name, name) > if mod: > return mod > > return _PyImport_LoadDynamicModuleWithSpec(spec) > > def exec_dynamic(module): > if not isinstance(module, types.ModuleType): > # non-modules are skipped -- PyModule_GetDef fails on them > return > > def = PyModule_GetDef(module) > state = PyModule_GetState(module) > if state is NULL: > PyModule_ExecDef(module, def) > > def create_builtin(spec): > name = spec.name > > # Find an already loaded module that used single-phase init. > # For multi-phase initialization, mod is NULL, so a new module > # is always created. > mod = _PyImport_FindExtensionObject(name, name) > if mod: > return mod > > for initname, initfunc in PyImport_Inittab: > if name == initname: > m = initfunc() > if isinstance(m, PyModuleDef): > def = m > return PyModule_FromDefAndSpec(def, spec) > else: > # fall back to single-phase initialization > module = m > _PyImport_FixupExtensionObject(module, name, name) > return module > > Python/importdl.c: > > def _PyImport_LoadDynamicModuleWithSpec(spec): > path = spec.origin > package, dot, name = spec.name.rpartition('.') > > # see the "Non-ASCII module names" section for export_hook_name > hook_name = export_hook_name(name) > > # call platform-specific function for loading exported function > # from shared library > exportfunc = _find_shared_funcptr(hook_name, path) > > m = exportfunc() > if isinstance(m, PyModuleDef): > def = m > return PyModule_FromDefAndSpec(def, spec) > > module = m > > # fall back to single-phase initialization > .... 
>
>     Objects/moduleobject.c:
>
>         def PyModule_FromDefAndSpec(def, spec):
>             name = spec.name
>             create = None
>             for slot, value in def.m_slots:
>                 if slot == Py_mod_create:
>                     create = value
>             if create:
>                 m = create(spec, def)
>             else:
>                 m = PyModule_New(name)
>
>             if isinstance(m, types.ModuleType):
>                 m.md_state = NULL
>                 m.md_def = def
>
>             if def.m_methods:
>                 PyModule_AddFunctions(m, def.m_methods)
>             if def.m_doc:
>                 PyModule_SetDocString(m, def.m_doc)
>             return m
>
>         def PyModule_ExecDef(module, def):
>             if isinstance(module, types.ModuleType):
>                 if module.md_state is NULL:
>                     # allocate a block of zeroed-out memory
>                     module.md_state = _alloc(def.m_size)
>
>             if def.m_slots is NULL:
>                 return
>
>             for slot, value in def.m_slots:
>                 if slot == Py_mod_exec:
>                     value(module)
>
>
> Module Creation Phase
> ---------------------
>
> Creation of the module object -- that is, the implementation of
> ExecutionLoader.create_module -- is governed by the Py_mod_create slot.
>
> The Py_mod_create slot
> ......................
>
> The Py_mod_create slot is used to support custom module subclasses.
> The value pointer must point to a function with the following signature::
>
>     PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)
>
> The function receives a ModuleSpec instance, as defined in PEP 451,
> and the PyModuleDef structure.
> It should return a new module object, or set an error
> and return NULL.
>
> This function is not responsible for setting import-related attributes
> specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
> ``__loader__``) on the new module.
>
> There is no requirement for the returned object to be an instance of
> types.ModuleType. Any type can be used, as long as it supports setting and
> getting attributes, including at least the import-related attributes.
> However, only ModuleType instances support module-specific functionality
> such as per-module state.
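The creation-phase rules in this part of the PEP (an optional Py_mod_create function, a plain module named after the spec otherwise, SystemError for unknown or duplicate creation slots, and the m_size/GC-hook restriction for non-module results) can be condensed into a small Python model. It is a sketch under stated assumptions, not the CPython implementation: the slot ID values, `ModuleDefModel` and `from_def_and_spec` are hypothetical, and only `importlib.machinery.ModuleSpec` and `types` are real library names.

```python
import importlib.machinery
import types
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical slot IDs for this model only.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


@dataclass
class ModuleDefModel:
    """Rough stand-in for PyModuleDef (only the fields used here)."""
    m_doc: Optional[str] = None
    m_size: int = 0
    m_methods: List[Tuple[str, Callable[..., Any]]] = field(default_factory=list)
    m_slots: List[Tuple[int, Any]] = field(default_factory=list)  # no terminator
    has_gc_hooks: bool = False  # models non-NULL m_traverse/m_clear/m_free


def from_def_and_spec(mdef, spec):
    """Model of the creation phase (PyModule_FromDefAndSpec)."""
    create = None
    for slot_id, value in mdef.m_slots:
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if slot_id == PY_MOD_CREATE:
            if create is not None:
                raise SystemError("multiple Py_mod_create slots")
            create = value

    # Without Py_mod_create, a plain module named after the spec is created.
    m = create(spec, mdef) if create is not None else types.ModuleType(spec.name)

    if isinstance(m, types.ModuleType):
        m._model_def = mdef  # models "associate the PyModuleDef with the module"
    elif mdef.m_size != 0 or mdef.has_gc_hooks:
        raise SystemError("non-module object, but m_size or GC hooks are set")

    # Initial attributes are set regardless of the object's type.
    if mdef.m_doc is not None:
        m.__doc__ = mdef.m_doc
    for name, func in mdef.m_methods:
        setattr(m, name, func)
    return m


spec = importlib.machinery.ModuleSpec("example", None)
plain = from_def_and_spec(ModuleDefModel(m_doc="Example module"), spec)
print(type(plain).__name__, plain.__doc__)  # module Example module

custom = ModuleDefModel(
    m_size=16,
    m_slots=[(PY_MOD_CREATE, lambda spec, mdef: types.SimpleNamespace())],
)
try:
    from_def_and_spec(custom, spec)
except SystemError as exc:
    print("SystemError:", exc)
```

The second call fails because a create function returned a non-module object while the definition still asks for 16 bytes of per-module state.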
>
> Note that when this function is called, the module's entry in sys.modules
> is not populated yet. Attempting to import the same module again
> (possibly transitively) may lead to an infinite loop.
> Extension authors are advised to keep Py_mod_create minimal, and in particular
> to not call user code from it.
>
> Multiple Py_mod_create slots may not be specified. If they are, import
> will fail with SystemError.
>
> If Py_mod_create is not specified, the import machinery will create a normal
> module object using PyModule_New. The name is taken from *spec*.
>
>
> Post-creation steps
> ...................
>
> If the Py_mod_create function returns an instance of types.ModuleType
> or a subclass (or if a Py_mod_create slot is not present), the import
> machinery will associate the PyModuleDef with the module.
> This also makes the PyModuleDef accessible to the execution phase, the
> PyModule_GetDef function, and garbage collection routines (traverse,
> clear, free).
>
> If the Py_mod_create function does not return a module subclass, then m_size
> must be 0, and m_traverse, m_clear and m_free must all be NULL.
> Otherwise, SystemError is raised.
>
> Additionally, initial attributes specified in the PyModuleDef are set on the
> module object, regardless of its type:
>
> * The docstring is set from m_doc, if non-NULL.
> * The module's functions are initialized from m_methods, if any.
>
>
> Module Execution Phase
> ----------------------
>
> Module execution -- that is, the implementation of
> ExecutionLoader.exec_module -- is governed by "execution slots".
> This PEP only adds one, Py_mod_exec, but others may be added in the future.
>
> The execution phase is done on the PyModuleDef associated with the module
> object. For objects that are not a subclass of PyModule_Type (for which
> PyModule_GetDef would fail), the execution phase is skipped.
>
> Execution slots may be specified multiple times, and are processed in the order
> they appear in the slots array.
> When using the default import machinery, they are processed after
> import-related attributes specified in PEP 451 [#pep-0451-attributes]_
> (such as ``__name__`` or ``__loader__``) are set and the module is added
> to sys.modules.
>
>
> Pre-Execution steps
> -------------------
>
> Before processing the execution slots, per-module state is allocated for the
> module. From this point on, per-module state is accessible through
> PyModule_GetState.
>
>
> The Py_mod_exec slot
> ....................
>
> The entry in this slot must point to a function with the following signature::
>
>     int (*PyModuleExecFunction)(PyObject* module)
>
> It will be called to initialize a module. Usually, this amounts to
> setting the module's initial attributes.
> The "module" argument receives the module object to initialize.
>
> If the Py_mod_exec function replaces the module's entry in sys.modules,
> the new object will be used and returned by the importlib machinery.
> (This mirrors the behavior of Python modules. Note that implementing
> Py_mod_create is usually a better solution for the use cases this serves.)
>
> The function must return ``0`` on success, or, on error, set an exception and
> return ``-1``.
>
>
> Legacy Init
> -----------
>
> The backwards-compatible single-phase initialization continues to be supported.
> In this scheme, the PyInit function returns a fully initialized module rather
> than a PyModuleDef object.
> In this case, the PyInit hook implements the creation phase, and the execution
> phase is a no-op.
>
> Modules that need to work unchanged on older versions of Python should stick to
> single-phase initialization, because the benefits it brings can't be
> back-ported.
> Here is an example of a module that supports multi-phase initialization,
> and falls back to single-phase when compiled for an older version of CPython.
> It is included mainly as an illustration of the changes needed to enable
> multi-phase init::
>
>     #include <Python.h>
>
>     static int spam_exec(PyObject *module) {
>         if (PyModule_AddStringConstant(module, "food", "spam") < 0) {
>             return -1;
>         }
>         return 0;
>     }
>
>     #ifdef Py_mod_exec
>     static PyModuleDef_Slot spam_slots[] = {
>         {Py_mod_exec, spam_exec},
>         {0, NULL}
>     };
>     #endif
>
>     static PyModuleDef spam_def = {
>         PyModuleDef_HEAD_INIT,                      /* m_base */
>         "spam",                                     /* m_name */
>         PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
>         0,                                          /* m_size */
>         NULL,                                       /* m_methods */
>     #ifdef Py_mod_exec
>         spam_slots,                                 /* m_slots */
>     #else
>         NULL,                                       /* m_reload */
>     #endif
>         NULL,                                       /* m_traverse */
>         NULL,                                       /* m_clear */
>         NULL,                                       /* m_free */
>     };
>
>     PyMODINIT_FUNC
>     PyInit_spam(void) {
>     #ifdef Py_mod_exec
>         return PyModuleDef_Init(&spam_def);
>     #else
>         PyObject *module;
>         module = PyModule_Create(&spam_def);
>         if (module == NULL) return NULL;
>         if (spam_exec(module) != 0) {
>             Py_DECREF(module);
>             return NULL;
>         }
>         return module;
>     #endif
>     }
>
>
> Built-In modules
> ----------------
>
> Any extension module can be used as a built-in module by linking it into
> the executable, and including it in the inittab (either at runtime with
> PyImport_AppendInittab, or at configuration time, using tools like *freeze*).
>
> To keep this possibility, all changes to extension module loading introduced
> in this PEP will also apply to built-in modules.
> The only exception is non-ASCII module names, explained below.
>
>
> Subinterpreters and Interpreter Reloading
> -----------------------------------------
>
> Extensions using the new initialization scheme are expected to support
> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly,
> avoiding the issues mentioned in the Python documentation [#subinterpreter-docs]_.
> The mechanism is designed to make this easy, but care is still required
> on the part of the extension author.
> No user-defined functions, methods, or instances may leak to different
> interpreters.
> To achieve this, all module-level state should be kept in either the module > dict, or in the module object's storage reachable by PyModule_GetState. > A simple rule of thumb is: Do not define any static data, except built-in types > with no mutable or user-settable class attributes. > > > Functions incompatible with multi-phase initialization > ------------------------------------------------------ > > The PyModule_Create function will fail when used on a PyModuleDef structure > with a non-NULL *m_slots* pointer. > The function doesn't have access to the ModuleSpec object necessary for > multi-phase initialization. > > The PyState_FindModule function will return NULL, and PyState_AddModule > and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*. > PyState registration is disabled because multiple module objects may be created > from the same PyModuleDef. > > > Module state and C-level callbacks > ---------------------------------- > > Due to the unavailability of PyState_FindModule, any function that needs access > to module-level state (including functions, classes or exceptions defined at > the module level) must receive a reference to the module object (or the > particular object it needs), either directly or indirectly. > This is currently difficult in two situations: > > * Methods of classes, which receive a reference to the class, but not to > the class's module > * Libraries with C-level callbacks, unless the callbacks can receive custom > data set at callback registration > > Fixing these cases is outside of the scope of this PEP, but will be needed for > the new mechanism to be useful to all modules. Proper fixes have been discussed > on the import-sig mailing list [#findmodule-discussion]_. > > As a rule of thumb, modules that rely on PyState_FindModule are, at the moment, > not good candidates for porting to the new mechanism. 
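The difficulty described here can be illustrated with a pure-Python analogy. Once one definition can produce several module objects, a lookup keyed on the definition or name (which is what PyState_FindModule amounts to) can only remember one of them, whereas a class holding an explicit reference to its defining module always reaches the right state. `make_module` and `registry` are hypothetical, and the dict only models a one-entry-per-definition table, not CPython's actual PyState storage.

```python
import types

registry = {}  # models a single lookup entry per module definition


def make_module(name):
    """Model of one multi-phase import: a fresh module with its own state."""
    mod = types.ModuleType(name)
    mod.counter = 0  # stands in for per-module state

    class Counter:
        # Explicit back-reference to the module object that created the class.
        defining_module = mod

        def bump_explicit(self):
            self.defining_module.counter += 1

        def bump_by_lookup(self):
            # Name-based lookup: finds whichever module was registered last.
            registry[name].counter += 1

    mod.Counter = Counter
    registry[name] = mod  # a later import overwrites the earlier entry
    return mod


first = make_module("spam")   # first module object from the definition
second = make_module("spam")  # same definition, another module object

first.Counter().bump_explicit()
first.Counter().bump_by_lookup()  # silently updates `second` instead

print(first.counter, second.counter)  # 1 1
```

The explicit back-reference corresponds to the kind of fix that the PEP defers to later work at the C level.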
> > > New Functions > ------------- > > A new function and macro implementing the module creation phase will be added. > These are similar to PyModule_Create and PyModule_Create2, except they > take an additional ModuleSpec argument, and handle module definitions with > non-NULL slots:: > > PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec) > PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec, > int module_api_version) > > A new function implementing the module execution phase will be added. > This allocates per-module state (if not allocated already), and *always* > processes execution slots. The import machinery calls this method when > a module is executed, unless the module is being reloaded:: > > PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def) > > Another function will be introduced to initialize a PyModuleDef object. > This idempotent function fills in the type, refcount, and module index. > It returns its argument cast to PyObject*, so it can be returned directly > from a PyInit function:: > > PyObject * PyModuleDef_Init(PyModuleDef *); > > Additionally, two helpers will be added for setting the docstring and > methods on a module:: > > int PyModule_SetDocString(PyObject *, const char *) > int PyModule_AddFunctions(PyObject *, PyMethodDef *) > > > Export Hook Name > ---------------- > > As portable C identifiers are limited to ASCII, module names > must be encoded to form the PyInit hook name. > > For ASCII module names, the import hook is named > PyInit_, where is the name of the module. > > For module names containing non-ASCII characters, the import hook is named > PyInitU_, where the name is encoded using CPython's > "punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix), > with hyphens ("-") replaced by underscores ("_"). 
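Module names are identifiers, so they contain no hyphens. That makes the delimiter the only hyphen in the punycode form, and the encoding can therefore be reversed, which may help when inspecting the symbols a library exports. The helper ``module_name_from_hook`` below is a hypothetical sketch that uses Python's built-in ``punycode`` codec. It is not part of the proposed API:

```python
def module_name_from_hook(hook_name: bytes) -> str:
    """Recover a module name from a PyInit hook name (hypothetical helper).

    ASCII names use PyInit_<name>; other names use PyInitU_ followed by
    punycode in which "-" has been replaced by "_".
    """
    if hook_name.startswith(b"PyInitU_"):
        encoded = hook_name[len(b"PyInitU_"):]
        # The encoded digits after the delimiter are only a-z and 0-9, so
        # the last underscore (if any) was the punycode "-" delimiter.
        head, sep, tail = encoded.rpartition(b"_")
        if sep:
            encoded = head + b"-" + tail
        return encoded.decode("punycode")
    if hook_name.startswith(b"PyInit_"):
        return hook_name[len(b"PyInit_"):].decode("ascii")
    raise ValueError(f"not a PyInit hook name: {hook_name!r}")


print(module_name_from_hook(b"PyInit_spam"))          # spam
print(module_name_from_hook(b"PyInitU_lanmt_2sa6t"))  # lančmít
```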
> > > In Python:: > > def export_hook_name(name): > try: > suffix = b'_' + name.encode('ascii') > except UnicodeEncodeError: > suffix = b'U_' + name.encode('punycode').replace(b'-', b'_') > return b'PyInit' + suffix > > Examples: > > ============= =================== > Module name Init hook name > ============= =================== > spam PyInit_spam > lančmít PyInitU_lanmt_2sa6t > スパム PyInitU_zck5b2b > ============= =================== > > For modules with non-ASCII names, single-phase initialization is not supported. > > In the initial implementation of this PEP, built-in modules with non-ASCII > names will not be supported. > > > Module Reloading > ---------------- > > Reloading an extension module using importlib.reload() will continue to > have no effect, except re-setting import-related attributes. > > Due to limitations in shared library loading (both dlopen on POSIX and > LoadLibraryEx on Windows), it is not generally possible to load > a modified library after it has changed on disk. > > Use cases for reloading other than trying out a new version of the module > are too rare to require all module authors to keep reloading in mind. > If reload-like functionality is needed, authors can export a dedicated > function for it. > > > Multiple modules in one library > ------------------------------- > > To support multiple Python modules in one shared library, the library can > export additional PyInit* symbols besides the one that corresponds > to the library's filename. > > Note that this mechanism can currently only be used to *load* extra modules, > but not to *find* them. (This is a limitation of the loader mechanism, > which this PEP does not try to modify.)
> To work around the lack of a suitable finder, code like the following > can be used:: > > import importlib.machinery > import importlib.util > loader = importlib.machinery.ExtensionFileLoader(name, path) > spec = importlib.util.spec_from_loader(name, loader) > module = importlib.util.module_from_spec(spec) > loader.exec_module(module) > return module > > On platforms that support symbolic links, these may be used to install one > library under multiple names, exposing all exported modules to normal > import machinery. > > > Testing and initial implementations > ----------------------------------- > > For testing, a new built-in module ``_testmultiphase`` will be created. > The library will export several additional modules using the mechanism > described in "Multiple modules in one library". > > The ``_testcapi`` module will be unchanged, and will use single-phase > initialization indefinitely (or until it is no longer supported). > > The ``array`` and ``xx*`` modules will be converted to use multi-phase > initialization as part of the initial implementation. > > > Summary of API Changes and Additions > ------------------------------------ > > New functions: > > * PyModule_FromDefAndSpec (macro) > * PyModule_FromDefAndSpec2 > * PyModule_ExecDef > * PyModule_SetDocString > * PyModule_AddFunctions > * PyModuleDef_Init > > New macros: > > * Py_mod_create > * Py_mod_exec > > New types: > > * PyModuleDef_Type will be exposed > > New structures: > > * PyModuleDef_Slot > > PyModuleDef.m_reload changes to PyModuleDef.m_slots. > > The internal ``_imp`` module will have backwards incompatible changes: > ``create_builtin``, ``create_dynamic``, and ``exec_dynamic`` will be added; > ``init_builtin``, ``load_dynamic`` will be removed. > > The undocumented functions ``imp.load_dynamic`` and ``imp.init_builtin`` will > be replaced by backwards-compatible shims. 
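The workaround under "Multiple modules in one library" ends with ``return module``, so it is meant as a function body. A self-contained version might look like the following. The function name ``load_extra_module`` is hypothetical, and the calls are the ones used in that snippet:

```python
import importlib.machinery
import importlib.util


def load_extra_module(name, path):
    """Load the module exported for `name` from the library at `path`.

    The spec is built by hand because no finder knows that the library
    exports more than one module. The result is not added to sys.modules.
    """
    loader = importlib.machinery.ExtensionFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)  # creation phase
    loader.exec_module(module)                      # execution phase
    return module
```

Calling, for example, ``load_extra_module("spam_extra", "/path/to/spam.so")`` (both names hypothetical) would load the module exported under the ``PyInit_spam_extra`` hook from that file, without registering it in ``sys.modules``.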
> > > Possible Future Extensions > ========================== > > The slots mechanism, inspired by PyType_Slot from PEP 384, > allows later extensions. > > Some extension modules export many constants; for example _ssl has > a long list of calls in the form:: > > PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN", > PY_SSL_ERROR_ZERO_RETURN); > > Converting this to a declarative list, similar to PyMethodDef, > would reduce boilerplate, and provide free error-checking which > is often missing. > > String constants and types can be handled similarly. > (Note that non-default bases for types cannot be portably specified > statically; this case would need a Py_mod_exec function that runs > before the slots are added. The free error-checking would still be > beneficial, though.) > > Another possibility is providing a "main" function that would be run > when the module is given to Python's -m switch. > For this to work, the runpy module will need to be modified to take > advantage of ModuleSpec-based loading introduced in PEP 451. > Also, it will be necessary to add a mechanism for setting up a module > according to slots it wasn't originally defined with. > > > Implementation > ============== > > Work-in-progress implementation is available in a GitHub repository [#gh-repo]_; > a patchset is at [#gh-patch]_. > > > Previous Approaches > =================== > > Stefan Behnel's initial proto-PEP [#stefans_protopep]_ > had a "PyInit_modulename" hook that would create a module class, > whose ``__init__`` would then be called to create the module. > This proposal did not correspond to the (then nonexistent) PEP 451, > where module creation and initialization are broken into distinct steps. > It also did not support loading an extension into pre-existing module objects. > > Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype > implementation [#nicks-prototype]_. > At this time PEP 451 was still not implemented, so the prototype > does not use ModuleSpec.
> > The original version of this PEP used Create and Exec hooks, and allowed > loading into arbitrary pre-constructed objects with the Exec hook. > The proposal made extension module initialization closer to how Python modules > are initialized, but it was later recognized that this isn't an important goal. > The current PEP describes a simpler solution. > > A further iteration used a "PyModuleExport" hook as an alternative to PyInit, > where PyInit was used for the existing scheme, and PyModuleExport for multi-phase. > However, not being able to determine the hook name based on the module name > complicated automatic generation of PyImport_Inittab by tools like freeze. > Keeping only the PyInit hook name, even if it's not entirely appropriate for > exporting a definition, yielded a much simpler solution. > > > References > ========== > > .. [#pep-0451-attributes] > https://www.python.org/dev/peps/pep-0451/#attributes > > .. [#stefans_protopep] > https://mail.python.org/pipermail/python-dev/2013-August/128087.html > > .. [#nicks-prototype] > https://mail.python.org/pipermail/python-dev/2013-August/128101.html > > .. [#rfc-3492] > http://tools.ietf.org/html/rfc3492 > > .. [#gh-repo] > https://github.com/encukou/cpython/commits/pep489 > > .. [#gh-patch] > https://github.com/encukou/cpython/compare/master...encukou:pep489.patch > > .. [#findmodule-discussion] > https://mail.python.org/pipermail/import-sig/2015-April/000959.html > > .. [#pep-0451-loading] > https://www.python.org/dev/peps/pep-0451/#how-loading-will-work > > .. [#subinterpreter-docs] > https://docs.python.org/3/c-api/init.html#sub-interpreter-support > > > Copyright > ========= > > This document has been placed in the public domain.
> _______________________________________________ > Import-SIG mailing list > Import-SIG at python.org > https://mail.python.org/mailman/listinfo/import-sig From encukou at gmail.com Thu May 21 02:49:30 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 21 May 2015 02:49:30 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <555C6B45.9070001@gmail.com> Message-ID: On Wed, May 20, 2015 at 5:14 PM, Eric Snow wrote: > On Wed, May 20, 2015 at 5:08 AM, Petr Viktorin wrote: >> On 05/20/2015 01:56 AM, Eric Snow wrote: >>> Makes sense. This does remind me of something I wanted to ask. Would >>> it make sense to leverage ModuleSpec.loader_state? If I recall >>> correctly, we added loader_state with extension modules in mind. >> >> I don't think we want to go out of our way to support non-module >> objects. Module subclasses should cover any needed functionality, and >> they will support slots. > > Sorry I wasn't clear. ModuleSpec.loader_state isn't related to > non-module objects or module subclasses. It's a mechanism by which > finders can pass some loader-specific info to the loader. It could > also be used to maintain some initial module state separately from the > module. As I said, I thought we added loader_state with extension > modules in mind, so I figured I'd ask. It turns out to be unnecessary. I will add that if create returns a non-module object, no execution slots should be specified (i.e. there should only be a Py_mod_create). That will allow us to change our mind later if this turns out to be a bad idea, but I doubt it will. > [snip] >>> Yuck. Is this something we could fix? Is __module__ not set on all functions? >> >> The module object is not stored on classes, so methods don't have access >> to it. > > Do classes defined in an extension module not have a __module__ > attribute (holding the module name)?
They do, but that's not good enough: - Looking up the name in sys.modules is slow. - Both that and sys.modules are OK to be modified by Python code, so you can easily get a different module from such a lookup, and using a different module's state pointer will most likely segfault. (Maybe this discussion needs a new mail thread?) >> I want a fix for that to be my next PEP :) > > Cool! It may be good to have an explicit section in this PEP about > possible follow-up features (e.g. "Out of Scope"). There is a section for follow-up features already (it talks about possible future slots). This follow-up didn't make it in -- I think it's too far out of scope, as it isn't really concerned with loading modules. I think the link in the section about PyState_FindModule is enough. > Also, it would be a good idea to have an explicit section in the PEP > about backward-compatibility. (Pretty sure there wasn't one.) This > is an important aspect of every PEP and should be clearly > communicated, even if just to say there is no > backward-incompatibility. Such a section is also a good place to > clearly indicate what extension authors need to do to adapt to the new > feature. OK, I can add that. (in the morning; it's 3 AM here so the changes wouldn't be any good now.) From encukou at gmail.com Thu May 21 02:50:07 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 21 May 2015 02:50:07 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555C6829.60901@gmail.com> Message-ID: On Thu, May 21, 2015 at 12:39 AM, Eric Snow wrote: > On Wed, May 20, 2015 at 4:16 PM, Nick Coghlan wrote: >> On 21 May 2015 at 00:56, Eric Snow wrote: >>> On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin wrote: >>>> The point is that exec_module doesn't a priori depend on the module >>>> being in sys.modules, which I think is a good thing. 
>>> >>> Well, there's an explicit specification about how sys.modules is used >>> during loading. For post-exec sys.modules lookup specifically, >>> https://docs.python.org/3.5//reference/import.html#id2. The note in >>> the language reference says that it is an implementation detail. >>> However, keep in mind that this PEP is a CPython-specific proposal. >>> >>> That said, I'm only -0 on not matching the sys.modules lookup behavior >>> of module loading. It could be okay if we were to document the >>> behavior clearly. My concern is with having different semantics even >>> if it only relates to a remote corner case. It may be a corner case >>> that someone will rely on. >> >> We *will* match the semantics for the *overall* loading process. What >> Petr is saying is that *while* executing the "execution slots", >> they'll all receive the object returned by Py_mod_create (or the >> automatically created module if that slot is not defined), rather than >> any replacement injected into sys.modules. >> >> There's no Python level parallel for that "multiple execution slots" >> behaviour, so it makes sense to define the semantics based on >> simplicity of implementation and the fact we want to encourage the use >> of Py_mod_create for extension modules over sys.modules injection. > > I was thinking along those same lines. I'm okay with that rationale. > The PEP should be updated to clarify this point and its rationale. There's no provision in the machinery to call multiple different implementations of exec_module. And all sys.modules lookup/manipulation is done by the machinery, so it doesn't make sense to do it in ExtensionFileLoader.exec_module, either. I believe that now, with the pseudo-code overview, this is clearer, so a rationale isn't needed (the reason it was needed in the first place is that the PEP was confusing.) I will clarify the semantics in the Py_mod_exec section, though.
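Nick's point that the *overall* loading process keeps the usual semantics can be observed at the Python level. In the sketch below, the module name ``swapped_demo`` and all class names are hypothetical. A PEP 451 loader's ``create_module`` returns a ``types.ModuleType`` subclass, loosely like a Py_mod_create function would, and its ``exec_module`` replaces the ``sys.modules`` entry. ``exec_module`` only ever sees the created module, but ``importlib.import_module`` returns the replacement, because the import machinery re-reads ``sys.modules`` after execution:

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types


class SpamModule(types.ModuleType):
    """Custom module subclass, loosely analogous to a Py_mod_create result."""


class Replacement:
    """Plain object that exec_module installs in sys.modules."""
    food = "eggs"


class DemoLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Creation phase: import-related attributes such as __loader__ and
        # __spec__ are set afterwards by the import machinery.
        return SpamModule(spec.name)

    def exec_module(self, module):
        # Execution phase: receives the object returned by create_module.
        module.food = "spam"
        self.executed = module
        sys.modules[module.__name__] = Replacement()


class DemoFinder(importlib.abc.MetaPathFinder):
    def __init__(self, name, loader):
        self.name, self.loader = name, loader

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.name:
            return importlib.machinery.ModuleSpec(fullname, self.loader)
        return None


loader = DemoLoader()
sys.meta_path.insert(0, DemoFinder("swapped_demo", loader))
result = importlib.import_module("swapped_demo")

print(type(loader.executed).__name__, loader.executed.food)  # SpamModule spam
print(type(result).__name__, result.food)                    # Replacement eggs
```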
From stefan_ml at behnel.de Thu May 21 08:06:37 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 May 2015 08:06:37 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 6 In-Reply-To: References: Message-ID: Petr Viktorin schrieb am 20.05.2015 um 13:34: > To prevent crashes when the module is loaded in older versions of Python, > the PyModuleDef object must be initialized using the newly added > PyModuleDef_Init function. This sets the object type (which cannot be done > statically on certain compilers), refcount, and internal bookkeeping data > (m_index). > For example, an extension module "example" would be exported as:: > > static PyModuleDef example_def = {...} > > PyMODINIT_FUNC > PyInit_example(void) > { > return PyModuleDef_Init(&example_def); > } If PyModuleDef_Init() is really a function, this will not help with "older versions of Python", which do not have the function available. So, is it going to be a macro? Stefan From stefan_ml at behnel.de Thu May 21 08:22:27 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 21 May 2015 08:22:27 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 6 In-Reply-To: References: Message-ID: Stefan Behnel schrieb am 21.05.2015 um 08:06: > Petr Viktorin schrieb am 20.05.2015 um 13:34: >> To prevent crashes when the module is loaded in older versions of Python, >> the PyModuleDef object must be initialized using the newly added >> PyModuleDef_Init function. This sets the object type (which cannot be done >> statically on certain compilers), refcount, and internal bookkeeping data >> (m_index). 
>> For example, an extension module "example" would be exported as:: >> >> static PyModuleDef example_def = {...} >> >> PyMODINIT_FUNC >> PyInit_example(void) >> { >> return PyModuleDef_Init(&example_def); >> } > > If PyModuleDef_Init() is really a function, this will not help with "older > versions of Python", which do not have the function available. So, is it > going to be a macro? Ah, ok, I found it further down in the PEP. It's not actually supposed to be called in older Python versions, right? Meaning, we only provide source level backwards compatibility and not binary backwards compatibility for extension modules? Then the paragraph above is really misleading. Stefan From encukou at gmail.com Thu May 21 10:21:03 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 21 May 2015 10:21:03 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 5 In-Reply-To: References: <5559F0FD.3080704@gmail.com> <555B1937.5020001@gmail.com> <555B4B4A.5000902@redhat.com> <555C47CD.4060406@redhat.com> Message-ID: On Wed, May 20, 2015 at 4:07 PM, Eric Snow wrote: > On Wed, May 20, 2015 at 2:37 AM, Petr Viktorin wrote: >> On 05/20/2015 02:33 AM, Eric Snow wrote: > [snip] >>> Won't frozen modules be likewise affected? >> >> No, frozen modules are Python source, just not loaded from a file. > > Isn't the mechanism similar to builtins? No. FrozenImporter loads bytecode from a compiled-in marshalled string, and then exec() it. It's completely different. > Regardless, I was hopeful that we could fix FrozenImporter at the same time > that we fixed BuiltinImporter. I'm not sure what's to fix in FrozenImporter (it uses create_module/exec_module already, is there something else?), but I doubt this PEP is the right place. 
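Petr's description of FrozenImporter can be approximated in pure Python. The source is compiled, the code object is marshalled to bytes, and at import time those bytes are unmarshalled and executed in a fresh module's namespace. This simplified sketch uses a hypothetical module name ``frozen_demo`` and is not FrozenImporter's actual code, but it shows why no PyInit function or PyModuleDef is involved:

```python
import marshal
import types

# "Freezing": compile the source and serialize the code object. Roughly,
# a freeze tool embeds bytes like these in the executable.
source = "food = 'spam'\ndef serve():\n    return food * 2\n"
frozen_bytes = marshal.dumps(compile(source, "<frozen frozen_demo>", "exec"))

# "Loading": unmarshal the code object and exec() it in a new module's
# namespace, just like running a Python module.
module = types.ModuleType("frozen_demo")
exec(marshal.loads(frozen_bytes), module.__dict__)

print(module.serve())  # spamspam
```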
From encukou at gmail.com Thu May 21 13:27:16 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 21 May 2015 13:27:16 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 7 Message-ID: Hello, Based on the last round of comments, I've sent changes to PEP editors. There is one functional change: - Don't allow execution slots for non-module subclasses and several wording fixes/clarifications: - Remove misleading reason for PyModuleDef_Init - Clarify that sys.modules is not checked between execution steps - Add a Backwards Compatibility summary - Heading level fix, typo fix The full text follows: PEP: 489 Title: Multi-phase extension module initialization Version: $Revision$ Last-Modified: $Date$ Author: Petr Viktorin , Stefan Behnel , Nick Coghlan BDFL-Delegate: Eric Snow Discussions-To: import-sig at python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 11-Aug-2013 Python-Version: 3.5 Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015 Resolution: Abstract ======== This PEP proposes a redesign of the way in which built-in and extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve import-related problems by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451. This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension authors to only define features they need, and to allow future additions to extension module declarations. Extension modules are created in a two-step process, fitting better into the ModuleSpec architecture, with parallels to __new__ and __init__ of classes. Extension modules can safely store arbitrary C-level per-module state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account when using the new API. The proposal also allows extension modules with non-ASCII names. Not all problems tackled in PEP 3121 are solved in this proposal. In particular, problems with run-time module lookup (PyState_FindModule) are left to a future PEP. Motivation ========== Python modules and extension modules are not set up in the same way. For Python modules, the module object is created and set up first, then the module code is executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and passed to the relevant hooks. For extensions (i.e. shared libraries) and built-in modules, the module init function is executed straight away and does both the creation and initialization. The initialization function is not passed the ModuleSpec, or any information it contains, such as the __file__ or fully-qualified name. This hinders relative imports and resource loading. In Py3, modules are also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to re-import it and thus run into an infinite loop when it executes the module init function again. Without access to the fully-qualified module name, it is not trivial to correctly add the module to sys.modules either. This is specifically a problem for Cython-generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of __file__ and __name__ information hinders the compilation of "__init__.py" modules, i.e. packages, especially when relative imports are being used at module init time. Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or interpreter reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps backwards compatibility, which should reduce pressure and give extension authors adequate time to consider these issues when porting. The current process =================== Currently, extension and built-in modules export an initialization function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return a fully initialized module object. The function receives no arguments, so it has no way of knowing about its import context. During its execution, the module init function creates a module object based on a PyModuleDef object. It then continues to initialize it by adding attributes to the module dict, creating types, etc. Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded, and when a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe as it relies on the module init function creating its own module object first, but this assumption usually holds in practice. The proposal ============ The initialization function (PyInit_modulename) will be allowed to return a pointer to a PyModuleDef object. The import machinery will be in charge of constructing the module object, calling hooks provided in the PyModuleDef in the relevant phases of initialization (as described below). This multi-phase initialization is an additional possibility. Single-phase initialization, the current practice of returning a fully initialized module object, will still be accepted, so existing code will work unchanged, including binary compatibility.
The PyModuleDef structure will be changed to contain a list of slots, similarly to PEP 384's PyType_Spec for types. To keep binary compatibility, and avoid needing to introduce a new structure (which would introduce additional supporting functions and per-module storage), the currently unused m_reload pointer of PyModuleDef will be changed to hold the slots. The structures are defined as:: typedef struct { int slot; void *value; } PyModuleDef_Slot; typedef struct PyModuleDef { PyModuleDef_Base m_base; const char* m_name; const char* m_doc; Py_ssize_t m_size; PyMethodDef *m_methods; PyModuleDef_Slot *m_slots; /* changed from `inquiry m_reload;` */ traverseproc m_traverse; inquiry m_clear; freefunc m_free; } PyModuleDef; The *m_slots* member must be either NULL, or point to an array of PyModuleDef_Slot structures, terminated by a slot with id set to 0 (i.e. ``{0, NULL}``). To specify a slot, a unique slot ID must be provided. New Python versions may introduce new slot IDs, but slot IDs will never be recycled. Slots may get deprecated, but will continue to be supported throughout Python 3.x. A slot's value pointer may not be NULL, unless specified otherwise in the slot's documentation. The following slots are currently available, and described later: * Py_mod_create * Py_mod_exec Unknown slot IDs will cause the import to fail with SystemError. When using multi-phase initialization, the *m_name* field of PyModuleDef will not be used during importing; the module name will be taken from the ModuleSpec. Before it is returned from PyInit_*, the PyModuleDef object must be initialized using the newly added PyModuleDef_Init function. This sets the object type (which cannot be done statically on certain compilers), refcount, and internal bookkeeping data (m_index). 
For example, an extension module "example" would be exported as:: static PyModuleDef example_def = {...} PyMODINIT_FUNC PyInit_example(void) { return PyModuleDef_Init(&example_def); } The PyModuleDef object must be available for the lifetime of the module created from it; usually, it will be declared statically. Pseudo-code Overview -------------------- Here is an overview of how the modified importers will operate. Details such as logging or handling of errors and invalid states are left out, and C code is presented with a concise Python-like syntax. The framework that calls the importers is explained in PEP 451 [#pep-0451-loading]_. :: importlib/_bootstrap.py: class BuiltinImporter: def create_module(self, spec): return _imp.create_builtin(spec) def exec_module(self, module): _imp.exec_dynamic(module) def load_module(self, name): # use a backwards compatibility shim return _load_module_shim(self, name) importlib/_bootstrap_external.py: class ExtensionFileLoader: def create_module(self, spec): return _imp.create_dynamic(spec) def exec_module(self, module): _imp.exec_dynamic(module) def load_module(self, name): # use a backwards compatibility shim return _load_module_shim(self, name) Python/import.c (the _imp module): def create_dynamic(spec): name = spec.name path = spec.origin # Find an already loaded module that used single-phase init. # For multi-phase initialization, mod is NULL, so a new module # is always created. mod = _PyImport_FindExtensionObject(name, name) if mod: return mod return _PyImport_LoadDynamicModuleWithSpec(spec) def exec_dynamic(module): if not isinstance(module, types.ModuleType): # non-modules are skipped -- PyModule_GetDef fails on them return def = PyModule_GetDef(module) state = PyModule_GetState(module) if state is NULL: PyModule_ExecDef(module, def) def create_builtin(spec): name = spec.name # Find an already loaded module that used single-phase init. # For multi-phase initialization, mod is NULL, so a new module # is always created.
mod = _PyImport_FindExtensionObject(name, name) if mod: return mod for initname, initfunc in PyImport_Inittab: if name == initname: m = initfunc() if isinstance(m, PyModuleDef): def = m return PyModule_FromDefAndSpec(def, spec) else: # fall back to single-phase initialization module = m _PyImport_FixupExtensionObject(module, name, name) return module Python/importdl.c: def _PyImport_LoadDynamicModuleWithSpec(spec): path = spec.origin package, dot, name = spec.name.rpartition('.') # see the "Non-ASCII module names" section for export_hook_name hook_name = export_hook_name(name) # call platform-specific function for loading exported function # from shared library exportfunc = _find_shared_funcptr(hook_name, path) m = exportfunc() if isinstance(m, PyModuleDef): def = m return PyModule_FromDefAndSpec(def, spec) module = m # fall back to single-phase initialization .... Objects/moduleobject.c: def PyModule_FromDefAndSpec(def, spec): name = spec.name create = None for slot, value in def.m_slots: if slot == Py_mod_create: create = value if create: m = create(spec, def) else: m = PyModule_New(name) if isinstance(m, types.ModuleType): m.md_state = None m.md_def = def if def.m_methods: PyModule_AddFunctions(m, def.m_methods) if def.m_doc: PyModule_SetDocString(m, def.m_doc) return m def PyModule_ExecDef(module, def): if isinstance(module, types.ModuleType): if module.md_state is NULL: # allocate a block of zeroed-out memory module.md_state = _alloc(def.m_size) if def.m_slots is NULL: return for slot, value in def.m_slots: if slot == Py_mod_exec: value(module) Module Creation Phase --------------------- Creation of the module object (that is, the implementation of ExecutionLoader.create_module) is governed by the Py_mod_create slot. The Py_mod_create slot ...................... The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature:: PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def) The function receives a ModuleSpec instance, as defined in PEP 451, and the PyModuleDef structure. It should return a new module object, or set an error and return NULL. This function is not responsible for setting import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) on the new module. There is no requirement for the returned object to be an instance of types.ModuleType. Any type can be used, as long as it supports setting and getting attributes, including at least the import-related attributes. However, only ModuleType instances support module-specific functionality such as per-module state and processing of execution slots. If something other than a ModuleType subclass is returned, no execution slots may be defined; if any are, a SystemError is raised. Note that when this function is called, the module's entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively) may lead to an infinite loop. Extension authors are advised to keep Py_mod_create minimal, and in particular to not call user code from it. Multiple Py_mod_create slots may not be specified. If they are, import will fail with SystemError. If Py_mod_create is not specified, the import machinery will create a normal module object using PyModule_New. The name is taken from *spec*. Post-creation steps ................... If the Py_mod_create function returns an instance of types.ModuleType or a subclass (or if a Py_mod_create slot is not present), the import machinery will associate the PyModuleDef with the module. This also makes the PyModuleDef accessible to the execution phase, the PyModule_GetDef function, and garbage collection routines (traverse, clear, free).
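The creation-phase rules described above can be summarized in a pure-Python model. Everything here is hypothetical: ``ModuleDefModel`` stands in for PyModuleDef, callables stand in for C function pointers, and ``None`` stands in for NULL. The slot ID values mirror CPython's. The model also includes the PEP's requirement that m_size be 0 and the GC hooks be NULL when Py_mod_create returns a non-module object. It models the rules only, not CPython's implementation:

```python
import types
from dataclasses import dataclass, field
from importlib.machinery import ModuleSpec
from typing import Callable, List, Optional, Tuple

# Hypothetical constants; the values mirror CPython's slot IDs.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


@dataclass
class ModuleDefModel:
    """Stand-in for PyModuleDef; `slots` omits the {0, NULL} terminator."""
    slots: List[Tuple[int, Optional[Callable]]] = field(default_factory=list)
    m_doc: Optional[str] = None
    m_size: int = 0
    m_traverse: Optional[Callable] = None
    m_clear: Optional[Callable] = None
    m_free: Optional[Callable] = None


def create_from_def(def_: ModuleDefModel, spec: ModuleSpec):
    """Model of the creation phase; returns the new module-like object."""
    create = None
    has_exec = False
    for slot_id, value in def_.slots:
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if value is None:
            raise SystemError(f"slot {slot_id} has a NULL value")
        if slot_id == PY_MOD_CREATE:
            if create is not None:
                raise SystemError("multiple Py_mod_create slots")
            create = value
        else:
            has_exec = True

    # Without Py_mod_create, a plain module named after the spec is made.
    module = create(spec, def_) if create else types.ModuleType(spec.name)

    if not isinstance(module, types.ModuleType):
        if has_exec:
            raise SystemError("execution slots require a module object")
        if def_.m_size or def_.m_traverse or def_.m_clear or def_.m_free:
            raise SystemError("non-module objects cannot have module state")
    if def_.m_doc is not None:
        module.__doc__ = def_.m_doc  # set regardless of the object's type
    return module


spec = ModuleSpec("spam", None)
spam = create_from_def(ModuleDefModel(m_doc="Utilities for cooking spam"), spec)
print(spam.__name__, "-", spam.__doc__)  # spam - Utilities for cooking spam
```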
If the Py_mod_create function does not return a module subclass, then m_size must be 0, and m_traverse, m_clear and m_free must all be NULL. Otherwise, SystemError is raised. Additionally, initial attributes specified in the PyModuleDef are set on the module object, regardless of its type: * The docstring is set from m_doc, if non-NULL. * The module's functions are initialized from m_methods, if any. Module Execution Phase ---------------------- Module execution -- that is, the implementation of ExecutionLoader.exec_module -- is governed by "execution slots". This PEP only adds one, Py_mod_exec, but others may be added in the future. The execution phase is done on the PyModuleDef associated with the module object. For objects that are not a subclass of PyModule_Type (for which PyModule_GetDef would fail), the execution phase is skipped. Execution slots may be specified multiple times, and are processed in the order they appear in the slots array. When using the default import machinery, they are processed after import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) are set and the module is added to sys.modules. Pre-Execution steps ................... Before processing the execution slots, per-module state is allocated for the module. From this point on, per-module state is accessible through PyModule_GetState. The Py_mod_exec slot .................... The entry in this slot must point to a function with the following signature:: int (*PyModuleExecFunction)(PyObject* module) It will be called to initialize a module. Usually, this amounts to setting the module's initial attributes. The "module" argument receives the module object to initialize. The function must return ``0`` on success, or, on error, set an exception and return ``-1``. If PyModuleExec replaces the module's entry in sys.modules, the new object will be used and returned by importlib machinery after all execution slots are processed. 
This is a feature of the import machinery itself. The slots themselves are all processed using the module returned from the creation phase; sys.modules is not consulted during the execution phase. (Note that for extension modules, implementing Py_mod_create is usually a better solution for using custom module objects.)


Legacy Init
-----------

The backwards-compatible single-phase initialization continues to be supported. In this scheme, the PyInit function returns a fully initialized module rather than a PyModuleDef object. In this case, the PyInit hook implements the creation phase, and the execution phase is a no-op.

Modules that need to work unchanged on older versions of Python should stick to single-phase initialization, because the benefits of multi-phase initialization can't be back-ported. Here is an example of a module that supports multi-phase initialization, and falls back to single-phase when compiled for an older version of CPython. It is included mainly as an illustration of the changes needed to enable multi-phase init::

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        /* Propagate failure: the exec function must return -1 with an
           exception set if adding the attribute fails. */
        if (PyModule_AddStringConstant(module, "food", "spam") < 0) {
            return -1;
        }
        return 0;
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,                                       /* m_reload */
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        return PyModuleDef_Init(&spam_def);
    #else
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }


Built-In modules
----------------

Any extension module can be used as a built-in module by linking it into the executable, and including it in the inittab (either at
runtime with PyImport_AppendInittab, or at configuration time, using tools like *freeze*).

To preserve this possibility, all changes to extension module loading introduced in this PEP will also apply to built-in modules. The only exception is non-ASCII module names, explained below.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly, avoiding the issues mentioned in the Python documentation [#subinterpreter-docs]_. The mechanism is designed to make this easy, but care is still required on the part of the extension author. No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept either in the module dict or in the module object's storage reachable by PyModule_GetState. A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes.


Functions incompatible with multi-phase initialization
------------------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure with a non-NULL *m_slots* pointer. The function doesn't have access to the ModuleSpec object necessary for multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*. PyState registration is disabled because multiple module objects may be created from the same PyModuleDef.


Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs access to module-level state (including functions, classes or exceptions defined at the module level) must receive a reference to the module object (or the particular object it needs), either directly or indirectly.
This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for the new mechanism to be useful to all modules. Proper fixes have been discussed on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment, not good candidates for porting to the new mechanism.


New Functions
-------------

A new function and macro implementing the module creation phase will be added. These are similar to PyModule_Create and PyModule_Create2, except they take an additional ModuleSpec argument, and handle module definitions with non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function implementing the module execution phase will be added. This allocates per-module state (if not allocated already), and *always* processes execution slots. The import machinery calls this function when a module is executed, unless the module is being reloaded::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object. This idempotent function fills in the type, refcount, and module index.
It returns its argument cast to PyObject*, so it can be returned directly from a PyInit function::

    PyObject * PyModuleDef_Init(PyModuleDef *);

Additionally, two helpers will be added for setting the docstring and methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)


Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names must be encoded to form the PyInit hook name.

For ASCII module names, the import hook is named PyInit_<modname>, where <modname> is the name of the module.

For module names containing non-ASCII characters, the import hook is named PyInitU_<modname>, where the name is encoded using CPython's "punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix), with hyphens ("-") replaced by underscores ("_").

In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix

Examples:

=============  ===================
Module name    Init hook name
=============  ===================
spam           PyInit_spam
lančmít        PyInitU_lanmt_2sa6t
スパム         PyInitU_zck5b2b
=============  ===================

For modules with non-ASCII names, single-phase initialization is not supported.

In the initial implementation of this PEP, built-in modules with non-ASCII names will not be supported.


Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and LoadLibraryEx on Windows), it is not generally possible to load a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module are too rare to require all module authors to keep reloading in mind. If reload-like functionality is needed, authors can export a dedicated function for it.
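The ``export_hook_name`` function from the "Export Hook Name" section can be checked against its example table. The sketch below also adds a hypothetical inverse, ``module_name_from_hook``, and a hypothetical ``is_c_identifier`` check; neither is part of this PEP. The inverse relies on the fact that the extended part of a Punycode string uses only lowercase letters and digits, so the last underscore after ``PyInitU_`` must be the former hyphen delimiter:

```python
def export_hook_name(name):
    # Same rule as in the "Export Hook Name" section.
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix


def module_name_from_hook(hook):
    """Hypothetical inverse of export_hook_name (not part of PEP 489)."""
    if hook.startswith(b'PyInitU_'):
        encoded = hook[len(b'PyInitU_'):]
        # Only the last underscore can be the Punycode delimiter: the
        # extended part after it contains just [a-z0-9].
        basic, sep, extended = encoded.rpartition(b'_')
        if sep:
            encoded = basic + b'-' + extended
        return encoded.decode('punycode')
    if hook.startswith(b'PyInit_'):
        return hook[len(b'PyInit_'):].decode('ascii')
    raise ValueError(f"not a PyInit hook name: {hook!r}")


def is_c_identifier(symbol):
    # Portable C identifiers: ASCII letters, digits and underscores,
    # not starting with a digit.
    return symbol.isascii() and symbol.decode('ascii').isidentifier()


for name in ['spam', 'lančmít', 'スパム']:
    hook = export_hook_name(name)
    assert is_c_identifier(hook)
    assert module_name_from_hook(hook) == name
    print(name, hook.decode('ascii'))
```

The import system itself only needs the forward direction (module name to symbol); the inverse is shown only to make explicit that replacing hyphens with underscores loses no information.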
Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can export additional PyInit* symbols besides the one that corresponds to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules, but not to *find* them. (This is a limitation of the loader mechanism, which this PEP does not try to modify.) To work around the lack of a suitable finder, code like the following can be used::

    import importlib.machinery
    import importlib.util

    def load_extra_module(name, path):
        # *name* is the module to load; *path* is the shared library
        # that exports its PyInit hook.
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module

On platforms that support symbolic links, these may be used to install one library under multiple names, exposing all exported modules to normal import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmultiphase`` will be created. The library will export several additional modules using the mechanism described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use single-phase initialization indefinitely (or until it is no longer supported).

The ``array`` and ``xx*`` modules will be converted to use multi-phase initialization as part of the initial implementation.


Summary of API Changes and Additions
====================================

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions
* PyModuleDef_Init

New macros:

* Py_mod_create
* Py_mod_exec

New types:

* PyModuleDef_Type will be exposed

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.
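The "Multiple modules in one library" section notes that extra modules exported from one shared library can be loaded but not found. A minimal sketch of a meta path finder that fills that gap follows, assuming the application already knows which module names the library exports. ``SharedLibraryFinder``, the library path and the module names are hypothetical; ``ExtensionFileLoader`` and ``spec_from_file_location`` are standard importlib APIs:

```python
import importlib.abc
import importlib.machinery
import importlib.util


class SharedLibraryFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: resolves a fixed set of names to one library."""

    def __init__(self, library_path, module_names):
        self.library_path = library_path
        self.module_names = frozenset(module_names)

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.module_names:
            return None  # let the other finders on sys.meta_path handle it
        # The loader derives the PyInit hook name from the module name,
        # so one library file can serve several names.
        loader = importlib.machinery.ExtensionFileLoader(
            fullname, self.library_path)
        return importlib.util.spec_from_file_location(
            fullname, self.library_path, loader=loader)


# Hypothetical library exporting PyInit_alpha and PyInit_beta.
finder = SharedLibraryFinder("/opt/example/libmulti.so", ["alpha", "beta"])
# Registration (not done here): sys.meta_path.append(finder)
```

Once registered, ``import alpha`` would go through the normal import machinery without the symbolic-link workaround; constructing the spec does not touch the file system, so errors about a missing library surface only when the module is actually loaded.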
The internal ``_imp`` module will have backwards incompatible changes: ``create_builtin``, ``create_dynamic``, and ``exec_dynamic`` will be added; ``init_builtin`` and ``load_dynamic`` will be removed.

The undocumented functions ``imp.load_dynamic`` and ``imp.init_builtin`` will be replaced by backwards-compatible shims.


Backwards Compatibility
-----------------------

Existing modules will continue to be source- and binary-compatible with new versions of Python. Modules that use multi-phase initialization will not be compatible with versions of Python that do not implement this PEP.

The functions ``init_builtin`` and ``load_dynamic`` will be removed from the ``_imp`` module (but not from the ``imp`` module).

All changed loaders (``BuiltinImporter`` and ``ExtensionFileLoader``) will remain backwards-compatible; the ``load_module`` method will be replaced by a shim.

Internal functions of Python/import.c and Python/importdl.c will be removed. (Specifically, these are ``_PyImport_GetDynLoadFunc``, ``_PyImport_GetDynLoadWindows``, and ``_PyImport_LoadDynamicModule``.)


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384, allows later extensions.

Some extension modules export many constants; for example _ssl has a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef, would reduce boilerplate, and provide free error-checking which is often missing.

String constants and types can be handled similarly. (Note that non-default bases for types cannot be portably specified statically; this case would need a Py_mod_exec function that runs before the slots are added. The free error-checking would still be beneficial, though.)

Another possibility is providing a "main" function that would be run when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take advantage of ModuleSpec-based loading introduced in PEP 451. Also, it will be necessary to add a mechanism for setting up a module according to slots it wasn't originally defined with.


Implementation
==============

A work-in-progress implementation is available in a GitHub repository [#gh-repo]_; a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_ had a "PyInit_modulename" hook that would create a module class, whose ``__init__`` would then be called to create the module. This proposal did not correspond to the (then nonexistent) PEP 451, where module creation and initialization are broken into distinct steps. It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype implementation [#nicks-prototype]_. At that time PEP 451 was still not implemented, so the prototype does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed loading into arbitrary pre-constructed objects with the Exec hook. The proposal made extension module initialization closer to how Python modules are initialized, but it was later recognized that this isn't an important goal. The current PEP describes a simpler solution.

A further iteration used a "PyModuleExport" hook as an alternative to PyInit, where PyInit was used for the existing scheme, and PyModuleExport for multi-phase. However, not being able to determine the hook name based on the module name complicated automatic generation of PyImport_Inittab by tools like freeze. Keeping only the PyInit hook name, even if it's not entirely appropriate for exporting a definition, yielded a much simpler solution.


References
==========

.. [#pep-0451-attributes] https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep] https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype] https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492] http://tools.ietf.org/html/rfc3492

.. [#gh-repo] https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch] https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion] https://mail.python.org/pipermail/import-sig/2015-April/000959.html

.. [#pep-0451-loading] https://www.python.org/dev/peps/pep-0451/#how-loading-will-work

.. [#subinterpreter-docs] https://docs.python.org/3/c-api/init.html#sub-interpreter-support


Copyright
=========

This document has been placed in the public domain.

From encukou at gmail.com Thu May 21 18:17:37 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 21 May 2015 18:17:37 +0200 Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization; version 6 In-Reply-To: References: Message-ID: On Wed, May 20, 2015 at 11:47 PM, Eric Snow wrote: > FYI, Nick asked if I would be willing to be BDFL-Delegate for this PEP > and Guido has given the okay. I've added myself to the PEP's header. > I'll try to make a decision soon (in time to land the patch before the > feature freeze), but I also must be confident about the pronouncement. > > -eric Thank you for taking this on! I believe all issues raised so far are addressed in the latest update, which is now live. If you still have an unaddressed point, please let me know.

From bcannon at gmail.com Thu May 28 17:11:38 2015 From: bcannon at gmail.com (Brett Cannon) Date: Thu, 28 May 2015 15:11:38 +0000 Subject: [Import-SIG] Idea: concept of a builder or transformer to compliment loaders Message-ID: I should start off by saying I don't plan to pursue this idea, but I wanted to write it down for posterity and in case anyone else has thought about this.
That being said, the idea of macros and other source-transforming things done to Python code has come up a few times on python-ideas as of late. Now experimenting with this sort of thing using a custom loader is not hard, and thanks to importlib.abc.ResourceLoader.source_to_code() it's fairly easy to do (by design; I tried to initially structure importlib's APIs to make alternative storage backends easy as well as alternative syntax stuff like Quixote from back in the day). But one thing I realized is that while finders and loaders are necessary for alternative code storage mechanisms, they are not the right abstraction for tweaking code semantics. Really all you need is a function that takes in source code and spits out a code object to use with exec() (hence ResourceLoader.source_to_code() even existing). It somewhat sucks that people who just want to tweak code semantics have to define a loader subclass and instantiate a new finder when all that is mostly stuff that doesn't concern them. It also sucks that they would have to do that for every storage type, e.g. local files and zip files. Now I don't have a solid solution to propose for this niche use case. It makes me want to have some kind of way to register compiler functions, but that would be limiting if it went source -> code object. AST -> AST would allow for chaining much like Victor has proposed in the past, but it also means that people who want a transpiler to go source -> source are left out. And then there is the whole thing of how to get the loaders to know of these transpilers/transformers/compilers as adding more global state to sys feels dirty (maybe an attribute on finders that they can draw from if they so choose?), but maybe it isn't that big of a deal as long as they are just callables and people realize they must be re-entrant. As I said, I don't plan to work on this, but I wanted to get my ideas written down in case someone else cared.
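Brett's point that all that is really needed is a function from source code to a code object, ideally as chainable AST -> AST steps, can be sketched with existing APIs. This is an illustration under stated assumptions, not a proposed API: ``compile_with_transforms``, ``DoubleIntegers`` and ``TransformingLoader`` are hypothetical names, while ``ast.parse``, ``compile`` and overriding ``source_to_code`` on a ``importlib.machinery.SourceFileLoader`` subclass are existing mechanisms:

```python
import ast
import importlib.machinery


def compile_with_transforms(source, path, transforms=(), optimize=-1):
    """Parse *source*, apply AST -> AST transforms in order, then compile."""
    tree = ast.parse(source, filename=path)
    for transform in transforms:
        tree = transform(tree)
    ast.fix_missing_locations(tree)  # new nodes may lack line numbers
    return compile(tree, path, "exec", dont_inherit=True, optimize=optimize)


class DoubleIntegers(ast.NodeTransformer):
    """Toy transform: doubles every int literal (bools are left untouched)."""

    def visit_Constant(self, node):
        if type(node.value) is int:
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node


class TransformingLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader whose only semantic change is source_to_code().

    SourceLoader.get_code() calls self.source_to_code(data, path), so file
    reading and bytecode caching are inherited unchanged.
    """

    transforms = (DoubleIntegers().visit,)

    def source_to_code(self, data, path, *, _optimize=-1):
        return compile_with_transforms(data, path, self.transforms, _optimize)


code = compile_with_transforms("x = 21", "<demo>", [DoubleIntegers().visit])
namespace = {}
exec(code, namespace)
print(namespace["x"])  # prints 42
```

This still needs a loader subclass per storage type, which is exactly the overhead described above. Note also that the inherited bytecode cache in ``__pycache__`` is validated against the source file only, so changing the transform chain does not by itself invalidate previously cached output.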
From ericsnowcurrently at gmail.com Fri May 29 00:18:57 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 28 May 2015 16:18:57 -0600 Subject: [Import-SIG] Idea: concept of a builder or transformer to compliment loaders In-Reply-To: References: Message-ID: On Thu, May 28, 2015 at 9:11 AM, Brett Cannon wrote: > I should start off by saying I don't plan to pursue this idea, but I wanted > to write it down for posterity and in case anyone else has thought about > this. > > That being said, the idea of macros and other source-transforming things > done to Python code has come up a few times on python-ideas as of late. Now > experimenting with this sort of thing using a custom loader is not hard, and > thanks to importlib.abc.ResourceLoader.source_to_code() it's fairly easy to > do (by design; I tried to initially structure importlib's APIs to making > alternative storage backends easy as well as alternative syntax stuff like > Quixote from back in the day). > > But one thing I realized is that while finders and loaders are necessary for > alternative code storage mechanisms, they are not the right abstraction for > tweaking code semantics. Agreed. > Really all you need is a function that takes in > source code and spits out a code object to use with exec() (hence > ResourceLoader.source_to_code() even existing). It somewhat sucks that > people who just want to tweak code semantics have to define a loader > subclass and instantiate a new finder when all that is mostly stuff that > doesn't concern them. It also sucks that they would have to do that for > every storage type, e.g. local files and zip files. Yep. > > Now I don't have a solid solution to propose for this niche use case. It > makes me want to have some kind of way to register compiler functions, but I had the same thought. > that would be limiting if it went source -> code object.
AST -> AST would > allow for chaining much like Victor has proposed in the past, but it also > means that people who want a transpiler to go source -> source are left out. Yeah, it feels like there's an encapsulation there around the various pieces of compilation. Furthermore, I'd expect such an abstraction to consider the needs of alternate Python implementations as well. > And then there is the whole thing of how to get the loaders to know of these > transpilers/transformers/compilers > as adding more global state to sys feels > dirty (maybe an attribute on finders that they can draw from if they so > choose?), but maybe it isn't that big of a deal as long as they are just > callables and people realize they must be re-entrant. This is where something like ImportSystem (nee ImportEngine) would help. We'd just have sys.importsystem and add state there as appropriate without further cluttering up the sys module. FWIW, I've considered a number of minor additions similar to what you're talking about for niche needs where it would still be nice to have a convenient API because of the overhead of writing and managing a finder/loader. Perhaps it's just a matter of providing helper decorators along the lines of contextlib.contextmanager, which convert your simple function into the necessary format and register it in the correct place (e.g. finder+loader -> sys.path/sys.meta_path). > > As I said, I don't plan to work on this, but I wanted to get my ideas > written down in case someone else cared. :) -eric
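Eric's suggestion of a helper decorator in the style of contextlib.contextmanager can be sketched on top of ``importlib.machinery.FileFinder.path_hook``. This is a hypothetical helper, not a stdlib API: ``source_transform`` turns a ``str -> str`` function into a path hook for one extra file suffix. The standard loaders are kept in the hook because the first path hook that accepts a directory supplies the only finder for it (cached in ``sys.path_importer_cache``), so a hook knowing only the new suffix would hide ordinary modules:

```python
import importlib.machinery
import importlib.util


def source_transform(*suffixes):
    """Hypothetical decorator: wrap a str -> str source transform in a path hook."""

    def decorator(func):
        class TransformLoader(importlib.machinery.SourceFileLoader):
            def source_to_code(self, data, path, *, _optimize=-1):
                source = func(importlib.util.decode_source(data))
                return compile(source, path, "exec",
                               dont_inherit=True, optimize=_optimize)

        loader_details = [
            (TransformLoader, list(suffixes)),
            # Keep the standard loaders so ordinary modules still import.
            (importlib.machinery.ExtensionFileLoader,
             importlib.machinery.EXTENSION_SUFFIXES),
            (importlib.machinery.SourceFileLoader,
             importlib.machinery.SOURCE_SUFFIXES),
            (importlib.machinery.SourcelessFileLoader,
             importlib.machinery.BYTECODE_SUFFIXES),
        ]
        func.path_hook = importlib.machinery.FileFinder.path_hook(*loader_details)
        return func

    return decorator


@source_transform(".upy")
def add_marker(source):
    # Toy source -> source transform.
    return "TRANSFORMED = True\n" + source


# Registration (not done here, since it changes global import state):
#   sys.path_hooks.insert(0, add_marker.path_hook)
#   sys.path_importer_cache.clear()
```

This covers only the source -> source case for files on disk; zip imports would still need their own integration, which is the per-storage-type cost Brett describes.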