From ncoghlan at gmail.com Thu Apr 2 12:17:08 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 2 Apr 2015 20:17:08 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <551A97DC.40702@gmail.com> References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> <5513D90D.2080905@gmail.com> <551A97DC.40702@gmail.com> Message-ID: On 31 March 2015 at 22:49, Petr Viktorin wrote: > On 03/30/2015 03:21 PM, Nick Coghlan wrote: > Also, some modules wrap a library that has global state (haven't checked > stdlib, but curses, readline, locale are candidates). > It doesn't make sense to allow loading more instances of such modules. > Perhaps there should be a flag to distinguish them? Or just let them use > PyState_AddModule/PyState_FindModule to prevent re-import? > > The need for flags would be a good argument to after all have a ModuleExport > structure wrapping around slots. Such a structure could also share > PyModuleDef_Base, making it usable with PyState_AddModule/PyState_FindModule > (there'd be new functions, but the machinery/data structure could be > reused). So I'm starting to be more inclined to do this again: > > typedef struct PyModule_Export { > PyModuleDef_Base m_base; > const char* m_doc; > int m_flags; > PyModule_Slot *m_slots; /* terminated by slot==0. */ > } PyModule_Export; The flag variable on types has proven useful many times, so that does sound potentially valuable here. For the specific case you're talking about, we could have a "PyModule_EXPORT_SINGLETON" flag that caused the import machinery to automatically call PyState_AddModuleFromExport() (or whatever the new function was called) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Thu Apr 2 13:05:27 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 02 Apr 2015 13:05:27 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> <5513D90D.2080905@gmail.com> <551A97DC.40702@gmail.com> Message-ID: <551D2277.6020700@gmail.com> On 04/02/2015 12:17 PM, Nick Coghlan wrote: > On 31 March 2015 at 22:49, Petr Viktorin wrote: >> On 03/30/2015 03:21 PM, Nick Coghlan wrote: >> Also, some modules wrap a library that has global state (haven't checked >> stdlib, but curses, readline, locale are candidates). >> It doesn't make sense to allow loading more instances of such modules. >> Perhaps there should be a flag to distinguish them? Or just let them use >> PyState_AddModule/PyState_FindModule to prevent re-import? >> >> The need for flags would be a good argument to after all have a ModuleExport >> structure wrapping around slots. Such a structure could also share >> PyModuleDef_Base, making it usable with PyState_AddModule/PyState_FindModule >> (there'd be new functions, but the machinery/data structure could be >> reused). So I'm starting to be more inclined to do this again: >> >> typedef struct PyModule_Export { >> PyModuleDef_Base m_base; >> const char* m_doc; >> int m_flags; >> PyModule_Slot *m_slots; /* terminated by slot==0. */ >> } PyModule_Export; > > The flag variable on types has proven useful many times, so that does > sound potentially valuable here. For the specific case you're talking > about, we could have a "PyModule_EXPORT_SINGLETON" flag that caused > the import machinery to automatically call > PyState_AddModuleFromExport() (or whatever the new function was > called) It's needed on a lower level (_PyImport_FixupExtensionObject), but calling PyState_AddModuleFromExport as well sounds good. There's another possibility I'm considering. PyModuleDef has a "m_reload" member, which is currently unused and must be set to NULL. Could we repurpose that to hold the slots? 
I realize renaming a member of a publicly available structure, and changing it from function pointer to data pointer, isn't a trivial change. But it shouldn't break ABI. Maybe it can be made a union, for the rename case?
Doing that would mean we wouldn't need additional PyState_AddModule/PyModule_GetDef equivalents, modules could avoid a md_slots member, and things like module_dealloc would only have one place to look for their hooks.
In this scenario, flags could be implemented as data-less slots, {Py_mod_flag_singleton, NULL}, or in a flag slot, {Py_mod_flags, &flags}.

What do you think?

From ncoghlan at gmail.com Thu Apr 2 16:37:43 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 3 Apr 2015 00:37:43 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading
In-Reply-To: <551D2277.6020700@gmail.com>
References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> <5513D90D.2080905@gmail.com> <551A97DC.40702@gmail.com> <551D2277.6020700@gmail.com> Message-ID: On 2 April 2015 at 21:05, Petr Viktorin wrote: > There's another possibility I'm considering. > PyModuleDef has a "m_reload" member, which is currently unused and must be > set to NULL. Could we repurpose that to hold the slots? > I realize renaming a member of a publicly available structure, and changing > it from function pointer to data pointer, isn't a trivial change. But it > shouldn;t break ABI. Maybe it can be made an union, for the rename case? > Doing that would mean we wouldn't need additional > PyState_AddModule/PyModule_GetDef equivalents, modules could avoid a > md_slots member, and things like module_dealloc would only have one place to > look for their hooks. > In this scenario, flags could be implemented as a data-less slots, > {Py_mod_flag_singleton, NULL}, or in a flag slot, {Py_mod_flags, &flags}. > > What do you think? I think I don't have a clear enough picture of all the moving parts in my head to follow what you're describing here without seeing it written out :) Probably the best thing to do if you're considering a couple of different options is to list them both in the next PEP draft. You may find a clear winner emerges while doing so, and if not, then we'll have a solid reference for the options we're discussing. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Thu Apr 16 13:05:28 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 16 Apr 2015 13:05:28 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading Message-ID: Hello, Based on previous discussions, I've just sent a rewritten version of PEP 489 to the PEP editors. I'm including a copy below. 
There's a question I need help with: go with a new PyModuleDesc structure, or stick with PyModuleDef and repurpose a currently unused member. You can find the details in the text itself.

A wart I added is "singleton modules", necessary for "PyState_FindModule"-like functionality. I wouldn't mind not including this, but it would mean the new API can't replace all use cases of the old PyInit_.

Let me know if you have any comments!

PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin, Stefan Behnel, Nick Coghlan
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:

Abstract
========

This PEP proposes a redesign of the way in which extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension authors to only define features they need, and to allow future additions to extension module declarations.

Extension modules are created in a two-step process, fitting better into the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API.

The proposal also allows extension modules with non-ASCII names.

Motivation
==========

Python modules and extension modules are not being set up in the same way.
For Python modules, the module is created and set up first, then the module code is being executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and passed to the relevant hooks. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialization. The initialization function is not passed the ModuleSpec, or any information it contains, such as the __file__ or fully-qualified name. This hinders relative imports and resource loading. In Py3, modules are also not being added to sys.modules, which means that a (potentially transitive) re-import of the module will really try to re-import it and thus run into an infinite loop when it executes the module init function again. Without the FQMN, it is not trivial to correctly add the module to sys.modules either. This is specifically a problem for Cython generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of __file__ and __name__ information hinders the compilation of "__init__.py" modules, i.e. packages, especially when relative imports are being used at module init time. Furthermore, the majority of currently existing extension modules has problems with sub-interpreter support and/or interpreter reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps backwards compatibility, which should reduce pressure and give extension authors adequate time to consider these issues when porting. 
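The two-step sequence that Python modules already go through, and that this PEP wants extension modules to mirror, can be sketched in plain Python. This is only an illustration: the ``StringLoader`` class and the module name ``demo_mod`` are made up for the example, and ``importlib.util.module_from_spec`` requires Python 3.5 or later.

```python
import importlib.abc
import importlib.util


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that runs module code held in a string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        # Step 1 hook: returning None asks for a default module object.
        return None

    def exec_module(self, module):
        # Step 2 hook: run the module body in the prepared namespace.
        exec(self.source, module.__dict__)


spec = importlib.util.spec_from_loader("demo_mod", StringLoader("value = __name__"))

# Step 1: the module is created and its import attributes are set.
module = importlib.util.module_from_spec(spec)
created_with_spec = module.__spec__ is spec
has_value_before_exec = hasattr(module, "value")

# Step 2: only now is the module code executed.
spec.loader.exec_module(module)
```

Because ``__name__`` is already set when the body runs, the code can rely on it, which is exactly what a legacy ``PyInit_`` function cannot do.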
The current process
===================

Currently, extension modules export an initialization function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return either NULL in the case of an exception, or a fully initialized module object. The function receives no arguments, so it has no way of knowing about its import context.

During its execution, the module init function creates a module object based on a PyModuleDef struct. It then continues to initialize it by adding attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded, and when a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe as it relies on the module init function creating its own module object first, but this assumption usually holds in practice.

The proposal
============

The current extension module initialization will be deprecated in favor of a new initialization scheme. Since the current scheme will continue to be available, existing code will continue to work unchanged, including binary compatibility.

Extension modules that support the new initialization scheme must export the public symbol "PyModuleExport_<modulename>", where "modulename" is the name of the module. (For modules with non-ASCII names the symbol name is slightly different, see "Export Hook Name" below.)

If defined, this symbol must resolve to a C function with the following signature::

    PyModuleExport* (*PyModuleExportFunction)(void)

The function must return a pointer to a PyModuleExport structure. This structure must be available for the lifetime of the module created from it; usually, it will be declared statically.

The PyModuleExport structure describes the new module, similarly to PEP 384's PyType_Spec for types.
The structure is defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDesc_Slot;

    typedef struct {
        const char* doc;
        int flags;
        PyModuleDesc_Slot *slots;
    } PyModuleDesc;

The *doc* member specifies the module's docstring.

The *flags* may currently be either 0 or ``PyModule_EXPORT_SINGLETON``, described in "Singleton Modules" below. Other flag values may be added in the future.

The *slots* points to an array of PyModuleDesc_Slot structures, terminated by a slot with id set to 0 (i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided. New Python versions may introduce new slot IDs, but slot IDs will never be recycled. Slots may get deprecated, but will continue to be supported throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the slot's documentation.

The following slots are available, and described later:

* Py_mod_create
* Py_mod_statedef
* Py_mod_methods
* Py_mod_exec

Unknown slot IDs will cause the import to fail with ImportError.

.. note::

    An alternate proposal is to use PyModuleDef instead of PyModuleDesc, re-purposing the m_reload pointer to hold the slots::

        typedef struct PyModuleDef {
            PyModuleDef_Base m_base;
            const char* m_name;
            const char* m_doc;
            Py_ssize_t m_size;
            PyMethodDef *m_methods;
            PyModuleDesc_Slot* m_slots;  /* changed from `inquiry m_reload;` */
            traverseproc m_traverse;
            inquiry m_clear;
            freefunc m_free;
        } PyModuleDef;

    This would simplify both the implementation and the API, at the expense of renaming a member of PyModuleDef, and re-purposing a function pointer as a data pointer.

Creation Slots
--------------

The following slots affect module creation phase, i.e. they are hooks for ExecutionLoader.create_module. They serve to describe creation of the module object itself.

Py_mod_create
.............

The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDesc *desc)

The function receives a ModuleSpec instance, as defined in PEP 451, and the PyModuleDesc structure. It should return a new module object, or set an error and return NULL.

This function is not responsible for setting import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of types.ModuleType. Any type can be used, as long as it supports setting and getting attributes, including at least the import-related attributes.

If a module instance is returned from Py_mod_create, the import machinery will store a pointer to PyModuleDesc in the module object so that it may be retrieved by PyModule_GetDesc (described later).

.. note::

    If PyModuleDef is used instead of PyModuleDesc, the def is stored instead, to be retrieved by PyModule_GetDef.

Note that when this function is called, the module's entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively) may lead to an infinite loop. Extension authors are advised to keep Py_mod_create minimal, and in particular to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import will fail with ImportError.

Py_mod_statedef
...............

The Py_mod_statedef slot is used to allocate per-module storage for C-level state. The value pointer must point to the following structure::

    typedef struct PyModule_StateDef {
        int size;
        traverseproc traverse;
        inquiry clear;
        freefunc free;
    } PyModule_StateDef;

The meaning of the members is the same as for the corresponding members in PyModuleDef.

Specifying multiple Py_mod_statedef slots, or specifying Py_mod_statedef together with Py_mod_create, will cause the import to fail with ImportError.

.. note::

    If PyModuleDef is reused, this information is taken from PyModuleDef, so the slot is not necessary.

Execution slots
---------------

The following slots affect module "execution" phase, i.e. they are processed in ExecutionLoader.exec_module. They serve to describe how the module is initialized, e.g. how it is populated with functions, types, or constants, and what import-time side effects take place.

These slots may be specified multiple times, and are processed in the order they appear in the slots array.

When using the default import machinery, these slots are processed after import-related attributes specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or ``__loader__``) are set and the module is added to sys.modules.

Py_mod_methods
..............

This slot's value pointer must point to an array of PyMethodDef structures. The specified methods are added to the module, like with PyModuleDef.m_methods.

.. note::

    If PyModuleDef is reused this slot is unnecessary, since methods are already included in PyModuleDef.

Py_mod_exec
...........

The function in this slot must have the signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to setting the module's initial attributes.

The "module" argument receives the module object to initialize. This will always be the module object created from the corresponding PyModuleDesc. When this function is called, import-related attributes (such as ``__spec__``) will have been set, and the module has already been added to sys.modules.

If the Py_mod_exec function replaces the module's entry in sys.modules, the new object will be used and returned by importlib machinery. (This mirrors the behavior of Python modules. Note that for extensions, implementing Py_mod_create is usually a better solution for the use cases this serves.)

The function must return ``0`` on success, or, on error, set an exception and return ``-1``.
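To make the processing order concrete, here is a toy model in plain Python; it is not CPython code. The numeric slot IDs, the ``run_execution_slots`` helper and the use of a dict in place of a PyMethodDef array are all assumptions made for illustration. Only the rules it follows come from the text above: slots run in array order, may repeat, a non-zero Py_mod_exec result aborts the import, and unknown IDs raise ImportError.

```python
import types

# Hypothetical slot IDs; the draft does not fix their numeric values.
PY_MOD_METHODS = 3
PY_MOD_EXEC = 4


def run_execution_slots(module, slots):
    """Toy model of the execution phase: walk the slots in array order."""
    for slot_id, value in slots:
        if slot_id == PY_MOD_METHODS:
            # Stand-in for a PyMethodDef array: a name -> callable mapping.
            for name, func in value.items():
                setattr(module, name, func)
        elif slot_id == PY_MOD_EXEC:
            # Mirrors the C convention: 0 on success, -1 on error.
            if value(module) != 0:
                raise ImportError("execution slot failed")
        else:
            raise ImportError("unknown slot ID %r" % (slot_id,))
    return module


def add_constants(module):
    module.ANSWER = 42
    return 0


def check_methods(module):
    # Runs after the methods slot, so `double` is already present.
    module.DOUBLED = module.double(module.ANSWER)
    return 0


demo = run_execution_slots(
    types.ModuleType("demo"),
    [
        (PY_MOD_EXEC, add_constants),
        (PY_MOD_METHODS, {"double": lambda x: 2 * x}),
        (PY_MOD_EXEC, check_methods),
    ],
)
```

The second exec function depends on both earlier slots, which is the kind of ordering dependency that repeated execution slots make possible.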
Legacy Init ----------- If the PyModuleExport function is not defined, the import machinery will try to initialize the module using the PyInit hook, as described in PEP 3121. If PyModuleExport is defined, PyModuleInit will be ignored. Modules requiring compatibility with previous versions of CPython may implement PyModuleInit in addition to the new hook. Modules using the legacy init API will be initialized entirely in the Loader.create_module step; Loader.exec_module will be a no-op. XXX: Give example code for a backwards-compatible Init based on slots .. note:: If PyModuleDef is reused, implementing the PyInit hook becomes easy: * call PyModule_Create with the PyModuleDef (m_reload was ignored in previous Python versions, so the slots array will be ignored). Alternatively, call the Py_mod_create function (keeping in mind that the spec is not available with PyInit). * call the Py_mod_exec function(s). Subinterpreters and Interpreter Reloading ----------------------------------------- Extensions using the new initialization scheme are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. The mechanism is designed to make this easy, but care is still required on the part of the extension author. No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept in either the module dict, or in the module object's storage reachable by PyModule_GetState. A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes. PyModule_GetDesc ---------------- To retrieve the PyModuleDesc structure used to create a module, a new function will be added:: PyModuleDesc* PyModule_GetDesc(PyObject *module) The function returns NULL if the parameter is not a module object, or was not created using PyModuleDesc. .. 
note:: This is unnecessary if PyModuleDef is reused: the existing PyModule_GetDef can be used instead. Singleton Modules ----------------- Modules defined by PyModuleDef may be registered with PyState_AddModule, and later retrieved with PyState_FindModule. Under the new API, there is no one-to-one mapping between PyModuleSpec and the module created from it. In particular, multiple modules may be loaded from the same description. This means that there is no "global" instance of a module object. Any C-level callbacks that need access to the module state need to be passed a reference to the module object, either directly or indirectly. However, there are some modules that really need to be only loaded once: typically ones that wrap a C library with global state. These modules should set the PyModule_EXPORT_SINGLETON flag in PyModuleDesc.flags. When this flag is set, loading an additional copy of the module after it has been loaded once will return the previously loaded object. This will be done on a low level, using _PyImport_FixupExtensionObject. Additionally, the module will be automatically registered using PyState_AddSingletonModule (see below) after execution slots are processed. Singleton modules can be retrieved, registered or unregistered with the interpreter state using three new functions, which parallel their PyModuleDef counterparts, PyState_FindModule, PyState_AddModule, and PyState_RemoveModule:: PyObject* PyState_FindSingletonModule(PyModuleDesc *desc) int PyState_AddSingletonModule(PyObject *module, PyModuleDesc *desc) int PyState_RemoveSingletonModule(PyModuleDesc *desc) .. note:: If PyModuleDef is used instead of PyModuleDesc, the flag would be specified as a slot with NULL value, i.e. ``{Py_mod_flag_singleton, NULL}``. In this case, PyState_FindModule, PyState_AddModule and PyState_RemoveModule can be used instead of the new functions. .. 
note:: Another possibility is to use PyModuleDef_Base in PyModuleDesc, and have PyState_FindModule and friends work with either of the two structures.

Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names must be encoded to form the PyModuleExport hook name.

For ASCII module names, the import hook is named PyModuleExport_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named PyModuleExportU_ followed by the module name encoded using CPython's "punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix), with hyphens ("-") replaced by underscores ("_").

In Python::

    def export_hook_name(name):
        try:
            encoded = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            encoded = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyModuleExport' + encoded

Examples:

=============  ===========================
Module name    Export hook name
=============  ===========================
spam           PyModuleExport_spam
lančmít        PyModuleExportU_lanmt_2sa6t
スパム         PyModuleExportU_zck5b2b
=============  ===========================

Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and LoadLibraryEx on Windows), it is not generally possible to load a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module are too rare to require all module authors to keep reloading in mind. If reload-like functionality is needed, authors can export a dedicated function for it.

Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can export additional PyModuleExport* symbols besides the one that corresponds to the library's filename.
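For a library bundling several modules, the set of symbols it has to export follows from the naming rule in "Export Hook Name". A minimal sketch, assuming a hypothetical list of module names; the helper restates that rule and catches UnicodeEncodeError, which is the exception ``str.encode`` raises for non-ASCII input:

```python
def export_hook_name(name):
    """Return the export symbol for a module name, per "Export Hook Name"."""
    try:
        encoded = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        encoded = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyModuleExport' + encoded


# Hypothetical set of modules bundled into one shared library.
wanted = ['spam', 'spam_extras', 'lančmít']
symbols = {name: export_hook_name(name).decode('ascii') for name in wanted}

for symbol in symbols.values():
    print(symbol)
```

Every resulting symbol is plain ASCII, so it can be checked against the library's export table with ordinary tools.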
Note that this mechanism can currently only be used to *load* extra modules, not to *find* them.

Given the filesystem location of a shared library and a module name, a module may be loaded with::

    import importlib.machinery
    import importlib.util

    # load_extension is an illustrative wrapper name, not part of importlib.
    def load_extension(name, path):
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        # Execution slots run in exec_module, so call it to finish loading.
        loader.exec_module(module)
        return module

On platforms that support symbolic links, these may be used to install one library under multiple names, exposing all exported modules to normal import machinery.

Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testimportmodexport`` will be created. The library will export several additional modules using the mechanism described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use the old API indefinitely (or until the old API is removed).

The ``_csv`` and ``readline`` modules will be converted to the new API as part of the initial implementation.

Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384, allows later extensions.

Some extension modules export many constants; for example _ssl has a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN", PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef, would reduce boilerplate, and provide free error-checking which is often missing.

String constants and types can be handled similarly. (Note that non-default bases for types cannot be portably specified statically; this case would need a Py_mod_exec function that runs before the slots are added. The free error-checking would still be beneficial, though.)

Another possibility is providing a "main" function that would be run when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take advantage of ModuleSpec-based loading introduced in PEP 451. Also, it will be necessary to add a mechanism for setting up a module according to slots it wasn't originally defined with. Implementation ============== Work-in-progress implementation is available in a Github repository [#gh-repo]_; a patchset is at [#gh-patch]_. Previous Approaches =================== Stefan Behnel's initial proto-PEP [#stefans_protopep]_ had a "PyInit_modulename" hook that would create a module class, whose ``__init__`` would be then called to create the module. This proposal did not correspond to the (then nonexistent) PEP 451, where module creation and initialization is broken into distinct steps. It also did not support loading an extension into pre-existing module objects. Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype implementation [#nicks-prototype]_. At this time PEP 451 was still not implemented, so the prototype does not use ModuleSpec. The original version of this PEP used Create and Exec hooks, and allowed loading into arbitrary pre-constructed objects with Exec hook. The proposal made extension module initialization closer to how Python modules are initialized, but it was later recognized that this isn't an important goal. The current PEP describes a simpler solution. References ========== .. [#lazy_import_concerns] https://mail.python.org/pipermail/python-dev/2013-August/128129.html .. [#pep-0451-attributes] https://www.python.org/dev/peps/pep-0451/#attributes .. [#stefans_protopep] https://mail.python.org/pipermail/python-dev/2013-August/128087.html .. [#nicks-prototype] https://mail.python.org/pipermail/python-dev/2013-August/128101.html .. [#rfc-3492] http://tools.ietf.org/html/rfc3492 .. [#gh-repo] https://github.com/encukou/cpython/commits/pep489 .. 
[#gh-patch] https://github.com/encukou/cpython/compare/master...encukou:pep489.patch Copyright ========= This document has been placed in the public domain. From ncoghlan at gmail.com Thu Apr 16 22:48:11 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 16 Apr 2015 16:48:11 -0400 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: Message-ID: On 16 Apr 2015 07:05, "Petr Viktorin" wrote: > > Hello, > Based on previous discussions, I've just sent a rewritten version of > PEP 489 to the PEP editors. I'm including a copy below. I like the general approach described in this version of the PEP. > There's a question I need help with: go with a new PyModuleDesc > structure, or stick with PyModuleDef and repurpose a currently unused > member. You can find the details in the text itself. I think the PEP makes a decent case for: * m_reload being an essentially unimplementable idea * swapping it out for a slots array pointer being a way to migrate to an extensible approach to module declarations without needing a new struct definition in the C API (and associated new functions to work with that struct). That means I'm inclined to favour the "modify PyModuleDef" variant. > A wart I added is "singleton modules", necessary for > "PyState_FindModule"-like functionality. I wouldn't mind not including > this, but it would mean the new API can't replace all use cases of the > old PyInit_. I think the "wrapping a C library with global state" case is worth being able to declare explicitly. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Fri Apr 17 07:59:58 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 17 Apr 2015 07:59:58 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References:

Message-ID:

Nick Coghlan wrote on 16.04.2015 at 22:48:
> On 16 Apr 2015 07:05, "Petr Viktorin" wrote:
>> There's a question I need help with: go with a new PyModuleDesc
>> structure, or stick with PyModuleDef and repurpose a currently unused
>> member. You can find the details in the text itself.
>
> I think the PEP makes a decent case for:
>
> * m_reload being an essentially unimplementable idea
> * swapping it out for a slots array pointer being a way to migrate to an
>   extensible approach to module declarations without needing a new struct
>   definition in the C API (and associated new functions to work with that
>   struct).
>
> That means I'm inclined to favour the "modify PyModuleDef" variant.

+1, but I think the decision should be done over on python-dev.

Stefan

From stefan_ml at behnel.de Fri Apr 17 08:49:56 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 17 Apr 2015 08:49:56 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading
In-Reply-To:
References:
Message-ID:

Petr Viktorin wrote on 16.04.2015 at 13:05:
> Extension modules that support the new initialization scheme must export
> the public symbol "PyModuleExport_<modulename>", where "modulename"
> is the name of the module. (For modules with non-ASCII names the symbol name
> is slightly different, see "Export Hook Name" below.)
>
> If defined, this symbol must resolve to a C function with the following
> signature::
>
>     PyModuleExport* (*PyModuleExportFunction)(void)
>
> The function must return a pointer to a PyModuleExport structure.
> This structure must be available for the lifetime of the module created from
> it; usually, it will be declared statically.
>
> The PyModuleExport structure describes the new module, similarly to
> PEP 384's PyType_Spec for types.
The structure is defined as:: > > typedef struct { > int slot; > void *value; > } PyModuleDesc_Slot; > > typedef struct { > const char* doc; > int flags; > PyModuleDesc_Slot *slots; > } PyModuleDesc; PyModuleDesc or -Export? > Py_mod_statedef > ............... > > The Py_mod_statedef slot is used to allocate per-module storage for C-level > state. Add a note here that the import machinery will create a normal types.ModuleType instance if this setup is used. However: > .. note:: > > If PyModuleDef is reused, this information is taken from PyModuleDef, > so the slot is not necessary. Yes, I would really prefer reuse here. > Execution slots > --------------- > These slots may be specified multiple times, and are processed in the order > they appear in the slots array. Interesting. Not sure if this is useful for anything, but why not. I don't think it hurts, and it fits the idea of a slot list. > When using the default import machinery, these slots are processed after > import-related attributes specified in PEP 451 [#pep-0451-attributes]_ > (such as ``__name__`` or ``__loader__``) are set and the module is added > to sys.modules. What would be a non-default import machinery? > Py_mod_exec > ........... > The function in this slot must have the signature:: Better: "The value (or entry?) in this slot must point to a function with the following signature". > Legacy Init > ----------- > If the PyModuleExport function is not defined, the import machinery will try to > initialize the module using the PyInit hook, as described in PEP 3121. > > If PyModuleExport is defined, PyModuleInit will be ignored. "PyModuleExportFunction", and also use "PyInit" in the last sentence to avoid ambiguity. "PyModuleInit" isn't a name that's being used anywhere else, I think. (It occurs a couple of times in this PEP.) > Singleton Modules > ----------------- I'll respond to this in a separate email as this seems to need some more discussion. 
> Multiple modules in one library > ------------------------------- > > To support multiple Python modules in one shared library, the library can > export additional PyModuleExport* symbols besides the one that corresponds > to the library's filename. > > Note that this mechanism can currently only be used to *load* extra modules, > not to *find* them. > > Given the filesystem location of a shared library and a module name, > a module may be loaded with:: > > import importlib.machinery > import importlib.util > loader = importlib.machinery.ExtensionFileLoader(name, path) > spec = importlib.util.spec_from_loader(name, loader) > return importlib.util.module_from_spec(spec) > > On platforms that support symbolic links, these may be used to install one > library under multiple names, exposing all exported modules to normal > import machinery. This makes me think, if a module wants to import another one from the same shared library (usually in Py_mod_exec), how would that work? Would it still require a symlink and full module discovery? Or would a call to PyModule_Create(PyModuleDef) be enough? The latter would then have to run the complete module initialisation that this PEP defines. That sounds like the right way to do it, but it should be mentioned somewhere in this PEP, I think. > Testing and initial implementations > ----------------------------------- > The ``_csv`` and ``readline`` modules will be converted to the new API as > part of the initial implementation. Ah, yes, good idea. While not required, I think it's good to have some real example in the stdlib, both for exercising the new implementation and for others to look at. > Possible Future Extensions > ========================== > [...] Good examples. And, if reloading can ever be made to work, maybe just on some platforms, this would be the place to provide the entry point. Nice work overall, thanks! 
Stefan From stefan_ml at behnel.de Fri Apr 17 08:51:34 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 17 Apr 2015 08:51:34 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: Message-ID: Petr Viktorin schrieb am 16.04.2015 um 13:05: > A wart I added is "singleton modules", necessary for > "PyState_FindModule"-like functionality. I wouldn't mind not including > this, but it would mean the new API can't replace all use cases of the > old PyInit_. > > Singleton Modules > ----------------- > > Modules defined by PyModuleDef may be registered with PyState_AddModule, > and later retrieved with PyState_FindModule. > > Under the new API, there is no one-to-one mapping between PyModuleSpec > and the module created from it. > In particular, multiple modules may be loaded from the same description. Is that because a single shared library (which is what the module spec refers to, right?) can contain multiple modules? Or are you referring to something else here? > This means that there is no "global" instance of a module object. > Any C-level callbacks that need access to the module state need to be passed > a reference to the module object, either directly or indirectly. > > > However, there are some modules that really need to be only loaded once: > typically ones that wrap a C library with global state. > These modules should set the PyModule_EXPORT_SINGLETON flag > in PyModuleDesc.flags. When this flag is set, loading an additional > copy of the module after it has been loaded once will return the previously > loaded object. > This will be done on a low level, using _PyImport_FixupExtensionObject. > Additionally, the module will be automatically registered using > PyState_AddSingletonModule (see below) after execution slots are processed. 
>
> Singleton modules can be retrieved, registered or unregistered with
> the interpreter state using three new functions, which parallel their
> PyModuleDef counterparts, PyState_FindModule, PyState_AddModule,
> and PyState_RemoveModule::
>
>     PyObject* PyState_FindSingletonModule(PyModuleDesc *desc)
>     int PyState_AddSingletonModule(PyObject *module, PyModuleDesc *desc)
>     int PyState_RemoveSingletonModule(PyModuleDesc *desc)
>
>
> .. note::
>
>     If PyModuleDef is used instead of PyModuleDesc, the flag would be specified
>     as a slot with NULL value, i.e. ``{Py_mod_flag_singleton, NULL}``.
>     In this case, PyState_FindModule, PyState_AddModule and
>     PyState_RemoveModule can be used instead of the new functions.
>
> .. note::
>
>     Another possibility is to use PyModuleDef_Base in PyModuleDesc, and
>     have PyState_FindModule and friends work with either of the two structures.

Yes, this is totally a wart. However, I'm not sure I understand the actual
use case from what is written above. Could you clarify why this whole
special case is needed?

Stefan

From encukou at gmail.com Fri Apr 17 10:40:00 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Fri, 17 Apr 2015 10:40:00 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading
In-Reply-To:
References:
Message-ID: <5530C6E0.2020401@gmail.com>

On 04/17/2015 08:49 AM, Stefan Behnel wrote:
> Petr Viktorin wrote on 16.04.2015 at 13:05:
>> Extension modules that support the new initialization scheme must export
>> the public symbol "PyModuleExport_<modulename>", where "modulename"
>> is the name of the module. (For modules with non-ASCII names the symbol name
>> is slightly different, see "Export Hook Name" below.)
>>
>> If defined, this symbol must resolve to a C function with the following
>> signature::
>>
>>     PyModuleExport* (*PyModuleExportFunction)(void)
>>
>> The function must return a pointer to a PyModuleExport structure.
>> This structure must be available for the lifetime of the module created from
>> it;
>> usually, it will be declared statically.
>>
>> The PyModuleExport structure describes the new module, similarly to
>> PEP 384's PyType_Spec for types. The structure is defined as::
>>
>>     typedef struct {
>>         int slot;
>>         void *value;
>>     } PyModuleDesc_Slot;
>>
>>     typedef struct {
>>         const char* doc;
>>         int flags;
>>         PyModuleDesc_Slot *slots;
>>     } PyModuleDesc;
>
> PyModuleDesc or -Export?

Sorry, missed this. Export is a better name, I'll do a replace.

>> Py_mod_statedef
>> ...............
>>
>> The Py_mod_statedef slot is used to allocate per-module storage for C-level
>> state.
>
> Add a note here that the import machinery will create a normal
> types.ModuleType instance if this setup is used. However:
>
>
>> .. note::
>>
>>     If PyModuleDef is reused, this information is taken from PyModuleDef,
>>     so the slot is not necessary.
>
> Yes, I would really prefer reuse here.

I'll fix the PEP's minor issues, and take this point to python-dev for
discussion.

>> Execution slots
>> ---------------
>> These slots may be specified multiple times, and are processed in the order
>> they appear in the slots array.
>
> Interesting. Not sure if this is useful for anything, but why not. I don't
> think it hurts, and it fits the idea of a slot list.

This is an easy-to-understand answer to the question "in what order do
things happen?", which is bound to come up if more advanced slots are added.
Perhaps I should write out a rule of thumb this implies: "always put
Py_mod_exec last".
As for allowing repeated slots: it certainly doesn't hurt, and I think
checking for this case is unnecessary, as is making it undefined behavior.

>> When using the default import machinery, these slots are processed after
>> import-related attributes specified in PEP 451 [#pep-0451-attributes]_
>> (such as ``__name__`` or ``__loader__``) are set and the module is added
>> to sys.modules.
>
> What would be a non-default import machinery?

Directly calling Loader.create_module/Loader.exec_module, or other such
lower-level tools.

>> Py_mod_exec
>> ...........
>> The function in this slot must have the signature::
>
> Better: "The value (or entry?) in this slot must point to a function with
> the following signature".

OK

>> Legacy Init
>> -----------
>> If the PyModuleExport function is not defined, the import machinery will try to
>> initialize the module using the PyInit hook, as described in PEP 3121.
>>
>> If PyModuleExport is defined, PyModuleInit will be ignored.
>
> "PyModuleExportFunction", and also use "PyInit" in the last sentence to
> avoid ambiguity. "PyModuleInit" isn't a name that's being used anywhere
> else, I think. (It occurs a couple of times in this PEP.)

OK

>> Multiple modules in one library
>> -------------------------------
>>
>> To support multiple Python modules in one shared library, the library can
>> export additional PyModuleExport* symbols besides the one that corresponds
>> to the library's filename.
>>
>> Note that this mechanism can currently only be used to *load* extra modules,
>> not to *find* them.
>>
>> Given the filesystem location of a shared library and a module name,
>> a module may be loaded with::
>>
>>     import importlib.machinery
>>     import importlib.util
>>     loader = importlib.machinery.ExtensionFileLoader(name, path)
>>     spec = importlib.util.spec_from_loader(name, loader)
>>     return importlib.util.module_from_spec(spec)
>>
>> On platforms that support symbolic links, these may be used to install one
>> library under multiple names, exposing all exported modules to normal
>> import machinery.
>
> This makes me think, if a module wants to import another one from the same
> shared library (usually in Py_mod_exec), how would that work? Would it
> still require a symlink and full module discovery? Or would a call to
> PyModule_Create(PyModuleDef) be enough? The latter would then have to run
> the complete module initialisation that this PEP defines.
> That sounds like
> the right way to do it, but it should be mentioned somewhere in this PEP, I
> think.

No, PyModule_Create can't call the whole import machinery. It doesn't have
the ModuleSpec to pass to the create slot. Maybe I should make that
explicit, and have PyModule_Create fail when the def contains slots.

As for the use case, I'm inclined to leave it out of this PEP. I'm
interested in exploring it later, maybe adding slots for submodules to
allow C libraries. But now it seems that this use case is mainly interesting
for Cython, and there you can do the equivalent of the Python code above.

>> Testing and initial implementations
>> -----------------------------------
>> The ``_csv`` and ``readline`` modules will be converted to the new API as
>> part of the initial implementation.
>
> Ah, yes, good idea. While not required, I think it's good to have some real
> example in the stdlib, both for exercising the new implementation and for
> others to look at.
>
>
>> Possible Future Extensions
>> ==========================
>> [...]
>
> Good examples. And, if reloading can ever be made to work, maybe just on
> some platforms, this would be the place to provide the entry point.
>
> Nice work overall, thanks!
>
> Stefan

From pviktori at redhat.com Fri Apr 17 12:33:38 2015
From: pviktori at redhat.com (Petr Viktorin)
Date: Fri, 17 Apr 2015 12:33:38 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading
In-Reply-To:
References:
Message-ID: <5530E182.6040804@redhat.com>

On 04/17/2015 08:51 AM, Stefan Behnel wrote:
> Petr Viktorin wrote on 16.04.2015 at 13:05:
>> A wart I added is "singleton modules", necessary for
>> "PyState_FindModule"-like functionality. I wouldn't mind not including
>> this, but it would mean the new API can't replace all use cases of the
>> old PyInit_.
>>
>> Singleton Modules
>> -----------------
>>
>> Modules defined by PyModuleDef may be registered with PyState_AddModule,
>> and later retrieved with PyState_FindModule.
>> >> Under the new API, there is no one-to-one mapping between PyModuleSpec >> and the module created from it. >> In particular, multiple modules may be loaded from the same description. > > Is that because a single shared library (which is what the module spec > refers to, right?) can contain multiple modules? Or are you referring to > something else here? By using Loader.create_module/Loader.exec_module directly, you can load an extension module without adding it to sys.modules. You can do this as many times as you like, and you always get a new, independent module object. >> This means that there is no "global" instance of a module object. >> Any C-level callbacks that need access to the module state need to be passed >> a reference to the module object, either directly or indirectly. >> >> >> However, there are some modules that really need to be only loaded once: >> typically ones that wrap a C library with global state. >> These modules should set the PyModule_EXPORT_SINGLETON flag >> in PyModuleDesc.flags. When this flag is set, loading an additional >> copy of the module after it has been loaded once will return the previously >> loaded object. >> This will be done on a low level, using _PyImport_FixupExtensionObject. >> Additionally, the module will be automatically registered using >> PyState_AddSingletonModule (see below) after execution slots are processed. >> >> Singleton modules can be retrieved, registered or unregistered with >> the interpreter state using three new functions, which parallel their >> PyModuleDef counterparts, PyState_FindModule, PyState_AddModule, >> and PyState_RemoveModule:: >> >> PyObject* PyState_FindSingletonModule(PyModuleDesc *desc) >> int PyState_AddSingletonModule(PyObject *module, PyModuleDesc *desc) >> int PyState_RemoveSingletonModule(PyModuleDesc *desc) >> >> >> .. note:: >> >> If PyModuleDef is used instead of PyModuleDesc, the flag would be specified >> as a slot with NULL value, i.e. 
>>     ``{Py_mod_flag_singleton, NULL}``.
>>     In this case, PyState_FindModule, PyState_AddModule and
>>     PyState_RemoveModule can be used instead of the new functions.
>>
>> .. note::
>>
>>     Another possibility is to use PyModuleDef_Base in PyModuleDesc, and
>>     have PyState_FindModule and friends work with either of the two structures.
>
> Yes, this is totally a wart. However, I'm not sure I understand the actual
> use case from what is written above. Could you clarify why this whole
> special case is needed?

Normally, you need to pass the module object to any C-level callbacks that
need the module state in any way. Since in the new scheme of things there
may be multiple modules, you can't just attach a module object to
interpreter state and then look it up.

However, consider wrapping a C library with global state. The library might
not allow you to pass arbitrary data to your callbacks, so there's no proper
way to get to the module object. So you want to load the module only once,
returning the same object when it's created from the same slots, and you
want a way to get to your module object from anywhere. That's what Python
does currently, with PyState_FindModule for finding the module.

Well, that's the use case as I understand it. It would read a bit better if
PyModuleDef is reused; in that variant it's a way to keep
PyState_FindModule working.

The more I look at this though, the more I see using PyState_FindModule as
something that should just be discontinued when converting a module to the
new API. Perhaps it'll be better to remove the flag; there's always a
possibility to add it in the future.

--
Petr Viktorin

From barry at python.org Wed Apr 22 17:59:59 2015
From: barry at python.org (Barry Warsaw)
Date: Wed, 22 Apr 2015 11:59:59 -0400
Subject: [Import-SIG] Dabeaz's weird import discovery
Message-ID: <20150422115959.1ff2ee58@limelight.wooz.org>

So I've been trying to catch up on PyCon 2015 videos.
David Beazley is always entertaining so I figured I'd spend a little time on
his three hour tour of modules and packages:

https://www.youtube.com/watch?v=0oTh1CXRaQ0

About half an hour in, I got shipwrecked on an oddity of the import system.
That it surprised dabeaz too gave me some satisfaction, and like a Professor I
got curious and did some experimentation. (ObMoratorium from here out:
Gilligan's Island reference.)

The weirdness is evident in asyncio/__init__.py where you have a bunch of
explicit relative from-import-*'s and then seemingly out of nowhere, __all__
makes references to the named submodules.

That's damn surprising if you understand how name bindings happen in import
statements, which I thought I did. ;). There's no explicit name binding to
those submodules so that __all__ should throw NameErrors. E.g. in

    from string import *

you don't expect, nor do you get, 'string' bound in the current namespace.
Yet when asyncio/__init__.py does

    from .base_events import *

you *do* get a name binding for 'base_events'.

David notes the weirdness in his talk, but his explanation was unsatisfying.
Let's look at a short example:

-----snip snip-----
spam/
    __init__.py
    foo.py
    bar.py

spam/__init__.py
================
from .foo import *
print(foo)
from .bar import *
print(bar)
__all__ = foo.__all__ + bar.__all__

spam/foo.py
===========
print('foo')
__all__ = ['Foo']
class Foo: pass

spam/bar.py
===========
print('bar')
__all__ = ['Bar']
class Bar: pass

$ python3
>>> from spam import *
foo
bar
-----snip snip-----

As it turns out, it's not the from-import-* that does the name binding, it's
the importing of submodules. Use any other submodule import spelling to make
it work. This includes

    import spam.foo
    from spam.foo import Foo
    __import__('spam.foo')
    importlib.import_module('spam.foo')

Poking around in Lib/importlib/_bootstrap.py, I think you can see where this
happens.
In _find_and_load_unlocked(), 'round about line 2224 (in 3.5's
hg:95593), you see this:

    if parent:
        # Set the module as an attribute on its parent.
        parent_module = sys.modules[parent]
        setattr(parent_module, name.rpartition('.')[2], module)

It's clearly intentional, and fundamental to importlib so I don't think it's
dependent on finder or loader. No matter how it happens, if a submodule is
imported, its parent namespace gets a name binding to the submodule.

What was the motivation for this? Was the intent really to bind submodule
names in the parent module seemingly magically?

AFAICT, this also isn't actually documented anywhere. I've looked in the
Language Reference under the import system[*] and the import statement, and
in the Library Reference under __import__(), without finding it. There's lots
of material here, so I could be missing it.

I don't know whether any of the alternative implementations also implement
this behavior, but they'll have to.

I think this needs to be documented in the Language Reference, and after some
feedback here, I'll open a docs bug and write some text to fix it.

Cheers,
-Barry

[*] which I wrote, and I'm still surprised! :)

From ethan at stoneleaf.us Wed Apr 22 18:21:53 2015
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 22 Apr 2015 09:21:53 -0700
Subject: [Import-SIG] Dabeaz's weird import discovery
In-Reply-To: <20150422115959.1ff2ee58@limelight.wooz.org>
References: <20150422115959.1ff2ee58@limelight.wooz.org>
Message-ID: <20150422162153.GF4073@stoneleaf.us>

On 04/22, Barry Warsaw wrote:
> Poking around in Lib/importlib/_bootstrap.py, I think you can see where this
> happens. In _find_and_load_unlocked(), 'round about line 2224 (in 3.5's
> hg:95593), you see this:
>
>     if parent:
>         # Set the module as an attribute on its parent.
> parent_module = sys.modules[parent] > setattr(parent_module, name.rpartition('.')[2], module) > > It's clearly intentional, and fundamental to importlib so I don't think it's > dependent on finder or loader. No matter how it happens, if a submodule is > imported, its parent namespace gets a name binding to the submodule. I see that it's intentional, but why is it fundamental? > What was the motivation for this? Was the intent really to bind submodule > names in the parent module seemingly magically? I have to say I don't care for this. In the cases where I'm importing sub- modules into __init__ I'm usually trying to keep a neat namespace, and with this behavior I'll have more work to do... although probably not a big deal with __all__ in the picture, so I could live with it if it is indeed "fundamental". ;) -- ~Ethan~ From fwierzbicki at gmail.com Wed Apr 22 18:36:39 2015 From: fwierzbicki at gmail.com (fwierzbicki at gmail.com) Date: Wed, 22 Apr 2015 09:36:39 -0700 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: <20150422115959.1ff2ee58@limelight.wooz.org> References: <20150422115959.1ff2ee58@limelight.wooz.org> Message-ID: On Wed, Apr 22, 2015 at 8:59 AM, Barry Warsaw wrote: > I don't know whether any of the alternative implementations also implement > this behavior, but they'll have to. Jython 2.7 does anyway. I think there are unit tests in the standard library that verify this behavior that I had to get working. -Frank From barry at python.org Wed Apr 22 19:09:42 2015 From: barry at python.org (Barry Warsaw) Date: Wed, 22 Apr 2015 13:09:42 -0400 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: <20150422162153.GF4073@stoneleaf.us> References: <20150422115959.1ff2ee58@limelight.wooz.org> <20150422162153.GF4073@stoneleaf.us> Message-ID: <20150422130942.12535e19@limelight.wooz.org> On Apr 22, 2015, at 09:21 AM, Ethan Furman wrote: >I see that it's intentional, but why is it fundamental? 
Fundamental in the sense that it's the importlib machinery, not any particular loader, that does the name binding. It's not fundamental in that the import machinery *has* to work this way. In fact, if I was reviewing the code today, I'd red flag this. >> What was the motivation for this? Was the intent really to bind submodule >> names in the parent module seemingly magically? > >I have to say I don't care for this. In the cases where I'm importing sub- >modules into __init__ I'm usually trying to keep a neat namespace, and with >this behavior I'll have more work to do... although probably not a big deal >with __all__ in the picture, so I could live with it if it is indeed >"fundamental". ;) It's certainly surprising! I don't think it saves much typing. If it wasn't already relied upon by existing code, I'd suggest removing it. Clearly there was a reason why it was added, although there's no clue that I can find as to what that reason was. If it's just a quirk of importlib that we have to live with now, let's at least document it. Cheers, -Barry From guido at python.org Wed Apr 22 19:15:57 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Apr 2015 10:15:57 -0700 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: <20150422115959.1ff2ee58@limelight.wooz.org> References: <20150422115959.1ff2ee58@limelight.wooz.org> Message-ID: On Wed, Apr 22, 2015 at 8:59 AM, Barry Warsaw wrote: > [...] > As it turns out, it's not the from-import-* that does the name binding, > it's > the importing of submodules. Use any other submodule import spelling to > make > it work. This includes > > import spam.foo > from spam.foo import Foo > __import__('spam.foo') > importlib.import_module('spam.foo') > > Poking around in Lib/importlib/_bootstrap.py, I think you can see where > this > happens. In _find_and_load_unlocked(), 'round about line 2224 (in 3.5's > hg:95593), you see this: > > if parent: > # Set the module as an attribute on its parent. 
>         parent_module = sys.modules[parent]
>         setattr(parent_module, name.rpartition('.')[2], module)
>
> It's clearly intentional, and fundamental to importlib so I don't think it's
> dependent on finder or loader. No matter how it happens, if a submodule is
> imported, its parent namespace gets a name binding to the submodule.
>
> What was the motivation for this? Was the intent really to bind submodule
> names in the parent module seemingly magically?
>
> AFAICT, this also isn't actually documented anywhere. I've looked in the
> Language Reference under the import system[*], and import statement, nor in
> the Library Reference under __import__(). There's lots of material here, so I
> could be missing it.
>
> I don't know whether any of the alternative implementations also implement
> this behavior, but they'll have to.
>
> I think this needs to be documented in the Language Reference, and after some
> feedback here, I'll open a docs bug and write some text to fix it.
>

It's definitely intentional, and it's fundamental to the package import
design. We've had many implementations of package import (remember "ni.py"?
last seen as "knee.py") and it was always there, because this is done as
part of *submodule loading*. For better or for worse (and because I didn't
know Java at the time :-) Python declares that if you write `import
foo.bar` then later in your code you can use `foo.bar` to refer to the
bar submodule of package foo. And the way this is done is to make each
submodule an attribute of its parent package. This is done when the
submodule is first loaded, and because of the strict separation between
loading and importing, it is done no matter what form of import was used to
load bar.

I guess another thing to realize is that the globals of __init__.py are
also the attribute namespace of the package.

I'm not surprised it's not in the reference manual -- that hasn't been updated
thoroughly in ages, and I sometimes cry when I see it.
:-) So please do clarify this for the benefit of future implementers. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Apr 22 19:21:44 2015 From: barry at python.org (Barry Warsaw) Date: Wed, 22 Apr 2015 13:21:44 -0400 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: References: <20150422115959.1ff2ee58@limelight.wooz.org> Message-ID: <20150422132144.5cd4ae8d@limelight.wooz.org> On Apr 22, 2015, at 10:15 AM, Guido van Rossum wrote: >I'm not surprised it's in the reference manual -- that hasn't been updated >thoroughly in ages, and I sometimes cry when I see it. :-) So please do >clarify this for the benefit of future implementers. http://bugs.python.org/issue24029 Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From ericsnowcurrently at gmail.com Wed Apr 22 19:33:34 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 22 Apr 2015 11:33:34 -0600 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: References: <20150422115959.1ff2ee58@limelight.wooz.org> Message-ID: On Wed, Apr 22, 2015 at 11:15 AM, Guido van Rossum wrote: > It's definitely intentional, and it's fundamental to the package import > design. We've had many implementations of package import (remember "ni.py"? > last seen as "knee.py") and it was always there, because this is done as > part of *submodule loading*. For better or for worse (and because I didn't > know Java at the time :-) Python declares that if you write `import foo.bar` > then later in your code you can use `foo.bar` to reference to the bar > submodule of package foo. And the way this is done is to make each submodule > an attribute of its parent package. 
This is done when the submodule is first > loaded, and because of the strict separation between loading and importing, > it is done no matter what form of import was used to load bar. Exactly. "import spam.eggs; spam.eggs" looks up "spam" and then its "eggs" attribute, so "eggs" has to be bound during the import. The surprising part is that it also happens for explicit relative imports. I'm guessing that part was unintentional and simply not noticed when PEP 328 was implemented. > > I guess another thing to realize is that the globals of __init__.py are also > the attribute namespace of the package. Do you think this is confusing for anyone? It seems obvious to me, but I'm pretty familiar with the import system. :) -eric From guido at python.org Wed Apr 22 19:37:11 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Apr 2015 10:37:11 -0700 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: References: <20150422115959.1ff2ee58@limelight.wooz.org> Message-ID: On Wed, Apr 22, 2015 at 10:33 AM, Eric Snow wrote: > On Wed, Apr 22, 2015 at 11:15 AM, Guido van Rossum > wrote: > > It's definitely intentional, and it's fundamental to the package import > > design. We've had many implementations of package import (remember > "ni.py"? > > last seen as "knee.py") and it was always there, because this is done as > > part of *submodule loading*. For better or for worse (and because I > didn't > > know Java at the time :-) Python declares that if you write `import > foo.bar` > > then later in your code you can use `foo.bar` to reference to the bar > > submodule of package foo. And the way this is done is to make each > submodule > > an attribute of its parent package. This is done when the submodule is > first > > loaded, and because of the strict separation between loading and > importing, > > it is done no matter what form of import was used to load bar. > > Exactly. 
"import spam.eggs; spam.eggs" looks up "spam" and then its > "eggs" attribute, so "eggs" has to be bound during the import. > > The surprising part is that it also happens for explicit relative > imports. I'm guessing that part was unintentional and simply not > noticed when PEP 328 was implemented. > No, that must also have been intentional, because even when you use relative import, the module you imported knows its full name, and that full name is used as its key in sys.modules. If someone else uses absolute import for the same module they should still get the same module object. > > I guess another thing to realize is that the globals of __init__.py are > also > > the attribute namespace of the package. > > Do you think this is confusing for anyone? It seems obvious to me, > but I'm pretty familiar with the import system. :) > When you look at it from a different angle it's totally obvious. But apparently it surprised Dave and Barry. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Apr 22 19:52:40 2015 From: barry at python.org (Barry Warsaw) Date: Wed, 22 Apr 2015 13:52:40 -0400 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: References: <20150422115959.1ff2ee58@limelight.wooz.org>

Message-ID: <20150422135240.2acd9ad4@limelight.wooz.org>

On Apr 22, 2015, at 10:37 AM, Guido van Rossum wrote:
>On Wed, Apr 22, 2015 at 10:33 AM, Eric Snow wrote:
>> The surprising part is that it also happens for explicit relative
>> imports. I'm guessing that part was unintentional and simply not
>> noticed when PEP 328 was implemented.
>
>No, that must also have been intentional, because even when you use
>relative import, the module you imported knows its full name, and that full
>name is used as its key in sys.modules. If someone else uses absolute
>import for the same module they should still get the same module object.
>
>> > I guess another thing to realize is that the globals of __init__.py are also
>> > the attribute namespace of the package.
>>
>> Do you think this is confusing for anyone? It seems obvious to me,
>> but I'm pretty familiar with the import system. :)
>>
>When you look at it from a different angle it's totally obvious. But
>apparently it surprised Dave and Barry.

No, that part I get. I remember the discussions around this back in the
PythonLabs days. :)

The surprising part was the effect of explicit relative imports on the
namespace of the parent, but given your explanation above, it now makes
sense. It's still surprising given Python's other name binding rules. Maybe
as surprising (to some) as the fact that "import os" magically gives you
"os.path" :).

It'll be easy to document though.

Cheers,
-Barry

From ericsnowcurrently at gmail.com Wed Apr 22 19:54:35 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 22 Apr 2015 11:54:35 -0600
Subject: [Import-SIG] Dabeaz's weird import discovery
In-Reply-To:
References: <20150422115959.1ff2ee58@limelight.wooz.org>

Message-ID: On Wed, Apr 22, 2015 at 11:37 AM, Guido van Rossum wrote: > On Wed, Apr 22, 2015 at 10:33 AM, Eric Snow > wrote: >> The surprising part is that it also happens for explicit relative >> imports. I'm guessing that part was unintentional and simply not >> noticed when PEP 328 was implemented. > > > No, that must also have been intentional, because even when you use relative > import, the module you imported knows its full name, and that full name is > used as its key in sys.modules. If someone else uses absolute import for the > same module they should still get the same module object. I see what you're saying. __name__ and sys.modules definitely need to reflect the fully resolved name. I just mean that for relative import there is no need to bind the submodule to the parent, is there? I'd consider it a code smell if I saw a module that imports just the parent and expecting the submodule to be bound there due to an import in yet another module. EIBTI. That's not specific to relative imports, but it does demonstrate the only case I can think of where someone might be relying on this behavior of relative imports. That's why I think it was unintentional. Regardless, there's nothing to be done at this point besides document the behavior. :) -eric From guido at python.org Wed Apr 22 20:06:06 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 22 Apr 2015 11:06:06 -0700 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: References: <20150422115959.1ff2ee58@limelight.wooz.org>

Message-ID: On Wed, Apr 22, 2015 at 10:54 AM, Eric Snow wrote: > On Wed, Apr 22, 2015 at 11:37 AM, Guido van Rossum > wrote: > > On Wed, Apr 22, 2015 at 10:33 AM, Eric Snow > > > wrote: > >> The surprising part is that it also happens for explicit relative > >> imports. I'm guessing that part was unintentional and simply not > >> noticed when PEP 328 was implemented. > > > > > > No, that must also have been intentional, because even when you use > relative > > import, the module you imported knows its full name, and that full name > is > > used as its key in sys.modules. If someone else uses absolute import for > the > > same module they should still get the same module object. > > I see what you're saying. __name__ and sys.modules definitely need to > reflect the fully resolved name. I just mean that for relative import > there is no need to bind the submodule to the parent, is there? But there *is* a reason. The submodule must still be an attribute of the parent package, because of the invariant that if you have sys.modules['foo'] and sys.modules['foo.bar'], the latter must appear as the 'bar' attribute of the former. This is an invariant of module loading, and (I feel I'm repeating myself) the form of import used does not affect loading. > I'd > consider it a code smell if I saw a module that imports just the > parent and expecting the submodule to be bound there due to an import > in yet another module. It is indeed a code smell, and linters should point it out. (Unless this is a documented part of the contract of the parent package, like it is for os and os.path.) But it *is* required that if you are sure you have loaded foo.bar, bar is an attribute of foo. There is another import rule that is relevant here; while `import foo.bar` requires that bar is a module, `from foo import bar` accepts bar as any attribute that exists in foo. 
The rule is then that after `import foo.bar`, if you later do
`from foo import bar`, it will use the bar attribute, and that will be the
bar submodule. (Unless foo's __init__.py messed around with it.)

> EIBTI. That's not specific to relative
> imports, but it does demonstrate the only case I can think of where
> someone might be relying on this behavior of relative imports. That's
> why I think it was unintentional.
>
> Regardless, there's nothing to be done at this point besides document
> the behavior. :)
>

I expect if we redesigned it from first principles we'd end up with the same
rules in this case.

--
--Guido van Rossum (python.org/~guido)

From ericsnowcurrently at gmail.com Wed Apr 22 20:24:42 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 22 Apr 2015 12:24:42 -0600
Subject: [Import-SIG] Dabeaz's weird import discovery
In-Reply-To:
References: <20150422115959.1ff2ee58@limelight.wooz.org>

Message-ID:

On Wed, Apr 22, 2015 at 12:06 PM, Guido van Rossum wrote:
> On Wed, Apr 22, 2015 at 10:54 AM, Eric Snow
>> I see what you're saying. __name__ and sys.modules definitely need to
>> reflect the fully resolved name. I just mean that for relative import
>> there is no need to bind the submodule to the parent, is there?
>
>
> But there *is* a reason. The submodule must still be an attribute of the
> parent package, because of the invariant that if you have sys.modules['foo']
> and sys.modules['foo.bar'], the latter must appear as the 'bar' attribute of
> the former. This is an invariant of module loading, and (I feel I'm
> repeating myself) the form of import used does not affect loading.

Right. That invariant is the only reason the behavior should apply to
relative imports. I just hadn't considered that as a general
invariant until this conversation. :)

> There is another import rule that is relevant here; while `import foo.bar`
> requires that bar is a module, `from foo import bar` accepts bar as any
> attribute that exists in foo. The rule is then that after `import foo.bar`,
> if you later do `from foo import bar`, it will use the bar attribute, and
> that will be the bar submodule. (Unless foo's __init__.py messed around
> with it.)

Yep. The code to support this ("fromlist") is some of Brett's
favorite. I believe he describes it as "hell". :)

>
>>
>> EIBTI. That's not specific to relative
>> imports, but it does demonstrate the only case I can think of where
>> someone might be relying on this behavior of relative imports. That's
>> why I think it was unintentional.
>>
>> Regardless, there's nothing to be done at this point besides document
>> the behavior. :)
>
>
> I expect if we redesigned it from first principles we'd end up with the same
> rules in this case.

Fair enough.
:) -eric From encukou at gmail.com Fri Apr 24 12:19:25 2015 From: encukou at gmail.com (Petr Viktorin) Date: Fri, 24 Apr 2015 12:19:25 +0200 Subject: [Import-SIG] Singleton modules (Re: PEP 489: Redesigning extension module loading) In-Reply-To: <5530E182.6040804@redhat.com> References: <5530E182.6040804@redhat.com> Message-ID: <553A18AD.1020001@gmail.com> On 04/17/2015 12:33 PM, Petr Viktorin wrote: > On 04/17/2015 08:51 AM, Stefan Behnel wrote: >> Petr Viktorin schrieb am 16.04.2015 um 13:05: >>> A wart I added is "singleton modules", necessary for >>> "PyState_FindModule"-like functionality. I wouldn't mind not including >>> this, but it would mean the new API can't replace all use cases of the >>> old PyInit_. >>> >>> Singleton Modules >>> ----------------- >>> >>> Modules defined by PyModuleDef may be registered with PyState_AddModule, >>> and later retrieved with PyState_FindModule. >>> >>> Under the new API, there is no one-to-one mapping between PyModuleSpec >>> and the module created from it. >>> In particular, multiple modules may be loaded from the same description. >> >> Is that because a single shared library (which is what the module spec >> refers to, right?) can contain multiple modules? Or are you referring to >> something else here? > > By using Loader.create_module/Loader.exec_module directly, you can load > an extension module without adding it to sys.modules. You can do this as > many times as you like, and you always get a new, independent module > object. > > >>> This means that there is no "global" instance of a module object. >>> Any C-level callbacks that need access to the module state need to be >>> passed >>> a reference to the module object, either directly or indirectly. >>> >>> >>> However, there are some modules that really need to be only loaded once: >>> typically ones that wrap a C library with global state. >>> These modules should set the PyModule_EXPORT_SINGLETON flag >>> in PyModuleDesc.flags. 
When this flag is set, loading an additional
>>> copy of the module after it has been loaded once will return the
>>> previously loaded object.
>>> This will be done on a low level, using _PyImport_FixupExtensionObject.
>>> Additionally, the module will be automatically registered using
>>> PyState_AddSingletonModule (see below) after execution slots are
>>> processed.
>>>
>>> Singleton modules can be retrieved, registered or unregistered with
>>> the interpreter state using three new functions, which parallel their
>>> PyModuleDef counterparts, PyState_FindModule, PyState_AddModule,
>>> and PyState_RemoveModule::
>>> [...]
>>
>> Yes, this is totally a wart. However, I'm not sure I understand the
>> actual use case from what is written above. Could you clarify why this
>> whole special case is needed?
>
> Normally, you need to pass module object to any C-level callbacks that
> need the module state in any way. Since in the new scheme of things
> there may be multiple modules, you can't just attach a module object to
> interpreter state and then look it up.
>
> However, consider wrapping a C library with global state. The library
> might not allow you to pass arbitrary data to your callbacks, so there's
> no proper way to get to the module object.
> So you want to load the module only once, returning the same object when
> it's created from the same slots, and a way to get to your module object
> from anywhere. That's what Python does currently, with
> PyState_FindModule for finding the module.
>
> Well, that's the use case as I understand it.
>
> It would read a bit better if PyModule_Def is reused; in that variant
> it's a way to keep PyState_FindModule working. The more I look at this
> though, the more I see using PyState_FindModule as something that should
> just be discontinued when converting a module to the new API.
>
> Perhaps it'll be better to remove the flag; there's always a possibility
> to add it in the future.

The technical reason for EXPORT_SINGLETON is to allow PyState_FindModule.
I've looked into PyState_FindModule usage in the stdlib, and I think
that rather than adding support for it in PEP 489, use cases for it
should be removed, and I don't think good solutions have much to do
with the loading mechanism.
In the meantime, modules that need PyState_FindModule can stick with
PyInit, and if it turns out it's really needed, the flag/slot can be
added at any time.

That's the tl;dr version; I'll give details on my reasoning later.
The upshot is: not all modules can be ported to PEP 489 (which would
become just the first iteration of the new module loading mechanism).
This includes "_csv" and "readline", which I picked to port as part of
the initial implementation, so I'll pick other ones. (I did pick them
precisely because they do complex things.)

The details:

I saw PyState_FindModule used in four scenarios.

First is for modules that wrap a library with global state, or rather
for C callbacks that don't get some state argument.
For example, the readline library's rl_startup_hook is called with no
arguments, so the wrapping module needs to look into global state to
select Python code to run.
This is where some kind of "singleton modules" would be useful.
However, only one readline module can work correctly in a given
*process*. A singleton mechanism would not only need to prevent loading
such a module multiple times in an interpreter, but also, somehow,
across sub-interpreters.
Solving this requires designing how readline (and others) should behave
in the face of multiple interpreters. I don't think that is a job for
PEP 489, and, since using the PEP 489 mechanism is supposed to mean the
module does support subinterpreters, I now think providing singleton
module support in PEP 489 is, at best, premature.
If something like it does need to be added in the future, it will need
better semantics than my current proposal.

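To make the first scenario concrete, here is a rough pure-Python sketch of the problem, not actual readline code. FakeLibrary and WrapperModule are hypothetical stand-ins for a C library with one process-wide hook slot and for one loaded copy of a wrapping extension module. Because the hook is called with no state argument, two copies of the wrapper cannot both own it:

```python
class FakeLibrary:
    """Hypothetical stand-in for a C library with a no-argument hook."""

    startup_hook = None  # one slot per process, like a C global variable

    @classmethod
    def run(cls):
        # The library calls the hook without passing any state along.
        return cls.startup_hook() if cls.startup_hook else None


class WrapperModule:
    """Hypothetical stand-in for one loaded copy of a wrapping module."""

    def __init__(self, name):
        self.name = name

    def set_startup_hook(self, func):
        # Installing a hook overwrites whatever another copy installed.
        FakeLibrary.startup_hook = func


first = WrapperModule("copy 1")
second = WrapperModule("copy 2")
first.set_startup_hook(lambda: "hook from copy 1")
second.set_startup_hook(lambda: "hook from copy 2")

result = FakeLibrary.run()
print(result)  # hook from copy 2
```

Only the last installed hook survives, which is why the question is about the whole process and not about a single interpreter.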
The second use of PyState_FindModule is in module-level functions, which
(in Python 3) get the module object as an argument. This is just a
holdover from Python 2 and can be fixed rather mechanically.

The third use is as a crutch: the module reference is not passed to
everything that needs it, so the stuff that needs it reaches out to
global state.
A problematic case is a method that needs to raise a module-specific
exception: _pickle.Unpickler.dumps wants to raise _pickle.Error.
Unfortunately, while a module's functions have a reference to the module
(m_self), the classes don't. (And it's rather difficult to store
arbitrary state on a class object; there's no m_size in
PyType_FromSpec). So methods pretty much need to peek into global state.
Here again, I think just allowing PyState_FindModule is not the proper
solution. It unnecessarily restricts the module to a singleton. Also, if
we ever get unloadable modules, it would become possible for a class to
outlive its module, at which point PyState_FindModule would start
failing. (And PyState_FindModule failures are usually fatal; the error
handling story around it isn't great).
I think the right solution would be to give classes a reference to their
module, as methods have now. And I think this isn't in scope for PEP
489. (But it is possibly in scope for the future class-initialization slot.)

The fourth use is sharing internal state with other modules. The _io
module is a bit special since it's always available; non-stdlib modules
should really use capsules for that.

From ncoghlan at gmail.com Sat Apr 25 05:42:13 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 25 Apr 2015 13:42:13 +1000 Subject: [Import-SIG] Dabeaz's weird import discovery In-Reply-To: <20150422162153.GF4073@stoneleaf.us> References: <20150422115959.1ff2ee58@limelight.wooz.org> <20150422162153.GF4073@stoneleaf.us> Message-ID: On 23 April 2015 at 02:21, Ethan Furman wrote: > On 04/22, Barry Warsaw wrote: > >> Poking around in Lib/importlib/_bootstrap.py, I think you can see where this >> happens. In _find_and_load_unlocked(), 'round about line 2224 (in 3.5's >> hg:95593), you see this: >> >> if parent: >> # Set the module as an attribute on its parent. >> parent_module = sys.modules[parent] >> setattr(parent_module, name.rpartition('.')[2], module) >> >> It's clearly intentional, and fundamental to importlib so I don't think it's >> dependent on finder or loader. No matter how it happens, if a submodule is >> imported, its parent namespace gets a name binding to the submodule. > > I see that it's intentional, but why is it fundamental? The behaviour serves to preserve the equivalence of the following two snippets of code in terms of their impact on the "foo" namespace: from foo.bar import baz and: import foo.bar baz = foo.bar.baz The only difference between the two is that the latter also binds "foo" locally, while the from-import doesn't. The explicit relative import case is just slightly more surprising as the implied "foo" that is having a submodule added is the package namespace that's currently executing: from .bar import baz When we break the invariant that "foo.bar" existing in sys.modules implies there's a "bar" attribute on the parent "foo" module, all sorts of code tends to get upset (hence the weird errors folks can get in the face of circular imports, and why we finally relented and made from-import fall back to checking in sys.modules if it encounters a missing attribute) Cheers, Nick. 
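The explicit relative import case described above can be reproduced with a small sketch. It assumes a throwaway package written to a temporary directory; the name demo_pkg_relimport and the file contents are made up for illustration. The package's __init__.py only asks for the name baz, yet the bar submodule ends up bound on the package as well:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, chosen to avoid clashing with anything installed.
PKG = "demo_pkg_relimport"

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / PKG
    pkg_dir.mkdir()
    # The package only asks for the name "baz" ...
    (pkg_dir / "__init__.py").write_text("from .bar import baz\n")
    (pkg_dir / "bar.py").write_text("baz = 42\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        pkg = importlib.import_module(PKG)
    finally:
        sys.path.remove(tmp)

# ... yet the submodule "bar" is bound in the package namespace too.
bound_names = sorted(n for n in vars(pkg) if not n.startswith("__"))
print(bound_names)  # ['bar', 'baz']

# The attribute is the very object registered under the full dotted name.
same_object = pkg.bar is sys.modules[PKG + ".bar"]
print(same_object)  # True
```

This is the invariant from the thread: once "foo.bar" is in sys.modules, "bar" is an attribute of "foo", regardless of which form of import triggered the load.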
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sun Apr 26 10:31:58 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 26 Apr 2015 10:31:58 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <5530E182.6040804@redhat.com> References: <5530E182.6040804@redhat.com> Message-ID: Petr Viktorin schrieb am 17.04.2015 um 12:33: > On 04/17/2015 08:51 AM, Stefan Behnel wrote: >> Petr Viktorin schrieb am 16.04.2015 um 13:05: >>> A wart I added is "singleton modules", necessary for >>> "PyState_FindModule"-like functionality. I wouldn't mind not including >>> this, but it would mean the new API can't replace all use cases of the >>> old PyInit_. >>> >>> Singleton Modules >>> ----------------- >>> >>> Modules defined by PyModuleDef may be registered with PyState_AddModule, >>> and later retrieved with PyState_FindModule. >>> >>> Under the new API, there is no one-to-one mapping between PyModuleSpec >>> and the module created from it. >>> In particular, multiple modules may be loaded from the same description. >> >> Is that because a single shared library (which is what the module spec >> refers to, right?) can contain multiple modules? Or are you referring to >> something else here? > > By using Loader.create_module/Loader.exec_module directly, you can load an > extension module without adding it to sys.modules. You can do this as many > times as you like, and you always get a new, independent module object. Should we even allow that for extension modules, as long as there is no reloading support? I guess subinterpreters would also have a say in this, right? 
Stefan From stefan_ml at behnel.de Sun Apr 26 11:23:40 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 26 Apr 2015 11:23:40 +0200 Subject: [Import-SIG] Singleton modules (Re: PEP 489: Redesigning extension module loading) In-Reply-To: <553A18AD.1020001@gmail.com> References: <5530E182.6040804@redhat.com> <553A18AD.1020001@gmail.com> Message-ID: Petr Viktorin schrieb am 24.04.2015 um 12:19: > I now think providing singleton > module support in PEP 489 is, at best, premature. > If something like it does need to be added in the future, it will need > better semantics than my current proposal. I think so, too. > The third use is as a crutch: the module reference is not passed to > everything that needs it, so the stuff that needs it reaches out to > global state. > A problematic case is a method that needs to raise a module-specific > exception: _pickle.Unpickler.dumps waits to raise _pickle.Error. > Unfortunately, while a module's functions have a reference to the module > (m_self), the classes don't. (And it's rather difficult to store > arbitrary state on a class object; there's no m_size in > PyType_FromSpec). So methods pretty much need to peek into global state. PyCFunctionObject contains a reference to the module, and manually implemented methods will usually be of that type. But yes, it's a general problem that it's not passed into the underlying C function. > I think the right solution would be to give classes a reference to their > module, as methods have now. And I think this isn't in scope for PEP > 489. (But it is possibly in scope for the future class-initialization slot.) Yes, I think the right solution is to extend the type struct, now that it's created on the heap via PyType_FromSpec() anyway. Then we could add a new C-API function that reads the module reference for some 'self' object. > The fourth use is sharing internal state with other modules. 
The _io > module is a bit special since it's always available; non-stdlib modules > should really use capsules for that. Yes, capsules are the correct mechanism here. Stefan From encukou at gmail.com Mon Apr 27 11:01:12 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 27 Apr 2015 11:01:12 +0200 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5530E182.6040804@redhat.com> Message-ID: On Sun, Apr 26, 2015 at 10:31 AM, Stefan Behnel wrote: > Petr Viktorin schrieb am 17.04.2015 um 12:33: >> On 04/17/2015 08:51 AM, Stefan Behnel wrote: >>> Petr Viktorin schrieb am 16.04.2015 um 13:05: >>>> A wart I added is "singleton modules", necessary for >>>> "PyState_FindModule"-like functionality. I wouldn't mind not including >>>> this, but it would mean the new API can't replace all use cases of the >>>> old PyInit_. >>>> >>>> Singleton Modules >>>> ----------------- >>>> >>>> Modules defined by PyModuleDef may be registered with PyState_AddModule, >>>> and later retrieved with PyState_FindModule. >>>> >>>> Under the new API, there is no one-to-one mapping between PyModuleSpec >>>> and the module created from it. >>>> In particular, multiple modules may be loaded from the same description. >>> >>> Is that because a single shared library (which is what the module spec >>> refers to, right?) can contain multiple modules? Or are you referring to >>> something else here? >> >> By using Loader.create_module/Loader.exec_module directly, you can load an >> extension module without adding it to sys.modules. You can do this as many >> times as you like, and you always get a new, independent module object. > > Should we even allow that for extension modules, as long as there is no > reloading support? That's a good question. Without reload/unload support, It isn't really necessary. But, in most cases avoiding PyState_FindModule is not that hard to do (especially once class methods are fixed, as mentioned in the other subthread). 
> I guess subinterpreters would also have a say in this, right?

I also think independent modules are easier to reason about than
subinterpreters; and if you support independent modules correctly
(without PyState_FindModule) then there's a rather big chance you also
got subinterpreter support right. Also it is easier to write tests with
independent modules than with subinterpreters.

So while PyState_FindModule works, I'd rather promote passing explicit
references. (If that doesn't work somewhere, singleton modules can be
added again, but that might be a special use case that'll need a
slightly different solution.)
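For readers who have not loaded independent module copies before, here is a minimal sketch of the idea using a pure-Python module in place of an extension module. The module name demo_independent_mod and its contents are hypothetical; importlib.util.module_from_spec calls the loader's create_module, and exec_module is then called directly, so nothing is registered in sys.modules:

```python
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, used only for this illustration.
MOD = "demo_independent_mod"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / (MOD + ".py")
    path.write_text("counter = 0\n")

    def load_copy():
        # Create a fresh module from the spec and execute it, without
        # ever adding it to sys.modules.
        spec = importlib.util.spec_from_file_location(MOD, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

    first = load_copy()
    second = load_copy()

# Each copy has its own state; changing one leaves the other alone.
first.counter += 1
print(first is second)                # False
print(first.counter, second.counter)  # 1 0
print(MOD in sys.modules)             # False
```

A test written this way can check that no state leaks between copies, which is the kind of check that is harder to set up with subinterpreters.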