From ncoghlan at gmail.com  Sun Mar  1 01:48:44 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 1 Mar 2015 10:48:44 +1000
Subject: [Import-SIG] PEP for the removal of PYO files
In-Reply-To:
References: <20150228175708.372d145d@fsol>
Message-ID:

On 1 Mar 2015 07:16, "Brett Cannon" wrote:
>
> On Sat, Feb 28, 2015 at 11:57 AM Antoine Pitrou wrote:
>>
>> On Fri, 27 Feb 2015 17:06:59 +0000
>> Brett Cannon wrote:
>> >
>> > A period was chosen over a hyphen as a separator so as to distinguish
>> > clearly that the optimization level is not part of the interpreter
>> > version as specified by the cache tag. It also lends to the use of
>> > the period in the file name to delineate semantically different
>> > concepts.
>>
>> Indeed but why would other implementations have to mimic CPython here?
>> Perhaps the whole idea of differing "optimization" levels doesn't make
>> sense for them.
>
> Directly it might not, but if they support the AST module along with
> passing AST nodes to compile() then they would implicitly support
> optimizations for bytecode through custom loaders.
>
> I also checked PyPy and IronPython 3 and they both support -O.
>
> But an implementation that chose to skip the ast module and not support
> -O is the best argument to support Nick's ask to not specify the
> optimization if it is 0 (although I'm not saying that's enough to sway
> me to change the PEP).

I was only +0 on that particular idea myself, so I agree it's better to
keep things consistent. However, the PEP should explicitly define what
happens if the empty string (rather than None) is passed in. Since we
need to define a standard way of handling that anyway, it could be a
reasonable API for suppressing the new name segment entirely (even if
CPython doesn't make use of it outside the test suite).

Cheers,
Nick.
>
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> https://mail.python.org/mailman/listinfo/import-sig
>

From bcannon at gmail.com  Sun Mar  1 17:05:28 2015
From: bcannon at gmail.com (Brett Cannon)
Date: Sun, 01 Mar 2015 16:05:28 +0000
Subject: [Import-SIG] PEP for the removal of PYO files
References:
Message-ID:

Here is the latest draft. I think the biggest bit is the expanded
section of the Open Issues with a few more formatting proposals and
Nick's suggestion to let the common case of no optimizations lead to no
level being specified in the file name (I also changed the potential
PEP # as 487 got snagged). Otherwise a sentence about getting to
generate all optimization levels upfront and the empty string
suppressing the inclusion of the optimization level are the other
substantive changes.

PEP: 488
Title: Elimination of PYO files
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Feb-2015
Post-History:

Abstract
========

This PEP proposes eliminating the concept of PYO files from Python.
To continue the support of the separation of bytecode files based on
their optimization level, this PEP proposes extending the PYC file
name to include the optimization level in the bytecode repository
directory (i.e., the ``__pycache__`` directory).


Rationale
=========

As of today, bytecode files come in two flavours: PYC and PYO. A PYC
file is the bytecode file generated and read from when no
optimization level is specified at interpreter startup (i.e., ``-O``
is not specified). A PYO file represents the bytecode file that is
read/written when **any** optimization level is specified (i.e., when
``-O`` is specified, including ``-OO``).
This means that while PYC files clearly delineate the optimization
level used when they were generated -- namely no optimizations beyond
the peepholer -- the same is not true for PYO files. Put in terms of
optimization levels and the file extension:

  - 0: ``.pyc``
  - 1 (``-O``): ``.pyo``
  - 2 (``-OO``): ``.pyo``

The reuse of the ``.pyo`` file extension for both level 1 and 2
optimizations means that there is no clear way to tell what
optimization level was used to generate the bytecode file. In terms
of reading PYO files, this can lead to an interpreter using a mixture
of optimization levels with its code if the user was not careful to
make sure all PYO files were generated using the same optimization
level (typically done by blindly deleting all PYO files and then
using the ``compileall`` module to compile all-new PYO files [1]_).
This issue is only compounded when people optimize Python code beyond
what the interpreter natively supports, e.g., using the astoptimizer
project [2]_.

In terms of writing PYO files, the need to delete all PYO files
every time one either changes the optimization level they want to use
or is unsure of what optimization was used the last time PYO files
were generated leads to unnecessary file churn. The change proposed
by this PEP also allows for **all** optimization levels to be
pre-compiled for bytecode files ahead of time, something that is
currently impossible thanks to the reuse of the ``.pyo`` file
extension for multiple optimization levels.

As for distributing bytecode-only modules, having to distribute both
``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
of code obfuscation and smaller file deployments.


Proposal
========

To eliminate the ambiguity that PYO files present, this PEP proposes
eliminating the concept of PYO files and their accompanying ``.pyo``
file extension.
To allow for the optimization level to be unambiguous as well as to
avoid having to regenerate optimized bytecode files needlessly in the
``__pycache__`` directory, the optimization level used to generate a
PYC file will be incorporated into the bytecode file name. Currently
bytecode file names are created by
``importlib.util.cache_from_source()``, approximately using the
following expression defined by PEP 3147 [3]_, [4]_, [5]_::

    '{name}.{cache_tag}.pyc'.format(name=module_name,
                                    cache_tag=sys.implementation.cache_tag)

This PEP proposes to change the expression to::

    '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
            name=module_name,
            cache_tag=sys.implementation.cache_tag,
            optimization=str(sys.flags.optimize))

The "opt-" prefix was chosen so as to provide a visual separator
from the cache tag. The placement of the optimization level after
the cache tag was chosen to preserve lexicographic sort order of
bytecode file names based on module name and cache tag which will
not vary for a single interpreter. The "opt-" prefix was chosen over
"o" so as to be somewhat self-documenting. The "opt-" prefix was
chosen over "O" so as to not have any confusion with "0" while being
so close to the interpreter version number.

A period was chosen over a hyphen as a separator so as to distinguish
clearly that the optimization level is not part of the interpreter
version as specified by the cache tag. It also lends to the use of
the period in the file name to delineate semantically different
concepts.

For example, the bytecode file name of ``importlib.cpython-35.pyc``
would become ``importlib.cpython-35.opt-0.pyc``. If ``-OO`` had been
passed to the interpreter then instead of
``importlib.cpython-35.pyo`` the file name would be
``importlib.cpython-35.opt-2.pyc``.


Implementation
==============

importlib
---------

As ``importlib.util.cache_from_source()`` is the API that exposes
bytecode file paths as well as being directly used by importlib, it
requires the most critical change.
As of Python 3.4, the function's signature is::

    importlib.util.cache_from_source(path, debug_override=None)

This PEP proposes changing the signature in Python 3.5 to::

    importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)

The introduced ``optimization`` keyword-only parameter will control
what optimization level is specified in the file name. If the
argument is ``None`` then the current optimization level of the
interpreter will be assumed. Any argument given for ``optimization``
will be passed to ``str()`` and must have ``str.isalnum()`` be true,
else ``ValueError`` will be raised (this prevents invalid characters
being used in the file name). If the empty string is passed in for
``optimization`` then the addition of the optimization will be
suppressed, reverting to the file name format which predates this
PEP.

It is expected that beyond Python's own 0-2 optimization levels,
third-party code will use a hash of optimization names to specify the
optimization level, e.g.
``hashlib.sha256(','.join(['dead code elimination', 'constant folding']).encode('utf-8')).hexdigest()``.
While this might lead to long file names, it is assumed that most
users never look at the contents of the ``__pycache__`` directory and
so this won't be an issue.

The ``debug_override`` parameter will be deprecated. As the parameter
expects a boolean, the integer value of the boolean will be used as
if it had been provided as the argument to ``optimization`` (a
``None`` argument will mean the same as for ``optimization``). A
deprecation warning will be raised when ``debug_override`` is given a
value other than ``None``, but there are no plans for the complete
removal of the parameter at this time (but removal will be no later
than Python 4).

The various module attributes for ``importlib.machinery`` which relate
to bytecode file suffixes will be updated [7]_.
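
As a rough illustration of the naming rules described above, the
following pure-Python sketch builds the file name this draft proposes.
It is not the importlib implementation: the helper name
``proposed_cache_name`` and its parameters are hypothetical, and both
the ``__pycache__`` directory handling and ``debug_override`` are left
out.

```python
import hashlib
import sys


def proposed_cache_name(module_name, cache_tag, optimization=None):
    """Return the bytecode file name proposed by this draft (sketch only).

    Hypothetical helper for illustration; it is not part of importlib.
    """
    if optimization is None:
        # None means "use the interpreter's current optimization level".
        optimization = sys.flags.optimize
    optimization = str(optimization)
    base = '{}.{}'.format(module_name, cache_tag)
    if optimization == '':
        # The empty string suppresses the new name segment entirely.
        return base + '.pyc'
    if not optimization.isalnum():
        raise ValueError(
            'optimization must be alphanumeric: {!r}'.format(optimization))
    return '{}.opt-{}.pyc'.format(base, optimization)


# A third-party optimizer could derive its level from a hash of pass names.
# hashlib works on bytes, so the joined string is encoded first.
custom_level = hashlib.sha256(
    ','.join(['dead code elimination', 'constant folding']).encode('utf-8')
).hexdigest()

print(proposed_cache_name('importlib', 'cpython-35', 0))
print(proposed_cache_name('importlib', 'cpython-35', 2))
print(proposed_cache_name('importlib', 'cpython-35', ''))
print(proposed_cache_name('importlib', 'cpython-35', custom_level))
```

Validating with ``str.isalnum()`` after the ``str()`` conversion keeps
characters such as path separators and extra periods out of the file
name, which is the stated purpose of the check.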
The ``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES``
will both be documented as deprecated and set to the same value as
``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and
``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will be
no later than Python 4).

All the various finders and loaders will also be updated as
necessary, but updating the previously mentioned parts of importlib
should be all that is required.


Rest of the standard library
----------------------------

The various functions exposed by the ``py_compile`` and
``compileall`` modules will be updated as necessary to make sure
they follow the new bytecode file name semantics [6]_, [1]_. The CLI
for the ``compileall`` module will not be directly affected (the
``-b`` flag will be implicitly affected as it will no longer generate
``.pyo`` files when ``-O`` is specified).


Compatibility Considerations
============================

Any code directly manipulating bytecode files from Python 3.2 on
will need to consider the impact of this change on their code (prior
to Python 3.2 -- including all of Python 2 -- there was no
``__pycache__`` which already necessitates bifurcating bytecode file
handling support). If code was setting the ``debug_override``
argument to ``importlib.util.cache_from_source()`` then care will be
needed if they want the path to a bytecode file with an optimization
level of 2. Otherwise only code **not** using
``importlib.util.cache_from_source()`` will need updating.

As for people who distribute bytecode-only modules (i.e., use a
bytecode file instead of a source file), they will have to choose
which optimization level they want their bytecode files to be since
distributing a ``.pyo`` file with a ``.pyc`` file will no longer be
of any use. Since people typically only distribute bytecode files for
code obfuscation purposes or smaller distribution size, only having
to distribute a single ``.pyc`` should actually be beneficial to
these use-cases.
And since the magic number for bytecode files changed in Python 3.5
to support PEP 465 there is no need to support pre-existing ``.pyo``
files [8]_.


Rejected Ideas
==============

N/A


Open Issues
===========

Formatting of the optimization level in the file name
-----------------------------------------------------

Using the "opt-" prefix and placing the optimization level between
the cache tag and file extension is not critical. All options which
have been considered are:

* ``importlib.cpython-35.opt-0.pyc``
* ``importlib.cpython-35.opt0.pyc``
* ``importlib.cpython-35.o0.pyc``
* ``importlib.cpython-35.O0.pyc``
* ``importlib.cpython-35.0.pyc``
* ``importlib.cpython-35-O0.pyc``
* ``importlib.O0.cpython-35.pyc``
* ``importlib.o0.cpython-35.pyc``
* ``importlib.0.cpython-35.pyc``

These were initially rejected because they would change the sort
order of bytecode files, were possibly ambiguous with the cache tag,
or were not self-documenting enough.


Not specifying the optimization level when it is at 0
-----------------------------------------------------

It has been suggested that for the common case of when the
optimizations are at level 0 that the entire part of the file name
relating to the optimization level be left out. This would allow for
file names of ``.pyc`` files to go unchanged, potentially leading to
fewer backwards-compatibility issues.

It would also allow a potentially redundant bit of information to be
left out of the file name if an implementation of Python did not
allow for optimizing bytecode. This would only occur, though, if the
interpreter didn't support ``-O`` **and** didn't implement the ast
module, else users could implement their own optimizations.

Arguments against allowing this are "explicit is better than
implicit" and "special cases aren't special enough to break the
rules".
There are also currently no Python 3 interpreters that don't support
``-O``, so a potential Python 3 implementation which doesn't allow
bytecode optimization is entirely theoretical at the moment.


References
==========

.. [1] The compileall module
   (https://docs.python.org/3/library/compileall.html#module-compileall)

.. [2] The astoptimizer project
   (https://pypi.python.org/pypi/astoptimizer)

.. [3] ``importlib.util.cache_from_source()``
   (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)

.. [4] Implementation of ``importlib.util.cache_from_source()`` from CPython 3.4.3rc1
   (https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)

.. [5] PEP 3147, PYC Repository Directories, Warsaw
   (http://www.python.org/dev/peps/pep-3147)

.. [6] The py_compile module
   (https://docs.python.org/3/library/py_compile.html#module-py_compile)

.. [7] The importlib.machinery module
   (https://docs.python.org/3/library/importlib.html#module-importlib.machinery)

.. [8] ``importlib.util.MAGIC_NUMBER``
   (https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER)


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

From brett at python.org  Sun Mar  1 17:02:30 2015
From: brett at python.org (Brett Cannon)
Date: Sun, 01 Mar 2015 16:02:30 +0000
Subject: [Import-SIG] PEP for the removal of PYO files
References: <20150228175708.372d145d@fsol>
Message-ID:

On Sat, Feb 28, 2015 at 7:48 PM Nick Coghlan wrote:

> On 1 Mar 2015 07:16, "Brett Cannon" wrote:
> >
> > On Sat, Feb 28, 2015 at 11:57 AM Antoine Pitrou wrote:
> >>
> >> On Fri, 27 Feb 2015 17:06:59 +0000
> >> Brett Cannon wrote:
> >> >
> >> > A period was chosen over a hyphen as a separator so as to distinguish
> >> > clearly that the optimization level is not part of the interpreter
> >> > version as specified by the cache tag. It also lends to the use of
> >> > the period in the file name to delineate semantically different
> >> > concepts.
> >>
> >> Indeed but why would other implementations have to mimic CPython here?
> >> Perhaps the whole idea of differing "optimization" levels doesn't make
> >> sense for them.
> >
> > Directly it might not, but if they support the AST module along with
> > passing AST nodes to compile() then they would implicitly support
> > optimizations for bytecode through custom loaders.
> >
> > I also checked PyPy and IronPython 3 and they both support -O.
> >
> > But an implementation that chose to skip the ast module and not support
> > -O is the best argument to support Nick's ask to not specify the
> > optimization if it is 0 (although I'm not saying that's enough to sway me
> > to change the PEP).
>
> I was only +0 on that particular idea myself, so I agree it's better to
> keep things consistent. However, the PEP should explicitly define what
> happens if the empty string (rather than None) is passed in. Since we need
> to define a standard way of handling that anyway, it could be a reasonable
> API for suppressing the new name segment entirely (even if CPython doesn't
> make use of it outside the test suite).
>

Fair enough.
It also provides a way to get to the old file name if it's
desirable for some reason. I still have the option in the Open Issues
section to see what it brings up in further discussions.

From encukou at gmail.com  Mon Mar  2 15:21:19 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 2 Mar 2015 15:21:19 +0100
Subject: [Import-SIG] Proto-PEP: Redesigning extension module loading
In-Reply-To:
References:
Message-ID:

>>>> We should expose some kind of API in importlib.util (or a better place?)
>>>> that can be used to check that a module works with reloading and
>>>> subinterpreters.
>>>
>>> What would such an API actually check to verify that a module could be
>>> reloaded?
>>
>> Obviously we can't check for static state or object leakage between
>> subinterpreters.
>> By using the new API, you promise that the extension does support
>> reloading and subinterpreters. This will be prominently stated in the
>> docs, and checked by this function.
>> For the old API, PyModule_Create with m_size>=0 can be used to support
>> subinterpreters. But I don't think the language in the docs is strong
>> enough to say that m_size>=0 is a promise of such support.
>
> Ah, I wasn't clear in terms of "check" or "test" when I mentioned this
> - I was literally referring to something that could be run in test
> suites to try these things and see if they worked or not, rather than
> to a runtime "can I reload this safely?" check. "Try it and see" is
> likely to be a better approach to take there.

Hm, how would such a test work?
A function that takes a piece of code (like timeit does), runs it in a
new subinterpreter, and checks for leaks? Or runs it in a new process
and verifies no objects remain after Py_Finalize?
That seems way out of scope here.

Here is a new draft. I have removed the "Create-only" option, which
simplified the PEP a bit.

I've added PyCapsule helper functions.
These ended up taking quite a few arguments. It would be possible to
derive the capsule name just from module.__name__ and the attribute
name, following the PyCapsule_Import convention, but I think
specifying it explicitly is necessary to get the proper C-level
check. I ended up requiring the module name, and constructing the
capsule name from that and the attribute. So I got:

    PyObject *PyModule_AddCapsule(
        PyObject *module,
        const char *module_name,
        const char *attribute_name,
        void *pointer,
        PyCapsule_Destructor destructor)

    void *PyModule_GetCapsulePointer(
        PyObject *module,
        const char *module_name,
        const char *attribute_name)

The first one would usually be used once per module, and the second
one begs for an extension-specific macro to cast the result to a
usable type, so expected usage is just SPAM_GET_DATA(m).

I think this draft is fine now so I'll start working on the
implementation:

----

PEP: XXX
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin,
        Stefan Behnel,
        Nick Coghlan
BDFL-Delegate: "???"
Discussions-To: "???"
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules
interact with the import machinery. This was last revised for Python
3.0 in PEP 3121, but that revision did not solve all problems at the
time. The goal is to solve them by bringing extension modules closer
to the way Python modules behave; specifically to hook into the
ModuleSpec-based loading mechanism introduced in PEP 451.

Extensions that do not require custom memory layout for their module
objects may be executed in arbitrary pre-defined namespaces, paving
the way for extension modules being runnable with Python's ``-m``
switch. Other extensions can use custom types for their module
implementation. Module types are no longer restricted to
types.ModuleType.

This proposal makes it easy to support properties at the module
level and to safely store arbitrary global state in the module that
is covered by normal garbage collection and supports reloading and
sub-interpreters. Extension authors are encouraged to take these
issues into account when using the new API.


Motivation
==========

Python modules and extension modules are not set up in the same way.
For Python modules, the module is created and set up first, then the
module code is executed (PEP 302). A ModuleSpec object (PEP 451) is
used to hold information about the module, and passed to the relevant
hooks.

For extensions, i.e. shared libraries, the module init function is
executed straight away and does both the creation and initialisation.
The initialisation function is not passed ModuleSpec information
about the loaded module, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

This is specifically a problem for Cython generated modules, for
which it's not uncommon that the module init code has the same level
of complexity as that of any 'regular' Python module. Also, the lack
of __file__ and __name__ information hinders the compilation of
__init__.py modules, i.e. packages, especially when relative imports
are being used at module init time.

The other disadvantage of the discrepancy is that existing Python
programmers learning C cannot effectively map concepts between the
two domains. As long as extension modules are fundamentally different
from pure Python ones in the way they're initialised, they are harder
for people to pick up without relying on something like cffi, SWIG or
Cython to handle the actual extension module creation.

Currently, extension modules are also not added to sys.modules until
they are fully initialized, which means that a (potentially
transitive) re-import of the module will really try to reimport it
and thus run into an infinite loop when it executes the module init
function again.
Without the fully qualified module name, it is not trivial to
correctly add the module to sys.modules either.

Furthermore, the majority of currently existing extension modules
have problems with sub-interpreter support and/or reloading, and,
while it is possible with the current infrastructure to support these
features, it is neither easy nor efficient. Addressing these issues
was the goal of PEP 3121, but many extensions, including some in the
standard library, took the least-effort approach to porting to Python
3, leaving these issues unresolved. This PEP keeps the
backwards-compatible behavior, which should reduce pressure and give
extension authors adequate time to consider these issues when
porting.


The current process
===================

Currently, extension modules export an initialisation function named
"PyInit_modulename", named after the file name of the shared library.
This function is executed by the import machinery and must return
either NULL in the case of an exception, or a fully initialised
module object. The function receives no arguments, so it has no way
of knowing about its import context.

During its execution, the module init function creates a module
object based on a PyModuleDef struct. It then continues to initialise
it by adding attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the
fully qualified module name of the last module that it loaded, and
when a module gets created that has a matching name, this global
variable is used to determine the fully qualified name of the module
object. This is not entirely safe as it relies on the module init
function creating its own module object first, but this assumption
usually holds in practice.


The proposal
============

The current extension module initialisation will be deprecated in
favour of a new initialisation scheme.
Since the current scheme will continue to be available, existing code
will continue to work unchanged, including binary compatibility.

Extension modules that support the new initialisation scheme must
export the public symbol "PyModuleExec_modulename", and optionally
"PyModuleCreate_modulename", where "modulename" is the name of the
module. This mimics the previous naming convention for the
"PyInit_modulename" function.

If defined, these symbols must resolve to C functions with the
following signatures, respectively::

    int (*PyModuleExecFunction)(PyObject* module)
    PyObject* (*PyModuleCreateFunction)(PyObject* module_spec)


The PyModuleExec function
-------------------------

The PyModuleExec function is used to implement "loader.exec_module"
defined in PEP 451.

This function will be called to initialize a module. (Usually, this
amounts to setting the module's initial attributes.) This happens in
two situations: when the module is first initialized for a given
(sub-)interpreter, and possibly later when the module is reloaded.

When PyModuleExec is called, the module has already been added to
sys.modules, and import-related attributes (specified in PEP 451
[#pep-0451-attributes]_) have been set on the module.

The "module" argument receives the module object to initialize.

If PyModuleCreate is defined, "module" will generally be the object
returned by it. It is possible for a custom loader to pass any object
to PyModuleExec, so this function should check and fail with
TypeError if the module's type is unsupported. Any other assumptions
should also be checked.

If PyModuleCreate is not defined, PyModuleExec is expected to operate
on any Python object for which attributes can be added by
PyObject_SetAttr* and retrieved by PyObject_GetAttr*. This allows
loading an extension into a pre-created module, making it possible to
run it as __main__ in the future, participate in certain lazy-loading
schemes [#lazy_import_concerns]_, or enable other creative uses.

If PyModuleExec replaces the module's entry in sys.modules, the new
object will be used and returned by importlib machinery. (This
mirrors the behavior of Python modules. Note that for extensions,
implementing PyModuleCreate is usually a better solution for the use
cases this serves.)

The function must return ``0`` on success, or, on error, set an
exception and return ``-1``.


The PyModuleCreate function
---------------------------

The optional PyModuleCreate function is used to implement
"loader.create_module" defined in PEP 451. By exporting it, an
extension module indicates that it uses a custom module object. This
prevents loading the extension in a pre-created module, but gives
greater flexibility in allowing a custom C-level layout of the module
object. Most extensions will not need to implement this function.

The "module_spec" argument receives a "ModuleSpec" instance, as
defined in PEP 451.

When called, this function must create and return a module object, or
set an exception and return NULL. There is no requirement for the
returned object to be an instance of types.ModuleType. Any type can
be used, as long as it supports setting and getting attributes,
including at least the import-related attributes specified in PEP 451
[#pep-0451-attributes]_. This follows the current support for
allowing arbitrary objects in sys.modules and makes it easier for
extension modules to define a type that exactly matches their needs
for holding module state.

Note that when this function is called, the module's entry in
sys.modules is not populated yet. Attempting to import the same
module again (possibly transitively), may lead to an infinite loop.
Extension authors are advised to keep PyModuleCreate minimal, and in
particular to not call user code from it.

If PyModuleCreate is not defined, the default loader will construct a
module object as if with PyModule_New.
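
The Create/Exec split mirrors the two loader methods from PEP 451. A
rough pure-Python analogue, using only the standard ``importlib`` API,
might look like the sketch below. ``CustomModule``, ``SketchLoader``
and the module name ``spam`` are illustrative names chosen here; they
are not part of this proposal, and the sketch does not show how the C
hooks themselves would be wired up.

```python
import importlib.abc
import importlib.util
import types


class CustomModule(types.ModuleType):
    """Module subclass standing in for a custom C-level module layout."""


class SketchLoader(importlib.abc.Loader):
    """Pure-Python analogue of the Create/Exec split (illustrative only)."""

    def create_module(self, spec):
        # Counterpart of PyModuleCreate: build and return the module object.
        # Returning None instead would request the default module type.
        return CustomModule(spec.name)

    def exec_module(self, module):
        # Counterpart of PyModuleExec: check the type, then set attributes.
        if not isinstance(module, CustomModule):
            raise TypeError('unsupported module type')
        module.answer = 42


spec = importlib.util.spec_from_loader('spam', SketchLoader())
module = importlib.util.module_from_spec(spec)  # calls create_module()
spec.loader.exec_module(module)                 # calls exec_module()
print(type(module).__name__, module.__name__, module.answer)
```

The type check in ``exec_module`` corresponds to the advice above that
PyModuleExec should fail with TypeError when handed a module object of
an unsupported type.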


Initialization helper functions
-------------------------------

For two initialization tasks previously done by PyModule_Create, two
functions are introduced::

    int PyModule_SetDocString(PyObject *m, const char *doc)
    int PyModule_AddFunctions(PyObject *m, PyMethodDef *functions)

These set the module docstring, and add the module functions,
respectively. Both will work on any Python object that supports
setting attributes. They return ``0`` on success, and on failure,
they set the exception and return ``-1``.


PyCapsule convenience functions
-------------------------------

Instead of custom module objects, PyCapsule will become the preferred
mechanism for storing per-module C data. Two new convenience
functions will be added to help with this.

* ::

      PyObject *PyModule_AddCapsule(
          PyObject *module,
          const char *module_name,
          const char *attribute_name,
          void *pointer,
          PyCapsule_Destructor destructor)

  Add a new PyCapsule to *module* as *attribute_name*. The capsule
  name is formed by joining *module_name* and *attribute_name* with a
  dot.

  This convenience function can be used from a module initialization
  function instead of separate calls to PyCapsule_New and
  PyModule_AddObject.

  Returns a borrowed reference to the new capsule, or NULL (with
  exception set) on failure.

* ::

      void *PyModule_GetCapsulePointer(
          PyObject *module,
          const char *module_name,
          const char *attribute_name)

  Returns the pointer stored in *module* as *attribute_name*, or NULL
  (with an exception set) on failure. The capsule name is formed by
  joining *module_name* and *attribute_name* with a dot.

  This convenience function can be used instead of separate calls to
  PyObject_GetAttr and PyCapsule_GetPointer.

Extension authors are encouraged to define a macro to call
PyModule_GetCapsulePointer and cast the result to an appropriate
type.


Generalizing PyModule_* functions
---------------------------------

The following functions and macros will be modified to work on any
object that supports attribute access:

* PyModule_GetNameObject
* PyModule_GetName
* PyModule_GetFilenameObject
* PyModule_GetFilename
* PyModule_AddIntConstant
* PyModule_AddStringConstant
* PyModule_AddIntMacro
* PyModule_AddStringMacro
* PyModule_AddObject

The PyModule_GetDict function will continue to only work on true
module objects. This means that it should not be used on extension
modules that only define PyModuleExec.


Legacy Init
-----------

If PyModuleExec is not defined, the import machinery will try to
initialize the module using the "PyInit_modulename" hook, as
described in PEP 3121.

If PyModuleExec is defined, "PyInit_modulename" will be ignored.
Modules requiring compatibility with previous versions of CPython may
implement "PyInit_modulename" in addition to the new hook.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to
support subinterpreters and multiple Py_Initialize/Py_Finalize cycles
correctly. The mechanism is designed to make this easy, but care is
still required on the part of the extension author. No user-defined
functions, methods, or instances may leak to different interpreters.
To achieve this, all module-level state should be kept in either the
module dict, or in the module object.

A simple rule of thumb is: Do not define any static data, except
built-in types with no mutable or user-settable class attributes.


Module Reloading
----------------

Reloading an extension module will re-execute its PyModuleExec
function. Similar caveats apply to reloading an extension module as
to reloading a Python module. Notably, attributes or any other state
of the module are not reset before reloading.
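
For ordinary Python modules, the caveat about state surviving a
reload can be observed with ``importlib.reload()``. The sketch below
uses a throwaway module named ``reload_demo`` (a hypothetical name,
written to a temporary directory). It only illustrates the behaviour
of pure-Python modules and says nothing about how the proposed
PyModuleExec hook would be implemented.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny throwaway module (hypothetical name: reload_demo).
    with open(os.path.join(tmp, 'reload_demo.py'), 'w') as f:
        f.write('counter = 0\n')
    sys.path.insert(0, tmp)
    try:
        import reload_demo
        reload_demo.counter = 5      # a name the module body assigns
        reload_demo.extra = 'kept'   # a name the module body never touches
        reloaded = importlib.reload(reload_demo)
        same_object = reloaded is reload_demo
        counter_after = reload_demo.counter
        extra_after = reload_demo.extra
    finally:
        # Clean up so the sketch leaves no trace in the running process.
        sys.path.remove(tmp)
        sys.modules.pop('reload_demo', None)

print(same_object, counter_after, extra_after)
```

The module body is re-executed in the existing namespace, so
``counter`` is assigned again while ``extra`` is left in place. An
initialization function that is re-run on reload should therefore not
assume it starts from an empty module.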
Additionally, due to limitations in shared library loading (both dlopen on POSIX and LoadModuleEx on Windows), it is not generally possible to load a modified library after it has changed on disk. Therefore, reloading extension modules is of limited use. Multiple modules in one library ------------------------------- To support multiple Python modules in one shared library, the library must export appropriate PyModuleExec_ or PyModuleCreate_ hooks for each exported module. The modules are loaded using a ModuleSpec with origin set to the name of the library file, and name set to the module name. Note that this mechanism can currently only be used to *load* such modules, not to *find* them. XXX: This is an existing issue; either fix it/wait for a fix or provide an example of how to load such modules. Implementation ============== XXX - not started Open issues =========== We should expose some kind of API in importlib.util (or a better place?) that can be used to check that a module works with reloading and subinterpreters. Related issues ============== The runpy module will need to be modified to take advantage of PEP 451 and this PEP. This is out of scope for this PEP. Previous Approaches =================== Stefan Behnel's initial proto-PEP [#stefans_protopep]_ had a "PyInit_modulename" hook that would create a module class, whose ``__init__`` would be then called to create the module. This proposal did not correspond to the (then nonexistent) PEP 451, where module creation and initialization is broken into distinct steps. It also did not support loading an extension into pre-existing module objects. Nick Coghlan proposed the Create annd Exec hooks, and wrote a prototype implementation [#nicks-prototype]_. At this time PEP 451 was still not implemented, so the prototype does not use ModuleSpec. References ========== .. [#lazy_import_concerns] https://mail.python.org/pipermail/python-dev/2013-August/128129.html .. 
[#pep-0451-attributes] https://www.python.org/dev/peps/pep-0451/#attributes .. [#stefans_protopep] https://mail.python.org/pipermail/python-dev/2013-August/128087.html .. [#nicks-prototype] https://mail.python.org/pipermail/python-dev/2013-August/128101.html Copyright ========= This document has been placed in the public domain. From barry at python.org Mon Mar 2 23:38:42 2015 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Mar 2015 17:38:42 -0500 Subject: [Import-SIG] PEP for the removal of PYO files References: Message-ID: <20150302173842.70b1c483@anarchist.wooz.org> On Feb 28, 2015, at 09:08 PM, Brett Cannon wrote: >On Sat, Feb 28, 2015 at 11:50 AM Nick Coghlan wrote: >> * Can we make "opt-0" implied so normal pyc file names don't change at all? > >Sure, but why specifically? EIBTI makes me not want to have some optional >bit in the file name just make someone's life who didn't use >cache_from_source() a little easier. I'd rather like opt-0 to be implied too, just because I think it will be the common case and it's less clutter, but I could be convinced that for consistency, opt-0 should be explicit. Just like with old .pyo files, you'll still have to support *loading* implicit opt-0 __pycache__ .pyc files. Even if the bytecode has to be regenerated for Python 3.5, you can't guarantee what tool will be generating it. So for backward compatibility with third party tools, I think you still have to support loading the old file names for 3.5, but only if the new name doesn't exist. Cheers, -Barry From ncoghlan at gmail.com Tue Mar 3 13:44:56 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 3 Mar 2015 22:44:56 +1000 Subject: [Import-SIG] Proto-PEP: Redesigning extension module loading In-Reply-To: References:

Message-ID: On 3 March 2015 at 00:21, Petr Viktorin wrote: >>>>> We should expose some kind of API in importlib.util (or a better place?) >>>>> that >>>>> can be used to check that a module works with reloading and >>>>> subinterpreters. >>>> >>>> >>>> What would such an API actually check to verify that a module could be >>>> reloaded? >>> >>> Obviously we can't check for static state or object leakage between >>> subinterpreters. >>> By using the new API, you promise that the extension does support >>> reloading and subinterpreters. This will be prominently stated in the >>> docs, and checked by this function. >>> For the old API, PyModule_Create with m_size>=0 can be used to support >>> subinterpreters. But I don't think the language in the docs is strong >>> enough to say that m_size>=0 is a promise of such support. >> >> Ah, I wasn't clear in terms of "check" or "test" when I mentioned this >> - I was literally referring to something that could be run in test >> suites to try these things and see if they worked or not, rather than >> to a runtime "can I reload this safely?" check. "Try it and see" is >> likely to be a better approach to take there. > > Hm, how would such a test work? > A function that takes a piece of code (like timeit does), runs it in a > new subinterpreter, and check for leaks? Or runs it in a new process > and verifies no objects remain after PyFinalize? > That seems way out of scope here. Yeah, I was thinking along the lines of some of the tests in _testembed.c. However, you're right it shouldn't be a requirement of the PEP. > I think this draft is fine now so I'll start working on the implementation: Sounds good. Brett, could you do the honours and post this latest draft at the same time you post the PYO removal PEP? Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From brett at python.org Tue Mar 3 14:01:08 2015 From: brett at python.org (Brett Cannon) Date: Tue, 03 Mar 2015 13:01:08 +0000 Subject: [Import-SIG] Proto-PEP: Redesigning extension module loading References:

Message-ID: Yeah, I can assign it a number and get it committed when I add my PYO PEP. From brett at python.org Fri Mar 13 13:44:47 2015 From: brett at python.org (Brett Cannon) Date: Fri, 13 Mar 2015 12:44:47 +0000 Subject: [Import-SIG] Proto-PEP: Redesigning extension module loading In-Reply-To: References:

Message-ID: The PEP has been committed and assigned number 489 (it will eventually show up at https://www.python.org/dev/peps/pep-0489 once the PEPs are re-generated). Petr, from now on you can send changes to peps at python.org. Make sure you attach them to your email as a diff against https://hg.python.org/peps . The next steps are to post to python-dev saying the PEP exists and to discuss it here on the import-sig. You should also eliminate all XXX references in the PEP. From encukou at gmail.com Mon Mar 16 13:38:13 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 16 Mar 2015 13:38:13 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading Message-ID: <5506CEB5.7050105@gmail.com> Hello, On import-sig, I've agreed to continue Nick Coghlan's work on making extension modules act more like Python ones, work well with PEP 451 (ModuleSpec), and encourage proper subinterpreter and reloading support. Here is the resulting PEP. I don't have a patch yet, but I'm working on it. There's a remaining open issue: providing a tool that can be run in test suites to check if a module behaves well with subinterpreters/reloading. I believe it's out of scope for this PEP but speak out if you disagree. Please discuss on import-sig.
======================= PEP: 489 Title: Redesigning extension module loading Version: $Revision$ Last-Modified: $Date$ Author: Petr Viktorin , Stefan Behnel , Nick Coghlan Discussions-To: import-sig at python.org Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 11-Aug-2013 Python-Version: 3.5 Post-History: 23-Aug-2013, 20-Feb-2015 Resolution: Abstract ======== This PEP proposes a redesign of the way in which extension modules interact with the import machinery. This was last revised for Python 3.0 in PEP 3121, but did not solve all problems at the time. The goal is to solve them by bringing extension modules closer to the way Python modules behave; specifically to hook into the ModuleSpec-based loading mechanism introduced in PEP 451. Extensions that do not require custom memory layout for their module objects may be executed in arbitrary pre-defined namespaces, paving the way for extension modules being runnable with Python's ``-m`` switch. Other extensions can use custom types for their module implementation. Module types are no longer restricted to types.ModuleType. This proposal makes it easy to support properties at the module level and to safely store arbitrary global state in the module that is covered by normal garbage collection and supports reloading and sub-interpreters. Extension authors are encouraged to take these issues into account when using the new API. Motivation ========== Python modules and extension modules are not being set up in the same way. For Python modules, the module is created and set up first, then the module code is being executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and passed to the relevant hooks. For extensions, i.e. shared libraries, the module init function is executed straight away and does both the creation and initialisation. 
The initialisation function is not passed ModuleSpec information about the loaded module, such as the __file__ or fully-qualified name. This hinders relative imports and resource loading. This is specifically a problem for Cython generated modules, for which it's not uncommon that the module init code has the same level of complexity as that of any 'regular' Python module. Also, the lack of __file__ and __name__ information hinders the compilation of __init__.py modules, i.e. packages, especially when relative imports are being used at module init time. The other disadvantage of the discrepancy is that existing Python programmers learning C cannot effectively map concepts between the two domains. As long as extension modules are fundamentally different from pure Python ones in the way they're initialised, they are harder for people to pick up without relying on something like cffi, SWIG or Cython to handle the actual extension module creation. Currently, extension modules are also not added to sys.modules until they are fully initialized, which means that a (potentially transitive) re-import of the module will really try to reimport it and thus run into an infinite loop when it executes the module init function again. Without the fully qualified module name, it is not trivial to correctly add the module to sys.modules either. Furthermore, the majority of currently existing extension modules has problems with sub-interpreter support and/or reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps the backwards-compatible behavior, which should reduce pressure and give extension authors adequate time to consider these issues when porting. 
The current process =================== Currently, extension modules export an initialisation function named "PyInit_modulename", named after the file name of the shared library. This function is executed by the import machinery and must return either NULL in the case of an exception, or a fully initialised module object. The function receives no arguments, so it has no way of knowing about its import context. During its execution, the module init function creates a module object based on a PyModuleDef struct. It then continues to initialise it by adding attributes to the module dict, creating types, etc. Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded, and when a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe as it relies on the module init function creating its own module object first, but this assumption usually holds in practice. The proposal ============ The current extension module initialisation will be deprecated in favour of a new initialisation scheme. Since the current scheme will continue to be available, existing code will continue to work unchanged, including binary compatibility. Extension modules that support the new initialisation scheme must export the public symbol "PyModuleExec_modulename", and optionally "PyModuleCreate_modulename", where "modulename" is the name of the module. This mimics the previous naming convention for the "PyInit_modulename" function. If defined, these symbols must resolve to C functions with the following signatures, respectively:: int (*PyModuleExecFunction)(PyObject* module) PyObject* (*PyModuleCreateFunction)(PyObject* module_spec) The PyModuleExec function ------------------------- The PyModuleExec function is used to implement "loader.exec_module" defined in PEP 451. This function will be called to initialize a module.
(Usually, this amounts to setting the module's initial attributes.) This happens in two situations: when the module is first initialized for a given (sub-)interpreter, and possibly later when the module is reloaded. When PyModuleExec is called, the module has already been added to sys.modules, and import-related attributes (specified in PEP 451 [#pep-0451-attributes]_) have been set on the module. The "module" argument receives the module object to initialize. If PyModuleCreate is defined, "module" will generally be the object returned by it. It is possible for a custom loader to pass any object to PyModuleExec, so this function should check and fail with TypeError if the module's type is unsupported. Any other assumptions should also be checked. If PyModuleCreate is not defined, PyModuleExec is expected to operate on any Python object for which attributes can be added by PyObject_GetAttr* and retrieved by PyObject_SetAttr*. This allows loading an extension into a pre-created module, making it possible to run it as __main__ in the future, participate in certain lazy-loading schemes [#lazy_import_concerns]_, or enable other creative uses. If PyModuleExec replaces the module's entry in sys.modules, the new object will be used and returned by importlib machinery. (This mirrors the behavior of Python modules. Note that for extensions, implementing PyModuleCreate is usually a better solution for the use cases this serves.) The function must return ``0`` on success, or, on error, set an exception and return ``-1``. The PyModuleCreate function --------------------------- The optional PyModuleCreate function is used to implement "loader.create_module" defined in PEP 451. By exporting it, an extension module indicates that it uses a custom module object. This prevents loading the extension in a pre-created module, but gives greater flexibility in allowing a custom C-level layout of the module object. Most extensions will not need to implement this function.
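As a rough Python-level analogy for the Create/Exec split described above, the following sketch drives a PEP 451 loader by hand. It is only an illustration, not part of the proposal: ``TwoPhaseLoader``, ``load_demo`` and the module name ``twophase_demo`` are hypothetical, and the C hooks themselves are not involved.

```python
import importlib.util
import sys
from importlib.abc import Loader


class TwoPhaseLoader(Loader):
    """Hypothetical toy loader that keeps creation and execution separate."""

    def create_module(self, spec):
        # Returning None requests a default module object, comparable to an
        # extension that exports only the Exec hook.
        return None

    def exec_module(self, module):
        # The import-related attributes from the spec are already set here.
        module.seen_name = module.__name__
        module.registered = sys.modules.get(module.__name__) is module
        module.answer = 42


def load_demo(name="twophase_demo"):
    spec = importlib.util.spec_from_loader(name, TwoPhaseLoader())
    module = importlib.util.module_from_spec(spec)  # create step
    sys.modules[name] = module  # registered before the exec step runs
    try:
        spec.loader.exec_module(module)  # exec step
    except BaseException:
        sys.modules.pop(name, None)
        raise
    return module
```

Calling ``load_demo()`` returns a module whose ``registered`` attribute records that it was already present in sys.modules while the exec step ran, which is the ordering the Exec hook is meant to rely on.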
The "module_spec" argument receives a "ModuleSpec" instance, as defined in PEP 451. When called, this function must create and return a module object, or set an exception and return NULL. There is no requirement for the returned object to be an instance of types.ModuleType. Any type can be used, as long as it supports setting and getting attributes, including at least the import-related attributes specified in PEP 451 [#pep-0451-attributes]_. This follows the current support for allowing arbitrary objects in sys.modules and makes it easier for extension modules to define a type that exactly matches their needs for holding module state. Note that when this function is called, the module's entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively) may lead to an infinite loop. Extension authors are advised to keep PyModuleCreate minimal, and in particular to not call user code from it. If PyModuleCreate is not defined, the default loader will construct a module object as if with PyModule_New. Initialization helper functions ------------------------------- For two initialization tasks previously done by PyModule_Create, two functions are introduced:: int PyModule_SetDocString(PyObject *m, const char *doc) int PyModule_AddFunctions(PyObject *m, PyMethodDef *functions) These set the module docstring, and add the module functions, respectively. Both will work on any Python object that supports setting attributes. They return ``0`` on success, and on failure, they set the exception and return ``-1``. PyCapsule convenience functions ------------------------------- Instead of custom module objects, PyCapsule will become the preferred mechanism for storing per-module C data. Two new convenience functions will be added to help with this.
* :: PyObject *PyModule_AddCapsule( PyObject *module, const char *module_name, const char *attribute_name, void *pointer, PyCapsule_Destructor destructor) Add a new PyCapsule to *module* as *attribute_name*. The capsule name is formed by joining *module_name* and *attribute_name* by a dot. This convenience function can be used from a module initialization function instead of separate calls to PyCapsule_New and PyModule_AddObject. Returns a borrowed reference to the new capsule, or NULL (with exception set) on failure. * :: void *PyModule_GetCapsulePointer( PyObject *module, const char *module_name, const char *attribute_name) Returns the pointer stored in *module* as *attribute_name*, or NULL (with an exception set) on failure. The capsule name is formed by joining *module_name* and *attribute_name* by a dot. This convenience function can be used instead of separate calls to PyObject_GetAttr and PyCapsule_GetPointer. Extension authors are encouraged to define a macro to call PyModule_GetCapsulePointer and cast the result to an appropriate type. Generalizing PyModule_* functions --------------------------------- The following functions and macros will be modified to work on any object that supports attribute access: * PyModule_GetNameObject * PyModule_GetName * PyModule_GetFilenameObject * PyModule_GetFilename * PyModule_AddIntConstant * PyModule_AddStringConstant * PyModule_AddIntMacro * PyModule_AddStringMacro * PyModule_AddObject The PyModule_GetDict function will continue to only work on true module objects. This means that it should not be used on extension modules that only define PyModuleExec. Legacy Init ----------- If PyModuleExec is not defined, the import machinery will try to initialize the module using the PyModuleInit hook, as described in PEP 3121. If PyModuleExec is defined, PyModuleInit will be ignored. Modules requiring compatibility with previous versions of CPython may implement PyModuleInit in addition to the new hook. 
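The capsule naming rule from the "PyCapsule convenience functions" section (module name and attribute name joined by a dot) can be tried from Python with ctypes. This is a CPython-only sketch under stated assumptions: ``add_capsule``, ``get_capsule_pointer`` and ``demo`` are hypothetical stand-ins for the proposed helpers, built on the existing PyCapsule_New and PyCapsule_GetPointer calls, and they skip the destructor and most error handling.

```python
import ctypes
import types

# Existing C API functions, reached through ctypes (CPython only). The PEP's
# PyModule_AddCapsule / PyModule_GetCapsulePointer are proposals, not used here.
_new = ctypes.pythonapi.PyCapsule_New
_new.restype = ctypes.py_object
_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
_get = ctypes.pythonapi.PyCapsule_GetPointer
_get.restype = ctypes.c_void_p
_get.argtypes = [ctypes.py_object, ctypes.c_char_p]

_KEEPALIVE = []  # a capsule stores its name pointer without copying it


def capsule_name(module_name, attribute_name):
    # Same convention as PyCapsule_Import: "module.attribute".
    return (module_name + "." + attribute_name).encode("ascii")


def add_capsule(module, module_name, attribute_name, pointer):
    name = capsule_name(module_name, attribute_name)
    _KEEPALIVE.append(name)  # the name must outlive the capsule
    capsule = _new(pointer, name, None)  # no destructor in this sketch
    setattr(module, attribute_name, capsule)
    return capsule


def get_capsule_pointer(module, module_name, attribute_name):
    capsule = getattr(module, attribute_name)
    return _get(capsule, capsule_name(module_name, attribute_name))


def demo():
    state = ctypes.c_int(7)  # stands in for per-module C data
    module = types.ModuleType("spam")
    add_capsule(module, "spam", "state", ctypes.addressof(state))
    pointer = get_capsule_pointer(module, "spam", "state")
    return ctypes.cast(pointer, ctypes.POINTER(ctypes.c_int)).contents.value
```

Note that the module's own ``__name__`` plays no part: only the *module_name* argument feeds into the capsule name, so the same string has to be passed when the pointer is looked up again.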
Subinterpreters and Interpreter Reloading ----------------------------------------- Extensions using the new initialization scheme are expected to support subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. The mechanism is designed to make this easy, but care is still required on the part of the extension author. No user-defined functions, methods, or instances may leak to different interpreters. To achieve this, all module-level state should be kept in either the module dict, or in the module object. A simple rule of thumb is: Do not define any static data, except built-in types with no mutable or user-settable class attributes. Module Reloading ---------------- Reloading an extension module will re-execute its PyModuleInit function. Similar caveats apply to reloading an extension module as to reloading a Python module. Notably, attributes or any other state of the module are not reset before reloading. Additionally, due to limitations in shared library loading (both dlopen on POSIX and LoadLibraryEx on Windows), it is not generally possible to load a modified library after it has changed on disk. Therefore, reloading extension modules is of limited use. Multiple modules in one library ------------------------------- To support multiple Python modules in one shared library, the library must export appropriate PyModuleExec_ or PyModuleCreate_ hooks for each exported module. The modules are loaded using a ModuleSpec with origin set to the name of the library file, and name set to the module name. Note that this mechanism can currently only be used to *load* such modules, not to *find* them. XXX: This is an existing issue; either fix it/wait for a fix or provide an example of how to load such modules. Implementation ============== XXX - not started Open issues =========== We should expose some kind of API in importlib.util (or a better place?) that can be used to check that a module works with reloading and subinterpreters.
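The reloading caveat in the "Module Reloading" section (state is not reset before the module is executed again) already holds for Python modules, and a small sketch can show it. The names below (``DEMO_NAME``, ``CountingLoader``, ``DemoFinder``, ``run_reload_demo``) are hypothetical; the finder exists only because importlib.reload looks the spec up again through sys.meta_path.

```python
import importlib
import importlib.util
import sys
from importlib.abc import Loader, MetaPathFinder

DEMO_NAME = "reload_demo_module"  # hypothetical module name


class CountingLoader(Loader):
    def create_module(self, spec):
        return None  # use a default module object

    def exec_module(self, module):
        # Runs on first import and again on every reload; whatever the
        # previous run left behind is still on the module.
        module.exec_count = getattr(module, "exec_count", 0) + 1


class DemoFinder(MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == DEMO_NAME:
            return importlib.util.spec_from_loader(fullname, CountingLoader())
        return None


def run_reload_demo():
    finder = DemoFinder()
    sys.meta_path.insert(0, finder)
    try:
        module = importlib.import_module(DEMO_NAME)
        module.leftover = "set after first import"
        reloaded = importlib.reload(module)
        return reloaded is module, module.exec_count, module.leftover
    finally:
        # Leave the interpreter as it was found.
        sys.meta_path.remove(finder)
        sys.modules.pop(DEMO_NAME, None)
```

The same module object is reused, the exec step runs a second time, and the attribute set between the two runs survives. An Exec hook that assumes a pristine module would therefore need to reset its own state explicitly.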
Related issues ============== The runpy module will need to be modified to take advantage of PEP 451 and this PEP. This is out of scope for this PEP. Previous Approaches =================== Stefan Behnel's initial proto-PEP [#stefans_protopep]_ had a "PyInit_modulename" hook that would create a module class, whose ``__init__`` would be then called to create the module. This proposal did not correspond to the (then nonexistent) PEP 451, where module creation and initialization is broken into distinct steps. It also did not support loading an extension into pre-existing module objects. Nick Coghlan proposed the Create and Exec hooks, and wrote a prototype implementation [#nicks-prototype]_. At this time PEP 451 was still not implemented, so the prototype does not use ModuleSpec. References ========== .. [#lazy_import_concerns] https://mail.python.org/pipermail/python-dev/2013-August/128129.html .. [#pep-0451-attributes] https://www.python.org/dev/peps/pep-0451/#attributes .. [#stefans_protopep] https://mail.python.org/pipermail/python-dev/2013-August/128087.html .. [#nicks-prototype] https://mail.python.org/pipermail/python-dev/2013-August/128101.html Copyright ========= This document has been placed in the public domain. From encukou at gmail.com Mon Mar 16 13:39:26 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 16 Mar 2015 13:39:26 +0100 Subject: [Import-SIG] Proto-PEP: Redesigning extension module loading In-Reply-To: References:

Message-ID: On Fri, Mar 13, 2015 at 1:44 PM, Brett Cannon wrote: > The PEP has been committed and assigned number 489 (it will eventually show > up at https://www.python.org/dev/peps/pep-0489 once the PEPs are > re-generated). > > Petr, from now on you can send changes to peps at python.org. Make sure you > attach them to your email as a diff against https://hg.python.org/peps . The > next steps are to post to python-dev saying the PEP exists and to discuss it > here on the import-sig. You should also eliminate all XXX references in the > PEP. Thank you! I've posted to python-dev now, and I'll remove XXX's when I have a presentable patch. From ncoghlan at gmail.com Mon Mar 16 14:12:42 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 16 Mar 2015 23:12:42 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <5506CEB5.7050105@gmail.com> References: <5506CEB5.7050105@gmail.com> Message-ID: This version looks good to me, although I noticed one significant typo worth fixing. On 16 March 2015 at 22:38, Petr Viktorin wrote: > Hello, > On import-sig, I've agreed to continue Nick Coghlan's work on making > extension modules act more like Python ones, work well with PEP 451 > (ModuleSpec), and encourage proper subinterpreter and reloading support. > Here is the resulting PEP. > > I don't have a patch yet, but I'm working on it. > > There's a remaining open issue: providing a tool that can be run in test > suites to check if a module behaves well with subinterpreters/reloading. I > believe it's out of scope for this PEP but speak out if you disagree. I no longer think we need a public testing API at this point, but I'd like to ensure we have something in test.support or the importlib tests that checks this for at least some of the stdlib extensions modules (there may be something already, but if there is, I'm not sure where it lives). 
It also occurs to me we may need (or at least want) an explicit "legacy style" import module as test fodder (to avoid accidentally breaking that as stdlib modules get converted), as well as nominating at least one stdlib extension module as the first module to be converted to the new style as part of the initial implementation. > Module Reloading > ---------------- > > Reloading an extension module will re-execute its PyModuleInit function. > Similar caveats apply to reloading an extension module as to reloading > a Python module. Notably, attributes or any other state of the module > are not reset before reloading. s/PyModuleInit/PyModuleExec/ here Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Mon Mar 16 18:42:44 2015 From: encukou at gmail.com (Petr Viktorin) Date: Mon, 16 Mar 2015 18:42:44 +0100 Subject: [Import-SIG] [Python-Dev] PEP 489: Redesigning extension module loading In-Reply-To: <5506f9e2.15148c0a.2361.ffff8a31@mx.google.com> References: <5506CEB5.7050105@gmail.com> <5506f9e2.15148c0a.2361.ffff8a31@mx.google.com> Message-ID: On Mon, Mar 16, 2015 at 4:42 PM, Jim J. Jewett wrote: > > On 16 March 2015 Petr Viktorin wrote: > >> If PyModuleCreate is not defined, PyModuleExec is expected to operate >> on any Python object for which attributes can be added by PyObject_GetAttr* >> and retrieved by PyObject_SetAttr*. > > I assume it is the other way around (add with Set and retrieve with Get), > rather than a description of the required form of magic. Right you are, I mixed that up. >> PyObject *PyModule_AddCapsule( >> PyObject *module, >> const char *module_name, >> const char *attribute_name, >> void *pointer, >> PyCapsule_Destructor destructor) > > What happens if module_name doesn't match the module's __name__? > Does it become a hidden attribute? A dotted attribute? Is the > result undefined? The module_name is used to name the capsule, following the convention from PyCapsule_Import. 
The "module.__name__" is not used or checked. The function would do this:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyCapsule_New(pointer, capsule_name, destructor)
    PyModule_AddObject(module, attribute_name, capsule)

just with error handling, and suitable C code for the "+". I will add the pseudocode to the PEP. > Later, there is > >> void *PyModule_GetCapsulePointer( >> PyObject *module, >> const char *module_name, >> const char *attribute_name) > > with the same apparently redundant arguments, Here the behavior would be:

    capsule_name = module_name + '.' + attribute_name
    capsule = PyObject_GetAttr(module, attribute_name)
    return PyCapsule_GetPointer(capsule, capsule_name)

> but not a > PyModule_SetCapsulePointer. Are capsule pointers read-only, or can > they be replaced with another call to PyModule_AddCapsule, or by a > simple PyObject_SetAttr? You can replace the capsule using either of those two, or set the pointer using PyCapsule_SetPointer, or (most likely) change the data the pointer points to. The added functions are just simple helpers for common operations, meant to encourage keeping per-module state. >> Subinterpreters and Interpreter Reloading > ... >> No user-defined functions, methods, or instances may leak to different >> interpreters. > > By "user-defined" do you mean "defined in python, as opposed to in > the extension itself"? Yes. > If so, what is the recommendation for modules that do want to support, > say, callbacks? A dual-layer mapping that uses the interpreter as the > first key? Naming it _module and only using it indirectly through > module.py, which is not shared across interpreters? Not using this > API at all? There is a separate module object, with its own dict, for each subinterpreter (as when creating the module with "PyModuleDef.m_size == 0" today). Callbacks should be stored on the appropriate module instance. Does that answer your question? I'm not sure how you meant "callbacks".
>> To achieve this, all module-level state should be kept in either the module >> dict, or in the module object. > > I don't see how that is related to leakage. > >> A simple rule of thumb is: Do not define any static data, except >> built-in types >> with no mutable or user-settable class attributes. > > What about singleton instances? Should they be per-interpreter? Yes, definitely. > What about constants, such as PI? In PyModuleExec, create the constant using PyFloat_FromDouble, and add it using PyModule_AddObject. That will do the right thing. (Float constants can be shared, since they cannot refer to user-defined code. But this PEP shields you from needing to know this for every type.) > Where should configuration variables (e.g., MAX_SEARCH_DEPTH) be > kept? On the module object. > What happens if this no-leakage rule is violated? Does the module > not load, or does it just maybe lead to a crash down the road? It may, as today, lead to unexpected behavior down the road. This is explained here: https://docs.python.org/3/c-api/init.html#sub-interpreter-support Unfortunately, there's no good way to detect such leakage. This PEP adds the tools, documentation, and guidelines to make it easy to do the right thing, but won't prevent you from shooting yourself in the foot in C code. Thank you for sharing your concerns! I will keep them in mind when writing the docs for this. From breamoreboy at yahoo.co.uk Tue Mar 17 21:41:28 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 17 Mar 2015 20:41:28 +0000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <5506CEB5.7050105@gmail.com> References: <5506CEB5.7050105@gmail.com> Message-ID: On 16/03/2015 12:38, Petr Viktorin wrote: > Hello, Can you use anything from the meta issue http://bugs.python.org/issue15787 for PEP 3121 and PEP 384 or will the work that you are doing render everything done previously redundant?
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From ncoghlan at gmail.com Wed Mar 18 16:01:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Mar 2015 01:01:14 +1000 Subject: [Import-SIG] [Python-Dev] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> Message-ID: On 18 March 2015 at 06:41, Mark Lawrence wrote: > On 16/03/2015 12:38, Petr Viktorin wrote: >> >> Hello, > > > Can you use anything from the meta issue http://bugs.python.org/issue15787 > for PEP 3121 and PEP 384 or will the work that you are doing render > everything done previously redundant? Nothing should break in relation to PEP 3121 or 384, so I think that determination would still need to be made on a case by case basis. Alternatively, it may be possible to update the abitype.py converter to also switch to the new module initialisation hooks (if we can figure out a good way of automating that). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Thu Mar 19 11:31:14 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 19 Mar 2015 11:31:14 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <5506CEB5.7050105@gmail.com> References: <5506CEB5.7050105@gmail.com> Message-ID: Hi Petr, thanks for working on this. I added my comments inline. > Motivation > ========== > ... > The other disadvantage of the discrepancy is that existing Python programmers > learning C cannot effectively map concepts between the two domains. > As long as extension modules are fundamentally different from pure Python ones > in the way they're initialised, they are harder for people to pick up without > relying on something like cffi, SWIG or Cython to handle the actual extension > module creation. I don't think cffi fits as an example of extension module creation. It's more similar to ctypes, i.e. 
it tries to *avoid* third party extension modules. > The proposal > ============ > ... > Extension modules that support the new initialisation scheme must export > the public symbol "PyModuleExec_modulename", and optionally > "PyModuleCreate_modulename", where "modulename" is the > name of the module. This mimics the previous naming convention for > the "PyInit_modulename" function. Just a minor thing, but wouldn't it be better if the two had a common pre-underscore categorisation prefix, just like all other C-API functions? "PyExtModule_Exec_modulename" ? Pros: matches existing naming conventions, suggests that there's more than one function in this API corner ("you didn't know about Create when you copied my example?") Cons: longer and less beautiful name BTW, is there any way at all we can allow non-ASCII module names in this scheme? (Might not be in scope for this PEP, but if we change the module init scheme "for good" this time, it would be nice to have an idea if it'd be possible to support at all in the future.) > The PyModuleCreate function > --------------------------- I'd move this section here (before Exec) to match the process order and avoid forward references in the Exec section. It's worth stating explicitly when this function will be called. I guess it's always called right before Exec, also for subinterpreters and reload? > The PyModuleExec function > ------------------------- > ... > If PyModuleCreate is not defined, PyModuleExec is expected to operate > on any Python object for which attributes can be added by PyObject_SetAttr* > and retrieved by PyObject_GetAttr*. Good point. I think it's a valid requirement (and not a real restriction) that PEP-489 extension modules without a Create must accept any kind of object as "module", not just a PyModuleObject.
The main problem with these things is that, in practice, the module *will* continue to be a PyModuleObject for the foreseeable future, so module authors will implicitly rely on it in one way or another... However: > This allows loading an extension into a pre-created module, making it possible > to run it as __main__ in the future, participate in certain lazy-loading > schemes [#lazy_import_concerns]_, or enable other creative uses. That sounds like a rather random bucket of potential future extensions, not sure it should be part of the PEP. But how is this requirement related to "__main__"? Does the proposed scheme really prevent that when Create *is* being implemented? How so? Or does it mean that modules that provide a Create function will never be able to be loaded lazily or in "other creative" ways? Cython modules will almost certainly (pending an actual implementation) provide their own Create function, for example. And others as well, given that a previous section has warmly advertised Create as a way to implement module properties, a feature that many extension module authors have felt a use for at some point. > Initialization helper functions > ------------------------------- > > For two initialization tasks previously done by PyModule_Create, > two functions are introduced:: > > int PyModule_SetDocString(PyObject *m, const char *doc) > int PyModule_AddFunctions(PyObject *m, PyMethodDef *functions) > > These set the module docstring, and add the module functions, respectively. Are these intended to be called by Create or Exec? While it sounds most appropriate to have Create set up the basic module object, calling these in Exec (i.e. after letting CPython set the module name/path/etc.) gives more freedom to the user. Should it matter? Should we suggest generally calling them in Exec instead of Create? (If only for consistency with modules that do not have a Create...) 
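The ordering question above (who sets what, and when) can be seen in the pure Python import protocol from PEP 451, which has the same two steps under the names create_module and exec_module. The following is only a sketch of that Python-level analogue, not of the proposed C hooks; DemoLoader and the module name "demo_ext" are made up for illustration.

```python
import importlib.abc
import importlib.util


class DemoLoader(importlib.abc.Loader):
    """Hypothetical loader used only to illustrate the two-step protocol."""

    def create_module(self, spec):
        # Returning None asks the import machinery to build a default
        # module object, comparable to an extension that defines no Create.
        return None

    def exec_module(self, module):
        # By the time this runs, the machinery has already set the common
        # attributes (__name__, __spec__, __loader__) on the module.
        module.__doc__ = "Docstring set during the exec step"
        module.answer = 42


spec = importlib.util.spec_from_loader("demo_ext", DemoLoader())
module = importlib.util.module_from_spec(spec)  # the "create" step
spec.loader.exec_module(module)                 # the "exec" step
```

Setting the docstring and functions in the exec step works whether or not the create step is customised, which is one argument for suggesting Exec as the usual place for such calls.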
> PyCapsule convenience functions > ------------------------------- > > Instead of custom module objects, PyCapsule will become the preferred > mechanism for storing per-module C data. Why? Isn't an extension type a much simpler and substantially faster thing to use than an indirection through a capsule? Are we really encouraging users to let CPython do a string concatenation, Python string object creation, module attribute lookup and pointer extraction, just to access some value in the current module state? That sounds like a horrible amount of overhead. While a custom module extension type might not be entirely trivial to set up manually, it's still mostly just copy&paste (i.e. simple enough) and provides largely superior performance: a simple pointer indirection instead of the entire lookup dance above. Why not just rely on PyModule_GetState() for the time being? If we ever need to extend that mechanism and pass a different module object type into Exec(), that gives us a single place to support different (future) module types as well. And code that implements and returns its own module type from Create() can and will do its own straight forward cast anyway. > void *PyModule_GetCapsulePointer( > PyObject *module, > const char *module_name, > const char *attribute_name) > > Returns the pointer stored in *module* as *attribute_name*, or NULL > (with an exception set) on failure. The capsule name is formed by joining > *module_name* and *attribute_name* by a dot. > > This convenience function can be used instead of separate calls to > PyObject_GetAttr and PyCapsule_GetPointer. But that requires the user code to know the module name in all places where module state is needed (i.e. almost everywhere). Doesn't that counter the idea of passing the module spec into the Create function? And why is it necessary to pass the C encoded module name if the module itself (which knows its name as a readily prepared Python string) is the very first argument? 
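For readers who want to see the lookup being discussed, here is a rough Python sketch of the capsule naming scheme from the quoted pseudocode, driving the real PyCapsule_New and PyCapsule_GetPointer C functions through ctypes. It is CPython-only and purely illustrative: add_capsule and get_capsule_pointer are hypothetical stand-ins for the proposed helpers, and the "demo" module is invented for the example.

```python
import ctypes
import types

# Bind the existing C-API capsule functions (CPython only).
_capsule_new = ctypes.pythonapi.PyCapsule_New
_capsule_new.restype = ctypes.py_object
_capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

_capsule_get = ctypes.pythonapi.PyCapsule_GetPointer
_capsule_get.restype = ctypes.c_void_p
_capsule_get.argtypes = [ctypes.py_object, ctypes.c_char_p]

# PyCapsule_New stores the name pointer without copying it, so the
# bytes objects holding capsule names must stay alive.
_names = []


def add_capsule(module, module_name, attribute_name, pointer):
    """Hypothetical stand-in: store a pointer on the module in a capsule."""
    name = f"{module_name}.{attribute_name}".encode("utf-8")
    _names.append(name)
    capsule = _capsule_new(pointer, name, None)  # no destructor
    setattr(module, attribute_name, capsule)


def get_capsule_pointer(module, module_name, attribute_name):
    """Hypothetical stand-in: attribute lookup plus name-checked unpacking."""
    name = f"{module_name}.{attribute_name}".encode("utf-8")
    return _capsule_get(getattr(module, attribute_name), name)


state = ctypes.c_int(7)          # some C-level data to point at
demo = types.ModuleType("demo")  # invented module for the example
add_capsule(demo, "demo", "state", ctypes.addressof(state))
pointer = get_capsule_pointer(demo, "demo", "state")
```

Every access repeats the string join, the attribute lookup and the name comparison, which is the overhead objected to above. A mismatched module name makes PyCapsule_GetPointer fail with ValueError.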
BTW, it's worth mentioning the expected encoding of the C encoded names. UTF-8, I guess. > Generalizing PyModule_* functions > --------------------------------- > > The following functions and macros will be modified to work on any object > that supports attribute access: > > * PyModule_GetNameObject > * PyModule_GetName > * PyModule_GetFilenameObject > * PyModule_GetFilename > * PyModule_AddIntConstant > * PyModule_AddStringConstant > * PyModule_AddIntMacro > * PyModule_AddStringMacro > * PyModule_AddObject > > The PyModule_GetDict function will continue to only work on true module > objects. This means that it should not be used on extension modules that only > define PyModuleExec. That leads to somewhat unfortunate API naming, but I think it's acceptable. PyModule_GetState() is also worth mentioning here, in the same way as GetDict(). > Legacy Init > ----------- > > If PyModuleExec is not defined, the import machinery will try to initialize > the module using the PyModuleInit hook, as described in PEP 3121. The name is "PyInit_modulename". > If PyModuleExec is defined, PyModuleInit will be ignored. > Modules requiring compatibility with previous versions of CPython may > implement PyModuleInit in addition to the new hook. I guess the idea would be to implement PyInit() by calling either Create() or PyModule_Create(), and then Exec(), right? Should we suggest that in the PEP? > Subinterpreters and Interpreter Reloading > ----------------------------------------- > > Extensions using the new initialization scheme are expected to support > subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly. > The mechanism is designed to make this easy, but care is still required > on the part of the extension author. 
Would be nice to add a quick note that subinterpreter support basically means that the Create/Exec dance will be repeated for each interpreter instance, and that the module object will be garbage collected at the end of each interpreter life cycle. > No user-defined functions, methods, or instances may leak to different > interpreters. > To achieve this, all module-level state should be kept in either the module > dict, or in the module object. > A simple rule of thumb is: Do not define any static data, except built-in > types with no mutable or user-settable class attributes. I think it's also worth mentioning C level callbacks explicitly, since that can be quite tricky in some cases (it's one of the top-FAQs by Cython users). Whatever state is passed into the callback mechanism must include a direct or indirect reference to the module or state object as well if module state is used by the callback in any way (which is not unlikely). > Module Reloading > ---------------- > > Reloading an extension module will re-execute its PyModuleInit function. "Exec", as Nick already found. Worth mentioning explicitly that Create() will not be called again and that the object that Exec() receives is the same as returned by the original call to Create(). > Similar caveats apply to reloading an extension module as to reloading > a Python module. Notably, attributes or any other state of the module > are not reset before reloading. Interesting - is Exec() allowed to take advantage of that by not resetting some well selected attributes? E.g. constant global caches? Although I guess that would counter the idea of reloading a module... > Additionally, due to limitations in shared library loading (both dlopen on > POSIX and LoadLibraryEx on Windows), it is not generally possible to load > a modified library after it has changed on disk. > Therefore, reloading extension modules is of limited use.
Well, it could potentially use a hash suffix in the file name and still load under the same module name. See right below. > Multiple modules in one library > ------------------------------- > > To support multiple Python modules in one shared library, the library > must export appropriate PyModuleExec_ or PyModuleCreate_ hooks > for each exported module. > The modules are loaded using a ModuleSpec with origin set to the name of the > library file, and name set to the module name. > > Note that this mechanism can currently only be used to *load* such modules, > not to *find* them. > > XXX: This is an existing issue; either fix it/wait for a fix or provide > an example of how to load such modules. I really like that idea. It's essentially an extended inittab mechanism, also usable for executable single-file distributions (maybe even "python -m"), non-ASCII module names and "__init__.so" packages that import as an entire package structure of multiple modules. Needs some kind of "import module from library" C-API mechanism, though, or at least an explicitly exported list of modules to import from a shared library in the right order. I'd rather go for some kind of explicit import that creates these modules on request. Stefan From encukou at gmail.com Thu Mar 19 14:37:36 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 19 Mar 2015 14:37:36 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> Message-ID: <550AD120.9070406@gmail.com> On 03/19/2015 11:31 AM, Stefan Behnel wrote: > Hi Petr, > > thanks for working on this. I added my comments inline. Thanks for your comments, they're a nice reality check. I'm feeling a bit like I and Nick misunderstood Cython requirements somewhat, and concentrated on unimportant points (loading into pre-created modules) while ignoring important ones (fast access to module state). 
You also pointed out interesting things we didn't think about too much (non-ASCII names, multi-module extensions). One of the PEP's stated goals is that the behavior of extension modules should be closer to Python modules. But if the solution (Exec-only modules) doesn't work for Cython, then the goal is pretty much irrelevant. I believe PyCapsule is the cleanest way of putting C state onto arbitrary objects, and by this time I can say it's not working. Perhaps it's time to say that extension modules *are* fundamentally different from pure Python ones. (And rewrite the PEP. *sigh*) I'll keep your comments in mind, but I have this idea that could make them obsolete; I'll reply to them if it gets shot down. >> Multiple modules in one library >> ------------------------------- >> >> To support multiple Python modules in one shared library, the library >> must export appropriate PyModuleExec_ or PyModuleCreate_ hooks >> for each exported module. >> The modules are loaded using a ModuleSpec with origin set to the name of the >> library file, and name set to the module name. >> >> Note that this mechanism can currently only be used to *load* such modules, >> not to *find* them. >> >> XXX: This is an existing issue; either fix it/wait for a fix or provide >> an example of how to load such modules. > > I really like that idea. It's essentially an extended inittab mechanism, > also usable for executable single-file distributions (maybe even "python > -m"), non-ASCII module names and "__init__.so" packages that import as an > entire package structure of multiple modules. > > Needs some kind of "import module from library" C-API mechanism, though, or > at least an explicitly exported list of modules to import from a shared > library in the right order. I'd rather go for some kind of explicit import > that creates these modules on request. It seems that, with this PEP, the main reason for extension authors to implement Create would be to get per-module state.
PyCapsules in the module dict are not a good idea speed-wise; static C-level data is not an option if subinterpreters need to be supported. The "inittab" idea made me think of this: An extension could export an array of PyModuleDef, which has all the needed data for module creation and initialization: - m_name - for the "requested" name for the module (not necessarily what it'll be loaded as), for identifying modules in multi-module extensions - m_size - for requesting per-module C state - m_reload (currently unused) would be the exec function (called for initialization and reload) This would rule out completely custom module objects, but are those needed anyway? A module can always replace itself in sys.modules if it needs extra magic. Getting rid of Create entirely supports a lot of the other goals (running user code in Create, pushing for subinterpreter support). And things like module properties or callable modules are not possible in source modules either; perhaps those should be solved at a higher level. With this, you couldn't load extensions into arbitrary objects. But it would be possible to load into pre-created modules, as long as they were pre-created with the correct ModuleDef. It would probably be somewhat more difficult to make runpy (or custom loading libraries) work with these extension modules, but it should be possible. Implementation-wise, having m_reload filled in from the start would help: the PEP calls for looking up two entrypoints, and the lookup is relatively expensive (judging by the amount of caching in current code). It would also help with non-ASCII names, since the name is a string rather than a C identifier. Entrypoint and file names would need some design to make everything work. But before I go thinking about that: Does this seem like a better direction than Create/Exec?
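The remark that a module can always replace itself in sys.modules refers to an existing pure Python idiom. A minimal sketch follows, assuming an invented module name "demo_magic"; CallableModule and exec_demo are hypothetical, with exec_demo standing in for whatever code runs during the module's execution step.

```python
import importlib
import sys
import types


class CallableModule(types.ModuleType):
    """Module subclass providing behaviour a plain module object lacks."""

    def __call__(self, value):
        return value * 2


def exec_demo(module):
    """Stand-in for a module body that swaps itself out in sys.modules."""
    replacement = CallableModule(module.__name__)
    # Carry over whatever the import machinery already set on the module.
    replacement.__dict__.update(module.__dict__)
    sys.modules[module.__name__] = replacement


# Simulate the state partway through an import: a plain module object is
# registered first, then its execution step runs and replaces it.
plain = types.ModuleType("demo_magic")
sys.modules["demo_magic"] = plain
exec_demo(plain)

# Later imports see the replacement, because they consult sys.modules.
demo_magic = importlib.import_module("demo_magic")
```

This keeps the loader simple (it only ever creates standard module objects) while still allowing a callable or otherwise customised object to end up as the imported module.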
From solipsis at pitrou.net Thu Mar 19 15:17:55 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 19 Mar 2015 15:17:55 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com> Message-ID: <20150319151755.2650243b@fsol> On Thu, 19 Mar 2015 14:37:36 +0100 Petr Viktorin wrote: > On 03/19/2015 11:31 AM, Stefan Behnel wrote: > > Hi Petr, > > > > thanks for working on this. I added my comments inline. > > Thanks for your comments, they're a nice reality check. > I'm feeling a bit like I and Nick misunderstood Cython requirements > somewhat, and concentrated on unimportant points (loading into > pre-created modules) while ignoring important ones (fast access to > module state). Fast access is not only important for Cython, but for various stdlib modules as well (e.g. the _decimal module). By the way, another nice thing would be for access to always succeed; that simplifies extension module code quite a bit. > One of the PEP's stated goals is that the behavior of extension modules > should be closer to Python modules. But if the solution (Exec-only > modules) doesn't work for Cython, then the goal is pretty much > irrelevant. I believe PyCapsule is the cleanest way of putting C state > onto arbitrary objects, and by this time I can say it's not working. Note that while *behaviour* may get closer to Python modules, implementation doesn't have to. For example, I don't think it's a problem if an extension module object has to be of a specific type; supporting duck-typing isn't important here. Regards Antoine.
From stefan_ml at behnel.de Fri Mar 20 21:29:26 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Mar 2015 21:29:26 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <550AD120.9070406@gmail.com> References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com> Message-ID: Petr Viktorin schrieb am 19.03.2015 um 14:37: > On 03/19/2015 11:31 AM, Stefan Behnel wrote: >> thanks for working on this. I added my comments inline. > > Thanks for your comments, they're a nice reality check. Sorry if it was a bit too much. I didn't mean to shoot it down or so. I think we're on the right track, and the PEP will allow us to get a better idea of where we should be heading. > I'm feeling a bit like I and Nick misunderstood Cython requirements > somewhat, and concentrated on unimportant points (loading into pre-created > modules) You mean the split between Create and Exec? I think that's a very good and simple design. It gives the extension module full control over the module instance and its implementation (if it wants to), while leaving the core runtime full control over the basic setup of the module object's common API (__file__, __name__, etc.). Simple extension modules should be able to get away without implementing Create, so separating the two steps sounds better than requiring the module instantiation on user side and providing a callback into CPython to initialise it. > while ignoring important ones (fast access to module state). Yes, that *is* important. And I believe that a custom module (sub)type is a good way to achieve that, at least for Cython. For manually written modules, it might be easier to call PyModule_GetState(). > You also pointed out interesting things we didn't think about too much > (non-ASCII names, multi-module extensions). I just mentioned what came to my mind. 
We should still try to keep the PEP focussed on the problem at hand, but having some idea of what else might lie ahead can help with design decisions. > The "inittab" idea made me think of this: > > An extension could export an array of PyModuleDef, which has all the needed > data for module creation and initialization: I remember discussing this on python-dev, it was one of the ideas in the original thread that led to the Create-Exec proto-pep: http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986 I think the main counter argument at the time was that there should be a way to control the module object instantiation. :) > - m_name - for the "requested" name for the module (not necessarily what > it'll be loaded as), for identifying modules in multi-module extensions > - m_size - for requesting per-module C state > - m_reload (currently unused) would be the exec function (called for > initialization and reload) > > This would rule out completely custom module objects, but are those needed > anyway? A module can always replace itself in sys.modules if it needs extra > magic. Getting rid of Create entirely supports a lot of the other goals > (running user code in Create, pushing for subinterpreter support). And > things like module properties or callable modules are not possible in > source modules as well; perhaps those should be solved at a higher level. > > With this, you couldn't load extensions into arbitrary objects. But it > would be possible to load into pre-created modules, as long as they were > pre-created with the correct ModuleDef. It would probably be somewhat more > difficult to make runpy (or custom loading libraries) work with these > extension modules, but it should be possible. > > Implementation-wise, having m_reload filled in from the start would help: > the PEP calls for looking up two entrypoints, and the lookup is relatively > expensive (judging by the amount of caching in current code).
> > It would also help with non-ASCII names, since the name is a string rather > than a C identifier. Entrypoint and file names would need some design to > make everything work. But before I go thinking about that: Does this seem > like a better direction than Create/Exec? It's still an alternative, I think. Nick objected to extending PyModuleDef because it's (obviously) part of the stable ABI. But we could instead export a new struct that *contains* a PyModuleDef, with additional callback functions like "new(spec)", as known from other extension types (tp_new). That would give us the Create() functionality (if set to non-NULL), or allow CPython to instantiate a regular module object (if set to NULL). With a magic version field at the top of the struct, this would also make it easy to extend in the future if we ever need more metadata or callbacks that we can't foresee now. Updating the version magic and appending to the struct is so much easier than writing a new PEP and redesigning the entire extension module init process again... So, yes, exporting a struct with module metadata and callbacks sounds like a very generic and straight forward interface to me. Stefan From ncoghlan at gmail.com Sat Mar 21 09:17:27 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Mar 2015 18:17:27 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com> Message-ID: On 21 March 2015 at 06:29, Stefan Behnel wrote: > Petr Viktorin schrieb am 19.03.2015 um 14:37: >> On 03/19/2015 11:31 AM, Stefan Behnel wrote: >>> thanks for working on this. I added my comments inline. >> >> Thanks for your comments, they're a nice reality check. > > Sorry if it was a bit too much. I didn't mean to shoot it down or so. I > think we're on the right track, and the PEP will allow us to get a better > idea of where we should be heading. 
> > >> I'm feeling a bit like I and Nick misunderstood Cython requirements >> somewhat, and concentrated on unimportant points (loading into pre-created >> modules) > > You mean the split between Create and Exec? I think that's a very good and > simple design. It gives the extension module full control over the module > instance and its implementation (if it wants to), while leaving the core > runtime full control over the basic setup of the module object's common API > (__file__, __name__, etc.). > > Simple extension modules should be able to get away without implementing > Create, so separating the two steps sounds better than requiring the module > instantiation on user side and providing a callback into CPython to > initialise it. > > >> while ignoring important ones (fast access to module state). > > Yes, that *is* important. And I believe that a custom module (sub)type is a > good way to achieve that, at least for Cython. For manually written > modules, it might be easier to call PyModule_GetState(). Right, while I never really articulated it (not even to myself, let alone to Petr), I think my underlying assumption was that Cython would typically use Create+Exec for speed, but might offer a slower Exec-only option to get a more "Python-like" module behaviour that allowed Cython acceleration of directory, zipfile and package __main__ modules, along with other modules intended to be executed with the "-m" switch. One of the things I like about the PEP 489 design is that it should be general enough that Cython itself can decide what it wants to do on that front, without CPython needing to be aware of the details. On the capsule side of things, I think it's good to facilitate that as an alternative to having C extension modules link directly to each other, but I'm not sure it makes sense to encourage it as a way for a module to access its *own* state that can't readily be stored in a Python dictionary as a normal Python object. 
So perhaps the patterns to encourage here are: * prefer only defining Exec, with state stored as Python objects in the module globals * if you need C level global state, then you need to define Create as well and return a suitable object, such as a PyModule subclass, or the result of calling PyModule_Create with m_size > 0 in PyModuleDef * if you also need fast access to operations defined in other extension modules, prefer reading and saving references to the relevant capsule objects in Exec over direct C level linking at build time (Regarding that last point, we may want to some day consider exposing suitable capsules for some C accelerated standard library modules, like _decimal, rather than expanding the C API itself to cover those types) >> You also pointed out interesting things we didn't think about too much >> (non-ASCII names, multi-module extensions). > > I just mentioned what came to my mind. We should still try to keep the PEP > focussed on the problem at hand, but having some idea of what else might > lie ahead can help with design decisions. I briefly looked into C level UTF-8 support when adding a Unicode literal to the ord() and chr() docs (I originally had it in the docstring as well, and it was pointed out in review that that might cause problems), and I'm not sure it's possible to sensibly support arbitrary Unicode module names for extension modules while our baseline assumption at the C level is C89 compatibility. We should definitely aim to cope with the fact that extension module names *might* contain arbitrary Unicode some day, even if we don't officially support that yet.
I thought Brett actually implemented multi-module extension support a while back (which this PEP would then inherit), but I can't find any current evidence of that change, so either my recollection is wrong, or my search skills are failing me :) While looking for such evidence, I was also reminded of the fact that https://docs.python.org/3/c-api/ is missing a reference section on how extension module importing actually works - the only current explanation is in the more tutorial style https://docs.python.org/3/extending/extending.html#the-module-s-method-table-and-initialization-function That missing reference section is a docs gap that should likely be fixed as part of these changes. >> The "inittab" idea made me think of this: >> >> An extension could export an array of PyModuleDef, which has all the needed >> data for module creation and initialization: > > I remember discussing this on python-dev, it was one of the ideas in the > original thread that lead to the Create-Exec proto-pep: > > http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986 > > I think the main counter argument at the time was that there should be a > way to control the module object instantiation. :) It's an interesting notion - you could export the arguments to a call to PyModule_Create (and/or PyModule_Create2, and/or a new different function that accepts a different declaration API) and have an entirely static module initialisation process in at least some cases. It likely makes sense as a separate follow-on PEP for 3.6 though, as it's a further simplification of a certain way of using Create+Exec, and it's not clear just how you'd handle certain combinations of values in the current PyModuleDef struct. PEP 489 currently deals with that neatly by breaking out separate helper functions for initialising the docstring and the module globals function table that can be called from either Exec or Create as appropriate. 
>> - m_name - for the "requested" name for the module (not necessarily what >> it'll be loaded as), for identifying modules in multi-module extensions >> - m_size - for requesting per-module C state >> - m_reload (currently unused) would be the exec function (called for >> initialization and reload) >> >> This would rule out completely custom module objects, but are those needed >> anyway? A module can always replace itself in sys.modules if it needs extra >> magic. Getting rid of Create entirely supports a lot of the other goals >> (running user code in Create, pushing for subinterpreter support). And >> things like module properties or callable modules are not possible in >> source modules as well; perhaps those should be solved at a higher level. I'd prefer not to guess at what might be useful in this space - the fact that the Create hook design leaves it open to third party experimentation is a feature, not a bug. If particularly useful patterns emerge that we want to recommend to new Python extension module authors, then we can standardise them at a later date (just as the current PEP 489 design is designed to standardise particular patterns in writing extension module Init methods). >> With this, you couldn't load extensions into arbitrary objects. But it >> would be possible to load into pre-created modules, as long as they were >> pre-created with the correct ModuleDef. It would probably be somewhat more >> difficult to make runpy (or custom loading libraries) work with these >> extension modules, but it should be possible. >> >> Implementation-wise, having m_reload filled in from the start would help: >> the PEP calls for looking up two entrypoints, and the lookup is relatively >> expensive (judging by the amount of caching in current code). >> >> It would also help with non-ASCII names, since the name is a string rather >> than a C identifier. Entrypoint and file names would need some design to >> make everything work.
But before I go thinking about that: Does this seem >> like a better direction than Create/Exec? > > It's still an alternative, I think. Nick objected to extending PyModuleDef > because it's (obviously) part of the stable ABI. But we could instead > export a new struct that *contains* a PyModuleDef, with additional callback > functions like "new(spec)", as known from other extension types (tp_new). > That would give us the Create() functionality (if set to non-NULL), or > allow CPython to instantiate a regular module object (if set to NULL). > > With a magic version field at the top of the struct, this would also make > it easy to extend in the future if we ever need more metadata or callbacks > that we can't foresee now. Updating the version magic and appending to the > struct is so much easier than writing a new PEP and redesigning the entire > extension module init process again... > > So, yes, exporting a struct with module metadata and callbacks sounds like > a very generic and straight forward interface to me. Unless such an API is very carefully designed, it would be easy to fall into the trap of creating an API along the lines of the way C level extension class definitions were traditionally defined. That's a pretty horrible user experience if you're defining them by hand, which is why a lot of folks tend to cargo cult an existing class definition (and fill in the pieces they need), or else let something like Cython, SWIG or Boost deal with the problem for them. One of the biggest hassles with using a static struct is that you end up with a lot of cryptic padding to cover the slots you don't care about in order to get to the slots you actually do care about. 
The API design for defining types through the stable ABI (https://www.python.org/dev/peps/pep-0384/#type-objects), which was designed with the benefit of years of experience with the old approach, is much nicer, as the NULL-terminated list of named slots lets you only worry about the slots you care about, and the interpreter takes care of everything else. With the current design of PEP 489, the idea is that if you don't really care about the module object, you just define Exec, and the interpreter gives you a standard Python level module object. All your global state still gets stored as Python objects, and you just get the "C execution model with the Python data model" development experience which is actually quite a nice environment to program in. However, if you want straightforward access to the C *data* model at runtime as well as its execution model, then you can define Create and use the existing PyModule_Create APIs, or (as a new feature) a custom module subclass or a completely custom type, to define how your module state is stored. That two level approach gives you all the same flexibility you have today by defining a custom Init hook (and more), but also lets you opt out of learning most of the details of the C data model if all you're really after is faster low level manipulation of data stored in Python objects. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sat Mar 21 11:04:17 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Mar 2015 11:04:17 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

Message-ID: Nick Coghlan schrieb am 21.03.2015 um 09:17: > I think my underlying assumption was that Cython would > typically use Create+Exec for speed, but might offer a slower > Exec-only option to get a more "Python-like" module behaviour that > allowed Cython acceleration of directory, zipfile and package __main__ > modules, along with other modules intended to be executed with the > "-m" switch. How would these discourage (or disallow) usage of Create? > On the capsule side of things, I think it's good to facilitate that as > an alternative to having C extension modules link directly to each > other Is anyone really doing that? AFAIK, it's not even portable. Cython wraps all user exported APIs as pointers in capsules and automatically unpacks them on the other side at import time. It even shares its own internal extension types (function, generator, memoryview, etc.) across modules these days, all using capsules. > but I'm not sure it makes sense to encourage it as a way for a > module to access its *own* state that can't readily be stored in a > Python dictionary as a normal Python object. 
So perhaps the patterns > to encourage here are: > > * prefer only defining Exec, with state stored as Python objects in > the module globals > * if you need C level global state, then you need to define Create as > well and return a suitable object, such as a PyModule subclass, or the > result of calling PyModule_Create with m_size > 0 in PyModuleDef > * if you also need fast access to operations defined in other > extension modules, prefer reading and saving references to the > relevant capsule objects in Exec over direct C level linking at build > time +1 > (Regarding that last point, we may want to some day consider exposing > suitable capsules for some C accelerated standard library modules, > like _decimal, rather than expanding the C API itself to cover those > types) +10 :) > I briefly looked into C level UTF-8 support when adding a Unicode > literal to the ord() and chr() docs (I originally had it in the > docstring as well, and it was pointed out in review that that might > cause problems), and I'm not sure it's possible to sensibly support > arbitrary Unicode module names for extension modules while our > baseline assumption at the C level is C89 compatibility. We should > definitely aim to cope with the fact that extension module names > *might* contain arbitrary Unicode some day, even if we don't > officially support that yet. The main problem with the current scheme is that the name of the module file must match the name of the exported symbol(s), and the module file name is searched for by the imported name. So there is a direct link between the (potentially non-ASCII) imported module name and the name of the (ASCII-only) exported entry point symbols. And the exported symbol names must be globally unique to support platforms with flat symbol namespaces. Uncoupling the imported module name from either the file name or the symbol name or even both isn't entirely obvious. 
I mean, ok, you could use a hash, or rather encode the name in punycode (and replace "-" by "_" in the symbol name). That would at least keep it somewhat readable for Latin-based scripts, while being fully backwards compatible with what we have (that was the whole point of the punycode design). Actually, why not just do that? :) > I thought Brett actually implemented multi-module extension support a > while back (which this PEP would then inherit), but I can't find any > current evidence of that change, so either my recollection is wrong, > or my search skills are failing me :) How should that work? Would it just try to look up all "PyInit_*" symbols and call them? In arbitrary order? > While looking for such evidence, I was also reminded of the fact that > https://docs.python.org/3/c-api/ is missing a reference section on how > extension module importing actually works - the only current > explanation is in the more tutorial style > https://docs.python.org/3/extending/extending.html#the-module-s-method-table-and-initialization-function > > That missing reference section is a docs gap that should likely be > fixed as part of these changes. +1 >>> The "inittab" idea made me think of this: >>> >>> An extension could export an array of PyModuleDef, which has all the needed >>> data for module creation and initialization: >> >> I remember discussing this on python-dev, it was one of the ideas in the >> original thread that led to the Create-Exec proto-pep: >> >> http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986 >> >> I think the main counter argument at the time was that there should be a >> way to control the module object instantiation. :) > > It's an interesting notion - you could export the arguments to a call > to PyModule_Create (and/or PyModule_Create2, and/or a new different > function that accepts a different declaration API) and have an > entirely static module initialisation process in at least some cases. 
> > It likely makes sense as a separate follow-on PEP for 3.6 though, as > it's a further simplification of a certain way of using Create+Exec, > and it's not clear just how you'd handle certain combinations of > values in the current PyModuleDef struct. PEP 489 currently deals with > that neatly by breaking out separate helper functions for initialising > the docstring and the module globals function table that can be called > from either Exec or Create as appropriate. While I agree that this can be done later, I also think that adding yet another interface after the current change will only make it more difficult for users to get started and get their stuff done. Exporting a struct does sound like the most generic and future proof approach so far. If(f) we already assume that it will eventually become useful, we shouldn't go for less. Do you have any specific problem with the PyModuleDef "value combinations" in mind? I mean, we could always apply further restrictions on the content of an exported PyModuleDef when used for this interface. Unexpected setups should be easy for the import machinery to validate and reject, even if it's just because it's "not currently supported". Being strict is easy here. > With the current design of PEP 489, the idea is that if you don't > really care about the module object, you just define Exec, and the > interpreter gives you a standard Python level module object. All your > global state still gets stored as Python objects, and you just get the > "C execution model with the Python data model" development experience > which is actually quite a nice environment to program in. > > However, if you want straightforward access to the C *data* model at > runtime as well as its execution model, then you can define Create and > use the existing PyModule_Create APIs, or (as a new feature) a custom > module subclass or a completely custom type, to define how your module > state is stored. 
> > That two level approach gives you all the same flexibility you have > today by defining a custom Init hook (and more), but also lets you opt > out of learning most of the details of the C data model if all you're > really after is faster low level manipulation of data stored in Python > objects. I'm ok with either, but I'd really like to avoid replacing the new scheme by yet another new scheme in the future. Stefan From encukou at gmail.com Sat Mar 21 11:30:21 2015 From: encukou at gmail.com (Petr Viktorin) Date: Sat, 21 Mar 2015 11:30:21 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

Message-ID: <550D483D.2080007@gmail.com> On 03/21/2015 09:17 AM, Nick Coghlan wrote: > On 21 March 2015 at 06:29, Stefan Behnel wrote: >> Petr Viktorin schrieb am 19.03.2015 um 14:37: >>> On 03/19/2015 11:31 AM, Stefan Behnel wrote: >>>> thanks for working on this. I added my comments inline. >>> >>> Thanks for your comments, they're a nice reality check. >> >> Sorry if it was a bit too much. I didn't mean to shoot it down or so. I >> think we're on the right track, and the PEP will allow us to get a better >> idea of where we should be heading. Oh, no need to apologize! The result will be much better with your input :) >>> I'm feeling a bit like I and Nick misunderstood Cython requirements >>> somewhat, and concentrated on unimportant points (loading into pre-created >>> modules) >> >> You mean the split between Create and Exec? I think that's a very good and >> simple design. It gives the extension module full control over the module >> instance and its implementation (if it wants to), while leaving the core >> runtime full control over the basic setup of the module object's common API >> (__file__, __name__, etc.). >> >> Simple extension modules should be able to get away without implementing >> Create, so separating the two steps sounds better than requiring the module >> instantiation on user side and providing a callback into CPython to >> initialise it. >> >> >>> while ignoring important ones (fast access to module state). >> >> Yes, that *is* important. And I believe that a custom module (sub)type is a >> good way to achieve that, at least for Cython. For manually written >> modules, it might be easier to call PyModule_GetState(). 
> > Right, while I never really articulated it (not even to myself, let > alone to Petr), I think my underlying assumption was that Cython would > typically use Create+Exec for speed, but might offer a slower > Exec-only option to get a more "Python-like" module behaviour that > allowed Cython acceleration of directory, zipfile and package __main__ > modules, along with other modules intended to be executed with the > "-m" switch. It would be nice to extend runpy to handle Create+Exec modules. If this can be pulled off, there'd be no need for Exec-only modules except for convenience. * module reloading is useless for extension modules - a changed version can't be read from the disk, and correct reload behavior is another corner case for authors to think about * loading into custom objects is cool, but if the only use case mentioned so far is lazy loading, I think it's safe to drop * running as __main__ somehow taken care of > One of the things I like about the PEP 489 design is that it should be > general enough that Cython itself can decide what it wants to do on > that front, without CPython needing to be aware of the details. > > On the capsule side of things, I think it's good to facilitate that as > an alternative to having C extension modules link directly to each > other, but I'm not sure it makes sense to encourage it as a way for a > module to access its *own* state that can't readily be stored in a > Python dictionary as a normal Python object. 
So perhaps the patterns > to encourage here are: > > * prefer only defining Exec, with state stored as Python objects in > the module globals > * if you need C level global state, then you need to define Create as > well and return a suitable object, such as a PyModule subclass, or the > result of calling PyModule_Create with m_size > 0 in PyModuleDef > * if you also need fast access to operations defined in other > extension modules, prefer reading and saving references to the > relevant capsule objects in Exec over direct C level linking at build > time One thing I'm not clear about: what are the advantages of a module subclass over a normal module with m_size>0? It seems I'm missing something obvious here. > (Regarding that last point, we may want to some day consider exposing > suitable capsules for some C accelerated standard library modules, > like _decimal, rather than expanding the C API itself to cover those > types) > >>> You also pointed out interesting things we didn't think about too much >>> (non-ASCII names, multi-module extensions). >> >> I just mentioned what came to my mind. We should still try to keep the PEP >> focussed on the problem at hand, but having some idea of what else might >> lie ahead can help with design decisions. > > I briefly looked into C level UTF-8 support when adding a Unicode > literal to the ord() and chr() docs (I originally had it in the > docstring as well, and it was pointed out in review that that might > cause problems), and I'm not sure it's possible to sensibly support > arbitrary Unicode module names for extension modules while our > baseline assumption at the C level is C89 compatibility. We should > definitely aim to cope with the fact that extension module names > *might* contain arbitrary Unicode some day, even if we don't > officially support that yet. Do you mean using non-ASCII characters in the literal itself? 
The proposal is not to make it easy and straightforward to use UTF-8 module names, but to make it possible. Cython can escape a UTF-8 string. The stdlib won't need it (outside tests, where it can be escaped). And extension authors are not all bound to cross-platform C89 - if someone's writing a Chinese extension, they also need Chinese identifiers, so they probably already require a suitable compiler. > I thought Brett actually implemented multi-module extension support a > while back (which this PEP would then inherit), but I can't find any > current evidence of that change, so either my recollection is wrong, > or my search skills are failing me :) It's there, grep issue16421. > While looking for such evidence, I was also reminded of the fact that > https://docs.python.org/3/c-api/ is missing a reference section on how > extension module importing actually works - the only current > explanation is in the more tutorial style > https://docs.python.org/3/extending/extending.html#the-module-s-method-table-and-initialization-function > > That missing reference section is a docs gap that should likely be > fixed as part of these changes. Yes. >>> The "inittab" idea made me think of this: >>> >>> An extension could export an array of PyModuleDef, which has all the needed >>> data for module creation and initialization: >> >> I remember discussing this on python-dev, it was one of the ideas in the >> original thread that led to the Create-Exec proto-pep: >> >> http://thread.gmane.org/gmane.comp.python.devel/135764/focus=140986 >> >> I think the main counter argument at the time was that there should be a >> way to control the module object instantiation. :) > > It's an interesting notion - you could export the arguments to a call > to PyModule_Create (and/or PyModule_Create2, and/or a new different > function that accepts a different declaration API) and have an > entirely static module initialisation process in at least some cases. 
> > It likely makes sense as a separate follow-on PEP for 3.6 though, as > it's a further simplification of a certain way of using Create+Exec, > and it's not clear just how you'd handle certain combinations of > values in the current PyModuleDef struct. PEP 489 currently deals with > that neatly by breaking out separate helper functions for initialising > the docstring and the module globals function table that can be called > from either Exec or Create as appropriate. The neatness might be more superficial than it seems. Separating Create and Exec has these effects: - Allowing you to implement just one and leave the rest to default machinery. This is good. - Allowing some time to pass between the Create and Exec calls. This might be useful for lazy loading, I guess. - Allowing the loader or third-party code to modify the object between the Create and Exec calls. This is dangerous (for consenting adults who don't mind the occasional segfault). - Allowing Exec to be called multiple times after Create, i.e. module reloading. I don't think there is a use case (and for module-specific cases it can be done in a separately exported function). - Allowing Exec without the corresponding Create, i.e. loading into arbitrary objects. This is cool, and it mimics what source modules can do, but I'm less and less convinced that it's actually useful. It's a lot to think about if you want to design a module that behaves correctly, and for some combinations it's not clear what "correctly" means. >>> - m_name - for the "requested" name for the module (not necessarily what >>> it'll be loaded as), for identifying modules in multi-module extensions >>> - m_size - for requesting per-module C state >>> - m_reload (currently unused) would be the exec function (called for >>> initialization and reload) >>> >>> This would rule out completely custom module objects, but are those needed >>> anyway? A module can always replace itself in sys.modules if it needs extra >>> magic. 
Getting rid of Create entirely supports a lot of the other goals >>> (running user code in Create, pushing for subinterpreter support). And >>> things like module properties or callable modules are not possible in >>> source modules either; perhaps those should be solved at a higher level. > > I'd prefer not to guess at what might be useful in this space - the > fact that the Create hook design leaves it open to third party > experimentation is a feature, not a bug. > > If particularly useful patterns emerge that we want to recommend to > new Python extension module authors, then we can standardise them at a > later date (just as the current PEP 489 design is designed to > standardise particular patterns in writing extension module Init > methods). > >>> With this, you couldn't load extensions into arbitrary objects. But it >>> would be possible to load into pre-created modules, as long as they were >>> pre-created with the correct ModuleDef. It would probably be somewhat more >>> difficult to make runpy (or custom loading libraries) work with these >>> extension modules, but it should be possible. >>> >>> Implementation-wise, having m_reload filled in from the start would help: >>> the PEP calls for looking up two entrypoints, and the lookup is relatively >>> expensive (judging by the amount of caching in current code). >>> >>> It would also help with non-ASCII names, since the name is a string rather >>> than a C identifier. Entrypoint and file names would need some design to >>> make everything work. But before I go thinking about that: Does this seem >>> like a better direction than Create/Exec? >> >> It's still an alternative, I think. Nick objected to extending PyModuleDef >> because it's (obviously) part of the stable ABI. But we could instead >> export a new struct that *contains* a PyModuleDef, with additional callback >> functions like "new(spec)", as known from other extension types (tp_new). 
>> That would give us the Create() functionality (if set to non-NULL), or >> allow CPython to instantiate a regular module object (if set to NULL). >> >> With a magic version field at the top of the struct, this would also make >> it easy to extend in the future if we ever need more metadata or callbacks >> that we can't foresee now. Updating the version magic and appending to the >> struct is so much easier than writing a new PEP and redesigning the entire >> extension module init process again... >> >> So, yes, exporting a struct with module metadata and callbacks sounds like >> a very generic and straight forward interface to me. > > Unless such an API is very carefully designed, it would be easy to > fall into the trap of creating an API along the lines of the way C > level extension class definitions were traditionally defined. That's a > pretty horrible user experience if you're defining them by hand, which > is why a lot of folks tend to cargo cult an existing class definition > (and fill in the pieces they need), or else let something like Cython, > SWIG or Boost deal with the problem for them. One of the biggest > hassles with using a static struct is that you end up with a lot of > cryptic padding to cover the slots you don't care about in order to > get to the slots you actually do care about. *sigh* Yeah, I'm very much looking forward to the day Python moves to C99, and everyone can use designated initializers. > The API design for defining types through the stable ABI > (https://www.python.org/dev/peps/pep-0384/#type-objects), which was > designed with the benefit of years of experience with the old > approach, is much nicer, as the NULL-terminated list of named slots > lets you only worry about the slots you care about, and the > interpreter takes care of everything else. Well, if we end up needing to extend PyModuleDef, let's use slots. The idea of extending ModuleDef brings me back to the runpy problem. 
I don't think it's actually necessary for "-m" to mean "exec the module in an object named __main__". Let's provide a slot for a main function, and have runpy call that. This would mean in Cython modules the 'if __name__ == "__main__"' hack won't work, ever (as opposed to that being a bug this PEP can help fix). Is that an acceptable loss? (Maybe my next PEP should be letting Python modules define a __main__ function, and slowly deprecating the things runpy needs to do.) Another possible extension is hooks for resources. Imagine using Cython like zipapp, to pack an entire app including extensions into one file. > With the current design of PEP 489, the idea is that if you don't > really care about the module object, you just define Exec, and the > interpreter gives you a standard Python level module object. All your > global state still gets stored as Python objects, and you just get the > "C execution model with the Python data model" development experience > which is actually quite a nice environment to program in. > > However, if you want straightforward access to the C *data* model at > runtime as well as its execution model, then you can define Create and > use the existing PyModule_Create APIs, or (as a new feature) a custom > module subclass or a completely custom type, to define how your module > state is stored. The problem is that to add C data, you'd either need to define a whole extra hook, or jump through inefficient PyCapsule hoops on every access. I worry that module authors will just take the path of least resistance, and use static data. I think it's substantially better to say "use sizeof(mydata) instead of 0, and use this fast function/macro to get at your data". 
> That two level approach gives you all the same flexibility you have > today by defining a custom Init hook (and more), but also lets you opt > out of learning most of the details of the C data model if all you're > really after is faster low level manipulation of data stored in Python > objects. A module def array additionally gives: - support for non-ASCII module names - a catalog of the modules the extension contains but you can't use custom module subclasses -- unless a create slot is added to the module def. (Or you can replace the sys.modules entry -- I believe the overhead of a wasted empty module object is negligible.) From ncoghlan at gmail.com Sat Mar 21 12:04:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Mar 2015 21:04:09 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

Message-ID: On 21 March 2015 at 20:04, Stefan Behnel wrote: > Nick Coghlan schrieb am 21.03.2015 um 09:17: >> I think my underlying assumption was that Cython would >> typically use Create+Exec for speed, but might offer a slower >> Exec-only option to get a more "Python-like" module behaviour that >> allowed Cython acceleration of directory, zipfile and package __main__ >> modules, along with other modules intended to be executed with the >> "-m" switch. > > How would these discourage (or disallow) usage of Create? __main__ is a builtin module created by the interpreter during startup and directly linked to things like the "-i" switch. It's not created during the import process like a normal module, even when using -m. So at the moment, all those execution mechanisms are limited to Python source and compiled bytecode files - they don't support extension modules at all. PEP 489 offers the opportunity to extend that support to Exec-only extension modules, but doesn't do anything to improve the situation for extension modules that also define Create. >> On the capsule side of things, I think it's good to facilitate that as >> an alternative to having C extension modules link directly to each >> other > > Is anyone really doing that? AFAIK, it's not even portable. > > Cython wraps all user exported APIs as pointers in capsules and > automatically unpacks them on the other side at import time. It even shares > its own internal extension types (function, generator, memoryview, etc.) > across modules these days, all using capsules. Right, I was technically thinking of shared dependencies on common external libraries, rather than linking directly to each other. Either way, improving the discoverability and usability of the capsule mechanism would be valuable - at the moment you either have to "just know" it exists, or else be using something like Cython which takes care of setting it up for you. 
>> I briefly looked into C level UTF-8 support when adding a Unicode >> literal to the ord() and chr() docs (I originally had it in the >> docstring as well, and it was pointed out in review that that might >> cause problems), and I'm not sure it's possible to sensibly support >> arbitrary Unicode module names for extension modules while our >> baseline assumption at the C level is C89 compatibility. We should >> definitely aim to cope with the fact that extension module names >> *might* contain arbitrary Unicode some day, even if we don't >> officially support that yet. > > The main problem with the current scheme is that the name of the module > file must match the name of the exported symbol(s), and the module file > name is searched for by the imported name. So there is a direct link between the > (potentially non-ASCII) imported module name and the name of the > (ASCII-only) exported entry point symbols. And the exported symbol names > must be globally unique to support platforms with flat symbol namespaces. > Uncoupling the imported module name from either the file name or the symbol > name or even both isn't entirely obvious. > > I mean, ok, you could use a hash, or rather encode the name in punycode > (and replace "-" by "_" in the symbol name). That would at least keep it > somewhat readable for Latin-based scripts, while being fully backwards > compatible with what we have (that was the whole point of the punycode > design). Actually, why not just do that? :) We do have a punycode encoder in the standard library, so it should be possible to use that to determine a suitable hook name when given a non-ASCII extension module to load. For example: >>> "münchen".encode("punycode").replace(b"-", b"_") b'mnchen_3ya' That would make it possible for "import münchen" to work with an extension module by looking for a file named "münchen", but using "PyInit_mnchen_3ya", "PyModuleCreate_mnchen_3ya" and "PyModuleExec_mnchen_3ya" as the hooks to look for. 
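As a rough sketch of that name mangling (the helper name hook_names is made up for illustration, and the PyModuleCreate_/PyModuleExec_ prefixes are only the draft hook names being discussed in this thread, not a settled API):

```python
def hook_names(module_name):
    """Return candidate C hook names for a possibly non-ASCII module name.

    Hypothetical helper; the prefixes are the draft names from this thread.
    """
    # The punycode codec yields ASCII bytes. "-" is not valid in a C
    # identifier, so it is replaced with "_" as suggested above.
    suffix = module_name.encode("punycode").replace(b"-", b"_").decode("ascii")
    return [
        "PyInit_" + suffix,
        "PyModuleCreate_" + suffix,
        "PyModuleExec_" + suffix,
    ]


print(hook_names("münchen"))
```

Note that the codec also appends its delimiter to pure ASCII names, so a real scheme would presumably need to special-case those to stay compatible with existing PyInit_ symbols.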
It wouldn't be pretty to write by hand, but it should be fine for extension module generators like Cython and SWIG. >> I thought Brett actually implemented multi-module extension support a >> while back (which this PEP would then inherit), but I can't find any >> current evidence of that change, so either my recollection is wrong, >> or my search skills are failing me :) > > How should that work? Would it just try to look up all "PyInit_*" symbols > and call them? In arbitrary order? I think this may be what Petr was referring to when he said the current multi-module scheme only supported *loading* multiple modules from the same file, but not finding them. You need to use OS level symlinks or a similar mechanism to get the current finder to work in this situation (and as you say, it's not clear what a finder would look like in the absence of such filesystem level assistance - we can't afford to scan every shared library for possible symbol exports). >> It likely makes sense as a separate follow-on PEP for 3.6 though, as >> it's a further simplification of a certain way of using Create+Exec, >> and it's not clear just how you'd handle certain combinations of >> values in the current PyModuleDef struct. PEP 489 currently deals with >> that neatly by breaking out separate helper functions for initialising >> the docstring and the module globals function table that can be called >> from either Exec or Create as appropriate. > > While I agree that this can be done later, I also think that adding yet > another interface after the current change will only make it more difficult > for users to get started and get their stuff done. > > Exporting a struct does sound like the most generic and future proof > approach so far. If(f) we already assume that it will eventually become > useful, we shouldn't go for less. > > Do you have any specific problem with the PyModuleDef "value combinations" > in mind? 
I mean, we could always apply further restrictions on the content > of an exported PyModuleDef when used for this interface. Unexpected setups > should be easy to validate and reject by the import machinery, even if it's > just because it's "not currently supported". Being strict is easy here. The main thing that makes me wary is the redesign of the type definition system in PEP 384 to move away from exporting a static struct to declare new type objects. We did that because it made evolving the definition of type objects in an ABI compatible way very difficult. On the other hand, the main problem there was really the giant collection of slot pointers, which PyType_FromSpec replaced with a null-terminated array of slot definitions, as well as with the fact you were exporting the type struct directly. By contrast, PyModuleDef is already distinct from the actual internal layout of CPython module objects, and PyModuleDef.m_methods is already a null-terminated array of PyMethodDef entries. So, if we went down this path you *wouldn't* be able to completely customise module creation - you'd just have the option of exporting a PyModuleDef struct that the interpreter would then pass to PyModule_Create() on your behalf. If you wanted to replace the extension module with a different kind of object entirely, you'd swap it out of sys.modules in your Exec implementation, just as pure Python modules can replace themselves in module level code. The big advantage of this approach is that it ties PEP 489 directly back to the module state management enhancements in PEP 3121 - if you need more control than the new PyModule_SetDocString and PyModule_AddFunctions interfaces give you, then you need to export a PyModuleDef to define how the module gets created, including the ability to set m_size to -1 to indicate you're using C level module globals, or to > 0 to reserve additional space for module state. 
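For comparison, the pure Python form of that sys.modules swap can be sketched roughly as follows. Everything named here (CallableModule, load_demo, the demo_swap module name) is hypothetical and only illustrates the pattern; exec() stands in for the import system running the module body:

```python
import sys
import types


class CallableModule(types.ModuleType):
    """Hypothetical module subclass that can be called like a function."""

    def __call__(self, value):
        return value * 2


# Module-level code that replaces its own entry in sys.modules.
MODULE_BODY = """
import sys
sys.modules[__name__] = replacement_type(__name__)
"""


def load_demo(name="demo_swap"):
    # Roughly what the import system does: create a plain module, register
    # it in sys.modules, then execute the module body in its namespace.
    module = types.ModuleType(name)
    module.replacement_type = CallableModule  # injected for the demo only
    sys.modules[name] = module
    exec(MODULE_BODY, module.__dict__)
    # Importers see whatever is in sys.modules after execution finishes.
    return sys.modules[name]


loaded = load_demo()
print(type(loaded).__name__, loaded(21))
```

An extension module's Exec implementation would do the equivalent from C, which is why an implicit PyModule_Create() call does not rule out more exotic module objects.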
I think you've sold me on the idea - I'm not seeing any major downsides any more, and a lot of enhancements. The one refinement I would make is to allow "m_name" to be NULL to request that the import machinery fill it in automatically. >> With the current design of PEP 489, the idea is that if you don't >> really care about the module object, you just define Exec, and the >> interpreter gives you a standard Python level module object. All your >> global state still gets stored as Python objects, and you just get the >> "C execution model with the Python data model" development experience >> which is actually quite a nice environment to program in. >> >> However, if you want straightforward access to the C *data* model at >> runtime as well as its execution model, then you can define Create and >> use the existing PyModule_Create APIs, or (as a new feature) a custom >> module subclass or a completely custom type, to define how your module >> state is stored. >> >> That two level approach gives you all the same flexibility you have >> today by defining a custom Init hook (and more), but also lets you opt >> out of learning most of the details of the C data model if all you're >> really after is faster low level manipulation of data stored in Python >> objects. > > I'm ok with either, but I'd really like to avoid replacing the new scheme > by yet another new scheme in the future. Aye, you've persuaded me that we don't need to allow full customisation - an implicit call to PyModule_Create() should suffice. We're going to need to be careful in the interaction with PEP 384, though. Currently, that has the call to PyModule_Create() in the extension module with the PyMethodDef declaration, and passes in information regarding the expected CPython ABI. That allows the interpreter to process the MethodDef appropriately for the stable ABI.
I'm not sure how the details of that work internally myself, so Petr would need to check into the consequences before committing to change the PEP. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Mar 21 12:37:57 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Mar 2015 21:37:57 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <550D483D.2080007@gmail.com> References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> Message-ID: /me dredges up some dim dark history regarding the evolution of the "-m" implementation :) On 21 March 2015 at 20:30, Petr Viktorin wrote: > The idea of extending ModuleDef brings me back to the runpy problem. I don't > think it's actually necessary for "-m" to mean "exec the module in an object > named "__main__". Let's provide a slot for a main function, and have runpy > call that. > This would mean in Cython modules the "if __name__ == "__main__" hack won't > work, ever (as opposed to that being a bug this PEP can help fix). Is that > an acceptable loss? runpy lets you set "__name__" to whatever you like, so triggering "if __name__ ..." blocks isn't the problem - it's playing nice with other code that assumes __main__ is a true singleton module that lasts the entire lifecycle of the process (or at least from PyInit to PyFinalize) > (Maybe my next PEP should be letting Python modules define a > __main__function, and slowly deprecating the things runpy needs to do.) One thing about __main__ that makes it genuinely special is that it's the namespace that the interpreter drops you into as a result of passing -i at the command line or setting PYTHONINSPECT=1 in the environment (either beforehand or while the application is running). Earlier runpy based implementations of -m broke that by running the code in a separate namespace rather than in the actual builtin __main__ module, while later implementations fixed it by using the real __main__ to run the code. So if we wanted to allow -m to support execution of extension modules with module level state, then one key thing to do would be to add a mechanism to replace __main__ *for real*, such that PYTHONINSPECT dropped you into the replacement namespace, rather than the original builtin one. 
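The difference between setting "__name__" and actually being the __main__ module can be seen with runpy's public API. A small sketch, assuming a throwaway script written to a temporary directory; the file name demo_script.py and the variable ran_as_main are made up for the example, and this uses runpy.run_path rather than the code path behind the -m switch:

```python
import os
import runpy
import sys
import tempfile

# One-line script that records whether its "if __name__ ..." test would pass.
SOURCE = 'ran_as_main = (__name__ == "__main__")\n'


def run_with_name(run_name):
    """Run the script under the given __name__ and return its globals."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as f:
            f.write(SOURCE)
        return runpy.run_path(path, run_name=run_name)


original_main = sys.modules["__main__"]
as_main = run_with_name("__main__")
as_other = run_with_name("some_other_name")

# run_path reverts its changes to sys before returning, so the
# interpreter's real __main__ module is the same object as before.
same_main_afterwards = sys.modules["__main__"] is original_main
print(as_main["ran_as_main"], as_other["ran_as_main"], same_main_afterwards)
```

Triggering the "__main__" block is just a matter of the run_name argument; what the code never gets is the identity of the real __main__ module, which is the harder problem described here.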
Unfortunately, you then run into the problem that various package __init__ methods may have seen the original __main__ before runpy got a chance to swap it out - there's certainly code out there in the wild that assumes __main__ is reliably a true singleton module, one that never changes identity while the interpreter is capable of running Python bytecode. That's part of why it's the only module where its __spec__ may change depending on the phase of bootstrapping you're at - it starts out advertising itself as a builtin module, but that may change later on in the startup sequence depending on exactly what you invoked as __main__. (My vague recollection is that the largest number of states it can run through during any given startup sequence is 3, but the total number of different possible states is on the order of 6 or 7. It's been a while though, so I may be misremembering both numbers) This "__main__ is __main__" assumption is one I've never been game to even consider breaking - it's been a feature of Python since day 1, and it seems to me that the *kinds* of breakage people would see if they were relying on it and didn't know it would be close to incomprehensible. There's a reason I went and wrote PEP 432 after making the changes necessary to get the interpreter startup sequence to play nice with importlib in 3.3. Parts of it are some of the oldest code in CPython, it's all painfully hard to test properly, and it gets hard to tell the difference between "feature people are relying on" and "quirk of the current implementation we can safely change" :P Getting a fresh set of eyes on that code would be wonderful though - one of the reasons PEP 432 stagnated (aside from my getting busy with other things) was not having anyone else familiar enough with the entire startup sequence to really argue with me about the detailed design. (And at this point I'm rusty enough on it myself that getting back into it would be a voyage of rediscovery) Regards, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Sat Mar 21 13:30:34 2015 From: encukou at gmail.com (Petr Viktorin) Date: Sat, 21 Mar 2015 13:30:34 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> Message-ID: <550D646A.2070003@gmail.com> On 03/21/2015 12:37 PM, Nick Coghlan wrote: > /me dredges up some dim dark history regarding the evolution of the > "-m" implementation :) > > On 21 March 2015 at 20:30, Petr Viktorin wrote: >> The idea of extending ModuleDef brings me back to the runpy problem. I don't >> think it's actually necessary for "-m" to mean "exec the module in an object >> named "__main__". Let's provide a slot for a main function, and have runpy >> call that. >> This would mean in Cython modules the "if __name__ == "__main__" hack won't >> work, ever (as opposed to that being a bug this PEP can help fix). Is that >> an acceptable loss? > > runpy lets you set "__name__" to whatever you like, so triggering "if > __name__ ..." blocks isn't the problem - it's playing nice with other > code that assumes __main__ is a true singleton module that lasts the > entire lifecycle of the process (or at least from PyInit to > PyFinalize) > >> (Maybe my next PEP should be letting Python modules define a >> __main__function, and slowly deprecating the things runpy needs to do.) > > One thing about __main__ that makes it genuinely special is that it's > the namespace that the interpreter drops you into as a result of > passing -i at the command line or setting PYTHONINSPECT=1 in the > environment (either beforehand or while the application is running). > > Earlier runpy based implementations of -m broke that by running the > code in a separate namespace rather than in the actual builtin > __main__ module, while later implementations fixed it by using the > real __main__ to run the code. I see. Thanks for the write-up. It seems that if runpy knew the ModuleDef for __main__ (and the ModuleDef didn't define a custom Create slot), it could look at m_size and allocate md_state appropriately. Then the module object itself wouldn't change, but the C storage would be available when Exec is called. 
Does anything that would prevent this come to mind? > > So if we wanted to allow -m to support execution of extension modules > with module level state, then one key thing to do would be to add a > mechanism to replace __main__ *for real*, such that PYTHONINSPECT > dropped you into the replacement namespace, rather than the original > builtin one. > > Unfortunately, you then run into the problem that various package > __init__ methods may have seen the original __main__ before runpy got > a chance to swap it out - there's certainly code out there in the wild > that assumes __main__ is reliably a true singleton module, one that > never changes identity while the interpreter is capable of running > Python bytecode. That's part of why it's the only module where its > __spec__ may change depending on the phase of bootstrapping you're at > - it starts out advertising itself as a builtin module, but that may > change later on in the startup sequence depending on exactly what you > invoked as __main__. (My vague recollection is that the largest number > of states it can run through during any given startup sequence is 3, > but the total number of different possible states is on the order of 6 > or 7. It's been a while though, so I may be misremembering both > numbers) > > This "__main__ is __main__" assumption is one I've never been game to > even consider breaking - it's been a feature of Python since day 1, > and it seems to me that the *kinds* of breakage people would see if > they were relying on it and didn't know it would be close to > incomprehensible. > > There's a reason I went and wrote PEP 432 after making the changes > necessary to get the interpreter startup sequence to play nice with > importlib in 3.3. 
Parts of it are some of the oldest code in CPython, > it's all painfully hard to test properly, and it gets hard to tell the > difference between "feature people are relying on" and "quirk of the > current implementation we can safely change" :P > > Getting a fresh set of eyes on that code would be wonderful though - > one of the reasons PEP 432 stagnated (aside from my getting busy with > other things) was not having anyone else familiar enough with the > entire startup sequence to really argue with me about the detailed > design. (And at this point I'm rusty enough on it myself that getting > back into it would be a voyage of rediscovery) > > Regards, > Nick. > From ncoghlan at gmail.com Sat Mar 21 14:55:28 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 21 Mar 2015 23:55:28 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <550D646A.2070003@gmail.com> References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <550D646A.2070003@gmail.com> Message-ID: On 21 March 2015 at 22:30, Petr Viktorin wrote: > On 03/21/2015 12:37 PM, Nick Coghlan wrote: >> Earlier runpy based implementations of -m broke that by running the >> code in a separate namespace rather than in the actual builtin >> __main__ module, while later implementations fixed it by using the >> real __main__ to run the code. > > > I see. Thanks for the write-up. > It seems that if runpy knew the ModuleDef for __main__ (and the ModuleDef > didn't define a custom Create slot), it could look at m_size and allocate > md_state appropriately. Then the module object itself wouldn't change, but > the C storage would be available when Exec is called. Ah, true, I hadn't thought about that - I'd only thought about this in the context of the earlier completely custom module creation design. Restricting things to PEP 3121 module state helps a great deal. > Does anything that would prevent this come to mind? It would likely require an additional low level API helper somewhere, as the only way I can see it working without completely breaking the importlib abstraction is for runpy to call create_module on the loader, get the custom module back, and then have a way to say "make __main__ look like this", which will either succeed (in which case runpy proceeds to calling exec_module on the real __main__ instead of the module returned from create_module), or it fails (in which case runpy bails out explaining that the requested module can't be run as the main module) runpy still needs to be updated for PEP 451 in general though :( Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Sat Mar 21 16:13:48 2015 From: encukou at gmail.com (Petr Viktorin) Date: Sat, 21 Mar 2015 16:13:48 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <550D646A.2070003@gmail.com> Message-ID: <550D8AAC.3090405@gmail.com> On 03/21/2015 02:55 PM, Nick Coghlan wrote: > On 21 March 2015 at 22:30, Petr Viktorin wrote: >> On 03/21/2015 12:37 PM, Nick Coghlan wrote: >>> Earlier runpy based implementations of -m broke that by running the >>> code in a separate namespace rather than in the actual builtin >>> __main__ module, while later implementations fixed it by using the >>> real __main__ to run the code. >> >> >> I see. Thanks for the write-up. >> It seems that if runpy knew the ModuleDef for __main__ (and the ModuleDef >> didn't define a custom Create slot), it could look at m_size and allocate >> md_state appropriately. Then the module object itself wouldn't change, but >> the C storage would be available when Exec is called. > > Ah, true, I hadn't thought about that - I'd only thought about this in > the context of the earlier completely custom module creation design. > Restricting things to PEP 3121 module state helps a great deal. > >> Does anything that would prevent this come to mind? > > It would likely require an additional low level API helper somewhere, > as the only way I can see it working without completely breaking the > importlib abstraction is for runpy to call create_module on the > loader, get the custom module back, and then have a way to say "make > __main__ look like this", which will either succeed (in which case > runpy proceeds to calling exec_module on the real __main__ instead of > the module returned from create_module), or it fails (in which case > runpy bails out explaining that the requested module can't be run as > the main module) If extensions export a list of PyModuleDef, and there's a way to get at the defs, the create_module step can be skipped. But a special low-level helper is needed either way. > runpy still needs to be updated for PEP 451 in general though :( Yes. 
Given all the special cases, it's starting to look like this will need to happen when this PEP is implemented, to make sure things really work. From stefan_ml at behnel.de Sat Mar 21 18:38:40 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 21 Mar 2015 18:38:40 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <550D483D.2080007@gmail.com> References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> Message-ID: Petr Viktorin wrote on 21.03.2015 at 11:30: > It would be nice to extend runpy to handle Create+Exec modules. If this can > be pulled off, there'd be no need for Exec-only modules except the > convenience. > > * module reloading is useless for extension modules - a changed version > can't be read from the disk, and correct reload behavior is another > corner case for authors to think about I think even shared library reloading could be achieved by using a filename scheme like "modulename-HASH.so" with a SHA hash of the source file or so, if the original module name is used to run the right module init function(s). The files would pile up in memory, though (there's usually no "dynamic unlinking"), so it's not a feature for production. I generally agree that there is little enough of a use case for reloading that it can safely be ignored. > One thing I'm not clear about: what are the advantages of a module subclass > over a normal module with m_size>0? Properties and methods. In fact, you should rather ask why module objects have to be special in the first place. My initial idea was to implement *only* an extension type in extension modules, and have the library loader instantiate that. It would simply pass the module spec as constructor argument. However, Nick convinced me at the time that that's a) too inflexible and b) too cumbersome for manually written code. That eventually brought up the idea of splitting the initialisation into Create+Exec. >> I thought Brett actually implemented multi-module extension support a >> while back (which this PEP would then inherit), but I can't find any >> current evidence of that change, so either my recollection is wrong, >> or my search skills are failing me :) > > It's there, grep issue16421. Thanks. I didn't know about it. > Separating Create and Exec has these effects: > - Allowing you to implement just one and leave the rest to default > machinery. This is good.
> - Allowing some time to pass between Create and Exec is called. This might > be useful for lazy loading, I guess. > - Allowing the loader or third-party code to modify the object between > Create and Exec is called. This is dangerous (for consenting adults who > don't mind the occasional segfault). Depends on what they do with the object. Setting attributes on it should be ok, for example. In fact, I would like to leave it to CPython to set attributes like "__name__" and "__file__" on it, because that simplifies the implementation of a Create function. From time to time, the module interface is extended with new attributes, so setting them externally avoids the need to adapt the user code each time. However, an API helper function could be provided that copies attributes from the module spec to the 'module' object. Calling that is simple enough, and it would leave the responsibility for the evolution of the "standard module API" in CPython. > - Allowing Exec to be called multiple times after Create, i.e. module > reloading. I don't think there is a use case (and for module-specific cases > it can be done in a separately exported function). > - Allowing Exec without the corresponding Create, i.e. loading into > arbitrary objects. This is cool, and it mimics what source modules can do, > but I'm less and less convinced that it's actually useful. > > It's a lot to think about if you want to design a module that behaves > correctly, and for some combinations it's not clear what "correctly" means. I agree. I think we can leave out these two "features". >> The API design for defining types through the stable ABI >> (https://www.python.org/dev/peps/pep-0384/#type-objects), which was >> designed with the benefit of years of experience with the old >> approach, is much nicer, as the NULL-terminated list of named slots >> lets you only worry about the slots you care about, and the >> interpreter takes care of everything else. 
> > Well, if we end up needing to extend PyModuleDef, let's use slots. That means we have to enable support for that now. And we have to integrate it with the way to provide the PyModuleDef in the first place (note that extending PyModuleDef itself is not an option due to the stable ABI). Meaning, users who don't want to provide a Create function will still have to deal with the (empty) slots, and everyone else will currently have to provide a one-slot "create" entry. I'm not saying it's a bad idea, but it might not be a good one either. > Another possible extension is hooks for resources. Imagine using Cython > like zipapp, to pack an entire app including extensions into one file. This can already be done. Note that there is no actual need for a native module to be called by "python -m". You can also just add a C main() function and start up an embedded CPython runtime in it. Cython can already generate this main() function for you. However, being able to "python -m" a native module (or package) would be nice for consistency and would also support running it from the PYTHONPATH, which is a major convenience feature. >> With the current design of PEP 489, the idea is that if you don't >> really care about the module object, you just define Exec, and the >> interpreter gives you a standard Python level module object. All your >> global state still gets stored as Python objects, and you just get the >> "C execution model with the Python data model" development experience >> which is actually quite a nice environment to program in. >> >> However, if you want straightforward access to the C *data* model at >> runtime as well as its execution model, then you can define Create and >> use the existing PyModule_Create APIs, or (as a new feature) a custom >> module subclass or a completely custom type, to define how your module >> state is stored.
> > The problem is that to add C data, you'd either need to define a whole > extra hook, or jump through inefficient PyCapsule hoops on every access. I > worry that module authors will just take the path of least resistance, and > use static data. I think it's substantially better to say "use > sizeof(mydata) instead of 0, and use this fast function/macro to get at > your data". Yes, there should be a fast default way to do that. Otherwise, people will just invent their own. The advantage of subinterpreter support and module finalisation isn't immediately obvious; the advantage of fast access to global state definitely is. >> That two level approach gives you all the same flexibility you have >> today by defining a custom Init hook (and more), but also lets you opt >> out of learning most of the details of the C data model if all you're >> really after is faster low level manipulation of data stored in Python >> objects. > > A module def array additionally gives: > - support for non-ASCII module names > - a catalog of the modules the extension contains > but you can't use custom module subclasses -- unless a create slot is added > to the module def. (Or you can replace the sys.modules entry -- I believe > the overhead of a wasted empty module object is negligible.) Yes, I guess it would be. However, the replacement must happen before other code might access the module (e.g. by importing it), i.e. right after putting it into sys.modules, at the very start of the Exec step. It does feel like a hack, though, to design an interface that says "here's your module, throw it away if you like, but make sure to clean up what I left behind"... Stefan From encukou at gmail.com Sat Mar 21 19:37:20 2015 From: encukou at gmail.com (Petr Viktorin) Date: Sat, 21 Mar 2015 19:37:20 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> Message-ID: <550DBA60.90108@gmail.com> On 03/21/2015 06:38 PM, Stefan Behnel wrote: > Petr Viktorin wrote on 21.03.2015 at 11:30: >> It would be nice to extend runpy to handle Create+Exec modules. If this can >> be pulled off, there'd be no need for Exec-only modules except the >> convenience. >> >> * module reloading is useless for extension modules - a changed version >> can't be read from the disk, and correct reload behavior is another >> corner case for authors to think about > > I think even shared library reloading could be achieved by using a filename > scheme like "modulename-HASH.so" with a SHA hash of the source file or so, > if the original module name is used to run the right module init function(s). > > The files would pile up in memory, though (there's usually no "dynamic > unlinking"), so it's not a feature for production. I generally agree that > there is little enough of a use case for reloading that it can safely be > ignored. I think this is something to build on top of what Python will provide. The "modulename-HASH.so" file wouldn't be easily locatable, so you'd need a "modulename.py" or "modulename.so" in front of it anyway, and that could just proxy to the real module (which stays non-reloadable). Implementation is up to any interested party :) >> One thing I'm not clear about: what are the advantages of a module subclass >> over a normal module with m_size>0? > > Properties and methods. In fact, you should rather ask why module objects > have to be special in the first place. Well, methods are already part of PyModuleDef, so that leaves properties. Module objects are special mainly because they need space for C state; otherwise any object could be used (as in the current PEP). > My initial idea was to implement *only* an extension type in extension > modules, and have the library loader instantiate that. It would simply pass > the module spec as constructor argument.
However, Nick convinced me at the > time that that's a) too inflexible and b) too cumbersome for manually > written code. That eventually brought up the idea of splitting the > initialisation into Create+Exec. And after that, the current PEP is meant to discourage using Create as much as possible. But I see how it's useful to provide it. >> Separating Create and Exec has these effects: >> - Allowing you to implement just one and leave the rest to default >> machinery. This is good. >> - Allowing some time to pass between Create and Exec is called. This might >> be useful for lazy loading, I guess. >> - Allowing the loader or third-party code to modify the object between >> Create and Exec is called. This is dangerous (for consenting adults who >> don't mind the occasional segfault). > > Depends on what they do with the object. Setting attributes on it should be > ok, for example. In fact, I would like to leave it to CPython to set > attributes like "__name__" and "__file__" on it, because that simplifies > the implementation of a Create function. From time to time, the module > interface is extended with new attributes, so setting them externally > avoids the need to adapt the user code each time. I agree here, and if your module subclass doesn't support setting dunder attributes then you need a custom loader for it. > However, an API helper function could be provided that copies attributes > from the module spec to the 'module' object. Calling that is simple enough, > and it would leave the responsibility for the evolution of the "standard > module API" in CPython. The import machinery does that between create and exec; I don't think an extra helper is necessary. >> - Allowing Exec to be called multiple times after Create, i.e. module >> reloading. I don't think there is a use case (and for module-specific cases >> it can be done in a separately exported function). >> - Allowing Exec without the corresponding Create, i.e. loading into >> arbitrary objects. 
This is cool, and it mimics what source modules can do, >> but I'm less and less convinced that it's actually useful. >> >> It's a lot to think about if you want to design a module that behaves >> correctly, and for some combinations it's not clear what "correctly" means. > > I agree. I think we can leave out these two "features". > > >>> The API design for defining types through the stable ABI >>> (https://www.python.org/dev/peps/pep-0384/#type-objects), which was >>> designed with the benefit of years of experience with the old >>> approach, is much nicer, as the NULL-terminated list of named slots >>> lets you only worry about the slots you care about, and the >>> interpreter takes care of everything else. >> >> Well, if we end up needing to extend PyModuleDef, let's use slots. > > That means we have to enable support for that now. And we have to integrate > it with the way to provide the PyModuleDef in the first place (note that > extending PyModuleDef itself is not an option due to the stable ABI). > Meaning, users who don't want to provide a Create function will still have > to deal with the (empty) slots, and everyone else will currently have to > provide a one-slot "create" entry. > > I'm not saying it's a bad idea, but it might not be a good one either. I meant slots as in PEP 384's PyType_Slot - there'd be no empty slots to deal with; you'd just set the ones you use. It does mean deprecating PyModuleDef, though. >>> That two level approach gives you all the same flexibility you have >>> today by defining a custom Init hook (and more), but also lets you opt >>> out of learning most of the details of the C data model if all you're >>> really after is faster low level manipulation of data stored in Python >>> objects. >> >> A module def array additionally gives: >> - support for non-ASCII module names >> - a catalog of the modules the extension contains >> but you can't use custom module subclasses -- unless a create slot is added >> to the module def.
(Or you can replace the sys.modules entry -- I believe >> the overhead of a wasted empty module object is negligible.) > > Yes, I guess it would be. However, the replacement must happen before other > code might access the module (e.g. by importing it), i.e. right after > putting it into sys.modules, at the very start of the Exec step. > > It does feel like a hack, though, to design an interface that says "here's > your module, throw it away if you like, but make sure to clean up what I > left behind"... Yes, it is a hack (and to be honest I think supporting properties on modules should feel hacky). Though a Create slot on module def would avoid the need for such a hack. From encukou at gmail.com Tue Mar 24 17:34:34 2015 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 24 Mar 2015 17:34:34 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> Message-ID: <5511921A.5070303@gmail.com>

I'll share my notes on an API with PEP 384-style slots, before attempting to write it out in PEP language.

I struggled to find a good name for the "PyType_Spec" equivalent, since ModuleDef and ModuleSpec are both taken, but then I realized that, if the docstring is put in a slot, I just need an array of slots... Does the following look reasonable?

in moduleobject.h:

    typedef struct PyModule_Slot {
        int slot;
        void *pfunc;
    } PyModule_Slot;

    typedef struct PyModule_StateDef {
        int size;
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModule_StateDef;

    #define Py_m_doc 1
    #define Py_m_create 2
    #define Py_m_methods 3
    #define Py_m_statedef 4
    #define Py_m_exec 5

in the extension:

    static PyMethodDef spam_methods[] = {
        {"demo", (PyCFunction)spam_demo, ...},
        {NULL, NULL}
    };

    static PyModule_StateDef spam_statedef[] = {
        sizeof(spam_state_t),
        spam_state_traverse,
        spam_state_clear,
        spam_state_free
        /* any of those three can be NULL if not needed */
    };

    static PyModule_Slot spam_slots[] = {
        {Py_m_doc, PyDoc_STR("A spammy module")},
        {Py_m_methods, spam_methods},
        {Py_m_statedef, spam_statedef},
        {Py_m_exec, spam_exec},
        {0, NULL}
    };

    PyModuleDesc *PyModuleInit_spam(void) {
        return spam_slots;
    }

There are both Create and Exec slots, among others - anyone can choose what they need. If you set the Py_m_create slot, then you can't also set Py_m_statedef. All the other items are honored (including name and doc, which will be set by the module machinery - but name might not match).

The exec method is tied to the module; it's only called on modules created from the description (or ones that look as if they were, in runpy's case). It is called only once for each module; reload()ing an extension module will only reset import-related attributes (as it does now).

If you don't set Py_m_create, you'll be able to run the module with python -m.
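The slot rules in these notes can be modelled in a few lines of Python. This is only a rough sketch of the proposal, not an implementation of it: collect_slots and the choice of ValueError are assumptions made for the example, the slot array is represented as a list of tuples, and the numeric IDs simply mirror the #define values above.

```python
# Slot IDs mirroring the #define values in the notes.
PY_M_DOC, PY_M_CREATE, PY_M_METHODS, PY_M_STATEDEF, PY_M_EXEC = 1, 2, 3, 4, 5


def collect_slots(slot_array):
    """Gather (slot_id, value) pairs up to the {0, NULL}-style sentinel.

    Hypothetical helper: models how a loader might read the slot array.
    """
    found = {}
    for slot_id, value in slot_array:
        if slot_id == 0:  # sentinel entry terminates the array
            break
        found[slot_id] = value
    # Rule from the notes: a custom Create slot excludes a state definition.
    if PY_M_CREATE in found and PY_M_STATEDEF in found:
        raise ValueError("Py_m_create cannot be combined with Py_m_statedef")
    return found


# Python stand-in for the spam_slots array; values are placeholders.
spam_slots = [
    (PY_M_DOC, "A spammy module"),
    (PY_M_METHODS, ["demo"]),
    (PY_M_EXEC, "spam_exec"),
    (0, None),
]
spam = collect_slots(spam_slots)

# Per the notes, a module that sets no Create slot could be run with -m.
runnable_with_dash_m = PY_M_CREATE not in spam
print(sorted(spam), runnable_with_dash_m)
```

Only the slots that are actually set appear in the result, which is the property that makes a sentinel-terminated array easier to extend than a fixed struct.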
For non-ASCII module names: the X in PyModuleGetDesc_X will be in punycode (s/-/_/), PyModuleDesc.name in UTF-8, and filename in the filesystem encoding. I've thought about supporting multiple modules per extension, but I don't see a clear way to do that. The standard ModuleSpec machinery assumes one module per file, and it's not straightforward to get around that. To load more modules from an extension, you'd need a custom finder or loader anyway. So I'm going to implement helpers needed to load a module given an arbitrary PyModuleDesc, and leave implementing multi-mod support to people who need it for now. So, an "inittab" is out for now. Perhaps a slot for automatically adding classes (from array of PyType_Spec) would help PyType_Spec adoption. And then a slot adding string/int/... constants from arrays of name/value would mean most modules wouldn't need an exec function. And an "inittab" slot should be possible for package-style extensions. I'll leave these ideas out for now, but possibilities for extending are there. On 03/21/2015 06:38 PM, Stefan Behnel wrote: > Petr Viktorin schrieb am 21.03.2015 um 11:30: >> It would be nice to extend runpy to handle Create+Exec modules. If this can >> be pulled off, there'd be no need for Exec-only modules except the >> convenience. >> >> * module reloading is useless for extension modules ? a changed version >> version can't be read from the disk, and correct reload behavior is another >> corner case for authors to think about > > I think even shared library reloading could be achieved by using a filename > scheme like "modulename-HASH.so" with a SHA hash of the source file or so, > if the original module name is used to run the right module init function(s). > > The files would pile up in memory, though (there's usually no "dynamic > unlinking"), so it's not a feature for production. I generally agree that > there is little enough of a use case for reloading that it can safely be > ignored. 
> > >> One thing I'm not clear about: what are the advantages of a module subclass >> over a normal module with m_size>0? > > Properties and methods. In fact, you should rather ask why module objects > have to be special in the first place. > > My initial idea was to implement *only* an extension type in extension > modules, and have the library loader instantiate that. It would simply pass > the module spec as constructor argument. However, Nick convinced me at the > time that that's a) too inflexible and b) too cumbersome for manually > written code. That eventually brought up the idea of splitting the > initialisation into Create+Exec. > > >>> I thought Brett actually implemented multi-module extension support a >>> while back (which this PEP would then inherit), but I can't find any >>> current evidence of that change, so either my recollection is wrong, >>> or my search skills are failing me :) >> >> It's there, grep issue16421. > > Thanks. I didn't know about it. > > >> Separating Create and Exec has these effects: >> - Allowing you to implement just one and leave the rest to default >> machinery. This is good. >> - Allowing some time to pass between Create and Exec is called. This might >> be useful for lazy loading, I guess. >> - Allowing the loader or third-party code to modify the object between >> Create and Exec is called. This is dangerous (for consenting adults who >> don't mind the occasional segfault). > > Depends on what they do with the object. Setting attributes on it should be > ok, for example. In fact, I would like to leave it to CPython to set > attributes like "__name__" and "__file__" on it, because that simplifies > the implementation of a Create function. From time to time, the module > interface is extended with new attributes, so setting them externally > avoids the need to adapt the user code each time. > > However, an API helper function could be provided that copies attributes > from the module spec to the 'module' object. 
Calling that is simple enough, > and it would leave the responsibility for the evolution of the "standard > module API" in CPython. > > >> - Allowing Exec to be called multiple times after Create, i.e. module >> reloading. I don't think there is a use case (and for module-specific cases >> it can be done in a separately exported function). >> - Allowing Exec without the corresponding Create, i.e. loading into >> arbitrary objects. This is cool, and it mimics what source modules can do, >> but I'm less and less convinced that it's actually useful. >> >> It's a lot to think about if you want to design a module that behaves >> correctly, and for some combinations it's not clear what "correctly" means. > > I agree. I think we can leave out these two "features". > > >>> The API design for defining types through the stable ABI >>> (https://www.python.org/dev/peps/pep-0384/#type-objects), which was >>> designed with the benefit of years of experience with the old >>> approach, is much nicer, as the NULL-terminated list of named slots >>> lets you only worry about the slots you care about, and the >>> interpreter takes care of everything else. >> >> Well, if we end up needing to extend PyModuleDef, let's use slots. > > That means we have to enable support for that now. And we have to integrate > it with the way to provide the PyModuleDef in the first place (note that > extending PyModuleDef itself is not an option due to the stable ABI). > Meaning, users who don't want to provide a Create function will still have > to deal with the (empty) slots, and everyone else will currently have to > provide a one-slot "create" entry. > > I'm not saying it's a bad idea, but it might not be a good one either. > > >> Another possible extension is hooks for resources. Imagine using Cython >> like zipapp, to pack an entire app including extensions into one file. > > This can already be done. Note that there is no actual need for a native > module to be called by "python -m". 
You can also just add a C main() > function and start up an embedded CPython runtime in it. Cython can already > generate this main() function for you. > > However, being able to "python -m" a native module (or package) would be > nice for consistency and also support running it from the PYTHONPATH, which > is a major convenience feature. > > >>> With the current design of PEP 489, the idea is that if you don't >>> really care about the module object, you just define Exec, and the >>> interpreter gives you a standard Python level module object. All your >>> global state still gets stored as Python objects, and you just get the >>> "C execution model with the Python data model" development experience >>> which is actually quite a nice environment to program in. >>> >>> However, if you want straighforward access to the C *data* model at >>> runtime as well as its execution model, then you can define Create and >>> use the existing PyModule_Create APIs, or (as a new feature) a custom >>> module subclass or a completely custom type, to define how your module >>> state is stored. >> >> The problem is that to add C data, you'd either need to define an whole >> extra hook, or jump through inefficient PyCapsule hoops on every access. I >> worry that module authors will just take the path of least resistance, and >> use static data. I think it's substantially better to say "use >> sizeof(mydata) instead of 0, and use this fast function/macro to get at >> your data". > > Yes, there should be a fast default way to do that. Otherwise, people will > just invent their own. The advantage of subinterpreter support and module > finalisation isn't immediately obvious, the advantage of fast access to > global state definitely is. 
> > >>> That two level approach gives you all the same flexibility you have >>> today by defining a custom Init hook (and more), but also lets you opt >>> out of learning most of the details of the C data model if all you're >>> really after is faster low level manipulation of data stored in Python >>> objects. >> >> A module def array additionally gives: >> - support for non-ASCII module names >> - a catalog of the modules the extension contains >> but you can't use custom module subclasses -- unless a create slot is added >> to the module def. (Or you can replace the sys.modules entry -- I believe >> the overhead of a wasted empty module object is negligible.) > > Yes, I guess it would be. However, the replacement must happen before other > code might access the module (e.g. by importing it), i.e. right after > putting it into sys.modules, at the very start of the Exec step. > > It does seem feel a hack, though, to design an interface that says "here's > your module, throw it away if you like, but make sure to clean up what I > left behind"... > > Stefan > From ncoghlan at gmail.com Wed Mar 25 13:11:44 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 25 Mar 2015 22:11:44 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <5511921A.5070303@gmail.com> References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> Message-ID: On 25 March 2015 at 02:34, Petr Viktorin wrote: > I'll share my notes on an API with PEP 384-style slots, before attempting to > write it out in PEP language. > > I struggled to find a good name for the "PyType_Spec" equivalent, since > ModuleDef and ModuleSpec are both taken, but then I realized that, if the > docstring is put in a slot, I just need an array of slots... Because we're looking for an exported symbol, I think there's value in having a more clearly defined top level structure rather than just an array. PyModule_Export or PyModule_Declare come to mind, with a preference for the former (since we're exporting a module definition for CPython to import) typedef struct PyModule_Export { const char* doc; PyModule_Slot *slots; /* terminated by slot==0. */ } PyModule_Export; I prefer this mostly because it's easier to document and hence to understand - you can cover the process of creating the overall module in relation to PyModule_Export, while PyModule_Slot docs can focus on defining the *content* of the module. Having the docstring as the only expected field helps suggest that modules should at least define that much. Unlike types, we can leave the name out by default, as it will usually be implied by the file name (as is the case with Python modules). You've sold me on the idea of using a slots based API, though. However, the PEP's going to need to spend a bit more time on how to map this to the existing PyModule_Create API for modules that also want to support older versions of Python, while using the new system on 3.5+. > Does the following look reasonable? > > in moduleobject.h: > > typedef struct PyModule_Slot{ > int slot; > void *pfunc; > } PyModuleDesc_Slot; "pfunc" doesn't fit in this case, so I think a more generic field name like "value" would be needed. 
> typedef struct PyModule_StateDef { > int size; > traverseproc m_traverse; > inquiry m_clear; > freefunc m_free; > } > > #define Py_m_doc 1 > #define Py_m_create 2 > #define Py_m_methods 3 > #define Py_m_statedef 4 > #define Py_m_exec 5 Py_mod_*, perhaps? I'm also wondering if "exec" should move to be an "m_init" method in PyModule_StateDef, rather than an independent slot, replacing it with a PyType_Spec "types" slot as suggested below. > in the extension: > > static PyMethodDef spam_methods[] = { > {"demo", (PyCFunction)spam_demo, ...}, > {NULL, NULL} > }; > > static PyModule_StateDef spam_statedef[] = { > sizeof(spam_state_t), > spam_state_traverse, > spam_state_clear, > spam_state_free > /* any of those three can be NULL if not needed */ > } > > static PyModule_Slot spam_slots[] = { > {Py_m_methods, spam_methods}, > {Py_m_statedef, spam_statedef}, > {Py_m_exec, spam_exec}, > {0, NULL} > } PyModule_Export PyModule_Export_spam = { PyDoc_STR("A spammy module"), spam_slots } > > PyModuleDesc *PyModuleInit_spam { > return spam_slots; > } I suspect this is a holdover from an earlier iteration of the design. > > There is both a Create and Exec slot, among others ? anyone can choose what > they need. > > If you set the Py_m_create slot, then you can't also set Py_m_state. All the > other items are honored (including name and doc, which will be set by the > module machinery ? but name might not match). > > The exec method is tied to the module; it's only called on modules created > from the description (or ones that look as if they were, in runpy's case). > It is called only once for each module; reload()ing an extension module will > only reset import-related attributes (as it does now). That sounds reasonable to me. > If you don't set Py_m_create, you'll be able to run the module with python > -m. > > > For non-ASCII module names: the X in PyModuleGetDesc_X will be in punycode > (s/-/_/), PyModuleDesc.name in UTF-8, and filename in the filesystem > encoding. 
Adjusted appropriately for exporting a PyModule_Export struct, agreed. > I've thought about supporting multiple modules per extension, but I don't > see a clear way to do that. The standard ModuleSpec machinery assumes one > module per file, and it's not straightforward to get around that. To load > more modules from an extension, you'd need a custom finder or loader anyway. > So I'm going to implement helpers needed to load a module given an arbitrary > PyModuleDesc, and leave implementing multi-mod support to people who need it > for now. > So, an "inittab" is out for now. Symlinks should work for making the same binary file importable under different names in simple cases, and more complex cases are likely to need a custom finder and loader anyway. > Perhaps a slot for automatically adding classes (from array of PyType_Spec) > would help PyType_Spec adoption. Perhaps this one would be worth including in the initial proposal to help make it clear why we decided the slots based design was worthwhile? > And then a slot adding string/int/... constants from arrays of name/value > would mean most modules wouldn't need an exec function. For those cases, I think the module internally is likely to want fast C level access to the relevant constants - this note is the one that inspired my suggestion of moving the "exec" link into the statedef slot. > And an "inittab" slot should be possible for package-style extensions. > I'll leave these ideas out for now, but possibilities for extending are > there. If I recall correctly, there's actually a longstanding RFE somewhere for builtin packages that this change may eventually be able to help with. It was something embedding the full Qt libraries I think. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Wed Mar 25 14:36:42 2015 From: encukou at gmail.com (Petr Viktorin) Date: Wed, 25 Mar 2015 14:36:42 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> Message-ID: <5512B9EA.6000002@gmail.com> On 03/25/2015 01:11 PM, Nick Coghlan wrote: > On 25 March 2015 at 02:34, Petr Viktorin wrote: >> I'll share my notes on an API with PEP 384-style slots, before attempting to >> write it out in PEP language. >> >> I struggled to find a good name for the "PyType_Spec" equivalent, since >> ModuleDef and ModuleSpec are both taken, but then I realized that, if the >> docstring is put in a slot, I just need an array of slots... > > Because we're looking for an exported symbol, I think there's value in > having a more clearly defined top level structure rather than just an > array. OK. I'm not sure on cross-platform support of data rather than functions exported from shared libraries, so kept the hook as a function. Perhaps I'm being too paranoid here? > PyModule_Export or PyModule_Declare come to mind, with a preference > for the former (since we're exporting a module definition for CPython > to import) That's the name I was looking for, thanks! > typedef struct PyModule_Export { > const char* doc; > PyModule_Slot *slots; /* terminated by slot==0. */ > } PyModule_Export; > > I prefer this mostly because it's easier to document and hence to > understand - you can cover the process of creating the overall module > in relation to PyModule_Export, while PyModule_Slot docs can focus on > defining the *content* of the module. I don't think this is a problem. I can document creating with the PyModuleExport_ symbol, and then when say that it's an array of PyModule_Slot in the appropriate section. > Having the docstring as the only expected field helps suggest that > modules should at least define that much. Unlike types, we can leave > the name out by default, as it will usually be implied by the file > name (as is the case with Python modules). The downside is that it's additional boilerplate. PyType_Spec has a bunch of mandatory int fields, but here everything is a pointer. 
Also, does the docstring always need to be specified (as a constant)? I think some internal modules are fine without a docstring (see _hashlib, _multiprocessing, _elementtree, _sqlite3, ...). But if you're convinced a separate PyModule_Export structure is better, I won't fight. > You've sold me on the idea of using a slots based API, though. > However, the PEP's going to need to spend a bit more time on how to > map this to the existing PyModule_Create API for modules that also > want to support older versions of Python, while using the new system > on 3.5+. Agreed. >> Does the following look reasonable? >> >> in moduleobject.h: >> >> typedef struct PyModule_Slot{ >> int slot; >> void *pfunc; >> } PyModuleDesc_Slot; > > "pfunc" doesn't fit in this case, so I think a more generic field name > like "value" would be needed. > >> typedef struct PyModule_StateDef { >> int size; >> traverseproc m_traverse; >> inquiry m_clear; >> freefunc m_free; >> } >> >> #define Py_m_doc 1 >> #define Py_m_create 2 >> #define Py_m_methods 3 >> #define Py_m_statedef 4 >> #define Py_m_exec 5 > > Py_mod_*, perhaps? Sure. > I'm also wondering if "exec" should move to be an "m_init" method in > PyModule_StateDef, rather than an independent slot, replacing it with > a PyType_Spec "types" slot as suggested below. No. Sometimes the exec doesn't need C state. It can work with just the module dict, for example to export some methods conditionally, or export objects that aren't methods/classes/whatever there's a special slot for. [...] >> I've thought about supporting multiple modules per extension, but I don't >> see a clear way to do that. The standard ModuleSpec machinery assumes one >> module per file, and it's not straightforward to get around that. To load >> more modules from an extension, you'd need a custom finder or loader anyway. 
>> So I'm going to implement helpers needed to load a module given an arbitrary >> PyModuleDesc, and leave implementing multi-mod support to people who need it >> for now. >> So, an "inittab" is out for now. > > Symlinks should work for making the same binary file importable under > different names in simple cases, and more complex cases are likely to > need a custom finder and loader anyway. > >> Perhaps a slot for automatically adding classes (from array of PyType_Spec) >> would help PyType_Spec adoption. > > Perhaps this one would be worth including in the initial proposal to > help make it clear why we decided the slots based design was > worthwhile? > >> And then a slot adding string/int/... constants from arrays of name/value >> would mean most modules wouldn't need an exec function. > > For those cases, I think the module internally is likely to want fast > C level access to the relevant constants - this note is the one that > inspired my suggestion of moving the "exec" link into the statedef > slot. This is for wrapping constants that are already known at the C level. For example _ssl has a long list of these calls: PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN", PY_SSL_ERROR_ZERO_RETURN); PyModule_AddIntConstant(m, "SSL_ERROR_WANT_READ", PY_SSL_ERROR_WANT_READ); PyModule_AddIntConstant(m, "SSL_ERROR_WANT_WRITE", PY_SSL_ERROR_WANT_WRITE); PyModule_AddIntConstant(m, "SSL_ERROR_WANT_X509_LOOKUP", PY_SSL_ERROR_WANT_X509_LOOKUP); PyModule_AddIntConstant(m, "SSL_ERROR_SYSCALL", PY_SSL_ERROR_SYSCALL); PyModule_AddIntConstant(m, "SSL_ERROR_SSL", PY_SSL_ERROR_SSL); PyModule_AddIntConstant(m, "SSL_ERROR_WANT_CONNECT", PY_SSL_ERROR_WANT_CONNECT); ... and so on. Many modules don't have proper error checking for this. >> And an "inittab" slot should be possible for package-style extensions. >> I'll leave these ideas out for now, but possibilities for extending are >> there. 
> > If I recall correctly, there's actually a longstanding RFE somewhere > for builtin packages that this change may eventually be able to help > with. It was something embedding the full Qt libraries I think. There are probably more use cases, but let's stick to the basics for now. From ncoghlan at gmail.com Thu Mar 26 05:25:45 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 26 Mar 2015 14:25:45 +1000 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: <5512B9EA.6000002@gmail.com> References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> Message-ID: On 25 March 2015 at 23:36, Petr Viktorin wrote: > On 03/25/2015 01:11 PM, Nick Coghlan wrote: >> >> On 25 March 2015 at 02:34, Petr Viktorin wrote: >>> >>> I'll share my notes on an API with PEP 384-style slots, before attempting >>> to >>> write it out in PEP language. >>> >>> I struggled to find a good name for the "PyType_Spec" equivalent, since >>> ModuleDef and ModuleSpec are both taken, but then I realized that, if the >>> docstring is put in a slot, I just need an array of slots... >> >> >> Because we're looking for an exported symbol, I think there's value in >> having a more clearly defined top level structure rather than just an >> array. > > > OK. > I'm not sure on cross-platform support of data rather than functions > exported from shared libraries, so kept the hook as a function. > Perhaps I'm being too paranoid here? Given that http://bugs.python.org/issue23743 came across my inbox this morning, I'm going to go with "No, you're not being too paranoid once we take C++ compilers and linkers into account". Perhaps we could make it use a new PyExport prefix though and drop the integer IDs in favour of exporting additional symbols? That is, have the hook be "PyExport_spam" with a separate "PyExport_spam_methods"? That opens the door to potentially having *other* export APIs in the future, like "PyExport_spam_codecs", "PyExport_spam_types", "PyExport_spam_constants_str". The main downside I see is potentially needing to check the shared library's list of exported symbols at import time for a potentially growing series of names, so also consider a variant of this idea that keeps the numeric slots instead of the symbol suffixes I describe. >> PyModule_Export or PyModule_Declare come to mind, with a preference >> for the former (since we're exporting a module definition for CPython >> to import) > > That's the name I was looking for, thanks! 
https://www.python.org/dev/peps/pep-0459/#the-python-exports-extension (which I first drafted some time back) came up in another discussion recently, and my brain finally connected it back to the C extension module API design problem :) I'm wondering if PyExportDef_Module might be a better name though (more on that below). >> typedef struct PyModule_Export { >> const char* doc; >> PyModule_Slot *slots; /* terminated by slot==0. */ >> } PyModule_Export; >> >> I prefer this mostly because it's easier to document and hence to >> understand - you can cover the process of creating the overall module >> in relation to PyModule_Export, while PyModule_Slot docs can focus on >> defining the *content* of the module. > > I don't think this is a problem. I can document creating with the > PyModuleExport_ symbol, and then when say that it's an array of > PyModule_Slot in the appropriate section. As you can see above, I realised we may be thinking about this the wrong way: we don't necessarily need to worry about making PyModule_Export itself extensible, as if we want to allow additional "addons" later, we can potentially use the C level linker namespace. In that model, each new slot would get a new suffix rather than a numeric ID. >> Having the docstring as the only expected field helps suggest that >> modules should at least define that much. Unlike types, we can leave >> the name out by default, as it will usually be implied by the file >> name (as is the case with Python modules). > > The downside is that it's additional boilerplate. PyType_Spec has a bunch of > mandatory int fields, but here everything is a pointer. A pointer which we're considering converting to (void *) and naming via a relatively opaque integer. 
I can see the necessity for that in the PyType_Spec case (given the huge number of slots and the fact we're creating them dynamically rather than deriving them from a shared library's exported symbols), but we're not talking anywhere near that number of slots here, and we're already coupled to the C linker semantics as that's how we find the initial export hook in the first place. > Also, does the docstring always need to be specified (as a constant)? I > think some internal modules are fine without a docstring (see _hashlib, > _multiprocessing, _elementtree, _sqlite3, ...). > > But if you're convinced a separate PyModule_Export structure is better, I > won't fight. I suspect it will be helpful if we replace the "named slots for future expansion" idea with suffixed exported symbols, but would be less useful if we keep the numbered slots. >> You've sold me on the idea of using a slots based API, though. >> However, the PEP's going to need to spend a bit more time on how to >> map this to the existing PyModule_Create API for modules that also >> want to support older versions of Python, while using the new system >> on 3.5+. > > Agreed. I suspect my new multiple exports will also make it easier to provide compatibility boilerplate that folks can use to write a backwards compatibility PyInit_spam shim, as they'll all be normal C functions that follow a defined naming scheme, whereas the numeric slots case requires a bit more work to process the slots correctly. 
In a "multiple exported symbols" module, the struct definitions may look
something like:

    typedef struct PyExportDef_ModuleState {
        int size;
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyExportDef_ModuleState;

    typedef struct PyExportDef_Module {
        const char *doc;
        PyExportDef_ModuleState *state;
    } PyExportDef_Module;

    PyExportDef_Module * PyExport_spam();
    int PyExport_spam_exec(PyObject *mod);

OR, for complete customisation rather than using a standard module object
post-processed by the exec hook:

    PyObject * PyExport_spam_create(PyObject *mod_spec);
    int PyExport_spam_exec(PyObject *mod);

Exporting both the declarative PyExport_spam and the imperative
PyExport_spam_create would be an error. Either approach can be combined
with exporting PyExport_spam_exec, which would be run after all other
declarative hooks.

Rather than a separate slot, easily exporting module level functions would
be:

    PyMethodDef * PyExport_spam_methods();

(Option: alias PyMethodDef as PyExportDef_Method)

>> I'm also wondering if "exec" should move to be an "m_init" method in
>> PyModule_StateDef, rather than an independent slot, replacing it with
>> a PyType_Spec "types" slot as suggested below.
>
>
> No. Sometimes the exec doesn't need C state. It can work with just the
> module dict, for example to export some methods conditionally, or export
> objects that aren't methods/classes/whatever there's a special slot for.

In the above sketch, that would be indicated by setting the "state"
pointer to NULL.

>>> And then a slot adding string/int/... constants from arrays of name/value
>>> would mean most modules wouldn't need an exec function.
>>
>> For those cases, I think the module internally is likely to want fast
>> C level access to the relevant constants - this note is the one that
>> inspired my suggestion of moving the "exec" link into the statedef
>> slot.
>
>
> This is for wrapping constants that are already known at the C level.
> For example _ssl has a long list of these calls:
>     PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
>                             PY_SSL_ERROR_ZERO_RETURN);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_READ",
>                             PY_SSL_ERROR_WANT_READ);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_WRITE",
>                             PY_SSL_ERROR_WANT_WRITE);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_X509_LOOKUP",
>                             PY_SSL_ERROR_WANT_X509_LOOKUP);
>     PyModule_AddIntConstant(m, "SSL_ERROR_SYSCALL",
>                             PY_SSL_ERROR_SYSCALL);
>     PyModule_AddIntConstant(m, "SSL_ERROR_SSL",
>                             PY_SSL_ERROR_SSL);
>     PyModule_AddIntConstant(m, "SSL_ERROR_WANT_CONNECT",
>                             PY_SSL_ERROR_WANT_CONNECT);
>
> ... and so on. Many modules don't have proper error checking for this.

Ah, yes, I understand. Indeed, changing that to a pair of hooks that
exports a set of "name, value" pairs for integers or strings would be
valuable. Continuing the naming scheme from above:

    PyExportDef_Str *PyExport_spam_constants_str();
    PyExportDef_Int *PyExport_spam_constants_int();

Pulling this idea for your full extension example:

    static PyExportDef_Method spam_methods[] = {
        {"demo", (PyCFunction)spam_demo, ...},
        {NULL, NULL}
    };

    static PyExportDef_ModuleState spam_statedef = {
        sizeof(spam_state_t),
        spam_state_traverse,
        spam_state_clear,
        spam_state_free
        /* any of those three can be NULL if not needed */
    };

    static PyExportDef_Module spam_module = {
        PyDoc_STR("A spammy module"),
        spam_exec,
        &spam_statedef
    };

    PyExportDef_Module *PyExport_spam() {
        return &spam_module;
    }

    PyExportDef_Method *PyExport_spam_methods() {
        return spam_methods;
    }

Using slots instead, the last part (from spam_module down) would
revert to being closer to your example:

    static PyExportDef_ModuleSlot spam_slots[] = {
        {Py_m_doc, PyDoc_STR("A spammy module")},
        {Py_m_methods, spam_methods},
        {Py_m_statedef, &spam_statedef},
        {Py_m_exec, spam_exec},
        {0, NULL}
    };

    PyExportDef_ModuleSlot *PyExport_spam() {
        return spam_slots;
    }

So actually writing that down suggests numeric slots may still be a
better idea.
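
The name/value constant tables discussed earlier in this message could be consumed by one loop that checks every registration, which is the error checking the quoted _ssl calls are said to lack. The sketch below is a self-contained model, not CPython code: `int_constant_t`, `ns_t`, `ns_add_int` and `ns_add_int_table` are hypothetical names, and `ns_t` is a toy fixed-size namespace standing in for a module dict so that a registration can actually fail. The only real-API fact relied on is that PyModule_AddIntConstant returns -1 on error and 0 on success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair; a table ends with name == NULL. */
typedef struct {
    const char *name;
    long value;
} int_constant_t;

/* Toy stand-in for a module namespace with a fixed capacity. */
#define NS_CAPACITY 4

typedef struct {
    const char *names[NS_CAPACITY];
    long values[NS_CAPACITY];
    size_t count;
} ns_t;

/* Add one constant. Returns 0 on success, -1 if the namespace is
 * full or the name is already present (same convention as
 * PyModule_AddIntConstant: -1 on error, 0 on success). */
static int ns_add_int(ns_t *ns, const char *name, long value)
{
    size_t i;

    assert(ns != NULL && name != NULL);
    if (ns->count >= NS_CAPACITY)
        return -1;
    for (i = 0; i < ns->count; i++) {
        if (strcmp(ns->names[i], name) == 0)
            return -1;
    }
    ns->names[ns->count] = name;
    ns->values[ns->count] = value;
    ns->count++;
    return 0;
}

/* Register every entry of a NULL-terminated table, stopping at the
 * first failure so the caller can abort module initialisation. */
static int ns_add_int_table(ns_t *ns, const int_constant_t *table)
{
    const int_constant_t *c;

    assert(ns != NULL && table != NULL);
    for (c = table; c->name != NULL; c++) {
        if (ns_add_int(ns, c->name, c->value) < 0)
            return -1;
    }
    return 0;
}
```

With a table like this, the error handling lives in one place in the loader rather than being repeated (or forgotten) after every call in each extension's init code.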
I like the "PyExport" and "PyExportDef" prefixes though. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From encukou at gmail.com Thu Mar 26 11:01:49 2015 From: encukou at gmail.com (Petr Viktorin) Date: Thu, 26 Mar 2015 11:01:49 +0100 Subject: [Import-SIG] PEP 489: Redesigning extension module loading In-Reply-To: References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> Message-ID: <5513D90D.2080905@gmail.com> On 03/26/2015 05:25 AM, Nick Coghlan wrote: > On 25 March 2015 at 23:36, Petr Viktorin wrote: >> On 03/25/2015 01:11 PM, Nick Coghlan wrote: >>> >>> On 25 March 2015 at 02:34, Petr Viktorin wrote: >>>> >>>> I'll share my notes on an API with PEP 384-style slots, before attempting >>>> to >>>> write it out in PEP language. >>>> >>>> I struggled to find a good name for the "PyType_Spec" equivalent, since >>>> ModuleDef and ModuleSpec are both taken, but then I realized that, if the >>>> docstring is put in a slot, I just need an array of slots... >>> >>> >>> Because we're looking for an exported symbol, I think there's value in >>> having a more clearly defined top level structure rather than just an >>> array. >> >> >> OK. >> I'm not sure on cross-platform support of data rather than functions >> exported from shared libraries, so kept the hook as a function. >> Perhaps I'm being too paranoid here? > > Given that http://bugs.python.org/issue23743 came across my inbox this > morning, I'm going to go with "No, you're not being too paranoid once > we take C++ compilers and linkers into account". Yeah. I'm usually against boilerplate but here the extra function will probably save headaches later. > Perhaps we could make it use a new PyExport prefix though and drop the > integer IDs in favour of exporting additional symbols? That is, have > the hook be "PyExport_spam" with a separate "PyExport_spam_methods"? > > That opens the door to potentially having *other* export APIs in the > future, like "PyExport_spam_codecs", "PyExport_spam_types", > "PyExport_spam_constants_str". I'm not convinced. I think the dynamic loading machinery is too sensitive to weird platform-specific compiler/linker details to add bits like this whenever they're needed. 
Having everything packaged up as a module, which registers whatever it
wants when it's loaded, seems better to me.

Also, I believe all these extension APIs should preferably include
module names. The PyCapsule_Import naming convention is a good idea.
Makes it easier to know where things come from.

[...]
> Pulling this idea for your full extension example:
>
>     static PyExportDef_Method spam_methods[] = {
>         {"demo", (PyCFunction)spam_demo, ...},
>         {NULL, NULL}
>     };
>
>     static PyExportDef_ModuleState spam_statedef = {
>         sizeof(spam_state_t),
>         spam_state_traverse,
>         spam_state_clear,
>         spam_state_free
>         /* any of those three can be NULL if not needed */
>     }
>
>     static PyExportDef_Module spam_module = {
>         PyDoc_STR("A spammy module"),
>         spam_exec,
>         spam_statedef
>     }
>
>     PyExportDef_Module *PyExport_spam {
>         return spam_module;
>     }
>
>     PyExportDef_Method *PyExport_spam_methods {
>         return spam_methods;
>     }
>
> Using slots instead, the last part (from spam_module down) would
> revert to being closer to your example:
>
>     static PyExportDef_ModuleSlot spam_slots[] = {
>         {Py_m_doc, PyDoc_STR("A spammy module")},
>         {Py_m_methods, spam_methods},
>         {Py_m_statedef, spam_statedef},
>         {Py_m_exec, spam_exec},
>         {0, NULL}
>     }
>
>     PyExportDef_ModuleSlot *PyExport_spam {
>         return spam_slots;
>     }
>
> So actually writing that down suggests numeric slots may still be a
> better idea.

Yes, I think so as well. Also consider:
- Python can fail hard on unknown slot numbers. Checking for unknown
  exports would require enumerating exported symbols, which is definitely
  not something I can code for every platform (if that's even possible).
- Additional exported functions are not actually more type-safe - the
  exported symbols would be void*, Python would still need to cast to an
  appropriate function type.

> I like the "PyExport" and "PyExportDef" prefixes though.

As above, I think modules are the only thing that should be exported,
and with that, "PyModuleExport" sounds better.
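
To illustrate the first point above (failing hard on unknown slot
numbers), here is a minimal sketch of how a loader might walk a
zero-terminated slot array. It deliberately avoids Python.h so it stands
alone; every name in it (demo_slot, demo_module, the DEMO_SLOT_* IDs,
demo_apply_slots, demo_exec) is hypothetical and not part of the draft
API. It also shows the cast from void* that the second point mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical slot IDs; the real numbering would be set by the PEP. */
enum {
    DEMO_SLOT_END = 0,   /* terminator, like the {0, NULL} sentinel */
    DEMO_SLOT_DOC = 1,
    DEMO_SLOT_EXEC = 2
};

/* One entry of the slot array: a numeric ID plus an untyped value. */
typedef struct {
    int slot;
    void *value;
} demo_slot;

typedef int (*demo_exec_func)(void);

/* Stand-in for whatever the loader fills in from the slots. */
typedef struct {
    const char *doc;
    demo_exec_func exec;
} demo_module;

/* Example exec hook used when building a slot array. */
static int demo_exec(void)
{
    return 0;
}

/* Apply each slot in turn. Returns 0 on success and -1 as soon as an
 * unknown slot ID is seen, without needing to enumerate symbols. */
static int demo_apply_slots(demo_module *mod, const demo_slot *slots)
{
    assert(mod != NULL && slots != NULL);
    for (const demo_slot *s = slots; s->slot != DEMO_SLOT_END; s++) {
        switch (s->slot) {
        case DEMO_SLOT_DOC:
            mod->doc = (const char *)s->value;
            break;
        case DEMO_SLOT_EXEC:
            /* The value is stored as void*, so a cast is still needed. */
            mod->exec = (demo_exec_func)s->value;
            break;
        default:
            fprintf(stderr, "unknown slot ID %d\n", s->slot);
            return -1;
        }
    }
    return 0;
}
```

A caller would build an array such as {DEMO_SLOT_DOC, ...},
{DEMO_SLOT_EXEC, ...}, {DEMO_SLOT_END, NULL} and treat a -1 result as an
import failure. With one exported symbol per feature, the loader has no
equally cheap way to notice a symbol it does not understand.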
From ncoghlan at gmail.com Mon Mar 30 15:21:46 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 30 Mar 2015 23:21:46 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading
In-Reply-To: <5513D90D.2080905@gmail.com>
References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> <5513D90D.2080905@gmail.com>
Message-ID:

On 26 March 2015 at 20:01, Petr Viktorin wrote:
> On 03/26/2015 05:25 AM, Nick Coghlan wrote:
>> So actually writing that down suggests numeric slots may still be a
>> better idea.
>
>
> Yes, I think so as well. Also consider:
> - Python can fail hard on unknown slot numbers. Checking for unknown exports
> would require enumerating exported symbols, which is definitely not
> something I can code for every platform (if that's even possible).
> - Additional exported functions are not actually more type-safe - the
> exported symbols would be void*, Python would still need to cast to an
> appropriate function type.

OK, sold - numeric slots it is (and I agree that changes the naming
scheme back to favouring retaining "PyModuleExport" as the common
prefix).

I think that's all the points of discussion we still had open on the
current draft covered, so I'll wait for the next update before
commenting further :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From encukou at gmail.com Tue Mar 31 14:49:32 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Tue, 31 Mar 2015 14:49:32 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading
In-Reply-To:
References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> <5513D90D.2080905@gmail.com>
Message-ID: <551A97DC.40702@gmail.com>

On 03/30/2015 03:21 PM, Nick Coghlan wrote:
> On 26 March 2015 at 20:01, Petr Viktorin wrote:
>> On 03/26/2015 05:25 AM, Nick Coghlan wrote:
>>> So actually writing that down suggests numeric slots may still be a
>>> better idea.
>>
>>
>> Yes, I think so as well. Also consider:
>> - Python can fail hard on unknown slot numbers. Checking for unknown exports
>> would require enumerating exported symbols, which is definitely not
>> something I can code for every platform (if that's even possible).
>> - Additional exported functions are not actually more type-safe - the
>> exported symbols would be void*, Python would still need to cast to an
>> appropriate function type.
>
> OK, sold - numeric slots it is (and I agree that changes the naming
> scheme back to favouring retaining "PyModuleExport" as the common
> prefix).
>
> I think that's all the points of discussion we still had open on the
> current draft covered, so I'll wait for the next update before
> commenting further :)

The next update will come with a preliminary implementation - things are
starting to look good at

I do have another point though: I think we also need to implement
PyState_AddModule and PyState_FindModule equivalents for slots.

In the current draft, PEP 489 modules are not limited to one instance
per definition: you could import one, and then (with a custom loader)
import it again under a different name, and you'd get two independent
modules.
In effect, there is true per-module state; but
PyState_AddModule/PyState_FindModule allowed global (per-interpreter)
module state.

I think this is a good thing; for one it will ease testing that modules
are properly isolated, as required for sub-modules and module reloading.
It's probably also nice for Cython's goal of emulating Python modules.
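
As a rough illustration of "two independent modules from one
definition", the sketch below creates two instances that share a
definition but each own a separate, zeroed state block. It is plain C
without Python.h, and all names (demo_def, demo_module, demo_module_new,
demo_module_free) are made up for this example rather than taken from
the draft.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a definition shared by every instance. */
typedef struct {
    size_t state_size;   /* bytes of per-module state to allocate */
} demo_def;

/* Hypothetical stand-in for a module object that owns its state. */
typedef struct {
    const demo_def *def;
    const char *name;
    void *state;
} demo_module;

/* Create a new instance under the given name. Each call allocates its
 * own zero-initialised state, so instances never share data. */
static demo_module *demo_module_new(const demo_def *def, const char *name)
{
    assert(def != NULL && name != NULL);
    demo_module *mod = malloc(sizeof *mod);
    if (mod == NULL)
        return NULL;
    mod->def = def;
    mod->name = name;
    mod->state = NULL;
    if (def->state_size > 0) {
        mod->state = calloc(1, def->state_size);
        if (mod->state == NULL) {
            free(mod);
            return NULL;
        }
    }
    return mod;
}

/* Release an instance together with its private state. */
static void demo_module_free(demo_module *mod)
{
    if (mod != NULL) {
        free(mod->state);
        free(mod);
    }
}
```

Writing to the state of one instance leaves the other untouched, which
is the property a test for proper isolation would check. A lookup keyed
only on the definition, in the style of PyState_FindModule, could not
tell the two instances apart.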
On the other hand, any callback or class that needs access to the module
would now have to store a reference to it. This would involve a lot of
refactoring for some modules, and I don't think we can afford that.

Also, some modules wrap a library that has global state (haven't checked
stdlib, but curses, readline, locale are candidates).
It doesn't make sense to allow loading more instances of such modules.
Perhaps there should be a flag to distinguish them? Or just let them use
PyState_AddModule/PyState_FindModule to prevent re-import?

The need for flags would be a good argument for having, after all, a
ModuleExport structure wrapping around slots. Such a structure could also
share PyModuleDef_Base, making it usable with
PyState_AddModule/PyState_FindModule (there'd be new functions, but the
machinery/data structure could be reused). So I'm starting to be more
inclined to do this again:

    typedef struct PyModule_Export {
        PyModuleDef_Base m_base;
        const char* m_doc;
        int m_flags;
        PyModule_Slot *m_slots; /* terminated by slot==0. */
    } PyModule_Export;

From encukou at gmail.com Tue Mar 31 14:53:57 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Tue, 31 Mar 2015 14:53:57 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading
In-Reply-To: <551A97DC.40702@gmail.com>
References: <5506CEB5.7050105@gmail.com> <550AD120.9070406@gmail.com>

<550D483D.2080007@gmail.com> <5511921A.5070303@gmail.com> <5512B9EA.6000002@gmail.com> <5513D90D.2080905@gmail.com> <551A97DC.40702@gmail.com>
Message-ID:

On Tue, Mar 31, 2015 at 2:49 PM, Petr Viktorin wrote:
> On 03/30/2015 03:21 PM, Nick Coghlan wrote:
>>
>> On 26 March 2015 at 20:01, Petr Viktorin wrote:
>>>
>>> On 03/26/2015 05:25 AM, Nick Coghlan wrote:
>>>>
>>>> So actually writing that down suggests numeric slots may still be a
>>>> better idea.
>>>
>>>
>>>
>>> Yes, I think so as well. Also consider:
>>> - Python can fail hard on unknown slot numbers. Checking for unknown
>>> exports
>>> would require enumerating exported symbols, which is definitely not
>>> something I can code for every platform (if that's even possible).
>>> - Additional exported functions are not actually more type-safe - the
>>> exported symbols would be void*, Python would still need to cast to an
>>> appropriate function type.
>>
>>
>> OK, sold - numeric slots it is (and I agree that changes the naming
>> scheme back to favouring retaining "PyModuleExport" as the common
>> prefix).
>>
>> I think that's all the points of discussion we still had open on the
>> current draft covered, so I'll wait for the next update before
>> commenting further :)
>
>
> The next update will come with a preliminary implementation - things are
> starting to look good at

Oops, left out the URL: https://github.com/encukou/cpython/commits/pep489

> I do have another point though: I think we also need to implement
> PyState_AddModule and PyState_FindModule
> equivalents for slots.
>
> In the current draft, PEP 489 modules are not limited to one instance per
> definition: you could import one, and then (with a custom loader) import it
> again under a different name, and you'd get two independent modules.
> In effect, there is true per-module state; but
> PyState_AddModule/PyState_FindModule allowed global (per-interpreter) module
> state.
> I think this is a good thing; for one it will ease testing that modules are
> properly isolated, as required for sub-modules and module reloading. It's
> probably also nice for Cython's goal of emulating Python modules.
> On the other hand, any callback or class that needs access to the module
> would now have to store a reference to it. This would involve a lot of
> refactoring for some modules, and I don't think we can afford that.
>
>
> Also, some modules wrap a library that has global state (haven't checked
> stdlib, but curses, readline, locale are candidates).
> It doesn't make sense to allow loading more instances of such modules.
> Perhaps there should be a flag to distinguish them? Or just let them use
> PyState_AddModule/PyState_FindModule to prevent re-import?
>
> The need for flags would be a good argument for having, after all, a
> ModuleExport structure wrapping around slots. Such a structure could also
> share PyModuleDef_Base, making it usable with PyState_AddModule/PyState_FindModule
> (there'd be new functions, but the machinery/data structure could be
> reused). So I'm starting to be more inclined to do this again:
>
>     typedef struct PyModule_Export {
>         PyModuleDef_Base m_base;
>         const char* m_doc;
>         int m_flags;
>         PyModule_Slot *m_slots; /* terminated by slot==0. */
>     } PyModule_Export;
>
>
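
The flag idea quoted above could be sketched roughly as follows: a
wrapper structure carries a flags field next to the slot array, and the
loader refuses a second instance when a single-instance flag is set.
This is only a sketch under assumptions. The flag
DEMO_EXPORT_SINGLE_INSTANCE, the live_instances counter and the
acquire/release helpers are invented for illustration, and a real
implementation would track instances per interpreter rather than in the
definition itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: the module wraps a library with global state. */
#define DEMO_EXPORT_SINGLE_INSTANCE 0x1

typedef struct {
    int slot;
    void *value;
} demo_slot;

/* Simplified stand-in for the proposed wrapper around the slots. */
typedef struct {
    const char *doc;
    int flags;
    const demo_slot *slots;   /* terminated by slot == 0 */
    int live_instances;       /* bookkeeping for this sketch only */
} demo_export;

/* Ask for permission to create another instance of this export.
 * Returns 0 if allowed, -1 if the export is single-instance and one
 * instance already exists. */
static int demo_export_acquire(demo_export *exp)
{
    assert(exp != NULL);
    if ((exp->flags & DEMO_EXPORT_SINGLE_INSTANCE) && exp->live_instances > 0)
        return -1;
    exp->live_instances++;
    return 0;
}

/* Record that one instance of this export has gone away. */
static void demo_export_release(demo_export *exp)
{
    assert(exp != NULL && exp->live_instances > 0);
    exp->live_instances--;
}
```

Exports without the flag can be instantiated any number of times, so
isolated modules keep the multi-instance behaviour while wrappers around
global-state libraries opt out of it.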