From ericsnowcurrently at gmail.com  Sun May  3 00:22:31 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Sat, 2 May 2015 16:22:31 -0600
Subject: [Import-SIG] an old idea: getting rid of __init__.py
Message-ID: <CALFfu7BQGqSgQipSKW1cYrNf3E+zdcL_6HnkvwXU8xZW9zammw@mail.gmail.com>

When namespace packages were under discussion I remember we were
seriously considering eliminating the requirement of __init__.py for
*all* packages.  Well, I stumbled onto the following post from Guido
predating namespace packages by several years:

https://mail.python.org/pipermail/python-dev/2006-April/064400.html

Food for thought. :)

-eric

p.s. I haven't yet read through the thread, but I expect the
conversation dragged out long enough that the proposal lost steam.

From solipsis at pitrou.net  Sun May  3 00:41:07 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 3 May 2015 00:41:07 +0200
Subject: [Import-SIG] an old idea: getting rid of __init__.py
References: <CALFfu7BQGqSgQipSKW1cYrNf3E+zdcL_6HnkvwXU8xZW9zammw@mail.gmail.com>
Message-ID: <20150503004107.568b4089@fsol>

On Sat, 2 May 2015 16:22:31 -0600
Eric Snow <ericsnowcurrently at gmail.com> wrote:
> When namespace packages were under discussion I remember we were
> seriously considering eliminating the requirement of __init__.py for
> *all* packages.  Well, I stumbled onto the following post from Guido
> predating namespace packages by several years:

Well, I've already been bitten by Python mistaking a directory for a
"namespace package", just because of its simple existence. I wouldn't
want things to get any more annoying.

The argument that __init__.py is confusing to beginners is a bit
arbitrary; not requiring any __init__.py makes for situations that are
just as confusing.

Regards

Antoine.


> 
> https://mail.python.org/pipermail/python-dev/2006-April/064400.html
> 
> Food for thought. :)
> 
> -eric
> 
> p.s. I haven't yet read through the thread, but I expect the
> conversation dragged out long enough that the proposal lost steam.


From ncoghlan at gmail.com  Tue May  5 09:49:19 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 5 May 2015 17:49:19 +1000
Subject: [Import-SIG] an old idea: getting rid of __init__.py
In-Reply-To: <CALFfu7BQGqSgQipSKW1cYrNf3E+zdcL_6HnkvwXU8xZW9zammw@mail.gmail.com>
References: <CALFfu7BQGqSgQipSKW1cYrNf3E+zdcL_6HnkvwXU8xZW9zammw@mail.gmail.com>
Message-ID: <CADiSq7fz+_NR4+UsKiYqY1tfGnMdzCAq_QVRmWjbCjB3L_6okQ@mail.gmail.com>

On 3 May 2015 at 08:22, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> When namespace packages were under discussion I remember we were
> seriously considering eliminating the requirement of __init__.py for
> *all* packages.

Which is what we effectively did. You only need an __init__.py now if you:

a) want module level attributes, rather than only subpackages;
b) want to run other code at package import time; or
c) want to forcibly close the package to further extension in other directories.
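
A minimal sketch of the difference, assuming Python 3.3 or later. The
directory and attribute names (demo_nspkg, demo_regpkg, ATTR, VALUE) are
made up for illustration:

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: "demo_nspkg" has no __init__.py, "demo_regpkg" has one.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "demo_nspkg"))
os.makedirs(os.path.join(root, "demo_regpkg"))
with open(os.path.join(root, "demo_nspkg", "mod.py"), "w") as f:
    f.write("VALUE = 1\n")
with open(os.path.join(root, "demo_regpkg", "__init__.py"), "w") as f:
    f.write("ATTR = 'module-level attribute'\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

nspkg = importlib.import_module("demo_nspkg")
nsmod = importlib.import_module("demo_nspkg.mod")
regpkg = importlib.import_module("demo_regpkg")

# The namespace package is not backed by any file, so there is nowhere to
# put module-level attributes or import-time code; it only holds submodules.
ns_file = getattr(nspkg, "__file__", None)

# The regular package is backed by __init__.py, which ran at import time.
reg_file = regpkg.__file__
```

Here ns_file is None while reg_file points at the __init__.py, and only
the regular package carries an attribute of its own (ATTR).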

As Antoine notes, the implicit nature of magically scanning
directories for subpackages trades away comprehensibility for the sake
of convenience. Its main advantage is actually "that's the way other
languages handle import namespacing".

The kind of traditional package created by adding __init__.py could be
described as being more akin to a "directory module" than it is to a
pure namespace package (certainly "directory module" is an accurate
description of former single-file modules like unittest, which go out
of their way to hide the fact that they're now implemented across
multiple files).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From encukou at gmail.com  Thu May  7 17:35:02 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 07 May 2015 17:35:02 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
Message-ID: <554B8626.8000709@gmail.com>

Hello!

Based on previous discussions, particularly the lack of objections to
repurposing ModuleDef.m_reload, I've sent an updated version of PEP 489
to the editors. I'm including a copy below.

The implementation is nearly finished, with several things missing:
- Support for non-Linuxy platforms
- PyImport_Inittab, see below
- Documentation
- porting "xx" and "xxsubtype" modules (but "xxlimited" is done)


The changes from the last update are:
- PyModuleExport -> PyModuleDef (which brings us down to two slot types,
create & exec)
- Removed "singleton modules"
- Stated that PyModule_Create, PyState_FindModule, PyState_AddModule,
PyState_RemoveModule will not work on slots-based modules.
- Added a section on C-level callbacks
- Clarified that if PyModuleExport_* returns NULL, it's as if it wasn't
defined (i.e. falls back to PyInit)
- Added API functions: PyModule_FromDefAndSpec, PyModule_ExecDef
- Added PyModule_AddMethods and PyModule_AddDocstring helpers
- Added PyMODEXPORT_FUNC macro for x-platform declarations of the export
function
- Added summary of API changes
- Added example code for a backwards-compatible module
- Changed modules ported in the initial implementation to "array" and "xx*"
- Changed ImportErrors to SystemErrors in cases where the module is
badly written (and to mirror what PyInit does now)
- Several typo fixes and clarifications


Some further thoughts:

The docstring and methods are initialized in the creation step, rather
than exec. I don't think it's important enough to do this in exec, and
this way the implementation is easier (with respect to NULL slots, and
backwards compatibility with PyInit-based modules where Exec is a no-op).

As I was implementing this, I ran into PyImport_Inittab. I'll need to
add a similar list of PyModuleDefs.


And now for the PEP:

-- 

PEP: 489
Title: Redesigning extension module loading
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin <encukou at gmail.com>,
        Stefan Behnel <stefan_ml at behnel.de>,
        Nick Coghlan <ncoghlan at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which extension modules interact
with the import machinery. This was last revised for Python 3.0 in PEP
3121, but did not solve all problems at the time. The goal is to solve them
by bringing extension modules closer to the way Python modules behave;
specifically to hook into the ModuleSpec-based loading mechanism
introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow
extension authors to only define features they need, and to allow future
additions to extension module declarations.

Extension modules are created in a two-step process, fitting better into
the ModuleSpec architecture, with parallels to __new__ and __init__ of
classes.

Extension modules can safely store arbitrary C-level per-module state in
the module that is covered by normal garbage collection and supports
reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.

The proposal also allows extension modules with non-ASCII names.


Motivation
==========

Python modules and extension modules are not set up in the same way.
For Python modules, the module is created and set up first, then the module
code is executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.
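
The two steps for a Python module can be seen by driving importlib by hand.
This is only a sketch: the module name ``twostep_demo`` and its contents are
hypothetical, and ``importlib.util.module_from_spec`` assumes Python 3.5 or
later.

```python
import importlib.util
import os
import sys
import tempfile

# Hypothetical module source that records what it can see while executing.
source = (
    "import sys\n"
    "NAME = __name__\n"
    "IN_SYS_MODULES = __name__ in sys.modules\n"
)
path = os.path.join(tempfile.mkdtemp(), "twostep_demo.py")
with open(path, "w") as f:
    f.write(source)

# The ModuleSpec carries the name and location before any module code runs.
spec = importlib.util.spec_from_file_location("twostep_demo", path)

# Step 1: create. Import-related attributes are set, module code has not run.
module = importlib.util.module_from_spec(spec)
ran_before_exec = hasattr(module, "NAME")

# Step 2: execute. The module is registered first, as the import system does,
# so a transitive re-import would find it instead of starting over.
sys.modules[spec.name] = module
spec.loader.exec_module(module)
```

After step 1 the module already knows its ``__name__`` and ``__spec__``;
only step 2 runs the module body.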

For extensions, i.e. shared libraries, the module
init function is executed straight away and does both the creation and
initialization. The initialization function is not passed the ModuleSpec,
or any information it contains, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

In Py3, modules are also not being added to sys.modules, which means that a
(potentially transitive) re-import of the module will really try to
re-import it and thus run into an infinite loop when it executes the module
init function again. Without the fully qualified module name (FQMN), it is
not trivial to correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of "__init__.py" modules, i.e. packages,
especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or interpreter reloading, and,
while it is possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps backwards compatibility, which should reduce pressure and
give extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension modules export an initialization function named
"PyInit_modulename", named after the file name of the shared library. This
function is executed by the import machinery and must return either NULL in
the case of an exception, or a fully initialized module object. The
function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef struct. It then continues to initialize it by adding
attributes to the module dict, creating types, etc.

In the back, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the fully qualified name of the module object. This is not entirely safe
as it relies on the module init function creating its own module object
first, but this assumption usually holds in practice.


The proposal
============

The current extension module initialization will be deprecated in favor of
a new initialization scheme. Since the current scheme will continue to be
available, existing code will continue to work unchanged, including binary
compatibility.

Extension modules that support the new initialization scheme must export
the public symbol "PyModuleExport_<modulename>", where "modulename"
is the name of the module. (For modules with non-ASCII names the symbol name
is slightly different, see "Export Hook Name" below.)

If defined, this symbol must resolve to a C function with the following
signature::

    PyModuleDef* (*PyModuleExportFunction)(void)

For cross-platform compatibility, the function should be declared as::

    PyMODEXPORT_FUNC PyModuleExport_<modulename>(void)

The function must return a pointer to a PyModuleDef structure.
This structure must be available for the lifetime of the module created from
it -- usually, it will be declared statically.

Alternatively, this function can return NULL, in which case it is as if the
symbol was not defined -- see the "Legacy Init" section.

The PyModuleDef structure will be changed to contain a list of slots,
similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure
(which would introduce additional supporting functions and per-module
storage), the currently unused m_reload pointer of PyModuleDef will be
changed to hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of
PyModuleDef_Slot structures, terminated by a slot with id set to 0
(i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided.
New Python versions may introduce new slot IDs, but slot IDs will never be
recycled. Slots may get deprecated, but will continue to be supported
throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the
slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.

When using the new import mechanism, m_size must not be negative.
Also, the *m_name* field of PyModuleDef will not be used during importing;
the module name will be taken from the ModuleSpec.
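
The slot rules above can be modelled in a few lines of Python. This is a
sketch of the described behaviour, not the C implementation: the numeric
slot IDs and the helper names (``iter_slots``, ``validate_def``) are
assumptions, and a slots array is represented as a list of
``(slot_id, value)`` pairs.

```python
# Assumed numeric values; the text above only names the slots.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def iter_slots(m_slots):
    """Yield (slot_id, value) pairs up to the {0, NULL} terminator."""
    if m_slots is None:            # models m_slots == NULL
        return
    for slot_id, value in m_slots:
        if slot_id == 0:           # terminator entry, stop reading
            return
        yield slot_id, value


def validate_def(m_size, m_slots):
    """Raise SystemError for definitions the new mechanism rejects."""
    if m_size < 0:
        raise SystemError("m_size must not be negative")
    for slot_id, _value in iter_slots(m_slots):
        if slot_id not in KNOWN_SLOTS:
            raise SystemError("unknown slot ID %d" % slot_id)


def spam_exec(module):             # hypothetical exec function
    return 0


good_slots = [(PY_MOD_EXEC, spam_exec), (0, None)]
validate_def(0, good_slots)        # accepted, returns None
```

Entries after the terminator are never looked at, which is why an array
must always end with ``{0, NULL}``.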


Module Creation
---------------

Module creation -- that is, the implementation of
ExecutionLoader.create_module -- is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451,
and the PyModuleDef structure.
It should return a new module object, or set an error
and return NULL.

This function is not responsible for setting import-related attributes
specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes.
However, only ModuleType instances support module-specific functionality
such as per-module state.

Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively), may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in
particular to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import
will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal
module object by PyModule_New. The name is taken from *spec*.
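
As a rough Python model of the creation step described in this section
(a sketch only: the slot ID value, the ``create_module`` helper and the
dict standing in for PyModuleDef are assumptions made for illustration):

```python
import itertools
import types
from importlib.machinery import ModuleSpec

PY_MOD_CREATE = 1  # assumed numeric value


def create_module(spec, module_def):
    """Model of module creation driven by the Py_mod_create slot.

    module_def is a hypothetical stand-in for PyModuleDef: a dict whose
    "m_slots" entry is a list of (slot_id, value) pairs ending in (0, None).
    """
    slots = itertools.takewhile(lambda s: s[0] != 0, module_def["m_slots"])
    creators = [value for slot_id, value in slots if slot_id == PY_MOD_CREATE]
    if len(creators) > 1:
        raise SystemError("multiple Py_mod_create slots")
    if creators:
        # The create function receives the spec and the definition.
        return creators[0](spec, module_def)
    # No Py_mod_create: a normal module object, named after the spec.
    return types.ModuleType(spec.name)


class CustomModule(types.ModuleType):
    """Hypothetical module subclass returned by a create function."""


def custom_create(spec, module_def):
    return CustomModule(spec.name)


spec = ModuleSpec("spam", None)
plain = create_module(spec, {"m_slots": [(0, None)]})
custom = create_module(
    spec, {"m_slots": [(PY_MOD_CREATE, custom_create), (0, None)]})
```

Note that the name comes from the spec in both branches; nothing here
reads an ``m_name`` field.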


Post-creation steps
...................

If the Py_mod_create function returns an instance of types.ModuleType
(or subclass), or if a Py_mod_create slot is not present, the import
machinery will do the following steps after the module is created:

* If *m_size* is specified, per-module state is allocated and made
  accessible through PyModule_GetState.
* The PyModuleDef is associated with the module, making it accessible to
  PyModule_GetDef, and enabling the m_traverse, m_clear and m_free hooks.
* The docstring is set from m_doc.
* The module's functions are initialized from m_methods.

If the Py_mod_create function does not return a module subclass, then m_size
must be 0 or negative, and m_traverse, m_clear and m_free must all be NULL.
Otherwise, SystemError is raised.


Module Execution
----------------

Module execution -- that is, the implementation of
ExecutionLoader.exec_module -- is governed by "execution slots".
This PEP only adds one, Py_mod_exec, but others may be added in the future.

Execution slots may be specified multiple times, and are processed in
the order they appear in the slots array.
When using the default import machinery, they are processed after
import-related attributes specified in PEP 451 [#pep-0451-attributes]_
(such as ``__name__`` or ``__loader__``) are set and the module is added
to sys.modules.


The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following
signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to
setting the module's initial attributes.
The "module" argument receives the module object to initialize.

If PyModuleExec replaces the module's entry in sys.modules,
the new object will be used and returned by importlib machinery.
(This mirrors the behavior of Python modules. Note that for extensions,
implementing Py_mod_create is usually a better solution for the use cases
this serves.)

The function must return ``0`` on success, or, on error, set an
exception and return ``-1``.
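
The ordering and error convention of execution slots can be sketched in
Python as follows. This models the description above rather than the C
code; the slot ID value, the ``exec_def`` helper and the three exec
functions are hypothetical, and the model only returns ``-1`` where the C
function would also set an exception.

```python
import itertools
import types

PY_MOD_EXEC = 2  # assumed numeric value


def exec_def(module, m_slots):
    """Run Py_mod_exec slots in array order; stop at the first failure."""
    slots = itertools.takewhile(lambda s: s[0] != 0, m_slots)
    for slot_id, func in slots:
        if slot_id != PY_MOD_EXEC:
            continue               # not an execution slot
        if func(module) != 0:
            return -1              # later slots are not run
    return 0


calls = []


def first(module):
    calls.append("first")
    module.food = "spam"
    return 0


def failing(module):
    calls.append("failing")
    return -1


def never(module):
    calls.append("never")
    return 0


mod = types.ModuleType("spam")
ok = exec_def(mod, [(PY_MOD_EXEC, first), (0, None)])
failed = exec_def(mod, [(PY_MOD_EXEC, first),
                        (PY_MOD_EXEC, failing),
                        (PY_MOD_EXEC, never),
                        (0, None)])
```

In the second call, ``never`` is not reached because ``failing`` reported
an error first.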


Legacy Init
-----------

If the PyModuleExport function is not defined, or if it returns NULL, the
import machinery will try to initialize the module using the
"PyInit_<modulename>" hook, as described in PEP 3121.

If the PyModuleExport function is defined, the PyInit function will be
ignored. Modules requiring compatibility with previous versions of CPython
may implement the PyInit function in addition to the new hook.

Modules using the legacy init API will be initialized entirely in the
Loader.create_module step; Loader.exec_module will be a no-op.

A module that supports older CPython versions can be coded as::

    #define Py_LIMITED_API
    #include <Python.h>

    static int spam_exec(PyObject *module) {
        PyModule_AddStringConstant(module, "food", "spam");
        return 0;
    }

    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
        spam_slots,                                 /* m_slots */
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODEXPORT_FUNC
    PyModuleExport_spam(void) {
        return &spam_def;
    }

    PyMODINIT_FUNC
    PyInit_spam(void) {
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    }

Note that this must be *compiled* on a new CPython version, but the
resulting shared library will be backwards compatible.
(Source-level compatibility is possible with preprocessor directives.)

If a Py_mod_create slot is used, PyInit should call its function instead of
PyModule_Create. Keep in mind that the ModuleSpec object is not available in
the legacy init scheme.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object's storage reachable by PyModule_GetState.
A simple rule of thumb is: Do not define any static data, except
built-in types with no mutable or user-settable class attributes.

Behavior of existing module creation functions
----------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure
with a non-NULL m_slots pointer.
The function doesn't have access to the ModuleSpec object necessary for
"new style" module creation.

The PyState_FindModule function will return NULL, and PyState_AddModule
and PyState_RemoveModule will fail with SystemError.
PyState registration is disabled because multiple module objects may be
created from the same PyModuleDef.


Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs
access to module-level state (including functions, classes or exceptions
defined at the module level) must receive a reference to the module object
(or the particular object it needs), either directly or indirectly.
This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to
  the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom
  data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be
needed for the new mechanism to be useful to all modules. Proper fixes have
been discussed on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the
moment, not good candidates for porting to the new mechanism.


New Functions
-------------

A new function and macro will be added to implement module creation.
These are similar to PyModule_Create and PyModule_Create2, except they
take an additional ModuleSpec argument, and handle module definitions with
non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function will be added to run "execution slots" on a module::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Additionally, two helpers will be added for setting the docstring and
methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)


Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names
must be encoded to form the PyModuleExport hook name.

For ASCII module names, the export hook is named
PyModuleExport_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the export hook is named
PyModuleExportU_<encodedname>, where the name is encoded using CPython's
"punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
with hyphens ("-") replaced by underscores ("_").


In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyModuleExport' + suffix

Examples:

=============  ===========================
Module name    Export hook name
=============  ===========================
spam           PyModuleExport_spam
lančmít        PyModuleExportU_lanmt_2sa6t
スパム         PyModuleExportU_zck5b2b
=============  ===========================
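
The table rows can be checked against the encoding rule with a short
script. It repeats the ``export_hook_name`` function from above so that it
runs on its own; the non-ASCII module names are the ones whose Punycode
forms appear in the table.

```python
def export_hook_name(name):
    # ASCII names are used as-is; other names are Punycode-encoded, with
    # hyphens turned into underscores so the result is a valid C identifier.
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyModuleExport' + suffix


hook_ascii = export_hook_name('spam')
hook_czech = export_hook_name('lan\u010dm\u00edt')         # "lančmít"
hook_kana = export_hook_name('\u30b9\u30d1\u30e0')         # "スパム"
```

The ``U_`` prefix keeps the two naming schemes apart, so an ASCII module
name can never collide with an encoded one.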


Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to
have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and
LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module
are too rare to require all module authors to keep reloading in mind.
If reload-like functionality is needed, authors can export a dedicated
function for it.


Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can
export additional PyModuleExport* symbols besides the one that corresponds
to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules,
not to *find* them.

Given the filesystem location of a shared library and a module name,
a module may be loaded with::

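    import importlib.machinery
    import importlib.util

    def load_extension_module(name, path):
        # Build a spec for `name` around the shared library at `path`,
        # then run the create and exec steps by hand.
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module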

On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmoduleexport`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use the old API
indefinitely (or until the old API is removed).

The ``array`` and ``xx*`` modules will be converted to the new API as
part of the initial implementation.


API Changes and Additions
-------------------------

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions

New macros:

* PyMODEXPORT_FUNC
* Py_mod_create
* Py_mod_exec

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.

Some extension modules export many constants; for example _ssl has
a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.

String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)

Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.


Implementation
==============

Work-in-progress implementation is available in a GitHub repository
[#gh-repo]_; a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would be then called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization are broken into distinct steps.
It also did not support loading an extension into pre-existing module
objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At this time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python
modules are initialized, but it was later recognized that this isn't an
important goal.
The current PEP describes a simpler solution.


References
==========

.. [#lazy_import_concerns]
   https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492]
   http://tools.ietf.org/html/rfc3492

.. [#gh-repo]
   https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch]
   https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion]
   https://mail.python.org/pipermail/import-sig/2015-April/000959.html


Copyright
=========

This document has been placed in the public domain.

From encukou at gmail.com  Wed May 13 16:31:26 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 13 May 2015 16:31:26 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <554B8626.8000709@gmail.com>
References: <554B8626.8000709@gmail.com>
Message-ID: <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>

On Thu, May 7, 2015 at 5:35 PM, Petr Viktorin <encukou at gmail.com> wrote:
> Hello!
>
> Based on previous discussions, particularly the lacks of objections to
> repurposing ModuleDef.m_reload, I've sent an updated version of PEP 489
> to the editors. I'm including a copy below.
>
> The implementation is nearly finished, with several things missing:
> - Support for non-Linuxy platforms
> - PyImport_Inittab, see below
> - Documentation
> - porting "xx" and "xxsubtype" modules (but "xxlimited" is done)
[...]
>
>
> Some further thoughts:
>
> The docstring and methods are initialized in the creation step, rather
> than exec. I don't think it's important enough to do this in exec, and
> this way the implementation is easier (with respect to NULL slots, and
> backwards compatibility with PyInit-based modules where Exec is a no-op).
>
> As I was implementing this, I ran into PyImport_Inittab. I'll need to
> add a similar list of PyModuleDefs.

And here I'm somewhat stumped, can someone help me find the right direction?

There's a tool called freeze, which (among other things) generates the
PyImport_Inittab, in the file config.c which looks a bit like this:

extern PyObject* PyInit__thread(void);
extern PyObject* PyInit__signal(void);
[... and so on for the other modules ...]

struct _inittab _PyImport_Inittab[] = {
    {"_thread", PyInit__thread},
    {"_signal", PyInit__signal},
    [... and so on for the other modules ...]
};

This file is generated just from a list of module names, without
loading them. So, it can't easily determine whether a module uses
PyInit_*, or PyModuleExport_*. But it needs to choose the hook name
correctly, otherwise the program will fail to link.
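
To make the constraint concrete, here is a rough sketch of that kind of
generator. It is not the actual freeze code; ``render_inittab`` is a
made-up helper that produces text shaped like the config.c excerpt above
from nothing but module names:

```python
def render_inittab(module_names):
    """Render a config.c-style table from module names alone.

    The hook prefix is hard-coded: with only names to go on, there is no
    way to tell whether a module exports PyInit_* or PyModuleExport_*.
    """
    externs = ["extern PyObject* PyInit_%s(void);" % name
               for name in module_names]
    entries = ['    {"%s", PyInit_%s},' % (name, name)
               for name in module_names]
    lines = externs + ["", "struct _inittab _PyImport_Inittab[] = {"]
    lines += entries + ["};"]
    return "\n".join(lines)


config_c = render_inittab(["_thread", "_signal"])
```

Every name ends up referenced as a PyInit_* symbol, so a module that only
exports PyModuleExport_* would leave an unresolved symbol at link time.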

I can see three solutions for this problem.
I could modify freeze to inspect the modules somehow. I'm wary of
writing platform-specific code for such an edge case, though, and I'm
not sure if freeze always has access to the modules it processes,
rather than just their names.

I could introduce some way to specify which hook is used out of band.
But that's just passing the problem on to users, not solving it.
Also, freeze is pretty minimal and I'm vaguely aware of third-party
tools that do something similar (cx_freeze, py2exe, py2app); I might
need to coordinate with them.

Or, I could keep the "PyInit_*" hook name, and allow it to return
PyModuleDef instead of a module. This is obviously a hack, and would
force me to go back to the drawing board, but considering the
options it seems best to explore this option.
(PyInit_* and PyModuleExport_* signatures are technically compatible,
since a PyModuleDef is a PyObject)

I'd welcome your thoughts.

From ncoghlan at gmail.com  Wed May 13 18:04:24 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 May 2015 02:04:24 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
References: <554B8626.8000709@gmail.com>
 <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
Message-ID: <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>

On 14 May 2015 at 00:31, Petr Viktorin <encukou at gmail.com> wrote:
> Or, I could keep the "PyInit_*" hook name, and allow it to return
> PyModuleDef instead of a module. This is obviously a hack, and would
> force me to get back down to the drawing board, but considering the
> options it seems best to explore this option.
> (PyInit_* and PyModuleExport_* signatures are technically compatible,
> since a PyModuleDef is a PyObject)
>
> I'd welcome your thoughts.

Would it be feasible to go with a model where _PyImport_inittab
continues to be based on the legacy extension module initialisation
system for the time being? That would mean implementing PyInit_* would
remain required rather than optional for 3.5, but lots of folks are
going to want to provide it anyway for compatibility with 3.4 and
earlier.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From encukou at gmail.com  Thu May 14 10:10:51 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 14 May 2015 10:10:51 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>
References: <554B8626.8000709@gmail.com>
 <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
 <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>
Message-ID: <CA+=+wqA7F_PDuGjU8MetTQ4dJrqYo5L8d+rsk-UhHyY1mMX59g@mail.gmail.com>

On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 14 May 2015 at 00:31, Petr Viktorin <encukou at gmail.com> wrote:
>> Or, I could keep the "PyInit_*" hook name, and allow it to return
>> PyModuleDef instead of a module. This is obviously a hack, and would
>> force me to get back down to the drawing board, but considering the
>> options it seems best to explore this option.
>> (PyInit_* and PyModuleExport_* signatures are technically compatible,
>> since a PyModuleDef is a PyObject)
>>
>> I'd welcome your thoughts.
>
> Would it be feasible to go with a model where _PyImport_inittab
> continues to be based on the legacy extension module initialisation
> system for the time being? That would mean implementing PyInit_* would
> remain required rather than optional for 3.5, but lots of folks are
> going to want to provide it anyway for compatibility with 3.4 and
> earlier.

That doesn't really solve the problem, just delays it until we decide
that PyInit_* is really optional.
It would mean you couldn't take advantage of the improvements in PEP
489 (create/exec split and ModuleSpec). You'd just write more
boilerplate for no benefit (except small stuff like non-ASCII module
names).

What might be worse, it would mean that modules would have different
behavior depending on whether they're frozen or not, which would
probably result in subtle bugs you'd only find when creating frozen
binaries.

From ncoghlan at gmail.com  Thu May 14 10:48:45 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 14 May 2015 18:48:45 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <CA+=+wqA7F_PDuGjU8MetTQ4dJrqYo5L8d+rsk-UhHyY1mMX59g@mail.gmail.com>
References: <554B8626.8000709@gmail.com>
 <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
 <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>
 <CA+=+wqA7F_PDuGjU8MetTQ4dJrqYo5L8d+rsk-UhHyY1mMX59g@mail.gmail.com>
Message-ID: <CADiSq7cTmoV6AeAOa8ZaFkKx1q=+W3t2iY-Myvp+fB6-rVcnGg@mail.gmail.com>

On 14 May 2015 at 18:10, Petr Viktorin <encukou at gmail.com> wrote:
> On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 14 May 2015 at 00:31, Petr Viktorin <encukou at gmail.com> wrote:
>>> Or, I could keep the "PyInit_*" hook name, and allow it to return
>>> PyModuleDef instead of a module. This is obviously a hack, and would
>>> force me to get back down to the drawing board, but considering the
>>> options it seems best to explore this option.
>>> (PyInit_* and PyModuleExport_* signatures are technically compatible,
>>> since a PyModuleDef is a PyObject)
>>>
>>> I'd welcome your thoughts.
>>
>> Would it be feasible to go with a model where _PyImport_inittab
>> continues to be based on the legacy extension module initialisation
>> system for the time being? That would mean implementing PyInit_* would
>> remain required rather than optional for 3.5, but lots of folks are
>> going to want to provide it anyway for compatibility with 3.4 and
>> earlier.
>
> That doesn't really solve the problem, just delays it until we decide
> that PyInit_* is really optional.

Yeah, I was seeing if you thought a "buy more time to think about it
further" approach might be viable here. I think you're right that we
need a better answer up front, though.

> It would mean you couldn't take advantage of the improvements in PEP
> 489 (create/exec split and ModuleSpec). You'd just write more
> boilerplate for no benefit (except small stuff like non-ASCII module
> names).
>
> What might be worse, it would mean that modules would have different
> behavior depending on whether they're frozen or not, which would
> probably result in subtle bugs you'd only find when creating frozen
> binaries.

Looking at https://hg.python.org/cpython/file/default/Tools/freeze/makeconfig.py,
I'm thinking your "out-of-band" option may be a reasonable way to go,
with a corresponding tweak to the semantics of
https://docs.python.org/3/c-api/import.html#c._inittab to permit
(initfunc) to be a pointer to a PyInit_* function OR to a
PyModuleExport_* function.

We'd then have to determine which was which at runtime when processing
the inittab internally, by checking whether the result of the call was
a PyModuleDef.

For the inittab generation side, freeze would need to be updated to:

* allow builtin modules to be specifically nominated as "initialised
modules" or "defined modules"
* allow the default handling of builtin modules not nominated as one
or the other to be configured
* for backwards compatibility, builtin modules would be treated as
initialised modules by default

If you had a new module that was export only, you'd get a link time
error looking for the init function that didn't exist if you didn't
explicitly flag it as a "defined module". Similarly, if you switched
the default to be defined modules, you'd get a link time error for a
legacy module that didn't support the new API.

Does that approach sound plausible to you?

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From encukou at gmail.com  Thu May 14 14:38:43 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 14 May 2015 14:38:43 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <CADiSq7cTmoV6AeAOa8ZaFkKx1q=+W3t2iY-Myvp+fB6-rVcnGg@mail.gmail.com>
References: <554B8626.8000709@gmail.com>
 <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
 <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>
 <CA+=+wqA7F_PDuGjU8MetTQ4dJrqYo5L8d+rsk-UhHyY1mMX59g@mail.gmail.com>
 <CADiSq7cTmoV6AeAOa8ZaFkKx1q=+W3t2iY-Myvp+fB6-rVcnGg@mail.gmail.com>
Message-ID: <CA+=+wqBx5DvMSXV2_mxtdJje-9=7076HH79PhmfGoCDisL=w-A@mail.gmail.com>

On Thu, May 14, 2015 at 10:48 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 14 May 2015 at 18:10, Petr Viktorin <encukou at gmail.com> wrote:
>> On Wed, May 13, 2015 at 6:04 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> On 14 May 2015 at 00:31, Petr Viktorin <encukou at gmail.com> wrote:
>>>> Or, I could keep the "PyInit_*" hook name, and allow it to return
>>>> PyModuleDef instead of a module. This is obviously a hack, and would
>>>> force me to get back down to the drawing board, but considering the
>>>> options it seems best to explore this option.
>>>> (PyInit_* and PyModuleExport_* signatures are technically compatible,
>>>> since a PyModuleDef is a PyObject)
>>>>
>>>> I'd welcome your thoughts.
>>>
>>> Would it be feasible to go with a model where _PyImport_inittab
>>> continues to be based on the legacy extension module initialisation
>>> system for the time being? That would mean implementing PyInit_* would
>>> remain required rather than optional for 3.5, but lots of folks are
>>> going to want to provide it anyway for compatibility with 3.4 and
>>> earlier.
>>
>> That doesn't really solve the problem, just delays it until we decide
>> that PyInit_* is really optional.
>
> Yeah, I was seeing if you thought a "buy more time to think about it
> further" approach might be viable here. I think you're right that we
> need a better answer up front, though.
>
>> It would mean you couldn't take advantage of the improvements in PEP
>> 489 (create/exec split and ModuleSpec). You'd just write more
>> boilerplate for no benefit (except small stuff like non-ASCII module
>> names).
>>
>> What might be worse, it would mean that modules would have different
>> behavior depending on whether they're frozen or not, which would
>> probably result in subtle bugs you'd only find when creating frozen
>> binaries.
>
> Looking at https://hg.python.org/cpython/file/default/Tools/freeze/makeconfig.py,
> I'm thinking your "out-of-band" option may be a reasonable way to go,
> with a corresponding tweak to the semantics of
> https://docs.python.org/3/c-api/import.html#c._inittab to permit
> (initfunc) to be a pointer to a PyInit_* function OR to a
> PyModuleExport_* function.
>
> We'd then have to determine which was which at runtime when processing
> the inittab internally, by checking whether or not the result of the
> call was a PyModuleDef or not.

That would work, but I don't see much of an advantage over allowing
PyInit_* itself to return either module or PyModuleDef.

> For the inittab generation side, freeze would need to be updated to:
>
> * allow builtin modules to be specifically nominated as "initialised
> modules" or "defined modules"
> * allow the default handling of builtin modules not nominated as one
> or the other to be configured
> * for backwards compatibility, builtin modules would be treated as
> initialised modules by default
>
> If you had a new module that was export only, you'd get a link time
> error looking for the init function that didn't exist if you didn't
> explicitly flag it as a "defined module". Similarly, if you switched
> the default to be defined modules, you'd get a link time error for a
> legacy module that didn't support the new API.
>
> Does that approach sound plausible to you?

I think the "initialized" vs. "exported" distinction is an
implementation detail of the module, and this would expose it too
much.
According to its README, freeze "[parses] the program (and all its
modules) and scans the generated byte code for IMPORT instructions". I
think py2exe does something similar. The end users of such tools would
need to designate which modules use init vs. export.

Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
but it keeps the details isolated between the module and the import
machinery.
PyModuleDef is a PyObject, so the PyInit signature matches. Just the
PyInit name is a bit misleading :(
I think I have a favorite direction now. (Sorry for asking for
directions and then wanting to ignore them! The discussion is
helpful.)


Somewhat related: any thoughts on the legacy init example code [0]?
You asked for an example like this; is it what you had in mind? If you
compile this with a PEP-489 Python with the stable API, the .so can be
used with older Pythons as well.
I now think it's a bit silly: it would be enough to use #ifdef: define
either PyModuleExport or PyInit, depending on the Python version.
This won't do if you're targeting the stable API, but in that case
you can't use any of the new PEP 489 features anyway, so it's enough
to only define PyInit.
Or is there something I missed?


[0] https://www.python.org/dev/peps/pep-0489/#legacy-init

From ncoghlan at gmail.com  Thu May 14 18:45:54 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 May 2015 02:45:54 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <CA+=+wqBx5DvMSXV2_mxtdJje-9=7076HH79PhmfGoCDisL=w-A@mail.gmail.com>
References: <554B8626.8000709@gmail.com>
 <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
 <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>
 <CA+=+wqA7F_PDuGjU8MetTQ4dJrqYo5L8d+rsk-UhHyY1mMX59g@mail.gmail.com>
 <CADiSq7cTmoV6AeAOa8ZaFkKx1q=+W3t2iY-Myvp+fB6-rVcnGg@mail.gmail.com>
 <CA+=+wqBx5DvMSXV2_mxtdJje-9=7076HH79PhmfGoCDisL=w-A@mail.gmail.com>
Message-ID: <CADiSq7f4BUGU=s8P2zziJHgDFShqWiEjd4KEkNNF6km1qMSXhg@mail.gmail.com>

On 14 May 2015 at 22:38, Petr Viktorin <encukou at gmail.com> wrote:
> I think the "initialized" vs. "exported" distinction is an
> implementation detail of the module, and this would expose it too
> much.
> According to its README, freeze "[parses] the program (and all its
> modules) and scans the generated byte code for IMPORT instructions". I
> think py2exe does something similar. The end users of such tools would
> need to designate which modules use init vs. export.
>
> Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
> but it keeps the details isolated between the module and the import
> machinery.
> PyModuleDef is a PyObject, so the PyInit signature matches. Just the
> PyInit name is a bit misleading :(

Agreed it makes the name of PyInit_* a bit misleading, but also agreed
that it sounds like a good trick for making this work in a way that
can handle _PyImport_inittab appropriately.

In terms of documenting it in a way that lets the hook name still make
sense, perhaps we can refer to returning PyModuleDef as "multi-phase
initialisation"? That is:

- initialise the module definition
- create the module object
- execute the module body

If you *don't* return a module definition, then the import system will
assume single phase initialisation.
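
A minimal sketch of that dispatch, using simplified stand-in types rather
than CPython's real object layout. The names Obj, KIND_MODULE,
KIND_MODULEDEF and classify_init_result are all hypothetical; the sketch
only illustrates inspecting what the init hook returned in order to choose
between single-phase and multi-phase handling.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an object header: every object carries a type tag. */
typedef enum { KIND_MODULE, KIND_MODULEDEF } Kind;
typedef struct { Kind kind; } Obj;

typedef enum { INIT_ERROR, INIT_SINGLE_PHASE, INIT_MULTI_PHASE } InitMode;

/* Both kinds of hook share one signature, as PyInit_* hooks would. */
typedef Obj *(*InitFunc)(void);

static InitMode
classify_init_result(InitFunc init)
{
    assert(init != NULL);
    Obj *result = init();
    if (result == NULL) {
        return INIT_ERROR;          /* the hook reported a failure */
    }
    if (result->kind == KIND_MODULEDEF) {
        return INIT_MULTI_PHASE;    /* create and exec are still to run */
    }
    return INIT_SINGLE_PHASE;       /* the module is already initialised */
}

/* Two example hooks for the sketch. */
static Obj legacy_module = { KIND_MODULE };
static Obj new_style_def = { KIND_MODULEDEF };
static Obj *init_legacy(void) { return &legacy_module; }
static Obj *init_multi(void)  { return &new_style_def; }
```

Because the decision is made from the returned object, the inittab can keep
storing plain PyInit_* pointers without recording which style each module
uses.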

> I think I have a favorite direction now. (Sorry for asking for
> directions and then wanting to ignore them! The discussion is
> helpful.)

I find that seeing a suggestion I don't like often sparks new ideas as
I attempt to figure out why I don't like it :)

> Somewhat related: any thoughts on the legacy init example code [0]?
> You asked for an example like this; is it what you had in mind? If you
> compile this with a PEP-489 Python with the stable API, the .so can be
> used with older Pythons as well.
> I now think it's a bit silly: it would be enough to use #ifdef: define
> either PyModuleExport or PyInit, depending on the Python version.
> This won't do if you're targeting the stable API, but in that case
> you can't use any of the new PEP 489 features anyway, so it's enough
> to only define PyInit.
> Or is there something I missed?

I think the idea above makes it mandatory to use "#ifdef" to request
multi-phase initialisation on 3.5+ and single-phase initialisation on
earlier versions. An example of the relevant incantations might still
be useful though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From encukou at gmail.com  Thu May 14 21:04:35 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 14 May 2015 21:04:35 +0200
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <CADiSq7f4BUGU=s8P2zziJHgDFShqWiEjd4KEkNNF6km1qMSXhg@mail.gmail.com>
References: <554B8626.8000709@gmail.com>
 <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
 <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>
 <CA+=+wqA7F_PDuGjU8MetTQ4dJrqYo5L8d+rsk-UhHyY1mMX59g@mail.gmail.com>
 <CADiSq7cTmoV6AeAOa8ZaFkKx1q=+W3t2iY-Myvp+fB6-rVcnGg@mail.gmail.com>
 <CA+=+wqBx5DvMSXV2_mxtdJje-9=7076HH79PhmfGoCDisL=w-A@mail.gmail.com>
 <CADiSq7f4BUGU=s8P2zziJHgDFShqWiEjd4KEkNNF6km1qMSXhg@mail.gmail.com>
Message-ID: <CA+=+wqBLgiT_YjqSc_R+9YYhAN+_JDAj43kfRyoLqXiSkxt82g@mail.gmail.com>

On Thu, May 14, 2015 at 6:45 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 14 May 2015 at 22:38, Petr Viktorin <encukou at gmail.com> wrote:
>> Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
>> but it keeps the details isolated between the module and the import
>> machinery.
>> PyModuleDef is a PyObject, so the PyInit signature matches. Just the
>> PyInit name is a bit misleading :(
>
> Agreed it makes the name of PyInit_* a bit misleading, but also agreed
> that it sounds like a good trick for making this work in a way that
> can handle _PyImport_inittab appropriately.
>
> In terms of documenting it in a way that lets the hook name still make
> sense, perhaps we can refer to returning PyModuleDef as "multi-phase
> initialisation"? That is:
>
> - initialise the module definition
> - create the module object
> - execute the module body

Yes! That'll even make a much better name for the PEP; currently it
reads like "yet another change".
(I hope I can rename a PEP once submitted?)

>> Somewhat related: any thoughts on the legacy init example code [0]?
>> You asked for an example like this; is it what you had in mind? If you
>> compile this with a PEP-489 Python with the stable API, the .so can be
>> used with older Pythons as well.
>> I now think it's a bit silly: it would be enough to use #ifdef: define
>> either PyModuleExport or PyInit, depending on the Python version.
>> This won't do if you're targeting the stable API, but in that case
>> you can't use any of the new PEP 489 features anyway, so it's enough
>> to only define PyInit.
>> Or is there something I missed?
>
> I think the idea above makes it mandatory to use "#ifdef" to request
> multi-phase initialisation on 3.5+ and single-phase initialisation on
> earlier versions. An example of the relevant incantations might still
> be useful though.

Definitely.

From ncoghlan at gmail.com  Fri May 15 08:10:02 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 15 May 2015 16:10:02 +1000
Subject: [Import-SIG] PEP 489: Redesigning extension module loading;
	version 4
In-Reply-To: <CA+=+wqBLgiT_YjqSc_R+9YYhAN+_JDAj43kfRyoLqXiSkxt82g@mail.gmail.com>
References: <554B8626.8000709@gmail.com>
 <CA+=+wqAkNLMcb+1uRndLe40Csm1bcGqkq9JNCLites4YqDrEXA@mail.gmail.com>
 <CADiSq7friGngVNL-SqaWHE8imCHQROy6jBuTe4xAdSqf7e0wrg@mail.gmail.com>
 <CA+=+wqA7F_PDuGjU8MetTQ4dJrqYo5L8d+rsk-UhHyY1mMX59g@mail.gmail.com>
 <CADiSq7cTmoV6AeAOa8ZaFkKx1q=+W3t2iY-Myvp+fB6-rVcnGg@mail.gmail.com>
 <CA+=+wqBx5DvMSXV2_mxtdJje-9=7076HH79PhmfGoCDisL=w-A@mail.gmail.com>
 <CADiSq7f4BUGU=s8P2zziJHgDFShqWiEjd4KEkNNF6km1qMSXhg@mail.gmail.com>
 <CA+=+wqBLgiT_YjqSc_R+9YYhAN+_JDAj43kfRyoLqXiSkxt82g@mail.gmail.com>
Message-ID: <CADiSq7cMtfhiXXmYRSOfK+kfozSkfbzCt-_okMsvMRL_TkTPZQ@mail.gmail.com>

On 15 May 2015 05:04, "Petr Viktorin" <encukou at gmail.com> wrote:
>
> On Thu, May 14, 2015 at 6:45 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > On 14 May 2015 at 22:38, Petr Viktorin <encukou at gmail.com> wrote:
> >> Allowing PyInit to optionally return PyModuleDef is a bit of a hack,
> >> but it keeps the details isolated between the module and the import
> >> machinery.
> >> PyModuleDef is a PyObject, so the PyInit signature matches. Just the
> >> PyInit name is a bit misleading :(
> >
> > Agreed it makes the name of PyInit_* a bit misleading, but also agreed
> > that it sounds like a good trick for making this work in a way that
> > can handle _PyImport_inittab appropriately.
> >
> > In terms of documenting it in a way that lets the hook name still make
> > sense, perhaps we can refer to returning PyModuleDef as "multi-phase
> > initialisation"? That is:
> >
> > - initialise the module definition
> > - create the module object
> > - execute the module body
>
> Yes! That'll even make a much better name for the PEP; currently it
> reads like "yet another change".
> (I hope I can rename a PEP once submitted?)

Yes, renaming is fine. That's one of the advantages of using PEP numbers in
their permanent URLs, rather than their names.

Cheers,
Nick.

P.S. I think this change makes this PEP another fine example of why
reference implementations are such an important part of the process - they
usually uncover issues and implications that *nobody* had thought of yet :)

From encukou at gmail.com  Mon May 18 16:02:37 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 18 May 2015 16:02:37 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization;
	version 5
Message-ID: <5559F0FD.3080704@gmail.com>

Hello!

I've sent the latest update of PEP 489 to the editors. I am quite happy
with how it turned out, and I don't expect too many further changes.

In this iteration, PyModuleExport is removed, and instead PyInit can
return a PyModuleDef instead of an initialized module. This means you
can again derive the hook name from the module name, which is necessary
for PyImport_Inittab and its supporting code, and the freeze tool.
The mechanism the PEP introduces is now called "multi-phase
initialization", and the PEP is renamed to reflect that. Thanks Nick for
the discussion, and the name!

The new PEP also mentions built-in modules, which will also support
multi-phase init.

Per-module state is now allocated at the beginning of the execute step;
the presence of the state pointer is checked to prevent re-running exec
on reload.

Also, docstrings and methods from the PyModuleDef are always added to
whatever create returns, even if it's not a PyModule (sub)type.


The implementation [0] should be complete and tested now. It is at the
point of needing a second pair of eyes :)
I have made the changes for non-Linux platforms, but I have no way to
test them.
Documentation still remains to be written.

[0] https://github.com/encukou/cpython/compare/master...encukou:pep489.patch


The PEP should be live soon; in the mean time, here is the text:

PEP: 489
Title: Multi-phase extension module initialization
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin <encukou at gmail.com>,
        Stefan Behnel <stefan_ml at behnel.de>,
        Nick Coghlan <ncoghlan at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which built-in and extension
modules interact with the import machinery. This was last revised for
Python 3.0 in PEP 3121, but did not solve all problems at the time. The
goal is to solve them by bringing extension modules closer to the way
Python modules behave; specifically to hook into the ModuleSpec-based
loading mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow
extension authors to only define features they need, and to allow future
additions to extension module declarations.

Extension modules are created in a two-step process, fitting better into
the ModuleSpec architecture, with parallels to __new__ and __init__ of
classes.

Extension modules can safely store arbitrary C-level per-module state in
the module that is covered by normal garbage collection and supports
reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.

The proposal also allows extension modules with non-ASCII names.


Motivation
==========

Python modules and extension modules are not being set up in the same way.
For Python modules, the module object is created and set up first, then the
module code is being executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.

For extensions (i.e. shared libraries) and built-in modules, the module
init function is executed straight away and does both the creation and
initialization. The initialization function is not passed the ModuleSpec,
or any information it contains, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

In Py3, modules are also not being added to sys.modules, which means that a
(potentially transitive) re-import of the module will really try to
re-import it and thus run into an infinite loop when it executes the module
init function again. Without access to the fully-qualified module name, it
is not trivial to correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of "__init__.py" modules, i.e. packages,
especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or interpreter reloading, and,
while it is possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps backwards compatibility, which should reduce pressure and
give extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension and built-in modules export an initialization function
named "PyInit_modulename", named after the file name of the shared library.
This function is executed by the import machinery and must return a fully
initialized module object.
The function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef object. It then continues to initialize it by adding
attributes to the module dict, creating types, etc.

In the back, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the fully qualified name of the module object. This is not entirely safe
as it relies on the module init function creating its own module object
first, but this assumption usually holds in practice.


The proposal
============

The initialization function (PyInit_modulename) will be allowed to return
a pointer to a PyModuleDef object. The import machinery will be in charge
of constructing the module object, calling hooks provided in the PyModuleDef
in the relevant phases of initialization (as described below).

This multi-phase initialization is an additional possibility. Single-phase
initialization, the current practice of returning a fully initialized module
object, will still be accepted, so existing code will work unchanged,
including binary compatibility.

The PyModuleDef structure will be changed to contain a list of slots,
similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure
(which would introduce additional supporting functions and per-module
storage), the currently unused m_reload pointer of PyModuleDef will be
changed to hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of
PyModuleDef_Slot structures, terminated by a slot with id set to 0
(i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided.
New Python versions may introduce new slot IDs, but slot IDs will never be
recycled. Slots may get deprecated, but will continue to be supported
throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the
slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.
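
The slot array rules above can be sketched as a small validation loop. This
is an illustration with stand-in types, not the reference implementation:
the Slot struct only mirrors the shape of PyModuleDef_Slot, the numeric IDs
are assumptions made for the sketch, and a return value of -1 stands in for
raising SystemError.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed slot IDs for this sketch only. */
enum { SLOT_END = 0, SLOT_CREATE = 1, SLOT_EXEC = 2 };

typedef struct {
    int slot;
    void *value;
} Slot;

/* Returns the number of slots before the {0, NULL} terminator, or -1 if
 * an unknown slot ID or a NULL value is found. A NULL array is allowed
 * and counts as zero slots. */
static int
check_slots(const Slot *slots)
{
    int count = 0;
    if (slots == NULL) {
        return 0;
    }
    for (const Slot *s = slots; s->slot != SLOT_END; s++) {
        if (s->slot != SLOT_CREATE && s->slot != SLOT_EXEC) {
            return -1;  /* unknown slot ID */
        }
        if (s->value == NULL) {
            return -1;  /* neither known slot permits a NULL value */
        }
        count++;
    }
    return count;
}

/* Usage: two exec slots are accepted; an unknown ID is rejected. */
static void
demo_check_slots(void)
{
    static int marker;  /* any non-NULL address serves as a value */
    const Slot good[] = {
        {SLOT_EXEC, &marker}, {SLOT_EXEC, &marker}, {SLOT_END, NULL},
    };
    const Slot bad[] = { {99, &marker}, {SLOT_END, NULL} };
    assert(check_slots(good) == 2);
    assert(check_slots(bad) == -1);
    assert(check_slots(NULL) == 0);
}
```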

When using multi-phase initialization, the *m_name* field of PyModuleDef
will not be used during importing; the module name will be taken from the
ModuleSpec.

To prevent crashes when the module is loaded in older versions of Python,
the PyModuleDef object must be initialized using the newly added
PyModuleDef_Init function.
For example, an extension module "example" would be exported as::

    static PyModuleDef example_def = {...};

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModuleDef_Init(&example_def);
    }

The PyModuleDef object must be available for the lifetime of the module
created from it -- usually, it will be declared statically.


Module Creation Phase
---------------------

Creation of the module object -- that is, the implementation of
ExecutionLoader.create_module -- is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451,
and the PyModuleDef structure.
It should return a new module object, or set an error
and return NULL.

This function is not responsible for setting import-related attributes
specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes.
However, only ModuleType instances support module-specific functionality
such as per-module state.

Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively), may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in
particular to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import
will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal
module object using PyModule_New. The name is taken from *spec*.


Post-creation steps
...................

If the Py_mod_create function returns an instance of types.ModuleType
or a subclass (or if a Py_mod_create slot is not present), the import
machinery will associate the PyModuleDef with the module, making it
accessible to PyModule_GetDef, and enabling the m_traverse, m_clear and
m_free hooks.

If the Py_mod_create function does not return a module subclass, then m_size
must be 0, and m_traverse, m_clear and m_free must all be NULL.
Otherwise, SystemError is raised.

Additionally, initial attributes specified in the PyModuleDef are set on the
module object, regardless of its type:

* The docstring is set from m_doc, if non-NULL.
* The module's functions are initialized from m_methods, if any.
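
The restriction on non-module objects can be sketched as a single check.
This is a simplified illustration, not the reference implementation:
DefSketch and check_created_object are hypothetical names, the hook
pointers are reduced to void*, and -1 stands in for raising SystemError.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the relevant PyModuleDef fields. */
typedef struct {
    long  m_size;
    void *m_traverse;
    void *m_clear;
    void *m_free;
} DefSketch;

/* Returns 0 if the definition is acceptable for the object returned by
 * the create step, or -1 where the rule above calls for an error. */
static int
check_created_object(const DefSketch *def, int is_module_instance)
{
    assert(def != NULL);
    if (is_module_instance) {
        return 0;   /* per-module state and GC hooks are supported */
    }
    if (def->m_size != 0 || def->m_traverse != NULL ||
        def->m_clear != NULL || def->m_free != NULL) {
        return -1;  /* a non-module object cannot carry module state */
    }
    return 0;
}
```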


Module Execution Phase
----------------------

Module execution -- that is, the implementation of
ExecutionLoader.exec_module -- is governed by "execution slots".
This PEP only adds one, Py_mod_exec, but others may be added in the future.

Execution slots may be specified multiple times, and are processed in
the order they appear in the slots array.
When using the default import machinery, they are processed after
import-related attributes specified in PEP 451 [#pep-0451-attributes]_
(such as ``__name__`` or ``__loader__``) are set and the module is added
to sys.modules.
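
The ordering guarantee can be pictured with a small Python model. This is
a sketch under stated assumptions rather than the real implementation:
``PY_MOD_EXEC`` and ``run_exec_slots`` are hypothetical names, slots are
modelled as ``(slot_id, function)`` pairs, and stopping at the first failing
slot is an assumption of the model.

```python
import types

# Hypothetical stand-in for the C slot ID (an assumption, not the real value).
PY_MOD_EXEC = "Py_mod_exec"


def run_exec_slots(slots, module):
    """Call every execution slot in array order.

    Each slot function follows the C convention of returning 0 on success
    and -1 on error; the model stops at the first failure (an assumption).
    """
    for slot_id, func in slots:
        if slot_id != PY_MOD_EXEC:
            continue
        if func(module) != 0:
            return -1
    return 0


def add_food(module):
    module.food = "spam"
    return 0


def add_menu(module):
    # Depends on add_food having run first, so slot order matters.
    module.menu = [module.food, "eggs"]
    return 0


module = types.ModuleType("spam")
status = run_exec_slots(
    [(PY_MOD_EXEC, add_food), (PY_MOD_EXEC, add_menu)], module
)
```

Because the second function reads an attribute set by the first, swapping
the two entries would make the model fail, which is why a defined order is
useful.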


Pre-Execution steps
-------------------

Before processing the execution slots, per-module state is allocated for the
module. From this point on, per-module state is accessible through
PyModule_GetState.


The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following
signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to
setting the module's initial attributes.
The "module" argument receives the module object to initialize.

If PyModuleExec replaces the module's entry in sys.modules,
the new object will be used and returned by importlib machinery.
(This mirrors the behavior of Python modules. Note that implementing
Py_mod_create is usually a better solution for the use cases this serves.)
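
For comparison, the Python-module behavior being mirrored can be shown
directly. The sketch below writes a throwaway module to a temporary
directory; the module name ``pep489_replacer_demo`` and the helper
``import_self_replacing_module`` are made up for this illustration.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path

# Source of a Python module that swaps itself out in sys.modules while
# it is being executed.
SOURCE = """\
import sys
import types

sys.modules[__name__] = types.SimpleNamespace(food="spam")
"""


def import_self_replacing_module(name="pep489_replacer_demo"):
    """Import the module above and return whatever the import yields."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, name + ".py").write_text(SOURCE, encoding="utf-8")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module(name)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


result = import_self_replacing_module()
```

The import returns the replacement object rather than the module object
that was originally created and executed.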

The function must return ``0`` on success, or, on error, set an
exception and return ``-1``.


Legacy Init
-----------

The backwards-compatible single-phase initialization continues to be
supported.
In this scheme, the PyInit function returns a fully initialized module
rather than a PyModuleDef object.
In this case, the PyInit hook implements the creation phase, and the
execution phase is a no-op.

Modules that need to work unchanged on older versions of Python should not
use multi-phase initialization, because the benefits it brings can't be
back-ported.
Nevertheless, here is an example of a module that supports multi-phase
initialization, and falls back to single-phase when compiled for an older
version of CPython::

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        /* Propagate failure: 0 on success, -1 with an exception set. */
        return PyModule_AddStringConstant(module, "food", "spam");
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,                                       /* m_reload */
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        return PyModuleDef_Init(&spam_def);
    #else
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }


Built-In modules
----------------

Any extension module can be used as a built-in module by linking it into
the executable, and including it in the inittab (either at runtime with
PyImport_AppendInittab, or at configuration time, using tools like
*freeze*).

To keep this possibility, all changes to extension module loading introduced
in this PEP will also apply to built-in modules.
The only exception is non-ASCII module names, explained below.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object's storage reachable by PyModule_GetState.
A simple rule of thumb is: Do not define any static data, except
built-in types with no mutable or user-settable class attributes.


Functions incompatible with multi-phase initialization
------------------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure
with a non-NULL *m_slots* pointer.
The function doesn't have access to the ModuleSpec object necessary for
multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule
and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*.
PyState registration is disabled because multiple module objects may be
created from the same PyModuleDef.
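
A pure-Python analogy (not extension-module code) may help show why a
registry keyed by the definition cannot work once one definition can yield
several modules. ``CounterLoader`` and the module name ``counter_demo`` are
hypothetical; the loader plays the role of the shared definition.

```python
import importlib.machinery
import importlib.util


class CounterLoader:
    """Hypothetical loader whose modules keep state on the module object."""

    def create_module(self, spec):
        return None  # use the default module type

    def exec_module(self, module):
        module.counter = 0


loader = CounterLoader()
spec = importlib.machinery.ModuleSpec("counter_demo", loader)

# Two distinct module objects built from the same spec and loader.
first = importlib.util.module_from_spec(spec)
loader.exec_module(first)
second = importlib.util.module_from_spec(spec)
loader.exec_module(second)

# State lives on each module object, so the two do not interfere.
first.counter += 1
```

Asking "which module belongs to this definition?" has two equally valid
answers here, so each caller needs a reference to the specific module it
works with.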


Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs
access to module-level state (including functions, classes or exceptions
defined at the module level) must receive a reference to the module object
(or the particular object it needs), either directly or indirectly.
This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to
  the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom
  data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be
needed for the new mechanism to be useful to all modules. Proper fixes
have been discussed on the import-sig mailing list
[#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the
moment, not good candidates for porting to the new mechanism.


New Functions
-------------

A new function and macro implementing the module creation phase will be
added.
These are similar to PyModule_Create and PyModule_Create2, except they
take an additional ModuleSpec argument, and handle module definitions with
non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function implementing the module execution phase will be added.
This allocates per-module state (if not allocated already), and *always*
processes execution slots. The import machinery calls this function when
a module is executed, unless the module is being reloaded::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object.
This idempotent function fills in the type, refcount, and module index.
It returns its argument cast to PyObject*, so it can be returned directly
from a PyInit function::

    PyObject * PyModuleDef_Init(PyModuleDef *);

Additionally, two helpers will be added for setting the docstring and
methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)


Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names
must be encoded to form the PyInit hook name.

For ASCII module names, the import hook is named
PyInit_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named
PyInitU_<encodedname>, where the name is encoded using CPython's
"punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
with hyphens ("-") replaced by underscores ("_").


In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix

Examples:

=============  ===================
Module name    Init hook name
=============  ===================
spam           PyInit_spam
lančmít        PyInitU_lanmt_2sa6t
スパム         PyInitU_zck5b2b
=============  ===================
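
As a quick check, the function can be run against the example names. The
block repeats ``export_hook_name`` so that it is self-contained; the
non-ASCII inputs are taken to be ``lančmít`` and ``スパム``, which is what
the hook names in the table decode to. Note that the function returns
``bytes``.

```python
def export_hook_name(name):
    """Return the PyInit hook symbol name for a module, as bytes."""
    try:
        # ASCII names are used verbatim after "PyInit_".
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        # Non-ASCII names are punycode-encoded, with "-" mapped to "_"
        # so that the result is a valid C identifier.
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix


hook_names = {
    name: export_hook_name(name)
    for name in ("spam", "lančmít", "スパム")
}
```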

For modules with non-ASCII names, single-phase initialization is not
supported.

In the initial implementation of this PEP, built-in modules with non-ASCII
names will not be supported.


Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to
have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and
LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module
are too rare to require all module authors to keep reloading in mind.
If reload-like functionality is needed, authors can export a dedicated
function for it.


Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can
export additional PyInit* symbols besides the one that corresponds
to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules,
but not to *find* them.

Given the filesystem location of a shared library and a module name,
a module may be loaded with::

    import importlib.machinery
    import importlib.util

    def load_extension_module(name, path):
        # "name" selects the PyInit* hook; "path" is the shared library.
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module

On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmultiphase`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use single-phase
initialization indefinitely (or until it is no longer supported).

The ``array`` and ``xx*`` modules will be converted to use multi-phase
initialization as part of the initial implementation.


Summary of API Changes and Additions
------------------------------------

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions
* PyModuleDef_Init

New macros:

* Py_mod_create
* Py_mod_exec

New types:

* PyModuleDef_Type will be exposed

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.

Some extension modules export many constants; for example _ssl has
a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.

String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)

Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.


Implementation
==============

A work-in-progress implementation is available in a GitHub repository
[#gh-repo]_; a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would be then called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization is broken into distinct steps.
It also did not support loading an extension into pre-existing module
objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At that time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python
modules are initialized, but it was later recognized that this isn't an
important goal.
The current PEP describes a simpler solution.

A further iteration used a "PyModuleExport" hook as an alternative to
PyInit, where PyInit was used for the existing scheme, and PyModuleExport
for multi-phase.
However, not being able to determine the hook name based on module name
complicated automatic generation of PyImport_Inittab by tools like freeze.
Keeping only the PyInit hook name, even if it's not entirely appropriate for
exporting a definition, yielded a much simpler solution.


References
==========

.. [#lazy_import_concerns]
   https://mail.python.org/pipermail/python-dev/2013-August/128129.html

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492]
   http://tools.ietf.org/html/rfc3492

.. [#gh-repo]
   https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch]
   https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion]
   https://mail.python.org/pipermail/import-sig/2015-April/000959.html


Copyright
=========

This document has been placed in the public domain.

From solipsis at pitrou.net  Mon May 18 16:51:03 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 18 May 2015 16:51:03 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
	initialization; version 5
References: <5559F0FD.3080704@gmail.com>
Message-ID: <20150518165103.34c9ed20@fsol>


Hi,

On Mon, 18 May 2015 16:02:37 +0200
Petr Viktorin <encukou at gmail.com> wrote:
> 
> I've sent the latest update of PEP 489 to the editors. I am quite happy
> with how it turned out, and I don't expect too many further changes.

I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be
discouraged in favour of custom module object fields?

Regards

Antoine.


From encukou at gmail.com  Mon May 18 17:07:20 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 18 May 2015 17:07:20 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <20150518165103.34c9ed20@fsol>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
Message-ID: <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>

On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hi,
>
> On Mon, 18 May 2015 16:02:37 +0200
> Petr Viktorin <encukou at gmail.com> wrote:
>>
>> I've sent the latest update of PEP 489 to the editors. I am quite happy
>> with how it turned out, and I don't expect too many further changes.
>
> I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be
> discouraged in favour of custom module object fields?

No, it's the other way around -- we want to discourage using custom
module subclasses; most modules should just customize the exec phase.

From solipsis at pitrou.net  Mon May 18 17:15:07 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 18 May 2015 17:15:07 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
	initialization; version 5
In-Reply-To: <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
Message-ID: <20150518171507.6f711718@fsol>

On Mon, 18 May 2015 17:07:20 +0200
Petr Viktorin <encukou at gmail.com> wrote:
> On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >
> > Hi,
> >
> > On Mon, 18 May 2015 16:02:37 +0200
> > Petr Viktorin <encukou at gmail.com> wrote:
> >>
> >> I've sent the latest update of PEP 489 to the editors. I am quite happy
> >> with how it turned out, and I don't expect too many further changes.
> >
> > I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be
> > discouraged in favour of custom module object fields?
> 
> No, it's the other way around -- we want to discourage using custom
> module subclasses; most modules should just customize the exec phase.

Can you explain why? The module state mechanism has turned out to be
cumbersome and inefficient, and is the main reason why PEP 3121
conversions of many stdlib modules have been deferred or abandoned.

A fast, easy way to access module "state" without defining global
variables at the C level is required.

Regards

Antoine.

From encukou at gmail.com  Mon May 18 17:32:13 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 18 May 2015 17:32:13 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <20150518171507.6f711718@fsol>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
Message-ID: <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>

On Mon, May 18, 2015 at 5:15 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 18 May 2015 17:07:20 +0200
> Petr Viktorin <encukou at gmail.com> wrote:
>> On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> >
>> > Hi,
>> >
>> > On Mon, 18 May 2015 16:02:37 +0200
>> > Petr Viktorin <encukou at gmail.com> wrote:
>> >>
>> >> I've sent the latest update of PEP 489 to the editors. I am quite happy
>> >> with how it turned out, and I don't expect too many further changes.
>> >
>> > I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be
>> > discouraged in favour of custom module object fields?
>>
>> No, it's the other way around -- we want to discourage using custom
>> module subclasses; most modules should just customize the exec phase.
>
> Can you explain why? The module state mechanism has turned out to be
> cumbersome and inefficient, and is the main reason why PEP 3121
> conversions of many stdlib modules have been deferred or abandoned.

One reason against custom module subclasses is that it won't be easy
to support "python -m" for them (see
https://mail.python.org/pipermail/import-sig/2015-March/000923.html)
Nick, can you give some others? Preferring real module objects is
something I remember from our early discussions.

> A fast, easy way to access module "state" without defining global
> variables at the C level is required.

You can have a custom subclass, or you can use per-module state, or
put a capsule in the module dict.
This PEP doesn't add a fourth better way, but I don't think that's
really in its scope ("The goal is [...] bringing extension modules
closer to the way Python modules behave"). What it does do, with
slots, is provide a mechanism to add such a better way in the future,
relatively painlessly.

From ncoghlan at gmail.com  Mon May 18 17:55:19 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 19 May 2015 01:55:19 +1000
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
Message-ID: <CADiSq7dA=aiTMjwMdmOFDSp6BpG04FFfSbpOVCzmSitqGDjXEg@mail.gmail.com>

On 19 May 2015 01:32, "Petr Viktorin" <encukou at gmail.com> wrote:
>
> On Mon, May 18, 2015 at 5:15 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > On Mon, 18 May 2015 17:07:20 +0200
> > Petr Viktorin <encukou at gmail.com> wrote:
> >> On Mon, May 18, 2015 at 4:51 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> >> >
> >> > Hi,
> >> >
> >> > On Mon, 18 May 2015 16:02:37 +0200
> >> > Petr Viktorin <encukou at gmail.com> wrote:
> >> >>
> >> >> I've sent the latest update of PEP 489 to the editors. I am quite happy
> >> >> with how it turned out, and I don't expect too many further changes.
> >> >
> >> > I'm surprised the PEP still mentions PyModule_GetState. Shouldn't it be
> >> > discouraged in favour of custom module object fields?
> >>
> >> No, it's the other way around -- we want to discourage using custom
> >> module subclasses; most modules should just customize the exec phase.
> >
> > Can you explain why? The module state mechanism has turned out to be
> > cumbersome and inefficient, and is the main reason why PEP 3121
> > conversions of many stdlib modules have been deferred or abandoned.
>
> One reason against custom module subclasses is that it won't be easy
> to support "python -m" for them (see
> https://mail.python.org/pipermail/import-sig/2015-March/000923.html)
> Nick, can you give some others? Preferring real module objects is
> something I remember from our early discussions.

I thought you talked me out of that somewhere along the line? My
recollection at this point is that I was originally wanting the use of the
Create slot to be compatible with runpy, but didn't actually have a
compelling reason for why we should accept that as a design constraint.

> > A fast, easy way to access module "state" without defining global
> > variables at the C level is required.
>
> You can have a custom subclass, or you can use per-module state, or
> put a capsule in the module dict.
> This PEP doesn't add a fourth better way, but I don't think that's
> really in its scope ("The goal is [...] bringing extension modules
> closer to the way Python modules behave"). What it does do, with
> slots, is provide a mechanism to add such a better way in the future,
> relatively painlessly.

Right, I think there's still a problem worth solving there, but I don't
think this specific PEP needs to solve it directly.

Cheers,
Nick.

From solipsis at pitrou.net  Mon May 18 17:58:03 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 18 May 2015 17:58:03 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
	initialization; version 5
In-Reply-To: <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
Message-ID: <20150518175803.03a1e0cf@fsol>

On Mon, 18 May 2015 17:32:13 +0200
Petr Viktorin <encukou at gmail.com> wrote:
> 
> > A fast, easy way to access module "state" without defining global
> > variables at the C level is required.
> 
> You can have a custom subclass, or you can use per-module state, or
> put a capsule in the module dict.

The latter two are cumbersome and inefficient. Only custom subclasses
can make things easy and fast at the C level. Which is why I'm
surprised that you seem to be encouraging, or not discouraging, the
"module state" API.

Regards

Antoine.

From encukou at gmail.com  Mon May 18 18:27:50 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 18 May 2015 18:27:50 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <20150518175803.03a1e0cf@fsol>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
 <20150518175803.03a1e0cf@fsol>
Message-ID: <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>

On Mon, May 18, 2015 at 5:58 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 18 May 2015 17:32:13 +0200
> Petr Viktorin <encukou at gmail.com> wrote:
>>
>> > A fast, easy way to access module "state" without defining global
>> > variables at the C level is required.
>>
>> You can have a custom subclass, or you can use per-module state, or
>> put a capsule in the module dict.
>
> The latter two are cumbersome and inefficient. Only custom subclasses
> can make things easy and fast at the C level.

With per-module state, you need a one-liner macro, and a pointer
dereference at runtime. Is that too cumbersome and inefficient, or am
I missing something?

The PEP still supports custom subclasses, for cases where you need
easy and fast module state.

> Which is why I'm surprised that you seem to be encouraging, or not
> discouraging, the "module state" API.

No, I'm not discouraging it. The PEP makes sure it continues to work.
Should there be another PEP to deprecate it?

From solipsis at pitrou.net  Mon May 18 18:57:13 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 18 May 2015 18:57:13 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
	initialization; version 5
In-Reply-To: <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
 <20150518175803.03a1e0cf@fsol>
 <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>
Message-ID: <20150518185713.0c07a4c8@fsol>

On Mon, 18 May 2015 18:27:50 +0200
Petr Viktorin <encukou at gmail.com> wrote:
> On Mon, May 18, 2015 at 5:58 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> > On Mon, 18 May 2015 17:32:13 +0200
> > Petr Viktorin <encukou at gmail.com> wrote:
> >>
> >> > A fast, easy way to access module "state" without defining global
> >> > variables at the C level is required.
> >>
> >> You can have a custom subclass, or you can use per-module state, or
> >> put a capsule in the module dict.
> >
> > The latter two are cumbersome and inefficient. Only custom subclasses
> > can make things easy and fast at the C level.
> 
> With per-module state, you need a one-liner macro, and a pointer
> dereference at runtime. Is that too cumbersome and inefficient, or am
> I missing something?

The main problem is the PyState_FindModule() function. It's not
terribly efficient, and most of all you have to check its return value
for NULL.

Regards

Antoine.

From encukou at gmail.com  Mon May 18 19:06:53 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 18 May 2015 19:06:53 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <20150518185713.0c07a4c8@fsol>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
 <20150518175803.03a1e0cf@fsol>
 <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>
 <20150518185713.0c07a4c8@fsol>
Message-ID: <CA+=+wqAP4JSoDFMGQL1o4sF6bBsebhFXkMVt1i1aY7BwtEneFw@mail.gmail.com>

On Mon, May 18, 2015 at 6:57 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 18 May 2015 18:27:50 +0200
> Petr Viktorin <encukou at gmail.com> wrote:
>> On Mon, May 18, 2015 at 5:58 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> > On Mon, 18 May 2015 17:32:13 +0200
>> > Petr Viktorin <encukou at gmail.com> wrote:
>> >>
>> >> > A fast, easy way to access module "state" without defining global
>> >> > variables at the C level is required.
>> >>
>> >> You can have a custom subclass, or you can use per-module state, or
>> >> put a capsule in the module dict.
>> >
>> > The latter two are cumbersome and inefficient. Only custom subclasses
>> > can make things easy and fast at the C level.
>>
>> With per-module state, you need a one-liner macro, and a pointer
>> dereference at runtime. Is that too cumbersome and inefficient, or am
>> I missing something?
>
> The main problem is the PyState_FindModule() function. It's not
> terribly efficient, and most of all you have to check its return value
> for NULL.

Ah, but that one is orthogonal to per-module state. The
PyState_FindModule is concerned with finding "the" module
corresponding to a given PyModuleDef in a given interpreter.
The problem it attempts to solve is that the module can't easily be
passed around to all the places that need it. You'd actually have the
exact same problem with a custom subclass -- it's finding the module
instance that's the problem, not getting data from it.

The PEP actually discourages PyState_FindModule quite strongly: this
family of functions just doesn't work with modules initialized using
multi-phase init. The PEP tells you that if you need
PyState_FindModule, we're sorry, and you should stick to the old way
of doing things until we solve the problem (and then it links to
preliminary discussion about the solution, which is out of its scope).

https://www.python.org/dev/peps/pep-0489/#functions-incompatible-with-multi-phase-initialization
https://www.python.org/dev/peps/pep-0489/#module-state-and-c-level-callbacks

From solipsis at pitrou.net  Mon May 18 19:17:21 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 18 May 2015 19:17:21 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
	initialization; version 5
In-Reply-To: <CA+=+wqAP4JSoDFMGQL1o4sF6bBsebhFXkMVt1i1aY7BwtEneFw@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
 <20150518175803.03a1e0cf@fsol>
 <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>
 <20150518185713.0c07a4c8@fsol>
 <CA+=+wqAP4JSoDFMGQL1o4sF6bBsebhFXkMVt1i1aY7BwtEneFw@mail.gmail.com>
Message-ID: <20150518191721.7bbde78f@fsol>

On Mon, 18 May 2015 19:06:53 +0200
Petr Viktorin <encukou at gmail.com> wrote:
> >
> > The main problem is the PyState_FindModule() function. It's not
> > terribly efficient, and most of all you have to check its return value
> > for NULL.
> 
> Ah, but that one is orthogonal to per-module state. The
> PyState_FindModule is concerned with finding "the" module
> corresponding to a given PyModuleDef in a given interpreter.
> The problem it attempts to solve is that the module can't easily be
> passed around to all the places that need it. You'd actually have the
> exact same problem with a custom subclass -- it's finding the module
> instance that's the problem, not getting data from it.

That's a fair point. But it means the PEP won't help those stdlib
modules which haven't been converted to PEP 3121, then.

Regards

Antoine.

From encukou at gmail.com  Mon May 18 19:35:57 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 18 May 2015 19:35:57 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <20150518191721.7bbde78f@fsol>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
 <20150518175803.03a1e0cf@fsol>
 <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>
 <20150518185713.0c07a4c8@fsol>
 <CA+=+wqAP4JSoDFMGQL1o4sF6bBsebhFXkMVt1i1aY7BwtEneFw@mail.gmail.com>
 <20150518191721.7bbde78f@fsol>
Message-ID: <CA+=+wqBsORsBGfJg9ggcB2ukqF9aW-d_G9-z9uEy_scSxyJYOA@mail.gmail.com>

On Mon, May 18, 2015 at 7:17 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 18 May 2015 19:06:53 +0200
> Petr Viktorin <encukou at gmail.com> wrote:
>> >
>> > The main problem is the PyState_FindModule() function. It's not
>> > terribly efficient, and most of all you have to check its return value
>> > for NULL.
>>
>> Ah, but that one is orthogonal to per-module state. The
>> PyState_FindModule is concerned with finding "the" module
>> corresponding to a given PyModuleDef in a given interpreter.
>> The problem it attempts to solve is that the module can't easily be
>> passed around to all the places that need it. You'd actually have the
>> exact same problem with a custom subclass -- it's finding the module
>> instance that's the problem, not getting data from it.
>
> That's a fair point. But it means the PEP won't help those stdlib
> modules which haven't been converted to PEP 3121, then.

Correct. This is not the PEP you're looking for.

Originally we did want to solve this problem, and I guess wording that
suggests it's solved might still be around. Is that the case? Should I
clarify that the problem is not yet solved?
As the author, it's easy for me to lose track of the big picture.

From solipsis at pitrou.net  Mon May 18 19:42:02 2015
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 18 May 2015 19:42:02 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
	initialization; version 5
In-Reply-To: <CA+=+wqBsORsBGfJg9ggcB2ukqF9aW-d_G9-z9uEy_scSxyJYOA@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
 <20150518175803.03a1e0cf@fsol>
 <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>
 <20150518185713.0c07a4c8@fsol>
 <CA+=+wqAP4JSoDFMGQL1o4sF6bBsebhFXkMVt1i1aY7BwtEneFw@mail.gmail.com>
 <20150518191721.7bbde78f@fsol>
 <CA+=+wqBsORsBGfJg9ggcB2ukqF9aW-d_G9-z9uEy_scSxyJYOA@mail.gmail.com>
Message-ID: <20150518194202.394abe0a@fsol>

On Mon, 18 May 2015 19:35:57 +0200
Petr Viktorin <encukou at gmail.com> wrote:
> 
> Correct. This is not the PEP you're looking for.
> 
> Originally we did want to solve this problem, and I guess wording that
> suggests it's solved might still be around. Is that the case? Should I
> clarify that the problem is not yet solved?

The following wording in the PEP:

"""This PEP proposes a redesign of the way in which built-in and
extension modules interact with the import machinery. This was last
revised for Python 3.0 in PEP 3121, but did not solve all problems at
the time. The goal is to solve them by bringing extension modules
closer to the way Python modules behave; specifically to hook into the
ModuleSpec-based loading mechanism introduced in PEP 451."""

suggests that it will indeed help overcome the issues with PEP 3121. It
turns out it doesn't, except in one specific case (i.e. Cython).

Regards

Antoine.

From encukou at gmail.com  Mon May 18 19:49:43 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Mon, 18 May 2015 19:49:43 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <20150518194202.394abe0a@fsol>
References: <5559F0FD.3080704@gmail.com> <20150518165103.34c9ed20@fsol>
 <CA+=+wqCKCNs--v_LzeNRrgVnsNwMgRAQUubnW9D4YY8iqQxy3w@mail.gmail.com>
 <20150518171507.6f711718@fsol>
 <CA+=+wqBLXkvq1u5T9jb+2WwSiqVAc9shyAXemm6yY-K+J8CxPQ@mail.gmail.com>
 <20150518175803.03a1e0cf@fsol>
 <CA+=+wqCs1m7GwiDcbmTAgKSM2qLq-+piJndPU9Sd-FVRO-Fbww@mail.gmail.com>
 <20150518185713.0c07a4c8@fsol>
 <CA+=+wqAP4JSoDFMGQL1o4sF6bBsebhFXkMVt1i1aY7BwtEneFw@mail.gmail.com>
 <20150518191721.7bbde78f@fsol>
 <CA+=+wqBsORsBGfJg9ggcB2ukqF9aW-d_G9-z9uEy_scSxyJYOA@mail.gmail.com>
 <20150518194202.394abe0a@fsol>
Message-ID: <CA+=+wqDuLSb7C_YRZ2f16zH6F1sQXomuOi5R7gzSuwT4Q5g==A@mail.gmail.com>

On Mon, May 18, 2015 at 7:42 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 18 May 2015 19:35:57 +0200
> Petr Viktorin <encukou at gmail.com> wrote:
>>
>> Correct. This is not the PEP you're looking for.
>>
>> Originally we did want to solve this problem, and I guess wording that
>> suggests it's solved might still be around. Is that the case? Should I
>> clarify that the problem is not yet solved?
>
> The following wording in the PEP:
>
> """This PEP proposes a redesign of the way in which built-in and
> extension modules interact with the import machinery. This was last
> revised for Python 3.0 in PEP 3121, but did not solve all problems at
> the time. The goal is to solve them by bringing extension modules
> closer to the way Python modules behave; specifically to hook into the
> ModuleSpec-based loading mechanism introduced in PEP 451."""
>
> suggests that it will indeed help overcome the issues with PEP 3121. It
> turns out it doesn't, except in one specific case (i.e. Cython).

Ah, the abstract. My eyes must have glazed over, and I didn't expand
"PEP 3121" when re-reading it. I'll reword this.

Thanks for noticing, and sorry for the confusion!

From ericsnowcurrently at gmail.com  Tue May 19 02:07:57 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Mon, 18 May 2015 18:07:57 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <5559F0FD.3080704@gmail.com>
References: <5559F0FD.3080704@gmail.com>
Message-ID: <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>

Thanks for working on this, Petr (et al.).  Sorry I've missed the
previous discussion.  Comments are in-line.

-eric

On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin <encukou at gmail.com> wrote:
> [snip]
>
> Furthermore, the majority of currently existing extension modules has
> problems with sub-interpreter support and/or interpreter reloading, and,
> while it is possible with the current infrastructure to support these
> features, it is neither easy nor efficient.
> Addressing these issues was the goal of PEP 3121, but many extensions,
> including some in the standard library, took the least-effort approach
> to porting to Python 3, leaving these issues unresolved.
> This PEP keeps backwards compatibility, which should reduce pressure and
> give extension authors adequate time to consider these issues when porting.

So just to be sure I understand, now PyModuleDef.m_slots will
unambiguously indicate whether or not an extension module is
compliant, right?

> [snip]
>
> The proposal
> ============

This section should include an indication of how the loader (and
perhaps finder) will change for builtin, frozen, and extension
modules.  It may help to describe the proposal up front by how the
loader implementation would look if it were somehow implemented in
Python code.  The subsequent sections sometimes indicate where
different things take place, but an explicit outline (as Python code)
would make the entire flow really obvious.  Putting that toward the
beginning of this section would help clearly set the stage for the
rest of the proposal.

> [snip]
> Unknown slot IDs will cause the import to fail with SystemError.

Was there any consideration made for just ignoring unknown slot IDs?
My gut reaction is that you have it the right way, but I can still
imagine use cases for custom slots that PyModuleDef_Init wouldn't know
about.

>
> When using multi-phase initialization, the *m_name* field of PyModuleDef
> will not be used during importing; the module name will be taken from the
> ModuleSpec.

So m_name will be strictly ignored by PyModuleDef_Init?

>
> To prevent crashes when the module is loaded in older versions of Python,
> the PyModuleDef object must be initialized using the newly added
> PyModuleDef_Init function.
> For example, an extension module "example" would be exported as::
>
>     static PyModuleDef example_def = {...};
>
>     PyMODINIT_FUNC
>     PyInit_example(void)
>     {
>         return PyModuleDef_Init(&example_def);
>     }

This example is helpful. :)

>
> The PyModuleDef object must be available for the lifetime of the module
> created from it -- usually, it will be declared statically.

How easily will this be a source of mysterious errors-at-a-distance?

> [snip]
> However, only ModuleType instances support module-specific functionality
> such as per-module state.

This is a pretty important point.  Presumably this constrains later
behavior and precedes all functionality related to per-module state.

> [snip]
> Extension authors are advised to keep Py_mod_create minimal, and in
> particular to not call user code from it.

This is a pretty important point as well.  We'll need to make sure
this is sufficiently clear in the documentation.  Would it make sense
to provide helpers for common cases, to encourage extension authors to
keep the create function minimal?

> [snip]
>
> If PyModuleExec replaces the module's entry in sys.modules,
> the new object will be used and returned by importlib machinery.

Just to be sure, something like "mod = sys.modules[modname]" is done
before each execution slot.  In other words, the result of the
previous execution slot should be used for the next one.

> (This mirrors the behavior of Python modules. Note that implementing
> Py_mod_create is usually a better solution for the use cases this serves.)

Could you elaborate?  What are those use cases and why would
Py_mod_create be better?

> [snip]
>
> Modules that need to work unchanged on older versions of Python should not
> use multi-phase initialization, because the benefits it brings can't be
> back-ported.

Given your example below, "should not" seems a bit strong to me.  In
fact, what are the objections to encouraging the approach from the
example?

> Nevertheless, here is an example of a module that supports multi-phase
> initialization, and falls back to single-phase when compiled for an older
> version of CPython::
>
>     #include <Python.h>
>
>     static int spam_exec(PyObject *module) {
>         /* Returns 0 on success, -1 (with an exception set) on failure. */
>         return PyModule_AddStringConstant(module, "food", "spam");
>     }
>
>     #ifdef Py_mod_exec
>     static PyModuleDef_Slot spam_slots[] = {
>         {Py_mod_exec, spam_exec},
>         {0, NULL}
>     };
>     #endif
>
>     static PyModuleDef spam_def = {
>         PyModuleDef_HEAD_INIT,                      /* m_base */
>         "spam",                                     /* m_name */
>         PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
>         0,                                          /* m_size */
>         NULL,                                       /* m_methods */
>     #ifdef Py_mod_exec
>         spam_slots,                                 /* m_slots */
>     #else
>         NULL,
>     #endif
>         NULL,                                       /* m_traverse */
>         NULL,                                       /* m_clear */
>         NULL,                                       /* m_free */
>     };
>
>     PyMODINIT_FUNC
>     PyInit_spam(void) {
>     #ifdef Py_mod_exec
>         return PyModuleDef_Init(&spam_def);
>     #else
>         PyObject *module;
>         module = PyModule_Create(&spam_def);
>         if (module == NULL) return NULL;
>         if (spam_exec(module) != 0) {
>             Py_DECREF(module);
>             return NULL;
>         }
>         return module;
>     #endif
>     }
>

This example is really helpful!

> [snip]
>
> Subinterpreters and Interpreter Reloading
> -----------------------------------------
>
> Extensions using the new initialization scheme are expected to support
> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.

Presumably this support is explicitly and completely defined in the
subsequent sentences.  Is it really just keeping "hidden" module state
encapsulated on the module object?  If not then it may make sense to
enumerate the requirements better for the sake of extension module
authors.

> The mechanism is designed to make this easy, but care is still required
> on the part of the extension author.
> No user-defined functions, methods, or instances may leak to different
> interpreters.
> To achieve this, all module-level state should be kept in either the module
> dict, or in the module object's storage reachable by PyModule_GetState.

Is this programmatically enforceable?  Is there any mechanism for
easily copying module state?  How about sharing some state between
subinterpreters?  How much room is there for letting extension module
authors define how their module behaves across multiple interpreters
or across multiple Initialize/Finalize cycles?

> A simple rule of thumb is: Do not define any static data, except
> built-in types with no mutable or user-settable class attributes.

This is another one of those points that needs to be crystal clear in the docs.

> As a rule of thumb, modules that rely on PyState_FindModule are, at the
> moment, not good candidates for porting to the new mechanism.

Are there any plans for a follow-up effort to help with this case?

> [snip]
>
> Module Reloading
> ----------------
>
> Reloading an extension module using importlib.reload() will continue to
> have no effect, except re-setting import-related attributes.
>
> Due to limitations in shared library loading (both dlopen on POSIX and
> LoadLibraryEx on Windows), it is not generally possible to load
> a modified library after it has changed on disk.
>
> Use cases for reloading other than trying out a new version of the module
> are too rare to require all module authors to keep reloading in mind.
> If reload-like functionality is needed, authors can export a dedicated
> function for it.

Keep in mind the semantics of reload for pure Python modules.  The
module is executed into the existing namespace, overwriting the loaded
namespace but leaving non-colliding attributes alone.  While the
semantics for reloading an extension/builtin/frozen module are
currently basic (i.e. a no-op), there may well be room to support
reload behavior that mirrors that of pure Python modules without
needing to reload an SO file.  I would expect either the behavior of
exec to get repeated (tricky due to "hidden" module state?) or for
there to be a "reload" slot that would mirror Py_mod_exec.
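
To make the pure Python semantics concrete, here is a minimal sketch
(the module name reload_demo is made up, and the module is written to a
temporary directory purely for the demonstration).  It shows a
colliding name being overwritten by the reload while a non-colliding
attribute survives:

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway pure Python module (the name reload_demo is arbitrary).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "reload_demo.py"), "w") as f:
    f.write("value = 1\n")

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
try:
    reload_demo = importlib.import_module("reload_demo")

    # Mutate the loaded namespace: one colliding name, one new name.
    reload_demo.value = 99
    reload_demo.extra = "added after import"

    # reload() re-executes the source into the *existing* module namespace.
    reloaded = importlib.reload(reload_demo)
finally:
    sys.path.remove(tmpdir)
```

After the reload, "value" is back to 1, "extra" is still present, and
the module object is the same one as before.  An extension-module
equivalent would have to decide what happens to any "hidden" state in
the same situation.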

At the same time, one may argue that reloading modules is not
something to encourage. :)

>
>
> Multiple modules in one library
> -------------------------------
>
> To support multiple Python modules in one shared library, the library can
> export additional PyInit* symbols besides the one that corresponds
> to the library's filename.
>
> Note that this mechanism can currently only be used to *load* extra modules,
> but not to *find* them.

What do you mean by "currently"?

It may also be worth tying the above statement with the following
text, since the following appears to be an explanation of how to
address the "finder" caveat.

>
> Given the filesystem location of a shared library and a module name,
> a module may be loaded with::
>
>     import importlib.machinery
>     import importlib.util
>     loader = importlib.machinery.ExtensionFileLoader(name, path)
>     spec = importlib.util.spec_from_loader(name, loader)
>     module = importlib.util.module_from_spec(spec)
>     loader.exec_module(module)
>     return module
>
> On platforms that support symbolic links, these may be used to install one
> library under multiple names, exposing all exported modules to normal
> import machinery.
>
>
> Testing and initial implementations
> -----------------------------------
>
> For testing, a new built-in module ``_testmultiphase`` will be created.
> The library will export several additional modules using the mechanism
> described in "Multiple modules in one library".
>
> The ``_testcapi`` module will be unchanged, and will use single-phase
> initialization indefinitely (or until it is no longer supported).
>
> The ``array`` and ``xx*`` modules will be converted to use multi-phase
> initialization as part of the initial implementation.

What do you mean by "initial implementation"?  Will it be done
differently in a later implementation?

>
>
> Summary of API Changes and Additions
> ------------------------------------
>
> New functions:
>
> * PyModule_FromDefAndSpec (macro)
> * PyModule_FromDefAndSpec2
> * PyModule_ExecDef
> * PyModule_SetDocString
> * PyModule_AddFunctions
> * PyModuleDef_Init
>
> New macros:
>
> * Py_mod_create
> * Py_mod_exec
>
> New types:
>
> * PyModuleDef_Type will be exposed
>
> New structures:
>
> * PyModuleDef_Slot
>
> PyModuleDef.m_reload changes to PyModuleDef.m_slots.

This section is missing any explanation of the impact on
Python/import.c, on the _imp/imp module, and on the 3 finders/loaders
in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension).

>
>
> Possible Future Extensions
> ==========================
>
> The slots mechanism, inspired by PyType_Slot from PEP 384,
> allows later extensions.
>
> Some extension modules export many constants; for example _ssl has
> a long list of calls in the form::
>
>     PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
>                             PY_SSL_ERROR_ZERO_RETURN);
>
> Converting this to a declarative list, similar to PyMethodDef,
> would reduce boilerplate, and provide free error-checking which
> is often missing.

Great idea, including as it applies to other constants and types.

>
> String constants and types can be handled similarly.
> (Note that non-default bases for types cannot be portably specified
> statically; this case would need a Py_mod_exec function that runs
> before the slots are added. The free error-checking would still be
> beneficial, though.)

This implies to me that now is the time to ensure that this PEP
appropriately accommodates that need.  It would be unfortunate if we
had to later hack in some extra API to accommodate a use case we
already know about.  Better if we made sure the currently proposed
changes could accommodate the need, even if the implementation of that
part were not part of this PEP.

>
> Another possibility is providing a "main" function that would be run
> when the module is given to Python's -m switch.
> For this to work, the runpy module will need to be modified to take
> advantage of ModuleSpec-based loading introduced in PEP 451.

I'll point out that the pure-Python equivalent has been proposed on a
number of occasions and been rejected every time.  However, in the
case of extension modules it is more justifiable.  If extension
modules gain such a mechanism then it may be a justification for doing
something similar in Python.

> Also, it will be necessary to add a mechanism for setting up a module
> according to slots it wasn't originally defined with.

What does this mean?

>
>
> Implementation
> ==============
>
> Work-in-progress implementation is available in a GitHub repository
> [#gh-repo]_; a patchset is at [#gh-patch]_.

I'll have to take a look.

> [snip]

From ncoghlan at gmail.com  Tue May 19 05:51:22 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 19 May 2015 13:51:22 +1000
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
Message-ID: <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>

On 19 May 2015 at 10:07, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin <encukou at gmail.com> wrote:
>> [snip]
>>
>> Furthermore, the majority of currently existing extension modules has
>> problems with sub-interpreter support and/or interpreter reloading, and,
>> while it is possible with the current infrastructure to support these
>> features, it is neither easy nor efficient.
>> Addressing these issues was the goal of PEP 3121, but many extensions,
>> including some in the standard library, took the least-effort approach
>> to porting to Python 3, leaving these issues unresolved.
>> This PEP keeps backwards compatibility, which should reduce pressure and
>> give extension authors adequate time to consider these issues when porting.
>
> So just to be sure I understand, now PyModuleDef.m_slots will
> unambiguously indicate whether or not an extension module is
> compliant, right?

I'm not sure what you mean by "compliant". A non-NULL m_slots will
indicate usage of multi-phase initialisation, so it at least indicates
*intent* to correctly support subinterpreters et al. Actual delivery
on that promise is still a different question :)

>> [snip]
>>
>> The proposal
>> ============
>
> This section should include an indication of how the loader (and
> perhaps finder) will change for builtin, frozen, and extension
> modules.  It may help to describe the proposal up front by how the
> loader implementation would look if it were somehow implemented in
> Python code.  The subsequent sections sometimes indicate where
> different things take place, but an explicit outline (as Python code)
> would make the entire flow really obvious.  Putting that toward the
> beginning of this section would help clearly set the stage for the
> rest of the proposal.

+1 for a pseudo-code overview of the loader implementation.
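
Something along these lines might serve as a starting point.  It is only
a toy model in pure Python, not the actual implementation: ModuleDef,
create_module, exec_module and the numeric slot ID values are all made
up for illustration, and the real exec slots report failure by setting
an exception themselves rather than having the loader raise ImportError.

```python
import types

# Hypothetical stand-ins for the C-level slot IDs; the real values live in C.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


class ModuleDef:
    """Toy model of a PyModuleDef: a name plus (slot_id, function) pairs."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots


def create_module(spec_name, module_def):
    """Phase 1: validate the slots, then create the module object."""
    create = None
    for slot_id, func in module_def.slots:
        if slot_id not in KNOWN_SLOTS:
            # Unknown slot IDs make the import fail with SystemError.
            raise SystemError(f"module {spec_name}: unknown slot ID {slot_id}")
        if slot_id == PY_MOD_CREATE:
            create = func
    if create is not None:
        return create(spec_name, module_def)
    # The name comes from the spec; module_def.name (m_name) is not consulted.
    return types.ModuleType(spec_name)


def exec_module(module, module_def):
    """Phase 2: run every exec slot in order; nonzero means failure."""
    for slot_id, func in module_def.slots:
        if slot_id == PY_MOD_EXEC and func(module) != 0:
            raise ImportError(f"execution of module {module.__name__} failed")


def spam_exec(module):
    module.food = "spam"
    return 0


spam_def = ModuleDef("spam", [(PY_MOD_EXEC, spam_exec)])
spam = create_module("renamed_spam", spam_def)
exec_module(spam, spam_def)
```

Here the module ends up named "renamed_spam" rather than "spam", which
is the "name is taken from the ModuleSpec" rule in miniature.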

>
>> [snip]
>> Unknown slot IDs will cause the import to fail with SystemError.
>
> Was there any consideration made for just ignoring unknown slot IDs?
> My gut reaction is that you have it the right way, but I can still
> imagine use cases for custom slots that PyModuleDef_Init wouldn't know
> about.

The "known slots only, all other slot IDs are reserved for future use"
slot semantics were copied directly from PyType_FromSpec in PEP 384.
Since it's just a numeric slot ID, you'd run a high risk of conflicts
if you allowed for custom extensions.

If folks want to do more clever things, they'll need to use the create
or exec slot to stash them on the module object, rather than storing
them in the module definition.

>> The PyModuleDef object must be available for the lifetime of the module
>> created from it -- usually, it will be declared statically.
>
> How easily will this be a source of mysterious errors-at-a-distance?

It shouldn't be any worse than static type definitions, and normal
reference counting semantics should keep it alive regardless.

>> [snip]
>> Extension authors are advised to keep Py_mod_create minimal, and in
>> particular to not call user code from it.
>
> This is a pretty important point as well.  We'll need to make sure
> this is sufficiently clear in the documentation.  Would it make sense
> to provide helpers for common cases, to encourage extension authors to
> keep the create function minimal?

The main encouragement is to not handcode your extension modules at
all, and let something like Cython or SWIG take care of the
boilerplate :)

>> [snip]
>>
>> If PyModuleExec replaces the module's entry in sys.modules,
>> the new object will be used and returned by importlib machinery.
>
> Just to be sure, something like "mod = sys.modules[modname]" is done
> before each execution slot.  In other words, the result of the
> previous execution slot should be used for the next one.

That's not the original intent of this paragraph - rather, it is
referring to the existing behaviour of the import machinery.

However, I agree that now we're allowing the Py_mod_exec slot to be
supplied multiple times, we should also be updating the module
reference between slot invocations.

I also think the PEP could do with a brief mention of the additional
modularity this approach brings at the C level - rather than having to
jam everything into one function, an extension module can easily break
up its initialisation into multiple steps, and it's technically even
possible to share common steps between different modules.

>> (This mirrors the behavior of Python modules. Note that implementing
>> Py_mod_create is usually a better solution for the use cases this serves.)
>
> Could you elaborate?  What are those use cases and why would
> Py_mod_create be better?

Rather than replacing the implicitly created normal module during
Py_mod_exec (which is the only option available to Python modules),
PEP 489 lets you define the Py_mod_create slot to override the module
object creation directly.

Outside conversion of a Python module that manipulates sys.modules to
an extension module with Cython, there's no real reason to use the
"replacing yourself in sys.modules" option over using Py_mod_create
directly.
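
For reference, the pure Python form of "replacing yourself in
sys.modules" looks roughly like the sketch below.  The module name
selfreplace_demo and the _Replacement class are invented for the
example, and the module is written to a temporary directory only so the
snippet is self-contained:

```python
import importlib
import os
import sys
import tempfile

# Source of a module that swaps its own sys.modules entry during execution.
source = '''
import sys

class _Replacement:
    """Arbitrary object that takes the module's place in sys.modules."""
    food = "spam"

sys.modules[__name__] = _Replacement()
'''

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "selfreplace_demo.py"), "w") as f:
    f.write(source)

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
try:
    # The import machinery returns whatever is in sys.modules afterwards.
    result = importlib.import_module("selfreplace_demo")
finally:
    sys.path.remove(tmpdir)
```

The importer hands back the _Replacement instance, not the module
object it created implicitly.  With Py_mod_create an extension module
can supply the custom object up front instead of swapping it in after
the fact.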

>> [snip]
>>
>> Modules that need to work unchanged on older versions of Python should not
>> use multi-phase initialization, because the benefits it brings can't be
>> back-ported.
>
> Given your example below, "should not" seems a bit strong to me.  In
> fact, what are the objections to encouraging the approach from the
> example?

Agreed, "should not" is probably too strong here. On the other hand,
preserving compatibility with older Python versions in a module that
has been updated to rely on multi-phase initialization is likely to be
a matter of "graceful degradation", rather than being able to
reproduce comparable functionality (which I believe may have been the
point Petr was trying to convey).

I expect Cython and SWIG may be able to manage that through
appropriate use of #ifdef's in the generated code, but doing it by
hand is likely to be painful, hence the potential benefits of just
sticking with single-phase initialisation for the time being.

>> [snip]
>>
>> Subinterpreters and Interpreter Reloading
>> -----------------------------------------
>>
>> Extensions using the new initialization scheme are expected to support
>> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
>
> Presumably this support is explicitly and completely defined in the
> subsequent sentences.  Is it really just keeping "hidden" module state
> encapsulated on the module object?  If not then it may make sense to
> enumerate the requirements better for the sake of extension module
> authors.

I'd actually like to have a better way of doing scenario testing for
extension modules (subinterpreters, multiple initialize/finalize
cycles, freezing), but I'm not sure this PEP is the best place to
define that. Perhaps we could do a PyPI project that was a tox-based
test battery for this kind of thing?

>> The mechanism is designed to make this easy, but care is still required
>> on the part of the extension author.
>> No user-defined functions, methods, or instances may leak to different
>> interpreters.
>> To achieve this, all module-level state should be kept in either the module
>> dict, or in the module object's storage reachable by PyModule_GetState.
>
> Is this programmatically enforceable?  Is there any mechanism for
> easily copying module state?  How about sharing some state between
> subinterpreters?  How much room is there for letting extension module
> authors define how their module behaves across multiple interpreters
> or across multiple Initialize/Finalize cycles?

It's not programmatically enforceable, hence the idea above of finding
a way to make it easier for people to test their extension modules are
importable across multiple Python versions and deployment scenarios.

>> As a rule of thumb, modules that rely on PyState_FindModule are, at the
>> moment, not good candidates for porting to the new mechanism.
>
> Are there any plans for a follow-up effort to help with this case?

The problem here is that the PEP 3121 module state approach provides
storage on a *per-interpreter* basis, that is then shared amongst all
module instances created from a given module definition.

This means that when _PyImport_FindExtensionObject (see
https://hg.python.org/cpython/file/fc2eed9fc2d0/Python/import.c#l518)
reinitialises an extension module, the state is shared between the two
instances. When PEP 3121 was written, this was not seen as a problem,
since the expectation was that the behaviour would only be triggered
by multiple interpreter level initialize/finalize cycles.

One key scenario we missed at the time was "deleting an extension
module from sys.modules and importing it a second time, while
retaining a local reference for later restoration". Under PEP 3121,
the two instances collide on their state storage, as we have two
simultaneously existing module objects created in the same interpreter
from the same module definition. PEP 489 would inherit that same
problem if you tried to use it with the PyState_* APIs, so it simply
doesn't allow them at all. (Earlier versions of the PEP allowed it
with an "EXPORT_SINGLETON" slot that would disallow reimporting
entirely, which we took out in favour of "just keep using the existing
initialisation model in those cases for the time being")

For pure Python code, we don't have this problem, since the
interpreter takes care of providing a properly scoped globals()
reference to *all* functions defined in that module, regardless of
whether they're module level functions or method definitions on a
class. At the C level, we don't have that, as only module level
functions get a module reference passed in - methods only get a
reference to their class instance, without a reference to the module
globals, and delayed callbacks can be a problem as well.
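
The pure Python side of that contrast can be sketched as follows (the
module name scoped_demo is arbitrary, and exec() into a fresh
ModuleType is used here only as a stand-in for importing the same
source twice).  Two module objects built from the same source keep
separate state, because both functions and methods carry a reference to
their own module's globals:

```python
import types

# The same source is executed into two independent module objects.
source = '''
counter = 0

def bump():
    global counter
    counter += 1

class Widget:
    def bump(self):
        # Methods resolve globals through the defining module, too.
        global counter
        counter += 10
'''

first = types.ModuleType("scoped_demo")
second = types.ModuleType("scoped_demo")
exec(source, first.__dict__)
exec(source, second.__dict__)

first.bump()             # touches only first.counter
first.Widget().bump()    # method call, still first.counter
second.Widget().bump()   # touches only second.counter
```

Each copy only ever sees its own counter.  A C method has no
counterpart to the __globals__ reference that makes this work.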

The best improved API we could likely offer at this point is a
convenience API for looking up a module in *sys.modules* based on a
PyModuleDef instance, and updating PEP 489 to write the as-imported
module name into the returned PyModuleDef structure. That's probably
not a bad way to go, given that PEP 489 currently *ignores* the m_name
slot - flipping it around to be a *writable* slot would be a way to
let extension modules know dynamically how to look themselves up in
sys.modules.

The new lookup API would then be the moral equivalent of Python code
doing "mod = sys.modules[__name__]". With this approach, actively
*using* multiple references to a given module at the same time would
still break (since you'll always get the module currently in
sys.modules, even if that isn't the one you expected), but the
"save-and-restore" model needed for certain kinds of testing and
potentially other scenarios would work correctly.
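
A rough pure Python illustration of both halves of that claim, using a
made-up module saverestore_demo whose current() function plays the role
of the proposed lookup API:

```python
import importlib
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "saverestore_demo.py"), "w") as f:
    f.write(
        "import sys\n"
        "def current():\n"
        "    # Moral equivalent of the proposed lookup API.\n"
        "    return sys.modules[__name__]\n"
    )

sys.path.insert(0, tmpdir)
importlib.invalidate_caches()
try:
    original = importlib.import_module("saverestore_demo")

    # Save: keep a local reference, then drop the sys.modules entry.
    del sys.modules["saverestore_demo"]
    fresh = importlib.import_module("saverestore_demo")

    # While both exist, a lookup made from the original finds the fresh copy.
    seen_while_replaced = original.current()

    # Restore: put the original back, and the lookup is right again.
    sys.modules["saverestore_demo"] = original
    seen_after_restore = original.current()
finally:
    sys.path.remove(tmpdir)
```

While the second import is in place, the original module's lookup
returns the fresh copy (the "actively using both" breakage); once the
original is restored, it finds itself again.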

>> Module Reloading
>> ----------------
>>
>> Reloading an extension module using importlib.reload() will continue to
>> have no effect, except re-setting import-related attributes.
>>
>> Due to limitations in shared library loading (both dlopen on POSIX and
>> LoadLibraryEx on Windows), it is not generally possible to load
>> a modified library after it has changed on disk.
>>
>> Use cases for reloading other than trying out a new version of the module
>> are too rare to require all module authors to keep reloading in mind.
>> If reload-like functionality is needed, authors can export a dedicated
>> function for it.
>
> Keep in mind the semantics of reload for pure Python modules.  The
> module is executed into the existing namespace, overwriting the loaded
> namespace but leaving non-colliding attributes alone.  While the
> semantics for reloading an extension/builtin/frozen module are
> currently basic (i.e. a no-op), there may well be room to support
> reload behavior that mirrors that of pure Python modules without
> needing to reload an SO file.  I would expect either the behavior of
> exec to get repeated (tricky due to "hidden" module state?) or for
> there to be a "reload" slot that would mirror Py_mod_exec.

We considered this, and decided it was fairly pointless, since you
can't modify the extension module code. The one case I see where it
potentially makes sense is a "transitive reload", where the extension
module retrieves and caches attributes from another pure Python module
at import time, and that extension module has been reloaded.

It may also make a difference in the context of utilities like
https://docs.python.org/3/library/test.html#test.support.import_fresh_module,
where we manipulate the import system state to control how conditional
imports are handled.
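
For reference, the pure Python reload semantics mentioned above can be
observed with a throwaway module. This is a sketch under assumptions: the
module name `reload_demo_mod` and the helper `reload_demo` are made up, and
bytecode writing is switched off so a cached .pyc cannot mask the rewritten
source.

```python
import importlib
import pathlib
import sys
import tempfile


def reload_demo(name="reload_demo_mod"):
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # always recompile from source
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / (name + ".py")
        source.write_text("x = 1\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module(name)
            mod.extra = "set after import"       # not defined in the source
            source.write_text("x = 2\ny = 3\n")  # new version on disk
            importlib.invalidate_caches()
            reloaded = importlib.reload(mod)
            # Same object; x overwritten, y added, extra left alone.
            return reloaded is mod, mod.x, mod.y, mod.extra
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            sys.path_importer_cache.pop(tmp, None)
            sys.dont_write_bytecode = old_flag
```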

> At the same time, one may argue that reloading modules is not
> something to encourage. :)

There's a reason import_fresh_module has never made it out of test.support :)

>> Multiple modules in one library
>> -------------------------------
>>
>> To support multiple Python modules in one shared library, the library can
>> export additional PyInit* symbols besides the one that corresponds
>> to the library's filename.
>>
>> Note that this mechanism can currently only be used to *load* extra modules,
>> but not to *find* them.
>
> What do you mean by "currently"?

It's a limitation of the way the existing finders work, rather than an
inherent limitation of the import system as a whole.

> It may also be worth tying the above statement with the following
> text, since the following appears to be an explanation of how to
> address the "finder" caveat.

Agreed that this could be clearer.

>> Testing and initial implementations
>> -----------------------------------
>>
>> For testing, a new built-in module ``_testmultiphase`` will be created.
>> The library will export several additional modules using the mechanism
>> described in "Multiple modules in one library".
>>
>> The ``_testcapi`` module will be unchanged, and will use single-phase
>> initialization indefinitely (or until it is no longer supported).
>>
>> The ``array`` and ``xx*`` modules will be converted to use multi-phase
>> initialization as part of the initial implementation.
>
> What do you mean by "initial implementation"?  Will it be done
> differently in a later implementation?

These modules will be converted in the reference implementation, other
modules won't be.

>> String constants and types can be handled similarly.
>> (Note that non-default bases for types cannot be portably specified
>> statically; this case would need a Py_mod_exec function that runs
>> before the slots are added. The free error-checking would still be
>> beneficial, though.)
>
> This implies to me that now is the time to ensure that this PEP
> appropriately accommodates that need.  It would be unfortunate if we
> had to later hack in some extra API to accommodate a use case we
> already know about.  Better if we made sure the currently proposed
> changes could accommodate the need, even if the implementation of that
> part were not part of this PEP.

This would be a new kind of execution slot, so the PEP already
accommodates these possible future extensions.

>> Another possibility is providing a "main" function that would be run
>> when the module is given to Python's -m switch.
>> For this to work, the runpy module will need to be modified to take
>> advantage of ModuleSpec-based loading introduced in PEP 451.
>
> I'll point out that the pure-Python equivalent has been proposed on a
> number of occasions and been rejected every time.  However, in the
> case of extension modules it is more justifiable.  If extension
> modules gain such a mechanism then it may be a justification for doing
> something similar in Python.
>
>> Also, it will be necessary to add a mechanism for setting up a module
>> according to slots it wasn't originally defined with.
>
> What does this mean?

When you use the -m switch, you always run in the builtin __main__
module namespace, and runpy fiddles with __main__.__spec__ to match
the details of the module passed to the switch. That's not currently a
trick we can manage when the "thing to run" is an extension module.
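
The pure Python side of this can be seen with runpy.run_module, which is
close to (but not identical to) what the -m switch does. A small sketch; the
module name `runpy_demo_mod` and the helper `run_as_main` are invented for
the example:

```python
import importlib
import pathlib
import runpy
import sys
import tempfile


def run_as_main(name="runpy_demo_mod"):
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / (name + ".py")
        path.write_text("seen_name = __name__\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # Execute the module's code under the name "__main__".
            namespace = runpy.run_module(name, run_name="__main__")
        finally:
            sys.path.remove(tmp)
            sys.path_importer_cache.pop(tmp, None)
    # __name__ is "__main__", but __spec__ still describes the real module.
    return (namespace["seen_name"],
            namespace["__name__"],
            namespace["__spec__"].name)
```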

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From encukou at gmail.com  Tue May 19 13:06:31 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Tue, 19 May 2015 13:06:31 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>	<CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
Message-ID: <555B1937.5020001@gmail.com>

On 05/19/2015 05:51 AM, Nick Coghlan wrote:
> On 19 May 2015 at 10:07, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin <encukou at gmail.com> wrote:
>>> [snip]
>>>
>>> Furthermore, the majority of currently existing extension modules has
>>> problems with sub-interpreter support and/or interpreter reloading, and,
>>> while
>>> it is possible with the current infrastructure to support these
>>> features, it is neither easy nor efficient.
>>> Addressing these issues was the goal of PEP 3121, but many extensions,
>>> including some in the standard library, took the least-effort approach
>>> to porting to Python 3, leaving these issues unresolved.
>>> This PEP keeps backwards compatibility, which should reduce pressure and
>>> give
>>> extension authors adequate time to consider these issues when porting.
>>
>> So just be to sure I understand, now PyModuleDef.m_slots will
>> unambiguously indicate whether or not an extension module is
>> compliant, right?
> 
> I'm not sure what you mean by "compliant". A non-NULL m_slots will
> indicate usage of multi-phase initialisation, so it at least indicates
> *intent* to correctly support subinterpreters et al. Actual delivery
> on that promise is still a different question :)

Yes, non-NULL m_slots means the module is compliant. If it's not, it's a
bug in the *module* (i.e. compliance is not *just* a matter of setting
m_slots).
This will be explained in the docs.

>>> [snip]
>>>
>>> The proposal
>>> ============
>>
>> This section should include an indication of how the loader (and
>> perhaps finder) will change for builtin, frozen, and extension
>> modules.  It may help to describe the proposal up front by how the
>> loader implementation would look if it were somehow implemented in
>> Python code.  The subsequent sections sometimes indicate where
>> different things take place, but an explicit outline (as Python code)
>> would make the entire flow really obvious.  Putting that toward the
>> beginning of this section would help clearly set the stage for the
>> rest of the proposal.
> 
> +1 for a pseudo-code overview of the loader implementation.

OK. Along with a link to PEP 451 code [*], it should make things clearer.
[*] https://www.python.org/dev/peps/pep-0451/#how-loading-will-work

>>> [snip]
>>> Unknown slot IDs will cause the import to fail with SystemError.
>>
>> Was there any consideration made for just ignoring unknown slot IDs?
>> My gut reaction is that you have it the right way, but I can still
>> imagine use cases for custom slots that PyModuleDef_Init wouldn't know
>> about.
> 
> The "known slots only, all other slot IDs are reserved for future use"
> slot semantics were copied directly from PyType_FromSpec in PEP 384.
> Since it's just a numeric slot ID, you'd run a high risk of conflicts
> if you allowed for custom extensions.
> 
> If folks want to do more clever things, they'll need to use the create
> or exec slot to stash them on the module object, rather than storing
> them in the module definition.

Right, if you need custom behavior, put it in a function and use the
provided hook. (If you need custom "slots" on PyModuleDef for some
reason, use a PyModuleDef subclass -- but I can't see where it would be
helpful.)
Ignoring unknown slot IDs would mean letting errors go unnoticed.

(Technicality: PyModuleDef_Init doesn't care about slots;
PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will raise the
errors.)
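
As a rough Python model of the "known slots only" rule (a sketch, not the
actual C implementation; `check_slots` and the constant names are
hypothetical, and the numeric values are illustrative):

```python
# Illustrative slot IDs; the names echo Py_mod_create / Py_mod_exec.
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def check_slots(module_name, slots):
    """Reject any slot ID outside the known set instead of ignoring it."""
    for slot_id, _value in slots:
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(
                f"module {module_name} uses unknown slot ID {slot_id}")
```

Raising instead of skipping means that a typo in a slot table, or a slot
from a newer Python, fails loudly at import time.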

>> When using multi-phase initialization, the *m_name* field of PyModuleDef
>> will
>> not be used during importing; the module name will be taken from the
>> ModuleSpec.
> 
> So m_name will be strictly ignored by PyModuleDef_Init?

Yes. The name is useful for introspection, but the import machinery will
use the name provided by the ModuleSpec.

(Technicality: again, PyModuleDef_Init doesn't touch names at all.
PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will ignore
the name from the def.)

>>> The PyModuleDef object must be available for the lifetime of the module
>>> created
>>> from it -- usually, it will be declared statically.
>>
>> How easily will this be a source of mysterious errors-at-a-distance?
> 
> It shouldn't be any worse than static type definitions, and normal
> reference counting semantics should keep it alive regardless.

It's the same as the current behavior (PEP 3121), where a
PyModuleDef is stored in the module, and if you let it die,
PyModule_GetState will give you an invalid pointer. It's just that in
PEP 489, the import machinery itself uses the def, so you actually get to
feel the pain if you deallocate it.
All in all, this should not be a problem in practice; the PEP specifies
what'll happen if you go off doing exotic things. (For example, Cython
might run into this if it tries implementing a reloading scheme we
talked about earlier in the thread, and even then it shouldn't be a
major source of mysterious errors.) Normal mortals will be OK.

>> [snip]
>> However, only ModuleType instances support module-specific functionality
>> such as per-module state.
> 
> This is a pretty important point.  Presumably this constrains later
> behavior and precedes all functionality related to per-module state.

Yes. Module objects support more module-like behavior than other
objects. What you can and cannot use should be clear from the API. I'll
clarify a bit more what functionality depends on using a PyModule_Type
(or subclass) instance.
One thing I see I forgot to add is that execution slots are looked up
via PyModule_GetDef, so they won't be processed on non-module objects.

It's a very good idea to use a module subclass rather than a completely
custom object. The docs will need to strongly recommend this.

>>> [snip]
>>> Extension authors are advised to keep Py_mod_create minimal, and in
>>> particular
>>> to not call user code from it.
>>
>> This is a pretty important point as well.  We'll need to make sure
>> this is sufficiently clear in the documentation.  Would it make sense
>> to provide helpers for common cases, to encourage extension authors to
>> keep the create function minimal?
> 
> The main encouragement is to not handcode your extension modules at
> all, and let something like Cython or SWIG take care of the
> boilerplate :)

Yes, Cython should be the default. For hand-written modules, the common case
should be not defining create at all.

>>> [snip]
>>>
>>> If PyModuleExec replaces the module's entry in sys.modules,
>>> the new object will be used and returned by importlib machinery.
>>
>> Just to be sure, something like "mod = sys.modules[modname]" is done
>> before each execution slot.  In other words, the result of the
>> previous execution slot should be used for the next one.
> 
> That's not the original intent of this paragraph - rather, it is
> referring to the existing behaviour of the import machinery.
> 
> However, I agree that now we're allowing the Py_mod_exec slot to be
> supplied multiple times, we should also be updating the module
> reference between slot invocations.

No, that won't work. It's possible (via direct calls to the import
machinery) to load a module without adding it to sys.modules.
The behavior should be clear (when you think about it) after I include
the loader implementation pseudocode.
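
For example, with pure Python modules the following sketch loads a module
through importlib.util without ever registering it. The module name
`unregistered_demo_mod` and the helper `load_unregistered` are made up for
the example.

```python
import importlib.util
import pathlib
import sys
import tempfile


def load_unregistered(name="unregistered_demo_mod"):
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / (name + ".py")
        # The module records whether it can find itself in sys.modules.
        path.write_text("import sys\nregistered = __name__ in sys.modules\n")
        spec = importlib.util.spec_from_file_location(name, str(path))
        module = importlib.util.module_from_spec(spec)  # create step
        spec.loader.exec_module(module)                 # exec step
    return module.__name__, module.registered, name in sys.modules
```

Any "mod = sys.modules[modname]" lookup between execution steps would fail
for a module loaded this way.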

> I also think the PEP could do with a brief mention of the additional
> modularity this approach brings at the C level - rather than having to
> jam everything into one function, an extension module can easily break
> up its initialisation into multiple steps, and it's technically even
> possible to share common steps between different modules.

Eh, I think it's better to create one function that calls the parts,
which was always possible, and works just as well.
Repeating slots is allowed because it would be an unnecessary bother to
check for duplicates. It's not a feature to advertise, the PEP just
specifies that in the weird edge case, the intuitive thing will happen.

(I did have a useful future use case for repeated slots, but the current
PEP allows a better and more obvious solution so I'll not even mention
it again.)

Still, the steps are processed in a loop from a single function
(PyModule_ExecDef), and that function operates on a module object -- it
doesn't know about sys.modules and can't easily check if you replaced
the module somewhere.

>>> (This mirrors the behavior of Python modules. Note that implementing
>>> Py_mod_create is usually a better solution for the use cases this serves.)
>>
>> Could you elaborate?  What are those use cases and why would
>> Py_mod_create be better?
> 
> Rather than replacing the implicitly created normal module during
> Py_mod_exec (which is the only option available to Python modules),
> PEP 489 lets you define the Py_mod_create slot to override the module
> object creation directly.
> 
> Outside conversion of a Python module that manipulates sys.modules to
> an extension module with Cython, there's no real reason to use the
> "replacing yourself in sys.modules" option over using Py_mod_create
> directly.

Yes. The workaround you need to use in Python modules is possible for
extensions, but there's no reason to use it. I'll try to make it clearer
that it's an unnecessary workaround.
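
For readers who haven't seen it, the Python-level workaround looks roughly
like this. It is a sketch: `Replacement`, `import_self_replacing`, and the
module name `self_replacing_demo_mod` are invented for the example.

```python
import importlib
import pathlib
import sys
import tempfile
import textwrap

# Source of a module that swaps itself out while it is being executed.
SOURCE = textwrap.dedent('''
    import sys
    import types

    class Replacement(types.ModuleType):
        marker = "replacement"

    sys.modules[__name__] = Replacement(__name__)
''')


def import_self_replacing(name="self_replacing_demo_mod"):
    with tempfile.TemporaryDirectory() as tmp:
        (pathlib.Path(tmp) / (name + ".py")).write_text(SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            # The import system returns whatever ends up in sys.modules.
            module = importlib.import_module(name)
            return type(module).__name__, module.marker
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            sys.path_importer_cache.pop(tmp, None)
```

An extension module can get the same result by returning the custom object
from its Py_mod_create function.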

>>> [snip]
>>>
>>> Modules that need to work unchanged on older versions of Python should not
>>> use multi-phase initialization, because the benefits it brings can't be
>>> back-ported.
>>
>> Given your example below, "should not" seems a bit strong to me.  In
>> fact, what are the objections to encouraging the approach from the
>> example?
> 
> Agreed, "should not" is probably too strong here. On the other hand,
> preserving compatibility with older Python versions in a module that
> has been updated to rely on multi-phase initialization is likely to be
> a matter of "graceful degradation", rather than being able to
> reproduce comparable functionality (which I believe may have been the
> point Petr was trying to convey).

My point is that if you need graceful degradation, your best bet is to
stick with single-phase init. Then you'll have one code path that works
the same on all versions.
If you *need* the features of multi-phase init, you need to remove
support for Pythons that don't have it.
If you need both backwards compatibility and multi-phase init, you
essentially need to create two modules (with shared contents), and make
sure they end up in the same state after they're loaded.

> I expect Cython and SWIG may be able to manage that through
> appropriate use of #ifdef's in the generated code, but doing it by
> hand is likely to be painful, hence the potential benefits of just
> sticking with single-phase initialisation for the time being.

Yes, code generators are in a position to create two versions of the
module, and select one using #ifdef.

The example in the PEP is helpful for other reasons than encouraging
#ifdef: it shows what needs to change when porting. Think of it as a diff :)

>>> [snip]
>>>
>>> Subinterpreters and Interpreter Reloading
>>> -----------------------------------------
>>>
>>> Extensions using the new initialization scheme are expected to support
>>> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
>>
>> Presumably this support is explicitly and completely defined in the
>> subsequent sentences.  Is it really just keeping "hidden" module state
>> encapsulated on the module object?  If not then it may make sense to
>> enumerate the requirements better for the sake of extension module
>> authors.

It is explained in the docs, see "Bugs and caveats" here:
https://docs.python.org/3/c-api/init.html#sub-interpreter-support
I'll add a link to that page.

> I'd actually like to have a better way of doing scenario testing for
> extension modules (subinterpreters, multiple initialize/finalize
> cycles, freezing), but I'm not sure this PEP is the best place to
> define that. Perhaps we could do a PyPI project that was a tox-based
> test battery for this kind of thing?

I think that's the wrong place to start. Currently, sub-interpreter
support is buried away in a docs chapter about Python
initialization/finalization, so a typical extension author won't even
notice it. We need to first make it *possible* to support
subinterpreters easily and correctly (so that Cython can do it), and to
document it prominently in the "writing extensions" part of the docs,
not only in "extending Python".
This PEP does part of the first step, and the docs for it (which aren't
written yet) will do the second step.
After that, it could make sense to provide a tool for testing this.

>>> The mechanism is designed to make this easy, but care is still required
>>> on the part of the extension author.
>>> No user-defined functions, methods, or instances may leak to different
>>> interpreters.
>>> To achieve this, all module-level state should be kept in either the module
>>> dict, or in the module object's storage reachable by PyModule_GetState.
>>
>> Is this programmatically enforceable?

No. (I believe you could even prove this formally.)

>> Is there any mechanism for easily copying module state?

No. This would be impossible to provide in the general case. It's the
responsibility of your C code.
That said, if you need to copy module state, chances are your design
could use some rethinking.

>> How about sharing some state between subinterpreters? 

The PyCapsule API was designed for this.

>> How much room is there for letting extension module
>> authors define how their module behaves across multiple interpreters
>> or across multiple Initialize/Finalize cycles?

Technically, you have all the freedom you want. But if I embed Python
into my project/library, I'd want multiple sub-interpreters completely
isolated by default. If I use two libraries that each embed Python into
my app, I definitely want them isolated.
So the PEP tries to make it easy to keep multiple interpreters isolated.

> It's not programmatically enforceable, hence the idea above of finding
> a way to make it easier for people to test their extension modules are
> importable across multiple Python versions and deployment scenarios.
> 
>>> As a rule of thumb, modules that rely on PyState_FindModule are, at the
>>> moment,
>>> not good candidates for porting to the new mechanism.
>>
>> Are there any plans for a follow-up effort to help with this case?

See the link in the PEP for initial discussion.

> The problem here is that the PEP 3121 module state approach provides
> storage on a *per-interpreter* basis, that is then shared amongst all
> module instances created from a given module definition.
> 
> This means that when _PyImport_FindExtensionObject (see
> https://hg.python.org/cpython/file/fc2eed9fc2d0/Python/import.c#l518)
> reinitialises an extension module, the state is shared between the two
> instances. When PEP 3121 was written, this was not seen as a problem,
> since the expectation was that the behaviour would only be triggered
> by multiple interpreter level initialize/finalize cycles.
> 
> One key scenario we missed at the time was "deleting an extension
> module from sys.modules and importing it a second time, while
> retaining a local reference for later restoration". Under PEP 3121,
> the two instances collide on their state storage, as we have two
> simultaneously existing module objects created in the same interpreter
> from the same module definition. PEP 489 would inherit that same
> problem if you tried to use it with the PyState_* APIs, so it simply
> doesn't allow them at all. (Earlier versions of the PEP allowed it
> with an "EXPORT_SINGLETON" slot that would disallow reimporting
> entirely, which we took out in favour of "just keep using the existing
> initialisation model in those cases for the time being")
> 
> For pure Python code, we don't have this problem, since the
> interpreter takes care of providing a properly scoped globals()
> reference to *all* functions defined in that module, regardless of
> whether they're module level functions or method definitions on a
> class. At the C level, we don't have that, as only module level
> functions get a module reference passed in - methods only get a
> reference to their class instance, without a reference to the module
> globals, and delayed callbacks can be a problem as well.
> 
> The best improved API we could likely offer at this point is a
> convenience API for looking up a module in *sys.modules* based on a
> PyModuleDef instance, and updating PEP 489 to write the as-imported
> module name into the returned PyModuleDef structure. That's probably
> not a bad way to go, given that PEP 489 currently *ignores* the m_name
> slot - flipping it around to be a *writable* slot would be a way to
> let extension modules know dynamically how to look themselves up in
> sys.modules.
> 
> The new lookup API would then be the moral equivalent of Python code
> doing "mod = sys.modules[__name__]". With this approach, actively
> *using* multiple references to a given module at the same time would
> still break (since you'll always get the module currently in
> sys.modules, even if that isn't the one you expected), but the
> "save-and-restore" model needed for certain kinds of testing and
> potentially other scenarios would work correctly.

I still think providing the module to classes is a better idea than a
lookup API, but that's going out of scope here.

>>> Module Reloading
>>> ----------------
>>>
>>> Reloading an extension module using importlib.reload() will continue to
>>> have no effect, except re-setting import-related attributes.
>>>
>>> Due to limitations in shared library loading (both dlopen on POSIX and
>>> LoadLibraryEx on Windows), it is not generally possible to load
>>> a modified library after it has changed on disk.
>>>
>>> Use cases for reloading other than trying out a new version of the module
>>> are too rare to require all module authors to keep reloading in mind.
>>> If reload-like functionality is needed, authors can export a dedicated
>>> function for it.
>>
>> Keep in mind the semantics of reload for pure Python modules.  The
>> module is executed into the existing namespace, overwriting the loaded
>> namespace but leaving non-colliding attributes alone.  While the
>> semantics for reloading an extension/builtin/frozen module are
>> currently basic (i.e. a no-op), there may well be room to support
>> reload behavior that mirrors that of pure Python modules without
>> needing to reload an SO file.  I would expect either the behavior of
>> exec to get repeated (tricky due to "hidden" module state?) or for
>> there to be a "reload" slot that would mirror Py_mod_exec.
> 
> We considered this, and decided it was fairly pointless, since you
> can't modify the extension module code. The one case I see where it
> potentially makes sense is a "transitive reload", where the extension
> module retrieves and caches attributes from another pure Python module
> at import time, and that extension module has been reloaded.
> 
> It may also make a difference in the context of utilities like
> https://docs.python.org/3/library/test.html#test.support.import_fresh_module,
> where we manipulate the import system state to control how conditional
> imports are handled.
> 
>> At the same time, one may argue that reloading modules is not
>> something to encourage. :)
> 
> There's a reason import_fresh_module has never made it out of test.support :)

Right. Implementation-wise, it would actually be much easier to support
reload than to make it a no-op. But then C module authors would need
to think about this edge case, which might be tricky to get right, would
not be likely to get test coverage, and is generally not useful anyway.

If it turns out to be useful, it would be very simple to add an explicit
reload slot in the future.

>>> Multiple modules in one library
>>> -------------------------------
>>>
>>> To support multiple Python modules in one shared library, the library can
>>> export additional PyInit* symbols besides the one that corresponds
>>> to the library's filename.
>>>
>>> Note that this mechanism can currently only be used to *load* extra modules,
>>> but not to *find* them.
>>
>> What do you mean by "currently"?
> 
> It's a limitation of the way the existing finders work, rather than an
> inherent limitation of the import system as a whole.
> 
>> It may also be worth tying the above statement with the following
>> text, since the following appears to be an explanation of how to
>> address the "finder" caveat.
> 
> Agreed that this could be clearer.

OK, I'll clarify.


>> Summary of API Changes and Additions
>> ------------------------------------
>>
>> New functions:
>>
>> * PyModule_FromDefAndSpec (macro)
>> * PyModule_FromDefAndSpec2
>> * PyModule_ExecDef
>> * PyModule_SetDocString
>> * PyModule_AddFunctions
>> * PyModuleDef_Init
>>
>> New macros:
>>
>> * Py_mod_create
>> * Py_mod_exec
>>
>> New types:
>>
>> * PyModuleDef_Type will be exposed
>>
>> New structures:
>>
>> * PyModuleDef_Slot
>>
>> PyModuleDef.m_reload changes to PyModuleDef.m_slots.
> 
> This section is missing any explanation of the impact on
> Python/import.c, on the _imp/imp module, and on the 3 finders/loaders
> in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension).

I'll add a summary.

The internal _imp module will have backwards incompatible changes --
functions will be added and removed as necessary. That's what the
underscore means :)
The deprecated imp module will get a backwards compatibility shim for
anything it imported from _imp that got removed.
importlib will stay backwards compatible.

Python/import.c and Python/importdl.* will be rewritten entirely.
See the patches (linked from the PEP) for details.


From encukou at gmail.com  Tue May 19 16:55:04 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Tue, 19 May 2015 16:55:04 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <555B1937.5020001@gmail.com>
References: <5559F0FD.3080704@gmail.com>	<CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com>
Message-ID: <555B4EC8.3020002@gmail.com>

On 05/19/2015 01:06 PM, Petr Viktorin wrote:
> On 05/19/2015 05:51 AM, Nick Coghlan wrote:
>> On 19 May 2015 at 10:07, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin <encukou at gmail.com> wrote:
[snip]
>>>>
>>>> The proposal
>>>> ============
>>>
>>> This section should include an indication of how the loader (and
>>> perhaps finder) will change for builtin, frozen, and extension
>>> modules.  It may help to describe the proposal up front by how the
>>> loader implementation would look if it were somehow implemented in
>>> Python code.  The subsequent sections sometimes indicate where
>>> different things take place, but an explicit outline (as Python code)
>>> would make the entire flow really obvious.  Putting that toward the
>>> beginning of this section would help clearly set the stage for the
>>> rest of the proposal.
>>
>> +1 for a pseudo-code overview of the loader implementation.

Here is an overview of how the modified importers will operate.
Details such as logging or handling of errors and invalid states
are left out, and C code is presented with a concise Python-like syntax.

The framework that calls the importers is explained in PEP 451
[#pep-0451-loading]_.

importlib/_bootstrap.py:

    class BuiltinImporter:
        def create_module(self, spec):
            return _imp.create_builtin(spec)

        def exec_module(self, module):
            _imp.exec_dynamic(module)

        def load_module(self, name):
            # use a backwards compatibility shim
            return _load_module_shim(self, name)

importlib/_bootstrap_external.py:

    class ExtensionFileLoader:
        def create_module(self, spec):
            return _imp.create_dynamic(spec)

        def exec_module(self, module):
            _imp.exec_dynamic(module)

        def load_module(self, name):
            # use a backwards compatibility shim
            return _load_module_shim(self, name)

Python/import.c (the _imp module):

    def create_dynamic(spec):
        name = spec.name
        path = spec.origin

        # Find an already loaded module that used single-phase init.
        # For multi-phase initialization, mod is NULL, so a new module
        # is always created.
        mod = _PyImport_FindExtensionObject(name, path)
        if mod:
            return mod

        return _PyImport_LoadDynamicModuleWithSpec(spec)

    def exec_dynamic(module):
        def = PyModule_GetDef(module)
        state = PyModule_GetState(module)
        if state is NULL:
            PyModule_ExecDef(module, def)

    def create_builtin(spec):
        name = spec.name

        # Find an already loaded module that used single-phase init.
        # For multi-phase initialization, mod is NULL, so a new module
        # is always created.
        mod = _PyImport_FindExtensionObject(name, name)
        if mod:
            return mod

        for initname, initfunc in PyImport_Inittab:
            if name == initname:
                m = initfunc()
                if isinstance(m, PyModuleDef):
                    def = m
                    return PyModule_FromDefAndSpec(def, spec)
                else:
                    # fall back to single-phase initialization
                    module = m
                    _PyImport_FixupExtensionObject(module, name, name)
                    return module

Python/importdl.c:

    def _PyImport_LoadDynamicModuleWithSpec(spec):
        path = spec.origin
        package, dot, name = spec.name.rpartition('.')

        # see the "Non-ASCII module names" section for export_hook_name
        hook_name = export_hook_name(name)

        # call platform-specific function for loading exported function
        # from shared library
        exportfunc = _find_shared_funcptr(hook_name, path)

        m = exportfunc()
        if isinstance(m, PyModuleDef):
            def = m
            return PyModule_FromDefAndSpec(def, spec)

        module = m

        # fall back to single-phase initialization
        ....

Objects/moduleobject.c:

    def PyModule_FromDefAndSpec(def, spec):
        name = spec.name
        create = None
        for slot, value in def.m_slots:
            if slot == Py_mod_create:
                create = value
        if create:
            m = create(spec, def)
        else:
            m = PyModule_New(name)

        if isinstance(m, types.ModuleType):
            m.md_state = None
            m.md_def = def

        if def.m_methods:
            PyModule_AddFunctions(m, def.m_methods)
        if def.m_doc:
            PyModule_SetDocString(m, def.m_doc)

        return m

    def PyModule_ExecDef(module, def):
        if isinstance(module, types.ModuleType):
            if module.md_state is NULL:
                # allocate a block of zeroed-out memory
                module.md_state = _alloc(def.m_size)

        if def.m_slots is NULL:
            return

        for slot, value in def.m_slots:
            if slot == Py_mod_exec:
                value(module)


From ericsnowcurrently at gmail.com  Wed May 20 01:56:34 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 19 May 2015 17:56:34 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
Message-ID: <CALFfu7CY3VL8VrSvSfHn=n+1i_KBusNkWpQ=nHkosKD4rGQhww@mail.gmail.com>

On Mon, May 18, 2015 at 9:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 19 May 2015 at 10:07, Eric Snow <ericsnowcurrently at gmail.com> wrote:
  [snip]
>> Was there any consideration made for just ignoring unknown slot IDs?
>> My gut reaction is that you have it the right way, but I can still
>> imagine use cases for custom slots that PyModuleDef_Init wouldn't know
>> about.
>
> The "known slots only, all other slot IDs are reserved for future use"
> slot semantics were copied directly from PyType_FromSpec in PEP 384.
> Since it's just a numeric slot ID, you'd run a high risk of conflicts
> if you allowed for custom extensions.
>
> If folks want to do more clever things, they'll need to use the create
> or exec slot to stash them on the module object, rather than storing
> them in the module definition.

Makes sense.  This does remind me of something I wanted to ask.  Would
it make sense to leverage ModuleSpec.loader_state?  If I recall
correctly, we added loader_state with extension modules in mind.

>
>>> The PyModuleDef object must be available for the lifetime of the module
>>> created
>>> from it - usually, it will be declared statically.
>>
>> How easily will this be a source of mysterious errors-at-a-distance?
>
> It shouldn't be any worse than static type definitions, and normal
> reference counting semantics should keep it alive regardless.

Got it.

>
>>> [snip]
>>> Extension authors are advised to keep Py_mod_create minimal, and in
>>> particular
>>> to not call user code from it.
>>
>> This is a pretty important point as well.  We'll need to make sure
>> this is sufficiently clear in the documentation.  Would it make sense
>> to provide helpers for common cases, to encourage extension authors to
>> keep the create function minimal?
>
> The main encouragement is to not handcode your extension modules at
> all, and let something like Cython or SWIG take care of the
> boilerplate :)

Hey, I tried to make something happen over on python-ideas! :)  Some
folks just don't want to go far enough.

  [snip]
>> Could you elaborate?  What are those use cases and why would
>> Py_mod_create be better?
>
> Rather than replacing the implicitly created normal module during
> Py_mod_exec (which is the only option available to Python modules),
> PEP 489 lets you define the Py_mod_create slot to override the module
> object creation directly.
>
> Outside conversion of a Python module that manipulates sys.modules to
> an extension module with Cython, there's no real reason to use the
> "replacing yourself in sys.modules" option over using Py_mod_create
> directly.

Ah, I got it.  We just want to ensure we match Python module behavior,
where there is no module-defined create step.  This would seem even
more important with tools like Cython that convert Python modules into
C extensions, even if the appropriate solution for a C extension
module would be a different approach (e.g. Py_mod_create).

  [snip]
>> Given your example below, "should not" seems a bit strong to me.  In
>> fact, what are the objections to encouraging the approach from the
>> example?
>
> Agreed, "should not" is probably too strong here. On the other hand,
> preserving compatibility with older Python versions in a module that
> has been updated to rely on multi-phase initialization is likely to be
> a matter of "graceful degradation", rather than being able to
> reproduce comparable functionality (which I believe may have been the
> point Petr was trying to convey).

Understood.  This section could stand to be clarified then.

>
> I expect Cython and SWIG may be able to manage that through
> appropriate use of #ifdef's in the generated code, but doing it by
> hand is likely to be painful, hence the potential benefits of just
> sticking with single-phase initialisation for the time being.

Hmm.  The example made it look relatively straightforward.
Regardless, it's not a big deal.

>
>>> [snip]
>>>
>>> Subinterpreters and Interpreter Reloading
>>> -----------------------------------------
>>>
>>> Extensions using the new initialization scheme are expected to support
>>> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
>>
>> Presumably this support is explicitly and completely defined in the
>> subsequent sentences.  Is it really just keeping "hidden" module state
>> encapsulated on the module object?  If not then it may make sense to
>> enumerate the requirements better for the sake of extension module
>> authors.
>
> I'd actually like to have a better way of doing scenario testing for
> extension modules (subinterpreters, multiple initialize/finalize
> cycles, freezing), but I'm not sure this PEP is the best place to
> define that. Perhaps we could do a PyPI project that was a tox-based
> test battery for this kind of thing?

Interesting idea.  I think that a lot of folks would find that useful.
It feels a bit like some of the work Dave Malcolm did with validating
extension modules.

>
>>> The mechanism is designed to make this easy, but care is still required
>>> on the part of the extension author.
>>> No user-defined functions, methods, or instances may leak to different
>>> interpreters.
>>> To achieve this, all module-level state should be kept in either the module
>>> dict, or in the module object's storage reachable by PyModule_GetState.
>>
>> Is this programmatically enforceable?  Is there any mechanism for
>> easily copying module state?  How about sharing some state between
>> subinterpreters?  How much room is there for letting extension module
>> authors define how their module behaves across multiple interpreters
>> or across multiple Initialize/Finalize cycles?
>
> It's not programmatically enforceable, hence the idea above of finding
> a way to make it easier for people to test their extension modules are
> importable across multiple Python versions and deployment scenarios.

That's what I figured.

>
>>> As a rule of thumb, modules that rely on PyState_FindModule are, at the
>>> moment,
>>> not good candidates for porting to the new mechanism.
>>
>> Are there any plans for a follow-up effort to help with this case?
>
> The problem here is that the PEP 3121 module state approach provides
> storage on a *per-interpreter* basis, that is then shared amongst all
> module instances created from a given module definition.

You mean a form of interpreter-local storage?  Also, the module
definition is effectively global, right?

>
> This means that when _PyImport_FindExtensionObject (see
> https://hg.python.org/cpython/file/fc2eed9fc2d0/Python/import.c#l518)
> reinitialises an extension module, the state is shared between the two
> instances. When PEP 3121 was written, this was not seen as a problem,
> since the expectation was that the behaviour would only be triggered
> by multiple interpreter level initialize/finalize cycles.
>
> One key scenario we missed at the time was "deleting an extension
> module from sys.modules and importing it a second time, while
> retaining a local reference for later restoration". Under PEP 3121,
> the two instances collide on their state storage, as we have two
> simultaneously existing module objects created in the same interpreter
> from the same module definition. PEP 489 would inherit that same
> problem if you tried to use it with the PyState_* APIs, so it simply
> doesn't allow them at all. (Earlier versions of the PEP allowed it
> with an "EXPORT_SINGLETON" slot that would disallow reimporting
> entirely, which we took out in favour of "just keep using the existing
> initialisation model in those cases for the time being")

That seems reasonable.

>
> For pure Python code, we don't have this problem, since the
> interpreter takes care of providing a properly scoped globals()
> reference to *all* functions defined in that module, regardless of
> whether they're module level functions or method definitions on a
> class. At the C level, we don't have that, as only module level
> functions get a module reference passed in - methods only get a
> reference to their class instance, without a reference to the module
> globals, and delayed callbacks can be a problem as well.

Yuck.  Is this something we could fix?  Is __module__ not set on all functions?

>
> The best improved API we could likely offer at this point is a
> convenience API for looking up a module in *sys.modules* based on a
> PyModuleDef instance, and updating PEP 489 to write the as-imported
> module name into the returned PyModuleDef structure. That's probably
> not a bad way to go, given that PEP 489 currently *ignores* the m_name
> slot - flipping it around to be a *writable* slot would be a way to
> let extension modules know dynamically how to look themselves up in
> sys.modules.

That sounds useful.

>
> The new lookup API would then be the moral equivalent of Python code
> doing "mod = sys.modules[__name__]". With this approach, actively
> *using* multiple references to a given module at the same time would
> still break (since you'll always get the module currently in
> sys.modules, even if that isn't the one you expected), but the
> "save-and-restore" model needed for certain kinds of testing and
> potentially other scenarios would work correctly.

Right, though I would expect there to be trouble if the replacement
module didn't support the module state API in the expected way.
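
For illustration, here is a minimal sketch of that save-and-restore
scenario using a pure Python module as a stand-in.  The choice of
colorsys is arbitrary and import_second_copy is a hypothetical helper,
not an existing API.  It only shows why pure Python code copes with two
live instances: each instance's functions carry their own globals.

```python
import importlib
import sys


def import_second_copy(name):
    """Import `name` a second time, keeping the first instance for restoration."""
    saved = importlib.import_module(name)
    del sys.modules[name]                      # hide the first instance
    try:
        fresh = importlib.import_module(name)  # creates a second instance
    finally:
        sys.modules[name] = saved              # restore the original
    return saved, fresh


saved, fresh = import_second_copy("colorsys")

# Two distinct module objects now exist in the same interpreter.
print(saved is fresh)
# Each function is bound to the globals of the instance that defined it.
print(fresh.rgb_to_hsv.__globals__ is fresh.__dict__)
print(saved.rgb_to_hsv.__globals__ is saved.__dict__)
```

An extension module relying on per-definition state would have no
equivalent of that per-function globals reference, which is the
collision being described.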

>
>>> Module Reloading
>>> ----------------
>>>
>>> Reloading an extension module using importlib.reload() will continue to
>>> have no effect, except re-setting import-related attributes.
>>>
>>> Due to limitations in shared library loading (both dlopen on POSIX and
>>> LoadLibraryEx on Windows), it is not generally possible to load
>>> a modified library after it has changed on disk.
>>>
>>> Use cases for reloading other than trying out a new version of the module
>>> are too rare to require all module authors to keep reloading in mind.
>>> If reload-like functionality is needed, authors can export a dedicated
>>> function for it.
>>
>> Keep in mind the semantics of reload for pure Python modules.  The
>> module is executed into the existing namespace, overwriting the loaded
>> namespace but leaving non-colliding attributes alone.  While the
>> semantics for reloading an extension/builtin/frozen module are
>> currently basic (i.e. a no-op), there may well be room to support
>> reload behavior that mirrors that of pure Python modules without
>> needing to reload an SO file.  I would expect either the behavior of
>> exec to get repeated (tricky due to "hidden" module state?) or for
>> there to be a "reload" slot that would mirror Py_mod_exec.
>
> We considered this, and decided it was fairly pointless, since you
> can't modify the extension module code. The one case I see where it
> potentially makes sense is a "transitive reload", where the extension
> module retrieves and caches attributes from another pure Python module
> at import time, and that extension module has been reloaded.

The reload approach specified in the PEP seems satisfactory at this point.
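
As a reference point, the pure Python reload semantics mentioned above
can be seen with a throwaway module.  This is only a sketch:
reload_demo is a hypothetical module written to a temporary directory
for the purpose of the example.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write a one-line module that defines a single name.
    with open(os.path.join(tmp, "reload_demo.py"), "w") as f:
        f.write("VALUE = 1\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        mod = importlib.import_module("reload_demo")
        mod.VALUE = 99       # collides with a name the module defines
        mod.extra = "kept"   # does not collide with anything
        reloaded = importlib.reload(mod)
        same_object = reloaded is mod   # reload reuses the module object
        value_after = mod.VALUE         # re-executed code overwrites this
        extra_after = mod.extra         # untouched by re-execution
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("reload_demo", None)

print(same_object, value_after, extra_after)
```

The module body is executed again into the same namespace, so names it
defines are reset while unrelated attributes survive.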

>
> It may also make a difference in the context of utilities like
> https://docs.python.org/3/library/test.html#test.support.import_fresh_module,
> where we manipulate the import system state to control how conditional
> imports are handled.
>
>> At the same time, one may argue that reloading modules is not
>> something to encourage. :)
>
> There's a reason import_fresh_module has never made it out of test.support :)
>
>>> Multiple modules in one library
>>> -------------------------------
>>>
>>> To support multiple Python modules in one shared library, the library can
>>> export additional PyInit* symbols besides the one that corresponds
>>> to the library's filename.
>>>
>>> Note that this mechanism can currently only be used to *load* extra modules,
>>> but not to *find* them.
>>
>> What do you mean by "currently"?
>
> It's a limitation of the way the existing finders work, rather than an
> inherent limitation of the import system as a whole.

Ah.  It sounded like the PEP was leading to some future solution to
resolve that.

>
>> It may also be worth tying the above statement with the following
>> text, since the following appears to be an explanation of how to
>> address the "finder" caveat.
>
> Agreed that this could be clearer.
>
>>> Testing and initial implementations
>>> -----------------------------------
>>>
>>> For testing, a new built-in module ``_testmultiphase`` will be created.
>>> The library will export several additional modules using the mechanism
>>> described in "Multiple modules in one library".
>>>
>>> The ``_testcapi`` module will be unchanged, and will use single-phase
>>> initialization indefinitely (or until it is no longer supported).
>>>
>>> The ``array`` and ``xx*`` modules will be converted to use multi-phase
>>> initialization as part of the initial implementation.
>>
>> What do you mean by "initial implementation"?  Will it be done
>> differently in a later implementation?
>
> These modules will be converted in the reference implementation, other
> modules won't be.

That's what I thought.  The use of the word "initial" threw me off.

>
>>> String constants and types can be handled similarly.
>>> (Note that non-default bases for types cannot be portably specified
>>> statically; this case would need a Py_mod_exec function that runs
>>> before the slots are added. The free error-checking would still be
>>> beneficial, though.)
>>
>> This implies to me that now is the time to ensure that this PEP
>> appropriately accommodates that need.  It would be unfortunate if we
>> had to later hack in some extra API to accommodate a use case we
>> already know about.  Better if we made sure the currently proposed
>> changes could accommodate the need, even if the implementation of that
>> part were not part of this PEP.
>
> This would be a new kind of execution slot, so the PEP already
> accommodates these possible future extensions.

Sounds good.  The explanation made it sound like a mechanism would be
required that could not be handled via a slot.

>
>>> Another possibility is providing a "main" function that would be run
>>> when the module is given to Python's -m switch.
>>> For this to work, the runpy module will need to be modified to take
>>> advantage of ModuleSpec-based loading introduced in PEP 451.
>>
>> I'll point out that the pure-Python equivalent has been proposed on a
>> number of occasions and been rejected every time.  However, in the
>> case of extension modules it is more justifiable.  If extension
>> modules gain such a mechanism then it may be a justification for doing
>> something similar in Python.
>>
>>> Also, it will be necessary to add a mechanism for setting up a module
>>> according to slots it wasn't originally defined with.
>>
>> What does this mean?
>
> When you use the -m switch, you always run in the builtin __main__
> module namespace, and runpy fiddles with __main__.__spec__ to match
> the details of the module passed to the switch. That's not currently a
> trick we can manage when the "thing to run" is an extension module.

I see now.
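
For comparison, this is roughly what runpy already does for a pure
Python module.  The sketch uses runpy.run_module on colorsys (an
arbitrary side-effect-free choice) rather than the real -m switch, so
it runs in a fresh namespace instead of the builtin __main__ module.

```python
import runpy

# Execute the module's code as if it were the main program.
namespace = runpy.run_module("colorsys", run_name="__main__")

# The code saw itself as __main__ ...
print(namespace["__name__"])
# ... while __spec__ still describes the module that was actually run.
print(namespace["__spec__"].name)
```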

-eric

From ericsnowcurrently at gmail.com  Wed May 20 02:22:47 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 19 May 2015 18:22:47 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <555B1937.5020001@gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com>
Message-ID: <CALFfu7DQs_njKixMVcXPXxXG=jmu1OAP1tD1k8FMwD=Zu0vC6w@mail.gmail.com>

On Tue, May 19, 2015 at 5:06 AM, Petr Viktorin <encukou at gmail.com> wrote:
> On 05/19/2015 05:51 AM, Nick Coghlan wrote:
>> On 19 May 2015 at 10:07, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin <encukou at gmail.com> wrote:
>>>> [snip]
>>>>
>>>> Furthermore, the majority of currently existing extension modules has
>>>> problems with sub-interpreter support and/or interpreter reloading, and,
>>>> while
>>>> it is possible with the current infrastructure to support these
>>>> features, it is neither easy nor efficient.
>>>> Addressing these issues was the goal of PEP 3121, but many extensions,
>>>> including some in the standard library, took the least-effort approach
>>>> to porting to Python 3, leaving these issues unresolved.
>>>> This PEP keeps backwards compatibility, which should reduce pressure and
>>>> give
>>>> extension authors adequate time to consider these issues when porting.
>>>
>>> So just to be sure I understand, now PyModuleDef.m_slots will
>>> unambiguously indicate whether or not an extension module is
>>> compliant, right?
>>
>> I'm not sure what you mean by "compliant". A non-NULL m_slots will
>> indicate usage of multi-phase initialisation, so it at least indicates
>> *intent* to correctly support subinterpreters et al. Actual delivery
>> on that promise is still a different question :)
>
> Yes, non-NULL m_slots means the module is compliant. If it's not, it's a
> bug in the *module* (i.e. compliance is not *just* a matter of
> setting m_slots).
> This will be explained in the docs.

Perfect.

>
>>>> [snip]
>>>>
>>>> The proposal
>>>> ============
>>>
>>> This section should include an indication of how the loader (and
>>> perhaps finder) will change for builtin, frozen, and extension
>>> modules.  It may help to describe the proposal up front by how the
>>> loader implementation would look if it were somehow implemented in
>>> Python code.  The subsequent sections sometimes indicate where
>>> different things take place, but an explicit outline (as Python code)
>>> would make the entire flow really obvious.  Putting that toward the
>>> beginning of this section would help clearly set the stage for the
>>> rest of the proposal.
>>
>> +1 for a pseudo-code overview of the loader implementation.
>
> OK. Along with a link to PEP 451 code [*], it should make things clearer.
> [*] https://www.python.org/dev/peps/pep-0451/#how-loading-will-work

Sounds good.

>
>>>> [snip]
>>>> Unknown slot IDs will cause the import to fail with SystemError.
>>>
>>> Was there any consideration made for just ignoring unknown slot IDs?
>>> My gut reaction is that you have it the right way, but I can still
>>> imagine use cases for custom slots that PyModuleDef_Init wouldn't know
>>> about.
>>
>> The "known slots only, all other slot IDs are reserved for future use"
>> slot semantics were copied directly from PyType_FromSpec in PEP 384.
>> Since it's just a numeric slot ID, you'd run a high risk of conflicts
>> if you allowed for custom extensions.
>>
>> If folks want to do more clever things, they'll need to use the create
>> or exec slot to stash them on the module object, rather than storing
>> them in the module definition.
>
> Right, if you need custom behavior, put it in a function and use the
> provided hook. (If you need custom "slots" on PyModuleDef for some
> reason, use a PyModuleDef subclass -- but I can't see where it would be
> helpful.)
> Ignoring unknown slot IDs would mean letting errors go unnoticed.

This is reasonable.  Thanks.
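
To make the "known slots only" rule concrete, here is a small Python
model of it.  This is a sketch, not the C implementation: run_slots and
KNOWN_SLOTS are hypothetical names, the slot ID values are
illustrative, and the create hook takes only a name instead of
(spec, def).

```python
import types

# Illustrative slot IDs; any other ID is treated as reserved.
Py_mod_create = 1
Py_mod_exec = 2
KNOWN_SLOTS = {Py_mod_create, Py_mod_exec}


def run_slots(name, slots):
    """Model of slot handling: validate, create the module, run exec slots."""
    create = None
    for slot_id, value in slots:
        if slot_id not in KNOWN_SLOTS:
            # Unknown IDs are an error rather than being silently ignored.
            raise SystemError(f"module {name}: unknown slot ID {slot_id}")
        if slot_id == Py_mod_create:
            create = value
    module = create(name) if create else types.ModuleType(name)
    for slot_id, value in slots:
        if slot_id == Py_mod_exec:
            value(module)
    return module


def add_answer(module):
    module.answer = 42


mod = run_slots("demo", [(Py_mod_exec, add_answer)])

try:
    run_slots("demo", [(99, None)])
except SystemError as exc:
    error = str(exc)
```

Custom behavior goes into the functions behind the create and exec
slots, as add_answer does here, rather than into new slot IDs.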

>
> (Technicality: PyModuleDef_Init doesn't care about slots;
> PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will raise the
> errors.)
>
>>> When using multi-phase initialization, the *m_name* field of PyModuleDef
>>> will
>>> not be used during importing; the module name will be taken from the
>>> ModuleSpec.
>>
>> So m_name will be strictly ignored by PyModuleDef_Init?
>
> Yes. The name is useful for introspection, but the import machinery will
> use the name provided by the ModuleSpec.

Okay.

>
> (Technicality: again, PyModuleDef_Init doesn't touch names at all.
> PyModule_FromDefAndSpec and PyModule_ExecDef do, and they will ignore
> the name from the def.)
>
>>>> The PyModuleDef object must be available for the lifetime of the module
>>>> created
>>>> from it - usually, it will be declared statically.
>>>
>>> How easily will this be a source of mysterious errors-at-a-distance?
>>
>> It shouldn't be any worse than static type definitions, and normal
>> reference counting semantics should keep it alive regardless.
>
> It's the same as the current behavior (PEP 3121), where a
> PyModuleDef is stored in the module, and if you let it die,
> PyModule_GetState will give you an invalid pointer. It's just that in
> PEP 489, the import machinery itself uses def, so you actually get to
> feel the pain if you deallocate it.
> All in all, this should not be a problem in practice; the PEP specifies
> what'll happen if you go off doing exotic things. (For example, Cython
> might run into this if it tries implementing a reloading scheme we
> talked about earlier in the thread, and even then it shouldn't be a
> major source of mysterious errors.) Normal mortals will be OK.

Thanks for explaining.  I'm less concerned now.

>
>>> [snip]
>>> However, only ModuleType instances support module-specific functionality
>>> such as per-module state.
>>
>> This is a pretty important point.  Presumably this constrains later
>> behavior and precedes all functionality related to per-module state.
>
> Yes. Module objects support more module-like behavior than other
> objects. What you can and cannot use should be clear from the API. I'll
> clarify a bit more what functionality depends on using a PyModule_Type
> (or subclass) instance.
> One thing I see I forgot to add is that execution slots are looked up
> via PyModule_GetDef, so they won't be processed on non-module objects.

Okay.  That makes sense now.

>
> It's a very good idea to use a module subclass rather than a completely
> custom object. The docs will need to strongly recommend this.

Agreed.  And the docs should also be clear on how non-module objects
are basically ignored, slot-wise.
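
The PEP 451 analogue of "use a module subclass" can be sketched in
Python.  DemoModule and DemoLoader are hypothetical names made up for
this example; create_module plays the role of Py_mod_create and
exec_module the role of Py_mod_exec.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import types


class DemoModule(types.ModuleType):
    """A module subclass: still a real module, with extra behavior."""

    @property
    def loaded_from(self):
        return self.__spec__.origin


class DemoLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Keep creation minimal: build the object, run no user code.
        return DemoModule(spec.name)

    def exec_module(self, module):
        # Initialization proper happens in the exec step.
        module.initialized = True


spec = importlib.machinery.ModuleSpec("demo_mod", DemoLoader(), origin="sketch")
module = importlib.util.module_from_spec(spec)   # create phase
spec.loader.exec_module(module)                  # exec phase

print(isinstance(module, types.ModuleType), module.loaded_from)
```

Because DemoModule is a ModuleType subclass, the import machinery can
set the usual import-related attributes on it without special cases.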

>
>>>> [snip]
>>>> Extension authors are advised to keep Py_mod_create minimal, and in
>>>> particular
>>>> to not call user code from it.
>>>
>>> This is a pretty important point as well.  We'll need to make sure
>>> this is sufficiently clear in the documentation.  Would it make sense
>>> to provide helpers for common cases, to encourage extension authors to
>>> keep the create function minimal?
>>
>> The main encouragement is to not handcode your extension modules at
>> all, and let something like Cython or SWIG take care of the
>> boilerplate :)
>
> Yes, Cython should be default. For hand-written modules, the common case
> should be not defining create at all.

The docs should be explicit about this.

>
>>>> [snip]
>>>>
>>>> If PyModuleExec replaces the module's entry in sys.modules,
>>>> the new object will be used and returned by importlib machinery.
>>>
>>> Just to be sure, something like "mod = sys.modules[modname]" is done
>>> before each execution slot.  In other words, the result of the
>>> previous execution slot should be used for the next one.
>>
>> That's not the original intent of this paragraph - rather, it is
>> referring to the existing behaviour of the import machinery.
>>
>> However, I agree that now we're allowing the Py_mod_exec slot to be
>> supplied multiple times, we should also be updating the module
>> reference between slot invocations.
>
> No, that won't work. It's possible (via direct calls to the import
> machinery) to load a module without adding it to sys.modules.

What direct calls do you mean?  I would not expect any such mechanism
to work properly with extension modules.
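
For a pure Python module, at least, such direct calls look like the
following sketch (colorsys is an arbitrary choice).  Whether the same
sequence behaves sensibly for an extension module is exactly the open
question here.

```python
import importlib.util
import sys

# Locate the module, then create and execute it by hand.
spec = importlib.util.find_spec("colorsys")
private = importlib.util.module_from_spec(spec)
spec.loader.exec_module(private)

# None of the calls above registered the new object in sys.modules.
in_sys_modules = any(m is private for m in sys.modules.values())
print(in_sys_modules)
```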

> The behavior should be clear (when you think about it) after I include
> the loader implementation pseudocode.

Okay.

>
>> I also think the PEP could do with a brief mention of the additional
>> modularity this approach brings at the C level - rather than having to
>> jam everything into one function, an extension module can easily break
>> up its initialisation into multiple steps, and it's technically even
>> possible to share common steps between different modules.
>
> Eh, I think it's better to create one function that calls the parts,
> which was always possible, and works just as well.
> Repeating slots is allowed because it would be an unnecessary bother to
> check for duplicates. It's not a feature to advertise, the PEP just
> specifies that in the weird edge case, the intuitive thing will happen.

Be that as it may, I think it would be a mistake to treat support for
multiple exec slots as a second-class citizen in the design.
Personally I find it an appealing feature.

>
> (I did have a useful future use case for repeated slots, but the current
> PEP allows a better and more obvious solution so I'll not even mention
> it again.)
>
> Still, the steps are processed in a loop from a single function
> (PyModule_ExecDef), and that function operates on a module object -- it
> doesn't know about sys.modules and can't easily check if you replaced
> the module somewhere.

I would consider this approach to be a mistake as well.  The approach
should stay consistent with the semantics of the whole import system,
where sys.modules is checked directly.  Unfortunately, that ship has
already sailed.
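
The difference can be shown with a small Python model.  It is only a
sketch: exec_def stands in for a loop that works on the module object
it was handed, and exec_demo, replace_in_sys_modules and record are
hypothetical names.

```python
import sys
import types

seen = []


def replace_in_sys_modules(module):
    # First exec step: swap a different object into sys.modules.
    sys.modules[module.__name__] = types.ModuleType(module.__name__)


def record(module):
    # Second exec step: remember which object it was handed.
    seen.append(module)


def exec_def(module, exec_steps):
    """Run every step on the same module object, with no sys.modules lookup."""
    for step in exec_steps:
        step(module)


original = types.ModuleType("exec_demo")
sys.modules["exec_demo"] = original
try:
    exec_def(original, [replace_in_sys_modules, record])
    replacement = sys.modules["exec_demo"]
finally:
    del sys.modules["exec_demo"]

# The second step still received the original object, not the replacement.
print(seen[0] is original, replacement is original)
```

A loop that re-read sys.modules between steps would instead hand the
replacement to the second step.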

>
>>>> (This mirrors the behavior of Python modules. Note that implementing
>>>> Py_mod_create is usually a better solution for the use cases this serves.)
>>>
>>> Could you elaborate?  What are those use cases and why would
>>> Py_mod_create be better?
>>
>> Rather than replacing the implicitly created normal module during
>> Py_mod_exec (which is the only option available to Python modules),
>> PEP 489 lets you define the Py_mod_create slot to override the module
>> object creation directly.
>>
>> Outside conversion of a Python module that manipulates sys.modules to
>> an extension module with Cython, there's no real reason to use the
>> "replacing yourself in sys.modules" option over using Py_mod_create
>> directly.
>
> Yes. The workaround you need to use in Python modules is possible for
> extensions, but there's no reason to use it. I'll try to make it clearer
> that it's an unnecessary workaround.

Thank you.

>
>>>> [snip]
>>>>
>>>> Modules that need to work unchanged on older versions of Python should not
>>>> use multi-phase initialization, because the benefits it brings can't be
>>>> back-ported.
>>>
>>> Given your example below, "should not" seems a bit strong to me.  In
>>> fact, what are the objections to encouraging the approach from the
>>> example?
>>
>> Agreed, "should not" is probably too strong here. On the other hand,
>> preserving compatibility with older Python versions in a module that
>> has been updated to rely on multi-phase initialization is likely to be
>> a matter of "graceful degradation", rather than being able to
>> reproduce comparable functionality (which I believe may have been the
>> point Petr was trying to convey).
>
> My point is that if you need graceful degradation, your best bet is to
> stick with single-phase init. Then you'll have one code path that works
> the same on all versions.
> If you *need* the features of multi-phase init, you need to remove
> support for Pythons that don't have it.
> If you need both backwards compatibility and multi-phase init, you
> essentially need to create two modules (with shared contents), and make
> sure they end up in the same state after they're loaded.
>
>> I expect Cython and SWIG may be able to manage that through
>> appropriate use of #ifdef's in the generated code, but doing it by
>> hand is likely to be painful, hence the potential benefits of just
>> sticking with single-phase initialisation for the time being.
>
> Yes, code generators are in a position to create two versions of the
> module, and select one using #ifdef.
>
> The example in the PEP is helpful for other reasons than encouraging
> #ifdef: it shows what needs to change when porting. Think of it as a diff :)

It may be worth being more clear about that.

>
>>>> [snip]
>>>>
>>>> Subinterpreters and Interpreter Reloading
>>>> -----------------------------------------
>>>>
>>>> Extensions using the new initialization scheme are expected to support
>>>> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly.
>>>
>>> Presumably this support is explicitly and completely defined in the
>>> subsequent sentences.  Is it really just keeping "hidden" module state
>>> encapsulated on the module object?  If not then it may make sense to
>>> enumerate the requirements better for the sake of extension module
>>> authors.
>
> It is explained in the docs, see "Bugs and caveats" here:
> https://docs.python.org/3/c-api/init.html#sub-interpreter-support
> I'll add a link to that page.

Cool.

>
>> I'd actually like to have a better way of doing scenario testing for
>> extension modules (subinterpreters, multiple initialize/finalize
>> cycles, freezing), but I'm not sure this PEP is the best place to
>> define that. Perhaps we could do a PyPI project that was a tox-based
>> test battery for this kind of thing?
>
> I think that's the wrong place to start. Currently, sub-interpreter
> support is buried away in a docs chapter about Python
> initialization/finalization, so a typical extension author won't even
> notice it. We need to first make it *possible* to support
> subinterpreters easily and correctly (so that Cython can do it), and to
> document it prominently in the "writing extensions" part of the docs,
> not only in "extending Python".
> This PEP does part of the first step, and the docs for it (which aren't
> written yet) will do the second step.
> After that, it could make sense to provide a tool for testing this.

There's nothing about the docs that precludes putting testing helpers
up on PyPI though.  However, I'm definitely +1 on improving the docs.

>
>>>> The mechanism is designed to make this easy, but care is still required
>>>> on the part of the extension author.
>>>> No user-defined functions, methods, or instances may leak to different
>>>> interpreters.
>>>> To achieve this, all module-level state should be kept in either the module
>>>> dict, or in the module object's storage reachable by PyModule_GetState.
>>>
>>> Is this programmatically enforceable?
>
> No. (I believe you could even prove this formally.)
>
>>> Is there any mechanism for easily copying module state?
>
> No. This would be impossible to provide in the general case. It's the
> responsibility of your C code.
> That said, if you need to copy module state, chances are your design
> could use some rethinking.
>
>>> How about sharing some state between subinterpreters?
>
> The PyCapsule API was designed for this.

I'm simply thinking in terms of the options we have for a PEP I'm
working on that will facilitate passing objects between
subinterpreters and even possibly sharing some state between them.
Currently it will be practically necessary to exclude extension
modules from any such mechanism.  So I was wondering if there would be
a way to allow extension module authors to define how at least some of
the module's data could be shared between subinterpreters.

>
>>> How much room is there for letting extension module
>>> authors define how their module behaves across multiple interpreters
>>> or across multiple Initialize/Finalize cycles?
>
> Technically, you have all the freedom you want. But if I embed Python
> into my project/library, I'd want multiple sub-interpreters completely
> isolated by default. If I use two libraries that each embed Python into
> my app, I definitely want them isolated.
> So the PEP tries to make it easy to keep multiple interpreters isolated.

As I just noted, I'm looking at making use of subinterpreters for a
different use case where it *does* make sense to effectively share
objects between them.

  [snip]
>>> At the same time, one may argue that reloading modules is not
>>> something to encourage. :)
>>
>> There's a reason import_fresh_module has never made it out of test.support :)
>
> Right. Implementation-wise, it would actually be much easier to support
> reload rather than make it a no-op. But then C module authors would need
> to think about this edge case, which might be tricky to get right, would
> not be likely to get test coverage, and is generally not useful anyway.
>
> If it turns out to be useful, it would be very simple to add an explicit
> reload slot in the future.

Agreed.

  [snip]
>> This section is missing any explanation of the impact on
>> Python/import.c, on the _imp/imp module, and on the 3 finders/loaders
>> in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension).
>
> I'll add a summary.
>
> The internal _imp module will have backwards incompatible changes --
> functions will be added and removed as necessary. That's what the
> underscore means :)

Be careful with that assumption.  We've had plenty of experiences
where the assumption became unreliable.

> The deprecated imp module will get a backwards compatibility shim for
> anything it imported from _imp that got removed.
> importlib will stay backwards compatible.
>
> Python/import.c and Python/importdl.* will be rewritten entirely.
> See the patches (linked from the PEP) for details.
>

-eric

From ericsnowcurrently at gmail.com  Wed May 20 02:33:03 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Tue, 19 May 2015 18:33:03 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <555B4B4A.5000902@redhat.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com> <555B4B4A.5000902@redhat.com>
Message-ID: <CALFfu7DEvT8vVA3AMLmp24gskeoMJwNK24TZ0g_FCFLkaW6CtQ@mail.gmail.com>

On Tue, May 19, 2015 at 8:40 AM, Petr Viktorin <pviktori at redhat.com> wrote:
> Here is an overview of how the modified importers will operate.
> Details such as logging or handling of errors and invalid states
> are left out, and C code is presented with a concise Python-like syntax.
>
> The framework that calls the importers is explained in PEP 451
> [#pep-0451-loading]_.

I know.  I wrote that PEP. :)

>
> importlib/_bootstrap.py:
>
>     class BuiltinImporter:
>         def create_module(self, spec):
>             module = _imp.create_builtin(spec)
>
>         def exec_module(self, module):
>             _imp.exec_dynamic(module)
>
>         def load_module(self, name):
>             # use a backwards compatibility shim
>             _load_module_shim(self, name)

Won't frozen modules be likewise affected?

>
> importlib/_bootstrap_external.py:
>
>     class ExtensionFileLoader:
>         def create_module(self, spec):
>             module = _imp.create_dynamic(spec)
>
>         def exec_module(self, module):
>             _imp.exec_dynamic(module)
>
>         def load_module(self, name):
>             # use a backwards compatibility shim
>             _load_module_shim(self, name)
>
> Python/import.c (the _imp module):
>
>     def create_dynamic(spec):
>         name = spec.name
>         path = spec.origin
>
>         # Find an already loaded module that used single-phase init.
>         # For multi-phase initialization, mod is NULL, so a new module
>         # is always created.
>         mod = _PyImport_FindExtensionObject(name, name)
>         if mod:
>             return mod
>
>         return _PyImport_LoadDynamicModuleWithSpec(spec)
>
>     def exec_dynamic(module):
>         def = PyModule_GetDef(module)

This is the point where custom module types get ignored, right?

>         state = PyModule_GetState(module)
>         if state is NULL:
>             PyModule_ExecDef(module, def)

Ah, it is idempotent.

>
>     def create_builtin(spec):
>         name = spec.name
>
>         # Find an already loaded module that used single-phase init.
>         # For multi-phase initialization, mod is NULL, so a new module
>         # is always created.
>         mod = _PyImport_FindExtensionObject(name, name)
>         if mod:
>             return mod
>
>         for initname, initfunc in PyImport_Inittab:
>             if name == initname:
>                 m = initfunc()
>                 if isinstance(m, PyModuleDef):
>                     def = m
>                     return PyModule_FromDefAndSpec(def, spec)
>                 else:
>                     # fall back to single-phase initialization
>                     module = m
>                     _PyImport_FixupExtensionObject(module, name, name)
>                     return module
>
> Python/importdl.c:
>
>     def _PyImport_LoadDynamicModuleWithSpec(spec):
>         path = spec.origin
>         package, dot, name = spec.name.rpartition('.')
>
>         # see the "Non-ASCII module names" section for export_hook_name
>         hook_name = export_hook_name(name)
>
>         # call platform-specific function for loading exported function
>         # from shared library
>         exportfunc = _find_shared_funcptr(hook_name, path)
>
>         m = exportfunc()
>         if isinstance(m, PyModuleDef):
>             def = m
>             return PyModule_FromDefAndSpec(def, spec)
>
>         module = m
>
>         # fall back to single-phase initialization
>         ....
>
> Objects/moduleobject.c:
>
>     def PyModule_FromDefAndSpec(def, spec):
>         name = spec.name
>         create = None
>         for slot, value in def.m_slots:
>             if slot == Py_mod_create:
>                 create = value
>         if create:
>             m = create(spec, def)
>         else:
>             m = PyModule_New(name)
>
>         if isinstance(m, types.ModuleType):
>             m.md_state = None
>             m.md_def = def
>
>         if def.m_methods:
>             PyModule_AddFunctions(m, def.m_methods)
>         if def.m_doc:
>             PyModule_SetDocString(m, def.m_doc)
>
>     def PyModule_ExecDef(module, def):
>         if isinstance(module, types.ModuleType):
>             if module.md_state is NULL:
>                 # allocate a block of zeroed-out memory
>                 module.md_state = _alloc(def.m_size)
>
>         if def.m_slots is NULL:
>             return
>
>         for slot, value in def.m_slots:
>             if slot == Py_mod_exec:
>                 value(module)
>
>

It may also be worth outlining how PyModuleDef_Init will work.

-eric

From encukou at gmail.com  Wed May 20 10:41:30 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 20 May 2015 10:41:30 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7DEvT8vVA3AMLmp24gskeoMJwNK24TZ0g_FCFLkaW6CtQ@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>	<CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>	<CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>	<555B1937.5020001@gmail.com>	<555B4B4A.5000902@redhat.com>
 <CALFfu7DEvT8vVA3AMLmp24gskeoMJwNK24TZ0g_FCFLkaW6CtQ@mail.gmail.com>
Message-ID: <555C48BA.4080204@gmail.com>

On 05/20/2015 02:33 AM, Eric Snow wrote:
> On Tue, May 19, 2015 at 8:40 AM, Petr Viktorin <pviktori at redhat.com> wrote:
>> Here is an overview of how the modified importers will operate.
>> Details such as logging or handling of errors and invalid states
>> are left out, and C code is presented with a concise Python-like syntax.
>>
>> The framework that calls the importers is explained in PEP 451
>> [#pep-0451-loading]_.
> 
> I know.  I wrote that PEP. :)
> 
>>
>> importlib/_bootstrap.py:
>>
>>     class BuiltinImporter:
>>         def create_module(self, spec):
>>             module = _imp.create_builtin(spec)
>>
>>         def exec_module(self, module):
>>             _imp.exec_dynamic(module)
>>
>>         def load_module(self, name):
>>             # use a backwards compatibility shim
>>             _load_module_shim(self, name)
> 
> Won't frozen modules be likewise affected?

No, frozen modules are Python source, just not loaded from a file.

[...]
>> Python/import.c (the _imp module):
>>
>>     def create_dynamic(spec):
>>         name = spec.name
>>         path = spec.origin
>>
>>         # Find an already loaded module that used single-phase init.
>>         # For multi-phase initialization, mod is NULL, so a new module
>>         # is always created.
>>         mod = _PyImport_FindExtensionObject(name, name)
>>         if mod:
>>             return mod
>>
>>         return _PyImport_LoadDynamicModuleWithSpec(spec)
>>
>>     def exec_dynamic(module):
>>         def = PyModule_GetDef(module)
> 
> This is the point where custom module types get ignored, right?

Yes. The actual code has a check for non-modules, to skip exec_dynamic
rather than have PyModule_GetDef raise. I'll add this to the overview to
make things clearer.

>>         state = PyModule_GetState(module)
>>         if state is NULL:
>>             PyModule_ExecDef(module, def)
> 
> Ah, it is idempotent.

Yes, this is the part that disables reload().
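
A rough Python model of that guard, as a sketch only: FakeModule,
exec_def and exec_dynamic below are hypothetical stand-ins for the C
objects and functions, not real API.

```python
class FakeModule:
    """Hypothetical stand-in for a C module object; not a real API."""

    def __init__(self, name):
        self.name = name
        self.md_state = None   # plays the role of the NULL state pointer
        self.exec_calls = 0


def exec_def(module):
    # Models PyModule_ExecDef: allocate the state, then run the exec slots.
    module.md_state = {}
    module.exec_calls += 1


def exec_dynamic(module):
    # Models _imp.exec_dynamic: run only while the state is still unset,
    # so a second call (e.g. from reload()) does nothing.
    if module.md_state is None:
        exec_def(module)


mod = FakeModule("example")
exec_dynamic(mod)
exec_dynamic(mod)  # no-op: the state already exists
```

The second call finds the state already allocated and returns without
running the exec step again.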

[...]
> It may also be worth outlining how PyModuleDef_Init will work.

That's hard to do in Python syntax, since most of what it does is ensure
the def is a valid PyObject. I'll explain it in a different section.
It's a very small, idempotent function:

PyObject*
PyModuleDef_Init(struct PyModuleDef* def)
{
    if (def->m_base.m_index == 0) {
        max_module_number++;
        Py_REFCNT(def) = 1;
        Py_TYPE(def) = &PyModuleDef_Type;
        def->m_base.m_index = max_module_number;
    }
    return (PyObject*)def;
}

The code is lifted straight from PyModule_Create2.

The m_index is bookkeeping for PyState_FindModule, so it's unused
for modules with multi-phase init, but I didn't want to break the
invariant that it's set up together with Py_TYPE.
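
For readers who prefer the Python-like notation used in the overview,
the same idempotence can be modelled roughly as below. FakeModuleDef
and module_def_init are made-up names for illustration, and the
counter stands in for the global max_module_number.

```python
import itertools

_module_numbers = itertools.count(1)  # models the global max_module_number


class FakeModuleDef:
    """Hypothetical stand-in for struct PyModuleDef; not a real API."""

    def __init__(self, name):
        self.name = name
        self.m_index = 0        # 0 means "not initialized yet"
        self.refcnt = 0
        self.type_name = None


def module_def_init(d):
    # Only the first call touches the def; later calls return it unchanged.
    if d.m_index == 0:
        d.refcnt = 1
        d.type_name = "PyModuleDef_Type"
        d.m_index = next(_module_numbers)
    return d
```

Calling module_def_init twice on the same object leaves m_index at the
value assigned by the first call.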

-- 
Petr Viktorin


From encukou at gmail.com  Wed May 20 12:55:37 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 20 May 2015 12:55:37 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7DQs_njKixMVcXPXxXG=jmu1OAP1tD1k8FMwD=Zu0vC6w@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com>
 <CALFfu7DQs_njKixMVcXPXxXG=jmu1OAP1tD1k8FMwD=Zu0vC6w@mail.gmail.com>
Message-ID: <555C6829.60901@gmail.com>

On 05/20/2015 02:22 AM, Eric Snow wrote:
> On Tue, May 19, 2015 at 5:06 AM, Petr Viktorin <encukou at gmail.com> wrote:
>> On 05/19/2015 05:51 AM, Nick Coghlan wrote:
>>> On 19 May 2015 at 10:07, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>>> On Mon, May 18, 2015 at 8:02 AM, Petr Viktorin <encukou at gmail.com> wrote:

[snip]
>>>>>
>>>>> If PyModuleExec replaces the module's entry in sys.modules,
>>>>> the new object will be used and returned by importlib machinery.
>>>>
>>>> Just to be sure, something like "mod = sys.modules[modname]" is done
>>>> before each execution slot.  In other words, the result of the
>>>> previous execution slot should be used for the next one.
>>>
>>> That's not the original intent of this paragraph - rather, it is
>>> referring to the existing behaviour of the import machinery.
>>>
>>> However, I agree that now we're allowing the Py_mod_exec slot to be
>>> supplied multiple times, we should also be updating the module
>>> reference between slot invocations.
>>
>> No, that won't work. It's possible (via direct calls to the import
>> machinery) to load a module without adding it to sys.modules.
> 
> What direct calls do you mean?  I would not expect any such mechanism
> to work properly with extension modules.

Reimplement
<https://www.python.org/dev/peps/pep-0451/#how-loading-will-work>
without the sys.modules parts.
The point is that exec_module doesn't a priori depend on the module
being in sys.modules, which I think is a good thing.
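
A minimal sketch of such a direct call, assuming the importlib.util
helpers available in Python 3.5 (find_spec, module_from_spec). The
function name load_detached is made up, and the pure-Python colorsys
module is used only because it is small.

```python
import importlib.util
import sys


def load_detached(name):
    """Load a module by name without registering it in sys.modules."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named %r" % name)
    module = importlib.util.module_from_spec(spec)  # create_module step
    spec.loader.exec_module(module)                 # exec_module step
    return module


detached = load_detached("colorsys")
```

The returned object is fully executed, yet sys.modules never refers to
it, so nothing inside exec_module could have looked it up there.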

>> The behavior should be clear (when you think about it) after I include
>> the loader implementation pseudocode.
> 
> Okay.
> 
>>
>>> I also think the PEP could do with a brief mention of the additional
>>> modularity this approach brings at the C level - rather than having to
>>> jam everything into one function, an extension module can easily break
>>> up its initialisation into multiple steps, and it's technically even
>>> possible to share common steps between different modules.
>>
>> Eh, I think it's better to create one function that calls the parts,
>> which was always possible, and works just as well.
>> Repeating slots is allowed because it would be an unnecessary bother to
>> check for duplicates. It's not a feature to advertise, the PEP just
>> specifies that in the weird edge case, the intuitive thing will happen.
> 
> Be that as it may, I think it would be a mistake to treat support for
> multiple exec slots as a second-class citizen in the design.
> Personally I find it an appealing feature.

It's there, but I'll not advertise it too much in the docs.

>> (I did have a useful future use case for repeated slots, but the current
>> PEP allows a better and more obvious solution so I'll not even mention
>> it again.)
>>
>> Still, the steps are processed in a loop from a single function
>> (PyModule_ExecDef), and that function operates on a module object -- it
>> doesn't know about sys.modules and can't easily check if you replaced
>> the module somewhere.
> 
> I would consider this approach to be a mistake as well.  The approach
> should stay consistent with the semantics of the whole import system,
> where sys.modules is checked directly.  Unfortunately, that ship has
> already sailed.

It's the loader that checks sys.modules, *after* exec_module is called.
No other implementation of exec_module checks sys.modules in the middle
of its operation. So I think the semantics are consistent.
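
The existing behaviour can be observed with a pure-Python module that
replaces its own sys.modules entry while it runs. This is only a
sketch: the module name selfreplacing_demo and the Replacement class
are invented for the example.

```python
import importlib
import pathlib
import sys
import tempfile

# Source of a throwaway module that swaps itself out while it executes.
SOURCE = """\
import sys

class Replacement:
    marker = "replaced"

sys.modules[__name__] = Replacement()
"""


def import_self_replacing(name="selfreplacing_demo"):
    """Import a temporary module and return whatever the import yields."""
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, name + ".py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module(name)
        finally:
            # Clean up so the demo leaves no trace in the interpreter.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


result = import_self_replacing()
```

The import hands back the Replacement instance, because the lookup in
sys.modules happens only once execution has finished.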

[snip]
>>>>>
>>>>> Modules that need to work unchanged on older versions of Python should not
>>>>> use multi-phase initialization, because the benefits it brings can't be
>>>>> back-ported.
>>>>
>>>> Given your example below, "should not" seems a bit strong to me.  In
>>>> fact, what are the objections to encouraging the approach from the
>>>> example?
>>>
>>> Agreed, "should not" is probably too strong here. On the other hand,
>>> preserving compatibility with older Python versions in a module that
>>> has been updated to rely on multi-phase initialization is likely to be
>>> a matter of "graceful degradation", rather than being able to
>>> reproduce comparable functionality (which I believe may have been the
>>> point Petr was trying to convey).
>>
>> My point is that if you need graceful degradation, your best bet is to
>> stick with single-phase init. Then you'll have one code path that works
>> the same on all versions.
>> If you *need* the features of multi-phase init, you need to remove
>> support for Pythons that don't have it.
>> If you need both backwards compatibility and multi-phase init, you
>> essentially need to create two modules (with shared contents), and make
>> sure they end up in the same state after they're loaded.
>>
>>> I expect Cython and SWIG may be able to manage that through
>>> appropriate use of #ifdef's in the generated code, but doing it by
>>> hand is likely to be painful, hence the potential benefits of just
>>> sticking with single-phase initialisation for the time being.
>>
>> Yes, code generators are in a position to create two versions of the
>> module, and select one using #ifdef.
>>
>> The example in the PEP is helpful for other reasons than encouraging
>> #ifdef: it shows what needs to change when porting. Think of it as a diff :)
> 
> It may be worth being more clear about that.

OK

[snip]
>>>>> The mechanism is designed to make this easy, but care is still required
>>>>> on the part of the extension author.
>>>>> No user-defined functions, methods, or instances may leak to different
>>>>> interpreters.
>>>>> To achieve this, all module-level state should be kept in either the module
>>>>> dict, or in the module object's storage reachable by PyModule_GetState.
>>>>
>>>> Is this programmatically enforceable?
>>
>> No. (I believe you could even prove this formally.)
>>
>>>> Is there any mechanism for easily copying module state?
>>
>> No. This would be impossible to provide in the general case. It's the
>> responsibility of your C code.
>> That said, if you need to copy module state, chances are your design
>> could use some rethinking.
>>
>>>> How about sharing some state between subinterpreters?
>>
>> The PyCapsule API was designed for this.
> 
> I'm simply thinking in terms of the options we have for a PEP I'm
> working on that will facilitate passing objects between
> subinterpreters and even possibly sharing some state between them.
> Currently it will be practically necessary to exclude extension
> modules from any such mechanism.  So I was wondering if there would be
> a way to allow extension module authors to define how at least some of
> the module's data could be shared between subinterpreters.

You should be able to put that info in slots. It's hard to speculate
without knowing specifics, though.

>>>> How much room is there for letting extension module
>>>> authors define how their module behaves across multiple interpreters
>>>> or across multiple Initialize/Finalize cycles?
>>
>> Technically, you have all the freedom you want. But if I embed Python
>> into my project/library, I'd want multiple sub-interpreters completely
>> isolated by default. If I use two libraries that each embed Python into
>> my app, I definitely want them isolated.
>> So the PEP tries to make it easy to keep multiple interpreters isolated.
> 
> As I just noted, I'm looking at making use of subinterpreters for a
> different use case where it *does* make sense to effectively share
> objects between them.

OK. This PEP isn't designed for that, but it should offer enough
extensibility.

[snip]
>>> This section is missing any explanation of the impact on
>>> Python/import.c, on the _imp/imp module, and on the 3 finders/loaders
>>> in Lib/importlib/_bootstrap[_external].py (builtin/frozen/extension).
>>
>> I'll add a summary.
>>
>> The internal _imp module will have backwards incompatible changes --
>> functions will be added and removed as necessary. That's what the
>> underscore means :)
> 
> Be careful with that assumption.  We've had plenty of experiences
> where the assumption became unreliable.

That's why I provide backcompat shims for undocumented, deprecated
functions in "imp". But _imp is just too low-level to do that easily.


From encukou at gmail.com  Wed May 20 13:08:53 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 20 May 2015 13:08:53 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7CY3VL8VrSvSfHn=n+1i_KBusNkWpQ=nHkosKD4rGQhww@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <CALFfu7CY3VL8VrSvSfHn=n+1i_KBusNkWpQ=nHkosKD4rGQhww@mail.gmail.com>
Message-ID: <555C6B45.9070001@gmail.com>

On 05/20/2015 01:56 AM, Eric Snow wrote:
> On Mon, May 18, 2015 at 9:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 19 May 2015 at 10:07, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>   [snip]
>>> Was there any consideration made for just ignoring unknown slot IDs?
>>> My gut reaction is that you have it the right way, but I can still
>>> imagine use cases for custom slots that PyModuleDef_Init wouldn't know
>>> about.
>>
>> The "known slots only, all other slot IDs are reserved for future use"
>> slot semantics were copied directly from PyType_FromSpec in PEP 384.
>> Since it's just a numeric slot ID, you'd run a high risk of conflicts
>> if you allowed for custom extensions.
>>
>> If folks want to do more clever things, they'll need to use the create
>> or exec slot to stash them on the module object, rather than storing
>> them in the module definition.
> 
> Makes sense.  This does remind me of something I wanted to ask.  Would
> it make sense to leverage ModuleSpec.loader_state?  If I recall
> correctly, we added loader_state with extension modules in mind.

I don't think we want to go out of our way to support non-module
objects. Module subclasses should cover any needed functionality, and
they will support slots.

>>>> [snip]
>>>> Extension authors are advised to keep Py_mod_create minimal, and in
>>>> particular
>>>> to not call user code from it.
>>>
>>> This is a pretty important point as well.  We'll need to make sure
>>> this is sufficiently clear in the documentation.  Would it make sense
>>> to provide helpers for common cases, to encourage extension authors to
>>> keep the create function minimal?
>>
>> The main encouragement is to not handcode your extension modules at
>> all, and let something like Cython or SWIG take care of the
>> boilerplate :)
> 
> Hey, I tried to make something happen over on python-ideas! :)  Some
> folks just don't want to go far enough.

Yeah, as someone who's trying to get Python3 porting patches to Samba, I
can tell you some upstreams really, really, really don't like rewriting
their code.

>>>> As a rule of thumb, modules that rely on PyState_FindModule are, at the
>>>> moment,
>>>> not good candidates for porting to the new mechanism.
>>>
>>> Are there any plans for a follow-up effort to help with this case?
>>
>> The problem here is that the PEP 3121 module state approach provides
>> storage on a *per-interpreter* basis, that is then shared amongst all
>> module instances created from a given module definition.
> 
> You mean a form of interpreter-local storage?  Also, the module
> definition is effectively global right?

The PyModuleDef is global and static, but you can create any number of
module objects from it.
Each interpreter gets its own module object, with state specific to the
module object. (And with a custom finder/loader you can make multiple
modules from the same def within one interpreter.)

>> For pure Python code, we don't have this problem, since the
>> interpreter takes care of providing a properly scoped globals()
>> reference to *all* functions defined in that module, regardless of
>> whether they're module level functions or method definitions on a
>> class. At the C level, we don't have that, as only module level
>> functions get a module reference passed in - methods only get a
>> reference to their class instance, without a reference to the module
>> globals, and delayed callbacks can be a problem as well.
> 
> Yuck.  Is this something we could fix?  Is __module__ not set on all functions?

The module object is not stored on classes, so methods don't have access
to it. I want a fix for that to be my next PEP :)


-- 
Petr Viktorin

From encukou at gmail.com  Wed May 20 13:34:04 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Wed, 20 May 2015 13:34:04 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization;
	version 6
Message-ID: <CA+=+wqAsvYgKy+f2zD5F+i02KRgAQ+2T=yZOeFNRaDp9MBpnaA@mail.gmail.com>

Hello,
Based mainly on comments by Eric Snow, I've sent another update to PEP 489.

See the diff at https://hg.python.org/peps/rev/aad7a39a695b

Here is a copy for your convenience:

PEP: 489
Title: Multi-phase extension module initialization
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin <encukou at gmail.com>,
        Stefan Behnel <stefan_ml at behnel.de>,
        Nick Coghlan <ncoghlan at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which built-in and extension modules
interact with the import machinery. This was last revised for Python 3.0 in PEP
3121, but did not solve all problems at the time. The goal is to solve
import-related problems by bringing extension modules closer to the way Python
modules behave; specifically to hook into the ModuleSpec-based loading
mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension
authors to only define features they need, and to allow future additions
to extension module declarations.

Extension modules are created in a two-step process, fitting better into
the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in
the module that is covered by normal garbage collection and supports
reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.

The proposal also allows extension modules with non-ASCII names.

Not all problems tackled in PEP 3121 are solved in this proposal.
In particular, problems with run-time module lookup (PyState_FindModule)
are left to a future PEP.


Motivation
==========

Python modules and extension modules are not being set up in the same way.
For Python modules, the module object is created and set up first, then the
module code is being executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.

For extensions (i.e. shared libraries) and built-in modules, the module
init function is executed straight away and does both the creation and
initialization. The initialization function is not passed the ModuleSpec,
or any information it contains, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

In Py3, modules are also not being added to sys.modules, which means that a
(potentially transitive) re-import of the module will really try to re-import
it and thus run into an infinite loop when it executes the module init function
again. Without access to the fully-qualified module name, it is not trivial to
correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of "__init__.py" modules, i.e. packages,
especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules has
problems with sub-interpreter support and/or interpreter reloading, and, while
it is possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps backwards compatibility, which should reduce pressure and give
extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension and built-in modules export an initialization function
named "PyInit_modulename", named after the file name of the shared library.
This function is executed by the import machinery and must return a fully
initialized module object.
The function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef object. It then continues to initialize it by adding
attributes to the module dict, creating types, etc.

In the back, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the fully qualified name of the module object. This is not entirely safe as it
relies on the module init function creating its own module object first,
but this assumption usually holds in practice.


The proposal
============

The initialization function (PyInit_modulename) will be allowed to return
a pointer to a PyModuleDef object. The import machinery will be in charge
of constructing the module object, calling hooks provided in the PyModuleDef
in the relevant phases of initialization (as described below).

This multi-phase initialization is an additional possibility. Single-phase
initialization, the current practice of returning a fully initialized module
object, will still be accepted, so existing code will work unchanged,
including binary compatibility.

The PyModuleDef structure will be changed to contain a list of slots,
similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure
(which would introduce additional supporting functions and per-module storage),
the currently unused m_reload pointer of PyModuleDef will be changed to
hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of
PyModuleDef_Slot structures, terminated by a slot with id set to 0
(i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided.
New Python versions may introduce new slot IDs, but slot IDs will never be
recycled. Slots may get deprecated, but will continue to be supported
throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the
slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.
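
The slot rules above can be modelled in Python roughly as follows. This
is an illustrative sketch, not the C implementation: check_slots is a
made-up name, and the numeric values given to the two slot IDs are
assumptions chosen for the example.

```python
PY_MOD_CREATE = 1   # assumed numeric values, for illustration only
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def check_slots(slots):
    """Walk a slot array terminated by (0, None); reject unknown IDs."""
    seen = []
    for slot_id, value in slots:
        if slot_id == 0:
            break                      # sentinel entry ends the array
        if slot_id not in KNOWN_SLOTS:
            raise SystemError("module uses unknown slot ID %d" % slot_id)
        seen.append(slot_id)
    return seen
```

A repeated Py_mod_exec entry passes the check, while any ID outside the
known set aborts the walk with SystemError.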

When using multi-phase initialization, the *m_name* field of PyModuleDef will
not be used during importing; the module name will be taken from the ModuleSpec.

To prevent crashes when the module is loaded in older versions of Python,
the PyModuleDef object must be initialized using the newly added
PyModuleDef_Init function. This sets the object type (which cannot be done
statically on certain compilers), refcount, and internal bookkeeping data
(m_index).
For example, an extension module "example" would be exported as::

    static PyModuleDef example_def = {...};

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModuleDef_Init(&example_def);
    }

The PyModuleDef object must be available for the lifetime of the module created
from it -- usually, it will be declared statically.

Pseudo-code Overview
--------------------

Here is an overview of how the modified importers will operate.
Details such as logging or handling of errors and invalid states
are left out, and C code is presented with a concise Python-like syntax.

The framework that calls the importers is explained in PEP 451
[#pep-0451-loading]_.

::

    importlib/_bootstrap.py:

        class BuiltinImporter:
            def create_module(self, spec):
                return _imp.create_builtin(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                return _load_module_shim(self, name)

    importlib/_bootstrap_external.py:

        class ExtensionFileLoader:
            def create_module(self, spec):
                return _imp.create_dynamic(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                return _load_module_shim(self, name)

    Python/import.c (the _imp module):

        def create_dynamic(spec):
            name = spec.name
            path = spec.origin

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            return _PyImport_LoadDynamicModuleWithSpec(spec)

        def exec_dynamic(module):
            if not isinstance(module, types.ModuleType):
                # non-modules are skipped -- PyModule_GetDef fails on them
                return

            def = PyModule_GetDef(module)
            state = PyModule_GetState(module)
            if state is NULL:
                PyModule_ExecDef(module, def)

        def create_builtin(spec):
            name = spec.name

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            for initname, initfunc in PyImport_Inittab:
                if name == initname:
                    m = initfunc()
                    if isinstance(m, PyModuleDef):
                        def = m
                        return PyModule_FromDefAndSpec(def, spec)
                    else:
                        # fall back to single-phase initialization
                        module = m
                        _PyImport_FixupExtensionObject(module, name, name)
                        return module

    Python/importdl.c:

        def _PyImport_LoadDynamicModuleWithSpec(spec):
            path = spec.origin
            package, dot, name = spec.name.rpartition('.')

            # see the "Non-ASCII module names" section for export_hook_name
            hook_name = export_hook_name(name)

            # call platform-specific function for loading exported function
            # from shared library
            exportfunc = _find_shared_funcptr(hook_name, path)

            m = exportfunc()
            if isinstance(m, PyModuleDef):
                def = m
                return PyModule_FromDefAndSpec(def, spec)

            module = m

            # fall back to single-phase initialization
            ....

    Objects/moduleobject.c:

        def PyModule_FromDefAndSpec(def, spec):
            name = spec.name
            create = None
            for slot, value in def.m_slots:
                if slot == Py_mod_create:
                    create = value
            if create:
                m = create(spec, def)
            else:
                m = PyModule_New(name)

            if isinstance(m, types.ModuleType):
                m.md_state = None
                m.md_def = def

            if def.m_methods:
                PyModule_AddFunctions(m, def.m_methods)
            if def.m_doc:
                PyModule_SetDocString(m, def.m_doc)

            return m

        def PyModule_ExecDef(module, def):
            if isinstance(module, types.ModuleType):
                if module.md_state is NULL:
                    # allocate a block of zeroed-out memory
                    module.md_state = _alloc(def.m_size)

            if def.m_slots is NULL:
                return

            for slot, value in def.m_slots:
                if slot == Py_mod_exec:
                    value(module)


Module Creation Phase
---------------------

Creation of the module object -- that is, the implementation of
ExecutionLoader.create_module -- is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451,
and the PyModuleDef structure.
It should return a new module object, or set an error
and return NULL.

This function is not responsible for setting import-related attributes
specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes.
However, only ModuleType instances support module-specific functionality
such as per-module state.

Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively), may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in particular
to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import
will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal
module object using PyModule_New. The name is taken from *spec*.
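
As a rough illustration of the creation rules described in this section, the
slot scan can be modeled in plain Python. This is only a sketch: the numeric
slot IDs, the ``create_from_slots`` helper, and the ``SimpleNamespace``
stand-in for a ModuleSpec are assumptions made for the example and are not
part of the proposed C API. The ``def`` argument of the real create function
is omitted here.

```python
import types

# Illustrative slot IDs; the real values are C macros (assumed here).
PY_MOD_CREATE = 1
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def create_from_slots(slots, spec):
    """Model of the creation phase for a list of (slot_id, value) pairs."""
    create = None
    for slot_id, value in slots:
        if slot_id not in KNOWN_SLOTS:
            raise SystemError("unknown slot ID %r" % (slot_id,))
        if slot_id == PY_MOD_CREATE:
            if create is not None:
                raise SystemError("multiple Py_mod_create slots")
            create = value
    if create is not None:
        return create(spec)
    # No Py_mod_create slot: build a plain module named after the spec.
    return types.ModuleType(spec.name)


spec = types.SimpleNamespace(name="example")

# Without a create slot, a normal module object is produced.
default_module = create_from_slots([(PY_MOD_EXEC, lambda m: 0)], spec)

# With a create slot, whatever object the function returns is used.
custom_module = create_from_slots(
    [(PY_MOD_CREATE, lambda s: types.SimpleNamespace(origin=s.name))], spec)
```

In this model a duplicate Py_mod_create entry or an unrecognized slot ID
raises SystemError before any module object is created.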


Post-creation steps
...................

If the Py_mod_create function returns an instance of types.ModuleType
or a subclass (or if a Py_mod_create slot is not present), the import
machinery will associate the PyModuleDef with the module.
This also makes the PyModuleDef accessible to the execution phase, the
PyModule_GetDef function, and garbage collection routines (traverse,
clear, free).

If the Py_mod_create function does not return a module subclass, then m_size
must be 0, and m_traverse, m_clear and m_free must all be NULL.
Otherwise, SystemError is raised.

Additionally, initial attributes specified in the PyModuleDef are set on the
module object, regardless of its type:

* The docstring is set from m_doc, if non-NULL.
* The module's functions are initialized from m_methods, if any.


Module Execution Phase
----------------------

Module execution -- that is, the implementation of
ExecutionLoader.exec_module -- is governed by "execution slots".
This PEP only adds one, Py_mod_exec, but others may be added in the future.

The execution phase is done on the PyModuleDef associated with the module
object. For objects that are not a subclass of PyModule_Type (for which
PyModule_GetDef would fail), the execution phase is skipped.

Execution slots may be specified multiple times, and are processed in the order
they appear in the slots array.
When using the default import machinery, they are processed after
import-related attributes specified in PEP 451 [#pep-0451-attributes]_
(such as ``__name__`` or ``__loader__``) are set and the module is added
to sys.modules.
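
The ordering rule can be pictured with a small Python model. It is a sketch
under stated assumptions: the slot ID value and the ``run_exec_slots``,
``add_food`` and ``add_menu`` names are hypothetical, and error handling is
reduced to the 0 / -1 return convention.

```python
import types

PY_MOD_EXEC = 2  # illustrative ID, assumed for this sketch


def run_exec_slots(module, slots):
    """Call each exec-slot function in array order; stop at the first failure."""
    for slot_id, func in slots:
        if slot_id != PY_MOD_EXEC:
            continue
        if func(module) != 0:
            return -1
    return 0


def add_food(module):
    module.food = "spam"
    return 0


def add_menu(module):
    # Relies on add_food having run first.
    module.menu = [module.food, "eggs"]
    return 0


module = types.ModuleType("spam")
status = run_exec_slots(
    module, [(PY_MOD_EXEC, add_food), (PY_MOD_EXEC, add_menu)])
```

Because the functions run in the order they appear, a later exec function can
depend on attributes set by an earlier one.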


Pre-Execution steps
...................

Before processing the execution slots, per-module state is allocated for the
module. From this point on, per-module state is accessible through
PyModule_GetState.


The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to
setting the module's initial attributes.
The "module" argument receives the module object to initialize.

If a Py_mod_exec function replaces the module's entry in sys.modules,
the new object will be used and returned by the importlib machinery.
(This mirrors the behavior of Python modules. Note that implementing
Py_mod_create is usually a better solution for the use cases this serves.)
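
The Python-module behavior being mirrored can be seen with a throwaway source
module that swaps its own sys.modules entry while it executes. This is a
minimal sketch; the module name ``selfreplacing_demo`` and its contents are
made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a module that replaces itself in sys.modules during execution.
SOURCE = (
    "import sys, types\n"
    "replacement = types.SimpleNamespace(food='spam')\n"
    "sys.modules[__name__] = replacement\n"
)


def import_self_replacing_module(name="selfreplacing_demo"):
    """Write the module to a temporary directory and import it normally."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, name + ".py").write_text(SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return importlib.import_module(name)
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(name, None)


result = import_self_replacing_module()
result_type = type(result).__name__
```

The import returns the replacement object taken from sys.modules, not the
module object that was originally executed.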

The function must return ``0`` on success, or, on error, set an exception and
return ``-1``.


Legacy Init
-----------

The backwards-compatible single-phase initialization continues to be supported.
In this scheme, the PyInit function returns a fully initialized module rather
than a PyModuleDef object.
In this case, the PyInit hook implements the creation phase, and the execution
phase is a no-op.

Modules that need to work unchanged on older versions of Python should stick to
single-phase initialization, because the benefits of multi-phase initialization
can't be back-ported.
Here is an example of a module that supports multi-phase initialization,
and falls back to single-phase when compiled for an older version of CPython.
It is included mainly as an illustration of the changes needed to enable
multi-phase init::

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        /* propagate failure: returns 0 on success, -1 with an exception set */
        return PyModule_AddStringConstant(module, "food", "spam");
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        return PyModuleDef_Init(&spam_def);
    #else
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }


Built-In modules
----------------

Any extension module can be used as a built-in module by linking it into
the executable, and including it in the inittab (either at runtime with
PyImport_AppendInittab, or at configuration time, using tools like *freeze*).

To keep this possibility, all changes to extension module loading introduced
in this PEP will also apply to built-in modules.
The only exception is non-ASCII module names, explained below.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly,
avoiding the issues mentioned in the Python documentation [#subinterpreter-docs]_.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object's storage reachable by PyModule_GetState.
A simple rule of thumb is: Do not define any static data, except built-in types
with no mutable or user-settable class attributes.


Functions incompatible with multi-phase initialization
------------------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure
with a non-NULL *m_slots* pointer.
The function doesn't have access to the ModuleSpec object necessary for
multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule
and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*.
PyState registration is disabled because multiple module objects may be created
from the same PyModuleDef.


Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs access
to module-level state (including functions, classes or exceptions defined at
the module level) must receive a reference to the module object (or the
particular object it needs), either directly or indirectly.
This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to
  the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom
  data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for
the new mechanism to be useful to all modules. Proper fixes have been discussed
on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment,
not good candidates for porting to the new mechanism.


New Functions
-------------

A new function and macro implementing the module creation phase will be added.
These are similar to PyModule_Create and PyModule_Create2, except they
take an additional ModuleSpec argument, and handle module definitions with
non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function implementing the module execution phase will be added.
This allocates per-module state (if not allocated already), and *always*
processes execution slots. The import machinery calls this function when
a module is executed, unless the module is being reloaded::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object.
This idempotent function fills in the type, refcount, and module index.
It returns its argument cast to PyObject*, so it can be returned directly
from a PyInit function::

    PyObject * PyModuleDef_Init(PyModuleDef *);

Additionally, two helpers will be added for setting the docstring and
methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)


Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names
must be encoded to form the PyInit hook name.

For ASCII module names, the import hook is named
PyInit_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named
PyInitU_<encodedname>, where the name is encoded using CPython's
"punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
with hyphens ("-") replaced by underscores ("_").


In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix

Examples:

=============  ===================
Module name    Init hook name
=============  ===================
spam           PyInit_spam
lančmít        PyInitU_lanmt_2sa6t
スパム          PyInitU_zck5b2b
=============  ===================
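
The hook names can be checked by running the function on a few sample names.
The snippet below repeats ``export_hook_name`` only so that it runs on its
own; the ``hook_names`` dictionary is an illustrative helper, not part of the
proposal.

```python
def export_hook_name(name):
    # Same logic as the function above, repeated so the snippet runs alone.
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix


# Map each sample module name to its export hook name as text.
hook_names = {name: export_hook_name(name).decode('ascii')
              for name in ('spam', 'lančmít', 'スパム')}
```

Replacing the Punycode delimiter "-" with "_" is what keeps the result a
valid C identifier.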

For modules with non-ASCII names, single-phase initialization is not supported.

In the initial implementation of this PEP, built-in modules with non-ASCII
names will not be supported.


Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to
have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and
LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module
are too rare to require all module authors to keep reloading in mind.
If reload-like functionality is needed, authors can export a dedicated
function for it.


Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can
export additional PyInit* symbols besides the one that corresponds
to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules,
but not to *find* them. (This is a limitation of the loader mechanism,
which this PEP does not try to modify.)
To work around the lack of a suitable finder, code like the following
can be used::

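    import importlib.machinery
    import importlib.util

    def load_extra_module(name, path):
        # hypothetical helper: `name` matches a PyInit* symbol exported
        # by the shared library at `path`
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module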

On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmultiphase`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use single-phase
initialization indefinitely (or until it is no longer supported).

The ``array`` and ``xx*`` modules will be converted to use multi-phase
initialization as part of the initial implementation.


Summary of API Changes and Additions
------------------------------------

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions
* PyModuleDef_Init

New macros:

* Py_mod_create
* Py_mod_exec

New types:

* PyModuleDef_Type will be exposed

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.

The internal ``_imp`` module will have backwards incompatible changes:
``create_builtin``, ``create_dynamic``, and ``exec_dynamic`` will be added;
``init_builtin`` and ``load_dynamic`` will be removed.

The undocumented functions ``imp.load_dynamic`` and ``imp.init_builtin`` will
be replaced by backwards-compatible shims.


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.

Some extension modules export many constants; for example _ssl has
a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.

String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)

Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.


Implementation
==============

A work-in-progress implementation is available in a GitHub repository [#gh-repo]_;
a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would be then called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization is broken into distinct steps.
It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At this time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python modules
are initialized, but it was later recognized that this isn't an important goal.
The current PEP describes a simpler solution.

A further iteration used a "PyModuleExport" hook as an alternative to PyInit,
where PyInit was used for the existing scheme, and PyModuleExport for multi-phase.
However, not being able to determine the hook name based on module name
complicated automatic generation of PyImport_Inittab by tools like freeze.
Keeping only the PyInit hook name, even if it's not entirely appropriate for
exporting a definition, yielded a much simpler solution.


References
==========

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492]
   http://tools.ietf.org/html/rfc3492

.. [#gh-repo]
   https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch]
   https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion]
   https://mail.python.org/pipermail/import-sig/2015-April/000959.html

.. [#pep-0451-loading]
   https://www.python.org/dev/peps/pep-0451/#how-loading-will-work

.. [#subinterpreter-docs]
   https://docs.python.org/3/c-api/init.html#sub-interpreter-support


Copyright
=========

This document has been placed in the public domain.

From ericsnowcurrently at gmail.com  Wed May 20 16:07:52 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 May 2015 08:07:52 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <555C47CD.4060406@redhat.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com> <555B4B4A.5000902@redhat.com>
 <CALFfu7DEvT8vVA3AMLmp24gskeoMJwNK24TZ0g_FCFLkaW6CtQ@mail.gmail.com>
 <555C47CD.4060406@redhat.com>
Message-ID: <CALFfu7CKndjSn3bf3txpTSsnBnntdA18LjCA9JufKefX4Js28w@mail.gmail.com>

On Wed, May 20, 2015 at 2:37 AM, Petr Viktorin <pviktori at redhat.com> wrote:
> On 05/20/2015 02:33 AM, Eric Snow wrote:
  [snip]
>> Won't frozen modules be likewise affected?
>
> No, frozen modules are Python source, just not loaded from a file.

Isn't the mechanism similar to builtins?  Regardless, I was hopeful
that we could fix FrozenImporter at the same time that we fixed
BuiltinImporter.

  [snip]
>> It may also be worth outlining how PyModuleDef_Init will work.
>
> That's hard to do in Python syntax, since most of what it does is ensure
> the def is a valid PyObject. I'll explain it in a different section.
> It's a very small, idempotent function:
>
> PyObject*
> PyModuleDef_Init(struct PyModuleDef* def)
> {
>     if (def->m_base.m_index == 0) {
>         max_module_number++;
>         Py_REFCNT(def) = 1;
>         Py_TYPE(def) = &PyModuleDef_Type;
>         def->m_base.m_index = max_module_number;
>     }
>     return (PyObject*)def;
> }
>
> The code is lifted straight from PyModule_Create2.
>
> The m_index is bookkeeping for PyState_FindModule, so it's unused
> for modules with multi-phase init, but I didn't want to break the
> invariant that it's set up together with Py_TYPE.

Okay.  Thanks for the explanation.  So really PyModuleDef_Init does
some bookkeeping and that's it.

-eric

From ericsnowcurrently at gmail.com  Wed May 20 16:56:33 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 May 2015 08:56:33 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <555C6829.60901@gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com>
 <CALFfu7DQs_njKixMVcXPXxXG=jmu1OAP1tD1k8FMwD=Zu0vC6w@mail.gmail.com>
 <555C6829.60901@gmail.com>
Message-ID: <CALFfu7BWmwUDy-r751_PvOoUha_YK6-GJ6BXwj3yhq4hs-xYtQ@mail.gmail.com>

On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin <encukou at gmail.com> wrote:
> On 05/20/2015 02:22 AM, Eric Snow wrote:
>> On Tue, May 19, 2015 at 5:06 AM, Petr Viktorin <encukou at gmail.com> wrote:
  [snip]
>>> No, that won't work. It's possible (via direct calls to the import
>>> machinery) to load a module without adding it to sys.modules.
>>
>> What direct calls do you mean?  I would not expect any such mechanism
>> to work properly with extension modules.
>
> Reimplement
> <https://www.python.org/dev/peps/pep-0451/#how-loading-will-work>
> without the sys.modules parts.

You mean someone could do so?  Sure, they could.  Python has a
philosophy of not stopping you from doing what is usually the wrong
thing because sometimes it is the right thing for you.  As we say,
we're all consenting adults.

In this case, we expect that folks will use the import system (or
importlib) to import modules.  If they do it manually then they are
responsible to satisfy the semantics of the import system or risk
bugs.  One of the key goals of PEP 451 was to leave certain semantics
up to the import machinery rather than requiring all finder/loader
authors to implement the behavior.  This includes a number of tricky
parts like the sys.modules handling.
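
For illustration, a manual load along those lines might look like the sketch
below. It uses a throwaway pure-Python file rather than an extension module,
and the helper name and file name are made up for the example.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_without_sys_modules(name, path):
    """Load a module by hand; nothing here touches sys.modules."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "manual_demo.py")
    path.write_text("food = 'spam'\n")
    module = load_without_sys_modules("manual_demo", path)

# The module was executed, but the caller never registered it.
in_sys_modules = "manual_demo" in sys.modules
```

Here the module is fully executed, yet sys.modules has no entry for it,
because registering the module is left to whoever drives the loader.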

> The point is that exec_module doesn't a priori depend on the module
> being in sys.modules, which I think is a good thing.

Well, there's an explicit specification about how sys.modules is used
during loading.  For the post-exec sys.modules lookup specifically, see
https://docs.python.org/3.5//reference/import.html#id2.  The note in
the language reference says that it is an implementation detail.
However, keep in mind that this PEP is a CPython-specific proposal.

That said, I'm only -0 on not matching the sys.modules lookup behavior
of module loading.  It could be okay if we were to document the
behavior clearly.  My concern is with having different semantics even
if it only relates to a remote corner case.  It may be a corner case
that someone will rely on.

  [snip]
>> Be that as it may, I think it would be a mistake to treat support for
>> multiple exec slots as a second-class citizen in the design.
>> Personally I find it an appealing feature.
>
> It's there, but I'll not advertise it too much in the docs.

I'm okay with that.  It's not like we're precluding promoting the
behavior later. :)

  [snip]
>>> Still, the steps are processed in a loop from a single function
>>> (PyModule_ExecDef), and that function operates on a module object -- it
>>> doesn't know about sys.modules and can't easily check if you replaced
>>> the module somewhere.
>>
>> I would consider this approach to be a mistake as well.  The approach
>> should stay consistent with the semantics of the whole import system,
>> where sys.modules is checked directly.  Unfortunately, that ship has
>> already sailed.
>
> It's the loader that checks sys.modules, *after* exec_module is called.

Not the loader.  It's the import machinery that does it.  See
importlib._bootstrap._exec.

> No other implementation of exec_module checks sys.modules in the middle
> of its operation. So I think the semantics are consistent.

I was thinking of each exec slot as a parallel to Loader.exec_module.
Thus I was expecting the same sys.modules lookup behavior that you get
during module loading.  That's why I would expect the module to get
updated to sys.modules[spec.name] after each exec slot runs.

At the moment I'm still -0 on not matching the sys.modules lookup
semantics.  However, like I said above, I can be convinced otherwise.

  [snip]
>> I'm simply thinking in terms of the options we have for a PEP I'm
>> working on that will facilitate passing objects between
>> subinterpreters and even possibly sharing some state between them.
>> Currently it will be practically necessary to exclude extension
>> modules from any such mechanism.  So I was wondering if there would be
>> a way to allow extension module authors to define how at least some of
>> the module's data could be shared between subinterpreters.
>
> You should be able to put that info in slots. It's hard to speculate
> without knowing specifics, though.

I'm sure you're right about slots so we should be fine.  We can cross
the bridge later. :)

[snip]
>> As I just noted, I'm looking at making use of subinterpreters for a
>> different use case where it *does* make sense to effectively share
>> objects between them.
>
> OK. This PEP isn't designed for that, but it should offer enough
> extensibility.

Right.

  [snip]
>>> The internal _imp module will have backwards incompatible changes --
>>> functions will be added and removed as necessary. That's what the
>>> underscore means :)
>>
>> Be careful with that assumption.  We've had plenty of experiences
>> where the assumption became unreliable.
>
> That's why I provide backcompat shims for undocumented, deprecated
> functions in "imp". But _imp is just too low-level to do that easily.

I'm okay with that, particularly since the _imp module is relatively new.

-eric

From ericsnowcurrently at gmail.com  Wed May 20 17:14:37 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 May 2015 09:14:37 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <555C6B45.9070001@gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <CALFfu7CY3VL8VrSvSfHn=n+1i_KBusNkWpQ=nHkosKD4rGQhww@mail.gmail.com>
 <555C6B45.9070001@gmail.com>
Message-ID: <CALFfu7AcVwP2=yPBb0grm6WN=ArioD00fR5O9G3rViseZP+izw@mail.gmail.com>

On Wed, May 20, 2015 at 5:08 AM, Petr Viktorin <encukou at gmail.com> wrote:
> On 05/20/2015 01:56 AM, Eric Snow wrote:
>> Makes sense.  This does remind me of something I wanted to ask.  Would
>> it make sense to leverage ModuleSpec.loader_state?  If I recall
>> correctly, we added loader_state with extension modules in mind.
>
> I don't think we want to go out of our way to support non-module
> objects. Module subclasses should cover any needed functionality, and
> they will support slots.

Sorry I wasn't clear.  ModuleSpec.loader_state isn't related to
non-module objects or module subclasses.  It's a mechanism by which
finders can pass some loader-specific info to the loader.  It could
also be used to maintain some initial module state separately from the
module.  As I said, I thought we added loader_state with extension
modules in mind, so I figured I'd ask.
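
As a rough sketch of that mechanism (the loader class and module name are
hypothetical, and in practice a finder would build the spec):

```python
import importlib.abc
import importlib.machinery
import importlib.util


class StateAwareLoader(importlib.abc.Loader):
    """Hypothetical loader that copies spec.loader_state into the module."""

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # loader_state is whatever the finder attached to the spec.
        state = module.__spec__.loader_state or {}
        for key, value in state.items():
            setattr(module, key, value)


# Stand-in for what a finder would do: attach loader-specific data to the spec.
spec = importlib.machinery.ModuleSpec(
    "statedemo", StateAwareLoader(), loader_state={"food": "spam"})
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
```

The loader never needs its own side channel; everything it requires travels
with the spec.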

  [snip]
>> Hey, I tried to make something happen over on python-ideas! :)  Some
>> folks just don't want to go far enough.
>
> Yeah, as someone who's trying to get Python3 porting patches to Samba, I
> can tell you some upstreams really, really, really don't like rewriting
> their code.

Sure.  I'm not advocating for folks to rewrite their extension
modules.  Rather I want the docs to be more active in encouraging the
use of tools like Cython.  I think the discussion on python-ideas
could still be resolved favorably.  Mostly I had other things to do so
I didn't move things forward. :)

  [snip]
>> Yuck.  Is this something we could fix?  Is __module__ not set on all functions?
>
> The module object is not stored on classes, so methods don't have access
> to it.

Do classes defined in an extension module not have a __module__
attribute (holding the module name)?

> I want a fix for that to be my next PEP :)

Cool!  It may be good to have an explicit section in this PEP about
possible follow-up features (e.g. "Out of Scope").

Also, it would be a good idea to have an explicit section in the PEP
about backward-compatibility.  (Pretty sure there wasn't one.)  This
is an important aspect of every PEP and should be clearly
communicated, even if just to say there is no
backward-incompatibility.  Such a section is also a good place to
clearly indicate what extension authors need to do to adapt to the new
feature.

-eric

From ncoghlan at gmail.com  Thu May 21 00:16:54 2015
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 21 May 2015 08:16:54 +1000
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7BWmwUDy-r751_PvOoUha_YK6-GJ6BXwj3yhq4hs-xYtQ@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com>
 <CALFfu7DQs_njKixMVcXPXxXG=jmu1OAP1tD1k8FMwD=Zu0vC6w@mail.gmail.com>
 <555C6829.60901@gmail.com>
 <CALFfu7BWmwUDy-r751_PvOoUha_YK6-GJ6BXwj3yhq4hs-xYtQ@mail.gmail.com>
Message-ID: <CADiSq7dAr5kU1aepcgOtANr0wX1CZvefwRALa5+0veamdH-4eg@mail.gmail.com>

On 21 May 2015 at 00:56, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin <encukou at gmail.com> wrote:
>> The point is that exec_module doesn't a priori depend on the module
>> being in sys.modules, which I think is a good thing.
>
> Well, there's an explicit specification about how sys.modules is used
> during loading.  For post-exec sys.modules lookup specifically,
> https://docs.python.org/3.5//reference/import.html#id2.  The note in
> the language reference says that it is an implementation detail.
> However, keep in mind that this PEP is a CPython-specific proposal.
>
> That said, I'm only -0 on not matching the sys.modules lookup behavior
> of module loading.  It could be okay if we were to document the
> behavior clearly.  My concern is with having different semantics even
> if it only relates to a remote corner case.  It may be a corner case
> that someone will rely on.

We *will* match the semantics for the *overall* loading process. What
Petr is saying is that *while* executing the "execution slots",
they'll all receive the object returned by Py_mod_create (or the
automatically created module if that slot is not defined), rather than
any replacement injected into sys.modules.

There's no Python level parallel for that "multiple execution slots"
behaviour, so it makes sense to define the semantics based on
simplicity of implementation and the fact we want to encourage the use
of Py_mod_create for extension modules over sys.modules injection.

>> No other implementation of exec_module checks sys.modules in the middle
>> of its operation. So I think the semantics are consistent.
>
> I was thinking of each exec slot as a parallel to Loader.exec_module.
> Thus I was expecting the same sys.modules lookup behavior that you get
> during module loading.  That's why I would expect the module to get
> updated to sys.modules[spec.name] after each exec slot runs.

I changed my mind when Petr posted the clarification that this is
really just a matter of iterating over the defined slots in the
loader's exec_module method, and calling any of them that are defined
as execution slots (for the time being, just Py_mod_exec).

The entirety of a Python module runs in the same module namespace,
regardless of what is done with sys.modules, so having all execution
slots called with the same object is the extension module equivalent.
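
For reference, the Python-level behaviour being compared against can
be sketched as follows.  This is an illustration only: the loader,
finder and the module name "replace_demo" are made up, and the loader
stands in for a Python module body that replaces its own sys.modules
entry partway through execution.

```python
import importlib
import importlib.machinery
import sys
import types

NAME = "replace_demo"  # hypothetical module name


class ReplacingLoader:
    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        module.step_one = True
        # Inject a replacement object, as some Python modules do.
        sys.modules[module.__name__] = types.SimpleNamespace(replaced=True)
        # Execution carries on in the *original* module object.
        module.step_two = True
        self.original = module


class ReplacingFinder:
    loader = ReplacingLoader()

    @classmethod
    def find_spec(cls, name, path=None, target=None):
        if name == NAME:
            return importlib.machinery.ModuleSpec(name, cls.loader)
        return None


sys.meta_path.insert(0, ReplacingFinder)
try:
    result = importlib.import_module(NAME)
finally:
    sys.meta_path.remove(ReplacingFinder)
    sys.modules.pop(NAME, None)

original = ReplacingFinder.loader.original
print(result is original, result.replaced, original.step_two)
```

The import as a whole hands back the replacement found in sys.modules,
while the code that was executing kept working on the object it was
given, which is the same split described for execution slots.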

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From ericsnowcurrently at gmail.com  Thu May 21 00:39:32 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 May 2015 16:39:32 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CADiSq7dAr5kU1aepcgOtANr0wX1CZvefwRALa5+0veamdH-4eg@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com>
 <CALFfu7DQs_njKixMVcXPXxXG=jmu1OAP1tD1k8FMwD=Zu0vC6w@mail.gmail.com>
 <555C6829.60901@gmail.com>
 <CALFfu7BWmwUDy-r751_PvOoUha_YK6-GJ6BXwj3yhq4hs-xYtQ@mail.gmail.com>
 <CADiSq7dAr5kU1aepcgOtANr0wX1CZvefwRALa5+0veamdH-4eg@mail.gmail.com>
Message-ID: <CALFfu7A8Ax2cQW-e9NT2tNEDTQqebX9rVU-LXqzSpWbEnuZ1aA@mail.gmail.com>

On Wed, May 20, 2015 at 4:16 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 21 May 2015 at 00:56, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>> On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin <encukou at gmail.com> wrote:
>>> The point is that exec_module doesn't a priori depend on the module
>>> being in sys.modules, which I think is a good thing.
>>
>> Well, there's an explicit specification about how sys.modules is used
>> during loading.  For post-exec sys.modules lookup specifically,
>> https://docs.python.org/3.5//reference/import.html#id2.  The note in
>> the language reference says that it is an implementation detail.
>> However, keep in mind that this PEP is a CPython-specific proposal.
>>
>> That said, I'm only -0 on not matching the sys.modules lookup behavior
>> of module loading.  It could be okay if we were to document the
>> behavior clearly.  My concern is with having different semantics even
>> if it only relates to a remote corner case.  It may be a corner case
>> that someone will rely on.
>
> We *will* match the semantics for the *overall* loading process. What
> Petr is saying is that *while* executing the "execution slots",
> they'll all receive the object returned by Py_mod_create (or the
> automatically created module if that slot is not defined), rather than
> any replacement injected into sys.modules.
>
> There's no Python level parallel for that "multiple execution slots"
> behaviour, so it makes sense to define the semantics based on
> simplicity of implementation and the fact we want to encourage the use
> of Py_mod_create for extension modules over sys.modules injection.

I was thinking along those same lines.  I'm okay with that rationale.
The PEP should be updated to clarify this point and its rationale.

>
>>> No other implementation of exec_module checks sys.modules in the middle
>>> of its operation. So I think the semantics are consistent.
>>
>> I was thinking of each exec slot as a parallel to Loader.exec_module.
>> Thus I was expecting the same sys.modules lookup behavior that you get
>> during module loading.  That's why I would expect the module to get
>> updated to sys.modules[spec.name] after each exec slot runs.
>
> I changed my mind when Petr posted the clarification that this is
> really just a matter of iterating over the defined slots in the
> loader's exec_module method, and calling any of them that are defined
> as execution slots (for the time being, just Py_mod_exec).
>
> The entirety of a Python module runs in the same module namespace,
> regardless of what is done with sys.modules, so having all execution
> slots called with the same object is the extension module equivalent.

Sounds good.  Thanks for clarifying.

-eric

From ericsnowcurrently at gmail.com  Wed May 20 23:47:29 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Wed, 20 May 2015 15:47:29 -0600
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 6
In-Reply-To: <CA+=+wqAsvYgKy+f2zD5F+i02KRgAQ+2T=yZOeFNRaDp9MBpnaA@mail.gmail.com>
References: <CA+=+wqAsvYgKy+f2zD5F+i02KRgAQ+2T=yZOeFNRaDp9MBpnaA@mail.gmail.com>
Message-ID: <CALFfu7CtRhX4bQUBWKRsjOWkn3hiPbnmd6S_VRqPg4VJw9kK-A@mail.gmail.com>

FYI, Nick asked if I would be willing to be BDFL-Delegate for this PEP
and Guido has given the okay.  I've added myself to the PEP's header.
I'll try to make a decision soon (in time to land the patch before the
feature freeze), but I also must be confident about the pronouncement.

-eric

On Wed, May 20, 2015 at 5:34 AM, Petr Viktorin <encukou at gmail.com> wrote:
> Hello,
> Based mainly on comments by Eric Snow, I've sent another update to PEP 489.
>
> See the diff at https://hg.python.org/peps/rev/aad7a39a695b
>
> Here is a copy for your convenience:
>
> PEP: 489
> Title: Multi-phase extension module initialization
> Version: $Revision$
> Last-Modified: $Date$
> Author: Petr Viktorin <encukou at gmail.com>,
>         Stefan Behnel <stefan_ml at behnel.de>,
>         Nick Coghlan <ncoghlan at gmail.com>
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 11-Aug-2013
> Python-Version: 3.5
> Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes a redesign of the way in which built-in and extension modules
> interact with the import machinery. This was last revised for Python 3.0 in PEP
> 3121, but did not solve all problems at the time. The goal is to solve
> import-related problems by bringing extension modules closer to the way Python
> modules behave; specifically to hook into the ModuleSpec-based loading
> mechanism introduced in PEP 451.
>
> This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension
> authors to only define features they need, and to allow future additions
> to extension module declarations.
>
> Extension modules are created in a two-step process, fitting better into
> the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.
>
> Extension modules can safely store arbitrary C-level per-module state in
> the module that is covered by normal garbage collection and supports
> reloading and sub-interpreters.
> Extension authors are encouraged to take these issues into account
> when using the new API.
>
> The proposal also allows extension modules with non-ASCII names.
>
> Not all problems tackled in PEP 3121 are solved in this proposal.
> In particular, problems with run-time module lookup (PyState_FindModule)
> are left to a future PEP.
>
>
> Motivation
> ==========
>
> Python modules and extension modules are not being set up in the same way.
> For Python modules, the module object is created and set up first, then the
> module code is being executed (PEP 302).
> A ModuleSpec object (PEP 451) is used to hold information about the module,
> and passed to the relevant hooks.
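
As a rough illustration of the two-step process for Python modules,
the sketch below drives the create and execute phases by hand using
importlib.util.  StringLoader and the module name "two_phase_demo" are
invented for the example and are not part of the PEP.

```python
import importlib.abc
import importlib.util


class StringLoader(importlib.abc.Loader):
    """Hypothetical loader that executes source code held in a string."""

    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # let the import machinery create a plain module

    def exec_module(self, module):
        exec(self.source, module.__dict__)


spec = importlib.util.spec_from_loader(
    "two_phase_demo", StringLoader("answer = 6 * 7")
)

# Phase 1: create the module and set import-related attributes.
module = importlib.util.module_from_spec(spec)
name_after_create = module.__name__
has_answer_before_exec = hasattr(module, "answer")

# Phase 2: run the module's code in the already prepared module.
spec.loader.exec_module(module)

print(name_after_create, has_answer_before_exec, module.answer)
```

After phase 1 the module already knows its name and spec; its contents
only appear once phase 2 has run.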
>
> For extensions (i.e. shared libraries) and built-in modules, the module
> init function is executed straight away and does both the creation and
> initialization. The initialization function is not passed the ModuleSpec,
> or any information it contains, such as the __file__ or fully-qualified
> name. This hinders relative imports and resource loading.
>
> In Py3, modules are also not being added to sys.modules, which means that a
> (potentially transitive) re-import of the module will really try to re-import
> it and thus run into an infinite loop when it executes the module init function
> again. Without access to the fully-qualified module name, it is not trivial to
> correctly add the module to sys.modules either.
> This is specifically a problem for Cython generated modules, for which it's
> not uncommon that the module init code has the same level of complexity as
> that of any 'regular' Python module. Also, the lack of __file__ and __name__
> information hinders the compilation of "__init__.py" modules, i.e. packages,
> especially when relative imports are being used at module init time.
>
> Furthermore, the majority of currently existing extension modules have
> problems with sub-interpreter support and/or interpreter reloading, and, while
> it is possible with the current infrastructure to support these
> features, it is neither easy nor efficient.
> Addressing these issues was the goal of PEP 3121, but many extensions,
> including some in the standard library, took the least-effort approach
> to porting to Python 3, leaving these issues unresolved.
> This PEP keeps backwards compatibility, which should reduce pressure and give
> extension authors adequate time to consider these issues when porting.
>
>
> The current process
> ===================
>
> Currently, extension and built-in modules export an initialization function
> named "PyInit_modulename", named after the file name of the shared library.
> This function is executed by the import machinery and must return a fully
> initialized module object.
> The function receives no arguments, so it has no way of knowing about its
> import context.
>
> During its execution, the module init function creates a module object
> based on a PyModuleDef object. It then continues to initialize it by adding
> attributes to the module dict, creating types, etc.
>
> In the back, the shared library loader keeps a note of the fully qualified
> module name of the last module that it loaded, and when a module gets
> created that has a matching name, this global variable is used to determine
> the fully qualified name of the module object. This is not entirely safe as it
> relies on the module init function creating its own module object first,
> but this assumption usually holds in practice.
>
>
> The proposal
> ============
>
> The initialization function (PyInit_modulename) will be allowed to return
> a pointer to a PyModuleDef object. The import machinery will be in charge
> of constructing the module object, calling hooks provided in the PyModuleDef
> in the relevant phases of initialization (as described below).
>
> This multi-phase initialization is an additional possibility. Single-phase
> initialization, the current practice of returning a fully initialized module
> object, will still be accepted, so existing code will work unchanged,
> including binary compatibility.
>
> The PyModuleDef structure will be changed to contain a list of slots,
> similarly to PEP 384's PyType_Spec for types.
> To keep binary compatibility, and avoid needing to introduce a new structure
> (which would introduce additional supporting functions and per-module storage),
> the currently unused m_reload pointer of PyModuleDef will be changed to
> hold the slots. The structures are defined as::
>
>     typedef struct {
>         int slot;
>         void *value;
>     } PyModuleDef_Slot;
>
>     typedef struct PyModuleDef {
>         PyModuleDef_Base m_base;
>         const char* m_name;
>         const char* m_doc;
>         Py_ssize_t m_size;
>         PyMethodDef *m_methods;
>         PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
>         traverseproc m_traverse;
>         inquiry m_clear;
>         freefunc m_free;
>     } PyModuleDef;
>
> The *m_slots* member must be either NULL, or point to an array of
> PyModuleDef_Slot structures, terminated by a slot with id set to 0
> (i.e. ``{0, NULL}``).
>
> To specify a slot, a unique slot ID must be provided.
> New Python versions may introduce new slot IDs, but slot IDs will never be
> recycled. Slots may get deprecated, but will continue to be supported
> throughout Python 3.x.
>
> A slot's value pointer may not be NULL, unless specified otherwise in the
> slot's documentation.
>
> The following slots are currently available, and described later:
>
> * Py_mod_create
> * Py_mod_exec
>
> Unknown slot IDs will cause the import to fail with SystemError.
>
> When using multi-phase initialization, the *m_name* field of PyModuleDef will
> not be used during importing; the module name will be taken from the ModuleSpec.
>
> To prevent crashes when the module is loaded in older versions of Python,
> the PyModuleDef object must be initialized using the newly added
> PyModuleDef_Init function. This sets the object type (which cannot be done
> statically on certain compilers), refcount, and internal bookkeeping data
> (m_index).
> For example, an extension module "example" would be exported as::
>
>     static PyModuleDef example_def = {...};
>
>     PyMODINIT_FUNC
>     PyInit_example(void)
>     {
>         return PyModuleDef_Init(&example_def);
>     }
>
> The PyModuleDef object must be available for the lifetime of the module created
> from it -- usually, it will be declared statically.
>
> Pseudo-code Overview
> --------------------
>
> Here is an overview of how the modified importers will operate.
> Details such as logging or handling of errors and invalid states
> are left out, and C code is presented with a concise Python-like syntax.
>
> The framework that calls the importers is explained in PEP 451
> [#pep-0451-loading]_.
>
> ::
>
>     importlib/_bootstrap.py:
>
>         class BuiltinImporter:
>             def create_module(self, spec):
>                 module = _imp.create_builtin(spec)
>
>             def exec_module(self, module):
>                 _imp.exec_dynamic(module)
>
>             def load_module(self, name):
>                 # use a backwards compatibility shim
>                 _load_module_shim(self, name)
>
>     importlib/_bootstrap_external.py:
>
>         class ExtensionFileLoader:
>             def create_module(self, spec):
>                 module = _imp.create_dynamic(spec)
>
>             def exec_module(self, module):
>                 _imp.exec_dynamic(module)
>
>             def load_module(self, name):
>                 # use a backwards compatibility shim
>                 _load_module_shim(self, name)
>
>     Python/import.c (the _imp module):
>
>         def create_dynamic(spec):
>             name = spec.name
>             path = spec.origin
>
>             # Find an already loaded module that used single-phase init.
>             # For multi-phase initialization, mod is NULL, so a new module
>             # is always created.
>             mod = _PyImport_FindExtensionObject(name, name)
>             if mod:
>                 return mod
>
>             return _PyImport_LoadDynamicModuleWithSpec(spec)
>
>         def exec_dynamic(module):
>             if not isinstance(module, types.ModuleType):
>                 # non-modules are skipped -- PyModule_GetDef fails on them
>                 return
>
>             def = PyModule_GetDef(module)
>             state = PyModule_GetState(module)
>             if state is NULL:
>                 PyModule_ExecDef(module, def)
>
>         def create_builtin(spec):
>             name = spec.name
>
>             # Find an already loaded module that used single-phase init.
>             # For multi-phase initialization, mod is NULL, so a new module
>             # is always created.
>             mod = _PyImport_FindExtensionObject(name, name)
>             if mod:
>                 return mod
>
>             for initname, initfunc in PyImport_Inittab:
>                 if name == initname:
>                     m = initfunc()
>                     if isinstance(m, PyModuleDef):
>                         def = m
>                         return PyModule_FromDefAndSpec(def, spec)
>                     else:
>                         # fall back to single-phase initialization
>                         module = m
>                         _PyImport_FixupExtensionObject(module, name, name)
>                         return module
>
>     Python/importdl.c:
>
>         def _PyImport_LoadDynamicModuleWithSpec(spec):
>             path = spec.origin
>             package, dot, name = spec.name.rpartition('.')
>
>             # see the "Non-ASCII module names" section for export_hook_name
>             hook_name = export_hook_name(name)
>
>             # call platform-specific function for loading exported function
>             # from shared library
>             exportfunc = _find_shared_funcptr(hook_name, path)
>
>             m = exportfunc()
>             if isinstance(m, PyModuleDef):
>                 def = m
>                 return PyModule_FromDefAndSpec(def, spec)
>
>             module = m
>
>             # fall back to single-phase initialization
>             ....
>
>     Objects/moduleobject.c:
>
>         def PyModule_FromDefAndSpec(def, spec):
>             name = spec.name
>             create = None
>             for slot, value in def.m_slots:
>                 if slot == Py_mod_create:
>                     create = value
>             if create:
>                 m = create(spec, def)
>             else:
>                 m = PyModule_New(name)
>
>             if isinstance(m, types.ModuleType):
>                 m.md_state = None
>                 m.md_def = def
>
>             if def.m_methods:
>                 PyModule_AddFunctions(m, def.m_methods)
>             if def.m_doc:
>                 PyModule_SetDocString(m, def.m_doc)
>
>         def PyModule_ExecDef(module, def):
>             if isinstance(module, types.ModuleType):
>                 if module.md_state is NULL:
>                     # allocate a block of zeroed-out memory
>                     module.md_state = _alloc(def.m_size)
>
>             if def.m_slots is NULL:
>                 return
>
>             for slot, value in def.m_slots:
>                 if slot == Py_mod_exec:
>                     value(module)
>
>
> Module Creation Phase
> ---------------------
>
> Creation of the module object -- that is, the implementation of
> ExecutionLoader.create_module -- is governed by the Py_mod_create slot.
>
> The Py_mod_create slot
> ......................
>
> The Py_mod_create slot is used to support custom module subclasses.
> The value pointer must point to a function with the following signature::
>
>     PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)
>
> The function receives a ModuleSpec instance, as defined in PEP 451,
> and the PyModuleDef structure.
> It should return a new module object, or set an error
> and return NULL.
>
> This function is not responsible for setting import-related attributes
> specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
> ``__loader__``) on the new module.
>
> There is no requirement for the returned object to be an instance of
> types.ModuleType. Any type can be used, as long as it supports setting and
> getting attributes, including at least the import-related attributes.
> However, only ModuleType instances support module-specific functionality
> such as per-module state.
>
> Note that when this function is called, the module's entry in sys.modules
> is not populated yet. Attempting to import the same module again
> (possibly transitively), may lead to an infinite loop.
> Extension authors are advised to keep Py_mod_create minimal, and in particular
> to not call user code from it.
>
> Multiple Py_mod_create slots may not be specified. If they are, import
> will fail with SystemError.
>
> If Py_mod_create is not specified, the import machinery will create a normal
> module object using PyModule_New. The name is taken from *spec*.
>
>
> Post-creation steps
> ...................
>
> If the Py_mod_create function returns an instance of types.ModuleType
> or a subclass (or if a Py_mod_create slot is not present), the import
> machinery will associate the PyModuleDef with the module.
> This also makes the PyModuleDef accessible to execution phase, the
> PyModule_GetDef function, and garbage collection routines (traverse,
> clear, free).
>
> If the Py_mod_create function does not return a module subclass, then m_size
> must be 0, and m_traverse, m_clear and m_free must all be NULL.
> Otherwise, SystemError is raised.
>
> Additionally, initial attributes specified in the PyModuleDef are set on the
> module object, regardless of its type:
>
> * The docstring is set from m_doc, if non-NULL.
> * The module's functions are initialized from m_methods, if any.
>
>
> Module Execution Phase
> ----------------------
>
> Module execution -- that is, the implementation of
> ExecutionLoader.exec_module -- is governed by "execution slots".
> This PEP only adds one, Py_mod_exec, but others may be added in the future.
>
> The execution phase is done on the PyModuleDef associated with the module
> object. For objects that are not a subclass of PyModule_Type (for which
> PyModule_GetDef would fail), the execution phase is skipped.
>
> Execution slots may be specified multiple times, and are processed in the order
> they appear in the slots array.
> When using the default import machinery, they are processed after
> import-related attributes specified in PEP 451 [#pep-0451-attributes]_
> (such as ``__name__`` or ``__loader__``) are set and the module is added
> to sys.modules.
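
The slot rules stated so far (zero-terminated array, SystemError for
unknown IDs and for repeated Py_mod_create, execution slots run in
array order) can be modelled in a few lines of Python.  This is only a
sketch of the described behaviour, not the C implementation; the
numeric slot IDs and the helper names split_slots and run_exec_slots
are assumptions made for the example.

```python
import types

PY_MOD_CREATE = 1  # illustrative IDs, assumed for this sketch
PY_MOD_EXEC = 2
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def split_slots(slots):
    """Validate a slot array; return (create_func_or_None, exec_funcs)."""
    create = None
    exec_funcs = []
    for slot_id, value in slots:
        if slot_id == 0:  # the {0, NULL} terminator
            break
        if slot_id not in KNOWN_SLOTS:
            raise SystemError("unknown slot ID %d" % slot_id)
        if slot_id == PY_MOD_CREATE:
            if create is not None:
                raise SystemError("multiple Py_mod_create slots")
            create = value
        else:
            exec_funcs.append(value)  # array order is preserved
    return create, exec_funcs


def run_exec_slots(module, exec_funcs):
    """Call each execution slot with the same module object, in order."""
    for func in exec_funcs:
        if func(module) != 0:
            # In C the slot would have set an exception before returning -1.
            raise SystemError("execution slot reported failure")


calls = []


def first(module):
    calls.append("first")
    module.x = 1
    return 0


def second(module):
    calls.append("second")
    module.y = module.x + 1  # sees what the earlier slot set up
    return 0


mod = types.ModuleType("slots_demo")
create, execs = split_slots(
    [(PY_MOD_EXEC, first), (PY_MOD_EXEC, second), (0, None)]
)
run_exec_slots(mod, execs)
print(create, calls, mod.y)
```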
>
>
> Pre-Execution steps
> ...................
>
> Before processing the execution slots, per-module state is allocated for the
> module. From this point on, per-module state is accessible through
> PyModule_GetState.
>
>
> The Py_mod_exec slot
> ....................
>
> The entry in this slot must point to a function with the following signature::
>
>     int (*PyModuleExecFunction)(PyObject* module)
>
> It will be called to initialize a module. Usually, this amounts to
> setting the module's initial attributes.
> The "module" argument receives the module object to initialize.
>
> If the Py_mod_exec function replaces the module's entry in sys.modules,
> the new object will be used and returned by importlib machinery.
> (This mirrors the behavior of Python modules. Note that implementing
> Py_mod_create is usually a better solution for the use cases this serves.)
>
> The function must return ``0`` on success, or, on error, set an exception and
> return ``-1``.
>
>
> Legacy Init
> -----------
>
> The backwards-compatible single-phase initialization continues to be supported.
> In this scheme, the PyInit function returns a fully initialized module rather
> than a PyModuleDef object.
> In this case, the PyInit hook implements the creation phase, and the execution
> phase is a no-op.
>
> Modules that need to work unchanged on older versions of Python should stick to
> single-phase initialization, because the benefits it brings can't be
> back-ported.
> Here is an example of a module that supports multi-phase initialization,
> and falls back to single-phase when compiled for an older version of CPython.
> It is included mainly as an illustration of the changes needed to enable
> multi-phase init::
>
>     #include <Python.h>
>
>     static int spam_exec(PyObject *module) {
>         PyModule_AddStringConstant(module, "food", "spam");
>         return 0;
>     }
>
>     #ifdef Py_mod_exec
>     static PyModuleDef_Slot spam_slots[] = {
>         {Py_mod_exec, spam_exec},
>         {0, NULL}
>     };
>     #endif
>
>     static PyModuleDef spam_def = {
>         PyModuleDef_HEAD_INIT,                      /* m_base */
>         "spam",                                     /* m_name */
>         PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
>         0,                                          /* m_size */
>         NULL,                                       /* m_methods */
>     #ifdef Py_mod_exec
>         spam_slots,                                 /* m_slots */
>     #else
>         NULL,
>     #endif
>         NULL,                                       /* m_traverse */
>         NULL,                                       /* m_clear */
>         NULL,                                       /* m_free */
>     };
>
>     PyMODINIT_FUNC
>     PyInit_spam(void) {
>     #ifdef Py_mod_exec
>         return PyModuleDef_Init(&spam_def);
>     #else
>         PyObject *module;
>         module = PyModule_Create(&spam_def);
>         if (module == NULL) return NULL;
>         if (spam_exec(module) != 0) {
>             Py_DECREF(module);
>             return NULL;
>         }
>         return module;
>     #endif
>     }
>
>
> Built-In modules
> ----------------
>
> Any extension module can be used as a built-in module by linking it into
> the executable, and including it in the inittab (either at runtime with
> PyImport_AppendInittab, or at configuration time, using tools like *freeze*).
>
> To keep this possibility, all changes to extension module loading introduced
> in this PEP will also apply to built-in modules.
> The only exception is non-ASCII module names, explained below.
>
>
> Subinterpreters and Interpreter Reloading
> -----------------------------------------
>
> Extensions using the new initialization scheme are expected to support
> subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly,
> avoiding the issues mentioned in Python documentation [#subinterpreter-docs]_.
> The mechanism is designed to make this easy, but care is still required
> on the part of the extension author.
> No user-defined functions, methods, or instances may leak to different
> interpreters.
> To achieve this, all module-level state should be kept in either the module
> dict, or in the module object's storage reachable by PyModule_GetState.
> A simple rule of thumb is: Do not define any static data, except built-in types
> with no mutable or user-settable class attributes.
>
>
> Functions incompatible with multi-phase initialization
> ------------------------------------------------------
>
> The PyModule_Create function will fail when used on a PyModuleDef structure
> with a non-NULL *m_slots* pointer.
> The function doesn't have access to the ModuleSpec object necessary for
> multi-phase initialization.
>
> The PyState_FindModule function will return NULL, and PyState_AddModule
> and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*.
> PyState registration is disabled because multiple module objects may be created
> from the same PyModuleDef.
>
>
> Module state and C-level callbacks
> ----------------------------------
>
> Due to the unavailability of PyState_FindModule, any function that needs access
> to module-level state (including functions, classes or exceptions defined at
> the module level) must receive a reference to the module object (or the
> particular object it needs), either directly or indirectly.
> This is currently difficult in two situations:
>
> * Methods of classes, which receive a reference to the class, but not to
>   the class's module
> * Libraries with C-level callbacks, unless the callbacks can receive custom
>   data set at callback registration
>
> Fixing these cases is outside of the scope of this PEP, but will be needed for
> the new mechanism to be useful to all modules. Proper fixes have been discussed
> on the import-sig mailing list [#findmodule-discussion]_.
>
> As a rule of thumb, modules that rely on PyState_FindModule are, at the moment,
> not good candidates for porting to the new mechanism.
>
>
> New Functions
> -------------
>
> A new function and macro implementing the module creation phase will be added.
> These are similar to PyModule_Create and PyModule_Create2, except they
> take an additional ModuleSpec argument, and handle module definitions with
> non-NULL slots::
>
>     PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
>     PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
>                                         int module_api_version)
>
> A new function implementing the module execution phase will be added.
> This allocates per-module state (if not allocated already), and *always*
> processes execution slots. The import machinery calls this method when
> a module is executed, unless the module is being reloaded::
>
>     PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)
>
> Another function will be introduced to initialize a PyModuleDef object.
> This idempotent function fills in the type, refcount, and module index.
> It returns its argument cast to PyObject*, so it can be returned directly
> from a PyInit function::
>
>     PyObject * PyModuleDef_Init(PyModuleDef *);
>
> Additionally, two helpers will be added for setting the docstring and
> methods on a module::
>
>     int PyModule_SetDocString(PyObject *, const char *)
>     int PyModule_AddFunctions(PyObject *, PyMethodDef *)
>
>
> Export Hook Name
> ----------------
>
> As portable C identifiers are limited to ASCII, module names
> must be encoded to form the PyInit hook name.
>
> For ASCII module names, the import hook is named
> PyInit_<modulename>, where <modulename> is the name of the module.
>
> For module names containing non-ASCII characters, the import hook is named
> PyInitU_<encodedname>, where the name is encoded using CPython's
> "punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
> with hyphens ("-") replaced by underscores ("_").
>
>
> In Python::
>
>     def export_hook_name(name):
>         try:
>             suffix = b'_' + name.encode('ascii')
>         except UnicodeEncodeError:
>             suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
>         return b'PyInit' + suffix
>
> Examples:
>
> =============  ===================
> Module name    Init hook name
> =============  ===================
> spam           PyInit_spam
> lančmít        PyInitU_lanmt_2sa6t
> スパム         PyInitU_zck5b2b
> =============  ===================
>
> For modules with non-ASCII names, single-phase initialization is not supported.
>
> In the initial implementation of this PEP, built-in modules with non-ASCII
> names will not be supported.
>
>
> Module Reloading
> ----------------
>
> Reloading an extension module using importlib.reload() will continue to
> have no effect, except re-setting import-related attributes.
>
> Due to limitations in shared library loading (both dlopen on POSIX and
> LoadLibraryEx on Windows), it is not generally possible to load
> a modified library after it has changed on disk.
>
> Use cases for reloading other than trying out a new version of the module
> are too rare to require all module authors to keep reloading in mind.
> If reload-like functionality is needed, authors can export a dedicated
> function for it.
>
>
> Multiple modules in one library
> -------------------------------
>
> To support multiple Python modules in one shared library, the library can
> export additional PyInit* symbols besides the one that corresponds
> to the library's filename.
>
> Note that this mechanism can currently only be used to *load* extra modules,
> but not to *find* them. (This is a limitation of the loader mechanism,
> which this PEP does not try to modify.)
> To work around the lack of a suitable finder, code like the following
> can be used::
>
>     import importlib.machinery
>     import importlib.util
>
>     def load_extra_module(name, path):
>         # load_extra_module is an illustrative name, not a proposed API
>         loader = importlib.machinery.ExtensionFileLoader(name, path)
>         spec = importlib.util.spec_from_loader(name, loader)
>         module = importlib.util.module_from_spec(spec)
>         loader.exec_module(module)
>         return module
>
> On platforms that support symbolic links, these may be used to install one
> library under multiple names, exposing all exported modules to normal
> import machinery.
>
>
> Testing and initial implementations
> -----------------------------------
>
> For testing, a new built-in module ``_testmultiphase`` will be created.
> The library will export several additional modules using the mechanism
> described in "Multiple modules in one library".
>
> The ``_testcapi`` module will be unchanged, and will use single-phase
> initialization indefinitely (or until it is no longer supported).
>
> The ``array`` and ``xx*`` modules will be converted to use multi-phase
> initialization as part of the initial implementation.
>
>
> Summary of API Changes and Additions
> ------------------------------------
>
> New functions:
>
> * PyModule_FromDefAndSpec (macro)
> * PyModule_FromDefAndSpec2
> * PyModule_ExecDef
> * PyModule_SetDocString
> * PyModule_AddFunctions
> * PyModuleDef_Init
>
> New macros:
>
> * Py_mod_create
> * Py_mod_exec
>
> New types:
>
> * PyModuleDef_Type will be exposed
>
> New structures:
>
> * PyModuleDef_Slot
>
> PyModuleDef.m_reload changes to PyModuleDef.m_slots.
>
> The internal ``_imp`` module will have backwards incompatible changes:
> ``create_builtin``, ``create_dynamic``, and ``exec_dynamic`` will be added;
> ``init_builtin``, ``load_dynamic`` will be removed.
>
> The undocumented functions ``imp.load_dynamic`` and ``imp.init_builtin`` will
> be replaced by backwards-compatible shims.
>
>
> Possible Future Extensions
> ==========================
>
> The slots mechanism, inspired by PyType_Slot from PEP 384,
> allows later extensions.
>
> Some extension modules export many constants; for example _ssl has
> a long list of calls in the form::
>
>     PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
>                             PY_SSL_ERROR_ZERO_RETURN);
>
> Converting this to a declarative list, similar to PyMethodDef,
> would reduce boilerplate, and provide free error-checking which
> is often missing.
>
> String constants and types can be handled similarly.
> (Note that non-default bases for types cannot be portably specified
> statically; this case would need a Py_mod_exec function that runs
> before the slots are added. The free error-checking would still be
> beneficial, though.)
>
> Another possibility is providing a "main" function that would be run
> when the module is given to Python's -m switch.
> For this to work, the runpy module will need to be modified to take
> advantage of ModuleSpec-based loading introduced in PEP 451.
> Also, it will be necessary to add a mechanism for setting up a module
> according to slots it wasn't originally defined with.
>
>
> Implementation
> ==============
>
> A work-in-progress implementation is available in a GitHub repository [#gh-repo]_;
> a patchset is at [#gh-patch]_.
>
>
> Previous Approaches
> ===================
>
> Stefan Behnel's initial proto-PEP [#stefans_protopep]_
> had a "PyInit_modulename" hook that would create a module class,
> whose ``__init__`` would be then called to create the module.
> This proposal did not correspond to the (then nonexistent) PEP 451,
> where module creation and initialization is broken into distinct steps.
> It also did not support loading an extension into pre-existing module objects.
>
> Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
> implementation [#nicks-prototype]_.
> At this time PEP 451 was still not implemented, so the prototype
> does not use ModuleSpec.
>
> The original version of this PEP used Create and Exec hooks, and allowed
> loading into arbitrary pre-constructed objects with Exec hook.
> The proposal made extension module initialization closer to how Python modules
> are initialized, but it was later recognized that this isn't an important goal.
> The current PEP describes a simpler solution.
>
> A further iteration used a "PyModuleExport" hook as an alternative to PyInit,
> where PyInit was used for existing scheme, and PyModuleExport for multi-phase.
> However, not being able to determine the hook name based on module name
> complicated automatic generation of PyImport_Inittab by tools like freeze.
> Keeping only the PyInit hook name, even if it's not entirely appropriate for
> exporting a definition, yielded a much simpler solution.
>
>
> References
> ==========
>
> .. [#pep-0451-attributes]
>    https://www.python.org/dev/peps/pep-0451/#attributes
>
> .. [#stefans_protopep]
>    https://mail.python.org/pipermail/python-dev/2013-August/128087.html
>
> .. [#nicks-prototype]
>    https://mail.python.org/pipermail/python-dev/2013-August/128101.html
>
> .. [#rfc-3492]
>    http://tools.ietf.org/html/rfc3492
>
> .. [#gh-repo]
>    https://github.com/encukou/cpython/commits/pep489
>
> .. [#gh-patch]
>    https://github.com/encukou/cpython/compare/master...encukou:pep489.patch
>
> .. [#findmodule-discussion]
>    https://mail.python.org/pipermail/import-sig/2015-April/000959.html
>
> .. [#pep-0451-loading]
>    https://www.python.org/dev/peps/pep-0451/#how-loading-will-work
>
> .. [#subinterpreter-docs]
>    https://docs.python.org/3/c-api/init.html#sub-interpreter-support
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> https://mail.python.org/mailman/listinfo/import-sig

From encukou at gmail.com  Thu May 21 02:49:30 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 21 May 2015 02:49:30 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7AcVwP2=yPBb0grm6WN=ArioD00fR5O9G3rViseZP+izw@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <CALFfu7CY3VL8VrSvSfHn=n+1i_KBusNkWpQ=nHkosKD4rGQhww@mail.gmail.com>
 <555C6B45.9070001@gmail.com>
 <CALFfu7AcVwP2=yPBb0grm6WN=ArioD00fR5O9G3rViseZP+izw@mail.gmail.com>
Message-ID: <CA+=+wqDS=t-TY_cA8VLL-i9NLPAtA7bNPQP29nBhqE1MrNzpSQ@mail.gmail.com>

On Wed, May 20, 2015 at 5:14 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Wed, May 20, 2015 at 5:08 AM, Petr Viktorin <encukou at gmail.com> wrote:
>> On 05/20/2015 01:56 AM, Eric Snow wrote:
>>> Makes sense.  This does remind me of something I wanted to ask.  Would
>>> it make sense to leverage ModuleSpec.loader_state?  If I recall
>>> correctly, we added loader_state with extension modules in mind.
>>
>> I don't think we want to go out of our way to support non-module
>> objects. Module subclasses should cover any needed functionality, and
>> they will support slots.
>
> Sorry I wasn't clear.  ModuleSpec.loader_state isn't related to
> non-module objects or module subclasses.  It's a mechanism by which
> finders can pass some loader-specific info to the loader.  It could
> also be used to maintain some initial module state separately from the
> module.  As I said, I thought we added loader_state with extension
> modules in mind, so I figured I'd ask.

It turns out to be unnecessary.
I will add that if create returns a non-module object, no execution
slots should be specified (i.e. there should only be a Py_mod_create).
That will allow us to change our mind later if this turns out to be a
bad idea, but I doubt it will.

>   [snip]
>>> Yuck.  Is this something we could fix?  Is __module__ not set on all functions?
>>
>> The module object is not stored on classes, so methods don't have access
>> to it.
>
> Do classes defined in an extension module not have a __module__
> attribute (holding the module name)?

They do, but that's not good enough:
- Looking up the name in sys.modules is slow.
- Both that and sys.modules are OK to be modified by Python code, so
you can easily get a different module from such a lookup, and using a
different module's state pointer will most likely segfault.
(Maybe this discussion needs a new mail thread?)
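
To illustrate the second point, here is a small pure-Python sketch. The
module name "demo_ext" and the class "Thing" are made up for the example,
and a plain module object stands in for a real extension module:

```python
import sys
import types

# A stand-in for an extension module and a class defined in it.
real = types.ModuleType("demo_ext")


class Thing:
    pass


Thing.__module__ = "demo_ext"
real.Thing = Thing
sys.modules["demo_ext"] = real

# Ordinary Python code is free to rebind the sys.modules entry
# (or Thing.__module__ itself).
impostor = types.ModuleType("demo_ext")
sys.modules["demo_ext"] = impostor

# A name-based lookup now finds the impostor, not the module that
# actually created Thing.
looked_up = sys.modules[Thing.__module__]

del sys.modules["demo_ext"]  # clean up the demonstration entry
```

In C, treating the state pointer of "looked_up" as if it belonged to the
original module is where things would go wrong.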

>> I want a fix for that to be my next PEP :)
>
> Cool!  It may be good to have an explicit section in this PEP about
> possible follow-up features (e.g. "Out of Scope").

There is a section for follow-up features already (it talks about
possible future slots).
This follow-up didn't make it in -- I think it's too far out of scope,
as it isn't really concerned with loading modules. I think the link in
the section about PyState_FindModule is enough.

> Also, it would be a good idea to have an explicit section in the PEP
> about backward-compatibility.  (Pretty sure there wasn't one.)  This
> is an important aspect of every PEP and should be clearly
> communicated, even if just to say there is no
> backward-incompatibility.  Such a section is also a good place to
> clearly indicate what extension authors need to do to adapt to the new
> feature.

OK, I can add that. (in the morning; it's 3 AM here so the changes
wouldn't be any good now.)

From encukou at gmail.com  Thu May 21 02:50:07 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 21 May 2015 02:50:07 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7A8Ax2cQW-e9NT2tNEDTQqebX9rVU-LXqzSpWbEnuZ1aA@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com>
 <CALFfu7DQs_njKixMVcXPXxXG=jmu1OAP1tD1k8FMwD=Zu0vC6w@mail.gmail.com>
 <555C6829.60901@gmail.com>
 <CALFfu7BWmwUDy-r751_PvOoUha_YK6-GJ6BXwj3yhq4hs-xYtQ@mail.gmail.com>
 <CADiSq7dAr5kU1aepcgOtANr0wX1CZvefwRALa5+0veamdH-4eg@mail.gmail.com>
 <CALFfu7A8Ax2cQW-e9NT2tNEDTQqebX9rVU-LXqzSpWbEnuZ1aA@mail.gmail.com>
Message-ID: <CA+=+wqDcVEkS8GpYSmWJCgrm+HRBt-mESce1=zqyFiUFecOiyA@mail.gmail.com>

On Thu, May 21, 2015 at 12:39 AM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Wed, May 20, 2015 at 4:16 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 21 May 2015 at 00:56, Eric Snow <ericsnowcurrently at gmail.com> wrote:
>>> On Wed, May 20, 2015 at 4:55 AM, Petr Viktorin <encukou at gmail.com> wrote:
>>>> The point is that exec_module doesn't a priori depend on the module
>>>> being in sys.modules, which I think is a good thing.
>>>
>>> Well, there's an explicit specification about how sys.modules is used
>>> during loading.  For post-exec sys.modules lookup specifically,
>>> https://docs.python.org/3.5//reference/import.html#id2.  The note in
>>> the language reference says that it is an implementation detail.
>>> However, keep in mind that this PEP is a CPython-specific proposal.
>>>
>>> That said, I'm only -0 on not matching the sys.modules lookup behavior
>>> of module loading.  It could be okay if we were to document the
>>> behavior clearly.  My concern is with having different semantics even
>>> if it only relates to a remote corner case.  It may be a corner case
>>> that someone will rely on.
>>
>> We *will* match the semantics for the *overall* loading process. What
>> Petr is saying is that *while* executing the "execution slots",
>> they'll all receive the object returned by Py_mod_create (or the
>> automatically created module if that slot is not defined), rather than
>> any replacement injected into sys.modules.
>>
>> There's no Python level parallel for that "multiple execution slots"
>> behaviour, so it makes sense to define the semantics based on
>> simplicity of implementation and the fact we want to encourage the use
>> of Py_mod_create for extension modules over sys.modules injection.
>
> I was thinking along those same lines.  I'm okay with that rationale.
> The PEP should be updated to clarify this point and its rationale.

There's no provision in the machinery to call multiple different
implementations of exec_module. And all sys.modules
lookup/manipulation is done by the machinery, so it doesn't make sense
to do it in ExtensionFileLoader.exec_module, either.
I believe that now, with the pseudo-code overview, this is clearer, so
a rationale isn't needed (the reason it was needed in the first place
is that the PEP was confusing.)
I will clarify the semantics in the Py_mod_exec section, though.

From stefan_ml at behnel.de  Thu May 21 08:06:37 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 21 May 2015 08:06:37 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 6
In-Reply-To: <CA+=+wqAsvYgKy+f2zD5F+i02KRgAQ+2T=yZOeFNRaDp9MBpnaA@mail.gmail.com>
References: <CA+=+wqAsvYgKy+f2zD5F+i02KRgAQ+2T=yZOeFNRaDp9MBpnaA@mail.gmail.com>
Message-ID: <mjjsld$ukd$1@ger.gmane.org>

Petr Viktorin schrieb am 20.05.2015 um 13:34:
> To prevent crashes when the module is loaded in older versions of Python,
> the PyModuleDef object must be initialized using the newly added
> PyModuleDef_Init function. This sets the object type (which cannot be done
> statically on certain compilers), refcount, and internal bookkeeping data
> (m_index).
> For example, an extension module "example" would be exported as::
> 
>     static PyModuleDef example_def = {...}
> 
>     PyMODINIT_FUNC
>     PyInit_example(void)
>     {
>         return PyModuleDef_Init(&example_def);
>     }

If PyModuleDef_Init() is really a function, this will not help with "older
versions of Python", which do not have the function available. So, is it
going to be a macro?

Stefan


From stefan_ml at behnel.de  Thu May 21 08:22:27 2015
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 21 May 2015 08:22:27 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 6
In-Reply-To: <mjjsld$ukd$1@ger.gmane.org>
References: <CA+=+wqAsvYgKy+f2zD5F+i02KRgAQ+2T=yZOeFNRaDp9MBpnaA@mail.gmail.com>
 <mjjsld$ukd$1@ger.gmane.org>
Message-ID: <mjjtj3$fnp$1@ger.gmane.org>

Stefan Behnel schrieb am 21.05.2015 um 08:06:
> Petr Viktorin schrieb am 20.05.2015 um 13:34:
>> To prevent crashes when the module is loaded in older versions of Python,
>> the PyModuleDef object must be initialized using the newly added
>> PyModuleDef_Init function. This sets the object type (which cannot be done
>> statically on certain compilers), refcount, and internal bookkeeping data
>> (m_index).
>> For example, an extension module "example" would be exported as::
>>
>>     static PyModuleDef example_def = {...}
>>
>>     PyMODINIT_FUNC
>>     PyInit_example(void)
>>     {
>>         return PyModuleDef_Init(&example_def);
>>     }
> 
> If PyModuleDef_Init() is really a function, this will not help with "older
> versions of Python", which do not have the function available. So, is it
> going to be a macro?

Ah, ok, I found it further down in the PEP. It's not actually supposed to
be called in older Python versions, right? Meaning, we only provide source
level backwards compatibility and not binary backwards compatibility for
extension modules?

Then the paragraph above is really misleading.

Stefan


From encukou at gmail.com  Thu May 21 10:21:03 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 21 May 2015 10:21:03 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 5
In-Reply-To: <CALFfu7CKndjSn3bf3txpTSsnBnntdA18LjCA9JufKefX4Js28w@mail.gmail.com>
References: <5559F0FD.3080704@gmail.com>
 <CALFfu7DfrR4LswtgNuAB7TT-y8vN38-yYUvS0QBmQtP26RO-Zg@mail.gmail.com>
 <CADiSq7cuoimcpRZfyTGaGOmapFDYvGbcB+sVyG1x7rPvFHNTfg@mail.gmail.com>
 <555B1937.5020001@gmail.com> <555B4B4A.5000902@redhat.com>
 <CALFfu7DEvT8vVA3AMLmp24gskeoMJwNK24TZ0g_FCFLkaW6CtQ@mail.gmail.com>
 <555C47CD.4060406@redhat.com>
 <CALFfu7CKndjSn3bf3txpTSsnBnntdA18LjCA9JufKefX4Js28w@mail.gmail.com>
Message-ID: <CA+=+wqCSa_M6DMRKzP=z5721Uiu7g29tEda_Qd7tVJg_P+aB3g@mail.gmail.com>

On Wed, May 20, 2015 at 4:07 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> On Wed, May 20, 2015 at 2:37 AM, Petr Viktorin <pviktori at redhat.com> wrote:
>> On 05/20/2015 02:33 AM, Eric Snow wrote:
>   [snip]
>>> Won't frozen modules be likewise affected?
>>
>> No, frozen modules are Python source, just not loaded from a file.
>
> Isn't the mechanism similar to builtins?

No. FrozenImporter loads bytecode from a compiled-in marshalled
string, and then exec()s it. It's completely different.

> Regardless, I was hopeful that we could fix FrozenImporter at the same time
> that we fixed BuiltinImporter.

I'm not sure what's to fix in FrozenImporter (it uses
create_module/exec_module already, is there something else?), but I
doubt this PEP is the right place.

From encukou at gmail.com  Thu May 21 13:27:16 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 21 May 2015 13:27:16 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module initialization;
	version 7
Message-ID: <CA+=+wqD27JizXoDB8zU8RpyXYDBUj=86pnqu+9ze5ACrAR=_Dw@mail.gmail.com>

Hello,
Based on the last round of comments, I've sent changes to PEP editors.

There is one functional change:
- Don't allow execution slots when Py_mod_create returns an object that
  is not a module (or module subclass) instance

and several wording fixes/clarifications:
- Remove misleading reason for PyModuleDef_Init
- Clarify that sys.modules is not checked between execution steps
- Add a Backwards Compatibility summary
- Heading level fix, typo fix


The full text follows:

PEP: 489
Title: Multi-phase extension module initialization
Version: $Revision$
Last-Modified: $Date$
Author: Petr Viktorin <encukou at gmail.com>,
        Stefan Behnel <stefan_ml at behnel.de>,
        Nick Coghlan <ncoghlan at gmail.com>
BDFL-Delegate: Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To: import-sig at python.org
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Aug-2013
Python-Version: 3.5
Post-History: 23-Aug-2013, 20-Feb-2015, 16-Apr-2015
Resolution:


Abstract
========

This PEP proposes a redesign of the way in which built-in and extension modules
interact with the import machinery. This was last revised for Python 3.0 in PEP
3121, but did not solve all problems at the time. The goal is to solve
import-related problems by bringing extension modules closer to the way Python
modules behave; specifically to hook into the ModuleSpec-based loading
mechanism introduced in PEP 451.

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension
authors to only define features they need, and to allow future additions
to extension module declarations.

Extension modules are created in a two-step process, fitting better into
the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

Extension modules can safely store arbitrary C-level per-module state in
the module that is covered by normal garbage collection and supports
reloading and sub-interpreters.
Extension authors are encouraged to take these issues into account
when using the new API.

The proposal also allows extension modules with non-ASCII names.

Not all problems tackled in PEP 3121 are solved in this proposal.
In particular, problems with run-time module lookup (PyState_FindModule)
are left to a future PEP.


Motivation
==========

Python modules and extension modules are not set up in the same way.
For Python modules, the module object is created and set up first, then the
module code is executed (PEP 302).
A ModuleSpec object (PEP 451) is used to hold information about the module,
and passed to the relevant hooks.

For extensions (i.e. shared libraries) and built-in modules, the module
init function is executed straight away and does both the creation and
initialization. The initialization function is not passed the ModuleSpec,
or any information it contains, such as the __file__ or fully-qualified
name. This hinders relative imports and resource loading.

In Py3, modules are also not being added to sys.modules, which means that a
(potentially transitive) re-import of the module will really try to re-import
it and thus run into an infinite loop when it executes the module init function
again. Without access to the fully-qualified module name, it is not trivial to
correctly add the module to sys.modules either.
This is specifically a problem for Cython generated modules, for which it's
not uncommon that the module init code has the same level of complexity as
that of any 'regular' Python module. Also, the lack of __file__ and __name__
information hinders the compilation of "__init__.py" modules, i.e. packages,
especially when relative imports are being used at module init time.

Furthermore, the majority of currently existing extension modules have
problems with sub-interpreter support and/or interpreter reloading, and, while
it is possible with the current infrastructure to support these
features, it is neither easy nor efficient.
Addressing these issues was the goal of PEP 3121, but many extensions,
including some in the standard library, took the least-effort approach
to porting to Python 3, leaving these issues unresolved.
This PEP keeps backwards compatibility, which should reduce pressure and give
extension authors adequate time to consider these issues when porting.


The current process
===================

Currently, extension and built-in modules export an initialization function
named "PyInit_modulename", named after the file name of the shared library.
This function is executed by the import machinery and must return a fully
initialized module object.
The function receives no arguments, so it has no way of knowing about its
import context.

During its execution, the module init function creates a module object
based on a PyModuleDef object. It then continues to initialize it by adding
attributes to the module dict, creating types, etc.

Behind the scenes, the shared library loader keeps a note of the fully qualified
module name of the last module that it loaded, and when a module gets
created that has a matching name, this global variable is used to determine
the fully qualified name of the module object. This is not entirely safe as it
relies on the module init function creating its own module object first,
but this assumption usually holds in practice.


The proposal
============

The initialization function (PyInit_modulename) will be allowed to return
a pointer to a PyModuleDef object. The import machinery will be in charge
of constructing the module object, calling hooks provided in the PyModuleDef
in the relevant phases of initialization (as described below).

This multi-phase initialization is an additional possibility. Single-phase
initialization, the current practice of returning a fully initialized module
object, will still be accepted, so existing code will work unchanged,
including binary compatibility.

The PyModuleDef structure will be changed to contain a list of slots,
similarly to PEP 384's PyType_Spec for types.
To keep binary compatibility, and avoid needing to introduce a new structure
(which would introduce additional supporting functions and per-module storage),
the currently unused m_reload pointer of PyModuleDef will be changed to
hold the slots. The structures are defined as::

    typedef struct {
        int slot;
        void *value;
    } PyModuleDef_Slot;

    typedef struct PyModuleDef {
        PyModuleDef_Base m_base;
        const char* m_name;
        const char* m_doc;
        Py_ssize_t m_size;
        PyMethodDef *m_methods;
        PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
        traverseproc m_traverse;
        inquiry m_clear;
        freefunc m_free;
    } PyModuleDef;

The *m_slots* member must be either NULL, or point to an array of
PyModuleDef_Slot structures, terminated by a slot with id set to 0
(i.e. ``{0, NULL}``).

To specify a slot, a unique slot ID must be provided.
New Python versions may introduce new slot IDs, but slot IDs will never be
recycled. Slots may get deprecated, but will continue to be supported
throughout Python 3.x.

A slot's value pointer may not be NULL, unless specified otherwise in the
slot's documentation.

The following slots are currently available, and described later:

* Py_mod_create
* Py_mod_exec

Unknown slot IDs will cause the import to fail with SystemError.
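
As a rough illustration of the slot rules above, the following Python sketch
models a zero-terminated slot array and the kind of validation the import
machinery would perform. The numeric IDs (1 and 2), the ``validate_slots``
helper, and the use of SystemError for a NULL value are assumptions made for
this example only; the text above fixes none of them.

```python
# Illustrative model only; the real structures are C arrays of PyModuleDef_Slot.
PY_MOD_CREATE = 1   # assumed numeric ID, for illustration
PY_MOD_EXEC = 2     # assumed numeric ID, for illustration
KNOWN_SLOTS = {PY_MOD_CREATE, PY_MOD_EXEC}


def validate_slots(m_slots):
    """Return the (slot, value) pairs that precede the {0, NULL} terminator.

    m_slots is None (modelling NULL) or a sequence of (slot_id, value) pairs.
    """
    if m_slots is None:
        return []
    seen = []
    for slot_id, value in m_slots:
        if slot_id == 0:
            # {0, NULL} terminates the array; later entries are never read.
            return seen
        if slot_id not in KNOWN_SLOTS:
            raise SystemError(f"unknown slot ID {slot_id}")
        if value is None:
            # None models a NULL value pointer; raising here is an assumption.
            raise SystemError(f"slot {slot_id} has a NULL value")
        seen.append((slot_id, value))
    # In C a missing terminator would read past the array; the model raises.
    raise SystemError("slot array is missing its {0, NULL} terminator")
```

A definition with a single execution slot would be modelled as
``[(PY_MOD_EXEC, some_function), (0, None)]``.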

When using multi-phase initialization, the *m_name* field of PyModuleDef will
not be used during importing; the module name will be taken from the ModuleSpec.

Before it is returned from PyInit_*, the PyModuleDef object must be initialized
using the newly added PyModuleDef_Init function. This sets the object type
(which cannot be done statically on certain compilers), refcount, and internal
bookkeeping data (m_index).
For example, an extension module "example" would be exported as::

    static PyModuleDef example_def = {...};

    PyMODINIT_FUNC
    PyInit_example(void)
    {
        return PyModuleDef_Init(&example_def);
    }

The PyModuleDef object must be available for the lifetime of the module created
from it; usually, it will be declared statically.

Pseudo-code Overview
--------------------

Here is an overview of how the modified importers will operate.
Details such as logging or handling of errors and invalid states
are left out, and C code is presented with a concise Python-like syntax.

The framework that calls the importers is explained in PEP 451
[#pep-0451-loading]_.

::

    importlib/_bootstrap.py:

        class BuiltinImporter:
            def create_module(self, spec):
                return _imp.create_builtin(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                _load_module_shim(self, name)

    importlib/_bootstrap_external.py:

        class ExtensionFileLoader:
            def create_module(self, spec):
                return _imp.create_dynamic(spec)

            def exec_module(self, module):
                _imp.exec_dynamic(module)

            def load_module(self, name):
                # use a backwards compatibility shim
                _load_module_shim(self, name)

    Python/import.c (the _imp module):

        def create_dynamic(spec):
            name = spec.name
            path = spec.origin

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            return _PyImport_LoadDynamicModuleWithSpec(spec)

        def exec_dynamic(module):
            if not isinstance(module, types.ModuleType):
                # non-modules are skipped -- PyModule_GetDef fails on them
                return

            def = PyModule_GetDef(module)
            state = PyModule_GetState(module)
            if state is NULL:
                PyModule_ExecDef(module, def)

        def create_builtin(spec):
            name = spec.name

            # Find an already loaded module that used single-phase init.
            # For multi-phase initialization, mod is NULL, so a new module
            # is always created.
            mod = _PyImport_FindExtensionObject(name, name)
            if mod:
                return mod

            for initname, initfunc in PyImport_Inittab:
                if name == initname:
                    m = initfunc()
                    if isinstance(m, PyModuleDef):
                        def = m
                        return PyModule_FromDefAndSpec(def, spec)
                    else:
                        # fall back to single-phase initialization
                        module = m
                        _PyImport_FixupExtensionObject(module, name, name)
                        return module

    Python/importdl.c:

        def _PyImport_LoadDynamicModuleWithSpec(spec):
            path = spec.origin
            package, dot, name = spec.name.rpartition('.')

            # see the "Export Hook Name" section for export_hook_name
            hook_name = export_hook_name(name)

            # call platform-specific function for loading exported function
            # from shared library
            exportfunc = _find_shared_funcptr(hook_name, path)

            m = exportfunc()
            if isinstance(m, PyModuleDef):
                def = m
                return PyModule_FromDefAndSpec(def, spec)

            module = m

            # fall back to single-phase initialization
            ....

    Objects/moduleobject.c:

        def PyModule_FromDefAndSpec(def, spec):
            name = spec.name
            create = None
            for slot, value in def.m_slots:
                if slot == Py_mod_create:
                    create = value
            if create:
                m = create(spec, def)
            else:
                m = PyModule_New(name)

            if isinstance(m, types.ModuleType):
                m.md_state = None
                m.md_def = def

            if def.m_methods:
                PyModule_AddFunctions(m, def.m_methods)
            if def.m_doc:
                PyModule_SetDocString(m, def.m_doc)

            return m

        def PyModule_ExecDef(module, def):
            if isinstance(module, types.ModuleType):
                if module.md_state is NULL:
                    # allocate a block of zeroed-out memory
                    module.md_state = _alloc(def.m_size)

            if def.m_slots is NULL:
                return

            for slot, value in def.m_slots:
                if slot == Py_mod_exec:
                    value(module)


Module Creation Phase
---------------------

Creation of the module object -- that is, the implementation of
ExecutionLoader.create_module -- is governed by the Py_mod_create slot.

The Py_mod_create slot
......................

The Py_mod_create slot is used to support custom module subclasses.
The value pointer must point to a function with the following signature::

    PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)

The function receives a ModuleSpec instance, as defined in PEP 451,
and the PyModuleDef structure.
It should return a new module object, or set an error
and return NULL.

This function is not responsible for setting import-related attributes
specified in PEP 451 [#pep-0451-attributes]_ (such as ``__name__`` or
``__loader__``) on the new module.

There is no requirement for the returned object to be an instance of
types.ModuleType. Any type can be used, as long as it supports setting and
getting attributes, including at least the import-related attributes.
However, only ModuleType instances support module-specific functionality
such as per-module state and processing of execution slots.
If something other than a ModuleType subclass is returned, no execution slots
may be defined; if any are, a SystemError is raised.

Note that when this function is called, the module's entry in sys.modules
is not populated yet. Attempting to import the same module again
(possibly transitively) may lead to an infinite loop.
Extension authors are advised to keep Py_mod_create minimal, and in particular
to not call user code from it.

Multiple Py_mod_create slots may not be specified. If they are, import
will fail with SystemError.

If Py_mod_create is not specified, the import machinery will create a normal
module object using PyModule_New. The name is taken from *spec*.
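
The rules in this section can be modelled in a few lines of Python. This is
only a sketch of the behaviour described above, not CPython's C
implementation: ``PY_MOD_CREATE``, ``PY_MOD_EXEC`` and ``run_creation_phase``
are names invented for the illustration, slots are represented as plain
``(slot_id, value)`` tuples, and a SimpleNamespace stands in for the
ModuleSpec.

```python
import types

# Hypothetical stand-ins for the C slot IDs (assumption: the real ones are
# C macros, not Python strings).
PY_MOD_CREATE = "Py_mod_create"
PY_MOD_EXEC = "Py_mod_exec"


def run_creation_phase(slots, spec, definition=None):
    """Model of the creation phase for a list of (slot_id, value) pairs."""
    create = None
    has_exec_slots = False
    for slot_id, value in slots:
        if slot_id == PY_MOD_CREATE:
            if create is not None:
                # Only one Py_mod_create slot is allowed.
                raise SystemError("multiple Py_mod_create slots")
            create = value
        elif slot_id == PY_MOD_EXEC:
            has_exec_slots = True

    if create is None:
        # No create slot: fall back to a normal module named after the spec.
        return types.ModuleType(spec.name)

    obj = create(spec, definition)
    if not isinstance(obj, types.ModuleType) and has_exec_slots:
        # Execution slots require a ModuleType (sub)class instance.
        raise SystemError("execution slots defined for a non-module object")
    return obj


spec = types.SimpleNamespace(name="spam")  # stand-in for a ModuleSpec
default_module = run_creation_phase([], spec)
custom_object = run_creation_phase(
    [(PY_MOD_CREATE, lambda spec, definition: types.SimpleNamespace())], spec)
```

Here ``default_module`` is an ordinary module named ``spam``, while
``custom_object`` is whatever the create function chose to return.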


Post-creation steps
...................

If the Py_mod_create function returns an instance of types.ModuleType
or a subclass (or if a Py_mod_create slot is not present), the import
machinery will associate the PyModuleDef with the module.
This also makes the PyModuleDef accessible to the execution phase, the
PyModule_GetDef function, and garbage collection routines (traverse,
clear, free).

If the Py_mod_create function does not return a module subclass, then m_size
must be 0, and m_traverse, m_clear and m_free must all be NULL.
Otherwise, SystemError is raised.

Additionally, initial attributes specified in the PyModuleDef are set on the
module object, regardless of its type:

* The docstring is set from m_doc, if non-NULL.
* The module's functions are initialized from m_methods, if any.
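
As a rough illustration, the post-creation steps could be modelled in Python
as follows. ``ModuleDefModel`` and ``run_post_creation`` are hypothetical
names, the dataclass mirrors only the PyModuleDef fields mentioned above, and
``m_methods`` is simplified to a dict of callables.

```python
import types
from dataclasses import dataclass, field


@dataclass
class ModuleDefModel:
    """Hypothetical Python stand-in for the relevant PyModuleDef fields."""
    m_doc: str = None
    m_size: int = 0
    m_methods: dict = field(default_factory=dict)
    m_traverse: object = None
    m_clear: object = None
    m_free: object = None


def run_post_creation(obj, definition):
    """Model of the post-creation steps for the object from the create slot."""
    if isinstance(obj, types.ModuleType):
        # Modules remember their definition (md_def in the pseudo-code).
        obj.md_def = definition
    elif definition.m_size != 0 or any(
            hook is not None for hook in
            (definition.m_traverse, definition.m_clear, definition.m_free)):
        raise SystemError("non-module objects cannot have state or GC hooks")

    # Initial attributes are set regardless of the object's type.
    if definition.m_doc is not None:
        obj.__doc__ = definition.m_doc
    for name, function in definition.m_methods.items():
        setattr(obj, name, function)
    return obj


class PlainObject:
    """Any attribute-supporting type may stand in for a module."""


definition = ModuleDefModel(m_doc="Utilities for cooking spam",
                            m_methods={"fry": lambda: "fried"})
module = run_post_creation(types.ModuleType("spam"), definition)
other = run_post_creation(PlainObject(), definition)
```

Both objects receive the docstring and the ``fry`` function, but only the
real module is associated with the definition.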


Module Execution Phase
----------------------

Module execution -- that is, the implementation of
ExecutionLoader.exec_module -- is governed by "execution slots".
This PEP only adds one, Py_mod_exec, but others may be added in the future.

The execution phase is done on the PyModuleDef associated with the module
object. For objects that are not a subclass of PyModule_Type (for which
PyModule_GetDef would fail), the execution phase is skipped.

Execution slots may be specified multiple times, and are processed in the order
they appear in the slots array.
When using the default import machinery, they are processed after
import-related attributes specified in PEP 451 [#pep-0451-attributes]_
(such as ``__name__`` or ``__loader__``) are set and the module is added
to sys.modules.


Pre-Execution steps
...................

Before processing the execution slots, per-module state is allocated for the
module. From this point on, per-module state is accessible through
PyModule_GetState.


The Py_mod_exec slot
....................

The entry in this slot must point to a function with the following signature::

    int (*PyModuleExecFunction)(PyObject* module)

It will be called to initialize a module. Usually, this amounts to
setting the module's initial attributes.
The "module" argument receives the module object to initialize.

The function must return ``0`` on success, or, on error, set an exception and
return ``-1``.
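
The ordering and return-code conventions can be sketched in Python. This is a
model, not the C implementation: ``PY_MOD_EXEC`` and ``run_execution_phase``
are invented names, and stopping at the first failing slot is an assumption
of the sketch.

```python
import types

PY_MOD_EXEC = "Py_mod_exec"  # hypothetical stand-in for the C slot ID


def run_execution_phase(module, slots):
    """Call each exec slot in array order (assumed: stop at first failure)."""
    if not isinstance(module, types.ModuleType):
        return 0  # execution is skipped for non-module objects
    for slot_id, exec_function in slots:
        if slot_id != PY_MOD_EXEC:
            continue
        if exec_function(module) != 0:
            return -1
    return 0


def add_food(module):
    module.food = "spam"
    return 0


def add_menu(module):
    # Runs second, so it can rely on attributes set by add_food.
    module.menu = [module.food, "eggs"]
    return 0


spam = types.ModuleType("spam")
status = run_execution_phase(
    spam, [(PY_MOD_EXEC, add_food), (PY_MOD_EXEC, add_menu)])
```

Because the slots run in array order, ``add_menu`` can safely read the
``food`` attribute set by ``add_food``.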

If a Py_mod_exec function replaces the module's entry in sys.modules, the new
object will be used and returned by the importlib machinery after all execution
slots are processed. This is a feature of the import machinery itself.
The slots themselves are all processed using the module returned from the
creation phase; sys.modules is not consulted during the execution phase.
(Note that for extension modules, implementing Py_mod_create is usually
a better solution for using custom module objects.)
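
Since the replacement behaviour belongs to the import machinery rather than to
extension modules, it can be observed with a pure-Python loader. In the sketch
below the loader's ``exec_module`` plays the role of a Py_mod_exec function;
the module name, ``ReplacingLoader`` and ``ReplacingFinder`` are made up for
the demonstration.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types

MODULE_NAME = "hypothetical_replaced_module"  # invented name for this demo
replacement = types.SimpleNamespace(origin="replacement object")


class ReplacingLoader(importlib.abc.Loader):
    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # The loader still receives the module from the creation phase...
        module.seen_by_exec = True
        # ...but it swaps the sys.modules entry for another object.
        sys.modules[module.__name__] = replacement


class ReplacingFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path=None, target=None):
        if fullname == MODULE_NAME:
            return importlib.machinery.ModuleSpec(fullname, ReplacingLoader())
        return None


finder = ReplacingFinder()
sys.meta_path.append(finder)
try:
    imported = importlib.import_module(MODULE_NAME)
finally:
    # Undo the global changes made for the demonstration.
    sys.meta_path.remove(finder)
    sys.modules.pop(MODULE_NAME, None)
```

After the import, ``imported`` is the replacement object, not the module that
was passed to ``exec_module``.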


Legacy Init
-----------

The backwards-compatible single-phase initialization continues to be supported.
In this scheme, the PyInit function returns a fully initialized module rather
than a PyModuleDef object.
In this case, the PyInit hook implements the creation phase, and the execution
phase is a no-op.

Modules that need to work unchanged on older versions of Python should stick to
single-phase initialization, because the benefits of multi-phase initialization
can't be back-ported.
Here is an example of a module that supports multi-phase initialization,
and falls back to single-phase when compiled for an older version of CPython.
It is included mainly as an illustration of the changes needed to enable
multi-phase init::

    #include <Python.h>

    static int spam_exec(PyObject *module) {
        /* Propagate failure: returns 0 on success, -1 with an exception set. */
        return PyModule_AddStringConstant(module, "food", "spam");
    }

    #ifdef Py_mod_exec
    static PyModuleDef_Slot spam_slots[] = {
        {Py_mod_exec, spam_exec},
        {0, NULL}
    };
    #endif

    static PyModuleDef spam_def = {
        PyModuleDef_HEAD_INIT,                      /* m_base */
        "spam",                                     /* m_name */
        PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
        0,                                          /* m_size */
        NULL,                                       /* m_methods */
    #ifdef Py_mod_exec
        spam_slots,                                 /* m_slots */
    #else
        NULL,                                       /* m_reload */
    #endif
        NULL,                                       /* m_traverse */
        NULL,                                       /* m_clear */
        NULL,                                       /* m_free */
    };

    PyMODINIT_FUNC
    PyInit_spam(void) {
    #ifdef Py_mod_exec
        return PyModuleDef_Init(&spam_def);
    #else
        PyObject *module;
        module = PyModule_Create(&spam_def);
        if (module == NULL) return NULL;
        if (spam_exec(module) != 0) {
            Py_DECREF(module);
            return NULL;
        }
        return module;
    #endif
    }


Built-In modules
----------------

Any extension module can be used as a built-in module by linking it into
the executable, and including it in the inittab (either at runtime with
PyImport_AppendInittab, or at configuration time, using tools like *freeze*).

To keep this possibility, all changes to extension module loading introduced
in this PEP will also apply to built-in modules.
The only exception is non-ASCII module names, explained below.


Subinterpreters and Interpreter Reloading
-----------------------------------------

Extensions using the new initialization scheme are expected to support
subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly,
avoiding the issues mentioned in Python documentation [#subinterpreter-docs]_.
The mechanism is designed to make this easy, but care is still required
on the part of the extension author.
No user-defined functions, methods, or instances may leak to different
interpreters.
To achieve this, all module-level state should be kept in either the module
dict, or in the module object's storage reachable by PyModule_GetState.
A simple rule of thumb is: Do not define any static data, except built-in types
with no mutable or user-settable class attributes.


Functions incompatible with multi-phase initialization
------------------------------------------------------

The PyModule_Create function will fail when used on a PyModuleDef structure
with a non-NULL *m_slots* pointer.
The function doesn't have access to the ModuleSpec object necessary for
multi-phase initialization.

The PyState_FindModule function will return NULL, and PyState_AddModule
and PyState_RemoveModule will also fail on modules with non-NULL *m_slots*.
PyState registration is disabled because multiple module objects may be created
from the same PyModuleDef.
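
One way to see the problem is to model a registry that, like the PyState
functions, keeps at most one module per definition. Everything in this sketch
(``ModuleDefModel``, ``registry``, ``state_add_module``,
``state_find_module``) is a hypothetical Python stand-in, not the C API.

```python
import types


class ModuleDefModel:
    """Hypothetical stand-in for a PyModuleDef."""

    def __init__(self, name):
        self.name = name


# Model of a PyState-style registry: one module slot per definition.
registry = {}


def state_add_module(module, definition):
    registry[definition] = module


def state_find_module(definition):
    return registry.get(definition)


spam_def = ModuleDefModel("spam")
first = types.ModuleType(spam_def.name)
second = types.ModuleType(spam_def.name)  # created again from the same def

state_add_module(first, spam_def)
state_add_module(second, spam_def)  # silently displaces `first`
found = state_find_module(spam_def)
```

A lookup by definition can return only one of the two modules, so code
running on behalf of ``first`` would be handed ``second``'s state.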


Module state and C-level callbacks
----------------------------------

Due to the unavailability of PyState_FindModule, any function that needs access
to module-level state (including functions, classes or exceptions defined at
the module level) must receive a reference to the module object (or the
particular object it needs), either directly or indirectly.
This is currently difficult in two situations:

* Methods of classes, which receive a reference to the class, but not to
  the class's module
* Libraries with C-level callbacks, unless the callbacks can receive custom
  data set at callback registration

Fixing these cases is outside of the scope of this PEP, but will be needed for
the new mechanism to be useful to all modules. Proper fixes have been discussed
on the import-sig mailing list [#findmodule-discussion]_.

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment,
not good candidates for porting to the new mechanism.


New Functions
-------------

A new function and macro implementing the module creation phase will be added.
These are similar to PyModule_Create and PyModule_Create2, except they
take an additional ModuleSpec argument, and handle module definitions with
non-NULL slots::

    PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
    PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
                                        int module_api_version)

A new function implementing the module execution phase will be added.
This allocates per-module state (if not allocated already), and *always*
processes execution slots. The import machinery calls this function when
a module is executed, unless the module is being reloaded::

    PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)

Another function will be introduced to initialize a PyModuleDef object.
This idempotent function fills in the type, refcount, and module index.
It returns its argument cast to PyObject*, so it can be returned directly
from a PyInit function::

    PyObject * PyModuleDef_Init(PyModuleDef *);

Additionally, two helpers will be added for setting the docstring and
methods on a module::

    int PyModule_SetDocString(PyObject *, const char *)
    int PyModule_AddFunctions(PyObject *, PyMethodDef *)


Export Hook Name
----------------

As portable C identifiers are limited to ASCII, module names
must be encoded to form the PyInit hook name.

For ASCII module names, the import hook is named
PyInit_<modulename>, where <modulename> is the name of the module.

For module names containing non-ASCII characters, the import hook is named
PyInitU_<encodedname>, where the name is encoded using CPython's
"punycode" encoding (Punycode [#rfc-3492]_ with a lowercase suffix),
with hyphens ("-") replaced by underscores ("_").


In Python::

    def export_hook_name(name):
        try:
            suffix = b'_' + name.encode('ascii')
        except UnicodeEncodeError:
            suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
        return b'PyInit' + suffix

Examples:

=============  ===================
Module name    Init hook name
=============  ===================
spam           PyInit_spam
lančmít        PyInitU_lanmt_2sa6t
スパム            PyInitU_zck5b2b
=============  ===================
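
The hook names in the table can be reproduced with the ``export_hook_name``
function shown above. The snippet repeats that function so it runs on its own;
``lančmít`` and ``スパム`` are used as the non-ASCII inputs, and
``hook_names`` is just a helper dict for the demonstration.

```python
def export_hook_name(name):
    """Return the init hook name (as bytes) for a module name."""
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        # Non-ASCII names: punycode, with hyphens turned into underscores.
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix


hook_names = {name: export_hook_name(name).decode('ascii')
              for name in ('spam', 'lančmít', 'スパム')}
for name, hook in hook_names.items():
    print(name, hook)
```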

For modules with non-ASCII names, single-phase initialization is not supported.

In the initial implementation of this PEP, built-in modules with non-ASCII
names will not be supported.


Module Reloading
----------------

Reloading an extension module using importlib.reload() will continue to
have no effect, except re-setting import-related attributes.

Due to limitations in shared library loading (both dlopen on POSIX and
LoadLibraryEx on Windows), it is not generally possible to load
a modified library after it has changed on disk.

Use cases for reloading other than trying out a new version of the module
are too rare to require all module authors to keep reloading in mind.
If reload-like functionality is needed, authors can export a dedicated
function for it.


Multiple modules in one library
-------------------------------

To support multiple Python modules in one shared library, the library can
export additional PyInit* symbols besides the one that corresponds
to the library's filename.

Note that this mechanism can currently only be used to *load* extra modules,
but not to *find* them. (This is a limitation of the loader mechanism,
which this PEP does not try to modify.)
To work around the lack of a suitable finder, code like the following
can be used::

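    import importlib.machinery
    import importlib.util

    def load_from_library(name, path):
        # Hypothetical helper name: load module *name* from the shared
        # library at *path*, bypassing the normal finders.
        loader = importlib.machinery.ExtensionFileLoader(name, path)
        spec = importlib.util.spec_from_loader(name, loader)
        module = importlib.util.module_from_spec(spec)
        loader.exec_module(module)
        return module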

On platforms that support symbolic links, these may be used to install one
library under multiple names, exposing all exported modules to normal
import machinery.


Testing and initial implementations
-----------------------------------

For testing, a new built-in module ``_testmultiphase`` will be created.
The library will export several additional modules using the mechanism
described in "Multiple modules in one library".

The ``_testcapi`` module will be unchanged, and will use single-phase
initialization indefinitely (or until it is no longer supported).

The ``array`` and ``xx*`` modules will be converted to use multi-phase
initialization as part of the initial implementation.


Summary of API Changes and Additions
====================================

New functions:

* PyModule_FromDefAndSpec (macro)
* PyModule_FromDefAndSpec2
* PyModule_ExecDef
* PyModule_SetDocString
* PyModule_AddFunctions
* PyModuleDef_Init

New macros:

* Py_mod_create
* Py_mod_exec

New types:

* PyModuleDef_Type will be exposed

New structures:

* PyModuleDef_Slot

PyModuleDef.m_reload changes to PyModuleDef.m_slots.

The internal ``_imp`` module will have backwards incompatible changes:
``create_builtin``, ``create_dynamic``, and ``exec_dynamic`` will be added;
``init_builtin``, ``load_dynamic`` will be removed.

The undocumented functions ``imp.load_dynamic`` and ``imp.init_builtin`` will
be replaced by backwards-compatible shims.


Backwards Compatibility
-----------------------

Existing modules will continue to be source- and binary-compatible with new
versions of Python.
Modules that use multi-phase initialization will not be compatible with
versions of Python that do not implement this PEP.

The functions ``init_builtin`` and ``load_dynamic`` will be removed from
the ``_imp`` module (but not from the ``imp`` module).

All changed loaders (``BuiltinImporter`` and ``ExtensionFileLoader``) will
remain backwards-compatible; the ``load_module`` method will be replaced by
a shim.

Internal functions of Python/import.c and Python/importdl.c will be removed.
(Specifically, these are ``_PyImport_GetDynLoadFunc``,
``_PyImport_GetDynLoadWindows``, and ``_PyImport_LoadDynamicModule``.)


Possible Future Extensions
==========================

The slots mechanism, inspired by PyType_Slot from PEP 384,
allows later extensions.

Some extension modules export many constants; for example _ssl has
a long list of calls in the form::

    PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
                            PY_SSL_ERROR_ZERO_RETURN);

Converting this to a declarative list, similar to PyMethodDef,
would reduce boilerplate, and provide free error-checking which
is often missing.

String constants and types can be handled similarly.
(Note that non-default bases for types cannot be portably specified
statically; this case would need a Py_mod_exec function that runs
before the slots are added. The free error-checking would still be
beneficial, though.)

Another possibility is providing a "main" function that would be run
when the module is given to Python's -m switch.
For this to work, the runpy module will need to be modified to take
advantage of ModuleSpec-based loading introduced in PEP 451.
Also, it will be necessary to add a mechanism for setting up a module
according to slots it wasn't originally defined with.


Implementation
==============

A work-in-progress implementation is available in a GitHub repository [#gh-repo]_;
a patchset is at [#gh-patch]_.


Previous Approaches
===================

Stefan Behnel's initial proto-PEP [#stefans_protopep]_
had a "PyInit_modulename" hook that would create a module class,
whose ``__init__`` would be then called to create the module.
This proposal did not correspond to the (then nonexistent) PEP 451,
where module creation and initialization are broken into distinct steps.
It also did not support loading an extension into pre-existing module objects.

Nick Coghlan proposed "Create" and "Exec" hooks, and wrote a prototype
implementation [#nicks-prototype]_.
At that time PEP 451 was still not implemented, so the prototype
does not use ModuleSpec.

The original version of this PEP used Create and Exec hooks, and allowed
loading into arbitrary pre-constructed objects with the Exec hook.
The proposal made extension module initialization closer to how Python modules
are initialized, but it was later recognized that this isn't an important goal.
The current PEP describes a simpler solution.

A further iteration used a "PyModuleExport" hook as an alternative to PyInit,
where PyInit was used for the existing scheme, and PyModuleExport for
multi-phase.
However, not being able to determine the hook name based on module name
complicated automatic generation of PyImport_Inittab by tools like freeze.
Keeping only the PyInit hook name, even if it's not entirely appropriate for
exporting a definition, yielded a much simpler solution.


References
==========

.. [#pep-0451-attributes]
   https://www.python.org/dev/peps/pep-0451/#attributes

.. [#stefans_protopep]
   https://mail.python.org/pipermail/python-dev/2013-August/128087.html

.. [#nicks-prototype]
   https://mail.python.org/pipermail/python-dev/2013-August/128101.html

.. [#rfc-3492]
   http://tools.ietf.org/html/rfc3492

.. [#gh-repo]
   https://github.com/encukou/cpython/commits/pep489

.. [#gh-patch]
   https://github.com/encukou/cpython/compare/master...encukou:pep489.patch

.. [#findmodule-discussion]
   https://mail.python.org/pipermail/import-sig/2015-April/000959.html

.. [#pep-0451-loading]
   https://www.python.org/dev/peps/pep-0451/#how-loading-will-work

.. [#subinterpreter-docs]
   https://docs.python.org/3/c-api/init.html#sub-interpreter-support


Copyright
=========

This document has been placed in the public domain.

From encukou at gmail.com  Thu May 21 18:17:37 2015
From: encukou at gmail.com (Petr Viktorin)
Date: Thu, 21 May 2015 18:17:37 +0200
Subject: [Import-SIG] PEP 489: Multi-phase extension module
 initialization; version 6
In-Reply-To: <CALFfu7CtRhX4bQUBWKRsjOWkn3hiPbnmd6S_VRqPg4VJw9kK-A@mail.gmail.com>
References: <CA+=+wqAsvYgKy+f2zD5F+i02KRgAQ+2T=yZOeFNRaDp9MBpnaA@mail.gmail.com>
 <CALFfu7CtRhX4bQUBWKRsjOWkn3hiPbnmd6S_VRqPg4VJw9kK-A@mail.gmail.com>
Message-ID: <CA+=+wqBV3JzG+e6KGbDmEOVfQD1_UfWSeyE+z=yMMJC8dVsNFg@mail.gmail.com>

On Wed, May 20, 2015 at 11:47 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> FYI, Nick asked if I would be willing to be BDFL-Delegate for this PEP
> and Guido has given the okay.  I've added myself to the PEP's header.
> I'll try to make a decision soon (in time to land the patch before the
> feature freeze), but I also must be confident about the pronouncement.
>
> -eric

Thank you for taking this on!
I believe all issues raised so far are addressed in the latest update,
which is now live.
If you still have an unaddressed point, please let me know.

From bcannon at gmail.com  Thu May 28 17:11:38 2015
From: bcannon at gmail.com (Brett Cannon)
Date: Thu, 28 May 2015 15:11:38 +0000
Subject: [Import-SIG] Idea: concept of a builder or transformer to
	compliment loaders
Message-ID: <CAP1=2W4jMbKY-oa+f0SspSJ481eCeZn4CVxvoHHgpVzOnQNSMg@mail.gmail.com>

I should start off by saying I don't plan to pursue this idea, but I wanted
to write it down for posterity and in case anyone else has thought about
this.

That being said, the idea of macros and other source-transforming things
done to Python code has come up a few times on python-ideas as of late. Now
experimenting with this sort of thing using a custom loader is not hard,
and thanks to importlib.abc.InspectLoader.source_to_code()
<https://docs.python.org/3/library/importlib.html#importlib.abc.InspectLoader.source_to_code>
it's fairly easy to do (by design; I initially tried to structure
importlib's APIs to make alternative storage backends easy, as well as
alternative syntax stuff like Quixote from back in the day).

But one thing I realized is that while finders and loaders are necessary
for alternative code storage mechanisms, they are not the right abstraction
for tweaking code semantics. Really all you need is a function that takes
in source code and spits out a code object to use with exec() (hence
InspectLoader.source_to_code() even existing). It somewhat sucks that
people who just want to tweak code semantics have to define a loader
subclass and instantiate a new finder when all that is mostly stuff that
doesn't concern them. It also sucks that they would have to do that for
every storage type, e.g. local files and zip files.
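
For readers who have not tried it, the loader-based approach looks roughly
like the sketch below. ``UppercaseStrings`` and ``TransformingLoader`` are
invented names and the transformation is a toy; the only real API relied on
is the source_to_code() hook on importlib's source loaders. The loader is
called directly here, because wiring it into a finder is exactly the extra
boilerplate described above.

```python
import ast
import importlib.machinery


class UppercaseStrings(ast.NodeTransformer):
    """Toy transformation: upper-case every string literal."""

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            node.value = node.value.upper()
        return node


class TransformingLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader; only the compilation step is customised."""

    # `_optimize` mirrors the private keyword CPython's own loaders accept.
    def source_to_code(self, data, path, *, _optimize=-1):
        tree = ast.parse(data, filename=path)
        tree = ast.fix_missing_locations(UppercaseStrings().visit(tree))
        return compile(tree, path, "exec", dont_inherit=True,
                       optimize=_optimize)


# Exercise the hook without touching the filesystem.
loader = TransformingLoader("demo", "<demo>")
code = loader.source_to_code("greeting = 'hello'\n", "<demo>")
namespace = {}
exec(code, namespace)
```

After this runs, ``namespace["greeting"]`` holds the upper-cased string, but
the same subclassing would have to be repeated for every other storage
backend.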

Now I don't have a solid solution to propose for this niche use case. It
makes me want to have some kind of way to register compiler functions, but
that would be limiting if it went source -> code object. AST -> AST would
allow for chaining much like Victor has proposed in the past, but it also
means that people who want a transpiler to go source -> source are left
out. And then there is the whole thing of how to get the loaders to know of
these transpilers/transformers/compilers as adding more global state to sys
feels dirty (maybe an attribute on finders that they can draw from if they
so choose?), but maybe it isn't that big of a deal as long as they are just
callables and people realize they must be re-entrant.
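
A minimal sketch of the chained AST -> AST variant, assuming a plain
module-level list as the registry. Nothing like ``ast_transformers``,
``register_ast_transformer`` or ``compile_with_transformers`` exists in
importlib; the names are made up to show the shape of the idea.

```python
import ast

# Hypothetical registry; the order of registration is the order of use.
ast_transformers = []


def register_ast_transformer(func):
    """Register an AST -> AST callable and return it unchanged."""
    ast_transformers.append(func)
    return func


class _MapIntegers(ast.NodeTransformer):
    """Apply a function to every integer literal (bools are left alone)."""

    def __init__(self, func):
        self.func = func

    def visit_Constant(self, node):
        if type(node.value) is int:
            node.value = self.func(node.value)
        return node


@register_ast_transformer
def double_integers(tree):
    return _MapIntegers(lambda n: n * 2).visit(tree)


@register_ast_transformer
def increment_integers(tree):
    return _MapIntegers(lambda n: n + 1).visit(tree)


def compile_with_transformers(source, filename="<string>"):
    """Parse, run every registered transformer in order, then compile."""
    tree = ast.parse(source, filename=filename)
    for transform in ast_transformers:
        tree = ast.fix_missing_locations(transform(tree))
    return compile(tree, filename, "exec")


namespace = {}
exec(compile_with_transformers("x = 10\n"), namespace)
```

The literal is doubled first and incremented second, so the chain is
order-sensitive; a source -> source transpiler has no place to plug in here,
which is the limitation noted above.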

As I said, I don't plan to work on this, but I wanted to get my ideas
written down in case someone else cared.

From ericsnowcurrently at gmail.com  Fri May 29 00:18:57 2015
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 28 May 2015 16:18:57 -0600
Subject: [Import-SIG] Idea: concept of a builder or transformer to
 compliment loaders
In-Reply-To: <CAP1=2W4jMbKY-oa+f0SspSJ481eCeZn4CVxvoHHgpVzOnQNSMg@mail.gmail.com>
References: <CAP1=2W4jMbKY-oa+f0SspSJ481eCeZn4CVxvoHHgpVzOnQNSMg@mail.gmail.com>
Message-ID: <CALFfu7AsBT9-0n1JwsrZ7sFMfEOJxx=DgwFWiuk7KtT=bXhxPA@mail.gmail.com>

On Thu, May 28, 2015 at 9:11 AM, Brett Cannon <bcannon at gmail.com> wrote:
> I should start off by saying I don't plan to pursue this idea, but I wanted
> to write it down for posterity and in case anyone else has thought about
> this.
>
> That being said, the idea of macros and other source-transforming things
> done to Python code has come up a few times on python-ideas as of late. Now
> experimenting with this sort of thing using a custom loader is not hard, and
> thanks to importlib.abc.InspectLoader.source_to_code() it's fairly easy to
> do (by design; I initially tried to structure importlib's APIs to make
> alternative storage backends easy as well as alternative syntax stuff like
> Quixote from back in the day).
>
> But one thing I realized is that while finders and loaders are necessary for
> alternative code storage mechanisms, they are not the right abstraction for
> tweaking code semantics.

Agreed.

> Really all you need is a function that takes in
> source code and spits out a code object to use with exec() (hence
> InspectLoader.source_to_code() even existing). It somewhat sucks that
> people who just want to tweak code semantics have to define a loader
> subclass and instantiate a new finder when all that is mostly stuff that
> doesn't concern them. It also sucks that they would have to do that for
> every storage type, e.g. local files and zip files.

Yep.

>
> Now I don't have a solid solution to propose for this niche use case. It
> makes me want to have some kind of way to register compiler functions, but

I had the same thought.

> that would be limiting if it went source -> code object. AST -> AST would
> allow for chaining much like Victor has proposed in the past, but it also
> means that people who want a transpiler to go source -> source are left out.

Yeah, it feels like there's an encapsulation there around the various
pieces of compilation.  Furthermore, I'd expect such an abstraction to
consider the needs of alternate Python implementations as well.

> And then there is the whole thing of how to get the loaders to know of these
> transpilers/transformers/compilers
> as adding more global state to sys feels
> dirty (maybe an attribute on finders that they can draw from if they so
> choose?), but maybe it isn't that big of a deal as long as they are just
> callables and people realize they must be re-entrant.

This is where something like ImportSystem (nee ImportEngine) would
help.  We'd just have sys.importsystem and add state there as
appropriate without further cluttering up the sys module.

FWIW, I've considered a number of minor additions similar to what
you're talking about for niche needs where it would still be nice to
have a convenient API because of the overhead of writing and managing
a finder/loader.

Perhaps it's just a matter of providing helper decorators along the
lines of contextlib.contextmanager, which convert your simple function
into the necessary format and register it in the correct place (e.g.
finder+loader -> sys.path/sys.meta_path).
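
Such a helper might look something like the sketch below. The decorator name
``simple_module`` and the module name are hypothetical; the only real APIs
used are importlib.abc.Loader, importlib.abc.MetaPathFinder, ModuleSpec and
sys.meta_path.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys


def simple_module(name):
    """Hypothetical decorator: use the function as the exec step of `name`."""
    def decorator(populate):
        class _Loader(importlib.abc.Loader):
            def create_module(self, spec):
                return None  # default module creation

            def exec_module(self, module):
                populate(module)

        class _Finder(importlib.abc.MetaPathFinder):
            def find_spec(self, fullname, path=None, target=None):
                if fullname == name:
                    return importlib.machinery.ModuleSpec(fullname, _Loader())
                return None

        # Appended last, so the standard finders are still consulted first.
        sys.meta_path.append(_Finder())
        return populate
    return decorator


@simple_module("hypothetical_greeting")
def populate_greeting(module):
    module.message = "hello"


greeting = importlib.import_module("hypothetical_greeting")
```

The user writes only ``populate_greeting``; the finder and loader classes are
generated and registered by the decorator.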

>
> As I said, I don't plan to work on this, but I wanted to get my ideas
> written down in case someone else cared.

:)

-eric