[Python-checkins] peps: PEP 432: Reframe as core init vs main interpreter init

nick.coghlan python-checkins at python.org
Tue Jun 30 13:43:33 CEST 2015


https://hg.python.org/peps/rev/bed6e48921ad
changeset:   5900:bed6e48921ad
user:        Nick Coghlan <ncoghlan at gmail.com>
date:        Tue Jun 30 21:43:19 2015 +1000
summary:
  PEP 432: Reframe as core init vs main interpreter init

files:
  pep-0432.txt |  280 +++++++++++++++++++-------------------
  1 files changed, 141 insertions(+), 139 deletions(-)


diff --git a/pep-0432.txt b/pep-0432.txt
--- a/pep-0432.txt
+++ b/pep-0432.txt
@@ -28,50 +28,42 @@
 Proposal
 ========
 
-This PEP proposes that CPython move to an explicit multi-phase initialization
-process, where a preliminary interpreter is put in place with limited OS
-interaction capabilities early in the startup sequence. This essential core
-remains in place while all of the configuration settings are determined,
-until a final configuration call takes those settings and finishes
-bootstrapping the interpreter immediately before locating and executing
-the main module.
+This PEP proposes that initialization of the CPython runtime be split into
+two clearly distinct phases:
+
+* core runtime initialization
+* main interpreter initialization
+
+The proposed design also has significant implications for:
+
+* main module execution
+* subinterpreter initialization
 
 In the new design, the interpreter will move through the following
 well-defined phases during the initialization sequence:
 
 * Pre-Initialization - no interpreter available
-* Initializing - interpreter partially available
-* Initialized - interpreter available, __main__ related metadata
-  incomplete
-
-With the interpreter itself fully initialised, main module execution will
-then proceed through two phases:
-
-* Main Preparation - __main__ related metadata populated
-* Main Execution - bytecode executing in the __main__ module namespace
-
-(Embedding applications may choose not to use the Main Preparation and
-Execution phases)
+* Core Initialized - main interpreter partially available,
+  subinterpreter creation not yet available
+* Initialized - main interpreter fully available, subinterpreter creation
+  available
 
 As a concrete use case to help guide any design changes, and to solve a known
 problem where the appropriate defaults for system utilities differ from those
 for running user scripts, this PEP also proposes the creation and
 distribution of a separate system Python (``pysystem``) executable
-which, by default, ignores user site directories and environment variables,
-and does not implicitly set ``sys.path[0]`` based on the current directory
-or the script being executed (it will, however, still support virtual
-environments).
+which, by default, operates in "isolated mode" (as selected by the CPython
+``-I`` switch).
 
 To keep the implementation complexity under control, this PEP does *not*
 propose wholesale changes to the way the interpreter state is accessed at
 runtime. Changing the order in which the existing initialization steps
-occur in order to make
-the startup sequence easier to maintain is already a substantial change, and
-attempting to make those other changes at the same time will make the
-change significantly more invasive and much harder to review. However, such
-proposals may be suitable topics for follow-on PEPs or patches - one key
-benefit of this PEP is decreasing the coupling between the internal storage
-model and the configuration interface, so such changes should be easier
+occur in order to make the startup sequence easier to maintain is already a
+substantial change, and attempting to make those other changes at the same time
+will make the change significantly more invasive and much harder to review.
+However, such proposals may be suitable topics for follow-on PEPs or patches
+- one key benefit of this PEP is decreasing the coupling between the internal
+storage model and the configuration interface, so such changes should be easier
 once this PEP has been implemented.
 
 
@@ -94,10 +86,10 @@
 
 A number of proposals are on the table for even *more* sophisticated
 startup behaviour, such as better control over ``sys.path``
-initialization (easily adding additional directories on the command line
-in a cross-platform fashion [7_], as well as controlling the configuration of
+initialization (e.g. easily adding additional directories on the command line
+in a cross-platform fashion [7_], controlling the configuration of
 ``sys.path[0]`` [8_]), easier configuration of utilities like coverage
-tracing when launching Python subprocesses [9_].
+tracing when launching Python subprocesses [9_]).
 
 Rather than continuing to bolt such behaviour onto an already complicated
 system, this PEP proposes to start simplifying the status quo by introducing
@@ -259,54 +251,33 @@
 * Pre-Initialization:
 
   * no interpreter is available.
-  * ``Py_IsInitializing()`` returns ``0``
+  * ``Py_IsCoreInitialized()`` returns ``0``
   * ``Py_IsInitialized()`` returns ``0``
   * The embedding application determines the settings required to create the
     main interpreter and moves to the next phase by calling
-    ``Py_BeginInitialization``.
+    ``Py_InitializeCore``.
 
-* Initializing:
+* Core Initialized:
 
   * the main interpreter is available, but only partially configured.
-  * ``Py_IsInitializing()`` returns ``1``
+  * ``Py_IsCoreInitialized()`` returns ``1``
   * ``Py_IsInitialized()`` returns ``0``
   * The embedding application determines and applies the settings
     required to complete the initialization process by calling
-    ``Py_ReadConfig`` and ``Py_EndInitialization``.
+    ``Py_ReadMainInterpreterConfig`` and ``Py_InitializeMainInterpreter``.
 
 * Initialized:
 
   * the main interpreter is available and fully operational, but
     ``__main__`` related metadata is incomplete
-  * ``Py_IsInitializing()`` returns ``0``
+  * ``Py_IsCoreInitialized()`` returns ``1``
   * ``Py_IsInitialized()`` returns ``1``
 
-
-Main Execution Phases
----------------------
-
-After initializing the interpreter, the embedding application may continue
-on to execute code in the ``__main__`` module namespace.
-
-* Main Preparation:
-
-  * subphase of Initialized (not separately identified at runtime)
-  * fully populates ``__main__`` related metadata
-  * may execute code in ``__main__`` namespace (e.g. ``PYTHONSTARTUP``)
-  * invoked as ``PyRun_PrepareMain``
-
-* Main Execution:
-
-  * subphase of Initialized (not separately identified at runtime)
-  * user supplied bytecode is being executed in the ``__main__`` namespace
-  * invoked as ``PyRun_ExecMain``
-
 Invocation of Phases
 --------------------
 
 All listed phases will be used by the standard CPython interpreter and the
-proposed System Python interpreter. Other embedding applications may
-choose to skip the step of executing code in the ``__main__`` namespace.
+proposed System Python interpreter.
 
 An embedding application may still continue to leave initialization almost
 entirely under CPython's control by using the existing ``Py_Initialize``
@@ -317,21 +288,21 @@
 
     /* Phase 1: Pre-Initialization */
     PyCoreConfig core_config = PyCoreConfig_INIT;
-    PyConfig config = PyConfig_INIT;
+    PyMainInterpreterConfig config = PyMainInterpreterConfig_INIT;
     /* Easily control the core configuration */
     core_config.ignore_environment = 1; /* Ignore environment variables */
     core_config.use_hash_seed = 0;      /* Full hash randomisation */
-    Py_BeginInitialization(&core_config);
+    Py_InitializeCore(&core_config);
     /* Phase 2: Initialization */
     /* Optionally preconfigure some settings here - they will then be
      * used to derive other settings */
-    Py_ReadConfig(&config);
+    Py_ReadMainInterpreterConfig(&config);
     /* Can completely override derived settings here */
-    Py_EndInitialization(&config);
+    Py_InitializeMainInterpreter(&config);
     /* Phase 3: Initialized */
     /* If an embedding application has no real concept of a main module
      * it can just stop the initialization process here.
-     * Alternatively, it can launch __main__ via the PyRun_*Main functions.
+     * Alternatively, it can launch __main__ via the relevant API functions.
      */
 
 
@@ -356,7 +327,7 @@
 
 The proposed API for this step in the startup sequence is::
 
-    void Py_BeginInitialization(const PyCoreConfig *config);
+    void Py_InitializeCore(const PyCoreConfig *config);
 
 Like ``Py_Initialize``, this part of the new API treats initialization
 failures
@@ -366,7 +337,7 @@
 the one already being proposed.
 
 The new ``PyCoreConfig`` struct holds the settings required for preliminary
-configuration::
+configuration of the core runtime and creation of the main interpreter::
 
     /* Note: if changing anything in PyCoreConfig, also update
      * PyCoreConfig_INIT */
@@ -435,18 +406,21 @@
 in order to keep the bootstrapping environment consistent across
 different embedding applications. If we can create a valid interpreter state
 without the setting, then the setting should go in the configuration passed
-to ``Py_EndInitialization()`` rather than in the core configuration.
+to ``Py_InitializeMainInterpreter()`` rather than in the core configuration.
 
 A new query API will allow code to determine if the interpreter is in the
 bootstrapping state between the creation of the interpreter state and the
 completion of the bulk of the initialization process::
 
-    int Py_IsInitializing();
+    int Py_IsCoreInitialized();
 
-Attempting to call ``Py_BeginInitialization()`` again when
-``Py_IsInitializing()`` or ``Py_IsInitialized()`` is true is a fatal error.
+Attempting to call ``Py_InitializeCore()`` again when
+``Py_IsCoreInitialized()`` is true is a fatal error.
 
-While in the initializing state, the interpreter should be fully functional
+As frozen bytecode may now be legitimately run in an interpreter which is not
+yet fully initialized, ``sys.flags`` will gain a new ``initialized`` flag.
+
+With the core runtime initialised, the interpreter should be fully functional
 except that:
 
 * compilation is not allowed (as the parser and compiler are not yet
@@ -463,23 +437,25 @@
   * ``sys.exec_prefix``
   * ``sys.prefix``
   * ``sys.warnoptions``
-  * ``sys.flags``
   * ``sys.dont_write_bytecode``
   * ``sys.stdin``
   * ``sys.stdout``
 * The filesystem encoding is not yet defined
 * The IO encoding is not yet defined
 * CPython signal handlers are not yet installed
-* only builtin and frozen modules may be imported (due to above limitations)
+* Only builtin and frozen modules may be imported (due to above limitations)
 * ``sys.stderr`` is set to a temporary IO object using unbuffered binary
   mode
+* The ``sys.flags`` attribute exists, but may contain flags that do not yet
+  have their final values.
+* The ``sys.flags.initialized`` attribute is set to ``0``
 * The ``warnings`` module is not yet initialized
 * The ``__main__`` module does not yet exist
 
 <TBD: identify any other notable missing functionality>
 
 The main things made available by this step will be the core Python
-datatypes, in particular dictionaries, lists and strings. This allows them
+data types, in particular dictionaries, lists and strings. This allows them
 to be used safely for all of the remaining configuration steps (unlike the
 status quo).
 
@@ -487,9 +463,10 @@
 allowing any further configuration data to be stored on the interpreter
 object rather than in C process globals.
 
-Any call to ``Py_BeginInitialization()`` must have a matching call to
-``Py_Finalize()``. It is acceptable to skip calling Py_EndInitialization() in
-between (e.g. if attempting to read the configuration settings fails)
+Any call to ``Py_InitializeCore()`` must have a matching call to
+``Py_Finalize()``. It is acceptable to skip calling
+``Py_InitializeMainInterpreter()`` in between (e.g. if attempting to read the
+main interpreter configuration settings fails).
 
 
 Determining the remaining configuration settings
@@ -499,7 +476,7 @@
 settings needed to complete the process. No changes are made to the
 interpreter state at this point. The core API for this step is::
 
-    int Py_ReadConfig(PyConfig *config);
+    int Py_ReadMainInterpreterConfig(PyMainInterpreterConfig *config);
 
 The config argument should be a pointer to a config struct (which may be
 a temporary one stored on the C stack). For any already configured value
@@ -512,35 +489,47 @@
 code (which is relatively straightforward, thanks to the infrastructure
 already put in place to expose ``sys.implementation``).
 
-Unlike ``Py_Initialize`` and ``Py_BeginInitialization``, this call will raise
+Unlike ``Py_Initialize`` and ``Py_InitializeCore``, this call will raise
 an exception and report an error return rather than exhibiting fatal errors
 if a problem is found with the config data.
 
 Any supported configuration setting which is not already set will be
 populated appropriately in the supplied configuration struct. The default
 configuration can be overridden entirely by setting the value *before*
-calling ``Py_ReadConfiguration``. The provided value will then also be used
-in calculating any other settings derived from that value.
+calling ``Py_ReadMainInterpreterConfig``. The provided value will then also be
+used in calculating any other settings derived from that value.
 
 Alternatively, settings may be overridden *after* the
-``Py_ReadConfiguration`` call (this can be useful if an embedding
+``Py_ReadMainInterpreterConfig`` call (this can be useful if an embedding
 application wants to adjust a setting rather than replace it completely,
 such as removing ``sys.path[0]``).
 
 Merely reading the configuration has no effect on the interpreter state: it
 only modifies the passed in configuration struct. The settings are not
-applied to the running interpreter until the ``Py_EndInitialization`` call
-(see below).
+applied to the running interpreter until the ``Py_InitializeMainInterpreter``
+call (see below).
 
 
 Supported configuration settings
 --------------------------------
 
-The new ``PyConfig`` struct holds the settings required to complete the
-interpreter configuration. All fields are either pointers to Python
-data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
+The interpreter configuration is split into two parts: settings which are
+either relevant only to the main interpreter or must be identical across the
+main interpreter and all subinterpreters, and settings which may vary across
+subinterpreters.
 
-    /* Note: if changing anything in PyConfig, also update PyConfig_INIT */
+NOTE: For initial implementation purposes, only the flag indicating whether
+or not the interpreter is the main interpreter will be configured on a per
+interpreter basis. Other fields will be reviewed for whether or not they can
+feasibly be made interpreter specific over the course of the implementation.
+
+The ``PyMainInterpreterConfig`` struct holds the settings required to
+complete the main interpreter configuration. These settings are also all
+passed through unmodified to subinterpreters. Fields are either pointers to
+Python data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
+
+    /* Note: if changing anything in PyMainInterpreterConfig, also update
+     * PyMainInterpreterConfig_INIT */
     typedef struct {
         /* Argument processing */
         PyListObject *raw_argv;
@@ -613,10 +602,10 @@
         int show_banner;              /* -q switch (inverted) */
         int inspect_main;             /* -i switch, PYTHONINSPECT */
 
-    } PyConfig;
+    } PyMainInterpreterConfig;
 
 
-    /* Struct initialization is pretty ugly in C89. Avoiding this mess would
+    /* Struct initialization is pretty horrible in C89. Avoiding this mess would
      * be the most attractive aspect of using a PyDictObject* instead... */
     #define _PyArgConfig_INIT  NULL, NULL, NULL, NULL
     #define _PyLocationConfig_INIT  NULL, NULL, NULL, NULL, NULL, NULL
@@ -631,13 +620,28 @@
     #define _PyMainConfig_INIT  -1, NULL, NULL, NULL, NULL, NULL, -1
     #define _PyInteractiveConfig_INIT  NULL, -1, -1
 
-    #define PyConfig_INIT {_PyArgConfig_INIT, _PyLocationConfig_INIT,
+    #define PyMainInterpreterConfig_INIT {
+                           _PyArgConfig_INIT, _PyLocationConfig_INIT,
                            _PySiteConfig_INIT, _PyImportConfig_INIT,
                            _PyStreamConfig_INIT, _PyFilesystemConfig_INIT,
                            _PyDebuggingConfig_INIT, _PyCodeGenConfig_INIT,
                            _PySignalConfig_INIT, _PyImplicitConfig_INIT,
                            _PyMainConfig_INIT, _PyInteractiveConfig_INIT}
 
+The ``PyInterpreterConfig`` struct holds the settings that may vary between
+the main interpreter and subinterpreters. For the main interpreter, these
+settings are automatically populated by ``Py_InitializeMainInterpreter()``.
+
+::
+
+    /* Note: if changing anything in PyInterpreterConfig, also update
+     * PyInterpreterConfig_INIT */
+    typedef struct {
+        int is_main_interpreter;    /* Easily check for subinterpreters */
+    } PyInterpreterConfig;
+
+    #define PyInterpreterConfig_INIT {0}
+
 <TBD: did I miss anything?>
 
 
@@ -645,26 +649,25 @@
 -----------------------------------------
 
 The final step in the initialization process is to actually put the
-configuration settings into effect and finish bootstrapping the interpreter
-up to full operation::
+configuration settings into effect and finish bootstrapping the main
+interpreter up to full operation::
 
-    int Py_EndInitialization(const PyConfig *config);
+    int Py_InitializeMainInterpreter(const PyMainInterpreterConfig *config);
 
-Like Py_ReadConfiguration, this call will raise an exception and report an
-error return rather than exhibiting fatal errors if a problem is found with
-the config data.
+Like ``Py_ReadMainInterpreterConfig``, this call will raise an exception and
+report an error return rather than exhibiting fatal errors if a problem is
+found with the config data.
 
 All configuration settings are required - the configuration struct
-should always be passed through ``Py_ReadConfig()`` to ensure it
+should always be passed through ``Py_ReadMainInterpreterConfig`` to ensure it
 is fully populated.
 
-After a successful call, ``Py_IsInitializing()`` will be false, while
-``Py_IsInitialized()`` will become true. The caveats described above for the
-interpreter during the initialization phase will no longer hold.
+After a successful call, ``Py_IsInitialized()`` will become true. The caveats
+described above for the interpreter during the phase where only the core
+runtime is initialized will no longer hold.
 
-Attempting to call ``Py_EndInitialization()`` again when
-``Py_IsInitializing()`` is false or ``Py_IsInitialized()`` is true is an
-error.
+Attempting to call ``Py_InitializeMainInterpreter()`` again when
+``Py_IsInitialized()`` is true is an error.
 
 However, some metadata related to the ``__main__`` module may still be
 incomplete:
@@ -702,6 +705,10 @@
 
     int PyRun_PrepareMain();
 
+This operation is only permitted for the main interpreter, and will raise
+``RuntimeError`` when invoked from a thread where the current thread state
+belongs to a subinterpreter.
+
 The actual processing is driven by the main related settings stored in
 the interpreter state as part of the configuration struct.
 
@@ -760,6 +767,10 @@
 
     int PyRun_ExecMain();
 
+This operation is only permitted for the main interpreter, and will raise
+``RuntimeError`` when invoked from a thread where the current thread state
+belongs to a subinterpreter.
+
 The actual processing is driven by the main related settings stored in
 the interpreter state as part of the configuration struct.
 
@@ -771,22 +782,22 @@
 If ``main_stream`` and ``prompt_stream`` are both set, main execution will
 be delegated to a new API::
 
-    int PyRun_InteractiveMain(PyObject *input, PyObject* output);
+    int _PyRun_InteractiveMain(PyObject *input, PyObject* output);
 
 If ``main_stream`` is set and ``prompt_stream`` is NULL, main execution will
 be delegated to a new API::
 
-    int PyRun_StreamInMain(PyObject *input);
+    int _PyRun_StreamInMain(PyObject *input);
 
 If ``main_code`` is set, main execution will be delegated to a new
 API::
 
-    int PyRun_CodeInMain(PyCodeObject *code);
+    int _PyRun_CodeInMain(PyCodeObject *code);
 
 After execution of main completes, if ``inspect_main`` is set, or
 the ``PYTHONINSPECT`` environment variable has been set, then
 ``PyRun_ExecMain`` will invoke
-``PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__)``.
+``_PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__)``.
 
 
 Internal Storage of Configuration Data
@@ -794,8 +805,8 @@
 
 The interpreter state will be updated to include details of the configuration
 settings supplied during initialization by extending the interpreter state
-object with an embedded copy of the ``PyCoreConfig`` and ``PyConfig``
-structs.
+object with an embedded copy of the ``PyCoreConfig``,
+``PyMainInterpreterConfig`` and ``PyInterpreterConfig`` structs.
 
 For debugging purposes, the configuration settings will be exposed as
 a ``sys._configuration`` simple namespace (similar to ``sys.flags`` and
@@ -838,7 +849,7 @@
 While the existing ``Py_InterpreterState_Head()`` API could be used instead,
 that reference changes as subinterpreters are created and destroyed, while
 ``PyInterpreterState_Main()`` will always refer to the initial interpreter
-state created in ``Py_BeginInitialization()``.
+state created in ``Py_InitializeCore()``.
 
 A new constraint is also added to the embedding API: attempting to delete
 the main interpreter while subinterpreters still exist will now be a fatal
@@ -853,7 +864,7 @@
 than merely writing an extension.
 
 The only newly exposed API that will be part of the stable ABI is the
-``Py_IsInitializing()`` query.
+``Py_IsCoreInitialized()`` query.
 
 
 Build time configuration
@@ -868,10 +879,10 @@
 -----------------------
 
 Backwards compatibility will be preserved primarily by ensuring that
-``Py_ReadConfig()`` interrogates all the previously defined
+``Py_ReadMainInterpreterConfig()`` interrogates all the previously defined
 configuration settings stored in global variables and environment variables,
-and that ``Py_EndInitialization()`` writes affected settings back to the
-relevant locations.
+and that ``Py_InitializeMainInterpreter()`` writes affected settings back to
+the relevant locations.
 
 One acknowledged incompatibility is that some environment variables which
 are currently read lazily may instead be read once during interpreter
@@ -892,7 +903,7 @@
 ``PySys_SetArgv`` call. All APIs that currently support being called
 prior to ``Py_Initialize()`` will
 continue to do so, and will also support being called prior to
-``Py_BeginInitialization()``.
+``Py_InitializeCore()``.
 
 To minimise unnecessary code churn, and to ensure the backwards compatibility
 is well tested, the main CPython executable may continue to use some elements
@@ -909,7 +920,7 @@
 environment variables are trusted and that the directory containing the
 executed file is placed at the beginning of the import path.
 
-Issue 16499 [6_] proposes adding a ``-I`` option to change the behaviour of
+Issue 16499 [6_] added a ``-I`` option to change the behaviour of
 the normal CPython executable, but this is a hard to discover solution (and
 adds yet another option to an already complex CLI). This PEP proposes to
 instead add a separate ``pysystem`` executable
@@ -940,19 +951,19 @@
 Open Questions
 ==============
 
-* Error details for Py_ReadConfiguration and Py_EndInitialization (these
-  should become clear as the implementation progresses)
-* Should there be ``Py_PreparingMain()`` and ``Py_RunningMain()`` query APIs?
-* Should the answer to ``Py_IsInitialized()`` be exposed via the ``sys``
-  module?
-* Is initialisation of the ``PyConfig`` struct too unwieldy to be
-  maintainable? Would a Python dictionary be a better choice, despite
-  being harder to work with from C code?
-* Would it be better to manage the flag variables in ``PyConfig`` as
-  Python integers or as "negative means false, positive means true, zero
+* Error details for ``Py_ReadMainInterpreterConfig`` and
+  ``Py_InitializeMainInterpreter`` (these should become clearer as the
+  implementation progresses)
+* Is initialisation of the ``PyMainInterpreterConfig`` struct too unwieldy to
+  be maintainable? Would a Python dictionary be a better choice, despite
+  being harder to work with from C code? Can we upgrade to requiring a C99
+  compatible compiler?
+* Would it be better to manage the flag variables in ``PyMainInterpreterConfig``
+  as Python integers or as "negative means false, positive means true, zero
   means not set" so the struct can be initialized with a simple
   ``memset(&config, 0, sizeof(*config))``, eliminating the need to update
-  both PyConfig and PyConfig_INIT when adding new fields?
+  both PyMainInterpreterConfig and PyMainInterpreterConfig_INIT when adding
+  new fields?
 * The name of the new system Python executable is a bikeshed waiting to be
   painted. The 3 options considered so far are ``spython``, ``pysystem``
   and ``python-minimal``. The PEP text reflects my current preferred choice
@@ -969,15 +980,6 @@
 settles down and it's a matter of migrating individual settings over to
 the new design, that level of collaboration should become more practical.
 
-As the number of application binaries created by the build process is now
-four, the reference implementation also creates a new top level "Apps"
-directory in the CPython source tree. The source files for the main
-``python`` binary and the new ``pysystem`` binary will be located in that
-directory. The source files for the ``_freeze_importlib`` binary and the
-``_testembed`` binary have been moved out of the Modules directory (which
-is intended for CPython builtin and extension modules) and into the Tools
-directory.
-
 
 The Status Quo
 ==============

-- 
Repository URL: https://hg.python.org/peps

