[Python-checkins] bpo-33042: Fix pre-initialization sys module configuration (GH-6157)

Nick Coghlan webhook-mailer at python.org
Sun Mar 25 06:44:33 EDT 2018


https://github.com/python/cpython/commit/bc77eff8b96be4f035e665ab35c1d06e22f46491
commit: bc77eff8b96be4f035e665ab35c1d06e22f46491
branch: master
author: Nick Coghlan <ncoghlan at gmail.com>
committer: GitHub <noreply at github.com>
date: 2018-03-25T20:44:30+10:00
summary:

bpo-33042: Fix pre-initialization sys module configuration (GH-6157)

- new test case for pre-initialization of sys.warnoptions and sys._xoptions
- restored ability to call these APIs prior to Py_Initialize
- updated the docs for the affected APIs to make it clear they can be
  called before Py_Initialize
- also enhanced the existing embedding test cases
  to check for expected settings in the sys module

files:
A Misc/NEWS.d/next/C API/2018-03-20-21-43-09.bpo-33042.FPFp64.rst
M Doc/c-api/init.rst
M Doc/c-api/sys.rst
M Doc/whatsnew/3.7.rst
M Lib/test/test_embed.py
M Programs/_testembed.c
M Python/sysmodule.c

diff --git a/Doc/c-api/init.rst b/Doc/c-api/init.rst
index f2564c441472..694b4669eea8 100644
--- a/Doc/c-api/init.rst
+++ b/Doc/c-api/init.rst
@@ -31,6 +31,9 @@ The following functions can be safely called before Python is initialized:
   * :c:func:`Py_SetProgramName`
   * :c:func:`Py_SetPythonHome`
   * :c:func:`Py_SetStandardStreamEncoding`
+  * :c:func:`PySys_AddWarnOption`
+  * :c:func:`PySys_AddXOption`
+  * :c:func:`PySys_ResetWarnOptions`
 
 * Informative functions:
 
diff --git a/Doc/c-api/sys.rst b/Doc/c-api/sys.rst
index e4da96c493cd..994509aa50f2 100644
--- a/Doc/c-api/sys.rst
+++ b/Doc/c-api/sys.rst
@@ -205,16 +205,24 @@ accessible to C code.  They all work with the current interpreter thread's
 
 .. c:function:: void PySys_ResetWarnOptions()
 
-   Reset :data:`sys.warnoptions` to an empty list.
+   Reset :data:`sys.warnoptions` to an empty list. This function may be
+   called prior to :c:func:`Py_Initialize`.
 
 .. c:function:: void PySys_AddWarnOption(const wchar_t *s)
 
-   Append *s* to :data:`sys.warnoptions`.
+   Append *s* to :data:`sys.warnoptions`. This function must be called prior
+   to :c:func:`Py_Initialize` in order to affect the warnings filter list.
 
 .. c:function:: void PySys_AddWarnOptionUnicode(PyObject *unicode)
 
    Append *unicode* to :data:`sys.warnoptions`.
 
+   Note: this function is not currently usable from outside the CPython
+   implementation, as it must be called prior to the implicit import of
+   :mod:`warnings` in :c:func:`Py_Initialize` to be effective, but can't be
+   called until enough of the runtime has been initialized to permit the
+   creation of Unicode objects.
+
 .. c:function:: void PySys_SetPath(const wchar_t *path)
 
    Set :data:`sys.path` to a list object of paths found in *path* which should
@@ -260,7 +268,8 @@ accessible to C code.  They all work with the current interpreter thread's
 .. c:function:: void PySys_AddXOption(const wchar_t *s)
 
    Parse *s* as a set of :option:`-X` options and add them to the current
-   options mapping as returned by :c:func:`PySys_GetXOptions`.
+   options mapping as returned by :c:func:`PySys_GetXOptions`. This function
+   may be called prior to :c:func:`Py_Initialize`.
 
    .. versionadded:: 3.2
 
diff --git a/Doc/whatsnew/3.7.rst b/Doc/whatsnew/3.7.rst
index 7eb19f0f82a6..0b5ad007f30f 100644
--- a/Doc/whatsnew/3.7.rst
+++ b/Doc/whatsnew/3.7.rst
@@ -951,6 +951,14 @@ Build and C API Changes
   second argument is *NULL* and the :c:type:`wchar_t*` string contains null
   characters.  (Contributed by Serhiy Storchaka in :issue:`30708`.)
 
+- Changes to the startup sequence and the management of dynamic memory
+  allocators mean that the long documented requirement to call
+  :c:func:`Py_Initialize` before calling most C API functions is now
+  relied on more heavily, and failing to abide by it may lead to segfaults in
+  embedding applications. See the :ref:`porting-to-python-37` section in this
+  document and the :ref:`pre-init-safe` section in the C API documentation
+  for more details.
+
 
 Other CPython Implementation Changes
 ====================================
@@ -1098,6 +1106,7 @@ API and Feature Removals
   ``asyncio._overlapped``. Replace ``from asyncio import selectors`` with
   ``import selectors`` for example.
 
+.. _porting-to-python-37:
 
 Porting to Python 3.7
 =====================
@@ -1282,14 +1291,24 @@ Other CPython implementation changes
 ------------------------------------
 
 * In preparation for potential future changes to the public CPython runtime
-  initialization API (see :pep:`432` for details), CPython's internal startup
+  initialization API (see :pep:`432` for an initial, but somewhat outdated,
+  draft), CPython's internal startup
   and configuration management logic has been significantly refactored. While
   these updates are intended to be entirely transparent to both embedding
   applications and users of the regular CPython CLI, they're being mentioned
   here as the refactoring changes the internal order of various operations
   during interpreter startup, and hence may uncover previously latent defects,
   either in embedding applications, or in CPython itself.
-  (Contributed by Nick Coghlan and Eric Snow as part of :issue:`22257`.)
+  (Initially contributed by Nick Coghlan and Eric Snow as part of
+  :issue:`22257`, and further updated by Nick, Eric, and Victor Stinner in a
+  number of other issues). Some known details affected:
+
+    * :c:func:`PySys_AddWarnOptionUnicode` is not currently usable by embedding
+      applications due to the requirement to create a Unicode object prior to
+      calling `Py_Initialize`. Use :c:func:`PySys_AddWarnOption` instead.
+    * warnings filters added by an embedding application with
+      :c:func:`PySys_AddWarnOption` should now more consistently take precedence
+      over the default filters set by the interpreter
 
 * Due to changes in the way the default warnings filters are configured,
   setting :c:data:`Py_BytesWarningFlag` to a value greater than one is no longer
diff --git a/Lib/test/test_embed.py b/Lib/test/test_embed.py
index c7f45b59acc8..f926301b8466 100644
--- a/Lib/test/test_embed.py
+++ b/Lib/test/test_embed.py
@@ -51,7 +51,7 @@ def run_embedded_interpreter(self, *args, env=None):
         if p.returncode != 0 and support.verbose:
             print(f"--- {cmd} failed ---")
             print(f"stdout:\n{out}")
-            print(f"stderr:\n{out}")
+            print(f"stderr:\n{err}")
             print(f"------")
 
         self.assertEqual(p.returncode, 0,
@@ -83,7 +83,7 @@ def run_repeated_init_and_subinterpreters(self):
         for line in out.splitlines():
             if line == "--- Pass {} ---".format(numloops):
                 self.assertEqual(len(current_run), 0)
-                if support.verbose:
+                if support.verbose > 1:
                     print(line)
                 numloops += 1
                 continue
@@ -96,7 +96,7 @@ def run_repeated_init_and_subinterpreters(self):
             # Parse the line from the loop.  The first line is the main
             # interpreter and the 3 afterward are subinterpreters.
             interp = Interp(*match.groups())
-            if support.verbose:
+            if support.verbose > 1:
                 print(interp)
             self.assertTrue(interp.interp)
             self.assertTrue(interp.tstate)
@@ -190,12 +190,33 @@ def test_forced_io_encoding(self):
 
     def test_pre_initialization_api(self):
         """
-        Checks the few parts of the C-API that work before the runtine
+        Checks some key parts of the C-API that need to work before the runtine
         is initialized (via Py_Initialize()).
         """
         env = dict(os.environ, PYTHONPATH=os.pathsep.join(sys.path))
         out, err = self.run_embedded_interpreter("pre_initialization_api", env=env)
-        self.assertEqual(out, '')
+        if sys.platform == "win32":
+            expected_path = self.test_exe
+        else:
+            expected_path = os.path.join(os.getcwd(), "spam")
+        expected_output = f"sys.executable: {expected_path}\n"
+        self.assertIn(expected_output, out)
+        self.assertEqual(err, '')
+
+    def test_pre_initialization_sys_options(self):
+        """
+        Checks that sys.warnoptions and sys._xoptions can be set before the
+        runtime is initialized (otherwise they won't be effective).
+        """
+        env = dict(PYTHONPATH=os.pathsep.join(sys.path))
+        out, err = self.run_embedded_interpreter(
+                        "pre_initialization_sys_options", env=env)
+        expected_output = (
+            "sys.warnoptions: ['once', 'module', 'default']\n"
+            "sys._xoptions: {'not_an_option': '1', 'also_not_an_option': '2'}\n"
+            "warnings.filters[:3]: ['default', 'module', 'once']\n"
+        )
+        self.assertIn(expected_output, out)
         self.assertEqual(err, '')
 
     def test_bpo20891(self):
diff --git a/Misc/NEWS.d/next/C API/2018-03-20-21-43-09.bpo-33042.FPFp64.rst b/Misc/NEWS.d/next/C API/2018-03-20-21-43-09.bpo-33042.FPFp64.rst
new file mode 100644
index 000000000000..f840b55869cc
--- /dev/null
+++ b/Misc/NEWS.d/next/C API/2018-03-20-21-43-09.bpo-33042.FPFp64.rst	
@@ -0,0 +1,2 @@
+Embedding applications may once again call PySys_ResetWarnOptions,
+PySys_AddWarnOption, and PySys_AddXOption prior to calling Py_Initialize.
\ No newline at end of file
diff --git a/Programs/_testembed.c b/Programs/_testembed.c
index b3d7aa442d1a..09b7bf6d94fc 100644
--- a/Programs/_testembed.c
+++ b/Programs/_testembed.c
@@ -2,6 +2,7 @@
 #include "pythread.h"
 #include <inttypes.h>
 #include <stdio.h>
+#include <wchar.h>
 
 /*********************************************************
  * Embedded interpreter tests that need a custom exe
@@ -130,23 +131,89 @@ static int test_forced_io_encoding(void)
  * Test parts of the C-API that work before initialization
  *********************************************************/
 
+/* The pre-initialization tests tend to break by segfaulting, so explicitly
+ * flushed progress messages make the broken API easier to find when they fail.
+ */
+#define _Py_EMBED_PREINIT_CHECK(msg) \
+    do {printf(msg); fflush(stdout);} while (0);
+
 static int test_pre_initialization_api(void)
 {
     /* Leading "./" ensures getpath.c can still find the standard library */
+    _Py_EMBED_PREINIT_CHECK("Checking Py_DecodeLocale\n");
     wchar_t *program = Py_DecodeLocale("./spam", NULL);
     if (program == NULL) {
         fprintf(stderr, "Fatal error: cannot decode program name\n");
         return 1;
     }
+    _Py_EMBED_PREINIT_CHECK("Checking Py_SetProgramName\n");
     Py_SetProgramName(program);
 
+    _Py_EMBED_PREINIT_CHECK("Initializing interpreter\n");
     Py_Initialize();
+    _Py_EMBED_PREINIT_CHECK("Check sys module contents\n");
+    PyRun_SimpleString("import sys; "
+                       "print('sys.executable:', sys.executable)");
+    _Py_EMBED_PREINIT_CHECK("Finalizing interpreter\n");
     Py_Finalize();
 
+    _Py_EMBED_PREINIT_CHECK("Freeing memory allocated by Py_DecodeLocale\n");
     PyMem_RawFree(program);
     return 0;
 }
 
+
+/* bpo-33042: Ensure embedding apps can predefine sys module options */
+static int test_pre_initialization_sys_options(void)
+{
+    /* We allocate a couple of the option dynamically, and then delete
+     * them before calling Py_Initialize. This ensures the interpreter isn't
+     * relying on the caller to keep the passed in strings alive.
+     */
+    wchar_t *static_warnoption = L"once";
+    wchar_t *static_xoption = L"also_not_an_option=2";
+    size_t warnoption_len = wcslen(static_warnoption);
+    size_t xoption_len = wcslen(static_xoption);
+    wchar_t *dynamic_once_warnoption = calloc(warnoption_len+1, sizeof(wchar_t));
+    wchar_t *dynamic_xoption = calloc(xoption_len+1, sizeof(wchar_t));
+    wcsncpy(dynamic_once_warnoption, static_warnoption, warnoption_len+1);
+    wcsncpy(dynamic_xoption, static_xoption, xoption_len+1);
+
+    _Py_EMBED_PREINIT_CHECK("Checking PySys_AddWarnOption\n");
+    PySys_AddWarnOption(L"default");
+    _Py_EMBED_PREINIT_CHECK("Checking PySys_ResetWarnOptions\n");
+    PySys_ResetWarnOptions();
+    _Py_EMBED_PREINIT_CHECK("Checking PySys_AddWarnOption linked list\n");
+    PySys_AddWarnOption(dynamic_once_warnoption);
+    PySys_AddWarnOption(L"module");
+    PySys_AddWarnOption(L"default");
+    _Py_EMBED_PREINIT_CHECK("Checking PySys_AddXOption\n");
+    PySys_AddXOption(L"not_an_option=1");
+    PySys_AddXOption(dynamic_xoption);
+
+    /* Delete the dynamic options early */
+    free(dynamic_once_warnoption);
+    dynamic_once_warnoption = NULL;
+    free(dynamic_xoption);
+    dynamic_xoption = NULL;
+
+    _Py_EMBED_PREINIT_CHECK("Initializing interpreter\n");
+    _testembed_Py_Initialize();
+    _Py_EMBED_PREINIT_CHECK("Check sys module contents\n");
+    PyRun_SimpleString("import sys; "
+                       "print('sys.warnoptions:', sys.warnoptions); "
+                       "print('sys._xoptions:', sys._xoptions); "
+                       "warnings = sys.modules['warnings']; "
+                       "latest_filters = [f[0] for f in warnings.filters[:3]]; "
+                       "print('warnings.filters[:3]:', latest_filters)");
+    _Py_EMBED_PREINIT_CHECK("Finalizing interpreter\n");
+    Py_Finalize();
+
+    return 0;
+}
+
+
+/* bpo-20891: Avoid race condition when initialising the GIL */
 static void bpo20891_thread(void *lockp)
 {
     PyThread_type_lock lock = *((PyThread_type_lock*)lockp);
@@ -217,6 +284,7 @@ static struct TestCase TestCases[] = {
     { "forced_io_encoding", test_forced_io_encoding },
     { "repeated_init_and_subinterpreters", test_repeated_init_and_subinterpreters },
     { "pre_initialization_api", test_pre_initialization_api },
+    { "pre_initialization_sys_options", test_pre_initialization_sys_options },
     { "bpo20891", test_bpo20891 },
     { NULL, NULL }
 };
@@ -232,13 +300,13 @@ int main(int argc, char *argv[])
 
     /* No match found, or no test name provided, so display usage */
     printf("Python " PY_VERSION " _testembed executable for embedded interpreter tests\n"
-           "Normally executed via 'EmbeddingTests' in Lib/test/test_capi.py\n\n"
+           "Normally executed via 'EmbeddingTests' in Lib/test/test_embed.py\n\n"
            "Usage: %s TESTNAME\n\nAll available tests:\n", argv[0]);
     for (struct TestCase *tc = TestCases; tc && tc->name; tc++) {
         printf("  %s\n", tc->name);
     }
 
-    /* Non-zero exit code will cause test_capi.py tests to fail.
+    /* Non-zero exit code will cause test_embed.py tests to fail.
        This is intentional. */
     return -1;
 }
diff --git a/Python/sysmodule.c b/Python/sysmodule.c
index fb1dcfa3afe8..7cecff67486a 100644
--- a/Python/sysmodule.c
+++ b/Python/sysmodule.c
@@ -1609,11 +1609,141 @@ list_builtin_module_names(void)
     return list;
 }
 
+/* Pre-initialization support for sys.warnoptions and sys._xoptions
+ *
+ * Modern internal code paths:
+ *   These APIs get called after _Py_InitializeCore and get to use the
+ *   regular CPython list, dict, and unicode APIs.
+ *
+ * Legacy embedding code paths:
+ *   The multi-phase initialization API isn't public yet, so embedding
+ *   apps still need to be able configure sys.warnoptions and sys._xoptions
+ *   before they call Py_Initialize. To support this, we stash copies of
+ *   the supplied wchar * sequences in linked lists, and then migrate the
+ *   contents of those lists to the sys module in _PyInitializeCore.
+ *
+ */
+
+struct _preinit_entry {
+    wchar_t *value;
+    struct _preinit_entry *next;
+};
+
+typedef struct _preinit_entry *_Py_PreInitEntry;
+
+static _Py_PreInitEntry _preinit_warnoptions = NULL;
+static _Py_PreInitEntry _preinit_xoptions = NULL;
+
+static _Py_PreInitEntry
+_alloc_preinit_entry(const wchar_t *value)
+{
+    /* To get this to work, we have to initialize the runtime implicitly */
+    _PyRuntime_Initialize();
+
+    /* Force default allocator, so we can ensure that it also gets used to
+     * destroy the linked list in _clear_preinit_entries.
+     */
+    PyMemAllocatorEx old_alloc;
+    _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
+
+    _Py_PreInitEntry node = PyMem_RawCalloc(1, sizeof(*node));
+    if (node != NULL) {
+        node->value = _PyMem_RawWcsdup(value);
+        if (node->value == NULL) {
+            PyMem_RawFree(node);
+            node = NULL;
+        };
+    };
+
+    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
+    return node;
+};
+
+static int
+_append_preinit_entry(_Py_PreInitEntry *optionlist, const wchar_t *value)
+{
+    _Py_PreInitEntry new_entry = _alloc_preinit_entry(value);
+    if (new_entry == NULL) {
+        return -1;
+    }
+    /* We maintain the linked list in this order so it's easy to play back
+     * the add commands in the same order later on in _Py_InitializeCore
+     */
+    _Py_PreInitEntry last_entry = *optionlist;
+    if (last_entry == NULL) {
+        *optionlist = new_entry;
+    } else {
+        while (last_entry->next != NULL) {
+            last_entry = last_entry->next;
+        }
+        last_entry->next = new_entry;
+    }
+    return 0;
+};
+
+static void
+_clear_preinit_entries(_Py_PreInitEntry *optionlist)
+{
+    _Py_PreInitEntry current = *optionlist;
+    *optionlist = NULL;
+    /* Deallocate the nodes and their contents using the default allocator */
+    PyMemAllocatorEx old_alloc;
+    _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
+    while (current != NULL) {
+        _Py_PreInitEntry next = current->next;
+        PyMem_RawFree(current->value);
+        PyMem_RawFree(current);
+        current = next;
+    }
+    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
+};
+
+static void
+_clear_all_preinit_options(void)
+{
+    _clear_preinit_entries(&_preinit_warnoptions);
+    _clear_preinit_entries(&_preinit_xoptions);
+}
+
+static int
+_PySys_ReadPreInitOptions(void)
+{
+    /* Rerun the add commands with the actual sys module available */
+    PyThreadState *tstate = PyThreadState_GET();
+    if (tstate == NULL) {
+        /* Still don't have a thread state, so something is wrong! */
+        return -1;
+    }
+    _Py_PreInitEntry entry = _preinit_warnoptions;
+    while (entry != NULL) {
+        PySys_AddWarnOption(entry->value);
+        entry = entry->next;
+    }
+    entry = _preinit_xoptions;
+    while (entry != NULL) {
+        PySys_AddXOption(entry->value);
+        entry = entry->next;
+    }
+
+    _clear_all_preinit_options();
+    return 0;
+};
+
 static PyObject *
 get_warnoptions(void)
 {
     PyObject *warnoptions = _PySys_GetObjectId(&PyId_warnoptions);
     if (warnoptions == NULL || !PyList_Check(warnoptions)) {
+        /* PEP432 TODO: we can reach this if warnoptions is NULL in the main
+        *  interpreter config. When that happens, we need to properly set
+         * the `warnoptions` reference in the main interpreter config as well.
+         *
+         * For Python 3.7, we shouldn't be able to get here due to the
+         * combination of how _PyMainInterpreter_ReadConfig and _PySys_EndInit
+         * work, but we expect 3.8+ to make the _PyMainInterpreter_ReadConfig
+         * call optional for embedding applications, thus making this
+         * reachable again.
+         */
         Py_XDECREF(warnoptions);
         warnoptions = PyList_New(0);
         if (warnoptions == NULL)
@@ -1630,6 +1760,12 @@ get_warnoptions(void)
 void
 PySys_ResetWarnOptions(void)
 {
+    PyThreadState *tstate = PyThreadState_GET();
+    if (tstate == NULL) {
+        _clear_preinit_entries(&_preinit_warnoptions);
+        return;
+    }
+
     PyObject *warnoptions = _PySys_GetObjectId(&PyId_warnoptions);
     if (warnoptions == NULL || !PyList_Check(warnoptions))
         return;
@@ -1658,6 +1794,11 @@ PySys_AddWarnOptionUnicode(PyObject *option)
 void
 PySys_AddWarnOption(const wchar_t *s)
 {
+    PyThreadState *tstate = PyThreadState_GET();
+    if (tstate == NULL) {
+        _append_preinit_entry(&_preinit_warnoptions, s);
+        return;
+    }
     PyObject *unicode;
     unicode = PyUnicode_FromWideChar(s, -1);
     if (unicode == NULL)
@@ -1678,6 +1819,16 @@ get_xoptions(void)
 {
     PyObject *xoptions = _PySys_GetObjectId(&PyId__xoptions);
     if (xoptions == NULL || !PyDict_Check(xoptions)) {
+        /* PEP432 TODO: we can reach this if xoptions is NULL in the main
+        *  interpreter config. When that happens, we need to properly set
+         * the `xoptions` reference in the main interpreter config as well.
+         *
+         * For Python 3.7, we shouldn't be able to get here due to the
+         * combination of how _PyMainInterpreter_ReadConfig and _PySys_EndInit
+         * work, but we expect 3.8+ to make the _PyMainInterpreter_ReadConfig
+         * call optional for embedding applications, thus making this
+         * reachable again.
+         */
         Py_XDECREF(xoptions);
         xoptions = PyDict_New();
         if (xoptions == NULL)
@@ -1730,6 +1881,11 @@ _PySys_AddXOptionWithError(const wchar_t *s)
 void
 PySys_AddXOption(const wchar_t *s)
 {
+    PyThreadState *tstate = PyThreadState_GET();
+    if (tstate == NULL) {
+        _append_preinit_entry(&_preinit_xoptions, s);
+        return;
+    }
     if (_PySys_AddXOptionWithError(s) < 0) {
         /* No return value, therefore clear error state if possible */
         if (_PyThreadState_UncheckedGet()) {
@@ -2257,6 +2413,7 @@ _PySys_BeginInit(PyObject **sysmod)
     }
 
     *sysmod = m;
+
     return _Py_INIT_OK();
 
 type_init_failed:
@@ -2333,6 +2490,11 @@ _PySys_EndInit(PyObject *sysdict, _PyMainInterpreterConfig *config)
     if (get_xoptions() == NULL)
         return -1;
 
+    /* Transfer any sys.warnoptions and sys._xoptions set directly
+     * by an embedding application from the linked list to the module. */
+    if (_PySys_ReadPreInitOptions() != 0)
+        return -1;
+
     if (PyErr_Occurred())
         return -1;
     return 0;



More information about the Python-checkins mailing list