[Python-checkins] gh-102381: don't call watcher callback with dead object (#102382)

carljm webhook-mailer at python.org
Tue Mar 7 19:11:22 EST 2023


https://github.com/python/cpython/commit/1e703a473343ed198c9a06a876b25d7d69d4bbd0
commit: 1e703a473343ed198c9a06a876b25d7d69d4bbd0
branch: main
author: Carl Meyer <carl at oddbird.net>
committer: carljm <carl at oddbird.net>
date: 2023-03-07T17:10:58-07:00
summary:

gh-102381: don't call watcher callback with dead object (#102382)

Co-authored-by: T. Wouters <thomas at python.org>

files:
M Doc/c-api/code.rst
M Doc/c-api/dict.rst
M Doc/c-api/function.rst
M Include/cpython/code.h
M Include/cpython/dictobject.h
M Include/cpython/funcobject.h
M Include/internal/pycore_dict.h
M Lib/test/test_capi/test_watchers.py
M Modules/_testcapi/watchers.c
M Objects/codeobject.c
M Objects/dictobject.c
M Objects/funcobject.c

diff --git a/Doc/c-api/code.rst b/Doc/c-api/code.rst
index 062ef3a1fea9..a99de9904c07 100644
--- a/Doc/c-api/code.rst
+++ b/Doc/c-api/code.rst
@@ -172,6 +172,11 @@ bound into a function.
    before the destruction of *co* takes place, so the prior state of *co*
    can be inspected.
 
+   If *event* is ``PY_CODE_EVENT_DESTROY``, taking a reference in the callback
+   to the about-to-be-destroyed code object will resurrect it and prevent it
+   from being freed at this time. When the resurrected object is destroyed
+   later, any watcher callbacks active at that time will be called again.
+
    Users of this API should not rely on internal runtime implementation
    details. Such details may include, but are not limited to, the exact
    order and timing of creation and destruction of code objects. While
@@ -179,9 +184,15 @@ bound into a function.
    (including whether a callback is invoked or not), it does not change
    the semantics of the Python code being executed.
 
-   If the callback returns with an exception set, it must return ``-1``; this
-   exception will be printed as an unraisable exception using
-   :c:func:`PyErr_WriteUnraisable`. Otherwise it should return ``0``.
+   If the callback sets an exception, it must return ``-1``; this exception will
+   be printed as an unraisable exception using :c:func:`PyErr_WriteUnraisable`.
+   Otherwise it should return ``0``.
+
+   There may already be a pending exception set on entry to the callback. In
+   this case, the callback should return ``0`` with the same exception still
+   set. This means the callback may not call any other API that can set an
+   exception unless it saves and clears the exception state first, and restores
+   it before returning.
 
    .. versionadded:: 3.12
 
diff --git a/Doc/c-api/dict.rst b/Doc/c-api/dict.rst
index 34106ee6b1f3..b9f84cea7856 100644
--- a/Doc/c-api/dict.rst
+++ b/Doc/c-api/dict.rst
@@ -298,13 +298,26 @@ Dictionary Objects
    dictionary.
 
    The callback may inspect but must not modify *dict*; doing so could have
-   unpredictable effects, including infinite recursion.
+   unpredictable effects, including infinite recursion. Do not trigger Python
+   code execution in the callback, as it could modify the dict as a side effect.
+
+   If *event* is ``PyDict_EVENT_DEALLOCATED``, taking a new reference in the
+   callback to the about-to-be-destroyed dictionary will resurrect it and
+   prevent it from being freed at this time. When the resurrected object is
+   destroyed later, any watcher callbacks active at that time will be called
+   again.
 
    Callbacks occur before the notified modification to *dict* takes place, so
    the prior state of *dict* can be inspected.
 
-   If the callback returns with an exception set, it must return ``-1``; this
-   exception will be printed as an unraisable exception using
-   :c:func:`PyErr_WriteUnraisable`. Otherwise it should return ``0``.
+   If the callback sets an exception, it must return ``-1``; this exception will
+   be printed as an unraisable exception using :c:func:`PyErr_WriteUnraisable`.
+   Otherwise it should return ``0``.
+
+   There may already be a pending exception set on entry to the callback. In
+   this case, the callback should return ``0`` with the same exception still
+   set. This means the callback may not call any other API that can set an
+   exception unless it saves and clears the exception state first, and restores
+   it before returning.
 
    .. versionadded:: 3.12
diff --git a/Doc/c-api/function.rst b/Doc/c-api/function.rst
index bc7569d0add9..947ed7040408 100644
--- a/Doc/c-api/function.rst
+++ b/Doc/c-api/function.rst
@@ -173,8 +173,19 @@ There are a few functions specific to Python functions.
    runtime behavior depending on optimization decisions, it does not change
    the semantics of the Python code being executed.
 
-   If the callback returns with an exception set, it must return ``-1``; this
-   exception will be printed as an unraisable exception using
-   :c:func:`PyErr_WriteUnraisable`. Otherwise it should return ``0``.
+   If *event* is ``PyFunction_EVENT_DESTROY``,  Taking a reference in the
+   callback to the about-to-be-destroyed function will resurrect it, preventing
+   it from being freed at this time. When the resurrected object is destroyed
+   later, any watcher callbacks active at that time will be called again.
+
+   If the callback sets an exception, it must return ``-1``; this exception will
+   be printed as an unraisable exception using :c:func:`PyErr_WriteUnraisable`.
+   Otherwise it should return ``0``.
+
+   There may already be a pending exception set on entry to the callback. In
+   this case, the callback should return ``0`` with the same exception still
+   set. This means the callback may not call any other API that can set an
+   exception unless it saves and clears the exception state first, and restores
+   it before returning.
 
    .. versionadded:: 3.12
diff --git a/Include/cpython/code.h b/Include/cpython/code.h
index 0e4bd8a58c16..abcf1250603d 100644
--- a/Include/cpython/code.h
+++ b/Include/cpython/code.h
@@ -224,9 +224,14 @@ PyAPI_FUNC(int) PyCode_Addr2Line(PyCodeObject *, int);
 
 PyAPI_FUNC(int) PyCode_Addr2Location(PyCodeObject *, int, int *, int *, int *, int *);
 
-typedef enum PyCodeEvent {
-  PY_CODE_EVENT_CREATE,
-  PY_CODE_EVENT_DESTROY
+#define PY_FOREACH_CODE_EVENT(V) \
+    V(CREATE)                 \
+    V(DESTROY)
+
+typedef enum {
+    #define PY_DEF_EVENT(op) PY_CODE_EVENT_##op,
+    PY_FOREACH_CODE_EVENT(PY_DEF_EVENT)
+    #undef PY_DEF_EVENT
 } PyCodeEvent;
 
 
@@ -236,7 +241,7 @@ typedef enum PyCodeEvent {
  * The callback is invoked with a borrowed reference to co, after it is
  * created and before it is destroyed.
  *
- * If the callback returns with an exception set, it must return -1. Otherwise
+ * If the callback sets an exception, it must return -1. Otherwise
  * it should return 0.
  */
 typedef int (*PyCode_WatchCallback)(
diff --git a/Include/cpython/dictobject.h b/Include/cpython/dictobject.h
index 5001f3565447..ddada922020a 100644
--- a/Include/cpython/dictobject.h
+++ b/Include/cpython/dictobject.h
@@ -16,11 +16,11 @@ typedef struct {
 
     /* Dictionary version: globally unique, value change each time
        the dictionary is modified */
-#ifdef Py_BUILD_CORE       
+#ifdef Py_BUILD_CORE
     uint64_t ma_version_tag;
 #else
     Py_DEPRECATED(3.12) uint64_t ma_version_tag;
-#endif        
+#endif
 
     PyDictKeysObject *ma_keys;
 
@@ -90,13 +90,18 @@ PyAPI_FUNC(PyObject *) _PyDictView_Intersect(PyObject* self, PyObject *other);
 
 /* Dictionary watchers */
 
+#define PY_FOREACH_DICT_EVENT(V) \
+    V(ADDED)                     \
+    V(MODIFIED)                  \
+    V(DELETED)                   \
+    V(CLONED)                    \
+    V(CLEARED)                   \
+    V(DEALLOCATED)
+
 typedef enum {
-    PyDict_EVENT_ADDED,
-    PyDict_EVENT_MODIFIED,
-    PyDict_EVENT_DELETED,
-    PyDict_EVENT_CLONED,
-    PyDict_EVENT_CLEARED,
-    PyDict_EVENT_DEALLOCATED,
+    #define PY_DEF_EVENT(EVENT) PyDict_EVENT_##EVENT,
+    PY_FOREACH_DICT_EVENT(PY_DEF_EVENT)
+    #undef PY_DEF_EVENT
 } PyDict_WatchEvent;
 
 // Callback to be invoked when a watched dict is cleared, dealloced, or modified.
diff --git a/Include/cpython/funcobject.h b/Include/cpython/funcobject.h
index 5979febc2e34..c716330cc3fb 100644
--- a/Include/cpython/funcobject.h
+++ b/Include/cpython/funcobject.h
@@ -131,17 +131,17 @@ PyAPI_DATA(PyTypeObject) PyStaticMethod_Type;
 PyAPI_FUNC(PyObject *) PyClassMethod_New(PyObject *);
 PyAPI_FUNC(PyObject *) PyStaticMethod_New(PyObject *);
 
-#define FOREACH_FUNC_EVENT(V) \
-    V(CREATE)                 \
-    V(DESTROY)                \
-    V(MODIFY_CODE)            \
-    V(MODIFY_DEFAULTS)        \
+#define PY_FOREACH_FUNC_EVENT(V) \
+    V(CREATE)                    \
+    V(DESTROY)                   \
+    V(MODIFY_CODE)               \
+    V(MODIFY_DEFAULTS)           \
     V(MODIFY_KWDEFAULTS)
 
 typedef enum {
-    #define DEF_EVENT(EVENT) PyFunction_EVENT_##EVENT,
-    FOREACH_FUNC_EVENT(DEF_EVENT)
-    #undef DEF_EVENT
+    #define PY_DEF_EVENT(EVENT) PyFunction_EVENT_##EVENT,
+    PY_FOREACH_FUNC_EVENT(PY_DEF_EVENT)
+    #undef PY_DEF_EVENT
 } PyFunction_WatchEvent;
 
 /*
diff --git a/Include/internal/pycore_dict.h b/Include/internal/pycore_dict.h
index c74a34377130..1af5e59a677a 100644
--- a/Include/internal/pycore_dict.h
+++ b/Include/internal/pycore_dict.h
@@ -164,6 +164,7 @@ _PyDict_NotifyEvent(PyDict_WatchEvent event,
                     PyObject *key,
                     PyObject *value)
 {
+    assert(Py_REFCNT((PyObject*)mp) > 0);
     int watcher_bits = mp->ma_version_tag & DICT_VERSION_MASK;
     if (watcher_bits) {
         _PyDict_SendEvent(watcher_bits, event, mp, key, value);
diff --git a/Lib/test/test_capi/test_watchers.py b/Lib/test/test_capi/test_watchers.py
index 1922614ef605..93f6ef752d06 100644
--- a/Lib/test/test_capi/test_watchers.py
+++ b/Lib/test/test_capi/test_watchers.py
@@ -109,10 +109,21 @@ def test_error(self):
             self.watch(wid, d)
             with catch_unraisable_exception() as cm:
                 d["foo"] = "bar"
-                self.assertIs(cm.unraisable.object, d)
+                self.assertIn(
+                    "PyDict_EVENT_ADDED watcher callback for <dict at",
+                    cm.unraisable.object
+                )
                 self.assertEqual(str(cm.unraisable.exc_value), "boom!")
             self.assert_events([])
 
+    def test_dealloc_error(self):
+        d = {}
+        with self.watcher(kind=self.ERROR) as wid:
+            self.watch(wid, d)
+            with catch_unraisable_exception() as cm:
+                del d
+                self.assertEqual(str(cm.unraisable.exc_value), "boom!")
+
     def test_two_watchers(self):
         d1 = {}
         d2 = {}
@@ -389,6 +400,25 @@ def test_code_object_events_dispatched(self):
         del co4
         self.assert_event_counts(0, 0, 0, 0)
 
+    def test_error(self):
+        with self.code_watcher(2):
+            with catch_unraisable_exception() as cm:
+                co = _testcapi.code_newempty("test_watchers", "dummy0", 0)
+
+                self.assertEqual(
+                    cm.unraisable.object,
+                    f"PY_CODE_EVENT_CREATE watcher callback for {co!r}"
+                )
+                self.assertEqual(str(cm.unraisable.exc_value), "boom!")
+
+    def test_dealloc_error(self):
+        co = _testcapi.code_newempty("test_watchers", "dummy0", 0)
+        with self.code_watcher(2):
+            with catch_unraisable_exception() as cm:
+                del co
+
+                self.assertEqual(str(cm.unraisable.exc_value), "boom!")
+
     def test_clear_out_of_range_watcher_id(self):
         with self.assertRaisesRegex(ValueError, r"Invalid code watcher ID -1"):
             _testcapi.clear_code_watcher(-1)
@@ -479,7 +509,25 @@ def watcher(*args):
                 def myfunc():
                     pass
 
-                self.assertIs(cm.unraisable.object, myfunc)
+                self.assertEqual(
+                    cm.unraisable.object,
+                    f"PyFunction_EVENT_CREATE watcher callback for {myfunc!r}"
+                )
+
+    def test_dealloc_watcher_raises_error(self):
+        class MyError(Exception):
+            pass
+
+        def watcher(*args):
+            raise MyError("testing 123")
+
+        def myfunc():
+            pass
+
+        with self.add_watcher(watcher):
+            with catch_unraisable_exception() as cm:
+                del myfunc
+
                 self.assertIsInstance(cm.unraisable.exc_value, MyError)
 
     def test_clear_out_of_range_watcher_id(self):
diff --git a/Modules/_testcapi/watchers.c b/Modules/_testcapi/watchers.c
index d9ace632768a..1284fdc2767b 100644
--- a/Modules/_testcapi/watchers.c
+++ b/Modules/_testcapi/watchers.c
@@ -317,6 +317,13 @@ noop_code_event_handler(PyCodeEvent event, PyCodeObject *co)
     return 0;
 }
 
+static int
+error_code_event_handler(PyCodeEvent event, PyCodeObject *co)
+{
+    PyErr_SetString(PyExc_RuntimeError, "boom!");
+    return -1;
+}
+
 static PyObject *
 add_code_watcher(PyObject *self, PyObject *which_watcher)
 {
@@ -333,7 +340,11 @@ add_code_watcher(PyObject *self, PyObject *which_watcher)
         num_code_object_created_events[1] = 0;
         num_code_object_destroyed_events[1] = 0;
     }
+    else if (which_l == 2) {
+        watcher_id = PyCode_AddWatcher(error_code_event_handler);
+    }
     else {
+        PyErr_Format(PyExc_ValueError, "invalid watcher %d", which_l);
         return NULL;
     }
     if (watcher_id < 0) {
@@ -672,7 +683,7 @@ _PyTestCapi_Init_Watchers(PyObject *mod)
                        PyFunction_EVENT_##event)) {   \
         return -1;                                    \
     }
-    FOREACH_FUNC_EVENT(ADD_EVENT);
+    PY_FOREACH_FUNC_EVENT(ADD_EVENT);
 #undef ADD_EVENT
 
     return 0;
diff --git a/Objects/codeobject.c b/Objects/codeobject.c
index 175bd57568f8..65b1d258fb76 100644
--- a/Objects/codeobject.c
+++ b/Objects/codeobject.c
@@ -11,9 +11,24 @@
 #include "pycore_tuple.h"         // _PyTuple_ITEMS()
 #include "clinic/codeobject.c.h"
 
+static PyObject* code_repr(PyCodeObject *co);
+
+static const char *
+code_event_name(PyCodeEvent event) {
+    switch (event) {
+        #define CASE(op)                \
+        case PY_CODE_EVENT_##op:         \
+            return "PY_CODE_EVENT_" #op;
+        PY_FOREACH_CODE_EVENT(CASE)
+        #undef CASE
+    }
+    Py_UNREACHABLE();
+}
+
 static void
 notify_code_watchers(PyCodeEvent event, PyCodeObject *co)
 {
+    assert(Py_REFCNT(co) > 0);
     PyInterpreterState *interp = _PyInterpreterState_GET();
     assert(interp->_initialized);
     uint8_t bits = interp->active_code_watchers;
@@ -25,7 +40,21 @@ notify_code_watchers(PyCodeEvent event, PyCodeObject *co)
             // callback must be non-null if the watcher bit is set
             assert(cb != NULL);
             if (cb(event, co) < 0) {
-                PyErr_WriteUnraisable((PyObject *) co);
+                // Don't risk resurrecting the object if an unraisablehook keeps
+                // a reference; pass a string as context.
+                PyObject *context = NULL;
+                PyObject *repr = code_repr(co);
+                if (repr) {
+                    context = PyUnicode_FromFormat(
+                        "%s watcher callback for %U",
+                        code_event_name(event), repr);
+                    Py_DECREF(repr);
+                }
+                if (context == NULL) {
+                    context = Py_NewRef(Py_None);
+                }
+                PyErr_WriteUnraisable(context);
+                Py_DECREF(context);
             }
         }
         i++;
@@ -1667,7 +1696,14 @@ code_new_impl(PyTypeObject *type, int argcount, int posonlyargcount,
 static void
 code_dealloc(PyCodeObject *co)
 {
+    assert(Py_REFCNT(co) == 0);
+    Py_SET_REFCNT(co, 1);
     notify_code_watchers(PY_CODE_EVENT_DESTROY, co);
+    if (Py_REFCNT(co) > 1) {
+        Py_SET_REFCNT(co, Py_REFCNT(co) - 1);
+        return;
+    }
+    Py_SET_REFCNT(co, 0);
 
     if (co->co_extra != NULL) {
         PyInterpreterState *interp = _PyInterpreterState_GET();
diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index fc658ca2f4b7..e3795e75e3da 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -2308,7 +2308,14 @@ _PyDict_FromKeys(PyObject *cls, PyObject *iterable, PyObject *value)
 static void
 dict_dealloc(PyDictObject *mp)
 {
+    assert(Py_REFCNT(mp) == 0);
+    Py_SET_REFCNT(mp, 1);
     _PyDict_NotifyEvent(PyDict_EVENT_DEALLOCATED, mp, NULL, NULL);
+    if (Py_REFCNT(mp) > 1) {
+        Py_SET_REFCNT(mp, Py_REFCNT(mp) - 1);
+        return;
+    }
+    Py_SET_REFCNT(mp, 0);
     PyDictValues *values = mp->ma_values;
     PyDictKeysObject *keys = mp->ma_keys;
     Py_ssize_t i, n;
@@ -5732,6 +5739,18 @@ PyDict_ClearWatcher(int watcher_id)
     return 0;
 }
 
+static const char *
+dict_event_name(PyDict_WatchEvent event) {
+    switch (event) {
+        #define CASE(op)                \
+        case PyDict_EVENT_##op:         \
+            return "PyDict_EVENT_" #op;
+        PY_FOREACH_DICT_EVENT(CASE)
+        #undef CASE
+    }
+    Py_UNREACHABLE();
+}
+
 void
 _PyDict_SendEvent(int watcher_bits,
                   PyDict_WatchEvent event,
@@ -5744,9 +5763,18 @@ _PyDict_SendEvent(int watcher_bits,
         if (watcher_bits & 1) {
             PyDict_WatchCallback cb = interp->dict_state.watchers[i];
             if (cb && (cb(event, (PyObject*)mp, key, value) < 0)) {
-                // some dict modification paths (e.g. PyDict_Clear) can't raise, so we
-                // can't propagate exceptions from dict watchers.
-                PyErr_WriteUnraisable((PyObject *)mp);
+                // We don't want to resurrect the dict by potentially having an
+                // unraisablehook keep a reference to it, so we don't pass the
+                // dict as context, just an informative string message.  Dict
+                // repr can call arbitrary code, so we invent a simpler version.
+                PyObject *context = PyUnicode_FromFormat(
+                    "%s watcher callback for <dict at %p>",
+                    dict_event_name(event), mp);
+                if (context == NULL) {
+                    context = Py_NewRef(Py_None);
+                }
+                PyErr_WriteUnraisable(context);
+                Py_DECREF(context);
             }
         }
         watcher_bits >>= 1;
diff --git a/Objects/funcobject.c b/Objects/funcobject.c
index 91a6b3dd40a2..99048ea41c6c 100644
--- a/Objects/funcobject.c
+++ b/Objects/funcobject.c
@@ -8,6 +8,20 @@
 #include "pycore_pyerrors.h"      // _PyErr_Occurred()
 #include "structmember.h"         // PyMemberDef
 
+static PyObject* func_repr(PyFunctionObject *op);
+
+static const char *
+func_event_name(PyFunction_WatchEvent event) {
+    switch (event) {
+        #define CASE(op)                \
+        case PyFunction_EVENT_##op:         \
+            return "PyFunction_EVENT_" #op;
+        PY_FOREACH_FUNC_EVENT(CASE)
+        #undef CASE
+    }
+    Py_UNREACHABLE();
+}
+
 static void
 notify_func_watchers(PyInterpreterState *interp, PyFunction_WatchEvent event,
                      PyFunctionObject *func, PyObject *new_value)
@@ -21,7 +35,21 @@ notify_func_watchers(PyInterpreterState *interp, PyFunction_WatchEvent event,
             // callback must be non-null if the watcher bit is set
             assert(cb != NULL);
             if (cb(event, func, new_value) < 0) {
-                PyErr_WriteUnraisable((PyObject *) func);
+                // Don't risk resurrecting the func if an unraisablehook keeps a
+                // reference; pass a string as context.
+                PyObject *context = NULL;
+                PyObject *repr = func_repr(func);
+                if (repr != NULL) {
+                    context = PyUnicode_FromFormat(
+                        "%s watcher callback for %U",
+                        func_event_name(event), repr);
+                    Py_DECREF(repr);
+                }
+                if (context == NULL) {
+                    context = Py_NewRef(Py_None);
+                }
+                PyErr_WriteUnraisable(context);
+                Py_DECREF(context);
             }
         }
         i++;
@@ -33,6 +61,7 @@ static inline void
 handle_func_event(PyFunction_WatchEvent event, PyFunctionObject *func,
                   PyObject *new_value)
 {
+    assert(Py_REFCNT(func) > 0);
     PyInterpreterState *interp = _PyInterpreterState_GET();
     assert(interp->_initialized);
     if (interp->active_func_watchers) {
@@ -766,7 +795,14 @@ func_clear(PyFunctionObject *op)
 static void
 func_dealloc(PyFunctionObject *op)
 {
+    assert(Py_REFCNT(op) == 0);
+    Py_SET_REFCNT(op, 1);
     handle_func_event(PyFunction_EVENT_DESTROY, op, NULL);
+    if (Py_REFCNT(op) > 1) {
+        Py_SET_REFCNT(op, Py_REFCNT(op) - 1);
+        return;
+    }
+    Py_SET_REFCNT(op, 0);
     _PyObject_GC_UNTRACK(op);
     if (op->func_weakreflist != NULL) {
         PyObject_ClearWeakRefs((PyObject *) op);



More information about the Python-checkins mailing list