[Python-Dev] The recursion checking problem

Sat Aug 30 22:29:12 CEST 2008

On Sat, Aug 30, 2008 at 1:06 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hi,
>
> I was working on a recursion overflow checking bug
> (http://bugs.python.org/issue2548) and, while I've managed to produce a working
> patch, I've also become uncomfortable with the very idea of trying to plug all
> those holes just for the sake of plugging them. I'll try to explain why, by
> describing the conflicting factors I've identified:
>
> - more and more, we are adding calls to Py_EnterRecursiveCall() and
> Py_LeaveRecursiveCall() all over the interpreter, to avoid special/obscure
> cases of undetected infinite recursion; this can probably be considered a good
> thing, but:
>
> - after a recursion error has been raised (technically a RuntimeError), usually
> some code has to do cleanup after noticing the exception; this cleanup now can
> very easily bump into the recursion limit again, due to the point mentioned
> above (the funniest example of this is PyErr_ExceptionMatches, which makes a
> call to PyObject_IsSubclass which itself increases the recursion count because
> __subclasscheck__ can be recursively invoked...).
>
> - to counter the latter problem, py3k has introduced a somewhat smarter
> mechanism (which I've tracked down to a commit in the defunct p3yk branch by
> Martin): when the recursion limit is exceeded, a special flag named
> "overflowed" is set in the thread state structure which disables the primary
> recursion check, so that cleanup code has a bit of room to increase the
> recursion count. A secondary recursion check exists (equal to the primary
> one /plus/ 50) and, if it is reached, the interpreter aborts with a fatal error.
> The "overflowed" flag is cleared when the recursion count drops below the
> primary recursion limit /minus/ 50. Now it looks rather smart but:
>
> - unfortunately, some functions inside the interpreter discard every exception
> by design. The primary example is PyDict_GetItem(), which is certainly used
> quite a lot :-)... When PyDict_GetItem() returns NULL, the caller can only
> assume that the key isn't in the dict; it has no way to know that there was a
> critical problem due to a recursion overflow.
>

As the comment on PyDict_GetItem() says, it's a relic from the days
when the call had no way of running Python code. That is no longer
true (and the same probably goes for a lot of other places where a
similar assumption is made).
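
To make the problem concrete, here is a rough Python model of the two
lookup styles. It is only a sketch: ExplodingKey, get_item_swallowing()
and get_item_reporting() are hypothetical names made up for this
illustration, not part of CPython, and the RuntimeError raised in
__eq__ merely stands in for a recursion overflow during comparison.

```python
class ExplodingKey:
    """Hypothetical key whose comparison fails, standing in for a
    recursion overflow raised while comparing keys."""

    def __hash__(self):
        return 1  # same hash for every instance, so lookups must compare

    def __eq__(self, other):
        raise RuntimeError("maximum recursion depth exceeded")


def get_item_swallowing(mapping, key):
    """Model of the PyDict_GetItem() contract: every error raised during
    the lookup is discarded and the caller just sees "not found"."""
    try:
        return mapping[key]
    except Exception:
        return None


def get_item_reporting(mapping, key):
    """Hypothetical "better" variant: returns (found, value) and lets
    anything other than KeyError propagate to the caller."""
    try:
        return True, mapping[key]
    except KeyError:
        return False, None


if __name__ == "__main__":
    d = {ExplodingKey(): "value"}
    # Both calls return None, so the caller cannot tell them apart.
    print(get_item_swallowing(d, ExplodingKey()))
    print(get_item_swallowing({}, "absent"))
```

With the swallowing variant a failed comparison and a missing key look
identical; with the reporting variant the RuntimeError reaches the
caller, who can then decide what to do about it.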

> I encountered the latter problem when trying to backport the py3k recursion
> overflow algorithm to trunk. A fatal error suddenly appeared in test_cpickle,
> and it turned out that the recursion count was exceeded in
> PyObject_RichCompare(), the error was then cleared in PyDict_GetItem(), but the
> "overflowed" flag was still set so that a subsequent recursion overflow would
> trigger the secondary check and lead to the fatal error.
>
> I guess that, if it doesn't happen in py3k, it's just by chance: the recursion
> overflow is probably happening at another point where errors don't get
> discarded. Indeed, the failure I got on trunk was manifesting itself when
> running "regrtest.py test_cpickle" but not directly "test_cpickle.py"... which
> shows how delicate the recursion mechanism has become.
>
> My attempt to solve the latter problem while still backporting the py3k scheme
> involves clearing the "overflowed" flag in PyErr_Clear(). This makes all tests
> pass ok, but also means the "overflowed" flag loses a lot of its meaning...
> since PyErr_Clear() is called in a lot of places (and, especially, in
> PyDict_GetItem()).
>
>
> Also, at this point I fear that the solution to the problem is becoming,
> because of its complexity, perhaps worse than the problem itself. That's why
> I'm bringing it here, to have your opinion.
>
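
For anyone who wants to play with the scheme described above, here is a
toy Python model of it. It is a sketch under assumptions, not the
actual C code: ThreadState, FatalError, clear_error() and
run_scenario() are hypothetical names, the limit of 100 is arbitrary,
and the scenario assumes the code keeps recursing without unwinding
after the first error has been discarded.

```python
class FatalError(Exception):
    """Stands in for the interpreter aborting; a real abort is not catchable."""


class ThreadState:
    """Toy model of the recursion bookkeeping (illustrative names only)."""

    MARGIN = 50  # the "plus 50" / "minus 50" from the description

    def __init__(self, limit, clear_resets_flag=False):
        self.limit = limit
        self.depth = 0
        self.overflowed = False
        self.clear_resets_flag = clear_resets_flag

    def enter(self):
        """Model of Py_EnterRecursiveCall()."""
        self.depth += 1
        if self.overflowed:
            # Primary check is disabled; only the secondary limit applies.
            if self.depth > self.limit + self.MARGIN:
                raise FatalError("cannot recover from stack overflow")
            return
        if self.depth > self.limit:
            self.depth -= 1
            self.overflowed = True
            raise RuntimeError("maximum recursion depth exceeded")

    def leave(self):
        """Model of Py_LeaveRecursiveCall()."""
        self.depth -= 1
        if self.depth < self.limit - self.MARGIN:
            self.overflowed = False

    def clear_error(self):
        """Model of PyErr_Clear(); resets the flag only in the patched variant."""
        if self.clear_resets_flag:
            self.overflowed = False


def run_scenario(clear_resets_flag):
    """Overflow once, discard the error, then keep recursing."""
    ts = ThreadState(limit=100, clear_resets_flag=clear_resets_flag)
    for _ in range(100):
        ts.enter()                # climb to the primary limit
    try:
        ts.enter()                # first overflow sets the flag
    except RuntimeError:
        ts.clear_error()          # error discarded, as PyDict_GetItem() does
    try:
        for _ in range(60):       # recursion continues without unwinding
            ts.enter()
    except RuntimeError:
        return "RuntimeError"
    except FatalError:
        return "FatalError"
    return "ok"


if __name__ == "__main__":
    print(run_scenario(clear_resets_flag=False))
    print(run_scenario(clear_resets_flag=True))
```

In this model the unpatched variant ends in the fatal error because the
stale flag disables the primary check, while the variant that resets
the flag in clear_error() gets an ordinary RuntimeError again.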

Well, for Py3K at least we might need to consider going through the C
API and introducing some new functions that behave in a "better" way,
so that the incorrect assumptions made by functions like
PyDict_GetItem() are no longer made.

And for the recursion issue, I think it stems from corners that we
cut in the C API. We inline functions all over the place, assume
that Python's implementation underneath the hood is going to make
calls that stay in C, etc. But as time has gone on and we have added
flexibility to Python, more and more places have a chance to call
Python code and trigger issues.

> (I also suggest that we stop trying to fix recursion checking bugs until the
> stable release, so as to give us some time to do the Right Thing later - if
> there is such a thing)
>

I have no problem punting for now; there is no way I would be willing
to wager any amount of money that the recursion check covers all
cases. I have always viewed the check as a bonus sanity check, not
something to rely upon heavily.

-Brett