[Python-Dev] Trashing recursive objects comparison?

Armin Rigo arigo at tunes.org
Fri Oct 24 12:08:27 EDT 2003


Hello,

On Fri, Oct 17, 2003 at 07:46:31AM -0700, Guido van Rossum wrote:
> > If the pretty academic subject of graph isomorphisms is well-worn
> > enough to be sent to the trash, I'll submit a patch that just
> > removes all this code and instead use the existing
> > sys.recursionlimit counter to catch infinite recursions and throw
> > the usual RuntimeError.

Patch http://www.python.org/sf/825639.

Rationale
---------

Adding a list to itself is a nice first-time example of Python features, but
it is quite uncommon in practice.  It introduces a few problems for the
CPython interpreter, which must explicitely detect and avoid infinite
recursions not only to please the user (e.g. for a nicer str() representation)  
but because infinite C-level recursions crash it.

The naive definition of comparison between lists is recursive, and thus
suffers from this problem.  "Bisimulation" is one of the possible
mathematically clean definition of what it means for two recursive structures
to be equal; this is what CPython currently implements.  However, I argue that
this behavior is unexpected (and undocumented), and it masks bugs in erroneous
user code: structures may be considered equal by error.  Triggering an
explicit "infinite recursion" exception would have clearly pointed out the
problem.

The current implementation of equality is to return True if comparison of two
containers recursively reaches the same pair of containers.  This is arguably
the same as if the following code:

    def f():
        return f()

returned None instead of looping indefinitely, because some dark magic in
CPython decides that returning None is a good idea (and returning None is
consistent: f() can return None if the nested f() call returns None too.
Of course, returning anything else would be consistent too, but then for the
equality we decide to return True whereas returning False would be consistent
too, and would just make less structures to be considered as equal).

Workarounds
-----------

Under the proposed patch, applications that relied on equality to compare
recursive structures will receive a RuntimeError: maximum recursion depth
exceeded in cmp. This error does not come with a long traceback, unlike the
normal "maximum recursion depth exceeded" error, unless user-defined (pure
Python) comparison operators are involved in the infinite loop.

It is easy to write the bisimulation algorithm in Python if one needs it, but
it is harder and quite unnatural to do the converse: work around CPython's
implementation of equality to turn off the bisimulation behavior.

Three approaches can be taken to port applications:

- structural equality can often be replaced by explicit structural tests.  
This is what the patch does for *all* the tests in Lib/test that relied on
recursive equality.  For example, if you want to check that an object is
really a list that contains itself and nothing else, you can easily check that
"isinstance(a, list) and len(a) == 1 and a[0] is a".  This is more precise
that the now-deprecated variants "a==a[0]" or "b=[]; b.append(b); a==b"
because the latters would also succeed if a is [c] and c is [a], for example.

- among the rare cases where we really want bisimulation, cyclic structures
involving user-defined objects with a custom notion of equality are probably
the most common case.  If so, then it is straightforward to add a cache to the 
__eq__ operator:

    def __eq__(self, other):
        if id(other) in self.assumedequal:
            return True
        try:
            self.assumedequal[id(other)] = True
            #...recursive comparisons...
        finally:
            del self.assumedequal[id(other)]

This typically only needs to be done for one of the classes involved -- as
long as all cycles you are interested in will involve an instance of this
class.

- finally, to compare cyclic structures made only from built-in containers, an
explicit "global" algorithm will do the trick.  Here is a non-recursive one
for lists:

def bisimilar_lists(a, b):
    def consider(a, b):
        key = id(a), id(b)
        if key not in bisim:
            bisim[key] = True
            pending.append((a, b))
    bisim = {}
    pending = []
    consider(a, b)
    for a, b in pending:    # a, b are the lists to compare
        if len(a) != len(b):  # different length
            return False
        for a1, b1 in zip(a, b):
            if type(a1) != type(b1):  # elements of different types
                return False
            if isinstance(a1, list):
                consider(a1, b1)  # add the two sub-lists to 'pending'
            elif a1 != b1:
                return False    # else compare non-lists directory
    return True

This could easily be extended to provide support for dictionaries.  The
complete equivalent of the current CPython implementation is harder to
achieve, but in the improbable case where the user really needs it (as opposed
to one of the above solutions), he could define custom special methods, say
__bisimilar__().  He would then extend the above algorithm to call this method
in preference to __eq__() when it exists.  Alternatively, he could define a
global dictionary mapping types to bisimulation algorithms, with a
registration mecanism for new types.  (This is similar to copy.py and
copy_reg.py.  It could be added to the standard library.)

Patch info
----------

The proposed patch adds two functions to the C API:

int Py_EnterRecursiveCall(char *where)

        Marks a point where a recursive C-level call is about to be
        performed.  'where' should be a string " in xyz" to be concatenated
        to the RuntimeError message caused by the recursion depth limit.

void Py_LeaveRecursiveCall()

        Ends a Py_EnterRecursiveCall().  Must be called once for each
        *successful* invocation of Py_EnterRecursiveCall().

These functions are used to simplify the code of the following:

- eval_frame()
- PyObject_Compare()
- PyObject_RichCompare()
- instance_call()

The idea to make these two functions part of the public API is to have a
well-tested and PyOS_CheckStack()-issuing way to perform safe recursive calls
at the C level, both in the core and in extension modules.  For example,
cPickle.c has its own notion of recursion depth limit, but it does not check
the OS stack; instead, it should probably use Py_EnterRecursiveCall() as well
(which I did not do yet).

Note that Py_EnterRecursiveCall() does the same checks as eval_frame() used to
do, whereas Py_LeaveRecursiveCall() is actually a single-instruction macro.  
There is a performance degradation for the comparison of large non-cyclic
lists, which I measure to be about 6-7% slower with the patch.  Possibly,
extra work could be done to tune Py_EnterRecursiveCall().

Another problem that Py_EnterRecursiveCall() could be enhanced to also address
is that a long, non-recursive comparison cannot currently be interrupted by
Ctrl-C.  For example:

>>> a = [5] * 1000
>>> b = [a] * 1000
>>> c = [b] * 1000
>>> c == c


-=-

Armin




More information about the Python-Dev mailing list