[Python-Dev] Trashing recursive objects comparison? (original) (raw)

Armin Rigo arigo at tunes.org
Fri Oct 24 12:08:27 EDT 2003

Previous message: [Python-Dev] Trashing recursive objects comparison?
Next message: [Python-Dev] Trashing recursive objects comparison?
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hello,

On Fri, Oct 17, 2003 at 07:46:31AM -0700, Guido van Rossum wrote:

> If the pretty academic subject of graph isomorphisms is well-worn > enough to be sent to the trash, I'll submit a patch that just > removes all this code and instead use the existing > sys.recursionlimit counter to catch infinite recursions and throw > the usual RuntimeError.

Patch http://www.python.org/sf/825639.

Rationale

Adding a list to itself is a nice first-time example of Python features, but it is quite uncommon in practice. It introduces a few problems for the CPython interpreter, which must explicitely detect and avoid infinite recursions not only to please the user (e.g. for a nicer str() representation)
but because infinite C-level recursions crash it.

The naive definition of comparison between lists is recursive, and thus suffers from this problem. "Bisimulation" is one of the possible mathematically clean definition of what it means for two recursive structures to be equal; this is what CPython currently implements. However, I argue that this behavior is unexpected (and undocumented), and it masks bugs in erroneous user code: structures may be considered equal by error. Triggering an explicit "infinite recursion" exception would have clearly pointed out the problem.

The current implementation of equality is to return True if comparison of two containers recursively reaches the same pair of containers. This is arguably the same as if the following code:

def f():
    return f()

returned None instead of looping indefinitely, because some dark magic in CPython decides that returning None is a good idea (and returning None is consistent: f() can return None if the nested f() call returns None too. Of course, returning anything else would be consistent too, but then for the equality we decide to return True whereas returning False would be consistent too, and would just make less structures to be considered as equal).

Workarounds

Under the proposed patch, applications that relied on equality to compare recursive structures will receive a RuntimeError: maximum recursion depth exceeded in cmp. This error does not come with a long traceback, unlike the normal "maximum recursion depth exceeded" error, unless user-defined (pure Python) comparison operators are involved in the infinite loop.

It is easy to write the bisimulation algorithm in Python if one needs it, but it is harder and quite unnatural to do the converse: work around CPython's implementation of equality to turn off the bisimulation behavior.

Three approaches can be taken to port applications:

structural equality can often be replaced by explicit structural tests.
This is what the patch does for all the tests in Lib/test that relied on recursive equality. For example, if you want to check that an object is really a list that contains itself and nothing else, you can easily check that "isinstance(a, list) and len(a) == 1 and a[0] is a". This is more precise that the now-deprecated variants "a==a[0]" or "b=[]; b.append(b); a==b" because the latters would also succeed if a is [c] and c is [a], for example.
among the rare cases where we really want bisimulation, cyclic structures involving user-defined objects with a custom notion of equality are probably the most common case. If so, then it is straightforward to add a cache to the eq operator:

def eq(self, other): if id(other) in self.assumedequal: return True try: self.assumedequal[id(other)] = True #...recursive comparisons... finally: del self.assumedequal[id(other)]

This typically only needs to be done for one of the classes involved -- as long as all cycles you are interested in will involve an instance of this class.

finally, to compare cyclic structures made only from built-in containers, an explicit "global" algorithm will do the trick. Here is a non-recursive one for lists:

def bisimilar_lists(a, b): def consider(a, b): key = id(a), id(b) if key not in bisim: bisim[key] = True pending.append((a, b)) bisim = {} pending = [] consider(a, b) for a, b in pending: # a, b are the lists to compare if len(a) != len(b): # different length return False for a1, b1 in zip(a, b): if type(a1) != type(b1): # elements of different types return False if isinstance(a1, list): consider(a1, b1) # add the two sub-lists to 'pending' elif a1 != b1: return False # else compare non-lists directory return True

This could easily be extended to provide support for dictionaries. The complete equivalent of the current CPython implementation is harder to achieve, but in the improbable case where the user really needs it (as opposed to one of the above solutions), he could define custom special methods, say bisimilar(). He would then extend the above algorithm to call this method in preference to eq() when it exists. Alternatively, he could define a global dictionary mapping types to bisimulation algorithms, with a registration mecanism for new types. (This is similar to copy.py and copy_reg.py. It could be added to the standard library.)

Patch info

The proposed patch adds two functions to the C API:

int Py_EnterRecursiveCall(char *where)

    Marks a point where a recursive C-level call is about to be
    performed.  'where' should be a string " in xyz" to be concatenated
    to the RuntimeError message caused by the recursion depth limit.

void Py_LeaveRecursiveCall()

    Ends a Py_EnterRecursiveCall().  Must be called once for each
    *successful* invocation of Py_EnterRecursiveCall().

These functions are used to simplify the code of the following:

eval_frame()
PyObject_Compare()
PyObject_RichCompare()
instance_call()

The idea to make these two functions part of the public API is to have a well-tested and PyOS_CheckStack()-issuing way to perform safe recursive calls at the C level, both in the core and in extension modules. For example, cPickle.c has its own notion of recursion depth limit, but it does not check the OS stack; instead, it should probably use Py_EnterRecursiveCall() as well (which I did not do yet).

Note that Py_EnterRecursiveCall() does the same checks as eval_frame() used to do, whereas Py_LeaveRecursiveCall() is actually a single-instruction macro.
There is a performance degradation for the comparison of large non-cyclic lists, which I measure to be about 6-7% slower with the patch. Possibly, extra work could be done to tune Py_EnterRecursiveCall().

Another problem that Py_EnterRecursiveCall() could be enhanced to also address is that a long, non-recursive comparison cannot currently be interrupted by Ctrl-C. For example:

a = [5] * 1000 b = [a] * 1000 c = [b] * 1000 c == c

-=-

Armin

Previous message: [Python-Dev] Trashing recursive objects comparison?
Next message: [Python-Dev] Trashing recursive objects comparison?
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list