Issue 22735: Fix various crashes exposed through mro() customization (original) (raw)

The attached patch fixes several bugs revealed by providing a custom mro(). Most of these bugs are reentrancy issues (mainly, cls.bases assignment within mro() which causes incorrect refcounting), but there are also some issues when using incomplete types with uninitialized MRO (dereferencing NULL when extending such types, attribute lookup on super, etc.). The patch is made against the default branch (py3k), however all these bugs exist in Python 2 as well.

I also tried to break the patch into smaller pieces (commits, in fact) to ease reviewing: a common pattern is test->minor->fix series. The patch set is an output of git format-patch, and most patches have a detailed commit message describing a change inside.

Adding memory mgmt and object model guys to nosy list.

Thank you for your replies.

I don't think forbidding reentrancy is a good idea, and I'm against it for the following reasons.

First of all, naive prohibition of type_set_bases within mro() on a class doesn't save from mro() reentrancy:

def mro(cls):
    # cls is B,
    # B extends A

    A.__bases__ = ...  # What to do?

Changing bases of a class provokes MRO of all its subclasses to be updated as well. Therefore, one would also need to forbid changing bases of any direct or indirect base of the class up to the 'object', and this doesn't feel natural at all. Why should mro() of some specialized class B prevent changing bases of its general parent class A?

Otherwise, there must be a check in mro_subclasses (mro_hierarchy in the patch), which will not recurse into a class flagged as being inside mro(). But in this case a running mro(B) will be unable to notice a change of MRO of the parent class A, in case if mro() involves some non-trivial logic, and A.bases assignment occurs somewhere deep in the call stack and is done by a code from some unknown black-box module. Deferring mro() recalculation upon exiting from the outermost mro() in such case doesn't seem to be a good solution either, especially concerning its (most likely) tricky implementation.

def mro(cls):
    # some complicated calculation based of MROs of bases:
    parent_mros = [base.__mro__ for base in cls.__bases__]

    # The line below does this:
    #   parent.__bases__ = ...
    # (deep-deep-deep in the call stack)
    third_party.do_mro(cls)

    # Either the line above raises an error,
    # or parent_mros becomes invalid.

Another example. Setting bases from inside mro() may be useful for changing class behavior on the fly using a proxy base class (e.g. for debugging, logging, testing frameworks). The only place to do this except mro() is new, but in that case it is only possible to fix up bases at the moment of class creation. That is, a tricky class that change its bases would break such framework by overriding a proxy base, which is necessary for this framework to function properly.

def mro(cls):
    if cls or one of cls.__bases__ is not a proxy class:
        class Proxy(*cls.__bases__):
            ...
        cls.__bases__ = (Proxy,)  # reenter

    return type.mro(cls)

In other words, there should be a loophole to alter bases upon assignment, and I suppose mro() is a good option.

Also, bases assignment is known as the only reliable way to invoke MRO recalculation (http://stackoverflow.com/a/20832588/545027). Forbidding it from the inside mro() makes it not so obvious and reliable.

Actually, I encountered one of these bugs while working on a real metaclass providing an auto-inheriting attributes feature needed for a custom DSL built on top of Python. In a nutshell, if B extends A, then make B.V extend A.V automatically (https://github.com/abusalimov/mybuild/blob/6c7c89521b856c798b46732501adb5e06dec7e03/util/inherit.py, still work in progress).

I know, some of these use cases may sound a bit unreal (or even crazy), but neither me nor you can imagine all scenarios, that Python programmers could ever invent. Actually, there could already exist some: it is possible to workaround most of reentrancy issues through holding a reference to old_mro from inside a Python code. That is, backward compatibility comes into play too.

Finally, why do forbid something that was not prohibited before? I think of it as a feature with some issues to fix, not to remove at all. After all, there is a fix provided, it is more or less straightforward (I hope so), with tests showing a desired behavior. The desired behavior is also more or less obvious: take a mro calculated by mro() invoked due to the very last bases assignment, regardless reentrancy (the innermost one in such case).

Summarizing, that is why I believe that reentrancy should not be forbidden. Furthermore, I considered that way, and I'm pretty sure it is a wrong one. It implies too many corner cases, it has a non-obvious behavior from the point of view of a Python code, and a possible implementation doesn't seem to be more simple or robust than it is now.

I would be happy to hear opposite arguments, and if I convinced you, get a feedback on the patch, of course. I could also prepare a backport patch fixing 2.7 line as well.