[Python-Dev] Re: [Python-checkins] python/dist/src/Modules gcmodule.c,2.33.6.5,2.33.6.6
Tim Peters tim_one@email.msn.com
Sun, 6 Apr 2003 20:47:53 -0400
[Jeremy Hylton]
I think I'll second the thought that there are no satisfactory answers here. We've made a big step forward by fixing the core dumps.
If we want to document the current behavior, we would say that garbage collection may leave reachable objects in an "invalid state" in the presence of "problematic objects." A "problematic object" is an instance of a classic class that defines a getattr hook (__getattr__) but not a finalizer (__del__). An object in an "invalid state" has had its tp_clear slot executed; in the case of instances, this means the __dict__ will be empty. Specifically, if a problematic object is part of an unreachable cycle, the garbage collector will execute the code in its getattr hook; if executing that code makes any object in the cycle reachable again, it will be left in an invalid state.
I expect that documenting it comprehensibly is impossible. For example, the referent of "it" in your last sentence is unclear, and hard to flesh out. A problematic object doesn't need to be part of a cycle to cause problems, and when it does cause problems the things that end up in an unexpected state needn't be part of cycles either. It's more that the problematic object needs to be reachable only from an unreachable cycle (the unreachable cycle needn't contain problematic objects), and then it's all the objects reachable only from the unreachable cycle and from the problematic object that may be in trouble (and regardless of whether they're in cycles). Here's a concrete example, where the instance of the problematic D isn't in a cycle, and neither are the list or the dict that get magically cleared (.mylist and .mydict) despite being resurrected:
"""
class C:
    pass

class D:
    def __init__(self):
        self.mydict = {'a': 1, 'b': 2}
        self.mylist = range(100)

    def __getattr__(self, attribute):
        global alist
        if attribute == "__del__":
            alist.append(self.mydict)
            alist.append(self.mylist)
        raise AttributeError

import gc
gc.collect()

a = C()
a.loop = a  # make a cycle
a.d_instance = D()  # an instance of D hangs off the cycle

alist = []
del a
print gc.collect()  # 6:  a, a.d_instance, their __dict__s, and D()'s
                    # mydict and mylist
print alist     # [{}, []]
"""
If we had enough words to explain that, it still wouldn't be enough, because the effect of calling tp_clear isn't defined by the language for any type. If, for example, D also defined a .mytuple attr and resurrected it in __getattr__, the user would see that that one survived OK (tuples happen to have a NULL tp_clear slot).
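For readers trying to picture "reachable only from an unreachable cycle", here is a minimal sketch in present-day Python 3. It is not a reproduction of the bug: classic classes no longer exist, and nothing here gets cleared by magic. The names Node, Payload and collect_hanging_object are made up for illustration. All it shows is that an object which merely hangs off a cycle, without being in one, stays alive until the cycle is collected and then goes away with it.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for C: it will form a self-cycle."""


class Payload:
    """Hypothetical stand-in for D: hangs off the cycle, is not in one."""


def collect_hanging_object():
    """Build a cycle with one object hanging off it, then collect it.

    Returns (alive_before, found, alive_after).
    """
    was_enabled = gc.isenabled()
    gc.disable()                 # keep automatic collections out of the way
    try:
        gc.collect()             # start from a clean slate
        a = Node()
        a.loop = a               # make a cycle
        a.payload = Payload()    # reachable only through the cycle
        probe = weakref.ref(a.payload)
        del a                    # the cycle is now unreachable
        alive_before = probe() is not None
        found = gc.collect()     # number of unreachable objects found
        alive_after = probe() is not None
        return alive_before, found, alive_after
    finally:
        if was_enabled:
            gc.enable()


alive_before, found, alive_after = collect_hanging_object()
```

The exact value of found depends on the interpreter version, so the sketch only relies on it being at least 1.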
If we document this for 2.2, it's more complicated because instances of new-style classes are also affected. What's worse, a new-style class with a __getattribute__ hook is affected regardless of whether it has a finalizer.
In 2.2 but not 2.3, right? I haven't tried anything with __getattribute__. For that matter, in my own Python programming, I've never even defined a __getattr__ method -- I spend most of my life tracking down bugs in things I don't use.
Here are a couple of thoughts about how to avoid leaving objects in an invalid state.
I'd much rather pursue that than write docs nobody will understand.
It's pretty unlikely for it to happen, but speaking from experience it's baffling when it does.
#1. (I think this was Fred's suggestion on Friday.) Don't do a hasattr() check on the object, do it on the class. This is what happens with new-style classes in Python 2.3: If a new-style class doesn't define a __del__ method, then its instances don't have finalizers. It doesn't matter whether the specific instance has a __del__ attribute. Limitations: This is a change in semantics, although it only covers a nearly insane corner case. The other limitation is that things could still go wrong, although only in the presence of a classic metaclass!
I'm not sure I followed the last sentence. If I did, screw calling hasattr() -- do a string lookup for "__del__" in the classic class's __dict__, and that's it. Anything that ends up executing arbitrary Python code is going to leave holes.
#2. If an object has a __getattr__ hook and it's involved in a cycle, just put it in gc.garbage. Forget about checking for a finalizer. That seems fine for 2.3, since we're only talking about classic classes with __getattr__ hooks. But it doesn't sound very pleasant for 2.2, since it covers any class instance with a __getattr__ hook.
I'd like to avoid expanding the definition of what ends up in gc.garbage. The relationship to __del__ and unreachable cycles is explainable now, modulo the __getattr__ insanity. Getting rid of the latter is a lot more attractive than folding it into the former.
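To make the difference concrete, here is a rough sketch in present-day Python 3 (so new-style classes only, and pure Python rather than anything gcmodule.c does). The helper class_defines_del and the classes Hooked and WithDel are hypothetical names made up for this illustration. The point is that hasattr() on the instance runs the __getattr__ hook, while a plain string lookup in the class dicts along the MRO does not.

```python
def class_defines_del(obj):
    """Return True if the object's class, or a base class, has "__del__"
    in its own dict. Never asks the instance for an attribute, so an
    instance-level __getattr__ hook is not run."""
    for klass in type(obj).__mro__:
        if "__del__" in vars(klass):
            return True
    return False


hook_calls = []  # records every name the hook is asked for


class Hooked:
    def __getattr__(self, name):
        hook_calls.append(name)      # stands in for arbitrary user code
        raise AttributeError(name)


class WithDel:
    def __del__(self):
        pass


h = Hooked()

via_hasattr = hasattr(h, "__del__")      # runs the hook
calls_after_hasattr = list(hook_calls)

via_class = class_defines_del(h)         # does not run the hook
calls_after_class_lookup = list(hook_calls)
```

Both checks say the instance has no finalizer, but only the hasattr() version executed user code to find that out.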
I think #1 is pretty reasonable. I'd like to see something fixed for 2.2.3, but I worry that the semantic change may be unacceptable for a bug fix release. (But maybe not, the semantics are pretty insane right now :-).
I have no problem with changing this for 2.2.3. I doubt any Python app will be affected, except possibly to rid 1 in 10,000 of a subtle bug. There's certainly no defensible app that relied on Python segfaulting here, and I can't imagine any relying on containers getting magically cleared at unpredictable times.
BTW, I'm still wondering why the ZODB thread test failed the way it did for Tres and Barry and me: you saw corrupt gc lists, but the rest of us never did. We saw a Connection instance with a mysteriously cleared __dict__. That's consistent with the __getattr__-hook-resurrects-an-object-reachable-only-from-an-unreachable-cycle examples I posted, but did you guys figure out on Friday whether that's what was actually happening? The corrupt-gc-lists symptom was explained by the __getattr__ hook deleting unreachable objects while gc was still crawling over them, and that's a different (albeit related) problem than dicts getting cleared by magic.