[Python-Dev] Big trouble in CVS Python

Tim Peters tim_one@email.msn.com
Sun, 13 Apr 2003 18:07:05 -0400


[Jeremy Hylton]

> We've had a lot of changes to the function call implementation over the last couple of months. What's the chance that this is just the first time we've noticed the problem?

Slim, I think -- anything systematically screwing up refcounts on calls would have lots of opportunities to create trouble. This one was unique and shy.

> Seems pretty plausible that the recent GC changes just exposed an earlier bug.

For all the code changes, the only intended semantic difference was in has_finalizer's implementation details. So that didn't seem likely either.

Turned out that the damaged co_consts was attached to the test that exercised the new C code at fault. The code was compiled gazillions of cycles before the test was executed, though, and gazillions more cycles passed before GC bumped into the damage. If gc hadn't bumped into it, the memory would have gotten allocated to some other float, and then would have been decref'ed incorrectly when the original co_consts got deallocated. So it could have been much harder to track down.

What I still don't grasp is why a debug run never failed with a negative-refcount error. Attaching the prematurely-freed float to the float free list doesn't change its refcount field -- that remains 0. So if it was still in the free list when the original co_consts got reclaimed, we should have had a negrefcnt death. OTOH, if the memory was handed out to another float, then when the original co_consts got reclaimed it would have knocked that float's refcount down too, which should lead to a negrefcnt death later. Maybe co_consts never did get reclaimed? I'm not clear on how much we let slide at shutdown.
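To make the two cases above concrete, here's a toy model in Python. It is only a sketch of the reasoning, not the real float allocator: Slot, new_float, decref and NegativeRefcount are made-up names, and the only assumptions carried over are that a freed float goes on a free list with its refcount field left at 0, and that a debug build complains when a refcount goes negative.

```python
class NegativeRefcount(Exception):
    """Stands in for a debug build's negative-refcount death (hypothetical)."""


class Slot:
    """One float-sized block of memory carrying a refcount field."""

    def __init__(self):
        self.refcnt = 0


def new_float(free_list):
    # Reuse a block from the free list if there is one, else make a new one.
    slot = free_list.pop() if free_list else Slot()
    slot.refcnt = 1
    return slot


def decref(slot, free_list):
    slot.refcnt -= 1
    if slot.refcnt < 0:
        raise NegativeRefcount("refcount went negative")
    if slot.refcnt == 0:
        # Going onto the free list leaves the refcount field at 0.
        free_list.append(slot)


def still_on_free_list():
    free_list = []
    const = new_float(free_list)   # the reference held by co_consts
    decref(const, free_list)       # the bug: one decref too many
    try:
        decref(const, free_list)   # co_consts reclaimed, drops its reference
    except NegativeRefcount:
        return "immediate"
    return "none"


def handed_to_another_float():
    free_list = []
    const = new_float(free_list)   # the reference held by co_consts
    decref(const, free_list)       # the bug: one decref too many
    other = new_float(free_list)   # same memory, now some other float
    decref(const, free_list)       # co_consts reclaimed: other drops to 0
    try:
        decref(other, free_list)   # other's real owner lets go later
    except NegativeRefcount:
        return "later"
    return "none"
```

In this model both paths end in a negative refcount, the first as soon as co_consts is reclaimed and the second only when the unrelated float's owner releases it. Neither can happen if co_consts is never reclaimed at all.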