[Python-Dev] Playing games with reference counts (was Re: PyWeakref_GetObject() borrows its reference from... whom?)

Larry Hastings larry at hastings.org
Thu Oct 13 07:41:14 EDT 2016


On 10/10/2016 10:38 PM, Chris Angelico wrote:

On Tue, Oct 11, 2016 at 8:14 AM, Larry Hastings <larry at hastings.org> wrote:

These hacks where we play games with the reference count are mostly removed in my branch.

That's exactly what I would have said, because I was assuming that refcounts would be accurate. I'm not sure what you mean by "play games with",

By "playing games with reference counts", I mean code that purposely doesn't follow the rules of reference counting. Sadly, there are special cases that apparently are special enough to break the rules, which made implementing "buffered reference counting" that much harder.

I currently know of two examples of this in CPython. In both instances, an object has a reference to another object, but deliberately does not increase the reference count of the object, in order to prevent keeping the other object alive. The implementation relies on the GIL to preserve correctness; without a GIL, it was much harder to ensure this code was correct. (And I'm still not 100% sure I've done it. More thinking needed.)

Those two examples are:

  1. PyWeakReference objects. The wr_object pointer--the "reference" held by the weak reference object--points to an object, but does not increment the reference count. Worse yet, as already observed, PyWeakref_GetObject() and PyWeakref_GET_OBJECT() don't increment the reference count, an inconvenient API decision from my perspective.
  2. "Interned mortal" strings. When a string is both interned and mortal, it's stored in the static "interned" dict in unicodeobject.c--as both key and value--and then it's DECREF'd twice so those two references don't count. When the string is destroyed, unicode_dealloc resurrects the string, reinstating those two references, then removes it from the "interned" dict, then destroys the string as normal.
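
The first example can be observed from Python itself. The following is only a small sketch, assuming CPython's immediate reference-counting behaviour; the `Node` class and the `weakref_demo` function are hypothetical names made up for illustration. It shows that creating a weak reference leaves the target's reference count unchanged, and that the weak reference goes dead as soon as the last strong reference is dropped.

```python
import sys
import weakref


class Node:
    """Hypothetical placeholder; a bare object() cannot be weakly referenced."""


def weakref_demo():
    target = Node()
    before = sys.getrefcount(target)

    # Creating the weak reference stores a pointer to the target
    # without adding a strong reference to it.
    ref = weakref.ref(target)
    after = sys.getrefcount(target)

    # Calling the weak reference hands back the target while it is alive.
    alive = ref() is target

    # Dropping the only strong reference destroys the target at once
    # (CPython-specific), and the weak reference is cleared.
    del target
    dead = ref() is None

    return before, after, alive, dead
```

Note that at the Python level `ref()` returns a normal strong reference; it is the C-level PyWeakref_GetObject() that hands back a pointer without incrementing the count.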

To support these, I've implemented what is effectively a secondary, atomic-only reference count. It seems to work. (And yes, that means all objects are now 8 bytes bigger. Let me worry about memory consumption later, m'kay?)

Resurrecting objects also gave me a headache in the Gilectomy with this buffered reference counting scheme, but I think I have that figured out too. When you resurrect an object, it's generally because you're going to expose it to other subsystems that may incr / decr / otherwise inspect the reference count. Which means that code may buffer reference count changes. Which means you can't immediately destroy the object anymore. So: when you resurrect, you set the new reference count, you also set a flag saying "I've already been resurrected", you pass it in to that other code, you then drop your references with Py_DECREF, and you exit. Your dealloc function will get called again later; you then see you've already done that first resurrection, and you destroy as normal. Curiously enough, the typeobject actually needs to do this twice: once for tp_finalize, once for tp_del. (Assuming I didn't completely misunderstand what the code was doing.)
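
The resurrect-once protocol described above can be sketched as a toy model. This is not Gilectomy or CPython code: `ToyObject`, `pending_decrefs`, `finalizer`, `flush_pending`, and the simplified `decref` / `dealloc` functions are all hypothetical stand-ins, and only the decrement is buffered here to keep the sketch short. It just illustrates why the first dealloc call cannot destroy the object and why the flag makes the second call safe.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    """Hypothetical stand-in for an object header; not CPython's layout."""
    refcount: int = 1
    resurrected: bool = False   # the "I've already been resurrected" flag
    destroyed: bool = False
    events: list = field(default_factory=list)


# Toy stand-in for reference count changes that were buffered by other code.
pending_decrefs = []


def decref(obj):
    obj.refcount -= 1
    if obj.refcount == 0:
        dealloc(obj)


def finalizer(obj):
    """The 'other code': takes a reference now, releases it later."""
    obj.refcount += 1
    obj.events.append("finalizer")
    pending_decrefs.append(obj)


def flush_pending():
    """Apply the buffered decrements."""
    while pending_decrefs:
        decref(pending_decrefs.pop())


def dealloc(obj):
    if not obj.resurrected:
        # First call: resurrect, set the flag, expose the object,
        # then drop our own reference and exit without destroying.
        obj.refcount = 1
        obj.resurrected = True
        obj.events.append("resurrected")
        finalizer(obj)
        decref(obj)
        return
    # Second call: the resurrection already happened, destroy as normal.
    obj.destroyed = True
    obj.events.append("destroyed")
```

In this model, dropping the last reference with `decref(obj)` leaves the object alive with a reference count of 1, because the finalizer's decrement is still buffered; only after `flush_pending()` does dealloc run a second time and destroy it.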

My struggles continue,

//arry/
