On Thu, Oct 13, 2016 at 4:43 AM Larry Hastings <larry@hastings.org> wrote:

On 10/10/2016 10:38 PM, Chris Angelico wrote:
On Tue, Oct 11, 2016 at 8:14 AM, Larry Hastings  wrote:  
These hacks where we play games with the  
reference count are mostly removed in my branch.  
That's exactly what I would have said, because I was assuming that  
refcounts would be accurate. I'm not sure what you mean by "play games  
with",

By "playing games with reference counts", I mean code that purposely doesn't follow the rules of reference counting. Sadly, there are special cases that apparently \*are\* special enough to break the rules. Which made implementing "buffered reference counting" that much harder.

I currently know of two examples of this in CPython. In both instances, an object has a reference to another object, but *deliberately* does not increase the reference count of the object, in order to prevent keeping the other object alive. The implementation relies on the GIL to preserve correctness; without a GIL, it was much harder to ensure this code was correct. (And I'm still not 100% sure I've done it. More thinking needed.)

Those two examples are:
  1. PyWeakReference objects. The wr_object pointer--the "reference" held by the weak reference object--points to an object, but does not increment the reference count. Worse yet, as already observed, PyWeakref_GetObject() and PyWeakref_GET_OBJECT() don't increment the reference count, an inconvenient API decision from my perspective.
That a PyWeakReference object does not increment the reference count is the entire point of a weakref; otherwise the referent would never be destroyed and the weak reference would never break. Weak references could be implemented in a different manner: coordinate with the garbage collector to treat objects whose only references come from weakrefs as collectable. That'd be an internal overhaul of the weakref implementation and potentially the gc.
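For illustration, this is the usual pattern extension code follows with that borrowed reference under the GIL: take a strong reference right away, before anything can release the referent. Just a sketch; call_referent is a made-up helper, not a CPython API.

```c
/* Sketch of handling the borrowed reference returned by
 * PyWeakref_GetObject(): take a strong reference immediately, while
 * the GIL still guarantees the referent is alive. */
#include <Python.h>

static PyObject *
call_referent(PyObject *weakref)        /* illustrative helper only */
{
    PyObject *obj = PyWeakref_GetObject(weakref);   /* borrowed reference */
    if (obj == NULL)
        return NULL;                    /* not a weakref; exception already set */
    if (obj == Py_None) {
        PyErr_SetString(PyExc_ReferenceError,
                        "weakly-referenced object no longer exists");
        return NULL;
    }
    Py_INCREF(obj);                     /* now we own a reference of our own */
    PyObject *result = PyObject_CallObject(obj, NULL);
    Py_DECREF(obj);
    return result;
}
```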
  1. "Interned mortal" strings. When a string is both interned \*and\* mortal, it's stored in the static "interned" dict in unicodeobject.c--as both key and value--and then its's DECREF'd twice so those two references don't count. When the string is destroyed, unicode\_dealloc resurrects the string, reinstating those two references, then removes it from the "interned" dict, then destroys the string as normal.

yow. i don't even want to know the history of that one...

Resurrecting objects also gave me a headache in the Gilectomy with this buffered reference counting scheme, but I think I have that figured out too. When you resurrect an object, it's generally because you're going to expose it to other subsystems that may incr / decr / otherwise inspect the reference count. Which means that code may buffer reference count changes. Which means you can't immediately destroy the object anymore. So: when you resurrect, you set the new reference count and also set a flag saying "I've already been resurrected", you pass the object in to that other code, you then drop your references with Py_DECREF, and you exit. Your dealloc function will get called again later; you then see you've already done that first resurrection, and you destroy as normal. Curiously enough, the typeobject actually needs to do this twice: once for tp_finalize, once for tp_del. (Assuming I didn't completely misunderstand what the code was doing.)
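A small self-contained sketch of that "already resurrected" flag pattern, with invented names (this is not the actual Gilectomy code):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct myobject {
    long refcnt;
    int  resurrected;       /* "I've already been resurrected" flag */
} myobject;

static void dealloc(myobject *op);

/* Stand-in for tp_finalize / tp_del: code that may inspect the
 * refcount or buffer changes to it. */
static void
run_finalizer(myobject *op)
{
    printf("finalizing, refcnt=%ld\n", op->refcnt);
}

static void
dealloc(myobject *op)
{
    if (!op->resurrected) {
        /* First pass: resurrect, set the flag, expose the object to
         * code that may buffer refcount changes, then drop our own
         * reference.  We cannot destroy the object yet; dealloc will
         * run again once the buffered changes have drained. */
        op->resurrected = 1;
        op->refcnt = 1;             /* the reference we are about to drop */
        run_finalizer(op);
        if (--op->refcnt == 0)      /* Py_DECREF equivalent */
            dealloc(op);            /* in the real scheme this happens later,
                                       via the buffered-decref machinery */
        return;
    }
    /* Second pass: resurrection already happened, destroy as normal. */
    free(op);
}

int
main(void)
{
    myobject *op = calloc(1, sizeof(*op));
    dealloc(op);    /* first pass resurrects, second pass frees */
    return 0;
}
```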

kudos for trying to understand this. resurrection during destruction or finalization hurts my brain even though in many ways it makes sense.

-gps