gh-84436: Implement Immortal Objects by eduardo-elizondo · Pull Request #19474 · python/cpython
@ariago Thanks for the reply, just went through it in detail. First of all, pretty much all of what you are saying in the first message is correct, though there are some additional details that would help complement what you wrote above. Let me try to reply to your message by breaking it down into what I believe are the main questions:
> but the implementation instead starts to consider objects immortal as soon as their refcount reaches 31 bits, a much more reachable value on 64-bit machines
Overall, yes: the PEP talks about a very high value, which is what we wanted to reflect there, and it leaves the exact value as an implementation detail. In the implementation, we ended up saturating at 32 bits because it provides several benefits:
- Performance: By using only the lower 32 bits, we help the compiler generate better code by directly manipulating the 32-bit registers (from the 64-bit refcount register). This gave a considerable perf improvement (around 1-2% geometric mean).
- Pointer Tagging: By leaving the upper 32 bits free, we could potentially use them for tagging the pointers to improve performance in future iterations.
- NoGIL refcount: The reference implementation of PEP 703 also exploits this by using the unused bits to keep track of the biased reference counts.
This is why we decided to stick with 32-bit saturation for now. We can always revise it in later versions if needed!
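To make the "lower 32 bits only" idea concrete, here is a rough sketch of a saturating increment. It is a simplified model, not the actual CPython source: `toy_object` and all the `toy_*` function names are made up for illustration, and the 64-bit field just stands in for the refcount word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an object header with a 64-bit refcount field. */
typedef struct {
    uint64_t refcnt;
} toy_object;

/* Low 32 bits: the reference count proper. */
uint32_t toy_low_refcnt(const toy_object *op)
{
    assert(op != NULL);
    return (uint32_t)op->refcnt;
}

/* High 32 bits: never touched by refcounting in this model. */
uint32_t toy_high_bits(const toy_object *op)
{
    assert(op != NULL);
    return (uint32_t)(op->refcnt >> 32);
}

/* Saturating increment that only changes the low 32 bits. */
void toy_incref(toy_object *op)
{
    assert(op != NULL);
    uint32_t low = (uint32_t)op->refcnt;
    uint32_t next = low + 1u;   /* wraps to 0 only when low == UINT32_MAX */
    if (next == 0) {
        return;                 /* saturated: leave the count alone */
    }
    /* Keep the high half as-is and store the new low half. */
    op->refcnt = (op->refcnt & UINT64_C(0xFFFFFFFF00000000)) | next;
}
```

In this model, whatever is stored in the upper 32 bits survives every increment, which is the property that would make those bits available for tagging or for other counters.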
> I have not tested it, but looking at the source code it seems that this kind of code would crash CPython on 64-bit machines
No need to test it, it will play out exactly as you mentioned! Eric and I talked about this exact scenario and even raised it with the core devs at a language summit. However, we believe that it’s a very contrived example, as it would require a large number of asymmetric decrefs, that is, decrefs with no matching counted incref. In reality, what happens with very large objects such as these is that they either live throughout the entire execution of the application or there’s a combination of symmetric increfs and decrefs that prevents this from happening.
I did end up testing this on a very large machine, with a large application (one where 16GB is relatively small) that uses hundreds of older stable-ABI modules, and didn’t see this issue materialize. Of course, this is just a single application, but I was indeed on the lookout to make sure that this precise scenario was not prevalent. The good thing here is that as we move to newer Python versions and keep updating our C extensions, the risk will keep shrinking.
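A small model of what "asymmetric" means here, under some loud assumptions: `new_incref` imitates a saturating increment on the low 32 bits, and `legacy_decref` imitates a decrement that was inlined into an extension built before immortality existed, so it has no immortality check. Both names and `toy_object` are hypothetical, and this is only a sketch, not CPython code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an object header with a 64-bit refcount field. */
typedef struct {
    uint64_t refcnt;
} toy_object;

/* Newer-style incref: saturates the low 32 bits instead of wrapping. */
void new_incref(toy_object *op)
{
    assert(op != NULL);
    uint32_t next = (uint32_t)op->refcnt + 1u;
    if (next == 0) {
        return;  /* already saturated: this reference is NOT counted */
    }
    op->refcnt = (op->refcnt & UINT64_C(0xFFFFFFFF00000000)) | next;
}

/* Older-style decref (assumed): plain decrement, no immortality check. */
void legacy_decref(toy_object *op)
{
    assert(op != NULL);
    assert(op->refcnt > 0);
    op->refcnt--;
}
```

Starting from `UINT32_MAX - 1`, three calls to `new_incref` only add one to the count, while three calls to `legacy_decref` subtract three, so the count drifts down by two even though every reference was released exactly once. Getting from the saturated value down to zero this way takes billions of such unmatched decrements, which is why the scenario is hard to hit in practice.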
> a potential fix for this issue would be to change Py_INCREF on 64-bit platforms: instead of checking if (uint)refcount == 2**32-1, it could check if (Py_ssize_t)refcount < 0
At some point I did indeed try a solution similar to this. However, it ended up conflicting with some of the refcount manipulations that the GC does on the two most significant bits, which would end up ignoring the fact that an object is immortal and cause incorrect behavior. On top of that, for the reasons mentioned above, we wanted to keep the entire refcount arithmetic within 32 bits.
An alternative that was considered (but never implemented) would be to make the GC aware of immortal objects and then go for the 64-bit solution, but this would require a bit more work on the gcmodule, not to mention the added complexity in the module. There might be a simple solution there that I never figured out! However, this would still imply that we would use up all the bits that we have now freed for future use cases. Given the acceptance of PEP 703, we might as well keep refcounts as 32 bits.
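One way such a conflict can arise, sketched under an explicit assumption: suppose a collector keeps a working copy of the refcount shifted left by two bits so that two flag bits can live next to it. This is an illustrative model with made-up `toy_*` names, not a description of what gcmodule actually does.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_GC_SHIFT 2  /* assumption: two low bits reserved for flags */

/* Pack a copy of the refcount above the flag bits. */
uint64_t toy_gc_pack(uint64_t flags, uint64_t refcnt)
{
    assert(flags < (UINT64_C(1) << TOY_GC_SHIFT));
    /* The two most significant bits of refcnt fall off the top here. */
    return (refcnt << TOY_GC_SHIFT) | flags;
}

/* Recover the collector's copy of the refcount. */
uint64_t toy_gc_unpack(uint64_t packed)
{
    return packed >> TOY_GC_SHIFT;
}

/* Immortality marked by the top bit of the full 64-bit word. */
int toy_is_immortal_64(uint64_t refcnt)
{
    return (refcnt >> 63) != 0;
}

/* Immortality marked by the top bit of the low 32 bits. */
int toy_is_immortal_32(uint64_t refcnt)
{
    return (refcnt & UINT64_C(0x80000000)) != 0;
}
```

In this model a marker in bit 63 does not survive the round trip through the packed copy, so the collector would see an ordinary small refcount, whereas a marker in bit 31 comes back intact.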
> ...or, change _Py_IsImmortal to (Py_ssize_t)refcount<0 on 64-bit, and then just use _Py_IsImmortal in both INCREF and DECREF on both 32- and 64-bit platforms, and be done with it?
Using all 64 bits causes the issue with the GC that I pointed out above. But we could indeed use this check on the lower 32 bits (which we already do in decref). This was actually the original implementation and, as you mentioned, this is what we tried first. The reason we did a more specialized check in incref is just improved perf.
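For comparison, here is a sketch of the two incref styles side by side: one reuses the sign-style check on the lower 32 bits (the same check the decref uses), the other uses the specialized saturation check. The `toy_*` names are hypothetical and this is a simplified model rather than the real macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an object header with a 64-bit refcount field. */
typedef struct {
    uint64_t refcnt;
} toy_object;

/* Sign-style check on the low 32 bits: immortal once bit 31 is set. */
int toy_is_immortal(const toy_object *op)
{
    assert(op != NULL);
    return (op->refcnt & UINT64_C(0x80000000)) != 0;
}

/* Decref that skips immortal objects. */
void toy_decref(toy_object *op)
{
    if (toy_is_immortal(op)) {
        return;
    }
    assert(op->refcnt > 0);
    op->refcnt--;
}

/* Incref variant 1: reuse the same immortality check as decref. */
void toy_incref_generic(toy_object *op)
{
    if (toy_is_immortal(op)) {
        return;
    }
    op->refcnt++;
}

/* Incref variant 2: specialized check, stop only at 32-bit saturation. */
void toy_incref_saturating(toy_object *op)
{
    assert(op != NULL);
    uint32_t next = (uint32_t)op->refcnt + 1u;
    if (next == 0) {
        return;
    }
    op->refcnt = (op->refcnt & UINT64_C(0xFFFFFFFF00000000)) | next;
}
```

The two increfs agree for ordinary counts and differ only between 2^31 and 2^32-1, where the saturating variant keeps counting. Either way, the decref in this model already treats anything with bit 31 set as immortal.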