[Python-Dev] CPython optimization: storing reference counters outside of objects

Artur Siekielski artur.siekielski at gmail.com
Sun May 22 01:57:55 CEST 2011


Hi. The problem with reference counters is that they are incremented and decremented very often, even by read-only algorithms (such as traversing a list). This has two drawbacks:

  1. CPU cache lines (64 bytes on x86) containing the beginning of a PyObject are invalidated very often, so many chances to use the CPU caches are lost.
  2. The copy-on-write optimization after fork() (on Linux) is almost useless in CPython. Even if you don't modify data directly, refcounts are modified, and PyObjects, with their refcounts inside, are spread all over the process's memory, so one small refcount modification causes a whole page (4 KB) to be copied into the child process.
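A toy model makes both drawbacks concrete. It is a sketch under stated assumptions, not CPython code: 10,000 hypothetical 32-byte objects sit back to back, each with its refcount at its start (as ob_refcnt is in a PyObject header), with 64-byte cache lines and 4 KB pages. The helper names are made up for illustration.

```python
# Toy model, not CPython code: objects sit back to back in a flat address
# space, and each keeps its refcount in its first bytes, the way ob_refcnt
# opens every PyObject header.
PAGE_SIZE = 4096   # assumed page size (typical on x86 Linux)
CACHE_LINE = 64    # assumed cache-line size (typical on x86)


def inline_refcount_offsets(n_objects, object_size):
    """Byte offset of each object's refcount when it is stored inline."""
    return [i * object_size for i in range(n_objects)]


def units_written(offsets, unit_size):
    """Distinct pages (or cache lines) written when every refcount is
    incremented and decremented once, as a read-only traversal does."""
    return len({off // unit_size for off in offsets})


N, OBJECT_SIZE = 10_000, 32   # hypothetical workload: 10,000 small objects
offsets = inline_refcount_offsets(N, OBJECT_SIZE)
total_pages = -(-N * OBJECT_SIZE // PAGE_SIZE)   # ceiling division

print(units_written(offsets, PAGE_SIZE), "of", total_pages, "pages dirtied")
print(units_written(offsets, CACHE_LINE), "cache lines written")
```

With these numbers it reports 79 of 79 pages dirtied and 5000 cache lines written: a traversal that only reads the data still writes to every page the objects occupy, so after fork() each of those pages would be copied into the child.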

So an idea I would like to try is to move reference counts out of PyObjects and into one or more contiguous blocks of memory. Each PyObject would hold a pointer to its reference count inside such a block. I think that doing this would have two benefits:

  1. The beginning of a PyObject struct could stay in the CPU cache much longer (small objects like ints could be cached entirely). I don't know whether having localized writes into the refcount block would also help performance.
  2. Copy-on-write after fork() would work much better: for read-only algorithms, only the block with refcounts would be copied into the child process.
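The proposed layout can be modeled the same way. This is a minimal sketch, again a toy model rather than CPython code, assuming one 8-byte counter slot per object packed densely into a separate block, 4 KB pages, and 10,000 hypothetical 32-byte objects whose pointer field takes the place of the inline count. The function names are illustrative only.

```python
# Toy model of the proposed layout, not CPython code: refcounts are packed
# into one separate contiguous block, one 8-byte slot per object, and each
# object holds a pointer to its slot instead of the count itself.
PAGE_SIZE = 4096   # assumed page size (typical on x86 Linux)
SLOT_SIZE = 8      # assumed size of one refcount slot (a 64-bit counter)


def block_pages_dirtied(n_objects):
    """Refcount-block pages written when a read-only traversal increments
    and decrements every object's count once. Slot i sits at byte offset
    i * SLOT_SIZE inside the block."""
    return len({(i * SLOT_SIZE) // PAGE_SIZE for i in range(n_objects)})


def object_pages(n_objects, object_size):
    """Pages spanned by the objects themselves. The traversal only reads
    them (to follow the refcount pointer), so they stay shared after fork()."""
    return -(-n_objects * object_size // PAGE_SIZE)   # ceiling division


N = 10_000   # hypothetical workload: 10,000 objects of 32 bytes each
print(block_pages_dirtied(N), "refcount pages dirtied;",
      object_pages(N, 32), "object pages stay shared")
```

With these numbers it reports 20 refcount pages dirtied while the 79 object pages are never written, so a forked child would copy about a quarter as many pages as it would if all 79 object pages were dirtied by inline counts. The model deliberately ignores the cost of the extra pointer load and of managing the slots.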

However, the drawback is that such a design introduces a new level of indirection: a pointer inside each PyObject instead of a direct value. It also seems that the refcount "block" would have to be a non-trivial data structure.
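One way the refcount "block" could become a usable data structure is sketched here, purely as an assumption: an array of counters plus a LIFO free list, so slots released by dead objects are reused and the block stays compact. RefcountBlock, Obj and refslot are hypothetical names, Python stands in for C, and a slot index plays the role of the pointer each PyObject would carry. Growing the block when it fills up is left out.

```python
class RefcountBlock:
    """Hypothetical refcount store: an array of counters (a Python list
    standing in for a contiguous C array) plus a free list of slot indices
    released by dead objects."""

    def __init__(self, capacity):
        self.counts = [0] * capacity                   # the "block"
        self.free = list(range(capacity - 1, -1, -1))  # pop() hands out slot 0 first

    def alloc(self):
        """Reserve a slot for a new object and give it one reference."""
        if not self.free:
            # A real design would chain a second block here.
            raise MemoryError("refcount block is full")
        slot = self.free.pop()
        self.counts[slot] = 1
        return slot

    def incref(self, slot):
        self.counts[slot] += 1

    def decref(self, slot):
        """Drop one reference; return True when the object is now dead."""
        self.counts[slot] -= 1
        if self.counts[slot] == 0:
            self.free.append(slot)   # LIFO: the next alloc() reuses this slot
            return True
        return False


class Obj:
    """Stand-in for a PyObject that stores a slot index, not the count."""

    def __init__(self, block, payload):
        self.block = block             # (block, refslot) plays the pointer's role
        self.refslot = block.alloc()   # every incref/decref goes through this
        self.payload = payload


block = RefcountBlock(capacity=4)
a, b = Obj(block, "a"), Obj(block, "b")
block.incref(a.refslot)          # a second reference to a
dead = block.decref(b.refslot)   # b's only reference goes away
c = Obj(block, "c")              # c reuses b's freed slot
print(a.refslot, b.refslot, c.refslot, dead, block.counts[a.refslot])  # 0 1 1 True 2
```

The LIFO free list is only one possible choice: it hands out the most recently freed slot, which keeps the block dense, but over time live counts for related objects can end up scattered across it.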

I'm not a compiler/profiling expert, so my main question is whether such a design can work, and whether anyone has been thinking about something similar. Also, has CPython been profiled for CPU cache usage?


