[Python-Dev] CPython optimization: storing reference counters outside of objects

Artur Siekielski artur.siekielski at gmail.com
Tue May 24 00:07:27 CEST 2011


2011/5/23 Guido van Rossum <guido at python.org>:

>> Anyway, I'd like to have working copy-on-write in CPython - in the presence of the GIL I find it important to have multiprocess programs optimized (and I think it's a common idiom that a parent process prepares some big data structure, and child "worker" processes do some read-only querying).
>
> That is the question though -- is the idiom commonly used?

In fact I came to the whole idea of this optimization because the idiom didn't work for me. I had a big word index built by a parent process, and then wanted the child processes to serve queries against this index (I wanted to use all cores on a server). The index consumed 50% of RAM, and after a few minutes the children had consumed all the RAM.
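
(A minimal sketch of the setup I mean, assuming Linux fork() semantics; the index contents and worker count here are only illustrative:)

    import os

    # Parent builds a big read-only structure (a toy "word index" here).
    index = {"word%d" % i: list(range(10)) for i in range(100000)}

    pids = []
    for _ in range(4):                # one worker per core
        pid = os.fork()
        if pid == 0:                  # child: read-only queries
            hits = sum(len(postings) for postings in index.values())
            # Merely reading the dict updates CPython reference counters,
            # which writes to the inherited pages and forces them to be copied.
            os._exit(0)
        pids.append(pid)

    for pid in pids:
        os.waitpid(pid, 0)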

I find it common in languages like Java to use thread pools; in Python on Linux we use process pools if we want to use all cores, and in that setting having working copy-on-write is really valuable.
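
(Roughly like this with a process pool - a sketch assuming the default fork start method on Linux; the names are just for illustration:)

    from multiprocessing import Pool

    index = {"python": [1, 5, 9], "gil": [2, 3]}     # built once in the parent

    def query(word):
        return word, index.get(word, [])             # children only read the inherited copy

    if __name__ == "__main__":
        pool = Pool(processes=4)
        print(pool.map(query, ["python", "gil", "cpython"]))
        pool.close()
        pool.join()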

Oh, and using explicit shared memory or mmap is much harder, because you have to map the whole object graph into bytes.
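
(For comparison, a rough sketch of the explicit-shared-memory route using an anonymous mmap; sizes and names are assumptions, and note that the child still ends up rebuilding the objects on its side:)

    import mmap, os, pickle

    index = {"python": [1, 5, 9], "gil": [2, 3]}

    data = pickle.dumps(index)          # flatten the whole object graph into bytes
    shm = mmap.mmap(-1, len(data))      # anonymous mapping, shared across fork()
    shm.write(data)

    if os.fork() == 0:                  # child
        local_index = pickle.loads(shm[:])   # ...and rebuild private objects anyway
        os._exit(0)
    os.wait()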

> It doesn't seem to me that it would scale all that far, since it only works as long as all forked copies live on the same machine and run on the same symmetrical multi-core processor.

? I don't know about multi-processor systems, but on single-processor, multi-core systems (which are common even on servers) running Linux it works.

Artur


