Message 102502 - Python tracker

[...]

> _PyObject_Call      403       99,02
> [...]
>
> affinity off:
>
> Functions Causing Most Work
> Name                Samples   %
> [...]
> _PyObject_Call      1.936     99,23
> [...]
> _threadstartex      1.934     99,13

> When we run on both cores, we get four times as many L1 instruction cache hits!

You mean we get 4x the number of cache /misses/, right?

This analysis is gratuitous if you can't evaluate/measure/calculate the actual cost (in proportion of total elapsed or CPU time) of the instruction cache misses. Perhaps it is actually negligible and the slowdown is caused by something else.
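To make that concrete, here is a minimal sketch (purely illustrative, not the benchmark quoted above) of the kind of end-to-end measurement that would put a number on the slowdown: time the same CPU-bound work run sequentially and split across two threads, then repeat the run with the process pinned to one core and compare.

    import threading
    import time

    def spin(n):
        # Tight CPU-bound loop; the interpreter periodically drops and
        # re-acquires the GIL, which is what lets the OS bounce the
        # threads between cores.
        while n:
            n -= 1

    N = 10 * 1000 * 1000

    # Sequential baseline: the same total work on one thread.
    start = time.time()
    spin(N)
    spin(N)
    sequential = time.time() - start

    # The same work split across two threads.
    start = time.time()
    threads = [threading.Thread(target=spin, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    threaded = time.time() - start

    print("sequential: %.2fs  two threads: %.2fs (%.2fx)"
          % (sequential, threaded, threaded / sequential))

If the ratio barely moves when affinity is restricted, the cache misses are not where the time goes.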

> How best to combat this? I'll do some experiments on Windows. Perhaps we can identify cpu-bound threads and group them on a single core.

IMHO, the OS should handle this. I don't think ad-hoc platform-specific CPU affinity tweaks belong in the Python core.
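For reference, the kind of ad-hoc, platform-specific tweak being discussed amounts to something like the sketch below (hypothetical and illustrative only, not a proposal for the core): pin the whole interpreter process to a single core so the GIL-holding thread stops migrating.

    import ctypes
    import os
    import sys

    def pin_process_to_first_core():
        # Restrict the whole interpreter process to CPU 0.
        if sys.platform == "win32":
            kernel32 = ctypes.windll.kernel32
            handle = kernel32.GetCurrentProcess()
            # Mask 0b1 selects the first logical CPU only.
            if not kernel32.SetProcessAffinityMask(handle, 1):
                raise ctypes.WinError()
        elif hasattr(os, "sched_setaffinity"):
            # Linux equivalent (Python 3.3+).
            os.sched_setaffinity(0, {0})
        else:
            raise NotImplementedError("no affinity API on this platform")

Grouping only the cpu-bound threads, rather than the whole process, would additionally require per-thread affinity calls and some heuristic for spotting those threads, which is exactly the kind of platform-specific machinery at issue here.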