[Python-Dev] PyDict_SetItem hook
Collin Winter collinw at gmail.com
Fri Apr 3 19:18:04 CEST 2009
On Fri, Apr 3, 2009 at 9:43 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Thomas Wouters <thomas at python.org> writes:
>> Really? Have you tried it? I get at least 5% noise between runs without
>> any changes. I have gotten results that include negative run times.
>
> That's an implementation problem, not an issue with the tests themselves.
> Perhaps a better timing mechanism could be inspired from the timeit
> module. Perhaps the default number of iterations should be higher (many
> subtests run in less than 100ms on a modern CPU, which might be too low
> for accurate measurement). Perhaps the so-called "calibration" should
> just be disabled. Etc.
>
>> The tests in PyBench are not micro-benchmarks (they do way too much for
>> that),
>
> Then I wonder what you call a micro-benchmark. Should it involve direct
> calls to low-level C API functions?
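For concreteness, here is a rough sketch of the kind of timeit-based harness
being suggested. The subtest below is a hypothetical stand-in, not actual
PyBench code; the point is that timeit.repeat() plus min() already deals with
most of the scheduler noise that produces jittery or negative timings:

import timeit

def bench_try_raise_except():
    # Tiny workload standing in for a single PyBench-style subtest.
    for _ in range(1000):
        try:
            raise ValueError
        except ValueError:
            pass

# repeat() runs the whole subtest several times; reporting the minimum of
# the repeats discards scheduler noise instead of averaging it in.
timings = timeit.repeat(bench_try_raise_except, repeat=5, number=100)
print("best of 5: %.4f seconds per 100 calls" % min(timings))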
I agree that a suite of microbenchmarks is supremely useful: I would very much like to be able to isolate, say, raise statement performance. PyBench in its current incarnation suffers from implementation defects that make it unsuitable for this, though:
- It does not effectively isolate component performance as it claims. When I was working on a change to BINARY_MODULO to make string formatting faster, PyBench would report that floating point math got slower, or that generator yields got slower. There is a lot of random noise in the results.
- We have observed overall performance swings of 10-15% between runs on the same machine, using the same Python binary. Using the same binary on the same unloaded machine should produce a difference as close to 0% as possible.
- I wish PyBench actually did more isolation. Call.py:ComplexPythonFunctionCalls is on my mind right now; I wish it didn't put keyword arguments and **kwargs in the same microbenchmark (a sketch of the kind of split I mean follows this list).
- In experimenting with gcc 4.4's FDO support, I produced a training load that resulted in a 15-30% performance improvement (depending on benchmark) across all benchmarks. Using this trained binary, PyBench slowed down by 10%.
- I would like to see PyBench incorporate better statistics for indicating the significance of the observed performance difference (also sketched below, after this list).
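To make the isolation point concrete, here is a sketch of how
ComplexPythonFunctionCalls could be split; the function names are purely
illustrative, not actual PyBench code:

def f(a, b=1, c=2):
    return a + b + c

def bench_keyword_args(rounds=100000):
    # Explicit keyword arguments only.
    for _ in range(rounds):
        f(1, b=2, c=3)

def bench_star_kwargs(rounds=100000):
    # **kwargs expansion only, so a regression here is visible on its own.
    kwargs = {"b": 2, "c": 3}
    for _ in range(rounds):
        f(1, **kwargs)

if __name__ == "__main__":
    import time
    for bench in (bench_keyword_args, bench_star_kwargs):
        start = time.time()
        bench()
        print("%s: %.3f s" % (bench.__name__, time.time() - start))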
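And here is roughly the kind of significance reporting I have in mind,
assuming we keep per-run timings for a baseline and a patched binary. The
numbers are made up, and a real tool would use a proper t-test rather than
this crude noise threshold:

import math

def mean(xs):
    return sum(xs) / len(xs)

def stddev(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def compare(base, changed):
    # Percentage change of the means, against a pooled noise estimate.
    diff = (mean(changed) - mean(base)) / mean(base) * 100.0
    noise = (stddev(base) + stddev(changed)) / mean(base) * 100.0
    verdict = "significant" if abs(diff) > 2 * noise else "within noise"
    return "%+.1f%% (noise ~%.1f%%): %s" % (diff, noise, verdict)

base = [1.02, 1.05, 0.99, 1.03, 1.01]     # seconds, baseline runs
changed = [0.97, 0.96, 0.99, 0.95, 0.98]  # seconds, patched runs
print(compare(base, changed))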
I don't believe that these are insurmountable problems, though. A great contribution to Python performance work would be an improved version of PyBench that corrects these problems and offers more precise measurements. Is that something you might be interested in contributing to? As performance moves more into the wider consciousness, having good tools will become increasingly important.
Thanks,
Collin