[Python-Dev] PyDict_SetItem hook

Thomas Wouters thomas at python.org
Fri Apr 3 18:06:17 CEST 2009

On Fri, Apr 3, 2009 at 11:27, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Thomas Wouters <thomas python.org> writes:
> > Pystone is pretty much a useless benchmark. If it measures anything, it's the speed of the bytecode dispatcher (and it doesn't measure it particularly well.) PyBench isn't any better, in my experience.
>
> I don't think pybench is useless. It gives a lot of performance data about crucial internal operations of the interpreter. It is of course very little real-world, but conversely makes you know immediately where a performance regression has happened. (by contrast, if you witness a regression in a high-level benchmark, you still have a lot of investigation to do to find out where exactly something bad happened)

Really? Have you tried it? I get at least 5% noise between runs without any changes. I have gotten results that include negative run times. And yes, I tried all the different settings for calibration runs and timing mechanisms. The tests in PyBench are not micro-benchmarks (they do way too much for that) and they don't try to minimize overhead or noise, but they are also not representative of real-world code. That doesn't just mean "you can't infer the affected operation from the test name", but "you can't infer anything." You can just be looking at differently borrowed runtime. I have in the past written patches to Python that improved every micro-benchmark and every real-world measurement I made, except PyBench. Trying to pinpoint the slowdown invariably led to tests that did too much in the measurement loop, introduced too much noise in the "calibration" run, or just spent their time in the measurement loop on doing setup and teardown of the test. Collin and Jeffrey have seen the exact same thing since starting work on Unladen Swallow.

So, sure, it might be "useful" if you have 10% or more difference across the board, and if you don't have access to anything but pybench and pystone.
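
To make the noise and negative-run-time points concrete, here is a minimal sketch. It is not PyBench's code. The helper names (relative_spread, calibrated_time, measure) are hypothetical, and it relies only on the standard timeit module. It assumes a harness that times a test loop, times an "empty" calibration loop separately, and subtracts one from the other.

```python
import timeit


def relative_spread(timings):
    """Return (slowest - fastest) / fastest for a list of positive timings."""
    fastest = min(timings)
    return (max(timings) - fastest) / fastest


def calibrated_time(test_seconds, calibration_seconds):
    """Subtract a separately measured overhead timing from a test timing.

    Both inputs carry their own noise, so the difference can come out
    negative whenever the noise is larger than the work being measured.
    """
    return test_seconds - calibration_seconds


def measure(stmt, repeat=5, number=100_000):
    """Time `stmt` `repeat` times, running it `number` times per measurement."""
    return timeit.repeat(stmt, repeat=repeat, number=number)


if __name__ == "__main__":
    # Hypothetical workload and overhead statements, chosen only for illustration.
    test_runs = measure("x = 1 + 1")
    calibration_runs = measure("pass")

    print("run-to-run spread: {:.1%}".format(relative_spread(test_runs)))
    for test, calibration in zip(test_runs, calibration_runs):
        print("calibrated: {:+.6f} s".format(calibrated_time(test, calibration)))
```

When the measured statement does very little, the calibrated figures tend to sit close to zero and may change sign from run to run, which is one way a harness built like this can report a negative run time.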

> Perhaps someone should start maintaining a suite of benchmarks, high-level and low-level; we currently have them all scattered around (pybench, pystone, stringbench, richard, iobench, and the various Unladen Swallow benchmarks; not to mention other third-party stuff that can be found in e.g. the Computer Language Shootout).

That's exactly what Collin proposed at the summits last week. Have you seen http://code.google.com/p/unladen-swallow/wiki/Benchmarks ? Please feel free to suggest more benchmarks to add :)

-- Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
