[Python-Dev] PyDict_SetItem hook
Antoine Pitrou solipsis at pitrou.net
Fri Apr 3 18:43:58 CEST 2009
Thomas Wouters <thomas at python.org> writes:
> Really? Have you tried it? I get at least 5% noise between runs without any changes. I have gotten results that include negative run times.
That's an implementation problem, not an issue with the tests themselves. Perhaps a better timing mechanism could be borrowed from the timeit module. Perhaps the default number of iterations should be higher (many subtests run in less than 100 ms on a modern CPU, which may be too short for an accurate measurement). Perhaps the so-called "calibration" should simply be disabled. And so on.
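For instance, here is a minimal sketch (not pybench's actual code, and the statement is just a placeholder) of the timeit-style approach I have in mind: run the statement many times per trial, repeat several trials, and keep the best time, so that scheduler noise and other interruptions are filtered out:

    import timeit

    # Hypothetical subtest body; any short statement would do.
    stmt = "[i * 2 for i in range(1000)]"

    # Five trials of 10000 executions each; the minimum is the run that was
    # least disturbed by the rest of the system.
    trials = timeit.repeat(stmt, repeat=5, number=10000)
    best = min(trials)
    print("best of 5: %.1f ms per 10000 runs" % (best * 1000.0))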
> The tests in PyBench are not micro-benchmarks (they do way too much for that),
Then I wonder what you call a micro-benchmark. Should it involve direct calls to low-level C API functions?
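For what it's worth, here is what I would call a Python-level micro-benchmark of a single operation (the dict store this thread is about); whether a "real" micro-benchmark should instead call PyDict_SetItem directly from C is precisely the question. The statement and numbers below are illustrative only:

    import timeit

    # Time a single dict item assignment, isolated from everything else.
    t = timeit.Timer("d[12345] = None", setup="d = {}")
    per_call = min(t.repeat(repeat=5, number=1000000)) / 1000000.0
    print("dict item assignment: %.1f ns per store" % (per_call * 1e9))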
> but they are also not representative of real-world code.
Representativeness is not black or white. Is measuring Spitfire performance representative of the Genshi templating engine, or of str.format-based templating? Regardless of the answer, it is still an interesting measurement.
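(By "str.format-based templating" I mean nothing more elaborate than the builtin method, e.g. the toy renderer below, with made-up names; whether a Spitfire benchmark tells you anything about this kind of code is exactly the point.)

    # Toy "template engine": just str.format on a constant template string.
    TEMPLATE = "<p>Hello, {user}! You have {count} unread message(s).</p>"

    def render(user, count):
        return TEMPLATE.format(user=user, count=count)

    print(render("antoine", 3))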
> That doesn't just mean "you can't infer the affected operation from the test name"
I'm not sure what you mean by that. If you introduce an optimization that makes list comprehensions faster, it will certainly show up in the list comprehensions subtest, and probably in none of the other tests. Isn't that enough in terms of specificity?
Of course, some optimizations are interpreter-wide, and then the breakdown into individual subtests is less relevant.
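As an illustration (a sketch only, not an actual pybench subtest), a narrowly scoped subtest of that kind could look like this; it exercises list comprehensions and very little else, so an improvement to the comprehension machinery shows up here and essentially nowhere else:

    import timeit

    setup = "data = list(range(1000))"
    stmt = "[x * x for x in data if x % 3]"

    # Best of five trials, 10000 comprehensions per trial.
    best = min(timeit.repeat(stmt, setup=setup, repeat=5, number=10000))
    print("list comprehension subtest: %.3f s" % best)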
> I have in the past written patches to Python that improved every micro-benchmark and every real-world measurement I made, except PyBench.
Well, I didn't claim that pybench measures /everything/. That's why we have other benchmarks as well (stringbench, iobench, whatever). It does test a bunch of very common operations which are important in daily use of Python. If some important operation is missing, it's possible to add a new test.
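From memory, adding such a test is roughly a matter of subclassing pybench's Test class along the following lines; the exact attribute names and loop conventions are those I recall from the pybench sources and may be off in the details (this targets the 2.x codebase, hence xrange):

    from pybench import Test

    class DictSetItem(Test):
        version = 2.0
        operations = 5          # five dict stores per round
        rounds = 100000

        def test(self):
            for i in xrange(self.rounds):
                d = {}
                d[0] = 0
                d[1] = 1
                d[2] = 2
                d[3] = 3
                d[4] = 4

        def calibrate(self):
            # Same loop and setup overhead, minus the stores being measured.
            for i in xrange(self.rounds):
                d = {}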
Conversely, someone who optimizes, say, list comprehensions and measures the impact with a set of so-called "real-world benchmarks" whose critical paths contain no list comprehensions will see no improvement in those benchmarks. Does that mean the optimization is useless? Certainly not. The world is not black and white.
> That's exactly what Collin proposed at the summits last week. Have you seen http://code.google.com/p/unladen-swallow/wiki/Benchmarks ?
Yes, I've seen it. I haven't tried it yet; I hope it can be run without installing the whole unladen-swallow suite?
These are the benchmarks I tend to use, depending on the issue at hand: pybench, richards, stringbench, iobench, and binary-trees (from the Computer Language Shootout). Plus various custom timeit runs :-)
Cheers
Antoine.