[Python-Dev] Python Benchmarks (original) (raw)

M.-A. Lemburg mal at egenix.com
Wed Jun 7 10:52:18 CEST 2006


Michael Chermside wrote:

Marc-Andre Lemburg writes:

Using the minimum looks like the way to go for calibration.

I wonder whether the same is true for the actual tests; since you're looking for the expected run-time, the minimum may not necessarily be the choice.

No, you're not looking for the expected run-time. The expected run-time is a function of the speed of the CPU, the architecture of same, what else is running simultaneously -- perhaps even what music you choose to listen to that day. It is NOT a constant for a given piece of code, and is NOT what you are looking for.

I was thinking of the expected value of the test's run-time (in the statistical sense). This would likely have better repeatability than e.g. the average (see Andrew's analysis) or the minimum, which can be affected by artifacts due to the method of measurement (see Fredrik's analysis).

The downside is that you need quite a few data points to make a reasonable assumption on the value of the expected value.

Another problem is that of sometimes running into the situation where you have a distribution of values which is in fact the overlap of two (or more) different distributions (see Andrew's graphics).
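To see why the different statistics behave so differently under that kind of noise, here is a small illustrative sketch. The timing values are made up for illustration, not taken from Andrew's data: a tight cluster of "clean" runs plus a few runs inflated by background activity, i.e. two overlapping distributions.

```python
import statistics

# Hypothetical timing samples (seconds): a tight cluster around 1.00s
# plus two outliers caused by context switches / background load.
samples = [1.00, 1.01, 1.00, 1.02, 1.01, 1.60, 1.00, 1.55, 1.01, 1.00]

minimum = min(samples)               # 1.00 - ignores the outliers entirely
mean = statistics.mean(samples)      # 1.12 - pulled upward by the slow runs
median = statistics.median(samples)  # 1.01 - robust, but needs more samples
```

The minimum discards the second (noise) distribution completely, which is why it gives a usable estimate from comparatively few data points.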

In the end, the minimum is the best compromise, IMHO, since it is easy to get a good estimate fast.

pybench stores all measured times in the test pickle, so it is possible to apply different statistical methods later on - even after the test was run.
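The idea of keeping every raw measurement around can be sketched as follows. This is a hypothetical helper, not pybench's actual API, and it uses `time.perf_counter()`, which only became available in Python long after this thread:

```python
import time

def bench(func, rounds=10, loops=1000):
    """Run *func* for several rounds and keep every per-round timing.

    Hypothetical helper (not pybench's actual API): the point is that
    all raw samples are stored, so min/mean/median - or any other
    statistic - can be applied after the fact.
    """
    samples = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(loops):
            func()
        samples.append((time.perf_counter() - start) / loops)
    return samples

samples = bench(lambda: sum(range(100)))
best = min(samples)                 # least noise-contaminated estimate
mean = sum(samples) / len(samples)  # includes scheduler noise
```

Pickling `samples` (as pybench does) means the statistical method can be swapped out later without re-running the benchmark.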

What you really want to do in benchmarking is to compare the performance of two (or more) different pieces of code. You do, of course, care about the real-world performance. So if you had two algorithms and one ran twice as fast when there were no context switches and 10 times slower when there was background activity on the machine, then you'd want to prefer the algorithm that supports context switches. But that's not a realistic situation. What is far more common is that you run one test while listening to the Grateful Dead and another test while listening to Bach, and that (plus other random factors and the phase of the moon) causes one test to run faster than the other.

I wonder which one of the two ;-)

Taking the minimum time clearly subtracts some noise, which is a good thing when comparing performance for two or more pieces of code. It fails to account for the distribution of times, so if one piece of code occasionally gets lucky and takes far less time, then the minimum time won't be a good choice... but it would be tricky to design code that would be affected by the scheduler in this fashion even if you were explicitly trying!

Tried that, and even though you can trick the scheduler into running your code without a context switch, the time left to do benchmarks boils down to a millisecond -- there's not a lot you can test in such a time interval.

What's worse: the available timers don't have good enough resolution to make the timings useful.
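One rough way to gauge whether a timer's resolution is adequate is to take the smallest non-zero difference between consecutive readings. A sketch (`timer_resolution` is a hypothetical helper, and `time.perf_counter()` postdates this thread):

```python
import time

def timer_resolution(timer, probes=1000):
    """Estimate the smallest observable tick of *timer*: the minimum
    non-zero difference between two consecutive readings."""
    best = float("inf")
    for _ in range(probes):
        t0 = timer()
        t1 = timer()
        while t1 == t0:       # spin until the timer actually advances
            t1 = timer()
        best = min(best, t1 - t0)
    return best

# A jiffie-based clock typically ticks only every 1-10 ms; a
# hardware-backed counter like time.perf_counter() resolves far finer.
print(timer_resolution(time.perf_counter))
```

If the resolution is on the order of the benchmark's total run-time, the measurements are indeed useless, which is the problem described above.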

Later he continues:

Tim thinks that it's better to use short running tests and an accurate timer, accepting the added noise and counting on the user making sure that the noise level is at a minimum.

Since I like to give users the option of choosing for themselves, I'm going to make the choice of timer an option.

I'm generally a fan of giving programmers choices. However, this is an area where we have demonstrated that even very competent programmers often have misunderstandings (read this thread for evidence!). So be very careful about giving such a choice: the default behavior should be chosen by people who think carefully about such things, and the documentation on the option should give a good explanation of the tradeoffs, or at least a link to such an explanation.

I'll use good defaults (see yesterday's posting), which essentially means: use Tim's approach... until we all have OSes with real-time APIs using hardware timers instead of jiffie counters.

-- Marc-Andre Lemburg eGenix.com

Professional Python Services directly from the Source (#1, Jun 07 2006)

Python/Zope Consulting and Support ... http://www.egenix.com/
mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/




