[Python-Dev] Python Benchmarks

M.-A. Lemburg mal at egenix.com
Fri Jun 2 15:29:14 CEST 2006


Andrew Dalke wrote:

> M.-A. Lemburg:
>> The approach pybench is using is as follows: ...
>>
>> The calibration step is run multiple times and is used to
>> calculate an average test overhead time.
>
> One of the changes that occurred during the sprint was to change
> this algorithm to use the best time rather than the average.
>
> Using the average assumes a Gaussian distribution, and timing
> results are not Gaussian. There is an absolute best time, but it's
> rarely reached due to background noise. The distribution is more
> like a gamma distribution shifted by the minimum time.
>
> To show the distribution is non-Gaussian, I ran the following:
>
>     import time
>
>     def compute():
>         x = 0
>         for i in range(1000):
>             for j in range(1000):
>                 x += 1
>
>     def bench():
>         t1 = time.time()
>         compute()
>         t2 = time.time()
>         return t2 - t1
>
>     times = []
>     for i in range(1000):
>         times.append(bench())
>     print times
>
> The full distribution is attached as 'plot1.png' and the close-up
> (range 0.45-0.65) as 'plot2.png'. Not a clean gamma function, but
> that's a closer match than an exponential. The gamma distribution
> looks more like an exponential function when the shape parameter
> is large. This corresponds to a large amount of noise in the
> system, so the run time is not close to the best time. This means
> the average approach works better when there is a lot of random
> background activity, which is not the usual case when I try to
> benchmark.
>
> When averaging a gamma distribution you'll end up with a bit of a
> skew, and I think the skew depends on the number of samples,
> reaching a limit point.
>
> Using the minimum time should be more precise because there is a
> definite lower bound and the machine should be stable. In my test
> above, the first few results are
>
>     0.472838878632
>     0.473038911819
>     0.473326921463
>     0.473494052887
>     0.473829984665
>
> I'm pretty certain the best time is 0.4725, or very close to that.
> But the average time is 0.58330151391 because of the long tail.
> Here are the last 6 results in my population of 1000:
>
>     1.76353311539
>     1.79937505722
>     1.82750201225
>     2.01710510254
>     2.44861507416
>     2.90868496895
>
> Granted, I hit a couple of web pages while doing this and my spam
> filter processed my mailbox in the background...
>
> There's probably some Markov modeling which would look at the
> number and distribution of samples so far and, assuming a gamma
> distribution, determine how many more samples are needed to get a
> good estimate of the absolute minimum time. But min(large enough
> samples) should work fine.

Thanks for the great analysis !

Using the minimum looks like the way to go for calibration.
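For illustration, here is a minimal sketch of min-based calibration (the `calibrate` name and `empty_test` stand-in are hypothetical; the 20-run default just echoes the number mentioned below, not pybench's actual code):

    import time

    def empty_test():
        # Hypothetical stand-in for pybench's empty calibration
        # loop; it models the per-test harness overhead.
        pass

    def calibrate(runs=20):
        # Background activity only ever adds time, so the smallest
        # of several observations is the best estimate of the true
        # overhead; averaging would drag in the long tail.
        best = None
        for _ in range(runs):
            t0 = time.time()
            empty_test()
            t1 = time.time()
            if best is None or t1 - t0 < best:
                best = t1 - t0
        return best

The overhead returned by calibrate() would then be subtracted from each test's measured time.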

I wonder whether the same is true for the actual tests; since you're looking for the expected run-time there, the minimum may not necessarily be the right choice. Then again, in both cases you are only looking at a small number of samples (20 for the calibration, 10 for the number of rounds), so this may be irrelevant.

BTW, did you run this test on Windows or a Unix machine ?

There's also an interesting second peak at around 0.53. What could be causing this ?

>> If the whole suite runs in 50 seconds, the per-test run-times
>> are far too small to be accurate. I usually adjust the warp
>> factor so that each round takes 50 seconds.
>
> The stringbench.py I wrote uses the timeit algorithm which
> dynamically adjusts the test to run between 0.2 and 2 seconds.
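As a rough sketch of that timeit-style adjustment (the 0.2s lower bound comes from the text above; the doubling strategy and the `autorange` name are illustrative, not stringbench's actual code):

    import time

    def autorange(stmt, threshold=0.2):
        # Double the iteration count until one measurement takes at
        # least `threshold` seconds, so timer resolution and per-call
        # overhead become a small fraction of the measured interval.
        number = 1
        while True:
            t0 = time.time()
            for _ in range(number):
                stmt()
            elapsed = time.time() - t0
            if elapsed >= threshold:
                return number, elapsed
            number *= 2

The per-iteration cost is then elapsed / number.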

>> That's why the timers being used by pybench will become a
>> parameter that you can then select to adapt pybench to the
>> OS you're running it on.
>
> Wasn't that decision a consequence of the problems found
> during the sprint?

It's a consequence of a discussion I had with Steve Holden and Tim Peters:

I believe that using wall-clock timers for benchmarking is not a good approach due to the high noise level. Process-time timers typically have lower resolution, but they give a better picture of the actual run-time of your code and don't exhibit as much noise as wall-clock timers. Of course, you then have to run the tests somewhat longer to get reasonable timing accuracy.

Tim thinks that it's better to use short running tests and an accurate timer, accepting the added noise and counting on the user making sure that the noise level is at a minimum.

Since I like to give users the option of choosing for themselves, I'm going to make the choice of timer an option.
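To make that concrete, here is a minimal sketch of a selectable timer (using the modern names time.perf_counter for wall-clock and time.process_time for process time; pybench's actual option and timer names may differ):

    import time

    # Wall-clock timers have high resolution but pick up background
    # noise; process-time timers ignore other processes but tick
    # more coarsely, so tests must run longer.
    TIMERS = {
        'wallclock': time.perf_counter,
        'processtime': time.process_time,
    }

    def run_test(test, rounds=10, timer='processtime'):
        clock = TIMERS[timer]
        results = []
        for _ in range(rounds):
            t0 = clock()
            test()
            results.append(clock() - t0)
        return min(results)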

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jun 02 2006)

Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/


2006-07-03: EuroPython 2006, CERN, Switzerland 30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::


