[Python-Dev] Python Benchmarks

Tim Peters tim.peters at gmail.com
Sat Jun 3 01:44:07 CEST 2006


[MAL]

Using the minimum looks like the way to go for calibration.

[Terry Reedy]

Or possibly the median.

[Andrew Dalke]

Why? I can't think of why that's more useful than the minimum time.

A lot of things get mixed up here ;-) The mean is actually useful if you're using a poor-resolution timer with a fast test. For example, suppose a test takes 1/10th the time of the span between counter ticks. Then, "on average", in 9 runs out of 10 the reported elapsed time is 0 ticks, and in 1 run out of 10 the reported time is 1 tick. 0 and 1 are both wrong, but the mean (1/10) is correct.
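As a rough illustration (not from the original mail), here is a small simulation of that situation: a test that truly costs 0.1 tick, timed with a clock that only reports whole ticks. The numbers and names are made up for the sketch; the point is just that the minimum and median report 0 while the mean recovers the true cost.

    import random
    import statistics

    TRUE_COST = 0.1   # true duration of the test, in ticks (hypothetical)

    def reported_elapsed():
        # The test starts at a random phase within a tick, so the coarse
        # timer reports either 0 or 1 elapsed ticks.
        start_phase = random.random()
        return int(start_phase + TRUE_COST)

    samples = [reported_elapsed() for _ in range(100000)]
    print("min:   ", min(samples))                # 0 -- wrong
    print("median:", statistics.median(samples))  # 0 -- wrong
    print("mean:  ", statistics.mean(samples))    # ~0.1 -- matches the true cost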

So there can be sense to that. Then people vaguely recall that the median is more robust than the mean, and all sense goes out the window ;-)

My answer is to use the timer with the best resolution the machine has. Using the mean is a way to worm around timer quantization artifacts, but it's easier and clearer to use a timer with resolution so fine that quantization doesn't make a lick of real difference. Forcing a test to run for a long time is another way to make timer quantization irrelevant, but then you're also vastly increasing chances for other processes to disturb what you're testing.
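A minimal sketch of what that advice looks like in practice today (the statement being timed is just a placeholder): the stdlib timeit module picks the highest-resolution timer available on the platform, and taking the minimum over several repeats discards runs that were disturbed by other processes.

    import timeit

    # Time a short statement many times per run, repeat the run several
    # times, and keep the best (minimum) total.
    times = timeit.repeat(stmt="sum(range(100))", repeat=5, number=100000)
    per_call = min(times) / 100000
    print("best per-call time: %.3g seconds" % per_call)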

I liked benchmarking on Crays in the good old days. No time-sharing, no virtual memory, and the OS believed to its core that its primary purpose was to set the base address once at the start of a job so the Fortran code could scream. Test times were reproducible to the nanosecond with no effort. Running on a modern box for a few microseconds at a time is a way to approximate that, provided you measure the minimum time with a high-resolution timer :-)
