Programming language benchmarks (original) (raw)
Programming language benchmarks
This page contains the same program, implemented in the same way, in C, Ada, FORTRAN, Lisp, FORTH, Java, Perl, R and Ruby and run on a 300MHz Pentium under the "woody" Debian GNU/Linux, using exclusively tools that come with the Debian distribution, except for compiling Java I used the jikes compiler from IBM.
This is relevant for applications which involve intensive computations for multiple users, such as simulators with open web access (like warfarissimoand autovaca). For such applications, performance translates directly into cost: the number of machines you need to buy and maintain in order to serve a given number of users or to achieve a required response time is directly proportional to computation time.
The program we benchmarked computes the same 100-term polynomial 500,000 times, using exactly the same algorithm. In all programs the indices of the polynomial are kept in a local float vector. In this, the program only tests the quality of code which accesses local vectors and performs simple arithmetics in loops, and is free from differences in the standard library, operating system calls and, indeed, the presence of any advanced language features.
There are two versions of the program: (1) all computations are done inside a single function body; (2) each polynomial is computed through a call to a function; the vector of the 100 coefficients are kept in a statically dimensioned array inside this function; it is called 500000 times.
The execution speed of the first version was also explored throughbenchmarks on AMD 64x2.
Results are below (in seconds needed to perform the simple task outlined above). Click on the link with a language name to see the details (program and how to compile and run) for each language:
Language | single body (s) | with call (s) |
---|---|---|
FORTRAN, g77 V2.95.4 | 2.73 | 2.73 |
Ada 95, gnat V3.13p | 2.73 | 2.74 |
C, hand optimized **, gcc V2.95.4 | 2.73 | |
Java, gcj V3.0 | 3.03 | 15.53 |
D, gcc V4.0.3+ | 1)3.43 | 1)3.98 |
C, gcc V2.95.4 | 3.61 | 3.57 |
R translated to lisp using R2cl v0.1and compiled with cmucl | 3.69 | |
Lisp, CMU Common Lisp V3.0.8 18c+, build 3030 | 4.69 | 10.69 |
Java, jikes V1.15 (bytecompiled) | 8.23 | 13.54 |
FORTH, hand optimized ** Gforth 0.6.1 | 1)18.21 | |
FORTH,** Gforth 0.6.1 | 1)27.26 | |
Python** +psyco (interpreted) | 1)168.50 | |
Perl, more optimized$ V5.6.1 (natively compiled) | 209.20 | |
Perl, more optimized$ V5.6.1 (interpreted) | 258.64 | |
Perl, hand optimized*** V5.6.1 (bytecompiled) | 306.18 | |
Perl* V5.6.1 (natively compiled) | 367.23 | |
Python** V2.1.2 (interpreted) | 505.50 | |
Perl* V5.6.1 (bytecompiled) | 515.04 | |
RUBY*** (interpreted) | 1074.52 | |
R V1.5.1 (interpreted) | 5662.64 |
* Contributed by Matei Conovici
** Contributed by Mihai Manolescu
*** Contributed by zgrim
$ Contributed by Radu Greab
- Contributed by Mihai Militaru
1)estimated
- naive (no hand-optimization) gcc-compiled C is some 30% slower than fortran and ada, otherwise none is affected by a detectable overhead when allocating a local vector in a function;
- the gcj compiler does a good work, increasing the speed of code more than twice; however, allocation of local variables (the `pol' array for example) offsets all the gain obtained through native code compilation;
- the performance of the CMU-CL lisp compiler was stellar; it was better than Java, even compiled natively, despite its highly dynamic nature; of 10.69 seconds, some 6.5 are for garbage collection.
- bytecompiled Perl code is comparable in speed with interpreted R code and is not much slower than Perl code compiled natively; notice the fact that in R we use vectorial operators which also make the text very compact.
Discussion. The relevance of these tests rests in the fact that in any language, performance is governed by how quickly operations, loop, access to array elements, function calls, etc, are performed. Machine-oriented languages (C, FORTRAN, Ada and to some extent Java) perform on average 100 times faster than human-expression-oriented languages (Perl, R, Python, Ruby).
Programming a 1 GHz Pentium [fastest, 1000$, 2002 PC] in Perl is like programming a 10 MHz something [overclocked, 50$, 1982 Z80-based ZX Spectrum] in FORTRAN!
This is no big surprise as we are looking at the classical trade-off between the power of a language (allowing the programmer to express something in a compact way) and the performace of an implementation (which classically was related to the language being close to the machine representation).
However, the huge exception is CommonLisp. Lisp is the most powerful language that is, representing the classical extreme choice for expressive power instead of efficient implementation.
This choice seems to have paid off. Given enough expression power it was possible to write a compiler that competes with any machine-oriented language.
Conclusions (for me):
- For new projects in which one is free to choose the language, one should choose Lisp.
An example is the offline generation of web views (such as this) of complex, heterogenous databases where there are two requirements: quick implementation of new ideas and a preference for updating the static web site in 10 minutes rather than 15 hours. - If efficiency is very critical and only if the application must be multithreaded, one should choose Ada. Multithreaded, online simulation servers, like the ones mentioned above, are typical examples.
- Other languages, with which one might be highly familiar due to historical reasons, could become efficient languages by being translated into CommonLisp and compiled with CMU-CL.
Appendix. Details of each program and how to compile and run it:
FORTRAN
tespol.f
program tespol
dimension pol(100)
real pol
integer i,j,n
real su,pu,mu
real x
n = 500000
x = 0.2
mu = 10.0
pu = 0.0
do i = 1,n
do j=1,100
mu = (mu + 2.0) / 2.0
pol(j) = mu
enddo
su = 0.0
do j=1,100
su = x * su + pol(j)
enddo
pu = pu + su
enddo
write (*,*) pu
end
Compile and run with:
f77 tespol.f -O6 -o tespol time ./tespol
tespol2.f
function dopoly(x)
real x
real su,mu
integer j
dimension pol(100)
real pol
do j=1,100
mu = (mu + 2.0) / 2.0
pol(j) = mu
enddo
su = 0.0
do j=1,100
su = x * su + pol(j)
enddo
dopoly = su
end
program tespol
integer i
real pu
real x
n = 500000
x = 0.2
mu = 10.0
pu = 0.0
do i = 1,n
pu = pu + dopoly(x)
enddo
write (*,*) pu
end
Compile and run with:
f77 tespol2.f -O6 -o tespol2 time ./tespol2
Ada-95
tpol.adb
with Ada.Command_Line; use Ada.Command_Line; with Ada.Text_Io; use Ada.Text_Io;
procedure Tpol is
Pol: array(1..100) of Float; N: Integer:= Integer'Value(Argument(1)); X: Float:= Float'Value(Argument(2)); S: Float; Mu: Float:= 10.0; Pu: Float:= 0.0;
begin for I in 1..N loop for J in 1..100 loop Mu := (Mu + 2.0) / 2.0; Pol(J) := Mu; end loop; S := 0.0; for J in 1..100 loop S := X * S + Pol(J); end loop; Pu := Pu+S; end loop; Put_Line(Float'Image(Pu)); end Tpol;
Compile and run with:
gnatmake -O6 tpol time ./tpol 500000 0.2
tpol2.adb
with Ada.Command_Line; use Ada.Command_Line; with Ada.Text_Io; use Ada.Text_Io;
procedure Tpol2 is
N: Integer:= Integer'Value(Argument(1)); X: Float:= Float'Value(Argument(2)); Pu: Float:= 0.0;
function Dopol(X: Float) return Float is
Pol: array(1..100) of Float;
S: Float;
Mu: Float:= 10.0;
begin for J in 1..100 loop Mu := (Mu + 2.0) / 2.0; Pol(J) := Mu; end loop; S := 0.0; for J in 1..100 loop S := X * S + Pol(J); end loop; return S; end Dopol;
begin for I in 1..N loop Pu := Pu+Dopol(X); end loop; Put_Line(Float'Image(Pu)); end Tpol2;
Compile and run with:
gnatmake -O6 tpol2 time ./tpol2 500000 0.2
Java
public class tpoly {
static public void main(String argv[]) {
float mu = (float)10.0;
float x,s;
float pu = (float)0.0;
int su, i, j, n;
float pol[] = new float[100];
n = 500000;
x = (float)0.2;
for(i=0; i<n; i++) {
for (j=0; j<100; j++) {
mu = (mu + (float)2.0) / (float)2.0;
pol[j] = mu;
}
s = (float)0.0;
for (j=0; j<100; j++) {
s = x*s + pol[j];
}
pu += s;
}
System.out.println(pu);
} }
Compile and run with:
gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly tpoly.java time ./tpoly
or compile to bytecode and run with:
jikes -O tpoly.java time java tpoly
public class tpoly2 {
static float dopoly(float x) {
float pol[] = new float[100];
int j;
float mu = (float)10.0;
float s;
for (j=0; j<100; j++) {
mu = (mu + (float)2.0) / (float)2.0;
pol[j] = mu;
}
s = (float)0.0;
for (j=0; j<100; j++) {
s = x*s + pol[j];
}
return s;
}
static public void main(String argv[]) {
float x;
float pu = (float)0.0;
int i, n;
n = 500000;
x = (float)0.2;
for(i=0; i<n; i++) {
pu += dopoly(x);
}
System.out.println(pu);
} }
Compile and run with:
gcj-3.0 --main=tpoly --classpath=/usr/share/java/libgcj.jar -O2 -o tpoly2 tpoly2.java time ./tpoly2
or compile to bytecode and run with:
jikes -O tpoly2.java time java tpoly2
C
tepol.c
#include <stdio.h> #include <stdlib.h>
main(short argc, char **argv) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100];
n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } printf("%f\n",pu); }
Compile and run with:
gcc -O6 tepol.c -o tepol time ./tepol 500000 0.2
tepol2.c
#include <stdio.h> #include <stdlib.h>
float dopoly(float x) {
float mu = 10.0; float s; int j; float pol[100];
for (j=0; j<100; j++) { pol[j] = mu = (mu + 2.0) / 2.0; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } return s; }
main(short argc, char **argv) { float x; float pu = 0.0; int i, n;
n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { pu += dopoly(x); } printf("%f\n",pu); }
Compile and run with:
gcc -O6 tepol2.c -o tepol2 time ./tepol2 500000 0.2
Hand optimised C
bench1.c (by Mihai Manolescu)
#include <stdio.h> #include <stdlib.h>
main(short argc, char **argv) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100]; register tm1; register int tp1;
n = atol(argv[1]); x = atof(argv[2]); for(i=0; i<n; i++) { tp1=2; tm1=1/2.0; for (j=0; j<100; j++) { pol[j] = mu = (mu + tp1) * tm1; } s = 0.0; for (j=0; j<100; j++) { s = x*s + pol[j]; } pu += s; } printf("%f\n",pu); }
Common-Lisp
testpol.lisp
(defun eval-pol (n x) (declare (fixnum n) (single-float x)) (let ((su 0.0) (mu 10.0) (pu 0.0) (pol (make-array 100 :element-type 'single-float))) (declare (single-float su) (single-float mu) (single-float pu)) (dotimes (i n) (declare (fixnum i)) (setf su 0.0) (dotimes (j 100) (declare (fixnum j)) (setf mu (the single-float (/ (+ mu 2.0) 2.0))) (setf (aref pol j) (the single-float mu))) (dotimes (j 100) (declare (fixnum j)) (setf su (the single-float (+ (aref pol j) (the single-float (* su x)))))) (setf pu (the single-float (+ pu su))) ) (prin1 pu) ))
Start CMU Common lisp:
lisp
... then load, compile and run with:
(load "testpol.lisp") (compile #'eval-pol) (time (eval-pol 500000 0.2))
testpol2.lisp
(proclaim '(optimize))
(declaim (start-block dopoly eval-pol))
(defun dopoly (x) (declare (ftype (function (single-float) single-float) dopoly)) (declare (single-float x)) (let ((su 0.0) (mu 10.0) (pol (make-array 100 :element-type 'single-float))) (declare (single-float su) (single-float mu)) (dotimes (j 100) (declare (fixnum j)) (setf mu (the single-float (/ (the single-float (+ mu 2.0)) 2.0))) (setf (aref pol j) (the single-float mu))) (dotimes (j 100) (declare (fixnum j)) (setf su (the single-float (+ (aref pol j) (the single-float (* su x)))))) su) )
(defun eval-pol (n x) (declare (fixnum n) (single-float x)) (let ((pu 0.0)) (declare (single-float pu)) (dotimes (i n) (declare (fixnum i)) (setf pu (the single-float (+ pu (the single-float (dopoly x))))) ) (prin1 pu) ))
(declaim (end-block))
Start CMU Common lisp:
lisp
... then load, compile and run with:
(load "testpol2.lisp") (compile #'eval-pol) (time (eval-pol 500000 0.2))
D
tepol.d (by Mihai Militaru)
//tepol.d
import std.stdio; import std.conv;
int main(char[][] args) { float mu = 10.0; float x,s; float pu = 0.0; int su, i, j, n; float pol[100];
n = toLong(args[1]);
x = toFloat(args[2]);
for(i=0; i<n; i++)
{
for (j=0; j<100; j++)
{
pol[j] = mu = (mu + 2.0) / 2.0;
}
s = 0.0;
for (j=0; j%lt;100; j++)
{
s = x*s + pol[j];
}
pu += s;
}
writefln("%f\n",pu);
return 0;
}
Compile with:
gdmd -O -release -inline tepol.d
Run with:
time ./tepol 500000 0.2
tepol2.d (by Mihai Militaru)
//tepol2.d
import std.stdio; import std.conv;
float dopoly(float x) { float mu = 10.0; float s; int j; float pol[100];
for (j=0; j<100; j++)
{
pol[j] = mu = (mu + 2.0) / 2.0;
}
s = 0.0;
for (j=0; j<100; j++)
{
s = x*s + pol[j];
}
return s;
}
int main(char[][] args) { float x; float pu = 0.0; int i, n;
n = toLong(args[1]);
x = toFloat(args[2]);
for(i=0; i<n; i++)
{
pu += dopoly(x);
}
printf("%f\n",pu);
return 0;
}
Compile with:
gdmd -O -release -inline tepol2.d
Run with:
time ./tepol2 500000 0.2
Python
tpytpol.py (by Mihai Manolescu)
n = 500000 x = 0.2
def t(x): mu = 10.0 pu = 0.0 pol =[0] * 100 r = range(0,100)
for i in range(0,n):
for j in r:
pol[j] = mu = (mu + 2.0) / 2.0
su = 0.0
for j in r:
su = x * su + pol[j]
pu = pu + su
print pu
t(x)
Run with:
time python tpytpol.py
Perl
tperlpol.pl (by Matei Conovici)
#! /usr/bin/perl
n=n = n=ARGV[0]; x=x = x=ARGV[1];
$mu = 10; $pu = 0;
@pol = ();
for ($i = 0; i<i < i<n; $i++) { for ($j = 0; j<100;j < 100; j<100;j++) { mu=(mu = (mu=(mu + 2) / 2;
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi><mi>o</mi><mi>l</mi><mo stretchy="false">[</mo></mrow><annotation encoding="application/x-tex">pol[</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal">p</span><span class="mord mathnormal">o</span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mopen">[</span></span></span></span>j] = $mu;
}
$s = 0;
for ($j = 0; <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>j</mi><mo><</mo><mn>100</mn><mo separator="true">;</mo></mrow><annotation encoding="application/x-tex">j < 100; </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.854em;vertical-align:-0.1944em;"></span><span class="mord mathnormal" style="margin-right:0.05724em;">j</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel"><</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:0.8389em;vertical-align:-0.1944em;"></span><span class="mord">100</span><span class="mpunct">;</span></span></span></span>j++) {
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo>=</mo></mrow><annotation encoding="application/x-tex">s = </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">s</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span></span></span></span>x * <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo>+</mo></mrow><annotation encoding="application/x-tex">s + </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">s</span><span class="mord">+</span></span></span></span>pol[$j];
}
pu+=pu += pu+=s; }
print "$pu\n";
Compile and run with:
perlcc -O -o tperlpol tperlpol.pl time ./tperlpol 500000 0.2
or, for byte compilation:
perlcc -B -o tperlpol tperlpol.pl time ./tperlpol 500000 0.2
Perl (hand optimized)
pl.pl (by zgrim)
my($n,$x,$mu,$pu,@pol)=(500000,0.2,10,0,(0..99)); for(1..$n) { for (0..$#pol) { pol[pol[pol[] = mu=(mu = ( mu=(mu + 2 ) * 0.5 } $s = 0; for(0..$#pol) { s=s = s=x * s+s + s+pol[$] } pu+=pu += pu+=s } print qq[$pu\n];
Compile and run with:
perlcc -B -o pl pl.pl time ./pl
Perl (hand optimized for native compilation)
perl3.pl by Radu Greab
#!/usr/bin/perl -w
use strict;
my n=n = n=ARGV[0]; my x=x = x=ARGV[1];
my $mu = 10; my $pu = 0;
my @pol;
foreach (0 .. $n - 1) { foreach (0 .. 99) { pol[pol[pol[_] = mu=(mu = (mu=(mu + 2) / 2; }
my $s = 0;
foreach (0 .. 99) {
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo>=</mo></mrow><annotation encoding="application/x-tex">s = </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">s</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span></span></span></span>x * <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi><mo>+</mo></mrow><annotation encoding="application/x-tex">s + </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6667em;vertical-align:-0.0833em;"></span><span class="mord mathnormal">s</span><span class="mord">+</span></span></span></span>pol[$_];
}
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>p</mi><mi>u</mi><mo>+</mo><mo>=</mo></mrow><annotation encoding="application/x-tex">pu += </annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7778em;vertical-align:-0.1944em;"></span><span class="mord mathnormal">p</span><span class="mord mathnormal">u</span><span class="mord">+</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span></span></span></span>s;
}
print "$pu\n";
Compile and run with:
perlcc -O -o perl3 perl3.pl time ./perl3
R (or S, Splus)
trpol2.R
trpol2 <- function(n,x) { mu <<- 10.0 pu <<- 0.0 pol <<- 1:100 tp1 <<- 2.0 tm1 <<- 1/2.0 for (i in 1:n) { for (j in 1:100) { mu <<- (mu + tp1) * tm1 pol[j] <<- mu } s <<- 0.0; for (j in 1:100) { s <<- x*s + pol[j]; } pu <- s+pu; } print(pu) }
Execute the program with:
time echo " source("trpol2.R") ; trpol2(500000,0.2) ; q() " | R --no-save
Ruby (by zgrim)
pol.rb
n = 500000 x = 0.2 mu = 10 pu = 0 pol = []
n.times do 0.upto(99) { |j| pol[j] = mu = (mu + 2) * 0.5 } s = 0 0.upto(99) { |j| s = x * s + pol[j] } pu += s end
print pu,"\n"
Execute the program with:
time ruby pol.rb
FORTH (by Mihai Manolescu)
bench.fs
--- bench.fs:
\ gforth source code for speed test \ mmihai, sept '04 \
100 floats allocate throw constant farray
: init_farray 0e0 farray 100 0 do dup fdup f! float+ loop drop fdrop ;
: my_loop init_farray 0e0 \ x pu 10e0 \ x pu mu
0 do
farray
100 0 do
2e0 f+ f2/
dup fdup f!
float+
loop
drop \ x pu mu
frot frot \ mu x pu
fswap 0e0 \ mu pu x su
farray
100 0 do
fover f*
dup f@ f+
float+
loop
drop
frot f+ \ mu x pu
frot \ x pu mu
loop
fdrop f. cr fdrop
;
0.2e0 500000 my_loop
Execute the program with:
time gforth-fast bench.fs -e bye
FORTH, hand optimised (by Mihai Manolescu)
bench2.fs
\ gforth source code for speed test \ mmihai, sept '04 \
falign here 100 floats allot constant farray
: my_loop 0e0 \ x pu 10e0 \ x pu mu
0 do
farray
2e0 fswap
100 0 do
fover f+ fover f/
dup fdup f!
float+
loop
fnip
drop \ x pu mu
frot frot \ mu x pu
fswap 0e0 \ mu pu x su
farray
100 0 do
dup
fover f*
f@ f+
float+
loop
drop
frot f+ \ mu x pu
frot \ x pu mu
loop
fdrop f. cr fdrop
;
0.2e0 500000 my_loop
Execute the program with:
time gforth-fast bench2.fs -e bye
Copyright (c) 2003 Alexandru Dan Corlanet al.