Summary of Hans Othmer benchmarks

Quick overview

This document summarizes the benchmarks from Hans Othmer's research group in computational mathematical biology at the University of Utah Mathematics Department.

The benchmark results are available in two series of tables.

The longer first table contains the results obtained for all optimization levels on all machines.
The shorter second (executive summary) table contains the best single result obtained for each machine. This is probably the set of results that you want to look at.

Each table lists its rows in order of decreasing performance. Each row contains the

benchmark name,
vendor and model,
compiler optimization flags (possibly truncated, if they are very long),
benchmark time in seconds (on UNIX systems, this is the sum of the user and system times reported by the time command; this will not exceed the actual wall-clock time), and
relative performance (1.0000 is fastest).
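As a rough illustration of how the two tables and the last two columns relate, the sketch below builds both tables from a handful of rows. The machine names and timings are made up for the example, and the ratio used for relative performance (fastest time divided by this row's time) is an assumption; the tables themselves do not state the formula.

```python
# Hypothetical rows: (benchmark, machine, flags, user_seconds, system_seconds).
# The machine names and timings are invented for illustration only.
ROWS = [
    ("adilr", "Machine A", "-O2", 118.0, 2.0),
    ("adilr", "Machine A", "-O3", 99.0, 1.0),
    ("adilr", "Machine B", "-O2", 49.5, 0.5),
    ("adilr", "Machine B", "-O", 79.0, 1.0),
]

def full_table(rows):
    """Return (machine, flags, seconds, relative) tuples, fastest first."""
    # Benchmark time is taken as user time plus system time.
    timed = [(m, f, user + system) for _, m, f, user, system in rows]
    timed.sort(key=lambda r: r[2])
    fastest = timed[0][2]
    # Assumption: relative performance = fastest time / this row's time.
    return [(m, f, t, fastest / t) for m, f, t in timed]

def summary_table(rows):
    """Keep only the best (smallest-time) row for each machine."""
    best = {}
    for row in full_table(rows):
        best.setdefault(row[0], row)  # rows arrive fastest first
    return list(best.values())

for machine, flags, seconds, rel in summary_table(ROWS):
    print(f"{machine:10s} {flags:4s} {seconds:8.2f} {rel:.4f}")
```

With this convention the fastest row scores 1.0000 and every other row scores less than 1; if the tables instead use the inverse ratio, only the division in `full_table` changes.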

Important disclaimer

Please remember that there is no answer to the commonly asked question "What is the fastest machine?"

Within the larger collection of benchmarks to which these belong, it is often possible to pick a single benchmark that rates a particular machine the fastest, yet on other benchmarks the same machine may perform poorly relative to competing models.

Particularly on modern RISC architectures, performance can be extremely sensitive to the quality of compiler optimizations; in at least one case, a speedup of a factor of fifty was seen over a range of compiler options on the same system.

The benchmarking of these programs has investigated a substantial number of compilation options and optimization levels, but new releases of compilers, or alternative compilers, might improve the results significantly. We make reasonable efforts to keep our compilers and operating systems up to date with vendor software releases, but it is frequently impossible to rerun the benchmarks after such new releases, particularly for older machine models or for machines obtained on short-term loan for evaluation.

It is imperative with computer benchmarking to examine a range of benchmark programs, where those programs are chosen to represent the kinds of numerical computation that are important to you, before coming to a conclusion about which machine is best for your jobs.

Many other factors besides benchmark performance should affect computer purchasing decisions, including at least these:

vendor track records;
vendor future directions, and survivability in an increasingly competitive market;
hardware and software reliability;
ease of administration;
ease of maintenance;
ease of use;
initial cost;
ongoing cost-of-ownership, including license renewals and cost of repairs;
upgrade costs, particularly for memory and disk storage; and
availability of third-party commercial software that you expect to require.

Brief benchmark descriptions

All of the benchmark programs described in this document are written in highly portable Fortran 77, and all are real research programs run on real data; they are not loop kernels or toy implementations. Program code sizes are given below.

`1dmidlr`

[5420 lines of Fortran code]

This program solves the Luo-Rudy equations with one diffusing variable and seven non-diffusing variables on an interval [0,size], using a fully implicit Crank-Nicolson scheme. The nonlinear equations are solved using nksolve. The gates are eliminated algebraically.

`adilr`

[8871 lines of Fortran code]

adilr is an alternating-direction implicit (ADI) two-dimensional partial differential equation solver. From adilr.f:

This program solves the time-dependent 2D Luo-Rudy equations using an ADI scheme with a fully implicit scheme in each direction. The nonlinear equations are solved using nksolve. Using funccn.f gives a full Crank-Nicolson method, while funcmid.f evaluates the nonlinearity using the midpoint rule. In either case, seven of the gating variables are eliminated algebraically, and only two equations are solved iteratively.
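To make the distinction between the two treatments of the nonlinearity concrete, here is a minimal sketch for a scalar equation u' = f(u). It is not taken from funccn.f or funcmid.f: the function names, the cubic test nonlinearity, and the use of plain fixed-point iteration in place of nksolve are all assumptions made to keep the example short.

```python
def implicit_step(u, dt, f, rule, iters=60):
    """Advance u' = f(u) by one implicit step of size dt.

    rule == "cn":  u_new = u + dt/2 * (f(u) + f(u_new))   (Crank-Nicolson)
    rule == "mid": u_new = u + dt * f((u + u_new)/2)      (midpoint rule)

    The implicit equation is solved by plain fixed-point iteration, a
    stand-in chosen for brevity; it is not the nksolve solver.
    """
    u_new = u
    for _ in range(iters):
        if rule == "cn":
            u_new = u + 0.5 * dt * (f(u) + f(u_new))
        elif rule == "mid":
            u_new = u + dt * f(0.5 * (u + u_new))
        else:
            raise ValueError("rule must be 'cn' or 'mid'")
    return u_new

def cubic(u):
    # Hypothetical nonlinearity, unrelated to the Luo-Rudy model.
    return -u ** 3

u0, dt = 1.0, 0.1
print("Crank-Nicolson:", implicit_step(u0, dt, cubic, "cn"))
print("midpoint rule: ", implicit_step(u0, dt, cubic, "mid"))
```

For a linear f the two rules give identical steps; they differ only in where a nonlinear f is evaluated, which is why the choice is isolated in a single routine.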

Storage grows as NX*NY*1800 bytes, and run time as NSTEPS * (NX*NY)**p, where p > 0 is an unknown exponent (unknown because nonlinear equations are being solved).
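The storage estimate can be evaluated directly, and the exponent p can be estimated empirically from two runs that differ only in grid size. The sketch below shows the arithmetic; the grid sizes are illustrative, and the two timings are synthetic values generated from the model itself (with p set to 1.2) purely to check the formula, not measurements of adilr.

```python
import math

BYTES_PER_GRID_POINT = 1800  # from the storage estimate quoted above

def storage_bytes(nx, ny):
    """Approximate storage for an NX x NY grid, in bytes."""
    return nx * ny * BYTES_PER_GRID_POINT

def estimate_exponent(points_a, time_a, points_b, time_b):
    """Estimate p in time ~ NSTEPS * (NX*NY)**p from two runs with equal NSTEPS."""
    return math.log(time_b / time_a) / math.log(points_b / points_a)

# A 100 x 100 grid (illustrative size) needs about 18 MB.
print(storage_bytes(100, 100) / 1e6, "MB")

# Synthetic timings produced by the model with an assumed p = 1.2.
assumed_p = 1.2
points_a, points_b = 64 * 64, 128 * 128
time_a = 1.0e-4 * points_a ** assumed_p
time_b = 1.0e-4 * points_b ** assumed_p
print("estimated p:", estimate_exponent(points_a, time_a, points_b, time_b))
```

Because NSTEPS cancels in the ratio, the two runs must use the same number of time steps for the estimate to be meaningful.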

Most of the time is spent in function evaluations in func() (112 lines) and alphabetlr() (57 lines). These two routines, and those that they call, account for 75% of the execution time (on a SPARCstation 10/30) with profiling overhead excluded, even though they represent only 169/8876 = 1.9% of the user code. Subroutine alphabetlr() evaluates complex expressions with 29 references to dexp(), and there are about 350 million calls to dexp().

Thus, this code spends its time evaluating complicated, loop-free expressions. That is quite different from the behavior of linear algebra code, in which most of the time is spent in simple inner loops executed N**2 and N**3 times. This suggests that adilr would not do as well on vector machines as linear algebra code does, but should fare better on modern RISC machines.

`pattern`

[5704 lines of Fortran code]

pattern is an implicit Crank-Nicolson time integrator for a two-dimensional parabolic PDE. Array dimensions are chosen to avoid cache conflicts, which can have a dramatic effect on the performance of this program on some machines; see parabolic/README for details.
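One common way that array dimensions cause cache conflicts is a power-of-two leading dimension in a column-major array: stepping along a row then revisits the same few cache sets. The sketch below counts the sets touched for a direct-mapped cache. The cache size, line size, and function name are hypothetical choices for illustration; they are not taken from parabolic/README or from any machine in the tables.

```python
def distinct_cache_sets(leading_dim, ncols=256, elem_bytes=8,
                        line_bytes=32, cache_bytes=16 * 1024):
    """Count the cache sets touched when walking along one row of a
    column-major array, for a direct-mapped cache.

    All cache parameters are hypothetical defaults, not those of any
    machine in the benchmark tables.
    """
    n_sets = cache_bytes // line_bytes
    sets = set()
    for j in range(ncols):
        # Byte offset of the first-row element in column j + 1.
        byte_offset = j * leading_dim * elem_bytes
        sets.add((byte_offset // line_bytes) % n_sets)
    return len(sets)

print(distinct_cache_sets(256))  # power-of-two leading dimension: prints 8
print(distinct_cache_sets(257))  # padded by one element: prints 256
```

Under these assumed parameters, a leading dimension of 256 maps all 256 row elements onto only 8 cache sets, so they repeatedly evict one another, while padding the dimension to 257 spreads them over 256 distinct sets.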

The major time consumers in pattern are funct(), rhs(), vnorm(), and slngth().

funct() has a single very long vector loop of length NEQ (= NVAR * NX * NY = 2 * 256 * 256 = 131072), with a single statement

fv(i) = -(v(i) -vo(i)) + ht*(theta*fv(i)+ (1.0d0-theta)*f1(i))
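The statement above can be transcribed into a short Python sketch to show what each pass of the loop does. The function name funct_update and the tiny input vectors are hypothetical; only the arithmetic is taken from the Fortran statement.

```python
def funct_update(fv, v, vo, f1, ht, theta):
    """In-place analogue of the Fortran loop over i = 1..NEQ.

    Each element reads v, vo, f1 and the old fv, and writes the new fv,
    so the loop does little arithmetic per memory access.
    """
    for i in range(len(fv)):
        fv[i] = -(v[i] - vo[i]) + ht * (theta * fv[i] + (1.0 - theta) * f1[i])
    return fv

# Tiny made-up vectors; the benchmark itself uses NEQ = 131072 elements.
result = funct_update([1.0, 2.0], [0.5, 0.5], [0.0, 1.0], [3.0, 4.0],
                      ht=0.1, theta=0.5)
print(result)
```

With four loads and one store for a handful of floating-point operations per element, a loop of this shape tends to be limited by memory bandwidth rather than by arithmetic speed.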

rhs() has doubly nested loops whose inner bodies are executed NX*NY (= 65536) and 2*NX*NY (= 131072) times, with code that is essentially vector operations of the type

v = s*v + c
v = v1 + s*v2 + v3
v = v + s*v2

for vectors v, v1, v2, v3, and scalars c and s.
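For readers who prefer code to notation, the three vector operation types can be written elementwise as follows. This is only a sketch of the operation shapes: the function names are hypothetical, and the real rhs() applies such updates inside its doubly nested loops over the grid.

```python
def scale_shift(v, s, c):
    """v = s*v + c, with scalar s and scalar c."""
    return [s * x + c for x in v]

def scaled_sum3(v1, v2, v3, s):
    """v = v1 + s*v2 + v3."""
    return [a + s * b + d for a, b, d in zip(v1, v2, v3)]

def axpy(v, v2, s):
    """v = v + s*v2."""
    return [a + s * b for a, b in zip(v, v2)]

print(scale_shift([1.0, 2.0], 2.0, 0.5))
print(scaled_sum3([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], 2.0))
print(axpy([1.0, 2.0], [3.0, 4.0], 0.5))
```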

vnorm() computes a scaled vector norm, sqrt(sum(k=1:n) v(k)*s(k)). slngth() has a simple loop involving a vector division s/v.