UNIVERSITY OF UTAH CHPC OPTERON CLUSTER BENCHMARKS, 3/01/05
Following are some explicit solvent PME MD simulation benchmarks for a
large Opteron cluster at the U of Utah. This cluster (delicatearches) has 256
relatively slow dual 1.4 GHz Opterons interconnected with Myrinet (using
mpich_gm) and gigabit ethernet (using mpich 1.2.6 or mpich2). The operating
systems are 64 bit GNU/Linux. Thanks to Tom Cheatham and the folks at U of
Utah for access to this nice resource. As a preamble to the benchmarks, let me
say that this stuff is incredibly fast for a Linux cluster despite the slower
processor speeds; with PMEMD 8 over Myrinet I got 1.99 nsec/day on Factor IX
(~91k atoms, 64 processors) and 5.4 nsec/day on the JAC benchmark (~23.5k
atoms, 72 processors). All benchmarking was done with the system under full
load.
The Pathscale pathf90 compiler, version 2, was used to compile PMEMD and SANDER
for all these benchmarks. At the time of doing these benchmarks, the
Portland Group pgf90 compiler, version 5.2-4, was available, but there were
issues with compiler bugs and some other problem with the mpich-gm libraries.
Since then, I have had access to the latest pgf90 release (6.0), and it seems
that critical bugfixes have been made and that performance is roughly
comparable to that of the Pathscale compiler (based on spot-check benchmarks
only). Configuration files have been released for PMEMD as part of the
new_configure tarball on the amber.scripps.edu website.
Special Notations:
NA - Not applicable.
ND - Not done. In the case of SANDER, certain benchmarks were not done either
because scaling would be less than 50% or because SANDER can only do
parallel runs using a processor count that is a power of 2.
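The power-of-2 restriction on SANDER can be checked mechanically. The short Python sketch below is only an illustration: the helper name is_power_of_two and the list pmemd_counts are made up for this example and are not part of PMEMD or SANDER. It filters the processor counts used in the Myrinet PMEMD runs down to those SANDER could also run.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of 2 (1, 2, 4, 8, ...)."""
    # A power of 2 has exactly one bit set, so n & (n - 1) clears it to 0.
    return n > 0 and (n & (n - 1)) == 0


# Processor counts that appear in the PMEMD columns of the Myrinet tables.
pmemd_counts = [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72]

# Counts that satisfy SANDER's power-of-2 restriction.
sander_eligible = [n for n in pmemd_counts if is_power_of_two(n)]
print(sander_eligible)
```

Counts such as 24, 48, 56 and 72 fail this check, which accounts for some of the ND entries. A count that passes can still be ND under the other criterion (scaling below 50%).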
BENCHMARKS USING THE MYRINET MPICH_GM INTERCONNECT:
90906 Atoms, Constant Pressure Molecular Dynamics (Factor IX)
cutoff = 8.0 angstrom, timestep = 0.0015 psec,
orthogonal unit cell
#procs |        PMEMD 8
       |  psec/day  scaling(%)
       |
     1 |        60          NA
     2 |       111         100
     4 |       215          96
     8 |       398          89
    16 |       749          84
    24 |       974          73
    32 |      1336          75
    40 |      1542          69
    48 |      1662          62
    56 |      1851          59
    64 |      1994          54
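To relate the psec/day figures to wall-clock cost, it can help to convert them into MD steps per day and seconds per step using the timestep given in the table header. The Python sketch below does this arithmetic; the function names are invented for this example and the only inputs are numbers taken from the table above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def nsec_per_day(psec_per_day: float) -> float:
    """Convert simulated psec/day to nsec/day (1 nsec = 1000 psec)."""
    return psec_per_day / 1000.0


def steps_per_day(psec_per_day: float, timestep_psec: float) -> float:
    """Number of MD steps completed per wall-clock day."""
    return psec_per_day / timestep_psec


def seconds_per_step(psec_per_day: float, timestep_psec: float) -> float:
    """Average wall-clock seconds spent on one MD step."""
    return SECONDS_PER_DAY / steps_per_day(psec_per_day, timestep_psec)


# Factor IX on 64 processors over Myrinet: 1994 psec/day, 0.0015 psec timestep.
rate = 1994
dt = 0.0015
print(f"{nsec_per_day(rate):.3f} nsec/day")
print(f"{steps_per_day(rate, dt):.0f} steps/day")
print(f"{seconds_per_step(rate, dt) * 1000:.1f} msec/step")
```

For the 64-processor row this works out to roughly 1.33 million steps per day, or about 65 msec of wall-clock time per step.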
23558 Atoms, Constant Volume Molecular Dynamics (JAC Benchmark)
cutoff = 9.0 angstrom, timestep = 0.001 psec,
orthogonal unit cell
JAC (Joint Amber-CHARMM) benchmark (constant volume), 23558 atoms, PME,
explicit solvent simulation, mpich_gm interconnect. Here I use the default
skinnb value, which is a fair way to run this test: skinnb has no effect on
output, it is purely a performance optimization, and specifying the wrong
value can de-optimize the code.
#procs |        PMEMD 8         |        SANDER 8
       |  psec/day  scaling(%)  |  psec/day  scaling(%)
       |                        |
     1 |       124          NA  |       115          NA
     2 |       235         100  |       231         100
     4 |       461          98  |       424          92
     8 |       873          93  |       752          81
    16 |      1630          87  |      1170          63
    32 |      2979          79  |      1464          40
    48 |      4019          71  |        ND          ND
    56 |      4547          69  |        ND          ND
    64 |      4800          64  |        ND          ND
    72 |      5400          64  |        ND          ND
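The text does not say how the scaling(%) column is computed. Since the 2-processor row is the one marked 100, one plausible reading is that scaling is measured against ideal linear speedup from the 2-processor run. The Python sketch below works under that assumption (the function name scaling_percent is invented here) and applies it to the PMEMD column of the JAC table.

```python
def scaling_percent(procs, psec_per_day, base_procs, base_psec_per_day):
    """Throughput as a percentage of ideal linear speedup from a baseline run.

    Assumption: the baseline is the 2-processor run, inferred from the
    tables marking that row as 100%; the source does not state this.
    """
    ideal = base_psec_per_day * procs / base_procs
    return 100.0 * psec_per_day / ideal


# PMEMD 8 throughput (psec/day) from the JAC table, keyed by processor count.
jac_pmemd = {2: 235, 4: 461, 8: 873, 16: 1630, 32: 2979,
             48: 4019, 56: 4547, 64: 4800, 72: 5400}

computed = [round(scaling_percent(p, rate, 2, jac_pmemd[2]))
            for p, rate in jac_pmemd.items()]
print(computed)
```

Under this assumption the computed values agree with the PMEMD scaling column of the JAC table. Entries in the other tables can differ from this formula by a point or two, possibly because the published throughput values are themselves rounded.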
BENCHMARKS USING GIGABIT ETHERNET MPICH2 (ver 1) INTERCONNECT:
(DON'T USE -DSLOW_NONBLOCKING_MPI FOR PMEMD)
90906 Atoms, Constant Pressure Molecular Dynamics (Factor IX)
cutoff = 8.0 angstrom, timestep = 0.0015 psec,
orthogonal unit cell
#procs |        PMEMD 8
       |  psec/day  scaling(%)
       |
     1 |        60          NA
     2 |       109         100
     4 |       193          89
     6 |       273          84
     8 |       350          81
    12 |       499          76
    16 |       642          74
    24 |       864          66
    32 |       997          57
NOT using -DSLOW_NONBLOCKING_MPI is optimal over the range of 4-32 processors
with mpich2.
BENCHMARKS USING GIGABIT ETHERNET MPICH (ver 1.6.3) INTERCONNECT:
(DO USE -DSLOW_NONBLOCKING_MPI FOR PMEMD)
90906 Atoms, Constant Pressure Molecular Dynamics (Factor IX)
cutoff = 8.0 angstrom, timestep = 0.0015 psec,
orthogonal unit cell
#procs |        PMEMD 8
       |  psec/day  scaling(%)
       |
     1 |        60          NA
     2 |       109         100
     4 |       184          85
     6 |       252          77
     8 |       322          74
    12 |       450          69
    16 |       554          64
    24 |       720          55
    32 |       864          50
USING -DSLOW_NONBLOCKING_MPI is optimal over the range of 12-32 processors.
For fewer processors, you will do about as well or slightly better without
using -DSLOW_NONBLOCKING_MPI.
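The three Factor IX tables can be compared directly at the processor counts they have in common. The Python sketch below is just a convenience for reading the numbers side by side; the dictionary keys and the ratio helper are labels invented for this example, and the data are copied from the tables above.

```python
# Factor IX PMEMD 8 throughput (psec/day), keyed by processor count.
factor_ix = {
    "myrinet mpich_gm": {1: 60, 2: 111, 4: 215, 8: 398, 16: 749, 24: 974,
                         32: 1336, 40: 1542, 48: 1662, 56: 1851, 64: 1994},
    "gigabit mpich2": {1: 60, 2: 109, 4: 193, 6: 273, 8: 350, 12: 499,
                       16: 642, 24: 864, 32: 997},
    "gigabit mpich": {1: 60, 2: 109, 4: 184, 6: 252, 8: 322, 12: 450,
                      16: 554, 24: 720, 32: 864},
}


def ratio(faster: str, slower: str, procs: int) -> float:
    """Throughput of one setup divided by another at the same processor count."""
    return factor_ix[faster][procs] / factor_ix[slower][procs]


# Processor counts present in all three tables.
shared = sorted(set.intersection(*(set(t) for t in factor_ix.values())))

for procs in shared:
    print(f"{procs:3d} procs: "
          f"myrinet/mpich2 = {ratio('myrinet mpich_gm', 'gigabit mpich2', procs):.2f}, "
          f"mpich2/mpich = {ratio('gigabit mpich2', 'gigabit mpich', procs):.2f}")
```

At 32 processors, for example, Myrinet delivers about 1.34 times the throughput of gigabit ethernet with mpich2, and mpich2 about 1.15 times that of mpich.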
Bob Duke
NIEHS and
UNC-Chapel Hill Chemistry Dept.