PMEMD benchmarks for Factor IX
All benchmark results shown here are for the Factor IX constant pressure
benchmark, a solvated system with a total of 90906 atoms. The time step used
was 1.5 fs, and SHAKE was used to constrain bonds involving hydrogen
(ntc = 2, ntf = 2). The direct force cutoff was 8.0 angstrom, and the default
value for the pairlist skin (skinnb) was used, which is either 1.0 angstrom or
2.0 angstrom, depending on the software and the optimization options chosen
during compilation. This is a fairly typical large solvated protein
simulation.
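The tables below report throughput in psec/day. As a rough aid for reading
them, the following sketch converts a psec/day figure into MD steps per day,
wall-clock seconds per step, and days needed to reach a target simulation
length, using the 1.5 fs time step stated above. The function names are
hypothetical helpers introduced here for illustration; they are not part of
PMEMD or Sander.

```python
# Minimal sketch: unit conversions for the psec/day figures in this file.
# Assumes the 1.5 fs time step stated above; all names are illustrative only.

DT_FS = 1.5              # time step in femtoseconds (from the benchmark setup)
SECONDS_PER_DAY = 86400  # wall-clock seconds in one day


def steps_per_day(ps_per_day, dt_fs=DT_FS):
    """MD steps completed per wall-clock day (1 ps = 1000 fs)."""
    return ps_per_day * 1000.0 / dt_fs


def seconds_per_step(ps_per_day, dt_fs=DT_FS):
    """Average wall-clock seconds spent on one MD step."""
    return SECONDS_PER_DAY / steps_per_day(ps_per_day, dt_fs)


def days_for_ns(target_ns, ps_per_day):
    """Wall-clock days needed to simulate target_ns nanoseconds."""
    return target_ns * 1000.0 / ps_per_day


if __name__ == "__main__":
    rate = 2945  # psec/day, PMEMD BC/MV on 112 processors (IBM table below)
    print(f"{steps_per_day(rate):.0f} steps/day")
    print(f"{seconds_per_step(rate):.4f} s/step")
    print(f"{days_for_ns(10, rate):.2f} days for 10 ns")
```

For example, 2945 psec/day corresponds to about 0.044 seconds of wall-clock
time per step.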
*******************************************************************************
IBM SP4 CLUSTER PERFORMANCE (EDINBURGH PARALLEL COMPUTING CENTRE):
*******************************************************************************
The IBM p690 Regatta at EPCC has 1280 1.3 GHz Power4 CPUs. The SP4 processors
are grouped into "Multi-Chip Modules" (MCMs). An MCM is composed of a group of
four dual-processor chips with some shared cache. Thus, processors are
typically allocated in groups of eight, and the user is billed for CPUs in
groups of eight. However, higher throughput can sometimes be obtained by not
using all the CPUs on an MCM, presumably due to bottlenecks in the shared
components. We show results below in which 8 or 4 processors per MCM are in
use. The total number of processors indicated is the number allocated to you,
not necessarily the number in use. Data is shown for PMEMD 8 optimized two
different ways: BC/MV = DIRFRC_BIGCACHE_OPT and MASSV (the default) versus
VO/MV = DIRFRC_VECT_OPT and MASSV, which gives worse performance at low
processor counts and better performance at very high processor counts. We
recommend use of up to 8x14 = 112 processors for PMEMD 8, and the default
build process (DIRFRC_BIGCACHE_OPT/MASSV) works best in that range.
90906 Atoms, Constant Pressure Molecular Dynamics (Factor IX)

#procs        PMEMD BC/MV   PMEMD VO/MV   Sander 8
                 psec/day      psec/day   psec/day

 16 (8x 2)            672           677         nd
 32 (8x 4)           1125          1115         nd
 64 (8x 8)           1975          1871        369
 96 (8x12)           2743          2422         nd
112 (8x14)           2945          2605         nd
128 (8x16)           2516          2714        339
256 (4x32)           3049          3345         nd
320 (4x40)           2864          3551         nd
384 (4x48)           2833          3600         nd
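Because processors are billed in groups of eight per MCM, it can help to
compare throughput per allocated CPU as well as raw psec/day. The sketch
below does that for the IBM table above. It assumes the "(AxB)" notation
means A processors in use on each of B MCMs, so that allocated = 8 x B; the
data rows are copied from the table, and the function and field names are
hypothetical.

```python
# Minimal sketch: processors in use versus allocated (billed) on the IBM system.
# Assumes "(AxB)" means A processors used per MCM on B MCMs; names are illustrative.

CPUS_PER_MCM = 8  # CPUs billed per MCM, per the text above

# (allocated, used per MCM, MCMs, BC/MV psec/day, VO/MV psec/day), from the table
ROWS = [
    (16, 8, 2, 672, 677),
    (32, 8, 4, 1125, 1115),
    (64, 8, 8, 1975, 1871),
    (96, 8, 12, 2743, 2422),
    (112, 8, 14, 2945, 2605),
    (128, 8, 16, 2516, 2714),
    (256, 4, 32, 3049, 3345),
    (320, 4, 40, 2864, 3551),
    (384, 4, 48, 2833, 3600),
]


def summarize(rows):
    """Return one dict per row with in-use count, best build, and rate per billed CPU."""
    summary = []
    for allocated, used_per_mcm, mcms, bc_mv, vo_mv in rows:
        # Sanity check: the first table column should be the billed CPU count.
        if allocated != CPUS_PER_MCM * mcms:
            raise ValueError(f"allocated={allocated} does not match {mcms} MCMs")
        best_build, best_rate = max(
            (("BC/MV", bc_mv), ("VO/MV", vo_mv)), key=lambda pair: pair[1]
        )
        summary.append({
            "allocated": allocated,
            "in_use": used_per_mcm * mcms,
            "best_build": best_build,
            "best_ps_per_day": best_rate,
            "ps_per_day_per_allocated_cpu": best_rate / allocated,
        })
    return summary


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(
            f"{row['allocated']:4d} allocated, {row['in_use']:4d} in use: "
            f"{row['best_build']} {row['best_ps_per_day']:5d} psec/day, "
            f"{row['ps_per_day_per_allocated_cpu']:.1f} psec/day per billed CPU"
        )
```

On these numbers, the 112-processor BC/MV run delivers about 26 psec/day per
billed CPU, versus about 13 for the faster of the two 256-processor runs.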
BENCHMARKING RESULTS FOR PITTSBURGH SUPERCOMPUTING CENTER ALPHASERVER, LEMIEUX
*******************************************************************************
LEMIEUX PERFORMANCE, Compaq 1 GHz ES45 alphaserver, Quadrics interconnect
*******************************************************************************
With the Quadrics interconnect, it is possible to use one or two interconnect
"rails", with one rail being the default. Using two rails may improve
performance by roughly 10-20%, but at the time of benchmarking there
appeared to be system problems associated with using two rails. Thus, we
present data for one rail only, and recommend the use of one rail only.
PMEMD was built with the default optimization (no DIRFRC_* option specified),
which produced the best results over a range of values.
90906 Atoms, Constant Pressure Molecular Dynamics (Factor IX)

#procs           PMEMD   Sander 8
              psec/day   psec/day

 64 (4x16)        1745        500
128 (4x32)        2615       1172
*******************************************************************************
LINUX CLUSTER PERFORMANCE, 800 MHZ ITANIUM 1, MYRINET SWITCH (NCSA TITAN)
*******************************************************************************
Benchmarks were run on NCSA's Itanium 1 Linux cluster, which has a Myrinet
interconnect. The Itanium 1 is significantly slower than the Itanium 2, but
the benchmarks show good PMEMD scaling on the Myrinet interconnect out to
about 32 processors, which is fairly typical. The Itanium chips have a huge
L3 cache, so PMEMD is best optimized using DIRFRC_BIGCACHE_OPT (the default).
90906 Atoms, Constant Pressure Molecular Dynamics (Factor IX)

#procs     PMEMD   Sander 8
        psec/day   psec/day

     1        43         30
     2        71         59
     4       130        106
     8       279        190
    16       484        285
    32       771        340
    48       842         nd
    64       939        404
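Since this table includes a single-processor run, speedup and parallel
efficiency can be computed directly from it. The sketch below does so,
taking speedup as throughput on N processors divided by throughput on one
processor, and efficiency as speedup divided by N. The numbers are copied
from the table above (the 48-processor Sander 8 entry is "nd" and is
omitted); the function name is a hypothetical helper.

```python
# Minimal sketch: speedup and parallel efficiency from the NCSA table above.
# Values are psec/day keyed by processor count; names are illustrative only.

PMEMD = {1: 43, 2: 71, 4: 130, 8: 279, 16: 484, 32: 771, 48: 842, 64: 939}
SANDER = {1: 30, 2: 59, 4: 106, 8: 190, 16: 285, 32: 340, 64: 404}  # 48: nd


def scaling(table):
    """Map processor count -> (speedup, efficiency) relative to the 1-processor run."""
    base = table[1]
    result = {}
    for nprocs, rate in sorted(table.items()):
        speedup = rate / base
        result[nprocs] = (speedup, speedup / nprocs)
    return result


if __name__ == "__main__":
    for name, table in (("PMEMD", PMEMD), ("Sander 8", SANDER)):
        print(name)
        for nprocs, (speedup, efficiency) in scaling(table).items():
            print(f"  {nprocs:3d} procs: speedup {speedup:5.2f}, "
                  f"efficiency {efficiency:6.1%}")
```

By this measure PMEMD reaches a speedup of roughly 18 on 32 processors,
which is about 56% parallel efficiency.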