Implementing Amber on a Beowulf type cluster

(Example of using AMBER on a Linux PC cluster)
    Ristra Cluster

Preamble

The following information describes the general experience that we (Carlos Simmerling and Viktor Hornak at SUNY Stony Brook) have gained with our cluster, named "Ristra". While we will attempt to keep this information up to date, this area changes rapidly and we cannot guarantee that any of the software or hardware described below will remain available. Also remember that this is not a HOW-TO guide giving step-by-step instructions on how to set up a cluster. There are many ways to do that, and we only briefly mention how our configuration is done. It is also important to note that many variations of Linux, compilers and hardware are available; all we can do is relate what we have used and what has worked for us. Your results may vary for a variety of reasons.

Computing nodes

The Ristra cluster is composed of ~70 nodes: the first 20 are dual-processor Pentium nodes (2xPIII 800MHz, 256MB PC133 memory), and the other ~50 are 1.4GHz Athlons (most have 256MB DDR memory; a couple have 512MB for more memory-intensive calculations). All nodes are configured as diskless workstations, i.e. they download the kernel and other necessary files from the server. Less than 256MB of memory per CPU proved insufficient for running Amber; having at least that much avoids recompiling Amber and other hassles. All nodes are on a private subnet (192.168.10.*) and have Intel EtherExpress 10/100Mb cards. Intel cards are a good choice for diskless nodes because they either allow their EEPROM to be reprogrammed or support PXE (Preboot Execution Environment) booting. We use the latter to boot our diskless nodes from the network (see more details below).

All nodes run a slightly modified version of RedHat Linux 7.3 with the 2.4.20 kernel. Configuring diskless clients is not quite straightforward; we recommend using PXE-capable network cards in combination with pxelinux (see PXELINUX link below). Another possibility is to program the EEPROM chips of network cards that allow it (see etherboot project link below), but this is a bit more complicated and offers no additional advantages. There are many resources available giving details on such configurations:


Servers

A cluster server should be a reliable machine with room for a large amount of disk storage. It needs two network cards: one provides the connection to the outside (with a public IP address), while the second, with a private IP address, serves as the entry point to the cluster. It should never be necessary to log in to individual nodes; all jobs can be run from the server. Spend time (and money if you can) to pick reliable disk storage. Go with hot-swappable RAID if possible. There are quite a few options: hardware RAID with a dedicated card and an array of SCSI disks (most reliable but most expensive), hardware RAID of IDE disks presenting itself as a single SCSI drive, or software RAID with IDE/SCSI disks (pick your IDE disks carefully!). The array is NFS-mounted on each cluster node. The server runs static DHCP for the cluster, a TFTP service that lets each node download its kernel during booting, an NIS server providing login information for all user accounts, an ntpd time server for the cluster nodes, etc. You can run all services on a single server, or you can delegate different services to additional servers.
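
As an illustration of the static DHCP part of this setup, here is a minimal sketch that generates one host entry per node. It assumes the ISC dhcpd server and PXELINUX (boot file pxelinux.0); the server address 192.168.10.254, the MAC addresses and the "name MAC IP" input format are hypothetical.

```shell
#!/bin/sh
# Sketch: emit ISC dhcpd "host" stanzas for diskless nodes.
# Input on stdin, one node per line: "<hostname> <MAC> <IP>" (hypothetical values).
# SERVER_IP is an assumed private address of the boot (TFTP) server.
SERVER_IP="${SERVER_IP:-192.168.10.254}"

gen_dhcp_hosts() {
    while read -r name mac ip; do
        # skip blank lines and comment lines
        case "$name" in ''|\#*) continue ;; esac
        printf 'host %s {\n' "$name"
        printf '    hardware ethernet %s;\n' "$mac"
        printf '    fixed-address %s;\n' "$ip"
        printf '    next-server %s;\n' "$SERVER_IP"
        printf '    filename "pxelinux.0";\n'
        printf '}\n'
    done
}
```

For example, "gen_dhcp_hosts < nodes.txt" prints stanzas that could be pasted into the subnet declaration of dhcpd.conf; nodes.txt is a hypothetical file with lines such as "ristra01 00:11:22:33:44:55 192.168.10.1".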

The network is 100 Mb/s, with an HP ProCurve 4000M switch providing 80 ports (with all modules installed). Cheaper switches are not worth it; this is not a good place to save money. Amber on recent Linux kernels scales pretty well with straight Gigabit (1000 Mb/s) networking. However, we ran into difficulties when running both 100 Mb/s and 1 Gb/s nodes from the same server. We therefore recommend setting up 1 Gb/s nodes with a separate server.


Compiling Amber8

Compiling Amber on a Linux platform can be challenging because of the great variety of Linux distributions, compilers, and system and MPI libraries.

Unlike previous versions of Amber (follow this link for compilation of amber7), Amber8 no longer uses Machine files. Instead, a configure script is used to set preprocessor, compiler and linker options. Another important change is that compilation now requires a Fortran 90 compiler (which effectively rules out GNU g77 as an option).

Because the Intel Fortran and C compilers give excellent performance for the compiled code (at least on Intel processors), we will stick to these compilers in the following. In addition, both the Fortran and C compilers are available as free unsupported downloads from the Intel site, at least at the time of this writing.

Compiling Amber8 with Intel compilers and MPICH:

Remember to set the AMBERHOME and MPICH_HOME (for parallel compilations) environment variables before you start compiling (export is used for setting environment variables in the following examples; use setenv for csh variants). Here is the basic procedure for getting things set up, compiled and tested:

1. Install Intel compilers (ifort8.0, icc8.0)

Get the compilers (the non-commercial, unsupported version or the supported, paid-for version) from the Intel website (http://developer.intel.com/software/products/noncom/) and follow their procedure to install them. (Note: be sure to look at the license requirements under the "FAQ" link. Most academic research groups will not qualify for the non-commercial product.)

Once the compilers are installed, set up the compiler environment, e.g. from your .bashrc:

# Intel C Compiler (icc8.0), Fortran compiler (ifort8.0)
source /opt/intel_cc_80/bin/iccvars.sh
source /opt/intel_fc_80/bin/ifortvars.sh

2. Compile and install MPI libraries (mpich-1.2.5.2)

To compile the MPICH libraries with the Intel compilers, configure, for example, like this:

export CC=icc
export FC=ifort
./configure --prefix=/usr/local/mpich-1.2.5.2_icc -cc=$CC -fc=$FC
make install

If you use GCC to compile mpich, configure in the following way:

export CC=gcc
export FC=ifort
./configure --prefix=/usr/local/mpich-1.2.5.2_gcc -cc=$CC -fc=$FC
make install

3. Compile and install amber8

export AMBERHOME="/usr/local/amber8_mpich"
export MPICH_HOME="/usr/local/mpich-1.2.5.2_icc"

3a. Compile a serial (without MPI) version of all programs:

cd $AMBERHOME/src
./configure ifort
make serial
cd $AMBERHOME/test
make test | tee test.serial.out

3b. Compile a parallel version of sander (and sander.LES):

cd $AMBERHOME/src
make clean
./configure -mpich ifort
make parallel
cd $AMBERHOME/test
export DO_PARALLEL="/usr/local/mpich-1.2.5.2_icc/bin/mpirun -np 2 "
make test.sander | tee test.sander.parallel.out
make test.sander.LES | tee test.sander.LES.parallel.out
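
Many build failures in step 3 come down to an unset variable or a compiler missing from the PATH. The following is a minimal sketch of a pre-flight check that could be run before configure; the function name check_build_env is hypothetical and the script is not part of the Amber distribution.

```shell
#!/bin/sh
# check_build_env: report prerequisites for the parallel build that look
# missing. Prints one line per problem and returns non-zero if any is found.
check_build_env() {
    rc=0
    for var in AMBERHOME MPICH_HOME; do
        # indirect lookup: fetch the value of the variable named in $var
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing: $var is not set"
            rc=1
        elif [ ! -d "$val" ]; then
            echo "missing: $var=$val is not a directory"
            rc=1
        fi
    done
    for tool in ifort icc; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool not found in PATH"
            rc=1
        fi
    done
    return $rc
}
```

A typical use would be "check_build_env && ./configure -mpich ifort", so that configure only runs when the environment looks sane.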

Compiling Amber8 with Intel compilers and LAM MPI:

Alternatively, if you want to use the LAM MPI libraries instead of MPICH, here is a short description of how the above procedure would be modified:

Step 2 from above would be:

export CXX=icc
export FC=ifort
export CFLAGS=-static
export CXXFLAGS=-static
export FFLAGS=-static
./configure --prefix=/usr/local/lam-7.0.4_icc --without-romio --without-profiling
make
make install

Step 3b from above would be:

export AMBERHOME="/usr/local/amber8_lam"
export LAM_HOME="/usr/local/lam-7.0.4_icc"
./configure -lam ifort
make parallel

Test the parallel version of sander with LAM:

Make sure that LAM commands are in your path, e.g. put:
export PATH=/usr/local/lam-7.0.4_icc/bin:$PATH
into your .bashrc file (and reload it).

Create a file, called for example machinefile, and put it into $AMBERHOME/test. This file defines the "LAM Universe", i.e. which CPUs your job will run on. An example machinefile, specifying that the current machine with 2 CPUs will be used, might simply look like this:

localhost cpu=2

Make sure you can log in to this machine without a password. Please refer to the LAM manuals, which describe in full detail how to run MPI jobs with LAM.

cd $AMBERHOME/test
lamboot -v machinefile
lamnodes    # just check your universe
export DO_PARALLEL="/usr/local/lam-7.0.4_icc/bin/mpirun -np 2 "
make test.sander | tee test.sander.parallel.out
make test.sander.LES | tee test.sander.LES.parallel.out
lamhalt
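
For a universe that spans several cluster nodes, the machinefile can be generated rather than typed by hand. The sketch below assumes hosts named with a common prefix and a two-digit number (such as ristra01) that all have the same CPU count; the function name make_machinefile is hypothetical.

```shell
#!/bin/sh
# make_machinefile PREFIX FIRST LAST CPUS
# Print one LAM boot-schema line ("host cpu=N") for each node number
# from FIRST to LAST. FIRST and LAST are plain integers (no leading zeros).
make_machinefile() {
    prefix=$1; first=$2; last=$3; cpus=$4
    i=$first
    while [ "$i" -le "$last" ]; do
        printf '%s%02d cpu=%d\n' "$prefix" "$i" "$cpus"
        i=$((i + 1))
    done
}
```

For example, "make_machinefile ristra 1 4 2 > machinefile" writes entries for ristra01 through ristra04 with 2 CPUs each; the -np value in DO_PARALLEL should then not exceed the total CPU count.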

Compiling REM with Amber8

If you need to compile the Replica Exchange (REM) code into sander, see section 5.18 of the Amber8 manual. In short, follow a serial build (step 3a) with this slightly modified parallel build:

cd $AMBERHOME/src
make clean
./configure -mpich ifort
make AMBERBUILDFLAGS='-DREM' parallel
cd $AMBERHOME/test
export DO_PARALLEL="/usr/local/mpich-1.2.5.2_icc/bin/mpirun -np 4 "
make test.sander.REM | tee test.sander.REM.out

Note: Set the number of processors to at least 4 (you can do this even on a dual-CPU machine).

If you get the following error while running the REM tests ("make test.sander.REM" above):

 cd rem_gb; Run.rem
 /bin/sh: Run.rem: command not found
 make: *** [test.sander.REM] Error 127

Open the Makefile, find the 3 REM tests and replace "Run.rem" with "./Run.rem" (the current-directory specifier "./" was inadvertently dropped from in front of the Run.rem command).
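
The same replacement can be made with sed instead of a text editor. This is a minimal sketch; it assumes the affected Makefile is the one in $AMBERHOME/test (where the tests are run from), and the function name fix_rem_tests is hypothetical.

```shell
#!/bin/sh
# fix_rem_tests MAKEFILE
# Print MAKEFILE with "./" inserted before every Run.rem that does not
# already follow a "/", so running it on a fixed file changes nothing.
fix_rem_tests() {
    sed 's#\([^/]\)Run\.rem#\1./Run.rem#g' "$1"
}
```

Review the output first, then replace the file, for example with "fix_rem_tests Makefile > Makefile.fixed && mv Makefile.fixed Makefile".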

Here is a list of several common Linux distributions that worked for us (meaning that they at least passed 'make test.sander'). Intel compilers (ifort/icc) version 8 (Build 20031016Z) were used; mpich-1.2.5.2 was compiled with icc or gcc; lam-7.0.4 was compiled with icc.

Known problems with Amber8:

Notes:

Distribution specific notes:

Scheduling with PBS

Compiling and setting up

Running jobs on clusters with 20+ nodes without any batch/scheduling system becomes very impractical. The Portable Batch System (PBS) is one of the most common scheduling systems for Linux clusters. There is also a better-supported commercial version of PBS called PBS-Pro. Unfortunately, development of the free OpenPBS seems to have shifted entirely to the commercial PBS-Pro. As the number of nodes in your cluster increases, expect more and more problems with OpenPBS, which is not very robust and lacks fault-tolerance features (for handling node crashes, crashed jobs that do not clean up, etc.).

Here is a short description of installing and configuring PBS.

Download the sources (not RPMs) from the OpenPBS website. Configure and compile; if you encounter problems during compilation, check the PBS mailing list. It is the best resource (and unfortunately the only one) for troubleshooting your problems.

Follow the instructions in the Administration Guide to configure the Server, Scheduler and Moms (clients). It is advantageous to split the nodes into different classes by their properties. For example, nodes with more memory may form one class, those with faster CPUs another, etc. You can use these machine classes to target your calculation at machines with specific properties (the "#PBS -l" parameter in the qsub script). If you don't select any class, PBS chooses which nodes your calculation will run on.
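
To make the idea of node classes concrete, here is a minimal sketch that generates a PBS nodes file in which each node carries a property naming its class. The property names "p3" and "athlon", the ristraNN host names and the function name gen_pbs_nodes are illustrative assumptions, not a copy of our configuration.

```shell
#!/bin/sh
# gen_pbs_nodes TOTAL DUAL
# Print TOTAL entries for a PBS nodes file. The first DUAL nodes are
# dual-CPU machines (np=2) tagged "p3"; the rest are single-CPU, tagged "athlon".
gen_pbs_nodes() {
    total=$1; dual=$2
    i=1
    while [ "$i" -le "$total" ]; do
        if [ "$i" -le "$dual" ]; then
            printf 'ristra%02d np=2 p3\n' "$i"
        else
            printf 'ristra%02d np=1 athlon\n' "$i"
        fi
        i=$((i + 1))
    done
}
```

The output of "gen_pbs_nodes 70 20" would go into the server's nodes file (server_priv/nodes under the PBS home directory). A job could then ask for four nodes of one class with a line such as "#PBS -l nodes=4:athlon" in its script.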

Start Server (by hand for the first time):
pbs_server -t create

Activate it in qmgr:
qmgr> set server scheduling=true

Start Scheduler:
pbs_sched

Configure a Server in qmgr:

create queue md queue_type=execution  # at least 1 execution queue must exist 
                                      # (it's called 'md' in this example)
set server default_queue=md           # set a default queue
s q md enabled=true                   # enable md queue
s s default_node=ristra01             # set a default node to run jobs on
s s node_pack=true                    # pack jobs on SMPs first
s q md started=true                   # you must start a queue md

Finally, pbs_mom must be started on each node.

Submitting jobs

First you will need to create a script specifying your execution options as well as the actual command your job will run (mpirun). You can either edit this sample script to suit your needs, or create your own script from scratch. Jobs are submitted with the qsub command:

qsub <pbs_script>

A job ID will be returned to you, and your job moves into the queue.
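
As a rough illustration of what such a script can contain, the sketch below prints a minimal PBS job script for a parallel sander run with MPICH. It assumes the 'md' queue from the configuration above and that AMBERHOME and MPICH_HOME are set in the submitting shell (passed to the job with "#PBS -V"); the input and output file names and the function name write_pbs_script are hypothetical.

```shell
#!/bin/sh
# write_pbs_script NAME NODES PPN
# Print a PBS job script that runs sander on NODES nodes with PPN
# processes per node (NODES * PPN MPI processes in total).
write_pbs_script() {
    name=$1; nodes=$2; ppn=$3
    np=$((nodes * ppn))
    cat <<EOF
#!/bin/sh
#PBS -N $name
#PBS -q md
#PBS -l nodes=$nodes:ppn=$ppn
#PBS -j oe
#PBS -V

# Jobs start in the home directory, so return to the submit directory.
cd "\$PBS_O_WORKDIR"

# PBS_NODEFILE lists the hosts that PBS assigned to this job.
\$MPICH_HOME/bin/mpirun -np $np -machinefile "\$PBS_NODEFILE" \\
    \$AMBERHOME/exe/sander -O -i md.in -o md.out -p prmtop -c inpcrd -r restrt
EOF
}
```

For example, "write_pbs_script myjob 2 2 > myjob.pbs" followed by "qsub myjob.pbs" would request two dual-CPU nodes and start four sander processes.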

Other useful PBS commands

To check the status of your job(s), type:

qstat

To delete your job from the queue (whether it's running or not), type:

qdel <jobid>

To check the status of nodes, type:

pbsnodes -a

It is not very difficult to parse the output of some of these commands and display it as an arbitrarily formatted HTML document on one of your cluster servers. For example, to display the queue status, you would parse the output of "qstat -f", and to display the status of the nodes, you would process the output of "pbsnodes -a" or qmgr's "list node node1 node2 ..." commands.
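
As one possible approach, the sketch below turns node status text into a small HTML table with awk. It assumes the usual "pbsnodes -a" layout, in which each node name starts in the first column and is followed by indented "attribute = value" lines; the function name nodes_to_html is hypothetical, and no HTML escaping is done because node names and states are plain words.

```shell
#!/bin/sh
# nodes_to_html: read "pbsnodes -a" style text on stdin and print an
# HTML table with one row (node name, state) per node.
nodes_to_html() {
    awk '
        BEGIN { print "<table>"; print "<tr><th>Node</th><th>State</th></tr>" }
        # A non-empty line that does not start with a space or tab is a node name.
        NF > 0 && substr($0, 1, 1) != " " && substr($0, 1, 1) != "\t" { node = $1 }
        # An indented "state = value" line gives the state of the current node.
        $1 == "state" && $2 == "=" {
            printf "<tr><td>%s</td><td>%s</td></tr>\n", node, $3
        }
        END { print "</table>" }
    '
}
```

On the server this could be run from cron as "pbsnodes -a | nodes_to_html > nodes.html", with the output file placed wherever your web server can find it.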

Last updated: April 18, 2004
Email: viktor.hornak sunysb edu