Implementing Amber on a Beowulf type cluster (Example of using AMBER on a Linux PC cluster)
All nodes are running a slightly modified version of RedHat Linux 7.3 with a 2.4.20 kernel. The configuration of diskless clients is not quite straightforward; we recommend using PXE-capable network cards in combination with pxelinux (see PXELINUX link below). Another possibility is to program the EEPROM chips of the network cards that allow it (see etherboot project link below), but this is a bit more complicated and brings no additional advantages. There are many resources available giving details on such configurations:
The network is 100 Mb/s, with an HP ProCurve 4000M switch providing 80 ports (with all modules in). Cheaper switches are not worth it; the switch is not a good place to save money. Amber on recent Linux kernels scales pretty well with straight Gigabit (1000 Mb/s) networking. However, we ran into difficulties while running both 100 Mb/s and 1 Gb/s nodes from the same server. We therefore recommend setting up the 1 Gb/s nodes with a separate server.
Unlike previous versions of Amber (follow this link for compilation of amber7), Amber8 does not use Machine files anymore. Instead, a configure script is used to set the preprocessor, compiler and linker options. Another important change is that compilation now requires a Fortran 90 compiler (which effectively rules out GNU g77 as an option).
Because the Intel Fortran and C compilers give excellent performance for the compiled code (at least on Intel processors), we will stick to these compilers in the following. In addition, both the Fortran and C compilers are available as free unsupported downloads from the Intel site, at least at the time of this writing.
Compiling Amber8 with Intel compilers and MPICH:
Remember to set the AMBERHOME and MPICH_HOME (for parallel compilations) environment variables before you start compiling (export is used for setting environment variables in the following examples; use setenv for csh variants). Here is the basic procedure for getting things set up, compiled and tested:
1. Install Intel compilers (ifort8.0, icc8.0)
Get the compilers (non-commercial, unsupported version or supported, paid-for version) from the Intel website (http://developer.intel.com/software/products/noncom/) and follow their procedure to install them. (Note: be sure to look at the license requirements under the "FAQ" link. Most academic research groups will not qualify for the non-commercial product.)
Once compilers are installed, set compiler environment, e.g. from your .bashrc:
# Intel C Compiler (icc8.0), Fortran compiler (ifort8.0)
source /opt/intel_cc_80/bin/iccvars.sh
source /opt/intel_fc_80/bin/ifortvars.sh
2. Compile and install MPI libraries (mpich-1.2.5.2)
To compile MPICH libraries with Intel compilers, configure for example like this:
export CC=icc
export FC=ifort
./configure --prefix=/usr/local/mpich-1.2.5.2_icc -cc=$CC -fc=$FC
make install
If you use GCC to compile mpich, configure in the following way:
export CC=gcc
export FC=ifort
./configure --prefix=/usr/local/mpich-1.2.5.2_gcc -cc=$CC -fc=$FC
make install
3. Compile and install amber8
export AMBERHOME="/usr/local/amber8_mpich"
export MPICH_HOME="/usr/local/mpich-1.2.5.2_icc"
3a. Compile a serial (without MPI) version of all programs:
cd $AMBERHOME/src
./configure ifort
make serial
cd $AMBERHOME/test
make test | tee test.serial.out
3b. Compile a parallel version of sander (and sander.LES):
cd $AMBERHOME/src
make clean
./configure -mpich ifort
make parallel
cd $AMBERHOME/test
export DO_PARALLEL="/usr/local/mpich-1.2.5.2_icc/bin/mpirun -np 2 "
make test.sander | tee test.sander.parallel.out
make test.sander.LES | tee test.sander.LES.parallel.out
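Since both builds rely on AMBERHOME and MPICH_HOME, it can help to verify them before running configure. The following is only a minimal sketch: the function name check_build_env is our own and is not part of Amber or MPICH.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): verify that each named
# environment variable is set and points to an existing directory.
check_build_env() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (plain POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "error: $name=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}
```

A possible use before step 3b would be: check_build_env AMBERHOME MPICH_HOME && cd "$AMBERHOME/src" && ./configure -mpich ifort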
Compiling Amber8 with Intel compilers and LAM MPI:
Alternatively, if you want to use the LAM MPI libraries instead of MPICH, here is a short description of how the above procedure would be modified:
Step 2 from above would be:
export CXX=icc
export FC=ifort
export CFLAGS=-static
export CXXFLAGS=-static
export FFLAGS=-static
./configure --prefix=/usr/local/lam-7.0.4_icc --without-romio --without-profiling
make
make install
Step 3b from above would be:
export AMBERHOME="/usr/local/amber8_lam"
export LAM_HOME="/usr/local/lam-7.0.4_icc"
./configure -lam ifort
make parallel
Test the parallel version of sander with LAM:
Make sure that LAM commands are in your path, e.g. put:
export PATH=/usr/local/lam-7.0.4_icc/bin:$PATH
into your .bashrc file (and reload it).
Create a file, for example called machinefile, and put it into $AMBERHOME/test. This file defines the "LAM Universe", i.e. which CPUs your job will run on. An example machinefile, which specifies that the current machine with 2 CPUs will be used, might simply look like this:
localhost cpu=2
Make sure you can log in to this machine without a password. Please refer to the LAM manuals, which describe in full detail how to run MPI jobs with LAM.
cd $AMBERHOME/test
lamboot -v machinefile
lamnodes    # just check your universe
export DO_PARALLEL="/usr/local/lam-7.0.4_icc/bin/mpirun -np 2 "
make test.sander | tee test.sander.parallel.out
make test.sander.LES | tee test.sander.LES.parallel.out
lamhalt
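For more than one machine, the machinefile simply gets one "host cpu=N" line per node. The sketch below writes such a file from host:cpus pairs. The helper name write_machinefile is made up for illustration, and defaulting to 1 CPU when no count is given is our own assumption.

```shell
#!/bin/sh
# Hypothetical helper: write a LAM machinefile from arguments of the
# form host:cpus, producing one "host cpu=N" line per node.
write_machinefile() {
    outfile=$1
    shift
    : > "$outfile"                  # start with an empty file
    for spec in "$@"; do
        host=${spec%%:*}            # text before the first colon
        cpus=${spec#*:}             # text after the first colon
        if [ "$cpus" = "$spec" ]; then
            cpus=1                  # no colon given: assume a single CPU
        fi
        echo "$host cpu=$cpus" >> "$outfile"
    done
}
```

Calling write_machinefile machinefile localhost:2 reproduces the one-line example above; the resulting file can then be passed to lamboot as before.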
Compiling REM with Amber8
If you need to compile the Replica Exchange (REM) code into sander, see section 5.18 of the Amber8 manual. In short, follow a serial build (step 3a) with the slightly modified parallel build:
cd $AMBERHOME/src
make clean
./configure -mpich ifort
make AMBERBUILDFLAGS='-DREM' parallel
cd $AMBERHOME/test
export DO_PARALLEL="/usr/local/mpich-1.2.5.2_icc/bin/mpirun -np 4 "
make test.sander.REM | tee test.sander.REM.out
Note: Set the number of processors to at least 4 (you can do this even if you are on a dual-CPU machine).
If you get the following error while running the REM tests ("make test.sander.REM" above):
cd rem_gb; Run.rem
/bin/sh: Run.rem: command not found
make: *** [test.sander.REM] Error 127
Go to the Makefile, find the 3 REM tests and replace "Run.rem" with "./Run.rem" (the current directory specifier "./" was inadvertently dropped from in front of the Run.rem command).
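The same replacement can be done with sed instead of by hand. This is a sketch only: the helper name fix_run_rem is hypothetical, and it assumes the affected Makefile lines look like the "cd rem_gb; Run.rem" line in the error message.

```shell
#!/bin/sh
# Hypothetical helper: prefix bare "Run.rem" commands with "./" in a Makefile.
# Occurrences already written as "./Run.rem" are left alone, because the
# pattern requires a character other than "/" directly before "Run.rem".
# A "Run.rem" at the very start of a line is not matched; Makefile command
# lines begin with a tab, so this should not matter here.
fix_run_rem() {
    makefile=$1
    cp "$makefile" "$makefile.orig" || return 1    # keep a backup copy
    sed 's|\([^/]\)Run\.rem|\1./Run.rem|g' "$makefile.orig" > "$makefile"
}
```

Run it from $AMBERHOME/test as fix_run_rem Makefile, then compare Makefile with Makefile.orig before rerunning the tests.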
Known problems with Amber8:
Notes:
cd cytosine; ./Run.cytosine
Unit 5 Error on OPEN: in.md
[0] MPI Abort by user Aborting program !
[0] Aborting program!
p0_4302: p4_error: : 1
./Run.cytosine: Program error
It usually means that you compiled the parallel version of sander but forgot to set DO_PARALLEL before running the tests (in other words, you are running the MPI version of sander without mpirun).
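A small guard in front of the test run can catch this mistake early. The sketch below is an illustration, not part of the Amber test suite; the function name require_do_parallel is our own.

```shell
#!/bin/sh
# Hypothetical guard: refuse to continue when DO_PARALLEL is empty or
# its first word is not a launcher that can actually be executed.
require_do_parallel() {
    if [ -z "${DO_PARALLEL:-}" ]; then
        echo "error: DO_PARALLEL is not set; parallel sander needs mpirun" >&2
        return 1
    fi
    # The first word of DO_PARALLEL is the launcher (for example mpirun).
    launcher=${DO_PARALLEL%% *}
    if ! command -v "$launcher" >/dev/null 2>&1; then
        echo "error: launcher '$launcher' not found or not executable" >&2
        return 1
    fi
    return 0
}
```

For example: require_do_parallel && make test.sander | tee test.sander.parallel.out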
Here is a short description of how to install and configure PBS.
Download the sources (not RPMs) from the OpenPBS website. Configure and compile; if you encounter problems during compilation, check the PBS mailing list. It is the best resource (and unfortunately the only one) for troubleshooting your problems.
Follow the instructions in the Administration Guide to configure the Server, Scheduler and Moms (clients). It is advantageous to split the nodes into different classes by their properties. For example, nodes with more memory may form one class, the ones with faster CPUs another class, etc. You can use these machine classes to target your calculation at machines with specific properties ("#PBS -l" parameter in the qsub script). Otherwise, if you don't select any class, PBS chooses which nodes your calculation will run on.
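To illustrate the "#PBS -l" parameter, here is a sketch of a helper that writes a job script requesting nodes with a given property. Everything specific in it is an assumption for illustration: the helper name write_pbs_script, the job name sander_md, the sander input and output file names, and the location of sander under $AMBERHOME/exe. The queue name md matches the queue created in the qmgr example.

```shell
#!/bin/sh
# Hypothetical helper: write a PBS job script that asks for <nodes> nodes
# with <ppn> CPUs each, restricted to nodes carrying <property>.
# Job name, file names and the sander path are illustrative assumptions.
write_pbs_script() {
    script=$1; nodes=$2; ppn=$3; property=$4
    np=$((nodes * ppn))            # total number of MPI processes
    cat > "$script" <<EOF
#!/bin/sh
#PBS -N sander_md
#PBS -q md
#PBS -l nodes=${nodes}:ppn=${ppn}:${property}
# Return to the directory the job was submitted from.
cd "\$PBS_O_WORKDIR"
# PBS_NODEFILE lists the hosts PBS assigned to this job.
\$MPICH_HOME/bin/mpirun -np ${np} -machinefile "\$PBS_NODEFILE" \$AMBERHOME/exe/sander -O -i md.in -o md.out -p prmtop -c inpcrd -r restrt
EOF
}
```

For instance, write_pbs_script job.pbs 2 2 bigmem would request two dual-CPU nodes that the administrator has tagged with a (made-up) property called bigmem.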
Start Server (by hand for the first time):
pbs_server -t create
Activate it in qmgr:
qmgr> set server scheduling=true
Start Scheduler:
pbs_sched
Configure a Server in qmgr:
create queue md queue_type=execution # at least 1 execution queue must exist
# (it's called 'md' in this example)
set server default_queue=md # set a default queue
s q md enabled=true # enable md queue
s s default_node=ristra01 # set a default node to run jobs on
s s node_pack=true # pack jobs on SMPs first
s q md started=true # you must start a queue md
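If you prefer not to type these commands interactively, they can be generated and fed to qmgr in one go. The sketch below only prints the commands, spelled out in full ("s q" is short for "set queue", "s s" for "set server"). The helper name pbs_queue_commands is hypothetical, and piping its output into qmgr assumes your qmgr reads commands from standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the qmgr commands from the example above in
# unabbreviated form, with the queue and default node as parameters.
pbs_queue_commands() {
    queue=$1; default_node=$2
    cat <<EOF
create queue ${queue} queue_type=execution
set server default_queue=${queue}
set queue ${queue} enabled=true
set server default_node=${default_node}
set server node_pack=true
set queue ${queue} started=true
EOF
}
```

Review the output of pbs_queue_commands md ristra01 first, and only then pipe it into qmgr on the server.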
pbs_mom must be started on each node
qsub <pbs_script>
A job ID will be returned to you, and your job moves into the queue.
qstat
To delete your job from the queue (whether it's running or not), type:
qdel <jobid>
To check the status of nodes, type:
pbsnodes -a
It is not very difficult to parse the output of some of these commands and display it as an arbitrarily formatted HTML document on one of your cluster servers. For example, to display the queue status, you would parse the output of "qstat -f" (sample output), and to display the status of the nodes, you'd process the output of "pbsnodes -a" or qmgr's "list node node1 node2 ..." commands (sample output).
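As a rough illustration of such parsing, the following sketch turns node status text into a small HTML table using awk. It assumes the input looks like a node name on an unindented line followed by indented "key = value" lines (such as state and np), with a blank line between nodes; check this against the actual output of your PBS version. The function name pbsnodes_to_html is our own, and values are not HTML-escaped.

```shell
#!/bin/sh
# Hypothetical helper: read "pbsnodes -a" style text on stdin and print
# an HTML table with one row per node (name, state, number of CPUs).
# Assumed input: unindented node name, then indented "key = value" lines.
pbsnodes_to_html() {
    awk '
        function emit_row() {
            if (node != "")
                printf "<tr><td>%s</td><td>%s</td><td>%s</td></tr>\n", node, state, np
            node = ""; state = ""; np = ""
        }
        BEGIN {
            print "<table>"
            print "<tr><th>Node</th><th>State</th><th>CPUs</th></tr>"
        }
        /^[^ \t]/ { emit_row(); node = $1; next }      # start of a new node
        $1 == "state" && $2 == "=" { state = $3; next }
        $1 == "np"    && $2 == "=" { np = $3; next }
        END { emit_row(); print "</table>" }
    '
}
```

A cron job could then run something like pbsnodes -a | pbsnodes_to_html > nodes.html and place the file where your web server can see it.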