(from Bob Duke)
Folks -
For those of you interested in being able to use the Intel Fortran Compiler, version 8, on Linux IA32 machines, Intel has finally, after 3 months or thereabouts, released a fix. The file to pick up from https://premier.intel.com is l_fc_pc_8.0.046.tar.gz. Any releases in the version 8 lineage earlier than this won't work. I of course can't predict what will happen with versions after this one, so if you update later, beware!
This version of the compiler produces a PMEMD that is roughly 5-10% faster than the PMEMD produced by ifc 7.1, and I have run regressions (21 tests) on a dual xeon, with 1 and 2 processors, and all tests pass. However, when Intel giveth, Intel tends to also take away. There are new issues in getting pmemd to build and run with mpich. These are not insurmountable; it is just disappointing that you have to do the workarounds. So here are the details for pmemd 8 on redhat 9 or rhel 3. This probably also applies to redhat 8, but I don't have a test system.
First of all, here is a sample config.h for a uniprocessor build, followed by a sample config.h for an mpich build, both for the pentium and redhat 9 or rhel 3:
#!/bin/csh -f
setenv PREPROCFLAGS "-DDIRFRC_VECT_OPT"
setenv CPP "/lib/cpp -traditional "
setenv OPT_LO "ifort -c -auto -tpp7 -mp1 -O0"
setenv OPT_MED "ifort -c -auto -tpp7 -mp1 -O2"
setenv OPT_HI "ifort -c -auto -tpp7 -xW -mp1 -ip -O3"
setenv LOAD "ifort"
setenv LOADLIB " -limf -lsvml "

#!/bin/csh -f
setenv MPICH_HOME /opt/pkg/mpi
setenv MPICH_INCLUDE $MPICH_HOME/include
setenv MPICH_LIBDIR $MPICH_HOME/lib
setenv MPILIB "-L$MPICH_LIBDIR -lmpich"
setenv PREPROCFLAGS "-DMPI -DSLOW_NONBLOCKING_MPI -DDIRFRC_VECT_OPT"
setenv CPP "/lib/cpp -traditional -I$MPICH_INCLUDE"
setenv OPT_LO "ifort -c -auto -tpp7 -mp1 -O0"
setenv OPT_MED "ifort -c -auto -tpp7 -mp1 -O2"
setenv OPT_HI "ifort -c -auto -tpp7 -xW -mp1 -ip -O3"
setenv LOAD "ifort"
setenv LOADLIB " -limf -lsvml $MPILIB"
Now for the additional caveats. With ifort 8, on either an IA32 chip or the itanium, the executables produced use a lot more stack if you actually use some of the more modern f90 capabilities (as pmemd does). Thus it is important (in csh or tcsh) to do a "limit stacksize unlimited" in your .login script (for sh and its variants I think you have to use ulimit, with different syntax). In all past experience, this was only required in .login. I don't know what has happened in ifort 8, but now for mpich runs it is necessary to put the "limit stacksize unlimited" in .cshrc (which all invocations of csh execute). This is very strange because limits are supposed to be inherited without any such action (kind of like environment variables - next topic).
Also, in the past you needed to source the appropriate intel fortran environment variable script in your .login (or .profile for sh-ish shells). Well, for mpi executables built by ifort 8, you also need to source the fortran environment variable script in .cshrc (or .bashrc or whatever). It is probably sufficient to just set LD_LIBRARY_PATH. An alternative to this is to put the intel libraries path in /etc/ld.so.conf, but you must then remember to run /sbin/ldconfig as root.
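For sh-family shells, the two .cshrc settings described above might look roughly like the following in ~/.bashrc. This is only a sketch and was not part of the testing reported here: the directory assigned to INTEL_FC_LIB is a hypothetical placeholder, so substitute wherever your compiler's runtime libraries actually live (or source Intel's own environment script instead).

```shell
# ~/.bashrc fragment: sh-family counterpart of the .cshrc settings above.

# Raise the soft stack limit in every shell, including those started for
# mpich processes. This fails quietly if the hard limit is lower, so run
# "ulimit -s" afterwards to see what you actually got.
ulimit -s unlimited 2>/dev/null || true

# ASSUMPTION: hypothetical install location of the ifort 8 runtime libraries.
INTEL_FC_LIB=/opt/intel_fc_80/lib

# Prepend to LD_LIBRARY_PATH without leaving a stray ":" when it was empty.
export LD_LIBRARY_PATH="$INTEL_FC_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Nothing is printed on purpose, since output from a shell startup file can upset remote, non-interactive logins.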
Thanks to David Konerding for help on all this; we were both testing stuff yesterday. If you are still running pmemd 3.1, by the way, and want to use it on these later redhat releases, please be sure to remove the -static option from whatever MACHINE file you use (due to a static threads library stack overflow issue, I believe - now threads code is used in all builds, due to intel library stuff).
One additional note on building mpich 1.2.5.2 under ifort 8. It DOES work, but it is fairly common to build it as root, and then it is important to remember to set up the intel fortran compiler environment (source the script in the root account). If you forget to do this, you get the errors in construction of mpif.h that have been reported on the ifc developer's forum (MPI_ADDRESS_KIND and MPI_OFFSET_KIND are set to 0; they should be 4 and 8, contrary to what it says on the forum).
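A quick way to check a finished mpich build for this problem is to look at the generated mpif.h. The helper below is a sketch: the name mpif_kinds_ok is made up for illustration, and it assumes the two values appear on lines of the form "PARAMETER (MPI_ADDRESS_KIND=4)", which may not match every mpich release.

```shell
# Return 0 if neither kind parameter in the given mpif.h is set to 0,
# 1 if one of them is 0, and 2 if no matching lines are found at all.
# ASSUMPTION: values appear on lines like "PARAMETER (MPI_ADDRESS_KIND=4)".
mpif_kinds_ok() {
    mpif="$1"
    # Show what the mpich configure step generated, for eyeballing.
    grep -E -i 'MPI_(ADDRESS|OFFSET)_KIND *=' "$mpif" || return 2
    # A value of 0 is the symptom of building without the ifort environment.
    if grep -E -i -q 'MPI_(ADDRESS|OFFSET)_KIND *= *0([^0-9]|$)' "$mpif"; then
        return 1
    fi
    return 0
}
```

With the config.h settings shown earlier, this could be run as: mpif_kinds_ok "$MPICH_HOME/include/mpif.h" || echo "rebuild mpich with the ifort environment sourced"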
Sorry for the mess. It's not my fault! What you save in cash on Linux systems, you pay back in other ways...
Regards - Bob Duke
Folks -
I earlier promised to post some info on configuring mpich for better performance when running pmemd (or for that matter, sander) jobs. I don't know if everyone already knows this stuff; if so I apologize. However, I found the MPICH manual to be a bit confusing, and not entirely correct or complete, and had to do a fair bit of testing to get a simple gigabit ethernet system to run pmemd with very low net latency.
The system I use is simple and relatively cheap ($8K US) - two 3.2 GHz dual xeon pc's running a current release of redhat linux, with a dedicated connection via a category 6 crossover cable and two ethernet cards based on the intel 82545 chip (my specific cards are the intel pro/1000 mt server adapter - you pay more for the server cards, but they are not outrageous, and they offload the cpu - thanks to Dave Konerding for the original suggestion). I don't need an ethernet switch with only 2 machines, but if you have more machines, you should get a really good switch, or it will be the bottleneck (others with vast experience on this issue, please suggest a list of good choices).
Configuring the OS and MPI
Okay, the way to get performance, at least in terms of reducing net latency, is to increase the socket buffer size. You actually need to do this at two levels:
1) At the OS level, you need to zap a couple of values that determine the upper limit allowed for socket buffers. This is, in my opinion, best done by adding the following two lines to the system file /etc/rc.d/rc.local (as root, of course), and rebooting:
echo 1048576 > /proc/sys/net/core/rmem_max
echo 1048576 > /proc/sys/net/core/wmem_max
This sets the upper limit on socket buffers to 1 MB.
2) At the level of mpi, you need to make the following entry in your .cshrc (or the equivalent commands in .bashrc, if you bash):
setenv P4_SOCKBUFSIZE 524288
This is the only way I have found, despite the other doc'd ways, to get mpich to use a bigger socket buffer, and it only applies to the ch_p4 device as far as I know. Here I am using half the max allowed by the r/wmem_max setting; I set the system to a larger maximum just in case I want to bump this value up without hassles sometime.
For big systems, you may also need to bump up P4_GLOBMEMSIZE; mine is at 8388608. If this value is too low, your run will fail but there will be a helpful error message. This, as far as I know, only has an impact on successful initialization as opposed to performance.
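The two levels have to agree: a socket buffer request larger than rmem_max or wmem_max is capped by the kernel without any error, so a sanity check can save some head scratching. The fragment below is a sketch for sh-family shells; sockbuf_fits is a made-up helper name, not anything provided by mpich.

```shell
# sh-family counterpart of "setenv P4_SOCKBUFSIZE 524288".
export P4_SOCKBUFSIZE=524288

# Succeed only if the requested buffer size fits under both kernel limits.
# Arguments: requested size, rmem_max, wmem_max (all in bytes).
sockbuf_fits() {
    [ "$1" -le "$2" ] && [ "$1" -le "$3" ]
}

# Example check against the live kernel settings (Linux only):
#   sockbuf_fits "$P4_SOCKBUFSIZE" \
#       "$(cat /proc/sys/net/core/rmem_max)" \
#       "$(cat /proc/sys/net/core/wmem_max)" \
#       || echo "P4_SOCKBUFSIZE exceeds the kernel socket buffer limit" >&2
```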
Okay, what do you get for your efforts? For the factor ix (~91K atom) problem running on 4 processors, performance improves by about 29%. This occurs because net latency drops from 27% to 5%. Worthwhile, in my opinion. Overall, using the above factor ix problem, you can get 238 psec/day out of 4 processors, costing you ~$8K, so that is not bad.
Specific data for various P4_SOCKBUFSIZE settings follows. The test is factor ix, constant pressure pme, 8 ang cutoff, 250 steps, .0015 psec step, on 4 3.2 GHz xeon processors.
P4_SOCKBUFSIZE, bytes    cpu time    wall clock time
default (16Kbytes?)      127.55      175 - note latency
16384                    127.52      174
32768                    128.04      155
65536                    126.86      144
131072                   128.46      138
262144                   128.78      137
524288                   129.13      136
1048576                  128.7       135
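The percentages and the psec/day figure quoted above can be recovered from the table with a little arithmetic. The sketch below assumes that "net latency" means the share of wall clock time not accounted for by cpu time, and that the times are in seconds; neither is stated outright, but both assumptions reproduce the quoted numbers. The function names are made up for illustration.

```shell
# Percent of wall clock time not accounted for by cpu time.
# ASSUMPTION: this is what "net latency" means in the text above.
latency_pct() {
    awk -v cpu="$1" -v wall="$2" 'BEGIN { printf "%.1f\n", 100 * (wall - cpu) / wall }'
}

# Percent speedup going from a slower wall clock time to a faster one.
speedup_pct() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", 100 * (slow / fast - 1) }'
}

# Simulated psec per day from step count, step size (psec), wall time.
# ASSUMPTION: the wall clock times in the table are in seconds.
psec_per_day() {
    awk -v steps="$1" -v dt="$2" -v wall="$3" 'BEGIN { printf "%.0f\n", steps * dt * 86400 / wall }'
}

latency_pct 127.55 175        # default buffer size
latency_pct 129.13 136        # P4_SOCKBUFSIZE = 524288
speedup_pct 175 136           # default vs. 524288
psec_per_day 250 .0015 136    # 250 steps of .0015 psec in 136 s
```

This prints 27.1, 5.1, 28.7 and 238, in line with the 27%, 5%, about 29% and 238 psec/day given above.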
Other things to watch out for:
Another source of potential misery for the user of a small setup revolves around setting up the process group file(s). The only way I have succeeded in getting processes running in the right places when I have a dedicated connection (i.e., I am using mpi over something other than the system's main net card) is to edit a pgfile and point to it in the mpirun command. Thus, my default pgfile looks like:
tiger_hs 1 /work/exe/pmemd
lion_hs 2 /work/exe/pmemd
and I start jobs using "mpirun -p4pg ~/pgfile /work/exe/pmemd".
What the pgfile above does is specify the correct nic card (*_hs in my hosts file) and specify the number of processes per system. I start jobs from tiger, so you decrement the process count by one (the pgfile format somewhat insanely specifies the number of ADDITIONAL processes to start -geez). So this pgfile will start 4 processes. If you are not getting the performance you expect, look at the pmemd logfile output. If the cpu utilization is uneven by more than 5% or so, or if there are not the number of processors you anticipated, something is wrong with where and how mpich is running your jobs.
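Since the off-by-one in the pgfile is easy to get wrong, here is a small sketch that counts how many processes a given pgfile should start: the one mpirun launches on the machine you start from, plus the ADDITIONAL counts in the second column. The helper name pgfile_total is made up for illustration, and it assumes the simple three-column layout shown above.

```shell
# Total processes an "mpirun -p4pg" run should start for a given pgfile:
# the process started locally plus the ADDITIONAL counts in column 2.
# ASSUMPTION: every non-blank line is "host count /path/to/executable".
pgfile_total() {
    awk 'NF >= 2 { extra += $2 } END { print extra + 1 }' "$1"
}
```

For the pgfile above, "pgfile_total ~/pgfile" prints 4, which should match the processor count reported in the pmemd logfile.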
I PRESUME this sort of thing is not necessary if you are not using dedicated nic's that differ from the nic's pointed at by hostname. Then the machines.LINUX file under mpich/share could probably just contain:
tiger:2
lion:2
and everything would be great. This absolutely does not cause the right things to happen for an mpirun -np 4 on my systems, though.
This stuff is way more complicated than it should be. Anyone with influence with the mpich folks should maybe point that out ;-) Most folks hopefully get insulated from all this junk by their sys admins.
Regards - Bob Duke