Idaho State University

Beowulf REsource for Monte-carlo Simulations

 

Directors:

  Dr. Phil Cole
  Dr. Eduardo Farfan
 

Hardware

Brems is made up of 21 nodes: 12 nodes with two single-core 2.0 GHz Opteron CPUs, 2 GB of ECC RAM, and a 40 GB hard drive, and 9 nodes with two dual-core 2.0 GHz Opteron CPUs, 4 GB of ECC RAM, and an 80 GB hard drive, for a total of 60 processing cores (12 x 2 + 9 x 4). The head node also has 1.75 TB of redundant RAID5 storage.

Operating System

Brems runs Debian Linux 3.1 (Sarge) for AMD64 with a vanilla 2.6.16 64-bit kernel and a full 64-bit userspace. The kernel also supports 32-bit executables, and a full 32-bit Debian 3.1 Sarge install is available in a chroot for compatibility.
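
To illustrate, a 32-bit shell or command could be started inside the chroot (the /ia32 path described under Filesystem below) with something like the commands here. This is only a sketch; whether plain chroot as root, dchroot, or some other wrapper is used depends on how the chroot has been configured locally.

chroot /ia32 /bin/bash
linux32 chroot /ia32 uname -m

The second command should report i686 rather than x86_64, confirming the 32-bit personality inside the chroot.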

Network

The Brems head node can be accessed via ssh at either brems.physics.isu.edu or brems.iac.isu.edu. Within the cluster (via a copied /etc/hosts file) the nodes can be referenced as brems1 ... brems21, with corresponding IP addresses 10.0.200.1 ... 10.0.200.21. All nodes are connected to the same gigabit switch with one gigabit link each, giving a flat topology. Brems has a 100 Mbit link to the campus network and a static IP of 134.50.3.200.
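
For reference, the relevant entries in the shared /etc/hosts file presumably look something like this (abbreviated):

10.0.200.1     brems1
10.0.200.2     brems2
...
10.0.200.21    brems21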

Communication

ssh is the remote shell used for initiating communication between nodes on the cluster. Users should be set up with appropriate authorized_keys and known_hosts files to allow password-less and prompt-less login between nodes. Although programs that push a lot of data through stdin/stdout may be hurt by the encryption overhead of ssh, most programs and libraries use their own sockets for the actual computation data and will be minimally impacted.
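
Because /home is shared across the nodes (see Filesystem below), setting up password-less, prompt-less logins typically amounts to something like the following; treat these commands as a sketch rather than the official procedure:

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
for i in $(seq 1 21); do ssh-keyscan -t rsa brems$i; done >> ~/.ssh/known_hosts

The first three commands create a passphrase-less key and authorize it (once is enough, since every node sees the same /home); the last pre-populates known_hosts so ssh never prompts about unknown host keys.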

Filesystem

/home resides on the head node and is shared read/write across all of the nodes and chroots.

/ia32 is a full 32-bit Debian install, provided for compatibility.

/brems contains programs/libraries custom compiled for the cluster.

/brems/mcnpxdata/ and /brems/geant4_data contain physics data/libraries for these applications.

/brems/mpich* contains the various mpich libraries. Directories ending in _32 are compiled as 32-bit for program (MCNPX) compatibility.

/brems/mpich is a symbolic link to the newest known working mpich install.

/brems/mpich*/share/machines.* contains the various "architecture" files used with mpirun.

/brems/bin should be in all users' PATH; it contains the cluster-specific executables.

/nodes/default/ is the template for the root filesystem that the nodes rsync when they boot from the network. All changes to the nodes' root filesystem should be made here and then pushed out, to avoid inconsistencies when nodes are re-imaged.
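
As an illustration only (the exact push mechanism used on Brems may differ, and this assumes the head node is reachable as brems1 inside the cluster), a change made under /nodes/default/etc/ could be propagated to the running nodes with:

cexec 'rsync -a brems1:/nodes/default/etc/ /etc/'

which makes each slave node pull the updated files from the template on the head node.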

Utilities

For simple administration tasks, the C3 tools from ORNL (http://www.csm.ornl.gov/torc/C3/) are installed in /opt/c3-4/; the most useful scripts are symlinked into /brems/bin/.

cexec is a script that runs a command on the nodes; by default it runs on all the slave nodes. Use "cexec hostname" to verify ssh permissions and to see which nodes are being reached. A range of nodes can be selected with ":a-b" immediately after cexec, where a is less than or equal to b and positions count from 0 starting with brems2, so a node's position is its number minus 2. Example: "cexec :2-3 hostname" returns:

************************* brems *************************
--------- brems4---------
brems4.iac.isu.edu
--------- brems5---------
brems5.iac.isu.edu

cps is a cluster version of ps. It runs a custom ps script on each of the nodes, giving a quick overview of the top process on each CPU.

ckill is a cluster version of pkill. "ckill pattern" sends a kill signal to every process on the affected nodes that contains "pattern" in its name. Example: "ckill mcnpx" kills the mcnpx process on all the slave nodes.

mpirun is the program used to launch MPI (mpich 1.2.*) programs. Its basic usage is "mpirun -arch LINUX -np 25 mcnpx-mpi", where LINUX specifies that /brems/mpich/share/machines.LINUX contains the correct list of nodes to run on, 25 is the number of processes to start (for MCNPX, 25 means 24 compute processes and 1 master process), and mcnpx-mpi is the name of an executable compiled with MPI support.
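
For reference, an MPICH machines file is simply a list of hostnames, optionally followed by ":n" to allow n processes on that host. The entries in machines.LINUX therefore look roughly like the following (abbreviated; exactly which nodes carry a 2 or a 4 is an assumption here, matching the dual-CPU and dual dual-core nodes):

brems2:2
brems3:2
...
brems21:4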



Compilation

The cluster has gcc/g++/g77 versions 3.3 and 3.4 installed. Running gcc defaults to gcc 3.4; a specific version can be selected by using its full name, e.g. "g77-3.3".

The cluster also has the commercial Portland Group compilers (http://www.pgroup.com) installed in /usr/pgi/; the 64-bit executables are in /usr/pgi/linux86-64/6.0/bin/. Generated executables are portable to other machines but require a free compatibility library from http://www.pgroup.com.

To use the PGI compiler tools you need to run the following 2 commands:

export PATH=$PATH:/usr/pgi/linux86-64/6.0/bin
export LM_LICENSE_FILE=/usr/pgi/license.dat

pgcc is the C compiler, pgCC is the C++ compiler, and pgf77 and pgf90 are the Fortran compilers. Documentation for the PGI compilers is available at http://brems.iac.isu.edu/documentation/pgi_6.0/index.htm
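
As a quick sanity check (the file names here are just placeholders), a Fortran or C program can then be compiled with:

pgf77 -O2 -o myprog myprog.f
pgcc -O2 -o mytool mytool.c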

MPI programs can be compiled using the MPI compiler wrapper scripts in /brems/mpich*/bin/. They are front ends to the compiler that each mpich installation was configured with (most often the PGI compilers).
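
For example (file names are placeholders, and the wrapper names assume the usual MPICH convention of mpicc/mpif77/mpif90):

/brems/mpich/bin/mpif77 -O2 -o hello-f hello.f
/brems/mpich/bin/mpicc -O2 -o hello-c hello.c
mpirun -arch LINUX -np 4 hello-f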