Saturday, October 10, 2015

Running jobs with the proper software version via modulefiles on a supercomputer

Motivation

On an HPC (high-performance computing) cluster, or supercomputer, it is common to have multiple versions of the same software installed. When a user wants a specific version of a package built against a particular library, changing environment variables such as $PATH by hand is tedious and sometimes difficult, and it is a common source of errors when running jobs. Solving this with hand-written setup scripts means a corresponding script has to be written for each shell language in use. All of these explicit steps are error-prone and hard to maintain, and they offer no simple upgrade path. One existing solution is modulefiles, a Unix tool for dynamically modifying a user's environment.

Module Commands

Here are some module commands:
  • module avail: list the available modulefiles
  • module list: view the currently loaded modules
  • module purge: unload all currently loaded modules
  • module load <package>: load the default version of module <package>
  • module load <package/version>: load a specific version of module <package>
  • module unload <package>: unload the package
  • module show <package>: display the commands triggered by module load
  • module whatis <package>: display one-line info about the module
  • module help <package>: display the help text of the module
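
To make concrete what these commands save you from, here is a minimal sketch of the PATH edits that module load and module unload roughly automate. The helpers prepend_path and remove_path are hypothetical and written only for this illustration; they are not part of the modules tool, and DEMO_PATH is a stand-in so that the real $PATH is never touched.

```shell
#!/bin/sh
# Hand-rolled illustration of the kind of PATH edits that `module load` and
# `module unload` automate. prepend_path and remove_path are hypothetical
# helpers for this sketch, not part of Environment Modules.

# Put a directory at the front of a colon-separated list.
prepend_path() (
    dir=$1 list=$2
    if [ -z "$list" ]; then
        printf '%s\n' "$dir"
    else
        printf '%s\n' "$dir:$list"
    fi
)

# Drop every occurrence of a directory from a colon-separated list.
# The body runs in a subshell, so the IFS change stays local.
remove_path() (
    dir=$1 list=$2 out=""
    set -f          # no filename expansion while splitting
    IFS=':'
    for entry in $list; do
        if [ "$entry" != "$dir" ]; then
            out="${out:+$out:}$entry"
        fi
    done
    printf '%s\n' "$out"
)

DEMO_PATH="/usr/local/bin:/usr/bin:/bin"   # stand-in for $PATH
FFTW_BIN="/opt/fftw3/bin"                  # example directory to add

LOADED=$(prepend_path "$FFTW_BIN" "$DEMO_PATH")   # roughly the effect of a load
UNLOADED=$(remove_path "$FFTW_BIN" "$LOADED")     # roughly the effect of an unload

echo "after load:   $LOADED"
echo "after unload: $UNLOADED"
```

A modulefile declares such edits once, for every variable the package needs, so the same load and unload commands work for all users.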

Task/Exercise

Do the following tasks to load the fftw3 package via modulefiles and test it with a job.
  • Download FFTW3 and install the package on the virtual server
  • Create a module for the FFTW package (filename: fftw3)
  • Run the code within a job script
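
For the second item, a modulefile is a small Tcl script. The following is a minimal sketch, not the file from the GitHub repository: it assumes the install prefix /opt/fftw3 and writes the modulefile into a scratch directory (a stand-in for the system modulefiles path) so that it can be tried without root access. The choice of variables (PATH, CPATH, LIBRARY_PATH, MANPATH) is an assumption that suits a static build compiled with GCC.

```shell
#!/bin/sh
# Sketch: write a minimal modulefile for FFTW3 into a scratch directory.
# Assumptions: the install prefix is /opt/fftw3, and the scratch directory
# stands in for the real modulefiles path so no root access is needed.

MODDIR=$(mktemp -d)        # stand-in for the system modulefiles directory
MODFILE="$MODDIR/fftw3"    # filename from the task: fftw3

# The quoted 'EOF' keeps $prefix literal, so Tcl (not the shell) expands it.
cat > "$MODFILE" <<'EOF'
#%Module1.0
## Illustrative modulefile for FFTW3; adjust prefix to the real install path.

proc ModulesHelp { } {
    puts stderr "Sets up the environment for the FFTW3 library."
}

module-whatis "FFTW3 fast Fourier transform library"

set prefix /opt/fftw3

prepend-path PATH         $prefix/bin
prepend-path CPATH        $prefix/include
prepend-path LIBRARY_PATH $prefix/lib
prepend-path MANPATH      $prefix/share/man
EOF

echo "modulefile written to $MODFILE"
cat "$MODFILE"
```

On a machine with the modules tool installed, running module use on the scratch directory should make module avail list fftw3 without copying the file anywhere.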

Solution

Here is my solution for the given task (code hosted on GitHub):
  1. Log in to the master/frontend node of the cluster as root
  2. Download the latest fftw3
  3. Make the directory /opt/fftw3
  4. Install fftw3 in single precision with:
    $ ./configure --prefix=/opt/fftw3 --disable-shared \
    --enable-static --enable-single --enable-fortran
    $ make
    $ make install
    
  5. Install fftw3 in double precision with:
    $ make clean
    $ ./configure --prefix=/opt/fftw3 --disable-shared \
    --enable-static --enable-fortran
    $ make
    $ make install
    
  6. Set the prefix in the modulefile fftw3 to the installation path (/opt/fftw3)
  7. Move fftw3 to the modulefiles path (/usr/share/Modules/modulefiles; check it with module avail)
  8. Compile the job
    $ make all
    
  9. Load the module
    $ module load fftw3
    
  10. Make a bash script for the job (diffusion.sh)
  11. Submit the job to the cluster
    $ qsub diffusion.sh
  12. Check whether there is any error (XX is the output number):
    $ qstat -f
    $ more diffusion.sh.oXX
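
Steps 10 to 12 can be sketched as the job script below. It is not the diffusion.sh from the GitHub repository. It assumes a PBS/Torque-style scheduler (qsub and qstat also exist in other schedulers, which use different directives) and a hypothetical executable ./diffusion produced by make all. The guards are there because a batch shell does not always define the module command.

```shell
#!/bin/bash
# diffusion.sh: sketch of a batch script for steps 10 to 12.
# Assumptions: a PBS/Torque-style scheduler (the #PBS line is a plain comment
# anywhere else) and a hypothetical executable ./diffusion built by `make all`.
#PBS -j oe

# Start from the directory where qsub was called; fall back to the current one.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Batch shells do not always define `module`, so check before calling it.
if command -v module >/dev/null 2>&1; then
    module load fftw3
    module list
else
    echo "module command not available in this shell" >&2
fi

# Run the program if it exists; otherwise record a "not found" status.
EXE=./diffusion
if [ -x "$EXE" ]; then
    "$EXE"
    STATUS=$?
else
    echo "executable $EXE not found" >&2
    STATUS=127
fi

# The status is only reported here; a real job script might instead end
# with: exit "$STATUS"
echo "job finished with status $STATUS"
```

Under the PBS/Torque assumption, the -j oe directive merges the error stream into the output stream, so both end up in the one diffusion.sh.oXX file that step 12 inspects.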
    