Environment on the RRZE HPC systems

We aim to provide an environment across the RRZE production cluster systems that is as homogeneous as possible. This page describes this environment.

Available software on the HPC systems

We provide compilers and some standard libraries on the clusters via the modules system.

If you need additional libraries or software, we will only install them globally if there is demand from more than a handful of users. If your group is the only one using a piece of software, simply install it into your home directory.

The only commercial software we provide on all clusters are the Intel compilers and related tools.
For any other commercial software, HPC@RRZE will NOT provide any licenses. If you want to use commercial software, you will need to bring the license with you. This also applies to software sub-licensed from the RRZE software group. All calculations you run on the clusters will draw licenses from your license pool. Please try to clarify any licensing questions before contacting us, as we really do not plan to become experts in software licensing.

modules system

On all RRZE HPC systems, established tools for software development (compilers, editors, …), libraries, and selected applications are available. For many of these applications, it is necessary to set special environment variables, so that e.g. search paths are correct or license servers can be found.

To ease selection of and switching between different versions of software packages, all HPC systems at RRZE use the modules system (cf. modules.sourceforge.net). It allows you to conveniently load the necessary configurations for different programs or different versions of the same program and, if necessary, unload them again later.

Important module commands

Overview of the most important module commands
Command – What it does
module avail – lists all available modules
module whatis – shows a more verbose listing of all available modules
module list – shows which modules are currently loaded
module load <pkg> – loads the module pkg, i.e. applies all settings necessary for using the package pkg (e.g. search paths)
module load <pkg>/version – loads a specific version of the module pkg instead of the default version
module unload <pkg> – removes the module pkg, i.e. undoes what the load command did
module help <pkg> – shows a detailed description of the module pkg
module show <pkg> – shows which environment variables the module pkg actually sets/modifies

General hints for using modules

  • Modules always affect only the current shell.
  • If individual modules are to be loaded all the time, you can put the command into your login scripts, e.g. into $HOME/.bash_profile – but keep in mind that your home directory is shared among all HPC clusters and that the available modules are likely to differ between systems.
  • The syntax of the module commands is independent of the shell used. They can thus usually be used unmodified in any type of PBS job script.
  • Some modules cannot be loaded together. In some cases such a conflict is detected automatically during the load command, in which case an error message is printed and no modifications are made.
  • Modules can depend on other modules, which are then loaded automatically when you load the module. It is also possible to define default versions for modules. As an example, the current Intel compiler modules depend on IntelMPI and the Intel MKL and load these automatically. If you load just the module intel64, you will get the current default Intel compiler version for that cluster. If you want to ensure a specific version, append the version number, e.g. intel64/19.0up02 (see the example session after this list).
  • A current list of all available modules can be retrieved with the command module avail.
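
For illustration, a typical session could look like the following sketch (the version shown is only the example from above and may not be available on every cluster):

$ module avail                     # list all available modules
$ module load intel64/19.0up02     # load a specific Intel compiler version
$ module list                      # check which modules are now loaded
$ module show intel64/19.0up02     # inspect which environment variables it sets
$ module unload intel64            # undo the settings again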

Important standard modules

Important standard modules, available on most or all clusters
intel64 – This is probably the most used module by far: it loads the currently recommended version of the Intel compilers for the respective cluster. Note that this will not always be the same version across clusters. This module depends on and automatically loads Intel MPI and MKL on most clusters. If you want to use a different MPI variant, do NOT load this module, but load the module for that MPI variant instead.
openmpi – Loads some version of OpenMPI and the matching compiler. Note that OpenMPI is not the MPI variant recommended by RRZE, but we provide it because some users have had better experience with it than with the default IntelMPI.
gcc – Some version of the GNU compiler collection. Please note that all systems naturally have a default gcc version that is delivered together with the operating system and is always available without loading any module. However, that version is often somewhat dated, so on some clusters we provide a gcc module with a newer version.

Some hints which can simplify the usage of modules in Makefiles:

  • When using MPI modules, the environment variables MPICHROOTDIR and MPIHOME are set to the root directory of the respective MPICH version. Include files and libraries can therefore be accessed via $MPIHOME/include and $MPIHOME/lib.
  • Analogously, the environment variables INTEL_C_HOME and INTEL_F_HOME are set to the respective root directories when the Intel compiler modules are used. This can be helpful when Fortran and C++ objects have to be linked together and the respective libraries have to be added manually.
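
A minimal sketch of how these variables can be used on the command line or from a Makefile; note that the MPI library name and the lib/intel64 sub-directory layout are assumptions that may differ between installations:

$ icc -I$MPIHOME/include -c myApp.c                                            # compile against the MPI headers
$ icc myApp.o -L$MPIHOME/lib -lmpi -o myApp                                    # link against the MPI library (library name assumed)
$ ifort f_part.o cpp_part.o -L$INTEL_C_HOME/lib/intel64 -lstdc++ -o mixedApp   # mixed Fortran/C++ link (path layout assumed)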


shells

In general, two types of shells are available on the HPC systems at RRZE:

  • csh, the C shell, usually in the form of the feature-enhanced tcsh instead of the classic csh.
  • bash

csh used to be the default login shell for all users, not because it is a good shell (it certainly isn't!), but simply for "historical reasons". Since around 2014, the default shell for new users has been bash, which most people who have used any Linux system will be familiar with. The newer clusters (starting with Emmy) always enforce bash as the login shell, even for old accounts. If you have one of those old accounts still using csh and want to switch to bash on the older clusters too, you can contact the ServiceTheke or the HPC team to have your login shell changed.


Software Development

You will find a wide variety of software packages in different versions installed on the cluster frontends. The module concept is used to simplify the selection and switching between different software packages and versions. Please see the section on batch processing for a description of how to use modules in batch scripts.

Compilers

Intel

Intel compilers are the recommended choice for software development on all clusters. A current version of the Fortran90, C and C++ compilers (called ifort, icc and icpc, respectively) can be selected by loading the intel64 module. For use in scripts and makefiles, the module sets the shell variables $INTEL_F_HOME and $INTEL_C_HOME to the base directories of the compiler packages.

As a starting point, try the option combination -O3 -xHost when building objects. All Intel compilers have a -help switch that gives an overview of all available compiler options. For in-depth information, please consult the local documentation in $INTEL_[F,C]_HOME/doc/ and Intel's online documentation for their compiler suite (currently named "Intel Parallel Studio XE").
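
For example, a minimal build could look like this (file and binary names are placeholders):

$ module load intel64
$ icc -O3 -xHost myApp.c -o myApp    # use ifort or icpc analogously for Fortran or C++
$ icc -help                          # overview of all available compiler options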

Endianness

All x86-based processors use the little-endian storage format, which means that the least-significant byte (LSB) of multi-byte data is stored at the lowest memory address. The same format is used in unformatted Fortran data files. To simplify the handling of big-endian files (e.g. data you have produced on IBM Power, Sun Ultra, or NEC SX systems), the Intel Fortran compiler can convert the endianness on the fly during read or write operations. This can be configured separately for different Fortran units. Just set the environment variable F_UFMTENDIAN at run time.

Examples:

Effect of the environment variable F_UFMTENDIAN
Value – Effect
big – everything treated as big-endian
little – everything treated as little-endian (default)
big:10,20 – everything treated as little-endian, except for units 10 and 20 (big-endian)
"big;little:8" – everything treated as big-endian, except for unit 8 (little-endian)
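
At run time, this could be used as follows (the binary name is a placeholder):

$ export F_UFMTENDIAN="big:10,20"    # units 10 and 20 contain big-endian data
$ ./myFortranApp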

GCC

The GNU compiler collection (GCC) is available directly without having to load any module. However, do not expect to find the latest GCC version there. Typically, several versions are additionally installed on all systems and made available via environment modules, e.g. module load gcc/<version>.

Be aware that the default IntelMPI and OpenMPI modules assume the Intel compiler. When using GCC, the corresponding module intelmpi/XX-gnu or openmpi/XX-gcc has to be loaded.
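
A sketch of a GCC-based tool chain; <version> and XX stand for whatever module avail lists on the respective cluster:

$ module load gcc/<version>      # newer GCC than the system default
$ module load openmpi/XX-gcc     # MPI wrappers that call gcc instead of the Intel compiler
$ mpicc -O3 myApp.c -o myApp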

MPI Profiling with Intel Trace Collector/Analyzer

Intel Trace Collector/Analyzer are powerful tools that acquire/display information on the communication behavior of an MPI program. Performance problems related to MPI can be identified by looking at timelines and statistical data. Appropriate filters can reduce the amount of information displayed to a manageable level.

In order to use Trace Collector/Analyzer, you have to load the itac module. This section describes only the most basic usage patterns. Complete documentation can be found on Intel's ITAC website or in the Trace Analyzer Help menu. Please note that tracing is currently only possible when using Intel MPI; therefore, the corresponding intel64 and intelmpi modules have to be loaded.

Trace Collector (ITC)

ITC is a tool for producing trace files from a running MPI application. These traces contain information about all MPI calls and messages and, optionally, about functions in the user code.

It is possible to trace your application without rebuilding it by dynamically loading the ITC profiling library during execution. The library intercepts all MPI calls and generates a trace file.  To start the trace, simply add the -trace option to your mpirun command, e.g.:

$ mpirun -trace -n 4 ./myApp

In some cases, your application has to be rebuilt to trace it, for example if it is statically linked against the MPI library or if you want to add user function information to the trace. To include the required libraries, you can use the -trace option during compilation and linking. Your application can then be run as usual, for example:

$ mpicc -trace myApp.c -o myApp
$ mpirun -n 4 ./myApp

You can also specify other profiling libraries; for a complete list, please refer to the ITC User Guide.

After an MPI application that has been compiled or linked with ITC has terminated, a collection of trace files is written to the current directory. They follow the naming scheme <binary-name>.stf* and serve as input for the Trace Analyzer tool. Keep in mind that, depending on the amount of communication and the number of MPI processes used, these trace files can become quite large. To generate a single file instead of several smaller ones, specify the option -genv VT_LOGFILE_FORMAT=SINGLESTF in your mpirun call.
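
Combined with the dynamic tracing shown above, such a call could look like this:

$ mpirun -genv VT_LOGFILE_FORMAT=SINGLESTF -trace -n 4 ./myApp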

Trace Analyzer (ITA)

The <binary-name>.stf file produced after running the instrumented MPI application should be used as an argument to the traceanalyzer command:

traceanalyzer <binary-name>.stf

The trace analyzer processes the trace files written by the application and lets you browse through the data. Click on "Charts-Event Timeline" to see the messages transferred between all MPI processes and the time each process spends in MPI and application code, respectively. Clicking and dragging lets you zoom into the timeline data (zoom out with the "o" key). "Charts-Message Profile" shows statistics about the communication requirements of each pair of MPI processes. The statistics displays change their content according to the data currently shown in the timeline window. Please consult the Help menu or the ITAC User Guide for more information. Additionally, the HPC group of RRZE will be happy to work with you on gaining insight into the performance characteristics of your MPI applications.

Parallel Computing

The intended parallelization paradigm on all clusters is either message passing using the Message Passing Interface (MPI) or shared-memory programming with OpenMP.

IntelMPI

Intel MPI supports different compilers (GCC, Intel). If you use the Intel compilers, the appropriate intelmpi module is loaded automatically when you load the intel64 compiler module. The standard MPI scripts mpif77, mpif90, mpicc and mpicxx are then available. By loading an intelmpi/XXX-gnu module instead of the default intelmpi, these scripts will use GCC instead.

There are no special prerequisites for running MPI programs. Just use

mpirun -n <num_procs> [<options>] your-binary your-arguments

The parameter -n <num_procs> is mandatory to specify how many processes should be started.

By default, one process will be started on each allocated CPU in a blockwise fashion, i.e. the first node is filled completely, followed by the second node, and so on. If you want to start fewer processes per node (e.g. because of large memory requirements), you can specify the -ppn <num_procs> option to mpirun to define the number of processes per node.
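
For illustration (the numbers and the binary name are arbitrary examples), the following starts 8 processes in total, but only 2 per node:

$ mpirun -n 8 -ppn 2 ./myApp myArgs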

We do not support running MPI programs interactively on the frontends. To do interactive testing, please start an interactive batch job on some compute nodes. During working hours, a number of nodes is reserved for short (< 1 hour) tests.

The MPI start mechanism passes all environment variables that are set in the shell where mpirun is invoked on to all MPI processes. It is therefore not necessary to change your login scripts in order to export things like OMP_NUM_THREADS, LD_LIBRARY_PATH, etc.

It is possible to use process binding to specify the placement of the processes on the architecture. This may increase the speed of your application, but also requires advanced knowledge about the system’s architecture. When no options are given, default values are used. This is the recommended setting for most users. More information about process binding can be found here.

OpenMPI

We mainly support IntelMPI and therefore recommend using it whenever possible. If necessary, however, OpenMPI is available via the modules system. Loading the openmpi/XX-intel or openmpi/XX-gcc module will automatically also load the respective compiler module. The standard MPI compiler wrappers mpicc, mpicxx and mpifort are then available.

The usage of OpenMPI is very similar to IntelMPI:

mpirun -n <num_procs> [<options>] your-binary your-arguments

The parameter -n <num_procs> or -np <num_procs> specifies the number of processes to run. Additionally, you can define the number of processes per socket with -npersocket <num_procs> or the number of processes per node via -npernode <num_procs>.
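
For example (process counts and binary name are arbitrary):

$ mpirun -n 16 -npernode 4 ./myApp      # 16 processes in total, 4 per node
$ mpirun -n 16 -npersocket 2 ./myApp    # 16 processes in total, 2 per socket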

OpenMPI also supports process binding via options for mpirun. Further details can be found here.

OpenMP

The installed compilers support at least the relevant parts of the recent OpenMP standards. The compilers recognize OpenMP directives if you supply the appropriate command-line option: -fopenmp for GCC and -qopenmp for the Intel compilers. The option is also required for the link step.

To run an OpenMP application, the number of threads has to be specified via the environment variable OMP_NUM_THREADS. If it is not set, a default value is used; in most cases the default is 1, which means that your code is executed serially. If you want to use, for example, 12 threads in the parallel regions of your program, set the environment variable with export OMP_NUM_THREADS=12.
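
Putting this together, a minimal sketch (source and binary names are placeholders):

$ icc -qopenmp myOmpApp.c -o myOmpApp    # use -fopenmp instead when compiling with gcc
$ export OMP_NUM_THREADS=12
$ ./myOmpApp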

OpenMP Pinning

To reach optimum performance with OpenMP codes, correct pinning of the OpenMP threads is essential. Since practically all machines nowadays are ccNUMA systems, where incorrect or missing pinning can have devastating effects on performance, this is something that should not be ignored.

A comfortable way to pin your OpenMP threads to processors is likwid-pin, which is available within the likwid module on all clusters. You can start your program using the following syntax:

likwid-pin -c <cpulist> <executable>

There are various ways to specify the CPU list, depending on the hardware setup and the requirements of your application. A short summary is available by calling likwid-pin -h. More detailed documentation can be found in the Wiki.
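
A sketch for a 12-thread run pinned to the first 12 cores; whether this CPU list makes sense depends on the hardware at hand:

$ module load likwid
$ export OMP_NUM_THREADS=12
$ likwid-pin -c 0-11 ./myOmpApp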

An alternative way of pinning is using OpenMP specific methods. More information about this is available in the HPC Wiki.

Libraries

Intel Math Kernel Library (MKL)

The Math Kernel Library provides threaded BLAS, LAPACK, and FFTW routines and some supplementary functions (e.g., random number generators). For distributed-memory parallelization, there are also ScaLAPACK and CDFT (cluster DFT), together with some sparse solver subroutines. It is highly recommended to use MKL for any kind of linear algebra if possible. To facilitate the choice of functions for a specific use case, you can refer, for example, to the Intel MKL LAPACK function finding advisor.

After loading the mkl module, several shell variables become available that help with compiling and linking programs that use MKL. The installation directory can be found in $MKLROOT; other useful environment variables can be displayed with module show mkl/XX. For most applications, it is sufficient to compile and link your program with -mkl. For more complex applications, you can find out which libraries are recommended by using the Intel MKL link line advisor.
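
A minimal sketch, assuming the Intel compiler is used (the source file name is a placeholder):

$ module load intel64
$ module load mkl
$ icc -O3 -xHost -mkl myBlasApp.c -o myBlasApp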

Many MKL routines are threaded and can run in parallel if you set the OMP_NUM_THREADS shell variable to the desired number of threads. If you do not set OMP_NUM_THREADS, the default number of threads is one. Using OpenMP together with threaded MKL is possible, but the OMP_NUM_THREADS setting will then apply to both your code and the MKL routines. If you do not want this, you can force MKL into serial mode by setting the MKL_SERIAL environment variable to YES.
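
For example:

$ export OMP_NUM_THREADS=8    # your OpenMP code and the threaded MKL routines both use 8 threads
$ ./myBlasApp
$ export MKL_SERIAL=YES       # force MKL into serial mode, independent of OMP_NUM_THREADS
$ ./myBlasApp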

For more in-depth information, please refer to Intel’s online documentation on MKL.