FAU archive

*** Attention: THE SYSTEM AND THE DOCUMENTATION ARE STILL IN EARLY BETA TEST !!! ***

Applying for space

In order to use the FAU archive you need to fill out the application form.

The archived data will (by default) have a retention time of 10 years (i.e. be kept for 10 years on tape). Choose the retention period with care, because it cannot be altered afterwards, and data will be deleted without further notice once the retention period has expired! Checking the expiration date of your archived files (see below on how to do that) is your responsibility!
The default of 10 years was chosen because that is usually what funding agencies such as the DFG require.
There are also archive classes to keep data only 1, 2, or 5 years.
There are also archive classes with 25 and 99 years, which are much more expensive. However, please be aware that RRZE cannot really guarantee such long periods; in fact, it is quite unrealistic that the data will be kept that long without a system change. These classes are merely intended as a marker that you probably want to or have to keep this data for a very long time.

Using the Archive

The archive can only be accessed from a specific server.

Log on to fundusa1.rrze.uni-erlangen.de using SSH. This frontend can only be reached from the university network (including VPN).

Your RRZE Linux home directory as well as the HPC NFS file systems are (auto)mounted on fundusa1. Your RRZE Windows home directory can be found at /home/rzwin/$USER. Group IDs on the HPC file systems may not resolve correctly on fundusa1.

If you stay logged into fundusa1 for a long time (at the time of writing, more than 1 day), your Kerberos ticket will expire and access to the RRZE home directories will stop working, resulting in "Permission denied" errors when trying to access them. To solve this issue, just log in again. Advanced users may also extend their Kerberos tickets a few times (up to a maximum of 1 week); the tool to look for is krenew.

Local storage is available on fundusa1 in case you need to transfer ("stage") the data there first (e.g. using rsync or scp). To allocate temporary local disk space for staging files, use
ws_allocate SOMENAME 7
The generated directory will be displayed. The directory name will consist of your user name and the name you give as SOMENAME.

The directory will automatically be removed after the specified duration (7 days, i.e. the last argument of the command above).

Transfer/Copy your data into your staging directory /staging/$USER-SOMENAME

For optimal use of the archive, tar/zip small files before archiving them!

The archive is optimised to handle a small number of large files.
Archiving a large number of files (several dozen or more) at once is a good indication that you should pack these together into a .zip/.tar.gz/.7z archive first and store that archive instead. This does not apply if the resulting archive file would be very large, e.g. far exceed 100 GB.
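As a minimal sketch of the packing step, the following bash function bundles one directory into a single .tar.gz file and then lists the result to make sure it can be read back. The function name pack_dir and the example paths are hypothetical; the function only uses standard tar options (-c, -z, -f, -C, -t).

```bash
# Hypothetical helper (not an RRZE tool): pack a directory with many small
# files into one compressed tarball and check that the result is readable.
pack_dir() {
  local src=${1%/} dest=$2   # src: directory to pack, dest: .tar.gz file to create
  # -C stores the paths relative to the parent directory of $src
  tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  # Listing the archive fails if it is truncated or corrupt
  tar -tzf "$dest" > /dev/null
}
```

For example, pack_dir ~/projA/results /staging/$USER-SOMENAME/projA-results.tar.gz would create one file that can then be archived instead of the individual result files.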

Both commands, dsmc and dsmj (see below), tend to write log files to the current directory. Thus, remember to start them from a directory where you have write permissions; otherwise the commands will fail with "Permission denied".

Using the GUI

There is an X11-based graphical user interface (GUI).
In order to use it, you must add the option -X or -Y to your ssh call and have an X server running locally. Microsoft Windows users may use MobaXterm, which is an SSH client with a built-in X server.

The GUI can be started with dsmj.
The most important actions:

  • Archive/Archive (main panel)
  • Archive/Retrieve (main panel)
  • Utilities/Delete Archive Data (top panel)

To select a specific archive class (e.g. with only 1, 2 or 5 years of retention), go to the archive panel -> Options -> Override incl/excl list -> Mgmt Class.

Checking retention of archived data

No expiry date is shown in the GUI; however, you can find the archive class and the archive date and calculate the expiry date from them. To see this information, select a file in the Retrieve window.
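Since the expiry date is simply the archive date plus the retention period of the archive class, it can be estimated with a small bash function. This is a sketch assuming GNU date (as on typical Linux systems) and class names of the form RRZEARCHIV_##J as used with -archmc on the command line; the function name expiry_date is hypothetical, and a date copied from the GUI may need to be converted to YYYY-MM-DD first. The expiry date printed by dsmc query archive remains the authoritative value.

```bash
# Hypothetical helper: estimate the expiry date of an archive from its
# archive date (YYYY-MM-DD) and its management class RRZEARCHIV_##J,
# where ## is the retention period in years. Requires GNU date.
expiry_date() {
  local archived=$1 class=$2 years
  years=${class#RRZEARCHIV_}   # e.g. "05J"
  years=${years%J}             # e.g. "05"
  # 10# forces base 10, so a leading zero is not read as an octal number
  date -d "$archived +$(( 10#$years )) years" +%F
}
```

For example, expiry_date 2021-03-15 RRZEARCHIV_05J prints 2026-03-15.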

Using the commandline

Archiving data using command line

To create an archive use dsmc archive with suitable arguments.
The most important options are:

  • -desc="your project description is here; max 256 characters"
  • -archmc=RRZEARCHIV_##J with ##=01, 02, 05, 10; if not specified, the archive will be stored for 10 years
  • -deletefiles delete files from your staging directory after archiving
  • -subdir=yes subdirectories are included in the archive

A full command could look like this (options in square brackets are optional):
dsmc archive -desc="projet_A Partial archive #1" [-deletefiles] -subdir=yes /staging/$USER-archiv/projA-a1/
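To avoid typos in the management class name, the archive call can be wrapped in a small bash function. This is a hypothetical sketch, not an RRZE-provided tool: it only accepts the retention classes 01, 02, 05 and 10, appends the trailing slash used in the example above if it is missing, and, when the (hypothetical) variable DRYRUN is set to a non-empty value, prints the dsmc command instead of running it.

```bash
# Hypothetical wrapper (not an RRZE tool) around dsmc archive.
# Arguments: description, retention class (01, 02, 05 or 10), directory.
archive_dir() {
  local desc=$1 years=$2 dir=$3
  case $years in
    01|02|05|10) ;;
    *) echo "archive_dir: unsupported retention '$years'" >&2; return 1 ;;
  esac
  # Append the trailing slash used in the examples if it is missing
  case $dir in */) ;; *) dir="$dir/" ;; esac
  # With DRYRUN set to a non-empty value, only print the command
  ${DRYRUN:+echo} dsmc archive -desc="$desc" -archmc="RRZEARCHIV_${years}J" -subdir=yes "$dir"
}
```

For example, DRYRUN=1 archive_dir "projA part 1" 05 /staging/$USER-archiv/projA-a1 shows the resulting command; without DRYRUN=1 the command is executed.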

Retrieving data using command line

To query your archive on the command line, use dsmc query archive /staging/$USER-SOMENAME/project/
You need to specify at least part of the path, and the slash at the end is important!
You can add -subdir=yes (will show subdirectories as well) or -desc="your archive description" (will only show archives with a matching description) to narrow down the output list.
Using dsmc query archive is the only way to display the expiration date of your archives!

To retrieve data on the command line, use dsmc retrieve.
An example could look like this:
dsmc retrieve -replace=no -subdir=yes -desc="projet_A Partial archive #1" /staging/$USER-archiv/projA-a1/ [DESTDIR]
The -replace=no option prevents accidental overwriting of existing files in [DESTDIR].

Querying your archive / finding your archived data

Proper organization of your data before archiving is mandatory.

Checking retention of archived data

You can check the retention of archived files using the dsmc query archive command; the expiry date is listed in its output.

The trailing slash when specifying a directory for the query is important: dsmc query archive -subdir=yes /staging (without the trailing slash) will not find anything!

Further usage of the command line

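To delete archived data before the retention period has expired, use dsmc delete archive (abbreviated dsmc del archive), e.g.
dsmc del archive -desc="projet_A Partial archive #1" /staging/$USER-archiv/\*.zip [-pick]
The backslash keeps your local shell from expanding the wildcard; the optional -pick lets you select the files to delete from a list.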

Get help on the dsmc command
dsmc help

Note

If no description is specified, the current date is used by default.
If the same file is archived with different descriptions, it will be stored multiple times in the archive (and also accounted for multiple times).

Python and Jupyter

Jupyterhub was the topic of the HPC Cafe in October 2020. https://jupyterhub.rrze.uni-erlangen.de/ is an experimental service.

This page will address some common pitfalls when working with python and related tools on a shared system like a cluster.


Available python versions

All Unix systems come with a system-wide Python installation; however, on the cluster it is highly recommended to use one of the Anaconda installations provided as modules.

# reminder
module avail python
module load python/XY

These modules come with a wide range of preinstalled packages.

Installing packages with pip

Pip is a package manager for Python; it can be used to easily install packages and manage their versions.
By default, pip tries to install packages system-wide, which is not possible on the cluster due to missing permissions.
This behaviour can be changed by adding --user to the call:
pip install --user package-name

By defining the variable PYTHONUSERBASE (best done in your .bashrc/.bash_profile), you change the installation location from ~/.local to a different path. This prevents your home folder from being cluttered with files that do not need a backup and from hitting the quota.
export PYTHONUSERBASE=${WORK}/software/privat
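A minimal sketch of the corresponding lines for your ~/.bashrc, assuming $WORK is defined in your environment as in the example above. Since pip install --user places command-line scripts in the bin subdirectory of PYTHONUSERBASE, it is usually convenient to add that directory to PATH as well.

```bash
# Sketch for ~/.bashrc (assumes $WORK is set by the cluster environment)
export PYTHONUSERBASE="${WORK}/software/privat"
# pip install --user puts console scripts into $PYTHONUSERBASE/bin,
# so prepend that directory to the search path
export PATH="${PYTHONUSERBASE}/bin:${PATH}"
```

You can check which location is in effect with python -m site --user-base.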

Conda environment

Some scientific software comes in the form of a Conda environment (e.g. https://docs.gammapy.org/0.17/install/index.html).
By default, such an environment will be installed to ~/.conda. However, its size can be several GB, so you should configure Conda to use a different path. This will prevent your home folder from hitting the quota. It can be done by following these steps:

conda config # create ~/.condarc
Add the following lines to the file (replace the path if you prefer a different location)

pkgs_dirs:
- ${WORK}/software/privat/conda/pkgs
envs_dirs:
- ${WORK}/software/privat/conda/envs

You can check that this configuration file is properly read by inspecting the output of conda info
For more options see https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html

Jupyter notebook security

When Jupyter notebooks are used with their default configuration, they are protected by a random token, which in some circumstances can cause security issues on a multi-user system like cshpc or the cluster frontends.
This can be changed with a few configuration steps that add password protection.

First generate a configuration file by executing
jupyter notebook --generate-config

Open a Python terminal and generate a password hash:
from notebook.auth import passwd; passwd()

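Add the password hash to your notebook config file (by default ~/.jupyter/jupyter_notebook_config.py) by pasting it between the quotes of c.NotebookApp.password: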

# The string should be of the form type:salt:hashed-password.

c.NotebookApp.password = u''
c.NotebookApp.password_required = True

From now on, your notebook will be password-protected. This also has the benefit that you can use bash functions for more convenient use.

Quick reminder on how to use the remote notebook

# start the notebook on a frontend (e.g. woody)
jupyter notebook --no-browser --port=XXXX

On your client:
ssh -f -N -L YYYY:localhost:XXXX user_name@woody.rrze.uni-erlangen.de
Open the notebook in your local browser at http://localhost:YYYY
XXXX and YYYY are 4-digit port numbers.
Don’t forget to stop the notebook once you are done. Otherwise you will block resources that could be used by others!
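If you are unsure which numbers to use for XXXX and YYYY, a random port reduces the chance that another user on the same frontend has already taken it. The following bash function is a hypothetical helper that picks a 4-digit port from the unprivileged range 1024 to 9999 (using bash's $RANDOM) and prints the two commands from above with that port filled in; it does not check whether the port is actually free.

```bash
# Hypothetical helper: pick a random 4-digit port between 1024 and 9999
random_port() {
  echo $(( 1024 + RANDOM % 8976 ))
}

PORT=$(random_port)
# Use the same number on the frontend and on your client
echo "on the frontend: jupyter notebook --no-browser --port=$PORT"
echo "on your client:  ssh -f -N -L $PORT:localhost:$PORT user_name@woody.rrze.uni-erlangen.de"
```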

Some useful functions/aliases for lazy people 😉

alias remote_notebook_stop='ssh username@remote_server_ip "pkill -u username jupyter"'
Be aware that this will kill ALL Jupyter processes that you own!

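start_jp_woody(){
# $1: port number, used both on woody and on your local machine
# print the URL first: the ssh call below blocks until the notebook is stopped
# (when run from a terminal, nohup writes the notebook output to nohup.out)
echo ""; echo " the notebook can be opened in your browser at: http://localhost:$1/ " ; echo ""
nohup ssh -J username@cshpc.rrze.uni-erlangen.de -L $1:localhost:$1 username@woody.rrze.uni-erlangen.de \
". /etc/bash.bashrc.local; module load python/3.7-anaconda ; jupyter notebook --port=$1 --no-browser"
}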

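start_jp_emmy(){
# $1: port number, used both on emmy and on your local machine
# print the URL first: the ssh call below blocks until the notebook is stopped
# (when run from a terminal, nohup writes the notebook output to nohup.out)
echo ""; echo " the notebook can be opened in your browser at: http://localhost:$1/ " ; echo ""
nohup ssh -J username@cshpc.rrze.uni-erlangen.de -L $1:localhost:$1 username@emmy.rrze.uni-erlangen.de \
". /etc/profile; module load python/3.7-anaconda ; jupyter notebook --port=$1 --no-browser"
}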

If your login shell is a C shell (csh/tcsh), remove . /etc/bash.bashrc.local and . /etc/profile from the functions.

SSH – Secure Shell access to HPC systems

To use the HPC systems at RRZE, you have to log into a cluster frontend via an SSH (Secure Shell) client. SSH is a common command-line tool for remotely logging into and executing commands on a different computer over the network.

Basic usage

Connect to a remote host

Under Linux, Mac and recent Windows 10 versions, a command-line SSH client is pre-installed. If you want to have a graphical user interface, you can use third-party clients like PuTTY (Windows, Linux) or MobaXterm (Windows).

Direct access to the cluster frontends is restricted to networks within the university. So if you are connected via such a network, or if you are using VPN, you can connect using the following command:

ssh USERNAME@CLUSTERNAME.rrze.fau.de

In this case, USERNAME is your HPC user name and CLUSTERNAME is the name of the cluster you want to log into, e.g. woody, emmy or meggie. If you want to access TinyFat, TinyGPU or TinyEth, you also have to connect to woody. You will be prompted for your HPC password or your SSH key passphrase if you are using SSH keys. After successful authentication, you have a login shell on the target system.

If you are outside of the university network and are not using VPN, you have to connect to the dialog server first:

ssh USERNAME@cshpc.rrze.fau.de

You can then use the above SSH command to connect to the cluster frontends from there.

Copy data to a remote host

A secure mechanism for copying data to a remote host is also available in all OpenSSH distributions on Linux, Mac, and current Windows 10 versions. When running Windows, you can also use WinSCP, which has a graphical user interface.

For all command-line based options, the secure copy mechanism is invoked by the following command:

scp <filename> USERNAME@CLUSTERNAME.rrze.fau.de:<remote_directory>

This will copy the local file <filename> to the directory $HOME/<remote_directory> on the remote system. This directory must exist prior to the copy attempt. Keep in mind that nearly all available file systems are mounted on all frontends (see File Systems documentation). It is therefore sufficient to copy data to only one frontend, e.g. cshpc.

Graphical applications

We generally do not recommend running graphical applications on the cluster frontends, since they normally consume considerably more resources and can therefore interfere with the work of other users on these shared systems. However, there are some cases where the use of graphical applications is necessary.

For applications that do not need many resources, it should be sufficient to enable X11 forwarding/X11 tunneling by your SSH client via the -X option:

ssh -X USERNAME@CLUSTERNAME.rrze.fau.de

However, this requires an X11 server running on your local machine, which is generally not available by default on Mac and Windows. In this case, you need to activate X11 tunneling in your client configuration and have an X Window server (e.g. Xming or MobaXterm for Windows, XQuartz for Mac) running locally.

As an alternative, we recommend using remote desktop software to run graphical applications, e.g. NoMachine NX. A description of how to set up and use NoMachine NX on cshpc is available in the dialogserver description.

SSH public-key authentication

As an alternative to logging in with your HPC password when you connect to a server via SSH, you can also use public key authentication. It requires a so-called SSH key pair consisting of two matching parts: a public and a private key. The key pair is generated on your local machine. The public key is uploaded to the remote system, whereas the private key remains on your local machine. We recommend generating a separate SSH key pair for every system (workstation, laptop, …) you use for logging into the HPC clusters.

Generating key pairs is possible when your client has OpenSSH capabilities (Linux, Mac, Windows 10). If you are using PuTTY, you can generate keys with puttygen.exe.

When generating a key pair, you have to choose between different algorithms and key sizes. The recommendations change over time, since the capabilities to break encryption also increase. Currently, it is advised to use either RSA with a length of 4096 bits, ECDSA with 521 bits, or Ed25519. Use one of the following commands to generate a key pair:

ssh-keygen -t rsa -b 4096

ssh-keygen -t ecdsa -b 521

ssh-keygen -t ed25519

During the generation process, you will be prompted for a passphrase to encrypt your private key. We don’t recommend leaving this empty, since your private key would then be stored on your computer as an unencrypted plain-text file. If this unencrypted private key is copied or stolen by someone, they can access the corresponding server directly. If it is encrypted with a passphrase, the attacker must first find out the passphrase in order to gain access to the server with the key.

By default, the key pair is generated into the folder .ssh in your home directory, with the files id_<algorithm> being your private and id_<algorithm>.pub being your public key. If you want to change the location and name of your key pair, use the following option:

ssh-keygen -f <path_to_keys>/<keyname> -t <algorithm>

The public key must then be copied to the server and added to the authorized_keys file to be used for authentication. This can be conveniently done using the ssh-copy-id tool:

ssh-copy-id -i ~/.ssh/id_<algorithm>.pub USERNAME@cshpc.rrze.fau.de

If this doesn’t work, you can also manually copy the public key and add it to ~/.ssh/authorized_keys:

cat ~/.ssh/id_<algorithm>.pub | ssh USERNAME@cshpc.rrze.fau.de 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys'

Once the public key has been configured on the server, the server will allow any connecting user that owns the private key to log in. Since your home directory is shared on all HPC systems at RRZE, it is sufficient to copy the key to only one system, e.g. cshpc. It will be automatically available on all others.

If you have changed the default name of your key pair, you have to explicitly specify that this key should be used for connecting to a specific host. This is possible by using the -i parameter:

ssh -i ~/<path_to_keys>/<keyname> USERNAME@CLUSTERNAME.rrze.fau.de

For frequent usage, this is quite cumbersome. Therefore, it is possible to specify these parameters (and many more) in the ~/.ssh/config file. A detailed description of how to do this is given below.

If you have problems using your key, e.g. when you are asked for your password despite the key, or in case authentication is not working for some other reason, try the option ssh -v. This makes SSH print debugging messages about its progress, which can help locate the issue much more easily.

SSH agent

If you have set a passphrase for your private SSH key, you will be prompted to enter the passphrase every time you use the key to connect to a remote host. To avoid this, you can use an SSH agent. After you have entered your passphrase for the first time, this small tool will store your private key for the duration of your session. This will allow you to connect to a remote host without re-entering your passphrase every time.

If you are using a current Linux distribution with a graphical desktop session (Unity, GNOME,…), an SSH agent will be started automatically in the background. Your private keys will be stored automatically and used when connecting to a remote host.

In case you are not using a graphical desktop session or your SSH agent does not start automatically, you will have to start it manually by typing the following into your local terminal session:

eval "$(ssh-agent -s)"

This will start the agent in the background. To add your private key to the agent, type the following:

ssh-add ~/.ssh/<keyname>

After you have successfully entered your passphrase, you will get a confirmation message that your identity file was successfully added to the agent. This will allow you to use your key to sign in without having to enter the passphrase again in the current terminal session.

You can also list the keys which are currently managed by the SSH agent via:
ssh-add -l

For more information about the SSH agent, type man ssh-add on your terminal.
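Starting ssh-agent in every new terminal leaves several agents running. A common pattern, shown here as a sketch for the ~/.bashrc of your local machine, is to start an agent only if ssh-add cannot reach one: ssh-add -l exits with status 2 in that case (status 1 means an agent is running but holds no keys). The function name ensure_agent is hypothetical.

```bash
# Sketch for ~/.bashrc on your local machine: start an agent only if
# none is reachable, instead of starting a new one in every terminal.
ensure_agent() {
  local rc=0
  # ssh-add -l exits with status 2 if it cannot connect to an agent
  ssh-add -l > /dev/null 2>&1 || rc=$?
  if [ "$rc" -eq 2 ]; then
    eval "$(ssh-agent -s)" > /dev/null
  fi
}
ensure_agent
```

Afterwards, ssh-add -t 8h ~/.ssh/<keyname> adds your key with a limited lifetime of 8 hours, after which the agent forgets it again.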

Configure host settings in ~/.ssh/config

If you regularly connect to multiple remote systems over SSH, you’ll find that typing all the remote hostnames, different usernames, identity files, and various other options is quite cumbersome. A much simpler solution is to define shortcuts for different hosts and store the SSH settings for each remote machine you connect to in a configuration file.

The client-side configuration file is named config and is located in the .ssh folder in your home directory. If it does not exist, you can create it manually.

The configuration file is organized in sections for each host. You can use wildcards to match more than one host. For each option, the SSH client uses the first value it obtains while reading the file from top to bottom, so later matches cannot override settings made by earlier ones. Because of this, you should put host-specific sections at the top and your most general matches (e.g. Host *) at the bottom of the file.

One simple example to create a shortcut for connection to cshpc is given below. The following is added to ~/.ssh/config:

Host cshpc
  HostName cshpc.rrze.fau.de
  User USERNAME
  IdentityFile ~/.ssh/private_ssh_key_name

With this configuration, you can now connect via

ssh cshpc

instead of typing

ssh -i ~/.ssh/private_ssh_key_name USERNAME@cshpc.rrze.fau.de

A large number of different SSH options are available. Some options which are used more frequently or are especially useful are listed below. You can find a full list by typing man ssh_config in your terminal.

Security recommendations

In general, it is recommended not to trust systems that are accessible to multiple users or that someone else has root access to, which is true for all HPC systems. Even with best efforts by the administrators to keep the systems safe, it is always possible that attackers are able to gain root rights on the system, which makes them very powerful. An attacker may for example install keyloggers or hijack your running SSH-agent, just to name a few possibilities.

Thus it is often recommended

  • not to log in via interactive passwords on untrusted hosts,
  • not to use SSH agents on untrusted hosts,
  • and not to use SSH agent forwarding to untrusted hosts.

It is generally more secure to use SSH public-private key pairs for authentication when accessing remote systems, as long as these rules are followed:

  • Store no private keys on untrusted hosts. Private keys should only be placed on single-user systems (e.g. your laptop).
  • Always use SSH private keys with strong passphrases.
  • Use only one SSH key pair per system with shared homes.
  • Use a separate key pair for every client (laptop, desktop,..).

To make it easier to jump between different systems at RRZE, we recommend generating a separate key for internal use only. This key may also be used for access to external systems (e.g. LRZ).

SSH agent forwarding

SSH agent forwarding is mostly used as a single sign-on solution to connect from one remote host to another (e.g. from cshpc to other cluster frontends or between different cluster frontends). When you enable SSH agent forwarding, authentication requests on the remote host are forwarded to the SSH agent running on your local client. This eliminates the need for password logins and for storing private keys on remote machines. However, it is not recommended to use SSH agent forwarding to an untrusted host: attackers with the ability to bypass file permissions on the remote machine can gain access to the agent on your local machine through the forwarded connection. An attacker cannot obtain key material from the agent, but they can use the loaded keys to gain access to remote machines with your identity. An alternative to SSH agent forwarding is the ProxyJump functionality provided by SSH, which is described below.

X11 forwarding

Similar to SSH agent forwarding, X11 forwarding can be a security risk. If your SSH client is configured to generally allow applications on a remote server to render GUI windows on your screen, this can be exploited by an attacker. It is therefore recommended to specify ForwardX11 no for all hosts in ~/.ssh/config and only use -X on the command line when necessary.

Host keys

SSH host keys are used to verify a server’s identity before you send any sensitive information like passwords to it. Each server has a unique host key pair, whose public part is the server’s public key. When you connect, the server signs part of the connection handshake with its private host key, and the client verifies this signature using the server’s public key. This makes sure that the remote host you connect to is really the one you intended to connect to, and that your connection is not secretly redirected to another server.

SSH clients automatically store host keys for all hosts they have connected to. These keys are normally stored in ~/.ssh/known_hosts. If the host key of a server you are trying to connect to has changed, you will get a warning message.

When you connect to a server for the first time, you cannot know if the key offered by the server is correct. Therefore, we provide the public system keys for the cluster frontends below, which can be directly added into the ~/.ssh/known_hosts file (you may need to generate the .ssh directory and/or the file if it does not exist yet) on your local machine.

cshpc.rrze.fau.de ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAIEAs0wFVn1PN3DGcUtd/JHsa6s1DFOAu+Djc1ARQklFSYmxdx5GNQMvS2+SZFFa5Rcw+foAP9Ks46hWLo9mOjTV9AwJdOcSu/YWAhh+TUOLMNowpAEKj1i7L1Iz9M1yrUQsXcqDscwepB9TSSO0pSJAyrbuGMY7cK8m6//2mf7WSxc=
cshpc.rrze.fau.de ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPSIFF3lv2wTa2IQqmLZs+5Onz1DEug8krSrWM3aCDRU
cshpc.rrze.fau.de ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNVzp97t3CxlHtUiJ5ULqc/KLLH+Zw85RhmyZqCGXwxBroT+iK1Quo1jmG6kCgjeIMit9xQAHWjS/rxrlI10GIw= 

emmy.rrze.fau.de ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA2q7Ung+RdwLkMyQXiod/6BFsUBMcKlEnvG3pFR7cw7/wdLUcjUU4ubQR9ctNlQZok7XU9b2ttMVwUOYI3w2RZnQFwm9jzUbAAl00XRfBThI9cWlgJu0UR/I+W/iRJdBSAmffwsQYTYBzJ4cRTtKSLZ98yEbJVtwfRRG12PVMewNGVDsnmBOBX5zWG92tgaA1bXAiB0GVWBS79lV78+ii/1UR/PldZaA+RQtxDx0ckuc8vq10XK4GvXJijyrEzIsi3SeIFApMhr+W84uIGp5HjhaaYwVWMkBge8PX8bR8oXNaUFLVmaRUrX/WSchCmLp2YBh3npeZ/B9vAtb6LXoS7Q==
emmy.rrze.fau.de ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBqBH0GRzrNUrTyOE25TkQXqY/30PLVUqUam93XArPMb
emmy.rrze.fau.de ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBEsYP0xDLyI/VHC68o4BqZ1RR5Ff7qMscZjKiKD1kEP2ckea0dMdH4oB4ahScShcEG5iZmQ2FlN41FbGX4zp6Go=

woody.rrze.fau.de ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAEAQDeiwzMm9JQ3fnc7AqLkXrlmUOHh5CXh11XF0gQe0wqZx2sER+oumfi+T0M2lWMTBoeMOL74fMal/Bgpq66ETqnodDyVasyD6LwaJwEVIxlss9gcrN1SPa1XaXxgAhpEaR7mTwrNLjM6W5d3+6CMiLvp32lsL4RrQHShjhkXAYrhr3RbApMLwFdb6QZzHa7teN47aMy9s6oubRr4haoeTbfFGRaQjIyguG18nrOcnTlhyPafiHyivL5AE0wLiLCZyqux5Q0GZhIr1uK8smyT1QIIf55A8bRHVvE/QGkcT8lz2w9OnKJKNHUS/UA4MerJIM5V2/IOOdSeDgLnMuJ0usEwawgqNXoqx76X1wuhXA8IqaP4J0vo2OkK7QdyZP7qKP2YMZoDwFmyl73C8xaCw28ovIYGzPmCVLpwtIAQf0uX5xe2yWo9hLPhfP3rTKEKksOVnLKcLNpMLMqxxJHsHbLnmbFn7RdDWQ92JH0nZDAjmUZ2NaHzoPbcz5y1/CCvdURUrNLSosaMqcclq5yZif8uWtUQ8wvIacrMQUFetPTz86dg9ryIrZhxOaYgWzNQVV2ZeED4k5P+QQLkjbvw26htYWxHP7BpTOxIYryQJO/gRMTOnDPP/js49nECn1bW07HDYznhhztGVcjZAgNND8hELHxAmG3WYAsR0/sOMM1ddddM3GbYaCzX++3EE26dvEWpy3J6rHRq9mvGhRG7p8Y2LozlyDXo8wodNNci2/kXTArgeZnU5W15awjl9G5haPcoeNxg467T7bIKGq9JdLkHhqBrqGesrM0ADcDLufgrcT0SIukrc9rSOgVWtYfnXeRWfj7FrjaT15FpWeFSxBXqkQeOrScrPpmbkE7fR5xJYPFDugXQs2FvIjfvW8TsSWaxyt7eLbiFdfa1czGO1S5SOIASIn4/6CuECasvMalSX0JKKLV3Yhs7zXMk3t3wiAHXJ8m+PZB7sY0jhU1UDJIymbvwSzEtrRpbXLkQDhf9XHuG0yNS8dd9u25X6jqoWogPGKoEpQX/2xicebMfJRA3TLWuOM4RtqwAYNVrjsmAfXVmAewvlAtPNFrD0JeKJANVGfa6JFvLfhGHI2UVmdt5vzQzneI/31/+2jNbglcheAfsUO5gPbq3BdToM1bDxJ8hWw3sS2gZ2DZybVz/95rdh9zcj+ciCDMjYypVzgmDROskoAcoVRdKyOE1ZJ3jCOPvJphEPwDSNUBGiYu6LCZdTcwMsepGOvNYbk/c9LIIyczNFh/H46cgekYgVx0i8LwmhJkxCnaK7N15NkMHsK7yInjLqfzKvZQ0z7mfmeXKVgDQVEDxjsdUYq3UUCcbA8muWyuSUtTu3+wSG/v2xhl
woody.rrze.fau.de ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPC9x4/BNwKQro3+95Gwh4DZpHBT2tVHPjKouwIBOk6g
woody.rrze.fau.de ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBL7lrwFFhlZZ7mGBJ3f5gSxDEKcxvebrXLXd/bz0fH6A9Qk2GrJN2tL+sleVPRJHTboOFbdeaJy0igSwivqI2vc= 

meggie.rrze.fau.de ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCwi2jQPuIe88/SJRmaKmA1VOOse4UxyjWlqp6VHM+8gggkpajGz3l6xZD1BihqOpY10oIA6rRHBQipZmFGgDkgTT40jdMvP8sLzqtJqKoQILXJqQbGWrGgjEDwXdZHIWaiV5Q8XDAgqj9+4W9ZHfeGtgS2OqhzAlTdgHzx94h8m6J8JUc+QtPGlWGBr/Z2Ee+KFEG1siT09k7E72sOnL9VDqMHFlWtHUsGfcR+8f6hnKnSHBB2TpxGac2Yv0KpqtHFdGMLY22RzDgCoEeY42fLvOqF9xIU8NgWoqII4W1AcvvpPDe8EthnKkaMsQjqj6N1uJ1qpsOZry7TiwQQF2/D
meggie.rrze.fau.de ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBNmjmhh6fMxEkmNybzP3Maau/KRbOTZECKF8FxZVH3a3rMirSyjRG8LLNswctajPJxeQCAb5OIh1A63PbsIA2g8=
meggie.rrze.fau.de ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILOVhUpUyaugYDdwpHCuKfcgS0PQjZN+7KlbJ5ByZvhi 

Please keep in mind that these keys are changed from time to time. So if you get a warning while connecting, first check whether the keys above have changed!
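To compare a published key with the fingerprint that ssh shows in its warning, you can let ssh-keygen compute the fingerprint of a line from the list above. The following sketch writes the ed25519 key of cshpc into a temporary file and prints its fingerprint; any other line from the list works the same way.

```bash
# Sketch: compute the fingerprint of a published host key
hostkeys=$(mktemp)
# One line copied from the list above (known_hosts format: host, type, key)
echo 'cshpc.rrze.fau.de ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPSIFF3lv2wTa2IQqmLZs+5Onz1DEug8krSrWM3aCDRU' > "$hostkeys"
# Prints key size, SHA256 fingerprint, host name and key type
ssh-keygen -l -f "$hostkeys"
```

If the fingerprint matches the one shown in the warning, the outdated entry can be removed from your known_hosts file with ssh-keygen -R cshpc.rrze.fau.de before connecting again.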

Advanced Usage

Proxy Jump

If you want to connect from one host to another without the risks involved in using SSH agent forwarding and without having to type your password each time, the ProxyJump functionality of SSH can be a good alternative. When using ProxyJump, the connection is forwarded through one or more jump hosts to the target destination host via SSH. This is the most secure method because encryption is end-to-end. You can use ProxyJump for example to connect to the emmy cluster frontend by using cshpc as the jump host. This can be achieved via the following additions to ~/.ssh/config:

Host emmy
  HostName emmy.rrze.fau.de
  User USERNAME
  ProxyJump USERNAME@cshpc.rrze.fau.de

SSH config

There are some options to use in ~/.ssh/config that can simplify your workflow. To see all available options, type man ssh_config in your terminal.

  • Instead of defining the same identity file explicitly for every host, you can also define to always use the same key for a specific user:
    Match User JohnDoe
      IdentityFile ~/.ssh/private_ssh_key_name
  • Specify that only the SSH keys listed in ~/.ssh/config should be tried for authentication. This can avoid a "Too many authentication failures" error if the SSH agent offers many different keys.
    IdentitiesOnly yes
  • It is possible to use wildcards (*, ?, ...) in hostnames to reduce the number of explicit entries. For example, it is possible to deny SSH agent and X11 forwarding for all hosts via:
    Host *
      ForwardAgent no
      ForwardX11 no
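To check which settings will actually apply to a host, ssh -G prints the evaluated configuration and exits without connecting. The sketch below uses a hypothetical configuration (user names jdoe and defaultuser) in a temporary file passed with -F; it also illustrates that OpenSSH uses the first value it finds for each option, which is why specific Host sections belong above the general Host * section.

```bash
# Sketch: inspect the effective SSH configuration without connecting
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# Specific hosts first: for each option, the first value found wins
Host emmy
  HostName emmy.rrze.fau.de
  User jdoe
  ProxyJump jdoe@cshpc.rrze.fau.de

# General defaults last
Host *
  User defaultuser
  ForwardAgent no
  ForwardX11 no
EOF
# emmy gets "user jdoe", because User was already set in the emmy section
ssh -F "$cfg" -G emmy | grep -E '^(hostname|user|proxyjump|forwardagent|forwardx11) '
```

For your real configuration, ssh -G emmy is sufficient.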

OpenFOAM

OpenFOAM (for „Open-source Field Operation And Manipulation“) is a C++ toolbox for the development of customized numerical solvers, and pre-/post-processing utilities for the solution of continuum mechanics problems, most prominently including computational fluid dynamics (CFD). It contains solvers for a wide range of problems, from simple laminar regimes to DNS or LES including reactive turbulent flows. It provides a framework for manipulating fields and solving general partial differential equations on unstructured grids based on finite volume methods. Therefore, it is suitable for complex geometries and a wide range of configurations and applications.

There are three main variants of OpenFOAM that are released as free and open-source software under a GPLv3 license: ESI OpenFOAM, The OpenFOAM Foundation, Foam Extend.

Availability / Target HPC systems

We provide modules for some major OpenFOAM versions, which were mostly requested by specific groups or users. If you have a request for a new version, please contact support-hpc@fau.de. Please note that we will only provide modules for fully released versions, which will be used by more than one user. If you need some specific custom configuration or version, please consider building it yourself. Installation guides are available from the respective OpenFOAM distributors.

The installed versions of OpenFOAM may differ between the different HPC clusters. You can check the available versions via module avail openfoam.

Production jobs should be run on the parallel HPC systems in batch mode.  It is NOT permitted to run computationally intensive OpenFOAM simulation runs or serial/parallel post-processing sessions with large memory consumption on login nodes.

Notes

  • By default, OpenFOAM produces lots of small files: for each processor, every written time step, and each field. The parallel file system ($FASTTMP) is not made for such a finely grained file/folder structure. For more recent versions of OpenFOAM, you can use collated I/O (-fileHandler collated), which combines the data of all processors into fewer, larger files.
  • ParaView is used for post-processing and is also available via the modules system on the HPC clusters. However, keep an eye on the main memory requirements of this visualization, especially on the frontends!

Sample job scripts

All job scripts have to contain the following information:

  • Resource definition for the queuing system (more details here)
  • Load OpenFOAM environment module
  • Start command for parallel execution of solver of choice

For meggie/Slurm batch system: mpirun takes the parameters (nodes, tasks-per-node) that you specified in the header of your batch file, so you don’t have to specify them again in your mpirun call (see also MPI on meggie). For this to work correctly, the total number of MPI tasks (nodes times tasks-per-node) must be equal to numberOfSubdomains in system/decomposeParDict!

 

#!/bin/bash -l
#PBS -lnodes=4:ppn=40,walltime=24:00:00
#PBS -N my-job-name
#PBS -j eo

# number of cores to use per node
PPN=20
# load environment module
module load openfoam/XXXX

# change to working directory 
cd ${PBS_O_WORKDIR}

# count the number of nodes
NODES=$(uniq "${PBS_NODEFILE}" | wc -l)
# calculate the number of cores actually used
CORES=$(( NODES * PPN ))

# Please insert your preferred solver executable here!
mpirun -np ${CORES} -npernode ${PPN} icoFoam -parallel -fileHandler collated > logfile

Example job script for meggie (Slurm batch system):

#!/bin/bash -l
#SBATCH --job-name=my-job-name
#SBATCH --nodes=4
#SBATCH --tasks-per-node=20                   # for 20 physical cores on meggie
#SBATCH --time=24:00:00 
#SBATCH --export=NONE 

# load environment module 
module load openfoam/XXXX 

unset SLURM_EXPORT_ENV 

# Please insert your preferred solver executable here!
mpirun icoFoam -parallel -fileHandler collated > logfile
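If the number of MPI tasks and numberOfSubdomains do not match, OpenFOAM only stops with an error once the job has already started. The following shell function is a minimal sketch for catching this before mpirun is called. The name check_subdomains is hypothetical (it is not part of OpenFOAM or of our modules), and it assumes that system/decomposeParDict contains a plain entry of the form „numberOfSubdomains 80;“.

```shell
# Sketch: verify that the MPI task count matches numberOfSubdomains before
# starting the solver. check_subdomains is a hypothetical helper name.
# Assumption: the dictionary has a plain entry like "numberOfSubdomains 80;".
check_subdomains() {
    local ntasks=$1
    local dict=${2:-system/decomposeParDict}
    local nsub

    # first field must be the keyword; strip the trailing ";" from the value
    nsub=$(awk '$1 == "numberOfSubdomains" { sub(/;/, "", $2); print $2; exit }' "$dict")

    if [ -z "$nsub" ]; then
        echo "numberOfSubdomains not found in $dict" >&2
        return 2
    fi
    if [ "$nsub" -ne "$ntasks" ]; then
        echo "mismatch: $ntasks MPI tasks, but numberOfSubdomains is $nsub" >&2
        return 1
    fi
    echo "OK: $ntasks MPI tasks match numberOfSubdomains"
}
```

In the Slurm script, you could call check_subdomains $(( SLURM_JOB_NUM_NODES * SLURM_NTASKS_PER_NODE )) || exit 1 right before mpirun (assuming Slurm sets SLURM_NTASKS_PER_NODE because --tasks-per-node is given); in the PBS script, check_subdomains ${CORES} || exit 1 does the same.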

Further information

Mentors

  • please volunteer!

ANSYS CFX

ANSYS CFX is a general purpose Computational Fluid Dynamics (CFD) code. It provides a wide variety of physical models for turbulent flows, acoustics, Eulerian and Lagrangian multiphase flow modeling, radiation, combustion and chemical reactions, heat and mass transfer including CHT (conjugate heat transfer in solid domains). It is mostly used for simulating turbomachinery, such as pumps, fans, compressors and gas and hydraulic turbines.

Please note that the clusters do not come with any license. If you want to use ANSYS products on the HPC clusters, you have to have access to suitable licenses. These can be purchased directly from RRZE. To efficiently use the HPC resources, ANSYS HPC licenses are necessary.

Availability / Target HPC systems

Different versions of all ANSYS products are available via the modules system, which can be listed by module avail ansys. A special version can be loaded, e.g. by module load ansys/2020R1.

We mostly install the current versions automatically, but if something is missing, please contact support-hpc@fau.de.

Production jobs should be run on the parallel HPC systems in batch mode.

ANSYS CFX can also be used in interactive GUI mode for serial pre- and/or post-processing on the login nodes (Linux: SSH option „-X“; Windows: PuTTY and Xming for X11 forwarding). This should only be used to make quick simulation setup changes. It is NOT permitted to run computationally intensive ANSYS CFX simulation runs or serial/parallel post-processing sessions with large memory consumption on login nodes.

Alternatively, ANSYS CFX can be run interactively with GUI on TinyFat (for large main memory requirements) or on a compute node.

Getting started

The (graphical) CFX launcher is started by typing

cfx5launch

on the command line. If you want to use the separate pre- or postprocessing capabilities, you can also launch cfx5pre or cfx5post, respectively.

For running simulations in batch mode on the HPC systems, use the

cfx5solve

command. You can find out the available parameters via cfx5solve -help. One example call to use in your batch script would be

cfx5solve -batch -par-dist $NODELIST -double -def <solver input file>

The number of processes and the hostnames of the compute nodes to be used are defined in $NODELIST. For how to generate this list, refer to the example script below. Using SMT threads is not recommended.

Notes

  • We recommend writing automatic backup files (every 6 to 12 hours) for longer runs, so that the simulation can be restarted in case of a job or machine failure. This can be specified in Output Control → User Interface → Backup Tab.
  • Furthermore, it is recommended to use the „Elapsed Wall Clock Time Control“ in the job definition in ANSYS CFX Pre (Solver Control → Elapsed Wall Clock Time Control → Maximum Run Time → <24h). Also plan enough buffer time for writing the final output: depending on your application, this can take quite a long time!

Sample job scripts

All job scripts have to contain the following information:

  • Resource definition for the queuing system (more details here)
  • Load ANSYS environment module
  • Generate a file with names of hosts of the current simulation run to tell CFX on which nodes it should run (see example below)
  • Execute cfx5solve with appropriate command line parameters (available options via cfx5solve -help)

#!/bin/bash -l
#PBS -lnodes=4:ppn=40,walltime=24:00:00
#PBS -N cfx
#PBS -j eo

# specify the name of your input-def file
DEFFILE="example.def"
# number of cores to use per node
PPN=20
# load environment module
module load ansys/XXXX
# generate node list in the format host1*PPN,host2*PPN,... (used by -par-dist)
NODELIST=$(uniq "$PBS_NODEFILE" | sed -e 's/$/*'"$PPN"'/' | paste -d ',' -s)

# execute cfx with command line parameters (see cfx5solve -help for all available parameters)
cfx5solve -batch -double -par-dist "$NODELIST" -def "$DEFFILE"
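The example script derives the host list from $PBS_NODEFILE. On a Slurm cluster such as meggie, the compressed node list in $SLURM_JOB_NODELIST can be expanded with scontrol show hostnames instead. The following function is a sketch (cfx_par_dist is a hypothetical name, not an ANSYS tool) that reads host names from stdin, so it can be fed from either batch system:

```shell
# Sketch: turn host names (one per line, on stdin) into the
# "host1*N,host2*N,..." format used by cfx5solve -par-dist.
# cfx_par_dist is a hypothetical helper name.
cfx_par_dist() {
    local ppn=$1
    # uniq collapses consecutive duplicates (a PBS nodefile repeats each host),
    # sed appends "*ppn" to every host, paste joins all lines with commas
    uniq | sed -e "s/\$/*${ppn}/" | paste -d ',' -s -
}
```

For example, NODELIST=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | cfx_par_dist 20) under Slurm, or NODELIST=$(cfx_par_dist "$PPN" < "$PBS_NODEFILE") under PBS, followed by the same cfx5solve call as in the script.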

Further information

  • Documentation is available within the application help manual. Further information is provided through the ANSYS Customer Portal for registered users.
  • More in-depth documentation is available at LRZ. Please note: not everything is directly applicable to HPC systems at RRZE!

Mentors

 

Tensorflow

TensorFlow is an Open Source Machine Learning Framework.

Availability / Target HPC systems

TensorFlow is currently not installed on any of RRZE’s HPC systems, as new versions are released very frequently and all groups have their own special needs.

The following HPC systems are best suited:

  • TinyGPU or the GPU nodes in Emmy
  • Woody (smaller nodes, but many of them) for CPU-only runs

Notes

Different routes can be taken to get your private installation of TensorFlow. Don’t waste valuable storage in $HOME; use $WORK instead for storing your installation.

The pre-built Docker images might not work on the GTX980 nodes in TinyGPU as their host CPU is too old to support the required AVX instruction set.

Official Docker images are regularly published on https://hub.docker.com/r/tensorflow/tensorflow. These images can be used with Singularity on our HPC systems. Run the following steps on the woody frontend to pull your image:

cd $WORK
export SINGULARITY_CACHEDIR=$(mktemp -d)
singularity pull tensorflow-2.1.0-gpu-py3.sif docker://tensorflow/tensorflow:2.1.0-gpu-py3
rm -rf $SINGULARITY_CACHEDIR
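The cache handling in these steps can be wrapped in a small shell function, so that the temporary cache directory is removed even if the pull fails, and the same function can be reused for other images. This is a sketch: pull_sif is a hypothetical name, and it only relies on singularity pull and the SINGULARITY_CACHEDIR variable as used above.

```shell
# Sketch: pull a Docker image into a .sif file using a throw-away Singularity
# cache directory that is removed even if the pull fails.
# pull_sif is a hypothetical helper name.
pull_sif() {
    local sif=$1 uri=$2 cache rc
    cache=$(mktemp -d) || return 1
    # SINGULARITY_CACHEDIR is only set for this one singularity call
    if SINGULARITY_CACHEDIR=$cache singularity pull "$sif" "$uri"; then
        rc=0
    else
        rc=$?
    fi
    rm -rf "$cache"
    return "$rc"
}
```

Example: cd $WORK && pull_sif tensorflow-2.1.0-gpu-py3.sif docker://tensorflow/tensorflow:2.1.0-gpu-py3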

Within your job script you use the container as follows. /home/* and /apps/ are automatically bind-mounted into the container. On TinyGPU (but currently not on Emmy), GPU device libraries are also automatically bind-mounted into the container.

./tensorflow-2.1.0-gpu-py3.sif  ./script.py

On the GPU nodes of Emmy you have to use singularity run --nv tensorflow-2.1.0-gpu-py3.sif  ./script.py.

Nvidia maintains its own Docker images for TensorFlow on the NVIDIA GPU Cloud (NGC), which are updated once per month: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow. These images can also be used with Singularity on TinyGPU. Run the following steps on the woody frontend to pull your image:

cd $WORK
export SINGULARITY_CACHEDIR=$(mktemp -d)
singularity pull tensorflow-ngc-20.03-tf2-py3.sif docker://nvcr.io/nvidia/tensorflow:20.03-tf2-py3
rm -rf $SINGULARITY_CACHEDIR

Within your job script you use the container as follows. /home/* and /apps/ are automatically bind-mounted into the container. On TinyGPU (but currently not on Emmy), GPU device libraries are also automatically bind-mounted into the container.

./tensorflow-ngc-20.03-tf2-py3.sif ./script.py

On the GPU nodes of Emmy you have to use singularity run --nv tensorflow-ngc-20.03-tf2-py3.sif  ./script.py.

When manually installing TensorFlow (into a Python virtualenv) using pip, remember to load one of the python modules first! The system python will not be sufficient.

Anaconda also provides TensorFlow packages in conda-forge. Either load one of the python modules and install the additional packages into one of your directories, or start from scratch with your private (mini)conda installation! The system python will not be sufficient.

To check that your TensorFlow is functional and detects the hardware, you can use the following simple Python sequence:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Further information

Mentors

  • please volunteer!

Test cluster

The RRZE test and benchmark cluster  is an environment for porting software to new CPU architectures and running benchmark tests. It comprises a variety of nodes with different processors, clock speeds, memory speeds, memory capacity, number of CPU sockets, etc. There is no high-speed network, and MPI parallelization is restricted to one node. The usual NFS file systems are available.

This is a testing ground. Any job may be canceled without prior notice. For further information about proper usage, please contact HPC@RRZE.

This is a quick overview of the systems including their host names (frequencies are nominal values) – NDA systems are not listed:

  • aurora1: Single Intel Xeon „Skylake“ Gold 6126 CPU (12 cores + SMT) @ 2.60GHz.
    Accelerators: 2x NEC Aurora „TSUBASA“ 10B (48 GiB RAM)
  • broadep2: Dual Intel Xeon „Broadwell“ CPU E5-2697 v4 (18 cores + SMT) @ 2.30GHz, 128 GiB RAM
  • casclakesp2: Dual  Intel Xeon „Cascade Lake“ Gold 6248 CPU (20 cores + SMT) @ 2.50GHz, 384 GiB RAM
  • hasep1: Dual Intel Xeon „Haswell“ E5-2695 v3 CPU (14 cores + SMT) @ 2.30GHz, 64 GiB RAM
  • interlagos1: Dual AMD Opteron 6276 „Interlagos“ CPU (16 cores) @ 2.3 GHz, 64 GiB RAM.
    Accelerator: AMD Radeon VII GPU (16 GiB HBM2)
  • ivyep1: Dual Intel Xeon „Ivy Bridge“ E5-2690 v2 CPU (10 cores + SMT) @ 3.00GHz, 64 GiB RAM
  • medusa: Dual Intel Xeon „Cascade Lake“ Gold 6246 CPU (12 cores + SMT) @ 3.30GHz, 192 GiB RAM.
    Accelerators:
    – NVIDIA GeForce RTX 2070 SUPER (8 GiB GDDR6)
    – NVIDIA GeForce RTX 2080 SUPER (8 GiB GDDR6)
    – NVIDIA Quadro RTX 5000 (16 GiB GDDR6)
    – NVIDIA Quadro RTX 6000 (24 GiB GDDR6)
  • naples1: Dual AMD EPYC 7451 „Naples“ CPU (24 cores + SMT) @ 2.3 GHz, 128 GiB RAM
  • phinally: Dual Intel Xeon „Sandy Bridge“ CPU E5-2680 (8 cores + SMT) @ 2.70GHz, 64 GiB RAM
  • rome1: Single AMD EPYC 7452 „Rome“ CPU (32 cores + SMT) @ 2.35 GHz, 128 GiB RAM
  • skylakesp2: Intel Xeon „Skylake“ Gold 6148 CPU (20 cores + SMT) @ 2.40GHz, 96 GiB RAM
  • summitridge1: AMD Ryzen 7 1700X CPU (8 cores + SMT), 32 GiB RAM
  • warmup: Dual Cavium/Marvell „ThunderX2“ (ARMv8) CN9980 (32 cores + 4-way SMT) @ 2.20 GHz, 128 GiB RAM

Technical specifications of all more or less recent GPUs available at RRZE (either in the Testcluster or in TinyGPU):

| GPU | RAM | RAM BW [GB/s] | Ref Clock [GHz] | Cores (Shader/TMUs/ROPs) | TDP [W] | SP [TFlop/s] | DP [TFlop/s] | Host | Host CPU (base clock frequency) |
|---|---|---|---|---|---|---|---|---|---|
| Nvidia GeForce GTX980 | 4 GB GDDR5 | 224 | 1.126 | 2048/128/64 | 180 | 4.98 | 0.156 | tg00x | Intel Xeon Nehalem X5550 (4 cores, 2.67 GHz) |
| Nvidia GeForce GTX1080 | 8 GB GDDR5 | 320 | 1.607 | 2560/160/64 | 180 | 8.87 | 0.277 | tg03x | Intel Xeon Broadwell E5-2620 v4 (8 cores, 2.10 GHz) |
| Nvidia GeForce GTX1080Ti | 11 GB GDDR5 | 484 | 1.480 | 3584/224/88 | 250 | 11.34 | 0.354 | tg04x | Intel Xeon Broadwell E5-2620 v4 (8 cores, 2.10 GHz) |
| Nvidia GeForce RTX2070Super | 8 GB GDDR6 | 448 | 1.605 | 2560/160/64 | 215 | 9.06 | 0.283 | medusa | Intel Xeon Cascade Lake Gold 6246 (12 cores, 3.30 GHz) |
| Nvidia Quadro RTX5000 (active) | 16 GB GDDR6 | 448 | 1.620 | 3072/192/64 | 230 | 11.15 | 0.348 | medusa | Intel Xeon Cascade Lake Gold 6246 (12 cores, 3.30 GHz) |
| Nvidia GeForce RTX2080Super | 8 GB GDDR6 | 496 | 1.650 | 3072/192/64 | 250 | 11.15 | 0.348 | medusa | Intel Xeon Cascade Lake Gold 6246 (12 cores, 3.30 GHz) |
| Nvidia GeForce RTX2080Ti | 11 GB GDDR6 | 616 | 1.350 | 4352/272/88 | 250 | 13.45 | 0.420 | tg06x | Intel Xeon Skylake Gold 6134 (8 cores, 3.20 GHz) |
| Nvidia Quadro RTX6000 (active) | 24 GB GDDR6 | 672 | 1.440 | 4608/288/96 | 260 | 16.31 | 0.510 | medusa | Intel Xeon Cascade Lake Gold 6246 (12 cores, 3.30 GHz) |
| Nvidia Tesla V100 (PCIe, passive) | 32 GB HBM2 | 900 | 1.245 | 5120 shaders | 250 | 14.13 | 7.066 | tg07x | Intel Xeon Skylake Gold 6134 (8 cores, 3.20 GHz) |
| AMD Radeon VII | 16 GB HBM2 | 1024 | 1.400 | 3840/240/64 | 300 | 13.44 | 3.360 | interlagos1 | AMD Interlagos Opteron 6276 |

Access, User Environment, and File Systems

Access to the machine

Note that access to the test cluster is restricted: If you want access to it, you will need to contact hpc@rrze. In order to get access to the NDA machines you have to provide a short (!) description of what you want to do there.

From within the FAU network, users can connect via SSH to the frontend testfront.rrze.fau.de. If you need access from outside of FAU, you usually have to connect to a dialog server, e.g. cshpc.rrze.fau.de, first and then ssh to testfront from there.

While it is possible to ssh directly to a compute node, a user is only allowed to do this while they have a batch job running there. When all batch jobs of a user on a node have ended, all of their processes, including any open shells, will be killed automatically.

The login nodes and most of the compute nodes run Ubuntu 18.04. As on most other RRZE HPC systems, a modules environment is provided to facilitate access to software packages. Type „module avail“ to get a list of available packages. Note that, depending on the node, the modules may be different due to the wide variety of architectures. Expect inconsistencies. In case of questions, contact  hpc@rrze.

File Systems

The nodes have local hard disks of very different capacities and speeds. These are not production systems, so do not expect a production environment.

When connecting to the frontend node, you’ll find yourself in your regular RRZE $HOME directory (/home/hpc/...). There are relatively tight quotas there, so it will most probably be too small for the inputs/outputs of your jobs. It does, however, offer a lot of nice features, like fine-grained snapshots, so use it for „important“ stuff, e.g. your job scripts or the source code of the program you’re working on. See the HPC file system page for a more detailed description of the features and the other available file systems including, e.g., $WORK.

Batch processing

As with all production clusters at RRZE, resources are controlled through a batch system, SLURM in this case. Due to the broad spectrum of architectures in the test cluster, it is usually advisable to compile on the target node using an interactive SLURM job (see below).

There is a „work“ queue and an „nda“ queue, both with up to 24 hours of runtime.  Access to the „nda“ queue is restricted because the machines tied to this queue are pre-production hardware or otherwise special so that benchmark results must not be published without further consideration.

Batch jobs can be submitted on the frontend. The default job runtime is 10 minutes.

The currently available nodes can be listed using:

sinfo -o "%.14N %.9P %.11T %.4c %.8z %.6m %.35f"

To select a node, you can either use the host name or a feature name from sinfo:

  • sbatch --nodes=1 --constraint=featurename --time=hh:mm:ss --export=NONE jobscript
  • sbatch --nodes=1 --nodelist=hostname --time=hh:mm:ss --export=NONE jobscript
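To find candidate nodes for --constraint, the sinfo output can be filtered by feature. The function below is a sketch (nodes_with_feature is a hypothetical name) that expects one „NODENAME FEATURES“ line per node on stdin, as produced by sinfo -N -h -o "%N %f", with the features separated by commas:

```shell
# Sketch: print each node (once) whose comma-separated feature list contains
# the requested feature. Input: "NODENAME FEATURES" lines, e.g. from
#   sinfo -N -h -o "%N %f"
# nodes_with_feature is a hypothetical helper name.
nodes_with_feature() {
    awk -v want="$1" '
        {
            n = split($2, feats, ",")
            for (i = 1; i <= n; i++) {
                if (feats[i] == want) {
                    # a node can appear once per partition; print it only once
                    if (!seen[$1]++) print $1
                    break
                }
            }
        }'
}
```

For example, sinfo -N -h -o "%N %f" | nodes_with_feature hwperf should list the nodes that can be selected with -C hwperf, assuming hwperf is configured as a node feature.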

Submitting an interactive job:

srun --nodes=1 --nodelist=hostname --time=hh:mm:ss --export=NONE --pty /bin/bash -l

For access to performance counter registers and other restricted parts of the hardware (so that likwid-perfctr works as intended), use the option -C hwperf.

By default, SLURM exports the environment of the shell from which the job was submitted. If this is not desired, use --export=NONE and unset SLURM_EXPORT_ENV in your job script. Otherwise, problems may arise on nodes that do not run Ubuntu.

Please see the batch system description for further details.

ANSYS Mechanical

ANSYS Mechanical is a computational structural mechanics software which makes it possible to solve structural engineering problems. It is available in two different software environments – ANSYS Workbench (the newer GUI-oriented environment) and ANSYS Mechanical APDL (sometimes called ANSYS Classic, the older MAPDL scripted environment).

Please note that the clusters do not come with any license. If you want to use ANSYS products on the HPC clusters, you have to have access to suitable licenses. These can be purchased directly from RRZE. To efficiently use the HPC resources, ANSYS HPC licenses are necessary.

Availability / Target HPC systems

Production jobs should be run on the parallel HPC systems in batch mode. For simulations with high memory requirements, a single-node job on TinyFAT can be used.

ANSYS Mechanical can also be used in interactive GUI mode via Workbench for serial pre- and/or post-processing on the login nodes. This should only be used to make quick simulation setup changes.  It is NOT permitted to run computationally/memory intensive ANSYS Mechanical simulations on login nodes.

Different versions of all ANSYS products are available via the modules system, which can be listed by module avail ansys. A special version can be loaded, e.g. by module load ansys/2019R1.

We mostly install the current versions automatically, but if something is missing, please contact support-hpc@fau.de.

Notes

  • Two different parallelization methods are available: shared-memory and distributed-memory parallelization.
  • Shared-memory parallelization: uses multiple cores on a single node; specify via ansys191 -smp -np N, default: N=2
  • Distributed-memory parallelization: uses multiple nodes; specify via ansys191 -dis -b -machines machine1:np:machine2:np:...

Sample job scripts

All job scripts have to contain the following information:

  • Resource definition for the queuing system (more details here)
  • Load ANSYS environment module
  • Generate a variable with names of hosts of the current simulation run and specify the number of processes per host
  • Execute Mechanical with appropriate command line parameters (distributed memory run in batch mode)
  • Specify input and output file

#!/bin/bash -l
#PBS -lnodes=2:ppn=40,walltime=24:00:00
#PBS -N mech
#PBS -j eo

# load environment module
module load ansys/XXXX
# generate machine list, uses 20 processes per node
machines=$(uniq "$PBS_NODEFILE" | awk '{ sep = (NR > 1) ? ":" : ""; printf "%s%s:20", sep, $0 }')

# execute mechanical with command line parameters 
# Please insert here the correct version and your own input and output file with its correct name! 
ansys191 -dis -b -machines $machines < input.dat > output.out
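The machine list in this script is built from $PBS_NODEFILE with a fixed count of 20 processes per host. The following sketch (mapdl_machines is a hypothetical helper name, not part of ANSYS) produces the same „host1:20:host2:20“ format from host names on stdin, with the process count as a parameter, so it can also be fed from scontrol show hostnames "$SLURM_JOB_NODELIST" on a Slurm cluster:

```shell
# Sketch: build the "host1:N:host2:N:..." list for -dis -machines from
# host names (one per line) on stdin. mapdl_machines is a hypothetical name.
mapdl_machines() {
    local np=$1
    # uniq collapses consecutive duplicates (a PBS nodefile repeats each host);
    # awk prints "host:np" entries joined by ":" and ends with a newline
    uniq | awk -v np="$np" '{ sep = (NR > 1) ? ":" : ""; printf "%s%s:%s", sep, $0, np } END { print "" }'
}
```

For example, machines=$(mapdl_machines 20 < "$PBS_NODEFILE") under PBS, or machines=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | mapdl_machines 20) under Slurm, followed by the ansys191 -dis -b -machines "$machines" call.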

Further information

  • Documentation is available within the application help manual. Further information is provided through the ANSYS Customer Portal for registered users.
  • More in-depth documentation is available at LRZ. Please note: not everything is directly applicable to HPC systems at RRZE!

Mentors

 

IMD

IMD is a software package for classical molecular dynamics simulations. Several types of interactions are supported, such as central pair potentials, EAM potentials for metals, Stillinger-Weber and Tersoff potentials for covalent systems, and Gay-Berne potentials for liquid crystals. A rich choice of simulation options is available: different integrators for the simulation of the various thermodynamic ensembles, options to shear and deform the sample during the simulation, and many more. There is no restriction on the number of particle types. (http://imd.itap.physik.uni-stuttgart.de/)

The latest versions of IMD are released under GPL-3.0.

Availability / Target HPC systems

IMD is currently not centrally installed, but can be installed locally in your home folder. Follow the instructions on http://imd.itap.physik.uni-stuttgart.de/userguide/compiling.html. When compiling at RRZE, load the „intel64“ module first. It is recommended to clean up before starting a new compilation, i.e. run gmake clean. Specify IMDSYS=lima on any of RRZE’s clusters; however, only use the resulting binary on the cluster where you built it, i.e. recompile with IMDSYS=lima when moving to a different cluster.

If there is enough demand, RRZE might also provide a module for IMD.

Sample job scripts

#!/bin/bash -l
#PBS -l nodes=2:ppn=40,walltime=01:00:00
#PBS -N myJobName
#PBS -M myEmailAdress@fau.de
#PBS -j oe

module load intel64

# specify the full path of the IMD executable
IMDCMD=$HOME/bin/imd_mpi_eam4point_fire_fnorm_homdef_stress_nbl_mono_hpo

# input parameter file name
PARAM=myJob.param

# run your job
/apps/rrze/bin/mpirun_rrze-intelmpd -pinexpr S0:0-9@S1:0-9  $IMDCMD -p $PARAM

Further information

Mentors

 

Quantum Espresso

Quantum Espresso is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

Availability / Target HPC systems

  • parallel computers (main target machines): try to stick to one k-point per node
  • throughput cluster Woody: might be useful for small systems, manually distributed phonon calculations

Notes on parallelization in general

  • do not use Hyperthreading
    • e.g. Emmy, OpenMPI (3.1): mpirun --report-bindings --bind-to core --map-by ppr:1:core
  • use image parallelization, e.g. for NEB / phonon calculations
  • ask for help with the parallelization of phonon calculations
  • use the gamma point version (K_POINTS gamma) instead of K_POINTS automatic where Γ-point sampling is sufficient
  • k-point parallelization
    • 1 k-point per node, e.g. -npools #nnodes
    • -npools must be a divisor of #MPI tasks
  • -ndiag for #bands > 500 (use ELPA version, will be provided soon)
  • -ntg 2,5,10 as a last resort only, and if nr3 < #MPI tasks
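The k-point rules of thumb above (one pool per node, -npools dividing the number of MPI tasks) can be checked in a job script before pw.x is started. This is a sketch; qe_npools is a hypothetical function name and not part of Quantum Espresso:

```shell
# Sketch: use the node count as -npools (one k-point pool per node) and check
# that it divides the total number of MPI tasks. qe_npools is hypothetical.
qe_npools() {
    local nodes=$1 ntasks=$2
    # guard against 0 nodes before computing the modulo
    if [ "$nodes" -lt 1 ] || [ $(( ntasks % nodes )) -ne 0 ]; then
        printf 'npools=%s does not divide %s MPI tasks\n' "$nodes" "$ntasks" >&2
        return 1
    fi
    echo "$nodes"
}
```

Usage, with NODES, CORES and scf.in as placeholders for your node count, MPI task count and input file: NPOOLS=$(qe_npools "$NODES" "$CORES") || exit 1, followed by e.g. mpirun pw.x -npools "$NPOOLS" -input scf.in > scf.out.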

Sample job scripts

TBD

Further information

Mentors

  • T. Klöffel, RRZE, support-hpc@fau.de
  • AG B. Meyer (Interdisciplinary Center for Molecular Materials)