TinyX compute clusters

Introduction

The TinyX clusters are a group of small special purpose clusters:

  • TinyFat for applications needing lots of main memory
  • TinyGPU for applications utilizing GPU-accelerators
  • TinyEth for throughput workloads – this cluster is made from „recycled“ hardware

While their hardware is very diverse, these clusters share a few properties:

  • They are very small, hence the TinySomething naming scheme.
  • They do not have their own front ends. Access is through the Woody front ends.
  • They have local hard discs (unlike the big clusters, which run diskless).

In general, the documentation for Woody applies; this page only lists the differences.

TinyFat

Memoryhog and the TinyFat cluster are intended for running serial or moderately parallel (OpenMP) applications that require large amounts of main memory in a single machine. They are mostly used for pre- and post-processing work for jobs on other clusters. TinyFat comprises several different types of machines:

| Hostnames | # nodes | CPUs and number of cores per machine | Main memory (GB) | Additional comments and required qsub parameters |
|---|---|---|---|---|
| tf001-tf016 | 15 | 2x AMD Opteron 6134 („Magny Cours“) @2.3 GHz = 16 cores | 128 | 1 GBit Ethernet, 90 GB HDD; -l nodes=1:ppn=16 (the QDR Infiniband disappeared with the shutdown of LiMa in 12/2018; thus, multi-node jobs are no longer possible on TinyFat) |
| memoryhog (formerly tf020) | 1 | 4x Intel Xeon X7560 („Nehalem EX“) @2.27 GHz = 32 cores/64 threads | 512 | interactively accessible without batch job |
| tf040-tf042 (not generally available) | 3 | 2x Intel Xeon E5-2680 v4 („Broadwell“) @2.4 GHz = 28 cores/56 threads | 512 | 10 GBit Ethernet, 1 TB Intel 750 SSD; -q broadwell512 -l nodes=1:ppn=56 |
| tf050-tf057 (not generally available) | 7 | 2x Intel Xeon E5-2643 v4 („Broadwell“) @3.4 GHz = 12 cores/24 threads | 256 | 10 GBit Ethernet, 1 TB Intel 750 SSD; -q broadwell256 -l nodes=1:ppn=24 |

To submit batch jobs to TinyFat, you need to use qsub.tinyfat instead of the normal qsub command on the Woody front ends. Be sure to use the correct parameters for qsub as specified in the table above.
To check the status of your jobs, use qstat.tinyfat instead of the normal qstat.
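
For example, a job requesting one of the Magny Cours nodes in full could be submitted like this (job script name and walltime are placeholders; the wrapper is assumed to accept the usual Torque qsub options):

    qsub.tinyfat -l nodes=1:ppn=16 -l walltime=06:00:00 jobscript.sh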

On the nodes that have an SSD, it is mounted at /scratchssd. Please use it only if you can really profit from it: like all consumer SSDs, these only support a limited number of writes, so writing to them effectively „uses them up“.
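
A minimal sketch of a job script that stages its data on the node-local SSD of one of the broadwell512 nodes (queue, walltime, application, and file names are placeholders):

    #!/bin/bash
    #PBS -q broadwell512
    #PBS -l nodes=1:ppn=56
    #PBS -l walltime=04:00:00
    # stage the input on the node-local SSD
    cp "$HOME/input.dat" /scratchssd/
    cd /scratchssd
    "$HOME/my_app" input.dat       # hypothetical application
    # copy the results back and clean up the SSD
    cp results.dat "$HOME/"
    rm -f /scratchssd/input.dat /scratchssd/results.dat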

All Broadwell-based nodes have been purchased by specific groups or special projects. These users have priority access and nodes may be reserved exclusively for them.

There is also the machine memoryhog that is somewhat part of the cluster: every HPC user can log in directly to memoryhog.rrze.uni-erlangen.de to run their memory-intensive workloads. This of course means you need to be considerate of other users.
Processes hogging too many resources or running for too long will be killed without notice.
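
Logging in works as on any other front end, e.g. (the user name is a placeholder):

    ssh your_hpc_account@memoryhog.rrze.uni-erlangen.de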

TinyGPU

TinyGPU complements the GPU nodes of the Emmy cluster and has nodes with five different types of GPUs (mostly consumer models):

| Hostnames | # nodes | CPUs, cores per machine, and main memory | GPU type | Additional comments | Properties |
|---|---|---|---|---|---|
| tg001-tg008 | 7 | 2x Intel Xeon 5550 („Nehalem“) @2.66 GHz = 8 cores/SMT off; 24 GB RAM | 2x NVIDIA GTX 980 | | :gtx980, :anygtx, :cuda8, :cuda9, :cuda10 |
| tg031-tg037 | 7 | 2x Intel Xeon E5-2620v4 („Broadwell“) @2.1 GHz = 16 cores/SMT off; 64 GB RAM | 4x NVIDIA GTX 1080 | local SATA-SSD (880 GB) available under /scratchssd | :gtx1080, :anygtx, :any1080, :cuda8, :cuda9, :cuda10 |
| tg040-tg049 (not generally available) | 10 | 2x Intel Xeon E5-2620v4 („Broadwell“) @2.1 GHz = 16 cores/SMT off; 64 GB RAM (one node has 128 GB RAM) | 4x NVIDIA GTX 1080 Ti | local SATA-SSD (1.8 TB) available under /scratchssd | :gtx1080ti, :anygtx, :any1080, :cuda8, :cuda9, :cuda10 |
| tg060-tg069 (not generally available) | 6 | 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores with optional SMT; 96 GB RAM | 4x NVIDIA RTX 2080 Ti | local SATA-SSD (1.8 TB) available under /scratchssd | :rtx2080ti, :anyrtx, :cuda10 |
| tg071-tg073 (not generally available) | 3 | 2x Intel Xeon Gold 6134 („Skylake“) @3.2 GHz = 16 cores with optional SMT; 96 GB RAM | 4x NVIDIA Tesla V100 (two nodes with 4x 32 GB, one node with 4x 16 GB) | local NVMe-SSD (2.9 TB) available under /scratchssd | :v100, :v100_32g / :v100_16g, :anytesla, :cuda9, :cuda10 |
Use :smt to add the SMT threads to the CPU set of the job; the ppn value has to remain unchanged and specifies the number of physical cores only.

To submit batch jobs to TinyGPU, you need to use qsub.tinygpu instead of the normal qsub command on the Woody front ends.

If you do not request a specific GPU type, your job will run on any available node. If you want to request a specific GPU type, use the properties stated in the table above. For a more general selection, there are :anygtx and :anytesla (their meaning should be obvious) as well as :cuda8, :cuda9, and :cuda10, which select nodes by supported CUDA version. Thus, e.g. qsub.tinygpu -l nodes=1:ppn=8:gtx980 [...].
You may request parts of a node, e.g. if you only need one GPU. For every 4 cores you request, you are also assigned one GPU. For obvious reasons, you are only allowed to request multiples of 4 cores. As an example, requesting qsub.tinygpu -l nodes=1:ppn=8:gtx1080 [...] will give you 8 cores, 2 of the GTX 1080 GPUs, and half of the main memory of one of the tg03X nodes. There is no dedicated share of the local HDD/SSD assigned.
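
A minimal sketch of a complete TinyGPU job script (GPU type, walltime, job name, and application are placeholders; the wrapper is assumed to accept a job script like the regular qsub):

    #!/bin/bash
    #PBS -l nodes=1:ppn=8:gtx1080
    #PBS -l walltime=04:00:00
    #PBS -N my-gpu-job
    cd "$PBS_O_WORKDIR"
    ./my_cuda_app                  # hypothetical application

It would be submitted with qsub.tinygpu jobscript.sh.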

Properties can also be used to request a certain CPU clock frequency. This is not something you will usually want to do, but it can be used for certain kinds of benchmarking. Note that you cannot make the CPUs go any faster, only slower, as the default already is the turbo mode, which makes the CPU clock as fast as it can (up to 3.2 GHz, depending on the requested configuration) without exceeding its thermal or power budget. So please do not use any of the following options unless you know what you’re doing. The available options are: :noturbo to disable Turbo Mode, :fX.X to request a specific frequency. The available frequencies for the different nodes vary.
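
As an illustration, a benchmarking run that needs a reproducible clock frequency might disable turbo mode like this (GPU type, walltime, and script name are placeholders):

    qsub.tinygpu -l nodes=1:ppn=16:gtx1080ti:noturbo -l walltime=01:00:00 benchmark.sh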

To request access to the hardware performance counters (i.e. to use likwid-perfctr), you have to add the property :likwid and request the full node. Otherwise you will get the error message Access to performance monitoring registers locked from likwid-perfctr. The property is not required (and should also not be used) for other parts of the LIKWID suite, e.g. it is not required for likwid-pin.
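
A sketch of how this could look for a full GTX 1080 node, run interactively (performance group and binary are placeholders; likwid-perfctr usage as described in the LIKWID documentation):

    qsub.tinygpu -I -l nodes=1:ppn=16:gtx1080:likwid
    # inside the interactive job:
    likwid-perfctr -C 0-15 -g FLOPS_DP ./my_app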

To check the status of your jobs, use qstat.tinygpu instead of the normal qstat.

The Woody front ends only have a limited software installation with regard to GPGPU computing. It is recommended to compile code on one of the TinyGPU nodes, i.e. by requesting an interactive job on TinyGPU.
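
Such an interactive job could be requested like this (core count, GPU type, and walltime are placeholders):

    qsub.tinygpu -I -l nodes=1:ppn=4:gtx1080 -l walltime=01:00:00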

cuDNN is not provided by RRZE due to licensing issues. If needed, download your personal copy yourself from the NVIDIA portal.

The GTX1080/GTX1080Ti GPUs can only be used with CUDA 8.0 (or higher). The V100 may require at least CUDA 9.0. The RTX2080Ti may require at least CUDA 10.0.

Host software compiled on the new tg06x/tg07x or tg03x/tg04x nodes might not run on the older tg00x nodes if AVX/AVX2 instructions are used. Host software using AVX512 instructions will only run on tg06x/tg07x.
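
As an illustration with GCC (the exact flags depend on your compiler and toolchain): -march=native on a Skylake node may emit AVX-512 code, while restricting the target architecture produces a binary that also runs on the Nehalem-based tg00x nodes:

    # tuned for the node you compile on (may use AVX/AVX2/AVX-512)
    gcc -O3 -march=native -o my_app my_app.c
    # portable build that also runs on the old tg00x nodes
    gcc -O3 -march=nehalem -o my_app my_app.c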

31 out of the 37 nodes have been purchased by specific groups or special projects. These users have priority access and nodes may be reserved exclusively for them.

TinyEth

The TinyEth cluster is a simple cluster for throughput workloads. It consists of old LiMa nodes whose InfiniBand port died, so they can no longer be used in LiMa, but they are still good for serial or single-node jobs.

All nodes are equipped with 2x Xeon 5650 („Westmere“) CPUs (12 cores, SMT disabled) running at 2.66 GHz and 48 GB of RAM. They have a small and slow local hard disc.

To submit batch jobs to TinyEth, you need to use qsub.tinyeth instead of the normal qsub command on the Woody front ends. The request string is -l nodes=1:ppn=X, with X being any number between 1 and 12. If you do not request a full node (i.e. X<12), you only get a corresponding fraction of the main memory. To check the status of your jobs, use qstat.tinyeth instead of the normal qstat.
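
For example, a single-core throughput job could be submitted like this (walltime and script name are placeholders):

    qsub.tinyeth -l nodes=1:ppn=1 -l walltime=24:00:00 jobscript.sh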