TensorFlow

TensorFlow is an open-source machine learning framework.

Availability / Target HPC systems

TensorFlow is currently not installed on any of RRZE's HPC systems, as new versions are released very frequently and all groups have their own special needs.

The following HPC systems are best suited:

  • TinyGPU or GPU nodes in Emmy
  • Woody for CPU-only runs (its nodes are smaller, but there are many of them)

Notes

Different routes can be taken to get your private installation of TensorFlow. Don't waste valuable storage in $HOME; use $WORK instead for storing your installation.

The pre-built Docker images might not work on the GTX980 nodes in TinyGPU as their host CPU is too old to support the required AVX instruction set.

Official Docker images are regularly published on https://hub.docker.com/r/tensorflow/tensorflow. These images can be used with Singularity on our HPC systems. Run the following steps on the Woody frontend to pull your image:

cd $WORK
export SINGULARITY_CACHEDIR=$(mktemp -d)
singularity pull tensorflow-2.1.0-gpu-py3.sif docker://tensorflow/tensorflow:2.1.0-gpu-py3
rm -rf $SINGULARITY_CACHEDIR

Within your job script, use the container as follows. /home/* and /apps/ are automatically bind-mounted into the container. On TinyGPU (but currently not on Emmy), the GPU device libraries are also automatically bind-mounted into the container.

./tensorflow-2.1.0-gpu-py3.sif ./script.py

On the GPU nodes of Emmy you have to use singularity run --nv tensorflow-2.1.0-gpu-py3.sif ./script.py instead.
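Putting the pieces together, a job script using the container might look like the following sketch. The scheduler directives and the project path are placeholders; use the resource requests required on your target system:

```shell
#!/bin/bash -l
# Hypothetical job script sketch -- the scheduler directives (queue, GPU type,
# walltime, ...) depend on the target cluster and are omitted here.

cd $WORK/my-project   # placeholder path: wherever the .sif and script.py live

# On TinyGPU the GPU device libraries are bind-mounted automatically,
# so the container can be executed directly:
./tensorflow-2.1.0-gpu-py3.sif ./script.py

# On the GPU nodes of Emmy, GPU support must be requested explicitly:
# singularity run --nv tensorflow-2.1.0-gpu-py3.sif ./script.py
```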

NVIDIA maintains its own Docker images for TensorFlow on the NVIDIA GPU Cloud (NGC), which are updated once per month: https://ngc.nvidia.com/catalog/containers/nvidia:tensorflow. These images can also be used with Singularity on TinyGPU. Run the following steps on the Woody frontend to pull your image:

cd $WORK
export SINGULARITY_CACHEDIR=$(mktemp -d)
singularity pull tensorflow-ngc-20.03-tf2-py3.sif docker://nvcr.io/nvidia/tensorflow:20.03-tf2-py3
rm -rf $SINGULARITY_CACHEDIR

Within your job script, use the container as follows. /home/* and /apps/ are automatically bind-mounted into the container. On TinyGPU (but currently not on Emmy), the GPU device libraries are also automatically bind-mounted into the container.

./tensorflow-ngc-20.03-tf2-py3.sif ./script.py

On the GPU nodes of Emmy you have to use singularity run --nv tensorflow-ngc-20.03-tf2-py3.sif ./script.py instead.

When manually installing TensorFlow (into a Python virtual environment) using pip, remember to load one of the python modules first! The system python will not be sufficient.
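Such an installation could look like the following sketch. The module name is an assumption; check `module avail python` for what is actually available, and pin the TensorFlow version you need:

```shell
# Sketch: pip-based TensorFlow installation in a virtual environment.
# "python/3.8" is a hypothetical module name -- adapt it to your site.
module load python/3.8

cd $WORK                          # keep the installation out of $HOME
python3 -m venv tf-venv           # create the virtual environment
source tf-venv/bin/activate

pip install --upgrade pip
pip install tensorflow            # or e.g. tensorflow==2.1.0 to pin a version
```

In later job scripts, `module load` and `source $WORK/tf-venv/bin/activate` are enough to make this TensorFlow available again.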

TensorFlow packages for Anaconda are also available via conda-forge. Either load one of the python modules and install the additional packages into one of your own directories, or start from scratch with a private Miniconda installation! The system python will not be sufficient.
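The from-scratch route could be sketched as follows; the installer filename reflects the current Miniconda naming scheme and may change over time:

```shell
# Sketch: private Miniconda installation in $WORK with TensorFlow from conda-forge.
cd $WORK
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $WORK/miniconda3   # -b: batch mode, -p: prefix

source $WORK/miniconda3/bin/activate
conda install -c conda-forge tensorflow
```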

To check that your TensorFlow installation is functional and detects the hardware, you can use the following simple Python snippet:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

Further information

Mentors

  • please volunteer!