HELIOS cluster documentation

Pavel Strachota

April 25, 2024

Contents

 1 Introduction
 2 Design and Specifications
  2.1 Hardware Specs
  2.2 Software Environment
 3 Logging in to HELIOS
  3.1 User Accounts
  3.2 Command Shell
 4 Installed Software
  4.1 Environment Modules
   4.1.1 Module-Specific Notes
  4.2 System Packages
 5 Running Compute Jobs
  5.1 Job Queues
  5.2 Queue Properties
  5.3 Requesting Resources
   5.3.1 Advice for Memory Requirements
  5.4 Running Batch (Non-interactive) Jobs
   5.4.1 MPI Jobs
   5.4.2 Hybrid OpenMP / MPI Jobs
   5.4.3 CUDA Jobs
   5.4.4 Mixed CUDA + MPI (+OpenMP) Jobs
   5.4.5 MATLAB (Mathematica, R, Julia, ...) Jobs
   5.4.6 Python Jobs
   5.4.7 ANSYS Jobs
   5.4.8 OpenFOAM Jobs
  5.5 Running Interactive Jobs
   5.5.1 Jupyter Notebook
  5.6 Job Management
  5.7 Direct SSH Access to Compute Nodes
 6 Storage Space
  6.1 Home Directories
  6.2 Scratch Space
 7 Remote Visualization
 8 Support

1 Introduction

This document provides brief documentation for new users of the HELIOS cluster at the Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague. It is intended to provide the basic information required to get started. Manuals covering the more advanced features of the software environment can be found in the references.

Impatient users may skip directly to Section 3.

2 Design and Specifications

HELIOS is a high performance computing (HPC) cluster, i.e. a system of interconnected compute servers (a.k.a. compute nodes, execution hosts) where computing tasks are scheduled and executed. Apart from the compute nodes, the system features a high-capacity, high-throughput storage subsystem and a server for interactive access, the login node. Users work on the login node to prepare their computing tasks for execution (compile the source code, set parameters, etc.) and then use the job scheduler to submit them for execution. It is forbidden to execute demanding interactive computations directly on the login node!

2.1 Hardware Specs

2.2 Software Environment

3 Logging in to HELIOS

There are two ways of logging in to the login node:

  1. Using SSH command line interface:

    ssh USERNAME@helios.fjfi.cvut.cz

    Note that Windows users may use PuTTY. See Section 3.1 below for what to substitute for USERNAME.

  2. Remote desktop (over SSH) using the X2Go client. Please select "MATE" as the desktop environment in the session configuration dialog.

File transfer to and from the login node can be done using the standard tools: sftp, scp, sshfs, WinSCP.
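
For example, a result file can be downloaded from the cluster to the local machine like this (the remote path is illustrative):

    scp USERNAME@helios.fjfi.cvut.cz:WORK/results.dat .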

The SSH server runs on the standard port 22.

3.1 User Accounts

User accounts are created upon request (see Section 8). User names and passwords are the same as for most other IT resources at the university (USERMAP, SSO, etc.). Accounts for external users can also be created.

The primary group for all users is "users". In addition, users are members of one or more supplementary groups (students, employees, project team members etc.). Access to job submission queues (see Section 5.1) is controlled by group membership.

3.2 Command Shell

The default command shell for all users is /bin/bash and this setting cannot be changed by chsh (it will be overwritten upon the next user database update). The other installed shells are tcsh and zsh.

4 Installed Software

4.1 Environment Modules

Most useful software components, except for basic system utilities, are managed by the module command, which sets up the environment (the contents of the PATH, MANPATH, LD_LIBRARY_PATH, and other variables) for the current shell session in an appropriate way. The software modules (e.g. the compiler and the MPI library modules) must be loaded at compile time on the login node as well as at execution time on the compute nodes. See also Section 5.4.
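
A typical module workflow on the login node looks as follows (the module name is the one used in the MPI examples in Section 5.4):

    [stracpav@login1 ~]$ module avail
    [stracpav@login1 ~]$ module load openmpi/2.1.5-gcc_4.8.5-psm2
    [stracpav@login1 ~]$ module list
    [stracpav@login1 ~]$ module purge

Here, module avail lists all available modules, module list shows the currently loaded ones, and module purge unloads everything (e.g. before switching to a different toolchain).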

4.1.1 Module-Specific Notes

4.2 System Packages

A number of standard CentOS (RPM) packages are installed on the login node and on the compute nodes. All compute nodes have the same set of packages installed. If you happen to need a piece of software in the form of an additional RPM package, please contact support.
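
Whether a particular package is already installed can be checked with a standard RPM query (the package name below is purely illustrative):

    [stracpav@login1 ~]$ rpm -q fftw-devel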

5 Running Compute Jobs

Compute jobs are scheduled to run on the compute nodes by submitting them to one of the available job queues. Whether or not a user is allowed to submit a job to a queue depends on their membership in the supplementary groups.

5.1 Job Queues

The queues are currently configured as follows:

5.2 Queue Properties

5.3 Requesting Resources

Upon submitting jobs to queues (see Section 5.4 below for examples of how to do that), the user specifies the resources required for the job. Besides the execution time (wall time) mentioned above, it is necessary to specify the requirements for the number of CPU cores, the amount of memory, and the number of GPU accelerators.

Once the job is scheduled for execution, the required resources are fully granted to the job. On the other hand, the resource limits are also strictly enforced (by means of the cgroups feature of the Linux kernel).

The above resources have very restrictive default values: 1 CPU core, 256 MB of memory, and no GPU accelerator. This means that any job needing more than these defaults must request the resources explicitly upon submission.
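
For example, a job requesting 8 CPU cores and 16 GB of memory for two hours can be submitted with all resource requests given directly on the command line (RunQ is the example script from Section 5.4.1):

    [stracpav@login1 ~]$ qsub -q cpu_a -l walltime=02:00:00 -l select=1:mem=16G:ncpus=8 RunQ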

5.3.1 Advice for Memory Requirements

The maximum amount of memory available for a job is approx. 124 GB on type A nodes (cpu_a queue) and approx. 375 GB on type B/C nodes (cpu_b, gpu queues). A tighter estimate of the maximum can be found by investigating the available resources using e.g. "pbsnodes node01" and subtracting a couple of megabytes from the obtained value.
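
For example (node01 stands for any node of interest; resources_available.mem is the standard PBS node attribute holding the memory available for jobs):

    [stracpav@login1 ~]$ pbsnodes node01 | grep resources_available.mem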

5.4 Running Batch (Non-interactive) Jobs

Running a batch (non-interactive) job consists of creating a job submission script and enqueueing it with the qsub command. Job submission scripts are BASH (or Python) scripts with special "comment-like" directives for the PBS Pro job scheduler. The directives can also be replaced by the respective command-line arguments to qsub. Below, one can find examples of submission scripts for different purposes.

5.4.1 MPI Jobs

The job submission script for an MPI job may look like this:

#!/bin/bash 
### Job Name 
#PBS -N intertrack_job 
### required runtime 
#PBS -l walltime=01:00:00 
### queue for submission 
#PBS -q cpu_a 
 
### Merge output and error files 
#PBS -j oe 
 
### request 4 chunks with 32 CPUs each 
### running 32 MPI processes per chunk (total 128 MPI ranks) 
### (i.e. 1 chunk = 1 complete node with 124 GB of memory available for the job) 
#PBS -l select=4:mem=124G:ncpus=32:mpiprocs=32 
 
### start job in the directory it was submitted from 
cd $PBS_O_WORKDIR 
 
### load the necessary software modules 
module load openmpi/2.1.5-gcc_4.8.5-psm2 
 
### run the application and provide its command line arguments 
mpirun ./intertrack Params 
 
### Note that more applications/shell commands may be added here 
### (e.g. for post-processing of the results)

Suppose we save the above script under the name "RunQ" in the application’s executable directory. Then we submit the job by

[stracpav@login1 ~/WORK/Progs-backport/apps/intertrack]$ qsub RunQ 
1576.login1

The job’s ID is printed; it can later be used to monitor or cancel the job (see Section 5.6).

5.4.2 Hybrid OpenMP / MPI Jobs

The job submission script for a hybrid OpenMP / MPI job may look like this:

#!/bin/bash 
### Job Name 
#PBS -N intertrack-hybrid-job 
### required runtime 
#PBS -l walltime=01:00:00 
### queue for submission 
#PBS -q cpu_b 
 
### Merge output and error files 
#PBS -j oe 
 
### Request 2 chunks with 16 CPUs each 
### and spawn 1 MPI process with 16 threads on each node 
### (1 chunk = 1 complete node in the cpu_b queue with ~375 GB of memory available) 
#PBS -l select=2:mem=375G:ncpus=16:mpiprocs=1:ompthreads=16 
 
### start job in the directory it was submitted from 
cd $PBS_O_WORKDIR 
 
### load the necessary software modules 
module load openmpi/2.1.5-gcc_4.8.5-psm2 
 
### run the application (don't forget to disable OpenMPI's default binding policy 
### when multithreading is used) 
mpirun --bind-to none ./intertrack Params
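
Note that PBS Pro sets the OMP_NUM_THREADS environment variable inside the job according to the ompthreads resource, so the thread count normally need not be exported manually in the script.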

5.4.3 CUDA Jobs

The job submission script for a CUDA job using 10 GB of memory, 4 CPU cores, and a single NVIDIA Tesla V100 accelerator may look like this:

#!/bin/bash 
### Job Name 
#PBS -N LBM_CUDA 
### required runtime 
#PBS -l walltime=01:00:00 
### queue for submission 
#PBS -q gpu 
 
### Merge output and error files 
#PBS -j oe 
 
#PBS -l select=1:mem=10G:ncpus=4:ngpus=1 
 
### start job in the directory it was submitted from 
cd $PBS_O_WORKDIR 
 
### load the necessary software modules 
module load cuda/10.0 
 
### run the application 
./lbm_cuda_simulation

Note that the CUDA_VISIBLE_DEVICES environment variable is not set at all; this is completely fine, since device isolation is handled by means of cgroups for CUDA versions > 7.0.
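
To verify which accelerator has actually been assigned to the job, the standard nvidia-smi utility can be run from within the job (e.g. at the beginning of the submission script):

    nvidia-smi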

5.4.4 Mixed CUDA + MPI (+OpenMP) Jobs

Combining CUDA and MPI allows executing parallel code on multiple GPU accelerators on a single node and even across multiple compute nodes. Currently, multi-node GPU computation is only possible for the BioCCS/U research team members using the gpuX queue. OpenMPI supports the GPUDirect technology, which enables direct transfers of data between GPUs over NVLink or the OmniPath fabric without intermediate use of the main system memory. Distributed GPU computation is a rather delicate activity requiring the interoperation of CUDA-aware OmniPath kernel drivers, the CUDA-aware PSM2 library, an OpenMPI build with CUDA support, and the CUDA framework itself. From the programmer's perspective, mixing multi-GPU CUDA code with MPI has some specific caveats.

On HELIOS, the supported combinations of modules are as follows:

These MPI modules with CUDA support must NOT be used for regular MPI computations on the CPU-only nodes.

The job submission script for a CUDA+MPI+OpenMP job may look like this:

#!/bin/bash 
### Job Name 
#PBS -N lbm3d-CUDA-MPI 
### required runtime 
#PBS -l walltime=01:00:00 
### queue for submission 
#PBS -q gpuX 
 
### Merge output and error files 
#PBS -j oe 
 
### request 2 GPU nodes each with 32 CPUs, 4 GPUs, 4 MPI ranks & 8 OpenMP threads 
#PBS -l select=2:mem=375G:ncpus=32:ngpus=4:mpiprocs=4:ompthreads=8 
 
### start job in the directory it was submitted from 
cd $PBS_O_WORKDIR 
 
### load the necessary software module 
module load gcc/6.5 
module load cuda/10.1 
module load openmpi/2.1.5-gcc_4.8.5-psm2-cuda10.1 
# module load openmpi/4.1.0-gcc_4.8.5-psm2-cuda10.1 
 
export PSM2_CUDA=1 
export PSM2_GPUDIRECT=1 
 
### run the application and provide its command line arguments 
mpirun ./sim_1 
 
### Note that more applications/shell commands may be added here 
### (e.g. for post-processing of the results)

5.4.5 MATLAB (Mathematica, R, Julia, ...) Jobs

Assume that we have a MATLAB script named myscript.m stored in the current directory. The job submission script for a (single-threaded) non-interactive MATLAB job may look like this:

#!/bin/bash 
### Job Name 
#PBS -N MATLAB_test 
### required runtime 
#PBS -l walltime=01:00:00 
### queue for submission 
#PBS -q cpu_a 
 
### Merge output and error files 
#PBS -j oe 
 
### Request 16 GB of memory and 1 CPU core on 1 compute node 
#PBS -l select=1:mem=16G:ncpus=1 
 
### start job in the directory it was submitted from 
cd $PBS_O_WORKDIR 
 
### load the necessary software modules 
module load MATLAB/R2020a 
# module load Mathematica/12.0.0 
# module load R/3.5.2 
# module load julia/1.5.2 
 
### run the script: 
### --------------- 
 
### ... for MATLAB R2018b or older 
#matlab -nodisplay -r "myscript; quit;" 
### ... for MATLAB R2019a or newer 
matlab -batch "myscript" 
 
### ... for recent Mathematica versions 
#wolframscript -script myscript.wls 
 
### .. for R 
#R CMD BATCH myscript.r 
 
### ... for Julia 
#julia myscript.jl

The script also indicates (in the commented-out lines) that an analogous approach can be adopted for running Mathematica, R, Julia, and other console-based jobs limited to a single compute node and using one or more threads (set the ncpus resource accordingly).

5.4.6 Python Jobs

Python versions 2.7 and 3.6 are readily installed on all nodes (no module load command is necessary). In addition, more recent Python versions are gradually being added as modules. It is recommended to create a Python virtual environment for one's work so that custom Python packages can be installed via pip as necessary.

[stracpav@login1 ~]$ python3.6 -m venv my_virtual_env

or

[stracpav@login1 ~]$ module load python/3.10.9 
[stracpav@login1 ~]$ python3 -m venv my_virtual_env
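
Once created, the environment is activated in the current shell and packages are installed into it with pip; a brief illustrative session (the numpy package is just an example):

    [stracpav@login1 ~]$ cd my_virtual_env 
    [stracpav@login1 ~/my_virtual_env]$ source bin/activate 
    (my_virtual_env.... ]$ pip install --upgrade pip 
    (my_virtual_env.... ]$ pip install numpy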

Provided that the job is submitted from within the my_virtual_env directory containing the Python script myscript.py, the submission script may look e.g. like this:

#!/bin/bash 
### Job Name 
#PBS -N python_script 
### required runtime 
#PBS -l walltime=01:00:00 
### queue for submission 
#PBS -q cpu_a 
 
### Merge output and error files 
#PBS -j oe 
 
### Request 16 GB of memory and 1 CPU core on 1 compute node 
#PBS -l select=1:mem=16G:ncpus=1 
 
### start job in the directory it was submitted from 
cd $PBS_O_WORKDIR 
 
# activate the Python virtual environment: 
# Note that once activated, the Python version used when creating 
# the virtual environment is readily available and the respective module 
# need not be loaded by means of "module load python/X.X.X". 
source bin/activate 
 
### run the application 
python myscript.py

5.4.7 ANSYS Jobs

The case setup is usually prepared using the ANSYS GUI, see Section 5.5. The actual simulations using ANSYS products (e.g. Fluent) can then be run as batch jobs, provided that the case and a journal file driving the simulation (cf. My_Case.jou in the example below) have been prepared.

The job submission script for an MPI-parallel non-interactive ANSYS (Fluent) job may look like this:

#!/bin/bash 
### Job Name 
#PBS -N ANSYS_Fluent_case 
### required runtime 
#PBS -l walltime=01:00:00 
### queue for submission 
#PBS -q cpu_a 
 
### Merge output and error files 
#PBS -j oe 
 
### Request 60 GB of memory and 16 CPU cores on 1 compute node 
#PBS -l select=1:mem=60G:ncpus=16:mpiprocs=16 
 
### start job in the directory it was submitted from 
cd $PBS_O_WORKDIR 
 
module load ANSYS/19.1 
### use the ANSYS_BASE variable (set by the ANSYS module) to put the Fluent executables on the PATH 
export PATH=$ANSYS_BASE/fluent/bin:$PATH 
 
### direct PBS Pro support is not set up in ANSYS, 
### so we obtain the MPI parameters as follows: 
NPROCS=$(wc -l < $PBS_NODEFILE) 
### run ANSYS Fluent 
fluent 3ddp -t${NPROCS} -p -cnf=$PBS_NODEFILE -mpi=openmpi -g -i My_Case.jou > log.txt

Notice how the ANSYS_BASE environment variable can be used to construct the path to the executables of an individual ANSYS product (e.g. Fluent).

5.4.8 OpenFOAM Jobs

Integrating PBS Pro job submission into the traditional OpenFOAM chain

Preprocess -> decomposePar -> runParallel -> Postprocess

is a little more complicated in comparison to the other procedures described in this manual. Please contact support if you intend to run non-interactive parallel OpenFOAM simulations.

5.5 Running Interactive Jobs

Sometimes it is useful or necessary to work on a compute node interactively. An interactive job is started by passing the -I flag to qsub instead of a submission script (see the example in Section 5.5.1 below).

Interactive jobs may also be used to run GUI-based applications that require substantial computing power directly on the compute nodes.

5.5.1 Jupyter Notebook

It is possible to connect to a running Jupyter Notebook server from the user’s local machine provided that an SSH tunnel is established to the compute node. Starting from scratch, one can follow the procedure below:

  1. As in Section 5.4.6, create a virtual environment and install Jupyter Notebook:

    [stracpav@login1 ~]$ python3.6 -m venv jupyter_virtual_env 
    [stracpav@login1 ~]$ cd jupyter_virtual_env 
    [stracpav@login1 ~/jupyter_virtual_env]$ source bin/activate 
    (jupyter_virtual_env.... ]$ pip install --upgrade pip 
    (jupyter_virtual_env.... ]$ pip install jupyter

    Optionally, one can set up Jupyter to listen on all interfaces by default, which requires generating a configuration file

    (jupyter_virtual_env.... ]$ jupyter notebook --generate-config

    and then editing the file ~/.jupyter/jupyter_notebook_config.py so that the following lines are uncommented and changed accordingly:

            c.NotebookApp.allow_origin = '*' 
            c.NotebookApp.allow_remote_access = True 
            c.NotebookApp.ip = '*'
  2. Once the setup is completed, create an interactive job and start Jupyter Notebook on the compute node:

    [stracpav@login1 ~]$ qsub -I -q cpu_a -l walltime=1:00:00 -l select=1:mem=8G:ncpus=1 
    qsub: waiting for job 139604.login1 to start 
    qsub: job 139604.login1 ready 
     
    [stracpav@node11 ~]$ cd jupyter_virtual_env 
    [stracpav@node11 ~/jupyter_virtual_env]$ source bin/activate 
    (jupyter_virtual_env.... ]$ jupyter notebook --no-browser --ip=node11 
    [I 12:30:53.266 NotebookApp] Serving notebooks from local directory: .... 
    [I 12:30:53.266 NotebookApp] The Jupyter Notebook is running at: 
    [I 12:30:53.266 NotebookApp] http://node11:8888/?token= ... 
    [I 12:30:53.267 NotebookApp] Use Control-C to stop this server ... 
    [C 12:30:53.286 NotebookApp] 
     
        To access the notebook, open this file in a browser: 
            file:///mnt/lustre/helios-home/stracpav/.local/share/jupyter/ ... 
        Or copy and paste one of these URLs: 
            http://node11:8888/?token=82b67a9f3e29b318bd306 ...

    If the optional part of step 1 has been performed, the "--ip" flag is not required.

  3. Now open a new SSH connection to Helios, using port forwarding to forward e.g. local port 8888 to the remote port 8888 on node11:

    ssh -L 8888:node11:8888 stracpav@helios.fjfi.cvut.cz
  4. Finally, open a local browser and paste the suggested URL (with the correct token) into the address bar, replacing the node name with "localhost":

    http://localhost:8888/?token=82b67a9f3e29b318bd306cdc773faa8d3d8eb575320ef4ad

  5. After shutting down Jupyter, close the second SSH connection to Helios and also leave the interactive job.

5.6 Job Management
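
The standard PBS Pro commands cover everyday job management; a few common examples (the job ID is the one printed by qsub, see Section 5.4.1):

[stracpav@login1 ~]$ qstat -u $USER 
[stracpav@login1 ~]$ qstat -f 1576.login1 
[stracpav@login1 ~]$ qdel 1576.login1

Here, qstat -u lists the user's own queued and running jobs, qstat -f prints detailed information about a single job, and qdel cancels a job. See the PBS Professional User's Guide [1] for the complete reference.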

5.7 Direct SSH Access to Compute Nodes

Users having a running job (either batch or interactive) can connect to the respective compute nodes directly via SSH. The nodes are on the internal network and are accessible from the login node only unless an SSH tunnel is created. The SSH session is automatically terminated by PBS once the job finishes.
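
For example, if qstat shows a job running on node11, one can attach to the node from the login node:

[stracpav@login1 ~]$ ssh node11 
[stracpav@node11 ~]$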

6 Storage Space

6.1 Home Directories

Each user has their home directory available on the login node as well as on all compute nodes. The home directory is mounted under

/mnt/lustre/helios-home/USERNAME

Quotas are currently not applied on the home directories. The whole file system has roughly 180 TB of usable space available for both the regular users and the BioCCS/U team members. Quotas may be introduced later if excess capacity use by some individual users becomes an issue.
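
The overall usage of the file system can be checked e.g. with the standard Lustre client utility (plain df works as well):

[stracpav@login1 ~]$ lfs df -h /mnt/lustre/helios-home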

6.2 Scratch Space

On each compute node and for each user, there is a directory (a scratch space)

/scratch/USERNAME

which is available for writing intermediate results while the job runs. The local SSD storage is much faster than the Lustre file system when thousands of small files need to be handled. After the job finishes, any files it left in the scratch directory remain there for 7 days; after that, they are deleted automatically.

Accessing the scratch space on an individual node is not so easy once the job has finished. In order to collect all relevant data from the scratch space before the job finishes, write the submission script in the following fashion:

### PBS Job submission setup 
. 
. 
. 
### create a subdirectory in the user's scratch area for each job, based on 
### the unique job ID 
### (so that multiple jobs don't interfere if they meet on a single compute node) 
OUTPUT_SCRATCH=/scratch/$USER/$PBS_JOBID 
mkdir $OUTPUT_SCRATCH 
. 
. 
. 
### run the application which writes to the scratch directory 
./my_app.exe -output $OUTPUT_SCRATCH 
. 
. 
. 
### do some post-processing and store the results in the user's home directory 
. 
. 
. 
### be nice to others and clean up what you left in the scratch space 
rm -rf $OUTPUT_SCRATCH

7 Remote Visualization

For visualization of large datasets, the ParaView and VisIt software tools are installed and available as modules (see Section 4.1). It is particularly easy to start a parallel ParaView remote visualization server on the login node (with direct high-speed access to the user’s home directory on the Lustre filesystem) and connect to it from the user’s workstation using an identical version of ParaView installed locally. The procedure is as follows:

  1. log in to Helios, creating an SSH tunnel to forward the port 11111 of login1 to your local machine:

    ssh -L 11111:localhost:11111 USERNAME@helios.fjfi.cvut.cz

    On Windows, use the port forwarding features of PuTTY.

  2. On the Helios login node (using the SSH shell created in step 1), launch the ParaView server. The MPI implementation shipped with ParaView can be used to start multiple server processes (see Section 4.1.1 for details on running different versions of ParaView):

    [stracpav@login1 ~]$ module load paraview/5.8.1-headless 
    [stracpav@login1 ~]$ mpiexec -np 4 pvserver
  3. On the local machine (either Linux or Windows), start ParaView and establish a new server connection (from the "File -> Connect" menu entry), using "localhost" as the host name and "11111" as the port. In the next step, choose that the server is started manually (as you have already started it yourself).

  4. Connect to the server. From now on, the remote file system is accessible through the "Open" dialog and all rendering is performed on the remote server. Only the ParaView user interface displaying the rendering results runs on the user's machine.

  5. When finished, choose "File -> Disconnect" or simply close ParaView. This will terminate the ParaView server as well.

8 Support

For support, please contact Pavel Strachota.

References

[1]   PBS Professional 18.2 User’s Guide https://www.pbsworks.com/pdfs/PBSUserGuide18.2.pdf

[2]   PBS Professional 18.2 Reference Guide https://www.pbsworks.com/pdfs/PBSRefGuide18.2.pdf

[3]   Environment Modules http://modules.sourceforge.net/