
FAQ on the new cluster Taurus2

Subscribers to the cluster's mailing list are invited to add their tips and tricks on using the cluster to this page.

A very old French version of this page is available at cluster_fr_old.

New high-performance storage policy on the cluster:

A computing cluster should allow users to use a large amount of storage during computation; consequently, this storage use should be temporary. Once the calculations have been completed, it is the user's responsibility to:

  • compress your important data
  • move the stored data to another storage space (local or janus, for instance)
  • delete unused data
  • avoid spaces or special characters in file and directory names

The system administrator reserves the right to compress or delete your files at any time.

There is no backup on the cluster; you may lose all your data at any time!

In addition, to avoid usage that may affect other users, a quota system applies to your home directory (50 GB). Users requiring more space must request a special account. On a special account, any data older than 40 days is automatically deleted without any possibility of recovery.

A - Presentation of the cluster

1. Technical overview

Taurus
  Number of CPU cores: 340
  Estimated peak performance: 1.8 ??? TFlops
  CPU types: 50 x Intel Xeon E5440 - 2.83 GHz quad-core (4)
             10 x AMD Opteron 6134 - 2.3 GHz octo-core (8)
             10 x AMD Opteron 4184 - 2.8 GHz hexa-core (6)
             18 x Intel Xeon E5-2670 - 2.5 GHz 2*10-core (20)
  GPUs: 4 x Nvidia Tesla K20m
  Master node: Bull Novascale R460
  Compute nodes: 12 x Bull Novascale R422
                 5 x Transtec Calleo 351
                 5 x Dell PowerEdge R415
                 10 x TODO
  Memory configuration: 32 GB of RAM per node; the special high-memory node has 220 GB
  Total memory: 20*32 + 220 = 880 GB
  Disk capacity: 6 TB
  Interconnect: 2 x Gigabit Ethernet
  Operating system: GNU/Linux
  Distribution: Rocks 6.1.1 - CentOS 6.5
  Main software: GNU compilers and libraries, OpenMPI, Nvidia SDK toolkit

2. Who can access the cluster?

All LERIA members can access the cluster.

To gain access, simply request validation of your account on the cluster by sending an email to technique (at) info.univ-angers.fr with your LDAP login (the same as for CAS/ENT).

Guest accounts are possible after agreement with the LERIA laboratory.

B - Using the cluster

1. How to connect to the cluster?

From inside our network

Users can only connect to the cluster by SSH on the standard port:

ex.: ssh -Y mylogin@taurus2.info-ua

and all file transfers should use SFTP:

ex.: sftp mylogin@taurus2.info-ua

From internet

Temporarily unavailable; you first have to connect to janus: ssh -Y mylogin@janus.info.univ-angers.fr

Users can connect to the cluster either by SSH on port 2222 or by the VPN:

ex.: ssh -p 2222 -Y mylogin@cluster.info.univ-angers.fr

and all file transfers should use SFTP:

ex.: sftp -o port=2222 mylogin@cluster.info.univ-angers.fr
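While the direct route is unavailable, the two SSH hops through janus can be combined into a single command with an SSH ProxyCommand. A minimal sketch, assuming your SSH client supports the -W option and that taurus2.info-ua is reachable from janus:

  # one-hop connection through janus (replace mylogin with your own login)
  ssh -o ProxyCommand="ssh -W %h:%p mylogin@janus.info.univ-angers.fr" -Y mylogin@taurus2.info-ua
  # same idea for file transfers
  sftp -o ProxyCommand="ssh -W %h:%p mylogin@janus.info.univ-angers.fr" mylogin@taurus2.info-ua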

2. How to properly use the cluster?

It is better to compile your source code on the cluster; you can use the n-2-54 node for that. See this page to learn how to increase the performance of your program with gcc compilation options. You can also take a look at the Intel C++ compiler: it can often vectorize your code better than gcc.

SGE is a resource manager (job scheduler) that allows multiple users to reserve resources and run their jobs as soon as resources become available.

It is imperative to use SGE (Sun Grid Engine) to submit calculations to the cluster.

The front-end node should never be used as a compute node.

The basic commands are:

ex.: qsub -m bea -M $USER@univ-angers.fr test.sh
If your jobs are likely to send a lot of mails (or if you are not sure), do not use the “-M …” option, to avoid flooding our SMTP servers; use “-m n” instead (no mail is sent).
ex.: screen -d -m qlogin -m bea -M myloginmail@info.univ-angers.fr -now no 

⇨ Wait for the mail and connect to the node with: screen -r

ex.: qstat -u my_user
ex. : qstat -u "*" | less
ex.: qhost -j

3. How to submit jobs?

There is one single queue with 3 specific parallel environments: param, mpi, threaded.

By default: all.q

This queue contains all the compute nodes of the cluster and can run both interactive and batch jobs without limitation of resources or time. However, you can have only 20 jobs running simultaneously. If you want more, you should submit to the param parallel environment.

This is the default destination for jobs submitted by qsub. This is a FIFO queue.

Use: qsub test_sequentiel.sh
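For reference, a minimal sketch of what test_sequentiel.sh could look like (my_program and its input are placeholders; adapt them to your own code):

  #!/bin/bash
  #$ -cwd                  # run the job from the directory where qsub was called
  #$ -N sequential_test    # job name shown by qstat
  # run the actual computation; $JOB_ID is set by SGE
  ./my_program input.txt > output_${JOB_ID}.txt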

4. How to limit the use of resources?

You must limit the resources used by a job (memory, disk space, …):

The argument used to limit resources is '-l':
ex1.: qsub -l h_vmem=2G test.sh
ex2.: qsub -l h_vmem=1G,mem_free=800M test.sh
ex3.: qsub -l h_fsize=10M test.sh

h_vmem is the maximum memory that can be used

mem_free is the minimum available memory needed to run the job

h_fsize is the maximum size of a file produced by the job

h_vmem does not guarantee that the job execution environment will have enough memory. For instance, “qsub -l h_vmem=48G test.sh” can be run on a node with only 32G of RAM. If you have specific memory requirements, you should use a dedicated processor or the mem_free option.
For specific help:
man 5 complex

To see the list of available resources:

qconf -sc
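These resource requests can also be written as #$ directives inside the submission script instead of on the command line. A minimal sketch (the limits shown are only examples; my_program is a placeholder):

  #!/bin/bash
  #$ -l h_vmem=1G       # maximum memory the job may use
  #$ -l mem_free=800M   # only schedule on a node with at least 800M of free memory
  #$ -l h_fsize=10M     # maximum size of a file produced by the job
  ./my_program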

5. How not to monopolize all resources: parallel environments

There are 3 parallel environments, to best share the cluster resources between the different types of work:

param: This environment asks SGE to try to fill each node before assigning jobs to another node. The number of slots (= cores) is limited to 400, but each user can run 50 jobs at the same time (i.e. 8 users can use the param environment simultaneously).

This allows the FIFO queue to accept jobs submitted outside this environment, even if they arrived later.

:!: Barring exceptions, the parameter indicating the number of slots required per job should always be equal to 1 (one core per job) :!:

Use: qsub -pe param 1 test_param.sh

threaded: This environment forces the job to stay on a single node, which allows reserving a whole node for the job if the parameter indicating the number of slots per job is equal to 8 (8 cores = 1 node) for Intel E5440 (12 for AMD Opteron 4184, 16 for AMD Opteron 6134, 20 for Intel E5-2670).

:!: The parameter indicating the number of slots per job cannot be greater than 8/12/16/20 (depending on the CPU) :!:

Use: qsub -pe threaded 8 test_bench.sh
               qrsh -pe threaded 8
               

If you want to use OpenMP, proceed as follows:

qsub -R y -pe threaded 4 test_bench.sh
# inside test_bench.sh, set the number of OpenMP threads to the number of reserved slots:
export OMP_NUM_THREADS=$NSLOTS  # in this case, $NSLOTS=4

mpi: This environment is used for parallel jobs spanning multiple nodes at once (i.e. distributed memory).

Use: qsub -pe mpi 16 test_mpi.sh
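A minimal sketch of what test_mpi.sh could look like, assuming your program was built against the OpenMPI provided on the cluster (e.g. the rocks-openmpi module shown below), which integrates with SGE; my_mpi_program is a placeholder:

  #!/bin/bash
  #$ -cwd
  # $NSLOTS is set by SGE to the number of slots requested with "-pe mpi 16"
  mpirun -np $NSLOTS ./my_mpi_program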

6. How to choose a specific environment?

New

You can choose a specific environment:

Example: you need gcc version 4.8.2, but by default you have version 4.4.7. You can use environment modules to fix this:

 $[login@taurus2]$ gcc --version
 gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
 $[login@taurus2]$ module load gcc/4.8.2
 $[login@taurus2]$ gcc --version
 gcc (GCC) 4.8.2
 $[login@taurus2]$ module unload gcc/4.8.2
 $[login@taurus2]$ gcc --version
 gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)

To see all available modules:

 $[login@taurus2]$ module avail
 ------------------------------------- /usr/share/Modules/modulefiles -------------------------------------
 dot              module-info      null             rocks-openmpi    use.own
 module-git       modules          opt-python       rocks-openmpi_ib
 
 -------------------------------------------- /etc/modulefiles --------------------------------------------
 openmpi-x86_64
 
 ------------------------------------ /share/apps/modules/modulefiles -------------------------------------
 gcc/4.8.2  null       nvidia/6.5

To see all currently loaded modules:

  $[login@taurus2]$ module list
  Currently Loaded Modulefiles:
  1) rocks-openmpi   2) gcc/4.8.2

To make your own module:

  module load use.own

–> Create a new directory “privatemodules” in your home and use the “null” skeleton as a starting point to write a new module (http://en.wikipedia.org/wiki/Environment_Modules_%28software%29) in TCL (http://en.wikipedia.org/wiki/Tcl).
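A minimal sketch of the corresponding commands (the path of the “null” skeleton is taken from the module avail listing above; “mysoftware” is a placeholder name):

 module load use.own                                        # adds ~/privatemodules to your module path
 mkdir -p ~/privatemodules
 cp /usr/share/Modules/modulefiles/null ~/privatemodules/mysoftware
 # edit ~/privatemodules/mysoftware (TCL syntax), then:
 module load mysoftware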

You can now compile your own software and manage it with your own modules.

If you have trouble compiling your software, or if you think a piece of software would be helpful for all users, please send a mail to technique [at] info.univ-angers.fr

How to use CUDA?

GPU cards (Tesla K20m) are present on compute nodes n-2-54 and n-2-53 (2*2 K20m).

You can load the CUDA environment via:

<del>module load nvidia/6.5</del>

or

module load nvidia/7.5

CUDA 7.5 allows the use of multi-stream.
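A minimal sketch of a GPU job script (my_cuda_prog is a placeholder for your compiled CUDA binary; the initialisation line assumes the standard /etc/profile.d/modules.sh installed with Environment Modules):

  #!/bin/bash
  #$ -cwd
  # make the "module" command available in the batch shell, then load the CUDA environment
  source /etc/profile.d/modules.sh
  module load nvidia/7.5
  ./my_cuda_prog

Submit it to a GPU node with: qsub -q "*@@cuda" test_cuda.sh (the @@cuda host group is described in section 7).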

How to use CPLEX?

LERIA has an academic license for the CPLEX software.

The CPLEX library is not installed in the default path (/opt/ibm/……) but in /share/apps/cplex/12.6.1. If you want, you can load this environment with the following command:

$ module load cplex/12.6.1

If you need an earlier version of the CPLEX library, contact technique [at] info.univ-angers.fr

Also, the following approach can give better performance for your executable on the cluster if you want to run it on an Intel node (resp. an AMD node):

$ ssh n-2-54 #(resp. n-1-90 for AMD):
$ cd path/to/src/of/your/app
$ module load gcc/4.8.2
# get an example Makefile for compiling a CPLEX program
$ cp /share/apps/cplex/12.6.1/CPLEX_Studio/cplex/examples/x86-64_linux/static_pic/Makefile .  
# adapt the Makefile to your needs; in particular, change these variables:
# CPLEXDIR      = /share/apps/cplex/12.6.1/CPLEX_Studio/cplex
# CONCERTDIR    = /share/apps/cplex/12.6.1/CPLEX_Studio/concert
# CCC = g++ -m64 -Ofast -flto -march=native -funroll-loops        
# CC  = gcc -m64 -Ofast -flto -march=native -funroll-loops

You can also add the linker option -static. Compilation will take longer and the executable will be bigger, but you may gain a little execution performance.

After that, you can launch your computation via SGE from the master node.

7. How to choose the type of processor (Intel Xeon 2.83 GHz / 2.5 GHz, AMD Opteron 2.3 GHz / 2.8 GHz)?

New

By default, the work is executed as soon as sufficient resources are available. However, it is possible to request a specific type of processor using the host groups defined among the compute nodes:

qsub -q "*@@amd-6134" test.sh

qsub -q "*@@amd-4184" test.sh

qsub -q "*@@intel-E5-2670" test.sh

qsub -q "*@@cuda" test.sh

8. BENCHMARKS / TESTS: How to reserve a full node

  screen -d -m qlogin -q "*@@intel-E5440" -pe threaded 8 -m bea -M myloginmail@info.univ-angers.fr -now no

⇨ Wait for the mail and connect to the node with: screen -r

  screen -d -m qlogin -q "*@@amd-6134" -pe threaded 16 -m bea -M myloginmail@info.univ-angers.fr -now no

⇨ Wait for the mail and connect to the node with: screen -r

  screen -d -m qlogin -q "*@@amd-4184" -pe threaded 12 -m bea -M myloginmail@info.univ-angers.fr -now no

⇨ Wait for the mail and connect to the node with: screen -r

For benchmarks, consider copying your data and binaries to /tmp before running your code, to isolate yourself from the network file system, whose performance depends heavily on the activity of the other nodes.
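A minimal sketch of this /tmp workflow, to run inside the interactive session on the reserved node (mybench and the data paths are placeholders):

  # create a private scratch directory on the node's local disk
  WORKDIR=$(mktemp -d /tmp/bench_${USER}_XXXXXX)
  cp -r ~/bench/mybench ~/bench/data "$WORKDIR"
  cd "$WORKDIR"
  ./mybench data/ > result.log
  # copy the results back to your home directory and clean up the local disk
  cp result.log ~/bench/
  cd && rm -rf "$WORKDIR"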

9. How to delete one or more jobs?

qdel -f <num_job1> <num_job2> ... <num_jobn>
qdel -f -u $USER

10. Array Jobs

An array job is a job executed multiple times, for instance when running a parametric test. SGE launches the same script multiple times; the only difference between the runs is the environment variable $SGE_TASK_ID. This variable can be used as the seed of a pseudorandom number generator, as the number of an instance, or as an index into a file of pre-generated parameter combinations that the script reads.

An array job is submitted using the '-t' flag:

ex.: qsub -t 1-100 test.sh

This will run the test.sh script 100 times, each time with a different value of $SGE_TASK_ID {1, …, 100}.

We may also wish to run the script only for every Nth value:

ex.: qsub -t 1000-1400:100 test.sh

This will run the test.sh script 5 times, each time with a different value of $SGE_TASK_ID {1000, 1100, 1200, 1300, 1400}.
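A minimal sketch of a script using $SGE_TASK_ID to pick one line of parameters from a pre-generated file (params.txt and my_program are placeholders):

  #!/bin/bash
  #$ -cwd
  # read line number $SGE_TASK_ID from the parameter file
  PARAMS=$(sed -n "${SGE_TASK_ID}p" params.txt)
  # run one instance with these parameters; the output file is named after the task id
  ./my_program $PARAMS > result_${SGE_TASK_ID}.txt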

Advantages over submitting hundreds of individual jobs: a single qsub command, a single job ID to monitor or delete, and a much lighter load on the scheduler.

11. How to define dependencies between jobs?

You may need the results of one job to start another. You can of course run the first job, wait until it finishes and then submit the second one depending on the result. Or you can use dependencies to submit both jobs at the same time; the second job runs only once the first has completed.

qsub -N Step1 test1.sh
qsub -hold_jid Step1 -N Step2 test2.sh

In the example above, the Step1 and Step2 jobs are both submitted to the queue. Step1 will be executed as soon as possible, but the argument -hold_jid Step1 forces Step2 to wait for Step1 to complete before starting.

This also works with array jobs. This can be useful, for example, to aggregate into a single file all the small files produced by an array job.

qsub -pe param 1 -N monjob -t 1-50 test.sh
qsub -hold_jid monjob -N analyse analyse.sh

Here the analyse job will only run once the 50 parametric tasks that make up monjob have all finished.

Example: Compress files produced by jobs

If you run a lot of jobs that produce a lot of results, it may be wise to compress the output files so as not to use too much disk space.

So we need a script that submits the jobs producing the results, plus a final job that waits for those jobs to complete and then compresses and deletes the generated files.

This gives a main script submit.sh:

#!/bin/bash                                                   
                                                             
# Create a string representing a group of jobs.
# The string is prepended to the name of each job in the group.
group="g1"
                                                              
# Job submission
qsub -N ${group}job1 job1.sh                   
qsub -N ${group}job2 job2.sh       
# etc  
                                                       
# submit the job that post-processes the produced files
qsub -N postprocessing -hold_jid "${group}*" postproc.sh "$group"

and a post-processing script postproc.sh:

#!/bin/bash 
#$ -cwd                                                       
                                                              
# The only parameter is the name of the group
group=$1                                                      
                                                              
tar -czf archive_${group}.tar.gz ${group}*                    
rm ${group}*  

12. Scripts

Instead of passing parameters on the qsub command line, it is possible, and often more convenient, to embed them in the script executed by qsub. Lines of the script carrying parameters, or directives, are prefixed with #$.

In the archive qsub_template.zip you will find a template script, with one version commented in French and another in English. This script contains a number of useful directives, including those presented on this page and others. You are encouraged to use this script and modify it to suit your needs.
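A minimal sketch of such a script, showing only a few common directives (see the template archive for a complete, commented version; my_program and the script name myjob.sh are placeholders):

  #!/bin/bash
  #$ -N myjob          # job name
  #$ -cwd              # run from the submission directory
  #$ -m n              # do not send mail (see section 2)
  #$ -l h_vmem=2G      # memory limit (see section 4)
  ./my_program

It is then submitted simply with: qsub myjob.sh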

13. How do I know if the cluster is available?

The activity of the cluster is visible in real time from the internal network via Ganglia.

14. What to do in case of problems?

Look in the error file, which is generated with a name of the form <job_name>.exxxxx

Check that the path /opt/gridengine/bin/lx26-amd64 appears near the front of your $PATH environment variable (see the quick check below).

Pay attention to memory management in your programs…

Also think of using the cluster mailing list, which includes all cluster users and the technical team responsible for its administration.
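A quick way to check the $PATH point above (no output means the SGE binaries are not in your PATH):

  echo "$PATH" | tr ':' '\n' | grep -n gridengine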

15. Where can I find documentation?

 man {sge_intro,qsub,qhost,qstat}

http://www.rocksclusters.org/roll-documentation/sge/5.3/using.html

http://wikis.sun.com/display/gridengine62u2/Home

C - Cluster History (Changelog)

Year | Version | Distribution (OS) | Number and type of CPUs | Estimated peak performance
2003 | 1.0 | Alinka Raisin | 30 x Intel Pentium-4 2.4 GHz | 144 GFlops
2006 | 1.1 | Rocks cluster 4.3 | 30 x Intel Pentium-4 2.4 GHz | 144 GFlops
2008 | 2.0 | Rocks cluster 5.0 | 50 x Intel Xeon quad-core 2.8 GHz | 1680 GFlops
2010 | 2.1 | Rocks cluster 5.3 | 50 x Intel Xeon quad-core 2.8 GHz + 10 x AMD Opteron octo-core 2.3 GHz | 2200 GFlops
2012 | 2.2 | Rocks cluster 5.4 | 50 x Intel Xeon quad-core 2.8 GHz + 10 x AMD Opteron octo-core 2.3 GHz + 10 x AMD Opteron hexa-core 2.8 GHz | 2200 GFlops
2015 | 3.0 | Rocks cluster 6.1.1 | 40 x Intel Xeon quad-core 2.8 GHz + 10 x AMD Opteron octo-core 2.3 GHz + 10 x AMD Opteron hexa-core 2.8 GHz + 18 x Intel Xeon 2*10-core 2.5 GHz + 4 x Nvidia Tesla K20 GPU | ??? GFlops

D - Known problem

For example:

# in CMakeLists.txt
set (CMAKE_CXX_COMPILER /share/apps/gcc/4.8.2/bin/g++)

But you have to use:

module load gcc/4.8.2