FAQ on the new cluster Taurus2

The subscribers to the cluster's mailing list are invited to share their tips and tricks on using the cluster on this page.

A very old French version is available at cluster_fr_old.

New high-performance storage policy on the cluster:

A computing cluster should let users use a large storage space during computation; consequently, storage use should be temporary. Once the calculations have been made, it is the responsibility of the user to:

  • compress your important data
  • move the stored data to another storage space (local, or janus for instance)
  • delete the unused data
  • avoid spaces and unusual characters in file and directory names
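As a sketch of that cleanup (the directory name and the janus destination are placeholders, and the sample data is created here only so the sketch runs end-to-end):

```shell
#!/bin/sh
# Hypothetical cleanup after a finished run; adjust names and paths to your data.
RESULTS=results_2017
mkdir -p "$RESULTS" && touch "$RESULTS/data.out"   # stand-in for real results

# 1. compress the important data
tar -czf "${RESULTS}.tar.gz" "$RESULTS"

# 2. move the archive to another storage space (janus, for instance)
# scp "${RESULTS}.tar.gz" mylogin@janus.info.univ-angers.fr:

# 3. delete the unused data from the cluster
rm -rf "$RESULTS"
```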

The system administrator reserves the right to compress or delete your files at any time.

There is no backup on the cluster, you may lose all your data at any time!

In addition, to avoid usage that could affect other users, there is a quota system on your home (50 GB). Users requiring more space have to request a special account. In a special account, any data older than 40 days is automatically deleted, without any possibility of recovery.

A - Presentation of the cluster

1. Technical overview

Number of CPU cores: 340
Estimated peak power: 1.8 ??? TFlops
CPU types: 50 x Intel® Xeon® E5440 - 2.83 GHz quad-core (4)
           10 x AMD® Opteron® 6134 - 2.3 GHz octo-core (8)
           10 x AMD Opteron™ 4184 - 2.8 GHz hexa-core (6)
           18 x Intel® Xeon® E5-2670 - 2.5 GHz 2*10-core (20)
GPUs: 4 x Nvidia Tesla K20m
Master node: Bull Novascale R460
Compute nodes: 12 x Bull Novascale R422
               5 x Transtec Calleo 351
               5 x Dell PowerEdge R415
               10 x TODO
Memory configuration: 32 GB of RAM per node; the special RAM node has 220 GB
Total memory: 20*32 + 220 = 860 GB
Disk capacity: 6 TB
Interconnect: 2 x Gigabit Ethernet
Operating system: GNU/Linux
Distribution: Rocks 6.1.1 - CentOS 6.5
Main software: GNU compilers and libraries, OpenMPI, Nvidia SDK toolkit

2. Who can access the cluster?

All LERIA members can access the cluster.

To gain access, just request validation of your account on the cluster by sending an email to technique (at) info.univ-angers.fr with your LDAP login (the same as for CAS/ENT).

Guest accounts are possible after agreement with the LERIA laboratory.

B - Using the cluster

1. How to connect to the cluster?

From inside our network

Users can only connect to the cluster by SSH on the standard port:

ex.: ssh -Y mylogin@taurus2.info-ua

and all file transfers should use sftp:

ex.: sftp mylogin@taurus2.info-ua

From internet

Temporarily unavailable; you first have to connect to janus: ssh -Y mylogin@janus.info.univ-angers.fr

Users can connect to the cluster either by SSH on port 2222 or by the VPN:

ex.: ssh -p 2222 -Y mylogin@cluster.info.univ-angers.fr

and all file transfers should use sftp:

ex.: sftp -o port=2222 mylogin@cluster.info.univ-angers.fr

2. How to properly use the cluster?

It is better to compile your source code on the cluster; you can use the n-2-54 node for that. See this page to learn how to improve your program's performance with gcc compilation options. You can also take a look at the Intel C++ compiler: it can often vectorize code more aggressively than gcc.

SGE is a resource manager (job scheduler) that lets multiple users reserve resources for their work, which runs as soon as those resources become available.

It is imperative to use SGE (Sun Grid Engine) to submit calculations to the cluster.

The front-end node should never be used as a compute node.

The basic commands are:

  • qsub : submits a job to the cluster via a shell script
ex.: qsub -m bea -M $USER@univ-angers.fr test.sh
If your jobs are likely to send a lot of mails (or if you are not sure), then to avoid flooding our SMTP servers, do not use the “-M …” option; use “-m n” instead (no mail is sent).
  • qlogin : requests an interactive shell on a machine in the cluster (use it with 'screen' so you can retrieve the shell later)
ex.: screen -d -m qlogin -m bea -M myloginmail@info.univ-angers.fr -now no

⇨ Wait for the mail and connect to the node with: screen -r

  • qstat : displays the running jobs belonging to the user.
ex.: qstat -u my_user
  • qstat -u “*” : displays the jobs of every user (running or pending)
ex.: qstat -u "*" | less
  • qhost : displays information about the nodes of the cluster.
ex.: qhost -j
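All the qsub examples on this page submit a shell script such as test.sh. As a minimal sketch (the echo lines are placeholders for a real computation), such a script could be:

```shell
#!/bin/sh
# Write a minimal job script, then run it locally as a quick check;
# on the cluster you would submit it with: qsub test.sh
cat > test.sh <<'EOF'
#!/bin/sh
echo "Job running on host: $(hostname)"
echo "Started at: $(date)"
# ... your real computation goes here ...
echo "Done."
EOF
sh test.sh
```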

3. How to submit jobs?

There is a single queue with 3 specific parallel environments: param, mpi, threaded.

By default : all.q

This queue contains all the compute nodes in the cluster and can run both interactive and batch jobs without limitation of resources or time. However, you can have at most 20 jobs running at the same time. If you want more, you should submit to the param parallel environment.

This is the default destination for jobs submitted by qsub. This is a FIFO queue.

Use: qsub test_sequentiel.sh

4. How to limit the use of resources?

You should limit the resources used by a job (memory, disk space, …):

  • as a courtesy to other users,
  • to protect your jobs in case of abnormal behavior,
  • to avoid starting a job on a node where other jobs already monopolize a lot of resources.
Resources are limited with the '-l' argument:
ex1.: qsub -l h_vmem=2G test.sh
ex2.: qsub -l h_vmem=1G,mem_free=800M test.sh
ex3.: qsub -l h_fsize=10M test.sh

h_vmem is the maximum memory that can be used

mem_free is the minimum available memory needed to run the job

h_fsize is the maximum size of a file produced by the job

h_vmem does not guarantee that the job's execution environment will have enough memory. For instance, “qsub -l h_vmem=48G test.sh” can be run on a node with only 32G of RAM. If you have specific memory requirements, you should use a dedicated processor or the mem_free option.
For specific help:
man 5 complex

To see the list of available resources:

qconf -sc

5. How not to monopolize all resources: parallel environments

There are 3 parallel environments to best share the cluster resources between different types of work:

  • param : for parametric tests

This specific environment asks SGE to fill each node before assigning jobs to another node. The number of slots (= cores) is limited to 400, and each user can run 50 jobs at the same time (i.e. 8 users can use the param environment simultaneously).

This lets the FIFO queue accept jobs submitted outside this environment even if they arrived later.

:!: The parameter indicating the number of slots required per job should, barring exceptions, always be equal to 1 (one core per job) :!:

Use: qsub -pe param 1 test_param.sh
  • threaded : for jobs requiring the reservation of a full node (benches, multi-threading, …)

This environment constrains a job to a single node, which allows reserving a full node for the job when the parameter indicating the number of slots per job equals the node's core count: eight (8 cores = 1 node) for Intel E5440 (12 for AMD Opteron 4184, 16 for AMD Opteron 6134, 20 for Intel E5-2670).

:!: The parameter indicating the number of slots per job cannot be greater than 8/12/16/20 (depending on the CPU) :!:

Use: qsub -pe threaded 8 test_bench.sh
               qrsh -pe threaded 8

If you want to use OpenMP, you have to submit like this:

qsub -R y -pe threaded 4 test_bench.sh
  • mpi : for parallel jobs

This environment is used for parallel jobs using multiple nodes at once (i.e. distributed memory).

Use: qsub -pe mpi 16 test_mpi.sh

6. How to choose a specific environment?


You can choose a specific environment:

Example: you need gcc version 4.8.2, but by default you have version 4.4.7. You can use environment modules to fix the problem:

 $[login@taurus2]$ gcc --version
 gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)
 $[login@taurus2]$ module load gcc/4.8.2
 $[login@taurus2]$ gcc --version
 gcc (GCC) 4.8.2
 $[login@taurus2]$ module unload gcc/4.8.2
 $[login@taurus2]$ gcc --version
 gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-11)

To see all available modules:

 $[login@taurus2]$ module avail
 ------------------------------------- /usr/share/Modules/modulefiles -------------------------------------
 dot              module-info      null             rocks-openmpi    use.own
 module-git       modules          opt-python       rocks-openmpi_ib
 -------------------------------------------- /etc/modulefiles --------------------------------------------
 ------------------------------------ /share/apps/modules/modulefiles -------------------------------------
 gcc/4.8.2  null       nvidia/6.5

To see all loaded modules:

  $[login@taurus2]$ module list
  Currently Loaded Modulefiles:
  1) rocks-openmpi   2) gcc/4.8.2

To make your own module:

  module load use.own

⇨ In your home directory, create a new directory “privatemodules”, starting from the “null” skeleton, to write a new module (http://en.wikipedia.org/wiki/Environment_Modules_%28software%29) in Tcl (http://en.wikipedia.org/wiki/Tcl).

You can now compile your own software and manage them with your own module.

If you have trouble compiling your software, or if you think a piece of software would be helpful to all users, please send an email to technique [at] info.univ-angers.fr

How to use cuda?

GPU cards (Tesla K20m) are present on compute nodes n-2-54 and n-2-53 (2*2 K20m).

You can use:

  • cuda 6.5 on the compute node 0-0.
  • cuda 7.5 on the compute node n-2-54.

You can load the environment via:

<del>module load nvidia/6.5</del>


module load nvidia/7.5

Cuda 7.5 allows the use of multi-stream.

How to use cplex?

Leria has an academic license for the CPLEX software.

The path to the cplex library is not the default path (/opt/ibm/……) but /share/apps/cplex/12.6.1. If you want, you can load this environment with the following command:

$ module load cplex/12.6.1

If you need an earlier version of the cplex library, contact technique [at] info.

Also, the following approach can give your executable better performance on the cluster, if you want to launch it on an Intel node (resp. an AMD node):

$ ssh n-2-54 # (resp. n-1-90 for AMD)
$ cd path/to/src/of/your/app
$ module load gcc/4.8.2
# get an example Makefile for compiling a cplex program
$ cp /share/apps/cplex/12.6.1/CPLEX_Studio/cplex/examples/x86-64_linux/static_pic/Makefile .
# adapt the Makefile to your needs; in particular, change these variables:
# CPLEXDIR      = /share/apps/cplex/12.6.1/CPLEX_Studio/cplex
# CONCERTDIR    = /share/apps/cplex/12.6.1/CPLEX_Studio/concert
# CCC = g++ -m64 -Ofast -flto -march=native -funroll-loops
# CC  = gcc -m64 -Ofast -flto -march=native -funroll-loops

Also, you can add the linker option -static. Compilation will take longer and the executable will be bigger, but execution performance may improve slightly.

After that, you can launch your computation via SGE on the master node.

7. How to choose the type of processor (Intel Xeon 2.83GHz, AMD Opteron 2.3GHz or AMD Opteron 2.8GHz)?


By default, the work is performed as soon as sufficient resources are available. However, it is possible to request a specific type of processor using groups that have been defined among the compute nodes:

  • Not yet available For Intel Xeon™ E5440 @ 2.83GHz (192 cores | compute-2-0 to compute-2-23): qsub -q “*@@intel-E5440” test.sh
  • For AMD Opteron™ 6134 @ 2.3GHz (80 cores | n-1-90 to n-1-94):

qsub -q “*@@amd-6134” test.sh

  • For AMD Opteron™ 4184 @ 2.8GHz (60 cores | n-1-95 to n-1-99) :

qsub -q “*@@amd-4184” test.sh

  • For Intel E5-2670 @ 2.5GHz (140 cores | n-2-45 to n-2-52) :

qsub -q “*@@intel-E5-2670” test.sh

  • For the RAM node @ 2.3GHz (230 GB RAM, 12 cores | compute-0-9): qsub -q “*@@ram” test.sh. Before using the RAM node, contact technique [at] info.univ-angers.fr
  • For Cuda @ 2.8GHz (4 Tesla K20m, 40 cores | n-2-54 to n-2-53) :

qsub -q “*@@cuda” test.sh

8. BENCHMARKS / TESTS: How to reserve a full node

  • Not yet available For Intel E5440 - 8 core @ 2,83Ghz:
  screen -d -m qlogin -q "*@@intel-E5440" -pe threaded 8 -m bea -M myloginmail@info.univ-angers.fr -now no

⇨ Wait for the mail and connect to the node with: screen -r

  • For AMD 6134 - 16 core @ 2,3Ghz:
  screen -d -m qlogin -q "*@@amd-6134" -pe threaded 16 -m bea -M myloginmail@info.univ-angers.fr -now no

⇨ Wait for the mail and connect to the node with: screen -r

  • For AMD 4184 - 12 core @ 2,8Ghz:
  screen -d -m qlogin -q "*@@amd-4184" -pe threaded 12 -m bea -M myloginmail@info.univ-angers.fr -now no

⇨ Wait for the mail and connect to the node with: screen -r

For benchmarks, consider copying your data and binaries to /tmp before running your code, to isolate yourself from the network, whose performance depends heavily on the activity of the other nodes.

9. How to delete one or more jobs?

qdel -f <num_job1> <num_job2> ... <num_jobn>
  • To delete all my jobs:
qdel -f -u $USER

10. Array Jobs

An array job is a job to be executed multiple times, for instance when a parametric test is launched. SGE launches the same script multiple times; the only difference between the runs is the environment variable $SGE_TASK_ID. This variable can be used as the seed of a pseudorandom number generator, as the number of an instance, or to select a combination of pre-generated parameters that the script looks up in a file.

An array job is submitted using the '-t' flag:

ex.: qsub -t 1-100 test.sh

This will run the test.sh script 100 times, each time with a different value of $SGE_TASK_ID {1, …, 100}.

We may wish to use only every Nth value:

ex.: qsub -t 1000-1400:100 test.sh

This will run the test.sh script 5 times, each time with a different value of $SGE_TASK_ID {1000, 1100, 1200, 1300, 1400}.
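As a sketch of the parameter-file pattern mentioned above (params.txt and its contents are invented for the example; on the cluster, SGE sets $SGE_TASK_ID itself):

```shell
#!/bin/sh
# Pre-generated parameter combinations, one per line (example data).
printf 'seed=1 size=10\nseed=2 size=20\nseed=3 size=30\n' > params.txt

# SGE sets SGE_TASK_ID for each task of the array job;
# we give it a default here only so the sketch runs outside SGE.
SGE_TASK_ID=${SGE_TASK_ID:-2}

# Pick the line matching this task and pass it to the computation.
params=$(sed -n "${SGE_TASK_ID}p" params.txt)
echo "task $SGE_TASK_ID runs with: $params"
```

On the cluster this script would be submitted with, e.g., qsub -t 1-3 so that each task reads its own line.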

Advantages over sending hundreds of individual jobs:

  • An array job allows a more concise display of the queue, since all its tasks occupy a single line in the qstat output, whereas non-array jobs occupy one line each.
  • Deleting all the tasks of an array job is done using the name/ID of the job (qdel JOB), and deleting a single specific task using its number (qdel JOB.TASKID). This makes selective deletion easier if the first results show that the job does not work as expected.

11. How to define dependencies between jobs?

You may need the results of one job in order to start another. You can of course run the first job, wait until it finishes, and then start the second depending on the result. Or you can use dependencies to submit both jobs at the same time; the second job only runs once the first has completed.

qsub -N Step1 test1.sh
qsub -hold_jid Step1 -N Step2 test2.sh

In the example above, the Step1 and Step2 jobs are submitted to the queue. Step1 will be executed as soon as possible, but the argument -hold_jid Step1 forces Step2 to wait for Step1 to complete before starting.

This also works with array jobs. This can be useful for example to aggregate into a single file all the small files produced by a job array.

qsub -pe param 1 -N monjob -t 1-50 test.sh
qsub -hold_jid monjob -N analyse analyse.sh

Here the job analyse will only run after the end of the 50 parametric tests of which monjob consists.

Example: Compress files produced by jobs

If you run a lot of jobs that produce a lot of results, it may be wise to compress the output files to not use too much disk space.

So we need a script that launches the jobs producing results, and finally a job that waits for those jobs to complete before compressing and deleting the generated files.

The main script, submit.sh:

# Create a string identifying this group of jobs; it is prepended
# to the name of each job in the group (example value).
group=mygroup_

# Jobs submission
qsub -N ${group}job1 job1.sh
qsub -N ${group}job2 job2.sh
# etc
# Submit the job which handles the produced files
qsub -N postprocessing -hold_jid "${group}*" postproc.sh $group

and a post-processing script postproc.sh :

#$ -cwd
# The only parameter is the name of the group
group=$1
tar -czf archive_${group}.tar.gz ${group}*
rm -f ${group}*

12. Scripts

Instead of passing parameters on the qsub command line, it is possible, and often more convenient, to put them in the script executed by qsub. Lines in the script carrying parameters, or directives, are prefixed with #$.

In the archive qsub_template.zip you will find a model script, with one version commented in French and another in English. This script contains a number of useful directives, including those discussed on this page, and others. You are encouraged to use this script and modify it to suit your needs.
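As a sketch of the idea (the directive values are illustrative, taken from the examples in earlier sections): to the shell, lines starting with #$ are ordinary comments, so the script still runs as plain sh while qsub reads the directives.

```shell
#!/bin/sh
# Write a job script whose qsub options are embedded as #$ directives
# (example values), then run it locally: the shell ignores the
# directives as comments, while qsub would parse them.
cat > template.sh <<'EOF'
#!/bin/sh
#$ -N mytest
#$ -l h_vmem=2G
#$ -m n
#$ -cwd
echo "payload ran"
EOF
sh template.sh    # on the cluster: qsub template.sh
```

With the directives embedded, submission is simply qsub template.sh; options given on the qsub command line generally take precedence over embedded directives.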

13. How do I know if the cluster is available?

The activity of the cluster is visible in real time from the internal network via Ganglia.

14. What to do in case of problems?

Look at the error file, generated in the form <job_name>.exxxxx

Check that the path /opt/gridengine/bin/lx26-amd64 appears early in the $PATH environment variable

Pay attention to memory management in your programs…

Also think of using the cluster mailing list, which includes all the cluster's users and the technical team responsible for its administration.

15. Where can I find documentation?

C - Cluster History (Changelog)

Year | Version | Distribution (OS) | Number and type of CPUs | Estimated peak power
2003 | 1.0 | Alinka Raisin | 30 x Intel Pentium-4 2.4 | 144 GFlops
2006 | 1.1 | Rocks cluster 4.3 | 30 x Intel Pentium-4 2.4 | 144 GFlops
2008 | 2.0 | Rocks cluster 5.0 | 50 x Intel Xeon quad-core 2.8 | 1680 GFlops
2010 | 2.1 | Rocks cluster 5.3 | 50 x Intel Xeon quad-core 2.8 + 10 x AMD Opteron octo-core 2.3GHz | 2200 GFlops
2012 | 2.2 | Rocks cluster 5.4 | 50 x Intel Xeon quad-core 2.8 + 10 x AMD Opteron octo-core 2.3GHz + 10 x AMD Opteron hexa-core 2.8GHz | 2200 GFlops
2015 | 3.0 | Rocks cluster 6.1.1 | 40 x Intel Xeon quad-core 2.8 + 10 x AMD Opteron octo-core 2.3GHz + 10 x AMD Opteron hexa-core 2.8GHz + 18 x Intel Xeon 2*10-core 2.5GHz + 4 x Nvidia Tesla K20 GPU | ??? GFlops

D - Known problems

  • Problem between CMake and environment modules: you have to specify the environment variables explicitly in CMakeLists.txt.

For example:

# in CMakeLists.txt
set (CMAKE_CXX_COMPILER /share/apps/gcc/4.8.2/bin/g++)

But you still have to use:

module load gcc/4.8.2
faq/cluster2.txt · Last modified: 05/04/2017 10:02 by Chantrein Jean-Mathieu
CC Attribution-Noncommercial-Share Alike 4.0 International