FAQ on the cluster

Subscribers to the cluster mailing list are invited to share their tips and tricks for using the cluster on this page.

An older French version of this page is available at cluster_fr_old.

A - Presentation of the cluster

1. Technical overview

Number of CPU cores         340
Estimated peak performance  1.8 TFlops
CPU types                   50 x Intel Xeon E5440 - 2.83 GHz quad-core (4)
                            10 x AMD Opteron 6134 - 2.3 GHz octo-core (8)
                            10 x AMD Opteron 4184 - 2.8 GHz hexa-core (6)
Master node                 Bull Novascale R460
Compute nodes               12 x Bull Novascale R422
                            5 x Transtec Calleo 351
                            5 x Dell PowerEdge R415
Memory configuration        2 GB of RAM per core
Total memory                700 GB
Disk capacity               5 TB
Interconnect                2 x Gigabit Ethernet
Operating system            GNU/Linux
Distribution                Rocks 5.x - CentOS 5.x
Main software               GNU compilers and libraries, OpenMPI, MPICH, MPICH2

2. Who can access the cluster?

All LERIA members can access the cluster.

To gain access, request validation of your cluster account by sending an email to technique (at)

Guest accounts can be granted with the agreement of the LERIA laboratory.

B - Using the cluster

1. How to connect to the cluster?

From inside our network

Users can connect to the cluster only by SSH, on the standard port:

ex.: ssh -Y

and all file transfers should use sftp:

ex.: sftp

From the Internet

Users can connect to the cluster either by SSH on port 2222 or through the VPN:

ex.: ssh -p 2222 -Y

and all file transfers should use sftp:

ex.: sftp -o port=2222

2. How to properly use the cluster?

SGE (Sun Grid Engine) is a resource manager (job scheduler) that lets multiple users reserve resources; each job starts as soon as the resources it requested become available.

You must use SGE to submit computations to the cluster.

The front-end node must never be used as a compute node.

The basic commands are:

  • qsub: submits a job to the cluster via a shell script
ex.: qsub -m bea -M $
If your jobs are likely to send a lot of mail (or if you are not sure), do not use the “-M …” option; use “-m n” (no mail is sent) instead, to avoid flooding our SMTP servers.
  • qlogin: requests an interactive shell on a machine in the cluster (use it with 'screen' so that you can retrieve the shell later)
ex.: screen -d -m qlogin -m bea -M $ -now no 

⇨ Wait for the mail and connect to the node with: screen -r

  • qrsh: this command does not seem to work on the cluster. Use qlogin instead.
  • qstat: displays the jobs belonging to a user (all jobs by default)
ex.: qstat -u my_user
  • qstat -u "*": displays the jobs of all users
ex.: qstat -u "*" | less
  • qhost: displays information about the nodes of the cluster
ex.: qhost -j

3. How to submit jobs?

There is a single queue, with 3 specific parallel environments.

Default queue: all.q

This queue contains all the compute nodes in the cluster and can run both interactive and batch jobs, with no resource or time limits.

This is the default destination for jobs submitted by qsub. This is a FIFO queue.

Use: qsub

The qstat command displays all jobs (running or pending).

4. How to limit the use of resources?

Limit the resources used by each job (memory, disk space, …):

  • as a courtesy to other users,
  • to shield their jobs in case your job misbehaves,
  • to avoid starting a job on a node where other jobs already monopolize a lot of resources.
Resources are limited with the '-l' argument:
ex1.: qsub -l h_vmem=2G
ex2.: qsub -l h_vmem=1G,mem_free=800M
ex3.: qsub -l h_fsize=10M

h_vmem is the maximum memory that can be used

mem_free is the minimum available memory needed to run the job

h_fsize is the maximum size of a file produced by the job

Hard and soft requirements: '-hard' and '-soft'

From the qsub manual:

  • the hard option signifies that all -q and -l resource requirements following in the command line will be hard requirements and must be satisfied in full before a job can be scheduled;
  • the soft option signifies that the resource requirements following it are “nice-to-have, but not essential”.

By default, requirements are hard: a request becomes soft only when it follows a -soft option on the command line.

ex1.: qsub -hard -l h_vmem=2G
For specific help:
man 5 complex

To see the list of available resources:

qconf -sc

5. How not to monopolize all resources: parallel environments

There are 3 parallel environments, designed to share cluster resources as well as possible between different types of work:

  • param : for parametric tests

This environment asks SGE to try to fill each node before assigning jobs to another node. The number of slots (= cores) is limited to 3/4 of the total (250 as of 9/11/2012). This leaves room in the FIFO queue for jobs submitted outside this environment, even if they were submitted after a parametric run started.

:!: The parameter giving the number of slots required per job should, barring exceptions, always be 1 (one core per job) :!:

Use: qsub -pe param 1
  • threaded : for jobs requiring the reservation of a full node (benches, multi-threading, …)

This environment keeps the job on a single node. Requesting as many slots as the node has cores reserves the whole node for the job (8 cores = 1 node on the Intel nodes).

:!: The parameter giving the number of slots per job cannot be greater than the number of cores of the node: 8 on the Intel nodes (section 7 uses 16 and 12 on the AMD nodes) :!:

Use: qsub -pe threaded 8
     qlogin -pe threaded 8
  • mpi : for parallel jobs

This environment is used for parallel jobs using multiple nodes at once (i.e. distributed memory).

Use: qsub -pe mpi 16

6. How to choose the type of processor (Intel Xeon 2.83GHz, AMD Opteron 2.3GHz or AMD Opteron 2.8GHz)?

By default, a job runs as soon as sufficient resources are available. However, you can request a specific type of processor by using the host groups defined among the compute nodes:

  • For Intel Xeon™ E5440 @ 2.83GHz (192 cores | compute-0-0 to compute-0-23):
 qsub -q "*@@intel-E5440"
  • For AMD Opteron™ 6134 @ 2.3GHz (80 cores | compute-0-24 to compute-0-28):
 qsub -q "*@@amd-6134"
  • For AMD Opteron™ 4184 @ 2.8GHz (60 cores | compute-0-29 to compute-0-33) :
 qsub -q "*@@amd-4184"

7. BENCHMARKS / TESTS: How to reserve a full node

  • For Intel E5440 - 8 cores @ 2.83GHz:
  screen -d -m qlogin -q "*@@intel-E5440" -pe threaded 8 -m bea -M $ -now no

⇨ Wait for the mail and connect to the node with: screen -r

  • For AMD 6134 - 16 cores @ 2.3GHz:
  screen -d -m qlogin -q "*@@amd-6134" -pe threaded 16 -m bea -M $ -now no

⇨ Wait for the mail and connect to the node with: screen -r

  • For AMD 4184 - 12 cores @ 2.8GHz:
  screen -d -m qlogin -q "*@@amd-4184" -pe threaded 12 -m bea -M $ -now no

⇨ Wait for the mail and connect to the node with: screen -r

When benchmarking, consider copying your data and binaries to /tmp before running your code. This isolates you from the network, whose performance depends heavily on the activity of the other nodes.
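
The staging step described above can be sketched as follows. This is a minimal sketch, not a script provided with the cluster: the helper name stage_and_run and the paths in the usage line are hypothetical, and it assumes mktemp is available on the compute node.

```shell
#!/bin/bash
# Sketch: copy inputs to node-local /tmp, run there, then clean up.
# stage_and_run is a hypothetical helper, not part of SGE.
# $1 = directory holding data and binaries, remaining args = command to run in the copy
stage_and_run() {
    local src="$1"
    shift
    local work rc
    work=$(mktemp -d /tmp/bench.XXXXXX) || return 1
    cp -r "$src"/. "$work"/ || { rm -rf "$work"; return 1; }
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$work" && "$@" )
    rc=$?
    rm -rf "$work"
    return "$rc"
}

# Usage (hypothetical paths): stage_and_run "$HOME/bench" ./my_binary input.dat
```

Output files written in the temporary directory are deleted with it, so a real benchmark would copy its results back before the function returns.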

8. How to delete one or more jobs?

qdel -f <num_job1> <num_job2> ... <num_jobn>
  • To delete all your jobs:
qdel -f -u $USER

9. Array Jobs

An array job is a job that is executed multiple times, for instance when a parametric test is launched. SGE launches the same script multiple times; the only difference between the runs is the environment variable $SGE_TASK_ID. This variable can be used as the seed of a pseudorandom number generator, as the number of an instance, or as the index of a pre-generated parameter combination that the script looks up in a file.

An array job is submitted using the '-t' flag:

ex.: qsub -t 1-100

This will run the script 100 times, each time with a different value of $SGE_TASK_ID {1, …, 100}.

You can also step through the range, running the script only for every Nth value:

ex.: qsub -t 1000-1400:100

This will run the script 5 times, each time with a different value of $SGE_TASK_ID {1000, 1100, 1200, 1300, 1400}.
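
To check which task IDs a given range and step produce before submitting, the enumeration can be reproduced locally. The helper task_ids below is a hypothetical illustration, not an SGE command.

```shell
#!/bin/bash
# Sketch: list the task IDs generated by "-t first-last:step".
# task_ids is a hypothetical helper, not an SGE command.
task_ids() {
    local id="$1" last="$2" step="${3:-1}"
    while [ "$id" -le "$last" ]; do
        echo "$id"
        id=$((id + step))
    done
}

task_ids 1000 1400 100   # one ID per line: 1000 1100 1200 1300 1400
```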

Submitting an array job can consume a lot of resources. It can be useful to limit the number of concurrent tasks with '-tc'.

ex.: qsub -t 1-1000 -tc 50

This will run the script 1000 times with at most 50 tasks at the same time.
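
As a sketch of the parameter-file use of $SGE_TASK_ID, the task script below picks line number $SGE_TASK_ID from a file. The file name params.txt, its one-combination-per-line layout and the helper param_line are assumptions made for illustration. Outside an array job the task ID falls back to 1, so the script can be tried locally.

```shell
#!/bin/bash
#$ -cwd
# Sketch of an array-job task script (submitted with, for example, qsub -t 1-100).
# params.txt (one parameter combination per line) is a hypothetical file name.

# Print line number $1 of file $2.
param_line() {
    sed -n "${1}p" "$2"
}

task_id=${SGE_TASK_ID:-1}
# Fall back to 1 when the value is not a number (e.g. outside an array job).
case "$task_id" in
    ''|*[!0-9]*) task_id=1 ;;
esac

if [ -r params.txt ]; then
    params=$(param_line "$task_id" params.txt)
    echo "task $task_id: parameters = $params"
    # Replace the echo with the real program call, passing $params to it.
fi
```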

Advantages over sending hundreds of individual jobs:

  • An array job gives a more concise view of the queue, since all of its tasks that are not yet running occupy a single line in the qstat display.
  • All the tasks of an array job are deleted using the name / ID of the job (qdel JOB), and a single specific task is deleted using its number (qdel JOB.TASKID). This makes selective deletion easier if the first results show that the job does not work as expected.

10. How to define dependencies between jobs?

Sometimes one job needs the results of another before it can start. You can of course run the first job, wait until it finishes and then start the second depending on the result. Alternatively, you can use dependencies to submit both jobs at the same time: the second job starts only when the first has completed.

qsub -N Step1
qsub -hold_jid Step1 -N Step2

In the example above, the Step1 and Step2 jobs are both submitted to the queue. Step1 is executed as soon as possible, while the argument -hold_jid Step1 forces Step2 to wait until Step1 has completed.

This also works with array jobs. This can be useful for example to aggregate into a single file all the small files produced by a job array.

qsub -pe param 1 -N monjob -t 1-50
qsub -hold_jid monjob -N analyse

Here the job analyse will run only after the 50 parametric tests that make up monjob have finished.
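
A sketch of what such an aggregation job could do is given below. It assumes, purely for illustration, that each task wrote a file named result.<task id>.txt in the working directory; that naming scheme and the helper merge_results are hypothetical.

```shell
#!/bin/bash
#$ -cwd
# Sketch of an aggregation job, held with -hold_jid until the array job ends.
# The result.<id>.txt naming and the merge_results helper are hypothetical.

# Concatenate result.1.txt ... result.N.txt found in directory $1 into file $2,
# in task order, where N is $3. Missing files are reported on stderr and make
# the function return a non-zero status.
merge_results() {
    local dir="$1" out="$2" last="$3" i=1 missing=0
    : > "$out"
    while [ "$i" -le "$last" ]; do
        if [ -f "$dir/result.$i.txt" ]; then
            cat "$dir/result.$i.txt" >> "$out"
        else
            echo "missing result for task $i" >&2
            missing=$((missing + 1))
        fi
        i=$((i + 1))
    done
    [ "$missing" -eq 0 ]
}

# Usage inside the held job, for 50 tasks:
# merge_results . all_results.txt 50
```

Looping over the task numbers rather than using a shell wildcard keeps the output in task order and makes a failed task visible.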

Example: Compress files produced by jobs

If you run many jobs that produce a lot of results, it may be wise to compress the output files to save disk space.

So we need a script that submits the jobs producing results, plus a final job that waits for them to complete and then compresses and deletes the generated files.

First, a main script:

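# Create a string representing a group of jobs.
# Assumed value: the assignment is missing here; any unique prefix
# starting with a letter would do.
group="grp$(date +%Y%m%d%H%M%S)_"

# The string is prepended to the name of each job in the group.

# Job submission
qsub -N "${group}job1"
qsub -N "${group}job2"
# etc
# Submit the job that handles the files produced
qsub -N postprocessing -hold_jid "${group}*" "$group"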

and a post-processing script:

#$ -cwd
# The only parameter is the name of the group
group=$1   # assumed: the group name is read from the first argument
tar -czf "archive_${group}.tar.gz" "${group}"*
rm "${group}"*

11. Scripts

Instead of passing parameters on the qsub command line, it is possible, and often more convenient, to put them in the script executed by qsub. Script lines carrying parameters (directives) are prefixed with #$.

The archive "A model of script" contains a template script, in one version commented in French and another in English. This script contains a number of useful directives, including those on this page and others. You are encouraged to use this script and modify it to suit your needs.
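
For illustration only, a job script with embedded directives might look like the sketch below. It is not the template from the archive: the job name is made up and the options are example values taken from commands shown elsewhere on this page.

```shell
#!/bin/bash
# Each "#$" line is read by qsub as if it had been given on the command line.
#$ -N example_job
#$ -cwd
#$ -m n
#$ -l h_vmem=2G
#$ -pe param 1

# To the shell the directives are plain comments, so the script also runs locally.
# JOB_NAME and JOB_ID are set by SGE; the defaults only apply outside the cluster.
msg="job ${JOB_NAME:-example_job} (id ${JOB_ID:-local}) running on $(uname -n)"
echo "$msg"
```

Options given on the qsub command line are combined with the directives, so the script can hold the settings that rarely change.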

12. How do I know if the cluster is available?

Cluster activity can be monitored in real time from the internal network with Ganglia.

13. What to do in case of problems?

Look in the error file, which is generated with a name of the form <job_name>.exxxxx

Check that the path /opt/gridengine/bin/lx26-amd64 appears at the front of your $PATH environment variable

Pay attention to memory management in your programs…

Also consider using the cluster mailing list, which includes all cluster users and the technical team responsible for its administration.
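
The $PATH check can be done with a few lines of shell. The helper path_position below is a hypothetical name introduced for this sketch; it only inspects the variable and changes nothing.

```shell
#!/bin/bash
# Sketch: report where a directory sits in a colon-separated, PATH-style list.
# path_position is a hypothetical helper name.
# Prints the 1-based position of $1 in $2 (default: $PATH); returns 1 if absent.
path_position() {
    local wanted="$1" list="${2:-$PATH}" pos=1 entry rest
    rest="$list:"
    while [ -n "$rest" ]; do
        entry=${rest%%:*}
        rest=${rest#*:}
        if [ "$entry" = "$wanted" ]; then
            echo "$pos"
            return 0
        fi
        pos=$((pos + 1))
    done
    return 1
}

sge_bin=/opt/gridengine/bin/lx26-amd64
if pos=$(path_position "$sge_bin"); then
    echo "$sge_bin is entry number $pos in PATH"
else
    echo "$sge_bin is missing from PATH" >&2
fi
```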

14. Where can I find documentation?

C - Cluster History (Changelog)

Year  Version  Distribution (OS)  Number and type of CPUs                                                 Estimated computing power
2003  1.0      Alinka Raisin      30 x Intel Pentium-4 2.4 GHz                                            144 GFlops
2006  1.1      Rocks cluster 4.3  30 x Intel Pentium-4 2.4 GHz                                            144 GFlops
2008  2.0      Rocks cluster 5.0  50 x Intel Xeon quad-core 2.8 GHz                                       1680 GFlops
2010  2.1      Rocks cluster 5.3  50 x Intel Xeon quad-core 2.8 GHz + 10 x AMD Opteron octo-core 2.3 GHz  2200 GFlops
2012  2.2      Rocks cluster 5.4  50 x Intel Xeon quad-core 2.8 GHz + 10 x AMD Opteron hexa-core 2.8 GHz  2200 GFlops
faq/cluster.txt · Last modified: 02/08/2017 14:08 by Vigneron Vincent
CC Attribution-Noncommercial-Share Alike 4.0 International