<note>
   * This wiki page is also yours, do not hesitate to modify it directly or to propose modifications to technique [at] info.univ-angers.fr.
   * All cluster users must be on the mailing list [[http://listes.univ-angers.fr/sympa/info/calcul-hpc-leria|calcul-hpc-leria]].
   * To subscribe to this mailing list, simply send an email to sympa@listes.univ-angers.fr with the subject "subscribe calcul-hpc-leria Name Surname" (see the example after this note).
</note>
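For example, assuming a command-line mail client such as mailx is configured on your workstation (an assumption; any mail client will do), the subscription message could be sent like this:

<code bash>
# Hypothetical example: send an empty message whose subject is the subscription command.
# Replace "Name Surname" with your own first and last name.
echo "" | mail -s "subscribe calcul-hpc-leria Name Surname" sympa@listes.univ-angers.fr
</code>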
  
Each of these partitions contains nodes.
  
The compute nodes run a Debian stable operating system. You can find the list of installed software in the [[en:leria:centre_de_calcul:cluster#lists_of_install_software_for_high_performance_calculating|List of installed software for high performance computing]] section.
==== Usage policy ====
  
You can also see the [[en:leria:centre_de_calcul:cluster#global_architecture|global architecture]].
  
  * The compute cluster uses a pool of distributed storage servers running [[https://www.beegfs.io/content/|BeeGFS]]. This BeeGFS storage is independent of the compute servers. The storage area is naturally accessible in the file tree of any compute node under /home/$USER. Since this storage is remote, all reads and writes in your home depend on the network. Our BeeGFS storage and the underlying network are very fast, but for some I/O-heavy processing you may be better off using the local disks of the compute servers. To do this, use the /local_working_directory directory of the compute servers. It works in the same way as /tmp, except that the data persists when the server is restarted (see the sketch after this list).

  * If you want to create groups, please send an email to technique.info [at] listes.univ-angers.fr with the name of the group and the associated users.

  * As a reminder, **by default**, the permissions of your home are 755, so **anyone can read and execute your data**.
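To illustrate the use of /local_working_directory, here is a minimal sketch of a batch script that stages an input file from the BeeGFS home onto the local disk of the allocated node, runs there, and copies the result back. The executable, the file names and the per-user subdirectory layout are placeholders, not an imposed convention:

<code bash>
#!/bin/bash
#SBATCH --job-name=local_io_example
#SBATCH --output=local_io_example_%j.out

# Per-job work directory on the local disk of the compute node (placeholder layout).
WORKDIR=/local_working_directory/${USER}/${SLURM_JOB_ID}
mkdir -p "${WORKDIR}"

# Stage the input from the (network) BeeGFS home to the local disk.
cp "${HOME}/job_name/instances/bench1.txt" "${WORKDIR}/"

# Run the computation against the local copy (placeholder executable).
cd "${WORKDIR}"
srun "${HOME}/job_name/job_name_exec" bench1.txt > result.txt

# Copy the result back to the home directory and clean up the local disk.
cp result.txt "${HOME}/job_name/output/result_${SLURM_JOB_ID}.txt"
rm -rf "${WORKDIR}"
</code>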
===== Advanced use =====
  
==== Array jobs ====
  
You should start by reading the [[https://slurm.schedmd.com/job_array.html|official documentation]]. This [[http://scicomp.aalto.fi/triton/tut/array.html|page]] presents some interesting use cases.
  
If you have a large number of files or parameters to process with a single executable, you must use an [[https://slurm.schedmd.com/job_array.html|array job]].
./job_name_exec ${INSTANCES[$SLURM_ARRAY_TASK_ID]}
</code>

=== Multiple instances job with multiple executions (Seed number) ===

Sometimes it is necessary to run the execution on an instance several times while varying the seed used to generate (and reproduce) random numbers.

Consider the following tree:
<code>
job_name
├── error
├── instances
│   ├── bench1.txt
│   ├── bench2.txt
│   └── bench3.txt
├── job_name_exec
├── output
├── submit_instances_dir_with_seed.slurm
└── submit.sh
</code>

Just run the following command:

  ./submit.sh

with the following submit.sh file (remember to change the NB_SEED variable):

<code bash>
#!/bin/bash

# Number of seeds per instance: array indices 0..NB_SEED are used as seeds.
readonly NB_SEED=50

# Submit one array job per instance file found in the instances/ directory.
for instance in $(ls instances)
do
  sbatch --output output/${instance}_%A-%a --error error/${instance}_%A-%a --array 0-${NB_SEED} submit_instances_dir_with_seed.slurm instances/${instance}
done
exit 0
</code>

and the following submit_instances_dir_with_seed.slurm batch:

<code bash>
#!/bin/bash
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=YOUR-EMAIL

echo "####### INSTANCE: ${1}"
echo "####### SEED NUMBER: ${SLURM_ARRAY_TASK_ID}"
echo
# Placeholder command: replace "echo nameApplication" with your own executable,
# which receives the instance file and the seed as arguments.
srun echo nameApplication ${1} ${SLURM_ARRAY_TASK_ID}
</code>

With this method, the SLURM_ARRAY_TASK_ID variable contains the seed, and you submit as many array jobs as there are instances in the instances directory.
You can easily find your output, which is named like this:

  output/<instance_name>_<job_ID>-<seed_number>

For example, output/bench1.txt_12345-7 would be seed number 7 of job 12345 on instance bench1.txt.
  
=== Dependencies between jobs ===
   * frederic.lardeux
   * gilles.hunault

==== Cplex ====

Leria has an academic license for the CPLEX software.

The CPLEX library is installed under the default path /opt/ibm/ILOG/CPLEX_Studio129 (version 12.9).
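As an illustration, a C++ program using the CPLEX Concert API could be compiled roughly as follows. This is only a sketch: my_model.cpp is a placeholder for your own source file, and the include and library subdirectories shown follow the usual CPLEX_Studio layout, so check them under /opt/ibm/ILOG/CPLEX_Studio129 on the cluster:

<code bash>
# Paths below follow the usual CPLEX_Studio layout; verify them on the cluster before use.
CPLEX_DIR=/opt/ibm/ILOG/CPLEX_Studio129

g++ -O2 -DIL_STD my_model.cpp -o my_model \
    -I${CPLEX_DIR}/cplex/include -I${CPLEX_DIR}/concert/include \
    -L${CPLEX_DIR}/cplex/lib/x86-64_linux/static_pic \
    -L${CPLEX_DIR}/concert/lib/x86-64_linux/static_pic \
    -lilocplex -lconcert -lcplex -lm -lpthread -ldl
</code>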
===== FAQ =====

  * How to see the resources of a partition (example with the std partition):

  user@stargate:~$ scontrol show partition std
  
  * How to get an interactive shell prompt in a compute node of your default partition?
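One common way to do this with Slurm (a sketch, assuming interactive jobs are allowed on your default partition) is to request a pseudo-terminal with srun:

<code bash>
# Open an interactive bash shell on a compute node of your default partition.
srun --pty bash

# The same, with explicit resources and an explicit partition (std is only an example):
# srun --partition=std --ntasks=1 --cpus-per-task=1 --pty bash
</code>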
  * libtool
  * libopenblas-base
  * maven
  * nasm
  * openjdk-8-jdk-headless
  * r-base
  * r-base-dev
==== Cluster load overview ====
  
https://grafana.leria.univ-angers.fr/d/_0Bh3sxiz/vue-densemble-du-cluster
  
==== Details per node ====
  
https://grafana.leria.univ-angers.fr/d/000000007/noeuds-du-cluster
  
<note>You can select the node you are interested in using the drop-down menu "HOST".</note>