
Update on queues of Zeus HPC

Queuing model UPDATE June 2018

To simplify usage of the different queues, we have combined all nodes into a single default queue (Slurm partition) “all”. The usage limits are now solely user-based: each user has a default (for now) number of CPU*minutes that they can use at any given moment, subject to available resources. If this CPU*min limit is reached, new jobs from that user are held in the queue until their running jobs free up resources. This is independent of the type of compute node. During this initial stage we will try to adapt the default CPU*min allowance to suit better and more effective HPC usage. The simple principle behind this is that a user can use more CPU cores for less time, or fewer CPU cores for a longer time. The run time of a job is determined by the value you set in the --time or -t parameter when submitting the job (e.g. -t 24:00:00).
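As a rough worked example (the actual allowance is set by us and may change; the number below is purely hypothetical): with an allowance of 92,160 CPU*min, a user could run 64 cores for 24 hours (64 x 24 x 60 = 92,160 CPU*min) or 32 cores for 48 hours (32 x 48 x 60 = 92,160 CPU*min), but not both at the same time.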

If you require a particular type of compute node (CPU/GPU/Phi etc.), you can request it in the submission script or on the sbatch command line during submission by specifying an additional “constraint” (or, for GPUs/Phis, “gres”) parameter:

  • for the 56 Intel Broadwell CPU based nodes (32 CPU cores, 128 GB RAM each), specify --constraint=broadwell
  • for the 144 Intel Nehalem CPU based nodes (8 CPU cores, 48 GB RAM each), specify --constraint=nehalem
  • for the 18 Intel SandyBridge CPU based nodes (12 CPU cores, 48 GB RAM each), specify --constraint=sandy
  • for the single SMP node (32 CPUs, 512 GB RAM), ask for --constraint=smp
  • for the 10 nodes with 2 NVidia Kepler K20 GPUs each, ask for --gres=gpu:K20:N (where N is the number of GPUs needed, max 2 GPUs/node)
  • for the 18 nodes with 2 NVidia Kepler K80 GPUs each, ask for --gres=gpu:K80:N (where N is the number of GPUs needed, max 2 GPUs/node)
  • for N Intel Phi coprocessors, ask for --gres=mic:N or --constraint=phi.
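If you are not sure which features (constraints) and GPUs/Phis (GRES) are defined on which nodes, the standard Slurm sinfo command should list them, for example:

sinfo -N -o "%N %c %m %f %G"

which prints, per node, its name, CPU count, memory, feature tags (e.g. broadwell, nehalem) and generic resources (e.g. gpu:K80:2).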

For more details on Zeus’s CPUs and nodes see this post: http://zeus.coventry.ac.uk/wordpress/?p=336

If you have no particular preference for the type of CPU or compute node and are running a parallel job, please specify ONLY the TOTAL number of CPUs required, NOT the number of nodes: SLURM will assign the nodes automatically.

e.g. if I need 64 CPUs in total for 24 hours on whatever nodes are available, I submit my slurm script with:

sbatch -n 64 -t 24:00:00 myslurmscriptname.slurm

if I need 64 CPUs in total for 48 hours on Broadwell-based nodes (32 CPUs/node), I submit my slurm script with:

sbatch -n 64 -t 48:00:00 --constraint=broadwell myslurmscriptname.slurm

Finally, if I want 2 GPU nodes with 1 NVidia Kepler K80 GPU per node (2 GPUs in total) and 2 CPUs on each node, for 36 hours, I do something like:

sbatch -N2 --ntasks-per-node=1 --cpus-per-task=2 --gres=gpu:K80:1 -t 36:00:00 mygpuscript.slurm
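For reference, with all the resource flags given on the command line as above, the body of mygpuscript.slurm itself can stay minimal, something along these lines (the CUDA module name and the executable are only placeholders; check what is actually installed on Zeus):

#!/bin/bash
# set up the environment for the GPU code (module name is just an example)
module load cuda
# launch one task per allocated node; each task gets the single K80 requested for that node
srun ./my_gpu_program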

Certainly some variations of these sbatch commands are possible; these flags can also be specified inside the slurm submission script itself. For a full list of possible sbatch options see the slurm docs: https://slurm.schedmd.com/sbatch.html
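As an illustration of the in-script form, the Broadwell example above could be written entirely in the script header, so that the job is submitted with just “sbatch myslurmscriptname.slurm” (the module name and executable below are placeholders, not something prescribed by this post):

#!/bin/bash
#SBATCH -n 64                   # total number of CPU cores requested
#SBATCH -t 48:00:00             # run time limit (48 hours)
#SBATCH --constraint=broadwell  # restrict the job to the Broadwell nodes

# load your application's environment (example module name only)
module load mpi/openmpi

# run the parallel program on the allocated cores (example executable)
srun ./my_parallel_program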

 

Alex Pedcenko
