
Submitting jobs to different nodes

Since 2018 there has been only one default slurm partition (queue) on zeus, called “all”, see here
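
If you want to check the partition setup yourself, the standard sinfo command lists the configured partitions and their nodes, for example:

sinfo -s        # one-line summary of each partition
sinfo -p all    # list the nodes in the "all" partition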

See also: examples of SLURM submission scripts

Submit to ‘Broadwell’ nodes

The hostnames of Broadwell nodes are zeus[300-343] & zeus[400-409] & zeus[500-501] (56 in total). They have 32 CPU-cores (Broadwell Xeon) and 128GB of RAM. There are several ways to use these nodes.

  • submit directly to a compute node by specifying its hostname (not recommended; use this only if you need that exact node for some reason, e.g. you have a reservation there). For example, to request zeus300, in your slurm script use
    #SBATCH -w zeus300
  • or request that particular node during submission
    sbatch -w zeus300 slurmscriptname.slurm
  • or, in your slurm submission script, request the constraint “broadwell” with whatever number of tasks you require. For example, we can request one task that has access to all 32 CPUs of one Broadwell node (e.g. to run SMP code):
    #SBATCH --constraint=broadwell
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=32

    or request 32 individual tasks (one task will be allocated per CPU):

    #SBATCH --constraint=broadwell
    #SBATCH -n32 -N1

    where -n32 is the total number of CPUs requested over one node (-N1), not the number of CPUs per node. A complete script sketch combining these directives follows this list.
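
Putting these pieces together, a complete Broadwell submission script might look like the minimal sketch below; the job name, output file and “./myexecutable” are placeholders to adapt to your own job:

#!/bin/bash
#SBATCH --job-name=broadwell-smp      # placeholder job name
#SBATCH --constraint=broadwell        # run on a 32-core Broadwell node
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32            # give the single task all 32 cores
#SBATCH --time=04:00:00               # request 4 hours explicitly
#SBATCH --output=broadwell-smp.out    # placeholder output file

./myexecutable                        # placeholder for your (SMP) program

Save it, e.g. as broadwell.slurm, and submit it with “sbatch broadwell.slurm”.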

Submit to ‘Sandybridge’ 12-CPU nodes

GPU nodes of zeus (zeus[200-217]) have 12-core Sandybridge-family CPUs and NVIDIA K20 GPUs (2 per node). You can also use these nodes without the GPUs. To use them, do one of the following (a quick way of listing each node’s features and GPUs is sketched after this list):

  • specify
    --constraint=sandy

    or request k20 GPUs:

    --gres=gpu:K20:N

    (where N is the number of GPUs you need)

    sbatch --constraint=sandy slurmscriptname.slurm
    sbatch -N1 --gres=gpu:K20:2 slurmscriptname.slurm
  • you can also request these particular nodes (zeus200…zeus217) to be allocated to your job, e.g. we can ask for 2 of these nodes:
    sbatch -w zeus[200-201] slurmscriptname.slurm
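
If you are unsure which features (constraints) and GPUs a node advertises, the standard sinfo command can show them per node; only the zeus[200-217] node range below is specific to this example:

sinfo -N -n zeus[200-217] -o "%N %f %G"    # node name, features and generic resources (GPUs)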


Submit to Nehalem (8-CPU) nodes

These are the older Nehalem-based nodes of the Zeus HPC, each having 8 CPU-cores and 48GB of RAM: zeus[20-91] (floor 3 of ECB) and zeus[100-171] (Mezz floor of ECB).
If you want to use specifically these nodes, use

sbatch --constraint=nehalem slurmscriptname.slurm

when submitting your job with sbatch or srun, or specify it in your slurm sbatch script:

#SBATCH --constraint=nehalem

You can also request particular nodes (e.g. zeus34 and zeus56…zeus72 in the example below):

sbatch -w zeus[34,56-72] slurmscriptname.slurm

An important thing to remember is that these nodes cannot run more than 8 tasks per node; for example, to fill two of them completely you would request 8 tasks per node, as sketched below.
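
A minimal sketch of such a request, filling two Nehalem nodes with 16 tasks in total (“./myexecutable” is a placeholder for your own parallel program):

#SBATCH --constraint=nehalem
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8    # no more than 8 tasks fit on each 8-core node

srun ./myexecutable            # placeholder parallel program, one task per core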

Request nodes with GPU processors

  • In your slurm submission script specify (e.g. to ask for 2 GPUs of any kind)
    #SBATCH --gres=gpu:2
  • OR request a particular kind of GPU (K20 or K80) in your slurm submission script (the example below asks for 1 K80 GPU):
    #SBATCH --gres=gpu:k80:1
  • or e.g. 2 K20 GPUs on one node in exclusive mode (the node will not be shared with other jobs even if it has free resources, i.e. CPUs or GPUs not used by your job); a full script sketch follows this list:
    sbatch --gres=gpu:k20:2 -N1 --exclusive slurmscriptname.sh
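
As a fuller illustration, a minimal GPU job script might look like the sketch below; the job name, output file and “./mygpuexecutable” are placeholders, and nvidia-smi is only included as a quick check that a GPU is visible on the allocated node:

#!/bin/bash
#SBATCH --job-name=gpu-test        # placeholder job name
#SBATCH --gres=gpu:k20:1           # one K20 GPU (use gpu:k80:1 or gpu:1 as needed)
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --output=gpu-test.out      # placeholder output file

nvidia-smi                         # report the GPUs visible on the node
./mygpuexecutable                  # placeholder for your GPU program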

Launching a job without a slurm submission script

  • You can run any executable, e.g. “myexecutable”, on zeus’s nodes without any slurm submission script (the job will still be placed in the queue while it runs). Here we request 2 nodes and a total of 12 CPUs for 10 minutes:
    srun -t 00:10:00 -N2 -n12 ./myexecutable

    Note that when submitting a job with “srun”, the command line will not be “released” until the job completes and the output is written to STDOUT (unless specified otherwise). This approach is suitable if you expect your job to complete very quickly; this way you can also see the direct output of your executable in the terminal.
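
    While the job runs (whether launched with srun, sbatch or salloc), you can check its state from another terminal with the standard squeue command, for example:

    squeue -u $USER    # list your own pending and running jobs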

  • You can also reserve nodes for a specified time without launching any task on them (i.e. it will start an idle interactive bash session). The example below asks for 2 Broadwell nodes (32 CPUs/node) in exclusive mode (no resource sharing with other jobs) for 12 hours:
    salloc -N2 --constraint=broadwell --exclusive -t 12:00:00

    Similarly for GPUs, e.g. one Sandybridge node with 2 K20 GPUs for 30 minutes:

    salloc -N1 --gres=gpu:K20:2 --exclusive -t 30

    After submission of a “salloc” job, your console will start a dedicated bash session in interactive mode, which will have all allocated resources available. When you leave that bash session (Ctrl-D or “exit”), your “salloc” job/allocation will be terminated in the queue. To avoid this (job termination on exiting the bash session) you can use the “screen” utility, as sketched below.
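
    A minimal sketch of the “screen” approach (standard screen usage, nothing zeus-specific; “myalloc” is just a placeholder session name):

    screen -S myalloc                                  # start a named screen session
    salloc -N1 --constraint=broadwell -t 12:00:00      # request the allocation inside screen
    # ... work interactively inside the allocation ...
    # detach with Ctrl-A then D; the allocation keeps running
    screen -r myalloc                                  # reattach to the session later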

    For all possible “constraints”, see here: http://zeus.coventry.ac.uk/wordpress/?p=1094

  • Note that in any of the cases described above, if you do not specify the time required for your job, it will be assigned the default of 4 hours, after which the job will be terminated. You can specify the time with the “-t” flag; below we ask for 8 CPUs (on any available nodes) for 8 hours:
sbatch -n8 -t 8:00:00 slurmscriptname.slurm