
Examples of job submission scripts with SLURM

Basic sample submission scripts are listed HERE.
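
For orientation, a minimal submission script might look something like the sketch below (the default queue “all” and the intel/13 module are taken from the examples on this page; the executable name is a placeholder):

#!/bin/bash
#SBATCH --job-name=test_mpi    # job name shown in the queue listing
#SBATCH --partition=all        # default queue ("all")
#SBATCH --nodes=1              # one compute node
#SBATCH --ntasks=4             # 4 CPU cores / MPI tasks
#SBATCH --time=12:00:00        # wall-clock limit of 12 hours

module load intel/13                       # MPI environment, as used further below
mpirun -np $SLURM_NTASKS ./myexecutable    # run on the allocated cores

Submit such a script with “sbatch scriptname.sh”, as described further down.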

Note that you can also reserve nodes with the salloc command:

salloc -N1 -n4 -t 12:00:00 — request 4 CPU cores (-n4) on one node (-N1) for 12 hours in the default queue (“all”)

salloc -p long -n24 -t 48:00:00 — request 24 CPU cores for 48 hours in the queue “long” (on whatever number of nodes)

salloc -p NGPU --gres=gpu:2 -N1 -n2 -t 12:00:00 — request 2 CPU cores and two GPUs in the queue “NGPU” (we have at most 2 GPUs per node) for 12 hours

salloc -p Broadwell -N1 -w zeus300 --exclusive — request a particular node (zeus300) in exclusive mode (all of its resources)

When the node is reserved, you can e.g. launch MPI code with the mpirun command, or just log in to the target node with ssh and work interactively on the command line.
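
For example, once salloc has granted the allocation (node zeus405 in the listing below), a session could look like this sketch (the executable name is a placeholder):

ssh zeus405                   # log in to the allocated compute node and work interactively

or, from the shell where salloc was issued:

mpirun -np 4 ./myexecutable   # launch MPI code on the 4 reserved cores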

You can also use the srun command to run tasks on the CPUs reserved with salloc:

[aa3025@zeus2 ~]$ qstat
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            538907      NGPU     bash   aa3025  R       0:04      1 zeus405

[aa3025@zeus2 ~]$ srun -n 4 hostname
zeus405
zeus405
zeus405
zeus405

Interactive reservation of CPUs (no SLURM scripting involved)

You can reserve nodes of the Zeus HPC by issuing a command such as

salloc -t 2:30:00 -N3 -n24

This will reserve 24 CPU cores (-n24) on 3 compute nodes (-N3) for 2.5 hours.

You can also just specify how many CPU cores you want without the number of nodes, say 16 CPU cores anywhere:

salloc -t 2:30:00 -n16

The CPU cores will be allocated on whichever nodes are available, not necessarily in consecutive order.

You will get something like:

[aa3025@zeus2 arrays]$ salloc -t 2:30:00 -n16
salloc: Pending job allocation 296104
salloc: job 296104 queued and waiting for resources
salloc: job 296104 has been allocated resources
salloc: Granted job allocation 296104
[aa3025@zeus2 arrays]$

Check on which nodes your job was allocated, if needed:

[aa3025@zeus2 arrays]$ qstat
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            296104       all     bash   aa3025  R       0:05      1 zeus15
[aa3025@zeus2 arrays]$

or if you already have some jobs in the queue, do

scontrol show jobid=296104

and you will see the specs of job 296104:

JobId=296104 JobName=bash
   UserId=aa3025(500) GroupId=aa3025(500)
   Priority=2483 Nice=0 Account=default QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:02:16 TimeLimit=02:30:00 TimeMin=N/A
   SubmitTime=2017-02-24T11:59:53 EligibleTime=2017-02-24T11:59:53
   StartTime=2017-02-24T12:00:02 EndTime=2017-02-24T14:30:02
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=short4 AllocNode:Sid=zeus2:20360
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=zeus15
   BatchHost=zeus15
   NumNodes=1 NumCPUs=16 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=(null)
   WorkDir=/home/aa3025/tests/arrays

from which we see that the node list is

...
NodeList=zeus15
...

Then you can use these nodes directly in whatever way you like, with “srun” or with MPI, e.g.:

[aa3025@zeus2 arrays]$ srun hostname
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15
zeus15

So we have 16 instances of the “hostname” command executed on the target node(s).
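
If you need fewer instances than the number of reserved cores, you can override the task count on the srun command line itself (standard srun options, shown here as a sketch):

srun -n 1 hostname            # a single instance, regardless of how many cores are reserved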

With MPI, e.g. if I use Intel MPI-compiled code (say some standard parallel matrix multiplication code):

Compilation with the mpicc wrapper, followed by the run with mpirun:

[aa3025@zeus2 gcc]$ module load intel/13
[aa3025@zeus2 gcc]$ mpicc mmult.c -o mmult_mpi_icc.exe
...
[aa3025@zeus2 gcc]$ mpirun -np 16 ./mmult_mpi_icc.exe
Time taken = 7.069788 seconds
Time taken = 7.093434 seconds
Time taken = 7.129114 seconds
Time taken = 7.146285 seconds
Time taken = 7.139382 seconds
Time taken = 7.143934 seconds
Time taken = 7.152325 seconds
Time taken = 7.160740 seconds
Time taken = 7.162425 seconds
Time taken = 7.159303 seconds
Time taken = 7.182479 seconds
Time taken = 7.169939 seconds
Time taken = 7.194784 seconds
Time taken = 7.193553 seconds
Time taken = 7.187440 seconds
Time taken = 7.245136 seconds

... is the output from each CPU core.
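
If your MPI launcher should not pick up the SLURM allocation automatically, you can first build a machine file from the reserved cores and pass it to mpirun (a sketch using the -machinefile option of Intel MPI’s mpirun):

srun hostname | sort > ./hosts.txt                           # one line per allocated core
mpirun -np 16 -machinefile ./hosts.txt ./mmult_mpi_icc.exe   # place the ranks on those hosts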

When you have finished using your reservation before the expiration time, you need to cancel it with “scancel” to free the resources (otherwise it will expire after the 2.5 hours you specified during the reservation with “salloc”).

It will also be cancelled if you correctly “exit” from the salloc-ated terminal.

scancel 296104

where 296104 is the job ID number.
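
If you have several such reservations, you can also cancel all of your own jobs in one go (a standard scancel option, with the user name taken from the examples above):

scancel -u aa3025             # cancels every job belonging to user aa3025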

Alex Pedcenko

SLURM & examples of submission scripts


The queue scheduler of ZEUS is SLURM (an analogue of the PBS/Torque scheduler we use on Pluto and Mercury):

  • Slurm documentation is available here.
  • Basic sample submission scripts are listed HERE.
  • Submission of a job via a SLURM script “scriptname.sh” can be done with the command “sbatch”:

    sbatch scriptname.sh

  • Queue states:

    sinfo

  • Listing of running jobs:

    squeue

  • Delete job 1234 from the queue:

    scancel 1234

  • You can run any executable, e.g. “myexecutable”, on the Zeus nodes, e.g. zeus66 and zeus101…zeus106, without any SLURM submission scripts (the job will be placed in the queue anyway):

    srun -w zeus[66,101-106] ./myexecutable

    Note: You will NOT be able to log in to the compute node(s) from the head node if you have no jobs running on those compute node(s).
