Running Docker job on EPYC HPC

Running a Docker container as an HPC job on one node (a single container cannot span multiple compute nodes):

On the Epyc and Pluto HPCs you can run a Docker container as a Slurm job, like

sbatch -N1 -n 8 -t 8:00:00 docker.slurm

where docker.slurm can contain, for example, the following incantations:
———————————————-

#!/bin/bash

container="ashael/hpl"
container_name=docker_${SLURM_JOB_ID}


docker pull ${container} # pull from docker hub

# mount the job submission folder into the container under /scratch
# and start the container in the background (-d, "detached" mode)
docker run -it -d --cpus="${SLURM_JOB_CPUS_PER_NODE}" -v "${SLURM_SUBMIT_DIR}":/scratch --name "${container_name}" "${container}"

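MY_USERID=$(id -u "${USER}")

# look up the container ID by its exact name (a "name=" filter on docker ps also matches substrings)
CONTAINER_ID=$(docker inspect --format '{{.Id}}' "${container_name}")

echo "my container id is ${CONTAINER_ID}"

# create a matching user inside the container so that files written to /scratch belong to you
docker exec "${CONTAINER_ID}" useradd --uid "${MY_USERID}" --home "${HOME}" "${USER}"
docker exec "${CONTAINER_ID}" chown -R "${USER}:${USER}" /scratch

docker exec -u "${MY_USERID}:${MY_USERID}" "${CONTAINER_ID}" mkdir -p "/scratch/${SLURM_JOB_ID}"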

# run an executable inside the container; anything written to /scratch/$SLURM_JOB_ID/ inside the container appears in your submission folder
docker exec -u "${MY_USERID}:${MY_USERID}" "${CONTAINER_ID}" uname --all

# OR execute an existing script runme.sh from the submission folder ($SLURM_SUBMIT_DIR), mounted inside the container as /scratch; the job ID and CPU count can be passed to it as parameters if needed:
docker exec -u "${MY_USERID}:${MY_USERID}" "${CONTAINER_ID}" /scratch/runme.sh "${SLURM_JOB_ID}" "${SLURM_JOB_CPUS_PER_NODE}"

# stop the container and clean up
docker stop $CONTAINER_ID
docker ps -a # list all containers' states
docker rm $CONTAINER_ID # clean up, remove the container

Note that the local job submission folder is “mounted” into the container by the -v option of the docker run command. This is one way to transfer files between the container and your session (there are other ways as well).
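The batch script calls /scratch/runme.sh with the job ID and the CPU count as arguments. As a rough sketch of what such a script might look like (the function name run_job, the file name output.log and the optional third argument are made up for illustration, and the uname call stands in for your real workload):

```shell
#!/bin/bash
# runme.sh: hypothetical example, not part of the original setup.
# Usage inside the container: /scratch/runme.sh <job_id> <ncpus>

run_job() {
    local job_id="$1"               # Slurm job ID passed in by docker.slurm
    local ncpus="$2"                # CPU count passed in by docker.slurm
    local scratch="${3:-/scratch}"  # mount point of the submission folder (assumed default)
    local outdir="${scratch}/${job_id}"

    mkdir -p "${outdir}"
    {
        echo "job ${job_id} using ${ncpus} CPUs"
        uname -a                    # placeholder for the real workload
    } > "${outdir}/output.log"
}

# Only run when both arguments are supplied
if [ "$#" -ge 2 ]; then
    run_job "$1" "$2"
fi
```

The script has to be executable (chmod +x runme.sh) in the submission folder before the job is submitted. With the defaults above, the output ends up in a subfolder named after the job ID in that folder.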

 

Alternatively, while your container is running on the target node, you can ssh to the node and attach to the container’s terminal, allowing you to run interactive commands inside it:

docker attach docker_XXXXXX

where docker_XXXXXX is the container name or ID (which you can also get from the “docker ps -a” output on the node).
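Since the batch script names the container docker_ followed by the Slurm job ID, the node and container name can be derived from the job ID alone. The helper below is only a sketch: attach_command is a hypothetical name, and it assumes that squeue is available on the login node and that you are allowed to ssh to the compute node. It prints the command rather than running it.

```shell
#!/bin/bash
# Hypothetical helper: print the ssh command that attaches to the
# container started by a given Slurm job.
attach_command() {
    local job_id="$1"
    local node
    # -h drops the header line, -o %N prints only the job's node list
    node=$(squeue -h -j "${job_id}" -o %N)
    if [ -z "${node}" ]; then
        echo "job ${job_id} has no node assigned" >&2
        return 1
    fi
    # ssh -t allocates a terminal for the interactive docker attach session
    echo "ssh -t ${node} docker attach docker_${job_id}"
}
```

For example, attach_command 123456 would print something like "ssh -t node07 docker attach docker_123456" (node07 being whatever node squeue reports).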

If you want to terminate the container from inside the container, press “Ctrl+D”.

If you want to exit the “interactive” container session but leave the container running, press “Ctrl+p” and then “Ctrl+q”.

To restart a container docker_XXXXXX that has exited but still exists (check “docker ps -a”), run on the compute node

docker start docker_XXXXXX

To stop the container from outside (from the compute node), do

docker stop docker_XXXXXX

When you have finished with your container job, please remove the container from the node:

docker rm docker_XXXXXX

where docker_XXXXXX is the container name or ID (see “docker ps -a”)
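To reduce the chance of leaving a container behind when the batch script exits early, the stop and remove steps can be wrapped in a function and registered with bash's trap builtin. This is a sketch only: cleanup_container is a hypothetical name, and whether the handler gets a chance to run when a job is cancelled or hits its time limit depends on how the job is signalled.

```shell
#!/bin/bash
# Hypothetical helper: stop and remove a container by name.
# Safe to call more than once.
cleanup_container() {
    local name="$1"
    # "|| true" keeps the cleanup going even if the container is already gone
    docker stop "${name}" >/dev/null 2>&1 || true
    docker rm "${name}" >/dev/null 2>&1 || true
}

# In docker.slurm, register the handler right after container_name is set:
#   trap 'cleanup_container "${container_name}"' EXIT
```

With the trap in place, the explicit docker stop and docker rm at the end of the script become a fallback rather than the only cleanup path.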

At the end, always check “docker ps -a” from the target node to make sure all your containers are stopped and destroyed before you free up the compute node.
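The final check can be scripted as well. The sketch below assumes the docker_ naming scheme used in the batch script; list_job_containers and node_is_clean are hypothetical names. Note that the name filter matches any container whose name contains docker_, including those of other users on the same node.

```shell
#!/bin/bash
# Hypothetical helper: list all containers (running or exited) on this
# node whose name contains "docker_", one "name status" pair per line.
list_job_containers() {
    docker ps -a --filter "name=docker_" --format '{{.Names}} {{.Status}}'
}

# Succeeds (exit status 0) only when no such container is left.
node_is_clean() {
    [ -z "$(list_job_containers)" ]
}
```

On the compute node, something like `node_is_clean || list_job_containers` would then print whatever still needs to be stopped and removed.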
