Running a Docker container as an HPC job on one node (a single container cannot span multiple compute nodes):
On the Epyc and Pluto HPCs you can run a Docker container as a Slurm job, for example:
sbatch -N1 -n 8 -t 8:00:00 docker.slurm
where docker.slurm can contain, for example, the following incantations:
———————————————-
#!/bin/bash
container="ashael/hpl"
container_name=docker_${SLURM_JOB_ID}
docker pull ${container} # pull from docker hub
# we mount current working folder to container under /scratch
docker run -it -d --cpus="$SLURM_JOB_CPUS_PER_NODE" -v ${SLURM_SUBMIT_DIR}:/scratch --name docker_${SLURM_JOB_ID} ${container} # start container in a background (-d "detached" mode)
MY_USERID=$(id -u $USER)
CONTAINER_ID=$(docker ps -aqf "name=${container_name}")
echo my container id is ${CONTAINER_ID}
docker exec ${CONTAINER_ID} useradd --uid ${MY_USERID} --home $HOME $USER
docker exec ${CONTAINER_ID} chown -R $USER:$USER /scratch
docker exec -u ${MY_USERID}:${MY_USERID} ${CONTAINER_ID} mkdir -p /scratch/$SLURM_JOB_ID
# run executable inside your container, output may be written to /scratch/$SLURM_JOB_ID/ inside the container to appear in your current folder
docker exec -u ${MY_USERID}:${MY_USERID} ${CONTAINER_ID} uname --all
# OR execute an existing script runme.sh from current folder ($SLURM_SUBMIT_DIR) mounted inside your container in /scratch, can also pass a jobid and CPU Nrs to it as parameters if needed:
docker exec -u ${MY_USERID}:${MY_USERID} ${CONTAINER_ID} /scratch/runme.sh $SLURM_JOB_ID $SLURM_JOB_CPUS_PER_NODE
# stop the container and clean up
docker stop $CONTAINER_ID
docker ps -a # list all containers' states
docker rm $CONTAINER_ID # clean up, remove the container
Note that the local job submission folder is mounted into the container under /scratch by the -v option of the docker run command. This is one way to transfer files between the container and your session (there are other ways as well).
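The job script calls /scratch/runme.sh, which is not shown here. A minimal sketch of what such a script might look like follows, assuming it receives the job ID and the CPU count as its two arguments (as in the docker exec line) and writes its results under /scratch/<jobid>. The function name run_job, the SCRATCH_ROOT variable and the result.txt file name are hypothetical and only serve the illustration; the sketch also assumes the container image provides /bin/sh.

```shell
#!/bin/sh
# runme.sh: hypothetical payload script, executed inside the container.
# Usage: /scratch/runme.sh <job_id> <n_cpus>

# Root of the mounted submission folder; /scratch matches the -v mount in
# docker.slurm. SCRATCH_ROOT is an assumed override, handy for dry runs.
SCRATCH_ROOT="${SCRATCH_ROOT:-/scratch}"

run_job() {
    job_id="$1"
    n_cpus="$2"
    out_dir="${SCRATCH_ROOT}/${job_id}"

    # docker.slurm already creates this folder; -p makes the call harmless.
    mkdir -p "${out_dir}" || return 1

    # Placeholder workload: record the parameters and the kernel we ran on.
    {
        echo "job_id=${job_id}"
        echo "n_cpus=${n_cpus}"
        echo "kernel=$(uname -sr)"
    } > "${out_dir}/result.txt"
}

# Run only when both arguments are given.
if [ "$#" -ge 2 ]; then
    run_job "$1" "$2"
fi
```

The script has to be executable (chmod +x runme.sh) in the submission folder before the job is submitted. Anything it writes under /scratch/<jobid> inside the container shows up in the <jobid> subfolder of the folder you submitted from.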
Alternatively, while your container is running on the target node, you can ssh to the node and attach to the container’s terminal, allowing you to run interactive commands inside it:
docker attach docker_XXXXXX
where docker_XXXXXX is the container name or ID (which you can also get from the “docker ps -a” output on the node).
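Because the job script names the container docker_${SLURM_JOB_ID}, the name can be derived from the job ID that sbatch or squeue reports. A small sketch, with the hypothetical helper names container_name_for_job and list_job_containers; the --filter and --format options of docker ps are standard Docker CLI flags:

```shell
#!/bin/bash
# Hypothetical helpers for finding your container on the compute node.

# Build the container name used by docker.slurm from a Slurm job ID.
container_name_for_job() {
    echo "docker_$1"
}

# List name and state of all containers (running or exited) whose name
# contains "docker_", i.e. those started by jobs following this naming scheme.
# On a shared node this may include containers of other users.
list_job_containers() {
    docker ps -a --filter "name=docker_" --format '{{.Names}}: {{.Status}}'
}
```

For a job with the (made-up) ID 123456 you could then attach with: docker attach "$(container_name_for_job 123456)"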
To terminate the container from inside the container, press “Ctrl+D”.
To exit the interactive container session while leaving the container running, press “Ctrl+p” and then “Ctrl+q”.
To start an exited but still existing container docker_XXXXXX (check “docker ps -a”), run from the compute node:
docker start docker_XXXXXX
To stop the container from outside (from the compute node), run:
docker stop docker_XXXXXX
When you have finished with your container job, please remove the container from the node:
docker rm docker_XXXXXX
where docker_XXXXXX is the container name or ID (see “docker ps -a”)
At the end, always check “docker ps -a” from the target node to make sure all your containers are stopped and destroyed before you free up the compute node.
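The job script only reaches its docker stop and docker rm lines if every earlier command returns, so an interrupted job can leave its container behind. One possible safeguard, sketched here under the assumption that the batch shell gets a chance to run its traps (which depends on how the job ends and on the site's Slurm configuration), is to register the cleanup right after the container name is chosen. The function name cleanup_container and the "manual" fallback name are hypothetical.

```shell
#!/bin/bash
# Sketch: remove the job's container whenever the batch script exits.

# Same naming scheme as docker.slurm; "manual" is an assumed fallback
# for runs outside Slurm, where SLURM_JOB_ID is not set.
container_name="docker_${SLURM_JOB_ID:-manual}"

cleanup_container() {
    # "docker rm -f" force-removes the container, killing it first if it
    # is still running. Errors are ignored so that an already removed
    # container does not turn the cleanup into a failure.
    docker rm -f "${container_name}" >/dev/null 2>&1 || true
}

# Run the cleanup on normal exit, and turn SIGTERM/SIGINT into an exit
# so that the EXIT trap fires in those cases as well.
trap cleanup_container EXIT
trap 'exit 143' TERM
trap 'exit 130' INT

# ... docker pull / docker run --name "${container_name}" / docker exec ...
```

Note that bash runs a trap only after the current foreground command has returned, so a long-running docker exec delays the cleanup. Checking “docker ps -a” on the node afterwards is still worthwhile.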