HPC Help Wiki pages:

http://zeus1/dokuwiki/doku.php  (work in progress)

SULIS expression of interest

Dear Colleagues,

Coventry University entered into an equipment bid with the Midlands+ group of Universities for a tier 2 HPC cluster. The bid was successful, and the new cluster, called SULIS, is due to go into production. As one of the consortium members in the bid, we have an allocation of core hours and GPU hours on the cluster. The cluster consists of 25,216 AMD EPYC compute cores configured as 167 dual-processor CPU compute nodes, plus 30 nodes each equipped with three Nvidia A100 40GB GPUs. A proportion of the cluster's time is reserved for the EPSRC to allocate outside the consortium, but consortium members are also allowed to bid for the EPSRC-allocated time.

 

SULIS will be going live on Monday 1st November and we are now inviting expressions of interest. Please use the form linked here to indicate your interest in using the cluster, the time you require on it and your proposed usage – https://forms.office.com/r/WKpfznnJte

 

If you have any queries please email HPC.MPCS@coventry.ac.uk

 

Best,

 

Damien Foster BSc (hons) (Edin) DPhil (Oxon) FIMA, MINSTP

Professor of Statistical Physics

Centre Director

Centre for Computational Science and Mathematical Modelling

Faculty of Engineering, Environment and Computing

Coventry University

Coventry CV1 5FB

 

T: 02477 659245 | M: 0797 498 4977 | E: ab5651@coventry.ac.uk

Power outage in ECB, 5 August 2021

Due to a power outage in the early morning of 5 August, all HPC infrastructure in ECB was powered off.

By the evening of the same day the power problems seemed to be resolved; however, the air-conditioning units in the Zeus HPC room are not working properly and one unit has failed. Until this is resolved, only part of the Zeus HPC compute nodes will be back online: compute nodes zeus[100-171] and zeus[200-217], which reside in a different room unaffected by the air-conditioning failure. I’ll bring the rest of the compute nodes (the Broadwells and the other half of the Nehalems) back online as soon as the cooling problem is solved. Contractors are working on the fix.

Regards

Alex Pedcenko

Zeus HPC GPU usage survey

Dear All,

 

We are carrying out a survey of our HPC GPU (graphics processor) usage, which will help us understand current needs and plan future hardware upgrades. If you have been using GPUs on the zeus HPC, or would like to use them in the future, please complete a very short survey at the following link:

https://forms.office.com/Pages/ResponsePage.aspx?id=mqsYS2U3vkqsfA4NOYr9T9aUY6sEesBHsORvbIuKaq5UQUU0UjVWT1VZVko1WVBGNlFISElSVzdPMy4u

Many thanks

Alex Pedcenko

Running Docker job on EPYC HPC

Running a docker container as an HPC job on one node (you cannot, of course, run a single container across multiple compute nodes):

On the Epyc & Pluto HPCs you can run a docker container as a slurm job, for example:

sbatch -N1 -n 8 -t 8:00:00 docker.slurm

where docker.slurm can contain, for example, the following incantations:
———————————————-

#!/bin/bash

container="ashael/hpl"
container_name=docker_${SLURM_JOB_ID}

docker pull ${container} # pull the image from Docker Hub

# we mount the current working folder into the container under /scratch
docker run -it -d --cpus="$SLURM_JOB_CPUS_PER_NODE" -v ${SLURM_SUBMIT_DIR}:/scratch --name ${container_name} ${container} # start the container in the background (-d "detached" mode)

MY_USERID=$(id -u $USER) # your numeric user id on the host

CONTAINER_ID=$(docker ps -aqf "name=${container_name}")

echo my container id is ${CONTAINER_ID}

# create your user inside the container and give it ownership of /scratch
docker exec ${CONTAINER_ID} useradd --uid ${MY_USERID} --home $HOME $USER
docker exec ${CONTAINER_ID} chown -R $USER:$USER /scratch

docker exec -u ${MY_USERID}:${MY_USERID} ${CONTAINER_ID} mkdir -p /scratch/$SLURM_JOB_ID

# run an executable inside your container; output written to /scratch/$SLURM_JOB_ID/ inside the container appears in your current folder
docker exec -u ${MY_USERID}:${MY_USERID} ${CONTAINER_ID} uname --all

# OR execute an existing script runme.sh from the current folder ($SLURM_SUBMIT_DIR), mounted inside your container under /scratch; you can also pass the job id and number of CPUs to it as parameters if needed:
docker exec -u ${MY_USERID}:${MY_USERID} ${CONTAINER_ID} /scratch/runme.sh $SLURM_JOB_ID $SLURM_JOB_CPUS_PER_NODE

# stop the container and clean up
docker stop $CONTAINER_ID
docker ps -a # list all containers and their states
docker rm $CONTAINER_ID # clean up, remove the container

Note that the local job submission folder is “mounted” into the container via the -v option of the docker run command. This is how you can transfer files between the container and your session (there are other ways as well).
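As a minimal sketch of what this bind mount gives you (assuming the container started by the script above is still running, that MY_USERID and CONTAINER_ID are set as in the script, and that the image provides a POSIX shell): a file written under /scratch inside the container shows up straight away in the submit folder on the host.

docker exec -u ${MY_USERID}:${MY_USERID} ${CONTAINER_ID} sh -c 'echo hello from inside the container > /scratch/hello.txt'
cat ${SLURM_SUBMIT_DIR}/hello.txt    # on the compute node: prints the line written inside the container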

 

Alternatively, while your container is running on the target node, you can ssh to the node and attach to the container’s terminal, allowing you to run interactive commands inside it:

docker attach docker_XXXXXX

where docker_XXXXXX is the container name or ID (which you can also get from the “docker ps -a” output on the node).

If you want to terminate the container from inside it, press “Ctrl+D”.

If you want to exit the “interactive” container session while leaving the container running, press “Ctrl+p” and then “Ctrl+q”.

To start an exited (but still existing) container docker_XXXXXX (check “docker ps -a”), run the following on the compute node:

docker start docker_XXXXXX

To stop the container from outside (from the compute node), run:

docker stop docker_XXXXXX

When you have finished with your container job, please remove it from the node:

docker rm docker_XXXXXX

where docker_XXXXXX is the container name or ID (see “docker ps -a”)

At the end, always check “docker ps -a” from the target node to make sure all your containers are stopped and destroyed before you free up the compute node.
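For example, a quick clean-up sketch along those lines (an assumption, not part of the original recipe: it relies on the docker_<jobid> naming convention used above and force-removes every container whose name matches, so check the list first):

docker ps -a --filter "name=docker_"                          # list job containers and their states
docker ps -aq --filter "name=docker_" | xargs -r docker rm -f # force-stop and remove them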

Folding@Home — Coventry HPC got into top 1500 “donors”

After about a year of operation within the Folding@Home project we have reached the top 1,500 (out of 225,885).

See https://stats.foldingathome.org/team/259515 

 

 

Alex

Accessing CU network drives while working from home

If you are outside the CU campus network (I bet you are now) but want to access the W-drive, H-drive or R-drive, you need to connect to the CU VPN first. Then open “This PC” (a.k.a. My Computer or Windows Explorer) and enter one of the following addresses in the address bar:

 

  • W-drive:   \\coventry.ac.uk\csv\Students\Shared\EC\STUDENT\
  • H-Drive (Students): \\coventry.ac.uk\csv\Students\Personal  (then check the folders inside to find the one that contains your Documents folder; you won’t be able to see anyone else’s)
  • H-Drive (Staff): \\coventry.ac.uk\csv\Staff\Personal  (then check the folders inside to find the one that contains your Documents folder; you won’t be able to see anyone else’s)
  • R-Drive: \\coventry.ac.uk\csv\Research

You will need to authenticate with your CU username in the format COVENTRY\yourusername and your CU password. If you want to make this “permanent”, you can map these folders on your Windows 10 PC: right-click on “This PC” -> More -> Map Network Drive -> enter one of the addresses above -> tick “Connect using different credentials” (if your PC is not a CU computer) -> enter your username and password as described above.

Running Parallel Python3 Jupyter Notebook on zeus HPC

This is an approach to launching a Jupyter notebook on a compute node of the EEC HPC. Normally you launch Jupyter locally and then open the associated web interface on your local machine. This is also possible on the HPC; however, because the compute nodes of the cluster are mostly accessible via the CLI only and are not “exposed” to the network outside the HPC, one needs to tunnel through the headnodes in order to reach them. The provided set of 2 HTA scripts simplifies the procedure: the first script submits the Jupyter job to the HPC queuing system (slurm) using HTML forms and an ssh command-line tool (plink.exe from PuTTY). The second script (again using plink) establishes an ssh tunnel to the target node where the Jupyter server is running and starts the default browser on the client machine, pointing it at the local port that the tunnel brings to the client machine. Since the compute nodes of the HPC have multiple CPUs (some with up to 32 cores), it is also shown that the Jupyter notebook can use IPython’s ipcluster to run notebook code in parallel. A rough manual sketch of the same idea is given below, after the link to the scripts.

link to scripts on coventry github
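For reference, a rough manual equivalent of what the scripts automate is sketched below. This is only an illustration of the tunnelling idea, not the scripts themselves: the module name, port number, node name zeus123 and headnode address zeus.coventry.ac.uk are assumptions and will differ on the real system.

jupyter.slurm, submitted with “sbatch jupyter.slurm”:

#!/bin/bash
#SBATCH -N 1
#SBATCH -n 8
#SBATCH -t 4:00:00
module load python3   # hypothetical module name; load whatever provides python3/jupyter on the cluster
# start the notebook on the compute node, without trying to open a browser there
jupyter notebook --no-browser --ip=$(hostname) --port=8888

Then, from your own machine, tunnel through the headnode to the compute node the job landed on (squeue shows which one) and point your browser at http://localhost:8888:

ssh -N -L 8888:zeus123:8888 username@zeus.coventry.ac.uk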

 

Video:

https://web.microsoftstream.com/embed/video/cb944d42-8d10-4784-8407-6e53fbaf3cbe?autoplay=false&showinfo=true

 

HPC on MS Teams

Hi All,

I have created an HPC Team on MS Teams, in case support needs to be provided. If you want, you can add yourself to the HPC Team: open MS Teams -> Join a team -> Join with a code -> glxx7vy (this is the code)

 

Regards

Alex

HPC Help

If you require help with anything HPC-related, you can try finding me on MS Teams, in the HPC Help channel of the EEC HPC Team.

Alex Pedcenko
