Category Archives: News

HPC Help Wiki pages:

http://zeus1/dokuwiki/doku.php  (work in progress)

Folding@Home — Coventry HPC got into top 1500 “donors”

After about a year of operation within the Folding@Home project we have reached the top 1,500 (out of 225,885) donor teams.

See https://stats.foldingathome.org/team/259515 

 

 

Alex

Update on queues of Zeus HPC

Queuing model UPDATE June 2018

To simplify the usage of the different queues, we have combined all nodes into a single default queue (SLURM partition) called "all". The usage limits are now solely user-based: each user has a default (for now) number of CPU*minutes that they can use at any given moment, subject to available resources. If this CPU*minute limit is reached, new jobs from that user are queued until their running jobs free up resources. This is independent of the type of compute node. During this initial stage we will adapt the default CPU*minute allowance to suit better and more effective HPC usage. The simple principle behind this is that a user can use more CPU cores for less time, or fewer CPU cores for a longer time. The run time of a job is determined by the value you set with the --time (or -t) parameter when submitting the job (e.g. -t 24:00:00).
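For illustration (the script name is a placeholder, and the actual per-user allowance is not stated here), the two submissions below consume the same budget, since 128 × 12 × 60 = 64 × 24 × 60 = 92,160 CPU*min:

sbatch -n 128 -t 12:00:00 myscript.slurm
sbatch -n 64 -t 24:00:00 myscript.slurm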

If you require a particular type of compute node (CPU/GPU/Phi etc.), you can request it in the submission script or on the sbatch command line by specifying the additional "constraint" parameter:

  • for the 56 Intel Broadwell CPU based nodes (128 GB RAM each) with 32 CPU cores, specify --constraint=broadwell
  • for the 144 Intel Nehalem CPU based nodes (48 GB RAM each) with 8 CPU cores, specify --constraint=nehalem
  • for the 18 Intel SandyBridge CPU based nodes (48 GB RAM each) with 12 CPU cores, specify --constraint=sandy
  • for the 1 x 32-CPU, 512 GB RAM SMP node, specify --constraint=smp
  • for the 10 nodes with 2 NVidia Kepler K20 GPUs each, ask for --gres=gpu:K20:N (where N is the number of GPUs needed, max 2 GPUs/node)
  • for the 18 nodes with 2 NVidia Kepler K80 GPUs each, ask for --gres=gpu:K80:N (where N is the number of GPUs needed, max 2 GPUs/node)
  • for N Intel Phi coprocessors, ask for --gres=mic:N or --constraint=phi.

For more details on Zeus’s CPUs and nodes see this post: http://zeus.coventry.ac.uk/wordpress/?p=336

If you have no particular preference for the type of CPU or compute node and are running a parallel job, please specify ONLY the TOTAL number of CPUs required, NOT the number of nodes: SLURM will assign the nodes automatically.

e.g. if I need 64 CPUs in total for 24 hours on whichever nodes are available, I submit my SLURM script with:

sbatch -n 64 -t 24:00:00 myslurmscriptname.slurm

If I need 64 CPUs in total for 48 hours on Broadwell-based nodes (32 CPUs/node), I submit my SLURM script with:

sbatch -n 64 -t 48:00:00 --constraint=broadwell myslurmscriptname.slurm

Finally, if I want 2 GPU nodes with 2 NVidia Kepler K80 GPUs in total (1 GPU/node) and 2 CPUs on each node, for 36 hours, I do something like:

sbatch -N2 --ntasks-per-node=1 --cpus-per-task=2 --gres=gpu:K80:1 -t 36:00:00 mygpuscript.slurm

Some variations of these sbatch commands are certainly possible, and these flags can also be specified inside the SLURM submission script itself, as sketched below. For the full list of sbatch options see the SLURM docs: https://slurm.schedmd.com/sbatch.html
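For example (a rough sketch: the executable name and the module line are placeholders for your own application), the last GPU job above could equivalently be written as a submission script with the flags embedded as #SBATCH directives:

#!/bin/bash
#SBATCH -N 2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:K80:1
#SBATCH -t 36:00:00

module load cuda/last     # only if your application needs the CUDA toolkit
srun ./my_gpu_program     # my_gpu_program is a placeholder for your executable

and then submitted simply with "sbatch mygpuscript.slurm".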

 

Alex Pedcenko

PLUTO HPC switched off

For the weekend of 23-25 March, Pluto HPC is being switched off due to a cooling failure in the HPC comms room, to prevent hardware damage.

Alex Pedcenko

CUDA updated to 9.0 on Zeus HPC

module load cuda/last will load the paths for version 9.0 of the CUDA Toolkit. NVidia drivers have also been updated to a CUDA-9-compatible version (384.81) on all K80 nodes (zeus[400-409]).
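As a quick sanity check (a minimal sketch; the srun parameters are only illustrative), you can load the module and confirm the toolkit and driver versions on a K80 node:

module load cuda/last
nvcc --version                                   # should report CUDA release 9.0
srun --gres=gpu:K80:1 -t 00:05:00 nvidia-smi     # should show driver 384.81 on a K80 node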

Alex Pedcenko

PGI Community Edition [free]

PGI Community Edition

PGI Community Edition includes a no-cost license to a recent release of the PGI Fortran, C and C++ compilers and tools for multicore CPUs and NVIDIA Tesla GPUs, including all OpenACC, OpenMP and CUDA Fortran features. The PGI Community Edition enables development of performance-portable HPC applications with uniform source code across the most widely used parallel processors and systems.

http://www.pgroup.com/products/community.htm 

 

module load pgi/2017

Compilers:

pgc
pgcc
pgc++
pgf77
pgf90
pgf95
etc...

OpenMPI for PGI compilers

module load pgi/mpi/1.10.2/2017

OR

module load pgi/mpi/2.1.2/2017

Then the MPI versions of the PGI compilers will be available as

mpicc, mpic++, mpif90, mpif77...
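As an illustration (the source file, core count and run time are placeholders), an MPI code could be compiled with these wrappers and submitted like this:

module load pgi/2017
module load pgi/mpi/2.1.2/2017
mpicc -fast -o my_mpi_prog my_mpi_prog.c                 # my_mpi_prog.c is your own source
sbatch -n 16 -t 01:00:00 --wrap="mpirun ./my_mpi_prog"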

 

Normality restored

Zeus HPC is operational again. The DataLake machine is still experiencing some problems.

Alex

Zeus down due to room overheating

Hi,

Zeus HPC is down due to a cooling failure in room EC3-23.

Regards,
Alex

P.S. I will update you when it can come back…

Nodes of Zeus and HPL LINPACK tests [update Nov 2016]

Performance of the new (Broadwell) compute nodes, measured with the HPL Linpack benchmark

Below are the results of a few HPL tests on all 56 new Broadwell 32-core nodes as well as on all 144 old Nehalem nodes. The Netlib xhpl binary was compiled with Intel icc and run with Bullx MPI. Here are the results:

# of cores | CPU model | Config | Flops achieved | Theoretical peak
1792 | Broadwell | 56 nodes | 34 Tflops | 30.1 Tflops
1152 | Nehalem | 144 nodes | 9.5 Tflops |
320 | Broadwell | 10 nodes | 6.523 Tflops | 5.376 Tflops
32 | Broadwell | 1 node | 675 Gflops | 537.6 Gflops
12 | Broadwell | 1 node | 260 Gflops | 202 Gflops
204 | Sandybridge | 17 nodes (from GPU queue) | 3.3 Tflops | 3.9 Tflops
12 | Sandybridge | 1 node (from GPU queue) | 200 Gflops | 230 Gflops
8 | Sandybridge | 1 node (48 GB, 12 CPU cores) | 135.6 Gflops | 76.7 Gflops
8 | Nehalem | 1 node | 70.49 Gflops | 76.6 Gflops
8 | Nehalem | 4 nodes x 2 cores | 70.15 Gflops | 76.6 Gflops
8 | Sandybridge | 4 nodes x 2 cores | 135.6 Gflops | 76.8 Gflops
8 | Nehalem | 2 floors x 2 nodes x 2 cores | 70.17 Gflops | 76.8 Gflops
216 | Sandybridge | 18 nodes x 12 cores | 3.5 Tflops |
576 | Nehalem | 72 nodes x 8 cores | 4.5 Tflops |
32 | Sandybridge (SMP node) | 1 node x 32 cores | 0.5 Tflops |

Tests were run using bullx mpi 1.2.9.
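For reference, a run of this kind could roughly be reproduced along these lines (a sketch only: the core count is illustrative, and a compiled xhpl binary plus a tuned HPL.dat file must already be in the working directory):

sbatch -n 1792 --constraint=broadwell -t 04:00:00 --wrap="mpirun ./xhpl"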

#################### 2013 results #######################

Below are some first basic HPL (linpack) results of the cluster.

We have two types of CPUs on Zeus's nodes (both @ 2.4 GHz):

 

# of cores | CPU model | Config | Flops achieved | Theoretical peak
8 | Sandybridge | 1 node | 135.6 Gflops | 76.7 Gflops
8 | Nehalem | 1 node | 70.49 Gflops | 76.6 Gflops
8 | Nehalem | 4 nodes x 2 cores | 70.15 Gflops | 76.6 Gflops
8 | Sandybridge | 4 nodes x 2 cores | 135.6 Gflops | 76.8 Gflops
8 | Nehalem | 2 floors x 2 nodes x 2 cores | 70.17 Gflops | 76.8 Gflops
216 | Sandybridge | 18 nodes x 12 cores | 3.5 Tflops |
576 | Nehalem | 72 nodes x 8 cores | 4.5 Tflops |
32 | Sandybridge (SMP node) | 1 node x 32 cores | 0.5 Tflops |

Tests were run using bullx mpi 1.2.4.

Alex Pedcenko

RStudio on Zeus

We have R 3.3.2 installed on Zeus (login nodes and compute nodes). You can also access R on one of the login nodes via the RStudio web interface at http://zeus.coventry.ac.uk/R
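To run R non-interactively on the compute nodes, a submission script along these lines should work (a sketch: my_analysis.R is a placeholder for your own R script, and it assumes R 3.3.2 is on the default PATH on the compute nodes, as stated above):

#!/bin/bash
#SBATCH -n 1
#SBATCH -t 02:00:00

Rscript my_analysis.R

Submit it with, e.g., "sbatch myRjob.slurm".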

Alex Pedcenko
