Category Archives: Links

HPC Help Wiki pages:

http://zeus1/dokuwiki/doku.php  (work in progress)

Zeus HPC Video channel

Microsoft Stream Zeus HPC channel

You are welcome to contribute if you want.

Alex Pedcenko

04 Partitions (queues) of zeus HPC

There is only one queue/partition named “all” (default) on zeus HPC. For more details see this post:

Queues of zeus [updated June 2018] | EEC High Performance Computing (coventry.domains)

 

Alex Pedcenko

ONLINE SELF-PACED TRAINING with NVIDIA Deep Learning Institute Online Labs

The NVIDIA Deep Learning Institute (DLI) offers hands-on training for developers, data scientists, and researchers looking to solve the world’s most challenging problems with deep learning and accelerated computing.

Choose from full-day courses to deploy an end-to-end project, or two-hour mini courses to learn a specific technology or technique.

 

Online Courses available:

  • Fundamentals of Deep Learning for Computer Vision
  • Fundamentals of Accelerated Computing with CUDA C/C++
  • Fundamentals of Accelerated Computing with CUDA Python
  • Fundamentals of Accelerated Computing with OpenACC
  • Deep Learning for Healthcare Image Analysis
  • Deep Learning for Healthcare Genomics

The link: https://developer.nvidia.com/dli/onlinelabs 

PGI Community Edition [free]


PGI Community Edition includes a no-cost license to a recent release of the PGI Fortran, C and C++ compilers and tools for multicore CPUs and NVIDIA Tesla GPUs, including all OpenACC, OpenMP and CUDA Fortran features. The PGI Community Edition enables development of performance-portable HPC applications with uniform source code across the most widely used parallel processors and systems.

http://www.pgroup.com/products/community.htm 

 

module load pgi/2017

Compilers:

pgc
pgcc
pgc++
pgf77
pgf90
pgf95
etc...

OpenMPI for PGI compilers

module load pgi/mpi/1.10.2/2017

OR

module load pgi/mpi/2.1.2/2017

Then mpi versions of PGI compilers will be available as

mpicc, mpic++, mpif90, mpif77...
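As a rough sketch of how the modules and compiler wrappers above fit together, the script below writes a minimal MPI test program and defines a build step for it. The file name hello_mpi.c, the -O2 flag and the two function names are illustrative assumptions, not part of the zeus setup; the module names are the ones listed above. It also assumes a shell in which the module command is available.

```shell
#!/bin/bash
# Sketch only: hello_mpi.c and the function names are hypothetical.

# Write a minimal MPI "hello" program to the given file.
write_hello_mpi() {
    cat > "$1" <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
}

# Load the PGI compilers and one of the OpenMPI builds, then compile
# the given source file with the mpicc wrapper.
build_hello_mpi() {
    module load pgi/2017
    module load pgi/mpi/2.1.2/2017
    mpicc -O2 -o hello_mpi "$1"
}

write_hello_mpi hello_mpi.c
```

On zeus you would then call build_hello_mpi hello_mpi.c and launch the resulting hello_mpi binary through the scheduler.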

 

EC3-21 Temperature

HPC Temperature plots

Resource Management for Multi-Core/Multi-Threaded Usage

https://slurm.schedmd.com/slurm_ug_2011/cons_res.pdf

GPU stuff

k80

(“World’s fastest GPU accelerator” as of 2014: http://images.nvidia.com/content/tesla/pdf/nvidia-tesla-k80-overview.pdf )

The 10 new Broadwell-based nodes of Zeus HPC carry 20 Nvidia Tesla K80 GPU accelerators (2 per node). The theoretical peak performance of each K80 is 2.91 TFlops in double precision (8.7 TFlops in single precision), which gives these nodes a combined theoretical GPU peak of about 58 TFlops in double precision.

The 18 older Sandybridge-based compute nodes of Zeus have 36 Nvidia Tesla K20 GPU accelerators (2 per node). Each K20 has a theoretical peak of 1.2 TFlops (double precision), which gives a combined peak of about 43 TFlops. This is only an indication of the maximum possible compute power at ideal scaling; realistically you can’t just add these numbers together.
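The combined figures quoted above are just the number of GPUs multiplied by the per-GPU peak. A minimal sketch of that arithmetic follows; the function name peak_tflops is made up for illustration, and the inputs are the numbers quoted in this post.

```shell
#!/bin/bash
# Combined theoretical peak = number of GPUs x per-GPU peak (TFlops).
# This is an upper bound at ideal scaling, not an achievable figure.
peak_tflops() {
    # $1 = number of GPUs, $2 = per-GPU peak in TFlops
    awk -v n="$1" -v p="$2" 'BEGIN { printf "%.1f\n", n * p }'
}

echo "K80 nodes, double precision: $(peak_tflops 20 2.91) TFlops"
echo "K20 nodes, double precision: $(peak_tflops 36 1.2) TFlops"
```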

In comparison, the HPL benchmark run on the CPUs of all 56 new Broadwell nodes (1792 CPU-cores) gave 34 TFlops, 1152 CPUs of the older 8-core Nehalem-based nodes gave 9.5 TFlops, and 18 x 12-core Sandybridge CPUs produced about 3.6 TFlops. See details here: http://zeus.coventry.ac.uk/wordpress/?s=HPL

——

Alex

 

New Intel Broadwell (32 CPU-core) nodes

  • 44 nodes with 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz (32 CPU-cores/node), 128 GB RAM: queue Broadwell
  • 10 nodes with 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz (32 CPU-cores/node), 128 GB RAM + 2 x Nvidia K80 GPU: queue NGPU
  • 2 nodes with 2 x Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz (32 CPU-cores/node), 128 GB RAM + Xeon Phi coprocessor SE10/7120: queue Phi

At the moment each queue has a 36-hour time limit per job.
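A job script for one of these queues might look roughly like the sketch below. It assumes zeus is driven by SLURM's sbatch and uses the queue name Broadwell from the list above; the job name and the commented-out program are hypothetical. Queue names may have changed since this was written (a later post on this site mentions a single partition named “all”), so check the output of sinfo before submitting.

```shell
#!/bin/bash
#SBATCH --job-name=example        # hypothetical job name
#SBATCH --partition=Broadwell     # queue name as listed in this post
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32      # one task per CPU-core of a Broadwell node
#SBATCH --time=36:00:00           # must not exceed the 36-hour limit

# SLURM_NTASKS is set by SLURM inside a job; fall back to 1 elsewhere.
TASKS="${SLURM_NTASKS:-1}"
echo "Running on $(hostname) with $TASKS task(s)"

# srun ./my_program               # hypothetical executable
```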

03 Disk Space and User Quotas

As you may be aware, we are experiencing a constant shortage of disk space on zeus HPC. To address this, disk quotas have been introduced for user home folders. Each user is entitled (space permitting) to 200 GB of disk space. You can exceed this amount, up to 1 TB, for up to 7 days (grace period). After that you can no longer add files to your home folder until you bring it back below the original 200 GB threshold.
Remember that you should not use the HPC disk space for long-term file storage; please keep only the files needed for currently running jobs, post-processing of results, etc.

For storing temporary files and job results, and even for submitting new jobs, you can use the fast scratch space at /beegfs/users/yourHPCusername

This space has no quota, but it is not backed up, so it should be only used for current jobs and projects, not for storing data!
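One way to keep current jobs on scratch is to give each job its own working directory there. The sketch below assumes your HPC username matches $USER; the names SCRATCH_ROOT and make_job_dir are illustrative, not something zeus provides.

```shell
#!/bin/bash
# Scratch location from this post; assumes the HPC username equals $USER.
# SCRATCH_ROOT can be overridden, e.g. for trying the script elsewhere.
SCRATCH_ROOT="${SCRATCH_ROOT:-/beegfs/users/${USER:-$(id -un)}}"

# Create a per-job working directory under scratch and print its path.
make_job_dir() {
    # $1 = job name (a label of your choice)
    local dir="$SCRATCH_ROOT/$1"
    mkdir -p "$dir" && echo "$dir"
}
```

For example, JOB_DIR=$(make_job_dir my_job) followed by cd "$JOB_DIR" puts you in the job directory before submitting. Copy anything you want to keep off scratch once the job has finished, since this space is not backed up.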

 

You can check how much space your home folder is consuming here: http://zeus.coventry.ac.uk/space.php (quota status and disk usage are updated hourly).
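If you prefer the command line, a rough local check with du can be sketched as follows. It is not the official quota accounting, and it assumes the 200 GB threshold is counted in binary units (1 GB = 1024 x 1024 KB); the function names are made up for illustration.

```shell
#!/bin/bash
# 200 GB threshold from this post, expressed in KB (binary units assumed).
QUOTA_KB=$((200 * 1024 * 1024))

# Print the disk usage of a directory in KB.
usage_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Succeed (exit status 0) if the directory uses more than the threshold.
over_quota() {
    [ "$(usage_kb "$1")" -gt "$QUOTA_KB" ]
}
```

For example, over_quota "$HOME" && echo "Home folder is above 200 GB" prints a warning only when the threshold is exceeded. On a large home folder du can take a while to finish.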

What if I’m above the quota, but I still need my files and have no means to offload them from HPC?

At the moment we regularly back up the contents of all /home/ folders to an external network drive (NAS). If you need to keep files that are currently in your home folder and take you above the quota, and you have nowhere else to store them, please let me know (email me: aa3025@coventry.ac.uk) and I can disable the backup of your home folder on zeus HPC. This way the bulk of your files will already be stored on the backup drive and you can delete them from your zeus home folder, leaving only the files needed for currently running jobs. Normally, when you delete files from your home folder on Zeus, they are also deleted automatically from the backup drive (with a delay of a day or so). So if you need to keep them, please let me know so that I can disable overwriting of the backup copy.

You can then be given access to the backup drive to retrieve your files when needed.

Best Regards,
Alex Pedcenko
