Monthly Archives: May 2017

The new GCC 7.1.0 compiler is installed on zeus. To use it, load the module:

module load gcc/7.1.0
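After loading the module, you can check that the new compiler is the one on your PATH by running `gcc --version`. The Python sketch below is a hypothetical helper, not part of the zeus setup. It runs `gcc --version` and parses the version from the first line of the output, assuming GCC's usual format (e.g. `gcc (GCC) 7.1.0`).

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_gcc_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the first line of `gcc --version` output.

    Assumes the usual GCC format, e.g. "gcc (GCC) 7.1.0"; returns None if no
    version number is found.
    """
    first_line = text.splitlines()[0] if text else ""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def installed_gcc_version() -> Optional[Tuple[int, int, int]]:
    """Run `gcc --version` and parse it; returns None if gcc is not on PATH."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(["gcc", "--version"], capture_output=True, text=True, check=False)
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    version = installed_gcc_version()
    if version is None:
        print("gcc not found; did you run 'module load gcc/7.1.0'?")
    elif version >= (7, 1, 0):
        print("Using GCC %d.%d.%d" % version)
    else:
        print("Older GCC %d.%d.%d found; load the gcc/7.1.0 module first" % version)
```

Python's tuple comparison gives the version check for free, since `(7, 1, 0) >= (7, 1, 0)` and `(4, 8, 5) < (7, 1, 0)` compare element by element.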

Alex

Google is giving a cluster of 1,000 Cloud TPUs to researchers for free

See details here: https://go.newsfusion.com//cloud-computing/item/935489


Alex


EC3-21 Temperature

HPC Temperature plots

HPC room was overheating again on Sunday 7 May

The chillers in HPC room EC3-21 failed once again this Sunday. The Broadwell nodes and half of the Nehalem nodes (zeus[20-91,15]) have been switched off until Estates finds the cause of the faults.
Available compute nodes: zeus[100-171, 200-217] (queues: all, long, GPU)
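The node lists above use compact hostlist-style notation (the kind used by tools such as Slurm): zeus[100-171, 200-217] means zeus100 to zeus171 plus zeus200 to zeus217. As an illustration, the hypothetical helper below expands a single bracketed expression so you can count the nodes. It is a sketch that handles only one bracket group of comma-separated numbers and ranges without zero-padding, not every form real cluster tools accept.

```python
import re
from typing import List


def expand_hostlist(expr: str) -> List[str]:
    """Expand a simple hostlist such as "zeus[100-171,200-217]".

    Hypothetical helper: supports a single bracket group of comma-separated
    numbers or ranges; spaces after commas are ignored; zero-padding is not kept.
    """
    match = re.fullmatch(r"\s*([A-Za-z0-9_-]+)\[([^\]]+)\]\s*", expr)
    if match is None:
        return [expr.strip()]  # plain host name, no brackets
    prefix, body = match.groups()
    hosts = []
    for item in body.split(","):
        item = item.strip()
        if "-" in item:
            # Inclusive range, e.g. "100-171" -> 100, 101, ..., 171
            start, end = (int(x) for x in item.split("-", 1))
            hosts.extend(f"{prefix}{n}" for n in range(start, end + 1))
        else:
            hosts.append(f"{prefix}{int(item)}")
    return hosts


if __name__ == "__main__":
    available = expand_hostlist("zeus[100-171, 200-217]")
    switched_off = expand_hostlist("zeus[20-91,15]")
    print(len(available), "nodes available")  # 72 + 18 = 90
    print(len(switched_off), "nodes in the switched-off list")  # 72 + 1 = 73
```

Ranges are inclusive at both ends, which is why zeus[100-171] counts as 72 nodes rather than 71.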

Regards,
Alex

Normality restored

Zeus HPC is operational. The DataLake machine is still experiencing some problems.

Alex

Zeus HPC update

Update on the HPC issue:

The temperature in the main HPC room has stabilised, and I have brought the login nodes, the main server and the file servers back up. Until Estates provides a further update on the stability of the cooling system in that room, most of the compute nodes there (including the new Broadwell nodes) will stay offline.

I have also brought up some Nehalem compute nodes (half of the 8-CPU nodes) and Sandybridge compute nodes (the 12-CPU “GPU” queue) in the room unaffected by the cooling failure (zeus[100-171], zeus[200-217]). They can be used, as the file servers are now operational.

Regards,
Alex

Zeus down due to room overheating

Hi,

Zeus HPC is down due to a cooling failure in room EC3-23.

Regards,
Alex

P.S. I will update you when it can come back.
