Monthly Archives: July 2015

Partial power-off on Thursday-Friday, 30-31 July 2015

Hi,
due to electrical power works scheduled in the Mezz-floor server room of ECB, part of Zeus HPC (nodes zeus00-zeus71, the “long” queue) will be shut down on Thursday, 30 July 2015 at 22:00. At the moment you can still reserve and use these nodes in the “long” queue if your job finishes before 22:00 on Thursday. If your job needs more time to finish, any job you submit to the long queue will be held in the waiting queue until the maintenance work completes (hopefully by the end of Friday, 31 July).

Regards,
Alex Pedcenko

reconstructPar in multiprocessor mode

If you are using OpenFOAM parallel solvers, you may have noticed that reconstruction of the decomposed fields after the solution completes takes quite a long time, especially if you have a huge number of time-steps to reconstruct. This is because the OpenFOAM utility reconstructPar uses just one CPU core and reconstructs the time-steps one by one, in a serial fashion.

However, reconstructPar takes a few command-line arguments; in particular, you can specify a single time-step or a list of time-steps to reconstruct. This makes it possible to launch several reconstructPar processes, each dedicated to reconstructing its own portion of the time-steps. Brilliant idea! We are going to send a bunch of reconstructPar processes to the compute nodes of the HPC once the solution completes (I bet I’m not the first person to invent this, but I’m going to use the power of slurm:)!
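For example, to reconstruct just a chosen subset of time-steps, you can pass them as a comma-separated list (the particular values here are only an illustration; this is the same form the script below passes to -time):

reconstructPar -noZero -time 0.5,1,1.5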

So, once your solution is done, put this script preconstructPar.slurm in the case folder and submit the job on whatever number of nodes and cores you want. E.g. here I use 4 nodes with 8 cores on each (32 CPUs in total) and the default queue “all”:

sbatch -N4 -n32 -p all preconstructPar.slurm

The listing of the slurm script which does the job:


#!/bin/bash
#SBATCH --time=8:00:00
#SBATCH --job-name="reconstruct"
#---------------------------------------------------------------------
# PROCID files will store the group of time-steps for each CPU:
rm -f PROCID*
#---------------------------------------------------------------------
# How many CPUs we have (total number of slurm tasks):
NNODES=$SLURM_NTASKS
#----------------------------------------------------------------------
# Find how many time-steps there are ("-1" discounts the "0" directory,
# which is skipped below):
Nsteps=`ls --ignore="constant" ./processor0 | wc -l`
let Nsteps=$Nsteps-1
# if the number of time-steps is less than the number of CPUs:
if [ $Nsteps -lt $NNODES ]; then
    NNODES=$Nsteps
fi
echo "Nsteps:" $Nsteps
#---------------------------------------------------------------------
# Check whether the number of time-steps divides evenly by the number of CPUs
let TPN=$Nsteps/$NNODES
let rem=$Nsteps%$NNODES
if [ $rem -gt 0 ] # if there is a remainder, add one time-step per CPU
then
    let TPN=$TPN+1
fi
# Ceiling division gives how many CPUs we eventually need for a fair split
# (plain integer division would drop the CPU holding the last, partial group):
NNODES=$(( (Nsteps + TPN - 1) / TPN ))
#--------------------------------------------------------------------
echo "Will use $NNODES CPUs to reconstruct up to $TPN time-steps on each:"
let TPN=$TPN-1 # TPN is the Nr of time-steps per CPU, but we count from 0
#---------------------------------------------------------------------
node=0
i=0
#-------------- main loop along the time-steps ------------------------
for f in `ls --ignore="constant" ./processor0`
do
    if [ "$f" != "0" ]; then      # skip the initial-conditions directory
        if [ $i -eq 0 ]; then
            COMA=""
        else
            COMA=","
        fi
        steps[$node]=${steps[$node]}$COMA$f
        let i=$i+1
        if [ $i -gt $TPN ]; then  # this CPU's group is full, move to the next one
            let node=$node+1
            i=0
        fi
    fi
done
#---------------- loop complete, print PROCID files -----------------------
for (( j=0; j<NNODES; j++ ))
do
    echo ${steps[$j]} > PROCID$j  # record which time-steps each CPU must use
    echo ${steps[$j]}
done
#------------------------------------------------------------------------
# Create a temporary bash script in which each process picks its own portion
# of time-steps. The single quotes stop $SLURM_PROCID and the backticks from
# being expanded here: they must be evaluated by each srun task at run time.
echo '#!/bin/bash' > thread.sh
chmod +x thread.sh
echo 'times=`cat PROCID$SLURM_PROCID`' >> thread.sh
echo 'echo "Launching on CPU $SLURM_PROCID the times $times" >> debug.log' >> thread.sh
echo 'reconstructPar -noZero -time $times' >> thread.sh
# Launch in parallel on $NNODES CPUs:
srun -n $NNODES ./thread.sh
#------------------ clean up the rubbish when all is done ------------------
rm -f PROCID*
rm -f thread.sh
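For reference, the generated thread.sh ends up looking like this (each srun task resolves $SLURM_PROCID to its own rank at run time, so every copy picks a different PROCID file):

#!/bin/bash
times=`cat PROCID$SLURM_PROCID`
echo "Launching on CPU $SLURM_PROCID the times $times" >> debug.log
reconstructPar -noZero -time $times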

You can now check where the reconstructPar processes are actually running. First do:

squeue

to see on which nodes your reconstruction job is running, say “zeus[200-217]”. Then check for the reconstructPar processes on these nodes:

pdsh -w zeus[200-217] ps aux | grep reconstruct | dshbak

where the list of nodes “zeus[200-217]” is taken from the output of the squeue command above.

You should see something like:

[aa3025@zeus2 Rayleigh2]$ pdsh -w zeus[200-217] ps aux | grep reconstruct | dshbak
----------------
zeus200
----------------
aa3025 52682 99.9 17.2 8925884 8523252 ? R 09:37 4:19 reconstructPar -noZero -time 0.5
aa3025 52683 100 16.8 8925884 8327192 ? R 09:37 4:20 reconstructPar -noZero -time 10
aa3025 52684 100 17.0 8925884 8441992 ? R 09:37 4:20 reconstructPar -noZero -time 1
----------------
zeus201
----------------
aa3025 21405 99.7 17.4 8925884 8599528 ? R 09:37 4:20 reconstructPar -noZero -time 11
aa3025 21406 99.7 17.2 8925884 8540500 ? R 09:37 4:20 reconstructPar -noZero -time 10.5
aa3025 21407 99.6 17.3 8925884 8578924 ? R 09:37 4:20 reconstructPar -noZero -time 11.5
----------------
zeus202
----------------
aa3025 14981 99.5 17.3 8925884 8549820 ? R 09:37 4:19 reconstructPar -noZero -time 12.5
aa3025 14982 99.6 17.3 8925884 8557780 ? R 09:37 4:20 reconstructPar -noZero -time 12
aa3025 14983 99.6 17.1 8925884 8476252 ? R 09:37 4:19 reconstructPar -noZero -time 13
----------------
zeus203
........ etc
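When all the processes finish, a quick way to check that every time-step was reconstructed is to compare the time-directory counts (a rough sketch; it assumes the usual decomposed case layout, with the “0” directory present both in the case root and in processor0):

ls --ignore="constant" processor0 | wc -l    # time directories in processor0
ls -d [0-9]* | wc -l                         # time directories in the case root

The two numbers should match once the reconstruction is complete.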

Enjoy the speed of reconstruction!

Alex Pedcenko

Zeus running out of space, please clean up!

Here is a link to the space consumption by user: http://zeus/space.php
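To see what eats the space in your own home directory, a quick check with standard tools will do (just a sketch; dot-files are not counted):

du -sh ~/* | sort -h | tail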

Thank you,
Alex

‘Spit of Satan’ solar flare captured in NASA’s stunning ….

http://www.mirror.co.uk/news/technology-science/science/spit-satan-solar-flare-captured-5980001

[Image: 1434591002-He.jpg]

You can see the “Spit” on my capture sequences here:
