Monthly Archives: December 2016

LS-DYNA tests on new Broadwell nodes

The MPP version of LS-DYNA was tested on the same problem with various CPU configurations, comparing the new Broadwell nodes against the old 8-core Nehalem nodes.

A SLURM file for LS-DYNA 9.1.0 submission is available here: lsdyna.

For the 9.1.0 version of LS-DYNA use:

module load lsdyna/971
module load lsdyna/pmpi
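
The linked file itself is not reproduced here, but as a rough sketch a submission script along these lines could be used (the node and task counts, the keyword file name input.k and the solver executable name mppdyna are placeholders, not taken from the actual file):

#!/bin/bash
#SBATCH --job-name=lsdyna           # job name shown by squeue
#SBATCH --nodes=2                   # number of Broadwell nodes
#SBATCH --ntasks-per-node=16        # MPI ranks per node
#SBATCH --time=12:00:00             # wall-time limit

# load the LS-DYNA 9.1.0 build and its MPI environment
module load lsdyna/971
module load lsdyna/pmpi

# launch the MPP solver across all allocated ranks (executable and input names are placeholders)
mpirun -np $SLURM_NTASKS mppdyna i=input.k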

Family | Nodes | CPU-cores | Exec. time (hh:mm) | LS-DYNA version | MPI
Broadwell | 1 | 32 | 06:02 | 7.1.2 | HPMPI
Broadwell | 1 | 16 | 09:22 | 7.1.2 | HPMPI
Nehalem | 2 | 16 = 2×8 | >12 hrs (time limit reached) | 7.1.2 | HPMPI
Broadwell | 2 | 32 = 2×16 | 05:28 | 7.1.2 | HPMPI
Broadwell | 4 | 64 = 4×16 | 02:51 | 9.1.0 | PMPI
Broadwell | 8 | 128 = 8×16 | 01:46 | 9.1.0 | PMPI
Broadwell | 2 | 64 = 2×32 | 05:08 | 9.1.0 | PMPI
Sandybridge | 2 | 24 = 2×12 | 09:30 | 7.1.3 | PMPI

So it looks like running LS-DYNA on both CPUs of a Broadwell node (all 32 cores) does not actually make the problem solve faster. Instead, use just 16 CPU-cores per node: compare the two 64-core cases, where 4 nodes × 16 cores is almost 2x faster than 2 nodes × 32 cores. Whether those 16 cores are best placed all on one socket or spread across both sockets still remains to be tested (one possible way to test this is sketched below).
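
One way to test the socket-placement question is SLURM's per-socket task limits, for example (a sketch only, assuming the task/affinity plugin is enabled on Zeus and using the same placeholder solver and input names as in the script above):

# pack all 16 MPI ranks onto one 16-core socket of a Broadwell node
srun -N1 -n16 --ntasks-per-socket=16 --cpu_bind=sockets mppdyna i=input.k

# spread the same 16 ranks 8+8 across both sockets
srun -N1 -n16 --ntasks-per-socket=8 --cpu_bind=sockets mppdyna i=input.k

Comparing the two run times would answer the question directly.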

Alex Pedcenko

Nodes of Zeus and HPL LINPACK tests [update Nov 2016]

Performance of the new Broadwell compute nodes, measured with the HPL Linpack benchmark

Below are the results of a few HPL tests run on all 56 new 32-core Broadwell nodes, as well as on all 144 old Nehalem nodes. The Netlib xhpl binary was compiled with the Intel compiler (icc) and run with bullx MPI (a sketch of the build and launch steps is given after the table). Here are the results:

# of cores | CPU model | Config | Flops achieved | Theoretical peak
1792 | Broadwell | 56 nodes | 34 Tflops | 30.1 Tflops
1152 | Nehalem | 144 nodes | 9.5 Tflops |
320 | Broadwell | 10 nodes | 6.523 Tflops | 5.376 Tflops
32 | Broadwell | 1 node | 675 Gflops | 537.6 Gflops
12 | Broadwell | 1 node | 260 Gflops | 202 Gflops
204 | Sandybridge | 17 nodes (from GPU queue) | 3.3 Tflops | 3.9 Tflops
12 | Sandybridge | 1 node (from GPU queue) | 200 Gflops | 230 Gflops
8 | Sandybridge | 1 node (48 GB, 12 CPU-cores) | 135.6 Gflops | 76.7 Gflops
8 | Nehalem | 1 node | 70.49 Gflops | 76.6 Gflops
8 | Nehalem | 4 nodes × 2 cores | 70.15 Gflops | 76.6 Gflops
8 | Sandybridge | 4 nodes × 2 cores | 135.6 Gflops | 76.8 Gflops
8 | Nehalem | 2 floors × 2 nodes × 2 cores | 70.17 Gflops | 76.8 Gflops
216 | Sandybridge | 18 nodes × 12 cores | 3.5 Tflops |
576 | Nehalem | 72 nodes × 8 cores | 4.5 Tflops |
32 | Sandybridge (SMP node) | 1 node × 32 cores | 0.5 Tflops |

These tests were run using bullx MPI 1.2.9.
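
For reference, a build and launch of this kind could look as follows (a sketch only: the module names, the HPL source directory and the interactive salloc/mpirun invocation are assumptions, and HPL.dat has to be tuned so that P × Q equals the number of MPI ranks):

# build Netlib HPL with the Intel compiler and bullx MPI
module load intel bullxmpi          # module names are assumptions
cd hpl-2.1
make arch=Linux_Intel               # needs a Make.Linux_Intel file edited for icc + bullx MPI

# run on 10 Broadwell nodes (320 cores)
cd bin/Linux_Intel
salloc -N10 --ntasks-per-node=32    # interactive allocation
mpirun -np 320 ./xhpl               # P x Q in HPL.dat must equal 320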

#################### 2013 results #######################

Below are some of the first basic HPL (Linpack) results for the cluster.

We have two types of CPUs on Zeus's nodes (both @ 2.4 GHz):

# of cores | CPU model | Config | Flops achieved | Theoretical peak
8 | Sandybridge | 1 node | 135.6 Gflops | 76.7 Gflops
8 | Nehalem | 1 node | 70.49 Gflops | 76.6 Gflops
8 | Nehalem | 4 nodes × 2 cores | 70.15 Gflops | 76.6 Gflops
8 | Sandybridge | 4 nodes × 2 cores | 135.6 Gflops | 76.8 Gflops
8 | Nehalem | 2 floors × 2 nodes × 2 cores | 70.17 Gflops | 76.8 Gflops
216 | Sandybridge | 18 nodes × 12 cores | 3.5 Tflops |
576 | Nehalem | 72 nodes × 8 cores | 4.5 Tflops |
32 | Sandybridge (SMP node) | 1 node × 32 cores | 0.5 Tflops |

These tests were run using bullx MPI 1.2.4.

Alex Pedcenko

Remote rendering with ParaView on HPC node with “pvserver”

You can launch a remote ParaView server (pvserver) on an HPC compute node to render data stored on the HPC system, with no need to transfer the data to your client machine.

1. Reserve a compute node, say one "whole" node for 4 hours:
salloc -N1 -n8 --exclusive -t 4:00:00

(you can of course also do this with a SLURM submission script)

2. Then, on the node that SLURM has allocated to you (say zeus15), launch the ParaView server process:

[aa3025@zeus2 ~]$ salloc -N1 -n8 -t 4:00:00
salloc: Granted job allocation 16726
[aa3025@zeus2 ~]$ qstat
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16726 short4 bash aa3025 R 0:12 1 zeus15


[aa3025@zeus2 ~]$ ssh zeus15
[aa3025@zeus15 ~]$ /share/apps/paraview/ParaView-5.2.0-Qt4-OpenGL2-MPI-Linux-64bit/bin/pvserver -display :0.0 --use-offscreen-rendering

Waiting for client...
Connection URL: cs://zeus15:11111
Accepting connection(s): zeus15:11111

Leave this window alone; the server is now accepting connections on port 11111.

3. Now we need to tunnel from your desktop machine, which must have the same version of ParaView installed as the node (ParaView 5.2.0). First we establish an SSH tunnel from port 11111 of the compute node to port 11111 of your desktop PC via Zeus's login node, say zeus2.

So on your desktop machine set up the tunnel (add your user name before zeus2 if necessary, i.e. user@zeus2):

ssh -L 11111:zeus15:11111 zeus2

You will be logged in to zeus2 in this SSH session. Keep this terminal running; it is your link to the target node zeus15.

4. Next, open your local ParaView (Linux or Windows) and use "Connect to Server" with the address "localhost:11111". In the first console, the one running the server, you will see that the connection has been made:

[aa3025@zeus15 bin]$ ./pvserver -display :0.0 --use-offscreen-rendering

Waiting for client...

Connection URL: cs://zeus15:11111

Accepting connection(s): zeus15:11111

Client connected.

Now you can open your data files in ParaView directly from your home folder on Zeus, process them, and create your animations.

Once you have checked that it all works, you can write a small one-line SLURM script that starts the ParaView server as an sbatch-submitted job, as sketched below.
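
Something along the following lines should work (a sketch; the resource request and time limit mirror step 1, the pvserver path is the one used above, and the job name is a placeholder):

#!/bin/bash
#SBATCH -N1 -n8 --exclusive         # one whole node, as in step 1
#SBATCH -t 4:00:00                  # 4-hour wall-time limit
#SBATCH --job-name=pvserver

# start the ParaView server; find the allocated node with squeue, then
# point the ssh tunnel from step 3 at that node
/share/apps/paraview/ParaView-5.2.0-Qt4-OpenGL2-MPI-Linux-64bit/bin/pvserver -display :0.0 --use-offscreen-rendering

Submit it with sbatch, note the node name reported by squeue, and then set up the tunnel and client connection exactly as in steps 3 and 4.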

Alex Pedcenko.
