Monthly Archives: December 2016

LS-DYNA tests on new Broadwell nodes

The MPP version of LS-DYNA was tested on the same problem with various CPU configurations, comparing the new Broadwell nodes against the old 8-core Nehalem nodes.

A SLURM file for LS-DYNA 9.1.0 submission is available here: lsdyna.

For the 9.1.0 version of LS-DYNA use:

module load lsdyna/971
module load lsdyna/pmpi
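
The linked file itself is not reproduced here, but as a rough sketch a submission script along these lines could be used (the node and task counts, the keyword file name input.k and the solver executable name mppdyna are placeholders, not taken from the actual file):

#!/bin/bash
#SBATCH --job-name=lsdyna           # job name shown by squeue
#SBATCH --nodes=2                   # number of Broadwell nodes
#SBATCH --ntasks-per-node=16        # MPI ranks per node
#SBATCH --time=12:00:00             # wall-time limit

# load the LS-DYNA 9.1.0 build and its MPI environment
module load lsdyna/971
module load lsdyna/pmpi

# launch the MPP solver across all allocated ranks (executable and input names are placeholders)
mpirun -np $SLURM_NTASKS mppdyna i=input.k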

Family | Nodes | CPU-cores | Exec. time (hh:mm) | LS-DYNA version | MPI
Broadwell | 1 | 32 | 06:02 | 7.1.2 | HPMPI
Broadwell | 1 | 16 | 09:22 | 7.1.2 | HPMPI
Nehalem | 2 | 16 = 2×8 | >12 hrs (time limit reached) | 7.1.2 | HPMPI
Broadwell | 2 | 32 = 2×16 | 05:28 | 7.1.2 | HPMPI
Broadwell | 4 | 64 = 4×16 | 02:51 | 9.1.0 | PMPI
Broadwell | 8 | 128 = 8×16 | 01:46 | 9.1.0 | PMPI
Broadwell | 2 | 64 = 2×32 | 05:08 | 9.1.0 | PMPI
Sandybridge | 2 | 24 = 2×12 | 09:30 | 7.1.3 | PMPI

So it looks like running LS-DYNA on both CPUs of a Broadwell node (all 32 cores) does not actually make the problem solve faster. Instead, use just 16 CPU-cores per node: compare the two 64-core cases, where 4 nodes × 16 cores is almost 2x faster than 2 nodes × 32 cores. Whether those 16 cores are best placed all on one socket or spread across both sockets still remains to be tested (one possible way to test this is sketched below).
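
One way to test the socket-placement question is SLURM's per-socket task limits, for example (a sketch only, assuming the task/affinity plugin is enabled on Zeus and using the same placeholder solver and input names as in the script above):

# pack all 16 MPI ranks onto one 16-core socket of a Broadwell node
srun -N1 -n16 --ntasks-per-socket=16 --cpu_bind=sockets mppdyna i=input.k

# spread the same 16 ranks 8+8 across both sockets
srun -N1 -n16 --ntasks-per-socket=8 --cpu_bind=sockets mppdyna i=input.k

Comparing the two run times would answer the question directly.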

Alex Pedcenko

Nodes of Zeus and HPL LINPACK tests [update Nov 2016]

Performance of the new Broadwell compute nodes, measured with the HPL Linpack benchmark

Below are the results of a few HPL tests run on all 56 new 32-core Broadwell nodes, as well as on all 144 old Nehalem nodes. The Netlib xhpl binary was compiled with the Intel compiler (icc) and run with bullx MPI (a sketch of the build and launch steps is given after the table). Here are the results:

# of cores | CPU model | Config | Flops achieved | Theoretical peak
1792 | Broadwell | 56 nodes | 34 Tflops | 30.1 Tflops
1152 | Nehalem | 144 nodes | 9.5 Tflops |
320 | Broadwell | 10 nodes | 6.523 Tflops | 5.376 Tflops
32 | Broadwell | 1 node | 675 Gflops | 537.6 Gflops
12 | Broadwell | 1 node | 260 Gflops | 202 Gflops
204 | Sandybridge | 17 nodes (from GPU queue) | 3.3 Tflops | 3.9 Tflops
12 | Sandybridge | 1 node (from GPU queue) | 200 Gflops | 230 Gflops
8 | Sandybridge | 1 node (48 GB, 12 CPU-cores) | 135.6 Gflops | 76.7 Gflops
8 | Nehalem | 1 node | 70.49 Gflops | 76.6 Gflops
8 | Nehalem | 4 nodes × 2 cores | 70.15 Gflops | 76.6 Gflops
8 | Sandybridge | 4 nodes × 2 cores | 135.6 Gflops | 76.8 Gflops
8 | Nehalem | 2 floors × 2 nodes × 2 cores | 70.17 Gflops | 76.8 Gflops
216 | Sandybridge | 18 nodes × 12 cores | 3.5 Tflops |
576 | Nehalem | 72 nodes × 8 cores | 4.5 Tflops |
32 | Sandybridge (SMP node) | 1 node × 32 cores | 0.5 Tflops |

These tests were run using bullx MPI 1.2.9.
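
For reference, a build and launch of this kind could look as follows (a sketch only: the module names, the HPL source directory and the interactive salloc/mpirun invocation are assumptions, and HPL.dat has to be tuned so that P × Q equals the number of MPI ranks):

# build Netlib HPL with the Intel compiler and bullx MPI
module load intel bullxmpi          # module names are assumptions
cd hpl-2.1
make arch=Linux_Intel               # needs a Make.Linux_Intel file edited for icc + bullx MPI

# run on 10 Broadwell nodes (320 cores)
cd bin/Linux_Intel
salloc -N10 --ntasks-per-node=32    # interactive allocation
mpirun -np 320 ./xhpl               # P x Q in HPL.dat must equal 320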

#################### 2013 results #######################

Below are some of the first basic HPL (Linpack) results for the cluster.

We have two types of CPUs on Zeus's nodes (both @ 2.4 GHz):

# of cores | CPU model | Config | Flops achieved | Theoretical peak
8 | Sandybridge | 1 node | 135.6 Gflops | 76.7 Gflops
8 | Nehalem | 1 node | 70.49 Gflops | 76.6 Gflops
8 | Nehalem | 4 nodes × 2 cores | 70.15 Gflops | 76.6 Gflops
8 | Sandybridge | 4 nodes × 2 cores | 135.6 Gflops | 76.8 Gflops
8 | Nehalem | 2 floors × 2 nodes × 2 cores | 70.17 Gflops | 76.8 Gflops
216 | Sandybridge | 18 nodes × 12 cores | 3.5 Tflops |
576 | Nehalem | 72 nodes × 8 cores | 4.5 Tflops |
32 | Sandybridge (SMP node) | 1 node × 32 cores | 0.5 Tflops |

These tests were run using bullx MPI 1.2.4.

Alex Pedcenko

Remote rendering with ParaView on HPC node with “pvserver”

You can launch a remote ParaView server (pvserver) on an HPC compute node to render data stored on the HPC system, with no need to transfer the data to your client machine.

1. Reserve a compute node, say one "whole" node for 4 hours:
salloc -N1 -n8 --exclusive -t 4:00:00

(you can of course also do this with a SLURM submission script)

2. Then, on the node that SLURM has allocated to you (say zeus15), launch the ParaView server process:

[aa3025@zeus2 ~]$ salloc -N1 -n8 -t 4:00:00
salloc: Granted job allocation 16726
[aa3025@zeus2 ~]$ qstat
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
16726 short4 bash aa3025 R 0:12 1 zeus15


[aa3025@zeus2 ~]$ ssh zeus15
[aa3025@zeus15 ~]$ /share/apps/paraview/ParaView-5.2.0-Qt4-OpenGL2-MPI-Linux-64bit/bin/pvserver -display :0.0 --use-offscreen-rendering

Waiting for client...
Connection URL: cs://zeus15:11111
Accepting connection(s): zeus15:11111

Leave this window alone; the server is now accepting connections on port 11111.

3. Now we need to tunnel from your desktop machine, which must have the same version of ParaView installed as the node (ParaView 5.2.0). First we establish an SSH tunnel from port 11111 of the compute node to port 11111 of your desktop PC via Zeus's login node, say zeus2.

So on your desktop machine set up the tunnel (add your user name before zeus2 if necessary, i.e. user@zeus2):

ssh -L 11111:zeus15:11111 zeus2

You will be logged in to zeus2 in this SSH session. Keep this terminal running; it is your link to the target node zeus15.

4. Next, open your local ParaView (Linux or Windows) and use "Connect to Server" with the address "localhost:11111". In the first console, the one running the server, you will see that the connection has been made:

[aa3025@zeus15 bin]$ ./pvserver -display :0.0 --use-offscreen-rendering

Waiting for client...

Connection URL: cs://zeus15:11111

Accepting connection(s): zeus15:11111

Client connected.

Now you can open your data files in ParaView directly from your home folder on Zeus, process them, and create your animations.

Once you have checked that it all works, you can write a small one-line SLURM script that starts the ParaView server as an sbatch-submitted job, as sketched below.
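
Something along the following lines should work (a sketch; the resource request and time limit mirror step 1, the pvserver path is the one used above, and the job name is a placeholder):

#!/bin/bash
#SBATCH -N1 -n8 --exclusive         # one whole node, as in step 1
#SBATCH -t 4:00:00                  # 4-hour wall-time limit
#SBATCH --job-name=pvserver

# start the ParaView server; find the allocated node with squeue, then
# point the ssh tunnel from step 3 at that node
/share/apps/paraview/ParaView-5.2.0-Qt4-OpenGL2-MPI-Linux-64bit/bin/pvserver -display :0.0 --use-offscreen-rendering

Submit it with sbatch, note the node name reported by squeue, and then set up the tunnel and client connection exactly as in steps 3 and 4.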

Alex Pedcenko.
