
IBM Deep Computing Group

June, 2003 | IBM Deep Computing Group © 2002 IBM Corporation

LS-DYNA: CAE Simulation Software on Linux Clusters

Guangye Li ([email protected])
IBM Deep Computing Team


IBM Deep Computing Team – 2003 Cluster World Conference

Cluster World Conference | June 2003 © 2002 IBM Corporation

Topics

Introduction to LS-DYNA
• LS-DYNA applications
• Two versions of LS-DYNA: SMP and MPP
• An example

Performance of LS-DYNA on clusters
• Performance improvement with faster processors
• Interconnect options: Gigabit Ethernet or Myrinet
• One or two processors per node
• Comparison of LAM/MPI and MPICH performance
• Speedup from compiler options
• Speedup from faster 533 MHz front-side bus

Chrysler experience




LS-DYNA: A general-purpose transient dynamic finite element program capable of simulating complex real-world problems

Software Vendor: Livermore Software Technology Corp. (LSTC)

Largest application in CAE

Large customer base




LS-DYNA applications include:

• Occupant safety
• Metal forming
• Metal cutting
• Biomedical
• Blast loading
• Fluid-structure interaction
• Earthquake engineering
• …




Two parallel versions of LS-DYNA

SMP (OpenMP) for shared-memory multiprocessors
• Parallelized from a serial code
• Scalable up to 16 CPUs

MPP (distributed-memory version)
• Uses the domain decomposition technique
• Uses MPI for communication between subdomains (processors)
• Scalable to more than 100 CPUs
• Suitable for both shared-memory multiprocessors and clusters
• MPP-DYNA on clusters dramatically reduced the turnaround time and the simulation cost




Comparison of SMP and MPP

[Figure: bar chart of elapsed time (sec) for SMP and MPP at 1, 2, 4, 8, 16, and 32 CPUs. Configuration: 1.3 GHz IBM p690, November 2002 LS-DYNA, refined Neon model, 535k elements.]




An Example: The Neon Model

Frontal crash with initial speed at 31.5 miles/hour

Model size
• Number of shell elements: 269,249
• Number of nodal points: 285,832

Simulation length: 150 ms
• Vehicle bounce-back observed at 70 ms

Model created by the National Crash Analysis Center (NCAC) at George Washington University
• One of the few publicly available models for vehicle crash analysis
• Based on the 1996 Plymouth Neon




1996 Plymouth Neon




The model




The mesh




Domain decomposition

The whole mesh is decomposed into NCPU subdomains.

• Each subdomain has about the same number of elements
• Each link cut corresponds to communication between two nodes; the decomposition should minimize the link cuts

Each CPU processes elements in its subdomain

CPUs exchange boundary data using message passing (MPI)
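The two goals above (equal element counts, few link cuts) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the small rectangular grid of elements, the use of element adjacency as the "links", and the sort-by-coordinate splitting rule. It is not the decomposition method LS-DYNA itself uses.

```python
from collections import Counter


def decompose_by_x(centroids, ncpu):
    """Split elements into ncpu near-equal subdomains after sorting by centroid."""
    order = sorted(range(len(centroids)), key=lambda e: centroids[e])
    owner = [0] * len(centroids)
    for rank, elem in enumerate(order):
        owner[elem] = rank * ncpu // len(centroids)
    return owner


def count_link_cuts(links, owner):
    """Count links whose two ends are assigned to different subdomains."""
    return sum(1 for a, b in links if owner[a] != owner[b])


# Hypothetical test mesh: NX x NY grid of elements, element id = j * NX + i.
NX, NY, NCPU = 8, 4, 4
centroids = [(i + 0.5, j + 0.5) for j in range(NY) for i in range(NX)]
links = []
for j in range(NY):
    for i in range(NX):
        e = j * NX + i
        if i + 1 < NX:
            links.append((e, e + 1))   # neighbor to the right
        if j + 1 < NY:
            links.append((e, e + NX))  # neighbor above

# Decomposition A: contiguous strips along x.
strip_owner = decompose_by_x(centroids, NCPU)
# Decomposition B: round-robin by element id (equally balanced, but scattered).
round_robin_owner = [e % NCPU for e in range(NX * NY)]

strip_sizes = Counter(strip_owner)
strip_cuts = count_link_cuts(links, strip_owner)
round_robin_cuts = count_link_cuts(links, round_robin_owner)

print("elements per subdomain:", sorted(strip_sizes.values()))
print("link cuts (strips):", strip_cuts)
print("link cuts (round robin):", round_robin_cuts)
```

Both decompositions give each CPU 8 elements, but the contiguous strips cut 12 of the 52 links while the round-robin assignment cuts 28, so it would need more boundary data exchanged per time step.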




Simulation results




Performance Improvement with Faster Processors

[Figure: bar chart of elapsed time (sec) for 2.4 GHz and 2.8 GHz processors. Configuration: V960 r1488 LS-DYNA, Xeon, 2 CPUs per node, Gigabit Ethernet, Jan-March 2003, LAM/MPI, refined Neon model, 535k elements.]




Configuring Each Node with One Processor

[Figure: bar chart of elapsed time (sec) for 2 CPUs per node and 1 CPU per node. Configuration: V960 r1488 LS-DYNA, Gigabit Ethernet, x335 2.8 GHz, March 2003, LAM/MPI, front crash model, 430k elements.]




Interconnect – Effect on Performance

[Figure: bar chart of elapsed time (sec) for Fast Ethernet, Gigabit Ethernet, and Myrinet at 2, 4, 8, 16, and 32 CPUs. Configuration: 2.2 GHz IntelliStation cluster, June 2002 MPI LS-DYNA, refined Neon model, 535k elements.]




Interconnect Performance Compared

[Figure: chart of parallel speedup for x335 + Fast Ethernet, x335 + Gigabit Ethernet, x335 + Myrinet, and p655 + SP Switch2. Configuration: V960 LS-DYNA, Jan 2003, refined Neon model, 535k elements.]




Comparison of LAM/MPI and MPICH Performance

[Figure: bar chart of elapsed time (sec) for MPICH-1.2.4 and LAM/MPI-6.5.6 at 16 and 32 CPUs. Configuration: 2.8 GHz x335 (Xeon) cluster, Gigabit Ethernet, March 2003 LS-DYNA, refined Neon model, 535k elements.]




Speedup from Compiler Options

Intel Compiler Option    Elapsed time (sec)
No_SSE                   25110
SSE                      20781

Configuration: V960 r1106 MPP-DYNA, Feb 2002, LAM/MPI 6.5.2, 2.2 GHz IntelliStation node, 12-processor runs
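The slide reports only the two elapsed times (25110 s without SSE, 20781 s with SSE). As a rough sketch, the implied speedup can be worked out as below. The function names are hypothetical, and speedup is taken here to mean the ratio of elapsed times, which is an assumption about the definition the slide intends.

```python
def speedup(t_before, t_after):
    """Ratio of elapsed times; values above 1.0 mean the second run is faster."""
    return t_before / t_after


def time_reduction(t_before, t_after):
    """Fraction of the original elapsed time that was saved."""
    return (t_before - t_after) / t_before


# Elapsed times (sec) as reported on the slide.
T_NO_SSE = 25110.0
T_SSE = 20781.0

sse_speedup = speedup(T_NO_SSE, T_SSE)
sse_reduction = time_reduction(T_NO_SSE, T_SSE)

print(f"speedup: {sse_speedup:.2f}")                  # speedup: 1.21
print(f"elapsed time reduced by {sse_reduction:.1%}")  # elapsed time reduced by 17.2%
```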




Speedup from Faster 533 MHz Frontside Bus

Model size (elements)    Speedup: 400 MHz to 533 MHz front-side bus
430000                   1.18
155000                   1.20
32000                    1.08
12000                    1.10

Configuration: V960 r1488 LS-DYNA, March 2003, LAM/MPI, 2.8 GHz x335 node, 2-processor runs
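The slide gives speedup factors but not the underlying elapsed times. Assuming speedup means old elapsed time divided by new elapsed time (the slide does not state this), each factor s corresponds to a fractional saving of 1 - 1/s. The short sketch below applies that conversion to the four reported values; the variable and function names are hypothetical.

```python
def reduction_from_speedup(s):
    """Fraction of elapsed time saved for a speedup s = t_old / t_new."""
    return 1.0 - 1.0 / s


# Speedup from the 400 MHz to the 533 MHz front-side bus, keyed by model size
# in elements, as reported on the slide.
fsb_speedup = {430000: 1.18, 155000: 1.20, 32000: 1.08, 12000: 1.10}

fsb_reduction = {size: reduction_from_speedup(s) for size, s in fsb_speedup.items()}

for size, saved in fsb_reduction.items():
    print(f"{size:>7} elements: speedup {fsb_speedup[size]:.2f}, "
          f"elapsed time reduced by {saved:.1%}")
```

Under that assumption the two larger models save roughly 15 to 17 percent of elapsed time and the two smaller ones roughly 7 to 9 percent.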




Performance Improvement with Version 970

[Figure: bar chart of elapsed time (sec) for version 960 r1488 and version 970 r3535. Values annotated on the chart: 1.20, 1.15, 1.14, 1.10. Configuration: 2.8 GHz x335 (Xeon) cluster, Gigabit Ethernet, March 2003, LAM/MPI, MPP-DYNA, refined Neon model, 535k elements.]




Chrysler experience

Customer requirements
• Reduced turnaround time
• Price/performance
• Good accuracy, i.e., the numerical results should match those from the current 64-bit machines

A team effort
• Chrysler
• LSTC
• IBM
• Intel

Eventually all 22 QA models passed the accuracy requirements, and Chrysler bought 108 Xeon-based IBM Linux cluster nodes for car crash simulation.




Chrysler is happy with the IBM Linux cluster solution

Without parallel processing, we never would have achieved 5* (NCAP) and “good” (IIHS) on our new Chrysler Sebring and Dodge Stratus within the current product development time.
-- Subhas Shetty, Chrysler




Summary

• MPI-based MPP-DYNA scales better than the SMP version
• Linux clusters reduced the turnaround time for car crash simulation
• Linux clusters reduced the simulation cost
• The accuracy is satisfactory
• Users today can customize their system to pick the features that serve them best: processors, operating system, and interconnect
