Applications of Algebraic Multigrid to Large Scale Mechanics Problems
description
Transcript of Applications of Algebraic Multigrid to Large Scale Mechanics Problems
![Page 1: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/1.jpg)
1
Mark F. Adams
22 October 2004
Applications of Algebraic Multigrid to Large Scale
Mechanics Problems
![Page 2: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/2.jpg)
2
Outline
Algebraic multigrid (AMG) introduction Industrial applications Micro-FE bone modeling Olympus Parallel FE framework Scalability studies on IBM SPs
Scaled speedup Plain speedup Nodal performance
![Page 3: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/3.jpg)
3
Multigrid smoothing and coarse grid correction
(projection)smoothing
Finest Grid
Prolongation (P=RT)
The MultigridV-cycle
First Coarse Grid
Restriction (R)
Note:smaller grid
![Page 4: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/4.jpg)
4
Multigrid V() - cycle Function u = MG-VMG-V(A,f)
if A is small u A-1f
else u S(f, u) -- steps of smoother (pre) rH PT( f – Au )
uH MG-VMG-V(PTAP, rH ) -- recursionrecursion (Galerkin)
u u + PuH
u S(f, u) -- steps of smoother (post)
Iteration matrix: T = S ( I - P(RAP)-1RA ) S multiplicative
![Page 5: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/5.jpg)
5
Smoothed Aggregation
Coarse grid space & smoother MG method Piecewise constant function: “Plain” agg. (P0)
Start with kernel vectors B of operator eg, 6 RBMs in elasticity
Nodal aggregation B P0
“Smoothed” aggregation: lower energy of functions One Jacobi iteration: P ( I - D-1 A ) P0
![Page 6: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/6.jpg)
6
Parallel Smoothers
CG/Jacobi: Additive (Requires damping for MG) Damped by CG (Adams SC1999) Dot products, non-stationary
Gauss-Seidel: multiplicative (Optimal MG smoother) Complex communication and computation (Adams SC2001)
Polynomial Smoothers: Additive Chebyshev is ideal for multigrid smoothers Chebychev chooses p(y) such that
|1 - p(y) y | = min over interval [* , max] Estimate of max easy Use * = max / C (No need for lowest eigenvalue)
C related to rate of grid coarsening
![Page 7: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/7.jpg)
7
Outline
Algebraic multigrid (AMG) introduction Industrial applications Micro-FE bone modeling Olympus Parallel FE framework Scalability studies on IBM SPs
Scaled speedup Plain speedup Nodal performance
![Page 8: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/8.jpg)
8
Aircraft carrier• 315,444 vertices• Shell and beam elements (6 DOF per node)• Linear dynamics – transient (time domain)• About 1 min. per solve (rtol=10-6)
– 2.4 GHz Pentium 4/Xenon processors– Matrix vector product runs at 254 Mflops
![Page 9: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/9.jpg)
9
Solve and setup times(26 Sun processors)
![Page 10: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/10.jpg)
10
Adagio: “BR” tire
ADAGIO: Quasi static solid mechanics app. (Sandia) Nearly incompressible visco-elasticity (rubber)
Augmented Lagrange formulation w/ Uzawa like update Contact (impenetrability constraint)
Saddle point solution scheme: Uzawa like iteration (pressure) w/ contact search
Non-linear CG (with linear constraints and constant pressure)
Preconditioned with Linear solvers (AMG, FETI, …) Nodal (Jacobi)
![Page 11: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/11.jpg)
11
“BR” tire
![Page 12: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/12.jpg)
12
Displacement history
![Page 13: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/13.jpg)
13
Outline
Algebraic multigrid (AMG) introduction Industrial applications Micro-FE bone modeling Olympus Parallel FE framework Scalability studies on IBM SPs
Scaled speedup Plain speedup Nodal performance
![Page 14: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/14.jpg)
14
Trabecular Bone
5-mm Cube
Cortical bone
Trabecular bone
![Page 15: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/15.jpg)
15
Micro-Computed Tomography
CT @ 22 m resolution
3D image
Mechanical TestingE, yield, ult, etc.
2.5 mm cube44 m elements
FE mesh
Methods: FE modeling
![Page 16: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/16.jpg)
16
Outline
Algebraic multigrid (AMG) introduction Industrial applications Micro-FE bone modeling Olympus Parallel FE framework Scalability studies on IBM SPs
Scaled speedup Plain speedup Nodal performance
![Page 17: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/17.jpg)
17
Athena: Parallel FE ParMetis
Parallel Mesh Partitioner (Univerisity of Minnesota)
Prometheus Multigrid Solver
FEAP Serial general purpose
FE application (University of California)
PETSc Parallel numerical
libraries (Argonne National Labs)
FE MeshInput File
Athena ParMetis
FE input file(in memory)
FE input file(in memory)
Partition to SMPs
Athena Athena ParMetis
File File File File
FEAPFEAPFEAPFEAP Material Card
Silo DBSilo DB
Silo DBSilo DB
Visit
Prometheus
PETScParMetis
METISMETISMETISMETIS
pFEAP
Computational Architecture
Olympus
![Page 18: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/18.jpg)
18
Outline
Algebraic multigrid (AMG) introduction Industrial applications Micro-FE bone modeling Olympus Parallel FE framework Scalability studies on IBM SPs
Scaled speedup Plain speedup Nodal performance
![Page 19: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/19.jpg)
1980 µm w/o shell
Inexact Newton CG linear solver
Variable tolerance
Smoothed aggregation AMG preconditioner
Nodal block diagonal smoothers: 2nd order Chebeshev (add.) Gauss-Seidel (multiplicative)
Scalability
![Page 20: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/20.jpg)
21
80 µm w/ shell
Vertebral Body With Shell
Large deformation elast. 6 load steps (3% strain) Scaled speedup
~131K dof/processor 7 to 537 million dof 4 to 292 nodes IBM SP Power3
15 of 16 procs/node used Double/Single Colony switch
![Page 21: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/21.jpg)
22
Computational phases Mesh setup (per mesh):
Coarse grid construction (aggregation) Graph processing
Matrix setup (per matrix): Coarse grid operator construction
Sparse matrix triple product RAP (expensive for S.A.)
Subdomain factorizations
Solve (per RHS): Matrix vector products (residuals, grid transfer) Smoothers (Matrix vector products)
![Page 22: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/22.jpg)
23
Linear solver iterations
Newton
Load
Small (7.5M dof) Large (537M dof)
1 2 3 4 5 1 2 3 4 5 6
1 5 14 20 21 18 5 11 35 25 70 2
2 5 14 20 20 20 5 11 36 26 70 2
3 5 14 20 22 19 5 11 36 26 70 2
4 5 14 20 22 19 5 11 36 26 70 2
5 5 14 20 22 19 5 11 36 26 70 2
6 5 14 20 22 19 5 11 36 26 70 2
![Page 23: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/23.jpg)
24
131K dof / proc - Flops/sec/proc .47 Terflops - 4088 processors
![Page 24: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/24.jpg)
25
End to end times and (in)efficiency components
![Page 25: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/25.jpg)
26
Sources of scale inefficiencies in solve phase
7.5M dof 537M dof
#iteration 450 897
#nnz/row 50 68
Flop rate 76 74
#elems/pr 19.3K 33.0K
model 1.00 2.78
Measured 1.00 2.61
![Page 26: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/26.jpg)
27
164K dof/proc
![Page 27: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/27.jpg)
28
First try: Flop rates (265K dof/processor)
265K dof per proc. IBM switch bug
Bisection bandwidth plateau 64-128 nodes
Solution: use more processors Less dof per proc. Less pressure on switch
Bisection bandwidth
![Page 28: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/28.jpg)
29
Outline
Algebraic multigrid (AMG) introduction Industrial applications Micro-FE bone modeling Olympus Parallel FE framework Scalability studies on IBM SPs
Scaled speedup Plain speedup Nodal performance
![Page 29: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/29.jpg)
30
Speedup with 7.5M dof problem (1 to 128 nodes)
![Page 30: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/30.jpg)
31
Outline
Algebraic multigrid (AMG) introduction Industrial applications Micro-FE bone modeling Olympus Parallel FE framework Scalability studies on IBM SPs
Scaled speedup Plain speedup Nodal performance
![Page 31: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/31.jpg)
32
Nodal Performance of IBM SP Power3 and Power4
IBM power3, 16 processors per node 375 Mhz, 4 flops per cycle 16 GB/sec bus (~7.9 GB/sec w/ STREAM bm)
Implies ~1.5 Gflops/s MB peak for Mat-Vec We get ~1.2 Gflops/s (15 x .08Gflops)
IBM power4, 32 processors per node 1.3 GHz, 4 flops per cycle Complex memory architecture
![Page 32: Applications of Algebraic Multigrid to Large Scale Mechanics Problems](https://reader036.fdocuments.in/reader036/viewer/2022062808/56815417550346895dc2134b/html5/thumbnails/32.jpg)
33
Speedup