Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8...

38
Parallel Domain Decomposition Preconditioning For Computational Fluid Dynamics VecPar '98, Porto, Portugal, June 21-23, 1998 Timothy J. Barth Information Sciences Directorate NASA Ames Research Center Moffett Field, California USA, 9LI035 Tony F. Chan Department of Mathematics University of California, Los Angeles Los Angeles, California USA, 9002LI Wei-Pai Tang Department of Computer Science University of Waterloo Waterloo, Ontario Canada, N2L 3G1 Barth, Chart, Tang (1998) https://ntrs.nasa.gov/search.jsp?R=20020023597 2020-05-25T14:23:11+00:00Z

Transcript of Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8...

Page 1: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Parallel Domain Decomposition Preconditioning

For Computational Fluid Dynamics

VecPar '98, Porto, Portugal, June 21-23, 1998

Timothy J. Barth

Information Sciences Directorate

NASA Ames Research Center

Moffett Field, California USA, 9LI035

Tony F. Chan

Department of Mathematics

University of California, Los Angeles

Los Angeles, California USA, 9002LI

Wei-Pai Tang

Department of Computer Science

University of Waterloo

Waterloo, Ontario Canada, N2L 3G1

Barth, Chart, Tang (1998)

https://ntrs.nasa.gov/search.jsp?R=20020023597 2020-05-25T14:23:11+00:00Z

Page 2: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

I Outline

• Objectives

• Some Difficult Fluid Flow Prob_ms

• Stabilized Spatial Discretizations

• Newton's Method for Solving the Discretized

Flow Equations

* Schur Complement Domain Decomposition

- Basic Formulation

- Simplifying Strategies

• Iteratiye Su_.bd.__n and Schur

Complement Solves-

, Matrix Element Dropping

• Localized Schur Complement

Computation

• Supersparse Computations

- Performance Evaluation

• Concluding Remarks

Barth, Chsn. T_ng (1998)

Page 3: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

!

Objectives

* Simulate Compressible Navier-Stokes Flow

About General Geometries

- Unstructured (Simplicial) Meshes

- Stabilized Numerical Space Discretizations

- Exact and Approximate Newton Iterations

• Obtain ParallelScalabilityand Efficiency

SchurComplement Domain Decomposition

- ILU + GMRES on Subdomain Problems

- Emphasis on CoarseGrain Parallism

• SGI Origin2000(64processors)

• SGI Array (40processors)

• CRAY Jg0 Cluster(20processors)

B&rth, Chan, T,tng (1098)

Page 4: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Some Difficult Flow Prob_ms

Entrance/Exit Flow

Recirculation Flow

Boundary-layer Flow

Barth, Chan, Tang (1998)

Page 5: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

_r

Advection Dominated Model Problems

m I

o,L ____I I L .IE".........=t[-...oO-,_+_._.o- I...... i i'-_i'- yu--..._-o I

u. +u u =0(.._o

Convergence of ILU+GMRES for Cuthill-McKee

ordered matrices produced from scalar SUPG spatial

discretization.

Barth, Chan, T:_ng (1998)

Page 6: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Stabilized Numerical Methods

Advection Dominated Flows

• Model Equation:

_,(_,t) = g(_,t), • c r

• Stabilized F.E.M.: (Johnson, Hughes, et. al.)

Find u E Sh such that V w E Vh

fawutdn+f_w('x'Vu)dn+fn e(Vw'Vu)dna

+f(A. Vu - _Au)r (A.w - caw) d n

+ Discontinuity Capturing Operator = 0

Barth, Chan, Tang (1908)

Page 7: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

ELF

(an Element Library for Fluids)

Euler equations in divergence form:

u¢ + f(u)= + g(u), = 0, /(u), g(u): R m _ R m

Euler equations in symmetric quasilinear form

Ao vt+ AAo v=+ BAo vv = 0, Ao, A,B E R m×m

_"SPD _ _llmm ...... - symm -_- ...... ....

ELF Methodology: Galerkin Least-Squares in

symmetric variables

ELF Input: Simplex geometry, Simplex dofs.

ELF Output: Jacobian matrix and rhs terms for

global assembly.

Barth, Chan, T_ng ,'1_,_8)

Page 8: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Newton's Method

Fluid flow equations in semi-discrete form:

Dut = R(u), R(u):R" -_R -

Backward Euler time integration:

Let At vary with spatial position and IIR(_)llso

that Newton's method is approached as

liR(_)ll-_ 0.

Barth, Chan, T_ng (1098)

Page 9: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

. = ,

Solving A x -- b

Let

Az-b=O

represent a matrix problem taken from one step

of Newton's method.

Solve using flexible GMRES in right

preconditioned form:

(AM -l )Mz - b 0

• Allows nonstationary preconditioning

Ideally

(AM -1) = 0(1).

Barth, Chan, T_ng ¢'1C_._8)

Page 10: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

' Preconditioners

Some candidate preconditioners:

1. ILU factorization

- Nonoptimal

- Not easily parallized

@ Additive Schwarz variants of ILU on

overlapping meshes

- Nonoptimal without coarse space correction

- 3-D tetrahedral coarsening is problematic

3. Multigrid

- Similar difficulties as additive Schwarz

4. Nonoverlappmg Schurdomain decomposition

- Data localitywell-suitedto coarsegrain

parallelarchitectures

- Algebraiccoarsespaceprovidesglobal

coupling

Barth, Chan. "r_ng (1998)

Page 11: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

--'t'O "Nonoptimality of ILU rrecona_ 1,,n_ng

i_iii..........., --[- ................. =_1 ....?m ......."................?m ".........................

:,o i: ..................no

,o 5551

mOo _ 4o _o IO 0 2o 4o _o

OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_

Figure 1-2. Performance of ILU for diffusion and

advection dominated problems using scalar SUPO

discretization and Cuthill-McKee ordering.

• IILU retains 0 (_) condition number for

elliptic problems.

Use of modified ILU variants is

problematic in the nonsymmetric

(advection dominated) limit.

Barth, Chan, "F_ng (1998)

Page 12: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

"2 '*_ ,._ _,

Serial versus Parallel ILU

'&. ...........[..l-_, u,u oroll ! !"_'_ 1.J..,j.o_ _.11_ ....... i...|--=_ ==yu.r.-zu..yo.GOl(u.._÷u..ty#

I " •......... 4............................. .L...............

J_ ........i!I iiii illi iiiiiiiiiiiiii I

io I ft i " ! |_l I I I I

I0 0 20 40 (iO 80 IGO

OURES _V_mr

8 SldomJ Ibmdm

JO ° : : i

I0 :" •'.l___.J ..............II°_l_'_ l _ i i i I_,*.*÷ ............I

I°l_'t ! _ ! t i I

|_--• oo,._oo_o*ilo*eo_o ...... o°.4ooo._,,,,o .......

4 ,,," ............. - ............. "- .............

I0 { : :i0 o _o 4o w so t'_

GMRI_ Mm_.V_mm. M_ipll_

Comparison of single domain ILU preconditioning

with multiple domain ILU via Schur complement.

B&rth, Chan, T,tng (1998)

Page 13: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

T

Additive Schwarz Preconditioned GMRES

(From Barth, AGARD R-807, 1996)

_o_-.......r r r 7cp..teMm

iio o leo 2o0 _o 4co

Effect of increased overlap and ILU fill.

,o'r......i I l ]

_.._,j i'.... -'7 "i

.........i1o _ _) 10o 15o

GI_IU35 Mmtx.Veca_ Pmdu_

Effect of increasing the number of subdomains.

- Inviscid flow about multi-element airfoil (5k vertices)

- 1 sweep overlapped ILU with _ _ s_.._e correction

Barth, Chan, Tang (1998)

Page 14: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Schur Complement Domain Decomposition

(Some Contributors)

Basic Approach: Axelsson, Glowinski, Lions, Mandel,

Nepomyaschikh, P_riaux, Przemieniecki, Widlund, Xu

Interface Preconditioned: Bjorstad, Bramble, Chan,

Keyes, Golub, Gropp, Mayers, Pasciak, Schatz, Smith

_'P llelIm I " .Chart,Keyes,Gropp,

Smith

See Also: Domain Decomposition: Parallel

Multilevel Methods for Elliptic PDEs, B. Smith

and P. Bjorstad and W. Gropp, Cambridge Press,

1996.

B&rth. Chin, T._ng (1998)

Page 15: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

1-,.,

Nonoverlapping Domain Decomposition

via Schur Complement

8 4 S 678 g x

Domain decomposition and induced 2 x 2 matrix

partitioning.

Barth, Chin, T;sng (1998)

Page 16: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Solving a 2 x 2 Block Matrix

• Subdomain partition as illustrated.

• Order the subdomain variables before the

interface variables.

@

@

2 x 2 form of the system

zl, zB denote the interior and the boundaryvariables.

• Block LU factorization of A:

@ Eliminate ZB from the reduced equations:

Sxv =/8 - ABIAT_fl

Solve for z z by

z! = A'_(fl - AIBzB),

• S ffi ABs - ABzAT_AnB is the Schwr

Complement of Ass in A.

B&rth, Chin, T.sng (199R)

Page 17: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Exact Schur Complement Preconditio.ni._g

Preprocessing Step: Calculate the Schur

complement

Preconditioning Step: Solve Mz - r

. #.

Step (1)

Step (2)

Step (3)

Step (4)

Step (5)

Step (6)

tI]b _- rb- _ vi

X b = _--l_o b

_li = (AIB)i Xb

z, = u_- (AII)_. _y_

(parallel)(parallel)

(comm)

(s-parallel,

(parallel)

(parallel)

Barth, Chan, T,tng (1998)

Page 18: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Scalability Experiments

Fixed Subdomain Sise, 4 x Partition:

- Interior and interface dofs/partition is constant.

- Total interface size increases 4x.

Fixed Problem Size, 4 x Partitions:

Interior and interface dofs/partition decrease 4x

and 2x respectively.

Total interface size increases 2x.

Barth, C.h_n. T_ng (1_8)

Page 19: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Interface Decomposition

Overview (top) and closeup (bottom) of interface

decomposition obtained via partitioning of the Schur

complement supernode graph (64 subdomains, 4

interfaces).

B&rth, Chan, T_ng (1998)

Page 20: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Global Preconditioning Strategy for Ax = b

• Partition subdomains and interfaces

• Assign a processor to each subdomain and

interface component

• Precondition subdomain problems using

processor-local ILU

• Precondition the Schur complement using_Jt - -

_ drol_.filled, processor-local -ILU factorizations

• Implement flexible GMRES in a domain

decomposed parallel environment

- Parallel dot products

- Parallel matrix-vector products

Barth, Chan, T_ng (19.98)

Page 21: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner I: Inexact Subproblem Solves

Preprocessing Step: Calculate the approxi'mate

Schur complement

= AaB- _(ABI),(AII); 1 (AIB )i.

1. (AH);-I(AIa), -+ (Ali)_-l(A,v), using fig1 steps

of ILU+GMRES so that S -_ S.

2. (AIx) _"I z, --¢ A_-I1z, using m2 steps of

ILU+GMRES.

3. $-- limb"-_8-11t/Jb using m3 steps of

ILU+GMRES.

_Preconditioning Step: Solv_ Mz = r

Step (1)

Step (2)

Step (3)

Step (4)

Step (5)

Step (6)

,,, = (A_,)_.__,v, = (A_I)_ u,

tOb -----rb -- _'_ t)i

X b _- _-1 1UlJb

v, = (AI_), _b

• , ffi",- (_-)_" v,

(parallel)

(_omm)(s-parallel,comm)

(parallel)

(parallel)

Barth, Chan, Tang (1998)

Page 22: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

.qi -,

Preconditioner I: Perforumnce

Diffusioa Dominaw.d

150-

100-

k==i

m_l=m_2=m_3

Scalar SUPG discretization with lineax elements

Stopping criterion: IIA=. - bi[ _< lO-SllA=o - bll

Barth, Chan, Tang (1998)

Page 23: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

*',,L

Preconditioner I: Performance

Adveelion _nated

120 .................. v ................. ! ................. - .....................................

t

? ? , ,, ,,

100........... --- zsk_,(_ ,,_,_) ...................... nekck_ (I mkkm_)

.'_ -.- __(I mklmJl)80 ........... -_.. nOkd_(n6_) ...................

I

- ................._.................i................._.................3..................

__i " "1 [ i40 ,_................................................]

_n-i ............ -..&,. ............... i ................. _.................. i .................

"_/ "<'_.'",,.. i i __" " .... / ....._'_L_-,-.__. _ - . ..,._.. . . -..... , ",. _ ...... • . I •

I II ! I i t ! t I ! I/ _ -- - ....... 4 ......O . • •

0 2 4 6 8 10

m_l=m_2fm_3

- Scalar SUPG discretization with linear elements

- Stopping criterion: IlAz,. - bll < 10-SllAzo - bll

Barth, Chan, Tang (1998)

Page 24: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Observations for Preconditioner I

There exists an optimal number of subdomains

for minimizing solution time.

Replacing a single long recurrence ILU

factorization with several shorter recurrences

coupled via Schur complement yields an

improved quality preconditioner.

• Choosing small values of the parameters m l, m2,

.....and m3 < 3!cads ___um CPU times. (our

expe_ence).

Linear growth in the time needed to form the

preconditioner is avoided by further partitioning

of the interface. How to precondition the

parallelized interface? Processor-Local ILU.

B&rth, Chan, Tang (1998)

Page 25: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner II: Drop Tolerance Approximation

Drop elements of the Schtur complement matrix, Sij:

(1) Is,_l< toa(scalarmat,i_en_,i_)(2) Sparsity pattern (block matrix emries)

- For discretized elliptic problems S exhibits faster

element decay than (AH)_ -1 (Golub and Mayers).

Preprocesslng Step: Calculate the approximate

Schur complement

8-_ ABB -- _"_(Am)i(ftxz)_l(Azv)ii

A A

- Precondition S with ILU(DROP(S))

'Preconditioning Stepf _(;|V_ Mx - r

Step (1)

Step (2)

Step (3)

Step (4)Step (5)

Step (6)

u, = (A_t)_ "1ri

v, = (As1)i u,

rob ---- rb -- E Vi

Xb -_ _-1 Wb

Yi (AIB)iXb

(parallel)

(parallel)

(comm)

(s-parallel,comm)

(parallel)

B&rth, Chin, T_ng (1998)

Page 26: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner III: Wireframe Approximation

, Main cost in Preconditioner II is the formation of

S itself: one subdomain solve is needed for every

variable on the interface.

• Idea: Approximate S by the Schur complement of

a relatively thin udreframe region surrounding the

interface variables.

Take a principal submatrix of

A: variables within the wire-

frame.

_".- ....• Compute the • - aplm_i-

mate Schur complement of

interface variables in this prin-

cipal submatrix instead of A

itself.

• From DD theory: the exact Schur complement of

wireframe region is spectraUy equivalent to the

Schur complement of the whole domain.

Barth, Chart, T,in_ (1998)

Page 27: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner III: Wireframe Approximation

Pure Diffusion Problem:

Mesh

Size

1600

1600

1600

Num Time Time

Procs Suppoct Iter Form* Applym

16 2 15 .012 .II

16 3 II .017 .08

16 4 6 .020 .04

Pure Advection Problem:

,qt -

Mesh

Size

Sum

Procs

Time

Form*Support Iter

1600 16 2 8 .012 .09

1600 16 3 .......... _L .017 .06

1600 16 4 4 .019 .03

Time

Apply

- Scalar SUPG discretization with linear elements

-- Stopping criterion: [[Axn - bl[ _<10-9[[Axo - b[]

No element dropping, ml = 2, m2 -- 2, m3 = 2

* Timing based on parallelized wireframe

Barth, Chan, Tt_ (!_08)

Page 28: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner III: Wirefranm Approximation

$ Introduces a new adjustable parameter into the

preconditioner.

• Specify the width of the wireframe strip in terms

of graph distance on the mesh triangulation.

• Quality of the preconditioner improves rapidly

with increasing wireframe width.

Time taken form the Schur complement is

reduced by approximately 50%.

• Further improvements in cost/performance:

choosing the shape of the wireframe to better

represent the PDE being solved, viz. the flow

direction.

B&rth, Chan, T_ng (1998)

Page 29: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner III: Wireframe Approximation

• Preprocessing step: approximate Schur

complement on the localized subdomains:

Axy: restrictions of AH, A18, ABI to the

localized Schur subdomain(s).

• The preconditioning step (solving Mx = r) is

now:

Step (1)

Step (2)

Step (3)

Step (4)

Step (5)

Step (6)

A 1u, = (A.); r,

v, = (As_), u,

Wb---_rb-- EUi

u, = (A18)i Xb

x, = _ - (A%_);'V,

(parallel)

(parallel)

(comm)

(s-parallel,comm)

Barth, Chan, T_ng (1998)

Page 30: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner IV: Supersparse Approximation

• Observations:

1. Iterative calculation of A_-_ A1s is expensive.

2. Columns of AzB are usually very sparse.

3. Unpreconditioned Krylov subspace methods

require computation of the vector sequence

-_[p,Ap, A2p,... ,Amp], p-- col(B) for small m.

4. ILU preconditioning destroys sparsity of the

preconditioned Krylov subspace vectors

_,M-_Ap, (M-_A)2p,..., (M-_A)_p].

Barth, Chan_ T_ng (1998)

Page 31: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preconditioner IV: Supersparse Approxi'm.a.tion

• Strategy:

1. Eliminate wasted flop's in matrix-vector

products by storing vectors and matrices in

sparse form.

X • O av

/ ****llil/:!W ,, XXJ |el !°v I

*_:-2._Appr.oximate the ILUbscksolves based on a

vector fill level strategy to maintain sparsity.

Ax-b

bi xi

....J..................I........I|IIIII|IIIIIIIIIIIIIIIIVlllfflIOV

i

i i

Barth, Chan, T;_ng (1998)

Page 32: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

"'2

Preconditioner IV: Supersparse Approximation

• Results indicate I_reconditkming performance

comparable to exact baci_olves.

• 60-70% reduction in cost.

• This technique can be combined with the

previous wireframe strategy with combined 5-7

fold performance gains.

Fill Level k

0

|

2

3

4

OO

Global GMRES Iter

,, a, , m

|

26

22

Time(k)/Time(oo)

0.299

0.313

21

20

20

2O

0.337

0.362

0.392

1.000

_-D test problem consisting of Euler flow past a

multi-element airfoil geometry partitioned into 4 subdomains

with 1600 mesh vertices in each subdomain.

B&rth, Chan, Tang (1998)

Page 33: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Compressible Flow Calculation

,o1 lml: .............................................................. lo ....

j, .................................; , ..... ,', _--CC........ I#_:d i i i_Q .C- _---- ._-CZ_..;..-C _' ....N t_.-_................'f....................!

l i_.l I 1L

7 1 t J iio o.o zJ s_ _ io_ i=-._ io o lo 2o 30 40 5o 6o'

_im_mGlobal GMRI_ l_edx-V_ Pmdk,_

- Mach = .63, Angle of Attack = 2.0 °

- ELF Galerkin Least-Squares

- Time Relaxed Newton Iteration

- m l -- 1, m2 = ma = 4, Fill Level--3

- 4 Subdomain Mesh, (Sk Cells/Subdomain)

Bsrth, Chan, Tang (1998)

Page 34: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

._t -

Compressible Flow Calculation

/ \

1

,oL._._ :°Y_'_-F .....F--7.......]......-qio_'11-.... "-'_-_-_"_, T 1__!• .. • ..... _L-- o4,.

,__:.-_:-__.._%,_ .-_----.

1 1"....::_-:-

i _ i I i,, _10 0 $ 10 I$ 10 0 l0 20 30 40 50 60

llcI50n olohl oMRES lt41m_-Vectm'

- Mach = .63, Angle of Attack = 2.0 °

- ELF Galerkin Least-Squares

- Time Relaxed Newton Iteration

- ml = 1, m2 = m3 = 4, Fill Level=3

- 16 Subdomain Mesh, (Sk Cells/Subdomain)

Barth, Chan, T,_ng (1098)

Page 35: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Subdomain- Interface Load Balancing

r--r ....T'"r"-l-"r"" T.....

_o. -_2_-__.m ,.'...."......

0 . !

q_.m)w_oi ,_bdoml ImerlaceSize

....... 4°°1"........i.........1.........r.......r........T........T........i.........I......

/ i ! t I i lJ_'t

_----' _ "-[ t __, !, !i_'_ .......i.........i,......1-_.., _........_......

=---| IA J=:= ,i !

/ I I I i. I I ! t-- O! 1 I I I ! 1 ! t

0 8 16 24 32 40 48 5664 0 8 16 24 32 40 48 5664

Wocemmn

Mach -- .2, Angle of Attack ----2.0 °

16 Subdomain Mesh, (Sk Cells/Subdomain)

IBM SP2, MPI Message Passing Protocol

ELF Oalerkin Least-Squares

Barth, Chan, Tang (1998)

Page 36: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Preliminary Timings

Subdomaln Interf=lO0 Subdomaln Interf=lSO Subdomaln Interf=lSO-20(

(4 Interface Partitions)

; meoo a

• i

Raw1"Imps Adi.ma forI¢m¢¢ S_

........!.....T-T-T .....!.....f.... '°T.........................[........T........T........T........].........i......./ i I I I I i

,oF............... I i................

,,.....

ii.l i i i..:$-:.!-,::::i=.-.?:--,-!--F--,>t.....t.......t................f........f........I........l........i......./ _.].... ,I,...;.... .I....-i.'..-..I..'.... _---4

ii i Li i __i) f 4 _ ! ', ,_0 s 16 14 3i l} 4s 5664 0 s 16 i+ 3i +0 4 s66+

Pmcemco PmO:WlS

6 GMRES Iterations

- 5k Cells/$ubdomain

-IBM SP2, MPI Message Passing Protocol

- mi = 1, mi : ma = 4, Fill Level=3

Barth, Chan, Tang (19<;8)

Page 37: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

w

Performance of Schur Complement D.D.

For Viscous Flow

,o.__°_0 ,0 30 _0 4o lo 0 _ 20 _) 40

(Add C]dR_ Mmtx-Vec_ Mol_pli_ GiolM GIJES blmtx-¥ec_r Mu/_lles

6

...... _,----_:_=..........I.................1.......N_

,o. m ..................................I0

! "

- SUPG Discretization

- Local CFL Number 100, 000

B&rth, Chan, T_ng (1908)

Page 38: Parallel Domain Decomposition Preconditioning Timothy J. Barth · OMP.F_Mxrla.V_ IdektpJ_ GldM8 ldw_-Vec_r bhkJplt_ Figure 1-2. Performance of ILU for diffusion and advection dominated

Concluding Return-ks

The baseline nonoverlapping scheme is very

robust but relatively expensive

Localized wireframe and supersparse

computations can significantly reduce cost of the

forming the Schur complement preconditioner

• Scalability: Growth of the Schur complement

matrix necessitatespart!tl0ning of the interface

• Scalability: Load balancing still problematic due

to size imbalance of the interface associated with

each subdomain

Barth, Chan, Tang (1098)