Numerical Simulation of 3D Fully Nonlinear Water Waves on Parallel Computers

Xing Cai

University of Oslo

PARA'98

Outline of the Talk

Mathematical model

Numerical scheme (sequential)

Parallelization strategy (domain decomposition)

Object-oriented implementation

Numerical experiment

Mathematical Model

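Fully nonlinear 3D water waves

Primary unknowns: $\varphi$ (velocity potential), $\eta$ (free surface elevation)

\[
\begin{aligned}
\nabla^2 \varphi &= 0 && \text{in water volume} \\
\eta_t + \varphi_x \eta_x + \varphi_y \eta_y - \varphi_z &= 0 && \text{on water surface} \\
\varphi_t + (\varphi_x^2 + \varphi_y^2 + \varphi_z^2)/2 + g\eta &= 0 && \text{on water surface} \\
\partial\varphi/\partial n &= 0 && \text{on solid walls}
\end{aligned}
\]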

Numerical Scheme

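Physical domain:

\[ \Omega(t) = \{ (x,y,z) \mid (x,y) \in \Omega_{xy},\ -H \le z \le \eta(x,y,t) \} \]

Transformation: $\Omega(t) \to \bar\Omega$ (a fixed domain)

\[ \bar z = H \left( \frac{z+H}{\eta+H} - 1 \right) \]

\[ \bar\Omega = \{ (x,y,\bar z) \mid (x,y) \in \Omega_{xy},\ -H \le \bar z \le 0 \} \]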
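The mapping to the fixed domain can be checked numerically. The sketch below assumes a constant depth H and a linear stretching in z that sends the bottom z = -H to itself and the free surface z = eta to 0; the function names are hypothetical and are not taken from the simulator.

```python
def to_fixed_domain(z, eta, H):
    """Map z in [-H, eta] to z_bar in [-H, 0] (assumed linear stretching)."""
    return H * ((z + H) / (eta + H) - 1.0)


def to_physical_domain(z_bar, eta, H):
    """Inverse map: z_bar in [-H, 0] back to z in [-H, eta]."""
    return (z_bar + H) * (eta + H) / H - H


# Illustrative values: depth 2.0, local surface elevation 0.3
H, eta = 2.0, 0.3
surface = to_fixed_domain(eta, eta, H)   # free surface lands on z_bar = 0
bottom = to_fixed_domain(-H, eta, H)     # bottom stays at z_bar = -H
round_trip = to_physical_domain(to_fixed_domain(-0.7, eta, H), eta, H)
```

Because the surface elevation enters only through the mapping, the computational grid in the fixed domain does not have to be regenerated as the waves move.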

Numerical Scheme

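• Operator splitting

• At each time level:

  FDM for updating free surface conditions

  FEM solution of an elliptic boundary value problem in $\bar\Omega$

\[ \bar\nabla \cdot (K \bar\nabla \varphi) = 0 \]

\[
K(x,y,\bar z,t) = \frac{1}{H}
\begin{pmatrix}
\eta+H & 0 & -(\bar z+H)\eta_x \\
0 & \eta+H & -(\bar z+H)\eta_y \\
-(\bar z+H)\eta_x & -(\bar z+H)\eta_y & \dfrac{H^2 + (\bar z+H)^2(\eta_x^2+\eta_y^2)}{\eta+H}
\end{pmatrix}
\]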

Preconditioning

Elliptic boundary value problem - most CPU intensive

Resulting system of linear equations

Preconditioning

\[ Ax = b \quad\Longrightarrow\quad M^{-1}Ax = M^{-1}b \]

Computational cost (N - number of unknowns)

Method         Cost
Gauss-Seidel   O(N^2)
CG+MILU        O(N^(7/6))
CG+MG/DD       O(N)
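To make the role of M concrete, here is a minimal sketch of preconditioned conjugate gradients in NumPy. It is not the Diffpack implementation: the test matrix is a small 1D Poisson matrix, and a symmetric Gauss-Seidel preconditioner stands in for the MILU, multigrid and domain decomposition preconditioners named above, all of which are more involved. Every function name here is hypothetical.

```python
import numpy as np


def preconditioned_cg(A, b, apply_Minv, tol=1e-10, max_iter=500):
    """Solve A x = b (A symmetric positive definite) by preconditioned CG.

    apply_Minv(r) must return M^{-1} r for the chosen preconditioner M.
    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


def make_sgs_preconditioner(A):
    """Symmetric Gauss-Seidel: M = (D + L) D^{-1} (D + U)."""
    lower = np.tril(A)   # D + L
    upper = np.triu(A)   # D + U
    d = np.diag(A)

    def apply_Minv(r):
        y = np.linalg.solve(lower, r)          # (D + L) y = r
        return np.linalg.solve(upper, d * y)   # (D + U) z = D y

    return apply_Minv


# Small 1D Poisson matrix as a stand-in for the FEM system
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = preconditioned_cg(A, b, make_sgs_preconditioner(A))
```

Swapping in a different `apply_Minv` changes the preconditioner without touching the CG loop, which is the property that lets a domain decomposition method be used as M.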

The Question

How to do the parallelization?

Different approaches on different levels:

• Automatic parallelization

• Parallelization on the low matrix-vector level

• Parallelization on the level of simulators

Starting point: an object-oriented water wave simulator (built in Diffpack, a C++ environment for scientific computing)

Parallelization Strategy

Domain Decomposition

• Divide and conquer

• Solution of the original large problem through iteratively solving many smaller subproblems -- solution method or preconditioner

• Flexible -- localized treatment of irregular geometries, singularities etc

• Very efficient numerical methods -- even on sequential computers

• Suitable for coarse grained parallelization

Overlapping Domain Decomposition

Alternating Schwarz method for two subdomains

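Example: solving an elliptic boundary value problem

\[ Au = f \ \text{in } \Omega = \Omega_1 \cup \Omega_2, \qquad u = g \ \text{on } \partial\Omega \]

A sequence of approximations $u^0, u^1, \ldots, u^n$, where ($\Gamma_i$ is the artificial boundary of $\Omega_i$)

\[
\begin{aligned}
A u_1^n &= f && \text{in } \Omega_1 \\
u_1^n &= g && \text{on } \partial\Omega_1 \setminus \Gamma_1 \\
u_1^n|_{\Gamma_1} &= u_2^{n-1}|_{\Gamma_1}
\end{aligned}
\qquad\qquad
\begin{aligned}
A u_2^n &= f && \text{in } \Omega_2 \\
u_2^n &= g && \text{on } \partial\Omega_2 \setminus \Gamma_2 \\
u_2^n|_{\Gamma_2} &= u_1^n|_{\Gamma_2}
\end{aligned}
\]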
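The alternating Schwarz iteration for two overlapping subdomains can be illustrated on a 1D model problem. The sketch below is only an illustration under stated assumptions: it solves -u'' = 1 on (0, 1) with u(0) = u(1) = 0 by finite differences, and the grid size, overlap and function names are chosen for the example rather than taken from the talk.

```python
import numpy as np


def solve_dirichlet(f, left, right, h):
    """Solve -u'' = f at interior nodes with Dirichlet values left/right (FDM)."""
    m = len(f)
    T = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h**2
    rhs = f.copy()
    rhs[0] += left / h**2
    rhs[-1] += right / h**2
    return np.linalg.solve(T, rhs)


def alternating_schwarz(n=40, a=15, b=25, sweeps=30):
    """Subdomain 1 covers nodes 0..b, subdomain 2 covers nodes a..n (a < b)."""
    h = 1.0 / n
    f = np.ones(n + 1)
    u = np.zeros(n + 1)   # initial guess u^0; end values hold g = 0
    for _ in range(sweeps):
        # Subdomain 1: artificial boundary at node b, value from the previous u_2
        u[1:b] = solve_dirichlet(f[1:b], u[0], u[b], h)
        # Subdomain 2: artificial boundary at node a, value from the fresh u_1
        u[a + 1:n] = solve_dirichlet(f[a + 1:n], u[a], u[n], h)
    return u


x = np.linspace(0.0, 1.0, 41)
u = alternating_schwarz()
error = np.max(np.abs(u - x * (1.0 - x) / 2.0))   # exact solution is x(1-x)/2
```

Shrinking the overlap (moving `a` and `b` closer together) slows the convergence, which is one reason the overlap width matters in practice.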

Numerical Foundation

Additive Schwarz Method

Subproblems are of the same form as the original large problem, with possibly different boundary conditions on artificial boundaries.

Subproblems can be solved in parallel.
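For reference, a sketch of the additive iteration for M overlapping subdomains, where $\Gamma_i$ denotes the artificial boundary of $\Omega_i$ (the index M and the global iterate $u^{n-1}$ are notation assumed here). Every subproblem takes its artificial boundary values from the previous iterate, so no subproblem has to wait for another:

```latex
\[
\begin{aligned}
A u_i^n &= f && \text{in } \Omega_i \\
u_i^n &= g && \text{on } \partial\Omega_i \setminus \Gamma_i \\
u_i^n|_{\Gamma_i} &= u^{n-1}|_{\Gamma_i}
\end{aligned}
\qquad i = 1, \ldots, M
\]
```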

Convergence of the Solution

Example: solving the Poisson problem on the unit square

Numerical Foundation

Coarse Grid Correction

Important for good DD convergence

Run on each processor, shared with subdomain simulators on the same processor

Some Observations

Parallel Computing

  efficiency relies on the parallelization

Domain Decomposition

  well suited to parallel computing

  a good parallelization strategy

Object-Oriented Programming Technique

  flexible and efficient sequential simulators

  can be used in subdomain solves -- main ingredient of DD

New Programming Model

A simulator-parallel model

Each processor hosts an arbitrary number of subdomains -- balance between numerical efficiency and load balancing

Each subdomain is assigned a sequential simulator

Flexibility -- different types of grids, linear system solvers, preconditioners, convergence monitors etc. are allowed for different subproblems

Domain decomposition on the level of subdomain simulators!
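As an illustration of letting each processor host several subdomains, the sketch below distributes subdomains over processors with a simple greedy rule (largest subdomain first, to the least-loaded processor). This rule is an assumption made for the example; the talk does not say how subdomains are assigned, and the function name is hypothetical.

```python
import heapq


def assign_subdomains(sizes, num_procs):
    """Greedy assignment: largest subdomain first, to the least-loaded processor.

    sizes[i] is the number of grid points in subdomain i.
    Returns {processor id: [subdomain ids hosted on it]}.
    """
    heap = [(0, p) for p in range(num_procs)]   # (current load, processor id)
    heapq.heapify(heap)
    hosted = {p: [] for p in range(num_procs)}
    for sub in sorted(range(len(sizes)), key=lambda s: -sizes[s]):
        load, p = heapq.heappop(heap)
        hosted[p].append(sub)
        heapq.heappush(heap, (load + sizes[sub], p))
    return hosted


# 16 equally sized subdomains on 4 processors (sizes are illustrative)
hosted = assign_subdomains([6929] * 16, 4)
```

With equally sized subdomains each processor ends up hosting the same number of them; with unequal sizes the greedy rule only approximates a balanced load.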

Simulator-Parallel

Reuse of existing sequential simulators

Data distribution is implied

No need for global data

Needs additional functionalities for exchanging nodal values inside the overlapping region

Needs some global administration

A Generic Programming Framework

An add-on library (SPMD model)

Use of object-oriented programming technique

Flexibility and portability

Simplified parallelization process for end-user

The Administrator

Parameter Interface: solution method or preconditioner, max iterations, stopping criterion etc.

DD algorithm Interface: access to predefined numerical algorithms, e.g. CG

Operation Interface (standard codes & UDC): access to subdomain simulators, matrix-vector product, inner product etc.

The Subdomain Simulator

Subdomain Simulator -- a generic representation

C++ class hierarchy

Interface of generic member functions

Adaptation of Sequential Simulator

Class SubdomainSimulator - generic representation of a sequential simulator.

Class SubdomainFEMSolver - generic representation of a sequential simulator using FEM.

A new sequential wave simulator that fits in the framework is readily extended from the existing sequential simulator, also being a subclass of SubdomainFEMSolver.

Class hierarchy:

SubdomainSimulator
        |
SubdomainFEMSolver     WaveSimulator
           \             /
            NewWSimulator
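The inheritance pattern can be sketched in a few lines. The framework itself is a C++ class hierarchy in Diffpack; the Python below only mirrors the shape of that hierarchy, and every method name in it (`solve_local`, `overlap_values`, `advance_one_step`) is hypothetical rather than part of the actual interface.

```python
from abc import ABC, abstractmethod


class SubdomainSimulator(ABC):
    """Generic interface a DD administrator would call (method names hypothetical)."""

    @abstractmethod
    def solve_local(self):
        """Perform one subdomain solve."""

    @abstractmethod
    def overlap_values(self):
        """Return nodal values to exchange inside the overlapping region."""


class SubdomainFEMSolver(SubdomainSimulator):
    """Generic FEM-based subdomain solver with a placeholder default."""

    def overlap_values(self):
        return {}


class WaveSimulator:
    """Stand-in for the existing sequential wave simulator."""

    def __init__(self):
        self.steps_taken = 0

    def advance_one_step(self):
        self.steps_taken += 1


class NewWSimulator(WaveSimulator, SubdomainFEMSolver):
    """Reuses the sequential simulator and plugs it into the generic interface."""

    def solve_local(self):
        self.advance_one_step()
        return self.steps_taken


sim = NewWSimulator()
first_result = sim.solve_local()
```

The point of the pattern is that the numerical code stays in the existing simulator class, while the new subclass only adds the glue the framework expects.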

Performance

Algorithmic efficiency

  efficiency of original sequential simulator(s)

  efficiency of domain decomposition method

Parallel efficiency

  communication overhead (low)

  coarse grid correction overhead (normally low)

  synchronization overhead

  load balancing -- subproblem size, work on subdomain solves

Parallel Simulation of Waves

Parallel Efficiency

Fixed number of subdomains M=16.

Subdomain grids from partition of a global 41x41x41 grid.

Simulation over 32 time steps.

DD as preconditioner of CG for the Laplace eq.

Multigrid V-cycle as subdomain solver.

 P   Execution time   Speedup   Efficiency
 1        1404.44       N/A        N/A
 2         715.32       1.96       0.98
 4         372.79       3.77       0.94
 8         183.99       7.63       0.95
16          90.89      15.45       0.97
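The Speedup and Efficiency columns follow from the execution times, assuming the usual definitions: speedup is T(1)/T(P) and efficiency is speedup/P. A short sketch that recomputes them from the table (the function name is hypothetical):

```python
def speedup_table(times):
    """times maps P to execution time; returns {P: (speedup, efficiency)} for P > 1."""
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in times.items() if p > 1}


# Execution times from the table above
times = {1: 1404.44, 2: 715.32, 4: 372.79, 8: 183.99, 16: 90.89}
results = speedup_table(times)
for p, (s, e) in results.items():
    print(f"P={p:2d}  speedup={s:.2f}  efficiency={e:.2f}")
```

Rounded to two decimals, the computed values agree with the tabulated ones.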

Overall Efficiency

Number of subdomains equal to number of processors

P = M   Execution time   Subgrid   Iterations
  1         642.14        68921       7.69
  2         597.47        38663       9.00*
  4         265.62        21689      13.59
  8         172.23        12259      17.25
 16          90.89         6929      16.56

*For P=2 parallel BiCGStab is used.

Summary

Efficient solution of elliptic boundary value problems

Parallelization based on DD

Introduction of a simulator-parallel model

A generic framework for implementation

http://www.nobjects.com