Parallel and Robust Multigrid Techniques on Structured...

106
Parallel and Robust Multigrid Techniques on Structured Grids Departamento de Arquitectura de Computadores y Automática Facultad de Ciencias Fïsicas Universidad Complutense de Madrid Spain 1/106 Ignacio Martín Llorente www.dacya.ucm.es/nacho July 16, 2001

Transcript of Parallel and Robust Multigrid Techniques on Structured...

Page 1: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Parallel and Robust Multigrid Techniques on Structured Grids

Departamento de Arquitectura de Computadores y AutomáticaFacultad de Ciencias Fïsicas

Universidad Complutense de MadridSpain

1/106

Ignacio Martín Llorentewww.dacya.ucm.es/nacho

July 16, 2001

Page 2: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

2/106

¿Do the scientific community really need robust methods for their

simulation codes?

Motivation

ISCR-CASC-LLNL Ignacio Martín LlorenteIntroduction to the Presentation

Page 3: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

3/106

“Civilization advances by extending the number of important operations which we can perform without thinking about them.”

Alfred North WhiteheadAn Introduction to Mathematics, 1911

Motivation

ISCR-CASC-LLNL Ignacio Martín LlorenteIntroduction to the Presentation

Page 4: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Progress in numerical simulation requires the effective combination of advances in:

• Algorithm development• Understanding the underlying physics• Computer hardware

Objectives

ISCR-CASC-LLNL Ignacio Martín Llorente

4/106

The need for speed:

• The design cycle for aerospace vehicles must be significantly shortened

• CFD will continue to replace a larger portion of wind tunnel and flight testing

Introduction to the Presentation

Page 5: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives

ISCR-CASC-LLNL Ignacio Martín Llorente

5/106

¿Can we obtain an optimal(algorithmic and architectural) solver

for CFD problems?

Introduction to the Presentation

Page 6: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives

ISCR-CASC-LLNL Ignacio Martín Llorente

6/106

Bran98

Introduction to the Presentation

Page 7: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives

ISCR-CASC-LLNL Ignacio Martín Llorente

7/106

Personal opinion:

It does not exist a key method optimal for all cases

Introduction to the Presentation

Page 8: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives

ISCR-CASC-LLNL Ignacio Martín Llorente

8/106

We present one alternative based on implicit schemes and semicoarsening

There are many possible paths towards faster CFD

• Precondiotining• Separation of elliptic and hyperbolic parts• ...

Introduction to the Presentation

Page 9: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives

ISCR-CASC-LLNL Ignacio Martín Llorente

9/106

In particular, we present one building block to achieve optimal CFD by relieving

Bran98

Introduction to the Presentation

Page 10: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Global Vision of the Presentation

ISCR-CASC-LLNL Ignacio Martín Llorente

10/106

Part 1: Multigrid for the diffussion problem

Part 2: Extension to the convection problem

Part 3: Extension to the incompressible Navier-Stokes equations

Algorithmic and Architectural issues of

on structured grids

Introduction to the Presentation

Page 11: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Definition of Algorithmic Properties

ISCR-CASC-LLNL Ignacio Martín Llorente

11/106

Properties that describe the efficiency of an iterative method:Convergence rate: Reduction in error per cycle

Computational work per cycle: Operation count to execute one cycle

What is a robust method?

=> Method able to efficiently solve a wide range of problems

We should define it more precisely by setting up a set of suitable test problems

=> Multigrid technique with a convergence rate independent of:

Grid size

Anisotropy

And for our case example: Reynolds number and yaw angle

Introduction to the Presentation

Page 12: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Definition of Architectural Properties

ISCR-CASC-LLNL Ignacio Martín Llorente

12/106

Properties that are essential to use the full potential of current and future computing systems in each single iteration

Memory usage: Memory waste

Parallel efficiency: How the parallel setting is exploited?

Parallel scalability: Is the parallel efficiency maintained for a higher number of processors?

Cache-memory exploitation: Space and temporal locality in the memory accesses in order to reduce the number of cache misses

Parallel scalability is quite important in solving very large systems on massively parallel computers

=> How does the solver perform as both the number of processors and the grid size are increased?

Introduction to the Presentation

Page 13: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Ways to increase the problem size with the number of processors

Accuracy critical scaling modelMemory-bounded (isomemory) (SuNi90)

Efficiency-bounded (isoefficiency) (GrGK93) Time-bounded (isotime) (Gustafson, Gust88)

Time critical scaling modelFixed-size (Amdahl’s law)

Each scaling model has its own scalability metrics (LlTi97, LlTV96):

We focus on memory-bounded scaling model

Definition of Architectural Properties• Definitions of Scalability

ISCR-CASC-LLNL Ignacio Martín Llorente

13/106

Introduction to the Presentation

Page 14: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Definition of Architectural Properties• Definitions of Scalability for the Memory-Bounded Scaling Model

ISCR-CASC-LLNL Ignacio Martín Llorente

14/106

Is the efficiency maintained as the problem size is increased linearly with the number of processors (memory-bounded)?

ppNTNTpNE

),()1,(),( =

From (N,p) to a system n times larger (nN,np):nnPnNTNT

pNTnNTpNEnpnNEnppSE

1),()1,(),()1,(

),(),(),( ==

Is the execution time maintained as the problem size is increased linearly with the number of processors (memory-bounded)?

From (N,p) to a system n times larger (nN,np):),(

),(),(npnNTpNTnppST =

If the computational complexity of the algorithm is linear O(N) then T(nN,1) = n T(N,1) and SE(p,np)= ST(p,np)

WE PROPOSE TWO SCALIBILITY METRICS FOR MEMORY-BOUNDED SCALING:

Iterative cycle working out of cache

(scaled efficiency)

Introduction to the Presentation

Page 15: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

15/106

Details of this research in

I. M. Llorente and F. Tirado, Relationships between Efficiency an Execution Time of Full Multigrid Methods on Parallel Computers,

IEEE Transactions on Parallel and Distributed Systems, Vol. 8, Nº 6, 1997, pp. 562-573

Definition of Architectural Properties

I. M. Llorente, F. Tirado and L. Vázquez, Some Aspects about the Scalability of Scientific Applications on Parallel Computers, Parallel

Computing, Vol. 22, pp. 1169-1195, 1996

Introduction to the Presentation

Page 16: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives of the First Part

ISCR-CASC-LLNL Ignacio Martín Llorente

16/106

¿Can we obtain an optimal solver for the diffussion operator on highly

stretched grids?

PART 1: Multigrid for the Diffussion Problem

Page 17: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Outline of the First Part

ISCR-CASC-LLNL Ignacio Martín Llorente

17/106

PART 1: Multigrid for the Diffussion Problem

Introduction• Description of the model problem• Components of multigrid• Anisotropies in the discrete operator• What is a robust solver?• Review of robust alternatives

Comparation of two robust alternatives• Convergence factor• Memory requirements• Cache-memory exploitation• Convergence rate per work unit

Parallel implementation• Parallel architectures• Parallel implementation of the alternating-plane approach• Parallel implementation of the semicoarsening approach• Architectural advantages of the 1-D decomposition

Conclusions of the first part

Page 18: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Description of the Model Problem

ISCR-CASC-LLNL Ignacio Martín Llorente

18/106

3-D Anisotropic Diffussion Equation on a rectangular domain with DirichletBoundary Conditions

),,(2

2

2

2

2

2zyxS

zyx φφγφβφα =

∂+

∂+

∂∂∂∂

Discretized by a Finite volume cell-centered 7-point operator

SZZ

YY

XX

ijkijkijkpijkp

kijijkpkijp

jkiijkpjkip

ZZz

YYy

XXx

φφφφ

φφφ

φφφ

γ

β

α

=++−Δ

+

++−Δ

+

++−Δ

−+

−+

−+

))((2

))((2

))((2

11

11

11

PART 1: Multigrid for the Diffussion Problem

Page 19: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Components of Multigrid

ISCR-CASC-LLNL Ignacio Martín Llorente

19/106

Two fundamental principles:

relaxation : standard iterative method / smootherIt is used to eliminate oscillatory components of the error

coarse grid correctionOn a coarser grid, low-frequency components appear more oscillatory

I

I

I h2h

I 2hh2h

h

2hh

standard coarsening : doubling the mesh size in all coordinate directions

FAS (Full approximation scheme) schemeIt can be applied to solve non-linear equations

PART 1: Multigrid for the Diffussion Problem

Page 20: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

General FAS ( γ1 ,γ2 ) – V cycle for the solution of the system

L0 u0 = f0

• Pre-smoothing : Apply γ1 sweeps of the smoothing method on L0lul0 = f0

FOR level = 1 TO L-1

• Computation of residual rlevel-1 = flevel-1 - Llevel-1 ulevel-1

• Restriction of residual rlevel= R rlevel-1

• Restriction of current approximation u’level = R ulevel-1

• Computation of right-hand side flevel- = rlevel + Llevel u’level

• Pre-smoothing : Apply γ1 sweeps of the smoothing method on Llevelulevel = flevel

FOR level = L-2 TO 1

• Correction of current approximation ulevel = ulevel + P (ulevel+1 - u’level+1)

• Post-smoothing : Apply γ2 sweeps of the smoothing method on Llevelulevel = flevel

Introduction• Description Components of Multigrid

ISCR-CASC-LLNL Ignacio Martín Llorente

20/106

Connection between grid

levels

PART 1: Multigrid for the Diffussion Problem

Page 21: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

21/106

Our discrete operator

SZZ

YY

XX

ijkijkijkpijkp

kijijkpkijp

jkiijkpjkip

ZZz

YYy

XXx

φφφφ

φφφ

φφφ

γ

β

α

=++−Δ

+

++−Δ

+

++−Δ

−+

−+

−+

))((2

))((2

))((2

11

11

11

• Computational Anisotropy :Use of a exponential stretched grid in the three dimensions.

•Physical Anisotropy : Different coefficients in each direction

Introduction• Anisotropies in the Discrete Operator

PART 1: Multigrid for the Diffussion Problem

Page 22: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Anisotropies in the Discrete Operator

ISCR-CASC-LLNL Ignacio Martín Llorente

22/106

Grid stretching in order to pack points into regions with large gradients

PART 1: Multigrid for the Diffussion Problem

Page 23: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• What is a Robust Solver?

ISCR-CASC-LLNL Ignacio Martín Llorente

23/106

• The convergence of standard multigrid (point smoother combined with full coarsening) deteriorates dramatically in presence of anisotropies

• Brandt’s fundamental block relaxation rule states that all strongly coupled unknowns (coordinates with relative larger coefficients) should be relaxed simultaneously

Implicit (line or plane) relaxation combined with full coarsening

• However if

• The nature of the anisotropy is not known beforehand, or• The aspect ratios or the equation coefficients differ with each other

throughout the computational domain

=> A robust solver is needed

PART 1: Multigrid for the Diffussion Problem

Page 24: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Review of Robust Alternatives

ISCR-CASC-LLNL Ignacio Martín Llorente

24/106

Alternating-direction plane smoothers in combination with full coarsening(Bran84, LlMe00, …)

X , Y and Z coarsening

Exploreall possibilities

STRUCTURED

GRIDS

Resolution of the 2-D problemsParallel implementation of the 2-D solvers

PART 1: Multigrid for the Diffussion Problem

☺ Fast convergence rate that improves with stretching

GEOMET

RIC

Page 25: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

25/106

Block smoothing in combination with semicoarsening(DMRR89, Scha98, BrFJ00, ...)

X coarsening

STRUCTURED

GRIDS

Resolution of the 2-D problemsThe 2-D problem size remains fixed in the coarsening process (memory waste and higher work per cycle)

☺ 1-D parallel implementation☺ Easier to implement than the

alternating-plane approach

PART 1: Multigrid for the Diffussion Problem

Introduction• Review of Robust Alternatives

GEOMET

RIC

Page 26: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

26/106

Recombination of the corrections of more than one semicoarsening gridmultiple semicoasening

(Muld89, OvRo93,...)

STRUCTURED

GRIDS

Difficult 3-D generalizationRecombination operatorMemory waste

☺ Two level parallelism

PART 1: Multigrid for the Diffussion Problem

Introduction• Review of Robust Alternatives

GEOMET

RIC

Page 27: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

27/106

Standard coarsening combined with a semicoarsened smootherflexible multiple semicoasening

(Oost95, Stub97, ...)

X and Y coarsening with line relaxation

STRUCTURED

GRIDS

Memory management☺ Easier parallel implementation

PART 1: Multigrid for the Diffussion Problem

Introduction• Review of Robust Alternatives

GEOMET

RIC

Page 28: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

28/106

Point-wise smoothing combined with a fully adaptive coarsening processalgebraic multigrid

(BrMR82, RuSt87, CFHJ00, ...)

UNSTRUCTURED

GRIDS

Parallel implementationSetup timeData management

☺ Black box

An Algebraic Multigrid Tutorial, Van Emde Henson

PART 1: Multigrid for the Diffussion Problem

Introduction• Review of Robust Alternatives

ALGEB

RAIC

Page 29: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Comparation of Two Robust Alternatives

ISCR-CASC-LLNL Ignacio Martín Llorente

29/106

We focus on:

Alternating-plane smoother combined with full coarsening (LlMe00, …)

Plane smoother combined with semicoarsening (PSEL01, …)

Results obtained with:

• Homogeneous problem with random initial guess• V(1,1) cycle • Restriction done by full weighted operator• Trilinear (plane alternating) and linear (semicoarsening) interpolation for the

prologator• Zebra plane relaxation

PART 1: Multigrid for the Diffussion Problem

Page 30: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Comparation of Two Robust Alternatives• Convergence Factor

ISCR-CASC-LLNL Ignacio Martín Llorente

30/106

Computational Anisotropy : 3-D exponential stretched grid

• Each semi-coarsening approach exhibits the same convergence factor. There is no privileged direction

• The alternating-plane smoother improves its convergence factor as the stretching grows

Alternating-plane

PART 1: Multigrid for the Diffussion Problem

Grid stretching

Convergence Factor

Simulations for the isotropic equation on a 64x64x64 grid with different stretching factors.

Page 31: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

31/106

02

2

2

2

2

2=

∂+

∂+

∂∂∂∂

zyxcba φφφ

• Best Semi-coarsening procedure the one that keeps coupling of connected unknowns

• Alternating-plane smoother becomes a direct solver for high anisotropies

Physical Anisotropy: Coefficients are increased in two directions

Alternating-plane and semicoarsening solving the connected unknowns

PART 1: Multigrid for the Diffussion Problem

c=1 a = b = 1,10,100,1000,10000

Convergence Factor

Simulations for the anisotropic equation on a uniform 64x64x64 grid

Comparation of Two Robust Alternatives• Convergence Factor

Page 32: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

32/106

377

55

11

655

97

18

1

10

100

1000

32 64 128

Alternating

Semi-coarsening

Mem

ory

size

(M

b)

Problem size

• Memory requirements of the semi-coarsening approach are about twice as large as the alternating-plane approach

Comparation of Two Robust Alternatives• Memory Requirements

Alternating-plane

PART 1: Multigrid for the Diffussion Problem

Page 33: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

33/106Comparation of Two Robust Alternatives• Execution Time per Cycle

Factors to consider:Floating point operations per Multigrid cycle Cache memory exploitation

PART 1: Multigrid for the Diffussion Problem

Floating point operations per Multigrid cycle: • 26 % larger on the alternating-plane approach

Alternating-plane

Page 34: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

34/106

Memory data storage• Plane YZ ( X= constant )Suitable for X-semi-coarsening

• Plane XZ ( Y= constant )Suitable for Y-semi-coarsening

• Plane XY ( Z= constant )Suitable for Z-semi-coarsening

PART 1: Multigrid for the Diffussion Problem

Comparation of Two Robust Alternatives• Execution Time per Cycle

Cache memory exploitation

Page 35: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

35/106

Millions of L2 cache Misses

0

200

400

600

800

1000

O2K 64 O2 64

Alter X-SemiY-Semi Z-Semi

0102030405060708090

O2K 32 O2 32

Alter X-SemiY-Semi Z-Semi

Cache memory exploitation • Y-Semicoarsening and X-Semicoarsening have the same behavior (L2 cache block size).• Z-Semicoarsening produces more L2 cache misses due to the memory storage scheme.

• SGI O2 WorkstationL2 : 1-MB unified -Ofast=ip32_10k

• SGI Origin 2000 system (O2K)L2 : 4-MB unified -Ofast=ip27

Comparation of Two Robust Alternatives• Execution Time per Cycle

X and Y semicoarsening

Cost per cycle : 38% larger on the alternating-plane approach

PART 1: Multigrid for the Diffussion Problem

64x64x64 grid32x32x32 grid

Page 36: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

36/106

Simulations for the isotropic equation on a 64x64x64 grid with different stretching factors

Comparation of Two Robust Alternatives• Convergence Rate per Work Unit

Definitive metric

Work Unit: Time consumed in computing the system metrics on the finest level

PART 1: Multigrid for the Diffussion Problem

• The alternating-plane approach reduces the same amount of error in less time

Grid stretching

Convergence Factor per Work Unit (O2K)

Page 37: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

37/106

Alpha21164

Mem

oriaM

emoria

Router

Streams

E-reg

R10000 R10000L2 L2

Hub Asic

SysAD Bus: 780 MB/s

Memoria

Directorio

780 MB/sDir.

Control

780 MB/s

780 MB/s

CrayLink Router

780 MB/s780 MB/s

XIO

SGI Origin 2000: 32 MIPS R10000

Parallel Implementation• Parallel Architectures

PART 1: Multigrid for the Diffussion Problem

CRAY T3E: 48 Alpha 21164

Page 38: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

38/106Parallel Implementation• Parallel Implementation Alternatives

PART 1: Multigrid for the Diffussion Problem

MG+ DD : Grid partitioning

• Domain decomposition is applied at each level

It retains the convergence of the sequential algorithm

It implies more communication overheads than domain decomposition approaches since exchanges of data are required on each grid level

DD + MG : Domain decomposition

• Domain decomposition is applied on the finest grid and multigrid inside each block

It deteriorates the convergence of the sequential algorithm

It implies fewer communications since exchanges of data are only required on the finest grid level

Page 39: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

39/106

• Regardless of the data partitioning applied, it requires the solution of tridiagonal systems of equations distributed among the processors because the combination of alternating-line smoothers and full coarsening is applied to solve the planes

• Solving the line is the most time consuming task of our code (around 80 %)

• An alternating-line 2D solver can be used for estimating the parallel efficiency of the whole application

Parallel Implementation• Parallel Implementation of the Alternating-Plane Approach

PART 1: Multigrid for the Diffussion Problem

Distributed y-lines solved by:• Pipelined Gaussian Elimination • Cyclic Reduction • Wang’s algorithm

Local x-lines solved by Gaussian Elimination

x

yLet us study the efficiency of these methods on parallel computers for

128x128 and 1024x1024 2-D systems (1-D partitioning)

Page 40: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

40/106Parallel Implementation• Parallel Implementation of the Alternating-Plane Approach

PART 1: Multigrid for the Diffussion Problem

0

0.2

0.4

0.6

0.8

1

1.2

0 4 8 12 16 20 24 28 32 36

Number of processors

Effic

ienc

y

12810242048

0

0.2

0.4

0.6

0.8

1

1.2

0 4 8 12 16 20 24 28 32 36

Number of processors

Effic

ienc

y

T3E O2K

Pipelined Gaussian Elimination (PGE)

• High efficiencies for large 2D problems• Current memory limits do not allow 3D problems to be solved where their corresponding

2D planes are big enough to obtain satisfactory efficiencies

Page 41: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

41/106Parallel Implementation• Parallel Implementation of the Alternating-Plane Approach

PART 1: Multigrid for the Diffussion Problem

0

0.2

0.4

0.6

0.8

1

1.2

0 4 8 12 16 20 24 28 32 36

Number of processors

Effic

ienc

y

128 PCR1024 PCR2048 PCR128 PCR21024 PCR22048 PCR2

0

0.2

0.4

0.6

0.8

1

1.2

0 4 8 12 16 20 24 28 32 36

Number of processors

Effic

ienc

y

T3E O2K

Cyclic Reduction

• High efficiencies for large 2D problems• Current memory limits do not allow 3D problems to be solved where their corresponding

2D planes are big enough to obtain satisfactory efficiencies

Page 42: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

42/106Parallel Implementation• Parallel Implementation of the Alternating-Plane Approach

PART 1: Multigrid for the Diffussion Problem

0

0.2

0.4

0.6

0.8

1

1.2

0 4 8 12 16 20 24 28 32 36

Number of processors

Effic

ienc

y

12810242048

0

0.2

0.4

0.6

0.8

1

1.2

0 4 8 12 16 20 24 28 32 36

Number of processors

Effic

ienc

y

T3E O2K

Wang’s algorithm

• High efficiencies for large 2D problems• Current memory limits do not allow 3D problems to be solved where their corresponding

2D planes are big enough to obtain satisfactory efficiencies

Page 43: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

43/106

Details of this research in

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Alternating-Plane Approach

D. Espadas, M Prieto, I. M. Llorente and F.Tirado, Solution of Alternating-line Processes on Modern Parallel Computers, In Proceedings of the 28th. International Conference on Parallel

processing, ICPP '99. Aizu-Wakamatsu (Japan), September 1999. Published by the IEEE Computer Society, pp. 208-215

Page 44: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

44/106

• A 1-D data decomposition on the semicoarsened direction can be chosen so that a parallel tridiagonal solver is not needed

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

Page 45: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

45/106

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

Cray T3E Origin 2000

• Can the efficiency be improved?

One V-cycle for the isotropic equation on a 64x64x64 grid

Page 46: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

46/106

The efficiency deteriorates for a large number of processors due to the load imbalance below the critical level (level with one plane per processor)

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

PART 1: Multigrid for the Diffussion Problem

P0 P1 P3P2

P0 P1 P3P2Number of idle Processors is doubledin each level below the critical one

Critical level

Page 47: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

47/106

Agglomeration on Coarsest Grid (MoDe99, ...):

Grids below the critical level are solve on a single processorIt could reduce the communication overhead

It increases the execution time because the plane-wise smoother is very expensive

Parallel Superconvergent Multigrid (FrMc88, …):

It keeps the processor busy below the critical level using multiple coarse gridsIt could improve the convergence rate

It increases the execution time because of the time needed for merging the solutions

U-cycle Method (XiSc97, …):

It solves the problem on the critical level by applying an certain number of sweepsIt could avoid idle processors

It increases the execution time in the simulation because convergence rate of each cycle becomes lower

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

PART 1: Multigrid for the Diffussion Problem

Alternatives to relieve the load imbalance

Page 48: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

48/106

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

U-cycle approach

• The number of grid levels has been fixed so that each processor has one plane on the coarsest level (critical level)

P0 P1 P3P2

• Critical Level for 4 processors

• Coarsest level fixed by the number of processors used

May be too large to keep the convergence rate

Page 49: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

49/106

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

Cray T3E

• Higher efficiency than the pure V-cycle! (0.92 vs. 0.7 for 16 processors) • What about the convergence rate?

• Realistic efficiency

One U-cycle for the isotropic equation on 32x32x32 and

64x64x64 grids

Page 50: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

50/106

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

• Lower efficiency than the pure V-cycle! (0.3 vs. 0.7 for 16 processors)

Cray T3E

Realistic efficiency using U-cycles reaching a residual norm

of 10-12 on a 64x64x64 grid

Page 51: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

51/106

Hybrid Smoother:

1. The number of levels can be defined and is a trade-off between algorithmic and architectural properties

The higher the number of levels the higher the convergence rateThe smaller the number of levels the higher the parallel efficiency

2. It uses as smoother:

Zebra smoother in and above the critical levelDamped Jacobi in and below that level

It improves the granularity of the smoother but deteriorates its convergence rate

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

Trade-off between algorithmic and architectural properties

Page 52: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

52/106

PART 1: Multigrid for the Diffussion Problem

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

• Therefore going down to the coarsest level is not the most efficient choice• The efficiency for the hybrid approach on 16 processors is higher (0.87 vs. 0.7 for

pure V-Cycle and vs. 0.3 for the U-Cycle)

Cray T3E Origin 2000

Realistic efficiency using Hybrid-cycles reaching a residual norm of 10-12 on

a 64x64x64 grid

Page 53: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

53/106

Details of this research in

PART 1: Multigrid for the Diffussion Problem

D. Espadas, M Prieto, I. M. Llorente and F.Tirado, Solution of Alternating-line Processes on Modern Parallel Computers, In Proceedings of the 28th. International ConfICPP '99). Aizu-

Wakamatsu (Japan), September 1999. Published by the IEEE Computer Society, pp. 208-215

Parallel Implementation• Parallel Implementation of the Semicoarsening Approach

Page 54: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Parallel Implementation•Architectural Advantages for the 1-D Decomposition

Ignacio Martín Llorente

54/106

0

0.1

0.2

0.3

0.4

0.5

0.6

8 16 32 64

3D2D1D

P0 P1 P2 P3

P4 P5 P6 P7

P8 P9 P10 P11

P12 P13 P14 P15

n n/√ p

• Communication- computation ratio

• Communications grow proportionally to the size of the boundaries• Computations grow proportionally to the size of its entire partition

Perimeter-surface ratio in 2DSurface-volume ratio in 3D

• Traditional wisdom says that 3-D decomposition of 3-D problems (for example a Poisson multigrid solver with point smoothing) leads to a lower inherent communication-to-computation ratio

• The impact becomes greater as the number of processors increases

256x256x256 problem

nn/√p

ISCR-CASC-LLNL PART 1: Multigrid for the Diffussion Problem

Page 55: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Parallel Implementation•Architectural Advantages for the 1-D Decomposition

ISCR-CASC-LLNL Ignacio Martín Llorente

55/106

PART 1: Multigrid for the Diffussion Problem

0

20

40

60

80

100

120

128 256

Tim

e (s

ec)

1D2D3D

0

10

20

30

40

50

60

128 256

Tim

e (s

ec)

1D2D3D

16 processors 32 processorsCray T3E

0

20

40

60

80

100

120

140

160

180

128 256

Tim

e (s

ec)

1D2D3D

0

50

100

150

200

250

300

350

128 256

Tim

e (s

ec)

1D2D3D

Origin 2000

Page 56: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Parallel Implementation•Architectural Advantages for the 1-D Decomposition

ISCR-CASC-LLNL Ignacio Martín Llorente

56/106

PART 1: Multigrid for the Diffussion Problem

2-D decomposition on 32-processor SGI Origin 2000

22 % better than the 1-D decomposition8 % better than the 3-D decomposition

Data partitioning is a trade-off between the improvement of the message data locality and the efficient exploitation of the underlying communication system

Using up to 32 processors, in both systems, an appropriate 2-D decomposition, where boundaries with poor spatial locality are not needed, solves that trade-off

We should also note that a lower-dimensional partitioning program

is easier to code

allows the implementation of fast sequential algorithms in the non-partitioned directions

execution time is similar to 2-D or 3-D

Page 57: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Parallel Implementation•Architectural Advantages for the 1-D Decomposition

ISCR-CASC-LLNL Ignacio Martín Llorente

57/106

PART 1: Multigrid for the Diffussion Problem

Details of this research in

M. Prieto, I. M. Llorente and F. Tirado, Data Locality Exploitation in the Decomposition of Regular Domain Problems, IEEE Transactions on Parallel and Distributed Systems, Vol. 11, Nº 11, 2000, pp. 1141-

1150

M. Prieto, I. M. Llorente and F. Tirado, A Revision of Regular Domain Partitioning, SIAM News Vol 33 Number 1, January-February 2000

Page 58: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Conclusions of the First Part

ISCR-CASC-LLNL Ignacio Martín Llorente

58/106

• The alternating-plane approach presents:• Higher convergence rates that improves with the anisotropy strength• Lower memory requirements• Higher execution time per cycle

• In summary, better convergence per work unit

• However, its parallel implementation is not efficient since it requires the solution of distributed 2-D systems (difficult and poor efficient implementation)

• The 1-D parallel version of the semicoarsening approach is more efficient using a tradeoff between the V- and U-cycles (hybrid smoother)

PART 1: Multigrid for the Diffussion Problem

Page 59: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

59/106

¿Can we obtain an optimal solver for the convection operator?

aφ y

hx

hy

PART 2: Extension to the Convection Problem

Objectives of the Second Part

Page 60: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

60/106

PART 2: Extension to the Convection Problem

Outline of the Second Part

Introduction• Multigrid for convection problems• Model problem

Our approach• Narrow discretization• Cross-characteristic interaction

Results• Convergence rate• Parallel efficiency

Conclusions of the second part

Page 61: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Multigrid for Convection Problems

ISCR-CASC-LLNL Ignacio Martín Llorente

61/106

Multigrid is highly efficient to solve elliptic operators

However, it fails to solve nonelliptic operators

In many cases, the nonelliptic part is represented by the convection operatorFor example, Navier-Stokes solved by multigrid based on a distributive smoother

),,( zyxFz

Ucy

Ubx

Ua =∂

∂+

∂+

Downstream marching is the more efficient sequential solver for this operator☺ Solve linear upwind operators in one sweep☺ Solve nonlinear operators in few sweeps

However:A defect-correction scheme must be applied if the discretization is not fully upwindVery low parallel efficiency due to the sequential marching

PART 2: Extension to the Convection Problem

Page 62: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Model Problem

ISCR-CASC-LLNL Ignacio Martín Llorente

62/106

),,( zyxFz

Ucy

Ubx

Ua =∂

∂+

∂+

X

YZ

inflow

outflow

periodic

PART 2: Extension to the Convection Problem

Page 63: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Our Approach

ISCR-CASC-LLNL Ignacio Martín Llorente

63/106

NARROW DISCRETIZATION

+

SEMICOARSENING

+

CORRECTION OF OPERATORS TO MAINTAIN THE SAME CROSS-CHARACTERISTIC INTERACTION IN ALL GRIDS

+

FOUR-COLOR PLANE IMPLICIT SMOOTHER

PART 2: Extension to the Convection Problem

[Disk99]

Page 64: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Our Approach• Narrow Discretization

ISCR-CASC-LLNL Ignacio Martín Llorente

64/106

• The discretization follows the characteristic line of the operator

• The full-dimensional operator is obtained by replacing values at ghost points by weighted averages at adjacent genuine grid points

(i1,i2,i3)

(i1-1,i2-1,i3-1)

(i1-1,i2,i3-1)

(i1-1,i2-1,i3)

(i1-1,i2,i3)

(i1+1,i2+1,i3)

(i1+1,i2+1,i3+1)

X

YZ

(i1+1,i2,i3+1)

(i1+1,i2,i3)h

h

h

tzty

tz ty

3D discretization stencilLow-dimensional prototype stencilCharacteristic line

(i1,i2-1,i3)

(i1,i2+1,i3)

(i1,i2,i3-1)

(i1,i2,i3+1)

PART 2: Extension to the Convection Problem

Page 65: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Our Approach• Cross-Characteristic Interaction

ISCR-CASC-LLNL Ignacio Martín Llorente

65/106

¿What is the cross-characteristic interaction (CCI)?• The CCI induced by a discrete operator is estimated by the coefficients of the

lowest pure cross-characteristic derivatives appearing in the first differential approximation

• In our simpler case, CCI appears only because on interpolation in the y-z plane

Main difficulty in constructing an efficient solver for nonelliptic operators:• Poor coarse-grid approximation to fine-grid characteristic error components• The coarse-grid CCI is lower than required in a narrow discretization on a

semicoarsened grid

Solution:• We supply additional terms (explicit CCI) in coarse-grid discretizations so the

total coarse-grid CCI would be the same as on the fine grid

PART 2: Extension to the Convection Problem

Page 66: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Results• Convergence Rate

ISCR-CASC-LLNL Ignacio Martín Llorente

66/106

Solver 1: With explicit CCI termsSolver 2: Without explicit CCI terms

Residual versus work unitsResidual versus work units

Grid-independent convergence rate 0.09 for any angles of nonalinment

PART 2: Extension to the Convection Problem

abty =

actz =

Page 67: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Results• Parallel Efficiency

ISCR-CASC-LLNL Ignacio Martín Llorente

67/106

PART 2: Extension to the Convection Problem

• What about the convergence rate?•Realistic efficiency

Efficiency of one Hybrid-cycle

Page 68: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

68/106

It deteriorates the convergence properties

The Realistic Parallel Efficiency considers the execution time to reach the final solution (to reach a certain residual norm)

PART 2: Extension to the Convection Problem

Results• Parallel Efficiency

Page 69: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

69/106

Again the best choice is a trade-off between the parallel and numerical properties

PART 2: Extension to the Convection Problem

Results• Parallel Efficiency

Realistic efficiency using Hybrid-cycles reaching a

residual norm of 10-12

Page 70: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Results

ISCR-CASC-LLNL Ignacio Martín Llorente

70/106

PART 3: Extension to Navier-Stokes

Details of this research in

I. M. Llorente, M. Prieto-Matias and B. Diskin, An Efficient ParallelMultigrid Solver for 3-D Convection Dominated Problems, in press,

Parallel Computing

Page 71: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Conclusions of the Second Part

ISCR-CASC-LLNL Ignacio Martín Llorente

71/106

• We have proposed a multigrid algorithm to solve in a parallel setting a convection operator that is sequential in nature

• Such operator appears in many practical problems in CFD • For example, distributive smoothers

• We have studied different alternatives to implement the solver on a parallel computer

• The 1-D decomposition with a hybrid smoother appears to be a tradeoff between parallel and convergence properties

• Satisfactory efficiencies (higher than 0.8) are obtained up to 32 processors

PART 2: Extension to the Convection Problem

Page 72: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives of the Third Part

ISCR-CASC-LLNL Ignacio Martín Llorente

72/106

¿Can previous conclusions be extended for the incompressible

Navier-Stokes Equations?

PART 3: Extension to Navier-Stokes

Page 73: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Outline of the Third Part

ISCR-CASC-LLNL Ignacio Martín Llorente

73/106

PART 3: Extension to Navier-Stokes

Introduction• Description of the problem• Multigrid approach• The coupled smoother• The plane smoother

Boundary layer of a flat plate at yaw• Domain• Boundary conditions• Non-zero yaw angle

Results• Validation with Blasius theory• Convergence rate

Page 74: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Outline of the Third Part

ISCR-CASC-LLNL Ignacio Martín Llorente

74/106

PART 3: Extension to Navier-Stokes

Parallel implementation• 1-D decomposition• 2-D decomposition• Parallel architecture• Analysis of the interconnection alternatives• Analysis of the execution node alternatives• Scalability

Relation to the MG/NAS parallel benchmark• Description of MG/NAS• Our code as a benchmark• Comparation to our code

Conclusions of the third part

Page 75: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Description of the Problem

ISCR-CASC-LLNL Ignacio Martín Llorente

75/106

0uuRe

1pu)u(=⋅∇

Δ+−∇=∇⋅

• Dimensionless steady-state incompressible Navier-Stokes equations:

• Discrete system obtained using a finite volume technique

• Discretization over an orthogonal structured grid

• Staggered arrangement of unknowns

• The second-order operator is obtained using a QUICK scheme [HaHG92] solved via defect-correction inside the multigrid cycle [OGWW98]

PART 3: Extension to Navier-Stokes

Page 76: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• Multigrid Approach

ISCR-CASC-LLNL Ignacio Martín Llorente

76/106

• Multigrid strategy : Full Multi-Grid (FMG).

• Coarsest Finest.

• Reduce the algebraic errors below the discretization error in one FMG cycle.

• Multigrid Cycle: Solve each level of the FMG algorithm

• FAS (Full Approximation Scheme)

• Grids scanned with a F(γ1,γ2) cycleIterations of the smoother upwards

Iterations of the smoother downwards

Iterations of the smoother to solve the Coarsest level.

PART 3: Extension to Navier-Stokes

Page 77: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• The Coupled Smoother

ISCR-CASC-LLNL Ignacio Martín Llorente

77/106

• The solution is updated via under-relaxation

• We have chosen a coupled smoother [Vank86] instead of the distributive alternative [BrYa92] All variables involved in each control-volume are updated simultaneously

⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜

=

⎟⎟⎟⎟⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜⎜⎜⎜⎜

ΔΔΔΔΔΔΔ

⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜

ΔΔΔΔ−ΔΔΔΔ−ΔΔΔΔ−+

+

+

+

+

+

mijk

wijk

wijk

vkij

vijk

ujki

uijk

ijk

ijk

ijk

kij

ijk

jki

ijk

topp

wijk

bottomp

wijk

northp

vijk

southp

vijk

eastp

uijk

westp

uijk

rrrrrrr

pwwvvuu

yxyxzxzxzyzyLLLLLLLLLLLL

k

top

k

bottom

j

north

j

south

i

east

i

west

1

1

1

1

1

1

000000

0000000000000000000000000

1

1

1

pwpp

uwuu

pnn

unn

Δ⋅+=

Δ⋅+=+

+

1

1

PART 3: Extension to Navier-Stokes

• Relax simultaneously the momentum and continuity equations

Page 78: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• The Plane Smoother

ISCR-CASC-LLNL Ignacio Martín Llorente

78/106

• We have compared different robust smoothers: alternating-plane vs. semicoarsening [MoLS01] and the combination of semicoarsening and plane implicit presents much better properties

• All the velocity components and pressures contained within the plane are updated

• Example YZ-Plane:

.],0[,

),,,,(

1

constinkj

ppwwwwvvuu

pwwvuX

ijkijkijkijkijk

Tk

=∈∀

=====

=

++

+

rrrrr

rrrrr

z

PART 3: Extension to Navier-Stokes

Page 79: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Introduction• The Plane Smoother

ISCR-CASC-LLNL Ignacio Martín Llorente

79/106

• An 2-D direct exact solver is not needed for the planes

• The 2-D system is approximately solved with one cycle of a robust 2-D multigrid

algorithm

• The 2-D algorithm is the combination of line smoothing with semicoarsening

PART 3: Extension to Navier-Stokes

Page 80: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

X

Y

Z

YGeometrically stretched near the plate edges due to high residuals and errors

Geometrically stretched in x direction to capture the boundary layer

Boundary Layer of a Flat Plate at Yaw• Domain

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

cascade of square plates

80/106

Page 81: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Boundary Layer of a Flat Plate at Yaw• Boundary Conditions

• Region of a favorable pressure gradient that accelerates the velocity at the outlet in order to match the wrong outflow condition

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

• The outflow boundary condition is derived using Goldstein's calculations of the velocity distribution in the wake of afinite 2-D flat plate [Gold33]

• The low pressure zone in the wake of the plate disappears

81/106

Page 82: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Boundary Layer of a Flat Plate at Yaw• Non-zero Yaw Angle

The two problems associated with this simulationare boundary layers and entering flows with non-

aligned characteristics

Periodicity in the velocity field

Boundary conditions are obtained by rotating Goldstein's calculations

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

82/106

Page 83: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Results• Validation with Blasius Theory

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

83/106

Page 84: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

• We will confine Re<105 and assume that the wake remains laminar at a distance of 2L behind the trailing edge

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

84/106Results• Validation with Blasius Theory

Page 85: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

• The convergence attained within the first five cycles is below 0.1 for both smoothers and yaw angles for Reynolds numbers bellow 10000

Results• Convergence Rate

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

85/106

Page 86: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Results• Convergence Rate

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

86/106

Page 87: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Results• Convergence Rate

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

87/106

Page 88: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

88/106

Upwind stencil

No dependencies No dependencies

QUICK stencil

1-D decomposition on the semicoarsened direction

•Traditional Zebra order of planes is not possible (correction of the convective terms)

• Planes scanned in a 4-c fashion

Zebra Ordering of Planes Tri-Plane Ordering of Planes

Planes solved simultaneously

PART 3: Extension to Navier-Stokes

Parallel Implementation• 1-D Decomposition

Page 89: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

• U-cycle

• The planes are solved with Semi-coarsening + Line Smoothing (the line solvers are not distributed)

Parallel Implementation• 2-D Decomposition

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

89/106

Page 90: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Objectives of the study:

• Fast Ethernet vs. Giganet-VIA

• Single node vs. dual node computing

• Scalability

Parallel Implementation• Parallel Architecture

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

MpiPro and Porland Group compiler

90/106

Page 91: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Simulations using Dual-Nodes and a 32x128x128 grid

Parallel Implementation• Analysis of the Interconnection Alternatives

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

VIA

TCP

TT

Drop in Efficiency due to the higher interconnection latency and lower bandwidth

Speedup =

91/106

Page 92: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Parallel Implementation• Analysis of the Execution Node Alternatives

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

Simulations using VIA and a 32x128x128 grid

Drop in Efficiency due to both a single network card and memory sharing

92/106

VIA

TCP

TT

Speedup =

Page 93: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Parallel Implementation• Scalability

ISCR-CASC-LLNL Ignacio Martín LlorentePART 3: Extension to Navier-Stokes

2x1

2x2 4x4 8x4

32x64x32

32x64x64

32x128x64

32x128x12864x128x128 128x128x128

2x4

),(),(

),(),(),(

pNEnpnNE

npnNTpNTnppST ==

scal

ed e

ffic

ien

cy

The scaled efficiency is bounded away from zero, so the pair algorithm-architecture is scalable

93/106

Page 94: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Relation to the MG/NAS Parallel Benchmark• Description of MG/NAS

ISCR-CASC-LLNL Ignacio Martín Llorente

94/106

PART 3: Extension to Navier-Stokes

Goal of the NAS Parallel Benchmark (www.nas.nasa.gov):• Estimate the performance of a real CFD application on a parallel system by solving

the following MPI-based source kernels: EP, MG, CG, FT, IS, LU, SP and BT

The NAS-MG multigrid benchmark solves Poisson's equation in 3-D with periodic boundary conditions using multigrid V-cycles on a uniform 256x256x256 grid (class-B, 20 cycles)

Grid partitioning is applied in the parallel implementation

Page 95: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Relation to the MG/NAS Parallel Benchmark• Description of MG/NAS

ISCR-CASC-LLNL Ignacio Martín Llorente

95/106

PART 3: Extension to Navier-Stokes

MG/NAS

0

50

100

150

200

250

300

1 10 100

Number of processors

Tim

e

VIA singleVIA doubleTCP singleTCP doubleO2KT3E

VIA single VIA double TCP single TCP double O2K T3E1 248,23 248,23 248,23 248,23 207,492 123,3 174,38 133,55 174,67 143,124 68,4 97,23 79,5 105,36 82,75 52,18 31,52 44,71 40,13 53,15 41,85 22,416 18,33 26,82 24,26 34,73 22,23 12,932 17,28 23,11

The poor data locality in the message passing affects more strongly to the MIPS 10000 processor performance(10 MB/seg with poor data locality, see PrLT00)

Page 96: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Relation to the MG/NAS Parallel Benchmark• Our Code as a Benchmark

ISCR-CASC-LLNL Ignacio Martín Llorente

96/106

PART 3: Extension to Navier-Stokes

Navier Stokes

0

500

1000

1500

2000

2500

1 10 100

Number of processors

Tim

e

VIA singleVIA doubleTCP singleTCP doubleO2K

VIA single VIA double TCP single TCP double O2K1 2236,87 2236,87 2236,87 2236,87 1898,52 1163,19 1217,47 1060,31 1209,43 753,54 547,66 644,394 588,974 707,026 4848 294,025 351,678 350,033 425,145 242

16 152,238 187,349 183,912 235,054 129,532 110,956 144,248 59

The plane implicit smoother exhibits a higher locality

Page 97: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

97/106

PART 3: Extension to Navier-Stokes

Our code shows a higher computation to communication ratio

Relation to the MG/NAS Parallel Benchmark• Comparation to Our Code

Page 98: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

98/106

• Textbook Multigrid Convergence is attained for the mode problem (flat plate at yaw):

• Independent of Grid Size

• High Stretching Factors

• Reynolds number up to 105

• Parallel version of the smoother exhibits similar convergence properties to the lexicographic order with good scalability

• Coral shows better performance than O2K and of course much better performance-cost ratio

• Real CFD codes shows a higher computation to communication ratio and higher memory access locality than the NAS/MG kernel due to the need of implicit smoothers. So our code characterizes better the CFD work load

PART 3: Extension to Navier-Stokes

Conclusions of the Third Part

Page 99: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

99/106

PART 3: Extension to Navier-Stokes

Details of this research in

R. S. Montero, I. M. Llorente and M. D. Salas, Robust Multigrid Algorithms for the Navier-Stokes Equations, in press, Computational

Physics, Academic Press

R. S. Montero, I. M. Llorente and M. D. Salas, Semicoarsening and Implicit Smoothers for the Simulation of a Flat

Plate at Yaw, ICASE Report No. 2001-13

Conclusions of the Third Part

Page 100: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Future Research

ISCR-CASC-LLNL Ignacio Martín Llorente

100/106

• Flow over rectangular bodies sitting on a flat plate to study the flow field over a MEMS (MEMS= micro-electronic-mechanical systems) device or over buildings

• Turbulence model

• Multiblock grids with two levels of parallelism MPI-OpenMP (LlDM00)

• Distributive smoothers for Navier-Stokes

• Non-cartesian grids

• Incompressible Navier-Stokes

Page 101: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

Acknowledgements and People Involved in this Research

ISCR-CASC-LLNL Ignacio Martín Llorente

101/106

Universidad Complutense de Madrid Institute for Computer Application In Science and Engineering

NASA Langley

Francisco Tirado FernándezManuel Prieto Matías

Rubén Santiago Montero

Manuel D. SalasBoris Diskin

N. Duane MelsonJames L. Thomas

This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-97046 while the authors were in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681-2199. The research was also supported in part by the Spanish research

grant CICYT TIC 99/0474 and the US-Spain Joint Commission for Scientific and Technological Cooperation

Page 102: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

102/106

THANKS FOR YOUR ATTENTION!

MUCHAS GRACIAS

I am sorry if I have forgotten to include any of your references

Page 103: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

103/106References

Bran84 A. Brandt, Multigrid Techniques: 1984 Guide with Applications to Fluid Dynamic, GMD-Sudien 85,1984

Bran98 Achi Brandt, Barriers to Achieving Textbook Multigrid Efficiency (TME) in CFD, ICASE Interim Report 32, 1998

BrFJ00 P. N. Brown, R. D. Falgout, and J. E. Jones, Semicoarsening Multigrid on Distributed Memory Machines, SIAM Journal of Scientific Computing, Vo. 21, Nº 5, 2001

BrMR82 A. Brandt, S. F. McCormick, and J. W. Ruge, Algebraic multigrid (AMG) for automatic multigrid solutions with application to geodetic computations. Report, Inst. for Computational Studies, Fort Collins, Colo., 1982

BrYa92 Achi Brandt, and I. Yavneh, On Multigrid Solution of High-Reynolds Incompressible Entering Flows, Journal of Computational Physics, Vol. 101, pp. 151-164, 1992

CFHJ00 Cleary, A.J., R.D. Falgout, V.E. Henson, J.E. Jones, T.A. Manteuffel, S.F. McCormick, G.N. Miranda,and J.W. Ruge, Robustness and Scalability of Algebraic Multigrid, SIAM Journal on ScientificComputing, Vol. 21, Nº 5, 2000

Disk99 B. Diskin, Solving Upwind-Biased Discretization II: Multigrid Solver Using Semicoarsening, ICASE Report 99-14, March 1999

DMRR89 J. Dendy, Multigrid methods for the three-dimensional petroleum reservoir simulation, S. McCormick, J. Ruge, T. Russell, and S. Schaffer, Proc. tenth SPE Symposium on Reservoir Simulation, 1989

Page 104: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

104/106References

FrMc91 P. O. Frederickson, O. A. McBryan. Normalized convergence rates for the PSMG method. SIAM J. Sci. Stat. Comput., Vol 12, 1991, pp. 221-229

GrGK93 A. Grama, A. Gupta, and V. Kumar, Isoefficiency Function: A Scalability Metric for Parallel Algorithms and Architectures, IEEE Parallel and Distributed Technology, Vol. 1, Nº 3, 1993

Gold33 S. Goldstein, On the Two-dimensional Steady Flow of Viscous Fluid behind a Solid Body, Proc. Royal Society of London, Series A, pp. 545-573

Gust88 J. L. Gustafson, Reevaluating Amdahl’s Law, Communications of the ACM, Vol. 31, Nº 5, 1988

HaHG92 T. Hayase, J. Humphrey, and R Greif, A Consistently Formulated QUICK Scheme for Fast and Stable Convergence Using Finite-Volume Iteration Calculation Procedures, Journal of Computational Physics, Vol. 98, pp. 108-118, 1992

LlMe00 I. M. Llorente, and N. D. Melson, Behavior of Plane Relaxation Methods as Multigrid Smoothers, Special ETNA issue on the Ninth Copper Mountain Conference on Multigrid Methods, Vol 10, 2000

LlDM00 I. M. Llorente, B. Diskin, N. D. Melson. Plane smoothers for multiblock grids: Computational aspects. SIAM Journal on Scientific Computing, Edited by SIAM, Vol. 22 Number 1 pp. 218-242, 2000

LlTi97 I. M. Llorente, and F. Tirado, Relationships between Efficiency an Execution Time of Full Multigrid Methods on Parallel Computers, IEEE Transactions on Parallel and Distributed Systems, Vol. 8, Nº 6, 1997

LlTV96 I. M. Llorente, and F. Tirado, and L. Vázquez, Some Aspects about the Scalability of Scientific Applications on Parallel Computers, Parallel Computing, Vol. 22, 1996

Page 105: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

105/106References

MoDe99 D. Moulton, J. Dendy. MPI-based black box multigrid for workstation clusters. Proc. of the ninth Copper Mountain Conference on Multigrid Methods, 1999

MoLS01 R. S. Montero, I. M. Llorente and M. D. Salas, Robust Multigrid Algorithms for the Navier-Stokes Equations, in press, Computational Physics, Academic Press

Muld89 W. A. Mulder. A new multigrid approach to convection problems. J. Comput. Phys, 83, 1989

OGWW98 C. W. Oosterlee, F. Gaspar, T. Washio, and R. Wienands, Multigrid Line Smoothers for Higher Order Upwind Discretizations of Convection-Dominated Problems, Journal of Computational Physics, Vol. 1, pp. 274-307, 1998

Oost95 C. W. Oosterlee. The convergence of parallel multiblock multigrid methods. Appl. Num. Mathematics19, 1995

OvRo93 A. Overman, J. van Rosendale. Mapping Robust Parallel Multigrid Algorithms to Scalable Memory Architectures. ICASE Report 93-24 1993

Vank86 S. P. Vanka. Block-Implicit Multigrid Solution of Navier-Stokes Equation in Primitive Variables, Journal of Computational Physics, Vol. 65, pp. 94-121, 1986

PrLT00 M. Prieto, I. M. Llorente and F. Tirado, Data Locality Exploitation in the Decomposition of Regular Domain Problems, IEEE Transactions on Parallel and Distributed Systems, Vol. 11, Nº 11, 2000, pp. 1141-1150

PSEL01 M. Prieto, R. Santiago, D. Espadas, I. M. Llorente and F.Tirado, Parallel Multigrid for Anisotropic Elliptic Equations, Journal of Parallel and Distributed Computing, Vol. 61, No. 1, 2001

Page 106: Parallel and Robust Multigrid Techniques on Structured Gridsdsa-research.org/lib/exe/fetch.php?media=people:llorente:llnl.pdf · Parallel and Robust Multigrid Techniques on Structured

ISCR-CASC-LLNL Ignacio Martín Llorente

106/106References

RuSt87 J. W. Ruge and K. Stuuben, Algebraic Multigrid, in Multigrid Methods, Frontiers in Applied Mathematics, SIAM, 1987

Scha98 S. Schaffer, A Semicoarsening Multigrid Method for Elliptic Partial Differential Equations with Highly Discontonuos and Anisotropic Coefficient, SIAM Journal of Scientific Computing, Vol. 20, Nº1, 1998

SuNi90 X. Sun, and L. M. Ni, Another View of Parallel Speedup, Supercomputing ’90 Proceedings, 1990

XiSc97 Dexuan Xie, Ridgway Scott. The Parallel U-Cycle multigrid method. Proc. eighth Copper Mountain Conference on Multigrid Methods. Copper Mountain, Colorado, Abril 1997.