
Domain Decomposed Parallel Heat Distribution Problem in Two Dimensions

Yana Kortsarts, Jeff Rufinus

Widener University Computer Science Department

Introduction

In 2004 the Office of Science in the Department of Energy issued a twenty-year strategic plan with seven highest priorities, ranging from fusion energy to genomics. Achieving the necessary levels of algorithmic and computational capability requires educating students in computation and computational techniques. Parallel computing is one of the most attractive topics in the computational science field.

Introductory Parallel Computing Course

Offered by the Computer Science Department at Widener University to CS and CIS majors

A series of two courses: Introduction to Parallel Computing I and II

Resources: a computer cluster of six nodes; each node has two 2.4 GHz processors and 1 GB of memory, and the nodes are connected by a Gigabit Ethernet switch.

Course Curriculum

• Matrix manipulation and numerical simulation concepts
• Direct applications in science and engineering
• Introduction to MPI libraries and their applications
• Concepts of parallelism
• Finite difference method for the 2-D heat equation using a parallel algorithm

2-D Heat Distribution Problem

The problem: to determine the temperature u(x,y,t) in an isotropic two-dimensional rectangular plate

The model:

$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}, \qquad 0 < x < 1, \; 0 < y < 1, \; t > 0$$

Initial condition:

$$u(x, y, 0) = 0$$

Boundary conditions:

$$u(0, y, t) = 100, \quad u(1, y, t) = 0, \quad u(x, 0, t) = 0, \quad u(x, 1, t) = 0$$

Finite Difference Method

The finite difference method begins with the discretization of space and time, so that the temperature is calculated at an integer number of grid points in space and at an integer number of discrete times.

[Figure: discretization of the plate into grid points $(x_i, y_j)$ and of time into levels $t_k$, $t_{k+1}$]

We will use the following notation:

$$u_{i,j,k} = u(x_i, y_j, t_k), \quad \text{where } x_i = i\,\Delta x,\; y_j = j\,\Delta y,\; t_k = k\,\Delta t,$$
$$i = 0, 1, 2, \ldots, N, \quad j = 0, 1, 2, \ldots, M, \quad k = 0, 1, \ldots$$

We will use the finite difference approximations for the derivatives:

$$\frac{u_{i,j,k+1} - u_{i,j,k}}{\Delta t} = \frac{u_{i+1,j,k} - 2u_{i,j,k} + u_{i-1,j,k}}{(\Delta x)^2} + \frac{u_{i,j+1,k} - 2u_{i,j,k} + u_{i,j-1,k}}{(\Delta y)^2}$$

Expressing $u_{i,j,k+1}$ from this equation yields:

$$u_{i,j,k+1} = r\,(u_{i+1,j,k} + u_{i-1,j,k}) + (1 - 2r - 2s)\,u_{i,j,k} + s\,(u_{i,j+1,k} + u_{i,j-1,k}) \tag{1}$$

where

$$r = \frac{\Delta t}{(\Delta x)^2}, \qquad s = \frac{\Delta t}{(\Delta y)^2}$$

Initial and boundary conditions:

$$u_{i,j,0} = 0, \quad u_{0,j,k} = 100, \quad u_{N,j,k} = 0, \quad u_{i,0,k} = 0, \quad u_{i,M,k} = 0$$

Finite Difference Method: Explicit Scheme

Stability condition:

$$\Delta t \left( \frac{1}{(\Delta x)^2} + \frac{1}{(\Delta y)^2} \right) \le \frac{1}{2}$$
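As a worked special case (not on the slides): on a uniform grid with $\Delta x = \Delta y = h$ the condition reduces to

$$\Delta t \left( \frac{1}{h^2} + \frac{1}{h^2} \right) \le \frac{1}{2} \quad\Longrightarrow\quad \Delta t \le \frac{h^2}{4},$$

so halving the grid spacing forces a four-fold smaller time step.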

[Figure: the five-point stencil — $u_{i,j,k+1}$ at time level $k+1$ is computed from $u_{i,j,k}$, $u_{i \pm 1,j,k}$, and $u_{i,j \pm 1,k}$ at level $k$]

Single Processor Implementation

double u_old[n+1][n+1], u_new[n+1][n+1];
Initialize u_old with initial values and boundary conditions;
while (still time points to compute) {
    for (i = 1; i < n; i++) {
        for (j = 1; j < n; j++) {
            compute u_new[i][j] using formula (1)
        } // end of for
    } // end of for
    u_old <- u_new;  // copy the new interior values into u_old
} // end of while
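A minimal compilable C version of the sketch above. The grid size N, the step count NSTEPS, and the choice r = s = 0.2 are illustrative assumptions (picked so that r + s <= 1/2 satisfies the stability condition), not values from the slides:

    #include <stdio.h>

    #define N      100    /* grid points per side: indices 0..N (assumed) */
    #define NSTEPS 1000   /* number of time steps to run (assumed)        */

    double u_old[N + 1][N + 1], u_new[N + 1][N + 1];

    int main(void) {
        const double r = 0.2, s = 0.2;  /* r + s <= 1/2: stability holds */
        int i, j, k;

        /* Initial condition u = 0; boundary condition u(0, y, t) = 100. */
        for (i = 0; i <= N; i++)
            for (j = 0; j <= N; j++)
                u_old[i][j] = (i == 0) ? 100.0 : 0.0;

        for (k = 0; k < NSTEPS; k++) {
            /* Formula (1) applied to the interior points. */
            for (i = 1; i < N; i++)
                for (j = 1; j < N; j++)
                    u_new[i][j] = r * (u_old[i+1][j] + u_old[i-1][j])
                                + s * (u_old[i][j+1] + u_old[i][j-1])
                                + (1.0 - 2.0*r - 2.0*s) * u_old[i][j];
            /* Copy the interior back; the boundary rows and columns stay fixed. */
            for (i = 1; i < N; i++)
                for (j = 1; j < N; j++)
                    u_old[i][j] = u_new[i][j];
        }

        printf("temperature at the plate center: %f\n", u_old[N/2][N/2]);
        return 0;
    }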

Parallel Implementation: Domain Decomposition

Dividing computation and data into pieces. The domain could be decomposed in three ways:

• Column-wise: adjacent groups of columns (A)
• Row-wise: adjacent groups of rows (B)
• Block-wise: adjacent groups of two-dimensional blocks (C)

[Figure: the three decompositions (A), (B), and (C)]

Domain Decomposition and Partition

Example: column-wise domain decomposition method, 200 points to be calculated simultaneously, 4 processors.

MPI_Send and MPI_Recv

[Figure: neighboring processors p#1 and p#2 each send their edge column and receive the neighbor's edge column]

Processor 1: x0 … x49
Processor 2: x50 … x99
Processor 3: x100 … x149
Processor 4: x150 … x199
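A hedged sketch in C/MPI of this column-wise exchange: each process stores its block of columns plus one ghost column on each side, and swaps edge columns with its neighbors before every time step. The slides use matched MPI_Send/MPI_Recv pairs; the sketch below uses MPI_Sendrecv, which combines the pair and avoids the deadlock risk of blocking sends. NY, TAG, and local_cols are illustrative assumptions:

    #include <mpi.h>
    #include <stdlib.h>

    #define NY  101   /* grid points in the y-direction (assumed) */
    #define TAG 0     /* message tag (assumed)                    */

    /* u is stored one column per row of the local array, so each column
     * is contiguous and can be sent directly. Columns 1..local_cols are
     * owned; columns 0 and local_cols+1 are ghost copies of neighbors. */
    void exchange_ghosts(double (*u)[NY], int local_cols,
                         int rank, int nprocs) {
        int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send own left edge to the left neighbor, receive right ghost. */
        MPI_Sendrecv(u[1],              NY, MPI_DOUBLE, left,  TAG,
                     u[local_cols + 1], NY, MPI_DOUBLE, right, TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Send own right edge to the right neighbor, receive left ghost. */
        MPI_Sendrecv(u[local_cols],     NY, MPI_DOUBLE, right, TAG,
                     u[0],              NY, MPI_DOUBLE, left,  TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    int main(int argc, char **argv) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int local_cols = 50;  /* e.g., 200 columns over 4 processes */
        double (*u)[NY] = calloc(local_cols + 2, sizeof *u);

        exchange_ghosts(u, local_cols, rank, nprocs);
        /* ...apply formula (1) to owned columns 1..local_cols, repeat... */

        free(u);
        MPI_Finalize();
        return 0;
    }

MPI_PROC_NULL makes the sends and receives at the domain edges no-ops, so the first and last processes need no special-case code.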

Load Imbalance

When dividing the data among processes we have to pay attention to the amount of work assigned to each processor.

Uneven load distribution may cause some processes to finish earlier than others.

• Load imbalance is one source of overhead
• Good task mapping is needed
• All tasks should be mapped onto processes as evenly as possible, so that all tasks complete in the shortest amount of time and idle time is minimized (see the block-distribution sketch below)
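The standard block-distribution formulas handle the case where the number of points does not divide evenly by the number of processes. The function names below are hypothetical, but the arithmetic guarantees that per-process counts differ by at most one:

    #include <stdio.h>

    /* Block distribution of n items over p processes:
     * process i owns items block_lo(i) .. block_hi(i) inclusive. */
    int block_lo(int i, int p, int n) { return (int)((long long)i * n / p); }
    int block_hi(int i, int p, int n) { return block_lo(i + 1, p, n) - 1; }

    int main(void) {
        int n = 200, p = 4;  /* the slides' example: 200 points, 4 processors */
        for (int i = 0; i < p; i++)
            printf("Processor %d: x%d ... x%d\n",
                   i + 1, block_lo(i, p, n), block_hi(i, p, n));
        return 0;
    }

For n = 200 and p = 4 this prints exactly the partition above; for n = 10 and p = 3 it yields blocks of 3, 3, and 4 points.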

Communication

Communication time depends on the latency and the bandwidth of the communication network; both are far slower than the speed at which the CPU computes.

There is a penalty for using too much communication.

[Figure: (A) column-wise layout with P0 P1 P2 P3 side by side; (B) block-wise 2 × 2 layout of P0, P1, P2, P3]

Running Time and Speedup

The running time of one time iteration of the sequential algorithm is $\Theta(MN)$, where M and N are the numbers of grid points in each direction.

The running time of one time iteration of the parallel algorithm is:

computational time + communication time = $\Theta(MN/p) + B$

where p is the number of processors and B is the total send-receive communication time required for one time iteration.

The speedup is defined as:

$$\text{speedup} = \frac{\text{running time of the sequential algorithm}}{\text{running time of the parallel algorithm}}$$
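Plugging the running times above into this definition gives a back-of-the-envelope model of how the communication term B caps the speedup (a sketch assuming the sequential time is $c\,MN$ for some machine-dependent constant $c$, an assumption not made explicit on the slides):

$$\text{speedup} = \frac{c\,MN}{c\,MN/p + B} = \frac{p}{1 + pB/(c\,MN)}$$

When $c\,MN \ll B$ (small inputs) the ratio drops below one, matching the 1,000-point results below; when $c\,MN \gg pB$ (large inputs) the speedup approaches $p$.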

Results

[Figure: the temperature distribution on the two-dimensional plate at a much later time]

Results

• Two cases were considered: M × N = 1,000 and M × N = 500,000.
• The next slide shows the speed-up versus the number of processors for the two inputs: 500,000 (the top chart) and 1,000 (the bottom chart).
• The dashed line marks a speed-up of one, i.e., the sequential version of the algorithm.
• A higher speed-up (at a given number of processors) means better performance of the parallel algorithm.
• Most of the results come from the column-wise domain decomposition method.

Results

[Charts: speed-up versus the number of processors for inputs of 500,000 (top) and 1,000 (bottom)]

Results

For the case of input = 1,000, the sequential version (p = 1) is faster than the parallel version (p ≥ 2). The parallel version is slower because of the latency and limited speed of the communication network, overhead that does not exist in the sequential version.

The top chart shows the speedup versus the number of processors for a total input of 500,000. In this case the speedup increases with the number of processors, reaching ~4.13 at p = 10.

For a large number of inputs the computation time grows much faster than the communication time, so the communication overhead is amortized and the parallel algorithm performs better.

Speedup comparisons for column-wise and block-wise decomposition methods with 4 and 9 processors:

Total inputs | Column-wise, P = 4 | Column-wise, P = 9 | Block-wise, P = 4 | Block-wise, P = 9
1,000        | 0.093              | 0.05               | 0.065             | 0.04
500,000      | 2.12               | 3.82               | 2.19              | 3.51

Results

Overall, the speedups between the two methods are not very different.

For number of inputs = 1,000 the column-wise decomposition produces better speed-ups than the block-wise decomposition.

For number of inputs = 500,000, we have a mixed result. The column-wise method performs better for 9 processors while the block-wise method performs (slightly) better for 4 processors.

The results given in the table are not conclusive about which decomposition method is better; settling the question would require extending the number of inputs and the number of processors beyond those used here.

Summary

Numerical simulation of two-dimensional heat distribution has served as an example for teaching parallel computing concepts in an introductory course.

With this simple example we introduce the core concepts of parallelism:
• Domain decomposition and partitioning
• Load balancing and mapping
• Communication
• Speedup

We showed benchmarking results for the parallel version of the two-dimensional heat distribution problem with different numbers of processors.
