
Linux Clusters and Tiled Display Walls
July 30 - August 1, 2002

MPI Virtual Topologies

Kadin Tseng
Scientific Computing and Visualization Group
Boston University

Virtual Topologies

Frequently, realistic science and engineering problems are inherently two- or higher-dimensional. For parallel computing, these higher physical dimensions often lead to domain decompositions of a comparable number of dimensions in the computational domain. Ordinarily, the process ranks of the decomposition are expressed in linear order (i.e., 0, 1, 2, …). To enable a code to refer to these ranks with multi-dimensional coordinates similar to those arising in a domain decomposition, the MPI library provides a set of virtual topology routines. There are two kinds of virtual topologies: Cartesian and general. Only the Cartesian topology routines are discussed in the following.

Example: A 9 x 4 Array

• Consider the case of calculations associated with a 9 x 4 matrix.

• The parenthesized number pairs, (i, j), denote array row and column indices, respectively.

• Assume six processes are available for the parallel computation.

• One domain decomposition leads to six 3 x 2 submatrices, as shown in the table below.

• The number in each table cell denotes the process rank to which array element (i, j) belongs; a small helper that computes this rank follows the table.

          j=0   j=1   j=2   j=3
  i=0      0     0     1     1
  i=1      0     0     1     1
  i=2      0     0     1     1
  i=3      2     2     3     3
  i=4      2     2     3     3
  i=5      2     2     3     3
  i=6      4     4     5     5
  i=7      4     4     5     5
  i=8      4     4     5     5
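As an illustration (an addition, not in the original slides), the owning rank of element (i, j) under this decomposition can be computed directly; the helper name is hypothetical:

/* owner(): rank that owns element (i,j) of the 9x4 array decomposed
 * into 3x2 blocks on a 3-row by 2-column process grid, with ranks
 * assigned in row-major order */
int owner(int i, int j)
{
    return (i / 3) * 2 + (j / 2);   /* block row * grid width + block column */
}

For example, element (7, 2) lies in block row 2, block column 1, so owner(7, 2) = 2*2 + 1 = 5, in agreement with the table.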

Domain Decomposition

• Domain decomposition yields a linear-order representation of the process topology.

• The number in each cell denotes the process, or thread, number.

• Each cell represents a 3 x 2 block of the 9 x 4 array.

• The work within each cell is performed by a single process.

0 1

2 3

4 5

2D Cartesian Topology

• The linear rank order can be mapped onto a 2D Cartesian topology (i, j) via an MPI function call (see the sketch after the grid below).

• Example: linear rank 3 can be addressed by the 2D Cartesian coordinates (1, 1).

• Each cell represents a 3 x 2 matrix block whose work will be performed by the indicated process rank.

• MPI indices start from 0.

• MPI ranks follow the row-major convention.

  (0,0)   (0,1)
    0       1

  (1,0)   (1,1)
    2       3

  (2,0)   (2,1)
    4       5
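A minimal sketch (an addition, not in the original slides) of creating and querying this 3 x 2 topology in C; run with exactly six processes, so that rank 3 reports coordinates (1,1), matching the grid above:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, coords[2];
    int dims[2]    = {3, 2};    /* 3 rows by 2 columns of processes  */
    int periods[2] = {0, 0};    /* no wraparound in either direction */
    MPI_Comm grid_comm;

    MPI_Init(&argc, &argv);
    /* map the six linear ranks onto a 3 x 2 Cartesian grid (row-major) */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid_comm);
    MPI_Comm_rank(grid_comm, &rank);
    MPI_Cart_coords(grid_comm, rank, 2, coords);   /* rank -> (i, j) */
    printf("rank %d <-> Cartesian (%d,%d)\n", rank, coords[0], coords[1]);
    MPI_Finalize();
    return 0;
}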

Cartesian Topology Solver Example

• Laplace Equation

• Laplace Equation Discretized

• Computational Domain

– Five-point finite-difference stencil

• Jacobi Scheme

• Parallel Jacobi Scheme

– Unknowns At Border Cells

– Message Passing For Boundary Cells

– For Individual Threads . . .

– MPI Functions Needed For Job

• Successive Over Relaxation (SOR) Scheme

• Parallel SOR Scheme

• Scalability

Laplace Equation

Laplace Equation:

  \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad x, y \in [0,1]    (1)

Boundary Conditions:

  u(x,0) = \sin(\pi x), \qquad 0 \le x \le 1
  u(x,1) = \sin(\pi x)\,e^{-\pi}, \qquad 0 \le x \le 1    (2)
  u(0,y) = u(1,y) = 0, \qquad 0 \le y \le 1

Analytical solution:

  u(x,y) = \sin(\pi x)\,e^{-\pi y}, \quad x, y \in [0,1]    (3)

Laplace Equation Discretized

Discretizing Equation (1) by centered differences yields:

  u^{n+1}_{i,j} = \frac{1}{4}\left( u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} \right), \quad i, j = 1, 2, \ldots, m    (4)

where n and n+1 denote the current and next time step, respectively, while

  u^n_{i,j} = u^n(x_i, y_j), \quad x_i = i\,\delta x, \ \ y_j = j\,\delta y, \quad i, j = 0, 1, 2, \ldots, m+1

For simplicity, we take

  \delta x = \delta y = \frac{1}{m+1}    (5)
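As a quick sanity check (an addition, not in the original slides), the short C program below verifies numerically that the analytical solution (3) nearly satisfies the discrete equation (4); the maximum residual printed is small and shrinks as m grows:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double PI = acos(-1.0);
    /* check that u(x,y) = sin(pi x) exp(-pi y) nearly satisfies Eq. (4) */
    for (int m = 15; m <= 255; m = 2*m + 1) {
        double h = 1.0/(m + 1);           /* grid spacing, Eq. (5) */
        double maxres = 0.0;
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= m; j++) {
                double u  = sin(PI*i*h)*exp(-PI*j*h);
                double up = ( sin(PI*(i+1)*h)*exp(-PI*j*h)
                            + sin(PI*(i-1)*h)*exp(-PI*j*h)
                            + sin(PI*i*h)*exp(-PI*(j+1)*h)
                            + sin(PI*i*h)*exp(-PI*(j-1)*h) )*0.25;
                double r = fabs(up - u);  /* residual of Eq. (4) at (i,j) */
                if (r > maxres) maxres = r;
            }
        }
        printf("m = %3d   max residual = %e\n", m, maxres);
    }
    return 0;
}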

Computational Domain

The computational domain is the unit square, with index i running along x and index j along y. At the interior points the five-point update (4) applies:

  u^{n+1}_{i,j} = \frac{1}{4}\left( u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} \right), \quad i, j = 1, 2, \ldots, m

while the exterior cells carry the boundary values

  u(0,y) = 0, \qquad u(1,y) = 0, \qquad u(x,0) = \sin(\pi x), \qquad u(x,1) = \sin(\pi x)\,e^{-\pi}

Five-point Finite-Difference Stencil

Interior (or solution) cells: where the solution of the Laplace equation is sought. The stencil centered at an interior cell (i, j) (marked o) couples it to its four neighbors (marked x):

        x
     x  o  x
        x

  u^{n+1}_{i,j} = \frac{1}{4}\left( u^n_{i+1,j} + u^n_{i-1,j} + u^n_{i,j+1} + u^n_{i,j-1} \right)

Exterior (or boundary) cells: in the original figure, blue cells denote cells where non-homogeneous boundary conditions are imposed, while homogeneous boundary conditions are shown as green cells.

Jacobi Scheme

1. Make an initial guess for u at all interior points (i, j) at time n.

2. Use the 5-pt stencil to compute u^{n+1}_{i,j} at all interior points (i, j).

3. Stop if the prescribed convergence threshold is reached; otherwise continue to the next step.

4. Update: u^n_{i,j} = u^{n+1}_{i,j} for all interior i and j.

5. Go to step 2.

This is a simple iterative scheme that lends itself to intuitive instruction, but its slow convergence renders it impractical for real applications.

Solution Contour Plot

(figure: contour plot of the solution)

Jacobi Iterative Solver – F77

PROGRAM Jacobi

! Solve Laplace equation using Jacobi iteration method

IMPLICIT NONE

INTEGER m, mp, mp2, iter, i, j, K, P

REAL*8 TOL, gdel, PI

REAL start_time, end_time

CHARACTER*40 header, outfile

INTEGER MAX_M, MAXSTEPS, INCREMENT

PARAMETER (P=1, K=0, TOL=1.d-6, PI=3.14159265)

PARAMETER (MAX_M=512, MAXSTEPS=50000, INCREMENT=100)

REAL*8 u((MAX_M+2)*(MAX_M+2))

REAL*8 unew((MAX_M+2)*(MAX_M+2))

m = MAX_M

mp = m/P

gdel = 1.0d0

iter = 0

Jacobi . . . F77 (continued)

call CPUtime(start_time)            ! start timer, measured in seconds
CALL bc(m, mp, u, K, P)             ! initialize and define B.C. for u
DO WHILE (gdel .gt. TOL)            ! iterate until error below threshold
  iter = iter + 1                   ! increment iteration counter
  IF (iter .gt. MAXSTEPS) THEN
    header = 'Number of iterations exceeds limit'
    CALL error(header,-1)
  ENDIF
  CALL update_jacobi(m, mp, u, unew, gdel)
  IF (MOD(iter,INCREMENT) .eq. 0) THEN
    WRITE(*,"('iter,gdel:',i6,e12.4)") iter, gdel
  ENDIF
ENDDO
call CPUtime(end_time)              ! stop timer

Jacobi . . . F77 (continued)

WRITE(*,"(40('#'))")
WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
&     end_time - start_time, P
WRITE(*,"(' Stopped at iteration =',i5)") iter
WRITE(*,"(' The maximum error =',f12.6)") gdel
WRITE(*,"(40('#'))")
outfile = 'plots'
OPEN(unit=40, file=outfile, form='formatted', status='unknown')
WRITE(40,"(3i5)") m+2, mp+2, P
CLOSE(40)
WRITE(outfile, "('plots.',i1)") K
OPEN(unit=41, file=outfile, form='formatted', status='unknown')
WRITE(41,"(6e13.4)") (u(i), i=1, (m+2)*(mp+2))
CLOSE(41)
END

bc.f

SUBROUTINE bc(m, n, v, k, p)
IMPLICIT NONE
INTEGER p, m, n, k, i
CHARACTER*40 header
REAL*8 v(0:m+1, 0:n+1), PI
PARAMETER (PI = ACOS(-1.0d0))
CALL init_array(m, n, v)            ! initialize v

IF (p .eq. 1) THEN                  ! serial
  DO i=0,m+1
    v(i,0)   = sin(PI*i/(m+1))      ! at y = 0
    v(i,n+1) = v(i,0)*exp(-PI)      ! at y = 1
  ENDDO

ELSE IF (p .gt. 1) THEN             ! multithreads
  IF (k .eq. 0) THEN                ! 1st thread
    DO i=0,m+1
      v(i,0) = sin(PI*i/(m+1))      ! at y = 0
    ENDDO
  ENDIF
  IF (k .eq. p-1) THEN              ! last thread
    DO i=0,m+1                      ! at y = 1
      v(i,n+1) = sin(PI*i/(m+1))*exp(-PI)
    ENDDO
  ENDIF

ELSE                                ! invalid p
  header = 'p invalid in subroutine BC'
  CALL error(header,-1)
ENDIF
END

update_jacobi.f

SUBROUTINE update_jacobi(m, n, v, vnew, del)

IMPLICIT NONE

INTEGER i, j, m, n

REAL*8 v(0:m+1,0:n+1), vnew(0:m+1,0:n+1), del

del = 0.0d0

DO j=1,n

DO i=1,m

vnew(i,j) = ( v(i,j+1) + v(i+1,j) + v(i-1,j) + v(i,j-1) )*0.25

del = del + DABS(vnew(i,j)-v(i,j)) ! local error sum

ENDDO

ENDDO

DO j=1,n

DO i=1,m

v(i,j) = vnew(i,j) ! update v

ENDDO

ENDDO

END

Jacobi Iterative Solver – C

#include "solvers.h"

INT main() {

/********** MAIN PROGRAM ******************************

* Solve Laplace equation using Jacobi iterative method *

* Kadin Tseng, Boston University, August, 2000 *

*********************************************************/

INT iter, m, mp;

REAL gdel, **u, **un;

CHAR line[10];

fprintf(OUTPUT,"Enter size of interior points, m :");

(void) fgets(line, sizeof(line), stdin);

(void) sscanf(line, "%d", &m);

fprintf(OUTPUT,"m = %d\n",m);

mp = m/P;

u = allocate_2D(m, mp); /* allocate mem for 2D array */

un = allocate_2D(m, mp);

Jacobi … Solver – C (Cont'd)

gdel = 1.0; /* initialize global error to 1 */

iter = 0; /* initialize iteration counter */

bc(m, mp, u, K, P); /* initialize and define B.C. for u */

replicate(m, mp, u, un); /* copy u to un */

while (gdel > TOL) { /* iterate until error below threshold */

iter++; /* increment iteration counter */

if(iter > MAXSTEPS) {

fprintf(OUTPUT,"Iteration terminated (exceeds %6d", MAXSTEPS);

fprintf(OUTPUT," )\n");

return (1); /* nonconvergent solution */

}

update_jacobi(m, mp, u, un, &gdel); /* update u by Jacobi scheme */

if(iter%INCREMENT == 0) {

fprintf(OUTPUT,"iter,gdel: %6d, %lf\n",iter,gdel);

}

}

Jacobi … Solver – C (Cont'd)

fprintf(OUTPUT,"Stopped at iteration %d\n",iter);

fprintf(OUTPUT,"The maximum error = %f\n",gdel);

/* write u to file for use in MATLAB plots */

write_file( m, mp, u, K, P );

return (0);

}

bc.c

void bc(INT m, INT n, REAL **u, INT k, INT p) {

/*********** Boundary Conditions*****************************

* PDE: Laplacian u = 0; 0<=x<=1; 0<=y<=1 *

* B.C.: u(x,0)=sin(pi*x); u(x,1)=sin(pi*x)*exp(-pi); u(0,y)=u(1,y)=0 *

* SOLUTION: u(x,y)=sin(pi*x)*exp(-pi*y) *
************************************************************/

INT i;

init_array( m, n, u); /* initialize u to 0 */

if (p == 1) {

for (i = 0; i <= m+1; i++) {

u[i][ 0] = sin(PI*i/(m+1)); /* at y = 0; all x */

u[i][n+1] = u[i][0]*exp(-PI); /* at y = 1; all x */

}

} else if (p < 1) {

printf("p is invalid\n");

}

bc.c (continued)

if (p > 1) { /* parallel processing */

if (k == 0) { /* the 1st process */

for (i = 0; i <= m+1; i++) {

u[i][0] = sin(PI*i/(m+1)); /* at y = 0; all x */

}

}

if (k == p-1) { /* the last process */

for (i = 0; i <=m+1; i++) {

u[i][n+1] = sin(PI*i/(m+1))*exp(-PI); /* at y = 1; all x */

}

}

}

}

update_jacobi.c

INT update_jacobi( INT m, INT n, REAL **u, REAL **unew, REAL *del) {

INT i, j;

*del = 0.0;

for (i = 1; i <= m; i++) {

for (j = 1; j <= n; j++) {

unew[i][j] = ( u[i ][j+1] + u[i+1][j ] +

u[i-1][j ] + u[i ][j-1] )*0.25;

*del += fabs(unew[i][j] - u[i][j]); /* accumulate local error */

}

}

for (i = 1; i <= m; i++) {

for (j = 1; j <= n; j++) {

u[i][j] = unew[i][j];

}

}

return (0);
}

Domain Decompositions

1D Domain Decomposition: the domain is cut in one direction into strips, one per thread (threads 0–3).

2D Domain Decomposition: the domain is cut in both directions into a 2 x 2 grid of blocks (threads 0–3).

Unknowns At Border Cells – 1D

(figure: three stacked thread strips, threads 0–2, with five-point stencils straddling the strip borders)

The five-point finite-difference stencil applied at thread-domain border cells requires cells from neighboring threads and/or boundary cells.

Message passing is required.

Message Passing to Fill Boundary Cells

(figure: the current thread k receives a ghost row from thread k-1 below and one from thread k+1 above)

For Individual Threads . . .

Recast the 5-pt finite-difference stencil for individual threads:

  v^{n+1}_{\xi,\eta,k} = \frac{1}{4}\left( v^n_{\xi+1,\eta,k} + v^n_{\xi-1,\eta,k} + v^n_{\xi,\eta+1,k} + v^n_{\xi,\eta-1,k} \right)

  \xi = 1, 2, \ldots, m; \quad \eta = 1, 2, \ldots, m'; \quad m' = m/p; \quad k = 0, 1, 2, \ldots, p-1

where v_{\xi,\eta,k} denotes the local solution on thread k of the p threads.

Boundary conditions at the internal thread borders come from the neighboring threads:

  v^n_{\xi,0,k} = v^n_{\xi,m',k-1}, \qquad 0 \le \xi \le m+1; \quad 0 < k \le p-1
  v^n_{\xi,m'+1,k} = v^n_{\xi,1,k+1}, \qquad 0 \le \xi \le m+1; \quad 0 \le k < p-1

Boundary Conditions

• For simplicity, assume m is divisible by p.
• These border values change from one iteration to the next (time-dependent B.C.).
• They are obtained by message passing.
• Additional boundary conditions are given on the next page.

Relationship Between u and v

Physical boundary conditions, expressed in the thread-local variables:

  v_{\xi,0,0} = u(x_\xi, 0) = \sin(\pi x_\xi), \qquad k = 0; \quad 0 \le \xi \le m+1
  v_{\xi,m'+1,p-1} = u(x_\xi, 1) = \sin(\pi x_\xi)\,e^{-\pi}, \qquad k = p-1; \quad 0 \le \xi \le m+1
  v_{0,\eta,k} = u(0, y_{\eta+k m'}) = 0, \qquad 0 \le k \le p-1; \quad 1 \le \eta \le m'
  v_{m+1,\eta,k} = u(1, y_{\eta+k m'}) = 0, \qquad 0 \le k \le p-1; \quad 1 \le \eta \le m'

Relationship between the global solution u and the thread-local solution v (a code sketch of this mapping follows):

  u_{\xi,\,\eta+k m'} = v_{\xi,\eta,k}

  \xi = 1, 2, \ldots, m; \quad \eta = 1, 2, \ldots, m'; \quad m' = m/p; \quad k = 0, 1, 2, \ldots, p-1
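As an illustration of this index mapping (an addition, not in the original slides), a C sketch that copies thread k's local block into its slice of a global array; the helper and array names are hypothetical:

/* Copy thread k's local solution v, dimensioned v[0..m+1][0..mp+1]
 * (mp = m' = m/p), into the global array u, dimensioned
 * u[0..m+1][0..m+1]: the global column index is eta + k*mp. */
void local_to_global(int m, int mp, int k, double **v, double **u)
{
    for (int xi = 1; xi <= m; xi++)
        for (int eta = 1; eta <= mp; eta++)
            u[xi][eta + k*mp] = v[xi][eta];
}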

MPI Functions Used For Job

• MPI_Sendrecv ( = MPI_Send + MPI_Recv) – to set the boundary conditions of individual threads (see the sketch after this list)

• MPI_Allreduce – to compute the global error and determine whether convergence has been reached

• MPI_Cart_create – to create a Cartesian topology

• MPI_Cart_coords – to find the Cartesian coordinates equivalent to a given rank

• MPI_Cart_rank – to find the rank equivalent to given Cartesian coordinates

• MPI_Cart_shift – to find the current thread's adjoining neighbor threads
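A compact sketch (an addition, not in the original slides) of how MPI_Cart_shift, MPI_Sendrecv, and MPI_Allreduce combine for the 1D exchange used here; the buffer names are illustrative. At the ends of this non-periodic topology MPI_Cart_shift returns MPI_PROC_NULL, which turns the corresponding send/receive into a no-op:

#include <mpi.h>

/* Exchange one ghost row in each direction along dimension 0 of
 * grid_comm, then reduce the local error to a global maximum. */
void exchange_and_reduce(double *send_up, double *recv_dn,
                         double *send_dn, double *recv_up,
                         int m, double del, double *gdel,
                         MPI_Comm grid_comm)
{
    int below, above;
    MPI_Status status;

    /* neighbor ranks one step below and above this process */
    MPI_Cart_shift(grid_comm, 0, 1, &below, &above);

    /* send top interior row up, receive bottom ghost row from below */
    MPI_Sendrecv(send_up, m, MPI_DOUBLE, above, 0,
                 recv_dn, m, MPI_DOUBLE, below, 0,
                 grid_comm, &status);
    /* send bottom interior row down, receive top ghost row from above */
    MPI_Sendrecv(send_dn, m, MPI_DOUBLE, below, 1,
                 recv_up, m, MPI_DOUBLE, above, 1,
                 grid_comm, &status);

    /* global convergence test: maximum local error across all ranks */
    MPI_Allreduce(&del, gdel, 1, MPI_DOUBLE, MPI_MAX, grid_comm);
}

The update_bc_2 routines later in the deck follow the same send/receive pattern.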

Successive Over Relaxation

1. Make an initial guess for u at all interior points (i, j).

2. Define a scalar \omega^n, with 0 < \omega^n < 2.

3. Use the 5-pt stencil to compute u'_{i,j} at all interior points (i, j).

4. Compute u^{n+1}_{i,j} = u^n_{i,j} + \omega^n \left( u'_{i,j} - u^n_{i,j} \right).

5. Stop if the prescribed convergence threshold is reached.

6. Update: u^n_{i,j} = u^{n+1}_{i,j}.

7. Go to step 2.

With Chebyshev acceleration, as used in the code that follows, \omega is advanced every half-sweep:

  \rho_J = 1 - \frac{\pi^2}{2(m+2)^2}, \qquad \omega^{(0)} = 1, \qquad \omega^{(1/2)} = \frac{1}{1 - \rho_J^2/2}, \qquad \omega^{(n+1/2)} = \frac{1}{1 - \rho_J^2\,\omega^{(n)}/4}, \quad n = \tfrac12, 1, \tfrac32, \ldots

and \omega approaches the optimal value 2/(1 + \sqrt{1 - \rho_J^2}); a small demonstration follows below.

In step 3, compute u' with u at time n+1 wherever possible to accelerate convergence. This inhibits parallelism.
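To see how this \omega sequence behaves, a small stand-alone C program (an addition, not from the slides) iterates the recurrence for m = 512, the MAX_M used later in the deck, and prints \omega climbing toward its limit:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double PI = acos(-1.0);
    int m = 512;
    double rhoj   = 1.0 - PI*PI*0.5/((m + 2.0)*(m + 2.0));  /* rho_J */
    double rhojsq = rhoj*rhoj;
    double omega  = 1.0;                          /* omega^(0)   */
    omega = 1.0/(1.0 - 0.5*rhojsq);               /* omega^(1/2) */
    for (int n = 1; n <= 1000; n++) {
        omega = 1.0/(1.0 - 0.25*rhojsq*omega);    /* omega^(n+1/2) */
        if (n % 200 == 0)
            printf("half-sweep %4d: omega = %.6f\n", n, omega);
    }
    printf("asymptotic value: %.6f\n", 2.0/(1.0 + sqrt(1.0 - rhojsq)));
    return 0;
}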

Red-Black SOR Scheme

Note that, by virtue of the 5-pt stencil, the solution at a black cell depends only on its four adjacent red cells; conversely, the solution at a red cell depends only on its four adjoining black cells. This suggests the following parallel algorithm:

1. Compute v at the black cells at time n+1 in parallel, using v at the red cells at time n.

2. Compute v at the red cells at time n+1 in parallel, using v at the black cells at time n+1.

3. Repeat steps 1 and 2 until v is fully converged.

The order of steps 1 and 2 can be alternated. An equivalent single-loop formulation is sketched below.
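As an illustration (an addition, not from the slides), one red or black half-sweep can be written with an explicit parity test; in the update_sor routines that follow, the odd/odd plus even/even loop pairs select exactly the cells with i + j even (red), and the remaining cells are black. The helper below is hypothetical:

#include <math.h>

/* One half-sweep over the cells of a single color; do_red is 1 for
 * red cells (i + j even) and 0 for black cells (i + j odd). */
void sweep_color(int m, int n, double **u, double omega,
                 double *del, int do_red)
{
    for (int i = 1; i <= m; i++) {
        for (int j = 1; j <= n; j++) {
            if ((((i + j) % 2) == 0) != do_red)
                continue;                       /* skip the other color */
            double up = ( u[i][j+1] + u[i+1][j] +
                          u[i-1][j] + u[i][j-1] )*0.25;
            u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
            *del += fabs(up - u[i][j]);         /* same error measure as the deck */
        }
    }
}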

Parallel SOR F77 Program

PROGRAM sor_parallel_f77

IMPLICIT NONE

INTEGER m, mp, iter, p, i, j, k

REAL start_time, end_time

CHARACTER*40 header, outfile

REAL*8 TOL, omega, rhoj, rhojsq, PI, del, delr, delb, gdel

INTEGER grid_comm, me, iv, coord, dims, periods, ndim, ierr

INTEGER below, above

LOGICAL reorder

INTEGER MAX_M, MAXSTEP, INCREMENT

PARAMETER (MAX_M = 512, MAXSTEP = 50000, INCREMENT=100)

PARAMETER ( TOL =1.d-6, PI = 3.14159265)

PARAMETER ( periods=0, ndim=1, reorder = .false.)

REAL*8 v((MAX_M+1+2)*(MAX_M+2))

INCLUDE 'mpif.h'

outfile = 'plots'

Parallel SOR F77 Program (cont'd)

CALL MPI_Init(ierr)                          ! starts MPI
CALL MPI_Comm_rank(MPI_COMM_WORLD, k, ierr)
CALL MPI_Comm_size(MPI_COMM_WORLD, p, ierr)

start_time = MPI_Wtime()                     ! start timer, in seconds
m = MAX_M
mp = m/p                                     ! m'
gdel = 1.0d0                                 ! initialize global error to 1
iter = 0                                     ! initialize iteration counter

rhoj = 1.0d0 - PI*PI*0.5/(m+2)/(m+2)         ! Chebyshev coefficient
rhojsq = rhoj*rhoj

Parallel SOR F77 Program (cont'd)

! create cartesian topology for matrix
dims = p
CALL MPI_Cart_create(MPI_COMM_WORLD, ndim, dims,
&    periods, reorder, grid_comm, ierr)
CALL MPI_Comm_rank(grid_comm, me, ierr)
CALL MPI_Cart_coords(grid_comm, me, ndim, coord, ierr)
iv = coord
CALL bc(m, mp, v, iv, p)                     ! set up boundary conditions
CALL MPI_Cart_shift(grid_comm, 0, 1, below, above, ierr)
omega = 1.0d0
CALL update_sor( m, mp, v, omega, delr, 'r')
CALL update_bc_2( m, mp, v, iv, below, above)
omega = 1.0d0/(1.0d0 - 0.50d0*rhojsq)
CALL update_sor( m, mp, v, omega, delb, 'b')
CALL update_bc_2( m, mp, v, iv, below, above)

Parallel SOR F77 Program (cont'd)

DO WHILE (gdel .gt. TOL)                     ! loop until gdel <= TOL
  iter = iter + 1                            ! increment iteration counter
  omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
  CALL update_sor( m, mp, v, omega, delr, 'r')
  CALL update_bc_2( m, mp, v, iv, below, above)
  omega = 1.0d0/(1.0d0 - 0.25d0*rhojsq*omega)
  CALL update_sor( m, mp, v, omega, delb, 'b')
  CALL update_bc_2( m, mp, v, iv, below, above)
  IF (MOD(iter,INCREMENT) .eq. 0) THEN       ! check error once in a while
    del = (delr + delb)*4.d0
    CALL MPI_Allreduce( del, gdel, 1, MPI_DOUBLE_PRECISION,
&        MPI_MAX, MPI_COMM_WORLD, ierr )     ! find global max error
    IF ( iter .ge. MAXSTEP) THEN
      header = 'Number of iterations exceeded limit'
      CALL error(header,-1)
    ENDIF
    IF (k .eq. 0) WRITE(*,'(i5,3d13.5)') iter, del, gdel, omega
  ENDIF
ENDDO

Parallel SOR F77 Program (cont'd)

end_time = MPI_Wtime()                       ! stop timer
IF (k .eq. 0) THEN
  WRITE(*,"(40('#'))")
  WRITE(*,"(' Wall clock time =',f10.2,' x ',i3)")
&      end_time - start_time, p
  WRITE(*,"(' Stopped at iteration ',i5)") iter
  WRITE(*,"(' The maximum error =',f12.6)") gdel
  WRITE(*,"(40('#'))")
  OPEN(unit=40, file=outfile, form='formatted', status='unknown')
  WRITE(40,"(3i5)") m+2, mp+2, p
  CLOSE(40)
ENDIF
WRITE(outfile, "('plots.',i1)") k
OPEN(unit=41, file=outfile, form='formatted', status='unknown')
WRITE(41,"(6e13.4)") (v(i), i=1, (m+2)*(mp+2))
CLOSE(41)
CALL MPI_Finalize(ierr)
END ! PROGRAM sor_parallel_f77

update_sor.f

SUBROUTINE update_sor(m, n, u, omega, del, redblack)
!***********************************************************
!* Updates Laplace solution u based on red-black iterative scheme *
!* m        - (INPUT) 1st dimension size                          *
!* n        - (INPUT) 2nd dimension size                          *
!* u        - (INPUT) solution array; u = u(0:m+1,0:n+1)          *
!* omega    - (INPUT) iterative parameter                         *
!* del      - (OUTPUT) error between two iterates of u            *
!* redblack - (INPUT) which set to update; 'r' for red            *
!***********************************************************
IMPLICIT NONE
CHARACTER redblack
INTEGER m, n, i, ib, ie, j, jb, je
REAL*8 omega, del, up
REAL*8 u(0:m+1,0:n+1)
del = 0.0d0

update_sor.f (cont'd)

IF (redblack .eq. 'r') THEN

! process RED odd points ...

jb = 1

je = n

ib = 1

ie = m

DO j=jb,je,2

DO i=ib,ie,2

up = ( u(i ,j+1) + u(i+1,j ) +

& u(i-1,j ) + u(i ,j-1) )*0.25d0

u(i,j) = (1.0d0 - omega)*u(i,j) +

& omega*up

del = del + dabs(up-u(i,j))

ENDDO

ENDDO

! process RED even points ...

jb = 2

je = n

ib = 2

ie = m

DO j=jb,je,2

DO i=ib,ie,2

up = ( u(i ,j+1) + u(i+1,j ) +

& u(i-1,j ) + u(i ,j-1) )*0.25d0

u(i,j) = (1.0d0 - omega)*u(i,j) +

& omega*up

del = del + dabs(up-u(i,j))

ENDDO

ENDDO

update_sor.f (cont'd)

ELSE IF (redblack .eq. 'b') THEN

! process BLACK odd points ...

jb = 2

je = n

ib = 1

ie = m

DO j=jb,je,2

DO i=ib,ie,2

up = ( u(i ,j+1) + u(i+1,j ) +

& u(i-1,j ) + u(i ,j-1) )*0.25d0

u(i,j) = (1.0d0 - omega)*u(i,j) +

& omega*up

del = del + dabs(up-u(i,j))

ENDDO

ENDDO

! process BLACK even points ...

jb = 1

je = n

ib = 2

ie = m

DO j=jb,je,2

DO i=ib,ie,2

up = ( u(i ,j+1) + u(i+1,j ) +

& u(i-1,j ) + u(i ,j-1) )*0.25d0

u(i,j) = (1.0d0 - omega)*u(i,j) +

& omega*up

del = del + dabs(up-u(i,j))

ENDDO

ENDDO

ELSE

STOP

ENDIF

END

Subroutine update_bc_2

SUBROUTINE update_bc_2( m, mp, v, k, below, above )
IMPLICIT NONE
INCLUDE 'mpif.h'
INTEGER m, mp, k, below, above, ierr
REAL*8 v(0:m+1,0:mp+1)
INTEGER status(MPI_STATUS_SIZE)

CALL MPI_Sendrecv(
&    v(1,mp), m, MPI_DOUBLE_PRECISION, above, 0,
&    v(1, 0), m, MPI_DOUBLE_PRECISION, below, 0,
&    MPI_COMM_WORLD, status, ierr )
CALL MPI_Sendrecv(
&    v(1,   1), m, MPI_DOUBLE_PRECISION, below, 1,
&    v(1,mp+1), m, MPI_DOUBLE_PRECISION, above, 1,
&    MPI_COMM_WORLD, status, ierr )
RETURN
END ! SUBROUTINE update_bc_2

Parallel SOR C Program

#include "solvers.h"#include "mpi.h"INT main(INT argc, CHAR *argv[]) {/***************MAIN PROGRAM **************************** * Solve Laplace equation using Successive Over Relaxation * * and Chebyshev Acceleration (see Numerical Recipe for detail) * * Kadin Tseng, Boston University, August, 2000 * ***********************************************************/ INT iter, m, mp, p, k, below, above; REAL omega, rhoj, rhojsq, del, delr, delb, gdel; CHAR line[80], red, black; MPI_Comm grid_comm; INT me, iv, coord[1], dims, periods, ndim, reorder; REAL **v, **vt;

MPI_Init(&argc, &argv); /* starts MPI */ MPI_Comm_rank(MPI_COMM_WORLD, &k); /* get current process id */ MPI_Comm_size(MPI_COMM_WORLD, &p); /* get # procs from env or */

Parallel SOR C Program (cont'd)

  periods = 0; ndim = 1; reorder = 0;
  red = 'r'; black = 'b';

  if (k == 0) {
    fprintf(OUTPUT,"Enter size of interior points, m :\n");
    (void) fgets(line, sizeof(line), stdin);
    (void) sscanf(line, "%d", &m);
    fprintf(OUTPUT,"m = %d\n",m);
  }
  MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);
  mp = m/p;
  v  = allocate_2D(m, mp);              /* allocate mem for 2D array */
  vt = allocate_2D(mp, m);              /* allocate mem for transposed array */
  gdel = 1.0;                           /* initialize global error to 1 */
  iter = 0;                             /* initialize iteration counter */
  rhoj = 1.0 - PI*PI*0.5/(m+2)/(m+2);   /* Chebyshev coefficient */
  rhojsq = rhoj*rhoj;
  /* create cartesian topology for matrix */
  dims = p;                             /* topology has same size as procs */

Parallel SOR C Program (cont'd)

  MPI_Cart_create(MPI_COMM_WORLD, ndim, &dims, &periods, reorder,
                  &grid_comm);
  MPI_Comm_rank(grid_comm, &me);                /* current proc */
  MPI_Cart_coords(grid_comm, me, ndim, coord);  /* current proc coord. */
  iv = coord[0];
  bc( m, mp, v, iv, p);                 /* set up boundary conditions */
  transpose(m, mp, v, vt);              /* transpose v into vt */
  replicate(mp, m, vt, v);

  MPI_Cart_shift(grid_comm, 0, 1, &below, &above);
  omega = 1.0;
  update_sor( mp, m, vt, omega, &delr, red);
  update_bc_2( mp, m, vt, iv, below, above);
  omega = 1.0/(1.0 - 0.50*rhojsq);
  update_sor( mp, m, vt, omega, &delb, black);
  update_bc_2( mp, m, vt, iv, below, above);

Parallel SOR C Program (cont'd)

  while (gdel > TOL) {
    iter++;                             /* increment iteration counter */
    omega = 1.0/(1.0 - 0.25*rhojsq*omega);
    update_sor( mp, m, vt, omega, &delr, red);
    update_bc_2( mp, m, vt, iv, below, above);
    omega = 1.0/(1.0 - 0.25*rhojsq*omega);
    update_sor( mp, m, vt, omega, &delb, black);
    update_bc_2( mp, m, vt, iv, below, above);
    if (iter%INCREMENT == 0) {
      del = (delr + delb)*4.0;
      MPI_Allreduce( &del, &gdel, 1, MPI_DOUBLE, MPI_MAX,
                     MPI_COMM_WORLD);   /* find global max error */
      if (k == 0) {
        fprintf(OUTPUT,"iter gdel omega: %5d %13.5f %13.5f\n",
                iter, gdel, omega);
      }
    }
    if (iter > MAXSTEPS) {
      fprintf(OUTPUT,"Iteration terminated (exceeds %6d )\n", MAXSTEPS);
      return (1);                       /* nonconvergent solution */
    }
  }

Parallel SOR C Program (cont'd)

  if (k == 0) {
    fprintf(OUTPUT,"Stopped at iteration %d\n",iter);
    fprintf(OUTPUT,"The maximum error = %f\n",gdel);
  }

  /* write v to file for use in MATLAB plots */
  transpose(mp, m, vt, v);              /* transpose vt back into v */
  write_file( m, mp, v, k, p);

  MPI_Barrier(MPI_COMM_WORLD);
  MPI_Finalize();

  return (0);
}

Parallel SOR C Program (cont'd)

INT update_sor( INT m, INT n, REAL **u, REAL omega, REAL *del,
                CHAR redblack) {
/***********************************************************
 * Updates u according to successive over relaxation method *
 * m        - (INPUT) size of interior rows                 *
 * n        - (INPUT) size of interior columns              *
 * u        - (INPUT) array                                 *
 * omega    - (INPUT) adjustable constant used to speed up  *
 *            convergence of SOR                            *
 * del      - (OUTPUT) error norm between 2 solution steps  *
 * redblack - (INPUT) either 'r' for red or 'b' for black   *
 ************************************************************/
  INT i, ib, ie, j, jb, je;
  REAL up;

  *del = 0.0;

Function update_bc_2

#include "solvers.h"#include "mpi.h"

INT update_bc_2( INT mp, INT m, REAL **vt, INT k, INT below, INT above ) {MPI_Status status;

MPI_Sendrecv( vt[mp]+1, m, MPI_DOUBLE, above, 0, vt[0 ]+1, m, MPI_DOUBLE, below, 0, MPI_COMM_WORLD, status );

MPI_Sendrecv( vt[1 ]+1, m, MPI_DOUBLE, below, 1, vt[mp+1]+1, m, MPI_DOUBLE, above, 1, MPI_COMM_WORLD, status ); return (0);}

Parallel SOR C Program (cont'd)

  if (redblack == 'r') {
    /* process RED odd points ... */
    jb = 1; je = n; ib = 1; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ] +
               u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }

    /* process RED even points ... */
    jb = 2; je = n; ib = 2; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ] +
               u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }

Parallel SOR C Program (cont'd)

  } else if (redblack == 'b') {
    /* process BLACK odd points ... */
    jb = 2; je = n; ib = 1; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ] +
               u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }

    /* process BLACK even points ... */
    jb = 1; je = n; ib = 2; ie = m;
    for ( j = jb; j <= je; j+=2 ) {
      for ( i = ib; i <= ie; i+=2 ) {
        up = ( u[i  ][j+1] + u[i+1][j  ] +
               u[i-1][j  ] + u[i  ][j-1] )*0.25;
        u[i][j] = (1.0 - omega)*u[i][j] + omega*up;
        *del += fabs(up-u[i][j]);
      }
    }
  } else {
    return (1);                         /* invalid redblack flag */
  }
  return (0);
}

SOR Timings On Origin2000

(figure: SOR timing results on the SGI Origin2000)

Timings for an Iterative Solver

Timings in seconds:

  Processors   SGI Origin2000   IBM Regatta   Intel Pentium 3 Linux cluster
       1            48.35          12.91                43.94
       2            23.63           5.32                18.75
       4            12.0            2.47                 4.47
       8             6.54           1.13                 1.95
      16             3.80           0.60                 1.20
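For reference, the implied 16-processor speedups over one processor are roughly 48.35/3.80 ≈ 12.7 for the Origin2000, 12.91/0.60 ≈ 21.5 for the Regatta, and 43.94/1.20 ≈ 36.6 for the Linux cluster; the superlinear figures on the latter two machines are most likely cache effects as the per-processor working set shrinks.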

Timings … (continued)

(figure: timing/scalability plot for the iterative solver)