
Parallel Programming in Chapel

LACSI 2006, October 18th, 2006

Steve Deitz, Chapel project, Cray Inc.

Why is Parallel Programming Hard?

Partitioning of data across processors
Partitioning of tasks across processors
Inter-processor communication
Load balancing
Race conditions
Deadlock
And more ...

How is it getting easier?

Support for new global address space languages
– Co-Array Fortran
  Modest extension to Fortran 95 for SPMD parallelism
  Developed at Cray
  Accepted as part of the next Fortran standard
– Unified Parallel C (UPC)
  Small extension to C for SPMD parallelism

Development of new high-level parallel languages
– Chapel
  Under development at Cray (part of HPCS)
  Builds on MTA C, HPF, and ZPL
  Provides modern language features (C#, Java, …)
– Fortress and X10

Chapel’s Context

HPCS = High Productivity Computing Systems (a DARPA program)

Overall Goal: Increase productivity for High-End Computing (HEC) users by 10× by the year 2010

Productivity = Programmability + Performance + Portability + Robustness

Result must be…
…revolutionary, not evolutionary
…marketable to users beyond the program sponsors

Phase II Competitors (July '03 – June '06): Cray, IBM, Sun

Outline

Chapel Features
– Domains and Arrays
– Cobegin, Begin, and Synchronization Variables

3-Point Stencil Case Study
– MPI
– Co-Array Fortran
– Unified Parallel C
– Chapel

Domains and Arrays

Domain: an index set
– Specifies size and shape of arrays
– Supports sequential and parallel iteration
– Potentially decomposed across locales

Three main classes:
– Arithmetic: indices are Cartesian tuples
  Rectilinear, multidimensional
  Optionally strided and/or sparse
– Indefinite: indices serve as hash keys (see the sketch after this section)
  Supports hash tables, associative arrays, dictionaries
– Opaque: indices are anonymous
  Supports sets, graph-based computations

Fundamental Chapel concept for data parallelism

A generalization of ZPL’s region concept
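
Since the non-arithmetic domain classes get no code example in these slides, here is a minimal hedged sketch of an indefinite (associative) domain; the syntax follows later public Chapel releases, the 2006 prototype may have differed, and the names and values are only for illustration:

  var Names: domain(string);     // indices are strings, used as hash keys
  var age: [Names] int;          // associative array declared over that domain

  Names += "Alice";              // adding an index grows every array over Names
  Names += "Bob";
  age("Alice") = 30;             // hypothetical example values
  age("Bob") = 25;

  for name in Names do
    writeln(name, " is ", age(name));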

A Simple Domain Declaration

var m: int = 4;
var n: int = 8;

var D: domain(2) = [1..m, 1..n];
var DInner: subdomain(D) = [2..m-1, 2..n-1];

(Diagram: the 4×8 index set D and its interior subdomain DInner)

Domain Usage

Declaring arrays:
  var A, B: [D] float;

Sub-array references:
  A(DInner) = B(DInner);

Sequential iteration:
  for i,j in DInner { …A(i,j)… }
  or: for ij in DInner { …A(ij)… }

Parallel iteration:
  forall ij in DInner { …A(ij)… }
  or: [ij in DInner] …A(ij)…

Array reallocation:
  D = [1..2*m, 1..2*n];

(Diagram: arrays A and B over domain D, their sub-arrays over DInner, and D after reallocation to [1..2*m, 1..2*n])

Other Arithmetic Domains

var StridedD: subdomain(D) = D by (2,3);

var SparseD: sparse domain(D) = (/ (1,1), (4,1), (2,3), (4,2), … /);

(Diagram: the strided subdomain StridedD and the sparse subdomain SparseD within D)

Cobegin

Executes statements in parallel
Synchronizes at join point

Composable with data parallel abstractions

cobegin {
  leftSum = computeLeft();
  rightSum = computeRight();
}
sum = leftSum + rightSum;

var D: domain(1) distributed(Block) = [1..n];
var A: [D] float;
...
cobegin {
  [i in 1..n/2] computeLeft(A(i));
  [i in n/2+1..n] computeRight(A(i));
}

Synchronization Variables

Implements producer/consumer idioms

Subsumes locks

Implements futures

// Producer/consumer: reading the sync variable s blocks until it is written
cobegin {
  var s: sync int = computeProducer();
  computeConsumer(s);
}

// Sync variable as a lock: the write acquires (blocks if already full),
// the bare read releases by emptying the variable
var lock: sync bool;
lock = true;
compute();
lock;

Begin

Spawns a concurrent computation
No synchronization

Subsumes SPMD model

var leftSum: sync int;
begin leftSum = computeLeft();
rightSum = computeRight();
...
sum = leftSum + rightSum;   // reading leftSum blocks until the begun task writes it

for locale in Locales do
  begin myMain(locale);

More Chapel Features

Locales and user-defined distributions
Objects (classes, records, and unions)
Generics and local type inference
Tuples
Sequences and iterators
Reductions and scans (parallel prefix)
Default arguments, name-based argument passing
Function and operator overloading
Modules (namespace management)
Atomic transactions
Automatic garbage collection
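
As one example from this list, reductions and scans are expressed as whole-array operators; the + reduce form also appears in the case study below. A minimal sketch, assuming syntax in line with later public Chapel releases (the array A and its values are only for illustration):

  var A: [1..5] int;
  [i in 1..5] A(i) = i;          // A = 1 2 3 4 5

  var total   = + reduce A;      // sum of all elements: 15
  var biggest = max reduce A;    // largest element: 5
  var partial = + scan A;        // parallel prefix sums: 1 3 6 10 15

  writeln(total, " ", biggest);
  writeln(partial);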

Outline

Chapel Features
– Domains and Arrays
– Cobegin, Begin, and Synchronization Variables

3-Point Stencil Case Study
– MPI
– Co-Array Fortran
– Unified Parallel C
– Chapel

3-Point Stencil Case Study

Compute the counting numbers via iterative averaging: each interior value is repeatedly replaced by the average of its two neighbors until the values converge.

Initial condition   0  0     0     0     4
Iteration 1         0  0     0     2     4
Iteration 2         0  0     1     2     4
Iteration 3         0  0.5   1     2.5   4
Iteration 4         0  0.5   1.5   2.5   4
Iteration 5         0  0.75  1.5   2.75  4
Iteration 6         0  0.75  1.75  2.75  4
…
Iteration K         0  1     2     3     4

Array Allocation

C + MPI (fragmented view):
  my_n = n*(me+1)/p - n*me/p;
  A = (double *)malloc((my_n+2)*sizeof(double));

Co-Array Fortran (fragmented view):
  MY_N = N*ME/P - N*(ME-1)/P
  ALLOCATE(A(0:MY_N+1)[*])

Unified Parallel C (fragmented view):
  A = (shared double *)upc_all_alloc(n+2, sizeof(double));

Chapel (global view):
  var D: domain(1) distributed(Block) = [0..n+1];
  var InnerD: subdomain(D) = [1..n];
  var A: [D] float;

Array Initialization

C + MPI:
  for (i = 0; i <= my_n; i++)
    A[i] = 0.0;
  if (me == p - 1)
    A[my_n+1] = n + 1.0;

Co-Array Fortran:
  A(:) = 0.0
  IF (ME .EQ. P) THEN
    A(MY_N+1) = N + 1.0
  ENDIF

Unified Parallel C:
  upc_forall (i = 0; i <= n; i++; &A[i])
    A[i] = 0.0;
  if (MYTHREAD == THREADS-1)
    A[n+1] = n + 1.0;

Chapel:
  A(0..n) = 0.0;
  A(n+1) = n + 1.0;

Stencil Computation

C + MPI:
  if (me < p-1)
    MPI_Send(&(A[my_n]), 1, MPI_DOUBLE, me+1, 1, MPI_COMM_WORLD);
  if (me > 0)
    MPI_Recv(&(A[0]), 1, MPI_DOUBLE, me-1, 1, MPI_COMM_WORLD, &status);
  if (me > 0)
    MPI_Send(&(A[1]), 1, MPI_DOUBLE, me-1, 1, MPI_COMM_WORLD);
  if (me < p-1)
    MPI_Recv(&(A[my_n+1]), 1, MPI_DOUBLE, me+1, 1, MPI_COMM_WORLD, &status);
  for (i = 1; i <= my_n; i++)
    Tmp[i] = (A[i-1]+A[i+1])/2.0;

Co-Array Fortran:
  CALL SYNC_ALL()
  IF (ME .LT. P) THEN
    A(MY_N+1) = A(1)[ME+1]
    A(0)[ME+1] = A(MY_N)
  ENDIF
  CALL SYNC_ALL()
  DO I = 1,MY_N
    TMP(I) = (A(I-1)+A(I+1))/2.0
  ENDDO

Unified Parallel C:
  upc_barrier;
  upc_forall (i = 1; i <= n; i++; &A[i])
    Tmp[i] = (A[i-1]+A[i+1])/2.0;

Chapel:
  forall i in InnerD do
    Tmp(i) = (A(i-1)+A(i+1))/2.0;

Reduction

C + MPI:
  my_delta = 0.0;
  for (i = 1; i <= my_n; i++)
    my_delta += fabs(A[i]-Tmp[i]);
  MPI_Allreduce(&my_delta, &delta, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

Co-Array Fortran:
  MY_DELTA = 0.0
  DO I = 1,MY_N
    MY_DELTA = MY_DELTA + ABS(A(I)-TMP(I))
  ENDDO
  CALL SYNC_ALL()
  DELTA = 0.0
  DO I = 1,P
    DELTA = DELTA + MY_DELTA[I]
  ENDDO

Unified Parallel C:
  my_delta = 0.0;
  upc_forall (i = 1; i <= n; i++; &A[i])
    my_delta += fabs(A[i]-Tmp[i]);
  upc_lock(lock);
  delta = delta + my_delta;
  upc_unlock(lock);

Chapel:
  delta = + reduce abs(A(InnerD)-Tmp(InnerD));

Reduction (alternative Chapel version)

The C + MPI, Co-Array Fortran, and Unified Parallel C versions are the same as on the previous slide. In Chapel, the reduction can also be written as an atomic update inside a forall loop:

Chapel:
  delta = 0.0;
  forall i in InnerD do
    atomic delta += abs(A(i)-Tmp(i));

Output

C + MPI:
  if (me == 0)
    printf("Iterations: %d\n", iters);

Co-Array Fortran:
  IF (ME .EQ. 1) THEN
    WRITE(*,*) 'Iterations: ', ITERS
  ENDIF

Unified Parallel C:
  if (MYTHREAD == 0)
    printf("Iterations: %d\n", iters);

Chapel:
  writeln("Iterations: ", iters);
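
Putting the Chapel fragments from this case study together gives a complete program sketch. The iteration driver (the do-while loop, the convergence threshold epsilon, and the config constants) does not appear on the slides and is an assumption here; the domain declarations, initialization, stencil, reduction, and output lines are taken from the preceding slides, whose distributed(Block) syntax and float type differ from later Chapel releases:

  config const n = 8;              // assumed problem size
  config const epsilon = 0.0001;   // assumed convergence threshold

  var D: domain(1) distributed(Block) = [0..n+1];
  var InnerD: subdomain(D) = [1..n];
  var A, Tmp: [D] float;

  // initial condition: zeros with the right boundary set to n+1
  A(0..n) = 0.0;
  A(n+1) = n + 1.0;

  var iters = 0;
  var delta: float;
  do {
    // 3-point stencil over the interior
    forall i in InnerD do
      Tmp(i) = (A(i-1)+A(i+1))/2.0;
    // global convergence test via a reduction
    delta = + reduce abs(A(InnerD)-Tmp(InnerD));
    A(InnerD) = Tmp(InnerD);
    iters += 1;
  } while (delta > epsilon);

  writeln("Iterations: ", iters);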

Case Study Summary

Desirable features, compared across the languages C+MPI, CAF, UPC, and Chapel:

global view of computation
global view of data
data distributions
arbitrary data distributions
multi-dimensional arrays
syntactic execution model
one-sided communication
support for reductions
simple mechanisms

(Table on the slide: which of the languages provides each of these features; the per-language marks are not reproduced here.)

Implementation Status

Serial implementation
– Covers over 95% of features in Chapel
– Prototype quality

Non-distributed threaded implementation
– Implements sync variables, single variables, begin, and cobegin

For More Information

http://chapel.cs.washington.edu/

chapel_info@cray.com
deitz@cray.com