Parallel Programming With OpenMP

Contents
• Overview of Parallel Programming & OpenMP
• Difference between OpenMP & MPI
• OpenMP Programming Model
• OpenMP Environment Variables
• OpenMP Clauses
• OpenMP Runtime Routines
• General Code Structure & Sample Examples
• Pros & Cons of OpenMP
• Performance of One Program (Serial vs. Parallel)


Parallel Programming
• Decomposes an algorithm or data into parts, which are processed by multiple processors simultaneously.
• Coordinates work and communication between those processors.
• Threaded applications are ideal for multi-core processors.

OpenMP
• Open specifications for Multi Processing, based on a thread paradigm.
• Three primary components: compiler directives, runtime library routines, and environment variables.
• Extensions for Fortran, C, and C++.


OpenMP vs MPI

OpenMP:
• Shared Memory Model
• Directive based
• Easier to program & debug
• Supported by gcc 4.2 & higher

MPI:
• Distributed Memory Model
• Message Passing style
• More flexible & scalable
• Supported by the MPICH2 library


OpenMP Programming Model
• Shared memory, thread-based parallelism.
• Explicit parallelism.
• Fork-Join Model (see the sketch below):
– Execution starts with one thread: the master thread.
– Parallel regions fork off new threads on entry: the team of threads.
– Threads join back together at the end of the region: only the master thread continues.
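A minimal sketch of this fork-join flow; the team size printed inside the region depends on how many threads the runtime creates (e.g. via OMP_NUM_THREADS):

#include <omp.h>
#include <stdio.h>

int main () {
  /* Before the region: only the master thread exists */
  printf("Before: %d thread(s)\n", omp_get_num_threads());

  #pragma omp parallel
  {
    /* Fork: each member of the team executes this block */
    printf("Inside: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  } /* Join: the team synchronizes and disbands here */

  /* After the region: only the master thread continues */
  printf("After: %d thread(s)\n", omp_get_num_threads());
  return 0;
}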


OpenMP Environment Variables

• OMP_SCHEDULE
• OMP_NUM_THREADS
• OMP_DYNAMIC
• OMP_NESTED
• OMP_THREAD_LIMIT
• OMP_STACKSIZE
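A small sketch that reports the settings most of these variables control; the shell commands in the comment are illustrative (assuming a POSIX shell), not part of OpenMP itself:

#include <omp.h>
#include <stdio.h>

int main () {
  /* Typically set in the shell before launching the program, e.g.:
     export OMP_NUM_THREADS=4; export OMP_DYNAMIC=false; ./a.out */
  printf("max threads : %d\n", omp_get_max_threads());   /* OMP_NUM_THREADS */
  printf("dynamic     : %d\n", omp_get_dynamic());       /* OMP_DYNAMIC */
  printf("nested      : %d\n", omp_get_nested());        /* OMP_NESTED */
  printf("thread limit: %d\n", omp_get_thread_limit());  /* OMP_THREAD_LIMIT */
  return 0;
}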


OpenMP Clauses
• Data Scoping Clauses (shared, private, default)
• Initialization Clauses (firstprivate, lastprivate, threadprivate); see the sketch after this list
• Data Copying Clauses (copyin, copyprivate)
• Worksharing Clauses (do/for directive, sections directive, single directive, parallel do/for, parallel sections)
• Scheduling Clauses (static, dynamic, guided)
• Synchronization Clauses (master, critical, atomic, ordered, barrier, nowait, flush)
• Reduction Clause (operator: list)
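The initialization clauses do not appear in the later examples, so here is a minimal sketch of firstprivate and lastprivate (the variable names are illustrative):

#include <omp.h>
#include <stdio.h>

int main () {
  int i, start = 10, last = 0;

  /* firstprivate: each thread's copy of start is initialized from
     the value before the region; lastprivate: last keeps the value
     written by the sequentially final iteration (i == 99) */
  #pragma omp parallel for firstprivate(start) lastprivate(last)
  for (i = 0; i < 100; i++)
    last = start + i;

  printf("last = %d\n", last);  /* prints 109 */
  return 0;
}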


OpenMP Runtime Routines
• To set & get the number of threads:
– OMP_SET_NUM_THREADS
– OMP_GET_NUM_THREADS
• To get the thread number of a thread within a team:
– OMP_GET_THREAD_NUM
• To get the number of processors available to the program:
– OMP_GET_NUM_PROCS
• To check whether execution is currently inside a parallel region:
– OMP_IN_PARALLEL
• To enable or disable dynamic adjustment of the number of threads:
– OMP_SET_DYNAMIC


OpenMP Runtime Routines Cont.
• To determine whether dynamic thread adjustment is enabled:
– OMP_GET_DYNAMIC
• To initialize and to disassociate a lock associated with a lock variable:
– OMP_INIT_LOCK
– OMP_DESTROY_LOCK
• To own and release a lock:
– OMP_SET_LOCK
– OMP_UNSET_LOCK
• To use the clock timing routine (see the sketch below):
– OMP_GET_WTICK
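A short sketch of the lock and timing routines above, together with omp_get_wtime(), the wall-clock companion of OMP_GET_WTICK:

#include <omp.h>
#include <stdio.h>

int main () {
  omp_lock_t lock;
  int counter = 0;
  double t0, t1;

  omp_init_lock(&lock);          /* associate the lock with its variable */
  t0 = omp_get_wtime();

  #pragma omp parallel
  {
    omp_set_lock(&lock);         /* own the lock: one thread at a time */
    counter = counter + 1;
    omp_unset_lock(&lock);       /* release the lock */
  }

  t1 = omp_get_wtime();
  omp_destroy_lock(&lock);       /* disassociate the lock */

  printf("counter = %d, elapsed = %f s (tick = %g s)\n",
         counter, t1 - t0, omp_get_wtick());
  return 0;
}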


General Code Structure

#include <omp.h>

int main () {

  int var1, var2, var3;

  // Serial code

  // Beginning of parallel section:
  // specify variable scoping
  #pragma omp parallel private(var1, var2) shared(var3)
  {

    // Parallel section executed by all threads

    // All threads join the master thread and disband

  }

  // Resume serial code

  return 0;
}

The omp keyword identifies the pragma as an OpenMP pragma, which is processed only by OpenMP-aware compilers (with GCC, enabled by the -fopenmp flag); other compilers simply ignore it.


Parallel Region Example

#include <omp.h>
#include <stdio.h>

int main () {

  int nthreads, tid;

  /* Fork a team of threads, giving each its own copy of tid */
  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();    /* Obtain thread id */
    printf("Hello World from thread = %d\n", tid);

    if (tid == 0) {                /* Only master thread does this */
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }

  } /* All threads join master thread and terminate */

  return 0;
}


“for” Directive Example

#include <omp.h>

#define CHUNKSIZE 10
#define N 100

int main () {

  int i, chunk;
  float a[N], b[N], c[N];

  /* Initialize the input arrays */
  for (i = 0; i < N; i++)
    a[i] = b[i] = i * 1.0;
  chunk = CHUNKSIZE;

  #pragma omp parallel shared(a,b,c,chunk) private(i)
  {
    #pragma omp for schedule(dynamic,chunk) nowait
    for (i = 0; i < N; i++)
      c[i] = a[i] + b[i];

  } /* end of parallel section */

  return 0;
}


“sections” Directive Example

#include <omp.h>

#define N 1000

int main () {

  int i;
  float a[N], b[N], c[N], d[N];

  for (i = 0; i < N; i++) {
    a[i] = i * 1.5;
    b[i] = i + 22.35;
  }

  #pragma omp parallel shared(a,b,c,d) private(i)
  {
    #pragma omp sections nowait
    {
      #pragma omp section
      for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

      #pragma omp section
      for (i = 0; i < N; i++)
        d[i] = a[i] * b[i];

    } /* end of sections */
  } /* end of parallel section */

  return 0;
}


“critical” Directive Example

#include <omp.h>

int main() {

  int x;
  x = 0;

  #pragma omp parallel shared(x)
  {
    /* Only one thread at a time may execute the critical section */
    #pragma omp critical
    x = x + 1;

  } /* end of parallel section */

  return 0;
}
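For a single scalar update like this, the atomic clause listed earlier is a lighter-weight alternative to critical; a sketch of the same counter:

#include <omp.h>

int main() {

  int x = 0;

  #pragma omp parallel shared(x)
  {
    /* atomic protects just this one memory update */
    #pragma omp atomic
    x = x + 1;
  } /* end of parallel section */

  return 0;
}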


“threadprivate” Directive Example

#include <omp.h>
#include <stdio.h>

int a, b, tid;
float x;
#pragma omp threadprivate(a, x)

int main () {

  /* Explicitly turn off dynamic threads */
  omp_set_dynamic(0);

  printf("1st Parallel Region:\n");
  #pragma omp parallel private(b, tid)
  {
    tid = omp_get_thread_num();
    a = tid; b = tid; x = 1.1 * tid + 1.0;
    printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
  } /* end of parallel section */

  printf("Master thread doing serial work here\n");

  printf("2nd Parallel Region:\n");
  /* a and x are threadprivate, so each thread still sees the values it
     set in the first region (dynamic threads are off); the global b is
     0 because the first region wrote only private copies of b */
  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();
    printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
  } /* end of parallel section */

  return 0;
}


“reduction” Clause Example

#include <omp.h>
#include <stdio.h>

int main () {

  int i, n, chunk;
  float a[100], b[100], result;

  n = 100; chunk = 10; result = 0.0;
  for (i = 0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }

  /* Each thread accumulates into a private copy of result;
     the copies are combined with + at the end of the loop */
  #pragma omp parallel for default(shared) private(i) \
      schedule(static,chunk) reduction(+:result)
  for (i = 0; i < n; i++)
    result = result + (a[i] * b[i]);

  printf("Final result= %f\n", result);

  return 0;
}


OpenMP - Pros and Cons

Pros:
• Simple, incremental parallelism.
• Decomposition is handled automatically.
• Unified code for both serial and parallel applications.

Cons:
• Runs only on shared-memory multiprocessors.
• Scalability is limited by the memory architecture.
• Reliable error handling is missing.


Performance of “arrayUpdate.c”
Test done on a 2 GHz Intel Core 2 Duo with 1 GB 667 MHz DDR2 SDRAM.

Array Size     Serial (sec)   Parallel (sec)
1000           0.000221       0.000389
5000           0.001060       0.000999
10000          0.002201       0.001323
50000          0.011266       0.005892
100000         0.22638        0.011715
500000         0.114033       0.068110
1000000        0.227713       0.123106
5000000        1.134773       0.579176
10000000       2.307644       1.151099
50000000       12.536466      5.772921
100000000      194.245929     58.532328
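The slides do not include the source of arrayUpdate.c; the following is a minimal sketch of what such a benchmark plausibly looks like, assuming a simple element-wise update timed with omp_get_wtime() (the array size and the update formula are assumptions):

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000   /* assumed size; the table above varies this */

int main () {

  double *a = malloc(N * sizeof *a);
  double t0, t1;
  int i;

  for (i = 0; i < N; i++)
    a[i] = (double)i;

  t0 = omp_get_wtime();
  /* Remove the pragma to obtain the serial timing */
  #pragma omp parallel for shared(a) private(i)
  for (i = 0; i < N; i++)
    a[i] = a[i] * 2.0 + 1.0;   /* assumed update */
  t1 = omp_get_wtime();

  printf("N = %d, time = %f s\n", N, t1 - t0);
  free(a);
  return 0;
}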


arrayUpdate.c Cont.
[Chart: Serial (sec) and Parallel (sec) run time vs. array size (in 1000s); test done on a 2 GHz Intel Core 2 Duo with 1 GB 667 MHz DDR2 SDRAM.]


References

• http://www.openmp.org/
• Parallel Programming in OpenMP, Morgan Kaufmann Publishers.


Thank You