
1

Parallel Programming With OpenMP

2

Contents

– Overview of Parallel Programming & OpenMP
– Difference between OpenMP & MPI
– OpenMP Programming Model
– OpenMP Environment Variables
– OpenMP Clauses
– OpenMP Runtime Routines
– General Code Structure & Sample Examples
– Pros & Cons of OpenMP
– Performance of One Program (Serial vs Parallel)

3

Parallel Programming

– Decomposes an algorithm or data into parts that are processed by multiple processors simultaneously.
– Coordinates work and communication between those processors.
– Threaded applications are ideal for multi-core processors.

OpenMP

– Open specifications for Multi Processing, based on a thread paradigm.
– Three primary components: Compiler Directives, Runtime Library Routines, and Environment Variables.
– Extensions exist for Fortran, C, and C++.

4

OpenMP vs MPI

OpenMP:
– Shared memory model
– Directive based
– Easier to program & debug
– Supported by gcc 4.2 & higher (enabled with the -fopenmp flag)

MPI:
– Distributed memory model
– Message passing style
– More flexible & scalable
– Supported by the MPICH2 library

6

OpenMP Programming Model

– Shared memory, thread-based parallelism.
– Explicit parallelism.
– Fork-Join Model (illustrated in the sketch below):

  – Execution starts with one thread – the master thread.
  – Parallel regions fork off new threads on entry – a team of threads.
  – Threads join back together at the end of the region – only the master thread continues.
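As a minimal sketch of the fork-join behaviour (this program is illustrative, not part of the original deck): the master thread runs alone before and after the parallel region, and the whole team runs inside it.

#include <omp.h>
#include <stdio.h>

int main() {
  /* Serial part: only the master thread is running */
  printf("Before region: %d thread(s)\n", omp_get_num_threads());

  #pragma omp parallel
  {
    /* Fork: every thread in the team executes this block */
    printf("Inside region: thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  }

  /* Join: only the master thread continues */
  printf("After region: %d thread(s)\n", omp_get_num_threads());
  return 0;
}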

7

OpenMP Environment Variables

– OMP_SCHEDULE
– OMP_NUM_THREADS
– OMP_DYNAMIC
– OMP_NESTED
– OMP_THREAD_LIMIT
– OMP_STACKSIZE
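As an illustration (not part of the original slides), the snippet below shows OMP_NUM_THREADS taking effect; run it after, for example, export OMP_NUM_THREADS=4 in the shell.

#include <omp.h>
#include <stdio.h>

int main() {
  /* omp_get_max_threads() reflects OMP_NUM_THREADS (or a later
     omp_set_num_threads() call) before any region is entered */
  printf("Max threads available: %d\n", omp_get_max_threads());

  #pragma omp parallel
  {
    if (omp_get_thread_num() == 0)
      printf("Team size in this region: %d\n", omp_get_num_threads());
  }
  return 0;
}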

8

OpenMP Clauses

– Data Scoping Clauses (shared, private, default)
– Initialization Clauses (firstprivate, lastprivate, threadprivate) – firstprivate and lastprivate are sketched after this list
– Data Copying Clauses (copyin, copyprivate)
– Worksharing Clauses (do/for directive, sections directive, single directive, parallel do/for, parallel sections)
– Scheduling Clauses (static, dynamic, guided)
– Synchronization Clauses (master, critical, atomic, ordered, barrier, nowait, flush)
– Reduction Clause (operator: list)
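Most of these clauses appear in the examples later in the deck; firstprivate and lastprivate do not, so here is a minimal sketch of those two (not from the original slides):

#include <omp.h>
#include <stdio.h>

int main() {
  int i, offset = 100, last = -1;

  /* firstprivate: each thread's private copy of offset starts at 100.
     lastprivate: after the loop, last holds the value written by the
     sequentially final iteration (i == 9). */
  #pragma omp parallel for firstprivate(offset) lastprivate(last)
  for (i = 0; i < 10; i++)
    last = offset + i;

  printf("last = %d\n", last);  /* prints 109 */
  return 0;
}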

9

OpenMP Runtime Routines

– To set & get the number of threads:
  OMP_SET_NUM_THREADS, OMP_GET_NUM_THREADS
– To get the thread number of a thread within a team:
  OMP_GET_THREAD_NUM
– To get the number of processors available to the program:
  OMP_GET_NUM_PROCS
– To check whether the code is executing inside a parallel region:
  OMP_IN_PARALLEL
– To enable or disable dynamic adjustment of the number of threads:
  OMP_SET_DYNAMIC

10

OpenMP Runtime Routines Cont.

– To determine whether dynamic thread adjustment is enabled:
  OMP_GET_DYNAMIC
– To initialize and destroy a lock associated with a lock variable:
  OMP_INIT_LOCK, OMP_DESTROY_LOCK
– To acquire and release a lock:
  OMP_SET_LOCK, OMP_UNSET_LOCK
– To query the resolution of the wall-clock timer:
  OMP_GET_WTICK

A combined sketch of the lock and timer routines follows.
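A minimal sketch (not from the original slides) combining the routines above: a lock serializes updates to a shared counter, and OMP_GET_WTICK reports the timer resolution.

#include <omp.h>
#include <stdio.h>

int main() {
  omp_lock_t lock;
  int counter = 0;

  omp_init_lock(&lock);          /* OMP_INIT_LOCK */

  #pragma omp parallel shared(counter)
  {
    omp_set_lock(&lock);         /* OMP_SET_LOCK: acquire */
    counter++;                   /* only one thread updates at a time */
    omp_unset_lock(&lock);       /* OMP_UNSET_LOCK: release */
  }

  omp_destroy_lock(&lock);       /* OMP_DESTROY_LOCK */

  printf("counter = %d\n", counter);
  printf("timer resolution = %g s\n", omp_get_wtick());  /* OMP_GET_WTICK */
  return 0;
}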

11

General Code Structure

#include <omp.h>

int main () {

  int var1, var2, var3;

  // Serial code

  // Beginning of parallel section.
  // Specify variable scoping
  #pragma omp parallel private(var1, var2) shared(var3)
  {
    // Parallel section executed by all threads
    // ...
    // All threads join master thread and disband
  }

  // Resume serial code
}

The omp keyword identifies the pragma as an OpenMP pragma, so it is processed by OpenMP-aware compilers.

12

Parallel Region Example

#include <omp.h>
#include <stdio.h>

int main () {

  int nthreads, tid;

  /* Fork a team of threads, each with its own private copy of tid */
  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();  /* Obtain thread id */
    printf("Hello World from thread = %d\n", tid);

    if (tid == 0) {  /* Only master thread does this */
      nthreads = omp_get_num_threads();
      printf("Number of threads = %d\n", nthreads);
    }
  }  /* All threads join master thread and terminate */
}

13

“for” Directive Example

#include <omp.h>

#define CHUNKSIZE 10
#define N 100

int main () {

  int i, chunk;
  float a[N], b[N], c[N];

  for (i=0; i < N; i++)
    a[i] = b[i] = i * 1.0;
  chunk = CHUNKSIZE;

  #pragma omp parallel shared(a,b,c,chunk) private(i)
  {
    #pragma omp for schedule(dynamic,chunk) nowait
    for (i=0; i < N; i++)
      c[i] = a[i] + b[i];
  } /* end of parallel section */
}

14

“sections” directive example

#include <omp.h>

#define N 1000

int main () {

  int i;
  float a[N], b[N], c[N], d[N];

  for (i=0; i < N; i++) {
    a[i] = i * 1.5;
    b[i] = i + 22.35;
  }

  #pragma omp parallel shared(a,b,c,d) private(i)
  {
    #pragma omp sections nowait
    {
      #pragma omp section
      for (i=0; i < N; i++)
        c[i] = a[i] + b[i];

      #pragma omp section
      for (i=0; i < N; i++)
        d[i] = a[i] * b[i];
    } /* end of sections */
  } /* end of parallel section */
}

15

“critical” Directive Example

#include <omp.h>

int main() {

  int x;
  x = 0;

  #pragma omp parallel shared(x)
  {
    #pragma omp critical
    x = x + 1;
  } /* end of parallel section */
}
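For a single scalar update like this, the atomic clause from the synchronization list on slide 8 is a lighter-weight alternative to a full critical section; a minimal sketch (not from the original deck):

#include <omp.h>
#include <stdio.h>

int main() {
  int x = 0;

  #pragma omp parallel shared(x)
  {
    /* atomic protects exactly one simple update with less
       overhead than a general critical section */
    #pragma omp atomic
    x += 1;
  }

  printf("x = %d\n", x);  /* equals the number of threads */
  return 0;
}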

16

“threadprivate” Directive Example

#include <omp.h>
#include <stdio.h>

int a, b, i, tid;
float x;

#pragma omp threadprivate(a, x)

int main () {

  /* Explicitly turn off dynamic threads */
  omp_set_dynamic(0);

  printf("1st Parallel Region:\n");
  #pragma omp parallel private(b,tid)
  {
    tid = omp_get_thread_num();
    a = tid; b = tid; x = 1.1 * tid + 1.0;
    printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
  } /* end of parallel section */

  printf("Master thread doing serial work here\n");

  printf("2nd Parallel Region:\n");
  #pragma omp parallel private(tid)
  {
    tid = omp_get_thread_num();
    /* a and x are threadprivate, so each thread sees the values
       it stored in the first region */
    printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
  } /* end of parallel section */
}

17

“reduction” Clause Example

#include <omp.h>
#include <stdio.h>

int main () {

  int i, n, chunk;
  float a[100], b[100], result;

  n = 100; chunk = 10; result = 0.0;
  for (i=0; i < n; i++) {
    a[i] = i * 1.0;
    b[i] = i * 2.0;
  }

  /* Each thread accumulates into a private copy of result; the
     copies are combined with + at the end of the loop */
  #pragma omp parallel for default(shared) private(i) schedule(static,chunk) reduction(+:result)
  for (i=0; i < n; i++)
    result = result + (a[i] * b[i]);

  printf("Final result= %f\n", result);
}

18

OpenMP - Pros and Cons

Pros:
– Simple, incremental parallelism.
– Decomposition is handled automatically.
– Unified code for both serial and parallel applications.

Cons:
– Runs only on shared-memory multiprocessors.
– Scalability is limited by the memory architecture.
– Reliable error handling is missing.

19

Performance of “arrayUpdate.c”
Test done on a 2 GHz Intel Core 2 Duo with 1 GB 667 MHz DDR2 SDRAM

Array Size     Serial (sec)    Parallel (sec)
1000           0.000221        0.000389
5000           0.001060        0.000999
10000          0.002201        0.001323
50000          0.011266        0.005892
100000         0.022638        0.011715
500000         0.114033        0.068110
1000000        0.227713        0.123106
5000000        1.134773        0.579176
10000000       2.307644        1.151099
50000000       12.536466       5.772921
100000000      194.245929      58.532328
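The deck does not reproduce the source of arrayUpdate.c; the sketch below shows what such a serial-vs-parallel array update benchmark could look like (the array size, update expression, and timing layout here are assumptions, not the original program). Running it with OMP_NUM_THREADS=1 and again with the default team size gives a serial and a parallel timing.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000000  /* assumed size; the tests above varied it */

int main() {
  double *a = malloc(N * sizeof(double));
  double t0, t1;
  int i;

  t0 = omp_get_wtime();
  #pragma omp parallel for shared(a) private(i)
  for (i = 0; i < N; i++)
    a[i] = i * 2.0 + 1.0;  /* assumed update expression */
  t1 = omp_get_wtime();

  printf("update of %d elements took %f sec\n", N, t1 - t0);
  free(a);
  return 0;
}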

20

arrayUpdate.c Cont.

[Chart: Serial (sec) and Parallel (sec) run times plotted against Array Size (in 1000s). Test done on a 2 GHz Intel Core 2 Duo with 1 GB 667 MHz DDR2 SDRAM.]

References

• http://www.openmp.org/
• Parallel Programming in OpenMP, Morgan Kaufmann Publishers.

22

Thank You