Post on 26-Dec-2015
Parallel Programming With OpenMP
Contents
- Overview of Parallel Programming & OpenMP
- Difference between OpenMP & MPI
- OpenMP Programming Model
- OpenMP Environment Variables
- OpenMP Clauses
- OpenMP Runtime Routines
- General Code Structure & Sample Examples
- Pros & Cons of OpenMP
- Performance of one program (Serial vs Parallel)
Parallel Programming
- Decomposes an algorithm or data into parts, which are processed by multiple processors simultaneously.
- Coordinates work and communication between those processors.
- Threaded applications are ideal for multi-core processors.

OpenMP
- Open specifications for Multi Processing, based on a thread paradigm.
- Three primary components: Compiler Directives, Runtime Library Routines, Environment Variables.
- Extensions for Fortran, C, and C++.
OpenMP vs MPI

OpenMP:
- Shared Memory Model
- Directive Based
- Easier to program & debug
- Supported by gcc 4.2 & higher

MPI:
- Distributed Memory Model
- Message Passing Style
- More flexible & scalable
- Supported by the MPICH2 library
OpenMP Programming Model
- Shared Memory, Thread-Based Parallelism.
- Explicit Parallelism.
- Fork-Join Model:
  – Execution starts with one thread: the master thread.
  – Parallel regions fork off new threads on entry: the team threads.
  – Threads join back together at the end of the region: only the master thread continues.
OpenMP Environment Variables
- OMP_SCHEDULE
- OMP_NUM_THREADS
- OMP_DYNAMIC
- OMP_NESTED
- OMP_THREAD_LIMIT
- OMP_STACKSIZE
OpenMP Clauses
- Data Scoping Clauses (shared, private, default)
- Initialization Clauses (firstprivate, lastprivate, threadprivate)
- Data Copying Clauses (copyin, copyprivate)
- Worksharing Clauses (do/for directive, sections directive, single directive, parallel do/for, parallel sections)
- Scheduling Clauses (static, dynamic, guided)
- Synchronization Clauses (master, critical, atomic, ordered, barrier, nowait, flush)
- Reduction Clause (operator: list)
OpenMP Runtime Routines
- To set & get the number of threads:
  – OMP_SET_NUM_THREADS
  – OMP_GET_NUM_THREADS
- To get the thread number of a thread within a team:
  – OMP_GET_THREAD_NUM
- To get the number of processors available to the program:
  – OMP_GET_NUM_PROCS
- To determine whether execution is inside a parallel region:
  – OMP_IN_PARALLEL
- To enable or disable dynamic adjustment of the number of threads:
  – OMP_SET_DYNAMIC
OpenMP Runtime Routines Cont.
- To determine whether dynamic thread adjustment is enabled:
  – OMP_GET_DYNAMIC
- To initialize and destroy a lock associated with a lock variable:
  – OMP_INIT_LOCK
  – OMP_DESTROY_LOCK
- To acquire and release a lock:
  – OMP_SET_LOCK
  – OMP_UNSET_LOCK
- Clock timing routine:
  – OMP_GET_WTICK
General Code Structure

#include <omp.h>

int main() {
    int var1, var2, var3;
    /* Serial code */

    /* Beginning of parallel section: specify variable scoping */
    #pragma omp parallel private(var1, var2) shared(var3)
    {
        /* Parallel section executed by all threads */
    }
    /* All threads join the master thread and disband; serial code resumes */
    return 0;
}

The omp keyword distinguishes the pragma as an OpenMP pragma, so it is processed by OpenMP-aware compilers (e.g. gcc -fopenmp).
Parallel Region Example

#include <stdio.h>
#include <omp.h>

int main() {
    int nthreads, tid;
    /* Fork a team of threads */
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();   /* Obtain thread id */
        printf("Hello World from thread = %d\n", tid);
        if (tid == 0) {               /* Only master thread does this */
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }
    } /* All threads join master thread and terminate */
    return 0;
}
“for” Directive Example

#include <omp.h>
#define CHUNKSIZE 10
#define N 100

int main() {
    int i, chunk;
    float a[N], b[N], c[N];

    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;

    #pragma omp parallel shared(a, b, c, chunk) private(i)
    {
        #pragma omp for schedule(dynamic, chunk) nowait
        for (i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    } /* end of parallel section */
    return 0;
}
“sections” Directive Example

#include <omp.h>
#define N 1000

int main() {
    int i;
    float a[N], b[N], c[N], d[N];

    for (i = 0; i < N; i++) {
        a[i] = i * 1.5;
        b[i] = i + 22.35;
    }

    #pragma omp parallel shared(a, b, c, d) private(i)
    {
        #pragma omp sections nowait
        {
            #pragma omp section
            for (i = 0; i < N; i++)
                c[i] = a[i] + b[i];

            #pragma omp section
            for (i = 0; i < N; i++)
                d[i] = a[i] * b[i];
        } /* end of sections */
    } /* end of parallel section */
    return 0;
}
“critical” Directive Example

#include <omp.h>

int main() {
    int x = 0;
    #pragma omp parallel shared(x)
    {
        #pragma omp critical
        x = x + 1;
    } /* end of parallel section */
    return 0;
}
“threadprivate” Directive Example

#include <stdio.h>
#include <omp.h>

int a, b, i, tid;
float x;
#pragma omp threadprivate(a, x)

int main() {
    /* Explicitly turn off dynamic threads */
    omp_set_dynamic(0);

    printf("1st Parallel Region:\n");
    #pragma omp parallel private(b, tid)
    {
        tid = omp_get_thread_num();
        a = tid;
        b = tid;
        x = 1.1 * tid + 1.0;
        printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
    } /* end of parallel section */

    printf("Master thread doing serial work here\n");

    printf("2nd Parallel Region:\n");
    #pragma omp parallel private(tid)
    {
        tid = omp_get_thread_num();
        /* threadprivate a and x keep their per-thread values */
        printf("Thread %d: a,b,x= %d %d %f\n", tid, a, b, x);
    } /* end of parallel section */
    return 0;
}
“reduction” Clause Example

#include <stdio.h>
#include <omp.h>

int main() {
    int i, n, chunk;
    float a[100], b[100], result;

    n = 100; chunk = 10; result = 0.0;
    for (i = 0; i < n; i++) {
        a[i] = i * 1.0;
        b[i] = i * 2.0;
    }

    #pragma omp parallel for default(shared) private(i) \
        schedule(static, chunk) reduction(+:result)
    for (i = 0; i < n; i++)
        result = result + (a[i] * b[i]);

    printf("Final result= %f\n", result);
    return 0;
}
OpenMP - Pros and Cons

Pros:
- Simple, incremental parallelism.
- Decomposition is handled automatically.
- Unified code for both serial and parallel applications.

Cons:
- Runs only on shared-memory multiprocessors.
- Scalability is limited by the memory architecture.
- Reliable error handling is missing.
Performance of “arrayUpdate.c”
Test done on a 2 GHz Intel Core 2 Duo with 1 GB 667 MHz DDR2 SDRAM

Array Size     Serial (sec)    Parallel (sec)
1000           0.000221        0.000389
5000           0.001060        0.000999
10000          0.002201        0.001323
50000          0.011266        0.005892
100000         0.22638         0.011715
500000         0.114033        0.068110
1000000        0.227713        0.123106
5000000        1.134773        0.579176
10000000       2.307644        1.151099
50000000       12.536466       5.772921
100000000      194.245929      58.532328
arrayUpdate.c Cont.
[Chart: Serial (sec) and Parallel (sec) time versus Array Size (in 1000s), same 2 GHz Intel Core 2 Duo with 1 GB 667 MHz DDR2 SDRAM test machine]
References
- http://www.openmp.org/
- Parallel Programming in OpenMP, Morgan Kaufmann Publishers.
Thank You