May 4, 2005
Programming Multicores with
Pthreads and OpenMP
Nikos P. Pitsianis
Xiaobai Sun Bo Zhang
Duke University
Outline
• Programming with Threads
– Embarrassingly Parallel (Pleasantly Parallel)
– Critical Sections (Mutual Exclusion)
– Data Dependent Task Parallelism (Condition Variables & Signals)
• Quick Introduction to OpenMP Programming
Sep 29, 2010 Multicore Programming Workshop 2
What is a thread?
• Process:
– a program that is running
– an address space with one or more threads executing within it, plus the system resources those threads require
• Thread:
– a sequence of control within a process
– shares the resources of that process
• We cover POSIX Threads (Pthreads) here, a widely supported threads programming API
• Compile with "gcc -pthread"
– This also forces the compiler to link in thread-safe libraries
Process
• Process: a running program
– a single address space
– one or more threads executing within that address space
– the system resources required by those threads
• Each process can have multiple threads, even on a single-core processor
Threads
• Thread: a sequence of control within a process
• All threads in a process share:
– memory (program code and global data)
– open file/socket descriptors
– signal handlers and signal dispositions
– working environment
• Threads communicate using shared memory
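To make the shared-memory point concrete, here is a minimal sketch. The names shared_slots, fill_slot and run_shared_demo are illustrative and do not come from the workshop sources. Each thread writes into a global array, and the calling thread reads the results once all threads have been joined.

```c
#include <assert.h>
#include <pthread.h>

#define NSLOTS 4                  /* illustrative thread count */

static int shared_slots[NSLOTS];  /* global data: visible to every thread */

/* Each thread stores the square of its index into its own slot. */
static void *fill_slot(void *arg) {
    int idx = *(int *)arg;
    shared_slots[idx] = idx * idx;
    return NULL;
}

/* Starts NSLOTS threads, waits for them, and returns the sum of the slots. */
int run_shared_demo(void) {
    pthread_t thread[NSLOTS];
    int idx[NSLOTS];
    int i, rc, sum = 0;

    for (i = 0; i < NSLOTS; i++) {
        idx[i] = i;               /* one argument per thread, never reused */
        rc = pthread_create(&thread[i], NULL, fill_slot, &idx[i]);
        assert(rc == 0);
    }
    for (i = 0; i < NSLOTS; i++) {
        rc = pthread_join(thread[i], NULL);
        assert(rc == 0);
    }
    for (i = 0; i < NSLOTS; i++)
        sum += shared_slots[i];   /* written by the threads, read here */
    return sum;
}
```

No lock is needed in this sketch because every thread writes a different element and the reads happen only after the joins.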
Advantages and Disadvantages
• Advantages:
– creating a thread is significantly faster than creating a process
– switching between threads is faster than switching between processes
– writing multithreaded programs is easier: threads communicate directly through shared memory
• Disadvantages:
– writing multithreaded programs is harder: access to shared data must be protected against races
– multithreaded programs are more difficult to debug than single-threaded programs
Outline
• Programming with Threads
– Embarrassingly Parallel (Pleasantly Parallel)
– Critical Sections (Mutual Exclusion)
– Data Dependent Task Parallelism (Condition Variables & Signals)
• Quick Introduction to OpenMP Programming
Example 0 sequential code, as default single thread
#include <stdlib.h>
#include <stdio.h>

/* getvec fills a vector; only its prototype appears on the slide. */
void getvec (double *a);

/* Sequential dot product of two vectors of length n. */
double dotprod (double *a, double *b, int n) {
  int i;
  double s = 0.0;
  for ( i = 0; i < n; i++ )
    s += a[i]*b[i];
  return s;
}

int main () {
  double *a, *b;

  /* N is supplied at compile time with -D N=... */
  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);

  getvec(a); getvec(b);

  double dp = dotprod(a,b,N);
  printf("%f\n", dp);
}
Source: www.cs.duke.edu/~nikos/mpw/dp0.c
Compile:
gcc -D N=1024 -O4 dp0.c -o dp0
Run: ./dp0
Example 1 sequential code as a separate thread
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);
double dotprod (double *a, double *b, int n);

/* Bundle of arguments handed to the thread through a single pointer. */
typedef struct {
  double *a, *b;
  int n;
} dparg;

/* Thread entry point: unpacks the arguments, runs dotprod, prints the result. */
void *wrapper (void *arg) {
  double *ap, *bp, s;
  int nn;
  ap = ((dparg *) arg)->a;
  bp = ((dparg *) arg)->b;
  nn = ((dparg *) arg)->n;

  s = dotprod(ap, bp, nn);
  printf("%f\n", s);
  return NULL;
}
Source: www.cs.duke.edu/~nikos/mpw/dp1.c
Compile:
gcc -pthread -D N=1024 -O4 dp1.c -o dp1
Run: ./dp1
Example 1 sequential code as a separate thread
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);
double dotprod (double *a, double *b, int n);
void *wrapper (void *arg);

int main () {
  double *a, *b;
  pthread_t thread;
  dparg arg;

  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);

  getvec(a); getvec(b);

  arg.a = a;
  arg.b = b;
  arg.n = N;   /* vector length, set with -D N=... */

  /* Run wrapper(&arg) in a new thread, then wait for it to finish. */
  pthread_create (&thread, NULL, wrapper, (void *) &arg);
  pthread_join (thread, NULL);
}
Thread Creation & Termination
int pthread_create (pthread_t *tid, const pthread_attr_t *attr, void *(*func)(void *), void *arg);
func is the function the new thread executes. When func() returns, the thread terminates.
Thread Creation Arguments
• Arguments are passed to the thread function by packing them into a structure and passing the address of that structure as arg
• Thread attributes can be set using attr
– joinable or detached state
– scheduling policy
– NULL for system defaults
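A minimal sketch of both points: the arguments travel in a structure, and the thread is created with an explicit attribute object instead of NULL. The names range_arg, sum_range and sum_in_thread are illustrative assumptions. Return codes are checked with assert only to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int lo, hi;    /* input: inclusive range to sum */
    long sum;      /* output: written by the thread */
} range_arg;

static void *sum_range(void *arg) {
    range_arg *r = (range_arg *)arg;
    int i;
    r->sum = 0;
    for (i = r->lo; i <= r->hi; i++)
        r->sum += i;
    return NULL;
}

/* Runs sum_range in a thread created with an explicit attribute object. */
long sum_in_thread(int lo, int hi) {
    pthread_t thread;
    pthread_attr_t attr;
    range_arg arg;
    int rc;

    arg.lo = lo; arg.hi = hi; arg.sum = 0;

    rc = pthread_attr_init(&attr);
    assert(rc == 0);
    /* Joinable is the default; it is set explicitly here for illustration. */
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
    assert(rc == 0);

    rc = pthread_create(&thread, &attr, sum_range, &arg);
    assert(rc == 0);
    pthread_attr_destroy(&attr);  /* not needed once the thread exists */

    rc = pthread_join(thread, NULL);
    assert(rc == 0);
    return arg.sum;
}
```

The structure lives on the caller's stack, which is safe here only because the caller joins the thread before returning.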
Thread Lifespan
• Once a thread is created
– it starts executing the function func()
– func() is an argument passed to pthread_create()
• The thread is terminated
– when func() returns, or
– by pthread_exit()
• All threads are terminated
– when main() returns, or
– when any thread calls exit()
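The difference between returning from func() and calling pthread_exit() shows up when the call is made from a helper: pthread_exit() ends the whole thread, not just the helper. A small sketch, with the illustrative names trace, finish_early and exits_early:

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    int reached_before;   /* set just before pthread_exit() */
    int reached_after;    /* would be set after it, so it stays 0 */
} trace;

/* A helper called from the thread function can still end the thread. */
static void finish_early(trace *t) {
    t->reached_before = 1;
    pthread_exit(NULL);   /* terminates the calling thread only */
}

static void *worker(void *arg) {
    trace *t = (trace *)arg;
    finish_early(t);
    t->reached_after = 1; /* never executed */
    return NULL;
}

/* Returns 1 if the thread stopped at pthread_exit(), 0 otherwise. */
int exits_early(void) {
    pthread_t thread;
    trace t = {0, 0};
    int rc;

    rc = pthread_create(&thread, NULL, worker, &t);
    assert(rc == 0);
    rc = pthread_join(thread, NULL);
    assert(rc == 0);
    return t.reached_before == 1 && t.reached_after == 0;
}
```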
Joinable and Detached State
• Each thread is either joinable or detached.
• Joinable:
– on termination, the thread ID and exit status are saved until another thread joins it
• Detached:
– on termination, all resources used by the thread are released
– a detached thread cannot be joined
• A thread can "join" another by calling pthread_join
– the caller blocks until the specified thread exits
int pthread_join (pthread_t tid, void **status);
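The status argument of pthread_join receives whatever the thread function returned (or passed to pthread_exit). A minimal sketch, assuming the illustrative names square and square_in_thread; the result is allocated on the heap so that it outlives the thread's stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Returns a heap-allocated result, which becomes the thread's exit status. */
static void *square(void *arg) {
    double x = *(double *)arg;
    double *result = malloc(sizeof *result);
    if (result != NULL)
        *result = x * x;
    return result;
}

/* Computes x*x in a joinable thread and collects it with pthread_join. */
double square_in_thread(double x) {
    pthread_t thread;
    void *status = NULL;
    double value;
    int rc;

    rc = pthread_create(&thread, NULL, square, &x);
    assert(rc == 0);
    rc = pthread_join(thread, &status);   /* blocks until the thread exits */
    assert(rc == 0);
    assert(status != NULL);

    value = *(double *)status;
    free(status);                         /* the joiner owns the result */
    return value;
}
```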
Example 2 with multiple threads
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

typedef struct {
  double *a, *b, s;   /* s receives this thread's partial sum */
  int n, tid;
} dparg;

/* Partial dot product over the block of indices owned by thread tid.
   Assumes n is a multiple of NTHREADS. */
double dotprod (double *a, double *b, int n, int tid) {
  int i;
  double s = 0.0;
  int block = n/NTHREADS;

  for ( i = tid*block; i < (tid+1)*block; i++)
    s += a[i]*b[i];
  return s;
}

void * wrapper (void *arg) {
  double *ap, *bp;
  int nn, tid;
  ap = ((dparg *) arg)->a;
  bp = ((dparg *) arg)->b;
  nn = ((dparg *) arg)->n;
  tid = ((dparg *) arg)->tid;

  ((dparg *) arg)->s = dotprod(ap, bp, nn, tid);
  return NULL;
}
Source: www.cs.duke.edu/~nikos/mpw/dp2.c
Compile:
gcc -pthread -D NTHREADS=8 -D N=1024 \
    -O4 dp2.c -o dp2
Run: ./dp2
Example 2 with multiple threads
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

void getvec (double *a);

int main () {
  double *a, *b, dp;
  pthread_t thread[NTHREADS];
  dparg arg[NTHREADS];
  int i;
  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);
  getvec(a); getvec(b);

  for (i=0; i<NTHREADS; i++) {
    arg[i].a = a; arg[i].b = b;
    arg[i].n = N; arg[i].tid = i;

    pthread_create (&thread[i], NULL, wrapper,
                    (void *)&arg[i]);
  }
  dp = 0.0;
  /* Join each thread, then add its partial sum. */
  for (i=0; i<NTHREADS; i++) {
    pthread_join (thread[i], NULL);
    dp += arg[i].s;
  }
  printf("%f\n", dp);
}
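Example 2 computes block = n/NTHREADS, so when n is not a multiple of NTHREADS the last n % NTHREADS elements are never visited. One common way to cover them is sketched below. The helper block_range is hypothetical and is not part of dp2.c.

```c
#include <assert.h>

/* Computes the half-open index range [*lo, *hi) owned by thread tid when
   n elements are split among nthreads threads. The first n % nthreads
   threads get one extra element, so every index is covered exactly once. */
void block_range(int n, int nthreads, int tid, int *lo, int *hi) {
    int block, extra;

    assert(nthreads > 0 && tid >= 0 && tid < nthreads && n >= 0);
    block = n / nthreads;
    extra = n % nthreads;

    *lo = tid * block + (tid < extra ? tid : extra);
    *hi = *lo + block + (tid < extra ? 1 : 0);
}
```

With n = 10 and 4 threads, the ranges are [0,3), [3,6), [6,8) and [8,10). The loop in dotprod would then run from lo to hi instead of from tid*block to (tid+1)*block.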
Outline
• Programming with Threads
– Embarrassingly Parallel (Pleasantly Parallel)
– Critical Sections (Mutual Exclusion)
– Data Dependent Task Parallelism (Condition Variables & Signals)
• Quick Introduction to OpenMP Programming
Mutual Exclusion
• Mutual exclusion primitives protect against races
– e.g. in a read-update-write sequence
• Get the single key, and
– lock the critical section of the program before accessing global variables
– unlock as soon as you are done
pthread_mutex_t mux;
pthread_mutex_init (&mux, NULL);
pthread_mutex_lock (&mux);
pthread_mutex_unlock (&mux);
Locking and Unlocking
• To lock: pthread_mutex_lock (pthread_mutex_t *mutex);
• To unlock: pthread_mutex_unlock (pthread_mutex_t *mutex);
• pthread_mutex_lock blocks until the mutex becomes available; pthread_mutex_unlock returns without waiting
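A minimal sketch of a protected read-update-write: several threads increment one shared counter, and the mutex keeps the increments from being lost. The names counter, counter_mtx and count_with_mutex, and the constants, are illustrative. The mutex is initialized statically with PTHREAD_MUTEX_INITIALIZER instead of pthread_mutex_init.

```c
#include <assert.h>
#include <pthread.h>

#define NWORKERS 4        /* illustrative values */
#define NINCR    10000

static long counter;
static pthread_mutex_t counter_mtx = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg) {
    int i;
    (void)arg;                               /* unused */
    for (i = 0; i < NINCR; i++) {
        pthread_mutex_lock(&counter_mtx);    /* waits while another thread holds it */
        counter++;                           /* read-update-write */
        pthread_mutex_unlock(&counter_mtx);  /* lets the next thread in */
    }
    return NULL;
}

/* Runs NWORKERS threads and returns the final counter value. */
long count_with_mutex(void) {
    pthread_t thread[NWORKERS];
    int i, rc;

    counter = 0;
    for (i = 0; i < NWORKERS; i++) {
        rc = pthread_create(&thread[i], NULL, increment, NULL);
        assert(rc == 0);
    }
    for (i = 0; i < NWORKERS; i++) {
        rc = pthread_join(thread[i], NULL);
        assert(rc == 0);
    }
    return counter;
}
```

Without the lock and unlock calls, two threads could read the same old value and one increment would be lost, so the total could fall short of NWORKERS * NINCR.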
Example 3 with Critical Section
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

pthread_mutex_t dp_mtx;   /* protects dp */
double dp;                /* global result, shared by all threads */

void * wrapper (void *arg) {
  double *ap, *bp, s;
  int nn, tid;
  ap = ((dparg *) arg)->a;
  bp = ((dparg *) arg)->b;
  nn = ((dparg *) arg)->n;
  tid = ((dparg *) arg)->tid;

  s = dotprod(ap, bp, nn, tid);

  /* Critical section: only one thread at a time updates dp. */
  pthread_mutex_lock(&dp_mtx);
  dp += s;
  pthread_mutex_unlock(&dp_mtx);

  return NULL;
}
Source: www.cs.duke.edu/~nikos/mpw/dp3.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp3.c -o dp3
Run: ./dp3
Example 3 with Critical Section
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

int main () {
  double *a, *b;
  pthread_t thread[NTHREADS];
  dparg arg[NTHREADS];
  int i;

  /* Allocate the vectors before filling them, as in the earlier examples. */
  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);
  getvec(a); getvec(b);

  dp = 0.0;
  pthread_mutex_init (&dp_mtx, NULL);

  for (i=0; i<NTHREADS; i++) {
    arg[i].a = a; arg[i].b = b;
    arg[i].n = N; arg[i].tid = i;

    pthread_create (&thread[i], NULL, wrapper,
                    (void *)&arg[i]);
  }
  for (i=0; i<NTHREADS; i++) {
    pthread_join (thread[i], NULL);
  }
  printf("%f\n", dp);
}
Outline
• Programming with Threads
– Embarrassingly Parallel (Pleasantly Parallel)
– Critical Sections (Mutual Exclusion)
– Data Dependent Task Parallelism (Condition Variables & Signals)
• Quick Introduction to OpenMP Programming
Condition Variables
• Condition variables allow one thread to
– wait for (sleep until) an event generated by any other thread
• This lets us avoid busy waiting
pthread_cond_t *notFull, *notEmpty;
pthread_cond_init (q->notFull, NULL);
pthread_cond_init (q->notEmpty, NULL);

pthread_mutex_lock (fifo->mut);
while (fifo->full) {
  printf ("producer: queue FULL.\n");
  pthread_cond_wait (fifo->notFull, fifo->mut);
}
queueAdd (fifo, i);
pthread_mutex_unlock (fifo->mut);
pthread_cond_signal (fifo->notEmpty);
Condition Variables
Condition variables are used with a mutex
int pthread_cond_wait (pthread_cond_t *cptr, pthread_mutex_t *mptr);
int pthread_cond_signal (pthread_cond_t *cptr);
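The wait/signal pattern on the Condition Variables slides can be sketched as a complete bounded queue. This is an illustration under stated assumptions, not the code the slide fragment was taken from: the type fifo_t and the functions fifo_put, fifo_get and produce_and_consume are made-up names, and the mutex and condition variables are embedded in the structure rather than held through pointers.

```c
#include <assert.h>
#include <pthread.h>

#define QSIZE 4                        /* illustrative capacity */

typedef struct {
    int buf[QSIZE];
    int head, tail, count;
    pthread_mutex_t mut;
    pthread_cond_t notFull, notEmpty;
} fifo_t;

static void fifo_put(fifo_t *q, int v) {
    pthread_mutex_lock(&q->mut);
    while (q->count == QSIZE)                     /* re-check after each wakeup */
        pthread_cond_wait(&q->notFull, &q->mut);  /* releases mut while asleep */
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % QSIZE;
    q->count++;
    pthread_mutex_unlock(&q->mut);
    pthread_cond_signal(&q->notEmpty);            /* wake a waiting consumer */
}

static int fifo_get(fifo_t *q) {
    int v;
    pthread_mutex_lock(&q->mut);
    while (q->count == 0)
        pthread_cond_wait(&q->notEmpty, &q->mut);
    v = q->buf[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    pthread_mutex_unlock(&q->mut);
    pthread_cond_signal(&q->notFull);             /* wake a waiting producer */
    return v;
}

typedef struct { fifo_t *q; int n; } producer_arg;

static void *producer(void *arg) {
    producer_arg *p = (producer_arg *)arg;
    int i;
    for (i = 1; i <= p->n; i++)
        fifo_put(p->q, i);
    return NULL;
}

/* One producer thread sends 1..n; the caller consumes and sums them. */
long produce_and_consume(int n) {
    fifo_t q;
    producer_arg parg;
    pthread_t thread;
    long sum = 0;
    int i, rc;

    q.head = q.tail = q.count = 0;
    pthread_mutex_init(&q.mut, NULL);
    pthread_cond_init(&q.notFull, NULL);
    pthread_cond_init(&q.notEmpty, NULL);

    parg.q = &q; parg.n = n;
    rc = pthread_create(&thread, NULL, producer, &parg);
    assert(rc == 0);
    for (i = 0; i < n; i++)
        sum += fifo_get(&q);
    rc = pthread_join(thread, NULL);
    assert(rc == 0);

    pthread_cond_destroy(&q.notFull);
    pthread_cond_destroy(&q.notEmpty);
    pthread_mutex_destroy(&q.mut);
    return sum;
}
```

The while loop around pthread_cond_wait matters: a thread can wake up without the condition being true, so the predicate is tested again each time.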
Example 4 with Condition Variable
#include <stdlib.h>
#include <stdio.h>
#include <pthread.h>

pthread_cond_t notEmptyVecSignal;
pthread_mutex_t vec_mtx;
pthread_mutex_t dp_mtx;
double dp;
int emptyVec;

void * wrapper (void *arg) {
  double *ap, *bp, s;
  int nn, tid;
  […]

  /* Sleep until main signals that the vectors have been filled. */
  pthread_mutex_lock(&vec_mtx);
  while (emptyVec) {
    pthread_cond_wait(&notEmptyVecSignal, &vec_mtx);
  }
  pthread_mutex_unlock(&vec_mtx);

  s = dotprod(ap, bp, nn, tid);

  pthread_mutex_lock(&dp_mtx);
  dp += s;
  pthread_mutex_unlock(&dp_mtx);
  return NULL;
}
Source: www.cs.duke.edu/~nikos/mpw/dp4.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -pthread -O4 dp4.c -o dp4
Run: ./dp4
Example 4 with Condition Variable
int main () {
  [ … ]

  emptyVec = 1;
  pthread_mutex_init (&vec_mtx, NULL);
  pthread_cond_init (&notEmptyVecSignal, NULL);

  for (i=0; i<NTHREADS; i++) {
    arg[i].a = a; arg[i].b = b;
    arg[i].n = N; arg[i].tid = i;

    pthread_create (&thread[i], NULL, wrapper, (void *) &arg[i]);
  }

  getvec(a); getvec(b);

  /* Vectors are ready: clear the flag and wake every waiting thread. */
  pthread_mutex_lock(&vec_mtx);
  emptyVec = 0;
  pthread_mutex_unlock(&vec_mtx);
  pthread_cond_broadcast (&notEmptyVecSignal);

  for (i=0; i<NTHREADS; i++) {
    rc = pthread_join (thread[i], NULL);
  }

  printf("%f\n", dp);
}
Outline
• Programming with Threads
– Embarrassingly Parallel (Pleasantly Parallel)
– Critical Sections (Mutual Exclusion)
– Data Dependent Task Parallelism (Condition Variables & Signals)
• Quick Introduction to Programming with OpenMP
OpenMP
• A set of compiler directives and library routines for parallel application programmers
• OpenMP simplifies writing multithreaded programs in Fortran, C and C++
• Most of the constructs in OpenMP are compiler directives
– #pragma omp construct [clause [clause]…]
– #pragma omp parallel num_threads(4)
• Function prototypes and types are in the file:
– #include <omp.h>
• Most OpenMP constructs apply to a structured block
– Structured block: a block of one or more statements with one point of entry at the top and one point of exit at the bottom
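A minimal sketch of a directive applied to a structured block. The function name count_team_members is illustrative. It uses only pragmas, so it also compiles without -fopenmp, in which case the pragmas are ignored and the block runs once.

```c
#include <assert.h>

/* Returns the number of threads that executed the parallel region. */
int count_team_members(void) {
    int members = 0;

    #pragma omp parallel num_threads(4)
    {   /* structured block: entered at the top, left at the bottom */
        #pragma omp atomic
        members++;
    }

    assert(members >= 1);
    return members;
}
```

num_threads(4) is a request: the runtime may supply fewer threads, so the result lies between 1 and 4.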
Example in OpenMP
#include <omp.h>
#include <stdlib.h>
#include <stdio.h>

void getvec (double *a);

double dotprod (double *a, double *b, int n) {
  int i;
  double s = 0.0;

  /* Iterations are divided among the threads; each thread accumulates a
     private s, and the private copies are summed when the loop ends. */
#pragma omp parallel for reduction(+:s)
  for ( i = 0; i < n; i++ )
    s += a[i]*b[i];
  return s;
}

int main () {
  double *a, *b;

  a = (double *) malloc(sizeof(double)*N);
  b = (double *) malloc(sizeof(double)*N);

  getvec(a); getvec(b);

  omp_set_num_threads(NTHREADS);
  double dp = dotprod(a,b,N);
  printf("%f\n", dp);
}
Source: www.cs.duke.edu/~nikos/mpw/dp0-omp.c
Compile:
gcc -D NTHREADS=8 -D N=1024 \
    -fopenmp -O4 dp0-omp.c -o dp0-omp
Run:
./dp0-omp
OpenMP Parallel Region
#pragma omp parallel [clause...]
    if (scalar_expression)
    private (list)
    shared (list)
    default (shared | none)
    firstprivate (list)
    reduction (operator: list)
    copyin (list)
    num_threads (n)

  structured_block

• When a thread reaches a PARALLEL directive, it creates a team of threads and becomes the master of the team
– The master becomes thread number 0 within that team
• The parallel region code is executed by all threads
• A barrier is implied at the end of the parallel region
– Only the master thread continues execution past it
• If any thread terminates within a parallel region, all threads in the team will terminate, and the work done up until that point is undefined
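Because every thread in the team executes the same region, the work has to be divided by hand unless a work-sharing directive is used. The sketch below does that with the thread number. The function name sum_in_region is illustrative, and the _OPENMP guard is an addition that lets the same code build serially without -fopenmp.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sums x[0..n-1]; every thread in the team runs the same block and
   picks its own indices from its thread number. */
double sum_in_region(const double *x, int n) {
    double total = 0.0;

    assert(n >= 0);
    #pragma omp parallel reduction(+:total)
    {
        int tid = 0, nthreads = 1, i;
#ifdef _OPENMP
        tid = omp_get_thread_num();        /* 0 for the master thread */
        nthreads = omp_get_num_threads();  /* size of the team */
#endif
        /* Thread tid handles indices tid, tid + nthreads, tid + 2*nthreads, ... */
        for (i = tid; i < n; i += nthreads)
            total += x[i];
    }   /* implied barrier; the private totals are combined here */
    return total;
}
```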
OpenMP Work Sharing DO/for
#pragma omp for [clause...]
    schedule (type [,chunk])
    ordered
    private (list)
    firstprivate (list)
    lastprivate (list)
    shared (list)
    reduction (operator: list)
    collapse (n)
    nowait

  for_loop

#pragma omp parallel for \
    shared(a,b,c) \
    private(i)
  for (i=0; i < n; i++) {
    c[i] = a[i] + b[i];
  }
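The combined "parallel for" shown on the slide can also be written as a parallel region containing separate "for" directives, which lets several loops share one team of threads. A sketch with the illustrative name scale_and_add:

```c
#include <assert.h>
#include <stddef.h>

/* Computes a[i] = s * a[i] + b[i] in two work-shared loops. */
void scale_and_add(double *a, const double *b, double s, int n) {
    assert(a != NULL && b != NULL);
    #pragma omp parallel
    {
        int i;   /* declared inside the region, so private to each thread */

        #pragma omp for
        for (i = 0; i < n; i++)
            a[i] *= s;
        /* implicit barrier: all of a is scaled before the next loop starts */

        #pragma omp for nowait
        for (i = 0; i < n; i++)
            a[i] += b[i];
        /* nowait drops this loop's barrier; the region's own barrier follows */
    }
}
```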
Directive Responsibility
• Work-sharing • Data scoping • Synchronization • Scheduling
• Parallel region: partition work
– Each thread executes the same code
• Parallel for loop: partition iterations
– Threads share the iterations of the loop
• Parallel sections: functional parallelism
– Threads perform different tasks
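Functional parallelism can be sketched with the sections construct, where each section is a different task handed to one thread. The function name min_and_max is illustrative.

```c
#include <assert.h>

/* Finds the minimum and maximum of x[0..n-1] as two independent tasks. */
void min_and_max(const double *x, int n, double *min, double *max) {
    double lo, hi;

    assert(n > 0);
    lo = x[0];
    hi = x[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {   /* task 1: only this section writes lo */
            int i;
            for (i = 1; i < n; i++)
                if (x[i] < lo) lo = x[i];
        }
        #pragma omp section
        {   /* task 2: only this section writes hi */
            int i;
            for (i = 1; i < n; i++)
                if (x[i] > hi) hi = x[i];
        }
    }
    *min = lo;
    *max = hi;
}
```

The two sections write different variables, so no synchronization is needed between them.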
Directive Responsibility
• Work-sharing • Data scoping • Synchronization • Scheduling
• Shared: threads access a single copy of the data object
• Private: each thread gets its own copy, uninitialized on entry and discarded on exit
– Firstprivate: each private copy is initialized from the master's value
– Lastprivate: the original variable is updated with the value from the sequentially last iteration
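A small sketch of the two clauses. The function name last_value is illustrative. The result is the same whether the loop runs on one thread or many, because lastprivate copies out the value from the sequentially last iteration.

```c
#include <assert.h>

/* Returns offset + (n - 1), passed in and out of the loop through
   firstprivate and lastprivate. */
int last_value(int offset, int n) {
    int i, last = -1;

    assert(n > 0);
    /* firstprivate: each thread's offset starts with the caller's value.
       lastprivate: after the loop, last holds the value assigned in
       iteration n - 1, whichever thread ran it. */
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (i = 0; i < n; i++)
        last = offset + i;

    return last;
}
```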
Directive Responsibility
• Work-sharing • Data scoping • Synchronization • Scheduling
#pragma omp master
{…}
#pragma omp critical
{…}
#pragma omp atomic
count++;
#pragma omp barrier
reduction (+: sum)
• Concurrent access to shared data can corrupt it
• Synchronization
– Mutex: ensures exclusive access to a critical section of code
– Barrier: causes a group of threads to pause until all have reached a defined point
• Signaling
– Conditional wait: waits for some event; signals when it occurs
– Broadcasting: signals a group of waiting threads
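The atomic and critical directives can be sketched side by side. The function name scan is illustrative: atomic protects a single update of one variable, while critical protects a whole block.

```c
#include <assert.h>

/* Counts the even entries of x[0..n-1] and finds the largest entry. */
void scan(const int *x, int n, int *n_even, int *max) {
    int evens = 0, best, i;

    assert(n > 0);
    best = x[0];

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        if (x[i] % 2 == 0) {
            #pragma omp atomic
            evens++;                 /* one protected update */
        }
        #pragma omp critical
        {                            /* compare and assign as one unit */
            if (x[i] > best)
                best = x[i];
        }
    }
    *n_even = evens;
    *max = best;
}
```

For simple sums and counts, a reduction clause usually does the same job with less contention, since each thread then works on a private copy.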
Directive Responsibility
• Work-sharing • Data scoping • Synchronization • Scheduling
• Static: splits the iteration space into blocks of size chunk, assigned to threads round-robin before the loop runs
• Dynamic: assigns blocks to threads as they become idle (suits uneven workloads)
• Guided: like dynamic, but the block size shrinks exponentially as the remaining iterations are assigned
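Dynamic scheduling pays off when iterations cost different amounts. In the sketch below, testing a large number for primality takes longer than testing a small one, so blocks are handed out as threads become idle. The function names and the chunk size of 16 are illustrative choices.

```c
#include <assert.h>

/* Trial division: the cost grows with k, so iterations are uneven. */
static int is_prime(int k) {
    int d;
    if (k < 2)
        return 0;
    for (d = 2; d * d <= k; d++)
        if (k % d == 0)
            return 0;
    return 1;
}

/* Counts the primes below n, handing out blocks of 16 iterations
   to whichever thread is idle. */
int count_primes(int n) {
    int i, count = 0;

    assert(n >= 0);
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:count)
    for (i = 2; i < n; i++)
        count += is_prime(i);

    return count;
}
```

Replacing schedule(dynamic, 16) with schedule(static) or schedule(guided) changes only how iterations are distributed, not the result.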
References
• D. Butenhof, Programming with POSIX Threads, Addison-Wesley (1997)
• Online tutorials from LLNL
– https://computing.llnl.gov/tutorials/pthreads/
– https://computing.llnl.gov/tutorials/openMP/