Shared-memory Parallel Programming
Taura Lab, M1: Yuuki Horita
Agenda

- Introduction
- Sample Sequential Program
- Multi-thread Programming
- OpenMP
- Summary
Parallel Programming Model
- Message Passing Model: just covered by Imatake-kun
- Shared Memory Model: memory is shared among all process elements
  - Multiprocessors (SMP, SunFire, …), DSM (Distributed Shared Memory)
  - Process elements can communicate with each other through the shared memory
Shared Memory Model
[Diagram: several PEs (process elements) connected to a single shared Memory]
Shared Memory Model
- Simplicity: no need to think about the location of the computation data
- Fast communication (on a multiprocessor): no need to go through a network for inter-process communication
- Dynamic load sharing: for the same reason as simplicity
Shared-Memory Parallel Programming

- Multi-thread programming: Pthreads
- OpenMP: a parallel programming model for shared-memory multiprocessors
Sample Sequential Program
```c
/* ... */
loop {
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                             a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}
/* ... */
```

FDM (Finite Difference Method)
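As written, the slide's loop bounds would read outside the array at the edges (e.g. a[i][-1]); a minimal compilable sketch, assuming a fixed iteration count `STEPS` and updating only interior points (both assumptions, since the outer `loop` is elided above), might be:

```c
#include <stdio.h>

#define N     8
#define STEPS 100

double a[N][N];   /* assumed initialized with boundary/interior values */

int main(void) {
    /* Repeat the stencil sweep a fixed number of times. */
    for (int step = 0; step < STEPS; step++) {
        /* Update only interior points so a[i-1][j], a[i][j+1], etc.
           stay within bounds. */
        for (int i = 1; i < N - 1; i++) {
            for (int j = 1; j < N - 1; j++) {
                a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                                 a[i-1][j] + a[i+1][j] + a[i][j]);
            }
        }
    }
    printf("a[N/2][N/2] = %f\n", a[N/2][N/2]);
    return 0;
}
```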
Parallelization Procedure
Sequential Computation → [Decomposition] → Tasks → [Assignment] → Process Elements → [Orchestration] → [Mapping] → Processors
Parallelize the Sequential Program
Decomposition: split the computation into tasks.

```c
/* ... */
loop {
    for (i = 0; i < N; i++) {      /* each i-iteration: a task */
        for (j = 0; j < N; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                             a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}
/* ... */
```
Parallelize the Sequential Program
Assignment

[Diagram: the tasks are divided into four contiguous blocks of rows, one block per PE]

Divide the tasks equally among the process elements.
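As a sketch of what this equal division might mean in code (the helper `update_block`, its arguments `id` and `nthreads`, and the interior-only bounds are assumptions, not from the slides):

```c
#define N 8   /* illustrative size */

/* Hypothetical helper: PE `id` of `nthreads` updates its block of rows.
   Only interior points are touched so the stencil stays in bounds. */
void update_block(double a[N][N], int id, int nthreads) {
    int rows  = (N - 2) / nthreads;                      /* interior rows per PE */
    int start = 1 + id * rows;
    int end   = (id == nthreads - 1) ? N - 1 : start + rows;

    for (int i = start; i < end; i++) {
        for (int j = 1; j < N - 1; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                             a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }
}
```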
Parallelize the Sequential Program
Orchestration

[Diagram: the four PEs, which must exchange boundary data with their neighbors]

The process elements need to communicate and to synchronize.
Parallelize the Sequential Program
Mapping

[Diagram: the four PEs are mapped onto the processors of a multiprocessor]
Multi-thread Programming
- A process element is a thread (cf. a process)
- Memory is shared among all threads created from the same process
- Threads can communicate with each other through the shared memory
Fork-Join Model
[Diagram of the fork-join model:]

- Serialized section: the program starts on the main thread
- Fork: the main thread creates new threads
- Parallelized section: all threads execute in parallel
- Join: the other threads join the main thread
- Serialized section: the main thread continues processing
Libraries for Thread Programming
- Pthreads (C/C++): pthread_create(), pthread_join()
- Java Thread: the Thread class / Runnable interface
Pthreads API (fork/join)

```c
pthread_t thread;                      /* thread variable */

int pthread_create(
    pthread_t *thread,                 /* thread variable */
    pthread_attr_t *attr,              /* thread attributes */
    void *(*func)(void *),             /* start function */
    void *arg                          /* argument passed to the function */
);

int pthread_join(
    pthread_t thread,                  /* thread to wait for */
    void **thread_return               /* the return value of the thread */
);
```
Pthreads Parallel Programming

```c
#include ...

void do_sequentially(void) {
    /* sequential execution */
}

int main() {
    ...
    do_sequentially();   /* want to parallelize this call */
    ...
}
```
Pthreads Parallel Programming

```c
#include ...
#include <pthread.h>

void *do_in_parallel(void *arg) {
    /* parallel execution */
    return NULL;
}

int main() {
    pthread_t tid;
    ...
    pthread_create(&tid, NULL, do_in_parallel, NULL);  /* fork */
    do_in_parallel(NULL);        /* the main thread does its share too */
    pthread_join(tid, NULL);     /* join */
    ...
}
```
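For completeness, a fully compilable fork-join sketch (the thread count and the printed message are illustrative, not from the slides):

```c
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4

/* Each thread receives its ID as the argument and prints it. */
void *worker(void *arg) {
    long id = (long)arg;
    printf("hello from thread %ld\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];

    /* Fork: create the worker threads. */
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);

    /* Join: wait for every worker to finish. */
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    return 0;
}
```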
Exclusive Access Control
```c
int sum = 0;

thread_A() { sum++; }

thread_B() { sum++; }
```

sum++ compiles to a read, an increment, and a write, so the two threads can interleave and lose an update (a and b denote each thread's private register):

| ThreadA       | ThreadB       | sum |
|---------------|---------------|-----|
| a ← read sum  |               | 0   |
|               | b ← read sum  | 0   |
| a = a + 1     |               | 0   |
| write a → sum |               | 1   |
|               | b = b + 1     | 1   |
|               | write b → sum | 1   |

Both threads increment, yet the final result is sum = 1, not 2.
Pthreads API (Exclusive Access Control)
```c
pthread_mutex_t mutex;                     /* mutex variable */

/* initialization function */
int pthread_mutex_init(pthread_mutex_t *mutex,
                       const pthread_mutexattr_t *mutexattr);

/* lock functions */
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
```
Exclusive Access Control
```c
int sum = 0;
pthread_mutex_t mutex;
pthread_mutex_init(&mutex, NULL);

thread_A() {
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
}

thread_B() {
    pthread_mutex_lock(&mutex);
    sum++;
    pthread_mutex_unlock(&mutex);
}
```

[Timeline: ThreadA acquires the lock, executes sum++, and releases the lock; ThreadB's acquire blocks until that release, after which ThreadB executes sum++ and releases the lock. The two increments can no longer interleave.]
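A complete, runnable version of this pattern (the thread and iteration counts are illustrative):

```c
#include <stdio.h>
#include <pthread.h>

#define NTHREADS 4
#define ITERS    100000

int sum = 0;
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the shared counter under the mutex. */
void *worker(void *arg) {
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&mutex);
        sum++;
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    /* Always prints 400000; without the mutex, lost updates would
       typically make the result smaller. */
    printf("sum = %d\n", sum);
    return 0;
}
```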
Pthreads API (Condition Variable)

```c
pthread_cond_t cond;                       /* condition variable */

/* initialization function */
int pthread_cond_init(pthread_cond_t *cond,
                      const pthread_condattr_t *condattr);

/* condition functions */
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_broadcast(pthread_cond_t *cond);
int pthread_cond_signal(pthread_cond_t *cond);
```
Condition Wait
ThreadA waits until the condition is satisfied:

```c
pthread_mutex_lock(&mutex);                /* acquire lock */
while (/* condition is not satisfied */) {
    /* releases the lock, sleeps, and reacquires the lock when woken */
    pthread_cond_wait(&cond, &mutex);
}
pthread_mutex_unlock(&mutex);              /* release lock */
```

ThreadB makes the condition true and wakes the waiters with pthread_cond_broadcast (all of them) or pthread_cond_signal (one of them):

```c
pthread_mutex_lock(&mutex);
update_condition();
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);
```
Synchronization

Synchronization in the sample program (a barrier: each thread waits until all nthreads threads have arrived):

```c
n = 0;
...
pthread_mutex_lock(&mutex);
n++;
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);
```
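Wrapped up as a reusable function, the same pattern might look like the following one-shot barrier sketch (the names my_barrier and barrier_wait are hypothetical; POSIX also provides pthread_barrier_t for this purpose):

```c
#include <pthread.h>

/* One-shot barrier built from the pattern above. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int n;          /* threads that have arrived so far */
    int nthreads;   /* threads expected */
} my_barrier;

void barrier_wait(my_barrier *b) {
    pthread_mutex_lock(&b->mutex);
    b->n++;
    while (b->n < b->nthreads) {
        pthread_cond_wait(&b->cond, &b->mutex);
    }
    pthread_cond_broadcast(&b->cond);  /* the last arrival wakes the rest */
    pthread_mutex_unlock(&b->mutex);
}
```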
Characteristics of Pthreads
- It is troublesome to describe exclusive access control and synchronization
- Programs are prone to deadlock
- It is still hard to parallelize a given sequential program
What's OpenMP?

A specification of a set of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs.

- Fortran ver. 1.0 API: Oct. 1997
- C/C++ ver. 1.0 API: Oct. 1998
Background of OpenMP

- The spread of shared-memory multiprocessors
- The need for common directives for shared-memory multiprocessors: each vendor had been providing its own set of directives
- The need for a simpler, more flexible interface for developing parallel applications: Pthreads makes it hard for developers to describe parallel applications
OpenMP API

- Directives
- Library routines
- Environment variables
Directives

- C/C++: `#pragma omp directive_name ...`
- Fortran: `!$OMP directive_name ...`

If the user's compiler doesn't support OpenMP, the directives are simply ignored, so the program can still be compiled and executed as a sequential program.
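For instance, OpenMP-aware compilers define the `_OPENMP` macro, so one source file can serve both builds; the fallback stubs below are a common idiom, sketched here as an illustration rather than taken from the slides:

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback stubs so the program also compiles sequentially. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif
```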
Parallel Region

The part of the program that is executed in parallel by several threads:

```c
#pragma omp parallel
{
    /* parallel region */
}
```

- Threads are created at the beginning of the parallel region
- They join at the end of the parallel region
Parallel Region (threads)

- Number of threads
  - omp_get_num_threads(): get the current number of threads
  - omp_set_num_threads(int nthreads): set the number of threads to nthreads
  - the $OMP_NUM_THREADS environment variable
- Thread ID (0 to number of threads − 1)
  - omp_get_thread_num(): get the thread ID
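A minimal sketch combining these routines (the printed message is illustrative):

```c
#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(4);   /* request a team of 4 threads */

    #pragma omp parallel
    {
        /* Each thread in the team reports its ID and the team size. */
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```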
Work-Sharing Constructs

Specify the task assignment inside a parallel region:

- for: shares loop iterations among the threads
- sections: shares sections among the threads
- single: executed by only one thread
Example of Work Sharing
Sequential version:

```c
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                         a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}
```

With separate parallel and for directives:

```c
omp_set_num_threads(4);

#pragma omp parallel
#pragma omp for
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                         a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}
```

With the combined parallel for directive:

```c
omp_set_num_threads(4);

#pragma omp parallel for
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                         a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}
```

Memory access conflicts on the shared loop counters i and j make the computation slow; the data-scoping attributes on the next slide address this.
Data-Scoping Attributes

Specify the data scoping at a parallel construct or a work-sharing construct:

- shared(var_list): the variables in var_list are shared among the threads
- private(var_list): each thread gets a private copy of the variables in var_list
- reduction(operator : var_list): the variables in var_list are private within the construct, and the partial results are combined with operator and reflected after the construct
  - e.g. `#pragma omp for reduction (+: sum)`
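As an illustration (the array-summing function itself is not from the slides), reduction is typically used like this:

```c
/* Sum an array in parallel: each thread accumulates into a private
   copy of sum, and the copies are added together after the loop. */
double total_of(const double *a, int n) {
    double sum = 0.0;

    #pragma omp parallel for reduction(+: sum)
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```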
Example of Data-Scoping Attributes

```c
omp_set_num_threads(4);

#pragma omp parallel for private(i, j)
for (i = 0; i < N; i++) {
    for (j = 0; j < N; j++) {
        a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1] +
                         a[i-1][j] + a[i+1][j] + a[i][j]);
    }
}
```
Synchronization

- barrier: wait until all threads reach this line
  - `#pragma omp barrier`
- critical: execute exclusively
  - `#pragma omp critical [(name)] { ... }`
- atomic: update a scalar variable atomically
  - `#pragma omp atomic` followed by the update statement

A small sketch of the last two follows below.
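The shared counters here are illustrative, not from the slides:

```c
int hits = 0, count = 0;

#pragma omp parallel
{
    /* critical: an arbitrary block, executed by one thread at a time */
    #pragma omp critical
    {
        hits = hits + 1;
    }

    /* atomic: a single scalar update performed atomically */
    #pragma omp atomic
    count++;
}
```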
Synchronization (Pthreads/OpenMP)

Synchronization in the sample program:

Pthreads:

```c
pthread_mutex_lock(&mutex);
n++;
while (n < nthreads) {
    pthread_cond_wait(&cond, &mutex);
}
pthread_cond_broadcast(&cond);
pthread_mutex_unlock(&mutex);
```

OpenMP:

```c
#pragma omp barrier
```
Summary of OpenMP

- Incremental parallelization of sequential programs
- Portability
- Easier to implement parallel applications than with Pthreads or MPI
Message Passing Model / Shared Memory Model

|              | Message Passing | Shared Memory                            |
|--------------|-----------------|------------------------------------------|
| Architecture | any             | SMP or DSM                               |
| Programming  | difficult       | easier                                   |
| Performance  | good            | better (SMP) / worse (DSM)               |
| Cost         | less expensive  | very expensive (SunFire 15K: $4,140,830) |
Thank you!