Introduction to Unified Parallel C: A PGAS C
Tarek El-Ghazawi, GWU
UPC Overview
1) UPC in a nutshell
  • Memory model
  • Execution model
  • UPC Systems
2) Data Distribution and Pointers
  • Shared vs Private Data
  • Examples of data distribution
  • UPC pointers
3) Workload Sharing
  • upc_forall
4) Advanced topics in UPC
  • Dynamic Memory Allocation
  • Synchronization in UPC
  • UPC Libraries
5) UPC Productivity
  • Code efficiency
Introduction
UPC – Unified Parallel C
• Set of specs for a parallel C
  • v1.0 completed February of 2001
  • v1.1.1 in October of 2003
  • v1.2 in May of 2005
  • v1.3 in November of 2013
• Compiler implementations by vendors and universities
• Consortium of government, academia, and HPC vendors including IDA CCS, GWU, UCB, MTU, U of Florida, UMCP, ANL, LBNL, LLNL, DoD, DoE, HP, Cray, IBM, UMN, ARSC, Sun, Intrepid, Etnus, …
• http://upc.lbl.gov/publications/upc-spec-1.3.pdf
Introduction cont.
• UPC compilers are now available for most HPC platforms and clusters
  • Some are open source
• A debugger and a performance analysis tool are available
• Benchmarks, programming examples, and compiler testing suite(s) are available
• Visit www.upcworld.org or upc.gwu.edu for more information
UPC Systems
UPC Compilers
• Cray
• Hewlett-Packard
• Berkeley
• Intrepid
• IBM
• MTU
UPC Tools
• Totalview
• PPW from UF
• TAU
UPC Home Page http://upc.gwu.edu
UPC textbook now available
UPC: Distributed Shared Memory Programming
Tarek El-Ghazawi
William Carlson
Thomas Sterling
Katherine Yelick
Wiley, May, 2005
ISBN: 0-471-22048-5
What is UPC?
• Unified Parallel C
• An explicit parallel extension of ISO C
• PGAS parallel programming language
UPC Execution Model
• A number of threads working independently in a SPMD fashion
  • MYTHREAD specifies thread index (0..THREADS-1)
  • Number of threads specified at compile-time or run-time
• Synchronization when needed
  • Barriers
  • Locks
  • Memory consistency control
UPC Memory Model
[Figure: Partitioned global address space. The shared space is partitioned among Thread 0, Thread 1, …, Thread THREADS-1. Below it are the private spaces Private 0, Private 1, …, Private THREADS-1, one per thread.]
• A pointer-to-shared can reference all locations in the shared space, but there is data-thread affinity
• A private pointer may reference addresses in its private space or its local portion of the shared space
• Static and dynamic memory allocations are supported for both shared and private memory
User’s General View
A collection of threads operating in a single global address space, which is logically partitioned among the threads. Each thread has affinity with a portion of the shared address space. Each thread also has a private space.
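A First Example: Vector addition
    // vect_add.c
    #include <upc_relaxed.h>
    #define N (100*THREADS)
    // default layout: element i has affinity to thread i%THREADS
    shared int v1[N], v2[N], v1plusv2[N];
    int main(void) {
        int i;
        for (i = 0; i < N; i++)
            if (MYTHREAD == i%THREADS)   // each thread updates only the elements it owns
                v1plusv2[i] = v1[i] + v2[i];
        return 0;
    }
Layout with two threads:
                  Thread 0                     Thread 1
    Iteration #:  0, 2, …                      1, 3, …
    Shared space: v1[0], v1[2], …              v1[1], v1[3], …
                  v2[0], v2[2], …              v2[1], v2[3], …
                  v1plusv2[0], v1plusv2[2], …  v1plusv2[1], v1plusv2[3], …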
2nd Example: A More Efficient Implementation
    // vect_add.c
    #include <upc_relaxed.h>
    #define N (100*THREADS)
    shared int v1[N], v2[N], v1plusv2[N];
    int main(void) {
        int i;
        // start at this thread's first element and step by THREADS: no ownership test needed
        for (i = MYTHREAD; i < N; i += THREADS)
            v1plusv2[i] = v1[i] + v2[i];
        return 0;
    }
Layout with two threads:
                  Thread 0                     Thread 1
    Iteration #:  0, 2, …                      1, 3, …
    Shared space: v1[0], v1[2], …              v1[1], v1[3], …
                  v2[0], v2[2], …              v2[1], v2[3], …
                  v1plusv2[0], v1plusv2[2], …  v1plusv2[1], v1plusv2[3], …
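To see why the strided loop is the more efficient of the two, it may help to count loop trips. The sketch below is a plain C model, not UPC: mythread and threads are ordinary parameters standing in for MYTHREAD and THREADS, and the function names are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>

/* Trips through the guarded loop of the first example on one thread:
   every thread walks all n indices and tests ownership on each one. */
long guarded_trip_count(long n) {
    assert(n >= 0);
    return n;
}

/* Updates the guarded loop actually performs on thread `mythread`:
   only indices with i % threads == mythread pass the test. */
long guarded_update_count(int mythread, long n, int threads) {
    assert(threads > 0 && mythread >= 0 && mythread < threads);
    long updates = 0;
    for (long i = 0; i < n; i++)
        if (i % threads == mythread)
            updates++;
    return updates;
}

/* Trips through the strided loop of the second example on one thread:
   i = mythread, mythread + threads, ... while i < n. Every trip is an update. */
long strided_trip_count(int mythread, long n, int threads) {
    assert(threads > 0 && mythread >= 0 && mythread < threads);
    long trips = 0;
    for (long i = mythread; i < n; i += threads)
        trips++;
    return trips;
}
```

Under these assumptions, with n = 400 and 4 threads, each thread makes 400 trips through the guarded loop but only 100 through the strided loop, while performing the same 100 updates in both.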
3rd Example: A More Convenient Implementation with upc_forall
    // vect_add.c
    #include <upc_relaxed.h>
    #define N (100*THREADS)
    shared int v1[N], v2[N], v1plusv2[N];
    int main(void) {
        int i;
        // fourth field is the affinity expression: iteration i runs on thread i%THREADS
        upc_forall (i = 0; i < N; i++; i)
            v1plusv2[i] = v1[i] + v2[i];
        return 0;
    }
Layout with two threads:
                  Thread 0                     Thread 1
    Iteration #:  0, 2, …                      1, 3, …
    Shared space: v1[0], v1[2], …              v1[1], v1[3], …
                  v2[0], v2[2], …              v2[1], v2[3], …
                  v1plusv2[0], v1plusv2[2], …  v1plusv2[1], v1plusv2[3], …
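The way upc_forall hands out iterations for an integer affinity expression can be modelled in a few lines of plain C. This is only a sketch of the rule, not of any compiler's implementation: forall_thread_for and forall_indices are hypothetical names, mythread and threads stand in for MYTHREAD and THREADS, and only non-negative affinity values are modelled.

```c
#include <assert.h>

/* Thread that runs the iteration whose integer affinity expression
   evaluates to `affinity`: affinity modulo the number of threads. */
int forall_thread_for(long affinity, int threads) {
    assert(threads > 0 && affinity >= 0);
    return (int)(affinity % threads);
}

/* Model of one thread's share of upc_forall(i=0; i<n; i++; i).
   Writes the indices this thread executes into out[] (the caller must
   provide room for at least n entries) and returns how many there are. */
long forall_indices(int mythread, long n, int threads, long *out) {
    assert(out != 0 && mythread >= 0 && mythread < threads);
    long count = 0;
    for (long i = 0; i < n; i++)
        if (forall_thread_for(i, threads) == mythread)
            out[count++] = i;
    return count;
}
```

With two threads and n = 8, this model gives thread 0 the indices 0, 2, 4, 6, which matches the iteration row of the layout above. Since the arrays use the default layout, each of those iterations touches only elements that the executing thread owns.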
Example: UPC Matrix-Vector Multiplication - Default Distribution
    // vect_mat_mult.c
    #include <upc_relaxed.h>
    // with THREADS in both dimensions of a, the number of threads has to be fixed at compile time
    shared int a[THREADS][THREADS];
    shared int b[THREADS], c[THREADS];
    int main(void) {
        int i, j;
        upc_forall (i = 0; i < THREADS; i++; i) {   // thread i computes c[i]
            c[i] = 0;
            for (j = 0; j < THREADS; j++)
                c[i] += a[i][j]*b[j];
        }
        return 0;
    }
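Data Distribution
[Figure: C = A * B on three threads. The elements of C and B are spread over Th. 0, Th. 1 and Th. 2. Under the default distribution, matrix A is split across Thread 0, Thread 1 and Thread 2 by columns.]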
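A Better Data Distribution
[Figure: C = A * B on three threads. The elements of C and B are spread over Th. 0, Th. 1 and Th. 2. Matrix A is now split across Thread 0, Thread 1 and Thread 2 by rows, so each thread holds the row it multiplies.]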
Example: UPC Matrix-Vector Multiplication - The Better Distribution
    // vect_mat_mult.c
    #include <upc_relaxed.h>
    // block size THREADS: each row of a is one block, so row i has affinity to thread i
    shared [THREADS] int a[THREADS][THREADS];
    shared int b[THREADS], c[THREADS];
    int main(void) {
        int i, j;
        upc_forall (i = 0; i < THREADS; i++; i) {   // thread i computes c[i]
            c[i] = 0;
            for (j = 0; j < THREADS; j++)
                c[i] += a[i][j]*b[j];
        }
        return 0;
    }
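The difference between the two distributions of a can be checked with a small plain C model of the layout rule, in which blocks of the array are dealt out to threads round-robin. This is a sketch, not UPC code: owner_of and remote_reads_in_row are hypothetical helper names, and block and threads are ordinary parameters standing in for the declared block size and THREADS.

```c
#include <assert.h>

/* Thread with affinity to the element at row-major position `index`
   of a shared array declared with block size `block`. */
int owner_of(long index, long block, int threads) {
    assert(index >= 0 && block > 0 && threads > 0);
    return (int)((index / block) % threads);
}

/* Number of a[i][j] reads in row i that are remote for thread i,
   for a threads-by-threads matrix with the given block size. */
int remote_reads_in_row(int i, long block, int threads) {
    assert(threads > 0 && i >= 0 && i < threads);
    int remote = 0;
    for (int j = 0; j < threads; j++)
        if (owner_of((long)i * threads + j, block, threads) != i)
            remote++;
    return remote;
}
```

Under this model, with three threads and the default block size of 1, thread 1 owns only one of the three elements in row 1, so two of its reads of a are remote. With block size 3 (the shared [THREADS] declaration), the whole row sits on thread 1 and no read of a is remote.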