Adding OpenMP to Your Code Using Cray Reveal
Helen He, NERSC User Services Group
October 10, 2013
Current Architecture Trend
• Multi-socket nodes with rapidly increasing core counts
• Memory per core decreases
• Memory bandwidth per core decreases
• Network bandwidth per core decreases
• Need a hybrid programming model with three levels of parallelism
  – MPI between nodes or sockets
  – Shared memory (such as OpenMP) on the nodes/sockets
  – Increase vectorization for lower level loop structures
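The two on-node levels can be sketched in a few lines of C. This is only an illustration, not code from the talk: the function name `scaled_add` and the row-major array layout are assumptions, and the MPI level (which would hand each rank a block of rows) is left out.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i][j] += a * x[i][j] on a row-major
 * n_rows x n_cols grid (name and layout are assumptions).
 * Outer loop: rows are divided among OpenMP threads.
 * Inner loop: the simd directive asks the compiler to vectorize.
 * Built without OpenMP support, the pragmas are ignored and the
 * function simply runs serially. */
void scaled_add(size_t n_rows, size_t n_cols, double a,
                const double *x, double *y)
{
    #pragma omp parallel for default(none) shared(n_rows, n_cols, a, x, y)
    for (size_t i = 0; i < n_rows; i++) {
        #pragma omp simd
        for (size_t j = 0; j < n_cols; j++) {
            y[i * n_cols + j] += a * x[i * n_cols + j];
        }
    }
}
```

Threading the outer loop and vectorizing the inner one keeps each thread working on contiguous memory, which is usually the arrangement a compiler can vectorize most easily.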
Advantages of hybrid MPI/OpenMP
• Reduces the number of MPI ranks per node
• Minimizes network injection contention
• Avoids the extra communication overhead of MPI within a node
• Reduces the memory footprint
• Offers a chance to overlap MPI communication with OpenMP thread computation
What is Reveal
• A tool developed by Cray to help develop codes for the hybrid programming model
• Part of the Cray Perftools software package
• Only works under PrgEnv-cray
• Utilizes the Cray CCE program library for loopmark and source code analysis, combined with performance data collected from CrayPat
• Helps to identify the top time-consuming loops, with compiler feedback on dependency and vectorization
• Loop scope analysis provides variable scope and compiler directive suggestions for inserting OpenMP parallelism into a serial or pure MPI code
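To make "variable scope" concrete, here is a minimal hand-written sketch of the kind of decision a scoping analysis has to make. It is not Reveal output; the function `scale_and_square` and its arguments are hypothetical.

```c
/* Hypothetical loop used only to illustrate scoping.
 * tmp is written before it is read in every iteration, so each
 * thread needs its own copy: private.
 * n and scale are only read, in is only read, and out is written
 * at a distinct index per iteration, so all four can be shared. */
void scale_and_square(int n, double scale, const double *in, double *out)
{
    int i;
    double tmp;

    #pragma omp parallel for default(none) \
        private(i, tmp) \
        shared(n, scale, in, out)
    for (i = 0; i < n; i++) {
        tmp = scale * in[i];
        out[i] = tmp * tmp;
    }
}
```

Using `default(none)` forces every variable in the loop to be scoped explicitly, so a forgotten variable shows up as a compile error instead of a data race.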
Steps to Use Reveal on Edison (1)
• Load the user environment
  – % module swap PrgEnv-intel PrgEnv-cray
  – % module unload darshan
  – % module load perftools (current default is version 6.1.2)
• Generate loop work estimates
  – % ftn -c -h profile_generate myprogram.f90
  – % ftn -o myprogram -h profile_generate myprogram.o
    • Good to separate compile and link to keep object files
    • Optimization flags are disabled with -h profile_generate
  – % pat_build -w myprogram (-w enables tracing)
    • It will generate the executable “myprogram+pat”
  – Run the program “myprogram+pat”
    • It will generate one or more myprogram+pat+…xf files
  – % pat_report myprogram+pat…xf > myprogram.rpt
    • It will generate a myprogram+pat….ap2 file
Steps to Use Reveal on Edison (2)
• Generate a program library
  – % ftn -O3 -hpl=myprogram.pl -c myprogram.f90
  – Optimization flags can be used
  – Build one source file at a time, with the “-c” flag
  – Use an absolute path for the program library if sources are in multiple directories
  – User needs to clean up the program library from time to time
• Launch Reveal
  – % reveal myprogram.pl myprogram+pat…ap2
Steps to Use Reveal on Hopper
• To use the newest version perftools/6.1.2, which is built upon cray-mpich/6.x.x
  – % module unload cray-libsci cray-mpich2
  – % module load cray-libsci/12.1.01
  – % module load cray-mpich/6.1.0
  – % module unload darshan
  – % module load perftools-lite/6.1.2
• To use perftools-lite/6.1.1 or older
  – % module unload darshan
  – % module load perftools-lite/6.1.2
• Follow the rest of the steps for Edison
Cray Reveal GUI
Top loops with compiler loopmarks and feedback
Top loops
Compiler feedback
Compiler loopmarks
Compiler feedback explanation
Double click to explain
Compiler feedback explanation (2)
Reveal scoping assistance
Right click to select loops
Start Scoping
Scoping Results
Suggested OpenMP directives
Save the directives
Save directives to the original file
Extensive “Help” topics in Reveal
Reveal helps to start adding OpenMP
• Only under PrgEnv-cray, with the CCE compiler
• Start with the most time-consuming loops
• Insert OpenMP directives
  – Make sure to save a copy of the original code first, since the saved new file will overwrite the original code
• There will be unresolved and incomplete variable scopes
• There may be more incomplete and incorrect variable scopes identified when compiling the resulting OpenMP code
• The user still needs to understand OpenMP and resolve the issues
• Verify correctness and performance
• Repeat as necessary
• Reveal does not insert OpenMP tasks, barrier, critical, atomic regions, etc.
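As an illustration of a construct that has to be added by hand, the sketch below protects a shared update with an atomic directive. The histogram example, the function name `histogram`, and its arguments are assumptions made for this sketch; they do not come from the talk's example code.

```c
/* Hypothetical histogram: several iterations may increment the same
 * bin, so the update to counts[b] is a race unless it is protected.
 * The atomic directive is written by hand here. */
void histogram(int n, const int *bin_of, int n_bins, long *counts)
{
    int i;

    #pragma omp parallel for default(none) \
        private(i) \
        shared(n, bin_of, n_bins, counts)
    for (i = 0; i < n; i++) {
        int b = bin_of[i];   /* declared in the loop body, so private */
        if (b >= 0 && b < n_bins) {
            #pragma omp atomic
            counts[b] += 1;
        }
    }
}
```

An atomic update is typically cheaper than a critical region for a single scalar increment like this one, but it still serializes threads that hit the same bin.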
More work after Reveal (1)
Reveal suggests:

    #pragma omp parallel for default(none) \
            unresolved (my_change,my_n) \
            shared (my_rank,N,i_max,u_new,u) \
            firstprivate (i)
    for ( i = i_min[my_rank]; i <= i_max[my_rank]; i++ )
    {
      for ( j = 1; j <= N; j++ )
      {
        if ( u_new[INDEX(i,j)] != 0.0 )
        {
          my_change = my_change + fabs ( 1.0 - u[INDEX(i,j)] / u_new[INDEX(i,j)] );
          my_n = my_n + 1;
        }
      }
    }

Final code: (Lots of changes from Reveal suggestions, but will still make the code slower than without OpenMP directives, so will not use any directives)

    #pragma omp parallel for default(none) \
            private (my_change,my_n) \
            shared (my_rank,N,i_min,i_max,u_new,u) \
            private (j) \
            private (i)
    for ( i = i_min[my_rank]; i <= i_max[my_rank]; i++ )
    {
      for ( j = 1; j <= N; j++ )
      {
        if ( u_new[INDEX(i,j)] != 0.0 )
    #pragma omp critical
        {
          my_change = my_change + fabs ( 1.0 - u[INDEX(i,j)] / u_new[INDEX(i,j)] );
          my_n = my_n + 1;
        }
      }
    }
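One common way to handle accumulators such as my_change and my_n, not shown on the slide, is an OpenMP reduction clause: each thread sums into its own copy and the copies are combined when the loop ends. The sketch below is a self-contained variant of the loop under stated assumptions: the function name `relative_change`, the plain integer bounds `i_lo` and `i_hi` (standing in for `i_min[my_rank]` and `i_max[my_rank]`), and the padded row-major `INDEX` layout are all hypothetical, since the slide does not define them. Whether this is faster than the serial loop for the talk's Poisson code is not something this sketch can claim.

```c
#include <math.h>

/* Assumed layout: row-major grid with one ghost column on each side,
 * so each row is (n + 2) wide. The slide does not define INDEX. */
#define INDEX(i, j) ((i) * (n + 2) + (j))

/* Sums |1 - u/u_new| over rows i_lo..i_hi and columns 1..n, skipping
 * points where u_new is zero, and counts the points that were used. */
void relative_change(int i_lo, int i_hi, int n,
                     const double *u, const double *u_new,
                     double *change, int *count)
{
    double my_change = 0.0;
    int my_n = 0;
    int i, j;

    #pragma omp parallel for default(none) \
        private(i, j) \
        shared(i_lo, i_hi, n, u, u_new) \
        reduction(+ : my_change, my_n)
    for (i = i_lo; i <= i_hi; i++) {
        for (j = 1; j <= n; j++) {
            if (u_new[INDEX(i, j)] != 0.0) {
                my_change += fabs(1.0 - u[INDEX(i, j)] / u_new[INDEX(i, j)]);
                my_n += 1;
            }
        }
    }

    *change = my_change;
    *count = my_n;
}
```

With a reduction there is no lock inside the inner loop, which avoids the per-iteration serialization that a critical region introduces. Note that a parallel floating-point sum may differ from the serial result in the last bits because the order of additions changes.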
More work after Reveal (2)
Reveal suggests:

    #pragma omp parallel for default(none) \
            shared (f,N,u,u_new,i,h)
    for ( i = i_min[my_rank] + 1; i <= i_max[my_rank] - 1; i++ )
    {
      for ( j = 1; j <= N; j++ )
      {
        u_new[INDEX(i,j)] = 0.25 * ( u[INDEX(i-1,j)] + u[INDEX(i+1,j)] +
                                     u[INDEX(i,j-1)] + u[INDEX(i,j+1)] +
                                     h * h * f[INDEX(i,j)] );
      }
    }

Final code:

    #pragma omp parallel for default(none) \
            private (my_rank,j,i) \
            shared (f,N,u,u_new,h,i_min,i_max)
    for ( i = i_min[my_rank] + 1; i <= i_max[my_rank] - 1; i++ )
    {
      for ( j = 1; j <= N; j++ )
      {
        u_new[INDEX(i,j)] = 0.25 * ( u[INDEX(i-1,j)] + u[INDEX(i+1,j)] +
                                     u[INDEX(i,j-1)] + u[INDEX(i,j+1)] +
                                     h * h * f[INDEX(i,j)] );
      }
    }
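The same update can be written as a small standalone function to show the scoping in isolation. This is a sketch under assumptions: the name `jacobi_sweep`, the padded row-major `INDEX` layout, and the plain integer bounds `i_lo` and `i_hi` are hypothetical. Passing the bounds as values means the sketch needs no data-sharing decision for `my_rank`, `i_min`, or `i_max`.

```c
/* Assumed layout: row-major grid with one ghost column on each side,
 * so each row is (n + 2) wide. The slide does not define INDEX. */
#define INDEX(i, j) ((i) * (n + 2) + (j))

/* One Jacobi-style update of rows i_lo..i_hi, columns 1..n.
 * i and j are private, as in the slide's final code; the arrays and
 * the scalars n and h are only read (or, for u_new, written at a
 * distinct index per iteration), so they are shared. */
void jacobi_sweep(int i_lo, int i_hi, int n, double h,
                  const double *f, const double *u, double *u_new)
{
    int i, j;

    #pragma omp parallel for default(none) \
        private(i, j) \
        shared(i_lo, i_hi, n, h, f, u, u_new)
    for (i = i_lo; i <= i_hi; i++) {
        for (j = 1; j <= n; j++) {
            u_new[INDEX(i, j)] = 0.25 * (u[INDEX(i - 1, j)] + u[INDEX(i + 1, j)]
                                       + u[INDEX(i, j - 1)] + u[INDEX(i, j + 1)]
                                       + h * h * f[INDEX(i, j)]);
        }
    }
}
```

Because each iteration reads only u and f and writes only its own element of u_new, the loop has no cross-iteration dependence, which is why a plain parallel for is enough here.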
Performance with OpenMP added
[Figure: Run Time (sec) of poisson_omp, nx=ny=1201, on Edison, with 1, 2, 6, 12, and 24 threads]
[Figure: Run Time (sec) of Poisson_mpi_omp, 4 MPI tasks, N=1200, on Edison, for pure MPI and with 1, 3, and 6 threads]
More information
• % module load training
• See example codes, reports, detailed steps in README at:
  – $EXAMPLES/Edison2013/reveal
• Documentation:
  – % man reveal (when the “perftools” module is loaded)
  – Using Cray Performance Measurement and Analysis Tools: http://docs.cray.com/books/S-2376-612/S-2376-612.pdf
National Energy Research Scientific Computing Center