Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases
Uploaded by cesga-foundation
![Page 1: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/1.jpg)
Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases
![Page 2: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/2.jpg)
Objective
• Assess how much work is needed to use the Intel Xeon Phi.
• Minimal modifications, using only pragmas.
• Two applications:
  – CalcuNetW: tests the MKL libraries.
  – GammaMaps: tests pragmas.
• Two modes:
  – Native: compiled to execute only on the Xeon Phi.
  – Offload: uses host + Xeon Phi.
![Page 3: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/3.jpg)
CalcuNetw: Calculate Measurements in Complex Networks
• Complex networks consist of sets of nodes (vertices) joined together in pairs by links (edges).
• For each network, the application calculates:
  – Subgraph Centrality (SC): characterizes the participation of each node in all subgraphs of the network.
  – SC odd: counts only closed walks of odd length.
  – SC even: counts only closed walks of even length.
  – Bipartivity: the proportion of even closed walks to the total number of closed walks in the network.
  – Network Communicability for Connected Nodes, C(p,q): measures how well two nodes in the network communicate.
  – Network Communicability, C(G): the mean of all the C(p,q).
Mouriño J.C., Estrada E., Gomez A., “CalcuNetw: Calculate Measurements in Complex Networks”, Informe Técnico CESGA-2005-003.
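The slide does not give formulas for these measures. The sketch below assumes the commonly used matrix-function definitions: SC(i) is the i-th diagonal entry of exp(A) for adjacency matrix A, SC even and SC odd are the diagonals of cosh(A) and sinh(A), bipartivity is Tr cosh(A) / Tr exp(A), and C(p,q) is the (p,q) entry of exp(A). Taking C(G) as the mean over all distinct pairs p < q is also an assumption. The function names are hypothetical, and the truncated Taylor series in pure Python stands in for whatever MKL routines the real application calls.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_series(adj, terms=30):
    """Split sum_k A^k / k! into its even-k part (cosh A) and odd-k part (sinh A)."""
    n = len(adj)
    even = [[0.0] * n for _ in range(n)]
    odd = [[0.0] * n for _ in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    for k in range(terms):
        target = even if k % 2 == 0 else odd
        for i in range(n):
            for j in range(n):
                target[i][j] += term[i][j]
        # Next term of the series: A^(k+1) / (k+1)!
        term = [[x / (k + 1) for x in row] for row in mat_mul(term, adj)]
    return even, odd


def network_measures(adj):
    """Compute the slide's measures for a graph with at least two nodes."""
    n = len(adj)
    even, odd = walk_series(adj)
    exp_a = [[even[i][j] + odd[i][j] for j in range(n)] for i in range(n)]
    sc = [exp_a[i][i] for i in range(n)]
    sc_even = [even[i][i] for i in range(n)]
    sc_odd = [odd[i][i] for i in range(n)]
    # Assumption: C(G) averages C(p,q) over all distinct node pairs.
    comm = {(p, q): exp_a[p][q] for p in range(n) for q in range(p + 1, n)}
    return {
        "sc": sc,
        "sc_even": sc_even,
        "sc_odd": sc_odd,
        "bipartivity": sum(sc_even) / sum(sc),
        "communicability": comm,
        "mean_communicability": sum(comm.values()) / len(comm),
    }


# A triangle contains odd cycles, so its bipartivity falls below 1.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(network_measures(triangle)["bipartivity"])
```

The series costs one dense matrix product per term, so for large networks this kind of workload is dominated by dense linear algebra, which is the part a tuned library such as MKL would take over.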
![Page 4: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/4.jpg)
CalcuNetW
![Page 5: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/5.jpg)
GammaMaps: A figure-of-merit in Radiation Therapy
[Figure: 3D voxel grids with X, Y, Z axes, labelled "Dose in voxel i,j,k"]
![Page 6: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/6.jpg)
GammaMaps: A figure-of-merit in Radiation Therapy
Workflow: Read doses → Initialise and normalise → Compute gamma → Store gamma
• Application in Fortran 90
• Parallelised using OpenMP
• Geometric algorithm*
• 512 x 512 x 128 = 33,554,432 voxels
• Auto-vectorization
• Pragmas for offload
* T. Ju, T. Simpson, J. O. Deasy, and D. A. Low, “Geometric interpretation of the γ dose distribution comparison technique: Interpolation-free calculation,” Medical Physics, vol. 35, no. 3, p. 879, 2008.
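To make the "Compute gamma" step concrete, here is a minimal sketch of the plain gamma definition: for each reference voxel, the minimum over evaluated voxels of a combined distance and dose-difference metric. It is not the interpolation-free geometric algorithm of Ju et al. that GammaMaps uses, and it works on a 1D profile rather than a 3D grid. The function name, the 3 mm / 3% criteria and the normalisation to the maximum reference dose are all assumptions made for illustration.

```python
import math


def gamma_map_1d(reference, evaluated, spacing_mm, dta_mm=3.0, dose_tol=0.03):
    """Brute-force gamma for two 1D dose profiles sampled on the same grid.

    dta_mm is the distance-to-agreement criterion; dose_tol is the dose
    criterion as a fraction of the maximum reference dose (both assumed).
    """
    dose_crit = dose_tol * max(reference)
    gammas = []
    for i, d_ref in enumerate(reference):
        best = math.inf
        for j, d_eval in enumerate(evaluated):
            dist = (j - i) * spacing_mm   # spatial separation of the two voxels
            diff = d_eval - d_ref         # dose difference between them
            best = min(best, (dist / dta_mm) ** 2 + (diff / dose_crit) ** 2)
        gammas.append(math.sqrt(best))
    return gammas


# A uniform 3% dose offset sits exactly on the dose criterion.
print(gamma_map_1d([100.0] * 5, [103.0] * 5, spacing_mm=1.0))
```

The outer loop over reference voxels is independent per voxel, which is what makes the computation a natural fit for an OpenMP parallel loop. With 33,554,432 voxels an exhaustive inner search like this one would be far too slow, which is one motivation for restricting the search or using a geometric method.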
![Page 7: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/7.jpg)
Results of Experiments
![Page 8: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/8.jpg)
Platform: Host
  CPU Model: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
  Nr. of cores: 16
  Memory: 32788 MB
  Operating System: Linux 2.6.32-279.el6.x86_64
  Compiler Version: 2013U2
Platform: Intel Xeon Phi
  Model: Beta0 Engineering Sample
  Nr. of cores: 61 at 1.09GHz
  Memory: 7936 MB
  Operating System: MPSS Gold U1
  Compiler Version: 2013U2
  GDDR Technology: GDDR5
  GDDR Frequency: 2750000 kHz
• Remote access to Intel systems
• Feb. 2013
![Page 9: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/9.jpg)
Intel Xeon Phi Affinity Policies
[Figure: four panels, each showing 4 cores (C1 to C4) with 4 hardware threads (HT1 to HT4) per core. Thread IDs as labelled in each panel:]
• COMPACT - FINE: 0 1 2 3 4 5 6 7
• SCATTER - FINE: 0 4 1 5 2 6 3 7
• BALANCED - FINE: 0 1 2 3 4 5 6 7
• BALANCED - CORE: {0,1} {2,3} {4,5} {6,7}
• Type:
  – Compact
  – Scatter
  – Balanced
• Granularity:
  – Fine or Thread
  – Core
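The three affinity types can be sketched as a mapping from thread ID to core, using the slide's example of 8 threads on 4 cores with 4 hardware threads each. This is a simplified model of the placement rules, not the runtime's actual implementation: `place_threads` and `threads_by_core` are hypothetical names, and the way leftover threads are spread under the balanced type when the count does not divide evenly is an assumption.

```python
def place_threads(n_threads, n_cores, hw_per_core, policy):
    """Return the 0-based core index assigned to each thread ID."""
    if n_threads > n_cores * hw_per_core:
        raise ValueError("more threads than hardware thread contexts")
    if policy == "compact":
        # Fill every hardware thread of a core before moving to the next core.
        return [t // hw_per_core for t in range(n_threads)]
    if policy == "scatter":
        # Round-robin over cores: consecutive thread IDs land on different cores.
        return [t % n_cores for t in range(n_threads)]
    if policy == "balanced":
        # Spread evenly over cores, but keep consecutive IDs on the same core.
        base, extra = divmod(n_threads, n_cores)
        placement = []
        for core in range(n_cores):
            count = base + (1 if core < extra else 0)  # assumed remainder rule
            placement.extend([core] * count)
        return placement
    raise ValueError("unknown policy: " + policy)


def threads_by_core(placement, n_cores):
    """Group thread IDs by the core they were placed on."""
    groups = [[] for _ in range(n_cores)]
    for thread_id, core in enumerate(placement):
        groups[core].append(thread_id)
    return groups


for policy in ("compact", "scatter", "balanced"):
    print(policy, threads_by_core(place_threads(8, 4, 4, policy), 4))
```

Granularity is a separate choice: with fine (thread) granularity each thread is pinned to one hardware thread, while with core granularity it may float among the hardware threads of its core. With the Intel OpenMP runtime both settings are typically selected through the `KMP_AFFINITY` environment variable, for example `KMP_AFFINITY=granularity=fine,balanced`.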
![Page 10: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/10.jpg)
Results for CalcunetW
![Page 11: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/11.jpg)
CalcunetW
![Page 12: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/12.jpg)
CalcunetW
![Page 13: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/13.jpg)
CalcunetW
![Page 14: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/14.jpg)
Results for GammaMaps
![Page 15: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/15.jpg)
GammaMaps
![Page 16: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/16.jpg)
[Figure: Host. Elapsed time (s), axis from 0 to 1400, versus number of threads, axis from 0 to 20. Series: Host, local-compact-core, local-compact-fine, local-scatter-fine, local-scatter-core.]
![Page 17: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/17.jpg)
GammaMaps
![Page 18: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/18.jpg)
Xeon Phi poor I/O
![Page 19: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/19.jpg)
Conclusions
• Using the MKL library is easy and does not require changes to the code.
• Simple pragmas in the code allow the Xeon Phi to be used quickly.
• The Xeon Phi has I/O performance issues.
• Performance: 1 Xeon Phi ~ 1 Xeon E5-2680.
• Improving performance requires additional work.
![Page 20: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/20.jpg)
Acknowledgements
The authors would like to thank Intel for providing access to the Intel Xeon Phi coprocessor.
![Page 21: Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases](https://reader035.fdocuments.in/reader035/viewer/2022081404/5583d1b4d8b42a6b638b502f/html5/thumbnails/21.jpg)
Questions
The Team: Andrés Gómez, José Carlos Mouriño, Carmen Cotelo, Aurelio Rodríguez