HPC Challenge Benchmark Suite
2006 SPEC Workshop, January 23, 2006
Austin, TX
Jack Dongarra, Piotr Łuszczek
http://icl.cs.utk.edu/hpcc/
Jan 26, 2006 2006 SPEC Workshop, Austin, TX 2/20
High Productivity Computing Systems
Impact:
Performance (time-to-solution): speed up critical national security applications by a factor of 10x to 40x
Programmability (idea-to-first-solution): reduce the cost and time of developing application solutions
Portability (transparency): insulate research and operational application software from the system
Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, and programming errors
Goal: Provide a generation of economically viable high productivity computing systems for the national security and industrial user community (2010)
Fill the critical technology and capability gap: from today's (late 80's) HPC technology ... to ... future (quantum/bio) computing
Applications:Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology
HPCS Program Focus Areas
[Diagram: Analysis & Assessment, Performance Characterization & Prediction, System Architecture, Software Technology, Hardware Technology, Programming Models, and Industry R&D]
HPCC Motivation and Design
1. Augment TOP500
● Do not reduce a system to a single number
● Provide a detailed system description
2. Span locality space
3. Test various hardware components
[Figure: the locality space, with spatial locality on one axis and temporal locality on the other (each running low to high); DGEMM, HPL, PTRANS, STREAM, FFT, and RandomAccess are placed in it, and mission partner applications span the space between them. The tests exercise the computational resources: CPU, memory, and interconnect.]
HPCC Components
1. HPL (Hi-Perf LINPACK)
2. STREAM
3. PTRANS (A ← Aᵀ + B)
4. RandomAccess
5. FFT
6. Matrix-matrix multiply
7. b_eff (effective bandwidth/latency)
HPL: solve A x = b, where A ∈ ℝⁿˣⁿ and x, b ∈ ℝⁿ.

STREAM kernels:
-------------------------------------------------------
name    kernel                  bytes/iter  FLOPS/iter
-------------------------------------------------------
COPY:   a(i) = b(i)                 16          0
SCALE:  a(i) = q*b(i)               16          1
SUM:    a(i) = b(i) + c(i)          24          1
TRIAD:  a(i) = b(i) + q*c(i)        24          2
-------------------------------------------------------

FFT: f_k = Σ_{j=1}^{m} t_j e^(−2πi jk/m), 1 ≤ k ≤ m; f, t ∈ ℂᵐ.

RandomAccess: T[k] ⊕= a_i, where T is a table of 64-bit words.
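For concreteness, the kernels above can be sketched in deliberately simple Python. These are illustrations only, not the official C/MPI implementations, and the function names are invented for this example:

```python
import cmath

def stream_triad(b, c, q):
    """STREAM TRIAD kernel: a(i) = b(i) + q*c(i)."""
    return [bi + q * ci for bi, ci in zip(b, c)]

def random_access(table, stream):
    """RandomAccess-style update T[k] ^= a_i over a table of 64-bit words.
    The index derivation is simplified here for illustration."""
    mask = (1 << 64) - 1
    for a in stream:
        k = a % len(table)
        table[k] ^= a & mask
    return table

def dft(t):
    """Naive O(m^2) transform matching f_k = sum_j t_j * e^(-2*pi*i*j*k/m)."""
    m = len(t)
    return [sum(t[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))
            for k in range(m)]
```

An HPL sketch is omitted: it amounts to solving the dense system A x = b by LU factorization with partial pivoting.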
HPCC Test Variants
1. Local
2. Embarrassingly parallel
3. Global
4. Network only
[Diagram: nodes of processors (P) with memories (M) joined by a network, shown three times to illustrate how the local, embarrassingly parallel, and global variants exercise the machine]
Official HPCC Submission Process
1. Download
2. Install
3. Run
4. Upload results
5. Confirm via @email@
6. Tune
7. Run
8. Upload results
9. Confirm via @email@
Tuning rules:
● Only some routines can be replaced
● Data layout must be preserved
● Multiple languages can be used

Submitters provide a detailed description of the installation and execution environment.

Results are immediately available on the web site:
● Interactive HTML
● XML
● MS Excel
● Kiviat charts (radar plots)

The tuned run (steps 6-9) is optional.

Prerequisites:
● C compiler
● BLAS
● MPI
Measuring Locality in Code
[Scatter plot with two series, "HPC Challenge Benchmarks" and "Select Applications": spatial locality (x-axis, 0 to 1) vs. temporal locality (y-axis, 0 to 1); benchmark points include HPL, STREAM, and RandomAccess; application points include Test3D, CG, Overflow, Gamess, AVUS, OOCore, RFCTH2, and HYCOM]
• Spatial and temporal data locality here is for one node/processor — i.e., locally or "in the small"
Generated by PMaC @ SDSC
HPCC Awards: SC|05 BOF
Class 1: Best Performance
● Figure of merit: raw system performance
● Submission must be valid HPCC database entry
Side effect: populate HPCC database
● 4 categories: HPCC components
HPL STREAM RandomAccess FFT
● Award certificates
4x $500 from HPCwire
Class 2: Most Productivity
● Figure of merit: performance and elegance
Highly subjective
Based on committee vote
● Submission must implement at least 2 out of 4 Class 1 tests
The more tests the better
● Performance numbers are a plus
● The submission process:
Source code
"Marketing brochure"
SC|05 BOF presentation
● Award certificate
$1500 from HPCwire

HPCwire contribution:
● press coverage
● $3500 awards
HPCC Awards Class 2 Detailed Results
Language HPL RandomAccess STREAM FFT
Python MPI √ √
pMatlab √ √ √ √
Cray MTA C √ √
MPT C √ √
UPC x 3 √ √ √
Cilk √ √ √ √
OpenMP C++ √ √
StarP √ √
Parallel Matlab √ √ √ √
HPF √ √
Timeline: HPL Submission Stats
[Log-scale timeline, Jun 28, 2003 to Mar 24, 2006: best HPL submissions grew from 110 Gflop/s to 259 Tflop/s; SC04 and SC|05 marked; HPCS goal: 2000 Tflop/s]
1. IBM BG/L 259 (LLNL)
2. IBM BG/L 67 (Watson)
3. IBM Power5 58 (LLNL)
x7 to the HPCS goal (TOP500: 280 Tflop/s)
TOP500 systems in the HPCC database: #1, #2, #3, #4, #10, #14, #17, #35, #37, #71, #80
Timeline: STREAM Submission Stats
[Log-scale timeline, Jun 28, 2003 to Mar 24, 2006: best STREAM submissions grew from 27 GB/s to 160 TB/s; SC04 and SC|05 marked; HPCS goal: 6500 TB/s]
1. IBM BG/L 160 (LLNL)
2. IBM Power5 55 (LLNL)
3. IBM BG/L 40 (Watson)
x40 to the HPCS goal
Timeline: FFT Submission Stats
[Log-scale timeline, Jun 28, 2003 to Mar 24, 2006: best FFT submissions grew from 4 Gflop/s to 2311 Gflop/s; SC04 and SC|05 marked; HPCS goal: 500 Tflop/s]
1. IBM BG/L 2.3 Tflop/s (LLNL)
2. IBM BG/L 1.1 Tflop/s (Watson)
3. IBM Power5 1.0 Tflop/s (LLNL)
x200 to the HPCS goal
Timeline: RandomAccess Submission Stats
[Log-scale timeline, Jun 28, 2003 to Mar 24, 2006: best RandomAccess submissions grew from 0.01 GUPS to 35 GUPS; SC04 and SC|05 marked; HPCS goal: 64000 GUPS]
1. IBM BG/L 35 (LLNL)
2. IBM BG/L 17 (Watson)
3. Cray X1E 8 (ORNL)
x1800 to the HPCS goal
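GUPS, the unit above, is giga-updates per second: random table updates completed, divided by elapsed time, divided by 10^9. A toy single-node measurement sketch follows; the function and its parameters are invented for illustration, while the real benchmark runs across MPI ranks with a prescribed random stream:

```python
import random
import time

def measure_gups(table_size=1 << 16, n_updates=1 << 18, seed=1):
    """Time XOR updates at pseudo-random table locations and report GUPS.
    table_size must be a power of two so the index mask below is valid."""
    rng = random.Random(seed)
    table = [0] * table_size
    t0 = time.perf_counter()
    for _ in range(n_updates):
        a = rng.getrandbits(64)
        table[a & (table_size - 1)] ^= a
    elapsed = time.perf_counter() - t0
    return n_updates / elapsed / 1e9
```

A pure-Python loop will report many orders of magnitude below the hardware numbers on this slide; the point is the metric's definition, not the speed.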
Kiviat Charts: Multi-network Example
AMD Opteron clusters
● 2.2 GHz
● 64-processor cluster
Interconnects
1. GigE
2. Commodity
3. Vendor
Cannot be differentiated based on:
● HPL
● Matrix-matrix multiply
Available on HPCC website
● http://icl.cs.utk.edu/hpcc/
Kiviat chart (radar plot)
HPCC Data Analysis: Normalize
Example: divide by peak flop/s
System       HPL    RandomAccess  STREAM   FFT
Cray XT3     81.4%     0.031      1168.8   38.3
Cray X1E     67.3%     0.422       696.1   13.4
IBM Power5   53.5%     0.003       703.5   15.5
IBM BG/L     70.6%     0.089       435.7    6.1
SGI Altix    71.9%     0.003       308.7    3.5
NEC SX-8     86.9%     0.002      2555.9   17.5
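The "divide by peak" step is plain element-wise division of each raw result by the system's theoretical peak flop/s; a minimal sketch (the helper name and the sample numbers below are hypothetical, not database entries):

```python
def normalize_by_peak(raw_metrics, peak_flops):
    """Divide each raw HPCC result by theoretical peak flop/s so systems
    of different sizes become directly comparable."""
    return {name: value / peak_flops for name, value in raw_metrics.items()}
```

For example, a hypothetical system achieving 81.4 Tflop/s on HPL against a 100 Tflop/s peak would show 81.4% in the HPL column.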
HPCC Data Analysis: Correlate
[Scatter plot, HPL versus Theoretical Peak: HPL (Tflop/s) on the x-axis (0 to 25) vs. Theoretical Peak (Tflop/s) on the y-axis (0 to 30); labeled points include Cray XT3, NEC SX-8, and SGI Altix]
Is HPL an effective peak or just a peak?
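One way to make the question quantitative is to correlate HPL with theoretical peak across database entries; a self-contained Pearson-correlation sketch (illustrative only, not part of the HPCC tools):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A value near 1.0 over the database would support reading HPL as an "effective peak".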
HPCC Data Analysis: Correlate More
Can I just run DGEMM (local matrix-matrix multiply) instead of HPL?
DGEMM alone overestimates HPL performance
Note the 1000x difference in scales: Tera vs. Giga
[Scatter plot, HPL versus DGEMM: HPL (Tflop/s) on the x-axis (0 to 25) vs. DGEMM (Gflop/s) on the y-axis (0 to 30000); labeled points include Cray XT3, NEC SX-8, and SGI Altix]
HPCC Data Analysis: Correlate Yet More
[Scatter plot, HPL versus G-RandomAccess: HPL (Tflop/s) on the x-axis (0 to 25) vs. G-RandomAccess (GUPS) on the y-axis (0 to 2); labeled points include Cray XT3, NEC SX-8, SGI Altix, Cray X1E/opt, IBM BG/L, and Rackable]
Future Directions
Reduce execution time
● Preserve relevance of existing results
Add new tests without duplicating effort
● Sparse matrix operations
● I/O
● Smith-Waterman (sequence alignment)
Porting
● Cell/PS3
● Languages
Co-Array Fortran
HPCS languages: Chapel, Fortress, X10
● Environments
● Paradigms
Collaborators
David Bailey
● NERSC/LBL
Jeremy Kepner
● MIT Lincoln Lab
David Koester
● MITRE
Bob Lucas
● ISI/USC
Rusty Lusk
● ANL
John McCalpin
● IBM Austin / AMD
Rolf Rabenseifner
● HLRS Stuttgart
Daisuke Takahashi
● Tsukuba, Japan