GREEN GRAPH500
Torsten Hoefler
University of Illinois at Urbana-Champaign and ETH Zürich
Talk at ISC’12, Hamburg, Germany
With support of David Bader, Andrew Lumsdaine, Richard Murphy, and Marc Snir
MOTIVATION
Big Data analysis may dominate datacenter cost
Encourage vendors to provide “greener” hardware
Hoefler: “Energy-aware Software Development for Massive-Scale Systems”, EnA-HPC Keynote 2011
[Figure: two pie charts of energy share, linked by the label "Architectural Optimizations". First chart: Memory 9%, CPU 56%, Network 33%. Second chart: Memory 2%, CPU 11%, Network 79%.]
Torsten Hoefler Slide 2 of 9
WHY NOT JUST GREEN500?
Green500 is centered around HPL
HPL: extremely structured, FP/cache intensive
Graph500: unstructured, no good separators, (main) memory and network intensive
Completely different optimization goals!
Need to be addressed by vendors!
Maybe specialized machines?
[Figure: energy per 64 bit (pJ), y-axis from 0 to 700, against interconnect distance (cm), x-axis ticks at 0.1, 10, and 1000. Labeled regions: on die, chip to chip, board to board, between cabinets.]
Source: S. Borkar, Hot Interconnects 2011, Keynote
Slide 3 of 9
REAL COMPARATIVE MEASUREMENTS
[Figure: measured power traces. Annotations: Idle (calibrate wait), Panel Bcast, Scale=32.]
~75 kTEPS/W (Graph500) vs. 452 MFLOPS/W (HPL)
Slide 4 of 9
A SECOND DETAILED POWER TRACE
[Figure: power trace, Scale=34.]
>3 MTEPS/W measured on BG/Q
Thanks to (IBM): Fabrizio Petrini, Yutaka Sugawara, George Chiu, Paul Coteus, Fabio Checconi, James Sexton, Michael Rosenfield, Gerard V Kopcsay
Slide 6 of 9
THE GREEN GRAPH500 LIST
In close collaboration with Graph500 (same rules)
Will have a separate list and separate awards
http://green.graph500.org/
Measurement techniques compatible with established practice and Green500
Allows comparisons and cross-analyses
Only real measurements, no TDP etc.
Slide 7 of 9
PROCEDURES & TECHNICALITIES
Report the energy used for the full solution
Metric: TEPS / average power, in TEPS/W (equivalently traversed edges per joule, TEPJ)
Test system: ~75 kTEPS/W (Graph500) vs. 452 MFLOPS/W (HPL)
Count power for compute nodes and network
Either measure single node and switch and sum
Or measure cumulative power (at inlet)
Or any combination of those (rack …)
PUE is irrelevant for the benchmark
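The metric above can be illustrated with a minimal sketch. It assumes a power trace sampled at known timestamps, sums separately measured node and switch power, and integrates with the trapezoid rule. The function names, the sample values, and the single-run timing are hypothetical and only for illustration; they are not the list's official tooling or rules.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate a sampled power trace (watts over seconds) with the trapezoid rule."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching (time, power) samples")
    return sum(
        0.5 * (power_w[i] + power_w[i + 1]) * (times_s[i + 1] - times_s[i])
        for i in range(len(times_s) - 1)
    )


def teps_per_watt(edges_traversed: float,
                  times_s: Sequence[float],
                  power_w: Sequence[float]) -> float:
    """TEPS divided by average power; algebraically equal to edges per joule."""
    runtime_s = times_s[-1] - times_s[0]
    teps = edges_traversed / runtime_s
    avg_power_w = energy_joules(times_s, power_w) / runtime_s
    return teps / avg_power_w


# Hypothetical trace: one node and one switch, sampled once per second.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
node_w = [300.0, 320.0, 320.0, 320.0, 300.0]
switch_w = [100.0, 100.0, 100.0, 100.0, 100.0]

# "Measure single node and switch and sum": add the traces sample by sample.
total_w = [n + s for n, s in zip(node_w, switch_w)]

result = teps_per_watt(edges_traversed=1.66e6, times_s=times, power_w=total_w)
print(f"{result:.1f} TEPS/W")
```

With these made-up numbers the run consumes 1660 J over 4 s (415 W on average), so 1.66 million traversed edges give 1000 TEPS/W, which is the same value as 1000 traversed edges per joule.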
Slide 8 of 9
THE FUTURE OF THE LIST
First list: Supercomputing 2012
Submission deadline: aligned with Graph500
Submission details: through Graph500, provide output data and energy information, or a power trace
May run different (smaller?) problem sizes
Watch http://green.graph500.org/
Support: Thanks to David Bader, Andrew Lumsdaine, Richard Murphy, and Marc Snir
Slide 9 of 9