New Rules: Sustaining Performance Scaling in a Physical...

Transcript of New Rules: Sustaining Performance Scaling in a Physical...

Source: New Rules: Sustaining Performance Scaling in a Physical World (casl.gatech.edu/wp-content/uploads/2016/08/yalamanchili.pdf)

8/30/16

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

New Rules: Sustaining Performance Scaling in a Physical World

Sudhakar Yalamanchili

& Many Students and Collaborators

Computer Architecture and Systems Laboratory Center for Experimental Research in Computer Systems

School of Electrical and Computer Engineering Georgia Institute of Technology

Sponsors:

It's a Physical World

Next Generation Applications

Multi-scale Physical Phenomena

Architecture & Package

[Figure: package cross-section. Labels: PCB, logic, memory stack (4 dies), TSV, test pads, glass interposer, TPV, RDL, thermal adhesive, thermal vias, encapsulation, GPU.]

10 orders of magnitude in scale: 10^-9 to 10^1

Overview

1. Scaling Power and Thermal Management
   • Coordinated chip scale management

2. Towards Data Centric Computing
   • Back to the future

3. Pushing Back the Thermal Wall
   • Challenges

1. Scaling Power and Thermal Management

Power and Performance

$$\text{Perf}\left(\frac{\text{ops}}{\text{s}}\right) = \text{Power}\,(\text{W}) \times \text{Efficiency}\left(\frac{\text{ops}}{\text{joule}}\right)$$

Power Supply (regulation) + Power Consumption + Cooling

W. J. Dally, Keynote IITC 2012

Image credits: www.commons.wikimedia.org, enercon.com, imaging1.com

You can hide latency but not energy!
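As a minimal worked sketch of the relation above (performance = power × efficiency), the snippet below evaluates it for made-up numbers. The function name and the 100 W and 2 Gops/joule values are illustrative assumptions, not figures from the talk.

```python
def sustained_perf_ops_per_s(power_w: float, efficiency_ops_per_joule: float) -> float:
    """Perf (ops/s) = Power (W) x Efficiency (ops/joule)."""
    return power_w * efficiency_ops_per_joule


# Hypothetical operating point (assumed values, for illustration only).
power_budget_w = 100.0   # power the supply and cooling can sustain
efficiency = 2.0e9       # ops per joule

baseline = sustained_perf_ops_per_s(power_budget_w, efficiency)
# With the power budget fixed, performance can only grow through efficiency.
improved = sustained_perf_ops_per_s(power_budget_w, 2 * efficiency)

print(f"baseline: {baseline:.3e} ops/s, doubled efficiency: {improved:.3e} ops/s")
```

With the power budget capped by supply and cooling, the only remaining lever in this relation is ops/joule, which is the point of the slide.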

Impact of Power and Thermal Limits for Multicore

• Based on scaling using Pentium-class cores modeled with IntSim [1]

[1] D. Sekar, A. Naeemi, R. Sarvari, J. Davis, and J. Meindl, “IntSim: A CAD tool for optimization of multilevel interconnect networks,” Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2007.

Dark Silicon Gap

Power and Thermal Capacities are Resources

Image credit: mikedavisfaia.wordpress.com

Managing Thermal Capacity

Graphics Processing Unit (GPU) : 384 Radeon Cores

Multithreaded CPU cores

• Many resources are shared between the CPU and the GPU, for example the memory hierarchy and the on-chip network

Accelerated Processing Unit (APU)

Shared Northbridge → access to overlapping CPU-GPU physical address spaces

Programming Model

• Coupled programming model → offload compute-intensive tasks to the GPU
• Consequences for energy or power management?

[Figure: software stack on APU hardware. Layers: user application, OpenCL™ or other software stack, operating system, APU hardware (CPU and GPU). Host tasks go to the CPU and GPU tasks to the GPU.]

Each OpenCL kernel: a grid of threads, each operating over a data partition

Managing Thermal Capacity: Thermal Coupling

• Significant rise in temperature of the idle component due to thermal coupling and pollution
• CPU cores consume thermal headroom more rapidly (4X faster)
• GPUs sustain power boosts longer!
• Better management can yield 10%-40% gains in measured energy efficiency
• Power management ≠ thermal management

[Figure: cores 0 to 3. Temperature on Core 2 when Core 3 is busy and the remaining cores are idle.]

I.  Paul, S. Manne, M. Arora, W.L. Bircher, S. Yalamanchili, “Cooperative boosting: needy versus greedy power management”, ISCA 2013.

Thermal Coupling Effects

[Figure: peak die temperature and relative CPU and GPU power versus time (seconds). Series: GPU power, CPU CU0 power, CPU CU1 power, peak die temperature. Annotations: CPU power is limited while the GPU runs at its max DVFS state; thermal coupling; temperature throttling.]

I.  Paul, S. Manne, M. Arora, W.L. Bircher, S. Yalamanchili, “Cooperative boosting: needy versus greedy power management”, ISCA 2013.

Need for Package Scale Coordinated Thermal Management!

Allocating Thermal Capacity: Cooperative Boosting

[Figure: speedup (axis 0.50 to 1.50) for P0, P2, P4, and CB across NDL, HS, BF, FAH, BS, Viewdle, Lbm, Perl, and MEAN. Data labels shown: 1.28, 1.10, 1.13, 1.10, 1.04, 1.36, 1.00, 0.99, 1.10.]

[Figure: CPU DVFS P-states. HW only (boost): Pb0, Pb1. SW-visible: P0, P1, P2, ..., Pmin. Baseline (commercial part): static power cap.]

Coordinated, opportunistic consumption of thermal capacity

I.  Paul, S. Manne, M. Arora, W.L. Bircher, S. Yalamanchili, “Cooperative boosting: needy versus greedy power management”, ISCA 2013.

Impact of Power and Thermal Limits for Multicore

• Based on scaling using Pentium-class cores modeled with IntSim [1]

[1] D. Sekar, A. Naeemi, R. Sarvari, J. Davis, and J. Meindl, “IntSim: A CAD tool for optimization of multilevel interconnect networks,” Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2007.

Power and Thermal capacities are the new shared resources

Dark Silicon Gap

Managing Power Capacity

[Figure: % increase in run-time (0% to 50%) under CPU DVFS states P0 to P4, per kernel in miniMD (Total, Force, Neighbor, Comm, Other).]

Dynamic Demand Power Sensitivity → Energy Efficiency [1]

Design Space (BFS) Sensitivity [2]

[1] I. Paul et al., “Coordinated Energy Management in Heterogeneous Processors,” SC13.
[2] A. McLaughlin et al., “A Power Characterization and Management of GPU Graph Traversal,” ADMS 2014.

Need more DVFS states?

It's Not Just Compute: Hardware Balance

[Figure: normalized performance (0 to 30) versus normalized available ops/byte in hardware (0 to 30). Regions: compute limiting, memory bandwidth limiting. The balance point is marked.]

Hardware Balance
• Relative energy costs of compute and memory access
• Relative ops/byte demand of application

Coordinate power states to achieve balance

Balance point: how efficiently is energy used in the core or in the memory system?

I. Paul, W. Huang, M. Arora, and S. Yalamanchili, “Harmonia: Balancing Compute and Memory Power in High Performance GPUs,” IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2015.
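The balance idea above can be illustrated with a simple roofline-style bound. This is a generic sketch, not the Harmonia algorithm from the cited paper; the function names and the peak-compute and bandwidth numbers are assumptions chosen for illustration.

```python
def hardware_balance(peak_ops_per_s: float, mem_bw_bytes_per_s: float) -> float:
    """Available ops/byte in hardware: compute rate per byte of memory bandwidth."""
    return peak_ops_per_s / mem_bw_bytes_per_s


def attainable_perf(peak_ops_per_s: float, mem_bw_bytes_per_s: float,
                    app_ops_per_byte: float) -> float:
    """Roofline-style bound: the tighter of the compute and bandwidth limits."""
    return min(peak_ops_per_s, mem_bw_bytes_per_s * app_ops_per_byte)


def limiting_resource(peak_ops_per_s: float, mem_bw_bytes_per_s: float,
                      app_ops_per_byte: float) -> str:
    """Name the resource that bounds performance for this application."""
    hw = hardware_balance(peak_ops_per_s, mem_bw_bytes_per_s)
    if app_ops_per_byte < hw:
        return "memory bandwidth"
    if app_ops_per_byte > hw:
        return "compute"
    return "balanced"


# Hypothetical hardware configuration (assumed values).
PEAK = 1.0e12   # ops/s
BW = 1.0e11     # bytes/s

for demand in (2.0, 10.0, 20.0):  # application ops/byte
    print(demand, limiting_resource(PEAK, BW, demand),
          attainable_perf(PEAK, BW, demand))
```

In this simplified view, power spent on the resource that is not limiting buys no performance, which is why coordinating compute and memory power states toward the balance point can save energy.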

Shift in the Balance Point

[Figure: two plots of normalized performance. Left: versus core frequency (200 to 1000 MHz) and bandwidth (GB/s; labels 38, 123, 207, 292). Right: versus # active CUs (4 to 44) and bandwidth (GB/s; labels 38, 151, 264).]

Balance plane for performance and energy

I. Paul, W. Huang, M. Arora, and S. Yalamanchili, “Harmonia: Balancing Compute and Memory Power in High Performance GPUs,” IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2015.

Relative energy costs of compute and memory access

Relative ops/byte demand of application

Hardware Balance

Up to 36% power savings with a maximum performance loss of 3.6%

Sensitivity Analysis for Chip Scale Coordinated Power Management

Framework for Managing Power Capacity

[Figure: control framework with regulators. Labels: assign power budgets, temperature regulation, power regulation.]

K. Rao, W. Song, S. Yalamanchili, and Y. Wardi, “Temperature Regulation in Multicore Processors using Adjustable-Gain Integral Controllers,” IEEE Multi-Conference on Systems and Control, September 2015.
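To make the regulation idea concrete, here is a minimal sketch of a fixed-gain integral controller that adjusts a power budget to hold a temperature set point. It is not the adjustable-gain controller of the cited paper. The first-order thermal model, every parameter value, and the function name are assumptions made for illustration.

```python
def simulate_integral_control(t_ref_c=85.0, t_amb_c=45.0, r_th=0.5, c_th=2.0,
                              ki=0.5, dt=0.01, steps=3000,
                              p_min=0.0, p_max=150.0):
    """Regulate temperature by integrating the tracking error into a power budget.

    Assumed plant: C * dT/dt = P - (T - T_amb) / R  (lumped first-order model).
    Controller:    dP/dt = ki * (T_ref - T), clipped to [p_min, p_max].
    """
    temp = t_amb_c
    power = p_min
    temps, powers = [], []
    for _ in range(steps):
        error = t_ref_c - temp
        # Integral action: accumulate the error into the power budget.
        power = min(p_max, max(p_min, power + ki * dt * error))
        # Forward-Euler step of the assumed thermal model.
        temp += dt / c_th * (power - (temp - t_amb_c) / r_th)
        temps.append(temp)
        powers.append(power)
    return temps, powers


temps, powers = simulate_integral_control()
print(f"final temperature: {temps[-1]:.2f} C, final power: {powers[-1]:.2f} W")
```

In a real part the computed budget would be mapped onto discrete power states rather than applied as a continuous value; that mapping is omitted here.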

Power Regulation

Power regulation via power state transitions

[Figure: two panels, unregulated versus regulated.]

2. Emergence of Data Centric Systems

Energy Cost of Data Management

$$\text{Perf}\left(\frac{\text{ops}}{\text{s}}\right) = \text{Power}\,(\text{W}) \times \text{Efficiency}\left(\frac{\text{ops}}{\text{joule}}\right)$$

Three operands x 64 bits/operand

$$\text{DataMovementEnergy} = \#\text{bits} \times \text{dist}\,(\text{mm}) \times \text{energy per bit-mm}$$

*S. Borkar and A. Chien, “The Future of Microprocessors,” CACM, May 2011

Operator_cost + Data_movement_cost + Storage_cost

W. J. Dally, Keynote IITC 2012
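A small worked sketch of the data movement energy formula on this slide. The three operands of 64 bits come from the slide; the distance and the per-bit-mm energy are placeholder assumptions, not values from the talk or from Borkar and Chien.

```python
def data_movement_energy_pj(num_bits: int, distance_mm: float,
                            energy_pj_per_bit_mm: float) -> float:
    """DataMovementEnergy = #bits x distance (mm) x energy per bit-mm."""
    return num_bits * distance_mm * energy_pj_per_bit_mm


# From the slide: three operands x 64 bits/operand.
bits_moved = 3 * 64

# Assumed values, for illustration only.
distance_mm = 10.0       # hypothetical on-die wire length
wire_pj_per_bit_mm = 0.1  # hypothetical wire energy

energy_pj = data_movement_energy_pj(bits_moved, distance_mm, wire_pj_per_bit_mm)
print(f"{bits_moved} bits over {distance_mm} mm: {energy_pj:.1f} pJ")
```

Because the cost is linear in distance, moving the same operands ten times farther costs ten times the energy in this model, which motivates placing computation near the data.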

Interconnect Energy Taper: Electrical

• Relative costs of compute and memory accesses
• Time and energy costs have shifted to data movement

Courtesy Greg Astfalk, HP

[Figure: memory hierarchy (four cores, four L1 caches, last level cache, DRAM) annotated with data access latency (1's ns, 10's ns, 100's ns, ms) and data access energy.]

Pin Bandwidth Challenges [1]

• Number of transistors/die continues to grow
• Number of pins is growing at a slower rate than #transistors
• Supply pins are crowding out data pins
• Reducing supply current/pin limits growth of #transistors/die

[1] P. Stanley-Marbell, V. C. Cabezas, and R. P. Luijten, “Pinned to the Wall – Impact of Packaging and Applications on the Memory and Power Walls,” IEEE/ACM International Symposium on Low-Power Electronics and Design (ISLPED), 2011

Data pin bandwidth is not growing as fast as number of transistors on chip

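[Figure: two packaging options. First: CPU die on a package substrate on the PCB, with a connection to the DRAM die. Second: CPU die and DRAM die on a Si interposer, on a package substrate, on the PCB.]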

Back to the Future: Processing Near Memory

[Figure: processing near memory, 1990's versus today.]

• Kogge, Execube (4K DRAM + 100K gate parallel processor) (www3.nd.edu)
• Draper et al., DIVA chip (isi.edu)
• 3D Interposer System (courtesy Gokul Kumar). Labels: PCB/PWB, BGA, glass interposer, TPVs, RDL, logic die, memory stack, TSV, test pads, thermal adhesive, thermal vias, encapsulation, GPU.

Back to Power and Thermal Management!

Thermal Coupling in Three Dimensions

The (Re)-Emergence of Near Data Processing

[Figure: silicon interposer carrying a multicore chip and DRAM stacks, each stack with a logic tier and memory tiers. Labels: Compute, Package, Capacity Tier Memory.]

New BW hierarchy and energy taper

• Hybrid Memory Cube (HMC)
• High Bandwidth Memory (HBM)
• Wide I/O

NeuroCube: Programmable Digital Neural Emulator

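[Figure: Neurocube organization. Labels: host, link, NoC routers (R), processing elements (PE), VC, PNG, TSV, vault, DRAM die. The weights w_{i,j} and inputs x_i of the current layer are mapped into memory.]

Layerwise operation: for all layers, when one layer is done the host programs and initiates a new layer.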

D. Kim, J. Kung, S. Chai, S. Yalamanchili, and S. Mukhopadhyay, “Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory,” IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2016.

High Bandwidth exploited for both training and inference

Collaboration with S. Mukhopadhyay

3. Pushing Back the Thermal Wall

[Figure: trade-off among performance, reliability, and energy/power.]

Increasing Thermal Capacity

$$\text{Perf}\left(\frac{\text{ops}}{\text{s}}\right) = \text{Power}\,(\text{W}) \times \text{Efficiency}\left(\frac{\text{ops}}{\text{joule}}\right)$$

W. J. Dally, Keynote IITC 2012

Cooling options: conventional, phase change material, microfluidics (image credits: megahdwall.com, wisefull.com)

Increase performance/ft³; impact on revenue

Microfluidic Cooling

• Fluid flow through the microchannels carries heat out to an external heat exchanger (e.g., heat sink)

Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)

Fabrication

Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)

Feature          | # / cm² | Specifications
Micropin-fins    | 1,936   | Diameter = 150 µm, pitch = 225 µm
TSVs (4x4 array) | 30,976  | Diameter = 13 µm, pitch = 24 µm

ICECOOL
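The densities in the fabrication table can be reproduced with a short arithmetic check. The sketch assumes a square grid, whole pitches along a 1 cm edge, and one 4x4 TSV array per micropin-fin site; those layout assumptions are mine and are not stated on the slide.

```python
def sites_per_cm2(pitch_um: int) -> int:
    """Whole grid sites that fit in 1 cm x 1 cm at the given pitch (square grid)."""
    sites_per_side = 10_000 // pitch_um  # 1 cm = 10,000 µm
    return sites_per_side ** 2


# Micropin-fins at a 225 µm pitch (from the table).
micropin_fins = sites_per_cm2(225)

# Assumption: each micropin-fin site carries one 4x4 array of TSVs.
tsvs = micropin_fins * 4 * 4

print(micropin_fins, tsvs)
```

Under these assumptions the results match the table: 44 × 44 = 1,936 micropin-fins and 1,936 × 16 = 30,976 TSVs per cm².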

Workload Cooling Co-Design

[Figure: floorplan of 16 cores, each with FE, SCH, DL1, INT, and FPU blocks.]

Nehalem-like OoO cores: 3 GHz, 1.0 V, max temp 100°C
DL1: 128KB, 4096 sets, 64B
IL1: 32KB, 256 sets, 32B, 4 cycles

L2 & Network Cache Layer:
L2 (per core): 2MB, 4096 sets, 128B, 35 cycles
DRAM: 1GB, 50 ns access time (for performance model)

Ambient temperature: 300 K

• Thermal grids: 50x50
• Sampling period: 1 µs
• Steady-state analysis

[Figure: two 8.4 mm x 8.4 mm layers. Core layer: 16 symmetric cores, each 2.1 mm x 2.1 mm. Cache layer: 16 L2 blocks.]

H. Xiao, Z. Min, S. Yalamanchili and Y. Joshi, “Leakage Power Characterization and Minimization over 3D Stacked Multi-core Chip with Microfluidic Cooling,” IEEE Symposium on Thermal Measurement, Modeling, and Management (SEMITHERM), March 2014

Architecture-Cooling Co-Design: Performance

[Figure: normalized EPI comparison among 4 pin fin structures.]

Zhimin Wan, He Xiao, Yogendra Joshi, Sudhakar Yalamanchili, “Co-Design of Multicore Architectures and Microfluidic Cooling for 3D Stacked ICs,” Microelectronics Journal, May 2014.

Thermally Adaptive Architecture: Delay Scaling

LCC access reduction of 42% from 300 K to 400 K

• Increase the dynamic operating range of the package
• Increase performance per volume
• Coupled with power management

[Figure: package cross-section with back side air cooling. Labels: Cu heat spreader, memory, memory, cache, processor, BT substrate, PCB.]

H. Xiao, W. Yueh, S. Mukhopadhyay and S. Yalamanchili, “Thermally Adaptive Cache Access Mechanisms for 3D Many-Core Architectures,” IEEE Computer Architecture Letters, October 2015.

Impact of Technology Scaling & Thermal Capacity

[Figure: performance increase from baseline (0.00 to 7.00) versus technology node (11, 14, 18, 20, 24, 28). Series: Pkts/sec with air cooling, Pkts/sec with liquid cooling. Annotation: frequency must be reduced to keep max. temperature at 100°C, reducing performance.]

Estimated Scaling Factors

                      | 28nm | 24nm | 20nm | 18nm | 14nm | 11nm
Area Scaling          | 1.00 | 0.74 | 0.52 | 0.42 | 0.25 | 0.16
Dynamic Power Scaling | 1.00 | 0.78 | 0.57 | 0.53 | 0.30 | 0.19
Leakage Power Scaling | 1.00 | 0.96 | 0.90 | 0.83 | 0.80 | 0.75
Power Density         | 1.00 | 1.12 | 1.28 | 1.46 | 1.76 | 2.17

Altera Networking App (node, nm)     | 28    | 24    | 20    | 18    | 14    | 11
Estimated Avg. Power Density (W/cm²) | 22.82 | 26.56 | 29.87 | 34.59 | 39.09 | 50.73

Parametric scaling models tuned based on foundry data

[Figure: Altera networking design, a single 3-die 3D FPGA, with flow inlet and flow outlet.]

Visibility: The Energy Stack

Circuit level: DVFS, power states, clock gating
Chip and Package: power multiplexing, spatiotemporal migration
Board: applications, virtual power, operating system, middleware
Rack: mechanical design, airflow, HVAC

[Figure: stack of memory (SRAM), processor (logic), and memory (DRAM).]

• Multi-scale: Power, Performance and Reliability (PPR) events occur at distinct time scales for distinct durations
• Impact of workload models at all levels
• Transient vs. steady state
• Importance of composition of models for use in systems as well as simulators

Needed: Visibility at the Higher Levels!

What Next?

[Figure: research roadmap.]

Areas: Novel Cooling Technology, Thermal Field Modeling, Power Distr. Network, Power Management, μarchitecture, Algorithms

Modeled phenomena: Microarchitecture and Workload Execution, Power Dissipation, Thermal Coupling and Cooling, Degradation and Recovery

Building Models: Training Data, Regression and Model Estimation, Model Parameters

On-line, Coordinated Global Power and Thermal Management

Power-Architecture-Thermal Co-Design, Architecture-Physical Modeling

Energy Efficient Design

Acknowledgements: Collaborators

• Students: Nawaf Alamoosa, Xinwei Chen, Minki Cho, Minhaj Hassan, Chad Kersey, Duckhwan Kim, Jaeha Kung, Karthik Rao, William Song, Zhimin Wan, He Xiao

• Colleagues: Muhannad Bakir, Sek Chai, Yogendra Joshi, SriLatha Manne, Saibal Mukhopadhyay, Indrani Paul, Yorai Wardi

[Figure: scaling performance across applications & architecture, power, NoC, and technology & cooling.]

Thank You

Questions?