Model predictive control and self- learning of thermal models for multi-core platforms* Luca Benini...


Page 1: Model predictive control and self- learning of thermal models for multi-core platforms* Luca Benini Luca.benini@unibo.it *Work supported by Intel Labs.

Model predictive control and self-learning of thermal models for multi-core platforms*

Luca Benini (Luca.benini@unibo.it)

*Work supported by Intel Labs Braunschweig

The Power Crisis

M. Pedram [CADS10]

[Figure: CO2 emissions (%, 0 to 100), split into Transport, Production and Use, for Macbook, Macbook Air, iPhone-3G and MacPro-Desktop. Categories shown: mobile clients; data center, HPC, server.]

Run-time power is ~50% of ICT environmental impact and cost!

The Thermal Crisis

• Never-ending shrinking (CMOS 65nm, CMOS 45nm, CMOS 32nm): smaller, faster… Free lunch?! Not really…
• Multi-Processor SoC possible: “cool” chips, “hot” applications
• Thermal issues: hot-spots, thermal gradients… system wear-out and lifetime reliability degradation!

[Figures: Sun 1.8 GHz Sparc v9 microprocessor; Sun Niagara broadband processor; Cell multi-processor; Coskun et al. ‘07, UCSD]


3D-SoCs are even worse

A System-level View

• Heat density trend 2005-2010 (systems) [Uptime Institute]

Cooling and hot spot avoidance is an open issue!

Multi-scale Problem

• Increasing power density
• Thermal issues at multiple levels
  – Chip / component level
  – Server / board level
  – Rack level
  – Room level

Today’s focus: Chip level

Thermal Management

• Software: spatial and temporal workload variation
• Technology scaling, high performance requirements: high power densities
• System integration, costs: limited dissipation capabilities

Power, temperature, performance are NON-UNIFORM: hot spots, thermal gradients and cycles

Consequences: leakage current, reliability loss, aging

Dynamic approach: on-line tuning of system performance and temperature through closed-loop control

Management Loop: Holistic view

[Diagram: a multicore platform (Proc. 1, Proc. 2, … Proc. N, each with private memory, connected through an INTERCONNECT; cores Core 1 … Core N) in a loop with a control algorithm / migration policy.]

• SW: system information from the OS (workload, CPU utilization, queue status); task scheduling and migration
• HW: introspection through monitors (power, temperature, reliability alarms); self-calibration through knobs (Vdd, Vb, f, on/off)

Outline

• Introduction
• Energy Controller
• Thermal Controller architecture
• Learning (self-calibration)
• Scalability
• Simulation Infrastructure
• Results
• Conclusion

DRM - General Architecture

• System (chip scale)
• Sensors
  – Performance counters (PMU)
  – Core temperature
• Actuators (knobs)
  – ACPI states
    – P-states: DVFS
    – C-states: PGATING
  – Task allocation
• Controller
  – Reactive
    – Threshold/heuristic
    – Control theory
  – Proactive
    – Predictors

[Diagram (simulation snapshot): SW layer with App.1 … App.N, each with Thread 1 … Thread N, on top of the O.S.; HW layer with CPU1 … CPUN, each with a private L1, plus L2, network and DRAM. The CONTROLLER reads TCPU, #L1MISS, #BUSACCESS, CYCLEACTIVE, … and drives f, v and PGATING.]

• Controller
  – Minimize energy
  – Bound the CPU temperature
• Our approach: stack of controllers
  – Energy controller
  – Thermal controller

[Diagram: CONTROLLER made of an Energy stage (ECi) followed by a Thermal stage (TCi); signals: Ti, fECi, WLi, fTCi.]

Energy Controller

CPU BOUND TASK (cpu, cache), from high frequency to low frequency:
• Performance loss
• Power reduction
• Energy efficiency loss!

MEMORY BOUND TASK (cpu, dram, cache), from high frequency to low frequency:
• Same performance
• Power reduction
• Energy efficiency gain!

[Figure: bars of power, execution time and energy (normalized, 0 to 1) at high and at low frequency, for each of the two task types.]

Energy Controller

• CPU BOUND TASK at low frequency: performance loss, power reduction, energy efficiency loss!
• MEMORY BOUND TASK at low frequency: same performance, power reduction, energy efficiency gain!

OUR SOLUTION: CPU-bound tasks at high frequency, memory-bound tasks at low frequency
• Power saving
• No performance loss
• Higher energy efficiency

[Figure: bars of power, execution time and energy (normalized, 0 to 1) for the CPU-bound task (cpu, cache) at high frequency and the memory-bound task (cpu, dram, cache) at low frequency.]
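The frequency choice described on the Energy Controller slides can be sketched in a few lines. This is only an illustration under stated assumptions: the two-term execution-time model, the frequency list and the function names are hypothetical and do not come from the talk; only the idea (lowest frequency that keeps the performance loss small) and the 5% bound quoted later in the deck are taken from the slides.

```python
def exec_time(cpu_cycles, mem_time_s, freq_hz):
    """Assumed model: a CPU part that scales with frequency plus a memory part that does not."""
    return cpu_cycles / freq_hz + mem_time_s


def pick_frequency(cpu_cycles, mem_time_s, freqs_hz, max_slowdown=0.05):
    """Return the lowest frequency whose slowdown versus the highest one is within max_slowdown."""
    f_max = max(freqs_hz)
    t_ref = exec_time(cpu_cycles, mem_time_s, f_max)
    for f in sorted(freqs_hz):
        if exec_time(cpu_cycles, mem_time_s, f) <= (1.0 + max_slowdown) * t_ref:
            return f
    return f_max


# Hypothetical frequency levels and two synthetic tasks.
FREQS = [1.6e9, 2.0e9, 2.4e9, 2.9e9]
cpu_bound = pick_frequency(cpu_cycles=2.9e9, mem_time_s=0.0, freqs_hz=FREQS)
mem_bound = pick_frequency(cpu_cycles=2.9e7, mem_time_s=1.0, freqs_hz=FREQS)
print(cpu_bound, mem_bound)
```

With these made-up numbers the CPU-bound task stays at the highest frequency, while the memory-bound task can drop to the lowest one because its execution time is dominated by the frequency-independent memory part.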


Thermal Controller

Controllers in order of increasing COMPLEXITY [figure: Intel®, ISSCC 2007]

Threshold-based controller
• T > Tmax: low freq
• T < Tmin: high freq
• Cannot prevent overshoot
• Thermal cycles

Classical feedback controller
• PID controllers
• Better than threshold-based approach
• Cannot prevent overshoot

Model Predictive Controller
• Internal prediction: avoids overshoot
• Optimization: maximizes performance
• Centralized
  • aware of neighbor cores' thermal influence
  • all at once: MIMO controller
  • complexity!!!

[Diagram (MPC): a thermal model fed with past inputs and outputs and with the future inputs proposed by the optimizer; predicted future output; future error with respect to the target frequency; optimizer with cost function and constraints.]
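To make the contrast between a threshold controller and a predictive one concrete, here is a minimal sketch. It is not the QP-based MPC of the talk: it uses an assumed single-core first-order thermal model, made-up coefficients and power levels, and an exhaustive search over three frequency levels held constant over the horizon.

```python
# Hypothetical first-order model: T[k+1] = A*T[k] + B*P[k] + (1-A)*T_AMB
A, B, T_AMB = 0.9, 0.5, 40.0
T_MAX, T_MIN = 80.0, 75.0
POWER = {1.0: 2.0, 2.0: 6.0, 3.0: 12.0}  # assumed map: frequency [GHz] -> power [W]


def step(temp, power):
    """One step of the assumed thermal model."""
    return A * temp + B * power + (1.0 - A) * T_AMB


def threshold_policy(temp, freq):
    """Reactive rule: slow down above T_MAX, speed up below T_MIN, else keep the frequency."""
    if temp > T_MAX:
        return min(POWER)
    if temp < T_MIN:
        return max(POWER)
    return freq


def predictive_policy(temp, horizon=5):
    """Pick the highest frequency whose predicted trajectory never exceeds T_MAX."""
    for freq in sorted(POWER, reverse=True):
        t, feasible = temp, True
        for _ in range(horizon):
            t = step(t, POWER[freq])
            if t > T_MAX:
                feasible = False
                break
        if feasible:
            return freq
    return min(POWER)


def simulate(policy_name, steps=200):
    """Run the closed loop and return the peak temperature reached."""
    temp, freq, peak = T_AMB, max(POWER), T_AMB
    for _ in range(steps):
        if policy_name == "threshold":
            freq = threshold_policy(temp, freq)
        else:
            freq = predictive_policy(temp)
        temp = step(temp, POWER[freq])
        peak = max(peak, temp)
    return peak


print(simulate("threshold"), simulate("predictive"))
```

In this toy setup the threshold rule reacts only after the limit has been crossed, so its peak exceeds T_MAX, while the predictive rule stays below it. Holding one frequency over the whole horizon makes the predictive rule conservative; a real MPC optimizes the full input sequence instead.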

MPC Robustness

MPC needs a thermal model
• Accurate, with low complexity
• Must be known a priori
• Depends on user configuration
• Changes with system ageing

“In field” self-calibration
• Force test workloads
• Measure core temperatures
• System identification

[Diagram: the MPC loop (thermal model, optimizer, cost function, constraints, target frequency) on top of a multicore (Core 1 … Core i … Core N). Workload execution is interleaved with training tasks on each core; workload, power and temperature traces feed system identification, which returns the identified state-space thermal model.]


Thermal Model & Power Model

[Diagram: each task j is mapped by the power model, P = g(task, f), to a power Pj; the thermal model maps the powers Pn,j to the temperatures Tn,j (core temperature Tj).]

Model Structure

[Diagram: the chip is discretized into silicon (Si) cells on top of copper (Cu) cells. In the example, Si cells 0 and 1 and Cu cells 2 and 3 give the diagonal entries Si 00, Si 11, Cu 22, Cu 33 and the coupling entries 01, 10, 23, 32, 02, 20, 13, 31 of Matrix A; Matrix B carries the inputs P0, P1 and Tenv. Signals: Tn,j, Pn,j, Tenv.]
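The cell structure above (two silicon cells coupled to two copper cells, with power and ambient temperature as inputs) can be sketched as a small discrete-time state-space model. All capacitances, conductances, the time step and the forward-Euler discretization are assumptions chosen for illustration; the talk does not give these values.

```python
def build_model(dt=1e-3):
    """Build x[k+1] = A x[k] + B u[k] for 4 cells; u = [P0, P1, Tenv]."""
    n = 4                                   # cells 0, 1: silicon; cells 2, 3: copper
    cap = [0.01, 0.01, 0.5, 0.5]            # hypothetical thermal capacitances [J/K]
    cond = {(0, 1): 0.5, (2, 3): 2.0,       # hypothetical lateral conductances [W/K]
            (0, 2): 4.0, (1, 3): 4.0}       # hypothetical vertical conductances [W/K]
    g_env = [0.0, 0.0, 1.0, 1.0]            # only copper cells exchange heat with ambient

    A = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), g in cond.items():
        for a, b in ((i, j), (j, i)):
            A[a][b] += dt * g / cap[a]      # heat received from the neighbor cell
            A[a][a] -= dt * g / cap[a]      # heat given to the neighbor cell
    for i in range(n):
        A[i][i] -= dt * g_env[i] / cap[i]   # heat lost to the environment

    B = [[0.0, 0.0, 0.0] for _ in range(n)]
    B[0][0] = dt / cap[0]                   # P0 heats silicon cell 0
    B[1][1] = dt / cap[1]                   # P1 heats silicon cell 1
    for i in range(n):
        B[i][2] = dt * g_env[i] / cap[i]    # ambient temperature input
    return A, B


def step(A, B, x, u):
    """One model step: returns A x + B u."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) +
            sum(B[i][k] * u[k] for k in range(len(u)))
            for i in range(len(x))]


A, B = build_model()
at_rest = step(A, B, [45.0] * 4, [0.0, 0.0, 45.0])
heated = step(A, B, [45.0] * 4, [10.0, 0.0, 45.0])
print(at_rest, heated)
```

By construction each row of A plus the Tenv column of B sums to one, so with zero power the model rests exactly at ambient temperature. That is the kind of physical property the later identification slides ask the learned model to respect.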

LS System Identification

[Diagram: a test pattern is applied to the system and to the model. The system response (data) and the model output give the error function e = Tmodel - Treal; a least-squares parametric optimization minimizes the cost function built from the errors ei and returns the optimal parameters A, B.]
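As a rough illustration of least-squares identification, the sketch below fits a scalar first-order model T[k+1] = a·T[k] + b·P[k] + c to a synthetic trace by solving the normal equations. The scalar model, the synthetic data and all names are assumptions; the talk identifies the matrices A and B of a multi-cell model with dedicated tools.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_first_order(temps, powers):
    """Least-squares estimate of (a, b, c) in T[k+1] = a*T[k] + b*P[k] + c."""
    rows = [[temps[k], powers[k], 1.0] for k in range(len(temps) - 1)]
    targets = temps[1:]
    mtm = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    mty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve(mtm, mty)


# Synthetic, noise-free trace generated with known (hypothetical) parameters.
A_TRUE, B_TRUE, C_TRUE = 0.95, 0.4, 2.0
powers = [10.0 if (k * k) % 7 < 3 else 1.0 for k in range(200)]
temps = [40.0]
for p in powers:
    temps.append(A_TRUE * temps[-1] + B_TRUE * p + C_TRUE)

a_hat, b_hat, c_hat = fit_first_order(temps, powers)
print(a_hat, b_hat, c_hat)
```

On noise-free data the fit recovers the generating parameters; with measured traces the residual e = Tmodel - Treal is what the cost function accumulates.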

Experimental setup

Tool flow:
• Pattern Generator: PRBS.csv (..0,0,1,1,1,…)
• Workloader: per-core activity (..idle,idle,run,run,run,…) on core0, core1, … coreN
• XTS: DATA.csv with temperature, frequency and workload, sampled on the HW (Ts=1/10ms)
• PREPROCESSING
• LS MODEL
  – C/FORTRAN (SLICOT, MINPACK)
  – Matlab System Identification: N4SID, PEM, LS (Levenberg-Marquardt)

Hardware: SUN FIRE X4270
• Intel Nehalem
• 8 cores / 16 threads
• 2.9 GHz
• 95 W TDP
• IPMI

[Figure: server layout with storage drives, fan board, CPU1, CPU2, RAM and chipset, and the air flow direction.]

Workload & Temperature

[Figure: two panels over time (0 to 35 seconds). Temperature trace: Tcore2 (°C, 32 to 44). Pseudorandom workload pattern: Pcore2 (workload, 0 to 1).]
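A pseudorandom binary sequence such as the one driving the workload above can be generated with a linear-feedback shift register. The sketch below uses the common PRBS7 polynomial x^7 + x^6 + 1 as an assumption; the talk does not say which generator or sequence length was used, and the `hold` parameter is a hypothetical way to stretch each bit over several samples.

```python
def prbs7(seed=0x01, length=127):
    """Generate `length` bits of a PRBS7 sequence (polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(length):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def to_workload(bits, hold=1):
    """Map bits to idle/run commands, holding each bit for `hold` samples."""
    return ["run" if b else "idle" for b in bits for _ in range(hold)]


pattern = prbs7()
print(",".join(str(b) for b in pattern[:16]))
print(",".join(to_workload(pattern[:4], hold=2)))
```

The sequence repeats every 127 bits and is nearly balanced between idle and run phases, which helps excite the thermal dynamics over a range of time scales.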

Black-box Identification

Identification based on pure LS fitting

[Figure: MEASURED vs. SIMULATED TEMPERATURE. Tcore2 (°C, 34 to 45) over 25 to 33 seconds; curves: Measured, Thermal Model.]

Partially unobservable model

[Figure: three panels over 13.6 to 14.1 seconds (lower time axes in ms, 1.36 to 1.41 x 10^4). Tcore2 (°C, 32 to 44): Measured vs. Thermal Model. FREQUENCY: f [Hz] (1 to 4 x 10^9). WORKLOAD: W (-1 to 2).]

PROBLEM 1: the “WORKLOAD” signal does not include power variation due to frequency changes.

SOLUTION: learn the power model too!

Multi-step Identification

Power model P = g(w,f) initially unknown

[Diagram: w and f enter the power model (LUT), which outputs P; P enters the thermal model (A,B), which outputs T.]

Step 1: set f = const and set w as a [0|1]^N sequence, which gives the power sequence [P1|P0]^N with P1, P0 pre-measured in steady state; we measure T to obtain A0 by LS.
Step 2: A is known; we set f and w, we measure T, we invert A and we obtain P.
Step 3: P is known; we now generate a richer sequence of w and f and we re-calibrate A by LS.

Iterate until convergence
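Step 2 above, recovering power from measured temperature once the thermal model is known, can be illustrated on a scalar model. The first-order model, its coefficients and the synthetic look-up table are assumptions; the talk inverts the identified multi-cell model.

```python
from collections import defaultdict


def recover_power(temps, a, b, t_amb):
    """Invert T[k+1] = a*T[k] + b*P[k] + (1-a)*t_amb to get P[k] from the trace."""
    return [(temps[k + 1] - a * temps[k] - (1.0 - a) * t_amb) / b
            for k in range(len(temps) - 1)]


def build_lut(workloads, freqs, powers):
    """Average the recovered power per (workload, frequency) pair: P = g(w, f)."""
    acc = defaultdict(list)
    for w, f, p in zip(workloads, freqs, powers):
        acc[(w, f)].append(p)
    return {key: sum(vals) / len(vals) for key, vals in acc.items()}


# Synthetic experiment with a hypothetical "true" power table and thermal model.
TRUE_LUT = {(0, 1.0): 1.0, (1, 1.0): 4.0, (0, 2.0): 2.0, (1, 2.0): 9.0}
a, b, t_amb = 0.9, 0.5, 40.0
workloads = [(k // 3) % 2 for k in range(60)]
freqs = [1.0 if (k // 10) % 2 == 0 else 2.0 for k in range(60)]
temps = [t_amb]
for w, f in zip(workloads, freqs):
    temps.append(a * temps[-1] + b * TRUE_LUT[(w, f)] + (1.0 - a) * t_amb)

lut = build_lut(workloads, freqs, recover_power(temps, a, b, t_amb))
print(lut)
```

The recovered table can then drive Step 3, where richer w and f sequences are used to refine the thermal model.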

Validation

[Figure: Tcore2 (°C, 34 to 42) over 13.6 to 14.1 seconds, Measured vs. Thermal Model, with the corresponding frequency f [Hz] (1 to 4 x 10^9) and workload W (-1 to 2) traces (time axes in ms, 1.36 to 1.41 x 10^4).]

[Figure: Tcore2 (°C, 40.5 to 42.5) over 20.5 to 23 seconds, Measured vs. Thermal Model.]

Problem 2: Unstable model

Problem 3: Model is not physical
• Tcore differs from Tamb with P=0 [sketch: Tcore and Tamb versus t]

Identification algorithm must be aware of physical properties to avoid over-fitting

Constrained Identification

CONSTRAINED LEAST SQUARES
• Constraint on initial condition
• Linear constraint

[Figure: Tcore2 [°C] (37 to 44) over 26 to 27.2 seconds; curves: Measured, Thermal Model.]
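One simple way to bake physical knowledge into a least-squares fit is sketched below for a scalar model. Working in the deviation T - Tamb enforces, as a linear equality constraint, that the model rests at ambient temperature when P = 0; clipping the pole and gain afterwards is a crude projection that keeps the model stable. Both choices are assumptions for illustration, not the constrained formulation used in the talk.

```python
def fit_constrained(temps, powers, t_amb):
    """Fit y[k+1] = a*y[k] + b*P[k] with y = T - t_amb, then clip to 0 <= a < 1 and b >= 0."""
    y = [t - t_amb for t in temps]
    n = len(y) - 1
    s_yy = sum(y[k] * y[k] for k in range(n))
    s_yp = sum(y[k] * powers[k] for k in range(n))
    s_pp = sum(powers[k] * powers[k] for k in range(n))
    r_y = sum(y[k + 1] * y[k] for k in range(n))
    r_p = sum(y[k + 1] * powers[k] for k in range(n))
    det = s_yy * s_pp - s_yp * s_yp
    a = (r_y * s_pp - r_p * s_yp) / det   # 2x2 normal equations, solved in closed form
    b = (s_yy * r_p - s_yp * r_y) / det
    a = min(max(a, 0.0), 0.999)           # stable, non-oscillating pole
    b = max(b, 0.0)                       # more power never cools the core
    return a, b


# Synthetic trace from a hypothetical physical model.
T_AMB, A_TRUE, B_TRUE = 35.0, 0.92, 0.3
powers = [8.0 if (k // 5) % 2 else 1.0 for k in range(150)]
temps = [T_AMB]
for p in powers:
    temps.append(T_AMB + A_TRUE * (temps[-1] - T_AMB) + B_TRUE * p)

a_hat, b_hat = fit_constrained(temps, powers, T_AMB)

# With P = 0 the fitted model must relax to ambient temperature.
t = 60.0
for _ in range(500):
    t = T_AMB + a_hat * (t - T_AMB)
print(a_hat, b_hat, t)
```

An unconstrained fit with a free offset term can match the training trace equally well and still settle at the wrong temperature with zero power, which is the non-physical behavior listed under Problem 3.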

Quasi-steady-state accuracy

[Figure: four panels, Tcore0, Tcore1, Tcore2 and Tcore3 [°C], over 100 to 600 seconds; curves: Measured, Thermal Model.]

Large time constant. Possible causes:
• Package thermal inertia?
• Environment inertia (air)?
• PLEAK temperature dependency?

Identification with pseudorandom trace:
• Too many samples, huge LS computation

Addressing model stiffness

• Modelling the third time constant as heat sink temperature variation
• One-pole model identification

[Diagram: an environment thermal model (inputs PT and Tenv, output Theatsink) feeds the CPU thermal model.]

[Figure: four panels, Tcore [°C] (30 to 40), over 100 to 600 seconds; curves: Measured, Thermal Model.]


MPC Scalability

MPC complexity
• Implicit (a.k.a. on-line): computational burden
• Explicit (a.k.a. off-line): high memory occupation

[Figure: number of explicit regions versus number of cores. 4 cores: 36 regions; 8 cores: 1296 regions; 16 cores: out of memory. Die pictures: Intel, ISSCC 2007.]

Complexity grows superlinearly with the number of cores!!

Addressing Scalability

On a short time window, power has a local thermal effect!

[Figure: number of explicit regions versus number of cores. Centralized: 4 cores, 36 regions; 8 cores, 1296 regions; 16 cores, out of memory. Distributed: 8, 16, 32.]

One controller for each core
Controller uses:
• local power & thermal model
• neighbors' temperatures

Fully distributed
Complexity scales linearly with #cores

Distributed Control

Distributed and hierarchical controllers:
– Energy Controller (EC)
  • Output: frequency fEC
    – Minimize power, CPI based
    – Performance degradation < 5%
– Temperature Controller (TC)
  • Distributed MPC
  • Inputs: fEC, TCORE, TNEIGHBOURS
  • Output: core frequency (fTC)

[Diagram: controller nodes on the multicore (Core 1 … Core i … Core N). In node i, ECi maps CPI i,k+1 to fEC i,k+1; TCi takes fEC i,k+1, T i,k and the neighbours' temperatures (Ti-1,k, Ti+1,k) and outputs fTC i,k+1. Node N does the same with ECN and TCN.]

High Level Architecture

Distributed Controller

[Diagram: the Energy Controller feeds the Thermal Controller of each core (Core 1 … Core 4) with CPI and fEC; each thermal controller i outputs fi,TC to the PLANT and receives Ti + Tneigh back from it.]

Distributed Thermal Controller

MPC Controller, Core 1 (implicit formulation)

• g(·): nonlinear (frequency to power) model; takes CPI and f1,EC and gives P1,EC
• MPC controller: linear (power to temperature) model with 2 states per core; QP optimization subject to (s.t.) constraints; uses P1,EC, x1 and TENV and gives P1,TC
• g-1(·): takes CPI and P1,TC and gives f1,TC
• Observer: classic Luenberger state observer; estimates x1 from T1

Explicit Distributed Controller

Our aim is to minimize the difference between the input P1,TC (also called the manipulated variable, MV) and the reference P1,EC. Our controller can only take into account a constant reference. To overcome this limitation we reformulate the tracking problem as a regulation problem, which consists of driving ∆P1 (the new MV) to 0. The regulated power is P1,TC = P1,EC + u(k), with u(k) = ∆P1.

At each time instant the system belongs to a region according to its current state. In each region the explicit controller executes a linear control law, with gain matrices selected by the region number:

Region 1: F1, G1
Region 2: F2, G2
…
Region nr: Fnr, Gnr

The prediction evaluated by our explicit controller cannot take into account the measured disturbances (uMD = [Tenv, P1, Tneigh]). Thus we exploit the superposition principle of linear systems: to remap the effect of these elements we use the model to modify the state (x(k) → xSHIFTED(k)), projecting the effects of the MDs one step forward.

[Diagram: x1SHIFTED is obtained from x1 plus a term built with A1-1·B1 from [x1 ; TENV ; P1,EC]; the region number is looked up from it; u(k) = ∆P1 is added to P1,EC to give P1,TC.]
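The region lookup and per-region linear law of an explicit controller can be sketched as follows. Everything numeric here is hypothetical: the two regions, their boundaries and the gains F and G are invented for illustration, and the affine form u = F·x + G is the usual explicit-MPC form rather than something spelled out on the slide.

```python
# Each region is the set {x : H[i]·x <= K[i] for all i}; inside it, u = F·x + G.
REGIONS = [
    {"H": [[1.0, 0.0]], "K": [70.0], "F": [0.0, 0.0], "G": 0.0},      # cool: follow the reference
    {"H": [[-1.0, 0.0]], "K": [-70.0], "F": [-0.8, 0.0], "G": 56.0},  # hot: cut power
]


def dot(u, v):
    """Inner product of two equally sized lists."""
    return sum(a * b for a, b in zip(u, v))


def find_region(x):
    """Return the 1-based number of the first region that contains x."""
    for number, region in enumerate(REGIONS, start=1):
        if all(dot(h, x) <= k for h, k in zip(region["H"], region["K"])):
            return number
    raise ValueError("state outside all regions")


def explicit_control(x_shifted, p_ec):
    """Return (P_TC, region number): P_TC = P_EC + delta_P, never below zero."""
    number = find_region(x_shifted)
    region = REGIONS[number - 1]
    delta_p = dot(region["F"], x_shifted) + region["G"]
    return max(0.0, p_ec + delta_p), number


print(explicit_control([60.0, 55.0], 10.0))
print(explicit_control([80.0, 60.0], 10.0))
```

At run time the controller only does a region search and one small matrix-vector product, which is why the explicit form trades memory (the stored regions and gains) for computation.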

O.S. Implementation – Linux SMP

• Controller routines
  – Scheduler routine extension
    • Is distributed
    • Executes on the core it relies on
  – Timing: scheduler tick (1-10 ms)
• CPI estimation
  – Performance counters:
    • Clocks expired
    • Instructions retired
• Energy Controller
  – Look-up table:
    • fEC = LuT [ CPI ]
• Thermal Controller
  – Core temperature sensors
  – Matrix multiplication & look-up table:
    • fTC = LuT [ M*[TCORE, TNEIGHBOURS] ]

[Diagram: controller nodes on the multicore (Core 1 … Core i … Core N). In node i, ECi maps CPI i,k+1 to fEC i,k+1; TCi takes fEC i,k+1, T i,k and the neighbours' temperatures (Ti-1,k, Ti+1,k) and outputs fTC i,k+1.]
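The per-tick routine listed on the O.S. implementation slide (CPI from counters, a look-up table for fEC, a matrix product plus look-up table for fTC) might look like the sketch below. Table contents, bin edges, weights and the final `min` combination are assumptions made for illustration; the slide only gives the two LuT expressions.

```python
import bisect

CPI_EDGES = [1.0, 2.0, 4.0]                   # hypothetical CPI bin edges
F_EC_TABLE = [2.9e9, 2.4e9, 2.0e9, 1.6e9]     # higher CPI (memory bound) -> lower frequency
M = [0.8, 0.1, 0.1]                           # hypothetical weights: own core, two neighbours
SCORE_EDGES = [70.0, 75.0, 80.0]              # hypothetical bin edges on M*[Tcore, Tneigh]
F_TC_TABLE = [2.9e9, 2.4e9, 2.0e9, 1.6e9]


def estimate_cpi(clk_prev, clk_now, instr_prev, instr_now):
    """Clocks per instruction over the last tick, from two counter snapshots."""
    instr = instr_now - instr_prev
    return (clk_now - clk_prev) / instr if instr > 0 else float("inf")


def energy_controller(cpi):
    """fEC = LuT[CPI]."""
    return F_EC_TABLE[bisect.bisect_right(CPI_EDGES, cpi)]


def thermal_controller(t_core, t_neigh):
    """fTC = LuT[M * [Tcore, Tneighbours]]."""
    score = M[0] * t_core + sum(m * t for m, t in zip(M[1:], t_neigh))
    return F_TC_TABLE[bisect.bisect_right(SCORE_EDGES, score)]


def on_scheduler_tick(counters_prev, counters_now, t_core, t_neigh):
    """One controller invocation; taking the lower of the two frequencies is an assumption."""
    cpi = estimate_cpi(counters_prev[0], counters_now[0], counters_prev[1], counters_now[1])
    return min(energy_controller(cpi), thermal_controller(t_core, t_neigh))


print(on_scheduler_tick((0, 0), (1000, 500), 60.0, [58.0, 59.0]))
```

Keeping both controllers as table lookups matches the stated goal of running inside a 1-10 ms scheduler tick on the core being controlled.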

Model Learning Scalability

MPC weaknesses – 2nd: internal thermal model
• Accurate, with low complexity
• Must be known a priori
• Depends on user configuration
• Changes with system ageing

“In field” self-calibration (system identification)
• Force test workloads
• Measure core temperatures
• System identification

[Diagram: the MPC loop on top of a multicore (Core 1 … Core i … Core N); workload execution is interleaved with training tasks; workload, power and temperature traces feed the identified state-space thermal model.]

Complexity issue
• State of the art is centralized
• Least-squares based: relies on matrix inversion (cubic in #cores)

[Figure: identification time (s) versus number of cores. Centralized: 4 cores, 105 s; 8 cores, 1230 s; 16 cores, >1h. Distributed: 1, 2, 4.]

Distributed approach: each core identifies its local thermal model
Complexity scales linearly with #cores


Simulation Strategy

Trace-driven simulator [1]:
• Not suitable for full-system simulation (how to simulate the O.S.?)
• Loses information on cross-dependencies, resulting in degraded simulation accuracy

[Diagram: workload set, multicore simulator, execution trace database, power model, temperature model, control strategy.]

Closed-loop simulator:
• Cycle-accurate simulators [2]:
  • High modeling accuracy
  • Support well-established power and temperature co-simulation based on analytical models and system micro-architectural knowledge
  • Low simulation speed
  • Not suitable for full-system simulation
• Functional and instruction-set simulators:
  • Allow full-system simulation
  • Less internal precision
  • Less detailed data
  • No micro-architectural model: introduces the challenge of having accurate power and temperature physical models

[Diagram: workload, multicore simulator, power model, temperature model, with the control strategy closing the loop.]

[1] P. Chaparro et al. Understanding the thermal implications of multi-core architectures. 2007
[2] L. Benini et al. MPARM: Exploring the multi-processor SoC design space with SystemC. 2005

Virtual Platform

[Diagram: simulator (SIMICS with RUBY, which stalls the core on memory accesses) modelling the HW (CPU1, CPU2 … CPUN with L1 caches, L2, network, DRAM) and running the SW (O.S. and App.1 … App.N with threads T1 … TN).]

Simics by Virtutech:
• Full-system functional simulator
• Models the entire system: peripherals, BIOS, network interfaces, cores, memories
• Allows booting a full OS, such as Linux SMP
• Supports different target CPUs (arm, sparc, x86)
• x86 model:
  • in-order
  • all instructions are retired in 1 cycle
  • does not account for memory latency

Memory timing model
• RUBY – GEMS (University of Wisconsin) [1]
  • Public cycle-accurate memory timing model
  • Different target memory architectures
  • Fully integrated with Virtutech Simics
  • Written in C++
  • We use it as a skeleton to apply our add-ons (as C++ objects)

[1] M. M. K. Martin et al. Multifacet’s general execution-driven multiprocessor simulator (GEMS) toolset. 2005

Virtual Platform

[Diagram: the virtual platform (SIMICS, RUBY) extended with a performance counter (PC) module (#hlt, stall, active, cycles; exports #L1MISS, #BUSACCESS, CYCLEACTIVE, …) and a DVFS module (fi, VDD; per-core f, v).]

Performance knobs (DVFS) module:
• Virtutech Simics supports frequency change at run-time
• RUBY does not support it:
  • it has no internal knowledge of frequency
• We add a new DVFS module to support it:
  • it ensures that the L2 cache and DRAM have a constant clock frequency
  • L1 latency scales with the Simics processor clock frequency

Performance counters module:
• Needed by the performance control policy
• We add a new Performance Counter module to support it
  • it exports different quantities to the O.S. and applications:
    • the number of instructions retired, clock cycles and stall cycles expired, halt instructions, …

Virtual Platform

[Diagram: the virtual platform (SIMICS, RUBY, performance counter module, DVFS module) extended with a POWER MODEL module that outputs PCORE, PL1, PL2.]

Power model module:
• Estimates at run-time the power consumption of the target architecture
• Core model: PT = [PD(f,CPI) + PS(T,VDD)] * (1 − idleness) + idleness * PIDLE
• PD: experimentally calibrated analytical power model
• Cache and memory power: access cost estimated with CACTI [1]

[1] S. Thoziyoor et al. A comprehensive memory modeling tool and its application to the design and analysis of future memory hierarchies. 2008
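The core power expression PT = [PD(f,CPI) + PS(T,VDD)] * (1 − idleness) + idleness * PIDLE translates directly into code. In the sketch below only the combining formula comes from the slide; the bodies of `p_dynamic` and `p_static`, their coefficients and the idle power value are hypothetical stand-ins.

```python
import math

P_IDLE = 5.0  # hypothetical idle power [W]


def p_dynamic(f_hz, cpi):
    """Hypothetical stand-in for PD(f, CPI): grows with frequency, shrinks as CPI rises."""
    return 4.0e-9 * f_hz / (1.0 + 0.2 * cpi)


def p_static(temp_c, vdd):
    """Hypothetical stand-in for PS(T, VDD): leakage growing exponentially with temperature."""
    return 2.0 * vdd * math.exp(0.02 * (temp_c - 50.0))


def core_power(f_hz, cpi, temp_c, vdd, idleness):
    """PT = [PD(f, CPI) + PS(T, VDD)] * (1 - idleness) + idleness * P_IDLE."""
    if not 0.0 <= idleness <= 1.0:
        raise ValueError("idleness must be in [0, 1]")
    active = p_dynamic(f_hz, cpi) + p_static(temp_c, vdd)
    return active * (1.0 - idleness) + idleness * P_IDLE


print(core_power(2.0e9, 1.0, 50.0, 1.0, 0.0))
print(core_power(2.0e9, 1.0, 50.0, 1.0, 0.5))
print(core_power(2.0e9, 1.0, 50.0, 1.0, 1.0))
```

The idleness term simply interpolates between the fully active power and the idle power, so a halted core contributes PIDLE regardless of its frequency setting.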

Power Model

Power model interface (simulation snapshot, Ruby & Simics)

[Diagram: CPU1, CPU2 … CPU N with L1 caches, L2, network and DRAM; per-core frequencies f1 = k1 * fnom, f2 = k2 * fnom, … fN = kN * fnom.]

Inputs to the POWER MODEL, together with f, Vdd and T:
• i-th CPU: # instructions retired, # stall cycles, # HLT instructions
• i-th L1: # line & WD reads, # line & WD writes
• i-th L2: # line reads, # line writes
• DRAM: # burst reads, # burst writes

Output: POWER & ENERGY


Modeling Real Platform – Power

$$P_D = k_A \cdot V_{DD}^2 \cdot f_{CK} + k_B + (k_C + k_D \cdot f_{CK}) \cdot CPI^{k_E}$$

Real power measurement
• Intel server system S7000FC4UR
  • 16 cores: 4 quad-core Intel® Xeon® X7350, 2.93 GHz
  • 16 GB FBDIMMs
  • Intel® Core™ 2 Duo architecture
• At-the-wall power consumption
• Test:
  • set of synthetic benchmarks with different memory access patterns
  • forcing all the cores to run at different performance levels
  • for each benchmark we extract the clocks per instruction metric (CPI) and correlate it with the power consumption
• We relate the static power to the operating point by using an analytical model

High accuracy at high and low CPI

Virtual Platform

[Diagram: the virtual platform (SIMICS, RUBY, performance counter module, DVFS module, POWER MODEL with PCORE, PL1, PL2) extended with a TEMPERATURE MODEL module that returns TCPU.]

Temperature model module:
• We integrate our virtual platform with a thermal simulator [1]
• Input: power dissipated by the main functional units composing the target platform
• Output: temperature distribution along the simulated multicore die area

[1] G. Paci et al. Exploring “temperature-aware” design in low-power MPSoCs

Thermal Model

Methods to solve temperature

[Diagram: the POWER MODEL (POWER & ENERGY) provides the power of each block (PCPU1, PCPU2 … PCPUn, PL1, L2, network) to the thermal model, which returns the temperatures Ti. The floorplan (CPU1, CPU2 … CPUn with L1 caches, L2) is discretized into silicon (si) cells on top of copper (cu) cells. Package cross-section: IC die, heat spreader, IC package, package pin, PCB.]

Modeling Real Platform – Thermal

• Thermal model calibration:
  • Derived from Intel® Core™ 2 Duo layout
  • We calibrate the model parameters to simulate the real HW transient
  • High accuracy (error < 1%) and same transient behavior

Virtual Platform Performance

• Host:
  • Intel® Core™ 2 Duo
  • 2.4 GHz
  • 2 GB RAM
• Target:
  • 4-core Pentium® 4
  • 2 GB RAM
  • 32 KB private L1 cache
  • 4 MB shared L2 cache
  • Linux OS

Simulation time for 1 billion instructions:
Simics + Ruby: Tsim = 1040 s
Simics + Ruby + DVFS: Tsim = 1045 s
Simics + Ruby + DVFS + Power: Tsim = 1110 s (+7%)
Simics + Ruby + DVFS + Power + Thermal interface: Tsim = 1160 s
Simics + Ruby + DVFS + Power + Thermal Model: Tsim = 1240 s (+19.2%)

Thermal model: 68 cells, T = 100ns, computed every 13us

Mathworks Matlab/Simulink

• Numerical computing environment developed to design, implement and test numerical algorithms
• Mathworks Simulink, for simulation of dynamic systems: simplifies and speeds up the development cycle of control systems
• Can be called as a computational engine by writing C and Fortran programs that use Mathworks Matlab’s engine library
• Controller design in two steps:
  • developing the control algorithm that optimizes the system performance
  • implementing it in the system

We allow a Mathworks Matlab/Simulink description of the controller to directly drive at run-time the performance knobs of the emulated system

Virtual Platform

[Diagram: the virtual platform (Virtutech Simics, RUBY, performance counter module, POWER MODEL, TEMPERATURE MODEL, DVFS module with f, v and PGATING) connected through a MATLAB interface to a Mathworks Matlab controller; the interface passes CPI, P and T to the controller.]

Mathworks Matlab interface:
• New module named Controller in RUBY
• Initialization: starts the Mathworks Matlab engine as a concurrent process
• Every N cycles, wake-up:
  • send the current performance monitor output to the Mathworks Simulink model
  • execute one step of the controller Mathworks Simulink model
  • propagate the Mathworks Simulink controller decision to the DVFS module

CONTROL-STRATEGIES DEVELOPMENT CYCLE
1. Controller design in the Mathworks Matlab/Simulink framework
   • system represented by a simplified model
   • obtained by physical considerations and identification techniques
2. Set of simulation tests and design adjustments done in Simulink
3. Tuned controller evaluation with an accurate model of the plant, done in the virtual platform
4. Performance analysis, by simulating the overall system


Results

Energy Controller (EC)
– Performance loss < 5%
– Energy minimization

Temperature Controller (TC)
– Complexity reduction
  • 2 explicit regions per controller
– Performs as the centralized one
  • Thermal capping

[Figure annotations: <0.3°, <3%, <3%]

Next Steps

• Now working on the embedded implementation
  • Server multicore platform and Intel® SCC
• Explore thermal-aware scheduler solutions
  • co-operating with the presented solution
• Develop distributed + multi-scale solutions for data centers

Thermal-aware task scheduling

[Diagram: Schematic View of Thermal Management in a datacenter, with numbered steps 1 to 6. Elements: onsite survey; collecting environmental data and load information from sensors; sensor data database (history sensor data, current sensor data); CFD simulation software; abstract heat model; correlation of load & power (map load to power consumption); cost analysis; other impact factors; policy (scheduling policy, control policy); controller; scheduler; incoming task.]


Thank you!