Power and Temperature-Aware


  • 8/7/2019 Power and Temperature-Aware

    1/43

    Power and Temperature-Aware Clock Frequency and Thread Assignment in Multi-layer MPSoC

    Kyungsu Kang*, Sungjoo Yoo** and Chong-Min Kyung*

    * KAIST, ** POSTECH

  • 8/7/2019 Power and Temperature-Aware

    2/43

    Contents

    Introduction
    Challenges in multi-layer MPSoC
    Problem definition and preliminaries
    Related works
    Motivational example
    Proposed method
    Temperature-slack based DVFS
    Thread assignment exploiting memory-boundness
    Experimental results
    Conclusion

    MPSoC Workshop, Gifu, 2010, Kyung

  • 8/7/2019 Power and Temperature-Aware

    3/43

    3D integration of MPSoC

    Merits:
    1. Small footprint
    2. Short wire length
    3. Heterogeneous integration
    4. Wide bandwidth

    Challenges:
    1. Temperature
    2. Yield
    3. CAD support

  • 8/7/2019 Power and Temperature-Aware

    4/43

    Challenges of multi-layer MPSoC

    Temperature-related problems:

    Reliability: $MTTF = A \exp\left(\frac{E_a}{kT}\right)$
    Leakage power: $P_l = A T^2 \exp\left(-\frac{B}{T}\right)$
    Performance
    Cooling cost

    [Figure: cooling cost versus temperature, rising from heat sink to fan to liquid cooling (water, nitrogen, etc.)]

  • 8/7/2019 Power and Temperature-Aware

    5/43

    Objective of the research

    Developing temperature-aware power management methods (i.e., DVFS, thread assignment) to maximize instruction throughput in 3D multi-processor systems.

    [Figure: block diagram with application, operating system, thread assignment, DVFS, performance monitor, temperature monitor, and 3D multi-processor system]

  • 8/7/2019 Power and Temperature-Aware

    6/43

    Challenges

    Different thermal characteristics compared with 2D systems
    Instantaneous (not steady-state) temperature analysis
    Consideration of workload characteristics (e.g., instructions per cycle, memory-boundness)
    Many systems with peak power constraint [ISCA05][Intel]

    [ISCA05] M. Annavaram et al., "Mitigating Amdahl's law through EPI throttling," in Proc. ISCA, June 2005, pp. 298-309.
    [Intel] Intel Turbo Boost White Paper. [Online] Available: http://www.intel.com/technology/turboboost/

  • 8/7/2019 Power and Temperature-Aware

    7/43

  • 8/7/2019 Power and Temperature-Aware

    8/43

    Thermal characteristics of 3D systems: Heat flow

    Heat propagates vertically to the heat sink through the cores in between and is dissipated at the heat sink.

    [Figure: simplified thermal model [Zhu, TCAD08] of Core 1, Core 2 and Core 3 with power sources P1, P2, P3, resistances Rinter, Rintra, Rhs, capacitances C, and ambient temperature Tamb]

    Rhs = 1.22 K/W
    Rinter = 0.15 K/W
    Rintra = 2.44 K/W
    Rintra ≈ 16 Rinter

    C. Zhu et al., "Three-dimensional chip-multiprocessor runtime thermal management," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 8, pp. 1479-1492, Aug. 2008.

  • 8/7/2019 Power and Temperature-Aware

    9/43

    3D thermal characteristics: Thermal coupling

    Thinning of the silicon layers creates strong mutual thermal coupling among vertically adjacent cores.

    Thermal resistance: $R_{th} = \frac{H}{kA}$
    where H: thickness, k: thermal conductivity, A: surface area

    [Figure: cross-section of the heat sink and stacked cores 1 to 4, with thickness (µm) labels < 6,000, ~300 and > 50] (Source: Zhou, TPDS10)

    X. Zhou et al., "Thermal-aware task scheduling for 3D multi-core processors," IEEE Trans. Parallel and Distributed Syst., vol. 21, no. 1, pp. 60-71, Jan. 2010.

  • 8/7/2019 Power and Temperature-Aware

    10/43

    3D thermal characteristics: Layer-dependent cooling efficiency

    Cores near the heat sink have lower temperatures than those far from the heat sink.

    Steady-state temperature:
    $T_2^{ss} = (P_2 + P_3) R_{hs} + T_{amb}$
    $T_3^{ss} = P_3 R_{inter} + T_2^{ss}$
    $T_3^{ss} \ge T_2^{ss}$; equality holds if and only if $P_3 = 0$.

    [Figure: thermal model of Core 2 (hot job) and Core 3 (cool job) with power sources P2 and P3, resistances Rinter and Rhs, capacitances C, and ambient temperature Tamb] (Source: Zhou, TPDS10)

    X. Zhou et al., "Thermal-aware task scheduling for 3D multi-core processors," IEEE Trans. Parallel and Distributed Syst., vol. 21, no. 1, pp. 60-71, Jan. 2010.
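    The two steady-state equations of this slide can be evaluated directly. The sketch below is only an illustration: the resistance values are the ones quoted for the simplified thermal model, while the ambient temperature of 45 °C and the power figures are assumptions chosen for the example, and the function name is hypothetical.

```python
def stacked_steady_state(p2, p3, r_hs=1.22, r_inter=0.15, t_amb=45.0):
    """Steady-state temperatures of Core 2 (next to the heat sink) and Core 3.

    r_hs and r_inter are in K/W; t_amb (deg C) is an assumed value.
    """
    t2 = (p2 + p3) * r_hs + t_amb   # all heat leaves through the heat sink
    t3 = p3 * r_inter + t2          # Core 3 is one inter-layer resistance farther away
    return t2, t3


# Same total power, different placement of the hot (20 W) and cool (5 W) job.
hot_near_sink = stacked_steady_state(p2=20.0, p3=5.0)
hot_far_from_sink = stacked_steady_state(p2=5.0, p3=20.0)
print(hot_near_sink, hot_far_from_sink)
```

    Under these assumed numbers the temperature of Core 2 is the same in both placements, while Core 3 runs cooler when the hot job sits next to the heat sink.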

  • 8/7/2019 Power and Temperature-Aware

    11/43

    Instantaneous vs. steady-state temperature

    High-level temperature model [Liao, TCAD05]:

    $T(t) = (T_{init} - PR - T_{amb})\,e^{-t/RC} + PR + T_{amb} = (T_{init} - T_{ss})\,e^{-t/RC} + T_{ss}$

    where
    P is the power consumption of a core,
    R and C are the thermal resistance and capacitance of a core, respectively,
    T_init (T_ss) is the initial (steady-state) temperature of a core, with $T_{ss} = PR + T_{amb}$,
    and the gap between T_ss and T_init is the driving temperature force.

    Instantaneous temperature needs to be considered because the thermal time constant, several hundred milliseconds, is much longer than the DVFS time step [Skadron, TACO04].

    W. Liao et al., "Temperature and supply voltage aware performance and power modeling at microarchitecture level," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 7, pp. 1042-1053, July 2005.
    K. Skadron et al., "Temperature-aware microarchitecture: modeling and implementation," ACM Trans. Architecture and Code Optimization, vol. 1, pp. 94-125, Mar. 2004.
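    A minimal sketch of the exponential temperature model on this slide, useful for seeing how little of the gap to steady state is closed in one DVFS step. All numeric values (20 W, 1.22 K/W, a 300 ms time constant, 45 °C ambient, 5 ms step) are assumptions for illustration, and the function name is hypothetical.

```python
import math


def core_temperature(t, power, r, c, t_init, t_amb):
    """Temperature after t seconds of constant power, starting from t_init."""
    t_ss = power * r + t_amb                        # steady-state temperature
    return (t_init - t_ss) * math.exp(-t / (r * c)) + t_ss


R = 1.22            # K/W, assumed
C = 0.3 / R         # J/K, chosen so that the time constant R*C is 300 ms
after_step = core_temperature(0.005, power=20.0, r=R, c=C, t_init=50.0, t_amb=45.0)
steady = 20.0 * R + 45.0
print(after_step, steady)
```

    With these assumed values the core warms by only a fraction of a degree in 5 ms even though its steady-state temperature is far higher, which is the headroom that an instantaneous-temperature analysis can use.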

  • 8/7/2019 Power and Temperature-Aware

    12/43

    Effect of memory-boundness in DVFS

    Execution time of an application:
    $t_{ex}(f_{core}) = \frac{w_{comp}}{f_{core}} + t_{stall}$

    w_comp: computation workload; f_core: clock frequency of the core;
    t_stall: stall time spent by the core on external memory accesses

    Speedup (SU):
    $SU = \frac{t_{ex}(f_{core}^{ref} = 1.0\ \mathrm{GHz})}{t_{ex}(f_{core}^{new})}$

    [Figure: speedup (SU) versus $f_{core}^{new}$ (GHz) for three programs in SPEC2000; low SU is due to high memory-boundness]
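    The execution-time model and the speedup definition translate into a few lines of code. This is a sketch with made-up workloads (expressed in giga-cycles so that dividing by GHz yields seconds); the function names are hypothetical.

```python
def execution_time(w_comp, f_core, t_stall):
    """t_ex = w_comp / f_core + t_stall (w_comp in giga-cycles, f_core in GHz)."""
    return w_comp / f_core + t_stall


def speedup(w_comp, t_stall, f_new, f_ref=1.0):
    """Speedup of running at f_new relative to the 1.0 GHz reference frequency."""
    return execution_time(w_comp, f_ref, t_stall) / execution_time(w_comp, f_new, t_stall)


compute_bound = speedup(w_comp=1.0, t_stall=0.0, f_new=2.0)
memory_bound = speedup(w_comp=0.2, t_stall=0.8, f_new=2.0)
print(compute_bound, memory_bound)
```

    A thread with no stall time doubles its speed when the clock doubles, whereas the assumed memory-bound thread gains only about 11 %, so raising its frequency buys little throughput for the extra power.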

  • 8/7/2019 Power and Temperature-Aware

    13/43

    Related works

    Previous works | Constraint | Temperature analysis | Memory-boundness | Solution type

    2D integration
    [Coskun, TVLSI08] [Zhang, ICCAD08] | Temperature | Instantaneous temperature | Not handled | Design-time
    [Murali, CODES+ISSS07] | Temperature and power | Instantaneous temperature | Not handled | Design-time
    [Annavaram, ISCA05] [Isci, Int. Symp. Microarchitecture06] [Bergamaschi, ASPDAC08] | Power | Steady-state temperature | Handled | Runtime
    [Donald, ISCA06] | Temperature | Instantaneous temperature | Handled | Runtime

    3D integration
    [Zhao, APCCAS08] | Temperature and power | Instantaneous temperature | Not handled | Design-time
    [Zhu, TCAD08] | Temperature | Steady-state temperature | Not handled | Runtime
    Proposed | Temperature and power | Instantaneous temperature | Handled | Runtime

  • 8/7/2019 Power and Temperature-Aware

    14/43

    Motivational Example

    Preliminaries

    Threads 1 and 2 are assigned to cores 1 and 2, respectively.
    Each core has four frequency levels (i.e., 1.00, 1.33, 1.66, 2.00 GHz).
    Constraints: Pmax = 52 W, Tmax = 63 °C

    [Figure: (a) an example platform with a heat sink, Core 1 and Core 2; (b) threads to be assigned: Thread 1 (low memory-boundness) and Thread 2 (high memory-boundness)]

  • 8/7/2019 Power and Temperature-Aware

    15/43

    Motivational Example

    (Effect of considering instantaneous temperature)

    [Figure: steady-state temperature-based DVFS [Zhu, TCAD08] compared with temperature slack-based (instantaneous temperature-based) DVFS for threads 1 and 2]

    8.8 % IPS improvement while keeping the temperature constraint

    C. Zhu et al., "Three-dimensional chip-multiprocessor runtime thermal management," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 8, pp. 1479-1492, Aug. 2008.

  • 8/7/2019 Power and Temperature-Aware

    16/43

    Motivational Example

    (Effect of task assignment)

    [Figure: temperature slack-based DVFS compared with temperature slack-based DVFS with a different thread assignment of threads 1 and 2]

    11.1 % IPS improvement while keeping the temperature constraint

  • 8/7/2019 Power and Temperature-Aware

    17/43

    Temperature slack-based DVFS: Two-step approach

    Power budgeting based on per-core temperature slack
    Optimal frequency assignment based on the assigned per-core power budget

    Example of two-step approach (power budget Pmax = 200 W):

    Core | Temperature | Power budget (power budgeting among cores) | Optimal frequency assignment for each core
    Core 1 | 60 °C | 22 W | 1.6 GHz @ 21.6 W
    Core 2 | 57 °C | 25 W | 1.6 GHz @ 21.6 W
    Core N | 52 °C | 30 W | 2.0 GHz @ 27.7 W

  • 8/7/2019 Power and Temperature-Aware

    18/43

    Terminology

    Core-set: cores in the same horizontal position
    Top core: the core farthest from the heat sink within a core-set

    [Figure: core layers and heat sink with axes x, y, z; cores indexed (i, j, k) up to (X, Y, Z); example core-set (4, 3), whose top core (4, 3) = core (4, 3, Z)]

    N = X × Y × Z

  • 8/7/2019 Power and Temperature-Aware

    19/43

    Temperature slack-based DVFS

    Two-step approach

    Power budgeting among core-sets based on the temperature slack of the top core in each core-set
    Optimal frequency assignment of cores within a core-set based on the assigned power budget of the core-set

    Example of two-step approach (power budget Pmax = 200 W):

    Power budgeting among core-sets: Core-set 1 (top core @ 52 °C) receives 60 W; Core-set XY (top core @ 60 °C) receives 30 W

    [Figure: optimal frequency assignment within Core-set 1 (Core 1, Core 2, ..., Core Z), showing operating points 1.6 GHz @ 21.6 W, 2.0 GHz @ 27.7 W and 1.0 GHz @ 11.3 W]

  • 8/7/2019 Power and Temperature-Aware

    20/43

    Reducing complexity in power budgeting

    [Figure: power budgeting among cores (i, j, k) in the (X, Y, Z) grid is reduced to power budgeting among core-sets (i, j) in the (X, Y) grid]

    Find the power budget of each core-set ($P^{\text{core-set}}_{i,j}$)
    such that total IPS (instructions per second) is maximized
    subject to $\forall i, j:\ T_{i,j,Z}(t) \le T_{max}$ and $\sum_{i=1}^{X} \sum_{j=1}^{Y} P^{\text{core-set}}_{i,j} \le P_{max}$

  • 8/7/2019 Power and Temperature-Aware

    21/43

    Power budgeting among core-sets

    Power budgeting among core-sets takes the form of assigning a steady-state temperature to each core-set.

    Theorem: Performance is maximized when power is assigned to each core-set such that Eqn. (x) is satisfied.

    Eqn. (x): $\forall i, j:\ \frac{T_{max} - T_{i,j,Z}(t)}{T^{ss}_{i,j,Z} - T_{i,j,Z}(t)} = C$

    The numerator is the temperature slack of the core-set and the denominator is the temperature driving force of the core-set.

    $T^{ss}_{i,j,Z} = R^{\text{core-set}}_{i,j} P^{\text{core-set}}_{i,j} + T_{amb}$

    where $T_{i,j,Z}(t)$ is the temperature of top core (i, j) at time t,
    $T^{ss}_{i,j,Z}$ is the steady-state temperature of top core (i, j),
    C is a constant, and
    $R^{\text{core-set}}_{i,j}$ is the thermal resistance of core-set (i, j).

  • 8/7/2019 Power and Temperature-Aware

    22/43

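    Proof of power budgeting theorem (1)

    $\frac{T_{max} - T_i(t)}{T_i^{ss} - T_i(t)} = C \;\Rightarrow\; \frac{T_{max} - T_i(t)}{dT_i(t)/dt} = C'$

    because $\frac{dT_i(t)}{dt} = \frac{1}{RC}\,(T_i^{ss} - T_i^{init})\,e^{-t/RC} \propto T_i^{ss} - T_i^{init}$.

    $\frac{T_{max} - T_i(t)}{dT_i(t)/dt} = C'$ represents the time each core needs to completely close the temperature slack remaining at time t.

    Because $T_i^{ss} = R_i P_i + T_{amb}$:

    $\frac{T_{max} - T_i(t)}{dT_i(t)/dt} = C' \;\Rightarrow\; \frac{T_{max} - T_i(t)}{(R_i P_i(t) + T_{amb} - T_i^{init})} = C' \;\Rightarrow\; \frac{T_{max} - T_i(t)}{K P_i(t) + L} = C'$

    $dT_i(t)/dt$ increases linearly as the power ($P_i$) increases.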

  • 8/7/2019 Power and Temperature-Aware

    23/43

    Proof of power budgeting theorem (2)

    [Figure: power versus time for Core 1 and Core 2.
    (a) Set the power budget of each core such that both cores completely close the temperature slack at the same time C′, with powers $P_1^{C'}$ and $P_2^{C'}$.
    (b) Set the power budget of each core such that Core 1 closes its temperature slack earlier than Core 2 (at time t1), with total power limited by Pmax.]

    tPtCPtCPw CC +++= max`

    2

    `

    1 )`()`(

    Executed workload (w) of (b): (we assume that )fP

    tPtCPPtCP CC +++= max`

    1max

    `

    1 )`()`(


  • 8/7/2019 Power and Temperature-Aware

    24/43

    Proof of power budgeting theorem (3)

    Partial derivatives of executed workload (w) of (b):

    0)( max`

    1max

    `

    1

  • 8/7/2019 Power and Temperature-Aware

    25/43

    Power budgeting among core-sets

    Constraints in power budgeting:
    (1) $\forall i, j:\ T_{i,j,Z}(t + \Delta t) \le T_{max}$
    (2) $\sum_{i=1}^{X} \sum_{j=1}^{Y} P^{\text{core-set}}_{i,j} \le P_{max}$

    where $P^{\text{core-set}}_{i,j}$ is the power budget of core-set (i, j),
    $\Delta t$ is the time interval for DVFS, and
    $T_{i,j,Z}(t + \Delta t) = (T_{i,j,Z}(t) - T^{ss}_{i,j,Z})\,e^{-\Delta t/RC} + T^{ss}_{i,j,Z}$

    Binary search is used to find the smallest C, as defined below, such that the two constraints (1) and (2) are both satisfied.

    $\frac{T_{max} - T_i(t)}{T_i^{ss} - T_i(t)} = C$

    $T_{i,j,Z}(t + \Delta t)$ and $P^{\text{core-set}}_{i,j}$ are non-decreasing as C decreases.
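    The binary search described on this slide can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: for a candidate C, the ratio equation is solved for each core-set's steady-state temperature, the power budget follows from $T^{ss} = R P + T_{amb}$, and constraints (1) and (2) are checked. Function names, the upper search bound `c_hi`, and all numeric inputs are assumptions.

```python
import math


def evaluate(c, temps, resistances, t_max, t_amb, dt, rc):
    """Power budgets and predicted top-core temperatures for a candidate C."""
    decay = math.exp(-dt / rc)
    budgets, predicted = [], []
    for temp, r in zip(temps, resistances):
        t_ss = temp + (t_max - temp) / c                # ratio equation solved for T_ss
        budgets.append((t_ss - t_amb) / r)              # T_ss = R * P + T_amb solved for P
        predicted.append((temp - t_ss) * decay + t_ss)  # temperature after one DVFS interval
    return budgets, predicted


def is_feasible(c, temps, resistances, t_max, t_amb, p_max, dt, rc):
    budgets, predicted = evaluate(c, temps, resistances, t_max, t_amb, dt, rc)
    return max(predicted) <= t_max and sum(budgets) <= p_max


def smallest_c(temps, resistances, t_max, t_amb, p_max, dt, rc, c_hi=1e6, steps=60):
    """Bisection for the smallest C that satisfies constraints (1) and (2)."""
    args = (temps, resistances, t_max, t_amb, p_max, dt, rc)
    if not is_feasible(c_hi, *args):
        raise ValueError("constraints cannot be met even with a very large C")
    c_lo = 0.0  # infeasible limit: budgets grow without bound as C approaches 0
    for _ in range(steps):
        mid = 0.5 * (c_lo + c_hi)
        if is_feasible(mid, *args):
            c_hi = mid
        else:
            c_lo = mid
    return c_hi


temps = [60.0, 52.0]          # current top-core temperatures (assumed)
resistances = [1.5, 1.5]      # core-set thermal resistances in K/W (assumed)
c = smallest_c(temps, resistances, t_max=70.0, t_amb=45.0, p_max=40.0, dt=0.005, rc=0.3)
budgets, _ = evaluate(c, temps, resistances, 70.0, 45.0, 0.005, 0.3)
print(c, budgets)
```

    In this assumed two-core-set example the power constraint is the binding one, and the cooler core-set (larger temperature slack) ends up with the larger share of the 40 W budget.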

  • 8/7/2019 Power and Temperature-Aware

    26/43

    Frequency assignment in a core-set

    Theorem: The following relation among frequencies within a core-set must hold to maximize the instruction throughput:

    $\forall k:\ R_k \frac{dP(f_k)}{df_k} = M$, where M is a constant,

    subject to $\sum_{k=1}^{Z} R_k P(f_k) + T_{amb} = T_Z^{ss}$

    $P(f_k)$: power consumption of a core running at $f_k$
    $R_k = \sum_{l=1}^{k} r_l$

    [Figure: one core per layer, Layer 1 to Layer Z, with layer thermal resistances r1, r2, ..., rZ and ambient temperature Tamb]

    Implication of the equation: With the same total amount of workload executed for all layers, the core on the layer farther from the heat sink, with larger $R_k$, is assigned a lower clock frequency (due to the upward concavity of $dP(f_k)/df_k$).

  • 8/7/2019 Power and Temperature-Aware

    27/43

    Proof of frequency assignment (1)

    Find $f_k$ for all k = 1, 2, ..., Z,
    where $f_k$ is the clock frequency of the core located on layer k,
    such that $f_{total} = \sum_{k=1}^{Z} f_k$ is maximized (for maximum performance)
    subject to $\sum_{k=1}^{Z} R_k P(f_k) + T_{amb} = T_Z^{ss}$ (for maximum frequency)

  • 8/7/2019 Power and Temperature-Aware

    28/43

    Proof of frequency assignment (2)

    Lagrange function:

    $L(f_1, f_2, \ldots, f_Z, \lambda) = \sum_{k=1}^{Z} f_k + \lambda \left( \sum_{k=1}^{Z} R_k P(f_k) + T_{amb} - T_Z^{ss} \right)$

    The first sum is the objective and the term in parentheses is the constraint; $\lambda$ is a Lagrange multiplier.

    Partial derivatives of the Lagrange function must be set to zero to maximize the objective:

    $\frac{\partial L(f_1, f_2, \ldots, f_Z, \lambda)}{\partial f_k} = 1 + \lambda R_k \frac{dP(f_k)}{df_k} = 0$

    Therefore, the following relation among frequencies within a core-set must hold:

    $\forall k:\ R_k \frac{dP(f_k)}{df_k} = M$, where M is a constant.

  • 8/7/2019 Power and Temperature-Aware

    29/43

    Frequency assignment in a core-set

    Finding optimal discrete frequency levels

    1. Set $f_1$ to the highest available clock frequency.
    2. Determine the frequencies of the remaining cores from $\forall k:\ R_k \frac{dP(f_k)}{df_k} = M$.
    3. Constraint check: $\sum_{k=1}^{Z} R_k P(f_k) + T_{amb} \le T_Z^{ss}$?
       Yes: done.
       No: set $f_1$ to the next highest available frequency and return to step 2.
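    The flow on this slide can be sketched in code once a power model is fixed. The slides do not give P(f) in closed form, so the sketch assumes a hypothetical cubic model $P(f) = a f^3$; under that assumption the condition $R_k\,dP(f_k)/df_k = M$ gives $f_k = f_1 \sqrt{R_1 / R_k}$, which is then rounded down to an available level (the rounding rule is also an assumption). Function names, the coefficient `a`, and the numeric inputs are illustrative.

```python
import math

LEVELS_GHZ = (1.00, 1.33, 1.66, 2.00)   # the four levels of the motivational example


def power(f_ghz, a=3.5):
    """Hypothetical cubic power model in watts; not taken from the slides."""
    return a * f_ghz ** 3


def assign_frequencies(layer_r, t_ss_target, t_amb, levels=LEVELS_GHZ):
    """Try f_1 from the highest level downward until the constraint check passes."""
    cumulative, total = [], 0.0
    for r in layer_r:                    # R_k = r_1 + ... + r_k
        total += r
        cumulative.append(total)
    for f1 in sorted(levels, reverse=True):
        freqs = []
        for r_k in cumulative:
            ideal = f1 * math.sqrt(cumulative[0] / r_k)   # from R_k * dP/df_k = M
            allowed = [f for f in levels if f <= ideal + 1e-12]
            freqs.append(max(allowed) if allowed else min(levels))
        t_top = sum(r_k * power(f) for r_k, f in zip(cumulative, freqs)) + t_amb
        if t_top <= t_ss_target:         # constraint check
            return freqs
    return None                          # no level of f_1 satisfies the constraint


print(assign_frequencies([1.22, 0.15, 0.15], t_ss_target=70.0, t_amb=45.0))
```

    With these assumed inputs the search settles on 1.33 GHz for the layer next to the heat sink and 1.00 GHz for the two layers farther away, in line with the implication that cores with larger $R_k$ receive lower frequencies.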

  • 8/7/2019 Power and Temperature-Aware

    30/43

    Temperature-aware thread assignment

    Two objectives

    Balancing temperatures among cores

    Maximizing instruction throughput performance (i.e., total IPS)

    [Figure: how to assign Thread 1, Thread 2, ..., Thread N to Core 1 @ 60 °C, Core 2 @ 55 °C, ..., Core N @ 58 °C?]

    Points to consider:
    1) IPC (instructions per cycle)
    2) Memory-boundness

  • 8/7/2019 Power and Temperature-Aware

    31/43

    Temperature-aware thread assignment: Two-step approach

    Thread-set assignment among core-sets to balance temperatures among core-sets
    Thread assignment within a core-set to maximize IPS

    Example of two-step approach (step 1):

    [Figure: threads 1, 2, 3, ..., N are grouped into thread-sets 1, 2, ..., XY of Z threads each; which thread-set goes to which core-set (i, j) in the (X, Y) grid on the heat sink?]

  • 8/7/2019 Power and Temperature-Aware

    32/43

    Thread-set assignment among core-sets

    Objective of thread-set formation

    Balancing IPC sums among thread-sets

    Procedure of thread-set formation

    1) Assume XY empty sets, each of which can store at most Z threads.
    2) Sort all threads in descending order of IPC.
    3) Put the unassigned thread with the highest IPC into the set with the lowest sum of IPCs.
    4) Repeat 3) until all threads are assigned to one of the XY sets.

  • 8/7/2019 Power and Temperature-Aware

    33/43

    Thread-set assignment among core-sets

    Example of thread-set formation: forming three thread-sets when Z = 2

    Threads sorted by IPC: 0.9, 0.8, 0.6, 0.4, 0.3, 0.3

    Thread-set with IPC 0.9 and IPC 0.3: IPC sum 1.2
    Thread-set with IPC 0.8 and IPC 0.3: IPC sum 1.1
    Thread-set with IPC 0.6 and IPC 0.4: IPC sum 1.0
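    The thread-set formation procedure is a greedy balancing pass and can be sketched in a few lines. The function name is hypothetical, and skipping sets that already hold Z threads is an assumption needed to respect the capacity limit; the input reproduces the six-thread example above.

```python
def form_thread_sets(ipcs, num_sets, capacity):
    """Greedily place threads, highest IPC first, into the set with the lowest IPC sum."""
    sets = [[] for _ in range(num_sets)]
    for ipc in sorted(ipcs, reverse=True):
        open_sets = [s for s in sets if len(s) < capacity]   # assumed: full sets are skipped
        if not open_sets:
            raise ValueError("more threads than num_sets * capacity")
        min(open_sets, key=sum).append(ipc)
    return sets


thread_sets = form_thread_sets([0.9, 0.8, 0.6, 0.4, 0.3, 0.3], num_sets=3, capacity=2)
print(thread_sets)
```

    For the example input this yields the same three pairs as the slide: (0.9, 0.3), (0.8, 0.3) and (0.6, 0.4).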

  • 8/7/2019 Power and Temperature-Aware

    34/43

    Thread-set assignment among core-sets

    Procedure

    1. Sort all thread-sets in ascending order of IPC sum.
    2. Assign the thread-set with the lowest IPC sum to the core-set with the highest top-core temperature.
    3. Repeat 2 until all the thread-sets are assigned.

    Example of IPC sum: a thread-set holding threads with IPC 0.9 and IPC 0.3 has IPC sum 0.9 + 0.3 = 1.2

    Relation between IPC and switching power: $P_s(V_{dd}, f) = C_s V_{dd}^2 f \cdot IPC$
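    The matching step above pairs the lightest thread-set with the hottest core-set. A minimal sketch, assuming core-sets are identified by hypothetical (i, j) keys with assumed top-core temperatures:

```python
def assign_thread_sets(thread_sets, top_core_temps):
    """Map each core-set to a thread-set: lowest IPC sum goes to the hottest top core."""
    by_ipc_sum = sorted(thread_sets, key=sum)                     # ascending IPC sum
    hottest_first = sorted(top_core_temps, key=top_core_temps.get, reverse=True)
    return dict(zip(hottest_first, by_ipc_sum))


thread_sets = [[0.9, 0.3], [0.8, 0.3], [0.6, 0.4]]
top_core_temps = {(1, 1): 60.0, (1, 2): 52.0, (2, 1): 57.0}       # deg C, assumed
print(assign_thread_sets(thread_sets, top_core_temps))
```

    Because switching power grows with IPC, giving the lowest IPC sum to the hottest core-set is what steers the temperatures of the core-sets toward each other.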

  • 8/7/2019 Power and Temperature-Aware

    35/43

    Temperature-aware thread assignment

    Two-step approach

    Thread-set assignment among core-sets to balance temperatures among

    core-sets

    Thread assignment within a core-set to maximize IPS

    Example of two-step approach (step 2):

    [Figure: which thread 1, 2, ..., Z of a thread-set goes to which core 1, 2, ..., Z of the core-set on the heat sink?]

  • 8/7/2019 Power and Temperature-Aware

    36/43

    Thread assignment within a core-set: Procedure

    1. Sort threads in a core-set in ascending order of SU.
    2. Assign the thread with the lowest SU to the core farthest from the heat sink.
    3. Repeat 2 until all threads in a core-set are assigned.

    [Figure: a thread-set with SU values 1.9, 1.2 and 2.0 to be assigned to cores 1, 2, ..., Z of a core-set on the heat sink; the cores far from the heat sink have high temperature]

    $\frac{T_{max} - T_i(t)}{T_i^{ss} - T_i(t)} = C$

    Higher temperature slack (i.e., lower core temperature) allows the assignment of a larger power budget (i.e., higher voltage/frequency) and, therefore, a task with larger SU.
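    The within-core-set procedure is a sort followed by a layer-by-layer assignment. The sketch assumes layer 1 is adjacent to the heat sink and layer Z is farthest from it; the thread names are hypothetical and the SU values are the ones shown in the figure.

```python
def assign_within_core_set(thread_su, num_layers):
    """Map layer index to thread: the lowest-SU thread goes to the farthest layer (Z)."""
    ascending_su = sorted(thread_su, key=thread_su.get)
    layers_far_to_near = range(num_layers, 0, -1)
    return dict(zip(layers_far_to_near, ascending_su))


placement = assign_within_core_set({"a": 1.9, "b": 1.2, "c": 2.0}, num_layers=3)
print(placement)
```

    The thread with SU 1.2, which gains least from a higher clock, lands on the hottest layer, and the thread with SU 2.0 lands next to the heat sink where the power budget is largest.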

  • 8/7/2019 Power and Temperature-Aware

    37/43

    Experimental setup

    Simulation environment
    Target processor | Intel 65-nm Merom processor [Sakran, ISSCC07]
    Temperature estimation | 3D grid-based temperature model [Huang, TVLSI06]
    Power estimation | Switching power + temperature-aware leakage power [W. Liao, TCAD05]
    Performance profiling | Performance profiling API (PAPI) on Intel Core2 processor in LW25 laptop

    Thermal characteristics [Coskun, DATE09]
    Layer | Thermal conductance (W/(m·K)) | Heat capacitance (J/(m³·K)) | Thickness (µm)
    Heat sink | 400.0 | 3.55E+6 | 6,900
    Heat spreader | 400.0 | 3.55E+6 | 1,000
    TIM | 4.0 | 4.00E+6 | 20
    Core / L2 cache | 100.0 | 1.75E+6 | 150
    Interlayer | 4.0 | 4.00E+6 | 20

    ETC.
    Frequency range (4 steps) | 1 ~ 2 GHz
    Δt (DVFS time interval) | 5 ms
    DVFS overhead | 10 µs [Lee, Tcomput.10]
    Thread migration overhead | 1 ms [Coskun, DATE09]
    Δl (thread assignment interval) | 100 ms
    Psleep | 2 W [datasheet]
    Maximum temperature | 70 °C
    Maximum power | 200 W

  • 8/7/2019 Power and Temperature-Aware

    38/43

    Experimental setup

    Comparison with existing solutions
    Algorithm | Power budgeting | Thread assignment
    2DI_SST | Steady-state temperature analysis | IPC and 2D floorplan-awareness
    3DI_SST | Steady-state temperature analysis | IPC and 3D floorplan-awareness
    3DIS_SST | Steady-state temperature analysis | IPC, SU and 3D floorplan-awareness
    3DIS_IT (proposed) | Instantaneous temperature analysis | IPC, SU and 3D floorplan-awareness

    Benchmark combinations (SPEC2000)
    Criteria | Contents
    IPC | hipc, lipc, mipc
    Speedup (SU) | hm, lm, mm

    Ex.) hipc-hm: combination of applications with high IPC and high memory-boundness

  • 8/7/2019 Power and Temperature-Aware

    39/43

    Experimental results

    IPS result of each combination of applications:

    [Figure: IPS of 2DI_SST, 3DI_SST, 3DIS_SST and 3DIS_IT (proposed) for each combination of applications]

    1) 2DI_SST → 3DI_SST: 8.0 % IPS improvement
    2) 3DI_SST → 3DIS_SST: 8.5 % IPS improvement
    3) 3DIS_SST → 3DIS_IT: 14.5 % IPS improvement

  • 8/7/2019 Power and Temperature-Aware

    40/43

    Experimental results

    IPS result of each combination of applications:

    1) 2DI_SST → 3DI_SST: 8.0 % IPS improvement
    2) 3DI_SST → 3DIS_SST: 8.5 % IPS improvement
    3) 3DIS_SST → 3DIS_IT: 14.5 % IPS improvement

    Reasons for instruction throughput improvement:
    1) Utilizing the thermal characteristics of the 3D floorplan
    2) Utilizing the memory-boundness of each application
    3) Aggressive power budgeting by exploiting the instantaneous temperature analysis

  • 8/7/2019 Power and Temperature-Aware

    41/43

    Experimental results

    EPI (energy per instruction) result:

    2DI_SST → 3DIS_IT: 3.4 % EPI increase
    3DI_SST → 3DIS_IT: 1.3 % EPI increase

    Reasons for EPI increase:
    The proposed method assigns higher frequencies to cores through aggressive power budgeting (which gives a 24% average instruction throughput improvement).

    Reasonable overhead in energy efficiency

  • 8/7/2019 Power and Temperature-Aware

    42/43

    Experimental results

    Computational time measured on an LG Xnote LW25 laptop running at 2 GHz:

    Method | Measured time | Time interval to invoke
    DVFS | 110 µs | 5 ms
    Thread assignment | 3 µs | 100 ms

    Number of clock frequency changes per second: on average, 61 clock frequency changes during 1 s in 3DIS_IT.

    The overhead of DVFS is negligible.

  • 8/7/2019 Power and Temperature-Aware

    43/43

    Summary

    Temperature-aware power management in multi-layer MPSoC

    Dynamic voltage and frequency scaling based on temperature slack
    Power budgeting among core-sets based on the temperature slack of the top core in each core-set
    Optimal frequency assignment of cores within a core-set based on the assigned power budget of the core-set

    Temperature-aware thread assignment
    Thread-set assignment among core-sets to balance temperatures among core-sets
    Thread assignment within a core-set to maximize IPS

    Experimental results show up to 41% (24% on average) IPS improvement compared with existing methods