Feature

Thermal-Aware 3D Network-On-Chip (3D NoC) Designs: Routing Algorithms and Thermal Managements

Kun-Chih (Jimmy) Chen, Chih-Hao Chao, and An-Yeu (Andy) Wu

Kun-Chih (Jimmy) Chen is with the Department of Electronic Engineering, Feng Chia University, Taichung, 40724, Taiwan, R.O.C. Chih-Hao Chao is with MediaTek Inc., Hsinchu, 30078, Taiwan, R.O.C. An-Yeu (Andy) Wu is with the Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, 106, Taiwan, R.O.C.

IEEE Circuits and Systems Magazine, Fourth Quarter 2015. 1531-636X/15 ©2015 IEEE

Digital Object Identifier 10.1109/MCAS.2015.2484139

Date of publication: 19 November 2015

Abstract

In recent years, the three-dimensional Network-on-Chip (3D NoC) has been proposed to solve the complex on-chip communication issues in multicore systems by using die stacking technology. However, the high integration density of the stacked dies at high operating frequency results in large power density. Furthermore, the unequal thermal conductance of different logic layers leads the 3D NoC to face a much more severe thermal problem than the 2D NoC. These thermal issues may limit the performance gain of 3D integration and lower the reliability of 3D NoC designs. To ensure thermal safety, 3D NoC systems generally require a better cooling method, which can be classified into “technological approaches” and “algorithmic/architectural approaches.” The technological approaches remove internal thermal hotspots efficiently through extra devices but drastically increase fabrication cost. On the other hand, the algorithmic/architectural design approaches use intelligent packet data delivery and temperature control to maximize performance under thermal constraints. Compared with technological approaches, they can control the system temperature at much lower extra circuit/device cost. In this article, we focus on the algorithmic/architectural design approaches and review the modern packet routing algorithms and thermal managements for thermal-aware 3D NoC systems. First, we introduce the thermal challenges of 3D NoC systems and review the encountered design challenges. Then, recently developed techniques to handle the thermal challenges of 3D NoC systems are addressed.

1. Introduction

As technology scales, it is possible to integrate a larger number of Intellectual Properties (IPs). However, efficient data exchange among a large number of nodes becomes a performance hindrance of Chip Multi-Processor (CMP) and Multi-Processor SoC (MPSoC) systems [1]. Fig. 1 shows the trend of the on-chip interconnection. The traditional point-to-point interconnection suffers from the high complexity of wire routing, which leads to large layout area and long transmission delay. The shared bus architecture suffers from limited bandwidth as the number of functional blocks (FBs) increases. The low scalability of these two traditional on-chip connection schemes makes them insufficient to accommodate the communication requirements with predictable performance. In recent years, by viewing the on-chip interconnection as a micro-network, the Network-on-Chip (NoC) has been proposed as a novel and practical solution to integrate a large number of IPs in a single silicon chip [2]. NoC-based interconnection has become popular in both research and commercial multi-/many-core systems, such as Intel’s 80-core Teraflops Research Chip [3], Intel’s Single-Chip Cloud Computer [4], and Arteris’s FlexNoC [5].

In recent years, the emerging Through-Silicon Via (TSV)-based die-stacking three-dimensional (3D) integration technology has provided a new dimension for novel geometric integration of vertically stacked silicon dies [6]. TSV-based 3D IC technology can integrate dies of different technology nodes and/or different components, such as CMOS logic units, memory, and analog sensors, over multiple logic layers in a 3D IC [1]. By combining 3D IC technology with NoC technology, the three-dimensional Network-on-Chip (3D NoC) has the following three advantages over the traditional 2D NoC:

■ Higher IP mapping density in the network: As shown in Fig. 2(a), an 8×8 2D NoC can be stacked into a 4×4×4 3D NoC. The form factor of the chip becomes smaller, so the IP mapping density is higher.

■ Shorter connection length and lower network power consumption: As shown in Fig. 2(b), the original longest physical distance is reduced from $\sqrt{2}L$ to $0.5\sqrt{2}L$. The hop count between the farthest pair of routers is reduced from 15 to 10.

■ Higher network bandwidth and routing flexibility: As shown in Fig. 2(c), in addition to the four planar directions (i.e., N, E, S, and W), two extra directions, Up (U) and Down (D), are available for transmission. Due to the higher path diversity and bandwidth, the network throughput can be improved.

Figure 1. The trend of on-chip interconnection. System evolution: IP level, point-to-point connection (e.g., DCT/IDCT processor); SoC level, bus interconnection (e.g., AMBA-based system); multi-core level, Network-on-Chip (e.g., Intel 80-core for teraflops computation).
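The hop counts quoted for Fig. 2(b) can be reproduced with a short calculation. The sketch below is only illustrative: it assumes a plain mesh with minimal routing and assumes that the article's figures count the routers visited on the longest minimal path (link hops plus one). The function names are hypothetical and do not come from the article.

```python
from math import sqrt

def routers_on_longest_minimal_path(dims):
    """Routers visited between opposite corners of a mesh of the given size.

    Assumption: the count includes both the source and the destination
    router, i.e. it equals the number of link hops plus one.
    """
    link_hops = sum(d - 1 for d in dims)
    return link_hops + 1

def footprint_diagonal(side_x, side_y):
    """Euclidean diagonal of a rectangular die footprint, in units of L."""
    return sqrt(side_x ** 2 + side_y ** 2)

# 8x8 2D mesh on a die of side L versus 4x4x4 3D mesh on dies of side 0.5L.
hops_2d = routers_on_longest_minimal_path((8, 8))
hops_3d = routers_on_longest_minimal_path((4, 4, 4))
diag_2d = footprint_diagonal(1.0, 1.0)
diag_3d = footprint_diagonal(0.5, 0.5)
print(hops_2d, hops_3d, diag_2d, diag_3d)
```

Under these assumptions the two meshes give 15 and 10 routers, and the planar diagonal shrinks by half, which matches the numbers stated in the list.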

With the aforementioned properties, the 3D NoC has been proven capable of achieving lower power and a smaller form factor for high-performance on-chip data transmission [7].

Nevertheless, because of the stacked dies, the high integration density at high operating frequency of the 3D NoC results in larger power density and higher temperature. Furthermore, the heterogeneous thermal conductance of different logic layers (i.e., each layer of the stacked dies) and the longer heat conduction path make 3D NoC-based MPSoC systems suffer from a more severe thermal problem [6][8], as shown in Fig. 3(a). Besides, the varying cooling efficiency at different layers increases the temperature variance and causes the 3D NoC to have more overheated hotspots, as shown in Fig. 3(b). The thermal issue increases the leakage power, which may further increase the temperature and result in thermal runaway of the chip [9][10].

Figure 2. Advantages of 3D NoC compared with 2D NoC. Panels (a) to (c) contrast the 2D and 3D layouts (die side L versus 0.5L) and the 2D router (N, E, S, W ports) with the 3D router (additional U and D ports).

Figure 3. (a) Factors of the 3D thermal problem; (b) the temperature distribution is higher and wider, producing more hotspots [19]. Panel (a) labels: (i) High Cross-Sectional of Each Router, (ii) Longer Heat Conduction Path, (iii) Larger Cross-Sectional Power Density, (iv) Varying Cooling Efficiency. Panel (b) compares the 2D and 3D temperature distributions against the thermal limit $T_L$; the 3D distribution has more overheated hotspots in the thermally unsafe region.

To ensure thermal safety, the CMP generally requires a better cooling solution, which can be classified into “technological approaches” and “algorithmic/architectural approaches.” Among the technological approaches, two are popular: 1) Microchannel Fluid Cooling (MFC), also called Microfluidic Channel Cooling [11], and 2) Thermal TSV (TTSV) [12]. MFC removes internal thermal hotspots efficiently. However, it requires extra devices to pressurize and drain the water, which brings reliability issues and extra fabrication cost. On the other hand, a TTSV is bigger than a signal/power TSV, and the required number of TTSVs may be large. Hence, the use of TTSVs causes new wire routing problems in 3D SoCs/CMPs, which limits the performance improvement of 3D IC technology. Besides, the packaging and fabrication cost of these technology-based approaches grows drastically, which is another issue.

On the other hand, the algorithmic/architectural design approaches use intelligent packet delivery, the so-called packet routing, and thermal management to maximize performance under thermal constraints [13]. Compared with technological approaches, they can control the system temperature with lower circuit/device overhead. In this article, we focus on the algorithmic/architectural design approaches. First, we introduce the thermal challenges of 3D NoC systems and review the encountered design issues. Then, the recently developed techniques to handle the thermal challenges of 3D NoC are addressed.

2. Design Issues of Thermal-Aware 3D Network-on-Chip

To solve the thermal issue of 3D NoC systems, the design goal is to turn a thermally unsafe system into a thermally safe one, as shown in Fig. 4. The design methodology of thermal-aware 3D NoC can be categorized into two different control strategies, which are shown in Fig. 5:

■ Dynamic Thermal Management (DTM): It is further classified into Reactive DTM (RDTM) and Proactive DTM (PDTM), as shown in Fig. 5(a)(b), which will be introduced in Sections 3 and 5, respectively. The key difference between the two control policies is whether the temperature control is triggered in advance.

■ Intelligent Packet Routing: Based on the routing strategy, it can be separated into reactive routing and proactive routing, as shown in Fig. 5(c)(d); a detailed introduction is presented in Section 4. Reactive routing does not detour a packet until the packet reaches a node near an inactive throttled node. In contrast, proactive routing detours the packet away from those inactive nodes in advance, based on the topology information. Note that an NoC node contains one router, one processing element, and one memory.

Figure 4. The design goal of thermal-aware design is to turn (a) a thermally unsafe system, which overheats above $T_{limit}$ and crashes, into (b) a thermally safe system. Both panels plot temperature and throughput over time.

2.1. Design Issue of Dynamic Thermal Management Techniques

Dynamic Thermal Management (DTM) is required to keep the temperature below the thermal limit. As shown in Fig. 6, a node of the 3D NoC starts from the ambient temperature $T_A$ and heats up toward its steady-state temperature $T_{SS}$, which is usually higher than the thermal limit $T_L$. A thermal sensor senses the temperature of each NoC node and reports it to the temperature-aware controller. When the temperature rises above the trigger level $T_T$, the DTM controller starts temperature control to maintain thermal safety. When the temperature falls below the trigger level, the DTM controller stops temperature control, and the node returns to normal operation with its full bandwidth restored.
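The trigger behavior described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the thermal model used in the article: it assumes a first-order model in which the node temperature moves a fixed fraction toward a target each step, and all parameter values and names (`simulate_dtm`, `alpha`, and so on) are hypothetical.

```python
def simulate_dtm(t_ambient=45.0, t_steady=120.0, t_trigger=95.0,
                 alpha=0.05, steps=400, use_dtm=True):
    """Return a temperature trace for one node.

    Assumed model: each step the temperature moves a fraction `alpha`
    toward a target. The target is the steady-state temperature while the
    node runs at full bandwidth and the ambient temperature while the node
    is throttled.
    """
    temperature = t_ambient
    throttled = False
    trace = []
    for _ in range(steps):
        if use_dtm:
            # Temperature control is active at or above the trigger level
            # and is released once the temperature falls below it.
            throttled = temperature >= t_trigger
        target = t_ambient if throttled else t_steady
        temperature += alpha * (target - temperature)
        trace.append(temperature)
    return trace

without_dtm = simulate_dtm(use_dtm=False)
with_dtm = simulate_dtm(use_dtm=True)
print(round(max(without_dtm), 1), round(max(with_dtm), 1))
```

With these hypothetical numbers, the trace without DTM approaches the steady-state temperature, while the trace with DTM oscillates around the trigger level and never overshoots it by more than one heating step.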

As shown in Fig. 5(a), the reactive DTM (RDTM) is triggered reactively when the system temperature reaches the alarming level. Because of this pessimistic reaction to the alarming level of system temperature, conventional reactive DTMs usually employ a full throttling scheme (i.e., clock gating). However, this kind of temperature control policy results in a significant performance impact [9][13][15]. On the other hand, the proactive DTM (PDTM) predicts the future system temperature and controls the temperature at an early stage, as shown in Fig. 5(b). Although the PDTM can mitigate the performance impact caused by RDTMs, the control scheme of PDTM has been shown to be an NP-hard problem [20], which results in high computational complexity.

Because the fully throttled nodes cannot transmit any packets until their temperature becomes thermally safe, RDTMs and PDTMs both lead to time-varying topology changes [13]. Traditional packet routing algorithms cannot handle packet delivery under such time-varying topology changes, which results in blocked packets in the network and performance degradation. In summary, the design concept of dynamic thermal management is to regulate the system temperature with minimal performance overhead. In this paper, we introduce two different kinds of design methodology: 1) throttle-based design and 2) migration-based design, which will be described later.

Figure 5. Algorithmic/architectural design methodology of general thermal-aware 3D NoC design: (a) reactive dynamic thermal management (Section 3); (b) proactive dynamic thermal management (Section 5); (c) reactive packet routing and (d) proactive packet routing, the routing algorithms that support dynamic thermal management (Section 4).

2.2. Design Issue of Efficient Packet Routing Techniques in Throttled 3D NoC

The network topology of the 3D NoC system changes during runtime operation because the overheated nodes are throttled. Such a time-varying topology can be defined as a Non-Stationary Irregular Mesh (NSI-Mesh), as shown in Fig. 7. The main problem of the NSI-Mesh is that traditional routing algorithms cannot guarantee successful packet delivery in such a topology.

To ensure successful packet delivery in the NSI-Mesh, many routing algorithms have been proposed, which can be classified into reactive routing and proactive routing, as shown in Fig. 5(c)(d). Reactive routing algorithms use spatial information to make packets detour around the inactive nodes through the symmetric structure provided by the NoC [9][23], as shown in Fig. 5(c). In [23], fault-tolerant (FT) routing is applied to solve the routing problem in the NSI-Mesh. The packets are transmitted around the inactive nodes to ensure successful packet delivery. However, applying FT routing directly results in an unbalanced traffic distribution in the NSI-Mesh [25]. For 3D NoC systems, downward routing was proposed, which passes through the bottom logic layer (i.e., the logic layer closest to the heat sink) to detour around the throttled nodes [9][19]. However, downward routing leads to an unbalanced traffic distribution across the vertical logic layers. As the temperature rises, there are more throttled nodes and the traffic congestion becomes more serious, which results in drastic performance degradation. Therefore, the design challenge of reactive routing algorithms is to reduce the traffic congestion in order to mitigate the performance impact.
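The downward-routing idea can be sketched as a path construction. This is a simplified illustration and not the implementation of [9][19]: it assumes that layer index z = 0 is the bottom layer next to the heat sink, that the bottom layer is never throttled, and that XY ordering is used on the bottom layer. The names `downward_route` and `is_blocked` are hypothetical.

```python
def downward_route(src, dst):
    """Return the list of (x, y, z) nodes visited from src to dst.

    Assumptions: z == 0 is the bottom layer (closest to the heat sink);
    the packet moves down the source pillar, uses XY routing on the
    bottom layer, then moves up the destination pillar.
    """
    x, y, z = src
    path = [(x, y, z)]
    while z > 0:                      # descend to the bottom layer
        z -= 1
        path.append((x, y, z))
    while x != dst[0]:                # X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y, z))
    while y != dst[1]:                # then Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y, z))
    while z < dst[2]:                 # climb the destination pillar
        z += 1
        path.append((x, y, z))
    return path

def is_blocked(path, throttled):
    """True if any intermediate node on the path is throttled."""
    return any(node in throttled for node in path[1:-1])

# Hypothetical hot nodes on the top layer, directly between source and destination.
throttled = {(1, 0, 3), (2, 0, 3)}
route = downward_route((0, 0, 3), (3, 0, 3))
print(route, is_blocked(route, throttled))
```

The detour avoids the throttled top-layer nodes but is three times longer than the direct path in this example, and every such packet crosses the bottom layer, which is the source of the unbalanced vertical traffic noted above.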

To reduce traffic congestion in throttled 3D NoC systems, proactive routing algorithms were proposed, which use the topology information to make packets detour around the inactive nodes in advance. In [24], Lin et al. proposed using buffer information together with broadcast throttling information, which records the locations of the throttled nodes, to make packets detour around the throttled nodes early. In [13], Chao et al. proposed the Transport Layer Assisted Routing (TLAR) scheme, which considers information in both the transport layer and the network layer to deliver packets. Because proactive routing algorithms can support more path diversity than reactive ones, the network throughput can be significantly improved. Consequently, the design challenge of proactive routing algorithms is to increase the routing path diversity while ensuring deadlock-free packet routing.

3. Reactive Thermal Management in 3D NoC Systems

3.1. Throttle-Based Reactive Dynamic Thermal Management Schemes

The simplest throttle-based DTM adopts a global throttling (GT) scheme to cool down the network [14]. When any node’s temperature exceeds the alarming level (i.e., the temperature of the node is higher than $T_T$ in Fig. 6), the DTM slows down the operation speed of the entire network, as shown in Fig. 8(a). Although GT regulates the network temperature within a short temperature-control time, its performance impact is non-negligible. To mitigate the performance impact caused by GT, a distributive and collaborative throttling scheme, ThermalHerd, was proposed in [15], as shown in Fig. 8(b). This distributive throttling (DT) scheme controls the quota of incoming traffic of the thermal-emergent nodes. Because fewer nodes are throttled, DT mitigates the performance impact caused by the GT scheme.

Figure 6. Transient temperature trace with and without dynamic thermal management (DTM) [13]. The trace marks the ambient temperature $T_A$, the trigger level $T_T$, the thermal limit $T_L$, and the steady-state temperature $T_{SS}$, together with the points where DTM starts and stops (temperature controlling versus normal working).

Figure 7. The topology changes during runtime operation because of DTM: over time, throttling switches the network between a regular mesh and an irregular mesh.

Figure 8. (a) The global throttling (GT) scheme slows down the operation speed of the entire network; (b) the distributive throttling (DT) scheme only throttles the overheated nodes.
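The difference between GT and DT can be made concrete by comparing which nodes each scheme throttles for the same temperature map. The sketch below is a deliberate simplification: it treats throttling as a binary per-node state, whereas GT in [14] slows the whole network and DT in [15] adjusts traffic quotas. The function names and temperature values are hypothetical.

```python
def global_throttling(temps, t_trigger):
    """GT: if any node reaches the trigger level, every node is throttled."""
    if any(t >= t_trigger for t in temps.values()):
        return set(temps)
    return set()

def distributive_throttling(temps, t_trigger):
    """DT: only the nodes at or above the trigger level are throttled."""
    return {node for node, t in temps.items() if t >= t_trigger}

# Hypothetical temperatures (degrees Celsius) keyed by (x, y, z) address.
temps = {(0, 0, 3): 97.0, (1, 0, 3): 91.0, (0, 0, 0): 70.0, (1, 0, 0): 68.0}
gt_nodes = global_throttling(temps, t_trigger=95.0)
dt_nodes = distributive_throttling(temps, t_trigger=95.0)
print(len(gt_nodes), len(dt_nodes))
```

In this example a single hot node causes GT to affect all four nodes, while DT affects only the hot node itself.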

For 3D NoC systems, the heterogeneous thermal-conductance capability of each logic layer may result in a long cooling time for the nodes that are far away from the heat sink when the DT scheme is employed [9]. To provide an efficient heat conductance path in the 3D stacking environment, a thermal-aware vertical throttling (TAVT) scheme was proposed in [9][13]. Different from the DT scheme, the thermal-control granularity of the TAVT scheme is a pillar, which consists of the nodes with the same XY address as the overheated node. Based on the thermal-emergency level, the TAVT scheme determines the throttling state (i.e., the number of throttled nodes in a pillar) to regulate the system temperature, as shown in Fig. 9(a)(b).

To support TAVT’s pillar-based temperature control, each node of the 3D NoC system has one throttling trigger flag. The trigger flag is set in each node by comparing the current temperature measurement $T_{current}$ with the trigger level $T_T$, as follows:

$$
\text{Trigger flag} =
\begin{cases}
1, & \text{if } T_{current} \ge T_T \\
0, & \text{otherwise}
\end{cases}
\tag{1}
$$

If all the trigger flags of the pillar are zero, the temperatures of the nodes are all below the trigger level. Hence, none of the routers requires throttling, as shown by the Throttle 0 state in Fig. 9. If the trigger flags are not all zero, throttling is required. For a Z-layer 3D NoC, TAVT simultaneously throttles the upper Z-1 nodes, as shown by the Throttle 2 to Throttle 4 states for a 4-layer 3D NoC in Fig. 9(b). Fig. 9(a) shows the basic finite state machine of a pillar for TAVT over a 4-layer 3D NoC system, which controls the throttling of the 4 vertically aligned nodes.
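The trigger flag in (1) and the pillar-level control can be sketched as a small state machine. This is a hedged illustration rather than the controller of [9][13]: it assumes that the throttle state rises by one per control interval while any trigger flag in the pillar is set and falls by one once all flags are zero, and that the Throttle k state fully throttles the upper k nodes of the pillar. All function names are hypothetical.

```python
def trigger_flag(t_current, t_trigger):
    """Equation (1): 1 if the node has reached the trigger level, else 0."""
    return 1 if t_current >= t_trigger else 0

def next_throttle_state(state, pillar_temps, t_trigger, num_layers=4):
    """Advance the throttle state of one pillar by one control interval.

    Assumption: the state rises by one while any trigger flag in the
    pillar is set and falls by one once all flags are zero.
    """
    flags = [trigger_flag(t, t_trigger) for t in pillar_temps]
    if any(flags):
        return min(state + 1, num_layers)
    return max(state - 1, 0)

def bandwidth_per_layer(state, num_layers=4):
    """Bandwidth in percent, listed from the top layer to the bottom layer.

    Assumption: the Throttle k state fully throttles the upper k nodes.
    """
    return [0 if layer < state else 100 for layer in range(num_layers)]

# Hypothetical pillar temperatures, top layer first; only the top node is hot.
state = next_throttle_state(0, [98.0, 90.0, 85.0, 80.0], t_trigger=95.0)
print(state, bandwidth_per_layer(state))
```

Limiting the maximum state to `num_layers - 1` in this sketch would keep the bottom node active, which corresponds to the reduced TAVT variant that reserves the bottom layer for packet delivery.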

Figure 9. (a) The finite state machine of the pillar for the TAVT scheme; (b) an example of the TAVT scheme; (c) the finite state machine of the reduced TAVT scheme; (d) the reduced TAVT scheme keeps the bottom logic layer as a non-throttling layer for packet delivery. In the state machines, transitions are labeled by whether the trigger flags of the pillar are all zero (Trigger Flags = 0) or not (Trigger Flags != 0). In (b), the states Throttle 0 to Throttle 4 set the upper 0 to 4 nodes of the pillar to 0% bandwidth and leave the remaining nodes at 100%; in (d), the states stop at Throttle 3, so the bottom node stays at 100%.

As mentioned in Section 2, DTM changes the network topology during runtime operation because the overheated nodes are throttled, which causes packet delivery problems. To solve this problem, Chao et al. proposed a reduced TAVT scheme in [13]; its finite state machine for a 4-layer 3D NoC system is shown in Fig. 9(c). The reduced TAVT scheme sets the bottom layer of Fig. 9(b) as a bypass layer (as shown in Fig. 9(d)) to guarantee packet delivery, which will be introduced in Section 4.

To compare the cooling efficiency of the four investigated throttle-based RDTM schemes (i.e., GT, DT, TAVT, and reduced TAVT), the temperature distribution of the 3D NoC system with and without RDTM, obtained with the experimental setting in [13], is shown in Fig. 10. All four RDTM schemes keep the system temperature of the 3D NoC under the hard thermal limit (i.e., 100 °C in this experiment). Because of the pillar-based temperature control, TAVT and reduced TAVT reduce the number of throttled nodes within the period of temperature control, as shown in Fig. 11.

Figure 10. Temperature distribution of the 3D NoC system with and without reactive dynamic thermal management [13]. The plot is a histogram of the number of routers per temperature range (°C), for GT, DT, TAVT, reduced TAVT, and the system without RDTM.

Figure 11. Number of fully throttled nodes over time (s) in (a) GT, (b) DT, (c) TAVT, and (d) reduced TAVT [13].

Figure 12. (a) The flowchart of the game-based task migration scheme for temperature control proposed in [17], and (b) an example of game-based thermal-aware task migration. In the flowchart, a node senses its temperature and, if it is a hot node, makes its selfish decision (Step 1): Step I finds the candidate list of migration destinations, called r, by considering the temperature condition; Step II finds the optimal migration destination, called β, without considering the temperature condition; Step III makes the final choice by taking the candidate in r closest to β. In the cooperation and final decision (Step 2), the node checks whether other players overlap with its selfish decision and whether it is the hottest node, and then migrates its task to the chosen destination by using XYZ routing.

3.2. Migration-Based Reactive Dynamic Thermal Management Schemes

In addition to throttle-based DTM, migration-based DTMs have been proposed in recent years [16]-[17]. The simplest way is to control the temperature of the hot NoC nodes by migrating tasks from the hot nodes to other, cooler nodes. In [16], Liu et al. proposed Dynamic Thermal-Balance Routing (DTBR) to balance the temperature distribution of the NoC system. Based on the temperature estimates provided by the proposed thermal model, each node is informed of the thermal information of each routing direction that belongs to a shortest path to the corresponding destination node. Hence, the routing strategy can be formulated as

$$
\text{Routing Strategy} =
\begin{cases}
\text{Choose a direction from } \min\{S_{thermal}[i]\},\ i \in S_{mini}, & \exists\, i \in S_{mini},\ S_{thermal}[i] < T_{threshold} \\
\text{Send packet according to XY routing}, & \forall\, i \in S_{mini},\ S_{thermal}[i] > T_{threshold}
\end{cases}
\tag{2}
$$

where $S_{mini}$ is the set of minimal paths from the current node to the destination node; $S_{thermal}$ is the set of thermal information; and $S_{thermal}[i]$ is the thermal information input from port $i$. To ensure deadlock-free routing, the authors adopted virtual channel regulation. DTBR therefore always selects the coolest routing path as long as a candidate routing direction is thermally safe. Although DTBR can balance the temperature distribution, it suffers from heavy traffic congestion in the region of the minimal cooling path and degrades the system performance because of the routing strategy adopted in (2).
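The decision rule in (2) can be written as a short function. This is a sketch of the rule as stated in the text, not the DTBR implementation of [16]: the data layout (a list of minimal-path directions and a mapping from direction to reported temperature) and the function name are assumptions, and the case where a temperature equals the threshold, which (2) leaves open, is treated here as unsafe.

```python
def dtbr_direction(minimal_dirs, thermal_info, t_threshold, xy_dir):
    """Pick an output direction following the rule in (2).

    minimal_dirs : directions that lie on minimal paths (S_mini)
    thermal_info : mapping direction -> reported temperature (S_thermal)
    t_threshold  : temperature below which a direction counts as safe
    xy_dir       : direction that plain XY routing would take (fallback)
    """
    safe = [d for d in minimal_dirs if thermal_info[d] < t_threshold]
    if safe:
        # At least one minimal direction is thermally safe: take the coolest.
        return min(safe, key=lambda d: thermal_info[d])
    # No minimal direction is safe: fall back to XY routing.
    return xy_dir

# Hypothetical readings for a node whose destination lies to the north-east.
reported = {"E": 88.0, "N": 82.0}
cool_choice = dtbr_direction(["E", "N"], reported, t_threshold=90.0, xy_dir="E")
hot_choice = dtbr_direction(["E", "N"], reported, t_threshold=80.0, xy_dir="E")
print(cool_choice, hot_choice)
```

Because every node applies the same rule, packets heading in similar directions tend to converge on the same coolest links, which is consistent with the congestion drawback described above.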

To consider the minimize the peak temperature as well as the overhead imposed on chip performance during migra-tion, Hassanpour et al. adopted the Game Theory to opti-mize the performance-temperature multi-objective problem [17]. Game Theory was introduced by Von Neumann and Morgenstern, which helps to study interactions among rational individuals or decision makers [18]. In this paper, the authors view each Processing Element (PE) of NoC sys-tem as an individual player and model the performance-tem-perature multi-object problem as a cooperative game. The main objective in a cooperative N-player game is to form a coalition to maximize the overall outcome. Hence, the goal of the proposed game-based method is to find proper migra-tion destinations for all the hot PEs to achieve the maximize performance under a certain thermal constrain.

The aim of all hot PEs is to find the best migration des-tinations to solve the optimization problem in 3D NoC system. Hence, there is one thermal sensor embedded in each PEs, and there is one central coalition management unit to sort the PEs, whose temperature is above the ther-mal limit. Then, all detected hot PEs (i.e., the PE’s tem-perature is higher than the thermal limit) start the game in parallel. Each hot PE tries to find the best migration destination for its current task and pursue the game pro-cessing below. We illustrate the flowchart and example in Fig. 12. If there are two hot PEs (i.e., node A and B) the node A will perform the following two steps to find the optimal migration destination:

■ Step 1 (Selfish Decision): In this step, each hot PE first considers the temperature condition and finds the candidates for the possible migration destination. After that, the hot PE finds the optimal migration destination, which is determined without the temperature condition, and selects the candidate node closest to the optimal migration destination. For the example in Fig. 12(b), node A determines three candidates for the possible migration destination (i.e., rA1, rA2, and rA3) and one optimal migration destination, Ab. Because rA1 is the nearest to Ab, rA1 is selected as the migration destination of hot node A.

■ Step 2 (Cooperation and Final Decision): In this step, each player (i.e., hot PE) interacts with the other players to maximize the overall outcome. For the example in Fig. 12(b), node A and node B check each other's selfish decision. In this example, if node B's selfish decision is the same as node A's, node A gets the grant to migrate its task to node rA1 because node A's temperature is higher than node B's. Therefore, it is more critical to migrate the task of node A to a cooler node, which is done with the XYZ routing algorithm.
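The two steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation in [17]: the `HotPE` record, the function names, the Manhattan-distance closeness metric, and the rule that a PE losing a conflict retries among its remaining free candidates are all introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Node = Tuple[int, int, int]  # hypothetical (x, y, z) coordinate of a PE


@dataclass
class HotPE:
    name: str
    temperature: float       # sensed temperature of the hot PE
    candidates: List[Node]   # destinations that satisfy the temperature condition
    optimal: Node            # best destination when temperature is ignored


def manhattan(a: Node, b: Node) -> int:
    """Hop distance between two nodes (assumed closeness metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def selfish_decision(candidates: List[Node], optimal: Node) -> Node:
    """Step 1: choose the candidate closest to the optimal destination."""
    return min(candidates, key=lambda c: manhattan(c, optimal))


def cooperative_decision(hot_pes: List[HotPE]) -> Dict[str, Optional[Node]]:
    """Step 2: when selfish decisions collide, the hotter PE gets the grant."""
    grants: Dict[str, Optional[Node]] = {}
    taken = set()
    # Visiting PEs from hottest to coolest gives the hotter PE priority.
    for pe in sorted(hot_pes, key=lambda p: p.temperature, reverse=True):
        # Assumption: a PE that loses a conflict retries among free candidates.
        free = [c for c in pe.candidates if c not in taken]
        choice = selfish_decision(free, pe.optimal) if free else None
        grants[pe.name] = choice
        if choice is not None:
            taken.add(choice)
    return grants


node_a = HotPE("A", 95.0, [(1, 0, 0), (0, 2, 0), (3, 3, 0)], (1, 1, 0))
node_b = HotPE("B", 90.0, [(1, 0, 0), (2, 1, 0)], (1, 0, 0))
print(cooperative_decision([node_a, node_b]))
```

In this made-up example both nodes selfishly prefer (1, 0, 0); node A is hotter and receives it, and node B falls back to its other candidate.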

4. Intelligent and Efficient Routing in Throttled 3D NoC Systems

4.1. Routing Problem of Packet Delivery in Non-Stationary Irregular Mesh (NSI-Mesh)

As mentioned in Section 2, the topology of the thermal-aware 3D NoC transforms over time due to the throttling

Figure 13. Cases of failed packet delivery: (a) source-throttled case, (b) destination-throttled case, (c) path-throttled case, and (d) long-term HoL blocking caused by the previous three cases [13]. (Legend: path of XY routing, path of adaptive routing, blocked channel, fully throttled router; S and D mark the source and destination nodes.)


scheme of DTM, as shown in Fig. 7. We call this time-varying topology a Non-Stationary Irregular Mesh (NSI-Mesh). When throttling is applied to a near-overheated router, a specific traffic quota is given. The quota needs to be small, or even zero, for faster emergency cooling. If the number of passed packets exceeds the quota, the router blocks all new incoming packets.

Traffic quota is a key design parameter in DTM. A non-zero traffic quota lets the throttled router serve some packets and provides the ability to control the bandwidth of the node. However, once the quota is exhausted, the router suddenly stops serving incoming requests. The near-overheated node may therefore become inactive unpredictably during the throttling period, and the unfinished packets form a rapidly growing congestion tree, which results in more blocked packets. Consequently, the short interval between topology transformations and the wide range of the number of throttled nodes make conventional routing algorithms infeasible in NSI-Mesh.
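The quota behavior can be illustrated with a small Python sketch. The class and method names are hypothetical, and the model is deliberately simplified: it only counts accepted packets within one throttling period and ignores buffers, flits, and timing.

```python
from typing import Optional


class ThrottledRouter:
    """Toy model of a router whose traffic is limited by a per-period quota."""

    def __init__(self, quota: Optional[int] = None) -> None:
        # quota=None models a non-throttled router; quota=0 models full throttling.
        self.quota = quota
        self.passed = 0

    def start_throttling_period(self, quota: Optional[int]) -> None:
        """Begin a new throttling period with a fresh quota."""
        self.quota = quota
        self.passed = 0

    def accept(self) -> bool:
        """Return True if a new incoming packet is served, False if it is blocked."""
        if self.quota is not None and self.passed >= self.quota:
            return False  # quota exhausted: every new packet is blocked
        self.passed += 1
        return True


router = ThrottledRouter(quota=2)
print([router.accept() for _ in range(4)])
```

The abrupt switch from serving to blocking after the second packet is what makes the node appear inactive unpredictably to its neighbors.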

To ensure successful packet delivery in an NSI-Mesh network, Chao et al. analyzed the following four situations of unsuccessful packet delivery [13]. In the first case, shown in Fig. 13(a), the source node is fully throttled. In the second case, shown in Fig. 13(b), the destination node is fully throttled. In the third case, shown in Fig. 13(c), at least one node on the routing path is fully throttled. In the last case, shown in Fig. 13(d), the channels on the routing path are occupied by other blocked packets. If the source node is fully throttled, the packetized message is blocked in the network interface. If either case in Fig. 13(b) or Fig. 13(c) occurs, the injected packets are blocked somewhere on the routing path and form a congestion tree. Other packets are then blocked as in Fig. 13(d).

To eliminate the source-throttled case in Fig. 13(a) and the destination-throttled case in Fig. 13(b), each node in the NoC system requires the throttling information of all nodes. The Head-of-Line (HoL) blocking problem traditionally results from congestion in the switch, and its probability of occurrence can be reduced by applying Virtual Channel (VC) flow control or output-buffered router architectures. Because of the source-throttled, destination-throttled, and path-throttled cases, a new type of long-term HoL blocking may occur. This long-term HoL blocking has to be eliminated by preventing the occurrence of these three cases. However, the path-throttled case in Fig. 13(c) depends on the routing path. Therefore, it is important to guarantee, before injecting a packet, that at least one path toward the destination router contains no fully throttled node, and to route the packet on that guaranteed path.
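As an illustration of the first three cases, the following Python sketch classifies a source-destination pair on a 2D mesh with XY routing, given the set of fully throttled nodes. The function names and the string labels are assumptions made for this example; the HoL case of Fig. 13(d) depends on the dynamic state of other packets and is not modeled.

```python
from typing import List, Set, Tuple

Node2D = Tuple[int, int]  # hypothetical (x, y) coordinate on a 2D mesh


def xy_path(src: Node2D, dst: Node2D) -> List[Node2D]:
    """Nodes visited by XY routing: travel along X first, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def classify_delivery(src: Node2D, dst: Node2D, throttled: Set[Node2D]) -> str:
    """Name the failure case of Fig. 13(a)-(c), or report a deliverable path."""
    if src in throttled:
        return "source-throttled"
    if dst in throttled:
        return "destination-throttled"
    # Intermediate hops only: the endpoints were checked above.
    if any(node in throttled for node in xy_path(src, dst)[1:-1]):
        return "path-throttled"
    return "deliverable"


print(classify_delivery((0, 0), (2, 2), {(1, 0)}))
```

A check of this kind needs the throttling state of every node, which is why the text requires that information at each node before injection.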

To solve the routing problem in NSI-Mesh, the routing algorithms can be separated into reactive routing algorithms and proactive ones, as shown in Fig. 5. The reactive routing algorithms detour around the inactive nodes by exploiting the symmetric NoC structure. On the other hand, the proactive routing algorithms obtain the topology information at an early stage and make the packets detour around the inactive nodes early.

4.2. Reactive Routing in 3D NSI-Mesh with Off-Line Buffer Allocation

As mentioned before, DTM results in time-varying topology changes, which complicate packet delivery. Because the nodes in the bottom layer, which are close to the heat sink, have the highest thermal conductance to the heat sink, Chao et al. proposed a reduced TAVT scheme that sets the nodes in the bottom layer as non-throttled nodes, as shown in Fig. 9(d). Consequently, the channels in the bottom logic layer can be used as bypassing paths [13]. The control policy of the reduced TAVT scheme gives NSI-Mesh three key characteristics: (i) if a node is throttled, all the nodes above it are throttled; (ii) if a node is not throttled, all the nodes below it are not throttled; (iii) the nodes in the bottom layer are never throttled, as shown in the throttle 1 to throttle 3 cases of Fig. 9(d).
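The three characteristics can be expressed as a simple validity check on a throttling state. The sketch below is illustrative only: the function name is hypothetical, and it assumes that layer 0 is the top layer and layer `num_layers - 1` is the bottom layer next to the heat sink.

```python
from typing import Set, Tuple

Node = Tuple[int, int, int]  # hypothetical (x, y, z); z = 0 is assumed to be the top layer


def satisfies_reduced_tavt(throttled: Set[Node], num_layers: int) -> bool:
    """Check characteristics (i)-(iii) for a set of throttled nodes."""
    bottom = num_layers - 1
    for (x, y, z) in throttled:
        # (iii) nodes in the bottom layer are never throttled.
        if z == bottom:
            return False
        # (i) every node above a throttled node must also be throttled.
        if any((x, y, above) not in throttled for above in range(z)):
            return False
    # (ii) is the contrapositive of (i): a throttled node below a
    # non-throttled one would already have been rejected above.
    return True


print(satisfies_reduced_tavt({(0, 0, 0), (0, 0, 1)}, num_layers=4))
print(satisfies_reduced_tavt({(0, 0, 1)}, num_layers=4))
```

The first call describes a column throttled from the top down and passes; the second throttles a node while leaving the node above it active and fails.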

Based on the three characteristics of reduced TAVT, Chao et al. proposed a thermal-aware downward routing to deliver the packets. When some nodes are throttled in the upper layers, the downward routing delivers all the packets through the bottom bypassing layer, as shown in Fig. 14 [9]. In the bottom bypassing layer, the packets are delivered by using adaptive routing. However, this approach results in a heavy traffic load in the bottom logic layer, which makes the 3D NoC system suffer from rapid network performance degradation [25].

To mitigate the significant performance degradation caused by thermal-aware downward routing,

Figure 14. Thermal-aware downward routing delivers all the packets through the bottom non-throttling layer. (Legend: non-throttled node, throttled node, guaranteed lateral routable, non-guaranteed lateral routable; axes X, Y, and Z; heat sink.)


Chao et al. proposed a static buffer allocation (BA) scheme to achieve a thermal- and traffic-balanced design [19]. In [19], the authors adopted a heuristic method that assigns a feasible input buffer depth to each logic layer of the 3D NoC system by applying M/M/1/K queueing analysis. To simplify the derivation, the authors assume that packet arrivals on each input channel follow a Poisson process and that the service time of each channel is exponentially distributed. Besides, each packet is viewed as an atomic data unit. We use the north input channel of the router at $(x, y, z)$ as an example. Assume the network is not overloaded in steady state (i.e., $\lambda_{x,y,z,dir} < \mu_{x,y,z,dir}$). Note that $\lambda_{x,y,z,dir}$ denotes the packet arrival rate from direction $dir$ to the router at $(x, y, z)$, whose service rate is $\mu_{x,y,z,dir}$. The main target of buffer allocation is to calculate the full probability of each channel. For the north channel of the router at $(x, y, z)$, $C_{x,y,z,N}$, the full probability $b_{x,y,z,N}$ is

Figure 15. (a) Process flow of the static buffer allocation scheme [19], and (b) an example of buffer length allocation. (Flow in (a): set the initial buffer length K = (k1, k2, k3, k4) = (1, 1, 1, 1) for each direction; run statistics of thermal-aware downward routing for the arrival rate λx,y,z,dir with the initial buffer length; compute the full probability bz by using (4) for each layer; choose the layer with the maximum full probability; increase the buffer length at the chosen layer; repeat until the allocated total buffer length reaches the total budget NB, i.e., k1 + k2 + k3 + k4 = NB. Steps in (b): Step 1, set the initial buffer length of each router; Step 2, increase the buffer length of the layer with the maximum full probability; Step N, the process is done as the total buffer length reaches NB.)


$$b_{x,y,z,N} = \frac{(1 - \rho_{x,y,z,N}) \times \rho_{x,y,z,N}^{\,k_{x,y,z,N}}}{1 - \rho_{x,y,z,N}^{\,k_{x,y,z,N} + 1}}, \quad \text{where } \rho_{x,y,z,N} = \frac{\lambda_{x,y,z,N}}{\mu_{x,y,z,N}}. \tag{3}$$

Obviously, the full probability depends on the arrival rate $\lambda_{x,y,z,N}$ and the service rate $\mu_{x,y,z,N}$, so it can be obtained whenever the arrival rate is computable. Unfortunately, because adaptive routing is adopted in this work, the path selection depends on the channel status, which makes the arrival rate unpredictable. In addition, because the routers of an NoC system are connected, $\mu_{x,y,z,N}$ of one router depends on the full probability of all its downstream channels. Therefore, this kind of heuristic buffer depth allocation needs off-line simulation to extract the unpredictable arrival rate.

To simplify the problem, the control granularity of BA is one logic layer of the 3D NoC. Hence, similar to (3), the blocking probability in logic layer $z$ of a 3D NoC can be derived as

Figure 16. The buffer allocation (BA) scheme for reactive downward routing can (a) improve the system performance and (b) control the temperature of the 3D NoC system [19]. (Panel (a): average latency versus injection rate in flits/node/cycle, w/o BA and w/ BA. Panel (b): thermal maps labeled w/o BA and w/ BA on a 70.0 to 120.0 °C scale, with the annotations "Temp. Mean = 109.2 °C, Temp. Stdv. = 7.3 °C, Max. Temp. = 123.1 °C" and "Temp. Mean = 105.6 °C, Temp. Stdv. = 3.8 °C, Max. Temp. = 112.9 °C".)

Figure 17. Block diagram of the transport layer in the tile of the thermal-aware 3D NoC [13]. (Blocks: application layer with address space; transport layer API; transport layer with TX payload queue, packetizer, TX packet queue, RX packet queue, de-packetizer, RX payload queue, transport layer controller, topology table, and routing mode memory; network layer with router.)


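$$b_z = \frac{(1 - \hat{\rho}_z) \times \hat{\rho}_z^{\,k_z}}{1 - \hat{\rho}_z^{\,k_z + 1}}, \quad \text{where } \hat{\rho}_z = \frac{\hat{\lambda}_z}{\hat{\mu}_z}. \tag{4}$$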

$\hat{\lambda}_z$ and $\hat{\mu}_z$ represent the effective packet arrival rate and service rate in logic layer $z$ of the 3D NoC, which can be extracted through off-line simulation. Fig. 15(a) shows the flowchart of the static buffer allocation proposed in [19] for each 3D logic layer. The design flow is applied to each direction of channels, including north, south, east, west, up, and down. The total buffer budget, counted in flits, is NB for each direction. First, we run statistics for downward routing to obtain the arrival rate $\lambda_{x,y,z,dir}$ of each channel. Then, we initialize the depth of each queue to one flit. The full probability $b_z$ of each layer is calculated by using (4), and the buffer depth of the layer that has the maximum $b_z$ is increased. Finally, the iteration runs until the total budget is consumed, at which point the buffer allocation is done.
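The greedy loop described above can be sketched in Python for one channel direction. This is a simplified illustration rather than the design flow of [19]: the function names are hypothetical, the per-layer arrival and service rates are made-up constants instead of values extracted by off-line simulation, and the rates are assumed to stay fixed while the depths change.

```python
from typing import List


def full_probability(arrival_rate: float, service_rate: float, depth: int) -> float:
    """M/M/1/K full probability of a queue with `depth` slots, as in (4)."""
    rho = arrival_rate / service_rate
    if rho == 1.0:
        return 1.0 / (depth + 1)  # limit of the formula when rho equals 1
    return (1.0 - rho) * rho ** depth / (1.0 - rho ** (depth + 1))


def allocate_buffers(arrival: List[float], service: List[float],
                     budget: int, initial: int = 1) -> List[int]:
    """Give one more buffer slot to the fullest layer until the budget is used."""
    depths = [initial] * len(arrival)
    while sum(depths) < budget:
        probs = [full_probability(lam, mu, k)
                 for lam, mu, k in zip(arrival, service, depths)]
        fullest = probs.index(max(probs))  # layer with the maximum full probability
        depths[fullest] += 1
    return depths


# Made-up rates for four layers; the last entry stands for the busiest layer.
print(allocate_buffers([0.1, 0.2, 0.3, 0.6], [1.0, 1.0, 1.0, 1.0], budget=8))
```

With a budget of 8 and an initial depth of 1, four increments are distributed, and the layer with the highest utilization receives the most slots.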

We use Fig. 15 to illustrate the flowchart and a design example of the BA scheme introduced in [19]. If we set the total budget of buffer length NB to 8 and the initial buffer length to 1, each input buffer length at each logic layer of the 3D NoC is set to 1, as shown in Step 1 of Fig. 15(b). Then, the downward routing toward the bottom layer is executed to extract the full probability of each logic layer. If logic layer 4 has the maximum full probability, the buffer length of each router in logic layer 4 is increased by 1, as shown in Step 2 of Fig. 15(b). The process is done when the allocated total buffer length reaches the budget NB (i.e., 8 in this example), as shown in the final step of Fig. 15(b). Fig. 16(a) shows the performance comparison between a 3D NoC system with and without the introduced buffer allocation scheme. Because the BA scheme

Figure 18. Flowchart of the proposed transport layer assisted routing scheme [25]. (Operation in the network interfaces (NIs), transport layer: refer to the topology table (TT); decision blocks "current node active?", "destination active?", and "at least one lateral path active?"; outcomes "undeliverable for all packet transference", "undeliverable for current destination", and "no guaranteed routable path"; then packetize and set the routing mode. Operation in the routers, network layer: packet delivery from source S to destination D.)

Figure 19. The framework of the adopted DTM [13]. (Timeline: a normal stage of N cycles, a reconfiguration stage of R cycles, and the next normal stage of N cycles; the period is annotated as 10 ms, 10^7 cycles.)

Figure 20. Path selection of TLAR [13]. (Panels (a)-(c) show a source S and a destination D between the top logic layer and the bottom logic layer. Legend: non-throttled router, possible throttled router, forbidden path, lateral-first path, downward-first path.)


can help to increase the total buffer length of the layer used for downward packet delivery, the performance is improved. In addition, the system temperature is also controlled because all packets are delivered to their destinations through the bottom logic layer, which is close to the heat sink, as shown in Fig. 16(b).

4.3. Proactive Routing in 3D NSI-Mesh with Transport-Layer Information

In addition to the static-control-based routing algorithms, dynamic-control-based routing algorithms can make the packets detour around the inactive nodes early. To deliver data successfully in NSI-Mesh, Chao et al. proposed the Transport Layer Assisted Routing (TLAR) scheme in [13]. The key idea of the TLAR scheme is that the topology information in the transport layer is used to assist the routing decision. Fig. 17 shows the block diagram of the transport layer in the TLAR framework. There are five major components: (i) transmitter queue and packetizer (TX), (ii) receiver queue and depacketizer (RX), (iii) transport layer controller (TLC), (iv) topology table (TT), and (v) routing mode memory (RMM). TX and RX are the same as in a normal transport layer. TLC handles the requests from the application layer and the requests from the network layer. By applying the reduced TAVT scheme, Chao et al. store the throttling information of the entire network, which represents the topology of the network, in TT. The results of path selection, which are defined as the routing mode, are saved in the RMM.

Fig. 18 shows the flowchart for handling the application-layer requests. To prevent a packet from being blocked at a throttled node, the throttling information stored in the Topology Table (TT) is checked before the packet is sent to the network. The throttling information is updated at each change of topology. To realize the TT update, the authors proposed a DTM framework in [13], as shown in Fig. 19. In the reconfiguration stage, based on the temperature sensing results provided by the distributed thermal sensors, the TT is updated by propagating the throttling information along the X-, Y-, and Z-dimensions sequentially. For a network that operates at 1 GHz with a total temperature-traffic control period of 10 ms, Chao et al. showed that the reconfiguration stage needs less than 0.1% of the total period, which is a negligible timing overhead [13].
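The overhead bound can be checked with one line of arithmetic. The symbol $N_{total}$ is introduced here only for illustration, and $R$ denotes the length of the reconfiguration stage in cycles, following the "R Cycles" label of Fig. 19:

$$N_{total} = 10\ \text{ms} \times 1\ \text{GHz} = 10^{7}\ \text{cycles}, \qquad R < 0.1\% \times 10^{7}\ \text{cycles} = 10^{4}\ \text{cycles}.$$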

Fig. 20 shows the routing mode selection of the TLAR scheme, which is a combination of vertical packet routing and lateral packet routing. As shown in Fig. 20(a)-(c), only the lateral-first path and the downward-first path are allowed at the source router. The up-then-lateral turns (i.e., Up-North, Up-East, Up-South, and Up-West) are prohibited for deadlock avoidance. Because the authors adopt the reduced TAVT, the downward-first path is

Figure 21. (a) Operation flow for setting the routing mode in DLADR. An example of (b) lateral adaptive routing first after DLADR checking, (c) lateral XY routing first after DLADR checking, and (d) downward routing first after DLADR checking. (Flow in (a): if LAR routable, lateral adaptive first; otherwise, if LDR routable, XY routing first; otherwise, downward routing first. Legend: source node S, destination node D, inactive throttled node.)


guaranteed routable in the NSI-Mesh topology. However, the lateral-first path is guaranteed routable if and only if all the routers on the lateral-first routing path are active. To realize the TLAR scheme, the authors proposed three algorithms: 1) Downward Lateral Deterministic Routing (DLDR), 2) Downward Lateral Adaptive Routing (DLAR), and 3) Downward Lateral Adaptive-Deterministic Routing (DLADR).

DLDR is the baseline algorithm in TLAR, and it uses lateral deterministic routing (LDR) for routing packets in the source logic layer or the bottom logic layer. If the destination is lateral deterministic routable in the source logic layer, XY routing is adopted for packet routing. Otherwise, downward routing is used instead. DLAR is an improved version of DLDR that uses lateral adaptive routing (LAR) for routing packets. If the destination is lateral adaptive routable in the source logic layer, adaptive routing is used for packet routing. Because of adaptive routing, DLAR provides more routing path diversity than DLDR.

Furthermore, to realize the TLAR scheme and increase the lateral routing path diversity, Chao et al. introduced the downward-lateral adaptive-deterministic routing (DLADR) in [13]. The idea of DLADR is to combine adaptive routing and XY routing for the lateral-first routing, as shown in Fig. 21(a). The destinations are categorized into three types: (i) the guaranteed adaptive routable (LAR routable), (ii) the guaranteed XY routable (LDR routable), and (iii) the non-guaranteed lateral routable, which is downward routable. If a destination is guaranteed adaptive routable, it is also guaranteed XY routable. If a destination is guaranteed XY routable, it is also downward routable. Therefore, the downward routable destination set is a superset of the LDR routable set, and the LDR routable set is a superset of the LAR routable set.

Fig. 21(a) shows the operation flow for setting the routing mode of the packets in DLADR. Because lateral-first adaptive routing is able to balance the traffic load, checking whether the source-destination pair is lateral-first adaptive routable has the first priority. Lateral-first deterministic routing results in less traffic congestion in the bottom layer than downward-first routing, so the priority of lateral-first deterministic routing is higher than that of downward-first routing. Fig. 21 illustrates three examples. In Fig. 21(b), lateral-first adaptive routing is selected because the source-destination pair is LAR routable. In Fig. 21(c), although the source-destination pair is not LAR routable, it is LDR routable. Hence, lateral-first deterministic routing is adopted in Fig. 21(c). Finally, downward routing is applied in Fig. 21(d) because the source-destination pair is neither LAR routable nor LDR routable.
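The priority order of DLADR can be sketched as a short Python function. The routability tests below are assumptions made for this illustration and are not taken from [13]: a pair is treated as LAR routable when every router in the minimal bounding rectangle of the source layer is active, and as LDR routable when every router on the XY path in the source layer is active. All names are hypothetical.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int, int]  # hypothetical (x, y, z) router coordinate


def lateral_xy_path(src: Node, dst: Node) -> List[Node]:
    """Routers visited by XY routing inside the source layer (lateral part only)."""
    x, y, z = src
    path = [(x, y, z)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y, z))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y, z))
    return path


def bounding_rectangle(src: Node, dst: Node) -> List[Node]:
    """All routers of the source layer that a minimal adaptive path may visit."""
    z = src[2]
    xs = range(min(src[0], dst[0]), max(src[0], dst[0]) + 1)
    ys = range(min(src[1], dst[1]), max(src[1], dst[1]) + 1)
    return [(x, y, z) for x in xs for y in ys]


def select_routing_mode(src: Node, dst: Node, throttled: Set[Node]) -> str:
    """Check LAR first, then LDR, and fall back to downward routing."""
    if all(n not in throttled for n in bounding_rectangle(src, dst)):
        return "lateral-adaptive-first"
    if all(n not in throttled for n in lateral_xy_path(src, dst)):
        return "lateral-xy-first"
    return "downward-first"


src, dst = (0, 0, 1), (2, 2, 1)
print(select_routing_mode(src, dst, set()))
print(select_routing_mode(src, dst, {(1, 1, 0), (1, 1, 1)}))
print(select_routing_mode(src, dst, {(1, 0, 0), (1, 0, 1)}))
```

Under these assumed definitions the XY path always lies inside the bounding rectangle, so every LAR routable pair is also LDR routable, which matches the superset relation stated in the text.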

In summary, Chao et al. compared in [13] the routing efficiency of the reactive routing (i.e., downward routing) and the proactive routing (i.e., TLAR) introduced in this section, as shown in Fig. 22. For the reactive downward routing, heavy traffic congestion occurs

Figure 22. (a) The statistical traffic load distribution (STLD) of downward routing and the TLAR scheme, and (b) the performance comparison between downward routing and the TLAR scheme [13]. (Panel (a): traffic load per layer, from Layer 0 (top layer) to Layer 3 (bottom layer), for Downward, TLAR-DLDR, TLAR-DLAR, and TLAR-DLADR. Panel (b): average latency in cycles versus network injection rate in flits/cycle for the same four schemes.)


in the bottom logic layer, which results in the lowest system throughput. For the proactive routing, DLADR achieves better system performance than the previously introduced approaches because it yields the most balanced traffic distribution.

5. Proactive Thermal Management Design in 3D NoC Systems

Although the Reactive Dynamic Thermal Management (RDTM) introduced in Section 3 can control the system temperature, the performance impact is not

Figure 23. (a)(b) Reactive DTM results in performance impact, and (c)(d) the PDTM scheme improves the system performance [20]. (The panels show the throttling state of the stacked nodes above the heat sink at times 1, 2, and 3, with fully throttled and partially throttled nodes, together with the temperature and frequency of node A over time relative to Tlimit. Labels in the panels include frequency levels of 100%, 50%, and 0%, "Overheat", "Thermal Unsafe", "Shut Down", and "Predict".)

Figure 24. The thermal model of a 3D NoC system. (The model comprises silicon layers 0 to N, a heat spreader, and a heat sink along the Z axis. Each node at (x, y, z) has a temperature Tx,y,z and a power Px,y,z. The circuit elements are Rintra, Rinter, Cinter, Rhs, Chs, and Rsink, with the ambient temperature as the reference.)


negligible due to the pessimistic thermal control policy (i.e., the RDTM usually shuts down the near-overheated node [20]). To mitigate the performance impact, design approaches called Proactive Dynamic Thermal Management (PDTM) have been proposed in recent years. Compared with RDTMs, the PDTMs predict the system temperature and control it at an early stage. Fig. 23 illustrates an example of RDTM and PDTM, both applying the reduced TAVT as the temperature control policy. With RDTM, node A reaches a thermal emergency and is fully throttled at time 2 for emergency cooling, which results in rapid performance degradation, as shown in Fig. 23(a)(b). In contrast, the PDTM controls the temperature of node A through half throttling before the thermal emergency at time 2, which leads to less performance impact, as shown in Fig. 23(c)(d). Because the PDTMs control the system temperature with less pessimistic actions than RDTMs, systems using PDTMs achieve higher throughput than those using RDTMs. The key design issues of PDTMs are the precise temperature predictor and the corresponding thermal management, which are introduced in this section.

5.1. Introduction of Thermal Model of 3D NoC System

In practice, to analyze heat transfer problems, we convert the thermodynamic system into a thermal circuit. The temperature difference is analogous to the “voltage”; the heat flow can be described as the “current.” In the thermal-design community, these equivalent circuits are called thermal models, and dynamic thermal models if they include thermal capacitors. This duality provides a convenient basis for an architecture-level thermal model. The 3D NoC system considered in this article has multiple stacked logic layers, and the corresponding thermal RC model with a heat sink is shown in Fig. 24. For an X-by-Y-by-Z 3D NoC, $T_{x,y,z}$ and $P_{x,y,z}$ denote the temperature and power consumption of the node at location $(x, y, z)$, respectively. $R_{inter}$ and $R_{intra}$ represent the vertical thermal resistance between logic layers and the horizontal thermal resistance within a logic layer, respectively. $C_{inter}$ is the thermal capacitance between vertically adjacent logic layers. For the heat spreader, $R_{hs}$ and $C_{hs}$ are the thermal resistance and thermal capacitance, respectively. Usually, the heat sink is attached to one side of the 3D NoC package. To model the heat transfer between the heat sink and the ambient, we use $R_{sink}$ as the convective resistance of the heat sink.

To achieve early temperature control, the transient temperature of each node should be derived. By Fourier’s Law, the change of temperature per unit time can be formulated as [29]

$$\frac{dT(t)}{dt} = \frac{P(t)}{C} - \frac{T(t)}{RC}, \tag{5}$$

where $T(t)$ and $P(t)$ are the temperature and total power consumption of a specific node at time $t$; $R$ and $C$ are the effective thermal resistance and thermal capacitance toward the ambient, respectively. To simplify the equation, we rewrite (5) as

$$\frac{dT(t)}{dt} = a \cdot P(t) - b \cdot T(t), \tag{6}$$

Figure 25. (a) Temperature prediction with the baseline prediction model, and (b) temperature prediction with the enhanced one [20]. (Both panels plot temperature versus time, marking t − Δts, t, and t + kΔts. Panel (a) covers the case ΔT(t) > 0 with the increments ΔT1, ΔT2, and ΔTk. Panel (b) covers the case ΔT(t) < 0 with DTM on and Tlimit marked. Legend: measured temperature, predictive temperature w/o DTM influence, predictive temperature w/ DTM influence.)

To ensure the successful packet delivery in NSI-Mesh, many routing algorithms were proposed, which can be classified into reactive routing and proactive routing.


where $a$ (equal to $1/C$) and $b$ (equal to $1/RC$) are physical constants that depend on the materials of the employed technology. To solve the linear differential equation, we set the boundary condition that the initial temperature at $t_0$ is $T_0$ (i.e., $T(t_0) = T_0$). Therefore, (6) can be solved as

$$T(t) = \int_{\tau = t_0}^{\tau = t} a P(\tau) \cdot e^{-b(t - \tau)} \, d\tau + T_0 e^{-b(t - t_0)}. \tag{7}$$

We assume that the Dynamic Thermal Management (DTM) decision is made every $(t - t_0)$, which is also called the DTM period. To simplify the control policy, in [20], Chen et al. assume that each node of the NoC does not change its working activity within a DTM period. Therefore, we set the total power consumption of each node in a DTM period to a constant $P$ (i.e., $P(\tau)$ in (7) equals $P$ during the period $(t - t_0)$). Because the temperature approaches a steady-state temperature in long-term running (i.e., $T(t) = T_{ss}$ when $t = \infty$), (7) can be rewritten as

$$T(t) = T_{ss} - (T_{ss} - T_0) \cdot e^{-bt}. \tag{8}$$
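As a numerical sanity check, the following Python sketch compares the closed form in (8) with a forward-Euler integration of (6) under constant power. It is an illustration only: the function names and the parameter values are made up, $t_0$ is taken as 0, and the steady-state value $T_{ss} = aP/b = PR$ is obtained by setting the derivative in (6) to zero.

```python
import math


def closed_form_temperature(t: float, power: float, r: float, c: float,
                            t_init: float) -> float:
    """Evaluate (8) with t0 = 0 and constant power."""
    a, b = 1.0 / c, 1.0 / (r * c)
    t_ss = a * power / b  # steady state of (6): dT/dt = 0, i.e. T_ss = P * R
    return t_ss - (t_ss - t_init) * math.exp(-b * t)


def euler_temperature(t_end: float, power: float, r: float, c: float,
                      t_init: float, steps: int = 100_000) -> float:
    """Integrate dT/dt = a*P - b*T from 0 to t_end with forward Euler."""
    a, b = 1.0 / c, 1.0 / (r * c)
    dt = t_end / steps
    temp = t_init
    for _ in range(steps):
        temp += dt * (a * power - b * temp)
    return temp


# Made-up values: R = 2.0, C = 0.5, P = 10.0, initial temperature 5.0.
exact = closed_form_temperature(1.0, 10.0, 2.0, 0.5, 5.0)
approx = euler_temperature(1.0, 10.0, 2.0, 0.5, 5.0)
print(exact, approx)
```

The two values should agree closely, which confirms that (8) is the constant-power solution of (6) for this example.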

5.2. Temperature Prediction Scheme

The key design concept of PDTM is to control the system temperature early based on the predicted temperature. Hence, it is important to monitor the system temperature ahead of time. In [20], Chen et al. proposed a temperature prediction scheme using the thermal RC model, called the RC-based Temperature Prediction (RCTP) model. Based on the temperature trend between the current temperature and the previous one, there are two kinds of prediction modes:

1) Increasing Mode: For a quasi-stationary system, Chen et al. assume that each NoC node usually operates at full speed in normal operation. Hence, the temperature usually follows a steadily increasing trend, and the future temperature can be predicted through a linear approximation, as shown in Fig. 25(a).

2) Decreasing Mode: During temperature-controlled operation (i.e., the operation period when the DTM is triggered), the difference between the current temperature and the previous one shows a negative trend, as shown in Fig. 25(b). Hence, the prediction model should be enhanced, as described later.

A. Baseline Thermal RC-Based Thermal Prediction (RCTP) Model

For thermal-aware system design, embedding a thermal sensor in each core is a popular and practical method [26][27]. Based on the present voltage and current, the thermal sensor can report a temperature periodically [28]. We assume the embedded thermal sensor provides a temperature reading every thermal sensing period $\Delta t_s$. The design goal is to predict the temperature at the time $k \Delta t_s$ later (as shown in Fig. 25(a)), which can be described as

$$T(t + k \Delta t_s) = T^{*}(t) + \Delta T, \tag{9}$$

where $k$ is the thermal prediction distance (i.e., the number of thermal sensing periods between the current time and the predicted time), and $T^{*}(t)$ is the temperature currently sensed by the embedded thermal sensor at time $t$.
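To make the roles of $k$, $T^{*}(t)$, and $\Delta T$ concrete, here is a minimal Python sketch of the form of (9). It uses a plain linear extrapolation from the last two sensor readings, which is an assumption made for illustration and not the $\Delta T$ derived by the RCTP model; the function name and the sample readings are made up.

```python
def predict_temperature(current: float, previous: float, k: int) -> float:
    """Predict the temperature k sensing periods ahead by linear extrapolation.

    current:  T*(t), the latest sensor reading
    previous: the reading one sensing period earlier
    k:        thermal prediction distance in sensing periods
    """
    # Assumption: the change observed over the last period repeats k times.
    delta_t = k * (current - previous)
    return current + delta_t


# Made-up readings in degrees Celsius.
print(predict_temperature(current=85.0, previous=84.5, k=4))
```

A predictor of this kind lets the thermal manager compare the predicted value against the thermal limit before the limit is actually reached.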

To predict $\Delta T$ in (9), derivative analysis is adopted to extract the temperature difference in a thermal sensing period $\Delta t_s$. The first derivative of (8) is

$$\frac{dT(t)}{dt} = b \cdot (T_{ss} - T_0) \cdot e^{-bt}, \tag{10}$$

Figure 26. (a) Results of peak temperature prediction, and (b) prediction error increases with respect to the prediction distance [20]. (Panel (a): temperature in °C versus time in seconds, comparing the temperature reported by Hotspot with the predicted temperature. Panel (b): mean AE and maximum AE in °C for prediction distances PD(1) to PD(6).)


Figure 27. (a) Measured thermal map, and (b)-(f) predictive thermal maps under different prediction distances under a random traffic pattern in an 8×8×4 3D NoC system [20]. (Each row shows Layer 0 (top layer) to Layer 3 (bottom layer, next to the heat sink), with temperature in °C over the x-y plane: (a) prediction distance = 0, the original thermal map; (b)-(f) prediction distances = 1 to 5.)


which is the temperature difference between the temperature at the current time $t$ and the one at the previous thermal sensing time $(t - \Delta t_s)$. For generalization, with (10), the predictive temperature difference between the temperature at time $(t + k\Delta t_s)$ and $(t + k\Delta t_s - \Delta t_s)$ can be derived as

$$\frac{dT(t + k\Delta t_s)}{dt} = \frac{dT(t)}{dt} \cdot e^{-b \cdot k\Delta t_s}. \tag{11}$$

Hence, the temperature difference, $\Delta T(t + k\Delta t_s)$, between time $(t + k\Delta t_s)$ and $(t + k\Delta t_s - \Delta t_s)$ is predicted as

$$\Delta T_k = \Delta T^*(t) \cdot e^{-b \cdot k\Delta t_s}, \tag{12}$$

where $\Delta T^*(t)$ is equal to $(T^*(t) - T^*(t - \Delta t_s))$. With (12), the $\Delta T$ in (9) is the accumulation of the temperature change in each sensing time period from the time $t$ to the time $(t + k\Delta t_s)$, which can be derived as

$$\Delta T = \sum_{i=1}^{k} \Delta T_i = \Delta T^*(t) \cdot \frac{e^{-b \cdot \Delta t_s}\,(1 - e^{-b \cdot k\Delta t_s})}{1 - e^{-b \cdot \Delta t_s}}, \tag{13}$$

which is shown in Fig. 25(a). With (9) and (13), the temperature at the time after $k\Delta t_s$ can be predicted as

$$\begin{aligned} T(t + k\Delta t_s) &= T^*(t) + \Delta T \\ &= T^*(t) + \Delta T^*(t) \cdot \frac{e^{-b \cdot \Delta t_s}\,(1 - e^{-b \cdot k\Delta t_s})}{1 - e^{-b \cdot \Delta t_s}}. \end{aligned} \tag{14}$$

If the thermal prediction distance $k$ is determined, the term $(e^{-b \cdot \Delta t_s} \cdot (1 - e^{-b \cdot k\Delta t_s}))/(1 - e^{-b \cdot \Delta t_s})$ in (14) will be a constant. Hence, the computational complexity of the proposed thermal prediction model is O(1).
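As a minimal sketch of how (14) could be evaluated at run time, the Python fragment below precomputes the constant factor once for a fixed prediction distance, so that each prediction costs one multiplication and one addition. The function names `rctp_gain` and `predict_temperature`, as well as the numeric values chosen for `b`, `dt_s`, and the temperatures, are illustrative assumptions and do not come from [20].

```python
import math


def rctp_gain(b: float, dt_s: float, k: int) -> float:
    """Constant factor of (14): e^{-b*dt_s} * (1 - e^{-b*k*dt_s}) / (1 - e^{-b*dt_s})."""
    if b <= 0 or dt_s <= 0 or k < 0:
        raise ValueError("b and dt_s must be positive and k must be non-negative")
    q = math.exp(-b * dt_s)  # common ratio of the geometric series in (13)
    return q * (1.0 - q ** k) / (1.0 - q)


def predict_temperature(t_now: float, t_prev: float, gain: float) -> float:
    """Baseline RCTP prediction (14) from the two most recent sensor readings."""
    delta_now = t_now - t_prev  # Delta T*(t) = T*(t) - T*(t - dt_s)
    return t_now + delta_now * gain


# Hypothetical values: b in 1/s, dt_s in s, temperatures in degrees Celsius.
GAIN_PD3 = rctp_gain(b=2.0, dt_s=0.01, k=3)
predicted = predict_temperature(t_now=90.0, t_prev=89.5, gain=GAIN_PD3)
```

Because `GAIN_PD3` depends only on `b`, `dt_s`, and `k`, it can be stored per node, which is what makes the per-prediction cost constant.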

B. Enhanced Thermal RC-Based Prediction (RCTP) Model

The temperature in a thermal-aware NoC system does not increase monotonically, because the DTM controls the temperature of the thermal-emergent nodes. Therefore, the $\Delta T^*(t)$ in (13) becomes negative once the DTM starts to control the temperature. Fig. 25(b) illustrates an example. If the triggering

[Figure 28: three panels, each plotting temperature against $T_{limit}$ and clock frequency ($f_{max}$, $f_{max}/2$) over time steps 1 to 3, with thermal unsafe and thermal safe points marked. Panel labels: Shut Down, Thermal Prediction, Half Slow Down.]

Figure 28. (a) System will be thermal unsafe without DTM, (b) reactive DTM shuts down the thermal-emergency node, and (c) proactive DTM assigns the frequency based on power budget [20].

[Figure 29: Temperature (°C) against Time (Sec.) from 0.4 to 0.9, showing the hard thermal limit and the curves VT and VT_PD(1) to VT_PD(5).]

Figure 29. The maximum transient temperature under the transpose-1 traffic pattern [20].


temperature of the DTM policy is $T_{limit}$, the DTM starts to control the temperature at time $(t - \Delta t_s)$ because the temperature $T^*(t - \Delta t_s)$ exceeds $T_{limit}$. Hence, the sensed temperature might decrease at time $t$, which leads to a negative $\Delta T^*(t)$. Under the baseline RCTP model in (13), a negative $\Delta T^*(t)$ yields a smaller predicted temperature and therefore a large prediction error. The large prediction error affects the DTM policy and leads to a significant performance impact. Therefore, it is necessary to enhance the RCTP model in (13) for the negative temperature trend.

As shown in (8), the temperature trend without DTM influence follows an approximately exponential function if the total power consumption of an NoC node is identical in every thermal sensing period. For the predicted temperature, an inherent modeling error occurs because of the $b$ value in (14). As mentioned before, the $b$ value is the reciprocal of the effective thermal RC product. However, the thermal RC product is temperature-dependent, which leads to the modeling error in our thermal model. We use Fig. 25(b) to introduce the enhanced RCTP model for the negative temperature trend. Assume that $T'_1$ and $T'_k$ are the predicted temperatures without DTM influence at time $t$ and time $(t + k\Delta t_s)$, and that they have inherent modeling errors $\varepsilon'_1$ and $\varepsilon'_k$ with respect to the actual temperatures (i.e., the temperatures reported by the embedded thermal sensor) $T^*_1$ and $T^*_k$, respectively. As shown in Fig. 25(b), $\varepsilon'_1$ equals $\Delta T_r$. To consider the worst case, the goal is to predict the temperature $T_k$ at time $(t + k\Delta t_s)$ without DTM influence, which has an inherent modeling error $\varepsilon_k$ with respect to the actual temperature $T^*_k$. With the result of (8), we can derive the following two equations

$$T_k = T^*_k + \varepsilon_k = T_{ss} - (T_{ss} - T^*_1) \cdot e^{-b \cdot k\Delta t_s}, \tag{15}$$

and

$$\begin{aligned} T'_k &= T_{ss} - (T_{ss} - T'_1) \cdot e^{-b \cdot k\Delta t_s} \\ &= T_{ss} - [T_{ss} - (T^*_1 + \varepsilon'_1)] \cdot e^{-b \cdot k\Delta t_s}. \end{aligned} \tag{16}$$

[Figure 30: six panels, each plotting the number of thermal-emergent nodes (0 to 260) against Time (Sec.); panel titles include Prediction Distance = 1 to Prediction Distance = 5.]

Figure 30. The numbers of thermal-emergent nodes when involving (a) reactive VT and (b)-(f) proactive VT with different prediction distances [20].



By subtracting (15) from (16), we find that the difference $\Delta T'_r = T'_k - T_k$ equals $\Delta T_r \cdot e^{-b \cdot k\Delta t_s}$. Consequently, as shown in Fig. 25(b), the temperature $T_k$ at time $(t + k\Delta t_s)$ can be predicted by

$$\begin{aligned} T'_k - \Delta T'_r &= (T^*_k + \varepsilon'_k) - [(T^*_k + \varepsilon'_k) - (T^*_k + \varepsilon_k)] \\ &= T^*_k + \varepsilon_k = T_k. \end{aligned} \tag{17}$$

Obviously, the prediction error is associated only with the current sensing temperature (i.e., there is no accumulated error). Therefore, the enhanced RCTP model for the temperature-controlled operation period can be derived as

$$T(t + k\Delta t_s) = T'(t) + \sum_{j=1}^{k} \Delta T'(t) \cdot e^{-b \cdot j\Delta t_s} - \Delta T'_r, \tag{18}$$

where $T'(t)$ is the predicted temperature without DTM influence at time $t$, and $\Delta T'(t)$ is the temperature difference between $T'(t)$ and $T^*(t - \Delta t_s)$. Similar to (14), the computational complexity of the proposed enhanced prediction model is O(1).
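The following Python sketch shows one way (18) might be evaluated, under stated assumptions. This section does not say how the DTM-free estimate $T'(t)$ is produced, so it is taken as an input (`t_free_now`). The residual $\Delta T_r$ is computed as $T'(t) - T^*(t)$, which follows from $\varepsilon'_1 = \Delta T_r$ and the substitution $T'_1 = T^*_1 + \varepsilon'_1$ in (16). The function and parameter names are hypothetical.

```python
import math


def enhanced_rctp(t_free_now: float, t_sensed_now: float, t_sensed_prev: float,
                  b: float, dt_s: float, k: int) -> float:
    """Enhanced RCTP prediction (18) for the temperature-controlled period."""
    if b <= 0 or dt_s <= 0 or k < 0:
        raise ValueError("b and dt_s must be positive and k must be non-negative")
    q = math.exp(-b * dt_s)
    delta_free = t_free_now - t_sensed_prev  # Delta T'(t)
    delta_r = t_free_now - t_sensed_now      # Delta T_r (equals eps'_1)
    # Closed form of sum_{j=1..k} Delta T'(t) * e^{-b*j*dt_s}.
    accumulated = delta_free * q * (1.0 - q ** k) / (1.0 - q)
    delta_r_k = delta_r * q ** k             # Delta T'_r = Delta T_r * e^{-b*k*dt_s}
    return t_free_now + accumulated - delta_r_k
```

Two sanity properties follow from the formula: when the DTM has not acted (`t_free_now == t_sensed_now`), the result reduces to the baseline model (14), and for `k = 0` the function returns the sensed temperature itself.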

To estimate the precision of the temperature prediction model, Chen et al. in [20] applied random traffic on an 8×8×4 3D NoC system and compared the temperature produced by the prediction model with the one reported by HotSpot [21], as shown in Fig. 26(a). Besides, to analyze the prediction error over different prediction distances $k$ (PD($k$)), Fig. 26(b) shows the mean absolute error (Mean AE) and the maximum absolute error (Maximum AE) of the prediction. The prediction error increases with the prediction distance, which can also be observed in Fig. 27.
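The two error metrics can be sketched as follows. This is a generic computation of mean and maximum absolute error over paired samples, not the evaluation script of [20]; the function name and the sample values are made up for illustration.

```python
from typing import Sequence, Tuple


def absolute_errors(predicted: Sequence[float],
                    reference: Sequence[float]) -> Tuple[float, float]:
    """Return (mean absolute error, maximum absolute error) of paired samples."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors), max(errors)


# Hypothetical temperatures in degrees Celsius.
mean_ae, max_ae = absolute_errors([90.0, 91.0, 92.5], [90.5, 91.0, 92.0])
```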

5.3. Throttle-Based Proactive Dynamic Thermal Management Schemes for 3D NoC Systems

As mentioned in Section 2, the major difference between the RDTM and the PDTM is that the latter can manage the system temperature based on the predicted temperature information, which is given by (14) and (18). The throttle-based PDTM determines the percentage of the local node's activity (i.e., clock frequency) and controls the temperature of the local node based on the predicted and current temperatures. However, Chen et al. have shown that the problem of clock frequency assignment for each NoC node is NP-hard [20]. Therefore, it is difficult to find an optimal solution in a real-time system due to the high computational complexity.

In [20], Chen et al. consider the worst case to mitigate the performance impact caused by the thermal problem and assume that the throughput of the 3D NoC system is quasi-stationary. Therefore, each isolated node of the 3D NoC system usually operates at full speed in normal operation, which makes the total power consumption in each thermal sensing period equal to $P_{max} \cdot \Delta t_s$, where $P_{max}$ is the power consumption of each node of the 3D NoC system at the maximum clock frequency $f_{max}$. To reduce the control complexity of clock frequency assignment, Chen et al. proposed a method of frequency assignment based on the power budget distribution [20]. The power budget is defined as the slack thermal energy that can be consumed by the NoC node until thermal emergency.

Assume that one node becomes thermal-emergent at time $t$ and becomes thermally safe after $j\Delta t_s$ (i.e., at time $(t + j\Delta t_s)$). Besides, the authors in [20] assume that the thermal emergency is predicted $k\Delta t_s$ in advance (i.e., at time $(t - k\Delta t_s)$). To perform the emergent cooling, the conventional throttle-based RDTM will fully throttle the thermal-emergent nodes from the time $t$ to the time $(t + j\Delta t_s)$. Consequently, the total power consumption from the time $(t - k\Delta t_s)$ to time $(t + j\Delta t_s)$ is

$$P[(t - k\Delta t_s) \,..\, (t + j\Delta t_s)] = \int_{\tau = t - k\Delta t_s}^{\tau = t + j\Delta t_s} P(\tau)\, d\tau = P_{max} \cdot k\Delta t_s. \tag{19}$$

Therefore, the following Corollary is summarized in [20].

Corollary 1: A particular node of the 3D NoC system will be thermal-emergent at time $t$ and thermally safe at time $(t + j\Delta t_s)$. Besides, the thermal emergency has been predicted at time $(t - k\Delta t_s)$. By following the First Law of Thermodynamics and (19), the node will be thermally safe if the total power consumption from the time $(t - k\Delta t_s)$ to time $(t + j\Delta t_s)$ is the same as the one from the time $(t - k\Delta t_s)$ to time $t$ when the node is in normal operation.

As the thermal emergency at time $t$ is predicted $k\Delta t_s$ in advance (i.e., at time $(t - k\Delta t_s)$), with (19), the slack power budget is $P_{max} \cdot k\Delta t_s$ (e.g., the node will be fully throttled if the total power consumption from time $(t - k\Delta t_s)$ to time $t$ exceeds $P_{max} \cdot k\Delta t_s$). Based on Corollary 1, the authors in [20] evenly distribute the power budget over the thermal sensing periods $\Delta t_s$ from time $(t - k\Delta t_s)$ to time $(t + j\Delta t_s)$. Because the budget $P_{max} \cdot k\Delta t_s$ is spread over $(k + j)$ sensing periods and the power consumption depends linearly on the clock frequency, the clock frequency of the node, which is predicted as a thermal-emergent node at time $(t - k\Delta t_s)$, can be reduced to $f_{max} \cdot k/(k + j)$. To reduce the performance impact caused by the throttle-based DTM, the authors in [20] consider the ideal situation that the thermal-emergent node is thermally safe at $(t + \Delta t_s)$ (i.e., $j$ is equal to 1). Therefore, the following Lemma can be obtained.

Lemma 1: For a node of a 3D NoC system, the clock frequency is decreased by a fraction $1/(k + 1)$ of $f_{max}$ if the node is expected to become thermal-emergent $k\Delta t_s$ ahead, where $k$ is the prediction distance and $k \geq 0$.
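A minimal sketch of the resulting frequency assignment is given below, assuming the even budget distribution described above and a linear relation between power and clock frequency. The function name `throttled_frequency` and the 1 GHz value are illustrative assumptions, not taken from [20].

```python
def throttled_frequency(f_max_hz: float, k: int, j: int = 1) -> float:
    """Clock frequency after spreading the budget P_max*k*dt_s over (k + j) periods.

    With j = 1 this is f_max * k / (k + 1), i.e. f_max decreased by the
    fraction 1 / (k + 1) as stated in Lemma 1.
    """
    if f_max_hz <= 0 or k < 0 or j < 1:
        raise ValueError("f_max_hz must be positive, k >= 0 and j >= 1")
    return f_max_hz * k / (k + j)


# Hypothetical 1 GHz node with prediction distances 0, 1 and 3.
f_reactive = throttled_frequency(1e9, 0)  # k = 0: no advance warning, full throttle
f_pd1 = throttled_frequency(1e9, 1)       # half speed
f_pd3 = throttled_frequency(1e9, 3)       # three quarters of f_max
```

Under this reading, a longer prediction distance lets the node keep a larger share of its maximum frequency, while `k = 0` degenerates to the fully throttled reactive case.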

Fig. 28 illustrates an example. A specific node of the 3D NoC system will become thermal-emergent at time 2, as shown in Fig. 28(a). When the conventional throttle-based


RDTM is employed, the system becomes thermally safe at time 3, as shown in Fig. 28(b). Obviously, there is no power consumption in the time interval between time 2 and time 3, and the total power consumption (i.e., power budget) from time 1 to time 3 is $P_{max}$. On the other hand, if the thermal emergency can be predicted at time 1, the power budget can be distributed evenly over the time intervals by employing the proposed method. Therefore, the power consumption in each time interval is $P_{max}/2$ from time 1 to time 3, which makes the total power consumption from time 1 to time 3 equal to $P_{max}$, as shown in Fig. 28(c).
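The energy accounting in this example can be checked with a short Python sketch. Power is expressed per sensing period in units of $P_{max}$, and exact fractions are used so that the totals compare without rounding. The helper names are hypothetical, and the schedules only model the idealized case described in the text.

```python
from fractions import Fraction
from typing import List


def reactive_schedule(k: int, j: int) -> List[Fraction]:
    """Full power for k periods, then fully throttled for j periods."""
    return [Fraction(1)] * k + [Fraction(0)] * j


def proactive_schedule(k: int, j: int) -> List[Fraction]:
    """The same budget spread evenly over all (k + j) periods."""
    return [Fraction(k, k + j)] * (k + j)


# Fig. 28 corresponds to k = 1, j = 1 (time 1 to time 3).
reactive = reactive_schedule(1, 1)
proactive = proactive_schedule(1, 1)
same_budget = sum(reactive) == sum(proactive)
```

Both schedules consume the same total budget, which is the condition Corollary 1 uses for thermal safety; the proactive schedule simply avoids the fully throttled interval.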

To evaluate the cooling efficiency of the PDTM, Chen et al. in [20] applied vertical throttling (VT) [9] as the basis of the DTM control policy. Fig. 29 shows that the reactive VT and the proactive VT with different prediction distances can keep the system temperature under the hard thermal limit. Because the proactive VT controls the system temperature early, the number of thermal-emergent nodes is smaller than with the reactive VT scheme, as shown in Fig. 30. Consequently, the system throughput can be improved significantly.

6. Conclusions

Because of die stacking, the thermal problem of 3D NoC is more severe than that of 2D NoC. The thermal issues limit the performance gain of 3D integration and lower the reliability of SoC/CMP designs. Consequently, thermal-aware 3D NoC design has become critical in recent years. In this article, we separate the thermal-aware design into two different control strategies: (i) dynamic thermal management (DTM) and (ii) packet routing. For the DTM design, we investigate the modern reactive DTM (RDTM) and proactive DTM (PDTM) schemes, which aim to regulate the system temperature with minimal performance impact. On the other hand, for the packet routing design, we present the novel adaptive routing algorithms, which increase the sustainability of the 3D NoC system under the irregular mesh topology. This work provides encouraging algorithmic-level approaches to preserve the benefits of the 3D NoC system without the enhancement of cooling devices.

Acknowledgements

This work was supported by the Ministry of Science and Technology, TAIWAN, under Grant NSC-101-2220-E-002-013, MOST-102-2220-E-002-001 and MOST-104-2218-E-035-007.

Kun-Chih (Jimmy) Chen (S’10-M’14) received his B.S. degree from National Taiwan Ocean University (NTOU), Taiwan, in Computer Science and Engineering in 2007. He received the M.S. degree from National Sun Yat-sen University (NSYSU), in Computer Science and Engineering in 2009. He received the PhD degree from National Taiwan University (NTU), in Graduate Institute of Electronics Engineering (GIEE) in 2013. From October 2014 to January 2015, he served as a postdoctoral fellow in Intel-NTU Connected Context Computing Center working on the development of Green Sensing Platform for Internet of Things (IoTs), Reliable Thermoelectric Converter, and Power-aware Software Defined Network (SDN). In February 2015, Dr. Chen joined the faculty of the Department of Electronic Engineering, Feng Chia University. His research interests include algorithm development, VLSI architecture design and implementation for three dimensional Networks-on-Chip (3D NoC), advance arithmetic unit design, and fault-tolerant system designs.

Dr. Chen received the Best Paper Award of 2014 International Symposium on VLSI Design, Automation and Test (VLSI-DAT’14) and PhD Dissertation Award of IEEE Taipei Section in 2014. Besides, he was also invited to publish a book chapter in “Routing Algorithm in Network-on-Chip,” which was published by Springer in November 2013. Dr. Chen serves as a referee for many IEEE journals and conferences, including TC, TPDS, ISCAS, ICASSP, and VLSI-DAT. Besides, he also serves on the program committee of Journal of Internet Service and Information Security. Dr. Chen is a member of IEEE and Chinese Institute of Electrical Engineering (CIEE).

Chih-Hao Chao received the B.S. degree in Electrical Engineering from National Taiwan University, Taipei, Taiwan, in 2004, and the M.S. and Ph.D. degrees from National Taiwan University in 2006 and 2012, respectively, both in Electronics Engineering. He is currently an Engineer at MediaTek Inc., Hsinchu, Taiwan.

An-Yeu (Andy) Wu (IEEE M’96-SM’12-F’15) received the B.S. degree from National Taiwan University in 1987, and the M.S. and Ph.D. degrees from the University of Maryland, College Park in 1992 and 1995, respectively, all in Electrical Engineering. In August 2000, he joined the faculty of the Department of Electrical Engineering and the Graduate Institute of Electronics Engineering, National Taiwan University (NTU), where he is currently a Professor. His research interests include low-power/high-performance VLSI architectures for DSP and communication applications, adaptive/multirate signal processing, reconfigurable broadband access systems and architectures, and System-on-Chip (SoC)/Network-on-Chip (NoC) platform for software/hardware co-design.

Dr. Wu has served as an Associate Editor of leading IEEE Transactions in the circuits and systems area and signal processing area, such as IEEE Transactions on Very Large Scale Integration (VLSI) Systems, IEEE Transactions on Circuits and Systems I: Regular Papers, IEEE Transactions on Circuits and Systems II: Express Briefs, and IEEE Transactions on Signal Processing. Dr. Wu is now serving as an Associate Editor for Journal of Signal Processing Systems (JSPS), and acted as the Lead Guest Editor of the Special Issue of “2010 IEEE Workshop on Signal Processing Systems (SiPS)” in JSPS, which was published in Nov. 2011. He also served on the technical program committees of many major IEEE International Conferences, such as SiPS, AP-ASIC, ISCAS, ISPACS, ICME, SOCC, and A-SSCC. He is now serving as the Chair of VLSI Systems and Architectures (VSA) Technical Committee in IEEE Circuits and Systems (CAS) Society.

From August 2007 to Dec. 2009, he was on leave from NTU and served as the Deputy General Director of SoC Technology Center (STC), Industrial Technology Research Institute (ITRI), Hsinchu, TAIWAN, supervising Parallel Core Architecture (PAC) VLIW DSP Processor and Multicore/Android SoC platform projects. In 2010, Dr. Wu received “Outstanding EE Professor Award” from The Chinese Institute of Electrical Engineering (CIEE), Taiwan. Dr. Wu was elevated to IEEE Fellow in 2015.

References

[1] T. Mak, R. Al-Dujaily, K. Zhou, K.-P. Lam, Y. Meng, A. Yakovlev, and C.-S. Poon, “Dynamic programming networks for large-scale 3D chip integration,” IEEE Circuits Syst. Mag., vol. 11, no. 3, pp. 51–62, Aug. 2011.
[2] Y. Jin, E. J. Kim, and T. M. Pinkston, “Communication-aware globally-coordinated on-chip networks,” IEEE Trans. Parallel Distrib. Syst., vol. 23, no. 2, pp. 242–254, Feb. 2012.
[3] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar, “A 5-GHz mesh interconnect for a teraflops processor,” IEEE Micro, vol. 27, pp. 51–61, Nov. 2007.
[4] J. Howard, S. Dighe, S. R. Vangal, G. Ruhl, N. Borkar, S. Jain, V. Erraguntla, M. Konow, M. Riepen, M. Gries, G. Droege, T. Lund-Larsen, S. Steibl, S. Borkar, V. K. De, and R. Van Der Wijngaart, “A 48-core IA-32 processor in 45 nm CMOS using on-die message-passing and DVFS for performance and power scaling,” IEEE J. Solid-State Circuits (JSSC), vol. 46, no. 1, pp. 173–183, Jan. 2011.
[5] J.-J. Lecler and G. Baillieu, “Application driven network-on-chip architecture exploration & refinement for a complex SoC,” Des. Automat. Embedded Syst. (DAES), vol. 15, no. 2, pp. 133–158, June 2011.
[6] B. Black, M. Annavaram, N. Brekelbaum, J. DeVale, L. Jiang, G. H. Loh, D. McCauley, P. Morrow, D. W. Nelson, D. Pantuso, P. Reed, J. Rupley, S. Shankar, J. Shen, and C. Webb, “Die stacking (3D) microarchitecture,” in Proc. IEEE/ACM Int. Symp. Microarchitecture (Micro), Dec. 2006, pp. 469–479.
[7] B. S. Feero and P. O. Pande, “Networks-on-chip in a three dimensional environment: a performance evaluation,” IEEE Trans. Comput., vol. 58, no. 1, pp. 32–45, Jan. 2009.
[8] K. Kang, J. Kim, S. Yoo, and C.-M. Kyung, “Runtime power management of 3-D multi-core architectures under peak power and temperature constraints,” IEEE Trans. Comput.-Aid. Des. Integr. Circuits Syst., vol. 30, no. 6, pp. 905–918, June 2011.
[9] C.-H. Chao, K.-Y. Jheng, H.-Y. Wang, J.-C. Wu, and A.-Y. Wu, “Traffic- and thermal-aware run-time thermal management scheme for 3D NoC systems,” in Proc. ACM/IEEE Int. Symp. Network-on-Chip (NOCS), May 2010, pp. 223–230.
[10] I. Yeo, C. C. Liu, and E. J. Kim, “Predictive dynamic thermal management for multicore systems,” in Proc. ACM/IEEE Design Automation Conf. (DAC), June 2008, pp. 734–739.
[11] J.-M. Koo, S. Im, L. Jiang, and K. E. Goodson, “Integrated microchannel cooling for three-dimensional electronic circuit architectures,” ASME J. Heat Transfer, vol. 127, no. 1, pp. 49–58, Feb. 2005.
[12] S. G. Singh and C. S. Tan, “Thermal mitigation using thermal through silicon via (TTSV) in 3-D ICs,” in Proc. Int. Microsystems, Packaging, Assembly and Circuits Technology Conf. (IMPACT), Oct. 2009, pp. 182–185.
[13] C.-H. Chao, K.-C. Chen, T.-C. Yin, S.-Y. Lin, and A.-Y. Wu, “Transport layer assisted routing for runtime thermal management of 3D NoC systems,” ACM Trans. Embedded Comput. Syst. (TECS), vol. 13, no. 1, Article 11, Aug. 2013.
[14] Mobile Intel Pentium 4 processor-M datasheet. [Online]. Available: http://www.intel.com
[15] L. Shang, L.-S. Peh, A. Kummar, and N. K. Jha, “Thermal modeling, characterization and management of on-chip networks,” IEEE Micro, pp. 67–68, Dec. 2004.
[16] F. Liu, H. Gu, and Y. Yang, “DTBR: A dynamic thermal balance routing algorithm for Network-on-Chip,” J. Comput. Electr. Eng., vol. 38, no. 2, pp. 270–281, Mar. 2012.
[17] N. Hassanpour, S. Hessabi, and P. K. Hamedani, “Temperature control in three-network on chips using task migration,” IET Comput. Digit. Techn., vol. 7, no. 6, pp. 274–281, June 2013.
[18] J. Von Neumann and O. Morgenstern, Theory of Games and Economic Behavior. Princeton Univ. Press, 2007.
[19] C.-H. Chao, K.-C. Chen, and A.-Y. Wu, “Routing-based traffic migration and buffer allocation schemes for three-dimensional network-on-chip systems with thermal limit,” IEEE Trans. Very Large Scale Integr. Syst., vol. 21, no. 11, pp. 2188–2131, Nov. 2013.
[20] K.-C. Chen, E.-J. Chang, H.-T. Li, and A.-Y. Wu, “RC-based temperature prediction scheme for proactive dynamic thermal management in throttled-based 3D NoCs,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 1, pp. 206–218, Jan. 2015.
[21] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R. Stan, “HotSpot: A compact thermal modeling methodology for early-stage VLSI design,” IEEE Trans. Very Large Scale Integr. Syst., vol. 14, no. 5, pp. 501–513, May 2006.
[22] M. K.-F. Schafer, T. Hollstein, H. Zimmer, and M. Glesner, “Deadlock-free routing and component placement for irregular mesh-based network-on-chip,” in Proc. IEEE/ACM Int. Conf. Computer-Aided Design, Nov. 2005, pp. 238–245.
[23] R. Holsmark and S. Kummar, “Design issues and performance evaluation of mesh NoC with regions,” in Proc. Norchip Conf., Nov. 2005, pp. 40–43.
[24] S.-Y. Lin, T.-C. Yin, H.-Y. Wang, and A.-Y. Wu, “Traffic- and thermal-aware routing for throttling three-dimensional network-on-chip system,” in Proc. Int. Symp. VLSI Design, Automation, and Test (VLSI-DAT), Apr. 2011, pp. 135–138.
[25] K.-C. Chen, S.-Y. Lin, H.-S. Hung, and A.-Y. Wu, “Topology-aware adaptive routing for non-stationary irregular mesh in throttled 3D NoC systems,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 10, pp. 2109–2120, Oct. 2013.
[26] X. Wang, K. Ma, and Y. Wang, “Adaptive power control with online model estimation for chip multiprocessors,” IEEE Trans. Parallel Distrib. Syst., vol. 22, no. 10, pp. 1681–1696, Oct. 2011.
[27] A. Bartolini, M. Cacciari, A. Tilli, and L. Benini, “Thermal and energy management of high-performance multicores: distributed and self-calibrating model-predictive controller,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 1, pp. 170–183, Apr. 2012.
[28] Y. W. Li, H. Lakdawaka, A. Raychowdhury, G. Taylor, and K. Soumyanath, “A 1.05 V 1.6 mW 0.45 °C 3σ-resolution ΣΔ-based temperature sensor with parasitic-resistance compensation in 32nm CMOS,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC), Feb. 2009, pp. 340–341.
[29] S. Wang and R. Bettati, “Reactive speed control in temperature-constrained real-time systems,” Real-Time Syst., vol. 39, no. 1–3, pp. 73–95, Dec. 2007.