![Page 1: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/1.jpg)
A Heterogeneous Multiple Network-On-Chip Design:
An Application-Aware Approach
Asit K. Mishra, Chita R. Das, Onur Mutlu
![Page 2: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/2.jpg)
2
Executive summary
• Problem: Current-day NoC designs are agnostic to application requirements and are provisioned for the general case or the worst case, yet applications have widely differing demands on the network
• Our goal: To design a NoC that can satisfy the diverse, dynamic performance requirements of applications
• Observation: Applications can be divided into two general classes by their requirements from the network: bandwidth-sensitive and latency-sensitive
- Not all applications are equally sensitive to bandwidth and latency
• Key idea: Design two NoCs
- Customize each sub-network for either BW-sensitive or LAT-sensitive applications
- Propose metrics to classify applications as BW- or LAT-sensitive
- Prioritize applications' packets within each sub-network based on their sensitivity
• Network design: The BW-optimized network has wider links but operates at a lower frequency; the LAT-optimized network has narrower links but operates at a higher frequency
• Results: Our proposal is significantly better than an iso-resource monolithic network (5%/3% improvement in weighted speedup/instruction throughput and 31% energy reduction)
![Page 3: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/3.jpg)
3
Resource requirements of various applications - I
Impact of channel bandwidth on application performance
• Channel bandwidth affects network latency, throughput, and energy/power
• Increasing channel BW leads to:
- Reduced packet serialization
- Increased router power
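The serialization effect can be made concrete with a little arithmetic: a packet of P bits crossing a W-bit link is split into ceil(P/W) flits. The sketch below assumes a hypothetical 576-bit data packet (a 64-byte cache line plus an assumed 64-bit header); the header size is an illustrative assumption, not a parameter from these slides.

```python
import math

def flits_per_packet(packet_bits: int, link_width_bits: int) -> int:
    """Number of flits needed to carry one packet over a link of the given width."""
    return math.ceil(packet_bits / link_width_bits)

# Hypothetical data packet: 64-byte cache line (512 bits) + assumed 64-bit header.
PACKET_BITS = 512 + 64

for width in (64, 128, 256, 512):
    flits = flits_per_packet(PACKET_BITS, width)
    # With one flit per link cycle, the tail arrives (flits - 1) cycles after the head.
    print(f"{width:>3}b link: {flits} flits, serialization delay {flits - 1} cycles")
```

For this packet size, going from 64b to 512b links cuts the packet from 9 flits to 2, but the step from 256b to 512b removes only one flit, so the returns diminish.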
![Page 4: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/4.jpg)
4
Resource requirements of various applications - I
Impact of channel bandwidth on application performance
Simulation settings:
• 8x8 multi-hop, packet-based mesh network
• Each node has an out-of-order (OoO) processor (2 GHz), a private L1 cache, and a router (2 GHz)
• Shared L2 cache: 1MB per core
• 6 virtual channels (VCs) per physical channel (PC); 2-stage router
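For a sense of scale, the average hop count in the 8x8 mesh can be computed by enumerating all source/destination pairs. This sketch assumes uniform-random traffic and minimal (e.g., XY) routing, neither of which is stated on the slides, and counts source == destination pairs as zero hops.

```python
from fractions import Fraction
from itertools import product

def average_mesh_hops(k: int) -> Fraction:
    """Average Manhattan distance between two uniformly random nodes of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(sx - dx) + abs(sy - dy)
                for (sx, sy), (dx, dy) in product(nodes, repeat=2))
    return Fraction(total, len(nodes) ** 2)

hops = average_mesh_hops(8)
print(hops, float(hops))  # 21/4 5.25
```

Under these assumptions a packet crosses about 5.25 links (and one more router than links), so every extra router pipeline stage is paid several times per packet.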
![Page 5: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/5.jpg)
5
Resource requirements of various applications - I
Impact of channel bandwidth on application performance
[Figure: Instruction throughput (IT) of 30 applications (applu, wrf, art, deal, sjeng, barnes, grmcs, namd, h264, gcc, pvray, tonto, libq, gobmk, astar, milc, hmmer, swim, sjbb, sap, xalan, sphnx, bzip, lbm, sjas, soplx, cacts, omnet, gems, mcf) with 64b, 128b, 256b, and 512b links, normalized to 64b links]
![Page 6: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/6.jpg)
6
1. 18/30 (21/36 total) applications' performance is largely insensitive to channel BW (an 8x BW increase yields less than a 2x performance increase)
![Page 7: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/7.jpg)
7
2. 12/30 (15/36 total) applications' performance scales with channel BW (an 8x BW increase yields at least a 2x performance increase)
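The bandwidth criterion above can be written as a simple rule: compare an application's IT with 512b links against 64b links (an 8x bandwidth increase) and call it bandwidth-sensitive if IT at least doubles. The application names and IT values below are hypothetical placeholders, not data read from the figure.

```python
def bw_class(it_64b: float, it_512b: float) -> str:
    """Classify by performance scaling from 64b to 512b links (8x channel BW)."""
    speedup = it_512b / it_64b
    return "BW-sensitive" if speedup >= 2.0 else "BW-insensitive"

# Hypothetical instruction-throughput measurements (not from the slides).
measurements = {"app_a": (1.00, 1.40), "app_b": (1.00, 3.10)}
for app, (it64, it512) in measurements.items():
    print(app, bw_class(it64, it512))
```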
![Page 8: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/8.jpg)
8
Resource requirements of various applications - II
Impact of network latency on application performance
• Reducing router latency (by increasing frequency) leads to:
- Reduced packet latency
- Increased router power consumption
![Page 9: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/9.jpg)
9
Resource requirements of various applications - II
Simulation settings:
• … same as the last experiment
• 128b links
• Added dummy stages (2 cycles and 4 cycles) to each 2-stage router, giving 4-cycle and 6-cycle routers
Impact of network latency on application performance
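A first-order zero-load model shows how the added dummy stages translate into packet latency. The sketch assumes the 5.25-hop average of an 8x8 mesh under uniform-random traffic, one cycle per link traversal, single-flit packets, and no contention; these are simplifying assumptions, not the simulator's timing model.

```python
from fractions import Fraction

def zero_load_latency(hops: Fraction, router_cycles: int, link_cycles: int = 1) -> Fraction:
    """Head-flit latency: (hops + 1) routers traversed plus `hops` link traversals."""
    return (hops + 1) * router_cycles + hops * link_cycles

AVG_HOPS = Fraction(21, 4)  # 5.25 hops: assumed uniform-random traffic on an 8x8 mesh
for router_cycles in (2, 4, 6):
    lat = zero_load_latency(AVG_HOPS, router_cycles)
    print(f"{router_cycles}-cycle router: {float(lat):.2f} cycles")
```

This prints 17.75, 30.25, and 42.75 cycles: under these assumptions, tripling router delay raises zero-load latency by roughly 2.4x, because link traversal time is unchanged.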
![Page 10: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/10.jpg)
10
Resource requirements of various applications - II
[Figure: Instruction throughput (IT) of the same 30 applications with 2-cycle, 4-cycle, and 6-cycle routers, normalized to the 2-cycle router]
Impact of network latency on application performance
![Page 11: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/11.jpg)
11
1. 18/30 (21/36 total) applications' performance is sensitive to network latency (a 3x reduction in router latency yields at least a 25% performance improvement)
![Page 12: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/12.jpg)
12
2. 12/30 (15/36 total) applications' performance is only marginally sensitive to network latency (a 3x reduction in router latency yields less than a 15% performance improvement)
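Likewise, the latency criteria compare IT with 2-cycle routers against 6-cycle routers (a 3x difference in router latency). The slides give two thresholds, at least 25% (sensitive) and less than 15% (marginally sensitive); how gains between 15% and 25% are treated is not stated, so this sketch labels them "intermediate" as an assumption. The IT values are hypothetical.

```python
def lat_class(it_2cycle: float, it_6cycle: float) -> str:
    """Classify by the IT gained when router latency drops from 6 to 2 cycles (3x)."""
    gain = it_2cycle / it_6cycle - 1.0
    if gain >= 0.25:
        return "LAT-sensitive"
    if gain < 0.15:
        return "marginally LAT-sensitive"
    return "intermediate"  # assumption: the slides do not classify gains of 15-25%

# Hypothetical normalized IT values (2-cycle router = 1.0), not read from the figure.
for app, it6 in {"app_a": 0.70, "app_b": 0.95, "app_c": 0.85}.items():
    print(app, lat_class(1.0, it6))
```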
![Page 13: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/13.jpg)
13
[Figure: IT normalized to the 2-cycle router (2-, 4-, and 6-cycle routers), per application]
Application-aware approach to designing multiple NoCs
[Figure: IT normalized to 64b links (64b, 128b, 256b, and 512b links), per application]
![Page 14: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/14.jpg)
14
Application-aware approach to designing multiple NoCs
Based on the observations:
1. Applications can be classified into distinct classes, typically LAT-sensitive or BW-sensitive
2. LAT-sensitive applications can benefit from low network latency
3. BW-sensitive applications can benefit from high network bandwidth
4. Not all applications are equally sensitive to either LAT or BW
5. A monolithic network cannot optimize for both classes simultaneously
![Page 15: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/15.jpg)
15
Application-aware approach to designing multiple NoCs
Solution
Two NoCs, where each sub-network is optimized for either LAT-sensitive or BW-sensitive applications
![Page 16: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/16.jpg)
16
Design methodology
[Figure: Processors/L1$, Network, L2$ and Mem. Controllers]
Logical view of a multicore processor
![Page 17: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/17.jpg)
17
1. Identify LAT/BW-sensitive applications
- Propose a novel dynamic application classification scheme
![Page 18: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/18.jpg)
18
2. Design sub-networks based on applications' demands
- This network architecture is better than a monolithic iso-resource network
![Page 19: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/19.jpg)
19
[Figure label: DEMUX]
![Page 20: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/20.jpg)
20
Design: Dynamic classification of applications
[Figure: Application life cycle: outstanding network packets vs. time, alternating between network episodes and compute episodes]
![Page 21: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/21.jpg)
21
Network episode:
• The application has at least one outstanding packet
• The processor is likely stalling → low IPC
![Page 22: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/22.jpg)
22
Compute episode:
• The application has no outstanding packets
• High IPC
![Page 23: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/23.jpg)
23
[Figure: A network episode, with episode length along the time axis and episode height along the outstanding-network-packets axis]
Episode length = number of consecutive cycles during which the application has outstanding network packets
Episode height = average number of L1 packets injected during an episode
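Both definitions can be computed from a per-cycle trace. The sketch below is a hypothetical software model (a real design would presumably use per-core hardware counters): for each cycle it takes the number of outstanding network packets and the number injected, splits the trace into network episodes (at least one outstanding packet) and compute episodes (none), and averages length and height over the network episodes. The trace format and values are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class Episode:
    start: int     # first cycle of the network episode
    length: int    # consecutive cycles with at least one outstanding packet
    injected: int  # packets injected during the episode

def network_episodes(outstanding: Sequence[int], injected: Sequence[int]) -> List[Episode]:
    """Split a per-cycle trace into network episodes.

    outstanding[c]: packets in flight at the end of cycle c (including any injected in c)
    injected[c]:    packets injected in cycle c
    """
    episodes: List[Episode] = []
    current: Optional[Episode] = None
    for cycle, (out, inj) in enumerate(zip(outstanding, injected)):
        if out > 0:
            if current is None:
                current = Episode(start=cycle, length=0, injected=0)
            current.length += 1
            current.injected += inj
        elif current is not None:
            # No outstanding packets: the network episode ends, a compute episode begins.
            episodes.append(current)
            current = None
    if current is not None:
        episodes.append(current)
    return episodes

def avg_length_and_height(episodes: Sequence[Episode]) -> Tuple[float, float]:
    """Average episode length (cycles) and height (packets injected per episode)."""
    if not episodes:
        return 0.0, 0.0
    n = len(episodes)
    return sum(e.length for e in episodes) / n, sum(e.injected for e in episodes) / n

outstanding = [0, 1, 2, 2, 1, 0, 0, 2, 3, 0]
injected    = [0, 1, 1, 0, 0, 0, 0, 2, 1, 0]
print(avg_length_and_height(network_episodes(outstanding, injected)))  # (3.0, 2.5)
```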
![Page 24: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/24.jpg)
24
• Short episode height: low memory-level parallelism (MLP); each request is critical (LAT-sensitive)
• Tall episode height: high MLP (BW-sensitive)
• Short episode length: packets are very critical (LAT-sensitive)
• Long episode length: latency-tolerant (could be de-prioritized)
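These qualitative rules can be sketched as a function of average episode height and length. The cut-off values in HEIGHT_CUTS and LENGTH_CUTS are hypothetical parameters, since no numeric thresholds are given here, and medium heights/lengths, which the rules do not cover, are reported as "borderline"/"intermediate" by assumption.

```python
from typing import Tuple

# Hypothetical cut-offs; the slides give no numeric thresholds for this rule.
HEIGHT_CUTS = (2.0, 6.0)       # avg. packets per episode: < 2 short, >= 6 tall
LENGTH_CUTS = (200.0, 1000.0)  # avg. cycles per episode: < 200 short, >= 1000 long

def bucket(value: float, cuts: Tuple[float, float], labels: Tuple[str, str, str]) -> str:
    """Map a value to the low / middle / high label using two cut-offs."""
    low, high = cuts
    if value < low:
        return labels[0]
    if value >= high:
        return labels[2]
    return labels[1]

def classify(avg_height: float, avg_length: float) -> Tuple[str, str, str, str]:
    height = bucket(avg_height, HEIGHT_CUTS, ("short", "medium", "tall"))
    length = bucket(avg_length, LENGTH_CUTS, ("short", "medium", "long"))
    # Short height -> low MLP -> LAT-sensitive; tall height -> high MLP -> BW-sensitive.
    sensitivity = {"short": "LAT", "medium": "borderline", "tall": "BW"}[height]
    # Short length -> packets are critical; long length -> latency-tolerant.
    criticality = {"short": "critical", "medium": "intermediate", "long": "tolerant"}[length]
    return height, length, sensitivity, criticality

print(classify(1.0, 100.0))   # ('short', 'short', 'LAT', 'critical')
print(classify(8.0, 5000.0))  # ('tall', 'long', 'BW', 'tolerant')
```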
![Page 25: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/25.jpg)
25
Classification and ranking
| Classification | Length: Long | Length: Medium | Length: Short |
|---|---|---|---|
| Height: Tall | gems, mcf | sphinx, lbm, cactus, xalan | sjeng, tonto |
| Height: Medium | omnetpp, apsi, ocean | sjbb, sap, bzip, sjas, soplex, tpc | applu, perl, barnes, gromacs, namd, calculix, gcc, povray, h264, gobmk, hmmer, astar |
| Height: Short | leslie | art, libq, milc, swim | wrf, deal |

Classification: LAT/BW
![Page 26: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/26.jpg)
26
Classification and ranking
| Ranking | Length: Long | Length: Medium | Length: Short |
|---|---|---|---|
| Height: High | Rank-4 | Rank-2 | Rank-1 |
| Height: Medium | Rank-3 | Rank-2 | Rank-2 |
| Height: Short | Rank-4 | Rank-3 | Rank-1 |

Ranking: Sensitivity to LAT/BW
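The ranking table can be used directly as a lookup in router arbitration. This sketch assumes that Rank-1 denotes the highest priority (the slide does not state the direction), writes the table's "High" height row as "tall", and breaks ties by packet age; the tie-break and the packet fields are assumptions for illustration.

```python
from typing import List, NamedTuple

# Ranking table transcribed from the slide: RANK[height][length].
# The slide's "High" height row is written as "tall" here.
RANK = {
    "tall":   {"long": 4, "medium": 2, "short": 1},
    "medium": {"long": 3, "medium": 2, "short": 2},
    "short":  {"long": 4, "medium": 3, "short": 1},
}

class Packet(NamedTuple):
    app: str
    height: str  # episode-height class of the source application
    length: str  # episode-length class of the source application
    age: int     # cycles the packet has waited (larger = older)

def rank_of(pkt: Packet) -> int:
    return RANK[pkt.height][pkt.length]

def arbitrate(candidates: List[Packet]) -> Packet:
    """Assumed policy: lowest rank number wins; ties go to the oldest packet."""
    return min(candidates, key=lambda p: (rank_of(p), -p.age))

waiting = [
    Packet("app_a", "tall", "long", age=10),     # Rank-4
    Packet("app_b", "short", "short", age=1),    # Rank-1
    Packet("app_c", "medium", "short", age=50),  # Rank-2
]
print(arbitrate(waiting).app)  # app_b
```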
![Page 27: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/27.jpg)
27
Network design
1N-128
![Page 28: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/28.jpg)
28
Network design
1N-128 2N-64x256-ST(Steering)
![Page 29: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/29.jpg)
29
Network design
1N-128 2N-64x256-ST(Steering)
2N-64x256-ST-RK(Steering+Ranking)
![Page 30: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/30.jpg)
30
Network design
1N-128 2N-64x256-ST(Steering)
2N-64x256-ST-RK(Steering+Ranking)
2N-64x256-ST-RK(FS)(Steering+Ranking and
Frequency Scaling)
![Page 31: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/31.jpg)
31
Network design
1N-128
1N-256
2N-64x256-ST (Steering)
2N-64x256-ST-RK (Steering + Ranking)
2N-64x256-ST-RK(FS) (Steering + Ranking and Frequency Scaling)
1N-512 (High BW)
2N-128x128
1N-320 (iso-BW)
1N-320(FS) (iso-resource)
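The configuration names encode link widths: 1N-W is one network with W-bit links, and 2N-AxB is two sub-networks with A-bit and B-bit links. Parsing the names makes the iso-BW comparison explicit: the 64b and 256b sub-networks add up to the same total link width as 1N-320. The parser is a hypothetical helper; suffixes such as -ST, -RK, and (FS) are ignored because operating frequencies are not given here.

```python
import re
from typing import List

def link_widths(config: str) -> List[int]:
    """Hypothetical parser: link width (bits) of each sub-network in a name like '2N-64x256-ST'."""
    m = re.match(r"(\d+)N-(\d+(?:[xX]\d+)*)", config)
    if m is None:
        raise ValueError(f"unrecognized configuration name: {config}")
    widths = [int(w) for w in re.split(r"[xX]", m.group(2))]
    if len(widths) != int(m.group(1)):
        raise ValueError(f"network count does not match widths in: {config}")
    return widths

for name in ["1N-128", "1N-256", "2N-128x128", "1N-512",
             "2N-64x256-ST-RK(FS)", "1N-320(FS)"]:
    widths = link_widths(name)
    print(f"{name:<20} sub-networks={widths} total width={sum(widths)}b")
```

The same arithmetic shows that 2N-128x128 matches 1N-256 in total link width.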
![Page 32: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/32.jpg)
32
Analysis
[Figure: Weighted speedup of 1N-128, 1N-256, 2N-128x128, 1N-512, 2N-64x256-ST, 2N-64x256-ST+RK(no FS), 2N-64x256-ST+RK(FS), 1N-320(no FS), and 1N-320(FS)]
Performance (25 workloads, each with 50% BW-sensitive and 50% LAT-sensitive applications)
![Page 33: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/33.jpg)
33
![Page 34: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/34.jpg)
34
![Page 35: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/35.jpg)
35
![Page 36: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/36.jpg)
36
![Page 37: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/37.jpg)
37
![Page 38: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/38.jpg)
38
![Page 39: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/39.jpg)
39
![Page 40: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/40.jpg)
40
[Figure: Weighted speedup of 1N-128, 1N-256, 2N-128x128, 1N-512, 2N-64x256-ST, 2N-64x256-ST+RK(no FS), 2N-64x256-ST+RK(FS), 1N-320(no FS), and 1N-320(FS); chart annotations: +18%, +7%, +5%, 5%, w. 2%, w. 2%]
Analysis
Performance (25 workloads, each with 50% BW-sensitive and 50% LAT-sensitive applications)
[Figure: Normalized energy of the same nine designs; chart annotations: -47%, -59%]
Energy (25 workloads, each with 50% BW-sensitive and 50% LAT-sensitive applications)
![Page 41: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/41.jpg)
41
Best EDP across all designs
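Energy-delay product (EDP) combines the two charts: EDP = energy × delay. Taking execution time as inversely proportional to weighted speedup (a simplifying assumption), normalized EDP follows from normalized energy and speedup. The design names and numbers below are hypothetical placeholders, not values read from the charts.

```python
def normalized_edp(norm_energy: float, weighted_speedup: float, base_speedup: float) -> float:
    """EDP relative to a baseline: energy ratio times delay ratio (delay ~ 1 / speedup)."""
    delay_ratio = base_speedup / weighted_speedup
    return norm_energy * delay_ratio

# Hypothetical (normalized energy, weighted speedup) pairs, not data from the slides.
designs = {"baseline": (1.00, 40.0), "design_x": (0.70, 42.0), "design_y": (1.20, 44.0)}
base = designs["baseline"][1]
edp = {name: normalized_edp(e, ws, base) for name, (e, ws) in designs.items()}
best = min(edp, key=edp.get)
print(best, round(edp[best], 3))  # design_x 0.667
```

Because both factors multiply, a design that cuts energy substantially can win on EDP even with only a modest speedup.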
![Page 42: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/42.jpg)
42
Conclusions
• Problem: Current-day NoC designs are agnostic to application requirements and are provisioned for the general case or the worst case, yet applications have widely differing demands on the network
• Our goal: To design a NoC that can satisfy the diverse, dynamic performance requirements of applications
• Observation: Applications can be divided into two general classes by their requirements from the network: bandwidth-sensitive and latency-sensitive
- Not all applications are equally sensitive to bandwidth and latency
• Key idea: Design two NoCs
- Customize each sub-network for either BW-sensitive or LAT-sensitive applications
- Propose metrics to classify applications as BW- or LAT-sensitive
- Prioritize applications' packets within each sub-network based on their sensitivity
• Network design: The BW-optimized network has wider links but operates at a lower frequency; the LAT-optimized network has narrower links but operates at a higher frequency
• Results: Our proposal is significantly better than an iso-resource monolithic network (5%/3% improvement in weighted speedup/instruction throughput and 31% energy reduction)
![Page 44: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/44.jpg)
44
Backup Slides . . .
![Page 45: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/45.jpg)
45
Other metrics considered for application classification
[Figure: L1MPKI, L2MPKI, and slack; axes: L1/L2 MPKI (0 to 120) and slack in cycles (0 to 2000)]
![Page 46: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/46.jpg)
46
Analysis of network episode length and height
[Figure: Average episode height (network packets) per application]
[Figure: Average episode length (in cycles) per application]
Legend: Short length/height, Medium length/height, Long length/High height
Chart annotations: 0.3M, 10K, 0.4M, 18K
![Page 47: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/47.jpg)
47
Analysis of network episode length and height
Based on performance scaling sensitivity to bandwidth and frequency
![Page 48: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/48.jpg)
48
Empirical results to support the classification
SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications
![Page 49: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/49.jpg)
49
Empirical results to support the classification
SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications
![Page 50: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/50.jpg)
50
Empirical results to support the classification
SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications
![Page 51: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/51.jpg)
51
Empirical results to support the classification
SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications
![Page 52: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/52.jpg)
52
Empirical results to support the classification
SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications
Why 9 clusters?
[Figure: Within-group sum of squares vs. number of clusters]
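An elbow plot like this one reports, for each candidate number of clusters, the within-group sum of squares (WSS): the total squared distance from each point to its cluster's centroid. The sketch computes WSS for a given assignment of points (for example, applications described by average episode length and height); the clustering algorithm that produces the assignment and the sample points are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

def within_group_ss(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Sum over clusters of squared Euclidean distances to the cluster centroid."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members)
    return total

pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
print(within_group_ss(pts, [0, 0, 0, 0]))  # 204.0 (one cluster)
print(within_group_ss(pts, [0, 0, 1, 1]))  # 4.0 (two clusters)
```

Adding clusters never increases WSS; the elbow is where further clusters stop buying a large reduction.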
![Page 53: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/53.jpg)
53
Empirical results to support the classification
SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications
Why 9 clusters?
[Figure: Within-group sum of squares vs. number of clusters, annotated 13x]
![Page 54: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/54.jpg)
54
Empirical results to support the classification
SPECjbb (sjbb) as cut-off for BW/LAT sensitive applications
Why 9 clusters?
![Page 55: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/55.jpg)
55
Analysis with varying workload combinations
[Figure: Weighted speedup (WS) and instruction throughput (IT), normalized to the 1N-128 network, for workload mixes of 0%, 25%, 50%, 75%, and 100% BW-sensitive applications (the remainder LAT-sensitive)]
![Page 56: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/56.jpg)
56
Comparison to prior works
[Figure: WS and IT, normalized to the 1N-128 network, for 1N-128-STC, 1N-128-ST+RK, 2N-128x128-LD-BAL, 2N-64x256-W-LD-BAL, 2N-64x256-ST+RK, ...]
![Page 57: A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. MishraChita R. DasOnur Mutlu.](https://reader036.fdocuments.in/reader036/viewer/2022062519/5697c0041a28abf838cc41ae/html5/thumbnails/57.jpg)
57
Dynamic steering of packets
[Figure: percentage of each application's packets in the latency-optimized sub-network (y-axis 0% to 100%) for applu, art, barnes, namd, gcc, libq, astar, hmmer, leslie, calculix, sjeng, sap, sphnx, lbm, soplx, omnet, mcf, ocean]
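The figure reports, per application, the percentage of packets carried by the latency-optimized sub-network. A percentage of this kind can be computed if each application is re-classified every sampling interval and that interval's packets are sent to the network chosen for it. The sketch assumes exactly that scheme, uses average episode height as the only signal, and treats a height at or below the cutoff as latency-sensitive; the interval scheme, the signal, the rule's direction, and the cutoff are all hypothetical.

```python
from typing import List, Tuple


def percent_packets_in_lat(intervals: List[Tuple[float, int]],
                           height_cutoff: float) -> float:
    """Share of an application's packets sent to the latency-optimized network.

    intervals: (avg_episode_height, packets_injected) for each sampling interval.
    An interval whose average episode height is at or below height_cutoff is
    treated as latency-sensitive, so its packets use the latency-optimized network.
    """
    total = sum(packets for _, packets in intervals)
    if total == 0:
        return 0.0
    lat = sum(packets for height, packets in intervals if height <= height_cutoff)
    return 100.0 * lat / total


if __name__ == "__main__":
    # Hypothetical trace: three intervals with different episode heights.
    trace = [(2.0, 100), (6.0, 300), (3.0, 100)]
    print(percent_packets_in_lat(trace, height_cutoff=4.0))
```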
62
Design: Putting it all together
[Figure: logical view of a multicore processor: processors/L1$, the network (with a MUX), and L2$ and memory controllers]
• Classify applications based on sensitivity to network BW/LAT
• Use network episode length/height (Episode LEN/HT) to dynamically identify apps
• Design LAT/BW optimized networks
• Prioritization within networks
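The steps on this slide can be sketched end to end: classify each application from its episode statistics, steer it to a sub-network, and order applications within each sub-network. The slides name episode length and height as the dynamic signals but give no thresholds or priority rule, so the `height_cutoff`, the rule's direction (a high episode height means bandwidth-sensitive), the shorter-episode-first ordering, and all names here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class SubNet(Enum):
    LAT = "latency-optimized"
    BW = "bandwidth-optimized"


@dataclass
class AppStats:
    name: str
    avg_episode_height: float  # avg. outstanding network requests per episode (hypothetical signal)
    avg_episode_length: float  # avg. episode duration in cycles (hypothetical signal)


def classify(app: AppStats, height_cutoff: float) -> SubNet:
    # Hypothetical rule: many overlapping requests means bandwidth-sensitive.
    return SubNet.BW if app.avg_episode_height > height_cutoff else SubNet.LAT


def steer_and_rank(apps: List[AppStats], height_cutoff: float) -> Dict[SubNet, List[str]]:
    """Assign each app to a sub-network, then order apps inside each one by priority."""
    groups: Dict[SubNet, List[AppStats]] = {SubNet.LAT: [], SubNet.BW: []}
    for app in apps:
        groups[classify(app, height_cutoff)].append(app)
    # Hypothetical priority: shorter episodes first within each sub-network.
    return {net: [a.name for a in sorted(members, key=lambda a: a.avg_episode_length)]
            for net, members in groups.items()}


if __name__ == "__main__":
    apps = [AppStats("a", 1.5, 200.0), AppStats("b", 6.0, 900.0),
            AppStats("c", 2.0, 80.0), AppStats("d", 8.0, 400.0)]
    for net, order in steer_and_rank(apps, height_cutoff=4.0).items():
        print(net.value, order)
```

Since sjbb serves as the BW/LAT cut-off in the classification study, one option in such a sketch would be to set `height_cutoff` to sjbb's measured value; that calibration is likewise an assumption.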
65
Summary
• A NoC paradigm based on a top-down approach (application demand/requirement analysis)
• An efficient design paradigm for future heterogeneous multicores
[Figure: heterogeneous multicore with big cores, small cores, GPGPUs, and accelerators/ASICs, whose demands are labeled latency critical, throughput (BW) critical, or latency critical (real-time constraints)]
Providing all these guarantees in one network is hard.
Multiple networks: each customized for one metric.
66
Summary
• A NoC paradigm based on a top-down approach (application demand/requirement analysis)
• An efficient design paradigm for future heterogeneous multicore machines
Requirement and candidate network design:
Latency: 1 cycle / bufferless / faster routers
Throughput: 1 cycle / high bandwidth
Local communication: hybrid / fewer-connectivity network
Long-haul communication: butterfly / express channels
Power: power-efficient links / DVFS routers
[Figure: per-requirement sub-networks joined by a MUX, sharing 2D space or 3D layers, with packets steered by Episode LEN/HT/??]