Data Criticality in Network-On-Chip Design...Data Criticality in Network-On-Chip Design Joshua San...
Transcript of Data Criticality in Network-On-Chip Design...Data Criticality in Network-On-Chip Design Joshua San...
Data Criticality in Network-On-Chip Design
Joshua San Miguel
Natalie Enright Jerger
Network-On-Chip Efficiency
2
Efficiency is the ability to produce results with the least amount of waste.
Wasted time NoC accounts for 30-70% of on-chip data access latency
[Z. Li, HPCA 2009][A. Sharifi, MICRO 2012]
Wasted energy NoC accounts for 20-30% of total chip power
[J. D. Owens, IEEE Micro 2007][S. R. Vangal, IEEE JSSC 2008]
Network-On-Chip Efficiency
3
Network-On-Chip Efficiency
4
load request
Network-On-Chip Efficiency
5
data response
load request
Network-On-Chip Efficiency
6
data response
load request
Minimize wasted time Deliver data no later than needed
Network-On-Chip Efficiency
7
Network-On-Chip Efficiency
8
load request 0x2
Network-On-Chip Efficiency
9
data response
0 1 2 3 4 5 6 7
load request 0x2
Network-On-Chip Efficiency
10
data response
0 1 2 3 4 5 6 7
load request 0x2
Minimize wasted energy Deliver data no earlier than needed
Network-On-Chip Efficiency
11
Why store data in blocks of multiple words?
Exploit spatial locality in applications
Avoid large tag arrays in caches
Improve row buffer utilization in DRAM
Network-On-Chip Efficiency
12
Why store data in blocks of multiple words?
Exploit spatial locality in applications
Avoid large tag arrays in caches
Improve row buffer utilization in DRAM
Store data at a coarse granularity, but
move data at a fine granularity.
Network-On-Chip Efficiency
13
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
14
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
15
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
16
01:00
03:00
02:00
04:00
Arrive no later than needed. But expensive. Wasted money since some arrive too early.
Network-On-Chip Efficiency
17
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
18
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
19
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
20
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
21
01:00
03:00
02:00
04:00
Network-On-Chip Efficiency
22
01:00
03:00
02:00
04:00
Spend just enough money to arrive both no later and no earlier than needed.
Network-On-Chip Efficiency
23
01:00
03:00
02:00
04:00
Spend just enough money to arrive both no later and no earlier than needed.
Deliver data both no later and no earlier than needed Design for data criticality
Outline
24
Defining Criticality Data Criticality
Data Liveness
Measuring Criticality Energy Wasted
Addressing Criticality NoCNoC
Data Criticality
25
Data Criticality is the promptness with which an application uses a data word after fetching it from memory.
Critical: used immediately after being fetched.
Non-critical: used some time later after being fetched.
Defining Criticality
Data Criticality – blackscholes
26
for (i++) {
... = BlkSchlsEqEuroNoDiv(sptprice[i]);
}
Defining Criticality
Data Criticality – blackscholes
27
for (i++) {
... = BlkSchlsEqEuroNoDiv(sptprice[i]);
}
Defining Criticality
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
sptprice
i = 0
Data Criticality – blackscholes
28
for (i++) {
... = BlkSchlsEqEuroNoDiv(sptprice[i]);
}
Defining Criticality
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
sptprice
i = 15
Data Criticality – blackscholes
29
for (i++) {
... = BlkSchlsEqEuroNoDiv(sptprice[i]);
}
Defining Criticality
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
sptprice
criticality
Data Criticality – fluidanimate
30
for (iparNeigh++) {
if (borderNeigh) {
pthread_mutex_lock();
neigh->a[iparNeigh] -= ...;
pthread_mutex_unlock();
} else {
neigh->a[iparNeigh] -= ...;
}
}
Defining Criticality
Data Criticality
31
Data criticality is an inherent consequence of spatial locality and is exhibited by most (if not all) real-world applications.
Examples of non-criticality: Long-running code between accesses
Interference due to thread synchronization
Dependences from other cache misses
Preemption by the operating system
Defining Criticality
Data Criticality vs. Instruction Criticality
32
Instruction (or packet) criticality
Defining Criticality
load miss B
load miss A
load miss C
load miss D
Data Criticality vs. Instruction Criticality
33
Instruction (or packet) criticality
Defining Criticality
load miss B
load miss A
load miss C
load miss D
Data Criticality vs. Instruction Criticality
34
Data criticality
Defining Criticality
load miss B
load miss A
load miss C
load miss D
Data Liveness
35
Data Liveness describes whether or not an application uses a data word at all after fetching it from memory.
Live-on-arrival (live): used at least once during its cache lifetime.
Dead-on-arrival (dead): never used during its cache lifetime.
Defining Criticality
Data Liveness – fluidanimate
36
for (iz++)
for (iy++)
for (ix++)
for (j++) {
if (border(ix)) ... = cell->v[j].x;
if (border(iy)) ... = cell->v[j].y;
if (border(iz)) ... = cell->v[j].z;
}
Defining Criticality
Data Liveness – fluidanimate
37
for (iz++)
for (iy++)
for (ix++)
for (j++) {
if (border(ix)) ... = cell->v[j].x;
if (border(iy)) ... = cell->v[j].y;
if (border(iz)) ... = cell->v[j].z;
}
Defining Criticality
x y z x y z x y z x y z x y z
cell->v
Data Liveness – fluidanimate
38
for (iz++)
for (iy++)
for (ix++)
for (j++) {
if (border(ix)) ... = cell->v[j].x;
if (border(iy)) ... = cell->v[j].y;
if (border(iz)) ... = cell->v[j].z;
}
Defining Criticality
x y z x y z x y z x y z x y z
cell->v
Data Liveness
39
Data liveness measures the degree of spatial locality in an application.
Examples of dead words: Unused members of structs
Irregular or random access patterns
Heap fragmentation
Padding between data elements
Early evictions due to invalidations, cache pressure or poor replacement policies
Defining Criticality
Outline
40
Defining Criticality Data Criticality
Data Liveness
Measuring Criticality Energy Wasted
Addressing Criticality NoCNoC
Measuring Criticality
41 Measuring Criticality
time
load miss A fetch A[i] use A[i]
fetch latency
Measuring Criticality
42 Measuring Criticality
time
load miss A fetch A[i] use A[i]
fetch latency
access latency
𝑛𝑜𝑛−𝑐𝑟𝑖𝑡𝑖𝑐𝑎𝑙𝑖𝑡𝑦 = 𝑎𝑐𝑐𝑒𝑠𝑠 𝑙𝑎𝑡𝑒𝑛𝑐𝑦
𝑓𝑒𝑡𝑐ℎ 𝑙𝑎𝑡𝑒𝑛𝑐𝑦
1x for critical words, >1x for non-critical words
Measuring Criticality
43 Measuring Criticality
time
load miss A fetch A[i] use A[i]
fetch latency
access latency
Measuring Criticality
44
Full-system simulations: FeS2, BookSim, DSENT
16 2.0 GHz OoO cores
64 kB private L1 per core, 16-word cache blocks
16 MB shared distributed L2
Baseline NoC configuration: 4 x 4 mesh, 2.0 GHz, 128-bit channels
X-Y routing, 3-stage router pipeline, 6 4-flit VCs per port
Applications: PARSEC and SPLASH-2
Measuring Criticality
Measuring Criticality
45
Very low criticality
Measuring Criticality
0%
20%
40%
60%
80%
100%
1x 2x 3x 4x 5x 6x 7x 8x 9x 10x
% a
cces
sed
wo
rds
(cu
mu
lati
ve)
access latency / fetch latency
blackscholes bodytrack fluidanimate streamcluster swaptions
Measuring Criticality
46
Low criticality
Measuring Criticality
0%
20%
40%
60%
80%
100%
1x 2x 3x 4x 5x 6x 7x 8x 9x 10x
% a
cces
sed
wo
rds
(cu
mu
lati
ve)
access latency / fetch latency
barnes lu_cb water_nsquared water_spatial
Measuring Criticality
47
High criticality
Measuring Criticality
0%
20%
40%
60%
80%
100%
1x 2x 3x 4x 5x 6x 7x 8x 9x 10x
% a
cces
sed
wo
rds
(cu
mu
lati
ve)
access latency / fetch latency
fft vips volrend x264
Measuring Criticality
48
Very high criticality
Measuring Criticality
0%
20%
40%
60%
80%
100%
1x 2x 3x 4x 5x 6x 7x 8x 9x 10x
% a
cces
sed
wo
rds
(cu
mu
lati
ve)
access latency / fetch latency
canneal cholesky radiosity radix
Measuring Criticality – Energy Wasted
49
Estimate energy wasted due to non-criticality
Model an ideal NoC where for each word: 𝑓𝑒𝑡𝑐ℎ 𝑙𝑎𝑡𝑒𝑛𝑐𝑦 ≈ 𝑎𝑐𝑐𝑒𝑠𝑠 𝑙𝑎𝑡𝑒𝑛𝑐𝑦
Measuring Criticality
Measuring Criticality – Energy Wasted
50 Measuring Criticality
Estimate energy wasted due to non-criticality
Model an ideal NoC where for each word: 𝑓𝑒𝑡𝑐ℎ 𝑙𝑎𝑡𝑒𝑛𝑐𝑦 ≈ 𝑎𝑐𝑐𝑒𝑠𝑠 𝑙𝑎𝑡𝑒𝑛𝑐𝑦
Measuring Criticality – Energy Wasted
51 Measuring Criticality
high criticality
low criticality
Estimate energy wasted due to non-criticality
Model an ideal NoC where for each word: 𝑓𝑒𝑡𝑐ℎ 𝑙𝑎𝑡𝑒𝑛𝑐𝑦 ≈ 𝑎𝑐𝑐𝑒𝑠𝑠 𝑙𝑎𝑡𝑒𝑛𝑐𝑦
Measuring Criticality – Energy Wasted
52
e.g., bodytrack:
Measuring Criticality
0%
20%
40%
60%
80%
100%
1x 6x 11x 16x 21x 26x 31x
% a
cces
sed
wo
rds
(cu
mu
lati
ve)
access latency / fetch latency
subnet 1 subnet 2 subnet 3 subnet 4 subnet 5 subnet 6
Measuring Criticality – Energy Wasted
53
e.g., bodytrack:
Measuring Criticality
0%
20%
40%
60%
80%
100%
1x 6x 11x 16x 21x 26x 31x
% a
cces
sed
wo
rds
(cu
mu
lati
ve)
access latency / fetch latency
subnet 1 subnet 2 subnet 3 subnet 4 subnet 5 subnet 6
1x – 1.1x
1.1x – 2x
2x – 3.5x
3.5x – 7x
18x - ∞
7x – 18x
Measuring Criticality – Energy Wasted
54
e.g., bodytrack:
Measuring Criticality
0%
20%
40%
60%
80%
100%
1x 6x 11x 16x 21x 26x 31x
% a
cces
sed
wo
rds
(cu
mu
lati
ve)
access latency / fetch latency
subnet 1 subnet 2 subnet 3 subnet 4 subnet 5 subnet 6
1x – 1.1x, 2.0 GHz
1.1x – 2x, 1.8 GHz
2x – 3.5x , 1.0 GHz
3.5x – 7x, 571.4 MHz
18x - ∞, 111.0 MHz
7x – 18x, 285.7 MHz
Measuring Criticality – Energy Wasted
55
Dynamic energy wasted due to non-criticality
Measuring Criticality
0%
5%
10%
15%
20%
25%
30%
35%
40%b
arn
es
bla
cksc
ho
les
bo
dyt
rack
can
ne
al
cho
lesk
y fft
flu
idan
imat
e
lu_c
b
rad
iosi
ty
rad
ix
stre
amcl
ust
er
swap
tio
ns
vip
s
volr
end
wat
er_n
squ
ared
wat
er_s
pat
ial
x26
4
geo
mea
n
dyn
amic
en
ergy
was
ted
Measuring Criticality – Energy Wasted
56
Dynamic energy wasted due to non-criticality
Measuring Criticality
0%
5%
10%
15%
20%
25%
30%
35%
40%b
arn
es
bla
cksc
ho
les
bo
dyt
rack
can
ne
al
cho
lesk
y fft
flu
idan
imat
e
lu_c
b
rad
iosi
ty
rad
ix
stre
amcl
ust
er
swap
tio
ns
vip
s
volr
end
wat
er_n
squ
ared
wat
er_s
pat
ial
x26
4
geo
mea
n
dyn
amic
en
ergy
was
ted
Measuring Criticality – Energy Wasted
57
Dynamic energy wasted due to non-criticality
Measuring Criticality
0%
5%
10%
15%
20%
25%
30%
35%
40%b
arn
es
bla
cksc
ho
les
bo
dyt
rack
can
ne
al
cho
lesk
y fft
flu
idan
imat
e
lu_c
b
rad
iosi
ty
rad
ix
stre
amcl
ust
er
swap
tio
ns
vip
s
volr
end
wat
er_n
squ
ared
wat
er_s
pat
ial
x26
4
geo
mea
n
dyn
amic
en
ergy
was
ted
Measuring Criticality – Energy Wasted
58
Dynamic energy wasted due to dead words
Measuring Criticality
0%10%20%30%40%50%60%70%80%90%
100%b
arn
es
bla
cksc
ho
les
bo
dyt
rack
can
ne
al
cho
lesk
y fft
flu
idan
imat
e
lu_c
b
rad
iosi
ty
rad
ix
stre
amcl
ust
er
swap
tio
ns
vip
s
volr
end
wat
er_n
squ
ared
wat
er_s
pat
ial
x26
4
geo
mea
n
dyn
amic
en
ergy
was
ted
Measuring Criticality – Energy Wasted
59
Dynamic energy wasted due to dead words
Measuring Criticality
0%10%20%30%40%50%60%70%80%90%
100%b
arn
es
bla
cksc
ho
les
bo
dyt
rack
can
ne
al
cho
lesk
y fft
flu
idan
imat
e
lu_c
b
rad
iosi
ty
rad
ix
stre
amcl
ust
er
swap
tio
ns
vip
s
volr
end
wat
er_n
squ
ared
wat
er_s
pat
ial
x26
4
geo
mea
n
dyn
amic
en
ergy
was
ted
68.8% energy wasted due to non-critical and dead words.
Outline
60
Defining Criticality Data Criticality
Data Liveness
Measuring Criticality Energy Wasted
Addressing Criticality NoCNoC
Addressing Criticality
61
A criticality-aware NoC design needs to:
1. Predict the criticality of a word prior to fetching it.
2. Separate the fetching of words based on their criticality.
3. Reduce energy consumption in fetching low-criticality words.
4. Eliminate the fetching of dead words.
NoCNoC (Non-Critical NoC)
A proof-of-concept, criticality-aware design
Addressing Criticality
NoCNoC
62 Addressing Criticality
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical.
NoCNoC
63 Addressing Criticality
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical. load miss (requested word 4)
NoCNoC
64 Addressing Criticality
000000000000010X101000000000000
-15 … 0 … 15
instruction address
prediction table
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical. load miss (requested word 4)
NoCNoC
65 Addressing Criticality
000000000000010X101000000000000
-15 … 0 … 15
instruction address
prediction table
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical. load miss (requested word 4)
NoCNoC
66 Addressing Criticality
000000000000010X101000000000000
-15 … 0 … 15
instruction address
prediction table
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical. load miss (requested word 4)
NoCNoC
67 Addressing Criticality
000000000000010X101000000000000
-15 … 0 … 15
instruction address
prediction table
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical. load miss (requested word 4)
NoCNoC
68 Addressing Criticality
000000000000010X101000000000000
-15 … 0 … 15
instruction address
prediction table
0 … 15
prediction vector (requested word 4)
0010110100000000
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical. load miss (requested word 4)
NoCNoC
69 Addressing Criticality
000000000000010X101000000000000
-15 … 0 … 15
instruction address
prediction table
0 … 15
prediction vector (requested word 4)
L1 request packet 0010110100000000
1. Predict the criticality of a word prior to fetching it.
Binary predictor: either critical or non-critical. load miss (requested word 4)
NoCNoC
70
2. Separate the fetching of words based on their criticality.
Two physical subnetworks: critical and non-critical.
Addressing Criticality
NoCNoC
71
2. Separate the fetching of words based on their criticality.
Two physical subnetworks: critical and non-critical.
Addressing Criticality
NoCNoC
72 Addressing Criticality
critical
non-critical
2. Separate the fetching of words based on their criticality.
Two physical subnetworks: critical and non-critical.
NoCNoC
73
3. Reduce energy consumption in fetching low-criticality words.
DVFS in non-critical subnetwork.
Addressing Criticality
NoCNoC
74
3. Reduce energy consumption in fetching low-criticality words.
DVFS in non-critical subnetwork.
Addressing Criticality
critical
if criticality very low
NoCNoC
75
3. Reduce energy consumption in fetching low-criticality words.
DVFS in non-critical subnetwork.
Addressing Criticality
critical
if criticality low
NoCNoC
76
3. Reduce energy consumption in fetching low-criticality words.
DVFS in non-critical subnetwork.
Addressing Criticality
critical
if criticality high
NoCNoC
77
3. Reduce energy consumption in fetching low-criticality words.
DVFS in non-critical subnetwork.
Addressing Criticality
critical
if criticality very high
NoCNoC
78
4. Eliminate the fetching of dead words.
Binary predictor: either live or dead.
Addressing Criticality
NoCNoC
79
4. Eliminate the fetching of dead words.
Binary predictor: either live or dead.
Addressing Criticality
000000000000010X101000000000000
-15 … 0 … 15
instruction address
prediction table
NoCNoC
80
More details and results in the paper:
DVFS scheme
Prediction tables
Comparison to instruction criticality (Aergia [R. Das, ISCA 2010])
Addressing Criticality
NoCNoC – Prediction Accuracy
81 Addressing Criticality
80%
85%
90%
95%
100%
criticality liveness
pre
dic
tio
n a
ccu
racy
correct over under
NoCNoC – Prediction Accuracy
82 Addressing Criticality
80%
85%
90%
95%
100%
criticality liveness
pre
dic
tio
n a
ccu
racy
correct over under
Outperforms prior liveness predictor (70% accuracy) [H. Kim, NOCS 2011]
NoCNoC – Performance and Energy
83 Addressing Criticality
0.6
0.7
0.8
0.9
1
1.1
dynamic energy runtime
no
rmal
ized
to
bas
elin
e
Conclusion
84
Define Data Criticality Deliver data both no later and no earlier than needed.
Measure Criticality 68.8% energy wasted due to non-critical and dead words.
Address Criticality NoCNoC, a proof-of-concept, criticality-aware design.
Thank you