Load Value Approximation: Approaching the Ideal Memory ...Load Value Approximation 5 core core core...
Transcript of Load Value Approximation: Approaching the Ideal Memory ...Load Value Approximation 5 core core core...
-
Load Value Approximation: Approaching the Ideal Memory Access Latency
Joshua San Miguel Natalie Enright Jerger
-
Chip Multiprocessor
2
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
miss
-
Approximate Data
Many applications can tolerate inexact data values. In approximate computing applications, 40% to nearly 100% of
memory data footprint can be approximated [Sampson, MICRO 2013].
3
Approximate data storage: Reducing SRAM power by lowering supply voltage [Flautner, ISCA 2002].
Reducing DRAM power by lowering refresh rate [Liu, ASPLOS 2011].
Improving PCM performance and lifetime by lowering write precision and reusing failed cells [Sampson, MICRO 2013].
-
Outline
• Load Value Approximation
• Approximator Design
• Evaluation
• Conclusion
4
-
Load Value Approximation
5
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
-
Load Value Approximation
6
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
approximator approximator approximator
-
Load Value Approximation
7
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
approximator approximator approximator
miss A
-
Load Value Approximation
8
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
approximator approximator approximator
generate A_approx
-
Load Value Approximation
9
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
approximator approximator approximator
A_approx
-
Load Value Approximation
10
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
approximator approximator approximator
A_approx
request A_actual
-
Load Value Approximation
11
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
approximator approximator approximator
A_approx
train with A_actual
-
Load Value Approximation
12
core core core
private cache private cache private cache
shared caches, network-on-chip
main memory
approximator approximator approximator
Takes memory access off critical path.
-
Approximator Design
13
ℎ , 1.0 2.2 3.1 instruction
address
global history buffer
tag
tag
tag
tag
tag
tag
tag
tag
approximator table
𝑓
local history buffer
4.1 3.9 4.0
PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1
(4.1 + 3.9 + 4.0) / 3
A_approx = 4.0
load A
-
Approximator Design
14
Load value approximators overcome the challenges of traditional value predictors: No complexity of tracking speculative values.
No rollbacks.
High accuracy/coverage with floating-point values.
More tolerant to value delay.
-
Evaluation
15
EnerJ framework [Sampson, PLDI 2011]: Program annotations to distinguish approximate data from
precise data.
Evaluate final output error and approximator coverage.
benchmark GHB size LHB size approximator size
fft 0 2 49 kB
lu 3 1 32 kB
raytracer 1 1 32 kB
smm 5 1 32 kB
sor 0 2 49 kB
-
Evaluation
16
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
fft lu raytracer smm sor
output error approximator coverage
-
Conclusion
17
Future work:
Further explore approximator design space (dynamic/hybrid schemes, machine learning).
Measure speedup of load value approximation using full-system simulations.
Measure power savings (low-power caches/NoCs/memory for approximate data).
Low-error, high-coverage approximators allow us to approach the ideal memory access latency.
-
Thank you
18
baseline (precise) - raytracer load value approximation - raytracer