Load Value Approximation: Approaching the Ideal Memory Access Latency
Joshua San Miguel, Natalie Enright Jerger

Transcript:

  • Load Value Approximation: Approaching the Ideal Memory Access Latency

    Joshua San Miguel, Natalie Enright Jerger

  • Chip Multiprocessor

    [Figure: chip multiprocessor - each core has a private cache, backed by shared caches and a network-on-chip, then main memory; a load that misses in the private cache must traverse this whole hierarchy.]

  • Approximate Data

    Many applications can tolerate inexact data values. In approximate computing applications, 40% to nearly 100% of the memory data footprint can be approximated [Sampson, MICRO 2013].

    Approximate data storage:
    Reducing SRAM power by lowering the supply voltage [Flautner, ISCA 2002].
    Reducing DRAM power by lowering the refresh rate [Liu, ASPLOS 2011].
    Improving PCM performance and lifetime by lowering write precision and reusing failed cells [Sampson, MICRO 2013].

  • Outline

    • Load Value Approximation

    • Approximator Design

    • Evaluation

    • Conclusion


  • Load Value Approximation

    [Figure: the chip multiprocessor from before, now with an approximator alongside each core's private cache.]

    On a load to A that misses in the private cache:
    The approximator generates A_approx and returns it to the core, so execution continues immediately.
    A_actual is requested from the memory hierarchy in the background.
    When A_actual returns, it is used to train the approximator.

    Takes memory access off the critical path.
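
    A minimal sketch of this flow, assuming a simple simulator-style model in Python; the PrivateCache and LastValueApproximator classes below are illustrative placeholders (not the talk's hardware), and a closer model of the approximator follows under Approximator Design.

    # Sketch of the load path with load value approximation. The tiny cache
    # and last-value "approximator" are placeholders; the point is the control
    # flow: approximate on a miss, fetch and train off the critical path.

    class PrivateCache:
        def __init__(self):
            self.lines = {}

        def lookup(self, addr):
            return self.lines.get(addr)          # None on a miss

        def fill(self, addr, value):
            self.lines[addr] = value


    class LastValueApproximator:
        def __init__(self):
            self.last = {}

        def approximate(self, pc):
            return self.last.get(pc, 0.0)        # A_approx

        def train(self, pc, actual):
            self.last[pc] = actual               # learn from A_actual


    def load(pc, addr, cache, approximator, pending):
        value = cache.lookup(addr)
        if value is not None:
            return value                         # private-cache hit: unchanged

        a_approx = approximator.approximate(pc)  # miss: generate A_approx now
        pending.append((pc, addr))               # request A_actual in the background
        return a_approx                          # the core continues immediately


    def drain_responses(memory, cache, approximator, pending):
        # When A_actual arrives, train the approximator and fill the cache;
        # there is no rollback, since the core already consumed A_approx.
        for pc, addr in pending:
            a_actual = memory[addr]
            approximator.train(pc, a_actual)
            cache.fill(addr, a_actual)
        pending.clear()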

  • Approximator Design

    [Figure: approximator structure.] On a load to A, a hash h of the load's instruction address (PC) and the values in the global history buffer (e.g. PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1) selects a tagged entry in the approximator table. Each entry holds a local history buffer of recently loaded values (e.g. 4.1, 3.9, 4.0), and an approximation function f over that history (here, the average) produces A_approx = (4.1 + 3.9 + 4.0) / 3 = 4.0.
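
    A rough software model of this structure, as a hedged sketch: only the global history buffer (GHB), the tagged table of local history buffers (LHBs), and the averaging come from the slide; the table size, the float-to-bits hashing, and the cold-miss fallback value are assumptions made for illustration.

    # Illustrative model of the approximator sketched above (table size, hash
    # details, and fallback value are assumptions, not details from the talk).
    import struct
    from collections import deque


    class LoadValueApproximator:
        def __init__(self, ghb_size, lhb_size, table_entries=256):
            self.ghb = deque(maxlen=ghb_size)    # global history buffer
            self.lhb_size = lhb_size
            self.table = [None] * table_entries  # entries: (tag, local history)

        def _index(self, pc):
            # Hash h: the load PC XORed with the bit patterns of recent global
            # values, e.g. PC ^ bits(1.0) ^ bits(2.2) ^ bits(3.1) on the slide.
            h = pc
            for v in self.ghb:
                h ^= struct.unpack('<I', struct.pack('<f', v))[0]
            return h % len(self.table), h

        def approximate(self, pc):
            idx, tag = self._index(pc)
            entry = self.table[idx]
            if entry is None or entry[0] != tag or not entry[1]:
                return 0.0                       # no history yet (assumed fallback)
            lhb = entry[1]
            return sum(lhb) / len(lhb)           # f: average, e.g. (4.1 + 3.9 + 4.0) / 3 = 4.0

        def train(self, pc, actual):
            # Called when A_actual returns from the memory hierarchy.
            idx, tag = self._index(pc)
            entry = self.table[idx]
            if entry is None or entry[0] != tag:
                entry = (tag, deque(maxlen=self.lhb_size))
                self.table[idx] = entry
            entry[1].append(actual)              # update this entry's LHB
            self.ghb.append(actual)              # update the shared GHB


    # Example configuration taken from the evaluation table: raytracer uses a
    # GHB size of 1 and an LHB size of 1.
    approx = LoadValueApproximator(ghb_size=1, lhb_size=1)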

  • Approximator Design

    Load value approximators overcome the challenges of traditional value predictors:
    No complexity of tracking speculative values.
    No rollbacks.
    High accuracy/coverage with floating-point values.
    More tolerant of value delay.

  • Evaluation

    EnerJ framework [Sampson, PLDI 2011]: program annotations distinguish approximate data from precise data.

    Evaluate final output error and approximator coverage.

    benchmark    GHB size    LHB size    approximator size
    fft          0           2           49 kB
    lu           3           1           32 kB
    raytracer    1           1           32 kB
    smm          5           1           32 kB
    sor          0           2           49 kB

  • Evaluation

    [Chart: output error and approximator coverage (0%-100%) for fft, lu, raytracer, smm, and sor.]

  • Conclusion

    Low-error, high-coverage approximators allow us to approach the ideal memory access latency.

    Future work:
    Further explore the approximator design space (dynamic/hybrid schemes, machine learning).
    Measure the speedup of load value approximation using full-system simulations.
    Measure power savings (low-power caches/NoCs/memory for approximate data).

  • Thank you

    [Images: raytracer output - baseline (precise) vs. load value approximation.]