RATHIJIT SEN DAVID A. WOOD Reuse-based Online Models for Caches 6/20/2013 ACM SIGMETRICS 2013 @ CMU,...


RATHIJIT SEN, DAVID A. WOOD

Reuse-based Online Models for Caches

6/20/2013, ACM SIGMETRICS 2013 @ CMU, Pittsburgh, PA

The Problem

Caches: power vs performance

Reconfigurable caches, e.g., Ivy Bridge

The Problem: Which configuration to select?

e.g., to get the best energy-efficiency?

[Figure: eight Core blocks and eight LLC blocks above DRAM; arrows labeled Miss and Fetch]

Cache Performance Prediction

We propose a framework: h = (r · B) · φ

h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic Binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)

Case study: Energy-Delay Product (EDP) within 7% of minimum

Agenda

The Problem

Framework: h = (r · B) · φ
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)

Hardware support

Case Study

Cache Overview

Limited storage
Sets of (usually 64-byte) blocks
#blocks/set = associativity (#ways)
Set Index + Address tags identify data

[Figure: cache drawn as a grid of blocks (b) with Sets (S) along one dimension and Associativity (A) along the other; the Address Tag is compared against the stored tags (Match? Y: Hit, N: Miss)]
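As a rough illustration of how the set index and address tag identify data, the sketch below splits a byte address into a tag and a set index and checks one set for a match. The 64-byte block size comes from the slide; `NUM_SETS`, `split_address`, and `lookup` are hypothetical names, and plain modulo indexing is an assumption made only for this example.

```python
BLOCK_BYTES = 64   # block size mentioned on the slide
NUM_SETS = 1024    # hypothetical number of sets (S)


def split_address(addr):
    """Return (tag, set_index) for a byte address, assuming modulo indexing."""
    block = addr // BLOCK_BYTES          # drop the offset within the block
    return block // NUM_SETS, block % NUM_SETS


def lookup(cache, addr):
    """Hit if the tag is present among the tags stored in the indexed set."""
    tag, index = split_address(addr)
    return tag in cache[index]


# A tiny cache model: one list of tags per set (at most A tags in a real cache).
cache = [[] for _ in range(NUM_SETS)]
tag, index = split_address(0x12345)
cache[index].append(tag)
```

With this state, `lookup(cache, 0x12345)` is a hit, while the next block (`0x12345 + 64`) maps to a different set and misses.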

Last-Level Cache (LLC) Workload Variation

[Figure: misses per 1000 instructions (0 to 30) vs. LLC size (2MB, 4MB, 8MB, 16MB, 32MB). Curve labels: swim; ammp, blackscholes, bodytrack, fluidanimate, freqmine, swaptions; equake, gafort, wupwise; apache; mgrid; zeus; oltp; jbb; fma3d]

Bad configurations hurt!

EDP (energy-delay product)

[Figure: maximum EDP relative to minimum EDP (1 to 3.5) per workload: blac, body, flui, freq, swap, ammp, equa, fma3, gafo, mgri, swim, wupw, apac, jbb, oltp, zeus. Annotations: 27% worse; 218% worse. Legend: Minimum, Maximum]
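To make the EDP comparison concrete, here is a minimal sketch that computes energy × delay for a few cache configurations and reports how much worse the worst choice is than the best. The configuration names and the energy and delay values are made up for illustration; they are not measurements from the talk.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


# Hypothetical (energy, delay) pairs per configuration; illustrative values only.
configs = {"2MB": (10.0, 2.0), "8MB": (12.0, 1.5), "32MB": (20.0, 1.4)}

edps = {name: edp(e, d) for name, (e, d) in configs.items()}
best = min(edps, key=edps.get)
worst = max(edps, key=edps.get)

# Fraction by which the worst configuration exceeds the minimum EDP.
overhead = edps[worst] / edps[best] - 1.0
```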

Problem Summary

Reconfigurable caches

Multiple replacement policies

Goal: Online miss-ratio prediction

[Figure: cache drawn as a grid of blocks (b) with Sets (S) and Associativity (A)]

Indexing Assumption

Mapping of unique addresses to cache sets
Assumption: independent, uniform [Smith, 1978]

Unique accesses as Bernoulli trials

(Partial) Hashing
  POWER4, POWER5, POWER6, Xeon
  Simple XOR-based function [similar to Cypher, 2008]
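The sketch below shows one simple XOR-based index function and why hashing helps the uniformity assumption. It is not the function used by any of the processors named above, nor the one from the cited work; `xor_fold_index` and `modulo_index` are hypothetical helpers that fold the block address into `index_bits` bits.

```python
def modulo_index(block_addr, index_bits):
    """Conventional indexing: the low-order bits of the block address."""
    return block_addr & ((1 << index_bits) - 1)


def xor_fold_index(block_addr, index_bits):
    """XOR together successive index_bits-wide slices of the block address."""
    mask = (1 << index_bits) - 1
    index = 0
    while block_addr:
        index ^= block_addr & mask
        block_addr >>= index_bits
    return index


# A strided pattern whose low 4 bits are always zero.
strided = [i << 4 for i in range(16)]
modulo_sets = {modulo_index(a, 4) for a in strided}
xor_sets = {xor_fold_index(a, 4) for a in strided}
```

Here the strided addresses all collide in one set under modulo indexing, while the XOR fold spreads them over all 16 sets.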


Temporal Locality Metrics

Unique Reuse Distance (URD)
  #unique intervening addresses
  x y z z y x : URD(x)=2
  Stack Distance [Mattson, 1970] – 1
  Large cache → large distances to track

Absolute Reuse Distance (ARD)
  #intervening addresses
  x y z z y x : ARD(x)=4

[Figure: r drawn as a vector of P(URD=i) indexed by i; annotation: Size?]
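Both metrics can be computed directly from a trace. The sketch below is a straightforward (not efficient) reference computation; `reuse_distances` is a hypothetical helper, and it reproduces the slide's example values for x.

```python
def reuse_distances(trace):
    """Return (address, URD, ARD) for every reuse in the trace.

    URD counts the unique addresses between two accesses to the same address;
    ARD counts all intervening accesses.
    """
    last_pos = {}
    out = []
    for pos, addr in enumerate(trace):
        if addr in last_pos:
            between = trace[last_pos[addr] + 1:pos]
            out.append((addr, len(set(between)), len(between)))
        last_pos[addr] = pos
    return out


distances = reuse_distances("x y z z y x".split())
```

For the trace `x y z z y x`, the last entry is `("x", 2, 4)`, i.e., URD(x)=2 and ARD(x)=4.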

Per-set Locality, r(S)

r(S) is “compressed” as S (#sets) increases
Less of the tail is important

[Figure: probability (0 to 0.6) vs. per-set URD (unique reuse distance, 0 to 32) for S=2^14, S=2^13, S=2^12, S=2^11, S=2^10]

[Figure: cumulative probability (0 to 1) vs. per-set URD (unique reuse distance, 0 to 32) for S=2^14, S=2^13, S=2^12, S=2^11, S=2^10]

[Figure: r drawn as a vector of P(URD=i) indexed by i; a reuse pair x ... x shown for #sets: S and for #sets: S′ > S]


Estimating per-set locality

Generalized stochastic Binomial matrices [Strum, 1977]
r(S) = r(1) · B(1 – 1/S, 1/S)

Composition:
r(S′) = r(S) · B(1 – S/S′, S/S′)

Entry (i, k) of B: P(k successes in i trials), i.e., P(k of i to the same set)

[Figure: row vector r (entries P(URD=i), indexed by i) multiplied by the triangular matrix B (rows i, columns k, zeros where k > i)]
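The transformation can be sketched in a few lines. The code below builds a truncated Binomial matrix and applies it to a reuse-distance vector; `binomial_matrix`, `vec_mat`, and `per_set_locality` are hypothetical names, and the sketch assumes the vector `r` carries all of the probability mass of interest (any tail beyond its length is simply ignored).

```python
from math import comb


def binomial_matrix(n, p):
    """B[i][k] = P(k successes in i trials) with success probability p."""
    return [[comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
             for k in range(n)]
            for i in range(n)]


def vec_mat(r, B):
    """Row vector times matrix."""
    n = len(r)
    return [sum(r[i] * B[i][k] for i in range(n)) for k in range(n)]


def per_set_locality(r_from, sets_from, sets_to):
    """r(S') = r(S) · B(1 - S/S', S/S') for S' >= S."""
    return vec_mat(r_from, binomial_matrix(len(r_from), sets_from / sets_to))


r1 = [0.1, 0.2, 0.3, 0.4]                      # illustrative r(1)
direct = per_set_locality(r1, 1, 4)            # r(4) straight from r(1)
chained = per_set_locality(per_set_locality(r1, 1, 2), 2, 4)  # via r(2)
```

Computing r(4) directly or by chaining through r(2) gives the same vector up to rounding, which is the composition property the slide relies on.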

Computation reuse & speedup

“Shorter” tail → smaller matrices

Now: compute
Later: hardware support

[Figure: r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); annotations: Poisson Approximation; Size?; r drawn as a vector of P(URD=i) indexed by i]

Size of r(2^10)?

Prediction with r(2^10) limited to URD < n

[Figure: miss ratio (0 to 0.3) for 2-, 4-, 8-, 16-, and 32-way caches of 2MB, 4MB, 8MB, 16MB, and 32MB. Series: n=32, n=64, n=128, n=256, n=512, Actual]


Hit Function, φ

φk: P(x will hit | URD(x)=k)

Monotonically decreasing model
Intuition: larger URD → same or larger eviction probability

φ0 = 1, φk ≤ φk-1

φ∞ = 0

[Figure: an access to x, then accesses that are not x, then x again]

Hit Function, φ

Example: A=8

[Figure: hit probability (0 to 1) vs. unique reuse distance (0 to 32) for LRU, PLRU, NMRU, and RANDOM]

Formulating φ

φ(LRU): step function; (r · B) · φ(LRU) follows [Smith, 1978], [Hill & Smith, 1989]

φ(PLRU): Assumes that, on average, traffic is evenly divided between subtrees

φ(RANDOM): Estimates #intervening misses using ARD

φ(NMRU): similar to φ(RANDOM) except φ1=1
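For LRU the hit function is easy to write down: with associativity A, a reuse hits exactly when its per-set URD is below A. The sketch below builds that step function and combines it with a per-set reuse-distance vector; `phi_lru` and `hit_ratio` are hypothetical names and the distribution is illustrative. The PLRU, RANDOM, and NMRU hit functions are not sketched here.

```python
def phi_lru(associativity, n):
    """Step function: phi[k] = 1 if k < A else 0, for k = 0 .. n-1."""
    return [1.0 if k < associativity else 0.0 for k in range(n)]


def hit_ratio(r_set, phi):
    """Dot product of the per-set URD distribution with the hit function."""
    return sum(p * f for p, f in zip(r_set, phi))


# Illustrative per-set URD distribution (remaining mass = longer distances).
r_set = [0.5, 0.2, 0.1, 0.1, 0.05]
h_2way = hit_ratio(r_set, phi_lru(2, len(r_set)))
h_4way = hit_ratio(r_set, phi_lru(4, len(r_set)))
```

Raising the associativity from 2 to 4 moves the step to the right and picks up the probability mass at distances 2 and 3.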


Prediction Accuracy

LRU, PLRU(A=2), NMRU(A=2): exact per-set model
Others: approximate per-set model

[Figure: cumulative probability (0 to 1) of the miss-ratio prediction error, abs((predicted-actual)/actual), over -1% to 6%, for LRU, PLRU, RANDOM, and NMRU]

Overheads

r = r · B : 6–80 μsec
Binomial → Poisson approximation for each row of B

h = (r · B) · φ : 20–30 μsec
Average over 24 configurations
B applied 8 times
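The Poisson approximation replaces each Binomial row (i trials, success probability p) with a Poisson distribution of mean i·p, which avoids evaluating binomial coefficients. The sketch below compares the two for one row; `binomial_row` and `poisson_row` are hypothetical helpers, and i=512 with p=1/1024 is just an example choice, not a measurement setup from the talk.

```python
from math import comb, exp


def binomial_row(i, p, n):
    """Exact row i of B: P(k successes in i trials) for k = 0 .. n-1."""
    return [comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
            for k in range(n)]


def poisson_row(i, p, n):
    """Poisson approximation with mean i*p, built with a running term."""
    lam = i * p
    row, term = [], exp(-lam)
    for k in range(n):
        row.append(term)
        term *= lam / (k + 1)
    return row


exact = binomial_row(512, 1 / 1024, 8)
approx = poisson_row(512, 1 / 1024, 8)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
```

For small p the two rows agree closely, which is the regime that matters when p = 1/S for a cache with many sets.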


Computation reuse & speedup

“Shorter” tail → smaller matrices

Now: compute
Later: hardware support

[Figure: r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); annotations: Poisson Approximation; Size=512; Now; r drawn as a vector of P(URD=i) indexed by i]

Insights

x y z z y x : URD(x)=2

Unique → “remember” addresses
Only cardinality, not full addresses

Bloom filter for compact (approximate) representation

r(2^10) is seen by any set of a cache with S=2^10
Filter address stream

[Figure: r drawn as a vector of P(URD=i) indexed by i]

Hardware Support for estimating r(2^10)

[Figure: block diagram with Reference address register, Set Filter, Control Logic, 1024-bit Bloom Filter (2 hash fns), 9-bit Counter, and 512-entry Histogram array. Signal labels: access, insert, filtered access, load, hit, inc, reset, read]

[Figure: sampling flowchart with steps Start Sample, Addr match?, Unique?, Remember, End Sample; edge labels: N, Y (not hit), Y]
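A software model can help make the sampling flow concrete. The sketch below is an interpretation of the diagram, not the authors' hardware design: `ReuseSampler`, its hash functions, the choice to start a new sample on the next access, and the use of the reference address's set as the set filter are all assumptions. Only the sizes (1024-bit filter, 2 hashes, 512-entry histogram) come from the slide.

```python
class ReuseSampler:
    """Illustrative software model of a per-set URD sampling monitor."""

    def __init__(self, num_sets=1024, filter_bits=1024, histogram_entries=512):
        self.num_sets = num_sets
        self.filter_bits = filter_bits
        self.histogram = [0] * histogram_entries
        self.reference = None   # reference address register
        self.bloom = 0          # Bloom filter bits packed into an int
        self.counter = 0        # saturating count of unique addresses

    def _bit_positions(self, block_addr):
        # Two hypothetical multiplicative hashes; real hardware may differ.
        h1 = ((block_addr * 2654435761) & 0xFFFFFFFF) >> 16
        h2 = ((block_addr * 2246822507) & 0xFFFFFFFF) >> 16
        return h1 % self.filter_bits, h2 % self.filter_bits

    def access(self, block_addr):
        if self.reference is None:
            # Start Sample: load the reference register, reset filter and counter.
            self.reference, self.bloom, self.counter = block_addr, 0, 0
            return
        if block_addr % self.num_sets != self.reference % self.num_sets:
            return  # set filter: ignore accesses to other sets
        if block_addr == self.reference:
            # Addr match: record the sampled URD and end the sample.
            self.histogram[self.counter] += 1
            self.reference = None
            return
        mask = 0
        for bit in self._bit_positions(block_addr):
            mask |= 1 << bit
        if (self.bloom & mask) != mask:
            # Unique (not a Bloom-filter hit): remember it and count it.
            self.bloom |= mask
            self.counter = min(self.counter + 1, len(self.histogram) - 1)


sampler = ReuseSampler(num_sets=4)
for addr in [0, 1, 4, 8, 5, 8, 4, 0]:   # 0, 4, 8 share a set; 1 and 5 do not
    sampler.access(addr)
```

In this trace the reference address 0 is reused after two unique same-set addresses (4 and 8), so the sample lands in histogram entry 2. Because the Bloom filter is approximate, false positives can undercount unique addresses in longer samples.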

Agenda

The Problem

Framework: h = (r · B) · φ
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)

Hardware support

Case Study + way counters

LRU Way Counters [Suh, et al. 2002]

One counter per logical way (stack position)

Determining logical position is hard
  not totally (re-)ordered with every access
  heuristics, e.g., for PLRU [Kedzierski, et al. 2010]

Other Limitations
  Inclusion property
  Fixed #sets

S′ = S: special case of reuse framework
S′ ≠ S? Use B, provided enough tail of r(S) is available
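The way-counter idea can be sketched as follows, assuming true LRU ordering is available (the hard part noted above for other policies). `lru_way_counters` and `hits_for_ways` are hypothetical helpers; the sketch keeps a fixed number of sets and relies on the LRU inclusion property, which are the two limitations listed on the slide.

```python
def lru_way_counters(trace, num_sets, max_ways):
    """Count hits per LRU stack position for a cache with num_sets sets."""
    stacks = [[] for _ in range(num_sets)]   # most recently used block first
    counters = [0] * max_ways
    for block in trace:
        stack = stacks[block % num_sets]
        if block in stack:
            counters[stack.index(block)] += 1   # hit at this logical way
            stack.remove(block)
        elif len(stack) == max_ways:
            stack.pop()                          # evict the LRU block
        stack.insert(0, block)
    return counters


def hits_for_ways(counters, ways):
    """Hits an LRU cache with the same sets but fewer ways would have seen."""
    return sum(counters[:ways])


counters = lru_way_counters([0, 1, 2, 2, 1, 0], num_sets=1, max_ways=4)
```

For this trace the counters are `[1, 1, 1, 0]`, so a 2-way cache with the same single set would have seen 2 hits and a 3-way cache all 3.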

Min. EDP configuration

EDP within 7% of minimum
Reuse models outperform PLRU way counters in most cases

[Figure: EDP relative to minimum EDP (1 to 1.08) per workload: blac, body, flui, freq, swap, ammp, equa, fma3, gafo, mgri, swim, wupw, apac, jbb, oltp, zeus. Series: Reuse Model, PLRU Way Counters]

Summary

The Problem: Online miss-rate estimation for reconfigurable caches

We propose a framework: h = (r · B) · φ

h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic Binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)

Case study: EDP within 7% of minimum

Future work: More policies, applications/case studies

Also in the paper

r: lossy summarization of the address trace

Estimation for ARD

Optimizations for LRU

Conditions for PLRU eviction

More details on models & evaluation

Reuse-based Online Models for Caches


Questions?

Example LLC performance

OLTP (TPC-C + IBM DB2)

[Figure: miss ratio (0 to 0.4) for 2-, 4-, 8-, 16-, and 32-way caches of 2MB, 4MB, 8MB, 16MB, and 32MB. Series: RANDOM, NMRU, PLRU, LRU]

Estimating cache performance

Hit ratio = hits/access
          = ∑i P(URD=i) · P(hit|URD=i)
          = r · φ

Miss ratio = misses/access = 1 – hit ratio

Miss rate = misses/instruction = miss ratio × access/instruction

[Figure: r drawn as a vector of P(URD=i) indexed by i, multiplied by φ drawn as a vector of P(hit|URD=i) indexed by i]
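The three quantities chain together as simple arithmetic. In the sketch below, `miss_metrics` is a hypothetical helper, and the vectors and the accesses-per-instruction figure are illustrative values rather than data from the talk.

```python
def miss_metrics(r, phi, accesses_per_instruction):
    """Return (hit ratio, miss ratio, miss rate) from r, phi, and access density."""
    hit_ratio = sum(p * f for p, f in zip(r, phi))      # r · phi
    miss_ratio = 1.0 - hit_ratio                        # misses per access
    miss_rate = miss_ratio * accesses_per_instruction   # misses per instruction
    return hit_ratio, miss_ratio, miss_rate


# Illustrative inputs: three reuse-distance bins and 0.02 LLC accesses/instruction.
hit, miss, rate = miss_metrics([0.5, 0.3, 0.2], [1.0, 0.5, 0.0], 0.02)
misses_per_1000_instructions = rate * 1000
```

With these inputs the hit ratio is 0.65, the miss ratio 0.35, and the miss rate 7 misses per 1000 instructions.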

URD vs ARD

x x

z0z1 z2 z3 zk-1

{z0}* {z0,z1}* {z0,z1,z2}* {z0,z1,z2,...,zk-1}*

Approximation: dk = dk-1 + 1/rik

∞dk