SHiP : Signature-based Hit Predictor for High Performance Caching

SHiP: Signature-based Hit Predictor for High Performance Caching * Carole-Jean Wu, # Aamer Jaleel , #,+ William Hasenplaugh, * Margaret Martonosi, # Simon Steely Jr., #,+ Joel Emer * Princeton University # Intel Corporation, VSSAD #,+ MIT IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)


Page 1: SHiP : Signature-based Hit Predictor for High Performance Caching

SHiP: Signature-based Hit Predictor for High Performance Caching

*Carole-Jean Wu, #Aamer Jaleel, #,+William Hasenplaugh,*Margaret Martonosi, #Simon Steely Jr., #,+Joel Emer

*Princeton University #Intel Corporation, VSSAD #,+MIT

IEEE/ACM International Symposium on Microarchitecture (MICRO’2011)

Page 2: SHiP : Signature-based Hit Predictor for High Performance Caching

Motivation

• Factors making caching important
• Increasing ratio of CPU speed to memory speed
• Multi-core poses challenges for better shared cache management

• LRU has been the standard LLC replacement policy
• However, LRU has problems!

Page 3: SHiP : Signature-based Hit Predictor for High Performance Caching

Problems with LRU Replacement

• Working set larger than the cache causes thrashing

[Figure: working set of size Wsize exceeding LLCsize; every access is a miss]

• References to non-temporal data (scans) discard the frequently referenced working set

[Figure: working set of size Wsize within LLCsize, interleaved with scans; hits turn into misses after the scans]

• Scans occur frequently in commercial workloads

Page 4: SHiP : Signature-based Hit Predictor for High Performance Caching

Desired Behavior from Cache Replacement

• Working set larger than the cache: preserve some of the working set in the cache
  [ DIP (ISCA’07), DRRIP (ISCA’10) achieve this effect ]

• Recurring scans: preserve the frequently referenced working set in the cache
  [ SRRIP (ISCA’10) achieves this effect ]

[Figure: access sequences for both cases (Wsize vs. LLCsize), showing hits retained despite thrashing and scans]

Page 5: SHiP : Signature-based Hit Predictor for High Performance Caching

Dynamic Re-Reference Interval Prediction ( DRRIP )

[Figure: RRIP state diagram. Re-reference prediction values: 0 (immediate), 1 (intermediate), 2 (far), 3 (distant). Transitions: insertion, re-reference, “No Victim”, eviction. The SRRIP insertion is labeled Scan-Resistant; the BRRIP insertion is labeled Thrash-Resistant.]

[ Jaleel et al., ISCA’10 ]
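The RRIP mechanics named on this slide can be sketched for a single cache set. This is a minimal illustration, not the authors' implementation: the class and method names are hypothetical, and it assumes the behavior described in the ISCA’10 RRIP paper (SRRIP inserts at 2 = far, a hit promotes the line to 0 = immediate, the victim is a line at 3 = distant, and when there is "No Victim" every line in the set is aged by one).

```python
from dataclasses import dataclass
from typing import List, Optional

RRPV_DISTANT = 3   # 2-bit re-reference prediction value: 3 = distant
RRPV_FAR = 2       # assumed SRRIP insertion value (far)


@dataclass
class Line:
    tag: int
    rrpv: int


class RRIPSet:
    """One cache set managed with 2-bit re-reference prediction values."""

    def __init__(self, ways: int):
        self.lines: List[Optional[Line]] = [None] * ways

    def _victim_way(self) -> int:
        # Fill empty ways before evicting anything.
        for way, line in enumerate(self.lines):
            if line is None:
                return way
        while True:
            # Evict the first line predicted "distant".
            for way, line in enumerate(self.lines):
                if line.rrpv == RRPV_DISTANT:
                    return way
            # "No Victim": age every line and search again.
            for line in self.lines:
                line.rrpv += 1

    def access(self, tag: int, insert_rrpv: int = RRPV_FAR) -> bool:
        """Return True on a hit, False on a miss (after filling the line)."""
        for line in self.lines:
            if line is not None and line.tag == tag:
                line.rrpv = 0          # re-reference: predict immediate
                return True
        way = self._victim_way()
        self.lines[way] = Line(tag, insert_rrpv)
        return False
```

Passing `insert_rrpv=RRPV_DISTANT` for most insertions would approximate the BRRIP side of the diagram. With a 4-way set, a two-line working set that has been re-referenced survives a four-line scan, because the scan lines reach "distant" before the working set does.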

Page 6: SHiP : Signature-based Hit Predictor for High Performance Caching

SRRIP Not Always Scan Resistant…

• LONG scans in access pattern

[Figure: access pattern containing a “short” scan and a “long” scan, annotated with hits and misses]

Page 7: SHiP : Signature-based Hit Predictor for High Performance Caching

SRRIP Not Always Scan Resistant…

• LONG scans in access pattern

[Figure: access pattern containing a “short” scan and a “long” scan; back-to-back scans with no intervening re-reference turn working-set accesses into misses]

• Active working-set MUST be RE-REFERENCED at least ONCE between scans

Page 8: SHiP : Signature-based Hit Predictor for High Performance Caching

SRRIP Not Always Scan Resistant…

• LONG scans in access pattern

[Figure: access pattern containing a “short” scan and a “long” scan, contrasting the misses caused by scans with the desired hits]

Can We Be More Intelligent in Dealing with Scans?

• Active working-set MUST be RE-REFERENCED at least ONCE between scans

Page 9: SHiP : Signature-based Hit Predictor for High Performance Caching

Closer Look at Scan Access Patterns

[Figure: lines brought in around scans, split into those with a Future Reference and those with No Future References]

Assuming Perfect Knowledge of Re-Reference Pattern

Page 10: SHiP : Signature-based Hit Predictor for High Performance Caching

Improving RRIP on Cache Insertions

[Figure: RRIP state diagram (0: immediate, 1: intermediate, 2: far, 3: distant) with re-reference, “No Victim” and eviction transitions; the insertion of scan lines is marked “Improve Insertion”]

Need to Assign DIFFERENT Re-Reference Predictions on Cache Insertion

Page 11: SHiP : Signature-based Hit Predictor for High Performance Caching

Focus of this Paper…

• Goal: Learn re-reference interval of a cache line

[Diagram: cache access -> PREDICTOR -> re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]

How Best to Learn the Re-Reference Interval?

Page 12: SHiP : Signature-based Hit Predictor for High Performance Caching

Learning Re-Reference Behavior

[Figure: accesses around scans, grouped as those that REFERENCE the SAME MEMORY REGION and those REFERENCED BY a SIMILAR SET OF PCs]

Can We Learn Re-References By Correlating Accesses With Some Other Information?


Page 14: SHiP : Signature-based Hit Predictor for High Performance Caching

Using Signatures to Correlate Re-Reference

• Different types of information:
• Memory Region
• Memory Instruction PC
• Instruction Sequence

• Observation: LLC accesses by the same “signature” tend to have similar re-reference patterns

OBSERVE, LEARN and PREDICT Re-Reference Pattern of a Signature

Page 15: SHiP : Signature-based Hit Predictor for High Performance Caching

Observe Signature Re-Reference Behavior

• Observe re-reference pattern in the baseline cache

[Diagram: a Load/Store Address accesses the LLC; each line holds the fields below]

• Cache Tag
• Replacement State
• Coherence State

Page 16: SHiP : Signature-based Hit Predictor for High Performance Caching

Observe Signature Re-Reference Behavior

• Observe re-reference pattern in the baseline cache

• Hardware Required:
• Was line re-referenced after cache insertion ( 1-bit )
• “Signature” responsible for cache insertion ( 14-bits )

[Diagram: a Load/Store Address and its Signature access the LLC; each line's metadata adds a reuse bit and signature_insert]
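The two per-line fields above can be pictured as follows. This is only a sketch: the slide gives the field widths (1 bit and 14 bits) but not how a signature is formed, so the XOR-folding of a program counter in `pc_signature` is an assumption made for illustration, and the names `pc_signature` and `LineMeta` are hypothetical.

```python
from dataclasses import dataclass

SIGNATURE_BITS = 14  # signature width stated on the slide


def pc_signature(pc: int) -> int:
    """Fold a non-negative PC into 14 bits (hypothetical hash, not from the slides)."""
    mask = (1 << SIGNATURE_BITS) - 1
    sig = 0
    while pc:
        sig ^= pc & mask       # XOR successive 14-bit chunks together
        pc >>= SIGNATURE_BITS
    return sig


@dataclass
class LineMeta:
    """Extra per-line state observed in the baseline cache."""
    signature_insert: int      # signature responsible for the insertion (14 bits)
    reuse: bool = False        # set once the line is re-referenced (1 bit)
```

A memory-region or instruction-sequence signature would replace only `pc_signature`; the per-line fields stay the same.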

Page 17: SHiP : Signature-based Hit Predictor for High Performance Caching

Learn Signature Re-Reference Behavior

• Learn signature re-reference behavior

• Hardware Required:
• Signature History Counter Table (SHCT) ( 16K, 2-bit counters )

• SHCT Training:
• If evicted line reused: SHCT [ signature_insert ] ++
• If evicted line NOT reused: SHCT [ signature_insert ] --

• SHCT counter = 0: signature NOT re-referenced
• SHCT counter != 0: signature re-referenced

[Diagram: Last Level Cache (LLC) evictions training the SHCT]
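The training rule above can be written out as a small table of counters. This is a sketch under stated assumptions: the slide specifies 16K 2-bit counters and the ++ / -- rule, while the saturating arithmetic and the initial counter value of 1 are assumptions, and the class name `SHCT` with its method names is hypothetical.

```python
class SHCT:
    """Signature History Counter Table: 16K counters of 2 bits each."""

    def __init__(self, entries: int = 16 * 1024, bits: int = 2, initial: int = 1):
        self.max_value = (1 << bits) - 1      # 3 for 2-bit counters
        self.counters = [initial] * entries   # initial value is an assumption

    def train_on_eviction(self, signature_insert: int, reused: bool) -> None:
        i = signature_insert % len(self.counters)
        if reused:
            # Evicted line was reused: SHCT[signature_insert]++ (saturating)
            self.counters[i] = min(self.counters[i] + 1, self.max_value)
        else:
            # Evicted line was NOT reused: SHCT[signature_insert]-- (saturating)
            self.counters[i] = max(self.counters[i] - 1, 0)

    def re_referenced(self, signature: int) -> bool:
        # counter != 0 means the signature has been seen to be re-referenced
        return self.counters[signature % len(self.counters)] != 0
```

Saturation keeps each entry within its 2 bits, so a long run of dead insertions cannot push a counter below zero.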

Page 18: SHiP : Signature-based Hit Predictor for High Performance Caching

Signature-based Hit Predictor (SHiP)

• Predict re-reference interval of line using SHCT

[Diagram: cache hit/miss and signature -> SHiP (SHCT) -> re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]

Page 19: SHiP : Signature-based Hit Predictor for High Performance Caching

Signature-based Hit Predictor (SHiP)

• Predict re-reference interval using SHCT on CACHE MISS

[Diagram: cache miss and signature -> SHCT -> re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]

if ( SHCT [ signature ] == 0 )
    predict DISTANT (i.e. 3)
else
    predict FAR (i.e. 2)

SHiP Re-Reference Predictions On Miss

Page 20: SHiP : Signature-based Hit Predictor for High Performance Caching

Signature-based Hit Predictor (SHiP)

• Predict re-reference interval on CACHE HIT

[Diagram: cache hit and signature -> re-reference prediction (0: immediate, 1: intermediate, 2: far, 3: distant)]

• Always predict IMMEDIATE (i.e. 0)

SHiP Re-Reference Predictions On Hit
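The miss and hit rules from the two prediction slides fit in one function. A minimal sketch; the function name `predict_rrpv` and the constant names are hypothetical, and the numeric values follow the 0 to 3 scale used throughout the deck.

```python
IMMEDIATE, INTERMEDIATE, FAR, DISTANT = 0, 1, 2, 3


def predict_rrpv(hit: bool, shct_counter: int) -> int:
    """Re-reference prediction for a line, given the SHCT counter of its signature."""
    if hit:
        return IMMEDIATE                      # on a hit: always immediate
    # On a miss: a zero counter means the signature was not re-referenced.
    return DISTANT if shct_counter == 0 else FAR
```

Only the insertion decision consults the SHCT; the hit path is the same as in plain RRIP.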

Page 21: SHiP : Signature-based Hit Predictor for High Performance Caching

SHiP – High Level Architectural Overview

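[Diagram: each access presents its Access Type, Address, data and Signature to the Last Level Cache (LLC). The LLC hit/miss outcome and the signature go to SHiP, whose SHCT produces the Re-Reference Prediction. The per-line signature_insert and reuse_bit drive SHCT Training.]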
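Putting the pieces of the overview together, one cache set under SHiP might behave as sketched here. This is an illustration under assumptions, not the authors' design: the names are hypothetical, the SHCT is a plain list indexed directly by signature, counters saturate, and victim selection follows the RRIP scheme (evict a line at 3, otherwise age the set).

```python
from dataclasses import dataclass
from typing import List, Optional

IMMEDIATE, FAR, DISTANT = 0, 2, 3


@dataclass
class Line:
    tag: int
    rrpv: int
    signature_insert: int      # signature that inserted the line
    reuse_bit: bool = False    # set on the first hit after insertion


class SHiPSet:
    def __init__(self, ways: int, shct: List[int], counter_max: int = 3):
        self.lines: List[Optional[Line]] = [None] * ways
        self.shct = shct                   # shared table of counters
        self.counter_max = counter_max

    def _train(self, victim: Line) -> None:
        # SHCT training happens when a line is evicted.
        i = victim.signature_insert
        if victim.reuse_bit:
            self.shct[i] = min(self.shct[i] + 1, self.counter_max)
        else:
            self.shct[i] = max(self.shct[i] - 1, 0)

    def _victim_way(self) -> int:
        for way, line in enumerate(self.lines):
            if line is None:
                return way
        while True:
            for way, line in enumerate(self.lines):
                if line.rrpv == DISTANT:
                    return way
            for line in self.lines:        # "No Victim": age the set
                line.rrpv += 1

    def access(self, tag: int, signature: int) -> bool:
        for line in self.lines:
            if line is not None and line.tag == tag:
                line.rrpv = IMMEDIATE      # hit: predict immediate
                line.reuse_bit = True
                return True
        way = self._victim_way()
        victim = self.lines[way]
        if victim is not None:
            self._train(victim)
        rrpv = DISTANT if self.shct[signature] == 0 else FAR
        self.lines[way] = Line(tag, rrpv, signature)
        return False
```

In a 2-way set, once a scanning signature has had one line evicted without reuse, its later insertions arrive at "distant" and are replaced first, so a re-referenced line from another signature stays resident.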

Page 22: SHiP : Signature-based Hit Predictor for High Performance Caching

SHiP – High Level Architectural Overview

[Diagram: each access presents its Access Type, Address, data and Signature to the Last Level Cache (LLC). The LLC hit/miss outcome and the signature go to SHiP, whose SHCT produces the Re-Reference Prediction. The per-line signature_insert and reuse_bit drive SHCT Training.]

Per-Line Overhead Can Be Reduced by Using Set Sampling ( need only 32 - 64 sets )

Page 23: SHiP : Signature-based Hit Predictor for High Performance Caching

SHiP – High Level Architectural Overview

[Diagram: each access presents its Access Type, Address, data and Signature to the Last Level Cache (LLC). The LLC hit/miss outcome and the signature go to SHiP, whose SHCT produces the Re-Reference Prediction. The per-line signature_insert and reuse_bit drive SHCT Training. The diagram is annotated “~6 KB” and “NO CHANGE”.]

Per-Line Overhead Can Be Reduced by Using Set Sampling ( need only 32 - 64 sets )
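The slide does not break down the “~6 KB” annotation. One reading that is consistent with the numbers elsewhere in the deck is the SHCT (16K 2-bit counters) plus the per-line fields (14-bit signature and 1 reuse bit) kept only in sampled sets. The arithmetic below is a sketch of that reading; the choice of 64 sampled sets and 16 ways is an assumption, and the function name is hypothetical.

```python
def ship_storage_bits(shct_entries: int = 16 * 1024, counter_bits: int = 2,
                      sampled_sets: int = 64, ways: int = 16,
                      signature_bits: int = 14, reuse_bits: int = 1) -> int:
    """Total SHiP state in bits: SHCT plus metadata in the sampled sets only."""
    shct = shct_entries * counter_bits
    per_line = sampled_sets * ways * (signature_bits + reuse_bits)
    return shct + per_line


# Convert bits to kilobytes for the two ends of the 32 - 64 set range.
kb_64_sets = ship_storage_bits() / 8 / 1024                  # 5.875
kb_32_sets = ship_storage_bits(sampled_sets=32) / 8 / 1024   # 4.9375
```

Under these assumptions the SHCT accounts for 4 KB and the sampled-set metadata for just under 2 KB, which lands close to the slide's figure.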

Page 24: SHiP : Signature-based Hit Predictor for High Performance Caching

Performance Comparison of Replacement Policies

SHiP Significantly Improves Performance Across All Workload Categories

[Figure: bar chart of Performance Relative to LRU (y-axis 1.00 to 1.15) for SRRIP, DRRIP and SHiP-PC across the Mm./Games, Server, SPEC2K6 and All categories. Configuration: 16-way 2MB LLC, Core i7 Type Hierarchy.]

Page 25: SHiP : Signature-based Hit Predictor for High Performance Caching

Performance Comparison of Replacement Policies

CRC Results Comparison

[Figure: bar charts of Performance Relative to LRU (y-axis 1.00 to 1.15) for SRRIP, DRRIP, Seg-LRU, SDBP and SHiP-PC. Panels: 16-way 1MB Private Cache, 65 Single-Threaded Workloads; 16-way 4MB Shared Cache, 165 4-core Workloads.]

Averaged Across PC Games, Multimedia, Enterprise Server, SPEC CPU2006 Workloads

SHiP Delivers 2X the Performance Improvement of Prior State-of-the-Art Policies

Page 26: SHiP : Signature-based Hit Predictor for High Performance Caching

Total Storage Overhead (16-way Set Associative Cache)

• LRU: 4-bits / cache block
• Pseudo-LRU: 1-bit / cache block
• RRIP [ ISCA’10 ]: 2-bits / cache block
• Seg-LRU [ CRC’10 ]: ~8-bits / cache block
• SDBP [ MICRO’10 ]: ~10-bits / cache block
• SHiP [ MICRO’11 ]: ~5-bits / cache block

SHiP Outperforms State-of-the-Art with HW Similar to LRU
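To see what the per-block figures mean in absolute terms, they can be scaled to a whole cache. The sketch below only converts the slide's numbers; the 64-byte line size is an assumption (the deck does not state it), the cache size is taken from the 1MB private-cache configuration on the results slide, and the names are hypothetical.

```python
# Bits per cache block as listed on the slide ("~" values taken at face value).
PER_BLOCK_BITS = {
    "LRU": 4, "Pseudo-LRU": 1, "RRIP": 2,
    "Seg-LRU": 8, "SDBP": 10, "SHiP": 5,
}


def replacement_state_kb(bits_per_block: int, cache_bytes: int,
                         line_bytes: int = 64) -> float:
    """Total replacement state in KB for a cache of the given size."""
    blocks = cache_bytes // line_bytes
    return blocks * bits_per_block / 8 / 1024


ONE_MB = 1024 * 1024
totals = {name: replacement_state_kb(bits, ONE_MB)
          for name, bits in PER_BLOCK_BITS.items()}
```

For a 1MB cache with 64-byte lines this gives 8 KB for LRU, 4 KB for RRIP and 10 KB for SHiP, against 16 KB for Seg-LRU and 20 KB for SDBP.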

Page 27: SHiP : Signature-based Hit Predictor for High Performance Caching

Summary

• Scan-resistance is an important problem in commercial workloads
• State-of-the-art policies do not fully address scan-resistance

• Signatures help improve re-reference predictions to address scans
• Need fine-grained re-reference predictions at insertion

• Proposed a Simple and Practical Scan-Resistant Replacement

• SHiP significantly outperforms winner of CRC Championship
• SHiP requires less storage than CRC winner
• HW overhead of SHiP is comparable to LRU

Page 28: SHiP : Signature-based Hit Predictor for High Performance Caching


Q&A


Page 31: SHiP : Signature-based Hit Predictor for High Performance Caching

Re-Reference Interval Prediction ( RRIP )

[Figure: RRIP state diagram (0: immediate, 1: intermediate, 2: far, 3: distant) with insertion, re-reference, “No Victim” and eviction transitions; the insertion is labeled Scan-Resistant]

CAN INSERTION BE MORE INTELLIGENT?

Page 32: SHiP : Signature-based Hit Predictor for High Performance Caching

Using Signatures to Correlate Re-Reference Behavior

[Figure: accesses around scans, each labeled with a SIGNATURE (a, b, c, d); signatures are grouped into Future Cache Hits and No Future Cache Hits]

Example Signatures: Memory Region, Program Counter, Instruction Decode History

Page 33: SHiP : Signature-based Hit Predictor for High Performance Caching

LRU vs. Re-Reference Interval Prediction (RRIP)

[Figure: an 8-way set (Physical Way # 0 to 7, Cache Tags s, c, b, h, f, d, g, e) shown twice. Under LRU, each line stores its “LRU Chain” position (0 to 7). Under RRIP, each line stores a Re-Reference Prediction (0 to 3).]

RRIP Outperforms LRU with Storage Less Than LRU

Page 34: SHiP : Signature-based Hit Predictor for High Performance Caching

Signature-based Hit Predictor (SHiP)

• Goal: Predict the re-reference behavior of a signature

• Learn Re-Reference Behavior:

[Diagram: each access presents its Access Type, Address, data and Signature to the LLC, which reports hit/miss]