Prefetch-Aware Cache Management for High Performance Caching

PACMan: Prefetch-Aware Cache Management for High Performance Caching. Carole-Jean Wu¶, Aamer Jaleel*, Margaret Martonosi¶, Simon Steely Jr.*, Joel Emer*§. Princeton University¶, Intel VSSAD*, MIT§. December 7, 2011, International Symposium on Microarchitecture.


Transcript of Prefetch-Aware Cache Management for High Performance Caching

Page 1: Prefetch-Aware Cache Management for High Performance Caching

PACMan: Prefetch-Aware Cache Management for High Performance Caching

Carole-Jean Wu¶, Aamer Jaleel*, Margaret Martonosi¶, Simon Steely Jr.*, Joel Emer*§

Princeton University¶ Intel VSSAD* MIT§

December 7, 2011, International Symposium on Microarchitecture

Page 2

Memory Latency is a Performance Bottleneck

• Many commonly studied memory optimization techniques

• Our work studies two:

– Prefetching

  • For our workloads, prefetching alone improves performance by an avg. of 35%

– Intelligent Last-Level Cache (LLC) Management

[Figure: "LLC management alone". IPC performance normalized to LRU, with no prefetching, for LRU, DRRIP [ISCA `10], SDBP [MICRO `10], and SHiP-PC [MICRO `11]. Y-axis from 0.96 to 1.08.]

Page 3

L2 Prefetcher: LLC Misses

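[Figure: four cores (CPU0 to CPU3), each with private L1I and L1D caches and a private L2 with a prefetcher (PF), sharing one LLC. A prefetch (PF) request that misses in the L2 also misses in the LLC.]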

Page 4

L2 Prefetcher: LLC Hits

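[Figure: four cores (CPU0 to CPU3), each with private L1I and L1D caches and a private L2 with a prefetcher (PF), sharing one LLC. A prefetch (PF) request that misses in the L2 hits in the LLC.]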

Page 5

Prefetching

Intelligent LLC Management

Page 6

Observation 1: For Not-Easily-Prefetchable Applications…

[Figure: IPC performance with prefetching (normalized to LRU without prefetching) for LRU, DRRIP, SDBP, and SHiP-PC. Workloads: doom3, final-fantasy, halflife2 (Mm./Games); GG, IB, tpc-c (Server); bwaves, gemsFDTD, sphinx3 (SPEC CPU2006). Y-axis from 0.5 to 4.5.]

Observation 1: Cache pollution causes unexpected performance degradation despite intelligent LLC Management

Page 7

Observation 2: For Prefetching-Friendly Applications

[Figure: two panels, "SPEC CPU2006, No Prefetching" and "SPEC CPU2006, Prefetching", each showing IPC performance normalized to LRU for LRU, DRRIP, SDBP, and SHiP-PC. Y-axis from 0.9 to 1.1. Annotations: "+6.5%" and "+3.0%".]

Observation 2: Prefetched data in LLC diminishes the performance gains from intelligent LLC management.

Page 8

Design Dimensions for Prefetcher/Cache Management

Approach | Prefetcher Cache Interference | Reduced Perf. Gains from Intelligent LLC Management | Hardware Overhead

Adaptive prefetch filters/buffers | ✔ | ✗ | Some (new hw.)

Prefetch pollution estimation | ✔ | ✗ | Moderate (pf. bit/line)

Perf. counter-based prefetcher manager | ✔ | ✗ | Software

Synergistic management for prefetchers and intelligent LLC management

Page 9

PACMan: Prefetch-Aware Cache Management

Research Question 1: For applications suffering from prefetcher cache pollution, can PACMan minimize such interference?

Research Question 2: For applications already benefiting from prefetching, can PACMan improve performance even more?

Page 10

Talk Outline

• Motivation

• PACMan: Prefetch-Aware Cache Management

– PACMan-M

– PACMan-H

– PACMan-HM

– PACMan-Dyn

• Performance Evaluation

• Conclusion

Page 11

Opportunities for a More Intelligent Cache Management Policy

• A cache line’s state is naturally updated when

– Inserting an incoming cache line @ cache miss

– Updating a cache line’s state @ cache hit

[Figure: Re-Reference Interval Prediction (RRIP) [ISCA `10] state diagram with four states: 0 (immediate), 1 (intermediate), 2 (far), 3 (distant). Transition labels: "Cache line is inserted", "Cache line is re-referenced" (three arrows), "No victim is found" (three arrows), "Cache line is evicted".]

PACMan treats demand and prefetch requests differently at cache insertion and hit promotion
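
The RRIP behavior that this slide builds on can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' simulator: it assumes the static (SRRIP-style) flavor of RRIP with a 2-bit counter per line, insertion at "far" (2), promotion to "immediate" (0) on a hit, eviction of a line at "distant" (3), and aging of every line when no victim is found. The names `Line`, `RRIPSet`, and `access` are illustrative only.

```python
from typing import List, Optional

RRPV_DISTANT = 3   # 2-bit counter: 0 immediate, 1 intermediate, 2 far, 3 distant
RRPV_INSERT = 2    # assumption: new lines are inserted at "far"


class Line:
    def __init__(self, tag: int, rrpv: int) -> None:
        self.tag = tag
        self.rrpv = rrpv   # re-reference prediction value


class RRIPSet:
    """One cache set managed with RRIP (illustrative, not cycle-accurate)."""

    def __init__(self, ways: int) -> None:
        self.lines: List[Optional[Line]] = [None] * ways

    def access(self, tag: int) -> bool:
        """Return True on a hit; on a miss, insert the line and return False."""
        for line in self.lines:
            if line is not None and line.tag == tag:
                line.rrpv = 0              # cache line is re-referenced
                return True
        self.lines[self._victim_way()] = Line(tag, RRPV_INSERT)
        return False

    def _victim_way(self) -> int:
        # Use an empty way first, if any.
        for way, line in enumerate(self.lines):
            if line is None:
                return way
        while True:
            # Evict the first line predicted to be re-referenced in the distant future.
            for way, line in enumerate(self.lines):
                if line.rrpv == RRPV_DISTANT:
                    return way
            # No victim is found: age every line by one and search again.
            for line in self.lines:
                line.rrpv += 1
```

In this sketch the two update points named on the slide are the `Line(tag, RRPV_INSERT)` insertion on a miss and the `line.rrpv = 0` promotion on a hit; those are the two places a prefetch-aware policy could treat demand and prefetch requests differently.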

Page 12

PACMan-M: Treat Prefetch Requests Differently at Cache Misses

• Reducing prefetcher cache pollution at cache line insertion

[Figure: RRIP state diagram (0 immediate, 1 intermediate, 2 far, 3 distant) in which "Cache line is inserted" is labeled separately for "Prefetch" and "Demand" requests.]
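
A prefetch-aware insertion rule of this kind can be sketched as follows. The specific choice shown here, inserting prefetched lines at "distant" (3) while demand lines keep the baseline insertion value, is an assumption based on one reading of the PACMan paper; the slide itself only shows that the two request types are inserted differently. The helper names `insertion_rrpv` and `pick_victim` are hypothetical.

```python
RRPV_FAR = 2
RRPV_DISTANT = 3


def insertion_rrpv(is_prefetch: bool, demand_rrpv: int = RRPV_FAR) -> int:
    """RRPV assigned to a newly inserted line.

    Assumption: prefetched lines are inserted at "distant" so that they are
    the first eviction candidates; demand lines use the baseline value.
    """
    return RRPV_DISTANT if is_prefetch else demand_rrpv


def pick_victim(rrpvs: list) -> int:
    """RRIP victim search over per-way RRPVs (ages the list in place)."""
    while RRPV_DISTANT not in rrpvs:
        for way in range(len(rrpvs)):
            rrpvs[way] += 1
    return rrpvs.index(RRPV_DISTANT)


# Way 0 holds a demand-inserted line, way 1 a prefetch-inserted line.
rrpvs = [insertion_rrpv(is_prefetch=False), insertion_rrpv(is_prefetch=True)]
print(pick_victim(rrpvs))  # 1: the prefetched line is evicted first
```

The `demand_rrpv` parameter is left open because a DRRIP baseline chooses the demand insertion value dynamically.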

Page 13

PACMan-H: Treat Prefetch Requests Differently at Cache Hits

• Retaining more “valuable” cache lines at cache hit promotion

[Figure: RRIP state diagram (0 immediate, 1 intermediate, 2 far, 3 distant) in which each "Cache line is re-referenced" transition is labeled separately for "Prefetch Hit" and "Demand Hit".]
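
A prefetch-aware hit-promotion rule can be sketched in the same style. The behavior shown, leaving a line's prediction unchanged on a prefetch hit and promoting it to "immediate" (0) only on a demand hit, is an assumption based on one reading of the PACMan paper; the slide only states that prefetch hits and demand hits are handled differently. The function name `rrpv_on_hit` is hypothetical.

```python
RRPV_IMMEDIATE = 0


def rrpv_on_hit(current_rrpv: int, is_prefetch: bool) -> int:
    """RRPV of a line after it is hit in the LLC.

    Assumption: a prefetch hit does not change the prediction, so only
    lines re-referenced by demand requests are promoted and retained.
    """
    return current_rrpv if is_prefetch else RRPV_IMMEDIATE


rrpv = 2                                      # line currently at "far"
rrpv = rrpv_on_hit(rrpv, is_prefetch=True)    # prefetch hit: stays at 2
print(rrpv)  # 2
rrpv = rrpv_on_hit(rrpv, is_prefetch=False)   # demand hit: promoted
print(rrpv)  # 0
```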

Page 14

PACMan-HM = PACMan-H + PACMan-M

[Figure: RRIP state diagram (0 immediate, 1 intermediate, 2 far, 3 distant) combining both policies: insertion is labeled separately for "Prefetch Miss" and "Demand Miss", and each re-reference transition is labeled separately for "Prefetch Hit" and "Demand Hit".]

Page 15

PACMan-Dyn dynamically chooses between static PACMan policies

[Figure: set dueling. Three set dueling monitors (SDMs) run Baseline + PACMan-M, Baseline + PACMan-H, and Baseline + PACMan-HM. Each SDM feeds a counter (Cnt policy1, Cnt policy2, Cnt policy3); policy selection takes the MIN of the counters, and the follower sets, selected by set index, use the chosen policy.]
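
The set-dueling selection on this slide can be sketched as below. This is a minimal sketch, not the authors' design: the mapping of set indices to SDMs (`sdm_stride`), the counter width (`counter_max`), the halving step that keeps counters bounded, and the choice to count one event per recorded miss are all assumptions made for illustration. Only the overall structure (three dedicated sample groups, one counter each, followers using the minimum) comes from the slide.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    PACMAN_M = 0
    PACMAN_H = 1
    PACMAN_HM = 2


class SetDuelingSelector:
    """Pick the policy whose sample sets (SDM) have seen the fewest misses."""

    def __init__(self, sdm_stride: int = 64, counter_max: int = 1023) -> None:
        self.sdm_stride = sdm_stride     # hypothetical: one SDM set per policy per stride
        self.counter_max = counter_max   # hypothetical saturation bound
        self.miss_count = {p: 0 for p in Policy}

    def leader_policy(self, set_index: int) -> Optional[Policy]:
        """Policy a set is dedicated to, or None if it is a follower set."""
        offset = set_index % self.sdm_stride
        for policy in Policy:
            if offset == policy.value:
                return policy
        return None

    def record_miss(self, set_index: int) -> None:
        """Count a miss if it occurred in one of the SDM sets."""
        policy = self.leader_policy(set_index)
        if policy is None:
            return
        self.miss_count[policy] += 1
        if self.miss_count[policy] > self.counter_max:
            # Halve every counter so values stay bounded.
            for p in Policy:
                self.miss_count[p] //= 2

    def policy_for(self, set_index: int) -> Policy:
        """SDM sets always use their own policy; followers use the MIN counter."""
        leader = self.leader_policy(set_index)
        if leader is not None:
            return leader
        return min(Policy, key=lambda p: self.miss_count[p])
```

With the default stride, sets 0, 1, and 2 (and every 64th set after them) act as monitors for PACMan-M, PACMan-H, and PACMan-HM respectively, and all other sets follow whichever monitor currently reports the fewest misses. Ties resolve to the policy listed first in `Policy`.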

Page 16

Evaluation Methodology

• CMP$im simulation framework

– 4-way OOO processor

– 128-entry ROB

– 3-level cache hierarchy

  • L1 inst. and data caches: 32KB, 4-way, private, 1-cycle

  • L2 unified cache: 256KB, 8-way, private, 10-cycle

  • L3 last-level cache: 1MB per core, 16-way, shared, 30-cycle

– Main memory: 32 outstanding requests, 200-cycle

• Streamer prefetcher – 16 stream detectors

• DRRIP-based LLC: 2-bit RRIP counter
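
As a rough sanity check on the LLC parameters above, the following sketch works out the cache geometry and the storage taken by the 2-bit RRIP counters. The core count (4, matching the four CPUs drawn on the earlier slides) and the 64-byte line size are assumptions; neither is stated on this slide.

```python
# Values taken from the Evaluation Methodology slide
LLC_BYTES_PER_CORE = 1 << 20   # 1MB per core
LLC_WAYS = 16
RRPV_BITS = 2                  # 2-bit RRIP counter per line

# Assumptions (not stated on the slide)
NUM_CORES = 4
LINE_BYTES = 64

llc_bytes = NUM_CORES * LLC_BYTES_PER_CORE
num_lines = llc_bytes // LINE_BYTES
num_sets = num_lines // LLC_WAYS
rrpv_bytes = num_lines * RRPV_BITS // 8

print(f"LLC size:     {llc_bytes // (1 << 20)} MB")   # 4 MB
print(f"Cache lines:  {num_lines}")                   # 65536
print(f"Sets:         {num_sets}")                    # 4096
print(f"RRPV storage: {rrpv_bytes // 1024} KB")       # 16 KB
```

Under these assumptions the replacement state is 16 KB for a 4 MB cache, which is the per-line state a prefetch-aware insertion or promotion rule would reuse.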

Page 17

PACMan-HM Outperforms PACMan-H and PACMan-M

[Figure: performance normalized to LRU in the presence of prefetching for DRRIP, PACMan-M, PACMan-H, and PACMan-HM. Workloads: doom3, final-fantasy, halflife2 (Multimedia); GG, IB, tpc-c (Server); bwaves, gemsFDTD, sphinx3 (SPEC CPU2006); plus averages for Multimedia, Server, SPEC2K6, and All. Y-axis from 0.8 to 2.]

While PACMan policies improve performance overall, static PACMan policies can hurt some applications, e.g., bwaves and gemsFDTD.

Page 18

PACMan-Dyn: Better and More Predictable Performance Gains

[Figure: performance normalized to LRU in the presence of prefetching for DRRIP, PACMan-M, PACMan-H, PACMan-HM, and PACMan-DYN. Workloads: doom3, final-fantasy, halflife2 (Multimedia); GG, IB, tpc-c (Server); bwaves, gemsFDTD, sphinx3 (SPEC CPU2006); plus averages for Multimedia, Server, SPEC2K6, and All. Y-axis from 0.8 to 2.]

PACMan-Dyn performs the best (overall) while providing more consistent performance gains.

Page 19

PACMan: Prefetch-Aware Cache Management

Research Question 1: For applications suffering from prefetcher cache pollution, can PACMan minimize such interference?

Research Question 2: For applications already benefiting from prefetching, can PACMan improve performance even more?

Page 20

PACMan Combines Benefits of Intelligent LLC Management and Prefetching

[Figure: IPC performance normalized to baseline LRU without prefetching, for LRU, DRRIP, PACMan-HM, and PACMan-DYN, all with prefetching. Groups: Mm./Games, Server, SPEC CPU2006, All. Y-axis from 0.6 to 2.6. Annotations: "22% better" and "15% better"; regions labeled "Prefetch-Induced LLC Interference" and "Prefetching Friendly".]

Page 21

Other Topics in the Paper

• PACMan-Dyn-Local/Global for multiprog. workloads

– An avg. of 21.0% perf. improvement

• PACMan cache size sensitivity

• PACMan for inclusive, non-inclusive, and exclusive cache hierarchies

• PACMan’s impact on memory bandwidth

Page 22

PACMan Conclusion

• First synergistic approach for prefetching and intelligent LLC management

• Prefetch-aware cache insertion and update

– ~21% performance improvement

– Minimal hardware storage overhead

• PACMan’s Fine-Grained Prefetcher Control

– Reduces performance variability from prefetching

Page 23

PACMan: Prefetch-Aware Cache Management for High Performance Caching

Carole-Jean Wu¶, Aamer Jaleel*, Margaret Martonosi¶, Simon Steely Jr.*, Joel Emer*§

Princeton University¶ Intel VSSAD* MIT§

December 7, 2011, International Symposium on Microarchitecture