Evan Z. Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn
An Imitation Learning Approach for Cache Replacement
The Need for Faster Compute
(https://openai.com/blog/ai-and-compute/)
Small cache improvements can make large differences! (Beckman, 2019)
● E.g., 1% cache hit rate improvement → 35% decrease in latency (Cidon et al., 2016)
Caches are everywhere:
● CPU chips
● Operating Systems
● Databases
● Web applications
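Why a one-point change in hit rate can matter so much: a back-of-the-envelope sketch, assuming a miss costs 100 times as much as a hit and a hypothetical baseline hit rate of 98%. These numbers are illustrative only and are not the measurements behind the 35% figure cited above.

```python
def average_latency(hit_rate, hit_cost=1.0, miss_cost=100.0):
    """Expected cost per access, in units of one cache hit.

    hit_cost and miss_cost are assumed values, not measurements.
    """
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


before = average_latency(0.98)  # hypothetical baseline hit rate
after = average_latency(0.99)   # one percentage point higher
reduction = (before - after) / before
print(f"average cost {before:.2f} -> {after:.2f}, {reduction:.0%} lower")
```

Under these assumptions, misses dominate the average cost, so halving the miss rate from 2% to 1% removes roughly a third of the average latency.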
Our goal: Faster applications via better cache replacement policies
TL;DR:
I. We approximate the optimal cache replacement policy by (implicitly) predicting the future
II. Caching is an attractive benchmark for the general reinforcement learning / imitation learning communities
Cache Replacement
[Figure: a cache holding lines A, B, C serves the accesses D, A. The access to D misses, so a line is evicted to make room (here C, leaving A, B, D); the access to A then hits. A hit is about 100x faster than a miss.]
Goal: Evict the cache lines to maximize cache hits
[Figure: the next access is C, which was just evicted, so it misses. Evicting C was a mistake.]
[Figure: the same cache and accesses (D, A, C), with the optimal decision on the first miss highlighted.]
Reuse distance d_t(line): number of accesses from access t until the line is reused
d_0(A) = 1, d_0(B) > 2, d_0(C) = 2
Optimal Policy (Belady’s): Evict the line with the greatest reuse distance (Belady, 1966)
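A minimal sketch of this rule on the running example, assuming the cache holds A, B, C and the accesses are D, A, C (the order is read off the reuse distances above). The function names are illustrative, not from the talk.

```python
import math


def reuse_distance(line, future_accesses):
    """Number of accesses until `line` is next accessed (inf if never)."""
    for distance, accessed in enumerate(future_accesses, start=1):
        if accessed == line:
            return distance
    return math.inf


def belady_evict(cache, future_accesses):
    """Return the cached line with the greatest reuse distance."""
    return max(cache, key=lambda line: reuse_distance(line, future_accesses))


# Access 0 is D (a miss on a full cache); A and C follow.
cache = ["A", "B", "C"]
future = ["A", "C"]  # accesses after the current access to D
distances = {line: reuse_distance(line, future) for line in cache}
victim = belady_evict(cache, future)
print(distances, "-> evict", victim)
```

B is never reused within the visible future, so it has the greatest reuse distance and is the line to evict.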
Belady’s Requires Future Information
Reuse distance d_t(line): number of accesses from access t until the line is reused
Problem: Computing reuse distance requires knowing the future
So in practice, we use heuristics, e.g.:
● Least-recently used (LRU)
● Most-recently used (MRU)
… but these perform poorly on complex access patterns
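To make the gap concrete, here is a small simulation sketch comparing LRU, MRU, and Belady’s on two hypothetical traces with a 3-line fully associative cache. The traces and function names are made up for illustration; they are not the workloads evaluated in the talk.

```python
from collections import OrderedDict


def simulate_recency(accesses, capacity, evict_most_recent=False):
    """Count hits for LRU (default) or MRU on a fully associative cache."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)
            continue
        if len(cache) >= capacity:
            # last=False pops the least recent key, last=True the most recent.
            cache.popitem(last=evict_most_recent)
        cache[line] = True
    return hits


def simulate_belady(accesses, capacity):
    """Count hits when evicting the line reused furthest in the future."""
    cache = set()
    hits = 0
    for t, line in enumerate(accesses):
        if line in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            future = accesses[t + 1:]

            def distance(cached):
                return future.index(cached) if cached in future else len(future) + 1

            cache.remove(max(cache, key=distance))
        cache.add(line)
    return hits


scan = list("ABCD") * 3        # cyclic scan slightly larger than the cache
mixed = list("ABCABDABEABF")   # hot lines A, B mixed with one-off lines
results = {
    name: (simulate_recency(trace, 3),
           simulate_recency(trace, 3, evict_most_recent=True),
           simulate_belady(trace, 3))
    for name, trace in [("scan", scan), ("mixed", mixed)]
}
print(results)  # per trace: (LRU hits, MRU hits, Belady hits)
```

On these two traces, each heuristic matches Belady’s on one pattern and falls well short on the other, which is the failure mode a single fixed heuristic runs into when patterns mix.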
Leveraging Belady’s
Idea: approximate Belady’s from past accesses
[Figure: the learned model sees only past accesses and the current access and outputs a predicted decision. Belady’s also sees future accesses and outputs the optimal decision, which is used for training.]
Prior Work
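[Figure: Hawkeye / Glider, the current state-of-the-art (Shi et al., '19; Jain et al., '18). A model trained on Belady’s takes past accesses and the current access and predicts whether the current line is cache friendly or averse; a traditional algorithm combines that prediction with the current cache state to choose "Evict line X".]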
+ binary classification is relatively easy to learn
- traditional algorithm can’t express optimal policy
Our Approach
[Figure: our proposal. A model trained on Belady’s takes past accesses, the current access, and the current cache state, and directly outputs "Evict line X". The Hawkeye / Glider pipeline (Shi et al., '19; Jain et al., '18) is shown alongside for comparison.]
Our contribution: Directly approximate Belady’s via imitation learning
Cache Replacement Markov Decision Process
[Figure: the running example (Miss, Hit, Miss) annotated as a Markov decision process, with labels for past accesses, the current access, the current cache contents, and the eviction.]
Similar to Wang et al., 2019
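A toy sketch of this formulation, assuming the state is (past accesses, current access, current cache contents) and the action is the line to evict on a miss. The class name and its methods are hypothetical and are not the API of any released environment.

```python
class CacheEnv:
    """Toy cache-replacement MDP over a fixed access trace (hypothetical API)."""

    def __init__(self, accesses, capacity):
        self.accesses = list(accesses)
        self.capacity = capacity
        self.t = 0
        self.cache = []

    def state(self):
        """State: past accesses, current access, current cache contents."""
        return (tuple(self.accesses[:self.t]),
                self.accesses[self.t],
                tuple(self.cache))

    def done(self):
        return self.t >= len(self.accesses)

    def step(self, evict=None):
        """Serve the current access and advance to the next one.

        `evict` names the line to replace on a miss with a full cache.
        Returns True on a hit, False on a miss.
        """
        line = self.accesses[self.t]
        hit = line in self.cache
        if not hit:
            if len(self.cache) >= self.capacity:
                self.cache.remove(evict)
            self.cache.append(line)
        self.t += 1
        return hit


env = CacheEnv(accesses="DAC", capacity=3)
env.cache = ["A", "B", "C"]            # start from the example's full cache
first_state = env.state()
outcomes = [env.step(evict="C")]       # miss on D; the policy evicts C
outcomes.append(env.step())            # hit on A
outcomes.append(env.step(evict="B"))   # miss on C, evicted one step earlier
print(first_state, outcomes)
```

Replaying the example with the mistaken eviction of C reproduces the Miss, Hit, Miss sequence; a policy only has a real choice on misses when the cache is full.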
Leveraging the Optimal Policy
Typical imitation learning setting (Pomerleau, 1991; Ross et al., 2011; Kim et al., 2013)
[Figure: a state is fed to the learned policy, which is optimized toward the optimal action so that the learned policy approximates the optimal policy.]
Observation: Not all errors are equally bad
● Learning from optimal policy yields greater training signal
Concretely: minimize a ranking loss
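One way to see why a ranking loss captures "not all errors are equally bad": a pairwise hinge loss, used here purely as a stand-in since the slide does not spell out the exact loss. The scores, reuse distances, and margin below are hypothetical.

```python
def pairwise_ranking_loss(scores, reuse_distances, margin=1.0):
    """Hinge loss over all line pairs: the line with the larger reuse
    distance should receive the higher eviction score."""
    loss = 0.0
    lines = list(scores)
    for i in lines:
        for j in lines:
            if reuse_distances[i] > reuse_distances[j]:
                loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss


# Reuse distances as the oracle would compute them (hypothetical values).
oracle = {"A": 1, "B": 9, "C": 2}

# Eviction scores from three imagined models; the top score is evicted.
perfect = pairwise_ranking_loss({"A": 0.0, "B": 2.0, "C": 1.0}, oracle)
near_miss = pairwise_ranking_loss({"A": 0.0, "B": 1.0, "C": 2.0}, oracle)
worst = pairwise_ranking_loss({"A": 2.0, "B": 1.0, "C": 0.0}, oracle)
print(perfect, near_miss, worst)
```

A model that would evict the second-best line (C) is penalized less than one that would evict the line needed soonest (A), whereas a plain "did it pick the optimal line" objective would treat both errors the same.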
Reuse Distance as an Auxiliary Task
Observation: predicting reuse distance is correlated with cache replacement
● Cast this as an auxiliary task (Jaderberg et al., 2016)
[Figure: state s_t is mapped to a state embedding, from which both the policy and the reuse distance are predicted; both feed the loss.]
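A rough sketch of how an auxiliary term might enter the loss, assuming a squared error on log reuse distances and a fixed weight. The log scale, the error measure, and the weight of 0.5 are all assumptions made for illustration, not details taken from the slide.

```python
import math


def reuse_distance_loss(predicted_log_distances, true_distances):
    """Mean squared error between predicted and true log reuse distances."""
    errors = [
        (predicted_log_distances[line] - math.log(true_distances[line])) ** 2
        for line in true_distances
    ]
    return sum(errors) / len(errors)


def total_loss(policy_loss, auxiliary_loss, auxiliary_weight=0.5):
    """Policy loss plus a weighted auxiliary term (the weight is assumed)."""
    return policy_loss + auxiliary_weight * auxiliary_loss


true_distances = {"A": 1, "B": 9, "C": 2}  # hypothetical oracle values
exact = {line: math.log(d) for line, d in true_distances.items()}
off_by_one = {line: value + 1.0 for line, value in exact.items()}

loss_exact = total_loss(2.0, reuse_distance_loss(exact, true_distances))
loss_off = total_loss(2.0, reuse_distance_loss(off_by_one, true_distances))
print(loss_exact, loss_off)
```

Both terms are computed from the same state embedding, so gradients from the reuse-distance prediction also shape the representation the policy uses.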
Results
[Figure: cache-hit rate results, with the LRU cache-hit rate and the optimal cache-hit rate marked.]
~19% cache-hit rate increase over Glider (Shi et al., 2019) on memory-intensive SPEC2006 applications (Jaleel et al., 2009)
~64% cache-hit rate increase over LRU on Google Web Search
A Note on Practicality
This work: Establish a proof-of-concept
Per-byte address embedding
● Reduce embedding size from 100MB to <10KB
● ~6% cache-hit rate increase on SPEC2006 vs. Glider
● ~59% cache-hit rate increase on Google Web Search vs. LRU
[Figure: the bytes of an address (Byte 1, Byte 2, Byte 3, ...) are embedded separately and passed through a linear layer to form the address embedding.]
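A minimal sketch of the per-byte idea, assuming one small embedding table per byte position followed by a single linear layer. The sizes, the random untrained weights, the example address, and the count of one million distinct addresses are all hypothetical and do not reproduce the 100MB and <10KB figures above.

```python
import random


def make_tables(num_bytes, byte_dim, seed=0):
    """One 256-row embedding table per byte position (random, untrained)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1, 1) for _ in range(byte_dim)] for _ in range(256)]
            for _ in range(num_bytes)]


def make_linear(in_dim, out_dim, seed=1):
    """Weight matrix of a bias-free linear layer (random, untrained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def embed_address(address, tables, linear):
    """Embed each byte of `address`, concatenate, then apply the linear layer."""
    raw = address.to_bytes(len(tables), "big")
    concatenated = [x for table, byte in zip(tables, raw) for x in table[byte]]
    return [sum(w * x for w, x in zip(row, concatenated)) for row in linear]


NUM_BYTES, BYTE_DIM, OUT_DIM = 4, 2, 8   # illustrative sizes only
tables = make_tables(NUM_BYTES, BYTE_DIM)
linear = make_linear(NUM_BYTES * BYTE_DIM, OUT_DIM)
embedding = embed_address(0x1234ABCD, tables, linear)  # hypothetical address

# Parameter counts: per-byte tables plus linear layer vs. one row per address.
per_byte_params = NUM_BYTES * 256 * BYTE_DIM + NUM_BYTES * BYTE_DIM * OUT_DIM
full_table_params = 1_000_000 * OUT_DIM  # assumes 1M distinct addresses
print(len(embedding), per_byte_params, full_table_params)
```

The per-byte parameter count depends only on the address width and the embedding sizes, not on how many distinct addresses the program touches, which is where the size reduction comes from.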
Future work: Production-ready learned policies
● Smaller models via distillation (Hinton et al., 2015), pruning (Janowsky, 1989; Han et al., 2015; Sze et al., 2017), or quantization
● Target domains with longer latency and larger caches (e.g., software caches)
A New Imitation / Reinforcement Learning Benchmark
● Bellemare et al., 2012; Silver et al., 2017; OpenAI, 2019; Vinyals et al., 2019: + plentiful data, - delayed real-world utility
● Levine et al., 2016; Lillicrap et al., 2015: - limited / expensive data, + immediate real-world impact
● Cache replacement: + plentiful data, + immediate real-world impact
Open-source cache replacement Gym environment coming soon!
Takeaways
● A new state-of-the-art approach for cache replacement by imitating the oracle policy
○ Future work: making this production ready
● A new benchmark for imitation learning / reinforcement learning research