PACMAN: P Approximately Optimalclump/cdp2010/Gu_PACMAN.pdf · The Transformation • Loop unrolling...
PACMAN: Program-level Approximately Optimal Cache Management
Xiaoming Gu, Chen Ding
Department of Computer Science
University of Rochester
What is PACMAN?
• An ongoing compiler study to reduce cache misses using hardware bypassing support
• A famous video game
The Framework
[Diagram: Analysis (identify references for bypassing) → Transformation (tag references with bypassing), with a feedback edge between the two stages]
Outline
• Motivation
• Hardware Bypassing Support
• An Example
• Details of Analysis and Transformation
• Summary
Motivation
• Running a sequential program alone on a multi-core chip
  • hard to parallelize the program
  • running alone to avoid interference
  • reduce cache misses
PACMAN focuses on reducing cache misses for sequential programs running alone
Normal LRU Inst.
[Figure: cache state under a normal LRU instruction, with the evicted block marked]
Bypass LRU Inst.
[Figure: cache state under a bypass LRU instruction, with the evicted block marked]
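A minimal Python sketch of the two instruction types on a fully associative cache. The slides show only diagrams here, so the semantics are an assumption: a normal access leaves the block at the MRU position, while a bypass access leaves it at the LRU position so that it becomes the next victim. The names `BypassLRUCache` and `count_misses` are hypothetical.

```python
from collections import OrderedDict


class BypassLRUCache:
    """Toy fully associative cache with normal and bypass LRU accesses."""

    def __init__(self, capacity):
        self.capacity = capacity
        # First key is the LRU position, last key is the MRU position.
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, line, bypass=False):
        """Access one cache line; return True on a hit."""
        hit = line in self.lines
        if hit:
            del self.lines[line]
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                # Both instruction types evict the block at the LRU position.
                self.lines.popitem(last=False)
        self.lines[line] = None  # place at the MRU position
        if bypass:
            # Assumed semantics: a bypass access leaves the block at the
            # LRU position, so it is the next block to be evicted.
            self.lines.move_to_end(line, last=False)
        return hit


def count_misses(trace, capacity):
    """trace is a list of (line, bypass) pairs."""
    cache = BypassLRUCache(capacity)
    for line, bypass in trace:
        cache.access(line, bypass)
    return cache.misses
```

On the cyclic trace `a b c a b c a b c` with room for two blocks, plain LRU misses on all 9 accesses. Tagging the accesses to `b` and `c` as bypass keeps `a` resident and lowers the count to 7.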
SOR
• Jacobi Successive Over-relaxation
• from NIST SciMark 2.0
• a classical stencil computation
• compiled by LLVM 2.7 using gold plugin with -O4
The Gap between LRU and OPT
[Figure: LRU vs. OPT for SOR, NUM_ITERATIONS=10, M=N=512, with the gaps annotated]
• Two gaps
• The working sets (knees) of OPT are much smoother than LRU’s
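To make the LRU/OPT gap concrete, here is a small Python sketch that counts misses under LRU and under Belady's OPT for a toy trace. It is an illustration only, not the simulator used in the study; `lru_misses` and `opt_misses` are hypothetical names, and the cache is fully associative with one block per trace element.

```python
def lru_misses(trace, capacity):
    """Count misses under LRU replacement."""
    stack = []  # index 0 is the LRU position, the end is the MRU position
    misses = 0
    for block in trace:
        if block in stack:
            stack.remove(block)
        else:
            misses += 1
            if len(stack) >= capacity:
                stack.pop(0)
        stack.append(block)
    return misses


def opt_misses(trace, capacity):
    """Count misses under Belady's OPT (evict the block reused furthest ahead)."""
    inf = float("inf")
    # next_use[i] is the index of the next access to trace[i], or infinity.
    next_use = [inf] * len(trace)
    seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen.get(trace[i], inf)
        seen[trace[i]] = i

    cache = {}  # block -> index of its next use
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]
    return misses
```

For the cyclic trace `list("abc" * 4)` and capacities 1, 2, 3, LRU gives 12, 12, 3 misses while OPT gives 12, 7, 3. LRU shows a sharp knee at the working-set size, whereas OPT degrades gradually, which is the kind of gap the slide points at.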
The Transformation
Normal and bypass LRU instructions mixed ==> LRU Bypassing
The Improvement of PACMAN
[Figure: results for NUM_ITERATIONS=10, M=N=512]
• The gap at the second working set disappears!
The Analysis
• Simulate OPT
  • for a given cache configuration
  • each run-time access has three fields
    • data addr, static ref. ID, and bypassing flag (off by default)
  • when an eviction happens, set the bypassing flag on for the most recent previous access to the victim
[Diagram: access sequence A1, A2, ..., Ai, ..., Aj, ..., AN. X is accessed at Ai and evicted at Aj, with no access to X in between, so the bypassing flag of Ai is set on.]
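The flag-setting rule can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a fully associative cache, one address per cache line, and Belady's OPT as the replacement policy. The `Access` record and `tag_bypass_with_opt` are hypothetical names, not the tool's actual interface.

```python
from dataclasses import dataclass


@dataclass
class Access:
    addr: int             # data address (here: one address per cache line)
    ref_id: int           # static reference ID
    bypass: bool = False  # bypassing flag, off by default


def tag_bypass_with_opt(trace, capacity):
    """Simulate OPT and flag the last access to each victim before its eviction."""
    inf = float("inf")
    # next_use[i] is the index of the next access to trace[i].addr.
    next_use = [inf] * len(trace)
    seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen.get(trace[i].addr, inf)
        seen[trace[i].addr] = i

    cache = {}  # addr -> (index of next use, index of most recent access)
    for i, acc in enumerate(trace):
        if acc.addr not in cache and len(cache) >= capacity:
            # OPT evicts the block whose next use lies furthest in the future.
            victim = max(cache, key=lambda a: cache[a][0])
            _, last_idx = cache.pop(victim)
            # No access to the victim occurred since last_idx, so flag it.
            trace[last_idx].bypass = True
        cache[acc.addr] = (next_use[i], i)
    return trace
```

For example, on the cyclic address trace `[0, 1, 2] * 4` with capacity 2, the accesses at positions 1, 3, 5, 7 and 9 end up flagged.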
The Analysis (cont’d)
• Simulate OPT (cont’d)
• calculate bypassing ratios for all memory references
bypassing ratio of a reference = (#accesses generated by the reference and with bypassing flag on) / (#accesses generated by the reference)
• the references with high bypassing ratios are the candidates for bypassing
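A short Python sketch of the ratio computation, assuming the flagged trace is available as (ref. ID, bypassing flag) pairs. The slides only say "high" bypassing ratios, so the `threshold` value below is a hypothetical choice, as are the function names.

```python
from collections import Counter


def bypassing_ratios(accesses):
    """accesses: iterable of (ref_id, bypass_flag) pairs from the OPT simulation."""
    total = Counter()
    flagged = Counter()
    for ref_id, bypass in accesses:
        total[ref_id] += 1
        if bypass:
            flagged[ref_id] += 1
    return {ref_id: flagged[ref_id] / total[ref_id] for ref_id in total}


def bypass_candidates(ratios, threshold=0.9):
    """Pick references whose ratio meets the (assumed) threshold."""
    return sorted(ref_id for ref_id, ratio in ratios.items() if ratio >= threshold)
```

A reference whose accesses are almost always the last use before eviction gets a ratio near 1 and becomes a candidate; a reference with a mixed pattern does not.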
The Transformation
• Loop unrolling
  • find the target references in the IR using the candidates’ ref. IDs
  • figure out the last touch to a cache line in the innermost loop body, based on
    • the cache line size
    • the array element size
    • the loop step stride
    • the array indexing
  • separate the last touch using loop unrolling
  • tag the last touch with bypass LRU
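The last-touch calculation can be sketched in Python for the simple case of an access `A[j]` whose index grows by a fixed stride. This is an illustration under assumptions the slides do not state: the array base is line-aligned by default, the index is monotonically increasing, and the line size is a multiple of the per-iteration byte step. `unroll_factor` and `last_touch_flags` are hypothetical names.

```python
def unroll_factor(line_size, elem_size, stride):
    """Number of consecutive iterations that touch the same cache line."""
    step_bytes = elem_size * stride
    if step_bytes >= line_size:
        return 1  # every iteration already touches a new line
    return line_size // step_bytes


def last_touch_flags(n, line_size=64, elem_size=8, stride=1, base=0):
    """For j in range(0, n, stride) accessing A[j], mark the iterations
    that are the last touch of their cache line within the loop."""
    indices = list(range(0, n, stride))
    lines = [(base + j * elem_size) // line_size for j in indices]
    return [
        i == len(indices) - 1 or lines[i + 1] != lines[i]
        for i in range(len(indices))
    ]
```

With 64B lines and 8-byte elements at unit stride, `unroll_factor` gives 8, so unrolling the loop 8 times isolates the last touch of each line in the final copy of the body. Only that copy would be tagged with bypass LRU, and the other seven accesses keep their normal instructions.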
SOR by PACMAN
• Gim1[j] is the candidate
• only do bypassing for the last touch in the innermost loop body
• spatial locality retained
• use loop unrolling to do separation in practice
[Figure: NUM_ITERATIONS=10, M=N=512; fully-associative, 512KB, line size=64B]
Why Is Only the Second Working Set Improved?
• PACMAN runs the OPT simulation only at the second working set (at 512KB)
• so cache misses are reduced for the second working set
Set-associative Cache
• The benefit keeps shrinking on caches with lower associativity
• The improvement is still significant
[Figure: NUM_ITERATIONS=10, M=N=512]
With a Different Input
• The improvement scales with the input size
[Figures: NUM_ITERATIONS=10, M=N=512 and NUM_ITERATIONS=20, M=N=1024 (8X accesses)]
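The 8X figure can be checked with a quick calculation, assuming the number of accesses in SOR grows in proportion to NUM_ITERATIONS × M × N (each iteration sweeps the whole grid). The function name and default arguments below are illustrative only.

```python
def relative_accesses(iters, m, n, base_iters=10, base_m=512, base_n=512):
    """Access count relative to the training input, assuming the count
    is proportional to NUM_ITERATIONS * M * N."""
    return (iters * m * n) / (base_iters * base_m * base_n)
```

For the larger input, `relative_accesses(20, 1024, 1024)` evaluates to 8.0: twice the iterations on a grid with four times as many points.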
Future Work
• extend PACMAN for general applications
• more realistic hardware environment
• multi-level pseudo-LRU cache
• the differences between loads and stores
• the interaction with prefetching
Summary
• Use simple hardware bypassing support
• Find bypassing references by simulating OPT
• Cache misses are reduced to very close to the optimal case
• Achieve significant improvement even on low-associativity caches
• The training results can be used for a real run with a larger input size
Q & A