Re-examining Instruction Reuse in Pre-execution Approaches
Transcript of Re-examining Instruction Reuse in Pre-execution Approaches
Re-examining Instruction Reuse in Pre-execution Approaches
By Sonya R. Wolff
Prof. Ronald D. Barnes
June 5, 2011
Processor Stalls
• Instruction Dependences
  – Compiler Optimization
  – Dynamic Scheduling
• Branch Instructions
  – Branch Predictors
• Memory Accesses
  – Caches
  – Non-Blocking Caches
  – Cache Prefetchers
  – Pre-Execution
2
Code Example
3
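The slide's actual code example survives only as an image in this transcript. As a hypothetical stand-in (not the slide's original code), the pattern pre-execution targets can be sketched as a pointer-chasing loop, where each load depends on the previous one, so a long-latency miss on any node stalls all later iterations:

```python
# Hypothetical stand-in for the slide's code example (the original is an
# image and not recoverable from this transcript): a pointer-chasing loop.
# Each load depends on the previous node's pointer, so an L2 miss on any
# node serializes the remaining iterations behind that miss.

class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node   # following this link is a dependent load

def chase(head):
    total = 0
    node = head
    while node is not None:
        total += node.value     # cannot proceed until node's load returns
        node = node.next
    return total

# Build a 4-node list: 3 -> 1 -> 4 -> 1
head = Node(3, Node(1, Node(4, Node(1))))
print(chase(head))  # -> 9
```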
What is Pre-Execution?
In-Order
Pre-Execution
4
Pre-Execution Techniques
• Out-of-Order Execution
• Run Ahead Execution
  – Run Ahead [Dundas1997], [Mutlu2003]
  – Continuous Flow Pipelines [Srinivasan2004], [Hilton2009]
  – Two-Pass Pipelining [Barnes2006]
  – Dual-Core Execution [Zhou2005]
  – Multi-Pass Pipelining [Barnes2006]
  – Rock’s Execute Ahead [Chaudhry2002]
5
Run Ahead Execution
• In-Order and Out-of-Order Pipeline Designs
• Two Modes of Operation
  – Normal Mode: Pipeline functions in the traditional manner
  – Run-Ahead Mode: Instructions are retired without altering the machine state
• Run-Ahead Entry on Very Long Latency Memory Operations
• Upon Run-Ahead Exit, Program Counter Set to the Instruction After the Run-Ahead Entry Point
• Instruction Results from Run-Ahead Mode Not Reused During Normal Mode Operation
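The two-mode operation described above can be sketched minimally as follows; the class and method names are illustrative (not from the talk), and PCs are modeled as simple instruction indices:

```python
# Minimal sketch of run-ahead's two modes of operation. All names are
# illustrative; a real pipeline model tracks far more state.

NORMAL, RUN_AHEAD = "normal", "run-ahead"

class RunAheadPipeline:
    def __init__(self):
        self.mode = NORMAL
        self.entry_pc = None          # index of the load that triggered entry

    def on_long_latency_miss(self, pc):
        """Enter run-ahead mode instead of stalling on a very long latency miss."""
        if self.mode == NORMAL:
            self.mode = RUN_AHEAD     # checkpoint taken; keep fetching past the miss
            self.entry_pc = pc

    def retire(self, pc, result, arch_state):
        """Retire an instruction; only normal mode updates machine state."""
        if self.mode == NORMAL:
            arch_state[pc] = result
        # in run-ahead mode the result is dropped: architectural state untouched

    def on_miss_return(self):
        """Exit run-ahead: resume at the instruction after the entry point."""
        resume_pc = self.entry_pc + 1
        self.mode = NORMAL
        self.entry_pc = None
        return resume_pc
```

Note that `retire` in run-ahead mode discards its result, which is exactly the behavior the last bullet describes and the behavior that reuse (next slides) changes.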
6
What is Reuse?
Run Ahead
Run Ahead with Reuse
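Reuse means saving results produced during run-ahead mode and, once back in normal mode, skipping re-execution when an instruction's inputs are unchanged. A sketch of such a reuse buffer (names and structure are illustrative, not from the talk):

```python
# Sketch of a reuse buffer: results computed during run-ahead are recorded
# together with their input operand values; back in normal mode, an
# instruction whose operands still match can reuse the saved result
# instead of re-executing. (Illustrative structure, not from the talk.)

class ReuseBuffer:
    def __init__(self):
        self.entries = {}  # pc -> (operand values, result)

    def record(self, pc, operands, result):
        self.entries[pc] = (tuple(operands), result)

    def lookup(self, pc, operands):
        entry = self.entries.get(pc)
        if entry and entry[0] == tuple(operands):
            return entry[1]   # hit: saved result is still valid
        return None           # miss: operands changed, must re-execute

buf = ReuseBuffer()
buf.record(0x40, [7, 3], 10)      # saved while in run-ahead mode
print(buf.lookup(0x40, [7, 3]))   # -> 10 (reused)
print(buf.lookup(0x40, [7, 4]))   # -> None (re-execute)
```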
7
Instruction Reuse Questions
• Previously Shown to be Ineffective for Out-of-Order Processors [Mutlu2005]
• Why is reuse ineffective for out-of-order pipelines?
• Is reuse ineffective for in-order pipelines?
• If reuse is effective for in-order pipelines, what causes the behavioral difference?
• How do pipeline variations affect reuse in run-ahead pipelines?
8
Simulation Setup
• Two Processor Models: In-Order and Out-of-Order
• 8-wide Instruction Fetch and Issue
• L1 Cache: 64 KB, 2 cycle latency, 4 loads per cycle
• L2 Cache: 1 MB, 10 cycle latency, 1 load per cycle
• Memory: 500 cycle latency, 4:1 bus frequency ratio
• Simulations
  – SPEC CPU2006 benchmarks (reference inputs)
  – Compiled for x86, 64-bit
  – 250 million instructions simulated
  – 25 million instruction warm-up period
  – Regions chosen for statistical relevance [Sherwood2002]
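The machine parameters above can be collected into a single configuration record; the key names are my own, but the values are transcribed directly from the slide:

```python
# Simulated machine parameters as listed on this slide (key names are
# illustrative; values come from the slide).
sim_config = {
    "fetch_issue_width": 8,  # instructions per cycle
    "l1_cache": {"size_kb": 64, "latency_cycles": 2, "loads_per_cycle": 4},
    "l2_cache": {"size_mb": 1, "latency_cycles": 10, "loads_per_cycle": 1},
    "memory":   {"latency_cycles": 500, "bus_freq_ratio": "4:1"},
    "benchmarks": "SPEC CPU2006 (reference inputs, x86 64-bit)",
    "simulated_instructions": 250_000_000,
    "warmup_instructions": 25_000_000,
}
```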
9
Normalized Cycles (Normalized to In-Order)
10
Run Ahead Entries (Values in Thousands)
11
In-Order L2 Cache Misses (Values in Thousands)
12
Normalized Cycles (Normalized to In-Order)
13
Percentage of Clean Memory Accesses
14
Summary of Reuse Results
• Out-of-Order Reuse vs. Run Ahead Only
  – Average: 1.03×
  – Maximum (mcf): 1.12×
  – Reduced Set: 1.05×
• In-Order Reuse vs. Run Ahead Only
  – Average: 1.09×
  – Maximum (lbm): 1.47×
  – Reduced Set: 1.14×
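These speedups are ratios of cycle counts, so 1.0 means no change. A one-line helper makes the convention explicit; the example cycle counts below are invented purely to illustrate the slide's 1.47× maximum, only their ratio matters:

```python
# Speedup as reported above: cycles without reuse divided by cycles with
# reuse. The cycle counts here are made up for illustration.
def speedup(cycles_run_ahead_only, cycles_with_reuse):
    return cycles_run_ahead_only / cycles_with_reuse

# e.g. 147 M cycles dropping to 100 M cycles is the 1.47x maximum (lbm):
print(round(speedup(147e6, 100e6), 2))  # -> 1.47
```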
15
In-Order
Run Ahead
Run Ahead with Reuse
16
Out-of-Order
Run Ahead
Run Ahead with Reuse
17
Variations and Expectations
• Main Memory Latency (1000, 500, and 100 cycles)
  – Reduced Run-Ahead Benefit at Lower Latency
  – Convergence of In-Order and Out-of-Order
  – Increased Reuse Benefit at Higher Latency
• Fetch and Issue Width (8, 4, 3, and 2)
  – Increased Reuse Benefit at Smaller Issue Widths
  – Convergence of In-Order and Out-of-Order
18
Variations on Memory Latency (Normalized to In-Order for Each Latency)
19
Variations on Issue Width (Normalized to In-Order for Each Issue Width)
20
Conclusion
• Pre-Execution is a Highly Effective Way to Deal with Long Latency Memory Accesses
• Reuse of Run Ahead Results Provides Little to No Speedup for Out-of-Order Pipelines
  – Average Speedup = 1.03×
  – Maximum Speedup = 1.12× (mcf)
• Reuse of Run Ahead Results Provides Speedup for In-Order Pipelines
  – Average Speedup = 1.09×
  – Maximum Speedup = 1.47× (lbm)
21
Additional Slides
22
Run Ahead Entries (Values in Thousands)
23
Percentage of Clean Memory Accesses
24
Variations on Memory Latency (Normalized to In-Order with 1000 Cycle Latency)
25
Variations on Stage Widths (Normalized to Width 2 In-Order)
26