BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
June 28th, 2006
Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
2/25June 28th, 2006 BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
What Happens on a Branch Misprediction?
Execution Timeline
1. Predict a Branch Outcome (execution follows the Predicted Path)
2. Misprediction Discovered
3. Recover Processor State
4. Redirect Fetch
5. Resume Execution (on the Correct Path)
• We wish to make the recovery fast
State-of-the-art recovery
• Existing mechanisms
– Reorder buffer based: slow
– Instantaneous checkpoints: faster
• Problem: can’t have enough checkpoints
• State-of-the-art solution: checkpoint prediction
– Allocate the few checkpoints judiciously
• Another degree of freedom: speculation control
– Sometimes deeper speculation = higher recovery cost
• Can hurt performance
– Throttle speculation
BranchTap Results / Benefits
• No additional checkpoints are needed
• Dynamically adapts to application behavior
• Improves performance for most programs
– Misprediction performance penalty reduced by 28% on AVG
• BranchTap comes “for free”
– Very simple to implement
– Better than more accurate checkpoint predictors
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
State Recovery Example: Register Alias Table
• RAT: maps each Architectural Register to a Physical Register
Original Code
A add r1, r2, 100
B breq r1, E
C sub r1, r2, r2
Renamed Code
A add p4, p2, 100
B breq p4, E
C sub p5, p2, p2
[Figure: RAT with labels "# arch. regs" and "Lg(# arch. regs)"; entries shown: p1, p2, p3, p4, p5]
ROB: Slow, Fine-Grain Recovery
• Too slow: recovery latency proportional to number of instructions to squash
1. Misprediction discovered
2. Locate newest instruction
3. Undo RAT updates in reverse order
[Figure: Reorder Buffer in Program Order holding several branches (B); labels: RAT, INVALID]
Each entry contains
1. Architectural destination register
2. Its previous RAT map
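The ROB-based recovery steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `RobEntry` record, the dictionary-based RAT, the function names, and the initial mapping of r1 to p1 are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class RobEntry:
    # Hypothetical record holding the two fields listed on the slide.
    arch_dest: str   # architectural destination register
    prev_map: str    # RAT mapping that this instruction overwrote

def rename(rat, rob, arch_dest, new_phys):
    """Rename one destination and log the old mapping in the ROB."""
    rob.append(RobEntry(arch_dest, rat[arch_dest]))
    rat[arch_dest] = new_phys

def rob_recover(rat, rob, keep):
    """Undo RAT updates newest-first until only `keep` entries remain.

    Returns the number of undo steps, which equals the number of
    squashed entries: latency grows with the amount of work to squash.
    """
    steps = 0
    while len(rob) > keep:
        entry = rob.pop()                      # newest instruction first
        rat[entry.arch_dest] = entry.prev_map  # restore previous mapping
        steps += 1
    return steps

# Renamed code from the RAT slide; r1 -> p1 initially is an assumption.
rat = {"r1": "p1", "r2": "p2"}
rob = []
rename(rat, rob, "r1", "p4")   # A: add r1, r2, 100
branch_pos = len(rob)          # B: breq r1, E (no destination register)
rename(rat, rob, "r1", "p5")   # C: sub r1, r2, r2 (wrong path)
squashed = rob_recover(rat, rob, branch_pos)
```

Here only instruction C is squashed, so a single undo step restores r1 to p4. With hundreds of wrong-path instructions in flight, the same loop would run hundreds of times.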
Global Checkpoints: Fast, Coarse-Grain Recovery
• Branch w/ GC: Recovery is “Instantaneous”
1. Misprediction discovered
[Figure: Reorder Buffer in Program Order holding several branches (B), with four checkpoint labels; labels: RAT, INVALID]
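Checkpoint-based recovery can be contrasted with the ROB walk in a few lines. The sketch below is an assumption-laden illustration: the `CheckpointedRat` class, its method names, and the choice to release the mispredicted branch's checkpoint together with all younger ones are hypothetical, not taken from the talk.

```python
class CheckpointedRat:
    """RAT plus a small, fixed pool of full-copy checkpoints (sketch)."""

    def __init__(self, mapping, max_checkpoints):
        self.rat = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}  # branch id -> RAT snapshot, oldest first

    def take_checkpoint(self, branch_id):
        """Snapshot the RAT at a branch; return False if the pool is full."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[branch_id] = dict(self.rat)
        return True

    def recover(self, branch_id):
        """Restore the snapshot in one step, whatever the squash size."""
        ids = list(self.checkpoints)
        pos = ids.index(branch_id)
        self.rat = dict(self.checkpoints[branch_id])
        # Assumed policy: free this checkpoint and all younger ones.
        for stale in ids[pos:]:
            del self.checkpoints[stale]

crat = CheckpointedRat({"r1": "p4", "r2": "p2"}, max_checkpoints=4)
took = crat.take_checkpoint("B")   # checkpoint at branch B
crat.rat["r1"] = "p5"              # wrong-path rename by C
crat.recover("B")                  # single restore, no ROB walk
```

The restore cost does not depend on how many wrong-path instructions were renamed, which is why the slide calls it “instantaneous”. The price is the `max_checkpoints` limit: `take_checkpoint` fails once the pool is exhausted.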
Impact of More Checkpoints
• More checkpoints?
– Power hungry structure
– Increased delay
• Only a few checkpoints can practically be implemented
– Cannot always cover all branches
[Figure: RAT Concept (architectural register to physical register) next to the Actual Implementation (Working Copy plus checkpoints)]
Intelligent Checkpointing
• State-of-the-art solution
– Checkpoint allocation: Allocate checkpoints at hard-to-predict branches
– Checkpoint management: Release checkpoints as soon as they are no longer needed
• Use few checkpoints efficiently
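The allocation policy above can be illustrated with a small confidence estimator. The talk does not say which estimator is used; the sketch below assumes a table of resetting counters indexed by branch PC, and the class name, the counter limits, and the `should_checkpoint` helper are hypothetical.

```python
class ConfidenceTable:
    """Resetting-counter confidence estimator (an assumed scheme)."""

    def __init__(self, entries=1024, max_count=15, threshold=15):
        self.entries = entries
        self.max_count = max_count   # saturation value (illustrative)
        self.threshold = threshold   # below this, the branch is low confidence
        self.counters = [0] * entries

    def _index(self, pc):
        return pc % self.entries

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            # Count consecutive correct predictions, saturating.
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            # A single misprediction resets the counter.
            self.counters[i] = 0

def should_checkpoint(table, pc, free_checkpoints):
    """Spend a checkpoint only on a hard-to-predict branch."""
    return free_checkpoints > 0 and table.is_low_confidence(pc)

table = ConfidenceTable(entries=1024, max_count=3, threshold=3)
before = should_checkpoint(table, 0x400, free_checkpoints=4)
for _ in range(3):
    table.update(0x400, prediction_correct=True)
after = should_checkpoint(table, 0x400, free_checkpoints=4)
```

A branch that has just been predicted correctly three times in a row no longer receives a checkpoint, which leaves the few available checkpoints for branches that are more likely to mispredict.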
Conventional Mechanisms: Recovery Scenarios
• Misspeculation on a branch w/ a GC: Direct recovery (Fast Recovery)
• Misspeculation on a branch w/o a GC: Indirect recovery (Slow Recovery)
• With intelligent checkpointing:
– 30% of recoveries are indirect, yet they account for 75% of the performance loss
[Figure: two ROB diagrams with branches (B) and checkpoints, labeled Fast Recovery and Slow Recovery]
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
BranchTap Motivation
• Sometimes, it is better to wait if no checkpoint is available
[Figure: two ROB diagrams, "No Wait Scenario" and "Wait Scenario", each showing branches (B), checkpoints, a low confidence branch, the point where the misprediction is discovered, and the approximate recovery cost]
BranchTap Concept
• Key idea: stall when speculation is likely to deteriorate performance
– Count the number of low confidence branches w/o a checkpoint
– If it exceeds a threshold, stall
• Threshold selection
– Fixed
• Varies greatly across programs
• Can deteriorate performance significantly
– Adaptive
• Robust performance
• Minimize recovery cost while conserving good speculation opportunities
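The counting rule in the key idea needs little more than a counter and a comparator. The sketch below is one possible reading, not the authors' hardware: the `SpeculationThrottle` class, its event methods, and the decision to track only in-flight branches are assumptions.

```python
class SpeculationThrottle:
    """Counts in-flight low confidence branches that lack a checkpoint."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.unprotected = 0

    def on_branch_dispatched(self, low_confidence, has_checkpoint):
        # Only low confidence branches without a checkpoint are counted.
        if low_confidence and not has_checkpoint:
            self.unprotected += 1

    def on_branch_done(self, low_confidence, had_checkpoint):
        # Called when a counted branch resolves or is squashed (assumed).
        if low_confidence and not had_checkpoint:
            self.unprotected -= 1

    def should_stall(self):
        # "If it exceeds a threshold, stall."
        return self.unprotected > self.threshold

throttle = SpeculationThrottle(threshold=1)
throttle.on_branch_dispatched(low_confidence=True, has_checkpoint=True)
throttle.on_branch_dispatched(low_confidence=True, has_checkpoint=False)
stall_after_one = throttle.should_stall()
throttle.on_branch_dispatched(low_confidence=True, has_checkpoint=False)
stall_after_two = throttle.should_stall()
```

With a threshold of 1, the first unprotected low confidence branch is tolerated and the second one triggers a stall. The checkpointed branch never counts, because a misprediction on it can be recovered directly.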
Threshold Adaptation Policy
[Figure: Execution Timeline (Cycles) alternating "No adaptation" and "Sample & adapt" phases; labels: WT, Next WT]
• BranchTap adapts across and within applications
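The slide names a sample-and-adapt phase but gives no further detail, so the following is only a guess at the general shape of such a policy. It assumes that each candidate threshold is tried for a short interval, that a performance figure such as IPC is measured for each, and that the best one is kept until the next sampling phase. The function names, the `measure_ipc` callback, and the tie-breaking rule are all hypothetical.

```python
def pick_threshold(sampled_ipc):
    """Return the candidate threshold with the highest sampled IPC.

    Ties go to the larger threshold, i.e. to less throttling; this
    tie-break is an assumption, not something stated in the talk.
    """
    return max(sampled_ipc, key=lambda t: (sampled_ipc[t], t))

def sample_and_adapt(measure_ipc, candidates):
    """Try every candidate once, then choose one for the next phase."""
    sampled = {t: measure_ipc(t) for t in candidates}
    return pick_threshold(sampled)

# Made-up performance curve that peaks at a threshold of 2.
def fake_ipc(threshold):
    return 2.0 - 0.1 * abs(threshold - 2)

chosen = sample_and_adapt(fake_ipc, candidates=[0, 1, 2, 4])
```

Repeating the sampling phase periodically lets the chosen threshold follow phase changes within one program as well as differences between programs.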
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
Results Overview
• Performance w/o Checkpoints
– BranchTap improves even with just an ROB
• Performance w/ 4 Checkpoints
– BranchTap improves over conventional recovery methods
• Performance w/ Larger Checkpoint Predictors
– BranchTap offers better performance than a 64x larger predictor
Methodology
• Simulator based on SimpleScalar
• 24 SPEC CPU 2000 benchmarks
• Reference Inputs
• Processor configurations
– 8-way OoO core
– Up to 1K in-flight instructions
– 1K-entry confidence table for low confidence branch identification
• 1B committed instructions after skipping 100B
“Perfect Checkpointing” Configuration
• A checkpoint is auto-magically taken at all mispredicted branches
– All recoveries are fast
• We report the “deterioration relative to perfect checkpointing”
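The metric can be written out explicitly. The talk does not give the formula, so the sketch below assumes that deterioration is the relative performance loss against the perfect checkpointing configuration, and that a "penalty reduced by X%" figure compares two such deteriorations. The function names and all numbers are made up for illustration and are not results from the talk.

```python
def deterioration(ipc_config, ipc_perfect):
    """Fractional slowdown relative to perfect checkpointing (assumed)."""
    return (ipc_perfect - ipc_config) / ipc_perfect

def penalty_reduction(det_baseline, det_new):
    """Fraction of the baseline deterioration that has been removed."""
    return (det_baseline - det_new) / det_baseline

# Illustrative IPC values only.
ipc_perfect = 2.0
det_conventional = deterioration(1.75, ipc_perfect)    # 0.125
det_branchtap = deterioration(1.8125, ipc_perfect)     # 0.09375
removed = penalty_reduction(det_conventional, det_branchtap)  # 0.25
```

Under this reading, a lower deterioration is better, and a configuration that matched perfect checkpointing would score 0%.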
Performance with No Checkpoints
• Deterioration relative to “perfect checkpointing”
[Figure: bar chart; y-axis: deterioration (0% to 25%), with an arrow marking the "better" direction; x-axis: gzip, vpr, lucas, art, AVG; series: Conventional, BranchTap Adaptive, BranchTap Non-Adaptive; annotation: -39%]
• BranchTap improves over conventional mechanisms
• Adaptation leads to robust performance improvements
Performance Evaluation with 4 Checkpoints
• Deterioration relative to “perfect checkpointing”
[Figure: bar chart; y-axis: deterioration (0% to 10%), with an arrow marking the "better" direction; x-axis: twolf, parser, lucas, mcf, bzip2, AVG; series: Conventional, BranchTap Adaptive, BranchTap Non-Adaptive; annotation: -28%]
• BranchTap with 4 checkpoints is better than 6 checkpoints alone
BranchTap vs. Larger Checkpoint Predictors
• BranchTap with a 1K-entry confidence table and 4 GCs:
– Higher performance than a 64K-entry confidence table with 4 GCs
– Lower complexity, virtually comes “for free”
[Figure: chart; y-axis: deterioration (0.0% to 3.0%), with an arrow marking the "better" direction; x-axis: confidence table size (64, 256, 1K, 4K, 16K, 64K); BranchTap marked on the chart]
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
Summary
• Performance with 4 (no) checkpoints
– ~28% (39%) of misprediction penalty removed
– BranchTap is robust:
• Up to 6% (13%) better and max 1.2% (0.1%) worse than conventional mechanisms
• BranchTap is very simple to implement
– Few counters and comparators
• BranchTap is better than other alternatives
– BT + 1K predictor better than a 64K predictor alone
– BT + 4 GCs better than 6 GCs alone
BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
{pakl, moshovos}@eecg.toronto.edu