IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization
![Page 1: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/1.jpg)
IFRA
Instruction Footprint Recording & Analysis
for Post-Silicon Bug Localization
Sung-Boem Park, Subhasish Mitra
Robust Systems Group
Departments of Electrical Eng. & Computer Sc.
Stanford University
![Page 2: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/2.jpg)
Key Message
Post-silicon bug localization – Major bottleneck
  Pinpoint from system failure
  Bug location, exposing stimulus
Existing schemes – Expensive & not scalable
IFRA – New technique for processors
  Eliminates limitations of existing techniques
  96% accuracy
  1% area, ~0% performance impact
![Page 3: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/3.jpg)
Outline
Motivation
IFRA Overview
Simulation Results
Conclusion
![Page 4: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/4.jpg)
Microprocessor Development Flow
Pre-Silicon: Design, Pre-Silicon Verification
Post-Silicon: POST-SILICON VALIDATION, Manufacturing Test
Post-Silicon Validation Costs: 35% of development time, 25% of design resources
“Post-silicon cost & complexity is rising faster than design cost”
S. Yerramilli, VP, Intel, ITC06 Invited Address
![Page 5: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/5.jpg)
Post-Silicon Validation Steps
Detect – Run test content in system
  e.g., OS, games, functional tests
Localize – Pinpoint from system failure (e.g., crash)
  Bug location – e.g., ALU, decoder, scheduler
  Exposing stimulus – e.g., instruction sequence
  Dominates cost [Josephson DAC06]
Root cause & Fix
  Optical probing, patch / circuit edit / respin
![Page 6: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/6.jpg)
Post-Silicon Bug Types [Josephson DAC06]
Functional bugs – Incorrect logic implementation
  e.g., design errors
  Short localization time – e.g., hours to days
Electrical bugs / circuit marginalities (our focus)
  e.g., speed-path, noise, races, hold time
  Some voltage / temp / frequency corners
  LONG localization time – e.g., days to weeks
![Page 7: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/7.jpg)
Existing Post-Silicon Bug Localization Flows
Tester-based
  Detect in system
  Reproduce failure on tester: 2 days (not always possible)
  Localize on tester: 3 days
System-based
  Detect in system
  Localize failure in system: 1 to 4 weeks
Major problems
  Failure reproduction
  System-level simulation
![Page 8: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/8.jpg)
IFRA vs. Existing Techniques
Techniques: Trace buffers, Clock manipulation, Checkpoint + replay, Scan techniques, IFRA
Intrusive? ? Yes No
Failure reproduction? Yes No
System-level simulation? Yes No
Area impact? Yes No 1%
![Page 9: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/9.jpg)
Instruction Footprint Recording & Analysis
Design phase
  Insert recorders inside chip design
Post-Si validation
  Record special info. in recorders / Run tests
  Failure detected?
    No: continue recording / running tests
    Yes: Scan out recorder contents
  Post-analyze offline
  Localized bug: (location, stimulus)
Non-intrusive
No failure reproduction: single test run sufficient
No system simulation: self-consistency against test program binary
![Page 10: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/10.jpg)
Outline
Motivation
IFRA Overview
  Hardware Support
  Automated Post-Analysis Techniques
Simulation Results
Conclusion
![Page 11: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/11.jpg)
IFRA Hardware in Superscalar Processor
Baseline: Alpha 21264
Pipeline stages (separated by pipeline registers): FETCH, DECODE, DISPATCH, ISSUE, EXECUTE, COMMIT
Blocks shown in the figure:
  Branch Predictor, I-Cache, I-TLB, Fetch Queue
  Decoders
  Reg Rename, Phys Regfile
  Reg Map, Reg Free
  Instruction Window
  2xBr, 2xALU, MUL, 2xLSU, D-Cache, D-TLB, FPU
  Reorder Buffer, Reg Map
IFRA additions:
  ID assignment
  Recorders (six groups shown along the pipeline)
  Post-Trigger Generator
  Scan chain: recorders are part of the scan chain
  Slow wire: no at-speed routing
![Page 12: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/12.jpg)
Recording Operation Example
Figure: INST1 and INST2 pass through FETCH (Branch Predictor, I-Cache, I-TLB, Fetch Queue) and DECODE (Decoder). ID assignment, using the special ID assignment rule, tags each instruction with an ID (ID1, ID2) that travels with it through the pipeline registers.
Instruction footprints stored:
  Recorder 1: ID1 + Auxiliary Info: PC1; ID2 + Auxiliary Info: PC2
  Recorder 2: ID1 + Auxiliary Info: Decoded bits1; ID2 + Auxiliary Info: Decoded bits2
![Page 13: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/13.jpg)
Special Rule for Instruction ID Assignment
Simplistic ID assignment inadequate
  Speculation + flushes, out-of-order execution
  PC does not work for loops
Special ID assignment rule – formal proof in paper
  ID width: log2(4n) bits
  n = max. instructions in flight
  e.g., 8 bits for Alpha-like processor (n=64)
No timestamp or global synchronization required
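As a quick check of the ID-width formula, here is a minimal Python sketch. The function name `footprint_id_width` is made up for illustration. It rounds up when 4n is not a power of two, which is an assumption the slide does not spell out.

```python
def footprint_id_width(max_in_flight: int) -> int:
    """Bits needed to encode 4n distinct IDs, i.e. ceil(log2(4n))."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    # (4n - 1).bit_length() equals ceil(log2(4n)) without floating point.
    return (4 * max_in_flight - 1).bit_length()


if __name__ == "__main__":
    # Alpha-like processor from the slide: n = 64 gives 8 bits.
    print(footprint_id_width(64))
```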
![Page 14: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/14.jpg)
Instruction Footprint Recorder Design
Input: Instruction ID + Auxiliary info.
Circular buffer
  Dominated by memory
Simple control logic
  Idle cycle compaction
  Circular buffer control
  Serialization
  Stop / Start recording (post-trigger signal)
Output: to slow scan chain
No high-speed global routing
Contents scanned out after failure detection
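The recorder behavior listed on this slide can be modeled in a few lines of Python. This is only a behavioral sketch, not the hardware design. The class and method names (`FootprintRecorder`, `clock`, `post_trigger`, `scan_out`) are hypothetical, and an idle cycle is represented here as `None`.

```python
from collections import deque
from typing import List, NamedTuple, Optional


class Footprint(NamedTuple):
    inst_id: int  # instruction ID assigned at fetch
    aux: int      # stage-specific auxiliary information


class FootprintRecorder:
    """Behavioral model of one recorder: circular buffer plus simple control."""

    def __init__(self, depth: int) -> None:
        # deque with maxlen drops the oldest entry when full (circular buffer).
        self.buffer: deque = deque(maxlen=depth)
        self.recording = True

    def clock(self, footprint: Optional[Footprint]) -> None:
        """Called once per cycle; None models a cycle with no instruction."""
        if not self.recording or footprint is None:
            return  # idle-cycle compaction: nothing is stored
        self.buffer.append(footprint)

    def post_trigger(self) -> None:
        """Stop recording so the buffered history is preserved."""
        self.recording = False

    def scan_out(self) -> List[Footprint]:
        """Return buffered footprints, oldest first, as a scan-out would."""
        return list(self.buffer)
```

Because idle cycles store nothing, the buffer depth bounds the number of recorded instructions rather than the number of elapsed cycles.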
![Page 15: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/15.jpg)
What to Record?

| Pipeline stage | Auxiliary information | Bits per recorder | Number of recorders |
| --- | --- | --- | --- |
| Fetch | PC | 32 | 4 |
| Decode | Decoding results | 4 | 4 |
| Dispatch | 2-bit residue of reg. name | 6 | 4 |
| Issue | 3-bit residue of operands | 6 | 4 |
| Execution (ALU, MUL) | 3-bit residue of result | 3 | 4 |
| Execution (Branch) | None | 0 | 2 |
| Execution (Load/Store unit) | 3-bit residue of result, 32-bit memory address | 35 | 2 |
| Commit | Exceptions | ~0 | 4 |

Total required storage for all recorders: 60 KBytes
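A back-of-envelope Python sketch can relate the table to the 60 KByte total. It rests on several assumptions that are not stated on the slide: every entry stores an 8-bit ID plus the auxiliary bits, all recorders have the same depth, 1 KByte is 1024 bytes, and the "~0" commit entry is counted as 0 bits.

```python
ID_BITS = 8  # assumed: ID width for n = 64 in-flight instructions

# stage -> (auxiliary bits per recorder entry, number of recorders), from the table
RECORDERS = {
    "fetch": (32, 4),
    "decode": (4, 4),
    "dispatch": (6, 4),
    "issue": (6, 4),
    "execute_alu_mul": (3, 4),
    "execute_branch": (0, 2),
    "execute_load_store": (35, 2),
    "commit": (0, 4),  # table says "~0"; treated as 0 here
}


def bits_per_row(recorders=RECORDERS, id_bits: int = ID_BITS) -> int:
    """Bits needed to hold one entry in every recorder."""
    return sum((aux + id_bits) * count for aux, count in recorders.values())


def implied_depth(total_kbytes: int = 60) -> int:
    """Entries per recorder that fit in the storage budget (equal depths assumed)."""
    return (total_kbytes * 1024 * 8) // bits_per_row()


if __name__ == "__main__":
    print(bits_per_row(), implied_depth())
```

Under these assumptions one entry across all recorders costs 498 bits, so 60 KBytes corresponds to a depth of roughly a thousand entries per recorder.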
![Page 16: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/16.jpg)
Post-Trigger Generation
Timeline of code execution, starting at t=0:
  Error after a billion cycles (e.g., speedpath)
  Failure after 2 billion cycles (e.g., crash)
Too much storage overhead to store 1 billion cycles
![Page 17: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/17.jpg)
Post-Trigger Generation
Timeline of code execution, starting at t=0:
  Error after a billion cycles (e.g., speedpath)
  Failure after 2 billion cycles (e.g., crash)
Need to capture in recorder storage: early failure detection necessary
Early failure detection techniques (post-triggers)
  Classical error detection – residue, parity
  Deadlock & segfault detection
  Special early warnings to pause recording
  Details in paper
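The classical checks named here, residue and parity, can be illustrated with a small Python sketch. The modulus is an assumption: the slides only say "3-bit residue", and mod 7 (2^3 - 1) is used here as a common choice. The function names are hypothetical, and operand wraparound at the register width is ignored.

```python
MODULUS = 7  # assumption: 3-bit residue computed mod 2**3 - 1


def residue(value: int) -> int:
    """3-bit residue code of a value."""
    return value % MODULUS


def add_residue_ok(a: int, b: int, result: int) -> bool:
    """Check an addition: residue of the result must equal the sum of residues."""
    return residue(result) == (residue(a) + residue(b)) % MODULUS


def parity(value: int) -> int:
    """Even-parity bit: 1 if the number of set bits is odd."""
    return bin(value).count("1") & 1


if __name__ == "__main__":
    print(add_residue_ok(10, 20, 30))          # correct sum passes the check
    print(add_residue_ok(10, 20, 30 ^ 0b100))  # one flipped result bit fails it
```

A single flipped bit changes the result by a power of two, and no power of two is divisible by 7, so under this modulus the residue check catches every single-bit flip in the result.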
![Page 18: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/18.jpg)
IFRA Area Impact
1% chip-level area impact
  Synopsys Design Compiler synthesis
  Alpha 21264-like processor: 2MB L2 cache
  TSMC 130nm technology
No global at-speed routing
Area dominated by circular buffers in recorders
  Total recorder storage: 60 KBytes
![Page 19: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/19.jpg)
Outline
Motivation
IFRA Overview
  Hardware Support
  Post-Analysis Techniques
Simulation Results
Conclusion
![Page 20: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/20.jpg)
Post-Analysis Overview
Inputs: Test program binary, Footprints from recorders
Steps: Link footprints, Run high-level analysis, Run low-level analysis
Output: List of bug location-stimulus pairs
Analyses shown:
  Control-flow analysis
  Data-dependency analysis
  Decoding analysis
  Load/Store analysis
  Residue consistency check
(Not covered today – Details in paper)
![Page 21: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/21.jpg)
Linking Footprints from Recorder Contents
Figure: contents of the fetch-stage, execution-stage and commit-stage recorders over time, linked to the test program binary (PC0 INST0 through PC7 INST7). Each recorder entry holds an instruction ID plus auxiliary info (PCs in the fetch-stage recorder, AUX values in the others). The same ID values recur over time within each recorder.
Special ID assignment rule ensures:
  Uncommitted instructions uniquely identified
  Relative orders of identical IDs maintained
  Even under flushes & out-of-order execution
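The linking idea can be sketched in Python under strong simplifying assumptions. This is not the rule from the paper, which is only referenced on the slides. The sketch assumes every recorder holds footprints of the same set of instructions (no flushed or still in-flight instructions in the window) and relies only on the stated property that the relative order of identical IDs is maintained. The function name `link_footprints` and the `(id, age)` key are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (instruction ID, auxiliary info), oldest first


def link_footprints(
    recorders: Dict[str, List[Entry]]
) -> Dict[Tuple[int, int], Dict[str, str]]:
    """Group footprints of the same instruction across recorders.

    Key is (ID, age): age 0 is the most recent instruction that used the ID,
    age 1 the one before it, and so on. Value maps stage name to aux info.
    """
    linked: Dict[Tuple[int, int], Dict[str, str]] = defaultdict(dict)
    for stage, entries in recorders.items():
        seen: Dict[int, int] = defaultdict(int)
        for inst_id, aux in reversed(entries):  # walk newest to oldest
            linked[(inst_id, seen[inst_id])][stage] = aux
            seen[inst_id] += 1
    return dict(linked)
```

Because matching uses only the per-ID order, entries may appear in a different overall order in an out-of-order stage and still link correctly in this simplified setting.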
![Page 22: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/22.jpg)
Debug Example
Flow: Link footprints, then High-level analysis, then Low-level analysis, ending with Bug locations + exposing stimulus
![Page 23: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/23.jpg)
Debug Example – Decision 1
Inputs: Test program binary, Fetch-stage recorder
Serial execution trace:
  R0 <- R1 + R2
  ……
  R0 <- R3 + R6
  R5 <- R0 + R6
![Page 24: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/24.jpg)
Debug Example – Question 1
Serial execution trace:
  R0 <- R1 + R2
  ……
  R0 <- R3 + R6   (producer of R0)
  R5 <- R0 + R6   (consumer of R0, RAW hazard)
Residue of values mismatch?
  Issue-stage recorder and execute-stage recorder entries for R0: R0=3, R0=5
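Question 1 amounts to comparing two recorded residues across a read-after-write dependency. A minimal Python sketch follows, assuming the producer's result residue comes from the execute-stage recorder and the consumer's operand residue from the issue-stage recorder, as the "What to Record?" table suggests. The type and function names are hypothetical.

```python
from typing import NamedTuple


class ProducerFootprint(NamedTuple):
    dest_reg: str        # architectural destination register, e.g. "R0"
    result_residue: int  # residue of the result (execute-stage recorder)


class ConsumerFootprint(NamedTuple):
    src_reg: str          # architectural source register, e.g. "R0"
    operand_residue: int  # residue of the operand (issue-stage recorder)


def raw_residue_mismatch(producer: ProducerFootprint,
                         consumer: ConsumerFootprint) -> bool:
    """True if the consumer read a value whose residue differs from what
    the producer wrote, i.e. the RAW dependency was not honored."""
    if producer.dest_reg != consumer.src_reg:
        raise ValueError("footprints do not form a RAW dependency")
    return producer.result_residue != consumer.operand_residue
```

A mismatch says the value changed somewhere between the producer's write and the consumer's read. The later questions in the example narrow down where.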
![Page 25: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/25.jpg)
Debug Example – Question 2
Serial execution trace:
  R0 <- R1 + R2
  ……
  R0 <- R3 + R6   (producer of R0)
  R5 <- R0 + R6   (consumer of R0, RAW hazard)
Residue of phys. reg. names mismatch?
  Dispatch-stage recorder entries for R0: R0=P5, R0=P2
![Page 26: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/26.jpg)
Debug Example – Question 3
Serial execution trace:
  R0 <- R1 + R2   (previous producer)
  ……
  R0 <- R3 + R6   (producer of R0)
  R5 <- R0 + R6   (consumer of R0, RAW hazard)
Residue of phys. reg. name match with previous producer?
  Dispatch-stage recorder entries for R0: R0=P5, R0=P5
![Page 27: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/27.jpg)
Debug Example – Result
Serial execution trace:
  R0 <- R1 + R2
  R0 <- R3 + R6
  R5 <- R0 + R6
  ……
Figure labels on the trace: Stimulates bug, Propagates to failure
Dispatch-stage blocks shown (one marked as Bug Location):
  Arch. Dest. Reg pipeline register
  Decoder
  Read Circuit
  Write Circuit
  Reg. Mapping
  Rest of pipeline reg.
  Rest of modules in dispatch stage
![Page 28: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/28.jpg)
Outline
Motivation
IFRA Overview
Simulation Results
Conclusion
![Page 29: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/29.jpg)
Experimental Setup
SimpleScalar architectural simulator
  Alpha 21264 configuration
  Augmented with ~1K error injection points
Error model – single bit-flips
  Hard-to-repeat electrical bugs
  Both flip-flops & combinational logic
Stimulus
  SpecInt 2000 benchmarks
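The single bit-flip error model can be sketched in Python as follows. This is an illustration only, not the modified simulator. The `signals` dictionary stands in for the injection points, and the signal names used with it are hypothetical.

```python
import random
from typing import Dict, Tuple


def flip_bit(value: int, bit: int, width: int) -> int:
    """Return value with one bit inverted, kept within the signal width."""
    if not 0 <= bit < width:
        raise ValueError("bit index outside signal width")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def inject_error(signals: Dict[str, Tuple[int, int]],
                 rng: random.Random) -> str:
    """Flip one random bit of one randomly chosen signal, in place.

    signals maps a signal name to (current value, bit width).
    Returns the name of the signal that was corrupted.
    """
    name = rng.choice(sorted(signals))
    value, width = signals[name]
    signals[name] = (flip_bit(value, rng.randrange(width), width), width)
    return name
```

Passing a seeded `random.Random` makes an injection run repeatable, which helps when a simulated failure needs to be re-examined.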
![Page 30: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/30.jpg)
Experimental Flow
Warm up for a million cycles
Inject error
Any failure detected?
  No: Masked / silent error
  Yes: Short error latency? (Yes / No)
    Yes: Post-analyze
Post-analysis outcomes:
  Exact localization
  Localization with candidates
  Complete miss
100K simulation runs, 800 post-analysis runs
![Page 31: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/31.jpg)
IFRA Bug Localization Results
Localization resolution
  Bug exposing stimulus
  One of 200 erroneous design blocks
  Avg. block size: 10K 2-input NAND gates
Correct localization (96%)
  Exact localization (78%)
  Localization with avg. 6 candidates (22%)
Complete miss (4%)
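The percentages can be combined with a little arithmetic, shown here as a Python sketch. It assumes the 78% / 22% split applies within the 96% of correct localizations, which is how the numbers add up to 100%. The function names are illustrative.

```python
CORRECT_PCT = 96            # correct localization, % of analyzed runs
EXACT_PCT_OF_CORRECT = 78   # exact localization, % of correct runs (assumed)
CANDIDATE_PCT_OF_CORRECT = 22
AVG_CANDIDATES = 6          # average candidate count when not exact
TOTAL_BLOCKS = 200          # design blocks in the processor model


def exact_overall_pct() -> float:
    """Share of all analyzed runs that end in an exact localization."""
    return CORRECT_PCT * EXACT_PCT_OF_CORRECT / 100


def mean_blocks_to_inspect() -> float:
    """Average number of candidate blocks per correctly localized run."""
    return (EXACT_PCT_OF_CORRECT * 1
            + CANDIDATE_PCT_OF_CORRECT * AVG_CANDIDATES) / 100


if __name__ == "__main__":
    print(exact_overall_pct(), mean_blocks_to_inspect(), TOTAL_BLOCKS)
```

Under that reading, about three quarters of all analyzed runs pinpoint a single block, and a correctly localized run leaves about two of the 200 blocks to inspect on average.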
![Page 32: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/32.jpg)
Outline
Motivation
IFRA Overview
Simulation Results
Conclusion
![Page 33: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/33.jpg)
Conclusion
IFRA
  Inexpensive
    1% area, no expensive logic analyzers
    No failure reproduction or system simulation
  Effective
    96% accuracy
  Practical
    Alpha processor demonstration
![Page 34: IFRA Instruction Footprint Recording & Analysis for Post-Silicon Bug Localization](https://reader035.fdocuments.in/reader035/viewer/2022070503/5681566b550346895dc41e59/html5/thumbnails/34.jpg)
Acknowledgement
Bob Gottlieb, Intel
Nagib Hakim, Intel
Ted Hong, Stanford University
Doug Josephson, Intel
Onur Mutlu, Microsoft Research
Priyadarshan Patra, Intel
Eric Rentschler, AMD
Jason Stinson, Intel