OpenSPARC – An Open Platform for Hardware Reliability Experimentation
Ishwar Parulkar and Alan Wood, Sun Microsystems, Inc.
James C. Hoe and Babak Falsafi, Carnegie Mellon University
Sarita V. Adve and Josep Torrellas, University of Illinois at Urbana-Champaign
Subhasish Mitra, Stanford University
IEEE SELSE 4 - March 26, 2008
www.OpenSPARC.net
Outline
1. Chip Multi-threading (CMT)
2. OpenSPARC T2 and T1 processors
3. Reliability in OpenSPARC processors
4. What is available in OpenSPARC
5. Current university research using OpenSPARC
6. Future research directions
World's First 64-bit Open Source Microprocessor
OpenSPARC.net
• Governed by GPLv2
• Complete processor architecture & implementation
• Register Transfer Level (RTL)
• Hypervisor API
• Verification suite and architectural models
• Simulation model for operating system bringup on s/w
Chip Multithreading (CMT)
[Section title slide. Background table: instruction-level parallelism, thread-level parallelism, instruction/data working set and data sharing, each rated low, medium, high or large per workload; the column headings did not survive extraction]
Memory Bottleneck
[Figure: relative performance (log scale, 1 to 10000) versus year (1980 to 2005) for CPU frequency and DRAM speeds, with a widening gap between the two curves]
CPU: 2x every 2 years
DRAM: 2x every 6 years
Source: Sun World Wide Analyst Conference, Feb. 25, 2003
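The two doubling rates on this slide imply how fast the CPU/DRAM gap grows. The sketch below works through that arithmetic. The function names are illustrative, and the model assumes smooth exponential growth, which the slide does not state.

```python
def relative_performance(years: float, doubling_period_years: float) -> float:
    """Growth factor after `years` for a quantity that doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def cpu_dram_gap(years: float) -> float:
    """Ratio of CPU growth to DRAM growth after `years`, using the slide's two rates."""
    cpu = relative_performance(years, 2.0)   # CPU: 2x every 2 years
    dram = relative_performance(years, 6.0)  # DRAM: 2x every 6 years
    return cpu / dram


for years in (6, 12, 24):
    print(years, cpu_dram_gap(years))  # prints 4.0, 16.0 and 256.0 respectively
```

Under these assumptions the gap is 2^(t/2 - t/6) = 2^(t/3), so it doubles roughly every three years.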
Single Threaded Performance
Single threading: HURRY UP AND WAIT!
[Figure: timeline of one thread alternating compute (C) and memory latency (M) phases]
Typical processor utilization: 15–25%
Up to 85% of cycles waiting for memory
The Power of CMT
Chip Multi-threaded (CMT) Performance
UltraSPARC T1 core processor utilization: up to 85%
[Figure: timelines of threads 1 to 4, each alternating compute (C) and memory latency (M) phases, staggered so that one thread computes while the others wait on memory]
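The jump in utilization can be illustrated with an idealized interleaving model. This is a sketch under assumptions that are not in the slides: all threads are identical, switching between threads is free, and there is no contention for memory. The 1:4 compute-to-memory ratio is chosen only to land in the 15–25% single-thread range quoted earlier.

```python
def core_utilization(threads: int, compute_cycles: float, memory_cycles: float) -> float:
    """Fraction of cycles the pipeline is busy when `threads` identical threads interleave.

    Each thread alternates `compute_cycles` of work with `memory_cycles` of stall.
    While one thread stalls, another may compute, so utilization scales with the
    thread count until the pipeline is saturated.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    per_thread = compute_cycles / (compute_cycles + memory_cycles)
    return min(1.0, threads * per_thread)


single = core_utilization(1, compute_cycles=1, memory_cycles=4)  # about 0.2
four = core_utilization(4, compute_cycles=1, memory_cycles=4)    # about 0.8
print(single, four)
```

With four threads per core, as in the UltraSPARC T1, this toy model reaches roughly four times the single-thread utilization, which is consistent in spirit with the "up to 85%" figure above.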
Chip Multi-Threading (CMT)
CMP (chip multiprocessing): n cores per processor
HMT (hardware multithreading): m threads per core
CMT (chip multithreading): n x m threads per processor
CMT Paradigm Shift!
CMT technology allows simple, compact system designs, which deliver:
> Higher reliability
> Better performance
> Lower cost
> Faster installation
> More efficient energy use
> Lower HVAC cost
> Faster time-to-repair
> ... and more
Everybody has changed to multi-core (CMP) and/or chip multi-threaded (CMT) processors: Sun (CMT), IBM (CMT), Intel (CMP), AMD (CMP)
UltraSPARC T2 and T1 CMT Processors
UltraSPARC T2 Die Photo
8 SPARC cores, 8 threads each
Shared 4MB L2, 8 banks, 16-way associative
Four dual-channel FBDIMM memory controllers
Two 10/1 Gb Enet ports w/ onboard packet classification and filtering
One PCI-E x8 port
Cryptographic coprocessor on chip
1831 pins, 711 signal I/O
342 mm² die in 65 nm
UltraSPARC T2 Block Diagram
UltraSPARC T2
UltraSPARC T2 Reliability
Extensive error detection and correction
• Parity protection on I$, D$ tags and data, ITLB, DTLB, CAM and data, modular arithmetic, store address buffer
• ECC on integer RF, floating point RF, store data buffer, trap stack, L2$ and other internal arrays
Combination of hardware and software correction flows
• Hardware re-fetch for I$ and D$
• Software recovery for other errors
• Offlining of a thread, group of threads or physical core
Hardware error injection for verification
Selective disabling of detection and reporting for bringup
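The parity-plus-re-fetch flow listed above can be sketched as a toy software model. The assumptions here are illustrative and not taken from the T2 design: one even-parity bit per cached word, a clean copy always available in a backing store, and a hypothetical `ParityCache` class whose `inject_bit_flip` method stands in for hardware error injection.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1


class ParityCache:
    """Toy cache: each entry stores (word, parity bit); `backing` holds clean copies."""

    def __init__(self, backing: dict):
        self.backing = backing
        self.lines = {}
        self.refetches = 0

    def fill(self, addr: int) -> None:
        word = self.backing[addr]
        self.lines[addr] = (word, parity(word))

    def inject_bit_flip(self, addr: int, bit: int) -> None:
        """Flip one data bit without updating the stored parity bit."""
        word, p = self.lines[addr]
        self.lines[addr] = (word ^ (1 << bit), p)

    def read(self, addr: int) -> int:
        if addr not in self.lines:
            self.fill(addr)
        word, p = self.lines[addr]
        if parity(word) != p:      # parity mismatch: error detected
            self.refetches += 1
            self.fill(addr)        # recover by re-fetching the clean copy
            word, _ = self.lines[addr]
        return word


cache = ParityCache({0x40: 0xDEADBEEF})
cache.read(0x40)
cache.inject_bit_flip(0x40, bit=7)
recovered = cache.read(0x40)
print(hex(recovered), cache.refetches)
```

Parity only detects an odd number of flipped bits and cannot correct them, which is why this flow depends on a clean copy elsewhere. Arrays that hold the only copy of their data need ECC instead.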
UltraSPARC T2 Reliability
Faster Can Be Cooler (1)
[Figure: thermal map of a single-core processor next to an outline of cores C1 to C8 (not to scale); temperature scale from 58C to 107C]
UltraSPARC T2 Reliability
Faster Can Be Cooler (2)
[Figure: thermal maps of a single-core processor and the T2 processor with cores C1 to C8 (not to scale); temperature scale from 58C to 107C]
OpenSPARC
OpenSPARC Communities
• Chip Designers: SoC designs, hard macros, telecom applications
• Hardware IP Suppliers: PCI cores, SERDES, etc.
• EDA Vendors: benchmarking, reference flow, FPGA, emulation, verification, physical design, multi-threaded tools
• CMT Tools: compilers, threading, optimization, performance analysis
• Academia/Universities: architecture, ISA, VLSI course work; threading, scaling, parallelization; benchmarks
• Operating Systems: OpenSolaris, Linux, BSD variants, embedded OSs
What's Available in OpenSPARC
1. Chip design and verification
• UltraSPARC Architecture 2005 spec
• UltraSPARC T2/T1 implementation spec
• Full RTL (Verilog) of OpenSPARC T2/T1 (8 cores, 64/32 threads – more than 4 million lines of code!)
• Verification test suites
• Full OpenSPARC simulation environment
• Synthesis scripts for RTL
• FPGA implementation support
   ➢ Reduced (to fit capacity), synthesizable version of RTL
   ➢ Synplicity scripts for FPGA synthesis
What's Available in OpenSPARC
2. Architecture and performance modeling
• SAM – SPARC Architectural Model (including source code)
• Legion – Instruction accurate simulator (incl. source code)
• OBP – Open Boot PROM source code
• Hypervisor source code
• Solaris images for simulation
• RST Trace Tool – trace format for SPARC instruction-level traces
What's Available in OpenSPARC
3. Tools for tuning and debug
• ATS – Binary reoptimization and recompilation tool for tuning and troubleshooting applications
• Corestat – Online monitoring of core and FPU utilization
• Discover – Runtime detection of programming errors in allocating and using program memory
• Thread Analyzer – Checking of multi-threaded programming errors such as data races and deadlocks
• More...
What's Available in OpenSPARC
4. Tools for software developers
• Sun Studio 12 – C, C++, Fortran compilers for Solaris/Linux combined with Netbeans, etc.
• BIT – Binary Improvement Tool analyzes and optimizes SPARC binaries for performance and code coverage
• SPOT – produces detailed report on conditions that impact performance of an application
• Source code analysis tool to identify incompatible APIs between Solaris and Linux to speed up migration
• More...
University research in hardware reliability using OpenSPARC
Architectural Fingerprints
Problem: Error detection for the processor pipeline (soft, wearout, …)
Solution: Architectural fingerprints
• Summarize retiring architectural updates into compact hash (regs, stores)
• Periodically compare hash with reference (another core, previous execution)
Results:
• Multithreaded OpenSPARC T1 RTL implementation: less than 4% area overhead
• Scalable to wide-issue superscalar BW
• Soft fault injection: effective detection for errors propagated to arch. state
[Figure: pipeline diagram (Decode, Ex, Mem, Writeback; ALU, D-cache, 4 register files, store buffer, path to L2) with added fingerprint logic (hash, queue, FP match/compare)]
[Figure: fraction of architectural errors (0.0 to 1.0) per unit (byp, exu, fcl, fdp, lsu, swl, tlu, full SPARC), split into silent data corruption, hang and loop]
Prof. Hoe and Prof. Falsafi @Carnegie Mellon University
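The fingerprint idea (hash the retiring updates, compare periodically against a reference) can be sketched in software. This is only an illustration under stated assumptions: CRC-32 stands in for the hash, the update record format and the comparison interval are invented for the example, and none of it reflects the actual RTL implementation.

```python
import zlib


def fingerprints(updates, interval):
    """Return one CRC-32 fingerprint per `interval` retired updates.

    `updates` is a sequence of (kind, destination, value) tuples, e.g.
    ("reg", 3, 42) or ("store", 0x1000, 7). A trailing partial interval is ignored.
    """
    result = []
    crc = 0
    for count, (kind, dest, value) in enumerate(updates, start=1):
        record = f"{kind}:{dest}:{value}".encode()
        crc = zlib.crc32(record, crc)   # fold this update into the running hash
        if count % interval == 0:
            result.append(crc)
            crc = 0                     # start a fresh fingerprint
    return result


def first_mismatch(reference, observed):
    """Index of the first differing fingerprint, or None if all compared ones match."""
    for index, (ref, obs) in enumerate(zip(reference, observed)):
        if ref != obs:
            return index
    return None


golden = [("reg", i % 8, i * 3) for i in range(16)]
faulty = list(golden)
faulty[9] = ("reg", 1, 27 ^ 4)          # single bit flip in one retired value

bad_interval = first_mismatch(fingerprints(golden, 4), fingerprints(faulty, 4))
print(bad_interval)                     # prints 2: the interval covering updates 8..11
```

Comparing one small hash per interval, rather than every update, is what keeps the comparison bandwidth low. The cost is that an error is only noticed at the end of the interval in which it retired.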
FIRST – Detecting Emerging Wearout Faults
Problem: Detecting device wearout during soft breakdown stage
• Faults initially hidden by guardbands & masking
Solution: Periodically test processor cores for signs of growing wearout
• Reduce freq./voltage guardbands until marginal
• Test w/ Arch. or μArch. fingerprints
• Observe fails at incr. conservative conditions
Results: Wearout fault injection in OpenSPARC
• Arch. and μArch. fingerprints equivalent for wide-spread wearout
• μArch. needed for isolated wearout
[Figure: fraction of fails detected (0 to 1) versus stress past guardband (0 to 200 ps) for Arch, μArch and Timeout]
Prof. Hoe and Prof. Falsafi @Carnegie Mellon University
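The periodic test described above can be sketched as a sweep that tightens timing until a test first fails, then compares that point with an earlier baseline. Everything named here is hypothetical: `run_test` stands in for a fingerprint-checked test run at a given stress, `make_core` is a toy core whose test passes while the stress stays below its remaining slack, and the step and tolerance values are arbitrary.

```python
def first_failing_stress(run_test, max_stress_ps=200, step_ps=10):
    """Smallest stress past the guardband (in ps) at which `run_test` fails, or None."""
    for stress_ps in range(0, max_stress_ps + 1, step_ps):
        if not run_test(stress_ps):
            return stress_ps
    return None


def make_core(slack_ps):
    """Toy core model: the test passes while the applied stress is below the slack."""
    return lambda stress_ps: stress_ps < slack_ps


def wearout_suspected(baseline_ps, current_ps, tolerance_ps=20):
    """Flag wearout when failures appear at noticeably more conservative conditions."""
    if current_ps is None:
        return False          # no failure within the sweep: nothing to report
    if baseline_ps is None:
        return True           # failures appeared where there were none before
    return baseline_ps - current_ps >= tolerance_ps


fresh = first_failing_stress(make_core(slack_ps=150))
aged = first_failing_stress(make_core(slack_ps=95))
print(fresh, aged, wearout_suspected(fresh, aged))
```

In this model, wearout shows up as a shrinking margin: the aged core fails at a smaller stress than it did when fresh, well before it would fail at nominal conditions.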
SWAT – SoftWare Anomaly Treatment
Motivation
Low cost solutions needed for in-field detection, diagnosis, recovery and repair for failures due to aging, soft errors, inadequate burn-in, design defects, …
SWAT Framework Components
• Detection: Software symptoms, minimal backup hardware
• Recovery: Software/hardware checkpoint and rollback
• Diagnosis: Firmware-controlled rollback/replay on multicore
• Repair/reconfiguration: Redundant, reconfigurable hardware
[Figure: timeline with two checkpoints, then fault, error and detected symptom, followed by recovery, diagnosis and repair; annotations: "Always-on, zero or low cost" and "May have high overhead, rarely invoked"]
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
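The detect-then-roll-back pattern in this framework can be illustrated with a small software analogy. It is a sketch under assumptions that are not from the slides: the "program" is a running sum over a list, the transient fault corrupts the loop index once, and an out-of-range access (`IndexError`) plays the role of the software symptom. The function name and parameters are invented for the example.

```python
import copy


def run_with_rollback(data, checkpoint_every=4, fault_at=None):
    """Sum `data`, checkpointing periodically and rolling back when a symptom appears.

    Returns (result, number_of_rollbacks).
    """
    state = {"i": 0, "acc": 0}
    checkpoint = copy.deepcopy(state)
    rollbacks = 0
    while state["i"] < len(data):
        if state["i"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)      # periodic checkpoint
        if fault_at is not None and state["i"] == fault_at:
            state["i"] ^= 1 << 10                  # transient fault: corrupt the index once
            fault_at = None
        try:
            state["acc"] += data[state["i"]]       # an IndexError here is the symptom
            state["i"] += 1
        except IndexError:
            state = copy.deepcopy(checkpoint)      # recovery: roll back and replay
            rollbacks += 1
    return state["acc"], rollbacks


print(run_with_rollback(list(range(8))))               # prints (28, 0)
print(run_with_rollback(list(range(8)), fault_at=5))   # prints (28, 1)
```

This only recovers from faults that produce a visible symptom before the next checkpoint overwrites the clean state. A fault that silently corrupts `acc` would go unnoticed, which is why detection coverage is reported separately.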
SWAT – Status and Ongoing Work
Status
• Detection techniques with > 95% coverage for most structures [ASPLOS’08, SELSE’08, DSN’08]
• Microarchitecture level, firmware-driven diagnosis with > 97% coverage [SELSE’08, DSN’08]
• So far, used microarchitecture-level fault injection in simulation
Ongoing/future work with OpenSPARC
• Gate-level fault modeling
• Hypervisor implementation
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
SWAT – Ongoing Work: High-level fault models and validation
Goals
• Understand how gate level faults propagate to microarch & s/w
• Abstract fault models at microarchitecture level
• Evaluate reliability solutions and validate results
Methodology
• Perform fault injections at gate level
• For better simulation speed: hierarchical integration of microarchitecture level full system simulator with lower-level simulation of faulty unit
• Using OpenSPARC Verilog model
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
SWAT – Future Work: Hypervisor implementation
Plan to use OpenSPARC hypervisor to prototype and evaluate firmware part of SWAT
Methodology
• Leverage, extend interface between hypervisor/hardware and hypervisor/OS
• Extend hypervisor for functionality
• Use for error detection, recovery, diagnosis, repair
Prof. S. Adve, V. Adve and Y. Zhou @University of Illinois at U-C
VARIUS – Process Parameter Variation
Problem: Parameter variation in present and future multicore chips
Goals:
• Model parameter variation and resulting timing errors
• Design multicore microarchitectures to detect and tolerate variation-induced errors
• Develop new microarchitectural techniques to mitigate variation and variation-induced errors
Prof. Torrellas @University of Illinois at U-C
VARIUS – Process Parameter Variation: Accomplishments
VARIUS model of parameter variation and resulting timing errors for microarchitects [TSM08]
ReCycle: Pipeline rebalance under process variation [ISCA07]
Fine-grain adaptive body bias (ABB) to mitigate variation in multicores [MICRO07]
Workload scheduling and DVFS power management in multicores under variation [ISCA08]
Paceline: Core pairing for reliability under process variation [PACT07]
Prof. Torrellas @University of Illinois at U-C
VARIUS – Process Parameter Variation: Using OpenSPARC
Goal: Get insights into the effect of parameter variation on a real processor
• Measure the distribution of the path delays
• Apply the variation model
Evaluation Flow:
• Synopsys dc_shell-t: compile RTL to gate-level netlist
• Cadence SOCEncounter: floorplan, placement, routing, timing analysis
• Synopsys Primetime: static timing analysis & timing debugging
• Cadence NCSim: simulation
[Figure: flow diagram from design entry through Compile RTL (dc_shell-t), SOCEncounter and Primetime to NCSim; labeled data between stages: RTL, timing constraints and library; netlist, timing constraints and physical library; placement, timing report and routing; netlist and timing info]
Prof. Torrellas @University of Illinois at U-C
CASP – Concurrent Autonomous Chip Self-test using Stored Patterns: Motivation
[Figure: bathtub curve of failure rate versus time across infant mortality, normal lifetime and wearout; annotations: "Burn-in difficult" and "Circuit aging dominant"]
Solution: EXTREMELY THOROUGH online self-test
Soft errors: effective techniques exist
Prof. Mitra @Stanford University
CASP – Test Flow
1. Test Scheduling: schedule test on next core (core 4 selected for test; core N in normal operation)
2. Pre-processing: prepare core for test (core 4 temporarily isolated; core N in normal operation)
3. Test Application: thorough scan & functional testing; recovery if failed (core 4 under test; core N in normal operation)
4. Post-processing: bring core from test to normal operation (core 4 resumes operation; core N in normal operation)
Prof. Mitra @Stanford University
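The four phases of the test flow can be sketched as a loop that takes one core offline at a time while the others keep running. This is a minimal sketch, not the CASP controller: the round-robin order, the `run_self_test` callback and the log format are assumptions made for illustration.

```python
def casp_round(num_cores, run_self_test):
    """Test every core once, one at a time; return (event log, list of failed cores).

    `run_self_test(core)` is a hypothetical callback that returns True on pass.
    """
    log = []
    failed = []
    for core in range(num_cores):
        log.append((core, "scheduling: selected for test"))
        log.append((core, "pre-processing: isolated"))
        passed = run_self_test(core)
        log.append((core, "test application: " + ("pass" if passed else "fail")))
        if passed:
            log.append((core, "post-processing: resumed normal operation"))
        else:
            failed.append(core)
            log.append((core, "post-processing: recovery invoked"))
    return log, failed


# Example: an 8-core chip on which core 4 fails its self-test.
log, failed = casp_round(8, run_self_test=lambda core: core != 4)
print(failed)
for entry in log[16:20]:
    print(entry)
```

Because only one core is isolated at any time, the remaining cores continue normal operation throughout, which is what makes the test concurrent with system use.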
OpenSPARC Modifications for CASP
[Figure: OpenSPARC block diagram with 8 processor cores (modified for CASP support), cross-bar switch (modified for CASP support), L2 cache, FPU, DRAM control and Jbus interface, plus an added CASP controller containing CASP control and an on-chip buffer for scan test data (7.5KB), connected to CASP off-chip storage (52MB)]
Architectural modifications
➢ Before a core is tested
   ➢ stalling/draining pipeline
   ➢ disabling communication with core under test
   ➢ saving critical state
   ➢ invalidating D$
➢ After a core is tested
   ➢ restoring critical state
   ➢ enabling communication with core under test
   ➢ restarting pipeline
● 8000 lines of new Verilog code
● Verification regression used to simulate normal operation of chip
Prof. Mitra @Stanford University
Future research possibilities in hardware reliability using OpenSPARC
Future research possibilities
• Using CMT hardware resources for error detection and recovery
   ➢ cores, threads, structures used by cores/threads
• Understanding errors in the context of CMT architectural constructs
   ➢ thread arbitration and scheduling
   ➢ speculative threading
• Validate error management solutions using a state-of-the-art microprocessor design
Future research possibilities
• Study impact of reliability solutions on microprocessor performance
   ➢ use performance tools available in OpenSPARC
• Firmware and software solutions for hardware reliability
   ➢ FPGA implementation and T1000/2000 servers with Solaris/Hypervisor source and other tools
• Study impact of error detectors in processor on chip level and application failure rates
   ➢ enable error detection selectively, use simulators
• Several more...
Conclusions
OpenSPARC is an open source community based around UltraSPARC T1 and T2 CMT microprocessors
OpenSPARC provides a rich, state-of-the-art infrastructure for research in hardware reliability
Many universities are actively using OpenSPARC in their research, with a lot of success
There is a lot more research in hardware reliability that can be done using OpenSPARC
Acknowledgment
We would like to acknowledge the students (past and present) from Carnegie Mellon University, University of Illinois at U-C and Stanford University who contributed to the research described in this presentation.