A Mechanism for Online Diagnosis of Hard Faults in Microprocessors
Fred A. Bower, Daniel J. Sorin, and Sule Ozev
overview
• Motivation
• Current Techniques
• Proposed Mechanism for Online Fault Diagnosis
• Results
• Challenges
• Conclusion
online diagnosis
• The DIVA checker detects an ERROR
• Track the units the failing instruction used (ALU, Reorder Buffer, Reservation Station)
• error_count++ for each tracked unit
• If (error_count > threshold): YES → deconfigure the unit; NO → no action
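The detection-to-deconfiguration loop above can be modeled in software as a minimal sketch. This is not the authors' hardware design: the names `FaultDiagnoser` and `on_diva_error` are hypothetical, and the model assumes one counter per tracked unit and that every DIVA error charges one count to every unit the failing instruction used.

```python
from dataclasses import dataclass, field


@dataclass
class FaultDiagnoser:
    """Hypothetical software model of per-unit error counters fed by DIVA errors."""

    threshold: int
    counts: dict[str, int] = field(default_factory=dict)
    deconfigured: set[str] = field(default_factory=set)

    def on_diva_error(self, units_used: list[str]) -> list[str]:
        """Handle one DIVA-detected error.

        Increments the counter of every still-enabled unit the failing
        instruction used and returns the units deconfigured as a result
        (an empty list corresponds to "no action").
        """
        newly_deconfigured = []
        for unit in units_used:
            if unit in self.deconfigured:
                continue  # already turned off; nothing left to count
            self.counts[unit] = self.counts.get(unit, 0) + 1
            if self.counts[unit] > self.threshold:
                self.deconfigured.add(unit)
                newly_deconfigured.append(unit)
        return newly_deconfigured


# Usage: ALU1 is faulty; each failing instruction used a different ROB entry.
diag = FaultDiagnoser(threshold=3)
for rob_entry in ["ROB0", "ROB1", "ROB2", "ROB3"]:
    diag.on_diva_error(["ALU1", rob_entry])
print(diag.deconfigured)  # {'ALU1'}
```

Because the faulty ALU is implicated in every error while each ROB entry is implicated only once, the ALU's counter is the one that crosses the threshold.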
Field Deconfigurable Units (FDUs): units that can be turned off in case of a fault
• Deconfigure entries in a circular buffer (e.g., the Reorder Buffer)
• Deconfigure entries in a tabular structure (e.g., a Reservation Station)
deconfiguring mechanism
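The two deconfiguration styles differ mainly in how allocation must change. A rough sketch, assuming a reorder buffer whose head/tail pointers simply skip disabled entries and a reservation station that masks disabled entries out of its free list; the class names are hypothetical, and the sketch assumes (for simplicity) that ROB entries are deconfigured only while the buffer is empty, e.g. right after the flush that follows a DIVA error.

```python
from typing import Optional


class CircularBuffer:
    """Hypothetical reorder-buffer model whose pointers skip deconfigured entries."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.disabled = [False] * size  # True once an entry is deconfigured
        self.head = 0                   # oldest in-flight entry
        self.tail = 0                   # next entry to allocate
        self.count = 0                  # number of in-flight entries

    def capacity(self) -> int:
        return self.disabled.count(False)

    def _skip(self, idx: int) -> int:
        """Return idx if it is enabled, else the next enabled index (wrapping)."""
        for _ in range(self.size):
            if not self.disabled[idx]:
                return idx
            idx = (idx + 1) % self.size
        raise RuntimeError("every entry is deconfigured")

    def deconfigure(self, idx: int) -> None:
        # Assumption of this sketch: only an empty (just-flushed) buffer is changed.
        if self.count:
            raise RuntimeError("deconfigure only an empty buffer in this sketch")
        self.disabled[idx] = True
        self.head = self.tail = self._skip(self.head)

    def allocate(self) -> Optional[int]:
        if self.count == self.capacity():
            return None  # full: dispatch stalls
        idx = self._skip(self.tail)
        self.tail = (idx + 1) % self.size
        self.count += 1
        return idx

    def retire(self) -> Optional[int]:
        if self.count == 0:
            return None
        idx = self._skip(self.head)
        self.head = (idx + 1) % self.size
        self.count -= 1
        return idx


class TabularStructure:
    """Hypothetical reservation-station model: any free, enabled entry may be used."""

    def __init__(self, size: int) -> None:
        self.busy = [False] * size
        self.disabled = [False] * size

    def deconfigure(self, idx: int) -> None:
        self.disabled[idx] = True  # never handed out again

    def allocate(self) -> Optional[int]:
        for idx in range(len(self.busy)):
            if not self.busy[idx] and not self.disabled[idx]:
                self.busy[idx] = True
                return idx
        return None  # no usable entry: dispatch stalls

    def release(self, idx: int) -> None:
        self.busy[idx] = False


rob = CircularBuffer(4)
rob.deconfigure(1)
print([rob.allocate() for _ in range(4)])  # [0, 2, 3, None]
```

In both cases losing an entry shrinks effective capacity by one rather than disabling the whole structure, which is what keeps the performance impact of a hard fault small.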
• Hard fault diagnosis latency
• Performance impact of losing a component to a hard fault
analysis
• DIVA: 6% of an Alpha 21264 core
• Error counters (~1227 bits total)
• Instruction resource usage (19 wires in total)
• Deconfiguration logic
• Overhead can be reduced by tracking resources at a coarser granularity
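How counter bits and resource-usage wires scale, and why coarser granularity helps, can be estimated with a short calculation. The structure sizes and threshold below are hypothetical, chosen only for illustration; they are not the configuration evaluated in the paper and do not reproduce its ~1227-bit and 19-wire totals. The sketch assumes one counter per group of entries, each wide enough to hold threshold + 1, and a binary index per structure to record which counter an instruction touched.

```python
import math
from typing import Optional

# Hypothetical structure sizes (entries per structure), for illustration only.
STRUCTURES = {"ALU": 4, "ROB": 64, "RS": 32}


def overhead(threshold: int, group: Optional[dict[str, int]] = None) -> tuple[int, int]:
    """Return (total error-counter bits, per-instruction resource-usage wires).

    group[name] is how many entries share one counter (default 1 = per entry).
    """
    group = group or {}
    width = (threshold + 1).bit_length()  # counter must be able to reach threshold + 1
    counter_bits = 0
    wires = 0
    for name, entries in STRUCTURES.items():
        counters = math.ceil(entries / group.get(name, 1))
        counter_bits += counters * width
        wires += (counters - 1).bit_length()  # bits to name one of `counters`
    return counter_bits, wires


print(overhead(15))                       # (500, 13): one counter per entry
print(overhead(15, {"ROB": 4, "RS": 4}))  # (140, 9): 4-entry groups in ROB and RS
```

The trade-off is that a coarse group is deconfigured as a unit, so one hard fault disables several healthy entries along with the faulty one.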
challenges
• Error count threshold
• Related to resource usage: heavily used resources accumulate higher counts
• Every error flushes the pipeline until the threshold is reached
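A back-of-envelope model of the two threshold concerns above. It assumes (hypothetically) that every DIVA error triggers one pipeline flush, that counters are never reset, and that a transient error lands on an instruction that used a given healthy unit with probability equal to that unit's usage fraction. The function names and the threshold value are illustrative only.

```python
import math
from fractions import Fraction


def flushes_before_deconfig(threshold: int) -> int:
    # A truly faulty unit is turned off only once its count exceeds the
    # threshold, and each of those errors costs one pipeline flush.
    return threshold + 1


def transient_errors_to_false_deconfig(threshold: int, usage: Fraction) -> int:
    """Expected transient errors before a HEALTHY unit's counter exceeds the
    threshold, if each error charges the unit with probability `usage`."""
    return math.ceil((threshold + 1) / usage)


T = 15  # hypothetical threshold
print(flushes_before_deconfig(T))                              # 16
print(transient_errors_to_false_deconfig(T, Fraction(9, 10)))  # 18
print(transient_errors_to_false_deconfig(T, Fraction(1, 10)))  # 160
```

Under this model a unit used by 90% of instructions is falsely deconfigured after roughly a tenth as many transient errors as one used by 10%. Raising the threshold delays that, but every increment of the threshold also adds one more flush before a real hard fault is isolated.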
• Transient faults
• Independent resource usage
[Figure: the DIVA checker compares Desired and Observed values (labels A through F); an ERROR may stem from a HARD FAULT or a TRANSIENT FAULT]
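The "independent resource usage" requirement can be illustrated with a small deterministic simulation. The function name and the round-robin usage patterns are hypothetical; the model assumes that only instructions touching the faulty unit raise DIVA errors and that each error charges every unit the instruction used.

```python
from collections import Counter
from typing import Callable


def run_until_diagnosed(
    faulty: str,
    pick_units: Callable[[int], tuple[str, ...]],
    threshold: int,
    max_insts: int = 10_000,
) -> set[str]:
    """Run instructions until the faulty unit is deconfigured.

    pick_units(i) returns the units instruction i uses. An instruction that
    uses the faulty unit produces a DIVA error, and every unit it used is
    charged one count. Returns the set of deconfigured units.
    """
    counts: Counter = Counter()
    deconfigured: set[str] = set()
    for i in range(max_insts):
        units = pick_units(i)
        if faulty not in units:
            continue  # correct result; DIVA stays silent
        for unit in units:
            if unit not in deconfigured:
                counts[unit] += 1
                if counts[unit] > threshold:
                    deconfigured.add(unit)
        if faulty in deconfigured:
            break
    return deconfigured


# Independent usage: ALU0 pairs with rotating RS entries, so only ALU0 is blamed.
print(run_until_diagnosed("ALU0", lambda i: (f"ALU{i % 2}", f"RS{i % 3}"), 3))
# Correlated usage: ALU0 always pairs with RS0, so healthy RS0 is deconfigured too.
print(run_until_diagnosed("ALU0", lambda i: (f"ALU{i % 2}", f"RS{i % 2}"), 3))
```

When two resources are always used together, their counters stay tied and the mechanism cannot tell which one is faulty. By contrast, an isolated transient error charges each unit only once, so with any threshold of 1 or more it is absorbed by the counters.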
challenges
• Certain structures cannot be protected: Register File, Issue logic, Common Data Bus (CDB)
• Transient fault → false deconfiguration (possibly masked by the error counter)
• Faults in the error counter or deconfiguration logic: periodically test the counters; upon error, permanently configure or deconfigure the FDU
• Window of vulnerability: DIVA keeps producing errors until the counter saturates
limitations
• As transistors shrink, the hard fault rate increases
• Current reliability mechanisms: redundancy (TMR), thread-level redundancy, pre-shipment testing and deconfiguration, low-cost solutions such as DIVA
• Online diagnosis: low cost and low hardware overhead; uses FDUs together with DIVA to diagnose faults dynamically; increases yield, since a chip with deconfigured units can be binned to a lower performance bin
conclusion
discussion
What are the advantages of this hybrid scheme over using just a DIVA checker?
As process technology scales down, can this mechanism significantly extend the lifetime of the processor?
As transistors shrink and the number of cores increases, is this mechanism still worthwhile compared with simply turning off a faulty core?
How can we extend this mechanism to cover the issue logic, singleton resources, and the CDB?
citations
images
• Electron Migration. Digital image. Wikimedia.org. Wikimedia, 6 Mar. 2007. Web. <http://upload.wikimedia.org/wikipedia/commons/thumb/8/8b/Leiterbahn_ausfallort_elektromigration.jpg/220px-Leiterbahn_ausfallort_elektromigration.jpg>.
• Gate Oxide Breakdown. Digital image. Attopsemi Technology. Attopsemi Technology, n.d. Web. <http://www.attopsemi.com/tec3.htm>.
• Sawant, Minal. Single Event Upset. Digital image. COTS. Microsemi, Jan. 2012. Web. <http://www.cotsjournalonline.com/articles/view/102279>.
• Sawant, Minal. Soft Error Rate. Digital image. CCCP. University of Michigan, 11 May 2012. Web. <http://cccp.eecs.umich.edu/research/reliability.php>.
• Carr, Robert. Simultaneous Multithreading. Digital image. Prezi. Prezi, 31 Oct. 2013. Web. <http://prezi.com/tegbbfk34l57/question-2/>.
• Wong, William. Out of Order Pipeline. Digital image. Electronic Design. Electronic Design, 19 Oct. 2011. Web. <http://electronicdesign.com/microcontrollers/little-core-shares-big-core-architecture>.
• Mark Brehob, EECS 470 Lecture Slides
papers
• Fred A. Bower, Daniel J. Sorin, and Sule Ozev. A Mechanism for Online Diagnosis of Hard Faults in Microprocessors. In Proc. of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO’05), 2005.
• T.M. Austin. DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design. In Proc. of the 32nd Annual IEEE/ACM Int’l Symposium on Microarchitecture, pages 196-207, Nov. 1999.