
Multi-Bit Upsets in the Virtex Devices

Heather Quinn, Paul Graham, Jim Krone, Michael Caffrey

Los Alamos National Laboratory

Gary Swift, Jeff George, Fayez Chayab

XRTC

Multi-Bit Upsets

• Multiple neighboring bits flipped by a single ionizing particle strike

• Common to all memory devices

• FPGAs are a unique case

– Heterogeneous layout affects the size and shape of MBUs

– Domain crossing events (DCEs), where an SEU affects two or more TMR modules, are a concern
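Why a DCE matters can be illustrated with a minimal sketch of a bitwise majority voter. The function names and the 4-bit example value are hypothetical, and the sketch assumes that an MBU flips the same output bit in both affected modules.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over the outputs of three TMR modules."""
    return (a & b) | (a & c) | (b & c)


def corrupt(value: int, bit: int) -> int:
    """Flip one bit of a module output to model an upset in that module."""
    return value ^ (1 << bit)


correct = 0b1011  # hypothetical fault-free module output

# Upset confined to one TMR module: the voter masks it.
single_domain = majority_vote(corrupt(correct, 2), correct, correct)

# Upset crossing into a second module (a DCE): the wrong value wins the vote.
domain_crossing = majority_vote(corrupt(correct, 2), corrupt(correct, 2), correct)

print(single_domain == correct)    # voter output matches the fault-free value
print(domain_crossing == correct)  # voter output is corrupted
```

The voter only protects against errors that stay inside one module, which is why the fraction of MBUs that span two modules is the quantity of interest.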

MBU Research Plan

• Determine the event rate

• Determine the MBU frequency

– Multiple families of FPGAs: feature size, epi

– Radiation type: proton and heavy ion

– Strike angle

• Determine the DCE frequency

– Early simulation results

– Early probabilistic model to predict DCEs

Virtex Proton Test Setup: Crocker Nuclear Laboratory, UC-Davis

[Figure: test setup with the SLAAC1-V board, Linux PC, and proton source]

LANL V2 and V4 Hardware Test Fixture

[Figure: Virtex-II AFX board, Virtex-4 AFX board, and USB 2.0 interface to the Linux PC]

LANL Software Test Fixture

• Text-based interface

– SSH'd into the test machine

– Instant access to incremental readback results

• Tight readback/scrub cycle

– Saves differential bitstream for upsets

– Saves FAR, COR, and STAT registers

– Cycles approximately once a second

– Scrubs entire bitstream

• Data analyzed for MBUs post-collection
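The post-collection step can be sketched as grouping the upset bits of one differential readback into events. This is a minimal sketch, not the LANL tool: the `(frame, offset)` coordinates, the function name, and the rule that bits within one position of each other (including diagonals) belong to the same event are all assumptions made for illustration.

```python
def group_upsets(upsets):
    """Group upset bits into events; touching bits share an event.

    `upsets` is an iterable of (frame, offset) pairs, a hypothetical
    coordinate system for the bits reported by one differential readback.
    """
    remaining = set(upsets)
    events = []
    while remaining:
        seed = remaining.pop()
        event, stack = [seed], [seed]
        # Flood-fill outward from the seed through adjacent upset bits.
        while stack:
            frame, offset = stack.pop()
            for d_frame in (-1, 0, 1):
                for d_offset in (-1, 0, 1):
                    neighbor = (frame + d_frame, offset + d_offset)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        event.append(neighbor)
                        stack.append(neighbor)
        events.append(sorted(event))
    return sorted(events)


# Hypothetical readback: three touching bits plus one isolated bit.
readback = [(10, 5), (10, 6), (11, 6), (40, 100)]
events = group_upsets(readback)
sizes = sorted(len(event) for event in events)
print(sizes)  # one 1-bit event (SBU) and one 3-bit event (MBU)
```

Events of size one count as SBUs and larger events as MBUs. The adjacency rule is the main design choice, and a real analysis would have to reflect the physical layout of the device.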

Data Collection & Analysis

● Possibility of coincident SBUs must be minimized

– Minimize the number of upsets per readback

– Fluence per readback cycle controlled

● SEFIs (or unexplained events) removed in post-collection data analysis procedures

– SEFIs present as numerous large MBUs, which skew the statistics

– Data cube reduced insignificantly; results are cleaner
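The reason for limiting upsets per readback can be made concrete with a rough estimate. The slides do not say how the worst-case probabilities were computed, so the following is only a birthday-style approximation under stated assumptions: upsets are independent and uniformly distributed, and each bit has a fixed number of neighbors (8 here, a hypothetical value). The 1.6 million bit device size is the XC2V250 figure quoted later in the deck.

```python
import math


def coincident_sbu_probability(upsets_per_readback, total_bits, neighbors_per_bit=8):
    """Approximate chance that at least two independent SBUs land on adjacent bits.

    Assumes uniformly distributed, independent upsets. The neighbor count is
    an illustrative assumption, not a device parameter.
    """
    n = upsets_per_readback
    pairs = n * (n - 1) / 2                       # number of upset pairs
    p_adjacent = neighbors_per_bit / (total_bits - 1)  # one pair is adjacent
    expected_adjacent_pairs = pairs * p_adjacent
    return 1.0 - math.exp(-expected_adjacent_pairs)


TOTAL_BITS = 1_600_000
for n in (2, 10, 50):
    p = coincident_sbu_probability(n, TOTAL_BITS)
    print(f"{n:3d} upsets per readback: {100 * p:.4f}% chance of a false MBU")
```

Because the number of pairs grows with the square of the upset count, keeping the fluence per readback cycle low is the most direct way to keep false MBUs rare.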

Heavy Ion Accelerator Tests

• Done by Xilinx Radiation Testing Consortium

• Collected data for:

Family          Part        Worst-Case Probability of Coincident SBUs
Virtex          XCV300      5.4077%
Virtex-II       XC2V1000    5.8946%
Virtex-II Pro   XC2VP40     1.1697%
Virtex-4        XC4VSX35    0.0093%

Heavy Ion Event Bit Cross-Sections

V4 Heavy Ion Event and Upsets Cross-Sections

63% decrease in cross-sections

Percentage of MBU Events in Heavy Ion: Comparison Across Families

V4 Percentage of Events by Resource: BRAM, CLB

V4 MBUs from Angle Strikes Across Columns

V4 MBUs from Angle Strikes Down Columns

Proton Accelerator Tests

• Done by Los Alamos National Laboratory

• Collected data for:

Family      Part        Worst-Case Probability of False MBUs
Virtex      XCV1000     0.0006%
Virtex-II   XC2V250     0.0149%
Virtex-II   XC2V1000    0.0298%
Virtex-4    XC4VLX25    0.0379%
Virtex-4    XC4VSX35    0.0044%

Percentage of MBU Events in Proton at 63 MeV: Comparison Across Families

Device     Total Events   1-Bit Events        2-Bit Events     3-Bit Events   4-Bit Events
XCV1000    241,166        241,070 (99.96%)    96 (0.04%)       0 (0%)         0 (0%)
XC2V250    337,814        333,639 (98.76%)    4,129 (1.22%)    44 (0.01%)     2 (0.0006%)
XC2V1000   204,009        199,641 (97.86%)    2,164 (1.06%)    12 (0.006%)    1 (0.0005%)
XC4VLX25   152,577        147,902 (96.44%)    4,567 (2.99%)    78 (0.051%)    8 (0.0052%)
XC4VSX35   142,422        139,349 (97.84%)    3,037 (2.13%)    30 (0.021%)    6 (0.0042%)
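To compare families at a glance, the per-size counts can be collapsed into a single MBU share per device. This is a small sketch that uses the event counts from the table above. The dictionary layout and function name are illustrative choices.

```python
# (total, 1-bit, 2-bit, 3-bit, 4-bit) event counts copied from the proton table.
PROTON_EVENTS = {
    "XCV1000": (241_166, 241_070, 96, 0, 0),
    "XC2V250": (337_814, 333_639, 4_129, 44, 2),
    "XC2V1000": (204_009, 199_641, 2_164, 12, 1),
    "XC4VLX25": (152_577, 147_902, 4_567, 78, 8),
    "XC4VSX35": (142_422, 139_349, 3_037, 30, 6),
}


def mbu_share(total, one_bit, *multi_bit):
    """Percentage of all events that upset two or more bits."""
    return 100.0 * sum(multi_bit) / total


for device, counts in PROTON_EVENTS.items():
    print(f"{device}: {mbu_share(*counts):.2f}% of events are MBUs")
```

The share is computed against the reported total rather than the sum of the per-size columns, so it follows the same convention as the percentages in the table.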

Proton Bit-Cross-Sections

Device      SBU Bit-Cross-Section (cm^2/bit)   MBU Bit-Cross-Section (cm^2/bit)
XCV1000     6.29 x 10^-14                      2.77 x 10^-17
XC2V250     3.06 x 10^-14                      3.34 x 10^-16
XC2V1000    2.30 x 10^-14                      2.50 x 10^-16
XC2VP40     3.68 x 10^-14                      4.94 x 10^-16
XC4VLX25    1.22 x 10^-14                      3.85 x 10^-16

V4 Proton Event and Upsets Cross-Sections

9% decrease in cross-sections

MBUs from Angle Strikes Down Columns

TMR Studies

• TMR designs:

– Edge detection, max filter, min filter

– Each design has 2-3 implementations that vary module granularity

• LANL/BYU SEU simulation extended to simulate column MBUs:

– Observe, characterize, and determine the DCE rate

– Test both 1-bit and 2-bit column events

• Data inconclusive at best

TMR Results: MBU Sensitive Bits

Design                     Unique MBU Bits
Max Filter (no TMR)        692
Max Filter (TMR Type 1)    122
Max Filter (TMR Type 2)    133
Max Filter (TMR Type 3)    117
Min Filter (no TMR)        626
Min Filter (TMR Type 1)    134
Min Filter (TMR Type 2)    135
Min Filter (TMR Type 3)    139
Sobel Edge (no TMR)        468
Sobel Edge (TMR Type 2)    151
Sobel Edge (TMR Type 3)    121

Device: XC2V250 (Virtex-II) with 1.6 Million Bits

Even without triplicated clocks, TMR improves the cross-section 4-6 times
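The per-design improvement can be read off the table by dividing the unmitigated count by each TMR count. The sketch below does that. Treating the ratio of unique MBU-sensitive bits as a stand-in for the cross-section improvement is an assumption made here, and the dictionary layout and function name are illustrative.

```python
# Unique MBU-sensitive bits per design, copied from the table above.
MBU_BITS = {
    "Max Filter": {"no TMR": 692, "TMR Type 1": 122, "TMR Type 2": 133, "TMR Type 3": 117},
    "Min Filter": {"no TMR": 626, "TMR Type 1": 134, "TMR Type 2": 135, "TMR Type 3": 139},
    "Sobel Edge": {"no TMR": 468, "TMR Type 2": 151, "TMR Type 3": 121},
}


def improvement(design):
    """Ratio of unmitigated to TMR sensitive-bit counts for one design."""
    variants = MBU_BITS[design]
    baseline = variants["no TMR"]
    return {name: baseline / bits for name, bits in variants.items() if name != "no TMR"}


for design in MBU_BITS:
    for variant, factor in improvement(design).items():
        print(f"{design} ({variant}): {factor:.1f}x fewer MBU-sensitive bits")
```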

Given an MBU, what is the probability of a DCE?

[Figure: SBU and MBU event space, divided by resource (BRAM, CLB, BRAMi, IOB) for SBUs and MBUs, with regions marked "Voted Out" and "DCE"]

Need to determine what percentage of the event space are DCEs

Probability of an MBU DCE in CLBs and BRAMi: Proton-Induced Radiation Events

Design                      BRAMi        CLBs
Max Filter (no TMR)         0.000010%    0.000964%
Max Filter (TMR Type 1)     0.000007%    0.000159%
Max Filter (TMR Type 2)     0.000009%    0.000150%
Max Filter (TMR Type 3)     0.000009%    0.000131%
Min Filter (no TMR)         0.000023%    0.000872%
Min Filter (TMR Type 1)     0.000007%    0.000178%
Min Filter (TMR Type 2)     0.000015%    0.000162%
Min Filter (TMR Type 3)     0.000009%    0.000176%
Edge Filter (no TMR)        0.000006%    0.000664%
Edge Filter (TMR Type 2)    0.000011%    0.000184%
Edge Filter (TMR Type 3)    0.000002%    0.000145%

Probability of a DCE in CLBs and BRAMi: Heavy-Ion-Induced Radiation Events

Summary

• MBU problem getting worse with each generation

• DCEs in TMR a concern

• Need more data collection, more simulation, more modeling

– Data collection: finish qualification

– Simulation: complete TMR studies with more designs and more devices

– Modeling: predict on-orbit SBU/MBU rates