A Survey of Fault Tolerant Methodologies for FPGA’s
Gökhan Kabukcu 2006703357
Outline
Introduction to FPGAs
Device-Level Fault Tolerance Methods
Configuration-Level Fault Tolerance Methods
Comparison of Methodologies
Conclusion
Introduction (FPGA)
A field programmable gate array (FPGA) is a semiconductor device containing programmable logic components and programmable interconnects
Consists of:
  Regular arrays of programmable logic blocks (PLBs)
  Programmable routing matrix
Configuration of the FPGA includes:
  The functionality of the FPGA
  Which PLBs will be used
  The functionality of the PLBs
  Which wire segments will be used for connecting PLBs
Introduction (FPGA)
PLBs are multi-input, multi-output circuits and allow:
  Sequential designs
  Combinational designs
PLBs include:
  Look-Up Tables (LUTs, or small ROMs)
  Multiplexers
  Flip-flops
Introduction (FPGA)
Look-Up Tables (LUTs):
  4-input, 1-output units
  Can be used as: RAM, ROM, shift register, functional unit
  Configured by a 16-bit “INIT” value
Introduction (FPGA)
An Example: y=(x1+x2)*x3+x4
Create truth table
Assign “INIT” to the LUT
Since there are 4 inputs and 1 output, 1 LUT is enough to represent the equation
The LUT can be put into any PLB in the FPGA
x1 x2 x3 x4 y
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 1
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 1
1 0 0 0 0
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 0
1 1 0 1 1
1 1 1 0 1
1 1 1 1 1
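As a rough illustration of how the truth table above could be packed into a 16-bit INIT value, here is a minimal Python sketch. The bit ordering is an assumption (x1 is treated as the most significant address bit, and INIT bit k holds the output for address k); real devices and tools may order the bits differently. The names `make_init` and `y_func` are hypothetical.

```python
def make_init(func, n_inputs=4):
    """Pack a Boolean function's truth table into an integer INIT value.

    Assumed convention (not taken from the slides): the first input is the
    most significant address bit, and INIT bit k is the output for address k.
    """
    init = 0
    for address in range(2 ** n_inputs):
        # Recover the individual input bits from the address, first input first.
        bits = [(address >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        if func(*bits):
            init |= 1 << address
    return init


def y_func(x1, x2, x3, x4):
    # y = (x1 + x2) * x3 + x4, reading + as OR and * as AND
    return ((x1 | x2) & x3) | x4


init_y = make_init(y_func)
print(f"INIT = 0x{init_y:04X}")  # prints INIT = 0xEEEA
```

Under this ordering, reading the y column of the table from the bottom row to the top row gives the INIT bits from most significant to least significant.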
Introduction (FPGA)
Another Example: y=(x1+x2)*x3+x4, z=y*x5
Create truth tables
Assign “INIT”s to LUTs
Since there are 5 inputs and 1 output, 2 LUTs are needed to represent the equations
The LUTs can be put into any PLBs in the FPGA
A1 and A0 are “don’t care”s
x1 x2 x3 x4 y
0 0 0 0 0
0 0 0 1 1
0 0 1 0 0
0 0 1 1 1
0 1 0 0 0
0 1 0 1 1
0 1 1 0 1
0 1 1 1 1
1 0 0 0 0
1 0 0 1 1
1 0 1 0 1
1 0 1 1 1
1 1 0 0 0
1 1 0 1 1
1 1 1 0 1
1 1 1 1 1
y x5 z
0 0 0
0 1 0
1 0 0
1 1 1
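The two-LUT cascade can be checked with a small Python sketch. It assumes that y drives input A3 and x5 drives input A2 of the second LUT (the slides only state that A1 and A0 are don't-cares), and that A3 is the most significant address bit; `lut4`, `INIT_Y`, and `INIT_Z` are hypothetical names.

```python
from itertools import product


def lut4(init, a3, a2, a1, a0):
    """Read one bit of a 16-bit INIT; a3 is assumed to be the top address bit."""
    address = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
    return (init >> address) & 1


INIT_Y = 0xEEEA  # y = (x1 + x2) * x3 + x4 under the assumed bit ordering
INIT_Z = 0xF000  # z = y * x5 with y on A3 and x5 on A2; A1, A0 are don't-cares


def z_from_luts(x1, x2, x3, x4, x5):
    y = lut4(INIT_Y, x1, x2, x3, x4)
    return lut4(INIT_Z, y, x5, 0, 0)  # unused inputs tied to 0 in this sketch


# Compare the cascade against the Boolean expression for all 32 input patterns.
mismatches = [
    bits for bits in product((0, 1), repeat=5)
    if z_from_luts(*bits) != ((((bits[0] | bits[1]) & bits[2]) | bits[3]) & bits[4])
]
print(len(mismatches))  # prints 0
```

Because the four INIT bits for y=1, x5=1 are all set, the output does not depend on A1 or A0, which is what makes them don't-cares.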
Introduction (FPGA)
[Figure: An example of a full design on an FPGA]
Fault Tolerance
Device-Level Fault Tolerance
  Attempts to deal with faults at the level of FPGA hardware
  Select redundant HW to replace the faulty one
  Solution with extra HW resources
Configuration-Level Fault Tolerance
  Tolerates faults at the level of FPGA configuration
  When a circuit is placed, fault-free resources are selected
  Status of the resources is considered each time a circuit is placed and routed
  Solution with extra reconfiguration time
Device-Level FT Methods(1) Extra Rows
One extra spare row is added
Selection Logic is added to bypass the defective row
Vertical wire segments are lengthened by one row
Faults in one row can be tolerated
More than one spare row is needed to tolerate faults in multiple rows
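The row-bypass idea can be sketched as a mapping from logical rows to physical rows. This is only an illustration of the bookkeeping, not the selection logic of any real device; `map_rows` and its parameters are hypothetical.

```python
def map_rows(num_logical_rows, defective_rows, num_spare_rows=1):
    """Map each logical row to a healthy physical row, skipping defective ones.

    Returns a list whose entry i is the physical row used by logical row i,
    or None when there are more defective rows than spare rows.
    """
    total_physical = num_logical_rows + num_spare_rows
    healthy = [r for r in range(total_physical) if r not in defective_rows]
    if len(healthy) < num_logical_rows:
        return None  # not enough fault-free rows left
    return healthy[:num_logical_rows]


print(map_rows(4, {1}))        # prints [0, 2, 3, 4]
print(map_rows(4, {1, 2}))     # prints None
print(map_rows(4, {1, 2}, 2))  # prints [0, 3, 4, 5]
```

With one spare row, a single defective row is bypassed; faults in two different rows need a second spare.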
Device-Level FT Methods(2) Reconfiguration Network
Four architectural changes:
  Additional routing resources (bypass lines)
  Reconfiguration Memory to store locations of faulty resources
  On-chip circuitry for reconfiguration routing
  Additional column of PLBs
Device-Level FT Methods(2) Reconfiguration Network
Test and identify faulty resources
Create fault map
Load map into Reconfiguration Memory
On-board router avoids faulty resources
The network is constructed by shifting all PLBs in the fault-containing row towards the right
Method can tolerate 1 fault in each row if there is one extra spare column.
Device-Level FT Methods(3) Self-Repairing Architecture
Sub-arrays of PLBs
Routers between sub-arrays
Extra columns of PLBs
PLBs constantly test themselves
If a fault is detected:
  Column of affected PLB is shifted one position to the right
  The inter-array routers are adjusted
Area overhead of this method is significant
If there is 1 spare column and N sub-arrays stacked vertically, the method can tolerate N faults at a time
Device-Level FT Methods(4) Block-Structured Architecture
Goal: tolerate larger and denser patterns of defects efficiently
Blocks of PLBs
FPGA is configured by a loading arm
The block at the end of the loading arm is configured
Device-Level FT Methods(4) Block-Structured Architecture
A block is selected by the loading arm and tested
If the test is passed, the block is configured; otherwise it is designated as faulty
Loading arm configures blocks one by one
If the arm cannot extend any further in a path, it’s retracted by one block
Fault tolerance is provided by redundant rows and/or columns
Area overhead is significant
Device-Level FT Methods(5) Fault Tolerant Segments/Grids
Fault Tolerant Segments:
  Adds one track of spare segments to each wiring channel
  If a faulty segment is found, the segment is shifted to the spare
  Single fault can be tolerated
Fault Tolerant Grids:
  An entire spare routing grid is added
  No additional elements in routing channel, no extra time delay
Device-Level FT Methods(6) SRAM Shifting
Based on shifting the entire circuit on the FPGA
PLBs should be placed in one of 2 ways:
  King Allocation: 8 PLBs use one spare, circuit can move in 8 directions
  Horse Allocation: 4 PLBs use one spare, circuit can move in 4 directions
Testing determines the faulty cells and feeds this information to the shifter circuitry on the FPGA.
Device-Level FT Methods(6) SRAM Shifting
Additional spare PLBs surrounding the FPGA
Horse Allocation used in the figure
The circuit is shifted up and right
Advantages of the Method:
  No external reconfiguration algorithm is required
  The timing of the circuit is almost fixed
  Any single fault can be tolerated
Configuration-Level FT Methods(1) Pebble Shifting
Find an initial circuit configuration, then move pieces from faulty units
Occupied PLBs are called pebbles
Pair pebbles on faulty cells with unique, unused cells such that the sum of weighted Manhattan distances is minimized
Start shifting pebbles
If a pebble finds an empty cell other than the intended cell, this empty cell becomes the destination
No limit to the number of faults that can be tolerated
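The pairing step can be illustrated with a brute-force Python sketch. It is not the matching algorithm used by the method: it simply tries every assignment, which is only practical for a handful of cells, and it assumes unit weights on the Manhattan distances. The function names and the coordinates are hypothetical.

```python
from itertools import permutations


def manhattan(a, b):
    """Manhattan distance between two (row, column) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pair_min_total(faulty_pebbles, free_cells):
    """Pair each pebble on a faulty cell with a distinct free cell so that the
    total Manhattan distance is minimal (exhaustive search, unit weights)."""
    best_cost, best_pairs = None, None
    for targets in permutations(free_cells, len(faulty_pebbles)):
        cost = sum(manhattan(p, t) for p, t in zip(faulty_pebbles, targets))
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, list(zip(faulty_pebbles, targets))
    return best_cost, best_pairs


cost, pairs = pair_min_total([(0, 0), (2, 1)], [(0, 3), (3, 1), (1, 0)])
print(cost)   # prints 2
print(pairs)  # prints [((0, 0), (1, 0)), ((2, 1), (3, 1))]
```

If there are fewer free cells than pebbles on faulty cells, no assignment exists and the function returns `(None, None)`.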
Configuration-Level FT Methods(1) Pebble Shifting
Example:
  1 and 6 are on faulty cells
  Using a minimum-cost, maximum matching algorithm, pairings are: 1->v11 and 6->v32
  Element 1 is shifted to its new position
  To move 6, we shift 3, 8 and 7
  Now all elements are on non-faulty cells and allocation is done
Configuration-Level FT Methods(2) Mini-Max Grid Matching
Uses a grid matching algorithm to match faulty logic to empty, non-faulty locations
Like Pebble Shifting, uses a minimum-cost, maximum matching algorithm
Minimizes the maximum distance between the pairings, since the circuit’s performance is set by the critical (longest) path
Can tolerate faults until there are no unused cells
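The difference from a minimum-total pairing is the objective: here the longest single move is minimized. A minimal brute-force sketch follows, assuming unweighted Manhattan distance and hypothetical names and coordinates; a practical implementation would use a proper bottleneck matching algorithm instead of trying every assignment.

```python
from itertools import permutations


def manhattan(a, b):
    """Manhattan distance between two (row, column) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pair_min_max(faulty_cells, free_cells):
    """Pair faulty cells with distinct free cells so that the longest single
    move is as short as possible (exhaustive search, small inputs only)."""
    best = None
    for targets in permutations(free_cells, len(faulty_cells)):
        pairs = list(zip(faulty_cells, targets))
        longest = max((manhattan(p, t) for p, t in pairs), default=0)
        if best is None or longest < best[0]:
            best = (longest, pairs)
    return best


# Both possible pairings have total distance 5, but their longest moves differ.
longest, pairs = pair_min_max([(0, 0), (0, 1)], [(0, 2), (0, 4)])
print(longest)  # prints 3
print(pairs)    # prints [((0, 0), (0, 2)), ((0, 1), (0, 4))]
```

In this example a minimum-total objective cannot distinguish the two pairings, while the mini-max objective prefers the one whose longest move is 3 rather than 4.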
Configuration-Level FT Methods(3) Node-Covering and Cover Segments
When a fault is discovered, nodes are shifted along the chain (row) towards the right
The last PLB of a chain is reserved as a spare
One fault in a row can be tolerated
Needs no reconfiguration if local routing configurations are present
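The chain shift can be sketched on a single row. The list representation, with `None` marking an unused PLB and the spare at the end of the chain, is an assumption made for illustration; `cover_fault` is a hypothetical name.

```python
def cover_fault(chain, faulty_index):
    """Shift the nodes at and after faulty_index one position to the right.

    chain: node names by PLB position, with the last entry None (the spare).
    Returns the new placement, or None if the spare has already been used.
    """
    if chain[-1] is not None:
        return None  # spare already consumed, a second fault cannot be covered
    # Leave the faulty position empty and push the remaining nodes right.
    return chain[:faulty_index] + [None] + chain[faulty_index:-1]


row = ["a", "b", "c", "d", None]
shifted = cover_fault(row, 1)
print(shifted)                  # prints ['a', None, 'b', 'c', 'd']
print(cover_fault(shifted, 3))  # prints None
```

The second call shows why only one fault per row is tolerated: after the first shift the spare PLB is occupied.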
Configuration-Level FT Methods(4) Tiling
Partition FPGA into tiles
Precompiled configurations of tiles are stored in memory
Each tile contains system function, some spare logic and interconnect resources
When a logic fault occurs in a tile, the configuration of the tile is replaced by a configuration that does not use the faulty resources
Many logic faults can be tolerated
Local interconnect faults can be tolerated, but global ones cannot
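Selecting a replacement configuration for a tile can be sketched as a lookup over the stored alternatives. The data layout (each precompiled configuration described by the set of resources it uses) and all names below are hypothetical; the slides do not specify how configurations are stored.

```python
def select_tile_config(configs, faulty_resources):
    """Return the name of the first precompiled configuration that uses none
    of the faulty resources, or None if every stored configuration is hit."""
    for name, used in configs.items():
        if not (used & faulty_resources):
            return name
    return None


# Hypothetical tile with four PLBs; each configuration leaves one PLB spare.
tile_configs = {
    "cfg_spare_plb3": {"plb0", "plb1", "plb2"},
    "cfg_spare_plb2": {"plb0", "plb1", "plb3"},
    "cfg_spare_plb1": {"plb0", "plb2", "plb3"},
    "cfg_spare_plb0": {"plb1", "plb2", "plb3"},
}

print(select_tile_config(tile_configs, {"plb2"}))          # prints cfg_spare_plb2
print(select_tile_config(tile_configs, {"plb0", "plb1"}))  # prints None
```

Since every tile carries its own set of alternatives, faults in different tiles are handled independently, which is why many logic faults can be tolerated overall.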
Configuration-Level FT Methods(5) Cluster-Based
Intracluster tolerance in a PLB
Basic Logic Elements (BLEs or LUTs)
For simple LUT faults, the preferred solution is to use another LUT in the PLB
Instead of changing PLB, try to find a solution in the same PLB
In the example, T is faulty and the 4th PLB is used instead of the 2nd PLB
Configuration-Level FT Methods(6) Column-Based
Treats the design as a set of functional units, where each unit is a column
Like Tiling, uses precompiled configurations, but at lower cost
At least one column should be spare
If there is a faulty cell in a column, the column is shifted toward the spare column
Method can tolerate m faulty columns, where m is the number of columns not occupied by system functions
Comparison of Methodologies(1)
Device-Level (DL) Methods need extra HW and have more area cost
DL Methods use one initial reconfiguration and have no extra reconfiguration cost
Configuration-Level (CL) Methods need more than one reconfiguration and sometimes result in high time cost
CL Methods don’t need extra HW and have no additional area cost
Comparison of Methodologies(2)
DL Methods are less flexible, therefore less able to improve reliability
CL Methods usually tolerate more faults than DL Methods
Performance impact of fault tolerance is less for DL Methods than CL Methods
Conclusion
No single Fault Tolerance methodology is better than the others in all cases.
DL Methods have less impact on performance but are less flexible
CL Methods tolerate more faults but have more impact on performance