3D-DRESD R4R

POLITECNICO DI MILANO. Lecturer: Antonio Miele – [email protected]. Involved People: Cristiana Bolchini, Antonio Miele, Marco D. Santambrogio. SEU Mitigation for SRAM-Based FPGAs through Dynamic Partial Reconfiguration - 3D-DRESD Second Edition -


Transcript of 3D-DRESD R4R

Page 1: 3D-DRESD R4R

POLITECNICO DI MILANO

Lecturer: Antonio Miele – [email protected]

Involved People:

Cristiana Bolchini, Antonio Miele, Marco D. Santambrogio

SEU Mitigation for SRAM-Based FPGAs through Dynamic Partial Reconfiguration

- 3D-DRESD Second Edition -

Page 2: 3D-DRESD R4R

Motivations

Designing reliable systems implemented on FPGAs, able to cope with the effects of faults caused by radiation

Applying already known and well-studied detection and recovery techniques to novel scenarios

Exploiting dynamic partial reconfiguration to trigger the reconfiguration of the affected portion of the architecture

… while the rest of the system is still working

… without the need to reprogram the entire system

Page 3: 3D-DRESD R4R

Outline

Goals

Starting point: fault tolerance and reliability, reconfigurable architecture, related work

The proposed approach: requirements, solution space exploration

Project roadmap: completed steps, work in progress

Other works

Conclusions and Future Work

Page 4: 3D-DRESD R4R

Goals

Design space exploration w.r.t. reliability

Apply traditional, sound techniques in a different context, exploiting the peculiarity of the platform

Evaluate the alternative designs, comparing costs, performance and fault detection properties

Support the designer in selecting the most convenient solution

Page 5: 3D-DRESD R4R

Fault Model and Reliability

Adopted fault model: faults caused by radiation and α-particles

Single Event Transient (SET), Single Event Upset (SEU)

Bit-flip

Temporary – data and control registers

Permanent – configuration memory

Page 6: 3D-DRESD R4R

Reconfigurable Scenario

FPGAs: Xilinx family (Virtex, Virtex-II, Virtex-II Pro, Virtex-4, ...)

Reconfiguration: modular design flow

E.g., Early Access Partial Reconfiguration (EAPR)

Page 7: 3D-DRESD R4R

Related Work

TMR at different levels of abstraction: replication of the entire circuit or of each register

Periodic bitstream scrubbing

Bitstream readback

Drawbacks: area overhead, recovery latency and power consumption

Page 8: 3D-DRESD R4R

Proposed Approach

Fault detection and masking

Duplication with comparison (DWC)

Triple Modular Redundancy (TMR)

Redundant codes

(techniques presented in the 70s and 80s)

Recovery: partial dynamic reconfiguration
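The two replication schemes named on this slide can be illustrated with a minimal behavioural sketch in Python. It is not the hardware design used in the project: modelling a module as a plain callable, and the names `dwc`, `tmr`, `good` and `faulty`, are assumptions made only for illustration.

```python
from collections import Counter
from typing import Callable, Tuple

Module = Callable[[int], int]  # hypothetical stand-in for a hardware module


def dwc(replica_a: Module, replica_b: Module, x: int) -> Tuple[int, bool]:
    """Duplication with comparison: run two replicas and flag a mismatch.

    DWC detects an error but cannot tell which replica is wrong.
    """
    out_a, out_b = replica_a(x), replica_b(x)
    return out_a, out_a != out_b


def tmr(replicas: Tuple[Module, Module, Module], x: int) -> Tuple[int, bool]:
    """Triple modular redundancy: a majority vote masks one faulty replica."""
    outputs = [r(x) for r in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    return value, votes < 3  # second item: True if some replica disagreed


def good(x: int) -> int:
    return x + 1


def faulty(x: int) -> int:
    return (x + 1) ^ 0b100  # same module with one output bit flipped
```

With one faulty replica, `dwc(good, faulty, 5)` only reports the mismatch, while `tmr((good, faulty, good), 5)` still returns the correct value and reports that a replica disagreed. The vote is meaningful only while at most one replica is faulty.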

Page 9: 3D-DRESD R4R

Requirements

Fault detection and characterization: identification of a mismatch; detection of whether the fault is transient or permanent

Fault localization: identification of the portion of the device where the fault occurred

Partial reconfiguration: reconfiguration of the smallest portion of the FPGA if the fault effect is characterized as permanent
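The three requirements can be read as one recovery loop. The sketch below is a toy model under stated assumptions: the `Area` class, its fields, and the rule "a mismatch that disappears on a second check is transient" are illustrative choices, not something the slides specify, and `reconfigure` merely stands in for downloading a partial bitstream.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Area:
    """Hypothetical reconfigurable area hosting one replicated module."""
    name: str
    config_corrupted: bool = False   # permanent fault: configuration memory
    register_upset: bool = False     # temporary fault: data/control register

    def mismatch(self) -> bool:
        """Return True if the area's checker currently reports a mismatch."""
        seen = self.config_corrupted or self.register_upset
        self.register_upset = False  # assumption: the next write overwrites the flipped bit
        return seen

    def reconfigure(self) -> None:
        """Stand-in for downloading the partial bitstream of this area only."""
        self.config_corrupted = False


def handle_faults(areas: List[Area]) -> Dict[str, str]:
    """Classify and recover each area; returns the action taken per area."""
    actions = {}
    for area in areas:                      # localization: one checker per area
        if not area.mismatch():
            actions[area.name] = "ok"
        elif not area.mismatch():           # second check passes: transient
            actions[area.name] = "transient"
        else:                               # still failing: permanent
            area.reconfigure()
            actions[area.name] = "reconfigured"
    return actions
```

Only the area whose fault persists is reconfigured; the other areas are left untouched, which mirrors the requirement of reconfiguring the smallest possible portion of the device.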

Page 10: 3D-DRESD R4R

Design Space Exploration

Several solutions are possible when applying DWC

Several solutions are possible when applying TMR

Page 11: 3D-DRESD R4R

Design Space Exploration

Discarding disadvantageous solutions

For instance, eliminating error-control modules that are not required (e.g., voters)

Page 12: 3D-DRESD R4R

Design Space Exploration

The issues presented lead to the definition of a framework for design space exploration. It aims at:

Estimating the costs and benefits of the different possible solutions

Exploring the solution space on the basis of several metrics (e.g., size of the subsystems, size of the data widths)

Identifying the most promising solutions


Page 13: 3D-DRESD R4R

Project roadmap: Completed steps

Page 14: 3D-DRESD R4R

Case Studies

Noekeon algorithm: Block cipher (128-bit key, 128-bit block)

FIR filter: Simple and regular architecture

$$ y(t) = \sum_{i=0}^{10} c_i \, x(t - i) $$
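The FIR case study computes each output sample as a weighted sum of the current and previous input samples; with the index running from 0 to 10 this is an 11-tap filter. A minimal software sketch follows. The function name `fir` and the choice of treating samples before t = 0 as zero are assumptions for illustration, and the sketch says nothing about the hardware architecture used in the project.

```python
from typing import List, Sequence


def fir(coeffs: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y(t) = sum_i c_i * x(t - i), taking x(t) = 0 for t < 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, c in enumerate(coeffs):
            if t - i >= 0:          # skip samples before the start of the input
                acc += c * x[t - i]
        y.append(acc)
    return y
```

Feeding a unit impulse returns the coefficients themselves, e.g. `fir([1, 2, 3], [1, 0, 0, 0])` gives `[1.0, 2.0, 3.0, 0.0]`. The regular multiply-accumulate structure is what makes the filter easy to split into modules.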

Page 15: 3D-DRESD R4R

A first attempt

A few solutions have been implemented

DWC (or TMR) has been adopted

Each solution proposes a different grouping of system modules and a different placement on reconfigurable areas

Page 16: 3D-DRESD R4R

Exhaustive exploration of the solution space

Considering TMR, all possible solutions have been generated (not implemented!)

An all-to-all comparison has been performed to select the most promising solutions and discard the least interesting ones

Area occupation has been used as the metric

The area of each solution has been estimated by summing the area occupation of its individual modules
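An exhaustive generation of this kind can be sketched as an enumeration of every grouping of the system modules, each scored by summed module areas. The sketch below is illustrative only: the module names, the area figures, and the assumption of one voter per group are hypothetical and are not taken from the project.

```python
from typing import Dict, Iterator, List

# Hypothetical module areas (e.g., in slices); not taken from the slides.
MODULE_AREA: Dict[str, int] = {"A": 40, "B": 25, "C": 10}
VOTER_AREA = 5  # hypothetical cost of one voter, assumed once per group


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every way to split items into non-empty groups."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # put `first` in a group of its own ...
        yield [[first]] + part
        # ... or add it to each existing group in turn
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]


def tmr_area(solution: List[List[str]]) -> int:
    """Area estimate: three copies of every module plus one voter per group."""
    modules = sum(3 * MODULE_AREA[m] for group in solution for m in group)
    return modules + VOTER_AREA * len(solution)


solutions = list(partitions(list(MODULE_AREA)))
ranked = sorted(solutions, key=tmr_area)
```

With a single metric the all-to-all comparison reduces to a ranking. The number of groupings grows very quickly with the number of modules, so this brute-force enumeration is practical only for small systems.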


Page 17: 3D-DRESD R4R

Project roadmap: Work in progress

Page 18: 3D-DRESD R4R

Exhaustive exploration of the solution space

Designing an algorithm that:

Enables a “smart” exploration of the solution space

Enables the search for the most promising solutions on the basis of an objective function that considers cost/benefit metrics

Explores the design space considering more than one technique (e.g., TMR, DWC, redundant codes)
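The slides do not define the objective function or the search algorithm, so the following is only a sketch of what a cost/benefit objective over more than one technique might look like. The `Candidate` fields, the weights, and the area figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """One hardened design alternative; all fields are illustrative."""
    technique: str      # e.g. "DWC" or "TMR"
    area: int           # estimated area cost
    masks_fault: bool   # benefit: output stays correct while recovering


def objective(c: Candidate, w_area: float = 1.0, w_mask: float = 50.0) -> float:
    """Lower is better: weighted area cost minus a bonus for fault masking."""
    return w_area * c.area - (w_mask if c.masks_fault else 0.0)


def best(cands: List[Candidate], **weights: float) -> Candidate:
    """Return the candidate that minimizes the objective function."""
    return min(cands, key=lambda c: objective(c, **weights))


candidates = [
    Candidate("DWC", area=155, masks_fault=False),  # hypothetical numbers
    Candidate("TMR", area=230, masks_fault=True),
]
```

Changing the weights changes the winner: with the default weights the cheaper DWC candidate is selected, while `best(candidates, w_mask=100.0)` prefers TMR. This is the kind of trade-off a designer would tune.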


Page 19: 3D-DRESD R4R

Implementing the framework: a first draft

[Figure: block diagram of the framework. Block labels: RoadRunner Lib (TRC, ...), Project Lib, Top Module VHDL, Transf. XML, Mod. VHDL, VHDL Parser, VHDL Re-builder, Rec Arch VHDL, Graph Manipulator, Rec Lib (TRC, ...), Component Syntheses, Constraint File Builder, Constr File, Transf. Rules (Rec, TMR, ...)]

Page 20: 3D-DRESD R4R

Other works

Another related work deals with the design of a fault injector for FPGAs

Motivations:

Reliability assessment is an important task when designing reliable embedded systems

It is usually performed by means of fault injection experiments

Requirements:

Stop the execution, preserving the system state

Inject a fault by downloading a partial bitstream

It should allow corruption of both data registers and configuration memory

Restart the execution

IMPORTANT ISSUES: observability and controllability of fault injection
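The stop, inject, restart sequence can be mimicked in software with a toy model. In the sketch below the fault is injected by flipping a bit directly, whereas the approach described on this slide downloads a partial bitstream; `ToySystem`, `inject` and `run` are hypothetical names introduced only for illustration.

```python
from typing import List


class ToySystem:
    """Hypothetical system under test: an accumulator with a 'configuration'."""

    def __init__(self) -> None:
        self.register = 0             # data register (temporary state)
        self.config = bytearray([1])  # configuration memory: the increment
        self.running = True

    def step(self) -> None:
        if self.running:
            self.register = (self.register + self.config[0]) & 0xFF


def inject(system: ToySystem, target: str, bit: int) -> None:
    """Stop, flip one bit in the chosen target, then restart."""
    system.running = False                 # stop; state is preserved
    if target == "register":
        system.register ^= 1 << bit        # corrupt a data register
    elif target == "config":
        system.config[0] ^= 1 << bit       # corrupt configuration memory
    else:
        raise ValueError(f"unknown target: {target}")
    system.running = True                  # restart the execution


def run(steps: int, inject_at: int, target: str, bit: int) -> List[int]:
    """Run the system, inject one fault, and record the observed trace."""
    system, trace = ToySystem(), []
    for t in range(steps):
        if t == inject_at:
            inject(system, target, bit)
        system.step()
        trace.append(system.register)
    return trace
```

Comparing a faulty trace against a fault-free one (for example `run(4, 99, "register", 0)`, where the injection step is never reached) is a simple form of observability, while choosing `inject_at`, `target` and `bit` corresponds to controllability.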


Page 21: 3D-DRESD R4R

Conclusions and Future Work

We proposed guidelines for evaluating alternative SEU mitigation techniques

We applied DWC and TMR to detect faults, and partial dynamic reconfiguration to recover

We exhaustively explored the solution space considering a single technique

Next steps:

Automatic system partitioning into reliable areas

Gathering alternative concurrent error detection techniques

Designing an EAPR-based flow

Page 22: 3D-DRESD R4R

Questions?