Post on 15-Jan-2016

Committee Members: Annie S. Wu, Jooheung Lee, and Ronald F. DeMara

Optimizing Dynamic Logic Realizations for Partial Reconfiguration of Field Programmable Gate Arrays

Matthew G. Parris, University of Central Florida

Agenda

• Contributions of Thesis

• Previous Work

• Evolvable Hardware Optimization Strategies

• Partial Reconfiguration & Architectural Analysis

• Dynamic Processor Allocation Strategies

• Conclusion and Future Work

Contributions of Thesis

• Novel Taxonomy: classify current FPGA fault-handling methods

• FPGA Repair Optimization: improve the performance of a Genetic Algorithm

• Architectural Analysis: demonstrate benefits of newer FPGA devices

• Adaptive Architecture Implementation: exploit benefits of Partial Reconfiguration

Previous Work

• SRAM Field Programmable Gate Arrays (FPGA)

[Figure: Programmable Logic Block (PLB) built from a LUT, a mux, and a flip-flop, with signal labels a, b, c, d, in, clock, q, and y. From: The Design Warrior’s Guide to FPGAs by Clive Maxfield]

Previous Work

• Unlimited Programmability
  • Quickly test prototypes on final H/W architecture
  • Patch design flaws while in use
  • Repair radiation faults

• Ideal target for space applications

Previous Work

• Manufacturer-Provided
  • Increase production yield of FPGAs
  • Architectural / hardware modifications

• User-Provided
  • Integrate fault-handling methods into FPGA application

Previous Work

• A-priori Allocation: assign spare resources during the design process

• Dynamic Processes: assign spare resources or determine repair during run-time

Previous Work

[Table: fault-handling methods versus metrics; cell values not shown]

Methods: Sub-PLB Spares, PLB Spares, Incremental Rerouting, GA Repair, Augmented GA Repair, TMR w/ Single Module Repair, Online BIST, Competing Configurations

Metrics: Resources, Operational Delay, Fault Latency, Unavailability, Fault Occlusion, Repair Granularity, Fault Tolerance, Fault Coverage, Critical Requirements

Granularity labels: Fine-grained, Medium-grained, Coarse-grained

Previous Work

• Genetic Algorithm Fault-Handling
  • Some other method detects a fault
  • Create a population of candidate solutions
  • Test each candidate to evaluate performance
  • Apply genetic operators to create new individuals
    • Crossover
    • Mutation
  • Repeat process until complete repair is found
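The loop described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the bit-string encoding, the toy fitness function (bits matching a hypothetical fault-free configuration `TARGET`), tournament selection, elitism, and every parameter value are choices made here for brevity.

```python
import random

def evolve_repair(fitness, genome_len, pop_size=50, max_gens=500,
                  mutation_rate=0.02, seed=0):
    """Evolve bit-string candidates until one scores the maximum fitness.

    Assumption: a complete repair is a candidate whose fitness equals genome_len.
    Returns (best_candidate, generations_used).
    """
    rng = random.Random(seed)
    # Create a population of random candidate solutions.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]

    def tournament():
        # Pick two distinct candidates at random and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for gen in range(max_gens):
        # Test each candidate; stop once a complete repair exists.
        best = max(pop, key=fitness)
        if fitness(best) == genome_len:
            return best, gen
        next_pop = [best[:]]  # elitism: carry the best candidate forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, genome_len)        # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]                # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), max_gens

# Hypothetical stand-in for "test each candidate on the faulty device".
TARGET = [1, 0] * 8

def matches(candidate):
    return sum(c == t for c, t in zip(candidate, TARGET))

best, generations = evolve_repair(matches, len(TARGET))
print(matches(best), "of", len(TARGET), "bits correct after", generations, "generations")
```

On real hardware the fitness call would be replaced by configuring the device with the candidate and measuring its outputs, which is where most of the repair time goes.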

Evolvable Hardware Optimization Strategies

• Optimize GA fault-handling method
  • Some partition methods are based on similarity between individuals
    • Requires a similarity function that may not be possible, and also incurs undesired computation
  • Age-Layered Population Structure (ALPS)
    • Used to evolve higher-fit antenna designs
    • Partitions the population of candidate solutions based on the age of each individual
    • Negligible additional computation
    • Confines the best individual to one sub-population to prevent premature convergence of the population
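A minimal sketch of age-based partitioning may make the idea concrete. The ten levels match the age-level 0 to 9 labels used elsewhere in these slides; the linear age limits (`AGE_GAP`), the `Individual` record, and the rule that a level breeds with itself and the level below come from the general ALPS description rather than from this deck, so treat them as assumptions.

```python
from dataclasses import dataclass

NUM_LEVELS = 10  # age-levels 0..9
AGE_GAP = 20     # assumption: linear age limits, one level per 20 generations

@dataclass
class Individual:
    genome: list
    fitness: float
    age: int = 0  # generations since this lineage was randomly created

def age_level(age):
    """Map an age to its level; the top level has no upper age limit."""
    return min(age // AGE_GAP, NUM_LEVELS - 1)

def partition_by_age(population):
    """Split a flat population into NUM_LEVELS sub-populations by age."""
    levels = [[] for _ in range(NUM_LEVELS)]
    for ind in population:
        levels[age_level(ind.age)].append(ind)
    return levels

def breeding_pool(levels, i):
    """Candidates that level i may breed with: itself and the level below."""
    return levels[i] + (levels[i - 1] if i > 0 else [])

population = [Individual([0, 1], 0.5, age=a) for a in (0, 5, 19, 20, 45, 400)]
levels = partition_by_age(population)
print([len(level) for level in levels])
```

The only per-individual bookkeeping is an integer age and an integer division, which is why the additional computation is negligible compared with a similarity function.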

Evolvable Hardware Optimization Strategies

• Optimize GA fault-handling method

[Figure: a standard GA population alongside an ALPS population partitioned into age-level 0 through age-level 9; each is labeled with a resulting Repair]

Evolvable Hardware Optimization Strategies

Individuals increasing in age

Evolvable Hardware Optimization Strategies

Evolution of competitive individuals

Evolvable Hardware Optimization Strategies

Best Individuals at each Generation (averaged over 100 runs)

Evolvable Hardware Optimization Strategies

• Reasons for sluggish performance
  • Partitioning the population into sub-populations (restricts the rate at which genetic information is communicated)
  • Replacing the bottom age-level every 20 generations (causes ALPS to be less deterministic)
  • Beginning population size of ALPS is 1/10 of standard (700 generations are needed to saturate capacity)

[Figure labels: Parent1, Parent2, Choice1, Choice2]

Evolvable Hardware Optimization Strategies

• Propose new selection strategy for crossover genetic operator

[Figure: old (combined) versus new (separate) selection strategy for crossover. Labels: Parent1, Parent2, Pop0, Pop1, Pops0&1; choose with probability p]
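Because the figure itself did not survive, the sketch below is only one plausible reading of its labels. It assumes that Parent1 always comes from Pop1, that the old strategy draws Parent2 from the pooled Pops0&1, and that the new strategy first picks Pop0 with probability p (otherwise Pop1) and then selects Parent2 inside that population alone. The tournament selection and all function names are hypothetical.

```python
import random

def tournament(pool, fitness, rng, k=2):
    """Return the fittest of k distinct candidates drawn at random from pool."""
    return max(rng.sample(pool, k), key=fitness)

def select_combined(pop0, pop1, fitness, rng):
    """Old strategy (assumed): Parent2 competes within the pooled populations."""
    parent1 = tournament(pop1, fitness, rng)
    parent2 = tournament(pop0 + pop1, fitness, rng)
    return parent1, parent2

def select_separate(pop0, pop1, fitness, rng, p=0.5):
    """New strategy (assumed): choose Parent2's population first, then select."""
    parent1 = tournament(pop1, fitness, rng)
    source = pop0 if rng.random() < p else pop1
    parent2 = tournament(source, fitness, rng)
    return parent1, parent2

# Toy data: individuals are plain numbers and fitness is the value itself.
rng = random.Random(1)
pop0 = [1, 2, 3]        # younger, less fit sub-population
pop1 = [10, 11, 12]     # older, fitter sub-population
print(select_combined(pop0, pop1, lambda x: x, rng))
print(select_separate(pop0, pop1, lambda x: x, rng, p=0.5))
```

Under this reading, pooling lets the fitter population win most tournaments, whereas choosing the population first guarantees that a fraction p of second parents carry genetic material from Pop0.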

Evolvable Hardware Optimization Strategies

Best Individuals at each Generation (averaged over 100 runs)

Partial Reconfiguration and Architectural Analysis

• Overview
  • Partial reconfiguration modifies a portion of the FPGA
  • Multiple modules may reside within the reconfigurable area

Previous Work

Spare Configs: Fine-grained

Previous Work

Online Recovery: Competitive Configurations

Partial Reconfiguration and Architectural Analysis

• Benefits of Partial Reconfiguration
  • Reconfiguration: time-multiplex between functions (extend the number of available resources with time)
  • Partial: module granularity reduced
    • Unchanged portion of FPGA is not affected by configuration
    • Smaller bitstream filesize
      • Shorter reconfiguration time
      • Lower storage requirements
  • Result: significantly more combinations of hardware arrangements with similar storage requirements

Partial Reconfiguration and Architectural Analysis

xc2vp30-7ff896, 80 CLB configuration frame

              Bitstream Filesize   Area Allocated   Area Used      Time to Configure
              (bytes)              (slices)         (slices)       (seconds)
Full Device   1,448,817            13,696           13,696         7
MD5           320,597 (22.1%)      1,280 (9.3%)     389 (2.8%)     2 (28.6%)
SHA-1         356,702 (24.6%)      1,280 (9.3%)     457 (3.3%)     2 (28.6%)

2.8–3.3% resource usage versus 22.1–24.6% bitstream filesize
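The percentages in this table follow directly from the raw counts, and recomputing them makes the mismatch easy to see. The short check below uses only the numbers on the slide; the dictionary layout and the `pct` helper are naming choices made here.

```python
# Raw figures from the xc2vp30 table.
FULL = {"bytes": 1_448_817, "slices": 13_696}
MODULES = {
    "MD5":   {"bytes": 320_597, "used": 389},
    "SHA-1": {"bytes": 356_702, "used": 457},
}

def pct(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)

for name, m in MODULES.items():
    used_pct = pct(m["used"], FULL["slices"])
    size_pct = pct(m["bytes"], FULL["bytes"])
    print(f"{name}: {used_pct}% of slices used, {size_pct}% of the full bitstream")
```

Each module occupies about 3% of the device's slices yet needs a partial bitstream roughly a quarter the size of the full one, which is the overhead attributed on the slide to the 80 CLB configuration frame.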

Partial Reconfiguration and Architectural Analysis

Overview of partial reconfiguration design

Partial Reconfiguration and Architectural Analysis

FPGA Implementation and Resource Utilization

Partial Reconfiguration and Architectural Analysis

xc4vfx60-11ff672, 16 CLB configuration frame

              Bitstream Filesize   Area Allocated   Area Used
              (bytes)              (slices)         (slices)
Full Device   2,625,438            25,280           25,280
MD5           95,962 (3.7%)        1,280 (5.1%)     405 (1.6%)
SHA-1         97,619 (3.7%)        1,280 (5.1%)     472 (1.9%)

1.6–1.9% resource usage versus 3.7% bitstream filesize

V-II: 320,597 bytes versus V-4: 95,962 bytes (70% reduction)
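The quoted reduction can be reproduced from the two MD5 bitstream sizes. The snippet below is a plain arithmetic check; applying the same formula to the SHA-1 rows is an extension made here, not a figure stated on the slide.

```python
# Partial bitstream sizes in bytes, taken from the two device tables.
V2 = {"MD5": 320_597, "SHA-1": 356_702}   # xc2vp30, 80 CLB configuration frame
V4 = {"MD5": 95_962, "SHA-1": 97_619}     # xc4vfx60, 16 CLB configuration frame

def reduction_pct(old, new):
    """Relative size reduction from old to new, in percent."""
    return round(100 * (old - new) / old, 1)

for name in V2:
    print(f"{name}: {reduction_pct(V2[name], V4[name])}% smaller on the newer device")
```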

Dynamic Processor Allocation Strategies

• Increase Reconfigurable Areas from 1 to 8

• Implement Adaptable Architecture for Video Processing Functions
  • Discrete Cosine Transform (DCT)
  • Motion Estimation

• Video functions are sufficiently different in resources to require reconfiguration
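A back-of-the-envelope count suggests why going from 1 to 8 areas matters. The sketch assumes, purely for illustration, that every area can host either video function and that one partial bitstream is stored per (area, function) pair; neither assumption is stated on the slide.

```python
def arrangements(num_areas, num_functions):
    """Distinct device-wide arrangements when each area hosts any one function."""
    return num_functions ** num_areas

def partial_bitstreams_needed(num_areas, num_functions):
    """Stored bitstreams, assuming one per (area, function) pair."""
    return num_areas * num_functions

FUNCTIONS = 2  # DCT and Motion Estimation
for areas in (1, 8):
    print(areas, "area(s):",
          arrangements(areas, FUNCTIONS), "arrangements from",
          partial_bitstreams_needed(areas, FUNCTIONS), "stored partial bitstreams")
```

Under these assumptions the number of arrangements grows exponentially with the number of areas while the storage grows only linearly.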

Dynamic Processor Allocation Strategies

Location of 8 PEs on a V4SX device

Dynamic Processor Allocation Strategies

       Slices within Area     Bitstream Filesize
       (Slice Utilization)    (bytes)
PE0    320 (94.38%)           22,306
PE1    384 (95.05%)           27,794
PE2    384 (84.38%)           28,306
PE3    384 (92.97%)           28,158
PE4    320 (91.25%)           22,306
PE5    384 (88.54%)           27,354
PE6    384 (87.76%)           27,618
PE7    384 (95.57%)           27,654
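Summing the table gives a feel for the storage involved. The totals below use only the PE figures above; the comparison against the full-device bitstreams of the xc2vp30 and xc4vfx60 reuses numbers from the earlier tables and is a cross-device comparison made here for scale, since the full bitstream size of the V4SX device is not given.

```python
# Partial bitstream size in bytes for PE0..PE7, from the table above.
PE_BITSTREAM_BYTES = [22_306, 27_794, 28_306, 28_158,
                      22_306, 27_354, 27_618, 27_654]

# Full-device bitstream sizes from the earlier tables (different devices).
FULL_XC2VP30 = 1_448_817
FULL_XC4VFX60 = 2_625_438

total_bytes = sum(PE_BITSTREAM_BYTES)
mean_bytes = total_bytes / len(PE_BITSTREAM_BYTES)

print("total for 8 PEs:", total_bytes, "bytes")
print("mean per PE:", mean_bytes, "bytes")
print("xc2vp30 full bitstream / mean PE:", round(FULL_XC2VP30 / mean_bytes, 1))
print("xc4vfx60 full bitstream / mean PE:", round(FULL_XC4VFX60 / mean_bytes, 1))
```

The first ratio comes out close to the 55-fold storage figure quoted in the conclusion, although the slides do not say how that figure was derived.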

Dynamic Processor Allocation Strategies

Dynamic Processor Allocation Strategy

Conclusion and Future Work

• Evolvable Hardware
  • Non-deterministic methods can repair faulty digital circuits
  • Time required is justified by the ability to exploit faults
  • Increased complete repair occurrence rate 5-fold
  • Future improvements: make use of fault location; optimize genetic algorithm parameters

Conclusion and Future Work

• Partial Reconfiguration
  • Newer partial reconfiguration flow allows rectangle-sized areas
    • Allows static resources to maximize FPGA area
  • Newer architecture allows:
    • multiple rectangle-sized areas within one column of resources
    • reduced configuration granularity for modules
    • 70% reduction in storage and configuration time

Conclusion and Future Work

• Dynamic Processors
  • Utilizes newer software design flow and newer FPGA hardware architecture
    • Storage reduced 55-fold
    • Time reduced 8- to 160-fold
  • Benefits make reconfiguration possible for fast processes such as video functions
  • Time multiplexing may enable smaller FPGA devices to compete with larger devices not utilizing partial reconfiguration

Conclusion and Future Work

• Future Work
  • Develop a self-contained partial reconfiguration solution
  • Continue to challenge and improve the reconfiguration process and hardware design
  • Enable FPGAs to be a standard hardware platform for evolvable/adaptable systems

Publication

HUANG, J., PARRIS, M., LEE, J. and DEMARA, R.F. 2008. Scalable FPGA Architecture for DCT Computation using Dynamic Partial Reconfiguration. Accepted to the International Conference on Engineering of Reconfigurable Systems and Algorithms.

Previous Work

Spare Resources: Sub-PLB Spares

Previous Work

Offline Recovery: Incremental Rerouting

Previous Work

Online Recovery: Online BIST
