Built-In Self-Test of DSPs in Virtex-4 FPGAs
Charles Stroud
Dept. of Electrical & Computer Engineering
Auburn University
(Funded by NSA)
C. Stroud, 1/08, VLSI D&T Seminar

Outline of Presentation
- History of DSP Architectures in FPGAs
- Overview of Virtex-4 DSP
- Prior Testing R&D vs. Our Analysis for:
  - Literature on DSP test not applicable
    - No papers published on DSPs in FPGAs
  - Literature on Multipliers and Adders
    - Application to Virtex-4 DSPs
- BIST for DSPs in Virtex-4
  - Architecture, Operation, and Implementation
  - Timing and Fault Injection Analysis
- Summary and Conclusions
  - Plans for application to Virtex-5

Xilinx FPGA Architectures
- 4000/Spartan
  - NxN array of unit cells
    - Unit cell = CLB + routing
  - Fast carry logic in CLBs for adders
- Virtex/Spartan-2
  - MxN array of unit cells
  - Carry logic + AND gate for array multipliers
  - 4K block RAMs at edges
- Virtex-2/Spartan-3
  - 18K block RAMs in array
  - 18x18-bit multipliers with each RAM
    - "based on modified Booth architecture"
- Virtex-4/Virtex-5
  - Added 48-bit DSP cores w/multipliers
- Altera includes 9x9 multipliers
  - "based on modified Booth architecture"
[Figure: FPGA array diagrams; blocks labeled PC]

Virtex-4 DSP Architecture
- 2 DSP slices per tile
  - 16-256 tiles in 1-8 columns
- Each DSP includes:
  - 3-input, 48-bit adder/subtractor
    - P = Z±(X+Y+Cin)
    - Optional accum reg
  - 18x18-bit 2's-comp multiplier (w/o adder)
- User controlled operational modes
  - For X, Y, & Z MUXs
- Configuration bits control other MUXs
  - Pipelining registers
  - Accumulator register
[Figure: DSP tile with two slices, each showing a multiplier (×), X/Y/Z MUXs, and an adder/subtractor (±); ports A(18), B(18), C(48), P(48); inputs for cascading; outputs w/ dedicated routing]
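
The adder/subtractor equation P = Z±(X+Y+Cin) can be sketched in Python as follows. This is only a behavioral illustration: the function name `dsp_adder`, the `subtract` flag, and the assumption that results wrap modulo 2^48 are not taken from the slides or from Xilinx documentation.

```python
MASK48 = (1 << 48) - 1  # the adder/subtractor is 48 bits wide

def dsp_adder(x, y, z, cin=0, subtract=False):
    """Model P = Z +/- (X + Y + Cin), truncated to 48 bits (assumed wraparound)."""
    inner = x + y + cin
    p = z - inner if subtract else z + inner
    return p & MASK48

print(hex(dsp_adder(3, 4, 10, cin=1)))                 # 0x12
print(hex(dsp_adder(3, 4, 10, cin=1, subtract=True)))  # 0x2
print(hex(dsp_adder(1, 0, 0, subtract=True)))          # 0xffffffffffff
```

The last call shows the assumed two's-complement wraparound when the subtraction result is negative.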

Multiplier and Adder Architectures
- Test algorithm depends on architecture
  - But architecture is not specified in data sheets
    - Eliminate sequential logic architectures
    - "Based on modified Booth"
- Adder choices include:
  - Ripple carry
  - Carry select
  - Carry save
  - Carry-look-ahead (CLA)
    - Our assumption based on area/performance analysis
    - But multiple types of CLA
- Multiplier choices include:
  - Array
  - Booth
  - Modified Booth
  - Wallace tree
  - Modified Booth/Wallace tree
    - Our assumption based on area/performance analysis
- Our goal: find/develop architecture independent test algorithm(s)

Array Multiplier Test Algorithm
- Kalyana Kantipudi's MS thesis
  - 10 vectors give 100% fault coverage for C6288
    - a 16x16-bit array multiplier
- 18x18-bit array multiplier results
  - Only achieved ≈ 95% fault coverage
    - Pattern expansion required for 16x16-bit to 18x18-bit
    - Potential for mistakes if patterns not expanded properly
- Modified Booth multiplier results
  - ≈ 62% with carry-save adder
  - ≈ 37% with CLA
- Conclusion: array multiplier test vectors do not adequately test modified Booth multiplier
Chris Erickson's results
Note difference in FC wrt adder implementation

Modified Booth Test Algorithms
- Two test algorithms using 8-bit counter (256 vectors)
- "Low Power BIST for Wallace Tree-based Fast Multipliers"
  - Bakalis, Kalligeros, Nikolos, Vergos & Alexiou
  - Proc. Int. Symp. on Quality of Electronic Design, pp. 433-438, 2000
  - 5x3 connections with 5 inputs to Booth encoding
    - But which side is Booth encoding?
    - Our approach: run both 5x3 and 3x5 algorithms
- "Effective Built-In Self-Test for Booth Multipliers"
  - Gizopoulos, Paschalis & Zorian
  - IEEE Design & Test of Computers, pp. 105-111, 1998
  - 4x4 connections to multiplier inputs
    - Our approach: also include 4x4 if fault coverage improves
[Figure: 8-bit counter (MSB to LSB) driving an n×n multiplier with a 2n-bit product, one input passing through Booth encoding; the counter is split 4/4 (4×4 algorithm), 5/3 (5×3 algorithm), or 3/5 (3×5 algorithm)]
Algorithm used in Srinivas Garimella's MS thesis for Virtex-2 multipliers
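
The counter-based pattern generation can be illustrated with a short Python sketch. It splits an 8-bit count into an upper and a lower field (5/3, 3/5, or 4/4) and feeds one field to each operand. How the fields are wired to the 18-bit inputs is not stated on the slide; repeating (tiling) each field across the operand is an assumption made here, and the names `tile` and `counter_vectors` are hypothetical.

```python
def tile(field, field_bits, width):
    """Repeat a field_bits-wide value across a width-bit operand (LSB-aligned)."""
    out = 0
    for pos in range(0, width, field_bits):
        out |= field << pos
    return out & ((1 << width) - 1)

def counter_vectors(hi_bits, lo_bits, width=18):
    """Yield (A, B) operand pairs for a hi_bits x lo_bits algorithm.

    The counter's upper hi_bits go to A and its lower lo_bits go to B;
    tiling the fields across the operands is an assumption of this sketch.
    """
    assert hi_bits + lo_bits == 8
    for count in range(256):
        hi = count >> lo_bits
        lo = count & ((1 << lo_bits) - 1)
        yield tile(hi, hi_bits, width), tile(lo, lo_bits, width)

vecs_5x3 = list(counter_vectors(5, 3))
vecs_3x5 = list(counter_vectors(3, 5))
print(len(vecs_5x3) + len(vecs_3x5))   # 512 vectors for 5x3 plus 3x5
```

Running both the 5x3 and 3x5 splits avoids having to know which operand feeds the Booth encoder, at the cost of doubling the vector count.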

4x4 Booth Multiplier Test Algorithm
- 18x18-bit array multiplier results
  - ≈ 99.99% (1 undetected fault)
- Booth multiplier results
  - ≈ 90% with ripple-carry adder
  - ≈ 90% with carry-save adder
  - ≈ 70% with CLA
- Conclusion: modified Booth multiplier test vectors do test array multiplier
- But Modified-Booth/Wallace-Tree appears to be most likely candidate for Virtex-4 DSP multiplier implementation
  - Also for Virtex-5 and Altera
Chris Erickson's results
Note difference in FC wrt adder implementation

Other Multiplier Results
- 4x4-bit implementations
- Exhaustive test patterns
  - Undetected faults are undetectable
  - Same as 4x4, 5x3, & 3x5 algorithm for 4x4-bit multiplier
- Simulation results discrepancy for array multiplier
  - 4 undetected faults in 4x4-bit implementation
  - 1 undetected fault in 18x18 multiplier w/ 4x4 algorithm in Chris Erickson's results

Chitanya Bandi's Results
Multiplier   | # faults | # detected | # undetect | FC
Array        | 272      | 268        | 4          | 98.5%
Signed Array | 337      | 320        | 17         | 95.0%
Wallace Tree | 283      | 280        | 3          | 98.9%

8×8 Modified-Booth/Wallace-Tree
- Fault simulation results:
  - 5x3 plus 3x5 give best fault coverage
  - No additional faults detected with 4x4

Multiplier Version | Test Algorithm | # Vectors | Total Faults | Faults Detected | Not Detected | Fault Coverage | Effective FC
No reduction       | Exhaustive     | 65,536    | 3,402        | 2,520           | 882          | 74.1%          | 100%
No reduction       | 4×4            | 256       | 3,402        | 2,477           | 925          | 72.8%          | 99.0%
With reduction     | Exhaustive     | 65,536    | 2,341        | 2,031           | 310          | 86.8%          | 100%
With reduction     | 4×4            | 256       | 2,341        | 2,005           | 336          | 85.6%          | 98.7%
With reduction     | 5×3            | 256       | 2,341        | 2,011           | 330          | 85.9%          | 99.0%
With reduction     | 3×5            | 256       | 2,341        | 2,015           | 326          | 86.1%          | 99.2%
With reduction     | 4×4 & 5×3      | 512       | 2,341        | 2,015           | 326          | 86.1%          | 99.2%
With reduction     | 5×3 & 3×5      | 512       | 2,341        | 2,029           | 312          | 86.7%          | 99.9%
With reduction     | 4×4 & 3×5      | 512       | 2,341        | 2,019           | 322          | 86.2%          | 99.4%

Chitanya Bandi's Results (note: used ripple carry adder to sum partial products)
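
The two coverage columns for the "with reduction" multiplier can be reproduced with a few lines of Python. The slide does not define effective FC; this sketch assumes it is the number of detected faults divided by the number detected by the exhaustive test, which matches the reported values for these rows. The helper name `coverage` is illustrative.

```python
TOTAL_FAULTS = 2341          # "with reduction" multiplier version
EXHAUSTIVE_DETECTED = 2031   # faults detected by all 65,536 vectors

detected = {                 # faults detected, taken from the results table
    "4x4": 2005, "5x3": 2011, "3x5": 2015,
    "4x4 & 5x3": 2015, "5x3 & 3x5": 2029, "4x4 & 3x5": 2019,
}

def coverage(found, base):
    """Return found/base as a percentage rounded to one decimal place."""
    return round(100.0 * found / base, 1)

for name, found in detected.items():
    fc = coverage(found, TOTAL_FAULTS)
    eff = coverage(found, EXHAUSTIVE_DETECTED)
    print(f"{name:10s} FC={fc}%  effective FC={eff}%")
```

Under this reading, effective FC excludes the 310 faults that even exhaustive testing cannot detect.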

Carry-Look-Ahead Adder
- Recall CLA was more difficult to test
- Basic CLA is 4-bits
  - 4-bit CLAs then combined to form larger adders
    - Ripple CLAs
    - 2 types based on Lookahead Carry Unit (LCU):
      - Ripple LCU
      - Multi-stage LCU

Gi = Ai•Bi
Pi = Ai+Bi
C1 = G0 + P0•C0
C2 = G1 + G0•P1 + P1•P0•C0
C3 = G2 + G1•P2 + G0•P1•P2 + P2•P1•P0•C0
C4 = G3 + G2•P3 + G1•P2•P3 + G0•P1•P2•P3 + P3•P2•P1•P0•C0
PG = P0•P1•P2•P3
GG = G3 + G2•P3 + G1•P2•P3 + G0•P1•P2•P3

[Figure: 4-bit Carry Look Ahead unit with four full adders (inputs A3..A0, B3..B0, sums S3..S0) exchanging P3..P0, G3..G0, and C3..C1 with the look-ahead logic; carry input C0, outputs C4, PG, GG]

CLA Test Algorithms
- "On the Adders with Minimum Tests"
  - Kajihara and Sasao
  - Proc. VLSI Test Symp., pp. 10-15, 1997 (VTS'97)
  - 10 vectors detect all single and multiple faults
    - In any size ripple CLA (not an LCU implementation)
- "Scalable Test Generators for High-Speed Datapath Circuits"
  - Al-Asaad, Hayes, and Murray
  - J. Electronic Testing, vol 12, pp. 111-125, 1998 (JETTA'98)
  - 2×(N+1) vector sequence (for an N-bit adder)
  - TPG implementation requires:
    - N+1-bit shift register
    - N XOR gates, N XNOR gates, and 1 inverter

CLA BIST Scheme
- Easy BIST circuit to implement
- But we found a problem in design
  - 2 missing patterns needed for 100% FC
- Replace inverter with flip-flop
  - 2×(N+2) vector sequence
[Figure: N+1-bit serial shift register with stage outputs Qi and Qi+1, adder inputs Ai and Bi, and an output to the CLA carry-in]
1111111110000000000111111111000000000111111111000000000011111111010000000011111111011000000011111111011100000011111111011110000011111111011111000011111111011111100011111111011111110011111111011111111011111111100000000011111111110000000001111111110000000001111111111000000001011111111000000001001111111000000001000111111000000001000011111000000001000001111000000001000000111000000001000000011000000001000000001000000000
[Vector listing columns: Ai, Bi, Cin; sequence starts at reset]

Fault Simulation Results
- JETTA'98 approach gives best overall fault coverage regardless of adder implementation
  - Undetected faults in JETTA'98 approach can be detected
  - Results in "New BIST" column for 2×(N+2) vector sequence
- JETTA'98 also claims similar BIST approach for Modified-Booth multiplier
  - But description of test algorithm is very sketchy

48-bit CLA Adder Implementation | Gate Delays | # Faults | VTS'97 | JETTA'98 | New BIST
Ripple CLA                      | 28          | 1392     | 100%   | 99.9%    | 100%
Multi-stage LCU                 | 10          | 1506     | 95.9%  | 99.9%    | 100%
Ripple LCU                      | 12          | 1542     | 95.7%  | 99.9%    | 100%

Adder in Virtex-4 DSP
- Adder has 3 input ports
  - P = Z±(X+Y+Cin)
  - We interpret this as a 2-stage CLA adder/subtractor implementation
- Apply test patterns to each stage in turn
  - 2 clock cycles per vector
  - OPMODE control
[Figure: two cascaded 48-bit CLAs; labels: (X MUX) A port, (Y MUX) B port, (Z MUX) C port, CIN, Subtract]
Figure annotations:
- Clock cycle #1: X test vector
- Clock cycle #2: Y test vector
- Clock cycle #2: Z test vector

DSP BIST Modes & Sequences
- Test pattern sequence
  - Four groups of 256 clock cycles (ccs) each
  - Allows control of operational modes (OPMODEs) of DSP
- Test mode controlled by 4-bit shift register
  - Bits include: Test Mode (2), Invert Control Signals, Reset
  - Contents loaded via Boundary Scan interface
    - Reduces the number of downloads to FPGA

First and second groups use Constant Control Signals; third and fourth groups use Pseudo-Random Control Signals.

Mode (Test)   | First 256 ccs             | Second 256 ccs                 | Third 256 ccs             | Fourth 256 ccs
00 (multiply) | P = A×B                   | P = A×B                        | P = A×B+C                 | P = A:B+C
01 (adder)    | P = Z(C); P = X(P)+Y(C)   | P = Y(C); P = Y(C)+Z(P)        | P = Z(C); P = Y(C)+Z(P)   | P = Y(C); P = Y(C)+Z(ShiftP)
10 (cascade)  | P1 = A:B+Z(PC); P0 = Z(C) | P1 = A:B+Z(ShiftPC); P0 = Z(C) | P1 = Z(C); P0 = A:B+Z(PC) | P1 = Z(C); P0 = A:B+Z(ShiftPC)

01 (adder): Preg=1 only
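
The grouping of the 1024-cycle test sequence can be expressed as a lookup, sketched below in Python for the multiply and adder modes. The operation strings are transcribed from the slide's table; the dictionary `SEQUENCES` and the function `group_info` are hypothetical names used only for illustration.

```python
# Operations per group of 256 clock cycles, transcribed from the slide's table.
SEQUENCES = {
    "00": ("P = A×B", "P = A×B", "P = A×B+C", "P = A:B+C"),       # multiply
    "01": ("P = Z(C); P = X(P)+Y(C)", "P = Y(C); P = Y(C)+Z(P)",
           "P = Z(C); P = Y(C)+Z(P)", "P = Y(C); P = Y(C)+Z(ShiftP)"),  # adder
}

def group_info(mode, cycle):
    """Return (group index, operation, control style) for a cycle in 0..1023."""
    group = cycle // 256
    control = "constant" if group < 2 else "pseudo-random"
    return group, SEQUENCES[mode][group], control

print(group_info("00", 0))     # first group of the multiply test
print(group_info("00", 600))   # third group, pseudo-random control signals
```

In hardware this selection is made by the TPG's FSM rather than by a table lookup; the sketch only shows which operation applies at a given clock cycle.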

BIST Architecture
- 2 TPGs drive alternate rows of DSP tiles
  - TPG drives both DSPs in tile
  - Prevents faulty TPG from escaping detection
- DSPs driven by different TPGs compared by ORAs
  - Like DSPs compared
    - Slice 0 compared to slice 0
    - Slice 1 compared to slice 1
  - Top DSPs compared to bottom DSPs in circular comparison
[Figure: TPG0 and TPG1 driving alternate tiles of DSP slices (s0, s1), with ORAs between tiles; a BSCAN shift register supplies the test mode]
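
A minimal Python sketch of the circular comparison is given below. It assumes that tile k in a column is driven by TPG (k mod 2) and that each tile is compared with the next one, with the last tile wrapping around to the first. The function names are hypothetical, and the exact ORA wiring in the real design may differ.

```python
def ora_pairs(num_tiles):
    """List (tile_i, tile_j, slice) comparisons for one DSP column.

    Assumes tile k is driven by TPG (k % 2) and is compared with tile
    (k + 1) % num_tiles, so the last tile wraps around to the first.
    """
    pairs = []
    for k in range(num_tiles):
        j = (k + 1) % num_tiles
        for s in (0, 1):             # slice 0 vs slice 0, slice 1 vs slice 1
            pairs.append((k, j, s))
    return pairs

def driven_by_different_tpgs(num_tiles):
    """True if every compared pair of tiles uses different TPGs."""
    return all(i % 2 != j % 2 for i, j, _ in ora_pairs(num_tiles))

print(ora_pairs(4))
print(driven_by_different_tpgs(4))   # True: an even tile count keeps TPGs alternating
```

With an even number of tiles, every comparison pairs DSPs fed by different TPGs, so a fault in one TPG shows up as mismatches rather than going unnoticed.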

TPG Architecture
- Counter ⇒ 5×3 and 3×5 multiplier test to ports A&B
- Shift register ⇒ 2×(N+2) vector adder test to port C
- FSM ⇒ OPMODE control for 4 group sequences
- LFSR ⇒ pseudo-random patterns to other control inputs during last two groups of 256 clock cycles
[Figure: TPG containing Counter, Shift Register, LFSR, and FSM, driving DSP slice 0 and DSP slice 1 (A port, B port, C port, OPMODE control) over buses labeled 36, 48, 7, and 32; each slice's 48-bit P port goes to the ORAs]
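
The slide does not give the LFSR width or polynomial used in the TPG. As a generic illustration of how an LFSR produces pseudo-random control patterns, the Python sketch below uses a well-known 16-bit maximal-length Fibonacci LFSR (taps 16, 14, 13, 11); both the width and the taps are assumptions chosen for the example, not the design's actual parameters.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one clock."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr_period(seed=0xACE1):
    """Count clocks until the register returns to its seed value."""
    state, steps = lfsr16_step(seed), 1
    while state != seed:
        state, steps = lfsr16_step(state), steps + 1
    return steps

print(lfsr_period())   # 65535 for this maximal-length polynomial
```

A maximal-length LFSR never repeats within the 512 clock cycles of the last two groups, which is all that the test sequence requires of it.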

ORA Implementation
- Old comparison-based ORA
  - Logic 1 latched in FF due to mismatches
  - Configuration memory readback used to get results
- CLBs have dedicated carry chain for fast adders and counters
- New ORA latches logic 0 due to mismatch
  - Carry chain performs iterative OR function
  - Single pass/fail indication at end of BIST sequence
  - Only read configuration memory to get failing results for diagnosis
[Figure: each ORA uses a LUT comparing DSPi output k with DSPj output k; a carry-chain MUX with inputs 0/1 passes carry-in to carry-out, with a constant 1 input; ORAs (O O O) chained between TDI and TDO]
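
The new ORA's behavior can be sketched in Python as follows. The model assumes each ORA flip-flop starts at 1 and is cleared on a mismatch, and that the carry-chain MUX passes the incoming carry when the flip-flop is 1 and forces a 1 otherwise; these polarities are inferred from the figure labels and are not confirmed by the slide. The names `Ora` and `chain_fail` are hypothetical.

```python
class Ora:
    """One comparison-based ORA; the flip-flop holds 1 until a mismatch clears it."""
    def __init__(self):
        self.ff = 1

    def clock(self, out_i, out_j):
        if out_i != out_j:
            self.ff = 0          # new-style ORA latches logic 0 on mismatch

def chain_fail(oras, carry_in=0):
    """Iterative OR along the carry chain: 1 at the end means some ORA failed."""
    carry = carry_in
    for ora in oras:
        carry = carry if ora.ff else 1   # MUX: pass carry when ff=1, else force 1
    return carry

oras = [Ora() for _ in range(3)]
print(chain_fail(oras))    # 0: no mismatches seen yet
oras[1].clock(1, 0)        # one mismatching output bit
print(chain_fail(oras))    # 1: single pass/fail bit reports the failure
```

Only when the final carry is 1 would the individual flip-flops need to be read back for diagnosis.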

BIST Configurations
- 5 downloads to FPGA
  - 1 compressed download (<50% of full config)
  - + 4 partial reconfigurations (<0.5% of full config)
    - only change DSP configuration bits
- 7 BIST sequences
  - BIST configurations #2 & #3 ran twice
    - different control register values for multiplier/adder test algorithms

Numbers in parentheses are BIST sequence numbers.

BIST Config | Pipeline Registers          | Active Level Signals | B Input Source, Slice0 | B Input Source, Slice1 | Multiply | Adder   | Cascade
1           | All Regs=0                  | High                 | Direct                 | Direct                 | Yes (1)  | No      | No
2           | All Regs=1                  | High                 | Direct                 | Direct                 | Yes (2)  | Yes (3) | No
3           | A&Breg=2, Other Regs=1      | Low                  | Direct                 | Direct                 | Yes (4)  | Yes (5) | No
4           | All Regs=1                  | High                 | Direct                 | Cascade                | No       | No      | Yes (6)
5           | All Regs=1                  | Low                  | Cascade                | Direct                 | No       | No      | Yes (7)

Cascade tests: bottom row failures due to unconnected cascade inputs

Cascade Mode Testing
- One slice from pair put in cascade mode at a time
- Circular comparison of slices sees identical behavior
- Cascade inputs to bottom DSP are not connected
  - Expected failures in ORAs comparing that DSP's outputs

DSP BIST Implementations
- Circular comparison per DSP column
- Each slice in tile compared with its counterpart
  - slice0-to-slice0
  - slice1-to-slice1
- CLB carry chain used to provide pass/fail indication
- Only read config memory contents to get results for diagnosis
[Figure: SX25 layout showing TPG0, TPG1, ORAs, DSPs, and BSCAN between TDI and TDO]

Automated BIST Configurations
- C program generates .XDL file
- .XDL to .NCD
  - xdl -xdl2ncd bist.ncd
- FPGA Editor
  - Design Rule Check
  - Route design
- .NCD to .BIT
  - BitGen
- Download into FPGA
- .NCD to .XDL
- Modification program for generating remaining 4 BIST configurations
[Figure: tool flow with BIST Programs, XDL file, XDL.exe, NCD file, FPGA Editor, BitGen.exe, BIT file, and download]

DSP BIST Implementations
[Figure: BIST layouts on LX15 and FX12 showing TPG0, TPG1, ORAs, DSPs, and PowerPC]
- Brad Dutton generated BIST configurations for all Virtex-4 FPGAs and verified BIST on LX25, LX60, SX35, & FX12 via download and execution

BIST Timing Analysis
[Figure: bar chart of Maximum Clock Frequency (MHz), axis 0 to 150, for Config 1 through Config 5]
- Bogus timing analysis by Xilinx tools due to unused cascade path with no pipeline registers
David Baumann's results

BIST Timing Analysis
[Figure: bar chart of Maximum Clock Frequency (MHz), axis 0 to 80, per device; bars annotated with #DSPs and #DSP columns]
- Based on configuration #3
- Fmax function of #DSPs & size of array
- 4 TPGs might improve Fmax

Device | #DSPs | #DSP columns
FX12   | 32    | 1
FX25   | 32    | 1
FX40   | 48    | 1
FX60   | 128   | 2
FX100  | 160   | 2
SX25   | 128   | 4
SX35   | 192   | 4
SX55   | 512   | 8
LX15   | 32    | 1
LX25   | 48    | 1
LX40   | 64    | 1
LX60   | 64    | 1
LX80   | 80    | 1
LX100  | 96    | 1

Physical Fault Injection
- Faulty FPGAs are difficult to find
  - 1 ORCA with faulty PLB & 2 ORCAs with faulty routing
- Physical fault insertion
  - Etch package down to bare die and "zap"
- We use fault injection emulation
  - Modify configuration bits before or after download (RMW)
  - Can inject single and/or multiple faults
    - Stuck-at faults & bridging faults
  - Faults limited to effects of configuration bits

[Figure: configuration bits combined with fault data before download to the FPGA]
System or BIST configuration file: 110111010000100010110101
Stuck-at values:                   011001101110011001000000
Fault mask:                        000000001100000000110000
Download file:                     110111011100100010000101 (changed bits marked "faults")

Mustfa Ali's work
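
The bit strings in the figure are consistent with a simple masking rule: wherever the fault mask is 1, the configuration bit is replaced by the stuck-at value. A short Python sketch under that assumption follows; the function name `inject_faults` is hypothetical.

```python
def inject_faults(config, stuck_values, fault_mask):
    """Force masked configuration bits to their stuck-at values."""
    cfg, stuck, mask = (int(bits, 2) for bits in (config, stuck_values, fault_mask))
    faulty = (cfg & ~mask) | (stuck & mask)
    return format(faulty, f"0{len(config)}b")

download = inject_faults(
    "110111010000100010110101",   # system or BIST configuration file
    "011001101110011001000000",   # stuck-at values
    "000000001100000000110000",   # fault mask
)
print(download)   # 110111011100100010000101, the download file on the slide
```

Setting more than one bit in the mask injects multiple faults at once, as the slide notes.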

Fault Injection Emulation Results
1) Download BIST configuration
2) Manipulate configuration bit via read-modify-write
3) Run BIST sequence
4) Get BIST results
[Figure: bar chart of # BIST configs detecting fault (axis 0 to 6), with stuck-at-0 and stuck-at-1 bars for each DSP configuration bit: Cinib, Csel0ib, Csel1ib, Subib, Op0ib, Op1ib, Op2ib, Op3ib, Op4ib, Op5ib, Op6ib, Ceaib, Cebib, Cemib, Cepib, Ceccrtlib, Cecinsubib, Cecinib, Rstaib, Rstbib, Rstmib, Rstpib, Rstctlib, Rstcinib, Areg0b, Areg2b, Breg0b, Breg2b, Mreg0b, Preg0b, Cinreg0b, Cselreg0b, Opreg0b, Subreg0b, Clkib, Cascb, Creg0b, Cecib, &tnocfgb]

Summary
- Investigated known test algorithms for multipliers and adders
- Looked for architecture independent tests with highest fault coverage
- JETTA'98 approach easy to implement
  - Needs modification for 100% FC
- 7 DSP BIST sequences with 5 downloads
  - New ORA eliminates config memory readback
- Total testing time < 52% of 1 full download
  - Using compressed and partial reconfiguration
  - Only DSP configuration bits need to be changed
- Application to Virtex-5 DSPs

BIST Approach for Virtex-5 DSP
- Larger multiplier but same test algorithm
- Logical operations but 48-bit cascade of A:B allows direct testing
- Pattern detect but known algorithm for = comparator
- Optional regs like V4 but data sheets have less info