Using Embedded FPGAs for SoC Yield Improvement

download Using Embedded FPGAs for SoC Yield Improvement

of 12

Transcript of Using Embedded FPGAs for SoC Yield Improvement

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    1/12

    Abstract:In this paper we show that an embedded FPGAcore is an ideal host to implement infrastructure IP for yieldimprovement in a bus-based SoC. We present methods fortesting, diagnosing, and repairing embedded FPGAs, forwhich complete testability is achieved without any area over-head or performance degradation. We show how an FPGAcore can provide embedded testers for other cores in the SoC,so that cores designed to be tested with external vectors canbe tested with BIST, and the entire SoC can be tested with alow-cost tester.

    Categories and Subject Descriptors:B.8.1 [Performanceand Reliability]: Reliability, Testing, and Fault-Tolerance

    General Terms:Algorithms, Design, Reliability

    1. IntroductionReusing embedded IP cores in System-on-Chip (SoC)design has become the primary means of improving the pro-ductivity of designers faced with very large and complexcircuits. One problem in manufacturing a SoC with millionsof transistors using deep-submicron technologies (0.13mand below), is an increase in the probability of defects in sil-icon, which results in decreasing manufacturing yield. Theeconomic consequences of low yield for expensive SoCdevices can be disastrous. To effectively deal with thisincreased defect density, we need efficient methods for faultdetection, location, and fault tolerance implemented on-chip.Typically, such methods are implemented by embedded IPblocks referred to asinfrastructure IP. In this paper we will

    show that an embedded FPGA core is an ideal vehicle forinfrastructure IP.

    But an embedded FPGA is also a very useful functionalSoC component. A SoC design where at least part of theuser-defined logic is implemented in an embedded FPGAprovides the following benefits:1) Usually fixing logic design errors in an ASIC requires a

    respin to change a set of manufacturing masks. The costof a respin is around $1M, and it also causes a 2-to-6months hit in the time-to-market. In contrast, a fix that canbe done in the FPGA can be completed in one day.

    2) The embedded FPGA may be reused for different func-tions at different times. For example, a cell phone may

    work with different protocols (CDMA, GSM, etc.) in dif-ferent geographical areas, and the logic implementing eachprotocol can be downloaded in the FPGA on a demandbasis.

    3) Although FPGA clock speeds are slower than ASIC clockspeeds, the massive parallelism of an FPGA can beexploited to implement many algorithms (such as imageprocessing, signal processing, cryptography, etc.) at higherperformance levels than on Ps or DSPs. Usually suchFPGA implementations consume less power than P orDSP implementations, because they can be run at lowerclock frequency.

    4) The same SoC can be used as a platform for differentproducts, or for different versions of the same product,

    which are created by different FPGA configurations.5) A good strategy is to use the embedded FPGA to imple-

    ment protocols and algorithms likely to change in thefuture (for example, hardware support for a standard notyet officially adopted). In this way, future changes will beeasy to implement without changing the SoC.

    6) The embedded FPGA can support remote and Inter-net-based field-upgrades, which may be very importantfeatures for system manufacturers.

    2. Using an FPGA Core for Infrastructure IPIn addition to the functional features described above, an

    embedded FPGA can also provide great benefits in imple-menting infrastructure IP. SoC yield-improvement requires

    detecting faults, locating them, and repairing them so that theSoC will work correctly in their presence. Methods fordetecting, locating, and repairing faults in RAMs aredescribed in [9].The problems we address in this paper aredetecting, locating, and repairing faults in an embeddedFPGA; some of our previous work dealt with the same prob-lems in the context of on-line testing [4], and some of thesolutions developed for on-line testingcan be directly appliedto, or adapted for, manufacturing testing of an embeddedFPGA. Another contribution of this paper is novel ways touse an embedded FPGA to test other cores in the SoC.

    We assume a bus-based SoC architecture, where theembedded cores are connected by a common system bus. The

    FPGA core can become the bus master. The SoC is tested byautomatic test equipment (ATE) for manufacturing testing, orby a maintenance processor for in-system testing. In eithercase, the tester also controls the configuration process for theembedded FPGA core via the boundary-scan access mecha-nism of the SoC [22]; the same mechanism can be used bothfor manufacturing test and for in-system test. Various config-urations for the FPGA are stored on disk. We will refer to theexternal tester astest and reconfiguration controller(TREC).If BIST is used within the SoC, TREC initiates the internal

    Permission to make digital or hard copies of all or part of this work forpersonalor classroomuse is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that cop-ies bear this notice and the full citation on the first page. To copyotherwise, or republish, to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.

    DAC 2002, June 10-14, 2002, New Orleans, Louisiana, USA.

    Copyright 2002 ACM 1-58113-461-4/02/0006...$5.00.

    This material is based upon work supported in part by the DARPA ACSprogram under contract F33615-98-C-1318, and by LucentTechnologies.

    Using Embedded FPGAs for SoC Yield ImprovementMiron Abramovici Charles Stroud Marty Emmert

    Agere Systems University of North Carolina Wright State University

    Murray Hill, NJ 07974 Charlotte, NC 28223 Dayton, OH 45435

    [email protected] [email protected] [email protected]

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    2/12

    BIST controller and processes the results. TREC also per-forms diagnosis and fault tolerance functions that will bedescribed in the following sections. Later we will show thatthe embedded FPGA allows TREC to be a low-cost tester(possibly PC-based).

    Our basic principle for using reconfigurable logic for infra-structure IP is create the infrastructure only when needed.Hence instead of having the infrastructure for test alwayspresent in the circuit, we download it in the embedded FPGAonly when we want to test the SoC. In most cases, this infra-structure will be used only once for manufacturing testing, sohaving it permanently in the system is clearly wasteful (wecan recreate it for maintenance testing, if and when needed).Even if the FPGA implements user-defined system functionsduring normal operation, this logic is not needed when theSoC is under test and may be temporarily replaced by testlogic. This is a generalization of our free lunch FPGABIST [24]. Typically, logic BIST for ASICs introduces about20% area overhead and some performance degradation. Incontrast, BIST for FPGAs is a free lunch with no areaoverhead or delay penalties, since the entire FPGA is config-ured only for BIST, and all the BIST logic disappears whenthe FPGA is no longer under test.

    Our basic principle can have surprising consequences. Forexample, the proposed IEEE P1500 Standard for EmbeddedCore Test [16] specifies that embedded cores should providea wrapper for test purposes. An embedded FPGA core, how-ever, can safely ignore this requirement, since the wrappercan be downloaded in the FPGA only during the SoC test.

    Taking advantage of FPGA features, such as reconfig-urability and regular structure, allows FPGA testing toachieve features that are impossible for ASICs: for example,FPGAs can be tested pseudo-exhaustively [20] and faults canbe very precisely located. Pseudo-exhaustive testing results

    in practically complete fault coverage without the use ofcomputationally expensive ASIC test tools, such as fault sim-ulation and automatic test-pattern generation (ATPG).

    Two types of fault tolerance exist for FPGAs. The first,manufacturer-level fault-tolerance, relies on providing spareresources that can be used to replace the faulty ones. Forexample, some FPGAs have a spare column and a hardwaremechanism that allows any column to be replaced by an adja-cent one [8]; the replacement for each column is controlledby a fuse. After the faulty column is diagnosed, all columnsbetween the faulty one and the spare are shifted by oneposition, so that the faulty one is completely avoided, and thechange is invisible to the outside world. The costs of this

    approach - the additional area needed for the spare resourcesand the performance penalty introduced by the column selec-tion hardware - are paid by all chips, including thenon-defective ones.

    The second approach, user-level fault-tolerance, avoidsthese costs since it relies on the spares naturally available inan FPGA, where any application uses only a subset of theexisting resources. Knowing the user circuit to be imple-mented in the FPGA, and the exact location of the faultyresources, one can modify the existing implementation (map-

    ping, placement, routing) to replace the faulty resources withfault-free spares. For example, this approach allowed theTeramac custom computer [10] to be build from 864 FPGAs,out of which 75% were defective. Note that configurationdata for each faulty FPGA should be separately maintained.Unlike manufacturing-level fault tolerance, which is inde-pendent of the target circuit, user-level fault tolerance can

    ignore any fault that does not affect the circuit. User-levelfault tolerance requires testing and diagnosis of FPGAs to bedone by users, and this may be unacceptable for off-the-shelfFPGAs typically tested by their manufacturers.

    In contrast, an embedded FPGA core is tested only as partof an newly fabricated SoC, and the SoC designer is alsoresponsible for its testing. In this environment, user-levelfault-tolerance is clearly the right solution for yieldimprovement.

    The remainder of the paper is organized as follows. Section3 explains why a commonly used approach - testing theusers circuit - is not good for FPGA test. Section 4 presentstechniques for testing and diagnosing logic cell faults in an

    embedded FPGA. Section 5 describes techniques for detect-ing and locating faults in the programmable interconnectnetwork. Section 6 shows how to use an embedded FPGA tocreate the infrastructure for testing other cores. Section 7 pre-sents fault tolerance methods that allow the embedded FPGAto work correctly in the presence of the located faults. Section8 concludes the paper.

    3. A Wrong Approach: Testing Users CircuitMany FPGA design and test flows generate tests for the cir-

    cuit implemented in the FPGA using ATPG tools for ASICs.However, such an approach will not completely test theFPGA hardware. The first problem is testing of a configura-tion multiplexer (MUX), which is a commonly used

    hardware mechanism to select subcircuits for various modesof operation. A configuration MUX is controlled by configu-ration memory bits to select one input to be connected to itsoutput. In Figure 1a, assume that we set the configuration bitSto 0 to connect subcircuitC0toX. Then subcircuitC1dis-appears from the circuit model seen by the user. This iscorrect from a design viewpoint, because the value V1pro-duced by C1 can no longer affect the MUX output in thecurrent configuration. But from a testing viewpoint, in anytest for the MUX, weneed to set V0 andV1 to complementaryvalues. In general, for a MUX withkinputs, ifVis the valueof the selected input, all the otherk-1 inputs should be set tovalueV.

    The problem arises because FPGA CAD tools generate theconfiguration bitstream based on the user model, which willnever include the functionally inactive subcircuits (called

    0

    X

    V1S=0

    Figure 1. Configuration MUX

    1

    V0

    V1

    S=0

    V0

    X

    x

    s-a-1

    1

    0

    1a) b)

    C0

    C1

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    3/12

    invisible logic in [1]). Thus in Figure 1a, when S=0, V0 willbe set to both 0 and 1, butV1cannot change. Similarly, theuser logic cannot controlV0in any configuration whereS=1.Hence the testing of the MUX may not be complete. Forexample, the s-a-1 fault in the gate-level MUX model inFigure 1b is detected only whenS=0,V0=0, andV1=1. Butthis pattern may never be applied ifV1cannot be controlled

    whenS=0.In most previous work dealing with testing FPGAs, the

    problem of testing a configuration MUX is either notaddressed or it is solved functionally, by connecting everyinput in turn to the output, and providing both 0 and 1 valuesto the selected input. However, the invisible logic driving theinactive inputs is incorrectly ignored. Since an FPGA hastens of thousands of configuration MUX structures, such atest is likely to have a poor quality.

    Our solution relies on separately configuring the invisiblelogic so that it will generate the proper values needed for theinactive MUX inputs. Then we overlay the resulting con-figuration files over the main configuration file with the

    active logic, and we merge them without changing anyMUX setting done in the main configuration. This process isconceptually simple, but its implementation requires knowl-edge of the FPGA configuration stream structure.

    A second problem with testingonly the users circuit is thatspare resources are never tested, and when the need arises toreplace a faulty resource with a spare one, we cannot be surethat the spare is fault-free. Our approach is to test the entireFPGA and to construct tests that are independent of the appli-cations to be implemented in the device.

    4. Testing the Logic Cells4.1 BIST for Logic Cells

    The embedded FPGA is composed of an array of program-

    mable logic cells connected by programmable interconnectresources. Usually the logic and the interconnect of an FPGAare separately tested. The goal of logic tests is to detect anyfaults (single or multiple) affecting the operation of a cell,and also any combination of multiple faulty cells.

    Figure 2 illustrates our logic BIST architecture [1]. Someof the cells are configured astest-pattern generators(TPGs)andoutput-response analyzers(ORAs), while another groupof cells are configured asblocks under test(BUTs). The twoTPGs provide identical test patterns to alternating rows of

    identically configured BUTs. The outputs of the BUTs arecompared by the ORAs which latch any mismatches. The

    Pass/Failresult flip-flops of all ORAs are connected in a scanchain. The BUTs are then repeatedly reconfigured so thatthey are tested in all of their modes of operation. Each recon-figuration of the embedded FPGA to test a different cell modeof operation is referred to as a test phase. Atest sessionis a

    collection of test phases that completely test the BUTs in allof their modes of operation. The number of test phases is afunction of the cell architecture. Once the BUTs have beentested, the roles of the cells are reversed, so that in the nexttest session the previous BUTs become TPGs or ORAs, andvice versa. Figure 3 shows the cell functions in the two testsessions for an 88 FPGA. Since half of the cells are BUTsduring each test session, only two test sessions are needed totest all cells in the embedded FPGA.

    In every test phase, TREC performs the following steps:1) reconfigure the embedded FPGA with a BIST configura-tion retrieved from the disk, 2) initiate the BIST sequence,and 3) read thePass/Failresults from the ORAs. The down-loading of the BIST configurationand the control of the BISTprocess are done via the boundary-scan access mechanism ofthe SoC [22]. Since the test application time for any phase isdominated by the embedded FPGA reconfiguration time, an

    important goal of the BIST approach is to minimize the totalnumber of test configurations.

    Figure 4 illustrates the typical structure of a cell, consistingof a memory block that can function as a look-up table (LUT)or RAM, several flip-flops, and multiplexing output logic.The LUT/RAM block may also contain special-purpose logicfor arithmetic functions (counters, adders, multipliers, etc.)The RAM may be configured in various modes of operation(synchronous, asynchronous, single-port, dual-port, etc.).The flip-flops can also be configured as latches, and mayhave programmable clock-enable, preset/clear, and dataselector functions. Our TPG applies exhaustive patterns toevery subcircuit of a cell for each one of its modes of opera-tion, except for the RAM modes, where the memory block is

    checked with RAM March tests which cover most RAM-spe-cific faults. Exhaustive testing of every subcircuit is feasiblesince their number of inputs is reasonably small. This resultsin practically complete cell fault coveragewithout explicitfault model assumptions and without fault simulation.

    Figure 2. Logic BIST architecture

    ORA

    BUT

    TPG #1

    ORA

    BUT

    BUT

    BUT

    BUT

    ORA

    BUT

    BUT

    ORA

    BUT

    TPG #2

    Pass/Fail

    BIST Start

    BIST_Done

    Figure 3. Cell functions for the two test sessions

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    T

    B

    O

    B

    O

    B

    O

    B

    Figure 4. Typical cell structure

    LUT/RAM FFsOutput

    Logic

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    4/12

    Although there are situations that make detection of multi-ple faulty cells more complicated (a pair of compared BUTshave equivalent faults and do not cause a mismatch; a faultyORA does not detect a mismatch; a faulty TPG skips patternsthat detect a faulty BUT),almost any combination of multiple

    faulty cells is guaranteed to be detectedby our two test ses-sions. We have shown that the conditions that would allow a

    group of faulty cells to escape detection are extremely restric-tive, so that they are very unlikely to occur in practice [1].

    4.2 Logic Cell DiagnosisFigure 5 illustrates a test session where two faulty BUTs

    (denoted by black squares) are detected. Errors are observedat the ORAs bordering the faulty cells (shaded squares).Clearly, the set of failing ORAs is unique for every faultyBUT, and therefore it can be used to locate single faulty cells.(As a consistency check, the two failing ORAs observing thesame BUT must fail in the same phases [1].)

    Actually in Figure 5 we are deal-ing with a group of two faulty cells,but because they are observed at dis-

    joint ORAs, they do not interact.Locating BUTs that are observed atcommon ORAs is more difficult, butwe have developed diagnostic proce-dures [1] that can locate almost anycombination of multiple faulty cellslikely to occur in practice. These pro-cedures rely on analyzing the sets of failing test phasesobtained at different ORAs, and by applying additional BISTconfigurations when needed.

    In addition to diagnosing the faulty cells, we can alsoidentify the faulty subcircuits (LUTs, flip-flops) or the defec-tive modes of operation (e.g., count-up, multiply) within afaulty cell [2].

    Although our tests are application-independent, we cantake advantage of knowing the circuits to be implemented inthe embedded FPGA. Specifically, TREC may avoid apply-ing additional diagnostic configurations when the suspectedcells are not used in any of the target circuits. However, accu-rate diagnosis of unused cells may be required later to allowthese cells to replace other defective cells for fault tolerance.

    If a configuration memory bit that controls a resource ina logic cell is stuck, this fault is detected by the tests for thecorresponding resource. However, the faulty resource and thestuck fault in its controlling configuration bit cannot be dis-tinguished, unless the configuration memory has a readbackfeature that can detect the stuck bit.

    5. Testing the Interconnect Network5.1 BIST for Interconnect

    The programmable interconnect network consists of wiresegments of different lengths that can be connected via pro-grammable switches referred to asconfigurable interconnect

    points(CIPs). Wire segments connecting non-adjacent cellsform global routing resources, while local routing resourcesconnect a cell to global routing resources or to adjacent cells.The basic CIP structure consists of a transmission gate con-

    trolled by a configuration memory bit (Figure 6a). There arethree types of CIPs which we refer to as the cross-point CIP(Figure 6b), thebreak-point CIP(Figure 6c), and themulti-

    plexer(MUX) CIP(Figure 6d) [18]. While a cross-point CIPconnects wire segments located in disjoint planes (a horizon-tal segment with a vertical one), a break-point CIP connectstwo wire segments in the same plane. The MUX CIP comes

    in two varieties: decoded and non-decoded. A decoded MUXCIP is a group of 2kcross-point CIPs sharing a common out-put wire and controlled bykconfiguration bits, such that theinput wire being addressed by the configuration bits is con-nected to the output wire. A non-decoded MUX CIP containsa configuration bit controlling each transmission gate; hereonly one of the configuration bits is active for any configura-tion. There is also acompound CIP(Figure 6e), which is acombination of four cross-point and two break-point CIPs,each separately controlled by a configuration bit [31].

    The fault model we consider is typical for interconnect:CIPs stuck-closed (stuck-on) and stuck-open (stuck-off),wires stuck at 0 or 1, open wires, and shorted wires. Astuck-closed CIP creates a short between its two wires. Weassume that no detailed layout information is available

    regarding adjacency relations between segments. Instead, weuse only rough physical data (available in data books) todetermine adjacent bunches of wires, where a bunch is agroup of wires thatmayhave pair-wise shorts, but not everywire is necessarily adjacent with every other wire in thebunch. This treatment makes our methodlayout-independent.

    Detecting a CIP stuck-open (closed) fault also detects thestuck-at-0 (1) fault in the configuration memory bit that con-trols that CIP. However, most previous work in FPGAinterconnect testinghas ignored the problem of shorts involv-ing configuration memory bits. This is understandable,because the configuration memory does not appear in anymodel available to users. Nevertheless, this is an important

    class of faults, since the configuration memory is physicallydistributed across the entire chip, so that each bit is close tothe resource it controls. Hence shorts between configurationbits (or the wires that transmit them) and the wire segmentsin the interconnect network are quite likely and cannot beignored. We will analyze the effect of such faults at the endof Section 5.

    Figure 7 illustrates the concepts of the interconnect BIST.We configure subsets of routing resources (wire segments

    Figure 5.

    T

    B

    O

    BO

    B

    O

    B

    T

    B

    O

    BO

    B

    O

    B

    T

    B

    O

    BO

    B

    O

    B

    T

    B

    O

    BO

    B

    O

    B

    T

    B

    O

    BO

    B

    O

    B

    T

    B

    O

    BO

    B

    O

    B

    T

    B

    O

    BO

    B

    O

    B

    Figure 6. Configurable interconnect points

    b) cross-point CIP

    c) break-point CIP

    wire A

    wire B

    wire Bwire A

    a) basic structure

    wire Bwire A

    configuration

    memory bit

    wire Bwire A

    output

    d) MUX CIP

    CB

    CBCB

    CB

    CB

    CB

    e) compoundCIP

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    5/12

    and CIPs) to form two groups ofwires under test(WUTs)that receive exhaustive test patterns from a TPG, while thevalues at the other end ofthe WUTs are compared byan ORA[26]. A WUT consists of several wire segments connected byclosed CIPs. To check local routing resources, WUTs mayalso go through cells configured to pass their inputs directlyto outputs. In Figure 7, the WUTs are shown by bold lines,

    and the activated (closed) CIPs are shown in gray, while theopen CIPs are white. To test the open CIPs for stuck-onfaults, when the TPG drives a value v on the WUTs (0 inFigure 7), v must be applied to the wire segments on theopposite side of the open CIPs.

    The TPG applies exhaustive patterns to the WUTs, all pos-sible 2n test vectors to every group of n WUTs. Forreasonably smalln, the application time for 2n test vectors isstill negligible compared to the reconfiguration time of theFPGA. The TPG also provides test patterns to the oppositesides of open CIPs that isolate the WUTs from the rest of theinterconnect.

    If the two compared sets of WUTs have identical (or equiv-alent) faults, then no mismatch will be detected at the ORA[15]. For example, each set of WUTs could have a different

    CIP stuck-open in itsi

    th

    wire. To overcome this problem, wetest every set of WUTs (at least) twice, each time being com-pared with a different set of WUTs [25].

    The basic routing BIST architecture shown in Figure 7 isimplemented withinself-testing areas(STARs), where eachSTAR spans two columns or two rows of the embeddedFPGA to test vertical or horizontal interconnect resources,respectively [4][5]. Figure 8 illustrates the structure for test-ing the horizontal interconnect. Testing within STARs isdone concurrently.

    While most of the interconnect resources are tested eitherin an horizontal or in a vertical STAR, testing the cross-pointCIPs connecting global horizontal and vertical busses

    requires both a vertical and an horizontal STAR, where thecross-point CIPs at the intersection of the STARs are undertest (Figure 9).

    5.2 Interconnect Fault Diagnosis

    Our diagnosis procedure begins by analyzing the BISTresults to narrow the search area, and proceeds only in the

    failing STARs. We use an adaptive strategy, where the nextdiagnostic phase is determined based on the results obtainedso far. Figure 10 illustrates several diagnosis techniques (fail-ing ORAs are darkened).

    In Figure 10a, after the first phase we do not know whichgroup of WUTs is faulty. During the second test phase, theWUTs are swapped between TPGs, and the faulty group canbe identified by checking whether the failure moves to theother ORA or stays at the first ORA.

    Figure 10b depicts one technique used to identify the faultyregion in a set ofWUTs. We add several ORAs to observe theWUTs at different points and use two phases with oppositedirections of the test pattern flow through WUTs. Faults arelocated between the highest failing ORA in the first phaseand the lowest failing ORA in the second phase. An alterna-tive technique is to divide the WUTs into pairs of shorterWUTs and retest (Figure 10c). This divide and conquermethod can be recursively repeated to obtain a very highdiagnostic resolution.

    Identifying the faulty WUT in a group can be accomplishedby either testing one wire at a time (and comparing with aknown-good wire), or by making different ORA connections

    with the faulty set of WUTs and observing the movement ofORA failures (a similar idea to that of Figure 10a). The faultywire segment or CIP is then identified by re-routing to elim-inate a segment (or CIP) at a time during the subsequent tests(Figure 10d).

    Diagnostic phases can be either precomputed and stored ondisk, or generated on the fly by TREC. In general, the diag-nostic phases illustrated in Figure 10a through 10c can beprecomputed, while the detailed diagnostic phases implied

    Fail

    Figure 7. Interconnect BIST architecture

    TPG

    B WUTs

    Pass/

    A WUTs

    ORA

    BIST

    BIST

    cell

    Start

    Done

    0

    0

    0

    1

    1

    1

    1 1

    0

    T O

    T O

    T O

    T O

    Figure 8. Horizontal

    interconnect testing

    Figure 9. Global

    cross-point CIP test

    O

    T

    OO

    O

    T

    OO

    Figure 10. Diagnostic configurations

    T

    1st test 2nd testa) swapping WUTs

    O O

    T

    O

    T T

    O OO

    O

    T

    TO

    O

    O

    1st test 2nd test

    b) adding ORAs

    O

    T

    O

    T

    O

    T

    1st test 2nd testc) divide & conquer

    O

    T

    1st test 2nd testd) re-route wire segments

    singlewire

    O

    T

    singlewire

    knowngoodwires

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    6/12

    by Figure 10d would be generated on the fly for mostFPGA interconnect architectures.

    A tool like JBITS [29] that can efficiently generate config-uration streams without relying on CAD tools, would be idealto use during testing. Since no such tools are commerciallyavailable, we rely on commercially available place-and-routetools used incrementally, followed by configuration-genera-tion software; these may run for several minutes to generatemultiple diagnostic phases, but this time could be acceptableif TREC is a low-cost tester. A limit on the time allowed forcomputing diagnosis configurations can be established basedon factors such as the cost of a SoC, the production volume,the current yield, the probability of repairing the defectiveSoC given the number of faults found so far, the target reso-lution, etc. The use of precomputed diagnostic phases inconjunction with an adaptive diagnostic procedure mini-mizes the total time spent on TREC, while maintainingdefect-tolerant capabilities.

    To illustrate the diagnostic resolution that can be obtainedwith these techniques, we will assume a single fault in a typ-

    ical configuration of interconnect resources illustrated inFigure 11a. Here segmentXis bounded by break-point CIPsthat control its connections to YandZ, while segment Wshares no CIPs withX. For example, if a cross-point CIP isstuck-open (Figure 11b), this is the only condition that wouldallow the vertical and the horizontal segments connected bythe that CIP to be driven to complementary values. A similarreasoning identifies an open wire segment (Figure 11c) Inthese cases, diagnostic resolution is to a CIP or a segment.However, we cannot distinguish between an open in a seg-ment and a stuck-open fault in its adjacent break-point CIP(Figure 11d). Figure 11d also shows another pair of equiva-lent faults - a short between two segments separated by abreak-point CIP and that CIP being stuck-on. On the other

    hand, shorts between two segments that do not share a CIP,such asWandXin Figure 11e, can be diagnosed by checkingthat each segment is fully functional while the other segmentit is shorted to is unused, but when both segments are drivento opposite values, then we detect a failure on at least one ofthem.

    For multiple faults in the same area, there are cases where

    we will not always be able to identify the faulty resourcesamong a set of suspects. These cases arise when some faultsinterfere with the routing configurations needed to diagnoseother faults. In such cases, we will play it safe by labelingthe entire set of suspects as faulty, so that all of them will bebypassed by fault tolerant techniques.

    To illustrate the problem of detecting and diagnosingshorts with configuration memory bits, consider a shortbetween a configuration bitBand an interconnect segmentS,and let us assume thatBcontrols a CIPC. Note that the valuebofBis constant during a phase. This short may be detectedin any configuration whereSis used and at least once is setto valueb. The detection will occur in one of the two cases:

    1) the short creates a wrong value onBandCis under test; 2)the short creates the wrong value onSandSis under test.

    The problem is thatCmay be under test in different config-urations, and the value ofBmay be affected in only some ofthem. Similarly,Smay be tested in different configurations,and the value ofBmay affect only some of them. This behav-ior would seem to be that of an intermittent fault, as the sameresource looks like it fails only some of the configurationswhere it is tested. So an interconnect diagnosis procedure thatlooks for consistency in all cases is doomed to fail. The prac-tical solution in such situations is to label all the suspectedresources as faulty. Avoiding all the potential defects is a rea-sonable strategy for yield enhancement.

    6. Testing Other Cores with the FPGA CoreThe embedded FPGA can also be used to test any otherembedded core connected to the same system bus. We willseparately analyze BIST-cores, whose testing is based onin-core BIST, and externally-tested-(ET)-cores, which relyon vectors external to the core.

    6.1 BIST-Cores

    For a BIST-core, its BIST logic may be moved from thecore to the embedded FPGA, as illustrated in Figure 12.(Similar techniques have been described for using an FPGAinstalled on a board to test other devices on the board

    Figure 11. Diagnostic resolution examples

    a) typicalinterconnectconfiguration

    W

    XY Z

    W

    XY Z

    W

    XY Z

    W

    XY Z

    W

    XY Z

    b) diagnosingstuck-opencross-point CIPs

    c) diagnosingwire segmentopen

    d) equivalentinterconnectfaults

    e) diagnosingshorts betweenwire segments

    s-open

    ) (

    open

    ) (

    open

    s-openshort

    s-closed

    short

    eFPGA

    BIST-Core

    BIST

    Figure 12. Relocating the BIST logic

    a) b)eFPGA

    BIST

    Orig. Core

    Orig. Core

    b

    c

    d

    c

    d

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    7/12

    [22][27].) In practice, this means that the core providers donot have to implement BIST in embedded cores; instead, theyprovide the BIST logic as an additional IP to be implementedin the FPGA core. Several conditions must be satisfied tomake this transformation possible. First, the BIST logicshould not be intrusive, that is, it should be easy to separatefrom the rest of the logic. Letb,c, andddenote, respectively,

    the number of connections between the BIST logic and theoriginal core (without BIST), between the core (with BIST)and the system bus, and between the embedded FPGA andthe system bus. The relocation of the BIST logic is possibleif and . Another condition is for the FPGA to sup-ply test patterns at the speed required to test the core. Ofcourse, the BIST logic implemented in the core would befaster than its FPGA implementation, but this speed may behigher than the one used in normal operation, when the coreis accessed over the system bus. So if the FPGA can apply thetest at the same speed the core is accessed over the systembus, and if the bus width conditions are also satisfied, theBIST logic may be relocated in the FPGA.

    We will illustrate this scheme for testing an embeddedRAM core. Figure 13 illustrates a typical RAM BIST archi-tecture.Data In,Address, Control, andData Outare includedin the system bus. During BIST, the inputsData In,Address,andControlare replaced with the signals produced by anintegrated BIST controller and TPG, which generate a Marchtest for the RAM block. The controller/TPG also producesthe expected output responses to be compared with the RAMoutput data by the ORA, which latches any mismatchesobserved to produce a failure indication at the end of theBIST sequence. (Note that separate tests are needed to checkthe connections between the RAM block and the I/Os of theRAM core.)

    Using the embedded FPGA, we can follow the scheme inFigure 12, remove all the BIST logic from the RAM core,and download the BIST controller/TPG and the ORA logic in

    the FPGA only when we need to test the RAM. This resultsin several advantages. First, the RAM core is smaller, beingreduced to only the shaded RAM block, and faster, since weno longer need multiplexers to select between system andBIST data (this selection is now done by the system bus).Second, when the BIST logic is in the same core with theRAM (Figure 13), we also need to test the BIST logic to

    make sure that it will correctly test the RAM; this is no longernecessary when the RAM BIST logic is implemented in theFPGA core, because the self-test of the embedded FPGA(described in the previous sections) completely tests all itsresources. Third, while defects in the BIST logic of a RAMcore cannot be repaired and will likely cause the entire SoCto be discarded, defects in the embedded FPGA core can beeasily tolerated (see Section 7).

    If we have several RAM cores in the SoC, one drawback ofthis approach is that it can test only one RAM at a time,because the FPGA can observe the output of only one core ata time. In contrast, with the BIST logic in the same core withthe RAM, all embedded RAMs may be tested in parallel, pro-vided that the power dissipation constraints of the SoC arenot violated.

    A compromise solution, illustrated in Figure 14, movesonly part of the BIST logic in the embedded FPGA core,while allowing several RAMs to be tested concurrently. Wemove only the BIST controller/TPG in the embedded FPGA,and we leave the ORA and a separate data generator block inthe RAM core to produce only the expected data. The FPGAprovides the sameData In,Address, andControlinputs to allRAMs under test. All embedded RAMs can be tested concur-rently, since here results analysis is independently donewithin each RAM core. This scheme still avoids the BISTmultiplexers which cause the performance degradation.

    Similarly, the embedded FPGA core can be used in turn toimplement (at least part of) the BIST logic for most embed-ded BIST-cores that are connected to the system bus. Movingthe BIST logic in the embedded FPGA requires analyzing thetrade-offs involving the area overhead and the performance

    b c b d

    Figure 13. Typical RAM BIST architecture

    2KMRAM1

    0control

    data out

    data inaddress

    01 01

    M-bit ORA

    Data Out

    Control

    K M

    M

    Data In

    BIST Controller & TPG

    Mdata bits

    K M

    Kaddress

    addressgenerator

    datagenerator

    bits

    read/writegenerator

    Pass/Fail

    BISTStart

    BISTDone

    Mexpectedresponse

    Ccontrol

    bits

    C

    Address

    2KMRAMcontrol

    data out

    data inaddress

    M-bit ORAData Out

    K M

    M

    Pass/Fail

    datagenerator expected

    response

    M

    C

    BISTStart

    BIST Controller & TPG

    MK

    addressgenerator

    datagenerator

    read/writegenerator

    BISTDone

    C

    Data InAddressControl

    system bus

    eFPGA

    Reduced BISTRAM core

    Figure 14. Partial move of RAM BIST logic

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    8/12

    penalties introduced by in-core BIST, the total test time, andthe SoC power constraints. This analysis is the responsibilityof the SoC designer; to make the choice possible, the coreprovider should supply both BIST and non-BIST (and, whereapplicable, partial-BIST) versions of the same core.

    6.2 ET-Cores

    In this section we consider embedded ET-cores that aretested with externally applied vectors. Typically, these coresuse scan DFT and the vectors are generated by an ATPG pro-gram. Applying an external test to an embedded core is adifficult problem (for example, the core may have more pinsthan the SoC); the solution usually consists of inserting DFTstructures to allow the input stimuli and the core responses tobe transported between core and SoC pins [16]. If theET-core is connected to the system bus, our approach is tochange the core testing mode from external to BIST andimplement the BIST circuit in the embedded FPGA.Notethat the ET-core does not have to be changed, since thesource for its input vectors and the sink for its output vectorsare still external to the core under test. But from the SoCs

    perspective, this is a significant change.Given a full-scan circuitC, many procedures exist to create

    a TPG that produces a pseudo-random test of reasonablelength that always achieves 100% fault-coverage for C;[7][28][30] are more recent techniques of this type. The keyconcept is generating vectors whose signal probabilities aredynamically changed so that the random-pattern-resistantfaults [6] ofCnot yet detected will be more likely (or evencertain) to be detected by the subsequent vectors. Althoughsome of these techniques require a large area overhead forcircuits with multiple scan-chains, the area overhead is nolonger a primary concern when BIST is implemented inreconfigurable logic(the only constraint is to fit within theembedded FPGA).

    Figure 15 illustrates this scheme for an ET-core with a sin-gle scan-chain. The circuit under test (CUT) is thecombinational part of the full-scan circuit. An input vector isapplied at primary inputs (PIs) and via the scan chain, and anoutput vector is observed at primary outputs (POs) and viathe scan chain. The scan chain in the ET-core is extended bya register RI that supplies the PI values, and by a register ROthat captures the PO values; RI and RO are implemented inthe embedded FPGA, along with a TPG and an ORA. (A sep-

    arate TPG per chain is needed for multiple scan chains.) TheTPG provides the input sequence to the extended scan-chain,and the ORA processes the CUT outputs captured in theextended scan-chain. The ORA is a signature analyzer [6].Note that the FPGA must have access to the serial data input,serial data output, and the scan-enable pins of the scan chain.

    If the synthesis of the TPG is based on the structure of theCUT, which is not available to the SoC designer, then the toolthat generates the TPG will be run by the core provider, andthe TPG may be offered as an additional IP to be imple-mented in the FPGA core. Alternatively, if (non-compacted)ATPG vectors for the CUT are provided with the core, theTPG can be constructed to generate a pseudo-random test setthat includes the given vectors. This solution allows the TPGto be created without access to the CUT implementation.

    If this approach is feasible for every full-scan embeddedcore, thenthe entire SoC can be tested without external vec-tors. The FPGA core takes turns to implement an embeddedtester for every ET-core connected to the system bus. IfTREC does not need to store and apply vectors, then it can be

    a low-cost PC-based tester. The total test time will increasebecause most cores will not be concurrently tested, but thisbecomes less important when TREC is a low-cost tester.

    7. Fault Tolerance TechniquesOur approach for user-level fault-tolerance for the embed-

    ded FPGA starts with the circuit implemented in thereconfigurable logic and with the location and type of everydiagnosed fault. Note that we may have different circuitsresiding in the FPGA core at different times, either to per-form different functions, or to implement various BISTcircuits for other cores; the fault-tolerance analysis is sepa-rately done for each such circuit.

    Given a circuitCimplemented in the FPGA and a faultf,

    the first question is whetherfaffects the operation ofC. If itdoes, then we have to reconfigure a region ofCto bypassf.We will try to limit the extent of the reconfigured region asmuch as possible, to avoid changing the critical timing pathsofC. The computation of mostfault-bypassing configura-tions (FABCOs) is done by TREC. Although thiscomputation may involve CAD tools, they are used onlyincrementally on a small region ofC, and their run-times arereasonably small. A limit on the allowed computation time isestablished based on factors such as the cost of a SoC, theproduction volume, the current yield, etc. When computing aFABCO exceeds the time limit, we will let TREC test otherchips, and either discard the defective SoC, or transfer thecomputation job to a different, and possibly more powerful,

    processor.Sometimes the reconfiguration ofCdoes affect its critical

    paths, and then the resulting SoC is functional, but it mustoperate at a lower clock frequency. If the SoC manufacturerprovides several speed versions of the device, selling arepaired SoC as a slower device is much better than discard-ing it as defective.

    In the process of bypassing faults in a defective FPGAcore, we have created new FPGA configuration files for the

    system bus

    eFPGA

    Scan chain

    CUT

    ET-Core

    PIs POs

    TPG ORA

    Figure 15. BIST for an ET-core

    RI RO

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    9/12

    defective SoC; these files should be used only with the defec-tive chip they were generated for. A good method to maintainthe association between a faulty SoC and its configurationfiles is to create a uniquefaulty chip serial number, and tolabel both the defective SoC and its configuration files usingthis identifier. The serial number for all the fault-free chipswill be zero. The serial number will be recorded in a small

    non-volatile memory within the SoC, and it will be readableusing a USERCODEboundary-scan instruction [22]. Theserial number should also appear in the header of every con-figuration file for this SoC, and the configurationprocess willbe allowed to proceed only if there is a match between theserial number in the SoC and the one in theconfiguration file.

    If fault f does not affect C, then no reconfiguration for tol-erating f is required.This is clearly the case when faffectsFPGA cells or interconnect resources not used byC. In addi-tion, we have developed techniques for using defectiveresources whenever possible. This increases the effectiveyield without having to changeC, and in most cases withoutimpact on performance. Since using defective resourcesavoids using spare resources for fault tolerance, we will beable to tolerate more faults in the same area. Using defectiveresources is a major departure from most previous work infault tolerance, whose goal is to bypass every located fault.

    7.1 Compatibility

    If the function of a circuitCimplemented in the FPGA isnot changed by faultf,Candfare said to be compatible. Acompatible fault does not requireCto be reconfigured. Notethat this definition allowsfto change the timingofC. We willdefine compatibility separately for logic faults and intercon-nect faults.

    7.1.1 Compatibility for Logic Faults

    We say that a logic faultfin a logic cell iscompatible with

    the function Fto be implemented in that cell, iffFdoes notchange in the presence off. For example, ifFis a combina-tional function using only the LUTs, any fault in flip-flops iscompatible withF. Similarly, a fault affecting only the mul-tiplication operation is compatible with any F not usingmultiplication. In general, any fault that affects only modulesor modes of operation of the cell that are not used by Fiscompatible withF[3][12]. Then any suchFcan be imple-mented in the faulty cell, and the configuration file of thefault-free FPGA core is also applicable to this faulty SoC.

    Higher diagnostic resolution can allow additional compat-ibility relations. For example, suppose that testing the RAMmodule of the cell has identifiedfas a single bit in the RAMbeing stuck-at-0(1). Consider any functionFusing the RAM

    module only as a LUT. If the value needed for the faulty bitis 0(1),fis compatible withF.

    7.1.2 Compatibility for Interconnect Faults

    We define alayoutof a programmable interconnect net-work as the set of positions (open or closed) for all its CIPs,needed for a specific routing of the netlist. Similar to [21], wesay that an interconnect faultfiscompatible with a layout Lifffdoes not change the connections implemented byL. Iff

    andLare compatible, thenLcan toleratefand we don't haveto reconfigure to bypassf.

    For example, in Figure 16, assume a layoutLwhere break-point CIPsB1andB2are open, crosspoint CIPsX1andX2areclosed, and segmentZis not used. Then an open onB1X1iscompatible withL, while an open inX1X2is not. More inter-estingly, the short betweenB1B2andZis also compatiblewithL, sinceZdoes not carry any signal. Thus the routed sig-nal net can use the shorted segment B1B2. Table 1 givesdetailed conditions that need to be met for single-faultcompatibility.

    Note that the compatibility relation between a faultfand alayoutLmay be affected by the presence of another faultg.This happens becausegchanges the structure of the program-

    mable network on whichLis implemented. This interactionbetween faults is not considered in [21]. For example, inFigure 17, assume a layoutLwhere crosspoint CIPsX1and

    X2and are closed, and breakpoint CIPsB1andB2are open.SegmentAis used by signalS1via crosspointX1, segmentCis used by signalS2via crosspointX2, and segmentBis notused. Letfbe the short between segmentsAandB,and letgbe the breakpointB2stuck-closed. The single faultfis com-patible withL, as is the single faultg. However, the multiplefault {f,g} shorts the signalsS1andS2, so it is not compatiblewithL.

    Our procedures rely on a network model coupled with alayout data base. Thenetwork modelis a graph that repre-sents the topological relations between the components of theprogrammable interconnect network - wire segments, cellpins, FPGA pins, and CIPs. Thelayout data basedefines theCIP settings required to route all the signal nets in the circuit.

    Table 1. Single-fault compatibility conditions

    f Condition forf~L

    segment openorCIP stuck-openor

    segment stuck-at

    Ldoes not transmit values

    through the faulty resource

    CIP stuck-closed

    Luses the CIPor

    Ldrives values only on at most

    one of the shorted segments

    short between

    segments

    Ldrives values only on at most

    one of the shorted segments or

    Luses the shorted segments for

    the same signal

    X1)(

    B1 X2 B2)( )(

    Figure 16. Illustration for fault compatibility

    Z routed signal

    Figure 17. Multiple fault interaction

    BA C S2

    f g

    S1

    X1 X2B1 B2

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    10/12

    From the layout data base we can retrieve all the segmentsand CIPs used by a given signal, and determine which signalis routed through a given segment or CIP. In a fault-free net-work, a segment may be driven by only one signal.

    To record faults in the network model, every CIP and seg-ment has an additional field that indicates its fault status: aCIP can be fault-free, stuck-on, or stuck-off, and a segmentcan be fault-free, open, stuck, or shorted. For a shorted seg-ment we also record the identity of its partner- the othersegment involved in the short.

    We determine the faulty resources that may be reused (andthe ones that need to by bypassed) indirectly, by finding the

    signal nets that have to be rerouted. We say that asignal netS is compatible with fault fiffScan connect its source to allof its sinks in the presence off, and no other signal drives anysegment ofS. (Note thatfmay be a multiple fault.)

    The procedure to find the incompatible nets works as fol-lows. We first insert all the located faults in the networkmodel. Next, we process the faults to find all the signal netsaffected by faults. Then we trace every affected signalSfrom

    its source to all its sinks, keeping track of the segments used.Tracing stops at open segments, at stuck-open CIPs, and atstuck segments, but it does go through stuck-on CIPs, and itjumps from a shorted segment to its partner. If faults preventS from reaching at least one of its destinations, then S isincompatible with the faults. We record all the segmentsreached by every traced net. After all signals are traced, wecan identify the segments driven by more than one signal; thesignals that drive these segments are shorted, and only one ofthem may be allowed to continue to drive the shorted seg-ments, while the other ones will be marked as incompatible.To determine which signal to leave in place, first we excludethe ones already found incompatible because the trace has notreached some of their destinations. To select among the

    remaining signals, we analyze the timing margins (slacks) oftheir paths, and chose the one with the least slack. In this way,we minimize the impact of rerouting on the system timing,since the signals to be rerouted have more slack. We breakties by selecting the longest net as the one left in place, sinceshorter nets are likely to be easier to reroute.

    For example, in Figure 17, tracingS1will go throughA,B,C, and continue alongS2, while tracingS2will go throughC,

    B,A, and continue alongS1. Since the traced segments arerecorded as being driven by bothS1andS2, these signals areshorted. Assuming that neitherS1 norS2 have other faults, weselect one of them to be left unchanged (based on the analysisdescribed above), while the other will be allowed to continue

    to drive the shorted segments. This is possible since afterrerouting, the multiple fault will become compatible with thenew layout.

    In Figure 18, assume thatthe source ofS1 is on theleft. Then tracingS1stops atthe open segment X, andtracingS2reachesBvia theshort between B and C.Because of the open, B is

    driven by only one signal (S2), so that there is no other signalshorted toS2, which is hence compatible with the multiplefault, and does not have to be rerouted. However,S1needs tobe rerouted.

    7.2 Reconfiguration

    A tool like JBITS [29] would be ideal to generate FABCOsby directly modifying configuration files without involvingany CAD tools. Since no such tools are commercially avail-able, we rely on commercially available place-and-routetools used incrementally. The concept of a fault, however, isalien to these tools, and we need modeling workarounds tomake them avoid incompatible faulty resources. To avoid adefective logic cell, we program it to implement a dummylogic function whose inputs and outputs are unconnected. Toavoid a defective CIP or wire segment, we program a dummysignal net with no source or sink to occupy the faulty resourceand its immediate neighbors.

    When we consider reusing a wire segment Swhich isshorted to an unused segment, the delay of S increasesbecause of the additional capacitance provided by its partner.

    However, a conventional path tracing tool that reports thedelay along the traced path, will not see the additional delaycaused by the short. Our solution is to add the delay of theunused shorted segment to the delay ofS, and if the new delaycauses any path going throughSto become too slow to oper-ate at the specified clock speed, we give up on reusingSandmark the short as incompatible. Although this solution maybe conservative in overestimating the delay ofS, it guaran-teed that the short will not affect the critical paths in thecircuit.

    7.2.1 Reconfiguration for Logic Cells

    Many times the function of a logic cell does not utilize allits internal resources. In such cases, when a cell fault fis

    incompatible with the cell functionF, it is often possible toavoidfby small changes in the way Fis implemented. Forexample, if one of the LUTs in the cell is faulty, butFis notusing all the LUTs, we remapFto use only fault-free LUTs,which makesfcompatible and the cell reusable. Of course,we need to incrementally reroute the signals that connected tothe inputs and output of the faulty LUT, to bring them to thereplacement LUT.

    Another example is a 3-input combinational logic functionmapped to a 4-input LUT. This is implemented by settingidentical values to every pair of RAM bits whose addressesdiffer only in the value of the dont care address bit, so thatthe RAM has two identical halves. In this case we can tolerate

    any combination of faults that affect only one of the twohalves, by simply setting the previously unused bit to thevalue that selects the fault-free half. This also requires anincremental reconfiguration.

    If a defective cell has a logic fault incompatible with thefunction assigned to this cell, we relocate the function to acompatible unused cell, preferably one adjacent to the defec-tive cell (see Figure 19). If we cannot move the function to acompatible cell adjacent to the defective one, we set up a

    Figure 18.

    B

    )(

    A X

    S1 S2

    C

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    11/12

    chain of cell replacements [11], as illustrated in Figure 20.This approach averages the overall change in signal distur-bance, but increases the number of signals that must bererouted. Note that in our approach, no replacement cell(including the unused one) is required to be fault-free, butonly compatible with the desired function. After cell func-tions are relocated, we use an incremental router to reroutethe signal nets connected to the moved cells. Also we need toanalyze the changes in the timing of the rerouted signals, todetermine whether the chip should operate at the originalclock frequency or a reduced one.

    As a preprocessing step, we prepare a FABCO for each oneof the used cells in the target circuit. This not a computation-ally expensive job, since all the circuit transformations areonly incremental. These precomputed configurations arestored on the TREC disk, and are applied by TREC based onthe diagnosis results. Note that the same unused cell can bedesignated as replacement for several working cells. Afterapplying one FABCO, several other precomputed configura-tions may no longer be valid, since the circuit for which they

    were derived has been changed. It is another TREC task tokeep track of the validity of precomputed FABCOs. If theFABCO for a defective cell has been invalidated, TRECcomputes a new one, by incrementally rerouting the signalnets connected to the moved function.

    Another major advantage of our technique over conven-tional techniques is that we handle groups of faults in a tightarea. Other techniques limit the number of faults they canhandle by providing a limited number of spare logicresources in a region of the FPGA. When these local sparesare exhausted, their fault tolerant techniques fail. We providea global reconfiguration approach that usesminimax match-ing[19] to match defective cells to compatible cells, so thatthe maximum distance between any defective cell and itsreplacement cell is minimized.

    7.2.2 Reconfiguration for Interconnect

    Our experimental results show that in most practical cir-cuits implemented in FPGAs, a large number of interconnectfaults are compatible with the circuit [13]; this happensbecause many interconnect resources are unused, even in cir-cuits where more than 70% of the cells are used. Every suchfault can be tolerated without reconfiguration.

    Next we describe our multi-step approach for avoidingincompatible interconnect faults. In the following procedure,Sis the set of nets that have to be rerouted.

    Stage 1:First we process incompatible faults that do notallow some signal nets to connect to inputs or outputs on acell, because such faults may need reconfiguration for bothlogic and interconnect. If an input or an output of a cell isblocked, and the cell is not fully used, then we try to move thefunction of the blocked resource to an unused resource in thesame cell. Figure 21 shows an example of a stuck-open CIPdenying access to an input pin on LUT B. If LUT A is notused, we first try to move the function from LUT B to LUTA; keeping the LUT function within the same cell is lesslikely to introduce delays. For this, we reconfigure the celllogic and we rip-up the signal nets going to the inputs andoutputs of LUT A [17]. If such a transformation is not possi-ble within the blocked cell, we rip-up all the signal netsattached to the cell, and move the cell function a new compat-ible unused location. In either case we add the ripped-upsignals toS

    Stage 2:We determine all the signal nets incompatiblewith the remaining faults using the procedure described inSection 7.1. For every such net we perform a partial rip-up,trying to maintain as much as possible from its original rout-

    ing, and add the signal toS. For example, Figure 22a showsa signal net that was originally routed through segmentX1X2that is now open. Here we rip-up only the faulty segment.

    Stage 3:In this stage we attempt to incrementally rerouteeach signal net inS, without changing the routing of othernon-affected signal nets. For example, in Figure 22a, ifX3andX4are not used, the signal may be easily rerouted asshown in Figure 22b. Every successfully rerouted net is

    removed fromS. IfSbecomes empty, we stop.Stage 4:Here we use a more global transformation, com-

    pletely ripping-up all the signal nets within a variable-sizedwindow in the area around the signal nets inS, adding all theripped-up signals toS, and rerouting all signals inS. If we arenot successful, we increment the window size and repeat theprocess. IfSbecomes empty, we stop.

    Stage 5:Here we do a new place-and-route for the entirecircuit (with the faults marked by dummy signals).

    Figure 19. Cell replacement

    L4L3L2L1

    L4L3L2L1

    Figure 20. Chain of cell replacements

    Figure 21. Reconfiguring a cell to bypass a CIP fault

    LUT B

    LUT A

    cell

    LUT B

    LUT A

    cell

    open CIP

    X4

    Figure 22. Bypassing an open segment

    )(X1 X2

    X3 X4

    )(X1 X2

    X3a) b)

  • 8/13/2019 Using Embedded FPGAs for SoC Yield Improvement

    12/12

    Because many of interconnect faults are compatible withthe layout, the last two stages are unlikely to be used in prac-tice. Since each stage is computationally more expensivethan its preceding one, this procedure will be subject torun-time bounds as described at the beginning of Section 7.

    8. Conclusions

    In this paper, we have shown that an embedded FPGA coreis an ideal host to implement infrastructure IP for yieldimprovement in a bus-based SoC. The reconfigurable logicallows the infrastructure IP to be created only when needed,and enables BIST without any area overhead or performancedegradation. Our BIST techniques achieve practically com-plete fault coverage for both logic cells and interconnect, andour adaptive diagnosis techniques can locate almost anycom-bination of faulty cells or interconnect faults, and can alsoidentify faults within defective cells. Such accurate diagnosisis a key factor in achieving effective repair for the embeddedFPGA. Instead of the conventional approach of reconfiguringto bypass every located fault, our fault tolerance approachreuses defective resources whenever possible. This increases

    the effective yield without reconfiguration, and allows toler-ating more faults in the same area. Reuse is based on newtechniques to determine compatibility between faults in a celland its function, or between a group of interconnect faultsand the layout. We may reuse fault-free parts of defectivecells and even shorted segments. For incompatible faults wedetermine fault-bypassing configurations by incrementallychanging the implementation of the circuit, usually in a smallarea around the faults.

    An embedded FPGA core makes significant changes to theSoCs testing paradigm. After the FPGA core is fully testedand repaired if needed, we can use it to provide embeddedtesters in turn for other cores in the SoC. This allowsnon-intrusive BIST logic to be removed from cores, reducing

    area overhead and delay penalties. It also enables coresdesigned to be tested with external vectors to be tested withBIST, and the entire SoC to be tested with a low-cost tester.

    9. References[1] M. Abramovici and C. Stroud, BIST-Based Test and Diagnosis

    of FPGA Logic Blocks,IEEE Trans. on VLSI Systems, Vol. 9,No. 1, pp. 159-172, 2001

    [2] M. Abramovici, C. Stroud, B. Skaggs, and J. Emmert, Improv-ing On-Line BIST-Based Diagnosis for Roving STARs,Proc.

    IEEE Intnl. On-Line Test Workshop, pp. 31-39, 2000[3] M. Abramovici, C. Stroud, S. Wijesuriya, C. Hamilton, and V.

    Verma, Using Roving STARs for On-Line Testing and Diagno-sis of FPGAs in Fault-Tolerant Applications,Proc. Intnl. TestConf., pp. 973-982, 1999

    [4] M. Abramovici, J. Emmert, and C. Stroud, Roving STARs: An

    Integrated Approach to On-Line Testing, Diagnosis, and FaultTolerance for FPGAs in Adaptive Computing Systems,Proc.Third NASA/DoD Workshop on Evolvable Hardware, pp. 73-92,2001

    [5] M. Abramovici, C. Stroud, and J. Nall, BIST-Based Diagnosisof FPGA Interconnect, submitted toIntnl. Test Conf., 2002

    [6] P. H. Bardell, W. H. McAnney, and J.Savir,Built-In Testing forVLSI: Pseudorandom Techniques, John Wiley and Sons, 1987

    [7] N. Basturkmen, S. M. Reddy, and I. Pomeranz, Pseudo Ran-dom Patterns Using Markov Sources for Scan BIST, submittedtoIntnl. Test Conf., 2002

    [8] R. Cliffet al., Programmable logic devices with spare circuitsfor replacement of defects,U.S. Patent 5,485,102, Jan. 1996

    [9] K. Chakraborty and P. Mazumder,Fault-Tolerance and Reli-ability Techniques for High-Density Random-Access Memories,Prentice Hall, 2002

    [10] B. Culbertsonet al., Defect Tolerance on the Teramac Cus-tom Computer,Proc. 5th IEEE Symp. on Field-ProgrammableCustom Computing Machines, pp. 140-147, 1997

    [11] J. Emmert and D. Bhatia, Partial Reconfiguration of FPGAMapped Designs with Applications to Fault Tolerance and YieldEnhancement,Proc. Intnl Conf. on Field-Programmable Logic,

    pp. 141-150, 1997[12] J. Emmert, C. Stroud, B. Skaggs, and M. Abramovici,

    Dynamic Fault Tolerance in FPGAs via Partial Reconfigura-tion,Proc. IEEE Symp. on Field-Programmable Custom Com-

    puting Machines, pp. 165-174, 2000[13] J. Emmert, S. Baumgart, P. Kataria, A. Taylor, C. Stroud, M.

    Abramovici, On-line Fault Tolerance for FPGA Interconnectwith Roving STARs,IEEE Intnl. Symp. on Defect and FaultTolerance in VLSI Systems, pp. 445-454, 2001

    [14] I. Harris and R. Tessier, Interconnect Testing in Clus-ter-Based FPGA Architectures,Proc. AMC/IEEE Design Auto-mation Conf., pp. 49-54, 2000

    [15] I. Harris and R. Tessier, Diagnosis of Interconnect Faults inCluster-Based FPGA Architectures,Proc. IEEE Intnl Conf. onComputer Aided Design, pp. 472-476, 2000.

    [16] IEEE P1500 Standard for Embedded Core Test, http://grou-per.ieee.org/groups/1500/

    [17] V. Lakamraju and R. Tessier, Tolerating Operational Faults inCluster Based FPGAs,Proc. ACM Intnl. Symp. on FPGAs, pp.197-194, Febr. 2000

    [18] Lattice Semiconductor Co.,http://www.latticesemi.com/prod-ucts/fpga

    [19] T. Leighton and P. Shor, Tight Bounds for Minimax GridMatching with Applications to Average Case Analysis of Algo-rithms,Proc. Symp. on Theory of Computing, pp. 91-103, 1986

    [20] E. McCluskey, Verification Testing - A PseudoexhaustiveTest Technique,IEEE Trans. on Computers, Vol. C-33, No. 6,

    pp. 541-546, June, 1984.[21] K. Roy and S. Nag, On Routability for FPGAs Under Faulty

    Conditions,IEEE Trans. on Computers, Vol. 44, pp. 1296-1305,1995

    [22] A. Russ and C. Stroud, Non-Intrusive Built-In Self-Test forFPGA and MCM Applications,Proc. IEEE Automatic TestConf., pp. 480-485, 1995

    [23] Standard Test Access Port and Boundary-Scan Architecture,IEEE Standard P1149.1, 1990

    [24] C. Stroud, S. Konala, P. Chen, and M. Abramovici, Built-InSelf-Test for Programmable Logic Blocks in FPGAs (Finally, AFree Lunch: BIST Without Overhead!),Proc. IEEE VLSI TestSymp., pp. 387-392, 1996

    [25] C. Stroud, M. Lashinsky, J. Nall, J. Emmert, and M. Abramov-ici, On-Line BIST and Diagnosis of FPGA Interconnect UsingRoving STARs,Proc. IEEE Intnl. On-Line Test Workshop, pp.31-39, 2001

    [26] C. Stroud, S. Wijesuriya, C. Hamilton, and M. Abramovici,Built-In Self-Test of FPGA Interconnect,Proc. Intnl. TestConf., pp. 404-411, 1998

    [27] C. Stroud,A Designers Guide to Built-In Self-Test, KluwerAcademic Publishers, 2002.

    [28] N.A. Touba and E.J. McCluskey, Bit-Fixing in Pseudo-Ran-dom Sequences for Scan BIST,IEEE Trans. on CAD, Vol. 20,

    No. 4, pp. 545-555, Apr. 2001.[29] P. Sundararajanand and S. A. Guccione, Run-Time Defect

    Tolerance using JBits,Proc. ACM Intnl. Symp. on FPGAs, pp.193-200, Febr. 2001.

    [30] S. Wang, Low Hardware Overhead Scan Based 3-WeightWeighted Random BIST,Proc. Intnl. Test Conf., pp. 868-877,2001

    [31] Xilinx, Inc., http://www.xilinx.com/products