fpga 3D

By: V. Jagathi, M.Tech. (VLSI Design), Reg. No. 1581320001

Guided by: Mr. E. Poovannan, Asst. Professor, SRM University

The emerging three-dimensional (3D) integration technology is one of the promising solutions for overcoming the barriers in interconnect scaling, thereby offering an opportunity to continue performance improvements using CMOS technology. As the fabrication of 3D integrated circuits has become viable, developing CAD tools and architectural techniques is imperative for the successful adoption of 3D integration technology.

This work first gives a brief introduction to 3D integration technology, then reviews the EDA challenges and solutions that can enable the adoption of 3D ICs, and finally presents design and architectural techniques for 3D ICs, including a survey of approaches to designing future 3D ICs that leverage the low latency, higher bandwidth, and heterogeneous integration capability offered by 3D technology.

• The figure depicts the basic 2-D FPGA architecture, which contains input/output blocks (IOBs), configurable logic blocks (CLBs), and switch matrices (SMs).

• Besides simple input/output functions, the IOBs contain logic resources that provide the JTAG interface.

• Although two switch-matrix topologies may have the same flexibility, their routability may differ.

• The connections in the top die are extended to the backside by through-silicon vias (TSVs), forming the vertical interconnects.

• The figure gives an example of the connections in a face-to-back bonding fashion, where the TSV passes through the silicon substrate and joins a front-side metal layer to the backside.

An automatic test pattern generator (ATPG) is proposed for open, short, and delay faults on 3-D FPGA interconnects. It exploits the regularity of the switch-matrix topology and forms repetitive paths with finite steps and with loop-back.

The experimental results show that 12 test patterns (TPs) suffice to achieve 100% open fault coverage (FC). Detecting all possible neighboring short faults requires more than 40 TPs, a number that increases only slightly with the height of the 3-D FPGA. The TPs achieve a high delay FC (96%) for 3-D FPGAs with configurable logic block arrays ranging from 50 × 50 × 2 to 50 × 50 × 6.
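The open and short coverage figures above rest on two simple detection conditions. The Python sketch below checks them for a given set of patterns under a simplified wire model, which is an assumption and not the source's ATPG: an open on a wire counts as detected only if the wire is driven to both 0 and 1 across the patterns, and a short between two neighboring wires counts as detected only if some pattern drives them to opposite values. The wire indices and neighbor list are hypothetical.

```python
"""Sketch: open and neighboring-short coverage of interconnect test patterns.

Simplified model (assumption, not the source's generator): each pattern
assigns a logic value to every wire; an open is detected if the wire sees
both 0 and 1; a short between neighbors is detected if some pattern drives
them to opposite values.
"""
from typing import List, Sequence, Tuple


def open_coverage(patterns: Sequence[Sequence[int]]) -> float:
    """Fraction of wires that see both a 0 and a 1 across the patterns."""
    n_wires = len(patterns[0])
    covered = sum(1 for w in range(n_wires) if {p[w] for p in patterns} == {0, 1})
    return covered / n_wires


def short_coverage(patterns: Sequence[Sequence[int]],
                   neighbors: List[Tuple[int, int]]) -> float:
    """Fraction of neighboring wire pairs driven to opposite values at least once."""
    covered = sum(1 for a, b in neighbors if any(p[a] != p[b] for p in patterns))
    return covered / len(neighbors)


if __name__ == "__main__":
    neighbors = [(0, 1), (1, 2), (2, 3)]   # four wires in a row (hypothetical)
    patterns = [[0, 1, 0, 1],              # alternating values split each pair
                [1, 0, 1, 0]]              # complement: every wire sees 0 and 1
    print(open_coverage(patterns))              # 1.0
    print(short_coverage(patterns, neighbors))  # 1.0
```

Real 3-D FPGA interconnect tests need many configurations to route every segment, which is why the pattern counts reported above are larger than this toy case suggests.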

3-D integration has been applied to field-programmable gate arrays (FPGAs) to reduce the lengths of critical paths.

We implement testing of 3D-FPGAs using input vector monitoring concurrent BIST schemes, which perform testing during the normal operation of the circuit without requiring the circuit to be taken offline. These schemes are evaluated based on hardware overhead and concurrent test latency (CTL), that is, the time required to complete the test while the circuit operates normally.

We implement the fault-element concept to support multiple fault models and use a fault-element graph (FEG) to consider fault masking and reinforcing effects among multiple faults. Based on the FEGs of all failing patterns, the most likely fault locations and their fault elements are identified iteratively.

Open and short fault models and their TPs.

BIST circuit with loop-back paths

Modified BIST circuit

Input vector monitoring concurrent BIST.

The block diagram of an input vector monitoring concurrent BIST architecture is shown in the figure. The circuit under test (CUT) has n inputs and m outputs and is tested exhaustively; hence, the test set size is N = 2^n. The technique can operate in either normal or test mode, depending on the value of the signal labeled T/N.

During normal mode, the vector that drives the inputs of the CUT (denoted by d[n:1] in the figure) is driven from the normal input vector (A[n:1]). A is also fed to a concurrent BIST unit (CBU), where it is compared with the active test set. If A matches one of the vectors in the active test set, a hit has occurred.
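To make the hit mechanism concrete, here is a small behavioral model of the CBU in Python. It assumes exhaustive testing (the active test set starts as all 2^n vectors) and that a vector leaves the active test set once it has produced a hit; the class and method names are hypothetical and say nothing about the hardware implementation.

```python
"""Behavioral sketch of a concurrent BIST unit (CBU) for input vector monitoring.

Assumptions: exhaustive testing, so the active test set starts with all 2**n
vectors; a vector produces a hit the first time it appears at the CUT inputs
and is then removed from the active test set.
"""
from typing import Set


class ConcurrentBistUnit:
    def __init__(self, n: int) -> None:
        self.n = n
        # Active test set: vectors not yet observed at the CUT inputs.
        self.active: Set[int] = set(range(2 ** n))

    def observe(self, a: int) -> bool:
        """Compare the normal input vector A with the active test set.

        Returns True on a hit, i.e. A was still in the active test set.
        """
        if a in self.active:
            self.active.remove(a)
            return True
        return False

    def done(self) -> bool:
        """True once every vector of the exhaustive test set has been hit."""
        return not self.active


if __name__ == "__main__":
    cbu = ConcurrentBistUnit(n=2)
    for a in [3, 3, 0, 1, 2]:
        print(a, cbu.observe(a))   # the repeated 3 is not a second hit
    print(cbu.done())
```

Storing the whole active test set is only practical for small n, which is the motivation for the window-based scheme described next.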

Proposed architecture.

The proposed scheme is based on the idea of monitoring a window of W vectors, with W = 2^w, where w is an integer with w < n. At every moment, the test vectors belonging to the window are monitored, and if a vector performs a hit, the response verifier (RV) is enabled.

The bits of the input vector are separated into two distinct sets comprising w and k bits, respectively, such that w + k = n. The k high-order bits indicate whether the input vector belongs to the window under consideration, and the remaining w bits give the relative location of the incoming vector within the current window.
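The bit split can be expressed in a few lines. This Python sketch assumes the window under consideration is identified by a k-bit index that is compared against the k high-order bits of the incoming vector; the function names are illustrative only.

```python
"""Sketch: locate an n-bit input vector relative to a window of W = 2**w vectors.

The k = n - w high-order bits select the window; the w low-order bits give
the vector's position inside the window. Function names are hypothetical.
"""
from typing import Tuple


def split_vector(vector: int, n: int, w: int) -> Tuple[int, int]:
    """Return (high, low): the k high-order bits and the w low-order bits."""
    if not 0 < w < n:
        raise ValueError("w must satisfy 0 < w < n")
    if not 0 <= vector < 2 ** n:
        raise ValueError("vector does not fit in n bits")
    high = vector >> w              # k = n - w high-order bits
    low = vector & ((1 << w) - 1)   # w low-order bits
    return high, low


def in_window(vector: int, current_window: int, n: int, w: int) -> bool:
    """True if the vector belongs to the window currently being monitored."""
    high, _ = split_vector(vector, n, w)
    return high == current_window


if __name__ == "__main__":
    n, w = 6, 3                    # W = 8 vectors per window, k = 3
    v = 0b101_110                  # window 5 (101), offset 6 (110)
    print(split_vector(v, n, w))   # (5, 6)
    print(in_window(v, 5, n, w))   # True
```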

Modified decoder design used in the proposed architecture.

The design of the m_dec module for w = 3 is shown in the figure; it operates as follows. When the test generator enable signal (tge) is enabled, all outputs of the decoder are equal to one. When the comparator signal (cmp) is disabled (and tge is not enabled), all outputs are disabled.

When tge is disabled and cmp is enabled, the module operates as a normal decoder. At the beginning of operation, the module is reset through the external reset signal. When reset is issued, the tge signal is enabled and all the outputs of the decoder are enabled. Hence, DA1, DA2, ..., DAW are all one; furthermore, the CD signal is enabled.
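The three operating cases of m_dec can be summarized as a small truth-table function. This is a behavioral sketch only; the mapping of outputs DA1..DAW to addresses 0..W-1 and the function signature are assumptions, and the CD signal is not modeled.

```python
"""Behavioral sketch of the m_dec decoder (W = 2**w outputs).

Priority follows the description: tge forces all outputs to one; otherwise
a disabled cmp forces all outputs to zero; otherwise the w-bit address is
decoded one-hot. Output order DA1..DAW maps to address 0..W-1 (assumption).
"""
from typing import List


def m_dec(address: int, w: int, tge: bool, cmp: bool) -> List[int]:
    width = 2 ** w
    if tge:                       # test generator enable: all outputs high
        return [1] * width
    if not cmp:                   # comparator disabled: all outputs low
        return [0] * width
    outputs = [0] * width         # normal one-hot decoding
    outputs[address] = 1
    return outputs


if __name__ == "__main__":
    print(m_dec(5, 3, tge=True, cmp=False))   # [1, 1, 1, 1, 1, 1, 1, 1]
    print(m_dec(5, 3, tge=False, cmp=False))  # [0, 0, 0, 0, 0, 0, 0, 0]
    print(m_dec(5, 3, tge=False, cmp=True))   # [0, 0, 0, 0, 0, 1, 0, 0]
```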

Design of the logic module.

The module labeled logic is shown in the figure. It comprises W cells (operating in a fashion similar to an SRAM cell), a sense amplifier, two D flip-flops, and a w-stage counter (where w = log2 W).

The overflow signal of the counter drives the tge signal through a unit flip-flop delay. The signals clk' (inverted clock) and clk are enabled during the low and high phases of the clock, respectively. We assume a clock that is active during the second half of the period, as shown in the figure. The operation of the logic module is described for the following cases: 1) reset of the module; 2) hit of a vector; 3) a vector that belongs to the current window reaches the CUT inputs, but not for the first time; and 4) tge operation.
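The interaction of the W cells, the w-stage counter, and the delayed tge signal can be approximated in software. The cycle-level sketch below reports the cycle in which the last window completes, a rough stand-in for concurrent test latency. It is a simplified model under the assumptions listed in its docstring, not the gate-level logic module; in particular, what happens to the vector arriving during the tge cycle is an assumption.

```python
"""Cycle-level sketch of window-based input vector monitoring.

Hypothetical simplifications (not taken from the source design):
- one input vector reaches the CUT per clock cycle;
- a vector in the current window whose cell is still 0 is a hit: its cell
  is set and the w-stage counter increments;
- when the counter overflows (all W cells set), tge is raised one cycle
  later (the unit flip-flop delay); in that cycle every cell is cleared,
  the window index advances, and the incoming vector is not monitored.
"""
import random
from typing import Iterable, Optional


def simulate_ctl(vectors: Iterable[int], n: int, w: int) -> Optional[int]:
    """Return the cycle in which the last window completes, or None."""
    W = 1 << w
    num_windows = 1 << (n - w)
    cells = [False] * W        # one cell per vector of the current window
    counter = 0                # w-stage hit counter (wraps at W)
    window = 0                 # k-bit index of the monitored window
    tge = False                # overflow delayed by one flip-flop
    for cycle, v in enumerate(vectors, start=1):
        if tge:
            # tge drives all decoder outputs high, clearing every cell
            cells = [False] * W
            window += 1
            tge = False
            continue
        high, low = v >> w, v & (W - 1)
        if high == window and not cells[low]:
            cells[low] = True                  # hit
            counter = (counter + 1) % W
            if counter == 0:                   # overflow: window fully hit
                if window == num_windows - 1:
                    return cycle               # all 2**n vectors covered
                tge = True                     # takes effect next cycle
    return None                                # stream ended first


if __name__ == "__main__":
    random.seed(0)
    n, w = 8, 3
    stream = (random.randrange(2 ** n) for _ in range(200_000))
    print("cycles until all windows complete:", simulate_ctl(stream, n, w))
```

Varying w in this model shows the trade-off the scheme targets: a larger window catches more of the random normal-mode vectors as hits, at the cost of more cells.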

Concept of fault element to describe a fault effect at a location under a pattern.

A fault element describes a fault location and its faulty value under a pattern. It can support the diagnosis of a real silicon defect that behaves as DM under different failing patterns. Using fault elements with the layout information, we are able to not only further narrow down the candidate locations but also identify the behavior of real silicon defects.

To handle the issue of fault masking and reinforcing effects among multiple faults, fault-element graphs (FEGs) are constructed to describe the combined effects of multiple fault elements.
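A fault element can be represented as a small immutable record, and an FEG as a set of scored vertices with directed edges. The Python sketch below is one possible representation (an assumption; the source does not prescribe data structures). Edges point from a fault element to the fault elements that contribute to it, which is consistent with the later description of the vertex m/1/p2 pointing to d/1/p2.

```python
"""Sketch of fault elements and a fault-element graph (FEG).

A fault element is written location/value/pattern, e.g. b/1/p. The value
'x' marks an unknown faulty value, as in c/x/p1. The class names and edge
direction (toward contributing fault elements) are assumptions.
"""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class FaultElement:
    location: str
    value: str      # '0', '1', or 'x'
    pattern: str

    @classmethod
    def parse(cls, text: str) -> "FaultElement":
        location, value, pattern = text.split("/")
        if value not in ("0", "1", "x"):
            raise ValueError(f"bad faulty value in {text!r}")
        return cls(location, value, pattern)

    def __str__(self) -> str:
        return f"{self.location}/{self.value}/{self.pattern}"


@dataclass
class FEG:
    """One FEG per failing pattern: scored vertices plus directed edges."""
    scores: Dict[FaultElement, float] = field(default_factory=dict)
    edges: Dict[FaultElement, List[FaultElement]] = field(default_factory=dict)

    def add(self, fe: FaultElement, score: float) -> None:
        self.scores[fe] = self.scores.get(fe, 0.0) + score
        self.edges.setdefault(fe, [])

    def connect(self, parent: FaultElement, child: FaultElement) -> None:
        self.edges.setdefault(parent, []).append(child)


if __name__ == "__main__":
    # The reinforcing pair from the OR-gate example: y/0/p2 <- s/0/p2, q/0/p2
    g = FEG()
    y = FaultElement.parse("y/0/p2")
    g.add(y, 0.5)
    for name in ("s/0/p2", "q/0/p2"):
        child = FaultElement.parse(name)
        g.add(child, 0.25)
        g.connect(y, child)
    print([str(c) for c in g.edges[y]])   # ['s/0/p2', 'q/0/p2']
```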

Unlike approaches that consider one fault at a time to identify multiple faults, our approach identifies multiple faults by using FEGs to keep track of multiple-fault effects. Based on the FEGs of all failing patterns, the candidate locations and their fault elements are identified iteratively, with the FEGs pruned at each iteration.

The types of fault effects considered are a single fault and multiple faults with masking and reinforcing effects.

Consider the circuit in Fig. (a): under the pattern p, if only the fault element b/1/p exists, there is only one failing output, h. As illustrated in Fig. (b), if two other fault elements, a/1/p and c/0/p, also exist, b/1/p is masked by c/0/p and reinforced by a/1/p. Instead of h, the failing output becomes g.

Fig. presents a circuit with three fault locations q, b, and c. Their fault elements are given at the top of the figure. The three faults produce three failing patterns.

In the FEGs, each vertex represents a fault element, and each directed line represents the relation between corresponding fault elements. The score of a fault element is written below the fault-element label in the vertex.

In Fig. (a), w/1/p1 is picked out from TRACE_SET = {t/1/p1, u/0/p1, v/1/p1, w/1/p1} because it has the highest logic level. For TRACE_SET = {w/0/p2, y/0/p2} in Fig. (b), since w and y have the same logic level, either of them can be picked out.

Since s and q both take the controlling value 1 of the OR gate under p2, s/0/p2 and q/0/p2 must reinforce each other to produce the fault effect of y/0/p2. Because the score of y/0/p2 is 0.5, s/0/p2 and q/0/p2 are each scored 0.5/2 = 0.25.

Algorithm 1 Constructing an FEG for a Failing Pattern
Initialize an empty set TRACE_SET;
NFO = number of failing outputs of the failing pattern;
FOREACH (failing output FO) {
    Set the score of the fault element at FO to 1/NFO;
    Add the fault element at FO to TRACE_SET;
}
WHILE (TRACE_SET != Ø) {
    From TRACE_SET, pick out a fault element FEPICK that has the highest logic level;
    IF (FEPICK is at a gate G's output OG) {   // Situation 1
        FOREACH (G's input IG) {
            IF (the fault element at IG contributes to producing FEPICK) {
                Construct and score the fault element at IG in the FEG;
                Add the fault element at IG to TRACE_SET;
            }
        }
    }
    ELSE IF (FEPICK is at a fan-out branch of stem T) {   // Situation 2
        From TRACE_SET, pick out all the fault elements at T's fan-out branches (FEBRANCHES);
        Compare the fault effects of the fault element at T (FET) with the union of the fault effects of FEBRANCHES to construct and score FET in the FEG;
        Supplement fault elements to balance the fault effects of FET and FEBRANCHES in the FEG;
        Add FET and the supplemented fault elements to TRACE_SET;
    }
    ELSE {   // FEPICK is at a circuit input
        Continue;
    }
}
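The gate-output case of Algorithm 1 (Situation 1) can be sketched as a level-ordered backward trace. This is a simplified model: which input fault elements contribute to a picked element is supplied directly as a hypothetical `contributors` table (in the actual method it comes from analyzing the failing pattern), scores are split equally among contributors as in the y/0/p2 example, and Situation 2 (fan-out stems and supplemented fault elements) is not modeled.

```python
"""Simplified sketch of Algorithm 1, handling Situation 1 (gate outputs) only.

Hypothetical inputs (not defined in the source):
- level: logic level of each net; contributors always sit at lower levels;
- contributors: for a fault element at a gate output, the fault elements at
  that gate's inputs which contribute to it.
A picked element's score is split equally among its contributors, matching
the example y/0/p2 -> s/0/p2, q/0/p2 (0.5 / 2 = 0.25).
"""
from typing import Dict, List, Set, Tuple

FaultElem = Tuple[str, str]   # (net, faulty value) under one failing pattern


def build_feg(failing_outputs: List[FaultElem],
              level: Dict[str, int],
              contributors: Dict[FaultElem, List[FaultElem]]) -> Dict[FaultElem, float]:
    scores: Dict[FaultElem, float] = {}
    trace: Set[FaultElem] = set()
    nfo = len(failing_outputs)
    for fe in failing_outputs:
        scores[fe] = 1.0 / nfo            # each failing output starts at 1/NFO
        trace.add(fe)
    while trace:
        # Pick out the element with the highest logic level. Every element that
        # points to it has a higher level, so its score is already final here.
        pick = max(trace, key=lambda fe: level[fe[0]])
        trace.remove(pick)
        inputs = contributors.get(pick, [])   # empty at circuit inputs
        for fe in inputs:
            scores[fe] = scores.get(fe, 0.0) + scores[pick] / len(inputs)
            trace.add(fe)
    return scores


if __name__ == "__main__":
    levels = {"y": 2, "z": 2, "s": 1, "q": 1}
    contrib = {("y", "0"): [("s", "0"), ("q", "0")]}
    print(build_feg([("y", "0"), ("z", "1")], levels, contrib))
```

When two picked elements share a contributor, the contributor accumulates both shares, which mirrors how a vertex with a single incoming path inherits its parent's full score.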

Solution Identification

The score of the fault location c is the score of c/0/p2 plus that of c/0/p3 (0.875 + 0.875 = 1.75). Since c does not appear in the FEG of p1, its fault element under p1 is set to c/x/p1. Among all the fault locations, c and h have the highest score and are selected as the candidate locations in the first iteration.

The score of y/0/p2 is updated to 1. Since c/0/p2 makes z equal to 0, the vertex of z/0/p2 is also pruned. Then, because the vertex of m/1/p2 points only to the vertex of d/1/p2, the score of d/1/p2 is updated to 1. Similarly, the pruned FEGs for the candidate location h are shown in ➍, ➎, and ➏.

In the second iteration, based on the pruned FEGs ➊, ➋, and ➌ and the updated scores of the fault elements, the candidate locations y, q, m, and d are selected. Since they have the same fault effects, they are pruned from the FEGs at the same time. After pruning, the FEGs of p2 and p3 become NULL, leaving only one pruned FEG of p1 (➐).

We rank the candidate locations in all the solutions to obtain the final diagnosis results. The candidate locations are ranked based on the two following metrics. Metric 1: The number of solutions that contain a candidate location. The candidate location appearing in more solutions gets a higher rank.

Metric 2: The order of selection for a candidate location. The candidate locations selected earlier have more contributions to explain the failing patterns, and are ranked higher. From the probability perspective, we give Metric 1 a higher priority than Metric 2 during ranking.
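The two ranking metrics amount to a sort on a composite key. In this Python sketch each solution is represented as a mapping from candidate location to the iteration in which it was selected; that representation, the use of the earliest iteration across solutions for Metric 2, and the alphabetical tie-break are assumptions, and the example solutions are hypothetical.

```python
"""Sketch: rank candidate locations across solutions.

Metric 1 (number of solutions containing a location) has priority;
Metric 2 (earlier selection, here the earliest iteration in any solution)
breaks ties. Each solution maps location -> iteration of selection.
"""
from typing import Dict, List, Sequence


def rank_candidates(solutions: Sequence[Dict[str, int]]) -> List[str]:
    count: Dict[str, int] = {}
    best_iter: Dict[str, int] = {}
    for solution in solutions:
        for loc, iteration in solution.items():
            count[loc] = count.get(loc, 0) + 1
            best_iter[loc] = min(best_iter.get(loc, iteration), iteration)
    # more solutions first (Metric 1), then earlier selection (Metric 2)
    return sorted(count, key=lambda loc: (-count[loc], best_iter[loc], loc))


if __name__ == "__main__":
    solutions = [{"c": 1, "y": 2, "q": 2}, {"h": 1, "y": 2}]   # hypothetical
    print(rank_candidates(solutions))   # ['y', 'c', 'h', 'q']
```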

Algorithm 2 Select Candidate Locations in an Iteration
Initialize every fault location's score to 0;
FOREACH (fault location FL) {
    FOREACH (failing pattern FP) {
        IF (the fault element at FL under FP exists in the FEG)
            Score of FL += score of the fault element at FL under FP;
        ELSE
            Set the fault element at FL under FP as FL/x/FP;
    }
}
Candidate Locations = fault locations with the highest score;
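Algorithm 2 translates almost line for line into Python. Here each failing pattern's FEG is reduced to a mapping from location to (faulty value, score), which is an assumption about representation; the scores for c come from the worked example above, while the entry for w is hypothetical. math.isclose guards the highest-score comparison against floating-point rounding.

```python
"""Sketch of Algorithm 2: select candidate locations in one iteration.

fegs maps each failing pattern to its (possibly pruned) FEG, represented
here as {location: (faulty value, score)}. A location missing from a
pattern's FEG gets the fault element FL/x/FP.
"""
import math
from typing import Dict, List, Sequence, Tuple

FEG = Dict[str, Tuple[str, float]]


def select_candidates(locations: Sequence[str], fegs: Dict[str, FEG]):
    scores = {fl: 0.0 for fl in locations}
    elements: Dict[Tuple[str, str], str] = {}
    for fl in locations:
        for fp, feg in fegs.items():
            if fl in feg:
                value, s = feg[fl]
                scores[fl] += s
                elements[(fl, fp)] = f"{fl}/{value}/{fp}"
            else:
                elements[(fl, fp)] = f"{fl}/x/{fp}"
    best = max(scores.values())
    candidates: List[str] = [fl for fl in locations if math.isclose(scores[fl], best)]
    return candidates, scores, elements


if __name__ == "__main__":
    fegs = {
        "p1": {"w": ("1", 1.0)},                        # hypothetical entry
        "p2": {"c": ("0", 0.875), "y": ("0", 0.5)},
        "p3": {"c": ("0", 0.875)},
    }
    cands, scores, elems = select_candidates(["c", "w", "y"], fegs)
    print(cands, scores["c"], elems[("c", "p1")])   # ['c'] 1.75 c/x/p1
```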

Physical layout of the XC2VP30 FPGA with the design for C432 mapped into it. The two PowerPC processor blocks are shown together with the used logic blocks (red) and unused logic blocks (white).

Fault emulation approaches in hardware have included vector processors, multiprocessors, graphics processing units, supercomputers, and reconfigurable computing platforms. These approaches have their own merits, but most require significant design time and effort as well as complex and very expensive specialized hardware.

This work performs fault emulation on an off-the-shelf commercial reconfigurable device, the field-programmable gate array (FPGA). An FPGA contains a collection of configurable logic blocks and programmable interconnects that the designer can configure to fit the design needs.

• The Xilinx Virtex-II Pro Development System (XUPV2P) serves as the platform for this work. The low-cost but powerful board houses a Xilinx XC2VP30 FPGA with 30,816 logic cells, 136 18 × 18-bit multipliers, 2,448 Kb of block RAM, and two PowerPC processor cores.

• The embedded PowerPC 405 cores are 32-bit processors with a five-stage pipeline and 16 KB instruction and data caches, and can run at clock rates of up to 400 MHz.

• Communication between the processor and the custom logic cores built from the available CLBs takes place via the Processor Local Bus (PLB). The figure shows the physical layout of the Virtex-II Pro FPGA used in this design.

C17 circuit from the ISCAS benchmark suite.

Consider the schematic netlist shown in the figure for circuit C17 from the ISCAS benchmark suite. From the circuit netlist, a directed graph table is created with the name of each circuit node, its logic type, and its parent nodes, as shown in the table. After the table has been created, the output wires from other gates that lead into each node are all stored in a vector.

This vector is scanned, and if a wire appears more than once, it is stored in another vector along with the number of times it was detected. If the parent column contains the same node more than once, that node is detected as a fanout stem. Any node with a NULL parent is a primary input. The checkpoints consist of all the fanout branches and primary inputs.
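The table-driven checkpoint search can be sketched in Python. The table below encodes the standard C17 netlist (the same gates as the c17test listing); the branch naming 'N3->N10' and the function names are illustrative assumptions, not the source tool's output format.

```python
"""Sketch: find the checkpoints of ISCAS C17 from a directed-graph table.

Each entry maps a node to (logic type, parent nodes); a parent list of None
marks a primary input. A node that appears in the parent column more than
once is a fanout stem, and each of its uses is a fanout branch.
"""
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Table = Dict[str, Tuple[str, Optional[List[str]]]]

C17: Table = {
    "N1": ("INPUT", None), "N2": ("INPUT", None), "N3": ("INPUT", None),
    "N6": ("INPUT", None), "N7": ("INPUT", None),
    "N10": ("NAND", ["N1", "N3"]),
    "N11": ("NAND", ["N3", "N6"]),
    "N16": ("NAND", ["N2", "N11"]),
    "N19": ("NAND", ["N11", "N7"]),
    "N22": ("NAND", ["N10", "N16"]),
    "N23": ("NAND", ["N16", "N19"]),
}


def fanout_stems(table: Table) -> Set[str]:
    # the "vector" of all parent wires; count how often each one appears
    uses = Counter(p for _, parents in table.values() if parents for p in parents)
    return {node for node, count in uses.items() if count > 1}


def checkpoints(table: Table) -> List[str]:
    stems = fanout_stems(table)
    primary_inputs = [node for node, (_, parents) in table.items() if parents is None]
    branches = [f"{p}->{node}"
                for node, (_, parents) in table.items() if parents
                for p in parents if p in stems]
    return primary_inputs + branches


if __name__ == "__main__":
    for i, cp in enumerate(checkpoints(C17)):
        print(f"SS{i}: {cp}")
```

With the table in this order, the 11 checkpoints come out in the same order as select lines SS0 to SS10 in the c17test listing: the five primary inputs first, then the branches of N3, N11, and N16.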

Directed graph table for c17


At every checkpoint, a multiplexer is inserted with one input connected to the original gate connection and the second input connected to the stuck-at-1 or stuck-at-0 value of the fault being simulated. A mux-select signal chooses between the faulty node data and the correct data, as shown in the figure.

Once the checkpoints have been identified, a new file is generated with the fault-inserted netlist. The module name and the input and output lines are copied directly from the original netlist into the new file. The port and wire declarations are also copied over, with the extra multiplexer control signals added. The multiplexers are then added for the nodes identified as checkpoints (the primary inputs and fanout branches).

Fault-free C17 and three instantiations of the faulty circuit. The faults are activated based on the SS10-SS0 vector.

The figure shows an example of three instantiations of the C17 circuit with different multiplexer select signals. In the circuit labeled C17-1, the select lines SS10-SS0 = 00000000001, which means that only the fault associated with SS0 is added to the netlist. Similarly, C17-2 and C17-3 have the faults associated with select lines SS1 and SS2 added, respectively.

As shown in the figure, the outputs of each faulty circuit are compared with those of the fault-free circuit using a series of XOR gates. This creates an error signal indicating that the applied input pattern has detected a fault in that particular module. During the first iteration with these three instantiations, the faults associated with SS0, SS1, and SS2 are tested. The next iteration tests the next set of three faults (i.e., SS3, SS4, and SS5), and so on until all faults have been excited and tested.
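The emulation flow (one fault per instantiation, XOR comparison against the fault-free circuit, three instantiations per iteration) can be mirrored in software. This Python model is a sketch under stated assumptions: select line SS<i> controls checkpoint i in the order of the c17test netlist, an active multiplexer forces its checkpoint to the stuck-at value, and all 32 input vectors are applied exhaustively, whereas the hardware flow uses LFSR-generated random vectors.

```python
"""Software model of mux-based fault emulation for ISCAS C17.

Assumptions: SS<i> controls checkpoint i in c17test order (N1, N2, N3, N6,
N7, then branches N3->N10, N3->N11, N11->N16, N11->N19, N16->N22,
N16->N23); an active multiplexer forces its checkpoint to the stuck value.
"""
from itertools import product
from typing import Optional, Set, Tuple

Fault = Tuple[int, int]          # (checkpoint index, stuck-at value)


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def c17(inputs: Tuple[int, ...], fault: Optional[Fault] = None) -> Tuple[int, int]:
    """Evaluate C17 (outputs N22, N23) with an optional injected fault."""
    def cp(i: int, value: int) -> int:
        # checkpoint multiplexer: original value, or the fault value if selected
        return fault[1] if fault is not None and fault[0] == i else value

    n1, n2, n3, n6, n7 = (cp(i, v) for i, v in enumerate(inputs))
    n10 = nand(n1, cp(5, n3))
    n11 = nand(cp(6, n3), n6)
    n16 = nand(n2, cp(7, n11))
    n19 = nand(cp(8, n11), n7)
    n22 = nand(n10, cp(9, n16))
    n23 = nand(cp(10, n16), n19)
    return n22, n23


def error_signal(vector: Tuple[int, ...], fault: Fault) -> bool:
    """XOR fault-free and faulty outputs; any 1 means the fault is detected."""
    good, bad = c17(vector), c17(vector, fault)
    return any(g ^ b for g, b in zip(good, bad))


def emulate(stuck: int, instances: int = 3) -> Set[Fault]:
    """Test faults in batches of `instances`: SS0-SS2, then SS3-SS5, ..."""
    faults = [(i, stuck) for i in range(11)]
    detected: Set[Fault] = set()
    for start in range(0, len(faults), instances):
        batch = faults[start:start + instances]      # one iteration
        for vector in product((0, 1), repeat=5):     # exhaustive for 5 inputs
            for fault in batch:
                if error_signal(vector, fault):
                    detected.add(fault)
    return detected


if __name__ == "__main__":
    for stuck in (0, 1):
        print(f"stuck-at-{stuck}: {len(emulate(stuck))} of 11 detected")
```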

Characteristics of the benchmark circuits.

Input vector monitoring techniques: comparison (n = 16, m = 16, and 100-MHz clock).

Average FPGA hardware resources (Slices and Look-up tables) used for test pattern generation using random vectors for benchmark circuits as a function of the number of instantiations. The hardware usage is normalized to the single instantiation case.

Total Number Of Fault Elements In All FEGs.

Number Of Identified Solutions

The number of vectors tested vs. faults found for 32 instantiations of C1355 using 10 different seeds for the LFSR. The red solid line is the mean of the 10 iterations.
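The random vectors in this experiment come from an LFSR started from different seeds. The Python sketch below shows a 16-bit Fibonacci LFSR with taps 16, 14, 13, and 11 (polynomial x^16 + x^14 + x^13 + x^11 + 1), a well-known maximal-length choice; the width, taps, and helper names are assumptions, since the source states only that 10 different seeds were used.

```python
"""Sketch: Fibonacci LFSR producing pseudo-random test vectors.

Width (16 bits) and taps (16, 14, 13, 11) are assumptions, not taken from
the source; this tap set gives the maximal period of 2**16 - 1.
"""
from typing import Iterator, List


def lfsr16(seed: int) -> Iterator[int]:
    """Yield successive 16-bit LFSR states, starting with the seed."""
    if not 0 < seed < 1 << 16:
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    while True:
        yield state
        # feedback bit from taps 16, 14, 13, 11 (bit positions 0, 2, 3, 5)
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)


def vectors(seed: int, count: int, width: int) -> List[int]:
    """First `count` states, truncated to `width` bits (width <= 16)."""
    gen = lfsr16(seed)
    return [next(gen) & ((1 << width) - 1) for _ in range(count)]


if __name__ == "__main__":
    for seed in (0xACE1, 0x1234):
        print(hex(seed), vectors(seed, 5, 16))
```

Changing the seed only shifts where the same maximal-length sequence starts, which is why averaging over several seeds smooths the vectors-versus-faults curve.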

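module bist(input clk, ce, input start, mode_select,
            output reg d1, d2, q1, q2);
  // ce low clears all four registers asynchronously; otherwise q1/q2 take
  // d1/d2, and d1/d2 take start/mode_select XORed with q1
  always @(posedge clk or negedge ce)
  begin
    if (~ce)
    begin
      q1 <= 0; q2 <= 0; d1 <= 0; d2 <= 0;
    end
    else
    begin
      q1 <= d1;
      q2 <= d2;
      d1 <= start ^ q1;
      d2 <= mode_select ^ q1;
    end
  end
endmodule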

module Diagnosis(d, clk, in_reset, reset_div, enable, ref1, clk0, reset_div1,
                 n0, n1, n2, n3, reset_pv, a, k, rst, fdco,
                 q0, q1, q2, q3, w1, clkn0, clkn1, clkn2, clkn3, w2,
                 f1, f2, m, w, in, din_0, wfinal, out,
                 lower, greater, equal, p0, p1, p2, p3);
  input d;
  inout reset_div;
  input clk;
  input in_reset, enable, ref1, clk0;
  input reset_pv;
  input n0, n1, n2, n3;
  input w1, w2;
  input f1, f2;
  input m;
  input a;
  input k, rst;
  input in, w;
  input din_0;
  input clkn0, clkn1, clkn2, clkn3;
  output equal;
  output greater;
  output lower;
  output reg out;
  output p0, p1, p2, p3;
  reg tmp0, tmp1, tmp2, tmp3;
  reg k3;
  reg mux_out;
  // parameter a = 1'b0;
  reg clear;
  output reg fdco;
  output reg wfinal;
  reg n;
  reg delw, delf;
  reg kf, kw, kfkw;
  reg reset;
  output reg q0;
  output reg q1;
  output reg q2;
  output reg q3;
  output reg reset_div1;
  reg upd;
  reg reset_alg;
  reg a1;

  // q1/q2 sample d on each rising clock edge and are cleared when both are
  // set or reset_div is high; a single always block drives q1/q2
  always @(posedge clk)
  begin
    if (clear)
    begin
      q1 <= 1'b0;
      q2 <= 1'b0;
    end
    else
    begin
      q1 <= d;
      q2 <= d;
    end
  end

  always @(*)
  begin
    a1 = q1 & q2;
    clear = a1 | reset_div;
  end

  // ... (remainder of module Diagnosis not included in this listing)

module c17test(N1, N2, N3, N6, N7, N22, N23,
               SS0, SS1, SS2, SS3, SS4, SS5, SS6, SS7, SS8, SS9, SS10, errsig);
  input N1, N2, N3, N6, N7;
  input SS0, SS1, SS2, SS3, SS4, SS5, SS6, SS7, SS8, SS9, SS10;
  input errsig;
  output N22, N23;
  wire N10, N11, N16, N19;
  wire N1_0, N2_0, N3_0, N6_0, N7_0;
  wire N3_1, N3_2, N11_1, N11_2, N16_1, N16_2;

  // mux is a separate module not shown in this listing
  // Checkpoint multiplexers on the primary inputs (SS0-SS4)
  mux Mux0(N1_0, SS0, errsig, N1);
  mux Mux1(N2_0, SS1, errsig, N2);
  mux Mux2(N3_0, SS2, errsig, N3);
  mux Mux3(N6_0, SS3, errsig, N6);
  mux Mux4(N7_0, SS4, errsig, N7);

  // Checkpoint multiplexers on the fanout branches (SS5-SS10) and the gates
  mux Mux5(N3_2, SS5, errsig, N3_0);
  nand NAND2_1 (N10, N1_0, N3_2);
  mux Mux6(N3_1, SS6, errsig, N3_0);
  nand NAND2_2 (N11, N3_1, N6_0);
  mux Mux7(N11_2, SS7, errsig, N11);
  nand NAND2_3 (N16, N2_0, N11_2);
  mux Mux8(N11_1, SS8, errsig, N11);
  nand NAND2_4 (N19, N11_1, N7_0);
  mux Mux9(N16_2, SS9, errsig, N16);
  nand NAND2_5 (N22, N10, N16_2);
  mux Mux10(N16_1, SS10, errsig, N16);
  nand NAND2_6 (N23, N16_1, N19);
endmodule

For Testing of 3D FPGAs

For Fault Diagnosis of 3D-FPGAs

• Input vector monitoring concurrent BIST schemes perform testing during the normal operation of the circuit without requiring the circuit to be taken offline to perform the test.

• Implemented a fault diagnosis method for failures caused by multiple fault locations at a time, using fault elements that describe a fault location and its faulty value under a pattern.

• Experimental results demonstrated the effectiveness of the proposed method in diagnosing multiple faults under the failing patterns of the circuit.

