

A Survey of Repair Analysis Algorithms for Memories

KEEWON CHO, Yonsei University
WOOHEON KANG, SK Hynix Inc.
HYUNGJUN CHO, Samsung Inc.
CHANGWOOK LEE, SK Hynix Inc.
SUNGHO KANG, Yonsei University

Current rapid advancements in deep submicron technologies have enabled the implementation of very large memory devices and embedded memories. However, the memory growth increases the number of defects, reducing the yield and reliability of such devices. Faulty cells are commonly repaired by using redundant cells, which are embedded in memory arrays by adding spare rows and columns. The repair process requires an efficient redundancy analysis (RA) algorithm. Spare architectures for the repair of faulty memory include one-dimensional (1D) spare architectures, two-dimensional (2D) spare architectures, and configurable spare architectures. Of these types, 2D spare architectures, which prepare extra rows and columns for repair, are popular because of their better repair efficiency than 1D spare architectures and easier implementation than configurable spare architectures. However, because the complexity of the RA is NP-complete, the RA algorithm should consider various factors in order to determine a repair solution. The performance depends on three factors: analysis time, repair rate, and area overhead. In this article, we survey RA algorithms for memory devices as well as built-in repair algorithms for improving these performance factors. Built-in redundancy analysis techniques for emergent three-dimensional integrated circuits are also discussed. Based on this analysis, we then discuss future research challenges for faulty-memory repair studies.

Categories and Subject Descriptors: B.6.2 [Reliability and Testing]: Redundant Design; B.8.1 [Performance and Reliability]: Reliability, Testing, and Fault Tolerance; B.7.1 [Integrated Circuits]: Memory Technologies; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search; A.1 [General Literature]: Introductory and Survey

General Terms: Memory, Redundancy Analysis Algorithms, Spare Architecture, Yield

Additional Key Words and Phrases: Built-in redundancy analysis (BIRA), built-in self-repair (BISR), built-in self-test (BIST), normalized repair rate, repair rate

ACM Reference Format:
Keewon Cho, Wooheon Kang, Hyungjun Cho, Changwook Lee, and Sungho Kang. 2016. A survey of repair analysis algorithms for memories. ACM Comput. Surv. 49, 3, Article 47 (October 2016), 41 pages.
DOI: http://dx.doi.org/10.1145/2971481

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2015R1A2A1A13001751).
Authors’ addresses: K. Cho, Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea; email: [email protected]; W. Kang, Memory CAE Team, SK Hynix, Inc., Icheon-si, Gyeonggi-do, Korea; email: [email protected]; H. Cho, AP Development Division, Samsung Inc., Suwon-si, Gyeonggi-do, Korea; email: [email protected]; C. Lee, Probe Test Engineering Team, SK Hynix, Inc., Icheon-si, Gyeonggi-do, Korea; email: [email protected]; S. Kang, Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea; email: [email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2016 ACM 0360-0300/2016/10-ART47 $15.00
DOI: http://dx.doi.org/10.1145/2971481

ACM Computing Surveys, Vol. 49, No. 3, Article 47, Publication date: October 2016.


1. INTRODUCTION

Rapid advancements in deep submicron technologies in semiconductor manufacturing have not only increased the scale of integration of memory devices and embedded memories but also the incidence of memory defects. Such defects reduce device yields [Zorian and Shoukourian 2003]. By 2017, embedded memories are projected to occupy approximately 68% of the total chip area [Mohammad 2015]. These expanded embedded memories are expected to increasingly affect the yield of systems-on-a-chip (SOC). In addition, because memory defects can cause critical errors in systems and devices, they must be eliminated from memory cores [ITRS 2011]. In order to enhance the device yields of high-density, reliable memories, semiconductor manufacturers have incorporated redundancies into the memories [Stapper and Rosner 1995]. In such redundancy analysis (RA), faulty cells detected in a memory are replaced by allocating spare cells arrayed along a row or column.

RA is the most common repair technique performed on various platforms requiring high memory device yields. Memory diagnosis and the repair process are important considerations in faulty cell repair. Memory diagnosis not only locates and repairs the failures but also analyzes them to improve the design and manufacturing processes, which are essential for maintaining the yields and improving the reliabilities of memory devices. Most commercial semiconductor memory devices are tested using external systems called automatic test equipment (ATE), and faulty cell replacement is computed by software programmed in the ATE. Thus, RA is conventionally performed by software-controlled ATE. After a test, the ATE stores the fault information in a fault bitmap for execution of the RA. An online self-test was recently investigated to overcome the limited reliability of SOC [Li et al. 2010; Bernardi et al. 2013; Li et al. 2013].

The RA algorithm in ATE is software based, whereas the built-in redundancy analysis (BIRA) algorithm is hardware based. Although embedded memories in SOC are difficult to access using external ATE [Huang et al. 1999], they can be tested and repaired by built-in self-repair (BISR), which consists of a built-in self-test (BIST) and BIRA [McConnell and Rajsuman 2001; Nelson et al. 2001; Bayraktaroglu et al. 2005; Boutobza et al. 2005; Huang et al. 2007; Zhang et al. 2007]. The BIST minimizes the requirements of the embedded memory tester and greatly reduces the testing time by improving the test flow. BIRA similarly minimizes the embedded memory requirements and reduces the repair time through dedicated hardware logic.

To overcome the limitations of two-dimensional (2D) circuits, researchers have recently developed three-dimensional (3D) integrated circuit technologies using through-silicon vias (TSVs). These TSV-based 3D technologies drastically decrease the interconnect length and interconnect power and largely increase the SOC capabilities [Knickerbocker et al. 2006; Pavlidis and Friedman 2009; Xie 2010]. In addition, 3D integration enables highly heterogeneous and sophisticated systems that increase the circuit performance and reduce the cost [Davis et al. 2005]. However, the yield-loss problem is severe in 3D memories because a whole stacked memory is affected by even a single defective memory in a die [Lee et al. 2009; Noia et al. 2011; Wu et al. 2012]. This yield problem has been addressed by several BIRA algorithms designed for 3D memories.

The present article overviews the existing literature on RA and BIRA. Methodologies based on ATE with an RA algorithm and BIRA are now among the most desirable memory repair solutions. Our goal is to enhance understanding of current RA and BIRA studies by describing their important issues and discussing the existing solution methods.

The rest of this survey is organized as follows. Section 2 introduces existing yield enhancement techniques that repair faulty memories by using redundant cells. It defines RA, as well as the performance criteria, preprocessing/filter algorithms, and RA approaches. Section 3 describes the repair processes that use redundant elements with various spare architectures. Section 4 reviews RA algorithms and provides examples of heuristic and exhaustive search algorithms. Similarly, Section 5 reviews and provides examples of BIRA algorithms. Emerging issues concerning BIRA algorithms for 3D memory are discussed in Section 6. Finally, Section 7 summarizes this review and concludes with comments on the survey findings.

2. BACKGROUND

Some preliminaries that will assist the reader’s understanding of the memory repair analysis methods are introduced in this section. Although the following concepts will be repeatedly mentioned in later sections, we consider it prudent to introduce them here.

2.1. Performance Criteria

Before defining the performance criteria used to assess RA and BIRA algorithms, we first define the repair rate, an important criterion that is directly connected to the memory yield. The repair rate is the capability of an RA algorithm to obtain a correct repair solution [Huang et al. 2003; Jeong et al. 2009]. The repair rate and normalized repair rate are respectively defined as

repair rate = (number of repaired memories) / (number of total tested memories)    (1)

and

normalized repair rate = (number of repaired memories) / (number of reparable memories).    (2)

The repair rate is the ratio of the number of memories repaired by the RA procedure to the total number of tested memories. The tested memories include all of the reparable and irreparable memories. The repair rate can therefore be counterintuitive as a performance measure: because the RA and BIRA algorithms can repair only the reparable memories, the repair rate cannot reach 100% whenever irreparable memories are present. The normalized repair rate depends not on the total number of tested memories but only on the number of reparable memories, for which there exists at least one repair solution. Therefore, the normalized repair rate is more intuitive and better suited to estimating the performance of an RA algorithm than the actual repair rate. At the optimal repair rate, the normalized repair rate is 100%, meaning that all of the reparable memories are repaired.
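As a concrete illustration, Equations (1) and (2) can be computed directly from test outcomes. This is a minimal sketch; the function names and the die counts below are ours, not from the survey.

```python
# Sketch: computing Eq. (1) and Eq. (2) from hypothetical test outcomes.

def repair_rate(num_repaired: int, num_tested: int) -> float:
    """Eq. (1): repaired memories / total tested memories."""
    return num_repaired / num_tested

def normalized_repair_rate(num_repaired: int, num_reparable: int) -> float:
    """Eq. (2): repaired memories / reparable memories."""
    return num_repaired / num_reparable

# Illustrative counts: 10,000 dies tested, 9,200 of which have at least
# one repair solution; the RA algorithm finds solutions for 9,016.
tested, reparable, repaired = 10_000, 9_200, 9_016

print(repair_rate(repaired, tested))                # 0.9016
print(normalized_repair_rate(repaired, reparable))  # 0.98
```

The normalized rate isolates algorithm quality: a perfect RA algorithm scores 100% on Equation (2) even though Equation (1) stays below 100% whenever irreparable dies are in the tested population.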

The most commonly used performance criteria for RA and BIRA are the analysis speed, repair rate, and area overhead [Huang et al. 2003; Ohler et al. 2007; Jeong et al. 2009; Jeong et al. 2010]. Both RA and BIRA maximize the yield of memory devices by increasing the repair rate, which improves the yield during mass production. Therefore, the optimal repair rate is essential for maximizing the yield of memories. Reducing the area overhead lowers the cost of semiconductor production. The ATE that runs an RA algorithm has ample hardware resources, so area overhead is not a critical factor in an RA system. However, the area overhead directly influences BIRA, because this system uses additional hardware logic. The RA requires much longer to analyze high-density memories than low-density memories, owing to the greater fault spaces that must be analyzed. Moreover, the probabilities of faulty cells, the numbers of spares, and the area overheads are larger in high-density memories than in low-density memories. The area overhead is particularly problematic, since semiconductor memory manufacturing is trending toward higher-density memories. Supporting such high-density memories requires longer RA times and larger area overheads. Faster RA speeds and low area overheads have many merits for RA and BIRA, because they can significantly reduce the test costs. However, tradeoffs between the RA speed and repair rate as well as between the repair rate and area overhead must be considered. To obtain an optimal repair rate, exhaustive search algorithms must be used. These algorithms require substantial time to analyze all of the fault addresses; furthermore, they need large analyzers that can burden the memories. Therefore, it is difficult to improve all three of these features simultaneously.

2.2. Preprocessing/Filter Algorithms

The complexity of the redundancy allocation in a 2D spare architecture is NP-complete. The fault information can be represented as a bipartite graph whose vertex (node) set is divided into two sets. The vertices on one side correspond to faulty rows; those on the other side correspond to faulty columns. The edges correspond to the fault information. The reconfiguration problem of the 2D spare architecture then becomes a clique problem, which is known to be NP-complete. Therefore, the redundancy allocation problem on a 2D spare architecture is also NP-complete [Kuo and Fuchs 1987], and searching for repair solutions by RA is a lengthy task. The RA time can be reduced by preprocessing and filter algorithms, which usually have polynomial time complexity. These algorithms either terminate the analysis as early as possible or skip as many faulty cells as possible. Some preprocessing and filter algorithms are presented below.

—Must-repair algorithm: To perform must-repair analysis, row error counters and column error counters are created for each row and column of the devices. The criteria in this process are based on the numbers of available spare rows and columns in the devices. If there are more faulty cells in a row than available spare columns, then the faulty row must be replaced by a spare row. Similarly, if there are more faulty cells in a column than available spare rows, the faulty column must be replaced by a spare column [Tarr et al. 1984; Day 1985]. If a must-repair faulty row were instead repaired with the available spare columns, at least one faulty cell would remain in that faulty row; in this scenario, no solution is found even though the faulty memory is reparable. Because it repairs the faulty lines that must be repaired to find a solution, this algorithm is called the must-repair algorithm.
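The must-repair iteration can be sketched as follows, assuming the fault map is a set of (row, column) tuples; the function name and the example faults are ours, not from the literature.

```python
from collections import Counter

def must_repair(faults, spare_rows, spare_cols):
    """Apply forced (must-repair) allocations until a fixpoint.

    Returns the spare rows used, the spare columns used, and the
    leftover faults for a subsequent analysis stage.
    """
    faults = set(faults)
    used_rows, used_cols = [], []
    while True:
        row_cnt = Counter(r for r, _ in faults)
        col_cnt = Counter(c for _, c in faults)
        # A row with more faults than the spare columns still available
        # can only be fixed by a spare row (and symmetrically below).
        must_r = next((r for r, n in row_cnt.items()
                       if n > spare_cols - len(used_cols)), None)
        if must_r is not None and len(used_rows) < spare_rows:
            used_rows.append(must_r)
            faults = {f for f in faults if f[0] != must_r}
            continue
        must_c = next((c for c, n in col_cnt.items()
                       if n > spare_rows - len(used_rows)), None)
        if must_c is not None and len(used_cols) < spare_cols:
            used_cols.append(must_c)
            faults = {f for f in faults if f[1] != must_c}
            continue
        break  # no further forced allocation (or spares exhausted)
    return used_rows, used_cols, faults

# Row 3 holds three faults (> 2 spare columns) and column 1 then holds
# three faults (> 1 remaining spare row), so both are forced:
print(must_repair([(3, 0), (3, 1), (3, 2), (5, 1), (6, 1), (7, 1)], 2, 2))
```

If an unsatisfiable must-repair line remains among the leftover faults, the memory is irreparable; otherwise the leftover faults are passed to the main RA stage.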

Fig. 1. Example of irreparable faulty memory.

Fig. 2. Status of DUT.

—Early-abort algorithm: Some faulty memories are irreparable even when all of the available spare elements are considered, because no repair solution exists. Discovering that a faulty memory is irreparable only after the RA procedure is time intensive. However, if the reparability status is known before the RA procedure, time that would otherwise be wasted on unhelpful actions is saved. For this reason, many early-abort algorithms have been proposed. The upper bound of the early-abort algorithm introduced in Day [1985] is not very useful in practice. Because the matching problem can be resolved in polynomial time, other early-abort algorithms operate by maximum matching of bipartite graphs [Haddad et al. 1991; Lin et al. 2006]. Irreparable memories can be distinguished in several ways. Basic observations of faulty patterns can be sufficient to detect irreparability [Kuo and Fuchs 1987; Huang et al. 2003; Jeong et al. 2010]. For example, if the number of faulty cells not sharing a row or column address with any other faulty cell exceeds the total number of spare lines, then the faulty memory is irreparable. Figure 1 shows an irreparable faulty memory with three spare rows and three spare columns. There are seven faulty cells in this faulty memory, each with its own row–column address. Because no two faulty cells in this memory can be concurrently repaired with one spare element, the seven faulty cells cannot be repaired with three spare rows and three spare columns. At this point, regardless of the number of additional faulty cells, this faulty memory is judged as irreparable. A more practical early-abort algorithm, using a repair set of leading elements, was proposed by Wey and Lombardi [1987]. Figure 2 shows the diagnosed status of a device (also called the device under test, DUT) with three spare rows and three spare columns. The faulty bits were modeled by a Polya–Eggenberger distribution random number generator with A = 1.0, α = 0.6212, and β = 3.230. The experimental reliability was confirmed using 100,000 different samples. When the number of faulty cells exceeds a certain value, the region in which a device cannot be determined as either reparable or irreparable is significantly reduced, and the region in which a device is definitely irreparable increases. With a limited number of spare elements, faulty memories containing large numbers of faulty cells can hardly be repaired, as illustrated by the irreparable (green) region in Figure 2. Although the early-abort algorithm cannot always identify irreparability, it reduces the RA time by preventing unhelpful actions on irreparable devices.

Fig. 3. Three types of RA approaches.
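The single-address early-abort check used in the Figure 1 example can be sketched as follows; the function name and fault coordinates are ours, and the fault map is assumed to be a list of (row, column) tuples.

```python
from collections import Counter

# Sketch of the single-address early-abort check: a fault whose row and
# column are both unique needs one dedicated spare line, so if such
# faults outnumber all spare lines, the memory is irreparable.

def early_abort_irreparable(faults, spare_rows, spare_cols):
    row_cnt = Counter(r for r, _ in faults)
    col_cnt = Counter(c for _, c in faults)
    singles = sum(1 for r, c in faults
                  if row_cnt[r] == 1 and col_cnt[c] == 1)
    return singles > spare_rows + spare_cols

# Seven faults, each with its own row and column, vs. 3 + 3 spares
# (the Figure 1 situation):
print(early_abort_irreparable([(i, i) for i in range(7)], 3, 3))  # True
```

A `True` result lets the flow abort before the main RA stage runs; a `False` result is inconclusive, matching the text's caveat that the early abort cannot always identify irreparability.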

—Single-faulty-cell filter: A faulty cell that shares no address with any other faulty cell is called a single faulty cell [Huang et al. 1990]. Unlike a must-repair faulty line, a single faulty cell can be repaired by only one spare element, regardless of the spare type. This property is exploited by the single-faulty-cell filter, which reduces the number of faulty cells analyzed in the second stage by recording and filtering out the single faulty cells. Once the second stage has found a potential solution for the remaining faulty cells, the remaining spare lines can be easily calculated. Specifically, if the number of remaining spare lines (i.e., the sum of the numbers of remaining spare rows and columns) is at least the number of single faulty cells, then a repair solution can be found. Otherwise, there is no solution and the faulty memory is irreparable.
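The filtering step can be sketched as follows, again assuming a fault list of (row, column) tuples; the function name and fault positions are illustrative, not from the cited work.

```python
from collections import Counter

# Sketch of the single-faulty-cell filter: split the fault list into
# single faults (unique row AND unique column) and the rest, which is
# what the second analysis stage actually has to solve.

def split_single_faults(faults):
    row_cnt = Counter(r for r, _ in faults)
    col_cnt = Counter(c for _, c in faults)
    singles = [f for f in faults
               if row_cnt[f[0]] == 1 and col_cnt[f[1]] == 1]
    rest = [f for f in faults if f not in singles]
    return singles, rest

faults = [(0, 0), (0, 3), (2, 3), (5, 7), (8, 1)]
singles, rest = split_single_faults(faults)
print(singles)  # [(5, 7), (8, 1)]
print(rest)     # [(0, 0), (0, 3), (2, 3)]

# After the second stage allocates spares for `rest`, a solution exists
# iff (remaining spare rows + remaining spare columns) >= len(singles).
```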

2.3. Classification of RA Approach

Depending on its algorithm, the RA process can be supported by various fault collection methods. RA approaches are classified into three types with different fault collection and RA processes [Jeong et al. 2009], as shown in Figure 3.

The first type, the static RA approach, requires a full-sized fault bitmap to store all faulty cells during memory testing. After the memory test, the fault bitmap storing all of the fault information is analyzed by an RA algorithm. The full-sized fault bitmap ensures that the repair rate of the static RA is high, but its large size demands a high area overhead and additional search time. Therefore, the static RA approach is inappropriate for BIRA, in which a large area overhead is a critical factor.

The second type is the dynamic RA approach, which requires no fault bitmap because every arriving fault address is repaired by a spare element. The memory test and RA finish at the same time. Therefore, the dynamic approach is adopted in many BIRAs, such as the comprehensive real-time exhaustive search test and analysis (CRESTA) [Kawagoe et al. 2000] and essential spare pivoting (ESP) [Huang et al. 2003] algorithms. However, because dynamic RA is designed for fast analysis, the CRESTA algorithm increases the area overhead, whereas ESP decreases the repair rate. In CRESTA, the area overhead increases exponentially with the number of redundancies.

The last type is the hybrid RA approach. During memory testing, hybrid RAs store the fault addresses while executing the RA (preprocessing part, labeled RA1 in Figure 3) and then analyze the remaining fault addresses by the RA algorithm (labeled RA2 in Figure 3). Hybrid RA combines the advantages of static and dynamic RA, while compensating for their disadvantages. Hybrid RA is popularly used in BIRAs, such as in the local repair-most (LRM) [Huang et al. 2003], intelligent solve first (ISF) [Ohler et al. 2007], selected fail count comparison (SFCC) [Jeong et al. 2009], and BRANCH [Jeong et al. 2010] algorithms.

3. REPAIR PROCESS AND SPARE ARCHITECTURES

Methods of improving the yields of the RA and BIRA algorithms have been extensively researched. Although their mechanisms differ, the repair processes share a common repair procedure. Additionally, some algorithms (depending on their characteristics) assume similar spare architectures. This section discusses the memory repair process and various spare architectures.

3.1. Process of Memory Repair

Figure 4 shows the typical memory test and repair process of the hybrid RA approach. During testing, if a faulty cell is uncovered while applying the test patterns, the fault information should be saved for later repair. To reduce the test and repair times, a preprocessing/filter algorithm is applied. When a memory meets the early-abort condition, it is filtered out as irreparable memory, and the memory test and repair process terminate, avoiding further unhelpful procedures. After the memory test, the RA algorithm seeks a solution to repair the faulty cells. Finally, the faulty memory is repaired by replacing the faulty cells with good spare cells.

The fundamental test-and-repair procedure is common to all static and dynamic RA approaches. However, since the static RA approach collects faulty cells before executing the RA process, this procedure cannot apply a preprocessing/filter algorithm. In the dynamic RA approach, a preprocessing/filter algorithm is not necessary because the memory test and RA process terminate at the same time.

Although software and hardware repair methods yield the same results, they differ in their processes. Software-based RA is performed on the ATE, so the test patterns are applied to the memory during the ATE testing. If a faulty cell is uncovered, then the ATE receives the fault information and stores it in a fault bitmap. Since software-based algorithms are quite slow, the RA execution time needs to be reduced by a preprocessing/filter algorithm, which examines the fault information. After the memory test, the repair solution is found by a software-based RA algorithm in the ATE.

For memories using both BIST and BIRA (i.e., BISR), the BIST generates the test patterns for memory cells and uncovers faulty cells. Fault information (when found) is sent to the BIRA and stored in the BIRA’s storage elements. These elements, namely, registers and content-addressable memories (CAMs), compare the incoming data with the data already stored within a clock cycle [Pagiamtzis and Sheikholeslami 2006]. Governed by its RA algorithm, the BIRA then analyzes the fault information. Because the area overheads of BIST and BIRA are limited, any fault information that is filtered by the preprocessing/filter algorithm is ignored. The preprocessing/filter algorithm reduces the analysis time similarly to software-based RA. Once the memory test is finished, the BIRA hardware (which is embedded in the chip) finds the repair solution. The typical BISR scheme [Lu et al. 2009; Jeong et al. 2009; Lee et al. 2011; Kang et al. 2014; Hou et al. 2015] is schematized in Figure 5.

3.2. Spare Architectures

Fig. 4. Memory test and repair process of hybrid RA approach.

Figure 6 illustrates various types of spare architectures. Figures 6(a) and (b) show the 1D redundancy line architecture [Kim et al. 1998; Bhavsar 1999; Ottavi et al. 2004; Lu and Huang 2004], which repairs the memory array by using spare rows or spare columns. The 1D structure enables a simple repair solution without requiring a complex RA algorithm. Whenever a faulty memory cell is uncovered, the faulty row/column is replaced with one spare element from the extra rows/columns. If a new faulty cell is uncovered at an address previously repaired by a spare line, then this faulty cell has already been repaired. Figures 6(a) and (b) present examples with three spare rows and no columns and with three spare columns and no rows, respectively. The 1D spare architecture repairs a faulty cell (or a faulty line) by using a spare row/column line. The greatest benefit of the 1D spare architecture is its easy implementation. However, as the 1D architecture cannot repair orthogonal faulty lines, its repair efficiency is very poor.
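The 1D allocation rule just described (replace the faulty row, skip addresses already covered) can be sketched as follows; the fault stream, spare count, and function name are illustrative.

```python
# Minimal sketch of 1D (row-only) repair, as in Figure 6(a): each new
# faulty row consumes one spare row, and faults landing in an
# already-replaced row are ignored.

def repair_1d_rows(fault_stream, spare_rows):
    replaced = []
    for row, col in fault_stream:
        if row in replaced:
            continue  # this address was already repaired by a spare row
        if len(replaced) == spare_rows:
            return None  # spares exhausted: irreparable in 1D
        replaced.append(row)
    return replaced

print(repair_1d_rows([(2, 1), (2, 9), (4, 0), (7, 3)], 3))  # [2, 4, 7]
print(repair_1d_rows([(0, 0), (1, 1), (2, 2), (3, 3)], 3))  # None
```

The second call illustrates the 1D weakness from the text: four faults in four distinct rows exhaust three spare rows even though a 2D architecture with mixed spares might still succeed.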

Fig. 5. Overview of memory BISR.

The 2D spare architecture repairs the memory array by using both spare rows and spare columns. Figure 6(c) presents a 2D spare line architecture, which repairs four faulty cells by using only two spares (i.e., one spare row and one spare column). In contrast, the 1D architecture requires three spares to repair four faulty cells. For this reason, the 2D architecture is more widely used to repair faulty memories. However, although 2D architectures are more flexible than 1D architectures, their limited numbers of spare lines reduce their repair rates when the number of faulty cells is large.

To reduce the disadvantages of traditional 2D spare architectures, researchers have proposed new 2D architectures with configurable spares [Hsu and Lu 2002; Li et al. 2005; Lu et al. 2005; Lu et al. 2006; Lee et al. 2011]. These architectures divide the word-lines and bit-lines in random access memories (RAMs) [Yoshimoto et al. 1983; Karandikar and Parhi 1998]. For example, if one spare row is split into two spare blocks, each block can repair two orthogonal faulty cells (i.e., single faulty cells), as well as any faulty cells repaired by one spare row. Two-dimensional spare architectures with spare cells have also been introduced [Shekar et al. 2011]. By appropriately utilizing the spare elements, such architectures achieve higher repair efficiencies but also require additional area overheads, and their divided redundant elements may require complicated control procedures.

Finally, Figure 6(d) presents a 2D spare architecture using spare words for word-oriented memory [Lu et al. 2006; Huang et al. 2006; Chang et al. 2008; Cao et al. 2012]. This 2D spare architecture does not distinguish between bit-oriented and word-oriented memory arrays. Because this architecture can flexibly allocate the spare words, its repair efficiency is higher than that of a traditional 2D spare architecture. This example assumes that one word occupies two bits. If a faulty cell at (3, 6) is relocated to (4, 6), then the faulty memory cannot be repaired with one spare row and one spare column but can be repaired with two spare words and one spare column without additional spares. In this example, one spare row and four spare words are equally sized. Although the use of spare words improves the repair efficiency, it may require complicated control.

4. RA ALGORITHMS USING SOFTWARE

Fig. 6. Examples of various types of spare architectures.

Spare allocation problems can be solved by heuristic algorithms or by exhaustive search algorithms. Many search algorithms in both categories have been proposed. Although heuristic algorithms can be very fast, they cannot guarantee that the optimal repair rate is achieved because they may not find solutions. Because heuristic algorithms are executed in the ATE, a non-optimal repair rate will reduce the yield and efficiency of the ATE. Although exhaustive search algorithms will certainly reach the optimal repair rate, the RA time and memory space requirements of these algorithms grow exponentially (in the worst case) as the problem size or number of faulty cells increases. Furthermore, the repair solution depends on the constraints imposed on the redundancies, so the solution is non-unique for a given fault distribution. Therefore, the RA algorithms must be carefully chosen.
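The exponential cost of exhaustive search can be made concrete with a brute-force sketch: every faulty cell must be covered by either its row or its column, within the spare budget. The function name and fault positions are ours; real exhaustive RA algorithms prune this search rather than enumerate it fully.

```python
from itertools import product

# Illustrative brute-force RA for a 2D spare architecture: try every
# assignment of each fault to a spare row or a spare column. The loop
# is exponential in the number of faults, which is exactly the cost
# that heuristic algorithms try to avoid.

def exhaustive_repair(faults, spare_rows, spare_cols):
    faults = list(dict.fromkeys(faults))  # deduplicate, keep order
    for choice in product(("row", "col"), repeat=len(faults)):
        rows = {r for (r, c), kind in zip(faults, choice) if kind == "row"}
        cols = {c for (r, c), kind in zip(faults, choice) if kind == "col"}
        if len(rows) <= spare_rows and len(cols) <= spare_cols:
            return sorted(rows), sorted(cols)  # a valid repair solution
    return None  # no assignment fits the budget: irreparable

faults = [(1, 1), (1, 4), (2, 4), (5, 0)]
print(exhaustive_repair(faults, spare_rows=1, spare_cols=2))  # ([1], [0, 4])
```

Because it enumerates all assignments, this sketch always finds a solution when one exists (the optimal repair rate), illustrating the speed/repair-rate tradeoff discussed above.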

4.1. Heuristic RA Algorithms with Non-Optimal Repair Rate

Low and Leong [1996] proposed a linear time heuristic algorithm that reconfigures redundant RAMs. The average time complexities of several heuristic algorithms were analyzed by Shi and Fuchs [1992].

4.1.1. Repair-Most (RM) Algorithm [Tarr et al. 1984]. Because the complexity of the 2D RA is known to be NP-complete, it is difficult to achieve fast analysis speed. To solve this problem,


the RM algorithm uses a simple algorithm that is based on the number of faulty cells in faulty lines. Therefore, it shows fast analysis speed. Moreover, this algorithm is very easy to implement in various types of memories.

The RM algorithm adopts a hybrid RA approach consisting of two phases: a must-repair phase and a final repair phase. For each row, the must-repair analysis compares the value of each row-fault counter with the number of spare columns. After analyzing each row, the must-repair process is applied to the columns, replacing those with more faulty cells than spare rows. This analysis is repeatedly applied to the rows and columns until the must-repair rows or columns do not use spare elements or until the available spares have been exhausted. The final repair phase adopts an intuitive and simple algorithm. The row-fault and column-fault counters are arranged in descending order, and the faulty line containing the greatest number of faulty cells is repaired. If a spare is available for that line, then the procedure removes the faulty cells in that faulty line and updates the row and column counters. This process is repeated until no faulty cells remain. If faulty cells remain after all spares are used, then the RM algorithm cannot repair the faulty memory.
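The two phases above can be expressed as a short software sketch. The following Python version is illustrative only: the function name, the fault-set representation, and the tie-breaking between equally faulty rows and columns are our assumptions, not part of the formulation of Tarr et al.

```python
def repair_most(faults, spare_rows, spare_cols):
    """Repair-Most sketch: must-repair phase, then greedily repair the
    line with the most faults.  `faults` is a set of (row, col) tuples.
    Returns (repaired_rows, repaired_cols) or None if irreparable."""
    faults = set(faults)
    used_rows, used_cols = [], []

    def counts():
        rows, cols = {}, {}
        for r, c in faults:
            rows[r] = rows.get(r, 0) + 1
            cols[c] = cols.get(c, 0) + 1
        return rows, cols

    def repair_row(r):
        nonlocal faults
        used_rows.append(r)
        faults = {f for f in faults if f[0] != r}

    def repair_col(c):
        nonlocal faults
        used_cols.append(c)
        faults = {f for f in faults if f[1] != c}

    # Must-repair phase: a row with more faults than remaining spare
    # columns can only be repaired by a spare row (and vice versa).
    changed = True
    while changed and faults:
        changed = False
        rows, _ = counts()
        for r, n in list(rows.items()):
            if n > spare_cols - len(used_cols) and len(used_rows) < spare_rows:
                repair_row(r)
                changed = True
        _, cols = counts()
        for c, n in list(cols.items()):
            if n > spare_rows - len(used_rows) and len(used_cols) < spare_cols:
                repair_col(c)
                changed = True

    # Final repair phase: repeatedly repair the fault-richest line.
    while faults:
        rows, cols = counts()
        best_r = max(rows, key=rows.get) if rows else None
        best_c = max(cols, key=cols.get) if cols else None
        if rows.get(best_r, 0) >= cols.get(best_c, 0) and len(used_rows) < spare_rows:
            repair_row(best_r)
        elif len(used_cols) < spare_cols:
            repair_col(best_c)
        elif len(used_rows) < spare_rows:
            repair_row(best_r)
        else:
            return None  # spares exhausted with faults remaining
    return used_rows, used_cols
```

As the survey notes, this greedy strategy is fast but not optimal: fault patterns exist that the spares could cover but that the most-faults-first order fails to repair.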

4.1.2. FAST Algorithm [Cho et al. 2012]. Because the RM algorithm uses an overly simple mechanism, its repair rate is quite low. The FAST algorithm uses the concept of fault grouping to improve the repair rate. In this way, it achieves a relatively high repair rate and fast analysis speed.

The FAST algorithm adopts a hybrid RA approach, which organizes individual fault groups. The fault bitmap and fault groups are simultaneously formed during the ATE memory test. Fault groups are differentiated into single-fault and multiple-fault groups, depending on their relations with the fault locations. The faulty cells in multiple-fault groups share their addresses with other faulty cells. If a new faulty cell has the same row or column address number as another faulty cell in a fault group, then the new faulty cell joins that fault group and becomes a member of a multiple-fault group. This algorithm adopts a hybrid RA approach with three preprocessing algorithms: must-repair analysis, an early termination condition, and a single faulty cell filter. If the number of groups exceeds the sum of the spare rows and columns, then the memory cannot be repaired. The preprocessing algorithms are followed by the main FAST procedure, which examines the numbers of faulty row and column addresses in all of the multiple-fault groups. Single-fault groups are not checked, because these groups contain only single faults. As mentioned in Section 2.2, a single fault can be repaired by a spare element after applying the FAST algorithm. The numbers of row and column addresses are then compared with the number of available spares. Finally, the number of single-fault groups is compared with the number of remaining spares, and a solution is returned.
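The fault-grouping step at the heart of FAST can be sketched as a union-find pass over the fault list. The helper names and the quadratic pairing loop are our simplifications; the published implementation builds the groups on the fly during the ATE memory test.

```python
def fault_groups(faults):
    """Group faulty cells that share a row or column address into
    connected components (a sketch of FAST-style fault grouping).
    Returns a list of groups, each a set of (row, col) cells."""
    faults = list(faults)
    parent = list(range(len(faults)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(len(faults)):
        for j in range(i + 1, len(faults)):
            if faults[i][0] == faults[j][0] or faults[i][1] == faults[j][1]:
                union(i, j)

    groups = {}
    for i, cell in enumerate(faults):
        groups.setdefault(find(i), set()).add(cell)
    return list(groups.values())


def early_termination(faults, spare_rows, spare_cols):
    """FAST early-termination check: each fault group consumes at least
    one spare, so more groups than spares means irreparable."""
    return len(fault_groups(faults)) > spare_rows + spare_cols
```

On the Figure 7 example, the six remaining faults form exactly the two groups described in the text, and the early-termination condition (2 groups vs. 4 spares) does not fire.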

Figure 7(a) shows the status of a faulty memory repaired with must-repair analysis. After the repair of faulty row 3 and faulty column 4, six faulty cells and two fault groups remain. Because faulty cells (1, 0) and (1, 2) have the same row address, these faulty cells are collected into Group 1. However, the row and column addresses of faulty cell (4, 7) differ from those of Group 1, so faulty cell (4, 7) is assigned to Group 2. Faulty cells (5, 0) and (6, 2) share their column address numbers with faulty cells (1, 0) and (1, 2), respectively, so both are assigned to Group 1. Similarly, faulty cell (7, 7) shares its column address with that of faulty cell (4, 7), so it is placed in Group 2.

Figure 7(b) shows the numbers of row and column addresses in all of the multiple-fault groups. The repair of Group 1 requires three spare rows or two spare columns, whereas Group 2 requires two spare rows or one spare column. There are four solution candidates, which are designated as (R3, R2), (R3, C1), (C2, R2), and (C2, C1), where R and C represent rows and columns, respectively. The first and second terms in the


Fig. 7. Example of the FAST algorithm.

parentheses denote the solution numbers of Group 1 and Group 2, respectively. Because there are two spare rows and two spare columns, combination (C2, R2) is chosen as the repair solution. Therefore, Group 1 is repaired with two spare columns and Group 2 is repaired with two spare rows.

4.2. RA Algorithms with Optimal Repair Rate

To find the repair solution and achieve an optimal repair rate, many exhaustive search algorithms can be used. Because an exhaustive search considers all possible cases, it can always find a repair solution in reparable memories. A detected faulty cell will be repaired by either a spare row or a spare column. Since a binary search tree always considers the two possible consequences from one node, it is highly suitable for use in exhaustive search algorithms (in memory repair, one of the two branches corresponds to a spare row, and the other corresponds to a spare column). If a faulty cell is uncovered, then the ATE processor expands the binary search tree, regardless of whether the repair is spare-row or spare-column. Having completed the fault analysis with the binary search tree, the algorithm decides the reparability of the tested memory. The following are RA algorithms with optimal repair rates that use binary search trees.

4.2.1. Fault-Driven Algorithm. Since repair rate is a critical criterion in memory repair analysis, exhaustive search RA algorithms are needed. The fault-driven algorithm detects the repair solution using a binary search tree. Therefore, this algorithm can always obtain an optimal repair rate.

The fault-driven algorithm is an exhaustive binary search tree algorithm introduced by Day [1985]. The RA time of the fault-driven algorithm is reduced by organizing the algorithm into two phases (which constitutes a hybrid RA approach). The first phase performs a must-repair analysis; the second phase performs a sparse-repair (final-repair) analysis based on a fault-driven process. The root record derived in the first phase is maintained as a solution record in the second phase. All of the repair solution records are expanded and updated from this root record. When a faulty cell is detected, the solution records are expanded and updated to include the row and column addresses of the faulty cells. In other words, two solution records are created from each original solution record: one expanded to the row address, and the other expanded to the column address. If a solution record has no available spare, then it is excluded from the repair solution. Therefore, each solution record is a valid repair solution for the faulty cells detected thus far.
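The record-expansion process can be sketched as follows. This simplified Python version keeps each record as a pair of address sets; it omits the must-repair phase and the optimization step (described below) that discards records in which one cell consumes both a spare row and a spare column.

```python
def fault_driven(faults, spare_rows, spare_cols):
    """Fault-driven final analysis sketch: each solution record is a
    (rows, cols) pair of repaired addresses.  Every newly detected
    faulty cell either is already covered or forks the record into a
    row-repair child and a column-repair child.  Children that would
    exceed the available spares are dropped.  Returns the surviving
    solution records (empty list means irreparable)."""
    records = [(frozenset(), frozenset())]
    for r, c in faults:
        nxt = []
        for rows, cols in records:
            if r in rows or c in cols:        # cell already covered
                nxt.append((rows, cols))
                continue
            if len(rows) < spare_rows:        # expand with a spare row
                nxt.append((rows | {r}, cols))
            if len(cols) < spare_cols:        # expand with a spare column
                nxt.append((rows, cols | {c}))
        records = list(set(nxt))              # drop duplicate records
    return records
```

For instance, with faults (1, 0) and (1, 2) and SR = 1, SC = 2, the record holding spare row 1 covers both cells, while the column-only branch forks further, so three records survive.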

Figure 8 demonstrates the fault-driven algorithm. The memory contains three spare rows and three spare columns (i.e., SR = 3 and SC = 3). The faulty memory consists of 13 faulty cells. Row address 3 contains more faulty cells than spare columns (4 vs. 3).


Fig. 8. Example of fault-driven algorithm.


Therefore, row address 3 becomes a must-repair line and is assigned one spare row. Next, column address 4 (which contains three faulty cells; one more than the number of remaining spare rows) becomes a must-repair line and is allocated one spare column. Therefore, the must-repair analysis repairs one faulty row and one faulty column, as shown in Figure 8(a).

Figure 8(a) depicts the faulty memory cell array with the two available spare rows and two spare columns, as well as solution record 1, after the must-repair analysis. After detecting faulty cell (1, 0), the algorithm compares the row and column addresses of this cell with the data in solution record 1. Because neither the row nor column address matches that of solution record 1, the row address 1 and column address 0 are added to solution records 2 and 3, respectively, as illustrated in Figure 8(b). The row and column addresses of the next detected faulty cell (1, 2) are compared with the contents of all of the solution records (see Figure 8(c)). The faulty row address matches that in solution record 2, so solution record 2 is maintained as solution record 4. Because neither the row address nor the column address is found in solution record 3, the solution records are changed: the row address 1 and column address 2 are newly added as solution records 5 and 6, respectively.

The fault addresses are stored in the solution records in the order of the detected faulty cells. Invalid solution records, which may contain overlapping solutions, are removed by optimizing the solutions. In an invalid solution record, a single faulty cell is repaired by both a spare row and a spare column. This optimization step is illustrated in Figure 8(c). Solution record 5 contains (R1, C0), so one faulty cell, (1, 0), is repaired by two spare lines, and solution record 5 is removed after the optimization process. If no spares are available, then there is insufficient space to store the addresses of the detected faulty cells in the records. Figures 8(d)–(g) repeat the fault-driven analysis for the remaining faulty cells [(4, 7), (5, 0), (6, 2), and (7, 7)]. Finally, the fault-driven analysis finds three valid solution records (solution records 18–20).

4.2.2. Branch-and-Bound Algorithm. In reality, there is a difference between the cost of embedding a spare row line and that of a spare column line. The branch-and-bound algorithm, introduced by Kuo and Fuchs [1987], employs an exhaustive binary search tree algorithm with a spare cost function. It can achieve not only an optimal repair rate but also cost-effective repair solutions.

This hybrid-RA algorithm performs a must-repair analysis in the first phase and finds the repair solutions by a branch-and-bound final analysis in the second phase. The algorithm assigns a cost to each node of the tree by a cost function F, which is calculated from the weighted values of the spares. Kuo and Fuchs proposed weighting the number of allocated spare rows by 8 and the number of allocated spare columns by 15. The cost function is then given by

F = 8 × R + 15 × C, (3)

where R and C are the numbers of allocated spare rows and spare columns, respectively. Since this algorithm employs a branch-and-bound algorithm based on the binary search tree, the second phase maintains a solution of must-repair lines in the root node (i.e., node 1) of the binary search tree. When a faulty cell is detected, the solution node with the lowest cost function is updated to cover the faulty cell and accumulate the cost.
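A best-first reading of this strategy can be sketched with a priority queue ordered by F. The heap-based structure and the names below are our assumptions (the original formulation expands nodes of an explicit binary tree), but the cost function matches Equation (3).

```python
import heapq
from itertools import count

def branch_and_bound(faults, spare_rows, spare_cols, wr=8, wc=15):
    """Branch-and-bound sketch with the Kuo-Fuchs cost function
    F = wr*R + wc*C: always expand the cheapest open node for the next
    uncovered faulty cell.  Returns (cost, rows, cols) of a lowest-cost
    repair, or None if the memory is irreparable."""
    faults = list(faults)
    tie = count()  # tiebreaker so the heap never compares the sets
    heap = [(0, 0, next(tie), frozenset(), frozenset())]
    while heap:
        cost, i, _, rows, cols = heapq.heappop(heap)
        # Skip faults already covered by the allocated spares.
        while i < len(faults) and (faults[i][0] in rows or faults[i][1] in cols):
            i += 1
        if i == len(faults):
            return cost, rows, cols        # cheapest complete repair
        r, c = faults[i]
        if len(rows) < spare_rows:         # branch 1: allocate a spare row
            heapq.heappush(heap, (cost + wr, i + 1, next(tie), rows | {r}, cols))
        if len(cols) < spare_cols:         # branch 2: allocate a spare column
            heapq.heappush(heap, (cost + wc, i + 1, next(tie), rows, cols | {c}))
    return None                            # every branch exhausted its spares
```

Because all allocation costs are positive, the first complete repair popped from the queue is a minimum-cost one, mirroring the lowest-cost-node expansion rule described above.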

Figure 9 demonstrates the branch-and-bound algorithm. Figure 9(a) shows a memory cell array with three spare rows and three spare columns (i.e., SR = 3 and SC = 3). In the first phase, row address 3 and column address 4 are allocated as the must-repair lines. After the must-repair analysis, two spare rows and two spare columns are available for the second phase, as shown in Figure 9(b).


Fig. 9. Example of branch-and-bound algorithm.

Figure 9(c) demonstrates the analysis procedure using the branch-and-bound algorithm. Node 2 contains row address 3 and column address 4, and its cost function F is 23. The row and column addresses of the detected faulty cell (1, 0) differ from the addresses in node 2, so nodes 3 and 4 are created. The faulty row address 1 is added to node 3 with a cost of 31, and the faulty column address 0 is added to node 4 with a cost of 38. The address values of the second detected faulty cell (1, 2) are compared with those of node 3, which has the lowest cost. Because the faulty row address 1 is already stored in node 3, node 5 maintains the status of node 3. However, node 6 is updated with a cost of 46 because the second faulty cell is located at a new column address (2). The third faulty cell (4, 7) is added to the lowest-cost node (in this case, node 5), and nodes 7 and 8 are generated. Once these nodes are produced, node 4 has the smallest cost and is thus expanded for the second faulty cell (1, 2), which is assigned to nodes 3 and 4. From node 4, nodes 9 and 10 are generated with costs of F = 46 and F = 53, respectively. Node 7 now has the smallest cost, but no spare row is available. Consequently, node 11 is generated for the fourth faulty cell (5, 0). Although nodes 8 and 9 have the same cost, node 8 is deeper in the tree than node 9. Therefore, nodes 12


and 13 are generated from node 8. Meanwhile, nodes 14 and 15 are generated from node 9, and node 16 is generated from node 10 for the third faulty cell. Similarly, nodes 17 and 18 are generated from nodes 11 and 12, respectively, for the fifth faulty cell (6, 2). Although one faulty cell remains after all spares have been assigned, this cell (7, 7) can be repaired because column address 7 is held in node 18. The final repair solution is node 19, which removes all faulty cells at the lowest cost.

4.2.3. PAGEB Algorithm. Repair algorithms based on the binary search tree show poor analysis speed. The PAGEB algorithm [Lin et al. 2006] transforms an RA problem into Boolean functions that are handled by a binary decision diagram (BDD). In this way, the PAGEB algorithm achieves an optimal repair rate and relatively fast analysis speed.

This hybrid RA approach executes in two phases: the preprocessing/filter algorithms described in Section 2.2 followed by Boolean transformation of the remaining faulty cells. Three Boolean functions are available: a defect function (DF) derived from the encodings of the positions of all faulty cells, a constraint function (CF) derived from the encodings of all combinations of reparable faulty lines, and a repair function (RF) derived from the encodings of all repair solutions. Because the row and column constraints are orthogonal, they are considered separately and denoted as CFr and CFc, respectively. The CF is the product of CFr and CFc. The row and column constraints are equivalent to a more intuitive function called the replacement function (PF). If a faulty line will be repaired by a spare line, then the number of spares decreases by one. If a faulty line will not be replaced, then the number of spare lines remains unchanged. This procedure is iterated over all faulty lines, as dictated by the following equation. Here, d faulty lines can be placed into L in any order, and SR is the number of spare rows,

PF(Lk, d, SR) = Lk · PF(Lk+1, d − 1, SR − 1) + L̄k · PF(Lk+1, d − 1, SR). (4)

The PF has two useful properties. If the number of spares is equal to or larger than the number of faulty lines, then the PF is true. Otherwise, if no spares are available, none of the faulty lines can be replaced.
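Equation (4) and these two properties can be checked with a direct recursive evaluation of PF on a concrete assignment of the line variables. This is only a point-by-point sketch; PAGEB itself manipulates the function symbolically as a BDD.

```python
def pf(lines, spares):
    """Replacement function PF of Eq. (4), evaluated on one assignment:
    lines[k] is True when faulty line Lk is replaced by a spare."""
    if not lines:
        return True                      # no faulty lines left: constraint met
    if lines[0]:                         # Lk replaced: one spare is consumed
        return spares > 0 and pf(lines[1:], spares - 1)
    return pf(lines[1:], spares)         # Lk not replaced: spares unchanged
```

Unrolling the recursion shows that PF is true exactly when no more lines are replaced than spares exist, which is the constraint the CF encodes.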

By calculating the RF as the product of DF and CF, we can calculate the reparability of each faulty cell (i.e., the DF) in a faulty line from the PF. A repair solution is found during a single search of the BDD of the RF. Paths from the top of the BDD to 1 and to 0 are solution and non-solution paths, respectively.

Figure 10 depicts a memory cell array with a spare row and a spare column (i.e., SR = 1 and SC = 1) and its three Boolean functions (DF, CF, and RF). Selecting the variable ordering as (R1 < C1 < C3 < R3), the DF can be calculated by using

DF = (R1 + C1) · (R1 + C3) · (R3 + C3). (5)

Figure 10(b) shows the BDD of the DF. The CFr can then be calculated by using

CFr = PF(L0, 2, 1)
    = L0 · PF(L1, 1, 0) + L̄0 · PF(L1, 1, 1)
    = L0 · L̄1 + L̄0 · true
    = R1R̄3 + R̄1. (6)

The CFc can be derived in the same manner. The global constraint function is then

CF = CFr · CFc = R̄1C̄1 + R̄1C1C̄3 + R1R̄3C̄1 + R1R̄3C1C̄3. (7)


Fig. 10. Example of PAGEB algorithm.

Table I. Comparison of RA Algorithms

Category     Name              Type of RA Approach   Repair Rate   Analysis Speed
Heuristic    RM                Hybrid                Low           Fast
             FAST              Hybrid                Medium        Very Fast
Exhaustive   Fault-driven      Hybrid                Optimal       Very Slow
             Branch-and-bound  Hybrid                Optimal       Slow
             PAGEB             Hybrid                Optimal       Medium

Figure 10(c) shows the BDD of the CF. Finally, the repair function RF can be derived from the product of Equations (5) and (7) as follows:

RF = R1C̄1C3R̄3. (8)

Figure 10(d) shows the BDD of the RF. The final repair solution of Figure 10(a) is (R1, C3).

4.2.4. Other Algorithms. Most exhaustive search algorithms are based on branch-and-bound algorithms, which have been detailed above. A binary branch-and-bound algorithm for optimum repair was introduced in Hemmady and Reddy [1989]. This algorithm categorizes faulty cells as single faulty cells, linear faulty cells (faulty lines that share no faulty cells with other faulty lines), and nonlinear faulty cells. When all of the faulty cells are either single or linear faulty cells, the problem is called a linear problem instance (LPI). Because the reparability of an LPI can be determined without further iterations of the algorithm, this identification system reduces the time required by the procedure.

To maximize the throughput, defined as the number of good chips produced per unit time, Haddad et al. [1991] proposed a new algorithm that either reduces the average time of the ATE or increases the repair rate. This algorithm terminates before the RA procedure by calling an early-abort algorithm based on a bipartite graph. The algorithm seeks a repair solution by a branch-and-bound algorithm.

4.3. Summary of RA

This subsection compares the representative RA algorithms discussed above. Qualitative analyses of the various RA algorithms are summarized in Table I.


As mentioned in Section 2.1, the area overhead is not an important performance criterion for RA algorithms, because the ATE system requires no additional area. As a natural consequence, RA algorithms are designed to have high repair rates and analysis speeds. The analysis speed is improved by preprocessing/filter algorithms, which are called during the fault classification phase. For this reason, the five RA algorithms discussed above are hybrid RA approaches.

The exact repair rates and analysis speeds of RA algorithms depend on the number of faulty cells or the number of spare elements. Space limitations preclude quantitative analysis of the various RA algorithms. Clearly, the fast analysis speeds of heuristic RA algorithms are counteracted by their low repair rates. On the other hand, exhaustive RA algorithms achieve the optimal repair rate but at relatively slow analysis speeds. Among the heuristic RA algorithms, the FAST algorithm delivers the best repair rate and analysis speed, whereas RM is simple and can easily be implemented. Among the exhaustive RA algorithms, the PAGEB algorithm shows the highest analysis speed. However, exhaustive search RA algorithms are inherently slower than their heuristic counterparts. The tradeoff relationship between repair rate and analysis speed has been extensively investigated.

5. BIRA ALGORITHMS USING HARDWARE

External ATEs cannot easily access the embedded memories in SOCs. Instead, SOC memories are usually repaired by BISR, which combines BIST and BIRA. Because BISR necessitates the use of additional hardware, the BIRA algorithm requires more area overhead than RA algorithms, which use software in external ATE. To achieve the optimal repair rate, all of the faulty cell information is stored in a register or CAM. However, the area overheads of these storage elements are quite large, and the larger spaces to be analyzed require greater runtimes. Therefore, where possible, BIRA algorithms achieve the optimal repair rate under the constraint of area reduction. This section discusses various existing BIRA algorithms, which are widely used and based on 2D spare architectures, and illustrates them with a simple example. We also discuss other unique BIRA algorithms targeting multiple memories and different spare architectures.

5.1. BIRA Algorithms with Non-Optimal Repair Rate

5.1.1. LRM Algorithm. The area overhead of a full bitmap is unmanageably large. The LRM algorithm employs a local bitmap and adopts the hybrid RA approach [Huang et al. 2003]. This algorithm greatly reduces the size of the fault bitmap, which enables BIRA to adopt RM algorithms.

The LRM properly allocates spares concurrently with the BIST operation using a local bitmap. The local bitmap consists of row and column address tags and m-by-n flags. The address of an uncovered faulty cell is compared with the address tags. If the row or column address is not in the local bitmap, then the address of the faulty cell is stored in a new address tag, and the next available local bitmap entry is filled. If the detected faulty cell is beyond the local bitmap, then the LRM repairs the line with the most faulty cells in the local bitmap and then clears the local bitmap by eliminating the relevant faulty cells. Next, the information of the faulty cell is stored in the address tags. The LRM algorithm executes until no faulty cell is detected or until the row or column address tags are full. By reducing the bitmap size, this scheme largely reduces the area overhead. However, the bitmap storage reduction can cause the loss of information about the faulty cells in the RA procedure. Therefore, the LRM algorithm cannot guarantee the optimal repair rate.
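The interplay between the bounded bitmap and the repair-most eviction can be sketched as follows. The capacity parameters m and n, the retry loop, and the row-first tie-break are our modeling assumptions for a software rendering of the hardware flow.

```python
def lrm(fault_stream, spare_rows, spare_cols, m=4, n=4):
    """LRM sketch: the local bitmap holds at most m distinct row tags
    and n distinct column tags.  A fault that does not fit triggers a
    repair of the bitmap line with the most faults; the bitmap is then
    cleaned and the fault is retried."""
    bitmap = set()
    rows_used, cols_used = [], []

    def fits(r, c):
        return (len({f[0] for f in bitmap} | {r}) <= m and
                len({f[1] for f in bitmap} | {c}) <= n)

    def repair_most_in_bitmap():
        rc, cc = {}, {}
        for r, c in bitmap:
            rc[r] = rc.get(r, 0) + 1
            cc[c] = cc.get(c, 0) + 1
        br, bc = max(rc, key=rc.get), max(cc, key=cc.get)
        if rc[br] >= cc[bc] and len(rows_used) < spare_rows:
            rows_used.append(br)          # row-first strategy on ties
        elif len(cols_used) < spare_cols:
            cols_used.append(bc)
        elif len(rows_used) < spare_rows:
            rows_used.append(br)
        else:
            return False                  # spares exhausted
        bitmap.difference_update({f for f in set(bitmap)
                                  if f[0] in rows_used or f[1] in cols_used})
        return True

    for r, c in fault_stream:
        while True:
            if r in rows_used or c in cols_used:
                break                     # already covered by a spare
            if fits(r, c):
                bitmap.add((r, c))
                break
            if not repair_most_in_bitmap():
                return None               # irreparable under this heuristic
    while bitmap:                         # repair the leftover faults
        if not repair_most_in_bitmap():
            return None
    return rows_used, cols_used
```

The sketch makes the weakness visible: once a line is repaired to free bitmap space, the faults it covered are forgotten, which is exactly the information loss that prevents an optimal repair rate.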

Figures 11(a) and (b)–(f) illustrate a faulty memory and its repair by LRM analysis, respectively. The local bitmap in Figure 11(b) stores six faulty cells: (0, 4), (1, 0), (1, 2),


Fig. 11. Example of LRM analysis process.

(3, 1), (3, 3), and (3, 5). Because a faulty cell (3, 6) cannot be stored in the local bitmap, row address 3 is repaired with a spare row. The corresponding entries are cleared, and the information of the remaining faulty cells is stored. Figure 11(c) shows the status of the local bitmap stored with all of the faulty cells. Column address 4, with the largest number of faulty cells, is repaired, and the corresponding entries are cleared (see Figure 11(d)). Figure 11(e) shows the status of the local bitmap after repairing column address 4. Although there are three faulty columns in this case, row address 1 is repaired first because this algorithm adopts the row-first strategy. The remaining three single-fault cells are repaired because a sufficient number of spares is available, as shown in Figure 11(e). Thus, the final repair solution is R(3,1,5) and C(4,2,1), where R and C denote the row and column addresses, respectively.

5.1.2. ESP Algorithm. Area overhead is as important as repair rate in BIRA approaches. The ESP algorithm adopts the dynamic RA approach, which uses repair registers instead of a bitmap [Huang et al. 2003]. The ESP algorithm greatly simplifies the control circuit and offers the lowest area overhead.

The rules of the ESP algorithm are similar to the must-repair rules. However, the solution of the ESP algorithm is decided by a customized threshold value. If the number of faulty cells in a line reaches the threshold, then the faulty line becomes an essential pivot line that must be repaired. When the threshold value is set to 2, an essential pivot line is defined as a faulty line that contains two or more faulty cells. This means that if a new incoming faulty cell has the same row address number or column address number as an already-detected faulty cell, both faulty cells must be repaired by a single spare line. In this case, the sum of the numbers of spare rows and columns specifies the minimum storage required to repair a faulty memory. ESP is very simple to implement and quickly reaches a solution. However, since the ESP registers cannot store all of the faulty cell information, the repair rate cannot be optimal.
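The essential-pivot rule (with threshold 2) can be sketched as follows. The register layout and the handling of leftover single faults are our simplifications of the hardware scheme; a register keeps both of its addresses so that its row and its column can each later become a pivot.

```python
def esp(fault_stream, spare_rows, spare_cols):
    """ESP sketch: one repair register per spare.  A fault that shares
    a row or column address with a registered fault promotes that line
    to an essential pivot line, which is repaired immediately."""
    registers = []                # pending (row, col) faults, one per spare
    rows_used, cols_used = [], []
    for r, c in fault_stream:
        if r in rows_used or c in cols_used:
            continue              # covered by an essential pivot line
        for reg in registers:
            if reg[0] == r:       # second fault in this row: pivot row
                if len(rows_used) >= spare_rows:
                    return None
                rows_used.append(r)
                break
            if reg[1] == c:       # second fault in this column: pivot column
                if len(cols_used) >= spare_cols:
                    return None
                cols_used.append(c)
                break
        else:
            if len(registers) >= spare_rows + spare_cols:
                return None       # register file full: analysis fails
            registers.append((r, c))
    # Registers never promoted to pivots hold single faults; repair
    # them with whatever spares remain.
    for r, c in registers:
        if r in rows_used or c in cols_used:
            continue
        if len(rows_used) < spare_rows:
            rows_used.append(r)
        elif len(cols_used) < spare_cols:
            cols_used.append(c)
        else:
            return None
    return rows_used, cols_used
```

Replaying the Figure 12 fault sequence through this sketch reproduces the repair solution R(1,3,6) and C(4,0,7) derived below.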

Figures 12(a) and (b) illustrate a faulty memory and its repair by the ESP algorithm, respectively. The spares sum to six, so there are six registers for storing faulty cell addresses. The row–column address of the first faulty cell (0, 4) is stored in the first


Fig. 12. Example of ESP analysis process.

repair register. Because the second faulty cell (1, 0) is orthogonal to cell (0, 4), its address is stored in the second repair register. The row address of the third faulty cell (1, 2) is already stored in the second repair register. Therefore, the row of the second repair register becomes an essential pivot row, and row 1 (i.e., R1) is repaired. The address of the fourth faulty cell (3, 1) is then stored in the third repair register. The row address of the fifth faulty cell (3, 3) matches that of the faulty cell (3, 1), so row 3 becomes an essential pivot row. Once row 3 is repaired, faulty cells (3, 5) and (3, 6) are automatically repaired. When the eighth faulty cell (4, 4) is detected, column 4 of the first repair register becomes an essential pivot column and is repaired. The ninth faulty cell (4, 7) is stored in the fourth repair register. The tenth faulty cell (5, 0) shares a column address 0 with the second register; thus, column 0 becomes an essential pivot column. The 11th faulty cell (6, 2) is stored in the fifth repair register. The essential pivot row 6 and the essential pivot column 7 are repaired after detecting the 12th faulty cell (6, 4) and the 13th faulty cell (7, 7), respectively. Thus, the repair solution is R(1,3,6) and C(4,0,7).

5.1.3. Other BIRA Algorithms. One BIRA algorithm based on an RM algorithm uses spare memories as the fault counters [Kang et al. 2008]. This algorithm stores the fault counts in spare memories rather than in a dedicated row/column fault counter. Another BIRA algorithm stores and compares the addresses of faulty cells in spare mapping registers [Yang et al. 2009]. To improve the effectiveness of spare usage and the repair rate, this algorithm identifies a dummy spare line containing a faulty cell covered by already assigned spare lines.

5.2. BIRA Algorithms with Optimal Repair Rate

5.2.1. CRESTA Algorithm. While other BIRA studies focus on area overhead, the CRESTA algorithm aims to improve analysis speed. This algorithm repairs faulty memory using multiple sub-analyzers [Kawagoe et al. 2000]. Its major benefit is zero analysis time.

CRESTA adopts the dynamic RA approach to test memory and analyzes its fault information in real time by employing exhaustive searching. During the memory test, all of the sub-analyzers concurrently seek solutions in different scheduled orders using


various combinations of spare rows and columns. Because the sub-analyzers find all possible repair solutions, CRESTA delivers an optimal repair rate. However, as the numbers of spare rows and columns increase, the number of required sub-analyzers may become unrealistic. The required number of sub-analyzers is given by

(Rs + Cs)!/(Rs! × Cs!), (9)

where Rs and Cs denote the numbers of spare rows and columns, respectively. The area overhead of CRESTA is an exponential function of the number of redundant cells.
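Equation (9) is the binomial coefficient counting the distinct interleavings of spare-row and spare-column allocations, which is easy to evaluate:

```python
from math import factorial

def cresta_analyzers(spare_rows, spare_cols):
    """Number of CRESTA sub-analyzers, Eq. (9): one per distinct
    allocation order of the spare rows and spare columns."""
    return factorial(spare_rows + spare_cols) // (
        factorial(spare_rows) * factorial(spare_cols))
```

With Rs = Cs = 3, as in the Figure 13 example, 20 sub-analyzers are required; with Rs = Cs = 8 the count is already 12,870, which illustrates why the area overhead quickly becomes unrealistic.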

Figure 13(a) shows a faulty memory to be repaired by CRESTA. Figures 13(b) and (c) demonstrate the solution-finding performances of 2 of 20 sub-analyzers (RRRCCC and CRRCRC, where the sub-analyzers are labeled as RRRCCC, RRCRCC, RRCCRC, . . . , CCCRRR). Since displaying the performances of all sub-analyzers would occupy a lot of space in this article, we illustrate the operation of the CRESTA analyzer by one unsuccessful and one successful sub-analyzer. When a faulty cell is detected, each sub-analyzer allocates a spare line in some predetermined order. If the row or column address of a newly detected faulty cell is already allocated, then the corresponding sub-analyzers allocate no new space to that faulty cell. When the first faulty cell (0, 4) is detected, sub-analyzers #1 and #12 allocate a spare row and a spare column to this faulty cell, respectively. Each sub-analyzer then assigns a spare row to the second faulty cell (1, 0). The row address of the third faulty cell (1, 2) is already repaired, so both sub-analyzers wait for the next faulty cell. Because their third ordered allocation is a spare row, both sub-analyzers allocate a spare row to the fourth faulty cell (3, 1). Subsequently, as the assigned spare row repairs faulty cells (3, 3), (3, 5), and (3, 6), both analyzers halt their allocations. Later, when the eighth faulty cell (4, 4) is detected, sub-analyzer #1 allocates a spare column, but sub-analyzer #12 performs no action because the column address of (4, 4) has already been repaired. When the 10th faulty cell (5, 0) is detected, sub-analyzer #1 and sub-analyzer #12 allocate a spare column and a spare row, respectively. At this point, sub-analyzer #1 is unsuccessful because it has exhausted its spares. Alternatively, sub-analyzer #12 has one spare column, which it allocates to the 11th faulty cell (6, 2). Although sub-analyzer #12 has now also exhausted its spares, the procedure is successful because the remaining faulty cells (6, 4) and (7, 7) were repaired earlier. Therefore, by virtue of the predetermined allocation order, CRRCRC can repair the faulty memory shown in Figure 13(a). The repair solution is R(1, 3, 5) and C(4, 7, 2).

5.2.2. Intelligent Solve (IS) Algorithm [Ohler et al. 2007]. Despite its excellent analysis speed, the area overhead of CRESTA is too large for practical memories. The IS algorithm finds an optimal solution by depth-first traversal of a binary search tree. It was introduced to obtain the optimal repair rate and reduce the area overhead for built-in systems.

The IS algorithm is based on the hybrid RA approach. The hardware is easily implemented on a stack. Whenever BIST detects a faulty cell, it pushes a new branch onto the stack and recalculates the address of the faulty line stored in the address register of the first available spare row or column counter. Since the allocation orders of the spare rows and columns are predetermined before the memory test, backtracking to retract the repair solutions is initiated by decreasing the row or column counter. In other words, if the search fails, the IS algorithm backtracks up the binary search tree and searches other branches. This process is repeated until all of the alternative branches have been explored.

The IS algorithm reduces the search space by dynamic must-repair analysis and by must-repair analysis. First, a faulty line that becomes a must-repair line is replaced by a spare line. The remaining faulty cells are then analyzed by the IS algorithm.


Fig. 13. Example of CRESTA analysis.

As a branch is pushed onto the stack in the new repair step, the number of available spares changes, and the dynamic must-repair analysis is implemented. This dynamic must-repair process reduces the search space of faulty memory repair and the time required to search for the repair solution. The ISF algorithm stops the searching process after finding the first solution, further reducing the amount of backtracking and the search time.

Figure 14 shows a faulty memory and the depth-first tree traversal of the IS algorithm. After repairing row address 3 and column address 4 as must-repair lines, the ISF has two available spare rows and two available spare columns for repairing the six


Fig. 14. Example of ISF analysis process.

remaining faulty cells (see Figure 14(b)). The root node of the tree shown in Figure 14(c) is pushed onto the stack. To analyze the R-R-C-C branch in Figure 14(c), the algorithm compares the faulty cells with the previously stored addresses in the nodes. Because a faulty cell (7, 7) remains when all of the spare elements have been used, backtracking along the other branches is initiated. Backtracking stops at node 3, because the next branch is R-C-R-C and the first R has already been allocated. Because the R-C-R-C branch can repair the faulty memory, the contents of node 10 are a repair solution. The node of this branch is closed, and backtracking restarts at node 7 in order to apply the R-C-C-R branch. Because the R-C-C-R branch can also repair the faulty memory, the contents of node 13 are also a repair solution. The node of this branch is then closed,


and backtracking is performed until node 1 is reached. Although the C-R-R-C and C-R-C-R branches cannot be repair solutions, the faulty memory can be repaired through the C-C-R-R branch, as shown in Figure 14(c). Thus, in the ISF execution, node 10 is the last node of the search tree because it contains the repair solution.

5.2.3. BRANCH Algorithm [Jeong et al. 2010]. Since the performance criteria of BIRA trade off against one another, it is difficult for a single algorithm to be evaluated favorably on all of them. When running the test sequence, BRANCH stores the detected faulty cells in a fault-storing CAM structure, enabling easy recognition of a spare pivot and implementation of RA. The BRANCH algorithm performs well in terms of repair rate, area overhead, and analysis speed.

This algorithm is based on the hybrid RA approach. BRANCH's fault-storing CAM structure consists of parent address CAMs and child address CAMs. Parent addresses are collected identically to the spare pivots in ESP, and the non-spare pivot addresses are stored as child addresses. Parent address CAMs have three fields: enable flags, row/column addresses of faulty cells, and must-repair flags for rows/columns. Child address CAMs have four fields: enable flags, parent CAM pointers, child address descriptors, and row or column addresses. BRANCH checks whether an incoming faulty cell has either the same row address number or the same column address number as one of the parents. If so, then BRANCH stores the column (row) address of the incoming faulty cell in the next valid position in the child address CAM. The storage procedure operates as follows: (1) the enable flag turns on, (2) the pointer of the parent address CAM is stored in the parent CAM pointer, and (3) the child address descriptor is set to 0 (shared row address) or 1 (shared column address). Otherwise, when a new faulty cell meets the must-repair rule, the corresponding row/column must-repair flag is set to 1. Finally, the algorithm clears the contents of the child address CAMs with the same row or column address number as the must-repair faulty line.

The BRANCH analyzer simultaneously analyzes all of the nodes in a branch through control signals and repair candidates. When all parent address CAMs are occupied, the total number of repair candidates (the total number of branches in the binary search tree) is given by Equation (9). BRANCH selects parent addresses as repair candidates by using a control signal. The BRANCH analyzer investigates all of the faulty cells in the child address CAMs covered by the selected parent addresses. If a child row address matches a selected parent address, the Row-cover signal is set to 0; otherwise, it is set to 1. Similarly, if a child column address matches a selected parent address, the Column-cover signal is set to 0; otherwise, it is set to 1. The Cover signal is generated by the logical AND operation of Row-cover and Column-cover. If all of the Cover signals are 0, then the Analysis signal is 0, meaning that a repair solution candidate can cover all of the child addresses. Otherwise, the Analysis signal is 1. Even when a candidate repair solution covers all faulty cells, BRANCH checks the status (high or low) of the Valid_control signal, which investigates whether the candidate repair solution contains the address of a must-repair faulty line. Because the child address CAMs do not store the addresses of must-repair faulty lines, unless the address of the must-repair faulty line is contained in the candidate repair solution, that solution is invalid. Invalid solutions may invalidate the parent addresses, and the Analysis signal may not become 0. To boost the analysis speed, if the number of invalid parent addresses is larger than or equal to the number of uncovered child addresses, the Cover_match signal becomes 0; otherwise, it is 1. Thus, if one of the Analysis or Cover_match signals is 0 and the Valid_control signal is also 0, the Result signal becomes 1, and the candidate repair solution is determined as the repair solution.

Figures 15(a), (b), and (c) display a faulty memory, a fault-storing CAM structure after a fault collection, and the performance of a BRANCH analyzer, respectively. Faulty cells


Fig. 15. Example of BRANCH analyzer.

are detected from top to bottom and from left to right. After fault collection, the parent address CAMs contain the pivot set [(0, 4), (1, 0), (3, 1), (4, 7), and (6, 2)] (Figure 15(b)). Since row address 3 satisfies a must-repair condition, the row must-repair flag of row address 3 is set to 1. The BRANCH algorithm now analyzes the contents of the fault-storing CAMs. Among the 20 candidate repair solutions, we present the analyses of three in this example. The parent column (row) addresses are selected according to the values and positions of the control signal. If the ith bit of the control signal is 0, then the candidate repair solution is the column value of the ith parent address; otherwise, it is the row value of the ith parent address.

Figure 15(c) demonstrates the BRANCH analyzer on 111000 (i.e., RRRCCC) and 011010 (i.e., CRRCRC). The 111000 control signal selects parent row addresses 0, 1, and 3 and parent column addresses 7 and 2. The AND operation of Row-cover and Column-cover yields the Cover signal sequence 010000. The number of 1s in the Cover signal sequence equals the number of invalid parent addresses, so the Cover_match signal is set to 0. However, the candidate repair solution does not contain column 4 (the must-repair faulty line), so Valid_control is set to 1, and the Result is 0. Therefore, control signal 111000 cannot repair the faulty memory. Control signal 011010 can be described in the same manner. There is one child address that is not covered by the candidate repair solution, so the Cover signal sequence becomes 010000. Although the Analysis signal is set to 1, the Cover_match signal is 0 because the number of 1s in the Cover signal sequence equals the number of invalid parent addresses. In addition, the control signal contains the must-repair faulty lines (i.e., row address 3 and column address 4). Therefore, the Valid_control signal becomes 0, the Result signal becomes 1, and the control signal is determined to be the repair solution R(1, 3, 6) and C(4, 7, 0).
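The per-candidate check can be modeled compactly in software. The sketch below is a behavioral model of the cover and validity logic only (it omits the Cover_match shortcut); the function name and data layout are assumptions for illustration:

```python
def evaluate_candidate(parents, children, control, must_rows, must_cols):
    """Check one BRANCH repair candidate.  Bit i of `control` selects
    the row (1) or column (0) of parent i as a repair line."""
    sel_rows = {p[0] for p, b in zip(parents, control) if b == 1}
    sel_cols = {p[1] for p, b in zip(parents, control) if b == 0}
    # Cover check: every child fault must share a selected row or column.
    uncovered = [f for f in children
                 if f[0] not in sel_rows and f[1] not in sel_cols]
    # Validity check: the candidate must include every must-repair line.
    valid = must_rows <= sel_rows and must_cols <= sel_cols
    return not uncovered and valid
```

In hardware, all candidates are evaluated with simple combinational signals rather than by iterating as done here.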


5.2.4. Other BIRA Algorithms. Alternative binary search tree–based algorithms include SFCC [Jeong et al. 2009], the end-of-file (EOF) algorithm [Yang et al. 2010], and the adopting fault count algorithm [Cho et al. 2010]. The SFCC algorithm compares the number of faulty cells among the faulty lines and builds a line-based search tree. The EOF algorithm reduces the RA time by applying a preprocessing filter and removing needless branches in the search space through fault classification. The algorithm of Cho et al. [2010] searches the tree branches in descending order of the row/column fault counts of the incoming faulty cells. In the BIRA algorithm with depth-first searching, a circuit is formed by a parallel prefix algorithm, which reduces the area overhead and the evaluation time requirements [Chung et al. 2010]. RA algorithms have also been designed and estimated by defining vectors of preferences for rows (or columns). Based on these preferences, an exhaustive search algorithm assigns a spare row or column to each faulty cell for each feasible solution path vector of rows (or columns) [Chung et al. 2010]. CRESTA has been enhanced to support embedded word-oriented memories [Du and Cheng 2004]. Another BIRA algorithm using a single analyzer, single test run, and local bitmap is extendible to multiple RAMs for word-oriented memories [Chen et al. 2012]. This BIRA algorithm performs an exhaustive search of a maximized local bitmap to guarantee the optimal repair rate.

5.3. Other BIRA Algorithms for Unique Circumstances

This section discusses other unique BIRA algorithms that target multiple memories or use different spare architectures.

In one approach, the spare columns are divided into groups [Li et al. 2003; Li et al. 2005]. After applying the row-repair rule, the remaining faulty cells are repaired by the RM algorithm [Li et al. 2005]. The rows are reconfigured by organizing them into row (column) blocks. The extended LRM algorithm sequentially repairs each block in a block-level RA [Lu et al. 2005]. The extended ESP algorithm, with additional spare words, uses a block-level RA [Lu et al. 2006]. Block-level repair techniques that use switching elements have been proposed for CAMs [Lin et al. 2009]. Additionally, a flexible BISR strategy with selectable word redundancy has been proposed for static random access memory (SRAM) [Cao et al. 2012].

BIRA algorithms for word-oriented memory correct multiple faulty cells within a word. These words are defined as vectors or Hamming syndromes and are effectively stored in the storage elements of BIRA algorithms. One BIRA scheme allocates the 2D redundancy using a 1D local bitmap for word-oriented memory [Tseng et al. 2006; Tseng et al. 2011]. There are several BIRA algorithms for RAMs with two-level redundancy consisting of 2D redundancy and spare words [Schober et al. 2001; Huang et al. 2006; Chang et al. 2008]. Because spare words can be utilized more effectively than spare rows, the repair rate is higher in spare word architectures than in 2D spare architectures. Huang et al. [2004] proposed a BIRA design with a fail-pattern identification technique. This algorithm detects faulty rows, columns, and words, and repairs them by using spare rows and columns.

RA algorithms can be evaluated by a simulator that calculates the repair rate, memory configuration, and redundancy structure [Huang et al. 2002]. With this tool, users can design BIRA algorithms according to their requirements and can plan the spare elements of the embedded memories.

A row-redundancy scheme that applies Gray code to the row decoder of a flash memory provides a useful erase characteristic [Mihara et al. 1994; Campardo et al. 2003; Silvagni et al. 2003]. The method of Mataresse and Fasoli [2001] calculates the redundancy coverage of various spare architectures of flash memory. A BISR scheme with a modified ESP algorithm for NOR-type flash memory [Hsiao et al. 2006], and


BISR architectures for flash memories, such as NOR and NAND [Hsiao et al. 2010], have also been introduced.

Other BISR schemes have been tailored to multiple RAM addresses and multiple RAM cores with different sizes and spare architectures [Shoukourian et al. 2001; Zorian 2002; Zorian and Shoukourian 2003; Huang et al. 2006; Tseng et al. 2006; Huang et al. 2007; Tseng and Li 2008; Tseng et al. 2010; Lu et al. 2012]. Equipping each RAM configuration with its own BIRA is impractical because of the large area overheads. To overcome this problem, RA for multiple RAMs is performed in series or in parallel. In a serial method, in which the RA sequentially processes each RAM, one BIST and one BIRA are required. These methods greatly reduce the area overhead, but the sequential operation involved is time-consuming. Parallel methods operate through central controllers, and each RAM has a wrapper. The RA uses a test pattern generator, local bitmap, and simple controller, which interact with the central controller in the wrapper to support complicated RA procedures. The RAMs are simultaneously tested and repaired by the wrappers. If a faulty RAM cannot be repaired by its wrapper, then its fault information is analyzed by a central controller (i.e., BIRA). Although methods of this type are faster than sequential methods, their area overhead requirements are higher.

5.4. Dynamic Random Access Memory (DRAM) Repair Techniques

DRAM chips need refresh operations to prevent periodic data loss. Retention failures occur when DRAM cells lose their data before the refresh operation. As mentioned in Section 1, yield loss is a major problem in current semiconductor manufacturing, which must keep pace with rapid advancements of deep submicron technologies. This reliability problem is even more serious in DRAMs, which are subject to retention failures. Therefore, DRAM reliability has been the focus of many studies [Mandelman et al. 2002; Mutlu 2013; Khan et al. 2014; Sridharan et al. 2015]. DRAM reliability cannot be ensured by typical repair techniques using row and column redundancy, because retention failures are intermittently generated even in the same cell. These failures, called intermittent retention failures, largely originate from the variable retention time (VRT), which describes the time variability of the memory cell leakage [Mori et al. 2005; Liu et al. 2013]. Data pattern sensitivity can also cause intermittent retention failures in DRAM [Lee et al. 2010; Liu et al. 2013]. The failure conditions of a DRAM cell depend on the data in its adjacent cells. Several practical studies have described the errors and failure mechanisms in DRAM [Schroeder et al. 2009; Hwang et al. 2012; Kim et al. 2014; Meza et al. 2015].

5.4.1. Retention-Aware Intelligent DRAM Refresh (RAIDR) [Liu et al. 2012]. Although refresh operations are absolutely necessary for data preservation, they reduce the memory throughput and increase the memory latency. RAIDR reduces the number of refresh operations of DRAM chips by differentiating the refresh rates of individual rows. With minimal memory-controller overhead, RAIDR achieves substantial gains in memory performance and energy savings.

Many refresh operations are redundant because the allocated refresh rate is usually based on the weakest cell in the device. Therefore, most of the DRAM cells are refreshed before their data retention times expire. RAIDR categorizes the row lines of DRAM according to their retention times and assigns a different refresh rate to each category. In RAIDR, a row's retention time is defined as the retention time of the leakiest cell in that row. The RAIDR algorithm proceeds as follows. First, RAIDR checks the retention time of each row, a process called retention time profiling. The memory controller then places the rows into retention time bins according to their retention times. Finally, the


Fig. 16. Flowchart of entire RAIDR process.

memory controller issues refresh operations whenever a refresh candidate is chosen. Figure 16 shows the entire RAIDR process.

In Figure 16, rows are stored in two retention time bins. The first and second bins store rows with retention times ranging from 64ms to 128ms and from 128ms to 256ms, respectively. To guarantee the integrity of the data, the refresh intervals of the first and second bins are set to 64ms and 128ms, respectively. Rows whose retention times exceed the range of the retention time bins are refreshed at a new default refresh interval; in Figure 16, the new default refresh interval is 256ms. After retention time profiling, the memory controller inserts all of the rows into their proper bins. RAIDR then assigns different refresh rates to each bin. Finding the rows needing a refresh operation is simple, because all of the refresh intervals are multiples of 64ms. Therefore, RAIDR determines the timing of a refresh operation simply by counting the number of 64ms intervals since the last refresh operation. In conclusion, RAIDR conserves energy and reduces the performance overhead of DRAM chips by modifying the memory controller.
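Because every interval is a multiple of 64ms, the refresh decision reduces to a divisibility check on a 64ms tick counter. A small sketch of this scheduling idea (the bin layout and names are illustrative, not RAIDR's actual data structures, which use Bloom filters):

```python
def rows_to_refresh(tick, bins):
    """RAIDR-style refresh choice: on 64 ms tick number `tick`,
    refresh every row whose bin interval (a multiple of 64 ms)
    divides the elapsed time.  `bins` maps interval_ms -> row ids."""
    due = []
    for interval_ms, rows in bins.items():
        period = interval_ms // 64      # interval as a count of 64 ms ticks
        if tick % period == 0:
            due.extend(rows)
    return sorted(due)
```

With bins at 64ms, 128ms, and 256ms, the 64ms bin is refreshed on every tick, the 128ms bin on every second tick, and the 256ms bin on every fourth tick.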

5.4.2. ArchShield [Nair et al. 2013]. Existing repair techniques for DRAM memories have limited ability to tolerate high error rates. The ArchShield framework exposes the faulty cell information at the architectural level, increasing the tolerance of DRAM chips to high error-rate circumstances.

The two main components of ArchShield are the fault map and selective word-level replication (SWLR). The fault map stores the number of faulty cells in each word line. The SWLR then copies all of the words with at least one faulty cell. If multiple faulty cells occur in a word, then ArchShield replaces the original word line with the replicated one. Otherwise, if one faulty cell occurs in the word, ArchShield checks whether this faulty cell is reparable or not. If the faulty cell is identified as a one-bit soft error or an irreparable error, then the word is replaced with the replicate. To reduce the performance overhead, each fault map is accessed through the last level cache (LLC). The read and write operations of ArchShield are depicted in Figure 17.

When the cache request is a “miss,” the read signal is delivered to memory, and a fault map entry is calculated while consulting the fault map address in the LLC. If an LLC hit occurs, then the fault map entry is retrieved; otherwise, another read signal is sent to the memory. As mentioned above, the replicated area is read only in the event of multiple errors or a one-bit irreparable error in the word line. During this process, a replication bit (R-bit) is added to check whether the row line needs replacing by its replicate.
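The replacement rule can be modeled in a few lines. The sketch below is an illustrative abstraction of the decision only; the fault-map layout, function name, and return values are assumptions, not ArchShield's actual implementation:

```python
def archshield_read(word_addr, fault_map, replicas):
    """Decide where a read is served from, following the rule above:
    words with multiple faulty bits, or one irreparable faulty bit,
    come from the replicated area; all other words are read normally.
    fault_map maps word address -> (fault count, reparable?)."""
    n_faults, reparable = fault_map.get(word_addr, (0, True))
    if n_faults > 1 or (n_faults == 1 and not reparable):
        return replicas[word_addr]      # serve from the replicated word
    return ('original', word_addr)      # serve from the original word
```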

5.4.3. Other DRAM Repair Techniques. Son et al. [2015] proposed a new cache-inspired DRAM resilience architecture (CiDRA), which targets faulty DRAMs with many and widespread single-bit errors. The faulty cells are controlled by a small-size SRAM cache inserted into the device near the I/O pad. When a faulty cell is uncovered, the


Fig. 17. Flowchart of read and write operations in ArchShield.

DRAM commands and faulty addresses are captured by the CiDRA cache. CiDRA shows excellent area-efficiency performance and reduces the energy overhead by employing a Bloom filter [Bloom 1970].

Various bit repair mechanisms that test for faulty bits at the system boot time have also been proposed [Venkatesan et al. 2006; Schechter et al. 2010; Lin et al. 2012]. Because these mechanisms perform simple tests at the system boot-up level, they may not enable adequate data retention. Therefore, data loss is prevented by various error mitigation techniques.

Qureshi et al. [2015] proposed an architecture-level model that analyzes the influence of VRT. The VRT-aware multirate refresh (AVATAR) efficiently profiles the VRT and adapts the multirate refresh rate to specific row lines of DRAM, depending on the current conditions of the VRT failures.

The VRT problem is aggravated when the memory cells are heated by the DRAM packaging process. The data pattern dependencies and VRT are observed by a thermal modeling tool, which determines the relationship between retention time and temperature [Weis et al. 2015].

5.5. Summary of BIRA Algorithms

This section introduced various BIRA algorithms. The performance criteria of these representative BIRA algorithms are compared in Table II.

BIRA algorithms should consider the area overhead as well as both of the performance criteria of the RA algorithms. The tradeoff among the three main performance criteria renders the performance evaluation of BIRA algorithms much harder than that of RA algorithms. An ideal BIRA algorithm would have no area overhead or analysis time and an optimal repair rate. To better satisfy the other criteria, the repair rates of BIRA


Table II. Comparison of BIRA Algorithms

Category       Name       Type of RA   Repair    Analysis       Area
                          Approach     Rate      Speed          Overhead
Non-optimal    LRM        Hybrid       Low       Fast           High
Repair Rate    ESP        Dynamic      Medium    Fastest        Very Low
Optimal        CRESTA     Dynamic      Optimal   Fastest        Very High
Repair Rate    ISF (IS)   Hybrid       Optimal   Medium (Slow)  Medium
               BRANCH     Hybrid       Optimal   Fast           Low
Ideal BIRA     -          -            Optimal   Fastest        None

algorithms must be compromised. The LRM algorithm, an RM algorithm modified for BIRA, can be implemented easily, but its other performances are quite poor. Conversely, the ESP algorithm performs well in terms of both analysis speed and area overhead. If the repair rate is unimportant, then the ESP algorithm most closely approaches the ideal BIRA. However, the repair rate is very important in practice, because it directly influences the yield. Among the BIRA algorithms with optimal repair rate, the CRESTA algorithm adopts the dynamic RA approach. Therefore, the RA terminates with the fault collection phase, reducing the analysis time to zero. However, the area overheads and number of redundancies in CRESTA are prohibitively large for practical memory. The ISF and IS algorithms have smaller area overheads than CRESTA, but this advantage is compromised by slower analysis. The BRANCH algorithm offers reasonable storage capabilities and a relatively fast analysis.

6. BIRA TECHNIQUES FOR 3D MEMORY

Although the components of semiconductor chips have continued to shrink, the chip performance enhancement will clearly plateau in 2D chip architectures. Consequently, researchers have turned to design techniques for manufacturing 3D integrated circuits. The production of 3D memories such as 3D DRAM with TSVs has already begun [Kang et al. 2010]. However, in stacked memory configurations, an irreparable stacked die disables the whole memory stack. Therefore, the yields of 3D stacked memories are problematic. BIRA techniques for 3D memories have been recently investigated and have presented new challenges and solutions for 3D memory environments.

6.1. Redundancy Scheme for 3D Memory

A known-good-die (KGD) is used to manufacture 3D memory and is repaired by self-contained redundancy elements. Depending on the performance of the employed RA algorithm, the fixed redundancy scheme (which separates the redundancies for pre-bond and post-bond phases) can waste unused redundancies. In 2D memory, the unused redundancies can be shared to increase the memory yield, but this sharing demands a high routing overhead [Yamagata et al. 1996]. Conversely, the unused redundancies can be effectively shared in 3D memory, because the verticality and short path lengths of TSVs enable easy routing [Chou et al. 2009; Jiang et al. 2010]. The 3D memory yield partly depends on the KGD. Irreparable memories using self-contained spares can become KGDs by using a post-stacking redundancy-sharing strategy. After pre-bond testing and repair, the memory dies are categorized into three groups: fault-free dies or dies that can be repaired by self-contained redundancies, dies that can be repaired by shared redundancies, and irreparable dies. To repair dies by shared redundancies, the required number of redundancies of these dies should be equal to or less than the number of remaining redundancies of dies in the first category [Reda et al. 2009; Chou et al. 2010; Lee et al. 2011; Taouil et al. 2011; Chi et al. 2012;


Fig. 18. Spare architectures of die-share redundancy scheme.

Lu et al. 2012; Wu et al. 2012; Chou et al. 2013]. For this purpose, the relevant numbers of redundancies in both categories are stored and compared.

6.1.1. Die-Share Redundancy Scheme [Jiang et al. 2010; Chou et al. 2010]. In 3D stacked memories especially, the yield can be seriously damaged by faulty dies. One very effective solution is to share redundancies between two neighboring dies. This method helps to improve the 3D memory yield by obtaining more KGDs.

Similarly to the fixed redundancy scheme, the die-share redundancy scheme has limited usage in each test phase but can share the redundancies between adjacent dies through the TSVs. Figure 18 shows the spare architectures that provide the die-share redundancy scheme in 3D stacked memories. In the fundamental architecture (Figure 18(a)), the sharing of all of the redundancies between two adjacent memory dies requires the number of total redundancies (obtained by summing the number of SR and SC TSVs). Figure 18(b) shows a modified architecture that shares the TSVs of multiple redundancies to conserve the hardware overhead. The die-share redundancy scheme provides the necessary flexibility for repairing memory dies, but the matching algorithms cannot analyze faulty cells that occur after the memory die stacking. Because this redundancy scheme cannot share redundancies between the pre-bond and post-bond phases, it is rather ineffective in the post-bond phase.

6.1.2. Post-Share Redundancy Scheme [Lee et al. 2014]. Because the repair processes of 3D stacked memories consist of a pre-bond phase and a post-bond phase, it is vital to allocate shared redundancies in the proper repair phase. The post-share redundancy scheme selectively utilizes shared redundancies and thereby gains an advantage in repair rate over previously studied redundancy schemes.

The post-share redundancy scheme repairs each memory die in the pre-bond phase with its own redundancies. To avoid excessive redundancies on clustered faulty cells, some parts of the redundancies are reserved for post-bond repair. The post-bond repair process uses all of the unused redundancies remaining after the pre-bond repair phase. Any faulty cells detected in the unused redundancies are masked in the pre-bond phase to prevent their usage in the post-bond repair process.

Figure 19 illustrates the repair process by the post-share redundancy scheme. The memory cells are arrayed into three spare rows and three spare columns (i.e., SR = 3 and SC = 3). Two of the spare rows and two of the spare columns are allocated to pre-bond repair; the remainder are reserved for post-bond repair. Figure 19(a) shows


Fig. 19. Example of post-share redundancy scheme.

the pre-bond repair process of the given faulty memory. The nine faulty cells uncovered in the pre-bond test are repaired by both spare rows and one spare column. Since only three redundancies are used in the pre-bond repair process, the unused spare column is available for post-bond repair. The post-bond test reveals an additional four faulty cells [(1, 0), (1, 2), (6, 2), and (7, 7)], which can be repaired by allocating all of the remaining redundancies (see Figure 19(b)). This faulty memory is reparable because the redundancies are shared in the post-bond repair of the post-share redundancy scheme.

6.2. Integration Methods

Wafer-to-wafer (WTW), die-to-wafer (DTW), and die-to-die (DTD) stacking methods can be used to manufacture 3D integrated circuits. The characteristics of the stacking methods influence the final yield of 3D integrated circuits. The WTW method directly stacks the entire wafers. WTW integration is advantageous because of the simplicity of its manufacturing process, but the yield losses are relatively large. Because the entire wafers are stacked without a cutting process, a good die within one wafer can be aligned with a bad die in another wafer. The yield of WTW integration can be improved by numerous heuristic and comprehensive algorithms [Reda et al. 2009; Taouil and Hamdioui 2012]. The DTW and DTD methods select only good dies for integration, which improves their yields but complicates their manufacturing processes. Die-selection methods with matching algorithms for DTW and DTD integration have also been proposed [Chou et al. 2010; Lee et al. 2011; Chi et al. 2012; Lu et al. 2012; Wu et al. 2012; Chou et al. 2013; Kang et al. 2013].

6.2.1. Iterative Matching Heuristic (IMH) [Reda et al. 2009]. Despite its simple implementation, the wafer-to-wafer stacking method can cause serious yield loss. The IMH applies an iterative matching algorithm to the wafer-stacking problem to solve this issue.

The IMH stacks K wafer lots, forming groups of multiple wafer maps. Denoting one group of N wafer maps as wafer lot Li, the K wafer lots can be expressed as

K = {L1, L2, . . . , LK}. (10)


Fig. 20. Optimal integration algorithm between two wafer lots.

The IMH first stacks two wafer lots {L1, L2}; that is, K = 2. In this case, the optimal solution is obtained by the graph-theoretical method illustrated in Figure 20.

Figure 20 shows a bipartite graph with 2N vertices and N^2 edges. The left and right sets, each of N vertices, denote the first wafer lot L1 and the second wafer lot L2, respectively. Each edge indicates the total number of good dies generated by stacking the two wafer maps at its end points. The yield is then maximized by the Hungarian algorithm [Munkres et al. 1957]. The optimal stacking of two wafer lots is defined by a new operator ⊕. Thus, a new set of wafer maps is generated by computing L1 ⊕ L2. The IMH then iteratively expands this operation to the entire case of K wafer lots. The IMH output of K wafer lots is given by

Li1 ⊕ Li2 ⊕ · · · ⊕ LiK. (11)

The values from i1 to iK are ordered as follows: starting from an arbitrarily selected wafer lot Li1, the jth lot Lij is chosen to generate the largest number of good dies when employed in Equation (11) with K = j − 1. The IMH heuristically solves the WTW integration problem with a runtime of O(K^2 N^3) and a memory requirement of O(N^2).
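For the K = 2 base case, the matching can be sketched directly. In the illustrative sketch below, brute force over permutations stands in for the Hungarian algorithm (which finds the same optimum in O(N^3)); the yield table is a hypothetical example, not data from the paper.

```python
from itertools import permutations

def best_pairing(yield_table):
    """Optimal stacking of two wafer lots (the K = 2 base case of IMH).
    yield_table[i][j] = number of good dies obtained by stacking
    wafer i of lot 1 on wafer j of lot 2."""
    n = len(yield_table)
    best, best_perm = -1, None
    for perm in permutations(range(n)):       # every one-to-one pairing
        total = sum(yield_table[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return best, list(best_perm)
```

The returned permutation is the edge set of maximum total weight in the bipartite graph of Figure 20; iterating this pairing over the remaining lots gives the ⊕ chain of Equation (11).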

6.2.2. Die-Selection Method Using Three Search-Space Conditions [Lee et al. 2011]. Previous die-selection methods focus on sharing redundancies only between adjacent memory dies. In contrast, the die-selection method using three search-space conditions considers all memory dies and can therefore greatly improve the 3D memory yield.

After the pre-bond phase, the categorized memory dies are stored in a memory die classification map. The memory die with the most difficult reparable condition in the classification map is selected as the target die. A counterpart die to the target die is then sought by a die-selection method. The search space of the counterpart die includes (1) the number of available spare rows (columns) RT (CT) of the target die after the pre-bond phase, (2) the number of spare rows (columns) RC (CC) of the counterpart die after the pre-bond phase, and (3) the total number of spare rows (columns) RS (CS) in

ACM Computing Surveys, Vol. 49, No. 3, Article 47, Publication date: October 2016.

47:34 K. Cho et al.

Fig. 21. Example of die-selection method over three search spaces.

each die. A negative RT (CT, RC, or CC) indicates that the memory die needs additional spares for the repair process. The condition of search space (1) is then defined as

{−RT ≤ RC ≤ RS} (12)

and that of search space (2) is defined as

{−CT ≤ CC ≤ CS}. (13)

Equations (12) and (13) imply that the available spare rows (columns) of the target die and the counterpart die should sum to a positive number or 0. For example, if RT is −1, then the target die can be repaired by one additional spare row. Therefore, the counterpart die should have at least one available spare row; that is, RC should be at least 1. The condition of search space (3) relates to the characteristics of single-cell faults. A single-cell fault with a row–column address that differs from the addresses of all of the other faulty cells can be repaired by either a spare row or a spare column. Therefore, the memory die can decide which type of spare should be used to repair single-cell faults in the last phase of the repair process. The condition of search space (3) is defined as

{−RT − CT + ST + SC ≤ RC + CC ≤ RS + CS}, (14)

where ST and SC are the numbers of single-cell faults in the target and counterpart dies, respectively.

Figure 21 demonstrates the search process of a counterpart die under the three search-space conditions. Figure 21(a) shows a memory die classification map with RS = 1 and CS = 2. SSR (SSC) indicates the current number of available spare rows (columns) after the pre-bond phase. Each memory die within this map is classified by its status after the pre-bond phase. Dies with no faulty cells after the pre-bond phase are classified as fault-free. Dies that can be repaired with their own redundancies are designated as self-reparable. Dies that can be repaired only if redundancies are available in other dies are inter-reparable dies. Irreparable dies cannot be repaired under any circumstances. Figure 21(b) applies the die-selection method to four memory dies, A, B, C, and D. The subscripts indicate the numbers of single-cell faults of each die. In this example, A, C, and D possess no single-cell faults, and B has a single-cell fault, so the dies are expressed as A0, B1, C0, and D0. Under search-space conditions (1) and (2), memory die A0 can be matched to C0 or D0 (indicated by the bold arrow), but memory die B1 cannot find a counterpart die. At this point, memory die B1 changes its status under search-space condition (3). If the single-cell fault of B is repaired with spare rows, then B's status changes to BR; otherwise, its status changes to BC. Because BC finds a counterpart C0, the final solution of the die-selection method is (BC, C0) and (A0, D0).
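The three search-space conditions of Equations (12)–(14) reduce to simple interval checks on the spare counts. The sketch below encodes them directly; the dictionary representation of a die, the function names, and the example values are our own illustration, not taken from Figure 21.

```python
def condition1(rt, rc, rs):
    """Search-space condition (1), Eq. (12): spare-row interval check."""
    return -rt <= rc <= rs

def condition2(ct, cc, cs):
    """Search-space condition (2), Eq. (13): spare-column interval check."""
    return -ct <= cc <= cs

def condition3(rt, ct, st, sc, rc, cc, rs, cs):
    """Search-space condition (3), Eq. (14): single-cell faults (ST, SC)
    can be covered by either a spare row or a spare column, so they are
    added on top of the combined row/column demand."""
    return -rt - ct + st + sc <= rc + cc <= rs + cs

def is_counterpart(target, cand, rs, cs):
    """True if `cand` satisfies all three conditions for `target`.
    Each die is a dict with its available spare rows r and columns c
    after the pre-bond phase and its single-cell fault count s."""
    return (condition1(target["r"], cand["r"], rs)
            and condition2(target["c"], cand["c"], cs)
            and condition3(target["r"], target["c"], target["s"],
                           cand["s"], cand["r"], cand["c"], rs, cs))

# A target die that needs one extra spare row (RT = -1), with RS = 1, CS = 2.
target = {"r": -1, "c": 0, "s": 0}
print(is_counterpart(target, {"r": 1, "c": 1, "s": 0}, rs=1, cs=2))  # True
print(is_counterpart(target, {"r": 0, "c": 1, "s": 0}, rs=1, cs=2))  # False
```

The second candidate fails condition (1): with RT = −1 the counterpart must contribute at least one spare row, but it has none.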


6.3. 3D BIRA Algorithms

Chi et al. [2012] proposed a BISR architecture with shared redundancies. The 3D memory in this architecture consists of a logic die and memory dies. The authors dedicated a local BIST and BIRA to each memory die and equipped the logic die with a global spare assignment unit and an address remapping unit (i.e., a central BIRA). In the pre-bond test and repair phase, each memory die is concurrently tested and repaired by its dedicated BISR. In the post-bond test and repair phase, the global spare assignment unit allocates spares to faulty cells considering each die's conditions. Wu et al. [2012] also dedicated a local BIST and BIRA to each memory die in a BISR architecture, but their logic die has an additional redundant memory in the central BIRA. Since the KGDs compose a known-good stack (KGS), the single/multiple-stacked dies are evaluated by a KGS test. If the memory dies can be repaired by the redundant memory in the logic die, then the integrity of the KGS is maintained, and the faulty cells are replaced by the redundant memory.

The simulator of Lin et al. [2013] evaluates 3D redundancy architectures by calculating their repair rates under different design constraints. The redundancy constraints are local, global, and extended. Local redundancies repair only their own memory blocks, whereas global redundancies can be shared among multiple blocks. Extended redundancies can repair several memory blocks simultaneously. When provided with the redundancy constraints and failure distributions, the simulator automatically searches among the various memory architectures for the specific redundancy architecture that maximizes the repair rate.

Lin et al. [2014] also proposed 3D redundancy architectures, referred to as cubical redundancy architectures 1 and 2 (CRA1 and CRA2). In these architectures, the BIRA module is replaced by a word-based RA algorithm using BIST and address remapping unit (ARU) circuits during the test mode. In CRA1, spare elements are placed in each DRAM; in CRA2, the memory dies are set to SRAM.

Anigundi et al.'s [2009] inter-die inter-sub-array redundancy repair algorithm improves the repair rate of 3D BIRA algorithms. To reduce the burdens on the memory system, such as the interconnect cost, the area overhead of the control logic, and the large memory access latency, redundancy sharing is usually restricted to two adjacent layers of the stacked memories. To overcome these difficulties, Anigundi's approach focuses on the direction of redundancy sharing: redundancy sharing among memory sub-arrays is configured vertically rather than horizontally. This approach increases the repair rate while maintaining relatively small interconnect costs.

Kang et al. [2015] proposed a 3D BISR scheme with both parallel and serial test-repair phases. This 3D BISR scheme handles multiple memories by using a single BIRA module. During the parallel test-repair phase, all of the memory dies are tested in parallel and re-arranged in descending order according to their numbers of faulty cells. The re-arranged memory dies are then sequentially repaired by the BIRA module. Because fault-free memories need not be repaired, the analysis time of the serial test-repair phase is reduced.
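The ordering step of this serial test-repair phase can be sketched as follows. Only the descending fault-count order and the skipping of fault-free dies come from the description above; the list-of-fault-counts representation and the function name are our own.

```python
def serial_repair_order(fault_counts):
    """Order in which a single shared BIRA module visits the dies in a
    serial test-repair phase of the kind described by Kang et al. [2015]:
    descending fault count, with fault-free dies dropped entirely
    (they need no repair). Takes per-die fault counts indexed by die
    number and returns the die indices to repair, in order."""
    ranked = sorted(enumerate(fault_counts), key=lambda p: -p[1])
    return [die for die, faults in ranked if faults > 0]

# Five dies with 0, 5, 2, 0, and 7 faulty cells, respectively.
print(serial_repair_order([0, 5, 2, 0, 7]))  # [4, 1, 2]
```

Dies 0 and 3 are fault-free and never enter the serial phase, which is exactly where the analysis-time saving comes from.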

6.4. Summary of 3D Techniques

This section discussed emerging issues in 3D memory repair techniques. The challenges of 3D memory repair will become more important in future research, since the downscaling of 2D memory will eventually reach its limit. The yield is much more sensitive to memory failure in 3D integrated circuits than in 2D integrated circuits. Therefore, 3D memory repair techniques will become increasingly necessary in the semiconductor field. The basic concepts of 3D memory repair techniques, namely the various redundancy schemes, integration methods, and 3D BIRA algorithms, were briefly introduced in this section.


7. SUMMARY AND CONCLUSION

This article surveyed the issues and challenges facing the analysis and repair of faulty memories. Yield enhancement techniques that repair faulty memories using redundant elements were described, and representative methods were discussed in detail. These methods include software-based RA algorithms in ATEs and hardware-based BIRA algorithms. The three main performance criteria of RA and BIRA algorithms are their analysis speeds, repair rates, and area overheads. Although heuristic RA algorithms are fast, they cannot repair all reparable memories. In contrast, RA algorithms based on exhaustive searching can guarantee complete memory repair; however, they require long execution times. To optimize the repair rate, an algorithm must store all of the fault information; however, the area overhead required to do so may be prohibitive. Therefore, there are tradeoffs among the three performance criteria, and improving the performance of RA and BIRA algorithms is the focus of current research in this field.

This article also described memory repair processes and spare architectures. If a faulty cell is detected when applying test patterns to the memory, then the faulty cell information should be saved to find a repair solution. After the memory test, all of the fault information is stored and analyzed by RA algorithms, which find a repair solution if the memory is reparable. Based on the repair solution, the faulty cells are then repaired by redundant storage elements. The test and repair times are minimized by a preprocessing/filter algorithm, applied either during fault collection or before the fault information is analyzed by the RA algorithm. The repair mechanism depends on the spare architecture. If a faulty cell is repaired by a bit-by-bit approach or in a 1D spare architecture, then no RA procedure is required. Although the complexity of RA using a line-by-line approach in a 2D spare architecture is NP-complete, 2D spare architectures are widely used because of their superior repair efficiencies.

The repair processes of various existing RA algorithms were illustrated by applying the algorithms to the same faulty memory problem. Numerous BIRA schemes for 3D memory were also described in this article. Because 3D memory is constructed by using KGDs, its yield depends on the 2D memory yield. Bad dies can be repaired by sharing inter-die redundancies. Therefore, die selection, matching algorithms, and 3D BIRA architectures were investigated. All these approaches attempt to extend the conventional RA algorithms from 2D memory to 3D memory.

In conclusion, these various RA and BIRA algorithms share a common goal of achieving higher memory yields and optimal performance criteria. An ideal algorithm would achieve perfect performance criteria, namely, an optimal repair rate, zero analysis time, and no area overhead. Although perfect algorithms are practically impossible, the examples in this survey are representative attempts at finding the best algorithms. Further optimization will require additional analysis and repair methods for faulty memories. In addition, although defect mechanisms depend on the memory technology, memory repair that deals with memory cells at the logic level can be applied to any memory technology. Therefore, we expect that this survey will contribute to the development of novel algorithms, new spare architectures, and new techniques for employing the existing modules and future technologies in memory devices.

REFERENCES

R. Anigundi, H. Sun, J. Q. Lu, K. Rose, and T. Zhang. 2009. Architecture design exploration of three-dimensional (3D) integrated DRAM. In Proceedings of the 10th International Symposium on Quality of Electronic Design. 86–90.

I. Bayraktaroglu, O. Caty, and Y. Wong. 2005. Highly configurable programmable built-in self test architecture for high-speed memories. In Proceedings of the IEEE VLSI Test Symposium. 21–26.


P. Bernardi, L. M. Ciganda, E. Sanchez, and M. S. Reorda. 2013. MIHST: A hardware technique for embedded microprocessor functional on-line self-test. IEEE Trans. Comput. 63, 11, 2760–2771.

D. K. Bhavsar. 1999. An algorithm for row-column self-repair of RAMs and its implementation in the Alpha 21264. In Proceedings of the International Test Conference. 311–318.

B. H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 13, 7, 422–426.

S. Boutobza, M. Nicolaidis, K. L. Lamara, and A. Costa. 2005. Programmable memory BIST. In Proceedings of the International Test Conference. 1155–1164.

G. Campardo, M. Scotti, S. Scommegna, S. Pollara, and A. Silvagni. 2003. An overview of flash architectural developments. Proc. IEEE 91, 4, 523–536.

H. Cao, M. Liu, H. Chen, X. Zheng, C. Wang, and Z. Wang. 2012. Efficient built-in self-repair strategy for embedded SRAM with selectable redundancy. In Proceedings of the 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet). 2565–2568.

D.-M. Chang, J. F. Li, and Y. J. Huang. 2008. A built-in redundancy-analysis scheme for random access memories with two-level redundancy. J. Electron. Test.: Theor. Appl. 24, 1–3, 181–192.

T. J. Chen, J. F. Li, and T. W. Tseng. 2012. Cost-efficient built-in redundancy analysis with optimal repair rate for RAMs. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 31, 6, 930–940.

C. C. Chi, Y. F. Chou, D. M. Kwai, Y. Y. Hsial, C. W. Wu, Y. T. Hsing, L. M. Denq, and T. H. Lin. 2012. 3D-IC BISR for stacked memories using cross-die spares. In Proceedings of the International Symposium on VLSI Design, Automation, and Test (VLSI-DAT). 1–4.

H. Cho, W. Kang, and S. Kang. 2010. A built-in redundancy analysis with a minimized binary search tree. ETRI J. 32, 4, 638–641.

H. Cho, W. Kang, and S. Kang. 2012. A fast redundancy analysis algorithm in ATE for repairing faulty memories. ETRI J. 34, 3, 478–481.

Y. F. Chou, D. M. Kwai, and C. W. Wu. 2009. Memory repair by die stacking with through silicon vias. In Proceedings of the IEEE International Workshop on Memory Technology, Design, and Testing. 53–58.

C. W. Chou, Y. J. Huang, and J. F. Li. 2010. Yield-enhancement techniques for 3D random access memories. In Proceedings of the International Symposium on VLSI Design Automation and Test (VLSI-DAT). 104–107.

C. W. Chou, Y. J. Huang, and J. F. Li. 2013. A built-in self-repair scheme for 3-D RAMs with interdie redundancy. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 32, 4, 572–583.

J. Chung, J. Park, J. A. Abraham, E. Byun, and C. J. Woo. 2010. Reducing test time and area overhead of an embedded memory array built-in repair analyzer with optimal repair rate. In Proceedings of the 28th VLSI Test Symposium (VTS). 33–38.

J. Chung, J. Park, and J. A. Abraham. 2013. A built-in repair analyzer with optimal repair rate for word-oriented memories. IEEE Trans. VLSI Syst. 21, 2, 281–291.

W. R. Davis, J. Wilson, S. Mick, J. Xu, H. Hua, C. Mineo, A. M. Sule, M. Steer, and P. D. Franzon. 2005. Demystifying 3D ICs: The pros and cons of going vertical. IEEE Des. Test Comput. 22, 6, 498–510.

J. R. Day. 1985. A fault-driven, comprehensive redundancy algorithm. IEEE Design Test Comput. 2, 3, 35–44.

X. Du, S. M. Reddy, W. T. Cheng, J. Rayhawk, and N. Mukherjee. 2004. At-speed built-in self-repair analyzer for embedded word-oriented memories. In Proceedings of the 17th International Conference on VLSI Design. 895–900.

R. W. Haddad, A. T. Dahbura, and A. B. Sharma. 1991. Increased throughput for the testing and repair of RAMs with redundancy. IEEE Trans. Comput. 40, 2, 154–166.

V. G. Hemmady and S. M. Reddy. 1989. On the repair of redundant RAMs. In Proceedings of the 26th Conference on Design Automation. 710–713.

C. S. Hou and J. F. Li. 2015. High repair-efficiency BISR scheme for RAMs by reusing bitmap for bit redundancy. IEEE Trans. VLSI Syst. 23, 9, 1720–1728.

Y. Y. Hsiao, C. H. Chen, and C. W. Wu. 2006. A built-in self-repair scheme for NOR-type flash memory. In Proceedings of the 24th IEEE VLSI Test Symposium. 114–119.

Y. Y. Hsiao, C. H. Chen, and C. W. Wu. 2010. Built-in self-repair scheme for flash memories. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 29, 8, 1243–1256.

C. H. Hsu and S. K. Lu. 2002. Fault-tolerance design of memory systems based on DBL structures. In Proceedings of the 2002 Asia-Pacific Conference on Circuits and Systems 1, 221–224.

W. K. Huang, Y. N. Shen, and F. Lombardi. 1990. New approaches for the repairs of memories with redundancy by row/column deletion for yield enhancement. IEEE Trans. Comput.-Aid. Des. 9, 3, 323–328.

C. T. Huang, J. R. Huang, C. F. Wu, C. W. Wu, and T. Y. Chang. 1999. A programmable core BIST for embedded DRAM. IEEE Des. Test Comput. 16, 1, 59–70.


R. F. Huang, J. F. Li, J. C. Yeh, and C. W. Wu. 2002. A simulator for evaluating redundancy analysis algorithms of repairable embedded memories. In Proceedings of the 2002 IEEE International Workshop on Memory Technology, Design and Testing (MTDT). 68–73.

C. T. Huang, C. F. Wu, J. F. Li, and C. W. Wu. 2003. Built-in redundancy analysis for memory yield improvement. IEEE Trans. Reliabil. 52, 4, 386–399.

R. F. Huang, C. L. Su, C. W. Wu, S. T. Lin, K. L. Luo, and Y. J. Chang. 2004. Fail pattern identification for memory built-in self-repair. In Proceedings of the 13th Asian Test Symposium. 366–371.

C. D. Huang, T. W. Tseng, and J. F. Li. 2006. An infrastructure IP for repairing multiple RAMs in SOCs. In Proceedings of the International Symposium on VLSI Design, Automation and Test (VLSI-DAT). 1–4.

Y. J. Huang, D. M. Chang, and J. F. Li. 2006. A built-in redundancy-analysis scheme for self-repairable RAMs with two-level redundancy. In Proceedings of the 21st IEEE International Symposium on Defect and Fault-Tolerance in VLSI Systems. 362–370.

C. D. Huang, J. F. Li, and T. W. Tseng. 2007. ProTaR: An infrastructure IP for repairing RAMs in System-on-Chips. IEEE Trans. VLSI Syst. 15, 10, 1135–1143.

R. F. Huang, C. H. Chen, and C. W. Wu. 2007. Economic aspects of memory built-in self-repair. IEEE Des. Test Comput. 24, 2, 164–172.

A. A. Hwang, I. A. Stefanovici, and B. Schroeder. 2012. Cosmic rays don't strike twice: Understanding the nature of DRAM errors and the implications for system design. In Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). 111–122.

International Technology Roadmap for Semiconductors (ITRS). 2011. Semiconductor Industry Association, San Jose, CA. Retrieved from http://www.itrs.net/Links/2011ITRS/Home2011.htm.

W. Jeong, I. Kang, K. J. Jin, and S. Kang. 2009. A fast built-in redundancy analysis for memories with optimal repair rate using a line-based search tree. IEEE Trans. VLSI Syst. 17, 12, 1665–1678.

W. Jeong, J. Lee, T. Han, K. Lee, and S. Kang. 2010. An advanced BIRA for memories with an optimal repair rate and fast analysis speed by using a branch analyzer. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 29, 12, 2014–2026.

L. Jiang, R. Ye, and Q. Xu. 2010. Yield enhancement for 3D-stacked memory by redundancy sharing across dies. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 230–234.

I. Kang, W. Jeong, and S. Kang. 2008. High-efficient memory BISR with two serial RA stages using spare memories. Electron. Lett. 44, 8, 515–517.

U. Kang, H. J. Chung, S. Heo, D. H. Park, H. Lee, J. H. Kim, S. H. Ahn, S. H. Cha, J. Ahn, D. Kwon, J. W. Lee, H. S. Joo, W. S. Kim, D. H. Jang, N. S. Kim, J. H. Choi, T. G. Chung, J. H. Yoo, J. S. Choi, C. Kim, and Y. H. Jun. 2010. 8Gb 3-D DDR3 DRAM using through-silicon-via technology. IEEE J. Solid-State Circ. 45, 1, 111–119.

W. Kang, C. Lee, K. Cho, and S. Kang. 2013. A die selection and matching method with two stages for yield enhancement of 3-D memories. In Proceedings of the 22nd Asian Test Symposium. 301–306.

W. Kang, H. Cho, J. Lee, and S. Kang. 2014. A BIRA for memories with an optimal repair rate using spare memories for area reduction. IEEE Trans. VLSI Syst. 22, 11, 2336–2349.

W. Kang, C. Lee, H. Lim, and S. Kang. 2015. A 3 dimensional built-in self-repair scheme for yield improvement of 3 dimensional memories. IEEE Trans. Reliabil. 64, 2, 586–595.

A. Karandikar and K. K. Parhi. 1998. Low power SRAM design using hierarchical divided bit-line approach. In Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors. 82–88.

T. Kawagoe, J. Ohtani, M. Niiro, T. Ooishi, M. Hamada, and H. Hidaka. 2000. A built-in self-repair analyzer (CRESTA) for embedded DRAMs. In Proceedings of the International Test Conference. 567–574.

S. Khan, D. Lee, Y. Kim, A. R. Alameldeen, C. Wilkerson, and O. Mutlu. 2014. The efficacy of error mitigation techniques for DRAM retention failures: A comparative experimental study. In Proceedings of the 2014 ACM International Conference on Measurement and Modeling of Computer Systems. 519–532.

I. Kim, Y. Zorian, G. Komoriya, H. Pham, F. P. Higgins, and J. L. Lewandowski. 1998. Built in self repair for embedded high density SRAM. In Proceedings of the International Test Conference. 1112–1119.

Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, and O. Mutlu. 2014. Flipping bits in memory without accessing them: An experimental study of DRAM disturbance errors. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA). 361–372.

J. U. Knickerbocker, C. S. Patel, P. S. Andry, C. K. Tsang, P. Buchwalter, E. J. Sprogis, H. Gan, R. R. Horton, R. J. Polastre, S. L. Wright, and J. M. Cotte. 2006. 3-D silicon integration and silicon packaging technology using silicon through-vias. IEEE J. Solid-State Circ. 41, 8, 1718–1725.


S. Y. Kuo and K. F. Fuchs. 1987. Efficient spare allocation for reconfigurable arrays. IEEE Design and Test of Computers 4, 1, 24–31.

H. H. S. Lee and K. Chakrabarty. 2009. Test challenges for 3D integrated circuits. IEEE Des. Test Comput. 26, 5, 26–35.

M. J. Lee and K. W. Park. 2010. A mechanism for dependence of refresh time on data pattern in DRAM. IEEE Electron. Dev. Lett. 31, 2, 168–170.

J. Lee, K. Park, and S. Kang. 2011. A die-selection method using search-space conditions for yield enhancement in 3D memory. ETRI J. 33, 6, 904–913.

M. Lee, L. M. Denq, and C. W. Wu. 2011. A memory built-in self-repair scheme based on configurable spares. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 30, 6, 919–929.

C. Lee, W. Kang, D. Cho, and S. Kang. 2014. A new fuse architecture and a new post-share redundancy scheme for yield enhancement in 3-D-stacked memories. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 33, 5, 786–797.

J. F. Li, J. C. Yeh, R. F. Huang, and C. W. Wu. 2003. A built-in self-repair scheme for semiconductor memories with 2D redundancy. In Proceedings of the International Test Conference. 393–402.

J. F. Li, J. C. Yeh, R. F. Huang, and C. W. Wu. 2005. A built-in self-repair design for RAMs with 2D redundancy. IEEE Trans. VLSI Syst. 13, 6, 742–745.

Y. Li, O. Mutlu, D. S. Gardner, and S. Mitra. 2010. Concurrent autonomous self-test for uncore components in system-on-chips. In Proceedings of the 28th VLSI Test Symposium (VTS). 232–237.

Y. Li, E. Cheng, S. Makar, and S. Mitra. 2013. Self-repair of uncore components in robust system-on-chips: An OpenSPARC T2 case study. In Proceedings of the 2013 IEEE International Test Conference (ITC). 1–10.

H. Y. Lin, F. M. Yeh, and S. Y. Kuo. 2006. An efficient algorithm for spare allocation problems. IEEE Trans. Reliabil. 55, 2, 369–378.

G. Q. Lin, Z. Y. Wang, and S. K. Lu. 2009. Built-in self-repair techniques for content addressable memories. In Proceedings of the International Symposium on VLSI Design, Automation and Test (VLSI-DAT). 267–270.

C. H. Lin, D. Y. Shen, Y. J. Chen, C. L. Yang, and M. Wang. 2012. SECRET: Selective error correction for refresh energy reduction in DRAMs. In Proceedings of the 2012 IEEE 30th International Conference on Computer Design (ICCD). 67–74.

B. Y. Lin, M. Lee, and C. W. Wu. 2013. Exploration methodology for 3D memory redundancy architectures under redundancy constraints. In Proceedings of the 22nd Asian Test Symposium. 1–6.

B. Y. Lin, W. T. Chiang, C. W. Wu, M. Lee, H. C. Lin, C. N. Peng, and M. J. Wang. 2014. Redundancy architectures for channel-based 3D DRAM yield improvement. In Proceedings of the International Test Conference. 1–7.

J. Liu, B. Jaiyen, R. Veras, and O. Mutlu. 2012. RAIDR: Retention-aware intelligent DRAM refresh. In Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA). 1–12.

J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, and O. Mutlu. 2013. An experimental study of data retention behavior in modern DRAM devices: Implications for retention time profiling mechanisms. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA). 60–71.

C. P. Low and H. W. Leong. 1996. A new class of efficient algorithms for reconfiguration of memory arrays. IEEE Trans. Comput. 45, 5, 614–618.

S. K. Lu and S. C. Huang. 2004. Built-in self-test and repair (BISTR) techniques for embedded RAMs. In Proceedings of the Records of the 2004 International Workshop on Memory Technology, Design and Testing. 60–64.

S. K. Lu, Y. C. Tsai, and S. C. Huang. 2005. A BIRA algorithm for embedded memories with 2D redundancy. In Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing. 121–126.

S. K. Lu, Y. C. Tsai, C. H. Hsu, K. H. Wang, and C. W. Wu. 2006. Efficient built-in redundancy analysis for embedded memories with 2D redundancy. IEEE Trans. VLSI Syst. 14, 1, 34–42.

S. K. Lu, C. L. Yang, and H. W. Lin. 2006. Efficient BISR techniques for word-oriented embedded memories with hierarchical redundancy. In Proceedings of the IEEE/ACIS International Conference on Computer and Information Science and 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, Software Architecture and Reuse. 355–360.

S. K. Lu, C. L. Yang, Y. C. Hsiao, and C. W. Wu. 2009. Efficient BISR techniques for embedded memories considering cluster faults. IEEE Trans. VLSI Syst. 18, 2, 184–193.

S. K. Lu, Z. Y. Wang, Y. M. Tsai, and J. L. Chen. 2012. Efficient built-in self-repair techniques for multiple repairable embedded RAMs. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 31, 4, 620–629.

S. K. Lu, T. W. Chang, and H. Y. Hsu. 2012. Yield enhancement techniques for 3-dimensional random access memories. Microelectron. Reliabil. 52, 6, 1065–1070.


J. A. Mandelman, R. H. Dennard, G. B. Bronner, J. K. DeBrosse, R. Divakaruni, Y. Li, and C. J. Radens. 2002. Challenges and future directions for the scaling of dynamic random-access memory (DRAM). IBM J. Res. Dev. 46, 2.3, 187–212.

S. Matarress and L. Fasoli. 2001. A method to calculate redundancy coverage for FLASH memories. In Proceedings of the IEEE International Workshop on Memory Technology, Design and Testing. 41–44.

R. McConnell and R. Rajsuman. 2001. Test and repair of large embedded DRAMs. 1. In Proceedings of the International Test Conference. 163–172.

J. Meza, Q. Wu, S. Kumar, and O. Mutlu. 2015. Revisiting memory errors in large-scale production data centers: Analysis and modeling of new trends from the field. In Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 415–426.

M. Mihara, T. Nakayama, M. Ohkawa, S. Kawai, Y. Miyawaki, Y. Terada, M. Ohi, H. Onoda, M. Hatanaka, H. Miyoshi, and T. Yoshihara. 1994. Row-redundancy scheme for high-density flash memory. In Proceedings of the IEEE International Solid-State Circuits Conference, Digest of Technical Papers. 150–151.

B. Mohammad. 2015. Embedded memory interface logic and interconnect testing. IEEE Trans. VLSI Syst. 23, 9, 1946–1950.

Y. Mori, K. Ohyu, K. Okonogi, and R. I. Yamada. 2005. The origin of variable retention time in DRAM. In Proceedings of the 2005 IEEE International Electron Devices Meeting (IEDM) Technical Digest. 1034–1037.

J. Munkres. 1957. Algorithms for the assignment and transportation problems. J. Soc. Industr. Appl. Math. 5, 1, 32–38.

O. Mutlu. 2013. Memory scaling: A systems architecture perspective. In Proceedings of the 5th IEEE International Memory Workshop. 21–25.

P. J. Nair, D. Kim, and M. K. Qureshi. 2013. ArchShield: Architectural framework for assisting DRAM scaling by tolerating high error rates. In Proceedings of the 40th Annual International Symposium on Computer Architecture (ISCA). 72–83.

E. Nelson, J. Dreibelbis, and R. McConnell. 2001. Test and repair of large embedded DRAMs. 2. In Proceedings of the International Test Conference. 173–181.

B. Noia and K. Chakrabarty. 2011. Testing and design-for-testability techniques for 3D integrated circuits. In Proceedings of the 20th Asian Test Symposium. 474–479.

P. Ohler, S. Hellebrand, and H. Wunderlich. 2007. An integrated built-in test and repair approach for memories with 2D redundancy. In Proceedings of the 12th IEEE European Test Symposium. 91–96.

M. Ottavi, S. Luca, X. Wang, Y. B. Kim, E. J. Meyer, and F. Lombardi. 2004. Yield evaluation methods of SRAM arrays: A comparative study. In Proceedings of the 21st IEEE Instrumentation and Measurement Technology Conference 2, 1525–1530.

K. Pagiamtzis and A. Sheikholeslami. 2006. Content-addressable memory (CAM) circuits and architectures: A tutorial and survey. IEEE J. Solid-State Circ. 41, 3, 712–727.

V. F. Pavlidis and E. G. Friedman. 2009. Interconnect-based design methodologies for three-dimensional integrated circuits. Proc. IEEE 97, 1, 123–140.

M. K. Qureshi, D. H. Kim, S. Khan, P. J. Nair, and O. Mutlu. 2015. AVATAR: A variable-retention-time (VRT) aware refresh for DRAM systems. In Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 427–437.

S. Reda, S. Gregory, and S. Larry. 2009. Maximizing the functional yield of wafer-to-wafer 3-D integration. IEEE Trans. VLSI Syst. 17, 9, 1357–1362.

S. Schechter, G. Loh, K. Strauss, and D. Burger. 2010. Use ECP, not ECC, for hard failures in resistive memories. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA). 141–152.

V. Schober, S. Paul, and O. Picot. 2001. Memory built-in self-repair using redundant words. In Proceedings of the International Test Conference. 995–1001.

B. Schroeder, E. Pinheiro, and W. D. Weber. 2009. DRAM errors in the wild: A large-scale field study. In Proceedings of the 11th Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS). 193–204.

B. M. Shekar, K. R. Sumanth, and V. V. S. Sateesh. 2011. Built-in self-repair for multiple RAMs with different redundancies in a SOC. Int. J. Comput. Appl. 24, 8, 26–29.

W. Shi and W. K. Fuchs. 1992. Probabilistic analysis and algorithms for reconfiguration of memory arrays. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 11, 9, 1153–1160.

S. Shoukourian, V. Vardanian, and Y. Zorian. 2001. An approach for evaluation of redundancy algorithms. In Proceedings of the IEEE International Workshop on Memory Technology, Design and Testing. 51–55.

A. Silvagni, G. Fusillo, R. Ravasio, M. Picca, and S. Zanardi. 2003. An overview of logic architectures inside flash memory devices. Proc. IEEE 91, 4, 569–580.


Y. H. Son, S. Lee, O. Seongil, S. Kwon, N. S. Kim, and J. H. Ahn. 2015. CiDRA: A cache-inspired DRAMresilience architecture. In Proceeding of 2015 IEEE 21st International Symposium on High PerformanceComputer Architecture (HPCA). 502–513.

V. Sridharan, N. DeBardeleben, S. Blanchard, K. B. Ferreira, J. Stearley, J. Shalf, and S. Gurumurthi.2015. Memroy errors in modern systems: The good, the bad, and the ugly. In Proceeding of the 20thInternational Conference on Architectural Support for Programming Languages and Operating Systems(ASPLOS). 297–310.

C. H. Stapper and R. J. Rosner. 1995. Integrated circuits yield management and yield analysis: Developmentand implementation. IEEE Trans. Semicond. Manuf. 8, 2, 95–102.

M. Taouil and S. Hamdioui. 2011. Layer redundancy based yield improvement for 3D wafer-to-wafer stackedmemories. In Proceedings of the 16th IEEE European Test Symposium. 45–50.

M. Taouil and S. Hamdioui. 2012. Yield improvement for 3D wafer-to-wafer stacked memories. J. Electron.Test. 28, 4, 523–534.

M. Tarr, D. Boundreau, and R. Murphy. 1984. Defect analysis system speeds test and repair of redundantmemories. Electronics 57, 1, 175–179.

T. W. Tseng, J. F. Li, C. C. Hsu, A. Pao, K. Chiu, and E. Chen. 2006. A reconfigurable built-in self-repair scheme for multiple repairable RAMs in SOCs. In Proceedings of the International Test Conference. 1–9.

T. W. Tseng, J. F. Li, and D. M. Chang. 2006. A built-in redundancy-analysis scheme for RAMs with 2D redundancy using 1D local bitmap. In Proceedings of the Design, Automation and Test in Europe. 53–58.

T. W. Tseng and J. F. Li. 2008. A shared parallel built-in self-repair scheme for random access memories in SOCs. In Proceedings of the International Test Conference. 1–9.

T. W. Tseng, J. F. Li, and C. S. Hou. 2010. A built-in method to repair SoC RAMs in parallel. IEEE Des. Test Comput. 27, 6, 46–57.

T. W. Tseng, J. F. Li, and C. C. Hsu. 2010. ReBISR: A reconfigurable built-in self-repair scheme for random access memories in SOCs. IEEE Trans. VLSI Syst. 18, 6, 921–932.

T. W. Tseng and J. F. Li. 2011. A low-cost built-in redundancy-analysis scheme for word-oriented RAMs with 2D redundancy. IEEE Trans. VLSI Syst. 19, 11, 1983–1995.

R. K. Venkatesan, S. Herr, and E. Rotenberg. 2006. Retention-aware placement in DRAM (RAPID): Software methods for quasi-non-volatile DRAM. In Proceedings of the 12th International Symposium on High-Performance Computer Architecture (HPCA). 155–165.

C. Weis, M. Jung, P. Ehses, C. Santos, P. Vivet, S. Goossens, M. Koedam, and N. Wehn. 2015. Retention time measurements and modeling of bit error rates of WIDE I/O DRAM in MPSoCs. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE). 495–500.

C. L. Wey and F. Lombardi. 1987. On the repair of redundant RAM's. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 6, 2, 222–231.

C. W. Wu, S. K. Lu, and J. F. Li. 2012. On test and repair of 3D random access memory. In Proceedings of the 17th Asia and South Pacific Design Automation Conference. 744–749.

Y. Xie. 2010. Processor architecture design using 3-D integration technology. In Proceedings of the 23rd International Conference on VLSI Design. 446–451.

T. Yamagata, H. Sato, K. Fujita, Y. Nishimura, and K. Anami. 1996. A distributed globally replaceable redundancy scheme for sub-half-micron ULSI memories and beyond. IEEE J. Solid-State Circ. 31, 2, 195–201.

M. H. Yang, H. Cho, W. Jeong, and S. Kang. 2009. A novel BIRA method with high repair efficiency and small hardware overhead. ETRI J. 31, 3, 339–341.

M. H. Yang, H. Cho, W. Kang, and S. Kang. 2010. EOF: Efficient built-in redundancy analysis methodology with optimal repair rate. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 29, 7, 1130–1135.

M. Yoshimoto, K. Anami, H. Shinohara, T. Yoshihara, H. Takagi, S. Nagao, S. Kayano, and T. Nakano. 1983. A divided word-line structure in the static RAM and its application to a 64K full CMOS RAM. IEEE J. Solid-State Circ. 18, 5, 479–485.

S. Zhang, M. Choi, N. Park, and F. Lombardi. 2007. Cost-driven optimization of coverage of combined built-in self-test/automated test equipment testing. IEEE Trans. Instrument. Meas. 56, 3, 1094–1100.

Y. Zorian. 2002. Embedded-memory test and repair: Infrastructure IP for SoC yield. In Proceedings of the International Test Conference. 340–349.

Y. Zorian and S. Shoukourian. 2003. Embedded-memory test and repair: Infrastructure IP for SoC yield. IEEE Des. Test Comput. 20, 3, 58–66.

Received September 2015; revised July 2016; accepted July 2016
