CHAPTER I
INTRODUCTION
1.1 INTRODUCTION ABOUT THE PROJECT
Nanotechnology provides smaller, faster, and lower energy devices which
allow more powerful and compact circuitry; however, these benefits come with
a cost—the nanoscale devices may be less reliable. Thermal- and shot-noise
estimations alone suggest that the transient fault rate of an individual nanoscale
device (e.g., transistor or nanowire) may be orders of magnitude higher than
today’s devices. As a result, we can expect combinational logic to be
susceptible to transient faults in addition to storage cells and communication
channels. Therefore, the paradigm of protecting only memory cells and
assuming the surrounding circuitries (i.e., encoder and decoder) will never
introduce errors is no longer valid. In this paper, we introduce a fault-tolerant
nanoscale memory architecture which tolerates transient faults both in the
storage unit and in the supporting logic (i.e., encoder, decoder (corrector), and
detector circuitries). Particularly, this involves identifying a class of error-
correcting codes (ECCs) that guarantees the existence of a simple fault-tolerant
detector design. This class satisfies a new, restricted definition for ECCs which
guarantees that the ECC codeword has an appropriate redundancy structure
such that it can detect multiple errors occurring in both the stored codeword in
memory and the surrounding circuitries. We call this type of error-correcting
code a fault-secure detector capable ECC (FSD-ECC). The parity-check
matrix of an FSD-ECC has a particular structure such that the decoder circuit
generated from the parity-check matrix is fault-secure. The ECCs we identify
in this class are close to optimal in rate and distance, suggesting we can
achieve this property without sacrificing traditional ECC metrics. We use the
fault-secure detection unit to design a fault-tolerant encoder and corrector by
monitoring their outputs. If a detector detects an error in either of these units,
that unit must repeat the operation to generate the correct output vector. Using
this retry technique, we can correct potential transient errors in the encoder and
corrector outputs and provide a fully fault-tolerant memory system.
The novel contributions of this paper include the following:
1. a mathematical definition of ECCs which have a simple FSD that
does not require the addition of further redundancy in order to
achieve the fault-secure property;
2. identification and proof that an existing LDPC code (EG-LDPC)
has the FSD property;
3. a detailed ECC encoder, decoder, and corrector design that can be
built out of fault-prone circuits when protected by this fault-secure
detector, itself also implemented in fault-prone circuits and guarded
with a simple OR gate built out of reliable circuitry.
To further show the practical viability of these codes, work is done through the
engineering design of a nanoscale memory system based on these encoders and
decoders including the following:
memory banking strategies and scrubbing
reliability analysis
unified ECC scheme for both permanent memory bit
defects and transient upsets
This allows us to report the area, performance, and reliability achieved for
systems based on these encoders and decoders.
1.2 LITERATURE SURVEY
H. Naeimi and A. DeHon, “Fault secure encoder and decoder for memory
applications,” in Proc. IEEE Int. Symp. Defect and Fault Tolerance in VLSI Syst.,
Sep. 2007.
Proposed the concept of a nanowire-based, sub-lithographic memory
architecture tolerant to transient faults. Both the storage elements and the
supporting ECC encoder and corrector are implemented in dense, but
potentially unreliable, nanowire based technology. This compactness is made
possible by a recently introduced Fault-Secure detector design [18]. Euclidean
Geometry error-correcting codes (ECCs) are used, and particular codes are
identified which correct up to 8 errors in data words, achieving a FIT rate at or
below one for the entire memory system for bit and nanowire transient failure
rates as high as 10^-17 upsets/device/cycle with a total area below 1.7× the area
of the unprotected memory for memories as small as 0.1 Gbit. Scrubbing
designs are explored, showing that the overhead for serial error
correction and periodic data scrubbing can be below 0.02% for fault rates as
high as 10^-20 upsets/device/cycle. A design is presented to unify the error-
correction coding and circuitry used for permanent defect and transient fault
tolerance.
M. Davey and D. J. C. MacKay, “Low density parity check codes over GF(q),”
IEEE Commun. Lett., vol. 2, no. 6, pp. 165-167, Jun. 1998.
Assumed that memory cells were the only circuitry susceptible
to transient faults, and that all the supporting circuitry around the memory (i.e.,
encoders and decoders) was fault-free. As a result, most prior
work on fault-tolerant memory systems focused on protecting only the
memory cells. However, as we continue scaling down feature sizes or use sub-
lithographic devices, the surrounding circuitry of the memory system will
also be susceptible to permanent defects and transient faults.
S. J. Piestrak, A. Dandache, and F. Monteiro, “Designing fault-secure
parallel encoders for systematic linear error correcting codes,” IEEE Trans.
Reliab., vol. 52, Jun. 2003.
Proposed a scheme that uses redundancy to generate a fault-tolerant
encoder, developing a fault-secure encoder unit using a concurrent parity-
prediction scheme. Like the general parity-prediction technique, it concurrently
generates (predicts) the parity bits of the encoder outputs (encoded bits) from
the encoder inputs (information bits). The predicted parity bits are then
compared against the actual parity function of the encoder output (encoded
bits) to check the correctness of the encoder unit. The parity predictor circuit
implementation is further optimized for each ECC to make a more compact
design. For this reason, efficient parity-prediction designs are tailored to a
specific code. Simple parity prediction guarantees single-error detection;
however, no generalization is given for detecting multiple errors in the detector
other than complete replication of the prediction and comparison units.
H. Tang, J. Xu, S. Lin, and K. A. S. Abdel-Ghaffar, “Codes on
finite geometries,” IEEE Trans. Inf. Theory, vol. 51, no. 2, Feb. 2005.
Proposed new techniques for Euclidean Geometry codes based on the lines
and points of the corresponding finite geometries. Euclidean Geometry codes
are also called EG-LDPC codes, based on the fact that they are low-density
parity-check (LDPC) codes. LDPC codes have a limited number of 1’s in each
row and column of the matrix; this limit guarantees limited complexity in their
associated detectors and correctors, making them fast and lightweight.
D. J. C. MacKay, “Good error-correcting codes based on very sparse
matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
Proposed a simple electromechanical memory device in which an
iron nanoparticle shuttle is controllably positioned within a hollow nanotube
channel. The shuttle can be moved reversibly via an electrical write signal and
can be positioned with nanoscale precision. The position of the shuttle can be
read out directly via a blind resistance read measurement, allowing application
as a nonvolatile memory element with potentially hundreds of memory states
per device. The shuttle memory has application for archival storage, with
information density as high as 10^12 bits/in^2, and thermodynamic stability in
excess of one billion years.
H. Wymeersch, H. Steendam, and M. Moeneclaey, “Log-domain
decoding of LDPC codes over GF(q),” in Proc. IEEE Int. Conf. Commun.,
Paris, France, Jun. 2004, pp. 772–776.
Proposed a performance and reliability analysis of a scaled crossbar
molecular switch memory and demultiplexer. In particular, a multi-switch
junction fault tolerance scheme is compared with a banking defect tolerance
scheme. Results indicate that delay and power scale linearly with increasing
number of redundant molecular switch junctions. The
multi-switch junction scheme was also shown to achieve greater than 99%
reliability for molecular switch junction failure rates less than 20%, when a
redundancy of at least 3 was implemented. In contrast, the banking scheme
was only effective for molecular switch junction failure rates of less than 1%, which
requires over three times the number of banking modules.
CHAPTER II
SYSTEM ANALYSIS
2.1 EXISTING METHOD
With the popularity of mobile wireless devices soaring, the
wireless communication market continues to see rapid growth. However, with
this growth comes a significant challenge. Many applications, such as digital
video, need new high data rate wireless communication algorithms. The
continuous evolution of these wireless specifications is constantly widening the
gap between wireless algorithmic innovation and hardware implementation. In
addition, low power consumption is now a critical design issue, since the life
of a battery is a key differentiator among consumer mobile devices. The chip
designer's most important task is to implement highly complex algorithms into
hardware as quickly as possible, while still retaining power efficiency. High
Level Synthesis (HLS) methodology has already been widely adopted as the
best way to meet the challenge. This article gives an example in which an HLS
tool is used, together with architectural innovation, to create a low power
LDPC decoder.
HLS methodology allows the hardware design to be completed at a
higher level of abstraction such as C/C++ algorithmic description. This
provides significant time and cost savings, and paves the way for designers to
handle complex designs quickly and efficiently, producing results that compare
favorably with hand design. HLS tools also offer specific power-saving
features, designed to solve the problems of power optimization. In any design,
there are huge opportunities for power reduction at both the system and the
architecture levels.
HLS can make a significant contribution to power reduction at the
architecture level, specifically by offering the following: Ease of architecture
and micro-architecture exploration and ease of frequency and voltage
exploration. Use of high level power reduction techniques such as multi-level
clock gating, which are time-consuming and error-prone when done manually
at the RTL level. Power-saving opportunities at the RTL and gate-level are
limited and have a much smaller impact on the total power consumption.
LOW DENSITY PARITY CHECK CODERS
Forward Error Correction (FEC) coding, a core technology in wireless
communications, has already advanced from 2G convolutional/block codes to
more powerful 3G Turbo codes. Recently, designers have been looking
elsewhere for help with the more complex 4G systems. A Low-Density, Parity-
Check (LDPC) encoding scheme is an attractive proposition for these systems,
because of its excellent error correction performance and highly parallel
decoding scheme. Nevertheless, it is a major challenge for any designer to
create quickly and efficiently a high performance LDPC decoder which also
meets the data rate and power consumption constraints in wireless handsets.
LDPC decoders vary significantly in their levels of parallelism, which range
from fully parallel to partially parallel to fully sequential. A fully parallel
decoder requires a large amount of hardware resources. Moreover, it hard-
wires the entire parity matrix into hardware, and therefore can only support one
particular LDPC code. This makes it impractical to implement in a wireless
system-on-a-chip (SoC) because different or multiple LDPC codes might need
to be supported eventually. Partial parallel architectures can achieve high
throughput decoding at a reduced hardware complexity. However, the level of
parallelism in these instances has to be at the sub-circulant (shifted identity
matrix) level, which makes it code-specific as well and therefore can be too
inflexible for the wireless SoC.
2.2 PROPOSED METHOD
In this paper, a fault-tolerant nanotechnology
memory system that tolerates faults in the encoder, corrector and detector
circuitry as well as the memory is presented. Euclidean Geometry codes with a
fault-secure detector are used to design this memory system. These particular
codes tolerate up to 8 errors in the stored data and up to 16 total errors in
memory and correction logic with an area less than 1.7 times the unprotected
memory area; this involves determining an optimum scrubbing
interval, banking scheme, and corrector parallelism so that error correction has
negligible performance overhead. This design shows a nanoscale corrector to
tolerate permanent cross point defects. Nanotechnology provides smaller,
faster, and lower energy devices, which allow more powerful and compact
circuitry; however, these benefits come with a cost—the nanoscale devices
may be less reliable. Thermal- and shot-noise estimations alone suggest that
the transient fault rate of an individual nanoscale device (e.g., transistor or
nanowire) may be orders of magnitude higher than today’s devices. As a result,
we can expect combinational logic to be susceptible to transient faults, not just
the storage and communication systems. Therefore, to build fault-tolerant
nanoscale systems, we must protect both combinational logic and memory
against transient faults. In the present work we introduce a fault-tolerant
nanoscale memory architecture which tolerates transient faults both in the
storage unit and in the supporting logic (i.e., encoder and decoder (corrector)
circuitry). Our proposed system with high fault-tolerant capability is feasible
when the following two fundamental properties are satisfied:
1) Any single error in the encoder or corrector circuitry can only corrupt a
single codeword digit (i.e., cannot propagate to multiple codeword digits).
2) There is a Fault Secure detector (FSD) circuit which can detect any limited
combination of errors in the received codeword or the detector circuit itself.
Property 1 is guaranteed by not sharing logic between the circuitry which
produces each bit. The FSD (Property 2) is possible with a more constrained
definition for the ECC. Fig. 2.1 shows the memory architecture based on this
FSD. There are two FSD units monitoring the output vector of the encoder and
corrector circuitry. If an error is detected at the output of the encoder or
corrector units, that unit has to redo the operation to generate the correct output
vector. Using this detect-and-repeat technique, potential transient errors
in the encoder or corrector output can be corrected, providing a fault-tolerant
memory system with fault-tolerant supporting circuitry. The conventional
strategy only works as long as we can expect the encoding, decoding, and
checking logic to be fault-free, which would prevent the use of nanoscale
devices.
It is important to note that transient errors accumulate in the memory
words over time. In order to avoid error accumulation, which exceeds the code
correction capability, the system must scrub memory frequently to remove
errors. Memory scrubbing is periodically reading memory words from the
memory, correcting any potential errors, and writing the corrected words back
into the memory. The frequency of scrubbing must be determined carefully.
The scrubbing frequency impacts the throughput from two directions:
i) The memory cannot be used on scrubbing cycles, reducing the memory
bandwidth available to the application; more frequent scrubbing increases this
throughput loss effect.
ii) During normal operation, when an error is detected in a memory word, the
system must spend a number of cycles correcting the error; these cycles also
take bandwidth away from the application. When scrubbing happens less
frequently, more errors accumulate in the memory, and therefore more memory
reads require error correction, increasing bandwidth loss.
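As a rough behavioural illustration of the scrubbing operation described above, the following Python sketch periodically reads every word, corrects any accumulated errors, and writes the result back. The functions detect() and correct() stand in for the fault-secure detector and corrector units, and the scrub interval is an arbitrary placeholder rather than a figure from this design.

import time

SCRUB_INTERVAL_S = 1.0   # assumed period between scrub passes, not a design figure

def scrub(memory, detect, correct):
    # One scrub pass: read every word, correct it if the detector flags an error,
    # and write the corrected word back into the memory.
    for address, word in enumerate(memory):
        if detect(word):
            memory[address] = correct(word)

def scrub_forever(memory, detect, correct):
    # The memory is unavailable to the application during each scrub pass;
    # a longer interval means fewer scrub cycles but more accumulated errors.
    while True:
        scrub(memory, detect, correct)
        time.sleep(SCRUB_INTERVAL_S)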
Fig 2.1: Fault-tolerant memory architecture, with multiple parallel pipelined correctors
The information bits are fed into the encoder to encode the information
vector, and the fault secure detector of the encoder verifies the validity of the
encoded vector. If the detector detects any error, the encoding operation must
be redone to generate the correct codeword. The codeword is then stored in the
memory. During memory access operation, the stored code words will be
accessed from the memory unit. Code words are susceptible to transient faults
while they are stored in the memory; therefore a corrector unit is designed to
correct potential errors in the retrieved code words.
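The write and read flow just described, together with the detect-and-repeat retry around the encoder and corrector, can be sketched in Python as follows. The function names encode(), correct() and detect(), and the retry limit, are illustrative assumptions rather than part of the actual design.

def retry_until_clean(operation, detect, data, max_retries=8):
    # Repeat a fault-prone operation until its fault-secure detector sees no error.
    for _ in range(max_retries):
        word = operation(data)
        if not detect(word):        # detect() returns True when an error is found
            return word             # output vector accepted
    raise RuntimeError("operation still faulty after retries")

def write_word(memory, address, info_bits, encode, detect):
    # Encode the information bits, retrying until the encoder output is error-free.
    memory[address] = retry_until_clean(encode, detect, info_bits)

def read_word(memory, address, correct, detect):
    # Read a stored codeword; if it has picked up errors while stored, run the
    # corrector (itself guarded by the detector) before returning it.
    stored = memory[address]
    if detect(stored):
        return retry_until_clean(correct, detect, stored)
    return stored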
CHAPTER III
DEVELOPMENT ENVIRONMENT
3.1 HARDWARE ENVIRONMENT
1. WINDOWS XP
2. DUAL CORE processor
3. 512 MB SDRAM
4. JTAG CABLE
5. CPLD
3.1.1 INTRODUCTION TO CPLD
A complex programmable logic device (CPLD) is a programmable logic
device with complexity between that of PALs and FPGAs, and architectural
features of both. The building block of a CPLD is the macro cell, which
contains logic implementing disjunctive expressions and more specialized
logic operations.
Features in common with PALs:
Non-volatile configuration memory. Unlike many FPGAs, an external
configuration ROM isn't required, and the CPLD can function
immediately on system start-up.
For many legacy CPLD devices, routing constrains most logic blocks to
have input and output signals connected to external pins, reducing
opportunities for internal state storage and deeply layered logic. This is
usually not a factor for larger CPLDs and newer CPLD product families.
Features in common with FPGAs:
Large number of gates available. CPLDs typically have the equivalent of
thousands to tens of thousands of logic gates, allowing implementation
of moderately complicated data processing devices. PALs typically have
a few hundred gate equivalents at most, while FPGAs typically range
from tens of thousands to several million.
Some provisions for logic more flexible than sum-of-
product expressions, including complicated feedback paths between
macro cells, and specialized logic for implementing various commonly-
used functions, such as integer arithmetic.
The most noticeable difference between a large CPLD and a small FPGA is
the presence of on-chip non-volatile memory in the CPLD. This distinction is
rapidly becoming less relevant, as several of the latest FPGA products also
offer models with embedded configuration memory. The characteristic of non-
volatility makes the CPLD the device of choice in modern digital designs to
perform 'boot loader' functions before handing over control to other devices
not having this capability. A good example is where a CPLD is used to load
configuration data for an FPGA from non-volatile memory.
CPLDs were an evolutionary step from even smaller devices that preceded
them, PLAs (first shipped by Signetics), and PALs. These in turn were
preceded by standard logic products, that offered no programmability and were
"programmed" by wiring several standard logic chips together.
Because they offer high speeds and a range of capacities, CPLDs are useful
for a very wide assortment of applications, from implementing random glue
logic to prototyping small gate arrays. One of the most common uses in
industry at this time, and a strong reason for the large growth of the CPLD
market, is the conversion of designs that consist of multiple SPLDs into a
smaller number of CPLDs.
CPLDs can realize reasonably complex designs, such as graphics controller,
LAN controllers, UARTs, cache control, and many others. As a general rule-
of-thumb, circuits that can exploit wide AND/OR gates, and do not need a very
large number of flip-flops are good candidates for implementation in CPLDs.
A significant advantage of CPLDs is that they provide simple design changes
through re-programming (all commercial CPLD products are re-
programmable). With in-system programmable CPLDs it is even possible to re-
configure hardware (an example might be to change a protocol for a
communications circuit) without power-down. Designs often partition
naturally into the SPLD-like blocks in a CPLD. The result is more predictable
speed-performance than would be the case if a design were split into many
small pieces and then those pieces were mapped into different areas of the
chip. Predictability of circuit implementation is one of the strongest advantages
of CPLD architectures.
Commercially Available FPGAs
As one of the largest growing segments of the semiconductor industry,
the FPGA market-place is volatile. As such, the pool of companies involved
changes rapidly and it is somewhat difficult to say which products will be the
most significant when the industry reaches a stable state. For this reason, and
to provide a more focused discussion, we will not mention all of the FPGA
manufacturers that currently exist, but will instead focus on those companies
whose products are in widespread use at this time. In describing each device
we will list its capacity, nominally in 2-input NAND gates as given by the
vendor. Gate count is an especially contentious issue in the FPGA industry,
and so the numbers given in this paper for all manufacturers should not be
taken too seriously.
Wags have taken to calling them “dog” gates, in reference to the
traditional ratio between human and dog years. There are two basic categories
of FPGAs on the market today: 1. SRAM-based FPGAs and 2. antifuse-based
FPGAs. In the first category, Xilinx and Altera are the leading manufacturers
in terms of number of users, with the major competitor being AT&T. For
antifuse-based products, Actel, Quicklogic and Cypress, and Xilinx offer
competing products.
3.2 SOFTWARE ENVIRONMENT
SOFTWARE TOOLS:
MODEL SIM
XILINX
3.2.1 AN INTRODUCTION ABOUT MODEL SIM
ModelSim XE-III is a complete PC HDL simulation environment that
enables you to verify the HDL source code and functional and timing models
of your designs. Each of the ModelSim tools includes a complete HDL
simulation and debugging environment providing 100% VHDL and Verilog
language coverage, a source code viewer/editor, waveform viewer, design
structure browser, list window, and a host of other features designed to
enhance productivity.
ModelSim is an easy-to-use yet versatile
VHDL/(System)Verilog/SystemC simulator by Mentor Graphics. It supports
behavioral, register transfer level, and gate-level modeling. ModelSim supports
all platforms used here at the Institute of Digital and Computer Systems
(i.e. Linux, Solaris and Windows) and many others too. On Linux and Solaris
platforms ModelSim can be found preinstalled on Institute's computers.
Windows users, however, must install it by themselves. This tutorial is intended
for users with no previous experience with the ModelSim simulator. It introduces
the basic flow of setting up the ModelSim simulator, compiling your
designs, and the simulation basics with ModelSim SE. The example used in this
tutorial is a small design written in VHDL and only the most basic commands
will be covered in this tutorial. This tutorial was made by using version 6.1b of
ModelSim SE on Linux.
The example used in this tutorial is a simple design describing an
electronic lock that can be unlocked by entering a 4-digit PIN (4169) code
from a key pad. When the lock detects the correct input sequence, it will set its
output high for one clock cycle as a sign to unlock the door. The figure below
shows the state machine of the design. The design also includes one dummy
variable (count_v) which has no practical meaning but is used to demonstrate
debug methods in ModelSim.
ModelSim eases the process of finding design defects with an
intelligently engineered debug environment. The ModelSim debug
environment efficiently displays design data for analysis and debug of all
languages. ModelSim allows many debug and analysis capabilities to be
employed post-simulation on saved results, as well as during live simulation
runs. For example, the coverage viewer analyzes and annotates source code
with code coverage results, including FSM state and transition, statement,
expression, branch, and toggle coverage. Signal values can be annotated in the
source window and viewed in the waveform viewer, easing debug navigation
with hyperlinked navigation between objects and their declarations and between
visited files. Race conditions, delta, and event activity can be analyzed in the
list and wave windows. User-defined enumeration values can be easily defined
for quicker understanding of simulation results. For improved debug
productivity, ModelSim also has graphical and textual dataflow capabilities.
FEATURES
High-performance, high-capacity engine for the fastest regression suite
throughput
Native support of Verilog, VHDL, and SystemC for effective
verification of the most sophisticated design environments
Fast time-to-debug causality tracing and multi-language debug
environment
Advanced code coverage and analysis tools for fast time to coverage
closure
3.2.2 AN INTRODUCTION ABOUT XILINX
Xilinx is a supplier of programmable logic devices. It is known for
inventing the field programmable gate array (FPGA) and as the first
semiconductor company with a fabless manufacturing model. Xilinx was
founded in 1984 by two semiconductor engineers, Ross Freeman and Bernard
Vonderschmitt, who were both working for integrated circuit and solid-state
device manufacturer Zilog Corp. Xilinx designs, develops and markets
programmable logic products including integrated circuits (ICs), software
design tools, predefined system functions delivered as intellectual property (IP)
cores, design services, customer training, field engineering and technical
support. Xilinx sells both FPGA and CPLD programmable logic devices for
electronic equipment manufacturers in end markets such as communications,
industrial, consumer, automotive and data processing.
Xilinx's FPGAs have been used for the ALICE (A Large Ion Collider
Experiment) at the CERN European laboratory on the French-Swiss border to
map and disentangle the trajectories of thousands of subatomic particles. The
Virtex-II Pro, Virtex-4, Virtex-5, and Virtex-6 FPGA families are focused on
system-on-chip (SoC) designers because they include up to two embedded
IBM PowerPC cores.
Xilinx FPGAs can run a regular embedded OS (such as Linux or
vxWorks) and can implement processor peripherals in programmable logic.
Xilinx's IP cores include IP for simple functions (BCD encoders, counters,
etc.), for domain specific cores (digital signal processing, FFT and FIR cores)
to complex systems (multi-gigabit networking cores, MicroBlaze soft
microprocessor, and the compact Picoblaze microcontroller). Xilinx also
creates custom cores for a fee. The ISE Design Suite is the central electronic
design automation (EDA) product family sold by Xilinx. The ISE Design Suite
features include design entry and synthesis supporting Verilog or VHDL,
place-and-route (PAR), completed verification and debug using ChipScope Pro
tools, and creation of the bit files that are used to configure the chip.
Xilinx's Embedded Developer's Kit (EDK) supports the embedded
PowerPC 405 and 440 cores (in Virtex-II Pro and some Virtex-4 and -5 chips)
and the Microblaze core. Xilinx's System Generator for DSP implements DSP
designs on Xilinx FPGAs. A freeware version of its EDA software called ISE
WebPACK is used with some of its non-high-performance chips. Xilinx is the
only (as of 2007) FPGA vendor to distribute a native Linux freeware synthesis
toolchain. The Spartan series targets applications with a low-power footprint,
extreme cost sensitivity and high-volume; e.g. displays, set-top boxes, wireless
routers and other applications. The Spartan-6 family is built on a 45-nanometer
[nm], 9-metal layer, dual-oxide process technology. The Spartan-6 was
marketed in 2009 as a low-cost solution for automotive, wireless
communications, flat-panel display and video surveillance applications.
3.2.3 HISTORICAL PERSPECTIVE-VLSI
The electronics industry has achieved a phenomenal growth over
the last two decades, mainly due to the advent of VLSI. The number of
applications of integrated circuits in high-performance computing,
telecommunications and consumer electronics has been rising steadily and at
a very fast pace. Typically, the required computational power (or, in other
words, the intelligence) of these applications is the driving force for the fast
development of this field. The current leading-edge technologies (such as low
bit-rate video and cellular communications) already provide the end users a
certain amount of processing power and portability. This trend is expected to
be continued with very important implications of VLSI and systems design.
As more and more complex functions are required in
various data processing and telecommunications devices, the need to integrate
these functions in a small system/package is also increasing. The level of
integration, as measured by the number of logic gates in a monolithic chip, has
been steadily rising for almost three decades, mainly due to the rapid progress
in processing technology and interconnect technology. Logic complexity in
integrated circuits has evolved over the last three decades, with milestones
marking each era. Here, the numbers for circuit complexity should be
interpreted only as representative examples to show the order of magnitude. A
logic block can contain ten to a hundred transistors depending upon the function.
The important message here is that the logic complexity per chip has been
increasing exponentially. The monolithic integration of a large number of
functions on a single chip usually provides:
Less area / volume and therefore compactness.
Less power consumption.
Less testing requirements at system level.
Higher reliability, mainly due to improved on-chip
interconnects.
Higher speed, due to significantly reduced interconnection length.
Significant cost savings.
Therefore, the current trend of integration will also continue in the foreseeable
future.
3.2.3.1 VLSI DESIGN FLOW
Fig 3.1: VLSI Design Flow
The design process, at various levels, is usually evolutionary in nature. It
starts with a given set of requirements. Initial design is developed and tested
against the requirements. When requirements are not met, the design has to be
improved. If such improvement is either not possible or too costly, then the
revision of requirements and its impact analysis must be considered.
The VLSI design flow consists of three major domains, namely:
Behavioral domain
Structural domain
Geometrical Layout domain.
The design flow starts from the algorithm that describes the behavior of the target
chip. The corresponding architecture of the processor is first defined. It is
mapped onto the chip surface by floor planning. The next design evolution in
the behavioral domain defines finite state machines (FSMs), which are
structurally implemented with functional modules such as registers and
arithmetic logic units (ALUs). These modules are then geometrically placed
onto the chip surface using CAD tools for automatic module placement
followed by routing, with a goal of minimizing the interconnect area and
signal delays. In the third evolution, the behavioral modules are
implemented with leaf cells. At this stage the chip is described in terms of
logic gates (leaf cells), which can be placed and interconnected by using a cell
placement and routing program. The last evolution involves a detailed Boolean
description of leaf cells and mask generation. In standard-cell based design,
leaf cells are already pre-designed and stored in a library for logic design use.
3.2.4 VHDL –AN OVERVIEW
VHDL is a hardware description language. The word ‘hardware’
however is used in a wide variety of contexts, which range from complete
systems like personal computers on one side to the small logic gates on their
internal integrated circuits on the other side.
3.2.4.1 USES OF VHDL:
Since VHDL is a standard, the chip vendors can easily exchange their
circuit designs without depending on their proprietary software. The designing
process can be greatly simplified, as each component is designed individually
and all such components are interconnected to form a full system; hierarchy
and timing are always maintained.
With simulators available, a circuit can be tested easily and any error
found can be rectified without the expense of using a physical prototype, which
means that design time and expenditure on this get slashed down. Programs
written in either of the HDLs can be easily understood as they are similar to
programs in C or Pascal.
3.2.4.2 FEATURES OF VHDL:
VHDL provides five different types of primary constructs, called design
units. They are,
Entity: It consists of a design’s interface signals to the external circuitry
Architecture: It describes a design’s behavior and functionality.
Package: It contains frequently used declarations, constants, functions,
procedures, user data types and components.
Configuration: It binds an entity to an architecture when there are multiple
architectures for a single entity.
Library: It consists of all the compiled design units like entities,
architectures, packages and configurations.
3.2.4.3 RANGE OF USE:
The design process always starts with a specification phase. The
component, which is to be designed, is defined with respect to function, size,
interfaces, etc. Despite the complexity of the final product, mainly simple
methods based on paper and pencil are used most of the time. After that,
self-contained modules have to be defined on the system level. Behavior
models of standard components can be integrated into the system from libraries
of commercial model developers. The overall system can already be
simulated.
On the logic level, the models that have to be designed are described
with all the synthesis aspects in view. As long as only a certain subset of
VHDL constructs is used, commercial synthesis programs can derive the
Boolean functions from this abstract model description and map them to the
elements of an ASIC gate library or the configurable logic blocks of FPGAs.
The result is a net list of the circuit or of the module on the gate level.
Finally, the circuit layout for a specific ASIC technology can be created
by means of other tools from the net list description. Every transition to a
lower abstraction level must be proven by functional validation. For this
purpose, the description is simulated in such a way that for all stimuli (i.e., input
signals for the simulation) the module’s responses are compared. VHDL is
suitable for the design phases from system level to gate level.
3.2.4.4 APPLICATION FIELD:
VHDL is used mainly for the development of Application Specific
Integrated Circuits (ASICs). Tools for the automatic transformation of VHDL
code into a gate-level net list were developed at an early point in time.
This transformation is called synthesis and is an integral part of current design
flows. For use with Field
Programmable Gate Arrays (FPGAs), several problems exist. In the first
step, Boolean equations are derived from the VHDL description, no matter
whether an ASIC or an FPGA is the target technology. But now, this Boolean
code has to be partitioned into the configurable logic blocks (CLB) of the
FPGA. This is more difficult than the mapping onto an ASIC library. Another
big problem is the routing of the CLBs, as the available resources for
interconnections are the bottleneck of current FPGAs.
MODELING PROCEDURES USING VHDL
STRUCTURAL STYLE OF MODELING
In the structural style of modeling, an entity is described as a set of inter-
connected components. Here the architecture body is composed of two parts:
the declarative part and statement part. Declarative part contains the
component declarations. The declared components are instantiated in the
statement part.
BEHAVIORAL STYLE OF MODELING
The behavioral style of modeling specifies the behavior of an entity as
a set of statements that are executed sequentially in the specified order. This set
of sequential statements, which are specified inside a process statement, does not
explicitly specify the structure of the entity but merely its functionality. A
process statement is a concurrent statement that can appear within an
architecture body.
DATA FLOW STYLE OF MODELING
In this modeling style, the flow of data through the entity is expressed
primarily using concurrent signal assignment statements. The structure of the
entity is not explicitly specified in this modeling style, but it can be implicitly
deduced.
MIXED STYLE OF MODELING:
It is possible to mix the three modeling styles described above in a
single architecture body. That is, within an architecture body, we could use
component instantiation statements, concurrent signal assignment statements,
and process statements.
CHAPTER IV
ARCHITECTURE DETAILS
This paper presents a high-throughput decoder architecture for generic
quasi-cyclic low-density parity-check (QC-LDPC) codes. Various
optimizations are employed to increase the clock speed. A row permutation
scheme is proposed to significantly simplify the implementation of the shuffle
network in LDPC decoder. An approximate layered decoding approach is
explored to reduce the critical path of the layered LDPC decoder. Provided are
an LDPC encoder and decoder, and LDPC encoding and decoding methods.
The LDPC encoder includes: a code generating circuit that includes a memory
storing a first parity check matrix and sums a first row which is at least one
row of the first parity check matrix and a second row which is at least one of
the remaining rows of the first parity check matrix to output a second parity
check matrix; and an encoding circuit receiving the second parity check matrix
and an information word to output an LDPC-encoded code word. Also the
LDPC decoder includes: a code generating circuit including a memory which
stores a first parity check matrix and summing a first row which is at least one
row of the first parity check matrix and a second row which is at least one of
the remaining rows of the first parity check matrix to output a second parity
check matrix; and a decoding circuit receiving the second parity check matrix
and a code word to output an LDPC-decoded information word. The low-density
parity-check (LDPC) code invented in 1962 by Robert Gallager is a linear
block code defined by a very sparse parity check matrix, which is populated
primarily with zeros and sparsely with ones.
When it was first introduced, the LDPC code was too complicated to
implement, and so it was forgotten for a long time until not too long ago. The
LDPC code was brought to light again in 1995, and an irregular LDPC code
(which is a generalization of the LDPC code suggested by Robert Gallager)
was introduced in 1998. When the LDPC code was first introduced by
Gallager, a probabilistic decoding algorithm was also suggested, and the LDPC
code which is decoded using this algorithm exhibited excellent performance
characteristics. The LDPC code also showed improved performance when
extended to non-binary code as well as binary code to define code words. Like
the turbo code, the LDPC code yields a bit error rate (BER) approaching a
Shannon channel capacity limit, which is the theoretical maximum amount of
digital data that can be transmitted in a given bandwidth in presence of a
certain noise interference. The irregular LDPC code which is known to have
the best performance only needs an additional 0.13 dB from the Shannon
channel capacity to achieve a BER of 10^-6 when the code length is a million bits
in an additive white Gaussian noise (AWGN) channel environment, and is thus
suitable for applications which require high-quality transmission with a very
low BER.
Unlike algebraic decoding algorithms usually used for decoding a block
code, the decoding algorithm of the LDPC code is a probabilistic decoding
algorithm to which a belief-propagation algorithm, which employs graph
theory and estimation theory, is applied “as is”. An LDPC decoder computes a
probability of a bit corresponding to each bit of a code word received through a
channel being “1” or “0”. The probability information computed by the LDPC
decoder is referred to as a message, and the quality of the message can be
checked through each parity defined in a parity check matrix. If a certain parity
of the parity check matrix is satisfied, i.e., the result of a parity check is
positive, the computed message is specially referred to as a parity check
message and contains the most probable value of each code word bit. The
parity check message for each parity is used to determine the value of a
corresponding bit, and information on a computed bit is referred to as a bit
message. Through a procedure of repeating such message transmission, the
information for bits of each code word comes to satisfy all parities of the
parity-check matrix. Finally, when all parities of the parity-check matrix are
satisfied, the decoding of the code word is finished. In an environment where a
signal to noise (S/N) ratio is low, systematic codes are used, and thus certain
portions of the code word are extracted to reproduce information bits.
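The message-passing idea outlined above can be illustrated in a much simplified, hard-decision form by Gallager's bit-flipping procedure: every parity check is evaluated, the bits taking part in the largest number of unsatisfied checks are flipped, and the process stops once all parities are satisfied. The Python sketch below is a generic illustration of that idea, not the particular decoder described in this chapter.

import numpy as np

def bit_flip_decode(H, received, max_iters=50):
    # Hard-decision bit-flipping decoding of a binary LDPC code.
    # H        : (m, n) parity-check matrix over GF(2) as a 0/1 numpy array
    # received : length-n 0/1 hard-decision word from the channel
    word = received.copy()
    for _ in range(max_iters):
        syndrome = H.dot(word) % 2          # one bit per parity check
        if not syndrome.any():              # all parities satisfied: decoding done
            return word
        unsat = H.T.dot(syndrome)           # per-bit count of unsatisfied checks
        word[unsat == unsat.max()] ^= 1     # flip the most suspect bit(s)
    return word                             # decoding failure after max_iters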
If a channel is a frequency selective fading channel, adaptive modulation
and coding is used for low-error communication. The LDPC code is a type of
block channel code and thus has the disadvantage of being difficult to
adaptively modulate compared to a trellis code such as a convolutional code or a
turbo code to which a desired form of modulation and coding can easily be
applied through puncturing. In order for the LDPC code to support various
code rates for adaptive transmission, it has to have various code matrices,
which carries the disadvantage of the encoder and the decoder needing a large
memory.
4.1 SUMMARY OF THE INVENTION
The present invention is directed to an LDPC encoder, an LDPC
decoder, and LDPC encoding and decoding methods in which a size of a
memory of the encoder and decoder can be reduced by forming, from one
parity-check matrix, a smaller parity-check matrix. A first aspect of the present
invention is to provide an LDPC encoder, including: a code generating circuit
including a memory which stores a first parity check matrix and summing a
first row which is at least one row of the first parity check matrix and a second
row which is at least one of the remaining rows of the first parity check matrix
to output a second parity check matrix; and an encoding circuit receiving the
second parity check matrix and an information word to output an LDPC-
encoded code word.
A second aspect of the present invention is to provide an LDPC decoder,
including: a code generating circuit including a memory which stores a first
parity check matrix and summing a first row which is at least one row of the
first parity check matrix and a second row which is at least one of the
remaining rows of the first parity check matrix to output a second parity check
matrix; and a decoding circuit receiving the second parity check matrix and a
code word to output an LDPC-decoded information word.
A third aspect of the present invention is to provide an LDPC encoder,
including: a code generating circuit including a memory which stores a first
parity check matrix and outputting a second parity check matrix formed by
removing a first row which is at least one row of the first parity check matrix;
and an encoding circuit receiving the second parity check matrix and an
information word to output an LDPC-encoded code word.
A fourth aspect of the present invention is to provide an LDPC decoder,
including: a code generating circuit including a memory which stores a first
parity check matrix and outputting a second parity check matrix formed by
removing a first row which is at least one row of the first parity check matrix;
and a decoding circuit receiving the second parity check matrix and a code
word to output an LDPC-decoded information word.
A fifth aspect of the present invention is to provide an LDPC encoding
method, including: storing a first parity check matrix in a memory; summing a
first row which is at least one row of the first parity check matrix and a second
row which is at least one of the remaining rows of the first parity check matrix
to form a second parity check matrix; and receiving the second parity check
matrix and an information word and performing LDPC-encoding.
A sixth aspect of the present invention is to provide an LDPC decoding
method, including: storing a first parity check matrix in a memory; summing a
first row which is at least one row of the first parity check matrix and a second
row which is at least one of the remaining rows of the first parity check matrix
to form a second parity check matrix; and receiving the second parity check
matrix and a code word and performing LDPC-decoding. Low Density Parity
Check (LDPC) codes offer excellent error correcting performance. However,
current implementations are not capable of achieving the performance required
by next generation storage and telecom applications. Extrapolation of many of
those designs is not possible because of routing congestion. This article
proposes a new architecture, based on a redefinition of a lesser-known LDPC
decoding algorithm. As random LDPC codes are the most powerful, we abstain
from making simplifying assumptions about the LDPC code which could ease
the routing problem. We avoid the routing congestion problem by going for
multiple independent sequential decoding machines, each decoding separate
received codewords. In this serial approach the required amount of memory
must be multiplied by the large number of machines. Our key contribution is a
check node centric reformulation of the algorithm which gives huge memory
reduction and which thus makes the serial approach possible.
NANO-X API
The Nano-X API tries to be compliant with the Microsoft Win32 and
WinCE GDI standard. Currently, there is support for most of the graphics
drawing and clipping routines, as well as automatic window title bar drawing
and dragging windows for movement. The Nano-X API is message-based, and
allows programs to be written without regard to the eventual window
management policies implemented by the system. The Nano-X API is not
currently client/server, and will be discussed in more detail in the section
called Nano-X API.
NANO-X API
The Nano-X API is modeled after the mini-x server written initially by
David Bell, which was a reimplementation of X on the MINIX operating
system. It loosely follows the X Window System Xlib API, but the names all
begin with GrXXX() rather than X...(). Currently, the Nano-X API is
client/server, but does not have any provisions for automatic window
dressings, title bars, or user window moves. There are several groups writing
widget sets currently, which will provide such things. Unfortunately, the user
programs must also then write only to a specific widget set API, rather than
using the Nano-X API directly, which means that only the functionality
provided by the widget set will be upwardly available to the applications
programmer. (Although this could be considerable, in the case that, say Gdk
was ported.)
In recent years, research on nanotechnology has advanced rapidly. Novel
nanodevices have been developed, such as those based on carbon nanotubes,
nanowires, etc. Using these emerging nanodevices, diverse nanoarchitectures
have been proposed. Among them, hybrid nano/CMOS reconfigurable
architectures have attracted attention because of their advantages in
performance, integration density, and fault tolerance. Recently, a high
performance hybrid nano/CMOS reconfigurable architecture, called NATURE,
was presented. NATURE comprises CMOS reconfigurable logic and
interconnect fabric, and CMOS-fabrication-compatible nanomemory. High-
density, fast nano RAMs are distributed in NATURE as on-chip storage to
store multiple reconfiguration copies for each reconfigurable element. It
enables cycle-by-cycle runtime reconfiguration and a highly efficient
computational model, called temporal logic folding. Through logic folding,
NATURE provides more than an order of magnitude improvement in logic
density and area-delay product, and significant design flexibility in performing
area-delay trade-offs, at the same technology node. Moreover, NATURE can
be fabricated using mainstream photolithography fabrication techniques.
Hence, it offers a currently commercially viable reconfigurable architecture
with high performance, superior logic density, and outstanding design
flexibility, which is very attractive for deployment in cost-conscious embedded
systems.
In order to fully explore the potential of NATURE and further improve
its performance, in this article, a thorough design space exploration is
conducted to optimize its architecture. Investigations in terms of different logic
element architectures, interconnect designs, and various technologies for nano
RAMs are presented. Nano RAMs can not only be used as storage for
configuration bits, but the high density of nano RAMs also makes them
excellent candidates for large-capacity on-chip data storage in NATURE.
Many logic- and memory-intensive applications, such as video and image
processing, require large storage of temporal results. To enhance the capability
of NATURE for implementing such applications, we investigate the design of
nano data memory structures in NATURE and explore the impact of memory
density. Experimental results demonstrate significant throughput
improvements due to area saving from logic folding and parallel data
processing.
CHAPTER V
SYSTEM MODULES
5.1 FAULT TOLERANCE APPROACH
Fault tolerance technique is based on at least one of the three
types of redundancy: time, data, or hardware redundancy. Hardware
redundancy means the replication of hardware modules and some kind of result
comparison or voting instance. The inherent redundancy in field-
programmable logic resulting from the regular cell-based structure allows a
very efficient implementation of hardware redundancy. The faulty resource
must not be reused by the new configuration. After the reconfiguration, the
possible effect of the fault must be confined for some applications and the
circuit must be reset to a consistent state. Then the system can continue to
operate. The idea of an autonomous mechanism for fault detection and
reconfiguration at an appropriate speed, in terms of the regarded system, is the
starting point for the fault tolerance technique presented here. The technique
combines a scalable hardware-based fault detection mechanism with a fast
online fault reconfiguration technique and a check pointing and rollback
mechanism for fault recovery. The reconfiguration is based on a hardware-
implemented reconfiguration controller: the reconfiguration control unit
(RCU), in contrast to other online fault test and reconfiguration strategies
described elsewhere. The fault detection mechanism must provide the fault location and
trigger reconfiguration. The reconfiguration step must replace the current
configuration data set by an alternative configuration (which provides a fault-
avoiding mapping of the user circuit) and trigger recovery. The recovery step
must bring the whole system back into a consistent state. For a fast online
technique, such differentiations are too time-consuming and a simpler
approach must be taken: all faults are assumed to be permanent. Even under
this assumption, no general technique is available today which controls the
appropriate reconfiguration procedure.
Fig.5.1: Phases of the fault tolerance technique.
The basic characteristics of fault tolerance require:
1. No single point of repair
2. Fault isolation to the failing component
3. Fault containment to prevent propagation of the failure
4. Availability of reversion modes
Fault-tolerant systems are typically based on the concept of redundancy.
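A minimal sketch of the detect, reconfigure, and recover phases discussed in this section is given below. The functions step(), fault_detected(), reconfigure() and the checkpoint period are placeholders, since the actual RCU and check-pointing hardware are not modelled here.

def run_with_rollback(step, fault_detected, reconfigure, initial_state,
                      checkpoint_period=100, max_cycles=10_000):
    # Execute the user circuit cycle by cycle, saving periodic checkpoints.
    # On a detected fault, reconfigure around the faulty resource and roll the
    # system back to the last consistent (check-pointed) state.
    state = initial_state
    checkpoint = initial_state
    for cycle in range(1, max_cycles + 1):
        state = step(state)                 # normal operation
        if fault_detected():
            reconfigure()                   # fault-avoiding remapping of the circuit
            state = checkpoint              # rollback to a consistent state
        elif cycle % checkpoint_period == 0:
            checkpoint = state              # record a new consistent state
    return state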
5.2 NANOMEMORY ARCHITECTURE MODEL
The design structure of the encoder, corrector, and detector units of
our proposed fault-tolerant memory system. We also present the
implementation of these units on a sub-lithographic, nanowire-based substrate.
Before going into the design structure details we start with a brief overview of
the sub-lithographic memory architecture model.
Fig. 5.2. Structure of Nano Memory core
We use the Nano Memory and Nano PLA architectures to implement the
memory core and the supporting logic, respectively. Nano Memory and Nano
PLA are based on nanowire crossbars. The Nano Memory architecture can
achieve very high bit density even after including the
lithographic-scale address wires and defects. This design uses a nanowire
crossbar to store memory bits and a limited number of lithographic-scale wires
for address and control lines. Fig. 5.2 shows a schematic overview of this
memory structure. The nanowires can be uniquely selected through the two
address decoders located on the two sides of the memory core. Instead of using
a lithographic-scale interface to read and write into the memory core, we use a
nanowire-based interface. The reason that we can remove the lithographic-
scale interface is that all the blocks interfacing with the memory core (encoder,
corrector and detectors) are implemented with nanowire-based crossbars.
5.3 FAULT SECURE DETECTOR
The core of the detector operation is to generate the syndrome vector,
which is basically implementing the following vector-matrix multiplication on
the received encoded vector C and parity-check matrix H:
S = C · H^T
Fig.5.3: Fault-secure detector for (15, 7, 5) EG-LDPC code
This binary sum is implemented with an XOR gate. Fig. 5.3 shows the
detector circuit for the (15, 7, 5) EG-LDPC code. Since the row weight of the
parity-check matrix is ρ, generating one digit of the syndrome vector requires
a ρ-input XOR gate, or (ρ-1) two-input XOR gates. For the whole detector, it takes
n(ρ-1) two-input XOR gates. Table II illustrates
this quantity for some of the smaller EG-LDPC codes.
Hamming bound      EG-LDPC          Gilbert-Varshamov bound
(14, 7, 5)         (15, 7, 5)       (17, 7, 5)
(58, 37, 9)        (63, 37, 9)      (67, 37, 9)
(222, 175, 17)     (255, 175, 17)   (255, 175, 17)
TABLE 5.1: EG-LDPC code parameters (n, k, d) compared against the Hamming and Gilbert-Varshamov bounds
An error is detected if any of the syndrome bits has a nonzero value. The
final error detection signal is implemented by an OR function of all the
syndrome bits. The output of this OR gate is the error detector signal
(see Fig. 5.3). In order to avoid a single point of failure, we must implement the
OR gate with a reliable substrate (e.g., in a system with sub-lithographic
nanowire substrate, the OR gate is implemented with reliable lithographic
technology—i.e., lithographic-scaled wire-OR).
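A behavioural Python sketch of this detector follows: each syndrome digit is the GF(2) sum (XOR) of the codeword bits selected by one row of the parity-check matrix, and the error signal is the OR of all syndrome digits. The parity-check matrix H is left as an input because the specific (15, 7, 5) EG-LDPC matrix is not reproduced in this report.

import numpy as np

def syndrome(H, codeword):
    # S = C . H^T over GF(2): one XOR tree per row of the parity-check matrix.
    return H.dot(codeword) % 2

def error_detected(H, codeword):
    # Final detector output: the OR of all syndrome digits
    # (any nonzero syndrome digit means an error is present).
    return bool(syndrome(H, codeword).any())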
5.4 ENCODER
An n-bit codeword c, which encodes a k-bit information vector i, is
generated by multiplying the k-bit information vector with a k × n bit
generator matrix G; i.e., c = i · G. EG-LDPC codes are not systematic, and the
information bits must be decoded from the encoded vector, which is not
desirable for our fault-tolerant approach due to the further complication and
delay that it adds to the operation. However, these codes are cyclic codes [15]. We used the
procedure to convert the cyclic generator matrices to systematic generator
matrices for all the EG-LDPC codes under consideration.
Fig. 5.4: Structure of an encoder circuit for the (15, 7, 5) EG-LDPC code
The above figure shows the encoder circuit that computes the parity bits of
the (15, 7, 5) EG-LDPC code. In this figure i = (i0, ..., i6) is the information
vector; it is copied to bits c0, ..., c6 of the encoded vector c, and the
rest of the encoded vector, the parity bits, are linear sums (XOR) of the
information bits. If the building block is two-input gates, then the encoder
circuitry takes 22 two-input XOR gates. Table I shows the area of the encoder
circuits for each of the EG-LDPC codes under consideration based on their generator
matrices.
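The parity computation of the systematic encoder can be sketched in Python as below: the k information bits are copied into the codeword, and each parity bit is the XOR of the information bits selected by one column of the parity part P of the systematic generator matrix G = [I | P]. The matrix P for the (15, 7, 5) code is not reproduced here, so it is an input to the sketch.

import numpy as np

def systematic_encode(P, info_bits):
    # Encode with a systematic generator matrix G = [I | P] over GF(2).
    # P         : (k, n-k) 0/1 parity-generator matrix
    # info_bits : length-k 0/1 information vector
    # returns   : length-n codeword [info_bits | parity_bits]
    parity_bits = info_bits.dot(P) % 2      # each parity bit is an XOR of info bits
    return np.concatenate([info_bits, parity_bits])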
5.5 CORRECTOR
1) ONE-STEP MAJORITY-LOGIC CORRECTOR
One-step majority-logic correction is the procedure that identifies the
correct value of each bit in the codeword directly from the received
codeword; this is in contrast to the general message-passing error correction
strategy (e.g., [23]), which may demand multiple iterations of error diagnosis
and trial correction. Avoiding iteration makes the correction latency both small
and deterministic.
This method consists of two parts:
1) generating a specific set of linear sums of the received vector bits;
2) finding the majority value of the computed linear sums.
A linear sum of the received encoded vector bits is formed by computing the inner product of the received vector and a row of the parity-check matrix. This sum is called a parity-check sum. A brief software sketch of this procedure is given below.
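A minimal software sketch of the two-step procedure follows (it assumes that, for each bit position, the set of parity-check rows orthogonal on that bit is already known, and it computes the majority by simple counting rather than with the sorting-network hardware discussed next):

def majority_correct(c, orthogonal_rows):
    """One-step majority-logic correction.
    c: received codeword (list of 0/1).
    orthogonal_rows[j]: parity-check rows (lists of 0/1) orthogonal on bit position j."""
    corrected = list(c)
    for j, rows in enumerate(orthogonal_rows):
        # Step 1: parity-check sums = mod-2 inner products of the received word with each row.
        sums = [sum(cb & hb for cb, hb in zip(c, row)) % 2 for row in rows]
        # Step 2: majority vote; if most check sums are 1, bit j is assumed to be in error.
        if sum(sums) * 2 > len(sums):
            corrected[j] ^= 1
    return corrected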
2) MAJORITY CIRCUIT IMPLEMENTATION
Here we present a compact implementation of the majority gate using sorting networks.
5.6 BANKED MEMORY
Large memories are conventionally organized as sets of smaller memory blocks called banks. The reason for breaking a large memory into smaller banks is to trade off overall memory density against access speed and reliability. Excessively small banks incur a large area overhead for memory drivers and receivers. Excessively large banks require long rows and columns, which result in high-capacitance wires that increase the delay; furthermore, long wires are more susceptible to breaks and bridging defects. Therefore, excessively large memory banks have a high defect rate and low performance.
Fig.5.5. Banked memory organization, with single global corrector.
The number of faults that accumulate in the memory is directly related
to the scrubbing period. The longer the scrubbing period is, the larger the
number of errors that can accumulate in the system. However, scrubbing all
memory words serially can take a long time. If the time to serially scrub the
memory becomes noticeable compared to the scrubbing period, it can reduce
the system performance. To reduce the scrubbing time, we can potentially
scrub all the memory banks in parallel.
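As a rough illustration of this trade-off (all figures below are hypothetical and chosen only to make the arithmetic visible), parallel scrubbing divides the serial scrub time by the number of banks:

# Hypothetical numbers, only to illustrate serial vs. parallel scrubbing time.
WORDS_PER_BANK = 1 << 14        # assumed 16K words per bank
NUM_BANKS      = 64             # assumed number of banks
CYCLE_NS       = 10             # assumed time to read-correct-write one word (ns)

serial_scrub_ns   = WORDS_PER_BANK * NUM_BANKS * CYCLE_NS
parallel_scrub_ns = WORDS_PER_BANK * CYCLE_NS   # all banks scrubbed concurrently

print("serial scrub  : %.2f ms" % (serial_scrub_ns / 1e6))
print("parallel scrub: %.2f ms" % (parallel_scrub_ns / 1e6))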
CHAPTER VI
SYSTEM IMPLEMENTATION
6.1 PROCESS (Dynamic Reconfiguration)
The feasibility of run-time reconfiguration of FPGAs has been
established by a large number of case studies. However, these systems have
typically involved an ad hoc combination of hardware and software. The
software that manages the dynamic reconfiguration is typically specialized to
one application and one hardware configuration. We present three different
applications of dynamic reconfiguration, based on research activities at
Glasgow University, and extract a set of common requirements. We present the
design of an extensible run-time system for managing the dynamic
reconfiguration of FPGAs, motivated by these requirements. The system is
called RAGE and incorporates operating-system-style services that permit sophisticated, high-level operations on circuits.
ECC stands for "Error Correction Codes" and is a method used to detect
and correct errors introduced during storage or transmission of data. Certain
kinds of RAM chips inside a computer implement this technique to correct
data errors and are known as ECC Memory. ECC Memory chips are
predominantly used in servers rather than in client computers. Memory errors
are proportional to the amount of RAM in a computer as well as the duration of
operation. Since servers typically contain several Gigabytes of RAM and are in
operation 24 hours a day, the likelihood of errors cropping up in their memory
chips is comparatively high and hence they require ECC Memory.
Memory errors are of two types, namely hard and soft. Hard errors are caused by fabrication defects in the memory chip and cannot be corrected
once they start appearing. Soft errors on the other hand are caused
predominantly by electrical disturbances. Memory errors that are not corrected
immediately can eventually crash a computer. This again has more relevance
to a server than a client computer in an office or home environment. When a
client crashes, it normally does not affect other computers even when it is
connected to a network, but when a server crashes it brings the entire network
down with it. Hence ECC memory is mandatory for servers but optional for
clients unless they are used for mission critical applications.
ECC Memory chips mostly use Hamming Code or Triple Modular
Redundancy as the method of error detection and correction. These are known
as FEC codes or Forward Error Correction codes that manage error correction
on their own instead of going back and requesting the data source to resend the
original data. These codes can correct single-bit errors occurring in data. Multi-bit errors are very rare and hence do not pose much of a threat to memory systems.
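To make the single-error-correction idea concrete, the sketch below implements the classic Hamming (7, 4) code; this is a generic textbook example, not the EG-LDPC scheme proposed in this project.

def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Correct a single-bit error; the syndrome gives the 1-based error position (0 = no error)."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c = list(c)
        c[pos - 1] ^= 1
    return c

codeword = hamming74_encode([1, 0, 1, 1])
codeword[5] ^= 1                                   # inject a single-bit error
assert hamming74_correct(codeword) == hamming74_encode([1, 0, 1, 1])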
ENCODING PROCESS
EGLDPC codes have received tremendous attention in the coding
community because of their excellent error correction capability and near-
capacity performance. Some randomly constructed EGLDPC codes, measured
in Bit Error Rate (BER), come very close to the Shannon limit for the AWGN
channel (within 0.05 dB) with iterative decoding and very long block sizes (on the order of 10^6 to 10^7). However, for many practical applications (e.g.,
packet-based communication systems), shorter and variable block-size
EGLDPC codes with good Frame Error Rate (FER) performance are desired.
Communications in packet-based wireless networks usually involve a large
per-frame overhead including both the physical (PHY) layer and MAC layer
headers. As a result, the design for a reliable wireless link often faces a trade-
off between channel utilization (frame size) and error correction capability.
One solution is to use adaptive burst profiles, in which transmission parameters
relevant to modulation and coding may be assigned dynamically on a burst-by-
burst basis. Therefore, LDPC codes with variable block lengths and multiple code rates for different quality-of-service requirements under various channel conditions are highly desirable.
FLOW OF ENCODING PROCESS
Fig. 6.1: Flow of the encoding process
In the recent literature there are many EGLDPC decoder architectures, but few of them support variable block-size and multi-rate decoding. For example, a 1 Gbps, 1024-bit, rate-1/2 EGLDPC decoder has been implemented; however, this architecture supports only one particular EGLDPC code by wiring the whole Tanner graph into hardware. A code-rate-programmable EGLDPC decoder has been proposed, but the code length is still fixed to 2048 bits for simple VLSI implementation. In [3], an EGLDPC decoder that supports three block sizes and four code rates is designed by storing 12 different parity-check matrices on-chip. As we can see, the main design challenge in supporting variable block sizes and multiple code rates stems from the random or unstructured nature of the EGLDPC codes. Generally, support for different
block sizes of EGLDPC codes would require different hardware architectures. To address this problem, we propose a generalized decoder architecture based on quasi-cyclic EGLDPC codes that can support a wider range of block sizes and code rates with a low hardware requirement. To balance implementation complexity and decoding throughput, a structured EGLDPC code was recently proposed for modern wireless communication systems, including but not limited to IEEE 802.16e and IEEE 802.11n. An expansion factor P divides the variable nodes and the check nodes into clusters of size P such that, if there exists an edge between a variable cluster and a check cluster, P variable nodes connect to P check nodes via a permutation (cyclic-shift) network. Generally, support for different block sizes and code rates implies the use of multiple parity-check matrices (PCMs). Storing all the PCMs on-chip is almost impractical and expensive. A good trade-off between design complexity and decoding throughput is partially parallel decoding, obtained by grouping a certain number of variable and check nodes into a cluster for parallel processing. Furthermore, the layered decoding algorithm can be applied to improve the decoding convergence time by a factor of two and hence increase the throughput. The structured EGLDPC code is well suited to efficient VLSI implementation because it significantly simplifies memory access and message passing. The PCM can be viewed as a group of concatenated horizontal layers, where the column weight is at most 1 in each layer due to the cyclic-shift structure.
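The quasi-cyclic structure can be illustrated with a small software sketch (the 2 x 3 base matrix and expansion factor used below are made up for illustration; real standards such as IEEE 802.16e define their own base matrices): every non-negative base-matrix entry expands into a P x P identity block cyclically shifted by that amount, and every -1 entry expands into a P x P zero block.

def expand_base_matrix(base, P):
    """Expand a quasi-cyclic base matrix into a full parity-check matrix.
    base[i][j] >= 0 -> P x P identity block cyclically shifted right by base[i][j];
    base[i][j] == -1 -> P x P all-zero block."""
    rows = len(base) * P
    cols = len(base[0]) * P
    H = [[0] * cols for _ in range(rows)]
    for bi, brow in enumerate(base):
        for bj, shift in enumerate(brow):
            if shift < 0:
                continue
            for r in range(P):
                H[bi * P + r][bj * P + (r + shift) % P] = 1
    return H

# Toy example: 2 x 3 base matrix, expansion factor P = 4 (hypothetical values).
H = expand_base_matrix([[0, 2, -1],
                        [1, -1, 3]], P=4)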
6.2 TESTING TECHNIQUES
This project describes simple iterative decoders for low-density parity-check codes based on Euclidean geometries, suitable for practical very-large-scale-integration implementation in applications requiring very fast decoders. The decoders are based on shuffled and replica-shuffled versions of iterative bit-flipping (BF) and quantized weighted-BF schemes. The proposed decoders converge faster and provide better ultimate performance than standard BF decoders. We present simulations that illustrate the performance-versus-complexity trade-offs for these decoders, and show, in some cases through importance sampling, that no significant error floor exists. We also present novel architectures comprising one parallel and two semi-parallel decoder architectures for popular PG-based LDPC codes.
These architectures are free of memory clashes and are furthermore reconfigurable for different code lengths (and their corresponding rates). The architectures can be configured either for regular belief-propagation decoding or for majority-logic decoding (MLD). We also analyze storage circuits constructed from unreliable memory components and propose a memory construction, using low-density parity-check codes, based on a construction originally made by Taylor. The storage circuit consists of unreliable memory cells along with a correcting circuit. The correcting circuit is also constructed from unreliable logic gates along with a small number of perfect gates. The modified construction enables the memory device to perform better than the original construction. We present numerical results supporting these claims.
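A minimal software model of plain bit-flipping decoding is sketched below (it is not the shuffled or replica-shuffled variant, and it uses the simple rule of flipping the bits involved in the largest number of failed checks); it is meant only to show the iteration structure.

def bit_flip_decode(c, H, max_iters=20):
    """Simple iterative bit-flipping decoding over GF(2).
    c: received hard-decision bits; H: parity-check matrix as a list of rows of 0/1."""
    c = list(c)
    for _ in range(max_iters):
        # Evaluate every parity check on the current word.
        checks = [sum(cb & hb for cb, hb in zip(c, row)) % 2 for row in H]
        if not any(checks):
            break                       # all checks satisfied: valid codeword reached
        # Count, for each bit, how many failed checks it participates in.
        fails = [sum(checks[i] for i, row in enumerate(H) if row[j])
                 for j in range(len(c))]
        worst = max(fails)
        if worst == 0:
            break
        # Flip the bit(s) involved in the largest number of failed checks.
        for j in range(len(c)):
            if fails[j] == worst:
                c[j] ^= 1
    return c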
CHAPTER VII
PERFORMANCE AND LIMITATIONS
REED-SOLOMON APPLICATIONS
Modem technologies: xDSL, cable modems
CD and DVD players
Digital audio and video broadcast: HDTV / digital TV
Data storage and retrieval systems: hard-disk drives, CD-ROM
Wireless communications: cell phones, base stations, wireless-enabled PDAs
Digital satellite communication and broadcast
RAID controllers with fault tolerance
7.1 APPLICATIONS:
Used in SoC and NoC processors
Used in radios
Used in almost all electronic devices
Loopback BIST model for digital transceivers with limited test circuitry
Spot-defect models (typical of CMOS technology) based on noise and nonlinear analysis, using fault abstraction
7.2 MERITS OF SYSTEM
Reduces maintenance cost
High-speed fault tolerance
Can easily identify faults
Process capability
No external circuitry
Does not affect the internal architecture of the nano memory
Multiple faults can be easily corrected
7.3 LIMITATIONS OF SYSTEM
Hardware faults cannot be recognized
Only pre-designed regions can be checked
May negatively impact manufacturers' current silicon-chip technology
Only used in specific applications
7.4 FUTURE ENHANCEMENT
With the advancement of science, electrical and electronic devices have reached unimaginable levels. The main requirement of any good device is that it serves its purpose effectively, and BIST enables this efficiency. A future BIST system can be designed in such a way that hardware faults can also be indicated so that they can be corrected. A multiprocessor system-on-chip is an integrated system that performs real-time tasks at low power and low cost.
CHAPTER VIII
OUTPUT RESULTS AND DISCUSSIONS
ENCODER
DECODER
EXISTING METHOD’S RESULT
PAPER’S RESULT
PROPOSED METHOD’S RESULT
CHAPTER IX
CONCLUSION
This paper presents an algebraic method for constructing modified Euclidean-geometry low-density parity-check (LDPC) codes based on the structural properties of Euclidean geometries. The construction method results in a class of M-EG-LDPC codes. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean geometry low-density parity-check (EG-LDPC) codes have the fault-secure detector capability.
CHAPTER X
REFERENCES
[1] H. Naeimi and A. DeHon, "Fault secure encoder and decoder for memory applications," in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Syst., Sep. 2007, pp. 409–417.
[2] M. Davey and D. J. C. MacKay, "Low density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, pp. 165–167, Jun. 1998.
[3] H. Tang, J. Xu, S. Lin, and K. A. S. Abdel-Ghaffar, "Codes on finite geometries," IEEE Trans. Inf. Theory, vol. 51, no. 2, Feb. 2005.
[4] S. J. Piestrak, A. Dandache, and F. Monteiro, "Designing fault-secure parallel encoders for systematic linear error correcting codes," IEEE Trans. Reliab., vol. 52, no. 4, Dec. 2003.
[5] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399–431, Mar. 1999.
[6] H. Wymeersch, H. Steendam, and M. Moeneclaey, "Log-domain decoding of LDPC codes over GF(q)," in Proc. IEEE Int. Conf. Commun., Paris, France, Jun. 2004, pp. 772–776.