LDPC Decoding Methods
DESIGN OF OPTIMAL LAYER DECODING FOR LDPC CODES
AJITH.C, M.E-VLSI DESIGN, SAKTHI MARIAMMAN ENGINEERING COLLEGE, CHENNAI-602 105
Mr. P. KABILAMANI, ASSISTANT PROFESSOR, Dept. ECE, SAKTHI MARIAMMAN ENGINEERING COLLEGE, CHENNAI-602 105
ABSTRACT
In this paper, we present a method for creating LDPC
codes that are specifically designed to be
hardware-friendly. The method constrains the cyclic
shift values in the base H-matrix to reduce the
complexity of the cyclic-shift hardware. We show
that the decoder hardware for these codes has higher
throughput and lower power consumption than
decoders designed for traditional LDPC codes, and we
provide results showing that these codes maintain
the error-rate performance expected of LDPC codes
while achieving these throughput and power-consumption
improvements.
Keywords - low-density parity-check decoder;
encoder; layered; LDPC; permutation network;
shuffle; cyclic shift; constrained code design;
low-power hardware; high data rate; gigabit;
communications; WiGig; flooding
I. INTRODUCTION
Low Density Parity Check (LDPC) codes [1]
have received a great deal of attention in recent years.
This is due to their ability to achieve performance
close to the Shannon limit [2], their suitability for
efficient implementation in hardware [3][4], and their
support of high data rates.
However, data rates in modern communication
systems are ever-increasing, and the rise of battery-powered
mobile communication and computing devices necessitates
low-power hardware design wherever possible. As a
result, there is continuing work on improving the
power efficiency and throughput of LDPC encoding and
decoding hardware.
Traditionally, LDPC code design has focused on
error rate performance and parallelization, while
decoder hardware design has focused on throughput
and power consumption. Often the approach used is
code-first design, where the hardware is designed
after a good-performing code has been created. In
this paper, we instead take a hardware-first approach
focused on improving the cyclic shift
function (alternatively referred to as a shuffle
network or permutation network) used to vary the
connections between the variable nodes (VN) and
check nodes (CN) in the decoder.
This paper presents a method for designing LDPC
codes which are inherently hardware-friendly.
Initially, we discuss some background information on
LDPC codes and the hardware required to perform
cyclic permutations of the VN data in a layered
decoder. We then present our approach to code
design which minimizes the number of cyclic
permutations that must be supported by the hardware.
We show the improved hardware implementation for
these codes, and discuss the improved performance
characteristics of this hardware. Finally, we
demonstrate that the error-rate performance of these
codes is comparable to codes designed by existing
methods.
II. LDPC CODES AND LAYERED DECODER DESIGN
The structure of a specific LDPC code is
determined entirely by its H-matrix. Because of the
size and routing complexity involved in
implementing a fully-parallel decoder (also known as
a flooding decoder), most practical LDPC codes are
structured to support partially parallel decoders in
hardware (also known as layered-schedule decoders).
To create a code with good performance which
supports parallel decoding in hardware, one
commonly used approach is protograph-based code
design. Using this approach, a structured code is
created by expanding a relatively small Tanner graph
called a protograph [5]. The resultant code has an H-
matrix, which can be described by a set of
permutation matrices. In one form of protograph-
based code design, an expansion factor Z is chosen.
Each permutation matrix is of size Z×Z, and is
created by performing a right cyclic shift on the Z×Z
identity matrix. Fig. 1 shows an illustrative example
of the base H-matrix of a protograph-based LDPC
code designed using Z-factor expansion. Note that we
use the term base H-matrix to refer to the compact
form of the H-matrix, where each cell represents a
permutation matrix. In this example, a non-negative
number x refers to the identity matrix shifted x times
to the right (e.g., the number ‘5’ refers to the identity
matrix cyclically shifted five times to the right), and
the value ‘-1’ refers to the Z×Z all-zeros matrix. The
shift value determines the connection of the VNs to
the CNs for the particular row in the base H-matrix.
The amount of the cyclic shift (i.e., the value x)
ranges from 0 to Z-1 and generally varies across the
rows and columns of the base H-matrix. Careful
selection of the shift values is necessary to design a
code with good error-rate performance.
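The Z-factor expansion just described can be sketched in a few lines of Python. The sketch assumes the convention stated above: -1 denotes the Z×Z all-zeros block, and x denotes the identity shifted x times to the right, so row i of that block has its single 1 in column (i + x) mod Z. The function name `expand_base_matrix` and the small example matrix are hypothetical; they are not the matrix of Fig. 1.

```python
from typing import List

def expand_base_matrix(base: List[List[int]], z: int) -> List[List[int]]:
    """Expand a base H-matrix into the full binary H-matrix.

    -1            -> Z x Z all-zeros block
    x in [0, Z-1] -> Z x Z identity cyclically shifted x times to the right,
                     i.e. row i of the block has its single 1 in column (i + x) mod Z
    """
    n_rows, n_cols = len(base), len(base[0])
    h = [[0] * (n_cols * z) for _ in range(n_rows * z)]
    for br, base_row in enumerate(base):
        for bc, shift in enumerate(base_row):
            if shift == -1:
                continue                                   # all-zeros block
            if not 0 <= shift < z:
                raise ValueError(f"shift {shift} outside [0, {z - 1}]")
            for i in range(z):
                h[br * z + i][bc * z + (i + shift) % z] = 1
    return h

# Hypothetical 2x3 base matrix with Z = 3 (not the matrix of Fig. 1).
for row in expand_base_matrix([[0, 2, -1], [-1, 1, 0]], 3):
    print(row)
```

Each non-(-1) cell contributes exactly one 1 to every row and every column of its block, which is the row-weight/column-weight-one property used in the next paragraph.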
For any row of the base H-matrix, each VN will
be connected to at most one CN. This follows
because the row weight and column weight
of the identity matrix (or, a cyclically shifted identity
matrix) are both one. This condition is important to
eliminate contention issues on the reading of VN
information and the writing of CN messages back to
the VN memory. This concept is usually generalized
by dividing the H-matrix into layers. Each layer
consists of a set of contention-free CNs (i.e., only one
CN can access a given VN memory at a specific
time), which are processed in parallel without
contention. Note that one can treat each row of the
base H-matrix as a layer. To increase decoding
throughput it is possible to process more than one
row of the base H-matrix in the same layer. In this
case, the protograph of the LDPC code needs to be
carefully designed to allow parallel processing of
multiple base H-matrix rows in a single layer.
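The contention-free condition can be checked directly on the base H-matrix. Within one base row it holds automatically, because each block is a shifted identity. When several base rows share a layer, the block-level condition is that no block column is non-zero (not -1) in more than one of those rows; otherwise every VN in that column group would be accessed by two CNs in the same layer. The sketch below checks this and greedily packs rows into layers. The helper names and the greedy strategy are illustrative assumptions, not the procedure used in this paper.

```python
from typing import List, Sequence

def rows_can_share_layer(base: List[List[int]], rows: Sequence[int]) -> bool:
    """True if no block column is non-zero (entry != -1) in more than one of `rows`."""
    for col in range(len(base[0])):
        if sum(1 for r in rows if base[r][col] != -1) > 1:
            return False          # two CN groups would access the same VN group
    return True

def greedy_layers(base: List[List[int]]) -> List[List[int]]:
    """Assign base rows to layers in order, reusing the first layer that stays contention-free."""
    layers: List[List[int]] = []
    for r in range(len(base)):
        for layer in layers:
            if rows_can_share_layer(base, layer + [r]):
                layer.append(r)
                break
        else:
            layers.append([r])    # no compatible layer: open a new one
    return layers

base = [[0, -1, 3], [-1, 2, -1], [1, 4, -1]]   # hypothetical base H-matrix
print(greedy_layers(base))                       # prints [[0, 1], [2]]
```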
Figure 1. Base H-matrix of a protograph-based LDPC code designed using Z-factor expansion (Z=7)
A protograph-based LDPC code is well-suited to
layered decoder design. To simplify description,
unless otherwise stated, we assume that each layer
consists of one base H-matrix row. We also refer to
the hardware which processes all CNs in one row of
the base H-matrix as the CN processor [6, Ch. 5]. In
a layered decoder, one CN processor can be designed
to serially process the different rows of the base H-
matrix. That is, the decoder operates on one layer at a
time, and processes the layers in sequence until a full
iteration through all parity checks (i.e., the entire
H-matrix) is completed. However, in order to use the
same CN processor for each layer, the interconnect
between the VNs and the CNs changes with each
layer processed according to the cyclic shifts
represented by the values in the base H-matrix. The
base H-matrix has c columns, and each column
corresponds to Z elements of the VN data. For each
column, the hardware must be able to perform a
number of cyclic shifts equal to the column weight of
that column. Moreover, if the decoder supports
multiple code rates (multiple base H-matrices), the
CN processing machine needs to be designed to
process any row in the set of base H-matrices.
Consequently, the number of cyclic-shift possibilities
which must be supported increases accordingly. The
permutations of the protograph are usually designed
to result in codes of good error-rate performance
without paying attention to the number of cyclic-shift
possibilities in each column across the set of base H-
matrices. However, this can significantly increase the
complexity of the CN processing hardware as will be
shown later in this paper.
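Counting the cyclic-shift possibilities per column across a set of base H-matrices is simple bookkeeping. The following sketch assumes that every base H-matrix in the set has the same number of block columns, as when one decoder serves several code rates. The function name and the two small matrices are hypothetical. The number of distinct values in a column never exceeds the total weight of that column over the set, and in an absolute-shift design these distinct values are what the shift hardware for that column must support.

```python
from typing import Dict, List, Set

def shift_possibilities(base_matrices: List[List[List[int]]]) -> Dict[int, Set[int]]:
    """Map each block column to the set of distinct shift values it uses across all matrices."""
    per_col: Dict[int, Set[int]] = {}
    for base in base_matrices:
        for row in base:
            for col, shift in enumerate(row):
                if shift != -1:                      # -1 marks an all-zeros block
                    per_col.setdefault(col, set()).add(shift)
    return per_col

rate_a = [[0, 3, -1], [5, -1, 2]]   # hypothetical base H-matrix for one code rate
rate_b = [[0, 6, 1]]                # hypothetical base H-matrix for another rate
shifts = shift_possibilities([rate_a, rate_b])
worst = max(len(s) for s in shifts.values())
print(shifts, "worst-case distinct shifts per column:", worst)
```

Constraining the shift values during code design, as this paper proposes, amounts to keeping these per-column sets small.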
There are several hardware approaches to
handling the cyclic shifts in the decoder. Fig. 2 shows
one common approach. This approach uses an
absolute right-cyclic-shift when data is prepared for
the CN processor and a complementary absolute left-
cyclic-shift when messages from the CN processor
are returned to the VNs.
Figure 2. Layered LDPC decoder using absolute cyclic shifts
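The pair of absolute shifts in Fig. 2 can be modelled on one block column of Z VN values. This sketch assumes that lane i of the CN processor must receive VN (i + x) mod Z, which matches row i of the identity matrix shifted x times to the right. Whether that is drawn as a right or a left shift in hardware depends on the bus ordering, so the hypothetical functions `to_cn_order` and `to_vn_order` are named by role instead of direction. The return shift is the exact inverse of the forward shift.

```python
from typing import List, Sequence

def to_cn_order(vn_block: Sequence[float], shift: int) -> List[float]:
    """Forward absolute shift: lane i of the CN processor receives VN (i + shift) mod Z."""
    z = len(vn_block)
    return [vn_block[(i + shift) % z] for i in range(z)]

def to_vn_order(cn_block: Sequence[float], shift: int) -> List[float]:
    """Complementary absolute shift: send lane i back to VN position (i + shift) mod Z."""
    z = len(cn_block)
    out = [0.0] * z
    for i in range(z):
        out[(i + shift) % z] = cn_block[i]
    return out

vn = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]   # Z = 7, one block column
cn_view = to_cn_order(vn, 5)                       # data as seen by the CN processor
assert to_vn_order(cn_view, 5) == vn               # round trip restores VN order
```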
An improved approach [7] is to replace the
dual right-shift/left-shift hardware with a single
relative shift. This approach is shown in Fig. 3. The
single shift performed in this method is a right
relative shift equal to the difference between the
current layer's absolute shift value and the previous
layer's absolute right shift value. This approach will
generally require less hardware (silicon) and have
less overall path delay and power consumption than
implementing two absolute shift functions.
Figure 3. Layered LDPC decoder using relative cyclic shift
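The relative-shift idea of Fig. 3 rests on the fact that cyclic shifts compose additively modulo Z: if a block column's data is already in the order required by shift a, a further shift of (b - a) mod Z gives the order required by shift b. The sketch below uses the convention that lane i receives element (i + s) mod Z and computes the relative shifts for one column over a sequence of layers. It assumes the data starts unshifted; in a steady-state decoder, the first layer of each new iteration would instead use the difference from the last layer of the previous iteration. Function names and shift values are hypothetical.

```python
from typing import List, Sequence

def cyclic_shift(block: Sequence[int], s: int) -> List[int]:
    """Lane i receives element (i + s) mod Z of the input block."""
    z = len(block)
    return [block[(i + s) % z] for i in range(z)]

def relative_shifts(column_shifts: Sequence[int], z: int) -> List[int]:
    """Relative shift per layer for one block column, given its absolute shifts
    in processing order; the data is assumed to start unshifted (previous shift 0)."""
    prev = 0
    rel = []
    for s in column_shifts:
        rel.append((s - prev) % z)   # difference to the previous layer's absolute shift
        prev = s
    return rel

z = 7
absolute = [5, 2, 6]                 # hypothetical shifts of one column in three layers
data = list(range(z))
for a, r in zip(absolute, relative_shifts(absolute, z)):
    data = cyclic_shift(data, r)     # single relative shift per layer
    assert data == cyclic_shift(list(range(z)), a)   # same order as the absolute shift
```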
In the hardware implementation of an
LDPC decoder, the cyclic-shift functions are
typically constructed with multiplexers created from
CMOS logic or pass-transistor logic. The logic for
these multiplexers adds to the processing delay for
each layer of the code. This delay limits the operating
frequency of the CN processors, which in turn
reduces the throughput of the decoder. Moreover, the
switching of the multiplexer increases the power
consumption of the decoder.
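One common way to build the cyclic-shift function from multiplexers (assumed here; the text does not specify the structure) is a logarithmic rotator. Stage k either passes the data through or rotates it by 2^k positions modulo Z, selected by bit k of the shift amount. Because rotations add modulo Z, ceil(log2 Z) stages of Z two-input multiplexers realize any shift from 0 to Z - 1, even when Z is not a power of two, and each stage adds one multiplexer delay to the path. The behavioural model below is a hypothetical sketch, not a circuit description.

```python
from typing import List, Sequence

def mux_stages(z: int) -> int:
    """Multiplexer levels on the data path: one per select bit, ceil(log2 Z)."""
    return max(1, (z - 1).bit_length())

def mux_rotator(block: Sequence[int], s: int) -> List[int]:
    """Behavioural model of a logarithmic rotator made of 2:1 multiplexer stages.
    Lane i of the result holds input element (i + s) mod Z, for 0 <= s < Z."""
    z = len(block)
    if not 0 <= s < z:
        raise ValueError("shift must lie in [0, Z-1]")
    data = list(block)
    for k in range(mux_stages(z)):
        step = (1 << k) % z                        # stage k rotates by 2**k (mod Z)
        rotated = [data[(i + step) % z] for i in range(z)]
        data = rotated if (s >> k) & 1 else data   # Z two-input muxes driven by bit k of s
    return data

print(mux_stages(7), mux_rotator(list(range(7)), 5))   # prints 3 [5, 6, 0, 1, 2, 3, 4]
```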
The approaches in Figs. 2 and 3 are effective
decoder implementations for existing LDPC codes.
However, if we take a step back and consider the
hardware together with the code design, we can
design codes with higher throughput and lower power
consumption. By paying attention to the cyclic-shift
possibilities in each column across all base H-
matrices, we can intelligently select them to optimize
both hardware implementation and performance.
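To make the layer-by-layer schedule of this section concrete, here is a minimal sketch of a layered decoder using the normalized min-sum update. Several assumptions here do not come from this paper. Each layer is a single check (a base-row layer of Z checks is equivalent, because those checks share no VN). The normalization factor is 0.75, and a positive LLR favours bit 0. The usage example also uses the (7,4) Hamming parity checks as a toy stand-in for an LDPC code, and the function name `layered_min_sum` is hypothetical. The defining layered feature is that the posterior VN values are updated immediately after each layer, so later layers in the same iteration see the new values.

```python
from typing import List, Sequence

def layered_min_sum(checks: Sequence[Sequence[int]], llr: Sequence[float],
                    max_iter: int = 10, alpha: float = 0.75) -> List[int]:
    """Layered normalized min-sum decoding with one check per layer (toy model).

    checks: VN indices touched by each check, in processing order (degree >= 2).
    llr: channel LLRs, positive meaning bit 0 is more likely.
    Returns the hard-decision word after the last iteration run.
    """
    post = [float(x) for x in llr]                 # posterior LLRs (the VN memory)
    r = [[0.0] * len(vns) for vns in checks]       # stored check-to-variable messages
    hard = [1 if x < 0 else 0 for x in post]
    for _ in range(max_iter):
        for ci, vns in enumerate(checks):
            # Variable-to-check messages: remove this check's previous contribution.
            q = [post[v] - r[ci][k] for k, v in enumerate(vns)]
            sign_prod = 1
            for x in q:
                if x < 0:
                    sign_prod = -sign_prod
            mags = [abs(x) for x in q]
            idx1 = min(range(len(mags)), key=mags.__getitem__)  # edge with smallest |q|
            min1 = mags[idx1]
            min2 = min(m for k, m in enumerate(mags) if k != idx1)
            for k, v in enumerate(vns):
                sign = sign_prod * (-1 if q[k] < 0 else 1)       # product of the other signs
                new_r = alpha * sign * (min2 if k == idx1 else min1)
                post[v] = q[k] + new_r                           # immediate (layered) update
                r[ci][k] = new_r
        hard = [1 if x < 0 else 0 for x in post]
        if all(sum(hard[v] for v in vns) % 2 == 0 for vns in checks):
            break                                                # all parity checks hold
    return hard

H_CHECKS = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]   # toy (7,4) Hamming checks
received = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]          # all-zero word, bit 0 unreliable
print(layered_min_sum(H_CHECKS, received))               # prints [0, 0, 0, 0, 0, 0, 0]
```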
III. VECTOR DECODER ARCHITECTURE FOR QC-LDPC CODES
Vector decoder architecture overcomes the
limitation of the scalar decoder (described in the
previous section) by packing multiple messages in
the same memory word. As noted before, this is
possible because block RAMs can be configured into
different aspect ratios. For the normalized min-sum
algorithm (NMSA), the intrinsic
and the extrinsic messages are usually 6–8 bits wide,
so up to six 6-bit messages can be packed into one
memory word in the 512x36 block RAM
configuration. We define the number of messages
packed into one memory word as . The vector
decoder for a -regular code requires CNUs, VNUs,
and block RAMs for intrinsic and extrinsic
memories. Fig. 4.1 shows a vector decoder for the
code when . Potentially, the throughput of a vector
decoder can be times that of a scalar decoder, given
that there are times more functional units operating
simultaneously. However, without a proper data
packing scheme, memory access conflicts will arise
because multiple messages are accessed per
cycle. In addition, efficient message alignment units
are required to keep the additional logic small.
The techniques that overcome these challenges are
described next.
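The packing arithmetic above (a 36-bit word holds 36 / 6 = 6 six-bit messages) can be sketched as follows. The lane order (lane 0 in the least-significant bits), the two's-complement message format and the function names are assumptions for illustration only.

```python
from typing import List, Sequence

WORD_BITS = 36   # word width of the 512x36 block RAM configuration
MSG_BITS = 6     # message width; the text gives 6-8 bits, 6 assumed here

def pack_word(msgs: Sequence[int], msg_bits: int = MSG_BITS, word_bits: int = WORD_BITS) -> int:
    """Pack signed messages (two's complement) into one memory word, lane 0 in the LSBs."""
    per_word = word_bits // msg_bits
    if len(msgs) > per_word:
        raise ValueError(f"at most {per_word} messages fit in a {word_bits}-bit word")
    mask = (1 << msg_bits) - 1
    word = 0
    for lane, m in enumerate(msgs):
        word |= (m & mask) << (lane * msg_bits)
    return word

def unpack_word(word: int, count: int, msg_bits: int = MSG_BITS) -> List[int]:
    """Recover `count` signed messages from a packed word."""
    mask = (1 << msg_bits) - 1
    out = []
    for lane in range(count):
        v = (word >> (lane * msg_bits)) & mask
        out.append(v - (1 << msg_bits) if v >> (msg_bits - 1) else v)  # sign-extend
    return out

msgs = [-32, 31, 0, -1, 5, -7]       # six 6-bit messages fill one 36-bit word
assert unpack_word(pack_word(msgs), 6) == msgs
```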
In a vector decoder, each block RAM location
holds multiple messages. Memory conflicts could
arise if the CNU and VNU try to access the same
location simultaneously. The key challenge in
vector decoding is to reduce the potential for such
conflicts and to ensure that the overhead of resolving
these conflicts through alignment units does not
increase the complexity of the decoder and limit
scalability and clock frequency. We address this
challenge with a combination of three techniques that
are described next. First, we use double buffering,
i.e., the messages are replicated for CNU and VNU
access, so that they are stored in different ways to
match the access pattern of the CNU and VNU
processing. Though it doubles the amount of memory
needed for the storage, it does not increase the
number of block RAMs necessary, because we use
the same block RAM to store both CNU and VNU
memory. This works because typically the CPM sizes
are much smaller than the depth of the block RAMs
in an FPGA. Second, we develop a new packing
strategy that not only uses the block RAMs
efficiently but also reduces the potential for conflicts.
Third, we propose a sequential alignment unit and its
implementation to demonstrate that the alignment
task can be achieved with relatively low complexity,
which makes the scheme scalable.
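The claim that double buffering doubles the storage without adding block RAMs reduces to a depth check. Each message is stored twice, so if one copy occupies W words the two copies occupy 2W words, which still fit in one block RAM when 2W does not exceed its depth. This sketch assumes the 512-word depth of the 512x36 configuration and six messages per word; the message counts in the usage lines are hypothetical.

```python
from math import ceil

def double_buffered_words(num_messages: int, per_word: int) -> int:
    """Words used when every message is stored twice (CNU order and VNU order)."""
    return 2 * ceil(num_messages / per_word)

def fits_in_one_bram(num_messages: int, per_word: int = 6, depth: int = 512) -> bool:
    """True if both copies fit in a single block RAM of the given depth."""
    return double_buffered_words(num_messages, per_word) <= depth

# Hypothetical size: a CPM with 96 nonzero entries needs 2 * 16 = 32 words.
print(double_buffered_words(96, 6), fits_in_one_bram(96))
```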
Figure 4. Vector Decoder Architecture
Message Packing and Alignment
The CNU and VNU memory can be modelled
as two-dimensional arrays. The nonzero entries in the
CPM are stored in the block RAM. Each message
appears in two different locations because of
double buffering. Each block RAM is partitioned
logically into a VNU memory and a CNU memory, and
the messages are stored in a different order in each to
facilitate conflict-free access by the CN and VN
processing units. Our scheme has two advantages.
First, it does not require any read alignment units.
Second, it works for any value of c and v,
which is essential to support overlapped message
passing.
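The sketch below places a single message in each copy for one CPM of size Z with shift x. It assumes the CNU walks the CPM row by row and the VNU walks it column by column, that row r's edge sits in column (r + x) mod Z, and that p messages share a word. It ignores the offsets c and v used for overlapped message passing. The function and parameter names (`vnu_base` for the first word of the VNU region) are hypothetical.

```python
from typing import Tuple

Location = Tuple[int, int]   # (word address, lane within the word)

def message_locations(row: int, z: int, shift: int, p: int,
                      vnu_base: int) -> Tuple[Location, Location]:
    """Return the CNU-copy and VNU-copy locations of the message on CPM row `row`."""
    cnu_index = row                    # CNU copy: row (check) order
    vnu_index = (row + shift) % z      # VNU copy: column (variable) order
    cnu_loc = (cnu_index // p, cnu_index % p)
    vnu_loc = (vnu_base + vnu_index // p, vnu_index % p)
    return cnu_loc, vnu_loc

z, shift, p = 7, 5, 6
vnu_base = -(-z // p)                  # VNU region starts right after the CNU region
for r in range(z):
    print(r, message_locations(r, z, shift, p, vnu_base))
```

The two copies of a message generally land in different words and lanes. This is why messages must be aligned when they are written back.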
Proposed Packing Scheme
The CNU and VNU memory can be
modeled as two-dimensional arrays, denoted by and .
The variable-to-check messages are stored in the
CNU memory in CNU access order, i.e., the
message is packed as . The check-to-variable
messages are packed in the VNU memory in
VNU access order, i.e., the message is packed as . In
general, message is packed to the location in the
CNU memory and to in the VNU memory. We compare
our method with the scheme presented in Wang [12],
which is shown in Fig. There are two advantages of
our scheme over the scheme proposed in [12]. First,
our scheme does not require any read alignment units
(due to double buffering), whereas the packing
shown in Fig. 4(c) requires two read alignment units
in addition to two write alignment units (which we
also require). Second, our method works for any
value of c and v, which is essential to support overlapped
message passing, whereas the method in [12] works
well for non-overlapped message passing, i.e., c and v are
implicitly assumed to be 0. When c and v are nonzero, as
in this example, the packing scheme in [12] becomes
inefficient. For example, consider updating the
messages in the third word of the VNU memory: these
messages are spread across three different memory
words, which would entail three reads and very
complex alignment circuitry, with a concomitant
increase in latency.
Sequential Write Alignment Unit
Because a message is mapped to
different locations in the CNU and VNU memory,
except when , the messages must be aligned before
they are written to the memory. Given the
double-buffered packing scheme described above,
the alignment task is greatly simplified: only
the current word and the previous word are needed
to reconstruct the input order for the CN and VN
processing units. The messages are assumed to be
quantized to bits. The hardware requirements are
relatively modest. Let for VNU alignment and for
CNU alignment. For vector length and CPM size ,
1) when , or when , , no alignment unit circuitry
is needed; 2) when of the partially parallel decoder.
Vector processing can be combined with overlapped
message passing to further improve the throughput.
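The two-word alignment described under Sequential Write Alignment Unit can be sketched as a streaming function. The sketch assumes the required reordering is a shift of the message stream by `offset` lanes (0 <= offset < p). Each output word then consists of the last `offset` lanes of the previous input word followed by the first p - offset lanes of the current one. The function name and the stream-shift assumption are illustrative only; this is not the paper's circuit.

```python
from typing import List, Optional, Sequence

def align_stream(words: Sequence[Sequence[int]], offset: int) -> List[List[Optional[int]]]:
    """Re-pack p-lane words so output word k starts `offset` messages earlier in the stream.
    Only the previous and the current input word are held, as in a sequential unit."""
    p = len(words[0])
    if not 0 <= offset < p:
        raise ValueError("offset must lie in [0, p-1]")
    prev: List[Optional[int]] = [None] * p      # nothing seen before the first word
    out: List[List[Optional[int]]] = []
    for cur in words:
        window = list(prev) + list(cur)        # previous word followed by current word
        out.append(window[p - offset: 2 * p - offset])
        prev = list(cur)
    return out

print(align_stream([[0, 1, 2], [3, 4, 5], [6, 7, 8]], 1))
```

With offset 1, the first output word has one empty lane and the last input message waits for a final flush cycle, which a hardware unit would have to handle at the end of each block.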
IV. RESULTS AND DISCUSSION
The proposed design has been implemented in
Verilog HDL and simulated with Xilinx ISE 9.1i.
The first step in the vectored layer decoder
implementation is the detection of two errors,
followed by error correction using quasi-cyclic
error correction. The corrected output is then
decoded to the original message.
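Error detection in a decoder of this kind is normally a syndrome check: the hard-decision word c satisfies every parity check exactly when H c^T = 0 over GF(2). The following minimal sketch uses the (7,4) Hamming parity-check matrix as a hypothetical stand-in for the code used in this work. A single error produces the column of H at the error position. The two-error pattern in the usage lines also gives a non-zero syndrome, so it is detected.

```python
from typing import List, Sequence

def syndrome(h: Sequence[Sequence[int]], word: Sequence[int]) -> List[int]:
    """s = H * word^T over GF(2); all zeros means every parity check is satisfied."""
    return [sum(hij & bj for hij, bj in zip(row, word)) % 2 for row in h]

H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]          # toy (7,4) Hamming code, not the paper's LDPC code
codeword = [1, 0, 0, 0, 1, 1, 0]
two_errors = [0, 1, 0, 0, 1, 1, 0]   # bits 0 and 1 flipped
print(syndrome(H, codeword), syndrome(H, two_errors))   # prints [0, 0, 0] [0, 1, 1]
```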
Figure 5. Error Correction Using Layered Decoding
Figure 6. Error Correction Using Vectored Layer Decoding
V. CONCLUSION
In this paper, we explore a compact software
implementation of layered and vectored layer
decoder design. From the performance analysis of
the proposed design, it is evident that the error-rate
performance of the layered decoder can be
improved by the proposed design. The Verilog HDL
code for the layered and vectored layer decoding
algorithms was developed block-wise. Optimized and
synthesizable Verilog HDL code for each block was
synthesized using Xilinx ISE 9.1i, and the Verilog
HDL implementation has shown that the language
provides a useful tool for practicing the algorithms
without drawing a large number of logic gates.
The design was verified to be functionally correct.
REFERENCES
[1] R. G. Gallager, Low-Density Parity-Check Codes, Cambridge, MA: MIT Press, 1963.
[2] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low density parity check codes," Electronics Letters, vol. 32, pp. 1645-1646, Aug. 1996.
[3] E. Boutillon, J. Castura, and F. R. Kschischang, "Decoder-first code design," Proceedings of the 2nd International Symposium on Turbo Codes and Related Topics, pp. 459-462, Sep. 2000.
[4] T. Zhang and K. K. Parhi, "VLSI implementation-oriented (3,k)-regular low-density parity-check codes," IEEE Workshop on Signal Processing Systems, pp. 25-36, Sep. 2001.
[5] J. Thorpe, "Low-density parity-check (LDPC) codes constructed from protographs," IPN Progress Report 42-154, Apr.-Jun. 2003.
[6] W. E. Ryan and S. Lin, Channel Codes: Classical and Modern, Cambridge University Press, 2009.
[7] A. Blanksby, B. Shen, and J. Trachewsky, "LDPC code set for mmwave communication," Proceedings of the 2010 ACM International Workshop on mmWave Communications, pp. 39-43, Sep. 2010.
[8] G. Liva and M. Chiani, "Protograph LDPC codes design based on EXIT analysis," IEEE Global Telecommunications Conference (GLOBECOM), pp. 3250-3254, Nov. 2007.
[9] Y. Hu, E. Eleftheriou, and D. M. Arnold, "Regular and irregular progressive edge growth Tanner graphs," IEEE Trans. Inform. Theory, vol. 51, no. 1, pp. 386-398, Jan. 2005.
[10] S. A. Surra, E. Pisek, and T. Henige, "Gigabit rate achieving low-power LDPC codes: Design and architecture," IEEE Wireless Communications and Networking Conference (WCNC), pp. 1994-1999, Mar. 2011.
[11] "IEEE 802.11ad PHY/MAC Complete Proposal Specification (approved as IEEE 802.11ad D0.1)," http://tinyurl.com/2fqlkxx, May 2010.