IEEE 2012 IEEE Workshop on Signal Processing Systems (SiPS) - Quebec City, QC, Canada...
ENERGY-EFFICIENT LDPC DECODERS BASED ON ERROR-RESILIENCY
Eric P. Kim and Naresh R. Shanbhag
University of Illinois at Urbana-Champaign, Coordinated Science Laboratory/Department of Electrical and Computer Engineering
1308 W. Main St., Urbana, IL 61801
[epkim2, shanbhag]@illinois.edu
ABSTRACT
Low density parity check (LDPC) codes are used in various communication standards. However, LDPC decoders are complex and power hungry. In this paper, we present an energy-efficient LDPC decoder based on statistical error compensation (SEC). Three LDPC codes of different sizes, (50, 25), (800, 400), and (1800, 900), were implemented with 5 iterations/block. Circuit simulations in a commercial 45 nm process show that the SEC based LDPC decoder can operate at a supply voltage up to 38% lower than the nominal voltage and tolerate up to 30× more errors over an SNR range of 3 dB to 8 dB, while maintaining less than 3× degradation in BER. This corresponds to energy savings of 45.7% compared to conventional LDPC decoders, and 33.2% compared to a sign bit protected LDPC decoder.
Index Terms— LDPC, Error resiliency, low power
1. INTRODUCTION
Low density parity check (LDPC) codes offer excellent error correction performance and have been employed in communication systems such as 802.11 Wi-Fi and DVB-S2 satellite transmission of digital television, and are considered for many 4G systems including WiMAX (IEEE Std 802.16e) [1, 2, 3]. However, the decoding complexity of LDPC codes is quite large, and low power LDPC decoders are required to satisfy the power constraints of wireless handsets. Much work has been done on low power LDPC decoders. Analog decoder architectures have been proposed for short length codes [4]; however, scaling the code length beyond 250 will be challenging due to device mismatch and buffering requirements. Digital low power LDPC decoder architectures mostly focus on reducing the decoding complexity through early termination or approximation [5, 6, 7].
The recent trend in energy efficient design has been to focus on the trade-off between error resiliency and energy efficiency [8, 9, 10, 11, 12, 13]. Previous work on error resilient LDPC decoders has protected the sign bit of decoding messages (see Section 3.1) or employed triple modular redundancy (TMR) [14]. However, the error resiliency provided by sign bit protection (SBP) alone is limited at high error rates (more than 3% errors per clock cycle), and TMR has very high overhead.
In this paper, we present an energy-efficient LDPC decoder based on statistical error compensation (SEC) [11, 12, 13] and evaluate the energy efficiency vs. robustness trade-offs in a 45 nm process technology. SEC techniques introduce resiliency by exploiting knowledge of the statistics of the data and the errors to compensate for the errors. This is in contrast to microarchitectural techniques [8, 9] that employ local error detection and global correction via replay, and that can compensate for an error rate $p_e$ (percentage of clock cycles in which the output is in error) of less than 0.1%. SEC techniques have been shown to successfully compensate for $p_e$ up to 86% [13] while demonstrating measured energy savings of 3.3×-to-5.8×. This energy efficiency is obtained by trading off reliability for energy savings via voltage overscaling (VOS). VOS induces timing errors by reducing the supply voltage $V_{dd}$ below the critical voltage $V_{dd,crit}$, the lowest voltage with error free operation. The error rate $p_e$ increases as $V_{dd}$ is reduced further below $V_{dd,crit}$. Figure 1 shows the energy vs. $V_{dd}$ of a bit node, a block used in decoding LDPC codes, for a conventional error-free design and for a VOS design that targets an error rate of 70%. If the VOS-induced errors are fully compensated for, significant energy savings (up to 70%) can be achieved.

The paper is organized as follows. Section 2 provides background information on the LDPC codes and SEC techniques used in this paper. Section 3 describes the LDPC decoder architecture in detail. The simulation setup, including energy and error modeling, is presented in Section 4. Simulation results are given in Section 5, and Section 6 concludes the paper.
2. BACKGROUND
2.1. Low density parity check codes
LDPC codes are linear block codes defined by a sparse parity check matrix $H$. Let $H$ be a binary $r \times n$ matrix. A vector $c$ of length $n$ is a valid codeword if $Hc^T = 0$, with arithmetic carried out modulo 2. This parity check matrix gives rise to a bipartite factor graph with $r$ check nodes and $n$ variable nodes. The graph is connected such that if entry $(i, j)$ of $H$ is 1, then the $i$th check node is connected to the $j$th variable node. An example parity check matrix and its factor graph are depicted in Fig. 2. Each column of $H$ represents a code bit (variable node), while each row represents a parity check constraint (check node).

2012 IEEE Workshop on Signal Processing Systems
978-0-7695-4856-2/12 \$26.00 © 2012 IEEE
DOI 10.1109/SiPS.2012.60

[Figure: energy (J) and delay (s) vs. supply voltage (V) for the conventional and voltage overscaled designs, with the 70% energy savings marked.]
Fig. 1: Supply voltage vs. energy and delay of an LDPC bit node (Fig. 6(a)) in a commercial 45 nm process. By voltage overscaling up to an error rate of 70%, the same performance can be achieved with 70% less energy.

[Fig. 2(a), parity check matrix:
$$H = \begin{bmatrix} 0&1&0&1&1&0&0&1 \\ 1&1&1&0&0&1&0&0 \\ 0&0&1&0&0&1&1&1 \\ 1&0&0&1&1&0&1&0 \end{bmatrix}$$
Fig. 2(b): bipartite decoding graph with check nodes and variable nodes.]
Fig. 2: Example LDPC code: (a) parity check matrix, and (b) its bipartite decoding graph.
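The parity check condition and the graph construction can be illustrated with the example of Fig. 2. The Python sketch below assumes that the 32-bit pattern in Fig. 2(a) is a 4 × 8 matrix read row by row; the function names `syndrome`, `is_codeword`, and `tanner_edges` are illustrative and not part of the original work.

```python
# Parity check matrix of the Fig. 2(a) example (assumed 4 x 8 layout, read row by row).
H = [
    [0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
]


def syndrome(H, c):
    """Return H c^T modulo 2: one entry per check node, 0 when that check is satisfied."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]


def is_codeword(H, c):
    """c is a valid codeword iff every parity check is satisfied (H c^T = 0)."""
    return all(s == 0 for s in syndrome(H, c))


def tanner_edges(H):
    """Factor-graph edges (check node i, variable node j) wherever H[i][j] == 1."""
    return [(i, j) for i, row in enumerate(H) for j, h in enumerate(row) if h]


c = [1, 0, 0, 1, 0, 1, 0, 1]
print(is_codeword(H, c))         # True
c_err = [1 - c[0]] + c[1:]       # flip the first code bit
print(syndrome(H, c_err))        # [0, 1, 0, 1]: the two checks on variable node 0 fail
print(len(tanner_edges(H)))      # 16
```

Flipping a single bit fails exactly the checks that the corresponding column of $H$ connects to, which is the information the decoder's message passing exploits.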
The following notation will be used: $P_i$ is the probability $\Pr(c_i = 1 \mid y_i)$, $q_{ij}$ is the message sent from variable node $v_i$ to check node $c_j$, and $r_{ji}$ is the message sent from check node $c_j$ to variable node $v_i$. Each of $q_{ij}$ and $r_{ji}$ is a pair of messages giving the belief that variable $v_i$ is 0 or 1. Thus, $q_{ij}(0) = 1 - q_{ij}(1)$, and likewise for $r_{ji}$. The check nodes then compute the probability that an even number of 1's is observed through the following equation:
$$r_{ji}(0) = \frac{1}{2} + \frac{1}{2}\prod_{i' \in V_j \setminus i}\left(1 - 2q_{i'j}(1)\right) \tag{1}$$

$$r_{ji}(1) = 1 - r_{ji}(0), \tag{2}$$
where $V_j \setminus i$ denotes the set of variable nodes connected to check node $c_j$, excluding variable node $v_i$. This follows directly from the proof by Gallager [15], whereby for a sequence of $M$ independent binary digits $a_i$ with probability $p_i = \Pr(a_i = 1)$, the probability that the sequence contains an even number of 1's is $\frac{1}{2} + \frac{1}{2}\prod_{i=1}^{M}(1 - 2p_i)$. The variable node then computes its message by:
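$$q_{ij}(0) = K_{ij}\,(1 - P_i)\prod_{j' \in C_i \setminus j} r_{j'i}(0) \tag{3}$$

$$q_{ij}(1) = K_{ij}\,P_i\prod_{j' \in C_i \setminus j} r_{j'i}(1), \tag{4}$$

where $C_i \setminus j$ denotes the set of check nodes connected to variable node $v_i$, excluding check node $c_j$, and $K_{ij}$ is a normalization constant chosen so that $q_{ij}(0) + q_{ij}(1) = 1$.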
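Equations (3) and (4) simply multiply the channel belief by all the check node beliefs that the variable is 0 (respectively 1) to obtain the final belief. In a more practical implementation of LDPC decoding, instead of tracking two beliefs, only one message is passed: the log likelihood ratio (LLR) of the two messages, $\log\frac{q_{ij}(0)}{q_{ij}(1)}$ and $\log\frac{r_{ji}(0)}{r_{ji}(1)}$. The update equations can then be shown to be:

$$m_{ij} = \log\frac{q_{ij}(0)}{q_{ij}(1)} = \log\left(\frac{1 - P_i}{P_i}\right) + \sum_{j' \in C_i \setminus j} n_{j'i} \tag{5}$$

$$n_{ji} = \log\frac{r_{ji}(0)}{r_{ji}(1)} = \log\frac{1 + \prod_{i' \in V_j \setminus i}\tanh(m_{i'j}/2)}{1 - \prod_{i' \in V_j \setminus i}\tanh(m_{i'j}/2)} \tag{6}$$

$$= \left(\prod_{i' \in V_j \setminus i}\operatorname{sgn}(m_{i'j})\right)\psi^{-1}\left(\sum_{i' \in V_j \setminus i}\psi(m_{i'j})\right) \tag{7}$$

where $m_{ij}$ and $n_{ji}$ are the LLRs of the variable to check node and check node to variable node messages, respectively, and $\psi(x) = -\log\tanh(|x|/2)$. For $x > 0$, $\psi$ is its own inverse, so $\psi^{-1}$ can be realized with the same function. With this sign convention, a negative LLR favors bit 1. Using the max-log approximation, the check node to variable node message can be further approximated as:

$$n_{ji} \approx \left(\min_{i' \in V_j \setminus i}|m_{i'j}|\right)\prod_{i' \in V_j \setminus i}\operatorname{sgn}(m_{i'j}). \tag{8}$$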
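As a numerical sanity check of the check node updates, the Python sketch below confirms Gallager's even-parity formula by brute-force enumeration, shows that the probability-domain update (1) and the LLR update (6) agree, and compares the exact $\psi$-based form (7) with the max-log approximation (8). It assumes LLRs of the form $\log P(0)/P(1)$ and nonzero messages (the plain `psi` below is undefined at 0); the helper names and the input probabilities are hypothetical.

```python
import itertools
import math


def even_parity_prob(p):
    """Gallager's formula: P(even number of 1's) for independent bits with P(a_i = 1) = p[i]."""
    return 0.5 + 0.5 * math.prod(1.0 - 2.0 * pi for pi in p)


def even_parity_brute(p):
    """The same probability, by enumerating all 2^M bit patterns."""
    total = 0.0
    for bits in itertools.product((0, 1), repeat=len(p)):
        if sum(bits) % 2 == 0:
            total += math.prod(pi if b else 1.0 - pi for b, pi in zip(bits, p))
    return total


def psi(x):
    """psi(x) = -log tanh(|x|/2); for x > 0 it is its own inverse."""
    return -math.log(math.tanh(abs(x) / 2.0))


def sgn_prod(m):
    """Product of the signs of the incoming LLRs."""
    return math.prod(-1.0 if mi < 0 else 1.0 for mi in m)


def check_update_tanh(m):
    """Eq. (6): outgoing LLR log r(0)/r(1) from the other incoming LLRs m."""
    t = math.prod(math.tanh(mi / 2.0) for mi in m)
    return math.log((1.0 + t) / (1.0 - t))


def check_update_psi(m):
    """Eq. (7): sign product times psi^-1(sum of psi), using psi^-1 = psi."""
    return sgn_prod(m) * psi(sum(psi(mi) for mi in m))


def check_update_minsum(m):
    """Eq. (8): max-log (min-sum) approximation."""
    return sgn_prod(m) * min(abs(mi) for mi in m)


p = [0.1, 0.3, 0.2]                          # q_{i'j}(1) from the other variable nodes
m = [math.log((1.0 - pi) / pi) for pi in p]  # the same beliefs as LLRs, log q(0)/q(1)
r0 = even_parity_prob(p)                     # Eq. (1)
print(r0, even_parity_brute(p))
print(math.log(r0 / (1.0 - r0)), check_update_tanh(m), check_update_psi(m))
print(check_update_minsum(m))
```

The max-log message always has the same sign as the exact update and a magnitude at least as large, since $\prod \tanh(|m|/2) \le \tanh(\min|m|/2)$; this overestimate is the price paid for replacing the $\psi$ lookups with a minimum search.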
2.2. Statistical Error Compensation
A high level depiction of SEC is given in Fig. 3(a). SEC utilizes the statistics of errors to perform detection and estimation in order to compensate for errors. It also incorporates system level statistical metrics, such as signal-to-noise ratio (SNR) or bit error rate (BER). SEC operates on multiple observations, where each observation is generated by erroneous hardware, an error free estimator, or an erroneous estimator. Each observation $y_i$ is a corrupted version of the correct output $y_o$, i.e., $y_i = y_o + \eta_i + e_i$, where $\eta_i$ denotes hardware errors and $e_i$ denotes estimation errors. Based on these observations, detection and estimation techniques are employed in conjunction with the statistical information of $\eta_i$ and $e_i$ to obtain the most likely correct output. Errors that have a large effect on the system level performance are detected and compensated, while errors with minimal effect on performance are considered benign and are permitted.

[Figure: (a) SEC, in which observations $y_1, y_2, \ldots, y_N$ feed an estimator/detector that uses the error statistics to produce the corrected output; (b) ANT, with main block M, estimator M-est, and a threshold test on $|y_a - y_e|$ against $\tau$; (c) distributions of the hardware errors and the estimation errors.]
Fig. 3: Block diagram: (a) statistical error compensation, (b) algorithmic noise tolerance, and (c) error distributions.
2.2.1. Algorithmic noise tolerance
Statistical error compensation (SEC) in the form of algorithmic noise tolerance (ANT) [11, 12], shown in Fig. 3(b), incorporates a main block and an estimator. The main block is permitted to make hardware/timing errors, but the estimator is not. The estimator is a low-complexity block (typically 5%-to-20% of the main block complexity) that generates a statistical estimate of the correct main block output, i.e.,
$$y_a = y_o + \eta \tag{9}$$

$$y_e = y_o + e \tag{10}$$
where $y_a$ is the actual main block output, $y_o$ is the error-free main block output, $\eta$ is the hardware error, $y_e$ is the estimator output, and $e$ is the estimation error. Note that the estimator exhibits estimation error $e$ because it is simpler than the main block and does not perform exact computation. ANT exploits the difference in the statistics of $\eta$ and $e$ (see Fig. 3(c)). To enhance robustness, it is necessary that when $\eta \neq 0$, $\eta$ be large compared to $e$. In addition, the probability of the event $\eta \neq 0$ must be small. The final/corrected output $y$ of an ANT system is obtained via the following decision rule:
$$y = \begin{cases} y_a, & \text{if } |y_a - y_e| < \tau \\ y_e, & \text{otherwise} \end{cases} \tag{11}$$
where $\tau$ is an application-dependent parameter chosen to maximize the performance of ANT. Under the conditions outlined above, it is possible to show that

$$SNR_{uc} \ll SNR_e \ll SNR_{ANT} \approx SNR_o \tag{12}$$

where $SNR_{uc}$, $SNR_e$, $SNR_{ANT}$, and $SNR_o$ are the signal-to-noise ratios of the uncorrected main block ($\eta$ dominates), the estimator ($e$ dominates), the ANT system, and the error-free main block (ideal), respectively. Thus, ANT detects and corrects errors approximately, but does so in a manner that satisfies an application-level performance specification (SNR). Several low-overhead estimation techniques that exploit data correlation, system architecture, and statistical signal processing have been proposed [12].

[Figure: check nodes and variable nodes exchanging messages through registers (D); the received signal $y$ enters the variable nodes and the decoded bits $b$ are output.]
Fig. 4: High level block diagram of the LDPC decoder.
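The decision rule (11) and the SNR ordering in (12) can be illustrated with a toy model. The Python sketch below assumes uniformly distributed main block outputs, hardware errors $\eta = \pm 1$ occurring in 5% of the samples, and an estimator whose error $e$ is modeled as rounding the correct output to a grid of 1/8; all parameter values are hypothetical and are not taken from this paper.

```python
import math
import random


def ant_correct(y_a, y_e, tau):
    """Decision rule (11): keep the main block output unless it strays too far from the estimate."""
    return y_a if abs(y_a - y_e) < tau else y_e


def snr_db(ref, out):
    """SNR of `out` with respect to the error-free reference `ref`, in dB."""
    signal = sum(r * r for r in ref)
    noise = sum((o - r) ** 2 for r, o in zip(ref, out))
    return 10.0 * math.log10(signal / noise)


rng = random.Random(1)
N, P_ETA, STEP, TAU = 10_000, 0.05, 1.0 / 8, 0.25  # hypothetical parameters

y_o = [rng.uniform(-1.0, 1.0) for _ in range(N)]  # error-free main block output
eta = [rng.choice((-1.0, 1.0)) if rng.random() < P_ETA else 0.0 for _ in range(N)]
y_a = [yo + h for yo, h in zip(y_o, eta)]          # (9): rare, large hardware errors
y_e = [round(yo / STEP) * STEP for yo in y_o]      # (10): frequent, small estimation errors
y = [ant_correct(a, e, TAU) for a, e in zip(y_a, y_e)]

snr_uc, snr_e, snr_ant = snr_db(y_o, y_a), snr_db(y_o, y_e), snr_db(y_o, y)
print(f"SNR_uc = {snr_uc:.1f} dB, SNR_e = {snr_e:.1f} dB, SNR_ANT = {snr_ant:.1f} dB")
```

Because $|\eta| = 1$ exceeds $\tau + \max|e| = 0.25 + 1/16$, every hardware error in this toy model is detected and replaced by the estimate, while error-free samples always pass the test; the residual error of the ANT output is therefore only the small estimation error at the erroneous samples.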
3. LDPC DECODER ARCHITECTURE
High throughput LDPC decoders require parallel computation of messages. We implement a fully parallel LDPC decoder in which the computation of variable and check nodes is multiplexed in time, i.e., in one clock cycle the variable nodes operate in parallel, and in the next clock cycle the check nodes operate in parallel. The high level architecture and the node architectures are shown in Figs. 4 and 5, respectively.
The variable node implements (5). It takes the prior information given by the received signal $y$ and the messages incoming from the check nodes, and sums them. A hard decision block extracts the sign bit of the computed messages and outputs it as the decoded bits $b$. The check node implements (7). The $\psi(\cdot)$ and $\psi^{-1}(\cdot)$ functions are implemented as a table lookup via a piecewise linear approximation.
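The time-multiplexed schedule described above, in which all variable nodes update in one clock cycle and all check nodes in the next, can be sketched as a floating-point behavioral model. This is not the fixed-point hardware, and several assumptions are made: $\psi$ is evaluated exactly rather than through a piecewise linear table, LLRs of the form $\log P(0)/P(1)$ are clamped to a hypothetical magnitude of 30 so that the logarithm stays finite, and the channel LLR $2y/\sigma^2$ assumes BPSK (bit 0 to +1, bit 1 to -1) over an AWGN channel. The example decodes the code of Fig. 2 for 5 iterations; `decode`, `bpsk_llr`, and `LLR_CLIP` are hypothetical names.

```python
import math

# Parity check matrix of the Fig. 2 example (assumed 4 x 8 layout).
H = [
    [0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
]
LLR_CLIP = 30.0  # hypothetical saturation standing in for the fixed-point range


def psi(x):
    """psi(x) = -log tanh(|x|/2), with |x| clamped so the logarithm stays finite."""
    x = min(max(abs(x), 1e-12), LLR_CLIP)
    return -math.log(math.tanh(x / 2.0))


def bpsk_llr(y, sigma):
    """Channel LLR log P(0|y)/P(1|y) for BPSK (0 -> +1, 1 -> -1) over AWGN, equiprobable bits."""
    return [2.0 * yi / sigma ** 2 for yi in y]


def decode(H, llr_ch, iterations=5):
    """Flooding schedule: all variable nodes update, then all check nodes, per iteration."""
    n_var = len(H[0])
    checks_of = [[c for c, row in enumerate(H) if row[v]] for v in range(n_var)]  # C_i
    vars_of = [[v for v, h in enumerate(row) if h] for row in H]                # V_j
    n = {(c, v): 0.0 for c, vs in enumerate(vars_of) for v in vs}  # check-to-variable LLRs

    def totals():
        return [llr_ch[v] + sum(n[(c, v)] for c in checks_of[v]) for v in range(n_var)]

    for _ in range(iterations):
        # Variable-node phase, Eq. (5): the total LLR minus the target check's own message.
        tot = totals()
        m = {(c, v): tot[v] - n[(c, v)] for (c, v) in n}
        # Check-node phase, Eq. (7): sign product times psi(sum of psi), since psi^-1 = psi.
        for c, vs in enumerate(vars_of):
            for v in vs:
                others = [m[(c, u)] for u in vs if u != v]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                n[(c, v)] = sign * psi(sum(psi(x) for x in others))
    # Hard decision by sign extraction: a negative LLR decodes to bit 1.
    return [1 if t < 0 else 0 for t in totals()]


# Codeword 1 0 0 1 0 1 0 1 sent as BPSK; the first sample lands on the wrong side of zero.
y = [0.125, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
llr = bpsk_llr(y, sigma=math.sqrt(0.5))
print([1 if x < 0 else 0 for x in llr])  # channel-only hard decision
print(decode(H, llr))                      # decoder output after 5 iterations
```

The channel-only hard decision misreads bit 0, while both check nodes attached to that bit return strong messages of the correct sign, so the decoder recovers the transmitted codeword.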
3.1. Error Resiliency
The conventional LDPC decoder is inherently robust to errors due to its iterative message passing decoding algorithm. However, errors in the sign bit have been shown to be detrimental. In our work, we use an SBP decoder as our baseline LDPC decoder. The sign bit is computed separately via the max-log approximation in (8). The critical path for the sign bit computation is significantly shorter than that of the magnitude computation, so the sign bit experiences errors only under significant VOS. In the proposed error resilient LDPC decoder, SEC is added to the variable node and check node via ANT. The estimator is a reduced precision replica of the original variable and check node. For the 8-bit precision main block, three estimators with 2, 3, and 4 bits of precision are evaluated (denoted ANT-2b, ANT-3b, and ANT-4b in Section 5).

[Figure: (a) variable node with 8-bit inputs LLR_ch and LLR_cn(·,1), ..., LLR_cn(·,k), outputs LLR_bn(i,1), ..., LLR_bn(i,k), and sign extraction producing b_hat; (b) check node with 8-bit inputs LLR_bn(·,1), ..., LLR_bn(·,l), outputs LLR_cn(j,1), ..., LLR_cn(j,l), and sign extraction.]
Fig. 5: Architecture of nodes: (a) variable node, and (b) check node.
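A reduced precision replica can be modeled by truncating each 8-bit two's complement input to its most significant bits before performing the same computation. The Python sketch below applies this idea to the variable node sum of (5) and combines the main and estimator outputs with rule (11). The truncation scheme, the threshold $\tau = (k+1)\,2^{8-b}$ (an upper bound on the estimation error for $k+1$ inputs and $b$ estimator bits), the input values, and the injected error are all assumptions for illustration; the original text does not specify them.

```python
def to_msbs(x, bits, width=8):
    """Keep the `bits` MSBs of a `width`-bit two's complement integer, at the original scale.

    Python's >> is an arithmetic (floor) shift, so negative values truncate toward -inf.
    """
    shift = width - bits
    return (x >> shift) << shift


def var_node_main(llr_ch, llr_cn):
    """Full-precision variable-node sum of Eq. (5) on 8-bit inputs."""
    return llr_ch + sum(llr_cn)


def var_node_estimate(llr_ch, llr_cn, bits):
    """Hypothetical reduced precision replica: the same sum on truncated inputs."""
    return to_msbs(llr_ch, bits) + sum(to_msbs(x, bits) for x in llr_cn)


def ant(y_a, y_e, tau):
    """Decision rule (11)."""
    return y_a if abs(y_a - y_e) < tau else y_e


def tau_for(bits, n_inputs, width=8):
    """Each truncation error lies in (-2^(width-bits), 0], so this strictly bounds |e|."""
    return n_inputs * (1 << (width - bits))


llr_ch, llr_cn = 23, [-37, 12, 50]   # hypothetical 8-bit LLRs
y_o = var_node_main(llr_ch, llr_cn)  # error-free main block output
y_a = y_o - 128                      # hypothetical timing error in a high-order bit
for bits in (4, 2):
    y_e = var_node_estimate(llr_ch, llr_cn, bits)
    tau = tau_for(bits, 1 + len(llr_cn))
    print(bits, y_o, y_e, ant(y_o, y_e, tau), ant(y_a, y_e, tau))
```

With a 4-bit replica the estimate is close enough that the erroneous main output fails the threshold test and is replaced by the estimate; with a 2-bit replica the threshold has to grow to cover the coarser estimation error, and the same error passes undetected. This is one way to see why a lower-precision estimator provides less protection.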
4. ENERGY, DELAY AND ERROR MODELING
4.1. Energy Model
We estimate the energy consumption of the LDPC decoder from its constituent blocks. The total energy of the LDPC decoder $E_{LDPC}$ is given by:

$$E_{LDPC} = N_{var}E_{var} + N_{check}E_{check} + N_{wire}E_{wire}, \tag{13}$$

where $N_{var}$, $N_{check}$, and $N_{wire}$ are the total numbers of variable nodes, check nodes, and interconnect wires, respectively, and $E_{var}$, $E_{check}$, and $E_{wire}$ are the energy consumptions of a single variable node, check node, and interconnect wire, respectively. To obtain $E_{var}$ and $E_{check}$, a single variable node and check node, as shown in Fig. 5, are implemented in Verilog and synthesized using a commercial 45 nm standard cell library. The synthesized netlist is then used to extract a SPICE netlist, which is used for circuit simulations. The energy vs. $V_{dd}$ and delay vs. $V_{dd}$ curves are shown in Fig. 6 for both the variable node and the check node.
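Equation (13) is a weighted sum of per-instance energies, and the savings reported in Section 5 are relative differences between two such totals. The Python sketch below evaluates both; every numerical value in the example (per-instance energies and the wire count) is hypothetical and is not taken from the simulations in this paper.

```python
import math


def ldpc_energy(n_var, e_var, n_check, e_check, n_wire, e_wire):
    """Eq. (13): total decoder energy as a sum over node and wire instances."""
    return n_var * e_var + n_check * e_check + n_wire * e_wire


def energy_savings(e_ref, e_new):
    """Fractional energy savings of e_new relative to e_ref."""
    return 1.0 - e_new / e_ref


# Hypothetical per-instance energies (J) and wire count for an (800, 400) code.
e_ref = ldpc_energy(800, 1.5e-12, 400, 5.0e-12, 4800, 4.0e-13)
e_low = ldpc_energy(800, 0.9e-12, 400, 3.0e-12, 4800, 2.4e-13)
print(f"E_ref = {e_ref:.3e} J, E_low = {e_low:.3e} J, "
      f"savings = {energy_savings(e_ref, e_low):.1%}")
```

In practice each per-instance energy is read from curves such as Figs. 6 and 7(b) at the chosen $V_{dd}$, with any estimator overhead added to $E_{var}$ and $E_{check}$.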
A 3-wire distributed RC network is used for the interconnect model, as shown in Fig. 7(a). Only coupling between adjacent wires is considered, and the energy consumed by the middle wire is averaged over all possible transitions. The wire was assumed to be routed on metal 4 with a length of 200 μm. The values for $R$, $C_C$, and $C_G$ were obtained from the design manual of a commercial 45 nm process. Figure 7(b) shows the energy consumption obtained through circuit simulations. As the wire delays were significantly shorter than the bit node and check node delays, the interconnect is assumed to be error free. The energy values obtained are comparable to those obtained from the bus energy consumption model in [16].

[Figure: energy (J) and delay (s) vs. supply voltage (V) for the check node and the variable node.]
Fig. 6: Energy consumption and delay curves obtained through circuit simulation of the variable and check node architectures in Fig. 5, synthesized in a commercial 45 nm process.
4.2. Error Modeling
To simulate input dependent timing errors, gate level simulations are performed using an HDL simulator. First, the gate delay is characterized with respect to supply voltage for basic gates, such as full adders and XOR gates, using circuit level simulators as in Section 4.1. Then a structural HDL implementation of the LDPC decoder is simulated via an HDL simulator using the pre-characterized delay values. By choosing the delay values that correspond to various supply voltages, the HDL simulation is effectively run at different voltages, and for $V_{dd} < V_{dd,crit}$, errors can be observed in the outputs. The complete simulation methodology is summarized in Fig. 8.
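Delay-based error injection can be illustrated on a single ripple-carry adder. The Python sketch below is a simplified stand-in for the gate level HDL flow, not a reproduction of it: the full adder delay table, the clock period, and the assumption that a late result leaves the previous output in the register are all hypothetical. Because the settling time of each addition depends on its carry propagation chain, the errors are input dependent, and more additions miss the clock edge as $V_{dd}$ is lowered.

```python
import random

# Hypothetical full-adder delay (s) at each supply voltage (V). In the flow of Fig. 8 these
# values would come from circuit-level characterization of the standard cells.
T_FA = {1.0: 20e-12, 0.9: 25e-12, 0.8: 33e-12, 0.7: 48e-12, 0.6: 80e-12}
WIDTH = 8
T_CLK = 170e-12  # hypothetical clock period; error free at 1.0 V since 8 * 20 ps < 170 ps


def rca_settle_time(a, b, width=WIDTH):
    """Input-dependent settling time of a ripple-carry adder, in full-adder delays.

    A stage's carry-out settles one delay after its carry-in when it propagates
    (a_i xor b_i = 1) and after one delay otherwise (generate or kill); each sum
    bit settles one delay after its carry-in.
    """
    t_carry, worst = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        worst = max(worst, t_carry + 1)
        t_carry = t_carry + 1 if ai ^ bi else 1
    return worst


def error_rate(pairs, t_fa, t_clk=T_CLK, width=WIDTH):
    """Fraction of clock cycles whose latched output differs from the correct sum."""
    mask = (1 << width) - 1
    prev, errors = 0, 0
    for a, b in pairs:
        correct = (a + b) & mask
        late = rca_settle_time(a, b, width) * t_fa > t_clk
        out = prev if late else correct  # simplification: a late result keeps the old value
        errors += out != correct
        prev = out
    return errors / len(pairs)


rng = random.Random(0)
pairs = [(rng.randrange(256), rng.randrange(256)) for _ in range(5000)]
for vdd in sorted(T_FA, reverse=True):
    print(f"Vdd = {vdd:.1f} V: p_e = {error_rate(pairs, T_FA[vdd]):.3f}")
```

The same structure scales to the decoder: only the delay table changes with $V_{dd}$, while the netlist and the input data determine which cycles violate timing.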
5. SIMULATION RESULTS
Simulations for multiple LDPC codes were performed at various $V_{dd}$ and SNR values. A randomly generated regular LDPC code is used. Due to space constraints, in this section we only show the results for a (800, 400) LDPC code, but we have observed that the same trends hold for the shorter (50, 25) and longer (1800, 900) codes as well.
[Figure: (a) three coupled wires, each modeled as a distributed RC line with series resistance R, ground capacitance C_G, and coupling capacitance C_C; (b) average energy (J) vs. supply voltage (V), model vs. SPICE.]
Fig. 7: Interconnect energy for a 200 μm wire: (a) distributed RC model, and (b) average energy vs. supply voltage curve.
[Figure: flow with blocks 45 nm CMOS PDK, circuit simulation, energy models, Verilog RTL, LDPC decoder architecture, delay-based error injection (voltage overscaling), and error resiliency techniques (SBP, ANT), producing BER and energy.]
Fig. 8: Simulation methodology.
[Figure: BER vs. SNR (dB) for the error free, no correction, SBP, ANT-4b, ANT-3b, and ANT-2b decoders.]
Fig. 9: BER vs. SNR plot of a (800, 400) LDPC code with $p_e = 0.2$.
Table 1: Error rate that can be tolerated at a given BER threshold.

| BER threshold (× error-free BER) | 2× | 3× | 4× | 5× |
|---|---|---|---|---|
| No correction | 0.0060 | 0.0097 | 0.0134 | 0.0171 |
| SBP | 0.0087 | 0.0114 | 0.0140 | 0.0167 |
| ANT-4b | 0.2873 | 0.3503 | 0.4134 | 0.4839 |
| ANT-2b | 0.0355 | 0.0513 | 0.0672 | 0.0830 |
| ANT-4b/SBP | 4.08 | 30.73 | 29.53 | 28.98 |
5.1. BER Performance
The BER performance of the LDPC code for an error rate of $p_e = 0.2$ is shown in Fig. 9. SBP works only up to an SNR of 1.8 dB and fails completely at SNR > 3 dB. ANT schemes are significantly more robust, with ANT-2b (ANT with a 2-bit estimator) breaking down at SNR > 7 dB, ANT-3b breaking down at SNR > 8 dB (not shown in the figure), and ANT-4b retaining performance close to that of the error free LDPC decoder for SNR as high as 10 dB.
5.2. Robustness
Figure 10 shows the change in BER as $V_{dd}$ is reduced. Even with no error correction, the LDPC decoder maintains its performance up to $p_e = 2 \times 10^{-3}$. However, its performance degrades rapidly at higher error rates. SBP tolerates higher error rates of up to $p_e = 7 \times 10^{-3}$, but ANT is much more powerful, achieving acceptable performance for up to $p_e = 0.2$. As a slight degradation in BER is tolerable in most systems (as opposed to an order-of-magnitude change), we have chosen several BER thresholds, 2 to 5 times the BER of the error free case, and found the tolerable error rate for each scheme. The results are summarized in Table 1. ANT shows up to 30.7× more robustness than SBP.
[Figure: BER vs. $p_e$ for the error free, no correction, SBP, ANT-4b, ANT-3b, and ANT-2b decoders, with the 2×, 3×, 4×, and 5× BER thresholds marked.]
Fig. 10: BER vs. $p_e$ graph at SNR = 5 dB.
[Figure: energy per bit (J) vs. BER for the no correction, SBP, ANT-4b, ANT-3b, and ANT-2b decoders, with the nominal energy, 45.7% savings, and 33.2% savings marked.]
Fig. 11: Energy vs. BER plot of a (800, 400) LDPC code at SNR = 5 dB.
5.3. Energy Savings
Energy savings of the various schemes are compared at the same BER performance. ANT schemes achieve the same BER performance at a significantly lower $V_{dd}$ and thus save energy. Fig. 11 shows that ANT-4b can achieve up to 45.7% energy savings compared to the error free conventional LDPC decoder, and up to 33.2% energy savings compared to the erroneous conventional LDPC decoder. This is in addition to the 30× enhancement in robustness.
6. CONCLUSION
In this paper we have applied SEC to LDPC decoders and achieved up to 30× enhancement in robustness and 45.7% energy savings compared to conventional LDPC decoders. Future work will include methods to reduce the overhead of SEC and further increase the energy savings. One possible method is to apply SEC to a subset of the nodes, while simpler error resiliency techniques, such as SBP, are applied to the remaining nodes. A hybrid scheme between ANT and SBP is also a possibility.
7. REFERENCES
[1] “Digital video broadcasting (DVB) (DVB-S2) EN 302 307 V1.2.1,” ETSI, Aug. 2009.
[2] “IEEE Std. 802.16e, IEEE standard for local and metropolitan area networks,” Mar. 2006.
[3] “IEEE 802.11n. Wireless LAN Medium Access Control and Physical Layer specifications: Enhancements for higher throughput. IEEE P802.11n/D1.0,” Mar. 2006.
[4] S. Hemati, A. Banihashemi, and C. Plett, “A 0.18-μm CMOS analog min-sum iterative decoder for a (32, 8) low-density parity-check (LDPC) code,” IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2531–2540, 2006.
[5] A. Darabiha, A. Chan Carusone, and F. Kschischang, “Power reduction techniques for LDPC decoders,” IEEE J. Solid-State Circuits, vol. 43, no. 8, pp. 1835–1845, 2008.
[6] K. Shimizu, N. Togawa, T. Ikenaga, and S. Goto, “Power-efficient LDPC code decoder architecture,” in Int. Symp. on Low Power Elect. and Design (ISLPED), Aug. 2007, pp. 359–362.
[7] J. Jin and C.-Y. Tsui, “An energy efficient layered decoding architecture for LDPC decoder,” IEEE Trans. VLSI Syst., vol. 18, no. 8, pp. 1185–1195, Aug. 2010.
[8] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “A self-tuning DVS processor using delay-error detection and correction,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 792–804, 2006.
[9] S. Das, C. Tokunaga, S. Pant, W. Ma, S. Kalaiselvan, K. Lai, D. Bull, and D. Blaauw, “Razor II: In situ error detection and correction for PVT and SER tolerance,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 32–48, 2009.
[10] K. Bowman, J. Tschanz, N. Kim, J. Lee, C. Wilkerson, S. Lu, T. Karnik, and V. De, “Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 49–63, 2009.
[11] R. Hegde and N. R. Shanbhag, “A voltage overscaled low-power digital filter IC,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 388–391, Feb. 2004.
[12] N. R. Shanbhag, R. A. Abdallah, R. Kumar, and D. L. Jones, “Stochastic computation,” in Proc. 47th Design Automation Conf. (DAC), 2010, pp. 859–864.
[13] E. Kim, D. Baker, S. Narayanan, D. Jones, and N. Shanbhag, “Low power and error resilient PN code acquisition filter via statistical error compensation,” in Proc. Custom Integ. Circuits Conf. (CICC), Sep. 2011.
[14] M. May, M. Alles, and N. Wehn, “A case study in reliability-aware design: a resilient LDPC code decoder,” in Proc. Conf. on Design, Automation and Test in Europe (DATE). ACM, 2008, pp. 456–461.
[15] R. Gallager, “Low-density parity-check codes,” IRE Trans. on Inf. Theory, vol. 8, no. 1, pp. 21–28, 1962.
[16] P. Sotiriadis and A. Chandrakasan, “A bus energy model for deep submicron technology,” IEEE Trans. VLSI Syst., vol. 10, no. 3, pp. 341–350, 2002.