[IEEE 2012 IEEE Workshop on Signal Processing Systems (SiPS) - Quebec City, QC, Canada...

ENERGY-EFFICIENT LDPC DECODERS BASED ON ERROR-RESILIENCY

Eric P. Kim and Naresh R. Shanbhag

University of Illinois at Urbana-Champaign

Coordinated Science Laboratory / Department of Electrical and Computer Engineering

1308 W. Main St., Urbana, IL 61801

[epkim2, shanbhag]@illinois.edu

ABSTRACT

Low density parity check (LDPC) codes are used in various communication standards. However, LDPC decoders are complex and power hungry. In this paper, we present an energy-efficient LDPC decoder based on statistical error compensation (SEC). Three LDPC codes of different sizes, (50, 25), (800, 400), and (1800, 900), were implemented with 5 iterations/block. Circuit simulations in a commercial 45 nm process show that the SEC-based LDPC decoder can operate at a supply voltage up to 38% below the nominal voltage and tolerate up to 30× more errors over an SNR range of 3 dB to 8 dB, while maintaining less than 3× degradation in BER. This is equivalent to energy savings of 45.7% compared to conventional LDPC decoders, and 33.2% compared to a sign bit protected LDPC decoder.

Index Terms— LDPC, Error resiliency, low power

1. INTRODUCTION

Low density parity check (LDPC) codes offer excellent error correction performance and have been employed in communication systems such as 802.11 Wi-Fi and DVB-S2 satellite transmission of digital television, and are considered for many 4G systems including WiMAX (IEEE Std 802.16e) [1, 2, 3]. However, the decoding complexity of LDPC codes is quite large, and low power LDPC decoders are required to satisfy the power constraints of wireless handsets. Much work has been done on low power LDPC decoders. Analog decoder architectures have been proposed for short length codes [4]; however, scaling the code length beyond 250 will be challenging due to device mismatch and buffering requirements. Digital low power LDPC decoder architectures mostly focus on reducing the decoding complexity through early termination or approximation [5, 6, 7].

The recent trend in energy efficient design has been to focus on the trade-off between error resiliency and energy efficiency [8, 9, 10, 11, 12, 13]. Previous work on error resilient LDPC decoders has protected the sign bit of decoding messages (see Section 3.1) or employed triple modular redundancy (TMR) [14]. However, the error resiliency provided by sign bit protection (SBP) alone is limited at high error rates (more than 3% errors per clock cycle), and TMR has very high overhead.

In this paper, we present an energy-efficient LDPC decoder based on statistical error compensation (SEC) [11, 12, 13] and evaluate the energy efficiency vs. robustness trade-offs in a 45 nm process technology. SEC techniques introduce resiliency by exploiting knowledge of the statistics of the data and the errors to compensate for the errors. This is in contrast to microarchitectural techniques [8, 9] that employ local error detection and global correction via replay and can compensate for an error rate $p_e$ (percentage of clock cycles in which the output is in error) of less than 0.1%. SEC techniques have been shown to successfully compensate for $p_e$ up to 86% [13] while demonstrating measured energy savings of 3.3× to 5.8×. This energy efficiency is obtained by trading off reliability for energy savings via voltage overscaling (VOS). VOS induces timing errors by reducing the supply voltage $V_{dd}$ below the critical voltage $V_{dd,crit}$, the lowest voltage with error free operation. The error rate $p_e$ increases as $V_{dd}$ is reduced further below $V_{dd,crit}$. Figure 1 shows the energy vs. $V_{dd}$ for a bit node, a block used in decoding LDPC codes, for a conventional error-free design and a VOS design that targets an error rate of 70%. If the VOS induced errors are fully compensated for, significant energy savings (up to 70%) can be achieved.

The paper is organized as follows. Section 2 provides background information on the LDPC codes and SEC techniques used in this paper. Section 3 describes the LDPC decoder architecture in detail. The simulation setup, including energy and error modeling, is presented in Section 4. Simulation results are given in Section 5, and Section 6 concludes the paper.

2. BACKGROUND

2.1. Low density parity check codes

LDPC codes are linear block codes based on a sparse parity check matrix $H$. Let $H$ be a binary $r \times n$ matrix. In coding theory, any vector $c$ of length $n$ is a valid codeword if $Hc^T = 0$. This parity check matrix gives rise to a bipartite factor graph with $r$ check nodes and $n$ variable nodes. The graph is connected such that if entry $(i, j)$ of $H$ is 1, then the $i$th check node is connected to the $j$th variable node. An example parity check matrix and its factor graph are depicted in Fig. 2. Each column of $H$ represents a code bit (variable node), while each row represents a parity check constraint (check node).

2012 IEEE Workshop on Signal Processing Systems

978-0-7695-4856-2/12 $26.00 © 2012 IEEE

DOI 10.1109/SiPS.2012.60

[Figure 1 plot: energy (J) and delay (s) vs. supply voltage (V), 0.4 V to 1 V; curves: conventional and voltage overscaled (VOS); annotation: 70% energy savings.]

Fig. 1: Supply voltage vs. energy and delay of an LDPC bit node (Fig. 6(a)) in a commercial 45 nm process. By voltage overscaling up to an error rate of 70%, the same performance can be achieved with 70% less energy.

[Figure 2: (a) parity check matrix, shown below; (b) bipartite graph connecting check nodes and variable nodes.]

$$H = \begin{pmatrix} 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \end{pmatrix}$$

Fig. 2: Example LDPC code: (a) parity check matrix, and (b) its bipartite decoding graph.
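As an illustration of the codeword condition and the graph construction, the following Python sketch uses the matrix of Fig. 2(a), read row by row as a 4 × 8 matrix (an assumption about the figure layout). The helper names `is_codeword` and `factor_graph` are hypothetical and are not part of the decoder implementation described in this paper.

```python
# Parity check matrix of Fig. 2(a); the 4 x 8 row-by-row layout is an assumption.
H = [
    [0, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
]

def is_codeword(H, c):
    """Return True if H c^T = 0 over GF(2), i.e. every parity check is satisfied."""
    return all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)

def factor_graph(H):
    """Build adjacency lists of the bipartite graph.

    V[j] lists the variable nodes connected to check node j (row j of H),
    C[i] lists the check nodes connected to variable node i (column i of H).
    """
    V = [[i for i, h in enumerate(row) if h] for row in H]
    C = [[j for j, row in enumerate(H) if row[i]] for i in range(len(H[0]))]
    return V, C

V, C = factor_graph(H)
valid = is_codeword(H, [1, 0, 0, 1, 0, 1, 0, 1])    # satisfies all four checks
invalid = is_codeword(H, [1, 0, 0, 1, 0, 1, 0, 0])  # last bit flipped
```

With this matrix every check node has four neighbors and every variable node has two, so the example code is regular.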

The following notation will be used: $P_i$ is the probability $\Pr(c_i = 1 \mid y_i)$, $q_{ij}$ is the message sent from variable node $v_i$ to check node $c_j$, and $r_{ji}$ is the message sent from check node $c_j$ to variable node $v_i$. Each of $q_{ij}$ and $r_{ji}$ is a pair of messages giving the belief that variable $v_i$ is 0 and 1, respectively. Thus, $q_{ij}(0) = 1 - q_{ij}(1)$, and likewise for $r_{ji}$. The check nodes then compute the probability that an even number of 1's is observed among the other connected variables through the following equations:

$$r_{ji}(0) = \frac{1}{2} + \frac{1}{2} \prod_{i' \in V_j \setminus i} \left(1 - 2 q_{i'j}(1)\right) \tag{1}$$

$$r_{ji}(1) = 1 - r_{ji}(0), \tag{2}$$

where $V_j \setminus i$ denotes all the variable nodes connected to check node $c_j$, excluding variable node $v_i$. This follows directly from the proof by Gallager [15] that, for a sequence of $M$ independent binary digits $a_i$ with probability $p_i = \Pr(a_i = 1)$, the probability that the sequence contains an even number of 1's is $\frac{1}{2} + \frac{1}{2} \prod_{i=1}^{M} (1 - 2p_i)$. The variable node then computes its message by:

$$q_{ij}(0) = (1 - P_i) \prod_{j' \in C_i \setminus j} r_{j'i}(0) \tag{3}$$

$$q_{ij}(1) = P_i \prod_{j' \in C_i \setminus j} r_{j'i}(1). \tag{4}$$
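Equations (1) to (4) can be checked numerically. The sketch below compares Gallager's closed form for the even-parity probability against a brute-force sum over all bit patterns, and then applies it as the check node and variable node updates. The function names are hypothetical, and the final normalization in `variable_to_check` is an added assumption so that the two beliefs sum to one, as required by $q_{ij}(0) = 1 - q_{ij}(1)$.

```python
from itertools import product
from math import prod

def even_parity_prob(p):
    """Closed form: probability that independent bits with Pr(a_i = 1) = p[i]
    contain an even number of 1's."""
    return 0.5 + 0.5 * prod(1.0 - 2.0 * pi for pi in p)

def even_parity_prob_bruteforce(p):
    """Same probability, summed over all 2^M bit patterns of even weight."""
    total = 0.0
    for bits in product((0, 1), repeat=len(p)):
        if sum(bits) % 2 == 0:
            total += prod(pi if b else 1.0 - pi for pi, b in zip(p, bits))
    return total

def check_to_variable(q1_others):
    """Eqs. (1)-(2): (r_ji(0), r_ji(1)) from q_{i'j}(1) of the other variable nodes."""
    r0 = even_parity_prob(q1_others)
    return r0, 1.0 - r0

def variable_to_check(P_i, r_others):
    """Eqs. (3)-(4) for a list of (r_{j'i}(0), r_{j'i}(1)) pairs from the other
    check nodes, followed by a normalization (assumption, see lead-in)."""
    q0 = (1.0 - P_i) * prod(r0 for r0, _ in r_others)
    q1 = P_i * prod(r1 for _, r1 in r_others)
    s = q0 + q1
    return q0 / s, q1 / s

p = [0.1, 0.3, 0.45]                     # hypothetical bit probabilities
closed = even_parity_prob(p)
brute = even_parity_prob_bruteforce(p)   # agrees with the closed form
```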

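Equations (3) and (4) simply multiply all the beliefs from the other check nodes that the variable is 0 to obtain the final belief of the variable being 0, and likewise for 1; here $C_i \setminus j$ denotes all the check nodes connected to variable node $v_i$, excluding check node $c_j$. In a more practical implementation of LDPC decoding, instead of tracking two beliefs, only one message is passed: the log likelihood ratio (LLR), which is the logarithm of the ratio of the two messages, $\log\frac{q_{ij}(1)}{q_{ij}(0)}$ and $\log\frac{r_{ji}(1)}{r_{ji}(0)}$. The update equations can then be shown to be: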

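$$m_{ij} = \log\frac{q_{ij}(1)}{q_{ij}(0)} = \log\left(\frac{P_i}{1 - P_i}\right) + \sum_{j' \in C_i \setminus j} n_{j'i} \tag{5}$$

$$n_{ji} = \log\frac{r_{ji}(1)}{r_{ji}(0)} = \log\frac{1 + \prod_{i' \in V_j \setminus i} \tanh(m_{i'j}/2)}{1 - \prod_{i' \in V_j \setminus i} \tanh(m_{i'j}/2)} \tag{6}$$

$$n_{ji} = \psi^{-1}\left(\sum_{i' \in V_j \setminus i} \psi(m_{i'j})\right) \tag{7}$$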

where $m_{ij}$ and $n_{ji}$ are the LLRs of the variable node to check node and check node to variable node messages, respectively, and $\psi(x) = -\log\tanh(|x|/2)$. Using the max-log approximation, the check node to variable node message can be further approximated as:

$$n_{ji} \approx \left(\min_{i' \in V_j \setminus i} |m_{i'j}|\right) \prod_{i' \in V_j \setminus i} \mathrm{sgn}(m_{i'j}). \tag{8}$$
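To make the relation between (6), (7), and (8) concrete, the following sketch evaluates all three for one set of incoming messages. It rests on two assumptions: (7) is read as giving the magnitude of $n_{ji}$, with the sign taken as the product of the signs of the incoming messages as in (8), and all incoming LLRs are nonzero and finite. The function names and the example values are hypothetical.

```python
from math import copysign, log, prod, tanh

def psi(x):
    """psi(x) = -log(tanh(|x| / 2)); for x > 0 this function is its own inverse."""
    return -log(tanh(abs(x) / 2.0))

def sign_product(msgs):
    """Product of the signs of the incoming messages (+1.0 or -1.0)."""
    return prod(copysign(1.0, m) for m in msgs)

def check_exact(msgs):
    """Eq. (6): log((1 + t) / (1 - t)), with t the product of tanh(m / 2)."""
    t = prod(tanh(m / 2.0) for m in msgs)
    return log((1.0 + t) / (1.0 - t))

def check_psi(msgs):
    """Eq. (7) for the magnitude, combined with the sign product of eq. (8)."""
    return sign_product(msgs) * psi(sum(psi(m) for m in msgs))

def check_min_sum(msgs):
    """Eq. (8): max-log approximation (minimum magnitude times sign product)."""
    return sign_product(msgs) * min(abs(m) for m in msgs)

incoming = [1.2, -0.7, 2.5]          # hypothetical LLRs m_{i'j}
exact = check_exact(incoming)        # about -0.309
via_psi = check_psi(incoming)        # same value as `exact`
approx = check_min_sum(incoming)     # -0.7
```

The approximation keeps the sign of the exact message but overestimates its magnitude, since the product of the tanh terms can never exceed the smallest of them.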

2.2. Statistical Error Compensation

A high level depiction of SEC is given in Fig. 3(a). SEC utilizes the statistics of errors to perform detection and estimation to compensate for errors. It also incorporates system level statistical metrics, such as signal-to-noise ratio (SNR) or bit error rate (BER). SEC operates on multiple observations, where each observation is generated by erroneous hardware, an error free estimator, or an erroneous estimator. Each observation $y_i$ is a corrupted version of the correct output $y_o$, i.e., $y_i = y_o + \eta_i + e_i$, where $\eta_i$ denotes hardware errors and $e_i$ denotes estimation errors. Based on these observations, detection and estimation techniques are employed in conjunction with the statistical information of $\eta_i$ and $e_i$ to obtain the most likely correct output. Errors that have a large effect on the system level performance are detected and compensated, while errors with minimal effect on performance are considered benign and are permitted.

[Figure 3: (a) observations y_1, y_2, ..., y_N passed to an estimator/detector that produces the corrected output; (b) main block M and estimator M-est with a threshold comparison; (c) hardware error and estimation error distributions.]

Fig. 3: Block diagram: (a) statistical error compensation, (b) algorithmic noise tolerance, and (c) error distributions.

2.2.1. Algorithmic noise tolerance

Statistical error compensation (SEC) in the form of algorithmic noise-tolerance (ANT) [11, 12], shown in Fig. 3(b), incorporates a main block and an estimator. The main block is permitted to make hardware/timing errors, but the estimator is not. The estimator is a low-complexity block (typically 5% to 20% of the main block complexity) generating a statistical estimate of the correct main block output, i.e.,

$$y_a = y_o + \eta \tag{9}$$

$$y_e = y_o + e \tag{10}$$

where $y_a$ is the actual main block output, $y_o$ is the error-free main block output, $\eta$ is the hardware error, $y_e$ is the estimator output, and $e$ is the estimation error. Note that the estimator exhibits estimation error $e$ because it is simpler than the main block and does not perform exact computation. ANT exploits the difference in the statistics of $\eta$ and $e$ (see Fig. 3(c)). To enhance robustness, it is necessary that, when $\eta \neq 0$, $\eta$ be large compared to $e$. In addition, the probability of the event $\eta \neq 0$ must be small. The final/corrected output $y$ of an ANT system is obtained via the following decision rule:

$$y = \begin{cases} y_a, & \text{if } |y_a - y_e| < \tau \\ y_e, & \text{otherwise} \end{cases} \tag{11}$$

where $\tau$ is an application-dependent parameter chosen to maximize the performance of ANT. Under the conditions outlined above, it is possible to show that

$$SNR_{uc} \ll SNR_e \ll SNR_{ANT} \approx SNR_o \tag{12}$$

where $SNR_{uc}$, $SNR_e$, $SNR_{ANT}$, and $SNR_o$ are the signal-to-noise ratios of the uncorrected main block ($\eta$ dominates), the estimator ($e$ dominates), the ANT system, and the error-free main block (ideal), respectively. Thus, ANT detects and corrects errors approximately, but does so in a manner that satisfies an application-level performance specification (SNR). Several low-overhead estimation techniques have been proposed that exploit data correlation, system architecture, and statistical signal processing [12].

[Figure 4 diagram: received signal y, variable nodes and check nodes connected through registers (D), decoded bits b.]

Fig. 4: High level block diagram of the LDPC decoder
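The decision rule (11) is small enough to state directly in code. The following sketch applies it to a hypothetical 8 bit example in which a timing error flips a high-order bit of the main block output; the values of `y_o`, `y_e`, and `tau` are illustrative assumptions, not values used in our decoder.

```python
def ant_output(y_a, y_e, tau):
    """Decision rule (11): keep the main block output y_a unless it differs
    from the estimator output y_e by tau or more."""
    return y_a if abs(y_a - y_e) < tau else y_e

y_o = 37                 # error-free main block output (hypothetical)
y_e = y_o - 3            # estimator output with estimation error e = -3
tau = 8                  # assumed threshold, chosen larger than the expected |e|

no_error = ant_output(y_o, y_e, tau)          # eta = 0: main block output is kept
with_error = ant_output(y_o + 64, y_e, tau)   # eta = 64: estimator output is used
```

In the second case the residual error is the small estimation error $e = -3$ instead of the large hardware error $\eta = 64$, which is the behavior summarized by (12).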

3. LDPC DECODER ARCHITECTURE

High throughput LDPC decoders require parallel computation of messages. We implement a fully parallel LDPC decoder, with the computation of variable and check nodes multiplexed in time, i.e., in one clock cycle the variable nodes operate in parallel, and in the next clock cycle the check nodes operate in parallel. The high level architecture and the node architectures are shown in Fig. 4 and Fig. 5, respectively.

The variable node implements (5). It takes the prior information given by the received signal $y$ and the incoming messages from the check nodes, and sums them up. A hard decision block extracts the sign bit of the computed messages and outputs them as the decoded bits $b$. The check node implements (7). The $\psi(\cdot)$ and $\psi^{-1}(\cdot)$ functions are implemented as a table lookup via a piecewise linear approximation.
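A floating point sketch of the two node datapaths described above is given below. The variable node forms the total sum once and subtracts each incoming message to obtain the outgoing ones, which is equivalent to (5); the hard decision assumes the LLR convention of (5), so a positive total maps to bit 1. The lookup breakpoints are hypothetical, and the 8 bit wordlength of the hardware is not modeled.

```python
from bisect import bisect_right
from math import log, tanh

def variable_node(llr_ch, llr_cn):
    """Eq. (5): outgoing message k is the channel LLR plus all incoming check
    node messages except the k-th one. Also returns the hard decision."""
    total = llr_ch + sum(llr_cn)
    llr_bn = [total - n for n in llr_cn]
    b_hat = 1 if total > 0 else 0        # sign of the total LLR (assumed mapping)
    return llr_bn, b_hat

def psi(x):
    """Exact psi(x) = -log(tanh(|x| / 2)), used to fill the table."""
    return -log(tanh(abs(x) / 2.0))

BREAKPOINTS = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]   # hypothetical segment ends
TABLE = [psi(x) for x in BREAKPOINTS]

def psi_pwl(x):
    """Piecewise linear approximation of psi, clamped to the table range."""
    x = min(max(abs(x), BREAKPOINTS[0]), BREAKPOINTS[-1])
    k = min(bisect_right(BREAKPOINTS, x), len(BREAKPOINTS) - 1)
    x0, x1 = BREAKPOINTS[k - 1], BREAKPOINTS[k]
    y0, y1 = TABLE[k - 1], TABLE[k]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

messages, bit = variable_node(-1.0, [0.5, 2.0, -0.25])
```

Because $\psi$ is convex for positive arguments, the linear segments lie on or above the exact curve, so the approximation error is one sided and can be bounded per segment when choosing breakpoints.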

3.1. Error Resiliency

The original LDPC decoder is inherently robust to errors due to its iterative message passing decoding algorithm. However, errors in the sign bit have been shown to be detrimental. In our work, we use an LDPC decoder with SBP as our baseline. The sign bit is computed separately via the max-log approximation in (8). The critical path for the sign bit computation is significantly shorter than that of the magnitude computation, so the sign bit will experience errors only at significant VOS. In the proposed error resilient LDPC decoder, SEC is added to the variable node and check node via ANT. The estimator is a reduced precision replica of the original variable and check node. For the 8 bit precision main block, three estimators are employed with 2, 3, and 4 bits of precision, respectively.

[Figure 5 diagram: (a) variable node with 8 bit inputs LLR_ch and LLR_cn(·,1), ..., LLR_cn(·,k), subtractors producing LLR_bn(i,1), ..., LLR_bn(i,k), and a sign extraction block producing b_hat; (b) check node with 8 bit inputs LLR_bn(·,1), ..., LLR_bn(·,l), a sign extraction block, and outputs LLR_cn(j,1), ..., LLR_cn(j,l).]

Fig. 5: Architecture of nodes: (a) variable node, and (b) check node.
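One way to picture a reduced precision replica is to let the estimator see only the most significant bits of each 8 bit message. The sketch below does this for the variable node sum; truncating the inputs in this way is an assumption made for illustration, and the function names and message values are hypothetical.

```python
def reduce_precision(x, bits, full_bits=8):
    """Keep the `bits` most significant bits of a signed `full_bits` integer;
    the discarded low-order bits are replaced by zeros."""
    shift = full_bits - bits
    return (x >> shift) << shift     # arithmetic shift, so negative values round down

def estimator_sum(llrs, bits):
    """Variable node sum computed from reduced precision inputs."""
    return sum(reduce_precision(x, bits) for x in llrs)

llrs = [23, -14, 51]                 # hypothetical 8 bit signed messages
exact = sum(llrs)                    # 60
est_4b = estimator_sum(llrs, 4)      # 48, estimation error e = -12
est_2b = estimator_sum(llrs, 2)      # -64, estimation error e = -124
```

With this truncation each input contributes an error between $-(2^{s} - 1)$ and 0, where $s$ is the number of discarded bits, so the estimation error grows quickly as precision is reduced. This is consistent with the trend that the lower precision estimators provide less robustness.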

4. ENERGY, DELAY AND ERROR MODELING

4.1. Energy Model

We estimate the energy consumption of the LDPC decoder from its constituent blocks. The total energy of the LDPC decoder $E_{LDPC}$ is given by:

$$E_{LDPC} = N_{var}E_{var} + N_{check}E_{check} + N_{wire}E_{wire}, \tag{13}$$

where $N_{var}$, $N_{check}$, and $N_{wire}$ are the total numbers of variable nodes, check nodes, and interconnect wires, respectively, and $E_{var}$, $E_{check}$, and $E_{wire}$ are the energy consumptions of a single variable node, check node, and interconnect wire, respectively. To obtain $E_{var}$ and $E_{check}$, a single variable node and a single check node, as shown in Fig. 5, are implemented in Verilog and synthesized using a commercial 45 nm standard cell library. The synthesized netlist is then used to extract a SPICE netlist, which is used for circuit simulations. The energy vs. $V_{dd}$ and delay vs. $V_{dd}$ plots are shown in Fig. 6 for both the variable node and the check node.
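The energy model (13) and the savings figure derived from it can be written as two small helpers. This is a sketch only: the function names are hypothetical, and the numbers in the usage lines are placeholders chosen for easy arithmetic, not simulated energies.

```python
def ldpc_energy(n_var, n_check, n_wire, e_var, e_check, e_wire):
    """Eq. (13): total decoder energy from per-block energies and block counts."""
    return n_var * e_var + n_check * e_check + n_wire * e_wire

def energy_savings(e_reference, e_new):
    """Fractional energy savings of a design relative to a reference design."""
    return 1.0 - e_new / e_reference

# Placeholder values, for illustration only.
e_nominal = ldpc_energy(8, 4, 16, e_var=1.0, e_check=2.0, e_wire=0.5)
e_scaled = ldpc_energy(8, 4, 16, e_var=0.5, e_check=1.0, e_wire=0.25)
savings = energy_savings(e_nominal, e_scaled)
```

In practice the per-block energies would be read from the energy vs. $V_{dd}$ curves at the operating voltage of each design.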

A 3-wire distributed RC network is used for the interconnect model, as shown in Fig. 7(a). Only coupling between adjacent wires is considered, and the energy consumed by the middle wire is averaged over all possible transitions. The wire was assumed to be routed on metal 4 with a length of 200 μm. The values for $R$, $C_C$, and $C_G$ were obtained from the design manual of a commercial 45 nm process. Figure 7(b) shows the energy consumption obtained through circuit simulations. As the wire delays were significantly shorter than the bit node and check node delays, the interconnect is assumed to be error free. The energy values obtained are comparable to those obtained from the bus energy consumption model in [16].

[Figure 6 plots: energy (J) and delay (s) vs. supply voltage (V), 0.4 V to 1 V, for the check node and the variable node.]

Fig. 6: Energy consumption and delay curves obtained through circuit simulation of the variable and check node architectures in Fig. 5, synthesized in a commercial 45 nm process.

4.2. Error Modeling

To simulate input dependent timing errors, gate level simulations are performed using an HDL simulator. First, the gate delay is characterized with respect to supply voltage for basic gates, such as a full adder and an XOR gate, using circuit level simulations as in Section 4.1. Then a structural HDL implementation of the LDPC decoder is simulated via an HDL simulator using the pre-characterized delay values. By choosing the delay values that correspond to various supply voltages, the HDL simulation is effectively run at different voltages, and for $V_{dd} < V_{dd,crit}$, errors can be observed in the outputs. The complete simulation methodology is summarized in Fig. 8.
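The input dependence of timing errors can be illustrated with a much simpler stand-in than a gate level HDL simulation. The sketch below models a ripple-carry adder in which a carry can propagate through at most `max_chain` stages within the clock period; `max_chain` is a hypothetical parameter that would shrink as the full adder delay grows at lower $V_{dd}$. The model ignores the previous state of the circuit, so it is a simplification and not the methodology of Fig. 8.

```python
import random

def ripple_add(a, b, width, max_chain):
    """Add two unsigned integers, letting each carry ripple through at most
    `max_chain` bit positions (longer carry chains are cut off)."""
    result = 0
    for i in range(width):
        lo = max(0, i - max_chain)
        window = (1 << (i - lo)) - 1
        # Carry into bit i, computed from bits lo..i-1 only.
        carry = (((a >> lo) & window) + ((b >> lo) & window)) >> (i - lo)
        bit = ((a >> i) ^ (b >> i) ^ carry) & 1
        result |= bit << i
    return result

def error_rate(width, max_chain, trials=2000, seed=0):
    """Fraction of random additions whose truncated-carry result is wrong."""
    rng = random.Random(seed)
    mask = (1 << width) - 1
    errors = 0
    for _ in range(trials):
        a, b = rng.getrandbits(width), rng.getrandbits(width)
        if ripple_add(a, b, width, max_chain) != (a + b) & mask:
            errors += 1
    return errors / trials

p_nominal = error_rate(8, 8)     # full carry chain fits in the clock period
p_overscaled = error_rate(8, 2)  # only short carry chains complete in time
```

Only operand pairs that excite a long carry chain produce an error, and the affected bits tend to be high-order bits, which matches the ANT assumption that hardware errors are large when they occur.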

5. SIMULATION RESULTS

Simulations for multiple LDPC codes were performed at various $V_{dd}$ and SNR. A randomly generated regular LDPC code is used. In this section we only show the results for the (800, 400) LDPC code due to space constraints, but we have observed that the trends hold for the shorter (50, 25) and longer (1800, 900) codes as well.

[Figure 7: (a) three coupled wires, each modeled as a distributed RC line with series resistances R, coupling capacitances C_C, and ground capacitances C_G; (b) energy (J) vs. supply voltage (V), 0.6 V to 1 V; curves: model and SPICE.]

Fig. 7: Interconnect energy for a 200 μm wire: (a) distributed RC model, and (b) average energy vs. supply voltage curve.

[Figure 8 flow: Verilog RTL of the LDPC decoder architecture; circuit simulation with a 45 nm CMOS PDK; energy models; delay-based error injection (voltage overscaling); error resiliency techniques (SBP, ANT); outputs: BER and energy.]

Fig. 8: Simulation methodology.

[Figure 9 plot: BER vs. SNR (dB), 0 dB to 7 dB; curves: error free, no correction, SBP, ANT-4b, ANT-3b, ANT-2b.]

Fig. 9: BER vs. SNR plot of a (800, 400) LDPC code with $p_e = 0.2$.

Table 1: Error rate that can be tolerated at a given BER threshold.

BER threshold     2×       3×       4×       5×
No correction     0.0060   0.0097   0.0134   0.0171
SBP               0.0087   0.0114   0.0140   0.0167
ANT-4b            0.2873   0.3503   0.4134   0.4839
ANT-2b            0.0355   0.0513   0.0672   0.0830
(ANT-4b/SBP)      4.08     30.73    29.53    28.98

5.1. BER Performance

The BER performance of the LDPC code for an error rate of $p_e = 0.2$ is shown in Fig. 9. It can be seen that SBP works only up to an SNR of 1.8 dB and fails completely at SNR > 3 dB. ANT schemes are significantly more robust, with ANT-2b (ANT with a 2 bit estimator precision) breaking down at SNR > 7 dB, ANT-3b breaking down at SNR > 8 dB (not shown in the figure), and ANT-4b retaining performance close to that of the error free LDPC decoder for SNR as high as 10 dB.

5.2. Robustness

Figure 10 shows the change in BER as $V_{dd}$ is reduced, i.e., as $p_e$ increases. It can be seen that even with no error correction, the LDPC decoder maintains performance up to $p_e = 2 \times 10^{-3}$. However, its performance degrades rapidly when subject to higher error rates. SBP is able to tolerate higher error rates of up to $p_e = 7 \times 10^{-3}$, but ANT is much more powerful, achieving acceptable performance for up to $p_e = 0.2$. As a slight degradation in BER is tolerable in most systems (as compared to an order of magnitude change), we have chosen several BER thresholds, 2 to 5 times the BER of the error free case, and found the tolerable error rate for each scheme. The results are summarized in Table 1. It can be seen that ANT shows up to 30.7× more robustness than SBP.
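A tolerable error rate of the kind listed in Table 1 can be read off a sampled BER vs. $p_e$ curve by interpolation. The sketch below is one possible way to do this, assuming interpolation on log-log axes and a BER that does not decrease with $p_e$; the function name and the synthetic curve are hypothetical and are not the data of Fig. 10.

```python
from math import exp, log

def tolerable_error_rate(pe, ber, ber_error_free, factor):
    """Largest p_e at which BER stays within factor * ber_error_free.

    `pe` is increasing and `ber` is non-decreasing; the crossing point is
    interpolated linearly on log-log axes between neighboring samples.
    """
    limit = factor * ber_error_free
    if ber[0] > limit:
        return None                      # threshold exceeded at the smallest p_e
    for k in range(1, len(pe)):
        if ber[k] > limit:
            x0, x1 = log(pe[k - 1]), log(pe[k])
            y0, y1 = log(ber[k - 1]), log(ber[k])
            return exp(x0 + (x1 - x0) * (log(limit) - y0) / (y1 - y0))
    return pe[-1]                        # threshold not exceeded in the sampled range

# Synthetic curve with BER = 0.1 * p_e, for illustration only.
pe_samples = [1e-3, 1e-2, 1e-1]
ber_samples = [1e-4, 1e-3, 1e-2]
pe_2x = tolerable_error_rate(pe_samples, ber_samples, 1e-4, 2)   # 2e-3
```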

[Figure 10 plot: BER vs. $p_e$, log-log axes; curves: error free, no correction, SBP, ANT-4b, ANT-3b, ANT-2b; horizontal lines mark the 2×, 3×, 4×, and 5× BER thresholds.]

Fig. 10: BER vs. $p_e$ graph at SNR = 5 dB.

[Figure 11 plot: energy/bit (J) vs. BER; curves: no correction, SBP, ANT-4b, ANT-3b, ANT-2b; annotations: nominal energy, 45.7% savings, 33.2% savings.]

Fig. 11: Energy vs. BER plot of a (800, 400) LDPC code at SNR = 5 dB.

5.3. Energy Savings

The energy savings of the various schemes are compared at the same BER performance. ANT schemes are able to achieve the same BER performance at a significantly lower $V_{dd}$, which results in energy savings. Fig. 11 shows that ANT-4b can achieve up to 45.7% energy savings compared to the error free conventional LDPC decoder, and up to 33.2% energy savings compared to the erroneous conventional LDPC decoder. This is in addition to the 30× enhanced robustness.

6. CONCLUSION

In this paper we have applied SEC to LDPC decoders and achieved up to 30× enhancement in robustness and 45.7% energy savings compared to conventional LDPC decoders. Future work will include methods to reduce the overhead of SEC and further increase the energy savings. One possible method is to apply SEC to a subset of the nodes, while simpler error resiliency techniques, such as SBP, are applied to the remaining nodes. A hybrid scheme between ANT and SBP is also a possibility.

7. REFERENCES

[1] “Digital video broadcasting (DVB) (DVB-S2) EN 302 307 V1.2.1,” ETSI, Aug. 2009.

[2] “IEEE Std. 802.16e, IEEE standard for local and metropolitan area networks,” Mar. 2006.

[3] “IEEE 802.11n. Wireless LAN Medium Access Control and Physical Layer specifications: Enhancements for higher Throughput. IEEE P802.16n/D1.0,” Mar. 2006.

[4] S. Hemati, A. Banihashemi, and C. Plett, “A 0.18-μm CMOS analog min-sum iterative decoder for a (32, 8) low-density parity-check (LDPC) code,” IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2531–2540, 2006.

[5] A. Darabiha, A. Chan Carusone, and F. Kschischang, “Power reduction techniques for LDPC decoders,” IEEE J. Solid-State Circuits, vol. 43, no. 8, pp. 1835–1845, 2008.

[6] K. Shimizu, N. Togawa, T. Ikenaga, and S. Goto, “Power-efficient LDPC code decoder architecture,” in Int. Symp. on Low Power Elect. and Design (ISLPED), Aug. 2007, pp. 359–362.

[7] J. Jin and C.-Y. Tsui, “An energy efficient layered decoding architecture for LDPC decoder,” IEEE Trans. VLSI Syst., vol. 18, no. 8, pp. 1185–1195, Aug. 2010.

[8] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “A self-tuning DVS processor using delay-error detection and correction,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 792–804, 2006.

[9] S. Das, C. Tokunaga, S. Pant, W. Ma, S. Kalaiselvan, K. Lai, D. Bull, and D. Blaauw, “Razor II: In situ error detection and correction for PVT and SER tolerance,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 32–48, 2009.

[10] K. Bowman, J. Tschanz, N. Kim, J. Lee, C. Wilkerson, S. Lu, T. Karnik, and V. De, “Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance,” IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 49–63, 2009.

[11] R. Hegde and N. R. Shanbhag, “A voltage overscaled low-power digital filter IC,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 388–391, Feb. 2004.

[12] N. R. Shanbhag, R. A. Abdallah, R. Kumar, and D. L. Jones, “Stochastic computation,” in Proc. 47th Design Automation Conf. (DAC), 2010, pp. 859–864.

[13] E. Kim, D. Baker, S. Narayanan, D. Jones, and N. Shanbhag, “Low power and error resilient PN code acquisition filter via statistical error compensation,” in Proc. Custom Integ. Circuits Conf. (CICC), Sep. 2011.

[14] M. May, M. Alles, and N. Wehn, “A case study in reliability-aware design: a resilient LDPC code decoder,” in Proc. Conf. on Design, Automation and Test in Europe (DATE). ACM, 2008, pp. 456–461.

[15] R. Gallager, “Low-density parity-check codes,” IRE Trans. on Inf. Theory, vol. 8, no. 1, pp. 21–28, 1962.

[16] P. Sotiriadis and A. Chandrakasan, “A bus energy model for deep submicron technology,” IEEE Trans. VLSI Syst., vol. 10, no. 3, pp. 341–350, 2002.
