LLR-Based Successive-Cancellation List Decoder for … · Bo Yuan and Keshab K. Parhi, Fellow, IEEE...

5
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 64, NO. 1, JANUARY 2017 21 LLR-Based Successive-Cancellation List Decoder for Polar Codes With Multibit Decision Bo Yuan and Keshab K. Parhi, Fellow, IEEE Abstract—Due to their capacity-achieving property, polar codes have become one of the most attractive channel codes. To date, the successive-cancellation list (SCL) decoding algorithm is the pri- mary approach that can guarantee outstanding error-correcting performance of polar codes. However, the hardware designs of the original SCL decoder have a large silicon area and a long decoding latency. Although some recent efforts can reduce either the area or latency of SCL decoders, these two metrics still cannot be optimized at the same time. This brief, for the first time, proposes a general log-likelihood-ratio (LLR) based SCL decoding algorithm with multibit decision. This new algorithm, referred to as LLR 2 K b-SCL, can determine 2 K bits simultaneously for arbitrary K with the use of LLR messages. In addition, a reduced-data-width scheme is presented to reduce the critical path of the sorting block. Then, based on the proposed algorithm, a VLSI architecture of the new SCL decoder is developed. Synthesis results show that, for an example (1024, 512) polar code with list size 4, the proposed LLR 2 K b SCL decoders achieve a significant reduction in both area and latency as compared to prior works. As a result, the hardware efficiencies of the proposed designs with K =2 and 3 are 2.33 times and 3.32 times of that of the state-of-the-art works, respectively. Index Terms—Log-likelihood-ratio (LLR), multibit decision, polar codes, successive-cancellation (SC), VLSI. I. I NTRODUCTION P OLAR codes [1] have emerged as one of the most attractive forward error correction codes in recent years. Due to their unique capacity-achieving property, polar codes provide out- standing error-correcting capability that would be very useful for digital transmission. However, to date, polar codes suffer from inferior finite length error-correcting performance. In particular, in the region of short or medium code length, polar codes are not comparable to the LDPC codes in terms of coding gain. To solve this prob- lem, the successive-cancellation list (SCL) decoding algorithm was proposed in [2] to improve the coding gain of the polar codes. In [2], it was shown that, with the use of SCL algorithm, polar codes can outperform the WiMAX LDPC codes even for a shorter code length. Manuscript received January 22, 2016; accepted March 15, 2016. Date of publication March 24, 2016; date of current version December 22, 2016. This brief was recommended by Associate Editor C. K. Tse. B. Yuan was with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA. He is now with the Department of Electrical Engineering, University of New York, City College, New York, NY 10031 USA (e-mail: [email protected]). K. K. Parhi is with the Department of Electrical and Computer Engineer- ing, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: parhi@ umn.edu). Color versions of one or more of the figures in this brief are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSII.2016.2546904 Although the SCL algorithm can help polar codes achieve beyond-LDPC performance, this approach suffers from high complexity and long latency. Recently, some efforts have been proposed to address these problems. A log-likelihood-ratio (LLR) based SCL algorithm was proposed in [3] and [4] to reduce the amount of combinational logic and memory. In [5], [6], and [11], low-latency SCL algorithms were presented to reduce the required number of decoding cycles. However, these prior works only focused on either reducing area or latency but not on optimizing these two metrics at the same time. This brief, for the first time, proposes a general reduced- latency LLR-based SCL decoding algorithm. This new algo- rithm, namely, LLR 2 K b-SCL, can determine 2 K bits in one cycle for arbitrary K with the use of LLR messages. As a result, it can achieve both low complexity and short latency. In addition, a reduced-data-width scheme is presented to reduce the critical path of the sorting block. Based on the proposed al- gorithm, a VLSI architecture of the new SCL decoder is devel- oped. Synthesis results show that, for an example (1024, 512) polar code with list size 4, the proposed LLR 2 K b-SCL decoder achieves great reduction in both area and latency as compared to the prior works, respectively. As a result, the hard- ware efficiencies of the proposed designs with K =2 and 3 are 2.33 times and 3.32 times of that of the state-of-the-art works, respectively. Notice that this brief is a generalized version of our prior work [10]. The rest of this brief is organized as follows. Section II gives a brief review of polar codes. The proposed LLR 2 K b-SCL algorithm and the reduced-bit-width scheme for the sorting block are presented in Section III. Section IV develops the hard- ware architecture of the new algorithm. Hardware performance is analyzed and discussed in Section V. Section VI draws the conclusion. II. REVIEW OF POLAR CODES A. Encoding of Polar Codes The encoding procedure of (n, p) polar codes consists of two steps. First, the p-bit source message is extended to an n- bit message u =(u 1 ,u 2 ,...,u n ) with padding n p “0” bits. Here, those padded “0” bits are referred to as frozen bits, and their positions over u, namely, frozen positions, are known to both the transmitter and receiver. Then, u is multiplied with an n-by-n generator matrix G to obtain the transmitted codeword x =(x 1 ,x 2 ,...,x n )= uG. Fig. 1 shows the example of an n =4 polar code encoder. For details on encoding of polar codes, the reader is referred to [1]. B. SC Decoding Algorithm At the receiver end, the original transmitted codeword x is corrupted by noise to the received codeword y =( y 1 ,y 2 ,...,y n ). An LLR-based successive-cancellation (SC) decoder can be 1549-7747 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Transcript of LLR-Based Successive-Cancellation List Decoder for … · Bo Yuan and Keshab K. Parhi, Fellow, IEEE...

Page 1: LLR-Based Successive-Cancellation List Decoder for … · Bo Yuan and Keshab K. Parhi, Fellow, IEEE Abstract —Due to their capacity-achieving property, polar codes have become one

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 64, NO. 1, JANUARY 2017 21

LLR-Based Successive-Cancellation List Decoderfor Polar Codes With Multibit Decision

Bo Yuan and Keshab K. Parhi, Fellow, IEEE

Abstract—Due to their capacity-achieving property, polar codeshave become one of the most attractive channel codes. To date, thesuccessive-cancellation list (SCL) decoding algorithm is the pri-mary approach that can guarantee outstanding error-correctingperformance of polar codes. However, the hardware designs ofthe original SCL decoder have a large silicon area and a longdecoding latency. Although some recent efforts can reduce eitherthe area or latency of SCL decoders, these two metrics still cannotbe optimized at the same time. This brief, for the first time,proposes a general log-likelihood-ratio (LLR) based SCL decodingalgorithm with multibit decision. This new algorithm, referredto as LLR − 2K b-SCL, can determine 2K bits simultaneouslyfor arbitrary K with the use of LLR messages. In addition, areduced-data-width scheme is presented to reduce the critical pathof the sorting block. Then, based on the proposed algorithm, aVLSI architecture of the new SCL decoder is developed. Synthesisresults show that, for an example (1024, 512) polar code withlist size 4, the proposed LLR − 2K b − SCL decoders achievea significant reduction in both area and latency as compared toprior works. As a result, the hardware efficiencies of the proposeddesigns with K = 2 and 3 are 2.33 times and 3.32 times of that ofthe state-of-the-art works, respectively.

Index Terms—Log-likelihood-ratio (LLR), multibit decision,polar codes, successive-cancellation (SC), VLSI.

I. INTRODUCTION

POLAR codes [1] have emerged as one of the most attractiveforward error correction codes in recent years. Due to their

unique capacity-achieving property, polar codes provide out-standing error-correcting capability that would be very usefulfor digital transmission.

However, to date, polar codes suffer from inferior finitelength error-correcting performance. In particular, in the regionof short or medium code length, polar codes are not comparableto the LDPC codes in terms of coding gain. To solve this prob-lem, the successive-cancellation list (SCL) decoding algorithmwas proposed in [2] to improve the coding gain of the polarcodes. In [2], it was shown that, with the use of SCL algorithm,polar codes can outperform the WiMAX LDPC codes even fora shorter code length.

Manuscript received January 22, 2016; accepted March 15, 2016. Date ofpublication March 24, 2016; date of current version December 22, 2016. Thisbrief was recommended by Associate Editor C. K. Tse.

B. Yuan was with the Department of Electrical and Computer Engineering,University of Minnesota, Minneapolis, MN 55455 USA. He is now with theDepartment of Electrical Engineering, University of New York, City College,New York, NY 10031 USA (e-mail: [email protected]).

K. K. Parhi is with the Department of Electrical and Computer Engineer-ing, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this brief are available onlineat http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSII.2016.2546904

Although the SCL algorithm can help polar codes achievebeyond-LDPC performance, this approach suffers from highcomplexity and long latency. Recently, some efforts have beenproposed to address these problems. A log-likelihood-ratio(LLR) based SCL algorithm was proposed in [3] and [4] toreduce the amount of combinational logic and memory. In [5],[6], and [11], low-latency SCL algorithms were presented toreduce the required number of decoding cycles. However, theseprior works only focused on either reducing area or latency butnot on optimizing these two metrics at the same time.

This brief, for the first time, proposes a general reduced-latency LLR-based SCL decoding algorithm. This new algo-rithm, namely, LLR − 2K b-SCL, can determine 2K bits inone cycle for arbitrary K with the use of LLR messages. Asa result, it can achieve both low complexity and short latency.In addition, a reduced-data-width scheme is presented to reducethe critical path of the sorting block. Based on the proposed al-gorithm, a VLSI architecture of the new SCL decoder is devel-oped. Synthesis results show that, for an example (1024, 512)polar code with list size 4, the proposed LLR − 2K b-SCLdecoder achieves great reduction in both area and latency ascompared to the prior works, respectively. As a result, the hard-ware efficiencies of the proposed designs with K = 2 and 3 are2.33 times and 3.32 times of that of the state-of-the-art works,respectively. Notice that this brief is a generalized version ofour prior work [10].

The rest of this brief is organized as follows. Section II givesa brief review of polar codes. The proposed LLR − 2K b-SCLalgorithm and the reduced-bit-width scheme for the sortingblock are presented in Section III. Section IV develops the hard-ware architecture of the new algorithm. Hardware performanceis analyzed and discussed in Section V. Section VI draws theconclusion.

II. REVIEW OF POLAR CODES

A. Encoding of Polar Codes

The encoding procedure of (n, p) polar codes consists oftwo steps. First, the p-bit source message is extended to an n-bit message u = (u1, u2, . . . , un) with padding n− p “0” bits.Here, those padded “0” bits are referred to as frozen bits, andtheir positions over u, namely, frozen positions, are known toboth the transmitter and receiver. Then, u is multiplied with ann-by-n generator matrix G to obtain the transmitted codewordx = (x1, x2, . . . , xn) = uG. Fig. 1 shows the example of ann = 4 polar code encoder. For details on encoding of polarcodes, the reader is referred to [1].

B. SC Decoding Algorithm

At the receiver end, the original transmitted codeword x iscorrupted by noise to the received codewordy=(y1, y2, . . . , yn).An LLR-based successive-cancellation (SC) decoder can be

1549-7747 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Page 2: LLR-Based Successive-Cancellation List Decoder for … · Bo Yuan and Keshab K. Parhi, Fellow, IEEE Abstract —Due to their capacity-achieving property, polar codes have become one

22 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 64, NO. 1, JANUARY 2017

Fig. 1. n = 4 polar encoder, cited from [5].

Fig. 2. Example LLR-based SC decoding scheme for n = 4, cited from [7].

Fig. 3. Searching of SCL decoder with n=4, p=4, and L=2, cited from [5].

used to recoveru from y. Fig. 2 shows an example SC decodingprocedure for n = 4 polar codes. It can be seen that the SCdecoder consists of two types of computation units, namely, fand g, respectively. Here, each f or g unit is labeled with anumber, which indicates the time index when the correspondingunit is activated for computation. It can be seen that, at clockcycles 2, 3, 5, and 6, the f or g unit in the last stage (stage 2in this example) sends its LLR output to the hard-decision hunit to determine the decoded bit. Notice that the example SCdecoder is based on LLR form. In this case, the function of fand g units can be approximated as

f(a, b) = sign(a)sign(b)min(|a|, |b|) (1)g(a, b) = a(−1)usum + b. (2)

C. SCL Algorithm

The decoding process of polar codes can also be interpretedfrom the view of code tree. Fig. 3 shows an example code treefor n = 4 polar codes. Here, each level represents a decodedbit. The value that is associated with each node represents theprobability (metric) of the corresponding decoding path. Forexample, 0.23 at level 4 represents that the probability for thelength-4 path (1000) is Pr (u1 = 1, u2 = 0, u3 = 0, u4 = 0) =0.23. Based on this representation, the objective of a successfuldecoding procedure is to find the length-n path that is thecorrected codeword. To achieve this goal, the SC decoder firstvisits the children nodes that are associated with the currentsurvival path at each level. Then, it selects the new path thathas the larger metric as the updated survival path. Because thissearching strategy is only locally optimal, the performance ofthe SC decoder is limited.

Different from the SC decoder that only selects a single path,the L-size SCL algorithm utilizes L different searching paths.Therefore, it is more likely for the SCL algorithm to find thedesired path than the SC algorithm. For example, the SCL de-coder with L = 2 is able to trace the valid path (1000) in Fig. 3,while the SC decoder fails to find it.

III. PROPOSED LLR-BASED MULTIBIT DECODING

From Fig. 3, it can be seen that the SCL algorithm needsto visit the nodes at each level of the code tree. This traversingsearching style leads to a very long decoding latency. To addressthis problem, a reduced-latency algorithm that can determine2K bits simultaneously was proposed in [5]. However, theapproach in [5] is based on the likelihood messages, whichrequires much larger computation and memory complexity thanthe commonly used LLR-based decoder in [3] and [4]. On theother hand, those LLR-based SCL decoders in [3] and [4] areonly able to determine one bit in one cycle; hence, they havemuch longer latency than the decoder in [5]. As a result, to date,those prior SCL decoders in [3]–[5] are not able to reduce thelatency and area at the same time.

A. LLR-Based Multibit SCL Decoding

This section presents an LLR-based SCL decoding algorithmwith 2K bits decision, namely, LLR − 2K b-SCL. Notice that[5] also proposed a reduced-latency decoder that can determinemultiple bits in one cycle. However, the SCL decoder in [5]is based on the likelihood form, while the design in this briefis based on the LLR form. This difference in the fundamentalrepresentation of the proposed algorithm leads to much lesscomputation complexity than the approach in [5].

Next, we show how to determine each successive 2K bit asu2K(i−1)+1, . . . , u2Ki at the same time in the LLR form. Fromthe view of code tree (see Fig. 3), this means that the SCLdecoder is able to directly calculate the metrics of length-(2Ki)paths from the metrics of length-(2K(i − 1)) paths. In general,such direct computation is performed by an LLR-based metriccomputation unit (MCU), which replaces the original last Kstages of each LLR-based SC decoder (see Fig. 2). In thefollowing paragraph, we show how to derive the function ofthe LLR-based MCU.

Assume that the previously decoded 2K(i− 1) bits u1, . . . ,u2K(i−1) are z1, . . . , z2K(i−1) , respectively. This event is de-

noted as u2K(i−1)1 = z

2K(i−1)1 . Therefore, in the logarithmic

domain, the length-(2Ki) path metric can be represented as

M(α1 . . . α2K , z

2K(i−1)1

)� ln

(Pr

(u2Ki2K(i−1)+1 = α2K

1 , u2K(i−1)1 = z2

K(i−1)

1

))(3)

where u2Ki2K(i−1)+1 is defined as (u2K(i−1)+1, . . . , u2Ki) that is

the set of the current 2K decoded bits. In addition, α2K1 is de-

fined as (α1, . . . , α2K1 ), whose elements are the binary values.

Equation (3) contains the probabilistic information of the cur-rent2K decoded bits, which is unknown during the decoding pro-cedure. To address this problem, we need to further represent thelogarithmic path metrics with the LLR messages that are inputto the MCU. Such reformulation is based on the fact that thepolar decoding procedure is inherently “guided” by its encodingprocedure [8]. Simultaneous right-to-left decoding procedureof the successive 2K bits [see Fig. 4(a)] involves the estimation

Page 3: LLR-Based Successive-Cancellation List Decoder for … · Bo Yuan and Keshab K. Parhi, Fellow, IEEE Abstract —Due to their capacity-achieving property, polar codes have become one

YUAN AND PARHI: LLR-BASED SCL DECODER FOR POLAR CODES WITH MULTIBIT DECISION 23

Fig. 4. (a) Right-to-left decoding of 2K bits. (b) Left-to-right encoding bits.

of the left-to-right encoding procedure [see Fig. 4(b)]. Hence, if

u2Ki2K(i−1)+1 is estimated to beα2K

1 , then out2K

1 �(out1, . . . , out2K)

should be the estimation of α2K1 U , where U is the 2K-by-2K

generator matrix. As a result, (3) can be further rewritten as

M(α1 . . . α2K , z

2K(i−1)1

)= ln

(Pr

(out

2K

1 = α2K

1 U, u2K(i−1)1 = z2

K(i−1)

1

)). (4)

Notice that the determination of out1, . . . , out2K is indepen-dent. In addition, if we denote the jth column vector of U asU(j), then we have outj = α2K

1 U(j). As a result, (4) can befurther derived as follows:

M(α1 . . . α2K , z

2K(i−1)1

)= ln

(Pr

(out

2K

1 = α2K

1 U |u2K(i−1)1 = z2

K(i−1)

1

)×Pr

(u2K(i−1)1 = z2

K(i−1)

1

))

=2K∑j=1

ln(Pr

(outj = α2K

1 U(j)|u2K(i−1)1 = z2

K(i−1)

1

))

+M(z2K(i−1)1

)(5)

where M(z2K(i−1)1 ) = ln(Pr(u

2K(i−1)1 = z2

K(i−1)

1 )) is the log-arithmic length-(2K(i − 1)) path metric.

Recall that each SC component decoder is based on the LLRform. In that case, the jth input to the MCU block is

sj = ln

⎛⎝Pr

(outj = 0|u2K(i−1)

1 = z2K(i−1)1

)

Pr(

outj = 1|u2K(i−1)1 = z

2K(i−1)1

)⎞⎠ .

As a result, we can obtain the elements of the first item in (5)

Pr(

outj = 0|u2K(i−1)1 = z

2K(i−1)1

)=

esj

esj + 1

Pr(

outj = 1|u2K(i−1)1 = z

2K(i−1)1

)=

1

esj + 1. (6)

Substituting (6) into (5), we have

M(α1 . . . α2K , z

2K(i−1)1

)

=2K∑j=1

(sj

(1− α2K

1 U(j))− ln (esj + 1)

)+M

(z2K(i−1)1

).

(7)

Equation (7) describes the LLR-based update principle forpath metrics. Once the MCU block receives the 2K input LLRmessages sj and the previous metric of length-(2K(i−1)) path,it can immediately calculate the new metric of length-(2Ki)path with the use of (7), which corresponds to the simultaneousdecision for 2K bits.

Notice that(7) contains exponential and logarithmic functions,which require long critical paths in hardware design. Therefore,(7) needs to be simplified for feasible VLSI implementation.

Consider ln(1+ex)≈x for large x; otherwise, 0 for small x.Equation (7) can be further approximated as follows:

M(α1 . . . α2K , z

2K(i−1)1

)

≈2K∑j=1

(sj

(1− α2K

1 U(j))− δ(sj)

)+M

(z2K(i−1)1

)(8)

where δ(sj) = sj if sj ≥ 0; otherwise, 0.Equation (8) shows how to directly calculate the metric of

length-(2Ki) paths from the metric of length-(2K(i−1)) paths.With the use of this update principle, we can develop theLLR-based SCL decoding algorithm with 2K bits decision asscheme-A. In general, an L-size LLR − 2K b-SCL decoderconsists of L copies of the LLR-based SC decoder. To de-code every 2K successive bit, each SC component decoderfirst performs the regular SC decoding procedure until thelast-(m−K) stage (see Fig. 2), where m = log2 n. At thistime, the (m−K) stage outputs 2K LLR messages sj (j =1, 2, . . . , 2K) to the MCU block. Then, the MCU block in eachSC component decoder calculates the new path metrics withthe use of (8). After that, all of the updated path metrics fromthe L SC component decoders are compared, and L largest areselected as the survival path metrics. The aforementioned entireprocedure is repeated for every 2K bit until all of the n bits aredetermined. Note that, similar to [5], a simple zero-forcing unit(ZFU) is needed after the computation of (8), which helps todrop the unqualified paths that violate the frozen conditions.

Scheme A L-size LLR−2K b-SCL Algorithm for (n, k) polar codes

1: Input: Log-Likelihood ratios of each bit in the receivedcodeword

2: Initialization: Path metric M0 = 0 for each survival path3: For i = 1 to n/2K

4: For each length-(2K(i−1)) survival path u2K (i−1)1 =z2i−2

1

5: SC component decoding:6: Activate stage-1 to stage-(m−K) of LLR-based SC

decoder7: stage-(m−K) output 2K LLR-based message sj(j = 1,

2, . . . , 2K)8: Path Expansion:

9: Expand survival path z2K(i−1)1 to 22

Kcandidate paths

(u2K i1 = (α2K

1 , z2i−21 ))

10: 1 length-(2K(i− 1)) path ⇒ 22K

length-(2K i) paths11: Metric Computation:

12: Calculate22K

actual path metricsM(α1 . . .α2K,z2K(i−1)1 )

by (8)13: Forcing Zero:

14: M(α1 . . .α2K ,z2K (i−1)1 ) for path u2

Ki1 =(α2K

1 ,z2i−21 )

15: u2K (i−1)+j is frozen ⇒ all M(α1α2 . . . αj−10αj+1 . . .

α2K , z2K (i−1)1 ) = −inf

16: End for17: Compare and Prune:

18: Compare M(α1 . . . α2K , z2K(i−1)1 ) for all of the 22

K

length-(2K i) candidate paths19: Select L paths with the L largest metrics as the new

survival paths20: End for21: Output: Choose the length-n survival path with the largest

metric

Page 4: LLR-Based Successive-Cancellation List Decoder for … · Bo Yuan and Keshab K. Parhi, Fellow, IEEE Abstract —Due to their capacity-achieving property, polar codes have become one

24 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 64, NO. 1, JANUARY 2017

Fig. 5. Simulation results of (1024, 512) polar codes over the AWGN channel.

B. Reduced-Data-Width Scheme for Sorting Block

Typically, the Q = 6 bit quantization scheme is sufficientfor the fixed-point implementations of the LLR-based SCLdecoder. However, as indicated in [3], the representation of pathmetrics needs more bits since the path metrics have a larger datarange than the propagated LLR messages. In [3], the authorsshowed that M = 8 bit for path metrics can avoid significantperformance degradation in terms of frame error rate. Since theoverall critical path of the SCL decoder is in the sorting blockthat sorts those path metrics [3], [5], the escalating data-widthof path metrics inevitably causes a significant increase incritical path delay. To address this challenge, we propose areduced-bit-width scheme for the sorting block. The key idea isto only utilize S = M − 1 bits to represent the path metrics forsorting, while the representations of path metrics for updatingand storing are still based on M bits. This approach is derivedfrom the following observation: in the SCL decoder, the sortingblock does not require as high precision as MCUs, since thefunction of the soring block is just to rank the path metricswithout changing their values, while the MCUs have to adopt alarger data-width to guarantee accurate calculation. Therefore,a reduced-data-width for the sorting block does not cause asignificant performance degradation but enables reduction incritical path delay. Fig. 5 shows the fixed-point simulationresults for the proposed LLR − 2K b-SCL algorithm withreduced-data-width for the sorting block. Here, the simulationenvironment is an AWGN channel with BPSK modulation,and the code parameters are n = 1024 and r = 0.5. From thefigure, it can be seen that, compared to the original schemeusing S = 8 bits for sorting the path metric, the proposedscheme with S = 7 bits only has negligible performance lossfor different values of K and L.

IV. HARDWARE ARCHITECTURE

A. Overall Architecture

Fig. 6 shows the overall hardware architecture of the pro-posed L-size LLR − 2K b-SCL algorithm decoder. The datapath of the entire decoder contains L LLR-based SC decodersplus a metric sorting block. For each SC component decoder,it is reformulated from the LLR-based decoder in [8], whichretains the (m−K) stages, but the last K stages are replacedby the LLR-based MCU and ZFU. In addition, the requiredmemory resource of the entire decoder consists of registerarrays, bulk memory, and buffer for survival paths, path metrics,propagating LLR messages, and channel outputs, respectively.

Fig. 6. Overall architecture of the LLR − 2K b-SCL decoder.

Fig. 7. Architecture of MCU for K = 3.

In this section, the hardware design of the sorting block isvery straightforward and similar to the approaches in [3] and[5]. The only difference is that the data-width of comparatorsis reduced from M bits to M − 1 bits, and the least significantbits of all of the input path metrics are dropped to be consistentwith the data-width of comparators. Therefore, in this section,we focus on other parts of the data path and memory resource.Notice that, since ZFU can be easily implemented with multi-plexers, the analysis of this block is omitted as well.

B. LLR-Based MCU

In Section III, (8) describes the function of MCU. Sincethis function depends on K , the hardware design of MCUvaries with different choices of K . Fig. 7 illustrates the innerarchitecture of MCU forK = 3. Here, δ(•) block can be simplyimplemented with a multiplexer. In addition, StoC and CtoSblocks represent the components that perform the conversionbetween sign-magnitude and 2’s complement forms.

C. LLR-Based PE

As shown in line 5–line 7 in scheme-A, the input LLRmessages of MCU are calculated from the first (m−K) stagesof the LLR-based SC decoder in [8]. In general, these stagesconsist of f and g nodes in Fig. 2, which can be implementedin hardware as the following processing element (PE) in Fig. 8.Notice here that the addition and subtraction in (2) are designedas unified adder and subtractor to save area.

D. Memory Resource

For the proposed decoder, different types of memory re-sources are needed for different types of data. Because the Lsurvival paths and their metrics need to be updated simulta-neously each time, they are stored in registers. In addition,

Page 5: LLR-Based Successive-Cancellation List Decoder for … · Bo Yuan and Keshab K. Parhi, Fellow, IEEE Abstract —Due to their capacity-achieving property, polar codes have become one

YUAN AND PARHI: LLR-BASED SCL DECODER FOR POLAR CODES WITH MULTIBIT DECISION 25

Fig. 8. Architecture of PE.

TABLE IHARDWARE PERFORMANCE OF (1024, 512) DECODERS WITH L = 4

for the propagating LLR messages that are processed in thePEs, L bulk memory banks are used to store the correspondingmessages in the L different SC component decoders. Thismemory partition approach can also avoid the potential memoryaccess conflict. Moreover, a specific buffer is used to store theinitial LLR inputs to the decoder.

V. PERFORMANCE AND DISCUSSION

A. Hardware Performance Comparison

Table I shows the hardware performance of different SCLdecoders with list size L = 4 for the same (1024, 512) polarcodes. For the designs with different technology libraries, theresults on area and frequency are scaled to the same 90-nmnodes. From this table, it can be seen that the proposed designwith K = 3 can achieve a much higher throughput than thesame design with K = 2 with a slight increase in area andcritical path delay (lower clock frequency). As a result, thehardware efficiency of the design with K = 3 can achieve572 Mbit/s/mm2, which is the highest among the listed works.

Compared with the LL-based designs [5] and [9], the pro-posed design with K = 3 has 71% and 52% less area, respec-tively. In addition, its low decoding latency leads to 31% and94.8% increase in throughput, respectively.

Compared with the LLR-SCL decoder in [3], the proposeddesign with K = 3 has 79.3% shorter latency, which translatesto a 133% increase in decoding throughput. It should be notedthat the proposed works with K = 2 and 3 to adopt the datapath balancing technique in [5] to reduce the critical path delay.If an advanced technique, such as optimized sorting block in[3], is utilized, the clock frequency of the proposed designs willbe further improved, thereby leading to even higher throughputand hardware efficiency.

B. Discussion on Relevant Works

In [3], an LLR-based SCL decoder was proposed. The deriva-tion for the LLR representation in [3] was similar to the workin [4], with a slight difference on the sign of path metrics. How-ever, the bit decision in [3] and [4] is serial, thereby causing lowthroughput. Different from these works, this brief enables thesimultaneous decision of each 2K bit with the LLR representa-tion; hence, it can achieve much higher throughput and lowerlatency. For instance, as shown in Table I, the proposed designwith K = 3 achieves 133% increase in data rate than [3].

In [9], a channel message compression scheme was proposedto reduce the memory requirement. However, because [9] isonly an LL-based decoder, its silicon area is at least two timesthe proposed design. Interestingly, because the area-optimizingtechniques in [9] and this brief are performed at differentlevels, they can be jointly used to develop a more area-efficientdecoder.

To address the long latency problem, multibit decision orthe so-called parallel output was proposed in [5], [6], and [11].Literature describes the reduced-latency technique in differentmanners, but all groups utilize the special recursive propertyof polar codes. However, different from those prior LL-basedworks, the proposed approach successfully enables multibitdecision with LLR-based messages, thereby leading to greatreduction in both computation complexity and memory require-ment, which are very important for application of polar codes.

VI. CONCLUSION

In this brief, we have presented the LLR − 2K b-SCL al-gorithm for polar code decoding. The proposed algorithm canreduce complexity and decoding latency at the same timewithout performance loss. Then, based on the proposed algo-rithm, we have developed the corresponding VLSI architecture.Hardware analysis shows that the proposed SCL decoders havea significant reduction in area and decoding latency.

REFERENCES

[1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEETrans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.

[2] I. Tal and A. Vardy, “List Decoding of Polar Codes,” arXiv:1206.0050,May 2012.

[3] A. Balatsoukas-Stimming, M. Bastani Parizi, and A. Burg, “LLR-basedsuccessive cancellation list decoding of polar codes,” in Proc. IEEEInt. Conf. Acoustics, Speech and Signal Processing (ICASSP 2014),May 2014, pp. 3903–3907.

[4] B. Yuan and K. K. Parhi, “Successive cancellation list polar decoder usinglog-likelihood ratios,” in Proc. Asilomar Conf. Signal, Syst. Comput.,2014, pp. 548–552.

[5] B. Yuan and K. K. Parhi, “Low-latency successive-cancellation list de-coders for polar codes with multi-bit decision,” IEEE Trans. Very LargeScale Integr. (VLSI) Syst., vol. 23, no. 10, pp. 2268–2280, Oct. 2015.

[6] B. Li, H. Shen, D. Tse, and W. Tong, “Low-latency polar codes via hybriddecoding,” in Proc. 8th ISTC, Aug. 2014, pp. 223–227.

[7] B. Yuan and K. K. Parhi, “Successive cancellation decoding of polarcodes using stochastic computing,” in Proc. IEEE ISCAS, May 2015,pp. 3040–3043.

[8] B. Yuan and K. K. Parhi, “Low-Latency successive-cancellation polardecoder architectures using 2-bit decoding,” IEEE Trans. Circuits Syst.I, Reg. Papers, vol. 61, no. 4, pp. 1241–1254, Apr. 2014.

[9] J. Lin and Z. Yan, “An efficient list decoder architecture for polar codes,”IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 11,pp. 2508–2518, Nov. 2015.

[10] B. Yuan and K. K. Parhi, “Reduced-latency LLR-based SC list decoderfor polar codes,” in Proc. ACM Great Lakes Symp. VLSI, May 2015,pp. 107–110.

[11] H. Vangala, E. Viterbo, and Y. Hong, “A new multiple folded successivecancellation decoder for polar codes,” in Proc. IEEE ITW, Nov. 2014,pp. 381–385.