EURASIP Journal on Applied Signal Processing 2005:6, 762–774 © 2005 Phillip A. Regalia

Iterative Decoding of Concatenated Codes: A Tutorial

Phillip A. Regalia

Département Communications, Images et Traitement de l'Information, Institut National des Télécommunications, 91011 Evry Cedex, France

Department of Electrical Engineering and Computer Science, Catholic University of America, Washington, DC 20064, USA

Email: [email protected]

Received 29 September 2003; Revised 1 June 2004

The turbo decoding algorithm of a decade ago constituted a milestone in error-correction coding for digital communications, and has inspired extensions to generalized receiver topologies, including turbo equalization, turbo synchronization, and turbo CDMA, among others. Despite an accrued understanding of iterative decoding over the years, the "turbo principle" remains elusive to master analytically, thereby inciting interest from researchers outside the communications domain. In this spirit, we develop a tutorial presentation of iterative decoding for parallel and serial concatenated codes, in terms hopefully accessible to a broader audience. We motivate iterative decoding as a computationally tractable attempt to approach maximum-likelihood decoding, and characterize fixed points in terms of a "consensus" property between constituent decoders. We review how the decoding algorithm for both parallel and serial concatenated codes coincides with an alternating projection algorithm, which allows one to identify conditions under which the algorithm indeed converges to a maximum-likelihood solution, in terms of particular likelihood functions factoring into the product of their marginals. The presentation emphasizes a common framework applicable to both parallel and serial concatenated codes.

Keywords and phrases: iterative decoding, maximum-likelihood decoding, information geometry, belief propagation.

1. INTRODUCTION

The advent of the turbo decoding algorithm for parallel concatenated codes a decade ago [1] ranks among the most significant breakthroughs in modern communications in the past half century: a coding and decoding procedure of reasonable computational complexity was finally at hand, offering performance approaching the previously elusive Shannon limit, which predicts reliable communications whenever the channel capacity slightly exceeds the source entropy rate. The practical success of the iterative turbo decoding algorithm has inspired its adaptation to other code classes, notably serially concatenated codes [2, 3], and has rekindled interest [4, 5] in low-density parity-check codes [6], which give the definitive historical precedent in iterative decoding.

The serial concatenated configuration holds particular interest for communication systems, since the "inner encoder" of such a configuration can be given more general interpretations, such as a "parasitic" encoder induced by a convolutional channel or by the spreading codes used in CDMA. The corresponding iterative decoding algorithm can then be extended into new arenas, giving rise to turbo equalization [7, 8, 9] or turbo CDMA [10, 11], among doubtless other possibilities. Such applications demonstrate the power of iterative techniques which aim to jointly optimize receiver components, compared to the traditional approach of adapting such components independently of one another.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The turbo decoding algorithm for error-correction codes is known not to converge, in general, to a maximum-likelihood solution, although in practice it is usually observed to give comparable performance [12, 13, 14]. The quest to understand the convergence behavior has spawned numerous inroads, including extrinsic information transfer (or EXIT) charts [15], density evolution of intermediate quantities [16, 17], phase trajectory techniques [18], Gaussian approximations which simplify the analysis [19], and cross-entropy minimization [20], to name a few. Some of these analysis techniques have been applied with success to other configurations, such as turbo equalization [21, 22]. Connections to the belief propagation algorithm [23] have also been identified [24], which approach in turn is closely linked to earlier work on graph theoretic methods [25, 26, 27, 28]. In this context, the turbo decoding algorithm gives rise to a directed graph having cycles; the belief propagation algorithm is known to converge provided no cycles appear in the directed graph, although less can be said in general once cycles appear.


Interest in turbo decoding and related topics now extends beyond the communications community, and has been met with useful insights from other fields; some references in this direction include [29] which draws on nonlinear system analysis, [30] which draws on computer science, in addition to [31] (predating turbo codes) and [32] (more recent) which inject ideas from statistical physics, which in turn can be rephrased in terms of information geometry [33, 34]. Despite this impressive pedigree of analysis techniques, the "turbo principle" remains difficult to master analytically and, given its fair share of specialized terminology if not a certain degree of mystique, is often perceived as difficult to grasp for the nonspecialist. In this spirit, the aim of this paper is to provide a reasonably self-contained and tutorial development of iterative decoding for parallel and serial concatenated codes, in terms hopefully accessible to a broader audience. The paper does not aim at a comprehensive survey of available analysis techniques and implementation tricks surrounding iterative decoding (for which the texts [12, 13, 14] would be more appropriate), but rather chooses a particular vantage point which steers clear of unnecessary sophistication and avoids approximations.

We begin in Section 2 by reviewing optimum (maximum a posteriori and maximum-likelihood) decoding of parallel concatenated codes. We motivate the turbo decoding algorithm as a computationally tractable attempt to approach maximum-likelihood decoding. A characterization of fixed points is obtained in terms of a "consensus" property between the two constituent decoders, and a simple proof of the existence of fixed points is obtained as an application of the Brouwer fixed point theorem.

Section 3 then reexamines the calculation of marginal distributions in terms of a projection operator, leading to a compact formulation of the turbo decoding algorithm as an alternating projection algorithm. The material of the section aims at a concrete transcription of ideas originally developed by Richardson [29]; we include in addition a minimum-distance property of the projector in terms of the Kullback-Leibler divergence, and review how the turbo decoding algorithm indeed converges to a maximum-likelihood solution whenever specific likelihood functions factor into the product of their marginals. The factorization is known [18] to hold in extreme signal-to-noise ratios.

Section 4 shows that the iterative decoding algorithm for serial concatenated codes also admits an alternating projection interpretation, allowing us to transcribe all results for parallel concatenated codes to their serial concatenated counterparts. This should also facilitate unified studies of both code classes. Concluding remarks are summarized in Section 5.

2. TURBO DECODING OF PARALLEL CONCATENATED CODES

We begin by reviewing the classical turbo decoding algorithm for parallel concatenated codes. For simplicity, we restrict our development to the binary signaling case; the $m$-ary case can be handled by direct extension (see, e.g., [24] for a particularly clear treatment) or by mapping the $m$-ary constellation back to its binary origins.

[Figure 1: Parallel concatenated encoder structure. The information bits $\theta = (\theta_1, \ldots, \theta_k)$ drive two systematic encoders, producing the codewords $(\theta_1, \ldots, \theta_k, \phi_1, \ldots, \phi_{n-k})$ and $(\theta_1, \ldots, \theta_k, \psi_1, \ldots, \psi_{n-k})$, i.e., the information bits plus two sets of parity-check bits.]

[Figure 2: Particular realization of the second encoder by using the first encoder with an interleaver: $\theta$ feeds systematic encoder 1 directly, and also passes through a permutation into a copy of encoder 1, the cascade forming systematic encoder 2.]

To begin, a binary (0 or 1) information block $\theta = (\theta_1, \ldots, \theta_k)$ is passed through two constituent encoders, as in Figure 1, to create two codewords:
\[
(\theta_1, \ldots, \theta_k, \phi_1, \ldots, \phi_{n-k}), \qquad (\theta_1, \ldots, \theta_k, \psi_1, \ldots, \psi_{n-k}). \tag{1}
\]

Both encoders are systematic and of rate $k/n$, so that the information bits $\theta_1, \ldots, \theta_k$ are directly available in either codeword. Note also that the two encoders need not share a common rate, although we will adhere to this case for ease of notation.

In practice, an expedient method of realizing the second systematic encoder is to permute (or interleave) the information bits $\theta_i$ and duplicate the first encoder, as in Figure 2. Since this is a particular instance of Figure 1, we will simply consider two separate encodings of $\theta = (\theta_1, \ldots, \theta_k)$ in what follows and avoid explicit reference to the interleaving operation, despite its importance in the study of the distance properties of concatenated codes [35].

The encoder outputs are converted to antipodal signaling ($\pm 1$) and transmitted over a channel containing additive noise, giving the received signals $x_i$, $y_i$, and $z_i$:
\[
\begin{aligned}
x_i &= (2\theta_i - 1) + b_{x,i}, & i &= 1, 2, \ldots, k;\\
y_i &= (2\phi_i - 1) + b_{y,i}, & i &= 1, 2, \ldots, n-k;\\
z_i &= (2\psi_i - 1) + b_{z,i}, & i &= 1, 2, \ldots, n-k.
\end{aligned} \tag{2}
\]

We assume that the noise samples $b_{x,i}$, $b_{y,i}$, and $b_{z,i}$ are Gaussian and mutually independent, sharing a common variance $\sigma^2$. For notational convenience, we arrange the received signals into the vectors
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}, \qquad
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_{n-k} \end{bmatrix}, \qquad
\mathbf{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_{n-k} \end{bmatrix}. \tag{3}
\]

2.1. Optimum decoding

The maximum a posteriori decoding rule aims to calculate the a posteriori probability ratios
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}{\Pr(\theta_i = 0 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}, \qquad i = 1, 2, \ldots, k, \tag{4}
\]

with the decision rule favoring a 1 for the $i$th bit if this ratio is greater than one, and 0 if the ratio is less than one. By using Bayes's rule, each ratio can be developed as
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}{\Pr(\theta_i = 0 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}
= \frac{\sum_{\theta:\theta_i=1} \Pr(\theta \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}{\sum_{\theta:\theta_i=0} \Pr(\theta \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \theta) \Pr(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \theta) \Pr(\theta)}, \tag{5}
\]

involving the a priori probability mass function $\Pr(\theta)$ and the likelihood function $p(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \theta)$, which is evaluated for the received $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ as a function of the candidate information bits $\theta = (\theta_1, \ldots, \theta_k)$; the sum in the numerator (resp., denominator) is over all the configurations of the vector $\theta$ for which the $i$th bit is a "1" (resp., "0"). Since the noise samples are assumed independent, the likelihood function naturally factors as
\[
p(\mathbf{x}, \mathbf{y}, \mathbf{z} \mid \theta) = p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta)\, p(\mathbf{z} \mid \theta). \tag{6}
\]

For the Gaussian noise case considered here, the three likelihood evaluations appear as
\[
\begin{aligned}
p(\mathbf{x} \mid \theta) &\propto \exp\!\left( -\frac{\|\mathbf{x} - c_x(\theta)\|^2}{2\sigma^2} \right),\\
p(\mathbf{y} \mid \theta) &\propto \exp\!\left( -\frac{\|\mathbf{y} - c_y(\theta)\|^2}{2\sigma^2} \right),\\
p(\mathbf{z} \mid \theta) &\propto \exp\!\left( -\frac{\|\mathbf{z} - c_z(\theta)\|^2}{2\sigma^2} \right),
\end{aligned} \tag{7}
\]
where $c_x(\theta)$, $c_y(\theta)$, and $c_z(\theta)$ contain the antipodal symbols $\pm 1$ which would be received as a function of the candidate information bits $\theta$, in the absence of noise. For non-Gaussian noise, the likelihood functions would, of course, assume different forms.

The a posteriori probability ratios may therefore be written as
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}{\Pr(\theta_i = 0 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta)\, p(\mathbf{z} \mid \theta) \Pr(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta)\, p(\mathbf{z} \mid \theta) \Pr(\theta)},
\qquad i = 1, 2, \ldots, k. \tag{8}
\]

If the a priori probability mass function $\Pr(\theta)$ is uniform (i.e., $\Pr(\theta) = 1/2^k$ for all $\theta$), then this reduces to the maximum-likelihood decision metric:
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}{\Pr(\theta_i = 0 \mid \mathbf{x}, \mathbf{y}, \mathbf{z})}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta)\, p(\mathbf{z} \mid \theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta)\, p(\mathbf{z} \mid \theta)}
\qquad \text{if } \Pr(\theta) \text{ is uniform.} \tag{9}
\]
If this expression were evaluated as written, the complexity of an optimum decision rule would be $O(2^k)$, since there are $2^k$ configurations of the $k$ information bits comprising $\theta$, leading to as many likelihood function evaluations. This clearly becomes impractical for sizable $k$.

Observe now that if we instead consider an optimum decoding rule using only one of the constituent encoders, we may write, by a development parallel to that above,
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{x}, \mathbf{y})}{\Pr(\theta_i = 0 \mid \mathbf{x}, \mathbf{y})}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta) \Pr(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta) \Pr(\theta)}, \tag{10}
\]
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{x}, \mathbf{z})}{\Pr(\theta_i = 0 \mid \mathbf{x}, \mathbf{z})}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{x} \mid \theta)\, p(\mathbf{z} \mid \theta) \Pr(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{x} \mid \theta)\, p(\mathbf{z} \mid \theta) \Pr(\theta)}. \tag{11}
\]

If each constituent encoder implements a trellis code, then $\mathbf{x}$ and $\mathbf{y}$ form a Markov chain, as do $\mathbf{x}$ and $\mathbf{z}$; the complexity of either decoding expression can then be reduced to $O(k)$ by using the forward-backward algorithm from [36] (which, in turn, is a particular case of the sum-product algorithm [27]).

If the a priori probability function $\Pr(\theta)$ is indeed uniform, then it weighs all terms in the numerator and denominator equally and, as such, is effectively relegated to an unused variable in either decoding expression (10) or (11). Rather than accepting this status, one can imagine replacing the a priori probability function $\Pr(\theta)$, or "usurping" its position, by some other function in an attempt to "bias" either decoding rule (10) or (11) towards the maximum-likelihood decoding rule in (9). In particular, if $\Pr(\theta)$ were replaced by $p(\mathbf{z} \mid \theta)$ in (10), or by $p(\mathbf{y} \mid \theta)$ in (11), then either expression would agree formally with (9).

In order to retain the $O(k)$ complexity of the forward-backward algorithm from [36], however, the a priori probability function $\Pr(\theta)$ is assumed to factor into the product of its bitwise marginals:
\[
\Pr(\theta) = \Pr(\theta_1) \Pr(\theta_2) \cdots \Pr(\theta_k). \tag{12}
\]

The likelihood function $p(\mathbf{y} \mid \theta)$ or $p(\mathbf{z} \mid \theta)$ does not, on the other hand, generally factor into its bitwise marginals, that is,
\[
p(\mathbf{y} \mid \theta) \neq p(\mathbf{y} \mid \theta_1)\, p(\mathbf{y} \mid \theta_2) \cdots p(\mathbf{y} \mid \theta_k). \tag{13}
\]

As such, a direct usurpation of the a priori probability by the likelihood function of the parity-check bits of the other constituent coder is not feasible. Rather, one must approximate the likelihood function $p(\mathbf{y} \mid \theta)$ or $p(\mathbf{z} \mid \theta)$ by a function that does factor into the product of its marginals. Many candidate approximations may be envisaged; that which has proved the most useful relies on extrinsic information values, which are reviewed next.

2.2. Extrinsic information values

We reexamine the likelihood function for the systematic bits:
\[
\begin{aligned}
p(\mathbf{x} \mid \theta) &= \frac{1}{(\sqrt{2\pi}\sigma)^k} \exp\!\left( -\sum_{i=1}^{k} \frac{\bigl[ x_i - (2\theta_i - 1) \bigr]^2}{2\sigma^2} \right)\\
&= \prod_{i=1}^{k} \frac{\exp\bigl( -\bigl[ x_i - (2\theta_i - 1) \bigr]^2 / 2\sigma^2 \bigr)}{\sqrt{2\pi}\sigma}\\
&= p(x_1 \mid \theta_1)\, p(x_2 \mid \theta_2) \cdots p(x_k \mid \theta_k).
\end{aligned} \tag{14}
\]
This shows that the likelihood function $p(\mathbf{x} \mid \theta)$ for the systematic bits factors into the product of its marginals,¹ just like the a priori probability mass function:
\[
\Pr(\theta) = \Pr(\theta_1) \Pr(\theta_2) \cdots \Pr(\theta_k). \tag{15}
\]

Owing to these factorizations, each term from the numerator of (10) contains a factor $p(x_i \mid \theta_i = 1) \Pr(\theta_i = 1)$, and each term from the denominator contains a factor $p(x_i \mid \theta_i = 0) \Pr(\theta_i = 0)$. By isolating these common factors, we may rewrite the ratio from (10) as
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{x}, \mathbf{y})}{\Pr(\theta_i = 0 \mid \mathbf{x}, \mathbf{y})}
= \underbrace{\frac{p(x_i \mid \theta_i = 1)}{p(x_i \mid \theta_i = 0)}}_{\text{intrinsic information}}
\underbrace{\frac{\Pr(\theta_i = 1)}{\Pr(\theta_i = 0)}}_{\text{a priori information}}
\times
\underbrace{\frac{\sum_{\theta:\theta_i=1} p(\mathbf{y} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}{\sum_{\theta:\theta_i=0} p(\mathbf{y} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}}_{\text{extrinsic information}}. \tag{16}
\]
The three terms on the right-hand side may be interpreted as follows:

(i) the first term indicates what the $i$th received bit $x_i$ contributes to the determination of the $i$th transmitted bit $\theta_i$; hence the name "intrinsic information." It coincides with the maximum-likelihood metric for determining the $i$th bit when no coding is used;

(ii) the second term expresses the a priori probability ratio for the $i$th bit, and will be usurped shortly;

(iii) the third term expresses what the remaining bits in the packet (i.e., of index $j \neq i$) contribute to the determination of the $i$th bit; hence the name "extrinsic information" (a numerical sketch of this decomposition follows the footnote below).

¹Although we show this factorization here for a Gaussian channel, the factorization holds, of course, for any memoryless channel model.
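
The decomposition (16) can be checked numerically. The sketch below (with the same kind of hypothetical single parity-check encoder and Gaussian channel as assumed earlier; all parameters illustrative) computes the posterior ratio (10) for one bit by enumeration, and verifies that it equals the product of the intrinsic, a priori, and extrinsic terms.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
k, sigma = 3, 0.8
enc1 = lambda th: [th.sum() % 2]        # hypothetical single parity-check encoder
to_pm1 = lambda b: 2 * np.asarray(b) - 1

theta_true = rng.integers(0, 2, k)
x = to_pm1(theta_true) + sigma * rng.standard_normal(k)
y = to_pm1(enc1(theta_true)) + sigma * rng.standard_normal(1)

g = lambda r, c: np.exp(-np.sum((r - to_pm1(c)) ** 2) / (2 * sigma ** 2))
prior = np.full(2, 0.5)                 # uniform bitwise priors Pr(theta_i)

i = 0                                   # examine bit i
post = np.zeros(2)                      # numerator/denominator of (10)
extr = np.zeros(2)                      # extrinsic sums of (16)
for th in itertools.product([0, 1], repeat=k):
    th = np.asarray(th)
    w_rest = g(y, enc1(th)) * np.prod([g(x[j:j + 1], th[j:j + 1]) * prior[th[j]]
                                       for j in range(k) if j != i])
    extr[th[i]] += w_rest
    post[th[i]] += g(x[i:i + 1], th[i:i + 1]) * prior[th[i]] * w_rest

intrinsic = g(x[i:i + 1], [1]) / g(x[i:i + 1], [0])
apriori = prior[1] / prior[0]
# The posterior ratio equals intrinsic x a priori x extrinsic, as in (16):
print(post[1] / post[0])
print(intrinsic * apriori * extr[1] / extr[0])
```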

Let $T(\theta) = T_1(\theta_1) T_2(\theta_2) \cdots T_k(\theta_k)$ be a factorable probability mass function whose bitwise ratios are chosen to match the extrinsic information values above:
\[
\frac{T_i(\theta_i = 1)}{T_i(\theta_i = 0)}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{y} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}{\sum_{\theta:\theta_i=0} p(\mathbf{y} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j) \Pr(\theta_j)}, \qquad i = 1, 2, \ldots, k. \tag{17}
\]

Since these values depend on the likelihood function $p(\mathbf{y} \mid \theta)$ (in addition to the systematic bits save for $x_i$), we may consider $T(\theta)$ a factorable function which approximates, in some sense, the likelihood function $p(\mathbf{y} \mid \theta)$. (We will see in Theorem 2 a condition under which this approximation becomes exact.) We now let $T(\theta)$ usurp the place reserved for the a priori probability function $\Pr(\theta)$ (denoted $\Pr(\theta) \leftarrow T(\theta)$) in the evaluation of the second decoder (11); since both $p(\mathbf{x} \mid \theta)$ and $T(\theta)$ factor into the product of their respective marginals, we have
\[
\frac{\sum_{\theta:\theta_i=1} p(\mathbf{x} \mid \theta)\, p(\mathbf{z} \mid \theta) \Pr(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{x} \mid \theta)\, p(\mathbf{z} \mid \theta) \Pr(\theta)}
\;\leftarrow\;
\frac{\sum_{\theta:\theta_i=1} p(\mathbf{x} \mid \theta)\, p(\mathbf{z} \mid \theta)\, T(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{x} \mid \theta)\, p(\mathbf{z} \mid \theta)\, T(\theta)}
= \underbrace{\frac{p(x_i \mid \theta_i = 1)}{p(x_i \mid \theta_i = 0)}}_{\text{intrinsic information}}
\underbrace{\frac{T_i(\theta_i = 1)}{T_i(\theta_i = 0)}}_{\text{pseudoprior}}
\times
\underbrace{\frac{\sum_{\theta:\theta_i=1} p(\mathbf{z} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}{\sum_{\theta:\theta_i=0} p(\mathbf{z} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}}_{\text{extrinsic information}}. \tag{18}
\]

Here we adopt the term "pseudoprior" for $T(\theta)$ since it usurps the a priori probability function; similarly, the result of this substitution may be termed a "pseudoposterior" which usurps the true a posteriori probability ratio.

Let now $U(\theta) = U_1(\theta_1) U_2(\theta_2) \cdots U_k(\theta_k)$ denote another factorable probability function whose bitwise ratios match the extrinsic information values furnished by this second decoder:
\[
\frac{U_i(\theta_i = 1)}{U_i(\theta_i = 0)}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{z} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}{\sum_{\theta:\theta_i=0} p(\mathbf{z} \mid \theta) \prod_{j \neq i} p(x_j \mid \theta_j)\, T_j(\theta_j)}, \qquad i = 1, 2, \ldots, k. \tag{19}
\]

This function may then usurp the a priori probability values used in the first decoder, and the process iterates. If we let a superscript $(m)$ denote an iteration index, the coupling of the two decoders admits an external description as
\[
\frac{p(x_i \mid \theta_i = 1)}{p(x_i \mid \theta_i = 0)}\,
\frac{U_i^{(m)}(1)}{U_i^{(m)}(0)}\,
\frac{T_i^{(m)}(1)}{T_i^{(m)}(0)}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{y} \mid \theta)\, p(\mathbf{x} \mid \theta)\, U^{(m)}(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{y} \mid \theta)\, p(\mathbf{x} \mid \theta)\, U^{(m)}(\theta)}, \tag{20}
\]
\[
\frac{p(x_i \mid \theta_i = 1)}{p(x_i \mid \theta_i = 0)}\,
\frac{T_i^{(m)}(1)}{T_i^{(m)}(0)}\,
\frac{U_i^{(m+1)}(1)}{U_i^{(m+1)}(0)}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{z} \mid \theta)\, p(\mathbf{x} \mid \theta)\, T^{(m)}(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{z} \mid \theta)\, p(\mathbf{x} \mid \theta)\, T^{(m)}(\theta)}, \tag{21}
\]

[Figure 3: Flow graph of the turbo decoding algorithm. The first decoder combines the systematic likelihood $p(\mathbf{x} \mid \theta)$, the parity-check likelihood $p(\mathbf{y} \mid \theta)$, and the pseudoprior $U^{(m)}$ to furnish the extrinsic values $T^{(m)}$; the second decoder combines $p(\mathbf{x} \mid \theta)$, $p(\mathbf{z} \mid \theta)$, and the pseudoprior $T^{(m)}$ to furnish $U^{(m+1)}$, which feeds back to the first decoder through a delay element.]

in which (20) furnishes $T^{(m)}(\theta)$ and (21) furnishes $U^{(m+1)}(\theta)$. This is depicted in Figure 3. A fixed point corresponds to $U^{(m+1)}(\theta) = U^{(m)}(\theta)$ which, by inspection of the pseudoposteriors above, yields the following property.

Property 1. A fixed point is attained if and only if the two decoders yield the same pseudoposteriors (the left-hand sides of (20) and (21)) for $i = 1, 2, \ldots, k$.

A fixed point is therefore reflected by a state of "consensus" between the two decoders [15, 29, 37].
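
A complete, if miniature, rendition of the recursion (20)-(21) follows. The toy constituent encoders and channel parameters are illustrative assumptions; distributions over the $2^k$ bit configurations are stored as full vectors, so the sketch trades the $O(k)$ forward-backward implementation for transparency. The loop halts once the two decoders return the same pseudoposteriors, i.e., once the consensus of Property 1 is reached.

```python
import numpy as np

rng = np.random.default_rng(2)
k, sigma = 3, 1.0
N = 2 ** k

# Hypothetical toy constituent encoders: one parity-check bit each.
enc1 = lambda th: [th.sum() % 2]
enc2 = lambda th: [(th[0] + th[1]) % 2]

bits = lambda m: np.array([(m >> i) & 1 for i in range(k)])  # integer -> bit vector
to_pm1 = lambda b: 2 * np.asarray(b) - 1
g = lambda r, c: np.exp(-np.sum((r - to_pm1(c)) ** 2) / (2 * sigma ** 2))

theta_true = rng.integers(0, 2, k)
x = to_pm1(theta_true) + sigma * rng.standard_normal(k)
y = to_pm1(enc1(theta_true)) + sigma * rng.standard_normal(1)
z = to_pm1(enc2(theta_true)) + sigma * rng.standard_normal(1)

# Likelihood vectors over all 2^k bit configurations.
px = np.array([g(x, bits(m)) for m in range(N)])
py = np.array([g(y, enc1(bits(m))) for m in range(N)])
pz = np.array([g(z, enc2(bits(m))) for m in range(N)])

def ratios(q):
    """Bitwise marginal ratios q_i(1)/q_i(0) of a distribution vector q."""
    out = np.empty(k)
    for i in range(k):
        mask = np.array([(m >> i) & 1 for m in range(N)], bool)
        out[i] = q[mask].sum() / q[~mask].sum()
    return out

def product_dist(r):
    """The product distribution whose bitwise ratios are r (length-k vector)."""
    q = np.array([np.prod([r[i] if (m >> i) & 1 else 1.0 for i in range(k)])
                  for m in range(N)])
    return q / q.sum()

u = np.ones(N) / N                       # initial pseudoprior: uniform
for it in range(20):
    post1 = ratios(py * px * u)          # decoder-1 pseudoposteriors, cf. (20)
    t = product_dist(post1 / ratios(px * u))   # extrinsic T: divide out intrinsic x pseudoprior
    post2 = ratios(pz * px * t)          # decoder-2 pseudoposteriors, cf. (21)
    u = product_dist(post2 / ratios(px * t))   # extrinsic U for the next iteration
    if np.allclose(post1, post2):        # consensus = fixed point (Property 1)
        print(f"consensus after {it + 1} iterations")
        break
else:
    print("no exact consensus within 20 iterations")

print("decisions:", (post1 > 1).astype(int), " true bits:", theta_true)
```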

2.3. Existence of fixed points

A necessary (but not sufficient) condition for the algorithm to converge is that a fixed point exist, reflected by a state of consensus according to Property 1. A convenient tool in this direction is the Brouwer fixed point theorem [38], which asserts that any continuous map from a closed, bounded, and convex set into itself admits a fixed point; its application in the present context gives the following result [18, 29].

Theorem 1. The turbo decoding algorithm from (20) and (21) always admits a fixed point.

To verify, consider the pseudopriors $U_i^{(m)}(\theta_i)$ evaluated for $\theta_i = 1$, which, at any iteration $m$, are (pseudo-) probabilities lying between 0 and 1:
\[
0 \le U_i^{(m)}(1) \le 1, \qquad i = 1, 2, \ldots, k. \tag{22}
\]
This clearly gives a closed, bounded, and convex set. Since the updated pseudopriors $U_i^{(m+1)}$ also lie in this set, and since the map from $U^{(m)}(\theta)$ to $U^{(m+1)}(\theta)$ is continuous [18, 29], the conditions of the Brouwer theorem are satisfied, to show existence of a fixed point.

3. PROJECTIONS AND PRODUCT DISTRIBUTIONS

A key element of the development thus far concerns the calculation of bitwise marginal ratios which, according to [20], provide the troublesome element which accounts for the difference between a provably convergent algorithm [20] which is not practically implementable, and the implementable (but difficult to grasp) turbo decoding algorithm. We develop here an alternate viewpoint of the calculation of bitwise marginals in terms of a certain projection operator, adapted from the seminal work of Richardson [29].

Let $q(\theta)$ be a distribution, for example, a probability mass function, or a likelihood function, which assigns a nonnegative number to each of the $2^k$ evaluations of $\theta = (\theta_1, \ldots, \theta_k)$. We let $\mathbf{q}$ be the vector built from these $2^k$ evaluations:
\[
\mathbf{q} = \begin{bmatrix}
q\bigl(\theta = (0, \ldots, 0, 0)\bigr)\\
q\bigl(\theta = (0, \ldots, 0, 1)\bigr)\\
\vdots\\
q\bigl(\theta = (1, \ldots, 1, 1)\bigr)
\end{bmatrix} \quad (2^k \text{ evaluations}). \tag{23}
\]

We assume that $\mathbf{q}$ is scaled such that its entries sum to one. The $k$ marginal distributions determined from $q(\theta)$, each having two evaluations at $\theta_i = 0$ and $\theta_i = 1$ ($1 \le i \le k$), are given by
\[
q_i(\theta_i = 0) = \sum_{\theta:\theta_i=0} q(\theta), \qquad
q_i(\theta_i = 1) = \sum_{\theta:\theta_i=1} q(\theta), \qquad i = 1, 2, \ldots, k. \tag{24}
\]

Definition 1. The distribution $q(\theta)$ is a product distribution if it coincides with the product of its marginals:
\[
q(\theta) = q_1(\theta_1)\, q_2(\theta_2) \cdots q_k(\theta_k). \tag{25}
\]
The set of all product distributions is denoted by $\mathcal{P}$.

It is straightforward to check that $q(\theta) \in \mathcal{P}$ if and only if its vector representation is Kronecker decomposable as
\[
\mathbf{q} = \mathbf{q}_1 \otimes \mathbf{q}_2 \otimes \cdots \otimes \mathbf{q}_k \tag{26}
\]
with
\[
\mathbf{q}_i = \begin{bmatrix} q_i(\theta_i = 0) \\ q_i(\theta_i = 1) \end{bmatrix}, \qquad i = 1, 2, \ldots, k. \tag{27}
\]


We note also that $\mathcal{P}$ is closed under multiplication: if $q(\theta)$ and $r(\theta)$ belong to $\mathcal{P}$, so does their product:
\[
s(\theta) = \alpha\, q(\theta)\, r(\theta) \in \mathcal{P}, \tag{28}
\]
where the scalar $\alpha$ is chosen so that the evaluations of $s(\theta)$ sum to one. This operation can be expressed in vector notation using the Hadamard (or term-by-term) product:
\[
\mathbf{s} = \alpha\, \mathbf{q} \odot \mathbf{r}. \tag{29}
\]
To simplify notations, the scalar $\alpha$ will not be explicitly indicated, with the tacit understanding that the elements of the vector must be scaled to sum to one; we will henceforth write $\mathbf{s} = \mathbf{q} \odot \mathbf{r}$, omitting explicit mention of the scale factor $\alpha$.

Suppose now $r(\theta)$ is not a product distribution. If $r_1(\theta_1), \ldots, r_k(\theta_k)$ denote its marginal distributions, then we can set
\[
q(\theta) = r_1(\theta_1)\, r_2(\theta_2) \cdots r_k(\theta_k), \tag{30}
\]
to create a product distribution $q(\theta) \in \mathcal{P}$ which, by construction, generates the same marginals as $r(\theta)$:
\[
q_i(\theta_i) = r_i(\theta_i), \qquad i = 1, 2, \ldots, k. \tag{31}
\]
This operation will be denoted by
\[
\mathbf{q} = \pi(\mathbf{r}). \tag{32}
\]
We can observe that $\mathbf{q}$ is a product distribution ($\mathbf{q} \in \mathcal{P}$) if and only if $\pi(\mathbf{q}) = \mathbf{q}$, and since $\pi(\mathbf{r}) \in \mathcal{P}$ for any distribution $\mathbf{r}$, we must have $\pi(\pi(\mathbf{r})) = \pi(\mathbf{r})$, so that $\pi(\cdot)$ is a projection operator.

Definition 2. The distribution $\mathbf{q}$ is the projection of $\mathbf{r}$ into $\mathcal{P}$ if (i) $\mathbf{q} \in \mathcal{P}$ and (ii) $q_i(\theta_i) = r_i(\theta_i)$ for $i = 1, 2, \ldots, k$.
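
The projector is straightforward to realize numerically. The sketch below computes the bitwise marginals (24) of an arbitrary distribution, forms their product as in (30), and checks both requirements of Definition 2 together with the idempotence $\pi(\pi(\mathbf{r})) = \pi(\mathbf{r})$; the random test distribution is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 3
N = 2 ** k

def marginals(r):
    """The k bitwise marginal distributions (24) of a normalized vector r."""
    out = []
    for i in range(k):
        m1 = sum(r[m] for m in range(N) if (m >> i) & 1)
        out.append(np.array([1 - m1, m1]))
    return out

def project(r):
    """pi(r): the product of the marginals of r, cf. (30)-(32)."""
    mg = marginals(r)
    return np.array([np.prod([mg[i][(m >> i) & 1] for i in range(k)])
                     for m in range(N)])

r = rng.random(N); r /= r.sum()        # an arbitrary test distribution (illustrative)
q = project(r)
# (i) q lies in P by construction; (ii) q generates the same marginals as r:
assert all(np.allclose(a, b) for a, b in zip(marginals(q), marginals(r)))
assert np.allclose(project(q), q)      # idempotence: pi(pi(r)) = pi(r)
print("pi(r) reproduces the marginals of r and is idempotent")
```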

The following section details some simple information-theoretic properties which reinforce the interpretation as a projection.

3.1. Information-theoretic properties of the projector

The results summarized in this section may be understood as concrete transcriptions of ultimately deeper results from the field of information geometry [33, 34]. To begin, we recall that the entropy of a distribution $r(\theta)$ is defined as [39]
\[
H(r) = -\sum_{\theta} r(\theta) \log_2 r(\theta), \tag{33}
\]

involving the sum over all $2^k$ configurations of the vector $\theta = (\theta_1, \ldots, \theta_k)$. A basic result of information theory asserts that the entropy of any joint distribution is upper bounded by the sum of the entropies of its marginal distributions [39], that is,
\[
H(r) \le \sum_{i=1}^{k} H(r_i) = \sum_{i=1}^{k} \left( -\sum_{\theta_i=0}^{1} r_i(\theta_i) \log_2 r_i(\theta_i) \right), \tag{34}
\]

with equality if and only if $r(\theta)$ factors into the product of its marginals [$r(\theta) \in \mathcal{P}$]. Therefore, if $r \notin \mathcal{P}$, then by setting $\mathbf{q} = \pi(\mathbf{r})$, we have
\[
H(r) \le \sum_{i=1}^{k} H(r_i) = \sum_{i=1}^{k} H(q_i) = H(q), \tag{35}
\]
because $q_i(\theta_i) = r_i(\theta_i)$ and $q(\theta) \in \mathcal{P}$. This shows that the projection $\mathbf{q} = \pi(\mathbf{r})$ maximizes the entropy over all distributions that generate the same marginals as $r(\theta)$.

We recall next that the Kullback-Leibler distance (or relative entropy) between two distributions $r(\theta)$ and $s(\theta)$ is given by [20, 39]
\[
D(r \| s) = \sum_{\theta} r(\theta) \log_2 \frac{r(\theta)}{s(\theta)} \ge 0, \tag{36}
\]
with $D(r \| s) = 0$ if and only if $r(\theta) = s(\theta)$ for all $\theta$. If $s(\theta) \in \mathcal{P}$ and $\mathbf{q} = \pi(\mathbf{r})$, then we may verify (see the appendix) that
\[
D(r \| s) = D(r \| q) + D(q \| s) \ge D(r \| q), \tag{37}
\]
since $D(q \| s) \ge 0$, with equality if and only if $s(\theta) = q(\theta)$. This shows that the projection $q(\theta)$ is the closest product distribution to $r(\theta)$ using the Kullback-Leibler distance.

3.2. Application to turbo decoding

The added complication of accounting for the calculation of bitwise marginals noted in [20] can be offset by appealing to the previous section, which interprets bitwise marginals as resulting from a projection. Accordingly, we show in this section how the turbo decoding algorithm of (20) and (21) falls out as an alternating projection algorithm [29].

Let $\mathbf{p}_x$, $\mathbf{p}_y$, and $\mathbf{p}_z$ denote the vectors which collect the $2^k$ evaluations of the likelihood functions $p(\mathbf{x} \mid \theta)$, $p(\mathbf{y} \mid \theta)$, and $p(\mathbf{z} \mid \theta)$, respectively, that is,
\[
\mathbf{p}_x = \begin{bmatrix}
p\bigl(\mathbf{x} \mid \theta = (0, \ldots, 0, 0)\bigr)\\
p\bigl(\mathbf{x} \mid \theta = (0, \ldots, 0, 1)\bigr)\\
\vdots\\
p\bigl(\mathbf{x} \mid \theta = (1, \ldots, 1, 1)\bigr)
\end{bmatrix} \quad (2^k \text{ evaluations}), \tag{38}
\]
and similarly for $\mathbf{p}_y$ and $\mathbf{p}_z$. Likewise, let the vectors $\mathbf{t}^{(m)}$ and $\mathbf{u}^{(m)}$ collect the $2^k$ evaluations of $T^{(m)}(\theta)$ and $U^{(m)}(\theta)$, respectively, at a given iteration $m$.

We can observe that the right-hand side of (20) calculates the bitwise marginal ratios of the distribution $p(\mathbf{y} \mid \theta)\, p(\mathbf{x} \mid \theta)\, U^{(m)}(\theta)$; this distribution admits a vector representation of the form $\mathbf{p}_y \odot \mathbf{p}_x \odot \mathbf{u}^{(m)}$. The left-hand side of (20) displays the bitwise marginal ratios of the product distribution $\mathbf{p}_x \odot \mathbf{u}^{(m)} \odot \mathbf{t}^{(m)}$ which generates, by construction, the same bitwise marginals as $\mathbf{p}_y \odot \mathbf{p}_x \odot \mathbf{u}^{(m)}$. This confirms that $\mathbf{p}_x \odot \mathbf{u}^{(m)} \odot \mathbf{t}^{(m)}$ is the projection of $\mathbf{p}_y \odot \mathbf{p}_x \odot \mathbf{u}^{(m)}$ in $\mathcal{P}$. By applying the same reasoning to (21), we establish the following [29].


Proposition 1. The turbo decoding algorithm of (20) and (21) admits an exact description as the alternating projection algorithm
\[
\mathbf{p}_x \odot \mathbf{u}^{(m)} \odot \mathbf{t}^{(m)} = \pi\bigl( \mathbf{p}_y \odot \mathbf{p}_x \odot \mathbf{u}^{(m)} \bigr), \tag{39}
\]
\[
\mathbf{p}_x \odot \mathbf{u}^{(m+1)} \odot \mathbf{t}^{(m)} = \pi\bigl( \mathbf{p}_z \odot \mathbf{p}_x \odot \mathbf{t}^{(m)} \bigr). \tag{40}
\]

From this, a connection with maximum-likelihood decoding follows readily [18].

Theorem 2. If $\mathbf{p}_x \odot \mathbf{p}_y$ and/or $\mathbf{p}_x \odot \mathbf{p}_z$ is a product distribution, then

(1) the turbo decoding algorithm ((39) and (40)) converges in a single iteration;

(2) the pseudoposteriors so obtained agree with the maximum-likelihood decision rule for the code.

For the proof, assume that $\mathbf{p}_x \odot \mathbf{p}_y \in \mathcal{P}$. We already have $\mathbf{u}^{(m)} \in \mathcal{P}$, and since $\mathcal{P}$ is closed under multiplication, we see that $\mathbf{p}_y \odot \mathbf{p}_x \odot \mathbf{u}^{(m)} \in \mathcal{P}$. Since the projector behaves as the identity operation for distributions in $\mathcal{P}$, the first decoder step of the turbo decoding algorithm from (39) becomes
\[
\mathbf{p}_x \odot \mathbf{u}^{(m)} \odot \mathbf{t}^{(m)} = \pi\bigl( \mathbf{p}_y \odot \mathbf{p}_x \odot \mathbf{u}^{(m)} \bigr) = \mathbf{p}_y \odot \mathbf{p}_x \odot \mathbf{u}^{(m)}. \tag{41}
\]
From this, we identify $\mathbf{p}_x \odot \mathbf{t}^{(m)} = \mathbf{p}_x \odot \mathbf{p}_y$ for all iterations $m$ to show that a fixed point is attained. The second decoder from (40) then gives
\[
\pi\bigl( \mathbf{p}_z \odot \mathbf{p}_x \odot \mathbf{t}^{(m)} \bigr) = \pi\bigl( \mathbf{p}_z \odot \mathbf{p}_x \odot \mathbf{p}_y \bigr), \tag{42}
\]
which furnishes the bitwise marginal ratios of $p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \theta)\, p(\mathbf{z} \mid \theta)$. This agrees with the maximum-likelihood decision rule seen previously in (9). The proof when instead $\mathbf{p}_x \odot \mathbf{p}_z \in \mathcal{P}$ follows by exchanging the role of the two decoders.

Note that since $\mathbf{p}_x$ is already a product distribution (i.e., $\mathbf{p}_x \in \mathcal{P}$), it is sufficient (but not necessary) that $\mathbf{p}_y \in \mathcal{P}$ to have $\mathbf{p}_x \odot \mathbf{p}_y \in \mathcal{P}$. One may anticipate from this result that if $\mathbf{p}_x \odot \mathbf{p}_y$ and/or $\mathbf{p}_x \odot \mathbf{p}_z$ is "close" to a product distribution, then the algorithm should converge "rapidly"; formal steps confirming this notion are developed in [18]. Such proximity to a product distribution can be verified, in particular, in extreme signal-to-noise ratios [18].
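
Theorem 2 can be illustrated numerically by choosing the first constituent encoder as a repetition code, so that $p(\mathbf{y} \mid \theta)$ factors bitwise and $\mathbf{p}_x \odot \mathbf{p}_y \in \mathcal{P}$; this is a deliberately weak code, chosen only so that the hypothesis of the theorem holds exactly (the whole setup is an illustrative assumption). A single pass through (39) and (40) then reproduces the maximum-likelihood ratios of (9).

```python
import numpy as np

rng = np.random.default_rng(6)
k, sigma = 3, 1.0
N = 2 ** k
bits = lambda m: np.array([(m >> i) & 1 for i in range(k)])
to_pm1 = lambda b: 2 * np.asarray(b) - 1
g = lambda r, c: np.exp(-np.sum((r - to_pm1(c)) ** 2) / (2 * sigma ** 2))

def ratios(q):
    out = np.empty(k)
    for i in range(k):
        mask = np.array([(m >> i) & 1 for m in range(N)], bool)
        out[i] = q[mask].sum() / q[~mask].sum()
    return out

def project(q):
    """pi(.): the product distribution generating the marginals of q."""
    r = ratios(q)
    p = np.array([np.prod([r[i] if (m >> i) & 1 else 1.0 for i in range(k)])
                  for m in range(N)])
    return p / p.sum()

theta = rng.integers(0, 2, k)
x = to_pm1(theta) + sigma * rng.standard_normal(k)
y = to_pm1(theta) + sigma * rng.standard_normal(k)  # encoder 1 repeats theta: p_y factors
z = to_pm1([theta.sum() % 2]) + sigma * rng.standard_normal(1)  # encoder 2: parity bit

px = np.array([g(x, bits(m)) for m in range(N)])
py = np.array([g(y, bits(m)) for m in range(N)])
pz = np.array([g(z, [bits(m).sum() % 2]) for m in range(N)])

u = np.ones(N) / N                      # uniform initial pseudoprior
t = project(px * py * u) / (px * u)     # first half-iteration (39): pi acts as the identity
post = ratios(pz * px * t)              # second half-iteration (40): pseudoposteriors
print(np.allclose(post, ratios(px * py * pz)))   # True: the ML ratios of (9), in one pass
```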

Example 1 (high signal-to-noise ratios). Let $\theta^\ast$ denote the vector of true information bits. The joint likelihood evaluation for $\mathbf{x}$ and $\mathbf{y}$ becomes
\[
p(\mathbf{x}, \mathbf{y} \mid \theta) \propto \exp\!\left( -\frac{\bigl\| c_x(\theta^\ast) - c_x(\theta) + \mathbf{b}_x \bigr\|^2}{2\sigma^2} - \frac{\bigl\| c_y(\theta^\ast) - c_y(\theta) + \mathbf{b}_y \bigr\|^2}{2\sigma^2} \right), \tag{43}
\]
where $c_x(\theta)$ and $c_y(\theta)$ denote the antipodal ($\pm 1$) representation of the coded information bits $\theta$, and where $\mathbf{b}_x$ and $\mathbf{b}_y$ are the vectors of channel noise samples. As the noise variance $\sigma^2$ tends to zero, we have $\mathbf{b}_x, \mathbf{b}_y \to \mathbf{0}$, and
\[
p(\mathbf{x}, \mathbf{y} \mid \theta) \xrightarrow{\;\sigma^2 \to 0\;} \delta\bigl( \theta - \theta^\ast \bigr) = \begin{cases} 1, & \theta = \theta^\ast,\\ 0, & \theta \neq \theta^\ast. \end{cases} \tag{44}
\]

We note that the delta function can always be written as the product of its marginals (which are themselves delta functions of the individual bits of $\theta^\ast$). Experimental evidence confirms that, in high signal-to-noise ratios, the algorithm converges rapidly to decoded symbols of high reliability.

Example 2 (poor signal-to-noise ratios). As the noise variance $\sigma^2$ increases, the likelihood evaluations are dominated by the presence of the noise terms; ratios of candidate likelihood evaluations then tend to 1, which is to say that $p(\mathbf{x}, \mathbf{y} \mid \theta)$ approaches a uniform distribution:
\[
p(\mathbf{x}, \mathbf{y} \mid \theta) \xrightarrow{\;\sigma^2 \to \infty\;} \frac{1}{2^k} \quad \text{for all } \theta. \tag{45}
\]

We note that a uniform distribution can always be written as the product of its marginals (which are themselves uniform distributions). Experimental evidence again confirms (e.g., [15, 18]) that, in poor signal-to-noise ratios, the algorithm converges rapidly to a fixed point, but offers low confidence in the decoded symbols.

Although the above examples assume a Gaussian channel for simplicity, the basic reasoning can be extended to other memoryless channel models. More interesting, of course, is the convergence behavior for intermediate signal-to-noise ratios, which still presents a challenging problem. A natural question at this stage, however, is whether there exist constituent encoders which would give $\mathbf{p}_x \odot \mathbf{p}_y$ or $\mathbf{p}_x \odot \mathbf{p}_z$ as a product distribution irrespective of the signal-to-noise ratio. The answer is in the affirmative by considering, for example, a repetition code for the second constituent encoder. The arguments showing that $\mathbf{p}_x \in \mathcal{P}$ can then be copied to show that $\mathbf{p}_z \in \mathcal{P}$ as well (and therefore that $\mathbf{p}_x \odot \mathbf{p}_z \in \mathcal{P}$). But the distance properties of the resulting concatenated code are not very impressive, being basically the same as for the first constituent encoder. This concurs with an observation from [24], namely that "easily decodable" codes do not tend to be good codes.

4. SERIAL CONCATENATED CODES

We turn our attention now to serial concatenated codes, which have been studied extensively by Benedetto and his coworkers [2, 3, 35], and which encompass an ultimately richer structure. Our aim in this section is to show that the alternating projection interpretation again carries through, affording thus a unified study of serial and parallel concatenated codes.


[Figure 4: Flow graph for a serial concatenated code, with optional interleaver. The information bits $\theta = (\theta_1, \ldots, \theta_k)$ enter the outer encoder, whose output $\zeta = (\zeta_1, \ldots, \zeta_k, \zeta_{k+1}, \ldots, \zeta_n)$ passes (optionally through an interleaver) to the inner encoder, producing $\xi = (\xi_1, \ldots, \xi_l)$.]

The basic flow graph for serial concatenated codes is depicted in Figure 4, in which the information bits $\theta = (\theta_1, \ldots, \theta_k)$ are first processed by an outer encoder, which here is systematic, so that the first $k$ bits of its output $\zeta = (\zeta_1, \ldots, \zeta_n)$ are the information bits:
\[
\zeta_i = \theta_i, \qquad i = 1, 2, \ldots, k. \tag{46}
\]

The remaining bits $\zeta_{k+1}, \ldots, \zeta_n$ furnish the $n-k$ parity-check bits. The cascaded inner encoder may admit different interpretations.

(i) The inner encoder may be a second (block or convolutional) encoder, perhaps endowed with an interleaver to offer protection against burst errors, consistent with conventional serial concatenated codes [2, 3]. Each input configuration $\zeta$ is mapped to an output configuration $\xi$. With reference to Figure 4, the rate of the inner encoder is $n/l$.

(ii) The inner encoder may be a differential encoder, in order to endow the receiver with robustness against phase ambiguity in the received signal. Since a differential encoder is a particular case of a rate-1 convolutional encoder (with $l = n$ or perhaps $l = n+1$), this case is accommodated by the previous case.

(iii) The inner encoder may represent the convolutional effect induced by a channel whose memory is longer than the symbol period. In this case, taking into account that the symbols $\{\zeta_i\}$ will have been converted to antipodal signaling ($\pm 1$), the baseband channel output appears as
\[
v_i = \underbrace{\sum_{m} h_m \bigl( 2\zeta_{i-m} - 1 \bigr)}_{\xi_i} + b_i, \tag{47}
\]
where $\{h_m\}$ denotes the equivalent impulse response of the baseband model, $b_i$ is the additive channel noise, and where $v_i$ may be scalar-valued (for a single-input single-output channel) or vector-valued (for a single-input multiple-output channel).

Certainly other interpretations may be developed as well; the above list may nonetheless be considered representative of some common configurations.

4.1. Optimum decoding

With $\mathbf{v}$ denoting the noisy received signal $\xi$ (after conversion to antipodal form, possibly corrupted by intersymbol interference), the optimum decoding metric is again based on the a posteriori marginal probability ratios
\[
\frac{\Pr(\theta_i = 1 \mid \mathbf{v})}{\Pr(\theta_i = 0 \mid \mathbf{v})}
= \frac{\sum_{\theta:\theta_i=1} \Pr(\theta \mid \mathbf{v})}{\sum_{\theta:\theta_i=0} \Pr(\theta \mid \mathbf{v})}
= \frac{\sum_{\theta:\theta_i=1} p(\mathbf{v} \mid \theta) \Pr(\theta)}{\sum_{\theta:\theta_i=0} p(\mathbf{v} \mid \theta) \Pr(\theta)}, \qquad i = 1, 2, \ldots, k. \tag{48}
\]

If all input configurations are equally probable, we have $\Pr(\theta) = 1/2^k$ and we recover the maximum-likelihood decoding rule.

If no interleaver is used between the two coders, then the mapping from $\theta$ to $\mathbf{v}$ is a noisy convolution, allowing a trellis structure to perform optimum decoding at a reasonable computational cost. In the presence of an interleaver, on the other hand, the convolutional structure between $\theta$ and $\mathbf{v}$ is compromised, such that a direct evaluation of (48) leads to a computational complexity that grows exponentially with the block length. Iterative decoding, to be reviewed next, represents an attempt to reduce the decoding complexity to a reasonable value.

4.2. Iterative decoding for serial concatenated codes

Iterative serial decoding [2] amounts to implementing locally optimum decoders which infer $\zeta$ from $\mathbf{v}$, and then $\theta$ from $\zeta$, and subsequently exchanging information until consensus is reached. Our development emphasizes the external descriptions of the local decoding operations in order to better identify the form of consensus that is reached, as well as to justify the seemingly heuristic coupling between the coders by way of connections with maximum-likelihood decoding.

Consider first the inner decoding rule, which seeks to determine the inner encoder's input $\zeta = (\zeta_1, \ldots, \zeta_n)$ from the noisy received signal $\mathbf{v}$:
\[
\frac{\Pr(\zeta_i = 1 \mid \mathbf{v})}{\Pr(\zeta_i = 0 \mid \mathbf{v})}
= \frac{\sum_{\zeta:\zeta_i=1} \Pr(\zeta \mid \mathbf{v})}{\sum_{\zeta:\zeta_i=0} \Pr(\zeta \mid \mathbf{v})}
= \frac{\sum_{\zeta:\zeta_i=1} p(\mathbf{v} \mid \zeta) \Pr(\zeta)}{\sum_{\zeta:\zeta_i=0} p(\mathbf{v} \mid \zeta) \Pr(\zeta)}, \qquad i = 1, 2, \ldots, n. \tag{49}
\]

The inner decoder assumes that the a priori probability mass function $\Pr(\zeta)$ factors into the product of its marginals as
\[
\Pr(\zeta) = \Pr(\zeta_1) \Pr(\zeta_2) \cdots \Pr(\zeta_n). \tag{50}
\]

This assumption, strictly speaking, is incorrect, because the bits $\{\zeta_i\}$ are produced by the outer encoder, which imposes dependencies between the bits for error control purposes. The forward-backward algorithm from [36], however, cannot exploit these dependencies without incurring a significant increase in computational complexity. By turning a "blind eye" to this fact, and therefore admitting the factorization of $\Pr(\zeta)$ into the product of its marginals, each term from the numerator (resp., denominator) of (49) will contain a factor $\Pr(\zeta_i = 1)$ (resp., $\Pr(\zeta_i = 0)$), which gives

\[
\frac{\Pr(\zeta_i = 1 \mid \mathbf{v})}{\Pr(\zeta_i = 0 \mid \mathbf{v})}
= \frac{\Pr(\zeta_i = 1)}{\Pr(\zeta_i = 0)}
\underbrace{\frac{\sum_{\zeta:\zeta_i=1} p(\mathbf{v} \mid \zeta) \prod_{j \neq i} \Pr(\zeta_j)}{\sum_{\zeta:\zeta_i=0} p(\mathbf{v} \mid \zeta) \prod_{j \neq i} \Pr(\zeta_j)}}_{\text{extrinsic information}}, \qquad i = 1, 2, \ldots, n. \tag{51}
\]

We now let $T(\zeta) = T_1(\zeta_1) \cdots T_n(\zeta_n)$ denote a factorable probability mass function whose marginal ratios match the extrinsic information values above:
\[
\frac{T_i(\zeta_i = 1)}{T_i(\zeta_i = 0)}
= \frac{\sum_{\zeta:\zeta_i=1} p(\mathbf{v} \mid \zeta) \prod_{j \neq i} \Pr(\zeta_j)}{\sum_{\zeta:\zeta_i=0} p(\mathbf{v} \mid \zeta) \prod_{j \neq i} \Pr(\zeta_j)}, \qquad i = 1, 2, \ldots, n. \tag{52}
\]

The outer decoder would normally aim to determine the information bits $\theta$ based on an estimate (denoted by $\hat{\zeta}$) of the outer encoder's output, according to the a posteriori probability ratios
\[
\frac{\Pr(\theta_i = 1 \mid \hat{\zeta})}{\Pr(\theta_i = 0 \mid \hat{\zeta})}
= \frac{\sum_{\theta:\theta_i=1} \Pr(\theta \mid \hat{\zeta})}{\sum_{\theta:\theta_i=0} \Pr(\theta \mid \hat{\zeta})}
= \frac{\sum_{\theta:\theta_i=1} p(\hat{\zeta} \mid \theta) \Pr(\theta)}{\sum_{\theta:\theta_i=0} p(\hat{\zeta} \mid \theta) \Pr(\theta)}. \tag{53}
\]

The estimate $\hat{\zeta}$, however, is not immediately available. If it were, then each likelihood function evaluation would appear as
\[
p(\hat{\zeta} \mid \theta) \propto \exp\!\left( -\sum_{j=1}^{n} \frac{\bigl[ \hat{\zeta}_j - (2\zeta_j(\theta) - 1) \bigr]^2}{2\sigma^2} \right), \tag{54}
\]

assuming a Gaussian channel, in which $\zeta_j(\theta)$ is either 0 or 1, depending on $\theta = (\theta_1, \ldots, \theta_k)$. To each hypothetical bit $\hat{\zeta}_j$, therefore, we associate two evaluations as $\exp[-(\hat{\zeta}_j \pm 1)^2/(2\sigma^2)]$ (corresponding to $\zeta_j(\theta) = 0$ or 1), which are usurped by the two evaluations of $T_j(\zeta_j)$ from (52):
\[
\frac{\exp\bigl[ -(\hat{\zeta}_j - 1)^2 / (2\sigma^2) \bigr]}{\exp\bigl[ -(\hat{\zeta}_j + 1)^2 / (2\sigma^2) \bigr]}
\;\leftarrow\;
\frac{T_j(1)}{T_j(0)}. \tag{55}
\]

The forward-backward algorithm [36] may then run, following this systematic substitution.

To develop an external description of the decoding algorithm which results, we note that this substitution amounts to usurping the likelihood function $p(\hat{\zeta} \mid \theta)$ by
\[
p(\hat{\zeta} \mid \theta) \;\leftarrow\; \prod_{j=1}^{n} T_j\bigl( \zeta_j(\theta) \bigr), \tag{56}
\]
in which the right-hand side notationally emphasizes that only those bit combinations $\zeta_1, \ldots, \zeta_n$ that lie in the outer codebook make sense.

To arrive at a more convenient form, let $\chi(\zeta)$ denote the indicator function for the outer codebook:
\[
\chi(\zeta) = \begin{cases} 1 & \text{if } \zeta \text{ lies in the outer codebook,}\\ 0 & \text{otherwise.} \end{cases} \tag{57}
\]

The $2^n$ configurations of $(\zeta_1, \ldots, \zeta_n)$ generate $2^n$ evaluations of $\prod_{j=1}^{n} T_j(\zeta_j)$, but only $2^k$ of these evaluations survive in the product $\chi(\zeta) \prod_j T_j(\zeta_j)$, namely, the $2^k$ evaluations from the right-hand side of (56) which are generated as $\theta$ varies over its $2^k$ configurations. We may then establish a one-to-one correspondence between the $2^k$ "surviving" evaluations in $\chi(\zeta) \prod_j T_j(\zeta_j)$ and the $2^k$ evaluations of the likelihood function $p(\hat{\zeta} \mid \theta)$ which are usurped in (56). Assuming that $\Pr(\theta)$ is a uniform distribution, the usurped pseudoposteriors from (53) become
\[
\frac{\sum_{\theta:\theta_i=1} p(\hat{\zeta} \mid \theta)}{\sum_{\theta:\theta_i=0} p(\hat{\zeta} \mid \theta)}
\;\leftarrow\;
\frac{\sum_{\zeta:\zeta_i=1} \chi(\zeta) \prod_{j=1}^{n} T_j(\zeta_j)}{\sum_{\zeta:\zeta_i=0} \chi(\zeta) \prod_{j=1}^{n} T_j(\zeta_j)}
= \frac{T_i(1)}{T_i(0)}
\underbrace{\frac{\sum_{\zeta:\zeta_i=1} \chi(\zeta) \prod_{j \neq i} T_j(\zeta_j)}{\sum_{\zeta:\zeta_i=0} \chi(\zeta) \prod_{j \neq i} T_j(\zeta_j)}}_{\text{extrinsic information}}, \tag{58}
\]

in which we note the following:

(i) since the outer code is systematic, the first $k$ bits $\zeta_1, \ldots, \zeta_k$ coincide with the information bits $\theta_1, \ldots, \theta_k$, allowing therefore a direct substitution for the variables of summation. In addition, the formula above may be evaluated as written for the parity-check bits $\zeta_{k+1}, \ldots, \zeta_n$;

(ii) each term in the numerator (resp., denominator) contains a factor $T_i(\zeta_i = 1)$ (resp., $T_i(\zeta_i = 0)$), so that the ratio $T_i(1)/T_i(0)$ naturally factors out. Let
\[
U(\zeta) = U_1(\zeta_1) \cdots U_n(\zeta_n) \tag{59}
\]
be a factorable probability function whose marginal ratios match the extrinsic information values:
\[
\frac{U_i(1)}{U_i(0)}
= \frac{\sum_{\zeta:\zeta_i=1} \chi(\zeta) \prod_{j \neq i} T_j(\zeta_j)}{\sum_{\zeta:\zeta_i=0} \chi(\zeta) \prod_{j \neq i} T_j(\zeta_j)}, \qquad i = 1, 2, \ldots, n. \tag{60}
\]
These values may then usurp the a priori probability function $\Pr(\zeta)$ of the inner decoder: $\Pr(\zeta) \leftarrow U(\zeta)$.

If we let a superscript $(m)$ denote an iteration number, then the coupling of the two decoders admits an external description of the form
\[
\frac{T_i^{(m)}(1)}{T_i^{(m)}(0)}\,
\frac{U_i^{(m)}(1)}{U_i^{(m)}(0)}
= \frac{\sum_{\zeta:\zeta_i=1} p(\mathbf{v} \mid \zeta) \prod_{j=1}^{n} U_j^{(m)}(\zeta_j)}{\sum_{\zeta:\zeta_i=0} p(\mathbf{v} \mid \zeta) \prod_{j=1}^{n} U_j^{(m)}(\zeta_j)}, \qquad i = 1, 2, \ldots, n, \tag{61}
\]
\[
\frac{T_i^{(m)}(1)}{T_i^{(m)}(0)}\,
\frac{U_i^{(m+1)}(1)}{U_i^{(m+1)}(0)}
= \frac{\sum_{\zeta:\zeta_i=1} \chi(\zeta) \prod_{j=1}^{n} T_j^{(m)}(\zeta_j)}{\sum_{\zeta:\zeta_i=0} \chi(\zeta) \prod_{j=1}^{n} T_j^{(m)}(\zeta_j)}, \qquad i = 1, 2, \ldots, n, \tag{62}
\]


[Figure 5: Flow graph for iterative decoding of serial concatenated codes. The inner decoder maps $\mathbf{v}$ and the pseudopriors $\{U_j^{(m)}(\zeta_j)\}$ to the extrinsic values $\{T_j^{(m)}(\zeta_j)\}$; the outer decoder returns $\{U_j^{(m+1)}(\zeta_j)\}$ through a delay element, and furnishes the pseudoposteriors.]

as depicted in Figure 5. A fixed point corresponds to $U^{(m+1)}(\zeta) = U^{(m)}(\zeta)$ which, in analogy with the parallel concatenated code case, can be characterized as the following "consensus" property.

Property 2. A fixed point in the serial decoding algorithm occurs if and only if the two decoders yield the same pseudoposteriors (left-hand sides of (61) and (62)) for $i = 1, 2, \ldots, n$.

Note that the consensus here covers the information bits plus the parity-check bits furnished by the outer decoder. As with the parallel concatenated code case, the existence of fixed points follows by applying the Brouwer fixed point theorem (cf. Section 2.3).

4.3. Projection interpretation

The iterative decoding algorithm for serial concatenated codes can also be rephrased as an alternating projection algorithm, analogously to the parallel concatenated code case of Section 3, as we develop presently.

We continue to denote by $\mathcal{P}$ the set of distributions which factor into the product of their marginals:
\[
q(\zeta) = q_1(\zeta_1)\, q_2(\zeta_2) \cdots q_n(\zeta_n). \tag{63}
\]

The only modification here is that we now have $n$ marginal distributions to consider, to account for the $k$ information bits plus the $n-k$ parity-check bits which intervene in the consensus of Property 2. If $r(\zeta)$ is an arbitrary distribution, then $\mathbf{q} = \pi(\mathbf{r})$ yields a distribution $q(\zeta) \in \mathcal{P}$ which generates the same $n$ marginal distributions as $r(\zeta)$.

We let $\mathbf{p}_v$ denote the vector containing the $2^n$ likelihood evaluations of $p(\mathbf{v} \mid \zeta)$:
\[
\mathbf{p}_v = \begin{bmatrix}
p\bigl(\mathbf{v} \mid \zeta = (0, \ldots, 0, 0)\bigr)\\
p\bigl(\mathbf{v} \mid \zeta = (0, \ldots, 0, 1)\bigr)\\
\vdots\\
p\bigl(\mathbf{v} \mid \zeta = (1, \ldots, 1, 1)\bigr)
\end{bmatrix} \quad (2^n \text{ evaluations}). \tag{64}
\]

Similarly, let the vectors $\mathbf{t}^{(m)}$, $\mathbf{u}^{(m)}$, and $\boldsymbol{\chi}$ collect their respective $2^n$ evaluations:
\[
\mathbf{t}^{(m)} = \begin{bmatrix}
T_1^{(m)}(0) \cdots T_n^{(m)}(0)\\
T_1^{(m)}(0) \cdots T_n^{(m)}(1)\\
\vdots\\
T_1^{(m)}(1) \cdots T_n^{(m)}(1)
\end{bmatrix}, \qquad
\mathbf{u}^{(m)} = \begin{bmatrix}
U_1^{(m)}(0) \cdots U_n^{(m)}(0)\\
U_1^{(m)}(0) \cdots U_n^{(m)}(1)\\
\vdots\\
U_1^{(m)}(1) \cdots U_n^{(m)}(1)
\end{bmatrix}, \qquad
\boldsymbol{\chi} = \begin{bmatrix}
\chi\bigl(\zeta = (0, \ldots, 0, 0)\bigr)\\
\chi\bigl(\zeta = (0, \ldots, 0, 1)\bigr)\\
\vdots\\
\chi\bigl(\zeta = (1, \ldots, 1, 1)\bigr)
\end{bmatrix}. \tag{65}
\]

With respect to the inner decoder, we see that the right-hand side of (61) calculates the marginal ratios of the distribution $p(\mathbf{v} \mid \zeta)\, U^{(m)}(\zeta)$, which distribution admits a vector representation as $\mathbf{p}_v \odot \mathbf{u}^{(m)}$. The left-hand side of (61) contains the marginal ratios of $\mathbf{t}^{(m)} \odot \mathbf{u}^{(m)} \in \mathcal{P}$, which agree with those of $\mathbf{p}_v \odot \mathbf{u}^{(m)}$, consistent with our projection operation. By applying the same reasoning to (62), we obtain a natural counterpart to Proposition 1.

Proposition 2. The iterative serial decoding algorithm of (61) and (62) coincides with the alternating projection algorithm
\[
\mathbf{t}^{(m)} \odot \mathbf{u}^{(m)} = \pi\bigl( \mathbf{p}_v \odot \mathbf{u}^{(m)} \bigr), \qquad
\mathbf{t}^{(m)} \odot \mathbf{u}^{(m+1)} = \pi\bigl( \boldsymbol{\chi} \odot \mathbf{t}^{(m)} \bigr). \tag{66}
\]

From this follows a natural analogue to Theorem 2 establishing a key link with maximum-likelihood decoding.

Theorem 3. If $p(\mathbf{v} \mid \zeta)$ factors into the product of its marginals, then

(1) the iterative algorithm (61) and (62) converges in a single iteration;

(2) the pseudoposteriors so obtained agree with the maximum-likelihood decision metric for the code.

The proof parallels that of Theorem 2, but displays its own particularities which merit its inclusion here. If $p(\mathbf{v} \mid \zeta)$ factors into the product of its marginals, then $\mathbf{p}_v \in \mathcal{P}$, giving $\mathbf{p}_v \odot \mathbf{u}^{(m)} \in \mathcal{P}$ as well. Since the projector behaves as the identity when applied to elements of $\mathcal{P}$, the first displayed equation of Proposition 2 becomes
\[
\mathbf{t}^{(m)} \odot \mathbf{u}^{(m)} = \pi\bigl( \mathbf{p}_v \odot \mathbf{u}^{(m)} \bigr) = \mathbf{p}_v \odot \mathbf{u}^{(m)}. \tag{67}
\]
From this we identify $\mathbf{t}^{(m)} = \mathbf{p}_v$ for all iterations $m$, giving a fixed point. Substituting $\mathbf{t}^{(m)} = \mathbf{p}_v$ into the projector of the second displayed equation of Proposition 2 reveals
\[
\mathbf{t}^{(m)} \odot \mathbf{u}^{(m+1)} = \pi\bigl( \boldsymbol{\chi} \odot \mathbf{t}^{(m)} \bigr) = \pi\bigl( \boldsymbol{\chi} \odot \mathbf{p}_v \bigr). \tag{68}
\]
This calculates the marginal functions of $\chi(\zeta)\, p(\mathbf{v} \mid \zeta)$, whose surviving evaluations are the restriction of the likelihood function $p(\mathbf{v} \mid \zeta)$ to the outer codebook:
\[
\chi(\zeta)\, p(\mathbf{v} \mid \zeta) = \begin{cases}
p\bigl( \mathbf{v} \mid \zeta(\theta) \bigr) = p(\mathbf{v} \mid \theta) & \text{if } \chi(\zeta) = 1,\\
0 & \text{otherwise.}
\end{cases} \tag{69}
\]

Since the outer code is systematic, we have $\zeta_i = \theta_i$ for $i = 1, \ldots, k$. Therefore, the first $k$ marginal ratios from $\chi(\zeta)\, p(\mathbf{v} \mid \zeta)$ coincide with those from $p(\mathbf{v} \mid \theta)$; these in turn agree with the maximum-likelihood decoding rule which results from (48) when the a priori probability function $\Pr(\theta)$ is uniform.

As with the case of parallel concatenated codes, the likelihood function $p(\mathbf{v} \mid \zeta)$ will be "close" to a factorable distribution when the signal-to-noise ratio is sufficiently high or sufficiently low. The conclusions from [18, Section 3, Examples 1 and 2] therefore apply to serial concatenated codes as well.

5. CONCLUDING REMARKS

We have developed a tutorial overview of iterative decoding for parallel and serial concatenated codes, in the hopes of rendering this material accessible to a wider audience. Our development has emphasized descriptions and properties which are valid irrespective of the block length, which may facilitate the analysis of such algorithms for short block lengths. At the same time, the presentation emphasizes how decoding algorithms for parallel and serial concatenated codes may be addressed in a unified manner.

Although different properties have been exposed, the critical question of convergence domains versus code choice and signal-to-noise ratio remains less immediate to develop. The natural extension of the projection viewpoint favored here involves studying the stability properties of the dynamic system which results. This is pursued in [18, 29] (among others) in which explicit expressions for the Jacobian of the system feedback matrix are obtained; once a fixed point is isolated, local stability properties can then be studied [18, 29], but they depend in a complicated manner on the specific code and channel properties (distance, block length, signal-to-noise ratio, etc.).

One may observe that a fixed point occurs whenever the pseudoposteriors assume uniform distributions, and that this gives a convergent point in pessimistic signal-to-noise ratios [18]. With some further code constraints [40], fixed points are also shown to occur at codeword configurations (i.e., where $T_i(1) = \theta_i$), consistent with the observed convergence behavior for signal-to-noise ratios beyond the waterfall region, and corresponding to an unequivocal fixed point in the terminology of [18]. Interestingly, the convergence of pseudoprobabilities to 0 or 1 was observed for low-density parity-check codes as far back as [6]. Deducing the stability properties of different fixed points versus the signal-to-noise ratio and block length, however, remains a challenging problem.

By allowing the block length to become arbitrarily long, large sample approximations may be invoked, which typically take the form of log-pseudoprobability ratios approaching independent Gaussian random variables. Many insightful analyses may then be developed (e.g., [15, 16, 17, 19], among others). Such approximations, however, are known to be less than faithful for shorter block lengths, of greater interest in two-way communication systems, and analyses exploiting large sample approximations do not adequately predict the behavior of iterative decoding algorithms for shorter block lengths.

Graphical methods (including [25, 26, 27, 28]) provide another powerful analysis technique in this direction. Present trends include studying how code design impacts the cycle length of the decoding algorithm, based on the plausible conjecture that longer cycles should have a greater "stability margin" in an ultimately closed-loop system. Further study, however, is required to better understand the stability properties of iterative decoding algorithms in the general case.

APPENDIX

VERIFICATION OF IDENTITY (37)

Let $r(\theta)$ be an arbitrary distribution, and let $q(\theta)$ be its projection in $\mathcal{P}$, giving a product distribution $q(\theta) = q_1(\theta_1) \cdots q_k(\theta_k)$ whose marginals match those of $r(\theta)$: $q_i(\theta_i) = r_i(\theta_i)$. Consider first
\[
\begin{aligned}
D(r \| q) &= \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 \frac{r(\theta)}{q_1(\theta_1) \cdots q_k(\theta_k)}\\
&= \underbrace{\sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 r(\theta)}_{-H(r)}
- \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 \bigl( q_1(\theta_1) \cdots q_k(\theta_k) \bigr)\\
&= -H(r) - \Biggl( \underbrace{\sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 q_1(\theta_1)
+ \cdots + \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 q_k(\theta_k)}_{\text{(a)}} \Biggr).
\end{aligned} \tag{A.1}
\]


The $i$th sum from the term (a) appears as
\[
\begin{aligned}
\sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 q_i(\theta_i)
&= \sum_{\theta_i=0}^{1} \log_2 q_i(\theta_i) \underbrace{\sum_{\theta_j,\, j \neq i} r(\theta_1, \ldots, \theta_k)}_{r_i(\theta_i) = q_i(\theta_i)}\\
&= \sum_{\theta_i=0}^{1} r_i(\theta_i) \log_2 q_i(\theta_i) = -H(r_i) = -H(q_i),
\end{aligned} \tag{A.2}
\]

since the sums over bits other than $i$ extract the $i$th marginal function $r_i(\theta_i)$, which coincides with $q_i(\theta_i)$. Combining with the previous expression, we see that
\[
D(r \| q) = \sum_{i=1}^{k} H(r_i) - H(r). \tag{A.3}
\]

Now let $s(\theta) = s_1(\theta_1) \cdots s_k(\theta_k)$ be an arbitrary product distribution. The same steps illustrated above give
\[
\begin{aligned}
D(r \| s) &= \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 \frac{r(\theta)}{s_1(\theta_1) \cdots s_k(\theta_k)}\\
&= -H(r) - \left( \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 s_1(\theta_1)
+ \cdots + \sum_{\theta_1=0}^{1} \cdots \sum_{\theta_k=0}^{1} r(\theta) \log_2 s_k(\theta_k) \right)\\
&= -H(r) - \left( \sum_{\theta_1=0}^{1} r_1(\theta_1) \log_2 s_1(\theta_1)
+ \cdots + \sum_{\theta_k=0}^{1} r_k(\theta_k) \log_2 s_k(\theta_k) \right)\\
&= -H(r) - \left( \sum_{\theta_1=0}^{1} q_1(\theta_1) \log_2 s_1(\theta_1)
+ \cdots + \sum_{\theta_k=0}^{1} q_k(\theta_k) \log_2 s_k(\theta_k) \right).
\end{aligned} \tag{A.4}
\]

Adding and subtracting the sums
\[
\sum_{\theta_1=0}^{1} q_1(\theta_1) \log_2 q_1(\theta_1) + \cdots + \sum_{\theta_k=0}^{1} q_k(\theta_k) \log_2 q_k(\theta_k) = -\sum_{i=1}^{k} H(q_i), \tag{A.5}
\]

and regrouping gives
\[
D(r \| s)
= \underbrace{-H(r) + \sum_{i=1}^{k} H(q_i)}_{D(r \| q)}
+ \underbrace{\sum_{\theta_1=0}^{1} q_1(\theta_1) \log_2 \frac{q_1(\theta_1)}{s_1(\theta_1)}
+ \cdots + \sum_{\theta_k=0}^{1} q_k(\theta_k) \log_2 \frac{q_k(\theta_k)}{s_k(\theta_k)}}_{\sum_{i=1}^{k} D(q_i \| s_i) = D(q \| s)},
\tag{A.6}
\]
which is the identity (37).

ACKNOWLEDGMENT

This work was supported by the Scientific Services Program of the US Army, Contract no. DAAD19-02-D-0001.

REFERENCES

[1] C. Berrou and A. Glavieux, "Near optimum error correcting coding and decoding: turbo-codes," IEEE Trans. Commun., vol. 44, no. 10, pp. 1261–1271, 1996.

[2] S. Benedetto and G. Montorsi, "Iterative decoding of serially concatenated convolutional codes," Electronics Letters, vol. 32, no. 13, pp. 1186–1188, 1996.

[3] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, "Analysis, design, and iterative decoding of double serially concatenated codes with interleavers," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 231–244, 1998.

[4] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inform. Theory, vol. 45, no. 2, pp. 399–431, 1999.

[5] Y. Kou, S. Lin, and M. P. C. Fossorier, "Low-density parity-check codes based on finite geometries: a rediscovery and new results," IEEE Trans. Inform. Theory, vol. 47, no. 7, pp. 2711–2736, 2001.

[6] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inform. Theory, vol. 8, no. 1, pp. 21–28, 1962.

[7] C. Douillard, M. Jezequel, C. Berrou, P. Picart, P. Didier, and A. Glavieux, "Iterative correction of intersymbol interference: turbo equalization," European Transactions on Telecommunications, vol. 6, no. 5, pp. 507–511, 1995.

[8] C. Laot, A. Glavieux, and J. Labat, "Turbo equalization: adaptive equalization and channel decoding jointly optimized," IEEE J. Select. Areas Commun., vol. 19, no. 9, pp. 1744–1752, 2001.

[9] M. Tüchler, R. Kötter, and A. Singer, "Turbo equalization: principles and new results," IEEE Trans. Commun., vol. 50, no. 5, pp. 754–767, 2002.

[10] X. Wang and H. V. Poor, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, 1999.

[11] X. Wang and H. V. Poor, "Blind joint equalization and multiuser detection for DS-CDMA in unknown correlated noise," IEEE Trans. Circuits Syst. II, vol. 46, no. 7, pp. 886–895, 1999.

[12] C. Heegard and S. B. Wicker, Turbo Coding, Kluwer Academic Publishers, Boston, Mass, USA, 1999.

[13] B. Vucetic and J. Yuan, Turbo Codes: Principles and Applications, Kluwer Academic Publishers, Boston, Mass, USA, 2000.

[14] L. Hanzo, T. H. Liew, and B. L. Yeap, Turbo Coding, Turbo Equalisation and Space-Time Coding, John Wiley & Sons, Chichester, UK, 2002.

[15] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, 2001.

[16] T. Richardson and R. Urbanke, "An introduction to the analysis of iterative coding systems," in Codes, Systems, and Graphical Models, IMA Volume in Mathematics and Its Applications, pp. 1–37, New York, NY, USA, 2001.

[17] D. Divsalar, S. Dolinar, and F. Pollara, "Iterative turbo decoder analysis based on density evolution," IEEE J. Select. Areas Commun., vol. 19, no. 5, pp. 891–907, 2001.

[18] D. Agrawal and A. Vardy, "The turbo decoding algorithm and its phase trajectories," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 699–722, 2001.

[19] H. El Gamal and A. R. Hammons Jr., "Analyzing the turbo decoder using the Gaussian approximation," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 671–686, 2001.

[20] M. Moher and T. A. Gulliver, "Cross-entropy and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3097–3104, 1998.

[21] R. Le Bidan, C. Laot, D. LeRoux, and A. Glavieux, "Analyse de la convergence en turbo-détection," in Proc. Colloque GRETSI sur le Traitement du Signal et des Images (GRETSI '01), Toulouse, France, September 2001.

[22] A. Roumy, A. J. Grant, I. Fijalkow, P. D. Alexander, and D. Pirez, "Turbo-equalization: convergence analysis," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 4, pp. 2645–2648, Salt Lake City, Utah, USA, May 2001.

[23] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Mateo, Calif, USA, 1988.

[24] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, "Turbo decoding as an instance of Pearl's 'belief propagation' algorithm," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 140–152, 1998.

[25] R. M. Tanner, "A recursive approach to low complexity codes," IEEE Trans. Inform. Theory, vol. 27, no. 5, pp. 533–547, 1981.

[26] N. Wiberg, Codes and decoding on general graphs, Ph.D. thesis, Linköping University, Linköping, Sweden, April 1996.

[27] F. R. Kschischang and B. J. Frey, "Iterative decoding of compound codes by probability propagation in graphical models," IEEE J. Select. Areas Commun., vol. 16, no. 2, pp. 219–230, 1998.

[28] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, 2001.

[29] T. Richardson, "The geometry of turbo-decoding dynamics," IEEE Trans. Inform. Theory, vol. 46, no. 1, pp. 9–23, 2000.

[30] M. Luby, M. Mitzenmacher, A. Shokrollahi, and D. Spielman, "Analysis of low density codes and improved designs using irregular graphs," in Proc. 30th Annual ACM Symposium on Theory of Computing, pp. 249–258, Dallas, Tex, USA, 1998.

[31] N. Sourlas, "Spin-glass models as error-correcting codes," Nature, vol. 339, no. 6227, pp. 693–695, 1989.

[32] A. Montanari and N. Sourlas, "Statistical mechanics and turbo codes," in Proc. 2nd International Symposium on Turbo Codes and Related Topics, pp. 63–66, Brest, France, September 2000.

[33] S. Ikeda, T. Tanaka, and S. Amari, "Information geometry of turbo and low-density parity-check codes," IEEE Trans. Inform. Theory, vol. 50, no. 6, pp. 1097–1114, 2004.

[34] S. Amari and H. Nagaoka, Methods of Information Geometry, American Mathematical Society, Providence, RI, USA; Oxford University Press, New York, NY, USA, 2000.

[35] S. Benedetto and G. Montorsi, "Unveiling turbo codes: some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 409–428, 1996.

[36] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate (corresp.)," IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 284–287, 1974.

[37] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, 1996.

[38] T. L. Saaty and J. Bram, Nonlinear Mathematics, McGraw-Hill, New York, NY, USA, 1964.

[39] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley & Sons, New York, NY, USA, 1991.

[40] P. A. Regalia, "Contractivity in turbo iterations," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), vol. 4, pp. 637–640, Montreal, Canada, May 2004.

Phillip A. Regalia was born in Walnut Creek, California, in 1962. He received the B.S. (with honors), M.S., and Ph.D. degrees in electrical and computer engineering in 1985, 1987, and 1988, respectively, from the University of California at Santa Barbara, and the Habilitation à Diriger des Recherches (necessary to advance to the level of Professor in the French academic system) from the University of Paris-Orsay in 1994. He is presently with the Department of Electrical Engineering and Computer Science, the Catholic University of America, Washington, DC, and Editor-in-Chief of the EURASIP Journal on Wireless Communications and Networking.