
The Performance of Block Codes

    Elwyn Berlekamp

In his classic 1948 paper [10], Claude Shannon introduced the notions of communication channels and codes to communicate over them. During the following two decades, he remained active in refining and extending the theory. One of Shannon's favorite research topics was the fundamental performance capabilities of long block codes. In the 1950s and 1960s this topic also attracted the active involvement of several of Shannon's distinguished colleagues both at Bell Telephone Laboratories and at MIT, including P. Elias, R. M. Fano, R. G. Gallager, E. N. Gilbert, R. W. Hamming, I. M. Jacobs, B. Reiffen, D. Slepian, and J. M. Wozencraft, and several graduate students, including G. D. Forney and me. The work culminated in a book by Gallager [6] and in a sequence of two Information and Control papers by Shannon, Gallager, and Berlekamp [11]. In this article I present an overview of some of the more salient results and their impact.

    Error Exponent Theory

A discrete memoryless channel has finite input and output alphabets related by any given matrix of transition probabilities. A block code of length N consists of M codewords, each of which is a sequence of N symbols selected from the channel's input alphabet. The rate R of the code in natural units is defined as R = (ln M)/N. A message source selects one of the M equiprobable codewords and transmits it over the channel. The received word is determined, symbol by symbol, according to the row of the transition matrix specified by the corresponding input symbol. A decoder examines the received word, computes the a posteriori probabilities of each of the M possible input words, and rank orders them. The decoder is considered correct if the transmitted codeword is the most likely. In a revealing generalization first studied by Elias [3], a list-of-L decoder is considered correct if the transmitted codeword appears anywhere among the L most likely choices.
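To make the decoding rule concrete, here is a minimal sketch (my illustration, not code from the article) of list-of-L maximum-likelihood decoding for a toy memoryless channel; the transition probabilities, the four-word code, and the received word are all hypothetical.

```python
from math import prod, log

# Hypothetical binary-input, binary-output channel: P[x][y] = Prob(output y | input x).
P = {0: {0: 0.9, 1: 0.1},
     1: {0: 0.2, 1: 0.8}}

def likelihood(codeword, received):
    """Memoryless channel: the word likelihood is the product of per-symbol probabilities."""
    return prod(P[x][y] for x, y in zip(codeword, received))

def list_decode(code, received, L=1):
    """Rank the M codewords by a posteriori probability (the same as likelihood order
    when the codewords are equiprobable) and return the L most likely."""
    return sorted(code, key=lambda c: likelihood(c, received), reverse=True)[:L]

# A block code of length N = 4 with M = 4 codewords, so R = (ln M)/N = (ln 4)/4 nats.
code = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1)]
received = (1, 1, 0, 1)
print("rate R =", log(len(code)) / 4, "nats")
print("two most likely codewords:", list_decode(code, received, L=2))
```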

For any fixed channel, one seeks bounds on P_e(N, M, L), the error probability of the best code of the specified codebook size M and list size L. (In the most general case, in which the probability of decoding error may depend on which codeword is selected, one evaluates the code according to one of its worst codewords.) Even today, the best codes are known for relatively few channels, and even then only for very high or very low coderates and/or relatively short block lengths. Many of those that are known have interesting connections with other areas of mathematics.

Shannon was interested in getting bounds on the behavior of

P_e(N, exp(NR), L)

for fixed L (often L = 1) and fixed R as N goes to ∞. For all R less than the channel capacity C, it was shown that P_e is an exponentially decreasing function of N. In particular, there is an error exponent E_L(R) such that P_e < exp(−N E_L(R)) for all sufficiently large N.

Elwyn Berlekamp is professor of mathematics at the University of California at Berkeley. His e-mail address is [email protected].


In the 1950s and 1960s, significant efforts were made to determine the best possible error exponent, which one might hope to define as

E_L(R) = lim_{N→∞} −(1/N) ln P_e(N, exp(NR), L).

This definition measures information in natural units, called nats. If the natural logarithm is replaced by the base 2 logarithm, then the units are converted from nats to bits. In either case, E_L(R) is also called the channel's reliability function. Since it is conceivable that there might be circumstances under which this limit might not exist, one interprets upper and lower bounds on E_L(R) as limsup and liminf.

Upper bounds on E_L(R) correspond to lower bounds on the probability of decoding error. To obtain such a bound for a given list size, a given code, and a corresponding decoding algorithm, one defines M sets of received words, each set consisting of all words that are list-decoded into a particular codeword. One introduces an appropriate weighting function on the received words and computes the volume (total weight) of each such set and the volume V of the entire space of all possible received words. For sufficiently symmetric channels, the weighting function is uniform, and the volume is simply a count of the number of points in the relevant set. Since each received word can belong to at most L sets, it follows that there must be at least one particular codeword whose corresponding decoding region has volume no greater than LV/M. The probability of the received word lying within an arbitrary region of this given volume is maximal when the region is a sphere. That probability can be computed and used to determine the probability of decoding error of an idealized perfect code, whose decoding regions partition the set of all possible received sequences into perfect spheres.

This probability of decoding error for perfect codes can be used to calculate an upper bound on E_L(R). For any fixed positive integer L and any fixed rate R between 0 and C, this technique yields, as N goes to ∞, an upper bound on E_L(R) that is independent of L. This bound is called either the volume bound or the sphere-packing bound. As shown in Figure 1, this function is typically analytic. It is tangent to the R axis at R = C, and tangent to the E axis at R = 0. As we shall soon see, it happens to give the correct values of E(R) for all values of R between 0 and C, assuming that we interpret E(R) as the limit of E_L(R) as L goes to ∞. Since each E_L(R) is itself a limit as N goes to ∞, it is important that the two limits be taken in the appropriate order. If L and N were allowed both to go to ∞ in such a way that L were an exponentially growing function of N, we would get a different answer.
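For one concrete channel, the sphere-packing bound can be evaluated numerically. The sketch below is my illustration, not the article's; it uses the standard parametric form for a binary symmetric channel with crossover probability p, namely E_sp(R) = δ ln(δ/p) + (1 − δ) ln((1 − δ)/(1 − p)) with δ chosen so that R = ln 2 − H(δ), and simply prints the bound at a few rates.

```python
import math

def H(x):
    """Binary entropy in nats."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def sphere_packing_exponent(R, p, tol=1e-12):
    """Sphere-packing (volume) bound for a BSC with crossover p, rate R in nats.
    Solve R = ln 2 - H(delta) for delta in [p, 1/2] by bisection, then return
    the divergence D(delta || p)."""
    lo, hi = p, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if math.log(2) - H(mid) > R:   # rate at mid is too high, so delta must grow
            lo = mid
        else:
            hi = mid
    d = (lo + hi) / 2
    return d * math.log(d / p) + (1 - d) * math.log((1 - d) / (1 - p))

p = 0.05
C = math.log(2) - H(p)                 # capacity in nats
for frac in (0.25, 0.5, 0.75, 0.95):
    R = frac * C
    print(f"R = {R:.3f} nats, sphere-packing bound = {sphere_packing_exponent(R, p):.4f}")
```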

The volume bound remains valid even if the problem is modified to include a noiseless, delayless feedback link, which allows the encoder and the decoder to revise their code depending on previously received symbols.

Lower bounds on E_L(R) correspond to upper bounds on the probability of decoding error. An important technique to get such bounds is the random coding argument introduced in Shannon's 1948 paper, in which all of the MN letters in the codebook are selected independently at random from the same input distribution, which is determined by a careful optimization that depends critically on the statistics of the noisy channel. As L goes to ∞, the random coding bounds approach a limit, which happens to coincide with the volume bound shown in Figure 1. The conclusion is that the volume bound is also the correct value of the function E(R) for all rates R between 0 and C. This is a very remarkable result: when the list size is sufficiently large, random codes are virtually as good as perfect codes. Although the proof of this result for sufficiently symmetric channels is reasonably straightforward, it is much more difficult for the general asymmetric case. One problem is that the reliability function itself is not very tractable. The most successful approach is Gallager's formulation, which expresses the function in terms of a quantity ρ, which turns out to be the negative slope, −dE/dR.
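Gallager's formulation is explicit enough to compute. The sketch below is my illustration, based on the standard Gallager function E_0(ρ) = −ln Σ_y [Σ_x Q(x) P(y|x)^{1/(1+ρ)}]^{1+ρ} rather than any formula printed in this article; it evaluates the random coding exponent E_r(R) = max over 0 ≤ ρ ≤ 1 of E_0(ρ) − ρR for a fixed input distribution Q. A complete computation would also optimize over Q.

```python
import numpy as np

def E0(rho, Q, P):
    """Gallager's E_0 function (natural logs). Q: input distribution of length K,
    P: K-by-J matrix of transition probabilities P[x, y]."""
    s = 1.0 / (1.0 + rho)
    inner = (Q[:, None] * P ** s).sum(axis=0)      # one term per output symbol
    return -np.log((inner ** (1.0 + rho)).sum())

def random_coding_exponent(R, Q, P, grid=2001):
    """E_r(R) = max over 0 <= rho <= 1 of E_0(rho) - rho * R, for fixed inputs Q."""
    rhos = np.linspace(0.0, 1.0, grid)
    return max(E0(r, Q, P) - r * R for r in rhos)

# Hypothetical binary symmetric channel with crossover 0.05 and uniform inputs.
P = np.array([[0.95, 0.05],
              [0.05, 0.95]])
Q = np.array([0.5, 0.5])
C = np.log(2) + 0.95 * np.log(0.95) + 0.05 * np.log(0.05)   # capacity in nats
for R in (0.25 * C, 0.5 * C, 0.9 * C):
    print(f"R = {R:.3f}, E_r(R) = {random_coding_exponent(R, Q, P):.4f}")
```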

For each finite value of L, there is a corresponding critical rate R_L such that the random coding bound on E_L(R) agrees with E(R) for coderates in the interval R_L ≤ R < C. For coderates below R_L, however, the random coding bound is a straight line of slope −L that joins the function E(R) tangentially at the point R = R_L. The random coding bounds for L = 1, 2, 3, 4, 5 are shown in Figure 2.

In most applications, a list of size L = 1 is required. The coding theorems show that at sufficiently high coderates (R_crit = R_1 ≤ R < C), random codes are exponentially as good as perfect codes. At lower rates, however, the error exponent for random codes becomes a straight line of slope −1 that veers off tangentially from the volume bound. With a list of size L, the error exponent for random codes remains equal to the volume bound for another interval of rates, R_L ≤ R, but then it veers off tangentially from the volume bound with slope −L.

Figure 1.

The limit of E_1(R) as R approaches 0 is known precisely, and for most channels it is strictly between the random bound and the volume bound. An upper bound on E_1(0) is provided by

lim_{M→∞} lim_{N→∞} −(1/N) ln P_e(N, M, 1).

For sufficiently symmetric channels, this double limit is relatively easy to compute, because for any fixed M and very large N, the best codes can be constructed explicitly. Their key property is that they are equidistant, in the sense that the frequency with which any particular ordered pair of input symbols occurs in one pair of codewords is the same as the frequency with which it occurs in any other pair of codewords. Equidistant codes also happen to optimize the above double limit for arbitrary asymmetric channels, although that result was rather more difficult to prove.

A lower bound on E_1(R) that coincides with the correct value at R = 0 is obtained by a technique called expurgation. One first selects a code at random. Then one computes the probability of confusing each pair of codewords, as if M = 2 and these two words were the only words in the code. A threshold is carefully selected, and any pair of codewords whose confusion probability exceeds this threshold is expurgated, meaning that both of those codewords are removed from the code. The threshold is chosen so that on the average, at least half of the codewords survive the expurgation. What remains is an expurgated random code that contains no pair of too-easily confused codewords. As shown in Figure 3, this yields an error exponent E_x(R), which improves the random coding bound in the region of rates 0 ≤ R < R_x. The E_x(R) curve joins the random coding bound tangentially at R = R_x.
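As a rough illustration of the expurgation procedure (my sketch, not the article's; it substitutes the Bhattacharyya bound for the exact two-codeword confusion probability, and it picks the threshold by trial rather than by the careful analysis the argument requires), the code below draws a random binary code for a binary symmetric channel and removes every pair whose confusion bound is too large, keeping at least half the codewords.

```python
import itertools
import math
import random

def bhattacharyya_bound(c1, c2, p):
    """Upper bound on the probability of confusing two codewords over a BSC(p):
    (2*sqrt(p*(1-p)))**d, where d is their Hamming distance."""
    d = sum(a != b for a, b in zip(c1, c2))
    return (2.0 * math.sqrt(p * (1 - p))) ** d

def expurgate(code, p, threshold):
    """Remove both members of every pair whose confusion bound exceeds the threshold."""
    bad = set()
    for c1, c2 in itertools.combinations(code, 2):
        if bhattacharyya_bound(c1, c2, p) > threshold:
            bad.update((c1, c2))
    return [c for c in code if c not in bad]

random.seed(1)
N, M, p = 30, 64, 0.05
code = [tuple(random.randint(0, 1) for _ in range(N)) for _ in range(M)]

# Try thresholds from most to least aggressive and keep the strongest expurgation
# that still leaves at least half of the codewords (the rule described in the text).
candidates = sorted({bhattacharyya_bound(a, b, p)
                     for a, b in itertools.combinations(code, 2)})
survivors = code
for t in candidates:
    kept = expurgate(code, p, t)
    if len(kept) >= M // 2:
        survivors = kept
        break
print(f"{len(survivors)} of {M} codewords survive expurgation")
```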

Finally, there is another general result that allows any upper bound on E_1(R) to be joined to any point on the volume bound by a straight line bound (also known as the SGB bound). As shown in Figure 3, the best such bound is attained by selecting points such that the straight line bound is tangent to the volume bound at its high-rate endpoint and to a low-rate bound at its low-rate endpoint. In Figure 3, the low-rate bound is the single-point double-limit upper bound on E_1(R) described above.

One communications channel of considerable practical importance is the binary-input Gaussian noise channel. This channel has only two inputs, which are taken as the real numbers +1 and −1. The input number is called the signal. The noise is taken as an additive real number having zero mean and given variance. The output, also a real number, is the sum of the signal and the noise. In order to make the channel discrete, the output range is commonly partitioned into a finite number of intervals. Usually this is done by uniformly quantizing the output over some finite range, allowing the extreme intervals at either end to run to ±∞. For example, in the 2-input 8-output version of this model, the seven breakpoints between different output intervals are commonly set at −1.5, −1, −.5, 0, +.5, +1, and +1.5.
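To see what the resulting discrete channel looks like, here is a small sketch (my illustration; the noise standard deviation is an arbitrary choice) that computes the 2-input, 8-output transition matrix obtained by slicing the Gaussian output at the seven breakpoints listed above.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_channel(sigma, breakpoints):
    """Transition probabilities P(output interval | signal) for signals +1 and -1,
    where the real-valued signal-plus-noise output is quantized by the breakpoints."""
    edges = [-math.inf] + list(breakpoints) + [math.inf]
    rows = {}
    for signal in (+1.0, -1.0):
        rows[signal] = [Phi((hi - signal) / sigma) - Phi((lo - signal) / sigma)
                        for lo, hi in zip(edges[:-1], edges[1:])]
    return rows

breakpoints = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
rows = quantized_channel(sigma=1.0, breakpoints=breakpoints)   # sigma chosen arbitrarily
for signal, probs in rows.items():
    print(signal, [round(p, 3) for p in probs])
```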

More intricate models of this channel assume that the noise is white and that an appropriate figure of throughput for the entire system is the number of nats/sec or bits/sec. This turns out to be maximized by making the time interval for each transmitted bit very small, even though this yields a discrete memoryless channel that is very noisy and that has a very small capacity per transmitted bit.

Figure 2.

Figure 3.

This motivation led to the study of very noisy channels, whose probability transition matrices could be written as P_{i|j} = P_i (1 + ε_{i,j}), where ε_{i,j} is small for all i and j. Reiffen [9] showed that the error exponent for any such channel depends on only one parameter, the capacity C. As shown in Figure 4, the graph of E(R) is a parabola that attains a slope of −1 at the critical rate R_1 = C/4. At rate R = 0, the average distance upper bound coincides with the random coding bound. So for such channels, expurgation gains nothing, and the random coding bound is optimal at all rates between 0 and the capacity C. It coincides with the straight-line SGB bound at rates 0 < R < C/4 and with the volume bound at rates C/4 < R < C.
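For reference, the standard closed form behind Figure 4 (my restatement, following the usual very-noisy-channel analysis, not a display reproduced from this article) is

E(R) = C/2 − R for 0 ≤ R ≤ C/4,

E(R) = (√C − √R)^2 for C/4 ≤ R ≤ C.

The parabola (√C − √R)^2 has slope 1 − √(C/R), which equals −1 exactly at R = C/4, so the straight-line and parabolic pieces join tangentially at the critical rate quoted above.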

In general, sophisticated codes attain reliability at the cost of code redundancy and digital electronics required to implement the coding and decoding algorithms. Often the greatest potential system gains are attained by designing a channel that is too noisy to use without coding. Such a channel allows such significant benefits as lower power, faster speeds, and perhaps greater storage density. The local cost is a much higher raw error rate, which is eliminated from the system by the coding.

Many early communication and memory systems were designed and deployed before sophisticated coding was practical. Others were designed and deployed before the benefits of sophisticated coding were understood and appreciated by the system designers. In either case, such systems worked by placing heavy investments in power and in analog equipment, and by constraining the users to relatively low throughput and often only marginally acceptable error-performance as well. When users of such systems expressed willingness to make further sacrifices in throughput in order to gain better error-performance, the coding engineers and theorists were presented with very quiet channels. Most such channels turned out to be highly symmetric. This is fortunate because, to this day, relatively little is known about very quiet asymmetric channels.

Perhaps the most-studied channel is the binary symmetric channel. It is a special case (with q = 2) of the q-ary symmetric channel, which has q inputs and q outputs. Each input goes to its corresponding output with probability 1 − p. The parameter p, which is the probability of symbol error, is the same for all inputs. All kinds of symbol errors are equally likely, each occurring with probability p/(q − 1). As p approaches (q − 1)/q, the channel becomes very noisy, and its capacity approaches 0. But as for all very noisy channels, when properly normalized, the reliability approaches the function shown in Figure 4. On the other hand, when p approaches 0, the capacity approaches ln q nats/symbol.

Since R_crit and R_x both approach the capacity C, the high-rate region above R_x disappears. Random codes become uniformly bad, because the reliability is dominated by minimum distance considerations rather than by volume considerations. Here minimum distance is the smallest number of input symbols in which any pair of codewords differ. The reliability approaches ∞ at all coderates between 0 and the capacity C, and the rate at which it approaches ∞ is proportional to −ln p. So the interesting function to study is the normalized reliability, defined as

e(R) = E_1(R)/(−ln p).
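As an aside on the q-ary symmetric channel just described, its capacity has the standard closed form C = ln q − H(p) − p ln(q − 1) nats (my addition, not a formula from the article). The sketch below checks the two limits quoted above: C tends to ln q as p tends to 0, and to 0 as p tends to (q − 1)/q.

```python
import math

def qsc_capacity(q, p):
    """Capacity (nats per symbol) of the q-ary symmetric channel with symbol-error
    probability p: C = ln q - H(p) - p*ln(q-1)."""
    if p in (0.0, 1.0):
        H = 0.0
    else:
        H = -p * math.log(p) - (1 - p) * math.log(1 - p)
    return math.log(q) - H - p * math.log(q - 1)

q = 4
for p in (0.0, 0.01, 0.1, 0.5, (q - 1) / q):
    print(f"p = {p:.3f}  C = {qsc_capacity(q, p):.4f} nats  (ln q = {math.log(q):.4f})")
```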

Figure 4.

Figure 5.

For the binary symmetric channel, N·e(R) is the best possible minimum distance of any block code of the specified (long) length and rate. From the mid-1950s through the mid-1970s, the classic bounds on e(R) shown for the binary symmetric channel in Figure 5 were the best results known. Results for the more general symmetric q-ary channels were similar. But eventually, both the upper and lower bounds on e(R) were partially improved. The first improvement on the Elias bound, at sufficiently small positive rates, was by Welch, McEliece, and Rumsey [15]. The first improvement on the Gilbert bound, for q ≥ 49, was due to Tsfasman, Vladut, and Zink [13]. The latter result was a novel application of algebraic geometry. Despite further improvements, a gap still remains between the improved bounds.
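For the binary case, the classical Gilbert (Gilbert-Varshamov) lower bound on attainable relative minimum distance is simple to evaluate. The sketch below is my illustration of that single bound, with the rate measured in bits (R normalized by ln 2); it does not include the Elias bound or the later improvements cited above.

```python
import math

def H2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gilbert_varshamov_delta(R_bits, tol=1e-12):
    """Largest relative distance delta in [0, 1/2] with R_bits <= 1 - H2(delta):
    long binary codes of rate R_bits exist with minimum distance about delta*N."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1.0 - H2(mid) >= R_bits:
            lo = mid          # distance mid is still achievable at this rate
        else:
            hi = mid
    return lo

for R in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(f"R = {R:.2f} bits/symbol, Gilbert bound: delta >= {gilbert_varshamov_delta(R):.4f}")
```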

    Algebraic Coding

Even before Shannon's retirement, many engineers, computer scientists, and mathematicians were creating innovative algorithms to specify and implement feasible coding strategies for particular noisy channels. The first such were the single-bit-error-correcting binary Hamming codes. Shannon mentioned the binary Hamming code with 2^4 codewords of length n = 7 as an example in his 1948 paper. Binary Hamming codes with 2^32 or 2^64 codewords later became very widely used in computer memories.
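As an illustration of the smallest case mentioned above, here is a sketch (mine, using one common bit-ordering convention; others exist) of encoding and single-error correction with the binary [7,4] Hamming code, whose 2^4 codewords have length 7.

```python
import numpy as np

# Parity-check matrix of the [7,4] binary Hamming code; column j (1-based) is the
# binary expansion of j, so the syndrome of a single error spells out its position.
H = np.array([[(j >> b) & 1 for j in range(1, 8)] for b in range(3)])  # 3 x 7

def encode(data):
    """Encode 4 data bits. Data goes in positions 3, 5, 6, 7 (1-based); the parity
    bits in positions 1, 2, 4 are chosen so that every syndrome bit is zero."""
    c = np.zeros(7, dtype=int)
    c[[2, 4, 5, 6]] = data                      # 0-based indices of positions 3,5,6,7
    for b, pos in enumerate((1, 2, 4)):         # parity bit b sits at position 2**b
        covered = [j for j in range(1, 8) if (j >> b) & 1 and j != pos]
        c[pos - 1] = sum(c[j - 1] for j in covered) % 2
    return c

def correct(received):
    """Single-error correction: the syndrome, read as a binary number, is the
    1-based position of the flipped bit (0 means no error detected)."""
    syndrome = H.dot(received) % 2
    pos = int(sum(int(s) << b for b, s in enumerate(syndrome)))
    fixed = received.copy()
    if pos:
        fixed[pos - 1] ^= 1
    return fixed

data = np.array([1, 0, 1, 1])
codeword = encode(data)
noisy = codeword.copy()
noisy[5] ^= 1                                   # flip one bit
print(codeword, noisy, correct(noisy), sep="\n")
```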

In 1960, Bose-Chaudhuri-Hocquenghem and Reed-Solomon applied the theory of finite fields to construct error-correcting codes. Although subsequently seen to be special cases of a common generalization, the initial BCH codes were binary, while the initial RS codes used a large prime-power alphabet of size q = N + 1. The RS codes proved directly suitable for the original q-ary symmetric channel and for several of its close relatives with q symmetric inputs but with more outputs. One such channel has an additional (q + 1)st erasure output, accessible from all q inputs with the same transition probability. Another such channel has 2q output symbols, each pair of which corresponds to a more reliable and a less reliable estimate of each input signal.
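To indicate what a Reed-Solomon code looks like, the sketch below (my illustration) encodes by polynomial evaluation; for simplicity it works over a prime field GF(q) rather than the characteristic-2 fields used in practice, and it takes the code length to be N = q − 1 as in the text.

```python
# Reed-Solomon encoding by polynomial evaluation over a prime field GF(q).
# A message of k symbols is the coefficient list of a polynomial of degree < k,
# and the codeword is its value at the q - 1 nonzero field elements (N = q - 1).

q = 11                      # a prime, so arithmetic mod q is a field
N = q - 1
k = 4                       # message length

def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients listed lowest degree first) at x, mod q."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % q
    return result

def rs_encode(message):
    assert len(message) == k
    return [poly_eval(message, x) for x in range(1, q)]

# Two distinct polynomials of degree < k agree in at most k - 1 points, so any two
# codewords differ in at least N - k + 1 = 7 positions.
message = [3, 1, 4, 1]
print(rs_encode(message))
```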

The most widespread application of RS codes has been to a bursty (rather than a memoryless) version of the binary symmetric channel, the bit error probability varying with time. Any sequence of bits can be partitioned into m-bit characters, each of which can be viewed as a symbol in a finite field of order q = 2^m. This is particularly appropriate for certain binary channels in which errors tend to come in short bursts of lengths comparable to m. For such channels, doubly-interleaved RS codes are superior to multiply-interleaved binary BCH codes in numerous respects. They have superior performance. Surprisingly, they are easier to encode. They are also easier to decode. Partly because of this, and partly because of unusually fast and efficient decoding algorithms now associated with them, RS codes with high coderates have become very widely used in many magnetic and optical storage systems.

The emphasis on high coderates arises from a coincidence of several factors. On the system side, considerations of factors such as bit synchronization, latency, and delay make high coderates attractive. On the coding side, all of the costs associated with RS codes grow linearly or as the square of the redundancy, (ln q − R)N, and these costs become very small when R is large. There is also a performance factor, because traditional RS decoding algorithms perform only as block codes with list size L = 1.

Sudan [12] introduced an innovative method to decode RS codes with low coderates using lists of size greater than 1. He also applied these methods to random proof-checking problems in theoretical computer science. When it is feasible to trade an increase in decoding complexity for improved performance, Sudan's algorithm can be used to attain much higher reliability for low-rate RS codes even when L = 1. One first uses Sudan's algorithm like a search committee, to reduce a large pool of candidates down to a short list. In many applications, there is auxiliary information available that allows another part of the decoder to scrutinize each of these candidates more carefully before making the final selection.

    Convolutional Codes

Originally, Shannon used the block length N as a rough measure of the delay and complexity of a coding/decoding system. Many coding theorists have long sought to refine this measure. In the 1950s, Elias introduced binary convolutional codes in which message and check bits were intermingled in the transmitted data stream, and in which each check bit was constrained to depend on only N prior message bits. This approach replaced the block length with a constraint length and led Wozencraft, Fano, Jacobs-Berlekamp, and others to study sequential decoding algorithms. Viterbi [14] presented a very innovative approach showing that maximum-likelihood sequential decoding was feasible on sufficiently short convolutional codes and that the performance was adequate for many purposes, particularly on the binary-input Gaussian noise channel. This led to widespread use of Viterbi decoding on many communications channels. In the mid-1980s, NASA adopted standards for deep-space communications that required a concatenation of two codes: an inner Viterbi code of coderate 1/2, and an outer RS code of coderate 223/255 or 239/255. The Viterbi code converts a rather noisy memoryless Gaussian channel into a rather bursty binary symmetric channel, on which the RS code then attains a very low probability of decoding error.
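For readers who have not seen one, here is a sketch (my illustration, a tiny textbook example rather than the NASA standard code) of a rate-1/2 binary convolutional encoder: each message bit produces two transmitted bits, each depending only on the current bit and a short window of prior message bits held in a shift register.

```python
def conv_encode(bits, g1=(1, 1, 1), g2=(1, 0, 1)):
    """Rate-1/2 binary convolutional encoder. Each output pair is a mod-2
    combination of the current bit and the previous len(g)-1 bits, with taps
    given by the generator sequences g1 and g2 (here the classic (7, 5) pair)."""
    memory = [0] * (len(g1) - 1)          # shift register of prior message bits
    out = []
    for b in bits:
        window = [b] + memory
        out.append(sum(t * x for t, x in zip(g1, window)) % 2)
        out.append(sum(t * x for t, x in zip(g2, window)) % 2)
        memory = [b] + memory[:-1]
    return out

message = [1, 0, 1, 1, 0, 0, 1]
print(conv_encode(message))
```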

    Low Density Parity Check Codes

In 1962, Gallager [5] introduced low density parity check codes as a way to get some of the advantages of long block codes while constraining the cost of complexity. Although the topic then remained relatively dormant for nearly twenty years, the phenomenal decrease in the cost of memory led a number of computer scientists and engineers to revisit or rediscover and improve this approach in the early 1980s. Gallager's methods had been random, but the modern resurgence of interest in this area has focused more attention on specific constructions using expander graphs such as those constructed by Margulis [7] and by Lubotzky, Phillips, and Sarnak. This construction is a novel application of modular forms.
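A low density parity check code is specified by nothing more than a sparse binary matrix. The sketch below (my own, in the spirit of Gallager's random constructions rather than the explicit expander-graph constructions mentioned above) builds a parity-check matrix in which every column has exactly three ones; the row weights are then controlled only on average.

```python
import random
import numpy as np

def ldpc_parity_check(n, m, col_weight=3, seed=0):
    """Random sparse parity-check matrix: n columns (code length), m rows (checks),
    each column containing exactly col_weight ones in randomly chosen rows."""
    rng = random.Random(seed)
    H = np.zeros((m, n), dtype=int)
    for col in range(n):
        for row in rng.sample(range(m), col_weight):
            H[row, col] = 1
    return H

H = ldpc_parity_check(n=20, m=10)
print(H)
print("column weights:", H.sum(axis=0))
print("row weights:   ", H.sum(axis=1))
# The code consists of all binary vectors c with H @ c = 0 (mod 2); its rate is at
# least 1 - m/n (here 1/2), even though each check involves only a handful of bits.
```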

    Conclusion

Most mathematicians are probably aware of names such as Conway, Margulis, and Sarnak, but they may not be aware of the enormous impact the coding work inspired by Shannon has had in other areas. Forney, Ungerboeck, and others used increasingly sophisticated coding techniques to improve speeds over local phone lines by almost two orders of magnitude, well beyond crude early estimates of capacity. Jacobs and Viterbi founded a company called Qualcomm, which dominates the intellectual property in the cell phone industry. Today, Reed-Solomon codes are used in most magnetic disks and optical CDs. The large part of Bell Telephone Labs that stayed with AT&T at the time of the Lucent spinoff was renamed Shannon Labs. Statues of Shannon have been dedicated in his hometown of Gaylord, Michigan; at both of his alma maters, the University of Michigan and MIT; at AT&T's Shannon Labs; at Lucent's Bell Labs; and at the Center for Magnetic Recording Research of the University of California at San Diego.

    References

[1] Elwyn R. Berlekamp, Algebraic Coding Theory, McGraw-Hill, 1968; revised edition, Aegean Park Press, 1984.

[2] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups, third edition, Springer, New York, 1999.

[3] Peter Elias, List decoding for noisy channels, Research Laboratory of Electronics Report Number 335, Massachusetts Institute of Technology, 1957.

[4] G. David Forney Jr., Concatenated Codes, M.I.T. Press, Cambridge, Mass., 1966.

[5] Robert G. Gallager, Low-Density Parity-Check Codes, M.I.T. Press, Cambridge, Mass., 1963.

[6] Robert G. Gallager, Information Theory and Reliable Communication, Wiley, New York, 1968.

[7] G. A. Margulis, Explicit constructions of expanders, Problemy Peredachi Informatsii 9 (1973), 71-80; English translation in Problems of Information Transmission 9 (1973), no. 4, 325-332 (1975).

[8] I. S. Reed and G. Solomon, Polynomial codes over certain finite fields, J. Soc. Indust. Appl. Math. 8 (1960), 300-304.

[9] Barney Reiffen, A note on very noisy channels, Information and Control 6 (1963), 126-130.

[10] C. E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948), 379-423, 623-656.

[11] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp, Lower bounds to error probability for coding on discrete memoryless channels, I and II, Information and Control 10 (1967), 65-103, 522-552.

[12] Madhu Sudan, Decoding of Reed-Solomon codes beyond the error-correction bound, J. Complexity 13 (1997), no. 1, 180-193.

[13] M. A. Tsfasman, S. G. Vladut, and Th. Zink, Modular curves, Shimura curves, and Goppa codes, better than Varshamov-Gilbert bound, Math. Nachr. 109 (1982), 21-28.

[14] Andrew J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Information Theory IT-13 (1967), 260-269.

[15] Lloyd R. Welch, Robert J. McEliece, and Howard Rumsey Jr., A low-rate improvement on the Elias bound, IEEE Trans. Information Theory IT-20 (1974), 676-678.