
“Machines should work. People should think.” Richard Hamming.

4 Classical Error Correcting Codes

Coding theory is an application of information theory critical for reliable communication and fault-tolerant information storage and processing; indeed, Shannon's channel coding theorem tells us that we can transmit information over a noisy channel with an arbitrarily low probability of error. A code is designed based on a well-defined set of specifications and protects the information only against the type and number of errors prescribed in its design. A good code:

(i) Adds a minimum amount of redundancy to the original message.

(ii) Efficient encoding and decoding schemes for the code exist; this means that information is easily mapped to and extracted from a codeword.

Reliable communication and fault-tolerant computing are intimately related to each other; the concept of a communication channel, an abstraction for a physical system used to transmit information from one place to another and/or from one time to another, is at the core of communication as well as of information storage and processing. It should, however, be clear that the existence of error-correcting codes does not guarantee that logic operations can be implemented using noisy gates and circuits. The strategies to build reliable computing systems using unreliable components are based on John von Neumann's studies of error detection and error correction techniques for information storage and processing.

This chapter covers topics from the theory of classical error detection and error correction and introduces concepts useful for understanding quantum error correction, the subject of Chapter 5. For simplicity we often restrict our discussion to the binary case. The organization of Chapter 4 is summarized in Figure 96. After introducing the basic concepts we discuss linear, polynomial, and several other classes of codes.

Figure 96: The organization of Chapter 4 at a glance: basic concepts, linear codes, polynomial codes, and other codes.


4.1 Informal Introduction to Error Detection and Error Correction

Error detection is the process of determining if a message is in error; error correction is the process of restoring a message in error to its original content. Information is packaged into codewords and the intuition behind error detection and error correction is to artificially increase some “distance” between codewords so that errors cannot possibly transform one valid codeword into another valid codeword.

Error detection and error correction are based on schemes to increase the redundancy of a message. A crude analogy is to bubble wrap a fragile item and place it into a box to reduce the chance that the item will be damaged during transport. Redundant information plays the role of the packing materials; it increases the amount of data transmitted, but it also increases the chance that we will be able to restore the original contents of a message distorted during communication. Coding corresponds to the selection of both the packing materials and the strategy to optimally pack the fragile item subject to the obvious constraints: use the least amount of packing materials and the least amount of effort to pack and unpack.

Example. A trivial example of an error detection scheme is the addition of a parity check bit to a word of a given length.

This is a simple but very powerful scheme; it allows us to detect an odd number of errors, but it fails if an even number of errors occur. For example, consider a system that enforces even parity for an eight-bit word. Given the string (10111011), we add one more bit to ensure that the total number of 1s is even, in this case a 0, and we transmit the nine-bit string (101110110). The error detection procedure is to count the number of 1s; we decide that the string is in error if this number is odd. When the information symbols are embedded into the codeword, as in this example, the code is called a systematic code; the information symbols are scattered throughout the codeword of a non-systematic code.

This example also hints at the limitations of error detection mechanisms. A code is designed with certain error detection or error correction capabilities and fails to detect, or to correct, error patterns not covered by the original design of the code. In the previous example we transmit (101110110) and when two errors occur, in the 4-th and the 7-th bits, we receive (101010010). This tuple has even parity (an even number of 1s) and our scheme for error detection fails.
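
The parity scheme is easy to express in code. The following Python sketch (illustrative, not part of the original text; the function names are ours) encodes the eight-bit word of the example and shows that a single error is detected while the double error in bits 4 and 7 escapes detection.

    # A minimal sketch of the even-parity scheme described above.
    def add_parity_bit(bits):
        """Append a parity bit so that the total number of 1s is even."""
        return bits + [sum(bits) % 2]

    def parity_check(word):
        """Return True if the word has even parity, i.e., no error is detected."""
        return sum(word) % 2 == 0

    word = [1, 0, 1, 1, 1, 0, 1, 1]          # (10111011), six 1s
    sent = add_parity_bit(word)               # (101110110)
    print(parity_check(sent))                 # True: no error detected

    received = sent.copy()
    received[3] ^= 1                          # single error in the 4-th bit: detected
    print(parity_check(received))             # False

    received[6] ^= 1                          # second error in the 7-th bit
    print(parity_check(received))             # True: the double error escapes detection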

Error detection and error correction have practical applications in our daily life. Error detection is used in cases when an item needs to be uniquely identified by a string of digits and there is the possibility of misidentification. Bar codes, also called Universal Product Codes (UPC), the codes used by the postal service, the American banking system, or by the airlines for paper tickets, all have some error detection capability. For example, in the case of ordinary bank checks and travellers checks there are means to detect if the check numbers are entered correctly during various financial transactions.

Example. The International Standard Book Number (ISBN) coding scheme. The ISBN code is designed to detect any single digit in error and any transposition of adjacent digits. The ISBN number consists of 10 digits:

ISBN → d1 d2 d3 d4 d5 d6 d7 d8 d9 d10


Here d10 is computed as follows:

A = w1d1 + w2d2 + w3d3 + w4d4 + w5d5 + w6d6 + w7d7 + w8d8 + w9d9 mod 11

with w1 = 10, w2 = 9, w3 = 8, w4 = 7, w5 = 6, w6 = 5, w7 = 4, w8 = 3, w9 = 2. Then:

$$d_{10} = \begin{cases} 11 - A & \text{if } 2 \le A \le 10 \\ X & \text{if } A = 1 \\ 0 & \text{if } A = 0 \end{cases}$$

where the symbol X stands for the value 10.

For example, consider the book with ISBN 0-471-43962-2. In this case:

A = 10 × 0 + 9 × 4 + 8 × 7 + 7 × 1 + 6 × 4 + 5 × 3 + 4 × 9 + 3 × 6 + 2 × 2 mod 11

A = 36 + 56 + 7 + 24 + 15 + 36 + 18 + 4 mod 11 = 196 mod 11 = 9.

d10 = 11 − A = 2.

The code presented above detects any single digit in error because the weights are relatively prime with 11. Indeed, if digit di is affected by an error and becomes di + ei, with ei ≠ 0, then

wi(di + ei) mod 11 ≠ wi di mod 11

because wi ei ≠ 0 mod 11, as wi is relatively prime to 11 and 0 < |ei| < 11. The code also detects any transposition of adjacent digits. Indeed, if the adjacent digits di and di+1 are transposed, the weighted sum changes by (wi − wi+1)(di+1 − di) = di+1 − di, since the difference of two adjacent weights is wi − wi+1 = 1; as 0 < |di+1 − di| < 11, this change cannot be a multiple of 11. We conclude that the ISBN code detects any single digit in error and any transposition of adjacent digits.
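
The ISBN-10 computation above can be checked with a short program. The following Python sketch (illustrative; the function names and the deliberately corrupted ISBN are ours) computes the check digit of the example and verifies that a transposition of adjacent digits is detected.

    # Sketch of the ISBN-10 check described above.
    def isbn10_check_digit(digits9):
        """Compute d10 from the first nine digits d1..d9."""
        A = sum(w * d for w, d in zip(range(10, 1, -1), digits9)) % 11
        d10 = (11 - A) % 11
        return 'X' if d10 == 10 else str(d10)

    def isbn10_is_valid(isbn):
        """Check that 10*d1 + 9*d2 + ... + 2*d9 + 1*d10 = 0 mod 11."""
        digits = [10 if c == 'X' else int(c) for c in isbn if c not in '- ']
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0

    print(isbn10_check_digit([0, 4, 7, 1, 4, 3, 9, 6, 2]))  # '2', as in the example
    print(isbn10_is_valid('0-471-43962-2'))                  # True
    print(isbn10_is_valid('0-471-43926-2'))                  # False: transposed adjacent digits detected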

Assume that we wish to transmit through a binary channel a text from a source A with an alphabet of 32 letters. We can encode each one of the 32 = 2^5 letters of the alphabet as a 5-bit binary string and we can rewrite the entire text in the binary alphabet accepted by the channel. This strategy does not support either detection or correction of transmission errors because all possible 5-bit combinations are exhausted to encode the 32 letters of the alphabet; any single bit error will transform a valid letter into another valid letter. However, if we use more than five bits to represent each letter of the alphabet, we have a chance to correct a number of single-bit errors.

Adding redundancy to a message increases the amount of data transmitted through a communication channel. The ratio of useful information to the total information packed in each codeword defines the rate of a code; see Section 4.2.

Example. We now consider a very simple error correcting code, a “repetitive” code. Instead of transmitting a single binary symbol, 0 or 1, we map each bit into a string of n binary symbols:

0 ↦ (00 . . . 0) and 1 ↦ (11 . . . 1).

If n = 2k + 1 we use a majority voting scheme to decode a codeword of n bits. We count the number of 0s; if this number is larger than k we decode the codeword as 0, else we decode it as 1.

Consider a binary symmetric channel with random errors occurring with probability p and call $p^n_{err}$ the probability that an n-bit codeword of this repetitive code is in error; then

$$p^n_{err} = 1 - (1-p)^{k+1}\left[(1-p)^k + \binom{2k+1}{1}p(1-p)^{k-1} + \cdots + \binom{2k+1}{k}p^k(1-p)^0\right].$$

Indeed, the probability of error is the probability that (k + 1) or more bits are in error, which in turn is equal to one minus the probability that at most k bits are in error.

As we can see, the larger k is, the smaller the probability of error; this encoding scheme is better than sending a single bit as long as $p^n_{err} < p$. For example, when n = 3 and k = 1 we encode a single bit as follows: 0 ↦ (000) and 1 ↦ (111). Then the probability of error is

$$p^3_{err} = 1 - (1-p)^2\left[(1-p) + \binom{3}{1}p\right] = 3p^2 - 2p^3.$$

The encoding scheme increases the reliability of the transmission when p < 1/2.
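
A minimal sketch of the repetitive code, its majority-vote decoder, and the error probability follows (illustrative Python; the choice p = 0.1 and the function names are ours).

    # Sketch of the repetitive code with majority-vote decoding.
    from math import comb

    def encode(bit, n):
        return [bit] * n                       # 0 -> (00...0), 1 -> (11...1)

    def decode(word):
        k = (len(word) - 1) // 2
        return 0 if word.count(0) > k else 1   # majority vote

    def p_err(n, p):
        """Probability that more than k = (n-1)/2 of the n bits are flipped."""
        k = (n - 1) // 2
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

    p = 0.1
    for n in (1, 3, 5, 7):
        print(n, p_err(n, p))
    # 0.1, 0.028, 0.00856, 0.0027..., decreasing with n as long as p < 1/2

    print(decode([1, 0, 1]))                   # 1: a single flipped bit is corrected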

4.2 Block Codes. Decoding Policies

In this section we discuss information transmission through a classical channel using codewords of fixed length. The channel alphabet, A, is a set of q symbols that could be transmitted through the communication channel; when q = 2 we have the binary alphabet A = {0, 1}. The source encoder maps a message into blocks of k symbols from the code alphabet; then the channel encoder adds r symbols to increase redundancy and transmits n-tuples, or codewords, of length n = k + r, as shown in Figure 74.

At the destination, the channel decoder maps n-tuples into k-tuples, and the decoder reconstructs the message. The quantity r = n − k > 0 is called the redundancy of the code. The channel encoder adds redundancy, and this leads to the expansion of a message. While the added redundancy is desirable from the point of view of error control, it decreases the efficiency of the communication channel by reducing its effective capacity. The ratio k/n measures the efficiency of a code. First, we give several definitions related to block codes and then discuss decoding policies.

A block code C of length n over the alphabet A with q symbols is a set of M n-tuples, where each n-tuple (codeword) uses symbols from A. Such a block code is also called an [n, M] code over A. The codewords of an [n, M] code C over A form a subset of size M of all the q^n n-tuples over A, with M < q^n. Given an [n, M] code C that encodes k-tuples into n-tuples, the rate of the code is R = k/n.

We now introduce several metrics necessary to characterize the error detecting and the error correcting properties of a code C with symbols from the alphabet A.

Given two n-tuples vi, vj, the Hamming distance d(vi, vj) is the number of coordinate positions in which the two n-tuples differ. The Hamming distance is a metric; indeed, ∀vi, vj, vk ∈ C we have:

1. d(vi, vj) ≥ 0, with equality if and only if vi = vj.


2. d(vi, vj) = d(vj, vi).

3. d(vi, vj) + d(vj, vk) ≥ d(vi, vk) (triangle inequality).

The proof of this proposition is left as an exercise.

The Hamming distance of a code is the minimum distance between any pair of codewords

d = min{d(ci, cj) : ∀ci, cj ∈ C, ci ≠ cj}.

To compute the Hamming distance of an [n, M] code C, it is necessary to compute the distance between $\binom{M}{2}$ pairs of codewords and then to find the pair with the minimum distance. A block code with block length n, with k information symbols, and with distance d will be denoted as [n, k, d]; if the alphabet A has q symbols then M = q^k.

Given a codeword c ∈ C, the Hamming weight w(c) is the number of non-zero coordinates of the codeword c. Let ci, 1 ≤ i ≤ M, be the codewords of C, an [n, M] code, and let S be the set of all n-tuples over the alphabet of C; the Hamming sphere of radius d around the codeword ci consists of all n-tuples c within distance d of the codeword ci, the center of the Hamming sphere:

$S_{c_i} = \{c \in S : d(c, c_i) \le d\}.$

Figure 97 shows the two Hamming spheres of radius 1 around the 3-tuples 000 and 111.

Figure 97: The Hamming sphere of radius 1 around 000 contains {000, 100, 010, 001} and the Hamming sphere of radius 1 around 111 contains {111, 011, 101, 110}.

Example. The Hamming distance of two codewords. Consider the binary alphabet {0, 1} and let the two codewords be vi = (010110) and vj = (011011). The Hamming distance between the two codewords is d(vi, vj) = 3. Indeed, if we number the bit positions in each n-tuple from left to right as 1 to 6, the two n-tuples differ in bit positions 3, 4, and 6.

Example. The Hamming distance of a code. Consider the code C = {c0, c1, c2, c3} where c0 = (000000), c1 = (101101), c2 = (010110), c3 = (111011). This code has distance d = 3. Indeed, d(c0, c1) = 4, d(c0, c2) = 3, d(c0, c3) = 5, d(c1, c2) = 5, d(c1, c3) = 3, d(c2, c3) = 4.
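
Both examples are easy to reproduce. The Python sketch below (illustrative; names are ours) computes the Hamming distance of two n-tuples and the distance of a code as the minimum over all pairs of codewords.

    # Sketch for the Hamming distance examples above.
    from itertools import combinations

    def hamming(u, v):
        """Number of coordinate positions in which u and v differ."""
        return sum(a != b for a, b in zip(u, v))

    def code_distance(code):
        """Minimum Hamming distance over all pairs of distinct codewords."""
        return min(hamming(u, v) for u, v in combinations(code, 2))

    print(hamming('010110', '011011'))                       # 3
    C = ['000000', '101101', '010110', '111011']
    print(code_distance(C))                                  # 3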

Example. The Hamming sphere of radius d = 1 around the codeword (000000) consists of the center, (000000), and all binary six-tuples that differ from (000000) in exactly one bit position, {(000000), (100000), (010000), (001000), (000100), (000010), (000001)}. In this case one can construct a diagram similar to the one in Figure 97 where the 6-tuples are the vertices of a hypercube in a space with 6 dimensions.

Decoding policy. Given an [n, M] code C with distance d, the decoding policy tells the channel decoder what actions to take when it receives an n-tuple v. This policy consists of two phases:

(1) The recognition phase, when the received n-tuple v is compared with all the codewords of the code until either a match is found, or we decide that v is not a codeword.

(2) The error correction phase, when the received n-tuple v is mapped into a codeword.

The actions taken by the channel decoder are:

(i) If v is a codeword, conclude that no errors have occurred and decide that the codeword sent was c = v.

(ii) If v is not a codeword, conclude that errors have occurred and either correct v to a codeword c, or declare that correction is not possible.

We observe that any decoding policy fails when c ∈ C, the codeword sent, is affected by errors and transformed into another valid codeword, c′ ∈ C. This is a fundamental problem with error detection and we shall return to it later. Once we accept the possibility that the channel decoder may fail to properly decode in some cases, our goal is to take the course of action with the highest probability of being correct. Now we discuss three decoding policies.

I. Minimum distance or nearest neighbor decoding. If an n-tuple v is received, and there is a unique codeword c ∈ C such that d(v, c) is the minimum over all codewords of C, then correct v as the codeword c. If no such c exists, report that errors have been detected, but no correction is possible. If multiple codewords are at the same minimum distance from the received n-tuple, select one of them at random and decode v as that codeword.

II. Bounded distance decoding. This decoding policy requires that all patterns of at most e errors, and no other errors, are corrected. When bounded distance decoding is used and more than e errors occur, an n-tuple v is either not decoded or decoded to a wrong codeword; the first case will be called a decoding failure and the second one a decoding error. It is easy to see that if

e = ⌊(d − 1)/2⌋

bounded distance decoding is identical to minimum distance decoding.

Recall from Section 3.7 that in the case of a binary symmetric channel the errors occur with probability p and are mutually independent. Given an [n, k] linear code operating on a binary symmetric channel, the probabilities of decoding failure, Pfail, and of decoding error, Perr, when up to e errors are corrected, are

$$P_{fail} = 1 - \sum_{i=0}^{e}\binom{n}{i}p^i(1-p)^{n-i} = \sum_{i=e+1}^{n}\binom{n}{i}p^i(1-p)^{n-i}$$

and

$$P_{err} = \sum_{w>0}\sum_{i=w-e}^{w+e}\sum_{j=0}^{e} A_w\, T(i,j,w)\, p^i (1-p)^{n-i}.$$

In this expression Aw is the number of codewords of the code C of weight equal to w and T(i, j, w) is the number of n-tuples at distance i from the transmitted codeword and at distance j from the decoded codeword of weight w; e is the number of errors corrected by the decoder.

It is easy to prove that, given a binary [n, k] linear code C, T(i, j, w) is given by the following expression:

$$T(i,j,w) = \binom{w}{i-k}\binom{n-w}{k}$$

with k = (i + j − w)/2, when (i + j − w) is even and w − j ≤ i ≤ w + j; otherwise T(i, j, w) = 0.
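
The expression for Pfail is straightforward to evaluate. The sketch below (illustrative Python; the choice n = 7 and e = 1, corresponding to a single-error-correcting code such as the [7, 4, 3] Hamming code, is ours) computes the probability that more than e errors occur on a binary symmetric channel.

    # Sketch: probability of decoding failure for bounded distance decoding on a BSC.
    from math import comb

    def p_fail(n, e, p):
        """P(more than e of the n bits are in error) = 1 - sum_{i<=e} C(n,i) p^i (1-p)^(n-i)."""
        return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(e + 1))

    # Example: n = 7, e = 1 (a single-error-correcting code)
    for p in (0.01, 0.05, 0.1):
        print(p, p_fail(7, 1, p))
    # about 0.002, 0.044, and 0.15, respectively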

III. Maximum likelihood decoding. Under this decoding policy, of all possible codewords c ∈ C the n-tuple v is decoded to that codeword c which maximizes the probability P(v, c) that v is received, given that c is sent.

Proposition. If v is received at the output of a q-ary symmetric channel and if d1 ≤ d2 with d1 = d(v, c1) and d2 = d(v, c2), then

P(v, c1) ≥ P(v, c2) if and only if p < (q − 1)/q.

In other words, if p < (q − 1)/q the maximum likelihood decoding policy is equivalent to the nearest neighbor decoding; a received vector is decoded to the codeword “closest” to it in terms of Hamming distance.

Without loss of generality we assume that d1 ≤ d2 and that

P (v, c1) > P (v, c2).

It follows that

$$(1-p)^{n-d_1}\left(\frac{p}{q-1}\right)^{d_1} > (1-p)^{n-d_2}\left(\frac{p}{q-1}\right)^{d_2},$$

$$(1-p)^{d_2-d_1} > \left(\frac{p}{q-1}\right)^{d_2-d_1}$$

and

$$\left(\frac{p}{(1-p)(q-1)}\right)^{d_2-d_1} < 1.$$

If d1 = d2, this is false, and in fact P(v, c1) = P(v, c2). Otherwise, d2 − d1 ≥ 1 and the inequality is true if and only if

$$\frac{p}{(1-p)(q-1)} < 1, \quad\text{i.e.,}\quad p < \frac{q-1}{q}.$$

In conclusion, when the probability of error is p < (q − 1)/q and we receive the n-tuple v, we decide that the codeword c at the minimum distance from v is the one sent by the source.

Example. Maximum likelihood decoding. Assume that the probability of error is p = 0.15. Let the binary code C be C = {(000000), (101100), (010011), (111011)}. When we receive the 6-tuple v = (111111) we decode it as (111011) because the probability P(v, (111011)) is the largest. Indeed,

P(v, (000000)) = (0.15)^6 ≈ 0.000011

P(v, (101100)) = (0.15)^3 × (0.85)^3 ≈ 0.002073

P(v, (010011)) = (0.15)^3 × (0.85)^3 ≈ 0.002073

P(v, (111011)) = (0.15)^1 × (0.85)^5 ≈ 0.066556
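
The probabilities above follow from P(v, c) = p^d (1 − p)^(n−d) with d = d(v, c) on a binary symmetric channel. The Python sketch below (illustrative; names are ours) reproduces them and selects the most likely codeword.

    # Sketch of maximum likelihood decoding over a BSC with p = 0.15.
    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    def likelihood(v, c, p):
        """P(v received | c sent) on a BSC: p^d (1-p)^(n-d) with d = d(v, c)."""
        d = hamming(v, c)
        return p**d * (1 - p)**(len(v) - d)

    p = 0.15
    C = ['000000', '101100', '010011', '111011']
    v = '111111'
    for c in C:
        print(c, round(likelihood(v, c, p), 6))
    # 000000 0.000011, 101100 0.002073, 010011 0.002073, 111011 0.066556
    print(max(C, key=lambda c: likelihood(v, c, p)))   # 111011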

A code is capable of correcting e errors if the channel decoder is capable of correcting any pattern of e or fewer errors, using the algorithm above.

4.3 Error Correcting and Detecting Capabilities of a Block Code

The intuition behind error detection is to use a small subset of the set of n-tuples over an alphabet A as codewords of a code C and detect an error whenever the n-tuple received is not a codeword. To detect errors we choose the codewords of C so that the Hamming distance between any pair of codewords is large enough and one valid codeword cannot be transformed by errors into another valid codeword. The Hamming distance of a code allows us to accurately express the capabilities of a code to detect and to correct errors. Several propositions quantify the error detection and error correction properties of a code.

Proposition. Let C be an [n, k, d] code with an odd distance, d = 2e + 1. Then C can correct e errors and can detect 2e errors.

Proof. We first show that $S^{(e)}_{c_i}$ and $S^{(e)}_{c_j}$, the Hamming spheres of radius e around two distinct codewords ci ≠ cj ∈ C, are disjoint, $S^{(e)}_{c_i} \cap S^{(e)}_{c_j} = \emptyset$. Assume by contradiction that there exists an n-tuple v with the property that $v \in S^{(e)}_{c_i} \cap S^{(e)}_{c_j}$, that is, d(v, ci) ≤ e and d(v, cj) ≤ e. The triangle inequality requires that

d(ci, v) + d(v, cj) ≥ d(ci, cj) =⇒ 2e ≥ d(ci, cj).

But the distance of the code is d = 2e + 1, so d(ci, cj) ≥ 2e + 1. This is a contradiction and we conclude that $S^{(e)}_{c_i} \cap S^{(e)}_{c_j} = \emptyset$.

Therefore, when the codeword ci is transmitted and t ≤ e errors are introduced, the received n-tuple v is inside the Hamming sphere $S^{(e)}_{c_i}$ and ci is the unique codeword closest to v. The decoder can always correct any error pattern of this type.

If we use the code C for error detection, then at least 2e + 1 errors must occur in a codeword c ∈ C to transform it into another codeword c′ ∈ C. If at least 1 and at most 2e errors are introduced, the received n-tuple will never be a codeword and error detection is always possible. □

The distance of a code may be even or odd. The case of even distance is proved in a similar manner. Throughout this chapter we use the traditional notation ⌊a⌋ for the largest integer smaller than or equal to a.

Proposition. Let C be an [n, k, d] code. Then C can correct ⌊(d − 1)/2⌋ errors and can detect d − 1 errors.

Figure 98: A code C = {c1, c2, c3, c4, . . .} with minimum distance d = 2e + 1 is used for error detection as well as for error correction. The Hamming spheres Sc1, Sc2, Sc3, and Sc4 are centered around the codewords c1, c2, c3, and c4, respectively. The decoder identifies cu, cv, and cw as errors; one of them is corrected, the other two are not. cu, received when c1 was sent, is correctly decoded as c1 because it is located inside the Hamming sphere around codeword c1, d(c1, cu) < e. The decoder fails to decode cv, received when c2 was sent, because d(c2, cv) > e and cv is not inside any of the Hamming spheres. The decoder incorrectly decodes cw, received when c3 was sent, as c4 because d(c3, cw) = e + 1 and d(c4, cw) ≤ e.

A geometric interpretation of this proposition is illustrated in Figure 98. The [n, k, d] code C = {c1, c2, c3, c4, . . .} with distance d = 2e + 1 is used for error correction as well as for error detection. The Hamming spheres of radius e around all codewords do not intersect because the minimum distance between any pair of codewords is 2e + 1. According to the channel decoding rules, any n-tuple t in the Hamming sphere of radius e around codeword c1 is decoded as the center of the Hamming sphere, c1. If at most 2e errors occur when the codeword ci is transmitted, the received n-tuple v cannot be masquerading as a valid codeword cj since the distance between ci and cj is at least 2e + 1.

From Figure 98 we see that there are n-tuples that are not contained in any of the Hamming spheres around the codewords of a code C. If an n-tuple v is received and the decoder cannot place it in any of the Hamming spheres, then the decoder knows that at least e + 1 errors have occurred and no correction is possible.

If a code C with distance 2e + 1 is used for error detection as well as error correction, some patterns of fewer than 2e errors could escape detection. For example, patterns of e + 1 errors that transform a codeword c3 into an n-tuple cw inside the Hamming sphere Sc4 force the decoder to correct the received n-tuple to c4. In this case the (e + 1)-error pattern causes a faulty decoding.

In conclusion, when d = 2e + 1 the code C can correct e errors in general, but is unable to detect additional errors at the same time. If the distance of the code is an even number, the situation changes slightly according to the following proposition.

Proposition. Let C be an [n, k, d] code with d = 2e. Then C can correct e − 1 errors and, simultaneously, detect e errors.

Proof. According to a previous proposition the code C can correct up to

$$\left\lfloor\frac{d-1}{2}\right\rfloor = \left\lfloor\frac{2e-1}{2}\right\rfloor = \left\lfloor e - \frac{1}{2}\right\rfloor = e - 1$$

errors. Since the Hamming spheres around codewords have radius e − 1, any pattern of e errors cannot take a codeword into an n-tuple contained in some Hamming sphere around another codeword. Otherwise, the codewords at the centers of these two Hamming spheres would have distance at most e + e − 1 = 2e − 1, which is impossible since d = 2e. Hence, a received word obtained from a codeword by introducing e errors cannot lie in any codeword Hamming sphere and the decoder can detect the occurrence of errors. The decoder cannot detect e + 1 errors in general, since a vector at distance e + 1 from a given codeword may be in the Hamming sphere of another codeword, and the decoder would erroneously correct such a vector to the codeword at the center of the second Hamming sphere. □

To evaluate the reliability of the nearest neighbor decoding strategy we have to compute the probability that a codeword c sent over the channel is decoded correctly. Let us consider an [n, k, d] code C over an alphabet A with q symbols, |A| = q. Assume a probability of error p such that 0 ≤ p ≤ (q − 1)/q.

When the codeword c is sent, the received n-tuple v will be decoded correctly if it is in the Hamming sphere Sc of radius e = ⌊(d − 1)/2⌋ about c. The probability of this event is

$$\sum_{v\in S_c} P(v, c) = \sum_{i=0}^{e}\binom{n}{i}p^i(1-p)^{n-i}.$$

Indeed, the probability of receiving an n-tuple with i positions in error is $\binom{n}{i}p^i(1-p)^{n-i}$ and i takes values in the range from 0 (no errors) to a maximum of e errors. This expression gives a lower bound on the probability that a transmitted codeword is correctly decoded.

Example. Decoding rule. Consider the code C = {c1, c2, c3} where

c1 = (101100) c2 = (010111) c3 = (111011).

The code C has distance d = 3, and hence can correct 1 error. The set S of all possible words over the alphabet {0, 1} consists of all possible binary 6-tuples, hence |S| = 64. Let us now construct the three Hamming spheres of radius 1 about each codeword:

Sc1 = {(101100), (001100), (111100), (100100), (101000), (101110), (101101)},
Sc2 = {(010111), (110111), (000111), (011111), (010011), (010101), (010110)},
Sc3 = {(111011), (011011), (101011), (110011), (111111), (111001), (111010)}.

The Hamming spheres cover 21 of the 64 possible 6-tuples in S. Let S∗ be the set of 6-tuples that are not included in any Hamming sphere; then |S∗| = 64 − 21 = 43. When the decoder receives the binary 6-tuple v = (000111) the distance to each codeword is computed as follows:

d(c1, v) = 4, d(c2, v) = 1, d(c3, v) = 4.

According to the minimum distance rule v is decoded as c2; v lies in the Hamming sphere Sc2 .
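
The sketch below (illustrative Python; names are ours) reproduces this example: it builds the three Hamming spheres of radius 1, counts the covered 6-tuples, and decodes v = (000111) by minimum distance.

    # Sketch of the decoding-rule example above.
    from itertools import product

    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    C = ['101100', '010111', '111011']
    S = [''.join(bits) for bits in product('01', repeat=6)]      # all 64 binary 6-tuples

    spheres = {c: {w for w in S if hamming(w, c) <= 1} for c in C}
    covered = set.union(*spheres.values())
    print(len(S), len(covered), len(S) - len(covered))           # 64 21 43

    def min_distance_decode(v):
        return min(C, key=lambda c: hamming(v, c))

    print(min_distance_decode('000111'))                         # 010111 (= c2)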

4.4 Algebraic Structures and Coding Theory

A brief excursion into algebraic structures used by coding theory will help us follow the presentation of the codes discussed in this chapter, as well as the quantum error correcting codes presented in Chapter 5. First, we review basic concepts regarding the algebraic structures used in this chapter and in the next one.

A group consists of a set G equipped with a binary operation ◦ : (G × G) ↦ G, an inverse map −1 : G ↦ G, and an identity element e ∈ G so that the following axioms hold:

1. ∀(gi, gj) ∈ G (gi ◦ gj) ∈ G.

2. ∀(gi, gj, gk) ∈ G (gi ◦ gj) ◦ gk = gi ◦ (gj ◦ gk).

3. ∀gi ∈ G gi ◦ e = gi.

4. ∀gi ∈ G ∃ gi⁻¹ ∈ G such that gi ◦ gi⁻¹ = e.

G is an Abelian or commutative group if an additional axiom is satisfied: ∀(gi, gj) ∈ G gi ◦ gj = gj ◦ gi. The group is either an additive group, when the operation (◦) is addition denoted as (+), or a multiplicative group, when the operation is multiplication denoted as (·). A subset G′ is called a subgroup of G if g1, g2 ∈ G′ implies g1 ◦ g2⁻¹ ∈ G′. The number of distinct elements of a group is called the order of the group.


Example. The set of rotational symmetries of an ordinary sphere. This symmetry group has an infinite number of elements since we can rotate a sphere through any angle about any direction in 3-dimensional space. The group is called SO(3) [324].

A subgroup G′ ⊂ G is a normal subgroup of G if each element of the group G commutes with the normal subgroup G′:

G′ ◦ g = g ◦ G′, ∀g ∈ G.

A set R with two binary operations, addition (+) and multiplication (·), and a distinct element 0 ∈ R, is a commutative ring with unity, or simply a ring, if the following axioms are satisfied:

1. (R,+) is an Abelian group with the identity element 0.

2. Multiplication is commutative: ∀(a, b) ∈ R a · b = b · a.

3. Identity/unity element for multiplication: ∃ 1 ∈ R such that ∀a ∈ R 1 · a = a · 1 = a.

4. Multiplication is associative: ∀(a, b, c) ∈ R (a · b) · c = a · (b · c).

5. Multiplication is distributive over addition: ∀(a, b, c) ∈ R (a + b) · c = a · c + b · c.

Note that the elements of a ring need not have multiplicative inverses.

Example. The set of integers with integer addition and integer multiplication, (Z, +, ·). This set forms a ring, denoted as Z, with an infinite number of elements.

An algebraic structure (F, +, ·), consisting of a set F, two binary operations, addition (+) and multiplication (·), and two distinct elements, 0 and 1, is a field if the following axioms are satisfied:

1. ∀(a, b) ∈ F a + b = b + a.

2. ∀(a, b, c) ∈ F (a + b) + c = a + (b + c).

3. ∀a ∈ F a + 0 = a.

4. ∀a ∈ F ∃(−a) a + (−a) = 0.

5. ∀(a, b) ∈ F a · b = b · a.

6. ∀(a, b, c) ∈ F (a · b) · c = a · (b · c).

7. ∀a ∈ F a · 1 = a.

8. ∀a ∈ F, a ≠ 0, ∃ a⁻¹ such that a · a⁻¹ = 1.

9. ∀(a, b, c) ∈ F a · (b + c) = a · b + a · c.


A field F is at the same time an Abelian additive group, (F, +), and its non-zero elements form an Abelian multiplicative group, (F − {0}, ·).

A field F with a finite number of elements q is called a finite or Galois field, denoted as GF(q); it is named after the French mathematician Évariste Galois.

We now discuss the congruence relationship and show that it allows us to construct finite algebraic structures consisting of equivalence classes. Given a positive integer p ∈ Z, the congruence relation among integers is defined as

∀(m, n) ∈ Z m ≡ n mod p ⟺ (m − ⌊m/p⌋ · p) = (n − ⌊n/p⌋ · p).

The congruence is an equivalence relation R: it is reflexive, aRa; symmetric, aRb ⇒ bRa; and transitive, [aRb and bRc] ⇒ aRc. It partitions the set into equivalence classes. For example, the set of integers modulo 3 is partitioned into three equivalence classes:

[0] = {0, 3, 6, 9, . . .}, the integers divisible by 3;
[1] = {1, 4, 7, 10, . . .}, the integers with a remainder of 1 when divided by 3; and
[2] = {2, 5, 8, 11, . . .}, the integers with a remainder of 2 when divided by 3.

Proposition. The set of integers modulo p, Zp, with the standard integer addition and integer multiplication is a finite field if and only if p is a prime number.

Proof: To prove that the condition is sufficient let us assume that p is prime. First, we observe that Zp has a finite number of elements, 0, 1, 2, . . . , (p − 1). Then we verify all the properties or axioms of a field. Here we only show that given 0 < a < p the multiplicative inverse a⁻¹ exists. We have gcd(p, a) = 1 because p is prime; then, according to Euclid's algorithm, there exist integers s and t such that s · p + t · a = 1. But for all integers s we have s · p ≡ 0 mod p; this implies that t · a ≡ 1 mod p, or t = a⁻¹ mod p. Thus, if p is prime, then Zp is a finite field.

To prove that the condition is necessary we start with the observation that if p is not a prime, then it is a product of two terms, p = a · b with a, b ≠ 0 mod p. We have to show that b has no multiplicative inverse in Zp, in other words that there is no c such that b = c⁻¹, or b · c ≡ 1 mod p. If such a c exists then

0 ≡ p · c = (a · b) · c = a · (b · c) ≡ a · 1 = a mod p.

But a ≡ 0 mod p contradicts the assumption that a ≠ 0 mod p. □
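
The constructive step of the proof, finding a⁻¹ from s · p + t · a = 1, is the extended Euclidean algorithm. The Python sketch below (illustrative; names are ours) computes inverses modulo a prime and shows that the computation fails in Z9.

    # Sketch: multiplicative inverses in Zp via the extended Euclidean algorithm.
    def extended_gcd(a, b):
        """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        return g, y, x - (a // b) * y

    def inverse_mod(a, p):
        g, x, _ = extended_gcd(a, p)
        if g != 1:
            raise ValueError(f"{a} has no inverse modulo {p}")
        return x % p

    print(inverse_mod(2, 5))    # 3, since 2*3 = 6 = 1 mod 5
    print(inverse_mod(7, 11))   # 8, since 7*8 = 56 = 1 mod 11
    try:
        inverse_mod(3, 9)
    except ValueError as err:
        print(err)              # 3 has no inverse modulo 9: Z9 is not a field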

Example. The finite field of integers modulo 5, denoted as Z5 or GF(5): Z5 consists of five equivalence classes of integers modulo 5, namely [0], [1], [2], [3], and [4], corresponding to the remainders 0, 1, 2, 3, and 4, respectively:

[0] = {0, 5, 10, 15, 20, 25, 30, . . .}
[1] = {1, 6, 11, 16, 21, 26, 31, . . .}
[2] = {2, 7, 12, 17, 22, 27, 32, . . .}
[3] = {3, 8, 13, 18, 23, 28, 33, . . .}
[4] = {4, 9, 14, 19, 24, 29, 34, . . .}

The addition and multiplication tables of Z5 are


     +  | [0] [1] [2] [3] [4]
    [0] | [0] [1] [2] [3] [4]
    [1] | [1] [2] [3] [4] [0]
    [2] | [2] [3] [4] [0] [1]
    [3] | [3] [4] [0] [1] [2]
    [4] | [4] [0] [1] [2] [3]

and

     ·  | [0] [1] [2] [3] [4]
    [0] | [0] [0] [0] [0] [0]
    [1] | [0] [1] [2] [3] [4]
    [2] | [0] [2] [4] [1] [3]
    [3] | [0] [3] [1] [4] [2]
    [4] | [0] [4] [3] [2] [1]

From these tables we see that [3] is the multiplicative inverse of [2] (indeed, [3] · [2] = [1]) and that [1] and [4] are their own multiplicative inverses ([1] · [1] = [1] and [4] · [4] = [1]).

Z5, Z7, Z11, and Z13 are finite fields but Z9 and Z15 are not, because 9 and 15 are not prime numbers. Indeed, the set Z9 is finite and consists of the following elements: [0], [1], [2], [3], [4], [5], [6], [7], and [8]. The multiplication table of Z9 is

     ·  | [0] [1] [2] [3] [4] [5] [6] [7] [8]
    [0] | [0] [0] [0] [0] [0] [0] [0] [0] [0]
    [1] | [0] [1] [2] [3] [4] [5] [6] [7] [8]
    [2] | [0] [2] [4] [6] [8] [1] [3] [5] [7]
    [3] | [0] [3] [6] [0] [3] [6] [0] [3] [6]
    [4] | [0] [4] [8] [3] [7] [2] [6] [1] [5]
    [5] | [0] [5] [1] [6] [2] [7] [3] [8] [4]
    [6] | [0] [6] [3] [0] [6] [3] [0] [6] [3]
    [7] | [0] [7] [5] [3] [1] [8] [6] [4] [2]
    [8] | [0] [8] [7] [6] [5] [4] [3] [2] [1]

We see immediately that Z9 is not a field because some non-zero elements, e.g., [3] and [6], do not have a multiplicative inverse. The multiplicative inverse of [2] is [5] (indeed, [2] · [5] = [1]), the one of [7] is [4] (indeed, [7] · [4] = [1]), and the one of [1] is [1].
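
The multiplication tables and the invertible elements of Zn are easy to generate. The Python sketch below (illustrative; names are ours) prints the Z5 table and lists the elements of Z5, Z9, and Z15 that have no multiplicative inverse.

    # Sketch: multiplication table of Z_n and its non-invertible elements.
    from math import gcd

    def mult_table(n):
        return [[(a * b) % n for b in range(n)] for a in range(n)]

    def non_invertible(n):
        return [a for a in range(1, n) if gcd(a, n) != 1]

    for row in mult_table(5):
        print(row)                 # the Z5 multiplication table above

    print(non_invertible(5))       # []                 -> Z5 is a field
    print(non_invertible(9))       # [3, 6]             -> Z9 is not a field
    print(non_invertible(15))      # [3, 5, 6, 9, 10, 12]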

It is easy to prove that a finite field GF(q) with q = p^n and p a prime number has p^n elements. The finite field GF(q) can be considered a vector space V over Zp. GF(q) is a finite field, thus the vector space V is finite dimensional. If n represents the number of dimensions of V, there are n elements v1, v2, . . . , vn ∈ V which form a basis for V; every element v ∈ V can be written as

$$v = \sum_{i=1}^{n}\lambda_i v_i, \quad \lambda_i \in Z_p.$$

Example. Consider p = 2, the finite field of binary numbers, Z2. For n ≥ 2 we construct the set of all 2^n polynomials with binary coefficients:

Z2[x] = {0, 1, x, 1 + x, 1 + x^2, x^2, x + x^2, 1 + x + x^2, x^3, 1 + x^3, 1 + x + x^3, 1 + x^2 + x^3, 1 + x + x^2 + x^3, x + x^3, x^2 + x^3, . . .}.

Given a finite field with q elements, there is an element β ∈ GF(q), called a primitive element or characteristic element of the finite field, such that the set of nonzero elements of GF(q) is

GF(q) − {0} = {β^1, β^2, . . . , β^(q−2), β^(q−1)} with β^(q−1) = 1.


Multiplication of the non-zero elements of a finite field with q elements, GF(q), can be carried out as addition of exponents modulo (q − 1). Indeed, if β is a primitive element of GF(q) then any two non-zero elements can be expressed as powers of β, a = β^i and b = β^j. Then

a · b = β^i · β^j = β^((i+j) mod (q−1)).

Example. Table 11 displays the 4 non-zero elements of GF(5) as powers of the primitive element β = [3].

Table 11: The non-zero elements of GF(5) expressed as a power of β = [3].

    Element of GF(5) | As a power of β | Justification
    [3]              | β^1             | 3 mod 5 = 3
    [4]              | β^2             | 3^2 mod 5 = 9 mod 5 = 4
    [2]              | β^3             | 3^3 mod 5 = 27 mod 5 = 2
    [1]              | β^4             | 3^4 mod 5 = 81 mod 5 = 1

The product of two elements of GF(5) can be expressed as a power of the primitive element. For example, [2] · [4] = [3]^3 · [3]^2 = [3]^(5 mod 4) = [3]; indeed, (2 · 4) mod 5 = 8 mod 5 = 3.

Let GF(q) be a finite field and a ∈ GF(q), a ≠ 0; the order of the element a, ord(a), is the smallest integer s such that a^s = 1. The set {a, a^2, . . .} is finite, thus there are two positive integers i > j such that a^i = a^j. This implies that a^(i−j) = 1; in turn, this means that there is a smallest s such that a^s = 1, so the order is well defined.

By analogy with the ring of integers modulo a positive integer p, we can use the congruence relation to construct equivalence classes of polynomials. The set of polynomials q(x) in x with coefficients from a field F forms a ring with an infinite number of elements, denoted as F[x]. If f(x) ∈ F[x] is a non-zero polynomial, we define the equivalence class [g(x)] containing the polynomial g(x) ∈ F[x] as the set of polynomials which produce the same remainder as g(x) when divided by f(x).

Let F[x] denote the set of polynomials in the variable x with coefficients from a finite field F and let f(x), g(x) ∈ F[x] be non-zero polynomials; the equivalence class of polynomials [g(x)] relative to f(x) is defined as

[g(x)] = {q(x) : q(x) ∈ F[x], q(x) ≡ g(x) mod f(x)}.

Addition and multiplication of equivalence classes of polynomials are defined as

[g(x)] + [h(x)] = [g(x) + h(x)] and [g(x)] · [h(x)] = [g(x) · h(x)].

We can use the congruence relation to construct a finite ring of polynomials over F modulo the non-zero polynomial f(x). The finite ring of polynomials over F modulo the non-zero polynomial f(x), denoted by R = F[x]/(f(x)), is

R = F[x]/(f(x)) = {[g(x)] : g(x) ∈ F[x]}.


We are particularly interested in the special case f(x) = x^n − 1, when

x^n − 1 ≡ 0 mod f(x) =⇒ x^n ≡ 1 mod f(x).

Example. Addition and multiplication in Z2[x]/(x^4 + 1):

[1 + x^2 + x^3] + [1 + x + x^2 + x^3] = [1 + x^2 + x^3 + 1 + x + x^2 + x^3] = [x]

and

[1 + x] · [1 + x + x^2 + x^3] = [1 + x + x^2 + x^3 + x + x^2 + x^3 + x^4] = [1 + x^4] = [0].
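
The example can be reproduced with carry-less (modulo 2) polynomial arithmetic. In the Python sketch below (illustrative; the bit-vector representation and function names are ours) a polynomial is stored as an integer whose bit i is the coefficient of x^i.

    # Sketch of arithmetic in Z2[x]/(x^4 + 1); e.g. 0b1101 represents 1 + x^2 + x^3.
    def poly_mul_mod(a, b, f, deg_f):
        """Multiply two binary polynomials and reduce modulo f."""
        prod = 0
        while b:                                 # carry-less multiplication
            if b & 1:
                prod ^= a
            a <<= 1
            b >>= 1
        for shift in range(prod.bit_length() - deg_f, -1, -1):
            if prod & (1 << (deg_f + shift)):    # cancel the leading terms
                prod ^= f << shift
        return prod

    F = 0b10001                                  # f(x) = x^4 + 1
    # addition is the XOR of the coefficient vectors:
    print(bin(0b1101 ^ 0b1111))                  # [1+x^2+x^3] + [1+x+x^2+x^3] = [x] -> 0b10
    # multiplication modulo f(x):
    print(poly_mul_mod(0b11, 0b1111, F, 4))      # [1+x] * [1+x+x^2+x^3] = [0] -> 0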

An ideal I of a ring (R, +, ·) is a non-empty subset of R, I ⊂ R, I ≠ ∅, with two properties:

1. (I, +) is a group.

2. i · r ∈ I ∀i ∈ I and ∀r ∈ R.

The ideal generated by the polynomial g(x) ∈ R is the set of all multiples of g(x)

I = {g(x) · p(x), ∀p(x) ∈ R}.

A principal ideal ring (R, +, ·) is one in which for every ideal I ⊂ R there exists g(x) ∈ I such that I is the ideal generated by g(x).

It is easy to prove that F[x], the set of polynomials with coefficients from a field F, and F[x]/(q(x)), the finite ring of polynomials over the field F modulo the non-zero polynomial q(x), are principal ideal rings.

In summary, the congruence relationship allows us to construct finite algebraic structures consisting of equivalence classes; this process is illustrated in Figure 99. Given the set of integers we can construct the ring Zp of integers modulo p. Similarly, given a non-zero polynomial with coefficients in Z2 we construct 16 equivalence classes of polynomials modulo the polynomial f(x) = x^4 − 1:

[0] = the set of polynomials with remainder 0 when divided by f(x)
[1] = the set of polynomials with remainder 1 when divided by f(x)
[x] = the set of polynomials with remainder x when divided by f(x)
[1 + x] = the set of polynomials with remainder 1 + x when divided by f(x)
...
[x^2 + x^3] = the set of polynomials with remainder x^2 + x^3 when divided by f(x)
[x + x^2 + x^3] = the set of polynomials with remainder x + x^2 + x^3 when divided by f(x)

Figure 99: The congruence relationship allows us to construct finite algebraic structures. The integer congruence modulo p leads to a finite ring of integers when p is not a prime number, or to a finite field F when p is prime. The infinite ring of polynomials over the finite field F and the congruence modulo the polynomial f(x) allow us to construct a finite ring of polynomials, F[x]/f(x). When f(x) = x^n − 1 = Π_{i=1}^{q} g_i(x) we can construct ideals of polynomials generated by g_i(x). Such ideals of polynomials are equivalent to cyclic subspaces of a vector space over F, Vn(F), or cyclic codes, discussed in Section 4.11.

We now discuss briefly extension Galois fields GF(q) with q = p^m and p a prime number. The extension field GF(p^m) consists of p^m vectors with m components; each component is an element of GF(p). The vector a = (a0, a1, . . . , am−2, am−1), with ai ∈ GF(p), can also be represented as the polynomial a(x) = am−1 x^(m−1) + am−2 x^(m−2) + . . . + a1 x + a0.

The addition operation in GF(p^m) is defined componentwise. The multiplication is more intricate: if f(x) is an irreducible polynomial of degree m with coefficients in GF(p), then multiplication is defined as a(x) · b(x) mod f(x); thus, if a(x) · b(x) = q(x) · f(x) + r(x), then r(x) = rm−1 x^(m−1) + rm−2 x^(m−2) + . . . + r1 x + r0 is a polynomial of degree at most m − 1.

The identity element is I(x) = 1, or the m-tuple (1, 0, . . . , 0, 0). It is left as an exercise to prove that the axioms for a finite field are satisfied. For example, the 2^m elements of GF(2^m) are binary m-tuples. The addition is componentwise, modulo 2. If a, b ∈ GF(2^m) and

a = (a0, a1, . . . , am−2, am−1), b = (b0, b1, . . . , bm−2, bm−1), ai, bi ∈ GF (2), 0 ≤ i ≤ m − 1

then

a + b = [(a0 + b0) mod 2, (a1 + b1) mod 2, . . . , (am−2 + bm−2) mod 2, (am−1 + bm−1) mod 2].

An important property of an extension field GF(q^m): a polynomial g(x) irreducible over GF(q) may be factored over GF(q^m). For example, the polynomial g(x) = x^2 + x + 1 is irreducible over GF(2) but can be factored over GF(2^2):

g(x) = (x + α)(x + α^2)

with α a primitive element of GF(2^2) = {0, α^0, α^1, α^2} and g(α) = 0.

Example. Construct GF(2^4) using the irreducible polynomial f(x) = x^4 + x + 1.

First, we identify the primitive element α, which must satisfy the equation α^(q−1) = 1. In this case q = 2^4 = 16 and α = x; indeed, α^15 = x^15 = (x^4 + x + 1)(x^11 + x^8 + x^7 + x^5 + x^3 + x^2 + x + 1) + 1, thus α^15 = 1 mod (x^4 + x + 1).

We construct all powers of α = x, or the binary 4-tuple (0, 0, 1, 0). The first elements are α^2 = x^2, α^3 = x^3. Then α^4 mod (x^4 + x + 1) = x + 1. Indeed, the remainder when dividing x^4 by (x^4 + x + 1) is (x + 1), as x^4 = 1 · (x^4 + x + 1) + (x + 1). The next two elements are α^5 = x · x^4 = x(x + 1) = x^2 + x and α^6 = x · x^5 = x · (x^2 + x) = x^3 + x^2.

The other elements are: α^7 = x^3 + x + 1, α^8 = x^2 + 1, α^9 = x^3 + x, α^10 = x^2 + x + 1, α^11 = x^3 + x^2 + x, α^12 = x^3 + x^2 + x + 1, α^13 = x^3 + x^2 + 1, α^14 = x^3 + 1, and α^15 = 1.

Table 12: GF(2^4) construction when the irreducible polynomial and the primitive element are: (Left) f(x) = x^4 + x + 1 and α = x; (Right) h(x) = x^4 + x^3 + x^2 + x + 1 and β = x + 1.

    Vector | Primitive element | Polynomial          || Vector | Primitive element | Polynomial
    0000   | 0                 | 0                   || 0000   | 0                 | 0
    0010   | α^1               | x                   || 0011   | β^1               | x + 1
    0100   | α^2               | x^2                 || 0101   | β^2               | x^2 + 1
    1000   | α^3               | x^3                 || 1111   | β^3               | x^3 + x^2 + x + 1
    0011   | α^4               | x + 1               || 1110   | β^4               | x^3 + x^2 + x
    0110   | α^5               | x^2 + x             || 1101   | β^5               | x^3 + x^2 + 1
    1100   | α^6               | x^3 + x^2           || 1000   | β^6               | x^3
    1011   | α^7               | x^3 + x + 1         || 0111   | β^7               | x^2 + x + 1
    0101   | α^8               | x^2 + 1             || 1001   | β^8               | x^3 + 1
    1010   | α^9               | x^3 + x             || 0100   | β^9               | x^2
    0111   | α^10              | x^2 + x + 1         || 1100   | β^10              | x^3 + x^2
    1110   | α^11              | x^3 + x^2 + x       || 1011   | β^11              | x^3 + x + 1
    1111   | α^12              | x^3 + x^2 + x + 1   || 0010   | β^12              | x
    1101   | α^13              | x^3 + x^2 + 1       || 0110   | β^13              | x^2 + x
    1001   | α^14              | x^3 + 1             || 1010   | β^14              | x^3 + x
    0001   | α^15              | 1                   || 0001   | β^15              | 1

Example. Construct GF(2^4) using the irreducible polynomial h(x) = x^4 + x^3 + x^2 + x + 1.

The primitive element is β = x + 1. To show that β^15 = 1 we carry out a binomial expansion and a polynomial division and conclude that (x + 1)^15 = 1 mod (x^4 + x^3 + x^2 + x + 1).

Now β^2 = (x + 1) · (x + 1) = x^2 + 1. Then β^(k+1) = β^k (x + 1). For example, after computing β^5 = x^3 + x^2 + 1 we can compute β^6 = (x^3 + x^2 + 1) · (x + 1); as x^4 ≡ x^3 + x^2 + x + 1 mod h(x), it follows that β^6 = x^3.

Table 12 shows the correspondence between the 4-tuples, the powers of the primitive element, and the polynomials for the two examples.
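
The left half of Table 12 can be regenerated by repeatedly multiplying by x and reducing modulo f(x). The Python sketch below (illustrative; the bit representation and names are ours) lists the powers of α for f(x) = x^4 + x + 1.

    # Sketch rebuilding the powers of alpha = x in GF(2^4) with f(x) = x^4 + x + 1.
    def gf16_powers(f=0b10011, m=4):
        """Return alpha^1 ... alpha^15 as 4-bit integers (bit i = coefficient of x^i)."""
        powers, a = [], 0b0010                  # alpha = x
        for _ in range(2**m - 1):
            powers.append(a)
            a <<= 1                             # multiply by x
            if a & (1 << m):                    # degree reached m: reduce modulo f
                a ^= f
        return powers

    def to_poly(a):
        terms = [("1" if i == 0 else "x" if i == 1 else f"x^{i}")
                 for i in range(3, -1, -1) if a & (1 << i)]
        return " + ".join(terms) if terms else "0"

    for i, a in enumerate(gf16_powers(), start=1):
        print(f"alpha^{i:<2} {a:04b}  {to_poly(a)}")
    # alpha^4 = x + 1, alpha^7 = x^3 + x + 1, ..., alpha^15 = 1, matching Table 12.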

Example. Identify the primitive element of GF(3^2) constructed as the classes of remainders modulo the irreducible polynomial f(x) = x^2 + 1 over GF(3). This is another example where the primitive element is not x but α = x + 1. Indeed, the non-zero elements of the field are:

1 = α^0      2 = α^4      x = α^6      2x = α^2
1 + x = α^1  2 + x = α^7  1 + 2x = α^3  2 + 2x = α^5

The order of each of the eight non-zero elements of GF(3^2) is determined by its expression as a power of the primitive element, ord(α^j) = 8/gcd(8, j). For example, the order of 2 + x = α^7 is 8, and the order of x = α^6 is 4.

It is easy to see that if β ∈ GF(q) − {0} then β^(q−1) = 1. Observe that the multiplicative inverse of β is in GF(q) and let the (q − 1) distinct non-zero elements of GF(q) be a1, a2, . . . , aq−1. Then all the products β · a1, β · a2, . . . , β · aq−1 must be distinct. Indeed, if β · ai = β · aj, i ≠ j, then (β⁻¹ · β) · ai = (β⁻¹ · β) · aj, or ai = aj, which contradicts the fact that ai and aj are distinct. It follows immediately that the two products of the (q − 1) distinct non-zero elements of GF(q) must be the same

(β · a1) · (β · a2) · . . . · (β · aq−1) = a1 · a2 · . . . · aq−1, or β^(q−1) · (a1 · a2 · . . . · aq−1) = a1 · a2 · . . . · aq−1,

and this implies that β^(q−1) = 1. Another way of reading this equation is that any element β ∈ GF(q) satisfies the equation β^q − β = 0; this leads us to the conclusion that any element of GF(q) must be a root of the equation x^q − x = 0.

It is also easy to show that an element β ∈ GF(q^m) is in GF(q) if and only if β^q = β. A polynomial of degree n in x with coefficients in GF(q) has at most n roots in GF(q). The condition β^q = β shows that β is a root of the polynomial equation x^q − x = 0, which has as roots all the elements of GF(q). We conclude that β is in GF(q) if and only if β^q = β.

We now show that the orders of all non-zero elements of GF(q) divide (q − 1). Consider an element β ∈ GF(q) of order a = ord(β), thus β^a = 1. Let us assume that a does not divide (q − 1), so q − 1 = a · b + c with 1 ≤ c < a. We have shown earlier that for every non-zero β ∈ GF(q) we have the equality β^(q−1) = 1. Thus,

β^(q−1) = β^(a·b+c) = β^(a·b) · β^c = (β^a)^b · β^c = 1.

But (β^a)^b = 1, thus β^c = 1; since 1 ≤ c < a this would contradict the minimality of the order a, so it is possible only if c = 0. Thus q − 1 = a · b and a = ord(β) divides q − 1.

Finally, we show that there exists a primitive element in any finite field GF(q). Assume that β is an element of the largest order b in GF(q), β^b = 1. In the trivial case b = q − 1 we have concluded the proof; β is a primitive element of GF(q).


Let us now concentrate on the case when b < q − 1. Assume that the orders of all elements in GF(q) divide b; this implies that every non-zero element of GF(q) satisfies the polynomial equation y^b − 1 = 0. This equation has at most b roots and can generate at most b elements of GF(q); we have reached a contradiction with the assumption b < q − 1.

Let γ be an element of GF(q) whose order, c, does not divide b; thus

gcd(b, c) = d > 1 =⇒ c = d · g with gcd(g, b) = 1.

But γ^c = 1 implies that there exists an element δ of order d in GF(q); indeed, γ^c = γ^(d·g) = (γ^g)^d = 1. Thus δ = γ^g ∈ GF(q).

Let us now consider another element of GF(q) constructed as the product of the two elements, β · δ; it is easy to see that this is an element of order b · d, as β^b · δ^d = 1. We have thus found an element in GF(q) of order b · d and this contradicts the assumption that b is the largest order.

The following proposition summarizes some of the results discussed above.

Proposition. Consider a finite field with q elements, GF(q), and a non-zero element α ∈ GF(q); the following properties are true:

1. α^j = 1 ⟺ ord(α) divides j.

2. ord(α) = s =⇒ the powers α^1, α^2, . . . , α^s are all distinct.

3. ord(α) = s, ord(β) = r, gcd(s, r) = 1 =⇒ ord(α · β) = s · r.

4. ord(α^j) = ord(α)/gcd(ord(α), j).

5. GF(q) has an element of order (q − 1).

6. The order of any non-zero element of GF(q) divides (q − 1).

7. γ^q − γ = 0, ∀γ ∈ GF(q).

After this brief review of algebraic structures we return to error correcting codes and discuss linear codes.

4.5 Linear Codes

Linear algebra allows an alternative description of a class of block codes called linear codes. In this section we show that an [n, k] linear code C is a k-dimensional subspace $V_n^{(k)}$ of an n-dimensional vector space Vn. Once we select a set of k linearly independent vectors in Vn we can construct a generator matrix G for C. Then the code C can be constructed by multiplying the generator matrix G with message vectors in Vk. Several classes of linear codes, codes defined as subspaces of a vector space, are known, including the cyclic codes and the BCH codes discussed in Section 4.11. Throughout the presentation of linear codes we use the term n-tuple instead of vector in an n-dimensional vector space.

Recall that an [n, M] block code C over the alphabet A with q symbols encodes k = log_q M information symbols into codewords with n = k + r symbols. Two alternative descriptions of a linear code C are used: [n, k] code and [n, k, d] code, with d the Hamming distance of the code. Decoding is the reverse process, extracting the message from a codeword. Both processes can be simplified when the code has an algebraic structure.

We consider an alphabet with q symbols from a finite field F = GF(q) and vectors with n or k components, also called n-tuples and k-tuples, respectively; the corresponding vector spaces are denoted as Vn(F) and Vk(F). When there is no confusion about the alphabet used, instead of Vn(F) and Vk(F) we shall use the notation Vn and Vk, respectively. These two vector spaces can also be regarded as the finite fields GF(p^n) and GF(p^k), respectively. A subspace of dimension k of the vector space Vn is denoted as $V_n^{(k)}$. Most of our examples cover binary codes, the case when q = 2.

We start this section with formal definitions and then discuss relevant properties of linear codes. In our presentation we refer to the linear code C as an [n, k] code.

A linear [n, k] code over F is a k-dimensional subspace of the vector space Vn. A linear code is a one-to-one mapping f of k-tuples from the message space to n-tuples

f : Vk ↦ Vn

with n > k. The n-tuples selected as codewords form a subspace $V_n^{(k)} \subset V_n$ spanned by k linearly independent vectors. Given a set of messages

M = {m1, m2, . . . , m_{q^k}} where mi = (mi,1, mi,2, . . . , mi,k) ∈ Vk and mi,j ∈ F

and a basis Bk for the k-dimensional subspace $V_n^{(k)} \subset V_n$, Bk = {v1, v2, . . . , vk} with vi ∈ Vn, encoding can be defined as the product of vectors in M with vectors in Bk; the code C is then

$$C = \left\{ c_i = \sum_{j=1}^{k} m_{i,j} \cdot v_j : \; m_i \in M, \; v_j \in B_k \right\}.$$

Example. Consider a 3-dimensional subspace of a vector space over the finite field with q = 2 and n = 6. We choose the following three vectors as a basis

Bk = {v1 = (111000), v2 = (100110), v3 = (110011)}.

The message space consists of all 3-tuples

M = {m1 = (000), m2 = (001), m3 = (010), m4 = (011), m5 = (100), m6 = (101), m7 = (110), m8 = (111)}.

Each codeword of the code C is a 6-tuple obtained as a mapping of the products of individual message bits with the basis vectors:

m1 = (000) → c1 = 0 · v1 + 0 · v2 + 0 · v3 = (000000)

m2 = (001) → c2 = 0 · v1 + 0 · v2 + 1 · v3 = (110011)

m3 = (010) → c3 = 0 · v1 + 1 · v2 + 0 · v3 = (100110)

m4 = (011) → c4 = 0 · v1 + 1 · v2 + 1 · v3 = (010101)

m5 = (100) → c5 = 1 · v1 + 0 · v2 + 0 · v3 = (111000)

m6 = (101) → c6 = 1 · v1 + 0 · v2 + 1 · v3 = (001011)


m7 = (110) → c7 = 1 · v1 + 1 · v2 + 0 · v3 = (011110)

m8 = (111) → c8 = 1 · v1 + 1 · v2 + 1 · v3 = (101101)

An [n, k] linear code C is characterized by a k × n matrix whose rows form a basis in Vn; this matrix is called the generator matrix of the linear code C.

Example. The generator matrix of the code C with q = 2, n = 6, k = 3 is

$$G = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 \end{pmatrix}.$$

The code C = {c1, c2, . . . , c8} generated by the matrix G is obtained as products of the vectors in the message space M with G. For example:

$$c_3 = m_3 \cdot G = (0\;1\;0) \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 \end{pmatrix} = (1\;0\;0\;1\;1\;0)$$

$$c_7 = m_7 \cdot G = (1\;1\;0) \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 \end{pmatrix} = (0\;1\;1\;1\;1\;0)$$

Given two [n, k] linear codes, C1 and C2, over the field F with generator matrices G1 and G2, respectively, we say that they are equivalent codes if

G2 = G1 · P

with P a permutation matrix and (·) denoting matrix multiplication.

Example. Consider the generator matrix G of the code C

$$G = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 \end{pmatrix}$$

and a permutation matrix P given by

$$P = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}.$$

We construct the generator matrix Gp = G · P as

$$G_p = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{pmatrix}.$$

Recall that the code C is

C = {(000000), (110011), (100110), (010101), (111000), (001011), (011110), (101101)}.

It is easy to see that Cp, the code generated by Gp, is

Cp = {(000000), (111001), (011010), (100011), (101100), (010101), (110110), (001111)}.

The permutation matrix P maps columns of the matrix G to columns of Gp as follows: 2 ↦ 1, 5 ↦ 2, 1 ↦ 3, 3 ↦ 4, 4 ↦ 5, and 6 ↦ 6. To map column i of the original matrix G into column j of the matrix Gp, the element pij of the permutation matrix must be pij = 1.
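
The column permutation can be verified directly. The sketch below (illustrative Python using numpy; names are ours) builds P from the column mapping above and computes Gp = G · P.

    # Sketch of the column permutation Gp = G . P for the equivalent code above.
    import numpy as np

    G = np.array([[1, 1, 1, 0, 0, 0],
                  [1, 0, 0, 1, 1, 0],
                  [1, 1, 0, 0, 1, 1]])

    P = np.zeros((6, 6), dtype=int)
    for i, j in [(1, 3), (2, 1), (3, 4), (4, 5), (5, 2), (6, 6)]:   # column i of G -> column j of Gp
        P[i - 1, j - 1] = 1

    Gp = (G @ P) % 2
    print(Gp)
    # [[1 0 1 1 0 0]
    #  [0 1 1 0 1 0]
    #  [1 1 1 0 0 1]]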

This example shows that being equivalent does not mean that the two codes are identical. Since a k-dimensional subspace of a vector space is not unique, the generator matrix of a code is not unique, but one of them may prove to be more useful than the others. We shall see shortly that among the set of equivalent codes there is one whose generator matrix is of the form G = [Ik A]. Indeed, given the k × n generator matrix G of a code C with a subset of k columns forming the k × k identity matrix, we can easily construct the permutation matrix that exchanges the columns of G to obtain the generator matrix GI = [Ik A] of the equivalent code CI.

Given an [n, k]-linear code C over the field F the orthogonal complement, or thedual of the code C denoted as C⊥ consists of vectors (or n-tuples) orthogonal to every vectorin C

C⊥ = {c ∈ Vn : c · w = 0, ∀ w ∈ C}

Given an [n, k]-linear code C over the field F let H be the generator matrix of the dual codeC⊥. Then ∀c ∈ Vn, HcT = 0 ⇔ c ∈ C; the matrix H is called the parity-check matrix of C.

Given an [n, k]-linear code C over the field F with the parity check matrix H, theerror syndrome s of the n-tuple v is defined as s = HvT . There is a one-to-one correspondencebetween the error syndrome and the bits in error, thus, the syndrome is used to determine ifan error has occurred and to identify the bit in error. We wish to have an efficient algorithmto identify the bit(s) in error once the syndrome is calculated, a topic addressed in Section4.6. A linear code C with the property that C⊥ ⊂ C is called weakly self-dual. When C = C⊥

the code is called strictly self-dual.

Some properties of linear codes. We now discuss several properties of linear codesover a finite field. There are many k-dimensional subspaces of the vector space Vn thus, many[n, k] codes; their exact number is given by the following proposition.

378

Page 25: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Proposition. The number of k-dimensional subspaces V(k)n of the vector space Vn over GF (q),

with n > k is

sn,k =(qn − 1)(qn−1 − 1) . . . (qn−k+1 − 1)

(qk − 1)(qk−1 − 1) . . . (q − 1)

Proof: First, we show that there are

(qn − 1)(qn − q) . . . (qn − qk−1)

ordered sets of k linearly independent vectors in Vn. The process of selecting one by one thek linearly independent vectors is now described. To select the first one we observe that thereare qn − 1 non-zero vectors in Vn. This explains the first term. There are q vectors thatare linearly dependent on a given vector. Thus, for every vector in the first group there are(qn−q) linearly independent vectors. Therefore, there are (qn−1)(qn−q) different choices forthe first two vectors. The process continues until we have selected all k linearly independentvectors of the k-dimensional subspace.

A similar reasoning shows that a k-dimensional subspace consists of

(qk − 1)(qk − q) . . . (qk − qk−1)

ordered sets of k linearly independent vectors. This is the number of the ordered setsof k linearly independent vectors generating the same k-dimensional subspace. To de-termine the number of distinct k-dimensional subspaces we divide the two numbers.

Example. If q = 2, n = 6, k = 3 the number of 3-dimensional subspaces of V6 is

sn,k =(26 − 1)(25 − 1)(24 − 1)

(23 − 1)(22 − 1)(21 − 1)=

63 × 31 × 15

7 × 3 × 1= 1395.

The weight of the minimum weight non-zero vector, or n-tuple, w ∈ C, is called theHamming weight of the code C, w(C) = min{w(c) : ∀c ∈ C, c �= 0}.

Proposition. The Hamming distance of a [n, k]-linear code C over GF (2) is equal to theHamming weight of the code

d(C) = w(C).

Proof: ∀ci, cj ∈ C, ci �= cj, ci ⊕ cj = ck ∈ C and d(ci, cj) = w(ci ⊕ cj) with ⊕ the bitwiseaddition modulo 2 in GF (2). Therefore, d(C) = min{w(ck) : ∀ck ∈ C, ck �= 0} = w(C).

Proposition. Given an [n, k] linear code C over the field F , among the set of generatormatrices of C, or of codes equivalent to C, there is one, where the information symbols appearin the first k positions of a codeword. Thus, the generator matrix is of the form

379

Page 26: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

G = [IkA].

Here Ik is the k × k identity matrix and A is a k × (n − k) matrix.

Proposition. Given an [n, k] linear code C over the field F , its dual, C⊥, is an [n,n-k] linearcode over the same field, F . If G = [IkA] is a generator matrix for C, or for a code equivalentto C, then H = [−AT In−k] is a generator matrix for its dual code, C⊥.

Proof: It is relatively easy to prove that C⊥ is a subspace of Vn. From the definition of H

H = [−AT In−k] =

⎛⎜⎜⎜⎜⎝

−a1,1 −a2,1 −a3,1 . . . −an,1 1 0 0 . . . 0−a1,2 −a2,2 −a3,3 . . . −an,2 0 1 0 . . . 0−a1,3 −a2,3 −a3,3 . . . −an,3 0 0 1 . . . 0

. . .−a1,n−k −a2,n−k −a3,n−k . . . −an,n−k 0 0 0 . . . 1

⎞⎟⎟⎟⎟⎠

it follows that

HT =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

−a1,1 −a1,2 −a1,3 . . . −a1,n−k

−a2,1 −a2,2 −a2,3 . . . −a2,n−k

−a3,1 −a3,2 −a3,3 . . . −a3,n−k

. . .−an,1 −an,2 −an,3 . . . −an,n−k

1 0 0 . . . 00 1 0 . . . 00 0 1 . . . 0

. . .0 0 0 . . . 1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

=

[ −AIn−k

].

Then,

GHT = [IkA]

[ −AIn−k

]= 0.

It is easy to see that the element cij 1 ≤ i ≤ k, 1 ≤ j ≤ n − k of the product iscij = −aij + aij = 0. Note also that in GF (q) the additive inverse of aij is an element suchthat aij + (−aij) = 0; for example, (−1) = 1 in GF (2) and (−1) = 2 in GF (3).

Call span(H) the subspace generated by the rows of H. The rows of H are linearly inde-pendent so the row space of H spans an (n−k) subspace of Vn. If the rows of H are orthogonalto the rows of G it follows that

span(H) ⊆ C⊥.

To complete the proof we have to show that C⊥ ⊆ span(H). We leave this as an exercise tothe reader. Thus, C⊥ = span(H) and dim(C⊥) = n − k.

380

Page 27: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Example. The previous proposition suggests an algorithm to construct the generator matrixof the dual code, C⊥. Let the generator matrix of the code

C = {(000000) (101001) (011010) (110011) (101100) (000101) (110110) (011111)}be

G =

⎛⎝ 1 0 1 1 0 0

0 1 1 0 1 01 1 1 0 0 1

⎞⎠ .

This matrix can be transformed by permuting the columns of G. We map columns 1 �→ 4,2 �→ 5, 3 �→ 6, 4 �→ 1, 5 �→ 2, and 6 �→ 3 to construct GI

GI = GP =

⎛⎝ 1 0 1 1 0 0

0 1 1 0 1 01 1 1 0 0 1

⎞⎠

⎛⎜⎜⎜⎜⎜⎜⎝

0 0 0 1 0 00 0 0 0 1 00 0 0 0 0 11 0 0 0 0 00 1 0 0 0 00 0 1 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎠

=

⎛⎝ 1 0 0 1 0 1

0 1 0 0 1 10 0 1 1 1 1

⎞⎠

or,

GI = [I3A] with A =

⎛⎝ 1 0 1

0 1 11 1 1

⎞⎠ .

It is easy to observe that in this case AT = A and H = [−A I3], or:

H =

⎛⎝ 1 0 1 1 0 0

0 1 1 0 1 01 1 1 0 0 1

⎞⎠ = G.

Proposition. The necessary and sufficient condition for the linear code C with generatormatrix G to be weakly self-dual is that:

GT G = 0.

The proof of this proposition is left as an exercise for the reader.

Proposition. Given an [n, k] linear code C over the field F with the parity check matrix H,every set of (d − 1) columns of H are linearly independent if and only if C has a distance atleast equal to d.

Example. The parity-check matrix of a code capable of correcting a single error. A single-error correcting code must have a distance d ≥ 3. According to the previous propositionwe need a matrix H such that no 2 or fewer than 2 columns are linearly dependent. Let usfirst examine the set of single columns. If we eliminate the all-zero column, then no single

381

Page 28: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

column can be linearly dependent. Two columns hi and hj are linearly dependent if there arenon-zero scalars, λi, λj such that

λihi + λjhj = 0.

This implies that the two columns are scalar multiple of each other. Indeed, the previousequation allows us to express hi = −(λj/λi)hj.

In summary, if we want a code capable of correcting any single error we have to constructa matrix H to be used as the parity-check matrix of the code such that:

1. The matrix H does not contain an all-zero column.

2. No two columns of H are scalar multiples of each other.

Example. Generate the code with the parity-check matrix H.

First, we transform the parity-check matrix by column operations as follows

H ⇒ H′ = [IkA].

Then we construct the generator matrix of the code G = [−AT In−k]. Finally, we multiply allvectors in the message space with the generator matrix and obtain the set of codewords.

Example. Correct a single error in the n-tuple v using the parity check matrix H of the [n, k]linear code C. Apply the following algorithm:

1. Compute sT = HvT . If sT = 0 then v ∈ C.

2. If sT �= 0 compare sT with h1, h2, . . . , hn, the columns of H.

• If there is an index j such that sT = hj, then the error is an n-tuple e with a 1 inthe j-th bit position and 0s elsewhere. The codeword is then c = v + e.

• Else, there are two or more errors.

For example, assume that the codeword c = (0111100) of the code C with the parity-checkmatrix H is affected by an error in the last bit position; the n-tuple received is v = (0111101).Then

sT = HvT =

⎛⎝ 1 0 1 0 1 0 1

0 1 1 0 0 1 10 0 0 1 1 1 1

⎞⎠

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝

0111101

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠

=

⎛⎝ 1

11

⎞⎠

The vector sT is identical to the last column of H thus e = (0000001) and the codeword isc = v + e = (0111101)+ (0000001) = (0111100). In this example H is the parity-check matrixof a Hamming code; Hamming codes are discussed in Section 4.8.

382

Page 29: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Proposition. If C is an [n, k] code with a generator matrix G whose columns are the 2k − 1non-zero binary k-tuples then all non-zero codewords have weight 2k−1.

Hint: About half of the (2k − 1) k-tuples that are columns of G have an even weight and halfhave an odd weight. More precisely 2k−1 have an odd weight 2k−1 − 1 have an even weight.It follows that half of the components of a codeword obtained by linear combinations of therows of G are 1s.

The weight distribution of a code C is a vector AC = (A0, A1, . . . , An) with Aw the numberof codewords of weight w; the weight enumerator of C is the polynomial

A(z) =n∑

w=0

Awzw.

If C is an [n, k, d] linear code then Aw represents also the number of codewords at distanced = w from a given codeword.

Proposition. If A(z) is the weight enumerator of the [n, k, d] linear code C, then the weightenumerator of its dual, C⊥, is

B(z) = 2k(1 + z)nA

(1 − z

1 + z

).

In the next section we discuss algorithms for efficient decoding of linear codes.

4.6 Syndrome and Standard Array Decoding of Linear Codes

Now we discuss the partitioning of a vector space Vn over the finite field F into equivalenceclasses such that all n-tuples in the same class have the same syndrome; then we present adecoding procedure that exploits this partitioning. We start with a few definition of algebraicstructures.

Let G be an additive Abelian group and S be a subgroup of G, S ⊆ G. Two elementsg1, g2 ∈ G are congruent modulo the subgroup S, if g1 − g2 ∈ S32, or

g1 ≡ g2 mod S ⇐⇒ g1 − g2 ∈ S.

Let G be an additive Abelian group and S be a subgroup of G, S ⊆ G; the equivalence classesfor congruence modulo S are called the cosets of the subgroup S.33

32In the general case the congruence relationship implies that g1 g−12 ∈ S where “ ” is the group operation

and g−12 is the inverse of g2 under “ ”. In other words g2 g−1

2 = e with e the neutral element of G underthe group operation. In case of an additive group the operation is +, “addition” and the “inverse” g−1 = −g.

33In the general case the elements defined here are called the right cosets, but for an Abelian group theyare simply called cosets.

383

Page 30: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Proposition. The order m of a subgroup S ⊆ G, divides n, the order of the group (LagrangeTheorem).

Proof: The elements of the subgroup are S = {s1, s2, . . . sm}. Coset i consists of the elementscongruent with si and is denoted as Ssi and coset j consists of the elements congruent withsj and is denoted as Ssj. Let f be the mapping

f : Ssi → Ssj =⇒ f(gksi) = gksj

The function f maps distinct elements of the coset Ssi into distinct elements of the cosetSsj. All elements of Ssj are mapped onto. Thus f is an one-to-one map. This impliesthat the two cosets, and then obviously all cosets, have the same cardinality, m. But con-gruence does partition the set G of n elements into a number of disjoint cosets and n =(number ofcosets of S)×m. �

We now present an alternative definition of the cosets useful for syndrome calculationand decoding of linear codes. We discuss only binary codes, codes over GF (2n), but similardefinitions and properties apply to codes over GF (qn). Throughout this section Vn is GF (2n),we consider binary n-tuples; the code C is a subset of GF (2n).

Let C be an [n, k] code, C ⊂ GF (2n). The coset of the code C containing v ∈ GF (2n) isthe set v + C = {v + c, c ∈ C}. The binary n-tuple of minimum weight in Ci is called theleader of the coset Ci.

The decoding procedure we present later is based on the following two propositions.

Proposition. Two binary n-tuples are in the same coset if and only if they have the samesyndrome.

Proof: Recall that if H is the parity check matrix of the code C and c ∈ C then HcT = 0. Thesyndrome corresponding to an n-tuple v ∈ GF (2n) is σv = HvT .

If the two binary n-tuples are in the same coset they can be written as v1 = c1 + e andv2 = c2 + e with c1, c2 ∈ C, e ∈ GF (2n). The two syndromes are equal

σv1 = HvT1 = H(c1 + e)T = HeT and σv2 = HvT

2 = H(c2 + e)T = HeT =⇒ σv1 = σv2 .

If the two n-binary tuples v1 and v2 have the same syndrome then

HvT1 = HvT

2 =⇒ H(v1 − v2)T = 0 =⇒ (v1 − v2) ∈ C.

(v1 − v2) is a codeword thus v1 and v2 must be in the same coset. �It is easy to see that there is a one-to-one correspondence between the coset leaders and

the syndromes.

Proposition. Let C be an [n, k] binary linear code. C can correct t errors if and only if allbinary n-tuples of weight t or less are coset leaders.

Proof: To prove that this condition is necessary assume that C has the distance d = 2t + 1,thus it can correct t errors. Let v1, v2 ∈ GF (2n); if v1 and v2 belong to the same coset, then

384

Page 31: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Table 13: The correspondence between the coset leaders and the syndromesCoset leader Syndrome

0000000 (000)T

0000001 (111)T

0000010 (011)T

0000100 (110)T

0001000 (101)T

0010000 (001)T

0100000 (010)T

1000000 (100)T

the binary n-tuple (v1 − v2) is a codeword, (v1 − v2) ∈ C, according to the definition of thecoset and to the previous proposition. If their weight is at most t, w(v1), w(v2) ≤ t, thenthe weight of the codeword (v1 − v2) is w(v1 − v2) ≤ 2t which contradicts the fact that thedistance of the code is d = 2t + 1. Thus, if the code can correct t errors, all binary n-tuplesof weight less or equal to t must be in distinct cosets and each one can be elected as cosetleader.

To prove that this condition is sufficient assume that all binary n-tuples of weight less orequal to t are coset leaders but there are two codewords c1 and c2 such that d(c1, c2) ≤ 2t.There exist two coset leaders, v1 and v2, such that w(v1), w(v2) ≤ t and c1 + v1 = c2 +v2. Indeed we could choose them such that c1 − c2 = v2 − v1. This is possible becaused(c1, c2) ≤ 2t and also d(v1, v2) ≤ 2t as each coset leader has a weight of at most t. Butc1 − c2 = v2 − v1 implies that the two syndromes are equal, σc1+v1 = σc2+v2 . This is acontradiction because we assumed that two coset leaders, v1 and v2 are in different cosets.Thus, the assumption that d(c1, c2) ≤ 2t is incorrect and the code is capable to correct terrors. �

Example. Let C be a [7, 4, 3] code with the parity check matrix:

H =

⎛⎝ 1 0 0 1 1 0 1

0 1 0 0 1 1 10 0 1 1 0 1 1

⎞⎠ .

Table 13 shows the correspondence between the coset leaders and the syndromes for code C.Next we discuss an efficient decoding procedure for a linear code C.

Standard array decoding. Consider an [n, k] code C = {c1, c2, . . . c2k} over GF (2). LetV2n be the vector space of all n-tuples over GF (2). C is a subgroup of order 2k of V2n with 2n

elements. According to the Lagrange theorem there are 2n/2k = 2n−k cosets of C denoted as{C0, C1, . . . , Ci, . . . , C2n−k−1}.

We start with the coset leader L0 = 00 . . . 0, the all-zero n-tuple and then construct thecoset C0 = {L0, c1, c2, . . . c2k}. Then we select as coset leaders n-tuples of weight 1. After weselect all n-tuples of weight 1 as coset leaders we move to select those n-tuples of weight 2that do not appear in any previously constructed coset and continue the process until all 2n−k

385

Page 32: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

cosets have been constructed. Once we select the leader of a coset j, Lj, the i-th element ofthe coset is Lj + ci.

The cosets could be arranged in order as rows of a table with 2n−k rows and with 2k

columns. This table is called a standard array for code C.

Example. Consider the binary [6, 3] code with distance 3 and the generator matrix

G =

⎛⎝ 0 0 1 0 1 1

0 1 0 1 1 01 0 0 1 0 1

⎞⎠ =

⎛⎝ c1

c2

c3

⎞⎠

The standard array has as a first row the codewords c1, c2, c3, c1 +c2, c1 +c3, c2 +c3, c1 +c2 +c3

and as the first column the coset leaders.

000000 001011 010110 100101 011101 101100 110011 111000

000001 001010 010111 100100 011100 101101 110010 111001000010 001001 010100 100111 011111 101110 110001 111010000100 001111 010010 100001 011001 101000 110111 111100001000 000011 011110 101101 010101 100100 111011 110000010000 011011 000110 110101 001101 111100 100011 101000100000 101011 110110 000101 111101 001100 010011 011000

Proposition. Consider an [n, k] code C = {c1, c2, . . . , cqk−1} over GF (2). Let SA be a stan-dard array with cosets Ci : i ∈ {1, 2n−k} and coset leaders Li : i ∈ {1, 2n−k}. The (i+1, j+1)entry is Li + cj, as described earlier. Then ∀h ∈ {0, 2k}

d(Li + cj, cj) ≤ d(Li + cj, ch).

In other words, the j +1 entry in column i is closer to the codeword at the head of its column,cj, than to any other codeword of C.

Proof: Recall that w(ci) is the weight of the codeword ci. Then from the definition of theentries of the standard table

d(Li + cj, cj) = w(Li)

d(Li + cj, ch) = w(Li + cj − ch).

Coset Ci is obtained by adding the coset leader Li to the codewords of C, ci ∈ C. Clearly,the n-tuple (Li + cj − ch), is a member of the same coset, Ci because cj − ch ∈ C. Yet, byconstruction, the coset leader Li is the n-tuple with the minimum weight in Ci

w(Li) ≤ w(Li + cj − ch).

Proposition. The following procedure based on a standard table (ST) can be used to decodea received n-tuple v of a linear code

386

Page 33: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

(i) Locate v in ST.

(ii) Correct v as the codeword at the top of its column.

Example. If we receive v = 111101 we identify it at the fifth element of the last row of thestandard table (ST) constructed for the previous example and decode it as the codeword at thetop of the fifth column, c = 011101.

In the next section we discuss the limitations imposed on our ability to construct “good”linear codes.

4.7 Hamming, Singleton, Gilbert-Varshamov, and Plotkin Bounds

Ideally, we want codes with a large number of codewords (thus, large value of k) which cancorrect as large a number of errors, e, as possible and with the lowest possible number of paritycheck symbols r = n− k. These contradictory requirements cannot be satisfied concurrently.To illustrate these limitations Table 1434 lists the number of binary codewords for the casewhen the length n of a codeword is in the range 5 ≤ n ≤ 15 and the number of errors thecode is able to correct is in the range 1 ≤ e ≤ 3.

Several bounds on the parameters of a linear code have been established and in this sectionwe discuss Hamming, Singleton, Gilbert-Varshamov, and Plotkin bounds. When n and k aregiven, the Hamming bound shows the limit of the efficiency of an [n, k] code, it specifies theminimum number of redundancy symbols for the code, while the Singleton bound providesan upper limit to the distance d of an [n, k, d] code. The Gilbert-Varshamov and the Plotkinbounds limit the number of codewords on an [n, k, d] code when n and d are specified.

We consider [n, k, d] linear codes C over an alphabet with q symbols; recall that an [n, k, d]code encodes k information symbols into codewords consisting of n symbols and guaranteesa minimum distance d between any pair of codewords. The codes are vector spaces over thefield GF (q) and the codewords c ∈ C are vectors in GF (qn).

Hamming bound. Let C be an [n, k] code over GF (q) capable of correcting e errors;then C satisfies the Hamming bound

qn ≥ qk

e∑i=0

(n

i

)(q − 1)i.

Proof: The number of n-tuples within a Hamming sphere of radius e around a codeword is

e∑i=0

(n

i

)(q − 1)i

There are qk such Hamming spheres and the total number of n-tuples is qn.

Example. Determine the minimum number r of redundancy bits required for a binary codeable to correct any single error. Our analysis is based on the following observations:

34The table is from www.ams.org

387

Page 34: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Table 14: The number of binary codewords for 5 ≤ n ≤ 15 and e = 1, 2, 3.n e=1 e=2 e=35 4 2 -6 8 2 -7 16 2 28 20 4 29 40 6 210 72 12 211 144 24 412 256 32 413 512 64 814 1024 128 1615 2048 256 32

1. To construct an [n,M ] block code we select a subset of cardinality M = 2k as codewordsout of the set of all 2n n-tuples.

2. To correct all 1-bit errors, the Hamming spheres of radius one centered at all M code-words must be disjoint.

3. A Hamming sphere of radius one centered around an n-tuple consists of (n+1) n-tuples;indeed, the number of n-tuples at distance one from the center of the Hamming sphereis equal to n, one for each bit position and we should add one for the center.

It is easy to see that M × (n+1) ≤ 2n. Indeed, the number of n-tuples of all the M Hammingspheres of radius one cannot exceed the total number of n-tuples. But n = k + r thus thisinequality becomes

2k × (k + r + 1) ≤ 2(k+r) or 2r − r ≥ k + 1.

We conclude that the minimum number of redundancy bits has a lower bound:

2r ≥ n + 1.

For example, if k = 15, the minimum value of r is 5.

A binary code is able to correct at most e errors if

2n ≥ 2k

e∑i=0

(n

i

)

or, using the fact that d = 2e + 1 and r = n − k

2r ≥�(d−1)/2�∑

i=0

(n

i

).

When d = 3 this becomes 2r ≥ n + 1 as before.

388

Page 35: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Singleton bound. Every [n, k, d] linear code C over the field GF (q) satisfies the inequality

k + d ≤ n + 1.

Proof: Let {e1, e2, . . .} be a basis in the vector space GF (qn). Express every c ∈ C in thisbasis; then construct a new code Cs consisting of codewords of length n − d + 1 obtained bydeleting the first d − 1 symbols of every c ∈ C. The codewords of C are at a distance at leastd from one another. When we remove the first d− 1 symbols from every c ∈ C, the remaining(n− d + 1)-tuples cs ∈ Cs are at least at distance 1, thus they are distinct. The original codeC consists of | C |= qk distinct codewords, thus the cardinality of the new code is

| Cs |= qk.

There cannot be more than qn−d+1 tuples of length (n − d + 1) in GF (qn−d+1). Thus,

qk ≤ qn−k+1 =⇒ k ≤ n − d + 1

or

k + d ≤ n + 1.

�A maximum distance separable (MDS) [n, k, d] linear code is a code C with the property

k + d = n + 1.

MDS codes are optimal in the following sense: for a given length of a codeword, n, and agiven number of information symbols, k, the distance d of the code reaches its maximum.

Corollary. Consider an infinite sequence of linear codes [ni, ki, di] such that when i → ∞then ni → ∞ and:

R = limi→∞

ki

ni

δ = limi→∞

di

ni

,

Then the Singleton bound requires that

ki

ni

+di

ni

≤ 1 +1

ni

.

When i → ∞ then

R + δ ≤ 1.

Figure 100(a) shows that in the limit any sequence of codes must have the rate and the relativeminimum distance (δ = d/n) within the triangle R + δ ≤ 1.

Gilbert-Varshamov bound. If C is an [n, k, d] linear code over GF (q) and if M(n, d) =|C | is the largest possible number of codewords for given n and d values, then the code satisfiesthe Gilbert-Varshamov bound

389

Page 36: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

1≤+∂ R

∂∂

Figure 100: (a) The Singleton bound. (b) The Gilbert-Varshamov bound.

M(n, d) ≥ qn∑d−1i=0

(ni

)(q − 1)i

.

Proof: Let c ∈ C. Then there is an n-tuple v ∈ GF (qn) such that d(c, v), the distance betweenc and v, satisfies the inequality

d(c, v) ≤ d − 1.

The existence of such an n-tuple is guaranteed because otherwise v could be added to thecode C, while maintaining the minimum distance of the code as d. Recall that M(n, d) is thelargest possible number of codewords for given n and d values. A Hamming sphere of radius(d − 1) about a codeword c ∈ C contains

d−1∑i=0

(n

i

)(q − 1)i

n-tuples. We assume that there are M(n, d) codewords in C thus an equal number of Hammingspheres of radius (d−1), one about each codeword, Sci

, ci ∈ C. Thus, the entire space GF (qn)is contained in the union of all Hamming spheres of radius d−1 around individual codewords:

GF (qn) ⊂ ∪cj∈CScj(d − 1)

or

| GF (qn) |≤ M(n, d)d−1∑i=0

(n

i

)(q − 1)i.

It follows that

qn ≤ M(n, d)d−1∑i=0

(n

i

)(q − 1)i

or

390

Page 37: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

M(n, d) ≥ qn∑d−1i=0

(ni

)(q − 1)i

.

�The Gilbert-Varshamov bound applies not only to linear codes but also to any code of

length n and distance d. If the code has distance d the Hamming spheres of radius �(d−1)/2�are disjoint but those of radius d are not disjoint; two spheres of radius d have many n-tuplesin common thus, the sum of the volumes of all these spheres has qn as a lower bound, as weare guaranteed to include all n-tuple and some of them are counted multiple times.

Plotkin bound. If C is an [n, k, d] linear code and if the condition (2d > n) is satisfied,then the code satisfies the Plotkin bound

M ≤ 2 d

2d − n

with M =| C |.Proof: Let cx, cy ∈ C. We find first a lower bound for

∑cx,cy∈C d(cx, cy); we observe that

∑cx,cy∈C

d(cx, cy) ≥ dM(M − 1).

Indeed, there are M(M − 1) pairs of codewords in C and d is the minimum distance betweenany pair of codewords.

We can arrange the codewords of C in a table with M rows and n columns. Considercolumn i of this table and call zi the number of 0s and (M − zi) the number of 1s in thatcolumn. Then,

∑cx,cy∈C

d(cx, cy) =n∑

i=1

2zi(M − zi)

because each choice of a 0 and an 1 in the same column of the table contributes 2 to sum ofdistances. Now we consider two cases:

(1) M is an even integer. In that case the sum on the right hand side of the equality ismaximized when zi = M/2. Thus we obtain an upper bound for the sum of distances:

∑cx,cy∈C

d(cx, cy) ≤ 1

2nM2.

Now we combine the lower and the upper bound to get

M(M − 1)d ≤ 1

2nM2,

or

M ≤ 2d

2d − n

because we assumed that 2d − n > 0. Since M is an even integer it follows that

391

Page 38: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

M ≤ 2 d

2d − n.

(2) M is an odd integer. In that case the sum on the right hand side of the equality ismaximized when zi = (M ± 1)/2. Thus we obtain an upper bound for the sum of distances:

∑cx,cy∈C

d(cx, cy) ≤ 1

2n(M2 − 1).

Now,

M(M − 1)d ≤ 1

2n(M2 − 1),

or,

M ≤ 2d

2d − n− 1.

Again we use the fact that M is an integer:

M ≤ 2d

2d − n− 1 = 2d

2d − n − 1 ≤ 2 d

2d − n. �

It is now the time to present several families of codes and, not unexpectedly, we start ourjourney with the codes discovered by Richard Hamming.

4.8 Hamming Codes

Hamming codes, introduced in 1950 by Richard Hamming, are the first error correcting codesever invented. Hamming codes are linear codes; they can detect up to two simultaneousbit errors, and correct single-bit errors. At the time of their discovery Hamming codes wereconsidered a great improvement over simple parity codes which cannot correct errors and canonly detect an odd number of errors. Throughout this section we refer to the linear code Cconsisting of n-tuples with k information symbols in each codeword and with distance d asan [n, k, d] code.

A q-ary Hamming code of order r over the field GF (q) is an [n, k, 3]-linear code where

n =qr − 1

q − 1and k = n − r.

The parity-check matrix of the code Hr is an r × n matrix such that it contains no all-zerocolumns and no two columns are scalar multiples of each other.

A code C such that the Hamming spheres of radius e about each codeword are disjointand exhaust the entire space of n-tuples is called a perfect code capable of correcting e errors.

The last statement means that every n-tuple is contained in the Hamming sphere of radiuse about some codeword.

392

Page 39: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Proposition. The Hamming code of order r over GF (q) is a perfect code Cr.

Proof: If c ∈ Cr then the Hamming sphere of radius e about c contains: (i) one n-tuple atdistance 0 from c (c itself), (ii) n(q − 1) n-tuples at distance 1 from c, because there are nsymbols in an n-tuple and each symbol can be altered to any of the other q − 1 letters ofthe alphabet. Thus, the total number of n-tuples in a Hamming sphere about a codeword is(1 + n(q − 1)).

There are k information symbols, thus the total number of messages is qk and the totalnumber of such Hamming spheres is also qk. The total number of n-tuples in all the Hammingspheres about codewords is equal to the number of Hamming spheres times the number ofn-tuples in each Hamming sphere

qk(1 + n(q − 1)) = qn−r(1 +(qr − 1)

(q − 1)(q − 1)) = qn−r(1 + (qr − 1)) = qn.

But qn is precisely the total number of n-tuples with elements from an alphabet with qsymbols. This proves that the Hamming spheres of radius e = 1 about all codewords containall possible n-tuples. The spheres of radius 1 about codewords are disjoint because theminimum distance of the code is 3.

Assume that we transmit a codeword c ∈ C where C is a linear code with distance d = 3and parity check matrix H. A single error occurs and the error vector e has a Hamming weightof 1; what we receive is an n-tuple v = c + e. From the definition of the parity check matrixit follows that the syndrome associated with the n-tuple v is

HvT = H(c + e)T = HeT .

Since e has only one non-zero coordinate, say coordinate j which has value β, then HeT =βhj where hj is the j-th column of the parity-check matrix H. This justifies the followingproposition.

Proposition. The following procedure can be used to decode any single error linear code, andin particular, Hamming codes:

(i) Compute the syndrome HvT .

(ii) If the syndrome is zero no error has occurred.

(iii) If HeT = sT �= 0 then compare the syndrome with the columns of the parity-check matrixH.

(iv) If there is some integer j such that sT = βhj then the error vector e is an n-tuple withβ in position j and zero elsewhere. Then c = v − e.

(v) If there is no integer j such that sT = βhj then more than one error has occurred.

An [n, n − r, d] code whose parity check matrix has all nonzero binary vectors of length ras columns is a binary Hamming code.

There are 2r − 1 nonzero vectors of length r, thus n = 2r − 1. The number of informationsymbols is k = n − r = 2r − 1 − r. Recall that the minimum distance of the code is equal tothe minimum number of linearly dependent columns of the parity check matrix; by definition

393

Page 40: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

any two rows are linearly independent, thus d = 3. It follows that a binary Hamming code isa [(2r − 1), (2r − 1 − r), 3] linear code.

Example. The binary Hamming code of order r = 3 is a [7, 4, 3]-linear code.Indeed, (23 − 1)/(2 − 1) = 7, thus n = 7 and k = n − r = 7 − 3 = 4. The parity-check

matrix of the code is

H3 =

⎛⎝ 1 0 0 1 1 0 1

0 1 0 0 1 1 10 0 1 1 0 1 1

⎞⎠ .

A code obtained by adding one all-0s column and one all-1s row to the generator matrixof a binary Hamming code is called an extended binary Hamming code. An extended binaryHamming code is a [2r, (2r − r − 1), 4] linear code.

Example. The binary Hamming code and the extended binary Hamming code of order r = 4.They are [15, 11, 3] and, respectively, [16, 11, 4] linear codes with parity check matrices, H4

and HE4 given by:

H4 =

⎛⎜⎜⎝

1 0 1 0 1 0 1 0 1 0 1 0 1 0 10 1 1 0 0 1 1 0 0 1 1 0 0 1 10 0 0 1 1 1 1 0 0 0 0 1 1 1 10 0 0 0 0 0 0 1 1 1 1 1 1 1 1

⎞⎟⎟⎠ ,

and

HE4 =

⎛⎜⎜⎜⎜⎝

1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 00 1 1 0 0 1 1 0 0 1 1 0 0 1 1 00 0 0 1 1 1 1 0 0 0 0 1 1 1 1 00 0 0 0 0 0 0 1 1 1 1 1 1 1 1 01 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

⎞⎟⎟⎟⎟⎠ .

4.9 Proper Ordering, and the Fast Walsh-Hadamard Transform

In Section 1.21 we have introduced Hadamard matrices and the Walsh-Hadamard transform.In this section we continue our discussion, provide an alternative definition of the Walsh-Hadamard transform, and introduce the fast Walsh-Hadamard transform.

Consider all binary q-tuples, b1, b2, . . . , b2q . The proper ordering, πq, of binary q-tuples isdefined recursively:

π1 = [0, 1]...πi = [b1, b2, . . . b2i ]πi+1 = [b10, b20, . . . b2i0, b11, b21, . . . , b2i1] ∀i ∈ {2, q − 1}.

394

Page 41: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Example. Given that π1 = [0, 1] it follows that:

π2 = [00, 10, 01, 11]π3 = [000, 100, 010, 110, 001, 101, 011, 111]π4 = [0000, 1000, 0100, 1100, 0010, 1010, 0110, 1110, 0001, 1001, 0101, 1101, 0011, 1011, 0111, 1111]

Let n = 2q and let the n q-tuples under the proper order be

πq = [u0, u1, . . . , un−1] .

Let us enforce the convention that the leftmost bit of ui is the least significant bit of the integerrepresented by ui as a q-tuple. For example, consider the following mappings of 4-tuples tointegers:

0000 �→ 0, 1000 �→ 1, 0100 �→ 2, 1100 �→ 3, 0010 �→ 4, 1010 �→ 5, . . . , 0111 �→ 14, 1111 �→ 15.

Proposition. The matrix H = [hij] with

hij = (−1)ui·uj ∀(i, j) ∈ {0, n − 1},where ui and uj are members of

πq = [u0, u1, . . . un−1] ,

is the Hadamard matrix of order n = 2q. Observe that the rows and the columns of the matrixH are numbered from 0 to n − 1.

As an example, consider the case q = 3. It is easy to see that the matrix elements of thefirst row, h0i, and the first column, hi0, are all +1 because u0 · ui = ui · u0 = 0,∀i ∈ {1, 7}and (−1)0 = +1. It is also easy to see that the diagonal element hii = −1 when the weightof i (the number of 1’s in the binary representation of integer i) is odd and hii = +1 whenthe weight of i is even. Individual calculations of hij, i �= j are also trivial. For example,h32 = (−1)(110)·(010) = (−1)1 = −1. Indeed (110) · (010) = 1 · 0 + 1 · 1 + 0 · 0 = 1. This showsthat H = H3.

Let c = (c0c1c2 . . . ci . . . c2q−1) be a binary 2q-tuple, ci = {0, 1} and let Bq be a q × 2q

matrix whose columns are all 2q possible q-tuples bi. We define c(b) to be the component ofc selected by the q-vector b according to the matrix B. The binary value c(b) can be either 0or 1, depending on the value of the corresponding element of the binary 2q-tuple c.

For example, let q = 3 and B3 be given by

B3 =

⎛⎝ 1 0 0 1 0 1 1 0

0 1 0 1 1 0 1 00 0 1 0 1 1 1 0

⎞⎠ .

Let c = (0 1 1 1 1 0 0 1). Then c(1 1 0) = 1 because (1 1 0) is the 4-th column of B and the4-th element of c is 1. Similarly c(1 1 1) = 0 because (1 1 1) is the 7-th column of B and the7-th element of c is 0.

395

Page 42: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

We can define a vector R(c) whose components Ri(c) are (−1)c(b) and their value can beeither +1, or −1. In our example

R(c) = (+1, − 1, − 1, − 1, − 1, + 1, + 1, − 1).

Indeed,

R1(c) = (−1)c(100) = (−1)0 = +1R2(c) = (−1)c(010) = (−1)1 = −1R3(c) = (−1)c(001) = (−1)1 = −1R4(c) = (−1)c(110) = (−1)1 = −1R5(c) = (−1)c(011) = (−1)1 = −1R6(c) = (−1)c(101) = (−1)0 = +1R7(c) = (−1)c(111) = (−1)0 = +1R8(c) = (−1)c(000) = (−1)1 = −1.

Let d be a binary q-tuple and let c be a binary 2q-tuple. Let R(c) = (−1)c(b) be a 2q-tuplewith entries either +1, or −1 as defined earlier; then the Walsh-Hadamard transform of R(c)is defined as

R(d) =∑b∈Bq

(−1)d·bR(c))

or

R(d) =∑b∈Bq

(−1)d·b+c(b).

Example. Let q = 3, d = ( 1 1 1)T and c = (0 1 1 1 1 0 0 1). Then,

R(1 1 1) =∑b∈B3

(−1)( 1 1 1)·b+c(b)

R(1 1 1) = (−1)( 1 1 1)·(1 0 0)+c(1 0 0) + (−1)(1 1 1)·(0 1 0)+c(0 1 0) + (−1)(1 1 1)·(0 0 1)+c(0 0 1)

+(−1)(1 1 1)·(1 1 0)+c(1 1 0) + (−1)(1 1 1)·(0 1 1)+c(0 1 1) + (−1)(1 1 1)·(1 0 1)+c(1 0 1)

+(−1)(1 1 1)·(1 1 1)+c(1 1 1) + (−1)(1 1 1)·(0 0 0)+c(0 0 0)

= (−1)1+0 + (−1)1+1 + (−1)1+1 + (−1)0+1 + (−1)0+1 + (−1)0+0 + (−1)1+0 + (−1)0+1

= −2.

Proposition. Given the binary vector

t = c +

q∑i=1

divi

396

Page 43: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

with d = (d1d2 . . . dq)T a binary q-tuple and vi the i-th row of Bq, then R(d) is the number

of 0s minus the number of 1s in t. This proposition is the basis of a decoding scheme forfirst-order Reed-Muller codes R(1, r) discussed in Section 4.10.

There is a faster method to compute the Walsh-Hadamard transform by using matricesM

(i)2q = I2q−i ⊗ H(2) ⊗ I2i−1 . If q is a positive integer the fast Walsh-Hadamard transform

allows a speedup of (2q+1 − 1)/3q over the classical Walsh-Hadamard transform.

Proposition. If q is a positive integer and M(i)2q = I2q−i⊗H(2)⊗I2i−1 with H(2) the Hadamard

matrix, I2q−i the identity matrix of size 2q−i × 2q−i, and I2i−1 the identity matrix of size2i−1 × 2i−1, then:

H(2q) = M(1)2q M

(2)2q . . .M

(q)2q .

We can prove this equality by induction. For q = 1 we need to prove that H(2) = M(1)2 . We

denote by In×n or simply In the n × n identity matrix. By definition:

M(1)2 = I21−1 ⊗ H(2) ⊗ I21−1 = H(2).

Assume that this is true for q = k and consider the case q = k + 1. For q = k we have:

H(2k) = M(1)

2k M(2)

2k . . .M(k)

2k .

Now for q = k + 1 we have to prove that

H(2k+1) = M(1)

2k+1M(2)

2k+1 . . . M(k)

2k+1M(k+1)

2k+1 .

Then

M(i)

2k+1 = I2k+1−i ⊗ H(2) ⊗ I2i−1 = I2 ⊗ I2k−i ⊗ H(2) ⊗ I2i−1 = I2 ⊗ M(i)

2k

Thus,

H(2k+1) = (I2 ⊗ M(1)

2k )(I2 ⊗ M(2)

2k ) . . . (I2 ⊗ M(k)

2k )M(k+1)

2k+1 .

We know that the tensor product of square matrices V,W,X, Y has the following property(V ⊗ W )(X ⊗ Y ) = V X ⊗ WY . Applying this property repeatedly after substituting H(2k)

for M(1)

2k M(2)

2k . . . M(k)

2k we get

H(2k+1) = (I2 ⊗ M(1)

2k M(2)

2k . . . M(k)

2k )(M(k+1)

2k+1 ) = (I2 ⊗ H(2k)(M(k+1)

2k+1 ).

But from the definition of M(k+1)

2k+1 we see that

M(k+1)

2k+1 = H(2) ⊗ I2k .

Thus,

H(2k+1) = (I2 ⊗ H(2k))(H(2) ⊗ I2k) = (I2H(2)) ⊗ (H(2k)I2k) = H(2) ⊗ H(2k) = H(2k+1).

397

Page 44: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Compared with the traditional Walsh-Hadamard transform, the fast Walsh-Hadamard trans-form allows a speedup SFHT/HT given by:

SFHT/HT =2q+1 − 1

3q.

For example, for q = 16 we have SFHT/HT = 217/48 ≈ 128, 000/48 ≈ 4, 200.

Example. Given R(c) = (+1,−1,−1,−1,−1, +1, +1,−1) let us compute R = RH withH = H(23) = M1

8 M28 M3

8 . First, we calculate M18 , M2

8 , and M38 as follows:

M18 = I4 ⊗ H(2) ⊗ I1 =

⎛⎜⎜⎝

1 0 0 00 1 0 00 0 1 00 0 0 1

⎞⎟⎟⎠ ⊗

(1 11 −1

).

Thus,

M18 =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 1 0 0 0 0 0 01 −1 0 0 0 0 0 00 0 1 1 0 0 0 00 0 1 −1 0 0 0 00 0 0 0 1 1 0 00 0 0 0 1 −1 0 00 0 0 0 0 0 1 10 0 0 0 0 0 1 −1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

.

Then,

M28 = I2⊗H(2)⊗I2 =

(1 00 1

)⊗

(1 11 −1

)⊗

(1 00 1

)=

⎛⎜⎜⎝

1 1 0 01 −1 0 00 0 1 10 0 1 −1

⎞⎟⎟⎠⊗

(1 00 1

).

Thus,

M28 =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 0 1 0 0 0 0 00 1 0 1 0 0 0 01 0 −1 0 0 0 0 00 1 0 −1 0 0 0 00 0 0 0 1 0 1 00 0 0 0 0 1 0 10 0 0 0 1 0 −1 00 0 0 0 0 1 0 −1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

.

Finally,

398

Page 45: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

M38 = I1 ⊗ H(2) ⊗ I4 =

(1 11 −1

)⊗

⎛⎜⎜⎝

1 0 0 00 1 0 00 0 1 00 0 0 1

⎞⎟⎟⎠ .

Thus,

M38 =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 0 0 0 1 0 0 00 1 0 0 0 1 0 00 0 1 0 0 0 1 00 0 0 1 0 0 0 11 0 0 0 −1 0 0 00 1 0 0 0 −1 0 00 0 1 0 0 0 −1 00 0 0 1 0 0 0 −1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

.

Then,

RM18 = (+1,−1,−1,−1,−1, +1, +1,−1)

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 1 0 0 0 0 0 01 −1 0 0 0 0 0 00 0 1 1 0 0 0 00 0 1 −1 0 0 0 00 0 0 0 1 1 0 00 0 0 0 1 −1 0 00 0 0 0 0 0 1 10 0 0 0 0 0 1 −1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

,

or,

RM18 = (0, +2,−2, 0, 0,−2, 0, +2).

Next,

(RM18 )M2

8 = (0, +2,−2, 0, 0,−2, 0, +2)

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 0 1 0 0 0 0 00 1 0 1 0 0 0 01 0 −1 0 0 0 0 00 1 0 −1 0 0 0 00 0 0 0 1 0 1 00 0 0 0 0 1 0 10 0 0 0 1 0 −1 00 0 0 0 0 1 0 −1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

,

or,(RM1

8 )M18 = (−2, +2, +2, +2, 0, 0, 0,−4).

Finally,

399

Page 46: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

(RM18 M2

8 )M38 = (−2, +2, +2, +2, 0, 0, 0,−4)

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 0 0 0 1 0 0 00 1 0 0 0 1 0 00 0 1 0 0 0 1 00 0 0 1 0 0 0 11 0 0 0 −1 0 0 00 1 0 0 0 −1 0 00 0 1 0 0 0 −1 00 0 0 1 0 0 0 −1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

,

orR = (RM1

8 M28 )M3

8 = (−2, +2, +2,−2,−2, +2, +2, +6).

4.10 Reed-Muller Codes

Reed-Muller codes are linear codes with numerous applications in communication; they wereintroduced in 1954 by Irving S. Reed and D. E. Muller. Muller invented the codes [303] andthen Reed proposed the majority logic decoding scheme [353]. In Section 4.8 we defined Hr,the parity check matrix of a binary (2r −1, 2r −1− r) Hamming code. Hr has 2r −1 columns,consisting of all non-zero binary r-tuples.

The first-order Reed-Muller code R(1, r) is a binary code with the generator matrix

G =

(1 1Hr 0

)=

(1Br

)

where 1 is an (2r − 1)-tuple of 1s. The generator matrix G of the code is constructed byaugmenting Hr with a zero column to obtain a matrix Br, and then adding as a first row a2r-tuple of 1s.

Example. For r = 3, given H3

H3 =

⎛⎝ 1 0 0 1 1 0 1

0 1 0 1 1 1 00 0 1 1 0 1 1

⎞⎠

we construct first

B3 =

⎛⎝ 1 0 0 1 1 0 1 0

0 1 0 1 1 1 0 00 0 1 1 0 1 1 0

⎞⎠

and then

G =

⎛⎜⎜⎝

1 1 1 1 1 1 1 11 0 0 1 1 0 1 00 1 0 1 1 1 0 00 0 1 1 0 1 1 0

⎞⎟⎟⎠ .

400

Page 47: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Proposition. The code with the generator matrix G, a first-order Reed-Muller code, hascodewords of length n = 2r, the number of information symbols is k = r + 1. The code hasdistance d = 2r−1.

Proof: Recall that Hr is an r × n′ matrix with n′ = 2r − 1. Then Br is an r × n matrix withn = n′ +1 = 2r ; its first n− 1 columns, bi, 1 ≤ i ≤ n− 1 are all non-zero binary r-tuples andits last column is the all-zero r-tuple, Br = (b1b2 . . . bn−10). It follows that half of the n = 2r

columns of Br have even parity and half have odd parity.The generator matrix G of the code C is an (r+1)×n matrix with the first row an n-tuple

of all 1s and the other r rows the rows of Br. Half of the columns of G have an even parityand half have an odd parity as every column of G adds a 1 as its first element to a columnof Br. Every codeword c ∈ C is obtained as a product of a message (r + 1)-tuple with thegenerator matrix G thus, every codeword will have half of its 2r components 1s and half 0s.Thus, the minimum weight of any codeword is exactly 2r−1.

Proposition. Assume that the columns of Br are in the proper order πr and that H is theHadamard matrix H = H(2r). Let w be a received n-tuple when a first order Reed-Mullercode is used for transmission. The following procedure is used to decode w:

(i) Compute R and R.

(ii) Find a component R(u) whose magnitude is maximum. Let u = (u1, u2, . . . , ur)T be a

binary r-tuple and let bi be the i-th row of Br.

(iii) If R(u) > 0 then decode w as∑r

i=1 uibi.

(iv) If R(u) ≤ 0 then decode w as I +∑r

i=1 uibi with I an n-tuple of 1s.

A justification of this decoding procedure can be found in most standard texts on errorcorrecting codes such as [436].

Example. Consider the B3 matrix whose columns are π3, in other words they are the binaryexpression of all 3-tuples in proper order, e.g., the 4-th column is (0 1 1) the binary expressionof integer 3, the 7-th column is 1 1 0 the binary expression of integer 6, and so on. We assumethat the leftmost bit is the least significant one.

B3 =

⎛⎝ 0 1 0 1 0 1 0 1

0 0 1 1 0 0 1 10 0 0 0 1 1 1 1

⎞⎠ =

⎛⎝ b1

b2

b3

⎞⎠

The corresponding generator matrix of the Reed-Muller code is

G =

⎛⎜⎜⎝

1 1 1 1 1 1 1 10 1 0 1 0 1 0 10 0 1 1 0 0 1 10 0 0 0 1 1 1 1

⎞⎟⎟⎠

Assume that we receive w = (0 1 1 1 1 0 0 1). Then,

R = (+1, − 1, − 1, − 1, − 1, + 1, + 1, − 1).

401

Page 48: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

The Hadamard transform of R is (see for example [284])

R = RH8 = (−2, + 2, + 2, − 2, − 2, + 2, + 2, − 4)

with

H8 =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 1 1 1 1 1 1 11 −1 1 −1 1 −1 1 −11 1 −1 −1 1 1 −1 −11 −1 −1 1 1 −1 −1 11 1 1 1 −1 −1 −1 −11 −1 1 −1 −1 1 −1 11 1 −1 −1 −1 −1 1 11 −1 −1 1 −1 1 1 −1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

We see that the largest component of R occurs in position 8, so we set u equal to the lastcolumn of Br, u = (1 1 1)T . Since the largest component is negative (it is equal to −4) wedecode w as

v = I +3∑

i=1

uibi = (1 1 1 1 1 1 1 1) + 1 · b1 + 1 · b2 + 1 · b3

v = (1 1 1 1 1 1 1 1) + 1 · (0 1 0 1 0 1 0 1) + 1 · (0 0 1 1 0 0 1 1) + 1 · (0 0 0 0 1 1 1 1) =

(1 1 1 1 1 1 1 1) + (0 1 1 0 1 0 0 1) = (1 0 0 1 0 1 1 0).

An alternative description of Reed-Muller codes is based on manipulation of Booleanmonomials. A monomial is a product of powers of variables. For example, given the variablesx, y, and z then a monomial is of the form xaybzc with a, b, and c nonnegative integers,a, b, c ∈ Z+. A Boolean monomial in variables x1, x2, . . . , xn is an expression of the form

p = xr11 xr2

2 . . . xrnn with ri ∈ Z+, 1 ≤ i ≤ n.

The reduced form of p is the result of applying two rules: (i) xixj = xjxi; and (ii) x2i = xi.

A Boolean polynomial is a linear combination of Boolean monomials, where xi are binaryvectors of length 2m and the exponents ri are binary. The degree 0 monomial is a vector1 of 2m ones; e.g., when m = 3 then this monomial is (11111111). The degree 1 monomialx1 is a vector of 2m−1 ones followed by 2m−1 zeros. The monomial associated with x2 is avector of 2m−2 ones followed by 2m−2 zeros followed by 2m−2 ones followed by 2m−2 zeros. Ingeneral, the vector associated with the monomial xi consists of 2m−i ones followed by 2m−i

zeros repeated until a vector of length 2m is obtained. For example, when m = 3 the vectorassociated with the monomial x1x2x3 is

(11110000) ⊗ (11001100) ⊗ (10101010) = (10000000)

with “⊗” the regular bitwise XOR operation. The scalar product of two binary vectors isdefined as the sum of the product of individual components; for example,

402

Page 49: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

(11110000) · (11001100) = 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 = 0.

An r-th order Reed-Muller code R(r,m) is the set of all binary strings of length n = 2m

associated with the Boolean polynomials p = xr11 xr2

2 . . . xrnn of degree at most r. For example,

the 0-th order code R(0,m) is a repetition of strings of zeros or ones of length 2m; the m-thorder code R(m,m) consists of all binary strings of length 2m.

The generator matrix of an R(r,m) with r > 1 code is formed by adding(

mr

)rows to the

generator matrix of an R(r − 1,m) code. For example, the generator matrix of an R(1, 3)code is

GR(1,3) =

⎛⎜⎜⎝

1x1

x2

x3

⎞⎟⎟⎠ =

⎛⎜⎜⎝

1 1 1 1 1 1 1 11 1 1 1 0 0 0 01 1 0 0 1 1 0 01 0 1 0 1 0 1 0

⎞⎟⎟⎠ .

We add the rows x1 ⊗ x2, x1 ⊗ x3, and x2 ⊗ x3 to GR(1,3) to obtain the generator matrix of anR(2, 3) code,

GR(2,3) =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝

1x1

x2

x3

x1 ⊗ x2

x1 ⊗ x3

x2 ⊗ x3

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠

=

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 1 1 1 1 1 1 11 1 1 1 0 0 0 01 1 0 0 1 1 0 01 0 1 0 1 0 1 01 1 0 0 0 0 0 01 0 1 0 0 0 0 01 0 0 0 1 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎠

.

The generator matrix of an R(2, 4) code is

GR(2,4) =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1x1

x2

x3

x4

x1 ⊗ x2

x1 ⊗ x3

x1 ⊗ x4

x2 ⊗ x3

x2 ⊗ x4

x3 ⊗ x4

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

=

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 0 0 0 0 0 0 0 01 1 1 1 0 0 0 0 1 1 1 1 0 0 0 01 1 0 0 1 1 0 0 1 1 0 0 1 1 0 01 0 1 0 1 0 1 0 1 0 1 0 1 0 1 01 1 1 1 0 0 0 0 0 0 0 0 0 0 0 01 1 0 0 1 1 0 0 0 0 0 0 0 0 0 01 0 1 0 1 0 1 0 0 0 0 0 0 0 0 01 1 0 0 0 0 0 0 1 1 0 0 0 0 0 01 0 1 0 0 0 0 0 1 0 1 0 0 0 0 01 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

.

The set of characteristic vectors of any row of the generator matrix is the set of all monomialsin xi and xi, the bitwise reverse of xi, that are not in the monomial associated with that row.For example, the characteristic vectors of the row before the last of the generator matrix ofthe R(2, 4) code, correspond to the vector x2 ⊗ x4 and are x1 ⊗ x3, x1 ⊗ x3, x1 ⊗ x3, andx1⊗ x3. The scalar product of the vectors in this set with all the rows of the generator matrixof the code, except the row they belong to, is equal to zero.

403

Page 50: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

The decoding algorithm for Reed-Muller codes described below is not very efficient, butis straightforward; it allows us to extract the message μ from the n-tuple received, v, and todetermine c, the original codeword sent.

• Step 1. Compute the 2r−m characteristic vectors κji for each row of the generator matrix

GR(m,r) except the first row.

• Step 2. Compute the scalar products of v with all the characteristic vectors κji for rows

gi, i > 1 of the generator matrix. Determine the binary indicator fi for each row, startingwith the bottom row; if the majority of the scalar products v · κj

i are equal to zero thenfi = 0, else fi = 1. For example, the rows for the R(1, 3) code are: g1 = x1, g2 = x2,and g3 = x3.

• Step 3. Compute the binary vector ν as follows: multiply each fi with the correspondingrow gi of the generator matrix and then add them together, ν =

∑i figi. For example,

ν = f3x3 + f2x2 + f1x1 for the R(1, 3) code.

• Step 4. Add the result of Step 3 to the received n-tuple; if ν + v has a majority ofzeros then assign a zero to the binary indicator of the first row of the generator matrix,f0 = 0, else f0 = 1. The message sent is μ = (f0f1f2.....). For example, μ = f0f1f2f3

for the R(1, 3) code.

Example. Correction of one error for the R(1, 3) Reed-Muller code. The distance of thiscode is d = 2r−m = 23−1 = 4; since d ≥ 3 the code can correct a single error in v. Themessage sent is μ = (1110); it is encoded as the codeword c ∈ R(1, 3) by multiplying themessage vector with the generator matrix of the code:

c = μGR(1,3) =(

1 1 1 0)⎛⎜⎜⎝

1 1 1 1 1 1 1 11 1 1 1 0 0 0 01 1 0 0 1 1 0 01 0 1 0 1 0 1 0

⎞⎟⎟⎠ =

(1 1 0 0 0 0 1 1

)

The message is affected by an error in the second bit and the 8-tuple received is w = 10000011.

Let us now apply the algorithm outlined above:We compute first the characteristic set of the vector x3 = 10101010 corresponding to the lastrow of the generator matrix; this set includes x1 ⊗ x2, x1 ⊗ x2, x1 ⊗ x2, and x1 ⊗ x2. Recallthat x1 = (11110000) thus, x1 = (00001111); similarly x2 = (11001100), so x2 = (00110011).Therefore, x1 ⊗ x2 = (11000000), x1⊗ x2 = (00110000), x1 ⊗ x2 = (00001100), and x1 ⊗ x2 =(00000011). The scalar products of these vectors with v are:

(11000000) · (10000011) = 1; (00110000) · (10000011) = 0;(00001100) · (10000011) = 0; (00000011) · (10000011) = 0.

The majority logic leads us to conclude that f3 = 0. We continue with the third row ofthe matrix, x2 = (11001100). The characteristic vectors are x1 ⊗ x3, x1 ⊗ x3, x1 ⊗ x3, andx1 ⊗ x3 These vectors are (10100000), (01010000), (00001010), and (00000101), respectively.The scalar products of these vectors with v are:

(10100000) · (10000011) = 1; (01010000) · (10000011) = 0;

404

Page 51: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

(00001010) · (10000011) = 1; (00000101) · (10000011) = 1.We conclude that f2 = 1. The second row of the matrix is x1 = (11110000). The characteristicvectors are x2 ⊗ x3, x2 ⊗ x3, x2 ⊗ x3, and x2 ⊗ x3. These vectors are (10001000), (00100010),(00001010), and (00010001), respectively. The scalar products of these vectors with v are:

(10001000) · (10000011) = 1; (00100010) · (10000011) = 1;(01000100) · (10000011) = 0; (00010001) · (10000011) = 1.

We conclude that f1 = 1. Now we construct

ν = f3x3 + f2x2 + f1x1 = 0 · (10101010) + 1 · (11001100) + 1 · (11110000) = (00111100)

and then compute

ν + w = (00111100) + (10000011) = (10111111).

This string has more ones than zeros; this implies that the coefficient of the first row f0 ofthe generator matrix is a 1, thus the message is

μ = (f0f1f2f3) = (1110).

The zero in the second position of (ν +w) indicates an error, namely a zero has been receivedin error instead of an one.

We note that a Reed-Muller code R(r,m) exists for any integers m ≥ 0 and 0 ≤ r ≤ m; thecode R(r,m) is a binary [n, k, d] code with codewords of length n = 2m, distance d = 2r−m,and k(r,m) = k(r,m − 1) + k(r − 1,m − 1). For example, R(3, 4) and R(3, 5) are [16, 15, 2]and [32, 26, 4] Hamming codes, respectively, while R(0, 1), R(1, 3), and R(2, 5) are [2, 1, 2],[8, 4, 4], and [32, 16, 8] self-dual codes, respectively. The dual code of R(r,m) is the codeR(m − r − 1,m). The majority logic decoding used for Reed-Muller codes computes severalchecksums for each received codeword element. To decode an r-th order code, we have todecode iteratively r + 1 times before we identify the original codeword.

We continue our discussion with one of the most important classes of linear codes, cycliccodes.

4.11 Cyclic Codes

There is a one-to-one mapping between vectors from Vn, an n-dimensional vector space overthe finite field F and polynomials of degree (n − 1) with coefficients from the same field F .This observation is the basis of another formalism to describe a class of linear codes, the cycliccodes, a formalism based on algebras of polynomials over finite fields.

A cyclic code is characterized by a generator polynomial; encoding and decoding of cycliccodes reduces to algebraic manipulation of polynomials. We start our discussion of cyclic codeswith several definitions and then discuss some properties of polynomials with coefficients froma finite field.

Given the vector c = (c0c1 . . . cn−2cn−1) ∈ Vn, a cyclic shift is a transformation c �→ c′ withc′ = (cn−1c0c1 . . . cn−2) ∈ Vn.

405

Page 52: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

We wish to establish a correspondence between vector spaces and sets of polynomials. Such a correspondence will facilitate the construction of codes with an algebraic structure and will allow us to express properties of linear codes, as well as algorithms for error detection and error correction, in terms of the algebra of polynomials. We associate a codeword, a vector c ∈ Vn, with a polynomial of degree (n − 1) as follows: the coefficients of the polynomial terms, from low to high order, are the vector elements:

c = (c0 c1 . . . c_{n−1}) ∈ Vn  ⟺  c(x) = c0 + c1x + c2x^2 + · · · + c_{n−1}x^{n−1}.

Let us now introduce cyclic codes. First, we define the concept of a cyclic subspace of avector space as a subspace closed to cyclic shifts.

A cyclic subspace C of an n-dimensional vector space Vn over the field F is a set of vectors{c} ∈ C with the property

c = (c0c1c2 . . . cn−2cn−1) ∈ C =⇒ c′ = (cn−1c0c1 . . . cn−2) ∈ C.

A linear code C is a cyclic code if C is a cyclic subspace of an n-dimensional vector spaceVn over the field F .

Consider again the case f(x) = xn − 1 and the finite ring of polynomials modulo f(x),F[x]/f(x). It is easy to prove the following proposition.

Proposition. Multiplication by x of a polynomial c(x) in the finite ring of polynomi-als modulo f(x), F[x]/f(x), is equivalent with a cyclic shift of the corresponding vectorc = (c0, c1, c2, . . . , cn−2cn−1) into c′ = (cn−1, c0, c1, . . . , cn−2) with c, c′ ∈ Vn.

Proof: Let c(x) = c0 + c1x + c2x^2 + · · · + c_{n−2}x^{n−2} + c_{n−1}x^{n−1} mod f(x). Then

c′(x) = x · c(x) = c0x + c1x^2 + c2x^3 + · · · + c_{n−2}x^{n−1} + c_{n−1}x^n mod f(x).

But x^n = 1 mod f(x), thus

c′(x) = c_{n−1} + c0x + c1x^2 + c2x^3 + · · · + c_{n−2}x^{n−1} mod f(x).

Example. Let Z2 be the binary field and f(x) = 1 + x^4. Then

Z2[x]/(x^4 + 1) = {[0], [1], [x], [1 + x], [x^2], [1 + x^2], [x + x^2], [1 + x + x^2], [x^3], [1 + x^3], [1 + x + x^3], [1 + x^2 + x^3], [1 + x + x^2 + x^3], [x + x^3], [x^2 + x^3], [x + x^2 + x^3]}.

The corresponding vectors in V4(GF(2)) are:

C = { 0000, 1000, 0100, 1100, 0010, 1010, 0110, 1110, 0001, 1001, 1101, 1011, 1111, 0101, 0011, 0111 }.

C is a cyclic subspace of V4(GF (2)), it is closed to cyclic shifts. Indeed, if we apply a cyclicshift we get:


0000 → 0000, 1000 → 0100, 0100 → 0010, 1100 → 0110, 0010 → 0001, 1010 → 0101,
0110 → 0011, 1110 → 0111, 0001 → 1000, 1001 → 1100, 1101 → 1110, 1011 → 1101,
1111 → 1111, 0101 → 1010, 0011 → 1001, 0111 → 1011.

We notice that:

x · [0] = [0], x · [1] = [x], . . . , x · [x2 + x3] = [1 + x3], x · [x + x2 + x3] = [1 + x2 + x3].

Therefore, multiplication by x of the polynomials in Z2[x]/(x^4 + 1) maps this set onto itself:

{[0], [1], [x], [1 + x], [x^2], [1 + x^2], [x + x^2], [1 + x + x^2], [x^3], [1 + x^3], [1 + x + x^3], [1 + x^2 + x^3], [1 + x + x^2 + x^3], [x + x^3], [x^2 + x^3], [x + x^2 + x^3]}.

We can easily verify that the correspondence between the vectors subject to a cyclic shift and the polynomials multiplied by x is preserved.
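A small sketch, not from the text, that checks this correspondence programmatically; it assumes polynomials over GF(2) stored as coefficient lists c[0..n-1], lowest order first, and the helper names are ours.

    def cyclic_shift(c):
        return [c[-1]] + c[:-1]

    def times_x_mod(c):                       # x * c(x) mod (x^n - 1), n = len(c)
        n = len(c)
        shifted = [0] * n
        for i, ci in enumerate(c):
            shifted[(i + 1) % n] ^= ci        # x^i -> x^{i+1}, and x^{n-1} -> x^n = 1
        return shifted

    c = [1, 1, 0, 1]                          # 1 + x + x^3 in Z2[x]/(x^4 + 1)
    assert times_x_mod(c) == cyclic_shift(c) == [1, 1, 1, 0]   # both give 1 + x + x^2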

Now we discuss a very important proposition which relates cyclic subspaces of an n-dimensional vector space with ideals of a ring of polynomials. We show that there is a one-to-one correspondence between cyclic subspaces of the vector space Vn and ideals of polynomialsgenerated by a polynomial g(x) in a finite ring of polynomials over F modulo the non-zeropolynomial f(x) = xn − 1 when g(x) divides f(x).

Proposition. There is a one-to-one correspondence between the cyclic subspaces of an n-dimensional vector space Vn over the field F = GF(q) and the monic polynomials^35 g(x) ∈ F[x] which divide f(x) = x^n − 1, with F[x] a ring of polynomials. If

f(x) = ∏_{i=1}^{q} g_i^{a_i}(x)

with a_i positive integers and g_i(x), 1 ≤ i ≤ q, distinct irreducible monic polynomials, then Vn contains

Q = ∏_{i=1}^{q} (a_i + 1)

cyclic subspaces. If g(x) is a monic polynomial of degree (n − k) and it divides f(x) = x^n − 1, then g(x) is the generator polynomial of a cyclic subspace of Vn. The dimension of this cyclic subspace is equal to k.

This proposition follows from several propositions whose proofs are left to the reader^36. First, let S be a subspace of an n-dimensional vector space Vn over the field F, S ⊂ Vn. Let R be the ring of polynomials associated with Vn and let I ⊂ R be the set of polynomials corresponding to S. We can show that S is a cyclic subspace of Vn if and only if I is an ideal in R, Figure 101.

35 The coefficient of the highest order term of a monic polynomial is 1: f(x) = x^n + a_{n−1}x^{n−1} + · · · + a_1x + a_0.

36See the exercises at the end of this chapter.


Figure 101: There is a one-to-one correspondence between Vn(F), the n-dimensional vector space over the finite field F, and R, the ring of polynomials with coefficients from the finite field F, modulo the polynomial f(x) = x^n − 1. There is also a one-to-one correspondence between S, a cyclic subspace of Vn, and I, an ideal of the ring R generated by the polynomial g(x) ∈ R. Moreover, f(x) is a multiple of g(x), f(x) = g(x) · h(x).

If f(x) = xn − 1, R is the ring of equivalence classes of polynomials modulo f(x) withcoefficients in the field F , R = F[x]/(f(x)), and g(x) is a monic polynomial which divides f(x),then we can show that g(x) is the generator of the ideal of polynomials I = m(x)·g(x), m(x) ∈R. We can also show that there is a one-to-one correspondence between the cyclic subspacesof Vn and the monic polynomials g(x) which divide f(x) = xn − 1.

Example. We consider several values of n, namely 5, 10, 15, 20, and 25. Show the generatorpolynomials for the cyclic subspaces of Vn(GF (2)) and calculate their dimension.

• n = 5. Then:

x5−1 = (1+x) ·(1+x+x2 +x3 +x4); g1(x) = 1+x and g2(x) = 1+x+x2 +x3 +x4.

The number of cyclic subspaces is Q = 2 × 2 = 4.

The cyclic subspace generated by g1 has dimension 4 as

n − k1 = 1 → k1 = n − 1 = 4.

The cyclic subspace generated by g2 has dimension 1 as

n − k2 = 4 → k2 = n − 4 = 1.

• n = 10. Then:

x10−1 = (1+x)2 ·(1+x+x2+x3+x4)2; g1(x) = 1+x and g2(x) = 1+x+x2+x3+x4.

The number of cyclic subspaces is: Q = 3 · 3 = 9.

The cyclic subspace generated by g1 has dimension 9 as

n − k1 = 1 → k1 = n − 1 = 9.

The cyclic subspace generated by g2 has dimension 6 as

n − k2 = 4 → k2 = n − 4 = 6.


• n = 15. Then:

x15 − 1 = (1 + x) · (1 + x + x2) · (1 + x + x2 + x3 + x4) · (1 + x + x4) · (1 + x3 + x4)

g1(x) = 1 + x, g2(x) = 1 + x + x2, g3(x) = 1 + x + x2 + x3 + x4

g4(x) = 1 + x + x4, g5(x) = 1 + x3 + x4.

The number of cyclic subspaces is: Q = 2 × 2 × 2 × 2 × 2 = 32.

The cyclic subspace generated by g1 has dimension 14 as

n − k1 = 1 → k1 = n − 1 = 14.

The cyclic subspace generated by g2 has dimension 13 as

n − k2 = 2 → k2 = n − 2 = 13.

The cyclic subspaces generated by g3, g4 and g5 have dimension 11 as

n − k3 = 4 → k3 = n − 4 = 11.

• n = 20. Then:

x20−1 = (1+x)4 ·(1+x+x2+x3+x4)4; g1(x) = 1+x and g2(x) = 1+x+x2+x3+x4.

The number of cyclic subspaces is: Q = 5 × 5 = 25.

The cyclic subspace generated by g1 has dimension 19 as

n − k1 = 1 → k1 = n − 1 = 19.

The cyclic subspace generated by g2 has dimension 16 as

n − k2 = 4 → k2 = n − 4 = 16.

• n = 25. Then:

x25 − 1 = (1 + x) · (1 + x + x2 + x3 + x4) · (1 + x5 + x10 + x15 + x20).

g1(x) = 1 + x, g2(x) = 1 + x + x2 + x3 + x4, and g3(x) = 1 + x5 + x10 + x15 + x20.

The number of cyclic subspaces is: Q = 2 × 2 × 2 = 8.

The cyclic subspace generated by g1 has dimension 24 as

n − k1 = 1 → k1 = n − 1 = 24.

The cyclic subspace generated by g2 has dimension 21 as

n − k2 = 4 → k2 = n − 4 = 21.

The cyclic subspace generated by g3 has dimension 5 as

n − k3 = 20 → k3 = n − 20 = 5.

In summary, a cyclic code is an ideal I in the ring R of polynomials modulo f(x) = x^n − 1 over a finite field GF(q); f(x) is a product of irreducible polynomials, f(x) = ∏_i g_i(x). An [n, k] cyclic code is generated by a polynomial g(x) ∈ I of degree (n − k); all codewords of Cg are multiples of the generator polynomial g(x). The generator polynomial of a code Cg may be irreducible, g(x) = g_i(x), or a product of several irreducible polynomials g_i(x).


4.12 Encoding and Decoding Cyclic Codes

In this section we recast familiar concepts from the theory of linear codes, such as the generator and parity check matrices and the syndrome, in terms of algebras of polynomials, and we discuss encoding and decoding of cyclic codes. Without loss of generality we only consider vector spaces over the finite field GF(2), a fancy way of expressing the fact that the coefficients of the polynomials are only zeros and ones. The [n, k] linear code C is characterized by the generator polynomial g(x) of degree (n − k):

g(x) = g0 + g1x + g2x^2 + · · · + g_{n−k−1}x^{n−k−1} + g_{n−k}x^{n−k},  g_i ∈ {0, 1}.

All operations are performed in an algebra modulo of the polynomial f(x) with

f(x) = xn − 1

subject to the condition that g(x) divides f(x)

f(x) = xn − 1 = g(x) · h(x).

A message of k bits is now represented by m(x), a polynomial of degree at most (k − 1):

m(x) = m0 + m1x + · · · + m_{k−2}x^{k−2} + m_{k−1}x^{k−1},  m_i ∈ {0, 1}.

The message m(x) is encoded as the polynomial c(x):

c(x) = g(x) · m(x).

The degree of the polynomial c(x) cannot be larger than (n − 1):

deg[c(x)] = deg[g(x)] + deg[m(x)] ≤ (n − k) + (k − 1) = n − 1.

Thus, when we encode the message polynomial m(x) into the codeword polynomial c(x) we donot need to perform a reduction modulo f(x). We can anticipate a very important propertyof cyclic codes, a direct consequence of the way we encode a message into a polynomial: allcodewords are represented by polynomials multiple of the generator polynomial of the code.Thus, to decide if a received polynomial r(x) belongs to the code C we divide it by the generatorpolynomial of the code, g(x).

We now revisit concepts such as the generator and the parity matrix of a code, dualcode, and encoding/decoding algorithms using polynomial representation of codewords andthe properties of cyclic subspaces of a vector space.

A generator matrix Gg of the code C[n, k] is a k × n matrix whose rows are the coefficients of the polynomials g(x), x · g(x), x^2 · g(x), . . . , x^{k−1} · g(x):

Gg =
  [ g0  g1  . . .  g_{n−k−1}  g_{n−k}  0        . . .  0 ]
  [ 0   g0  g1    . . .       g_{n−k−1}  g_{n−k}  . . .  0 ]
  [ 0   0   g0    g1          . . .      g_{n−k−1}  . . .  0 ]
  [ . . .                                                    ]
  [ 0   0   0     . . .       g0   g1   . . .  g_{n−k−1}  g_{n−k} ]


Example. Consider the case n = 10 and express

x^10 − 1 = (1 + x)^2 · (1 + x + x^2 + x^3 + x^4)^2.

If we choose g(x) = 1 + x + x^2 + x^3 + x^4 as the generator polynomial of the code, it follows that deg[g(x)] = n − k = 4, thus k = 10 − 4 = 6. A generator matrix of the code is

G =
  [ 1 1 1 1 1 0 0 0 0 0 ]
  [ 0 1 1 1 1 1 0 0 0 0 ]
  [ 0 0 1 1 1 1 1 0 0 0 ]
  [ 0 0 0 1 1 1 1 1 0 0 ]
  [ 0 0 0 0 1 1 1 1 1 0 ]
  [ 0 0 0 0 0 1 1 1 1 1 ]

The message m(x) = 1 + x + x^5, corresponding to the 6-tuple (110001), is encoded as

c(x) = m(x) · g(x)
     = (1 + x + x^5) · (1 + x + x^2 + x^3 + x^4)
     = (1 + x + x^2 + x^3 + x^4) + (x + x^2 + x^3 + x^4 + x^5) + (x^5 + x^6 + x^7 + x^8 + x^9)
     = 1 + x^6 + x^7 + x^8 + x^9.

Recall that n = 10; the codeword c(x) corresponds to the n-tuple (1000001111), the same vector obtained by multiplying the message vector by the generator matrix:

(1 1 0 0 0 1) · G = (1 0 0 0 0 0 1 1 1 1).
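A minimal sketch, not from the text, of the polynomial encoding c(x) = m(x) · g(x) and of the codeword test "divide by g(x) and check for a zero remainder"; coefficients are over GF(2), lowest order first, and the helper names are ours.

    def poly_mul(a, b):
        out = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                out[i + j] ^= ai & bj
        return out

    def poly_mod(a, g):                       # remainder of a(x) divided by g(x)
        a = a[:]
        dg = len(g) - 1
        for i in range(len(a) - 1, dg - 1, -1):
            if a[i]:                          # cancel the leading term with g(x)
                for j, gj in enumerate(g):
                    a[i - dg + j] ^= gj
        return a[:dg]

    g = [1, 1, 1, 1, 1]                       # 1 + x + x^2 + x^3 + x^4
    m = [1, 1, 0, 0, 0, 1]                    # 1 + x + x^5  <-> (110001)
    c = poly_mul(m, g)
    print(c)                                  # [1,0,0,0,0,0,1,1,1,1] <-> (1000001111)
    print(any(poly_mod(c, g)))                # False: c(x) is a multiple of g(x)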

An alternative method to construct the generator matrix. Consider a cyclic codewith the generator polynomial g(x); when we divide xn−k+i, 0 ≤ i ≤ k − 1 by the generatorpolynomial we obtain a quotient q(i)(x) and a remainder r(i)(x):

xn−k+i = g(x) · q(i)(x) + r(i)(x), 0 ≤ i ≤ k − 1.

The degree of the remainder satisfies the following condition:

deg[r(i)(x)] < deg[g(x)] = n − k, 0 ≤ i ≤ k − 1.

Clearly,

xn−k+i − r(i)(x) = g(x) · q(i)(x), 0 ≤ i ≤ k − 1.

Consider a k × (n − k) matrix Rg with rows corresponding to [−r(i)(x)]:


Rg =
  [ −r^{(0)}(x)   ]
  [ −r^{(1)}(x)   ]
  [ . . .         ]
  [ −r^{(k−1)}(x) ]

where each row holds the coefficients of −r^{(i)}(x). Then a generator matrix of the code is Gg = [Rg  Ik]; we will show that a parity check matrix of the code is Hg = [I_{n−k}  −Rg^T], with Ik the k × k identity matrix.

Example. Alternative construction of the generator matrix of a cyclic code when n = 10, k = 6, and g(x) = 1 + x + x^2 + x^3 + x^4. The following table gives the polynomials r^{(i)}(x) needed to construct the matrix R:

i   x^{n−k+i}                                                              r^{(i)}(x)
0   x^4 = [(1) · (1 + x + x^2 + x^3 + x^4)] + 1 + x + x^2 + x^3            1 + x + x^2 + x^3
1   x^5 = [(x + 1) · (1 + x + x^2 + x^3 + x^4)] + 1                        1
2   x^6 = [(x^2 + x) · (1 + x + x^2 + x^3 + x^4)] + x                      x
3   x^7 = [(x^3 + x^2) · (1 + x + x^2 + x^3 + x^4)] + x^2                  x^2
4   x^8 = [(x^4 + x^3) · (1 + x + x^2 + x^3 + x^4)] + x^3                  x^3
5   x^9 = [(x^5 + x^4 + 1) · (1 + x + x^2 + x^3 + x^4)] + 1 + x + x^2 + x^3   1 + x + x^2 + x^3

Thus,

R =
  [ 1 1 1 1 ]
  [ 1 0 0 0 ]
  [ 0 1 0 0 ]
  [ 0 0 1 0 ]
  [ 0 0 0 1 ]
  [ 1 1 1 1 ]

A generator matrix of the code is G = [R  I6]:

G =
  [ 1 1 1 1 1 0 0 0 0 0 ]
  [ 1 0 0 0 0 1 0 0 0 0 ]
  [ 0 1 0 0 0 0 1 0 0 0 ]
  [ 0 0 1 0 0 0 0 1 0 0 ]
  [ 0 0 0 1 0 0 0 0 1 0 ]
  [ 1 1 1 1 0 0 0 0 0 1 ]

The message m(x) = 1 + x + x^5, which corresponds to the 6-tuple (110001), is now encoded as

(1 1 0 0 0 1) · G = (1 0 0 0 1 1 0 0 0 1).


Note that the information symbols appear as the last k bits of the codeword; this is a generalproperty of this method of constructing the generator matrix of a cyclic code.
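The sketch below is not from the text; it reproduces the systematic encoding above under the usual description "append to the message the remainder of x^{n−k} m(x) modulo g(x)", with GF(2) coefficient lists, lowest order first, and helper names of our own.

    def poly_mod(a, g):
        a = a[:] + [0] * max(0, len(g) - len(a))
        dg = len(g) - 1
        for i in range(len(a) - 1, dg - 1, -1):
            if a[i]:
                for j, gj in enumerate(g):
                    a[i - dg + j] ^= gj
        return a[:dg]

    def encode_systematic(m, g):
        dg = len(g) - 1                       # n - k parity positions
        shifted = [0] * dg + m                # x^{n-k} * m(x)
        r = poly_mod(shifted, g)              # the parity bits
        return r + m                          # parity first, message in the last k positions

    g = [1, 1, 1, 1, 1]                       # 1 + x + x^2 + x^3 + x^4, n = 10, k = 6
    m = [1, 1, 0, 0, 0, 1]                    # (110001)
    print(encode_systematic(m, g))            # [1,0,0,0,1,1,0,0,0,1] <-> (1000110001)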

Given a polynomial of degree m, h(x) = Σ_{k=0}^{m} h_k x^k with h_m ≠ 0, its reciprocal h̄(x) is the polynomial defined by

h̄(x) = Σ_{k=0}^{m} h_{m−k} x^k.

The vectors corresponding to the two polynomials h(x) and h̄(x) are, respectively:

h = (h_0 h_1 . . . h_{m−1} h_m) and h̄ = (h_m h_{m−1} . . . h_1 h_0).

For example, when m = 4:

h(x) = 1 + x + x^4 → h̄(x) = 1 + x^3 + x^4, or (11001) → (10011),
h(x) = x^2 + x^3 + x^4 → h̄(x) = 1 + x + x^2, or (00111) → (11100).
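A tiny illustrative sketch (ours, not the book's): with coefficient lists stored lowest order first and h[-1] != 0, the reciprocal polynomial is simply the reversed coefficient vector.

    def reciprocal(h):
        return h[::-1]

    print(reciprocal([1, 1, 0, 0, 1]))   # 1 + x + x^4     -> [1,0,0,1,1] = 1 + x^3 + x^4
    print(reciprocal([0, 0, 1, 1, 1]))   # x^2 + x^3 + x^4 -> [1,1,1,0,0] = 1 + x + x^2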

The equivalent and the dual of a cyclic code with generator polynomial g(x).Recall that two codes, C with generator matrix G and C ′ with generator matrix G′, are saidto be equivalent if G′ = G · P with P a permutation matrix. Recall also that a linear codeand its dual are orthogonal subspaces of a vector space and the generator matrix of a code isthe parity check matrix of the dual code.

The polynomial h(x) defined by the expression f(x) = x^n − 1 = g(x) · h(x) is called the parity-check polynomial of C and has several properties: (i) it is a monic polynomial of degree k, and (ii) it is the generator of a cyclic code Ch. Indeed, any monic polynomial which divides f(x) generates an ideal of polynomials, thus a cyclic code. The codewords of the two cyclic codes generated by g(x) and h(x), Cg and Ch, respectively, have the following property:

c_g(x) · c_h(x) ≡ 0 mod f(x),  ∀c_g(x) ∈ Cg, ∀c_h(x) ∈ Ch.

A codeword of a cyclic code is represented by a polynomial multiple of the generator polynomial of the code. Thus:

c_g(x) = g(x) · q_g(x) mod f(x) and c_h(x) = h(x) · q_h(x) mod f(x),

so that

c_g(x) · c_h(x) = [g(x) · q_g(x)] · [h(x) · q_h(x)] = [g(x) · h(x)] · [q_g(x) · q_h(x)] ≡ 0 mod f(x)

as g(x) · h(x) ≡ 0 mod f(x). The equation c_g(x) · c_h(x) ≡ 0 mod f(x) does not imply that the two codes are orthogonal; indeed, Ch ≠ C⊥g, that is, Ch and C⊥g are not each other's dual.

Given that f(x) = x^n − 1 = g(x) · h(x), we define the codes Cg, Ch, C⊥g, and Ch̄ as follows:

• Cg is the cyclic code generated by the monic polynomial g(x) of degree (n − k); Gg, the generator matrix of Cg, is a k × n matrix and Hg, the parity-check matrix of Cg, is an (n − k) × n matrix.


• Ch is the cyclic code generated by the monic polynomial h(x) of degree k; the generator matrix of Ch is an (n − k) × n matrix denoted by Gh.

• C⊥g is the dual of Cg; G⊥g, the generator matrix of C⊥g, is an (n − k) × n matrix.

• Ch̄ is the code generated by h̄(x), the reciprocal of the polynomial h(x); the generator matrix of Ch̄ is an (n − k) × n matrix denoted by Gh̄.

Proposition. The codes Cg, Ch, C⊥g, and Ch̄ have the following properties:

(i) C⊥g and Ch are equivalent codes; the generator matrices of the two codes satisfy the relation

Gh = G⊥g · P,

with P a permutation matrix.

(ii) The dual of the code Cg is the code generated by h̄(x), the reciprocal of h(x): C⊥g = Ch̄; the parity check matrix of Cg is the generator matrix of Ch̄:

Hg = Gh̄.

The generator matrices of the cyclic codes Ch and Ch̄ are built from the cyclic shifts of h(x) and h̄(x), respectively:

Gh = [ rows: h(x), x·h(x), x^2·h(x), . . . , x^{n−k−1}·h(x) ]   and   Gh̄ = [ rows: h̄(x), x·h̄(x), x^2·h̄(x), . . . , x^{n−k−1}·h̄(x) ].

It is easy to see that the parity check matrix of the code Cg is Hg = Gh̄ = [I_{n−k}  −Rg^T], with Rg^T an (n − k) × k matrix whose columns correspond to [−r^{(i)}(x)], the remainders of the division of x^{n−k+i} by g(x).

Let two arbitrary codewords c_g(x) ∈ Cg and c_h(x) ∈ Ch and the corresponding vectors be:

c_g(x) = a_0 + a_1x + · · · + a_{n−1}x^{n−1}  →  c_g = (a_0 a_1 . . . a_{n−1}),
c_h(x) = b_0 + b_1x + · · · + b_{n−1}x^{n−1}  →  c_h = (b_0 b_1 . . . b_{n−1}).

Then,

c_g(x) · c_h(x) = (Σ_{i=0}^{n−1} a_i x^i) · (Σ_{i=0}^{n−1} b_i x^i) ≡ Σ_{i=0}^{n−1} d_i x^i  mod (x^n − 1).

Call c̄_h = (b_{n−1} b_{n−2} . . . b_1 b_0) the reversal of the vector c_h = (b_0 b_1 . . . b_{n−2} b_{n−1}); the inner product

c_g · c̄_h = a_0b_{n−1} + a_1b_{n−2} + a_2b_{n−3} + · · · + a_{n−1}b_0

is the coefficient d_{n−1}.

Now compute d_k, 1 ≤ k ≤ n − 1, and observe that d_k becomes the constant term if we multiply the product Σ_{i=0}^{n−1} d_i x^i by x^{n−k}; this multiplication corresponds to the product of polynomials

c_g(x) · [x^{n−k} · c_h(x)].

Let c^{(k)}_h be the vector obtained by reversing c_h and then cyclically shifting the result (k + 1) positions to the right. Then

d_k = c_g · c^{(k)}_h.

Example. The procedure to compute the coefficients d_k, 0 ≤ k ≤ n − 1. Consider the case n = 5 and f(x) = x^5 − 1. Let:

c_g(x) = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4 and c_h(x) = b_0 + b_1x + b_2x^2 + b_3x^3 + b_4x^4.

The corresponding vectors are:

c_g = (a_0 a_1 a_2 a_3 a_4) and c_h = (b_0 b_1 b_2 b_3 b_4).

Then, modulo f(x) = x^5 − 1,

c_g(x) · c_h(x) = (a_0b_0 + a_1b_4 + a_2b_3 + a_3b_2 + a_4b_1)
  + (a_0b_1 + a_1b_0 + a_2b_4 + a_3b_3 + a_4b_2) x
  + (a_0b_2 + a_1b_1 + a_2b_0 + a_3b_4 + a_4b_3) x^2
  + (a_0b_3 + a_1b_2 + a_2b_1 + a_3b_0 + a_4b_4) x^3
  + (a_0b_4 + a_1b_3 + a_2b_2 + a_3b_1 + a_4b_0) x^4.

The coefficient of the constant term in this expression is given by the inner product of the vectors c_g and c^{(0)}_h = (b_0 b_4 b_3 b_2 b_1):

c_g · c^{(0)}_h = (a_0 a_1 a_2 a_3 a_4) · (b_0 b_4 b_3 b_2 b_1) = a_0b_0 + a_1b_4 + a_2b_3 + a_3b_2 + a_4b_1.

Indeed, c^{(0)}_h is obtained by reversing c_h = (b_0 b_1 b_2 b_3 b_4) to get (b_4 b_3 b_2 b_1 b_0) and then cyclically shifting the result one position to the right. Similarly, the coefficient of x^2 in this expression is the inner product of c_g and c^{(2)}_h:

c_g · c^{(2)}_h = (a_0 a_1 a_2 a_3 a_4) · (b_2 b_1 b_0 b_4 b_3) = a_0b_2 + a_1b_1 + a_2b_0 + a_3b_4 + a_4b_3.

Now, c^{(2)}_h is obtained by reversing c_h and cyclically shifting the result three positions to the right.

Now we consider the general case:

cg(x) · ch(x) ≡ 0 mod (xn − 1).

The coefficients of every power of x in the product of polynomials must be zero. But these coefficients are precisely the scalars d_k computed above. Now consider d to be any cyclic shift of the vector obtained from c_h by reversing its components. Then:

c_g · d = 0.

Thus, the code Ch̄, obtained by reversing the components of the codewords c_h and taking all their cyclic shifts, is the dual of Cg. It is left as an exercise to prove that Ch and Ch̄ are equivalent codes.


Example. Consider the polynomial f(x) = x^9 − 1 = (1 + x) · (1 + x + x^2) · (1 + x^3 + x^6) and choose g(x) = 1 + x^3 + x^6. In this case n = 9 and deg[g(x)] = n − k = 6, thus k = 3. It follows that h(x) = (1 + x)(1 + x + x^2) = (1 + x + x^2) + (x + x^2 + x^3) = 1 + x^3. The vectors corresponding to g(x) and h(x) are: g(x) = 1 + x^3 + x^6 → (100100100) and h(x) = 1 + x^3 → (100100000). The generator and the parity check matrices of the code Cg are k × n and (n − k) × n matrices:

Gg = [ rows: g(x), x·g(x), x^2·g(x) ] =
  [ 1 0 0 1 0 0 1 0 0 ]
  [ 0 1 0 0 1 0 0 1 0 ]
  [ 0 0 1 0 0 1 0 0 1 ]  = [I3  A],

Hg = [−A^T  I_{n−k}] =
  [ 1 0 0 1 0 0 0 0 0 ]
  [ 0 1 0 0 1 0 0 0 0 ]
  [ 0 0 1 0 0 1 0 0 0 ]
  [ 1 0 0 0 0 0 1 0 0 ]
  [ 0 1 0 0 0 0 0 1 0 ]
  [ 0 0 1 0 0 0 0 0 1 ]  = G⊥g.

The generator matrix of the code Ch is an (n − k) × n = 6 × 9 matrix:

Gh = [ rows: h(x), x·h(x), . . . , x^5·h(x) ] =
  [ 1 0 0 1 0 0 0 0 0 ]
  [ 0 1 0 0 1 0 0 0 0 ]
  [ 0 0 1 0 0 1 0 0 0 ]
  [ 0 0 0 1 0 0 1 0 0 ]
  [ 0 0 0 0 1 0 0 1 0 ]
  [ 0 0 0 0 0 1 0 0 1 ]

As we know, Gh̄ is both a parity check matrix of Cg and a generator matrix of the dual code C⊥g. The vector h = (100100000), thus h̄ = (000001001), and Gh̄ is

Gh̄ = [ rows: h̄(x), x·h̄(x), . . . , x^5·h̄(x) ] =
  [ 0 0 0 0 0 1 0 0 1 ]
  [ 1 0 0 0 0 0 1 0 0 ]
  [ 0 1 0 0 0 0 0 1 0 ]
  [ 0 0 1 0 0 0 0 0 1 ]
  [ 1 0 0 1 0 0 0 0 0 ]
  [ 0 1 0 0 1 0 0 0 0 ]

Example. Consider the [7, 4] cyclic code with generator polynomial g(x) = 1 + x + x^3. In this case n = 7 and k = 4, and x^n − 1 = x^7 − 1 = g(x) · h(x) with h(x) = 1 + x + x^2 + x^4. A generator matrix of the code is:

Ga = [ rows: g(x), x·g(x), x^2·g(x), x^3·g(x) ] =
  [ 1 1 0 1 0 0 0 ]
  [ 0 1 1 0 1 0 0 ]
  [ 0 0 1 1 0 1 0 ]
  [ 0 0 0 1 1 0 1 ]

Now we compute r^{(i)}(x) = r^{(i)}_0 + r^{(i)}_1 x + r^{(i)}_2 x^2 for 0 ≤ i ≤ k − 1, the remainders when we divide x^{n−k+i} by g(x):


x^3 = 1 · (1 + x + x^3) + (1 + x)              ⇒  r^{(0)}(x) = 1 + x
x^4 = (x) · (1 + x + x^3) + (x + x^2)          ⇒  r^{(1)}(x) = x + x^2
x^5 = (1 + x^2) · (1 + x + x^3) + (1 + x + x^2) ⇒  r^{(2)}(x) = 1 + x + x^2
x^6 = (1 + x + x^3) · (1 + x + x^3) + (1 + x^2) ⇒  r^{(3)}(x) = 1 + x^2

and construct another generator matrix Gbg = [R  I4]:

Gbg =
  [ r^{(0)}_0  r^{(0)}_1  r^{(0)}_2  1 0 0 0 ]     [ 1 1 0 1 0 0 0 ]
  [ r^{(1)}_0  r^{(1)}_1  r^{(1)}_2  0 1 0 0 ]  =  [ 0 1 1 0 1 0 0 ]
  [ r^{(2)}_0  r^{(2)}_1  r^{(2)}_2  0 0 1 0 ]     [ 1 1 1 0 0 1 0 ]
  [ r^{(3)}_0  r^{(3)}_1  r^{(3)}_2  0 0 0 1 ]     [ 1 0 1 0 0 0 1 ]

To construct the parity check matrix Hg = [−A^T  I_{n−k}] of the code we transform Gbg into the canonical form Gg = [I_k  A]. The permutation matrix P moves the columns of Gbg as follows: 1 → 5, 2 → 6, 3 → 7, 4 → 1, 5 → 2, 6 → 3, 7 → 4:

P =
  [ 0 0 0 0 1 0 0 ]
  [ 0 0 0 0 0 1 0 ]
  [ 0 0 0 0 0 0 1 ]
  [ 1 0 0 0 0 0 0 ]
  [ 0 1 0 0 0 0 0 ]
  [ 0 0 1 0 0 0 0 ]
  [ 0 0 0 1 0 0 0 ]

Gg = Gbg · P =
  [ 1 0 0 0 1 1 0 ]
  [ 0 1 0 0 0 1 1 ]
  [ 0 0 1 0 1 1 1 ]
  [ 0 0 0 1 1 0 1 ]

Then

G⊥g = Hg = [−A^T  I_{n−k}] =
  [ 1 0 1 1 1 0 0 ]
  [ 1 1 1 0 0 1 0 ]
  [ 0 1 1 1 0 0 1 ]

Recall that h(x) = 1 + x + x^2 + x^4; the generator matrix Gh is

Gh = [ rows: h(x), x·h(x), x^2·h(x) ] =
  [ 1 1 1 0 1 0 0 ]
  [ 0 1 1 1 0 1 0 ]
  [ 0 0 1 1 1 0 1 ]

It is easy to see that Gh = G⊥g · P1, with P1 the permutation matrix that moves the columns of G⊥g as follows: 1 → 2, 2 → 4, 3 → 3, 4 → 5, 5 → 1, 6 → 6, 7 → 7:

P1 =
  [ 0 1 0 0 0 0 0 ]
  [ 0 0 0 1 0 0 0 ]
  [ 0 0 1 0 0 0 0 ]
  [ 0 0 0 0 1 0 0 ]
  [ 1 0 0 0 0 0 0 ]
  [ 0 0 0 0 0 1 0 ]
  [ 0 0 0 0 0 0 1 ]

and indeed

G⊥g · P1 =
  [ 1 1 1 0 1 0 0 ]
  [ 0 1 1 1 0 1 0 ]
  [ 0 0 1 1 1 0 1 ]  = Gh.
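A short sketch, not part of the text, that verifies the two facts used in this example: every row of the canonical generator matrix Gg is orthogonal over GF(2) to every row of G⊥g, and permuting the columns of G⊥g with P1 reproduces Gh (the permutation is encoded here as a target-position list, an implementation choice of ours).

    Gg = [[1,0,0,0,1,1,0],
          [0,1,0,0,0,1,1],
          [0,0,1,0,1,1,1],
          [0,0,0,1,1,0,1]]
    H  = [[1,0,1,1,1,0,0],
          [1,1,1,0,0,1,0],
          [0,1,1,1,0,0,1]]
    Gh = [[1,1,1,0,1,0,0],
          [0,1,1,1,0,1,0],
          [0,0,1,1,1,0,1]]
    perm = [2, 4, 3, 5, 1, 6, 7]              # column j of G⊥g moves to position perm[j-1]

    dot = lambda a, b: sum(x * y for x, y in zip(a, b)) % 2
    assert all(dot(g, h) == 0 for g in Gg for h in H)     # H is a parity check matrix of Gg

    permuted = [[0] * 7 for _ in H]
    for r, row in enumerate(H):
        for j, bit in enumerate(row):
            permuted[r][perm[j] - 1] = bit
    assert permuted == Gh                                  # Gh = G⊥g · P1
    print("checks passed")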


Decoding cyclic codes. We recast linear codes decoding strategies in terms of algebrasof polynomials. Recall that the parity-check matrix H of an [n, k] linear code C is used tocompute the error syndrome s given the n-tuple v: sT = HvT . Once we compute the syndromewe are able to determine the error pattern e; there is a one-to-one correspondence betweenthe two. Indeed, if the codeword c ∈ C is affected by the error e and received as v then

v = c + e =⇒ sT = HvT = H(c + e)T = HeT as HcT = 0, ∀c ∈ C.

The following proposition gives a simple algorithm for computing the syndrome polyno-mial. This algorithm requires the determination of the remainder at the division of polyno-mials and can be implemented very efficiently using a family of linear circuits called feedbackshift registers.

Proposition. The syndrome corresponding to the polynomial v(x) = v0 +v1x+ . . .+vn−1xn−1

is the polynomial s(x) satisfying the condition deg[s(x)] ≤ n−k−1, obtained as the remainderwhen v(x) is divided by g(x).

v(x) = q(x)g(x) + s(x).

Proof: We have shown that a parity check matrix Hg of the cyclic code Cg with generator polynomial g(x) is Hg = [I_{n−k}  −Rg^T]; it follows that the columns of Hg correspond to the following polynomials:

h_i(x) = x^i,                0 ≤ i ≤ n − k − 1,
h_i(x) = r^{(i−n+k)}(x),     n − k ≤ i ≤ n − 1,

with r^{(i)}(x) defined by x^{n−k+i} = g(x) · q^{(i)}(x) + r^{(i)}(x), 0 ≤ i ≤ k − 1; the degree of the polynomial r^{(i)}(x) is at most n − k − 1. It follows that:

s(x) = [v_0 + v_1x + · · · + v_{n−k−1}x^{n−k−1}] + [v_{n−k}r^{(0)}(x) + · · · + v_{n−1}r^{(k−1)}(x)]
     = [v_0 + v_1x + · · · + v_{n−k−1}x^{n−k−1}] + [v_{n−k}(x^{n−k} − q^{(0)}(x)g(x)) + · · · + v_{n−1}(x^{n−1} − q^{(k−1)}(x)g(x))]
     = [v_0 + v_1x + · · · + v_{n−k−1}x^{n−k−1} + v_{n−k}x^{n−k} + · · · + v_{n−1}x^{n−1}] − [v_{n−k}q^{(0)}(x) + · · · + v_{n−1}q^{(k−1)}(x)]g(x)
     = v(x) − h(x)g(x).

Thus v(x) = h(x)g(x) + s(x) with h(x) = v_{n−k}q^{(0)}(x) + · · · + v_{n−1}q^{(k−1)}(x). The uniqueness of the quotient and the remainder of polynomial division guarantees that the syndrome polynomial s(x), constructed following the definition of the syndrome for linear codes, can be used to associate an error polynomial e(x) with a polynomial v(x) that may or may not be a codeword; if s(x) ≡ 0 then e(x) ≡ 0 and v(x) ≡ c(x), as all codewords c(x) ∈ Cg are multiples of the generator polynomial g(x). The degree of s(x) is deg[s(x)] ≤ (n − k − 1).  □

An algorithm for decoding binary cyclic codes is discussed next. This algorithm corrects individual errors, as well as bursts of errors, a topic discussed in the next section. Before presenting the algorithm we introduce the concept of a cyclic run of length m ≤ n, a succession of m cyclically consecutive components of an n-tuple. For example, v = (011100101) has a cyclic run of two 0's and a cyclic run of three 1's.

Algorithm. Error Trapping: Given an [n, k, d] cyclic code Cg with generator polynomial g(x) and d = 2t + 1, let s^{(i)}(x) = Σ_{j=0}^{n−k−1} s_j x^j be the syndrome polynomial for x^i v(x), with v(x) = c(x) + e(x). The error polynomial e(x) satisfies two conditions: (i) it corresponds to an error pattern of at most t errors, w[e(x)] ≤ t; and (ii) it contains a cyclic run of at least k zeros. To decode v(x) we carry out the following steps:

1. Set i = 0; compute syndrome s(0)(x) as the remainder at the division of v(x) by g(x):v(x) = q(x) · g(x) + s(0)(x).

2. If the Hamming weight of s(i)(x) satisfies the condition w[s(i)(x)] ≤ t then the errorpolynomial is e(x) = xn−is(i)(x) mod (xn − 1) and terminate.

3. Increment i

• If i = n stop and report that the error is not trappable.

• Else:

s^{(i)}(x) = x·s^{(i−1)}(x)            if deg[s^{(i−1)}(x)] < n − k − 1
s^{(i)}(x) = x·s^{(i−1)}(x) − g(x)     if deg[s^{(i−1)}(x)] = n − k − 1

4. Go to step 2.

To justify this algorithm we show first that the syndrome of xv(x) is

s^{(1)}(x) = x·s(x)            if deg[s(x)] < n − k − 1
s^{(1)}(x) = x·s(x) − g(x)     if deg[s(x)] = n − k − 1

According to the previous proposition, if s(x) is the syndrome corresponding to a polynomial v(x) then deg[s(x)] ≤ n − k − 1. It follows that xs(x) is the syndrome of xv(x) if deg[s(x)] < n − k − 1, since then deg[xs(x)] ≤ n − k − 1 and the degree condition is satisfied. If deg[s(x)] = n − k − 1 then we express s(x), g(x), and xs(x) as:

s(x) = Σ_{i=0}^{n−k−1} s_i x^i = a(x) + s_{n−k−1}x^{n−k−1},  with a(x) = Σ_{i=0}^{n−k−2} s_i x^i, deg[a(x)] ≤ n − k − 2,

g(x) = Σ_{i=0}^{n−k} g_i x^i = b(x) + x^{n−k},  with b(x) = Σ_{i=0}^{n−k−1} g_i x^i, deg[b(x)] ≤ n − k − 1,

xs(x) = xa(x) + s_{n−k−1}x^{n−k} = xa(x) + s_{n−k−1}[g(x) − b(x)] = s_{n−k−1}g(x) + [xa(x) − s_{n−k−1}b(x)].

We see that d(x) = xa(x) − s_{n−k−1}b(x) is the remainder when xs(x) is divided by g(x); deg[d(x)] ≤ n − k − 1 < deg[g(x)] = n − k, thus d(x) satisfies the conditions to be the syndrome associated with xv(x).


Let us now return to the justification of the error trapping algorithm and recall that the error pattern has weight at most t and has a cyclic run of at least k 0's. Then there exists an integer i, with 0 ≤ i ≤ n − 1, such that a cyclic shift of i positions brings all non-zero components of e(x) within the first n − k components. This new polynomial x^i e(x) mod (x^n − 1) has the syndrome s^{(i)}(x) and its Hamming weight is w[s^{(i)}(x)] ≤ t; this syndrome is the remainder when we divide the polynomial x^i e(x) mod (x^n − 1) by g(x). Indeed, if the parity-check matrix of a linear [n, k, d] code is H = [I_{n−k}  A_{n−k,k}] and the error pattern has its last k bits equal to zero, e = (e_1, e_2, . . . , e_{n−k}, 0, 0, . . . , 0), then the syndrome is

s^T = H e^T = [I_{n−k}  A_{n−k,k}] e^T = (e_1, e_2, . . . , e_{n−k})^T.

In this case the syndrome is the same as the error pattern.

Example. Let Cg be the [7, 4, 3] cyclic code with the generator polynomial g(x) = 1+x2 +x3.In this case n = 7, k = 4; the code can correct one error as d = 2t + 1 = 3 thus, t = 1. Wesee also that x7 − 1 = (x3 + x2 + 1)(x4 + x3 + x2 + 1).

We apply the error trapping algorithm for the case when we transmit the code polynomialc(x) = (1 + x) · g(x) and c(x) is affected by the error polynomial e(x) = x3 and we receivev(x) = c(x) + e(x). Then the codeword is c(x) = 1 + x2 + x3 + x + x3 + x4 = 1 + x + x2 + x4

and the received 7-tuple is v(x) = c(x) + e(x) = 1 + x + x2 + x3 + x4.

Set i = 0 and compute the remainder when we divide v(x) by g(x)

v(x) = q(x)g(x) + r0(x) =⇒ x4 + x3 + x2 + x + 1 = x(x3 + x2 + 1) + (x2 + 1).

Then s(0)(x) = 1 + x2 and the Hamming weight w[s(0)(x)] = 2 > t thus, we set i = 1 andevaluate deg[s(0)(x)] = 2 and n − k − 1 = 7 − 4 − 1 = 2. We see that deg[s(0)(x)] ≥ 2 thus,we compute s(1)(x) as

s(1)(x) = xs(0)(x) − g(x) = x(1 + x2) − (1 + x2 + x3) = 1 + x + x2.

We see that w[s^{(1)}(x)] = 3 > t; we have to go to the next iteration, i = 2. Since deg[s^{(1)}(x)] = 2, we compute s^{(2)}(x) as

s^{(2)}(x) = xs^{(1)}(x) − g(x) = x(1 + x + x^2) − (1 + x^2 + x^3) = 1 + x.

Once again w[s^{(2)}(x)] = 2 > t; we go to the next iteration, i = 3. We evaluate deg[s^{(2)}(x)] = 1, and the rule to compute s^{(3)}(x) is

s^{(3)}(x) = xs^{(2)}(x) = x(1 + x) = x + x^2.

Again w[s^{(3)}(x)] = 2 > t and we set i = 4. We evaluate deg[s^{(3)}(x)] = 2, thus the rule to compute s^{(4)}(x) is:

s^{(4)}(x) = xs^{(3)}(x) − g(x) = x(x + x^2) − (1 + x^2 + x^3) = 1.

We have now reached a Hamming weight which allows us to identify the error pattern: w[s^{(4)}(x)] = 1, thus e(x) = x^{n−i}s^{(4)}(x) mod (x^n − 1), that is,

e(x) = x^{7−4} · 1 mod (x^7 − 1) = x^3.
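The sketch below is ours, not the book's; it replays the error-trapping iteration of this example (g(x) = 1 + x^2 + x^3, n = 7, k = 4, t = 1) with GF(2) coefficient lists, lowest order first.

    def poly_mod(a, g):
        a = a[:] + [0] * max(0, len(g) - len(a))
        dg = len(g) - 1
        for i in range(len(a) - 1, dg - 1, -1):
            if a[i]:
                for j, gj in enumerate(g):
                    a[i - dg + j] ^= gj
        return a[:dg]

    def next_syndrome(s, g):
        """Syndrome of x*v(x): multiply s(x) by x and reduce by g(x) if needed."""
        xs = [0] + s
        if xs[-1]:                             # degree reached n-k: subtract g(x)
            xs = [a ^ b for a, b in zip(xs, g)]
        return xs[:-1]

    def error_trap(v, g, t):
        n = len(v)
        s = poly_mod(v, g)
        for i in range(n):
            if sum(s) <= t:                    # trapped: e(x) = x^{n-i} s^{(i)}(x) mod (x^n - 1)
                e = [0] * n
                for j, bit in enumerate(s):
                    if bit:
                        e[(n - i + j) % n] ^= 1
                return e
            s = next_syndrome(s, g)
        return None                            # error not trappable

    g = [1, 0, 1, 1]                           # 1 + x^2 + x^3
    c = [1, 1, 1, 0, 1, 0, 0]                  # (1 + x) g(x) = 1 + x + x^2 + x^4
    v = [c[i] ^ (1 if i == 3 else 0) for i in range(7)]   # add the error e(x) = x^3
    print(error_trap(v, g, t=1))               # [0,0,0,1,0,0,0], i.e. e(x) = x^3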

Example. Cyclic redundancy check. Cyclic codes are used for error detection by link layerprotocols where the data transport unit is called a frame. It is easy to show that:

Given p(x), a primitive polynomial of degree m, the cyclic binary code with the generatorpolynomial g(x) = (x + 1)p(x) is an [n, k, d] code with

n = 2m − 1, k = 2m − m − 2, and d = 4.

The n − k = m + 1 parity check bits are called cyclic redundancy check (CRC) bits and areincluded in the trailer header of the frame. A cyclic code with g(x) of this form can correctone error. The probability that two or more errors were undetected and the receiver acceptedthe frame, Punderr, is equal to the probability that the (m + 1) parity-check bits are all zerowhen e ≥ 2, i.e., two or more errors occur:

Punderr = Prob(e ≥ 2) × 2−(m+1).

When m is large, undetected errors have a vanishing probability. Two standard CRC polynomials, CRC-ANSI and CRC-ITU-T, are used [228]:

x^16 + x^15 + x^2 + 1 = (x + 1)(x^15 + x + 1)

and

x^16 + x^12 + x^5 + 1 = (x + 1)(x^15 + x^14 + x^13 + x^12 + x^4 + x^3 + x^2 + x + 1).

In this case m + 1 = 17 and 2^{−17} < 10^{−5}, so the probability of undetected errors is below 10^{−7} if the probability of two or more errors is less than 10^{−2}.
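A minimal sketch, not from the text, of a bit-level CRC computation by polynomial long division (practical implementations are table-driven); it assumes the CRC-ITU-T generator above, a high-order-first bit layout, and an arbitrary 7-bit payload chosen only for illustration.

    def crc_remainder(bits, gen):
        a = bits + [0] * (len(gen) - 1)        # multiply the data polynomial by x^{deg g}
        for i in range(len(a) - len(gen) + 1):
            if a[i]:                           # cancel the current leading term
                for j, gj in enumerate(gen):
                    a[i + j] ^= gj
        return a[-(len(gen) - 1):]             # the 16 CRC bits

    gen = [1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1]  # x^16 + x^12 + x^5 + 1, high order first
    data = [1, 0, 1, 1, 0, 0, 1]               # illustrative frame payload
    crc = crc_remainder(data, gen)
    # Recomputing over the payload followed by the CRC bits must give a zero remainder.
    assert sum(crc_remainder(data + crc, gen)) == 0
    print(crc)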

The question we address next is if there are bounds restricting our ability to design “good”cyclic codes.

4.13 The Minimum Distance of a Cyclic Code; BCH Bound

We wish to establish a bound on the distance of a cyclic code, a question answered by theso-called Bose-Chaudhuri-Hocquenghem (BCH) bound. We recall a theorem discussed inSection 4.6 which states that every set of (d − 1) columns of H, the parity check matrix of alinear code are linearly independent if and only if the code has a distance of at least d. Thus,we should concentrate on the properties of the parity-check matrix of a cyclic code. First, weintroduce Vandermonde matrices and then review some properties of Vandermonde matriceswith elements from a finite field, GF (q) [228].

A Vandermonde matrix of order n with elements x_1, x_2, . . . , x_n ∈ GF(q) is of the form

A_n =
  [ 1  x_1  (x_1)^2  (x_1)^3  . . .  (x_1)^{n−1} ]
  [ 1  x_2  (x_2)^2  (x_2)^3  . . .  (x_2)^{n−1} ]
  [ . . .                                        ]
  [ 1  x_n  (x_n)^2  (x_n)^3  . . .  (x_n)^{n−1} ]


Proposition. Let β be a primitive element of order n of the finite field GF(q); thus β^n = 1 and β^i ≠ 1 for 0 < i < n. Let x_j = β^{j−1}. Then

C = BA^T = 0 and D = AB^T = 0

with

A =
  [ 1        1        . . .  1        ]
  [ x_1      x_2      . . .  x_n      ]
  [ (x_1)^2  (x_2)^2  . . .  (x_n)^2  ]
  [ . . .                             ]
  [ (x_1)^α  (x_2)^α  . . .  (x_n)^α  ]

B =
  [ x_1      x_2      . . .  x_n      ]
  [ (x_1)^2  (x_2)^2  . . .  (x_n)^2  ]
  [ (x_1)^3  (x_2)^3  . . .  (x_n)^3  ]
  [ . . .                             ]
  [ (x_1)^s  (x_2)^s  . . .  (x_n)^s  ]

and

s + α + 1 ≤ n.

Proof: If C = [c_{ik}], A = [a_{kj}], and B = [b_{ij}] with a_{kj} = (x_j)^{k−1} and b_{ij} = (x_j)^i, then:

c_{ik} = Σ_{j=1}^{n} b_{ij} a_{kj} = Σ_{j=1}^{n} (x_j)^i (x_j)^{k−1} = Σ_{j=1}^{n} (x_j)^{i+k−1} = Σ_{j=1}^{n} (β^{i+k−1})^{j−1}.

But 1 ≤ i + k − 1 ≤ s + α ≤ n − 1, hence β^{i+k−1} ≠ 1. It follows immediately that the sum of the geometric series is

c_{ik} = [(β^{i+k−1})^n − 1] / [β^{i+k−1} − 1] = [(β^n)^{i+k−1} − 1] / [β^{i+k−1} − 1] = 0

because (β^n)^{i+k−1} = 1. The proof that D = AB^T = 0 follows a similar path. □

Proposition. Call D_n = det[A_n]. Then:

D_n = ∏_{i>j} (x_i − x_j) = (x_2 − x_1)(x_3 − x_2)(x_3 − x_1) · · · (x_n − x_{n−1})(x_n − x_{n−2}) · · · (x_n − x_1).

Proof: We prove this equality by induction:

(i) It is easy to see that D_2 = x_2 − x_1.

(ii) Assume that

D_{n−1} = ∏_{i>j}^{n−1} (x_i − x_j) = (x_2 − x_1)(x_3 − x_2)(x_3 − x_1) · · · (x_{n−1} − x_{n−2}) · · · (x_{n−1} − x_1).

We can consider D_n as a polynomial of degree (n − 1) in x_n with x_1, x_2, . . . , x_{n−1} as zeros. The coefficient of (x_n)^{n−1} is D_{n−1}. Thus:

D_n = ∏_{i=1}^{n−1} (x_n − x_i) · D_{n−1} = ∏_{i>j}^{n} (x_i − x_j). □
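A quick numeric sketch of the Vandermonde determinant formula, not from the text; the example values x = (2, 3, 5, 7) are arbitrary, and the exact determinant is computed with the Leibniz formula to avoid floating-point issues.

    from itertools import permutations

    def det(M):                                # exact determinant via the Leibniz formula
        n = len(M)
        total = 0
        for p in permutations(range(n)):
            sign = 1
            for i in range(n):
                for j in range(i + 1, n):
                    if p[i] > p[j]:            # count inversions to get the sign
                        sign = -sign
            term = sign
            for i in range(n):
                term *= M[i][p[i]]
            total += term
        return total

    x = [2, 3, 5, 7]
    A = [[xi ** e for e in range(len(x))] for xi in x]   # rows [1, x, x^2, x^3]
    rhs = 1
    for i in range(len(x)):
        for j in range(i):
            rhs *= x[i] - x[j]
    print(det(A), rhs)                         # both equal 240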

The BCH (Bose, Ray-Chaudhuri, Hocquenghem) bound. Let g(x) be the generator polynomial of the [n, k] cyclic code C over GF(q). Let β ∈ GF(q) be a primitive element of order n. If g(x) has among its zeros the (d − 1) elements β^α, β^{α+1}, . . . , β^{α+d−2}, then the distance of the cyclic code is at least d.

Proof: Recall that the distance of a linear code, the minimum distance between any pair of codewords, is at least d if and only if every set of (d − 1) columns of H, the parity check matrix of the code, is linearly independent. A parity check matrix of the code C is

H =
  [ 1  β^α        β^{2α}        β^{3α}        . . .  β^{(n−1)α}       ]
  [ 1  β^{α+1}    β^{2(α+1)}    β^{3(α+1)}    . . .  β^{(n−1)(α+1)}   ]
  [ . . .                                                             ]
  [ 1  β^{α+d−2}  β^{2(α+d−2)}  β^{3(α+d−2)}  . . .  β^{(n−1)(α+d−2)} ]

The determinant of any (d − 1) columns i_1, i_2, . . . , i_{d−1} of the parity check matrix H is

det = | β^{α i_1}        β^{α i_2}        . . .  β^{α i_{d−1}}        |
      | β^{(α+1) i_1}    β^{(α+1) i_2}    . . .  β^{(α+1) i_{d−1}}    |
      | . . .                                                         |
      | β^{(α+d−2) i_1}  β^{(α+d−2) i_2}  . . .  β^{(α+d−2) i_{d−1}}  |

But with x_j = β^{i_j},

det = | (x_1)^α        (x_2)^α        . . .  (x_{d−1})^α        |
      | (x_1)^{α+1}    (x_2)^{α+1}    . . .  (x_{d−1})^{α+1}    |
      | . . .                                                   |
      | (x_1)^{α+d−2}  (x_2)^{α+d−2}  . . .  (x_{d−1})^{α+d−2}  |

or

det = (x_1)^α (x_2)^α · · · (x_{d−1})^α ·
      | 1            1            . . .  1              |
      | x_1          x_2          . . .  x_{d−1}        |
      | . . .                                           |
      | (x_1)^{d−2}  (x_2)^{d−2}  . . .  (x_{d−1})^{d−2} |

Finally, since the x_i are distinct, the Vandermonde factor does not vanish and

det = (x_1)^α (x_2)^α · · · (x_{d−1})^α ∏_{i>j}^{d−1} (x_i − x_j) ≠ 0.

This proves that any (d − 1) columns of the parity check matrix are linearly independent, thus the minimum distance of the code is at least d.

4.14 Burst Errors. Interleaving

So far we have assumed that errors are uncorrelated and we have only been concerned withthe construction of error correcting codes able to handle errors with a random distribution.Yet, physical systems may be affected by errors concentrated in a short time interval, oraffecting a number of adjacent bits; communication channels, as well as memory and infor-mation processing systems may experience correlated errors. A burst of noise triggered byan electrostatic phenomena, a speck of dust on a recording media, a slight variation of thevoltage of a power supply may affect a contiguous stream of bits transmitted, stored, or beingprocessed. Codes designed to handle bursts of errors are called Burst Error Correcting (BEC)codes. We start this section with a few definitions, then present several properties and discussseveral classes of BEC codes.

An error burst of length l is an n-tuple whose non-zero symbols are confined to a span of l consecutive symbols, the first and last of which are non-zero. A wraparound burst of length l is a cyclic shift of a burst of length l. A burst is specified by its location, the position of the least significant digit of the burst, and by the burst pattern. If i is the location and b(x) is the pattern (b_0 ≠ 0), then the error burst e(x) and the wraparound burst are:

e(x) = x^i · b(x),   e(x) = x^i · b(x) mod (x^n − 1).

Example. Assume n = 10. Then 0001101000 is a burst of length 4, the location is 3, andthe pattern is 1 + x + x3; 0100000011 is a wraparound burst of length 4, location 8, and thesame pattern, 1 + x + x3.
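A tiny sketch of our own that recovers the location i and the pattern b(x) of a (non-wraparound) burst from its vector representation, using the example above.

    def burst_location_and_pattern(e):
        nz = [i for i, bit in enumerate(e) if bit]
        if not nz:
            return None, []
        location = nz[0]                  # position of the least significant non-zero digit
        pattern = e[nz[0]:nz[-1] + 1]     # b(x): from the first to the last non-zero digit
        return location, pattern

    print(burst_location_and_pattern([0,0,0,1,1,0,1,0,0,0]))
    # -> (3, [1, 1, 0, 1]): location 3, pattern 1 + x + x^3, burst length 4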

Given an [n, k] linear code capable of correcting error bursts of length l, its burst error correcting efficiency is defined as

η = 2l / (n − k).

The following two propositions show that cyclic codes and block codes can be used forburst error detection and correction.


Proposition. A cyclic code with generator polynomial g(x) of degree r = n−k can detect allburst errors of length l ≤ r.

Proof: Call c(x) the polynomial representing the codeword sent and e(x) the error polynomial. We are able to detect an error when v(x) = c(x) + e(x) is not a codeword; this condition requires that e(x) is not a codeword, thus not a multiple of the generator polynomial g(x) = g_0 + g_1x + g_2x^2 + · · · + x^r. Assume that the error burst starts at position i and affects l ≤ r consecutive bits, thus e(x) = x^i · b(x). We see immediately that g(x) and x^i are relatively prime because g_0 ≠ 0. The length of the burst is l ≤ r and this implies deg[b(x)] ≤ r − 1 < deg[g(x)] = r, thus b(x) is not a multiple of g(x). It follows immediately that e(x) = x^i · b(x) is not a multiple of g(x), therefore it is not a codeword.

A cyclic code will also detect most burst errors of length l > r; indeed, a binary cyclic code with a generator polynomial of degree r detects a fraction 1 − 2^{−r} of the error bursts of length l > r + 1, and a fraction 1 − 2^{−(r−1)} of the error bursts of length l = r + 1.

Proposition. A block code can correct all bursts of length ≤ l if and only if no two codewordsdiffer by the sum of two bursts of length ≤ l.

An equivalent statement is that a linear block code can correct all bursts of length ≤ l ifand only if no codeword is the sum of two bursts of length ≤ l.

Proposition. (The Rieger bound.) The burst error correcting ability l of an [n, k] linear block code satisfies the condition

l ≤ (n − k)/2.

This inequality is an immediate consequence of the fact that the 2^{2l} vectors whose non-zero components are confined to the first 2l positions must all have distinct syndromes; otherwise the difference of two of them would be a codeword and, at the same time, the sum of two bursts of length ≤ l. Hence 2^{2l} ≤ 2^{n−k}.

We now examine several classes of burst error correcting codes and their burst error cor-recting efficiency. The Rieger bound imposes that linear codes have error correcting efficiencyη ≤ 1. On the other hand, Reed-Solomon codes discussed in Section 4.15 have η = 1. Reed-Solomon codes use 2t check symbols and can correct t random errors and, thus, bursts oflength ≤ t. The Fire codes have an error correcting efficiency of ≈ 2/3.

Cyclic codes have some desirable properties for burst error correction. Indeed, if symbol jof cyclic shift r(i)(x) can be corrected, then the same error magnitude can be used to correctthe symbol (j − i) mod n of r(x). To decode a cyclic code we use a feedback shift registerimplementing the generator polynomial of the code, g(x). If r(x) is the received polynomial weuse the following algorithm: (i) compute s(x) = r(x) mod g(x); (ii) Shift syndrome registerwith feedback until s[i](x) contains a syndrome with a known error e(x); (iii) decode r(x)using e(−i)(x). For example, g(x) = x4 + x + 1 is a primitive polynomial over GF (2) whichgenerates a (15, 11) cyclic Hamming code. The code with g(x) = (x + 1)(x4 + x3 + 1) cancorrect any bursts of length ≤ 2.

Fire codes are cyclic codes over GF (q) with a generator polynomial g(x) = (x2t−1−1)p(x)such that: (i) p(x) is a prime polynomial of degree m and order e over GF (q), (ii) m ≥ t, and(iii) p(x) does not divide (x2t−1 − 1). The block length of a Fire code, n, is the order of thegenerator polynomial, g(x); n = lcm(e, 2t − 1).


Interleaving is a simple technique to ensure burst error correction capabilities of a codeoriginally designed for random error correction, or to enhance the capabilities of an existingburst error correcting code. The basic idea of interleaving is to alternate the transmissionof the individual bits of a set of codewords, Figure 102. If we have an original [n, k] codeC = {c1, c2 . . . cm} with m codewords capable of correcting a burst of length b ≥ 1 errors orless, then interleaving allows us to correct a burst of length mb or less. We construct a tableand every row of this table is a codeword of the original code. When transmitted or storedthe symbols are interleaved; we send the symbols column wise, the first symbol of c1, thenthe first symbol c2, and so on. A burst of length mb or less can have at most b symbols in anyrow affected by the error. But each row is a codeword of the original code and it is able tocorrect a burst of length b or less. If b = 1 we start with a code with no burst error correctioncapabilities.

Figure 102: Schematic representation of interleaving. We wish to transmit m codewords and construct a table with m rows and n columns; every row of this table is a codeword of length n of the original code. We send the table column-wise, thus the symbols are interleaved. Now the code is capable of correcting a burst of length mb or less because in any row at most b symbols could be affected by the error, and the original code can correct a burst of this length.

Let us now give an example of interleaving. We start with the Reed-Muller code with the generator matrix:

G =
  [ 1 1 1 1 1 1 1 1 ]
  [ 1 0 0 1 1 0 1 0 ]
  [ 0 1 0 1 1 1 1 0 ]
  [ 0 0 1 1 0 1 1 0 ]


This code is able to correct a single error, its distance is d = 3, and the length of a codewordis n = 8. Consider the set of five messages M = {0001, 0010, 0100, 0011, 0111} encoded asthe codewords C = {00110110, 01011110, 10011010, 01101000, 11110000}, respectively. A firstoption is to send the five codewords in sequence:

00110110|01011110|10011010|01101000|11110000.

In this case a burst of errors of length b ≥ 2 would go undetected. Note that the verticalbars are not part of the string transmitted; they mark the termination of each one of thefive codewords. If we interleave the symbols and transmit the first bits of the five codewordsfollowed by the second bits, and so on, then the string of bits transmitted is

00101|01011|10011|01001|01110|11000|11100|00000.

The interleaving allows us to correct any burst of errors of length b ≤ 4. Now there are eightgroups, one for each symbol of a codeword, and the vertical bars mark the end of a group.
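A short sketch, ours rather than the book's, of the interleaving and de-interleaving operations applied to the five codewords of this example; it only demonstrates the mechanism and checks that de-interleaving recovers the original table.

    def interleave(codewords):
        n = len(codewords[0])
        return [c[j] for j in range(n) for c in codewords]   # column by column

    def deinterleave(stream, m, n):
        return [[stream[j * m + i] for j in range(n)] for i in range(m)]

    codewords = [list(map(int, w)) for w in
                 ["00110110", "01011110", "10011010", "01101000", "11110000"]]
    stream = interleave(codewords)
    print(stream[:5])                                        # first column: [0, 0, 1, 0, 1]
    assert deinterleave(stream, m=5, n=8) == codewords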

4.15 Reed-Solomon Codes

Many of the codes we have examined so far are binary codes. We start our presentationof non-binary codes with a brief overview and then introduce a class of non-binary codes,the Reed-Solomon (R-S) codes. R-S codes are non-binary cyclic codes widely used for bursterror correction; they were introduced in 1960 by Irving Reed and Gustav Solomon [354].R-S codes are used for data storage on CDs, DVDs and Blu-ray Discs, in data transmissiontechnologies such as DSL (Digital Subscriber Line) and WiMAX (Worldwide Interoperabilityfor Microwave Access), in broadcast systems such as DVB (Digital Video Broadcasting), andin RAID 6 computer disk storage systems.

The density of codewords of the linear code C over the alphabet A is the ratio of the num-ber of codewords to the number of n-tuples constructed with the symbols from the alphabetA. Table 15 shows the density of the codewords | C | / | Vn | and the rate R of binary andnon-binary [n, k] codes over GF (2m).

Table 15: The cardinality of the n-dimensional vector space Vn, the cardinality of an [n, k] code C, the density of the codewords, and the rates of a binary and of a non-binary code over GF(2^m).

Code         | Vn |           | C |           | C | / | Vn |     R
Binary       2^n              2^k             2^{−(n−k)}         k/n
Non-binary   q^n = 2^{mn}     q^k = 2^{mk}    2^{−(n−k)m}        k·2^m/(n·2^m) = k/n

We notice that the rates of a binary [n, k] and of a q = 2m-ary [n, k] code are equal.We also see that non-binary codes have a lower density of codewords thus, a larger distancebetween codewords and this implies increased error correction capability.


Example. A binary Hamming code versus a code using 3-bit symbols. The binary [7, 4, 3] code is a subspace of cardinality |C| = 2^4 = 16 of a vector space of cardinality |Vn| = 2^7 = 128. The density of the codewords is

16/128 = 1/8.

If the code uses an alphabet with eight symbols, each symbol requiring m = 3 bits, then the total number of codewords becomes 2^{4×3} = 2^{12} = 4096, while the number of 7-tuples is 2^{7×3} = 2^{21} = 2,097,152. Now the density of the codewords is much lower:

4096/2,097,152 = 1/2^9 = 1/512.

Let Pk−1[x] be the set of polynomials pi(x) of degree at most (k − 1) over the finite fieldGF (q). Given n > k then an [n, k] Reed-Solomon (R-S) code over GF (q) consists of allcodewords ci generated by polynomials pi ∈ Pk−1[x]:

ci = (pi(x1), pi(x2), pi(x3), . . . pi(xn−1), pi(xn))

where x1, x2, ... . . . xn are n different elements of GF (q); xi = βi−1, 1 ≤ i ≤ n with β aprimitive element of GF (q). The polynomial pi(x) is called the generator of the codeword ci.

Reed-Solomon codes have several desirable properties; for example, an [n, k] R-S code achieves the Singleton bound, the largest possible code distance (minimum distance between any pair of codewords) of any linear code: d = n − k + 1. This means that the code can correct at most t errors with

t = (d − 1)/2 = (n − k)/2.

Recall that a cyclic code Cg has a generator polynomial g(x) and all codewords c_i(x) are multiples of the generator polynomial, c_i(x) = g(x) · p_i(x), ∀c_i ∈ C. The Reed-Solomon codes are constructed differently: first we identify p_i(x), the generator polynomial of each codeword c_i, and then we evaluate this polynomial at the elements (x_1, x_2, . . . , x_n) of the finite field GF(q) to obtain the n coordinates c_{i,j} of the codeword c_i = (c_{i,1}, c_{i,2}, . . . , c_{i,n}). We shall later see that an R-S code is a cyclic code and we can define a generator polynomial. The difference between R-S codes and the BCH codes discussed at the end of this section is that for R-S codes β is a primitive element of GF(q), while in the case of BCH codes β is an element of an extension Galois field.

As noted earlier, P_{k−1}[x] ⊂ GF(q)[x] is a vector space of dimension k. For example, when k = 3 the non-zero polynomials of degree at most 2 over GF(2) are:

P2[x] \ {0} = {p0(x) = 1, p1(x) = x, p2(x) = x + 1, p3(x) = x^2, p4(x) = x^2 + 1, p5(x) = x^2 + x, p6(x) = x^2 + x + 1}.

It is easy to see that an R-S code is a linear code. Let two codewords of an R-S code be:


ci = (pi(x1), pi(x2), . . . , pi(xn)) and cj = (pj(x1), pj(x2), . . . , pj(xn))

Then αci + βcj = (g(x1), g(x2), . . . , g(xn)) with α, β ∈ GF (q) and g(x) = αpi(x) + βpj(x).

The minimum distance of an [n, k] Reed-Solomon code is (n−k+1). Indeed, each codewordconsists of n symbols from GF (q) and it is generated by a polynomial pi(x) ∈ Pk−1 of degreeat most (k − 1). Such a polynomial may have at most (k − 1) zeros. It follows that theminimum weight of a codeword is n − (k − 1) = n − k + 1 thus, the minimum distance thecode is dmin = n − k + 1.

Encoding R-S codes. We can easily construct a generator and a parity-check matrix of an [n, k] R-S code when the n elements x_1, x_2, . . . , x_n ∈ GF(q) can be expressed as powers of β, the primitive element of the finite field. Each row of the generator matrix G consists of one polynomial in P_{k−1}[x] evaluated at all the elements x_i; for example, rows 1 to k may consist of the polynomials p_0(x) = x^0, p_1(x) = x^1, . . . , p_{k−1}(x) = x^{k−1}. Thus, a generator matrix of an [n, k] R-S code is [228]:

G =
  [ 1            1            1            . . .  1              1            ]
  [ x_1          x_2          x_3          . . .  x_{n−1}        x_n          ]
  [ (x_1)^2      (x_2)^2      (x_3)^2      . . .  (x_{n−1})^2    (x_n)^2      ]
  [ . . .                                                                     ]
  [ (x_1)^{k−1}  (x_2)^{k−1}  (x_3)^{k−1}  . . .  (x_{n−1})^{k−1}  (x_n)^{k−1} ]

To construct the parity-check matrix H of the R-S code with the generator matrix G we use a property discussed in Section 4.13: if G is a k × n matrix then there exists an (n − k) × n matrix H such that HG^T = GH^T = 0. It follows immediately that a parity check matrix H of the code is

H =
  [ x_1          x_2          x_3          . . .  x_{n−1}          x_n          ]
  [ (x_1)^2      (x_2)^2      (x_3)^2      . . .  (x_{n−1})^2      (x_n)^2      ]
  [ (x_1)^3      (x_2)^3      (x_3)^3      . . .  (x_{n−1})^3      (x_n)^3      ]
  [ . . .                                                                       ]
  [ (x_1)^{n−k}  (x_2)^{n−k}  (x_3)^{n−k}  . . .  (x_{n−1})^{n−k}  (x_n)^{n−k}  ]

The two matrices can be expressed in terms of the primitive element β with the property β^n = 1 and β^i ≠ 1 for 0 < i < n. Then:

x_j = β^{j−1}, 1 ≤ j ≤ n, with n = q − 1, so that (x_j)^n = 1 for every j.

Now the generator and the parity-check matrices are:

G =
  [ 1  1          1              . . .  1                1              ]
  [ 1  β          β^2            . . .  β^{n−2}          β^{n−1}        ]
  [ 1  β^2        (β^2)^2        . . .  (β^2)^{n−2}      (β^2)^{n−1}    ]
  [ . . .                                                               ]
  [ 1  β^{k−1}    (β^{k−1})^2    . . .  (β^{k−1})^{n−2}  (β^{k−1})^{n−1} ]

and

H =
  [ 1  β          β^2            . . .  β^{n−2}          β^{n−1}        ]
  [ 1  β^2        (β^2)^2        . . .  (β^2)^{n−2}      (β^2)^{n−1}    ]
  [ 1  β^3        (β^3)^2        . . .  (β^3)^{n−2}      (β^3)^{n−1}    ]
  [ . . .                                                               ]
  [ 1  β^{n−k}    (β^{n−k})^2    . . .  (β^{n−k})^{n−2}  (β^{n−k})^{n−1} ]

.

Algorithm. To determine the codeword c_i of an [n, k] Reed-Solomon code generated by the information polynomial p_i(x) = p_{i,0} + p_{i,1}x + · · · + p_{i,k−1}x^{k−1}, p_i(x) ∈ P_{k−1}[x], carry out the following three steps:

(i) choose k, q, and n with k < n < q;
(ii) select n elements x_j ∈ GF(q), 1 ≤ j ≤ n;
(iii) evaluate the polynomial p_i(x) at x = x_j, 1 ≤ j ≤ n, and then construct

c_i = (p_i(x_1), p_i(x_2), . . . , p_i(x_n)).

Example. Encoding for a [10, 5] Reed-Solomon code over GF(11). In this case n = 10, k = 5, and q = 11. First, we have to find a primitive element β with β^n = 1; we notice that 2^{10} mod 11 = 1, so the q − 1 = 10 non-zero elements of GF(11) can be expressed as x_j = 2^{j−1} with 1 ≤ j ≤ 10:

x1 = 2^0 mod 11 = 1    x5 = 2^4 mod 11 = 5    x9  = 2^8 mod 11 = 3
x2 = 2^1 mod 11 = 2    x6 = 2^5 mod 11 = 10   x10 = 2^9 mod 11 = 6
x3 = 2^2 mod 11 = 4    x7 = 2^6 mod 11 = 9
x4 = 2^3 mod 11 = 8    x8 = 2^7 mod 11 = 7

The information vectors are denoted as (i0, i1, i2, i3, i4); the 31 non-zero polynomials of degree at most k − 1 = 4 with coefficients from GF(2) corresponding to these vectors, and the codewords they generate, are shown in Table 16.

For example, when the information vector is (0, 0, 0, 0, 1) the codeword is generated by the polynomial p5(x) = x^4; then c_(0,0,0,0,1) = (1, 5, 3, 4, 9, 1, 5, 3, 4, 9). Indeed,

p5(x1) = 1^4 mod 11 = 1     p5(x6)  = 10^4 mod 11 = 1
p5(x2) = 2^4 mod 11 = 5     p5(x7)  = 9^4 mod 11 = 5
p5(x3) = 4^4 mod 11 = 3     p5(x8)  = 7^4 mod 11 = 3
p5(x4) = 8^4 mod 11 = 4     p5(x9)  = 3^4 mod 11 = 4
p5(x5) = 5^4 mod 11 = 9     p5(x10) = 6^4 mod 11 = 9

A generator matrix of the code is

G =
  [ 1 1 1 1 1 1  1 1 1 1 ]
  [ 1 2 4 8 5 10 9 7 3 6 ]
  [ 1 4 5 9 3 1  4 5 9 3 ]
  [ 1 8 9 6 4 10 3 2 5 7 ]
  [ 1 5 3 4 9 1  5 3 4 9 ]


Table 16: The information vector, the generator polynomial, and the codeword for the [10, 5] R-S code over GF(11).

Vector            Generator polynomial               Codeword
(1, 0, 0, 0, 0)   p1(x) = 1                          (1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
(0, 1, 0, 0, 0)   p2(x) = x                          (1, 2, 4, 8, 5, 10, 9, 7, 3, 6)
(0, 0, 1, 0, 0)   p3(x) = x^2                        (1, 4, 5, 9, 3, 1, 4, 5, 9, 3)
(0, 0, 0, 1, 0)   p4(x) = x^3                        (1, 8, 9, 6, 4, 10, 3, 2, 5, 7)
(0, 0, 0, 0, 1)   p5(x) = x^4                        (1, 5, 3, 4, 9, 1, 5, 3, 4, 9)
(1, 1, 0, 0, 0)   p6(x) = 1 + x                      (2, 3, 5, 9, 6, 0, 10, 8, 4, 7)
(1, 0, 1, 0, 0)   p7(x) = 1 + x^2                    (2, 5, 6, 10, 4, 2, 5, 6, 10, 4)
(1, 0, 0, 1, 0)   p8(x) = 1 + x^3                    (2, 9, 10, 7, 5, 0, 4, 3, 6, 8)
(1, 0, 0, 0, 1)   p9(x) = 1 + x^4                    (2, 6, 4, 5, 10, 2, 6, 4, 5, 10)
. . .             . . .                              . . .
(1, 1, 1, 1, 1)   p31(x) = 1 + x + x^2 + x^3 + x^4   (5, 9, 0, 6, 0, 1, 0, 7, 0, 4)

Decoding R-S codes. Assume the codeword ci of an [n, k] R-S code is generated bythe polynomial pi(x); an error ei occurs when we send ci = (pi(x1), pi(x2), . . . pi(xn−1), pi(xn))and we receive the n-tuple v affected by at most t errors:

v = ci + e, with w(e) ≤ t = n − k

2

Before presenting the decoding algorithm for an R-S codes we introduce two new concepts,the interpolating polynomial and the error locator polynomial; then we prove a theorem ofexistence for the interpolation polynomial and show that the polynomial used to generatethe codeword cj is a ratio of the interpolating and error locating polynomials. The bivariateinterpolating polynomial is defined as

Q(x, y) = Q0(x) + yQ1(x).

Q(x, y) is a polynomial in x and y with Q0(x) and Q1(x) polynomials of degrees m0 and m1,respectively,

Q0(x) = Q0,0 + Q0,1x + . . . + Q0,jxj + . . . + Q0,m0x

m0

andQ1(x) = Q1,0 + Q1,1x + . . . + Q1,jx

j + . . . + Q0,m1xm1 ,

the so-called error locator polynomial.

Proposition. When we receive an n-tuple v = (v0, v2, . . . , vn−1) an interpolation polynomialfor the [n, k] Reed-Solomon code with t = (n − k)/2 exists if the following three conditionsare satisfied:

(i) Q(xi, vi) = Q0(xi) + viQ1(xi) = 0, i = 1, 2, . . . (n − 1), n, xi, vi ∈ GF (q).

431

Page 78: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

(ii) deg[Q0(x)] = m0 ≤ n − 1 − t

(iii) deg[Q1(x)] = m1 ≤ n − 1 − t − (k − 1).

Proof: To prove the existence of Q(xi, vi) we have to show that we can determine the m0 + 1coefficients Q0,j and the m1+1 coefficients Q1,j given the n equations Q(xi, vi) = 0, 1 ≤ i ≤ n,and that not all of them are equal to zero. The total number of coefficients of the bivariatepolynomial is (m0 +1)+(m1 +1) = (n−t)+(n−t−(k−1)) = 2n−k+1−2t. If we substitute2t = d = n− k + 1 we see that the total number of coefficients is 2n− k + 1− n + k − 1 = n.The system of n equations with n unknowns has a non-zero solution if the determinant of thesystem is non-zero, and in this case a polynomial Q(xi, vi) that satisfies the three conditionsexists.

Proposition. Let C be a Reed-Solomon code with t = n−k2. If we transmit the codeword ci(x)

generated by the polynomial pi(x) ∈ Pk−1[x] and we receive v(x) =∑

j vjxi = ci(x)+ e(x) and

if the number of errors is w[e(x)] ≤ t then the codeword ci(x) is generated by the polynomialpi(x) ∈ Pk−1[x] given by

pi(x) = −Q0(x)

Q1(x)

Proof: If v = (v0, v2, . . . , vn−1) is the received n-tuple and e(x) = (e0, e1, . . . en−1) is the errorpolynomial then at most t components of e are non-zero, el = rl thus, at least n−t componentsej of e are zero. An interpolating polynomial Q(x, v) = Q(x, (pj(x) + e(x))) exists and has atleast n−t zeros, namely those values x = xj when pi(xj) = vj +ej with ej = 0. But the degreeof Q(x, pi(x)) is m0 ≤ n− t− 1. It follows that Q(x, pi(x)) = 0 for x = xj, j = 1, 2, . . . n. ButQ(x, pi(x)) = 0 implies that

Q0(x) + pi(x)Q2(x) = 0 thus pi(x) = −Q0(x)

Q1(x)

Algorithm. Let C be an [n, k] Reed-Solomon code able to correct at most t = (n − k)/2errors. We send ci = (pi(x1), pi(x2), . . . , pi(xn)) generated by the polynomial pi(x) and wereceive v = ci + e with v = (v0, v2, . . . , vn−1). To determine ci we first compute the generatorpolynomial pi(x) and then evaluate it for x = xj, 1 ≤ j ≤ n. The decoding algorithm consistsof the following steps:

Step 1. Compute m0 = n − 1 − t and m1 = n − 1 − t − (k − 1). Solve the system of linearequations:

432

Page 79: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

⎡⎢⎢⎢⎢⎢⎣

1 x1 x21 . . . xm0

1 v1 v1x1 . . . v1xm11

1 x2 x22 . . . xm0

2 v2 v2x2 . . . v2xm12

1 x3 x23 . . . xm0

3 v3 v3x1 . . . v3xm13

...1 xn x2

n . . . xm0n vn vnxn . . . vnx

m1n

⎤⎥⎥⎥⎥⎥⎦

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

Q0,0

Q0,1

Q0,2...

Q0,m0

Q1,0

Q1,1

Q1,2...

Q1,m1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

=

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

000...0000...0

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

Step 2. Construct the following polynomials:

Q0(x) =

m0∑j=0

Q0,jxj, Q1(x) =

m1∑j=1

Q1,jxj and p(x) = −Q0(x)

Q1(x).

Step 3. If p(x) ∈ Pk−1 then the algorithm succeeds and produces (p(x1), p(x2), . . . p(xn)).

An alternative definition of Reed-Solomon codes. We discuss now several propertiesof extension finite fields GF (2m) and show that the polynomial xn − 1 can be factored as aproduct of irreducible monic polynomial over GF (2m). Recall that the set of polynomialsp(x) =

∑ki=0 αix

i of degree at most k and with αi ∈ GF (2m) is denoted as P2m [x].

Proposition. If the polynomial p(x) ∈ P2m [x] then p(x2) = [p(x)]2.

Proof: If α, β ∈ GF (2m) then (α+β)2 = α2+β2. Indeed, the term 2α·β = 0 as α, β ∈ GF (2m).If p(x) = αkx

k + αk−1xk−1 + . . . + α1x + α0x

0, with α0, α1, . . . , αk ∈ GF (2m) then

[p(x)]2 = [αkxk + αk−1x

k−1 + . . . + α1x + α0x0]2 = α2

kx2k + α2

k−1x2(k−1) + . . . + α2

1x2 + α2

0.

p(x2) = αkx2k + αk−1x

2(k−1) + . . . + α1x2 + α0.

Thus, p(x2) = [p(x)]2 if and only if αi = (αi)2, 1 ≤ i ≤ k. But this is true if and only if

αi ∈ GF (2m). It follows immediately that if β is a zero of p(x), then so is β2:

p(β) = 0 ⇔ p(β2) = 0.

Proposition. The extension Galois field GF (2m) is a subfield of GF (2n) if and only if m|n(m divides n).

Proof: It is easy to see that aq = a if and only if a ∈ GF (q). If a ∈ GF (q) then it can beexpressed as a power of the primitive element β of GF (q) as a = βi. Then

aq−1 = (βi)(q−1) = (βq−1)i = 1i = 1

as βq−1 = 1. If aq−1 = 1 then aq = a and a is the primitive element of a finite field GF (q).

433

Page 80: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

We can thus express GF (2n) = {x|x2n= x} and GF (2m) = {x|x2m

= x}. In preparationfor our proof we show that

(xm − 1)|(xn − 1) ⇔ m|n.

Given a non-negative integer k we have the identity

(xn − 1) = (xm − 1)(xn−m + xn−2m + . . . + xn−km) + xn−km − 1.

or,xn − 1

xm − 1= (xn−m + xn−2m + . . . + xn−km) +

xn−km − 1

xm − 1.

If n − km = m then

xn − 1

xm − 1= (xn−m + xn−2m + . . . + xn−km) + 1.

In this case n = m(k + 1) so that when m | n then (xm − 1) | (xn − 1).Now we show that m|n is a necessary and sufficient condition for GF (2m) to be a subfield

of GF (2n). The condition is necessary: if GF (2m) is a subfield of GF (2n) them m|n. Indeed,GF (2m) contains an element of order 2m − 1. When GF (2m) is a subfield of GF (2n) thenGF (2n) contains an element of order 2m − 1. Thus, (xm − 1)|(xn − 1); therefore, accordingto the previous statement, m|n. The condition is sufficient: if m|n then GF (2m) is a subfieldof GF (2n). Indeed, m|n → (2m − 1)|(2n − 1) and this implies that (x2m − 1)|(x2n − 1), inother words that GF (2m) is a subfield of GF (2n).

�The minimal polynomial mγ(x) is the lowest degree polynomial with binary coefficients

that has γ ∈ GF (2m) as a zero.

Proposition. The minimal polynomial in GF (2m) has the following properties:

1. mγ(x) exists and is unique.

2. mγ(x) is irreducible.

3. If p(x) is a polynomial with binary coefficients and if p(γ) = 0 then minimal polynomialmγ(x) divides p(x).

4. The polynomial g(x) = x2m − x is the product of different minimal polynomials mγi(x)

with γi ∈ GF (2m).

5. deg mγ(x) ≤ m. If γ is a primitive element of GF (2m) then deg mγ(x) = m.

Proof: If γ is a primitive element of GF (2m) then γ2m−1 = 1 and γ2m − γ = 0. There exists apolynomial such that p(γ) = 0; the polynomial p(x) = x2m−x satisfies the condition p(γ) = 0.This polynomial must be unique; otherwise, if two polynomials, f1(x) and f2(x) of the samedegree existed, then their difference g(x) = f1(x) − f2(x) would have: (i) γ as a zero; (ii) alower degree, deg g(x) < deg f1(x). This contradicts the assumption that f1(x) and f2(x) areminimal polynomials.

434

Page 81: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

To show that mγ(x) is irreducible we assume that mγ(x) = p(x)q(x). Thendeg[p(x)], deg[q(x)] < deg[mγ(x)] and at least one of the two polynomials p(x) and q(x) musthave γ as a root. This contradicts again the assumption that mγ(x) is a minimal polynomial.

It is easy to see that the minimal polynomial mγ(x) divides any polynomial p(x) withbinary coefficients if p(γ) = 0. Indeed we can express p(x) = mγ(x)q(x)+ r(x). But p(γ) = 0,thus r(γ) = 0. Therefore r(x) = 0.

To prove that the polynomial g(x) = x2m−x is the product of minimal polynomials mγi(x)

with γi ∈ GF (2m) we observe that all elements a ∈ GF (2m) are zeros of g(x) as they satisfythe equation a2m−1 − a = 0. Then the minimal polynomial ma(x) exists and it is unique;according to the property (4) ma(x) divides the polynomial g(x).

Finally, to show that deg mγ(x) ≤ m, we recall that GF (2m) can be viewed as an m-dimensional vector space over GF (2); thus, the elements 1, γ1, γ2, . . . , γm−1, γm are linearlydependent. This means that there are m + 1 coefficients gi ∈ GF (2), 0 ≤ i ≤ m such that:

gmγm + gm−1γm−1 + . . . + g1γ

1 + g0γ0 = 0.

Thus deg mγ(x) can be at most m. When γ is a primitive element of GF (2m) then GF (2m)must consist of 2m − 1 different elements, thus 1, γ1, γ2, . . . , γm−1 must be linearly indepen-dent. This shows that when γ is a primitive element of GF (2m) then deg mγ(x) = m.

�The following proposition allows us to introduce the generator matrix of an R-S code.

Proposition. If Cg is a cyclic code with generator polynomial g(x) and h(x) = (xn − 1)/g(x)

then the dual code C⊥g has a generator polynomial g⊥(x) = xk−1h(1/x) with h(x) =

∑k−1i=0 hix

i.Thus, g⊥(x) = h0xk−1 + h1x

k−1 + . . . + hk−1.

Let α be a primitive element of GF (q). If n is a divisor of (q − 1) and β = α(q−1)/n

then βn = 1. Let C be the [n, k, d] Reed-Solomon code obtained by evaluating polynomials ofdegree at most (k − 1), p(x) ∈ Pk−1[x] over GF (2m) at xi = βi−1, 1 ≤ i ≤ n. Then C is acyclic code over GF (q) with generator polynomial

g(x) = (x − β)(x − β2) . . . (x − βn−k).

Proof: If c ∈ C then c = (p(β0), p(β1), p(β2), . . . , p(βn−2), p(βn−1)). A cyclic shift of thecodeword c leads to the n-tuple:

c =(p(βn−1), p(β0), p(β1), . . . , p(βn−3), p(βn−2)

).

Recall that a cyclic code is a cyclic subspace of a vector space and from the fact that c ∈ Cwe conclude that C is a cyclic code. We now construct the parity-check matrix of the codeand then show that this parity check matrix corresponds to a cyclic code with a generatorpolynomial having β, β2, . . . , βn−k as roots. First, we express c as

c =(p1(β

0), p1(β1), p1(β

2), . . . , p1(βn−2), p1(β

n−1))

with p1(x) = p(β−1). According to the previous proposition the generator matrix of the dualcode is an (n− k)×n matrix H = [hij] with hij = p(1/(xj)

i), 1 ≤ i ≤ n− k, 1 ≤ j ≤ n; when

435

Page 82: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

we substitute xj = βj−1 the generator matrix of the dual code which is at the same time theparity check matrix of the R-S code can be expressed as

H =

⎛⎜⎜⎜⎜⎜⎝

1 (β)1 (β)2 . . . (β)n−2 (β)n−1

1 (β2)1 (β2)2 . . . (β2)n−2 (β2)n−1

1 (β3)1 (β3)2 . . . (β3)n−2 (β3)n−1

......

......

...1 (βn−k)1 (βn−k)2 . . . (βn−k)n−2 (βn−k)n−1

⎞⎟⎟⎟⎟⎟⎠

.

Recall from Section 4.13 that when g(x) is the generator polynomial of the cyclic [n, k] code Cg

over GF (q) and when g(x) has among its zeros (d−1) elements of GF (q), βα, βα+1, . . . , βα+d−2

with β ∈ GF (q) the parity check matrix of cyclic code Cg is

Hg =

⎛⎜⎜⎜⎝

1 (β)α (β)2α (β)3α . . . (β)(n−1)α

1 (β)α+1 (β)2(α+1) (β)3(α+1) . . . (β)(n−1)(α+1)

......

......

...1 (β)(α+d−2) (β)2(α+d−2) (β)3(α+d−2) . . . (β)(n−1)(α+d−2)

⎞⎟⎟⎟⎠ .

The similarity of H and Hg allows us to conclude that β, β2, . . . , βn−k are the zeros of thegenerator matrix of the R-S code thus,

g(x) = (x − β)(x − β2) . . . (x − βn−k). �

Example. Construct the generator polynomial of the [7, 3] R-S code C over GF (23) with theprimitive element β ∈ GF (23) satisfying the condition β3 + β + 1 = 0.

The distance of the code is d = n− k + 1 = 5. We can describe GF (8) by binary 3-tuples,000, 001, . . . 111, by listing its elements expressed as powers of the primitive element β, or bypolynomials p(x) of degree at most k − 1 = 2. For example, the tuple 011 corresponds to β3

and to the polynomial p(x) = x + 1.

Table 17: The correspondence between the 3-tuples, the powers of β, and the polynomials.Binary 3-tuple Power of primitive element Polynomial000 0001 β0 1010 β1 x100 β2 x2

011 β3 x + 1101 β4 x2 + x110 β5 x2 + x + 1111 β6 x2 + 1

Table 17 shows the correspondence between the 3-tuples, the powers of the primitiveelement, β, and the polynomials. Indeed, we observe that the polynomial x2 + x + 1 isirreducible and thus can be used to construct GF (8). In GF (8):

436

Page 83: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

p0(x) = x0 = 1, p1(x) = x1 = x, p2(x) = x2 = x2

p3(x) = x3 = x + 1p4(x) = x4 = x(x3) = x(x + 1) = x2 + xp5(x) = x5 = x(x4) = x(x2 + x) = x3 + x2 = x2 + x + 1p6(x) = x6 = x(x5) = x(x2 + x + 1) = x3 + x2 + x = x + 1 + x2 + x = x2 + 1.

The generator polynomial of C and of its dual C⊥ when d = 5 are respectively:

g(x) = (x − β0)(x − β1)(x − β2)(x − β3) = (x − 1)(x − β)(x − β2)(x − β3)

and

g⊥(x) = (x − β−4)(x − β−5)(x − β−6) = (x − β3)(x − β2)(x − β1) = (x − β)(x − β2)(x − β3).

We notice that g(x)/g⊥(x) = x − 1 thus, the code C is self-dual.

Burst-error correction and Reed-Solomon codes. R-S codes are well suited forburst-error correction. The following proposition quantifies the burst error correcting capa-bility of an R − S code with symbols from GF (2m).

Proposition. If C is an [n, k] R-S code cover GF (2m) with t = (n − k)/2 then the corre-sponding [n × m, k × m] binary code is able to correct burst errors of length (m × t) bits.

An R-S code using an alphabet with m-bit symbols could correct any burst of [(t−1)×m+1]bits errors with t the number of symbols in error. The code can correct a burst of (m× t) biterrors, provided that the errors occur in consecutive symbols and start at a symbol boundary;the code could not correct any arbitrary pattern of (m× t) bit errors because such a patterncould be spread over (m × t) symbols each with one bit in error.

Example. The [255, 246] R-S code. In this case a symbol is an 8-bit byte, m = 8, n = 255,and k = 246, Figure 103. We transmit bytes of information in blocks of 255 bits with 246information bits in each block. The code has a minimum distance of dmin = n − k + 1 = 10and can correct t symbols in error:

t = n − k

2 = 255 − 246

2 = 9

2 = 4.

The R-S code could correct all bursts of at most (t− 1)×m + 1 = 3× 8 + 1 = 25 consecutivebits in error, see Figures 103 (a) and (b), and some bursts of t×m = 4× 8 = 32 bits in error,Figure 103 (c). It cannot correct any arbitrary pattern of 32 errors, Figure 103 (d).

Bose-Chaudhuri-Hocquenghem (BCH) codes. Given the prime power q, and inte-gers m and d, the BCH code BCHq;m;d is obtained as follows: let n = qm and GF (qm) bean extension of GF (q) and let C ′ be the (extended) [n; n − (d − 1); d]qm Reed-Solomon codeobtained by evaluating polynomials of degree at most (n − d) over GF (qm) at all the pointsof GF (qm). Then the code BCHq;m;d is the GF (q)-subfield subcode of C ′.

A simple argument shows a BCH code has dimension at least n−m(d− 1) [413]. Indeed,every function from GF (qm) to GF (q) is a polynomial over GF (q) of degree at most (n− 1).

437

Page 84: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Byte i Byte i+1 Byte i+2 Byte i+3 Byte i+4 Byte i+5 Byte i+6

Byte i Byte i+1 Byte i+2 Byte i+3 Byte i+4 Byte i+5 Byte i+6

Byte i Byte i+1 Byte i+2 Byte i+3 Byte i+4 Byte i+5 Byte i+6

(a)

(b)

(c)

Byte i Byte i+1 Byte i+5 Byte i+6Byte i+1 Byte i+4Byte i+3Byte i+2

(d)

Figure 103: Correction of burst-errors with the (255, 246) R-S code. The upper bar marksconsecutive bits affected by error. The code can correct: (a), (b) any burst of 25 consecutivebit errors; (c) some bursts of 32 consecutive bit errors. The code cannot correct any arbitrarypattern of 32 bit errors, (d).

The space of polynomials from GF (qm) to GF (q) is a linear space of dimension n. Now weask what is the dimension of the subspace of polynomials of degree at most (n − d). Therestriction that a polynomial p(x) =

∑n−1i=0 cix

i has degree at most (n − d) implies that theci = 0, i ∈ {n− (d−1), . . . , (n−1)}. Each such condition is a linear constraint over GF (qm)and this translates into a block of m linear constraints over GF (q). Since we have (d−1) suchblocks, the restriction that the functions have degree at most (n − d) translates into at mostm(d − 1) linear constraints. Thus, the resulting space has dimension at least n − m(d − 1).

BCH codes were independently discovered by Hocquenghem [209] and Bose and Ray-Chaudhuri [68]; the extension to the general q-ary case is due to Gorenstein and Zierler [173].

Next we examine codes used in radio and satellite communication to encode streams ofdata.

4.16 Convolutional Codes

Binary convolutional codes were introduced by Peter Elias in 1955 [141] as a generalization ofblock codes. Recall that in the case of a block code we encode k information symbols into ablock of n symbols and the rate of the code is R = k/n. In some cases, for example in the caseof R-S codes, we use one polynomial from a set of polynomials to generate one codeword. Inour presentation we only discuss the encoding algorithm for convolutional codes; the Viterbialgorithm for decoding convolutional codes [440] is not analyzed here.

We consider an infinite sequence of blocks . . . b−2, b−1, b0, b1, b2, . . . with k symbols perblock. The input symbols are fed sequentially into an encoder which first creates a sequence of

438

Page 85: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

information blocks . . . In−2, In−1, In0, In1, In2, . . . with n symbols per block; then the encodercyclically convolutes the information blocks with a generating vector g = (g0g1 . . . gm) withg0 = gm = 1 and produces the output Outj; the rate of the cod is R = k/n.

To facilitate the presentation we consider an information vector consisting of N bits or NRinformation blocks obtained from K input blocks; the output of the encoder is a codeword ofNR blocks or N bits. The rate of the code is R=K/N=k/n. For example, in the binary casewith k = 1 and n = 2 the rate is R = 1/2. If we choose K = 8 then the information vectorand an output vector consist of N = KR = 16bits. A rule to construct the information vectorcould be

In = {InBit0, InBit1, . . . , InBitN−1} with InBitj =

{bi if j = iR0 otherwise.

In this case each input block consists of one bit, each information block consists of two bits,the first equal to the corresponding bit of the information block and the second bit being a 0.

Given N , R, and g = (g0g1 . . . gm) with m < N the convolutional code C encodes the infor-mation vector In = {In0, In1, . . . , InN−1} as the N -vector Out = {Out0, Out1, . . . OutN−1}with

Outj =m∑

i=0

giInj−i, 0 ≤ j ≤ N − 1.

The sum is modulo 2 and all indices are modulo N . We recognize that an N -vector (codeword)results from the cyclic convolution of an information vector with the generating vector thus,the name convolutional code. The main difference between a convolutional encoder and ablock code encoder is that the j-th code block Outj depends not only on the current Inj butalso on the earlier m information blocks, Inj−1, Inj−2, . . . , Inj−m.

Example. Encode the frame Fr = (11101010) from the input stream when N = 16, R = 1/2,and g = (11.10.11). To facilitate reading, in our examples the blocks of a codeword areseparated by periods.

The process of codeword construction is illustrated in Figure 104. Each bit of the framegenerates two bits of the information vector; Table 18 gives the expression of the individualbits of the codeword as the convolution of the generating vector and the information vector.Note that all indices are modulo N = 16 thus, In−1 = In15, In−2 = In14, . . . , In−5 = In11.

Given N , the length of a block, and R, the rate, the convolutional encoding algorithmdiscussed in this section generates a linear [N, K] code iff K = R×N . In this case there is aK × N generating matrix G of the code [228]. Then:

Out = Fr · GThe K rows of the generator matrix are: the first row consists of the m + 1 bits of thegenerating vector followed by N − (m + 1) zeros; rows 2 to K are cyclic shifts of the previousrow 1/R positions to the right.

439

Page 86: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Generating

vector:

Frame: Fr

Information vector: In

Codeword: Out

11.10.11

11101010

10.10.10.00.10.00.10.00

00.01.10.01.00.10.00.10

Figure 104: The construction of the codeword when R = 1/2, the frame size is N/2 = 8, thegenerating polynomial of degree m = 5 is g = (11.10.11).

Table 18: The expression of the individual bits of the codeword when Fr = (11101010),N = 16, R = 1/2, and g = (11.10.11).

Out0 = g0 · In0 +g1 · In15 +g2 · In14 +g3 · In13 +g4 · In12 +g5 · In11 = 0Out1 = g0 · In1 +g1 · In0 +g2 · In15 +g3 · In14 +g4 · In13 +g5 · In12 = 0Out2 = g0 · In2 +g1 · In1 +g2 · In0 +g3 · In15 +g4 · In14 +g5 · In13 = 0Out3 = g0 · In3 +g1 · In2 +g2 · In1 +g3 · In0 +g4 · In15 +g5 · In14 = 1Out4 = g0 · In4 +g1 · In3 +g2 · In2 +g3 · In1 +g4 · In0 +g5 · In15 = 1Out5 = g0 · In5 +g1 · In4 +g2 · In3 +g3 · In2 +g4 · In1 +g5 · In0 = 0Out6 = g0 · In6 +g1 · In5 +g2 · In4 +g3 · In3 +g4 · In2 +g5 · In1 = 0Out7 = g0 · In7 +g1 · In6 +g2 · In5 +g3 · In4 +g4 · In3 +g5 · In2 = 1Out8 = g0 · In8 +g1 · In7 +g2 · In6 +g3 · In5 +g4 · In4 +g5 · In3 = 0Out9 = g0 · In9 +g1 · In8 +g2 · In7 +g3 · In6 +g4 · In5 +g5 · In4 = 0Out10 = g0 · In10 +g1 · In9 +g2 · In8 +g3 · In7 +g4 · In6 +g5 · In5 = 1Out11 = g0 · In11 +g1 · In10 +g2 · In9 +g3 · In8 +g4 · In7 +g5 · In6 = 0Out12 = g0 · In12 +g1 · In11 +g2 · In10 +g3 · In9 +g4 · In8 +g5 · In7 = 0Out13 = g0 · In13 +g1 · In12 +g2 · In11 +g3 · In10 +g4 · In9 +g5 · In8 = 0Out14 = g0 · In14 +g1 · In13 +g2 · In12 +g3 · In11 +g4 · In10 +g5 · In9 = 1Out15 = g0 · In15 +g1 · In14 +g2 · In13 +g3 · In12 +g4 · In11 +g5 · In10 = 0

Example. The generator matrix of the convolutional code with N = 16, R = 1/2, andg = (11.10.11) is a 8 × 16 matrix. The first m + 1 = 6 elements of the first row are theelements of g, (111011), followed by N − (m + 1) = 16 − 6 = 10 zeros; the next K − 1 = 7rows are cyclic shifts to the right of the previous row with two positions as 1/R = 2.

440

Page 87: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

G =

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 00 0 1 1 1 0 1 1 0 0 0 0 0 0 0 00 0 0 0 1 1 1 0 1 1 0 0 0 0 0 00 0 0 0 0 0 1 1 1 0 1 1 0 0 0 00 0 0 0 0 0 0 0 1 1 1 0 1 1 0 00 0 0 0 0 0 0 0 0 0 1 1 1 0 1 11 1 0 0 0 0 0 0 0 0 0 0 1 1 1 01 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠

When Fr = (11101010) it is easy to check that Out = (00.01.10.01.00.10.00.10), a resultconsistent with the previous example. Indeed, (11101010) · G = (00.01.10.01.00.10.00.10).

Proposition. Every symbol encoded by a convolutional code depends on at most M + 1 pre-vious information symbols, where the integer M = �R× (m + 1)� − 1 is called the memory ofthe convolutional code.

Proof: Recall that Outj =∑m

i=0 giInj−i thus, each symbol of the resulting N -vector is theresult of the cyclic convolution of (m+1) symbols of the information vector. Let us concentrateon the first of these (m+1) symbols of the information vector, the symbol Inj inherited fromthe i-th symbol of the frame, Fri, with j = i/R. Similarly, the information symbol Injout

is inherited from the symbol FrM+i+1 with jout = (M + i + 1)/R. It is easy to see thatFrM+i+1 is outside of the range of the summation required by convolution. To show this werecall that when a = �b × r� and r < 1 then a/r ≥ b; in our case M + 1 = �R × (m + 1)�with R < 1 thus, M + 1/R ≥ M + 1. Recall that j = i/R so we conclude that

M + i + 1

R ≥ i

R + M + 1

R ≥ j + M + 1.

It follows that Outj may only depend of at most (M + 1) symbols of the original frame.�

The cyclic convolution rule formulated at the beginning of this section when used withrelatively short frames justifies the name tail-biting code given to this class of codes. Theencoding rule we discussed so far is too restrictive. Instead of a generating vector we canconsider a periodic function g alternating between k generating vectors or polynomials; inthis case the generating matrix consists of k different types of rows.

An (n, k,M) convolutional code is a sequence of block codes of length N = q ×n encoded using the cyclic convolution rule; q is an integer greater than 1. Aterminated (n, k, M) convolutional code is an (n, k, M) convolutional code with the last Minput bits set to zero.

Example. An encoder for a convolutional code with multiple generator polynomials. Theencoder consists of m registers each holding one input bit, and n modulo 2 adders. Eachadder has a generator polynomial.

The encoded sequence of say q input bits is of length 4q and represents a path in the statetransition diagram. The diagram in Figure 106 shows that not all transitions are possible andgives us a hint on the decoding procedure for convolutional codes.

441

Page 88: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

1 0 -1 -2

1

(1101)

2

(1011)

3

(0011)

4

(1001)

Figure 105: A convolutional non-recursive, non-systematic encoder with k = 1 and n = 4and generator polynomials g1(x) = x3 + x2 + 1, g2(x) = x3 + x + 1, g3(x) = x + 1, andg4(x) = x3 + 1.

A sequence of symbols is valid only if it fits the state transition diagram of the encoder.If a sequence does not fit the state transition diagram of the encoder we have to identify thenearest sequence that fits the state transition diagram.

The encoder has 2n states and operates as follows:

• initially the m input registers are all set to 0;

• an input bit is fed to the m-bit register;

• the encoder outputs n bits, each computed by one of the n adders;

• the next input bit is fed and the register shifts one bit to the right;

• the next group of n bits is calculated.

We present an encoder with m = 1, n = 4, and k = 1 in Figure 105; the encoder is a finitestate machine and its transition diagram is shown in Figure 106.

The rate of the encoder is 1/4 and the encoder has 16 states. The four cells of theinput register are labelled from left to right as: In1, In0, In−1, and In−2. The four generatorpolynomials are:

g1(x) = x3 + x2 + 1, g2(x) = x3 + x + 1, g3(x) = x + 1, and g4(x) = x3 + 1.

Thus, the four output symbols are calculated as follows:

Out1 = In1 + In0 + In−2 mod 2 Out2 = In1 + In−1 + In−2 mod 2Out3 = In−1 + In−2 mod 2 Out4 = In1 + In−2 mod 2.

442

Page 89: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

000

111

110

101

100

011

010

001

Figure 106: A trellis diagram describes the state transition of the finite state machine for theconvolutional code encoder with k=1 and n=4 in Figure 105. For example, if the encoder isin state 010 (In0 = 0, In−1 = 1, and In−2 = 0) and the input bit (In1) is 0, then the nextstate is 001; if the input bit is 1, then the next state is 101. The diagram allows us to followthe path of eight input bits.

Table 19 summarizes the state transitions. The state is given by the contents of memory cellsIn0, In−1, In−2; the input is In1. If at time t the system is in state 010 (In0 = 0, In−1 = 1and In−2 = 0) and:

(a1) the input is In1 = 0, then the next state of the encoder is 001 and the four output bitsproduced by the encoder are 0110 as

Out1 = 0 + 0 + 0 = 0, Out2 = 0 + 1 + 0 = 1, Out3 = 1 + 0 = 1, Out4 = 0 + 0 = 0.

(b1) the input is In1 = 1, then the next state of the encoder is 101 and the four output bitsproduced by the encoder are 1011 as

Out1 = 1 + 0 + 0 = 1, Out2 = 1 + 1 + 0 = 0, Out3 = 1 + 0 = 1, Out4 = 1 + 0 = 1.

If at time t + 1 the system is in state 001 and

(a2) the input is In1 = 0, then the next state of the encoder is 000 and the four output bitsproduced by the encoder are 1111 as

Out1 = 0 + 0 + 1 = 1, Out2 = 0 + 0 + 1 = 1, Out3 = 0 + 1 = 1, Out4 = 0 + 1 = 1.

(b2) the input is In1 = 1, then the next state of the encoder is 100 and the four output bitsproduced by the encoder are 0010 as

443

Page 90: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Table 19: The state transitions for the convolutional encoder.

Time Current state Input Input register Output register Next statet 010 0 0010 0110 001

1 1010 1011 101t+1 001 0 0001 1111 000

1 1001 0010 100101 0 0101 0111 010

1 1101 1010 110

Out1 = 1 + 0 + 1 = 0, Out2 = 1 + 0 + 1 = 0, Out3 = 0 + 1 = 1, Out4 = 1 + 1 = 0.

010

001

101

010

110

100

000

0

0

0

1

1

1

Figure 107: The state transition of the encoder in Figure 105 for a sequence of two input bits.The state is given by the contents of memory cells In0, In−1, In−2; the input is In1. Thevertical box next to a transition shows the output, Out1, Out2, Out3, Out4, generated by theencoder and the horizontal box shows the contents of the four cells In1, In0, In−1, In−2

Figure 107 shows the state transitions for a sequence of two input bits, the contents of thememory register, and the four output bits.

Convolutional codes are widely used in applications such as digital video, radio, mobilecommunication, and satellite communication. We next focus our attention to other familiesof codes able to encode large data frames and capable to approach the Shannon limit.

444

Page 91: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

4.17 Product Codes

We first discuss encoding of large blocks of data and methods to construct composite codesfrom other codes. Then we survey some of the properties of product codes.

The question we address is how to encode large frames and decode them efficiently. Shan-non channel encoding theorem tells us that we are able to transmit reliably at rates closeto channel capacity, even though the number of errors per frame, e = p × N , increases withthe frame size, N , and with the probability of a symbol error, p. We also know that decod-ing complexity increase exponentially with N ; indeed, after receiving a frame encoded as acodeword we check if errors have occurred and, if so, compute the likelihood of every possiblecodeword to correct the error(s).

A trivial solution is to split the frame into N/n blocks of size n and use short codes withsmall n. We distinguish decoder errors, the cases when an n-tuple in error is decoded as a validcodeword, from decoding failures, when the decoder fails to correct the errors; if we denoteby perr probability of a decoding error and by pfail the probability of a decoding failure, thenPcF , the probability of a correct frame, and PeF , the probability of a frame error, are:

PcF = (1 − perr)Nn and PeF = 1 − (1 − pfail)

Nn .

The undetected errors have a small probability only if the error rate and the block size nare small; but a small n leads to a large PeF . Thus, we have to search for other means toencode large frames. A possible solution is to construct codes from other codes using one ofthe following strategies:

• Parity check. Add a parity bit to an [n, k, 2t − 1] code to obtain an [n + 1, k, 2t] code.

• Punctuating. Delete a coordinate of an [n, k, d] code to obtain an [n − 1, k, d − 1] code.

• Restricting. Take the subcode corresponding to the first coordinate being “the mostcommon element” and then delete the first coordinate: [n, k, d] �→ [n − 1, k − 1, d].

• Direct product. Use two or more codes to encode a block of data.

The methods to construct codes from other codes weaken codes asymptotically, with oneexception, the product codes discussed next.A product code is a vector space of n1 × n2 arrays. Each row of this array is a codeword of alinear [n1, k1, d1] code and each column is a codeword of the linear code [n2, k2, d2].

Proposition. C, the product code of C1 and C2, [n1, k1, d1] and [n2, k2, d2] linear codes, re-spectively, is an [n1 × n2, k1 × k2, d1 × d2] linear code.

Proof: We arrange the information symbols into an k1×k2 array and compute the parity checksymbols on rows and columns independently, using C1 for the rows and C2 for the columns;the parity checks on the parity checks computed either using C1 or C2 are the same due tolinearity, Figure 108. It follows that C is an [n1 × n2, k1 × k2] linear code.

The minimum distance codewords of C are those resulting from repeating a codeword ofminimum weight equal to d1 of C1 in at most d2 rows; the weight of these codewords is d1×d2

and this is the distance of C.�

445

Page 92: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Parity of

parityVertical parity

Horiz

onta

l parity

Original information

k1

k2

n2

n1

Figure 108: Schematic representation of product codes; horizontal, vertical and parity ofparity symbols are added to the original information.

Proposition. If a1 and a2 are the numbers of minimum weight codewords in C1 and C2,respectively, then the product code C has a1 × a2 codewords of minimum weight d1 × d2.

Proof: The minimum weight of any row is d1 and there can be at most a1 such rows; theminimum weight of any column is d2 and there can be at most a2 such columns. A minimumweight codeword of C is one corresponding to a choice of a row and of a column of minimumweight; there are only a1 × a2 such choices of weight d1 × d2.

Serial decoding of a product code C: each column is decoded separately using the decodingalgorithms for C2; then errors are corrected and the decoding algorithm for C1 is applied toeach row.

Proposition. When both C1 and C2 are [n, k, d] codes then the resulting product code C is an[n2, k2, d2] linear code and can correct any error pattern of at most e = (d2−1)/4 errors whenserial decoding is used.

Proof: e = (d2 − 1)/4 = [(d − 1)/2][(d + 1)/2]; when d = 2t + 1 then e = t(t + 1)and several columns affected by t + 1 errors will be decoded incorrectly; indeed, C2, thecode used for encoding the columns, is an [n, k, d] code and can correct at most t errors.But, as the total number of errors is limited by e = t(t + 1), we may have at most tsuch columns affected by errors thus, the maximum number of errors left after the firstdecoding step in any row is t; these errors can be corrected by C1, the [n, k, d] row code.

�In the next section we introduce concatenated codes; we show that as the length of a

codeword increases concatenated codes can be decoded more efficiently than other classes ofcomposite codes.

4.18 Serially Concatenated Codes and Decoding Complexity

The concatenated codes were discovered by David Forney [154] at a time when the decodingproblem was considered solved through (i) long, randomly chosen convolutional codes with

446

Page 93: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

a dynamically, branching tree structure and (ii) sequential decoding, namely exclusive treesearch techniques characterized by a probabilistic distribution of the decoding computations.The balance between code performance and decoding complexity was tilted in favor of codeperformance, measured by the number of errors corrected by the code.

Concatenated codes resulted from Forney’s search for a class of codes having two simul-taneous properties:

1. The probability of error decreases exponentially at all rates below channel capacity.

2. The decoding complexity increases only algebraically with the number of errors correctedby the code and, implicitly, with the length of a codeword.

Outer

encoder

Inner

EncoderChannel

Inner

Decoder

Outer

Decoder

Figure 109: The original concatenated code was based on a relatively short random “inner”code and used maximum likelihood (ML) decoding to achieve a modest error probability,Prob(error) ≈ 10−2, at a near capacity rate. A long, high-rate Reed-Solomon (R-S) code,and a generalized-minimum-distance (GMD) decoding scheme was used for the “outer” level.

Figure 109 illustrates the coding system proposed by Forney in his dissertation; a messageis first encoded using an “outer” Reed-Solomon [N, K, D] code over a large alphabet withN symbols and then with an “inner” [n, k, d] binary code with k = log2 N . The result isan [N × n,K × k, D × d] binary code, Figure 110. The decoding complexity of the code isdominated by the complexity of the algebraic decoder for the Reed-Solomon code while theprobability of error decreases exponentially with n, the code length, at all rates below channelcapacity.

Information transmission at Shannon rate is achievable using an efficient encoding and de-coding scheme and Forney’s concatenation method provides such an efficient decoding scheme.Consider a binary symmetric channel (BSCp) with p the probability of random errors. Thefollowing formulation of Shannon channel coding theorem [413] states that reliable informationtransmission through a binary symmetric channel is feasible at a rate

R = 1 − H(p) − ε with ε > 0.

Theorem. There exist a code C with an encoding algorithm E : {0, 1}k �→ {0, 1}n and adecoding algorithm D : {0, 1}n �→ {0, 1}k with k = (1 − H(p) − ε) × n such that:

Prob(decoding error for BSCp) < e−n.

A corollary of this theorem discussed next shows that given a binary symmetric channelif we select an “outer” Reed-Solomon (N,K, εN) code with K = (1 − ε)N and an “inner”binary (n, k, d) code with k = (1 − H(p) − ε) × n, then we should be able to correct randomerrors occurring with probability p and in this process limit the decoding failures by e−n whenk ≥ log2N .

447

Page 94: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Concatenated code

Outer code Inner code

Alphabet A with K symbols

Length: N=2nr

Rate: RK = N x R

Alphabet B with k symbols

Length: n

Rate: r

k =n x r

Can be decoded in polynomial time in its length, 2nr

x n.

The error probability is exponentially decreasing with the length

Length: n x N = n x 2nr

Rate: r x R

Can be decoded in

polynomial time in N

Decoded using maximum

likelihood decoding with

exponential complexity.

Figure 110: The “inner” and the “outer” codes of a concatenated code.

Corollary. There exist a code C with an encoding algorithm E ′ : {0, 1}K �→ {0, 1}N and adecoding algorithm D′ : {0, 1}N �→ {0, 1}K such that:

Prob(decoding error) <1

N

where E ′ and D′ are polynomial time algorithms.

Proof: We divide a message m ∈ {0, 1}K into K/k blocks of size k = log K. Then weuse Shannon’s encoding function E to encode each of these blocks into words of size n andconcatenate the results, such that N = (K/k)n. The rate is preserved:

K

N=

k

n.

It follows that

Prob(decoding failure of [E ′,D′]K) ≤ K

kProb(decoding failure of [E ,D]k).

This probability can be made smaller or equal to 1KN

by selecting k = constant × log Kwith a large enough constant. It is assumed that the encoding and decoding algorithms areexponential in time, by picking up a linear encoding function.

�The resulting code has the following properties [413]:

(i) A slightly lower rate R = (1 − H(p) − ε)(1 − ε) ≥ 1 − H(p) − 2ε.

(ii) The running time is exponential in k and polynomial in N, exp(k)poly(N).

448

Page 95: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

(iii) The probability of decoding failure is bounded by e−nN

Prob(decoding failure) ≤ Prob( εN2

inner blocks leading to decoding failure)

≤ (N

εN/2

)e−nεN/2

≤ e−nN .

The decoding effort can be broken into the time to decode the R-S code, O(n2) and the timeto decode the “inner code” n times (once for each “inner” codeword for n log n bits). Thusthe decoding complexity for the concatenated code is O(n2 log n).

Concatenated codes with an interleaver placed between the outer and the inner encoderto spread out bursts of errors and with a deinterleaver between the inner and outer decoderhave been used for space exploration since late 1970s. The serially concatenated codes havea very large minimum distance; they could be improved if the inner decoder could provideinformation to the outer decoder, but such a scheme would introduce an asymmetry betweenthe two. The turbo codes discussed next address these issues.

4.19 Parallel Concatenated Codes - Turbo Codes

Turbo codes were introduced by Claude Berrou and his coworkers in 1993 as a refinementof concatenated codes [52]. The encoding structure of concatenated codes is complementedby an iterative algorithm for decoding; by analogy with a turbo engine, the decoder uses theprocessed output values as a posteriori input for the next iteration, thus the name “turbo,”as shown in Figure 111. Turbo codes achieve small bit error rates at information rates muchcloser to the capacity of the communication channel than previously possible. If the rates ofthe inner and outer codes are r and R, respectively, then the rate of a serially concatenatedcode is Rs = r × R while the rate of a parallel concatenated code is

Rp =r × R

1 − (1 − r)(1 − R).

We start the discussion of turbo codes with a review of likelihood functions. Consider arandom variable X with the probability density function pX(x); the result of an observationof the random variable X should be classified in one of M classes. Bayes’s theorem expressesthe “a posteriori probability (APP)” of a decision d to classify the observation of X in a classi, d = i, 1 ≤ i ≤ M conditioned by an observation x as

Prob(d = i|x) =pX [x|(d = i)]Prob(d = i)

pX(x)1 ≤ i ≤ M,

where

pX(x) =M∑i=1

pX [x|(d = i)]Prob(d = i).

449

Page 96: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

Outer encoder Interleaver Inner encoder

Deinterleaver Inner decoderOuter decoder

Channel

Interleaver

Figure 111: The feedback from the outer to the inner decoder of a turbo code.

Two decisions rules are widely used, ML, maximum likelihood, Figure 112, and MAP, maxi-mum a posteriori.

When d is a binary decision, d = {0, 1},

pX(x) = pX [x | (d = 0)Prob(d = 0) + pX [x | (d = 1)Prob(d = 1)

and the maximum a posteriori decision is a minimum probability error rule formulated as:

Prob(d = 0|x) > Prob(d = 1|x) =⇒ d = 0 and Prob(d = 1|x) > Prob(d = 0|x) =⇒ d = 1.

These conditions can be expressed as:

pX [x|(d = 1)]

pX [x|(d = 0)]<

Prob(d = 0|x)

Prob(d = 1|x)=⇒ d = 0 and

pX [x|(d = 1)]

pX [x|(d = 0)]>

Prob(d = 0|x)

Prob(d = 1|x)=⇒ d = 1.

Next we introduce the log-likelihood ratios, LLR, real numbers used by a decoder; theabsolute value of LLR quantifies the reliability of the decision and represents the “soft”component of the decision, while the sign selects either the value 0 or the value 1 and representsthe “hard” component of the decision. The LLR, the logarithm of ratio of probabilitiesrequired by the MAP rule is

L(d|x) = log

[Prob(d = 1|x)

Prob(d = 0|x)

]= log

[pX [x|(d = 1)]Prob(d = 1)

pX [x|(d = 0)]Prob(d = 0)

],

or

L(d|x) = log

[pX [x|(d = 1)]

pX [x|(d = 1)]

]+ log

[Prob(d = 1)

Prob(d = 0)

].

We rewrite the previous equality as

L′X(d) = Lchannel(x) + LX(d).

450

Page 97: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

x=0

s

pX(x| d=0) pX(x| d=1)

x=1x=xk

v1

v0

Figure 112: The probability distribution functions pX [x|(d = 0)] and pX [x|(d = 1)] for acontinuous random variable X and a binary decision. The decision line is s. In case ofmaximum likelihood we decide that xk, the value of the random variable X observed at timek, is 0 (d = 0) because v0 > v1, xk falls on the left of the decision line.

In this expression Lchannel(x) is the result of a measurement performed on the output of thechannel and LX(d) is based on the a priori knowledge of the input data.

Feedback

Measurement of the output of

the communication channel

A priori knowledge of the

distribution of symbols (0, 1)

in a message

Log-likelihood ratio

used to decide if the

input was a 1 or a 0;

- the absolute value

indicates the reliability

of the decision;

- the sign selects the

input:

(+ ) 1

( - ) 0

Figure 113: The decoder of a turbo code uses feedback to complement a priori informationand the result of output channel measurement.

A turbo-decoder is at the heart of the scheme discussed in this section. As we can see inFigure 113 the decoder needs as input:

• A channel measurement,

• The a priory knowledge about the data transmitted, and

• Feedback from the previous iteration, called “extrinsic” information.

451

Page 98: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

The soft output of the decoder, denoted as L(d) will consist of two components, L′X(d),

the LLR representing the soft decision of the decoder, and Lextrn(d), the feedback reflectingthe knowledge from the previous iteration of the decoding process:

L(d) = L′X(d) + Lextrn(d) = Lchannel(x) + LX(d) + Lextrn(d).

We illustrate these ideas with an example for a two-dimensional code from [394]. An [n1, k1]code is used for horizontal encoding and an [n2, k2] code is used for vertical encoding seeFigure 114.

At each step we compute also the horizontal and vertical extrinsic LLRs LextrnHoriz andLextrnV ert, respectively. The decoding algorithm consists of the following steps:

1. If a priori probability is available set L(d) to that value; if not set L(d) = 0.

2. Decode horizontally and set LextrnHoriz(d) = L(d) − Lchan(x) − L(d).

3. Set L(d) = LextrnHoriz(d).

4. Decode vertically and set LextrnV ert(d) = L(d) − Lchan(x) + L(d))

5. Set L(d) = LextrnV ert(d)

6. Repeat steps (2) to (5) until a reliable decision can be made then provide the soft outputL(d) = Lchan(x) + LextrnHoriz(d) + LextrnV ert(d).

Data

1 2

Horizontal Parity-

check

1 2 2

Vertical Parity-check

1 1 2

Horizontal

Extrinsic

extrnHoriz

Vertical Extrinsic

extrnVert

Figure 114: Schematic representation of a two-dimensional turbo code constructed from ahorizontal [n1, k1] code and a vertical [n2, k2] code; the data is represented by a k1 × k2

rectangle, the horizontal parity check bits by a k1 × (n2 − k2) rectangle, and the verticalparity check bits by a k2 × (n1 − k1) rectangle. The horizontal extrinsic and the verticalextrinsic LLR bits computed at each step are shown.

To verify that indeed the reliability of a decision made using two independent randomvariables is the minimum of the absolute value of individual LLRs and the “hard” component

452

Page 99: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

of the decision is given by the signs of the two LLR we consider symbol-by-symbol MAPestimation of a set of symbols ck of a transmitted vector of symbols c encoded, transmittedthrough a communication channel subject to a Gauss-Markov process37 according to thedistribution pX|Y (r|c) , and finally received as a vector v = (v1, v2, . . . , vk, . . .).

To distinguish between two distinct random variables, instead of L(d) we use the notationLX(d) and assume that:

PX(x = 0) =e−LX(d)

1 + e−LX(d)and PX(x = 1) =

eLX(d)

1 + eLX(d).

If X1 and X2 are independent binary random variables then

LX1,X2(d1 ⊕ d2) = log1 + eLX1

(d1)eLX2(d2)

eLX1(d1) + eLX2

(d2)

with ⊕ the addition modulo two. This expression becomes

LX1,X2(d1 ⊕ d2) ≈ sign[LX1(d1)] · sign[LX2(d2)] · min[| LX1(d1) |, | LX2(d2) | ].

This brief introduction of turbo codes concludes our presentation of classical error cor-recting codes.

4.20 History Notes

While Claude Shannon was developing the information theory, Richard Hamming, a colleagueof Shannon’s at Bell Labs, understood that a more sophisticated method than the paritychecking used in relay-based computers was necessary. Hamming realized the need for errorcorrection and in the early 1950s he discovered the single error correcting binary Hammingcodes and the single error correcting, double error detecting extended binary Hamming codes.Hamming’s work marked the beginning of coding theory; he introduced fundamental conceptsof coding theory such as Hamming distance, Hamming weight, and Hamming bound.

In 1955, Peter Elias introduced probabilistic convolutional codes with a dynamic structurepictured at that time as a branching tree [141] and in 1961, John Wozencraft and BarneyReiffen proposed sequential decoding based on exhaustive tree search techniques for long convo-lutional codes [463]. The probability that the number of operations required for the sequentialdecoding exceeds a threshold D for an optimal choice of algorithm parameters is a Pareto(algebraic) distribution:

Prob(Number of Operations > D) ≈ D−α(R)

with α(R) > 1 and R the transmission rate; this implies that the number of operations isbounded when R < R0 and R0 was regarded as the “practical capacity” of a memoryless

37A Gauss-Markov process is a stochastic process X(t) with three properties: (i) If h(t) is a non-zeroscalar function of t, then Z(t) = h(t)X(t) is also a Gauss-Markov process. (ii)If f(t) is a non-decreasingscalar function of t, then Z(t) = X(f(t)) is also a Gauss-Markov process. (iii) There exists a non-zero scalarfunction h(t) and a non-decreasing scalar function f(t) such that X(t) = h(t)W (f(t)), where W (t) is thestandard Wiener process.

453

Page 100: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

channel [155]. In 1963, Robert Fano developed an efficient algorithm for sequential decoding[145].

Work on algebraic codes started in late 1950s with codes over Galois fields [68] and theReed-Solomon codes [354]. This line of research continued throughout the decade of 1960swith the development of cyclic codes including the BCH codes, and the publication of theseminal book of Elwyn Berlekamp [50].

In the early 1960s the center of gravity in the field of error correcting codes moved toMIT. The information theory group at MIT was focused on two problems: (a) how to reducethe probability of error of block codes, and (b) how to approach channel capacity. In 1966,Robert Gallagher [167] showed that when a randomly chosen code and maximum-likelihooddecoding are used, then the probability of error decreases exponentially with the block length:

Prob(E) ≈ e−NE(R)

with R the transmission rate, C the cannel capacity, and E(R) > 0 the function relating theerror and the transmission rate. The probability of error decreases algebraically with D, thedecoding complexity:

Prob(E) ≈ D−E(R)/R

In mid 1960s David Forney introduced concatenated codes [154]. At the time when concate-nated coded were introduced the balance between code performance and decoding complexitywas tilted in favor of code performance, measured by the number of errors corrected by thecode while decoding complexity was largely ignored. The original concatenated code wasbased on a relatively short random “inner code.” A long, high-rate Reed-Solomon code, anda generalized-minimum-distance (GMD) decoding scheme was used for the “outer” level. Thedecoding complexity of this code is dominated by the complexity of the algebraic decoderfor the Reed-Solomon code while the probability of error decreases exponentially with n, thecode length, at all rates below channel capacity.

In his 1995 Shannon Lecture Forney writes: “I arrived at MIT in early sixties, duringwhat can be seen in retrospect as the first golden age of information theory research....Itwas well understood by this time that the key obstacle to practically approaching channelcapacity was not the construction of specific good long codes, although the difficulties infinding asymptotically good codes were already apparent (as expressed by the contemporaryfolk theorem All codes are good except those we know of [463]). Rather it was the problem ofdecoding complexity.”

A major contribution to the field of error correcting codes is the introduction in 1993 ofturbo codes. Turbo codes discovered by Claude Berrou and his coworkers [52] are a refinementof concatenated codes; they use the encoding structure of concatenated codes and an iterativealgorithm for decoding.

4.21 Summary and Further Readings

Coding Theory is concerned with the process of increasing the redundancy of a message bypacking a number k of information symbols from an alphabet A into longer sequences ofn > k symbols. A block code of length n over the alphabet A is a set of M , n-tuples where

454

Page 101: 4 Classical Error Correcting Codes - University of Central …dcm/Teaching/COP5611-Spring2014/ECC.pdfThe International Standard Book Number (ISBN) coding scheme. The ISBN code is designed

each n-tuple takes its components from A and is called a codeword. We call the block codean [n, M ] code over A.

Linear algebra allows an alternative description of a large class of [n, k] block codes. Alinear code C is a k-dimensional subspace, Vk(F ) of an n-dimensional vector space, Vn, overthe field F . An [n, k] linear code, C, is characterized by a k × n matrix whose rows forms avector space basis for C. This matrix is called the generator matrix of C. Once we select a setof k linearly independent vectors in Vn we can construct a generator matrix G for C. Thenthe code C can be constructed by multiplying the generator matrix G with message vectors(vectors with k components, or k-tuples). Given two [n, k] linear codes over the filed F , C1

and C2 with generator matrices G1 and G2 respectively, we say that they are equivalent ifG2 = G1P with P a permutation matrix.

Given an [n, k]-linear code C over the field F the orthogonal complement, or the dual ofcode C, denoted as C⊥ consists of vectors (or n-tuples) orthogonal to every vector in C. Givenan [n, k]-linear code C over the field F , its dual, C⊥, is an [n, n − k] linear code over thesame field, F . If G = [IkA] is a generator matrix for C, or for a code equivalent to C, thenH = [−AT In−k] is a generator matrix for its dual code, C⊥.

Given an [n, k]-linear code C over the field F let H be the generator matrix of the dualcode C⊥. Then ∀c ∈ Vn HcT = 0 ⇔ c ∈ C. The matrix H is called the parity-check matrixof C. The error syndrome s of an n-tuple v is defined as sT = HvT . There is a one-to-onecorrespondence between the error syndrome and the bits in error, thus, the syndrome is usedto determine if an error has occurred and to identify the bit in error.

Ideally, we want codes with a large number of codewords (thus, a large value of k), which can correct as many errors e as possible, and with the shortest possible codeword length n. These contradictory requirements cannot be satisfied concurrently. Several bounds for linear codes exist. Let C be an [n, k] code over GF(q) capable of correcting e errors. Then C satisfies the Hamming bound

$$q^n \ge q^k \sum_{i=0}^{e} \binom{n}{i}(q-1)^i.$$

Every [n, k, d] linear code C over the field GF(q) satisfies the Singleton bound k + d ≤ n + 1. If C is an [n, k, d] linear code over GF(q) and if M(n, d) = |C| is the largest possible number of codewords for given values of n and d, then the code satisfies the Gilbert-Varshamov bound

$$M(n, d) \ge \frac{q^n}{\sum_{i=0}^{d-1} \binom{n}{i}(q-1)^i}.$$

If C is an [n, k, d] linear code and if 2d > n, then the code satisfies the Plotkin bound

$$M \le \frac{2d}{2d - n}$$

with M = |C|.

A q-ary Hamming code of order r over the field GF(q) is an [n, k, 3] linear code with n = (q^r − 1)/(q − 1) and k = n − r. The parity-check matrix H_r of the code is an r × n matrix that contains no all-zero column and in which no two columns are scalar multiples of each other. The Hamming code C_r of order r over GF(q) is a perfect code.
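These bounds are easy to check numerically for small parameters. The sketch below (the helper names are mine, not from the text) evaluates the Hamming, Singleton, and Gilbert-Varshamov expressions; for the binary [7, 4, 3] Hamming code with e = 1 the Hamming bound holds with equality, as expected for a perfect code.

```python
from math import comb

def hamming_bound_ok(n, k, e, q=2):
    """Check q^n >= q^k * sum_{i=0..e} C(n, i) (q-1)^i."""
    return q**n >= q**k * sum(comb(n, i) * (q - 1)**i for i in range(e + 1))

def singleton_bound_ok(n, k, d):
    """Check k + d <= n + 1."""
    return k + d <= n + 1

def gilbert_varshamov_lower(n, d, q=2):
    """Lower bound on M(n, d), the largest code size for given n and d."""
    return q**n // sum(comb(n, i) * (q - 1)**i for i in range(d))

# Binary [7, 4, 3] Hamming code, e = 1:
print(hamming_bound_ok(7, 4, 1),        # True, with equality: 2^7 = 2^4 * (1 + 7)
      singleton_bound_ok(7, 4, 3),      # True: 4 + 3 <= 8
      gilbert_varshamov_lower(7, 3))    # 4: a code with d = 3 and at least 4 codewords exists
```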


The first-order Reed-Muller code R(1, r) is a binary code with the generator matrix

$$G = \begin{pmatrix} \mathbf{1} & 1 \\ H_r & \mathbf{0} \end{pmatrix} = \begin{pmatrix} \mathbf{1} \\ B_r \end{pmatrix},$$

where the first row is the all-ones vector, H_r is the parity-check matrix of the Hamming code of order r, and B_r = (H_r | 0).

A cyclic subspace C of an n-dimensional vector space Vn over the field F is a set of vectors c ∈ C with the property

$$c = (c_0 c_1 c_2 \ldots c_{n-2} c_{n-1}) \in C \implies c' = (c_{n-1} c_0 c_1 \ldots c_{n-2}) \in C.$$

A linear code C is a cyclic code if C is a cyclic subspace. There is a one-to-one correspondence between the cyclic subspaces of an n-dimensional vector space Vn over the field F and the monic polynomials g(x) ∈ F[x] which divide f(x) = x^n − 1, with F[x] the ring of polynomials over F. If

$$f(x) = \prod_{i=1}^{q} g_i^{a_i}(x)$$

with a_i positive integers and g_i(x), 1 ≤ i ≤ q, distinct irreducible monic polynomials, then Vn contains

$$Q = \prod_{i=1}^{q} (a_i + 1)$$

cyclic subspaces. If g(x) is a monic polynomial of degree n − k and it divides f(x) = x^n − 1, then g(x) is the generator polynomial of a cyclic subspace of Vn. The dimension of this cyclic subspace is equal to k.
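For the binary case this count is easy to verify by brute force. In the sketch below (pure Python; polynomials are encoded as integers, with bit i holding the coefficient of x^i, and the helper names are mine) all monic divisors of x^n − 1 over GF(2) are enumerated; for n = 7, where x^7 − 1 = (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1), it finds the expected Q = 2^3 = 8 cyclic subspaces.

```python
def gf2_divmod(a, b):
    """Divide polynomial a by polynomial b over GF(2); return (quotient, remainder)."""
    q = 0
    db = b.bit_length() - 1                 # degree of b
    while a != 0 and a.bit_length() - 1 >= db:
        shift = (a.bit_length() - 1) - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def divisors_of_xn_minus_1(n):
    """All monic divisors of x^n - 1 (equal to x^n + 1 over GF(2))."""
    f = (1 << n) | 1                        # x^n + 1
    divs = []
    for d in range(n + 1):
        for low in range(1 << d):           # enumerate monic polynomials of degree d
            g = (1 << d) | low
            if gf2_divmod(f, g)[1] == 0:
                divs.append(g)
    return divs

print(len(divisors_of_xn_minus_1(7)))       # expected: 8 cyclic subspaces for n = 7
```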

Let g(x) be the generator polynomial of the cyclic [n, k] code C over GF(q) and let β ∈ GF(q) be a primitive element of order n. If g(x) has among its zeros β^α, β^{α+1}, . . . , β^{α+d−2}, then the minimum distance of the cyclic code is at least d.

Codes designed to handle bursts of errors are called burst error correcting (BEC) codes. A burst of length l is an n-tuple whose non-zero symbols are confined to a span of l consecutive symbols. A block code can correct all bursts of length ≤ l if and only if no two codewords differ by the sum of two bursts of length less than or equal to l. The burst error correcting ability l of an [n, k] linear block code satisfies the Rieger bound

$$l \le \frac{n - k}{2}.$$

Interleaving is a simple technique to provide burst error correction capabilities to a code originally designed for random error correction, or to enhance the capabilities of an existing burst error correcting code. The basic idea of interleaving is to alternate the transmission of the individual bits of a set of codewords.
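A minimal sketch of a block interleaver follows (the function names and example codewords are illustrative, not from the text): several codewords are written as rows and transmitted column by column, so a channel burst of length at most equal to the interleaving depth corrupts each codeword in at most one position, which a single-error-correcting code can then repair.

```python
def interleave(codewords):
    """codewords: list of equal-length lists; return the column-wise transmission stream."""
    n = len(codewords[0])
    depth = len(codewords)
    return [codewords[r][c] for c in range(n) for r in range(depth)]

def deinterleave(stream, depth, n):
    """Inverse of interleave for `depth` codewords of length n."""
    return [[stream[c * depth + r] for c in range(n)] for r in range(depth)]

# Three codewords of length 7; a burst hitting 3 consecutive transmitted symbols
# corrupts at most one symbol of each codeword after deinterleaving.
cws = [[1, 0, 1, 1, 0, 0, 1], [0, 1, 1, 0, 1, 0, 0], [1, 1, 0, 0, 0, 1, 1]]
stream = interleave(cws)
for i in range(6, 9):                       # burst of length 3 in the channel
    stream[i] ^= 1
received = deinterleave(stream, depth=3, n=7)
print([sum(a != b for a, b in zip(r, c)) for r, c in zip(received, cws)])  # [1, 1, 1]: one error each
```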

Reed-Solomon (R-S) codes are non-binary cyclic codes widely used for burst error correction. Non-binary codes allow a lower density of codewords (the ratio of the number of codewords to the number of n-tuples) and thus a larger distance between codewords, which in turn means an increased error correction capability.


Call Pk the set of polynomials of degree less than k over the finite field GF(q); Pk ⊂ GF(q)[x] is a vector space of dimension k. Consider n > k distinct elements x_1, x_2, . . . , x_n ∈ GF(q) and let f_i(x) be the polynomial generating the codeword c_i, with

$$c_i = (f_i(x_1), f_i(x_2), f_i(x_3), \ldots, f_i(x_{n-1}), f_i(x_n)).$$

The [n, k] Reed-Solomon (R-S) code consists of all codewords generated by polynomials f_i ∈ Pk. An [n, k] R-S code achieves the Singleton bound, the largest possible code distance (minimum distance between any pair of codewords) of any linear code, d = n − k + 1; this means that the code can correct at most t errors with

$$t = \frac{d - 1}{2} = \frac{n - k}{2}.$$
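As a toy illustration of this evaluation construction (the parameters are chosen here for illustration, not taken from the text), the sketch below builds a small [6, 3] Reed-Solomon code over GF(7) by evaluating every polynomial of degree less than 3 at six distinct field elements and verifies by brute force that the minimum distance is n − k + 1 = 4.

```python
from itertools import product

q, n, k = 7, 6, 3
points = list(range(1, n + 1))            # n distinct elements of GF(7)

def codeword(coeffs):
    """Evaluate f(x) = c0 + c1*x + c2*x^2 at the n points, modulo q."""
    return tuple(sum(c * pow(x, j, q) for j, c in enumerate(coeffs)) % q for x in points)

code = {codeword(f) for f in product(range(q), repeat=k)}
# The code is linear, so its minimum distance equals the minimum non-zero weight.
weights = [sum(s != 0 for s in c) for c in code if any(c)]
print(len(code), min(weights))            # expected: 343 codewords, minimum distance 4
```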

Concatenated codes have two simultaneous properties: (i) the probability of error decreases exponentially at all rates below channel capacity; and (ii) the decoding complexity increases only algebraically with the number of errors corrected by the code and, implicitly, with the length of a codeword.

Information transmission at the Shannon rate is achievable using an efficient encoding and decoding scheme. Forney's concatenation method provides such an efficient decoding scheme. In this method a message is first encoded using an "outer" R-S code [N, K, N − K] over a large alphabet with N symbols and then an "inner" [n, k, d]2 binary code with k = log2 N. The result is an [Nn, Kk, Dd]2 binary code. The decoding is block by block. It is shown that there exist an encoding algorithm E′ : {0, 1}^K → {0, 1}^N and a decoding algorithm D′ : {0, 1}^N → {0, 1}^K with similar parameters such that Prob(decoding error) < 1/N, where E′ and D′ are polynomial time algorithms.
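The parameter bookkeeping of this construction can be illustrated with a small sketch (the particular outer and inner codes below are hypothetical choices, not Forney's): an outer [N, K, D] code over GF(2^k) combined with an inner [n, k, d] binary code yields a binary code of length Nn, dimension Kk, and minimum distance at least Dd.

```python
def concatenated_parameters(N, K, D, n, k, d):
    """Length, dimension, and a minimum-distance lower bound of the concatenated code."""
    return N * n, K * k, D * d

# Example: an outer [255, 223, 33] Reed-Solomon code over GF(2^8)
# with an inner [12, 8, 3] binary code (a shortened Hamming code).
print(concatenated_parameters(255, 223, 33, 12, 8, 3))   # (3060, 1784, 99)
```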

Turbo codes are a refinement of concatenated codes and achieve small bit error rates at information rates much closer to the capacity of the communication channel than previously possible.

There is an extensive literature on error correcting codes. The papers by Elias [141], Reed and Solomon [354], Bose [68], Wozencraft and Reiffen [463], Fano [145], Gallagher [167], and Berrou, Glavieux, and Thitimajshima [52] best reflect the evolution of ideas in the field of error correcting codes; they are the original sources of information for the more recently discovered codes. The second edition of the original text by Peterson [322] provides a rigorous discussion of error correcting codes. The text by MacWilliams and Sloane [281] is an encyclopedic guide to the field of error correcting codes. We also recommend the book by Berlekamp [50] for its treatment of cyclic codes, BCH codes, as well as Reed-Solomon codes. The book by Forney [154] covers the fundamentals of concatenated codes. More recent books, [228] and [436], provide a rather accessible introduction to error correction. The comprehensive 2003 book by MacKay, "Information Theory, Inference, and Learning Algorithms," is also a very good reference for error correcting codes.

4.22 Exercises and Problems

Problem 1. Show that there exists a characteristic element in GF(q). Prove first that if a ∈ GF(q), a ≠ 0, then the order of a divides q − 1.


Problem 2. Show that a^q = a if and only if a ∈ GF(q).

Problem 3. Show that if β is a characteristic element of GF(q) then for all a, b ∈ GF(q) we have (a + b)^β = a^β + b^β.

Problem 4. Construct GF(3^2) using the irreducible polynomial f(x) = x^2 + x + 2 and display its addition and multiplication tables.

Problem 5. Construct GF(2^4) using the irreducible polynomial f(x) = x^4 + x^3 + x^2 + x + 1 and display its addition and multiplication tables.

Problem 6. Prove that, for a binary [n, k] linear code C, the number of n-tuples of weight i (i.e., at distance i from the all-zero codeword) that are at distance j from a codeword of weight w is given by

$$T(i, j, w) = \binom{w}{i - k}\binom{n - w}{k}$$

with k = (i + j − w)/2, when (i + j − w) is even and w − j ≤ i ≤ w + j; otherwise T(i, j, w) = 0.

Problem 7. Consider an [n, k, d] linear code C and a binary symmetric channel; let p be the probability of a symbol being in error. Prove that in the case of bounded distance decoding the probability of a decoding failure, P_fail, and the probability of a decoding error, P_err, are

$$P_{fail} = 1 - \sum_{i=0}^{e} \binom{n}{i} p^i (1 - p)^{n-i} = \sum_{i=e+1}^{n} \binom{n}{i} p^i (1 - p)^{n-i}$$

and

$$P_{err} = \sum_{w>0} \sum_{i=w-e}^{w+e} \sum_{j=0}^{e} A_w\, T(i, j, w)\, p^i (1 - p)^{n-i},$$

with A_w the number of codewords of C of weight w and e the bounded distance decoding radius.
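A quick numerical check of the P_fail expression for one worked case chosen here for illustration (the [7, 4, 3] binary Hamming code, bounded distance decoding radius e = 1, channel error probability p = 0.01):

```python
from math import comb

def p_fail(n, e, p):
    """Probability that more than e of the n transmitted symbols are in error."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(e + 1, n + 1))

print(p_fail(7, 1, 0.01))   # approximately 2.03e-3
```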

Problem 8. Let C be an [n, k] linear code and C⊥ its dual. Show that

$$\sum_{w \in C} (-1)^{v \cdot w} = \begin{cases} |C| & \text{if } v \in C^{\perp} \\ 0 & \text{if } v \notin C^{\perp}. \end{cases}$$

Problem 9. Let C be an [n, k] linear code with generator matrix G and C⊥ its dual. Show that C is weakly self-dual, C ⊆ C⊥, if and only if

$$G\,G^T = 0.$$

Problem 10. Let G be the generator matrix of a binary [32, 16] code:

$$G = \begin{pmatrix} A & I & I & I \\ I & A & I & I \\ I & I & A & I \\ I & I & I & A \end{pmatrix}$$

with I and A given by

$$I = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$

10.1 Construct H, the parity check matrix of the code.

10.2 Show that the code is self dual.

10.3 Construct the cosets of the code.

10.4 What is the minimum distance of the code and how many errors can it correct?

Problem 11. Consider a finite field with q elements, F(q), and an element α ∈ F(q). Prove the following properties:

1. α^j = 1 ⇐⇒ ord(α) divides j.

2. ord(α) = s =⇒ the elements α^1, α^2, . . . , α^s are all distinct.

3. ord(α) = s, ord(β) = r, gcd(s, r) = 1 =⇒ ord(α · β) = s · r.

4. ord(α^j) = ord(α) / gcd(ord(α), j).

5. F(q) has an element of order q − 1.

6. The order of any element in F(q) divides q − 1.

7. γ^q − γ = 0, ∀γ ∈ F(q).

Problem 12. Show that among the set of generator matrices of an [n, k] linear code C over the field F there is one of the form G = [I_k | A].

Problem 13. Given an [n, k] linear code C over the field F, the orthogonal complement, or dual, of C, denoted C⊥, consists of the vectors (n-tuples) orthogonal to every vector in C:

$$C^{\perp} = \{ v \in V_n : v \cdot c = 0, \ \forall\, c \in C \}.$$

Prove that C⊥ is a subspace of Vn.

Problem 14. Show that if you pick a different set of basis vectors you obtain a different set of codewords for the linear code discussed in the example from Section 4.5. Determine a basis such that the first three symbols in a codeword are precisely the information symbols in each of the eight messages.

Problem 15. An [n, k, d] code C over GF(q) is an MDS (Maximum Distance Separable) code if it reaches the Singleton bound, thus d = n − k + 1. Show that the number of codewords of C of weight n − k + 1 is equal to $(q - 1)\binom{n}{d}$. Show that its dual, C⊥, has distance k + 1.

Problem 16. Prove that congruence modulo a subgroup is an equivalence relation that partitions an Abelian group into disjoint equivalence classes.

Problem 17. Show that the r + 1 rows of the generator matrix G of a first-order Reed-Muller code are linearly independent.


Problem 18. If F[x] denotes the set of all polynomials in x with coefficients from the field F, f(x) is a non-zero polynomial, f(x) ∈ F[x], and F[x]/(f(x)) denotes the set of equivalence classes of polynomials modulo f(x), then the following propositions are true:

(i) F[x] is a principal ideal ring.

(ii) F[x]/(f(x)) is a principal ideal ring.

Problem 19. Let S be a subspace of an n-dimensional vector space Vn over the field F, S ⊂ Vn. Let R be the ring of polynomials associated with Vn and let I be the set of polynomials in R corresponding to S. Show that S is a cyclic subspace of Vn if and only if I is an ideal in R.

Problem 20. Let f(x) = x^n − 1 and let R be the ring of equivalence classes of polynomials modulo f(x) with coefficients in the field F, R = F[x]/(f(x)). Show that if g(x) is a monic polynomial which divides f(x), then g(x) is the generator of the ideal of polynomials I = {m(x) · g(x) : m(x) ∈ R}.

Problem 21. Show that there is a one-to-one correspondence between the cyclic subspaces of Vn and the monic polynomials g(x) which divide f(x) = x^n − 1.

Problem 22. Show that the two codes defined in Section 4.11 Ch and Ch are equivalent codes.

Problem 23. In Section 4.11 we give an alternative method to construct the generator matrix of an [n, k] cyclic code with generator polynomial g(x). We divide x^{n−k+i}, 0 ≤ i ≤ k − 1, by the generator polynomial and obtain a quotient q^{(i)}(x) and a remainder r^{(i)}(x):

$$x^{n-k+i} = g(x) \cdot q^{(i)}(x) + r^{(i)}(x), \qquad 0 \le i \le k - 1.$$

Consider the k × (n − k) matrix R whose rows are the coefficient vectors of −r^{(i)}(x), and construct the generator matrix as G = [R | I_k]. Show that when we encode a message m = (m_0 m_1 . . . m_{k−1}) as the codeword w = mG, the last k bits of the codeword contain the original message.

Problem 24. Show that binary Hamming codes are equivalent to cyclic codes.

Problem 25. Consider the following finite fields: (1) GF(2^8); (2) GF(2^9); (3) GF(3^3); (4) GF(5^2); (5) GF(7^3); (6) GF(3^9); (7) GF(101). Determine the values n, k, and d such that an [n, k, d] Reed-Solomon code exists over each one of these fields.

Problem 26. Show that the dual of an [n, k] Reed-Solomon code over GF (q) is an [n, (n−k)]Reed-Solomon code over GF (q).
