Digitally Implemented Adaptive Filters


Transcript of Digitally Implemented Adaptive Filters

  • IEEE TRANSACTIONS ON CIRCUIT THEORY, VOL. CT-20, NO. 2, MARCH 1973

    On the Design of Gradient Algorithms for Digitally

    Implemented Adaptive Filters


    RICHARD D. GITLIN, J. E. MAZO, AND MICHAEL G. TAYLOR

    Abstract-The effect of digital implementation on the gradient (steepest descent) algorithm commonly used in the mean-square adaptive equalization of pulse-amplitude modulated data signals is considered.

    It is shown that digitally implemented adaptive gradient algorithms can exhibit effects which are significantly different from those encountered in analog (infinite precision) algorithms. This is illustrated by considering the often quoted result of stochastic approximation that to achieve the optimum rate of convergence in an adaptive algorithm the step size should be proportional to 1/n, where n is the number of iterations. On closer examination one finds that this result applies only when n is large and is relevant only for analog algorithms. It is shown that as the number of iterations becomes large one should not continually decrease the step size in a digital gradient algorithm. This result is a manifestation of the quantization inherent in any digitally implemented system. A surprising result is that these effects produce a digital residual mean-square error that is minimized by making the step size as large as possible. Since the analog residual error is minimized by taking small step sizes, the optimum step-size sequence reflects a compromise between these competing goals.

    The performance of a time-varying gain sequence suggested by stochastic approximation is contrasted with the performance of a constant step-size sequence. It is shown that in a digital environment the latter sequence is capable of attaining a smaller residual error.

    I. INTRODUCTION AND SUMMARY

    BECAUSE of the many attractive features of digital technology it is expected that most future signal processors, in particular the filters and equalizers found in communication systems, will be digitally implemented. The wide variation in channel characteristics often makes it necessary that communications processors be of the adaptive type. In order to clarify, at the outset, the framework of our discussion we introduce the baseband pulse-amplitude modulated (PAM) data-transmission system shown in Fig. 1. The discrete-valued data sequence {a_n} is first impulse modulated at the symbol rate 1/T, and then distorted by the linear channel¹ whose impulse response is denoted by h(t). The received signal, which is also corrupted by additive noise, is sampled at the symbol rate and then processed by the tapped-delay-line equalizer, which adaptively compensates for the distortion introduced by the (unknown) channel.

    Manuscript received July 12, 1972.
    R. D. Gitlin and M. G. Taylor are with Bell Telephone Laboratories, Inc., Holmdel, N. J. 07733.
    J. E. Mazo is with the Mathematical Research Center, Bell Telephone Laboratories, Murray Hill, N. J.
    ¹ This filter, which actually represents the cascade of the transmitter shaping filter and the physical channel, is unknown at the receiver.

    Fig. 1. A simplified PAM data system. [Block diagram: the data train a_n drives the channel h(t); the channel output, corrupted by additive noise, is sampled every T seconds to give x(nT), which the equalizer processes into y(nT).]

    Adaptation is accomplished by adjusting the variable system (equalizer) parameters, using an appropriate algorithm, so as to continually decrease a suitable error measure. Our primary objective is to indicate the effects of digital implementation on the commonly used adaptive gradient algorithm. By digital implementation we have in mind that the adjustable parameters, as well as all internal signal levels, are quantized to within a least significant digit (LSD). The performance of the algorithm is measured in terms of the residual, or irreducible, error present when adaptation stops and by the rate of convergence to the final parameter settings. One might think, a priori, that the only consequence of digital implementation is that the variable parameters are truncated to within an LSD of the optimum settings; however, the digital effects are of a far more significant nature, and produce a residual error that is considerably larger than the error due solely to truncation.

    Since the adaptive gradient algorithm modifies the current parameter settings by adding a correction term, which is the product of a gradient estimate and a (possibly time-varying) step size, adaptation must stop when the correction term is smaller in magnitude than the LSD. Of course the algorithm might terminate at an earlier iteration, due to analog effects, and never display any digital peculiarities. A surprising result is that when adaptation is terminated by digital effects, the residual error can be further decreased by increasing the step size. Since the analog error is minimized by using a small step size, the best step size must reflect a compromise between these competing goals.

    Further manifestations of digital implementation are observed when we consider the design of a time-varying step size. It is well known, from stochastic approximation theory, that the convergence properties of an analog (infinite precision) gradient algorithm can be improved by using a step size that decreases with time. Such a step-size sequence, when used in an analog environment, will produce an error that vanishes with

  • 126 IEEE TRANSACTIONS ON CIRCUIT THEORY, MARCH 1973

    time. In order to contrast the two environments, we first present some new results on the design of the step-size sequence that minimizes the analog error at each iteration. It is found that when the initial error is substantially larger than the background noise variance the optimum step size is a constant while the error decreases exponentially. If the background noise is sufficiently small the constancy of the optimum step size persists until the error enters the quantization region; on the other hand, if the precision is fine enough, the error and step size will eventually become inversely proportional to the number of iterations until such time as adaptation is terminated by virtue of the error entering the quantization region. The performance of the variable step-size algorithm is contrasted with a fixed step-size algorithm, and somewhat surprisingly it turns out that the latter algorithm is capable of attaining a smaller residual error. An effective compromise would entail improving the rate of convergence by initially using the step size associated with the variable step-size algorithm and then gear-shifting to the step size that minimizes the residual error.²

    Section II is partly tutorial and contains the development of the design of adaptive gradient algorithms in the absence of quantization; new results are presented concerning the design of the optimum step-size sequence. In Section III we consider the effect of digital implementation on the design and performance of the gradient algorithm.

    II. ANALOG DESIGN CONSIDERATIONS

    In this section we consider the design of adaptive gradient algorithms in the absence of any digital or quantization effects. Thus the quantities used to update the algorithm, such as the received data and step size, as well as the operations of multiplication and addition, are assumed to be available or performed with infinite precision. It is well known from stochastic approximation theory [1] that the convergence rate is increased and the residual error is decreased when the step size is decreased with time. We will first present new results, of a more detailed nature than those previously reported, on the design of the step-size sequence that minimizes the error at each iteration. The insight gained into the behavior of such analog algorithms will make the effects of a digitally implemented version of such an algorithm on both the dynamic (rate of convergence) and steady-state performance (residual error) more readily appreciated.

    For the sake of definiteness we consider the mean-square equalization of a PAM data signal. More specifically, we examine the adaptive gradient algorithm that uses a variable step size to rapidly attain the optimum structure. Considering this particular problem is not as restrictive as it might initially appear to be since the form of the algorithm is rather general, while the parameters that affect the algorithm's performance have physical significance. We now briefly describe the particular problem used to motivate our discussion.

    A. The Equalization Problem

    An essential component of any high-speed data-transmission system is an equalizer [2]. The equalizer adaptively compensates for the intersymbol interference introduced by the channel. For our discussion we only need to consider the sampled baseband received signal

    x(nT) = Σ_m a_m h(nT − mT) + ν(nT),  n = 0, ±1, ±2, ... (1a)

    where a_m are the information symbols,³ h(·) is the overall system impulse response, ν(·) is the additive channel noise, and 1/T is the symbol as well as sampling rate. Rewriting the received samples as

    x(nT) = a_n h_0 + Σ_{m≠n} a_m h(nT − mT) + ν(nT) (1b)

    clearly displays the second term as the intersymbol interference. The output of the familiar tapped-delay-line equalizer shown in Fig. 2 is

    y(nT) = Σ_{m=−N}^{N} c_m x(nT − mT) = c^T x_n (1c)

    and is recognized as the convolution of the 2N+1 tap weights {c_m} and the received samples. The second equality makes use of the vector notation⁴

    c^T = (c_N, c_{N−1}, ..., c_0, ..., c_{−N})
    x_n^T = (x_{n−N}, x_{n−N+1}, ..., x_n, ..., x_{n+N−1}, x_{n+N})

    where the superscript T denotes transpose. The tap weights are iteratively adjusted so as to minimize the mean-square error at the equalizer output [3]. Since the desired output is a_n, the output error is given by

    e_n = y_n − a_n (2)

    and the mean-square error is given by

    E = ⟨(y_n − a_n)²⟩ = Σ_{n≠0} g_n² + (g_0 − 1)² + σ² (3)

    ² Certain system parameter values may dictate a gear-shift to a larger value.
    ³ Binary symbols of value ±1 will be assumed for convenience.
    ⁴ We have used the subscripted notation x_n = x(nT). We freely apply this notation to all system variables, e.g., a_n = a(nT).

  • GITLIN et al.: ALGORITHMS FOR ADAPTIVE FILTERS 127

    Fig. 2. An adaptive transversal digital equalizer.

    where ⟨·⟩ denotes the expectation operation, σ² the noise variance at the equalizer output, and g_n (which is the discrete convolution of h_n and c_n) represents the overall channel/equalizer impulse response. Perfect equalization would require that g_0 = 1 and g_n = 0 for n ≠ 0.
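    To make (1c)-(3) concrete, the following short Python sketch (added for illustration; the three-tap channel h, the noise level, and the equalizer length N are hypothetical values, not taken from the paper) forms the equalizer output y(nT) = c^T x_n for a binary data train and estimates the mean-square error by time averaging:

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.2, 1.0, 0.3])        # hypothetical channel samples h(nT)
noise_std = 0.05                     # std of the additive noise v(nT)
n_sym = 10_000

a = rng.choice([-1.0, 1.0], size=n_sym)                      # data train {a_n}
x = np.convolve(a, h, mode="same") + noise_std * rng.normal(size=n_sym)

N = 5
c = np.zeros(2 * N + 1); c[N] = 1.0  # tap c_m stored at index m + N

def equalizer_output(c, x, n, N):
    # y(nT) = sum_{m=-N}^{N} c_m x(nT - mT), eq. (1c)
    return sum(c[m + N] * x[n - m] for m in range(-N, N + 1))

# Estimate E = <(y_n - a_n)^2> of eq. (3) by averaging e_n^2 over time.
sq_errs = [(equalizer_output(c, x, n, N) - a[n]) ** 2
           for n in range(N, n_sym - N)]
print("estimated mean-square error:", np.mean(sq_errs))
```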

    To get a mathematical description of what the equalizer is trying to do, we write the mean-square error explicitly as a function of c, the tap-weight vector, and obtain

    E(c) = c^T A c − 2 c^T v + 1 (4)

    where v^T = (h_N, ..., h_0, ..., h_{−N}) denotes the vector of the 2N+1 center channel samples, and A = ⟨x_n x_n^T⟩ is recognized as the sum of the channel correlation matrix (whose ijth element is Σ_m h_{m−i} h_{m−j}) and the noise covariance matrix. Setting the gradient of the quadratic form given by (4) to zero gives the best tap setting c* and the resulting residual error, which are given, respectively, by the familiar expressions

    A c* = v (5)

    and

    E(c*) = 1 − v^T A^{−1} v. (6)

    B. Deterministic Mode of Operation

    In practice the optimum tap setting c* is obtained in an iterative manner using some form of the gradient (steepest descent) algorithm, as opposed to solving (5).⁵ To see how the system parameters affect the dynamic behavior of the gradient algorithm, we review the performance of this algorithm when the gradient of

    ⁵ In actual operation of the equalizer the system impulse response is generally unknown so that A and v are not available.

    E, with respect to the tap weights, is available.⁶ The results obtained for this situation will provide a convenient reference for the more complicated case when the gradient is estimated directly from the received data. In the gradient, or steepest descent, algorithm [4], [5] the tap weights are adjusted according to the recursion

    c_{n+1} = c_n − Δ_n (A c_n − v),  n = 0, 1, 2, ... (7)

    where Δ_n is a suitably chosen positive step size and c_n is the vector of tap weights at the nth iteration. Note that under the above assumptions the tap adjustments are made in a completely deterministic manner. Thus when the matrix A and the vector v are available, the algorithm solves (5) in a recursive manner and there are no random fluctuations in the tap settings. The convergence properties of this algorithm are studied by introducing the tap-error vector

    ε_n = c_n − c*. (8)

    Using (4)-(8) it is easy to see that the mean-square error when the taps are adjusted in accordance with (7) satisfies

    E(c_n) = E_n = E* + ε_n^T A ε_n (9)

    where

    E* = E(c*)

    is the (minimum) mean-square error when the taps are at the optimum setting. From (9) we see that the dynamic behavior of (7) is determined by the quadratic term ε_n^T A ε_n, which can be shown to satisfy the recursion

    ε_{n+1}^T A ε_{n+1} = ε_n^T A ε_n − 2Δ_n ε_n^T A² ε_n + Δ_n² ε_n^T A³ ε_n. (10)

    To study this recursion we let α and β denote, respectively, the minimum and maximum eigenvalues of the symmetric positive-definite matrix A, and recall that

    α x^T x ≤ x^T A x ≤ β x^T x. (11)

    Applying (11) repeatedly, we obtain the upper bound

    ε_{n+1}^T A ε_{n+1} ≤ (1 − 2αΔ_n + β²Δ_n²) ε_n^T A ε_n (12)

    and we see that ε_{n+1}^T A ε_{n+1} will approach zero as n becomes large (thus E_n will approach E*) provided

    γ_n = 1 − 2αΔ_n + β²Δ_n² < 1. (13)

    This can be guaranteed by choosing Δ_n ≤ 2α/β². The rate of convergence will be optimized in the sense that the step size Δ_n will be chosen to minimize γ_n at each n. Differentiating (13) we obtain the optimum step size, denoted by Δ*, which is given by

    Δ* = α/β². (14)

    ⁶ This would be the case when isolated test pulses are used to adapt the equalizer.
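    As an aside, the deterministic recursion (7) with the constant step size (14) is easy to exercise numerically. The sketch below is illustrative only; the matrix A and vector v are made-up values standing in for an unknown channel. It confirms the exponentially bounded convergence of (16), and setting A = I (so that α = β) would give the one-step convergence noted in the text:

```python
import numpy as np

# Hypothetical A and v standing in for an unknown channel (A symmetric PD).
A = np.array([[1.0, 0.3, 0.1],
              [0.3, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
v = np.array([0.2, 1.0, 0.2])

alpha, beta = np.linalg.eigvalsh(A)[[0, -1]]   # min and max eigenvalues of A
delta = alpha / beta**2                        # optimum constant step, eq. (14)

c = np.zeros(3)                                # arbitrary initial tap setting
c_star = np.linalg.solve(A, v)                 # optimum taps, eq. (5)
for n in range(200):
    c = c - delta * (A @ c - v)                # deterministic recursion, eq. (7)
print("tap error after 200 iterations:", np.linalg.norm(c - c_star))
# With A = I (alpha = beta) the same loop converges in a single step,
# as predicted by (16).
```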

  • 128 IEEE TRANSACTIONS ON CIRCUIT THEORY, MARCH 1973

    Notice that Δ* is a constant, independent of n, and is half the maximum permissible step size. With Δ_n = Δ* we see that

    ε_{n+1}^T A ε_{n+1} ≤ [1 − α²/β²] ε_n^T A ε_n. (15)

    Hence the convergence is exponentially bounded, since

    ε_{n+1}^T A ε_{n+1} ≤ [1 − α²/β²]^{n+1} ε_0^T A ε_0 (16)

    and the rate of convergence is seen to depend on the ratio of the minimum-to-maximum eigenvalues of the A-matrix. Note that if all the eigenvalues of A are the same, then α equals β, and convergence is achieved in one step. This observation has been used by Chang [6] to design a rapidly converging equalizer structure.

    In summary, if the actual gradient of the mean-square error is available the tap weights will converge (for any initial setting) to their optimum setting at an exponentially bounded rate, and the best step size is found to be a constant. The rate of convergence is seen to be a function of the eigenvalue spread of the channel-plus-noise correlation matrix.

    C. Adaptive Mode of Operation

    In actual operation the exact gradient of the mean-square error is not available and must be estimated from the received data. A well-known estimate is the gradient of the instantaneous squared error e_n², where e_n is given by (2). Using this estimate gives

    ∇e_n² = 2 e_n x_n (17)

    where x_n is a vector whose entries are the 2N+1 received samples x_{n−N}, ..., x_n, ..., x_{n+N} that are in the equalizer at the time t = nT. The mathematical expectation of (17) is, in fact, the actual gradient. The adaptive or estimated gradient algorithm, suggested by Robbins and Monro [7] and explicitly described by Widrow [5], is then used to adjust the equalizer taps in accordance with

    c_{n+1} = c_n − Δ_n e_n x_n = c_n − Δ_n (c_n^T x_n − a_n) x_n (18)

    where we refer to Δ_n as the step size and to Δ_n e_n x_n as the correction term. We note that due to the random nature of the correction term Δ_n e_n x_n, the tap vector is itself a random quantity, and it is desired to establish that the taps converge, in some probabilistic sense, to the optimum setting.

    Before discussing the convergence problem it shall be necessary to make the following assumption. We assume that the sequence of vectors in the equalizer {x_n} is independent of x_m for m ≠ n.⁷ The results obtained using this assumption agree well with observed laboratory performance of the algorithm, particularly in predicting the digital effects that are described in Section III.

    We note that c_{n+1}, as given by (18), depends on x_0, x_1, ..., x_n, and by the above assumption is then independent of x_{n+1}. This is a very useful observation and one which will be used repeatedly in the sequel. Our first step in assessing the performance of the estimated gradient algorithm is to relate the mean-square error, at the nth iteration, to the tap error. The instantaneous squared error is given by

    e_n² = (c_n^T x_n − a_n)² = (ε_n^T x_n + x_n^T c* − a_n)²

    and taking expectations gives the relation

    E_n = ⟨ε_n^T A ε_n⟩ + E(c*) (19)

    an expression closely related to (9) and one which reflects the random nature of the algorithm. We observe that the mean-square error is the sum of the irreducible error E* given by (6) and the average power in a weighted tap-error vector. By considering the evolution of the term ⟨ε_n^T A ε_n⟩, we shall be able to describe the dynamic behavior of the estimated gradient algorithm. Subtracting the optimum tap setting from both sides of (18) gives

    ε_{n+1} = ε_n − Δ_n e_n x_n (20)

    and we may easily establish that

    ε_{n+1}^T A ε_{n+1} = ε_n^T A ε_n − 2Δ_n e_n x_n^T A ε_n + Δ_n² e_n² x_n^T A x_n. (21)

    Applying the bounds developed in Appendix I we have the inequality

    ⟨ε_{n+1}^T A ε_{n+1}⟩ = q_{n+1} ≤ [1 − 2αΔ_n + β²Δ_n²] q_n + Δ_n² u² (22)

    where

    u² = 2β(2N + 1) [μρ + X_rms²]. (23)

    The quantities μ and X_rms are defined by⁸

    μ = ⟨(x_n^T x_n)²⟩ (24a)
    X_rms = √⟨x_n²⟩ (24b)

    and ρ is taken such that⁹

    c_n^T c_n ≤ (2N + 1)ρ. (24c)

    It is interesting to note the similarity of (12) and (22). We see that the random nature of the algorithm

    ⁷ This assumption permits us to easily take the average of both sides of (18). Doing this indicates that the evolution of the average tap vector ⟨c_n⟩ is governed by (7) with c_n replaced by ⟨c_n⟩. Since our interest is in studying the dynamics of the mean-square error we shall not say anything further about the average tap vector.
    ⁸ Note that μ involves fourth-order statistics of the received signal.
    ⁹ The ρ can be thought of as representing the dynamic range of the tap weights. See Appendix I for further discussion regarding the introduction of ρ.
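    The estimated-gradient update (17)-(18) is what is now commonly called the LMS algorithm. A minimal Python sketch follows (for illustration only; the channel, noise level, and constant step size are assumed values, not the paper's). Each iteration forms the output error e_n of (2) and applies the correction term Δ_n e_n x_n of (18):

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([0.2, 1.0, 0.3])                    # hypothetical channel
N, n_iter, delta = 5, 20_000, 0.01               # 2N+1 taps, constant step
a = rng.choice([-1.0, 1.0], size=n_iter)
x = np.convolve(a, h, mode="same") + 0.05 * rng.normal(size=n_iter)

c = np.zeros(2 * N + 1); c[N] = 1.0              # a priori initial setting
sq_err = []
for n in range(N, n_iter - N):
    x_vec = x[n - N : n + N + 1][::-1]           # samples in the delay line
    e_n = c @ x_vec - a[n]                       # output error, eq. (2)
    c = c - delta * e_n * x_vec                  # correction term, eq. (18)
    sq_err.append(e_n ** 2)
print("MSE over last 1000 iterations:", np.mean(sq_err[-1000:]))
```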

  • GITLIN et al.: ALGORITHMS FOR ADAPTIVE FILTERS 129

    is summarized by the u² term.¹⁰ The relation described by (22) is one which occurs frequently in the application of stochastic approximation techniques (it is [1, eq. (18)]), and is generally employed to develop conditions on the step-size sequence that are sufficient to guarantee that q_n will converge to zero with increasing n. Our emphasis in this investigation is slightly different since we wish to study the overall dynamic behavior of the sequence q_n in the presence of quantization. Consequently, we are very much interested in the transient behavior of q_n and Δ_n, i.e., how should we choose the step size initially and what is the resulting mean-square error? Our policy is to choose the step size so as to minimize q_n at each iteration.¹¹ Setting the derivative of the right-hand side (RHS) of (22), with respect to Δ_n, to zero gives the optimum step size

    Δ_n* = α q_n*/(u² + β² q_n*),  n = 0, 1, 2, ... (25a)

    or, solving for q_n*,

    q_n* = u² Δ_n*/(α − β² Δ_n*). (25b)

    When Δ_n and q_n are taken to be related through (25) we say that the step size is chosen optimally. The algorithm, when initiated, has an arbitrary q_0. The initial tap vector is generally taken to be a reasonable a priori setting [e.g., c_0^T = (0, 0, ..., 0, 1, 0, ..., 0, 0)], and using (25a) the optimum Δ_0 could then be determined.

    We of course would like a description of the sequences Δ_n and q_n as explicit functions of n. We conclude from (25a) that

    Δ_n* ≥ 0 (26)

    and in order that (25b) be consistent, i.e., that q_n* ≥ 0, it follows that

    Δ_n* ≤ α/β². (27)

    An iterative bound on the optimum step size is obtained by substituting (25a) in the RHS of (22) to give

    q_{n+1}* ≤ q_n* − (α q_n*)²/(u² + β² q_n*) = (1 − α Δ_n*) q_n* (28)

    and combining (25) and (28) we have

    u² Δ_{n+1}*/(α − β² Δ_{n+1}*) ≤ (1 − α Δ_n*) · u² Δ_n*/(α − β² Δ_n*). (29)

    ¹⁰ We sometimes casually refer to u² as the variance of the background noise, but one should not regard this as implying that it is due solely to additive noise on the channel.

    ¹¹ More accurately, we minimize an upper bound on q_n at each iteration. We assume that this upper bound displays the essential relation between the system parameters.

    Transposing and solving for Δ_{n+1}* gives

    Δ_{n+1}* ≤ Δ_n* · (1 − α Δ_n*)/(1 − β² Δ_n*²) (30)

    which is valid in the range 0 ≤ Δ_n* ≤ α/β². Suppose now that the algorithm is initiated with a large error, i.e., q_n* ≫ u²/β², for small n. Under this assumption the optimum step size, from (25a), is

    Δ_n* ≅ α/β². (31)

    Since in obtaining (31) we have, in effect, neglected the noise it is not surprising that the optimum step size is the same as that obtained in the absence of noise [see (14)]. Combining (28) and (31) we have

    q_{n+1}* ≤ (1 − α²/β²) q_n* (32)

    thus the minimized mean-square error initially decays at an exponential rate while the optimum step size is a constant. This will be the mode of behavior until q_n is on the order of u²/β², at which time the large initial error assumption is no longer valid and the dynamic behavior is governed by (30). The above observation will be most important in interpreting the consequences of digital implementation on the behavior of the gradient algorithm.

    When n is large we note that the monotonically decreasing nature of the Δ_n* sequence implies that there is a number n_0 such that for n > n_0

    β² Δ_n*² ≪ α Δ_n* (33)

    and (30) simplifies to

    Δ_{n+1}* ≤ Δ_n*/(1 + α Δ_n*),  n > n_0. (34)

    Iterating (34) yields

    Δ_n* ≤ Δ_{n_0}*/(1 + (n − n_0) α Δ_{n_0}*),  n > n_0 (35)

    and using (25b), and the fact that β² Δ_n* ≪ α for large n, we see that both the optimum step size and the minimized mean-square error ultimately decay as 1/n to zero.
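    The two regimes derived above, a constant step size with exponential error decay followed by a 1/n fall-off, can be seen by iterating (22) with the step size chosen from (25a). The sketch below uses assumed values of α, β, and u² (illustrative only, not taken from the paper):

```python
import numpy as np

alpha, beta, u2 = 0.5, 2.0, 1e-4      # assumed constants (illustrative only)
q = 1.0                               # large initial mean-square error q_0

qs, ds = [], []
for n in range(20_000):
    d = alpha * q / (u2 + beta**2 * q)                        # eq. (25a)
    q = (1 - 2 * alpha * d + beta**2 * d**2) * q + d**2 * u2  # eq. (22)
    ds.append(d); qs.append(q)

print("early step sizes:", ds[0], ds[10])    # ~ alpha/beta^2 = 0.125, constant
print("late n * q_n:", 19_999 * qs[-1])      # roughly constant -> 1/n decay
```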

  • 130 IEEE TRANSACTIONS ON CIRCUIT THEORY, MARCH 1973

    Fig. 3. Mean-square error and variable step size in the absence of quantization.

    A vanishing step size provides precise convergence by eliminating small fluctuations about the optimum point. Many practical algorithms employ a small but nonzero final step size to be able to track small drifts in the channel pulse response.

    In summary we have shown that, for analog implementation, when the step size is chosen optimally, the step size is initially a constant and the mean-square error initially decreases at an exponential rate; the step size and mean-square error are ultimately (for large n) described by the familiar 1/n decay to zero. These relations are sketched in Fig. 3. In the next section we consider the effect of a digital implementation on the estimated gradient algorithm.

    III. DIGITAL DESIGN CONSIDERATIONS

    Having developed the necessary background in the design of analog adaptive gradient algorithms we are now ready to consider the main subject of this investigation: the design of digitally implemented adaptive gradient algorithms. Such algorithms are used when implementing adaptive systems whose adjustable parameters are quantized. We, of course, have in mind an adaptive equalizer whose tap weights are quantized. We shall see that the performance characteristics of digital algorithms are quite different from their analog counterparts. Our primary observation¹² is that the digitally implemented algorithm stops adapting whenever the correction term is smaller in magnitude than the LSD of the corresponding tap weight.¹³ In general, quantization precludes the tap weights reaching the optimum (from analog considerations) setting, and as we shall see shortly, the digital effects significantly influence the design of the algorithm.

    ¹² First made by Taylor [8].
    ¹³ We also assume that before this situation occurs the algorithm is operating in the analog region where the taps are free to assume any value, and the correction term is computed with infinite precision. Of course, adaptation will only be terminated in this manner if the error is driven into the quantization region.

    Moreover, the quantization closely couples the steady-state and dynamic behavior of the digital estimated-gradient algorithm.

    A. Digital Residual Error

    In order to more easily contrast the analog and digital environments we first consider the case of constant step size. Suppose the (digital) algorithm is terminated due to quantization effects. Then we can estimate the value of the error when adaptation stops by setting the correction term (applied to the ith tap) to be less than or equal to the LSD. This gives the fundamental inequality

    |Δ · e_{n_0} · x_{n_0−i}| ≤ LSD (36)

    which is valid when adaptation stops, where n_0 is the time at which the ith tap stops adapting. Suppose (36) is first satisfied for the ith tap; then, as the particular input sample x_{n_0−i} propagates down the equalizer, the error and step size will further decrease in magnitude, thus insuring that this sample will turn off all the taps down the line. Because of this observation we assume that all the taps stop adapting at the same time. To a first approximation, it is reasonable to replace |x_{n_0−i}| by its rms value, X_rms, and this gives the following relation for the rms error¹⁴ when adaptation stops:

    |e_{n_0}| ≅ LSD/(Δ · X_rms) = e_d(Δ). (37)

    We call e_d(Δ) the rms digital residual error (DRE). The above indicates that this error is inversely proportional to the step size. Therefore, it is clear that if adaptation is terminated due to digital effects one should try to make the step size as large as possible (while still guaranteeing convergence) in order to minimize the DRE. With a constant step size it is possible, however, that the error never enters the quantization region; consequently, in that case one could further reduce the error by decreasing the step size. We will have considerably more to say about the choice of step size in the sequel.

    To clarify the practical significance of (37) let us look at a numerical example. Consider a 17-tap equalizer¹⁵ with tap weights quantized to 12 bits (LSD ≅ 0.25 × 10⁻³), with an input data stream having an rms value of unity, and with the step size fixed at 0.07. For this example, (37) estimates that the DRE will be about 0.35 × 10⁻². Let us compare this with the rms error that would be expected if the only source of error were the

    ¹⁴ It is reasonable to say that |e_{n_0}| is approximately equal to √q_{n_0}; we will use this approximation when discussing the dynamic behavior of the algorithm.

    ¹⁵ With this length equalizer we can assume that the analog residual error E(c*) is negligible. It will be convenient to make this assumption in the sequel.
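    The numerical example above is easy to check directly from (37) and (38); the following lines reproduce the quoted figures (the constants are those stated in the text):

```python
# Numerical example of Section III-A: 17-tap equalizer, 12-bit taps,
# unit-rms input, constant step size 0.07 (values quoted in the text).
lsd = 0.25e-3        # least significant digit of a 12-bit tap
delta = 0.07         # constant step size
x_rms = 1.0          # rms value of the input data stream
n_taps = 17          # 2N + 1

dre = lsd / (delta * x_rms)           # rms digital residual error, eq. (37)
qe = n_taps**0.5 * lsd * x_rms        # rms quantization error, eq. (38)
print(f"DRE ~ {dre:.2e}, QE ~ {qe:.2e}, ratio ~ {dre / qe:.1f}")
# -> DRE ~ 3.57e-03 (the 0.35e-2 quoted in the text), ratio ~ 3.5
```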

  • GITLIN et al.: ALGORITHMS FOR ADAPTIVE FILTERS 131

    quantization of the desired tap weights to 12 bits; that is, we assume that the ideal error-free equalizer has 17 taps, each of infinite precision. In Appendix II we show that the quantization error (QE) can be approximated by


    rms QE = (2N + 1)^{1/2} · LSD · X_rms. (38)

    For the numerical example described above, this QE is about 10⁻³. Thus the DRE due to the failure of the algorithm to find the best coefficients is roughly 3.5 times worse than the rms error would be if the best 12-bit coefficients had been found. As explained in Appendix III, the ratio of the DRE to the QE is proportional to (2N + 1)^{1/2}. Thus the residual error phenomenon will be even more pronounced in longer digital equalizers.


    The ratio of the DRE to the QE clearly demonstrates the manifestations of digital implementation. The tap weights are, to a first approximation, trying to approach the quantized versions of the optimum settings; however, when the tap weights get close to the optimum setting the mean-square error and step size have decreased appreciably, and by the nature of the algorithm the taps try to approach the optimum setting by using very small correction terms. Once the correction term becomes smaller than the LSD, adaptation stops, and, as shown in Appendix III, the algorithm terminates while the taps are appreciably further away from the optimum setting than one LSD. Hence the quantization is enhanced, and produces the relatively large DRE.¹⁶
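    The mechanism just described can be demonstrated by adding tap quantization to the estimated-gradient sketch shown earlier. The following is an illustrative experiment, not a reproduction of the authors' simulation: taps are rounded to the LSD after each update, and the final rms error is evaluated for several constant step sizes. Smaller step sizes should exhibit the larger digital residual error predicted by (37), though the exact numbers depend on the assumed channel and noise:

```python
import numpy as np

def final_rms_error(delta, lsd, n_iter=30_000, N=5, seed=2):
    """LMS of eq. (18) with taps rounded to the LSD after every update."""
    rng = np.random.default_rng(seed)
    h = np.array([0.2, 1.0, 0.3])                # hypothetical channel
    a = rng.choice([-1.0, 1.0], size=n_iter)
    x = np.convolve(a, h, mode="same") + 0.01 * rng.normal(size=n_iter)
    c = np.zeros(2 * N + 1); c[N] = 1.0
    for n in range(N, n_iter - N):
        x_vec = x[n - N : n + N + 1][::-1]       # samples in the delay line
        e_n = c @ x_vec - a[n]
        c = np.round((c - delta * e_n * x_vec) / lsd) * lsd  # quantized taps
    errs = [(c @ x[n - N : n + N + 1][::-1] - a[n]) ** 2
            for n in range(N, n_iter - N)]
    return float(np.mean(errs) ** 0.5)

for delta in (0.07, 0.01, 0.001):
    print(delta, final_rms_error(delta, lsd=0.25e-3))
```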


    Fig. 4. RMS error as a function of the convergence constant k.

    To see how the above observations are manifested in practice we show, in Fig. 4, the results of a computer experiment on a 17-tap digital adaptive equalizer. The experiment consisted of sending the same input stream into both the adaptive equalizer and the desired equalizer, a 17-tap equalizer with fixed optimum coefficients. The adaptive equalizer was adjusted by minimizing the mean-squared difference between their outputs. The final rms error¹⁷ is plotted for various values of constant step size Δ(n) = k. This error is normalized in the sense that the input data stream has an rms value of unity and the largest tap weight is one.

    For very large values of k, the rms error due to the fluctuations of the tap weights dominates; hence choosing a smaller value for k improves the rms error. However, for all but very large values of k, the DRE phenomenon dominates so that making k smaller degrades the performance. In the example illustrated in Fig. 4, there is little to be gained from making Δ(n) a function of n, at least from the standpoint of minimizing the final rms error. The value of 0.07 is about as large as one would choose for initial rapid convergence and, as the results indicate, any change from this value of k will increase the residual error. One might think that once the minimum rms error had been achieved with one particular value of k, a subsequent decrease in k would have no effect since the corrections would be even less in magnitude than the LSD in the coefficients. This argument is not generally true, for although on the average the corrections are less than the LSDs, the fluctuations in the size of the instantaneous error may be sufficient to make some corrections large enough to perturb the coefficients. For example, it was observed that if the filter was adapted with k = 0.07 until the minimum rms error was attained and then k was decreased to 0.001, the rms error gradually increased up to the value of 0.95 × 10⁻⁵ as predicted by Fig. 4.

    Since our primary objective is to obtain the minimum mean-squared error, the above experimental results suggest that the conventional stochastic approximation method, namely continually decreasing Δ(n) for each successive value of n, should not be used in a digital adaptive equalizer.

    B. Choosing a Fixed Step Size

    Applying the results obtained thus far we develop, in this section, a more complete understanding of the behavior of the constant step-size digital gradient algorithm. Suppose at n = 0 we begin with a large initial tap error. Then for any fixed step size Δ ≤ 2α/β², we have, by solving (22) with equality, that the error decreases exponentially to a value q_∞(Δ) given by

    I6 The preceding discussion suggests adaptive algorithms whcse correction term be in units of the LSD. Such an algorithm has been proposed by Lucky [9], and contrasted with the estimated gradient algorithm by Taylor [8].

    q_∞(Δ) = Δ u²/(2α − β² Δ) (39)

    ¹⁷ In this experiment the mean-squared error was approximated by averaging the instantaneous squared error over 4000 iterations.

    provided q_∞(Δ) exceeds the corresponding digital residual mean-square error e_d²(Δ). When e_d²(Δ) > q_∞(Δ), adaptation will cease when the mean-square error q_n(Δ)

  • 132


    Fig. 5. Mean-square error using a constant step size: analog dominance.

    Fig. 6. Mean-square error using a constant step size: digital dominance.

    decreases to the value e_d²(Δ). Figs. 5 and 6 sketch the trajectory of q_n(Δ) under the respective conditions q_∞(Δ) > e_d²(Δ) and q_∞(Δ) < e_d²(Δ).

    A second observation is that when Δ < α/β², we have q_∞(Δ) ≅ u² Δ/2α. If e_d²(Δ) > q_∞(Δ), then, as shown in Fig. 6, digital effects stop adaptation at the level e_d²(Δ). The final mean-square error may again be decreased by increasing the step size until (43) is satisfied. Fig. 7 shows e_d²(Δ) and


    ¹⁸ For simplicity, we are assuming an equalizer sufficiently long so that the irreducible error E(c*) is negligible.
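    The trade-off sketched in Figs. 5-7 can be visualized by computing q_∞(Δ) from (39) against the squared DRE from (37) and locating their crossing. The constants below are assumed purely for illustration:

```python
import numpy as np

alpha, beta, u2 = 0.5, 2.0, 1e-4     # assumed system constants
lsd, x_rms = 0.25e-3, 1.0            # assumed precision and input level

deltas = np.linspace(1e-3, 1.9 * alpha / beta**2, 10_000)
q_inf = deltas * u2 / (2 * alpha - beta**2 * deltas)   # analog level, eq. (39)
e_d2 = (lsd / (deltas * x_rms)) ** 2                   # digital level, eq. (37)

# The final mean-square error is set by whichever level is larger; the best
# constant step size sits where the falling e_d2 curve meets the rising q_inf.
final = np.maximum(q_inf, e_d2)
print("best constant step size ~", deltas[np.argmin(final)])
```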

  • GITLIN et al.: ALGORITHMS FOR ADAPTIVE FILTERS 133

    q_∞(Δ) plotted as a function of Δ, together with the value Δ* at which the two curves cross. If Δ > Δ*, then q_∞(Δ) > e_d²(Δ), analog effects dominate, and the final error is reduced by decreasing Δ. If Δ < Δ*, digital effects dominate, and the final error is reduced by increasing Δ toward Δ*,

    and hence the constant step-size algorithm is capable of attaining a smaller final value than the time-varying algorithm.¹⁹ Ideally, if one knew the parameters of the ...

    Now consider the situation when u²/β² > e_d²(α/β²). The mean-square error q_n will decay in an exponential fashion until q_n = u²/β², at which time both q_n and Δ_n will decrease roughly as 1/n. Note that as Δ_n decreases, the quantization level e_d²(Δ_n) increases. As shown in Fig. 9, adaptation ceases when

    q_n(Δ_n) = e_d²(Δ_n) (44a)

    ¹⁹ We must still consider the case u²/β² > e_d²(α/β²).

  • 134 IEEE TRANSACTIONS ON CIRCUIT THEORY, MARCH 1973


    Fig. 10. Determination of the residual errors resulting from fixed and time-varying step-size sequences.

    or, from (25b) and (42), when

    u² Δ_{n_0}/(α − β² Δ_{n_0}) = e_d²(Δ_{n_0}) (44b)

    where Δ_{n_0} denotes the step size at stopping.²⁰ We again wish to compare Δ* and Δ_{n_0}. This can be done by plotting the functions q_n(Δ), q_∞(Δ), and e_d²(Δ) versus Δ, and identifying the quantities Δ* and Δ_{n_0}. Fig. 10 displays these functions and illustrates graphically the solution to (43) and (44b). Since q_n(Δ) > q_∞(Δ), it is clear that Δ_{n_0} is always less than Δ*. Hence we again have

    e_d(Δ*) ≤ e_d(Δ_{n_0})

    i.e., the constant step-size algorithm can achieve a smaller final mean-square error. By noting that

    q_∞(Δ = α/β²) = u²/β² > e_d²(Δ = α/β²)

    we see that the mean-square error is reduced by decreasing Δ, hence Δ* < α/β².

    APPENDIX I

    In this appendix we develop the bounds used in obtaining (22). Considering the second term on the RHS of (21), and using the independence of x_n and c_n, we have

    ⟨e_n x_n^T A ε_n⟩ = ⟨ε_n^T A² ε_n⟩ ≥ α ⟨ε_n^T A ε_n⟩ (45)

    where we have used the positive-definite property of A to define A^{1/2} such that

    ε_n^T A² ε_n = ε_n^T A^{1/2} A A^{1/2} ε_n = (A^{1/2} ε_n)^T A (A^{1/2} ε_n) ≥ α (A^{1/2} ε_n)^T (A^{1/2} ε_n) = α ε_n^T A ε_n. (46)

    The third term on the RHS of (21) is treated as follows:

    ⟨e_n² x_n^T A x_n⟩ = ⟨(A^{1/2} x_n e_n)^T (A^{1/2} x_n e_n)⟩ = ⟨z_n^T z_n⟩ = Σ_{i=−N}^{N} ⟨[z_n^{(i)}]²⟩ (47)

    where the components of z_n = A^{1/2} x_n e_n are denoted by z_n^{(i)}, i = −N, −N+1, ..., N.

    Using the definition

    ⟨[z_n^{(i)}]²⟩ = var z_n^{(i)} + ⟨z_n^{(i)}⟩²

    we can write

    ⟨e_n² x_n^T A x_n⟩ = Σ_{i=−N}^{N} var z_n^{(i)} + Σ_{i=−N}^{N} ⟨z_n^{(i)}⟩² = σ_n² + ⟨z_n⟩^T ⟨z_n⟩ (48)

    where σ_n² = Σ_{i=−N}^{N} var z_n^{(i)}. We now want to obtain an upper bound on σ_n² that is independent of n, but we first note that

    ⟨z_n⟩ = A^{1/2} ⟨x_n x_n^T c_n − a_n x_n⟩ = A^{1/2} [A c_n − A c*] = A^{3/2} ε_n. (49)

    Thus by applying (11) twice and using (49) we have

    ²⁰ Recall that when ...

  • GITLIN et al.: ALGORITHMS FOR ADAPTIVE FILTERS 135

    ⟨z_n⟩^T ⟨z_n⟩ = ε_n^T A³ ε_n ≤ β² ε_n^T A ε_n. (50)

    Returning to the expression for σ_n² we first observe that

    σ_n² ≤ Σ_{i=−N}^{N} ⟨[z_n^{(i)}]²⟩ = ⟨c_n^T x_n x_n^T A x_n x_n^T c_n − 2 a_n c_n^T x_n x_n^T A x_n + a_n² x_n^T A x_n⟩. (51)

    Considering the first item on the RHS of (51) gives

    ⟨c_n^T x_n x_n^T A x_n x_n^T c_n⟩ ≤ β ⟨c_n^T x_n x_n^T x_n x_n^T c_n⟩ = β ⟨c_n^T X_n c_n⟩ (52)

    where the matrix X_n is defined by

    X_n = x_n x_n^T x_n x_n^T. (53)

    Noting that the maximum eigenvalue of X_n is [x_n^T x_n]² and using the independence of x_n and c_n permits us to write

    ⟨c_n^T X_n c_n⟩ ≤ ⟨(x_n^T x_n)²⟩ · ⟨c_n^T c_n⟩. (54)

    If we denote the fourth-order term by

    μ = ⟨(x_n^T x_n)²⟩ (55)

    and require that²¹

    c_n^T c_n ≤ (2N + 1) ρ (56)

    then the first term on the RHS of (51) is upper bounded by β(2N + 1)μρ. The last term is simply bounded by noting that

    ⟨a_n² x_n^T A x_n⟩ = ⟨x_n^T A x_n⟩ ≤ β ⟨x_n^T x_n⟩ = β(2N + 1) X_rms² (57)

    where the mean-square input

    X_rms² = ⟨x_n²⟩ = Σ_{m=−∞}^{∞} h_m² + ⟨ν_n²⟩ (58)

    is the average power in the received samples. The second term in (51) is recognized as a cross term by letting

    a = a_n √(x_n^T A x_n)
    b = c_n^T x_n √(x_n^T A x_n) (59)

    and can be absorbed by noting that

    a_n c_n^T x_n x_n^T A x_n = ab ≤ (a² + b²)/2. (60)

    Combining the above inequalities we finally have that

    ⟨ε_{n+1}^T A ε_{n+1}⟩ ≤ [1 − 2αΔ_n + β²Δ_n²] ⟨ε_n^T A ε_n⟩ + Δ_n² u² (61)

    where

    u² = 2β(2N + 1) [μρ + X_rms²].

    ²¹ This would be true in any practical system; e.g., the dynamic range of the taps is taken to be limited to ±√ρ. In making this assumption we implicitly assume that the optimum tap setting does, in fact, lie within the dynamic range of the taps. If this is indeed true, then the algorithm will converge to the proper tap settings even though at some intermediate range one or more taps might saturate.

    Fig. 11. A model for analyzing truncation error.

    APPENDIX II

    RMS QUANTIZATION ERROR

    In this appendix we shall derive an expression for the rms QE. Let us start by considering the two equalizers shown in Fig. 11. The first has tap weights {c_j} that are assumed to be of infinite precision. The output of this equalizer is given by

    y(i) = Σ_{j=−N}^{N} c_j x(i − j).

    The tap weights in the second equalizer, {ĉ_j}, are found by truncating the corresponding tap weights in the first filter. The output of the second equalizer equals

    ŷ(i) = Σ_{j=−N}^{N} ĉ_j x(i − j).

    The QE equals

    E_t(i) = y(i) − ŷ(i) = Σ_{j=−N}^{N} (c_j − ĉ_j) x(i − j)

    and the mean-squared QE equals

    Q = E[E_t²(i)] = E[Σ_{j=−N}^{N} Σ_{l=−N}^{N} (c_j − ĉ_j)(c_l − ĉ_l) x(i − j) x(i − l)].

    Let us assume that the inputs {x_i} are independent random variables with rms value equal to X_rms. Then

    Q = Σ_{j=−N}^{N} (c_j − ĉ_j)² X_rms² ≤ (2N + 1) · LSD² · X_rms²

  • 136 IEEE TRANSACTIONS ON CIRCUIT THEORY, MARCH 1973

    where LSD is the value of the LSD in the truncated tap weights. The above expressions are exact for the case of statistically independent inputs and provide useful approximate results when the inputs are dependent.

    APPENDIX III

    DIGITAL RESIDUAL ERROR VERSUS QUANTIZATION ERROR

    As explained in Section III, the rms DRE using the estimated gradient algorithm can be approximated by

    e_d ≅ LSD/(Δ · X_rms). (62)

    In Appendix II, the rms QE was found to be upper bounded by

    E_t ≤ (2N + 1)^{1/2} · LSD · X_rms. (63)

    To relate these two errors we must express Δ as a function of N. This can be accomplished by means of the sufficient condition (14) for the convergence of the estimated gradient algorithm, namely Δ ≤ α/β². Since β ≤ trace A = (2N + 1) X_rms², the quantity

    (α/β) · [(2N + 1) X_rms²]^{−1} ≤ α/β² (64)

    is a lower bound on the maximum step size, where X_rms is defined by (58).

    Combining the above we can lower bound the ratio of the DRE to the QE as follows:

    e_d/E_t ≥ [LSD/(Δ X_rms)] / [(2N + 1)^{1/2} · LSD · X_rms] = 1/[Δ (2N + 1)^{1/2} X_rms²] = (β/α)(2N + 1)^{1/2} (65)

    where the last equality follows upon substituting the step size of (64); this is a number that grows with the length of the equalizer.
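    As a numerical check of the quantities compared in this appendix, the following Monte Carlo sketch (with assumed tap values, equalizer length, and LSD, none taken from the paper) truncates a set of taps to the LSD and verifies the Appendix II bound used in (63):

```python
import numpy as np

rng = np.random.default_rng(4)
N, lsd, n_samp = 8, 0.25e-3, 200_000
c = rng.uniform(-1.0, 1.0, size=2 * N + 1)     # "infinite precision" taps
c_hat = np.floor(c / lsd) * lsd                # taps truncated to the LSD
x = rng.choice([-1.0, 1.0], size=n_samp)       # independent inputs, X_rms = 1

err = np.convolve(x, c - c_hat, mode="same")   # E_t(i) = y(i) - y_hat(i)
q_measured = np.mean(err**2)
q_bound = (2 * N + 1) * lsd**2 * 1.0           # (2N + 1) LSD^2 X_rms^2
print(f"measured Q = {q_measured:.3e}, bound = {q_bound:.3e}")
```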

    REFERENCES

    [1] D. J. Sakrison, "Stochastic approximation: A recursive method for solving regression problems," in Advances in Communication Theory, vol. 2, A. V. Balakrishnan, Ed. New York: Academic, 1966.
    [2] R. W. Lucky, J. Salz, and E. J. Weldon, Jr., Principles of Data Communication. New York: McGraw-Hill, 1968.
    [3] A. Gersho, "Adaptive equalization of highly dispersive channels for data transmission," Bell Syst. Tech. J., vol. 48, no. 1, pp. 55-70, 1969.
    [4] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.
    [5] B. Widrow, "Adaptive filters I: Fundamentals," Systems Theory Lab., Stanford Electronics Labs., Stanford Univ., Stanford, Calif., Tech. Rep. 6764-6, Dec. 1966.
    [6] R. W. Chang, "A new equalizer structure for fast start-up digital communication," Bell Syst. Tech. J., vol. 50, no. 6, pp. 1969-2014, 1971.
    [7] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Statist., vol. 22, pp. 400-407, 1951.
    [8] M. G. Taylor, "A comparison of algorithms for adapting digital filters," in Symp. Dig. 1970 Canadian Symp. Communications, 1970.
    [9] R. W. Lucky, "Techniques for adaptive equalization of digital communication systems," Bell Syst. Tech. J., vol. 45, no. 2, 1966.