Stochastic Approximation and Its Applications


Stochastic Approximation and Its Applications

by

Han-Fu Chen
Institute of Systems Science,
Academy of Mathematics and System Science,
Chinese Academy of Sciences,
Beijing, P.R. China

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW


eBook ISBN: 0-306-48166-9
Print ISBN: 1-4020-0806-6

©2003 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©2002 Kluwer Academic Publishers
Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com


Contents

Preface
Acknowledgments

1. ROBBINS-MONRO ALGORITHM
1.1 Finding Zeros of a Function
1.2 Probabilistic Method
1.3 ODE Method
1.4 Truncated RM Algorithm and TS Method
1.5 Weak Convergence Method
1.6 Notes and References

2. STOCHASTIC APPROXIMATION ALGORITHMS WITH EXPANDING TRUNCATIONS
2.1 Motivation
2.2 General Convergence Theorems by TS Method
2.3 Convergence Under State-Independent Conditions
2.4 Necessity of Noise Condition
2.5 Non-Additive Noise
2.6 Connection Between Trajectory Convergence and Property of Limit Points
2.7 Robustness of Stochastic Approximation Algorithms
2.8 Dynamic Stochastic Approximation
2.9 Notes and References

3. ASYMPTOTIC PROPERTIES OF STOCHASTIC APPROXIMATION ALGORITHMS
3.1 Convergence Rate: Nondegenerate Case
3.2 Convergence Rate: Degenerate Case
3.3 Asymptotic Normality
3.4 Asymptotic Efficiency
3.5 Notes and References

4. OPTIMIZATION BY STOCHASTIC APPROXIMATION
4.1 Kiefer-Wolfowitz Algorithm with Randomized Differences
4.2 Asymptotic Properties of KW Algorithm
4.3 Global Optimization
4.4 Asymptotic Behavior of Global Optimization Algorithm
4.5 Application to Model Reduction
4.6 Notes and References

5. APPLICATION TO SIGNAL PROCESSING
5.1 Recursive Blind Identification
5.2 Principal Component Analysis
5.3 Recursive Blind Identification by PCA
5.4 Constrained Adaptive Filtering
5.5 Adaptive Filtering by Sign Algorithms
5.6 Asynchronous Stochastic Approximation
5.7 Notes and References

6. APPLICATION TO SYSTEMS AND CONTROL
6.1 Application to Identification and Adaptive Control
6.2 Application to Adaptive Stabilization
6.3 Application to Pole Assignment for Systems with Unknown Coefficients
6.4 Application to Adaptive Regulation
6.5 Notes and References

Appendices
A.1 Probability Space
A.2 Random Variable and Distribution Function
A.3 Expectation
A.4 Convergence Theorems and Inequalities
A.5 Conditional Expectation
A.6 Independence
A.7 Ergodicity
B.1 Convergence Theorems for Martingale
B.2 Convergence Theorems for MDS I
B.3 Borel-Cantelli-Lévy Lemma
B.4 Convergence Criteria for Adapted Sequences
B.5 Convergence Theorems for MDS II
B.6 Weighted Sum of MDS

References

Index


Preface

Estimating unknown parameters based on observation data containing information about the parameters is ubiquitous in diverse areas of both theory and application. For example, in system identification the unknown system coefficients are estimated on the basis of input-output data of the control system; in adaptive control systems the adaptive control gain should be defined based on observation data in such a way that the gain asymptotically tends to the optimal one; in blind channel identification the channel coefficients are estimated using the output data obtained at the receiver; in signal processing the optimal weighting matrix is estimated on the basis of observations; in pattern classification the parameters specifying the partition hyperplane are searched by learning; and more examples may be added to this list.

All these parameter estimation problems can be transformed into a root-seeking problem for an unknown function. To see this, let y_k denote the observation at time k, i.e., the information available about the unknown parameters at time k. It can be assumed that the parameter under estimation, denoted by x^0, is a root of some unknown function f(·). This is not a restriction, because, for example, f(x) = x^0 - x may serve as such a function. Let x_k be the estimate for x^0 at time k. Then the available information at time k+1 can formally be written as

    y_{k+1} = f(x_k) + ε_{k+1},

where

    ε_{k+1} = y_{k+1} - f(x_k).

Therefore, by considering y_{k+1} as an observation on f(·) at x_k with observation error ε_{k+1}, the problem has been reduced to seeking the root of f(·) based on the observations {y_k}.

It is clear that for each problem the specification of f(·) is of crucial importance. The parameter estimation problem can be solved only if f(·) is appropriately selected so that the observation error ε_{k+1} meets the requirements figuring in the convergence theorems.

If f(·) and its gradient could be observed without error at any desired value, then numerical methods such as the Newton-Raphson method, among others, could be applied to solving the problem. However, this kind of method cannot be used here because, in addition to the obvious problem concerning the existence and availability of the gradient, the observations are corrupted by errors which may contain not only a purely random component but also the structural error caused by inadequacy of the selected f(·).

Aiming at solving the stated problem, Robbins and Monro proposed the following recursive algorithm

    x_{k+1} = x_k + a_k y_{k+1}

to approximate the sought-for root x^0, where a_k is the step size. This algorithm is now called the Robbins-Monro (RM) algorithm. Following this pioneering work on stochastic approximation, there have been a large number of applications to practical problems and research works on theoretical issues.

In the beginning, the probabilistic method was the main tool in the convergence analysis of stochastic approximation algorithms, and rather restrictive conditions were imposed on both f(·) and the noise {ε_k}. For example, it is required that the growth rate of f(x) be not faster than linear as ‖x‖ tends to infinity and that {ε_k} be a martingale difference sequence [78]. Though the linear growth rate condition is restrictive, as shown by simulation it can hardly be simply removed without violating convergence of RM algorithms.

To weaken the noise conditions guaranteeing convergence of the algorithm, the ODE (ordinary differential equation) method was introduced in [72, 73] and further developed in [65]. Since the conditions on the noise required by the ODE method may be satisfied by a large class of noises, including both random and structural errors, the ODE method has been widely applied for convergence analysis in different areas. However, in this approach one has to a priori assume that the sequence of estimates {x_k} is bounded. It is hard to say that the boundedness assumption is more desirable than a growth rate restriction on f(·).

The stochastic approximation algorithm with expanding truncations was introduced in [27], and the analysis method was then improved in [14]. In fact, this is an RM algorithm truncated at expanding bounds, and for its convergence the growth rate restriction on f(·) is not required. The convergence analysis method for the proposed algorithm is called the trajectory-subsequence (TS) method, because the analysis is carried out at trajectories where the noise condition is satisfied, and, in contrast to the ODE method, the noise condition need not be verified on the whole sequence {x_k} but only along convergent subsequences {x_{n_k}}. This makes a great difference when dealing with state-dependent noise, because a convergent subsequence is always bounded, while the boundedness of the whole sequence {x_k} is not guaranteed before its convergence is established. As shown in Chapters 4, 5, and 6, for most parameter estimation problems, after transforming them to a root-seeking problem, the structural errors are unavoidable, and they are state-dependent.

The expanding truncation technique equipped with the TS method appears to be a powerful tool for dealing with various parameter estimation problems: it has not only succeeded in essentially weakening the conditions for convergence of the general stochastic approximation algorithm, but has also made it possible for stochastic approximation to be successfully applied in diverse areas. However, there is a lack of a reference that systematically describes the theoretical part of the method and concretely shows how to apply the method to problems coming from different areas. To fill this gap is the purpose of the book.

The book summarizes results on the topic mostly distributed over journal papers and partly contained in unpublished material. The book is written in a systematic way: it starts with a general introduction to stochastic approximation, then describes the basic method used in the book, proves the general convergence theorems, and demonstrates various applications of the general theory.

In Chapter 1 the problem of stochastic approximation is stated, and the basic methods for convergence analysis, such as the probabilistic method, the ODE method, the TS method, and the weak convergence method, are introduced.

Chapter 2 presents the theoretical foundation of the algorithm with expanding truncations: the basic convergence theorems are proved by the TS method; various types of noises are discussed; the necessity of the imposed noise condition is shown; the connection between the stability of the equilibrium and the convergence of the algorithm is discussed; the robustness of stochastic approximation algorithms is considered when the commonly used conditions deviate from exact satisfaction; and moving root tracking is also investigated. The basic convergence theorems are presented in Section 2.2, and their proof is elementary and purely deterministic.

Chapter 3 describes asymptotic properties of the algorithms: convergence rates for both cases where the gradient of f(·) is degenerate or not, asymptotic normality of the estimates, and asymptotic efficiency by the averaging method.

Starting from Chapter 4 the general theory developed so far is applied to different fields. Chapter 4 deals with optimization by using stochastic approximation methods. Convergence and convergence rates of the Kiefer-Wolfowitz (KW) algorithm with expanding truncations and randomized differences are established. A global optimization method consisting of a combination of the KW algorithm with search methods is defined, and its a.s. convergence as well as its asymptotic behavior are established. Finally, the global optimization method is applied to solving the model reduction problem.

In Chapter 5 the general theory is applied to problems arising from signal processing. Applying the stochastic approximation method to blind channel identification leads to a recursive algorithm estimating the channel coefficients and continuously improving the estimates while receiving new signals, in contrast to the existing “block” algorithms. Applying the TS method to principal component analysis results in improved conditions for convergence. Stochastic approximation algorithms with expanding truncations combined with the TS method are also applied to adaptive filters with and without constraints. As a result, the conditions required for convergence have been considerably improved in comparison with the existing results. Finally, the expanding truncation technique and the TS method are applied to asynchronous stochastic approximation.

In the last chapter, the general theory is applied to problems arising from systems and control. The ideal parameter for operation is identified for stochastic systems by using the methods developed in this book. Then the obtained results are applied to the adaptive quadratic control problem. Adaptive regulation for a nonlinear nonparametric system and learning pole assignment are also solved by the stochastic approximation method.

The book is self-contained in the sense that only a few points use knowledge for which we refer to other sources, and these points can be ignored when reading the main body of the book. The basic mathematical tools used in the book are calculus and linear algebra, based on which one will have no difficulty reading the fundamental convergence Theorems 2.2.1 and 2.2.2 and their applications described in the subsequent chapters. To understand the other material, probability concepts, especially the convergence theorems for martingale difference sequences, are needed. The necessary concepts of probability theory are given in Appendix A. Some facts from probability that are used at a few specific points are listed in Appendix A without proof, because omitting the corresponding parts still leaves the rest of the book readable. However, the proof of the convergence theorems for martingales and martingale difference sequences is provided in detail in Appendix B.

The book is written for students, engineers, and researchers working in the areas of systems and control, communication and signal processing, optimization and operations research, and mathematical statistics.

HAN-FU  CHEN


Acknowledgments

The support of the National Key Project of China and the National Natural Science Foundation of China is gratefully acknowledged. The author would like to express his gratitude to Dr. Haitao Fang for his helpful suggestions and useful discussions. The author would also like to thank Ms. Jinling Chang for her skilled typing and his wife Shujun Wang for her constant support.


Chapter 1

ROBBINS-MONRO ALGORITHM

Optimization is ubiquitous in various research and application fields. Quite often an optimization problem can be reduced to finding zeros (roots) of an unknown function f(·), which can be observed, but the observation may be corrupted by errors. This is the topic of stochastic approximation (SA). The error source may be observation noise, but it may also come from structural inaccuracy of the observed function. For example, one wants to find zeros of f(·), but one actually observes functions which are different from f(·). Let us denote by y_{k+1} the observation at time k+1 and by ε_{k+1} the observation noise:

    y_{k+1} = f(x_k) + ε_{k+1}.

Here ε_{k+1} includes the additional error caused by the structural inaccuracy. It is worth noting that the structural error normally depends on x_k, and it is hard to require it to have a certain probabilistic property such as independence, stationarity, or the martingale property. We call this kind of noise state-dependent noise.

The basic recursive algorithm for finding roots of an unknown function on the basis of noisy observations is the Robbins-Monro (RM) algorithm, which is characterized by its simplicity in computation. This chapter serves as an introduction to SA, describing various methods for analyzing convergence of the RM algorithm.

In Section 1.1 the motivation of the RM algorithm is explained, and its limitation is pointed out by an example. In Section 1.2 the classical approach to analyzing convergence of the RM algorithm is presented, which is based on probabilistic assumptions on the observation noise. To relax the restrictions made on the noise, a convergence analysis method connecting convergence of the RM algorithm with the stability of an ordinary differential equation (ODE) was introduced in the nineteen-seventies. The ODE method is demonstrated in Section 1.3. In Section 1.4 the convergence analysis is carried out at a sample path by considering convergent subsequences; we therefore call this method the Trajectory-Subsequence (TS) method, and it is the basic tool used in the subsequent chapters.

In this book our main concern is the path-wise convergence of the algorithm. However, there is another approach to convergence analysis, called the weak convergence method, which is briefly introduced in Section 1.5. Notes and references are given in the last section.

This chapter introduces the main methods used in the literature for convergence analysis, but restricted to the single-root case. Extension to more general cases in various aspects is given in later chapters.

1.1. Finding Zeros of a Function

Many theoretical and practical problems in diverse areas can be reduced to finding zeros of a function. To see this it suffices to notice that solving many problems finally consists in optimizing some function L(·), i.e., finding its minimum (or maximum). If L(·) is differentiable, then the optimization problem reduces to finding the roots of f(x), where f(·) denotes the derivative of L(·).

In the case where the function or its derivatives can be observed without errors, there are many numerical methods for solving the problem. For example, the gradient method, by which the estimate x_k for the root x^0 of f(·) is recursively generated by the following algorithm:

    x_{k+1} = x_k - [f'(x_k)]^{-1} f(x_k),    (1.1.1)

where f'(x) denotes the derivative of f(x). This kind of problem belongs to the topics of optimization theory, which considers general cases where L(·) may be nonconvex, nonsmooth, and with constraints. In contrast to optimization theory, SA is devoted to finding zeros of an unknown function which can be observed, but the observations are corrupted by errors.

Since f(·) is not exactly known, and f'(·) even may not exist, (1.1.1)-like algorithms are no longer applicable. Consider the following simple example. Let f(·) be a linear function:

    f(x) = c(x - x^0),  c > 0.

If the derivative of f(·) is available, i.e., if we know c, and if f(·) can precisely be observed, then according to (1.1.1),

    x_1 = x_0 - c^{-1} · c(x_0 - x^0) = x^0.

This means that the gradient algorithm leads to the zero x^0 of f(·) in one step.

Assume now that the derivative of f(·) is unavailable, but f(·) can exactly be observed. Let us replace [f'(x_k)]^{-1} in (1.1.1) by a positive number a_k. Then we derive

    x_{k+1} = x_k - a_k f(x_k),    (1.1.2)

or

    x_{k+1} - x^0 = (1 - a_k c)(x_k - x^0).    (1.1.3)

This is a linear difference equation, which can inductively be solved, and the solution of (1.1.3) can be expressed as follows:

    x_{k+1} - x^0 = Π_{j=0}^{k} (1 - a_j c)(x_0 - x^0).    (1.1.4)

Clearly, if a_k > 0, a_k → 0 and Σ_k a_k = ∞, then Π_{j=0}^{k}(1 - a_j c) → 0, and x_k tends to the root x^0 of f(·) as k → ∞ for any initial value x_0. This is an attractive property: although the gradient of f(·) is unavailable, we can still approach the sought-for root if the inverse of the gradient is replaced by a sequence of positive real numbers decreasingly tending to zero.

Let us consider the case where f(·) is observed with errors:

    y_{k+1} = f(x_k) + ε_{k+1},

where y_{k+1} denotes the observation at time k+1, ε_{k+1} the corresponding observation error, and x_k the estimate for the root of f(·) at time k. It is natural to ask how x_k will behave if the exact value of f(x_k) in (1.1.2) is replaced by its error-corrupted observation y_{k+1}, i.e., if x_k is recursively derived according to the following algorithm:

    x_{k+1} = x_k - a_k y_{k+1}.    (1.1.5)

In our example f(x) = c(x - x^0), and (1.1.5) turns out to be

    x_{k+1} - x^0 = (1 - a_k c)(x_k - x^0) - a_k ε_{k+1}.


Similar to (1.1.3), the solution of this difference equation is

    x_{k+1} - x^0 = Π_{j=0}^{k}(1 - a_j c)(x_0 - x^0) - Σ_{j=0}^{k} a_j Π_{i=j+1}^{k}(1 - a_i c) ε_{j+1}.    (1.1.6)

Therefore, x_{k+1} converges to the root x^0 of f(·) if the noise term in (1.1.6) tends to zero as k → ∞. This means that the replacement of the inverse gradient by a sequence of numbers decreasingly tending to zero still works even in the case of error-corrupted observations, provided that the observation errors can be averaged out. It is worth noting that in lieu of (1.1.5) we have to take the positive sign before a_k, i.e., to consider

    x_{k+1} = x_k + a_k y_{k+1},    (1.1.7)

if c < 0 or, more generally, if f(·) is decreasing as its argument increases.

This simple example demonstrates the basic features of the algorithm (1.1.5) or (1.1.7): 1) the algorithm may converge to a root of f(·); 2) the limit of the algorithm, if it exists, should not depend on the initial value; 3) the convergence rate is determined by how fast the observation errors are averaged out.

From (1.1.6) it is seen that, for linear functions, the convergence rate is determined by the weighted error sum Σ_{j=0}^{k} a_j Π_{i=j+1}^{k}(1 - a_i c) ε_{j+1}. In the case where {ε_k} is a sequence of independent and identically distributed random variables with zero mean and bounded variance, and a_k = 1/k, this sum behaves like an arithmetic mean of the errors, and by the law of the iterated logarithm

    (1/k) Σ_{j=1}^{k} ε_j = O( sqrt( log log k / k ) )  a.s.

This means that the convergence rate for algorithms (1.1.5) or (1.1.7) with error-corrupted observations should not be faster than O( sqrt( log log k / k ) ).
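The averaging effect is easy to reproduce numerically. The following is a minimal sketch of our own (not from the book): it runs the recursion (1.1.7) on a hypothetical decreasing linear function with i.i.d. Gaussian noise; the function, the noise level, and the step sizes are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    x_root = 2.0                        # hypothetical root x^0
    f = lambda x: -(x - x_root)         # decreasing linear f, so (1.1.7) applies

    x = 0.0                             # arbitrary initial value x_0
    for k in range(1, 100001):
        a_k = 1.0 / k                   # decreasing gains with sum a_k = infinity
        y = f(x) + rng.normal()         # noisy observation y_{k+1}
        x = x + a_k * y                 # RM-type iteration (1.1.7)

    print(abs(x - x_root))              # error of order sqrt(log log k / k)

With a_k = 1/k the accumulated noise is essentially an arithmetic mean of the errors, which is why the law of the iterated logarithm caps the attainable rate.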

1.2. Probabilistic Method

We have just shown how to find the root of an unknown linear function based on noisy observations. We now formulate the general problem.

Let f(·) be an unknown function with unknown root x^0, i.e., f(x^0) = 0. Assume f(·) can be observed at each point, with noise:

    y_{k+1} = f(x_k) + ε_{k+1},    (1.2.1)

where y_{k+1} is the observation at time k+1, ε_{k+1} is the observation noise, and x_k is the estimate for x^0 at time k.

Stochastic approximation algorithms recursively generate x_k to approximate x^0 based on the past observations. In the pioneering work of this area, Robbins and Monro proposed the following algorithm

    x_{k+1} = x_k + a_k y_{k+1}    (1.2.2)

to estimate x^0, where the step size a_k > 0 is decreasing and satisfies the following conditions: Σ_k a_k = ∞ and Σ_k a_k² < ∞. They proved convergence of x_k to x^0 in the mean square sense. The algorithm (1.2.2) is now called the Robbins-Monro (RM) algorithm.

Throughout the book, ‖x‖ always means the Euclidean norm of a vector x, and ‖A‖ denotes the square root of the maximum eigenvalue of A^T A, where A^T means the transpose of the matrix A.

We now explain the meaning of the conditions required for the step size. The condition a_k → 0 aims at reducing the effect of the observation noises. To see this, consider the case where x_k is close to x^0 and f(x_k) is close to zero. By (1.2.2), x_{k+1} - x_k = a_k f(x_k) + a_k ε_{k+1} ≈ a_k ε_{k+1}. Even in the Gaussian noise case, ‖a_k ε_{k+1}‖ may be large if a_k has a positive lower bound. Therefore, in order to have the desired consistency, i.e., x_k → x^0, it is necessary to use decreasing gains such that a_k → 0.

On the other hand, consistency can neither be achieved if a_k decreases too fast as k → ∞. To see this, let Σ_k a_k < ∞. Then, even in the noise-free case, i.e., ε_k ≡ 0, from (1.2.2) we have

    ‖x_{k+1} - x_0‖ ≤ Σ_{j=0}^{k} a_j ‖f(x_j)‖ ≤ c Σ_{j=0}^{∞} a_j < ∞

if f(·) is a bounded function. Therefore, in this case

    ‖x_k - x^0‖ ≥ ‖x_0 - x^0‖ - c Σ_{j=0}^{∞} a_j > 0

if the initial value x_0 is far from the true root, and hence x_k will never converge to x^0.
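The two step-size requirements can be illustrated with a small experiment of our own (the linear function and all constants are assumptions, not the book's): summable gains stall short of a distant root even without noise, a constant gain never settles under noise, while a_k = 1/(k+1) avoids both failures.

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: -(x - 10.0)              # hypothetical linear f with root 10

    def run(step, noisy, n=100000):
        x = 0.0
        for k in range(1, n + 1):
            y = f(x) + (rng.normal() if noisy else 0.0)
            x += step(k) * y
        return x

    print(run(lambda k: 1 / (k + 1)**2, noisy=False))  # ~5.0: sum a_k < inf, stalls
    print(run(lambda k: 0.1,            noisy=True))   # ~10 but keeps oscillating
    print(run(lambda k: 1 / (k + 1),    noisy=True))   # ~10.0: both conditions hold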


The classical approach to convergence analysis of SA algorithms is based on probabilistic analysis of the trajectories. We now present a typical convergence theorem obtained by this approach. The related concepts and results from probability theory are given in Appendices A and B.

In fact, we will use the martingale convergence theorem to prove the path-wise convergence of x_k, i.e., to show that x_k → x^0 a.s. For this, the following set of conditions will be used.

A1.2.1 The step size a_k is such that a_k > 0, Σ_k a_k = ∞, and Σ_k a_k² < ∞.

A1.2.2 There exists a twice continuously differentiable Lyapunov function v(·) satisfying the following conditions:
i) its second derivative is bounded;
ii) v(x^0) = 0, v(x) > 0 for x ≠ x^0, and v(x) → ∞ as ‖x‖ → ∞;
iii) for any ε > 0 there is an α_ε > 0 such that

    sup_{ε ≤ ‖x - x^0‖ ≤ 1/ε} v_x^T(x) f(x) ≤ -α_ε,

where v_x(·) denotes the gradient of v(·).

A1.2.3 The observation noise {ε_k} is a martingale difference sequence with

    E[ε_{k+1} | F_k] = 0,    (1.2.3)

where {F_k} is a family of nondecreasing σ-algebras such that x_k and ε_k are F_k-measurable.

A1.2.4 The function f(·) and the conditional second moment of the observation noise have the following upper bound:

    ‖f(x)‖² + E[‖ε_{k+1}‖² | F_k] ≤ c(1 + v(x)),    (1.2.4)

where c is a positive constant.

Prior to formulating the theorem we need some auxiliary results.

Let {z_k, F_k} be an adapted sequence, i.e., z_k is F_k-measurable for any k. Define the first exit time of {z_k} from a Borel set B:

    τ = min{k : z_k ∉ B}.

It is clear that {τ = k} ∈ F_k, i.e., τ is a Markov time.

Lemma 1.2.1 Assume τ is a Markov time and {z_k, F_k} is a nonnegative supermartingale, i.e., z_k ≥ 0 and

    E[z_{k+1} | F_k] ≤ z_k a.s.

Then {z_{min(k,τ)}, F_k} is also a nonnegative supermartingale, where min(k, τ) denotes the smaller of k and τ.

The proof is given in Appendix B, Lemma B-2-1.

The following lemma concerning convergence of an adapted sequence will be used in the proof of convergence of the RM algorithm, but the lemma is of interest by itself.

Lemma 1.2.2 Let {z_k, F_k} and {α_k, F_k} be two nonnegative adapted sequences.

i) If E[z_{k+1} | F_k] ≤ z_k + α_k and E Σ_{k=0}^{∞} α_k < ∞, then z_k converges a.s. to a finite limit.

ii) If E[z_{k+1} | F_k] ≤ z_k - α_k, then Σ_{k=0}^{∞} α_k < ∞ a.s.

Proof. For proving i) set

    w_k = z_k + E[ Σ_{j=k}^{∞} α_j | F_k ].    (1.2.5)

Then we have

    E[w_{k+1} | F_k] ≤ z_k + α_k + E[ Σ_{j=k+1}^{∞} α_j | F_k ] = w_k.

By the convergence theorem for nonnegative supermartingales, w_k converges a.s. as k → ∞.

Since E Σ_{j=0}^{∞} α_j < ∞, by the convergence theorem for martingales it follows that E[Σ_{j=0}^{∞} α_j | F_k] converges a.s. as k → ∞. Since α_j is F_k-measurable for j < k and Σ_{j=0}^{k-1} α_j is nondecreasing in k, we have that both E[Σ_{j=0}^{∞} α_j | F_k] and Σ_{j=0}^{k-1} α_j converge a.s. as k → ∞; we conclude that E[Σ_{j=k}^{∞} α_j | F_k] is also convergent a.s. as k → ∞. Consequently, from (1.2.5) it follows that z_k converges a.s. as k → ∞.

For proving ii) set

    w_k = z_k + Σ_{j=0}^{k-1} α_j.

Taking the conditional expectation leads to

    E[w_{k+1} | F_k] ≤ z_k - α_k + Σ_{j=0}^{k} α_j = w_k.

Again, by the convergence theorem for nonnegative supermartingales, w_k converges a.s. as k → ∞. Since, by the same theorem, z_k also converges a.s. as k → ∞, it directly follows that Σ_{k=0}^{∞} α_k < ∞ a.s.

Theorem 1.2.1 Assume Conditions A1.2.1–A1.2.4 hold. Then for any initial value, x_k given by the RM algorithm (1.2.2) converges to the root x^0 of f(·) a.s. as k → ∞.

Proof. Let v(·) be the Lyapunov function given in A1.2.2. Expanding v(x_{k+1}) to the Taylor series, we obtain

    v(x_{k+1}) = v(x_k) + a_k v_x^T(x_k) y_{k+1} + (a_k²/2) y_{k+1}^T v_{xx}(ξ_k) y_{k+1},    (1.2.6)

where v_x and v_{xx} denote the gradient and Hessian of v(·), respectively, ξ_k is a vector with components located in-between the corresponding components of x_k and x_{k+1}, and c₀ denotes the constant such that ‖v_{xx}(x)‖ ≤ c₀ (by A1.2.2 i)).

Noticing that x_k is F_k-measurable and taking the conditional expectation in (1.2.6), by (1.2.3) and (1.2.4) we derive

    E[v(x_{k+1}) | F_k] ≤ (1 + c₁ a_k²) v(x_k) + a_k v_x^T(x_k) f(x_k) + c₁ a_k²,    (1.2.7)

where c₁ is a constant. Since Σ_k a_k² < ∞ by A1.2.1, we have

    Π_{k=0}^{∞} (1 + c₁ a_k²) < ∞.    (1.2.8)

Denoting

    z_k = P_k v(x_k) + c₁ Σ_{j=k}^{∞} a_j² P_{j+1},  P_k = Π_{j=k}^{∞}(1 + c₁ a_j²),

and noticing that v_x^T(x_k) f(x_k) ≤ 0 by A1.2.2 iii), from (1.2.7) and (1.2.8) it follows that

    E[z_{k+1} | F_k] ≤ z_k + a_k P_{k+1} v_x^T(x_k) f(x_k) ≤ z_k.    (1.2.9)

Therefore, {z_k, F_k} is a nonnegative supermartingale, and z_k converges a.s. by the convergence theorem for nonnegative supermartingales. Since P_k → 1 and the tail sum in z_k tends to zero, v(x_k) also converges a.s. In particular, since v(x) → ∞ as ‖x‖ → ∞, the sequence {x_k} is bounded a.s.

For any δ > 0 denote

    B_δ = {x : ‖x - x^0‖ < δ},

and for any fixed m let τ be the first exit time of {x_k, k ≥ m} from the complement of B_δ, i.e., the first entrance time into B_δ after m. Since a_k P_{k+1} v_x^T(x_k) f(x_k) is nonpositive, from (1.2.9) it follows that

    E Σ_{k=m}^{τ-1} a_k ( -v_x^T(x_k) f(x_k) ) < ∞

for any m. On the set where τ = ∞, the bounded trajectory stays outside B_δ and, by A1.2.2 iii), -v_x^T(x_k) f(x_k) ≥ α_δ > 0 for all k ≥ m. By Lemma 1.2.2 ii), the above inequality then implies

    Σ_{k=m}^{τ-1} a_k < ∞ a.s.,

which means that τ must be finite a.s.; otherwise we would have Σ_k a_k < ∞, a contradiction to A1.2.1. Therefore, after any m, with the possible exception of a set of probability zero, the trajectory of {x_k} must enter B_δ.

Consequently, there is a subsequence {x_{n_k}} such that ‖x_{n_k} - x^0‖ < δ_k, where δ_k → 0 as k → ∞. By the arbitrariness of δ we then conclude that there is a subsequence, denoted still by {x_{n_k}}, such that x_{n_k} → x^0. Hence v(x_{n_k}) → 0. However, we have shown that v(x_k) converges a.s. Therefore, v(x_k) → 0 a.s. By A1.2.2 ii) we then conclude that x_k → x^0 a.s.

Remark 1.2.1 If Condition A1.2.2 iii) is changed to

    inf_{ε ≤ ‖x - x^0‖ ≤ 1/ε} v_x^T(x) f(x) ≥ α_ε > 0,

then the algorithm (1.2.2) should accordingly be changed to

    x_{k+1} = x_k - a_k y_{k+1}.

We now explain the conditions required in Theorem 1.2.1. As noted in Section 1.1, the step size should satisfy a_k → 0 and Σ_k a_k = ∞, but the condition Σ_k a_k² < ∞ may be weakened.

Condition A1.2.2 requires the existence of a Lyapunov function v(·). This kind of condition normally has to be imposed for convergence of the algorithms, but the analytic properties required of v(·) may be weakened. The noise condition A1.2.3 is rather restrictive. As will be shown in the subsequent chapters, ε_k may be composed not only of random noise but also of structural errors, which hardly have nice probabilistic properties such as being a martingale difference sequence, stationarity, or bounded variances, etc. As in many cases one can take ‖x - x^0‖² to serve as v(x), it then follows from (1.2.4) that the growth rate of ‖f(x)‖ as ‖x‖ → ∞ should not be faster than linear. This is a major restriction on applying Theorem 1.2.1. However, if we a priori assume that {x_k} generated by the algorithm (1.2.2) is bounded, then {f(x_k)} is bounded provided f(·) is locally bounded, and then the linear growth is no restriction for {x_k, k = 1, 2, ...}.

1.3. ODE Method

As mentioned in Section 1.2, the classical probabilistic approach to analyzing SA algorithms requires rather restrictive conditions on the observation noise. In the nineteen-seventies the so-called ordinary differential equation (ODE) method was proposed for analyzing convergence of SA algorithms. We explain the idea of the method. The estimates {x_k} generated by the RM algorithm are interpolated to a continuous function, with the interpolating lengths equal to the step sizes used in the algorithm. The tail part of the interpolating function is shown to satisfy an ordinary differential equation ẋ = f(x). The sought-for root x^0 is the equilibrium of the ODE. By the stability of this equation, or by assuming the existence of a Lyapunov function, it is proved that x(t) → x^0 as t → ∞. From this it can be deduced that x_k → x^0.

For demonstrating the ODE method we need two facts from analysis, which are formulated below as propositions.

Proposition 1.3.1 (Arzelà-Ascoli) Let {f_k(·)} be a family of equi-continuous and uniformly bounded functions, where by equi-continuity we mean that for any t and any ε > 0 there exists a δ > 0 such that

    |f_k(t) - f_k(s)| < ε whenever |t - s| < δ, for all k.

Then there are a continuous function f(·) and a subsequence of functions {f_{k_j}(·)} which converge to f(·) uniformly on any finite interval, i.e.,

    f_{k_j}(t) → f(t) as j → ∞,

uniformly with respect to t belonging to any finite interval.

Proposition 1.3.2 For the following ODE

    ẋ = f(x)    (1.3.1)

with f(x^0) = 0, if there exists a continuously differentiable function v(·) such that v(x^0) = 0, v(x) > 0 for x ≠ x^0, v(x) → ∞ as ‖x‖ → ∞, and

    v_x^T(x) f(x) < 0 for x ≠ x^0,

then the solution to (1.3.1), starting from any initial value, tends to x^0 as t → ∞, i.e., x^0 is the globally asymptotically stable solution to (1.3.1).

Let us introduce the following conditions.

A1.3.1 a_k > 0, a_k → 0, and Σ_k a_k = ∞.

A1.3.2 There exists a twice continuously differentiable Lyapunov function v(·) such that v(x^0) = 0, v(x) > 0 for x ≠ x^0, v(x) → ∞ as ‖x‖ → ∞, and

    v_x^T(x) f(x) < 0 whenever x ≠ x^0.

In order to describe the conditions on the noise, we introduce an integer-valued function m(k, T) for any T > 0 and any integer k:

    m(k, T) = max{ m : Σ_{j=k}^{m} a_j ≤ T }.    (1.3.2)

Noticing that a_k tends to zero, for any fixed T > 0, m(k, T) diverges to infinity as k → ∞. In fact, m(k, T) counts the number of iterations starting from time k for which the sum of step sizes does not exceed T. The integer-valued function m(k, T) will be used throughout the book.
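In code, m(k, T) is a direct scan over the step sizes; the helper below is a sketch of our own matching the definition (1.3.2) as reconstructed above.

    def m(k, T, a):
        """Largest m >= k such that a[k] + ... + a[m] <= T."""
        s, j = 0.0, k
        while j < len(a) and s + a[j] <= T:
            s += a[j]
            j += 1
        return j - 1

    a = [1.0 / i for i in range(1, 100001)]   # a_k = 1/k, stored 0-indexed
    print(m(100, 0.5, a) - 100)               # window length grows with k ...
    print(m(10000, 0.5, a) - 10000)           # ... because the a_j shrink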

The following conditions will be used:

A1.3.3 The observation noise {ε_k} satisfies

    lim_{T→0} limsup_{k→∞} (1/T) ‖ Σ_{j=k}^{m(k,T)} a_j ε_{j+1} ‖ = 0.    (1.3.3)

A1.3.4 f(·) is continuous.

Theorem 1.3.1 Assume that A1.3.1, A1.3.2, and A1.3.4 hold. If for a fixed sample path ω A1.3.3 holds and {x_k} generated by the RM algorithm (1.2.2) is bounded, then for this ω, x_k tends to x^0 as k → ∞.

Proof. Set

    t_0 = 0,  t_k = Σ_{j=0}^{k-1} a_j.

Define the linear interpolating function x(t):

    x(t) = x_k + (x_{k+1} - x_k)(t - t_k)/a_k,  t ∈ [t_k, t_{k+1}).    (1.3.4)

It is clear that x(·) is continuous and x(t_k) = x_k.

Further, define the weighted noise sums Σ_{j=0}^{k-1} a_j ε_{j+1} and the corresponding linear interpolating function ε(t), defined by (1.3.4) with x_k replaced by these sums. Since we will deal with the tail part of x(·), we define x^k(·) by shifting time in x(·):

    x^k(t) = x(t_k + t),  t ≥ 0.

Thus we derive a family of continuous functions {x^k(·)}.

Let us also define the constant interpolating function

    x̄(t) = x_k,  t ∈ [t_k, t_{k+1}).

Then summing up both sides of (1.2.2) yields

    x(t_k + t) = x_k + ∫_0^t f(x̄(t_k + s)) ds + ε^k(t),

and hence

    x^k(t) = x^k(0) + ∫_0^t f(x̄(t_k + s)) ds + ε^k(t),

where ε^k(t) = ε(t_k + t) - ε(t_k).

By the boundedness assumption on {x_k}, the family {x^k(·)} is uniformly bounded. We now prove that it is equi-continuous. By definition, for t ∈ [t_m - t_k, t_{m+1} - t_k),

    x^k(t) - x^k(0) = Σ_{j=k}^{m-1} a_j f(x_j) + Σ_{j=k}^{m-1} a_j ε_{j+1} + r_k(t),

where the remainder r_k(t), coming from the interpolation within one step, tends to zero as k → ∞ since a_k → 0. From this and A1.3.3 it follows that the noise contribution ε^k(t) over [0, T] tends to zero as k → ∞ and T → 0. For any 0 ≤ s < t we then have

    ‖x^k(t) - x^k(s)‖ ≤ c(t - s) + o(1),    (1.3.11)

where c is a bound for ‖f(·)‖ on the bounded range of {x_k}. By the boundedness of {f(x_j)} and (1.3.11), we see that {x^k(·)} is equi-continuous.

By Proposition 1.3.1, we can select from {x^k(·)} a convergent subsequence {x^{k_j}(·)} which tends to a continuous function x̄⁰(·) uniformly on any finite interval. Consider the following difference:

    x^{k_j}(t) - x^{k_j}(0) - ∫_0^t f(x̄⁰(s)) ds
        = ∫_0^t [ f(x̄(t_{k_j} + s)) - f(x̄⁰(s)) ] ds + ε^{k_j}(t),    (1.3.12)

which is derived by using (1.3.11). It is clear that x̄(t_{k_j} + s) and x^{k_j}(s) have the same limit for each s, because they differ by at most one interpolation step. Letting j → ∞ in (1.3.12), by the continuity of f(·) and the uniform convergence of x^{k_j}(·) to x̄⁰(·), we conclude that the right-hand side of (1.3.12) converges to zero, and hence

    x̄⁰(t) = x̄⁰(0) + ∫_0^t f(x̄⁰(s)) ds,    (1.3.14)

i.e., the limit x̄⁰(·) satisfies the ODE (1.3.1). By A1.3.2 and Proposition 1.3.2 we see that x̄⁰(t) → x^0 as t → ∞.

We now prove that x_k → x^0. Assume the converse: there is a subsequence {x_{n_k}} with ‖x_{n_k} - x^0‖ ≥ δ for some δ > 0 and all k. Consider the shifted interpolating functions {x^{n_k}(·)}. It is clear that this family of functions is uniformly bounded and equi-continuous. Hence we can select a convergent subsequence, denoted still by {x^{n_k}(·)}. The limit satisfies the ODE (1.3.14) and, by the uniqueness of the solution to (1.3.14), coincides with the solution starting from the limit of {x_{n_k}}. By the uniform convergence, x^{n_k}(s) converges for each fixed s to this solution, which, as shown above, tends to x^0 as s → ∞. Then we obtain a contradictory inequality: for s and k large enough, x^{n_k}(s) must be close to x^0, while by (1.3.11) it cannot move far from x^{n_k}(0), whose distance from x^0 is at least δ. This completes the proof of Theorem 1.3.1.
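The interpolation construction used in the proof can be visualized numerically; the sketch below is our own, with a hypothetical f(x) = -x so that the limiting ODE ẋ = -x can be solved in closed form.

    import numpy as np

    rng = np.random.default_rng(2)
    f = lambda x: -x                       # the limiting ODE is xdot = -x

    a = 1.0 / np.arange(1, 20001)          # step sizes a_k = 1/k
    x = np.empty(a.size + 1)
    x[0] = 5.0
    for k in range(a.size):                # RM iterates with bounded noise
        x[k + 1] = x[k] + a[k] * (f(x[k]) + rng.uniform(-1, 1))

    t = np.concatenate(([0.0], np.cumsum(a)))   # interpolation times t_k

    m0 = 10000                                  # compare the tail with the ODE
    ode_tail = x[m0] * np.exp(-(t[m0:] - t[m0]))
    print(np.max(np.abs(x[m0:] - ode_tail)))    # small: the tail tracks the ODE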

We now compare the conditions used in Theorem 1.3.1 with those in Theorem 1.2.1.

Conditions A1.3.1 and A1.3.2 are slightly weaker than A1.2.1 and A1.2.2, but they are almost the same. The noise condition A1.3.3 is significantly weaker than those used in Theorem 1.2.1, because under the conditions of Theorem 1.2.1 we have

    Σ_k a_k ε_{k+1} < ∞ a.s.,

which certainly implies A1.3.3.

As a matter of fact, Condition A1.3.3 may be satisfied by sequences much more general than martingale difference sequences.

Example 1.3.1 Assume ε_k → 0 as k → ∞, but {ε_k} may be any random or deterministic sequence. Then {ε_k} satisfies A1.3.3. This is because

    (1/T) ‖ Σ_{j=k}^{m(k,T)} a_j ε_{j+1} ‖ ≤ (1/T) Σ_{j=k}^{m(k,T)} a_j · sup_{j≥k} ‖ε_{j+1}‖ ≤ sup_{j≥k} ‖ε_{j+1}‖ → 0 as k → ∞.

Example 1.3.2 Let {ε_k} be an MA process, i.e.,

    ε_{k+1} = w_{k+1} + C w_k,

where {w_k} is a martingale difference sequence with sup_k E‖w_k‖² < ∞. Then under condition A1.2.1, Σ_k a_k w_{k+1} < ∞ a.s., and hence

    Σ_{j=k}^{m(k,T)} a_j ε_{j+1} → 0 as k → ∞

a.s. Consequently, A1.3.3 is satisfied for almost all sample paths.
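The averaged-noise quantity in (1.3.3) can be checked numerically for the MA noise of Example 1.3.2; the snippet below is our own, with assumed coefficients.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 200000
    w = rng.normal(size=n + 1)
    eps = w[1:] + 0.8 * w[:-1]            # MA(1) noise eps_{k+1} = w_{k+1} + C w_k
    a = 1.0 / np.arange(1, n + 1)         # step sizes a_k = 1/k

    def averaged(k, T):
        """(1/T) |sum_{j=k}^{m(k,T)} a_j eps_{j+1}|, with m(k,T) from (1.3.2)."""
        s = tot = 0.0
        j = k
        while j < n and s + a[j] <= T:
            s += a[j]
            tot += a[j] * eps[j]
            j += 1
        return abs(tot) / T

    for k in (100, 10000, 100000):
        print(k, averaged(k, 0.5))        # shrinks as k grows, as (1.3.3) requires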

Condition A1.3.4 requires the continuity of f(·), which is not required in A1.2.4. At first glance, unlike A1.2.4, Condition A1.3.4 does not impose any growth rate condition on f(·); but Theorem 1.3.1 a priori requires the boundedness of {x_k}, which is an implicit requirement on the growth rate of f(·).

The ODE method is widely used in convergence analysis for algorithms arising from various application areas, because it requires no probabilistic property of the noise, which would be difficult to verify. Concerning the weakness of the ODE method, we have mentioned that it a priori assumes that {x_k} is bounded. This condition is difficult to verify in the general case. The other point that should be mentioned is that Condition A1.3.3 is also difficult to verify in the case where ε_k depends on the past estimates, which often occurs when ε_k contains structural errors of f(·). This is because A1.3.3 may be verifiable if {x_k} is convergent, but the noise may behave badly depending upon the behavior of {x_k}. So we are somehow in a cyclic situation: with A1.3.3 we can prove convergence of {x_k}; on the other hand, with convergent {x_k} we can verify A1.3.3. This difficulty will be overcome by using the Trajectory-Subsequence (TS) method, to be introduced in the next section and used in subsequent chapters.

1.4. Truncated RM Algorithm and TS Method

In Section 1.2 we considered the root-seeking problem where the sought-for root x^0 may be any point. If a region to which x^0 belongs is known, then we may use a truncated algorithm, and the growth rate restriction on f(·) can be removed.

Let us assume that ‖x^0‖ < b with b known. In lieu of (1.2.2) we now consider the following truncated RM algorithm:

    x_{k+1} = (x_k + a_k y_{k+1}) I[‖x_k + a_k y_{k+1}‖ ≤ b₀] + x* I[‖x_k + a_k y_{k+1}‖ > b₀],    (1.4.1)

where the observation y_{k+1} is given by (1.2.1), x* is a given point with ‖x*‖ < b, and I[·] denotes the indicator function. The constant b₀ used in (1.4.1) will be specified later on.

The algorithm (1.4.1) coincides with the RM algorithm when it evolves in the sphere {x : ‖x‖ ≤ b₀}, but if x_k + a_k y_{k+1} exits the sphere, then the algorithm is pulled back to the fixed point x*.
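A minimal sketch of our own of the fixed-bound recursion (1.4.1), with the symbols b₀ and x* as reconstructed above and a hypothetical cubic f whose root lies inside the truncation region:

    import numpy as np

    rng = np.random.default_rng(3)
    f = lambda x: -x**3            # hypothetical f with root 0; superlinear growth
    b0, x_star = 5.0, 0.0          # truncation bound b0 and pull-back point x*

    x = 4.0
    for k in range(1, 50001):
        cand = x + (1.0 / k) * (f(x) + rng.normal())  # tentative RM step
        x = cand if abs(cand) <= b0 else x_star       # truncation rule (1.4.1)

    print(x)   # near the root 0, although f violates the linear growth bound

Note that no linear-growth condition on f(·) was used: the truncation keeps the iterates inside a region where f(·) is bounded.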

We will use the following set of conditions:

A1.4.1 The step size satisfies a_k > 0, a_k → 0, and Σ_k a_k = ∞.

A1.4.2 There exists a continuously differentiable Lyapunov function v(·) (not necessarily nonnegative) such that

    v_x^T(x) f(x) < 0 for x ≠ x^0,

and for the b used in (1.4.1) there is a b₀ > b such that

    v(x*) < inf_{‖x‖ = b₀} v(x).

A1.4.3 For any convergent subsequence {x_{n_k}} of {x_k},

    lim_{T→0} limsup_{k→∞} (1/T) ‖ Σ_{j=n_k}^{m(n_k,T)} a_j ε_{j+1} ‖ = 0,    (1.4.2)

where m(k, T) is given by (1.3.2).

A1.4.4 f(·) is measurable and locally bounded.

We first compare these conditions with A1.3.1–A1.3.4. We note that A1.4.1 is the same as A1.3.1, while A1.4.2 is weaker than A1.2.2. The difference between A1.3.3 and A1.4.3 consists in that condition (1.4.2) is required to be verified only along convergent subsequences, while (1.3.3) in A1.3.3 has to be verified along the whole sequence {x_k}.

[...]

if T is small enough and k is large enough. This, combined with (1.4.5), implies that the norm of x_{k+1} cannot reach the truncation bound b₀. In other words, the algorithm (1.4.1) becomes an untruncated RM algorithm (1.4.7) for small T and large k.

By the mean value theorem there exists a vector ξ_k with components located in-between the corresponding components of x_k and x_{k+1} such that (1.4.8) holds.

Notice that by (1.4.2) the left-hand side of (1.4.6) is of order o(T) for all sufficiently large k, since the convergent subsequence is bounded. From this it follows that i) for T small enough and k large enough the iterates remain in a bounded region, and ii) the last term in (1.4.8) is of order o(T), since a_k → 0 as k → ∞. From (1.4.7) and (1.4.8) it then follows (1.4.9).

Since the interval under consideration does not contain the origin, there is a δ > 0 such that


for sufficiently small T and all large enough k. Then by A1.4.2 there is an α > 0 such that the corresponding decrease of v(·) holds for all large k and small enough T. As mentioned above, from (1.4.9) we have (1.4.10) for sufficiently large k and small enough T, where o(1) denotes a magnitude tending to zero as k → ∞.

Taking (1.4.4) into account, from (1.4.10) we find that v(x_k) strictly decreases along the stretch for large k; however, we have shown the contrary. The obtained contradiction shows that the number of truncations in (1.4.1) can only be finite.

We have proved that, starting from some large k, the algorithm (1.4.1) develops as an RM algorithm and {x_k} is bounded. We are now in a position to show that {v(x_k)} converges. Assume this were not true. Then there would exist an interval not containing the origin which {v(x_k)} would cross infinitely many times. Again, without loss of generality, by the same argument as that used above we arrive at (1.4.9) and (1.4.10) for large k and obtain a contradiction. Thus, v(x_k) tends to a finite limit as k → ∞.

It remains to show that x_k → x^0. Assume the converse: there is a subsequence staying away from x^0. Then there is a δ > 0 such that ‖x_{n_k} - x^0‖ ≥ δ for all sufficiently large k. We still have (1.4.8), (1.4.9), and (1.4.10) for some

[...]

1.5. Weak Convergence Method

If for any bounded continuous function g(·),

    E g(x_k) → E g(x) as k → ∞,

then we say that x_k weakly converges to x.

If for any ε > 0 there is a compact measurable set K_ε such that

    P{x_k ∈ K_ε} > 1 - ε for all k,

then {x_k} is called tight. Further, {x_k} is called relatively compact if each subsequence of {x_k} contains a weakly convergent subsequence.

In the weak convergence analysis an important role is played by Prohorov's Theorem, which says that on a complete and separable metric space, tightness is equivalent to relative compactness. The weak convergence method establishes the weak limit of the interpolating functions of the estimates and the convergence of the estimates to x^0 in probability.

Theorem 1.5.1 Assume the following conditions:

A1.5.1 {x_k} is a.s. bounded;

A1.5.2 f(·) is continuous;

A1.5.3 {ε_k, F_k} is adapted, {ε_k} is uniformly integrable in the sense that

    sup_k E[ ‖ε_k‖ I[‖ε_k‖ ≥ N] ] → 0 as N → ∞,

and E[ε_{k+1} | F_k] → 0 as k → ∞.

Then the family of shifted interpolating functions of the estimates is tight, and as k → ∞ it weakly converges to x(·), a solution to

    ẋ = f(x).    (1.5.3)

Further, if x^0 is asymptotically stable for (1.5.3), then along any sequence of times tending to infinity the distance between the interpolated process and x^0 converges to zero in probability.

Instead of a proof, we only outline its basic idea. First, it is shown that we can extract a subsequence of the interpolating functions weakly converging to x(·).

For notational simplicity, denote the subsequence still by {x^k(·)}. By the Skorohod representation, we may assume that the convergence takes place with probability one. For this we need only, if necessary, to change the probability space and take new processes on the new space such that they have the same distributions as the original ones.

Then, it is proved that

    x(t) - x(0) - ∫_0^t f(x(s)) ds

is a martingale. Since, as can be shown, it is Lipschitz continuous in t, it must be a constant, and it follows that x(·) satisfies the ODE (1.5.3).

Since the family is relatively compact and the limit does not depend on the extracted subsequence, the whole family weakly converges to x(·) as k → ∞, and x(·) satisfies (1.5.3). The last assertion follows by the asymptotic stability of x^0.

Remark 1.5.1 The boundedness assumption on {x_k} may be removed. For this a smooth truncation function is introduced, and a truncated algorithm is considered in lieu of (1.5.1). The truncated estimates are then interpolated to a piecewise-constant function. It is shown that the family of truncated interpolations is tight and weakly convergent as k → ∞, and the limit satisfies the correspondingly truncated ODE. Finally, by a limsup argument showing that the truncation is asymptotically inactive, it is proved that the family of interpolating functions itself is tight and weakly converges to a process satisfying (1.5.3).
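Convergence in probability, which is what the weak convergence approach ultimately delivers here, can be illustrated by a crude Monte Carlo sketch of our own (the linear f and all constants are assumed):

    import numpy as np

    rng = np.random.default_rng(6)
    f = lambda x: -(x - 1.0)              # hypothetical f with root 1

    def final_iterate(n):
        x = 0.0
        for k in range(1, n + 1):
            x += (f(x) + rng.normal()) / k
        return x

    for n in (100, 1000, 10000):          # P(|x_n - 1| > 0.1) shrinks with n
        xs = np.array([final_iterate(n) for _ in range(200)])
        print(n, np.mean(np.abs(xs - 1.0) > 0.1))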

1.6. Notes and References

The stochastic approximation algorithm was first proposed by Robbins and Monro in [82], where the mean square convergence of the algorithm was established under the independence assumption on the observation noise. Later, the noise was extended from an independent sequence to martingale difference sequences (e.g., [7, 40, 53]).

The probabilistic approach to convergence analysis is well summarized in [78].

The ODE approach was proposed in [65, 72], and then it was widely used [4, 85]. For a detailed presentation of the ODE method we refer to [65, 68]. The proof of the Arzelà-Ascoli Theorem can be found in ([37], p. 266).

Section 1.4 is an introduction to the method described in detail in the coming chapters. For stability and Lyapunov functions we refer to [69]. The weak convergence method was developed by Kushner [64, 68]. The Skorohod topology and Prohorov's theorem can be found in [6, 41].

For the probability concepts briefly presented in Appendix A, we refer to [30, 32, 70, 76, 84]. The proof of the convergence theorems for martingale difference sequences, which are frequently used throughout the book, is given in Appendix B.


Chapter 2

STOCHASTIC APPROXIMATION ALGORITHMS WITH EXPANDING TRUNCATIONS

In Chapter 1 the RM algorithm, the basic algorithm used in stochastic approximation (SA), was introduced, and four different methods for analyzing its convergence were presented. However, the conditions imposed for convergence are rather strong.

Comparing the theorems derived by the various methods in Chapter 1, we find that the TS method introduced in Section 1.4 requires the weakest condition on the noise. The trouble is that the sought-for root has to be inside the truncation region. This motivates us to consider SA algorithms with expanding truncations, with the purpose that the truncation region will finally cover the sought-for root, whose location is unknown. This is described in Section 2.1.

General convergence theorems for the SA algorithm with expanding truncations are given in Section 2.2. The key point of the proof is to show that the number of truncations is finite. If this is done, then the estimate sequence is bounded and the algorithm becomes the conventional RM algorithm in a finite number of steps. This is realized by using the TS method. It is worth noting that the fundamental convergence theorems given in this section are proved by a completely elementary method, which is deterministic and requires only a knowledge of calculus. In Section 2.3 state-independent conditions on the noise are given to guarantee convergence of the algorithm when the noise itself is state-dependent. In Section 2.4 conditions on the noise are discussed; it appears that the noise condition in the general convergence theorems is in a certain sense necessary. In Section 2.5 the convergence theorem is given for the case where the observation noise is non-additive.

In the multi-root case, up to Section 2.6 we have only established that the distance between the estimate and the root set tends to zero. But by no means does this imply convergence of the estimate itself. This is briefly discussed in Section 2.4, and is considered in Section 2.6 in connection with properties of the equilibria; conditions are given to guarantee the trajectory convergence, and it is also considered whether the limit of the estimate is a stable or an unstable equilibrium. In Section 2.7 it is shown that a small distortion of the conditions may cause only a small estimation error in the limit, while Section 2.8 considers the case where the sought-for root is moving during the estimation process. Convergence theorems are derived with the help of the general convergence theorem given in Section 2.2. Notes and references are given in the last section.

2.1. Motivation

In Chapter 1 we have presented four types of convergence theorems using different analysis methods for SA algorithms. However, none of these theorems is completely satisfactory in applications. Theorem 1.2.1 is proved by using the classical probabilistic method, which requires restrictive conditions on the noise and on f(·). As mentioned before, the noise may contain a component caused by the structural inaccuracy of the function, and it is hard to assume this kind of noise to be mutually independent or to be a martingale difference sequence, etc. The growth rate restriction imposed on the function is not only severe, but in a certain sense also unavoidable. To see this, let us consider the following example:

    x_{k+1} = x_k - a_k x_k³,  a_k = 1/(k+1),

i.e., the RM algorithm (1.2.2) applied to f(x) = -x³ in the noise-free case. It is clear that conditions A1.2.1, A1.2.2, and A1.2.3 are satisfied, where for A1.2.2 one may take v(x) = x². The only condition that is not satisfied is (1.2.4), since ‖f(x)‖² = x⁶, while the right-hand side of (1.2.4) is a second-order polynomial. A simple calculation shows that, for an initial value of large magnitude, say x_0 = 2, x_k given by the RM algorithm rapidly diverges: |x_{k+1}| = |x_k| |1 - a_k x_k²| grows super-exponentially.

From this one might conclude that the growth rate restriction would be necessary. However, if we take the initial value with |x_0| ≤ 1, then x_k given by the RM algorithm converges to the root 0. To reduce the initial value is, in a certain sense, equivalent to using the step sizes not from a_0 but from a_{k_0} for some k_0 > 0. The difficulty consists in that we do not know from which k_0 we should start the algorithm. This is one of the motivations to use expanding truncations, to be introduced later.
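The divergence/convergence dichotomy of this example can be checked directly; the snippet below is our own illustration, assuming (as the surrounding discussion suggests) the cubic f(x) = -x³ in the noise-free case.

    f = lambda x: -x**3                     # cubic f; only condition (1.2.4) fails

    def rm(x0, n=30):
        x = x0
        for k in range(1, n + 1):
            x = x + (1.0 / k) * f(x)        # noise-free RM iteration (1.2.2)
            if abs(x) > 1e100:
                return float("inf")         # numerical token for divergence
        return x

    print(rm(2.0))    # diverges: |x_k| explodes after a few steps
    print(rm(0.5))    # converges toward the root 0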

Theorem 1.3.1 proved in Section 1.3 demonstrates the ODE method. By this approach the condition imposed on the noise has been significantly weakened, and it covers a class of noises much larger than that treated by the probabilistic method. However, it a priori requires {x_k} to be bounded. This is the case if {x_k} converges, but before establishing convergence this is an artificial condition, which is not satisfied even for the simple example given above. Further, although the noise condition (1.3.3) is much more general than that used in Theorem 1.2.1, it is still difficult to verify for state-dependent noise. For example, let the noise contain, in addition to a state-dependent structural error, a term w_{k+1}, where {w_k} is a martingale difference sequence with bounded conditional second moments. If {x_k} is bounded and the structural error vanishes along bounded trajectories, then Σ_k a_k w_{k+1} < ∞ a.s. and (1.3.3) holds. However, in general it is difficult to directly verify (1.3.3), because the behavior of {x_k} is unknown. This is why we use Condition (1.4.2), which should be verified only along convergent subsequences. With a convergent subsequence {x_{n_k}} the noise is easier to deal with.

Considering convergent subsequences, the path-wise convergence is proved for a truncated RM algorithm by using the TS method in Theorem 1.4.1. The weakness of algorithms with fixed truncation bounds is that the sought-for root of f(·) has to be located in the truncation region. But, in general, this cannot be ensured. This is another motivation to consider algorithms with expanding truncations.

The weak convergence method explained in Section 1.5 can avoid the boundedness assumption on {x_k}, but it can ensure convergence in distribution only, while in practical computation one always deals with a sample path. Hence, people in applications are mainly interested in path-wise convergence.

The SA algorithm with expanding truncations was introduced in order to remove the growth rate restriction on f(·). It has been developed in two directions: weakening the conditions imposed on the noise and improving the analysis method. By the TS method we can show that the SA algorithm with expanding truncations converges under a truly weak condition on the noise, which, in fact, is also necessary for a wide class of noises.

In Chapter 1, the root x^0 of f(·) is a singleton. From now on we will consider the general case. Let J be the root set of f(·).

We now define the algorithm. Let {M_k} be a sequence of positive numbers increasingly diverging to infinity, and let x* be a fixed point.

Fix an arbitrary initial value x_0, and denote by x_k the estimate at time k serving as the approximation to J. Define {x_k} by the following recursion:

    x_{k+1} = (x_k + a_k y_{k+1}) I[‖x_k + a_k y_{k+1}‖ ≤ M_{σ_k}] + x* I[‖x_k + a_k y_{k+1}‖ > M_{σ_k}],    (2.1.1)

    σ_k = Σ_{j=1}^{k-1} I[‖x_j + a_j y_{j+1}‖ > M_{σ_j}],  σ_0 = 0,    (2.1.2)

    y_{k+1} = f(x_k) + ε_{k+1},    (2.1.3)

where I[·] is an indicator function, meaning that it equals 1 if the inequality indicated in the bracket is fulfilled, and 0 if the inequality does not hold.

We explain the algorithm. σ_k is the number of truncations up to time k, and M_{σ_k} serves as the truncation bound when the (k+1)th estimate is generated. From (2.1.1) it is seen that if the estimate at time k+1 calculated by the RM algorithm remains in the truncation region, i.e., if ‖x_k + a_k y_{k+1}‖ ≤ M_{σ_k}, then the algorithm evolves as the RM algorithm. If x_k + a_k y_{k+1} exits from the sphere with radius M_{σ_k}, then the estimate at time k+1 is pulled back to the pre-specified point x*, and the truncation bound is enlarged from M_{σ_k} to M_{σ_k + 1}.

Consequently, if it can be shown that the number of truncations is finite or, equivalently, that {x_k} generated by (2.1.1) and (2.1.2) is bounded, then the algorithm (2.1.1) and (2.1.2) becomes one without truncations, i.e., the RM algorithm, after a finite number of steps. This actually is the key step when we prove convergence of (2.1.1) and (2.1.2). The convergence analysis of (2.1.1) and (2.1.2) will be given in the next section, and the analysis is carried out in a deterministic way at a fixed sample path, without involving any interpolating function.
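A minimal sketch of our own of the recursion (2.1.1)–(2.1.3) for the same hypothetical cubic; M_σ, x*, and the step sizes are illustrative choices. A wildly wrong initial value triggers a few truncations, after which the iteration runs as a plain RM algorithm and converges, with no growth restriction on f and no a priori bound for the root.

    import numpy as np

    rng = np.random.default_rng(4)
    f = lambda x: -x**3                   # hypothetical f with root set J = {0}
    M = lambda s: 2.0 * (s + 1)           # truncation bounds increasing to infinity
    x_star = 0.5                          # fixed pull-back point x*

    x, sigma = 100.0, 0                   # sigma counts the truncations (2.1.2)
    for k in range(1, 100001):
        cand = x + (1.0 / k) * (f(x) + rng.normal())  # tentative RM step
        if abs(cand) <= M(sigma):
            x = cand                      # stay inside: ordinary RM step (2.1.1)
        else:
            x, sigma = x_star, sigma + 1  # pull back to x*, enlarge the bound

    print(x, sigma)   # x near 0; sigma stays finite (typically 1 here)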

2.2. General Convergence Theorems by TS Method

In this section, by the TS method we establish convergence of the RM algorithm with expanding truncations defined by (2.1.1)–(2.1.3) under general conditions. Let us first list the conditions to be used.

A2.2.1 a_k > 0, a_k → 0, and Σ_k a_k = ∞.

A2.2.2 There is a continuously differentiable function v(·) (not necessarily nonnegative) such that

    sup_{δ ≤ d(x, J) ≤ Δ} v_x^T(x) f(x) < 0

for any Δ > δ > 0, and v(J) = {v(x) : x ∈ J} is nowhere dense, where

    J = {x : f(x) = 0}

is the zero set of f(·), d(x, J) = inf{‖x - y‖ : y ∈ J}, and v_x(·) denotes the gradient of v(·). Further, x* used in (2.1.1) is such that v(x*) < inf_{‖x‖ = c₀} v(x) for some c₀ > ‖x*‖.

For introducing the condition on the noise, let us denote by (Ω, F, P) the basic probability space. Let ε_{k+1}(ω, x) be a measurable function defined on the product space. Fixing an ω means that a sample path is under consideration. Let the noise be given by

    ε_{k+1} = ε_{k+1}(ω, x_k).

Thus, state-dependent noise is considered, and for fixed x, ε_{k+1}(ω, x) may be random.

A2.2.3 For the sample path ω under consideration and for any sufficiently large integer N,

    lim_{T→0} limsup_{k→∞} (1/T) ‖ Σ_{j=n_k}^{m(n_k,T)} a_j ε_{j+1}(ω, x_j) I[‖x_j‖ ≤ N] ‖ = 0    (2.2.2)

for any {n_k} such that {x_{n_k}} converges, where m(k, T) is given by (1.3.2) and x_j denotes the estimate given by (2.1.1)–(2.1.3) evaluated at the sample path ω.

In the sequel, the algorithm (2.1.1)–(2.1.3) is considered for the fixed ω for which A2.2.3 holds, and ω will often be suppressed if no confusion is caused.

A2.2.4 f(·) is measurable and locally bounded.

Remark 2.2.1 Comparing A2.2.1–A2.2.4 with A1.4.1–A1.4.4, we findthat if the root set  J degenerates to a singleton then the only essentialdifference is that an indicator function is included in (2.2.2)while (1.4.2) stands without it. It is clear that if is bounded, thenthis makes no difference. However, before establishing the boundednessof condition (2.2.2) is easier to be verified. The key point here


is that in contrast to Section 1.4 we do not assume availability of the upper bound for the roots of

Remark 2.2.2 It is worth noting that converges. To see this it suffices to take in (2.2.2).

Theorem 2.2.1  Let be given by (2.1.1)–(2.1.3) for a given initial

value Assume A2.2.1–A2.2.4 hold. Then, for the

sample path for which A2.2.3 holds.

Proof. The proof is completed in six steps by considering convergent subsequences at the sample path. This is why we call the analysis method used here the TS method.

Step 1. We show that there are constants such that

for any there exists such that for any

if is a convergent subsequence of where  M  is

independent of and .

Since , we need only to prove (2.2.3) for .

If the number of truncations in (2.1.1)–(2.1.3) is finite, then there is an N such that , i.e., there is no more truncation for . Hence whenever . In this case, we may take in (2.2.3).

We now prove (2.2.3) for the case where as . Assume the converse: (2.2.3) is not true. Take . There is such that

Take a sequence of positive real numbers and as

Since (2.2.3) is not true, for there are and such that


and for any there are and such that

Without loss of generality we may assume

Then for any from (2.2.4) and (2.2.6) it follows

that

Since there is such that . Then from (2.2.7) it follows that

For any fixed , if is large enough, then and , and by (2.2.10)

Since from (2.2.11) it follows

that

and by (2.2.4), (2.2.7), and (2.2.8)

and hence

by A2.2.4, where is a constant. Let , where is specified in A2.2.3. Then from A2.2.3

for any


Taking and respectively in (2.2.10) and noticing from (2.2.9), we then have

and hence

From (2.2.8), it follows that

where the second term on the right-hand side of the inequality tends to zero by (2.2.12) and (2.2.13), while the first term tends to zero because

Noticing that by

(2.2.9) and (2.2.13), we then by (2.2.14) have

On the other hand, by (2.2.6) we have

The obtained contradiction proves (2.2.3).

Step 2. We now show that for all large enough

if T is small enough, where is a constant.

If the number of truncations in (2.1.1)–(2.1.3) is finite, then is bounded and hence is also bounded.


Then for large enough there is no truncation, and by (2.2.2) for

if T is small enough. In (2.2.16), for the last inequality the boundedness of is invoked, and is a constant.

Thus, it suffices to prove (2.2.15) for the case where

From (2.2.3) it follows that for any

if is large enough. This implies that for

where is a constant. The last inequality of (2.2.18) yields

With in A2.2.3, from (2.2.2) we have

for large enough and small enough   T .Combining (2.2.18), (2.2.19), and (2.2.20) leads to

for all large enough . This together with (2.2.16) verifies (2.2.15).

Step 3. We now show the following assertion: For any interval with and , the sequence cannot cross infinitely many times with


and .

Assume the converse: there are infinitely many crossings

and is bounded.

By boundedness of , without loss of generality, we may assume

By setting in (2.2.15), we have

But by definition so we have

From (2.2.15) we see that if is taken sufficiently small, then

for sufficiently large . By (2.2.18) and (2.2.15), for large we then have

where denotes the gradient of and as . For , condition (2.2.2) implies that

By (2.2.15) and (2.2.18) it follows that

bounded, where by “crossing by ” we mean that
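The elided definition presumably reads as follows (an assumption consistent with the usage in Step 3): crossing the interval [\delta_1, \delta_2] by x_{n_k}, \dots, x_{m_k} means

\[
v(x_{n_k}) \le \delta_1, \qquad v(x_{m_k}) \ge \delta_2, \qquad
\delta_1 < v(x_j) < \delta_2 \quad \text{for } n_k < j < m_k .
\]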


Then, by (2.2.23) and (2.2.1), from (2.2.24)–(2.2.26) it follows that there are and such that

for all sufficiently large . Noticing (2.2.22), from (2.2.27) we derive

However, by (2.2.15) we have

which implies that for small enough . This means that , which contradicts (2.2.28).

Step 4. We now show that the number of truncations is bounded. By A2.2.2, is nowhere dense, and hence a nonempty interval

exists such that and

If then starting from will cross the sphere

infinitely many times. Consequently, will cross infinitely often with bounded. In Step 3 we have shown that this is impossible. Therefore, starting from some , the algorithm (2.1.1)–(2.1.3) will have no truncations and is bounded.

This means that the algorithm defined by (2.1.1)–(2.1.3) becomes the conventional RM algorithm for , and a condition stronger than (2.2.2) is satisfied:

for any such that converges.

Step 5. We now show that converges. Let

We have to show . If and one of and does not belong to , then exists such that and . By Step 3 this

is impossible. So, both and belong to and


If we can show that is dense in , then from (2.2.30) it will follow that is dense in , which contradicts the assumption that is nowhere dense. This will prove , i.e., the convergence of

To show that is dense in it suffices to show that . Assume the converse: there is a subsequence

Without loss of generality, we may assume converges. Otherwise, a convergent subsequence can be extracted, which is possible because

is bounded. However, if we take in (2.2.15), we have

which contradicts (2.2.31). Thus and converges.

Step 6. To prove it suffices to show that all

limit points of belong to  J.

Assume the converse: By (2.2.15) we

have

for all large if is small enough. By (2.2.1) it follows that

and from (2.2.24)

for small enough . This leads to a contradiction because converges and the left-hand side of (2.2.32) tends to zero as . Thus, we conclude

Remark 2.2.3 In (2.1.1)–(2.1.3) the spheres with expanding radii are used for truncations. Obviously, the spheres can be replaced by other expanding sets. At first glance the point in (2.1.1) may be arbitrarily chosen, but actually the restriction is imposed on the existence of such that . The condition is obviously satisfied if as , because the availability of is not required.


Remark 2.2.4 In the proof of Theorem 2.2.1 it can be seen that the conclusion remains valid if in A2.2.2 “J is the zero set of ” is removed. As a matter of fact, J may be bigger than the zero set of . Of course, it should at least contain the zero set of in order for (2.2.1) to be satisfied. It should also be noted that for we need not require to be nowhere dense.

Let us modify A2.2.2 as follows.

A2.2.2’ There  is a continuously differentiable function

such that 

 for any and is nowhere dense. Further, used in

(2.1.1) is such that for some and  

A2.2.2” There is a continuously differentiable function

such that 

 for any and J is closed. Further, used in (2.1.1) is such

that for some and 

Notice that in A2.2.2’ and A2.2.2” the set J is not specified, but it certainly contains the root sets of both and . We may modify

Theorem 2.2.1 as follows.

Theorem 2.2.1’ Let be given by (2.1.1)–(2.1.3) for a given initial value . Assume A2.2.1, A2.2.2’, A2.2.3, and A2.2.4 hold. Then

for the sample path for which A2.2.3 holds.

Proof. The proof of Theorem 2.2.1 applies without any change.

Theorem 2.2.1” Let be given by (2.1.1)–(2.1.3) for a given initial value. If A2.2.1, A2.2.2”, A2.2.3, and A2.2.4 hold, then

for the sample path for which A2.2.3 holds.

Proof. We still have Steps 1–3 in the proof of Theorem 2.2.1. Let


If or , or both, do not belong to J, then exists such that , since J is closed. Then would cross infinitely many times. But, by Step 3 of the proof of Theorem 2.2.1, this is impossible. Therefore both and belong to

Theorems 2.2.1 and 2.2.1’ only guarantee that the distance between and the set J tends to zero. As a matter of fact, we have a more precise result.

Theorem 2.2.2 Assume conditions of Theorem 2.2.1 or Theorem 2.2.1’ hold. Then for fixed and for which A2.2.3 holds, a connected subset exists such that

where denotes the closure of and is generated by (2.1.1)–(2.1.3).

Proof. Denote by the set of limit points of . Assume the converse: , i.e., is disconnected. In other words, closed sets and exist such that and

Define

Since a exists such that

where denotes the of set A.

Define

It is clear that and

Since by we have

By boundedness of we may assume that converges. Then, by taking in (2.2.15), we derive


which contradicts (2.2.33) and proves the theorem.

Corollary 2.2.1 If J is not dense in any connected set, then under conditions of Theorem 2.2.1, given by (2.1.1)–(2.1.3) converges to a point in . This is because in the present case any connected set in consists of a single point.

Example 2.2.1 Reconsider the example given in Section 2.1:

It was shown that the RM algorithm rapidly diverges to even in the noise-free case.

We now assume the observations are noise-corrupted:

where is an ARMA process driven by the independent identically distributed normal random variables

where . We use the algorithm (2.1.1)–(2.1.3) with . The computation shows

which tend to the sought-for root 10.
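A simulation in this spirit can be set up as below. Since the original function, the ARMA coefficients, and the design constants did not survive extraction, the cubic f(x) = -(x-10)**3 (root 10, superlinear growth as in the divergence example of Section 2.1), the ARMA(1,1) coefficients, and the bounds are purely illustrative stand-ins.

import numpy as np

rng = np.random.default_rng(1)
N = 20000
w = rng.standard_normal(N + 1)           # i.i.d. N(0,1) driving sequence
eps = np.zeros(N + 1)
for k in range(1, N + 1):                # assumed ARMA(1,1) observation noise
    eps[k] = 0.5 * eps[k - 1] + w[k] + 0.3 * w[k - 1]

f = lambda x: -(x - 10.0) ** 3           # illustrative root-searching function
x, sigma, M = 0.0, 0, [10.0 * 2.0 ** s for s in range(200)]
for k in range(1, N + 1):
    cand = x + (1.0 / k) * (f(x) + eps[k])
    if abs(cand) <= M[sigma]:
        x = cand
    else:
        x, sigma = 0.0, sigma + 1        # truncate back to x* = 0
print(round(x, 3))                        # ends near the sought-for root 10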

Example 2.2.2 Let Then

Clearly, A2.2.1 and A2.2.4 hold. Concerning A2.2.2, we may take to serve as . Since

(2.2.1) is satisfied. The existence of required in A2.2.2 is obvious, for example,


Finally, is nowhere dense. So A2.2.2 also holds. Now assume the noise is such that

Then A2.2.3 is satisfied too. By Corollary 2.2.1, given by (2.1.1)–(2.1.3) converges to a point

If for the conventional (untruncated) RM algorithm

it is a priori known that is bounded, then we have the following theorem.

Theorem 2.2.3 Assume A2.2.1–A2.2.4 hold but in A2.2.2 the requirement “Further, used in (2.1.1) is such that for some and ” is removed. If produced by (2.2.34) is bounded, then for the sample path for which A2.2.3 holds, where is a connected subset of

Proof. As a matter of fact, by boundedness of , (2.2.3) and (2.2.15) become obvious. Steps 3, 5, and 6 in the proof of Theorem 2.2.1 remain unchanged, while Step 4 is no longer needed. Then the conclusion follows from Theorems 2.2.1 and 2.2.2.

Remark 2.2.5 All theorems concerning SA algorithms with expanding truncations remain valid for produced by (2.2.34), if given by (2.2.34) is known to be bounded.

Theorems 2.2.1 and 2.2.2 concern the time-invariant function , but the results can easily be extended to time-varying functions, i.e., to the case where the measurements are carried out for

where depends on time

Conditions A2.2.2 and A2.2.4 are respectively replaced by the following conditions:

A2.2.2° There is a continuously differentiable function such that


2.3. Convergence Under State-Independent Conditions

The noise condition A2.2.3 is so weak that it is in fact necessary, as will be shown later. However, condition A2.2.3 is state-dependent in the sense that the condition itself depends on the behavior of . This makes it not always possible to verify the condition beforehand. We now give convergence theorems under conditions with no state involved. For this we have to reformulate Theorems 2.2.1 and 2.2.2.

As defined in Section 2.2, , where is a measurable function. In lieu of A2.2.3 we introduce the following condition.

A2.3.1 For any sufficiently large integer there is an

with such that for any

 for any such that converges.

Theorem 2.3.1 Assume A2.2.1, A2.2.2, A2.2.4, and A2.3.1 hold. Then

a.s. for generated by (2.1.1)–(2.1.3) with a given initial value , where is a connected subset contained in the closure of J.

Proof. Let It is clear that

i.e., Then for any

A2.2.3 is fulfilled with possibly depending on , and the conclusion of the theorem follows from Theorems 2.2.1 and 2.2.2.

We now introduce a state-independent condition on noise.

A2.3.2 For any , is a martingale difference sequence and for some

where is a family of nondecreasing independent of  

We first give an example satisfying A2.3.2. Let be an -dimensional martingale difference sequence with


for some and let

be a measurable and locally bounded function. Then satisfies A2.3.2, because

and

by assumption.

Theorem 2.3.2 Let be given by (2.1.1)–(2.1.3) for a given initial value. Assume A2.2.1, A2.2.2, A2.2.4, and A2.3.2 hold and

for given in A2.3.2. Then a.s., where is a connected subset contained in

Proof. Since is measurable and is , it follows that is adapted. Approximating

by simple functions, it is seen that

Hence, is a martingale difference sequence, and

a.s.

By the convergence theorem for martingale difference sequences, the series

converges a.s., which implies that with exists such that for each

converges to zero as uniformly in . This means that A2.3.1 holds, and the conclusion of the theorem follows from Theorem 2.3.1.
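The martingale series convergence invoked here is presumably the standard Chow-type result; a sketch, with p the exponent appearing in A2.3.2:

\[
\{e_k, \mathcal{F}_k\}\ \text{an m.d.s.},\quad
\sum_{k=1}^{\infty} a_k^{\,p}\, E\bigl(\|e_k\|^{p} \mid \mathcal{F}_{k-1}\bigr) < \infty
\ \ (1 < p \le 2)
\quad \Longrightarrow \quad
\sum_{k=1}^{\infty} a_k e_k \ \text{converges a.s.}
\]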

In applications it may happen that is not directly observed. Instead, the time-varying functions are observed, and the observations may be made not at but at , i.e., at with bias


Theorem 2.3.3 Let be given by (2.1.1)–(2.1.3) for a given initial value. Assume that A2.2.1, A2.2.2, A2.2.4, and A2.3.2 hold and

for p given in A2.3.2. Further, assume is an adapted sequence, is bounded by a constant, and for any sufficiently large integer there exists with such that for any

for any such that converges. Then a.s., where is a connected subset contained in

Proof.  By assumption where is a constant. Then

and again by the convergence theorem for martingale difference sequences,

the series

converges a.s. Consequently, there exists with such that

for any the convergence indicated in (2.3.5) holds and for any

integer

tends to zero as uniformly in . Therefore, A2.3.1 is fulfilled and the conclusion of the theorem follows from Theorem 2.3.1.

Remark  2.3.1 The obvious sufficient condition for (2.3.5) is

which in turn is satisfied, if is continuous and

Remark 2.3.2 Theorems 2.3.2 and 2.3.3 with A2.2.2 and A2.2.4 replaced by A2.2.2° and A2.2.4’, respectively, remain valid if is replaced by the time-varying


2.4. Necessity of Noise Condition

Under Conditions A2.2.1–A2.2.4 we have established convergence theorems for recursively given by (2.1.1)–(2.1.3). Condition A2.2.1 is a

commonly accepted requirement for decreasing step size, while A2.2.2 is

a stability condition. This kind of condition is unavoidable for convergence of SA-type algorithms, although it may appear in different forms.

Concerning A2.2.4 on it is the weakest possible: neither continuity

nor growth rate of is required. So it is natural to ask: is it possible to further weaken Condition A2.2.3 on noise? We now answer this question.

Theorem 2.4.1 Assume only has one root , i.e., , and is continuous at . Further, assume A2.2.1 and A2.2.2 hold. Then given by (2.1.1)–(2.1.3) converges to at those sample paths for which one of the following conditions holds:

i)

ii) can be decomposed into two parts such that 

and 

Conversely, if then both i) and ii) are satisfied.
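Conditions i) and ii), lost in extraction, presumably take the classical form (an assumption based on the standard necessity results for SA; m(k,T) as in (1.3.2)):

\[
\text{i)}\ \ \lim_{T\to 0}\limsup_{k\to\infty}\frac{1}{T}
\Bigl\|\sum_{i=k}^{m(k,T)} a_i\,\varepsilon_{i+1}\Bigr\| = 0;
\qquad
\text{ii)}\ \ \varepsilon_{k+1} = e_{k+1} + \nu_{k+1},\quad
\sum_{k=1}^{\infty} a_k e_{k+1}\ \text{converges},\quad
\nu_k \xrightarrow[k\to\infty]{} 0 .
\]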

Proof. Sufficiency. It is clear that ii) implies i), which in turn implies A2.2.3. Consequently, sufficiency follows from Theorem 2.2.1.

Necessity. Assume . Then is bounded and (2.1.1)–(2.1.3) becomes the RM algorithm after a finite number of steps (for ). Therefore,

where

Since and is continuous, Condition ii) is satisfied. And Condition i), being a consequence of ii), also holds.

Remark 2.4.1 In the case where and is continuous at , under conditions A2.2.1, A2.2.2, and A2.2.3, by Theorem 2.2.1 we arrive at . Then by Theorem 2.4.1 we derive (2.4.1), which is stronger than A2.2.3. One may ask why a weaker condition A2.2.3 can imply a stronger condition (2.4.1). Are they equivalent? The answer


is both “yes” and “no”: yes, these conditions are equivalent, but only under the additional conditions A2.2.1, A2.2.2, and continuity of at , being the unique root of . However, these conditions by themselves are not equivalent, because condition A2.2.3 is indeed weaker than (2.4.1).

We now consider the multi-root case. Instead of the singleton we now have a root set J. Accordingly, continuity of at is replaced by the following condition

In order to derive the necessary condition on noise, we consider the

linear interpolating function

where . From form a family of functions, where

where is a constant. For any subsequence define

where appearing on the right-hand side of (2.4.3) denotes the dependence of the limit function on the subsequence, and the limsup of a vector sequence is taken component-wise. In general, may be discontinuous. However, if then

which is not only continuous but also differentiable.

Thus, (2.4.2) for the multi-root case corresponds to the continuity of at for the single-root case, while and a certain analytic property of correspond to

Theorem 2.4.2 Assume (2.4.2), A2.2.1, A2.2.2, and A2.2.4 hold. Then given by (2.1.1)–(2.1.3) is bounded, and the right derivative for any convergent subsequence


Necessity. We now assume is bounded, and

for any convergent subsequence , and want to show A2.2.3. Let . For any from (2.4.5) we have

From (2.4.6) it is seen that

where as . The assumption means that

where and

Noticing the continuity of from (2.4.10) and (2.4.11) it follows

that

which, combined with , yields (2.4.9). Thus, we have

for any such that converges. By the boundedness of , (2.4.12) is equivalent to (2.2.2), and the proof is completed.

Corollary 2.4.1 Assume (2.4.2), A2.2.1, A2.2.2, and A2.2.4 hold, and assume J is not dense in any connected set. Then given by (2.1.1)–(2.1.3) converges to some point in J if and only if A2.2.3 holds.

This corollary is a direct generalization of Theorem 2.4.1. The sufficiency part follows from Corollary 2.2.1, while the necessity part follows from Theorem 2.4.2 if we notice that convergence of implies for sufficiently large

The first term on the right-hand side of (2.4.8) tends to zero as by (2.4.2) and . So, to verify A2.2.3 it suffices to show that


2.5. Non-Additive Noise

In the algorithm (2.1.1)–(2.1.3) the noise in the observation is additive. In this section we continue considering (2.1.1)–(2.1.2), but in lieu of (2.1.3) we now have the non-additive noise

where is the observation noise at time .

The problem is: under which conditions does the algorithm defined by (2.1.1), (2.1.2), and (2.5.1) converge to J, the root set of , which is the average of with respect to its second argument? To be precise, let be a measurable function and let be a

distribution function in The function is defined by

It is clear that the observation given by (2.5.1) can formally be expressed as one with additive noise:

and Theorems 2.2.1 and 2.2.2 can still be applied. The basic problem is how to verify A2.2.3. In other words, under which conditions on and does given by (2.5.3) satisfy A2.2.3?
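Spelled out with the definitions above (the notation O_{k+1} for the observation and x_k, \xi_{k+1} for the state and noise are assumptions), the rewriting is:

\[
O_{k+1} = g(x_k,\,\xi_{k+1}) = f(x_k) + \varepsilon_{k+1},
\qquad
\varepsilon_{k+1} \triangleq g(x_k,\,\xi_{k+1}) - f(x_k),
\qquad
f(x) \triangleq \int g(x,\xi)\, dF(\xi).
\]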

Before describing the conditions to be used we first introduce some notation. We always take the regular version of conditional probability. This makes the conditional distributions introduced later well-defined.

Let be the distribution function of and be the conditional distribution of given , where

Further, let us introduce the following coefficients,

where denotes the Borel in , and for a random variable , where runs over all sets with probability zero.

is known as the mixing coefficient of , and it measures the dependence between and . It is clear that measures the closeness of the distribution of to


The following conditions will be needed.

A2.5.1 as ;

A2.5.2 (=A2.2.2);

A2.5.3 is a measurable function and is locally Lipschitz-continuous in the first argument, i.e., for any fixed

where is a constant depending on

A2.5.4 (Noise Condition)

i) is a process with mixing coefficient as , uniformly in

ii)

where is defined in (2.5.6);

iii) as

Theorem 2.5.1  Assume A2.5.1–A2.5.4. Then for generated  by

(2.1.1), (2.1.2), and (2.5.1)

where is a connected subset of 

The proof consists in verifying that Condition A2.2.3 is satisfied a.s. by given in (2.5.3). Then the theorem follows from Theorems 2.2.1 and 2.2.2.

We first prove some lemmas.

Lemma 2.5.1  Assume A2.5.1, A2.5.3, and A2.5.4 hold. Then there

is an with such that for any and any bounded  

subsequence of say,



(without loss of generality assume there exists an integer  

such that for all

if T is small enough, where is given by (2.1.1), (2.1.2), and (2.5.1), and is given by (1.3.2).

Proof.  For any set

By setting in (2.5.6), it is clear that

From (2.5.7), it follows that

and

where (and hereafter) L is taken large enough so that . Since is a convergent martingale, there is a a.s.

such that

From (2.5.13) and it is clear that for any integer  L the

series of martingale differences

converges a.s. Denote by the where the above series converges, and set


It is clear that . Let be fixed and with and . Then for any integer , by (2.5.13) we have

where the first term on the right-hand side tends to zero as by (2.5.15).

Assume is sufficiently large such that i) for if as , or ii) if

We note that in case ii) there will be no truncation in (2.1.1) for

Assume and fix a small enough T such that . Let be arbitrarily fixed. We prove (2.5.9) by induction. It is clear (2.5.9) is true for . Assume (2.5.9) is true for and there is no truncation for if . Noticing , we have, by (2.5.16)

if is large enough.

This means that at time there is no truncation in (2.1.1), and

Lemma 2.5.2 Assume A2.5.1, A2.5.3, and A2.5.4 hold. There is an with such that if and if as


is a bounded subsequence of produced by (2.1.1), (2.1.2),

and (2.5.1), then

Proof.  Write

where

By (2.5.13), for we have

which converges to a finite limit as by the martingale convergence theorem.

Therefore, for any integers  L  and

converges a.s.


Therefore, there is with such that (2.5.23) holds for any integers L and

Let be fixed, . By Lemma 2.5.1, for small

Then

for any by (2.5.23).

We now estimate (II). By Lemma 2.5.1 we have the following,

Noticing (2.5.7) and (2.5.14), we then have

Similarly, by Lemma 2.5.1 and (2.5.7)

Combining (2.5.18), (2.5.24), and (2.5.26) leads to

Therefore, to prove the lemma it suffices to show that the right-hand side of (2.5.27) is zero.


Applying the Jordan-Hahn decomposition to the signed measure,

and noticing that is a process with mixing coefficient , we know that there is a Borel set D in such that for any

Borel set  A in

and

Then, we have the following,

where


For any given there is a j such that

For any fixed by (2.5.13), (2.5.14), and it follows that

Therefore,

Since may be arbitrarily small, this combined with (2.5.27) proves the lemma.

Proof of Theorem 2.5.1. To prove the theorem it suffices to show that A2.2.3 is satisfied by a.s. By Lemma 2.5.2, we need only to prove that

for , where is a bounded subsequence and as . Assume

Applying the Jordan-Hahn decomposition to the signed measure,


we conclude that

where for the last inequality (2.5.8) and (2.5.12) are invoked. Since as , the right-hand side of (2.5.32) tends to zero as for any . This proves (2.5.31) and completes the proof of Theorem 2.5.1.

Remark 2.5.1 From the expression (2.5.3) for the observation it is seen that the observation with non-additive noise can be reduced to the additive but state-dependent noise which was considered in Section 2.3. However, Theorem 2.5.1 is not covered by the theorems in Section 2.3, and vice versa.

2.6. Connection Between Trajectory Convergence and Property of Limit Points

In the multi-root case, what we have established so far is that the distance between given by (2.1.1)–(2.1.3) and a connected subset of converges to zero under various sets of conditions.

As pointed out in Corollary 2.2.1, if J is not dense in any connected set, then converges to a point belonging to . However, it is still not clear how behaves when J is dense in some connected set. The following example shows that still may not converge, although

Example  2.6.1 Let


and let

Take step sizes as follows

We apply the RM algorithm (2.2.34) with . As we may take

Then, all conditions A2.2.1–A2.2.4 are satisfied. Notice that

and

where k  is such that

By (2.6.1), it is clear that in (2.6.2)

and

Therefore, is bounded and by Theorem 2.2.3.

As a matter of fact, changes from one to zero and then from zero

to one, and this process repeats forever with decreasing step sizes.


Thus, is dense in [0,1]. This phenomenon hints that for trajectory convergence of , the stability-like condition A2.2.2 is not enough; a stronger stability is needed.

Definition  2.6.1

A point , i.e., a root of , is called dominantly stable for if there exist a and a positive measurable function which is bounded in the interval and

satisfies the following condition

 for all the ball centered at with radius

Remark 2.6.1 The dominant stability implies stability. To see this, it

suffices to take as the Lyapunov function. Then

The dominant stability of , however, is not necessary for asymptotic stability.

Remark 2.6.2 Equality (2.6.3) holds for any whatever is. Therefore, all interior points of J are dominantly stable for . Further,

for a boundary point of J to be dominantly stable for it suffices to verify (2.6.3) for with small , i.e., all that are close to and outside J.

Example 2.6.2 Let

In fact, is the gradient of  

In this example . We now show that all points of J

are dominantly stable for . For this, by Remark 2.6.2, it suffices to show that all with are dominantly stable for , and for this, it in turn suffices to show (2.6.3) for any with and for small enough . Denoting by the angle between vectors and , we have for


It is clear that

for all small enough . Therefore, all points in J are dominantly stable for

Theorem 2.6.1 Assume A2.2.1, A2.2.2, and A2.2.4 hold. If for a given , is convergent and a limit point of generated by (2.1.1)–(2.1.3) is dominantly stable for , then for this trajectory

Proof.   For any define

where is the one indicated in Definition 2.6.1.

It is clear that is well-defined, because there is a convergent subsequence: and for any greater than some . If for any for some , then by arbitrariness of

Therefore, for proving the theorem, it suffices to show that, for any small , an exists such that implies if

Since implies A2.2.3, all conditions of Theorem 2.2.1

are satisfied. By the boundedness of we may assume that is large enough so that the truncations no longer exist in (2.1.1)–(2.1.3) for . It then follows that

Notice that for any , and is bounded, and hence by (2.6.3)


for some because is convergent and

Further,

An argument similar to that used for (2.6.5) leads to

if is large enough. Then from (2.6.6) we have

From (2.6.4) and (2.6.7) we see that we can inductively obtain

Then, noticing by definitions of we have

where the elementary inequality


is used with for the first inequality in (2.6.8), and with

for the third inequality in (2.6.8). Because is bounded,

and an exists such

that

This means that , and completes the proof.

For convergence of SA algorithms we have imposed the stability-like condition A2.2.2 for , and the dominant stability condition (2.6.3) for trajectory convergence. It is natural to ask: does a limit point of the trajectory possess a certain stability property? The following example gives a negative answer.

Example 2.6.3 Let

It is straightforward to check that

satisfies A2.2.2. Take

where is a sequence of mutually independent random

variables such that a.s. Then with 1 being

a stable attractor for , and all A2.2.1–A2.2.4 are satisfied. Take . Then by Theorem 2.2.1 it follows that

a.s. Since , must converge to 0 a.s. Zero, however, is unstable for

In this example converges to a limit, which is independent of initial values and unstable, although conditions A2.2.1–A2.2.4 hold. This strange phenomenon happens because , as a function of , is singular for some in the sense that it restricts the algorithm to evolve only in a certain set of . Therefore,


in order for the limit of to be stable, imposing a certain regularity condition on and some restrictions on the noises is unavoidable.

As in Section 2.3, assume that the observation noise is with being a measurable function defined on . Set

Let us introduce the following conditions:

A2.6.1 For a given is a surjection for any

A2.6.2 For any and is continuous in and for any

and 

where denotes the ball centered at with radius

It is clear that A2.6.2 is equivalent to A2.6.2’:

A2.6.2’ For any and any compact set  

Before formulating Theorem 2.6.2 we first give some remarks on Conditions A2.6.1 and A2.6.2.

Remark 2.6.3 If does not depend on , then in (2.6.9) can be removed when taking the supremum. In Condition A2.2.3

is a convergent subsequence, and hence is automatically located in a compact set. In the theorems in Sections 2.2, 2.3, 2.4, and 2.5, the initial value is fixed, and hence for fixed , is a fixed sequence. In contrast to this, in Theorem 2.6.2 we will consider the case where the initial value varies arbitrarily, and hence for any fixed , may be any point in . If in (2.6.9) were not restricted to a compact

set (i.e., with removed in (2.6.9)), then the resulting condition would be too strong. Therefore, to put in (2.6.9) is to make the condition reasonable.

Remark  2.6.4 If is continuous and if  then is a surjection.


By this property, is a surjection for a large class of . For example, let be free of and let the growth rate of be not faster than linear as . Then with satisfying A2.2.1 we have as for all . Hence, A2.6.1 holds. In the case where the growth rate of is faster than linear as and for some , we also have that as for all , and A2.6.1 holds.

In what follows, by stability of a set for we mean it in the

Lyapunov sense, i.e., a nonnegative continuously differentiable function

exists such that and for some , where

Theorem 2.6.2 Assume A2.2.1, A2.2.2, and A2.6.2 hold, and that is continuous and for a given A2.6.1 holds. If defined by (2.1.1)–(2.1.3) with any initial value converges to a limit independent of , then belongs to the unique stable set of

Proof. Since by A2.2.2 and by continuity of , exists with such that . Hence, . By continuity of , J is closed, and hence by A2.2.2,

Since we must have Denote by the connected

subset of containing . The minimizer set of that contains is closed and is contained in . Since is a connected set

and by A2.2.2 is nowhere dense, is a constant. By continuity of , all connected root-sets are closed and they are

separated. Thus, there exists a such that

i.e., contains no root of other than those located in . Set

Then and . Therefore, by definition, is stable for

We have to show that and that is the unique stable root-set. Let be the connected set of such that contains . By continuity of , for an arbitrary small

exist such that and the distance

between the interval and the set is positive;

i.e.,


We first show that, for any and , there exist and such that, for any , if then

By Theorem 2.2.1, for with sufficiently large there will be no truncation for (2.1.1)–(2.1.3), and

For any let By A2.6.2, sufficiently small

and large enough exist such that for any

If for then (2.6.10) immediately

follows by setting . Assume for some . Let be the first such one. Then

By (2.6.11), however,

which contradicts (2.6.12). Thus and (2.6.10) is verified.

For a given we now prove the existence of such that for any , if , where the dependence of on and on the initial value is emphasized. For simplicity of writing, is written as in the sequel.

Assume the assertion is not true; i.e., for any exists such that and for some

Suppose and


If there exists an with , then with exists because is connected and with

This yields a contradictory inequality:

where the first inequality follows from A2.2.2 while the second inequality is because is the minimizer of

Consequently, for any and

and a subsequence of exists, also denoted by for

notational simplicity, such that . By the continuity of

Hence, by the fact . By (2.6.10) and the fact , we can choose sufficiently

small T   and large enough  N   such that

and  i.e.,

for any . By (2.6.10), exists with the property such that

Because as for sufficiently large N,

by (2.6.10) the last term of (2.6.15) is Then


By (2.6.10) and the continuity of , the third term on the right-hand side of (2.6.16) is , and by A2.6.2 (since with for all sufficiently large N), the norm of the second term on the right-hand side of (2.6.16) is also as . Hence by A2.2.2 and (2.6.13), some exists such that the right-hand side of (2.6.16) is less than for all sufficiently large N if T is small enough. By noticing and mentioned

above, from (2.6.14) it follows that the left-hand side of (2.6.16) tends to a nonnegative limit as . The obtained contradiction shows that exists such that for any if . With fixed, for any , by A2.6.1 exists such that . By and the arbitrary smallness of , from this it

follows that . Since by assumption, we have , which means that is stable. If another stable set existed

such that , then by the same argument would belong to . The contradiction shows the uniqueness of the stable set.

2.7. Robustness of Stochastic Approximation Algorithms

In this section, for the single-root case, i.e., the case , we consider the behavior of SA algorithms when the conditions for convergence of the algorithms to are not exactly satisfied. It will be shown that a “small” violation of the conditions will cause no big effect on the behavior of the algorithm.

The following result, known as the Kronecker lemma, will be used several times in the sequel. We state it separately for convenience of reference.

Kronecker Lemma. If , where is a sequence of positive numbers nondecreasingly diverging to infinity and is a sequence of matrices, then
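In symbols (with {b_k} the positive numbers and {z_k} the matrices; the notation is assumed), the lemma reads:

\[
\sum_{k=1}^{\infty} \frac{z_k}{b_k}\ \text{converges}
\quad \Longrightarrow \quad
\frac{1}{b_n} \sum_{k=1}^{n} z_k \xrightarrow[n \to \infty]{} 0 .
\]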

Proof. Set Since

for any there is such that if Then it


follows that

as , and then .

We still consider the algorithm given by (2.1.1)–(2.1.3), where denotes the estimate for at time , but may not be the exact root of . As a matter of fact, the following set of conditions will be used to replace A2.2.1–A2.2.4:

A2.7.1  nonincreasingly tends to zero, and 

exists such that 

A2.7.2 There exists a nonnegative twice continuously differentiable function such that and

A2.7.3 For sample path the observation noise satisfies the following condition

A2.7.4 is continuous, but is not necessarily the root of
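For orientation, the robustness conditions presumably relax the exact requirements to approximate ones; a sketch of the intended form of the noise condition (2.7.3), with ε the violation level (an assumption; compare the exact condition, which requires the limit to be zero):

\[
\lim_{T \to 0}\ \limsup_{k \to \infty}\ \frac{1}{T}
\Bigl\| \sum_{i=k}^{m(k,T)} a_i\, \varepsilon_{i+1} \Bigr\| \;\le\; \varepsilon
\qquad \text{(instead of } = 0\text{)}.
\]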

Comparing A2.7.1–A2.7.4 with A2.2.1–A2.2.4, we see that the following conditions required here are not assumed in Section 2.2: nonincreasing


Set

We will only consider those in (2.7.2) for which , where is given in (2.7.7). From (2.7.7) and (2.7.8) it is seen that . Consequently, by (2.7.2), given by (2.7.12) is positive.

By continuity of and , and exist such that the following inequalities hold:

By A2.7.3, for , can be taken sufficiently large such that

Lemma 2.7.1 Assume A2.7.1, A2.7.2, A2.7.4 hold with given in (2.7.3) being less than or equal to . If for given by (2.1.1)–(2.1.3) with (2.7.5) fulfilled, for some , where K is given in (2.7.18), then for any

Proof. Because is nondecreasing as T increases, it suffices to prove the lemma for . Assume the converse: there exists an such that


Then for any we have

and hence

which, combined with the definition of , leads to

On the other hand, from (2.7.20) and (2.7.21) it follows that

From (2.7.9) we have

By a partial summation we have

Applying (2.7.3) to the first two terms on the right-hand side of (2.7.25),

and (2.7.1) and (2.7.3) to the last term we find

From (2.7.24) and (2.7.26) it then follows that


which contradicts (2.7.22). This proves the lemma.

Lemma 2.7.2 Under the conditions of Lemma 2.7.1, for any

the following estimate holds:

Proof. Since by Lemma 2.7.1 we have

and hence

Consequently, we have

Lemma 2.7.3 Assume A2.7.1–A2.7.4 hold and satisfies (2.7.7). Then for the sample path for which A2.7.3 holds, a that is independent of

and exists such that 

in other words, given by (2.1.1)–(2.1.3) is bounded.

Proof. Let be a sufficiently large integer such that

where K  is given by (2.7.18).


Assume the lemma is not true. Then there exist and such that . Let be the maximal integer satisfying the following equality:

Then by definition we have

and by (2.7.28) and (2.7.29),

We first show that under the converse assumption there must be an such that

Otherwise, for any and from (2.7.24) it follows

that

This together with (2.7.30) implies

which contradicts the converse assumption. Hence (2.7.31) must hold. By the definition of , (2.7.6), and (2.7.30) we have

Since by (2.7.31), from (2.7.4) and (2.7.6) it follows that


We now show . For this it suffices to prove , by noticing (2.7.34).

Since similar to (2.7.32) we have

and hence

From (2.7.32) and (2.7.36) it is seen that

where for the second inequality, (2.7.9) and are used, while for the last inequality (2.7.18) is invoked.

Paying attention to (2.7.10), we have and , and by (2.7.16)

Then by (2.7.32) we see and (2.7.34) becomes

Thus, we can define

and have

Taking in Lemmas 2.7.1 and 2.7.2, and paying attention to (2.7.4) and , we know . By Lemmas 2.7.1 and 2.7.2, from (2.7.28) we see . From (2.7.28)–(2.7.30) we have obtained , which together with the definition of

implies and hence . Therefore, is well defined, and by Taylor’s expansion we have


where , with components located in between and . We now show that , which, as will be shown, implies

a contradiction. By Lemma 2.7.2 we have

and hence

By (2.7.10) it follows that , and by (2.7.11). Using Lemma 2.7.1, we continue (2.7.41) as follows:

Noticing , we see . It is clear that (2.7.35) and (2.7.37) remain valid with replaced by . Hence, similar to (2.7.37) we have

By (2.7.11) and Taylor’s expansion we have

and consequently,

and 


By (2.7.40), . Substituting (2.7.44) into (2.7.43) and using (2.7.12) leads to

Estimating by a treatment similar to that used for (2.7.26) yields

Noticing by Lemma 2.7.2 we find that

and

Hence, and by (2.7.15) from (2.7.45) it follows that

Using (2.7.14), from the above estimate we have


From (2.7.18) it follows that . Taking notice of (2.7.13), by (2.7.17) we derive

On the other hand, by Lemma 2.7.2 and (2.7.11), (2.7.17), and (2.7.44)

it follows that

where

From (2.7.39), (2.7.40), and (2.7.48) we see that

and hence , which contradicts (2.7.47). This means that the converse assumption of the lemma cannot hold.

Corollary  2.7.1 From Lemma 2.7.3 it follows that there exist 

and which is independent of and arbitrarily varying in

intervals and such that  

and for with sufficiently large the algorithm (2.1.1)–(2.1.3)

turns into an ordinary RM algorithm:

Set

Take and denote

By A2.7.2, Set


If in (2.7.2), then . In the general case may be positive.

Theorem 2.7.1 Assume A2.7.1–A2.7.4 hold and is given by (2.1.1)–(2.1.3) with (2.7.5) fulfilled. Then there exist and a nondecreasing, left-continuous function defined on such that for the sample path for which A2.7.3 holds,

whenever and , where and are the ones appearing in (2.7.2) and (2.7.3), respectively. As a matter of fact, can be taken as the inverse function of

Proof.  Given recursively define

We now show that exists such that

Set and assume

From the recursion of we have

Assume is large enough such that by A2.7.3


By a partial summation, from (2.7.57) we find that

where (2.7.58) is invoked. By (2.7.1) we see

Without loss of generality, we may assume . Then by (2.7.1) we have

Applying (2.7.60) and (2.7.61) to (2.7.59) leads to


and hence

which implies (2.7.56). For and , by (2.7.53)

Taking this into account, for , by (2.7.51)–(2.7.54) and Taylor’s expansion we have

Therefore, in the following Taylor’s expansion

we have and hence and

Denote

For we have


From (2.7.63) and (2.7.64) it then follows that

Similar to (2.7.62), we see that

Consequently, we arrive at

Define

It is clear that is nondecreasing as increases and

Take such that Then we have


Define the function

It is clear that is left-continuous, nondecreasing and

From (2.7.66) and (2.7.67) it follows that

which implies, by (2.7.57) and the definition of 

Corollary 2.7.2 If ( in (2.7.2) may not be zero), then

and the right-hand side of (2.7.55) will be

Since may be arbitrarily small, the estimation error may be arbitrarily small. If, in addition, in A2.7.3, then

letting and then in both sides of (2.7.55), we derive

In the case where , by letting , the right-hand side of (2.7.55) converges to

Consequently, as , the estimation error depends on how big is. If in (2.7.2), then can also be taken arbitrarily small, and the estimation error depends on the magnitude of

2.8. Dynamic Stochastic Approximation

So far we have discussed the root-searching problem for an unknown

function, which is unchanged during the process of estimation. We now consider the case where the unknown functions together with their roots change with time. To be precise, let be a sequence of unknown


functions with roots , i.e., . Let be the estimate for at time based on the observations

Assume the evolution of the roots satisfies the following equation

where are known functions, while is a sequence of dynamic noises.

The observations are given by

where is the observation noise and is allowed to depend on

In what follows the discussion is for a fixed sample, and the analysis is purely deterministic. Let us arbitrarily take as the estimate for and define

From equation (2.8.1), we see that may serve as a rough estimate for . In the sequel, we will impose some conditions on and so that

where is an unknown constant. Therefore, should not diverge to infinity. But is unknown, so we will use the expanding truncation technique.

Take a sequence of increasing numbers satisfying

Let be recursively defined by the following algorithm:

where denotes the number of truncations in (2.8.5) that have occurred until time


We list the conditions to be used.

A2.8.1 and 

A2.8.2 is measurable and for any

constant possibly depending on exists so that 

 for with

A2.8.3 is known such that 

 for where

and 

A2.8.4   and 

A2.8.5  There is a continuously  differentiable  function

such that     for  and for any

where is a positive constant possibly depending on and . A constant exists such that

where is an unknown constant that is an upper bound for 

A2.8.6 For  any convergent  subsequence the observation noise

satisfies

where

Remark 2.8.1 Condition A2.8.2 implies local boundedness, but the upper bound should be uniform with respect to . In A2.8.3, measures the difference between the estimation error and the


prediction error . In general, is greater than . For example, if , then A2.8.3 holds with . A2.8.4 means that in the root dynamics, the noise should be vanishing.

Condition A2.8.5 is about the existence of a Lyapunov function. To impose such a kind of condition is unavoidable in the convergence analysis of SA algorithms. Inequality (2.8.7) is an easy condition. For example, if as , then this condition is automatically satisfied. The noise condition A2.8.6 is similar to A2.2.3.

Before analyzing the convergence property of the algorithm (2.8.5), (2.8.6), and (2.8.2), we give an example of application of dynamic stochastic approximation.

Example 2.8.1 Assume that a chemical product is produced in a batch mode, and the product quality or quantity of the batch depends on the temperature in the batch. When the temperature equals the ideal one, the product is optimized. Let denote the deviation of the temperature from its optimal value for the batch, where denotes the control parameter, which may be, for example, the pressure in the batch, the quantity of catalytic promoter, the raw material proportion, and others. The deviation reduces to zero if the control equals its optimal value , i.e., . Because of environment changes, the optimal parameter may change from batch to batch. Assume

where is known and is the noise.

Let be the estimate for . Then may serve as a prediction for . Apply as the control parameter for the batch. Assume that the temperature deviation of for the th batch can be observed, but the observation may be corrupted by noise, i.e., , where is the observation noise.

Then we can apply the algorithm (2.8.5), (2.8.6), and (2.8.2) to estimate . Under conditions A2.8.1–A2.8.6, by Theorem 2.8.1 to be proved in

this section, the estimate is consistent, i.e.,
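To make the predict-then-correct structure of (2.8.5), (2.8.6), and (2.8.2) concrete, here is a minimal sketch under made-up dynamics; the map h, the noise models, the gains, and the truncation bounds are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
h = lambda x: 0.999 * x                 # assumed known root dynamics
theta, x = 5.0, 0.0                     # true time-varying root and its estimate
sigma, M = 0, [10.0 * 2.0 ** s for s in range(60)]

for k in range(1, 20001):
    theta = h(theta) + rng.normal(scale=1.0 / k)       # vanishing dynamic noise (A2.8.4)
    pred = h(x)                          # predict the new root from the old estimate
    y = -(pred - theta) + 0.5 * rng.standard_normal()  # noisy observation; root at theta
    cand = pred + (1.0 / k) * y          # correct the prediction
    if abs(cand) <= M[sigma]:
        x = cand
    else:
        x, sigma = 0.0, sigma + 1        # expanding truncation as in (2.8.5)

print(abs(x - theta))                    # estimation error, small for large k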

Theorem 2.8.1 Under Conditions A2.8.1–A2.8.6 the estimation error tends to zero as , where is given by (2.8.5), (2.8.6), and (2.8.2).

To prove the theorem we start with lemmas.


Lemma 2.8.1 Under A2.8.3 and 2.8.4, the sequence

is bounded for any

Proof.  By A2.8.3 and A2.8.4 from (2.8.1) it follows that

Lemma 2.8.2 Assume A2.8.1–A2.8.4 and A2.8.6 hold. Let be a convergent subsequence such that as . Then, there are a sufficiently small and a sufficiently large integer such that

 for 

where is implied by

  for where is a constant independent 

of 

Proof. In the case as

is bounded, and hence is bounded. By Lemma

2.8.1, is bounded. Therefore, is bounded. For

large and

The following expression (2.8.11) and estimate (2.8.12) will frequently be used. By (2.8.1) and A2.8.3 we have


and

Substitution of (2.8.12) into (2.8.10) leads to

By boundedness of and A2.8.3,

for some . By A2.8.4, , while the last term is also less than by A2.8.6. Without loss of generality, we may assume

Therefore, and the lemma is true for the case

We now consider the case as Let be so large

that for

with being a constant, and

where is given by (2.8.8).

as


Without loss of generality we may assume

Define and take T so small that . We prove the lemma by induction.

By (2.8.8) and (2.8.12), we have

Therefore, at time there is no truncation. Then by (2.8.11) and

(2.8.12) we have

where (2.8.14) and (2.8.15) have been used. Let the conclusions of the lemma hold for

We prove that it also holds for Again by (2.8.12), we have

Hence there is no truncation at time . By the inductive assumption, (2.8.11), and (2.8.12), it follows that

where (2.8.13) and (2.8.14) are invoked. Therefore, the conclusions of the lemma are also true for . This

completes the proof.

Lemma 2.8.3 Assume A2.8.1–A2.8.6 hold. Then the number of truncations in (2.8.5) is finite and is bounded.


and (2.8.11) we have

Notice that by Lemma 2.8.2 and (2.8.13)

for sufficiently large From (2.8.21) and (2.8.23), it follows that

On the other hand, by Lemma 2.8.2

Identifying and in A2.8.5 to and respectively, we can

find such that

by A2.8.5.


Let us consider the right-hand side of (2.8.22). Noticing

by A2.8.3 and A2.8.4 we have

By A2.8.6,

Noticing that

as , and by continuity of we find that tends to zero as and

Since the sum of the first and second

terms on the right-hand side of (2.8.22) is as and

Combining this with (2.8.26) yields the following conclusion: for

with sufficiently large and for small enough T   from (2.8.22) it

follows that

By (2.8.20), letting tend to infinity, from (2.8.30) we derive

By Lemma 2.8.2 we have

However, by definition, and . Hence from (2.8.32) we must have

if  T   is small enough. Therefore, This contradicts

(2.8.31). The obtained contradiction shows that


Theorem 2.8.2 Assume A2.8.1–A2.8.6 hold. Then the estimation error tends to zero as

Proof. We first show that converges. Assume the converse:

where because is bounded by Lemma 2.8.3. It is clear that there exists an interval that does not contain zero such that . Without loss of generality, assume

. From A2.8.6, it follows that there are infinitely many sequences such that and that

for . Without loss of generality we may assume converges:

Since , exists such that , and by Lemma 2.8.2, . Completely the same argument as that used for (2.8.22)–(2.8.32) leads to a contradiction. Hence is convergent.

We now show that as . Assume the converse: there is a subsequence . By the same argument we again arrive

at (2.8.30). Letting , by convergence of we obtain a contradictory inequality . This implies that as

The following theorem is similar to Theorem 2.4.1.

Theorem 2.8.3  Assume A2.8.1–A2.8.5 hold and is continuous at 

uniformly in Then as if and only if A2.8.6  

holds. Furthermore, under conditions A2.8.1–A2.8.5, the following three

conditions are equivalent.

1) Condition A2.8.6;

2)

3) can be decomposed into two parts: so that 

Proof. Assume as . Then is bounded. We have shown in the proof of Lemma 2.8.3 that the number of truncations must be finite if is bounded. Therefore, starting from some , the algorithm (2.8.5) becomes


and as


From (2.8.11) we have

Set

By A2.8.3 and A2.8.4 and as

while tends to zero because is uniformly continuous at and . Consequently, 3) holds.

On the other hand, it is clear that 3) implies 2), which in turn implies A2.8.6. By Theorem 2.8.1, under A2.8.1–A2.8.5, Condition A2.8.6 implies as

Thus, the equivalence of 1)–3) has been justified under A2.8.1–A2.8.5.

2.9. Notes and References

The initial version of SA algorithms with expanding truncations and its associated analysis method were introduced in [27], where the algorithm was called SA with randomly varying truncations. Convergence results for this kind of algorithm can also be found in [14, 28]. The theorems given in Section 2.2 are improved versions of those given in [14, 27, 28]. Theorems in Section 2.3 can be found in [18]. Necessity of the noise condition is proved in [24, 94] for the single-root case, and in [17] for the multi-root case.

Convergence results of SA algorithms with additive noise can be found in [16]. Concerning measure theory, we refer to [31, 76, 84]. Results given in Section 2.6 can be found in [48], and some related problems are discussed in [3]. For the proof of Remark 2.6.4 we refer to Theorem 3.3 in [34]. Example 2.6.1 can be found in [93]. Robustness of SA algorithms is presented in [24]. Dynamic SA was considered in [38, 39, 91], but the results presented in Section 2.8 are given in [25].


Chapter 3

ASYMPTOTIC PROPERTIES OF STOCHASTIC APPROXIMATION ALGORITHMS

In Chapter 2 we were mainly concerned with the path-wise convergence analysis for SA algorithms with expanding truncations. Conditions were given to guarantee , where J denotes the root set of the unknown function and the estimate for the unknown root given by the algorithm.

In this chapter, for the case where J consists of a singleton , we consider the convergence rate of , asymptotic normality of , and asymptotic efficiency of the estimate. Assume is differentiable at . Then as ,

where . It turns out that the convergence rate heavily depends on whether or not F is degenerate. Roughly speaking, in the case where the step size in (2.1.1) is , the convergence rate of is for some positive when F is nondegenerate, and for some when F vanishes.

It will be shown that is asymptotically normal and that the covariance matrix of the limit distribution depends on the matrix D if in (2.1.1) the step size is replaced by . If F in (3.0.1) were available, then D could be defined to make the limiting covariance matrix minimal, i.e., to make the estimate efficient. However, this is not the case in SA. To overcome the difficulty, one way is to derive an approximate value of F by estimating it, but for this one has to impose rather heavy conditions on . Efficiency here is achieved by using a sequence of slowly



decreasing step sizes, and the averaged estimate appears asymptoticallyefficient.

3.1. Convergence Rate: Nondegenerate Case

In this section, we give the rate of convergence of to zero in the case where F in (3.0.1) is nondegenerate, where is given by (2.1.1)–(2.1.3). It is worth noting that F is the coefficient of the first-order term in the Taylor expansion of .

The following conditions are to be used.

A3.1.2  A continuously differentiable function exists

such that 

 for any and for some with

where is used in (2.1.1).

A3.1.3  For the sample path under consideration the observation noise

in (2.1.3) can be decomposed into two parts such that 

 for some

A3.1.4 is measurable and locally bounded, and is differentiable at 

such that as

The matrix F is stable (this implies nondegeneracy of F); in addition, is also stable, where and are given by (3.1.1) and (3.1.3), respectively.

By stability of a matrix we mean that all its eigenvalues have negative real parts.



Remark 3.1.1 We now compare A3.1.1–A3.1.4 with A2.2.1–A2.2.4. Because of the additional requirement (3.1.1), A3.1.1 is stronger than A2.2.1, but it is automatically satisfied if with . In this case a in (3.1.1) equals . Also, (3.1.1) is satisfied if with . In this case . Take sufficiently small such that . Then and . Assume is a martingale difference sequence with . Then, by the convergence theorem for martingale difference sequences, . Therefore (3.1.3) is satisfied a.s. with . Condition A3.1.4 assumes differentiability of , which is not required in A2.2.4.

Lemma  3.1.1  Let and H be -matrices. Assume H is stable

and If satisfies A3.1.1 and l-dimensional vectors

satisfy the following conditions

then defined by the following recursion with arbitrary initial value

tends to zero:

Proof. Set

We now show that there exist constants and such that

Let S  be any negative definite matrix. Consider



and hence

where denotes the minimum eigenvalue of  P.

Paying attention to that

from (3.1.13) we derive

which verifies (3.1.8). From (3.1.6) it follows that

We have to show that the right-hand side of (3.1.14) tends to zero as

For any fixed , because of (3.1.1) and (3.1.8). This implies that as for any initial value .

Since as , for any exists such that . Then by (3.1.8) we have

The first term on the right-hand side of (3.1.15) tends to zero by A3.1.1, while the second term can be estimated as follows:

as


where the first inequality is valid for sufficiently large since as , and the second inequality is valid when .

Therefore, the right-hand side of (3.1.15) tends to zero as , and then

Set

By assumption of the lemma, . Hence, for any there exists such that . By a partial summation, we have

where, except for the last term, the sum of the remaining terms tends to zero as by (3.1.8) and

Let us now estimate


Since for and as   by (3.1.8)

we have

which tends to zero as and by (3.1.16) and the fact that . Thus, the right-hand side of (3.1.17) tends to zero as , and the proof of the lemma is completed.

Theorem 3.1.1 Assume A3.1.1–A3.1.4 hold. Then given by (2.1.1)–

(2.1.3) for those sample paths for which (3.1.3) holds converges to

with the following convergence rate:

where is the one given in (3.1.3).

Proof.  We first note that by Theorem 2.4.1 and there is no

truncation after a finite number of steps. Without loss of generality, we

may assume . By (3.1.1), . Hence, by the Taylor expansion we

have

Write given by (3.1.4) as follows

where

By (3.1.4) and (3.1.19), for sufficiently large k we have


if is a martingale difference sequence with

So, for (3.1.25) it is sufficient to require

Since the best convergence rate is achieved at , the convergence rate is . Since the convergence rate slows down as approaches to , when , (3.1.25) cannot be guaranteed. From this it is seen that the convergence rate depends on how big is.
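The dependence of the pathwise rate on the exponent can be checked numerically. The sketch below is our illustration, not part of the text: it assumes the scalar RM recursion with h(x) = -x (so F = -1 is stable), root x0 = 0, step a_k = 1/k, and i.i.d. standard normal observation noise, and scales the final error by n**delta for several delta.

# Our illustration: pathwise convergence rate of a scalar RM algorithm.
# h(x) = -x, root x0 = 0, a_k = 1/k, iid standard normal noise.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = 5.0
for k in range(1, n + 1):
    x += (1.0 / k) * (-x + rng.normal())

for delta in (0.25, 0.49, 0.75):
    print(f"delta = {delta}: n**delta * |x_n| = {n ** delta * abs(x):.3f}")
# Over repeated runs the scaled error stays moderate for delta < 1/2
# but grows with n for delta > 1/2: the pathwise rate here cannot be
# pushed beyond the 1/2 threshold.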

3.2. Convergence Rate: Degenerate Case

In the previous section, for obtaining the convergence rate of , stability and hence nondegeneracy of F is an essential requirement. We now consider what happens if the linear term vanishes in the Taylor expansion of . For this we introduce the following set of conditions:

A3.2.2  A continuously differentiable function exists

such that 

for any and for some with , where is used in (2.1.1);

A3.2.3 For the observation noise on the sample path under con-sideration the following series converges:

where

A3.2.4 is measurable and locally bounded, and is differentiable at 

such that as

where F is a stable matrix, and   is the one used in A3.2.3.

We first note that, in comparison with A3.1.1–A3.1.4, here we do not require (3.1.1), but A3.2.2 is the same as A3.1.2. From (3.2.3) we see that



the Taylor expansion of does not contain the linear term. Here F is the coefficient of a term higher than second order in the Taylor expansion of . The noise condition A3.2.3 is different from A3.1.3, but, as shown by the following lemma, it also implies A2.2.3.

Lemma 3.2.1  If  (3.2.2) holds, then and hence A2.2.3

is satisfied.

Proof. We need only to show

Setting

by a partial summation we have

Since as and converges as , the first two terms on the right-hand side of (3.2.4) tend to zero as and

The last term in (3.2.4) is dominated by

where

By the following elementary calculation we conclude that the right-hand side of (3.2.5) tends to zero as and

as


which tends to zero as and because as

This, combined with (3.2.4) and (3.2.5), shows that

By the Lyapunov equation (3.1.9), there is a positive definite matrix P > 0 such that

Assuming is large enough so that there is no truncation, by (3.2.3) we have

where is the maximum eigenvalue of P given by (3.2.6).

We start with lemmas. Note that by Theorem 2.2.1 or 2.4.1, . Therefore, starting from some , the algorithm has no truncation.

Define

Denote by and the maximum and minimum eigenvalues of  P, respectively, and by  K  the condition number

Theorem 3.2.1  Assume A3.2.1–A3.2.4 hold and is given by (2.1.1)

 –(2.1.3).  Then for the sample paths where A3.2.3 holds the following

convergence rate takes place:


consider the case , since if it is not true then is clearly bounded.

Let P be given by (3.2.6). We have

where

In what follows we will prove that

By (3.2.10) and (3.2.6) it is clear that

where the last inequality follows by the following consideration:

By (3.2.11) so for (3.2.16) it suffices to show that

By definition of we have and hence

or


Consequently,

and by the agreement

which verifies the last inequality in (3.2.16).

We now estimate . By (3.2.10), (3.2.11), and the agreement , we have

Noticing that, as agreed, , from (3.2.17) we have

and by (3.2.13),

Again, from (3.2.10) and noticing we have

Consequently, by (3.2.12)

Combining (3.2.14), (3.2.16), (3.2.18), and (3.2.20) yields

for

and


Proof of Theorem 3.2.1. By Lemma 3.2.2 and the fact

we have

where

By setting

from (3.2.9) it follows that

This is nothing else but an RM algorithm. Since by Lemma 3.2.2 is bounded, no truncation is needed and one may apply Theorem 2.2.1''.

First note that

Hence, A2.2.1 is satisfied.

as . So A2.2.3 holds with replaced by .

A2.2.4 is clearly satisfied, since is continuous. The key issue is to find a satisfying A2.2.2''.

Take

and define , which is closed. Notice that and

as

by


For . Then we have

This means that

and the condition A2.2.2'' holds. By Theorem 2.2.1'', . This implies

which in turn implies (3.2.7) by (3.2.8).

Imposing some additional conditions on F, we may obtain results more precise than (3.2.7) by using different Lyapunov functions.

Theorem  3.2.2  Assume A3.2.1–A3.2.4 hold, in addition, assume F is

normal, i.e., Let be given by (2.1.1)–(2.1.3). Then

 for those sample paths for which A3.2.3 holds, converges

to either zero or one of where denotes an eigenvalue of  

 More precisely,

where is a unit eigenvector of H corresponding to

Proof.  Since F  is stable, the integral

is well defined. Noticing that we have

and

and

for


This means that H is also stable. Therefore, all eigenvalues are negative. Further, by we find

and hence

We consider (3.2.23) and take

By (3.2.26) we have

Define

Obviously,

for any

Clearly,

where is the dimension of Thus,  J  is a discrete set, and is nowhere dense because is

continuous. This together with (3.2.28) shows that A2.2.2’ is satisfied.

and


By Theorem 2.2.1’, and (3.2.25) is verified.

Corollary 3.2.1  Let  Then

 In this case,

and hence (3.2.7) and (3.2.25) are respectively equivalent to

and 

Remark 3.2.1 For , the convergence rate given by (3.1.18) for the nondegenerate case is , while for the degenerate case it is by (3.2.29), which is much slower.
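The contrast between the two regimes is easy to observe numerically. The sketch below is our illustration, not part of the text: it assumes a scalar RM recursion and compares h(x) = -x (nondegenerate linearization at the root) with h(x) = -x**3 (the linear term of the Taylor expansion vanishes), under the same step sizes and noise.

# Our illustration: nondegenerate vs. degenerate convergence rate.
import numpy as np

rng = np.random.default_rng(2)

def rm(h, n=100_000, x0=2.0):
    # Robbins-Monro recursion x_{k+1} = x_k + a_k * (h(x_k) + eps_k)
    x = x0
    for k in range(1, n + 1):
        x += (1.0 / k) * (h(x) + 0.1 * rng.normal())
    return x

print("nondegenerate |x_n|:", abs(rm(lambda x: -x)))        # h'(0) = -1
print("degenerate    |x_n|:", abs(rm(lambda x: -x ** 3)))   # h'(0) = 0, slower

On typical runs the degenerate iterate is an order of magnitude (or more) farther from the root after the same number of steps.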

3.3. Asymptotic Normality

In Theorem 3.1.1 we have shown that for given by (2.1.1)–(2.1.3). As shown in Remark 3.1.2, . This is a path-wise result. Assuming the observation noise is a random sequence, we show that is asymptotically normal, i.e., the distribution of converges to a normal distribution as . This convergence implies that the convergence rate cannot be improved to .

We first consider the linear regression case, i.e., is a linear function, but may be time-varying.

Let us introduce a central limit theorem on double-indexed random variables. We formulate it as a lemma.

Lemma  3.3.1   Let  be an array of   l-dimensional random

vectors. Denote

as

for

if 


and 

 Assume

and 

Then

where and hereafter denotes the normal distribution with mean and covariance S.
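Before turning to the proofs, a quick empirical check of asymptotic normality may be useful. The following sketch is our illustration, with all concrete choices assumed: it runs many independent copies of the scalar recursion x_{k+1} = x_k + (1/k)(-x_k + eps_k) with eps_k i.i.d. N(0, 1); the classical scalar computation gives limiting variance 1 for sqrt(k) * x_k in this case, and the empirical moments should reflect this.

# Our illustration: empirical check of asymptotic normality for a
# scalar RM recursion; sqrt(n) * x_n should be approximately N(0, 1).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20_000, 2000
x = np.ones(reps)                      # 2000 independent replications
for k in range(1, n + 1):
    x += (1.0 / k) * (-x + rng.normal(size=reps))

scaled = np.sqrt(n) * x
print("empirical mean:", scaled.mean())    # close to 0
print("empirical var :", scaled.var())     # close to 1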

Let us first consider the linear recursion (3.1.6) and derive its asymptotic normality. We keep the notation introduced in (3.1.7).

We have obtained the estimate (3.1.8) for and now derive more properties of it.

 H where H is stable. Then for any

Proof. By (3.1.8) it follows that


We will use the following elementary inequality

which follows from the fact that the function equals

zero at  x  = 0 and its derivative By (3.3.8), we derive

which implies

Assume is sufficiently large such that Then

where for the last inequality (3.3.9) is invoked.

Combining (3.3.7) and (3.3.10) gives (3.3.6).

Lemma 3.3.3 Set 


Under conditions of Lemma 3.3.2,

uniformly  with  respect   to and 

uniformly with respect to

Proof.  Expanding to the series

with we have

where by definition

By stability of  H , there exist constants and  p > 0 such that

Putting (3.3.13) into (3.3.12) yields that for any

where for the last inequality is assumed to be sufficiently large such that , and (3.1.8) is used too.

as

as


Since and may be arbitrarily small, the conclusions of the lemma follow from (3.3.14) by Lemma 3.3.2.

Lemma 3.3.4  Assume as and  

 Let A, B, and Q be matrices and let A and B be stable. Then

Proof. For any T  > 0 define

Since for fixed T. Denoting

by we then have Consequently,

serves as an integral sum for or equivalently, for

and hence

Therefore, for (3.3.15) it suffices to show that

Similar to (3.3.10), by stability of A we can show that there is a constant such that


By stability of A and B, constants and exist such that

Consequently, we have

which verifies (3.3.18) and completes the proof of the lemma.

Theorem 3.3.1  Let be given by (3.1.6) with an arbitrarily given

initial value. Assume the following conditions hold:

where are constant  matrices with is

a martingale difference sequence of   dimension satisfying the following

conditions:

  and 

and is stable;

and 

as


and 

Then is  asymptotically normal:

where

Proof.  Define by the following recursion

By (3.1.6) it follows that

Using (3.3.19) we have

Consequently,

where


and

by (3.3.20). Define

By (3.3.30) and stability of A, from (3.1.8) it follows that constants and exist such that

Consequently, from (3.3.29) we have

The first term on the right-hand side of (3.3.34) tends to zero as

by (3.3.33), while the second term is estimated as follows. By (3.3.31)

where for the last equality Lemma 3.3.2 and (3.3.33) are used. This means that and have the same limit distribution if exists.

Consequently, for the theorem it suffices to show

Similar to (3.3.29) and (3.3.31), by (3.3.28) we have


Noticing

by Lemma 3.3.2 and (3.1.8), we find that the last term of (3.3.36) tends

to zero in probability. Therefore, for (3.3.24) it suffices to show

We now show that for (3.3.37) it is sufficient to prove

For any fixed   we  have

By (3.3.21) we have

where convergence to zero follows from   and  Lemma  3.3.2.

It is worth noting that the convergence is uniform with respect to . By (3.3.21) and , we see that this


implies that the second term on the right-hand side of (3.3.39) tends to zero in probability. The first term on the right-hand side of (3.3.39) can be rewritten as

By (3.3.33) for any fixed   we estimate the first term of (3.3.40) as follows

while for the second term we have

since and

We now show that the last term of (3.3.40) also converges to zero in probability as

Notice that by (3.3.28), for any fixed and

Therefore, for a fixed  there exist constants

and such that

as


Then the last term of (3.3.40) is estimated as follows:

For the first term on the right-hand side of (3.3.44) we have

where the last inequality is obtained because is bounded

by some constant by (3.3.30). Since is fixed, in order to prove that the right-hand side of (3.3.45) tends to zero as , it suffices to show


By (3.3.33), for any fixed

while for any given we may take sufficiently large such that

Therefore,

by Lemma 3.3.2. Incorporating (3.3.47) with (3.3.48) proves (3.3.46). Therefore, the

right-hand side of (3.3.45) tends to zero as This implies

that the first term on the right-hand side of (3.3.44) tends to zero in probability. By (3.3.43), for the last term of (3.3.44) we have

which tends to zero as , as can be shown by an argument similar to that used for (3.3.45).

In summary, we conclude that the right-hand side of (3.3.44) tends to zero in probability, and hence all terms in (3.3.40) tend to zero in probability. This implies that the right-hand side of (3.3.39) tends to zero in probability as , and then . Thus, we have shown

that for (3.3.37) it suffices to show (3.3.38).

We now intend to apply Lemma 3.3.1, identifying

to in that lemma. We have to check the conditions of the lemma. Since is a martingale difference sequence, (3.3.1) is obviously

satisfied.


By (3.3.22) and Lemma 3.3.2,

This verifies (3.3.3). We now verify (3.3.2). We have

where the last term tends to zero by (3.3.22) and Lemma 3.3.2. We show that the first term on the right-hand side of (3.3.49) tends

to (3.3.25).

With A and respectively identified to  H  and in Lemma 3.3.3,

by Lemmas 3.3.2 and 3.3.3 we have

This incorporating with (3.3.49) leads to

By Lemma 3.3.4 we conclude

Finally, we have to verify (3.3.4).


By (3.3.33) we have

Noticing that uniformly with respect to

since or equivalently,

uniformly with respect to by (3.3.23) we have

Consequently, for any by Lemma 3.3.2

Thus, all conditions of Lemma 3.3.1 hold, and by this lemma we conclude (3.3.38). The proof is completed.

Remark 3.3.1 Under the conditions of Theorem 3.3.1, if integers

are such that then it can be

shown that converges in distribution to

where is a stationary Gaussian Markov process satisfying

the following stochastic differential equation

where is the   standard Wiener process.


Corollary 3.3.1 From (3.1.7) and (3.3.28), similar to (3.3.29)–(3.3.31), we have

and

By (3.3.33), the first term on the right-hand side of (3.3.50) tends to zero as . Note that the last term in (3.3.34) has been proved to vanish as , and it is just a different way of writing . Therefore, from (3.3.50), by Theorem 3.3.1, it follows that for any fixed

We have discussed the asymptotic normality of for the case where is linear. We now consider the general . Let us first introduce the conditions to be used.

and 

A3.3.2  A continuously differentiable function  exists such that 

 for any and for some with

where is used in (2.1.1).

 for some


where is a martingale difference sequence satisfying (3.3.21)–  

(3.3.23).

A3.3.3

A3.3.4 is measurable and locally bounded. As

where with a specified in (3.3.52) is stable and 

satisfying which  is specified   in (3.3.53).

Theorem 3.3.2 Let be given by (2.1.1)–(2.1.3) and let A3.3.1–A3.3.4 hold. Then

where

Proof. Since there exists such that

which implies From (3.3.53) it follows that

This together with the convergence theorem for martingale difference sequences yields


which implies

Since , from it follows that . Stability of is implied by stability of , which is a part of A3.3.4. Then by Theorem 3.1.1

By (3.3.55) and (3.3.58) we have

From Theorem 3.1.1 we also know that there is an integer-valued (possibly depending on sample paths) such that and there is no truncation in (2.1.1) for . Consequently, for we have

Denoting

by (3.3.59) and (3.3.54) we see a.s. Then (3.3.60) is written as

By (3.3.28) it follows that

where


Using introduced by (3.3.32), we find

By the argument similar to that used in Corollary 3.3.1, we have

and as

Then by (3.3.51) from (3.3.63) we conclude (3.3.56).

Corollary 3.3.2 Let D be an matrix and let in (2.1.1)–(2.1.2) be replaced by . In other words, instead of (2.1.1) and (2.1.2), if we consider

then this is equivalent to replacing and by and , respectively.

In this case the only modification to be made in the conditions of Theorem 3.3.2 is that stability of in A3.3.4 should be replaced by stability of . The conclusion of Theorem 3.3.2 remains valid with the only modification that and F in (3.3.57) should be replaced by and DF, respectively.

3.4. Asymptotic Efficiency

In Corollary 3.3.2 we have mentioned that the limiting covariance matrix S(D) for depends on D, if in (2.1.1)–(2.1.3) is replaced by . By efficiency we mean that S(D) reaches its minimum with respect to D.

Denote


In what follows we will show that is asymptotically normal and is asymptotically efficient.

We list the conditions to be used.

A3.4.1 nonincreasingly converges to zero,

and for some

A3.4.2 A continuously differentiable function exists such that 

 for any and   for some with

where is used in (2.1.1).

A3.4.3 The observation noise is such that  

with being a constant independent of   and 

where is specified in (3.4.7).

A3.4.4 is measurable and locally bounded. There exist a stable matrix F, and such that


where  is a constant.

Remark 3.4.1 It is clear that satisfies A3.4.1. From (3.4.7) it follows that

where denotes the integer part of 

Since is nonincreasing, from (3.4.12) we have

which implies

or

Remark 3.4.2 If with being a martingale

difference sequence satisfying (3.3.21)–(3.3.23), then identifying to

in Lemma 3.3.1, by this lemma we have

where   is given by (3.4.1). Thus, in this case the second condition in(3.4.8) holds.

We now show that the first condition in (3.4.8) holds too. By the estimate for the weighted sum of martingale difference sequences (see Appendix B) we have

which incorporating with (3.4.13) yields


It is clear that (3.4.9) is implied by (3.3.21). Therefore, in the present case all requirements in A3.4.3 are satisfied.

Theorem 3.4.2 Assume A3.4.1–A3.4.4 hold. Let be given by (2.1.1)–(2.1.3) and be given by (3.4.5). Then is asymptotically efficient:

Prior to proving the theorem we establish some properties of the slowly decreasing step sizes. Set

By (3.1.8) we have

where and are constants.

Set

Lemma  3.4.1  i) The following estimate takes place

where o(1) denotes a magnitude that tends to zero as

ii) is uniformly bounded with respect to both and 

and 

Proof. i) By (3.4.6) we know that


and

which implies (3.4.17) since as .

ii) By (3.4.6), as , and hence for any we have

where denotes the integer part of . Using (3.4.15) we have

for any , where the first term on the right-hand side tends to zero as by (3.4.20), and the last term tends to zero as . Therefore, for (3.4.18) it suffices to show

Noticing that (3.4.13) implies for any we have


Lemma 3.4.2 Under Conditions A3.4.1–A3.4.4, there exists an integer-valued such that a.s., a.s., and given by (2.1.1)–(2.1.3) has no truncation for , i.e.,

and a.s.

Proof. If we can show that A2.2.3 is implied by A3.4.3, then all conditions of Theorem 2.2.1 are fulfilled a.s., and the conclusions of the lemma

follow from Theorem 2.2.1.

Since we have

which means that (2.2.2) is satisfied for . We now check (2.2.2) for . By a partial summation we have

where (3.4.6) is used and asBy (3.4.8) the first two terms on the right-hand side of (3.4.34) tend

to zero as by the same reason and by the fact

the last term of (3.4.34) also tends to zero as This means thatsatisfies (2.2.2), and the lemma follows.

By Lemma 3.4.2 we have


and by (3.4.14)

For specified in (3.4.11) and a deterministic integer define the

stopping time as follows

From (3.4.35) we have

and

Lemma 3.4.3 If A3.4.1–A3.4.4 hold, then

is uniformly bounded with respect to


Proof. By (3.4.11) and (3.4.15) from (3.4.39) we have

where respectively denote the terms on the right-hand side of the inequality in (3.4.40).

By (3.4.19) we see

where as . From this we find that is bounded in if is large enough so that . By (3.4.19) we estimate as follows:

where is assumed to be large enough such that

Thus, by (3.4.9)


We now pay attention to (3.3.10) in the proof of Lemma 3.3.2 and find that the right-hand side of (3.4.42) is bounded with respect to .

For by (3.4.19) and (3.4.10) we have

where is a constant. Again, by (3.3.10), is bounded in . It remains to estimate . By the Schwarz inequality we have

By (3.4.19), for large enough

which, as shown by (3.3.11), is bounded in ; we then by (3.4.37) have

where is a constant.

Combining (3.4.40)–(3.4.44) we find that there exists a constant such that


Setting

and

from (3.4.45) we have

where is a constant. Denoting

from (3.4.48) we find

where is set equal to 1.

From (3.4.48) and (3.4.50) it then follows that

which combined with (3.4.46) leads to


where for the last equality we have used (3.4.47). Choosing sufficiently small so that

from (3.4.51) we then have

which is bounded with respect to as shown by (3.3.10).

Lemma 3.4.4 If A3.4.1–A3.4.4 hold, then

Proof.  It suffices to prove

Then the lemma follows from (3.4.53) by using the Kronecker lemma. By (3.4.11) and (3.4.37) we have

where the last inequality follows by using the Lyapunov inequality.


By Lemma 3.4.2, a.s. and

Consequently,

where as

Noticing we have

and hence

By (3.4.16) and (3.4.57), from here we derive

By Lemma 3.4.1, is bounded. Then with the help of (3.4.58) we have


From (3.4.58) and the boundedness of , there exists a constant such that

Then, we have

where the convergence to zero a.s. follows from Lemma 3.4.4. Putting (3.4.59) and (3.4.61) into (3.4.56) leads to

By (3.4.58) we then have

Notice that

Let us denote by the upper bound for , where the existence of is guaranteed by Lemma 3.4.1. Then using (3.4.9) and (3.4.18) we have


This, incorporating with (3.4.8), implies the conclusion of the theorem.

This theorem tells us that if in (2.1.1)-(2.1.3) we apply the slowly

decreasing step size, then the averaged estimate leads to the minimal

covariance matrix of the limit distribution.
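A small simulation may make the point concrete. The sketch below is ours, not from the text: it assumes the slowly decreasing step a_k = k**(-2/3) (a choice of the type allowed by A3.4.1), the scalar function h(x) = -x with root 0, i.i.d. N(0, 1) noise, and compares the spread of the raw iterate with that of the averaged estimate over many replications.

# Our illustration of averaging with slowly decreasing step sizes.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 50_000, 1000
x = np.ones(reps)                 # raw iterates, 1000 replications
xbar = np.zeros(reps)             # running averages of the iterates
for k in range(1, n + 1):
    x += k ** (-2.0 / 3.0) * (-x + rng.normal(size=reps))
    xbar += (x - xbar) / k        # xbar_k = (1/k) * (x_1 + ... + x_k)

print("var of sqrt(n) * x_n   :", n * x.var())
print("var of sqrt(n) * xbar_n:", n * xbar.var())

On typical runs the averaged estimate shows a much smaller scaled variance than the raw iterate, in line with the asymptotic efficiency of the averaged estimate.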

3.5. Notes and References

Convergence rates and asymptotic normality can be found in [28, 68, 78] for the nondegenerate case. The rate of convergence for the degenerate case was first considered by Pflug in [74]. The results presented in Section 3.2 are given in [15, 47].

For the proof of the central limit theorem (Lemma 3.3.1) we refer to [6, 56, 78], while for Remark 3.3.1 we refer to [78]. The proofs of Theorems 3.3.1 and 3.3.2 can be found in [28].

Asymptotic normality of stochastic approximation algorithms was first considered in [44].

For asymptotic efficiency the averaging technique was introduced in [80, 83] and further considered in [35, 59, 66, 67, 74, 98]. Theorems given in Section 3.4 can be found in [13]. For adaptive stochastic approximation refer to [92, 95].


Chapter 4

OPTIMIZATION BY STOCHASTIC

APPROXIMATION

Up to now we have been concerned with finding roots of an unknown function observed with noise. In applications, however, one often faces the optimization problem, i.e., finding the minimizer or maximizer of an unknown function . It is well known that achieves its maximum or minimum values on the root set of its gradient, i.e., at , although possibly only in the local sense. The gradient is also written as .

If the gradient can be observed with or without noise, then the optimization problem reduces to the SA problem we have discussed in previous chapters. Here, we consider the optimization problem for

the case where the function itself rather than its gradient is observed

and the observations are corrupted by noise. This problem was solved

by the classical Kiefer-Wolfowitz (KW) algorithm which took the finite

differences to serve as estimates for the partial derivatives. To be precise,

let be the estimate at time for the minimizer (maximizer) of  and let

be two observations on at time with noises and

respectively, where

are two vectors perturbed from the estimate by and , respectively, in the component of . The KW algorithm suggests taking


the finite difference

as the observation of the component of the gradient . It is clear that

where the component of equals

The RM algorithm

with defined above is called the KW algorithm.
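The construction can be made concrete with a short sketch. The code below is ours, not from the text: it assumes a smooth quadratic test function L with minimizer (1, ..., 1), step sizes a_k = 1/k, difference widths c_k = k**(-1/4), and additive noise on each function evaluation, and it implements a classical KW iteration with two-sided finite differences (2l noisy observations per step).

# A minimal sketch (not from the text) of the classical KW algorithm:
# estimate a minimizer from noisy function values only, forming
# finite differences coordinate by coordinate.
import numpy as np

rng = np.random.default_rng(5)
l = 3                                           # dimension
L = lambda x: 0.5 * np.sum((x - 1.0) ** 2)      # assumed test function

x = np.zeros(l)
for k in range(1, 20_001):
    a_k = 1.0 / k                               # step size
    c_k = k ** -0.25                            # difference width, c_k -> 0
    grad = np.empty(l)
    for i in range(l):                          # 2*l noisy observations per step
        e = np.zeros(l)
        e[i] = c_k
        y_plus = L(x + e) + 0.1 * rng.normal()
        y_minus = L(x - e) + 0.1 * rng.normal()
        grad[i] = (y_plus - y_minus) / (2.0 * c_k)
    x -= a_k * grad                             # RM step along the difference estimate

print("KW estimate:", x)                        # close to the minimizer (1, 1, 1)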

It is understandable that in the classical theory for convergence of the KW algorithm rather restrictive conditions are imposed not only on but also on and . Besides, at each iteration, to form the finite

differences, observations are needed, where is the dimension of 

In some problems may be very large; for example, in the problem of optimizing the weights of a neural network, corresponds to the number of nodes, which may be large. Therefore, it is of interest not only to weaken the conditions required for convergence of the optimizing algorithm but also to reduce the number of observations per iteration.

In Section 4.1 the KW algorithm with expanding truncations using randomized differences is considered. As will be shown, because finite differences are replaced by randomized differences, the number of observations per iteration is reduced from to 2, and because expanding truncations are involved in the algorithm and the TS method is applied for the convergence analysis, the conditions needed for have been weakened significantly and the conditions imposed on the noise have been improved to the weakest possible. The convergence rate and asymptotic normality for the KW algorithm with randomized differences and expanding truncations are given in Section 4.2.

The KW algorithm, like other gradient-based optimization algorithms, may get stuck at a local minimizer (or maximizer). How to approach the global optimizer is one of the important issues in optimization theory. Especially, how to pathwise reach the global optimizer is a difficult and challenging problem. In Section 4.3 the KW algorithm is combined with a search over initial values, and it is shown that the resulting algorithm a.s. converges to the global optimizer of the unknown function


The obtained results are then applied to some practical problems in Section 4.4.

4.1. Kiefer-Wolfowitz Algorithm with Randomized Differences

There is a fairly long history of random search or approximation ideas in SA. Different random versions of the KW algorithm have been introduced: for example, in one version a sequence of random unit vectors that are independent and uniformly distributed on the unit sphere or unit cube was used; in another version the KW algorithm with random directions was introduced and was called the simultaneous perturbation stochastic approximation algorithm.

Conditions on

Let be a sequence of independent andidentically distributed (iid) random variables such that

Furthermore, let be independent of the algebra generated by

is the observation noise to be explained later.For convenience of writing let us denote

It should be emphasized that is a vector and is irrelevant to inverse.At each time two observations are taken: either

or


where is the estimate for the sought-for minimizer (maximizer) of , denote the observation noises, and is a real number.

The randomized differences are defined as

and

may serve as observations of the randomized differences. To be specific, let us consider the observations defined by (4.1.3) and (4.1.4). The convergence analysis, however, can analogously be done for the observations (4.1.5) and (4.1.6).

where

We now define the KW algorithm with expanding truncations and randomized differences. Let be a sequence of positive numbers increasingly diverging to infinity, and let be a fixed point in . Given any initial value , the algorithm is defined by:

where is given by (4.1.9) and (4.1.10). It is worth noting that the algorithm (4.1.9)–(4.1.12) differs from (2.1.1)–(2.1.3) only in the observations : (4.1.11) and (4.1.12) are exactly the same as (2.1.1) and (2.1.2), but (4.1.9) and (4.1.10) define the observations in a different way.
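For contrast with the 2l observations per step of the classical scheme, here is a sketch (ours, not from the text) of the randomized-difference KW algorithm with expanding truncations. The concrete choices are assumptions: the components of Delta_k are i.i.d. taking values +1 and -1 with probability 1/2 each (one standard choice compatible with the conditions on Delta_k above), the test function is quadratic, and the truncation bounds are M_s = 4 * 2**s. Only two noisy observations are used per iteration, whatever the dimension.

# A sketch (ours) of the KW algorithm with expanding truncations and
# randomized differences: 2 observations per iteration in any dimension.
import numpy as np

rng = np.random.default_rng(6)
l = 10                                          # dimension
L = lambda x: 0.5 * np.sum(x ** 2)              # assumed test function, min at 0
M = lambda s: 4.0 * 2.0 ** s                    # expanding truncation bounds
x_fix = np.full(l, 0.5)                         # fixed point used after truncation

x, sigma = np.full(l, 3.0), 0
for k in range(1, 50_001):
    a_k, c_k = 1.0 / k, k ** -0.25
    delta = rng.choice([-1.0, 1.0], size=l)     # randomized direction Delta_k
    y_plus = L(x + c_k * delta) + 0.1 * rng.normal()
    y_minus = L(x - c_k * delta) + 0.1 * rng.normal()
    g = (y_plus - y_minus) / (2.0 * c_k) * (1.0 / delta)  # componentwise Delta^{-1}
    cand = x - a_k * g
    if np.linalg.norm(cand) > M(sigma):         # expanding truncation step
        x, sigma = x_fix.copy(), sigma + 1
    else:
        x = cand

print("estimate norm:", np.linalg.norm(x))      # near 0, 2 observations per step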

Page 167: Stochastic Approximation Applications

8/13/2019 Stochastic Approximation Applications

http://slidepdf.com/reader/full/stochastic-approximation-applications 167/368

Page 168: Stochastic Approximation Applications

8/13/2019 Stochastic Approximation Applications

http://slidepdf.com/reader/full/stochastic-approximation-applications 168/368

156 STOCHASTIC APPROXIMATION AND ITS APPLICATIONS 

Remark 4.1.2 If is the unique minimizer of , then in (4.1.11) and (4.1.12) should be replaced by

Theorem 4.1.1 Assume A4.1.1, A4.1.2, and the Conditions on hold. Let be given by (4.1.9)–(4.1.12) (or (4.1.11)–(4.1.14)) with any initial value. Then

if and only if for each the random noise given by (4.1.10) can be

decomposed into the sum of two terms in ways such that 

with

and 

where is given in Conditions on

Proof. We will apply Theorem 2.2.1 for sufficiency and Theorem 2.4.1 for necessity.

Let us first check Conditions A2.2.1–A2.2.4. Condition A2.2.1 is a part of A4.1.1. Condition A2.2.2 is automatically satisfied if we take , noticing that in the present case. Condition A2.2.4 is contained in A4.1.2. So, the key issue is to verify that given by (4.1.14) satisfies the requirements.

Let and be vector functions obtained from with some of its components replaced by zero:

It is clear that

and


For notational convenience, let denote a generic random vector such that

where is specified in (4.1.1), and may vary in different applications.

We express given by (4.1.14) in an appropriate form to be dealt with. We mainly use the local Lipschitz-continuity to treat the structural error (4.1.15) in

Rewrite the component of the structural error as follows

and for any express

where on the right-hand side of the equality all terms are cancelled except the first and the last ones, and in each difference of L the arguments of L differ from each other only in one

We write (4.1.25) in the compact form:


Applying the Taylor expansion to (4.1.26) we derive

where

Similarly, we have

and

where


Define the following vectors:

Finally, putting (4.1.27)–(4.1.35) into (4.1.14), we obtain the following expression for :

It is worth noting that each component of and is a martingale difference sequence, because both and are independent of

For the sufficiency part we have to show that (2.2.2) is satisfied a.s. Let us show that (2.2.2) is satisfied by all components of and

For components of we have for any

since by (4.1.1), and as . Therefore, for any integer N

for any such that converges.

Thus, all sample paths of components of satisfy (2.2.2). Completely the same situation takes place for the components of

and


By the convergence theorem for martingale difference sequences, we find that for any integer N

This is because is independent of and is bounded by a constant uniformly with respect to by the Lipschitz-continuity of . Then the martingale convergence theorem applies, since for some by A4.1.1.

A similar argument can be applied to the components of . Since for any integer N (4.1.38) holds outside an exceptional set of probability zero, there is an with such that for any

and

for all and  N = 1,2, ….

Therefore, for all and any integer  N 

where is given by (1.3.2).


From (4.1.17) and (4.1.18) it follows that there exists such that and for each

and hence

Combining (4.1.41) and (4.1.42), we find for each

This means that for the algorithm (4.1.11)–(4.1.14), Condition A2.2.3 is satisfied on . Thus by Theorem 2.2.1, on . This proves the sufficiency part of the theorem.

Under the assumption a.s., it is clear that both and converge to zero a.s., and (4.1.39) and (4.1.40) become

and

Then the necessity part of the theorem follows from Theorem 2.4.1. We show this. By Theorem 2.4.1, can be decomposed into two parts

and such that and . Let us denote by the component of a vector . Define

Then for


and

From (4.1.43) and (4.1.36) it follows that

This together with (4.1.44) and (4.1.45) proves the necessity part of the theorem.

Theorem 4.1.1 gives a necessary and sufficient condition on the observation noise in order that the KW algorithm with expanding truncations and randomized differences converge to the unique maximizer of a function L. We now give some simple sufficient conditions on .

Theorem  4.1.2  Assume A4.1.1 and A4.1.2 hold. Further, assume that 

is independent of 

and  satisfies one of the following two conditions:

i) where is a random variable;

ii) Then

where is given by (4.1.9)–(4.1.12).

Proof. It suffices to prove (4.1.16)–(4.1.18). Assume i) holds. Let be given by

By definition, is independent of and so

and


A2.2.3 is satisfied as shown in Theorems 4.1.1 and 4.1.2. Then the conclusion of the theorem follows from Theorem 2.2.2.

Remark  4.1.3 In the multi-extreme case, the necessary conditions on

for convergence can also be obtained by analogy with Theorem 2.4.2.

Remark 4.1.4 Conditions i) and ii) used in Theorem 4.1.2 are simple indeed. However, in Theorem 4.1.2 is required to be independent of . This may not be satisfied if the observation noise is state-dependent. Taking into account that is the observation noise when observing at and , we see that depends on and if the observation noise is state-dependent. In this case, does depend on . This violates the independence assumption made in Theorem 4.1.2.

Consider the case where the observation noise may depend on the locations of measurement, i.e., in lieu of (4.1.3) and (4.1.4) consider

Introduce the following condition.

A4.1.3 Both and are measurable functions and are martingale difference sequences for any and

for p specified in A4.1.1 with

where is a family of nondecreasing σ-algebras independent of both and


Theorem 4.1.4 Let be given by (4.1.9)–(4.1.12) with a given initial value Assume A4.1.1, A4.1.2', and A4.1.3 hold. Then

where is a connected subset of  

Proof. Introduce the σ-algebras generated by and i.e.,

It is clear that is measurable with respect to and hence are Both and are Approximating and by simple functions, it is seen that

Therefore, and are martingale difference sequences, and

where

Hence, is a martingale difference sequence with

Noticing is bounded and as by (4.1.50) and (4.1.51) and the convergence theorem for martingale difference sequences, we have, for any integer N > 0

This together with (4.1.37) with replaced by (4.1.39), and (4.1.40) verifies that expressed by (4.1.36) satisfies A2.2.3. Then the conclusion of the theorem follows from Theorem 2.2.2.

Remark 4.1.5 If J consists of a singleton then Theorems 4.1.3 and 4.1.4 ensure a.s. If J is composed of isolated points, then


theorems ensure that converges to some point in J. However, the limit is not guaranteed to be a global minimizer of Depending on the initial value, may converge to a local minimizer. We will return to this issue in Section 4.3.

4.2. Asymptotic Properties of KW Algorithm

We now present results on the convergence rate and asymptotic normality

of the KW algorithm with randomized differences.

Theorem 4.2.1 Assume the hypotheses of Theorem 4.1.2 or Theorem 4.1.4

with and that 

 for some and as

where is stable and and are specified in (4.2.1) and (4.2.2),

respectively.

Then given by (4.1.9)–(4.1.12) satisfies

Proof. First of all, under the conditions of Theorem 4.1.2 or 4.1.4, By Theorem 3.1.1 it suffices to show that given by (4.1.36) can be represented as

where

From (4.1.28) and (4.1.31), by the local Lipschitz continuity of it follows that


by (4.2.2). Since it follows that

Since and given by (4.1.27) and (4.1.32) are uniformly bounded for for each

where converges. By the convergence theorem for martingale difference sequences it follows that

where and are given by (4.1.35).

In the proof of Theorem 4.1.2, replacing by and using (4.2.2),

the same argument leads to

Then by defining

we have shown (4.2.4) under the hypotheses of Theorem 4.1.2.

Under the hypotheses of Theorem 4.1.4 we have the same conclusions about and as before. We need only to show (4.2.5). But this follows from (4.1.52) with replaced by and the convergence theorem for martingale difference sequences.

Remark  4.2.1 Let be given by (4.1.9)–(4.1.12). If and

with then conditions (4.2.1) and (4.2.2) are satisfied.

Theorem 4.2.2 Assume A4.1.1 and A4.1.2 hold and that i) and for some

ii) for some c > 0  and 

iii) is stable and for some

iv) given by (4.1.10) is an MA process:

 for 

and


where are real numbers and is a martingale

difference sequence which is independent of and satisfies

Then

where and 

Proof. Since it follows that and

By assumption is independent of and hence is independent of Then by (4.2.11) and the convergence theorem for martingale difference sequences we obtain (4.2.5). By Theorem 4.2.1 we have as

and after a finite number of iterations of (4.1.11), say, for there are no more truncations.

Since and is stable, it follows that

Let be given by


By (4.1.11), (4.1.13), (4.1.36), and condition ii) it follows that for

Let be given by

where

Since is stable, by (3.1.8) it follows that there are constants

and such that

Noticing where because by condition iii), we have


where respectively denote the five terms on the right-hand side of the first equality of (4.2.19).

By (4.2.18),

By Lemma 3.3.2, because and By (4.1.28) and (4.1.3) it follows that and hence

by i) and (4.2.18)

where is a constant.

By Lemma 3.3.2 and the right-hand side of (4.2.20) tends to zero a.s. as

To estimate let us consider the following linear recursion

By (4.2.17) it follows that

By (4.2.11), Since and

Then by the convergence theorem for martingale difference sequences it follows that

i.e.,


Similarly,

Applying Lemma 3.1.1, we find that From (4.2.22),

it follows that

Since is an MA process driven by a martingale difference sequence

satisfying (4.2.6),

By an argument similar to that used for (4.2.21) and (4.2.22), from Lemma 3.1.1 it follows that

Therefore, putting all these convergence results into (4.2.19) yields

By (3.3.37),

where is given by (4.2.10). By (4.2.18), from (4.2.23) and (4.2.24)

it follows that which together with the definition

(4.2.14) for proves the theorem.

Example 4.2.1 The following example of and satisfies Conditions i) and iii) of Theorem 4.2.2:

In this example, and
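Condition iv) of Theorem 4.2.2 only asks the observation noise to be a moving average of a martingale difference sequence. For simulation purposes, such a noise can be generated as in the following sketch, where the absolutely summable coefficients `rho**j` and the Gaussian innovations are illustrative assumptions:

```python
import numpy as np

def ma_noise(n, rho=0.5, order=20, seed=0):
    """Moving-average noise w_k = sum_j c_j e_{k-j} with c_j = rho**j
    (illustrative weights), driven by iid zero-mean innovations e_k,
    which form a particularly simple martingale difference sequence."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n + order)     # innovations, zero mean
    c = rho ** np.arange(order + 1)        # absolutely summable coefficients
    # full convolution, then keep the n fully-overlapped samples
    return np.convolve(e, c)[order:order + n]
```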


Remark 4.2.2 Results in Sections 4.1 and 4.2 are proved for the case where the two-sided randomized differences

are used, where and are given by (4.1.3) and (4.1.4), respectively. But all results presented in Sections 4.1 and 4.2 are also valid for the case where the one-sided randomized differences

are used, where and are given by (4.1.3) and (4.1.6), respectively.

In this case, in (4.1.27), (4.1.28), and in the expression of should be replaced by 1, and (4.1.29)–(4.1.32) disappear. Accordingly, (4.1.36) changes to

Theorems 4.1.1–4.1.4 and 4.2.1 remain unchanged. The conclusion of Theorem 4.2.2 remains valid too, if in Condition iv)

changes to
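A minimal sketch of the one-sided variant mentioned in this remark: the gradient estimate pairs one observation at a randomly perturbed point with one at the current point. The names and the ±1 perturbation law below are assumptions consistent with the two-sided scheme sketched in Section 4.1.

```python
import numpy as np

def one_sided_grad_estimate(L_obs, x, c_k, rng):
    """One-sided randomized-difference gradient estimate (sketch):
    compares a noisy observation at x + c_k*delta with one at x itself."""
    delta = rng.choice([-1.0, 1.0], size=len(x))  # randomized perturbation
    y_plus = L_obs(x + c_k * delta)               # perturbed observation
    y_zero = L_obs(x)                             # observation at current point
    return (y_plus - y_zero) / c_k / delta        # componentwise division
```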

4.3. Global Optimization

As pointed out at the beginning of the chapter, the KW algorithm may

lead to a local minimizer of Before the 1980s, the random search or its combination with a local search method was the main stochastic approach for achieving the global minimum when the values of L can be observed exactly, without noise. When the structural property of L is used for local search, a rather rapid convergence rate can be derived, but it is hard to escape a local attraction domain. The random search has a chance to fall into any attraction domain, but its convergence rate decreases exponentially as the dimension of the problem increases.

Simulated annealing is an attractive method for global optimization, but it provides only convergence in probability rather than path-wise convergence. Moreover, simulation shows that for functions with a few local minima, simulated annealing is not efficient. This motivates one to combine a KW-type method with random search. However, a simple combination of SA and random search does not work: in order to reach the global minimum one has to reduce the noise effect as time goes on.

A hybrid algorithm composed of a search method and the KW algorithm is presented in the sequel, with the main effort devoted to designing


easily realizable switching rules and to providing an effective noise-reducing method.

We define a global optimization algorithm, which consists of three parts: search, selection, and optimization. To be specific, let us discuss the global minimization problem. In the search part, we choose an initial value and make the local search by use of the KW algorithm with randomized differences and expanding truncations described in Section 4.1 to approach the bottom of the local attraction domain. At the same time, the average of the observations of L is used to serve as an estimate of the local minimum of L in this attraction domain. In the selection part, the estimates obtained for the local minima of L are compared with each other, and the smallest one among them, together with the corresponding minimizer given by the KW algorithm, is selected. Then the optimization part takes place, where again the local search is carried out, i.e., the KW algorithm without any truncations is applied to improve the estimate for the minimizer. At the same time, the corresponding minimum of L is reestimated by averaging the noisy observations. After this, the algorithm goes back to the search part again.

For the local search, we use observations (4.1.3) and (4.1.4), or (4.1.5) and (4.1.6). To be specific, let us use (4.1.5) and (4.1.6).

In the sequel, by the KW algorithm with expanding truncations we mean the algorithm defined by (4.1.11) and (4.1.12) with

where and are given by (4.1.5) and (4.1.6), respectively. Similarly to (4.1.9) and (4.1.10) we have

where

By the KW algorithm we mean

with defined by (4.3.2). It is worth noting that, unlike (4.1.8), is used in (4.3.1). Roughly speaking, this is because in the neighborhood of a minimizer of is increasing, and in (4.1.11) should be an observation on


In order to define switching rules, we have to introduce integer-valued and increasing functions and such that and

Define

In the sequel, by the search period we mean the part of the algorithm starting from the selection of the initial value up to the next selection of an initial value. At the end of the search period, we are given and being the estimates for the global minimizer and the minimum of L, respectively. Variables such as and etc. in the search period are equipped with the superscript etc.

The global optimization algorithm is defined by the following five steps; a schematic code sketch follows the list of steps.

(GO1) Starting from at the search period, the initial value

is chosen according to a given rule (deterministic or random),

and then is calculated by the KW algorithm with expanding

truncations (4.1.11) and (4.1.12) with defined by (4.3.1), for which , step sizes and and used for truncation are defined as follows:

where c > 0 and are fixed constants, and are two sequences of positive real numbers increasingly diverging to infinity.

(GO2) Set the initial estimate for and update the estimate for by

where is the noise when observing

After steps, is obtained.

(GO3) Let be a given sequence of real numbers such that and as Set For if

as

e.g.,


then set Otherwise, keep unchanged.

(GO4)  Improve to by the KW algorithm with expanding

truncations (4.1.11) and (4.1.12) with defined by (4.3.1), for which

where in (4.1.11) and (4.1.12) may be an arbitrary sequence of numbers increasingly diverging to infinity, and

At the same time, update the estimate for by

where is the noise when observing At the end of this step, and are derived.

(GO5)  Go back to (GO1) for the search period.
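The five steps can be assembled into the following schematic Python loop. Everything here is a sketch: `local_search` stands for the truncated KW routine of Section 4.1 (returning its iterates), `sample_initial` for the rule choosing initial values in (GO1), and the threshold `1/i` in the selection step for the vanishing sequence appearing in (GO3); none of these are the book's exact specifications.

```python
import numpy as np

def global_opt(L_obs, sample_initial, local_search, n_periods=50, seed=0):
    """Schematic search/selection/optimization cycle in the spirit of
    (GO1)-(GO5); all concrete choices below are illustrative."""
    rng = np.random.default_rng(seed)
    x_best, L_best = None, np.inf
    for i in range(1, n_periods + 1):
        # (GO1) search: fresh initial value, truncated local KW search
        xs = local_search(sample_initial(rng), period=i)
        # (GO2) estimate the local minimum by averaging noisy observations
        L_hat = float(np.mean([L_obs(x) for x in xs]))
        # (GO3) selection: reset the candidate if a clearly smaller
        # estimated minimum has been found (threshold -> 0 as i grows)
        eta_i = 1.0 / i
        if x_best is None or L_hat < L_best - eta_i:
            x_best, L_best = xs[-1], L_hat
        # (GO4) optimization: refine the selected candidate locally and
        # re-estimate the corresponding minimum by averaging
        xs_ref = local_search(x_best, period=i)
        x_best = xs_ref[-1]
        L_best = float(np.mean([L_obs(x) for x in xs_ref]))
        # (GO5) loop back to the search part
    return x_best, L_best
```

Passing the period index `i` to `local_search` reflects the point made in the next paragraph: within the i-th search period, i is added to the denominators of the gains so that the noise effect shrinks from one period to the next.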

We note that for the search period is added to and (see (4.3.7) and (4.3.8)). The purpose of this is to diminish the effect of the observation noise as increases. Therefore, and both tend to zero, not only as but also as The following example shows that adding an increasing to the denominators of and is necessary.

Example  4.3.1 Let

It is clear that the global minimizer is and are two local minima. Furthermore, and are attraction domains for –1 and +1, respectively.


Since is linear, for local search we apply the ordinary KW algorithm without truncation

Here, no randomized differences are introduced, because this is a one-dimensional problem.

Assume

where

and and are mutually independent and both are sequences of  iid random variables with

Let us start from (GO1) and take

(not tending to infinity),

If then, by noticing one of and must belong to Elementary calculation shows that

Paying attention to (4.3.13), we see

and

i.e.,


A4.3.2

A4.3.3 For any convergent subsequence of 

where denotes given by (4.3.3) with replaced by

denotes used for the search period, and

A4.3.4 For any convergent subsequence

where is given by (1.3.2).

It is worth emphasizing that each in the sequence

is used only once when we form and

We now give sufficient conditions for A4.3.2, A4.3.3, and A4.3.4. For this, we first need to define the σ-algebras generated by the estimates and derived up to the current time. Precisely, for running in the search period of Step (GO1) define

and for running in Step (GO4) define


Remark  4.3.1 If both sequences

and are martingale difference sequences with

and if 

for some then A4.3.2 holds.

This is because

is a martingale difference sequence with bounded second conditional moment, and hence

which implies (4.3.15). By using the second parts of conditions (4.3.22) and (4.3.23), (4.3.16)

can be verified in a similar way.

Remark 4.3.2 If and is independent of

and if there exists

such that then by the uncorrelatedness of 

with for or


where

Assume, further, for fixed

Lemma  4.3.1  Assume L( J )  is nowhere dense, where

Let be a nonempty interval such that If there are

two sequences and such that 

and is bounded, then it is impossible to have

where

Proof.  Without loss of generality we may assume converges as

otherwise, it suffices to select a subsequence.

Assume the converse, i.e., that (4.3.28) holds. Along the lines of the proof

for Theorem 2.2.1 we can show that

for some constant  M  if is sufficiently large. As a matter of fact, this is

an analogue of (2.2.3). From (4.3.29) the following analogue of (2.2.15)

takes place:

and the algorithm for has no truncation for if is large enough, where is a constant. Similar to

(2.2.27), we then have

and

and


for some small T  > 0 and all sufficiently large

From this, by (4.3.27) and the convergence of it follows that

By continuity of and (4.3.30) we have

which implies that for small enough T. Then by definition,

which contradicts (4.3.32). The obtained contradiction shows the impossibility of (4.3.28).

Introduce

such that

and

Lemma  4.3.2  Let be given by (GO1). Assume

 A4.3.1 and A4.3.3 hold and for some Then

for any may occur infinitely often with

 probability 0, i.e.,

Proof.  Since L( J ) is nowhere dense, for any belonging to infinitely

many of there are subsequences such that

and

where and


By assumption as must be bounded.

Hence, is bounded. Without loss of generality we may assume

that is convergent.

Notice that at Step (GO1), is calculated according to (4.1.11) and (4.1.12) with given by (4.3.2) and (4.3.3), i.e.,

which differ from (4.1.11), (4.1.12), (4.3.2), and (4.3.3) by the superscript (i), which means the calculation is carried out in the search period.

By (4.1.27) with notations (4.1.33) and (4.1.34), equipped with the superscript we have

where

If we can show that and


where

then by Lemma 4.3.1, (4.3.42) contradicts the fact that all the sequences

cross the interval which is disjoint from L(J). This then proves (4.3.36).

We now show that for all sufficiently large if T is small

enough.

Since and are finite, where

We now show that on the

if is sufficiently large and T is small enough.

Suppose the converse: for any fixed T > 0, there always exists however large is taken, such that Since by the continuity of there is a constant

q > 0 such that

For any let us estimate By

and the local Lipschitz continuity of it is seen that

is uniformly bounded with respect to and all Then by A4.3.3, it follows that there is a constant such that

From this it follows that there is no truncation for and

Let T   be so small that


On the other hand, however, we have and The obtained contradiction shows that for all sufficiently

large if  T  is  small enough.

We now prove (4.3.42). Let us order in the following way

From (4.1.34) and the fact that is an iid sequence and is independent of the sums appearing in (4.1.34), it is easy to see that is a martingale difference sequence.

By the condition for some it is clear that for with being a constant. Then we have

By (4.1.28) and (4.3.8), we have

where is a constant. Noticing that for large and small T, by (4.3.44), (4.3.45), and A4.3.3 we may assume sufficiently large and T small enough such that

This will imply (4.3.42) if we can show

We prove (4.3.47) by induction. We have by definition of Assume that

and by the convergence theorem for martingale difference sequences


and Then there is no truncation at time since by (4.3.46) (with chosen such that

if  T   in (4.3.46) is sufficiently small.

Then by (4.3.40), we have

and by (4.3.43) and (4.3.46)

for small T. This completes the induction, and (4.3.42) is proved, which, in

turn, concludes the lemma.

Lemma  4.3.3  Assume A4.3.1–A4.3.3 hold. Further, assume that 

 for some and If there

exists a subsequence such that then

Proof. For any by Lemma 4.3.2 there exists such that for

any if By (GO2),

we have

Then by A4.3.2, there exists such that, for any

This implies the conclusion of the lemma by the arbitrariness of 


Lemma  4.3.4  Assume A4.3.1–A4.3.3 hold, for 

some and If a subsequence is such that

then

where denotes the closure of L( J ) , and  and are

given by (GO1) and (GO2) for the search period.

Proof. Since by A4.3.1, for (4.3.50) it is seen that contains a bounded infinite subsequence, and hence a convergent subsequence (for simplicity of notation, assume

such that

Since there exists a such that

and hence

Define

It is worth noting that for any  T   > 0, is well defined for all

sufficiently large because and hence

We now show that

By the same argument as that just used before, without loss of generality, we may assume is convergent (otherwise, a convergent subsequence should be extracted) and thus

We have to show

as


By the same argument as that used for deriving (2.2.27), it follows that there is such that

which implies the correctness of (4.3.53). From (4.3.53) it follows that

because, otherwise, we would have a subsequence with

such that and by (4.3.54)

for large However, by (2.2.15), so for small enough T > 0, (4.3.56) is impossible. This verifies (4.3.55).

We now show

Assume the converse, i.e.,

From (4.3.54) and (4.3.58) it is seen that for all sufficiently large the sequence

contains at least one crossing of the interval with In other words, we are dealing with a sample path on which both (4.3.54) and (4.3.58) are satisfied. Thus, belongs to By Lemma 4.3.2, the set composed of such has probability zero. This verifies (4.3.57).

From (4.3.57) it follows that

for all sufficiently large


Notice that from the following elementary inequalities

by (4.3.5) it follows that

By definition of we write

By (4.3.59) and (4.3.61), noticing we have

because

By (4.3.55) and (4.3.61) we have

Since by (4.3.15), combining (4.3.62)–(4.3.64)

leads to


which completes the proof of the lemma.

Lemma 4.3.5  Let  be given by (GO1)–(GO5). Assume that A4.3.1– 

 A4.3.4 hold, initial values selected in (GO1) are dense in an open

set U containing the set  of global minima of  for some and Then for any

Proof. Among the first search periods denote by the number of those search periods for which are reset to be i.e.,

Since L(J) is not dense in any interval, there exists an interval such that So, for the lemma it suffices to prove that cannot cross infinitely many times a.s.

If then after a finite number of steps, is generated by (GO4). By Lemma 4.3.1 the assertion of the lemma follows immediately. Therefore, we need only to consider the case where

Denote by the search period for which a resetting happens, i.e., It is clear that by

In the case by (GO4) the algorithm generates a family

of consecutive sequences:

Let us denote the sequence by

and the corresponding sequence of the values of by

Let be sufficiently small such that


and which is possible because L(J) is nowhere dense.

Since is dense in U, visits infinitely often. Assume

By Lemma 4.3.2

if is large enough. Define

This means that the first resetting in or after the search period occurs in the search period. We now show that there is a large enough such that the

following requirements are simultaneously satisfied:

where is fixed;

We first show ii)–v). Since all three intervals indicated in ii) have an empty intersection

with L( J ), by Lemma 4.3.1 ii) is true if  S   is large enough. It is clear

i) implies

ii) does not cross intervals

and

iii)

iv)

v)


that iii) and iv) are correct for fixed and if is large enough, while v) is true because

For i) we first show that there are infinitely many for which

By (4.3.68) and (4.3.71) we have

Consider two cases.

1) There is no resetting in the search period. Then

and by (4.3.72) and (4.3.74) it follows that

By (4.3.70) and the definition of there exists at least one integer among such that

because, otherwise, we would have which contradicts (4.3.74).

By ii) we conclude that

and by (4.3.68) we also have (4.3.76). From (4.3.76), by ii), does not cross for

Consequently,

This together with (4.3.70) implies that

and, in particular,

2) If there is a resetting in the search period, then


By (GO3) we then have

Noticing as we conclude that there are infinitely many for which (4.3.73) holds.

We now show that there is a such that

where lim sup is taken along those for which (4.3.73) holds. Assume the converse: there is a subsequence of such that

Then by Lemma 4.3.4,

which contradicts (4.3.73). This proves (4.3.78), and also i). As a matter

of fact, we have proved more than i): precisely, we have shown that there are infinitely many for which (4.3.73) holds, and for (4.3.73) implies the following inequality:

Let us denote by the totality of those for which (4.3.73) holds and What we have just proved is that contains infinitely

many if Consider a sequence By ii) it cannot cross the interval

This means that

Then by (4.3.70)

and by  (GO3)


since is a search period with resetting. Thus, we have shown that if then also belongs to Therefore, and

From here and (4.3.67) it follows that

Since may cross the interval only a finite number of times by Lemma 4.3.1. This completes the proof of the lemma.

Proof of Theorem 4.3.1. By Lemma 4.3.5 the limit exists. By the arbitrariness of from (4.3.69) it follows that

By continuity of we conclude that

4.4. Asymptotic Behavior of Global Optimization Algorithm

In the last section a global optimization algorithm combining the KW algorithm with a search method was proposed, and it was proved that the algorithm converges to the set of global minimizers, i.e., However, in the algorithm defined by (GO1)–(GO5), resettings are involved. The convergence by no means excludes the algorithm from resettings asymptotically. In other words, although it may still happen that

where is defined in Lemma 4.3.5, i.e., it may still be possible to have infinitely many resettings.

In what follows we will give conditions under which

In this case, the global optimization algorithm (GO1)–(GO5) asymptotically behaves like a KW algorithm with expanding truncations and randomized differences, because for large is purely generated by (GO4) without resetting.



A4.4.1  is a singleton,  is twice continuously differentiable

in the ball centered at with radius for some and the Hessian of is positive definite.

A4.4.2 and ordered as in (4.3.20), (4.3.21), and Remark 4.3.1 are martingale difference sequences with

A4.4.3 is independent of 

for and

and 

for

We recall that is the observation noise in the search period.

A4.4.4  is independent of and where

denotes the observation noise when is calculated  

in (GO4).

Lemma  4.4.1  Assume A4.4.2 holds and, in addition,

Then, there exists an (maybe depending on such that for any

and 

and


Proof. Notice that by A4.4.2 is a martingale

difference sequence with bounded conditional variance. By the convergence theorem for martingale difference sequences

which implies (4.4.2).

Estimate (4.4.3) can be proved in a similar way.

Lemma 4.4.2 Assume A4.4.3 and A4.4.4 hold. If for some then

and

for where and are given in (4.1.34), and the superscript denotes the corresponding values in the i-th search period.

Proof.  Let us prove

Note that

is a martingale difference sequence with bounded conditional second moment. So, by the convergence theorem for martingale difference sequences, for (4.4.6) it suffices to show


By the assumption of the lemma, or and

for large The last inequality yields

and hence

Therefore,

Thus, (4.4.6) is correct. As noted in the proof of Lemma 4.3.2, is a martingale difference sequence. So, (4.4.4) is true.

Similarly, (4.4.5) is also verified by using the convergence theorem for

martingale difference sequences.

Lemma  4.4.3  In addition to the conditions of Theorem 4.3.1, suppose

that A4.4.1 and A4.4.3 hold, is positive definite, and  

 for some Then there exists a sufficiently large such

that, for if the inequality

holds for some with then the following inequality holds

Proof. By A4.4.1 and Taylor's expansion, we have

i.e.,


where

Therefore, for any there is a such that for any

and

where and denote the minimum and maximum eigenvalues of

 H, respectively, and o(1) is the one given in (4.4.10).

Since is the unique minimizer of and is continuous, there

is such that if We always assume that is large enough such that

and

where is used in (GO1). From (4.4.8) it then follows that and there is no truncation at time

Denote

For satisfying (4.4.8) and we have

where is given by (4.3.41).


By (4.4.11) it then follows that

where is given by (4.1.33) with the superscript denoting the search period, and

By (4.4.14) it is clear that

Let

For (4.4.9) it suffices to show that

Assume the converse:

Let

By (4.4.20), for all

and hence,

Thus, (4.4.12)-(4.4.14) are applicable.


By (4.4.17) and the second inequality of (4.4.13), we have for

which, combined with (4.4.21), yields

Applying the first inequality of (4.4.13) and then (4.4.20) leads to

Since for there is no truncation for Using (4.4.18) we have

where

We now show that is negative for all sufficiently large Let us consider the terms in By assumption,

from (4.4.19) and (4.4.22) it follows that


We now estimate the second term on the right-hand side of (4.4.25) after multiplying it by

From (4.4.4) and (4.4.16) it follows that

uniformly with respect to and with Noticing that with being a constant, and that which implies we find

Then, noticing that is bounded by some constant we have

For the third term on the right-hand side of (4.4.25), multiplying it by we have

where is a constant. Finally, for the last term of (4.4.25) we have the following estimate

Combining (4.4.26)–(4.4.30) we find that

where

and for large


Consequently, from (4.4.25) it follows that

We now show that

by induction. Assume it holds for i.e.,

which has been verified for We have to show it is true for

By (4.4.18) we have

and


where

Comparing (4.4.35) with (4.4.25), we find that in lieu of and

we now have and respectively. But for both cases we use the same estimate (4.4.27). Therefore, completely by the same argument as (4.4.26)–(4.4.30), we can prove that

and for large

Thus, we have proved (4.4.32). By the elementary inequality

for which is derived from

for any matrices A and B of compatible dimensions, we derive


from (4.4.32)

As mentioned before, for and there is no truncation. Then by (4.4.18)

where

Then from (4.4.36) and (4.4.27) it follows that

where

which tends to zero as by (4.4.27) and (4.4.38).

Then

where for the last equality (4.4.10) is used. Finally, by (4.4.21), for large from (4.4.39) it follows that


which, combined with (4.4.10), yields

This contradicts (4.4.20), the definition of The contradiction shows

Theorem  4.4.1  Assume that A4.3.1, A4.4.1–A4.4.4 hold, and 

is positive definite for some

Further, assume that 

and for some constants

Then the number of resettings is finite, i.e.,

where is the number of resettings among the  first   search periods

(GO1), and is given in (GO3).

Proof.  If (4.4.44) were not true, then there would be an S  with positive

probability such that, for any there exists a subsequence such that at the search period a resetting occurs, i.e.,

Notice that

by (4.4.41) and and


by (4.4.41) and (4.4.42). Hence, the conditions of Lemma 4.4.1 are satisfied. Without loss of generality, we may assume that (4.4.2)–(4.4.5) and the conclusion of Theorem 4.3.1 hold. From now on assume that

is fixed.

It is clear that, for any constant

if is large enough, since for Let

Rewrite (4.4.46) as

Define

and

Noticing that there is no resetting between and and (4.4.47)

corresponds to (4.4.8), by the same argument as that used in the proof of Lemma 4.4.3, we find that, for any

Since we have


By (4.4.3), (4.4.42), and (4.4.43) it follows that

where for the last inequality (4.4.41) is used.

Thus, by (4.4.40)

By (4.4.33) it follows that

provided is large enough, where for the last inequality, (4.4.2) is

used.

Since by (4.4.43)

and since

and


we find

where the last inequality follows from (4.4.40). Using (4.4.51) and (4.4.53), from (4.4.52) for sufficiently large we

have

Using the second inequality of (4.4.43) and then observing that

and

by (4.4.40) and (4.4.41) and we find

We now show that there is such that

Assume the converse:

with


Then, we have

for large enough because

Inequality (4.4.57) contradicts (4.4.55). Consequently, (4.4.56) is true. In particular, for we have

Completely by the same argument as that used for (4.4.47)–(4.4.50), by

noticing that there is no resetting from to we conclude that

By the same treatment as that used for deriving (4.4.54) from (4.4.50), we obtain

Comparing (4.4.58) with (4.4.54), we find that has been changed to and this procedure can be continued if the number of resettings


is infinite. Therefore, for any we have

From (4.4.40) we see

Since we have and hence by

Consequently, by (4.4.41) the right-hand side of (4.4.59) can be estimated as follows:

by (4.4.61) if is large enough. However, the left-hand side of (4.4.59) is nonnegative. The obtained contradiction shows that must be finite, and (4.4.44) is correct.

By Theorem 4.4.1, our global optimization algorithm coincides with the KW algorithm with randomized differences and expanding truncations for sufficiently large Therefore, the theorems proved in Section 4.2 are applicable to the global optimization algorithm. By Theorems 4.2.1 and 4.2.2 we can derive the convergence rate and asymptotic normality of the algorithm described by (GO1)–(GO5).

4.5. Application to Model Reduction

In this section we apply the global optimization algorithm to system modeling. A real system may be modeled by a high order system which, however, may be too complicated for control design. In control engineering the order reduction of a model is of great importance. In the linear system case, this means that a high order transfer function is to be approximated by a lower order transfer function. For this one may use methods like balanced truncation and Hankel norm approximation. These methods are based on the concept of the balanced realization. We are interested in recursively estimating the optimal coefficients of the


reduced model by using the stochastic optimization algorithm presented in Section 4.3.

Let the high order transfer function be

and let it be approximated by a lower order transfer function

If is of order then is taken to be of order

To be specific, let us take to be a polynomial of order and of order

where the coefficients should not be confused with the step sizes used in Steps (GO1)–(GO5). Write as where

and stand for coefficients of and

It is natural to take

as the performance index of approximation. The parameters and are to be selected to minimize under the constraint that

is stable. For simplicity of notation we denote and write as

Let us describe the where has the required property. Stability requires that

This implies that

because is the sum of two complex-conjugate roots of 

If  then which yields   If 

then and hence

(or ).
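The constraint set D can be checked numerically. The sketch below assumes the reduced denominator is written as 1 + a1*z + ... + an*z**n in the backward-shift variable z, and takes stability to mean that all roots lie strictly outside the unit circle; this convention is an assumption, since the displayed formulas are not reproduced here.

```python
import numpy as np

def is_stable(denom):
    """Stability test for a reduced-model denominator (sketch).
    `denom` holds the coefficients of 1 + a1*z + ... + an*z**n in
    increasing powers of the backward-shift variable z; stability is
    taken to mean all roots lie strictly outside the unit circle."""
    roots = np.roots(list(denom)[::-1])  # np.roots expects highest power first
    return bool(np.all(np.abs(roots) > 1.0))

# e.g. is_stable([1.0, -0.9, 0.2]) -> True for this illustrative pair (a1, a2)
```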


Set

Identify and that appeared in Section 4.3 with and respectively, for the present case.

We now apply the optimization algorithm (GO1)–(GO5) to minimizing under the constraint that the parameter in belongs to D. For this we first concretize Steps (GO1)–(GO5) described in Section 4.3.

Since is convex in for fixed we take the fixed initial value for any search period and randomly select initial values only for according to a distribution density which is defined as follows:

where with and being the uniform distributions over [–2, 2] and [–1, 1], respectively.

After having been selected in the search period, the algorithm

(4.1.11) and (4.1.12) is calculated with and

As to observations, instead of (4.3.1) we will use information about the gradient because in the present case the gradient of can be expressed explicitly:

In the search period the observation is denoted by and is given by

where is independently selected from according to the uniform

distribution, and stands for the estimate for at time in the

search period. It is clear that is an approximation to the integral


where are independently selected from according to the uniform distribution for each Clearly, is an approximation to

Finally, take equal to
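The observations above are Monte Carlo approximations of frequency-domain integrals. As a self-contained illustration (with the index taken, as an assumption, to be the mean-square transfer-function error over frequencies uniform on [0, 2π]), one can estimate the approximation error of a candidate reduced model as follows:

```python
import numpy as np

def tf_value(num, den, z):
    """Evaluate a discrete-time transfer function num(z)/den(z) at points z
    (coefficients given in increasing powers of the backward-shift variable)."""
    n = sum(c * z ** k for k, c in enumerate(num))
    d = sum(c * z ** k for k, c in enumerate(den))
    return n / d

def mc_index(num_true, den_true, num_red, den_red, n_samples=1000, seed=0):
    """Monte Carlo estimate of the approximation index (assumed form):
    average of |G(e^{-iw}) - G_hat(e^{-iw})|**2 over sampled frequencies."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    z = np.exp(-1j * w)              # backward shift evaluated on the unit circle
    err = tf_value(num_true, den_true, z) - tf_value(num_red, den_red, z)
    return float(np.mean(np.abs(err) ** 2))
```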

In control theory there are several well-known model reduction methods, such as model reduction by balanced truncation and Hankel norm approximation, among others. These methods depend on the balanced realization, which is a state space realization method for a transfer matrix keeping the Gramians for controllability and observability of the realized system balanced. In order to compare with the proposed global optimization (GO) method, we take the commonly used model reduction methods of balanced truncation (BT) and Hankel norm approximation (HNA), which are realized by using Matlab. For this, the discrete-time transfer functions are transformed to continuous-time ones by using d2c provided in Matlab. Then the reduced systems are discretized to compute for comparison.

As we take a 10th order transfer function respectively for the following examples:

Example 4.5.1

Example 4.5.2

Example 4.5.3

Using the algorithm described in Section 4.3, for Examples 4.5.1–4.5.3 we obtain the approximate transfer functions of order 4, respectively,


denoted by and with

Using Matlab we also derive the 4th order approximations for Examples 4.5.1–4.5.3 by balanced truncation and Hankel norm approximation, which are as follows:

where the subscripts and H denote the results obtained by balanced truncation and Hankel norm approximation, respectively.

The approximation errors are given in the following table:

From this table we see that the algorithm presented in Section 4.3 gives smaller approximation errors in in comparison with the other methods.

We now compare approximation errors in norm and compare step

responses between the approximate models and the true one by figures. In the figures of step response:

the solid lines denote the true high order systems;

the dashed lines (- - -) denote the system reduced by Hankel norm approximation;


the dotted lines denote the system reduced by balanced truncation;

the dotted-dashed lines denote the systems reduced by the stochastic optimization method given in Section 4.3.

In the figures of the approximation error:

the solid lines denote the systems reduced by the stochastic optimization method;

the dashed lines (- - -) denote the system reduced by Hankel norm approximation;

the dotted lines denote the system reduced by balanced truncation.


(Figures: step responses and approximation errors for Examples 4.5.1–4.5.3.)

These figures show that the algorithm given in Section 4.3 gives less approximation error in in comparison with the other methods for Example 4.5.1, and an intermediate error in for Examples 4.5.2 and 4.5.3. Concerning step responses, the algorithm given in Section 4.3 provides better approximation in comparison with the other methods for all three examples.


Chapter 5

APPLICATION TO SIGNAL PROCESSING

The general convergence theorems developed in Chapter 2 can deal

with noises containing not only random components but also structural

errors. This property allows us to apply SA algorithms to parameter estimation problems arising from various fields. The general approach, roughly speaking, is as follows. First, the parameter estimation problem coming from practice is transformed to a root-seeking problem for a reasonable but unknown function which may not be directly observed.

Then, the real observation is artificially written in the standard form

with Normally, it is quite straightforward to arrive

at this point. The main difficulty is to verify that the complicated noise

satisfies one of the noise conditions required in the

convergence theorems. It is common that there is no standard method to

complete the verification procedure, because for different problems are completely different from each other.

In Section 5.1, SA algorithms are applied to solve the blind channel identification problem, an active topic in communication. In Section 5.2, the principal component analysis used in pattern classification is dealt with by SA methods. Section 5.3 continues the problem discussed in Section 5.1, but in a more general setting. Namely, unlike Section 5.1, the covariance matrix of the observation noise is no longer assumed to be known. In Section 5.4, adaptive filtering is considered: very simple conditions for convergence of sign-algorithms are given. Section 5.5 discusses the asymptotic behavior of asynchronous SA algorithms, which take the possible communication delays between parallel processors into consideration.


5.1. Recursive Blind Identification

In the systems and control area, the unknown parameters are estimated on the basis of the observed input and output data of the system. This is the subject of system identification. In contrast to this, for communication channels only the channel output is observed and the channel input is unavailable. The topic of blind channel identification is to estimate channel parameters by using the output data only. Blind channel identification has drawn much attention from researchers because of its potential applications in wireless communication. However, most existing estimation methods are "block" algorithms in nature, i.e., parameters are estimated after the entire block of data has been received.

By using the SA method, here a recursive approach is presented: estimates are continuously improved while receiving new signals.

Consider a system consisting of channels with L being the maximum order of the channels. Let be the one-dimensional input signal, and be the channel output at time where N is the number of samples and may not be fixed:

where

are the unknown channel coefficients. Let us denote by

the coefficients of the channel, and by

the coefficients of the whole system, which compose a vector.

The observations may be corrupted by noise

where is a vector. The problem is to estimate on the basis of the observations.
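For simulation, the multichannel model (5.1.1) together with the noisy observation (5.1.5) can be reproduced by the following sketch, in which each channel output is the convolution of the common input with that channel's coefficient vector (the zero initial conditions and the Gaussian noise are assumptions):

```python
import numpy as np

def channel_outputs(s, H, noise_std=0.0, seed=0):
    """Simulate a multichannel FIR model (sketch): the j-th channel output
    is x_j(k) = sum_l h_j(l) s(k-l), optionally corrupted by additive noise.
    H is an (n_channels, L+1) array of channel coefficients."""
    rng = np.random.default_rng(seed)
    outs = np.array([np.convolve(s, h)[:len(s)] for h in H])
    return outs + noise_std * rng.standard_normal(outs.shape)
```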


Let us introduce polynomials in the backward-shift operator

where

Write and in the component forms

respectively, and express the component via

From this it is clear that

Define

where is a

It is clear that is a × matrix. Similarly to and let us define and and and which

have the same structure as and but with replaced by and

respectively.


By (5.1.5) we have

From (5.1.8), (5.1.4), and (5.1.10) it is seen that

This means that the channel coefficient satisfies the set of linear equations (5.1.12), with coefficients being the system outputs.

From the input sequence we form the (N – 2L + 1) × (2L + 1) Hankel matrix

It is clear that the maximal rank of is 2L + 1 as If is of full rank for some then will also be of full rank for any

Lemma 5.1.1 Assume the following conditions hold:

A5.1.1  have no common root.

A5.1.2 The Hankel matrix composed of the input signal is of full rank (rank = 2L + 1).

Then is the unique, up to a scalar multiple, nonzero vector simultaneously satisfying

Proof. Assume there is another solution to (5.1.14), which is different from

where is

Denote


From (5.1.15) it follows that

By (5.1.7), we then have

which implies

where by we denote the (2 L + 1)-dimensional vector composed

of coefficients of the polynomial written in the form of increasing orders of

Since is of full rank, In other words,

For a fixed (5.1.17) is valid for all Therefore, all roots of should be roots of for all By A5.1.1,

all roots of must be roots of Consequently, there is a

constant such that Substituting this into (5.1.17) leads to

and hence Thus, we conclude that
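Condition A5.1.2 on the input Hankel matrix is easy to test numerically. The sketch below builds a Hankel matrix of sliding windows of length 2L + 1 from the input samples and checks the full-rank requirement; the exact indexing of (5.1.13) is not reproduced here, so the window layout is an assumption consistent with the stated dimensions.

```python
import numpy as np

def hankel_matrix(s, L):
    """Hankel matrix whose i-th row is the window (s_i, ..., s_{i+2L});
    with len(s) = N + 1 samples this yields (N - 2L + 1) rows of width
    2L + 1 -- an assumed layout matching the stated dimensions."""
    s = np.asarray(s, dtype=float)
    rows = len(s) - 2 * L
    return np.array([s[i:i + 2 * L + 1] for i in range(rows)])

# A5.1.2 asks this matrix to have full rank 2L + 1:
# assert np.linalg.matrix_rank(hankel_matrix(inputs, L)) == 2 * L + 1
```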

We first establish a convergence theorem for blind channel identification based on stochastic approximation methods for the case where a noise-free data sequence is observed. Then, we extend the results to the case where N is not fixed and the observation is noise-corrupted.

Assume is observed. In this case

are available, and we have We will repeatedly use the data by setting


Define the estimate for recursively by

with an initial value

We need the following condition.

Theorem 5.1.1  Assume A5.1.1–A5.1.3 hold. Let be given by

(5.1.19) with any initial value with Then

where is a constant.

Proof. Decompose and respectively into orthogonal vectors:

where

If serves as the initial value for (5.1.19), then by (5.1.14), Again, by (5.1.14) we have

and we conclude that

and

Therefore, for proving the theorem it suffices to show that as

Denote


and Then by (5.1.21) we have

Noticing that and is uniformly bounded with respect to for large we have

and By (5.1.18)

and by Lemma 5.1.1, is its unique, up to a constant multiple, eigenvector corresponding to the zero eigenvalue, and the rank of is

Denote by the minimal nonzero eigenvalue of 

Let be an arbitrary vector orthogonal to

Then can be expressed by

where  – 1, are the unit eigenvectors of 

corresponding to its nonzero eigenvalues.

It is clear that

By this, from (5.1.23) and (5.1.24), it follows that for

Page 238: Stochastic Approximation Applications

8/13/2019 Stochastic Approximation Applications

http://slidepdf.com/reader/full/stochastic-approximation-applications 238/368

226 STOCHASTIC APPROXIMATION AND ITS APPLICATIONS 

and

Noticing that

we conclude

and hence

From (5.1.21) it is seen that is nonincreasing for Hence, the convergence implies that

The proof is completed.

Remark 5.1.1 If the initial value is orthogonal to then and (5.1.20) is also true. But this is a non-interesting case, giving no information about

Remark 5.1.2 Algorithm (5.1.19) is an SA algorithm with a linear time-varying regression function The root set J for is time-invariant: As mentioned above, evolves in one of the subspaces depending on the initial value: In the proof of Theorem 5.1.1 we have actually verified that may serve as the Lyapunov function satisfying A2.2.2

for Then applying Remark 2.2.6 also leads tothe desired conclusion.
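Remark 5.1.2 views (5.1.19) as an SA algorithm whose equilibria are the directions annihilated by the data matrices. Since the display of (5.1.19) is not reproduced here, the following sketch only illustrates an SA recursion of that type; the specific update theta <- theta - a_k * Phi_k Phi_k^T theta and the step size a/k are assumptions for illustration, not the book's exact algorithm.

```python
import numpy as np

def null_direction_sa(phis, theta0, a=0.1):
    """SA recursion drifting toward a null direction of sum_k Phi_k Phi_k^T
    (sketch): theta_{k+1} = theta_k - a_k * Phi_k Phi_k^T theta_k.
    Each phi in `phis` is a 2-D array with phi.shape[0] == len(theta0)."""
    theta = np.array(theta0, dtype=float)
    for k, phi in enumerate(phis, start=1):
        a_k = a / k                          # step size; a may need scaling
        theta -= a_k * (phi @ (phi.T @ theta))
    return theta
```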

We now assume that the input signal is a sequence of infinitely many mutually independent random variables and that the observations do not contain noise, i.e., in (5.1.5).

Lemma 5.1.2  Assume A5.1.1 holds and is a sequence of mutually

independent random variables with Then is

the unique unit eigenvector corresponding to the zero eigenvalue for the

matrices


and the rank of is

Proof. Since is a sequence of mutually independent random variables and it follows that

where

Proceeding along the lines of the proof of Lemma 5.1.1, we arrive at the analogue of (5.1.16):

which implies

From (5.1.28) and (5.1.29) it follows that Then, following the proof of Lemma 5.1.1, we conclude that is the unique unit vector satisfying

This shows that is of rank and is its unique unit eigenvector corresponding to the zero eigenvalue.

Let denote the minimal nonzero eigenvalue of. On we need the following condition.

A5.1.4 is a sequence of mutually independent random variables

with for some and such that 

Condition A5.1.3 is strengthened to the following A5.1.5.

A5.1.5  A5.1.3 holds and where is given in A5.1.4.


It is obvious that if is an iid sequence, then is a positive constant, and (5.1.30) is automatically satisfied.

Theorem  5.1.2  Assume A5.1.1, A5.1.4, and A5.1.5 hold, and is

given by (5.1.19) with initial value. Then

where

Proof.  In the present situation we still have (5.1.21) and (5.1.22). So, it suffices to show

With N replaced by 4L in the definitions of and, we again arrive at (5.1.23).

Since

converges a.s. by A5.1.4 and A5.1.5, there is a large such that

Let be an arbitrary vector such that

Then by Lemma 5.1.2,

and hence

Therefore, which

tends to zero since. This implies

is bounded, and

a.s.,


We now consider the noisy observation (5.1.5). By the definition

(5.1.11), similar to (5.1.9) we have

where and have the same structure as given by (5.1.10) with

replaced by and, respectively. The following truncated algorithm is used to estimate

with initial value and. Introduce the following conditions.

A5.1.6 and are mutually independent and each of them is a sequence of mutually independent random variables (vectors) such that 

and 

 for some

and where is given in A5.1.4.

Set

Then

Denote by the resetting times, i.e.,

Then, we have

A5.1.7


and

Let be an orthogonal matrix, where

Denote

Then

Noticing we find that

Lemma 5.1.3  Assume A5.1.6 and A5.1.7 hold. Then for given by (5.1.32),

Proof.  Setting

we have

and


Proof. Since is a sequence of mutually independent nondegenerate

random variables, where

Notice that coincides with given by (5.1.13) if we set N = 4L and in (5.1.13).

Proceeding as in the proof of Lemma 5.1.1, we again arrive at (5.1.16).

Then, we have. Since

we find that. Then by the same argument as that used in the proof of Lemma 5.1.1, we conclude that for any, is the unique unit nonzero vector simultaneously satisfying

Since is a matrix, the above assertion

proves that the rank of is and also

proves that is its unique unit eigenvector corresponding to the zero eigenvalue.

Denote by the minimal nonzero eigenvalue of 

We need the following condition.

A5.1.8  There is a such that 

It is clear that if is an iid sequence, then is independent of and, and A5.1.8 is automatically satisfied.

Lemma  5.1.6  Assume A5.1.1 and A5.1.6–A5.1.8 hold. Then for any


which together with (5.1.44) leads to

for large enough

where and

Theorem  5.1.3  Assume A5.1.1 and A5.1.6–A5.1.8 hold. Then for 

given by (5.1.32) with initial value and

where is a random variable expressed by (5.1.60).

Proof.  We first prove that the number of truncations is finite, i.e.,

a.s. Assume the converse:

By Lemma 5.1.3, for any given

and

as


if is large enough, say. By the definition of we have

which together with (5.1.52) implies

and

Define

Since is well-defined

by (5.1.54). Notice that from to there is no truncation. Consequently,

and


With to be fixed, let us take. From (5.1.52) and (5.1.54) it follows that the sequences

starting from cross the interval for each. This means that

crosses the interval for each. Here, we say that the sequence

crosses an interval with if and

there is no truncation in the algorithm (5.1.32) for. Without loss of generality, we may assume converges:

It is clear that and. By Lemma 5.1.4, there is no truncation for if T is small enough.

Then, similar to (2.2.24), for large, by Lemmas 5.1.3 and 5.1.4 we have

where and

By Lemma 5.1.6, for large and small  T  we have

By Lemma 5.1.4. Noticing that

and by the definition of crossing we see that for small enough T,


This implies that

Letting  in (5.1.57), we find that

which contradicts (5.1.58). The contradiction shows that

Thus, starting from, the algorithm (5.1.32) suffers no truncation. If did not converge as, then

and would cross a nonempty interval

infinitely often. But this leads to a contradiction as shown above. Therefore, converges as

If were not zero, then there would exist a convergent

subsequence. Replacing in (5.1.56) by, from

(5.1.57) it follows that

Since converges, the left-hand side of (5.1.59) tends to zero, which makes (5.1.59) a contradictory inequality. Thus, we have proved

a.s.

Since from (5.1.40) it follows that

By (5.1.38) and the fact that we finally conclude that

The difficulty in applying the algorithm (5.1.32) is that the second moment of the noise may not be available. Identification of the channel coefficients without using will be discussed in Section 5.3, with the help of the principal component analysis described in the next section.

a.s.


5.2. Principal Component Analysis
Principal component analysis (PCA) is one of the basic methods

used in feature extraction, signal processing and other areas. Roughly

speaking, PCA gives recursive algorithms for finding eigenvectors of a symmetric matrix A based on noisy observations of A.

Let be a sequence of observed symmetric matrices, and. The problem is to find eigenvectors of A, in particular,

the one corresponding to the maximal eigenvalue. Define

with initial value being a nonzero unit vector. serves as an estimate for a unit eigenvector of A.

If then is reset to a different vector with norm equal to 1.
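As an illustration, the following minimal sketch runs a normalized SA iteration of the type (5.2.1)-(5.2.2) on noisy observations A_k = A + N_k, together with an untruncated SA recursion for the accompanying eigenvalue estimate in the spirit of (5.2.7). The concrete matrix, noise, and step sizes are assumptions of the example, not the book's exact recursions.

    import numpy as np

    # Estimate a unit eigenvector u of A (and the matching eigenvalue lam)
    # from noisy symmetric observations A_k; minimal sketch only.
    rng = np.random.default_rng(1)
    A = np.diag([5.0, 2.0, 1.0])             # "unknown" symmetric matrix
    u = np.ones(3) / np.sqrt(3.0)            # nonzero unit initial value
    lam = 0.0
    for k in range(1, 20001):
        a_k = 1.0 / k
        N = rng.normal(scale=0.5, size=(3, 3))
        A_k = A + (N + N.T) / 2              # noisy observation of A
        w = u + a_k * (A_k @ u)              # one SA step along A_k u
        if np.linalg.norm(w) < 1e-12:        # reset rule: restart from a
            w = rng.normal(size=3)           # different nonzero vector
        u = w / np.linalg.norm(w)            # keep the iterate on the sphere
        lam += a_k * (u @ A_k @ u - lam)     # SA recursion for the eigenvalue
    print(u, lam)                            # approx (+-e_1, 5.0)

Flipping the sign of the step (replacing a_k by -a_k) steers the iterate toward the eigenvector of the smallest eigenvalue; cf. Remark 5.2.1 below and its use in Section 5.3.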

Assume have been defined as estimates for unit

eigenvectors of A. Denote, which is an, where

where denotes the pseudo-inverse of. Since for large is a full-rank matrix,

Define

if with. If, we redefine an with such that

Define the estimate  for the eigenvalue corresponding to the

eigenvector whose estimate at time is by the following recursion.


Take a sequence increasing to infinity

and define by the SA algorithm with expanding truncations:

where

We will use the following conditions:

A5.2.1 and 

A5.2.2 are symmetric, and 

A5.2.3 and 

where is given by (1.3.2).

Examples for which (5.2.8) is satisfied are given in Chapters 1 and 2.

We now give one more example.

Example 5.2.1 Assume is stationary and ergodic, If, then satisfies (5.2.8). Set. By

ergodicity, we have a.s. By a partial summation it follows

that

which implies (5.2.8).
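The partial-summation step can be written out explicitly; the following is a sketch under the common choice of step sizes $a_k = 1/k$, with $S_n$ as defined in the example:

    \sum_{k=n}^{m} a_k \varepsilon_k
      = \sum_{k=n}^{m} a_k (S_k - S_{k-1})
      = a_m S_m - a_n S_{n-1} + \sum_{k=n}^{m-1} (a_k - a_{k+1}) S_k .

Since $S_k/k \to 0$ a.s. by ergodicity, the boundary terms $a_m S_m = S_m/m$ and $a_n S_{n-1}$ tend to zero, while $|a_k - a_{k+1}|\,S_k = S_k/(k(k+1)) = o(1)\,a_{k+1}$; hence over any block with $\sum_{k=n}^{m} a_k \le T$ the right-hand side is $o(1)(1+T)$, which yields a noise condition of the type (5.2.8).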

Let be the unit eigenvector of  A  corresponding to eigenvalue

where may not be different.


Theorem  5.2.1  Assume A5.2.1 and A5.2.2 hold. Then given by

(5.2.1)–(5.2.6) converges at those samples for which A5.2.3 holds,

and the limits of coincide with

 Let denote the limit of as Then

Proof.  Consider those for which A5.2.3 holds. We first prove convergence of. Note that may happen only for a finite

number of steps because as and By

boundedness of we expand into the power series of  

where

Further, we rewrite (5.2.9) as

where


Denote by S the unit sphere in. Then defined by (5.2.2) evolves on S. Define

The root set of on S  is

Defining we find for

Thus, Condition A2.2.2(S) introduced in Remark 2.2.6 is satisfied.

Since is bounded, no truncation is needed. Then, by Remark 

2.2.6 we conclude that converges to one of, say. Denote

Inductively, we now assume

We then have

Since and from (5.2.21) and (5.2.5) it

follows that and by (5.2.6)

We now proceed to show that converges to one of the unit eigenvectors contained in

From (5.2.5) we see that the last term in the recursion


tends to zero as. So, by (5.2.22), we need to reset with and only a finite number of times.

Replacing by in (5.2.9)–(5.2.11), we again arrive at

(5.2.11) for. Precisely,

where

and

By noticing

and using (5.2.22), (5.2.23) can be rewritten as

where as

Since tends to an eigenvector of  A,  from (5.2.11) it follows that

where

Since converges, from (5.2.13) and it follows that


Inductively, assume that

with satisfying (5.2.27), i.e.,

Noticing that for any matrix  V,  we have

by (5.2.28).

Since by (5.2.24), denoting by

the term we have

for any convergent subsequence

Denoting

from (5.2.26) we see

By (5.2.8) and (5.2.30), similar to (5.2.18)–(5.2.20), by Remark 2.2.6

converges to a unit eigenvector of. From (5.2.5) it

is seen that converges since and Then from

(5.2.6) it follows that itself converges as

Thus, we have


From (5.2.5) it follows that

which implies that and consequently,

Since the limit of is a unit eigenvector of

we have

By (5.2.33) it is clear that can be expressed as a linear combination of eigenvectors. Consequently,

which together with (5.2.34) implies that

This means that is an eigenvector of   A, and is different from

by (5.2.33). Thus, we have shown (5.2.21) for. To complete the induction it

remains to show (5.2.28) for

As we have just shown, tends to zero as; from (5.2.31) we have

where satisfies (5.2.29) with replaced by, noting that (5.2.30) is fulfilled for the whole sequence because, which has been shown to be convergent.

Elementary manipulation leads to


This expression together with (5.2.35) proves (5.2.28) for. Thus, we have proved that given by (5.2.1)–(5.2.6)

converge to different unit eigenvectors of  A,  respectively.

To complete the proof of the theorem it remains to show. Rewrite the untruncated version of (5.2.7) as follows:

We have just proved that Then by (5.2.8) and

noticing the fact that converges and we see that

satisfies A2.2.3. The regression function in (5.2.36) is linear:

Applying Theorem 2.2.1 leads to

Remark  5.2.1 If in (5.2.1) and (5.2.3) is replaced by, Theorem 5.2.1 remains valid. In this case given by (5.2.18) should change to, and correspondingly changes to. As a result, the limit of changes to the opposite sign, from to

5.3. Recursive Blind Identification by PCA
As mentioned in Section 5.1, the algorithm (5.1.32) for identifying channel coefficients uses the second moment of the observation noise. This causes difficulties in applications, because may not be available.
We continue to consider the problem stated in Section 5.1 with the notations introduced there. In particular, (5.1.1)–(5.1.12) and (5.1.31) will be used without explanation. Instead of (5.1.32) we now consider the following normalized SA algorithm:


Comparing (5.3.1) and (5.3.2) with (5.2.1) and (5.2.2), we find that the channel parameter identification algorithm coincides with the PCA algorithm with. By Remark 5.2.1, Theorem 5.2.1 can be applied to (5.3.1) and (5.3.2) if conditions A5.2.1, A5.2.2, and A5.2.3 hold.
The following conditions will be used.

A5.3.1  The input is a sequence, i.e., there exist a constant and a function such that for any

where

A5.3.2  There exists a distribution function over such that 

where denotes the Borel σ-algebra in and 

A5.3.3 The (2 L + 1) × (2 L + 1)-matrix is nondegenerate,

where

A5.3.4  The signal is independent of and 

a.s., where is a random variable with

A5.3.5  All components of of are mutually independent with and, and is bounded, where is a constant.

A5.3.6  have no common root.

For Theorem 5.1.1, is assumed to be a sequence of mutually independent random variables (Condition A5.1.6), while in A5.3.1 the independence is weakened to a property, but the distribution of 

A5.3.7 and 


is additionally required to be convergent. Although there is no requirement on the distribution of in Theorem 5.1.1, we notice that (5.1.30) is satisfied if are identically distributed.

In the sequel, denotes the identity matrix.

Define with

and

In what follows denotes the Kronecker product.

Theorem 5.3.1  Assume A5.3.1–A5.3.7 hold. Then

where C is a -matrix and Q is given in A5.3.3, and  for given by (5.3.1) and (5.3.2),

where J denotes the set of unit eigenvectors of C.

Proof.  By the definition of we have

Since


and by A5.3.2, (5.3.3) immediately follows.

From the definition (5.1.31) for by A5.3.5 it is clear that

is a -identity matrix multiplied by with. Then by A5.3.4 and A5.3.5

Identifying in Theorem 5.2.1 to, we find that Theorem 5.2.1 can be applied to the present algorithm, if we can show (5.2.8),

which, in the present case, is expressed as

where is given by (1.3.2), and  B  is given by (5.3.6).

Notice, by the notation introduced by (5.1.33),

Since

and

by the convergence theorem for martingale difference

sequences, for (5.3.7) it suffices to show

Identifying and in Lemma 2.5.2 to

and respectively, we find that conditions required there are

satisfied. Then (5.3.8) follows from Lemma 2.5.2, and hence (5.3.7) is

fulfilled.


By Theorem 5.2.1, given by (5.3.1) and (5.3.2) converges to a unit eigenvector of B, which clearly is an eigenvector of C.

Lemma  5.3.1  is the unique (up to a scalar multiple) nonzero vector 

simultaneously satisfying

Proof.  Since it is known that satisfies (5.3.9), it suffices to prove the

uniqueness.

As in the proof of Lemma 5.1.1, assume is

also a solution to (5.3.9). Then, along the lines of the proof of Lemma 5.1.1, we obtain the analogue of (5.1.16), which implies (5.1.29):

where is given by (5.1.28) while by (5.1.16).

By A5.3.3, which is nondegenerate. Then we have. The rest of the proof of uniqueness coincides with that given in Lemma 5.1.1.

By Lemma 5.3.1 zero is an eigenvalue of  C  with multiplicity one and

the corresponding eigenvector is. Theorem 5.3.1 guarantees that the estimate approaches J, but it is not clear whether tends to the direction of

Let be all different eigenvalues

of C. J is composed of disconnected sets and, where. Note that

the limit points of are in a connected set, so converges to a

for some Let We want to prove that

a.s. or. This is the conclusion of Theorem 5.3.2, which is essentially based on the following lemma, proved in [9].

Lemma  5.3.2  Let be a family of nondecreasing and 

be a martingale difference sequence with

Let be an adapted random sequence and let be a real sequence such that and. Suppose that on the following conditions 1), 2), and 3) hold.


2) can be decomposed into two adapted sequences and

such that

3)  coincides with an random variable for some

Then

Theorem  5.3.2  Assume A5.3.1–A5.3.7 hold. Then defined by (5.3.1) and (5.3.2) converges to up to a constant multiple:

where equals either 

Proof.  Assume the contrary: for some

Since C is a symmetric matrix, for, where and hereafter a possible set of zero probability in is ignored. The proof

is completed in four steps. Step 1.  We first explicitly express. Expanding defined by (5.3.2) into a power series of, we derive

where

Noting and we derive

and


where is defined by (5.1.4), is given by (5.1.10)

with replaced by the observation noise, and denotes the estimate for at time

By (5.3.4) and (5.3.5), there exists a.s. such that a.s.

For any integers and define and

Note that for

and by the convergence of, from (5.3.12) it follows that, where is a constant for all in. By

(5.3.7) we then have

as, where and hereafter T should not be confused with the superscript T for transpose.

Choose large enough and T sufficiently small such that. Let

and It then follows

that for. In


for sufficiently large.

Consequently, for with fixed

and hence

Define

From (5.3.15) it follows that

Passing to the limit in (5.3.21) and replacing by in the resulting equality, by (5.3.19) we have

Thus, we have expressed in two ways: (5.3.21) shows that is

measurable, while (5.3.22) is in the form required in Lemma 5.3.2, where


Step 2.  In order to show that the summand in (5.3.22) can be expressed as that required in Lemma 5.3.2, we first show that the series

is convergent on. By (5.3.14) and (5.3.7) it suffices to show is convergent on

Define

and

Clearly, is measurable with respect to and. Then by the convergence theorem for martingale difference sequences,

By (5.3.16) it follows that


The first term on the right-hand side of the last equality of (5.3.29) can

be expressed in the following form:

where the last term equals

Combining (5.3.30) and (5.3.31) we derive that the first term on the

right-hand side of the last equality of (5.3.29) is


By A5.3.4, A5.3.5, and A5.3.7 it is clear that

Hence replacing by in (5.3.29) results in

producing an additional term of magnitude. Thus, by (5.3.24)–

(5.3.26) we can rewrite (5.3.29) as

where and is. By (5.3.28) and A5.3.7

the series (5.3.33) is convergent, and hence given by (5.3.23) is a

convergent series.

Step 3.  We now define sequences corresponding to and in

Lemma 5.3.2.

Let We have

where

Denote


Then and are adapted sequences, is a martingale difference sequence, and is written in the form of Lemma 5.3.2:

It remains to verify (5.3.10) and (5.3.11).

From (5.3.23) and (5.3.33) it follows that there is a constant such that. Then for, noticing

and

we have

By A5.3.4 and A5.3.5 it follows that

As in Step 4 it will be shown that


From this it follows that

Then from the following inequality

by (5.3.34) and (5.3.36) it follows that

Therefore all conditions required in Lemma 5.3.2 are met, and we conclude. Since, it follows that

and must converge to a.s. Step 4.  To complete the proof we have to show (5.3.35). If (5.3.35) were not true, then there would exist a subsequence

such that

For notational simplicity, let us denote the subsequence still by

Since by A5.3.5 for if, and for any but if, we then have

which together with (5.3.37) implies that

and

Noticing that and from (5.3.38)

and (5.3.24) it follows that


On the other hand, we have

and hence,

where denotes the estimate provided by for at time. Since for any

we have

Hence (5.3.40) implies that


and

By A5.3.4 the left-hand side of (5.3.41) equals

Since it follows that for any

The left-hand side of (5.3.42) equals

Thus (5.3.42) implies that for any


Noticing from (5.3.25) we have

Then by A5.3.5, (5.3.39) implies that for any

Notice that

and


Then by A5.3.5, from (5.3.45)–(5.3.47) it follows that

and hence for any

and

Notice that (5.3.49) means that

However, the above expression equals

Therefore,


In the sequel, it will be shown that (5.3.43), (5.3.44), (5.3.48), and (5.3.50) imply that, which contradicts

This means that the converse assumption (5.3.37) is not true.

For any, since are coprime, where is given in (5.1.6), there exist polynomials such that

Let and be the degrees of and, respectively. Set. Introduce the q-dimensional vector and the q × q

square matrices W and A as follows:

Note that, where and. Then (5.3.43), (5.3.44), (5.3.48), and (5.3.50) can be written in the following compact form:

To see this, note that for any fixed and, on the left-hand sides of (5.3.48) and (5.3.50) there are 2L different sums when varies from 0 to L – 1 and and interchange roles. These together with (5.3.43) and (5.3.44) give us 2L + 1 sums, and each of them tends to zero. Explicitly expressing (5.3.52), we find that there are 2L + 1 nonzero rows, and each row corresponds to one of the relationships in (5.3.43), (5.3.44), (5.3.48), and (5.3.50).

Since we have put enough zeros in the definition of, after multiplying the left-hand side of (5.3.52) by

has only shifted nonzero elements in. From (5.3.52) it follows that for any and in

(5.3.51)


From (5.3.53) it follows that

Note that for any polynomial of degree if the last elements of are zeros. From (5.3.54) it follows that

Denoting

from (5.3.55) we find that

By the definition of the first elements of are zeros, i.e.,

This means that the last elements of are zeros, i.e.,

On the other hand,

By (5.3.56), from (5.3.57) and (5.3.58) it is seen that i.e.,


From (5.3.53) it then follows that

i.e.,. But this is impossible, because are unit vectors. Consequently, (5.3.37) is impossible, and this completes

the proof of Theorem 5.3.2.

5.4. Constrained Adaptive Filtering
We now apply SA methods to adaptive filtering, which is an important topic in signal processing. We consider the constrained problem; the unconstrained problem is a special case of the constrained one, as will be explained.
Let and be two observed sequences, where and are, respectively. Assume is stationary and ergodic with

which, however, is unknown. It is required to design the optimal weighting X, which minimizes

under the constraint

where C and are matrices, respectively. In the case where C = 0, the problem reduces to the unconstrained one.
It is clear that (5.4.3) is solvable with respect to X if and only if

and in this case the solution to (5.4.3) is

where Z is any. For notational simplicity, denote

Let L(C) denote the vector space spanned by the columns of the matrix C, and let the columns of the matrix be an orthonormal basis


of L(C). Then there is a full-rank decomposition. Noticing, we have. Let be an orthogonal matrix. Then

and hence

From this it follows that

and hence a.s. This implies that

Let us express the optimal X minimizing (5.4.2) via. By (5.4.8), substituting (5.4.4) into (5.4.2) leads to

On the right-hand side of (5.4.9) only the first term, which is quadratic, depends on Z. Therefore, the optimal should be the solution of 

i.e.,

where is any satisfying


Combining (5.4.4) with (5.4.11), we find that

Using the ergodic property of, we may replace and by their sample averages to obtain an estimate for, and the estimate can be updated as new observations arrive. However, each update then involves taking the pseudo-inverse of the updated estimate for, which may be of high dimension. This slows down the computation. Instead, we now use an SA algorithm to approach

By (5.4.8), we can rewrite (5.4.10) as

or

We now face the standard root-seeking problem for a linear function

As before, let and

The following algorithm is used to estimate given by (5.4.12), which in the notation of previous chapters is the root set J for the linear function given by (5.4.14):

with initial value such that and
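For orientation, the following toy sketch implements a linearly constrained SA/LMS recursion of the same general type (without the expanding truncations of (5.4.16)): the iterate is driven by the instantaneous error and kept on the constraint set C'X = f by projection. The data model and all names are assumptions of the example.

    import numpy as np

    # Toy constrained adaptive filtering: minimize E|d_k - X'y_k|^2
    # subject to C'X = f, via a projected SA step (illustrative stand-in
    # for the truncated algorithm (5.4.16)).
    rng = np.random.default_rng(2)
    m = 4
    C = rng.normal(size=(m, 1))
    f = np.array([1.0])
    P = np.eye(m) - C @ np.linalg.solve(C.T @ C, C.T)  # projector onto C'X = 0
    F = C @ np.linalg.solve(C.T @ C, f)                # one point with C'X = f
    w = rng.normal(size=m)                             # "true" weighting
    X = F.copy()                                       # feasible initial value
    for k in range(1, 100001):
        y = rng.normal(size=m)                         # stationary input
        d = w @ y + 0.1 * rng.normal()                 # observed output
        X = P @ (X + (1.0 / k) * y * (d - X @ y)) + F  # SA step, then project
    print(C.T @ X, f)                                  # constraint holds exactly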

Theorem  5.4.1  Assume that is stationary and ergodic with second moment given by (5.4.1) and that. Then, after a finite number of steps, say, (5.4.16) has no more truncations, i.e.,


Then from (5.4.26) it follows that

Denote

and

Since is stationary and ergodic, a.s.,  and

Then by a partial summation, we have


Notice that a.s. by ergodicity. Then for large

and from (5.4.29) it follows that

where (5.4.24) is used together with the fact that

and is stationary with  E 

From (5.4.27)–(5.4.30) by convergence of it follows that

for large and small T , where and are constants independent of  

and. Consequently, in the case, i.e., in

(5.4.16), will never reach the truncation bound for

if is large enough and T  is small enough.

Then coincides with. This verifies (5.4.22), while (5.4.23) follows from (5.4.16) because for a fixed

and are bounded, and

are also bounded by (5.4.31) and the convergence. In

the case, i.e., for some, is

bounded, and hence (5.4.22) and (5.4.23) are also satisfied. We are now in a position to verify the noise condition required in

Theorem 2.2.1 for given by (5.4.20), i.e., we want to show that

for any convergent subsequence


By (5.4.24)

so for (5.4.32) it suffices to show

Again, by (5.4.24) and also by (5.4.23)

which implies (5.4.33). By Theorem 2.2.1, there is such that for, is defined by (5.4.17) and converges to the root set J for given by (5.4.14). This completes the proof of the theorem.

Remark 5.4.1 For the unconstrained problem and C = 0, the algorithm (5.4.16) becomes

5.5. Sign Algorithms for Adaptive Filtering

Theorem  5.5.1  Assume is stationary and ergodic with

Then

where is defined by (5.5.4) and (5.5.5) with an arbitrary initial value. In addition, after a finite number of steps truncations cease in (5.5.4).
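As a point of reference for Theorem 5.5.1, here is a minimal sketch of a sign-error adaptive filtering recursion of the kind studied in this section; the expanding-truncation device of (5.5.4) is omitted and the data model is an assumption of the example.

    import numpy as np

    # Sign algorithm: the LMS error is replaced by its sign, so the
    # recursion seeks a minimizer of E|d_k - w'phi_k|.
    rng = np.random.default_rng(3)
    w_true = np.array([1.0, -2.0, 0.5])
    w = np.zeros(3)
    for k in range(1, 200001):
        phi = rng.normal(size=3)                     # stationary ergodic input
        d = w_true @ phi + 0.1 * rng.normal()        # noisy desired output
        w += (1.0 / k) * phi * np.sign(d - w @ phi)  # sign of the error
    print(w)                                         # close to w_true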

Proof. Define

and

Let be a countable set that is dense in, let and be two sequences of positive real numbers such that and as, and denote

and

where and is an integer.

The summands of (5.5.9)–(5.5.11) are stationary with finite expectations for any, any integer, any, and any; the ergodic theorem then yields that

a.s.,


and

Therefore, there is an such that and for each the convergence in (5.5.12)–(5.5.14) takes place for any, any integer, any, and any

Let us fix an. We first show that for any fixed

if is large enough (say, for ), and in addition,

where c is a constant which may depend on but is independent of. In what follows, always denote constants that may

depend on but are independent of. By (5.4.24) we have for any

There are two cases to be considered. If, then for large enough, and (5.5.15) holds. If is bounded, then the truncations cease after a finite number of steps. So, (5.5.15) also holds if is sufficiently large. Then (5.5.16) follows immediately from

(5.5.15) and (5.5.17). Let us define

where is given by (5.5.2). Then (5.5.15) can be represented as

Let be a convergent subsequence of and let be such that. We now show that


Let. By (5.5.16), or for some integer

We check that the terms on the right-hand side of (5.5.20) satisfy (5.5.19).

For the first term on the right-hand side of (5.5.20) we have

where and are deterministic for a fixed, and the expectation is taken with respect to and

Since, a.s. by (5.5.6), applying the dominated convergence theorem yields

Then from (5.5.21) it follows that


Similarly, for the second term on the right-hand side of (5.5.20) we have

since a.s.

For the third term on the right-hand side of (5.5.20) by (5.4.24),(5.5.10), and (5.5.13) we have

since. Finally, for the last term in (5.5.20), by (5.5.14) and (5.4.24) we have

where the last convergence follows from the fact that

a.s. as since and

a.s. Combining (5.5.23)–(5.5.26) yields that


Since the left-hand side of (5.5.27) is free of, letting tend to infinity in (5.5.27) leads to (5.5.19). Then the conclusion of the theorem follows from Theorem 2.2.1 by noticing that, as in A2.2.2, one may take

5.6. Asynchronous Stochastic Approximation
When dealing with large interconnected systems, it is natural to consider distributed, asynchronous SA algorithms. For example, in a communication network with servers, each server has to allocate audio and video bandwidths in an appropriate proportion in order to minimize the average queueing delay. Denote by the bandwidth ratio for the server, and. Assume the average delay time depends on only and is differentiable. Then minimizing is equivalent to finding the root of. Assume the time, denoted by, spent on transmitting data from the server to the server is not negligible. Then at the server for the iteration we can observe or only at, where denotes the total time spent until completion of iterations for the server. This is a typical problem solved by asynchronous SA. A similar problem arises in job scheduling for computers in a computer network.
We now precisely define the problem and the algorithm.

At time denote by the estimate for the unknown

root of. Components of are observed by different processors, and the communication delays from the processor to the processor at time are taken into account. The observation of the processor is carried out only at, i.e.,

where is the observation noise. In contrast to the synchronous case, the update steps now are different

for different processors, so it is unreasonable to use the same step size for all processors in an asynchronous environment. At time the step size used in the processor is known and is denoted by

We will still use the expanding truncation technique, but because of the communication delay we are unable to simultaneously change the estimates in different processors when an estimate exceeds the truncation bound.

Assume all processors start at the same given initial value and for all. The observation at


the processor is, and is updated to by the rule given below. Because of the communication delay, the estimate produced by the processor cannot reach the processor for the initial steps:

By agreement we will take to serve as whenever

At the processor, two sequences and are recursively generated, where is the estimate for the component of at time and is connected with the number of truncations up to and including time at the processor. For the

processor at time, the newest information about other processors is. In all algorithms discussed until

now, all components of are observed at the same point at time, and this makes updating to meaningful. In the present case,

although we are unable to make all processors observe at the same points at each time, it is still desirable to require all processors to observe at points located as close as possible. Presumably, this would make estimate updating reasonable. For this, noticing that the estimate changes gradually after a truncation, the ideal would be to keep all equal, but the best we can do is to

equalize with other. Keeping this idea in mind, we now define the algorithm and the observations for the processor. Let be a fixed point from which the algorithm

restarts after a truncation.
i) If there exists with, then reset to equal the biggest

one among, and pull back to the fixed point, although

may not exceed the truncation bound. Precisely, in this case define

and observe

ii) If for any, then observe at


i.e.,

For both cases i) and ii), and are updated as follows:

where is the step size at time and may be random, and

is a sequence of positive numbers increasing to infinity.
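The following toy simulation, with assumed names and a fixed communication delay, shows the mechanics just described: each processor updates its own component with its own step sizes while reading only delayed values of the other component (the truncation/reset logic of i)-ii) is omitted for brevity).

    import numpy as np

    # Two-processor asynchronous SA for the root of f(x) = root - x;
    # processor j observes its component at a point whose other
    # components are communication-delayed.  Illustrative sketch only.
    rng = np.random.default_rng(4)
    root = np.array([1.0, -1.0])
    delay = 3
    hist = [np.zeros(2)]                    # past iterates, for delayed reads
    x = np.zeros(2)
    for k in range(1, 5001):
        x_delayed = hist[max(0, len(hist) - 1 - delay)]
        for j in (0, 1):
            point = x.copy()
            point[1 - j] = x_delayed[1 - j]              # delayed information
            obs = (root - point)[j] + 0.1 * rng.normal() # noisy observation
            x[j] += (1.0 / k) * obs                      # processor j's step
        hist.append(x.copy())
    print(x)                                             # approaches root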

Let us list conditions to be used.

A5.6.1 is locally Lipschitz continuous.

A5.6.2 and 

there exist two positive constants such that  

A5.6.3  There is a twice continuously differentiable function (not necessarily nonnegative) such that 

and is nowhere dense, where

and  denotes the gradient of 

A5.6.4 For any convergent subsequence any and any


where

and 

A5.6.5

Note that (5.6.10) holds if is bounded, since. Note also that A5.6.3 holds if and

Theorem  5.6.1  Let be given by (5.6.1)–(5.6.6) with initial value. Assume A5.6.1–A5.6.5 hold, and there is a constant such that and 

where is given in A5.6.3. Then

where

The proof of the theorem is separated into lemmas. From now on we always assume that A5.6.1–A5.6.5 hold.

We first introduce an auxiliary sequence and its associated observation noise. It will be shown that differs from only

by a finite number of steps. Therefore, for convergence of it suffices to prove convergence of 

Let be a sample

path generated by the algorithm (5.6.1)–(5.6.6), where is the one after resetting according to (5.6.2). Let, where is

defined in A5.6.4. Assume By the resetting rule given

in i), for any, after resetting we have. For we

have and by the definition of 

In the processor we take and to replace and

respectively, and define for those

Further, define and for


Then we obtain new sequences associated with. By (5.6.1)–(5.6.6), if, then there exists a with

and

since and for. Because during the period there is no truncation for, the sequences are recursively

updated as follows:

where

Define the delays for as follows:

is available to the processor at time

Lemma 5.6.1 For any any convergent subsequence

and any satisfies the following condition

where

Proof. Since equals either or, which is available at time, it is seen that

For by definition of we have

which is certainly available to the processor. Therefore,

We rewrite. By the definition of and paying attention to (5.6.17), we see

so

as


We now show that (5.6.18) is true for all. For there is no truncation for the processor,

and hence by the resetting rule i). If 

for some then by (5.6.16) and the definition of it follows that

which implies (5.6.18). If for some, then as explained above, for the processor

at time the latest information about the estimate produced by the processor is. In other words,

However, by the definition of, which yields

This again implies (5.6.18). In summary, we have

This means that for there is no truncation at any time equal to, and the observation is carried out at

i.e.,

For any, any convergent subsequence, and any, we have

By (5.6.11), Then from A5.6.2 and

A5.6.5 it follows that and hence the second term


on the right-hand side of (5.6.21) tends to zero as. Further, from the definition of, there is such that. Hence the first term on the right-hand side of (5.6.21) is of order o(T) by A5.6.4. Consequently, from A5.6.2, A5.6.4, and A5.6.5 it follows that satisfies (5.6.15).

Lemma 5.6.2  Let be generated by (5.6.12)–(5.6.14). For any convergent subsequence of, if is bounded,

then there are and such that 

where is given in (5.6.14).

Proof.  Let, where and, where is given in A5.6.2. By (5.6.15), for the convergent subsequence there exists such

that for any and

Choose such that For any let

Then for any

If, then if is sufficiently large, i.e., no truncation occurs after, and hence for

If, then there exists such that for any. From (5.6.24) it follows that

Therefore, in both cases



If then for sufficiently large

i.e.,

This contradicts the definition of Therefore,

Lemma 5.6.3  Let be given by (5.6.12)–(5.6.14). For any with, the following assertions take place:

i) In the case, cannot cross infinitely many times keeping bounded, where are the starting points of the crossings;

ii) In the case cannot converge to keeping

bounded.

Proof. i) Since is bounded, there exists a convergent subsequence, still denoted by for notational simplicity,

By the boundedness of and (5.6.22)

for sufficiently large there is no truncation between and, and hence

where By (5.6.20), (5.6.22) and

it follows that


By A5.6.2 and A5.6.3 we have

Then by A5.6.1

where is the Lipschitz coefficient of in and

By the boundedness of and the fact that there is no truncation between and, it follows that

Without loss of generality, we may assume is a convergent sequence. Then by A5.6.3 and A5.6.5

Therefore,

where

Since is continuous for fixed, by A5.6.4 there exists a for such that


Thus, for sufficiently small T and sufficiently large we have

On the other hand, by Lemma 5.6.2

Thus, for sufficiently small T , and

This contradicts (5.6.31), and i) is proved.

ii) If is bounded, then there is a convergent subsequence. The assertion can then be deduced in a similar way as for i).

Lemma 5.6.4 Under the conditions of Theorem 5.6.1

where is given by (5.6.14).

Proof. If then there exists a sequence such that

From (5.6.12)–(5.6.14) we have. Choose a small positive constant such that

Let be a connected set containing and included in the set

and let be a connected set containing and included in the set. Clearly, and, and are

bounded.

Since diverges to infinity, there exists such that for. Noting that, there exists i such that

and we  can define and

for

Since there is a convergent subsequence in, also denoted by. Let be a limit point of. By the definition of, is bounded. But

crosses infinitely many times, which is impossible by Lemma 5.6.3. Thus,

Proof of Theorem 5.6.1

and 


By Lemma 5.6.4 is bounded. Let

If then by Lemma 5.6.3, we have

If, then there are and such that and, since is nowhere dense. But by Lemma 5.6.3 this is impossible. Therefore,. We now show. If there is a convergent subsequence

and then (5.6.26)–(5.6.30) still hold. Hence,

This is a contradiction to

Consequently, i.e.,

Since and, the truncations occur only finitely many times. Therefore, and differ from each other only for a finite number of. So,

5.7. Notes and References
For blind identification with “block” algorithms we refer to [71, 96]. Recursive blind channel identification algorithms appear to be new. Section 5.1 is written on the basis of the joint work “H. F. Chen, X. R. Cao, and J. Zhu, Convergence of stochastic approximation based algorithms for blind channel identification”. Principal component analysis is applied in different areas (see, e.g., [36, 79]). The results presented in Section 5.2 are an improved version of those given in [101]. Principal component analysis is applied to solve the blind identification problem in Section 5.3, which is based on the recent work “H. T. Fang and H. F. Chen, Blind channel identification based on noisy observation by stochastic approximation method”. The proof of Lemma 5.3.2 is given in [9].
For adaptive filtering we refer to [57]. The results presented in Section 5.4 are stronger than those given in [11, 28]. The sign algorithms are dealt with in [42], but the conditions used in Section 5.5 are considerably weaker than those in [42]. Section 5.5 is based on the recent work “H. F. Chen and G. Yin, Asymptotic properties of sign algorithms for adaptive filtering”.
Asynchronous stochastic approximation was considered in [9, 88, 89, 99]. Section 5.6 is written on the basis of [50].



Chapter 6

APPLICATION TO SYSTEMS AND CONTROL

Assume a control system depends on a parameter, and the system operation reaches its ideal status when the parameter equals some. Since is unknown, we have to estimate it during the operation of the system, which, therefore, can work only on the estimate of. In other words, the real system is not under the ideal parameter, and

the problem is to estimate on-line and to make the system asymptotically operate in the ideal status. It is clear that this kind of system parameter identification can be dealt with by SA methods.

Adaptive control for linear stochastic systems is a typical example of the situation described above. If the system coefficients are known, then the optimal stochastic control may be a feedback control of the system state. The corresponding feedback gain can be viewed as the ideal parameter, which depends on the system coefficients. In the setup of adaptive control, the system coefficients are unknown, and hence is unknown. The problem is to estimate and to prove that the resulting adaptive control system using the estimate as the feedback gain is asymptotically optimal as tends to infinity.

In Section 6.1 the ideal parameter is identified by SA methods for systems in a general setting, and the results are applied to solving the adaptive quadratic control problem. The adaptive stabilization problem is solved for stochastic systems in Section 6.2, while adaptive exact pole assignment is discussed in Section 6.3. An adaptive regulation problem for nonlinear and nonparametric systems is considered in Section 6.4.



6.1. Application to Identification and Adaptive Control

Consider the following linear stochastic system depending on parameter

where and are unknown.

The ideal parameter for System (6.1.1) is a root of an unknown function

The system actually operates with equal to some estimate for, i.e., the real system is as follows:

For notational simplicity, we suppress the dependence on the state and rewrite (6.1.3) as

The observation at time is

where is a noise process. From (6.1.5) it is seen that the function is not directly observed,

but it is connected with as follows:

We list conditions that will be used.

A6.1.1 and with, where is generated by (6.1.1).
Let be a sequence of positive numbers increasing to infinity and let be a fixed point. Fixing an initial value, we recursively estimate by the SA algorithm with expanding truncations:
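For readers who want the expanding-truncation device in executable form, here is a minimal one-dimensional sketch with an assumed regression function and noise; it illustrates the generic scheme only, not the identification algorithm of this section.

    import numpy as np

    # Robbins-Monro with expanding truncations: if the tentative iterate
    # leaves the current bound M_s, restart from the fixed point x* and
    # enlarge the bound.  All names are illustrative.
    rng = np.random.default_rng(5)
    f = lambda x: 2.0 - x              # "unknown" function with root 2
    x_star = 0.0                       # fixed restart point
    x, s = x_star, 0                   # s counts truncations; M_s = 2**s
    for k in range(1, 100001):
        y = f(x) + rng.normal()        # noisy observation of f at x
        cand = x + (1.0 / k) * y       # tentative RM step
        if abs(cand) > 2.0 ** s:       # bound exceeded: truncate,
            x, s = x_star, s + 1       # restart and expand the bound
        else:
            x = cand
    print(x, s)                        # x near 2; only finitely many truncations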


A6.1.2 There is a continuously differentiable function

such that 

 for any and is nowhere dense,

where J is given by (6.1.2). Further, used in (6.1.8) is such that 

inf   for some and 

A6.1.3 The random sequence in (6.1.1) satisfies a mixing condition

characterized by

uniformly in  where Further, is such that  

sup where

A6.1.4 For sufficiently large integer 

 for any such that converges, where is given by (1.3.2).

Let is stable}, and let be an open, connected subset of 

A6.1.5 and f are connected by (6.1.6) and (6.1.1) for each

  satisfies a local Lipschitz condition on

with   for any constants and where is

given in A6.1.3.



A6.1.6  and in (6.1.1) are globally Lipschitz continuous:

where L is a constant.

A6.1.7 given by  (6.1.7) is If converges for some

then where may depend on

Theorem 6.1.1  Assume A6.1.1–A6.1.7 hold. Then

where is a connected subset of 

Proof. By (6.1.5) we rewrite the observation in the standard form

where

By Theorem 2.2.2 and Condition A6.1.4, the assertion of the theorem will immediately follow if we can show that for almost all, condition (2.2.2) is satisfied with replaced by

Let be expressed as a sum of seven terms:

where


where

and and denote the distribution and

conditional distribution of given, respectively. To prove the theorem it suffices to show that there exists with

such that for each all satisfy

(2.2.2) with respectively identified to

By definition, for any there is such that

where is independent of 

Let us first show that satisfy (2.2.2). Solving (6.1.1) yields

By A6.1.3, is bounded. Hence, by (6.1.18), is bounded and by A6.1.5 is also bounded:

where

where is given in A6.1.5.

Since, we have. We now show that and are continuous in, uniformly with respect to. By (6.1.18) and (6.1.20), from (6.1.19) it follows that


By (6.1.18), (6.1.20), and the Lipschitz condition A6.1.5 for, it follows that
and


which implies the uniform continuity of. This together with (6.1.13) yields that is also uniformly continuous. Let be a countable dense subset of. Noticing that is and expressing

as a sum of martingale difference sequences

by (6.1.20) and, we find that there is with such that for each

for any integer and any. From here, by the uniform continuity of, it follows that for and for any integer

Note that


This is because by (6.1.18) and (6.1.20) we have the following estimate:

We now estimate by the treatment used in Lemma 2.5.2. By applying the Jordan-Hahn decomposition to the signed measure

Similarly, we can find with such that for and

since is bounded by the martingale convergence theorem. It is worth noting that (6.1.23) holds a.s. for any, but without loss of generality (6.1.23) may be assumed to hold for all

with. To see this, we first select such that (6.1.23) holds for any. This is possible

because is a countable set. Then, we notice that is continuous in, uniformly with respect to. Thus, we have



where is the mixing coefficient given in A6.1.3. Thus, by (6.1.27)–(6.1.29) we have

and


it is seen that there is a Borel set D in the sampling space such that for any A in the sampling space


By A6.1.5, (6.1.18), (6.1.20), and noticing we find

whose expectation is finite as explained for (6.1.20). Therefore, on the right-hand side of (6.1.30) the conditional expectation is bounded with respect to by the martingale convergence theorem, and the last term is also bounded with respect to Thus, by (6.1.10) from (6.1.30) it follows

that there is with such that

Assume is a convergent subsequence

Define

Write (6.1.4) as

Let be fixed.


where
By induction we now show that

for all suitably large .

For any fixed  if  is large enough, since

Therefore (6.1.36) holds for since
Assume (6.1.36) holds for some By noticing from (6.1.34) and (6.1.35) it follows that

By using (6.1.20), (6.1.37), and the inductive assumption, and applying (6.1.19) to it follows that

for where and satisfies the following equation

By A6.1.7 and (6.1.20) we have

and using (6.1.18), (6.1.37), and the inductive assumption we derive


Combining this with (6.1.38) leads to the conclusion that there are real numbers and such that

for From here it follows that

From the inductive assumption it follows that for

for some large enough integer  N . Then by (6.1.12)

Setting

we derive

where (6.1.22), (6.1.24), (6.1.25), (6.1.31), (6.1.39), and (6.1.40) are used.

Choose sufficiently small so that (6.1.35) holds, and


Since by A6.1.5 there is such that

for all From (6.1.41) it then follows that

It can be assumed that is sufficiently large so that

Since by (6.1.42) it follows that

and hence there is no truncation at

Thus, we have

or equivalently,

which proves (6.1.36).

Consequently, (6.1.39) is valid for and hence

times and

and


where is the estimate of Let be given by (6.1.7) and (6.1.8) with given by (6.1.5).

where and are related by (6.1.44). However, since the ideal is unknown, the real system satisfies the

equation

where and are symmetric such that and Let be given by A6.1.3). The control

where is the feedback control which is required to minimize

Finally, noticing that A6.1.5 assumes (6.1.6), we conclude that for each

all satisfy (2.2.2) with

respectively replaced by The proof of the theorem is completed.

We now apply the obtained result to an adaptive control problem. Assume that is the ideal parameter for the system, being

the unique zero of an unknown function The system in the ideal condition is described by the equation


From (6.1.21) and (6.1.13) it is seen that is continuous in uniformly with respect to Therefore, its limit is a continuous function. Then by (6.1.36) it follows that

should be selected in the family U  of admissible controls:


In order to give adaptive control we need the expression of the optimal control when is known.

Lemma  6.1.1  Suppose that 

is a martingale difference sequence with

ii) where is controllable and observable, i.e., · · · , and

· · · , are of full rank. Then in the class of nonnegative definite matrices there is a unique satisfying

and 

where

and 

Proof. The existence of a unique solution to (6.1.50) and stability of F given by (6.1.51) are well-known facts in control theory. We show the optimality of the control given by (6.1.52).

For notational simplicity, we temporarily suppress the dependence of

and on and write them as A, B,

and D, respectively. Noticing

is stable. The optimal control minimizing (6.1.45) is

Page 316: Stochastic Approximation Applications

8/13/2019 Stochastic Approximation Applications

http://slidepdf.com/reader/full/stochastic-approximation-applications 316/368

304 STOCHASTIC APPROXIMATION AND ITS APPLICATIONS 

we then have

Since by the estimate for the weighted sum of martingale

difference sequences, from (6.1.55) it follows that

where is the state in (6.1.47).

Thus the closed system becomes

Notice that the last term of (6.1.56) is nonnegative. The conclusions of 

the lemma follow from (6.1.56).
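For readers who wish to experiment, the following is a minimal computational sketch of the well-known LQ solution used in Lemma 6.1.1: the discrete-time algebraic Riccati equation (the analogue of (6.1.50)) is solved numerically, and the stabilizing feedback gain (cf. (6.1.51), (6.1.52)) is formed from its solution. The function name and the SciPy routine are illustrative choices, not part of the text.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lq_gain(A, B, Q, R):
    # Solve the discrete-time algebraic Riccati equation for its
    # unique nonnegative definite solution P.
    P = solve_discrete_are(A, B, Q, R)
    # Form the stabilizing feedback gain; the optimal control is
    # u = -F x, and the closed-loop matrix A - B F is stable.
    F = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, F
```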

According to (6.1.52), by the certainty-equivalence principle, we form

the adaptive control


which has the same structure as (6.1.4). Therefore, under the assumptions A6.1.1–A6.1.7 with replaced by and with J being a singleton, by Theorem 6.1.1 it is concluded that

By continuity and stability of it is seen that there are and possibly depending on such that

This yields the boundedness of and

because

By (6.1.60) it follows that

Therefore, the closed system (6.1.58) asymptotically operates under the ideal parameter and minimizes the performance index (6.1.45).
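A hedged sketch of the certainty-equivalence loop just described: at each step the unknown parameter is replaced by its current estimate, the LQ gain is recomputed at that estimate, and the resulting feedback is applied. The map `make_AB` from a parameter estimate to model matrices is a hypothetical stand-in for the system parameterization; `lq_gain` is the sketch given after Lemma 6.1.1.

```python
def certainty_equivalent_control(theta_hat, x, make_AB, Q, R):
    # Evaluate the model at the current parameter estimate.
    A_hat, B_hat = make_AB(theta_hat)
    # Recompute the LQ gain as if the estimate were the true parameter.
    _, F = lq_gain(A_hat, B_hat, Q, R)
    # Apply the corresponding feedback at the current state.
    return -F @ x
```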

6.2. Application to Adaptive Stabilization

Consider the single-input single-output system

where and are the system input, output, and noise, respectively, and

where is the backward shift operator. The system coefficient

is unknown. The purpose of adaptive stabilization is to design control

so that

a.s.

Page 318: Stochastic Approximation Applications

8/13/2019 Stochastic Approximation Applications

http://slidepdf.com/reader/full/stochastic-approximation-applications 318/368

The fact that and a can be solved from (6.2.5) for any means that

is nonzero. In other words, the coprimeness of and is equivalent to

In the case is unknown, the certainty-equivalence principle suggests replacing by its estimate to derive the adaptive control law. However, for may be zero and (6.2.5) may not be solvable with and replaced by their estimates.

Let us estimate by the following algorithm, called the weighted least squares (WLS) estimate, which is convergent for any feedback control
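The WLS recursion itself is not reproduced here; the following is a minimal sketch of a generic recursive weighted least squares estimator for a linear regression y = phi' theta + noise, with a user-supplied positive weight sequence standing in for the particular weights of [55].

```python
import numpy as np

class WLS:
    """Sketch of recursive weighted least squares; the weight sequence
    that guarantees convergence under arbitrary feedback (as in [55])
    is not reproduced and must be supplied by the user."""
    def __init__(self, dim, weight=lambda k: 1.0):
        self.theta = np.zeros(dim)       # parameter estimate
        self.P = 100.0 * np.eye(dim)     # initial "covariance"
        self.weight = weight
        self.k = 0

    def update(self, phi, y):
        w = self.weight(self.k)
        # Gain from the rank-one (Sherman-Morrison) update of P.
        gain = self.P @ phi * w / (1.0 + w * (phi @ self.P @ phi))
        # Correct the estimate by the weighted prediction error.
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = self.P - np.outer(gain, phi @ self.P)
        self.k += 1
        return self.theta
```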


If is known and if and are coprime, then for an arbitrary stable polynomial of degree there are unique polynomials

and both of order with such that

Then the feedback control generated by

leads the system (6.2.1) to

Then, by stability of (6.2.4) holds if we assume

Considering coefficients of and as unknowns, and identifying coefficients of for both sides of (6.2.5), we derive a system of linear algebraic equations with matrix for unknowns:


where

Though converges a.s., its limit may not be the true If a bounded sequence can be found such that the modified estimate

and  for some

is convergent and

then the control obtained from (6.2.6) with replaced by solves the adaptive stabilization problem, i.e., makes (6.2.4) hold.

Therefore, the central issue in adaptive stabilization is to find a bounded sequence such that given by (6.2.12) is convergent and (6.2.13) is fulfilled. This gives rise to the following definition.

Definition. System (6.2.1) is called adaptively stabilizable by the use of parameter estimate if there is a bounded sequence such that (6.2.13) holds and given by (6.2.12) is convergent.

It can be shown that if system (6.2.1) is controllable, i.e., and are coprime, then it is adaptively stabilizable by the use of the WLS estimate. It can also be shown that the system is adaptively stabilizable by use of if and only if where and F denote the limits of and respectively, which are generated by (6.2.9)–(6.2.11).

We now use an SA algorithm to recursively produce such that is convergent and the resulting estimate by (6.2.12) satisfies

(6.2.13).


is generated by (6.2.9)–(6.2.11), is defined by (6.2.11), and is recursively defined by an SA algorithm given below.

Let us take a few real sequences defined as follows:

where

which can be written as

From algebraic geometry it is known that is a finite set.

However, is not directly observed; the real observation is

The root set of is denoted by where

where

As a matter of fact,

Let and be –dimensional, and let


Let be l-dimensional with only one nonzero element equal to either +1 or –1, Similarly, let be -dimensional with only nonzero elements, each of which equals either 1 or –1,

The total number of such vectors is

Normalize these vectors and denote the resulting vectors by in the nondecreasing order of the number of nonzero elements in

Define and for Introduce

Define the recursive algorithm for as follows:

and is a fixed vector. The algorithm (6.2.23)–(6.2.27) is the RM algorithm with expanding truncations, but it differs from the algorithm given by (2.1.1)–(2.1.3) as follows. The algorithm (2.1.1)–(2.1.3) is truncated at the upper side only, but the present algorithm is truncated not only at the upper side but also at the lower side: is allowed neither to diverge to infinity nor to tend to zero; whenever it reaches the truncation bounds the estimate is pulled back to and is enlarged to at the upper side, while at the lower side is pulled back to which will change to the


next whenever is satisfied. If for successive resettings of we have to change to the next one, then we reduce to
A schematic sketch of this two-sided truncation is given below.
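The following fragment is a minimal illustration only: the reset point, the bound schedules, and the rule for switching the reset vector are hypothetical simplifications of (6.2.23)–(6.2.27).

```python
import numpy as np

def rm_two_sided(obs, x0, lower0=0.1, upper0=10.0, n_iter=2000):
    """Sketch of an RM iteration truncated at an expanding upper bound
    and a lower bound, so the iterate can neither diverge to infinity
    nor collapse to zero.  `obs(x)` returns a noisy observation of the
    regression function at x."""
    x = np.asarray(x0, dtype=float)
    x_reset = x.copy()                   # fixed reset point
    upper, lower = upper0, lower0
    for k in range(1, n_iter + 1):
        cand = x + obs(x) / k            # un-truncated RM step
        r = np.linalg.norm(cand)
        if r > upper:                    # upper-side truncation:
            x, upper = x_reset.copy(), 2.0 * upper
        elif r < lower:                  # lower-side truncation:
            x, lower = x_reset.copy(), 0.5 * lower
        else:
            x = cand
    return x
```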

Lemma 6.2.1  Assume the following conditions hold:

A6.2.2 System (6.2.1) is adaptively stabilizable by use of generated 

by (6.2.9)–(6.2.11), i.e.,

If then after a finite number of steps the algorithm (6.2.23)–(6.2.27) becomes the RM algorithm

converges and 

Proof. The basic steps of the proof are essentially the same as those for proving Theorem 2.2.1, but some modifications should be made because of truncations at the lower side.

Step 1. Let be a convergent subsequence of 

For any define the RM algorithm

with or for some for some

We show that there are M > 0, T > 0 such that when and

when if is large enough, where is given by (1.3.2).

Let > 1 be a constant such that

It is clear that

A6.2.1 and 


Since and are convergent, there is such that

Let By (6.2.29) and (6.2.30), we have

for if and for if where Let (6.2.31) hold for or

It then follows that

where or Thus, (6.2.31) has been inductively proved for

or
Step 2. Let be a convergent subsequence. We show that there

are  M   > 0 and T >  0 such that

if is large enough. If defined by (6.2.25) is bounded, then (6.2.32) directly follows. Again take such that and set Assume Then there is a such that

By the result proved in Step 1, starting from the algorithm for cannot directly hit the sphere with radius without a truncation for So it may first hit some lower bound at time and switch to some

from which again by Step 1 cannot directly reach without a truncation. The only possibility is to be truncated again at a lower bound. Therefore, (6.2.32) takes place.


Step 3. Since and are convergent, by (6.2.32) it follows that

from any convergent subsequence there are constants and such that

if is large enough. Consequently, there is such that

By (6.2.32) and the convergence of and it also follows that

Therefore,

Using (6.2.33) and (6.2.34) by the same argument as that given in Step 3

of the proof for Theorem 2.2.1, we arrive at the following conclusion. If starting from the algorithm (6.2.24) is calculated

as an RM algorithm and is bounded, then for

any with and cannot cross infinitely often.

Step 4. We now show that is bounded.

If is unbounded, then as Therefore, is unbounded and comes back to the fixed point infinitely many times.

Notice that is a finite set and

We see that there is an interval with and

0 such that crosses infinitely often, and during each crossing the algorithm (6.2.24) behaves like an RM algorithm with starting point It is clear that is bounded because as But by Step 3, this is impossible. Thus, we conclude that

is bounded, and after a finite number of steps (6.2.24) becomes

as


Step 5. We now show (6.2.28), i.e., after a finite number of steps the algorithm (6.2.35) ceases to truncate at the lower side.

Since and by A6.2.2, it follows that there is at least one nonzero coefficient in the polynomial for some

with Therefore, for some and a small

From (6.2.16) it is seen that for sufficiently small we have

Combining this with the convergence of and leads to

for sufficiently large

From (6.2.26) and (6.2.36) it follows that must be bounded, and hence is bounded. This means that there is a such that

We now show that is bounded. Since for all sufficiently large it follows that

if were unbounded, then by (6.2.37) the algorithm, starting from would infinitely many times enter the sphere with radius

where is small enough such that

Then would cross infinitely often an interval Since is a finite set, we may assume It is clear that during the crossing the algorithm behaves like an RM algorithm. By Step 4, this is impossible.

Therefore, there is a such that

Noticing (6.2.20), (6.2.34), and that serves as the Lyapunov function for from Theorem 2.2.1 we conclude the remaining assertions of the lemma.


where and are defined by 1)-3) described above.

Proof. The key step is to show that

Assume the converse:

Case i) The assumption implies that

and occurs infinitely many times. However,

this is impossible, since and The contradiction shows

Theorem 6.2.1  Assume conditions A6.2.1 and A6.2.2 hold. Then there is such that and converges and

and use to produce the adaptive control as in 1), and go back to

1) for
3) If and none of a)-c) of 2) is the case, then set

and go back to 1) for and at the same time change to

i.e.,

Define

314 STOCHASTIC APPROXIMATION AND ITS APPLICATIONS 

Using we now define in (6.2.12) satisfying (6.2.13) and thus solving the adaptive stabilization problem.

Let
1) If then set Using we produce

the adaptive control from (6.2.6) with and defined from

(6.2.5) with replaced by and go back to 1) for
2) If then define
a) for the case where

b) defined by (6.2.24) for the case where

but
c) for the case where

but


and the algorithm defining will run over the following cases: 1) and 2a)-2c). Since and are convergent, the inequality

for all sufficiently large Again, this means that (6.2.41) may take place at most a finite number of times, and we conclude that

Thus, there is such that

If then from (6.2.43) it follows that

Since and for sufficiently large

from (6.2.42) it follows that

for all sufficiently large Thus, (6.2.41) may take place at most a finite number of times. The contradiction shows that

we have
then as

Take a convergent subsequence of For notational simplicity denote by itself its convergent subsequence. Thus

By Lemma 6.2.1,
1) If then


Case ii) The assumption implies that there

is a sequence of integers such that and i.e., for all the following indicator equals one

2) If 

implies


6.3. Application to Pole Assignment for Systems with Unknown Coefficients

Consider the linear stochastic system

where is the -dimensional state, is the one-dimensional control, and is the -dimensional system noise.

The task of pole assignment is to define the feedback control

in order that the characteristic polynomial

of the closed-loop system coincides with a given polynomial

The pair is called similar to if there exists a nonsingular matrix such that

where denotes the column of  T .

Consequently, the truncation at the lower bound in (6.2.24) should be very rare. The computation will be simplified if there is no lower bound truncation.


for sufficiently large This means that the algorithm can be at 2b) only finitely many times. For the same reason it cannot be at 2c) infinitely many times. Therefore, the algorithm will stick on 1) if

and on 2a) if and in both cases there is a

such that and

The convergence of follows from the convergence of and

Remark 6.2.1 For the case the origin is not a stable equilibrium for the equation


So, is nonsingular if and only if is nonsingular. Assume that is controllable and is already in its controller form (6.3.5). For notational simplicity, we will write rather than

where

which imply


Define

where are coefficients of  

The pair is called the controller form associated to the pair

If is controllable, i.e., is of full rank, then is similar to its controller form. To see this, we note that (6.3.4) implies and from it follows that


where is the system noise at time “1” for the system with feedback gain applied.

Having observed we compute its characteristic polynomial det which is a noise-corrupted characteristic polynomial of

Let be the estimate for By observing det we actually learn the difference det which in a certain sense reflects how far det differs from the ideal polynomial

For any let


With feedback control the closed-loop system takes the form

Since is in controller form,

where are elements of the row vector F :

Therefore, if is known, then comparing (6.3.10) with (6.3.3) gives the solution to the pole assignment problem, where
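As an illustration of this comparison of coefficients, here is a minimal sketch for a pair already in controller form; the companion-matrix sign convention, and hence the sign of the gain, is an assumption made for the sketch.

```python
import numpy as np

def pole_assignment_gain(A, desired_poly):
    """For (A, b) in controller form, the closed-loop characteristic
    polynomial under u = F x has coefficients a_i - f_i, so the gain is
    read off by comparing coefficients with the desired polynomial.
    `desired_poly` holds the target coefficients with leading 1."""
    a = np.poly(A)[1:]                          # a_1, ..., a_n of det(zI - A)
    a_star = np.asarray(desired_poly, float)[1:]
    return a - a_star                           # row vector F, f_i = a_i - a_i*
```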

We now solve the pole assignment problem by learning for the case where is unknown.

Let us combine the vector equation (6.3.9) for initial values to form

the matrix equation

Let In learning control, can be observed at any fixed

For any the observation of is denoted by


be the row vector composed of coefficients of 

By (6.3.10)

composed of coefficients of 

and respectively. Take a sequence of positive real numbers

and

Calculate the estimate for by the following RM algorithm with

expanding truncations:

with fixed
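A hedged sketch of the learning scheme behind this algorithm: at each step the current gain is applied, the noise-corrupted closed-loop characteristic polynomial is observed, and an RM correction with expanding truncations moves the gain toward the desired polynomial. `observe_poly(F)` is a hypothetical interface returning the observed coefficient vector; the step sizes and truncation bounds are illustrative, not those of (6.3.19)–(6.3.20).

```python
import numpy as np

def learn_gain(observe_poly, desired_poly, n, n_iter=5000):
    """Sketch of RM with expanding truncations for learning the
    pole-assigning gain when A is unknown."""
    F = np.zeros(n)
    sigma = 0                                  # number of truncations so far
    a_star = np.asarray(desired_poly, float)[1:]
    for k in range(1, n_iter + 1):
        y = observe_poly(F)[1:]                # noisy closed-loop coefficients
        cand = F + (y - a_star) / k            # RM correction toward a_star
        if np.linalg.norm(cand) > 10.0 * 2 ** sigma:
            F, sigma = np.zeros(n), sigma + 1  # expanding truncation: reset
        else:
            F = cand
    return F
```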

Theorem 6.3.1  Assume that is controllable and is in the controller form. Further, assume the following conditions A6.3.1 and A6.3.2 hold:

A6.3.1 The components of 

of in (6.3.13) are mutually independent with

A6.3.2

where is the same as that in A6.3.1. Then there is with such that for each as

Similarly, define row vectors

 for some


From here it is seen that is a sum of products of elements from with +1 and –1 as multiplier for each product, where and denote elements of A and respectively. It is important to note that each product in includes at least one of as its factor. Thus, the product is of the form

From (6.3.21) by (6.3.18), (6.3.15), and (6.3.13) it follows that

Therefore, the conclusion of the theorem will follow from Theorem 2.2.1, if we can show that for any integer N


where is the desired feedback gain realizing the exact pole assignment.

Proof. Define

where and are given by (6.3.14) and (6.3.17), respectively. By (6.3.11) and (6.3.16) it follows that

Thus, (6.3.19) and (6.3.20) become

It is clear that the recursive algorithm for has the same structure

as (2.1.1)–(2.1.3). For the present case, as the function required in A2.2.2 we may take


where

By A6.3.1 we have

where
By A6.3.2 and the convergence theorem for martingale difference sequences it follows that

for any integer which implies (6.3.24).

6.4. Application to Adaptive Regulation

We now apply the SA method to solve the adaptive regulation problem for a nonlinear nonparametric system.

Consider the following system

where is the system state, is the control, and is an unknown nonlinear function with being

the unknown equilibrium for the system (6.4.1). Assume the state is observed, but the observations are corrupted by noise:

where is the observation noise, which may depend on The purpose of adaptive regulation is to define adaptive control based

on measurements so that the system state reaches the desired value,

which, without loss of generality, may be assumed to be equal to zero. We need the following conditions.

A6.4.1   and 


A6.4.2  The upper bound for is known, i.e., and is a robust stabilizing control in the sense that for any the state

tends to zero for the following system

A6.4.3 The system  (6.4.1) is BIBS stable, i.e., for any bounded input,the system state is also bounded;

A6.4.4  is continuous for bounded i.e., for any

A6.4.5 The system  (6.4.1) is strictly input passive, i.e., there are and such that for any input 

A6.4.6  For any convergent subsequence

where is defined by (1.3.2).

It is worth noting that A6.4.6 becomes

if is independent of The adaptive control is given according to the following recursive

algorithm:

where b is specified in A6.4.2.
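A hedged sketch of such a truncated recursion: the control is updated by an SA step driven by the noisy state observation and then clipped to the known bound b from A6.4.2. `step_obs(u)` is a hypothetical interface that applies the control, advances the unknown system (6.4.1) one step, and returns the noisy measurement (6.4.2); the sign convention and step sizes are illustrative, not the exact form of (6.4.4).

```python
import numpy as np

def adaptive_regulation(step_obs, b, n_iter=5000):
    """Sketch of a truncated SA recursion for adaptive regulation."""
    u = 0.0
    for k in range(1, n_iter + 1):
        y = step_obs(u)                   # noisy observation of the state
        cand = u - y / k                  # RM step toward the regulating control
        u = float(np.clip(cand, -b, b))   # truncation at the known bound b
    return u
```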

Theorem 6.4.1  Assume A6.4.1–A6.4.6. Then the system (6.4.1), (6.4.2),

and  (6.4.4) has the desired properties:


at sample paths where A6.4.6 holds.

Proof. Let be a convergent subsequence of such that

and

We have

for sufficiently large and small enough T, where is a constant to be specified later on. The relationships (6.4.5) and (6.4.6) can be proved along the lines of the proof for Theorem 2.2.1, but here is known to be bounded, and (6.4.5) and (6.4.6) can be proved more straightforwardly. We show this.

Since the system (6.4.1) is BIBS, from it follows that there is such that

By A6.4.6 for large and small T >  0,

This implies that

Let be large enough such that

and let T  be small enough such that

Then we have

and hence there is no truncation in (6.4.4) for i.e., (6.4.5) holds for Therefore,


indeed. By induction, the assertions (6.4.5) and (6.4.6) have been proved. We now show that for any convergent subsequence

there is a such that

from (6.4.4) it follows that (6.4.5) holds for Hence,

Thus, (6.4.5) and (6.4.6) hold for Assume they are true for all We now show that they are true for

too. Since

for small enough T > 0. By A6.4.5, we have

Let us restrict in (6.4.8) to Then for small T and large from (6.4.6) and (6.4.8) it follows that


and (6.4.6) is true for


Since and it is seen that

Using a partial summation, by (6.4.9) we have

for all sufficiently large and small enough T > 0. Set

for

This implies that there exist a and a sufficiently large which may depend on but is independent of such that


Then (6.4.10) implies that

This proves (6.4.7).
Define

From (6.4.7) it follows that

for convergent subsequence Using A6.4.6 and (6.4.11), by completely the same argument as that

used in the proof (Steps 3–6) of Theorem 2.2.1, we conclude that

Finally, write (6.4.1) as

By A6.4.4 and the boundedness of we have

and by A6.4.2 we conclude

Remark 6.4.1 It is easy to see that A6.4.6 is also necessary if A6.4.1–A6.4.5 hold and and This is because for large the observation noise can be expressed as

and hence


6.5.   Notes and References

For system identification and adaptive control we refer to [10, 23, 54, 62, 75, 90]. The identification problem stated in Section 6.1 was solved in [72] by the ODE method. In comparison with [72], the conditions used here have been considerably weakened, and the convergence is proved by the TS method rather than the ODE method. Section 6.1 is based on the joint work by H. F. Chen, T. Duncan and B. Pasik-Duncan. The existence and uniqueness of the solution to (6.1.50) can be found, e.g., in [23]. For stochastic quadratic control refer to [2, 10, 12, 33].

Adaptive stabilization for stochastic systems is dealt with in [5, 55, 77].

The convergence of WLS and adaptive stabilization using WLS are given in [55]. The problem is solved by the SA method in [19]. This approach is presented in Section 6.2.

The pole assignment problem for stochastic systems with unknown coefficients is solved by SA with the help of learning in Section 6.3, which is based on [20]. For concepts of linear control systems we refer to


which tends to zero as since and

Remark  6.4.2 In the formulation of Theorem 6.4.1 the condition A6.4.5

can be replaced either by (6.4.7) or by (6.4.11), which are the consequences of A6.4.5. Further, the quadratic can be replaced by a continuously differentiable function such that and In this case, in (6.4.7) should be correspondingly replaced by

Example  6.4.1 Let the nonlinear system be affine:

where the scalar nonlinear function is bounded from above and from below by positive constants:

Note that and hence (6.4.7) holds, if Assume is known: Then A6.4.2, A6.4.3, and A6.4.4 are satisfied. Therefore, if satisfies A6.4.6, then given by (6.4.4) leads to and

In the area of systems and control, SA methods are also successfully applied in discrete event dynamic systems, especially to perturbation-analysis-based parameter optimization.


[1, 46, 60]. The connection between the feedback gain and the coefficients of the desired characteristic polynomial is called Ackermann’s formula, which can be found in [46].

Application of SA to adaptive regulation is based on [26].

For perturbation analysis of discrete event dynamic systems we refer to [58]. The perturbation analysis based parameter optimization is dealt with in [29, 86, 87].


Appendix A

In Appendix A we introduce the basic concepts of probability theory. Results are presented without proof. For details we refer to [31, 32, 70, 76, 84].

A.1. Probability Space

The basic space is denoted by The point is called an elementary event or sample. The point set in is denoted by A,

Let be a family of sets in satisfying the following conditions:
1.
2.
3.
Then, is called the or The element A of is called a measurable set, or random event, or event.

As a consequence of Properties 2 and 3,

then the complement of A also belongs to
If

If 

if 

A set function defined on is called -additive if for any

sequence of disjoint events By definition, one of the values or is not allowed to be taken by

A nonnegative set function is called a measure.
Define

The set functions and are called the upper, lower, and total variation of on respectively.

Jordan-Hahn Decomposition Theorem If is on then there

exists a set D such that, for any

and are measures and
Let P be a set function defined on with the following properties.
1.

2.


then


3. if are disjoint. Then, P is called a

probability measure on The triple is called a probability space. P(A) is called the probability of the random event A.

It is assumed that any subset of a measurable set of probability zero is measurable and its probability is zero. After such a completion of measurable sets, the resulting probability space is called completed.

If a relationship between random variables holds for any with the possible exception

of a set with probability zero, then we say this relationship holds a.s. (almost surely)

or with probability one.

A.2. Random Variable and Distribution Function

In R, the real line, the smallest containing all intervals is called the Borel and is denoted by The “smallest” means that if there is a containing all intervals, then there must be in the sense that for any The Borel can also be defined in Any set in or is called a Borel set.

Any interval can be endowed with a measure equal to its length. This measure can be extended to each i.e., to each Borel set. Any subset of a set with measure zero is also assumed to be a measurable set with measure zero. After such a completion, the measurable set is called Lebesgue measurable, and the measure the Lebesgue measure. In what follows always means the completed Borel

A real function defined on is called measurable, if

If is a real measurable function defined on and then is called a random variable. Therefore, if is a measurable function, then is also a random variable if

Let be a random variable. The distribution function of is defined as

By a random vector we mean that each component

of is a random variable. The distribution function of a random vector is defined as

If is differentiable, then its derivative is called the density of The density of a random vector is defined in a similar way. The density of the l-dimensional normal distribution is defined by

A.3. Expectation

Let be a random variable and let

Define


where
is called the expectation of

For an arbitrary random variable define

The expectation of is defined as

if at least one of and is finite. If then is called integrable. The expectation of can be expressed by a Lebesgue-Stieltjes integral with respect

to its distribution function

In the density of  l-dimensional random vector with normal distribution,

A.4. Convergence Theorems and Inequalities

Let be a sequence of random variables and be a random variable. If then we say that converges to and write

If for any then we say that converges to in probability and write

If the distribution functions of converge to at any where is continuous, then we say weakly (or in distribution) converges to and write

If then we say converges to in the mean square sense and write l.i.m.

Convergence a.s. implies convergence in probability, which in turn implies weak convergence.

Monotone Convergence Theorem  If random variables nondecreasingly(nonincreasingly) converge to andthen

Dominated Convergence Theorem  If and there exists an integrable random variable such that then and

Fatou Lemma  If for some random variable with then

If is a measurable function, then


Chebyshev Inequality: for any $\varepsilon > 0$, $P\{|\xi| \geq \varepsilon\} \leq E\xi^2/\varepsilon^2$.

Lyapunov Inequality: for $0 < s \leq t$, $(E|\xi|^s)^{1/s} \leq (E|\xi|^t)^{1/t}$.

Hölder Inequality: for $p > 1$ and $1/p + 1/q = 1$, $E|\xi\eta| \leq (E|\xi|^p)^{1/p}(E|\eta|^q)^{1/q}$.

In the special case where $p = q = 2$ the Hölder inequality is called the Schwarz inequality.

A.5.   Conditional Expectation

Let be a probability space. is called a of if is a and by which it is meant that any implies

Radon-Nikodym Theorem  Let be a of For any random variable with at least one of and being finite, there is a unique measurable random variable, denoted by such that for any

The random variable satisfying the above equality is called the conditional expectation of given

Let be the smallest (see A.2) containing all sets is called the generated by

The conditional expectation of given is defined as

Let  A  be an event. Conditional probability of  A given is defined by

Properties of the conditional expectation are listed below.
1) for constants and
2)
3) if is and

4) if 

5) if 

Convergence theorems and inequalities stated in A.4 remain true with the expectation replaced by the conditional expectation For example, the conditional Hölder inequality

for
For a sequence of random variables and a the consistent conditional distribution functions of given

Let and Then


can be defined such that i) they are for any and any fixed ii) they are distribution functions for any fixed and iii) for any measurable function

A.6. Independence

Let be a sequence of events. If for any set of indices

then is called mutually independent.

Let be a sequence of If events are mutually independent whenever then the family of is called mutually independent.

Let be a sequence of random variables and let be the generated by If is mutually independent, then the sequence of random variables is called mutually independent.

Law of iterated logarithm  Let $\{\xi_n\}$ be a sequence of independent and identically distributed (iid) random variables with $E\xi_n = 0$ and $E\xi_n^2 = \sigma^2 > 0$. Then
$$\limsup_{n\to\infty} \frac{\xi_1 + \cdots + \xi_n}{\sqrt{2\sigma^2 n \log\log n}} = 1 \quad \text{a.s.}$$

Proposition A.6.1  Let be a measurable function  defined   on

If the l-dimensional random vector is independent of the m-dimensional random vector then

where

From this proposition it follows that

if is independent of 

A.7. Ergodicity

Let be a sequence of random variables and let be the

distribution function of If for any integer then

is called stationary, or is a stationary process.

Proposition A.7.1  Let   be stationary.

 provided exists for all in the range of 


 If exists, then

where is a of and is called invariant 

If then the stationary process is called ergodic. Thus, for a stationary and ergodic process we have

If is a sequence of mutually independent and identically distributed (and hence

stationary) random variables, then and the sequence is ergodic.


Appendix B

In Appendix B we present the detailed proofs of convergence theorems for martingales and martingale difference sequences.

Let be a sequence of random variables, and let be a family of nondecreasing i.e.,

If is for any then we write and call it an adapted process.

An adapted process with is called a martingale if a supermartingale if and a submartingale if 

An adapted process is called a martingale difference sequence (MDS) if 

A sequence of mutually independent random vectors with is an obvious example of an MDS.

An integer-valued measurable function is called a Markov time with respect to if

If, in addition, then is called a stopping time.

B.1. Convergence Theorems for Martingales

Lemma B.1.1  Let be adapted, a Markov time, and B a Borel set. Let be the first time at which the process hits the set B after time i.e.,

Then is a Markov time.
Proof.  The conclusion follows from the following expression:



For defining the number of up-crossing of an interval by a submartingale

we first define

The largest for which is called the number of up-crossings of the interval

by the process and is denoted by
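For concreteness, here is a small sketch counting the up-crossings of an interval (a, b) by a finite path; it only illustrates the definition and plays no role in the proofs.

```python
def upcrossings(path, a, b):
    """Count how many times the sequence `path` moves from below a
    to above b (with a < b): the up-crossing number of (a, b)."""
    count, below = 0, False
    for x in path:
        if x <= a:
            below = True          # the path has dipped to or below a
        elif x >= b and below:
            count += 1            # a completed up-crossing of (a, b)
            below = False
    return count
```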

By Lemma B.1.1

So, is a Markov time.

Assume is a Markov time. Again, by Lemma B.1.1,

and

Therefore, all are Markov times.
Theorem B.1.1 (Doob)  For submartingales the following inequalities

hold

where

Proof. Note that equals the number of up-crossing of the interval

by the submartingale or by Since for

is a submartingale.

Thus, without loss of generality, it suffices to prove that for a nonnegative submartingale

Define


Define also Then for even crosses (0, b) from time to Therefore,

and

Further, the set is since is a Markov time,and

Taking expectation of both sides of (B.1.2) yields

where the last inequality holds because is a submartingale and hence the integrand is nonnegative.

Thus (B.1.1) and hence the theorem is proved.
Theorem B.1.2 (Doob)  Let be a submartingale with

a.s. Then there is a random variable with such that

Proof.  Set

Assume the converse: Then

where and run over all rational numbers.


By the converse assumption there exist rational numbers such that

Let be the number of up-crossings of the interval by By Theorem B.1.1

By the monotone convergence theorem from (B.1.4) it follows that

However, (B.1.3) implies which contradicts (B.1.5). Hence,

and

where is invoked. Hence,
Corollary B.1.1  If is a nonnegative supermartingale or nonpositive submartingale, then

Because for nonpositive submartingales the corollary follows from the theorem; while for a nonnegative supermartingale is a nonpositive submartingale.

Corollary B.1.2  If is a martingale with then and

This is because for a martingale and and hence

or converges to a limit which is finite a.s. By the Fatou lemma it follows that


B.2. Convergence Theorems for MDS I

Let be an adapted process, and let G be a Borel set in

Then the first exit time from G defined by

is a Markov time. This is because

Lemma B.2.1.  Let be a martingale (supermartingale, submartingale) and a Markov time. Then the process stopped at is again a martingale (supermartingale, submartingale), where

Proof. Note that

is

If is a martingale, then

This shows that is a martingale. For supermartingales and submartingales the proof is similar.

Theorem B.2.1. Let be a one-dimensional MDS. Then as

converges on

Proof. Since is the first exit time

is a Markov time and by Lemma B.2.1 is a martingale, where M is a positive constant.

Noticing that and that

is we find


By Corollary B.1.2 converges as It is clear that on Therefore, as pathwise converges on Since M is arbitrary, converges on which equals A.

Theorem B.2.2.  Let be an MDS and  If 

then converges on If then

converges on

Proof.  It suffices to prove the first assertion, because the second one is reduced to the first one if is replaced by
Define

By Lemma B.2.1 is a martingale. It is clear that

Consequently,

By Theorem B.1.2 converges as Since on as converges on and

consequently on which equals

B.3. Borel-Cantelli-Lévy Lemma

Theorem B.3.1 (Borel-Cantelli-Lévy Lemma)  Let be a sequence of

events, Then if and only if or equivalently,

Proof.  Define

Clearly, is a martingale and is an MDS. Since by Theorem B.2.2, converges on


If then from (B.3.2) it follows that which implies that

converges. Then, this combined with by (B.3.2) yields

Conversely, if    then from (B.3.2) it follows that

Noticing that is contained in the set where converges by

Theorem B.2.2, from the convergence of by (B.3.2) it follows that

If are mutually independent and then

Proof. Denote by the generated by

If then

and hence which, by (B.3.1), implies (B.3.3).

When are mutually independent, then

B.4. Convergence Criteria for Adapted Sequences

Let be an adapted process.

Theorem B.4.1  Let be a sequence of positive numbers. Then

where

Consequently, implies and follows from (B.3.1).

Theorem B.3.2 (Borel-Cantelli Lemma)  Let $\{A_n\}$ be a sequence of events. If $\sum_n P(A_n) < \infty$,

then the probability that the $A_n$ occur infinitely often is zero, i.e.,


Proof. Set

By Theorem B.3.1

or

This means that  A  is the set where events may occur only finitely many times.

Therefore, on  A  the series converges if and only if converges.

Theorem B.4.2 (Three Series Criterion)  Denote by S the set where the following three series converge:

and

where  c  is a positive constant.

Then converges on  S   as

Proof.  Taking in (B.4.1), we have and

by Theorem B.4.1.
Define

Since converges on S,  from (B.4.2) it follows that

Noticing that is an MDS and

we see

By Theorem B.2.1 converges on S,  or


Then from (B.4.3) it follows that

or converges.

B.5. Convergence Theorems for MDS II

Let be an MDS.

Theorem B.5.1 (Y. S. Chow) converges on

Proof.  By Theorem B.4.2 it suffices to prove where S is defined in Theorem B.4.2 with replaced by considered in the present theorem.

We now verify that the three series defined in Theorem B.4.2 are convergent on A if is replaced by
For convergence of the first series it suffices to note

For convergence of the second series, taking into account we find

Finally, for convergence of the last series it suffices to note

and

by the conditional Schwarz inequality.
Theorem B.5.2.  The conclusion of Theorem B.5.1 is valid also for
Proof.  Define

Then we have


on A where A is still defined by (B.5.1) but with

Applying Theorem B.5.1 with to the MDS leads to the conclusion that

converges on  A, i.e.,

This is equivalent to

B.6. Weighted Sum of MDS

Theorem B.6.1  Let be an l-dimensional MDS and let be a

matrix adapted process. If 

for some then as

where

Proof.  Without loss of generality, assume

Notice that convergence of implies convergence of since for

sufficiently large. Consequently, from (B.5.2) it follows that


We have the following estimate:

By Theorems B.5.1 and B.5.2 it follows that

where

Notice that is nondecreasing as If is bounded, then the conclusion

of the theorem follows from (B.6.1). If then by the Kronecker lemma (see Section 3.4) the conclusion of the theorem also follows from (B.6.1).


References

[1] B. D. O. Anderson and T. B. Moore, Optimal Control: Linear Quadratic Methods, Prentice-Hall, N.J., 1990.

[2] K. J. Åström, Introduction to Stochastic Control, Academic Press, New York, 1970.

[3] M. Benaim, A dynamical systems approach to stochastic approximation, SIAM J. Control & Optimization, 34:437–472, 1996.

[4] A. Benveniste, M. Metivier and P. Priouret, Adaptive Algorithms and Stochastic Approximation, Springer-Verlag, New York, 1990.

[5] B. Bercu, Weighted estimation and tracking for ARMAX models, SIAM J. Control & Optimization, 33:89–106, 1995.

[6] P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.

[7] J. R. Blum, Multidimensional stochastic approximation, Ann. Math. Statist., 9:737–744, 1954.

[8] V. S. Borkar, Asynchronous stochastic approximations, SIAM J. Control & Optimization, 36:840–851, 1998.

[9] O. Brandière and M. Duflo, Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. Henri Poincaré, 32:395–427, 1996.

[10] P. E. Caines, Linear Stochastic Systems, Wiley, New York, 1988.

[11] H. F. Chen, Recursive algorithms for adaptive beam-formers, Kexue Tongbao (Science Bulletin), 26:490–493, 1981.

[12] H. F. Chen, Recursive Estimation and Control for Stochastic Systems, Wiley, New York, 1985.

[13] H. F. Chen, Asymptotic efficient stochastic approximation, Stochastics and Stochastics Reports, 45:1–16, 1993.



[14] H. F. Chen, Stochastic approximation and its new applications, Proceedings of 1994 Hong Kong International Workshop on New Directions of Control and Manufacturing, 1994, 2–12.

[15] H. F. Chen, Convergence rate of stochastic approximation algorithms in the degenerate case, SIAM J. Control & Optimization, 36:100–114, 1998.

[16] H. F. Chen, Stochastic approximation with non-additive measurement noise, J. of Applied Probability, 35:407–417, 1998.

[17] H. F. Chen, Convergence of SA algorithms in multi-root or multi-extreme cases, Stochastics and Stochastics Reports, 64:255–266, 1998.

[18] H. F. Chen, Stochastic approximation with state-dependent noise, Science in China (Series E), 43:531–541, 2000.

[19] H. F. Chen and X. R. Cao, Controllability is not necessary for adaptive pole placement control, IEEE Trans. Autom. Control, AC-42:1222–1229, 1997.

[20] H. F. Chen and X. R. Cao, Pole assignment for stochastic systems with unknown coefficients, Science in China (Series E), 43:313–323, 2000.

[21] H. F. Chen, T. Duncan, and B. Pasik-Duncan, A Kiefer-Wolfowitz algorithm with randomized differences, IEEE Trans. Autom. Control, AC-44:442–453, 1999.

[22] H. F. Chen and H. T. Fang, Nonconvex stochastic optimization for model reduction, Global Optimization, 2002.

[23] H. F. Chen and L. Guo, Identification and Stochastic Adaptive Control, Birkhäuser, Boston, 1991.

[24] H. F. Chen, L. Guo, and A. J. Gao, Convergence and robustness of the Robbins-Monro algorithm truncated at randomly varying bounds, Stochastic Processes and Their Applications, 27:217–231, 1988.

[25] H. F. Chen and R. Uosaki, Convergence analysis of dynamic stochastic approximation, Systems and Control Letters, 35:309–315, 1998.

[26] H. F. Chen and Q. Wang, Adaptive regulator for discrete-time nonlinear nonparametric systems, IEEE Trans. Autom. Control, AC-46: , 2001.

[27] H. F. Chen and Y. M. Zhu, Stochastic approximation procedures with randomly varying truncations, Scientia Sinica (Series A), 29:914–926, 1986.

[28] H. F. Chen and Y. M. Zhu, Stochastic Approximation (in Chinese), Shanghai Scientific and Technological Publishers, Shanghai, 1996.

[29] E. K. P. Chong and P. J. Ramadge, Optimization of queues using an infinitesimal perturbation analysis-based stochastic algorithm with general update times, SIAM J. Control & Optimization, 31:698–732, 1993.

[30] Y. S. Chow, Local convergence of martingales and the law of large numbers, Ann. Math. Statist., 36:552–558, 1965.


[31] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer-Verlag, New York, 1978.

[32] K. L. Chung, A Course in Probability Theory (second edition), Academic Press, New York, 1974.

[33] M. H. A. Davis, Linear Estimation and Stochastic Control, Chapman and Hall, New York, 1977.

[34] K. Deimling, Nonlinear Functional Analysis, Springer, Berlin, 1985.

[35] B. Delyon and A. Juditsky, Stochastic optimization with averaging of trajectories, Stochastics and Stochastics Reports, 39:107–118, 1992.

[36] E. F. Deprettere (ed.), SVD and Signal Processing, Elsevier, North-Holland, 1988.

[37] N. Dunford and J. T. Schwartz, Linear Operators, Part 1: General Theory, Wiley Interscience, New York, 1966.

[38] V. Dupač, A dynamic stochastic approximation method, Ann. Math. Statist., 36:1695–1702, 1965.

[39] V. Dupač, Stochastic approximation in the presence of trend, Czechoslovak Math. J., 16:454–461, 1966.

[40] A. Dvoretzky, On stochastic approximation, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, pp. 39–55, 1956.

[41] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence, Wiley, New York, 1986.

[42] E. Eweda, Convergence of the sign algorithm for adaptive filtering with correlated data, IEEE Trans. Information Theory, IT-37:1450–1457, 1991.

[43] V. Fabian, On asymptotic normality in stochastic approximation, Ann. Math. Statist., 39:1327–1332, 1968.

[44] V. Fabian, On asymptotically efficient recursive estimation, Ann. Statist., 6:854–856, 1978.

[45] V. Fabian, Simulated annealing simulated, Computers Math. Applic., 33:81–94, 1997.

[46] F. W. Fairman, Linear Control Theory, The State Space Approach, Wiley, Chichester, 1998.

[47] H. T. Fang and H. F. Chen, Sharp convergence rates of stochastic approximation for degenerate roots, Science in China (Series E), 41:383–392, 1998.

[48] H. T. Fang and H. F. Chen, Stability and instability of limit points of stochastic approximation algorithms, IEEE Trans. Autom. Control, AC-45:413–420, 2000.

[49] H. T. Fang and H. F. Chen, An a.s. convergent algorithm for global optimization with noise corrupted observations, J. Optimization and Its Applications, 104:343–376, 2000.


[68] H. J. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications, Springer-Verlag, New York, 1997.

[69] J. P. LaSalle and S. Lefschetz, Stability by Lyapunov's Direct Methods with Applications, Academic Press, New York, 1961.

[70] R. Liptser and A. N. Shiryaev, Statistics of Random Processes, Springer-Verlag, New York, 1977.

[71] R. Liu, Blind signal processing: An introduction, Proceedings 1996 Intl. Symp. Circuits and Systems, Vol. 2, 81–83, 1996.

[72] L. Ljung, Analysis of recursive stochastic algorithms, IEEE Trans. Autom. Control, AC-22:551–575, 1977.

[73] L. Ljung, On positive real transfer functions and the convergence of some recursive schemes, IEEE Trans. Autom. Control, AC-22:539–551, 1977.

[74] L. Ljung, G. Pflug, and H. Walk, Stochastic Approximation and Optimization of Random Systems, Birkhäuser, Basel, 1992.

[75] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.

[76] M. Loève, Probability Theory, Springer, New York, 1977–1978.

[77] R. Lozano and X. H. Zhao, Adaptive pole placement without excitation probing signals, IEEE Trans. Autom. Control, AC-39:47–58, 1994.

[78] M. B. Nevelson and R. Z. Khasminskii, Stochastic Approximation and Recursive Estimation, Amer. Math. Soc., Providence, RI, 1976, Translation of Math. Monographs, Vol. 47.

[79] E. Oja, Subspace Methods of Pattern Recognition, 1st ed., Letchworth, Research Studies Press Ltd., Hertfordshire, 1983.

[80] B. T. Polyak, New stochastic approximation type procedures (in Russian), Autom. i Telemekh., 7:98–107, 1990.

[81] B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM J. Control & Optimization, 30:838–855, 1992.

[82] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22:400–407, 1951.

[83] D. Ruppert, Stochastic approximation, in B. K. Ghosh and P. K. Sen, editors, Handbook in Sequential Analysis, 503–529, Marcel Dekker, New York, 1991.

[84] A. N. Shiryaev, Probability, Springer, New York, 1984.

[85] J. C. Spall, Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, IEEE Trans. Autom. Control, AC-37:331–341, 1992.


[86] Q. Y. Tang and H. F. Chen, Convergence of perturbation analysis based optimization algorithm with fixed number of customers period, Discrete Event Dynamic Systems, 4:359–373, 1994.

[87] Q. Y. Tang, H. F. Chen, and Z. J. Han, Convergence rates of perturbation-analysis-Robbins-Monro-single-run algorithms, IEEE Trans. Autom. Control, AC-42:1442–1447, 1997.

[88] J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning, 16:185–202, 1994.

[89] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, Distributed asynchronous deterministic and stochastic gradient optimization algorithms, IEEE Trans. Autom. Control, AC-31:803–812, 1986.

[90] Ya. Z. Tsypkin, Adaptation and Learning in Automatic Systems, Academic Press, New York, 1971.

[91] K. Uosaki, Some generalizations of dynamic stochastic approximation processes, Ann. Statist., 2:1042–1048, 1974.

[92] J. Venter, An extension of the Robbins-Monro procedure, Ann. Math. Statist., 38:181–190, 1967.

[93] G. J. Wang and H. F. Chen, Behavior of stochastic approximation algorithm in root set of regression function, Systems Science and Mathematical Sciences, 12:92–96, 1999.

[94] I. J. Wang, E. K. P. Chong, and S. R. Kulkarni, Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms, Adv. Appl. Probab., 28:784–801, 1996.

[95] C. Z. Wei, Multivariate adaptive stochastic approximation, Ann. Statist., 15:1115–1130, 1987.

[96] G. Xu, L. Tong, and T. Kailath, A least-squares approach to blind channel identification, IEEE Trans. Signal Processing, SP-43:2982–2993, 1995.

[97] S. Yakowitz, A globally convergent stochastic approximation, SIAM J. Control & Optimization, 31:30–40, 1993.

[98] G. Yin, On extensions of Polyak’s averaging approach to stochastic approximation, Stochastics and Stochastics Reports, 36:245–264, 1991.

[99] G. Yin and Y. M. Zhu, On w.p.1 convergence of a parallel stochastic approximation algorithm, Probability in the Eng. and Infor. Sciences, 3:55–75, 1989.

[100] R. Zieliński, Global stochastic approximation: A review of results and some open problems, in F. Archetti and M. Cugiani (eds.), Numerical Techniques for Stochastic Systems, 379–386, North-Holland, 1980.

[101] J. H. Zhang and H. F. Chen, Convergence of algorithms used for principal component analysis, Science in China (Series E), 40:597–604, 1997.


[102] K. Zhou, J. C. Doyle, and K. Glover, Robust and Optimal Control, Prentice-Hall, New Jersey, 1996.


Index


Ackermann’s formula, 328
adapted process, 335
adapted sequence, 341
adaptive control, 290, 303, 327
adaptive filter, 288
adaptive filtering, 265, 273
adaptive regulation, 321
adaptive stabilization, 305, 307, 314, 327
adaptive stochastic approximation, 132, 149
adaptively stabilizable, 310
admissible controls, 302
algebraic Riccati equation, 131
ARMA process, 39
Arzelà-Ascoli theorem, 11, 24
asymptotic behavior, 194
asymptotic efficiency, 95, 130, 132, 149
asymptotic normality, 95, 113, 119, 127, 149, 210
asymptotic properties, 95, 166
asymptotically efficient, 135
asynchronous stochastic approximation, 219, 278, 288
averaging technique, 132, 149
balanced realization, 210, 214
balanced truncation, 214, 215
blind channel identification, 219, 220, 223
blind identification, 220
Borel σ-algebra, 330
Borel set, 330
Borel-Cantelli Lemma, 341
Borel-Cantelli-Lévy Lemma, 340
certainty-equivalence principle, 304, 306
Chebyshev inequality, 332
closure, 38
conditional distribution function, 332
conditional expectation, 332
conditional probability, 332
conditional Schwarz inequality, 343
constant interpolating function, 13
constrained optimization problem, 268
controllable, 307, 317, 319
controller form, 317–319
convergence, 28, 36, 41, 153, 223, 331, 341
convergence analysis, 6, 28, 95, 154
convergence rate, 95, 96, 101–103, 105, 149
convergence theorem for martingale difference sequences, 97, 128, 160, 170, 185, 196, 231, 249, 321, 339, 343
convergence theorem for nonnegative supermartingales, 7–9
convergence theorems for martingales, 335
convergent subsequence, 17, 18, 30, 36, 84, 86, 89, 178, 187, 237, 241, 244, 271, 275, 280, 282, 283, 285, 287, 288, 297, 312, 315, 322, 323
coprimeness, 306
covariance matrix, 130, 132
crossing, 18, 34, 188, 236, 312
degenerate case, 103, 149
density, 330
distribution function, 330
dominant stability, 59, 62
dominated convergence theorem, 331
dynamic stochastic approximation, 82, 93
equi-continuous, 15
ergodic, 265, 268, 270, 273, 274, 334
ergodicity, 333


event, 329
expectation, 330
Fatou lemma, 331
first exit time, 9, 339
general convergence theorems, 28
global minimum, 177
global minimizer, 174, 177, 180
global optimization, 172–174, 218
global optimization algorithm, 180, 194
global optimizer, 152
globally Lipschitz continuous, 292
Gronwall inequality, 298
Hölder inequality, 332
Hankel matrix, 222
Hankel norm approximation, 210, 214, 215
Hessian, 8, 195
identification, 290
integrable, 331
interpolating function, 11
invariant σ-algebra, 334
Jordan-Hahn decomposition, 55, 56, 295, 329
Kiefer-Wolfowitz (KW) algorithm, 151–153, 166, 173, 218
Kronecker lemma, 67, 144, 148, 345
Kronecker product, 248
KW algorithm with expanding truncations, 152, 154, 173–175
law of iterated logarithm, 333
Lebesgue measurable, 330
Lebesgue measure, 330
Lebesgue-Stieltjes integral, 331
linear interpolating function, 12
Lipschitz continuous, 23
Lipschitz-continuity, 160
local search, 172, 173
locally bounded, 17, 29, 96, 103, 133
locally Lipschitz continuous, 50, 155, 163, 177, 280
Lyapunov equation, 105
Lyapunov function, 6, 8, 10, 11, 17, 111, 226, 268, 313
Lyapunov inequality, 144, 332
Lyapunov theorem, 98
MA process, 171
Markov time, 6, 335, 336, 339
martingale, 335, 339, 340
martingale convergence theorem, 6, 180, 297
martingale difference sequence, 6, 16, 42, 97, 128, 134, 159, 164, 168, 179, 185, 195–197, 231, 250, 257, 294, 335
maximizer, 151
measurable, 17, 29, 96, 103, 133
measurable function, 330
measurable set, 329
measure, 329
minimizer, 151
mixing condition, 291
model reduction, 210
monotone convergence theorem, 331
multi-extreme, 163, 164
multi-root, 46, 57
mutually independent, 333, 341
necessity of noise condition, 45
non-additive noise, 49
nondegenerate case, 96, 149
nonnegative adapted sequence, 7
nonnegative supermartingale, 6, 7, 338
nonpositive submartingale, 338
normal distribution, 113, 114, 330
nowhere dense, 29, 35, 37, 41, 177, 181, 182, 280, 291
observation, 5, 17, 132, 321
observation noise, 5, 103, 133, 175, 195, 321
ODE method, 2, 10, 24, 327
one-sided randomized difference, 172
optimal control, 303
optimization, 151
optimization algorithm, 212
ordinary differential equation (ODE), 10
pattern classification, 219
perturbation analysis, 328
pole assignment, 316, 318, 327
principal component analysis, 238, 288
probabilistic method, 4
probability measure, 330
probability of random event, 330
probability space, 329, 330
Prohorov’s theorem, 22, 24
Radon-Nikodym theorem, 332
random noise, 10, 21
random search, 172
random variable, 330
randomized difference, 152–154
recursive blind identification, 246
relatively compact, 22
RM algorithm with expanding truncations, 28, 155, 309, 319


Robbins-Monro (RM) algorithm, 1, 5, 8, 11, 12, 17, 20, 45, 110, 310, 313
robustness, 67, 93
SA algorithm, 67
SA algorithm with expanding truncations, 25, 40, 95, 290
SA with randomly varying truncations, 93
Schwarz inequality, 142, 332
sign algorithms, 273, 288
signal processing, 219, 265
signed measure, 56, 295
Skorohod representation, 23
Skorohod topology, 21, 24
slowly decreasing step sizes, 132
spheres with expanding radii, 36
stability, 131
stable, 96, 97, 102, 131, 133
state-dependent, 42, 164
state-dependent noise, 29, 57
state-independent condition, 41, 42
stationary, 265, 268, 270, 273, 274, 333
step size, 5, 6, 17, 102, 132, 174
stochastic approximation (SA), 1, 223, 226, 246
stochastic approximation algorithm, 5, 307, 308
stochastic approximation method, 321
stochastic differential equation, 126
stochastic optimization, 211
stopping time, 335
strictly input passive, 322
structural error, 10, 157
structural inaccuracy, 21
submartingale, 335–337, 339
subspace, 41, 226
supermartingale, 335, 339
surjection, 63
system identification, 327
three series criterion, 342
time-varying, 44
trajectory-subsequence (TS) method, 2, 16, 21
truncated RM algorithm, 16, 17
TS method, 28, 327
uniformly bounded, 15
uniformly locally bounded, 41
up-crossing, 336, 338
weak convergence method, 21, 24
weighted least squares, 306
weighted sum of MDS, 344
Wiener process, 126


Nonconvex Optimization and Its Applications

22. H. Tuy: Convex Analysis and Global Optimization. 1998 ISBN 0-7923-4818-4

23. D. Cieslik: Steiner Minimal Trees. 1998 ISBN 0-7923-4983-0

24. N.Z. Shor: Nondifferentiable Optimization and Polynomial Problems. 1998 ISBN 0-7923-4997-0

25. R. Reemtsen and J.J. Rückmann (eds.): Semi-Infinite Programming. 1998 ISBN 0-7923-5054-5

26. B. Ricceri and S. Simons (eds.): Minimax Theory and Applications. 1998 ISBN 0-7923-5064-2

27. J.-P. Crouzeix, J.-E. Martinez-Legaz and M. Volle (eds.): Generalized Convexity, Generalized Monotonicity: Recent Results. 1998 ISBN 0-7923-5088-X

28. J. Outrata, M. Kočvara and J. Zowe: Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. 1998 ISBN 0-7923-5170-3

29. D. Motreanu and P.D. Panagiotopoulos: Minimax Theorems and Qualitative Properties of the Solutions of Hemivariational Inequalities. 1999 ISBN 0-7923-5456-7

30. J.F. Bard: Practical Bilevel Optimization. Algorithms and Applications. 1999 ISBN 0-7923-5458-3

31. H.D. Sherali and W.P. Adams: A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. 1999 ISBN 0-7923-5487-7

32. F. Forgó, J. Szép and F. Szidarovszky: Introduction to the Theory of Games. Concepts, Methods, Applications. 1999 ISBN 0-7923-5775-2

33. C.A. Floudas and P.M. Pardalos (eds.): Handbook of Test Problems in Local and Global Optimization. 1999 ISBN 0-7923-5801-5

34. T. Stoilov and K. Stoilova: Noniterative Coordination in Multilevel Systems. 1999 ISBN 0-7923-5879-1

35. J. Haslinger, M. Miettinen and P.D. Panagiotopoulos: Finite Element Method for Hemivariational Inequalities. Theory, Methods and Applications. 1999 ISBN 0-7923-5951-8

36. V. Korotkich: A Mathematical Structure of Emergent Computation. 1999 ISBN 0-7923-6010-9

37. C.A. Floudas: Deterministic Global Optimization: Theory, Methods and Applications. 2000 ISBN 0-7923-6014-1

38. F. Giannessi (ed.): Vector Variational Inequalities and Vector Equilibria. Mathematical Theories. 1999 ISBN 0-7923-6026-5

39. D. Y. Gao: Duality Principles in Nonconvex Systems. Theory, Methods and Applications. 2000 ISBN 0-7923-6145-3

40. C.A. Floudas and P.M. Pardalos (eds.): Optimization in Computational Chemistry and Molecular Biology. Local and Global Approaches. 2000 ISBN 0-7923-6155-5

41. G. Isac: Topological Methods in Complementarity Theory. 2000 ISBN 0-7923-6274-8