8/13/2019 Stochastic Approximation Applications
http://slidepdf.com/reader/full/stochastic-approximation-applications 1/368
Stochastic Approximation and Its Applications
Stochastic Approximation
and Its Applications
by

Han-Fu Chen
Institute of Systems Science,
Academy of Mathematics and System Science,
Chinese Academy of Sciences,
Beijing, P.R. China

KLUWER ACADEMIC PUBLISHERS
NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-48166-9
Print ISBN: 1-4020-0806-6

©2003 Kluwer Academic Publishers
New York, Boston, Dordrecht, London, Moscow

Print ©2002 Kluwer Academic Publishers, Dordrecht

All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Kluwer Online at: http://kluweronline.com
and Kluwer's eBookstore at: http://ebooks.kluweronline.com
Contents

Preface
Acknowledgments

1. ROBBINS-MONRO ALGORITHM
   1.1 Finding Zeros of a Function
   1.2 Probabilistic Method
   1.3 ODE Method
   1.4 Truncated RM Algorithm and TS Method
   1.5 Weak Convergence Method
   1.6 Notes and References

2. STOCHASTIC APPROXIMATION ALGORITHMS WITH EXPANDING TRUNCATIONS
   2.1 Motivation
   2.2 General Convergence Theorems by TS Method
   2.3 Convergence Under State-Independent Conditions
   2.4 Necessity of Noise Condition
   2.5 Non-Additive Noise
   2.6 Connection Between Trajectory Convergence and Property of Limit Points
   2.7 Robustness of Stochastic Approximation Algorithms
   2.8 Dynamic Stochastic Approximation
   2.9 Notes and References

3. ASYMPTOTIC PROPERTIES OF STOCHASTIC APPROXIMATION ALGORITHMS
   3.1 Convergence Rate: Nondegenerate Case
   3.2 Convergence Rate: Degenerate Case
   3.3 Asymptotic Normality
   3.4 Asymptotic Efficiency
   3.5 Notes and References

4. OPTIMIZATION BY STOCHASTIC APPROXIMATION
   4.1 Kiefer-Wolfowitz Algorithm with Randomized Differences
   4.2 Asymptotic Properties of KW Algorithm
   4.3 Global Optimization
   4.4 Asymptotic Behavior of Global Optimization Algorithm
   4.5 Application to Model Reduction
   4.6 Notes and References

5. APPLICATION TO SIGNAL PROCESSING
   5.1 Recursive Blind Identification
   5.2 Principal Component Analysis
   5.3 Recursive Blind Identification by PCA
   5.4 Constrained Adaptive Filtering
   5.5 Adaptive Filtering by Sign Algorithms
   5.6 Asynchronous Stochastic Approximation
   5.7 Notes and References

6. APPLICATION TO SYSTEMS AND CONTROL
   6.1 Application to Identification and Adaptive Control
   6.2 Application to Adaptive Stabilization
   6.3 Application to Pole Assignment for Systems with Unknown Coefficients
   6.4 Application to Adaptive Regulation
   6.5 Notes and References

Appendices
   A.1 Probability Space
   A.2 Random Variable and Distribution Function
   A.3 Expectation
   A.4 Convergence Theorems and Inequalities
   A.5 Conditional Expectation
   A.6 Independence
   A.7 Ergodicity
   B.1 Convergence Theorems for Martingale
   B.2 Convergence Theorems for MDS I
   B.3 Borel-Cantelli-Lévy Lemma
   B.4 Convergence Criteria for Adapted Sequences
   B.5 Convergence Theorems for MDS II
   B.6 Weighted Sum of MDS

References
Index
Preface
Estimating unknown parameters on the basis of observation data containing information about the parameters is ubiquitous in diverse areas of both theory and application. For example, in system identification the unknown system coefficients are estimated on the basis of input-output data of the control system; in adaptive control systems the adaptive control gain should be defined based on observation data in such a way that the gain asymptotically tends to the optimal one; in blind channel identification the channel coefficients are estimated using the output data obtained at the receiver; in signal processing the optimal weighting matrix is estimated on the basis of observations; in pattern classification the parameters specifying the partition hyperplane are searched by learning; and more examples may be added to this list.
All these parameter estimation problems can be transformed to a root-seeking problem for an unknown function. To see this, let y_k denote the observation at time k, i.e., the information available about the unknown parameters at time k. It can be assumed that the parameter under estimation, denoted by x^0, is a root of some unknown function f(·): f(x^0) = 0. This is not a restriction, because, for example, f(x) = x^0 − x may serve as such a function. Let x_k be the estimate for x^0 at time k. Then the available information at time k+1 can formally be written as

y_{k+1} = f(x_k) + ε_{k+1},

where ε_{k+1} = y_{k+1} − f(x_k). Therefore, by considering y_{k+1} as an observation on f(·) at x_k with observation error ε_{k+1}, the problem has been reduced to seeking the root x^0 of f(·) on the basis of the observations {y_k}.

It is clear that for each problem to specify f(·) is of crucial importance. The parameter estimation problem can be solved only if f(·) is appropriately selected so that the observation error meets the requirements figured in the convergence theorems.
If f(·) and its gradient could be observed without error at any desired values, then numerical methods such as the Newton-Raphson method, among others, could be applied to solving the problem. However, such methods cannot be used here, because in addition to the obvious problem concerning the existence and availability of the gradient, the observations are corrupted by errors which may contain not only a purely random component but also the structural error caused by inadequacy of the selected f(·).
Aiming at solving the stated problem, Robbins and Monro proposed the following recursive algorithm

x_{k+1} = x_k + a_k y_{k+1}

to approximate the sought-for root, where a_k is the step size. This algorithm is now called the Robbins-Monro (RM) algorithm. Following this pioneering work on stochastic approximation, there have been a large number of applications to practical problems and research works on theoretical issues.
At the beginning, the probabilistic method was the main tool in convergence analysis for stochastic approximation algorithms, and rather restrictive conditions were imposed on both f(·) and {ε_k}. For example, it is required that the growth rate of ||f(x)|| be not faster than linear as ||x|| tends to infinity, and that {ε_k} be a martingale difference sequence [78]. Though the linear growth rate condition is restrictive, as shown by simulation it can hardly be simply removed without violating convergence of RM algorithms.
To weaken the noise conditions guaranteeing convergence of the algorithm, the ODE (ordinary differential equation) method was introduced in [72, 73] and further developed in [65]. Since the conditions on the noise required by the ODE method may be satisfied by a large class of {ε_k} including both random and structural errors, the ODE method has been widely applied for convergence analysis in different areas. However, in this approach one has to assume a priori that the sequence of estimates {x_k} is bounded. It is hard to say that the boundedness assumption is more desirable than a growth rate restriction on f(·).
The stochastic approximation algorithm with expanding truncations was introduced in [27], and the analysis method was then improved in [14]. In fact, this is an RM algorithm truncated at expanding bounds, and for its convergence the growth rate restriction on f(·) is not required. The convergence analysis method for the proposed algorithm is called the trajectory-subsequence (TS) method, because the analysis is carried out at trajectories where the noise condition is satisfied, and, in contrast to the ODE method, the noise condition need not be verified on the whole sequence {x_k} but is verified only along convergent subsequences {x_{n_k}}. This makes a great difference when dealing with state-dependent noise, because a convergent subsequence {x_{n_k}} is always bounded, while the boundedness of the whole sequence {x_k} is not guaranteed before establishing its convergence. As shown in Chapters 4, 5, and 6, for most parameter estimation problems, after transforming them to a root-seeking problem the structural errors are unavoidable, and they are state-dependent.
The expanding truncation technique equipped with the TS method appears to be a powerful tool for dealing with various parameter estimation problems: it not only has succeeded in essentially weakening the conditions for convergence of the general stochastic approximation algorithm, but also has made it possible to successfully apply stochastic approximation in diverse areas. However, there is a lack of a reference that systematically describes the theoretical part of the method and concretely shows how to apply the method to problems coming from different areas. To fill in this gap is the purpose of the book.
The book summarizes results on the topic mostly distributed over journal papers and partly contained in unpublished material. The book is written in a systematic way: it starts with a general introduction to stochastic approximation, then describes the basic method used in the book, proves the general convergence theorems, and demonstrates various applications of the general theory.
In Chapter 1 the problem of stochastic approximation is stated, and the basic methods for convergence analysis, such as the probabilistic method, the ODE method, the TS method, and the weak convergence method, are introduced.
Chapter 2 presents the theoretical foundation of the algorithm with expanding truncations: the basic convergence theorems are proved by the TS method; various types of noises are discussed; the necessity of the imposed noise condition is shown; the connection between stability of the equilibrium and convergence of the algorithm is discussed; the robustness of stochastic approximation algorithms is considered when the commonly used conditions are not exactly satisfied; and moving root tracking is also investigated. The basic convergence theorems are presented in Section 2.2, and their proof is elementary and purely deterministic.
Chapter 3 describes asymptotic properties of the algorithms: convergence rates for both cases, whether or not the gradient of f(·) is degenerate; asymptotic normality of the estimates; and asymptotic efficiency by the averaging method.
Starting from Chapter 4 the general theory developed so far is applied to different fields. Chapter 4 deals with optimization by stochastic approximation methods. Convergence and convergence rates of the Kiefer-Wolfowitz (KW) algorithm with expanding truncations and randomized differences are established. A global optimization method consisting in the combination of KW algorithms with search methods is defined, and its a.s. convergence as well as its asymptotic behavior are established. Finally, the global optimization method is applied to solving the model reduction problem.
In Chapter 5 the general theory is applied to problems arising from signal processing. Applying the stochastic approximation method to blind channel identification leads to a recursive algorithm estimating the channel coefficients and continuously improving the estimates while receiving new signals, in contrast to the existing “block” algorithms. Applying the TS method to principal component analysis results in improved conditions for convergence. Stochastic approximation algorithms with expanding truncations combined with the TS method are also applied to adaptive filters with and without constraints. As a result, the conditions required for convergence have been considerably improved in comparison with the existing results. Finally, the expanding truncation technique and the TS method are applied to asynchronous stochastic approximation.
In the last chapter, the general theory is applied to problems arising from systems and control. The ideal parameter for operation is identified for stochastic systems by using the methods developed in this book. Then the obtained results are applied to the adaptive quadratic control problem. Adaptive regulation for a nonlinear nonparametric system and learning pole assignment are also solved by the stochastic approximation method.

The book is self-contained in the sense that there are only a few points using knowledge for which we refer to other sources, and these points can be ignored when reading the main body of the book. The basic mathematical tools used in the book are calculus and linear algebra, based on which one will have no difficulty in reading the fundamental convergence Theorems 2.2.1 and 2.2.2 and their applications described in the subsequent chapters. To understand the other material, concepts of probability theory, especially the convergence theorems for martingale difference sequences, are needed. The necessary concepts of probability theory are given in Appendix A. Some facts from probability that are used at a few specific points are listed in Appendix A without proof, because omitting the corresponding parts still leaves the rest of the book readable. However, the proof of the convergence theorems for martingales and martingale difference sequences is provided in detail in Appendix B.
The book is written for students, engineers, and researchers working in the areas of systems and control, communication and signal processing, optimization and operations research, and mathematical statistics.
HAN-FU CHEN
Acknowledgments
The support of the National Key Project of China and the National Natural Science Foundation of China is gratefully acknowledged. The author would like to express his gratitude to Dr. Haitao Fang for his helpful suggestions and useful discussions. The author would also like to thank Ms. Jinling Chang for her skilled typing and his wife Shujun Wang for her constant support.
Chapter 1

ROBBINS-MONRO ALGORITHM

Optimization is ubiquitous in various research and application fields. Quite often an optimization problem can be reduced to finding zeros (roots) of an unknown function f(·), which can be observed, but the observation may be corrupted by errors. This is the topic of stochastic approximation (SA). The error source may be observation noise, but it may also come from structural inaccuracy of the observed function. For example, one wants to find zeros of f(x), but what one actually observes are functions f_k(x) which are different from f(x). Let us denote by y_{k+1} the observation at time k+1 and by ε_{k+1} the observation noise:

y_{k+1} = f(x_k) + ε_{k+1},   ε_{k+1} = e_{k+1} + (f_k(x_k) − f(x_k)).

Here, f_k(x_k) − f(x_k) is the additional error caused by the structural inaccuracy. It is worth noting that the structural error normally depends on the current estimate x_k, and it is hard to require it to have a certain probabilistic property such as independence, stationarity, or the martingale property. We call this kind of noise state-dependent noise.

The basic recursive algorithm for finding roots of an unknown function on the basis of noisy observations is the Robbins-Monro (RM) algorithm, which is characterized by its simplicity in computation. This chapter serves as an introduction to SA, describing various methods for analyzing convergence of the RM algorithm.

In Section 1.1 the motivation of the RM algorithm is explained, and its limitation is pointed out by an example. In Section 1.2 the classical approach to analyzing convergence of the RM algorithm is presented, which is based on probabilistic assumptions on the observation noise. To relax the restrictions made on the noise, a convergence analysis method connecting convergence of the RM algorithm with stability of an ordinary differential equation (ODE) was introduced in the nineteen seventies. The ODE method is demonstrated in Section 1.3. In Section 1.4 the convergence analysis is carried out at a sample path by considering convergent subsequences; we therefore call this method the trajectory-subsequence (TS) method, which is the basic tool used in the subsequent chapters.

In this book our main concern is the path-wise convergence of the algorithm. However, there is another approach to convergence analysis called the weak convergence method, which is briefly introduced in Section 1.5. Notes and references are given in the last section.

This chapter introduces the main methods used in the literature for convergence analysis, but restricted to the single root case. Extension to more general cases in various respects is given in later chapters.
1.1. Finding Zeros of a Function

Many theoretical and practical problems in diverse areas can be reduced to finding zeros of a function. To see this it suffices to notice that solving many problems finally consists in optimizing some function J(·), i.e., finding its minimum (or maximum). If J(·) is differentiable, then the optimization problem reduces to finding the roots of f(x), where f(x) = J'(x), the derivative of J(·).

In the case where the function or its derivatives can be observed without errors, there are many numerical methods for solving the problem. For example, the gradient method, by which the estimate x_k for the root of f(·) is recursively generated by the following algorithm:

x_{k+1} = x_k − [f'(x_k)]^{−1} f(x_k),   (1.1.1)

where f'(·) denotes the derivative of f(·). This kind of problem belongs to the topics of optimization theory, which considers general cases where J(·) may be nonconvex, nonsmooth, and with constraints. In contrast to optimization theory, SA is devoted to finding zeros of an unknown function f(·) which can be observed, but the observations are corrupted by errors.

Since f'(·) is not exactly known and even may not exist, (1.1.1)-like algorithms are no longer applicable. Consider the following simple example. Let f(·) be a linear function:

f(x) = c(x − x^0),   c > 0.

If the derivative of f(·) is available, i.e., if we know c, and if f(x) can precisely be observed, then according to (1.1.1),

x_1 = x_0 − c^{−1} c(x_0 − x^0) = x^0.
This means that the gradient algorithm leads to the zero of f(·) in one step.

Assume now that the derivative of f(·) is unavailable but f(x) can exactly be observed. Let us replace [f'(x_k)]^{−1} in (1.1.1) by a sequence of positive numbers a_k. Then we derive

x_{k+1} = x_k − a_k f(x_k),   (1.1.2)

or

x_{k+1} = (1 − c a_k) x_k + c a_k x^0.   (1.1.3)

This is a linear difference equation, which can inductively be solved, and the solution of (1.1.3) can be expressed as follows:

x_{k+1} − x^0 = ∏_{i=0}^{k} (1 − c a_i)(x_0 − x^0).   (1.1.4)

Clearly, if a_k → 0 and Σ_k a_k = ∞, then ∏_{i=0}^{k}(1 − c a_i) → 0, and x_k tends to the root x^0 of f(·) as k → ∞ for any initial value x_0. This is an attractive property: although the gradient of f(·) is unavailable, we can still approach the sought-for root if the inverse of the gradient is replaced by a sequence of positive real numbers decreasingly tending to zero.
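This behavior is easy to check numerically. The sketch below is an illustration added here, not an example from the book; the slope c = 0.5, the root x^0 = 1, and the gains a_k = 1/(k+1) are choices made for the demonstration. It iterates x_{k+1} = x_k − a_k f(x_k) and the error contracts by the factor ∏(1 − c a_i), which tends to zero because the gains are not summable.

```python
# Noise-free root seeking with decreasing gains a_k = 1/(k+1),
# for the linear function f(x) = c*(x - root), c > 0.
# The product prod(1 - c*a_i) -> 0 because sum a_i diverges,
# so x_k -> root from any starting point, without using f'(x).

def seek_root(x0, c=0.5, root=1.0, n=100_000):
    x = x0
    for k in range(n):
        a_k = 1.0 / (k + 1)           # step size, replacing the inverse gradient
        x = x - a_k * c * (x - root)  # x_{k+1} = x_k - a_k f(x_k)
    return x

print(abs(seek_root(5.0) - 1.0))   # small; shrinks like prod(1 - c*a_i)
```

Note that the decay here is only polynomial in k, which is much slower than the one-step convergence of the gradient algorithm; this is the price paid for not knowing the derivative.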
Let us consider the case where f(x) is observed with errors:

y_{k+1} = f(x_k) + ε_{k+1},

where y_{k+1} denotes the observation at time k+1, ε_{k+1} the corresponding observation error, and x_k the estimate for the root of f(·) at time k.

It is natural to ask how x_k will behave if the exact value of f(x_k) in (1.1.2) is replaced by its error-corrupted observation y_{k+1}, i.e., if x_k is recursively derived according to the following algorithm:

x_{k+1} = x_k − a_k y_{k+1}.   (1.1.5)

In our example, f(x) = c(x − x^0), and (1.1.5) turns out to be

x_{k+1} = (1 − c a_k) x_k + c a_k x^0 − a_k ε_{k+1}.
Similar to (1.1.3), the solution of this difference equation is

x_{k+1} − x^0 = ∏_{i=0}^{k}(1 − c a_i)(x_0 − x^0) − Σ_{j=0}^{k} a_j ε_{j+1} ∏_{i=j+1}^{k}(1 − c a_i).   (1.1.6)

Therefore, x_k converges to the root x^0 of f(·) if the last term in (1.1.6) tends to zero as k → ∞. This means that the replacement of the inverse gradient by a sequence of decreasing positive numbers still works even in the case of error-corrupted observations, if the observation errors can be averaged out. It is worth noting that in lieu of (1.1.5) we have to take the positive sign before a_k y_{k+1}, i.e., to consider

x_{k+1} = x_k + a_k y_{k+1},   (1.1.7)

if f(x) = c(x^0 − x) rather than f(x) = c(x − x^0), or, more generally, if f(x) is decreasing as x increases.

This simple example demonstrates the basic features of the algorithms (1.1.5) and (1.1.7): 1) the algorithm may converge to a root of f(·); 2) the limit of the algorithm, if it exists, should not depend on the initial value; 3) the convergence rate is determined by how fast the observation errors are averaged out.
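A numerical illustration of the averaging effect (an addition made here, not from the book): below the noise is the bounded zero-mean sequence ε_{k+1} = (−1)^k, chosen so that the averaging is transparent, with f(x) = x − 1 and a_k = 1/(k+1). One can check by induction that u_k = k(x_k − x^0) changes by exactly ∓1 per step, so |x_k − x^0| ≤ 1/k: the errors are averaged out.

```python
# Iteration (1.1.5)-style with error-corrupted observations
# y_{k+1} = f(x_k) + eps_{k+1}, where f(x) = x - 1 and eps_{k+1} = (-1)^k.
# The bounded zero-mean "noise" is averaged out by the decreasing gains
# a_k = 1/(k+1); here one can show |x_n - 1| <= 1/n exactly.

def rm_with_noise(x0, n=1000):
    x = x0
    for k in range(n):
        y = (x - 1.0) + (-1.0) ** k   # noisy observation of f(x_k)
        x = x - y / (k + 1)           # x_{k+1} = x_k - a_k y_{k+1}
    return x

print(abs(rm_with_noise(5.0) - 1.0))  # bounded by 1/n, i.e. <= 0.001 here
```

With a constant step size instead of a_k → 0 the iterate would keep oscillating with an amplitude of the order of the step size, which is the point of the decreasing-gain requirement discussed in the next section.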
From (1.1.6) it is seen that for linear functions the convergence rate is determined by the rate at which

Σ_{j=0}^{k} a_j ε_{j+1} ∏_{i=j+1}^{k}(1 − c a_i)

tends to zero. In the case where {ε_k} is a sequence of independent and identically distributed random variables with zero mean and bounded variance σ², we have

limsup_{k→∞} |Σ_{j=1}^{k} ε_j| / (2 k σ² log log k)^{1/2} = 1   a.s.

by the law of the iterated logarithm. This means that the convergence rate for the algorithms (1.1.5) and (1.1.7) with error-corrupted observations should not be faster than O((log log k / k)^{1/2}).
1.2. Probabilistic Method

We have just shown how to find the root of an unknown linear function based on noisy observations. We now formulate the general problem.
Let f(·) be an unknown function with unknown root x^0: f(x^0) = 0. Assume f(·) can be observed at each point, but the observation is corrupted by noise:

y_{k+1} = f(x_k) + ε_{k+1},   (1.2.1)

where y_{k+1} is the observation at time k+1, ε_{k+1} is the observation noise, and x_k is the estimate for x^0 at time k.

Stochastic approximation algorithms recursively generate x_k to approximate x^0 based on the past observations. In the pioneering work of this area, Robbins and Monro proposed the following algorithm:

x_{k+1} = x_k + a_k y_{k+1},   (1.2.2)

to estimate x^0, where the step size a_k is decreasing and satisfies the following conditions: a_k > 0, Σ_k a_k = ∞, and Σ_k a_k² < ∞. They proved convergence of x_k to x^0 in mean square.
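The two summability requirements on the step size can be checked numerically for the classical choice a_k = 1/k (this numeric check is an illustration added here): the partial sums of a_k keep growing like log N, while the partial sums of a_k² stay bounded, below π²/6.

```python
import math

# Partial sums for the classical step sizes a_k = 1/k:
# sum a_k diverges (growing like log N), while sum a_k^2 stays bounded
# (its limit is pi^2/6 ~= 1.6449), exactly the combination the
# Robbins-Monro conditions require.

N = 1_000_000
s1 = sum(1.0 / k for k in range(1, N + 1))       # partial sum of a_k
s2 = sum(1.0 / k**2 for k in range(1, N + 1))    # partial sum of a_k^2

print(s1)                  # ~ log(N) + 0.5772, still growing without bound
print(s2, math.pi**2 / 6)  # bounded, just below pi^2/6
```

The first sum diverging is what lets the algorithm travel an arbitrary distance from any initial value; the second converging is what tames the accumulated noise.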
We now explain the meaning of the conditions required for the step size a_k. The condition a_k → 0 aims at reducing the effect of the observation noises. To see this, consider the case where x_k is close to x^0 and f(x_k) is close to zero, say ||f(x_k)|| ≤ δ with δ > 0 small.
Throughout the book, ||x|| always means the Euclidean norm of a vector x, and for a matrix A, ||A|| denotes the square root of the maximum eigenvalue of A^T A, where A^T means the transpose of the matrix A.
By (1.2.2), x_{k+1} − x_k = a_k (f(x_k) + ε_{k+1}). Even in the Gaussian noise case, ||x_{k+1} − x_k|| may be large from time to time if a_k has a positive lower bound. Therefore, in order to have the desired consistency, i.e., x_k → x^0, it is necessary to use decreasing gains a_k such that a_k → 0.

On the other hand, consistency can also not be achieved if a_k decreases too fast as k → ∞. To see this, let Σ_k a_k < ∞. Then, even in the noise-free case, i.e., ε_k ≡ 0, from (1.2.2) we have

||x_{k+1} − x_0|| ≤ Σ_{i=0}^{k} a_i ||f(x_i)|| ≤ c Σ_{i=0}^{∞} a_i < ∞

if f(·) is a bounded function, ||f(x)|| ≤ c. Therefore, in this case x_k cannot reach x^0 if the initial value x_0 is taken far enough from the true root, and hence x_k will never converge to x^0.
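The failure mode for too quickly decreasing gains can be made concrete. In the sketch below (the choices a_k = 2^{−k−1} and a clipped linear f are made here for illustration) the gains are summable with Σ a_k = 1, so a function bounded by 1 can move the iterate by at most a total distance 1, and a start far from the root never reaches it.

```python
# When sum a_k < infinity (here a_k = 2^(-k-1), total sum 1) and the
# observed function is bounded (|f| <= 1), the iterate moves at most
# sum a_k * sup|f| = 1 in total.  Starting from x_0 = 10 it therefore
# never reaches the root x^0 = 1 of f(x) = clip(x - 1, -1, 1).

def clipped_f(x):
    return max(-1.0, min(1.0, x - 1.0))

x = 10.0
for k in range(200):
    x = x - 2.0 ** (-k - 1) * clipped_f(x)   # x_{k+1} = x_k - a_k f(x_k)

print(x)   # stuck near 9: the total correction cannot exceed 1
```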
The algorithm (1.2.2) is now called the Robbins-Monro (RM) algorithm.
The classical approach to convergence analysis of SA algorithms is based on probabilistic analysis of the trajectories. We now present a typical convergence theorem obtained by this approach. Related concepts and results from probability theory are given in Appendices A and B.

In fact, we will use the martingale convergence theorem to prove the path-wise convergence of x_k, i.e., to show x_k → x^0 a.s. For this, the following set of conditions will be used.
A1.2.1 The step size a_k is such that a_k > 0, a_k → 0, Σ_k a_k = ∞, and Σ_k a_k² < ∞.
A1.2.2 There exists a twice continuously differentiable Lyapunov function v(·) satisfying the following conditions:

i) its second derivative is bounded;

ii) v(x^0) = 0, v(x) > 0 for x ≠ x^0, and v(x) → ∞ as ||x|| → ∞;

iii) for any ε > 0 there is a β_ε > 0 such that

sup_{||x − x^0|| ≥ ε} f^T(x) v_x(x) ≤ −β_ε < 0,

where v_x(·) denotes the gradient of v(·).
A1.2.3 The observation noise {ε_k} is a martingale difference sequence with

E(ε_{k+1} | F_k) = 0,   (1.2.3)

where {F_k} is a family of nondecreasing σ-algebras such that ε_k is F_k-measurable.
A1.2.4 The function f(·) and the conditional second moment of the observation noise have the following upper bound:

||f(x_k)||² + E(||ε_{k+1}||² | F_k) ≤ c(1 + v(x_k))   ∀k,   (1.2.4)

where c is a positive constant.
Prior to formulating the theorem we need some auxiliary results.

Let {z_k, F_k} be an adapted sequence, i.e., z_k is F_k-measurable for each k. Define the first exit time of {z_k} from a Borel set B:

τ = min{k : z_k ∉ B}.

It is clear that {τ = k} ∈ F_k, i.e., τ is a Markov time.
Lemma 1.2.1 Assume τ is a Markov time and {z_k, F_k} is a nonnegative supermartingale, i.e., z_k ≥ 0 and

E(z_{k+1} | F_k) ≤ z_k   a.s.

Then {z_{k∧τ}, F_k} is also a nonnegative supermartingale, where k∧τ = min(k, τ).
The proof is given in Appendix B, Lemma B-2-1.
The following lemma concerning convergence of adapted sequences will be used in the proof of convergence of the RM algorithm, but the lemma is also of interest by itself.
Lemma 1.2.2 Let {z_k, F_k} and {ζ_k, F_k} be two nonnegative adapted sequences.

i) If E(z_{k+1} | F_k) ≤ z_k + ζ_k and Σ_k E ζ_k < ∞, then z_k converges a.s. to a finite limit.

ii) If E(z_{k+1} | F_k) ≤ z_k − ζ_k, then Σ_k ζ_k < ∞ a.s.
Proof. For proving i), set

u_k = z_k + E(Σ_{j=k}^{∞} ζ_j | F_k).   (1.2.5)

Then we have

E(u_{k+1} | F_k) ≤ z_k + ζ_k + E(Σ_{j=k+1}^{∞} ζ_j | F_k) = u_k.

By the convergence theorem for nonnegative supermartingales, u_k converges a.s. as k → ∞.

Since Σ_{j=0}^{∞} E ζ_j < ∞, by the convergence theorem for martingales it follows that M_k := E(Σ_{j=0}^{∞} ζ_j | F_k) converges a.s. as k → ∞. Since Σ_{j=0}^{k−1} ζ_j is F_{k−1}-measurable and is nondecreasing with Σ_{j=0}^{k−1} ζ_j ≤ M_k, it converges a.s., and we have

E(Σ_{j=k}^{∞} ζ_j | F_k) = M_k − Σ_{j=0}^{k−1} ζ_j.

Noticing that both M_k and Σ_{j=0}^{k−1} ζ_j converge a.s. as k → ∞, we conclude that E(Σ_{j=k}^{∞} ζ_j | F_k) is also convergent a.s. as k → ∞. Consequently, from (1.2.5) it follows that z_k converges a.s. as k → ∞.

For proving ii), set

u_k = z_k + Σ_{j=0}^{k−1} ζ_j.

Taking the conditional expectation leads to

E(u_{k+1} | F_k) = E(z_{k+1} | F_k) + Σ_{j=0}^{k} ζ_j ≤ z_k − ζ_k + Σ_{j=0}^{k} ζ_j = u_k.

Again, by the convergence theorem for nonnegative supermartingales, u_k converges a.s. as k → ∞. Since by the same theorem z_k also converges a.s. as k → ∞, it directly follows that Σ_k ζ_k < ∞ a.s.
Theorem 1.2.1 Assume Conditions A1.2.1–A1.2.4 hold. Then for any initial value x_0, the x_k given by the RM algorithm (1.2.2) converges to the root x^0 of f(·) a.s. as k → ∞.
Proof. Let v(·) be the Lyapunov function given in A1.2.2. Expanding v(x_{k+1}) to a Taylor series, we obtain

v(x_{k+1}) = v(x_k) + a_k v_x^T(x_k) y_{k+1} + (a_k²/2) y_{k+1}^T v_{xx}(ξ_k) y_{k+1}
          ≤ v(x_k) + a_k v_x^T(x_k) y_{k+1} + c_0 a_k² ||y_{k+1}||²,   (1.2.6)

where v_x and v_{xx} denote the gradient and Hessian of v(·), respectively, ξ_k is a vector with components located in-between the corresponding components of x_k and x_{k+1}, and c_0 denotes a constant such that ||v_{xx}(x)|| ≤ 2c_0 (by A1.2.2 i)).

Noticing that x_k is F_k-measurable and taking the conditional expectation in (1.2.6), by A1.2.3 and (1.2.4) we derive

E(v(x_{k+1}) | F_k) ≤ v(x_k) + a_k f^T(x_k) v_x(x_k) + c_1 a_k² (1 + v(x_k)),   (1.2.7)

where c_1 is a constant. Since Σ_k a_k² < ∞ by A1.2.1, we have

∏_{k=0}^{∞} (1 + c_1 a_k²) < ∞.   (1.2.8)

Denoting

z_k = (1 + v(x_k)) ∏_{i=k}^{∞} (1 + c_1 a_i²),

and noticing a_k f^T(x_k) v_x(x_k) ≤ 0 by A1.2.2 iii), from (1.2.7) and (1.2.8) it follows that

E(z_{k+1} | F_k) ≤ z_k + a_k f^T(x_k) v_x(x_k) ∏_{i=k+1}^{∞} (1 + c_1 a_i²) ≤ z_k.   (1.2.9)

Therefore, {z_k, F_k} is a nonnegative supermartingale, and z_k converges a.s. by the convergence theorem for nonnegative supermartingales. Since ∏_{i=k}^{∞}(1 + c_1 a_i²) → 1 as k → ∞, v(x_k) also converges a.s.
For any ε > 0 denote

B_ε = {x : ||x − x^0|| < ε}.

Let τ be the first exit time of {x_k} from B_ε^c, and for any integer m let

τ_m = min{k > m : x_k ∉ B_ε^c},

where B^c denotes the complement of B. This means that τ_m is the first exit time from B_ε^c after time m, i.e., the first time after m at which x_k enters B_ε.

Since a_k f^T(x_k) v_x(x_k) is nonpositive, from (1.2.9) and Lemma 1.2.1 it follows that

E(z_{(k+1)∧τ_m} | F_k) ≤ z_{k∧τ_m} + a_k f^T(x_k) v_x(x_k) I_{[k < τ_m]}

for any k > m. On the set {k < τ_m} with k > m we have ||x_k − x^0|| ≥ ε, and hence, by A1.2.2 iii), this implies that

E(z_{(k+1)∧τ_m} | F_k) ≤ z_{k∧τ_m} − β_ε a_k I_{[k < τ_m]}.

By Lemma 1.2.2 ii), the above inequality implies

Σ_{k > m} a_k I_{[k < τ_m]} < ∞   a.s.,

which means that τ_m must be finite a.s. Otherwise, we would have Σ_k a_k < ∞, a contradiction to A1.2.1. Therefore, for each m, with the possible exception of a set of probability zero, the trajectory of {x_k} must enter B_ε after time m.

Consequently, there is a subsequence {x_{n_j}} such that ||x_{n_j} − x^0|| < ε_j, where ε_j → 0 as j → ∞. By the arbitrariness of ε we then conclude that there is a subsequence, denoted still by {x_{n_j}}, such that x_{n_j} → x^0. Hence v(x_{n_j}) → 0.

However, we have shown that v(x_k) converges a.s. Therefore, v(x_k) → 0 a.s. By A1.2.2 ii) we then conclude that x_k → x^0 a.s.
Remark 1.2.1 If Condition A1.2.2 iii) changes to

inf? — more precisely, if for any ε > 0 there is a β_ε > 0 such that inf_{||x − x^0|| ≥ ε} f^T(x) v_x(x) ≥ β_ε > 0,

then the algorithm (1.2.2) should accordingly change to

x_{k+1} = x_k − a_k y_{k+1}.
We now explain the conditions required in Theorem 1.2.1. As noted in Section 1.1, the step size should satisfy a_k → 0 and Σ_k a_k = ∞, but the condition Σ_k a_k² < ∞ may be weakened.

Condition A1.2.2 requires the existence of a Lyapunov function v(·). This kind of condition is normally necessary for convergence of the algorithms, but the analytic properties of v(·) may be weakened. The noise condition A1.2.3 is rather restrictive. As will be shown in the subsequent chapters, ε_k may be composed of not only random noise but also structural errors, which hardly have nice probabilistic properties such as the martingale difference property, stationarity, or bounded variances, etc.

As in many cases one can take v(x) = ||x − x^0||² to serve as the Lyapunov function, it then follows from (1.2.4) that the growth rate of ||f(x)|| as ||x|| → ∞ should not be faster than linear. This is a major restriction on applying Theorem 1.2.1. However, if we a priori assume that {x_k} generated by the algorithm (1.2.2) is bounded, then {v(x_k)} is bounded provided v(·) is locally bounded, and then the linear growth is not a restriction for {f(x_k), k = 1, 2, ...}.
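As a small numerical companion to Theorem 1.2.1 (the two-dimensional setting, the choice f(x) = x^0 − x, and the bounded alternating perturbation standing in for the martingale-difference noise are all assumptions made here for illustration, not taken from the book): with v(x) = ||x − x^0||² one has f^T(x) v_x(x) = −2||x − x^0||² < 0 away from the root, so a condition of the A1.2.2 type holds, and the plus-sign recursion (1.2.2) drives the estimate to the root.

```python
# RM algorithm (1.2.2), x_{k+1} = x_k + a_k y_{k+1}, in R^2 for
# f(x) = root - x.  With v(x) = |x - root|^2 one gets
# f(x)^T v_x(x) = -2 |x - root|^2 < 0 away from the root, so the
# Lyapunov condition holds.  A bounded alternating perturbation
# stands in for the zero-mean observation noise.

root = (1.0, -2.0)

def rm_2d(x0, n=2000):
    x = list(x0)
    for k in range(n):
        a_k = 1.0 / (k + 1)
        eps = (-1.0) ** k                # bounded zero-mean error
        for i in range(2):
            y = (root[i] - x[i]) + eps   # y_{k+1} = f(x_k) + eps_{k+1}
            x[i] = x[i] + a_k * y        # plus sign: f is "decreasing"
    return x

x = rm_2d((10.0, 10.0))
print(x)   # close to (1.0, -2.0)
```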
1.3. ODE Method
As mentioned in Section 1.2, the classical probabilistic approach to analyzing SA algorithms requires rather restrictive conditions on the observation noise. In the nineteen seventies the so-called ordinary differential equation (ODE) method was proposed for analyzing convergence of SA algorithms. We explain the idea of the method. The estimates {x_k} generated by the RM algorithm are interpolated to a continuous function with interpolating length equal to the step size used in the algorithm. The tail part of the interpolating function is shown to satisfy the ordinary differential equation ẋ = f(x). The sought-for root x^0 is the equilibrium of the ODE. By stability of this equation, or by assuming the existence of a Lyapunov function, it is proved that the tail of the interpolating function tends to x^0. From this, it can be deduced that x_k → x^0.

For demonstrating the ODE method we need two facts from analysis, which are formulated below as propositions.
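Before the propositions, the interpolation idea can be previewed numerically. In the noise-free case the recursion x_{k+1} = x_k + a_k f(x_k) is exactly Euler's method for ẋ = f(x) with step a_k, so the iterate shadows the ODE solution at the times t_k = a_0 + ... + a_{k−1}. The sketch below uses the illustrative choices f(x) = −x and a_k = 1/(k+2) (made here, not taken from the book); the iterate stays within a bounded factor of the exact solution e^{−t_k}, the factor being controlled by Σ a_k² < ∞.

```python
import math

# The noise-free recursion x_{k+1} = x_k + a_k f(x_k) is Euler's method
# for dx/dt = f(x) with step a_k.  For f(x) = -x the ODE solution is
# x(t) = x(0) e^{-t}; the iterate equals prod(1 - a_i), which differs
# from e^{-t_k} only by exp(-delta) with 0 <= delta <= sum a_i^2.

f = lambda x: -x
x, t = 1.0, 0.0
for k in range(100_000):
    a_k = 1.0 / (k + 2)       # a_k <= 1/2, sum a_k = infinity, sum a_k^2 < infinity
    x = x + a_k * f(x)        # Euler step of the ODE
    t = t + a_k               # interpolation time t_k

ratio = x / math.exp(-t)      # iterate versus ODE solution at time t_k
print(ratio)                  # between roughly 0.5 and 1
```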
Proposition 1.3.1 (Arzelà-Ascoli) Let {f_k(·)} be a family of equi-continuous and uniformly bounded functions, where by equi-continuity we mean that for any ε > 0 there exists a δ > 0 such that

|f_k(t) − f_k(s)| < ε whenever |t − s| < δ,   ∀k.

Then there are a continuous function f(·) and a subsequence of functions {f_{k_j}(·)} which converge to f(·) uniformly in any finite interval, i.e.,

f_{k_j}(t) → f(t) as j → ∞

uniformly with respect to t belonging to any finite interval.
Proposition 1.3.2 For the following ODE

ẋ = f(x)   (1.3.1)

with f(x^0) = 0, if there exists a continuously differentiable function v(·) such that v(x^0) = 0, v(x) > 0 for x ≠ x^0, v(x) → ∞ as ||x|| → ∞, and

v_x^T(x) f(x) < 0   ∀x ≠ x^0,

then the solution to (1.3.1), starting from any initial value, tends to x^0 as t → ∞, i.e., x^0 is the globally asymptotically stable solution to (1.3.1).
Let us introduce the following conditions.

A1.3.1 a_k > 0, a_k → 0, and Σ_k a_k = ∞.

A1.3.2 There exists a twice continuously differentiable Lyapunov function v(·) such that v(x^0) = 0, v(x) → ∞ as ||x|| → ∞, and

v_x^T(x) f(x) < 0 whenever x ≠ x^0.
In order to describe the conditions on the noise, we introduce an integer-valued function m(k, T) for any T > 0 and any integer k. For T > 0 define

m(k, T) = max{m : Σ_{i=k}^{m} a_i ≤ T}.   (1.3.2)

Noticing that a_k tends to zero, for any fixed T > 0 the difference m(k, T) − k diverges to infinity as k → ∞. In fact, m(k, T) counts the number of iterations starting from time k as long as the sum of step sizes does not exceed T. The integer-valued function m(·, ·) will be used throughout the book.
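A small sketch (illustrative; the step sizes a_k = 1/(k+1) are a choice made here) shows how m(k, T) can be computed, and that the window of iterations it describes lengthens as k grows while the accumulated step stays within the budget T.

```python
# m(k, T) = max{m : a_k + ... + a_m <= T}, for step sizes a_k = 1/(k+1).
# Because a_k -> 0, the window m(k, T) - k gets longer as k grows,
# while the accumulated step over the window stays <= T.

def a(i):
    return 1.0 / (i + 1)

def m(k, T):
    total, i = 0.0, k
    while total + a(i) <= T:
        total += a(i)
        i += 1
    return i - 1              # last index still inside the budget T

for k in (1, 10, 100):
    mk = m(k, 1.0)
    window = sum(a(i) for i in range(k, mk + 1))
    print(k, mk, window)      # window <= 1.0 < window + a(mk + 1)
```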
The following conditions will be used:

A1.3.3 The observation noise {ε_k} satisfies

lim_{k→∞} max_{k ≤ m ≤ m(k,T)} || Σ_{i=k}^{m} a_i ε_{i+1} || = 0   for every T > 0.

A1.3.4 f(·) is continuous.
Theorem 1.3.1 Assume that A1.3.1, A1.3.2, and A1.3.4 hold. If for a fixed sample path A1.3.3 holds and {x_k} generated by the RM algorithm (1.2.2) is bounded, then for this sample path x_k tends to x^0 as k → ∞.
Proof. Set t_0 = 0 and t_k = a_1 + ··· + a_k.
Define the linear interpolating function taking the value x_k at time t_k.
It is clear that this function is continuous.
Further, define the corresponding linear interpolating function for the noise, defined by (1.3.4) with the iterates replaced by the noise terms.
Since we will deal with the tail part of the interpolating function, we define a shifted family by translating time by t_k.
Thus, we derive a family of continuous functions.
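The interpolation construction can be sketched numerically as follows. The iterate sequence used here is a toy example (x_{k+1} = x_k − a_k x_k, an illustrative assumption, not the algorithm under study); it only shows how the interpolating function and its time-shifts are formed.

```python
import numpy as np

def interpolants(x, a):
    """Linear interpolating function x0 through the iterates placed at the
    'algorithm time' instants t_k = a_1 + ... + a_k, plus the time-shifted
    tail functions xk(k)(t) = x0(t_k + t)."""
    t = np.concatenate(([0.0], np.cumsum(a)))   # t_0 = 0, t_k = sum of steps
    x0 = lambda s: np.interp(s, t, x)           # piecewise-linear interpolation
    xk = lambda k: (lambda s: x0(t[k] + s))     # shifted tail function
    return x0, xk

a = np.array([1.0 / (k + 1) for k in range(1, 200)])  # toy step sizes
x = [1.0]
for ak in a:                                          # toy iteration
    x.append(x[-1] * (1.0 - ak))
x0, xk = interpolants(np.array(x), a)
print(x0(0.0), xk(100)(0.0))   # xk(k)(0) equals the iterate x_k by construction
```

The shifted functions are exactly the objects the proof extracts convergent subsequences from.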
Let us define the constant interpolating function
Then summing up both sides of (1.2.2) yields
and hence
By the boundedness assumption on the iterates, the family is uniformly bounded. We now prove it is equi-continuous.
By definition,
Hence, we have
where since
From this it follows that
which tends to zero as and then by A1.3.3.
For any we have
By boundedness of and (1.3.11) we see that is equi-continuous.
By Proposition 1.3.1, we can select from this family a convergent subsequence which tends to a continuous function.
Consider the following difference with
which is derived by using (1.3.11).
By (1.3.9) it is clear that for
Then from (1.3.12) we obtain
Letting T tend to zero in (1.3.13), by continuity of f(·) and uniform convergence along the subsequence, we conclude that the last term in (1.3.13) converges to zero, and
By A1.3.2 and Proposition 1.3.2 we see that the limit function tends to x⁰ as time goes to infinity.
We now prove that x_k → x⁰. Assume the converse: there is a subsequence
Then for. There is a such that. By (1.3.4) we have

where, and denotes the integer part of, so
It is clear that the family of functions indexed byis uniformly bounded and equi-continuous. Hence, we can select a
convergent subsequence, denoted still by. The limit satisfies the ODE (1.3.14) and coincides with, being the limit of, by the uniqueness of the solution to (1.3.14).
By the uniform convergence we have
which implies that.
From here by (1.3.15) it follows that
Then we obtain a contradictory inequality:
for large enough such that and. This completes the proof.
We now compare conditions used in Theorem 1.3.1 with those in Theorem 1.2.1.
Conditions A1.3.1 and A1.3.2 are slightly weaker than A1.2.1 and A1.2.2, but they are almost the same. The noise condition A1.3.3 is significantly weaker than those used in Theorem 1.2.1, because under the conditions of Theorem 1.2.1 the series Σ_k a_k ε_{k+1} converges a.s.,

which certainly implies A1.3.3.
As a matter of fact, Condition A1.3.3 may be satisfied by sequences much more general than martingale difference sequences.
Example 1.3.1 Assume ε_k → 0 as k → ∞; {ε_k} may be any random or deterministic sequence. Then {ε_k} satisfies A1.3.3. This is because

‖ Σ_{i=k}^{m(k,T)} a_i ε_{i+1} ‖ ≤ T · sup_{i≥k} ‖ε_{i+1}‖ → 0 as k → ∞.
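The bound in this example is easy to check numerically; the concrete sequences below (a_i = 1/i and the deterministic noise ε_i = 1/√i) are illustrative assumptions only.

```python
a = lambda i: 1.0 / i              # illustrative step sizes
eps = lambda i: 1.0 / i ** 0.5     # a deterministic noise with eps_i -> 0

def weighted_tail(k, T):
    """|sum_{i=k}^{m(k,T)} a_i * eps_{i+1}|, with m(k,T) as in (1.3.2)."""
    total, s, i = 0.0, 0.0, k
    while s + a(i) <= T:
        total += a(i) * eps(i + 1)
        s += a(i)
        i += 1
    return abs(total)

for k in (10, 100, 1000, 10000):
    print(k, weighted_tail(k, 1.0))   # decreases toward zero as k grows
```

The printed values shrink like sup_{i≥k} |ε_{i+1}|, exactly as the displayed inequality predicts.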
Example 1.3.2 Let {ε_k} be an MA (moving average) process, i.e.,

ε_k = C_0 w_k + C_1 w_{k-1} + ··· + C_r w_{k-r},

where {w_k} is a martingale difference sequence with bounded conditional second moments. Then under condition A1.2.1, Σ_k a_k w_{k+1} converges a.s., and hence Σ_k a_k ε_{k+1} converges a.s. Consequently, A1.3.3 is satisfied for almost all sample paths.
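For a concrete (hypothetical) instance of this example, take the order-one moving average ε_k = w_k + 0.5 w_{k−1} with i.i.d. N(0,1) innovations and a_k = 1/k; along a sample path, the weighted sums settle to a finite limit, as the example asserts.

```python
import random

random.seed(0)
w_prev, partial, snapshots = 0.0, 0.0, []
for k in range(1, 200001):
    w = random.gauss(0.0, 1.0)       # martingale-difference innovations
    eps = w + 0.5 * w_prev           # MA(1) noise (an illustrative choice)
    w_prev = w
    partial += eps / k               # accumulate a_k * eps_k with a_k = 1/k
    if k in (1000, 10000, 100000, 200000):
        snapshots.append(partial)
print(snapshots)   # successive snapshots change less and less: the series converges
```

Convergence of the full series makes the tail sums in A1.3.3 vanish, which is the point of the example.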
Condition A1.3.4 requires continuity of f(·), which is not required in A1.2.4. At first glance, unlike A1.2.4, Condition A1.3.4 does not impose any growth rate condition on f(·); but Theorem 1.3.1 a priori requires the boundedness of {x_k}, which is an implicit requirement on the growth rate of f(·).
The ODE method is widely used in convergence analysis for algorithms arising from various application areas, because it requires from the noise no probabilistic property that would be difficult to verify. Concerning the weakness of the ODE method, we have mentioned that it a priori assumes that {x_k} is bounded. This condition is difficult to verify in the general case. The other point to be mentioned is that Condition A1.3.3 is also difficult to verify in the case where the noise depends on the past estimates, which often occurs when the noise contains structural errors of the function. This is because A1.3.3 may be verifiable if {x_k} is convergent, but the noise may behave badly depending upon the behavior of {x_k}. So we are somehow in a cyclic situation: with A1.3.3 we can prove convergence of {x_k}; on the other hand, with convergent {x_k} we can verify A1.3.3. This difficulty will be overcome by using the Trajectory-Subsequence (TS) method, to be introduced in the next section and used in subsequent chapters.
1.4. Truncated RM Algorithm and TS Method
In Section 1.2 we considered the root-seeking problem where the sought-for root may be any point. If the region the root belongs
to is known, then we may use the truncated algorithm, and the growth rate restriction on f(·) can be removed.
Let us assume that the root lies in a sphere of known radius. In lieu of (1.2.2) we now consider the following truncated RM algorithm:
where the observation is given by (1.2.1), x* is a given point inside the truncation region, and 1{·} is the indicator function.
The constant used in (1.4.1) will be specified later on.
The algorithm (1.4.1) coincides with the RM algorithm as long as the estimates evolve in the truncation sphere; but if an update exits the sphere, then the algorithm is pulled back to the fixed point x*.
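A minimal sketch of this fixed-bound scheme, in the spirit of (1.4.1). The linear f, the Gaussian noise, and the numerical constants are all illustrative assumptions; the point is that the root (here 2) lies inside the truncation ball (here |x| ≤ M = 5).

```python
import numpy as np

rng = np.random.default_rng(1)

def truncated_rm(f, x_star, M, n_iter):
    """Fixed-bound truncated RM iteration: if the ordinary RM update leaves
    the ball |x| <= M, the estimate is pulled back to the fixed interior
    point x_star."""
    x = x_star
    for k in range(1, n_iter + 1):
        y = f(x) + rng.normal(0.0, 1.0)   # noisy observation of f at x_k
        cand = x + y / k                  # RM step with a_k = 1/k
        x = cand if abs(cand) <= M else x_star
    return x

est = truncated_rm(lambda x: 2.0 - x, x_star=0.0, M=5.0, n_iter=20000)
print(est)   # close to the root 2
```

Because the root is interior to the ball, truncations can occur at most finitely often along a convergent run, which is exactly what the analysis below exploits.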
We will use the following set of conditions:
A1.4.1 The step size satisfies the same conditions as A1.3.1: a_k > 0, a_k → 0 as k → ∞, and Σ_{k=1}^∞ a_k = ∞;
A1.4.2 There exists a continuously differentiable Lyapunov function (not necessarily nonnegative) such that, and for the truncation bound used in (1.4.1) there is such that
A1.4.3 For any convergent subsequence of
where is given by (1.3.2);
A1.4.4 f(·) is measurable and locally bounded.
We first compare these conditions with A1.3.1–A1.3.4. We note that A1.4.1 is the same as A1.3.1, while A1.4.2 is weaker than A1.2.2.
The difference between A1.3.3 and A1.4.3 consists in that condition (1.4.2) is required to be verified only along convergent subsequences, while (1.3.3) in A1.3.3 has to be verified along the whole sequence.
if is small enough and is large enough. This, combined with (1.4.5), implies that the norm of the iterates cannot reach the truncation bound. In other words, the algorithm (1.4.1) turns into an untruncated RM algorithm (1.4.7) for small and large indices.
By the mean value theorem there exists a vector with components located in-between the corresponding components of and such that
Notice that by (1.4.2) the left-hand side of (1.4.6) is of for all sufficiently large, since is bounded. From this it follows that i) for small enough and large enough

and hence, and ii) the last term in (1.4.8) is of, since as. From (1.4.7) and (1.4.8) it then follows that
Since, the interval does not contain the origin. Noticing that, we find that there is such that
for sufficiently small and all large enough. Then by A1.4.2 there is such that

for all large and small enough. As mentioned above, from (1.4.9) we have

for sufficiently large and small enough, where denotes a magnitude tending to zero as
Taking (1.4.4) into account, from (1.4.10) we find that
for large However, we have shown that
The obtained contradiction shows that the number of truncations in(1.4.1) can only be finite.
We have proved that, starting from some large index, the algorithm (1.4.1) develops as an RM algorithm

and is bounded. We are now in a position to show that the sequence converges. Assume it were not true. Then we would have
Then there would exist an interval not containing the origin, and would cross it for infinitely many
Again, without loss of generality assuming, by the same argument as that used above, we will arrive at (1.4.9) and (1.4.10) for large indices and obtain a contradiction. Thus, the sequence tends to a finite limit as. It remains to show that
Assume the converse that there is a subsequence
Then there is a such that for all sufficiently large. We still have (1.4.8), (1.4.9), and (1.4.10) for some
If for any bounded continuous function defined on

then we say that weakly converges to.
If for any there is a compact measurable set such that

then the sequence is called tight.
Further, it is called relatively compact if each of its subsequences contains a weakly convergent subsequence.
In the weak convergence analysis an important role is played by Prohorov's theorem, which says that on a complete and separable metric space, tightness is equivalent to relative compactness. The weak convergence method establishes the weak limit of as, and convergence of to in probability as, whereas
Theorem 1.5.1 Assume the following conditions:
A1.5.1 {x_k} is a.s. bounded;

A1.5.2 f(·) is continuous;
A1.5.3 is adapted, is uniformly integrable in the sense that
and
Then is tight in and weakly converges to
that is a solution to
Further, if is asymptotically stable for (1.5.3), then for any, the distance between and converges to zero in probability as
Instead of a proof, we only outline its basic idea. First, it is shown that we can extract a subsequence of weakly converging to
8/13/2019 Stochastic Approximation Applications
http://slidepdf.com/reader/full/stochastic-approximation-applications 37/368
ROBBINS-MONRO ALGORITHM 23
For notational simplicity, denote the subsequence still by. By the Skorohod representation, we may assume convergence with probability one. For this we need only, if necessary, to change the probability space and take and on this new space such that and have the same distributions as those of and, respectively. Then, it is proved that
is a martingale. Since and as can be shown, is Lipschitzcontinuous, it follows that
Since is relatively compact and the limit does not depend onthe extracted subsequence, the whole family weakly convergesto as and satisfies (1.5.3). By asymptotic stability of
Remark 1.5.1 The boundedness assumption on may be removed.For this a smooth function is introduced such that
and the following truncated algorithm
is considered in lieu of (1.5.1). Then the sequence is interpolated to a piecewise-constant function. It is shown that the family is tight and weakly convergent as. The limit
satisfies
Finally, by showing lim sup lim sup for some, for each, it is proved that itself is tight and weakly converges to, satisfying (1.5.3).
1.6. Notes and References
The stochastic approximation algorithm was first proposed by Robbins and Monro in [82], where the mean square convergence of the algorithm was established under the independence assumption on the observation noise. Later, the noise was extended from independent sequences to martingale difference sequences (e.g. [7, 40, 53]).
The probabilistic approach to convergence analysis is well summarized in [78].
The ODE approach was proposed in [65, 72], and then it was widely used [4, 85]. For a detailed presentation of the ODE method we refer to [65, 68].
The proof of the Arzelà-Ascoli theorem can be found in ([37], p. 266).
Section 1.4 is an introduction to the method described in detail in the coming chapters. For stability and Lyapunov functions we refer to [69].
The weak convergence method was developed by Kushner [64, 68]. The Skorohod topology and Prohorov's theorem can be found in [6, 41].
For probability concepts briefly presented in Appendix A, we refer to [30, 32, 70, 76, 84]. But the proof of the convergence theorem for martingale difference sequences, which is frequently used throughout the book, is given in Appendix B.
Chapter 2

STOCHASTIC APPROXIMATION ALGORITHMS WITH EXPANDING TRUNCATIONS
In Chapter 1 the RM algorithm, the basic algorithm used in stochastic approximation (SA), was introduced, and four different methods for analyzing its convergence were presented. However, the conditions imposed for convergence are rather strong.
Comparing the theorems derived by various methods in Chapter 1, we find that the TS method introduced in Section 1.4 requires the weakest condition on noise. The trouble is that the sought-for root has to be inside the truncation region. This motivates us to consider SA algorithms with expanding truncations, with the purpose that the truncation region will finally cover the sought-for root, whose location is unknown. This is described in Section 2.1.
General convergence theorems for the SA algorithm with expanding truncations are given in Section 2.2. The key point of the proof is to show that the number of truncations is finite. If this is done, then the estimate sequence is bounded and the algorithm turns into the conventional RM algorithm in a finite number of steps. This is realized by using the TS method. It is worth noting that the fundamental convergence theorems given in this section are established by a completely elementary method, which is deterministic and requires no more than a knowledge of calculus. In Section 2.3 state-independent conditions on noise are given to guarantee convergence of the algorithm when the noise itself is state-dependent. In Section 2.4 conditions on noise are discussed; it appears that the noise condition in the general convergence theorems is, in a certain sense, necessary. In Section 2.5 the convergence theorem is given for the case where the observation noise is non-additive.
In the multi-root case, up to Section 2.6 we have only established that the distance between the estimate and the root set tends to zero. But by no means does this imply convergence of the estimate itself. This is briefly discussed in Section 2.4, and is considered in Section 2.6 in connection with properties of the equilibria, where conditions are given to guarantee trajectory convergence. It is also considered whether the limit of the estimate is a stable or an unstable equilibrium. In Section 2.7 it is shown that a small distortion of conditions may cause only a small estimation error in the limit, while Section 2.8 considers the case where the sought-for root is moving during the estimation process. Convergence theorems there are derived with the help of the general convergence theorem given in Section 2.2. Notes and references are given in the last section.

2.1. Motivation
In Chapter 1 we have presented four types of convergence theorems using different analysis methods for SA algorithms. However, none of these theorems is completely satisfactory in applications. Theorem 1.2.1 is proved by using the classical probabilistic method, which requires restrictive conditions on the noise. As mentioned before, the noise may contain a component caused by the structural inaccuracy of the function, and it is hard to assume this kind of noise to be mutually independent or to be a martingale difference sequence, etc. The growth rate restriction imposed on the function not only is severe, but is also unavoidable in a certain sense. To see this, let us consider the following example.
It is clear that conditions A1.2.1, A1.2.2, and A1.2.3 are satisfied. The only condition that is not satisfied is (1.2.4), since the left-hand side grows faster than the second order polynomial on the right-hand side of (1.2.4). Simple calculation shows that the sequence given by the RM algorithm rapidly diverges.
From this one might conclude that the growth rate restriction would be necessary. However, if we take the initial value close enough to the root, then the sequence given by the RM algorithm converges. Reducing the initial value is, in a certain sense, equivalent to using the step sizes not from the beginning of the sequence but from some later index on. The difficulty consists in that we do not know from which index we should start the algorithm. This is one of the motivations to use expanding truncations, to be introduced later.
Theorem 1.3.1, proved in Section 1.3, demonstrates the ODE method. By this approach, the condition imposed on the noise has been significantly weakened, and it covers a class of noises much larger than that treated by the probabilistic method. However, it a priori requires {x_k} to be bounded. This is the case if the sequence converges, but before establishing its convergence this is an artificial condition, which is not satisfied even for the simple example given above. Further, although the noise condition (1.3.3) is much more general than that used in Theorem 1.2.1, it is still difficult to verify for state-dependent noise. Consider, for example, noise driven by a martingale difference sequence with bounded conditional second moments. If {x_k} is bounded, then the weighted noise series converges a.s. and (1.3.3) holds. However, in general, it is difficult to directly verify (1.3.3), because the behavior of {x_k} is unknown. This is why we use Condition (1.4.2), which needs to be verified only along convergent subsequences. With convergent subsequences the noise is easier to deal with.
Considering convergent subsequences, path-wise convergence is proved for a truncated RM algorithm by the TS method in Theorem 1.4.1. The weakness of algorithms with fixed truncation bounds is that the sought-for root has to be located in the truncation region. But, in general, this cannot be ensured. This is another motivation to consider algorithms with expanding truncations.
The weak convergence method explained in Section 1.5 can avoid the boundedness assumption on {x_k}, but it can ensure convergence in distribution only, while in practical computation one always deals with a sample path. Hence, people in applications are mainly interested in path-wise convergence.
The SA algorithm with expanding truncations was introduced in order to remove the growth rate restriction on the function. It has been developed in two directions: weakening the conditions imposed on noise, and improving the analysis method. By the TS method we can show that the SA algorithm with expanding truncations converges under a truly weak condition on noise, which, in fact, is also necessary for a wide class of noises.
In Chapter 1, the root x⁰ of f(·) was a singleton. From now on we will consider the general case. Let J be the root set of f(·), i.e., J = {x : f(x) = 0}.
We now define the algorithm. Let {M_k} be a sequence of positive numbers increasingly diverging to infinity, and let x* be a fixed point.
Fix an arbitrary initial value and denote by x_k the estimate at time k, serving as the approximation to J. The estimates are defined by the following recursion:

where 1{·} is an indicator function: it equals 1 if the inequality indicated in the brackets is fulfilled, and equals 0 if the inequality does not hold.
We explain the algorithm. The truncation counter gives the number of truncations up to time k, and the corresponding bound serves as the truncation bound when the next estimate is generated. From (2.1.1) it is seen that if the estimate calculated by the RM algorithm remains in the truncation region, then the algorithm evolves as the RM algorithm. If it exits the sphere with the current radius, then the estimate is pulled back to the pre-specified point x* and the truncation bound is enlarged to the next one.
Consequently, if it can be shown that the number of truncations is finite or, equivalently, that {x_k} generated by (2.1.1) and (2.1.2) is bounded, then the algorithm (2.1.1) and (2.1.2) turns into one without truncations, i.e., into the RM algorithm, after a finite number of steps. This actually is the key step when we prove convergence of (2.1.1) and (2.1.2).
The convergence analysis of (2.1.1) and (2.1.2) will be given in the next section, and the analysis is carried out in a deterministic way at a fixed sample path, without involving any interpolating function.
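The recursion just described can be sketched as follows. Everything concrete here is an illustrative assumption: the bounds M_j = j + 1, the reset point x* = 5, the function f(x) = −x³ − x (root 0, growing faster than linearly), and i.i.d. N(0,1) observation noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x):
    return -x ** 3 - x    # unique root 0; grows faster than linearly

x, sigma, x_star = 5.0, 0, 5.0     # sigma counts truncations so far
for k in range(1, 50001):
    y = f(x) + rng.normal(0.0, 1.0)          # noisy observation
    cand = x + y / k                         # ordinary RM update, a_k = 1/k
    if abs(cand) <= sigma + 1.0:             # inside the current ball
        x = cand
    else:                                    # pull back, enlarge the bound
        x, sigma = x_star, sigma + 1
print(x, sigma)   # estimate near the root 0, after finitely many truncations
```

After an initial burst of truncations the bound index stops growing, and from then on the recursion is an ordinary RM iteration, which is exactly the behavior the convergence proofs below exploit.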
2.2. General Convergence Theorems by TS Method
In this section, by the TS method, we establish convergence of the RM algorithm with expanding truncations defined by (2.1.1)–(2.1.3) under general conditions. Let us first list the conditions to be used.
A2.2.2 There is a continuously differentiable function (not necessarily nonnegative) such that

for any, and v(J) is nowhere dense, where J is the zero set of f(·), i.e.,

J = {x : f(x) = 0},

and v_x denotes the gradient of v(·). Further, x* used in (2.1.1) is such that for some and
For introducing the condition on noise, let us denote by (Ω, F, P) the underlying probability space, and let the noise be a measurable function defined on the product space. Fixing an ω means that a sample path is under consideration. Let the noise be given by

Thus, state-dependent noise is considered, and for fixed x the noise may be random.

A2.2.3 For the sample path under consideration and for any sufficiently large integer,

for any such that converges, where is given by (1.3.2) and denotes the estimate given by (2.1.1)–(2.1.3) valued at the sample path ω.

In the sequel, the algorithm (2.1.1)–(2.1.3) is considered for the fixed ω for which A2.2.3 holds, and ω will often be suppressed if no confusion is caused.

A2.2.4 f(·) is measurable and locally bounded.
Remark 2.2.1 Comparing A2.2.1–A2.2.4 with A1.4.1–A1.4.4, we find that if the root set J degenerates to a singleton, then the only essential difference is that an indicator function is included in (2.2.2), while (1.4.2) stands without it. It is clear that if {x_k} is bounded, then this makes no difference. However, before establishing the boundedness of {x_k}, condition (2.2.2) is easier to verify. The key point here
is that, in contrast to Section 1.4, we do not assume availability of an upper bound for the roots of f(·).
Remark 2.2.2 It is worth noting that converges. To see this it suffices to take in (2.2.2).
Theorem 2.2.1 Let be given by (2.1.1)–(2.1.3) for a given initial value. Assume A2.2.1–A2.2.4 hold. Then, for the sample path for which A2.2.3 holds.
Proof. The proof is completed in six steps by considering convergent subsequences at the sample path. This is why we call the analysis method used here the TS method.
Step 1. We show that there are constants such that
for any there exists such that for any

if is a convergent subsequence of, where M is independent of and. Since, we need only to prove (2.2.3) for.
If the number of truncations in (2.1.1)–(2.1.3) is finite, then there is an N such that, i.e., there is no more truncation afterwards. Hence, whenever. In this case, we may take in (2.2.3).
We now prove (2.2.3) for the case where as. Assume the converse, that (2.2.3) is not true. Take. There is such that
Take a sequence of positive real numbers and as
Since (2.2.3) is not true, for there are and such that
and for any there are and such that
Without loss of generality we may assume
Then for any from (2.2.4) and (2.2.6) it follows
that
Since, there is such that. Then from (2.2.7) it follows that
For any fixed if is large enough, then andand by (2.2.10)
Since from (2.2.11) it follows
that
and by (2.2.4), (2.2.7), and (2.2.8)
and hence
by A2.2.4, where is a constant.
Let, where is specified in A2.2.3. Then from A2.2.3,
for any
Taking and respectively in (2.2.10)
and noticing from (2.2.9), we then have
and hence
From (2.2.8), it follows that
where the second term on the right-hand side of the inequality tends to zero by (2.2.12) and (2.2.13), while the first term tends to zero because
Noticing that by
(2.2.9) and (2.2.13), we then by (2.2.14) have
On the other hand, by (2.2.6) we have
The obtained contradiction proves (2.2.3).
Step 2. We now show that, for all large enough,

if T is small enough, where is a constant.
If the number of truncations in (2.1.1)–(2.1.3) is finite, then is bounded and hence is also bounded.
Then for large enough there is no truncation, and by (2.2.2) for
if T is small enough. In (2.2.16), for the last inequality the boundednessof is invoked, and is a constant.
Thus, it suffices to prove (2.2.15) for the case where
From (2.2.3) it follows that for any
if is large enough. This implies that for
where is a constant. The last inequality of (2.2.18) yields
With in A2.2.3, from (2.2.2) we have
for large enough and small enough T.
Combining (2.2.18), (2.2.19), and (2.2.20) leads to
for all large enough. This together with (2.2.16) verifies (2.2.15).
Step 3. We now show the following assertion: for any interval with and, the sequence cannot cross infinitely many times with
andAssume the converse: there are infinitely many crossings
and is bounded.
By boundedness of, without loss of generality, we may assume
By setting in (2.2.15), we have
But by definition so we have
From (2.2.15) we see that if is taken sufficiently small, then

for sufficiently large.
By (2.2.18) and (2.2.15), for large we then have
where denotes the gradient of and asFor condition (2.2.2) implies that
By (2.2.15) and (2.2.18) it follows that
bounded, where by “crossing by we mean that
Then, by (2.2.23) and (2.2.1) from (2.2.24)–(2.2.26) it follows that thereare and such that
for all sufficiently large. Noticing (2.2.22), from (2.2.27) we derive
However, by (2.2.15) we have
which implies that, for small enough,. This means that, which contradicts (2.2.28).
Step 4. We now show that the number of truncations is bounded.
By A2.2.2, v(J) is nowhere dense, and hence a nonempty interval exists such that and
If, then starting from, the estimates will cross the sphere infinitely many times. Consequently, the corresponding values will cross infinitely often while remaining bounded. In Step 3 we have shown this process is impossible. Therefore, starting from some step, the algorithm (2.1.1)–(2.1.3) will have no truncations, and {x_k} is bounded.
This means that the algorithm defined by (2.1.1)–(2.1.3) turns into the conventional RM algorithm from some step on, and a condition stronger than (2.2.2) is satisfied:
for any such that converges.
Step 5. We now show that the sequence converges. Let
We have to show. If and one of and does not belong to, then exists such that and. By Step 3 this is impossible. So, both and belong to, and
If we can show that is dense in, then from (2.2.30) it will follow that is dense in, which contradicts the assumption that is nowhere dense. This will prove, i.e., the convergence of
To show that is dense in, it suffices to show that. Assume the converse: there is a subsequence
Without loss of generality, we may assume converges. Otherwise,a convergent subsequence can be extracted, which is possible because
is bounded. However, if we take in (2.2.15), we have
which contradicts (2.2.31). Thus, and converges.
Step 6. For proving, it suffices to show that all limit points of belong to J.
Assume the converse: By (2.2.15) we
have
for all large if is small enough. By (2.2.1) it follows that
and from (2.2.24)
for small enough. This leads to a contradiction, because converges and the left-hand side of (2.2.32) tends to zero as. Thus, we conclude
Remark 2.2.3 In (2.1.1)–(2.1.3) spheres with expanding radii are used for truncations. Obviously, the spheres can be replaced by other expanding sets. At first glance the point x* in (2.1.1) may be arbitrarily chosen, but actually a restriction is imposed: there must exist such that. The condition is obviously satisfied if as, because then the availability of is not required.
Remark 2.2.4 In the proof of Theorem 2.2.1 it can be seen that the conclusion remains valid if the requirement in A2.2.2 that "J is the zero set of f(·)" is removed. As a matter of fact, J may be bigger than the zero set of f(·). Of course, it should at least contain the zero set of f(·) in order for (2.2.1) to be satisfied. It should also be noted that for, we need not require to be nowhere dense.
Let us modify A2.2.2 as follows.
A2.2.2’ There is a continuously differentiable function
such that
for any and is nowhere dense. Further, used in
(2.1.1) is such that for some and
A2.2.2” There is a continuously differentiable function
such that
for any and J is closed. Further, used in (2.1.1) is such
that for some and
Notice that, in A2.2.2' and A2.2.2'' the set J is not specified, but it certainly contains the root sets of both and. We may modify Theorem 2.2.1 as follows.
Theorem 2.2.1' Let be given by (2.1.1)–(2.1.3) for a given initial value. Assume A2.2.1, A2.2.2', A2.2.3, and A2.2.4 hold. Then, for the sample path for which A2.2.3 holds.
Proof. The proof of Theorem 2.2.1 applies without any change.
Theorem 2.2.1'' Let be given by (2.1.1)–(2.1.3) for a given initial value. If A2.2.1, A2.2.2'', A2.2.3, and A2.2.4 hold, then, for the sample path for which A2.2.3 holds.
Proof. We still have Steps 1–3 in the proof of Theorem 2.2.1. Let
If or, or both, do not belong to J, then exists such that, since J is closed. Then would cross infinitely many times. But, by Step 3 of the proof of Theorem 2.2.1, this is impossible. Therefore both and belong to
Theorems 2.2.1 and 2.2.1' only guarantee that the distance between the estimate and the set J tends to zero. As a matter of fact, we have a more precise result.
Theorem 2.2.2 Assume conditions of Theorem 2.2.1 or Theorem 2.2.1’
hold. Then for fixed and for which A2.2.3 holds, a connected subset
exists such that
where denotes the closure of and is generated by (2.1.1)–
(2.1.3).
Proof. Denote by the set of limit points of Assume the
converse: i.e., is disconnected. In other words, closed sets andexist such that and
Define
Since a exists such that
where denotes the neighborhood of the set A.
Define
It is clear that and
Since by we have
By boundedness of, we may assume that converges. Then, by taking in (2.2.15), we derive
which contradicts (2.2.33) and proves the theorem.
Corollary 2.2.1 If J is not dense in any connected set, then under the conditions of Theorem 2.2.1, the estimate given by (2.1.1)–(2.1.3) converges to a point in J. This is because in the present case any connected set in J consists of a single point.
Example 2.2.1 Reconsider the example given in Section 2.1:
It was shown that the RM algorithm rapidly diverges to even in thenoise-free case.
We now assume the observations are noise-corrupted:
where is an ARMA process driven by the independent identicallydistributed normal random variables
where
We use the algorithm (2.1.1)–(2.1.3) with The
computation shows
which tend to the sought-for root 10.
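The expanding-truncations scheme of (2.1.1)–(2.1.3) can be sketched numerically. The regression function, the ARMA coefficients, and the step sizes below are illustrative assumptions (the example's actual data are not reproduced in this scan); only the truncation mechanism follows the text: take an ordinary RM step, but whenever the candidate iterate leaves the current bound, reset to a fixed point x* and enlarge the bound.

```python
import numpy as np

def f(x):
    # Hypothetical regression function with unique root 10; the book's
    # Section 2.1 example is not reproduced in this scan.
    return -(x - 10.0) ** 3

def arma_noise(n, phi=0.5, theta=0.3, seed=0):
    # ARMA(1,1) noise driven by i.i.d. N(0,1) variables
    # (illustrative coefficients).
    xi = np.random.default_rng(seed).standard_normal(n)
    eps = np.zeros(n)
    for k in range(1, n):
        eps[k] = phi * eps[k - 1] + xi[k] + theta * xi[k - 1]
    return eps

def rm_expanding_truncations(n_steps=5000, x_star=0.0):
    """SA with expanding truncations, in the spirit of (2.1.1)-(2.1.3)."""
    eps = arma_noise(n_steps)
    x, sigma = x_star, 0                  # sigma counts truncations so far
    for k in range(n_steps):
        a_k = 1.0 / (k + 1)               # a_k -> 0, sum a_k = infinity
        M = 2.0 ** sigma                  # truncation bounds M_s -> infinity
        cand = x + a_k * (f(x) + eps[k])  # tentative RM step, noisy observation
        if abs(cand) <= M:
            x = cand                      # ordinary RM step
        else:
            x, sigma = x_star, sigma + 1  # truncate: reset to x*, enlarge bound
    return x

print(round(rm_expanding_truncations(), 1))
```

With these (assumed) choices the iterates are truncated a number of times early on; the bound then stabilizes and the estimate settles near the root 10, matching the behavior reported above.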
Example 2.2.2 Let Then
Clearly, A2.2.1 and A2.2.4 hold. Concerning A2.2.2, we may taketo serve as Since
(2.2.1) is satisfied. The existence of required in A2.2.2 is obvious, forexample,
Finally, is nowhere dense. So A2.2.2 also holds.
Now assume the noise is such that
Then A2.2.3 is satisfied too.
By Corollary 2.2.1, given by (2.1.1)–(2.1.3) converges to a point
If for the conventional (untruncated) RM algorithm
it is a priori known that is bounded, then we have the following theorem.
Theorem 2.2.3 Assume A2.2.1–A2.2.4 hold, but with the requirement in A2.2.2 (“Further, used in (2.1.1) is such that for some and”) removed. If produced by (2.2.34) is bounded, then for the sample path for which A2.2.3
holds, where is a connected subset of
Proof. As a matter of fact, by boundedness of (2.2.3) and (2.2.15) become obvious. Steps 3, 5, and 6 in the proof of Theorem 2.2.1 remain unchanged, while Step 4 is no longer needed. Then the conclusion follows from Theorems 2.2.1 and 2.2.2.
Remark 2.2.5 All theorems concerning SA algorithms with expanding truncations remain valid for produced by (2.2.34), if given by
(2.2.34) is known to be bounded.
Theorems 2.2.1 and 2.2.2 concern the time-invariant function but the results can easily be extended to time-varying functions, i.e., to the case where the measurements are carried out for
where depends on time
Conditions A2.2.2 and A2.2.4 are respectively replaced by the following conditions:
A2.2.2° There is a continuously differentiable function such that
is so weak that it is in fact necessary, as will be shown later. However, condition A2.2.3 is state-dependent in the sense that the condition itself depends on the behavior of This makes it not always possible to verify the condition beforehand. We now aim to give convergence theorems under conditions that do not involve the state. For this we have to reformulate Theorems 2.2.1 and 2.2.2.
As defined in Section 2.2, where is a measurable function. In lieu of A2.2.3 we introduce the following condition.
A2.3.1 For any sufficiently large integer there is an
with such that for any
for any such that converges.
Theorem 2.3.1 Assume A2.2.1, A2.2.2, A2.2.4, and A2.3.1 hold. Then
a.s. for generated by (2.1.1)–(2.1.3) with a
given initial value where is a connected subset contained in theclosure of J.
Proof. Let It is clear that
i.e., Then for any
A2.2.3 is fulfilled with possibly depending on and the conclusion of the theorem follows from Theorems 2.2.1 and 2.2.2.
We now introduce a state-independent condition on the noise.
A2.3.2 For any is a martingale difference sequence and for some
where is a family of nondecreasing σ-algebras independent of
We first give an example satisfying A2.3.2. Let be an -dimensional martingale difference sequence with
for some and let
be a measurable and locally bounded function. Then satisfies A2.3.2, because
and
by assumption.
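The construction just described can be sketched as follows; the specific g and the i.i.d. driving sequence are illustrative assumptions. For each fixed x the noise g(x)·ξ is a martingale difference sequence, and with step sizes satisfying the square-summability used below, the weighted series converges a.s. by the martingale convergence theorem, which is what Theorem 2.3.2 exploits.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    # Illustrative locally bounded measurable function (an assumption).
    return 1.0 + abs(x)

n = 200_000
xi = rng.standard_normal(n)    # i.i.d. zero-mean, hence a martingale difference sequence
a = 1.0 / np.arange(1, n + 1)  # step sizes with sum a_k^2 < infinity

# For each fixed x, the noise g(x) * xi_k satisfies A2.3.2, and the series
# sum_k a_k * g(x) * xi_k converges a.s. (martingale convergence theorem);
# the partial sums should therefore settle down.
for x in (0.0, 3.0):
    s = np.cumsum(a * g(x) * xi)
    tail_spread = s[n // 2:].max() - s[n // 2:].min()
    print(x, round(float(tail_spread), 4))
```

The printed tail spreads are tiny, numerically illustrating the a.s. convergence of the weighted noise series for each fixed x.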
Theorem 2.3.2 Let be given by (2.1.1)–(2.1.3) for a given initial value. Assume A2.2.1, A2.2.2, A2.2.4, and A2.3.2 hold and
for given in A2.3.2. Then a.s., where is a
connected subset contained in
Proof. Since is measurable and is it follows that is adapted. Approximating
by simple functions, it is seen that
Hence, is a martingale difference sequence, and
a.s.
By the convergence theorem for martingale difference sequences, theseries
converges a.s., which implies that with exists such thatfor each
converges to zero as uniformly in
This means that A2.3.1 holds, and the conclusion of the theorem
follows from Theorem 2.3.1.
In applications it may happen that is not directly observed. Instead, the time-varying functions are observed, and the observations may be done not at but at i.e., at with bias
Theorem 2.3.3 Let be given by (2.1.1)–(2.1.3) for a given initial value. Assume that A2.2.1, A2.2.2, A2.2.4, and A2.3.2 hold and
for p given in A2.3.2. Further, assume is an
adapted sequence, is bounded by a constant, and for any sufficiently
large integer there exists with such that for any
for any such that converges. Then, a.s., where is a connected subset contained in
Proof. By assumption where is a constant. Then
and again by the convergence theorem for martingale difference sequences,
the series
converges a.s. Consequently, there exists with such that
for any the convergence indicated in (2.3.5) holds and for any
integer
tends to zero as uniformly in
Therefore, A2.3.1 is fulfilled and the conclusion of the theorem follows
from Theorem 2.3.1.
Remark 2.3.1 The obvious sufficient condition for (2.3.5) is
which in turn is satisfied, if is continuous and
Remark 2.3.2 Theorems 2.3.2 and 2.3.3, with A2.2.2 and A2.2.4 replaced by A2.2.2° and A2.2.4’, respectively, remain valid if is replaced by the time-varying
2.4. Necessity of Noise Condition
Under Conditions A2.2.1–A2.2.4 we have established convergence theorems for recursively given by (2.1.1)–(2.1.3). Condition A2.2.1 is a commonly accepted requirement for decreasing step sizes, while A2.2.2 is a stability condition. This kind of condition is unavoidable for convergence of SA-type algorithms, although it may appear in different forms. Concerning A2.2.4 on it is the weakest possible: neither continuity nor a growth rate condition on is required. So, it is natural to ask: is it possible to further weaken Condition A2.2.3 on the noise? We now answer this question.
Theorem 2.4.1 Assume only has one root , i.e., and
is continuous at Further, assume A2.2.1 and A2.2.2 hold. Then given by (2.1.1)–(2.1.3) converges to on those sample paths for
which one of the following conditions holds:
i)
ii) can be decomposed into two parts such that
and
Conversely, if then both i) and ii) are satisfied.
Proof. Sufficiency. It is clear that ii) implies i), which in turn implies
A2.2.3. Consequently, sufficiency follows from Theorem 2.2.1.
Necessity. Assume Then is bounded and (2.1.1)–(2.1.3) becomes the RM algorithm after a finite number of steps (for . Therefore,
where
Since and is continuous, Condition ii) is satisfied. And,
Condition i) being a consequence of ii) also holds.
Remark 2.4.1 In the case where and is continuous at
, under conditions A2.2.1, A2.2.2, and A2.2.3, by Theorem 2.2.1 we arrive at Then by Theorem 2.4.1 we derive (2.4.1), which is stronger than A2.2.3. One may ask: why can the weaker condition A2.2.3 imply the stronger condition (2.4.1)? Are they equivalent? The answer
is both “yes” and “no”: yes, these conditions are equivalent, but only under the additional conditions A2.2.1, A2.2.2, and continuity of at being the unique root of However, by themselves these conditions are not equivalent, because A2.2.3 is indeed weaker than (2.4.1).
We now consider the multi-root case. Instead of the singleton wenow have a root set J . Accordingly, continuity of at is replacedby the following condition
In order to derive the necessary condition on noise, we consider the
linear interpolating function
where From form a family of functions, where
where is a constant.
For any subsequence define
where appearing on the right-hand side of (2.4.3) denotes the dependence of the limit function on the subsequence, and the limsup of a vector sequence is taken component-wise. In general, may be discontinuous. However, if then
which is not only continuous but also differentiable.
Thus, (2.4.2) for the multi-root case corresponds to the continuity of
at for the single root case, while and a certain
analytic property of correspond to
Theorem 2.4.2 Assume (2.4.2), A2.2.1, A2.2.2, and A2.2.4 hold. Then
given by (2.1.1)–(2.1.3) is bounded, and the
right derivative for any convergent subsequence
Necessity. We now assume is bounded, and
for any convergent subsequence and want to show A2.2.3. Let For any from (2.4.5) we have
From (2.4.6) it is seen that
where as
The assumption means that
where and
Noticing the continuity of from (2.4.10) and (2.4.11) it follows
that
which incorporating with yields (2.4.9). Thus, we have
for any such that converges.
By the boundedness of (2.4.12) is equivalent to (2.2.2), and the
proof is completed.
Corollary 2.4.1 Assume (2.4.2), A2.2.1, A2.2.2, and A2.2.4 hold, and assume J is not dense in any connected set. Then given by (2.1.1)–
(2.1.3) converges to some point in J if and only if A2.2.3 holds.
This corollary is a direct generalization of Theorem 2.4.1. The sufficiency part follows from Corollary 2.2.1, while the necessity part follows from Theorem 2.4.2 if we notice that convergence of implies
for sufficiently large
The first term on the right-hand side of (2.4.8) tends to zero asby (2.4.2) and So, to verify A2.2.3 it suffices to
show that
2.5. Non-Additive Noise
In the algorithm (2.1.1)–(2.1.3) the noise in the observation is additive. In this section we continue considering (2.1.1)–(2.1.2), but in lieu of (2.1.3) we now have the non-additive noise
where is the observation noise at time
The problem is: under which conditions does the algorithm defined by (2.1.1), (2.1.2), and (2.5.1) converge to J, the root set of which is the average of with respect to its second argument? To be precise, let be an measurable function and let be a
distribution function in The function is defined by
It is clear that the observation given by (2.5.1) can formally be expressed as one with additive noise:
and Theorems 2.2.1 and 2.2.2 can still be applied. The basic problem is how to verify A2.2.3. In other words, under which conditions on and does given by (2.5.3) satisfy A2.2.3?
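The reduction (2.5.3) can be illustrated with a toy example; the function g and the distribution of the driving noise below are assumptions made for illustration, not the book's data. The observation g(x_k, ξ_{k+1}) is treated as f(x_k) plus the zero-mean disturbance g(x_k, ξ_{k+1}) − f(x_k), and a plain RM iteration then finds the root of the averaged function f.

```python
import numpy as np

rng = np.random.default_rng(2)

def g(x, y):
    # Illustrative g (an assumption): with y ~ N(0, 1), averaging over the
    # second argument gives f(x) = E[g(x, y)] = 5 - x, whose root is 5.
    return (5.0 - x) * (1.0 + 0.5 * y)

x = 0.0
for k in range(20_000):
    a_k = 1.0 / (k + 1)
    obs = g(x, rng.standard_normal())  # non-additive noise: O_{k+1} = g(x_k, xi_{k+1})
    x += a_k * obs                     # formally an RM step driven by f(x_k) plus the
                                       # zero-mean disturbance g(x_k, xi_{k+1}) - f(x_k)

print(round(x, 1))
```

Because the disturbance is conditionally zero-mean given the past, the iterates converge to the root of the averaged function even though the noise enters multiplicatively rather than additively.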
Before describing the conditions to be used, we first introduce some notation. We always take the regular version of the conditional probability. This makes the conditional distributions introduced later well-defined.
Let be the distribution function of and be theconditional distribution of given where
Further, let us introduce the following coefficients,
where denotes the Borel σ-algebra in and, for a random variable, where runs over all sets with probability zero.
is known as the mixing coefficient of and it measures the
dependence between and It is clear thatmeasures the closeness of the distribution of to
The following conditions will be needed.
A2.5.2 (=A2.2.2);
A2.5.3 is a measurable function and is locally Lipschitz-continuous in the first argument, i.e., for any fixed
where is a constant depending on
A2.5.4 (Noise Condition)
i) is a process with mixing coefficient as uniformly in
ii)
where is defined in (2.5.6);iii) as
Theorem 2.5.1 Assume A2.5.1–A2.5.4. Then for generated by
(2.1.1), (2.1.2), and (2.5.1)
where is a connected subset of
The proof consists in verifying that Condition A2.2.3 is satisfied a.s. by given in (2.5.3). Then the theorem follows from Theorems 2.2.1 and
2.2.2.
We first prove some lemmas.
Lemma 2.5.1 Assume A2.5.1, A2.5.3, and A2.5.4 hold. Then there
is an with such that for any and any bounded
subsequence of say,
A2.5.1
as
(without loss of generality assume there exists an integer
such that for all
if T is small enough, where is given by (2.1.1), (2.1.2), and (2.5.1),and is given by (1.3.2).
Proof. For any set
By setting in (2.5.6), it is clear that
From (2.5.7), it follows that
and
where (and hereafter) L is taken large enough so that
Since is a convergent martingale, there is a a.s.
such that
From (2.5.13) and it is clear that for any integer L the
series of martingale differences
converges a.s. Denote by the set where the above series converges, and set
It is clear that
Let be fixed and with and
Then for any integer by (2.5.13) we have
where the first term on the right-hand side tends to zero as by(2.5.15).
Assume is sufficiently large such that
i) for if as or
ii) if
We note that in case ii) there will be no truncation in (2.1.1) for
Assume and fix a small enough T such that Let be arbitrarily fixed.
We prove (2.5.9) by induction. It is clear that (2.5.9) is true for
Assume (2.5.9) is true for and there is no truncation for if Noticing we have, by (2.5.16)
if is large enough.
This means that at time there is no truncation in (2.1.1), and
Lemma 2.5.2 Assume A2.5.1, A2.5.3, and A2.5.4 hold. There is an with such that if and if as
is a bounded subsequence of produced by (2.1.1), (2.1.2),
and (2.5.1), then
Proof. Write
where
By (2.5.13), for we have
which converges to a finite limit as by the martingale convergence theorem.
Therefore, for any integers L and
converges a.s.
Therefore, there is with such that (2.5.23) holds for any integers L and
Let be fixed, By Lemma 2.5.1, for small
Then
for any by (2.5.23).
We now estimate (II). By Lemma 2.5.1 we have the following,
Noticing (2.5.7) and (2.5.14), we then have
Similarly, by Lemma 2.5.1 and (2.5.7)
Combining (2.5.18), (2.5.24), and (2.5.26) leads to
Therefore, to prove the lemma it suffices to show that the right-hand side of (2.5.27) is zero.
Applying the Jordan-Hahn decomposition to the signed measure,
and noticing that is a process with mixing coefficient we know that there is a Borel set D in such that for any
Borel set A in
and
Then, we have the following,
where
For any given there is a j such that
For any fixed by (2.5.13), (2.5.14), and it follows that
Therefore,
Since may be arbitrarily small, this, combined with (2.5.27),
proves the lemma.
Proof of Theorem 2.5.1.
To prove the theorem it suffices to show that A2.2.3 is satisfied by a.s. By Lemma 2.5.2, we need only prove
that
for is a bounded subsequence, and as
Assume
Applying the Jordan-Hahn decomposition to the signed measure,
we conclude that
where for the last inequality (2.5.8) and (2.5.12) are invoked. Since as the right-hand side of (2.5.32) tends to zero as for any This proves (2.5.31) and completes the proof
of Theorem 2.5.1.
Remark 2.5.1 From the expression (2.5.3) for the observation it is seen that the observation with non-additive noise can be reduced to one with additive but state-dependent noise, which was considered in Section 2.3. However, Theorem 2.5.1 is not covered by the theorems in Section 2.3, and vice versa.
2.6. Connection Between Trajectory Convergence and Property of Limit Points
In the multi-root case, what we have established so far is that the distance between given by (2.1.1)–(2.1.3) and a connected subset of converges to zero under various sets of conditions.
As pointed out in Corollary 2.2.1, if J is not dense in any connected set, then converges to a point belonging to However, it is still not clear how behaves when J is dense in some connected set. The following example shows that may still fail to converge, although
Example 2.6.1 Let
and let
Take step sizes as follows
We apply the RM algorithm (2.2.34) with
As we may take
Then, all conditions A2.2.1–A2.2.4 are satisfied.
Notice that
and
where k is such that
By (2.6.1), it is clear that in (2.6.2)
and
Therefore, is bounded and by Theorem 2.2.4.
As a matter of fact, changes from one to zero and then from zero
to one, and this process repeats forever with decreasing step sizes.
Thus, is dense in [0,1]. This phenomenon hints that for trajectory convergence of the stability-like condition A2.2.2 is not enough; a stronger form of stability is needed.
Definition 2.6.1
A point i.e., a root of is called dominantly stable for if there exist a and a positive measurable function
which is bounded in the interval and
satisfies the following condition
for all the ball centered at with radius
Remark 2.6.1 The dominant stability implies stability. To see this, it
suffices to take as the Lyapunov function. Then
The dominant stability of however, is not necessary for asymptoticstability.
Remark 2.6.2 Equality (2.6.3) holds for any whatever is.Therefore, all interior points of J are dominantly stable for Further,
for a boundary point of J to be dominantly stable for it suffices to verify (2.6.3) for with small i.e., all that are close to and outside J.
Example 2.6.2 Let
In fact, is the gradient of
In this example We now show that all points of J
are dominantly stable for For this, by Remark 2.6.2, it suffices toshow that all with are dominantly stable for and for this,it in turn suffices to show (2.6.3) for any with and
for small enough Denoting by the angle between vectorsand we have for
It is clear that
for all small enough Therefore, all points in J are dominantly stable for
Theorem 2.6.1 Assume A2.2.1, A2.2.2, and A2.2.4 hold. If for a
given is convergent and a limit point of generated
by (2.1.1)–(2.1.3) is dominantly stable for then for this trajectory
Proof. For any define
where is the one indicated in Definition 2.6.1.
It is clear that is well-defined, because there is a convergent subsequence: and for any greater than some If for any for some then by arbitrariness of
Therefore, for proving the theorem, it suffices to show that, for anysmall an exists such that implies if
Since implies A2.2.3, all conditions of Theorem 2.2.1
are satisfied. By the boundedness of we may assume that is large enough so that the truncations no longer occur in (2.1.1)–(2.1.3) for It then follows that
Notice that for any and is bounded, and hence by (2.6.3)
for some because is convergent and
Further,
An argument similar to that used for (2.6.5) leads to
if is large enough. Then from (2.6.6) we have
From (2.6.4) and (2.6.7) we see that we can inductively obtain
Then, noticing by definitions of we have
where the elementary inequality
is used with for the first inequality in (2.6.8), and with
for the third inequality in (2.6.8). Because is bounded,
and an exists such
that
This means that and completes the proof.
For convergence of SA algorithms we have imposed the stability-like
condition A2.2.2 for and the dominant stability condition (2.6.3) for trajectory convergence. It is natural to ask: does a limit point of the trajectory possess a certain stability property? The following example gives a negative answer.
Example 2.6.3 Let
It is straightforward to check that
satisfies A2.2.2. Take
where is a sequence of mutually independent random
variables such that a.s. Then with 1 being
a stable attractor for and all A2.2.1–A2.2.4 are satisfied. TakeThen by Theorem 2.2.1 it follows that
a.s. Since must converge to 0 a.s. Zero, however, is unstable for
In this example converges to a limit, which is independent of initial values and unstable, although conditions A2.2.1–A2.2.4 hold. This
strange phenomenon happens because
as a function of is singular for some in the sense that it
restricts the algorithm to evolve only in a certain set of Therefore,
in order for the limit of to be stable, imposing a certain regularity condition on and some restrictions on the noise is unavoidable.
As in Section 2.3, assume that the observation noise is with being a measurable function defined on Set
Let us introduce the following conditions:
A2.6.1 For a given is a surjection for any
A2.6.2 For any and is continuous in and for any
and
where denotes the ball centered at with radius
It is clear that A2.6.2 is equivalent to A2.6.2’:
A2.6.2’ For any and any compact set
Before formulating Theorem 2.6.2 we first give some remarks on Conditions A2.6.1 and A2.6.2.
Remark 2.6.3 If does not depend on then in (2.6.9) can be removed when taking the supremum. In Condition A2.2.3, is a convergent subsequence, and hence is automatically located in a compact set. In the theorems in Sections 2.2, 2.3, 2.4, and 2.5, the initial value is fixed, and hence for fixed is a fixed sequence. In contrast, in Theorem 2.6.2 we will consider the case where the initial value varies arbitrarily, and hence for any fixed may be any point in If in (2.6.9) were not restricted to a compact set (i.e., with removed in (2.6.9)), then the resulting condition would be too strong. Therefore, putting in (2.6.9) makes the condition reasonable.
Remark 2.6.4 If is continuous and if then is a surjection.
By this property, is a surjection for a large class of For example, let be free of and let the growth rate of be not faster than linear as Then with satisfying A2.2.1 we have as for all Hence, A2.6.1 holds. In the case where the growth rate of is faster than linear as and for some we also have as for all and A2.6.1 holds.
In what follows, by stability of a set for we mean stability in the
Lyapunov sense, i.e., a nonnegative continuously differentiable function
exists such that andfor some where
Theorem 2.6.2 Assume A2.2.1, A2.2.2, and A2.6.2 hold, and that is continuous and for a given A2.6.1 holds. If defined by (2.1.1)–
(2.1.3) with any initial value converges to a limit independent of
then belongs to the unique stable set of
Proof. Since by A2.2.2 and by continuity of exists with such that
Hence, By continuity of J is closed, and hence by A2.2.2,
Since we must have Denote by the connected
subset of containing The minimizer set of that contains isclosed and is contained in Since is a connected set
and by A2.2.2 is nowhere dense, is a constant.
By continuity of all connected root-sets are closed and they are
separated. Thus, there exists a such that
i.e., contains no root of other than those located inSet
Then and
Therefore, by definition, is stable for
We have to show that and is the unique stable root-set.
Let be the connected set of such
that contains By continuity of for an arbitrary small
exist such that and the distance
between the interval and the set is positive;
i.e.,
We first show that, for any and there exist and such that, for any if then
By Theorem 2.2.1, for with sufficiently large there will be no truncation for (2.1.1)–(2.1.3), and
For any let By A2.6.2, sufficiently small
and large enough exist such that for any
If for then (2.6.10) immediately
follows by setting Assume for someLet be the first such one. Then
By (2.6.11), however,
which contradicts (2.6.12). Thus and (2.6.10) is verified.
For a given we now prove the existence of such thatfor any if where the dependence of
on and on the initial value is emphasized. For simplicity of writing,
is written as in the sequel.
Assume the assertion is not true; i.e., for any exists such that and for some
Suppose and
If there exists an with then with exists because is connected and with
This yields a contradictory inequality:
where the first inequality follows from A2.2.2, while the second inequality is because is the minimizer of
Consequently, for any and
and a subsequence of exists, also denoted by for
notational simplicity, such that By the continuity of
Hence, by the fact
By (2.6.10) and the fact we can choose sufficiently
small T and large enough N such that
and i.e.,
for any By (2.6.10), exists with the property such that
Because as for sufficiently large N,
by (2.6.10) the last term of (2.6.15) is Then
By (2.6.10) and the continuity of the third term on the right-hand side of (2.6.16) is and by A2.6.2 (since with for all sufficiently large N), the norm of the second term on the right-hand side of (2.6.16) is also as Hence by A2.2.2 and (2.6.13), some exists such that the right-hand side of (2.6.16) is less than for all sufficiently large N if T is small enough. By noticing and mentioned above, from (2.6.14) it follows that the left-hand side of (2.6.16) tends to a nonnegative limit as The obtained contradiction shows that exists such that for any if With fixed for any by A2.6.1 exists such that By and the arbitrary smallness of from this it
follows that Since by assumption, we have which means that is stable. If another stable set existed such that then by the same argument would belong to The contradiction shows the uniqueness of the stable set.
2.7. Robustness of Stochastic Approximation Algorithms
In this section, for the single root case, i.e., the case we consider the behavior of SA algorithms when the conditions for convergence of the algorithms to are not exactly satisfied. It will be shown that a “small” violation of the conditions has no large effect on the behavior of the algorithm.
The following result, known as the Kronecker lemma, will be used several times in the sequel. We state it separately for ease of reference.
Kronecker Lemma. If $\sum_{k=1}^{\infty} \frac{x_k}{a_k}$ converges, where $\{a_k\}$ is a sequence of positive numbers nondecreasingly diverging to infinity and $\{x_k\}$ is a sequence of matrices, then
$$\frac{1}{a_n} \sum_{k=1}^{n} x_k \xrightarrow[n \to \infty]{} 0.$$
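A quick numerical illustration of the Kronecker lemma: if the series of x_k/a_k converges with a_k positive and nondecreasing to infinity, then the a_n-normalized partial sums of x_k tend to zero. The alternating-harmonic example below is our illustrative choice, not the book's.

```python
import numpy as np

n = np.arange(1, 1_000_001)
x = (-1.0) ** n              # x_k = (-1)^k (scalar case)
a = n.astype(float)          # a_k = k, positive and nondecreasing to infinity

series = np.cumsum(x / a)    # partial sums of sum_k x_k / a_k: converge (to -log 2)
kronecker = np.cumsum(x) / a # (1/a_n) * sum_{k<=n} x_k: tends to 0

print(round(float(series[-1]), 4), round(float(kronecker[-1]), 6))  # → -0.6931 0.0
```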
Proof. Set $s_0 = 0$ and $s_n = \sum_{k=1}^{n} \frac{x_k}{a_k}$. Since $\{s_n\}$ converges, for any $\varepsilon > 0$ there is an $N$ such that $\|s_m - s_n\| \le \varepsilon$ if $m, n \ge N$. Then it
follows that
as and then
We still consider the algorithm given by (2.1.1)–(2.1.3), where denotes the estimate for at time but may not be the exact root of As a matter of fact, the following set of conditions will be used to
replace A2.2.1–A2.2.4:
A2.7.1 nonincreasingly tends to zero, and
exists such that
A2.7.2 There exists a nonnegative twice continuously differentiable function such that and
A2.7.3 For the sample path the observation noise satisfies the following condition
A2.7.4 is continuous, but need not be the root of
Comparing A2.7.1–A2.7.4 with A2.2.1–A2.2.4, we see that the following conditions required here are not assumed in Section 2.2: nonincreasing
Set
We will only consider those in (2.7.2) for which where is given in (2.7.7). From (2.7.7) and (2.7.8) it is seen that
Consequently, by (2.7.2), a given by (2.7.12) is positive.
By continuity of and and exist such that the following inequalities hold:
By A2.7.3, for can be taken sufficiently large such that
Lemma 2.7.1 Assume A2.7.1, A2.7.2, A2.7.4 hold with given in (2.7.3)being less than or equal to If for given by (2.1.1)–
(2.1.3) with (2.7.5) fulfilled, for some where K is
given in (2.7.18), then for any
Proof. Because is nondecreasing as T increases, it suffices to prove the lemma for
Assume the converse: there exists an such that
Then for any we have
and hence
which incorporating with the definition of leads to
On the other hand, from (2.7.20) and (2.7.21) it follows that
From (2.7.9) we have
By a partial summation we have
Applying (2.7.3) to the first two terms on the right-hand side of (2.7.25),
and (2.7.1) and (2.7.3) to the last term we find
From (2.7.24) and (2.7.26) it then follows that
which contradicts (2.7.22). This proves the lemma.
Lemma 2.7.2 Under the conditions of Lemma 2.7.1, for any
the following estimate holds:
Proof. Since by Lemma 2.7.1 we have
and hence
Consequently, we have
Lemma 2.7.3 Assume A2.7.1–A2.7.4 hold and satisfies (2.7.7). Then
for the sample path for which A2.7.3 holds, a that is independent of
and exists such that
in other words, given by (2.1.1)–(2.1.3) is bounded.
Proof. Let be a sufficiently large integer such that
where K is given by (2.7.18).
Assume the lemma is not true. Then there exist and such
that Let be the maximal integer satisfying thefollowing equality:
Then by definition we have
and by (2.7.28) and (2.7.29),
We first show that under the converse assumption there must be ansuch that
Otherwise, for any and from (2.7.24) it follows
that
This together with (2.7.30) implies
which contradicts the converse assumption. Hence (2.7.31) must hold. By the definition of (2.7.6), and (2.7.30) we have
Since by (2.7.31), from (2.7.4) and (2.7.6) it follows that
We now show For this it suffices to prove by noticing (2.7.34).
Since similar to (2.7.32) we have
and hence
From (2.7.32) and (2.7.36) it is seen that
where for the second inequality, (2.7.9) and are used, whilefor the last inequality (2.7.18) is invoked.
Paying attention to (2.7.10), we have and andby (2.7.16)
Then by (2.7.32) we see and (2.7.34) becomes
Thus, we can define
and have
Taking in Lemmas 2.7.1 and 2.7.2, and paying attentionto (2.7.4) and we know By Lemmas 2.7.1and 2.7.2, from (2.7.28) we see From (2.7.28)–(2.7.30) wehave obtained which together with the definition of
implies and hence Therefore, is well defined, and by Taylor's expansion we have
where with components located in-between and We now show that which, as will be shown, implies
a contradiction.By Lemma 2.7.2 we have
and hence
By (2.7.10) it follows that and by (2.7.11).Using Lemma 2.7.1, we continue (2.7.41) as follows:
Noticing we see. It is clear that (2.7.35) and (2.7.37) remain valid with replaced by Hence, similar to (2.7.37) we have
By (2.7.11) and Taylor's expansion we have
and consequently,
and
By (2.7.40), Substituting (2.7.44) into (2.7.43) and using(2.7.12) lead to
Estimating by a treatment similar to that used for
(2.7.26) yields
Noticing by Lemma 2.7.2 we find that
and
Hence, and by (2.7.15) from (2.7.45) it follows that
Using (2.7.14), from the above estimate we have
From (2.7.18) it follows that Taking notice of (2.7.13) by
(2.7.17) we derive
On the other hand, by Lemma 2.7.2 and (2.7.11), (2.7.17), and (2.7.44)
it follows that
where
From (2.7.39), (2.7.40), and (2.7.48) we see that
and hence which contradicts (2.7.47). This
means that the converse assumption of the lemma cannot hold.
Corollary 2.7.1 From Lemma 2.7.3 it follows that there exist
and which is independent of and arbitrarily varying in
intervals and such that
and for with sufficiently large the algorithm (2.1.1)–(2.1.3)
turns into an ordinary RM algorithm:
Set
Take and denote
By A2.7.2, Set
If in (2.7.2), then In the general case may
be positive.
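Before stating the theorem, it may help to see the expanding-truncation mechanism of (2.1.1)–(2.1.3) at work numerically. The scalar sketch below is purely illustrative: the regression function g, the noise level, the restart point, and the bounds are all hypothetical choices, not the book's specifications.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(x):                       # hypothetical regression function with root at 2
    return -(x - 2.0)

x, sigma = 10.0, 0              # initial estimate and truncation counter
x_star = 0.0                    # point to restart from after a truncation
truncation_bound = lambda s: s + 1.0    # expanding bounds, tending to infinity

for k in range(1, 20001):
    a_k = 1.0 / k                                # steps: sum a_k = inf, a_k -> 0
    y = g(x) + rng.normal(scale=0.5)             # noisy observation of g at x
    x_new = x + a_k * y                          # RM step
    if abs(x_new) > truncation_bound(sigma):     # estimate escapes current bound:
        x, sigma = x_star, sigma + 1             # restart and enlarge the bound
    else:
        x = x_new
# sigma stops increasing after finitely many steps; x converges to the root 2
```

Once the bound exceeds the region containing both the root and the wandering iterates, the truncation indicator never fires again and the recursion coincides with the ordinary RM algorithm, which is the content of Corollary 2.7.1.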
Theorem 2.7.1 Assume A2.7.1–A2.7.4 hold and is given by
(2.1.1)–(2.1.3) with (2.7.5) held. Then there exist
and a nondecreasing, left-continuous function defined on such
that for the sample path for which A2.7.3 holds,
whenever and where and are the ones appearing in
(2.7.2) and (2.7.3), respectively. As a matter of fact, can be taken as
the inverse function of
Proof. Given recursively define
We now show that exists such that
Set and assume
From the recursion of we have
Assume is large enough such that by A2.7.3
By a partial summation, from (2.7.57) we find that
where (2.7.58) is invoked.By (2.7.1) we see
Without loss of generality, we may assume Then by(2.7.1) we have
Applying (2.7.60) and (2.7.61) to (2.7.59) leads to
and hence
which implies (2.7.56).For and by (2.7.53)
Taking this into account for by (2.7.51)–(2.7.54) and the
Taylor’s expansion we have
Therefore, in the following Taylor’s expansion
we have and henceand
Denote
For we have
From (2.7.63) and (2.7.64) it then follows that
Similar to (2.7.62), we see that
Consequently, we arrive at
Define
It is clear that is nondecreasing as increases and
Take such that Then we have
Define function
It is clear that is left-continuous, nondecreasing and
From (2.7.66) and (2.7.67) it follows that
which implies, by (2.7.57) and the definition of
Corollary 2.7.2 If in (2.7.2) (may not be zero), then
and the right-hand side of (2.7.55) will be
Since may be arbitrarily small, and hence the estimation error may be arbitrarily small. If, in addition, in A2.7.3, then
tending and then in both sides of (2.7.55) we derive
In the case where by tending the right-hand side of (2.7.55) converges to
Consequently, as the estimation error depends on how big is. If in (2.7.2), then can also be taken
arbitrarily small and the estimation error depends on the magnitude of
2.8. Dynamic Stochastic Approximation
So far we have discussed the root-searching problem for an unknown function, which is unchanged during the process of estimation. We now consider the case where the unknown functions together with their roots change with time. To be precise, let be a sequence of unknown
functions with roots i.e., let be the estimate for at time based on the observations
Assume the evolution of the roots satisfies the following equation
where are known functions, while is a sequence of dynamic noises.
The observations are given by
where is the observation noise and is allowed to depend on
In what follows the discussion is for a fixed sample, and the analysisis purely deterministic. Let us arbitrarily take as the estimate forand define
From equation (2.8.1), we see that may serve as a rough estimate for In the sequel, we will impose some conditions on and so that
where is an unknown constant. Therefore, should notdiverge to infinity. But is unknown, so we will use the expandingtruncation technique.
Take a sequence of increasing numbers satisfying
Let be recursively defined by the following algorithm:
where denotes the number of truncations in (2.8.5) occurring up to time
We list conditions to be used.
A2.8.1 and
A2.8.2 is measurable and for any
constant possibly depending on exists so that
for with
A2.8.3 is known such that
for where
and
A2.8.4 and
A2.8.5 There is a continuously differentiable function
such that for and for any
where is a positive constant possibly depending on and A con-
stant exists such that
where is an unknown constant that is an upper bound for
A2.8.6 For any convergent subsequence the observation noise
satisfies
where
Remark 2.8.1 Condition A2.8.2 implies local boundedness, but the upper bound should be uniform with respect to In A2.8.3, measures the difference between the estimation error and the
prediction error In general, is greater than For example, if then A2.8.3 holds with A2.8.4 means that the noise in the root dynamics should be vanishing.
As A2.2.2, Condition A2.8.5 concerns the existence of a Lyapunov function. Imposing such a condition is unavoidable in the convergence analysis of SA algorithms. Inequality (2.8.7) is a mild condition: for example, if as then it is automatically satisfied. The noise condition A2.8.6 is similar to A2.2.3.
Before analyzing the convergence properties of the algorithm (2.8.5), (2.8.6), and (2.8.2), we give an example of application of dynamic stochastic approximation.
Example 2.8.1 Assume that a chemical product is produced in batch mode, and the product quality or quantity of a batch depends on the temperature in the batch. When the temperature equals the ideal one, the product is optimized. Let denote the deviation of the temperature from its optimal value for the batch, where denotes the control parameter, which may be, for example, the pressure in the batch, the quantity of catalytic promoter, the raw material proportion, and others. The deviation reduces to zero if the control equals its optimal value i.e., Because of environmental changes, the optimal parameter may change from batch to batch. Assume
where is known and is the noise.
Let be the estimate for Then may serve as a prediction
for Apply as the control parameter for the batch. Assume that the temperature deviation of for the th batch can be observed, but the observation may be corrupted by noise, i.e., where is the observation noise.
Then we can apply algorithm (2.8.5), (2.8.6), and (2.8.2) to estimateUnder conditions A2.8.1–A2.8.6, by Theorem 2.8.1 to be proved in
this section, the estimate is consistent, i.e.,
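A minimal numerical sketch of this batch example can be given. Everything concrete below is hypothetical: the drift map f, the noise scales, and the step sizes 1/k are illustrative stand-ins for the quantities in (2.8.1)–(2.8.2), and the expanding-truncation safeguard of (2.8.5)–(2.8.6) is omitted since the iterates stay bounded here.

```python
import numpy as np

rng = np.random.default_rng(5)

def f(k, u):                        # known drift of the optimal parameter (hypothetical)
    return u + 2.0 / (k + 1.0)

u_opt = 1.0                         # unknown, slowly moving optimal control parameter
u_hat = 6.0                         # initial estimate
for k in range(1, 50001):
    u_opt = f(k, u_opt) + rng.normal(scale=k ** -1.5)   # vanishing dynamic noise (A2.8.4)
    u_pred = f(k, u_hat)            # predict the next optimum and apply it as control
    # observed temperature deviation of the applied control, with observation noise
    y_obs = (u_pred - u_opt) + rng.normal(scale=0.5)
    u_hat = u_pred - y_obs / k      # RM-type correction driving the deviation to zero

err = abs(u_hat - u_opt)
```

In the run above the estimate tracks the drifting optimum closely even though the optimum itself never settles down, which is the kind of consistency asserted by Theorem 2.8.1.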
Theorem 2.8.1 Under Conditions A2.8.1–A2.8.6 the estimation error
tends to zero as where is given by (2.8.5),
(2.8.6), and (2.8.2).
To prove the theorem we start with lemmas.
Lemma 2.8.1 Under A2.8.3 and A2.8.4, the sequence
is bounded for any
Proof. By A2.8.3 and A2.8.4 from (2.8.1) it follows that
Lemma 2.8.2 Assume A2.8.1–A2.8.4 and A2.8.6 hold. Let be a convergent subsequence such that as Then, there
are a sufficiently small and a sufficiently large integer such that
for
where is implied by
for where is a constant independent
of
Proof. In the case as
is bounded, and hence is bounded. By Lemma
2.8.1, is bounded. Therefore, is bounded. For
large and
The following expression (2.8.11) and estimate (2.8.12) will frequently be used. By (2.8.1) and A2.8.3 we have
and
Substitution of (2.8.12) into (2.8.10) leads to
By boundedness of and A2.8.3,
for some By A2.8.4, while the last term is also
less than by A2.8.6. Without loss of generality, we may assume
Therefore, and the lemma is true for the case
We now consider the case as Let be so large
that for
with being a constant, and
where is given by (2.8.8).
Without loss of generality we may assume
Define and take T so small that We prove the lemma by induction.
By (2.8.8) and (2.8.12), we have
Therefore, at time there is no truncation. Then by (2.8.11) and
(2.8.12) we have
where (2.8.14) and (2.8.15) have been used.Let the conclusions of the lemma hold for
We prove that it also holds for Again by (2.8.12), we have
Hence there is no truncation at time By the inductive assumption, (2.8.11) and (2.8.12), it follows that
where (2.8.13) and (2.8.14) are invoked.Therefore, the conclusions of the lemma are also true for This
completes the proof.
Lemma 2.8.3 Assume A2.8.1–A2.8.6 hold. Then the number of truncations in (2.8.5) is finite and is bounded.
and (2.8.11) we have
Notice that by Lemma 2.8.2 and (2.8.13)
for sufficiently large From (2.8.21) and (2.8.23), it follows that
On the other hand, by Lemma 2.8.2
Identifying and in A2.8.5 to and respectively, we can
find such that
by A2.8.5.
Let us consider the right-hand side of (2.8.22). Noticing
by A2.8.3 and A2.8.4 we have
By A2.8.6,
Noticing that
as and by continuity of we find thattends to zero as and
Since the sum of the first and second
terms on the right-hand side of (2.8.22) is as and
This combining with (2.8.26) yields the following conclusion that for
with sufficiently large and for small enough T from (2.8.22) it
follows that
By (2.8.20), tending to infinity, from (2.8.30) we derive
By Lemma 2.8.2 we have
However, by definition, and Hence from (2.8.32), we must have
if T is small enough. Therefore, This contradicts
(2.8.31). The obtained contradiction shows that
Theorem 2.8.2 Assume A2.8.1–A2.8.6 hold. Then the estimation er-ror tends to zero as
Proof. We first show that converges. Assume the converse:
where because is bounded by Lemma 2.8.3. It is clear that there exists an interval that does not contain zero such that Without loss of generality, assume
From A2.8.6, it follows that there are infinitely manysequences such that and that
forWithout loss of generality we may assume converges:
Since exists such that and by Lemma 2.8.2, Completely the same argument as that used for (2.8.22)–(2.8.32) leads to a contradiction. Hence is convergent.
We now show that as Assume the converse: thereis a subsequence By the same argument we again arrive
at (2.8.30). Tending by convergence of we obtain a contradictory inequality This implies that as
The following theorem is similar to Theorem 2.4.1.
Theorem 2.8.3 Assume A2.8.1–A2.8.5 hold and is continuous at
uniformly in Then as if and only if A2.8.6
holds. Furthermore, under conditions A2.8.1–A2.8.5, the following three
conditions are equivalent.
1) Condition A2.8.6;
2)
3) can be decomposed into two parts: so that
Proof. Assume as Then is bounded. Wehave shown in the proof of Lemma 2.8.3 that the number of truncationsmust be finite if is bounded. Therefore, starting from some thealgorithm (2.8.5) becomes
From (2.8.11) we have
Set
By A2.8.3 and A2.8.4 and as
while tends to zero because is uniformly continuous at and Consequently, 3) holds.
On the other hand, it is clear that 3) implies 2), which in turn implies A2.8.6. By Theorem 2.8.1, under A2.8.1–A2.8.5, Condition A2.8.6 implies as
Thus, the equivalence of 1)–3) has been justified under A2.8.1–A2.8.5.
2.9. Notes and References
The initial version of SA algorithms with expanding truncations and its associated analysis method were introduced in [27], where the algorithm was called SA with randomly varying truncations. Convergence results for this kind of algorithm can also be found in [14, 28]. Theorems given in Section 2.2 are improved versions of those given in [14, 27, 28]. Theorems in Section 2.3 can be found in [18]. Necessity of
the noise condition is proved in [24, 94] for the single-root case, and in [17] for the multi-root case. Convergence results for SA algorithms with additive noise can be found
in [16]. Concerning the measure theory, we refer to [31, 76, 84]. Resultsgiven in Section 2.6 can be found in [48], and some related problems arediscussed in [3]. For the proof of Remark 2.6.4 we refer to Theorem 3.3in [34]. Example 2.6.1 can be found in [93]. Robustness of SA algorithmsis presented in [24]. The dynamic SA was considered in [38, 39, 91], but
the results presented in Section 2.8 are given in [25].
Chapter 3
ASYMPTOTIC PROPERTIES OF STOCHASTIC APPROXIMATION ALGORITHMS
In Chapter 2 we were mainly concerned with the path-wise convergence analysis for SA algorithms with expanding truncations. Conditions were given to guarantee where J denotes the
root set of the unknown function, and the estimate for the unknown root given by the algorithm.
In this chapter, for the case where J consists of a singleton we
consider the convergence rate of asymptotic normality of and asymptotic efficiency of the estimate. Assume is differentiable at Then as
where
It turns out that the convergence rate heavily depends on whether or not F is degenerate. Roughly speaking, in the case where the step size in (2.1.1), the convergence rate of is for some positive when F is nondegenerate, and for some when F vanishes.
It will be shown that is asymptotically normal and the covariance matrix of the limit distribution depends on the matrix D if in (2.1.1) the step size is replaced by If F in (3.0.1) is available, then D can be defined to make the limiting covariance matrix minimal, i.e., to make the estimate efficient. However, this is not the case in SA. One way to overcome the difficulty is to derive an approximate value of F by estimating it, but for this one has to impose rather heavy conditions on Efficiency here is derived by using a sequence of slowly
decreasing step sizes, and the averaged estimate appears asymptoticallyefficient.
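The effect of averaging can be previewed with a small simulation. The sketch below is illustrative only (scalar root-finding with the root at 1, i.i.d. standard normal observation noise, and slowly decreasing steps a_k = k^(-2/3)); it is not the general algorithm analyzed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100000
x = 5.0
xs = np.empty(n)
for k in range(1, n + 1):
    a_k = k ** (-2.0 / 3.0)            # slowly decreasing step sizes
    y = -(x - 1.0) + rng.normal()      # noisy observation; the root is 1
    x += a_k * y                       # RM iteration
    xs[k - 1] = x
x_bar = xs.mean()                      # averaged estimate
```

The last iterate fluctuates roughly on the scale of the square root of a_n, while the average of the iterates fluctuates on the smaller scale n^(-1/2); this contrast is the heuristic behind the averaging approach.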
3.1. Convergence Rate: Nondegenerate Case
In this section, we give the rate of convergence of to zero
in the case F in (3.0.1) is nondegenerate, where is given by (2.1.1)–(2.1.3). It is worth noting that F is the coefficient for the first order inthe Taylor’s expansion for
The following conditions are to be used.
A3.1.2 A continuously differentiable function exists
such that
for any and for some with
where is used in (2.1.1).
A3.1.3 For the sample path under consideration the observation noise
in (2.1.3) can be decomposed into two parts such that
for some
A3.1.4 is measurable and locally bounded, and is differentiable at
such that as
The matrix F is stable (this implies nondegeneracy of F); in addition, is also stable, where and are given by (3.1.1) and (3.1.3), respectively.
By stability of a matrix we mean that all its eigenvalues have negative real parts.
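Stability in this sense is equivalent to solvability of a Lyapunov equation: for a stable H and any positive definite S there is a unique positive definite P with H^T P + P H = -S, a fact used repeatedly below (e.g. as (3.1.9)). A small numerical check, with an illustrative H, can be done by vectorizing the linear map P -> H^T P + P H:

```python
import numpy as np

H = np.array([[-1.0, 2.0],
              [0.0, -3.0]])          # stable: eigenvalues -1 and -3
S = np.eye(2)                        # any positive definite S

# Build the matrix of the linear map P -> H^T P + P H column by column
# from its action on basis matrices, then solve H^T P + P H = -S for P.
n = H.shape[0]
A = np.zeros((n * n, n * n))
for idx in range(n * n):
    E = np.zeros(n * n)
    E[idx] = 1.0
    E = E.reshape(n, n)
    A[:, idx] = (H.T @ E + E @ H).flatten()
P = np.linalg.solve(A, -S.flatten()).reshape(n, n)

residual = H.T @ P + P @ H + S                 # should vanish
eigP = np.linalg.eigvalsh((P + P.T) / 2.0)     # P is symmetric positive definite
```

The test matrices are hypothetical; the column-by-column construction works in any dimension and avoids committing to a particular vectorization convention.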
Remark 3.1.1 We now compare A3.1.1–A3.1.4 with A2.2.1–A2.2.4. Because of the additional requirement (3.1.1), A3.1.1 is stronger than A2.2.1, but it is automatically satisfied if with In this case a in (3.1.1) equals Also, (3.1.1) is satisfied if with
In this case Take sufficiently small such that
Then and Assume
is a martingale difference sequence with
Then, by the convergence theorem for martingale difference sequences, Therefore (3.1.3) is satisfied a.s. with
Condition A3.1.4 assumes differentiability of which is not required in A2.2.4.
Lemma 3.1.1 Let and H be -matrices. Assume H is stable
and If satisfies A3.1.1 and l-dimensional vectors
satisfy the following conditions
then defined by the following recursion with arbitrary initial value
tends to zero:
Proof. Set
We now show that there exist constants and such that
Let S be any negative definite matrix. Consider
and hence
where denotes the minimum eigenvalue of P.
Paying attention to the fact that
from (3.1.13) we derive
which verifies (3.1.8). From (3.1.6) it follows that
We have to show that the right-hand side of (3.1.14) tends to zero as
For any fixed because of (3.1.1) and(3.1.8). This implies that as for any initial value
Since as for any exists such thatThen by (3.1.8) we have
The first term on the right-hand side of (3.1.15) tends to zero by A3.1.1, while the second term can be estimated as follows:
where the first inequality is valid for sufficiently large sinceas and the second inequality is valid when
Therefore, the right-hand side of (3.1.15) tends to zero asand then
Set
By assumption of the lemma Hence, for anythere exists such that By a partialsummation, we have
where, except for the last term, the sum of the remaining terms tends to zero as by (3.1.8) and
Let us now estimate
Since for and as by (3.1.8)
we have
which tends to zero as and by (3.1.16) and the factthat Thus, the right-hand side of (3.1.17) tends to
zero as and the proof of the lemma is completed.
Theorem 3.1.1 Assume A3.1.1–A3.1.4 hold. Then given by (2.1.1)–
(2.1.3) for those sample paths for which (3.1.3) holds converges to
with the following convergence rate:
where is the one given in (3.1.3).
Proof. We first note that by Theorem 2.4.1 and there is no
truncation after a finite number of steps. Without loss of generality, we
may assume By (3.1.1), Hence, by Taylor's expansion we
have
Write given by (3.1.4) as follows
where
By (3.1.4) and (3.1.19), for sufficiently large k we have
if is a martingale difference sequence with
So, for (3.1.25) it is sufficient to require
Since the best convergence rate is achieved at the convergence rate is Since the convergence rate slows down as approaches
When (3.1.25) cannot be guaranteed. From this it is seen that the convergence rate depends on how big is.
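The scalar case with i.i.d. noise makes the rate visible numerically. In the illustrative sketch below (g(x) = -x, unit-variance noise, steps a_k = 1/k, where the best exponent 1/2 should be attainable), the root-mean-square error over independent runs scales like k^(-1/2): multiplying the number of iterations by 100 divides the RMS error by about 10.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 400, 100000                  # 400 independent runs of length 100000
x = np.full(m, 3.0)                 # scalar iterates; the root is 0
rms = {}
for k in range(1, n + 1):
    y = -x + rng.normal(size=m)     # g(x) = -x plus martingale-difference noise
    x += y / k                      # RM iteration with a_k = 1/k
    if k in (1000, 100000):
        rms[k] = float(np.sqrt(np.mean(x ** 2)))
ratio = rms[100000] / rms[1000]     # roughly sqrt(1000/100000) = 0.1
```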
3.2. Convergence Rate: Degenerate Case
In the previous section, for obtaining the convergence rate of
stability and hence nondegeneracy of F is an essential requirement. Wenow consider what will happen if the linear term vanishes in the Taylor’sexpansion of For this we introduce the following set of conditions:
A3.2.2 A continuously differentiable function exists
such that
for any and for some withwhere is used in (2.1.1);
A3.2.3 For the observation noise on the sample path under con-sideration the following series converges:
where
A3.2.4 is measurable and locally bounded, and is differentiable at
such that as
where F is a stable matrix, and is the one used in A3.2.3.
We first note that in comparison with A3.1.1–A3.1.4, here we do notrequire (3.1.1), but A3.2.2 is the same as A3.1.2. From (3.2.3) we see that
A3.2.1 and
the Taylor’s expansion for does not contain the linear term. HereF is the coefficient for a term higher than second order in the Taylor’sexpansion of The noise condition A3.2.3 is different from A3.1.3,but, as to be shown by the following lemma, it also implies A2.2.3.
Lemma 3.2.1 If (3.2.2) holds, then and hence A2.2.3
is satisfied.
Proof. We need only to show
Setting
by a partial summation we have
Since as and converges as the first twoterms on the right-hand side of (3.2.4) tend to zero as and
The last term in (3.2.4) is dominated by
where
By the following elementary calculation we conclude that the right-hand side of (3.2.5) tends to zero as and
which tends to zero as and because as
This, combined with (3.2.4) and (3.2.5), shows that
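The partial-summation step used here is the familiar Kronecker-lemma mechanism: convergence of a series forces suitably normalized weighted sums to vanish. A scalar numerical illustration, with purely illustrative sequences b_k = k and c_k = (-1)^k / sqrt(k):

```python
import numpy as np

n = 200000
k = np.arange(1, n + 1)
c = (-1.0) ** k / np.sqrt(k)     # the series sum of c_k converges (alternating)
weighted = np.cumsum(k * c)      # partial sums of b_k * c_k with b_k = k increasing
ratio = weighted / k             # Kronecker average (1/b_n) * sum_{j<=n} b_j c_j
# ratio -> 0 even though the summands b_k c_k = (-1)^k sqrt(k) grow in magnitude
```

Although the individual summands grow, the normalized partial sums tend to zero, which is the kind of cancellation exploited in the proof above.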
By the Lyapunov equation (3.1.9), there is a positive definite matrixP > 0 such that
Assuming is large enough so that there is no truncation, by (3.2.3) wehave
where is the maximum eigenvalue of P given by (3.2.6).
We start with lemmas. Note that by Theorems 2.2.1 or 2.4.1Therefore, starting from some the algorithm has no truncation.
Define
Denote by and the maximum and minimum eigenvalues of P, respectively, and by K the condition number
Theorem 3.2.1 Assume A3.2.1–A3.2.4 hold and is given by (2.1.1)
–(2.1.3). Then for the sample paths where A3.2.3 holds the following
convergence rate takes place:
consider the case since if it is not true then is clearlybounded.
Let P be given by (3.2.6). We have
where
In what follows we will prove that
By (3.2.10) and (3.2.6) it is clear that
where the last inequality follows from the following consideration:
By (3.2.11) so for (3.2.16) it suffices to show that
By definition of we have and hence
or
Consequently,
and by the agreement
which verifies the last inequality in (3.2.16).
We now estimate By (3.2.10), (3.2.11), and the agreement we have
Noticing that, as agreed, from (3.2.17) we have
and by (3.2.13),
Again, from (3.2.10) and noticing we have
Consequently, by (3.2.12)
Combining (3.2.14), (3.2.16), (3.2.18), and (3.2.20) yields
Proof of Theorem 3.2.1. By Lemma 3.2.2 and the fact
we have
where
By setting
from (3.2.9) it follows that
This is nothing else but an RM algorithm. Since by Lemma 3.2.2is bounded, no truncation is needed and one may apply Theorem 2.2.1”.
First note that
Hence, A2.2.1 is satisfied.
as So A2.2.3 holds with replaced by
A2.2.4 is clearly satisfied, since is continuous. The key issue is tofind a satisfying A2.2.2”.
Take
and define which is closed. Notice
For Then we have
This means that
and the condition A2.2.2” holds.By Theorem 2.2.1”, This implies
which in turn implies (3.2.7) by (3.2.8).
Imposing some additional conditions on F, we may obtain results more precise than (3.2.7) by using different Lyapunov functions.
Theorem 3.2.2 Assume A3.2.1–A3.2.4 hold, in addition, assume F is
normal, i.e., Let be given by (2.1.1)–(2.1.3). Then
for those sample paths for which A3.2.3 holds, converges
to either zero or one of where denotes an eigenvalue of
More precisely,
where is a unit eigenvector of H corresponding to
Proof. Since F is stable, the integral
is well defined. Noticing that we have
This means that H is also stable. Therefore, all eigenvalues arenegative. Further, by we find
and hence
We consider (3.2.23) and take
By (3.2.26) we have
Define
Obviously,
for any
Clearly,
where is the dimension of Thus, J is a discrete set, and is nowhere dense because is
continuous. This together with (3.2.28) shows that A2.2.2’ is satisfied.
By Theorem 2.2.1’, and (3.2.25) is verified.
Corollary 3.2.1 Let Then
In this case,
and hence (3.2.7) and (3.2.25) are respectively equivalent to
and
Remark 3.2.1 For the convergence rate given by (3.1.18)for the nondegenerate case is while for the degenerate case is
by (3.2.29), which is much slower than
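The gap between the two rates is easy to observe in a noise-free scalar sketch. The functions below are illustrative: g(x) = -x is nondegenerate at the root 0, while g(x) = -x^3 has vanishing derivative there, mimicking the degenerate case.

```python
n = 100000
x_lin, x_cub = 1.0, 1.0
for k in range(1, n + 1):
    a = 1.0 / (k + 1)
    x_lin += a * (-x_lin)         # nondegenerate: the error is exactly 1/(n+1)
    x_cub += a * (-x_cub ** 3)    # degenerate: the error decays only logarithmically
```

With steps a_k = 1/(k+1) the nondegenerate iterate satisfies x_n = 1/(n+1) exactly, while the degenerate one behaves like 1/sqrt(2 log n) and is still near 0.2 after 10^5 steps.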
3.3. Asymptotic Normality
In Theorem 3.1.1 we have shown that givenby (2.1.1)–(2.1.3). As shown in Remark 3.1.2,
This is a path-wise result. Assuming the observation noise isa random sequence, we show that is asymptotically normal,
i.e., the distribution of converges to a normal distributionas This convergence implies that in the convergence rate
cannot be improved to We first consider the linear regression case, i.e., is a linear function, but may be time-varying.
Let us introduce a central limit theorem on double-indexed random
variables. We formulate it as a lemma.
Lemma 3.3.1 Let be an array of l-dimensional random
vectors. Denote
and
Assume
and
Then
where and hereafter denotes the normal distribution with meanand covariance S.
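Before turning to the linear recursion, the flavor of the result can be checked by Monte Carlo in the classical scalar case, which is standard and not specific to this book: for x_{k+1} = x_k + (1/k)(-beta * x_k + xi_k) with beta > 1/2 and i.i.d. noise of variance sigma^2, sqrt(k) * x_k is asymptotically N(0, sigma^2 / (2 beta - 1)). With beta = sigma = 1 the limit variance is 1.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 2000, 20000                 # 2000 independent runs of length 20000
beta, sigma = 1.0, 1.0             # slope at the root and noise level
x = np.zeros(m)                    # start at the root and watch the fluctuations
for k in range(1, n + 1):
    x += (1.0 / k) * (-beta * x + rng.normal(scale=sigma, size=m))
z = np.sqrt(n) * x                 # approximately N(0, sigma^2 / (2 beta - 1))
sample_var = float(z.var())
inside_one_sd = float(np.mean(np.abs(z) < 1.0))   # about 0.68 for a standard normal
```

Both the sample variance of z and the fraction of samples within one standard deviation come out close to their Gaussian values.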
Let us first consider the linear recursion (3.1.6) and derive its asymptotic normality. We keep the notation introduced by (3.1.7).
We have obtained estimate (3.1.8) for and now derive moreproperties for it.
Lemma 3.3.2 Assume and
H where H is stable. Then for any
Proof. By (3.1.8) it follows that
We will use the following elementary inequality
which follows from the fact that the function equals
zero at x = 0 and its derivative By (3.3.8), we derive
which implies
Assume is sufficiently large such that Then
where for the last inequality (3.3.9) is invoked.
Combining (3.3.7) and (3.3.10) gives (3.3.6).
Lemma 3.3.3 Set
Under conditions of Lemma 3.3.2,
uniformly with respect to and
uniformly with respect to
Proof. Expanding to the series
with we have
where by definition
By stability of H , there exist constants and p > 0 such that
Putting (3.3.13) into (3.3.12) yields that for any
where for the last inequality is assumed to be sufficiently large such that and (3.1.8) is used too.
as
as
Since and may be arbitrarily small, the conclusions of the lemma follow from (3.3.14) by Lemma 3.3.2.
Lemma 3.3.4 Assume as and
Let A, B, and Q be matrices and let A and B be stable. Then
Proof. For any T > 0 define
Since for fixed T. Denoting
by we then have Consequently,
serves as an integral sum for or equivalently, for
and hence
Therefore, for (3.3.15) it suffices to show that
Similar to (3.3.10), by stability of A we can show that there is a constant such that
By stability of A and B, constants and exist such that
Consequently, we have
which verifies (3.3.18) and completes the proof of the lemma.
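In the standard formulation the limit in Lemma 3.3.4 is the integral of e^{At} Q e^{Bt} over [0, ∞), the solution of the Lyapunov equation AX + XB = -Q when A and B are stable. The integral-sum argument of the proof can be sketched numerically in the scalar case (our own illustration, with hypothetical choices a_i = 1/i, A = B = -1, Q = 1, so the limit is the integral of e^{-2t}, namely 1/2):

```python
import math

# Step sizes a_i = 1/i; the partial sums s_i play the role of "time".
n = 5000
a = [1.0 / i for i in range(1, n + 1)]
s, acc = [], 0.0
for ai in a:
    acc += ai
    s.append(acc)

# Integral sum  sum_i a_i e^{A(s_n - s_i)} Q e^{B(s_n - s_i)}  with A = B = -1,
# Q = 1: a Riemann sum for the integral of e^{-2t} over [0, s_n].
s_n = s[-1]
riemann = sum(ai * math.exp(-2.0 * (s_n - si)) for ai, si in zip(a, s))
print(riemann)  # close to the limit 1/2
```

The sum converges to the integral because s_n diverges while the mesh widths a_i tend to zero.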
Theorem 3.3.1 Let be given by (3.1.6) with an arbitrarily given initial value. Assume the following conditions hold:
where are constant matrices with is
a martingale difference sequence of dimension satisfying the following
conditions:
and
and is stable;
and
as
and
Then is asymptotically normal:
where
Proof. Define by the following recursion
By (3.1.6) it follows that
Using (3.3.19) we have
Consequently,
where
and
by (3.3.20).
Define
By (3.3.30) and stability of A, from (3.1.8) it follows that constants and exist such that
Consequently, from (3.3.29) we have
The first term on the right-hand side of (3.3.34) tends to zero as
by (3.3.33), while the second term is estimated as follows. By (3.3.31)
where for the last equality, Lemma 3.3.2 and (3.3.33) are used. This means that r and have the same limit distribution if exists.
Consequently, for the theorem it suffices to show
Similar to (3.3.29) and (3.3.31), by (3.3.28) we have
Noticing
by Lemma 3.3.2 and (3.1.8), we find that the last term of (3.3.36) tends
to zero in probability. Therefore, for (3.3.24) it suffices to show
We now show that for (3.3.37) it is sufficient to prove
For any fixed we have
By (3.3.21) we have
where convergence to zero follows from and Lemma 3.3.2.
It is worth noting that the convergence is uniform with respect to This
By (3.3.21) and we see that
implies that the second term on the right-hand side of (3.3.39) tends to zero in probability. The first term on the right-hand side of (3.3.39) can be rewritten as
By (3.3.33) for any fixed we estimate the first term of (3.3.40) as follows
while for the second term we have
since and
We now show that the last term of (3.3.40) also converges to zero inprobability as
Notice that by (3.3.28), for any fixed and
Therefore, for a fixed there exist constants
and such that
as
Then the last term of (3.3.40) is estimated as follows:
For the first term on the right-hand side of (3.3.44) we have
where the last inequality is obtained because is bounded
by some constant by (3.3.30). Since is fixed, in order to
prove that the right-hand side of (3.3.45) tends to zero as it suffices to show
By (3.3.33), for any fixed
while for any given we may take sufficiently large such that
Therefore,
by Lemma 3.3.2.
Incorporating (3.3.47) with (3.3.48) proves (3.3.46). Therefore, the
right-hand side of (3.3.45) tends to zero as This implies
that the first term on the right-hand side of (3.3.44) tends to zero in probability. By (3.3.43), for the last term of (3.3.44) we have
which tends to zero as as can be shown by an argument similar to that used for (3.3.45).
In summary, we conclude that the right-hand side of (3.3.44) tends to zero in probability, and hence all terms in (3.3.40) tend to zero in probability. This implies that the right-hand side of (3.3.39) tends to zero in probability as and then Thus, we have shown that for (3.3.37) it suffices to show (3.3.38).
We now intend to apply Lemma 3.3.1, identifying
to in that lemma. We have to check the conditions of the lemma. Since is a martingale difference sequence, (3.3.1) is obviously satisfied.
By (3.3.22) and Lemma 3.3.2,
This verifies (3.3.3). We now verify (3.3.2). We have
where the last term tends to zero by (3.3.22) and Lemma 3.3.2. We show that the first term on the right-hand side of (3.3.49) tends to (3.3.25).
With A and respectively identified to H and in Lemma 3.3.3,
by Lemmas 3.3.2 and 3.3.3 we have
This incorporating with (3.3.49) leads to
By Lemma 3.3.4 we conclude
Finally, we have to verify (3.3.4).
By (3.3.33) we have
Noticing that uniformly with respect to
since or equivalently,
uniformly with respect to by (3.3.23) we have
Consequently, for any by Lemma 3.3.2
Thus, all conditions of Lemma 3.3.1 hold, and by this lemma we conclude (3.3.38). The proof is completed.
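The content of Theorem 3.3.1 can be illustrated numerically in the scalar case (a toy example of ours, not from the text): for the recursion x_{k+1} = x_k + (1/k)(-x_k + eps_{k+1}) with i.i.d. N(0,1) noise, the matrix H is the scalar -1, and in fact x_n collapses exactly to the sample mean of the noises, so sqrt(n) x_n is asymptotically N(0, 1):

```python
import math
import random
import statistics

random.seed(0)

def run(n):
    """One path of x_{k+1} = x_k + (1/k) * (-x_k + eps), eps ~ N(0, 1)."""
    x = 5.0                      # arbitrary initial value; killed at the first step
    for k in range(1, n):
        x += (1.0 / k) * (-x + random.gauss(0.0, 1.0))
    return x

n, reps = 1000, 1000
samples = [math.sqrt(n) * run(n) for _ in range(reps)]
m = statistics.fmean(samples)
v = statistics.variance(samples)
print(m, v)  # sample mean near 0 and sample variance near 1
```

Here the limiting covariance S reduces to the scalar 1, matching the Monte Carlo variance of the normalized estimates.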
Remark 3.3.1 Under the conditions of Theorem 3.3.1, if integers
are such that then it can be
shown that converges in distribution to
where is a stationary Gaussian Markov process satisfying
the following stochastic differential equation
where is the standard Wiener process.
Corollary 3.3.1 From (3.1.7) and (3.3.28), similar to (3.3.29)–(3.3.31) we have
and
By (3.3.33), the first term on the right-hand side of (3.3.50) tends to zero as Note that the last term in (3.3.34) has been proved to vanish as and it is just a different way of writing Therefore, from (3.3.50) by Theorem 3.3.1, it follows that for any fixed
We have discussed the asymptotic normality of for the case
where is linear. We now consider the general Let us first introduce the conditions to be used.
and
A3.3.2 A continuously differentiable function exists such that
for any and for some with
where is used in (2.1.1).
for some
where is a martingale difference sequence satisfying (3.3.21)–
(3.3.23).
A3.3.3
A3.3.4 is measurable and locally bounded. As
where with a specified in (3.3.52) is stable and
satisfying which is specified in (3.3.53).
Theorem 3.3.2 Let be given by (2.1.1)–(2.1.3) and let A3.3.1–A3.3.4 hold. Then
where
Proof. Since there exists such that
which implies From (3.3.53) it follows that
This together with the convergence theorem for martingale difference sequences yields
which implies
Since from it follows that Stability of is implied by stability of which is a part of A3.3.4. Then by Theorem 3.1.1
By (3.3.55) and (3.3.58) we have
From Theorem 3.1.1 we also know that there is an integer-valued (possibly depending on sample paths) such that and there is no truncation in (2.1.1) for Consequently, for we have
Denoting
by (3.3.59) and (3.3.54) we see a.s. Then (3.3.60) is written as
By (3.3.28) it follows that
where
Using introduced by (3.3.32), we find
By an argument similar to that used in Corollary 3.3.1, we have
and as
Then by (3.3.51) from (3.3.63) we conclude (3.3.56).
Corollary 3.3.2 Let D be an matrix and let in (2.1.1)–(2.1.2) be replaced by In other words, instead of (2.1.1) and (2.1.2), if we consider
then this is equivalent to replacing and by and respectively.
In this case the only modification to be made in the conditions of Theorem 3.3.2 is that stability of in A3.3.4 should be replaced by stability of The conclusion of Theorem 3.3.2 remains valid with the only modification that and F in (3.3.57) should be replaced by and DF, respectively.
3.4. Asymptotic Efficiency
In Corollary 3.3.2 we have mentioned that the limiting covariance matrix S(D) for depends on D, if in (2.1.1)–(2.1.3) is replaced by By efficiency we mean that S(D) reaches its minimum with respect to D.
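The answer is classical; in the standard formulation (with generic symbols of ours: $F$ the stable matrix of A3.3.4 and $S_{0}$ the limiting covariance of the noise, not necessarily the book's exact notation) the minimum is attained when $DF = -I$, i.e., $D = -F^{-1}$, and

```latex
S^{*} \;=\; F^{-1} S_{0} \bigl(F^{-1}\bigr)^{\top} \;\le\; S(D)
\qquad \text{for every admissible } D .
```

Computing the optimal D requires knowledge of F; the averaging approach developed in this section attains the same minimal covariance without that knowledge.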
Denote
In what follows we will show that is asymptotically normal and is asymptotically efficient.
We list the conditions to be used.
A3.4.1 nonincreasingly converges to zero,
and for some
A3.4.2 A continuously differentiable function exists such that
for any and for some with
where is used in (2.1.1).
A3.4.3 The observation noise is such that
with being a constant independent of and
where is specified in (3.4.7).
A3.4.4 is measurable and locally bounded. There exist a stable ma-
trix F, and such that
where is a constant.
Remark 3.4.1 It is clear that satisfies A3.4.1. From (3.4.7) it follows that
where denotes the integer part of
Since is nonincreasing, from (3.4.12) we have
which implies
or
Remark 3.4.2 If with being a martingale
difference sequence satisfying (3.3.21)–(3.3.23), then identifying to
in Lemma 3.3.1, by this lemma we have
where is given by (3.4.1). Thus, in this case the second condition in(3.4.8) holds.
We now show that the first condition in (3.4.8) holds too. By the estimate for the weighted sum of martingale difference sequences (see Appendix B) we have
which incorporating with (3.4.13) yields
It is clear that (3.4.9) is implied by (3.3.21). Therefore, in the present case all requirements in A3.4.3 are satisfied.
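The distinctive feature of slowly decreasing step sizes (e.g., a_k = k^{-alpha} with alpha in (1/2, 1), an assumed instance consistent with A3.4.1) is that the reciprocal increments 1/a_{k+1} - 1/a_k tend to zero, whereas for the classical choice a_k = 1/k they are identically 1. A quick numerical check:

```python
def recip_gap(a, k):
    """Return 1/a(k+1) - 1/a(k) for a step-size function a."""
    return 1.0 / a(k + 1) - 1.0 / a(k)

def slow(k):
    return k ** (-2.0 / 3.0)   # slowly decreasing: alpha = 2/3

def fast(k):
    return 1.0 / k             # classical choice

for k in (10, 1000, 1000000):
    print(k, recip_gap(slow, k), recip_gap(fast, k))
# the gap for the slow step size tends to 0; for a_k = 1/k it equals 1
```

This vanishing reciprocal gap is what the proofs below exploit through estimates such as (3.4.17) and (3.4.18).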
Theorem 3.4.2 Assume A3.4.1–A3.4.4 hold. Let be given by
(2.1.1)–(2.1.3) and be given by (3.4.5). Then is asymptotically efficient:
Prior to proving the theorem we establish some properties of the slowly decreasing step size.
Set
By (3.1.8) we have
where and are constants.
Set
Lemma 3.4.1 i) The following estimate takes place
where o(1) denotes a magnitude that tends to zero as
ii) is uniformly bounded with respect to both and
and
Proof. i) By (3.4.6) we know that
and
which implies (3.4.17) since as
ii) By (3.4.6) as and hence for any we have
where denotes the integer part of Using (3.4.15) we have
for any where the first term on the right-hand side tends to zero as by (3.4.20), and the last term tends to zero as Therefore, for (3.4.18) it suffices to show
Noticing that (3.4.13) implies for any we have
Lemma 3.4.2 Under Conditions A3.4.1–A3.4.4, there exists an integer-valued such that a.s., a.s., and given by (2.1.1)–(2.1.3) has no truncation for i.e.,
and a.s.
Proof. If we can show that A2.2.3 is implied by A3.4.3, then all conditions of Theorem 2.2.1 are fulfilled a.s., and the conclusions of the lemma follow from Theorem 2.2.1.
Since we have
which means that (2.2.2) is satisfied for
We now check (2.2.2) for By a partial summation we have
where (3.4.6) is used and as
By (3.4.8) the first two terms on the right-hand side of (3.4.34) tend to zero as by the same reason and by the fact the last term of (3.4.34) also tends to zero as This means that satisfies (2.2.2), and the lemma follows.
By Lemma 3.4.2 we have
and by (3.4.14)
For specified in (3.4.11) and a deterministic integer define the
stopping time as follows
From (3.4.35) we have
and
Lemma 3.4.3 If A 3.4.1-A3.4.4 hold, then
is uniformly bounded with respect to
Proof. By (3.4.11) and (3.4.15) from (3.4.39) we have
where respectively denote the terms on the right-hand side of the inequality in (3.4.40).
By (3.4.19) we see
where as From this we find that is bounded in if is large enough so that
By (3.4.19) we estimate as follows:
where is assumed to be large enough such that
Thus, by (3.4.9)
We now pay attention to (3.3.10) in the proof of Lemma 3.3.2 and find that the right-hand side of (3.4.42) is bounded with respect to
For by (3.4.19) and (3.4.10) we have
where is a constant. Again, by (3.3.10), is bounded in It remains to estimate By the Schwarz inequality we have
By (3.4.19), for large enough
which, as shown by (3.3.11), is bounded in We then by (3.4.37) have
where is a constant.
Combining (3.4.40)–(3.4.44) we find that there exists a constant such that
Setting
and
from (3.4.45) we have
where is a constant. Denoting
from (3.4.48) we find
where is set equal to 1.
From (3.4.48) and (3.4.50) it then follows that
which combined with (3.4.46) leads to
where for the last equality we have used (3.4.47). Choosing sufficiently small so that
from (3.4.51) we then have
which is bounded with respect to as shown by (3.3.10).
Lemma 3.4.4 If A3.4.1-A3.4.4 hold, then
Proof. It suffices to prove
Then the lemma follows from (3.4.53) by using the Kronecker lemma. By (3.4.11) and (3.4.37) we have
where the last inequality follows by using the Lyapunov inequality.
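The Kronecker lemma invoked above states that if b_n increases to infinity and the series of x_n / b_n converges, then (1/b_n) times the partial sum of x_1, ..., x_n tends to zero. A quick illustration of ours, with the hypothetical choices b_n = n and x_n = (-1)^n, so that the weighted series converges as an alternating series:

```python
partial = 0.0    # x_1 + ... + x_n
series = 0.0     # x_1/1 + ... + x_n/n, a convergent alternating series
N = 100000
for n in range(1, N + 1):
    x = 1.0 if n % 2 == 0 else -1.0   # x_n = (-1)^n
    partial += x
    series += x / n
avg = partial / N
print(series, avg)  # series near -log 2, while the Kronecker average is near 0
```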
By Lemma 3.4.2, a.s. and
Consequently,
where as
Noticing we have
and hence
By (3.4.16) and (3.4.57), from here we derive
By Lemma 3.4.1, is bounded. Then with the help of (3.4.58) we have
From (3.4.58) and the boundedness of there exists a constant such that
Then, we have
where the convergence to zero a.s. follows from Lemma 3.4.4. Putting (3.4.59) and (3.4.61) into (3.4.56) leads to
By (3.4.58) we then have
Notice that
Let us denote by the upper bound for where the existence of is guaranteed by Lemma 3.4.1. Then using (3.4.9) and (3.4.18) we have
This incorporating with (3.4.8) implies the conclusion of the theorem.
This theorem tells us that if in (2.1.1)-(2.1.3) we apply the slowly
decreasing step size, then the averaged estimate leads to the minimal
covariance matrix of the limit distribution.
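The scheme can be sketched in a minimal scalar form (our own illustration; the regression function f(x) = -x with root 0, the noise level, and the exponent 2/3 are assumptions): run the recursion with the slow step a_k = k^{-2/3} and output the running average of the iterates.

```python
import random

random.seed(1)

def averaged_sa(n, alpha=2.0 / 3.0):
    """SA with slow step a_k = k^{-alpha} plus averaging of the iterates."""
    x, xbar = 5.0, 0.0
    for k in range(1, n + 1):
        y = -x + random.gauss(0.0, 1.0)   # noisy observation of f(x) = -x
        x += k ** (-alpha) * y
        xbar += (x - xbar) / k            # running average (1/n) sum of x_k
    return x, xbar

x_n, xbar_n = averaged_sa(200000)
print(x_n, xbar_n)  # both near the root 0; the average has the smaller asymptotic variance
```

The raw iterate fluctuates on the scale of the slow step, while the averaged estimate attains the n^{-1/2} scale with the minimal limiting covariance.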
3.5. Notes and References
Convergence rates and asymptotic normality can be found in [28, 68, 78] for the nondegenerate case. The rate of convergence for the degenerate case was first considered by Pflug in [74]. The results presented in Section 3.2 are given in [15, 47].
For the proof of the central limit theorem (Lemma 3.3.1) we refer to [6, 56, 78], while for Remark 3.3.1 we refer to [78]. The proofs of Theorems 3.3.1 and 3.3.2 can be found in [28].
Asymptotic normality of the stochastic approximation algorithm was first considered in [44].
For asymptotic efficiency the averaging technique was introduced in [80, 83], and further considered in [35, 59, 66, 67, 74, 98]. The theorems given in Section 3.4 can be found in [13]. For adaptive stochastic approximation refer to [92, 95].
Chapter 4
OPTIMIZATION BY STOCHASTIC
APPROXIMATION
Up to now we have been concerned with finding roots of an unknown function observed with noise. In applications, however, one often faces the optimization problem, i.e., finding the minimizer or maximizer of an unknown function It is well known that achieves its maximum or minimum values at the root set of its gradient, i.e., at although possibly only in the local sense. The gradient is also written as
If the gradient can be observed with or without noise, then the optimization problem is reduced to the SA problem we have discussed in
previous chapters. Here, we are considering the optimization problem for
the case where the function itself rather than its gradient is observed
and the observations are corrupted by noise. This problem was solved by the classical Kiefer-Wolfowitz (KW) algorithm, which takes finite differences as estimates of the partial derivatives. To be precise,
let be the estimate at time for the minimizer (maximizer) of and let
be two observations on at time with noises and
respectively, where
are two vectors perturbed from the estimate by and respectively, on the component of The KW algorithm suggests taking
151
the finite difference
as the observation of the component of the gradient It is clear that
where the component of equals
The RM algorithm
with defined above is called the KW algorithm.
It is understandable that in the classical theory for convergence of the KW algorithm rather restrictive conditions are imposed not only on but also on and Besides, at each iteration observations are needed to form the finite differences, where is the dimension of In some problems may be very large; for example, in the problem of optimizing the weights of a neural network, corresponds to the number of nodes, which may be large. Therefore, it is of interest not only to weaken the conditions required for convergence of the optimizing algorithm but also to reduce the number of observations per iteration.
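The classical KW iteration just described can be sketched as follows (our own illustration; the quadratic test function, the noise level, and the gains a_k = 1/k, c_k = k^{-1/4} are assumptions). Each coordinate consumes two noisy evaluations of L, so in this two-sided form one iteration in dimension l costs 2l observations:

```python
import random

random.seed(2)

def L(x):
    """Hypothetical test function to minimize; minimizer (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

def kw_step(x, k, sigma=0.1):
    """One KW step: finite differences built from noisy evaluations of L."""
    a_k = 1.0 / k           # step size
    c_k = k ** (-0.25)      # finite-difference width
    g = []
    for i in range(len(x)):
        xp = list(x); xp[i] += c_k
        xm = list(x); xm[i] -= c_k
        yp = L(xp) + random.gauss(0.0, sigma)   # noisy observation at x + c_k e_i
        ym = L(xm) + random.gauss(0.0, sigma)   # noisy observation at x - c_k e_i
        g.append((yp - ym) / (2.0 * c_k))       # estimate of the i-th partial derivative
    return [xi - a_k * gi for xi, gi in zip(x, g)]

x = [0.0, 0.0]
for k in range(1, 5001):
    x = kw_step(x, k)
print(x)  # approaches the minimizer (1, -2)
```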
In Section 4.1 the KW algorithm with expanding truncations using randomized differences is considered. As will be shown, because finite differences are replaced by randomized differences, the number of observations is reduced from to 2 per iteration, and because expanding truncations are involved in the algorithm and the TS method is applied for convergence analysis, the conditions needed for have been weakened significantly and the conditions imposed on the noise have been improved to the weakest possible. The convergence rate and asymptotic normality for the KW algorithm with randomized differences and
expanding truncations are given in Section 4.2.
The KW algorithm, like other gradient-based optimization algorithms, may get stuck at a local minimizer (or maximizer). How to approach the global optimizer is one of the important issues in optimization theory. In particular, how to reach the global optimizer path-wise is a difficult and challenging problem. In Section 4.3 the KW algorithm is combined with a search over initial values, and it is shown that the resulting algorithm a.s. converges to the global optimizer of the unknown function
The obtained results are then applied to some practical problems in Section 4.4.
4.1. Kiefer-Wolfowitz Algorithm with Randomized Differences
There is a fairly long history of random search and approximation ideas in SA. Different random versions of the KW algorithm have been introduced: for example, in one version a sequence of random unit vectors that are independent and uniformly distributed on the unit sphere or unit cube was used; in another version the KW algorithm with random directions was introduced and was called a simultaneous perturbation stochastic approximation algorithm.
Here, we consider the expandingly truncated KW algorithm with randomized differences. The conditions needed for convergence of the proposed algorithm are considerably weaker than existing ones.
Conditions on
Let be a sequence of independent andidentically distributed (iid) random variables such that
Furthermore, let be independent of the algebra generated by
is the observation noise to be explained later. For convenience of writing, let us denote
It should be emphasized that is a vector and does not denote an inverse. At each time two observations are taken: either
or
where is the estimate for the sought-for minimizer (maximizer) of denote the observation noises, and is a real number.
and
may serve as observations of randomized differences.
To be specific, let us consider the observations defined by (4.1.3) and (4.1.4). The convergence analysis, however, can be carried out analogously for the observations (4.1.5) and (4.1.6).
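The construction can be sketched as follows (our own minimal illustration in the spirit of (4.1.3)-(4.1.8); the Bernoulli plus/minus-one perturbations, the test function, and the gains are assumptions). Both perturbed points share one random vector, so only two observations are taken per iteration regardless of the dimension; dividing by the i-th component of the perturbation is the componentwise "inverse" mentioned above, not a matrix inverse. The expanding truncations (4.1.11)-(4.1.12) are omitted for brevity:

```python
import random

random.seed(3)

def L(x):
    """Hypothetical test function to minimize; minimizer (1, -2, 0)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + x[2] ** 2

def rd_step(x, k, sigma=0.1):
    """One KW step with randomized differences: two observations, any dimension."""
    a_k = 1.0 / k
    c_k = k ** (-0.25)
    delta = [random.choice((-1.0, 1.0)) for _ in x]   # iid Bernoulli +-1 components
    yp = L([xi + c_k * d for xi, d in zip(x, delta)]) + random.gauss(0.0, sigma)
    ym = L([xi - c_k * d for xi, d in zip(x, delta)]) + random.gauss(0.0, sigma)
    # i-th gradient estimate: (y+ - y-) / (2 c_k delta_i); here 1/delta_i = delta_i
    g = [(yp - ym) / (2.0 * c_k) * d for d in delta]
    return [xi - a_k * gi for xi, gi in zip(x, g)]

x = [0.0, 0.0, 0.0]
for k in range(1, 20001):
    x = rd_step(x, k)
print(x)  # approaches the minimizer (1, -2, 0)
```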
Thus, the observations considered in the sequel are
where
We now define the KW algorithm with expanding truncations and
randomized differences. Let be a sequence of positive numbers increasingly diverging to infinity, and let be a fixed point in Given any initial value the algorithm is defined by:
where is given by (4.1.9) and (4.1.10).
It is worth noting that the algorithm (4.1.9)–(4.1.12) differs from (2.1.1)–(2.1.3) only in the observations As a matter of fact, (4.1.11) and (4.1.12) are exactly the same as (2.1.1) and (2.1.2), but (4.1.9) and
Remark 4.1.2 If is the unique minimizer of then in (4.1.11) and (4.1.12) should be replaced by
Theorem 4.1.1 Assume A4.1.1, A4.1.2, and Conditions on hold.
Let be given by (4.1.9)–(4.1.12) (or (4.1.11)–(4.1.14)) with any initial value. Then
if and only if for each the random noise given by (4.1.10) can be
decomposed into the sum of two terms in ways such that
with
and
where is given in Conditions on
Proof. We will apply Theorem 2.2.1 for sufficiency and Theorem 2.4.1 for necessity.
Let us first check Conditions A2.2.1–A2.2.4. Condition A2.2.1 is a part of A4.1.1. Condition A2.2.2 is automatically satisfied if we take noticing that in the present case. Condition A2.2.4 is contained in A4.1.2. So, the key issue is to verify that given by (4.1.14) satisfies the requirements.
Let and be vector functions obtained from with some of its components replaced by zero:
It is clear that
and
For notational convenience, let denote a generic random vector such that
where is specified in (4.1.1), and may vary for different applications.
We express given by (4.1.14) in an appropriate form to be dealt with. We mainly use the local Lipschitz-continuity to treat the structural error (4.1.15) in
Rewrite the component of the structural error as follows
and for any express
where on the right-hand side of the equality all terms are cancelled except the first and the last terms, and in each difference of L, the arguments of L differ from each other only by one
We write (4.1.25) in the compact form:
Applying Taylor's expansion to (4.1.26) we derive
where
Similarly, we have
and
where
Define the following vectors:
Finally, putting (4.1.27)–(4.1.35) into (4.1.14) we obtain the following expression for
It is worth noting that each component of and is a martingale difference sequence, because both and are independent of
For the sufficiency part we have to show that (2.2.2) is satisfied a.s. Let us show that (2.2.2) is satisfied by all components of and
For components of we have for any
since by (4.1.1), and as Therefore, for any integer N
for any such that converges.
Thus, all sample paths of components of satisfy (2.2.2). Completely the same situation takes place for the components of
and
By the convergence theorem for martingale difference sequences, we find that for any integer N
This is because is independent of and is bounded by a constant uniformly with respect to by the Lipschitz-continuity of Then the martingale convergence theorem applies, since for some by A4.1.1.
A similar argument can be applied to the components of Since for any integer N (4.1.38) holds outside an exceptional set with probability zero, there is an with such that for any
and
for all and N = 1,2, ….
Therefore, for all and any integer N
where is given by (1.3.2).
From (4.1.17) and (4.1.18) it follows that there exists such that and for each
and hence
Combining (4.1.41) and (4.1.42), we find for each
This means that for the algorithm (4.1.11)–(4.1.14), Condition A2.2.3 is satisfied on Thus by Theorem 2.2.1, on This proves the sufficiency part of the theorem.
Under the assumption a.s., it is clear that both and converge to zero a.s., and (4.1.39) and (4.1.40) become
and
Then the necessity part of the theorem follows from Theorem 2.4.1. We show this. By Theorem 2.4.1, can be decomposed into two parts
and such that and Let us
denote by the component of a vector Define
Then for
and
From (4.1.43) and (4.1.36) it follows that
This together with (4.1.44) and (4.1.45) proves the necessity part of the theorem.
Theorem 2.4.1 gives a necessary and sufficient condition on the observation noise in order that the KW algorithm with expanding truncations and randomized differences converge to the unique maximizer of a function L. We now give some simple sufficient conditions on
Theorem 4.1.2 Assume A4.1.1 and A4.1.2 hold. Further, assume that
is independent of
and satisfies one of the following two conditions:
i) where is a random variable;
ii) Then
where is given by (4.1.9)–(4.1.12).
Proof. It suffices to prove (4.1.16)–(4.1.18). Assume i) holds. Let be given by
By definition, is independent of and so
and
A2.2.3 is satisfied as shown in Theorems 4.1.1 and 4.1.2. Then the conclusion of the theorem follows from Theorem 2.2.2.
Remark 4.1.3 In the multi-extreme case, the necessary conditions on
for convergence can also be obtained by analogy with Theorem 2.4.2.
Remark 4.1.4 Conditions i) and ii) used in Theorem 4.1.2 are indeed simple. However, in Theorem 4.1.2 is required to be independent of This may not be satisfied if the observation noise is state-dependent. Taking into account that is the observation noise when observing at and we see that depends on and if the observation noise is state-dependent. In this case, does depend on This violates the independence assumption made in Theorem 4.1.2.
Consider the case where the observation noise may depend on the locations of measurement, i.e., in lieu of (4.1.3) and (4.1.4) consider
Introduce the following condition.
A4.1.3 Both and are measurable functions
and are martingale difference sequences for any and
for p specified in A4.1.1 with
where is a family of nondecreasing independent of both
and
Theorem 4.1.4 Let be given by (4.1.9)–(4.1.12) with a given initial value Assume A4.1.1, A4.1.2', and A4.1.3 hold. Then
where is a connected subset of
Proof. Introduce the generated by and i.e.,
It is clear that is measurable with respect to and hence are Both and are
Approximating and by simple functions, it is seen that
Therefore, and aremartingale difference sequences, and
where
Hence, is a martingale difference sequence with
Noticing is bounded and as by (4.1.50) and
(4.1.51) and the convergence theorem for martingale difference sequences, we have, for any integer N > 0
This together with (4.1.37) with replaced by (4.1.39), and (4.1.40) verifies that expressed by (4.1.36) satisfies A2.2.3. Then the conclusion of the theorem follows from Theorem 2.2.2.
Remark 4.1.5 If J consists of a singleton then Theorems 4.1.3 and 4.1.4 ensure a.s. If J is composed of isolated points, then
theorems ensure that converges to some point in J . However, the
limit is not guaranteed to be a global minimizer of Depending on the initial value, may converge to a local minimizer. We will return to this issue in Section 4.3.
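Since most of the displayed formulas are not reproduced here, the following Python sketch is only a hedged illustration of the kind of algorithm treated in this section: a KW iteration with two-sided randomized differences and expanding truncations. The test function, the step sizes, the differencing magnitudes, and the truncation bounds below are all illustrative assumptions, not the text's exact (4.1.9)–(4.1.12).

```python
import numpy as np

def kw_randomized_differences(L, x0, n_iter=2000, a=1.0, c=1.0,
                              delta=0.25, noise=0.0, seed=0):
    """Hedged sketch of a Kiefer-Wolfowitz iteration with two-sided
    randomized differences and expanding truncations.  Every tuning
    choice here is an illustrative assumption, not the text's exact
    (4.1.9)-(4.1.12)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    sigma = 0  # number of truncations performed so far
    for n in range(1, n_iter + 1):
        a_n = a / n               # step sizes with divergent sum
        c_n = c / n ** delta      # differencing magnitudes, c_n -> 0
        # Rademacher perturbation direction (the "randomized difference")
        Delta = rng.choice([-1.0, 1.0], size=x.shape)
        y_plus = L(x + c_n * Delta) + noise * rng.standard_normal()
        y_minus = L(x - c_n * Delta) + noise * rng.standard_normal()
        # two-sided randomized-difference estimate of the gradient
        g = (y_plus - y_minus) / (2.0 * c_n) / Delta
        x_next = x - a_n * g
        if np.linalg.norm(x_next) > 10.0 * (sigma + 1):
            # expanding truncation: reset to the initial value and
            # enlarge the truncation bound
            x_next = np.asarray(x0, dtype=float)
            sigma += 1
        x = x_next
    return x
```

For the quadratic L(x) = ||x||² the estimate g is unbiased for the true gradient 2x and the iterates approach the minimizer, with only finitely many truncations occurring in this toy run.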
4.2. Asymptotic Properties of KW Algorithm
We now present results on the convergence rate and asymptotic normality
of the KW algorithm with randomized differences.
Theorem 4.2.1 Assume hypotheses of Theorem 4.1.2 or Theorem 4.1.4
with and that
for some and as
where is stable and and are specified in (4.2.1) and (4.2.2),
respectively.
Then given by (4.1.9)–(4.1.12) satisfies
Proof. First of all, under the conditions of Theorem 4.1.2 or 4.1.4, By Theorem 3.1.1 it suffices to show that given by (4.1.36) can be represented as
where
From (4.1.28) and (4.1.31), by the local Lipschitz continuity of it follows that
by (4.2.2). Since it follows that
Since and given by (4.1.27) and (4.1.32) are uniformly bounded for for each
where converges. By the convergence theorem for martingale difference sequences it follows that
where and are given by (4.1.35).
In the proof of Theorem 4.1.2, replacing by and using (4.2.2),
the same argument leads to
Then by defining
we have shown (4.2.4) under the hypotheses of Theorem 4.1.2.
Under the hypotheses of Theorem 4.1.4 we have the same conclusions
about and as before. We need only to show (4.2.5). But
this follows from (4.1.52) with replaced by and the convergence
Remark 4.2.1 Let be given by (4.1.9)–(4.1.12). If and
with then conditions (4.2.1) and (4.2.2) are satisfied.
Theorem 4.2.2 Assume A4.1.1 and A4.1.2 hold and that i) and for some
ii) for some c > 0 and
iii) is stable and for some
iv) given by (4.1.10) is an MA process:
for
and
where are real numbers and is a martingale
difference sequence which is independent of and satisfies
Then
where and
Proof. Since it follows that and
By assumption is independent of and hence is independent of Then by (4.2.11) and the convergence theorem for martingale difference sequences we obtain (4.2.5). By Theorem 4.2.1 we have as
and after a finite number of iterations of (4.1.11), say, for there are no more truncations.
Since and is stable, it follows that
Let be given by
By (4.1.11), (4.1.13), (4.1.36), and condition ii) it follows that for
Let be given by
where
Since is stable, by (3.1.8) it follows that there are constants
and such that
Noticing where because by condition iii), we have
where respectively denote the five terms on the right-hand side of the first equality of (4.2.19).
By (4.2.18),
By Lemma 3.3.2, because and By (4.1.28) and (4.1.3) it follows that and hence
by i) and (4.2.18)
where is a constant.
By Lemma 3.3.2 and the right-hand side of (4.2.20) tends to zero a.s. as
To estimate let us consider the following linear recursion
By (4.2.17) it follows that
By (4.2.11), Since and
Then by the convergence theorem for martingale difference sequences it follows that
i.e.,
Similarly,
Applying Lemma 3.1.1, we find that From (4.2.22),
it follows that
Since is an MA process driven by a martingale difference sequence
satisfying (4.2.6),
By an argument similar to that used for (4.2.21) and (4.2.22), from
Lemma 3.1.1 it follows that
Therefore, putting all these convergence results into (4.2.19) yields
By (3.3.37),
where is given by (4.2.10). By (4.2.18), from (4.2.23) and (4.2.24)
it follows that which together with the definition
(4.2.14) for proves the theorem.
Example 4.2.1 The following example of and satisfies Conditions i) and iii) of Theorem 4.2.2:
In this example, and
Remark 4.2.2 Results in Sections 4.1 and 4.2 are proved for the case where the two-sided randomized differences
are used, where and are given by (4.1.3) and (4.1.4), respectively.
But all results presented in Sections 4.1 and 4.2 are also valid for the case where the one-sided randomized differences
are used, where and are given by (4.1.3) and (4.1.6), respectively.
In this case, in (4.1.27), (4.1.28) and in the expression of should
be replaced by 1, and (4.1.29)–(4.1.32) disappear. Accordingly, (4.1.36)
changes to
Theorems 4.1.1-4.1.4 and 4.2.1 remain unchanged. The conclusion of
Theorem 4.2.2 remains valid too, if in Condition iv)
changes to
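As a hedged numerical illustration of this remark (the quadratic test function, the Rademacher perturbation, and all tuning constants below are assumptions for the sketch, not the text's exact (4.1.3)–(4.1.6)), the two-sided and one-sided randomized-difference gradient estimates can be compared as follows:

```python
import numpy as np

def grad_estimates(L, x, c, rng):
    """One draw of the two-sided and one-sided randomized-difference
    gradient estimates at x, using a Rademacher perturbation Delta."""
    Delta = rng.choice([-1.0, 1.0], size=x.shape)
    y_plus = L(x + c * Delta)      # observation at the perturbed point
    y_minus = L(x - c * Delta)     # used only by the two-sided version
    y_center = L(x)                # used only by the one-sided version
    two_sided = (y_plus - y_minus) / (2.0 * c) / Delta
    one_sided = (y_plus - y_center) / c / Delta
    return two_sided, one_sided

rng = np.random.default_rng(1)
L = lambda v: float(v @ v)         # true gradient is 2v
x = np.array([1.0, -2.0, 0.5])
draws = [grad_estimates(L, x, 0.01, rng) for _ in range(20000)]
avg_two = np.mean([d[0] for d in draws], axis=0)
avg_one = np.mean([d[1] for d in draws], axis=0)
```

Both averages approach the true gradient 2x; the one-sided variant needs one fewer fresh observation per iteration, typically at the price of a larger variance.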
4.3. Global Optimization
As pointed out at the beginning of the chapter, the KW algorithm may lead to a local minimizer of Before the 1980s, random search or its combination with a local search method was the main stochastic approach to achieving the global minimum when the values of L can be observed exactly, without noise. When the structural properties of L are used for local search, a rather rapid convergence rate can be derived, but it is hard to escape a local attraction domain. Random search has a chance to fall into any attraction domain, but its convergence rate
decreases exponentially as the dimension of the problem increases.
Simulated annealing is an attractive method for global optimization,
but it provides only convergence in probability rather than path-wise
convergence. Moreover, simulation shows that for functions with a few
local minima, simulated annealing is not efficient. This motivates one to combine the KW-type method with random search. However, a simple combination of SA and random search does not work: in order to reach
the global minimum one has to reduce the noise effect as time goes on.
A hybrid algorithm composed of a search method and the KW algorithm is presented in the sequel, with the main effort devoted to designing easily realizable switching rules and to providing an effective noise-reducing method.
We define a global optimization algorithm, which consists of three parts: search, selection, and optimization. To be specific, let us discuss the global minimization problem. In the search part, we choose an initial value and carry out the local search by the KW algorithm with randomized differences and expanding truncations described in Section 4.1 to approach the bottom of the local attraction domain. At the same time, the average of the observations of L serves as an estimate of the local minimum of L in this attraction domain. In the selection part, the estimates obtained for the local minima of L are compared with each other, and the smallest among them, together with the corresponding minimizer given by the KW algorithm, is selected. Then the optimization part takes place, where the local search is carried out again, i.e., the KW algorithm without any truncations is applied to improve the estimate for the minimizer. At the same time, the corresponding minimum of L is reestimated by averaging the noisy observations. After this, the algorithm returns to the search part.
For the local search, we use observations (4.1.3) and (4.1.4), or (4.1.5) and (4.1.6). To be specific, let us use (4.1.5) and (4.1.6).
In the sequel, by the KW algorithm with expanding truncations we mean the algorithm defined by (4.1.11) and (4.1.12) with
where and are given by (4.1.5) and (4.1.6), respectively. Similar to (4.1.9) and (4.1.10) we have
where
By KW algorithm we mean
with defined by (4.3.2). It is worth noting that unlike (4.1.8), is used in (4.3.1).
Roughly speaking, this is because in the neighborhood of a minimizer of is increasing, and in (4.1.11) should be an observation on
In order to define the switching rules, we introduce integer-valued increasing functions and such that and
Define
In the sequel, by the search period we mean the part of the algorithm starting from the test for selecting the initial value up to the next selection of the initial value. At the end of the search period, we are given and being the estimates for the global minimizer and the minimum of L, respectively. Variables such as
and etc. in the search period are equipped with the superscript etc.
The global optimization algorithm is defined by the following five steps.
(GO1) Starting from at the search period, the initial value
is chosen according to a given rule (deterministic or random),
and then is calculated by the KW algorithm with expanding
truncations (4.1.11) and (4.1.12) with defined by (4.3.1), for which , step sizes and and used for truncation are defined as follows:
where c > 0 and are fixed constants, and and are two sequences of positive real numbers increasingly diverging to infinity.
(GO2) Set the initial estimate for and update the estimate for by
where is the noise when observing
After steps, is obtained.
(GO3) Let be a given sequence of real numbers such that and as Set For if
as
e.g.,
then set Otherwise, keep unchanged.
(GO4) Improve to by the KW algorithm with expanding
truncations (4.1.11) and (4.1.12) with defined by (4.3.1), for which
where in (4.1.11) and (4.1.12) may be an arbitrary sequence of numbers increasingly diverging to infinity, and
At the same time, update the estimate for by
where is the noise when observing At the end of this step, and are derived.
(GO5) Go back to (GO1) for the search period.
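Because the displayed formulas for (GO1)–(GO5) are not reproduced above, the following Python sketch is only a hedged, much-simplified rendering of the search/selection cycle. The double-well test function, the placeholder derivative, the step-size rule 1/(i + k), and the omission of the refinement step (GO4) are all assumptions made for the illustration.

```python
import numpy as np

def global_opt(L, initial_points, n_search=400, noise=0.2, seed=0):
    """Hedged, simplified sketch of the search/selection cycle
    (GO1)-(GO3); the refinement step (GO4) is omitted, and the
    step-size rule 1/(i + k) is a stand-in for (4.3.7)-(4.3.8),
    showing how the search-period index i in the denominator
    damps the observation noise from period to period."""
    rng = np.random.default_rng(seed)
    # placeholder derivative; the text would use randomized differences
    dL = lambda x: (L(x + 1e-4) - L(x - 1e-4)) / 2e-4
    x_best, L_best = None, np.inf
    for i, x in enumerate(initial_points, start=1):   # (GO1): new period
        L_avg = 0.0
        for k in range(1, n_search + 1):
            a_k = 1.0 / (i + k)       # period index i in the denominator
            y = L(x) + noise * rng.standard_normal()  # noisy value of L
            L_avg += (y - L_avg) / k                  # (GO2): averaging
            x = x - a_k * (dL(x) + noise * rng.standard_normal())
        if L_avg < L_best:            # (GO3): keep the smallest estimate
            x_best, L_best = x, L_avg
    return x_best, L_best

# Tilted double-well: global minimum near x = 2, local minimum near x = -0.9
L = lambda x: 0.25 * (x - 2.0) ** 2 * (x + 1.0) ** 2 - 0.5 * (x + 1.0)
x_star, _ = global_opt(L, initial_points=[-0.5, 0.5, 2.5])
```

Starting the search periods from several initial values lets the selection step pick the attraction domain of the global minimizer near x = 2 rather than the shallow local one near x = -0.9.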
We note that for the search period is added to and (see (4.3.7) and (4.3.8)). The purpose of this is to diminish the effect of the observation noise as increases. Therefore, and both tend to zero, not only as but also as The following example shows that adding an increasing to the denominators of and is necessary.
Example 4.3.1 Let
It is clear that the global minimizer is and are two local minima. Furthermore, and are attraction domains for –1 and +1, respectively.
Since is linear, for the local search we apply the ordinary KW algorithm without truncation
Here, no randomized differences are introduced, because this is a one-dimensional problem.
Assume
where
and and are mutually independent and both are sequences of iid random variables with
Let us start from (GO1) and take
(not tending to infinity),
If then, by noticing one of and must belong to Elementary calculation shows that
Paying attention to (4.3.13), we see
and
i.e.,
A4.3.2
A4.3.3 For any convergent subsequence of
where denotes given by (4.3.3) with replaced by
denotes used for the search period, and
A4.3.4 For any convergent subsequence
where is given by (1.3.2).
It is worth emphasizing that each in the sequence
is used only once when we form and
We now give sufficient conditions for A4.3.2, A4.3.3, and A4.3.4. For
this, we first need to define generated by the estimates and derived up to the current time. Precisely, for running in the search
period of Step (GO1) define
and for running in Step (GO4) define
Remark 4.3.1 If both sequences
and are martingale difference sequences with
and if
for some then A4.3.2 holds.
This is because
is a martingale difference sequence with bounded second conditional moment, and hence
which implies (4.3.15).
By using the second parts of conditions (4.3.22) and (4.3.23), (4.3.16)
can be verified in a similar way.
Remark 4.3.2 If and is independent of
and if there exists
such that then by the uncorrelatedness of
with for or
where
Assume, further, for fixed
Lemma 4.3.1 Assume L( J ) is nowhere dense, where
Let be a nonempty interval such that If there are
two sequences and such that
and is bounded, then it is impossible to have
where
Proof. Without loss of generality we may assume converges as
otherwise, it suffices to select a subsequence.
Assume the converse, i.e., that (4.3.28) holds. Along the lines of the proof
for Theorem 2.2.1 we can show that
for some constant M if is sufficiently large. As a matter of fact, this is
an analogue of (2.2.3). From (4.3.29) the following analogue of (2.2.15)
takes place:
and the algorithm for has no truncation for if is large enough, where is a constant. Similar to
(2.2.27), we then have
and
and
for some small T > 0 and all sufficiently large
From this, by (4.3.27) and convergence of it follows that
By continuity of and (4.3.30) we have
which implies that for small enough T.
Then by definition,
which contradicts (4.3.32). The obtained contradiction shows the impossibility of (4.3.28).
Introduce
such that
and
Lemma 4.3.2 Let be given by (GO1). Assume
A4.3.1 and A4.3.3 hold and for some Then
for any may occur infinitely often with
probability 0, i.e.,
Proof. Since L( J ) is nowhere dense, for any belonging to infinitely
many of there are subsequences such that
and
where and
By assumption as must be bounded.
Hence, is bounded. Without loss of generality we may assume
that is convergent.
Notice that at Step (GO1), is calculated according to (4.1.11)and (4.1.12) with given by (4.3.2) and (4.3.3), i.e.,
which differ from (4.1.11), (4.1.12), (4.3.2), and (4.3.3) by the superscript (i), which means the calculation is carried out in the search period.
By (4.1.27) with notations (4.1.33) and (4.1.34), equipped with the superscript we have
where
If we can show that and
where
then by Lemma 4.3.1, (4.3.42) contradicts the fact that all sequences
cross the interval which is disjoint with L ( J ) .
This then proves (4.3.36).
We now show for all sufficiently large if T is small enough.
Since and are finite, where
We now show that on the
if is sufficiently large and T is small enough.
Suppose the converse: for any fixed T > 0, there always exists
whatever large is taken such that
Since by continuity of there is a constant
q > 0 such that
For any let us estimate By
and the local Lipschitz continuity of it is seen that
is uniformly bounded with respect to and all Then by A4.3.3, it follows that there is a constant such that
From this it follows that there is no truncation for and
Let T be so small that
On the other hand, however, we have and The obtained contradiction shows for all sufficiently
large if T is small enough.
We now prove (4.3.42). Let us order in the following way
From (4.1.34) and the fact that is an iid sequence and is independent of the sums appearing in (4.1.34), it is easy to see that is a martingale difference sequence.
By the condition for some it is clear that for with being a constant. Then we have
By (4.1.28) and (4.3.8), we have
where is a constant. Noticing that for large and small T, by (4.3.44), (4.3.45), and A4.3.3 we may assume sufficiently large and T small enough such that
This will imply (4.3.42) if we can show
We prove (4.3.47) by induction.
We have by definition of Assume that
and by the convergence theorem for martingale difference sequences
and Then there is no truncation at time since by (4.3.46) (with chosen such that
if T in (4.3.46) is sufficiently small.
Then by (4.3.40), we have
and by (4.3.43) and (4.3.46)
for small T. This completes the induction, and (4.3.42) is proved, which, in turn, concludes the lemma.
Lemma 4.3.3 Assume A4.3.1–A4.3.3 hold. Further, assume that
for some and If there
exists a subsequence such that then
Proof. For any by Lemma 4.3.2 there exists such that for
any if By (GO2),
we have
Then by A4.3.2, there exists such that, for any
This implies the conclusion of the lemma by the arbitrariness of
Lemma 4.3.4 Assume A4.3.1–A4.3.3 hold, for
some and If subsequence is such that
then
where denotes the closure of L( J ) , and and are
given by (GO1) and (GO2) for the search period.
Proof. Since by A4.3.1, for (4.3.50) it is
seen that contains a bounded infinite subsequence, and hence, a
convergent subsequence (for simplicity of notation, assume
such that
Since there exists a such that
and hence
Define
It is worth noting that for any T > 0, is well defined for all
sufficiently large because and hence
We now show that
By the same argument as that used before, without loss of generality we may assume is convergent (otherwise, a convergent subsequence should be extracted) and thus
We have to show
as
By the same argument as that used for deriving (2.2.27), it follows that there is such that
which implies the correctness of (4.3.53).
From (4.3.53) it follows that
because, otherwise, we would have a subsequence with
such that and by (4.3.54)
for large However, by (2.2.15), so for small enough T > 0, (4.3.56) is impossible. This verifies (4.3.55).
We now show
Assume the converse, i.e.,
From (4.3.54) and (4.3.58) it is seen that for all sufficiently large the sequence
contains at least one crossing the interval with In other words, we are dealing with a sample path on which both (4.3.54) and (4.3.58) are satisfied. Thus, belongs to By Lemma 4.3.2, the set composed of such has probability zero. This verifies (4.3.57).
From (4.3.57) it follows that
for all sufficiently large
Notice that from the following elementary inequalities
by (4.3.5) it follows that
By definition of we write
By (4.3.59) and (4.3.61), noticing we have
because
By (4.3.55) and (4.3.61) we have
Since by (4.3.15), combining (4.3.62)–(4.3.64)
leads to
which completes the proof of the lemma.
Lemma 4.3.5 Let be given by (GO1)–(GO5). Assume that A4.3.1–
A4.3.4 hold, initial values selected in (GO1) are dense in an open
set U containing the set of global minima of for some and Then for any
Proof. Among the first search periods denote by the number of those search periods for which are reset to be i.e.,
Since L( J ) is not dense in any interval, there exists an interval such that So, for the lemma it suffices to prove
that cannot cross infinitely many times a.s.
If then after a finite number of steps, is generated by (GO4). By Lemma 4.3.1 the assertion of the lemma follows immediately. Therefore, we need only consider the case where
Denote by the search period for which a resetting happens, i.e., It is clear that by
In the case by (GO4) the algorithm generates a family
of consecutive sequences:
Let us denote the sequence by
and the corresponding sequence of the values of by
Let be sufficiently small such that
and which is possible because L( J ) is nowhere dense.
Since is dense in U, visits infinitely often. Assume
By Lemma 4.3.2
if is large enough.
Define
This means that the first resetting in or after the search period occurs in the search period.
We now show that there is a large enough such that the
following requirements are simultaneously satisfied:
i) implies
where is fixed;
ii) does not cross the intervals
and
iii)
iv)
v)
We first show ii)–v).
Since all three intervals indicated in ii) have an empty intersection with L( J ), by Lemma 4.3.1, ii) is true if S is large enough. It is clear
that iii) and iv) are correct for fixed and if is large enough, while v) is true because
For i) we first show that there are infinitely many for which
By (4.3.68) and (4.3.71) we have
Consider two cases.
1) There is no resetting in the search period. Then
and by (4.3.72) and (4.3.74) it follows that
By (4.3.70) and the definition of there exists at least one integer among such that
because, otherwise, we would have which contradicts (4.3.74).
By ii) we conclude that
and by (4.3.68) we also have (4.3.76).
From (4.3.76), by ii) does not cross for
Consequently,
This together with (4.3.70) implies that
and, in particular,
2) If there is a resetting in the search period, then
By (GO3) we then have
Noticing as we conclude that there are infinitely many for which (4.3.73) holds.
We now show that there is a such that
where lim sup is taken along those for which (4.3.73) holds.
Assume the converse: there is a subsequence of such that
Then by Lemma 4.3.4,
which contradicts (4.3.73). This proves (4.3.78), and also i). As a matter of fact, we have proved more than i): precisely, we have shown that there are infinitely many for which (4.3.73) holds, and for (4.3.73) implies the following inequality:
Let us denote by the totality of those for which (4.3.73) holds and What we have just proved is that contains infinitely
many if Consider a sequence By ii) it cannot cross the interval
This means that
Then by (4.3.70)
and by (GO3)
since is a search period with resetting.
Thus, we have shown that if then also belongs to Therefore, and
From here and (4.3.67) it follows that
Since may cross the interval only a finite number of times by Lemma 4.3.1. This completes the proof of the lemma.
Proof of Theorem 4.3.1.
By Lemma 4.3.5 the limit exists. By the arbitrariness of from (4.3.69) it follows that
By continuity of we conclude that
4.4. Asymptotic Behavior of the Global Optimization Algorithm
In the last section a global optimization algorithm combining the KW algorithm with a search method was proposed, and it was proved that the algorithm converges to the set of global minimizers, i.e., However, in the algorithm defined by (GO1)–(GO5), resettings are involved. The convergence by no means excludes the algorithm from resettings asymptotically. In other words, although it may still happen that
where is defined in Lemma 4.3.5, i.e., it may still be possible to have infinitely many resettings.
In what follows we will give conditions under which
In this case, the global optimization algorithm (GO1)–(GO5) asymptotically behaves like a KW algorithm with expanding truncations and randomized differences, because for large is purely generated by (GO4) without resetting.
A4.4.1 is a singleton, is twice continuously differentiable
in the ball centered at with radius for some and
of is positive definite.
A4.4.2 and ordered as in (4.3.20), (4.3.21), and Remark 4.3.1 are martingale difference sequences with
A4.4.3 is independent of
for and
and
for
We recall that is the observation noise in the search period.
A4.4.4 is independent of and where
denotes the observation noise when is calculated
in (GO4).
Lemma 4.4.1 Assume A4.4.2 holds and, in addition,
Then there exists an (maybe depending on ) such that for any
and
and
Proof. Notice that by A4.4.2 is a martingale
difference sequence with bounded conditional variance. By the convergence theorem for martingale difference sequences
gence theorem for martingale difference sequences
which implies (4.4.2).
Estimate (4.4.3) can be proved in a similar way.
Lemma 4.4.2 Assume A4.4.3 and A4.4.4 hold. If for some then
and
for where and are given in (4.1.34), where the superscript denotes the corresponding values in the ith search period.
Proof. Let us prove
Note that
is a martingale difference sequence with bounded conditional second moment. So, by the convergence theorem for martingale difference sequences, for (4.4.6) it suffices to show
By assumption of the lemma or and
for large The last inequality yields
and hence
Therefore,
Thus, (4.4.6) is correct. As noted in the proof of Lemma 4.3.2, is a martingale difference sequence. So, (4.4.4) is true.
Similarly, (4.4.5) is also verified by using the convergence theorem for
martingale difference sequences.
Lemma 4.4.3 In addition to the conditions of Theorem 4.3.1, suppose
that A4.4.1 and A4.4.3 hold, is positive definite, and
for some Then there exists a sufficiently large such
that, for if the inequality
holds for some with then the following inequality holds
Proof. By A4.4.1 and Taylor's expansion, we have
i.e.,
where
Therefore, for any there is a such that for any
and
where and denote the minimum and maximum eigenvalues of H, respectively, and o(1) is the one given in (4.4.10).
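Since the displayed expansion is not reproduced above, the following is a hedged reconstruction of the standard second-order Taylor argument it relies on; the symbols $x^0$ for the unique minimizer and $H$ for the Hessian there are assumptions based on the surrounding text.

```latex
% Around the minimizer x^0 the gradient vanishes, so that
L(x) - L(x^0) = \tfrac{1}{2}\,(x - x^0)^{\top} H\,(x - x^0) + o(\|x - x^0\|^{2}),
% hence for every \varepsilon > 0 there is a \delta > 0 with
\tfrac{1}{2}\bigl(\lambda_{\min}(H) - \varepsilon\bigr)\,\|x - x^0\|^{2}
  \le L(x) - L(x^0)
  \le \tfrac{1}{2}\bigl(\lambda_{\max}(H) + \varepsilon\bigr)\,\|x - x^0\|^{2}
% whenever \|x - x^0\| \le \delta.
```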
Since is the unique minimizer of and is continuous, there
is such that if We always assume that is large enough such that
and
where is used in (GO1). From (4.4.8) it then follows that and there is no truncation at time
Denote
For satisfying (4.4.8) and we have
where is given by (4.3.41).
By (4.4.11) it then follows that
where is given by (4.1.33) with the superscript denoting the search period and
By (4.4.14) it is clear that
Let
For (4.4.9) it suffices to show that
Assume the converse:
Let
By (4.4.20), for all
and hence,
Thus, (4.4.12)-(4.4.14) are applicable.
By (4.4.17) and the second inequality of (4.4.13), we have for
which combined with (4.4.21) yields
Applying the first inequality of (4.4.13) and then (4.4.20) leads to
Since for there is no truncation for
Using (4.4.18) we have
where
We now show that is negative for all sufficiently large
Let us consider the terms in By assumption,
from (4.4.19) and (4.4.22) it follows that
We now estimate the second term on the right-hand side of (4.4.25) after multiplying it by
From (4.4.4) and (4.4.16) it follows that
uniformly with respect to and with
Noticing that with being a constant,
and that which implies we find
Then, noticing that is bounded by some constant we have
For the third term on the right-hand side of (4.4.25), multiplying it by we have
where is a constant.
Finally, for the last term of (4.4.25) we have the following estimate
Combining (4.4.26)–(4.4.30) we find that
where
and for large
Consequently, from (4.4.25) it follows that
We now show that
by induction.
Assume it holds for i.e.,
which has been verified for We have to show it is true for
By (4.4.18) we have
and
where
Comparing (4.4.35) with (4.4.25), we find that in lieu of and
we now have and respectively. But for both cases we use the same estimate (4.4.27). Therefore, by exactly the same argument as (4.4.26)–(4.4.30), we can prove that
and for large
Thus, we have proved (4.4.32).
By the elementary inequality
for which is derived from
for any matrices A and B of compatible dimensions, we derive
from (4.4.32)
As mentioned before, for and there is no truncation. Then by (4.4.18)
where
Then from (4.4.36) and (4.4.27) it follows that
where
which tends to zero as by (4.4.27) and (4.4.38).
Then
where for the last equality (4.4.10) is used.Finally, by (4.4.21), for large from (4.4.39) it follows that
which together with (4.4.10) yields
This contradicts (4.4.20), the definition of The contradiction shows
Theorem 4.4.1 Assume that A4.3.1, A4.4.1–A4.4.4 hold, and
is positive definite for some
Further, assume that
and for some constants
Then the number of resettings is finite, i.e.,
where is the number of resettings among the first search periods
(GO1), and is given in (GO3).
Proof. If (4.4.44) were not true, then there would be an S with positive
probability such that, for any there exists a subsequence such that at the search period a resetting occurs, i.e.,
Notice that
by(4.4.41) and and
by (4.4.41) and (4.4.42). Hence, the conditions of Lemma 4.4.1 are satisfied. Without loss of generality, we may assume that (4.4.2)–(4.4.5) and the conclusion of Theorem 4.3.1 hold. From now on assume that
is fixed.
It is clear that, for any constant
if is large enough, since forLet
Rewrite (4.4.46) as
Define
and
Noticing that there is no resetting between and and (4.4.47)
corresponds to (4.4.8), by the same argument as that used in the proof of Lemma 4.4.3, we find that, for any
Since we have
By (4.4.3) (4.4.42) and (4.4.43) it follows that
where for the last inequality (4.4.41) is used.
Thus, by (4.4.40)
By (4.4.33) it follows that
provided is large enough, where for the last inequality, (4.4.2) is
used.
Since by (4.4.43)
and since
and
we find
where the last inequality follows from (4.4.40).Using (4.4.51) and (4.4.53), from (4.4.52) for sufficiently large we
have
Using the second inequality of (4.4.43) and then observing that
and
by (4.4.40) and (4.4.41) and we find
We now show that there is such that
Assume the converse:
with
Then, we have
for large enough because
Inequality (4.4.57) contradicts (4.4.55). Consequently, (4.4.56) is true. In particular, for we have
Completely by the same argument as that used for (4.4.47)–(4.4.50), by
noticing that there is no resetting from to we conclude that
By the same treatment as that used for deriving (4.4.54) from (4.4.50), we obtain
Comparing (4.4.58) with (4.4.54), we find that has been changed to and this procedure can be continued if the number of resettings
is infinite. Therefore, for any we have
From (4.4.40) we see
Since we have and hence by
Consequently, by (4.4.41) the right-hand side of (4.4.59) can be estimated as follows:
by (4.4.61) if is large enough. However, the left-hand side of (4.4.59) is nonnegative. The obtained
contradiction shows that must be finite, and (4.4.44) is correct.
By Theorem 4.4.1, our global optimization algorithm coincides with
the KW algorithm with randomized differences and expanding truncations for all sufficiently large. Therefore, the theorems proved in Section 4.2 are applicable to the global optimization algorithm. By Theorems 4.2.1 and 4.2.2 we can derive the convergence rate and asymptotic normality of the algorithm described by (GO1)–(GO5).
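The overall scheme (GO1)–(GO5) — search periods started from randomly drawn initial values, each running a Kiefer-Wolfowitz iteration with randomized differences and a resetting (truncation) safeguard, with the best point kept — can be sketched as follows. Every concrete ingredient in the snippet (the objective, step sizes, truncation bound, number of periods) is a hypothetical stand-in and not from the book:

```python
import numpy as np

def kw_randomized_diff(f, x0, n_iters=200, a0=0.5, c0=0.5, bound=10.0):
    """One search period: KW iteration with randomized differences.
    The gradient is estimated from two function values along a random
    +/-1 direction Delta:
        g ~ Delta * (f(x + c*Delta) - f(x - c*Delta)) / (2*c).
    If the iterate leaves the ball of radius `bound`, it is reset to the
    initial point (a crude stand-in for the expanding truncations)."""
    rng = np.random.default_rng(0)
    x = np.array(x0, dtype=float)
    for k in range(1, n_iters + 1):
        a_k = a0 / k              # step sizes: a_k -> 0, sum a_k = inf
        c_k = c0 / k ** 0.25      # difference widths: c_k -> 0
        delta = rng.choice([-1.0, 1.0], size=x.shape)
        g = delta * (f(x + c_k * delta) - f(x - c_k * delta)) / (2.0 * c_k)
        x = x - a_k * g
        if np.linalg.norm(x) > bound:     # resetting
            x = np.array(x0, dtype=float)
    return x

def global_search(f, sample_initial, n_periods=20):
    """(GO)-style outer loop: run search periods from random initial
    values and keep the point with the smallest observed value."""
    best_x, best_f = None, np.inf
    for _ in range(n_periods):
        x = kw_randomized_diff(f, sample_initial())
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x

# Toy multimodal objective with its minimum near (2, 2).
def f(x):
    return float(np.sum((x - 2.0) ** 2) - 0.5 * np.cos(3.0 * x[0]))

rng0 = np.random.default_rng(1)
x_hat = global_search(f, lambda: rng0.uniform(-4.0, 4.0, size=2))
```

The random restarts play the role of the randomly selected initial values in (GO1); keeping the best period is the comparison step of the scheme.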
4.5. Application to Model Reduction
In this section we apply the global optimization algorithm to system
modeling. A real system may be modeled by a high order system which, however, may be too complicated for control design. In control engineering the order reduction of a model is of great importance. In the linear system case, this means that a high order transfer function is to be approximated by a lower order transfer function. For this one may use methods like balanced truncation and Hankel norm approximation. These methods are based on the concept of balanced realization. We are interested in recursively estimating the optimal coefficients of the
reduced model by using the stochastic optimization algorithm presented in Section 4.3.
Let the high order transfer function be
and let it be approximated by a lower order transfer function
If is of order then is taken to be of order
To fix ideas, let us take to be a polynomial of order and of order
where the coefficients should not be confused with the step sizes used in Steps (GO1)–(GO5). Write as where
and stand for coefficients of and
It is natural to take
as the performance index of approximation. The parameters and are to be selected to minimize under the constraint that
is stable. For notational simplicity we denote and write as
Let us describe the region where has the required property. Stability requires that
This implies that
because is the sum of two complex-conjugate roots of
If then which yields If
then and hence
(or ).
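The stability region of a second-order denominator can be checked numerically. The sketch below assumes the parametrization is a monic quadratic z^2 + d1 z + d2 with d1 ranging over [-2, 2] and d2 over [-1, 1] (matching the sampling box used later in this section); the closed-form description is the classical Jury/Schur stability triangle:

```python
import numpy as np

def stable_by_roots(d1, d2):
    """z^2 + d1*z + d2 is Schur stable iff both roots lie strictly
    inside the unit circle."""
    return bool(np.all(np.abs(np.roots([1.0, d1, d2])) < 1.0))

def stable_by_triangle(d1, d2):
    """Closed-form description of the stability region:
    |d2| < 1 and |d1| < 1 + d2."""
    return abs(d2) < 1.0 and abs(d1) < 1.0 + d2

# The two criteria agree at random points of [-2,2] x [-1,1], the box
# from which the initial denominator values are later sampled.
rng = np.random.default_rng(0)
pts = rng.uniform([-2.0, -1.0], [2.0, 1.0], size=(500, 2))
agree = all(stable_by_roots(d1, d2) == stable_by_triangle(d1, d2)
            for d1, d2 in pts)
```

The triangle condition is exactly the constraint set D described in the text: the product of the roots bounded by 1 in modulus, and the sum bounded accordingly.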
Set
Identify and appearing in Section 4.3 with and, respectively, in the present case.
We now apply the optimization algorithm (GO1)–(GO5) to minimizing under the constraint that the parameter in belongs to D. For this we first concretize Steps (GO1)–(GO5) described in Section 4.3.
Since is convex in for fixed we take the fixed initial value
for any search period and randomly select initial values only for according to a distribution density which is defined as follows:
where with and being the uniform distributions over [–2, 2] and [–1, 1], respectively.
After having been selected in the search period, the algorithm
(4.1.11) and (4.1.12) is calculated with and
As to observations, instead of (4.3.1) we will use information about the gradient, because in the present case the gradient of can be explicitly expressed:
In the search period the observation is denoted by and is given by
where is independently selected from according to the uniform
distribution, and stands for the estimate for at time in the
search period. It is clear that is an approximation to the integral
where are independently selected from according to the uniform distribution for each Clearly, is an approximation to
Finally, take equal to
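The randomized-frequency approximation of the performance index can be sketched numerically: evaluate both transfer functions at frequencies drawn uniformly and average the squared error. Both systems below are hypothetical low-order examples, not the book's 10th-order plants:

```python
import numpy as np

def freq_resp(num, den, w):
    """Discrete-time frequency response num(z)/den(z) at z = e^{iw};
    `num` and `den` hold coefficients in decreasing powers of z."""
    z = np.exp(1j * np.asarray(w))
    return np.polyval(num, z) / np.polyval(den, z)

def mc_error(num_g, den_g, num_r, den_r, n=20000, seed=0):
    """Monte Carlo estimate of J = (1/2pi) Int_{-pi}^{pi} |G - G_r|^2 dw,
    sampling frequencies uniformly as in the randomized observations."""
    w = np.random.default_rng(seed).uniform(-np.pi, np.pi, size=n)
    d = freq_resp(num_g, den_g, w) - freq_resp(num_r, den_r, w)
    return float(np.mean(np.abs(d) ** 2))

# Hypothetical 2nd-order "true" system and a 1st-order candidate model.
G  = ([1.0, 0.5], [1.0, -0.9, 0.2])      # (z + 0.5)/(z^2 - 0.9z + 0.2)
Gr = ([1.2], [1.0, -0.5])                # 1.2/(z - 0.5)

j_mc = mc_error(*G, *Gr)

# Reference value computed on a dense uniform frequency grid.
wg = np.linspace(-np.pi, np.pi, 200001)
dg = freq_resp(*G, wg) - freq_resp(*Gr, wg)
j_ref = float(np.mean(np.abs(dg) ** 2))
```

Averaging over randomly drawn frequencies gives an unbiased estimate of the integral, which is what makes the gradient observations usable in the SA recursion.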
In control theory there are several well-known model reduction methods, such as model reduction by balanced truncation and Hankel norm approximation, among others. These methods depend on the balanced realization, which is a state space realization method for a transfer matrix keeping the controllability and observability Gramians of the realized system balanced. In order to compare with the proposed global optimization (GO) method, we take the commonly used model reduction methods of balanced truncation (BT) and Hankel norm approximation (HNA), which are realized by using Matlab. For this, the discrete-time transfer functions are transformed to continuous-time ones by using d2c provided in Matlab. Then the reduced systems are discretized to compute for comparison.
As we take a 10th order transfer function, respectively, for the following examples:
Example 4.5.1
Example 4.5.2
Example 4.5.3
Using the algorithm described in Section 4.3, for Examples 4.5.1–4.5.3 we obtain the approximate transfer functions of order 4, respectively,
denoted by and with
Using Matlab we also derive the 4th order approximations for Examples 4.5.1–4.5.3 by balanced truncation and Hankel norm approximation, which are as follows:
where the subscripts and H denote the results obtained by balanced truncation and Hankel norm approximation, respectively.
The approximation errors are given in the following table:
From this table we see that the algorithm presented in Section 4.3 gives smaller approximation errors in in comparison with the other methods.
We now compare approximation errors in norm and compare step responses between the approximate models and the true one by figures.
In the figures of step response
the solid lines denote the true high order systems;
the dashed lines (- - -) denote the system reduced by Hankel normapproximation;
the dotted lines denote the system reduced by balanced truncation;
the dotted-dashed lines denote the systems reduced by the stochastic optimization method given in Section 4.3.
In the figures of the approximation error
the solid lines denote the systems reduced by the stochastic optimization method;
the dashed lines (- - -) denote the system reduced by Hankel norm approximation;
the dotted lines denote the system reduced by balanced truncation.
Example 4.5.1
Example 4.5.2
Example 4.5.3
These figures show that the algorithm given in Section 4.3 gives smaller approximation error in in comparison with the other methods for Example 4.5.1, and an intermediate error in for Examples 4.5.2 and 4.5.3. Concerning step responses, the algorithm given in Section 4.3 provides better approximation in comparison with the other methods for all three examples.
Chapter 5
APPLICATION TO SIGNAL PROCESSING
The general convergence theorems developed in Chapter 2 can deal
with noises containing not only random components but also structural
errors. This property allows us to apply SA algorithms to parameter estimation problems arising from various fields. The general approach, roughly speaking, is as follows. First, the parameter estimation problem coming from practice is transformed to a root-seeking problem for a reasonable but unknown function which may not be directly observed.
Then, the real observation is artificially written in the standard form
with Normally, it is quite straightforward to arrive
at this point. The main difficulty is to verify that the complicated noise
satisfies one of the noise conditions required in the
convergence theorems. It is common that there is no standard method to
complete the verification procedure, because for different problems the noises are completely different from each other.
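The root-seeking recursion underlying this approach is the Robbins-Monro scheme x_{k+1} = x_k + a_k O_k with O_k = f(x_k) + ε_k. A minimal sketch, in which the regression function, the noise (a random part plus a vanishing structural error of the kind these theorems tolerate), and all constants are invented for illustration:

```python
import numpy as np

def robbins_monro(observe, x0, n_iters=5000, a0=1.0):
    """x_{k+1} = x_k + a_k * O_k with step sizes a_k = a0 / k."""
    x = x0
    for k in range(1, n_iters + 1):
        x = x + (a0 / k) * observe(x, k)
    return x

rng = np.random.default_rng(0)
f = lambda x: 2.0 - x                # unknown regression function, root at 2

def observe(x, k):
    # observation = f(x) + noise; the noise mixes a random component
    # with a structural error 1/k that vanishes as k grows
    return f(x) + rng.normal(0.0, 0.5) + 1.0 / k

x_hat = robbins_monro(observe, x0=0.0)
```

The verification work described above amounts to showing that the accumulated weighted noise in such a recursion is asymptotically negligible.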
In Section 5.1, SA algorithms are applied to solve the blind channel identification problem, an active topic in communication. In Section 5.2, the principal component analysis used in pattern classification is dealt
with by SA methods. Section 5.3 continues the problem discussed in
Section 5.1, but in a more general setting. Namely, unlike Section 5.1, the covariance matrix of the observation noise is no longer assumed to
be known. In Section 5.4, adaptive filtering is considered: very simple conditions for convergence of sign-algorithms are given. Section 5.5 discusses the asymptotic behavior of asynchronous SA algorithms, which take the possible communication delays between parallel processors into consideration.
5.1. Recursive Blind Identification
In the system and control area, the unknown parameters are estimated on
the basis of observed input and output data of the system. This is the subject of system identification. In contrast, for communication channels only the channel output is observed and the channel input is unavailable. The task of blind channel identification is to estimate the channel parameters by using the output data only. Blind channel identification has drawn much attention from researchers because of its potential applications in wireless communication. However, most existing estimation methods are “block” algorithms in nature, i.e., the parameters are estimated after an entire block of data has been received.
By using the SA method, a recursive approach is presented here: estimates are continuously improved while new signals are being received.
Consider a system consisting of channels with L being the maximum order of the channels. Let be the one-dimensional input signal, and be the channel output at time where N is the number of samples and may not be fixed:
where
are the unknown channel coefficients. Let us denote by
the coefficients of the channel, and by
the coefficients of the whole system, which compose a vector.
The observations may be corrupted by noise
where is a vector. The problem is to estimate on the basis of the observations.
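The channel model can be simulated directly, and the identity behind the linear equations derived below checked numerically: filtering one channel's output by the other channel's coefficients gives the same sequence either way, since both equal the input filtered by the product of the channels. The channel coefficients below are illustrative, not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)
L = 2
h1 = np.array([1.0, -0.4, 0.25])     # illustrative channel 1, order L = 2
h2 = np.array([0.6, 0.3, -0.5])      # illustrative channel 2
N = 500
s = rng.standard_normal(N)           # common input, unobserved in practice

def channel_output(h, s):
    """y(k) = sum_{l=0}^{L} h(l) s(k-l), truncated to the input length."""
    return np.convolve(s, h)[: len(s)]

y1 = channel_output(h1, s)
y2 = channel_output(h2, s)

# Cross-relation: (h2 * y1)(k) = (h1 * y2)(k), both equal (h1*h2*s)(k),
# so the channel vector satisfies linear equations whose coefficients
# are the observed outputs.
resid = np.convolve(y1, h2)[:N] - np.convolve(y2, h1)[:N]
max_resid = float(np.max(np.abs(resid)))
```

This is the mechanism by which the channel coefficients become identifiable from the outputs alone, up to a common scalar factor.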
Let us introduce polynomials in backward-shift operator
whereWrite and in the component forms
respectively, and express the component via
From this it is clear that
Define
where is a
It is clear that is a xSimilar to and let us define and and and which
have the same structure as and but with replaced by and
respectively.
By (5.1.5) we have
From (5.1.8), (5.1.4), and (5.1.10) it is seen that
This means that the channel coefficient satisfies the set of linear equations (5.1.12) with coefficients being the system outputs.
From the input sequence we form the (N – 2L + 1) × (2L + 1) Hankel matrix
It is clear that the maximal rank of is 2 L + 1 as
If is of full rank for some then will also be of full rank for any
Lemma 5.1.1 Assume the following conditions hold:
A5.1.1 have no common root.
A5.1.2 The Hankel matrix composed of the input signal is of full rank (rank = 2L + 1).
Then is the unique (up to a scalar multiple) nonzero vector simultaneously satisfying
Proof. Assume there is another solution to (5.1.14) which is different from
where is
Denote
From (5.1.15) it follows that
By (5.1.7), we then have
which implies
where by we denote the (2L + 1)-dimensional vector composed of the coefficients of the polynomial written in the form of increasing orders of
Since is of full rank, In other words,
For a
fixed
(5.1.17) is valid for all Therefore, all roots of should be roots of for all By A5.1.1,
all roots of must be roots of Consequently, there is a
constant such that Substituting this into (5.1.17) leads to
and hence Thus, we conclude that
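Assumption A5.1.2 (full rank of the input Hankel matrix) is easy to test numerically. The sketch below builds the sliding-window matrix from a synthetic input; the sample indexing is illustrative and yields one row fewer than the (N − 2L + 1) count in the text, which indexes samples s(0), …, s(N):

```python
import numpy as np

def input_hankel(s, L):
    """Sliding-window (Hankel) matrix of the input sequence:
    row k is (s(k), s(k+1), ..., s(k+2L))."""
    return np.array([s[k : k + 2 * L + 1] for k in range(len(s) - 2 * L)])

rng = np.random.default_rng(0)
L = 3
s = rng.standard_normal(50)          # a generic, sufficiently rich input
H = input_hankel(s, L)
full_rank = int(np.linalg.matrix_rank(H)) == 2 * L + 1

# A constant input is not rich enough: its Hankel matrix has rank 1,
# so A5.1.2 fails and the channel vector is not identifiable from it.
rank_const = int(np.linalg.matrix_rank(input_hankel(np.ones(50), L)))
```

A random (persistently exciting) input satisfies the full-rank condition with probability one, while degenerate inputs such as constants do not.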
We first establish a convergence theorem for blind channel identification based on stochastic approximation methods for the case where a noise-free data sequence is observed.
Then, we extend the results to the case where N is not fixed and
observation is noise-corrupted.
Assume is observed. In this case
are available, and we have We will repeatedly use the data by setting
Define the estimate for recursively by
with an initial value
We need the following condition.
Theorem 5.1.1 Assume A5.1.1–A5.1.3 hold. Let be given by
(5.1.19) with any initial value with Then
where is a constant.
Proof. Decompose and respectively into orthogonal vectors:
where
If serves as the initial value for (5.1.19), then by (5.1.14). Again, by (5.1.14) we have
and we conclude that
and
Therefore, for proving the theorem it suffices to show that as
Denote
and
Then by (5.1.21) we have
Noticing that and is uniformly bounded with respect to for large we have
and
By (5.1.18)
and by Lemma 5.1.1, is its unique (up to a constant multiple) eigenvector corresponding to the zero eigenvalue, and the rank of
is
Denote by the minimal nonzero eigenvalue of
Let be an arbitrary vector orthogonal to
Then can be expressed by
where – 1, are the unit eigenvectors of
corresponding to its nonzero eigenvalues.
It is clear that
By this, from (5.1.23) and (5.1.24), it follows that for
and
Noticing that
we conclude
and hence
From (5.1.21) it is seen that is nonincreasing for. Hence, the convergence implies that
The proof is completed.
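The recursion (5.1.19) and the conclusion of Theorem 5.1.1 can be illustrated numerically. The displayed form of (5.1.19) is lost in this scan, so the sketch below assumes a cross-relation recursion of the kind described, x_{k+1} = x_k − a_k φ_k (φ_k^T x_k), with the finite data reused cyclically; the channels and the small constant step size are illustrative stand-ins (the book uses decreasing step sizes a_k):

```python
import numpy as np

rng = np.random.default_rng(1)
L = 2
h1 = np.array([1.0, -0.4, 0.25])     # illustrative channel 1
h2 = np.array([0.6, 0.3, -0.5])      # illustrative channel 2, no common roots
h = np.concatenate([h1, h2])         # channel vector, to be found up to scale

N = 400
s = rng.standard_normal(N)           # unobserved input
y1 = np.convolve(s, h1)[:N]          # observed outputs
y2 = np.convolve(s, h2)[:N]

# Cross-relation regressors: phi_k . h = 0 for every k >= L, since both
# sum_l h1(l) y2(k-l) and sum_l h2(l) y1(k-l) equal (h1*h2*s)(k).
Phi = np.array([np.concatenate([y2[k - L : k + 1][::-1],
                                -y1[k - L : k + 1][::-1]])
                for k in range(L, N)])

x = np.ones(2 * (L + 1))
x /= np.linalg.norm(x)               # initial value, not orthogonal to h
a = 0.02                             # small constant step for this sketch
for k in range(20000):               # reuse the data cyclically
    phi = Phi[k % len(Phi)]
    x = x - a * phi * (phi @ x)      # the component along h is untouched

# x converges to a scalar multiple of h, the Theorem 5.1.1 conclusion.
cos_angle = abs(x @ h) / (np.linalg.norm(x) * np.linalg.norm(h))
```

Because every regressor is orthogonal to h, each step leaves the h-component of the iterate intact while shrinking the orthogonal components, which is exactly the mechanism used in the proof.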
Remark 5.1.1 If the initial value is orthogonal to then and (5.1.20) is also true. But this is an uninteresting case giving no information about
Remark 5.1.2 Algorithm (5.1.19) is an SA algorithm with linear time-varying regression function The root set J for is time-invariant: As mentioned above, evolves in one of the subspaces depending on the initial value: In the proof of Theorem 5.1.1 we have actually verified that may serve as the Lyapunov function satisfying A2.2.20 for. Then applying Remark 2.2.6 also leads to the desired conclusion.
We now assume the input signal is a sequence of infinitely many mutually independent random variables and that the observations do not contain noise, i.e., in (5.1.5).
Lemma 5.1.2 Assume A5.1.1 holds and is a sequence of mutually
independent random variables with Then is
the unique unit eigenvector corresponding to the zero eigenvalue for the
matrices
and the rank of is
Proof. Since is a sequence of mutually independent random variables and it follows that
where
Proceeding along the lines of the proof of Lemma 5.1.1, we arrive at the analogue of (5.1.16):
which implies
From (5.1.28) and (5.1.29) it follows that Then, following the proof of Lemma 5.1.1, we conclude that is the unique unit vector satisfying
This shows that is of rank and is its unique unit eigenvector corresponding to the zero eigenvalue.
Let denote the minimal nonzero eigenvalue of On we need the following condition.
A5.1.4 is a sequence of mutually independent random variables
with for some and such that
Condition A5.1.3 is strengthened to the following A5.1.5.
A5.1.5 A5.1.3 holds and where is given in A5.1.4.
It is obvious that if is an iid sequence, then is a positive constant, and (5.1.30) is automatically satisfied.
Theorem 5.1.2 Assume A5.1.1, A5.1.4, and A5.1.5 hold, and is
given by (5.1.19) with initial value Then
where
Proof. In the present situation we still have (5.1.21) and (5.1.22). So, it suffices to show
With N replaced by 4L in the definitions of and we again arrive at (5.1.23).
Since
converges a.s. by A5.1.4 and A5.1.5, there is a large such that
Let be an arbitrary vector such that
Then by Lemma 5.1.2,
and hence
Therefore, which
tends to zero since This implies
is bounded, and
a.s.,
We now consider the noisy observation (5.1.5). By the definition
(5.1.11), similar to (5.1.9) we have
where and have the same structure as given by (5.1.10) with
replaced by and, respectively. The following truncated algorithm is used to estimate
with initial value and
Introduce the following conditions.
A5.1.6 and are mutually independent and each of them is a sequence of mutually independent random variables (vectors) such that
and
for some
and where is given in A5.1.4.
Set
Then
Denote by the resetting times, i.e.,
Then, we have
A5.1.7
and
Let be an orthogonal matrix, where
Denote
Then
Noticing we find that
Lemma 5.1.3 Assume A5.1.6 and A5.1.7 hold. Then for given by (5.1.32),
Proof. Setting
we have
and
Proof. Since is a sequence of mutually independent nondegenerate
random variables, where
Notice that coincides with given by (5.1.13) if setting N = 4 L and in (5.1.13).
Proceeding as in the proof of Lemma 5.1.1, we again arrive at (5.1.16).
Then, we have Since
we find that Then by the same argument as that used in the proof of Lemma 5.1.1, we conclude that for any is the unique nonzero unit vector simultaneously satisfying
Since is a matrix, the above assertion
proves that the rank of is and also
proves that is its unique unit eigenvector corresponding to the zero eigenvalue.
Denote by the minimal nonzero eigenvalue of
We need the following condition.
A5.1.8 There is a such that
It is clear that if is an iid sequence, then is independent of and and A5.1.8 is automatically satisfied.
Lemma 5.1.6 Assume A5.1.1 and A5.1.6–A5.1.8 hold. Then for any
which together with (5.1.44) leads to
for large enough
where and
Theorem 5.1.3 Assume A5.1.1 and A5.1.6–A5.1.8 hold. Then for
given by (5.1.32) with initial value and
where is a random variable expressed by (5.1.60).
Proof. We first prove that the number of truncations is finite, i.e.,
a.s. Assume the converse:
By Lemma 5.1.3, for any given
and
as
if is large enough, say. By the definition of we have
which together with (5.1.52) implies
and
Define
Since is well-defined
by (5.1.54). Notice that from to there is no truncation. Consequently,
and
To fix ideas, let us take From (5.1.52) and (5.1.54) it follows that the sequences
starting from cross the interval for each This means that
crosses the interval for each
Here, we say that the sequence
crosses an interval with if and
there is no truncation in the algorithm (5.1.32) forWithout loss of generality, we may assume converges:
It is clear that and By Lemma 5.1.4, there is no truncation for if T is small enough.
Then, similar to (2.2.24), for large by Lemmas 5.1.3 and 5.1.4 we have
where and
By Lemma 5.1.6, for large and small T we have
By Lemma 5.1.4 Noticing that
and by the definition of crossing we see that for small enough T,
This implies that
Letting in (5.1.57), we find that
which contradicts (5.1.58). The contradiction shows that
Thus, starting from the algorithm (5.1.32) suffers from no truncation. If did not converge as then
and would cross a nonempty interval
infinitely often. But this leads to a contradiction as shown above. Therefore, converges as
If were not zero, then there would exist a convergent
subsequence. Replacing in (5.1.56) by from
(5.1.57) it follows that
Since converges, the left-hand side of (5.1.59) tends to zero, which makes (5.1.59) a contradictory inequality. Thus, we have proved
a.s.
Since from (5.1.40) it follows that
By (5.1.38) and the fact that we finally conclude that a.s.
The difficulty of applying the algorithm (5.1.32) consists in the fact that the second moment of the noise may not be available. Identification of the channel coefficients without using will be discussed in Section 5.3, by using the principal component analysis to be described in the next section.
5.2. Principal Component Analysis
The principal component analysis (PCA) is one of the basic methods used in feature extraction, signal processing, and other areas. Roughly speaking, PCA gives recursive algorithms for finding eigenvectors of a symmetric matrix A based on noisy observations of A.
Let be a sequence of observed symmetric matrices, and
The problem is to find eigenvectors of A, in particular, the one corresponding to the maximal eigenvalue.
Define
with initial value being a nonzero unit vector. serves as an estimate for a unit eigenvector of A.
If then is reset to a different vector with norm equalto 1.
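The displayed recursions (5.2.1)–(5.2.2) are lost in this scan; the surrounding description (a unit-norm iterate updated with the noisy matrices A_k and renormalized, so that it evolves on the unit sphere) suggests an Oja-type scheme x_{k+1} = (x_k + a_k A_k x_k) / ||x_k + a_k A_k x_k||. A sketch under that assumption, with an artificial matrix and noise:

```python
import numpy as np

rng = np.random.default_rng(0)
# Artificial symmetric matrix A; its top eigenvector is known here only
# so the result can be checked -- the algorithm sees only A_k = A + N_k.
Q = np.linalg.qr(rng.standard_normal((4, 4)))[0]
A = Q @ np.diag([5.0, 2.0, 1.0, 0.5]) @ Q.T
top = Q[:, 0]                        # eigenvector of the largest eigenvalue

x = np.ones(4) / 2.0                 # initial unit vector
for k in range(1, 20001):
    noise = rng.standard_normal((4, 4))
    A_k = A + (noise + noise.T) / 2.0    # symmetric zero-mean noise
    y = x + (1.0 / k) * (A_k @ x)        # move along the noisy image of x
    x = y / np.linalg.norm(y)            # renormalize: stay on the sphere

cos_angle = abs(float(x @ top))
```

The renormalization keeps the iterate on the unit sphere S, and the decreasing steps a_k = 1/k average out the observation noise, so the iterate aligns with the dominant eigenvector.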
Assume have been defined as estimates for unit
eigenvectors of A. Denote which is an where
where denotes the pseudo-inverse of Since for large is a full-rank matrix,
Define
if with. If we redefine an with such that
Define the estimate for the eigenvalue corresponding to the
eigenvector whose estimate at time is by the following recursion.
Take a sequence increasing to infinity
and define by the SA algorithm with expanding truncations:
where
We will use the following conditions:
A5.2.1 and
A5.2.2 are symmetric, and
A5.2.3 and
where is given by (1.3.2).
Examples for which (5.2.8) is satisfied are given in Chapters 1 and 2. We now give one more example.
Example 5.2.1 Assume is stationary and ergodic. If then satisfies (5.2.8). Set By
ergodicity, we have a.s. By a partial summation it follows
that
which implies (5.2.8).
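The partial-summation computation, whose displayed lines are lost in this scan, presumably runs along the following standard lines; write $S_n = \sum_{j=1}^{n} \varepsilon_j$, so that ergodicity gives $S_n/n \to 0$ a.s., and take $a_k = 1/k$ for illustration:

```latex
\sum_{k=n}^{m} a_k \varepsilon_k
  = \sum_{k=n}^{m} a_k \,(S_k - S_{k-1})
  = a_m S_m - a_n S_{n-1} + \sum_{k=n}^{m-1} (a_k - a_{k+1})\, S_k .
```

With $a_k = 1/k$ the boundary terms are $S_m/m \to 0$ and $S_{n-1}/n \to 0$, while $\sum_{k=n}^{m-1} (a_k - a_{k+1}) S_k = \sum_{k=n}^{m-1} \frac{S_k}{k(k+1)} = o(1) \sum_{k=n}^{m-1} \frac{1}{k+1}$, i.e., $o(1)$ times the total step length of the window; the weighted tail sums therefore vanish, which is the form of noise condition (5.2.8) used in the convergence theorems.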
Let be the unit eigenvector of A corresponding to eigenvalue
where may not be different.
Theorem 5.2.1 Assume A5.2.1 and A5.2.2 hold. Then given by
(5.2.1)–(5.2.6) converges at those samples for which A5.2.3 holds,
and the limits of coincide with
Let denote the limit of as Then
Proof. Consider those for which A5.2.3 holds. We first prove convergence of Note that may happen only for a finite
number of steps because as and By
boundedness of we expand into the power series of
where
Further, we rewrite (5.2.9) as
where
Denote by S the unit sphere in Then defined by (5.2.2)
evolves on S.
Define
The root set of on S is
Defining we find for
Thus, Condition A2.2.2(S) introduced in Remark 2.2.6 is satisfied.
Since is bounded, no truncation is needed. Then, by Remark
2.2.6 we conclude that converges to one of say
Denote
Inductively, we now assume
We then have
Since and from (5.2.21) and (5.2.5) it
follows that and by (5.2.6)
We now proceed to show that converges to one of the unit eigenvectors contained in
From (5.2.5) we see that the last term in the recursion
tends to zero as So, by (5.2.22) we need to reset with and at most for a finite number of times.
Replacing by in (5.2.9)–(5.2.11), we again arrive at
(5.2.11) for Precisely,
where
and
By noticing
and using (5.2.22), (5.2.23) can be rewritten as
where as
Since tends to an eigenvector of A, from (5.2.11) it follows that
where
Since converges, from (5.2.13) and it follows that
Inductively, assume that
with satisfying (5.2.27), i.e.,
Noticing that for any matrix V, we have
by (5.2.28).
Since by (5.2.24), denoting by
the term we have
for any convergent subsequence
Denoting
from (5.2.26) we see
By (5.2.8) and (5.2.30), similar to (5.2.18)–(5.2.20), by Remark 2.2.6
converges to a unit eigenvector of From (5.2.5) it
is seen that converges since and Then from
(5.2.6) it follows that itself converges as
Thus, we have
From (5.2.5) it follows that
which implies that and consequently,
Since the limit of is a unit eigenvector of
we have
By (5.2.33) it is clear that can be expressed as a linear combination of eigenvectors Consequently,
which together with (5.2.34) implies that
This means that is an eigenvector of A, and is different from
by (5.2.33). Thus, we have shown (5.2.21) for To complete the induction it
remains to show (5.2.28) for
As we have just shown, tends to zero as from (5.2.31) we have
where satisfies (5.2.29) with replaced by by taking notice that (5.2.30) is fulfilled for the whole sequence because which has been shown to be convergent.
Elementary manipulation leads to
This expression together with (5.2.35) proves (5.2.28) for
Thus, we have proved that given by (5.2.1)–(5.2.6)
converge to different unit eigenvectors of A, respectively.
To complete the proof of the theorem it remains to show
Rewrite the untruncated version of (5.2.7) as follows
We have just proved that Then by (5.2.8) and
noticing the fact that converges and we see that
satisfies A2.2.3. The regression function in (5.2.36) is linear:
Applying Theorem 2.2.1 leads to
Remark 5.2.1 If in (5.2.1) and (5.2.3) is replaced by then Theorem 5.2.1 remains valid. In this case given by (5.2.18) should change to and correspondingly changes to As a result, the limit of changes to the opposite sign, from to
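The displayed recursions (5.2.1)–(5.2.6) did not survive this extraction. As a hedged illustration of the idea behind PCA by stochastic approximation, a minimal Oja-type normalized SA iteration for the principal unit eigenvector of E[xxᵀ] can be sketched as follows (the function name, step-size choice, and synthetic data are assumptions, not the book's algorithm):

```python
import numpy as np

def oja_pca(samples, dim, step=lambda k: 1.0 / (k + 10)):
    """Estimate the principal unit eigenvector of E[x x^T] from a stream
    of sample vectors by a normalized SA (Oja-type) iteration."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(dim)
    u /= np.linalg.norm(u)              # keep the iterate on the unit sphere
    for k, x in enumerate(samples):
        u = u + step(k) * (x @ u) * x   # SA step: (x x^T) u is a noisy "A u"
        u /= np.linalg.norm(u)          # renormalization replaces the projection
    return u

# Synthetic check: samples with covariance diag(4, 1, 0.25), whose
# principal unit eigenvector is +/- e1.
rng = np.random.default_rng(1)
samples = rng.standard_normal((20000, 3)) @ np.diag([2.0, 1.0, 0.5])
u = oja_pca(samples, 3)
print(np.abs(u))  # first component close to 1
```

The sign ambiguity of the limit is exactly the phenomenon addressed in Remark 5.2.1: both unit eigenvectors of opposite sign are legitimate limits.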
5.3. Recursive Blind Identification by PCA
As mentioned in Section 5.1, the algorithm (5.1.32) for identifying channel coefficients uses the second moment of the observation noise. This causes difficulty in possible applications, because may not be available.
We continue to consider the problem stated in Section 5.1 with notations introduced there. In particular, (5.1.1)–(5.1.12) and (5.1.31) will be used without explanation.
Instead of (5.1.32) we now consider the following normalized SA algorithm:
Comparing (5.3.1) and (5.3.2) with (5.2.1) and (5.2.2), we find that the channel parameter identification algorithm coincides with the PCA algorithm with By Remark 5.2.1, Theorem 5.2.1 can be applied to (5.3.1) and (5.3.2) if conditions A5.2.1, A5.2.2, and A5.2.3 hold.
The following conditions will be used.
A5.3.1 The input is a sequence, i.e., there exist a constant and a function such that for any
where
A5.3.2 There exists a distribution function over such that
where denotes the Borel σ-algebra in and
A5.3.3 The (2L + 1) × (2L + 1)-matrix is nondegenerate,
where
A5.3.4 The signal is independent of and
a.s., where is a random variable with
A5.3.5 All components of are
mutually independent with and and is bounded, where
is a constant.
A5.3.6 have no common root.
A5.3.7 and
For Theorem 5.1.1, is assumed to be a sequence of mutually independent random variables (Condition A5.1.6), while in A5.3.1 the independence is weakened to a property, but the distribution of is additionally required to be convergent. Although there is no requirement on the distribution of in Theorem 5.1.1, we notice that (5.1.30) is satisfied if are identically distributed.
In the sequel, denotes the identity matrix.
Define with
and
In what follows denotes the Kronecker product.
Theorem 5.3.1 Assume A5.3.1–A5.3.7 hold. Then
where C is a -matrix and Q is given in A5.3.3, and for given by (5.3.1) and (5.3.2),
where J denotes the set of unit eigenvectors of C.
Proof. By the definition of we have
Since
and by A5.3.2, (5.3.3) immediately follows.
From the definition (5.1.31) for by A5.3.5 it is clear that
is a -identity matrix multiplied by with Then by A5.3.4 and A5.3.5
Identifying in Theorem 5.2.1 to we find that Theorem 5.2.1 can be applied to the present algorithm, if we can show (5.2.8),
which, in the present case, is expressed as
where is given by (1.3.2), and B is given by (5.3.6).
Notice, by the notation introduced by (5.1.33),
Since
and
by the convergence theorem for martingale difference
sequences, for (5.3.7) it suffices to show
Identifying and in Lemma 2.5.2 to
and respectively, we find that conditions required there are
satisfied. Then (5.3.8) follows from Lemma 2.5.2, and hence (5.3.7) is
fulfilled.
By Theorem 5.2.1 given by (5.3.1) and (5.3.2) converges to a unit eigenvector of B, which clearly is an eigenvector of C.
Lemma 5.3.1 is, up to a scalar multiple, the unique nonzero vector simultaneously satisfying
Proof. Since it is known that satisfies (5.3.9), it suffices to prove the
uniqueness.
As in the proof of Lemma 5.1.1, assume is
also a solution to (5.3.9). Then, along the lines of the proof of Lemma 5.1.1, we obtain the analogue of (5.1.16), which implies (5.1.29):
where is given by (5.1.28) while by (5.1.16).
By A5.3.3 which is nondegenerate. Then we have The rest of the proof for uniqueness coincides with that given in Lemma 5.1.1.
By Lemma 5.3.1 zero is an eigenvalue of C with multiplicity one and
the corresponding eigenvector is Theorem 5.3.1 guarantees that the estimate approaches J, but it is not clear whether tends to the direction of
Let be all different eigenvalues
of C. J is composed of disconnected sets and where Note that
the limit points of are in a connected set, so converges to a
for some Let We want to prove that
a.s. or This is the conclusion of Theorem 5.3.2, which is essentially based on the following lemma, proved in [9].
Lemma 5.3.2 Let be a family of nondecreasing and
be a martingale difference sequence with
Let be an adapted random sequence and be a real sequence
such that and Suppose that on the following conditions 1), 2), and 3) hold.
2) can be decomposed into two adapted sequences and
such that
3) coincides with an random variablefor some
Then
Theorem 5.3.2 Assume A5.3.1–A5.3.7 hold. Then defined by
(5.3.1) and (5.3.2) converges to up-to a constant multiple:
where equals either
Proof. Assume the contrary: for some
Since C is a symmetric matrix, for where and hereafter a possible set with zero probability in is ignored. The proof is completed in four steps.
Step 1. We first explicitly express
Expanding defined by (5.3.2) into the power series of we derive
where
Noting and we derive
and
where is defined by (5.1.4), is given by (5.1.10)
with replaced by the observation noise, and denotes the estimate for at time
By (5.3.4) and (5.3.5), there exists a.s. such that a.s.
For any integers and define and
Note that for
and by the convergence of from (5.3.12) it follows that where is a constant for all in By
(5.3.7) we then have
as where and hereafter T should not be confused with the superscript T for transpose.
Choose large enough and sufficiently small T such that Let
and It then follows that for
In
for sufficiently large.
Consequently, for with fixed
and hence
Define
From (5.3.15) it follows that
Letting tend to infinity in (5.3.21) and replacing by in the resulting equality, by (5.3.19) we have
Thus, we have expressed in two ways: (5.3.21) shows that is
measurable, while (5.3.22) is in the form required in Lemma 5.3.2, where
Step 2. In order to show that the summand in (5.3.22) can be expressed as that required in Lemma 5.3.2, we first show that the series
is convergent on By (5.3.14) and (5.3.7) it suffices to show that
is convergent on
Define
and
Clearly, is measurable with respect to and Then by the convergence theorem for martingale difference sequences,
By (5.3.16) it follows that
The first term on the right-hand side of the last equality of (5.3.29) can
be expressed in the following form:
where the last term equals
Combining (5.3.30) and (5.3.31) we derive that the first term on the
right-hand side of the last equality of (5.3.29) is
By A5.3.4, A5.3.5, and A5.3.7 it is clear that
Hence replacing by in (5.3.29) results in
producing an additional term of magnitude Thus, by (5.3.24)–
(5.3.26) we can rewrite (5.3.29) as
where and is By (5.3.28) and A5.3.7
the series (5.3.33) is convergent, and hence given by (5.3.23) is a
convergent series.
Step 3. We now define sequences corresponding to and in
Lemma 5.3.2.
Let We have
where
Denote
Then and are adapted sequences, is a martingale difference sequence, and is written in the form of Lemma 5.3.2:
It remains to verify (5.3.10) and (5.3.11).
From (5.3.23) and (5.3.33) it follows that there is a constant such that Then for noticing
and
we have
By A5.3.4 and A5.3.5 it follows that
As in Step 4 it will be shown that
From this it follows that
Then from the following inequality
by (5.3.34) and (5.3.36) it follows that
Therefore all conditions required in Lemma 5.3.2 are met, and we conclude Since it follows that
and must converge to a.s.
Step 4. To complete the proof we have to show (5.3.35). If (5.3.35) were not true, then there would exist a subsequence
such that
For notational simplicity, let us denote the subsequence still by
Since by A5.3.5 for if and for any but if we then have
which, combined with (5.3.37), implies that
and
Noticing that and from (5.3.38)
and (5.3.24) it follows that
On the other hand, we have
and hence,
where denotes the estimate provided by for at time Since for any
we have
Hence (5.3.40) implies that
and
By A5.3.4 the left-hand side of (5.3.41) equals
Since it follows that for any
The left-hand side of (5.3.42) equals
Thus (5.3.42) implies that for any
Noticing from (5.3.25) we have
Then by A5.3.5, (5.3.39) implies that for any
Notice that
and
Then by A5.3.5, from (5.3.45)–(5.3.47) it follows that
and hence for any
and
Notice that (5.3.49) means that
However, the above expression equals
Therefore,
In the sequel, it will be shown that (5.3.43), (5.3.44), (5.3.48), and (5.3.50) imply that which contradicts
This means that the converse assumption (5.3.37) is not true.
For any since are coprime, where is given in (5.1.6), there exist polynomials such that
Let and be the degrees of and respectively. Set
Introduce the q-dimensional vector and q × q
square matrices W and A as follows:
Note that where and Then (5.3.43), (5.3.44), (5.3.48),and (5.3.50) can be written in the following compact form:
To see this, note that for any fixed and on the left-hand sides of (5.3.48) and (5.3.50) there are 2L different sums when varies from 0 to L – 1 and exchange roles with each other. These together with (5.3.43) and (5.3.44) give us 2L + 1 sums, and each of them tends to zero. Explicitly expressing (5.3.52), we find that there are 2L + 1 nonzero rows, and each row corresponds to one of the relationships in (5.3.43), (5.3.44), (5.3.48), and (5.3.50).
Since we have put enough zeros in the definition of multiplying the left-hand side of (5.3.52) by
has only shifted nonzero elements in
From (5.3.52) it follows that for any and in
(5.3.51)
From (5.3.53) it follows that
Note that for any polynomial of degree if the last elements of are zeros. From (5.3.54) it follows that
Denoting
from (5.3.55) we find that
By the definition of the first elements of are zeros, i.e.,
This means that the last elements of are zeros, i.e.,
On the other hand,
By (5.3.56), from (5.3.57) and (5.3.58) it is seen that i.e.,
From (5.3.53) it then follows that
i.e., But this is impossible, because are unit vectors. Consequently, (5.3.37) is impossible and this completes
the proof of Theorem 5.3.2.
5.4. Constrained Adaptive Filtering
We now apply SA methods to adaptive filtering, which is an important topic in signal processing. We consider the constrained problem; the unconstrained problem is a special case of the constrained one, as will be explained.
Let and be two observed sequences, where and are respectively. Assume is stationary and ergodic with
which, however, is unknown.
It is required to design the optimal weighting X, which minimizes
under the constraint
where C and are matrices, respectively. In the case where C = 0, the problem reduces to the unconstrained one.
It is clear that (5.4.3) is solvable with respect to X if and only if
and in this case the solution to (5.4.3) is
where Z is any
For notational simplicity, denote
Let L(C ) denote the vector space spanned by the columns of matrix C , and let the columns of matrix be an orthonormal basis
of L(C ). Then there is a full-rank decomposition Noticing we have Let be an orthogonal matrix. Then
and hence
From this it follows that
and hence a.s. This implies that
Let us express the optimal X minimizing (5.4.2) via By (5.4.8), substituting (5.4.4) into (5.4.2) leads to
On the right-hand side of (5.4.9) only the first term, which is quadratic, depends on Z. Therefore, the optimal should be the solution of
i.e.,
where is any satisfying
Combining (5.4.4) with (5.4.11), we find that
Using the ergodic property of we may replace and by their sample averages to obtain the estimate for And the estimate can be updated by using new observations. However, updating the estimate involves taking the pseudo-inverse of the updated estimate for which may be of high dimension. This will slow down the computation. Instead, we now use an SA algorithm to approach
By (5.4.8), we can rewrite (5.4.10) as
or
We now face the standard root-seeking problem for a linear function
As before, let and
The following algorithm is used to estimate given by (5.4.12), which in the notation used in previous chapters is the root set J for the linear function given by (5.4.14):
with initial value such that and
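The precise form of (5.4.16) with its expanding truncation bounds is garbled in this copy. As an illustration of the general pattern only — a root-seeking SA iteration that restarts from a fixed point and enlarges the bound whenever the iterate escapes — the following sketch may help; the linear example f(x) = b − Rx, the bound sequence, and the step sizes are all assumptions:

```python
import numpy as np

def sa_expanding_truncations(obs, x0, bounds, step=lambda k: 1.0 / (k + 1)):
    """Seek x* with f(x*) = 0 from noisy observations O_k(x) = f(x) + noise,
    restarting from x0 and enlarging the bound M_sigma whenever the
    candidate iterate exceeds it (expanding truncations)."""
    x, sigma = np.array(x0, float), 0
    for k, O in enumerate(obs):
        cand = x + step(k) * O(x)
        if np.linalg.norm(cand) > bounds(sigma):
            x, sigma = np.array(x0, float), sigma + 1  # truncate: restart, enlarge bound
        else:
            x = cand
    return x

# Hypothetical linear example: f(x) = b - R x with R positive definite,
# observed through independent noisy samples.
rng = np.random.default_rng(0)
R = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(R, b)
obs = [(lambda x, e=rng.standard_normal(2): b - R @ x + 0.1 * e)
       for _ in range(5000)]
x = sa_expanding_truncations(obs, [0.0, 0.0], bounds=lambda s: 2.0 ** s + 5.0)
print(x)  # approaches x_star
```

With a correct root inside the initial bound the truncation mechanism is typically inactive after finitely many steps, mirroring the "no more truncations" claim of Theorem 5.4.1.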
Theorem 5.4.1 Assume that is stationary and ergodic with second moment given by (5.4.1) and that Then, after a finite number of steps, say (5.4.16) has no more
truncations, i.e.,
Then from (5.4.26) it follows that
Denote
and
Since is stationary and ergodic, a.s., and
Then by a partial summation, we have
Notice that a.s. by ergodicity. Then for large
and from (5.4.29) it follows that
where (5.4.24) is used together with the fact that
and is stationary with E
From (5.4.27)–(5.4.30) by convergence of it follows that
for large and small T , where and are constants independent of
and
Consequently, in the case i.e., in
(5.4.16), will never reach the truncation bound for
if is large enough and T is small enough.
Then coincides with This verifies (5.4.22), while (5.4.23) follows from (5.4.16) because for a fixed
and are bounded, and
are also bounded by (5.4.31) and the convergence In
the case i.e., for some is bounded, and hence (5.4.22) and (5.4.23) are also satisfied.
We are now in a position to verify the noise condition required in
Theorem 2.2.1 for given by (5.4.20), i.e., we want to show that
for any convergent subsequence
By (5.4.24)
so for (5.4.32) it suffices to show
Again, by (5.4.24) and also by (5.4.23)
which implies (5.4.33). By Theorem 2.2.1, there is such that for is defined by (5.4.17) and converges to the root set J for given by (5.4.14). This completes the proof of the theorem.
Remark 5.4.1 For the unconstrained problem and C = 0, the algorithm (5.4.16) becomes
Theorem 5.5.1 Assume is stationary and ergodic with
Then
where is defined by (5.5.4) and (5.5.5) with an arbitrary initial value. In addition, in a finite number of steps truncations cease to exist
in (5.5.4).
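The recursion (5.5.4)–(5.5.5) is elided in this extraction; according to the notes in Section 5.7, this section concerns sign algorithms for adaptive filtering. The classical sign-error update, of which the truncated algorithm here is a variant, can be sketched as follows (names, step sizes, and the synthetic data are hypothetical):

```python
import numpy as np

def sign_lms(phi, y, dim, step=lambda k: 1.0 / (k + 1)):
    """Sign (sign-error) algorithm: only the sign of the prediction error
    enters the update, making each step cheap and robust to
    heavy-tailed observation noise."""
    w = np.zeros(dim)
    for k, (p, yk) in enumerate(zip(phi, y)):
        w = w + step(k) * p * np.sign(yk - p @ w)
    return w

# Hypothetical stationary regression y_k = phi_k^T w* + noise.
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])
phi = rng.standard_normal((50000, 3))
y = phi @ w_star + 0.1 * rng.standard_normal(50000)
w = sign_lms(phi, y, 3)
print(w)  # close to w_star
```

Because only sign information is used, the conditions needed for convergence can be kept weak, which is the point of the comparison with [42] made in Section 5.7.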
Proof. Define
and
Let be a countable set that is dense in let and be two sequences of positive real numbers such that and as and denote
and
where and is an integer.
The summands of (5.5.9)–(5.5.11) are stationary with finite expectations for any any integer any and any and then the ergodic theorem yields that
a.s.,
and
Therefore, there is an such that and for each the convergence in (5.5.12)–(5.5.14) takes place for any any integer any and any
Let us fix an We first show that for any fixed
if is large enough (say, for ), and in addition,
where c is a constant which may depend on but is independent of In what follows always denote constants that may
depend on but are independent of By (5.4.24) we have for any
There are two cases to be considered. If then for large enough, and (5.5.15) holds. If is bounded, then the truncations cease to exist after a finite number of steps. So, (5.5.15) also holds if is sufficiently large. Then (5.5.16) follows immediately from
(5.5.15) and (5.5.17).Let us define
where is given by (5.5.2). Then (5.5.15) can be represented as
Let be a convergent subsequence of and let be such that We now show that
Let By (5.5.16) or for some integer
We verify that the terms on the right-hand side of (5.5.20) satisfy (5.5.19).
For the first term on the right-hand side of (5.5.20) we have
where and are deterministic for a fixed and the expectation is taken with respect to and
Since by (5.5.6), a.s., applying the dominated convergence theorem yields
Then from (5.5.21) it follows that
Similarly, for the second term on the right-hand side of (5.5.20) we have
since a.s.
For the third term on the right-hand side of (5.5.20) by (5.4.24),(5.5.10), and (5.5.13) we have
since
Finally, for the last term in (5.5.20), by (5.5.14) and (5.4.24) we have
where the last convergence follows from the fact that
a.s. as since and
a.s.Combining (5.5.23)–(5.5.26) yields that
Since the left-hand side of (5.5.27) is free of tending to infinity in (5.5.27) leads to (5.5.19). Then the conclusion of the theorem follows from Theorem 2.2.1 by noticing that as in A2.2.2 one may take
5.6. Asynchronous Stochastic Approximation
When dealing with large interconnected systems, it is natural to consider distributed, asynchronous SA algorithms. For example, in a communication network with servers, each server has to allocate audio and video bandwidths in an appropriate proportion in order to minimize the average queueing delay. Denote by the bandwidth ratio for the server, and Assume the average delay time depends on only and is differentiable, Then, to minimize is equivalent to finding the root of Assume the time, denoted by spent on transmitting data from the server to the server is not negligible. Then at the server for the iteration we can observe or only at where denotes the total time spent until completion of iterations for the server. This is a typical problem solved by asynchronous SA. A similar problem arises from job-scheduling for computers in a computer network.
We now precisely define the problem and the algorithm.
At time denote by the estimate for the unknown root of Components of are observed by different processors, and the communication delays from the processor to the processor at time are taken into account. The observation of the processor is carried out only at i.e.,
where is the observation noise.
In contrast to the synchronous case, the update steps are now different for different processors, so it is unreasonable to use the same step size for all processors in an asynchronous environment. At time the step size used in the processor is known and is denoted by
We will still use the expanding truncation technique, but we are unable to simultaneously change estimates in different processors when the estimate exceeds the truncation bound, because of the communication delay.
Assume all processors start at the same given initial value and for all The observation at
the processor is and is updated to by the rule given below. Because of the communication delay the estimate produced by the processor cannot reach the processor for the initial steps:
By agreement we will take to serve as whenever
At the processor, there are two sequences and recursively generated, where is the estimate for the component of at time and is connected with the number of truncations up to and including time at the processor. For the processor at time the newest information about the other processors is In all algorithms discussed until now all components of are observed at the same point at time and this makes updating to meaningful. In the present case, although we are unable to make all processors observe at the same points at each time, it is still desirable to require all processors to observe at points located as close to each other as possible. Presumably, this would make the estimate updating reasonable. For this, by noticing that the estimate gradually changes after a truncation, the ideal is to keep all equal, but the best we can do is to equalize with the other
Keeping this idea in mind, we now define the algorithm and the observations for the processor,
Let be a fixed point from which the algorithm restarts after a truncation.
i) If there exists with then reset to equal the biggest one among and pull back to the fixed point although may not exceed the truncation bound. Precisely, in this case define
and observe
for any
ii) If then observe at
i.e.,
For both cases i) and ii), and are updated as follows:
where is the step size at time and may be random, and
is a sequence of positive numbers increasingly diverging to infinity.
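A toy sketch may clarify the bookkeeping described above: each processor owns one coordinate, updates it with its own decreasing step sizes, and sees the other coordinates only through delayed messages. The resetting rule i) and the truncation counters are deliberately omitted, and the diagonally dominant linear example is an assumption for illustration:

```python
import numpy as np

def async_sa(f_noisy, dim, n_iter, max_delay=5, seed=0):
    """Each 'processor' i updates only its own coordinate, using a stale
    (delayed) copy of the other coordinates and its own step-size
    sequence -- a simplified asynchronous SA scheme, without the
    expanding-truncation bookkeeping of (5.6.1)-(5.6.6)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    stale = [x.copy() for _ in range(dim)]   # processor i's delayed view of x
    for k in range(1, n_iter + 1):
        for i in range(dim):
            if k % rng.integers(1, max_delay + 1) == 0:
                stale[i] = x.copy()          # a message arrives: refresh the view
            view = stale[i].copy()
            view[i] = x[i]                   # own coordinate is always current
            x[i] += (1.0 / (k + 2)) * f_noisy(view, rng)[i]
    return x

# Hypothetical root-seeking problem: f(x) = b - A x with A diagonally
# dominant, so the asynchronous iteration still converges to A^{-1} b.
A = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x_star = np.linalg.solve(A, b)
f_noisy = lambda x, rng: (b - A @ x) + 0.1 * rng.standard_normal(3)
x = async_sa(f_noisy, 3, 4000)
print(x)  # approaches x_star
```

Because the step sizes decrease while the communication delays stay bounded, the effect of the stale information vanishes asymptotically; this is the intuition behind conditions of the type A5.6.2 and A5.6.5 below.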
Let us list conditions to be used.
A5.6.1 is locally Lipschitz continuous.
A5.6.2 and
there exist two positive constants such that
A5.6.3 There is a twice continuously differentiable function (not necessarily nonnegative) such that
and is nowhere dense, where
and denotes the gradient of
A5.6.4 For any convergent subsequence any and any
where
and
A5.6.5
Note that (5.6.10) holds if is bounded, since Note
also that A5.6.3 holds if and
Theorem 5.6.1 Let be given by (5.6.1)–(5.6.6) with initial value Assume A5.6.1–A5.6.5 hold, and there is a constant such that and
where is given in A5.6.3. Then
where
The proof of the theorem is separated into lemmas. From now on wealways assume that A5.6.1–A5.6.5 hold.
We first introduce an auxiliary sequence and its associated observation noise It will be shown that differs from only
by a finite number of steps. Therefore, for convergence of it suffices to prove convergence of
Let be a sample path generated by the algorithm (5.6.1)–(5.6.6), where is the one after resetting according to (5.6.2). Let where is
defined in A5.6.4. Assume By the resetting rule given
in i), for any after resetting we have For we
have and by the definition of
In the processor we take and to replace and
respectively, and define for those
Further, define and for
Then we obtain new sequences associated with
By (5.6.1)–(5.6.6), if then there exists a with
and
since and for
Because during the period there is no truncation for the sequences are recursively updated as follows:
where
Define delays for as follows
is available to the processor at time
Lemma 5.6.1 For any any convergent subsequence
and any satisfies the following condition
where
Proof. Since equals either or which is available at time it is seen that
For by definition of we have
which is certainly available to the processor. Therefore,
We rewrite By the definition of and
paying attention to (5.6.17) we see
so
as
We now show that (5.6.18) is true for all For there is no truncation for the processor,
and hence by the resetting rule i). If
for some then by (5.6.16) and the definition of it follows that
which implies (5.6.18).
If for some then as explained above for the processor at time the latest information about the estimate produced by the processor is In other words,
However, by definition of which yields
This again implies (5.6.18).
In summary, we have
This means that for there is no truncation at any time equal to and the observation is carried out at
i.e.,
For any any convergent subsequence and any we have
By (5.6.11), Then from A5.6.2 and
A5.6.5 it follows that and hence the second term
on the right-hand side of (5.6.21) tends to zero as Further, from the definition of there is such that Hence the first term on the right-hand side of (5.6.21) is of order o(T ) by A5.6.4. Consequently, from A5.6.2, A5.6.4 and A5.6.5 it follows that satisfies (5.6.15).
Lemma 5.6.2 Let be generated by (5.6.12)–(5.6.14). For any convergent subsequence of if is bounded,
then there are and such that
where is given in (5.6.14).
Proof. Let where and
where is given in A5.6.2.
By (5.6.15), for a convergent subsequence there exists such that for any and
Choose such that For any let
Then for any
If then if is sufficiently large, i.e., no truncation occurs after and hence for
If then there exists such that for any From (5.6.24) it follows that
Therefore, in both cases
If then for sufficiently large
i.e.,
This contradicts the definition of Therefore,
Lemma 5.6.3 Let be given by (5.6.12) – (5.6.14). For any
with the following assertions take place:
i) In the case cannot cross infinitely many times keeping bounded, where are the starting points of crossing;
ii) In the case cannot converge to keeping
bounded.
Proof. i) Since is bounded, there exists a convergent subsequence, which is still denoted by for notational simplicity,
By the boundedness of and (5.6.22)
for sufficiently large there is no truncation between and and hence
where By (5.6.20), (5.6.22) and
it follows that
By A5.6.2 and A5.6.3 we have
Then by A5.6.1
where is the Lipschitz coefficient of in and
By the boundedness of
and the fact that there is no truncation between and it follows
that
Without loss of generality, we may assume is a convergent sequence. Then by A5.6.3 and A5.6.5
Therefore,
where
Since is continuous for fixed by A5.6.4 there exists a for such that
Thus, for sufficiently small T and sufficiently large we have
On the other hand, by Lemma 5.6.2
Thus, for sufficiently small T , and
This contradicts (5.6.31), and i) is proved.
ii) If is bounded, then there is a convergent subsequence Then the assertion can be deduced in a way similar to that for i).
Lemma 5.6.4 Under the conditions of Theorem 5.6.1
where is given by (5.6.14).
Proof. If then there exists a sequence such that
From (5.6.12)–(5.6.14) we have
Choose a small positive constant such that Let be a connected set containing and included in the set
and let be a connected set containing and included in the set Clearly, and and are
bounded.
Since diverges to infinity, there exists such that for Noting that there exists i such that
and we can define and
for
Since there is a convergent subsequence in also denoted by Let be a limit point of By the definition of is bounded. But
crosses infinitely many times, which is impossible by Lemma 5.6.3. Thus,
Proof of Theorem 5.6.1
and
By Lemma 5.6.4 is bounded. Let
If then by Lemma 5.6.3, we have
If then there are and such that and since is nowhere dense. But by Lemma 5.6.3 this is impossible. Therefore,
We now show If there is a convergent subsequence
and then (5.6.26)–(5.6.30) still hold. Hence,
This is a contradiction to
Consequently, i.e.,
Since and the truncations occur only finitely many times. Therefore, and differ from each other only for a finite number of So,
5.7. Notes and References
For blind identification with “block” algorithms we refer to [71, 96]. Recursive blind channel identification algorithms appear to be new. Section 5.1 is written on the basis of the joint work “H. F. Chen, X. R. Cao, and J. Zhu, Convergence of stochastic approximation based algorithms for blind channel identification”. Principal component analysis is applied in different areas (see, e.g., [36, 79]). The results presented in Section 5.2 are an improved version of those given in [101]. Principal component analysis is applied to solve the blind identification problem in Section 5.3, which is based on the recent work “H. T. Fang and H. F. Chen, Blind channel identification based on noisy observation by stochastic approximation method”. The proof of Lemma 5.3.2 is given in [9].
For adaptive filter we refer to [57]. The results presented in Sec-tion 5.4 are stronger than those given in [11, 28]. The sign algorithmsare dealt with in [42], but conditions used in Section 5.5 are consider-ably weaker than those in [42]. Section 5.5 is based on the recent work“H. F. Chen and G. Yin, Asymptotic properties of sign algorithms for
adaptive filtering”.Asynchronous stochastic approximation was considered in [9, 88, 89,
99]. Section 5.6 is written on the basis of [50].
Chapter 6
APPLICATION TO SYSTEMS
AND CONTROL
Assume a control system depends on a parameter and the system operation reaches its ideal status when the parameter equals some. Since is unknown, we have to estimate it during the operation of the system, which, therefore, can work only on the estimate of. In other words, the real system is not under the ideal parameter, and the problem is to on-line estimate and to make the system asymptotically operate in the ideal status. It is clear that this kind of system parameter identification can be dealt with by SA methods.
Adaptive control for linear stochastic systems is a typical examplefor the situation described above. If the system coefficients are known,then the optimal stochastic control may be a feedback control of thesystem state. The corresponding feedback gain can be viewed as theideal parameter which depends on the system coefficients. In the setupof adaptive control, system coefficients are unknown, and hence isunknown. The problem is to estimate and to prove that the resultingadaptive control system by using the estimate as the feedback gain isasymptotically optimal as tends to infinity.
In Section 6.1 the ideal parameter is identified by SA methods for systems in a general setting, and the results are applied to solving the adaptive quadratic control problem. The adaptive stabilization problem is solved for stochastic systems in Section 6.2, while adaptive exact pole assignment is discussed in Section 6.3. An adaptive regulation problem for nonlinear and nonparametric systems is considered in Section 6.4.
6.1. Application to Identification and AdaptiveControl
Consider the following linear stochastic system depending on param-
eter
where and are unknown.
The ideal parameter for System (6.1.1) is a root of an unknownfunction
The system actually operates with equal to some estimate for ,i.e., the real system is as follows:
For notational simplicity, we suppress the dependence on the state and rewrite (6.1.3) as
The observation at time is
where is a noise process. From (6.1.5) it is seen that the function is not directly observed,
but it is connected with as follows:
We list conditions that will be used.
where is generated by (6.1.1). Let be a sequence of positive numbers increasingly diverging to infinity, and let be a fixed point. Fixing an initial value, we recursively estimate by the SA algorithm with expanding truncations:
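The display (6.1.8) is not legible in this reproduction. As a rough scalar sketch of the expanding-truncation mechanism (cf. (2.1.1)–(2.1.3)): whenever the iterate leaves the current ball, it is pulled back to a fixed point and the truncation radius is enlarged. Everything concrete below — the function f(x) = 2 − x, the harmonic step sizes, the radii M_k = 2^k, and the bounded deterministic "noise" — is an invented assumption, not taken from the text:

```python
# Toy scalar sketch of a Robbins-Monro algorithm with expanding
# truncations: when the iterate leaves the current ball it is reset to
# a fixed point and the radius is enlarged.  All numerical choices here
# are illustrative assumptions.
def rm_expanding_truncations(observe, x0, x_reset, steps=20000):
    M = lambda s: 2.0 ** s           # truncation radii, increasing to infinity
    x, sigma = x0, 0                 # sigma counts truncations so far
    for k in range(1, steps + 1):
        x_next = x + (1.0 / k) * observe(x, k)
        if abs(x_next) > M(sigma):   # truncation: pull back, enlarge radius
            x_next, sigma = x_reset, sigma + 1
        x = x_next
    return x

# seek the root x* = 2 of f(x) = 2 - x under bounded deterministic "noise"
est = rm_expanding_truncations(lambda x, k: (2.0 - x) + 0.5 * (-1) ** k,
                               x0=50.0, x_reset=0.0)
```

Despite the wild initial value, one early truncation resets the iterate, and afterwards the decreasing steps keep it inside the enlarged ball while it converges to the root.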
A6.1.2 There is a continuously differentiable function
such that
for any and is nowhere dense,
where J is given by (6.1.2). Further, used in (6.1.8) is such that
inf for some and
A6.1.3 The random sequence in (6.1.1) satisfies a mixing condition
characterized by
uniformly in where Further, is such that
sup where
A6.1.4 For sufficiently large integer
for any such that converges, where is given by (1.3.2).
Let is stable}, and let be an open, connected subsetof
A6.1.5 and f are connected by (6.1.6) and (6.1.1) for each
satisfies a local Lipschitz condition on
with for any constants and where is
given in A6.1.3.
with
A6.1.1 and
A6.1.6 and in (6.1.1) are globally Lipschitz continuous:
where L is a constant.
A6.1.7 given by (6.1.7) is If converges for some
then where may depend on
Theorem 6.1.1 Assume A6.1.1–A6.1.7 hold. Then
where is a connected subset of
Proof. By (6.1.5) we rewrite the observation in the standard form
where
By Theorem 2.2.2 and Condition A6.1.4, the assertion of the theoremwill immediately follow if we can show that for almost all condition(2.2.2) is satisfied with replaced by
Let be expressed as a sum of seven terms:
where
where
and and denote the distribution and
conditional distribution of given, respectively. To prove the theorem it suffices to show that there exists with
such that for each all satisfy
(2.2.2) with respectively identified to
By definition, for any there is such that
where is independent of
Let us first show that satisfy (2.2.2). Solving (6.1.1) yields
By A6.1.3 is bounded. Hence, by (6.1.18) is bounded and by A6.1.5 is also bounded:
where
where is given in A6.1.5.
Since, we have. We now show that and are continuous in, uniformly with respect to.
By (6.1.18) and (6.1.20), from (6.1.19) it follows that
By (6.1.18), (6.1.20), and the Lipschitz condition A6.1.5 for, it follows that
and
which implies the uniform continuity of. This together with (6.1.13) yields that is also uniformly continuous. Let be a countable dense subset of. Noticing that is, and expressing
as a sum of martingale difference sequences
by (6.1.20) and, we find that there is with such that for each
for any integer and any. From here, by uniform continuity of, it follows that for and for any integer
Note that
This is because by (6.1.18) and (6.1.20) we have the following estimate:
We now estimate by the treatment used in Lemma 2.5.2. By applying the Jordan-Hahn decomposition to the signed measure
Similarly, we can find with such that forand
since is bounded by the martingale convergence theorem. It is worth noting that (6.1.23) holds a.s. for any, but without loss of generality (6.1.23) may be assumed to hold for all with. To see this, we first select such that (6.1.23) holds for any. This is possible because is a countable set. Then, we notice that is continuous in, uniformly with respect to. Thus, we have
where is the mixing coefficient given in A6.1.3. Thus, by(6.1.27)–(6.1.29) we have
and
it is seen that there is a Borel set D in the sampling space such that for any A in the sampling space
By A6.1.5, (6.1.18), (6.1.20), and noticing we find
whose expectation is finite as explained for (6.1.20). Therefore, on the right-hand side of (6.1.30) the conditional expectation is bounded with respect to by the martingale convergence theorem, and the last term is also bounded with respect to. Thus, by (6.1.10) from (6.1.30) it follows
that there is with such that
Assume is a convergent subsequence
Define
Write (6.1.4) as
Let be fixed.
where. By induction we now show that
for all suitably large.
For any fixed if is large enough, since
Therefore, (6.1.36) holds for, since. Assume (6.1.36) holds for some. By noticing, from (6.1.34) and (6.1.35) it follows that
By using (6.1.20), (6.1.37), and the inductive assumption, and applying (6.1.19) to, it follows that
for, where and satisfies the following equation
By A6.1.7 and (6.1.20) we have
and using (6.1.18), (6.1.37), and the inductive assumption we derive
Combining this with (6.1.38), it follows that there are real numbers and such that
for From here it follows that
From the inductive assumption it follows that for
for some large enough integer N . Then by (6.1.12)
Setting
we derive
where (6.1.22), (6.1.24), (6.1.25), (6.1.31), (6.1.39), and (6.1.40) are used.
Choose sufficiently small so that (6.1.35) holds, and
Since by A6.1.5 there is such that
for all From (6.1.41) it then follows that
It can be assumed that is sufficiently large so that
Since by (6.1.42) it follows that
and hence there is no truncation at
Thus, we have
or equivalently,
which proves (6.1.36).
Consequently, (6.1.39) is valid for and hence
where is the estimate of Let be given by (6.1.7) and (6.1.8) with given by (6.1.5).
where and are related by (6.1.44). However, since the ideal is unknown, the real system satisfies the
equation
where and are symmetric such that and. Let be as given by A6.1.3. The control
where is the feedback control which is required to minimize
Finally, noticing that A6.1.5 assumes (6.1.6), we conclude that for each
all satisfy (2.2.2) with
respectively replaced by. The proof of the theorem is completed.
We now apply the obtained result to an adaptive control problem. Assume that is the ideal parameter for the system, being
the unique zero of an unknown function The system in the idealcondition is described by the equation
From (6.1.21) and (6.1.13) it is seen that is continuous in, uniformly with respect to. Therefore, its limit is a continuous function. Then by (6.1.36) it follows that
should be selected in the family U of admissible controls:
In order to give adaptive control we need the expression of the optimalcontrol when is known.
Lemma 6.1.1 Suppose that
is a martingale difference sequence with
ii) where is controllable and observable, i.e., · · · , and · · · , are of full rank.
Then in the class of nonnegative definite matrices there is a unique satisfying
and
where
and
Proof. The existence of a unique solution to (6.1.50) and the stability of F given by (6.1.51) are well-known facts in control theory. We show the optimality of the control given by (6.1.52).
For notational simplicity, we temporarily suppress the dependence of
and on and write them as A, B,
and D, respectively.Noticing
is stable. The optimal control minimizing (6.1.45) is
we then have
Since, by the estimate for the weighted sum of a martingale difference sequence, from (6.1.55) it follows that
where is the state in (6.1.47).
Thus the closed system becomes
Notice that the last term of (6.1.56) is nonnegative. The conclusions of
the lemma follow from (6.1.56).
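As a concrete numerical illustration of Lemma 6.1.1 in the scalar case, the Riccati equation (6.1.50) can be solved by fixed-point iteration and the optimal feedback gain of (6.1.51)–(6.1.52) computed from its solution. The system values a, b and the weights q, r below are invented for this sketch, not taken from the text:

```python
# Scalar numerical illustration of the Riccati equation (6.1.50) and
# the optimal feedback gain (6.1.51)-(6.1.52); a, b, q, r are invented.
def solve_dare_scalar(a, b, q, r, iters=200):
    """Fixed-point iteration for the scalar discrete algebraic Riccati
    equation  s = a*s*a + q - (a*s*b)**2 / (r + b*s*b)."""
    s = q
    for _ in range(iters):
        s = a * s * a + q - (a * s * b) ** 2 / (r + b * s * b)
    return s

def lq_gain(a, b, q, r):
    """Optimal stationary feedback u_k = f * x_k for the quadratic cost."""
    s = solve_dare_scalar(a, b, q, r)
    f = -(a * s * b) / (r + b * s * b)
    return s, f

# open-loop unstable example: a = 1.2
s, f = lq_gain(a=1.2, b=1.0, q=1.0, r=1.0)
closed_loop_pole = 1.2 + 1.0 * f   # stability of F: |pole| < 1 after feedback
```

The stabilizing property of the resulting closed-loop matrix is exactly what the lemma asserts for F given by (6.1.51).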
According to (6.1.52), by the certainty-equivalence principle, we form
the adaptive control
which has the same structure as (6.1.4). Therefore, under the assumptions A6.1.1–A6.1.7 with replaced by and with J being a singleton, by Theorem 6.1.1 it is concluded that
By continuity and stability of, it is seen that there are and, possibly depending on, such that
This yields the boundedness of and
because
By (6.1.60) it follows that
Therefore, the closed system (6.1.58) asymptotically operates under the ideal parameter and minimizes the performance index (6.1.45).
6.2. Application to Adaptive Stabilization
Consider the single-input single-output system
where and are the system input, output, and noise, respectively, and
where is the backward shift operator. The system coefficient
is unknown. The purpose of adaptive stabilization is to design control
so that
a.s.
The fact that and a can be solved from (6.2.5) for any means that
is nonzero. In other words, the coprimeness of and is equivalent to
In the case where is unknown, the certainty-equivalence principle suggests replacing by its estimate to derive the adaptive control law. However, for, may be zero and (6.2.5) may not be solvable with and replaced by their estimates.
Let us estimate by the following algorithm called the weighted leastsquares (WLS) estimate, which is convergent for any feedback control
If is known and if and are coprime, then for an arbitrary stable polynomial of degree there are unique polynomials
and both of order with such that
Then the feedback control generated by
leads the system (6.2.1) to
Then, by stability of, (6.2.4) holds if we assume
Considering the coefficients of and as unknowns, and identifying the coefficients of on both sides of (6.2.5), we derive a system of linear algebraic equations with matrix for the unknowns:
where
Though converges a.s., its limit may not be the true. If a bounded sequence can be found such that the modified estimate
and for some
is convergent and
then the control obtained from (6.2.6) with replaced by solves the adaptive stabilization problem, i.e., makes (6.2.4) hold.
Therefore, the central issue in adaptive stabilization is to find a bounded sequence such that given by (6.2.12) is convergent and (6.2.13) is fulfilled. This gives rise to the following definition.
Definition. System (6.2.1) is called adaptively stabilizable by the use of parameter estimate if there is a bounded sequence such that (6.2.13) holds and given by (6.2.12) is convergent.
It can be shown that if system (6.2.1) is controllable, i.e., and are coprime, then it is adaptively stabilizable by the use of the WLS estimate. It can also be shown that the system is adaptively stabilizable by use of if and only if, where and F denote the limits of and, respectively, which are generated by (6.2.9)–(6.2.11).
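The WLS recursion (6.2.9)–(6.2.11) itself is not legible in this reproduction. To convey the flavor, here is a generic recursive least-squares sketch (with unit weights, i.e., plain RLS rather than the book's weighted scheme) for estimating the parameter in a linear regression; the regressor model and all numerical values are invented assumptions:

```python
# Generic recursive least-squares sketch for y_k = phi_k' theta + w_k.
# Unit weights (plain RLS); the book's WLS recursion uses specific
# weights not reproduced here.  Regressors and the true parameter
# below are invented for illustration.
import math

def rls(data, dim):
    """data: iterable of (phi, y) pairs, phi a list of length dim."""
    theta = [0.0] * dim
    # P is the (scaled) inverse information matrix
    P = [[100.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    for phi, y in data:
        Pphi = [sum(P[i][j] * phi[j] for j in range(dim)) for i in range(dim)]
        denom = 1.0 + sum(phi[i] * Pphi[i] for i in range(dim))
        err = y - sum(phi[i] * theta[i] for i in range(dim))
        for i in range(dim):
            theta[i] += Pphi[i] * err / denom        # gain K = P phi / denom
        for i in range(dim):
            for j in range(dim):                     # matrix inversion lemma
                P[i][j] -= Pphi[i] * Pphi[j] / denom
    return theta

# recover theta = (1.5, -0.7) from noise-free regression data
data = [([math.sin(k), math.cos(k)], 1.5 * math.sin(k) - 0.7 * math.cos(k))
        for k in range(1, 51)]
theta = rls(data, dim=2)
```

As in the WLS case discussed above, such a recursion converges under any feedback control that keeps the regressors sufficiently informative; the modification of the limit, not the recursion itself, is what the rest of this section addresses.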
We now use an SA algorithm to recursively produce such that is convergent and the resulting estimate by (6.2.12) satisfies
(6.2.13).
is generated by (6.2.9)–(6.2.11), is defined by (6.2.11), and is recursively defined by an SA algorithm given below.
Let us take a few real sequences defined as follows:
where
which can be written as
From algebraic geometry it is known that is a
finite set.
However, is not directly observed; the real observation is
The root set of is denoted by where
where
As a matter of fact,
Let and be –dimensional, and let
Let be l-dimensional with only one nonzero element, equal to either +1 or –1. Similarly, let be -dimensional with only nonzero elements, each of which equals either +1 or –1,
The total number of such vectors is
Normalize these vectors and denote the resulting vectors by, in nondecreasing order of the number of nonzero elements in
Define and for Introduce
Define the recursive algorithm for as follows:
and is a fixed vector. The algorithm (6.2.23)–(6.2.27) is the RM algorithm with expanding
truncations, but it differs from the algorithm given by (2.1.1)–(2.1.3)
as follows. The algorithm (2.1.1)–(2.1.3) is truncated at the upper side only, but the present algorithm is truncated not only at the upper side but also at the lower side: is allowed neither to diverge to infinity nor to tend to zero; whenever it reaches the truncation bounds, the estimate is pulled back to, and is enlarged to at the upper side, while at the lower side is pulled back to, which will change to the
next whenever is satisfied. If for successive resettings of we have to change to the next one, then we reduce to
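A toy scalar sketch of this two-sided truncation may help: the iterate may neither diverge to infinity nor collapse to zero, so it is reset whenever it leaves an expanding upper bound or falls below a shrinking lower bound. A single fixed reset point beta stands in for the cycling reset vectors of (6.2.23)–(6.2.27), and the function, bounds, and noise are all invented assumptions:

```python
# Toy scalar sketch of an RM iteration truncated at BOTH an expanding
# upper bound and a shrinking lower bound; a single reset point beta
# replaces the cycling reset vectors of the text.  All numerical
# choices are illustrative assumptions.
def rm_two_sided(observe, x0, beta=0.5, steps=10000):
    M = lambda s: 2.0 ** s            # upper bounds, increasing to infinity
    m = lambda d: 2.0 ** (-d - 2)     # lower bounds, decreasing to zero
    x, sigma, delta = x0, 0, 0
    for k in range(1, steps + 1):
        x_next = x + (1.0 / k) * observe(x, k)
        if abs(x_next) > M(sigma):    # upper-side truncation
            x_next, sigma = beta, sigma + 1
        elif abs(x_next) < m(delta):  # lower-side truncation
            x_next, delta = beta, delta + 1
        x = x_next
    return x

# seek the root x* = 1 of f(x) = 0.5*(1 - x) under decaying noise
est = rm_two_sided(lambda x, k: 0.5 * (1.0 - x) + 0.3 * (-1) ** k / k, x0=5.0)
```

In this run the upper truncation fires once early on, after which the iterate behaves like an ordinary RM algorithm, matching the claim of Lemma 6.2.1 below that the truncations cease after finitely many steps.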
Lemma 6.2.1 Assume the following conditions hold:
A6.2.2 System (6.2.1) is adaptively stabilizable by use of generated
by (6.2.9)–(6.2.11), i.e.,
If then after a finite number of steps the algorithm (6.2.23)– (6.2.27) becomes the RM algorithm
converges and
Proof. The basic steps of the proof are essentially the same as those for proving Theorem 2.2.1, but some modifications should be made because of the truncations at the lower side.
Step 1. Let be a convergent subsequence of
For any define the RM algorithm
with or for some for some
We show that there are M > 0, T > 0 such that when, and when, if is large enough, where is given by (1.3.2).
Let > 1 be a constant such that
It is clear that
A6.2.1 and
Since and are convergent, there is such that
Let By (6.2.29) and (6.2.30), we have
for if and for if, where. Let (6.2.31) hold for or
It then follows that
where or. Thus, (6.2.31) has been inductively proved for or.
Step 2. Let be a convergent subsequence. We show that there
are M > 0 and T > 0 such that
if is large enough. If defined by (6.2.25) is bounded, then (6.2.32) follows directly. Again take such that and set. Assume. Then there is a such that
By the result proved in Step 1, starting from, the algorithm for cannot directly hit the sphere with radius without a truncation for. So it may first hit some lower bound at time and switch to some, from which again by Step 1 it cannot directly reach without a truncation. The only possibility is to be truncated again at a lower bound. Therefore, (6.2.32) takes place.
Step 3. Since and are convergent, by (6.2.32) it follows that
from any convergent subsequence there are constants and such that
if is large enough. Consequently, there is such that
By (6.2.32) and the convergence of and it also follows that
Therefore,
Using (6.2.33) and (6.2.34) by the same argument as that given in Step 3
of the proof for Theorem 2.2.1, we arrive at the following conclusion. If, starting from, the algorithm (6.2.24) is calculated
as an RM algorithm and is bounded, then for
any with and, cannot cross infinitely often.
Step 4. We now show that is bounded.
If is unbounded, then as. Therefore, is unbounded and comes back to the fixed point infinitely many times.
Notice that is a finite set and
We see that there is an interval with and
0 such that crosses infinitely often, and during each crossing the algorithm (6.2.24) behaves like an RM algorithm with starting point. It is clear that is bounded because as. But by Step 3, this is impossible. Thus, we conclude that
is bounded, and after a finite number of steps (6.2.24) becomes
Step 5. We now show (6.2.28), i.e., after a finite number of steps the algorithm (6.2.35) ceases to truncate at the lower side.
Since and by A6.2.2, it follows that there is at least one nonzero coefficient in the polynomial for some
with Therefore, for some and a small
From (6.2.16) it is seen that for sufficiently small we have
This, combined with the convergence of and, leads to
for sufficiently large
From (6.2.26) and (6.2.36) it follows that must be bounded, and hence is bounded. This means that there is a such that
We now show that is bounded. Since for all sufficiently large, it follows that. If were unbounded, then by (6.2.37) the algorithm, starting from, would infinitely many times enter the sphere with radius
where is small enough such that
Then would cross an interval infinitely often. Since is a finite set, we may assume. It is clear that during the crossing the algorithm behaves like an RM algorithm. By Step 4, this is impossible.
Therefore, there is a such that
Noticing (6.2.20), (6.2.34), and that serves as the Lyapunov function for, from Theorem 2.2.1 we conclude the remaining assertions of the lemma.
where and are defined by 1)-3) described above.
Proof. The key step is to show that
Assume the converse:
Case i) The assumption implies that
and occurs infinitely many times. However,
this is impossible, since and. The contradiction shows
Theorem 6.2.1 Assume conditions A6.2.1 and A6.2.2 hold. Then there is such that and converges and
and use to produce the adaptive control as in 1), and go back to
1) for.
3) If and none of a)-c) of 2) is the case, then set
and go back to 1) for and at the same time change to
i.e.,
Define
Using we now define in (6.2.12) satisfying (6.2.13), thus solving the adaptive stabilization problem.
Let
1) If then set. Using we produce
the adaptive control from (6.2.6) with and defined from
(6.2.5) with replaced by, and go back to 1) for.
2) If then define
a) for the case where
b) defined by (6.2.24) for the case where
butc) for the case, where
but
and the algorithm defining will run over the following cases: 1) and 2a)-2c). Since and are convergent, the inequality
for all sufficiently large. Again, this means that (6.2.41) may take place at most a finite number of times, and we conclude that
Thus, there is such that
If then from (6.2.43) it follows that
Since and for sufficiently large
from (6.2.42) it follows that
for all sufficiently large. Thus, (6.2.41) may take place at most a finite number of times. The contradiction shows that
we havethen as
Take a convergent subsequence of. For notational simplicity denote by itself its convergent subsequence. Thus
By Lemma 6.2.1,1) If then
Case ii) The assumption implies that there
is a sequence of integers such that and, i.e., for all the following indicator equals one
2) If
implies
6.3. Application to Pole Assignment for Systems
with Unknown Coefficients
Consider the linear stochastic system
where is the -dimensional state, is the one-dimensional control, and is the -dimensional system noise.
The task of pole assignment is to define the feedback control
in order that the characteristic polynomial
of the closed-loop system coincides with a given polynomial
The pair is called similar to if there exists a nonsingular matrix such that
where denotes the column of T .
Consequently, the truncation at the lower bound in (6.2.24) should be very rare. The computation will be simplified if there is no lower bound truncation.
for sufficiently large. This means that the algorithm can be at 2b) only finitely many times. For the same reason it cannot be at 2c) infinitely many times. Therefore, the algorithm will stick on 1) if
and on 2a) if and in both cases there is a
such that and
The convergence of follows from the convergence of and
Remark 6.2.1 For the case the origin is not a stable equilibrium for the equation
So, is nonsingular if and only if is nonsingular. Assume that is controllable and is already in its controller form (6.3.5). For notational simplicity, we will write rather than
where
which imply
Define
where are coefficients of
The pair is called the controller form associated to the pair
If is controllable, i.e., is of full rank, then is similar to its controller form. To see this, we note that (6.3.4) implies, and from it follows that
where is the system noise at time "1" for the system with feedback gain applied.
Having observed, we compute its characteristic polynomial det, which is a noise-corrupted characteristic polynomial of.
Let be the estimate for. By observing det, we actually learn the difference det, which in a certain sense reflects how far det differs from the ideal polynomial
For any let
With feedback control, the closed-loop system takes the form
Since is in controller form,
where are elements of the row vector F :
Therefore, if is known, then comparing (6.3.10) with (6.3.3) givesthe solution to the pole assignment problem, where
We now solve the pole assignment problem by learning for the casewhere is unknown.
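When the parameter is known, the controller-form comparison of (6.3.10) with (6.3.3) reduces to matching polynomial coefficients. A sketch of that known-parameter calculation, assuming the standard companion (controller) form with input vector (1, 0, …, 0)′; the numerical coefficients are invented for illustration:

```python
# Sketch of exact pole assignment via coefficient matching, assuming
# the standard controller (companion) form with b = (1,0,...,0)'.
# The numerical coefficients below are invented for illustration.
def pole_assign_controller_form(a, p):
    """a: [a1,...,an] with det(zI - A) = z^n + a1 z^(n-1) + ... + an,
    p: coefficients of the desired polynomial in the same convention.
    Returns F with u_k = F x_k so that det(zI - (A + bF)) matches p."""
    # In controller form the top row of A is (-a1,...,-an); adding bF
    # changes it to (-a1+F1,...,-an+Fn), whose characteristic
    # coefficients are a_i - F_i.  Matching a_i - F_i = p_i gives:
    return [ai - pi for ai, pi in zip(a, p)]

# move the poles of z^2 - 3z + 2 (roots 1 and 2, unstable) to those of
# z^2 - 0.5z + 0.06 (roots 0.2 and 0.3)
F = pole_assign_controller_form([-3.0, 2.0], [-0.5, 0.06])
```

The learning problem of this section is precisely to reach this F when the coefficients are unknown and only noise-corrupted characteristic polynomials are observed.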
Let us combine the vector equation (6.3.9) for initial values to form
the matrix equation
Let In learning control, can be observed at any fixed
For any the observation of is denoted by
be the row vector composed of coefficients of
By (6.3.10)
composed of coefficients of
and, respectively. Take a sequence of positive real numbers
and
Calculate the estimate for by the following RM algorithm with
expanding truncations:
with fixed
Theorem 6.3.1 Assume that is controllable and is in the controller form. Further, assume the following conditions A6.3.1 and A6.3.2 hold:
A6.3.1 The components of
of in (6.3.13) are mutually independent with
A6.3.2
where is the same as that in A6.3.1. Then there is with such that for each, as
Similarly, define row vectors
for some
From here it is seen that is a sum of products of elements from, with +1 and –1 as a multiplier for each product, where and denote elements of A and, respectively. It is important to note that each product in includes at least one of as its factor. Thus, the product is of the form
From (6.3.21) by (6.3.18), (6.3.15), and (6.3.13) it follows that
Therefore, the conclusion of the theorem will follow from Theorem 2.2.1, if we can show that for any integer N
where is the desired feedback gain realizing the exact pole
assignment.
Proof. Define
where and are given by (6.3.14) and (6.3.17), respectively. By (6.3.11) and (6.3.16) it follows that
Thus, (6.3.19) and (6.3.20) become
It is clear that the recursive algorithm for has the same structure
as (2.1.1)–(2.1.3). For the present case, as the function required in A2.2.2 we may take
where
By A6.3.1 we have
where. By A6.3.2 and the convergence theorem for martingale difference sequences it follows that
for any integer which implies (6.3.24).
6.4. Application to Adaptive Regulation
We now apply the SA method to solve the adaptive regulation problem for a nonlinear nonparametric system.
Consider the following system
where is the system state, is the control, and is an unknown nonlinear function with being the unknown equilibrium for the system (6.4.1). Assume the state is observed, but the observations are corrupted by noise:
where is the observation noise, which may depend on. The purpose of adaptive regulation is to define adaptive control based on measurements so that the system state reaches the desired value, which, without loss of generality, may be assumed to be zero. We need the following conditions.
A6.4.1 and
A6.4.2 The upper bound for is known, i.e., and is
a robustly stabilizing control in the sense that for any the state
tends to zero for the following system
A6.4.3 The system (6.4.1) is BIBS stable, i.e., for any bounded input,the system state is also bounded;
A6.4.4 is continuous for bounded i.e., for any
A6.4.5 The system (6.4.1) is strictly input passive, i.e., there are and such that for any input
A6.4.6 For any convergent subsequence
where is defined by (1.3.2).
It is worth noting that A6.4.6 becomes
if is independent of. The adaptive control is given by the following recursive
algorithm:
where b is specified in A6.4.2.
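The display (6.4.4) is not legible in this reproduction. As a toy scalar illustration of the mechanism: the control is corrected by an SA step driven by the noisy state observation and truncated to the known bound b. The linear system dynamics, the unknown equilibrium u0, and the decaying observation noise below are all invented assumptions; the section itself treats a nonparametric nonlinear model:

```python
# Toy scalar sketch of the adaptive regulation scheme: an SA step on
# the control, driven by the noisy state observation and truncated to
# the known bound b.  All numerical choices are invented assumptions.
def adaptive_regulate(u0, b, steps=5000):
    x, u = 5.0, 0.0
    for k in range(1, steps + 1):
        x = 0.5 * x + (u - u0)           # state is regulated iff u -> u0
        y = x + 0.3 * (-1) ** k / k      # noisy observation of the state
        u = max(-b, min(b, u - (1.0 / k) * y))   # truncated SA step
    return x, u

x_final, u_final = adaptive_regulate(u0=0.8, b=2.0)
```

The control converges to the unknown equilibrium value and the state is driven to zero, which is the conclusion Theorem 6.4.1 establishes in the general setting.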
Theorem 6.4.1 Assume A6.4.1 – A6.4.6. Then the system (6.4.1), (6.4.2),
and (6.4.4) has the desired properties:
at sample paths where A6.4.6 holds.
Proof. Let be a convergent subsequence of such that
and
We have
for sufficiently large and small enough T, where is a constant to be specified later on. The relationships (6.4.5) and (6.4.6) can be proved along the lines of the proof for Theorem 2.2.1, but here is known to be bounded, and (6.4.5) and (6.4.6) can be proved more straightforwardly. We show this.
Since the system (6.4.1) is BIBS, from it follows that there is such that
By A6.4.6 for large and small T > 0,
This implies that
Let be large enough such that
and let T be small enough such that
Then we have
and hence there is no truncation in (6.4.4) for, i.e., (6.4.5) holds for. Therefore,
indeed. By induction, the assertions (6.4.5) and (6.4.6) have been proved. We now show that for any convergent subsequence
there is a such that
from (6.4.4) it follows that (6.4.5) holds for. Hence,
Thus, (6.4.5) and (6.4.6) hold for. Assume they are true for all. We now show that they are true for
too. Since
for small enough T > 0. By A6.4.5, we have
Let us restrict in (6.4.8) to. Then for small T and large, from (6.4.6) and (6.4.8) it follows that
and (6.4.6) is true for
Since and it is seen that
Using a partial summation, by (6.4.9) we have
for all sufficiently large and small enough T > 0. Set
for
This implies that there exist a and a sufficiently large, which may depend on but is independent of, such that
Then (6.4.10) implies that
This proves (6.4.7).
Define
From (6.4.7) it follows that
for any convergent subsequence.
Using A6.4.6 and (6.4.11), by the same argument as that used in the proof (Steps 3–6) of Theorem 2.2.1, we conclude that
Finally, write (6.4.1) as
By A6.4.4 and the boundedness of we have
and by A6.4.2 we conclude
Remark 6.4.1 It is easy to see that A6.4.6 is also necessary if A6.4.1–A6.4.5 hold and and. This is because for large the observation noise can be expressed as
and hence
6.5. Notes and References
For system identification and adaptive control we refer to [10, 23, 54, 62, 75, 90]. The identification problem stated in Section 6.1 was solved in [72] by the ODE method. In comparison with [72], the conditions used here have been considerably weakened, and the convergence is proved by the TS method rather than the ODE method. Section 6.1 is based on joint work by H. F. Chen, T. Duncan, and B. Pasik-Duncan. The existence and uniqueness of the solution to (6.1.50) can be found, e.g., in [23]. For stochastic quadratic control refer to [2, 10, 12, 33].
Adaptive stabilization for stochastic systems is dealt with in [5, 55, 77].
The convergence of WLS and adaptive stabilization using WLS are given in [55]. The problem is solved by the SA method in [19]; this approach is presented in Section 6.2.
The pole assignment problem for stochastic systems with unknown coefficients is solved by SA with the help of learning in Section 6.3, which is based on [20]. For the concept of linear control systems we refer to
which tends to zero as since and
Remark 6.4.2 In the formulation of Theorem 6.4.1 the condition A6.4.5 can be replaced either by (6.4.7) or by (6.4.11), which are consequences of A6.4.5. Further, the quadratic can be replaced by a continuously differentiable function such that and. In this case, in (6.4.7) should be correspondingly replaced by
Example 6.4.1 Let the nonlinear system be affine:
where the scalar nonlinear function is bounded from above and from below by positive constants:
Note that, and hence (6.4.7) holds, if. Assume is known. Then A6.4.2, A6.4.3, and A6.4.4 are satisfied. Therefore, if satisfies A6.4.6, then given by (6.4.4) leads to and
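The affine example can be simulated. The sketch below uses hypothetical choices f(y) = 0.5y and g(y) = 1 + 0.2 sin(y) (so that g is bounded between positive constants, as required) and corrects the control by a decaying-gain SA step:

```python
import math
import random

def adaptive_regulate(f, g, y_target, n=3000, b=0.5, seed=1):
    """SA-based regulation of the affine system
    y_{k+1} = f(y_k) + g(y_k) u_k + w_{k+1}:
    the control is corrected by u_{k+1} = u_k - (b/k)(y_{k+1} - y*)."""
    rng = random.Random(seed)
    y, u = 0.0, 0.0
    for k in range(1, n + 1):
        y = f(y) + g(y) * u + rng.gauss(0.0, 0.05)
        u -= (b / k) * (y - y_target)
    return u

u = adaptive_regulate(lambda y: 0.5 * y, lambda y: 1.0 + 0.2 * math.sin(y), 1.0)
# At the target the closed loop should balance: f(y*) + g(y*) u ~ y*.
residual = 0.5 * 1.0 + (1.0 + 0.2 * math.sin(1.0)) * u - 1.0
```

The decaying gain b/k plays the role of the step size in (6.4.4); the specific system and noise level here are made-up illustrations, not the book's assumptions.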
In the area of systems and control, SA methods are also successfully applied to discrete event dynamic systems, especially to perturbation-analysis-based parameter optimization.
[1, 46, 60]. The connection between the feedback gain and the coefficients of the desired characteristic polynomial is called Ackermann's formula, which can be found in [46].
Application of SA to adaptive regulation is based on [26].
For perturbation analysis of discrete event dynamic systems we refer to [58]. Perturbation-analysis-based parameter optimization is dealt with in [29, 86, 87].
Appendix A
In Appendix A we introduce the basic concepts of probability theory. Results are presented without proof. For details we refer to [31, 32, 70, 76, 84].
A.1. Probability Space
The basic space is denoted by. The point is called an elementary event or sample. A point set in is denoted by A.
Let be a family of sets in satisfying the following conditions:
1.
2.
3.
Then is called a σ-algebra or σ-field. The element A of is called a measurable set, or a random event, or simply an event.
As a consequence of Properties 2 and 3,
then the complement of A, also belongs toIf
If
if
A set function defined on is called σ-additive if for any sequence of disjoint events. By definition, one of the values or is not allowed to be taken by.
A nonnegative σ-additive set function is called a measure. Define
The set functions and are called the upper, lower, and total variation of on, respectively.
Jordan-Hahn Decomposition Theorem If is σ-additive on, then there exists a set D such that, for any
and are measures and.
Let P be a set function defined on with the following properties:
1.
2.
then
3. if are disjoint.
Then P is called a probability measure on. The triple is called a probability space, and P(A) is called the probability of the random event A.
It is assumed that any subset of a measurable set of probability zero is measurable and its probability is zero. After such a completion of the measurable sets the resulting probability space is called complete.
If a relationship between random variables holds for any with the possible exception
of a set with probability zero, then we say this relationship holds a.s. (almost surely)
or with probability one.
A.2. Random Variable and Distribution Function
In R, the real line, the smallest σ-algebra containing all intervals is called the Borel σ-algebra and is denoted by. The "smallest" means that if there is a σ-algebra containing all intervals, then there must be, in the sense that for any.
The Borel σ-algebra can also be defined in. Any set in or is called a Borel set.
Any interval can be endowed with a measure equal to its length. This measure can be extended to each, i.e., to each Borel set. Any subset of a set with measure zero is also assumed to be a measurable set with measure zero. After such a completion, the measurable sets are called Lebesgue measurable, and the measure the Lebesgue measure. In what follows, always means the completed Borel σ-algebra.
A real function defined on is called measurable if
If is a real measurable function defined on and, then is called a random variable. Therefore, if is a measurable function, then is also a random variable if.
Let be a random variable. The distribution function of is defined as
By a random vector we mean that each component of is a random variable. The distribution function of a random vector is defined as
If is differentiable, then its derivative is called the density of. The density of a random vector is defined in a similar way. The density of the l-dimensional normal distribution is defined by
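For concreteness, the one-dimensional case of the normal density, p(x) = (2πσ²)^(−1/2) exp(−(x − μ)²/(2σ²)), can be evaluated and checked to integrate to one (a numerical sketch; the l-dimensional density replaces σ² by a covariance matrix):

```python
import math

def normal_density_1d(x, mu=0.0, sigma2=1.0):
    """Density of the one-dimensional N(mu, sigma2) distribution."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) \
        / math.sqrt(2.0 * math.pi * sigma2)

# Riemann sum over [-8, 8]; the total mass should be close to 1.
total = sum(normal_density_1d(-8.0 + 0.01 * i) * 0.01 for i in range(1600))
```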
A.3. Expectation
Let be a random variable and let
Define
where
is called the expectation of.
For an arbitrary random variable define
The expectation of is defined as
if at least one of and is finite. If, then is called integrable.
The expectation of can be expressed by a Lebesgue-Stieltjes integral with respect
to its distribution function
In the density of l-dimensional random vector with normal distribution,
A.4. Convergence Theorems and Inequalities
Let be a sequence of random variables and let be a random variable. If, then we say that converges to and write
If for any, then we say that converges to in probability and write
If the distribution functions of converge to at any where is continuous, then we say weakly (or in distribution) converges to and write
If, then we say converges to in the mean square sense and write l.i.m.
implies, which in turn implies
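The implication chain can be illustrated numerically: if l.i.m. X_n = 0, the Chebyshev inequality (stated below) forces convergence in probability. A hypothetical sketch with X_n = Y/√n, Y ~ N(0, 1):

```python
import random

def prob_exceeds(n, eps=0.5, trials=20000, seed=3):
    """Empirical P(|X_n| > eps) for X_n = Y / sqrt(n), Y ~ N(0,1).
    Since E X_n^2 = 1/n -> 0 (mean-square convergence), Chebyshev
    gives P(|X_n| > eps) <= 1/(n * eps^2) -> 0 (in probability)."""
    rng = random.Random(seed)
    return sum(abs(rng.gauss(0.0, 1.0)) / n ** 0.5 > eps
               for _ in range(trials)) / trials

p1, p100 = prob_exceeds(1), prob_exceeds(100)
```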
Monotone Convergence Theorem If random variables nondecreasingly (nonincreasingly) converge to and, then
Dominated Convergence Theorem If and there exists an integrable random variable such that, then and
Fatou Lemma If for some random variable with, then
If is a measurable function, then
Chebyshev Inequality
Lyapunov Inequality
Hölder Inequality
In the special case where, the Hölder inequality is called the Schwarz inequality.
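On a finite sample space with equal weights these inequalities can be checked directly (a toy numerical illustration with made-up values):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 1.0, 0.5, 3.0]
n = len(xs)
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n
eps = 1.0
# Chebyshev: P(|X - EX| >= eps) <= Var(X) / eps^2
p_tail = sum(abs(x - mean) >= eps for x in xs) / n
# Schwarz: (E|XY|)^2 <= E X^2 * E Y^2
exy = sum(abs(x * y) for x, y in zip(xs, ys)) / n
ex2 = sum(x * x for x in xs) / n
ey2 = sum(y * y for y in ys) / n
```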
A.5. Conditional Expectation
Let be a probability space. is called a sub-σ-algebra of if is a σ-algebra and, by which it is meant that any implies.
Radon-Nikodym Theorem Let be a sub-σ-algebra of. For any random
variable with at least one of and being finite, there is a unique measurable random variable, denoted by, such that for any
The random variable satisfying the above equality is called the conditional expectation of given.
Let be the smallest σ-algebra (see A.2) containing all sets; is called the σ-algebra generated by.
The conditional expectation of given is defined as
Let A be an event. Conditional probability of A given is defined by
Properties of the conditional expectation are listed below.
1) for constants and
2)
3) if is and
4) if
5) if
Convergence theorems and inequalities stated in A.4 remain true with the expectation replaced by the conditional expectation. For example, the conditional Hölder inequality
for. For a sequence of random variables and a σ-algebra, the consistent
conditional distribution functions of given
Let and Then
can be defined such that i) they are for any and any fixed, ii) they are distribution functions for any fixed, and iii) for any measurable function
A.6. Independence
Let be a sequence of events. If for any set of indices
then is called mutually independent.
Let be a sequence of σ-algebras. If events are mutually independent whenever, then the family of σ-algebras is called mutually independent.
Let be a sequence of random variables and let be the σ-algebra generated by. If is mutually independent, then the sequence of random variables is called mutually independent.
Law of the iterated logarithm Let be a sequence of independent and identically distributed (iid) random variables. Then
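For iid variables with mean 0 and variance 1 the law states limsup |S_n|/√(2n log log n) = 1 a.s. A single simulated ±1 path illustrates that the normalized partial sums stay of order one (a sketch, with the starting index n = 100 chosen arbitrarily):

```python
import math
import random

def lil_max_ratio(n_max=10000, seed=4):
    """Track max_n |S_n| / sqrt(2 n log log n) along one path of a
    symmetric +-1 random walk, for n >= 100."""
    rng = random.Random(seed)
    s, best = 0, 0.0
    for n in range(1, n_max + 1):
        s += rng.choice((-1, 1))
        if n >= 100:
            best = max(best, abs(s) / math.sqrt(2 * n * math.log(math.log(n))))
    return best

ratio = lil_max_ratio()
```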
Proposition A.6.1 Let be a measurable function defined on.
If the l-dimensional random vector is independent of the m-dimensional random vector, then
where
From this proposition it follows that
if is independent of
A.7. Ergodicity
Let be a sequence of random variables and let be the distribution function of. If for any integer, then is called stationary, or is a stationary process.
Proposition A.7.1 Let be stationary.
provided exists for all in the range of
If exists, then
where is a σ-algebra of and is called invariant.
If, then the stationary process is called ergodic. Thus, for a stationary and ergodic process we have
If is a sequence of mutually independent and identically distributed (and hence
stationary) random variables, then and the sequence is ergodic.
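For instance, for an iid Uniform(0, 1) sequence the time average converges a.s. to the expectation 1/2; a quick simulation:

```python
import random

def time_average(n=50000, seed=5):
    """(1/n) * sum of n iid Uniform(0,1) samples; by ergodicity this
    converges a.s. to the expectation 1/2."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

avg = time_average()
```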
Appendix B
In Appendix B we present detailed proofs of the convergence theorems for martingales and martingale difference sequences.
Let be a sequence of random variables, and let be a family of nondecreasing σ-algebras, i.e.,
If is for any, then we write and call it an adapted process.
An adapted process with is called a martingale if a supermartingale if and a submartingale if
An adapted process is called a martingale difference sequence (MDS) if
A sequence of mutually independent random vectors with is an obvious example of an MDS.
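As a sanity check, iid ±1 signs form an MDS; empirically their sample mean is near 0 and their sample variance near 1 (a minimal numerical sketch of the defining property E[ξ_k | F_{k-1}] = 0):

```python
import random

def mds_sample_stats(n=20000, seed=6):
    """Sample mean and variance of iid +-1 signs, the simplest MDS:
    E[xi_k | F_{k-1}] = E xi_k = 0 and Var xi_k = 1."""
    rng = random.Random(seed)
    xs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

mean, var = mds_sample_stats()
```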
An integer-valued measurable function is called a Markov time with respect toif
If, in addition, then is called a stopping time.
B.1. Convergence Theorems for Martingales
Lemma B.1.1 Let be adapted, a Markov time, and B a Borel set. Let be the first time at which the process hits the set B after time, i.e.,
Then is a Markov time.
Proof. The conclusion follows from the following expression:
For defining the number of up-crossings of an interval by a submartingale, we first define
The largest for which is called the number of up-crossings of the interval by the process and is denoted by.
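The up-crossing count can be computed mechanically; the sketch below counts completed passages of a path from a level ≤ a to a subsequent level ≥ b:

```python
def upcrossings(path, a, b):
    """Number of up-crossings of (a, b): completed moves from a value
    <= a to a later value >= b, as in the definition above."""
    count, below = 0, False
    for x in path:
        if x <= a:
            below = True
        elif x >= b and below:
            count += 1
            below = False
    return count

# The hypothetical path below up-crosses the interval (0, 1) twice.
k = upcrossings([0.5, -0.2, 1.3, 0.4, -0.1, 2.0, 0.7], 0.0, 1.0)
```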
By Lemma B.1.1
So, is a Markov time.
Assume is a Markov time. Again, by Lemma B.1.1,
and
Therefore, all are Markov times.
Theorem B.1.1 (Doob) For submartingales the following inequalities
hold
where
Proof. Note that equals the number of up-crossings of the interval by the submartingale, or by. Since for
is a submartingale.
Thus, without loss of generality, it suffices to prove that for a nonnegative submartingale
Define
Define also. Then for even, crosses (0, b) from time to. Therefore,
and
Further, the set is, since is a Markov time, and
Taking expectations on both sides of (B.1.2) yields
where the last inequality holds because is a submartingale and hence the integrand is nonnegative.
Thus (B.1.1), and hence the theorem, is proved.
Theorem B.1.2 (Doob) Let be a submartingale with a.s. Then there is a random variable with such that
Proof. Set
Assume the converse. Then
where and run over all rational numbers.
By the converse assumption there exist rational numbers such that
Let be the number of up-crossings of the interval by. By Theorem B.1.1,
By the monotone convergence theorem, from (B.1.4) it follows that
However, (B.1.3) implies which contradicts (B.1.5). Hence,
and
where is invoked. Hence,
Corollary B.1.1 If is a nonnegative supermartingale or a nonpositive submartingale, then
Because for nonpositive submartingales the corollary follows from the theorem, while for a nonnegative supermartingale, is a nonpositive submartingale.
Corollary B.1.2 If is a martingale with, then and
This is because for a martingale, and, and hence
or converges to a limit which is finite a.s. By the Fatou lemma it follows that
B.2. Convergence Theorems for MDS I
Let be an adapted process, and let G be a Borel set in
Then the first exit time from G defined by
is a Markov time. This is because
Lemma B.2.1 Let be a martingale (supermartingale, submartingale) and a Markov time. Then the process stopped at is again a martingale (supermartingale, submartingale), where
Proof. Note that
is
If is a martingale, then
This shows that is a martingale. For supermartingales and submartingales the proof is similar.
Theorem B.2.1. Let be a one-dimensional MDS. Then as
converges on
Proof. Since is the first exit time,
is a Markov time, and by Lemma B.2.1, is a martingale, where M is a positive constant.
Noticing that and that
is, we find
By Corollary B.1.2, converges as. It is clear that on. Therefore, as, pathwise converges on. Since M is arbitrary, converges on, which equals A.
Theorem B.2.2 Let be an MDS and. If
then converges on. If, then
converges on
Proof. It suffices to prove the first assertion, because the second one reduces to the first if is replaced by.
Define
By Lemma B.2.1 is a martingale. It is clear that
Consequently,
By Theorem B.1.2, converges as.
Since on as, converges on, and consequently on, which equals
B.3. Borel-Cantelli-Lévy Lemma
Theorem B.3.1 (Borel-Cantelli-Lévy Lemma) Let be a sequence of events,. Then if and only if, or equivalently,
Proof. Define
Clearly, is a martingale and is an MDS.
Since, by Theorem B.2.2, converges on
If then from (B.3.2) it follows that which implies that
converges. Then, combining this with, by (B.3.2) we obtain
Conversely, if then from (B.3.2) it follows that
Noticing that is contained in the set where converges by
Theorem B.2.2, from the convergence of by (B.3.2) it follows that
If are mutually independent and then
Proof. Denote by the σ-algebra generated by
If then
and hence which, by (B.3.1), implies (B.3.3).
When are mutually independent, then
B.4. Convergence Criteria for Adapted Sequences
Let be an adapted process.
Theorem B.4.1 Let be a sequence of positive numbers. Then
Consequently, implies, and follows from (B.3.1).
Theorem B.3.2 (Borel-Cantelli Lemma) Let be a sequence of events. If
then the probability that the events occur infinitely often is zero, i.e.,
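A quick simulation: take independent A_n = {U_n < n^{-2}} with U_n ~ Uniform(0, 1); then ΣP(A_n) < ∞, and along a typical path essentially no events with large n occur (the cutoff n₀ = 100 below is an arbitrary illustration):

```python
import random

def late_occurrences(n0=100, n_max=100000, seed=7):
    """Count occurrences of A_n = {U_n < 1/n^2} for n >= n0; the
    expected count is sum of 1/n^2 over n >= n0, below 0.01 here."""
    rng = random.Random(seed)
    return sum(rng.random() < 1.0 / n ** 2 for n in range(n0, n_max))

count = late_occurrences()
```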
Proof. Set
By Theorem B.3.1
or
This means that A is the set where events may occur only finitely many times.
Therefore, on A the series converges if and only if converges.
Theorem B.4.2 (Three Series Criterion) Denote by S the set where the following three series converge:
and
where c is a positive constant.
Then converges on S as
Proof. Taking in (B.4.1), we have and
by Theorem B.4.1.
Define
Since converges on S, from (B.4.2) it follows that
Noticing that is an MDS and
we see
By Theorem B.2.1 converges on S, or
Then from (B.4.3) it follows that
or converges.
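As an illustration, for X_k = ξ_k/k with iid signs ξ_k = ±1 and c = 1, all three series converge: no truncation occurs since |X_k| ≤ 1, the means are zero, and Σ Var X_k = Σ 1/k² < ∞. So the partial sums settle down:

```python
import random

def partial_sums(n=100000, seed=8):
    """Partial sums of X_k = xi_k / k with iid +-1 signs; the three
    series criterion guarantees a.s. convergence, so the second half
    of the path barely moves."""
    rng = random.Random(seed)
    s, s_half = 0.0, 0.0
    for k in range(1, n + 1):
        s += rng.choice((-1.0, 1.0)) / k
        if k == n // 2:
            s_half = s
    return s_half, s

s_half, s_full = partial_sums()
```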
B.5. Convergence Theorems for MDS II
Let be an MDS.
Theorem B.5.1 (Y. S. Chow) converges on
Proof. By Theorem B.4.2 it suffices to prove, where S is defined in Theorem B.4.2 with replaced by that considered in the present theorem.
We now verify that the three series defined in Theorem B.4.2 converge on A if is replaced by.
For convergence of the first series it suffices to note
For convergence of the second series, taking into account we find
Finally, for convergence of the last series it suffices to note
and
by the conditional Schwarz inequality.
Theorem B.5.2 The conclusion of Theorem B.5.1 is also valid for
Proof. Define
Then we have
on A, where A is still defined by (B.5.1) but with
Applying Theorem B.5.1 with to the MDS shows that converges on A, i.e.,
This is equivalent to
B.6. Weighted Sum of MDS
Theorem B.6.1 Let be an l-dimensional MDS and let be a matrix adapted process. If
for some then as
where
Proof. Without loss of generality, assume.
Notice that convergence of implies convergence of, since for sufficiently large,
Consequently, from (B.5.2) it follows that
We have the following estimate:
By Theorems B.5.1 and B.5.2 it follows that
where
Notice that is nondecreasing as. If is bounded, then the conclusion of the theorem follows from (B.6.1). If, then by the Kronecker lemma (see Section 3.4) the conclusion of the theorem also follows from (B.6.1).
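The Kronecker-lemma step can be illustrated directly: for an MDS ε_k of iid ±1 signs, Σ ε_k/k converges a.s. (Section B.4), hence (1/n) Σ_{k≤n} ε_k → 0:

```python
import random

def normalized_sum(n=200000, seed=9):
    """(1/n) * S_n for a symmetric +-1 MDS; by the Kronecker lemma
    applied to the convergent series sum of eps_k / k, this tends
    to 0 a.s."""
    rng = random.Random(seed)
    return sum(rng.choice((-1, 1)) for _ in range(n)) / n

val = normalized_sum()
```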
References
[1] B. D. O. Anderson and J. B. Moore, Optimal Control: Linear Quadratic Methods, Prentice-Hall, N.J., 1990.
[2] K. J. Åström, Introduction to Stochastic Control, Academic Press, New York, 1970.
[3] M. Benaim, A dynamical systems approach to stochastic approximation, SIAM J. Control & Optimization, 34:437–472, 1996.
[4] A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximation, Springer-Verlag, New York, 1990.
[5] B. Bercu, Weighted estimation and tracking for ARMAX models, SIAM J. Control & Optimization, 33:89–106, 1995.
[6] P. Billingsley, Convergence of Probability Measures, Wiley, New York, 1968.
[7] J. R. Blum, Multidimensional stochastic approximation, Ann. Math. Statist., 9:737–744, 1954.
[8] V. S. Borkar, Asynchronous stochastic approximations, SIAM J. Control & Optimization, 36:840–851, 1998.
[9] O. Brandière and M. Duflo, Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. Henri Poincaré, 32:395–427, 1996.
[10] P. E. Caines, Linear Stochastic Systems, Wiley, New York, 1988.
[11] H. F. Chen, Recursive algorithms for adaptive beam-formers, Kexue Tongbao (Science Bulletin), 26:490–493, 1981.
[12] H. F. Chen, Recursive Estimation and Control for Stochastic Systems, Wiley, New York, 1985.
[13] H. F. Chen, Asymptotic efficient stochastic approximation, Stochastics and Stochastics Reports, 45:1–16, 1993.
[14] H. F. Chen, Stochastic approximation and its new applications, Proceedings of 1994 Hong Kong International Workshop on New Directions of Control and Manufacturing, 2–12, 1994.
[15] H. F. Chen, Convergence rate of stochastic approximation algorithms in the degenerate case, SIAM J. Control & Optimization, 36:100–114, 1998.
[16] H. F. Chen, Stochastic approximation with non-additive measurement noise, J. of Applied Probability, 35:407–417, 1998.
[17] H. F. Chen, Convergence of SA algorithms in multi-root or multi-extreme cases, Stochastics and Stochastics Reports, 64:255–266, 1998.
[18] H. F. Chen, Stochastic approximation with state-dependent noise, Science in China (Series E), 43:531–541, 2000.
[19] H. F. Chen and X. R. Cao, Controllability is not necessary for adaptive pole placement control, IEEE Trans. Autom. Control, AC-42:1222–1229, 1997.
[20] H. F. Chen and X. R. Cao, Pole assignment for stochastic systems with unknown coefficients, Science in China (Series E), 43:313–323, 2000.
[21] H. F. Chen, T. Duncan, and B. Pasik-Duncan, A Kiefer-Wolfowitz algorithm with randomized differences, IEEE Trans. Autom. Control, AC-44:442–453, 1999.
[22] H. F. Chen and H. T. Fang, Nonconvex stochastic optimization for model reduction, Global Optimization, 2002.
[23] H. F. Chen and L. Guo, Identification and Stochastic Adaptive Control, Birkhäuser, Boston, 1991.
[24] H. F. Chen, L. Guo, and A. J. Gao, Convergence and robustness of the Robbins-Monro algorithm truncated at randomly varying bounds, Stochastic Processes and Their Applications, 27:217–231, 1988.
[25] H. F. Chen and K. Uosaki, Convergence analysis of dynamic stochastic approximation, Systems and Control Letters, 35:309–315, 1998.
[26] H. F. Chen and Q. Wang, Adaptive regulator for discrete-time nonlinear nonparametric systems, IEEE Trans. Autom. Control, AC-46: , 2001.
[27] H. F. Chen and Y. M. Zhu, Stochastic approximation procedures with randomly varying truncations, Scientia Sinica (Series A), 29:914–926, 1986.
[28] H. F. Chen and Y. M. Zhu, Stochastic Approximation (in Chinese), Shanghai Scientific and Technological Publishers, Shanghai, 1996.
[29] E. K. P. Chong and P. J. Ramadge, Optimization of queues using an infinitesimal perturbation analysis-based stochastic algorithm with general update times, SIAM J. Control & Optimization, 31:698–732, 1993.
[30] Y. S. Chow, Local convergence of martingales and the law of large numbers, Ann. Math. Statist., 36:552–558, 1965.
[31] Y. S. Chow and H. Teicher, Probability Theory: Independence, Interchangeability, Martingales, Springer-Verlag, New York, 1978.
[32] K. L. Chung, A Course in Probability Theory (second edition), Academic Press, New York, 1974.
[33] M. H. A. Davis, Linear Estimation and Stochastic Control, Chapman and Hall, New York, 1977.
[34] K. Deimling, Nonlinear Functional Analysis, Springer, Berlin, 1985.
[35] B. Delyon and A. Juditsky, Stochastic optimization with averaging of trajectories, Stochastics and Stochastics Reports, 39:107–118, 1992.
[36] E. F. Deprettere (ed.), SVD and Signal Processing, Elsevier, North-Holland, 1988.
[37] N. Dunford and J. T. Schwartz, Linear Operators, Part 1: General Theory, Wiley Interscience, New York, 1966.
[38] V. Dupač, A dynamic stochastic approximation method, Ann. Math. Statist., 36:1695–1702, 1965.
[39] V. Dupač, Stochastic approximation in the presence of trend, Czechoslovak Math. J., 16:454–461, 1966.
[40] A. Dvoretzky, On stochastic approximation, Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 39–55, 1956.
[41] S. N. Ethier and T. G. Kurtz, Markov Processes: Characterization and Convergence, Wiley, New York, 1986.
[42] E. Eweda, Convergence of the sign algorithm for adaptive filtering with correlated data, IEEE Trans. Information Theory, IT-37:1450–1457, 1991.
[43] V. Fabian, On asymptotic normality in stochastic approximation, Ann. Math. Statist., 39:1327–1332, 1968.
[44] V. Fabian, On asymptotically efficient recursive estimation, Ann. Statist., 6:854–856, 1978.
[45] V. Fabian, Simulated annealing simulated, Computers Math. Applic., 33:81–94, 1997.
[46] F. W. Fairman, Linear Control Theory: The State Space Approach, Wiley, Chichester, 1998.
[47] H. T. Fang and H. F. Chen, Sharp convergence rates of stochastic approximation for degenerate roots, Science in China (Series E), 41:383–392, 1998.
[48] H. T. Fang and H. F. Chen, Stability and instability of limit points of stochastic approximation algorithms, IEEE Trans. Autom. Control, AC-45:413–420, 2000.
[49] H. T. Fang and H. F. Chen, An a.s. convergent algorithm for global optimization with noise corrupted observations, J. Optimization and Its Applications, 104:343–376, 2000.
[68] H. J. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications, Springer-Verlag, New York, 1997.
[69] J. P. LaSalle and S. Lefschetz, Stability by Lyapunov's Direct Method with Applications, Academic Press, New York, 1961.
[70] R. Liptser and A. N. Shiryaev, Statistics of Random Processes, Springer-Verlag, New York, 1977.
[71] R. Liu, Blind signal processing: An introduction, Proceedings 1996 Intl. Symp. Circuits and Systems, Vol. 2, 81–83, 1996.
[72] L. Ljung, Analysis of recursive stochastic algorithms, IEEE Trans. Autom. Control, AC-22:551–575, 1977.
[73] L. Ljung, On positive real transfer functions and the convergence of some recursive schemes, IEEE Trans. Autom. Control, AC-22:539–551, 1977.
[74] L. Ljung, G. Pflug, and H. Walk, Stochastic Approximation and Optimization of Random Systems, Birkhäuser, Basel, 1992.
[75] L. Ljung and T. Söderström, Theory and Practice of Recursive Identification, MIT Press, Cambridge, MA, 1983.
[76] M. Loève, Probability Theory, Springer, New York, 1977–1978.
[77] R. Lozano and X. H. Zhao, Adaptive pole placement without excitation probing signals, IEEE Trans. Autom. Control, AC-39:47–58, 1994.
[78] M. B. Nevelson and R. Z. Khasminskii, Stochastic Approximation and Recursive Estimation, Amer. Math. Soc., Providence, RI, 1976, Translations of Math. Monographs, Vol. 47.
[79] E. Oja, Subspace Methods of Pattern Recognition, 1st ed., Research Studies Press Ltd., Letchworth, Hertfordshire, 1983.
[80] B. T. Polyak, New stochastic approximation type procedures (in Russian), Autom. i Telemekh., 7:98–107, 1990.
[81] B. T. Polyak and A. B. Juditsky, Acceleration of stochastic approximation by averaging, SIAM J. Control & Optimization, 30:838–855, 1992.
[82] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statist., 22:400–407, 1951.
[83] D. Ruppert, Stochastic approximation, in B. K. Ghosh and P. K. Sen (eds.), Handbook in Sequential Analysis, 503–529, Marcel Dekker, New York, 1991.
[84] A. N. Shiryaev, Probability, Springer, New York, 1984.
[85] J. C. Spall, Multivariate stochastic approximation using a simultaneous perturbation gradient approximation, IEEE Trans. Autom. Control, AC-37:331–341, 1992.
[86] Q. Y. Tang and H. F. Chen, Convergence of perturbation analysis based optimization algorithm with fixed-number of customers period, Discrete Event Dynamic Systems, 4:359–373, 1994.
[87] Q. Y. Tang, H. F. Chen, and Z. J. Han, Convergence rates of perturbation-analysis-Robbins-Monro-single-run algorithms, IEEE Trans. Autom. Control, AC-42:1442–1447, 1997.
[88] J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning, 16:185–202, 1994.
[89] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, Distributed asynchronous deterministic and stochastic gradient optimization algorithms, IEEE Trans. Autom. Control, 31:803–812, 1986.
[90] Ya. Z. Tsypkin, Adaptation and Learning in Automatic Systems, Academic Press, New York, 1971.
[91] K. Uosaki, Some generalizations of dynamic stochastic approximation processes, Ann. Statist., 2:1042–1048, 1974.
[92] J. Venter, An extension of the Robbins-Monro procedure, Ann. Math. Statist., 38:181–190, 1967.
[93] G. J. Wang and H. F. Chen, Behavior of stochastic approximation algorithm in root set of regression function, Systems Science and Mathematical Sciences, 12:92–96, 1999.
[94] I. J. Wang, E. K. P. Chong, and S. R. Kulkarni, Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms, Adv. Appl. Probab., 28:784–801, 1996.
[95] C. Z. Wei, Multivariate adaptive stochastic approximation, Ann. Statist., 15:1115–1130, 1987.
[96] G. Xu, L. Tong, and T. Kailath, A least squares approach to blind identification, IEEE Trans. Signal Processing, SP-43:2982–2993, 1995.
[97] S. Yakowitz, A globally convergent stochastic approximation, SIAM J. Control & Optimization, 31:30–40, 1993.
[98] G. Yin, On extensions of Polyak's averaging approach to stochastic approximation, Stochastics and Stochastics Reports, 36:245–264, 1991.
[99] G. Yin and Y. M. Zhu, On w.p.1 convergence of a parallel stochastic approximation algorithm, Probability in the Eng. and Infor. Sciences, 3:55–75, 1989.
[100] R. Zielinski, Global stochastic approximation: A review of results and some open problems, in F. Archetti and M. Cugiani (eds.), Numerical Techniques for Stochastic Systems, 379–386, North-Holland Publ. Co., 1980.
[101] J. H. Zhang and H. F. Chen, Convergence of algorithms used for principalcomponent analysis, Science in China (Series E), 40:597–604, 1997.
[102] K. Zhou, J. C. Doyle, and K. Glover, Robust and Optimal Control, Prentice-Hall, New Jersey, 1996.
Index
50, 55, 247329
329329
Ackermann's formula, 328
adapted process, 335
adapted sequence, 341
adaptive control, 290, 303, 327
adaptive filter, 288
adaptive filtering, 265, 273
adaptive regulation, 321
adaptive stabilization, 305, 307, 314, 327
adaptive stochastic approximation, 132, 149
adaptively stabilizable, 310
admissible controls, 302
algebraic Riccati equation, 131
ARMA process, 39
Arzelá-Ascoli theorem, 11, 24
asymptotic behavior, 194
asymptotic efficiency, 95, 130, 132, 149
asymptotic normality, 95, 113, 119, 127, 149, 210
asymptotic properties, 95, 166
asymptotically efficient, 135
asynchronous stochastic approximation, 219, 278, 288
averaging technique, 132, 149
balanced realization, 210, 214
balanced truncation, 214, 215
blind channel identification, 219, 220, 223
blind identification, 220
Borel σ-algebra, 330
Borel set, 330
Borel-Cantelli Lemma, 341
Borel-Cantelli-Lévy Lemma, 340
certainty-equivalence principle, 304, 306
Chebyshev inequality, 332
closure, 38
conditional distribution function, 332
conditional expectation, 332
conditional probability, 332
conditional Schwarz inequality, 343
constant interpolating function, 13
constrained optimization problem, 268
controllable, 307, 317, 319
controller form, 317–319
convergence, 28, 36, 41, 153, 223, 331, 341
convergence analysis, 6, 28, 95, 154
convergence rate, 95, 96, 101–103, 105, 149
convergence theorem for martingale difference sequences, 97, 128, 160, 170, 185, 196, 231, 249, 321, 339, 343
convergence theorem for nonnegative supermartingales, 7–9
convergence theorems for martingales, 335
convergent subsequence, 17, 18, 30, 36, 84, 86, 89, 178, 187, 237, 241, 244, 271, 275, 280, 282, 283, 285, 287, 288, 297, 312, 315, 322, 323
coprimeness, 306
covariance matrix, 130, 132
crossing, 18, 34, 188, 236, 312
degenerate case, 103, 149
density, 330
distribution function, 330
dominant stability, 59, 62
dominated convergence theorem, 331
dynamic stochastic approximation, 82, 93
equi-continuous, 15
ergodic, 265, 268, 270, 273, 274, 334
ergodicity, 333
event, 329
expectation, 330
Fatou lemma, 331
first exit time, 9, 339
general convergence theorems, 28
global minimum, 177
global minimizer, 174, 177, 180
global optimization, 172–174, 218
global optimization algorithm, 180, 194
global optimizer, 152
globally Lipschitz continuous, 292
Gronwall inequality, 298
Hölder inequality, 332
Hankel matrix, 222
Hankel norm approximation, 210, 214, 215
Hessian, 8, 195
identification, 290
integrable, 331
interpolating function, 11
invariant σ-algebra, 334
Jordan-Hahn decomposition, 55, 56, 295, 329
Kiefer-Wolfowitz (KW) algorithm, 151–153, 166, 173, 218
Kronecker lemma, 67, 144, 148, 345
Kronecker product, 248
KW algorithm with expanding truncations, 152, 154, 173–175
Law of iterated logarithm, 333
Lebesgue measurable, 330
Lebesgue measure, 330
Lebesgue-Stieltjes integral, 331
linear interpolating function, 12
Lipschitz continuous, 23
Lipschitz-continuity, 160
local search, 172, 173
locally bounded, 17, 29, 96, 103, 133
locally Lipschitz continuous, 50, 155, 163, 177, 280
Lyapunov equation, 105
Lyapunov function, 6, 8, 10, 11, 17, 111, 226, 268, 313
Lyapunov inequality, 144, 332
Lyapunov theorem, 98
MA process, 171
Markov time, 6, 335, 336, 339
martingale, 335, 339, 340
martingale convergence theorem, 6, 180, 297
martingale difference sequence, 6, 16, 42, 97, 128, 134, 159, 164, 168, 179, 185, 195–197, 231, 250, 257, 294, 335
maximizer, 151
measurable, 17, 29, 96, 103, 133
measurable function, 330
measurable set, 329
measure, 329
minimizer, 151
mixing condition, 291
model reduction, 210
monotone convergence theorem, 331
multi-extreme, 163, 164
multi-root, 46, 57
mutually independent, 333, 341
necessity of noise condition, 45
non-additive noise, 49
nondegenerate case, 96, 149
nonnegative adapted sequence, 7
nonnegative supermartingale, 6, 7, 338
nonpositive submartingale, 338
normal distribution, 113, 114, 330
nowhere dense, 29, 35, 37, 41, 177, 181, 182, 280, 291
observation, 5, 17, 132, 321
observation noise, 5, 103, 133, 175, 195, 321
ODE method, 2, 10, 24, 327
one-sided randomized difference, 172
optimal control, 303
optimization, 151
optimization algorithm, 212
ordinary differential equation (ODE), 10
pattern classification, 219
perturbation analysis, 328
pole assignment, 316, 318, 327
principal component analysis, 238, 288
probabilistic method, 4
probability measure, 330
probability of random event, 330
probability space, 329, 330
Prohorov's theorem, 22, 24
Radon-Nikodym theorem, 332
random noise, 10, 21
random search, 172
random variable, 330
randomized difference, 152–154
recursive blind identification, 246
relatively compact, 22
RM algorithm with expanding truncations, 28, 155, 309, 319
Robbins-Monro (RM) algorithm, 1, 5, 8, 11, 12, 17, 20, 45, 110, 310, 313
robustness, 67, 93
SA algorithm, 67
SA algorithm with expanding truncations, 25, 40, 95, 290
SA with randomly varying truncations, 93
Schwarz inequality, 142, 332
sign algorithms, 273, 288
signal processing, 219, 265
signed measure, 56, 295
Skorohod representation, 23
Skorohod topology, 21, 24
slowly decreasing step sizes, 132
spheres with expanding radiuses, 36
stability, 131
stable, 96, 97, 102, 131, 133
state-dependent, 42, 164
state-dependent noise, 29, 57
state-independent condition, 41, 42
stationary, 265, 268, 270, 273, 274, 333
step size, 5, 6, 17, 102, 132, 174
stochastic approximation (SA), 1, 223, 226, 246
stochastic approximation algorithm, 5, 307, 308
stochastic approximation method, 321
stochastic differential equation, 126
stochastic optimization, 211
stopping time, 335
strictly input passive, 322
structural error, 10, 157
structural inaccuracy, 21
submartingale, 335–337, 339
subspace, 41, 226
supermartingale, 335, 339
surjection, 63
system identification, 327
three series criterion, 342
time-varying, 44
trajectory-subsequence (TS) method, 2, 16, 21
truncated RM algorithm, 16, 17
TS method, 28, 327
uniformly bounded, 15
uniformly locally bounded, 41
up-crossing, 336, 338
weak convergence method, 21, 24
weighted least squares, 306
weighted sum of MDS, 344
Wiener process, 126
Nonconvex Optimization and Its Applications
22. H. Tuy: Convex Analysis and Global Optimization. 1998 ISBN 0-7923-4818-4
23. D. Cieslik: Steiner Minimal Trees. 1998 ISBN 0-7923-4983-0
24. N.Z. Shor: Nondifferentiable Optimization and Polynomial Problems. 1998 ISBN 0-7923-4997-0
25. R. Reemtsen and J.J. Rückmann (eds.): Semi-Infinite Programming. 1998 ISBN 0-7923-5054-5
26. B. Ricceri and S. Simons (eds.): Minimax Theory and Applications. 1998 ISBN 0-7923-5064-2
27. J.-P. Crouzeix, J.-E. Martinez-Legaz and M. Volle (eds.): Generalized Convexity, Generalized Monotonicity: Recent Results. 1998 ISBN 0-7923-5088-X
28. J. Outrata, M. Kočvara and J. Zowe: Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. 1998 ISBN 0-7923-5170-3
29. D. Motreanu and P.D. Panagiotopoulos: Minimax Theorems and Qualitative Properties of the Solutions of Hemivariational Inequalities. 1999 ISBN 0-7923-5456-7
30. J.F. Bard: Practical Bilevel Optimization. Algorithms and Applications. 1999 ISBN 0-7923-5458-3
31. H.D. Sherali and W.P. Adams: A Reformulation-Linearization Technique for Solving Discrete and Continuous Nonconvex Problems. 1999 ISBN 0-7923-5487-7
32. F. Forgó, J. Szép and F. Szidarovszky: Introduction to the Theory of Games. Concepts, Methods, Applications. 1999 ISBN 0-7923-5775-2
33. C.A. Floudas and P.M. Pardalos (eds.): Handbook of Test Problems in Local and Global Optimization. 1999 ISBN 0-7923-5801-5
34. T. Stoilov and K. Stoilova: Noniterative Coordination in Multilevel Systems. 1999 ISBN 0-7923-5879-1
35. J. Haslinger, M. Miettinen and P.D. Panagiotopoulos: Finite Element Method for Hemivariational Inequalities. Theory, Methods and Applications. 1999 ISBN 0-7923-5951-8
36. V. Korotkich: A Mathematical Structure of Emergent Computation. 1999 ISBN 0-7923-6010-9
37. C.A. Floudas: Deterministic Global Optimization: Theory, Methods and Applications. 2000 ISBN 0-7923-6014-1
38. F. Giannessi (ed.): Vector Variational Inequalities and Vector Equilibria. Mathematical Theories. 1999 ISBN 0-7923-6026-5
39. D.Y. Gao: Duality Principles in Nonconvex Systems. Theory, Methods and Applications. 2000 ISBN 0-7923-6145-3
40. C.A. Floudas and P.M. Pardalos (eds.): Optimization in Computational Chemistry and Molecular Biology. Local and Global Approaches. 2000 ISBN 0-7923-6155-5
41. G. Isac: Topological Methods in Complementarity Theory. 2000 ISBN 0-7923-6274-8