Final Thesis Pk12
7/31/2019 Final Thesis Pk12
CHAPTER-1
INTRODUCTION:
1.1 BACKGROUND:
Out of the many applications of adaptive filtering, direct modeling and inverse modeling are very
important. Direct modeling, or system identification, finds applications in control system
engineering including robotics [1], intelligent sensor design [2], process control [3], power
system engineering [4], image and speech processing [4], geophysics [5], acoustic noise and
vibration control [6] and biomedical engineering [7]. Similarly, the inverse modeling technique is
used in digital data reconstruction [8], channel equalization in digital communication [9], digital
magnetic data recording [10], intelligent sensors [2] and deconvolution of seismic data [11].
Direct modeling mainly refers to adaptive identification of unknown plants. Simple static linear
plants are easily identified through parameter estimation using conventional derivative based
least mean square (LMS) type algorithms [12]. But most practical plants are dynamic,
nonlinear, or a combination of these two characteristics. In many applications, Hammerstein and
MIMO plants need identification. In addition, the output of the plant is associated with
measurement or additive white Gaussian noise (AWGN). Identification of such complex plants
is a difficult task and poses many challenging problems. Similarly, inverse modeling of
telecommunication and magnetic medium channels is also important for reducing the effect of
inter-symbol interference (ISI) and achieving faithful reconstruction of the original data. Likewise,
adaptive inverse modeling of sensors is required to extend their linearity for direct digital
readout and enhancement of dynamic range. These two important and complex issues are
addressed in the thesis, and attempts have been made to provide improved, efficient and
promising alternative solutions.
The conventional LMS and recursive least square (RLS) [13] techniques work well for
identification of static plants, but when the plants are of dynamic type, the existing forward-
backward LMS [14] and RLS algorithms very often lead to non-optimal solutions due to
premature convergence of the weights to local minima [15]. This is a major drawback of the
existing derivative based techniques. To alleviate this pressing issue, this thesis suggests the use
of derivative free optimization techniques in place of the conventional techniques.
In the recent past, population based optimization techniques have been reported which fall under the
category of evolutionary computing [16] or computational intelligence [17]. These are also
called bio-inspired techniques and include the genetic algorithm (GA) and its variants [18] and
Differential Evolution (DE) [19]. These techniques are suitably employed to obtain efficient iterative
learning algorithms for developing adaptive direct and inverse models of complex plants and
channels.
The development of direct and inverse adaptive models essentially consists of two components. The
first component is an adaptive network, which may be linear or nonlinear in nature. Use of a
nonlinear network is preferable when nonlinear plants or channels are to be identified or
equalized. The linear network used in the thesis is the adaptive linear combiner, or all-zero (FIR)
structure [7]. Under the nonlinear category, GA and DE are used.
1.2 MOTIVATION:
In summary, the main motivations of the research work carried out in the present thesis are the
following:
i. To formulate the direct and inverse modeling problems as error-square optimization
problems.
ii. To introduce bio-inspired optimization tools such as GA and DE and their variants to
efficiently minimize the squared-error cost function of the models; in other words, to
develop alternate identification schemes.
iii. To achieve improved identification (direct modeling) of complex nonlinear plants and channel
equalization (inverse modeling) of nonlinear noisy digital channels by introducing new
and improved updating algorithms.
1.3 MAJOR CONTRIBUTION OF THE THESIS
The major contributions of the thesis are outlined below:
i. A GA based approach for both linear and nonlinear system identification is
introduced. The GA based approach is found to be more efficient for nonlinear systems
than other standard derivative based learning. In addition, DE based identification
has been proposed and shown to offer better performance with less
computational complexity.
ii. A GA based approach for linear and nonlinear channel equalization is introduced.
The GA based approach is found to be more efficient than other standard derivative
based learning. In addition, DE based equalizers have been proposed and shown to offer
better performance with less computational complexity.
1.4 CHAPTER WISE CONTRIBUTION
The research work undertaken is embodied in seven chapters.
Chapter-1 gives an introduction to system identification and channel equalization, and reviews
various learning algorithms such as the least-mean-square (LMS) algorithm, the recursive-
least-square (RLS) algorithm, artificial neural networks (ANN), the genetic algorithm (GA)
and differential evolution (DE), which are used to identify the system and to train the equalizer.
It also includes the motivation behind undertaking the thesis work.
Chapter-2 discusses the general form of an adaptive algorithm, the adaptive filtering
problem, derivative based algorithms such as LMS, and gives an overview of derivative free
algorithms such as the genetic algorithm and differential evolution.
Chapter-3 discusses various system identification techniques, develops the GA algorithm
for simulation on system identification, and presents a comparative study between LMS and
GA on both linear and nonlinear systems.
Chapter-4 discusses various channel equalization techniques, develops the GA algorithm
for simulation on channel equalization, and presents a comparison between LMS and GA on
both linear and nonlinear channels.
Chapter-5 develops the DE algorithm for simulation on system identification and presents
a comparison between LMS, GA and DE on both linear and nonlinear systems.
Chapter-6 develops the DE algorithm for simulation on channel equalization and presents a
comparison between LMS, GA and DE based equalizers on both linear and nonlinear channels.
Chapter-7 deals with the conclusions of the investigations made in the thesis. This chapter
also suggests some future research related to the topic.
CHAPTER-2
GENETIC ALGORITHM AND DIFFERENTIAL EVOLUTION
2.1 INTRODUCTION:
There are many learning algorithms which are employed to train various adaptive models. The
performance of these models depends on the rate of convergence, training time, computational
complexity involved and the minimum mean square error achieved after training. The learning
algorithms may be broadly classified into two categories: (a) derivative based and (b) derivative free.
The derivative based algorithms include least mean square (LMS), IIR LMS (ILMS), back
propagation (BP) and FLANN-LMS. Under the derivative free category, the genetic algorithm
(GA), differential evolution (DE), particle swarm optimization (PSO), bacterial foraging
optimization (BFO) and artificial immune system (AIS) have been employed. In this section the
details of the LMS, GA and DE algorithms are outlined in the sequel.
2.2 GRADIENT BASED ADAPTIVE ALGORITHM:
An adaptive algorithm is a procedure for adjusting the parameters of an adaptive filter to
minimize a cost function chosen for the task at hand. In this section, we describe the general
form of many adaptive FIR filtering algorithms and present a simple derivation of the LMS
adaptive algorithm. In our discussion, we only consider an adaptive FIR filter structure. Such
systems are currently more popular than adaptive IIR filters because
1. the input-output stability of the FIR filter structure is guaranteed for any set of fixed
coefficients, and
2. the algorithms for adjusting the coefficients of FIR filters are in general simpler than
those for adjusting the coefficients of IIR filters.
2.2.1 GENERAL FORM OF ADAPTIVE FIR ALGORITHMS:
The general form of an adaptive FIR filtering algorithm is

W(n+1) = W(n) + \mu(n) \, G(e(n), X(n), \Phi(n))    (2.1)

where G(\cdot) is a particular vector-valued nonlinear function, \mu(n) is a step-size parameter, e(n)
and X(n) are the error signal and input signal vector, respectively, and \Phi(n) is a vector of states
that stores pertinent information about the characteristics of the input and error signals and/or the
coefficients at previous time instants. In the simplest algorithms, \Phi(n) is not used, and the only
information needed to adjust the coefficients at time n is the error signal, input signal vector,
and step size.
The step size is so called because it determines the magnitude of the change, or step, that is
taken by the algorithm in iteratively determining a useful coefficient vector. Much research
effort has been spent characterizing the role that \mu(n) plays in the performance of adaptive
filters in terms of the statistical or frequency characteristics of the input and desired response
signals. Often, the success or failure of an adaptive filtering application depends on how the value
of \mu(n) is chosen or calculated to obtain the best performance from the adaptive filter.
2.2.2 THE MEAN-SQUARED ERROR COST FUNCTION:
The form of G(\cdot) depends on the cost function chosen for the given adaptive filtering task. We
now consider one particular cost function that yields a popular adaptive algorithm. Define the
mean-squared error (MSE) cost function as

J_{mse}(n) = \frac{1}{2} \int_{-\infty}^{+\infty} e^2(n) \, p_n(e(n)) \, de(n)    (2.2)

           = \frac{1}{2} E[e^2(n)]    (2.3)
where p_n(e) represents the probability density function of the error at time n and E[\cdot] is
shorthand for the expectation integral on the right-hand side of (2.3).
The MSE cost function is useful for adaptive FIR filters because
- J_{mse}(n) has a well-defined minimum with respect to the parameters in W(n);
- the coefficient values obtained at this minimum are the ones that minimize the power in
the error signal e(n), indicating that y(n) has approached d(n); and
- J_{mse}(n) is a smooth function of each of the parameters in W(n), such that it is differentiable
with respect to each of the parameters in W(n).
The third point is important in that it enables us to determine both the optimum coefficient
values given knowledge of the statistics of d(n) and x(n), as well as a simple iterative procedure
for adjusting the parameters of an FIR filter.
2.2.3 THE WIENER SOLUTION:
For the FIR filter structure, the coefficient values in W(n) that minimize J_{MSE}(n) are well-
defined if the statistics of the input and desired response signals are known. The formulation of
this problem for continuous-time signals and the resulting solution was first derived by Wiener.
Hence, this optimum coefficient vector W_{MSE}(n) is often called the Wiener solution to the
adaptive filtering problem. The extension of Wiener's analysis to the discrete-time case is
attributed to Levinson. To determine W_{MSE}(n), we note that the function J_{MSE}(n) is quadratic
in the parameters {w_i(n)}, and the function is also differentiable. Thus, we can use a result from
optimization theory which states that the derivatives of a smooth cost function with respect to each
of the parameters are zero at a minimizing point on the cost function error surface. Thus, W_{MSE}(n)
can be found from the solution to the system of equations

\frac{\partial J_{mse}(n)}{\partial w_i(n)} = 0, \quad 0 \le i \le L-1    (2.4)

Taking the derivative of J_{mse}(n) in (2.3) and noting that e(n) = d(n) - y(n) and

y(n) = \sum_{i=0}^{L-1} w_i(n) \, x(n-i) = W^T(n) X(n)

respectively, we obtain
\frac{\partial J_{mse}(n)}{\partial w_i(n)} = E\left[ e(n) \frac{\partial e(n)}{\partial w_i(n)} \right]    (2.5)

= -E\left[ e(n) \frac{\partial y(n)}{\partial w_i(n)} \right]    (2.6)

= -E[ e(n) \, x(n-i) ]    (2.7)

= -\left( E[d(n) \, x(n-i)] - \sum_{j=0}^{L-1} E[x(n-i) \, x(n-j)] \, w_j(n) \right)    (2.8)

where we have used the definitions of e(n) and of y(n) for the FIR filter structure to expand the
last result in (2.8).
By defining the matrix R_{XX}(n) and vector P_{dx}(n) as

R_{XX}(n) = E[ X(n) X^T(n) ]    (2.9)

P_{dx}(n) = E[ d(n) X(n) ]    (2.10)

respectively, we can combine the above equations to obtain the system of equations in vector
form as

R_{XX}(n) \, W_{MSE}(n) - P_{dx}(n) = 0    (2.11)

where 0 is the zero vector. Thus, so long as the matrix R_{XX}(n) is invertible, the optimum
Wiener solution vector for this problem is
W_{MSE}(n) = R_{XX}^{-1}(n) \, P_{dx}(n)    (2.12)
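As a concrete illustration of (2.9), (2.10) and (2.12), the following sketch (not from the thesis) estimates R_{XX} and P_{dx} from data generated by an assumed 2-tap plant and then solves for the Wiener coefficients with NumPy; the plant taps are hypothetical values chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-tap plant, for illustration only: d(n) = 0.8 x(n) + 0.3 x(n-1)
true_w = np.array([0.8, 0.3])
x = rng.standard_normal(5000)
X = np.stack([x[1:], x[:-1]], axis=1)   # input vectors X(n) = [x(n), x(n-1)]
d = X @ true_w                          # noiseless desired response

# Sample estimates of Rxx = E[X X^T] (2.9) and Pdx = E[d(n) X(n)] (2.10)
Rxx = X.T @ X / len(X)
Pdx = X.T @ d / len(X)

# Wiener solution W_MSE = Rxx^{-1} Pdx, per (2.12)
w_mse = np.linalg.solve(Rxx, Pdx)
```

With noiseless data from a linear plant, the recovered coefficients match the plant taps, illustrating that the Wiener solution minimizes the error power.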
2.2.4 THE METHOD OF STEEPEST DESCENT:
The method of steepest descent is a celebrated optimization procedure for minimizing the value
of a cost function J(n) with respect to a set of adjustable parameters W(n). This procedure
adjusts each parameter of the system according to

w_i(n+1) = w_i(n) - \mu(n) \frac{\partial J(n)}{\partial w_i(n)}    (2.13)

In other words, the ith parameter of the system is altered according to the derivative of the cost
function with respect to the ith parameter. Collecting these equations in vector form, we have

W(n+1) = W(n) - \mu(n) \frac{\partial J(n)}{\partial W(n)}    (2.14)
where \frac{\partial J(n)}{\partial W(n)} is the vector form of \frac{\partial J(n)}{\partial w_i(n)}.
For an FIR adaptive filter that minimizes the MSE cost function, we can use the result in (2.8) to
explicitly give the form of the steepest descent procedure for this problem. Substituting these
results into (2.14) yields the update equation for W(n) as

W(n+1) = W(n) + \mu(n) \left( P_{dx}(n) - R_{XX}(n) W(n) \right)    (2.15)
However, this steepest descent procedure depends on the statistical quantities E[d(n) x(n-i)]
and E[x(n-i) x(n-j)] contained in P_{dx}(n) and R_{XX}(n), respectively. In practice, we only have
measurements of both d(n) and x(n) to be used within the adaptation procedure. While suitable
estimates of the statistical quantities needed for (2.15) could be determined from the signals x(n)
and d(n), we instead develop an approximate version of the method of steepest descent that
depends on the signal values themselves. This procedure is known as the LMS algorithm.
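The iteration in (2.15) can be sketched as follows (an illustrative example, not from the thesis); the plant taps and the step size mu are assumed values chosen so that the iteration contracts toward the Wiener solution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 2-tap problem with assumed taps
true_w = np.array([0.5, -0.2])
x = rng.standard_normal(4000)
X = np.stack([x[1:], x[:-1]], axis=1)
d = X @ true_w

Rxx = X.T @ X / len(X)       # sample estimate of R_XX
Pdx = X.T @ d / len(X)       # sample estimate of P_dx

# Steepest descent, eq. (2.15): W(n+1) = W(n) + mu * (Pdx - Rxx W(n))
w = np.zeros(2)
mu = 0.1
for _ in range(500):
    w = w + mu * (Pdx - Rxx @ w)
```

Because the input here is white, the eigenvalues of Rxx are close to one, so this choice of mu makes each step shrink the coefficient error by roughly a factor of (1 - mu).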
2.2.5 THE LMS ALGORITHM:
The cost function J(n) chosen for the steepest descent algorithm determines the coefficient
solution obtained by the adaptive filter. If the MSE cost function is chosen, the resulting
algorithm depends on the statistics of x(n) and d(n) because of the expectation operation that
defines this cost function. Since we typically only have measurements of d(n) and of x(n)
available to us, we substitute an alternative cost function that depends only on these
measurements.
One such cost function is the least-squares cost function given by

J_{LS}(n) = \sum_{k=0}^{n} \alpha(k) \left( d(k) - W^T(n) X(k) \right)^2    (2.16)

where \alpha(k) is a suitable weighting sequence for the terms within the summation. This cost
function, however, is complicated by the fact that it requires numerous computations to
calculate its value as well as its derivatives with respect to each w_i(n), although efficient
recursive methods for its minimization can be developed. Alternatively, we can propose the
simplified cost function J_{LMS}(n) given by

J_{LMS}(n) = \frac{1}{2} e^2(n)    (2.17)

This cost function can be thought of as an instantaneous estimate of the MSE cost function, as
J_{mse}(n) = E[J_{LMS}(n)]. Although it might not appear to be useful, the resulting algorithm obtained
when J_{LMS}(n) is used for J(n) in (2.13) is extremely useful for practical applications. Taking
derivatives of J_{LMS}(n) with respect to the elements of W(n) and substituting the result into
(2.13), we obtain the LMS adaptive algorithm given by

W(n+1) = W(n) + \mu(n) \, e(n) \, X(n)    (2.18)
Note that this algorithm is of the general form of (2.1). It also requires only multiplications and
additions to implement. In fact, the number and type of operations needed for the LMS
algorithm is nearly the same as that of the FIR filter structure with fixed coefficient values,
which is one of the reasons for the algorithm's popularity. The behavior of the LMS algorithm
has been widely studied, and numerous results concerning its adaptation characteristics under
different situations have been developed. For now, we indicate its useful behavior by noting that
the solution obtained by the LMS algorithm near its convergent point is related to the Wiener
solution. In fact, analyses of the LMS algorithm under certain statistical assumptions about the
input and desired response signals show that

\lim_{n \to \infty} E[W(n)] = W_{MSE}    (2.19)

when the Wiener solution W_{MSE}(n) is a fixed vector. Moreover, the average behavior of the
LMS algorithm is quite similar to that of the steepest descent algorithm, which depends
explicitly on the statistics of the input and desired response signals. In effect, the iterative nature
of the LMS coefficient updates is a form of time-averaging that smooths the errors in the
instantaneous gradient calculations to obtain a more reasonable estimate of the true gradient.
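The LMS update (2.18) can be sketched as a short system-identification run (an illustrative example, not the thesis simulations); the 3-tap plant and the step size mu are assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Identify an assumed 3-tap FIR plant with the LMS update of (2.18)
true_w = np.array([0.6, 0.3, -0.1])
L = len(true_w)
x = rng.standard_normal(20000)
w = np.zeros(L)
mu = 0.01

for n in range(L - 1, len(x)):
    X_n = x[n - L + 1:n + 1][::-1]   # input vector X(n) = [x(n), ..., x(n-L+1)]
    y = w @ X_n                      # filter output y(n)
    e = (true_w @ X_n) - y           # error e(n) = d(n) - y(n)
    w = w + mu * e * X_n             # LMS update W(n+1) = W(n) + mu e(n) X(n)
```

With a noiseless desired signal, the weights converge to the plant taps, consistent with (2.19); additive noise at the plant output would leave a small residual misadjustment proportional to mu.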
The problem is that gradient descent is a local optimization technique, which is limited because
it is unable to converge to the global optimum on a multimodal error surface if the algorithm is
not initialized in the basin of attraction of the global optimum. Several modifications exist for
gradient based algorithms in an attempt to enable them to overcome local optima. One approach is
to simply add noise or a momentum term to the gradient computation of the gradient descent
algorithm to make it more likely to escape from a local minimum. This approach is only
likely to be successful when the error surface is relatively smooth with minor local minima, or when
some information can be inferred about the topology of the surface such that the additional
gradient parameters can be assigned accordingly. Other approaches attempt to transform the
error surface to eliminate or diminish the presence of local minima, which would ideally result
in a unimodal error surface. The problem with these approaches is that the resulting minimum
transformed error used to update the adaptive filter can be biased away from the true minimum output
error, and the algorithm may not be able to converge to the desired minimum error condition.
These algorithms also tend to be complex, slow to converge, and may not be guaranteed to
emerge from a local minimum. Some work has been done with regard to removing the bias of
equation-error LMS and Steiglitz-McBride adaptive IIR filters, which adds further complexity
with varying degrees of success. Another approach attempts to locate the global optimum by
running several LMS algorithms in parallel, initialized with different initial coefficients. The
notion is that a larger, concurrent sampling of the error surface will increase the likelihood that
one process will be initialized in the global optimum valley. This technique does have potential,
but it is inefficient and may still suffer the fate of a standard gradient technique in that it will be
unable to locate the global optimum if none of the initial estimates is located in the basin of
attraction of the global optimum. By using a similar congregational scheme, but one in which
information is collectively exchanged between estimates and intelligent randomization is
introduced, structured stochastic algorithms are able to hill-climb out of local minima. This
enables the algorithms to achieve better, more consistent results using fewer total
estimates. These types of algorithms provide the framework for the algorithms discussed in the
following sections.
2.3 DERIVATIVE FREE BASED ALGORITHM:
Since the beginning of the nineteenth century, a significant evolution in optimization theory has
been noticed. Classical linear programming and traditional non-linear optimization techniques
such as Lagrange's multiplier, Bellman's principle and Pontryagin's principle were prevalent
until this century. Unfortunately, these derivative based optimization techniques can no longer
be used to determine the optima on rough non-linear surfaces. One solution to this problem has
already been put forward by the evolutionary algorithms research community. The genetic
algorithm (GA), enunciated by Holland, is one such popular algorithm. This chapter presents a
recent algorithm for evolutionary optimization known as differential evolution (DE). These
algorithms are inspired by biological and sociological motivations and can take care of
optimality on rough, discontinuous and multimodal surfaces. The chapter explores several
schemes for controlling the convergence behavior of DE by a judicious selection of its
parameters. Special emphasis is given to the hybridization of DE algorithms with other soft
computing tools.
2.4 GENETIC ALGORITHM:
Genetic algorithms are a class of evolutionary computing techniques, a rapidly
growing area of artificial intelligence. Genetic algorithms are inspired by Darwin's theory of
evolution. Simply said, problems are solved by an evolutionary process resulting in a best
(fittest) solution (survivor); in other words, the solution is evolved. Evolutionary computing
was introduced in the 1960s by Rechenberg in his work "Evolution strategies"
(Evolutionsstrategie in the original). His idea was then developed by other researchers. Genetic Algorithms
(GAs) were invented by John Holland and developed by him and his students and colleagues.
This led to Holland's book "Adaptation in Natural and Artificial Systems", published in 1975.
The algorithm begins with a set of solutions (represented by chromosomes) called the population.
Solutions from one population are taken and used to form a new population.
This is motivated by the hope that the new population will be better than the old one. Solutions
that are selected to form new solutions (offspring) are selected according to their fitness:
the more suitable they are, the more chances they have to reproduce. This is repeated until some
condition (for example, the number of generations or an improvement of the best solution) is satisfied.
2.4.1 OUTLINE OF BASIC GA:
1. [Start] Generate a random population of n chromosomes (suitable solutions for the
problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
3. [New population] Create a new population by repeating the following steps until the new
population is complete
   a. [Selection] Select two parent chromosomes from the population according to their
   fitness (the better the fitness, the bigger the chance to be selected)
   b. [Crossover] With a crossover probability, cross over the parents to form new
   offspring (children). If no crossover is performed, the offspring is an exact copy of the
   parents
   c. [Mutation] With a mutation probability, mutate the new offspring at each locus (position
   in the chromosome)
   d. [Accepting] Place the new offspring in the new population
4. [Replace] Use the newly generated population for a further run of the algorithm
5. [Test] If the end condition is satisfied, stop and return the best solution in the current
population
6. [Loop] Go to step 2
The outline of the basic GA provided above is very general. There are many parameters
and settings that can be implemented differently in various problems. Elitism is often
used as a method of selection, which means that at least one of a generation's best
solutions is copied without changes to the new population, so the best solution can survive
to the succeeding generation.
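The outline above can be sketched as a minimal GA (an illustrative example, not the thesis implementation); the "OneMax" fitness, the bit length and the probability settings are all assumed toy choices.

```python
import random

random.seed(0)

# Toy "OneMax" problem: maximize the number of 1s in a 16-bit chromosome
N_BITS, POP, GENS, PC, PM = 16, 30, 60, 0.9, 1.0 / 16

def fitness(ch):
    return sum(ch)

def tournament(pop):
    # binary tournament selection: the fitter of two random individuals wins
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP)]
for _ in range(GENS):
    best = max(pop, key=fitness)
    new_pop = [best[:]]                          # elitism: keep the best unchanged
    while len(new_pop) < POP:
        p1, p2 = tournament(pop), tournament(pop)
        if random.random() < PC:                 # single-point crossover
            cut = random.randint(1, N_BITS - 1)
            child = p1[:cut] + p2[cut:]
        else:
            child = p1[:]
        # bit-flip mutation at each locus with probability PM
        child = [1 - g if random.random() < PM else g for g in child]
        new_pop.append(child)
    pop = new_pop

best = max(pop, key=fitness)
```

Because of elitism, the best fitness never decreases from one generation to the next, which is why the loop reliably climbs toward the all-ones chromosome.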
2.4.2 OPERATORS OF GA: OVERVIEW:
Crossover and mutation are the most important parts of the genetic algorithm. The
performance is influenced mainly by these two operators.
ENCODING OF A CHROMOSOME:
A chromosome should in some way contain information about the solution that it represents. The
most commonly used way of encoding is a binary string. A chromosome could then look like
this:
Table-2.1 (Encoding of a chromosome)
Chromosome 1 1101100100110110
Chromosome 2 1101111000011110
Each chromosome is represented by a binary string. Each bit in the string can represent some
characteristic of the solution. There are many other ways of encoding. The encoding depends
mainly on the problem to be solved. For example, one can encode integer or real
numbers directly; sometimes it is useful to encode permutations, and so on.
CROSSOVER:
Crossover operates on selected genes from parent chromosomes and creates new offspring. The
simplest way to do this is to choose a crossover point at random, copy everything
before this point from the first parent, and then copy everything after the crossover point from
the other parent. Crossover is illustrated in the following table (| marks the crossover point):
Table-2.2 (Crossover of chromosomes)
Chromosome 1 11011 | 00100110110
Chromosome 2 11011 | 11000011110
Offspring 1  11011 | 11000011110
Offspring 2  11011 | 00100110110
There are other ways to perform crossover; for example, we can choose more crossover points.
MUTATION:
Mutation is intended to prevent falling of all solutions in the population into a local optimum of
the solved problem. Mutation operation randomly changes the offspring resulted fromcrossover. In case of binary encoding we can switch a few randomly chosen bits from 1 to 0 or
from 0 to 1. Mutation can be then illustrated as follows
Table-2.3(Mutation operation)
[15]
-
7/31/2019 Final Thesis Pk12
16/97
Original offspring 1 1101111000011110
Original offspring 2 1101100100110110
Original offspring 3 1100111000011110
Original offspring 4 1101101100110110
The technique of mutation (as well as crossover) depends mainly on the encoding of
chromosomes. For example when we are encoding by permutations, mutation could be
performed as an exchange of two genes.
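The two operators can be sketched directly from Tables 2.2 and 2.3 (an illustrative sketch; the cut position and probability values are assumed):

```python
import random

random.seed(0)

# Single-point crossover at a given cut position, as in Table 2.2
def crossover(parent1, parent2, point):
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

# Bit-flip mutation: each locus flips with probability pm, as in Table 2.3
def mutate(chromosome, pm):
    return "".join(str(1 - int(g)) if random.random() < pm else g
                   for g in chromosome)

# Reproduce Table 2.2 with the crossover point after the 5th bit
c1, c2 = crossover("1101100100110110", "1101111000011110", 5)
```

Note that because both parents share the prefix 11011, the offspring here coincide with the parents swapped after the cut, exactly as Table 2.2 shows.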
2.4.3 PARAMETERS OF GA:
There are two basic parameters of GA: crossover probability and mutation probability.
CROSSOVER PROBABILITY:
This indicates how often crossover will be performed. If there is no crossover, offspring are exact
copies of the parents. If there is crossover, offspring are made from parts of both parents'
chromosomes. If the crossover probability is 100%, then all offspring are made by crossover.
If it is 0%, the whole new generation is made from exact copies of
chromosomes from the old population (but this does not mean that the new generation is the same!).
Crossover is performed in the hope that new chromosomes will contain good parts of old chromosomes
and therefore will be better. However, it is good to let some part of the old
population survive to the next generation.
MUTATION PROBABILITY:
This signifies how often parts of a chromosome will be mutated. If there is no mutation, offspring
are generated immediately after crossover (or copied directly) without any change. If mutation
is performed, one or more parts of a chromosome are changed. If the mutation probability is 100%,
the whole chromosome is changed; if it is 0%, nothing is changed. Mutation generally prevents the
GA from falling into local extremes. Mutation should not occur very often, because then the GA
would in fact turn into a random search.
OTHER PARAMETERS:
There are also some other parameters of GA. Another particularly important parameter is the
population size.
POPULATION SIZE:
This signifies how many chromosomes are present in the population (in one generation). If there are
too few chromosomes, the GA has few possibilities to perform crossover and only a small part
of the search space is explored. On the other hand, if there are too many chromosomes, the GA
slows down.
SELECTION:
Chromosomes are selected from the population to be parents for crossover. The problem is
how to select these chromosomes. According to Darwin's theory of evolution, the best ones
survive to create new offspring. There are many methods of selecting the best chromosomes;
examples are roulette wheel selection, Boltzmann selection, tournament selection, rank
selection, steady-state selection and others. In this thesis we have used tournament
selection, as it performs better than the others.
TOURNAMENT SELECTION:
A selection strategy in GA is simply a process that favors the selection of better individuals in
the population for the mating pool. There are two important issues in the evolution process of
genetic search: population diversity and selective pressure. Population diversity means that the
genes from the already discovered good individuals are exploited while promising new areas
of the search space continue to be explored. Selective pressure is the degree to which the better
individuals are favored. The tournament selection strategy provides selective pressure by
holding a tournament competition among individuals.
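A tournament of size k can be sketched as follows (an illustrative sketch; the population, fitness function and tournament size are assumed toy choices). Larger k raises the selective pressure; k = 2 gives the common binary tournament.

```python
import random

random.seed(1)

# Tournament selection: draw k individuals at random, return the fittest
def tournament_select(population, fitness, k=2):
    competitors = random.sample(population, k)
    return max(competitors, key=fitness)

# Toy population of 8-bit chromosomes, fitness = number of 1s
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(20)]
winner = tournament_select(pop, fitness=sum, k=4)
```

Because the winner is the best of its k competitors, its fitness is never below the population minimum, and raising k biases selection more strongly toward the fittest individuals.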
2.5 DIFFERENTIAL EVOLUTION:
The aim of optimization is to determine the best-suited solution to a problem under a given set
of constraints. Several researchers over the decades have come up with different solutions to
linear and non-linear optimization problems. Mathematically, an optimization problem involves
a fitness function describing the problem under a set of constraints representing the solution
space for the problem. Unfortunately, most of the traditional optimization techniques are
centered around evaluating the first derivatives to locate the optima on a given constrained
surface. Because of the difficulties in evaluating the first derivatives to locate the optima for
many rough and discontinuous optimization surfaces, in recent times several derivative free
optimization algorithms have emerged. The optimization problem, nowadays, is represented as
an intelligent search problem, where one or more agents are employed to determine the optima
on a search landscape representing the constrained surface for the optimization problem [20].
In the latter quarter of the twentieth century, Holland pioneered a new concept of evolutionary
search algorithms and came up with a solution to the so far open-ended problem of non-linear
optimization. Inspired by the natural adaptations of biological species, Holland
echoed the Darwinian theory through his most popular and well known algorithm, currently
known as the genetic algorithm (GA) [21]. Holland and his coworkers, including Goldberg and
Dejong, popularized the theory of GA and demonstrated how biological crossovers and
mutations of chromosomes can be realized in the algorithm to improve the quality of the
solutions over successive iterations [22]. In the mid 1990s, Eberhart and Kennedy enunciated an
alternative solution to the complex non-linear optimization problem by emulating the collective
behavior of bird flocks, particles, the boids method of Craig Reynolds [23] and socio-cognition,
and called their brainchild particle swarm optimization (PSO) [23-27]. Around the same
time, Price and Storn made a serious attempt to replace the classical crossover and mutation
operators in GA by alternative operators, and consequently came up with a suitable differential
operator to handle the problem. They proposed a new algorithm based on this operator and
called it differential evolution (DE) [28].
Both algorithms require no gradient information of the function to be optimized, use
only primitive mathematical operators and are conceptually very simple. They can be
implemented in any computer language very easily and require minimal parameter tuning.
Algorithm performance does not deteriorate severely with the growth of the search space
dimensions either. These issues perhaps have a great role in the popularity of the algorithms
within the domain of machine intelligence and cybernetics.
2.5.1 CLASSICAL DE:
Like any other evolutionary algorithm, DE also starts with a population of NP D-dimensional
search variable vectors. We will represent subsequent generations in DE by discrete time steps
like t = 0, 1, 2, . . ., t, t+1, etc. Since the vectors are likely to be changed over different generations,
we may adopt the following notation for representing the ith vector of the population at the
current generation (i.e., at time t) as

X_i(t) = [x_{i,1}(t), x_{i,2}(t), x_{i,3}(t), . . ., x_{i,D}(t)]    (2.20)

These vectors are referred to in the literature as genomes or chromosomes. DE is a very simple
evolutionary algorithm. For each search variable, there may be a certain range within which the
value of the parameter should lie for better search results. At the very beginning of a DE run, or
at t = 0, the problem parameters or independent variables are initialized somewhere in their feasible
numerical range. Therefore, if the jth parameter of the given problem has its lower and upper
bounds as x^L_j and x^U_j respectively, then we may initialize the jth component of the ith
population member as

x_{i,j}(0) = x^L_j + rand(0, 1) \cdot (x^U_j - x^L_j)

where rand(0, 1) is a uniformly distributed random number lying between 0 and 1. Now, in each
generation (or one iteration of the algorithm), to change each population member X_i(t) (say), a
donor vector V_i(t) is created. It is the method of creating this donor vector which demarcates the
various DE schemes. However, here we discuss one such specific mutation strategy known as
DE/rand/1. In this scheme, to create V_i(t) for each ith member, three other parameter vectors
(say the r1th, r2th and r3th vectors) are chosen at random from the current population. Next, a
scalar number F scales the difference of any two of the three vectors, and the scaled difference is
added to the third one, whence we obtain the donor vector V_i(t). We can express the process for
the jth component of each vector as
v_i,j(t+1) = x_r1,j(t) + F * (x_r2,j(t) - x_r3,j(t)) (2.21)
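In NumPy this donor-vector step can be sketched in a few lines; the function name `donor_rand_1` and the default F = 0.8 are illustrative assumptions, not part of the thesis:

```python
import numpy as np

def donor_rand_1(pop, i, F=0.8, rng=None):
    """DE/rand/1 donor vector V_i per Eq. (2.21).

    pop -- (NP, D) array of current-generation vectors
    i   -- index of the target vector (excluded from r1, r2, r3)
    """
    rng = rng or np.random.default_rng()
    # three mutually distinct indices, all different from the target index i
    r1, r2, r3 = rng.choice([r for r in range(len(pop)) if r != i],
                            size=3, replace=False)
    # scaled difference of two random vectors added to a third
    return pop[r1] + F * (pop[r2] - pop[r3])
```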
The process is illustrated in Fig. 1.1, whose closed curves denote constant cost contours, i.e., for
a given cost function f, a contour corresponds to f(X) = constant; here the contours are drawn
for the Ackley function. Next, to increase the potential diversity of the population, a crossover
scheme comes into play. DE can use two kinds of crossover schemes, namely exponential and
binomial, under which the donor vector exchanges its body parts, i.e., components, with the
target vector Xi(t). In exponential crossover, we first choose an integer n randomly from among
the numbers [0, D-1]. This integer acts as the starting point in the target vector, from where the
crossover or exchange of components with the donor vector starts. We also choose another
integer L from the interval [1, D]; L denotes the number of components the donor vector
actually contributes to the target. After a choice of n and L the trial vector

Ui(t) = [u_i,1(t), u_i,2(t), ..., u_i,D(t)] (2.22)

is formed with

u_i,j(t) = v_i,j(t) for j = <n>_D, <n+1>_D, ..., <n+L-1>_D
         = x_i,j(t) for all other j (2.23)

where the angular brackets < >_D denote a modulo function with modulus D. The integer L is
drawn from [1, D] according to the following pseudo code.
Fig. 1.1 Illustrating creation of the donor vector in 2-D parameter space (the constant cost
contours are for the two-dimensional Ackley function)
L = 0;
do
{
    L = L + 1;
}
while ((rand(0, 1) < CR) AND (L < D));

Hence, in effect, P(L >= m) = (CR)^(m-1) for any m > 0. CR is called the crossover constant
and it appears as a control parameter of DE just like F. For each donor vector V, a new set of n
and L must be chosen randomly as shown above. In the binomial crossover scheme, on the other
hand, the crossover is performed on each of the D variables whenever a randomly picked
number between 0 and 1 falls within the CR value. The scheme may be outlined as

u_i,j(t) = v_i,j(t) if rand(0, 1) < CR,
         = x_i,j(t) else. (2.26)
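Both crossover schemes can be sketched compactly as below; the function names and the CR default are illustrative assumptions, and classical DE implementations usually also force at least one donor component into the trial vector, which this minimal sketch omits:

```python
import numpy as np

def binomial_crossover(target, donor, CR=0.9, rng=None):
    """Binomial scheme, Eq. (2.26): take the donor component wherever
    a fresh rand(0, 1) falls below CR, else keep the target component."""
    rng = rng or np.random.default_rng()
    mask = rng.random(target.size) < CR
    return np.where(mask, donor, target)

def exponential_crossover(target, donor, CR=0.9, rng=None):
    """Exponential scheme: copy a circular run of L donor components
    starting at a random position n, so that P(L >= m) = CR^(m-1)."""
    rng = rng or np.random.default_rng()
    D = target.size
    n = rng.integers(0, D)                 # random start point in [0, D-1]
    L = 1
    while rng.random() < CR and L < D:     # grow the run with probability CR
        L += 1
    trial = target.copy()
    idx = (n + np.arange(L)) % D           # modulo-D indices <n>_D ... <n+L-1>_D
    trial[idx] = donor[idx]
    return trial
```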
In this way, for each target vector Xi(t) a trial (offspring) vector Ui(t) is created. To keep the
population size constant over subsequent generations, the next step of the algorithm calls for
selection to determine which of the target vector and the trial vector will survive in the next
generation, i.e., at time t = t + 1. DE actually involves the Darwinian principle of survival of the
fittest in its selection process, which may be outlined as

Xi(t+1) = Ui(t) if f(Ui(t)) <= f(Xi(t)),
        = Xi(t) if f(Ui(t)) > f(Xi(t)) (2.27)

where f(.) is the function to be minimized. So if the new trial vector yields a better value of the
fitness function, it replaces its target in the next generation; otherwise the target vector is
retained in the population. Hence the population either gets better (with respect to the fitness
function) or remains constant, but never deteriorates. The DE/rand/1 algorithm is outlined
below.
2.5.2 PROCEDURE:
Input: Randomly initialized population of vectors xi(0)
Output: Position of the approximate global optimum X*
Begin
    Initialize population;
    Evaluate fitness;
    For i = 0 to max-iteration do
    Begin
        Create difference offspring;
        Evaluate fitness;
        If an offspring is better than its parent
        Then replace the parent by the offspring in the next generation;
        End If;
    End For;
End.
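The procedure above can be fleshed out into a runnable DE/rand/1/bin sketch; the function name and the defaults (NP = 30, F = 0.8, CR = 0.9) are illustrative choices rather than values fixed by the text:

```python
import numpy as np

def de_rand_1_bin(f, bounds, NP=30, F=0.8, CR=0.9, max_gen=150, seed=0):
    """Minimal DE/rand/1/bin loop following Eqs. (2.20)-(2.27).

    f      -- objective to be minimized, f(x) -> float
    bounds -- sequence of (low, high) pairs, one per dimension
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    D = lo.size
    # initialization: x_ij(0) = x_j^L + rand(0,1) * (x_j^U - x_j^L)
    pop = lo + rng.random((NP, D)) * (hi - lo)
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        for i in range(NP):
            r1, r2, r3 = rng.choice([r for r in range(NP) if r != i],
                                    size=3, replace=False)
            donor = pop[r1] + F * (pop[r2] - pop[r3])     # Eq. (2.21)
            mask = rng.random(D) < CR                     # binomial crossover
            trial = np.where(mask, donor, pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:                         # selection, Eq. (2.27)
                pop[i], fit[i] = trial, f_trial
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

For example, minimizing the 2-D sphere function `lambda v: float(np.sum(v * v))` over [-5, 5]^2 drives the best fitness close to zero within a few dozen generations.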
2.5.3 THE COMPLETE DE FAMILY:
Actually, it is the process of mutation that demarcates one DE scheme from another. In the
preceding section we illustrated the basic steps of a simple DE. The mutation scheme in
(2.21) uses a randomly selected vector Xr1, and only one weighted difference vector F (Xr2 -
Xr3) is used to perturb it. Hence, in the literature, this particular mutation scheme is referred to
as DE/rand/1. We can now see how different DE schemes are named. The general convention
used is DE/x/y, where DE stands for differential evolution, x represents a string denoting the
type of vector to be perturbed (whether it is randomly selected or the best vector in the
population with respect to fitness value) and y is the number of difference vectors considered
for perturbation of x. Below we outline the other four different mutation schemes suggested by
Price et al.
SCHEME DE/RAND TO BEST/1
DE/rand to best/1 follows the same procedure as the simple DE scheme illustrated earlier, the
only difference being that the donor vector used to perturb each population member is now
created using two randomly selected members of the population as well as the best vector of the
current generation (i.e., the vector yielding the best objective function value at t = t). This can
be expressed for the ith donor vector at time t = t + 1 as

Vi(t+1) = Xi(t) + λ (Xbest(t) - Xi(t)) + F (Xr2(t) - Xr3(t)) (2.28)

where λ is another control parameter of DE in [0, 2], Xi(t) is the target vector and Xbest(t) is
the best member of the population with regard to fitness at the current time step t = t. To reduce
the number of control parameters a usual choice is to put λ = F.
SCHEME DE/BEST/1
In this scheme everything is identical to DE/rand/1 except that the donor vector is formed as

Vi(t+1) = Xbest(t) + F (Xr1(t) - Xr2(t)) (2.29)
Here the vector to be perturbed is the best vector of the current population and the perturbation
is caused by using a single difference vector.
SCHEME DE/BEST/2
Under this method, the donor vector is formed by using two difference vectors as shown below:

Vi(t+1) = Xbest(t) + F (Xr1(t) + Xr2(t) - Xr3(t) - Xr4(t)) (2.30)

Owing to the central limit theorem, the random variations in the parameter vector seem to shift
slightly towards the Gaussian direction, which appears to be beneficial for many functions.
SCHEME DE/RAND/2
Here the vector to be perturbed is selected randomly and two weighted difference vectors are
added to it to produce the donor vector. Thus for each target vector, a total of five other distinct
vectors are selected from the rest of the population. The process can be expressed in the form of
an equation as

Vi(t+1) = Xr1(t) + F1 (Xr2(t) - Xr3(t)) + F2 (Xr4(t) - Xr5(t)) (2.31)

Here F1 and F2 are two weighting factors selected in the range from 0 to 1. To reduce the
number of parameters we may choose F1 = F2 = F.
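For reference, the five mutation schemes of Eqs. (2.21) and (2.28)-(2.31) can be collected into one dispatch function; the name `de_mutation`, the `lam` parameter standing in for λ, and all defaults are illustrative assumptions:

```python
import numpy as np

def de_mutation(pop, i, best, scheme="rand/1", F=0.8, lam=0.8, rng=None):
    """Donor-vector creation for the five classical DE mutation schemes.
    `lam` plays the role of lambda in DE/rand-to-best/1; a common choice
    is lam = F."""
    rng = rng or np.random.default_rng()
    NP = pop.shape[0]
    # five mutually distinct indices different from the target index i
    r = rng.choice([k for k in range(NP) if k != i], size=5, replace=False)
    x = pop
    if scheme == "rand/1":
        return x[r[0]] + F * (x[r[1]] - x[r[2]])
    if scheme == "rand-to-best/1":
        return x[i] + lam * (best - x[i]) + F * (x[r[0]] - x[r[1]])
    if scheme == "best/1":
        return best + F * (x[r[0]] - x[r[1]])
    if scheme == "best/2":
        return best + F * (x[r[0]] + x[r[1]] - x[r[2]] - x[r[3]])
    if scheme == "rand/2":
        return x[r[0]] + F * (x[r[1]] - x[r[2]]) + F * (x[r[3]] - x[r[4]])
    raise ValueError(scheme)
```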
SUMMARY OF ALL SCHEMES:
In 2001 Storn and Price [21] suggested a total of ten different working strategies of DE and
some guidelines for applying these strategies to any given problem. The strategies were derived
from the five different DE mutation schemes outlined above. Each mutation strategy was
combined with either the exponential type crossover or the binomial type crossover. This
yielded 5 x 2 = 10 DE strategies, which are listed below.
DE/best/1/exp
DE/rand/1/exp
DE/rand-to-best/1/exp
DE/best/2/exp
DE/rand/2/exp
DE/best/1/bin
DE/rand/1/bin
DE/rand-to-best/1/bin
DE/best/2/bin
DE/rand/2/bin
The general convention used above is again DE/x/y/z, where DE stands for differential
evolution, x represents a string denoting the vector to be perturbed, y is the number of difference
vectors considered for perturbation of x, and z stands for the type of crossover being used (exp:
exponential; bin: binomial).
2.5.4 MORE RECENT VARIANTS OF DE:
DE is a stochastic, population-based, evolutionary search algorithm. The strength of the
algorithm lies in its simplicity, speed (how fast the algorithm can find the optimal or suboptimal
points of the search space) and robustness (producing nearly the same results over repeated
runs). The rate of convergence of DE, as well as its accuracy, can be improved largely by
applying different mutation and selection strategies. A judicious control of the two key
parameters, namely the scale factor F and the crossover rate CR, can considerably alter the
performance of DE. In what follows we illustrate some recent modifications of DE that make it
suitable for tackling the most difficult optimization problems.
DE WITH TRIGONOMETRIC MUTATION:
Lampinen and Fan [29] have proposed a trigonometric mutation operator for DE to speed up its
performance. To implement the scheme, for each target vector three distinct vectors are
randomly selected from the DE population. Suppose for the ith target vector Xi(t) the selected
population members are Xr1(t), Xr2(t) and Xr3(t). The indices r1, r2 and r3 are mutually
different and selected from [1, 2, ..., N], where N denotes the population size. Suppose the
objective function values of these three vectors are given by f(Xr1(t)), f(Xr2(t)) and f(Xr3(t)).
Now three weighting coefficients are formed according to the following equations:

p = |f(Xr1)| + |f(Xr2)| + |f(Xr3)| (2.32)

p1 = |f(Xr1)| / p (2.33)

p2 = |f(Xr2)| / p (2.34)

p3 = |f(Xr3)| / p (2.35)

Let rand(0, 1) be a uniformly distributed random number in (0, 1) and Γ be the trigonometric
mutation rate in the same interval (0, 1). The trigonometric mutation scheme may now be
expressed as

Vi(t+1) = (Xr1 + Xr2 + Xr3)/3 + (p2 - p1)(Xr1 - Xr2)
        + (p3 - p2)(Xr2 - Xr3) + (p1 - p3)(Xr3 - Xr1)
        if rand(0, 1) < Γ (2.36)

Vi(t+1) = Xr1 + F (Xr2 - Xr3) else (2.37)

Thus we find that the scheme proposed by Lampinen et al. uses trigonometric mutation with a
probability of Γ and the mutation scheme of DE/rand/1 with a probability of (1 - Γ).
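The operator can be sketched as follows, assuming the usual formulation with absolute objective values in the weighting coefficients; the function name, `gamma` (standing in for Γ) and the defaults are illustrative:

```python
import numpy as np

def trig_mutation(pop, fvals, i, F=0.8, gamma=0.05, rng=None):
    """Trigonometric mutation of Lampinen and Fan.

    With probability `gamma` the donor is the weighted centroid of three
    random vectors; otherwise plain DE/rand/1 is used."""
    rng = rng or np.random.default_rng()
    NP = pop.shape[0]
    r1, r2, r3 = rng.choice([k for k in range(NP) if k != i],
                            size=3, replace=False)
    if rng.random() < gamma:
        p = abs(fvals[r1]) + abs(fvals[r2]) + abs(fvals[r3])
        p1, p2, p3 = (abs(fvals[r1]) / p, abs(fvals[r2]) / p,
                      abs(fvals[r3]) / p)
        return ((pop[r1] + pop[r2] + pop[r3]) / 3.0
                + (p2 - p1) * (pop[r1] - pop[r2])
                + (p3 - p2) * (pop[r2] - pop[r3])
                + (p1 - p3) * (pop[r3] - pop[r1]))
    return pop[r1] + F * (pop[r2] - pop[r3])      # DE/rand/1 fallback
```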
DERANDSF (DE WITH RANDOM SCALE FACTOR)
In the original DE [28] the difference vector (Xr1(t) - Xr2(t)) is scaled by a constant factor F.
The usual choice for this control parameter is a number between 0.4 and 1. We propose to vary
this scale factor in a random manner in the range (0.5, 1) by using the relation

F = 0.5 (1 + rand(0, 1)) (2.38)
where rand(0, 1) is a uniformly distributed random number within the range [0, 1]. We call this
scheme DERANDSF (DE with Random Scale Factor). The mean value of the scale factor is
0.75. This allows for stochastic variations in the amplification of the difference vector and thus
helps retain population diversity as the search progresses. Even when the tips of most of the
population vectors point to locations clustered near a local optimum, the randomly scaled
difference vector gives a new trial vector fair chances of pointing at an even better location on
the multimodal functional surface. Therefore, the fitness of the best vector in the population is
much less likely to stagnate before a truly global optimum is reached.
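Eq. (2.38) is a one-liner; the function name here is an illustrative choice:

```python
import numpy as np

def random_scale_factor(rng):
    """DERANDSF scale factor of Eq. (2.38): F = 0.5 * (1 + rand(0, 1)),
    uniformly distributed in [0.5, 1) with mean 0.75."""
    return 0.5 * (1.0 + rng.random())
```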
DETVSF (DE WITH TIME VARYING SCALE FACTOR)
Fig. 1.2 Illustrating the DETVSF scheme on two-dimensional cost contours of the Ackley
function

In most population-based optimization methods (except perhaps some hybrid global-local
methods) it is generally believed to be a good idea to encourage the individuals (here, the tips of
the trial vectors) to sample diverse zones of the search space during the early stages of the
search. During the later stages it is important to adjust the movements of trial solutions finely so
that they can explore the interior of a relatively small space in which the suspected global
optimum lies. To meet this objective we reduce the value of the scale factor linearly with time
from a (predetermined) maximum to a (predetermined)
minimum value:

F = Fmin + (Fmax - Fmin) (MAXIT - iter) / MAXIT (2.39)

where Fmax and Fmin are the maximum and minimum values of the scale factor F, iter is the
current iteration number and MAXIT is the maximum number of allowable iterations. The locus
of the tip of the best vector in the population under this scheme may be illustrated as in Fig. 1.2.
The resulting algorithm is referred to as DETVSF (DE with a time varying scale factor).
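The linear schedule described in the text (maximum at the first iteration, minimum at the last) can be sketched as follows; the function name and defaults are illustrative assumptions:

```python
def time_varying_scale_factor(iter_, max_iter, f_max=1.0, f_min=0.4):
    """Linear DETVSF schedule: F starts at f_max when iter_ = 0 and
    decays to f_min at iter_ = max_iter (cf. Eq. (2.39))."""
    return f_min + (f_max - f_min) * (max_iter - iter_) / max_iter
```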
DE WITH LOCAL NEIGHBORHOOD:
In 2006 a new DE variant, based on a neighborhood topology of the parameter vectors, was
developed [30] to overcome some of the disadvantages of the classical DE versions. The authors
proposed a neighborhood-based local mutation operator that draws inspiration from PSO.
Suppose we have a DE population P = [X1, X2, ..., XNp], where each Xi (i = 1, 2, ..., Np) is a
D-dimensional vector. Now for every vector Xi we define a neighborhood of radius k, consisting
of the vectors Xi-k, ..., Xi, ..., Xi+k. We assume the vectors to be organized in a circular
fashion, such that the two immediate neighbors of vector X1 are XNp and X2. For each member
of the population a local mutation is created by employing the fittest vector in the neighborhood;
the model may be expressed as:

Li(t) = Xi(t) + λ (Xnbest(t) - Xi(t)) + F (Xp(t) - Xq(t)) (2.40)

where the subscript nbest indicates the best vector in the neighborhood of Xi and p, q ∈ (i - k,
i + k). Apart from this, we also use a global mutation expressed as:

Gi(t) = Xi(t) + λ (Xbest(t) - Xi(t)) + F (Xr(t) - Xs(t)) (2.41)

where the subscript best indicates the best vector in the entire population, and r, s ∈ (1, NP).
Global mutation encourages exploitation, since all members (vectors) of the population are
biased by the same individual (the population best); local mutation, in contrast, favors
exploration, since in general different members of the population are likely to be biased by
different individuals. Now we combine these two models using a time-varying scalar weight
w ∈ (0, 1)
to form the actual mutation of the new DE as a weighted mean of the local and the global
components:

Vi(t) = w Gi(t) + (1 - w) Li(t). (2.42)

The weight factor varies linearly with time as follows:

w = wmin + (wmax - wmin) (iter / MAXIT) (2.43)

where iter is the current iteration number, MAXIT is the maximum number of iterations
allowed and wmax, wmin denote, respectively, the maximum and minimum values of the
weight, with wmax, wmin ∈ (0, 1). Thus the algorithm starts at iter = 0 with w = wmin, but as
iter increases towards MAXIT, w increases gradually, and ultimately when iter = MAXIT, w
reaches wmax. Therefore at the beginning emphasis is laid on the local mutation scheme, but
with time the contribution from the global model increases. In the local model, attraction
towards a single point of the search space is reduced, helping DE avoid local optima. This
feature is essential at the beginning of the search process, when the candidate vectors are
expected to explore the search space vigorously. Clearly, a judicious choice of wmax and wmin
is necessary to strike a balance between the exploration and exploitation abilities of the
algorithm. After some experimenting, it was found that wmax = 0.8 and wmin = 0.4 seem to
improve the performance of the algorithm over a number of benchmark functions.
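A sketch of the combined donor of Eqs. (2.40)-(2.42), assuming a ring neighborhood of radius k and a fixed weight w (the scheme above varies w over iterations); the name `degl_donor`, `lam` standing in for λ, and the defaults are illustrative:

```python
import numpy as np

def degl_donor(pop, fvals, i, k=2, F=0.8, lam=0.8, w=0.5, rng=None):
    """Weighted mean of a local mutation (ring neighborhood of radius k)
    and a global mutation, per Eqs. (2.40)-(2.42)."""
    rng = rng or np.random.default_rng()
    NP = pop.shape[0]
    # ring neighborhood indices i-k ... i+k, wrapping around the population
    neigh = [(i + j) % NP for j in range(-k, k + 1)]
    nbest = min(neigh, key=lambda j: fvals[j])           # best in neighborhood
    p, q = rng.choice([j for j in neigh if j != i], size=2, replace=False)
    local = pop[i] + lam * (pop[nbest] - pop[i]) + F * (pop[p] - pop[q])
    gbest = int(np.argmin(fvals))                        # best in population
    r, s = rng.choice([j for j in range(NP) if j != i], size=2, replace=False)
    glob = pop[i] + lam * (pop[gbest] - pop[i]) + F * (pop[r] - pop[s])
    return w * glob + (1.0 - w) * local                  # Eq. (2.42)
```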
CHAPTER-3
ADAPTIVE SYSTEM IDENTIFICATION
USING GA
3.1 INTRODUCTION:
Generally the identification of a linear system is performed using the LMS algorithm. But most
dynamic systems exhibit nonlinearity, and the LMS based technique [31] does not perform
satisfactorily in identifying a nonlinear system. To improve the identification performance for
nonlinear systems, various techniques such as the Artificial Neural Network (ANN) [32], the
Functional Link Artificial Neural Network (FLANN) [33] and the Radial Basis Function (RBF)
network [34] have been proposed.
In this chapter we propose a novel adaptive model based on the GA technique for identification
of nonlinear systems. To apply GAs to system identification, each individual in the population
must represent a model of the plant, and the objective function becomes a quality measure of
the model, evaluating its capacity to predict the evolution of the measured outputs. The
measured output predictions inherent to each individual i are compared with the measurements
made on the real plant. The obtained error is a function of the individual's quality: the smaller
this error, the better the individual performs. There are many ways in which GAs can be used to
solve system identification tasks.
3.2. BASIC PRINCIPLE OF ADAPTIVE SYSTEM
IDENTIFICATION:
An adaptive filter can be used in modeling, that is, imitating the behavior of physical dynamic
systems which may be regarded as unknown black boxes having one or more inputs and
outputs. Modeling of a single input, single output dynamic system is shown in Fig. 3.1. Noise is
taken into consideration because in many practical cases the system to be modeled is noisy, that
is, has internal random disturbing forces. Internal system noise appears at the system output and
is commonly represented there as an additive noise. This noise is generally uncorrelated with
the plant input. If this is the case, and if the adaptive model is an adaptive linear combiner whose
weights are adjusted to minimize the mean square error, it can be shown that the least squares
solution will be unaffected by the presence of plant noise. This is not to say that the
convergence of the adaptive process will be unaffected by system noise, only that the expected
weight vector of the adaptive model after convergence will be unaffected. The least squares
solution is determined primarily by the impulse response of the system to be modeled. It can
also be significantly affected by the statistical or spectral character of the system input signal.
Fig.3.1 Modeling the single input, single output system.
The problem of determining a mathematical model for an unknown system by observing its
input-output data is known as system identification. It is performed by suitably adjusting the
parameters within a given model such that, for a particular input, the model output matches the
corresponding actual system output. After a system is identified, the output can be predicted for
a given input to the system, which is the goal of the system identification problem. When the
plant behavior is completely unknown it may be characterized using a certain adaptive model,
and then its identification task is carried out using adaptive algorithms like the
LMS. The system identification task is at the heart of numerous adaptive filtering applications.
We list several of these applications here:
Channel Identification
Plant Identification
Echo Cancellation for long distance transmission
Acoustic Echo Cancellation
Adaptive Noise Cancellation
Fig. 3.2 represents a schematic diagram of system identification of a time invariant, causal,
discrete time dynamic plant. The output of the plant is given by y = p(x), where x is the input,
which is a uniformly bounded function of time; the operator p describes the dynamic plant. The
objective of the identification problem is to construct a model generating an output ŷ which
approximates the plant output y when subjected to the same input x, so that the squared error
(e²) is minimized.
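As a baseline for comparison, the conventional LMS identification described above may be sketched as follows; the function name, step size, and the factor 2 in the update are illustrative conventions:

```python
import numpy as np

def lms_identify(x, d, n_taps, mu=0.02):
    """LMS adaptation of an n_taps FIR model: the conventional,
    derivative based approach to minimizing the squared error e^2."""
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)                 # tapped delay line, newest first
    for k in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[k]
        y = w @ buf                        # model output
        e = d[k] - y                       # instantaneous error
        w += 2.0 * mu * e * buf            # LMS weight update
    return w
```

Fed with white input and the (noiseless) plant output as the desired signal, the weights converge to the plant's impulse response.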
Fig.3.2 Schematic block diagram of a GA based adaptive identification system
In this chapter the modeling is done in an adaptive manner such that, after training the model
iteratively, y and ŷ become almost equal and the squared error becomes almost zero. The
minimization of error in an iterative manner is usually achieved by LMS or RLS methods, which
are basically derivative based. The shortcoming of this approach is that for certain types of plant
the squared error cannot be optimally minimized because the error surface falls into local
minima. In this chapter we propose a novel and elegant method which employs the genetic
algorithm for minimizing the squared error in a derivative free manner. In essence, the system
identification problem is viewed here as a squared error minimization problem.
The adaptive modeling constitutes two steps. In the first step the model is trained using a GA
based updating technique. After successful training of the model, performance evaluation is
carried out by feeding zero mean, uniformly distributed random input. Before we proceed to the
identification task using GA, let us discuss the basics of GA based optimization.
3.3. DEVELOPMENT OF GA BASED ALGORITHM
FOR SYSTEM IDENTIFICATION:
Referring to Fig. 3.2, let the system p(x) be an FIR system represented by the transfer function

p(z) = a0 + a1 z^-1 + a2 z^-2 + a3 z^-3 + ... + an z^-n (3.1)

where a0, a1, a2, ..., an represent the impulse response (parameters) of the system. The
measurement noise of the system is given by n(k), which is assumed to be white and Gaussian
distributed. The input x is also uniformly distributed white noise lying between -√3 and +√3
and having a variance of unity. The GA based model consists of an equal order FIR system with
unknown coefficients. The purpose of the adaptive identification model is to estimate the
unknown coefficients â0, â1, â2, ..., ân such that they match the corresponding parameters a0,
a1, a2, ..., an of the actual system p(z). If the system is exactly identified (theoretically), then in
the case of a linear system (for example the FIR system) the system parameters and the model
parameters become equal, i.e., a0 = â0, a1 = â1, a2 = â2, ..., an = ân. Also the response of the
actual system (y) coincides with the response of the model system (ŷ). However, in the case of a
nonlinear dynamic system the parameters do not match, but the responses of the two systems
will match.
The updating of the parameters of the model is carried out using the GA rule as outlined in the
following steps:
I. As shown in Fig. 3.2, an unknown static or dynamic system to be identified is connected in
parallel with an adaptive model to be developed using GA.
II. The coefficients (â) of the model are initially chosen from a population of M chromosomes.
Each chromosome constitutes NL random binary bits, where each sequential group of L bits
represents one coefficient of the adaptive model and N is the number of parameters of the
model.
III. Generate K (= 500) input signal samples, each of which is having zero mean and uniformly
distributed between -√3 and +√3, with a variance of unity.
IV. Each of the input samples is passed through the plant p(z) and then contaminated with
additive noise of known strength. The resultant signal acts as the desired signal. In this way K
desired signals are produced by feeding all the K input samples.
V. Each of the input samples is also passed through the model using each chromosome as model
parameters, and M sets of K estimated outputs are obtained.
VI. Each desired output is compared with the corresponding estimated output and K errors are
produced. The mean square error (MSE) for the set of parameters corresponding to the mth
chromosome is determined by using the relation

MSE(m) = (1/K) * sum_{i=1}^{K} e_i^2 (3.2)

This is repeated M times.
VII. Since the objective is to minimize MSE(m), m = 1 to M, GA based optimization is used.
VIII. The tournament selection, crossover, mutation and selection operators are sequentially
carried out following the steps as given in Section 3.3.
IX. In each generation the minimum MSE (MMSE) is obtained and plotted against the
generation number to show the learning characteristics.
X. The learning process is stopped when the MMSE reaches the minimum level.
XI. At this step all the chromosomes attain almost identical genes, which represent the
estimated parameters of the developed model.
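Steps I-XI can be sketched as a small binary-coded GA; the chromosome length, coefficient range, operator rates, and the added elitism here are illustrative assumptions rather than the thesis's exact settings:

```python
import numpy as np

def ga_identify(x, d, n_taps, pop_size=40, bits=16, lo=-2.0, hi=2.0,
                gens=120, pc=0.9, pm=0.01, seed=0):
    """GA based FIR identification sketch following steps I-XI.

    x -- input samples; d -- (possibly noisy) plant output, the desired
    signal. Each chromosome holds n_taps coefficients of `bits` bits."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n_taps * bits))
    weights = 1 << np.arange(bits)[::-1]               # binary place values

    def decode(ch):
        g = ch.reshape(n_taps, bits)
        return lo + (hi - lo) * (g @ weights) / (2 ** bits - 1)

    def mse(ch):                                       # Eq. (3.2)
        y = np.convolve(x, decode(ch))[:len(x)]        # model output
        return float(np.mean((d - y) ** 2))

    for _ in range(gens):
        fit = np.array([mse(ch) for ch in pop])
        elite = pop[np.argmin(fit)].copy()             # keep generation best
        # binary tournament selection
        a = rng.integers(0, pop_size, pop_size)
        b = rng.integers(0, pop_size, pop_size)
        parents = pop[np.where(fit[a] < fit[b], a, b)]
        # single-point crossover on consecutive pairs
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                cut = rng.integers(1, n_taps * bits)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # bit-flip mutation, then elitism
        flip = rng.random(children.shape) < pm
        children = np.where(flip, 1 - children, children)
        children[0] = elite
        pop = children
    fit = np.array([mse(ch) for ch in pop])
    return decode(pop[np.argmin(fit)])
```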
3.4. SIMULATION STUDIES:
To demonstrate the performance of the proposed GA based approach, numerous simulation
studies are carried out on several linear and nonlinear systems. The performance of the proposed
structure is compared with the corresponding LMS structure. The block diagram shown in
Fig. 3.2 is used for the simulation study.
Case-1 (Linear System)
A unit variance, uniformly distributed random signal lying in the range -√3 to +√3 is applied to
the known system having transfer function
Experiment-1: H(z) = 0.2090 + 0.9950 z^-1 + 0.2090 z^-2 and
Experiment-2: H(z) = 0.2600 + 0.9300 z^-1 + 0.2600 z^-2
The output of the system is contaminated with white Gaussian noise of different strengths of
-20 dB and -30 dB. The resultant signal y is used as the desired or training signal. The same
random input is also applied to the GA based adaptive model having the same linear combiner
structure as that of H(z) but with random initial weights. The coefficients or weights of the
linear combiner are updated using the LMS algorithm as well as the proposed GA based
algorithm. The training is complete when the MSE in dB becomes parallel to the x-axis. Under
this condition, for a linear system, the parameters a_i match the corresponding estimated
parameters â_i from the proposed model.
In Table-3.1 we present the actual and estimated parameters of a 3-tap linear combiner obtained
by the LMS as well as the GA based models. From this table it is observed that the GA based
model performs better than the LMS based model under different noise conditions.
Experiment  Actual     Estimated parameters
            parameter  LMS based              GA based
                       NSR=-30dB  NSR=-20dB   NSR=-30dB  NSR=-20dB
01          0.2090     0.2092     0.2064      0.2100     0.2061
            0.9950     0.9941     1.0094      0.9943     0.9985
            0.2090     0.2071     0.2153      0.2077     0.2077
02          0.2600     0.2631     0.2705      0.2582     0.2566
            0.9300     0.9308     0.9289      0.9301     0.9342
            0.2600     0.2563     0.2624      0.2598     0.2598

Table-3.1 Comparison of actual and estimated parameters of LMS and GA based models
[Figure: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL = 0; curves for NSR = -20 dB and NSR = -30 dB]
Fig.3.3 Learning Characteristics of LMS based Linear System Identification (Experiment-1)
[Figure: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL = 0; curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.4 Learning Characteristics of LMS based Linear System Identification (Experiment-2)
[Figure: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL = 0; curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.5 Learning Characteristics of GA based Linear System Identification (Experiment-1)
[Figure: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL = 0; curves for NSR = -20 dB and NSR = -30 dB]
Fig.3.6 Learning Characteristics of GA based Linear System Identification (Experiment-2)
Case-2 (Non-Linear System)
In this simulation the actual system is assumed to be nonlinear in nature. Computer simulation
results for two different nonlinear systems are presented. In this case the actual system is
Experiment-3: yn(k) = tanh{y(k)}
Experiment-4: yn(k) = y(k) + 0.2 y^2(k) - 0.1 y^3(k)
where y(k) is the output of the linear system and yn(k) is the output of the nonlinear system.
In the case of a nonlinear system the parameters of the two systems do not match; however, the
responses of the actual system and the adaptive model match. To demonstrate this observation,
training is carried out using both the LMS and the GA based algorithms.
[Figure: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for NSR = -20 dB and NSR = -30 dB]
Fig.3.7 Learning Characteristics of LMS based Non Linear System Identification (Experiment-3)
[Figure: MSE in dB vs. number of iterations (samples); CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.8 Learning Characteristics of LMS based Non Linear System Identification (Experiment-4)
[Figure: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.9 Learning Characteristics of GA based Non Linear System Identification (Experiment-3)
[Figure: mean square error in dB vs. generation; CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.10 Learning Characteristics of GA based Non Linear System Identification (Experiment-4)
[Figure: output vs. sample index; CH: [0.2090, 0.9950, 0.2090], NL: y = tanh(y); curves: Actual, GA, LMS]
Fig.3.11 Comparison of Output response of (Experiment-3) at -30 dB NSR.
[Figure: output vs. sample index; CH: [0.2090, 0.9950, 0.2090], NL: y = y + 0.2y^2 - 0.1y^3; curves: Actual, GA, LMS]
Fig.3.12 Comparison of Output response of (Experiment-4) at -30 dB NSR.
[Figure: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for NSR = -20 dB and NSR = -30 dB]
Fig.3.13 Learning Characteristics of LMS based Non Linear System Identification (Experiment-3)
[Figure: MSE in dB vs. number of iterations (samples); CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.14 Learning Characteristics of LMS based Non Linear System Identification (Experiment-4)
[Figure: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.15 Learning Characteristics of GA based Non Linear System Identification (Experiment-3)
[Figure: mean square error in dB vs. generation; CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves for NSR = -30 dB and NSR = -20 dB]
Fig.3.16 Learning Characteristics of GA based Non Linear System Identification (Experiment-4)
[Figure: output vs. sample index; CH: [0.2600, 0.9300, 0.2600], NL: y = tanh(y); curves: Actual, GA, LMS]
Fig.3.17 Comparison of Output response of (Experiment-3) at -30 dB NSR.
[Figure: output vs. sample index; CH: [0.2600, 0.9300, 0.2600], NL: y = y + 0.2y^2 - 0.1y^3; curves: Actual, GA, LMS]
Fig.3.18 Comparison of Output response of (Experiment-4) at -30 dB NSR
The MSE plots of Experiment-3 and Experiment-4, following Experiment-1, for two different noise conditions using the LMS based algorithm are obtained by simulation and shown in Fig.3.7 and Fig.3.8 respectively. The corresponding plots for the same system using the GA based model are shown in Fig.3.9 and Fig.3.10 respectively. The comparison of output responses of the two nonlinear models using the LMS and GA techniques is shown in Fig.3.11 and Fig.3.12 respectively. Similarly, the MSE plots of Experiment-3 and Experiment-4, following Experiment-2, for two different noise conditions using the LMS based algorithm are shown in Fig.3.13 and Fig.3.14 respectively, and the corresponding plots for the GA based model are shown in Fig.3.15 and Fig.3.16 respectively. The comparison of output responses of the two nonlinear models using the LMS and GA techniques is shown in Fig.3.17 and Fig.3.18 respectively. Similar results are also observed for other nonlinear models and under various noise conditions.
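As a concrete illustration of the LMS-based identification used in these experiments, the following minimal pure-Python sketch identifies a 3-tap channel followed by the tanh nonlinearity of Experiment-3. The step size, input range and sample count are illustrative assumptions, not the settings used in the thesis simulations:

```python
import math, random

random.seed(1)

# Assumed plant form (Experiment-3): 3-tap channel followed by y = tanh(y)
h = [0.2600, 0.9300, 0.2600]

def plant(xbuf):
    lin = sum(hk * xk for hk, xk in zip(h, xbuf))
    return math.tanh(lin)

w = [0.0, 0.0, 0.0]            # adaptive FIR model weights
mu = 0.05                      # LMS step size (illustrative value)
xbuf = [0.0, 0.0, 0.0]
errs = []
for n in range(5000):
    x = random.uniform(-0.5, 0.5)
    xbuf = [x] + xbuf[:-1]                    # shift the delay line
    d = plant(xbuf)                           # desired (plant) output
    y = sum(wk * xk for wk, xk in zip(w, xbuf))
    e = d - y
    w = [wk + 2 * mu * e * xk for wk, xk in zip(w, xbuf)]  # LMS update
    errs.append(e * e)

early = sum(errs[:500]) / 500
late = sum(errs[-500:]) / 500
print(early, late)   # squared error falls as the model fits the mildly nonlinear plant
```

A GA based model would instead search the same weight space by evolving a population of candidate weight vectors rather than following the error gradient.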
3.5 RESULTS AND DISCUSSIONS:
Table-1 reveals that for the linear FIR system the coefficients of the LMS based adaptive model match the coefficients of the actual system more closely than those of the GA based model. Hence for the linear FIR system LMS works well.
For the nonlinear system the learning characteristics of the LMS technique are poor (Fig.9) for both noise cases, but are much improved in the case of GA (Fig.11).
The output response of the nonlinear system (Experiment-3) obtained with GA is better than its LMS counterpart because the GA response is closer to the desired response (Fig.13).
CHAPTER-4
ADAPTIVE CHANNEL EQUALIZATION
USING GENETIC ALGORITHM
4.1 INTRODUCTION:
The digital communication system suffers from the problem of ISI, which essentially deteriorates the accuracy of reception. The probability of error at the receiver can be minimized and reduced to an acceptable level by introducing an equalizer at the front end of the receiver. An adaptive digital channel equalizer is essentially an inverse system of the channel model which primarily combats the effect of ISI. Conventionally the LMS algorithm is employed to design and develop adaptive equalizers [35]. Such equalizers use a gradient based weight update algorithm, and therefore there is a possibility that during training the equalizer weights do not attain their optimal values because the MSE is trapped in a local minimum. On the other hand, GA and DE are derivative free techniques, and hence the local minima problem does not arise during weight updates. The present chapter develops a novel GA based adaptive channel equalizer.
4.2 BASIC PRINCIPLE OF CHANNEL EQUALIZATION:
In an ideal communication channel, the received information is identical to that transmitted. However, this is not the case for real communication channels, where signal distortions take place. A channel can interfere with the transmitted data through three types of distorting effects: power degradation and fades, multipath time dispersion, and background thermal noise [36]. Equalization is the process of recovering the data sequence from the corrupted channel samples. A typical baseband transmission system is depicted in Fig.4.1, where an equalizer is incorporated within the receiver.
Fig. 4.1. A Baseband Communication System
4.2.1 MULTIPATH PROPAGATION:
Within telecommunication channels multiple paths of propagation commonly occur. In practical terms this is equivalent to transmitting the same signal through a number of separate channels, each having a different attenuation and delay. Consider an open-air radio transmission channel that has three propagation paths, as illustrated in Fig.4.2: direct, earth bound and sky bound. Multipath interference between consecutively transmitted signals will take place if one signal is received whilst the previous signal is still being detected. In Fig.4.1 this would occur if the symbol transmission rate is greater than 1/τ, where τ represents the transmission delay. Because bandwidth efficiency leads to high data rates, multipath interference commonly occurs.
Fig.4.2 Impulse response of a transmitted signal in a channel which has three modes of propagation: (a) the signal transmission paths, (b) the received samples
4.2.2 MINIMUM & NON-MINIMUM PHASE CHANNELS:
When all the roots of H(z) lie within the unit circle, the channel is termed minimum phase. The inverse of a minimum phase channel [37] is convergent, as illustrated by (4.1).
H(z) = 1.0 + 0.5 z^{-1}

1/H(z) = 1/(1.0 + 0.5 z^{-1}) = Σ_{i=0}^{∞} (-0.5)^i z^{-i}
       = 1 - 0.5 z^{-1} + 0.25 z^{-2} - 0.125 z^{-3} + ...        (4.1)
Whereas the inverse of a non-minimum phase channel is not convergent, as shown in (4.2).
H(z) = 0.5 + 1.0 z^{-1}

1/H(z) = 1/(0.5 + z^{-1}) = z Σ_{i=0}^{∞} (-0.5)^i z^{i}
       = z [1 - 0.5 z + 0.25 z^2 - 0.125 z^3 + ...]        (4.2)
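The contrast between the convergent series (4.1) and the divergent series (4.2) can be checked numerically by polynomial long division; a small pure-Python sketch (the helper name is hypothetical):

```python
def inverse_series(h, n_terms):
    """First n_terms coefficients c[k] of 1/H(z) = sum_k c[k] z^{-k},
    obtained by long division (requires h[0] != 0)."""
    c = []
    rem = [1.0] + [0.0] * (n_terms + len(h))   # numerator is 1
    for k in range(n_terms):
        ck = rem[k] / h[0]
        c.append(ck)
        for j, hj in enumerate(h):             # subtract ck * H(z) * z^{-k}
            rem[k + j] -= ck * hj
    return c

min_phase = inverse_series([1.0, 0.5], 10)   # root inside the unit circle
non_min   = inverse_series([0.5, 1.0], 10)   # root outside the unit circle
print(min_phase)  # terms shrink:  1, -0.5, 0.25, -0.125, ...
print(non_min)    # terms blow up: 2, -4, 8, -16, ...
```

The minimum phase inverse decays geometrically, while the causal expansion of the non-minimum phase inverse grows without bound, which is why a delayed (non-causal) expansion is needed instead.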
Since equalizers are designed to invert the channel distortion process, they will in effect model the channel inverse. The minimum phase channel has a linear inverse model, therefore a linear equalization solution exists. However, limiting the inverse model to m dimensions will
approximate the solution, and it has been shown that nonlinear solutions can provide a superior inverse model in the same dimension.
A linear inverse of a non-minimum phase channel does not exist without incorporating time delays. A time delay creates a convergent series for a non-minimum phase model, where longer delays are necessary to provide a reasonable equalizer. Equation (4.3) describes a non-minimum phase channel with a single-delay inverse and a four-sample-delay inverse. The latter of these is the more suitable form for a linear filter.
H(z) = 0.5 + 1.0 z^{-1}

z^{-1} H^{-1}(z) = 1 - 0.5 z + 0.25 z^2 - 0.125 z^3 + ...        (non-causal)

z^{-4} H^{-1}(z) ≈ z^{-3} - 0.5 z^{-2} + 0.25 z^{-1} - 0.125 + ...        (truncated and causal)        (4.3)
The three-tap non-minimum phase channel H(z) = 0.3410 + 0.8760 z^{-1} + 0.3410 z^{-2} is used throughout this thesis for simulation purposes. A channel delay D is included to assist in the classification, so that the desired output becomes u(n - D).
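For reference, generating channel observations and the delayed training targets for this three-tap channel might look like the following sketch (the delay D and record length are illustrative assumptions):

```python
import random
random.seed(0)

h = [0.3410, 0.8760, 0.3410]   # the three-tap channel used in the thesis
D = 1                          # assumed channel delay for the desired output u(n - D)

u = [random.choice([-1.0, 1.0]) for _ in range(20)]   # BPSK symbols

# channel output y(n) = sum_k h[k] * u(n - k)  (zero initial conditions)
y = [sum(h[k] * u[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(u))]

# training pairs for the equalizer: received sample y(n), desired u(n - D)
pairs = [(y[n], u[n - D]) for n in range(D, len(u))]
print(pairs[:3])
```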
4.2.3 INTERSYMBOL INTERFERENCE:
Inter-symbol interference (ISI) has already been described as the overlapping of the transmitted data. It is difficult to recover the original data from a single channel sample dimension because there is no statistical information about the multipath propagation. Increasing the dimensionality of the channel output vector helps characterize the multipath propagation. This has the effect of not only increasing the number of symbols but also increasing the Euclidean distance between the
output classes.
Fig. 4.3 Interaction between two neighboring symbols
When additive Gaussian noise is present within the channel, the input samples will form Gaussian clusters around the symbol centers. These symbol clusters can be characterized by a probability density function (PDF) with a noise variance σ^2, and the noise can cause the symbol clusters to interfere. Once this occurs, equalization filtering becomes inadequate to classify all of the input samples. Error control coding schemes can be employed in such cases, but these often require extra bandwidth.
4.2.4 SYMBOL OVERLAP:
The expected number of errors can be calculated by considering the amount of symbol interaction, assuming Gaussian noise. Taking any two neighboring symbols, the cumulative distribution function (CDF) can be used to describe the overlap between the two noise characteristics. The overlap is directly related to the probability of error between the two symbols, and if these two symbols belong to opposing classes a class error will occur.
Fig.4.3 shows two Gaussian functions that could represent two symbol noise distributions. The Euclidean distance L between symbol centers and the noise variance σ^2 can be used in the
cumulative distribution function of (4.4) to calculate the area of overlap between the two symbol noise distributions, and therefore the probability of error, as in (4.5).

CDF(x) = (1/(σ√(2π))) ∫_x^∞ exp(-u^2/(2σ^2)) du        (4.4)

P(e) = CDF(L/2)        (4.5)
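Since the tail integral in (4.4) is the standard Gaussian Q-function, the overlap probability of (4.5) can be evaluated directly with the complementary error function; a small sketch (the function name is hypothetical):

```python
import math

def overlap_probability(L, sigma):
    """Area of overlap between two Gaussian symbol clusters whose centers
    are L apart, each with standard deviation sigma: the Gaussian tail
    integral evaluated at x = L/2, i.e. Q(L / (2*sigma))."""
    return 0.5 * math.erfc((L / 2) / (sigma * math.sqrt(2)))

# widely separated symbols -> negligible error; close symbols -> large error
print(overlap_probability(4.0, 0.5))   # well separated
print(overlap_probability(0.5, 0.5))   # strongly overlapping
```

As expected, the error probability falls rapidly as the symbol separation L grows relative to the noise spread σ.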
Since each channel symbol is equally likely to occur, the probability of unrecoverable errors occurring in the equalization space can be calculated using the sum of all the CDF overlaps between each pair of opposing class symbols. The probability of error is more commonly described as the BER. Equation (4.6) describes the BER based upon the Gaussian noise overlap, where N_sp is the number of symbols in the positive class, N_m is the number of symbols in the negative class, and Δ_i is the distance between the i-th positive symbol and its closest neighboring symbol in the negative class.
BER(σ) = (1/(N_sp + N_m)) Σ_{i=1}^{N_sp} CDF(Δ_i/2)        (4.6)
4.3 CHANNEL EQUALIZATION:
The inverse model of a system having an unknown transfer function is itself a system whose transfer function is in some sense a best fit to the reciprocal of the unknown transfer function. Sometimes the inverse model response contains a delay which is deliberately incorporated to improve the quality of the fit. In Fig. 4.4, a source signal s(n) is fed into an unknown system that produces the input signal x(n) for the adaptive filter. The output of the adaptive filter is subtracted from a desired response signal that is a delayed version of the source signal, such that d(n) = s(n - Δ), where Δ is a positive integer value. The goal of the adaptive filter is to adjust its characteristics such that the output signal is an accurate representation of the delayed source signal.
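The inverse-modeling arrangement of Fig. 4.4, with the LMS update as the conventional baseline, can be sketched as follows. The equalizer length, delay and step size are assumed illustrative values, not the thesis settings:

```python
import random
random.seed(2)

h = [0.3410, 0.8760, 0.3410]   # channel from Section 4.2.2
m, D, mu = 8, 4, 0.02          # equalizer length, delay, LMS step (assumed values)

w = [0.0] * m                  # equalizer weights
ybuf = [0.0] * m               # delay line of received samples x(n)
s_hist = []
errs = []
for n in range(4000):
    s = random.choice([-1.0, 1.0])           # source symbol s(n)
    s_hist.append(s)
    x = sum(h[k] * s_hist[n - k] for k in range(len(h)) if n - k >= 0)
    ybuf = [x] + ybuf[:-1]
    out = sum(wk * yk for wk, yk in zip(w, ybuf))
    d = s_hist[n - D] if n >= D else 0.0     # desired: delayed source s(n - D)
    e = d - out
    w = [wk + 2 * mu * e * yk for wk, yk in zip(w, ybuf)]   # LMS update
    errs.append(e * e)

early = sum(errs[:200]) / 200
late = sum(errs[-200:]) / 200
print(early, late)   # squared error falls as the equalizer converges
```

After training, the cascade of channel and equalizer approximates the pure delay z^{-D}, which is exactly the inverse-modeling goal described above.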
There are many applications of the adaptive inverse model of a system. If the system is a communication channel, then the inverse model is an adaptive equalizer which compensates for the effects of inter symbol interference (ISI) caused by the restriction of channel bandwidth [38]. Similarly, if the system is the model of a high density recording medium, then its corresponding inverse model reconstructs the recorded data without distortion [39]. If the system represents a nonlinear sensor, then its inverse model represents a compensator of environmental as well as inherent nonlinearities [40]. The adaptive inverse model also finds applications in adaptive control [41] as well as in deconvolution in geophysics applications [42].
Fig. 4.4: Inverse Modeling
Channel equalization is a technique for decoding transmitted signals across nonideal communication channels. The transmitter sends a sequence s(n) that is known to both the transmitter and receiver. However, in equalization, the received signal is used as the input signal x(n) to an adaptive filter, which adjusts its characteristics so that its output closely matches a delayed version s(n - Δ) of the known transmitted signal. After a suitable adaptation period, the coefficients of the system either are fixed and used to decode future transmitted messages or are adapted using a crude estimate of the desired response signal that is computed from y(n). This latter mode of operation is known as decision-directed adaptation.
Channel equalization is one of the first applications of adaptive filters and is described in the pioneering work of Lucky. Today it remains one of the most popular uses of an adaptive filter. Practically every computer telephone modem transmitting at rates of 9600 bits per second
or greater contains an adaptive equalizer. Adaptive equalization is also useful for wireless
communication systems. Qureshi [43] has written an excellent tutorial on adaptive equalization.
A related problem to equalization is deconvolution, a problem that appears in the context of
geophysical exploration.
In many control tasks, the frequency and phase characteristics of the plant hamper the convergence behavior and stability of the control system. We can use the adaptive filter shown in Fig. 4.4 to compensate for the nonideal characteristics of the plant, as a method of adaptive control. In this case the signal s(n) is sent at the output of the controller, and the signal x(n) is the signal measured at the output of the plant. The coefficients of the adaptive filter are then adjusted so that the cascade of the plant and adaptive filter can be nearly represented by the pure delay z^{-Δ}.
Transmission and storage of high density digital information play an important role in the present age of information technology. Digital information obtained from audio, video or text sources needs high density storage or transmission through communication channels. Communication channels and recording media are often modeled as band-limited channels for which the channel impulse response is that of an ideal low pass filter. When sequences of symbols are transmitted or recorded, the low pass filtering of the channel distorts the transmitted symbols over successive time intervals, causing symbols to spread and overlap with adjacent symbols. This resulting linear distortion is known as inter symbol interference. In addition, nonlinear distortion is also caused by cross talk in the channel and the use of amplifiers. In the data storage channel, the binary data is stored in the form of tiny magnetized regions called bit cells, arranged along the recording track. At read back, noise and nonlinear distortions (ISI) corrupt the signal. An ANN based equalization technique has been proposed to alleviate the ISI present during read back from the magnetic storage channel. Recently, Sun et al. [44] have reported an improved Viterbi detector to compensate for the nonlinearities and media noise. Thus adaptive channel equalizers play an important role in recovering digital information from digital communication channels and storage media. Preparata had suggested a simple and attractive scheme for dispersal recovery of digital information based on the discrete Fourier transform. Subsequently Gibson et al. have reported an efficient nonlinear ANN structure for reconstructing a digital signal which has passed through a dispersive channel and been corrupted with additive noise. In a recent publication the authors have proposed optimal preprocessing strategies for perfect reconstruction of binary signals from dispersive communication channels. Touri et al. have developed a deterministic worst case framework for perfect reconstruction of discrete data transmission through a dispersive communication channel. In the recent past, new adaptive equalizers have been suggested using soft computing tools such as the artificial neural network
(ANN), the polynomial perceptron network (PPN) and the functional link artificial neural network (FLANN). It is reported that these methods are best suited for nonlinear and complex channels. Recently, the Chebyshev artificial neural network has also been proposed for nonlinear channel equalization [45]. The drawback of these methods is that the estimated weights are likely to fall into local minima during training. For this reason the genetic algorithm (GA) [46] and differential evolution [19] have been suggested for training adaptive channel equalizers. The main attraction of GA lies in the fact that it does not rely on Newton-like gradient-descent methods, and hence there is no need for the calculation of derivatives. This makes it less likely to be trapped in local minima. However, only two parameters of GA, crossover and mutation, help to avoid the local minima problem.
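A minimal sketch of such GA based equalizer training follows. The population size, elitism, crossover and mutation settings are illustrative assumptions, not the parameter values used in this thesis:

```python
import random
random.seed(3)

# fitness: MSE of a candidate equalizer on a fixed training record
h = [0.3410, 0.8760, 0.3410]           # channel from Section 4.2.2
D, m = 2, 4                            # assumed delay and equalizer length

s = [random.choice([-1.0, 1.0]) for _ in range(300)]
x = [sum(h[k] * s[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(s))]

def mse(w):
    total = 0.0
    for n in range(D, len(s)):
        y = sum(w[i] * x[n - i] for i in range(m) if n - i >= 0)
        total += (s[n - D] - y) ** 2
    return total / (len(s) - D)

# plain real-coded GA: elitism, one-point crossover, Gaussian mutation
pop = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(30)]
init = min(mse(w) for w in pop)
for gen in range(60):
    pop.sort(key=mse)
    elite = pop[:10]                   # keep the 10 fittest chromosomes
    children = []
    while len(children) < 20:
        a, b = random.sample(elite, 2)
        cut = random.randrange(1, m)   # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.3:      # mutation
            i = random.randrange(m)
            child[i] += random.gauss(0, 0.1)
        children.append(child)
    pop = elite + children

final = min(mse(w) for w in pop)
print(init, final)   # best fitness improves as the population evolves
```

Because no gradient of the MSE surface is ever computed, the same search loop applies unchanged when the channel or equalizer is nonlinear, which is the motivation for using GA here.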
4.3.1 TRANSVERSAL EQUALIZER:
The transversal equalizer uses a time-delay vector Y(n) (4.7) of channel output samples to determine the symbol class. The {m} TE notation used to represent the transversal equalizer specifies m inputs. The equalizer filter output is classified through a threshold activation device (Fig.4.5) so that the equalizer decision belongs to one of the BPSK states u(n) ∈ {-1, +1}.
Y(n) = [y(n), y(n-1), ..., y(n-(m-1))]        (4.7)