1
Neural Networks for Solving Systems of Linear Equations; Minimax, Least Absolute Value and Least Square Problems in Real Time
Presented by :
Yasaman Farahani
Maryam Khordad
Leila Pakravan Nejad
Andrzej Cichocki
2
Introduction
Solving systems of linear equations is considered to be one of the basic problems widely encountered in science and engineering since it is frequently used in many applications.
Every linear parameter estimation problem gives rise to a set of linear equations Ax=b.
This problem arises in a broad class of scientific disciplines such as signal processing, robotics, automatic control, system theory, statistics, and physics.
In many applications a real time solution of a set of linear equations ( or,equivalently, an online inversion of matrices) is desired.
We employ artificial neural networks (ANNs), which can be considered specialized analog computers relying on strongly simplified models of neurons.
New theoretical results and advances in VLSI technology have made it possible to fabricate microelectronic networks of high complexity.
3
Formulation of the Basic Problem
Consider the linear parameter estimation model :
It is desired to find in real time a solution x* if an exact (error-free) solution exists at all, or otherwise an approximate solution that comes as close as possible to the true solution (the best estimate of the solution vector x*).
The key step is to construct an appropriate energy function (Lyapunov function) E(x) so that the lowest energy state corresponds to the desired solution x*.
The derivation of the energy function transforms the minimization problem into a set of ordinary differential or difference equations realized by ANN architectures with appropriate synaptic weights, input excitations, and nonlinear activation functions.
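The mapping from energy function to differential equations can be sketched numerically. Below is a minimal simulation of the gradient flow dx/dt = -mu A^T (Ax - b) for the energy E(x) = 0.5 ||Ax - b||^2, integrated by forward Euler; the matrix, vector, and gains are illustrative choices, not values from the slides.

```python
import numpy as np

# Hypothetical small system Ax = b (values chosen for illustration only).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Gradient flow dx/dt = -mu * A^T (Ax - b), integrated with forward Euler.
mu, dt = 1.0, 0.01
x = np.zeros(2)
for _ in range(5000):
    r = A @ x - b                  # residual vector r(x) = Ax - b
    x = x - dt * mu * (A.T @ r)    # descend the energy E(x) = 0.5*||r||^2

x_exact = np.linalg.solve(A, b)
```

The trajectory x(t) converges to the solution x* because E(x) decreases monotonically along it, which is the Lyapunov argument sketched on the next slides.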
4
Find the vector that minimizes the energy function:
The following cases have special importance:
5
The proper choice of the criterion used depends on the specific applications and greatly on the distribution of the errors in the measurement vector b.
The standard least squares criterion is optimal for a Gaussian distribution of the noise; however, this assumption is frequently unrealistic due to different sources of errors such as instrument errors, modeling errors, sampling errors, and human errors.
In order to reduce the influence of outliers (large errors), the more robust iteratively reweighted least squares technique can be used. In the presence of outliers an alternative approach is to use the least absolute value criterion.
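The sensitivity of the least squares criterion to outliers, versus the robustness of the least absolute value criterion, can be seen in the simplest one-dimensional location model, where the two criteria reduce to the mean and the median respectively (the data values below are illustrative):

```python
import numpy as np

# One-dimensional location model: minimize sum_i |x - b_i|^p over scalar x.
# For p = 2 the optimum is the mean; for p = 1 it is the median.
b = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])  # last entry is a gross outlier

x_ls = b.mean()        # least squares estimate, dragged away by the outlier
x_lav = np.median(b)   # least absolute value estimate, robust to the outlier
```

A single outlier shifts the least squares estimate far from the bulk of the data, while the least absolute value estimate stays near 1.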
6
NEURON-LIKE ARCHITECTURES FOR SOLVING SYSTEMS OF LINEAR EQUATIONS
Standard Least Squares Criterion:
is an n x n positive-definite matrix that is often diagonal.
The entries of the matrix depend on the time and the vector x.
7
NEURON-LIKE ARCHITECTURES FOR SOLVING SYSTEMS OF LINEAR EQUATIONS, Cont.
The basic idea is to compute a trajectory x(t) starting at the initial point x(0) that has the solution x* as a limit point.
The specific choice of the coefficients must ensure the stability of the differential equations and an appropriate convergence speed to the stationary solution (equilibrium) state.
The system of the above differential equations is stable (i.e., it always has a stable asymptotic solution) under the condition that the matrix is positive-definite for all values of x and t, and in the absence of round-off errors in the matrix A.
8
Fig. 1. Schematic architecture of an artificial neural network for solving a system of linear equations Ax = b.
9
Iteratively Reweighted Least Squares Criterion
In order to diminish the influence of the outliers we will employ the iteratively reweighted least squares criterion.
Applying the gradient approach for the minimization of the energy function, we obtain the system of differential equations:
10
Iteratively Reweighted Least Squares Criterion, Cont.
Adaptive selection of these coefficients can greatly increase the convergence rate without causing the stability problems that could arise from the use of higher (fixed) constant values.
The use of sigmoid nonlinearities in the first layer of "neurons" is essential for overdetermined linear systems of equations, since it enables us to obtain more robust solutions that are less sensitive to outliers (in comparison to the standard linear implementation), by compressing large residuals and preventing their absolute values from being greater than the prescribed cut-off parameter.
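A discrete-time sketch of the iteratively reweighted least squares idea, using Huber-style weights with cut-off parameter beta (the function name, the weighting rule, and the test data are illustrative assumptions, not the exact scheme on the slides):

```python
import numpy as np

# Iteratively reweighted least squares with Huber-style weights:
# w_i = 1 if |r_i| <= beta, else beta/|r_i|; solve the weighted LS repeatedly.
def irls_huber(A, b, beta=1.0, iters=50):
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        r = A @ x - b
        w = np.where(np.abs(r) <= beta, 1.0, beta / np.maximum(np.abs(r), 1e-12))
        x = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * b))
    return x

# Overdetermined system whose last observation is corrupted by an outlier.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
b[-1] += 50.0                        # gross outlier in the observation vector

x_irls = irls_huber(A, b, beta=1.0)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
```

Down-weighting the large residual plays the same role as the residual-compressing sigmoid nonlinearities described above.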
11
Special Cases with Simpler Architectures
An important class of Ax = b problems are the well-scaled and well-conditioned problems for which the eigenvalues of the matrix A are clustered in a set containing eigenvalues of similar magnitude. In such a case the matrix differential equation will be:
is a positive scalar coefficient and is an arbitrary n x n nonsingular matrix that should be chosen such that the matrix is a positive stable matrix.
The stable equilibrium point x* (for dx / dt = 0) does not depend on the value , and the coefficients of the matrix .
The matrix can be a diagonal matrix with entries , i.e., the set of differential equations can take the form:
is the nonlinear sigmoid function.
12
Special Cases with Simpler Architectures, Cont.
For some well-conditioned problems, instead of minimizing one global energy function E(x), it is possible to minimize simultaneously n local energy functions defined by
Applying a general gradient method for each energy function:
Fig. 2. Simplified architecture of an ANN for the solution of a system of linear equations with a diagonally dominant matrix.
13
Special Cases with Simpler Architectures, Cont.
Analogously, for the above energy function we can obtain
To find sufficient conditions for the stability of such a circuit we can use Lyapunov method:
Hence we estimate that if
Assuming, for example, that
we have
14
Special Cases with Simpler Architectures, Cont.
The above condition means that the system is stable if the matrix A is diagonally dominant.
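The diagonal dominance condition is easy to check numerically; the small helper below (hypothetical, for illustration) tests strict row diagonal dominance, |a_ii| > sum over j != i of |a_ij|:

```python
import numpy as np

# Check strict row diagonal dominance: |a_ii| > sum_{j != i} |a_ij|.
def is_diagonally_dominant(A):
    A = np.asarray(A, dtype=float)
    d = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - d    # row sums of off-diagonal magnitudes
    return bool(np.all(d > off))

A_good = np.array([[5.0, 1.0, 1.0],
                   [0.0, 4.0, 2.0],
                   [1.0, 1.0, 3.0]])       # dominant: 5>2, 4>2, 3>2
A_bad = np.array([[1.0, 2.0],
                  [3.0, 1.0]])             # not dominant: 1 < 2
```
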
We can derive a sufficient condition for the stability of the above systems:
15
Positive Connection
In some practical implementations of ANNs it is convenient to have all gains (connection weights) aij positive. This can easily be achieved by extending the system of linear equations
with different signs of the entries aij to the following form with all entries positive.
Since both of the above systems of linear equations must be equivalent with respect to the variables the following relation must be satisfied:
16
Positive Connection, Cont.
So we obtain:
From the above formula it is evident that we can always choose the auxiliary entries so that all entries will be positive.
Instead of solving the original problem, it is therefore possible to solve the problem with all the entries (connection weights) positive.
17
Improved Circuit Structures for ill-Conditioned Problems
For ill-conditioned problems the proposed schemes may be prohibitively slow; they may even fail to find an appropriate solution, or they may find a solution with a large error.
This can be explained by the fact that for an ill-conditioned problem we may obtain a system of stiff differential equations.
The system of stiff differential equations is one that is stable but exhibits a wide difference in the behavior of the individual components of the solution.
The essence of a system of stiff differential equations is that one has a very slowly varying solution (trajectory) which is such that some perturbation to it is rapidly damped.
For a linear system of differential equations this happens when the time constants of the system, i.e., the reciprocals of the eigenvalues of the matrix A are widely different.
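The "widely different time constants" can be quantified by the eigenvalue spread of A^T A, which governs the stiffness of the gradient flow; a small sketch (matrices illustrative):

```python
import numpy as np

# The stiffness of dx/dt = -mu A^T (Ax - b) is governed by the eigenvalue
# spread of A^T A (equivalently, the squared condition number of A).
def stiffness_ratio(A):
    eig = np.linalg.eigvalsh(A.T @ A)      # eigenvalues in ascending order
    return eig[-1] / eig[0]                # largest / smallest eigenvalue

A_well = np.eye(3)                         # perfectly conditioned
A_ill = np.diag([1.0, 1.0, 1e-3])          # hypothetical ill-conditioned matrix
```

A large ratio means one mode of the trajectory relaxes far faster than another, which is exactly the stiffness phenomenon described above.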
18
Augmented Lagrangian with Regularization
Motivated by the desire to alleviate the stiffness of the differential equations and simultaneously to improve the convergence properties and the accuracy of the desired networks, we will develop a new ANN architecture with improved performance.
For this purpose we construct the following energy function (augmented Lagrangian function) for the linear parameter estimation problem:
where
19
Augmented Lagrangian with Regularization, Cont.
The augmented Lagrangian is obtained from the ordinary (common) Lagrangian by adding penalty terms.
Since an augmented Lagrangian can be ill-conditioned, a regularization term with coefficient α is introduced to eliminate the instabilities associated with the penalty terms.
The problem of minimization of the above defined energy function can be transferred to the set of differential equations:
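The exact energy function of the slides is not reproduced in this transcript; a plausible minimal sketch, assuming an augmented Lagrangian of the form E(x, λ) = λ^T(Ax - b) + (γ/2)||Ax - b||^2 + (α/2)||x||^2 with gradient descent on x and gradient ascent on λ, is:

```python
import numpy as np

# Illustrative system (not from the slides).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 1.0])

gamma, alpha = 5.0, 0.01      # penalty and regularization coefficients (assumed)
mu, eta, dt = 1.0, 1.0, 0.01
x = np.zeros(2)
lam = np.zeros(2)             # Lagrange multipliers
for _ in range(20000):
    r = A @ x - b
    dx = -mu * (A.T @ lam + gamma * (A.T @ r) + alpha * x)   # primal descent
    dlam = eta * r            # dual ascent on the constraint violation
    x, lam = x + dt * dx, lam + dt * dlam
```

Note that at equilibrium dλ/dt = 0 forces Ax = b exactly, so under this assumed form the small regularization term α does not bias the stationary point, while the penalty term γ damps the oscillations of the primal-dual dynamics.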
20
Augmented Lagrangian with Regularization, Cont.
The above set of equations can be written in the compact matrix form:
Fig 3. General architecture of an ANN for matrix inversion.
21
Augmented Lagrangian with Regularization, Cont.
In comparison to the architecture given in Fig. 1, the circuit contains extra damped integrators and amplifiers (gains k_i).
The addition of these extra gains and integrators does not change the stationary point x* but, as shown by computer simulation experiments, helps to damp parasitic oscillations, improves the final accuracy, and increases the convergence speed (decreases the settling time).
Analogously to our previous considerations, auxiliary sigmoid nonlinearities can be incorporated in the first layer of computing units (i.e., adders) in order to reduce the influence of outliers.
22
Preconditioning
Preconditioning techniques form a class of linear transformations of the matrix A or the vector x that improve the eigenvalue structure of the specified energy function and alleviate the stiffness of the associated system of differential equations.
The simplest technique that enables us to incorporate preconditioning in an ANN implementation is to apply a linear transformation x = My where M is an appropriate matrix, i.e., instead of minimizing the energy function we can minimize the modified energy function:
The above problem can be solved by the simulating system of differential equations:
where is a positive scalar coefficient. Multiplying the above equation by the matrix M, we get:
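A numerical sketch of the effect of preconditioning on a badly scaled system: with M chosen so that MM^T = (diag(A^T A))^(-1) (a Jacobi-style choice assumed here for illustration), all modes of the gradient flow relax at the same rate:

```python
import numpy as np

# Badly scaled system: the plain gradient flow on E(x) = 0.5||Ax-b||^2 is stiff.
A = np.diag([1.0, 100.0])
b = np.array([1.0, 100.0])          # exact solution x* = [1, 1]

def flow(P, steps, dt):
    # Preconditioned gradient flow dx/dt = -P A^T (Ax - b), forward Euler.
    x = np.zeros(2)
    for _ in range(steps):
        x = x - dt * P @ (A.T @ (A @ x - b))
    return x

# Plain flow: dt is limited by the largest eigenvalue of A^T A (here 1e4),
# so the slow mode barely moves in 2000 steps.
x_plain = flow(np.eye(2), steps=2000, dt=1e-4)

# Preconditioned with P = M M^T = (diag(A^T A))^(-1): all modes relax equally.
P = np.diag(1.0 / np.diag(A.T @ A))
x_pre = flow(P, steps=2000, dt=0.5)
```

The preconditioned trajectory reaches x* while the plain flow is still far away, illustrating why an improved eigenvalue structure alleviates stiffness.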
23
Preconditioning, Cont.
Setting we get a system of differential equations already considered in standard least squares criterion .
Thus the realization of a suitable symmetric positive-definite matrix instead of a simple scalar enables us to perform preconditioning, which may considerably improve the convergence properties of the system.
24
Artificial Neural Network with Processing Time Independent of the Size of the Problem
The systems of differential equations considered above cause the trajectory x(t) to converge to a desired solution x* only as t → ∞, although the convergence speed can be very high.
In some real-time applications it is required to ensure that the specified energy function E(x) reaches its minimum within a prescribed finite period of time, or that E(x) comes close to the minimum within a specified error (an arbitrarily chosen, very small positive number).
In other words, we can define the reachability time as the settling time after which the energy function E(x) enters a neighborhood of the minimum and remains there ever after.
Such a problem can be solved by making the coefficients of the matrix adaptive during the minimization process, under the assumption that the initial value and the minimum (final) value E(x*) of the energy function E(x(t)) are known or can be estimated.
25
Artificial Neural Network with Processing Time Independent of the Size of the Problem, Cont.
Consider the Ax = b problem with a nonsingular matrix A, which can be mapped to the system of differential equations:
The adaptive parameter can be defined as:
For this problem
26
Hence it follows that the energy function decreases linearly in time during the minimization process:
and reaches the value (very close to the minimum) after the time
By choosing appropriately, we find that the system of the above equations reaches the equilibrium (stationary point) in the prescribed time, independent of the size of the problem. The system of differential equations can (approximately) be implemented by the ANN shown in Fig. 4, employing auxiliary analog multipliers and dividers.
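The adaptive-gain idea can be sketched as follows: choosing μ(t) = (E0/T)/||∇E||^2 makes dE/dt = -E0/T constant, so E decreases linearly and (approximately) reaches its minimum at the prescribed time T; all numerical values below are illustrative assumptions:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([2.0, 1.0])

T, dt = 1.0, 1e-4
steps = int(T / dt)
E = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2

x = np.zeros(2)
E0 = E(x)                                  # initial energy; the minimum is 0
E_half = None
for k in range(steps):
    g = A.T @ (A @ x - b)                  # gradient of E(x)
    mu = (E0 / T) / max(g @ g, 1e-12)      # adaptive gain enforcing dE/dt = -E0/T
    x = x - dt * mu * g
    if k == steps // 2:
        E_half = E(x)                      # energy at the half-way time t = T/2
E_final = E(x)
```

At t = T/2 the energy has dropped to about half of E0, and at t = T it is (up to discretization error) at its minimum, independent of the problem size.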
27
Neural Networks for Linear Programming
The ANN architectures considered in the previous sections can easily be employed for the solution of a linear programming problem which can be stated in standard form as follows:
Minimize the scalar cost function:
Subject to the linear constraints:
By use of the modified Lagrange multiplier approach we can construct the computation energy function
where the added coefficient is a regularization parameter. The problem of minimization of the energy function E(x) can be transformed into a set of differential equations:
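A minimal discrete-time sketch of the penalty-based LP network: a quadratic penalty with coefficient k stands in for the exact energy function on the slide, and clipping to x ≥ 0 plays the role of the diodes; the problem data below are illustrative assumptions:

```python
import numpy as np

# Tiny LP: minimize c^T x subject to A x = b, x >= 0.
# The optimum here is x* = [1, 0] with cost 1.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

k, mu, dt = 100.0, 1.0, 1e-3   # penalty coefficient and integration gains
x = np.zeros(2)
for _ in range(20000):
    grad = c + k * (A.T @ (A @ x - b))        # gradient of the penalized cost
    x = np.maximum(x - dt * mu * grad, 0.0)   # clipping emulates the diodes
```

The equilibrium approximates the LP optimum with an O(1/k) constraint error, which is the usual behavior of a quadratic-penalty network.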
28
Neural Networks for Linear Programming, Cont.
The integration time constants of the integrators determine the circuit dynamics. The circuit consists of adders (summing amplifiers) and integrators. Diodes
used in the feedback from the integrators ensure that the output voltages xj are non-negative (i.e., xj ≥ 0).
Regularization in this circuit is performed by using local feedback with gain around the appropriate integrators.
Fig. 4. A conceptual ANN implementation of linear programming.
29
Minimax and Least Absolute Value Problems
30
Goal
The goal is to extend the proposed class to new ANNs which are capable of finding, in real time, estimates of the solution vector x* and the residual vector r(x*) = Ax* - b for the linear model Ax ≈ b, using the minimax and least absolute value criteria.
31
Lp-NORMED MINIMIZATION
Lp-normed error function:
$$E_p(x) = \frac{1}{p}\sum_{i=1}^{m}\left|r_i(x)\right|^{p}, \qquad 1 \le p < \infty$$
where the residuals are defined as
$$r_i(x) = \sum_{j=1}^{n} a_{ij}x_j - b_i \qquad (i = 1,2,\ldots,m)$$
Steepest descent method:
$$\frac{dx_j}{dt} = -\mu_j\,\frac{\partial E_p(x)}{\partial x_j} \qquad (j = 1,2,\ldots,n)$$
Learning rate: $\mu_j > 0$.
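The Lp-norm descent can be sketched in a few lines; for p = 2 the activation g(r) = sign(r)|r|^(p-1) reduces to g(r) = r, and the flow converges to the ordinary least squares solution (the test data and gains below are illustrative):

```python
import numpy as np

# Gradient descent on E_p(x) = (1/p) sum_i |r_i(x)|^p using
# dx_j/dt = -mu sum_i a_ij g(r_i), with g(r) = sign(r)|r|^(p-1).
def lp_descent(A, b, p, mu=0.01, steps=20000):
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        r = A @ x - b
        g = np.sign(r) * np.abs(r) ** (p - 1)   # Lp activation function
        x = x - mu * (A.T @ g)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)
x_p2 = lp_descent(A, b, p=2.0)                  # p = 2: plain least squares
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
```

Values of p between 1 and 2 interpolate between the robust least absolute value criterion and the least squares criterion.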
32
Lp-NORMED MINIMIZATION
This leads to the system of differential equations
$$\frac{dx_j}{dt} = -\mu_j \sum_{i=1}^{m} a_{ij}\, g\!\left[r_i(x)\right] \qquad (j = 1,2,\ldots,n)$$
where, for p = 1, the activation function is
$$g\!\left[r_i(x)\right] = \begin{cases} \operatorname{sign}\!\left(r_i(x)\right) & \text{if } r_i(x) \neq 0 \\ 0 & \text{if } r_i(x) = 0 \end{cases}$$
33
Lp-NORMED MINIMIZATION
34
Lp-NORMED MINIMIZATION
35
Lp-NORMED MINIMIZATION
L∞-norm:
$$E_\infty(x) = \max_{1 \le i \le m}\left|r_i(x)\right|$$
with the activation function
$$g\!\left[r_i(x)\right] = \begin{cases} \operatorname{sign}\!\left(r_i(x)\right) & \text{if } \left|r_i(x)\right| = \max_{1 \le k \le m}\left\{\left|r_k(x)\right|\right\} \\ 0 & \text{otherwise} \end{cases}$$
36
Lp-NORMED MINIMIZATION
The energy functions for p = 1 and p = ∞ have discontinuous first-order partial derivatives. $E_1(x)$ is piecewise differentiable, with a possible derivative discontinuity at x if $r_i(x) = 0$ for some i; $E_\infty(x)$ has a possible derivative discontinuity at x if $|r_i(x)| = |r_k(x)| = E_\infty(x)$ for some i ≠ k. The presence of discontinuities in the derivatives is often responsible for various anomalous results. The direct implementation of these activation functions is difficult and impractical.
37
MINIMAX (L∞-Norm)
We transform the minimax problem
$$\min_{x \in \mathbb{R}^n}\ \max_{1 \le i \le m} \left|r_i(x)\right|$$
into the equivalent one: minimize ε subject to the constraints
$$\left|r_i(x)\right| \le \varepsilon, \qquad \varepsilon \ge 0$$
Thus the problem can be viewed as finding the smallest nonnegative value $\varepsilon^* = E(x^*) \ge 0$, where $x^*$ is the vector of the optimal values of the parameters.
38
NN Architecture Using Quadratic Penalty Function Terms
The constrained problem is solved by minimizing an energy function with quadratic penalty terms of the form
$$E(x, \varepsilon) = v\varepsilon + \frac{k}{2}\sum_{i=1}^{m}\left( \left[r_i(x) - \varepsilon\right]_{+}^{2} + \left[r_i(x) + \varepsilon\right]_{-}^{2} \right)$$
where $v > 0$, $k > 0$ are penalty coefficients, $[y]_{+} := \max\{0, y\}$ and $[y]_{-} := \min\{0, y\}$.
39
NN Architecture Using Quadratic Penalty Function Terms
Steepest descent method:
$$\frac{d\varepsilon}{dt} = -\mu_0\left( v - k\sum_{i=1}^{m}\left[ (r_i(x) - \varepsilon)S_{i1} - (r_i(x) + \varepsilon)S_{i2} \right] \right), \qquad \varepsilon(0) = \varepsilon^{(0)}$$
$$\frac{dx_j}{dt} = -\mu_j k \sum_{i=1}^{m} a_{ij}\left[ (r_i(x) - \varepsilon)S_{i1} + (r_i(x) + \varepsilon)S_{i2} \right], \qquad x_j(0) = x_j^{(0)} \quad (j = 1,2,\ldots,n)$$
where $S_{i1} = 1$ if $r_i(x) > \varepsilon$ (else 0), $S_{i2} = 1$ if $r_i(x) < -\varepsilon$ (else 0), and $\mu_0 > 0$, $\mu_j > 0$.
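A discrete-time sketch of the quadratic-penalty minimax dynamics (the helper name and all numerical values are assumptions for illustration): on the two-equation Chebyshev problem with residuals r = [x, x - 1] the known solution is x* = 0.5, ε* = 0.5, and the penalty method approaches it up to an O(v/k) error.

```python
import numpy as np

def minimax_penalty(A, b, v=1.0, k=100.0, mu=1e-3, steps=50000):
    x, eps = np.zeros(A.shape[1]), 0.0
    for _ in range(steps):
        r = A @ x - b
        up = np.maximum(r - eps, 0.0)   # [r_i - eps]_+ : violations above +eps
        lo = np.minimum(r + eps, 0.0)   # [r_i + eps]_- : violations below -eps
        x = x - mu * k * (A.T @ (up + lo))
        eps = eps - mu * (v - k * np.sum(up - lo))
    return x, eps

A = np.array([[1.0], [1.0]])
b = np.array([0.0, 1.0])                # Chebyshev fit of one scalar to {0, 1}
x_mm, eps_mm = minimax_penalty(A, b)
```
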
40
NN Architecture Using Quadratic Penalty Function Terms
41
NN Architecture Using Quadratic Penalty Function Terms
• The system of differential equations can be simplified by incorporating adaptive nonlinear building blocks.
42
NN Architecture Using Quadratic Penalty Function Terms
43
NN Architecture Using Exact Penalty Method
44
NN Architecture Using Exact Penalty Method
45
NN Architecture Using Exact Penalty Method
46
NN Architecture Using Exact Penalty Method
47
NN Architecture Using Exact Penalty Method
48
NN Architecture Using Exact Penalty Method
Modifying minimax problem:
set of new equations:
49
NN Architecture Using Exact Penalty Method
50
NN Architecture Using Exact Penalty Method
One advantage of the proposed circuit is that it does not require the use of precision signum activation functions and absolute value function generators.
51
LEAST ABSOLUTE VALUES (L1-NORM)
Find the design vector that minimizes the energy function
52
Neural Network Model by Using the Inhibition Principle
The function of the inhibition subnetwork is to suppress some signals while allowing the other signals to be transmitted for further processing.
Theorem: there is a minimizer of the energy function for which the residuals vanish for at least n values of i,
where n denotes the rank of the matrix A.
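This interpolation property suggests a brute-force check: for a small full-rank system, an L1 minimizer can be found by solving every n-equation subsystem exactly and keeping the one with the smallest L1 residual; the sketch below (illustrative data) also verifies that at least n residuals vanish at the minimizer:

```python
import numpy as np
from itertools import combinations

# For an overdetermined full-rank system, some L1 minimizer drives at least
# n of the m residuals to zero, so it can be found by enumerating n-subsets.
def l1_by_enumeration(A, b):
    m, n = A.shape
    best_x, best_cost = None, np.inf
    for idx in combinations(range(m), n):
        sub = A[list(idx)]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue                        # skip singular subsystems
        x = np.linalg.solve(sub, b[list(idx)])
        cost = np.sum(np.abs(A @ x - b))    # L1 objective at this vertex
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 2))
b = rng.standard_normal(8)
x_l1, cost = l1_by_enumeration(A, b)
n_zero = int(np.sum(np.abs(A @ x_l1 - b) < 1e-9))
```

Enumeration is exponential in n and only practical for tiny systems, which is precisely why the analog inhibition network is interesting.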
53
Neural Network Model by Using the Inhibition Principle
54
Simplified NN for Solving Linear Least Squares and Total Least Squares Problems
55
Objective
analog circuit design of a neural network for implementing such adaptive algorithms
propose some extensions and modifications of the existing adaptive algorithms
demonstrate the validity and high performance of the proposed neural network models by computer simulation experiments
56
Problem Formulation
In the least squares (LS) approach, the matrix A is assumed to be free from error and all errors are confined to the observation vector b.
Definition of a cost (error function) E(x)
57
Problem Formulation
By using a standard gradient approach for the minimization of the cost function the problem can be mapped to the system of linear differential equations
It requires extra precalculations and is inconvenient for large matrices, especially when the entries aij and/or bi are time-variable.
58
Motivation
The ordinary LS problem is optimal only if all errors are confined to the observation vector b and they have a Gaussian distribution.
The measurements in the data matrix A are assumed to be free from errors. However, such an assumption is often unrealistic (e.g., in image recognition and computer vision), since sampling errors, modeling errors, and instrument errors may imply noise inaccuracies of the data matrix A.
The total least squares problem (TLS) has been devised as a more global and often more reliable fitting method than the standard LS problem for solving an overdetermined set of linear equations when the measurement in b as well as in A are subject to errors
59
A Simplified Neuron For The Least Squares Problem
In the design of an algorithm for neural networks the key step is to construct an appropriate cost (computational energy) function E(x) so that the lowest energy state will correspond to the desired solution x*
The formulation of the cost function enables us to transform the minimization problem into a system of differential equations on the basis of which we design an appropriate neural network with associated learning algorithm.
For our purpose we have developed the following instantaneous error function
60
A Simplified Neuron For The Least Squares Problem
The actual error e(t) can be written as
61
A Simplified Neuron For The Least Squares Problem
For the so-formulated error e(t) we can construct the instantaneous estimate of the energy (cost) function at time t as
The minimization of the cost (computational energy) function leads to the set of differential equations
62
A Simplified Neuron For The Least Squares Problem
The system of the above differential equations can be written in the compact matrix form
The system of these differential equations constitutes the basic adaptive learning algorithm of a single artificial neuron (processing unit)
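A discrete-time sketch of the single-neuron idea, assuming independent random (Rademacher) source signals s_i so that E[s_i s_k] = δ_ik; the scalar error e = Σ s_i r_i then yields an unbiased estimate of the least squares gradient (all values are illustrative, not the analog circuit of the paper):

```python
import numpy as np

# Single-neuron adaptive learning: at each instant all residuals are mixed
# into one scalar error e = sum_i s_i r_i by random source signals s_i, and
# the update descends e^2; its expectation equals the LS gradient A^T r.
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 2))
x_true = np.array([0.7, -1.2])
b = A @ x_true                             # consistent system, for clarity

x = np.zeros(2)
mu = 0.01
for _ in range(20000):
    s = rng.choice([-1.0, 1.0], size=10)   # Rademacher source signals
    e = s @ (A @ x - b)                    # scalar instantaneous error
    x = x - mu * e * (A.T @ s)             # learning step of the single neuron
```

Because the gradient noise vanishes as the residuals go to zero, the iterate converges to the exact solution on a consistent system.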
63
A Simplified Neuron For The Least Squares Problem
64
Loss Functions
There are many possible loss functions p(e) which can be employed as the cost function
The absolute value function, Huber's function, Talwar's function, and the logistic function.
65
Standard Regularized Least Squares LS Problem
Find the vector x*LS which minimizes the cost function
The minimization of the cost function according to the gradient descent rule leads to the learning algorithm
66
Neural Network Implementations
67
About Implementation
The network consists of analog integrators, summers, and analog multipliers.
The network is driven by the independent source signals si(t) multiplied by the incoming data aij, bi (i = 1, 2, . . . , m; j = 1, 2, . . . , n).
The artificial neuron (processing unit) with an on-chip adaptive learning algorithm shown in the figure allows processing of the input information (contained in the available input data aij, bi) fully simultaneously, i.e., all m equations are acted upon simultaneously in time.
This is the important feature of the proposed neural network.
68
Adaptive Learning Algorithms for the TLS problem
For the TLS problem formulated in previous, we can construct the instantaneous energy function
69
Adaptive Learning Algorithms for the TLS problem
The above set of differential equations constitutes a basic adaptive parallel learning algorithm for solving the TLS problem for overdetermined linear systems.
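Assuming the TLS cost takes the Rayleigh-quotient form E(x) = ||Ax - b||^2 / (1 + ||x||^2) (a standard formulation; the slide equations are not reproduced in this transcript), gradient descent on it recovers the classical SVD-based TLS solution:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 2))
x_true = np.array([1.0, 2.0])
b = A @ x_true + 0.01 * rng.standard_normal(30)   # mildly noisy observations

# Gradient descent on E(x) = ||Ax - b||^2 / (1 + ||x||^2).
x = np.zeros(2)
mu = 0.002
for _ in range(50000):
    r = A @ x - b
    d = 1.0 + x @ x
    grad = (2.0 / d) * (A.T @ r) - (2.0 * (r @ r) / d ** 2) * x
    x = x - mu * grad

# Closed-form TLS solution from the SVD of the augmented matrix [A | b]:
# take the right singular vector of the smallest singular value.
V = np.linalg.svd(np.hstack([A, b[:, None]]))[2]
v = V[-1]
x_svd = -v[:2] / v[2]
```
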
70
Analog (continuous-time) implementation of the algorithm
71
Extensions And Generalizations Of Neural Network Models
It is interesting that the neural network models shown in the previous figures can be employed not only to solve LS or TLS problems but can easily be modified and/or extended to related problems.
By changing the value of the parameter β, more or less emphasis can be given to errors of the matrix A with respect to errors of the vector b.
For large β (say β = 100) it can be assumed that the vector b is almost free of error and the error lies in the data matrix A only.
Such a case is referred to as the so called DLS (data least squares) problem (since the error occurs in A but not in b)
The DLS problem can be solved by simulating the system of differential equations
72
Extensions And Generalizations
For complex-valued elements (signals) the algorithm can further be generalized as
β = 0 for the LS-problem β = 1 for the TLS problem β >> 1 for the DLS problem
73
Computer Simulation Result (LS)
Example 1: Consider the problem of finding the minimal L2-norm solution of the underdetermined system of linear equations.
The above set of equations has infinitely many solutions. There is a unique minimum norm solution which we want to find. The final solution (equilibrium point) was
x* = [0.0882, 0.1083, 0.2733, 0.5047, 0.3828, -0.3097]^T, which is in excellent agreement with the exact minimum L2-norm solution obtained by using MATLAB.
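The notion of a unique minimum L2-norm solution of an underdetermined system can be illustrated with the pseudoinverse (the data below are random illustrative values, not the system of Example 1):

```python
import numpy as np

# Underdetermined system (more unknowns than equations): infinitely many
# solutions exist; the pseudoinverse selects the unique minimum L2-norm one.
rng = np.random.default_rng(5)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

x_min = np.linalg.pinv(A) @ b                # minimum L2-norm solution
N = np.eye(6) - np.linalg.pinv(A) @ A        # projector onto the null space of A
x_other = x_min + N @ np.ones(6)             # another exact solution, larger norm
```

Any null-space component added to x_min gives another exact solution but can only increase the norm, since x_min is orthogonal to the null space.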
74
Computer Simulation Result
Example 2: Let us consider the following linear parameter estimation problem described by the set of linear equations
75
Simulation Results(LS, TLS, DLS)
Time: less than 400 ns
76
Simulation Results (MINIMAX, Least Absolute)
• MINIMAX problem
Theoretical solution:
Last proposed NN:
Time: 300 ns
77
Simulation Results (MINIMAX, Least Absolute)
Least Absolute Value:
Theoretical solution:
First proposed NN:
solution:
Time: 60 ns
Last proposed NN:
solution in first phase:
solution in second phase:
Time:100ns
78
Simulation Results (Iteratively reweighted LS, …)
Iteratively reweighted least squares criterion
for
Standard Least Square:
79
Simulation Results (MINIMAX, Least Absolute)
• Last NN:
• First NN:
• Example 3:
80
Simulation Results (Iteratively reweighted, …)
Iteratively reweighted least squares criterion: Time = 750 ns
Augmented Lagrangian with regularization: Time = 52 ns
ANN providing a linearly decreasing energy function in time with a prescribed speed of convergence: Time = 10 ns
81
Simulation Results (Iteratively reweighted, …)
Example 4:
inverse of the matrix
In order to find the inverse matrix we need to set the source vector b successively to [1, 0, 0]^T, [0, 1, 0]^T, [0, 0, 1]^T.
Time=50ns
82
Conclusion
very simple and low-cost analog neural networks for solving least squares and TLS problems
using only one single highly simplified artificial neuron with an on chip learning capability
able to estimate the unknown parameters in real time (hundreds or thousands of nanoseconds)
suitable for currently available VLSI implementations
attractive for real-time and/or high-throughput-rate applications when the observation vector and the model matrix are changing in time
83
Conclusion
universal and flexible: allows either processing of all equations fully simultaneously, or processing of groups of equations (i.e., blocks) in iterative steps
also allows processing of only one equation per block, i.e., in each iterative step only one single equation is processed