1
Neural Networks for Solving Systems of Linear Equations; Minimax, Least Absolute Value and Least Square Problems in Real Time
Presented by :
Yasaman Farahani
Maryam Khordad
Leila Pakravan Nejad
Andrzej Cichocki
2
Introduction
Solving systems of linear equations is considered to be one of the basic problems widely encountered in science and engineering since it is frequently used in many applications.
Every linear parameter estimation problem gives rise to a set of linear equations Ax=b.
This problem arises in a broad class of scientific disciplines such as signal processing, robotics, automatic control, system theory, statistics, and physics.
In many applications a real time solution of a set of linear equations ( or,equivalently, an online inversion of matrices) is desired.
We employ artificial neural networks (ANNs), which can be considered specialized analog computers relying on strongly simplified models of neurons.
New theoretical results and advances in VLSI technology have made it possible to fabricate microelectronic networks of high complexity.
3
Formulation of the Basic Problem
Consider the linear parameter estimation model :
It is desired to find in real time a solution x* if an exact (error-free) solution exists at all, or otherwise an approximate solution that comes as close as possible to the true solution (the best estimate of the solution vector x*).
The key step is to construct an appropriate energy function (Lyapunov function) E(x) so that the lowest energy state corresponds to the desired solution x*.
The derivation of the energy function transforms the minimization problem into a set of ordinary differential or difference equations realized by ANN architectures with appropriate synaptic weights, input excitations, and nonlinear activation functions.
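The mapping from energy function to differential equations can be sketched numerically. Below is a minimal simulation of the gradient flow dx/dt = -mu A^T (Ax - b) for the energy E(x) = 0.5 ||Ax - b||^2, integrated by forward Euler; the matrix, vector, and gains are illustrative choices, not values from the slides.

```python
import numpy as np

# Hypothetical small system Ax = b (values chosen for illustration only).
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Gradient flow dx/dt = -mu * A^T (Ax - b), integrated with forward Euler.
mu, dt = 1.0, 0.01
x = np.zeros(2)
for _ in range(5000):
    r = A @ x - b                  # residual vector r(x) = Ax - b
    x = x - dt * mu * (A.T @ r)    # descend the energy E(x) = 0.5*||r||^2

x_exact = np.linalg.solve(A, b)
```

The trajectory x(t) converges to the solution x* because E(x) decreases monotonically along it, which is the Lyapunov argument sketched on the next slides.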
4
Find the vector that minimizes the energy function:
The following cases have special importance:
5
The proper choice of the criterion used depends on the specific applications and greatly on the distribution of the errors in the measurement vector b.
The standard least squares criterion is optimal for a Gaussian distribution of the noise; however, this assumption is frequently unrealistic due to different sources of errors such as instrument errors, modeling errors, sampling errors, and human errors.
In order to reduce the influence of outliers (large errors), the more robust iteratively reweighted least squares technique can be used. In the presence of outliers an alternative approach is to use the least absolute value criterion.
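The sensitivity of the least squares criterion to outliers, versus the robustness of the least absolute value criterion, can be seen in the simplest one-dimensional location model, where the two criteria reduce to the mean and the median respectively (the data values below are illustrative):

```python
import numpy as np

# One-dimensional location model: minimize sum_i |x - b_i|^p over scalar x.
# For p = 2 the optimum is the mean; for p = 1 it is the median.
b = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])  # last entry is a gross outlier

x_ls = b.mean()        # least squares estimate, dragged away by the outlier
x_lav = np.median(b)   # least absolute value estimate, robust to the outlier
```

A single outlier shifts the least squares estimate far from the bulk of the data, while the least absolute value estimate stays near 1.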
6
NEURON-LIKE ARCHITECTURES FOR SOLVING SYSTEMS OF LINEAR EQUATIONS
Standard Least Squares Criterion:
is an n x n positive-definite matrix that is often diagonal.
The entries of the matrix depend on the time and the vector x.
7
NEURON-LIKE ARCHITECTURES FOR SOLVING SYSTEMS OF LINEAR EQUATIONS, Cont.
The basic idea is to compute a trajectory x(t) starting at the initial point x(0) that has the solution x* as a limit point.
The specific choice of the coefficients must ensure the stability of the differential equations and an appropriate convergence speed to the stationary solution (equilibrium) state.
The system of the above differential equations is stable (i.e., it always has a stable asymptotic solution) under the condition that the matrix is positive-definite for all values of x and t, and in the absence of round-off errors in the matrix A.
8
Fig. 1. Schematic architecture of an artificial neural network for solving a system of linear equations Ax = b.
9
Iteratively Reweighted Least Squares Criterion
In order to diminish the influence of the outliers we will employ the iteratively reweighted least squares criterion.
Applying the gradient approach for the minimization of the energy function, we obtain the system of differential equations:
10
Iteratively Reweighted Least Squares Criterion, Cont.
Adaptive selection of these coefficients can greatly increase the convergence rate without causing the stability problems that could arise from the use of higher (fixed) constant values.
The use of sigmoid nonlinearities in the first layer of "neurons" is essential for overdetermined linear systems of equations, since it enables us to obtain more robust solutions that are less sensitive to outliers (in comparison to the standard linear implementation), by compressing large residuals and preventing their absolute values from being greater than the prescribed cut-off parameter.
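A discrete-time sketch of the iteratively reweighted least squares idea, using Huber-style weights with cut-off parameter beta (the function name, the weighting rule, and the test data are illustrative assumptions, not the exact scheme on the slides):

```python
import numpy as np

# Iteratively reweighted least squares with Huber-style weights:
# w_i = 1 if |r_i| <= beta, else beta/|r_i|; solve the weighted LS repeatedly.
def irls_huber(A, b, beta=1.0, iters=50):
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        r = A @ x - b
        w = np.where(np.abs(r) <= beta, 1.0, beta / np.maximum(np.abs(r), 1e-12))
        x = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * b))
    return x

# Overdetermined system whose last observation is corrupted by an outlier.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
b[-1] += 50.0                        # gross outlier in the observation vector

x_irls = irls_huber(A, b, beta=1.0)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
```

Down-weighting the large residual plays the same role as the residual-compressing sigmoid nonlinearities described above.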
11
Special Cases with Simpler Architectures
An important class of Ax = b problems are the well-scaled and well-conditioned problems for which the eigenvalues of the matrix A are clustered in a set containing eigenvalues of similar magnitude. In such a case the matrix differential equation will be:
is a positive scalar coefficient and is an arbitrary n x n nonsingular matrix that should be chosen such that the matrix is a positive stable matrix.
The stable equilibrium point x* (for dx / dt = 0) does not depend on the value , and the coefficients of the matrix .
The matrix can be a diagonal matrix with entries , i.e., the set of differential equations can take the form:
is the nonlinear sigmoid function.
12
Special Cases with Simpler Architectures, Cont.
For some well-conditioned problems, instead of minimizing one global energy function E(x), it is possible to minimize simultaneously n local energy functions defined by
Applying a general gradient method for each energy function:
Fig. 2. Simplified architecture of an ANN for the solution of a system of linear equations with a diagonally dominant matrix.
13
Special Cases with Simpler Architectures, Cont.
Analogously, for the above energy function we can obtain
To find sufficient conditions for the stability of such a circuit we can use Lyapunov method:
Hence we estimate that if
Assuming, for example, that
we have
14
Special Cases with Simpler Architectures, Cont.
The above condition means that the system is stable if the matrix A is diagonally dominant.
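The diagonal dominance condition is easy to check numerically; the small helper below (hypothetical, for illustration) tests strict row diagonal dominance, |a_ii| > sum over j != i of |a_ij|:

```python
import numpy as np

# Check strict row diagonal dominance: |a_ii| > sum_{j != i} |a_ij|.
def is_diagonally_dominant(A):
    A = np.asarray(A, dtype=float)
    d = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - d    # row sums of off-diagonal magnitudes
    return bool(np.all(d > off))

A_good = np.array([[5.0, 1.0, 1.0],
                   [0.0, 4.0, 2.0],
                   [1.0, 1.0, 3.0]])       # dominant: 5>2, 4>2, 3>2
A_bad = np.array([[1.0, 2.0],
                  [3.0, 1.0]])             # not dominant: 1 < 2
```
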
We can derive a sufficient condition for the stability of the above systems:
15
Positive Connection
In some practical implementations of ANNs it is convenient to have all gains (connection weights) aij positive. This can easily be achieved by extending the system of linear equations
with different signs of the entries aij to the following form with all entries positive.
Since both of the above systems of linear equations must be equivalent with respect to the variables the following relation must be satisfied:
16
Positive Connection, Cont.
So we obtain:
From the above formula it is evident that we can always choose the auxiliary entries so that all entries will be positive.
Instead of solving the original problem, it is therefore possible to solve the problem with all the entries (connection weights) positive.
17
Improved Circuit Structures for ill-Conditioned Problems
For ill-conditioned problems the proposed schemes may be prohibitively slow; they may even fail to find an appropriate solution, or they may find a solution with a large error.
This can be explained by the fact that for an ill-conditioned problem we may obtain a system of stiff differential equations.
The system of stiff differential equations is one that is stable but exhibits a wide difference in the behavior of the individual components of the solution.
The essence of a system of stiff differential equations is that one has a very slowly varying solution (trajectory) which is such that some perturbation to it is rapidly damped.
For a linear system of differential equations this happens when the time constants of the system, i.e., the reciprocals of the eigenvalues of the matrix A are widely different.
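The "widely different time constants" can be quantified by the eigenvalue spread of A^T A, which governs the stiffness of the gradient flow; a small sketch (matrices illustrative):

```python
import numpy as np

# The stiffness of dx/dt = -mu A^T (Ax - b) is governed by the eigenvalue
# spread of A^T A (equivalently, the squared condition number of A).
def stiffness_ratio(A):
    eig = np.linalg.eigvalsh(A.T @ A)      # eigenvalues in ascending order
    return eig[-1] / eig[0]                # largest / smallest eigenvalue

A_well = np.eye(3)                         # perfectly conditioned
A_ill = np.diag([1.0, 1.0, 1e-3])          # hypothetical ill-conditioned matrix
```

A large ratio means one mode of the trajectory relaxes far faster than another, which is exactly the stiffness phenomenon described above.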
18
Augmented Lagrangian with Regularization
Motivated by the desire to alleviate the stiffness of the differential equations and simultaneously to improve the convergence properties and the accuracy of the desired networks, we will develop a new ANN architecture with improved performance.
For this purpose we construct the following energy function (augmented Lagrangian function) for the linear parameter estimation problem:
where
19
Augmented Lagrangian with Regularization, Cont.
The augmented Lagrangian is obtained from the ordinary (common) Lagrangian by adding penalty terms.
Since an augmented Lagrangian can be ill-conditioned, a regularization term with coefficient α is introduced to eliminate the instabilities associated with the penalty terms.
The problem of minimization of the above defined energy function can be transferred to the set of differential equations:
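The exact energy function of the slides is not reproduced in this transcript; a plausible minimal sketch, assuming an augmented Lagrangian of the form E(x, λ) = λ^T(Ax - b) + (γ/2)||Ax - b||^2 + (α/2)||x||^2 with gradient descent on x and gradient ascent on λ, is:

```python
import numpy as np

# Illustrative system (not from the slides).
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 1.0])

gamma, alpha = 5.0, 0.01      # penalty and regularization coefficients (assumed)
mu, eta, dt = 1.0, 1.0, 0.01
x = np.zeros(2)
lam = np.zeros(2)             # Lagrange multipliers
for _ in range(20000):
    r = A @ x - b
    dx = -mu * (A.T @ lam + gamma * (A.T @ r) + alpha * x)   # primal descent
    dlam = eta * r            # dual ascent on the constraint violation
    x, lam = x + dt * dx, lam + dt * dlam
```

Note that at equilibrium dλ/dt = 0 forces Ax = b exactly, so under this assumed form the small regularization term α does not bias the stationary point, while the penalty term γ damps the oscillations of the primal-dual dynamics.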
20
Augmented Lagrangian with Regularization, Cont.
The above set of equations can be written in the compact matrix form:
Fig 3. General architecture of an ANN for matrix inversion.
21
Augmented Lagrangian with Regularization, Cont.
In comparison to the architecture given in Fig. 1, the circuit contains extra damped integrators and amplifiers (gains k_i).
The addition of these extra gains and integrators does not change the stationary point x* but, as shown by computer simulation experiments, helps to damp parasitic oscillations, improves the final accuracy, and increases the convergence speed (decreases the settling time).
Analogously to our previous considerations, auxiliary sigmoid nonlinearities can be incorporated in the first layer of computing units (i.e., adders) in order to reduce the influence of outliers.
22
Preconditioning
Preconditioning techniques form a class of linear transformations of the matrix A or the vector x that improve the eigenvalue structure of the specified energy function and alleviate the stiffness of the associated system of differential equations.
The simplest technique that enables us to incorporate preconditioning in an ANN implementation is to apply a linear transformation x = My where M is an appropriate matrix, i.e., instead of minimizing the energy function we can minimize the modified energy function:
The above problem can be solved by the simulating system of differential equations:
where is a positive scalar coefficient. Multiplying the above equation by the matrix M, we get:
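A numerical sketch of the effect of preconditioning on a badly scaled system: with M chosen so that MM^T = (diag(A^T A))^(-1) (a Jacobi-style choice assumed here for illustration), all modes of the gradient flow relax at the same rate:

```python
import numpy as np

# Badly scaled system: the plain gradient flow on E(x) = 0.5||Ax-b||^2 is stiff.
A = np.diag([1.0, 100.0])
b = np.array([1.0, 100.0])          # exact solution x* = [1, 1]

def flow(P, steps, dt):
    # Preconditioned gradient flow dx/dt = -P A^T (Ax - b), forward Euler.
    x = np.zeros(2)
    for _ in range(steps):
        x = x - dt * P @ (A.T @ (A @ x - b))
    return x

# Plain flow: dt is limited by the largest eigenvalue of A^T A (here 1e4),
# so the slow mode barely moves in 2000 steps.
x_plain = flow(np.eye(2), steps=2000, dt=1e-4)

# Preconditioned with P = M M^T = (diag(A^T A))^(-1): all modes relax equally.
P = np.diag(1.0 / np.diag(A.T @ A))
x_pre = flow(P, steps=2000, dt=0.5)
```

The preconditioned trajectory reaches x* while the plain flow is still far away, illustrating why an improved eigenvalue structure alleviates stiffness.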
23
Preconditioning, Cont.
Setting we get a system of differential equations already considered in standard least squares criterion .
Thus the realization of a suitable symmetric positive-definite matrix instead of a simple scalar enables us to perform preconditioning, which may considerably improve the convergence properties of the system.
24
Artificial Neural Network with Processing Time Independent of the Size of the Problem
The systems of differential equations considered above cause the trajectory x(t) to converge to a desired solution x* only as t → ∞, although the convergence speed can be very high.
In some real-time applications it is required to ensure that the specified energy function E(x) reaches its minimum within a prescribed finite period of time, or that E(x) comes close to the minimum within a specified error (an arbitrarily chosen, very small positive number).
In other words, we can define the reachability time as the settling time after which the energy function E(x) enters a neighborhood of the minimum and remains there ever after.
Such a problem can be solved by making the coefficients of the matrix adaptive during the minimization process, under the assumption that the initial value and the minimum (final) value E(x*) of the energy function E(x(t)) are known or can be estimated.
25
Artificial Neural Network with Processing Time Independent of the Size of the Problem, Cont.
Consider the Ax = b problem with a nonsingular matrix A, which can be mapped to the system of differential equations:
The adaptive parameter can be defined as:
For this problem
26
Hence it follows that the energy function decreases linearly in time during the minimization process:
and reaches the value (very close to the minimum) after the time
By choosing appropriately, we find that the system of the above equations reaches the equilibrium (stationary point) in the prescribed time, independent of the size of the problem. The system of differential equations can (approximately) be implemented by the ANN shown in Fig. 4, employing auxiliary analog multipliers and dividers.
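The adaptive-gain idea can be sketched as follows: choosing μ(t) = (E0/T)/||∇E||^2 makes dE/dt = -E0/T constant, so E decreases linearly and (approximately) reaches its minimum at the prescribed time T; all numerical values below are illustrative assumptions:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([2.0, 1.0])

T, dt = 1.0, 1e-4
steps = int(T / dt)
E = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2

x = np.zeros(2)
E0 = E(x)                                  # initial energy; the minimum is 0
E_half = None
for k in range(steps):
    g = A.T @ (A @ x - b)                  # gradient of E(x)
    mu = (E0 / T) / max(g @ g, 1e-12)      # adaptive gain enforcing dE/dt = -E0/T
    x = x - dt * mu * g
    if k == steps // 2:
        E_half = E(x)                      # energy at the half-way time t = T/2
E_final = E(x)
```

At t = T/2 the energy has dropped to about half of E0, and at t = T it is (up to discretization error) at its minimum, independent of the problem size.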
27
Neural Networks for Linear Programming
The ANN architectures considered in the previous sections can easily be employed for the solution of a linear programming problem which can be stated in standard form as follows:
Minimize the scalar cost function:
Subject to the linear constraints:
By use of the modified Lagrange multiplier approach we can construct the computation energy function
where the added coefficient is a regularization parameter. The problem of minimization of the energy function E(x) can be transformed into a set of differential equations:
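A minimal discrete-time sketch of the penalty-based LP network: a quadratic penalty with coefficient k stands in for the exact energy function on the slide, and clipping to x ≥ 0 plays the role of the diodes; the problem data below are illustrative assumptions:

```python
import numpy as np

# Tiny LP: minimize c^T x subject to A x = b, x >= 0.
# The optimum here is x* = [1, 0] with cost 1.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

k, mu, dt = 100.0, 1.0, 1e-3   # penalty coefficient and integration gains
x = np.zeros(2)
for _ in range(20000):
    grad = c + k * (A.T @ (A @ x - b))        # gradient of the penalized cost
    x = np.maximum(x - dt * mu * grad, 0.0)   # clipping emulates the diodes
```

The equilibrium approximates the LP optimum with an O(1/k) constraint error, which is the usual behavior of a quadratic-penalty network.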
28
Neural Networks for Linear Programming, Cont.
The integration time constants of the integrators determine the circuit dynamics. The circuit consists of adders (summing amplifiers) and integrators. Diodes
used in the feedback from the integrators ensure that the output voltages xj are non-negative (i.e., xj ≥ 0).
Regularization in this circuit is performed by using local feedback with gain around the appropriate integrators.
Fig. 4. A conceptual ANN implementation of linear programming.
29
Minimax and Least Absolute Value Problems
30
Goal
The goal is to extend the proposed class to new ANNs which are capable of finding, in real time, estimates of the solution vector x* and the residual vector r(x*) = Ax* - b for the linear model Ax ≈ b, using the minimax and least absolute value criteria.
31
Lp-NORMED MINIMIZATION
Lp-normed error function:
$$E_p(x) = \frac{1}{p}\sum_{i=1}^{m}\left|r_i(x)\right|^{p}, \qquad 1 \le p < \infty$$
where the residuals are defined as
$$r_i(x) = \sum_{j=1}^{n} a_{ij}x_j - b_i \qquad (i = 1,2,\ldots,m)$$
Steepest descent method:
$$\frac{dx_j}{dt} = -\mu_j\,\frac{\partial E_p(x)}{\partial x_j} \qquad (j = 1,2,\ldots,n)$$
Learning rate: $\mu_j > 0$.
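The Lp-norm descent can be sketched in a few lines; for p = 2 the activation g(r) = sign(r)|r|^(p-1) reduces to g(r) = r, and the flow converges to the ordinary least squares solution (the test data and gains below are illustrative):

```python
import numpy as np

# Gradient descent on E_p(x) = (1/p) sum_i |r_i(x)|^p using
# dx_j/dt = -mu sum_i a_ij g(r_i), with g(r) = sign(r)|r|^(p-1).
def lp_descent(A, b, p, mu=0.01, steps=20000):
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        r = A @ x - b
        g = np.sign(r) * np.abs(r) ** (p - 1)   # Lp activation function
        x = x - mu * (A.T @ g)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 3))
b = rng.standard_normal(20)
x_p2 = lp_descent(A, b, p=2.0)                  # p = 2: plain least squares
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
```

Values of p between 1 and 2 interpolate between the robust least absolute value criterion and the least squares criterion.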
32
Lp-NORMED MINIMIZATION
This leads to the system of differential equations
$$\frac{dx_j}{dt} = -\mu_j \sum_{i=1}^{m} a_{ij}\, g\!\left[r_i(x)\right] \qquad (j = 1,2,\ldots,n)$$
where, for p = 1, the activation function is
$$g\!\left[r_i(x)\right] = \begin{cases} \operatorname{sign}\!\left(r_i(x)\right) & \text{if } r_i(x) \neq 0 \\ 0 & \text{if } r_i(x) = 0 \end{cases}$$
33
Lp-NORMED MINIMIZATION
34
Lp-NORMED MINIMIZATION
35
Lp-NORMED MINIMIZATION
L∞-norm:
$$E_\infty(x) = \max_{1 \le i \le m}\left|r_i(x)\right|$$
with the activation function
$$g\!\left[r_i(x)\right] = \begin{cases} \operatorname{sign}\!\left(r_i(x)\right) & \text{if } \left|r_i(x)\right| = \max_{1 \le k \le m}\left\{\left|r_k(x)\right|\right\} \\ 0 & \text{otherwise} \end{cases}$$
36
Lp-NORMED MINIMIZATION
The energy functions for p = 1 and p = ∞ have discontinuous first-order partial derivatives. $E_1(x)$ is piecewise differentiable, with a possible derivative discontinuity at x if $r_i(x) = 0$ for some i; $E_\infty(x)$ has a possible derivative discontinuity at x if $|r_i(x)| = |r_k(x)| = E_\infty(x)$ for some i ≠ k. The presence of discontinuities in the derivatives is often responsible for various anomalous results. The direct implementation of these activation functions is difficult and impractical.
37
MINIMAX (L∞-Norm)
We transform the minimax problem
$$\min_{x \in \mathbb{R}^n}\ \max_{1 \le i \le m} \left|r_i(x)\right|$$
into the equivalent one: minimize ε subject to the constraints
$$\left|r_i(x)\right| \le \varepsilon, \qquad \varepsilon \ge 0$$
Thus the problem can be viewed as finding the smallest nonnegative value $\varepsilon^* = E(x^*) \ge 0$, where $x^*$ is the vector of the optimal values of the parameters.
38
NN Architecture Using Quadratic Penalty Function Terms
The constrained problem is solved by minimizing an energy function with quadratic penalty terms of the form
$$E(x, \varepsilon) = v\varepsilon + \frac{k}{2}\sum_{i=1}^{m}\left( \left[r_i(x) - \varepsilon\right]_{+}^{2} + \left[r_i(x) + \varepsilon\right]_{-}^{2} \right)$$
where $v > 0$, $k > 0$ are penalty coefficients, $[y]_{+} := \max\{0, y\}$ and $[y]_{-} := \min\{0, y\}$.
39
NN Architecture Using Quadratic Penalty Function Terms
Steepest descent method:
$$\frac{d\varepsilon}{dt} = -\mu_0\left( v - k\sum_{i=1}^{m}\left[ (r_i(x) - \varepsilon)S_{i1} - (r_i(x) + \varepsilon)S_{i2} \right] \right), \qquad \varepsilon(0) = \varepsilon^{(0)}$$
$$\frac{dx_j}{dt} = -\mu_j k \sum_{i=1}^{m} a_{ij}\left[ (r_i(x) - \varepsilon)S_{i1} + (r_i(x) + \varepsilon)S_{i2} \right], \qquad x_j(0) = x_j^{(0)} \quad (j = 1,2,\ldots,n)$$
where $S_{i1} = 1$ if $r_i(x) > \varepsilon$ (else 0), $S_{i2} = 1$ if $r_i(x) < -\varepsilon$ (else 0), and $\mu_0 > 0$, $\mu_j > 0$.
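A discrete-time sketch of the quadratic-penalty minimax dynamics (the helper name and all numerical values are assumptions for illustration): on the two-equation Chebyshev problem with residuals r = [x, x - 1] the known solution is x* = 0.5, ε* = 0.5, and the penalty method approaches it up to an O(v/k) error.

```python
import numpy as np

def minimax_penalty(A, b, v=1.0, k=100.0, mu=1e-3, steps=50000):
    x, eps = np.zeros(A.shape[1]), 0.0
    for _ in range(steps):
        r = A @ x - b
        up = np.maximum(r - eps, 0.0)   # [r_i - eps]_+ : violations above +eps
        lo = np.minimum(r + eps, 0.0)   # [r_i + eps]_- : violations below -eps
        x = x - mu * k * (A.T @ (up + lo))
        eps = eps - mu * (v - k * np.sum(up - lo))
    return x, eps

A = np.array([[1.0], [1.0]])
b = np.array([0.0, 1.0])                # Chebyshev fit of one scalar to {0, 1}
x_mm, eps_mm = minimax_penalty(A, b)
```
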
40
NN Architecture Using Quadratic Penalty Function Terms
41
NN Architecture Using Quadratic Penalty Function Terms
• The system of differential equations can be simplified by incorporating adaptive nonlinear building blocks.
42
NN Architecture Using Quadratic Penalty Function Terms
43
NN Architecture Using Exact Penalty Method
44
NN Architecture Using Exact Penalty Method
45
NN Architecture Using Exact Penalty Method
46
NN Architecture Using Exact Penalty Method
47
NN Architecture Using Exact Penalty Method
48
NN Architecture Using Exact Penalty Method
Modifying minimax problem:
set of new equations:
49
NN Architecture Using Exact Penalty Method
50
NN Architecture Using Exact Penalty Method
One advantage of the proposed circuit is that it does not require the use of precision signum activation functions and absolute value function generators.
51
LEAST ABSOLUTE VALUES (L1-NORM)
Find the design vector that minimizes the energy function
52
Neural Network Model by Using the Inhibition Principle
The function of the inhibition subnetwork is to suppress some signals while allowing the other signals to be transmitted for further processing.
Theorem: there is a minimizer of the energy function for which the residuals vanish for at least n values of i,
where n denotes the rank of the matrix A.
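This interpolation property suggests a brute-force check: for a small full-rank system, an L1 minimizer can be found by solving every n-equation subsystem exactly and keeping the one with the smallest L1 residual; the sketch below (illustrative data) also verifies that at least n residuals vanish at the minimizer:

```python
import numpy as np
from itertools import combinations

# For an overdetermined full-rank system, some L1 minimizer drives at least
# n of the m residuals to zero, so it can be found by enumerating n-subsets.
def l1_by_enumeration(A, b):
    m, n = A.shape
    best_x, best_cost = None, np.inf
    for idx in combinations(range(m), n):
        sub = A[list(idx)]
        if abs(np.linalg.det(sub)) < 1e-12:
            continue                        # skip singular subsystems
        x = np.linalg.solve(sub, b[list(idx)])
        cost = np.sum(np.abs(A @ x - b))    # L1 objective at this vertex
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x, best_cost

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 2))
b = rng.standard_normal(8)
x_l1, cost = l1_by_enumeration(A, b)
n_zero = int(np.sum(np.abs(A @ x_l1 - b) < 1e-9))
```

Enumeration is exponential in n and only practical for tiny systems, which is precisely why the analog inhibition network is interesting.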
53
Neural Network Model by Using the Inhibition Principle
54
Simplified NN for Solving Linear Least Squares and Total Least Squares Problems
55
Objective
analog circuit design of a neural network for implementing such adaptive algorithms
propose some extensions and modifications of the existing adaptive algorithms
demonstrate the validity and high performance of the proposed neural network models by computer simulation experiments
56
Problem Formulation
In the least squares (LS) approach, the matrix A is assumed to be free from error and all errors are confined to the observation vector b.
Definition of a cost (error function) E(x)
57
Problem Formulation
By using a standard gradient approach for the minimization of the cost function the problem can be mapped to the system of linear differential equations
It requires extra precalculations and is inconvenient for large matrices, especially when the entries aij and/or bi are time-variable.
58
Motivation
The ordinary LS problem is optimal only if all errors are confined to the observation vector b and they have a Gaussian distribution.
The measurements in the data matrix A are assumed to be free from errors. However, such an assumption is often unrealistic (e.g., in image recognition and computer vision), since sampling errors, modeling errors, and instrument errors may imply noise inaccuracies of the data matrix A.
The total least squares problem (TLS) has been devised as a more global and often more reliable fitting method than the standard LS problem for solving an overdetermined set of linear equations when the measurement in b as well as in A are subject to errors
59
A Simplified Neuron For The Least Squares Problem
In the design of an algorithm for neural networks the key step is to construct an appropriate cost (computational energy) function E(x) so that the lowest energy state will correspond to the desired solution x*
The formulation of the cost function enables us to transform the minimization problem into a system of differential equations on the basis of which we design an appropriate neural network with associated learning algorithm.
For our purpose we have developed the following instantaneous error function
60
A Simplified Neuron For The Least Squares Problem
The actual error e(t) can be written as
61
A Simplified Neuron For The Least Squares Problem
For the so-formulated error e(t) we can construct the instantaneous estimate of the energy (cost) function at time t as
The minimization of the cost (computational energy) function leads to the set of differential equations
62
A Simplified Neuron For The Least Squares Problem
The system of the above differential equations can be written in the compact matrix form
The system of these differential equations constitutes the basic adaptive learning algorithm of a single artificial neuron (processing unit)
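A discrete-time sketch of the single-neuron idea, assuming independent random (Rademacher) source signals s_i so that E[s_i s_k] = δ_ik; the scalar error e = Σ s_i r_i then yields an unbiased estimate of the least squares gradient (all values are illustrative, not the analog circuit of the paper):

```python
import numpy as np

# Single-neuron adaptive learning: at each instant all residuals are mixed
# into one scalar error e = sum_i s_i r_i by random source signals s_i, and
# the update descends e^2; its expectation equals the LS gradient A^T r.
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 2))
x_true = np.array([0.7, -1.2])
b = A @ x_true                             # consistent system, for clarity

x = np.zeros(2)
mu = 0.01
for _ in range(20000):
    s = rng.choice([-1.0, 1.0], size=10)   # Rademacher source signals
    e = s @ (A @ x - b)                    # scalar instantaneous error
    x = x - mu * e * (A.T @ s)             # learning step of the single neuron
```

Because the gradient noise vanishes as the residuals go to zero, the iterate converges to the exact solution on a consistent system.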
63
A Simplified Neuron For The Least Squares Problem
64
Loss Functions
There are many possible loss functions p(e) which can be employed as the cost function
The absolute value function, Huber's function, Talwar's function, and the logistic function.
65
Standard Regularized Least Squares LS Problem
Find the vector x*LS which minimizes the cost function
The minimization of the cost function according to the gradient descent rule leads to the learning algorithm
66
Neural Network Implementations
67
About Implementation
The network consists of analog integrators, summers, and analog multipliers.
The network is driven by the independent source signals si(t) multiplied by the incoming data aij, bi (i = 1, 2, . . . , m; j = 1, 2, . . . , n).
The artificial neuron (processing unit) with an on-chip adaptive learning algorithm shown in the figure allows processing of the input information (contained in the available input data aij, bi) fully simultaneously, i.e., all m equations are acted upon simultaneously in time.
This is the important feature of the proposed neural network.
68
Adaptive Learning Algorithms for the TLS problem
For the TLS problem formulated in previous, we can construct the instantaneous energy function
69
Adaptive Learning Algorithms for the TLS problem
The above set of differential equations constitutes a basic adaptive parallel learning algorithm for solving the TLS problem for overdetermined linear systems.
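Assuming the TLS cost takes the Rayleigh-quotient form E(x) = ||Ax - b||^2 / (1 + ||x||^2) (a standard formulation; the slide equations are not reproduced in this transcript), gradient descent on it recovers the classical SVD-based TLS solution:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((30, 2))
x_true = np.array([1.0, 2.0])
b = A @ x_true + 0.01 * rng.standard_normal(30)   # mildly noisy observations

# Gradient descent on E(x) = ||Ax - b||^2 / (1 + ||x||^2).
x = np.zeros(2)
mu = 0.002
for _ in range(50000):
    r = A @ x - b
    d = 1.0 + x @ x
    grad = (2.0 / d) * (A.T @ r) - (2.0 * (r @ r) / d ** 2) * x
    x = x - mu * grad

# Closed-form TLS solution from the SVD of the augmented matrix [A | b]:
# take the right singular vector of the smallest singular value.
V = np.linalg.svd(np.hstack([A, b[:, None]]))[2]
v = V[-1]
x_svd = -v[:2] / v[2]
```
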
70
Analog (continuous-time) implementation of the algorithm
71
Extensions And Generalizations Of Neural Network Models
It is interesting that the neural network models shown in the previous figures can be employed not only to solve LS or TLS problems but can easily be modified and/or extended to related problems.
By changing the value of the parameter β, more or less emphasis can be given to errors of the matrix A with respect to errors of the vector b.
For large β (say β = 100) it can be assumed that the vector b is almost free of error and the error lies in the data matrix A only.
Such a case is referred to as the so called DLS (data least squares) problem (since the error occurs in A but not in b)
The DLS problem can be solved by simulating the system of differential equations
72
Extensions And Generalizations
For complex-valued elements (signals) the algorithm can further be generalized as
β = 0 for the LS-problem β = 1 for the TLS problem β >> 1 for the DLS problem
73
Computer Simulation Result (LS)
Example 1: Consider the problem of finding the minimal L2-norm solution of the underdetermined system of linear equations.
The above set of equations has infinitely many solutions. There is a unique minimum norm solution which we want to find. The final solution (equilibrium point) was
x* = [0.0882, 0.1083, 0.2733, 0.5047, 0.3828, -0.3097]^T, which is in excellent agreement with the exact minimum L2-norm solution obtained by using MATLAB.
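The notion of a unique minimum L2-norm solution of an underdetermined system can be illustrated with the pseudoinverse (the data below are random illustrative values, not the system of Example 1):

```python
import numpy as np

# Underdetermined system (more unknowns than equations): infinitely many
# solutions exist; the pseudoinverse selects the unique minimum L2-norm one.
rng = np.random.default_rng(5)
A = rng.standard_normal((3, 6))
b = rng.standard_normal(3)

x_min = np.linalg.pinv(A) @ b                # minimum L2-norm solution
N = np.eye(6) - np.linalg.pinv(A) @ A        # projector onto the null space of A
x_other = x_min + N @ np.ones(6)             # another exact solution, larger norm
```

Any null-space component added to x_min gives another exact solution but can only increase the norm, since x_min is orthogonal to the null space.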
74
Computer Simulation Result
Example 2: Let us consider the following linear parameter estimation problem described by the set of linear equations
75
Simulation Results(LS, TLS, DLS)
Time: less than 400 ns
76
Simulation Results (MINIMAX, Least Absolute)
• MINIMAX problem
Theoretical solution:
Last proposed NN:
Time: 300 ns
77
Simulation Results (MINIMAX, Least Absolute)
Least Absolute Value:
Theoretical solution:
First proposed NN:
solution:
Time: 60 ns
Last proposed NN:
solution in first phase:
solution in second phase:
Time:100ns
78
Simulation Results (Iteratively reweighted LS, …)
Iteratively reweighted least squares criterion
for
Standard Least Square:
79
Simulation Results (MINIMAX, Least Absolute)
• Last NN:
• First NN:
• Example 3:
80
Simulation Results (Iteratively reweighted, …)
Iteratively reweighted least squares criterion: Time = 750 ns
Augmented Lagrangian with regularization: Time = 52 ns
ANN providing a linearly decreasing energy function in time with a prescribed speed of convergence: Time = 10 ns
81
Simulation Results (Iteratively reweighted, …)
Example 4:
inverse of the matrix
In order to find the inverse matrix we need to set the source vector b successively to [1, 0, 0]^T, [0, 1, 0]^T, [0, 0, 1]^T.
Time=50ns
82
Conclusion
very simple and low-cost analog neural networks for solving least squares and TLS problems
using only one single highly simplified artificial neuron with an on chip learning capability
able to estimate the unknown parameters in real time (hundreds or thousands of nanoseconds)
suitable for currently available VLSI implementations
attractive for real-time and/or high-throughput-rate applications when the observation vector and the model matrix are changing in time
83
Conclusion
universal and flexible: allows either processing of all equations fully simultaneously, or processing of groups of equations (i.e., blocks) in iterative steps
also allows processing of only one equation per block, i.e., in each iterative step only one single equation is processed