ℓp-Norm Minimization Method for Solving Nonlinear Systems of Equations

ANDRÉ A. KELLER
Laboratoire d'Informatique Fondamentale de Lille / Section SMAC (UMR 8022 CNRS)
Université de Lille 1 Sciences et Technologies
Cité Scientifique, 59100 Lille
FRANCE

[email protected]

Abstract: Solving a nonlinear system of equations can also be cast as an optimization problem, and this equivalence can be interpreted as an optimal design problem: we must determine the design variables that reduce the deviation between an actual vector of function values and a vector of desired constants. This study focuses on these two characteristic features for solving nonlinear systems of equations. A variety of numerical two-dimensional systems with multiple solutions demonstrates the effectiveness of the transformation process.

Key-Words: Nonlinear system, least-squares problem, regularized least-squares, ℓp-norm, Newton's method

1 Introduction

Solving nonlinear systems of equations and optimization problems are major challenges in practical engineering (see [1, 2]). Indeed, in an optimal design problem (see [3]-[7]), the quality of the design is measured by a norm of the deviation between the actual vector of function values of the design variables, f(x), and a vector of desired constants b, i.e., f(x) ≈ b. The goal is to choose a vector of design variables x such that the deviation is minimized.

Suppose we have to minimize the scalar multivariate function f(x). A related problem is that of solving the system g(x) = 0, where g is the gradient vector of f with respect to x, i.e., g ≡ ∂f/∂x. Numerical techniques (e.g., Newton's method, Newton-Raphson, quasi-Newton algorithms) usually solve these problems by iteration (see [3]-[6]). Let G(x) be the matrix of partial derivatives of g(x) with respect to x. Using the root of the linear Taylor expansion about a current value x(k) at iteration k, we get the new approximation x(k+1) = x(k) − G(x(k))⁻¹ g(x(k)). To ensure that G is positive definite for a minimization problem, we solve x(k+1) = x(k) − (G(x(k)) + λk I)⁻¹ g(x(k)), where λk is sufficiently large so that f(x(k+1)) < f(x(k)) (see also [7]).
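For illustration, a minimal Python/NumPy sketch of such a damped step is given below; the function names, the initial value of the damping parameter and its update factor are assumptions made for this example, not part of the methods cited above.

import numpy as np

def damped_newton_step(x, f, grad, hess, lam=1e-3, factor=10.0, max_tries=30):
    # One step x_new = x - (G + lam*I)^(-1) g, with lam increased until
    # the objective value decreases (a minimal sketch).
    g, G = grad(x), hess(x)
    for _ in range(max_tries):
        try:
            x_new = x - np.linalg.solve(G + lam * np.eye(len(x)), g)
        except np.linalg.LinAlgError:
            lam *= factor
            continue
        if f(x_new) < f(x):
            return x_new, lam          # accept the step
        lam *= factor                  # otherwise increase the damping
    return x, lam                      # no improving step found

# Illustrative use on f(x) = x1^4 + x2^2:
f = lambda x: x[0]**4 + x[1]**2
grad = lambda x: np.array([4*x[0]**3, 2*x[1]])
hess = lambda x: np.array([[12*x[0]**2, 0.0], [0.0, 2.0]])
x = np.array([1.0, 1.0])
for _ in range(20):
    x, _ = damped_newton_step(x, f, grad, hess)   # approaches the minimizer (0, 0)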

The vector function g might not be the gradient of some objective function f. In that case, a relation can be stated between the solution of a system of nonlinear equations g(x) = 0, where g : Rn → Rn, and the minimization of the scalar function f(x) = g(x)ᵀg(x), for which the functional value at the minimum solution is zero. In that case, the Newton step is replaced by the line search strategy¹, i.e., x(k+1) = x(k) − αk G(x(k))⁻¹ g(x(k)), where αk ∈ (0, 1).

More generally, consider the nonlinear system f : Rn → Rm, (m > n). Since it is not possible to find an exact solution for that overdetermined system, one possibility is to seek a best least-squares solution, i.e., to find x̂ such that ∥f(x)∥₂² is minimized [8]. Least-squares approaches trace back to estimation methods for astronomical observation data. They were initially proposed by P.S. Laplace and L. Euler in the 1750s, by A.M. Legendre in 1805 and by C.F. Gauss in 1809 to estimate the orbits of planets (see [8]). Aaron [9] in 1956 used least-squares in the design of physical systems. Gradient methods such as steepest descent (suggested by Hadamard in 1907) have been used to solve systems of algebraic equations and least-squares problems (see [10]). In 1949, Booth [11] applied steepest descent to the solution of nonlinear systems of equations.

The generalization of least-squares to pth least approximation (p > 2) is presented by Tomes [12], with application to electric circuits and systems. In 1972, Bandler and Charalambous [13, 14] applied least pth approximation to design problems and demonstrated higher performance.

This application-oriented article focuses on the resolution of nonlinear systems of equations using pth least approximations (p ≥ 1)².

¹ Recall that a suboptimization is implemented at each iteration step, minimizing f(x(k+1)) with respect to αk [7].

² In the heat transfer process of [15], the resulting nonlinear system is solved using the Taylor linear approximation-based Newton method, for which the implementation of numerical algorithms is presented.


Let a system of nonlinear differentiable equations be

fi(x) = 0,  i = 1, . . . , m,   (1)

where x ∈ Rn, n < m. Forming the semidefinite functional

J(x) = ∑_{i=1}^{m} fi(x)²,

we seek the minimum of J(x). If some minimum x̄ satisfies J(x̄) = 0, then x = x̄ is a solution of (1). If J(x) > 0 for every x, then the minimizers x̄ are the least-squares solutions of (1). The interest of such optimization approaches is shown for different examples in which multiple global minima exist. In such cases of equivalent global optimization problems, evolutionary algorithms may be used, as in [16, 17].

This article is organized as follows. Section 1 presents the historical context of this study and defines the objective. Section 2 introduces the basic elements of least-squares and pth least approximations. Section 3 solves different examples of increasing difficulty. Section 4 treats cases in which regularized systems are considered, owing to the need to keep the parameters from becoming too large. The concluding remarks are in Section 5. Finally, nonlinear systems and optimization problems are compared through more numerical examples in the appendix.

2 ℓp-Norm Approximation Method

The distance approach refers to an approximation problem for which different norms may be chosen, such as the ℓp norms. A standard Newton's method is used to find a solution locally for smooth objective functions.

2.1 Basics, Notation and Definitions

The ℓp-norm of the real vector x ∈ Rn is defined as

∥x∥p ≜ ( ∑_{i=1}^{n} |xi|^p )^{1/p}

for p ∈ N+. The usual ℓp-norms are: the ℓ1-norm (also called the "Manhattan" or "city block" norm) for p = 1, the Euclidean ℓ2-norm (also called the "root energy" or "least-squares" norm) for p = 2, and the Chebyshev ℓ∞-norm (also called the "infinity", "uniform" or "supremum" norm) for p → ∞. Figure 1 (a)-(b) pictures the unit balls centered at zero, B1(∥·∥) = {x : ∥x∥ = 1}, in the 3D and 2D spaces respectively.
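For concreteness, the definition can be evaluated directly; the short Python/NumPy sketch below (the function name and the test vector are illustrative choices) also exhibits the norm ordering recalled below.

import numpy as np

def lp_norm(x, p):
    # (sum_i |x_i|^p)^(1/p); p = np.inf returns the Chebyshev (supremum) norm.
    x = np.asarray(x, dtype=float)
    if np.isinf(p):
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = [3.0, -4.0, 1.0]
print(lp_norm(x, 1), lp_norm(x, 2), lp_norm(x, np.inf))   # 8.0, ~5.099, 4.0
# Consistent with ||x||_1 >= ||x||_2 >= ||x||_inf.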


Figure 1: (a) 3D picture of unit balls in the ℓp norm in R³ for p ∈ {1, 2, ∞}; (b) 2D picture of unit balls in the ℓp norm in R² for p ∈ {1, 2, ∞}.

An ℓp-norm of a vector with an even p is a differentiable function of its components. The function corresponding to the infinity norm ℓ∞ is not differentiable. It can be shown that [18]

∥x∥1 ≥ ∥x∥2 ≥ ∥x∥∞.

Let V be a vector space and ⟨·, ·⟩ an inner product on V. We define ∥v∥ = √⟨v, v⟩. By the properties of an inner product, a norm satisfies
(i) ∥v∥ ≥ 0, with ∥v∥ = 0 if and only if v = 0,
(ii) ∥αv∥ = |α|·∥v∥ for α ∈ R, v ∈ V, and
(iii) the triangle inequality (subadditivity) ∥v + w∥ ≤ ∥v∥ + ∥w∥, for v, w ∈ V.

The basic norm approximation problem for an overdetermined system is [3, 19]

minimize_x ∥Ax − b∥2,


where A ∈ Rm×n with m ≫ n, b ∈ Rm, and x ∈ Rn. The norm ∥·∥2 is on Rm. A solution is an approximate solution of Ax ≈ b in the norm ∥·∥. The vector r = Ax − b is the residual for the problem. According to Boyd and Vandenberghe [3, pp. 291-292], four interpretations are possible for this problem: an interpretation in terms of regression, an interpretation in terms of parameter estimation, a geometric interpretation, and an optimal design interpretation. Thus, in the regression problem, we can consider that the norm (i.e., a deviation measure) aims at approximating the vector b by a linear combination

Ax = ∑_{j=1}^{n} aj xj

of the columns aj of A. In the design interpretation, the xi's are the design variables to be determined. The goal is to choose a vector of design variables that approximates the target results b, i.e., Ax ≈ b. The residual vector r = Ax − b expresses the deviation between actual and target results.

2.2 Least-Squares and Minimax Approximation Problems

We present two of the four possible interpretations: the regression problem and the optimal design problem. The norms used for this presentation are the Euclidean and the Chebyshev norms, for which we give the formulation in terms of an approximation problem [20].

• Least-squares approximation problem

By using the common Euclidean norm ℓ2, we obtain the following approximation problem

minimize_x f(x) ≡ ∥Ax − b∥₂²,

where the objective f(x) is the convex quadratic function

f(x) = xᵀAᵀAx − 2bᵀAx + bᵀb.

From the optimality conditions ∇f(x) = 0, we deduce that a solution point minimizes f(x) if and only if it satisfies the normal equations

AᵀAx = Aᵀb.

Assuming independent columns for A, the unique solution is

x = (AᵀA)⁻¹Aᵀb.
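A small numerical check of the normal equations is sketched below in Python/NumPy; the matrix A and vector b are fabricated for illustration only.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))            # m >> n with independent columns
b = rng.normal(size=50)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # solve A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # library least-squares solver
print(np.allclose(x_normal, x_lstsq))             # True (up to round-off)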

• Minimax approximation problem

By using the Chebyshev norm ℓ∞, the approximation problem is

minimize_x ∥Ax − b∥∞.

We may also write

minimize_x max{ |A1x − b1|, . . . , |Amx − bm| },

where the Ai's are the rows of A. Let α ∈ R be a bound such that |Aix − bi| ≤ α for all i = 1, . . . , m. Letting em = (1, 1, . . . , 1)ᵀ be a vector of m ones, the condition is also written |Ax − b| ≤ α em. Therefore, an equivalent linear programming problem for the minimax problem is

minimize α
subject to |Ax − b| ≤ α em.
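This linear program can be passed to any LP solver. The Python sketch below uses scipy.optimize.linprog with the stacked constraints Ax − αem ≤ b and −Ax − αem ≤ −b; the data A and b are fabricated for illustration.

import numpy as np
from scipy.optimize import linprog

def minimax_fit(A, b):
    # Solve min_x ||Ax - b||_inf as:  min alpha  s.t.  |Ax - b| <= alpha * e_m.
    m, n = A.shape
    c = np.zeros(n + 1)
    c[-1] = 1.0                                   # objective: alpha
    ones = np.ones((m, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),      #  Ax - alpha*e <= b
                      np.hstack([-A, -ones])])    # -Ax - alpha*e <= -b
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
    return res.x[:n], res.x[-1]

A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])
x_cheb, alpha = minimax_fit(A, b)                 # alpha is the minimal Chebyshev error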

2.3 Nonlinear ℓp-Norm Approximation Problem

We introduce the norm equivalence of quadratic objective functions and propose a formulation for generalized nonlinear systems.

• Quadratic objective function norm equivalence

Let the problem be

minimize_x xᵀQx,   (2)

where Q is an n × n symmetric positive semidefinite matrix. Then there is a matrix³ H such that Q = HHᵀ, H ∈ Rn×p, p ≪ n. The reformulation of (2) is

minimize_x ∥Hᵀx∥2,   (3)

for which the storage and evaluation costs are more attractive, with n × p entries for the reformulation (3) against (1/2)n² in the former formulation (2). In covariance models, we assume the generalization Q = D + HHᵀ, where D is a diagonal positive semidefinite matrix [21].
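The equivalence between (2) and (3) is easy to check numerically; in the Python/NumPy sketch below H is taken as a Cholesky factor (see footnote 3) of a fabricated positive definite matrix Q.

import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
Q = M @ M.T                               # symmetric positive definite
H = np.linalg.cholesky(Q)                 # Q = H H^T
x = rng.normal(size=4)
print(np.isclose(x @ Q @ x, np.linalg.norm(H.T @ x) ** 2))   # True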

• Generalized nonlinear approximation problem

The nonlinear n-dimensional system f(x) = 0, where x ∈ Rn and f : Rn → Rn, has a solution when the scalar function g(x) = f(x)ᵀf(x) attains its minimum value of zero. Let the fi(x), i = 1, . . . , n, be continuous component functions on the domain x ∈ X ⊆ Rn; we wish to determine the solution x̄ ∈ X from an initial approximation x0. The first-order Taylor series approximation of the vector function f(x) is

f(x) = f(x0) + J(x0)∆x +O2(∆x),

where J is the n × n Jacobian matrix, ∆x = x − x0 denotes a correction vector, and O2(∆x) is a negligible remainder of higher-order terms.

³ The matrix H may be the Cholesky factor.


Example 1 Let the vector function be

f(x) = ( 6 + 20x1 + 4x1² + (1/5)x2² ,  10 + 2x1 − 4x2 + (1/4)x1x2² )ᵀ.

The linear approximation about one of the four solutions of f(x) = 0 (see Table 1), e.g., x̄4 = (−0.3767, 2.1979), is

f(x) ≈ ( 4.4663 + 16.9865x1 + 0.8792x2 ,  10.9099 + 3.2077x1 − 4.4140x2 )ᵀ.

For Example 1, the scalar function g(x) = f(x)ᵀf(x) is pictured in Figure 2 (b).
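The coefficients of this linearization can be reproduced numerically; the Python/NumPy sketch below rewrites f(x0) + J(x0)(x − x0) as c + J(x0)x with c = f(x0) − J(x0)x0, and its output agrees with the coefficients quoted above up to the rounding of x̄4.

import numpy as np

def f(x):
    x1, x2 = x
    return np.array([6 + 20*x1 + 4*x1**2 + x2**2/5,
                     10 + 2*x1 - 4*x2 + x1*x2**2/4])

def jac(x):
    x1, x2 = x
    return np.array([[20 + 8*x1,        (2/5)*x2],
                     [2 + x2**2/4, -4 + x1*x2/2]])

x0 = np.array([-0.3767, 2.1979])
J = jac(x0)
c = f(x0) - J @ x0
print(np.round(c, 3))   # approx [ 4.466  10.910]
print(np.round(J, 3))   # approx [[16.986  0.879] [ 3.208 -4.414]]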

Figure 2: (a) Four minimum solutions of the system f(x) = 0 in Example 1; (b) nine stationary points from the system ∇g(x) = 0 in Example 1.

The gradient ∇g(x) is given by the two components

∂g(x)/∂x1 = 280 + 904x1 − 16x2 + 480x1² + 13x2² + 64x1³ − 2x2³ + (26/5)x1x2² + (1/8)x1x2⁴,

∂g(x)/∂x2 = −80 − 16x1 + (184/5)x2 + 26x1x2 + (4/25)x2³ − 6x1x2² + (26/5)x1²x2 + (1/4)x1²x2³.

The four minimum solutions are presented in Table 1.

Table 1: Solutions for f(x) = 0 in Example 1

 #    x1         x2         f(x)
 1    −4.6792     0.1535    4.5×10⁻⁷
 2    −4.5090    −3.7790    2.1×10⁻⁷
 3    −1.9653    −9.4489    1.5×10⁻⁷
 4    −0.3767     2.1979    2.1×10⁻¹⁰

Figure 2 (a) pictures the contour lines g(x) = C together with the four minimum solutions. The solutions are obtained by solving the system {f1(x) = 0, f2(x) = 0}. The stationary points solve the system of gradients

∇1g(x) = 0,  ∇2g(x) = 0,

where ∇ig(x) = ∂g(x)/∂xi, i = 1, 2 (see Figure 2 (b)). ⊓⊔

The quadratic approximation of the function g(x) is

g(x) ≈ f(x0)ᵀf(x0) + 2 f(x0)ᵀJ(x0)∆x + (∆x)ᵀ J(x0)ᵀJ(x0) ∆x.

A point x minimizes g if and only if x satisfies the normal equations

J(x0)ᵀJ(x0) ∆x = −J(x0)ᵀ f(x0).

Using Newton's method, a step of the algorithm is (e.g., [4, 5])

x(k+1) = x(k) − αk ( J(x(k))ᵀ J(x(k)) )⁻¹ J(x(k))ᵀ f(x(k)),

where αk ∈ (0, 1) is a damping parameter. The steps of Newton's approximation method in Table 2 are pictured in Figure 3. The exact values at iteration 20 are (a) 1.7×10⁻⁷, (b) 2.1×10⁻⁵, and (c) −4.6×10⁻⁵.
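A compact implementation of this damped step for Example 1 is sketched below in Python/NumPy; the backtracking rule used here to pick αk and the stopping test are illustrative choices, with the starting point taken from Table 2.

import numpy as np

def f(x):
    x1, x2 = x
    return np.array([6 + 20*x1 + 4*x1**2 + x2**2/5,
                     10 + 2*x1 - 4*x2 + x1*x2**2/4])

def jac(x):
    x1, x2 = x
    return np.array([[20 + 8*x1,        (2/5)*x2],
                     [2 + x2**2/4, -4 + x1*x2/2]])

x = np.array([2.0, -4.0])                       # starting point of Table 2
for k in range(100):
    J, r = jac(x), f(x)
    dx = np.linalg.solve(J.T @ J, -J.T @ r)     # Gauss-Newton direction
    alpha = 1.0
    while alpha > 1e-8 and np.sum(f(x + alpha*dx)**2) >= np.sum(r**2):
        alpha *= 0.5                            # backtrack until g decreases
    x = x + alpha * dx
    if np.linalg.norm(f(x)) < 1e-10:
        break
print(x)   # expected to approach a root of Table 1, e.g. (-0.3767, 2.1979)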


Table 2: Iteration steps of Newton's method applied to Example 1

 k    x1(k)    x2(k)    f(x(k))    x1(k) − x̄1    x2(k) − x̄2
 0    2        −4       5695.0     2.377          −6.198
 1    1.373    −3.309   2750.0     1.750          −5.507
 2    0.889    −2.499   1331.6     1.266          −4.696
 3    0.527    −1.546   636.7      0.904          −3.744
 4    0.256    −0.491   286.6      0.633          −2.688
 5    0.045     0.529   112.1      0.422          −1.669
 …    …        …        …          …              …
10    −0.356    2.150   0.173      0.0206         −0.0480
 …    …        …        …          …              …
20    −0.377    2.198   0 (a)      0 (b)          −0 (c)

Figure 3: ℓ2-norm of f(x) in Example 1 using Newton's method

3 Numerical Examples

Numerical examples⁴ consist of two-dimensional and higher-dimensional systems. The two-dimensional examples are a polynomial system with few solutions and a trigonometric polynomial system with many more real solutions. The common norms are compared. A three-dimensional example is reduced to a set of three two-dimensional subsystems by eliminating one of the variables.

⁴ A collection of real-world nonlinear problems is presented in [22], e.g., an n-stage distillation column, the Bratu problem for nonlinear diffusion phenomena in combustion and semiconductors, the Chandrasekhar H-equation in radiative transfer problems, and the elastohydrodynamic lubrication problem.

3.1 Two-Dimensional Systems

Example 2 Let the residual vector be

r(x) = ( 25 − x1² − x2² ,  5 + x1 − x2² )ᵀ.

The two residual component equations r(x) = 0 yield three solution points at x̄1 = (−5, 0)ᵀ, x̄2 = (4, 3)ᵀ and x̄3 = (4, −3)ᵀ. The ℓ2-norm approximation ∥r(x)∥2 is pictured in Figure 4 together with the three global minimum points, for which the function values are zero. ⊓⊔
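Because the optimization problem has several zero-residual global minima, a local solver must be restarted from several points; the Python/SciPy sketch below uses scipy.optimize.least_squares with a small illustrative grid of starting points.

import numpy as np
from scipy.optimize import least_squares

def r(x):
    return np.array([25 - x[0]**2 - x[1]**2,
                     5 + x[0] - x[1]**2])

roots = set()
for s1 in (-6.0, -2.0, 2.0, 6.0):
    for s2 in (-4.0, 0.0, 4.0):
        sol = least_squares(r, x0=[s1, s2])
        if sol.cost < 1e-12:                           # zero-residual minimum found
            roots.add(tuple(float(v) for v in np.round(sol.x, 4)))
print(roots)   # expected to contain (-5, 0), (4, 3) and (4, -3)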

Figure 4: Polynomial Example 2 with the ℓp-norm approximation and the three solutions

Example 3 Let the residual vector be

r(x) = ( −3 + 4 cos(x1) + 2 cos(x2) ,  −1 + 2 sin(x1) + sin(x2) )ᵀ.

The two residual components in Figure 5 (a)-(b) show 18 solution points at the coordinates (a1 ± j2π, a2 ± j2π) and (b1 ± j2π, b2 ± j2π) for j = 0, 1, where a = (0.0658, 2.0894)ᵀ and b = (1.1102, −0.9134)ᵀ. The ℓ2-norm approximation ∥r(x)∥2 is pictured in Figure 5 (b) together with the 18 global minimum points, for which the function values are zero. ⊓⊔


Figure 5: (a) 3D picture of the ℓ2-norm approximation for the trigonometric polynomial system in Example 3; (b) contours and the 18 minimum solution points.

Example 4 Let the residual vector be

r(x) = ( x1 − sin(x1 + 2x2) − cos(2x1 − 3x2) ,  x2 − sin(4x1 − 3x2) + cos(x1 + 2x2) )ᵀ.

The two residual component equations r(x) = 0 yield seven solution points at the coordinates given in Table 3. The ℓ2-norm approximation ∥r(x)∥2 is pictured in Figure 6 (b) together with the seven global minimum points, for which the function values are zero. ⊓⊔

Table 3: Solutions for r(x) = 0 in Example 4

 #    x1         x2         f(x)
 1    −1.8446    −0.1325    2.1×10⁻⁸
 2    −0.9409    −0.0903    4.6×10⁻⁸
 3    −0.0874     1.1303    5.3×10⁻⁸
 4     0.0523     1.7879    1.4×10⁻⁸
 5     0.4254    −0.1431    8.6×10⁻⁸
 6     0.7231     0.9369    7.7×10⁻⁸
 7     1.3013     0.6190    2.1×10⁻⁸

Figure 6: (a) 3D picture of the ℓ2-norm approximation for the trigonometric Example 4; (b) contours and the seven minimum solution points.


3.2 Comparison of Usual Norms

Example 5 Let the residual vector be

r(x) = ( 1 − 4x1 + 2x1² − 2x2³ ,  −4 + 4x2 + x1⁴ + 4x2⁴ )ᵀ.

An approximation problem for the three common norms ℓ1, ℓ2, and ℓ∞ is pictured in Figure 7 (a) to (c). The first, Manhattan norm in Figure 7 (a), is defined as

∥r(x)∥1 = |r1(x)| + |r2(x)|.

The second, Euclidean norm in Figure 7 (b), is defined as

∥r(x)∥2 = ( |r1(x)|² + |r2(x)|² )^{1/2}

and is pictured with the two contour lines r1(x) = 0 and r2(x) = 0. The third, Chebyshev norm in Figure 7 (c), is defined as

∥r(x)∥∞ = max{ |r1(x)|, |r2(x)| }

and is pictured with the exclusion lines along which the derivatives of the function are discontinuous.

Figure 7: Example 5 with the usual norm approximations: (a) Manhattan norm; (b) Euclidean norm; (c) Chebyshev norm

The functions resulting from the ℓ1 and ℓ∞ norms are nonsmooth. Nonderivative optimization methods must then be used (see [23] on optimization and nonsmooth analysis and [24] on the finite difference approximation of sparse Jacobian matrices in Newton methods).
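As an illustration of such a derivative-free treatment, the sketch below minimizes the ℓ1 and ℓ∞ objectives of Example 5 with the Nelder-Mead simplex method of SciPy; the method choice and the starting point are illustrative assumptions, not those of the references.

import numpy as np
from scipy.optimize import minimize

def r(x):
    return np.array([1 - 4*x[0] + 2*x[0]**2 - 2*x[1]**3,
                     -4 + 4*x[1] + x[0]**4 + 4*x[1]**4])

obj_l1  = lambda x: np.sum(np.abs(r(x)))      # Manhattan norm of the residual
obj_inf = lambda x: np.max(np.abs(r(x)))      # Chebyshev norm of the residual

x0 = np.array([0.5, 0.5])
sol_l1  = minimize(obj_l1,  x0, method="Nelder-Mead")
sol_inf = minimize(obj_inf, x0, method="Nelder-Mead")
print(sol_l1.x,  sol_l1.fun)     # a near-zero value indicates a root of r(x) = 0
print(sol_inf.x, sol_inf.fun)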

3.3 Higher Dimensional Systems

A two-dimensional subsystem can be obtained from a higher dimensional system by eliminating one or more of its variables. This procedure is illustrated by the following three-dimensional system, for which three two-dimensional subsystems are deduced.

Example 6 Let the residual vector be

r(x) = ( −2 + x1 − x2 + x3 ,  x1x2x3 ,  −1 + 2x2 + x3 )ᵀ.


• By eliminating x3, we obtain the two-dimensional system

r(x1, x2) = ( −1 + x1 − 3x2 ,  −x2 − x2² + 6x2³ )ᵀ.

• By eliminating x2, we obtain the two-dimensional system

r(x1, x3) = ( −5 + 2x1 + 3x3 ,  5x3 − 8x3² + 3x3³ )ᵀ.

• By eliminating x1, we obtain the two-dimensional system

r(x2, x3) = ( −1 + 2x2 + x3 ,  5x3 − 8x3² + 3x3³ )ᵀ.

The three solutions for these subsystems are x̄1 = (0, −1/3, 5/3)ᵀ, x̄2 = (1, 0, 1)ᵀ and x̄3 = (5/2, 1/2, 0)ᵀ. The contour maps corresponding to these subsystems are pictured in Figure 8 (a), (b) and (c) respectively. ⊓⊔
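A quick numerical check (Python/NumPy sketch) confirms that the three points zero the full residual of Example 6.

import numpy as np

def r(x):
    x1, x2, x3 = x
    return np.array([-2 + x1 - x2 + x3, x1*x2*x3, -1 + 2*x2 + x3])

for sol in ([0, -1/3, 5/3], [1, 0, 1], [5/2, 1/2, 0]):
    print(np.allclose(r(np.array(sol, dtype=float)), 0))   # True for each point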

Figure 8: Contour maps of the three two-dimensional subsystems in Example 6

4 Regularized Least-Squares

The norm approximation problem can be formalized for a regularized linear system of equations, for which an additional objective is given.

In a regularized least-squares problem, the objectives are twofold. The first objective is to find the design variables x ∈ Rn that give a good fit. The second objective is to keep the design variables from becoming too large. The vector optimization problem with respect to the cone R²₊ is

minimize f(x) = ( f1(x), f2(x) )ᵀ,   (4)

where the functions f1 and f2 may represent two identical Euclidean norms (or different norms) measuring the fitting error and the size of the design vector, respectively (see [3, pp. 184-185]).


4.1 Tikhonov Regularization

Let the deterministic⁵ overdetermined system be Ax = b, where x ∈ Rn, A ∈ Rm×n with m ≫ n, and b ∈ Rm (e.g., in some data fitting problems). We retain quadratic measures for the size of the residual Ax − b and for that of x. The problem (4) is to minimize the two squared norms, so that

minimize f(x) ≡ ( ∥Ax − b∥₂², ∥x∥₂² ).

For both Euclidean norms, the unique Pareto optimal point is given by x = A†b, where A† denotes the pseudoinverse of A, i.e., A† = lim_{ε→0} (AᵀA + εI)⁻¹Aᵀ with ε > 0. Expanding f1(x) and f2(x), and scalarizing with strictly positive λi's for i = 1, 2, the minimization problem is expressed by

minimize_{x∈Rn} { xᵀ(λ1 AᵀA + λ2 I)x − 2λ1 bᵀAx + λ1 bᵀb }.

The minimum solution point x = (AᵀA + µI)⁻¹Aᵀb, where µ = λ2/λ1, is Pareto optimal for any µ > 0 (see also this Tikhonov regularization technique in [3, 25]).
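The trade-off governed by µ is easy to explore numerically; in the Python/NumPy sketch below the nearly collinear columns of A are fabricated to mimic an ill-conditioned data matrix.

import numpy as np

def tikhonov(A, b, mu):
    # x(mu) = (A^T A + mu*I)^{-1} A^T b
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b)

rng = np.random.default_rng(1)
a = rng.normal(size=(30, 1))
A = np.hstack([a, a + 1e-6 * rng.normal(size=(30, 1))])   # nearly dependent columns
b = rng.normal(size=30)

for mu in (1e-6, 1e-3, 1e0):
    x = tikhonov(A, b, mu)
    # larger mu: worse fit ||Ax - b|| but smaller solution norm ||x||
    print(mu, np.linalg.norm(A @ x - b), np.linalg.norm(x))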

4.2 ℓ1-Norm Regularization

Different norms are preferred for the two objectives in practical applications (e.g., image restoration), as in [26, 27]. The common method is

minimize_x (1/2)∥Ax − b∥2 + γ∥x∥1,

where γ ∈ (0, ∞). In this formulation, the cost of using large values is a penalty added to the cost of missing the goal specification. The regularization technique overcomes an ill-conditioned data matrix A [26]. An optimal trade-off curve for a regularized least-squares problem may be determined, as in [3, p. 185].
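A standard way to handle the nonsmooth ℓ1 term is a proximal-gradient (soft-thresholding) iteration; the sketch below applies ISTA to the squared-error variant (1/2)∥Ax − b∥₂² + γ∥x∥1, which is an assumption of this illustration rather than the exact formulation of [26, 27], with fabricated data.

import numpy as np

def ista(A, b, gamma, n_iter=500):
    # Proximal-gradient iteration for (1/2)||Ax - b||_2^2 + gamma*||x||_1.
    x = np.zeros(A.shape[1])
    t = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1/L (spectral norm squared)
    for _ in range(n_iter):
        z = x - t * (A.T @ (A @ x - b))          # gradient step on the smooth part
        x = np.sign(z) * np.maximum(np.abs(z) - t * gamma, 0.0)   # soft threshold
    return x

rng = np.random.default_rng(2)
A = rng.normal(size=(40, 10))
x_true = np.zeros(10); x_true[[2, 7]] = (1.5, -2.0)      # sparse ground truth
b = A @ x_true + 0.01 * rng.normal(size=40)
print(np.round(ista(A, b, gamma=0.1), 3))   # most entries are driven to zero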

5 Conclusion

Solving nonlinear systems using ℓp-norms appears to be an effective method of resolution. The initial problem is transformed into an optimization problem for which we seek the zero global minimum solutions. However, the city-block ℓ1-norm and the uniform ℓ∞-norm produce nonsmooth objective functions, for which nonderivative optimization techniques are helpful. This optimization method is applied to a variety of small-size systems of nonlinear equations. The approach is especially well suited to trigonometric polynomial systems with multiple solutions.

⁵ In the presence of noisy observations, the system becomes b = Ax + ξ, where ξ denotes a white Gaussian noise vector.

6 Appendix: Nonlinear Systems and Optimization Problem

A.1 Problem Equivalence

Solving a nonlinear system of equations and optimizing a multivariate function share multiple similarities, as shown in Table 4. The initial formulation of the two problems is different. In the first problem, the nonlinear system g(x) = 0 involves a vector function g : Rn → Rn, while a multivariate scalar function is optimized in the second problem. In both cases, a Taylor series expansion is used. However, it is a linear approximation for the nonlinear system, but a quadratic approximation for the optimization. Similar assumptions are made with regard to aspects such as smoothness of the functions, negligible remainders, and inversion of the squared Jacobian and of the Hessian matrix, respectively. The differences lie in the formula for the Newton iteration step.

Table 4: Nonlinear system problem solving and optimization problem: a comparison

1. Formulation
• Nonlinear system: find x such that g(x) = 0, where x ∈ Rn and g : Rn → Rn.
• Optimization: minimize_x f(x), where x ∈ Rn and f : Rn → R.

2. Approximation
• Nonlinear system: linear Taylor series expansion about x0, i.e., g(x) = g(x0) + J(x0)ᵀ∆x + O2(∆x), where J(x0) ≡ ∇g(x0) is the Jacobian matrix at x0 and ∆x = x − x0 is the correction vector.
• Optimization: quadratic Taylor series expansion about x0, i.e., f(x) = f(x0) + ∇f(x0)ᵀ∆x + (1/2)(∆x)ᵀH(x0)∆x + O3(∆x), where H(x0) ≡ ∇²f(x0) is the Hessian matrix at x0 and ∆x = x − x0 is the correction vector.

3. Assumptions
• Nonlinear system: smoothness of the functions gi(x), i = 1, . . . , n; negligible remainder O2(∆x) of higher-order terms; invertibility of the squared Jacobian J(x0)ᵀJ(x0).
• Optimization: smoothness of the function f(x); negligible remainder O3(∆x) of higher-order terms; invertible and positive definite Hessian for a minimum.

4. Normal equation
• Nonlinear system: JᵀJ ∆x = −Jᵀg(x).
• Optimization: H ∆x = −∇f(x).

5. Newton's iteration step, k = 0, 1, . . .
• Nonlinear system: x(k+1) = x(k) − ( J(x(k))ᵀJ(x(k)) )⁻¹ J(x(k))ᵀ g(x(k)).
• Optimization: x(k+1) = x(k) − αk H(x(k))⁻¹ ∇f(x(k)), where αk ∈ (0, 1) is the step size.

A.2 Numerical Examples

Two examples illustrate the two procedures. In Example 7, the nonlinear system of equations g(x) = 0 is used to determine the stationary points of the multivariate function f(x) in the minimization problem. The connection is different in Example 8, since the objective function of the minimization problem is expressed as f(x) = r(x)ᵀr(x), for which the residual error function satisfies r(x) = 0. The two-dimensional Rosenbrock test function is used for this example.

Example 7 Let the bivariate function be

f(x) = 3 − 3x1 − 2x2 + x2² + x1³,

where x ∈ [−3, 3] × [−2, 4].

The first iterations of the nonlinear system problem and those of the optimization problem are in Table 5. The gradient is g(x) = ( −3 + 3x1² , −2 + 2x2 )ᵀ. The solutions of the nonlinear system g(x) = 0 are the stationary points (−1, 1)ᵀ and (1, 1)ᵀ, for which the function values are respectively 4 and 0.

Table 5: Nonlinear system and optimization problem in Example 7

       Nonlinear system of gradients         Optimization problem
 k     x1(k)    x2(k)    f(x(k))             x1(k)    x2(k)    f(x(k))
 0     2        3        8                   2        3        8
 1     1.25     1        0.2031              1.625    2        2.4160
 2     1.025    1        0.00189             1.3726   1.5      0.7182
 3     1.0003   1.       2.79×10⁻⁷           1.2116   1.25     0.2063
 4     1.       1.       6.44×10⁻¹⁵          1.1150   1.125    0.0568
 5     1.       1.       −2.22×10⁻¹⁶         1.0605   1.0625   0.0151
 6     1.       1.       0                   1.0311   1.0313   0.0039
 7     1.       1.       0                   1.0158   1.0156   9.95×10⁻⁴
 8     1.       1.       0                   1.0079   1.0078   2.51×10⁻⁴
 9     1.       1.       0                   1.0039   1.0039   6.31×10⁻⁵
10     1.       1.       0                   1.002    1.002    1.6×10⁻⁵



The global minimum is (1, 1)ᵀ, at which the Hessian

H = ( 6x1   0
       0    2 )

is positive definite, and for which the function value is zero.

Example 8 The bivariate Rosenbrock function is

f(x) = (1 − x1)² + 100(x2 − x1²)²,

where x ∈ [−1.5, 1.5]².

The residual vector function for the Rosenbrock function is

r(x) = ( 1 − x1 ,  10(x2 − x1²) )ᵀ,

such that f(x) = r(x)ᵀr(x). The iterations of Newton's method are in Table 6.

Table 6: Newton's method in Example 8

 k    x1(k)      x2(k)     f(x(k))
 0    −1         1         4
 1    0          −1        101
 2    0.00248    −0.5      25.9956
 3    0.00742    −0.25     7.2365
 …    …          …         …
38    0.9999     0.9999    6×10⁻¹³
39    1.         0.9999    1.5×10⁻¹³
40    1          1         63.8×10⁻¹⁴
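For completeness, a damped Newton iteration on the Rosenbrock function (right-hand column of Table 4) is sketched below in Python/NumPy. The starting point x(0) = (−1, 1)ᵀ follows Table 6, while the fixed damping αk = 0.5 is an illustrative choice, so the iterates need not coincide with those of Table 6.

import numpy as np

def grad(x):
    x1, x2 = x
    return np.array([-2*(1 - x1) - 400*x1*(x2 - x1**2), 200*(x2 - x1**2)])

def hess(x):
    x1, x2 = x
    return np.array([[2 - 400*x2 + 1200*x1**2, -400*x1],
                     [-400*x1,                  200.0]])

x, alpha = np.array([-1.0, 1.0]), 0.5
for k in range(60):
    x = x - alpha * np.linalg.solve(hess(x), grad(x))   # damped Newton step
print(x)   # expected to approach the global minimum (1, 1)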

References:

[1] S. S. Rao, Engineering Optimization: Theory and Practice (third edition), J. Wiley & Sons, Hoboken, N.J. 2009




[2] R. Ravindran, K. M. Ragsdell and G. V. Reklaitis, Engineering Optimization: Methods and Applications (second edition), J. Wiley & Sons, Hoboken, N.J. 2006

[3] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge UK 2004

[4] P. A. Jensen and J. F. Bard, Operations Research: Models and Methods, Wiley & Sons, Hoboken, N.J. 2003

[5] J. Nocedal and S. J. Wright, Numerical Optimization (second edition), Springer Science+Business Media, New York 2006

[6] J. A. Snyman, Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms, Springer Science+Business Media, New York 2005

[7] G. K. Smyth, Optimization and nonlinear equations. In: P. Armitage and T. Colton (eds.), Encyclopedia of Biostatistics, pp. 3174–3180, J. Wiley & Sons, Chichester 1998

[8] C. Xu, Nonlinear least squares problems. In: C. A. Floudas and P. M. Pardalos (eds.), Encyclopedia of Optimization (second edition), pp. 2626–2630, Springer Science+Business Media, New York 2009

[9] M. R. Aaron, The use of least squares in system design, IRE Trans. Circuit Theory, Vol. 3, No. 4, 1956, pp. 224–231.

[10] J. W. Fischbach, Some applications of gradient methods. In: R. V. Churchill, E. Reissner, and A. H. Taub (eds.), Numerical Analysis, pp. 59–72, American Mathematical Society, Providence, R.I. 1956

[11] A. D. Booth, An application of the method of steepest descents to the solution of systems of non-linear simultaneous equations, Quart. J. Mech. Appl. Math., Vol. 2, No. 4, 1949, pp. 460–468.

[12] G. C. Tomes, Least pth approximation, IEEE Trans. Circuit Theory, Vol. 16, No. 2, 1969, pp. 235–237.

[13] J. W. Bandler and C. Charalambous, Theory of generalized least pth approximation, IEEE Trans. Circuit Theory, Vol. 19, No. 3, 1972, pp. 287–289.

[14] J. W. Bandler and C. Charalambous, Practical least pth optimization of networks, IEEE Trans. Microwave Theory Tech., Vol. 20, No. 12, 1972, pp. 834–840.

[15] C. Patrascioiu and C. Marinoiu, The applications of the nonlinear equations systems algorithms for the heat transfer processes, Proc. 12th WSEAS Int. Conf. MAMECTIS'10, Kantaoui, Sousse, Tunisia, May 3-6, 2010, pp. 30–35.

[16] A. F. Kuri-Morales, Solution of simultaneous non-linear equations using genetic algorithms, WSEAS Trans. Systems, Vol. 2, No. 1, 2003, pp. 44–51.

[17] N. E. Mastorakis, Solving nonlinear equations via genetic algorithms, WSEAS Trans. Inform. Sci. Appl., Vol. 2, No. 5, 2005, pp. 455–459.

[18] A. Antoniou and W. S. Lu, Practical Optimization: Algorithms and Engineering Applications, Springer Science+Business Media, New York 2007

[19] R. Fletcher, Practical Methods of Optimization (second edition), John Wiley & Sons, Chichester - New York - Brisbane 1987


[20] W. Forst and D. Hoffmann, Optimization - Theory and Practice, Springer Science+Business Media, New York 2010

[21] E. D. Andersen, On formulating quadratic functions in optimization models, MOSEK Technical Report TR-1-2013, 2013.

[22] J. J. More, A collection of nonlinear model problems. In: E. L. Allgower and K. Georg (eds.), Computational Solution of Nonlinear Systems of Equations, pp. 723–762, American Mathematical Society, Providence, R.I. 1990

[23] F. H. Clarke, Optimization and Nonsmooth Analysis, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, P.A. 1990

[24] T. J. Ypma, Finite difference approximation of sparse Jacobian matrices in Newton-like methods. In: E. L. Allgower and K. Georg (eds.), Computational Solution of Nonlinear Systems of Equations, pp. 707–722, American Mathematical Society, Providence, R.I. 1990

[25] X.-S. Yang, Engineering Optimization: An Introduction with Metaheuristic Applications, John Wiley & Sons, Hoboken, N.J. 2010, pp. 97–99.

[26] L. Wu and Z. Sun, New nonsmooth equations-based algorithms for ℓ1-norm minimization and applications, J. Appl. Math., 2012, doi:10.1155/2012/139609

[27] Y. Xiao, Q. Wang and Q. Hu, Non-smooth equations based method for ℓ1-norm problems with applications to compressed sensing, Nonlinear Anal., Vol. 74, 2011, pp. 3570–3577.
