Newton and Halley are one step apart

Trond Steihaug

Department of Informatics, University of Bergen, Norway

4th European Workshop on Automatic Differentiation, December 7-8, 2006

Institute for Scientific Computing, RWTH Aachen University

Aachen, Germany

(This is joint work with Geir Gundersen)


Overview

- Methods for Solving Nonlinear Equations: a method in the Halley class is two steps of Newton in disguise.

- Local Methods for Unconstrained Optimization.

- How to Utilize Structure in the Problem.

- Numerical Results.


Newton and Halley

A central problem in scientific computing is the solution of a system of n equations in n unknowns

F(x) = 0,

where F : R^n → R^n is sufficiently smooth.

Sir Isaac Newton (1643 - 1727). Edmond Halley (1656 - 1742).


A Nonlinear Newton method

Taylor expansion

T(s) = F(x) + F′(x)s + (1/2) F′′(x)ss

Nonlinear Newton: given x, determine s such that T(s) = 0, and update x_+ = x + s.

Two Newton steps on the nonlinear problem T(s) = 0, starting from s^(0) ≡ 0:

T′(0) s^(1) = −T(0)
T′(s^(1)) s^(2) = −T(s^(1))
x_+ = x + s^(1) + s^(2)

Since T(0) = F(x), T′(s) = F′(x) + F′′(x)s, and T(s^(1)) = (1/2) F′′(x) s^(1) s^(1) once the first system is solved, the two steps read, in terms of F,

F′(x) s^(1) = −F(x)
[F′(x) + F′′(x) s^(1)] s^(2) = −(1/2) F′′(x) s^(1) s^(1)
x_+ = x + s^(1) + s^(2)


The Halley Class

The Halley class of iterations (Gutierrez and Hernandez 2001): given a starting value x_0, compute

x_{k+1} = x_k − {I + (1/2) L(x_k) [I − α L(x_k)]^{-1}} (F′(x_k))^{-1} F(x_k),  k = 0, 1, . . . ,

where

L(x) = (F′(x))^{-1} F′′(x) (F′(x))^{-1} F(x),  x ∈ R^n.

Classical methods

- Chebyshev’s method (α = 0),
- Halley’s method (α = 1/2), and
- super Halley’s method (α = 1).


One Step Halley

This formulation is not suitable for implementation. Rewriting the equation gives the following iterative method for k = 0, 1, . . .

Solve for s_k^(1):  F′(x_k) s_k^(1) = −F(x_k)
Solve for s_k^(2):  [F′(x_k) + α F′′(x_k) s_k^(1)] s_k^(2) = −(1/2) F′′(x_k) s_k^(1) s_k^(1)
Update the iterate: x_{k+1} = x_k + s_k^(1) + s_k^(2)

A Key Point

One step of super Halley (α = 1) is two steps of Newton on the quadratic approximation.
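
A minimal dense numpy sketch of this two-system step (not from the slides; the callable names F, J and Tss, supplying F(x), F′(x) and the matrix F′′(x)s, are illustrative assumptions):

    import numpy as np

    def halley_class_step(x, F, J, Tss, alpha):
        """One step of the Halley class in the two-system form.

        F(x)      -> residual F(x), shape (n,)
        J(x)      -> Jacobian F'(x), shape (n, n)
        Tss(x, s) -> the matrix F''(x)s, shape (n, n)
        alpha     -> 0: Chebyshev, 1/2: Halley, 1: super Halley
        """
        Fx, Jx = F(x), J(x)
        s1 = np.linalg.solve(Jx, -Fx)                        # F'(x_k) s1 = -F(x_k)
        B = Tss(x, s1)                                       # B = F''(x_k) s1
        s2 = np.linalg.solve(Jx + alpha * B, -0.5 * (B @ s1))  # [F' + alpha F''s1] s2 = -1/2 F''s1 s1
        return x + s1 + s2

With alpha = 0, 1/2 and 1 this reproduces the Chebyshev, Halley and super Halley updates listed above.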


Super Halley as Two Steps of Newton

Two steps of Newton:

Solve for s_k^(1):  F′(x_k) s_k^(1) = −F(x_k)
Solve for s_k^(2):  F′(x_k + s_k^(1)) s_k^(2) = −F(x_k + s_k^(1))
Update the iterate: x_{k+1} = x_k + s_k^(1) + s_k^(2)

One step of super Halley:

Solve for s_k^(1):  F′(x_k) s_k^(1) = −F(x_k)
Solve for s_k^(2):  [F′(x_k) + F′′(x_k) s_k^(1)] s_k^(2) = −(1/2) F′′(x_k) s_k^(1) s_k^(1)
Update the iterate: x_{k+1} = x_k + s_k^(1) + s_k^(2)

1. In addition to F(x), F′(x) and the solution of two linear systems, the methods require:
   - Halley: F′′(x)s (plus the matrix-vector product [F′′(x)s] s).
   - Two steps of Newton: F′(x + s) and F(x + s).
2. All members of the Halley class are cubically convergent.
3. Super Halley and two steps of Newton are equivalent on quadratic functions.
4. The super Halley method is quartically convergent for quadratic equations (D. Chen, I. K. Argyros and Q. Qian 1994).
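
A quick numerical check of point 3, as a sketch under the assumption of a quadratic map F(x)_i = c_i + (Ax)_i + (1/2) xᵀ Q_i x (so F′′ is the constant tensor Q; the names A, Q, c are illustrative): one super Halley step and two Newton steps should coincide to roundoff.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # well-conditioned linear part
    Q = rng.standard_normal((n, n, n))
    Q = 0.5 * (Q + Q.transpose(0, 2, 1))              # each Q[i] symmetric; Q plays the role of F''
    c = rng.standard_normal(n)

    F   = lambda x: c + A @ x + 0.5 * np.einsum('ijk,j,k->i', Q, x, x)
    Jac = lambda x: A + np.einsum('ijk,k->ij', Q, x)  # F'(x)

    x0 = rng.standard_normal(n)

    # Two steps of Newton
    s1 = np.linalg.solve(Jac(x0), -F(x0))
    s2 = np.linalg.solve(Jac(x0 + s1), -F(x0 + s1))

    # One step of super Halley (alpha = 1)
    B  = np.einsum('ijk,k->ij', Q, s1)                # F''(x0) s1
    t2 = np.linalg.solve(Jac(x0) + B, -0.5 * (B @ s1))

    print(np.linalg.norm((x0 + s1 + s2) - (x0 + s1 + t2)))   # expected at roundoff level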


(Ortega and Rheinboldt 1970): Methods which require second and higher order derivatives are rather cumbersome from a computational viewpoint. Note that, while computation of F′ involves only n² partial derivatives ∂_j F_i, computation of F′′ requires n³ second partial derivatives ∂_j ∂_k F_i; in general an exorbitant amount of work indeed.

(Rheinboldt 1974): Clearly, comparisons of this type turn out to be even worse for methods with derivatives of order larger than two. Except in the case n = 1, where all derivatives require only one function evaluation, the practical value of methods involving more than the first derivative of F is therefore very questionable.

(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence.

When structure and sparsity are utilized, the picture is very different. Sparsity is even more pronounced in higher derivatives.


Local Methods for Unconstrained Optimization

The members of the Halley class also apply to the unconstrained optimization problem in the general case

min_{x ∈ R^n} f(x)

using f(x), ∇f(x), ∇²f(x) and ∇³f(x).


Terminology

Let f : R^n → R be a three times continuously differentiable function. For a given x ∈ R^n let

g_i = ∂f(x)/∂x_i,   H_ij = ∂²f(x)/∂x_i∂x_j,   T_ijk = ∂³f(x)/∂x_i∂x_j∂x_k,

so that g ∈ R^n, H ∈ R^{n×n}, and T ∈ R^{n×n×n}.

H is a symmetric matrix: H_ij = H_ji, i ≠ j.

We say that an n × n × n tensor is super-symmetric when

T_ijk = T_ikj = T_jik = T_jki = T_kij = T_kji,  i ≠ j, j ≠ k, i ≠ k
T_iik = T_iki = T_kii,  i ≠ k.

We will use the notation (pT) for the matrix ∇³f(x)p.


Super-Symmetric Tensor

For a super-symmetric tensor we only store the n(n + 1)(n + 2)/6 elements T_ijk with 1 ≤ k ≤ j ≤ i ≤ n (illustrated in the talk for n = 9).
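
The slides do not fix a storage scheme; one natural choice (an assumption here, not from the talk) is a packed 1-D array with the offset computed as below:

    def packed_index(i, j, k):
        # 0-based offset of T_ijk, 1 <= k <= j <= i <= n, in a packed array
        # holding the n(n+1)(n+2)/6 unique entries of a super-symmetric tensor
        assert 1 <= k <= j <= i
        return (i - 1) * i * (i + 1) // 6 + (j - 1) * j // 2 + (k - 1)

    def packed_lookup(Tpacked, i, j, k):
        # read T_ijk for an arbitrary index order by sorting the indices
        i, j, k = sorted((i, j, k), reverse=True)
        return Tpacked[packed_index(i, j, k)]

    n = 9
    print(n * (n + 1) * (n + 2) // 6)   # 165 stored elements for n = 9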


Computations with the Tensor

The cubic value term p^T (pT) p ∈ R:

p^T (pT) p = Σ_{i=1}^{n} p_i Σ_{j=1}^{n} p_j Σ_{k=1}^{n} p_k T_ijk
           = Σ_{i=1}^{n} p_i { [ Σ_{j=1}^{i−1} p_j ( 6 Σ_{k=1}^{j−1} p_k T_ijk + 3 p_j T_ijj ) + 3 p_i Σ_{k=1}^{i−1} p_k T_iik ] + p_i² T_iii }

The cubic gradient term (pT)p ∈ R^n:

((pT)p)_i = Σ_{j=1}^{n} Σ_{k=1}^{n} p_j p_k T_ijk,  1 ≤ i ≤ n

The cubic Hessian term (pT) ∈ R^{n×n}:

(pT)_ij = Σ_{k=1}^{n} p_k T_ijk,  1 ≤ i, j ≤ n
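
A dense numpy reference for these three quantities, useful for checking a packed or sparse implementation (a sketch that assumes the full n × n × n array T is available; the name cubic_terms is illustrative):

    import numpy as np

    def cubic_terms(T, p):
        pT   = np.einsum('ijk,k->ij', T, p)   # cubic Hessian term (pT), n x n
        pTp  = pT @ p                          # cubic gradient term (pT)p, length n
        pTpp = p @ pTp                         # cubic value term p^T (pT) p, scalar
        return pTpp, pTp, pT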


Computing utilizing Super-Symmetry: H + (pT)

T ∈ R^{n×n×n} is a super-symmetric tensor. H ∈ R^{n×n} is a symmetric matrix. Let p ∈ R^n.

for i = 1 to n do
  for j = 1 to i − 1 do
    for k = 1 to j − 1 do
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k = 1 to i − 1 do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
end for
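
A direct Python transcription of this loop (0-based indices; it assumes T is held as a full (n, n, n) array of which only the entries with k ≤ j ≤ i are ever read, and it updates only the lower triangle of the symmetric result; the name add_pT is illustrative):

    import numpy as np

    def add_pT(H, T, p):
        """H += (pT) using only the unique entries T[i,j,k], k <= j <= i."""
        n = len(p)
        for i in range(n):
            for j in range(i):
                for k in range(j):
                    H[i, j] += p[k] * T[i, j, k]
                    H[i, k] += p[j] * T[i, j, k]
                    H[j, k] += p[i] * T[i, j, k]
                H[i, j] += p[j] * T[i, j, j]
                H[j, j] += p[i] * T[i, j, j]
            for k in range(i):
                H[i, i] += p[k] * T[i, i, k]
                H[i, k] += p[i] * T[i, i, k]
            H[i, i] += p[i] * T[i, i, i]
        return H

The full symmetric matrix is recovered by mirroring the updated lower triangle.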


Induced Sparsity

Definition

The sparsity of the Hessian matrix (Griewank and Toint 1982):

∂²f(x)/∂x_i∂x_j = 0,  ∀x ∈ R^n and (i, j) ∈ Z.

Then

T_ijk = ∂³f(x)/∂x_i∂x_j∂x_k = 0  for (i, j) ∈ Z or (j, k) ∈ Z or (i, k) ∈ Z.

We say that the sparsity structure of the tensor is induced by the sparsity structure of the Hessian matrix.


Stored Elements: Arrowhead

[Figure omitted: stored elements of a 9 × 9 arrowhead matrix and the induced tensor.]


Stored Elements: Tridiagonal

[Figure omitted: stored elements of a 9 × 9 tridiagonal matrix and the induced tensor.]


General Sparsity

Let Z be the set of indices of the Hessian matrix that are 0 and define

N = {(i, j) | 1 ≤ i, j ≤ n} \ Z.

N is the set of indices for which the elements of the Hessian matrix at x will be nonzero.

Since

T_ijk = 0  if (i, j) ∈ Z, or (j, k) ∈ Z, or (i, k) ∈ Z,

we only need to consider the elements (i, j, k) of the tensor where

(i, j) ∈ N and (j, k) ∈ N and (i, k) ∈ N,  1 ≤ k ≤ j ≤ i ≤ n.


General Sparsity cont.

In the following we will assume that (i, i) ∈ N. Define

T = {(i, j, k) | 1 ≤ k ≤ j ≤ i ≤ n, (i, j) ∈ N, (j, k) ∈ N, (i, k) ∈ N}.

Let C_i be the column indices of the nonzero elements at or below the diagonal in row i of the sparse Hessian matrix:

C_i = {j | j ≤ i, (i, j) ∈ N},  i = 1, . . . , n.

Then

T = {(i, j, k) | i = 1, ..., n, j ∈ C_i, k ∈ C_i ∩ C_j}.

For a given (i, j), the set of indices k such that (i, j, k) ∈ T is called tube (i, j) (Bader and Kolda 2004).

A Key Point

The intersection of C_i and C_j defines the tube (i, j).
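
A small sketch (function and variable names are illustrative, not from the talk) that builds the sets C_i and the tubes from a lower-triangular Hessian sparsity pattern:

    def build_tubes(n, lower_nonzeros):
        # lower_nonzeros: (i, j) pairs with j < i, 1-based; (i, i) is assumed nonzero
        C = {i: {i} for i in range(1, n + 1)}
        for i, j in lower_nonzeros:
            C[i].add(j)
        tubes = {(i, j): sorted(C[i] & C[j])        # tube (i, j) = C_i ∩ C_j
                 for i in range(1, n + 1) for j in sorted(C[i])}
        return C, tubes

    # example: 4 x 4 tridiagonal Hessian -> 10 stored tensor entries
    C, tubes = build_tubes(4, [(2, 1), (3, 2), (4, 3)])
    print(sum(len(t) for t in tubes.values()))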


The (Extreme) Sparsity of the Tensor

The induced tensor sparsity ratio is

nnz(T) / (n(n+1)(n+2)/6) · 100%.

The matrix sparsity ratio is

nnz(H) / (n(n+1)/2) · 100%.

Consider the nos set from Matrix Market.

                                    Sparsity Ratio %
Matrix    n     nnz(H)   nnz(T)    tensor    matrix
nos1      237   627      1017      0.0453    3.6060
nos2      957   2547     4137      0.0028    0.9025
nos3      960   8402     35066     0.0237    7.6019
nos4      100   347      630       0.3669    12.4752
nos5      468   2820     6359      0.0370    5.7943
nos6      675   1965     3255      0.0063    1.4267
nos7      729   2673     4617      0.0071    1.7352


A general sparse implementation of: H + (pT)

T ∈ R^{n×n×n} is a super-symmetric tensor. H ∈ R^{n×n} is a symmetric matrix. Let p ∈ R^n. Let C_i be the nonzero index pattern of row i of the Hessian matrix.

for i = 1 to n do
  for j ∈ C_i ∧ j < i do
    for k ∈ C_i ∩ C_j ∧ k < j do
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
end for


Practical Issues

Four implementations:

1. Store k ∈ C_i ∩ C_j.
2. Let k ∈ C_j and test if k ∈ C_i.
3. Let k ∈ C_j and use Index_ik = 0 when k ∉ C_i and 1 otherwise.
4. Expand the storage of tube (i, j) to |C_j|.

With these implementations of k ∈ C_i ∩ C_j, k < j there is a tradeoff between memory and operations (arithmetic or logical).


Intersection with and without if: (pT)

With if (boolean marker t):

Set the elements of t to false.
for i = 1 to n do
  Compute t_k = true if k ∈ C_i
  for j ∈ C_i ∧ j < i do
    for k ∈ C_j ∧ k < j do
      if t_k then
        H_ij += p_k T_ijk
        H_ik += p_j T_ijk
        H_jk += p_i T_ijk
      end if
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
  Reset t_k = false if k ∈ C_i
end for

Without if (0/1 marker Index):

Set the elements of Index to zero.
for i = 1 to n do
  Compute Index_k = 1 if k ∈ C_i
  for j ∈ C_i ∧ j < i do
    for k ∈ C_j ∧ k < j do
      T_ijk = T_ijk · Index_k
      H_ij += p_k T_ijk
      H_ik += p_j T_ijk
      H_jk += p_i T_ijk
    end for
    H_ij += p_j T_ijj
    H_jj += p_i T_ijj
  end for
  for k ∈ C_i ∧ k < i do
    H_ii += p_k T_iik
    H_ik += p_i T_iik
  end for
  H_ii += p_i T_iii
  Reset Index_k = 0 if k ∈ C_i
end for


Numerical Results: Computing (pT)p and (pT)

CPU measurements for the gradient term (milliseconds):

Matrix     n     nnz(H)   nnz(T)   Full storage   "if"   Index   x-storage
nos3       960   8402     35066    701            766    1242    1047
nos4       100   347      630      16             23     28      20
nos5       468   2820     6359     154            222    373     289
bcsstk19   817   3835     10363    220            245    355     226
bcsstk22   138   417      841      22             29     32      27
gr3030     900   4322     11108    226            247    343     351

CPU measurements for the Hessian term (milliseconds):

Matrix     n     nnz(H)   nnz(T)   Full storage   "if"   Index   x-storage
nos3       960   8402     35066    996            1195   1717    1514
nos4       100   347      630      20             23     28      27
nos5       468   2820     6359     224            288    461     407
bcsstk19   817   3835     10363    216            380    411     483
bcsstk22   138   417      841      24             28     38      31
gr3030     900   4322     11108    230            421    567     525


Analysis

Operations = c1 S + c2 nnz(T) + c3 nnz(H) + c4 n,

where

S = Σ_{i=1}^{n} Σ_{j∈C_i} |C_j|,   nnz(T) = Σ_{i=1}^{n} Σ_{j∈C_i} |C_i ∩ C_j|,   and   nnz(H) = Σ_{i=1}^{n} |C_i|.

- For full storage, c1 = 0, c2 = 6. Memory is 2 nnz(T).
- Implementations with if and full storage have the same number of arithmetic operations.
- For intersection with if, c1 = 1, c2 = 6 and memory is nnz(T).
- For Index and x-storage, c1 = 6, but memory is nnz(T) or S ≥ nnz(T).
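
The three counts can be computed directly from the C_i sets, for instance as built by the build_tubes sketch earlier; a minimal version (the name cost_counts is illustrative):

    def cost_counts(C):
        # C: dict mapping row i (1-based) to the set C_i of column indices j <= i
        nnz_H = sum(len(C[i]) for i in C)
        nnz_T = sum(len(C[i] & C[j]) for i in C for j in C[i])
        S     = sum(len(C[j])        for i in C for j in C[i])
        return S, nnz_T, nnz_H     # operations ~ c1*S + c2*nnz_T + c3*nnz_H + c4*n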


How to Utilize Sparsity in the Problem: A Skyline Matrix

A matrix has a symmetric skyline structure (envelope structure) if all nonzero elements in a row are located from the first nonzero element to the element on the diagonal. Define β_i to be the (lower) bandwidth of row i,

β_i = max{i − j | H_ij ≠ 0, j < i},

and define f_i to be the start index of row i in the Hessian matrix,

f_i = i − β_i.

Then

C_j = {k | f_j ≤ k ≤ j}
C_i ∩ C_j = {k | max{f_i, f_j} ≤ k ≤ j},  j ≤ i.


Skyline implementation of: p^T (pT) p

Let T ∈ R^{n×n×n} be a super-symmetric tensor, and let p ∈ R^n be a vector.
Let {f_1, . . . , f_n} be the indices of the first nonzero element of each row of the Hessian matrix.
Let c, s, t ∈ R be scalars, with c = 0 initially.

for i = 1 to n do
  t = 0
  for j = f_i to i − 1 do
    s = 0
    for k = max{f_i, f_j} to j − 1 do
      s += p_k T_ijk
    end for
    t += p_j (6s + 3 p_j T_ijj)
  end for
  s = 0
  for k = f_i to i − 1 do
    s += p_k T_iik
  end for
  c += p_i (t + p_i (3s + p_i T_iii))
end for
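
A Python transcription of this skyline loop (0-based; as a sketch T is assumed stored as a full array for readability, with only the entries f[i] ≤ k ≤ j ≤ i read; the name skyline_pTpp is illustrative):

    import numpy as np

    def skyline_pTpp(T, p, f):
        """p^T (pT) p for a tensor induced by a skyline Hessian.
        f[i] = first nonzero column of Hessian row i (0-based)."""
        n = len(p)
        c = 0.0
        for i in range(n):
            t = 0.0
            for j in range(f[i], i):
                s = 0.0
                for k in range(max(f[i], f[j]), j):
                    s += p[k] * T[i, j, k]
                t += p[j] * (6.0 * s + 3.0 * p[j] * T[i, j, j])
            s = 0.0
            for k in range(f[i], i):
                s += p[k] * T[i, i, k]
            c += p[i] * (t + p[i] * (3.0 * s + p[i] * T[i, i, i]))
        return c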


Halley and Two Steps of Newton in Review

Two steps of Newton:

Solve for s_k^(1):  F′(x_k) s_k^(1) = −F(x_k)
Solve for s_k^(2):  F′(x_k + s_k^(1)) s_k^(2) = −F(x_k + s_k^(1))
Update the iterate: x_{k+1} = x_k + s_k^(1) + s_k^(2)

One step of the Halley class:

Solve for s_k^(1):  F′(x_k) s_k^(1) = −F(x_k)
Solve for s_k^(2):  [F′(x_k) + α F′′(x_k) s_k^(1)] s_k^(2) = −(1/2) F′′(x_k) s_k^(1) s_k^(1)
Update the iterate: x_{k+1} = x_k + s_k^(1) + s_k^(2)


Computational Requirements

The tensor computations and the LDL^T decomposition for dense, banded and skyline structure.

Number of floating point arithmetic operations:

LDL^T:  dense (1/3)n³ + (1/2)n² − (5/6)n;  banded nβ² + 2nβ + n − (2/3)β³ − (3/2)β² − (5/6)β;  skyline 2 nnz(T) − nnz(H) − n
(pT):   dense n³ + n²;  banded 3nβ² + 5nβ − n − 2β³ − 4β² − 3β;  skyline 6 nnz(T) − 4 nnz(H) − n
(pT)p:  dense (2/3)n³ + 6n² − (2/3)n;  banded 2nβ² + 14nβ + 7n − (4/3)β³ − 8β² − (20/3)β;  skyline 4 nnz(T) + 8 nnz(H) − 6n

The LDL^T decomposition has the same complexity as the tensor operations.

The total computational requirements for Newton's and the super Halley method:

Newton:        dense (1/3)n³ + (5/2)n² − (5/6)n;  banded nβ² + 6nβ + 3n − (2/3)β³ − (7/2)β² − (17/6)β;  skyline 2 nnz(T) + 3 nnz(H) − 3n
Super Halley:  dense (5/3)n³ + (19/2)n² − (7/6)n;  banded 5nβ² + 22nβ + 11n − (10/3)β³ − 14β² − (70/6)β;  skyline 10 nnz(T) + 7 nnz(H) − 9n

The Halley class and Newton's method have the same asymptotic upper bound for the dense, banded and skyline structures.


Upper Bound for the Skyline Structure

Theorem

The ratio of the number of arithmetic operations of a method in the Halley class and Newton's method is constant per iteration:

flops(one step Halley) / flops(one step Newton) ≤ 5

when the tensor is induced by a skyline structure of the Hessian matrix and we use a direct method to solve the systems of linear equations.

(Rheinboldt 1998): Clearly, for increasing dimension n the required computational work soon outweighs the advantage of the higher-order convergence.


Chained and Generalized Rosenbrock

Chained Rosenbrock (Toint 1982):

f(x) = Σ_{i=2}^{n} [ 6.4 (x_{i−1} − x_i²)² + (1 − x_i)² ]

Generalized Rosenbrock (Schwefel 1977):

f(x) = Σ_{i=2}^{n} [ (x_n − x_i²)² + (x_i − 1)² ]
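
Minimal numpy versions of the two objective functions (a sketch; the function names are illustrative): the chained variant couples only neighbouring variables, so its Hessian and induced tensor are tridiagonal, while every term of the generalized variant involves x_n, giving an arrowhead pattern.

    import numpy as np

    def chained_rosenbrock(x):
        # f(x) = sum_{i=2}^n [ 6.4 (x_{i-1} - x_i^2)^2 + (1 - x_i)^2 ]
        return np.sum(6.4 * (x[:-1] - x[1:] ** 2) ** 2 + (1.0 - x[1:]) ** 2)

    def generalized_rosenbrock(x):
        # f(x) = sum_{i=2}^n [ (x_n - x_i^2)^2 + (x_i - 1)^2 ]
        return np.sum((x[-1] - x[1:] ** 2) ** 2 + (x[1:] - 1.0) ** 2)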


Chained and Generalized Rosenbrock

[Figure omitted: CPU timings (mS) against n, 0 ≤ n ≤ 10000, for the two test cases.
For x0 = (1.7, ..., 1.7) and x* = (1.0, ..., 1.0): Newton 9 iterations, Chebyshev 6 iterations, Halley 6 iterations, super Halley 4 iterations.
For x0 = (1.08, 0.99, ..., 1.08, 0.99) and x* = (1.0, ..., 1.0): Newton 5 iterations, Chebyshev 3 iterations, Halley 3 iterations, super Halley 3 iterations.]

The termination criterion for all methods is ‖∇f(x_k)‖ ≤ 10^{-8} ‖∇f(x_0)‖. The total CPU time includes function, gradient, Hessian and tensor evaluations.


References

B. W. Bader and T. G. Kolda. MATLAB Tensor Classes for Fast Algorithm Prototyping. Technical Report SAND 2004-5187, October 2004.

D. Chen, I. K. Argyros and Q. Qian. A Local Convergence Theorem for the Super-Halley Method in a Banach Space. Appl. Math. Lett., Vol. 7, No. 5, pp. 49-52, 1994.

A. Griewank and Ph. L. Toint. On the unconstrained optimization of partially separable functions. In Michael J. D. Powell, editor, Nonlinear Optimization 1981, pages 301-312. Academic Press, New York, NY, 1982.

G. Gundersen and T. Steihaug. Sparsity in Higher Order Methods in Optimization. Reports in Informatics 327, Dept. of Informatics, Univ. of Bergen, 2006.

J. M. Gutierrez and M. A. Hernandez. An acceleration of Newton's method: Super-Halley method. Applied Mathematics and Computation, vol. 117, no. 2, pp. 223-239, 25 January 2001.

H. P. Schwefel. Numerical Optimization of Computer Models. John Wiley and Sons, Chichester, 1981.

Ph. L. Toint. Some numerical results using a sparse matrix updating formula in unconstrained optimization. Mathematics of Computation, Volume 32, Number 143, July 1978, pages 839-851.
