
INSTITUTE OF PHYSICS PUBLISHING INVERSE PROBLEMS

Inverse Problems 19 (2003) 1–21 PII: S0266-5611(03)35534-0

Tikhonov regularization and a posteriori rules for solving nonlinear ill posed problems

U Tautenhahn 1 and Qi-nian Jin 2

1 Department of Mathematics, University of Applied Sciences Zittau/Görlitz, PO Box 1455, 02755 Zittau, Germany
2 Department of Mathematics, Rutgers University, New Brunswick, NJ 08903, USA

E-mail: [email protected] and [email protected]

Received 2 April 2002, in final form 24 October 2002
Published 15 November 2002
Online at stacks.iop.org/IP/19/1

Abstract
Besides a priori parameter choice we study a posteriori rules for choosing the regularization parameter α in the Tikhonov regularization method for solving nonlinear ill posed problems F(x) = y, namely a rule 1 of Scherzer et al (Scherzer O, Engl H W and Kunisch K 1993 SIAM J. Numer. Anal. 30 1796–838) and a new rule 2 which is a generalization of the monotone error rule of Tautenhahn and Hämarik (Tautenhahn U and Hämarik U 1999 Inverse Problems 15 1487–505) to the nonlinear case. We suppose that instead of y there are given noisy data yδ satisfying ‖y − yδ‖ ≤ δ with known noise level δ and prove that rule 1 and rule 2 yield order optimal convergence rates O(δ^{p/(p+1)}) for the ranges p ∈ (0, 2] and p ∈ (0, 1], respectively. Compared with foregoing papers our order optimal convergence rate results have been obtained under much weaker assumptions, which is important in engineering practice. Numerical experiments verify some of the theoretical results.

1. Introduction

In this paper we consider nonlinear ill posed problems

F(x) = y (1.1)

where F : D(F) ⊂ X → Y is a nonlinear operator with non-closed range R(F) and X, Y are infinite dimensional real Hilbert spaces with corresponding inner products (·, ·) and norms ‖ · ‖, respectively. We are interested in the x̄-minimum-norm solution x† of problem (1.1), that is, x† satisfies

F(x†) = y and ‖x† − x̄‖ = min { ‖x − x̄‖ | x ∈ D(F), F(x) = y }.

We assume throughout this paper that yδ ∈ Y are the available noisy data with

‖y − yδ‖ ≤ δ   (1.2)

0266-5611/03/010001+21$30.00 © 2003 IOP Publishing Ltd Printed in the UK


and known noise level δ. We further assume that F possesses a locally uniformly bounded Fréchet derivative F′(·) in a ball Br(x†) of radius r around x† ∈ X.

For the stable numerical solution of nonlinear ill posed problems regularization methods are necessary. While for linear ill posed problems the regularization theory is rather complete (cf [2–4, 10, 12, 15, 22, 33, 35]), there are still many open problems in the regularization theory for nonlinear ill posed problems. For nonlinear ill posed problems, Tikhonov regularization (cf [4, 5, 9, 16, 17, 19–21, 27, 29, 30]) is known as one of the most widely applied methods. In this method a regularized approximation x_α^δ is obtained by solving the minimization problem

min_{x ∈ D(F)} Jα(x),   Jα(x) = ‖F(x) − yδ‖² + α‖x − x̄‖²   (1.3)

with an initial guess x̄ ∈ X and a properly chosen regularization parameter α > 0. If x_α^δ is an interior point of D(F), then the regularized approximation x_α^δ satisfies the Euler equation

F′(x)∗[F(x) − yδ] + α(x − x̄) = 0   (1.4)

of Tikhonov's functional Jα(x). In formula (1.4) F′(x)∗ is the adjoint of the Fréchet derivative F′(x).
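The minimization problem (1.3) and its Euler equation (1.4) can be made concrete on a toy problem. The following is a minimal sketch, not the authors' method: it takes an assumed scalar forward operator F(x) = exp(x) with made-up data and parameter values, minimizes Jα by plain gradient descent, and checks that the computed minimizer satisfies the Euler equation.

```python
import math

def tikhonov_minimizer(F, dF, y_delta, x_bar, alpha, x0, steps=4000, lr=0.01):
    """Minimize J_alpha(x) = (F(x) - y_delta)^2 + alpha*(x - x_bar)^2 by gradient descent."""
    x = x0
    for _ in range(steps):
        grad = 2 * dF(x) * (F(x) - y_delta) + 2 * alpha * (x - x_bar)
        x -= lr * grad
    return x

# toy scalar problem (illustrative values): F(x) = exp(x), true solution x_true
x_true, delta, alpha, x_bar = 1.0, 0.01, 0.05, 0.0
y_delta = math.exp(x_true) + delta          # noisy data with noise level delta
x_reg = tikhonov_minimizer(math.exp, math.exp, y_delta, x_bar, alpha, x0=0.5)

# at an interior minimizer the Euler equation F'(x)*(F(x) - y_delta) + alpha*(x - x_bar) = 0 holds
residual = math.exp(x_reg) * (math.exp(x_reg) - y_delta) + alpha * (x_reg - x_bar)
```

The small residual of the Euler equation is the practical optimality check one would also use for an iterative solver in higher dimensions.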

One important question in the application of the regularization method (1.3) is the proper choice of the regularization parameter α > 0. With too little regularization, the regularized approximations x_α^δ are highly oscillatory due to noise amplification. With too much regularization, the regularized approximations are too close to the initial guess x̄ due to the limit relation lim_{α→∞} x_α^δ = x̄. Ideally, one should select the regularization parameter α such that the total error ‖x_α^δ − x†‖ is minimized. Since x† is unknown one has to choose alternative rules which choose α from quantities that arise during calculations. Although many rules have been proposed, very few of them are used in practice. Among the rules that have found their way into engineering practice, the most common are Morozov's discrepancy principle [23, 34], which requires the knowledge of a reliable lower bound of the noise level δ, and in the linear case some noise-free rules such as generalized cross validation [8, 36] and the L-curve method [13, 14]. On the other hand, noise-free rules occasionally fail. This can theoretically be justified since due to Bakushinskii [1] there is a negative result which tells us that for noise-free rules convergence x_α^δ → x† cannot be guaranteed. Besides Morozov's discrepancy principle, one prominent a posteriori rule that requires the knowledge of the noise level δ is the rule of Scherzer, Engl and Kunisch [27], which is an extension of the rule of Raus and Gfrerer [7, 25] to the nonlinear case (see rule 1 in section 3.1). However, this theoretically well justified method requires very strong assumptions which can seldom be satisfied in engineering problems; see section 3.1 for a more detailed discussion.
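For intuition, Morozov's discrepancy principle can be sketched on the same kind of scalar toy problem: since the discrepancy ‖F(x_α^δ) − yδ‖ grows monotonically with α here, the equation ‖F(x_α^δ) − yδ‖ = Cδ can be solved by bisection on a logarithmic α-scale. This is an illustrative sketch under assumed toy data (F = exp, arbitrary C and δ), not the procedure of the paper's experiments.

```python
import math

def regularized(alpha, y_delta, x_bar, x0=0.5, steps=4000, lr=0.005):
    # gradient descent on J(x) = (F(x)-y_delta)^2 + alpha*(x-x_bar)^2 with F = exp
    x = x0
    for _ in range(steps):
        x -= lr * (2 * math.exp(x) * (math.exp(x) - y_delta) + 2 * alpha * (x - x_bar))
    return x

def discrepancy_alpha(y_delta, delta, x_bar, C=1.5, lo=1e-6, hi=10.0):
    # bisection on a log scale for the monotone map alpha -> |F(x_alpha)-y_delta| - C*delta
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        resid = abs(math.exp(regularized(mid, y_delta, x_bar)) - y_delta)
        if resid > C * delta:
            hi = mid          # too much regularization, discrepancy too large
        else:
            lo = mid
    return math.sqrt(lo * hi)

delta = 0.05
y_delta = math.exp(1.0) + delta               # noisy data around x_true = 1
alpha = discrepancy_alpha(y_delta, delta, x_bar=0.0)
x_alpha = regularized(alpha, y_delta, x_bar=0.0)
```

By construction the chosen α matches the discrepancy to Cδ, so the data are fitted only up to the noise level rather than exactly.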

In this paper we mainly reconsider rule 1 and give, compared with foregoing papers [17–19, 27], theoretical justifications of this rule under weaker assumptions which are generally met in engineering practice. In addition, we study a second a posteriori rule (see rule 2 in section 3.1) which is a generalization of the monotone error rule (see [31, 32]) to the nonlinear case. Since rule 2 outperforms rule 1 in the case of linear ill posed problems (see [32]) it makes sense to study rule 2 also for nonlinear ill posed problems.

Some important properties of convergent a posteriori rules are order optimal Hölder type error bounds ‖x_α^δ − x†‖ = O(δ^{p/(p+1)}). For the concept of order optimality see, e.g., [5]. The proof of order optimal error bounds requires a source condition and a nonlinearity condition. We will separate our studies into the different cases p = 1, p ∈ [1, 2] and p ∈ (0, 1]. In the first two cases p = 1 and p ∈ [1, 2] our analysis requires the following two assumptions.

Assumption 1. There exist elements v ∈ Y and w ∈ Y such that with A = F′(x†)

x̄ − x† = A∗v and x̄ − x† = A∗(AA∗)^{(p−1)/2}w for p ≥ 1.


Assumption 2. The Fréchet derivative F′(·) is locally Lipschitz in a ball Br(x†) ⊂ D(F) of radius r around x† ∈ X, that is, there exists a Lipschitz constant L ≥ 0 with

‖F′(x) − F′(x0)‖ ≤ L‖x − x0‖ for all x, x0 ∈ Br(x†).

In the rough case p ∈ (0, 1] we do not know if Lipschitz continuity of the Fréchet derivative F′(·) is sufficient for guaranteeing order optimal error bounds. In this case, instead of assumption 1 and assumption 2, we will exploit the following two assumptions.

Assumption 3. There exists an element w ∈ X such that with A = F′(x†)

x̄ − x† = (A∗A)^{p/2}w for p > 0.

Assumption 4. There exists a constant k0 ≥ 0 such that for all x, x0 ∈ Br(x†) ⊂ D(F) and v ∈ X there exists an element k(x, x0, v) ∈ X with the property

[F′(x) − F′(x0)]v = F′(x0)k(x, x0, v) and ‖k(x, x0, v)‖ ≤ k0‖x − x0‖‖v‖.

Note that the converse results in [4] for linear ill posed problems imply that assumption 1 or assumption 3, respectively, is necessary for the convergence rate ‖x_α^δ − x†‖ = O(δ^{p/(p+1)}). However, the verification of assumption 1 or assumption 3, respectively, is very hard or even impossible for the majority of applied nonlinear ill posed problems. Since the operator F′(x†) is generally smoothing, these assumptions may be considered as abstract smoothness conditions concerning the (unknown) difference element x̄ − x† and become stronger for larger p-values. Further note that assumption 4 implies assumption 2 with L = k0‖F′(x0)‖. Hence, assumption 2 is weaker than assumption 4. An example of a nonlinear ill posed problem satisfying assumption 2 but not assumption 4 is the autoconvolution problem discussed in [9].

Let us collect two well known estimates which will be applied throughout this paper. The first estimate

‖F(x) − F(x0) − F′(x0)(x − x0)‖ ≤ (L/2)‖x − x0‖²   (1.5)

follows from assumption 2. The second estimate (see, e.g., [4]) follows from spectral theory and tells us that for linear operators A ∈ L(X, Y) and ν ∈ [0, 1]

‖(AA∗ + αI)⁻¹(AA∗)^ν‖ ≤ c_ν α^{ν−1}   (1.6)

with a constant c_ν satisfying c_ν ≤ ν^ν(1 − ν)^{1−ν} ≤ 1.

The paper is organized as follows. In section 2 we study the case of a priori parameter choice. These studies are mainly based on functional techniques and are essential for the study of the a posteriori rule 1 and rule 2 in section 3. Both sections 2 and 3 are divided into subsections in which, for technical reasons, the special cases p = 1, p ∈ [1, 2] and p ∈ (0, 1], respectively, are treated separately. The order optimal error bounds in sections 2 and 3 reduce to well known error bounds for linear ill posed problems provided L = 0 in assumption 2 or k0 = 0 in assumption 4. Finally, in section 4 we provide numerical experiments that illustrate some of the theoretical results.
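Since for a self-adjoint argument the operator norm in (1.6) is a supremum over the spectrum, the estimate reduces to the scalar bound sup_{λ≥0} λ^ν/(λ + α) ≤ c_ν α^{ν−1} with c_ν = ν^ν(1 − ν)^{1−ν}, and equality is attained at λ* = να/(1 − ν). A small sketch can check this numerically; the grid and parameter values below are arbitrary choices for illustration.

```python
import math

def sup_filter(nu, alpha, n=20000, lam_max=100.0):
    # sampled supremum of lambda^nu / (lambda + alpha) over lambda in (0, lam_max]
    return max((lam_max * i / n) ** nu / (lam_max * i / n + alpha)
               for i in range(1, n + 1))

for nu in (0.25, 0.5, 0.75):
    for alpha in (0.01, 0.1, 1.0):
        c_nu = nu ** nu * (1 - nu) ** (1 - nu)
        bound = c_nu * alpha ** (nu - 1)
        assert sup_filter(nu, alpha) <= bound + 1e-12   # estimate (1.6), scalar form
        lam_star = nu * alpha / (1 - nu)                 # maximizer: bound is attained here
        assert abs(lam_star ** nu / (lam_star + alpha) - bound) < 1e-9 * bound
```

The check at λ* shows that the constant c_ν in (1.6) cannot be improved.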

2. A priori parameter choice

In this section we prove some new error bounds for the method of Tikhonov regularization in the case of a priori parameter choice. These estimates are mainly based on functional techniques and will be exploited in studying a posteriori parameter choice in section 3. Let us start with a well known preliminary result which follows from the minimizing property of the regularized approximation x_α^δ of problem (1.3).


Proposition 2.1. Let x_α^δ be a minimizer of Tikhonov's functional (1.3). Then,

‖x_α^δ − x̄‖ ≤ δ/√α + ‖x† − x̄‖ and ‖x_α^δ − x†‖ ≤ δ/√α + 2‖x̄ − x†‖.   (2.1)

Estimate (2.1) shows that x_α^δ ∈ Br(x†) with radius r = δ/√α + 2‖x̄ − x†‖. In order to guarantee order optimal error bounds of Hölder type ‖x_α^δ − x†‖ = O(δ^{p/(p+1)}) a source condition and a nonlinearity condition are needed. In our first subsection we will consider the special case p = 1 in assumption 1.
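The bounds of proposition 2.1 follow from nothing more than Jα(x_α^δ) ≤ Jα(x†), so they can be sanity-checked on a toy scalar problem. In the sketch below F = exp, the data and all parameter values are illustrative assumptions, and the regularized solution is computed by gradient descent.

```python
import math

def minimize_J(alpha, y_delta, x_bar, x0=0.5, steps=4000, lr=0.01):
    # gradient descent on J(x) = (F(x) - y_delta)^2 + alpha*(x - x_bar)^2 with F = exp
    x = x0
    for _ in range(steps):
        x -= lr * (2 * math.exp(x) * (math.exp(x) - y_delta) + 2 * alpha * (x - x_bar))
    return x

x_dagger, x_bar, delta, alpha = 1.0, 0.0, 0.05, 0.1
y_delta = math.exp(x_dagger) + delta
x_reg = minimize_J(alpha, y_delta, x_bar)

# proposition 2.1: |x_reg - x_bar|    <= delta/sqrt(alpha) + |x_dagger - x_bar|
#                  |x_reg - x_dagger| <= delta/sqrt(alpha) + 2*|x_bar - x_dagger|
lhs1 = abs(x_reg - x_bar)
rhs1 = delta / math.sqrt(alpha) + abs(x_dagger - x_bar)
lhs2 = abs(x_reg - x_dagger)
rhs2 = delta / math.sqrt(alpha) + 2 * abs(x_bar - x_dagger)
```

Both inequalities hold with room to spare here; the point of (2.1) in the paper is that they hold without any source or nonlinearity condition.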

2.1. Error bounds in the case p = 1

In this subsection we treat the special case p = 1 in assumption 1. This case has been studied first in [5]. From [4] we have

Theorem 2.2. Let x_α^δ ∈ D(F) be a solution of problem (1.3). Assume assumption 1 with p = 1 and assumption 2 with radius r = δ/√α + 2‖x̄ − x†‖. If L‖v‖ < 1, then

‖x_α^δ − x†‖ ≤ (δ + α‖v‖)/(√α √(1 − L‖v‖)) and ‖F(x_α^δ) − yδ‖ ≤ δ + 2α‖v‖.   (2.2)

If α is chosen a priori by α ∼ δ, then ‖x_α^δ − x†‖ = O(δ^{1/2}).

In the next two theorems we treat the two error terms ‖x_α − x†‖ and ‖x_α^δ − x_α‖ separately. These estimates are useful for the study of a posteriori parameter choice in section 3. In addition, our error bound for ‖x_α^δ − x_α‖ in theorem 2.4 improves a corresponding stability estimate which has been derived in [26].

Theorem 2.3. Let x_α be a solution of problem (1.3) with yδ replaced by the exact data y. Assume assumption 1 with p = 1 and assumption 2 with radius r = 2‖x̄ − x†‖. If 3L‖v‖ ≤ 2, then

‖x_α − x†‖ ≤ √α ‖v‖/(2√(1 − L‖v‖)) and ‖F(x_α) − y‖ ≤ α‖v‖.   (2.3)

Proof. Let A_α = F′(x_α). Using the Euler equation A_α∗[F(x_α) − y] + α(x_α − x̄) = 0 yields

α‖x_α − x†‖² = α(x_α − x†, x̄ − x†) + α(x_α − x†, x_α − x̄)
= α(x_α − x†, x̄ − x†) + (A_α(x† − x_α), F(x_α) − y).

We add ‖F(x_α) − y‖² on both sides, use the representation x̄ − x† = A∗v of assumption 1, exploit in addition (1.5) and obtain

‖F(x_α) − y‖² + α‖x_α − x†‖² = α(y + A(x_α − x†) − F(x_α), v) + α(F(x_α) − y, v) + (F(x_α) + A_α(x† − x_α) − y, F(x_α) − y)
≤ (αL‖v‖/2)‖x_α − x†‖² + α‖v‖‖F(x_α) − y‖ + (L/2)‖x_α − x†‖²‖F(x_α) − y‖.   (2.4)

From (2.2) with δ = 0 we know that ‖F(x_α) − y‖ ≤ 2α‖v‖. Consequently,

‖F(x_α) − y‖² + α‖x_α − x†‖² ≤ (3αL‖v‖/2)‖x_α − x†‖² + α‖v‖‖F(x_α) − y‖.   (2.5)

We use the assumption 3L‖v‖ ≤ 2 and obtain ‖F(x_α) − y‖² ≤ α‖v‖‖F(x_α) − y‖, which provides the second estimate of (2.3). To prove the first estimate of (2.3) we use (2.4) and the second estimate of (2.3) as well as the estimate 2ab ≤ a² + b² and obtain

‖F(x_α) − y‖² + α‖x_α − x†‖² ≤ αL‖v‖‖x_α − x†‖² + α‖v‖‖F(x_α) − y‖
≤ αL‖v‖‖x_α − x†‖² + α²‖v‖²/4 + ‖F(x_α) − y‖²

which gives α(1 − L‖v‖)‖x_α − x†‖² ≤ α²‖v‖²/4, and hence the first estimate of (2.3). □


Theorem 2.4. Let x_α^δ ∈ D(F) be a solution of problem (1.3) and x_α ∈ D(F) a solution of problem (1.3) with yδ replaced by the exact data y. Assume assumption 1 with p = 1 and assumption 2 with radius r = δ/√α + 2‖x̄ − x†‖. If 3L‖v‖ ≤ 2, then

‖x_α^δ − x_α‖ ≤ δ/(√α √(1 − L‖v‖)) and ‖F(x_α^δ) − yδ + y − F(x_α)‖ ≤ δ.   (2.6)

Proof. We use the inequality Jα(x_α^δ) ≤ Jα(x_α) and obtain

‖F(x_α^δ) − yδ‖² + α‖x_α^δ − x̄‖² ≤ ‖F(x_α) − yδ‖² + α‖x_α − x̄‖².

We add on both sides the expression

2(F(x_α^δ) − yδ, y − F(x_α)) + ‖F(x_α) − y‖² + α{2(x_α^δ − x̄, x̄ − x_α) + ‖x_α − x̄‖²}

and obtain, by using the Euler equation F′(x_α)∗(F(x_α) − y) + α(x_α − x̄) = 0 and (1.5),

‖F(x_α^δ) − yδ + y − F(x_α)‖² + α‖x_α^δ − x_α‖² ≤ ‖F(x_α) − yδ‖² + ‖F(x_α) − y‖² + 2(F(x_α^δ) − yδ, y − F(x_α)) + 2α(x_α − x̄, x_α − x_α^δ)
= ‖y − yδ‖² + 2(F(x_α) − y, F(x_α) − F(x_α^δ)) + 2α(x_α − x̄, x_α − x_α^δ)
= ‖y − yδ‖² + 2(F(x_α) − y, F(x_α) + F′(x_α)(x_α^δ − x_α) − F(x_α^δ))
≤ δ² + L‖F(x_α) − y‖‖x_α^δ − x_α‖².   (2.7)

From (2.7) and the second estimate of (2.3) we obtain

α(1 − L‖v‖)‖x_α^δ − x_α‖² ≤ δ²

which gives the first estimate of (2.6). The second estimate of (2.6) follows from (2.7), the second estimate of (2.3) and L‖v‖ ≤ 1. □

2.2. Error bounds in the case p ∈ [1, 2]

In this subsection we shall provide a new order optimal error bound for the total error ‖x_α^δ − x†‖ in the case of assumption 1 with p ∈ [1, 2]. Error bounds for the range p ∈ [1, 2] were studied first in [24]. The proof in [24] uses functional techniques and exploits, as theorem 2.2, the inequality Jα(x_α^δ) ≤ Jα(x†). This in turn requires quite strong assumptions. Our idea of proof requires less strong assumptions and is based on the inequality Jα(x_α^δ) ≤ Jα(x† + αBv) with B = A∗(AA∗ + αI)⁻¹ and A = F′(x†). The resulting estimate improves a corresponding bound given in [30] and will be exploited for providing error bounds in the case of a posteriori parameter choice in section 3.

Theorem 2.5. Let x_α^δ ∈ D(F) be a solution of problem (1.3). Assume assumption 1 with fixed p ∈ [1, 2] and assumption 2 with radius r = δ/√α + 2‖x̄ − x†‖. If L‖v‖ < 1, then

‖x_α^δ − x†‖ ≤ δ/(√α √(1 − L‖v‖)) + α^{p/2}‖w‖ (1 + L‖v‖/2)/√(1 − L‖v‖).   (2.8)

If α is chosen a priori by α ∼ δ^{2/(p+1)}, then ‖x_α^δ − x†‖ = O(δ^{p/(p+1)}).

Proof. Let us introduce the notations A = F′(x†) and B = A∗(AA∗ + αI)⁻¹ = (A∗A + αI)⁻¹A∗. Since ‖αBv‖ ≤ ‖x̄ − x†‖ we have x† + αBv ∈ D(F). Consequently, Jα(x† + αBv) is well defined and the inequality Jα(x_α^δ) ≤ Jα(x† + αBv) provides

α‖x_α^δ − x†‖² ≤ ‖yδ − F(x†) − αABv + αv‖² + α³‖Bv‖² + 2α(F(x†) + A(x_α^δ − x†) − F(x_α^δ), v) + ‖F(x† + αBv) − yδ‖² − ‖F(x†) + αABv − yδ‖²   (2.9)

(cf estimate (9) in [30]). From the estimates (10) and (11) in [30] we obtain for the first three summands s1 + s2 + s3 on the right-hand side of (2.9)

s1 + s2 + s3 ≤ (α^{(p+1)/2}‖w‖ + δ)² + αL‖v‖‖x_α^δ − x†‖².   (2.10)

We apply (1.5) with x = x† + αBv, the inequality ‖u‖² − ‖v‖² ≤ ‖u + v‖‖u − v‖, the triangle inequality and (1.2) as well as assumption 1 and obtain for the final two summands s4 + s5 on the right-hand side of (2.9) the estimate

s4 + s5 ≤ ‖F(x† + αBv) + αABv + F(x†) − 2yδ‖ ‖F(x† + αBv) − αABv − F(x†)‖
≤ {‖F(x† + αBv) − αABv − F(x†)‖ + 2‖αABv + F(x†) − yδ‖} (L/2)‖αBv‖²
≤ {(L/2)‖αBv‖² + 2α‖v‖ + 2δ} (L/2)‖αBv‖²

where we have used the estimate ‖αABv‖ ≤ α‖v‖. Now we apply the two valid estimates ‖αBv‖² ≤ α^{(p+1)/2}‖v‖‖w‖ and ‖αBv‖ ≤ α^{p/2}‖w‖ that follow from assumption 1 and (1.6), and obtain

s4 + s5 ≤ (L²/4)α^{p+1}‖v‖²‖w‖² + Lα^{p+1}‖v‖‖w‖² + Lδα^{(p+1)/2}‖v‖‖w‖.   (2.11)

Substituting (2.10), (2.11) into (2.9) provides

α(1 − L‖v‖)‖x_α^δ − x†‖² ≤ (δ + α^{(p+1)/2}‖w‖(1 + L‖v‖/2))².

From this estimate we obtain (2.8). The convergence rate result follows from (2.8) together with the a priori parameter choice of α. □
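The rate in theorem 2.5 comes from balancing the noise term δ/√α against the approximation term α^{p/2}‖w‖ in (2.8). With constants suppressed (‖w‖ = 1, noise factor 1/2 — an illustration, not the paper's computation), the model bound g(α) = α^{p/2} + δ/(2√α) scales exactly: its minimum over α is a fixed multiple of δ^{p/(p+1)}, which a small numerical sketch confirms.

```python
import math

def best_bound(delta, p, num=2000):
    # minimize g(alpha) = alpha^(p/2) + delta/(2*sqrt(alpha)) over a geometric alpha-grid
    best = float("inf")
    for i in range(num):
        alpha = 10.0 ** (-12 + 13 * i / (num - 1))   # alpha in [1e-12, 10]
        best = min(best, alpha ** (p / 2) + delta / (2 * math.sqrt(alpha)))
    return best

p = 1.0
# the ratio min_alpha g / delta^(p/(p+1)) should be (nearly) the same constant K for all delta
ratios = [best_bound(d, p) / d ** (p / (p + 1)) for d in (1e-2, 1e-4, 1e-6)]
```

The substitution α = δ^{2/(p+1)}β shows why: g(α; δ) = δ^{p/(p+1)}(β^{p/2} + 1/(2√β)), so the minimizing α itself scales like δ^{2/(p+1)}, matching the a priori choice in the theorem.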

2.3. Error bounds in the case p ∈ (0, 1]

The theorems in sections 2.1 and 2.2 treat the smooth case p ∈ [1, 2] in assumption 1. In the rough case p ∈ (0, 1) in assumption 1 or assumption 3, respectively, we do not know if order optimal error bounds hold true with the nonlinearity assumption 2. Nonlinearity assumptions of a different kind which are stronger than assumption 2 have been exploited in [16, 18, 20]. Here we exploit the nonlinearity assumption 4 from [18] which allows us to treat not only the case p ∈ (0, 1], but even the general case p ∈ (0, 2]. The resulting error bounds will be exploited in studying the case of a posteriori parameter choice in section 3.4. Our first theorem provides an error bound for ‖x_α − x†‖ and improves a corresponding result in [18].

Theorem 2.6. Let x_α be a solution of the regularized problem (1.3) with yδ replaced by the exact data y. Assume assumption 3 with p ∈ (0, 2] and assumption 4 with radius r = 2‖x̄ − x†‖. If 2k0‖x̄ − x†‖ < 1, then

‖x_α − x†‖ ≤ α^{p/2}‖w‖/(1 − 2k0‖x̄ − x†‖).   (2.12)

Proof. Let us use the notations A = F′(x†) and A_α = F′(x_α). We apply the Euler equation A_α∗[F(x_α) − y] + α(x_α − x̄) = 0 and obtain

(A∗A + αI)(x_α − x†) = A∗A(x_α − x†) + A_α∗[y − F(x_α)] + α(x̄ − x†)
= α(x̄ − x†) + (A_α∗ − A∗)[y − F(x_α)] + A∗[y + A(x_α − x†) − F(x_α)].

Multiplying both sides by (A∗A + αI)⁻¹ yields

x_α − x† = α(A∗A + αI)⁻¹(x̄ − x†) + (A∗A + αI)⁻¹(A_α∗ − A∗)[y − F(x_α)] + (A∗A + αI)⁻¹A∗[y + A(x_α − x†) − F(x_α)].   (2.13)

To estimate the first summand s1 on the right-hand side of (2.13) we use assumption 3 as well as estimate (1.6) and obtain

‖s1‖ = ‖α(A∗A + αI)⁻¹(A∗A)^{p/2}w‖ ≤ α^{p/2}‖w‖.   (2.14)

To estimate the second summand s2 on the right-hand side of (2.13) we use assumption 4 and the Euler equation A_α∗[F(x_α) − y] + α(x_α − x̄) = 0 as well as the estimate ‖x̄ − x_α‖ ≤ ‖x̄ − x†‖ which follows from proposition 2.1 with δ = 0, and obtain along the lines of estimate (A.7) in [19]

‖s2‖ ≤ k0‖x̄ − x†‖‖x_α − x†‖.   (2.15)

From the mean value theorem and assumption 4 we obtain

F(x2) − F(x1) − F′(x1)(x2 − x1) = ∫₀¹ [F′(x1 + t(x2 − x1)) − F′(x1)](x2 − x1) dt
= F′(x1) ∫₀¹ k(x1 + t(x2 − x1), x1, x2 − x1) dt.   (2.16)

We use the representation (2.16) with x1 = x†, x2 = x_α and obtain for the third summand on the right-hand side of (2.13)

‖s3‖ ≤ ‖(A∗A + αI)⁻¹A∗A‖ ∫₀¹ ‖k(x† + t(x_α − x†), x†, x_α − x†)‖ dt ≤ (k0/2)‖x_α − x†‖² ≤ k0‖x̄ − x†‖‖x_α − x†‖.   (2.17)

Now (2.12) follows from (2.13)–(2.15) and (2.17). □

Our next theorem provides an error bound for the error ‖x_α^δ − x_α‖. In contrast to the error bounds given in theorem 2.4 no source condition is required. This theorem improves stability results which have been obtained in the papers [27] and [19].

Theorem 2.7. Let x_α^δ be a solution of problem (1.3) and x_α a solution of (1.3) with yδ replaced by the exact data y. Assume assumption 4 with radius r = δ/√α + 2‖x̄ − x†‖. If k0‖x† − x̄‖ < 1, then

‖x_α^δ − x_α‖ ≤ δ/(√α √(1 − k0‖x† − x̄‖)) and ‖F(x_α^δ) − yδ + y − F(x_α)‖ ≤ δ.   (2.18)

Proof. Using the first part of (2.7), the representation (2.16), the Euler equation (1.4) with δ = 0 and assumption 4 as well as the estimate ‖x_α − x̄‖ ≤ ‖x† − x̄‖ yields

‖F(x_α^δ) − yδ + y − F(x_α)‖² + α‖x_α^δ − x_α‖²
≤ δ² + 2(F(x_α) − y, F(x_α) + F′(x_α)(x_α^δ − x_α) − F(x_α^δ))
= δ² + 2(F(x_α) − y, F′(x_α) ∫₀¹ k(x_α + t(x_α^δ − x_α), x_α, x_α^δ − x_α) dt)
= δ² + 2(α(x̄ − x_α), ∫₀¹ k(x_α + t(x_α^δ − x_α), x_α, x_α^δ − x_α) dt)
≤ δ² + 2α‖x_α − x̄‖ ∫₀¹ ‖k(x_α + t(x_α^δ − x_α), x_α, x_α^δ − x_α)‖ dt
≤ δ² + αk0‖x† − x̄‖‖x_α^δ − x_α‖².   (2.19)

We neglect the first summand on the left-hand side, rearrange terms and obtain the estimate α(1 − k0‖x† − x̄‖)‖x_α^δ − x_α‖² ≤ δ², which gives the first inequality of (2.18). The second inequality of (2.18) follows from (2.19) and k0‖x† − x̄‖ < 1. □


3. A posteriori parameter choice

3.1. A posteriori rules

Throughout this section we will consistently use the notations

A = F′(x†),   R = α(AA∗ + αI)⁻¹,   R̂ = α(A∗A + αI)⁻¹,
A_α = F′(x_α),   R_α = α(A_αA_α∗ + αI)⁻¹,   R̂_α = α(A_α∗A_α + αI)⁻¹,
A_α^δ = F′(x_α^δ),   R_α^δ = α(A_α^δ(A_α^δ)∗ + αI)⁻¹,   R̂_α^δ = α((A_α^δ)∗A_α^δ + αI)⁻¹,

where the hat marks the operator acting on X.

A priori parameter choice is not suitable in practice since a good regularization parameter α requires the knowledge of the norm ‖w‖ and the smoothness parameter p of assumption 1 or assumption 3, respectively. This knowledge is not necessary for a posteriori parameter choice. For the method of Tikhonov regularization one well known a posteriori rule is Morozov's discrepancy principle (see [4, 23, 34]) in which the regularization parameter α is chosen as the solution of the nonlinear scalar equation ‖F(x_α^δ) − yδ‖ = Cδ with a constant C > 1. Morozov's discrepancy principle possesses some nice order optimality properties discussed in [34]. However, the best possible convergence rate is known to be ‖x_α^δ − x†‖ = O(√δ). An implementable a posteriori rule for choosing the regularization parameter α in the method of Tikhonov regularization (1.3), for which order optimal convergence rates up to the best possible order O(δ^{2/3}) can be guaranteed, has been studied in [17, 19, 27].

Rule 1. Choose the regularization parameter α as the solution of the equation

d1(α) := ‖(R_α^δ)^{1/2}[F(x_α^δ) − yδ]‖ = Cδ.   (3.1)

A similar a posteriori rule which is based on monotonicity arguments has been proposed in [32].

Rule 2. Choose the regularization parameter α as the solution of the equation

d2(α) := ‖(R_α^δ)^{1/2}[F(x_α^δ) − yδ]‖² / ‖R_α^δ[F(x_α^δ) − yδ]‖ = Cδ.   (3.2)

For ill posed problems with linear operators F it is well known that rule 2 always provides a more accurate regularized solution than rule 1 (see [31]). Therefore it also makes sense to study this rule in the case of nonlinear ill posed problems.

Rule 1 has been studied in different papers. In [27] it has been shown that under assumption 3 and some further strong nonlinearity conditions concerning F (compare the assumptions (10)–(14), (93)–(98) and (151) in [27]) order optimal error bounds ‖x_α^δ − x†‖ = O(δ^{p/(p+1)}) can be guaranteed for the maximal range p ∈ (0, 2]. The strong nonlinearity conditions in [27] concerning F have been weakened in [19] for the range p ∈ (0, 2], where assumption 4 is exploited, and in [17] for p = 2, where assumption 2 is exploited.

In our forthcoming sections we consider the range p ∈ (0, 2] and divide our studies into the special cases p = 1, p ∈ [1, 2] and p ∈ (0, 1]. Compared with foregoing papers [17, 19, 27] we obtain order optimal error bounds under much weaker assumptions. We found the most important improvements for the special cases p = 1 and p ∈ (1, 2]. In these cases the nonlinearity assumption 4 is not necessary and assumption 2 is sufficient. Moreover, also for the special case p ∈ (0, 1], in which the nonlinearity assumption 4 is exploited, we have found improvements of former results. Our order optimal error bounds are sharper and are valid for arbitrary constants C > 1 in both rule 1 and rule 2 (provided k0‖x̄ − x†‖ is sufficiently small), while in foregoing papers order optimal error bounds for rule 1 could only be established for large C-values.
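The two rule functionals are easy to compute for a toy linear problem, where for a diagonal operator the Tikhonov solution, residual and filter R_α^δ are all available componentwise; one can then observe the ordering d1(α) ≤ d2(α) ≤ ‖F(x_α^δ) − yδ‖, which follows from ‖R_α^δ‖ ≤ 1 and the Cauchy–Schwarz inequality. The operator, data and noise below are illustrative assumptions.

```python
import math

# toy linear ill posed problem: diagonal operator with decaying singular values
sigma = [1.0 / (i + 1) for i in range(20)]
x_bar = [0.0] * 20
y_delta = [s * 1.0 + 0.01 for s in sigma]   # noisy data around x_dagger = (1, ..., 1)

def d1_d2(alpha):
    # Tikhonov solution, residual and filter, componentwise for the diagonal operator
    x = [(s * y + alpha * xb) / (s * s + alpha)
         for s, y, xb in zip(sigma, y_delta, x_bar)]
    r = [s * xi - y for s, xi, y in zip(sigma, x, y_delta)]
    R = [alpha / (s * s + alpha) for s in sigma]   # diagonal of alpha*(A A* + alpha I)^(-1)
    a = sum(Ri * ri * ri for Ri, ri in zip(R, r))                # ||R^(1/2) r||^2
    b = math.sqrt(sum((Ri * ri) ** 2 for Ri, ri in zip(R, r)))   # ||R r||
    res = math.sqrt(sum(ri * ri for ri in r))                    # ||r||
    return math.sqrt(a), a / b, res    # d1, d2, discrepancy

for alpha in (1e-4, 1e-2, 1.0):
    d1, d2, res = d1_d2(alpha)
    assert d1 <= d2 <= res   # ordering used in the proofs of propositions 3.1 and 3.2
```

Since d2 sits between d1 and the plain discrepancy, rule 2 picks a smaller α than rule 1 for the same C, which is the heuristic reason it can be more accurate.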


Let us start our study with the justification of rule 1 and rule 2. In [17] conditions are given that guarantee that equation (3.1) has a solution α = α(δ). In an analogous way it can be shown that under the same conditions equation (3.2) also has a solution α = α(δ).

Proposition 3.1. Let C > 1 and ‖F(x̄) − yδ‖ > Cδ. If the regularized problem (1.3) has a unique solution x_α^δ for each

α ≥ α0 := (C − 1)²δ²/‖x̄ − x†‖²,   (3.3)

then both equations (3.1) and (3.2) possess a solution α = α1(δ) and α = α2(δ), respectively, and both solutions satisfy (3.3).

The proof of this proposition for rule 1 may be found in [17] and is based on the facts that d1(α) is continuous for α ∈ [α0, ∞), that lim_{α→∞} d1(α) = ‖F(x̄) − yδ‖ and that

d1(α0) ≤ ‖F(x_{α0}^δ) − yδ‖ ≤ δ + √α0 ‖x̄ − x†‖ = Cδ

which follows from the estimates ‖(R_{α0}^δ)^{1/2}‖ ≤ 1, ‖F(x_α^δ) − yδ‖² ≤ δ² + α‖x̄ − x†‖² and the definition of α0. For rule 2 the proof of proposition 3.1 can be performed analogously.

In our forthcoming considerations we always assume that rule 1 and rule 2 are well defined and do not state the conditions explicitly.

3.2. Error bounds in the case p = 1

In order to prove order optimal error bounds for ‖x_α^δ − x†‖ with α chosen from rule 1 or rule 2, respectively, a preparatory proposition is required. This proposition gives a lower bound for the regularization parameters α = α(δ) obtained by rule 1 and rule 2 which is sharper than the bound (3.3) in proposition 3.1.

Proposition 3.2. Let α be chosen by rule 1 or rule 2 with C > 1. Let assumption 1 with p = 1 and assumption 2 with radius r = δ/√α + 2‖x̄ − x†‖ hold. If L‖v‖ < 1, then

α ≥ (C − 1)δ/‖v‖.   (3.4)

Proof. From rule 1, ‖R_α^δ‖ ≤ 1, the second inequality of (2.6) and the second inequality of (2.3) we have

Cδ = ‖(R_α^δ)^{1/2}[F(x_α^δ) − yδ]‖ ≤ ‖F(x_α^δ) − yδ‖ ≤ δ + α‖v‖

which gives (3.4) for rule 1. In order to prove (3.4) for rule 2 we use the estimate

Cδ = ‖(R_α^δ)^{1/2}[F(x_α^δ) − yδ]‖² / ‖R_α^δ[F(x_α^δ) − yδ]‖ ≤ ‖F(x_α^δ) − yδ‖

and proceed along the lines of the proof for rule 1. □

The next theorem shows that order optimal error bounds for ‖x_α^δ − x†‖ are valid under assumption 1 with p = 1 and assumption 2, provided the regularization parameter α is chosen a posteriori by rule 1 or rule 2 with C > 1, respectively. The proof is based on some ideas in [32] where order optimal error bounds have been established for rule 1 and rule 2 in the case of linear ill posed operator equations.


Theorem 3.3. Let α be chosen by rule 1 or rule 2 with C > 1. Let assumption 1 with p = 1 and assumption 2 with radius r = δ/√α + 2‖x̄ − x†‖ hold. Suppose L‖v‖ is sufficiently small that 3L‖v‖ ≤ 2 and

ε1 := √(L‖v‖/2) + L‖v‖ + (L‖v‖(C + 1))/(8(C − 1)√(1 − L‖v‖)) < 1   (3.5)

hold. Then,

‖x_α^δ − x†‖ ≤ (1/(1 − ε1)) {(C + 1)^{1/2} + (1/2)(C − 1)^{−1/2}} ‖v‖^{1/2} δ^{1/2}.   (3.6)

Proof. First, let us prove the theorem for rule 1. We apply the Euler equation (1.4) with x = x_α^δ and obtain

((A_α^δ)∗A_α^δ + αI)(x_α^δ − x†) = (A_α^δ)∗A_α^δ(x_α^δ − x†) + α(x_α^δ − x̄ + x̄ − x†)
= (A_α^δ)∗A_α^δ(x_α^δ − x†) + (A_α^δ)∗[yδ − F(x_α^δ)] + α(x̄ − x†)
= α(x̄ − x†) + (A_α^δ)∗[yδ + A_α^δ(x_α^δ − x†) − F(x_α^δ)].

Multiplying both sides by ((A_α^δ)∗A_α^δ + αI)⁻¹ yields the error representation

x_α^δ − x† = R̂_α^δ(x̄ − x†) + ((A_α^δ)∗A_α^δ + αI)⁻¹(A_α^δ)∗[yδ + A_α^δ(x_α^δ − x†) − F(x_α^δ)].   (3.7)

Due to (3.7), (1.6), (1.2) and (1.5), the first estimates of (2.3), (2.6) and the estimate δ/α ≤ ‖v‖/(C − 1) which follows from (3.4) we obtain

‖x_α^δ − x†‖ ≤ ‖R̂_α^δ(x̄ − x†)‖ + (1/(2√α))(δ + (L/2)‖x_α^δ − x†‖²)
≤ ‖R̂_α^δ(x̄ − x†)‖ + (1/(2√α))[δ + (L/2) · (δ + α‖v‖/2)/(√α √(1 − L‖v‖)) · ‖x_α^δ − x†‖]
≤ ‖R̂_α^δ(x̄ − x†)‖ + (1/2)√(‖v‖/(C − 1)) √δ + (L‖v‖(C + 1))/(8(C − 1)√(1 − L‖v‖)) ‖x_α^δ − x†‖.   (3.8)

From assumption 1 with p = 1, ‖R̂_α^δ‖ ≤ 1 and assumption 2 we obtain

‖R̂_α^δ(x̄ − x†)‖² = (R̂_α^δ(x̄ − x†), R̂_α^δ(A_α^δ)∗v) + (R̂_α^δ(x̄ − x†), R̂_α^δ[A∗ − (A_α^δ)∗]v)
≤ (R̂_α^δ(x̄ − x†), R̂_α^δ(A_α^δ)∗v) + ‖R̂_α^δ(x̄ − x†)‖ L‖v‖ ‖x_α^δ − x†‖

which gives, using the implication c² ≤ a + bc ⇒ c ≤ √|a| + b and ‖(R_α^δ)^{1/2}‖ ≤ 1,

‖R̂_α^δ(x̄ − x†)‖ ≤ √|(R̂_α^δ(x̄ − x†), R̂_α^δ(A_α^δ)∗v)| + L‖v‖‖x_α^δ − x†‖
≤ ‖(R_α^δ)^{3/2}A_α^δ(x̄ − x†)‖^{1/2} ‖v‖^{1/2} + L‖v‖‖x_α^δ − x†‖.   (3.9)

We multiply the Euler equation (A_α^δ)∗[F(x_α^δ) − yδ] + α(x_α^δ − x̄) = 0 by A_α^δ and obtain the equation A_α^δ(A_α^δ)∗[F(x_α^δ) − yδ] = αA_α^δ(x̄ − x_α^δ). Adding α[F(x_α^δ) − yδ] on both sides and multiplying by (A_α^δ(A_α^δ)∗ + αI)⁻¹ yields

F(x_α^δ) − yδ = R_α^δ[F(x_α^δ) + A_α^δ(x̄ − x_α^δ) − yδ]
= R_α^δ[F(x_α^δ) + A_α^δ(x† − x_α^δ) − yδ] + R_α^δ A_α^δ(x̄ − x†).

Multiplying by (R_α^δ)^{1/2} provides

(R_α^δ)^{3/2}A_α^δ(x̄ − x†) = (R_α^δ)^{1/2}[F(x_α^δ) − yδ] − (R_α^δ)^{3/2}[F(x_α^δ) + A_α^δ(x† − x_α^δ) − yδ].   (3.10)

From (3.10) we obtain due to rule 1

‖(R_α^δ)^{3/2}A_α^δ(x̄ − x†)‖^{1/2} ≤ (Cδ + δ + (L/2)‖x_α^δ − x†‖²)^{1/2} ≤ √(C + 1) √δ + √(L/2) ‖x_α^δ − x†‖.   (3.11)

Now the proof of the theorem for rule 1 follows from (3.8), (3.9) and (3.11). In order to prove the theorem for rule 2 we note that rule 1 has been exploited in the estimates (3.8) and (3.11). However, due to proposition 3.2 and d1(α) ≤ d2(α), the estimates (3.8) and (3.11) are also valid for rule 2. Hence, the proof is complete. □

In the second part of this subsection we shall prove ‖x_α − x†‖ = O(√δ) provided α is chosen by rule 1. This result will especially be useful for studying the smooth case p ∈ [1, 2] in section 3.3. We start our examinations with an auxiliary result whose proof is along the lines of lemma 4.2 in [17].

Proposition 3.4. Let assumption 2 hold and let x, z ∈ Br(x†). Define

A_x := F′(x), A_z := F′(z), R_x := α(A_xA_x∗ + αI)⁻¹ and R_z := α(A_zA_z∗ + αI)⁻¹.

Then for a = ‖R_x^{1/2}y‖² and b = ‖R_z^{1/2}y‖² with arbitrary y ∈ Y there holds

|a − b| ≤ (L‖x − z‖/√α)(a + b).   (3.12)

Theorem 3.5. Let $\alpha$ be chosen by rule 1 with $C > 1$. Suppose assumption 1 with $p = 1$ and assumption 2 with radius $r = 2\|\bar{x} - x^\dagger\|$. If $L\|v\|$ is sufficiently small that $3L\|v\| \le 2$,
\[
\varepsilon_3 := \frac{L\|v\|}{(C-1)\sqrt{1 - L\|v\|}} < 1
\qquad\text{and}\qquad
\varepsilon_4 := \sqrt{\frac{L\|v\|}{2}} + L\|v\| + \frac{L\|v\|}{8\sqrt{1 - L\|v\|}} < 1, \tag{3.13}
\]
then
\[
\|x_\alpha - x^\dagger\| \le k_4 \sqrt{\delta}
\qquad\text{with}\qquad
k_4 = \frac{\|v\|^{1/2}}{1 - \varepsilon_4}\,\Bigl[1 + C\sqrt{\frac{1+\varepsilon_3}{1-\varepsilon_3}}\Bigr]^{1/2}. \tag{3.14}
\]

Proof. From (3.7) with $\delta = 0$ we have the error representation
\[
x_\alpha - x^\dagger = R_\alpha(\bar{x} - x^\dagger) + (A_\alpha^* A_\alpha + \alpha I)^{-1} A_\alpha^*[y + A_\alpha(x_\alpha - x^\dagger) - F(x_\alpha)]. \tag{3.15}
\]
Due to (1.6), (1.5) and theorem 2.3 we obtain from (3.15) that
\[
\|x_\alpha - x^\dagger\| \le \|R_\alpha(\bar{x} - x^\dagger)\| + \frac{L\|v\|}{8\sqrt{1 - L\|v\|}}\,\|x_\alpha - x^\dagger\|. \tag{3.16}
\]

In analogy to the proof of (3.9) there follows
\[
\|R_\alpha(\bar{x} - x^\dagger)\| \le \|R_\alpha^{3/2} A_\alpha(\bar{x} - x^\dagger)\|^{1/2}\,\|v\|^{1/2} + L\|v\|\,\|x_\alpha - x^\dagger\|. \tag{3.17}
\]
From (3.10) with $\delta = 0$ we obtain the identity
\[
R_\alpha^{3/2} A_\alpha(\bar{x} - x^\dagger) = R_\alpha^{1/2}[F(x_\alpha) - y] - R_\alpha^{3/2}[F(x_\alpha) + A_\alpha(x^\dagger - x_\alpha) - y]. \tag{3.18}
\]

To estimate the first summand on the right-hand side of (3.18) we apply proposition 3.4 with $a = \|R_\alpha^{1/2}[F(x_\alpha^\delta) - y^\delta]\|^2$ and $b = \|(R_\alpha^\delta)^{1/2}[F(x_\alpha^\delta) - y^\delta]\|^2$ and obtain due to (2.6) and (3.4) the estimate
\[
|a - b| \le \frac{L\|x_\alpha - x_\alpha^\delta\|}{\sqrt{\alpha}}\,(a + b) \le \frac{L\delta}{\alpha\sqrt{1 - L\|v\|}}\,(a + b) \le \varepsilon_3(a + b)
\]
with $\varepsilon_3$ given in (3.13). This estimate provides $a \le (1 + \varepsilon_3)(1 - \varepsilon_3)^{-1} b$. Hence, due to the triangle inequality, $\|R_\alpha^{1/2}\| \le 1$, the second estimate of (2.6) and rule 1 we have
\[
\|R_\alpha^{1/2}[F(x_\alpha) - y]\|
 \le \|R_\alpha^{1/2}[F(x_\alpha^\delta) - y^\delta + y - F(x_\alpha)]\| + \|R_\alpha^{1/2}[F(x_\alpha^\delta) - y^\delta]\|
 \le \delta + \sqrt{\frac{1+\varepsilon_3}{1-\varepsilon_3}}\,\|(R_\alpha^\delta)^{1/2}[F(x_\alpha^\delta) - y^\delta]\|
 \le \Bigl[1 + C\sqrt{\frac{1+\varepsilon_3}{1-\varepsilon_3}}\Bigr]\delta. \tag{3.19}
\]


From (3.18), (3.19), $\|R_\alpha^{3/2}\| \le 1$, (1.5) and the estimate $\sqrt{a+b} \le \sqrt{a} + \sqrt{b}$ we obtain
\[
\|R_\alpha^{3/2} A_\alpha(\bar{x} - x^\dagger)\|^{1/2}
 \le \Bigl[1 + C\sqrt{\frac{1+\varepsilon_3}{1-\varepsilon_3}}\Bigr]^{1/2}\sqrt{\delta} + \sqrt{\tfrac{L}{2}}\,\|x_\alpha - x^\dagger\|. \tag{3.20}
\]
Now (3.14) follows from (3.16), (3.17) and (3.20). □

3.3. Error bounds in the case $p \in [1, 2]$

In this subsection we will show that it is possible to derive order optimal error bounds for $\|x_\alpha^\delta - x^\dagger\|$ under assumption 1 and assumption 2 for the special case $p \in [1, 2]$ provided the regularization parameter $\alpha$ is chosen from rule 1. Unfortunately, analogous results for rule 2 are unknown so far. Since the proof for $p = 1$ in section 3.2 fails for the general case $p \in [1, 2]$, we will use another method which combines some new ideas with those from [19], where the total error $\|x_\alpha^\delta - x^\dagger\|$ has been decomposed into the sum $\|x_\alpha - x_{\alpha_0}\| + \|x_{\alpha_0} - x^\dagger\| + \|x_\alpha^\delta - x_\alpha\|$ with a properly chosen $\alpha_0$. In order to maintain the right order $O(\delta^{p/(p+1)})$, the nonlinear terms appearing in the estimates have to be handled carefully. We start our study with a proposition which gives a lower bound for the regularization parameter $\alpha = \alpha(\delta)$ obtained by rule 1 which, for $p > 1$, is sharper than the bound (3.4).

Proposition 3.6. Let $\alpha = \alpha(\delta)$ be chosen by rule 1 with $C > 1$. Assume assumption 1 with $p \in [1, 2]$ and assumption 2 with radius $r = \delta/\sqrt{\alpha} + 2\|\bar{x} - x^\dagger\|$. If $L\|v\|$ is sufficiently small such that $3L\|v\| \le 2$ and $\varepsilon_3 < 1$ with $\varepsilon_3$ given in (3.13), then
\[
\alpha \ge \left(\frac{C-1}{\|w\|}\sqrt{\frac{1-\varepsilon_3}{1+\varepsilon_3}}\;\frac{1 - L\|v\|}{(1 + L\|v\|/2)^2}\right)^{2/(p+1)} \delta^{2/(p+1)}. \tag{3.21}
\]

Proof. From rule 1, $\|R_\alpha^\delta\| \le 1$ and the second inequality of (2.6) we have
\[
C\delta = \|(R_\alpha^\delta)^{1/2}[F(x_\alpha^\delta) - y^\delta]\|
 \le \|(R_\alpha^\delta)^{1/2}[F(x_\alpha) - y]\| + \|F(x_\alpha^\delta) - y^\delta + y - F(x_\alpha)\|
 \le \|(R_\alpha^\delta)^{1/2}[F(x_\alpha) - y]\| + \delta,
\]
which gives
\[
(C - 1)\delta \le \|(R_\alpha^\delta)^{1/2}[F(x_\alpha) - y]\|. \tag{3.22}
\]
Changing the roles of $x_\alpha$ and $x_\alpha^\delta$ we obtain in analogy to the first part of (3.19) that
\[
\|(R_\alpha^\delta)^{1/2}[F(x_\alpha) - y]\| \le \sqrt{\frac{1+\varepsilon_3}{1-\varepsilon_3}}\,\|R_\alpha^{1/2}[F(x_\alpha) - y]\|. \tag{3.23}
\]
To estimate $\|R_\alpha^{1/2}[F(x_\alpha) - y]\|$ in terms of $\alpha$ we use the triangle inequality, apply the estimates $\|R_\alpha^{1/2}\| \le 1$, $\|R_\alpha^{1/2} A_\alpha\| \le \sqrt{\alpha}$ and the two estimates (2.8) with $\delta = 0$ and (2.3), and obtain
\[
\|R_\alpha^{1/2}[F(x_\alpha) - y]\|
 \le \|R_\alpha^{1/2}[F(x_\alpha) + A_\alpha(x^\dagger - x_\alpha) - y]\| + \|R_\alpha^{1/2} A_\alpha(x_\alpha - x^\dagger)\|
 \le \Bigl\{\tfrac{L}{2}\|x_\alpha - x^\dagger\| + \sqrt{\alpha}\Bigr\}\|x_\alpha - x^\dagger\|
 \le \Bigl\{\frac{L\|v\|}{2\sqrt{1-L\|v\|}} + 1\Bigr\}\,\alpha^{(p+1)/2}\|w\|\,\frac{1 + L\|v\|/2}{\sqrt{1-L\|v\|}}
 \le \alpha^{(p+1)/2}\|w\|\left(\frac{1 + L\|v\|/2}{\sqrt{1-L\|v\|}}\right)^2. \tag{3.24}
\]
Now estimate (3.21) follows from (3.22)–(3.24). □

Our second proposition in this subsection shows that under appropriate conditions the norm $\|x_\alpha - x_{\alpha_0}\|$ with $0 < \alpha_0 \le \alpha$ and $\alpha$ chosen from rule 1 can be estimated properly.


Proposition 3.7. Let $\alpha$ be chosen by rule 1 with $C > 1$. Assume assumption 1 with $p \in [1, 2]$ and assumption 2 with radius $r = \delta/\sqrt{\alpha} + 2\|\bar{x} - x^\dagger\|$. If $\alpha \ge \alpha_0 := c_0 \delta^{2/(p+1)}$ with a constant $c_0$ independent of $\delta$ and if $L\|v\|$ is sufficiently small that $3L\|v\| \le 2$, (3.13) and
\[
\varepsilon_5 := L\|v\| + \frac{L\|v\|}{8\sqrt{1-L\|v\|}} + \frac{L k_4}{4\sqrt{c_0}}\,\delta^{(p-1)/(2p+2)} < 1 \tag{3.25}
\]
with $k_4$ given in (3.14), then
\[
\|x_\alpha - x_{\alpha_0}\| \le \frac{\|R_\alpha^{1/2}[F(x_\alpha) - y]\|}{(1 - \varepsilon_5)\sqrt{\alpha_0}}. \tag{3.26}
\]

Proof. From the Euler equation $A_\alpha^*[F(x_\alpha) - y] + \alpha(x_\alpha - \bar{x}) = 0$ we have
\[
\alpha(\bar{x} - x_\alpha) = A_\alpha^*[F(x_\alpha) - y] \qquad\text{and}\qquad \alpha_0(\bar{x} - x_{\alpha_0}) = A_{\alpha_0}^*[F(x_{\alpha_0}) - y]. \tag{3.27}
\]
Due to the identity $\alpha_0(x_\alpha - x_{\alpha_0}) = (\alpha - \alpha_0)(\bar{x} - x_\alpha) + \alpha_0(\bar{x} - x_{\alpha_0}) - \alpha(\bar{x} - x_\alpha)$ we obtain by using (3.27) that
\[
\alpha_0(x_\alpha - x_{\alpha_0}) = \frac{\alpha - \alpha_0}{\alpha}\,A_\alpha^*[F(x_\alpha) - y] + A_{\alpha_0}^*[F(x_{\alpha_0}) - y] - A_\alpha^*[F(x_\alpha) - y].
\]
We add on both sides $A_\alpha^* A_\alpha(x_\alpha - x_{\alpha_0})$, multiply by $(A_\alpha^* A_\alpha + \alpha_0 I)^{-1}$ and obtain
\[
x_\alpha - x_{\alpha_0} = (A_\alpha^* A_\alpha + \alpha_0 I)^{-1}\Bigl\{\frac{\alpha - \alpha_0}{\alpha} A_\alpha^*[F(x_\alpha) - y] + A_\alpha^* A_\alpha(x_\alpha - x_{\alpha_0}) + A_{\alpha_0}^*[F(x_{\alpha_0}) - y] - A_\alpha^*[F(x_\alpha) - y]\Bigr\}
\]
\[
= (A_\alpha^* A_\alpha + \alpha_0 I)^{-1}\Bigl\{\frac{\alpha - \alpha_0}{\alpha} A_\alpha^*[F(x_\alpha) - y] + (A_\alpha^* - A_{\alpha_0}^*)[y - F(x_{\alpha_0})] + A_\alpha^*[F(x_{\alpha_0}) - F(x_\alpha) - A_\alpha(x_{\alpha_0} - x_\alpha)]\Bigr\} =: s_1 + s_2 + s_3. \tag{3.28}
\]

From [19] we know that due to the monotonicity of $\alpha/(\lambda + \alpha)$ as a function of the regularization parameter $\alpha$ there holds
\[
\Bigl\|\alpha_0(AA^* + \alpha_0 I)^{-1}\,\frac{1}{\alpha}(AA^* + \alpha I)\Bigr\| \le 1 \qquad\text{for } \alpha_0 \le \alpha. \tag{3.29}
\]

Using the estimates $|(\alpha - \alpha_0)/\alpha| \le 1$, $\|A_\alpha^*(A_\alpha A_\alpha^* + \alpha_0 I)^{-1/2}\| \le 1$ and (3.29) we obtain for the first summand $s_1$ on the right-hand side of (3.28)
\[
\|s_1\| \le \|A_\alpha^*(A_\alpha A_\alpha^* + \alpha_0 I)^{-1/2}\|\;
\Bigl\|\sqrt{\alpha_0}(A_\alpha A_\alpha^* + \alpha_0 I)^{-1/2}\,\frac{1}{\sqrt{\alpha}}(A_\alpha A_\alpha^* + \alpha I)^{1/2}\Bigr\|\;
\frac{\|\sqrt{\alpha}(A_\alpha A_\alpha^* + \alpha I)^{-1/2}[F(x_\alpha) - y]\|}{\sqrt{\alpha_0}}
\le \frac{\|R_\alpha^{1/2}[F(x_\alpha) - y]\|}{\sqrt{\alpha_0}}. \tag{3.30}
\]

To estimate the second summand $s_2$ we use assumption 2 as well as (2.3) and obtain
\[
\|s_2\| \le \frac{L}{\alpha_0}\,\|x_\alpha - x_{\alpha_0}\|\,\|F(x_{\alpha_0}) - y\| \le L\|v\|\,\|x_\alpha - x_{\alpha_0}\|. \tag{3.31}
\]

In order to estimate the third summand $s_3$ on the right-hand side of (3.28) we first give an upper bound for $\|x_\alpha - x_{\alpha_0}\|/\sqrt{\alpha_0}$ with $\alpha$ from rule 1 and $\alpha \ge \alpha_0$. This bound can be obtained by using theorems 3.5 and 2.3, which provide the estimate
\[
\frac{\|x_\alpha - x_{\alpha_0}\|}{\sqrt{\alpha_0}} \le \frac{1}{\sqrt{\alpha_0}}\bigl(\|x_\alpha - x^\dagger\| + \|x_{\alpha_0} - x^\dagger\|\bigr) \le \frac{k_4}{\sqrt{c_0}}\,\delta^{(p-1)/(2p+2)} + \frac{\|v\|}{2\sqrt{1-L\|v\|}} \tag{3.32}
\]


with $k_4$ given in (3.14). Due to (1.6), (1.5) and (3.32) we obtain that the third summand $s_3$ on the right-hand side of (3.28) can be estimated by
\[
\|s_3\| \le \frac{L}{4\sqrt{\alpha_0}}\,\|x_\alpha - x_{\alpha_0}\|^2 \le \left(\frac{L k_4}{4\sqrt{c_0}}\,\delta^{\frac{p-1}{2p+2}} + \frac{L\|v\|}{8\sqrt{1-L\|v\|}}\right)\|x_\alpha - x_{\alpha_0}\|. \tag{3.33}
\]
From (3.28), (3.30), (3.31) and (3.33) we obtain (3.26). □

Now we are ready to give order optimal error bounds for $\|x_\alpha^\delta - x^\dagger\|$ with $\alpha$ chosen from rule 1 under assumption 1 with $p \in [1, 2]$ and assumption 2.

Theorem 3.8. Let $\alpha$ be chosen by rule 1 with $C > 1$. Assume assumption 1 with $p \in [1, 2]$ and assumption 2 with radius $r = \delta/\sqrt{\alpha} + 2\|\bar{x} - x^\dagger\|$. Suppose $L\|v\|$ is sufficiently small that $3L\|v\| \le 2$, (3.13) and (3.25) hold. Then there exists a constant $C_0$ independent of $\delta$ with
\[
\|x_\alpha^\delta - x^\dagger\| \le C_0\,\delta^{p/(p+1)}. \tag{3.34}
\]

Proof. Consider a fixed regularization parameter $\alpha = \alpha_0$ of the form
\[
\alpha_0 = c_0 \delta^{2/(p+1)} \tag{3.35}
\]
and distinguish two cases. In the first case we assume that the solution $\alpha = \alpha(\delta)$ of rule 1 satisfies $\alpha \le \alpha_0$. In this case we obtain from theorem 2.5 that
\[
\|x_\alpha^\delta - x^\dagger\| \le \alpha^{p/2}\|w\|\,\frac{1 + L\|v\|/2}{\sqrt{1-L\|v\|}} + \frac{\delta}{\sqrt{\alpha}\sqrt{1-L\|v\|}}
 \le \alpha_0^{p/2}\|w\|\,\frac{1 + L\|v\|/2}{\sqrt{1-L\|v\|}} + \frac{\delta}{\sqrt{\alpha}\sqrt{1-L\|v\|}}. \tag{3.36}
\]
In the second case we assume that the solution $\alpha = \alpha(\delta)$ of rule 1 satisfies $\alpha \ge \alpha_0$. In this second case we use the triangle inequality and obtain from proposition 3.7, theorem 2.5 with $\delta = 0$ and theorem 2.4 that
\[
\|x_\alpha^\delta - x^\dagger\| \le \|x_\alpha - x_{\alpha_0}\| + \|x_{\alpha_0} - x^\dagger\| + \|x_\alpha^\delta - x_\alpha\|
 \le \frac{\|R_\alpha^{1/2}[F(x_\alpha) - y]\|}{(1-\varepsilon_5)\sqrt{\alpha_0}} + \alpha_0^{p/2}\|w\|\,\frac{1 + L\|v\|/2}{\sqrt{1-L\|v\|}} + \frac{\delta}{\sqrt{\alpha}\sqrt{1-L\|v\|}} \tag{3.37}
\]
with $\varepsilon_5$ given in (3.25). Since the error bound (3.36) of the first case is smaller than the error bound (3.37) of the second case we conclude that the error bound for $\|x_\alpha^\delta - x^\dagger\|$ with $\alpha$ chosen from rule 1 is given by (3.37). To estimate (3.37) in terms of $\delta$ we use (3.19), substitute the special value $\alpha_0$ from (3.35) into the first two summands on the right-hand side of (3.37), use inequality (3.21) to estimate the third summand on the right-hand side of (3.37) and obtain

\[
\|x_\alpha^\delta - x^\dagger\| \le \Bigl[1 + C\sqrt{\frac{1+\varepsilon_3}{1-\varepsilon_3}}\Bigr]\frac{\delta^{p/(p+1)}}{(1-\varepsilon_5)\sqrt{c_0}}
 + c_0^{p/2}\|w\|\,\frac{1 + L\|v\|/2}{\sqrt{1-L\|v\|}}\,\delta^{p/(p+1)}
 + \left(\frac{\|w\|}{C-1}\sqrt{\frac{1+\varepsilon_3}{1-\varepsilon_3}}\;\frac{(1 + L\|v\|/2)^2}{1-L\|v\|}\right)^{1/(p+1)}\frac{\delta^{p/(p+1)}}{\sqrt{1-L\|v\|}} \tag{3.38}
\]
with $\varepsilon_3$ given by (3.13). Hence, the proof of (3.34) is complete. □

Since the constant $c_0 > 0$ in the error bound (3.38) is arbitrary, a sharp constant $C_0$ in the error estimate (3.34) can be found by minimizing (3.38) with respect to $c_0$.


3.4. Error bounds in the case $p \in (0, 1]$

In this subsection we will show that in the case $p \in (0, 1]$ order optimal error bounds are valid under assumption 3 and assumption 4 provided $\alpha$ is chosen from rule 1 or rule 2, respectively. Let us start with a preliminary proposition which gives a lower bound for the regularization parameter $\alpha$ chosen by rule 1 or rule 2, respectively.

Proposition 3.9. Let $\alpha$ be chosen by rule 1 or rule 2 with $C > 1$. Assume assumption 3 with $p \in (0, 1]$ and assumption 4 with radius $r = \delta/\sqrt{\alpha} + 2\|\bar{x} - x^\dagger\|$. If $\varepsilon_6 := k_0\|\bar{x} - x^\dagger\| \le 1/2$, then
\[
\alpha \ge \left(\frac{C-1}{\|w\|\sqrt{1 + \varepsilon_6/4}}\right)^{2/(p+1)} \delta^{2/(p+1)}. \tag{3.39}
\]

Proof. First, let us estimate $\|F(x_\alpha) - y\|$ in terms of $\alpha$. We use (2.16) with $x_1 = x_\alpha$, $x_2 = x^\dagger$, the Euler equation $A_\alpha^*[F(x_\alpha) - y] + \alpha(x_\alpha - \bar{x}) = 0$ as well as the first estimate of (2.1) with $\delta = 0$ and obtain
\[
(F(x_\alpha) + A_\alpha(x^\dagger - x_\alpha) - y,\; F(x_\alpha) - y)
 \le \int_0^1 \|k(x_\alpha + t(x^\dagger - x_\alpha), x_\alpha, x^\dagger - x_\alpha)\|\,dt\;\|A_\alpha^*[F(x_\alpha) - y]\|
 \le \frac{k_0}{2}\|x_\alpha - x^\dagger\|^2\,\alpha\|x_\alpha - \bar{x}\|
 \le \frac{\alpha k_0}{2}\|\bar{x} - x^\dagger\|\,\|x_\alpha - x^\dagger\|^2. \tag{3.40}
\]

From (2.13) we obtain
\[
\alpha(x_\alpha - x^\dagger, \bar{x} - x^\dagger)
 = \alpha^2((A^*A + \alpha I)^{-1}(\bar{x} - x^\dagger), \bar{x} - x^\dagger)
 + \alpha((A^*A + \alpha I)^{-1}(A_\alpha^* - A^*)[y - F(x_\alpha)], \bar{x} - x^\dagger)
 + \alpha((A^*A + \alpha I)^{-1} A^*[y + A(x_\alpha - x^\dagger) - F(x_\alpha)], \bar{x} - x^\dagger)
 =: s_1 + s_2 + s_3. \tag{3.41}
\]

To estimate $s_1$ we use assumption 3 with $p \in (0, 1]$ and obtain due to (1.6)
\[
s_1 = \alpha^2((A^*A + \alpha I)^{-1}(A^*A)^p w, w) \le \alpha^{p+1}\|w\|^2. \tag{3.42}
\]

To estimate $s_2$ we use assumption 3 and assumption 4, the Euler equation $A_\alpha^*[F(x_\alpha) - y] + \alpha(x_\alpha - \bar{x}) = 0$, the first estimate of (2.1) with $\delta = 0$ as well as the estimate $2ab \le a^2 + b^2$ and obtain
\[
s_2 = \alpha(F(x_\alpha) - y, (A - A_\alpha)(A^*A + \alpha I)^{-1}(A^*A)^{p/2} w)
 = \alpha^2(\bar{x} - x_\alpha, k(x^\dagger, x_\alpha, (A^*A + \alpha I)^{-1}(A^*A)^{p/2} w))
 \le \alpha^{p/2+1} k_0 \|\bar{x} - x^\dagger\|\,\|x_\alpha - x^\dagger\|\,\|w\|
 \le \alpha k_0\|\bar{x} - x^\dagger\|\,\|x_\alpha - x^\dagger\|^2 + \tfrac{1}{4}\alpha^{p+1} k_0\|\bar{x} - x^\dagger\|\,\|w\|^2. \tag{3.43}
\]

To estimate $s_3$ we use (2.16) and obtain due to assumption 4
\[
s_3 = \alpha(y + A(x_\alpha - x^\dagger) - F(x_\alpha),\; A(A^*A + \alpha I)^{-1}(\bar{x} - x^\dagger))
 \le \alpha\|\bar{x} - x^\dagger\|\int_0^1 \|k(x^\dagger + t(x_\alpha - x^\dagger), x^\dagger, x_\alpha - x^\dagger)\|\,dt
 \le \frac{\alpha k_0}{2}\|\bar{x} - x^\dagger\|\,\|x_\alpha - x^\dagger\|^2. \tag{3.44}
\]

From the first part of (2.4) and (3.40)–(3.44) we obtain
\[
\|F(x_\alpha) - y\|^2 + \alpha\|x_\alpha - x^\dagger\|^2 \le \bigl(1 + \tfrac{1}{4}k_0\|\bar{x} - x^\dagger\|\bigr)\alpha^{p+1}\|w\|^2 + 2\alpha k_0\|\bar{x} - x^\dagger\|\,\|x_\alpha - x^\dagger\|^2.
\]


We use the assumption $2k_0\|\bar{x} - x^\dagger\| \le 1$ and obtain
\[
\|F(x_\alpha) - y\| \le \sqrt{1 + k_0\|\bar{x} - x^\dagger\|/4}\;\|w\|\,\alpha^{(p+1)/2}. \tag{3.45}
\]
Since $d_1(\alpha) \le \|F(x_\alpha^\delta) - y^\delta\|$ and $d_2(\alpha) \le \|F(x_\alpha^\delta) - y^\delta\|$, we conclude from the second estimate of (2.18) that for both rule 1 and rule 2, respectively, we have $C\delta \le \delta + \|F(x_\alpha) - y\|$. This estimate and (3.45) provide (3.39). □

In the next proposition we estimate $\|x_\alpha - x_{\alpha_0}\|$ for $0 < \alpha_0 \le \alpha$.

Proposition 3.10. Let assumption 4 with $\varepsilon_6 := k_0\|\bar{x} - x^\dagger\| < 1/2$ and radius $r = 2\|\bar{x} - x^\dagger\|$ be satisfied. Then, for all $0 < \alpha_0 \le \alpha$,
\[
\|x_\alpha - x_{\alpha_0}\| \le \frac{\|R_\alpha^{1/2}[F(x_\alpha) - y]\|}{(1 - 2\varepsilon_6)\sqrt{\alpha_0}}. \tag{3.46}
\]

Proof. We consider the error representation (3.28) and estimate the three summands $s_1$, $s_2$ and $s_3$ separately. To estimate the first summand $s_1$ we use (3.30). In order to estimate the second summand $s_2$ on the right-hand side of (3.28) we proceed along the lines of the proof of (2.15) and obtain
\[
\|s_2\| \le k_0\|\bar{x} - x^\dagger\|\,\|x_\alpha - x_{\alpha_0}\|. \tag{3.47}
\]
To estimate the third summand $s_3$ on the right-hand side of (3.28) we proceed according to the proof of (2.17) and obtain
\[
\|s_3\| \le k_0\|\bar{x} - x^\dagger\|\,\|x_\alpha - x_{\alpha_0}\|. \tag{3.48}
\]
Now the proof of (3.46) follows from (3.28), (3.30), (3.47) and (3.48). □

Before we provide order optimal error bounds for $\|x_\alpha^\delta - x^\dagger\|$ with $\alpha$ chosen from rule 1 or rule 2, respectively, let us formulate an auxiliary result whose proof is based on lemma 3.6 in [27].

Proposition 3.11. Let assumption 4 hold and let $x, z \in B_r(x^\dagger)$. Define
\[
A_x := F'(x), \quad A_z := F'(z), \quad R_x := \alpha(A_x A_x^* + \alpha I)^{-1} \quad\text{and}\quad R_z := \alpha(A_z A_z^* + \alpha I)^{-1}.
\]
Then for $a = \|R_x^{1/2} y\|^2$ and $b = \|R_z^{1/2} y\|^2$ with arbitrary $y \in Y$ there holds
\[
|a - b| \le k_0\|x - z\|\,(a + b). \tag{3.49}
\]

Now we are ready to provide order optimal error bounds for $\|x_\alpha^\delta - x^\dagger\|$ with $\alpha$ chosen from rule 1 or rule 2, respectively.

Theorem 3.12. Let $\alpha = \alpha(\delta)$ be chosen by rule 1 or rule 2 with $C > 1$. Assume assumption 3 with fixed $p \in (0, 1]$ and assumption 4 with radius $r = 2\|\bar{x} - x^\dagger\| + \delta/\sqrt{\alpha}$. If $k_0\|\bar{x} - x^\dagger\|$ is sufficiently small such that $\varepsilon_6 := k_0\|\bar{x} - x^\dagger\| < 1/2$ and
\[
\varepsilon_7 := \frac{k_0\|\bar{x} - x^\dagger\|}{(C-1)\sqrt{1 - k_0\|\bar{x} - x^\dagger\|}} < 1, \tag{3.50}
\]
then
\[
\|x_\alpha^\delta - x^\dagger\| \le c_p\,\|w\|^{1/(p+1)}\,\delta^{p/(p+1)} \tag{3.51}
\]
with a constant $c_p$ independent of $\delta$ and $\|w\|$ of the form
\[
c_p = \frac{2}{1 - 2\varepsilon_6}\Bigl[1 + C\sqrt{\frac{1+\varepsilon_7}{1-\varepsilon_7}}\Bigr]^{p/(p+1)} + \frac{1}{\sqrt{1-\varepsilon_6}}\Bigl[\frac{\sqrt{1 + \varepsilon_6/4}}{C-1}\Bigr]^{1/(p+1)}. \tag{3.52}
\]


Proof. Consider a fixed regularization parameter $\alpha = \alpha_0$ of the form
\[
\alpha_0 = c_0\Bigl(\frac{\delta}{\|w\|}\Bigr)^{2/(p+1)} \qquad\text{with}\qquad c_0 = \Bigl[1 + C\sqrt{\frac{1+\varepsilon_7}{1-\varepsilon_7}}\Bigr]^{2/(p+1)} \tag{3.53}
\]

and distinguish two cases. In the first case we assume that the solution $\alpha = \alpha(\delta)$ of rule 1 or rule 2, respectively, satisfies $\alpha \le \alpha_0$. In this first case we obtain from (2.12) and the first inequality of (2.18) that
\[
\|x_\alpha^\delta - x^\dagger\| \le \|x_\alpha - x^\dagger\| + \|x_\alpha^\delta - x_\alpha\| \le \frac{\|w\|\,\alpha_0^{p/2}}{1 - 2\varepsilon_6} + \frac{\delta}{\sqrt{\alpha}\sqrt{1-\varepsilon_6}}. \tag{3.54}
\]
In the second case we assume that the solution $\alpha = \alpha(\delta)$ of rule 1 or rule 2 satisfies $\alpha \ge \alpha_0$. In this second case we obtain from proposition 3.10, (2.12) and the first inequality of (2.18) that
\[
\|x_\alpha^\delta - x^\dagger\| \le \|x_\alpha - x_{\alpha_0}\| + \|x_{\alpha_0} - x^\dagger\| + \|x_\alpha^\delta - x_\alpha\|
 \le \frac{\|R_\alpha^{1/2}[F(x_\alpha) - y]\|}{(1 - 2\varepsilon_6)\sqrt{\alpha_0}} + \frac{\alpha_0^{p/2}\|w\|}{1 - 2\varepsilon_6} + \frac{\delta}{\sqrt{\alpha}\sqrt{1-\varepsilon_6}}. \tag{3.55}
\]

Since the error bound (3.54) of the first case is smaller than the error bound (3.55) of the second case we conclude that the error bound for $\|x_\alpha^\delta - x^\dagger\|$ with $\alpha$ chosen from rule 1 or rule 2, respectively, is given by (3.55). To estimate $\|R_\alpha^{1/2}[F(x_\alpha^\delta) - y^\delta]\|$ in terms of $\delta$ we use proposition 3.11 with $a = \|R_\alpha^{1/2}[F(x_\alpha^\delta) - y^\delta]\|^2$ and $b = \|(R_\alpha^\delta)^{1/2}[F(x_\alpha^\delta) - y^\delta]\|^2$ and obtain due to (2.18) and (3.3) the estimate
\[
|a - b| \le k_0\|x_\alpha^\delta - x_\alpha\|\,(a + b) \le \frac{k_0\delta}{\sqrt{\alpha}\sqrt{1-\varepsilon_6}}\,(a + b) \le \varepsilon_7(a + b)
\]
with $\varepsilon_7$ given in (3.50). This estimate provides $a \le (1 + \varepsilon_7)(1 - \varepsilon_7)^{-1} b$. We apply the triangle inequality, the estimate $\|R_\alpha^{1/2}\| \le 1$, the second estimate of (2.18) and rule 1 or rule 2, respectively, use in the case of rule 2 the valid estimate $d_1(\alpha) \le d_2(\alpha)$ and obtain
\[
\|R_\alpha^{1/2}[F(x_\alpha) - y]\|
 \le \|R_\alpha^{1/2}[F(x_\alpha^\delta) - y^\delta + y - F(x_\alpha)]\| + \|R_\alpha^{1/2}[F(x_\alpha^\delta) - y^\delta]\|
 \le \delta + \sqrt{\frac{1+\varepsilon_7}{1-\varepsilon_7}}\,\|(R_\alpha^\delta)^{1/2}[F(x_\alpha^\delta) - y^\delta]\|
 \le \Bigl[1 + C\sqrt{\frac{1+\varepsilon_7}{1-\varepsilon_7}}\Bigr]\delta. \tag{3.56}
\]

Substituting (3.56) into the first summand on the right-hand side of (3.55), substituting the special value $\alpha_0$ from (3.53) into the first two summands on the right-hand side of (3.55) and using inequality (3.39) to estimate the third summand on the right-hand side of (3.55) provides (3.51) and (3.52). □

Finally we remark that for rule 1 the results of theorem 3.12 can be extended to the maximal range $p \in (0, 2]$. For the proof of this result we show along the lines of proposition 3.6 that for $\alpha$ from rule 1 we have $\alpha \ge c\delta^{2/(p+1)}$ with a constant $c > 0$, proceed according to the proof of theorem 3.12 and obtain

Theorem 3.13. Let $\alpha = \alpha(\delta)$ be chosen by rule 1 with $C > 1$. Assume assumption 3 with fixed $p \in (0, 2]$ and assumption 4 with radius $r = 2\|\bar{x} - x^\dagger\| + \delta/\sqrt{\alpha}$. If $k_0\|\bar{x} - x^\dagger\|$ is sufficiently small that $\varepsilon_6 := k_0\|\bar{x} - x^\dagger\| < 1/2$ and (3.50) hold, then
\[
\|x_\alpha^\delta - x^\dagger\| \le c_p\,\|w\|^{1/(p+1)}\,\delta^{p/(p+1)} \tag{3.57}
\]
with a constant $c_p$ independent of $\delta$ and $\|w\|$ of the form
\[
c_p = \frac{2}{1 - 2\varepsilon_6}\Bigl[1 + C\sqrt{\frac{1+\varepsilon_7}{1-\varepsilon_7}}\Bigr]^{p/(p+1)} + \frac{1}{\sqrt{1-\varepsilon_6}}\Bigl[\sqrt{\frac{1+\varepsilon_7}{1-\varepsilon_7}}\;\frac{1+\varepsilon_6}{(C-1)(1-2\varepsilon_6)}\Bigr]^{1/(p+1)}.
\]


4. Illustration

Nonlinear ill posed problems (1.1) arise in a number of applications. In this section we will shortly discuss a parameter identification problem which has been considered in different papers, see, e.g., [4, 5, 20]. This nonlinear ill posed problem consists of identifying the coefficient $c(x)$, $x \in (0, 1)$, in the elliptic boundary value problem
\[
-u_{xx} + cu = f, \qquad u(0) = g_0, \quad u(1) = g_1 \tag{4.1}
\]
from noisy data $u^\delta(x) \in Y = L^2(0, 1)$ where $f \in L^2(0, 1)$, $g_0$ and $g_1$ are given. We define the nonlinear operator $F : D(F) \subset X \to Y$ as the solution operator $c \mapsto u$ of the boundary value problem (4.1) according to
\[
F(c) = u \qquad\text{with}\qquad D(F) = \{c \in X \mid c \ge 0 \text{ a.e.}\} \tag{4.2}
\]
where $u = u(x; c)$ is the solution of (4.1) for given $c \in D(F)$. We consider (4.2) in an $L^2$-space setting with $X = Y = L^2(0, 1)$. Our test example is given by

Example 1. $u(x) = 100 + x(1 - x)$, $c^\dagger(x) = 1 + \frac{31}{18}x - \frac{49}{18}x^3 + \frac{7}{6}x^5 - \frac{1}{6}x^7$, $\bar{c}(x) = 1$.

In example 1, the function $f$ of problem (4.1) has the form $f(x) = 2 + c^\dagger(x)u(x)$. For this test example assumption 1 is satisfied with $p = 2$, see [20]. To discretize problem (4.1) we decompose the interval $[0, 1]$ into $n + 1$ equidistant subintervals $[x_j, x_{j+1}]$ ($j = 0, \ldots, n$) of length $h = x_{j+1} - x_j = 1/(n + 1)$ and approximate the boundary value problem (4.1) by the finite-dimensional system equation
\[
T(c, u) = b, \qquad T : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n
\]
of the form $[h^{-2}\,\mathrm{tridiag}(-1, 2, -1) + \mathrm{diag}(c_i)]u = b$ with $u = (u_i)_{i=1}^n$, $b = (b_i)_{i=1}^n$, $u_i = u(x_i)$, $c_i = c(x_i)$ for $i = 1, \ldots, n$, $b_i = f(x_i)$ for $i = 2, \ldots, n-1$, $b_1 = f(x_1) + g_0/h^2$ and $b_n = f(x_n) + g_1/h^2$. The partial derivatives $T_u$ and $T_c$ of the bilinear operator $T$ are given by
\[
T_u = h^{-2}\,\mathrm{tridiag}(-1, 2, -1) + \mathrm{diag}(c_i) \qquad\text{and}\qquad T_c = \mathrm{diag}(u_i).
\]

Due to the implicit function theorem the Fréchet derivative $J := F'(c)$ can be computed according to the formula $J = -T_u^{-1}T_c$ with $u = F(c) = T_u^{-1}b$. To compute discretized regularized approximations $c_\alpha^\delta \in \mathbb{R}^n$, instead of exact data vectors $u \in \mathbb{R}^n$ normally distributed randomly perturbed data vectors $u^\delta \in \mathbb{R}^n$ with $\|u - u^\delta\| \le \delta$ have been used where here and in the rest of this subsection $\|\cdot\|$ denotes the discretized $L^2$-norm. To solve the discretized Euler equation (1.4) by the Gauss–Newton iteration
\[
c := c - (J^T J + \alpha I)^{-1}[J^T(F(c) - u^\delta) + \alpha(c - \bar{c})] \tag{4.3}
\]

the dense matrix $J^T J$ has to be generated and a dense linear system with the symmetric positive definite system matrix $J^T J + \alpha I$ has to be solved. However, the partial derivative matrices $T_u$ and $T_c$ are sparse. Hence, we replace the Gauss–Newton iteration (4.3) by an equivalent iteration with sparse matrix operations:
\[
c := c - J^T(JJ^T + \alpha I)^{-1}[F(c) - u^\delta - J(c - \bar{c})]
 = c + T_c^T(T_c T_c^T + \alpha T_u T_u^T)^{-1}[b - T_u u^\delta + T_c(c - \bar{c})]. \tag{4.4}
\]
Due to $T_c = T_c^T$ and $T_u = T_u^T$, this equivalent iteration makes it possible to perform one iteration step by

(i) solving the tridiagonal system $T_u u = b$ in order to find $u$,
(ii) solving the five-diagonal system $(T_c^2 + \alpha T_u^2)p = b - T_u u^\delta + T_c(c - \bar{c})$ to find $p$ and
(iii) computing the new iterate $c$ according to $c := c + T_c p$.
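One step of the form (4.4) can be sketched as follows (again an illustrative NumPy reimplementation, not the authors' code; steps (i)–(iii) are carried out with dense solves standing in for the tridiagonal and five-diagonal solvers):

```python
import numpy as np

def assemble(c, f, g0, g1):
    """T_u(c) and right-hand side b of the discrete boundary value problem."""
    n = c.size
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    Tu = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
          - np.diag(np.ones(n - 1), -1)) / h**2 + np.diag(c)
    b = f(x).copy()
    b[0] += g0 / h**2
    b[-1] += g1 / h**2
    return Tu, b

def gauss_newton_step(c, cbar, udelta, f, g0, g1, alpha):
    Tu, b = assemble(c, f, g0, g1)
    u = np.linalg.solve(Tu, b)                          # (i)   T_u u = b
    Tc = np.diag(u)                                     #       T_c = diag(u_i)
    rhs = b - Tu @ udelta + Tc @ (c - cbar)
    p = np.linalg.solve(Tc @ Tc + alpha * Tu @ Tu, rhs) # (ii)  (T_c^2 + a T_u^2) p = rhs
    return c + Tc @ p                                   # (iii) c := c + T_c p

# Sanity check: if the data equal F(c) and c = cbar, residual and penalty
# both vanish, so the step returns c unchanged.
n = 50
cbar = np.ones(n)
f = lambda x: 2.0 + (1 + 31/18*x - 49/18*x**3 + 7/6*x**5 - 1/6*x**7) * (100 + x*(1 - x))
Tu0, b0 = assemble(cbar, f, 100.0, 100.0)
udelta = np.linalg.solve(Tu0, b0)                       # noise-free data from c = cbar
cnew = gauss_newton_step(cbar, cbar, udelta, f, 100.0, 100.0, alpha=1e-2)
print(np.max(np.abs(cnew - cbar)))                      # ~ machine precision
```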


Table 1. Errors for example 1 with α chosen according to rules 1, 2 and 3.

δ        e1              e2              e3              e1/δ^{2/3}      e2/δ^{2/3}      e3/δ^{1/2}
10^{-2}  4.87 × 10^{-3}  3.83 × 10^{-3}  1.90 × 10^{-3}  1.05 × 10^{-1}  8.24 × 10^{-2}  1.90 × 10^{-2}
10^{-3}  1.06 × 10^{-3}  8.30 × 10^{-4}  7.68 × 10^{-4}  1.06 × 10^{-1}  8.30 × 10^{-2}  2.43 × 10^{-2}
10^{-4}  2.30 × 10^{-4}  1.81 × 10^{-4}  3.20 × 10^{-4}  1.07 × 10^{-1}  8.41 × 10^{-2}  3.20 × 10^{-2}
10^{-5}  5.14 × 10^{-5}  4.07 × 10^{-5}  1.29 × 10^{-4}  1.11 × 10^{-1}  8.77 × 10^{-2}  4.08 × 10^{-2}
10^{-6}  1.14 × 10^{-5}  9.09 × 10^{-6}  5.08 × 10^{-5}  1.14 × 10^{-1}  9.09 × 10^{-2}  5.08 × 10^{-2}
10^{-7}  2.56 × 10^{-6}  2.04 × 10^{-6}  2.08 × 10^{-5}  1.19 × 10^{-1}  9.47 × 10^{-2}  6.59 × 10^{-2}

Table 2. Regularization parameters for example 1.

δ        α1              α2              α3              α1/δ^{2/3}     α2/δ^{2/3}     α3/δ
10^{-2}  9.69 × 10^{-1}  7.54 × 10^{-1}  1.08 × 10^{-1}  2.09 × 10^{1}  1.62 × 10^{1}  1.08 × 10^{1}
10^{-3}  2.07 × 10^{-1}  1.61 × 10^{-1}  1.10 × 10^{-2}  2.07 × 10^{1}  1.61 × 10^{1}  1.10 × 10^{1}
10^{-4}  4.50 × 10^{-2}  3.50 × 10^{-2}  1.16 × 10^{-3}  2.09 × 10^{1}  1.62 × 10^{1}  1.16 × 10^{1}
10^{-5}  9.97 × 10^{-3}  7.74 × 10^{-3}  1.27 × 10^{-4}  2.15 × 10^{1}  1.67 × 10^{1}  1.27 × 10^{1}
10^{-6}  2.20 × 10^{-3}  1.70 × 10^{-3}  1.42 × 10^{-5}  2.20 × 10^{1}  1.70 × 10^{1}  1.42 × 10^{1}
10^{-7}  4.81 × 10^{-4}  3.71 × 10^{-4}  1.66 × 10^{-6}  2.23 × 10^{1}  1.72 × 10^{1}  1.66 × 10^{1}

The method (4.4) of performing one Gauss–Newton step has been applied in different parameter identification problems for differential equations and can also be formulated for the corresponding infinite-dimensional problems (see [6, 28]). After having computed the regularized approximation $c_\alpha^\delta$, the computation of the function values $d_i(\alpha)$ ($i = 1, 2, 3$) can also be done by sparse matrix computations according to
\[
d_1(\alpha) = \sqrt{\alpha(b_2, b_3)}, \qquad d_2(\alpha) = \frac{(b_2, b_3)}{\|b_4\|}, \qquad d_3(\alpha) = \|b_1\|
\]
with $b_1 = T_u^{-1}b - u^\delta$, $b_2 = T_u b_1$, $b_3 = [T_c^2 + \alpha T_u^2]^{-1} b_2$ and $b_4 = T_u b_3$.

In our numerical experiments the starting vector $c_0 := \bar{c} = (\bar{c}(x_j))_{j=1}^n \in \mathbb{R}^n$ has been used and the regularization parameters $\alpha_1$, $\alpha_2$ and $\alpha_3$ have been chosen according to rule 1, rule 2 and Morozov's discrepancy principle (rule 3), respectively, where rule 3 reads as follows.

Rule 3. Choose the regularization parameter $\alpha$ as the solution of the equation
\[
d_3(\alpha) := \|F(x_\alpha^\delta) - y^\delta\| = C\delta.
\]

The corresponding nonlinear equations $d_i(\alpha) = C\delta$ ($i = 1, 2, 3$) have been solved by the secant method and for $C$ the constant $C = 1.1$ has been used. For the dimension number $n$ we used in all experiments $n = 400$. Furthermore, for the error values
\[
e_1 := \|c_{\alpha_1}^\delta - c^\dagger\|, \qquad e_2 := \|c_{\alpha_2}^\delta - c^\dagger\|, \qquad e_3 := \|c_{\alpha_3}^\delta - c^\dagger\|
\]
we performed 20 experiments (with 20 different noisy data vectors $u^\delta$) and the given error values as well as the given regularization parameters in tables 1 and 2 are mean values.
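The secant iteration for $d_i(\alpha) = C\delta$ can be sketched generically (an illustrative sketch, not the authors' code; `d` is assumed to be a continuous, monotonically increasing function of $\alpha$):

```python
def secant_solve(d, target, a0, a1, tol=1e-12, maxit=50):
    """Solve d(alpha) = target by the secant method, starting from a0, a1."""
    f0, f1 = d(a0) - target, d(a1) - target
    for _ in range(maxit):
        if f1 == f0:                      # flat secant: cannot proceed
            break
        a0, a1, f0 = a1, a1 - f1 * (a1 - a0) / (f1 - f0), f1
        f1 = d(a1) - target
        if abs(f1) <= tol * abs(target):
            break
    return a1

# Model function with the qualitative behaviour of d_i: increasing in alpha.
alpha = secant_solve(lambda a: a / (a + 1.0), 0.25, 0.1, 1.0)
print(alpha)   # -> 1/3, since a/(a+1) = 1/4 at a = 1/3
```

In practice the same routine would be called with $d_i$ evaluated through the sparse computations described above and `target` $= C\delta$.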

The results in table 1 verify the theoretical results of the estimate (3.34) of theorem 3.8 which tells us that for $\bar{c} - c^\dagger \in \mathcal{R}(A^*A)$ there holds $e_1 = O(\delta^{2/3})$. Table 1 also illustrates our supposition that $e_2 = O(\delta^{2/3})$ and shows the well known fact that $e_3 = O(\delta^{1/2})$. In addition we observed that $e_1 > e_2$ always holds true, which is in agreement with the linear case (see [32]). Note that for $\delta = 10^{-7}$ the accuracy for rule 2 is ten times better than the accuracy for Morozov's discrepancy principle. The results in table 2 show that the regularization parameters $\alpha_1$, $\alpha_2$


and $\alpha_3$ of the rules 1–3 are related by $\alpha_3 < \alpha_2 < \alpha_1$, that $\alpha_1$ is of the order $O(\delta^{2/3})$ and that $\alpha_3$ is of the order $O(\delta)$. Table 2 also shows that $\alpha_2$ is of the order $O(\delta^{2/3})$ which, however, could not be supported theoretically. For some further numerical experiments for comparing rule 1 and rule 2 see [27].

References

[1] Bakushinskii A B 1984 Remarks on choosing a regularization parameter using the quasi-optimality and ratio criterion USSR Comput. Math. Math. Phys. 24 181–2
[2] Baumeister J 1987 Stable Solution of Inverse Problems (Braunschweig: Vieweg)
[3] Engl H W 1993 Regularization methods for the stable solution of inverse problems Surv. Math. Ind. 3 71–143
[4] Engl H W, Hanke M and Neubauer A 1996 Regularization of Inverse Problems (Dordrecht: Kluwer)
[5] Engl H W, Kunisch K and Neubauer A 1989 Convergence rates for Tikhonov regularization of nonlinear ill-posed problems Inverse Problems 5 523–40
[6] Friedrich V and Tautenhahn U 1989 Regularized parameter identification in elliptic boundary value problems Z. Anal. Anwend. 8 3–11
[7] Gfrerer H 1987 An a posteriori parameter choice for ordinary and iterated Tikhonov regularization of ill-posed problems leading to optimal convergence rates Math. Comput. 49 507–22
[8] Golub G H, Heath M and Wahba G 1979 Generalized cross-validation as a method for choosing a good ridge parameter Technometrics 21 215–23
[9] Gorenflo R and Hofmann B 1994 On autoconvolution and regularization Inverse Problems 10 353–73
[10] Groetsch C W 1984 The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind (Boston, MA: Pitman)
[11] Groetsch C W 1993 Inverse Problems in the Mathematical Sciences (Braunschweig: Vieweg)
[12] Hanke M and Hansen P C 1993 Regularization methods for large-scale problems Surv. Math. Ind. 3 253–315
[13] Hansen P C 1992 Analysis of discrete ill-posed problems by means of the L-curve SIAM Rev. 34 561–80
[14] Hansen P C and O'Leary D P 1993 The use of the L-curve in the regularization of discrete ill-posed problems SIAM J. Sci. Comput. 14 1487–503
[15] Hofmann B 1986 Regularization for Applied Inverse and Ill-Posed Problems (Leipzig: Teubner)
[16] Hofmann B and Scherzer O 1994 Factors influencing the ill-posedness of nonlinear problems Inverse Problems 10 1277–97
[17] Jin Qi-nian 1999 A convergence analysis for Tikhonov regularization of nonlinear ill-posed problems Inverse Problems 15 1087–98
[18] Jin Qi-nian and Hou Zong-yi 1997 On the choice of the regularization parameter for ordinary and iterated Tikhonov regularization of nonlinear ill-posed problems Inverse Problems 13 815–27
[19] Jin Qi-nian and Hou Zong-yi 1999 On an a posteriori parameter choice strategy for Tikhonov regularization of nonlinear ill-posed problems Numer. Math. 83 139–59
[20] Köhler J and Tautenhahn U 1994 Error bounds for regularized solutions of nonlinear ill-posed problems J. Inverse Ill-Posed Problems 3 47–74
[21] Kravaris C and Seinfeld J H 1985 Identification of parameters in distributed systems by regularization SIAM J. Control Optim. 23 217–41
[22] Louis A K 1989 Inverse und schlecht gestellte Probleme (Stuttgart: Teubner)
[23] Morozov V A 1966 On the solution of functional equations by the method of regularization Sov. Math.–Dokl. 7 414–17
[24] Neubauer A 1989 Tikhonov regularization for non-linear ill-posed problems: optimal convergence rates and finite-dimensional approximation Inverse Problems 5 541–57
[25] Raus T 1985 On the discrepancy principle for the solution of ill-posed problems with non-selfadjoint operators Acta Comment. Univ. Tartuensis 715 12–20 (in Russian)
[26] Scherzer O 1993 A parameter choice for Tikhonov regularization for solving nonlinear ill-posed problems Appl. Math. 38 479–87
[27] Scherzer O, Engl H W and Kunisch K 1993 Optimal a posteriori parameter choice for Tikhonov regularization for solving nonlinear ill-posed problems SIAM J. Numer. Anal. 30 1796–838
[28] Tautenhahn U and Schweigert D 1992 Effective numerical methods for solving implicit ill-posed inverse problems Theory and Practice of Geophysical Data Inversion ed A Vogel et al (Braunschweig: Vieweg) pp 3–19
[29] Tautenhahn U 1994 Error estimates for regularized solutions of nonlinear ill-posed problems Inverse Problems 10 485–500


[30] Tautenhahn U 1996 Tikhonov regularization for identification problems in differential equations Parameter Identification and Inverse Problems in Hydrology, Geology and Ecology ed J Gottlieb et al (Dordrecht: Kluwer) pp 261–70
[31] Tautenhahn U 1999 On a new parameter choice rule for ill-posed inverse problems HERCMA '98: Proc. 4th Hellenic–Eur. Conf. on Computer Mathematics and its Applications ed E A Lipitakis (Athens: LEA) pp 118–25
[32] Tautenhahn U and Hämarik U 1999 The use of monotonicity for choosing the regularization parameter in ill-posed problems Inverse Problems 15 1487–505
[33] Tikhonov A N and Arsenin V Y 1977 Solution of Ill-Posed Problems (New York: Wiley)
[34] Tikhonov A N, Leonov A S and Yagola A G 1998 Nonlinear Ill-Posed Problems vols 1 and 2 (London: Chapman and Hall)
[35] Vainikko G M and Veretennikov A Y 1986 Iteration Procedures in Ill-Posed Problems (Moscow: Nauka) (in Russian)
[36] Wahba G 1977 Practical approximate solutions to linear operator equations when the data are noisy SIAM J. Numer. Anal. 14 651–67