
Submitted by Kemal Raik, MA MSc.

Submitted at Industrial Mathematics Institute

Supervisor and First Examiner Priv. Doz. DI Dr Stefan Kindermann

Second Examiner Univ.-Prof. Dr Bernd Hofmann

July 2020

JOHANNES KEPLER UNIVERSITY LINZ, Altenbergerstraße 69, 4040 Linz, Österreich, www.jku.at, DVR 0093696

Linear and Nonlinear Heuristic Regularisation for Ill-Posed Problems

Doctoral Thesis

to obtain the academic degree of

Doktor der technischen Wissenschaften

in the Doctoral Program

Technische Wissenschaften


Abstract

In this thesis, we cover the so-called heuristic (aka error-free or data-driven) parameter choice rules for the regularisation of ill-posed problems (which just so happen to be prominent in the treatment of inverse problems). We consider the linear theory associated with both continuous regularisation methods, such as that of Tikhonov, and also iterative procedures, such as Landweber's method. We provide background material associated with each of the aforementioned regularisation methods as well as the standard results found in the literature. In particular, the convergence theory for heuristic rules is typically based on a noise-restricted analysis. We also introduce some more recent developments in the linear theory for certain instances: in case of operator perturbations or weakly bounded noise for linear Tikhonov regularisation. In both of the aforementioned cases, novel parameter choice rules were derived; for the case of weakly bounded noise, through necessity, and in the case of operator perturbations, an entirely new class of parameter choice rules is discussed (so-called semi-heuristic rules, which could be said to be the "middle ground" between heuristic rules and a-posteriori rules). We then delve further into the abyss of the relatively unknown, namely the nonlinear theory (by which we mean that the regularisation is nonlinear), for which the development and analysis of heuristic rules are still in their infancy. Most notably in this thesis, we present a recent study of the convergence theory for heuristic Tikhonov regularisation with convex penalty terms which attempts to generalise, to some extent, the restricted noise analysis of the linear theory. As the error in this setting is measured in terms of the Bregman distance, this naturally lends itself to the introduction of some novel parameter choice rules.
Finally, we illustrate and supplement most of the preceding by including a numerics section which displays the effectiveness of heuristic parameter choice rules, and conclude with a discussion of the results as well as speculation on the potential future scope of research in this exciting area of applied mathematics.


Zusammenfassung

In this dissertation, we treat so-called heuristic parameter choice rules (also called noise-free or data-driven parameter choice rules) for the regularisation of ill-posed problems (which play a prominent role in the treatment of inverse problems). We first treat linear inverse problems, both in combination with continuous regularisation methods, such as Tikhonov regularisation, and with iterative ones, such as the Landweber method. We provide background material on each of these regularisation methods as well as the corresponding standard results from the literature, such as the convergence theory for heuristic parameter choice rules, which is typically based on an analysis with restrictions on the data error. Furthermore, we present some more recent developments in the linear theory for certain special cases: in the case of operator perturbations or weakly bounded data errors for linear Tikhonov regularisation. In both cases, novel parameter choices are presented; in the case of weakly bounded data errors out of necessity, and in the case of operator perturbations an entirely new class of parameter choice rules is discussed (so-called semi-heuristic rules, which are in a certain sense a "middle ground" between heuristic and a-posteriori rules). We then dive further into the abyss of the relatively unknown, namely the nonlinear theory (i.e., when the regularisation method is nonlinear), for which the development and analysis of heuristic rules are still in their infancy. Notably in this work, we develop a recent convergence theory for heuristic Tikhonov regularisation with a convex penalty term, which attempts to generalise, to some extent, the noise-restricted convergence analysis of the linear theory. Since the error for these methods is usually measured in the Bregman distance, some novel rules for the parameter choice accordingly suggest themselves. Finally, we illustrate and supplement most of the preceding results in a section with numerical experiments which illustrates the effectiveness of heuristic parameter choices, and we conclude with a discussion of the results as well as speculation about possible future research directions in this exciting area of applied mathematics.


Acknowledgements

First and foremost, I would like to acknowledge and thank my supervisor, Dr Stefan Kindermann, who provided significant guidance over the course of my doctoral studies. The research topic on which this thesis is based was through his proposal, which was granted funding from the Austrian Science Fund (FWF), to whom I also extend my thanks. Moreover, much of the contents of this thesis is based on research which was jointly conducted by myself and my supervisor.
I would also like to thank Professor Bernd Hofmann for agreeing to be the second examiner for this thesis and also for being the original organiser of the Chemnitz Symposium on Inverse Problems, which I have twice had the pleasure of participating in.
I also owe a great deal of thanks to my family in London for their continued support whilst I have been in Linz, particularly my mother, who has visited me here on a great number of occasions. Given the natural beauty of Austria, she did not need much convincing, however.
Finally, I would like to thank my friends and colleagues; most notably, and in no particular order, Dr Simon Hubmer, Fabian Hinterer, Onkar Sandip Jadhav, Alexander Ploier and Dr Gunter Auzinger for their friendship and discussions, both academic and otherwise.

Kemal Raik
Linz, July 2020


Contents

1 Introduction 7
  1.1 Examples 7
  1.2 Preliminaries 10
  1.3 Regularisation Methods 13
    1.3.1 Continuous Methods 14
    1.3.2 Iterative Methods 20
    1.3.3 Parameter Choice Rules 21
  1.4 Heuristic Parameter Choice Rules 23

I Theory 30

2 Linear Tikhonov Regularisation 31
  2.1 Classical Theory 31
    2.1.1 Heuristic Parameter Choice Rules 36
  2.2 Weakly Bounded Noise 52
    2.2.1 Modified Parameter Choice Rules 54
    2.2.2 Predictive Mean-Square Error 61
    2.2.3 Generalised Cross-Validation 65
  2.3 Operator Perturbations 69
    2.3.1 Semi-Heuristic Parameter Choice Rules 71

3 Convex Tikhonov Regularisation 78
  3.1 Classical Theory 78
  3.2 Parameter Choice Rules 86
    3.2.1 Convergence Analysis 89
    3.2.2 Convergence Rates (for the Heuristic Discrepancy Rule) 92
  3.3 Diagonal Operator Case Study 94
    3.3.1 Muckenhoupt Conditions 97

4 Iterative Regularisation 108
  4.1 Landweber Iteration for Linear Operators 108
    4.1.1 Heuristic Stopping Rules 112
  4.2 Landweber Iteration for Nonlinear Operators 122
    4.2.1 Heuristic Parameter Choice Rules 123

II Numerics 125

5 Semi-Heuristic Rules 127
  5.1 Gaußian Operator Noise Perturbation 128
    5.1.1 Tomography Operator Perturbed by Gaußian Operator 128
  5.2 Smooth Operator Perturbation 129
    5.2.1 Fredholm Integral Operator Perturbed by Heat Operator 129
    5.2.2 Blur Operator Perturbed by Tomography Operator 130
  5.3 Summary 131

6 Heuristic Rules for Convex Regularisation 136
  6.1 ℓ^1 Regularisation 137
  6.2 ℓ^{3/2} Regularisation 138
  6.3 ℓ^3 Regularisation 139
  6.4 TV Regularisation 139
  6.5 Summary 140

7 The Simple L-curve Rules for Linear and Convex Tikhonov Regularisation 144
  7.1 Linear Tikhonov Regularisation 145
    7.1.1 Diagonal Operator 145
    7.1.2 Examples from IR Tools 146
  7.2 Convex Tikhonov Regularisation 148
    7.2.1 ℓ^1 Regularisation 148
  7.3 ℓ^{3/2} Regularisation 149
  7.4 TV Regularisation 150
  7.5 Summary 151

8 Heuristic Rules for Nonlinear Landweber Iteration 152
  8.1 Test Problems 154
    8.1.1 Nonlinear Hammerstein Operator 154
    8.1.2 Auto-Convolution 154
    8.1.3 Summary 157

III Future Scope 158

9 Future Scope 159
  9.1 Convex Heuristic Regularisation 159
  9.2 Heuristic Blind Kernel Deconvolution 159
    9.2.1 Deconvolution 160
    9.2.2 Semi-Blind Deconvolution 160
  9.3 Meta-Heuristics 162

A Functional Calculus 177

B Convex Analysis 181


Chapter 1

Introduction

Typically in the "real world", we have problems in which we would like to extract information from given data, e.g., acoustic sound waves and X-ray sinograms, among other examples. In particular, an acoustic sound wave recorded on the surface of the Earth contains information regarding the subsurface, and X-rays contain information on the density of the material which they pass through. In order to recover this information, one must, in effect, reverse the aforementioned processes, i.e., solve the inverse problem. In the theory of inverse problems, this is usually mathematically formalised in operator theoretic terms. That is, we generally consider an equation of the form

Ax = y,    (1.1)

in which A : X → Y is a continuous operator mapping between two vector spaces, called the "forward operator". The objective is then to invert the forward operator and to thus recover the solution x from measured data y. Generally speaking, the data we measure is considered corrupted to reflect, for instance, real-world machine error, and what we consider in fact is a perturbation of the data, y^δ = y + e (where e may be very small), which we call noisy data; then, naturally, y is called exact data. One should mention that the noise model, i.e., e, may be deterministic or stochastic, although in this thesis we will limit ourselves to the deterministic framework for ill-posed problems. On the topic of stochastic ill-posed problems, though, we opt to refer the reader to [15, 23].
Note that significant parts of this thesis are derived from the papers [51, 89–91], of which the author of this thesis was a coauthor.

1.1 Examples

Examples of inverse problems may be found in theory as well as in a wide variety of applications, ranging from differentiation as a theoretically grounded example to tomography as an example found in application.


Differentiation  Indeed, differentiation and integration may be seen as opposites of one another and therefore we may define one as the direct (or forward) and the other as the inverse problem, respectively. In this way, we see that the definition of the inverse problem is rather arbitrary as either may qualify. However, it is the norm to define differentiation as the inverse problem and the reason for this is that, unlike integration, the differentiation problem is ill-posed; a concept which we will illustrate by example now and define somewhat more rigorously in the following section.
We include the following example from [35]: for any f ∈ C¹[0, 1], consider the perturbed function

f_n^δ(x) := f(x) + δ sin(nx/δ),

with δ ∈ (0, 1) and n ∈ {2, 3, ...} arbitrary. In the language of inverse problems, the first term would be the "exact data" and the second term would be the "noise", and subsequently their sum is referred to as the "data with noise" or, in even more colloquial terms, the "noisy data". Now, differentiating the function above yields

(f_n^δ)'(x) = f'(x) + n cos(nx/δ).

Note that

‖f − f_n^δ‖_∞ = δ,    (1.2)

whereas

‖f' − (f_n^δ)'‖_∞ = n.    (1.3)

Or, to put it into words: arbitrarily small data errors (1.2) (e.g., δ < 1) may lead to arbitrarily large solution errors (1.3) (e.g., n → ∞). That is, there is a lack of continuous dependence between the data and the solution, which makes this problem, i.e., differentiation, ill-posed. This consequently leads one to approximate the ill-posed problem by a well-posed problem, i.e., to regularise (another term which we shall define later).
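To make the amplification in (1.2)–(1.3) concrete, the following small numerical sketch (an illustration only, not taken from the thesis; the grid, δ and n below are arbitrary choices) differentiates the perturbed data by finite differences and reports both errors.

```python
import numpy as np

# Perturbed data f_n^delta(x) = f(x) + delta*sin(n x / delta) on a fine grid.
x = np.linspace(0.0, 1.0, 20001)
f = np.sin(2.0 * np.pi * x)                    # "exact data" f
delta, n = 0.05, 50                            # noise amplitude and frequency index
f_delta = f + delta * np.sin(n * x / delta)    # "noisy data" f_n^delta

# Central finite differences stand in for the derivative (the inverse problem).
df = np.gradient(f, x)
df_delta = np.gradient(f_delta, x)

print("data error      ||f - f_n^delta||_inf    =", np.max(np.abs(f - f_delta)))      # ~ delta
print("solution error  ||f' - (f_n^delta)'||_inf =", np.max(np.abs(df - df_delta)))   # ~ n
```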

Tomography  In computerised tomography (CT), which is particularly prevalent in the medical field for a variety of applications, one seeks to reconstruct the density of a medium from X-ray measurements. This falls under the umbrella of non-destructive testing, in which one would like to understand the properties within a medium without causing any physical damage, hence the term non-destructive. In computerised tomography, the subject of interest is usually the human body or, more precisely, a body part. In particular, if we restrict ourselves to a two-dimensional domain and let Ω ⊂ R² represent the (compact) cross-section of a human body, then the "aim of the game" is to recover the density, which we denote by a two-dimensional function f : Ω → R, from X-ray measurements in the plane where Ω lies.


In particular, the X-rays travel in straight lines which are parametrised by their normal vector θ ∈ R² (‖θ‖ = 1) and their distance s > 0 from the origin (cf. [35]). The forward operator which maps this is known as the Radon transform, and we can represent it by the following integral expression

(Rf)(s, θ) := ∫_R f(sθ + tθ^⊥) dt.    (1.4)

For the derivation of (1.4), the reader is referred to [118]. Note that the Radon transform was named after the Austrian mathematician Johann Radon who, in fact, derived it on entirely theoretical grounds (cf. [130]). It is quite obvious then that R is the operator which models the forward problem. The inverse problem is therefore to invert the Radon transform and recover the density distribution f. This problem has been treated quite extensively and we refer the reader to [118], [102, Chapter 6] and the references therein. In fact, in R², an explicit inversion formula for (1.4) already exists thanks to Johann Radon and the aforementioned reference [130]. The point, and relevance regarding the theory of ill-posed problems, is that the formula involves taking the derivative of the data and, as already shown in the previous example, differentiation is an ill-posed problem! Thus, by consequence, inversion of the Radon transform is also ill-posed.
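As an aside, the forward and inverse Radon transforms are readily available in standard software; the short sketch below (an illustration only, assuming scikit-image is installed; the phantom and angle grid are arbitrary choices) computes a sinogram (1.4) and a filtered-backprojection reconstruction.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, rescale

f = rescale(shepp_logan_phantom(), 0.25)             # a small test density f on Omega
theta = np.linspace(0.0, 180.0, 90, endpoint=False)  # projection angles in degrees

sinogram = radon(f, theta=theta)                     # forward problem: (Rf)(s, theta)
f_rec = iradon(sinogram, theta=theta)                # (regularised) inversion of R

print("sinogram shape:", sinogram.shape,
      " relative reconstruction error:", np.linalg.norm(f_rec - f) / np.linalg.norm(f))
```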

Backwards Heat Equation  This example was also taken from [35], and we refer the reader to the aforementioned reference for further details as we only give a relatively brief description of the problem. The "forward" heat equation is a well-known one, and is also referred to as the diffusion equation as it mathematically models the diffusion of heat in a body or medium. The one-dimensional heat equation is usually written in the following way:

∂_t u(x, t) − ∂_x² u(x, t) = 0,
u(x, 0) = u_0,  in Ω,
u = 0,  on ∂Ω × [0, T],    (1.5)

with an initial and Dirichlet boundary condition, where Ω ⊂ R is the domain of the body/medium with a constant temperature equal to 0 on its boundary. The forward operator which describes the forward problem would be the one which maps A : u(·, 0) ↦ f, where

f(x) = u(x, T),  (x ∈ Ω),

and T > 0 is the (final) time of measurement. This can be solved via, e.g., Fourier analysis. However, our concern is the inverse problem, which would be to determine the initial temperature distribution u(x, 0) from data derived from measurements of the final temperature.


Note, however, that there is no solution for this inverse problem unless, that is, f is assumed to be analytic (cf. [25, 35]). Restricting ourselves to f for which a solution exists still does not remedy all our problems, as this unique (cf. [35]) solution would still not depend continuously on the data. Therefore, two out of three requirements for a "well-posed" problem would be violated (namely, existence, uniqueness and continuous dependence of the solution on the data; see below). In order to see this, we follow the example of [35] by writing (1.5) as

Δφ_k + λ_k φ_k = 0,  in Ω,
φ_k = 0,  on ∂Ω,    (1.6)

where {λ_k}_k and {φ_k}_k represent the eigenvalues and eigenfunctions of the Dirichlet problem (1.6) on Ω, respectively, with φ_k ∈ L²(Ω) normalised such that ‖φ_k‖_{L²(Ω)} = 1 for all k ∈ N. Letting

u_k(x, t) := (1/λ_k) φ_k(x) exp(λ_k(T − t)),

and plugging into (1.5), we see that

(Δu_k)(x, t) = (1/λ_k)(Δφ_k)(x) exp(λ_k(T − t)) = −φ_k(x) exp(λ_k(T − t)) = ∂_t u_k(x, t),

which confirms that u_k satisfies (1.5), with f_k = φ_k/λ_k. Now, since λ_k → ∞, we get that

‖f_k‖_{L²} → 0,

whereas

‖u_k(·, 0)‖_{L²} = exp(λ_k T)/λ_k → ∞,

as k → ∞. Therefore, considering f_k as perturbations of f = 0 with (data) error 1/λ_k (measured in the L² norm), the (solution) error of the inverse problem is amplified exponentially by the factor exp(λ_k T). Thus, in the "quantification" of ill-posed problems, namely, the so-called degree of ill-posedness (again, see below for details), the backwards heat equation is said to be severely (aka exponentially) ill-posed.
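The exponential amplification above can also be tabulated directly; the snippet below (a sketch with illustrative values, assuming Ω = (0, π) so that the Dirichlet eigenvalues are λ_k = k²) compares the data error 1/λ_k with the corresponding solution error exp(λ_k T)/λ_k.

```python
import numpy as np

T = 0.1                                      # final measurement time
for k in [1, 5, 10, 20]:
    lam = k ** 2                             # Dirichlet eigenvalues on (0, pi): lambda_k = k^2
    data_error = 1.0 / lam                   # ||f_k||_{L^2} = 1/lambda_k
    solution_error = np.exp(lam * T) / lam   # ||u_k(., 0)||_{L^2} = exp(lambda_k T)/lambda_k
    print(f"k = {k:2d}   data error = {data_error:.2e}   solution error = {solution_error:.2e}")
```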

1.2 Preliminaries

Assume henceforth, until stated otherwise, that A ∈ L(X, Y) is a continuous linear operator between two Hilbert spaces. In case A^{-1} does not exist, one seeks to construct a generalised inverse A† which recovers the best approximate solution, denoted by x† := A†y (cf. [116, 117]).

Definition 1. The Moore–Penrose generalised inverse A† is defined as the unique linear extension of the operator Ã^{-1} : range A → (ker A)^⊥ to the domain dom A† := range A ⊕ (range A)^⊥, with (range A)^⊥ =: ker A†, where Ã denotes the restriction of A to the orthogonal complement of its kernel.

Figure 1.1: The Moore–Penrose generalised inverse, as we see in this illustration, maps the direct sum of the range of A and its complement to the complement of the kernel of A.

The Moore–Penrose generalised inverse A† : range A ⊕ (range A)^⊥ → (ker A)^⊥ thus allows a simple way to compute the best approximate solution, which is usually expressed in terms of the least-squares solution of (1.1) [35]:

Definition 2. A vector x ∈ X is called a least-squares solution of (1.1) if

‖Ax − y‖ = inf_{z∈X} ‖Az − y‖.

The best approximate solution of (1.1) may be defined as x ∈ X satisfying

‖x‖ = inf{ ‖z‖ | z is a least-squares solution of (1.1) }.

In particular, the following theorem is key in understanding the relationship between the least-squares solutions of (1.1) and the generalised inverse [35]:

Theorem 1. Let y ∈ dom A†. Then x ∈ X is a least-squares solution of Ax = y if and only if the Gaußian normal equation

A*Ax = A*y

holds.


Courtesy of Theorem 1, for A*A continuously invertible, one could "naively" compute the best approximate solution as

x† = A†y = (A*A)^{-1}A*y = ∫_0^∞ (1/λ) dE_λ A*y,    (1.7)

for y ∈ dom A†, where E_λ refers to the spectral family of the self-adjoint operator A*A; we refer to Appendix A for further details and explanation. However, for ill-posed problems (to be defined), it turns out that (1.7) is not well defined as, in particular, 0 could be in the spectrum of A*A, which is precisely where the integrand in (1.7) has a pole [139]. On the other hand, what can be deduced from Theorem 1 is that A† = (A*A)†A* [35].
Another question that begs to be asked is whether A†y^δ would be a good approximation of A†y. In other words, if ‖y^δ − y‖ is small, does that imply that ‖A†y^δ − A†y‖ will remain small? The answer, it turns out, also depends on whether (1.1) is well-posed or not. There are essentially two well-known definitions of well-posedness, which are attributed to Hadamard and Nashed, respectively.

Definition 3. The problem (1.1) is said to be well-posed according to Hadamard [50] if all three of the following criteria are satisfied:

1. For all admissible data, a solution exists (i.e., range A = Y);

2. The solution is unique (i.e., ker A = {0});

3. The solution depends continuously on the data (i.e., A^{-1} ∈ L(Y, X)).

The working definition we opt to proceed with, however, is that of Nashed [116, 117]:

Definition 4. The problem (1.1) is said to be well-posed according to Nashed if

range A = \overline{range A},

i.e., if the range of A is closed.

If any one of the criteria of Hadamard, or the criterion of Nashed, fails to be satisfied, then (1.1) is said to be ill-posed. The latter definition may be linked with the following theorem [139]:

Theorem 2 (Open Mapping Theorem). A† is continuous if and only if range A = \overline{range A}.

Whilst we do not present the proof here and instead refer the reader to [35], we illustrate it via a rather abstract and general example. For instance, if K is a compact operator (note that we usually opt to write K in place of A whenever referring to a compact operator), then its range is closed if and only if it is finite dimensional (cf. [33]). Thus, in case dim range K = ∞, (1.1) would automatically be ill-posed. That is, 0 would be an accumulation point of the spectrum, so that (1.7) would no longer be well defined. In particular, for a compact operator with singular system (σ_n; v_n, u_n) and y^{δ_n} := y + δ u_n, one has ‖y^{δ_n} − y‖ = δ, but

‖K†y^{δ_n} − K†y‖ = δ/σ_n → ∞,

as n → ∞.
In particular, for a compact operator K, we can write

K†y = Σ_{i=1}^∞ (1/σ_i) ⟨y, u_i⟩ v_i,    (1.8)

where (σ_i; v_i, u_i) is its singular system (cf. Appendix A), whenever

y ∈ dom K†  ⟺  Σ_{i=1}^∞ (1/σ_i²) |⟨y, u_i⟩|² < ∞.    (1.9)

Note that (1.9) is commonly referred to as the Picard condition (cf. [35]). Whilst it is only applicable in case the model operator is compact, therefore yielding a singular value decomposition, it provides insight as to when the data y is in the domain of the pseudo-inverse (i.e., is attainable). In particular, it says that the Fourier coefficients ⟨y, u_n⟩ with respect to the singular functions u_n should decay faster than the singular values σ_n. We may also accordingly quantify degrees of ill-posedness (cf. [67]):

Definition 5. Let K : X → Y be a compact linear operator between two Hilbert spaces. Then the equation Kx = y with singular value decomposition (A.1) is said to be mildly ill-posed if σ_i = O(i^{-β}) for some β > 0, and severely ill-posed if σ_i = O(e^{-i}).

The faster the singular values decay, the more difficult the inverse problem is to solve. Thus, a severely ill-posed problem tends to be more problematic than the mildly ill-posed case. The "irony" is that for heuristic rules (which will be defined later) to "work", the problem should in fact be sufficiently ill-posed (but not "too ill-posed"). This will be explained in further detail in the appropriate chapter.
Note that in the example of differentiation, which we mentioned as a first example, the order of ill-posedness is given by σ_i = O(i^{-1}) (cf. [35]). There is further literature on extensions of the degree of ill-posedness, cf. [69], as well as extensions to nonlinear equations, cf. [70, 71].
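The decay of the singular values can be inspected numerically; the sketch below (an illustration, not from the thesis) discretises the integration operator, whose inversion is the differentiation problem above, and confirms the σ_i = O(i^{-1}) behaviour.

```python
import numpy as np

n = 200
h = 1.0 / n
A = h * np.tril(np.ones((n, n)))                 # (Ax)_i ~ integral of x(s) up to t_i
sigma = np.linalg.svd(A, compute_uv=False)       # singular values, sorted decreasingly

i = np.arange(1, n + 1)
# For sigma_i = O(1/i) the products sigma_i * i stay bounded (roughly constant):
print("sigma_i * i at i = 1, 10, 50, 100, 200:",
      np.round((sigma * i)[[0, 9, 49, 99, 199]], 3))
```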

1.3 Regularisation Methods

We have seen that the pseudo-inverse does not always yield an acceptable solution. For this reason, we would like to find a different way to compute an acceptable (approximate) solution. As the topic of this thesis is regularisation (cf. [35, 49, 115, 145–149]), this is the method which we will explain below. In particular, it is possible to divide regularisation methods into continuous and iterative ones (although both are related, as will become apparent):

1.3.1 Continuous Methods

In case A† is unbounded, we seek to approximate it by a parametric family of continuous operators {R_α}_{α>0}, with R_α : Y → X, such that R_α → A† pointwise as α → 0 [35]. In particular, if we consider an integral of the form

R_α y = g_α(A*A) A* y = ∫_0^∞ g_α(λ) dE_λ A*y,    (1.10)

then we see that the "aim of the game" is to choose a filter function g_α such that

g_α(λ) → 1/λ,  (λ > 0),

as α → 0, as it is apparent that this would then imply convergence of the regularisation operator to the generalised inverse (see (1.7)).
In case the forward operator is compact, one may also recall that we can express (1.10) as

R_α y = Σ_{i=1}^∞ g_α(σ_i²) σ_i ⟨y, u_i⟩ v_i,

in which we would like g_α(σ_i²) σ_i → 1/σ_i as α → 0 (see (1.8)).
We also introduce the residual filter function as

r_α(λ) := 1 − λ g_α(λ),    (1.11)

which may be derived, for instance, by considering the error. In particular, the residual function is derived in [35] as follows:

x† − R_α y = x† − g_α(A*A)A*y = (I − g_α(A*A)A*A) x† = ∫_0^∞ (1 − λ g_α(λ)) dE_λ x† = r_α(A*A) x†.

The following convergence proof is also courtesy of [35]:

Theorem 3. If g_α is piecewise continuous and there exists a positive constant C such that

|λ g_α(λ)| ≤ C,    (1.12)

and

lim_{α→0} g_α(λ) = 1/λ,    (1.13)

for all λ ∈ (0, ‖A‖²), then

R_α y = g_α(A*A)A*y → x†,

as α → 0 for all y ∈ dom A†. Note that if y ∉ dom A†, then

‖R_α y‖ = ‖g_α(A*A)A*y‖ → ∞,

as α → 0.

Proof. The proof essentially boils down to exploiting Lebesgue's Dominated Convergence Theorem [36]. We may write

‖R_α y − x†‖² = ∫_0^{‖A‖²+} r_α²(λ) d‖E_λ x†‖²,

and due to (1.12), it follows that

|r_α(λ)| = |1 − λ g_α(λ)| ≤ 1 + C,

for all λ ∈ (0, ‖A‖²). Therefore,

lim_{α→0} ∫_0^{‖A‖²+} r_α²(λ) d‖E_λ x†‖² = ∫_0^{‖A‖²+} lim_{α→0} r_α²(λ) d‖E_λ x†‖²    (1.14)

is a consequence of the boundedness of the integrand, thereby allowing one to utilise the aforementioned dominated convergence theorem.
Now, due to (1.13), it follows that

lim_{α→0} r_α(λ) = 1 − lim_{α→0} λ g_α(λ) = 1 − 1 = 0,

for all λ > 0, and since r_α(0) = 1, we also have that

lim_{α→0} r_α(0) = 1.

Hence, without going into great detail, it follows that the integral in (1.14) is equal to the "jump" of λ ↦ ‖E_λ x†‖² at λ = 0, i.e., it is equal to lim_{λ→0+} ‖E_λ x†‖² − ‖E_0 x†‖² = ‖P x†‖², where P : X → ker A is the orthogonal projection. For further detail, we refer to [35]. In essence, since x† ∈ (ker A)^⊥, one has P x† = 0, which yields the desired result that

lim_{α→0} ‖R_α y − x†‖² = 0,

thus completing the convergence proof. For the divergence result, we assume, for the sake of deriving a contradiction, that there exists a sequence {α_n} with α_n → 0 such that {‖R_{α_n} y‖} is uniformly bounded, so that there exists a subsequence {R_{α_{n_k}} y} with R_{α_{n_k}} y ⇀ x ∈ X (weak convergence). Now, due to the weak sequential continuity of A, it follows that A R_{α_{n_k}} y ⇀ Ax. On the other hand, A R_α y → Qy as α → 0, where Q : Y → \overline{range A} is the orthogonal projection; hence Ax = Qy, i.e., we must have y ∈ dom A†. That is, no such bounded sequence can exist for y ∉ dom A†, and this completes the proof.


Next we provide some typical examples for the choice of the regularisation operator R_α:

Example 1. Sticking with the general form (1.10), Tikhonov regularisation (cf. [145, 146]), sometimes also called Tikhonov–Phillips regularisation [128], amounts to choosing

g_α(λ) = 1/(λ + α),

i.e., taking R_α = (A*A + αI)^{-1}A*, which is well defined as it is well known that the operator A*A + αI is invertible for α > 0. It is trivial to see that this tends to the desired integrand as α vanishes (and, equivalently, that (A*A + αI)^{-1}A* → (A*A)†A* as α → 0).
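A minimal computational sketch of Example 1 (the synthetic operator, data and α below are illustrative choices, not from the thesis): the regularised solution is obtained by solving the shifted normal equations.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
A = np.tril(np.ones((n, n))) / n                      # discretised integration operator
x_true = np.sin(np.linspace(0.0, np.pi, n))           # "exact solution" x^dagger
y_delta = A @ x_true + 1e-3 * rng.standard_normal(n)  # noisy data y^delta

alpha = 1e-4
# x_alpha^delta = (A^T A + alpha I)^{-1} A^T y^delta  (Tikhonov filter g_alpha)
x_alpha = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)
print("relative error:", np.linalg.norm(x_alpha - x_true) / np.linalg.norm(x_true))
```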

Example 2. An alternative is the spectral cut-off method (cf. [35, 102]), which consists of choosing

g_α(λ) = χ_{(α,∞)}(λ) · (1/λ),

i.e.,

R_α y = ∫_α^∞ (1/λ) dE_λ A*y,

where χ_{(α,∞)} is the characteristic function of the interval specified by the subscript; that is,

χ_{(α,∞)}(λ) := 1 if λ ∈ (α, ∞), and 0 otherwise.

The intuition of this regularisation method is that we cut off the problematic eigenvalues congregating near zero. As before, it is clear to see once more that R_α y → A†y as α → 0.
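For comparison, the spectral cut-off of Example 2 can be realised via a truncated singular value decomposition; the sketch below (illustrative only, reusing the same synthetic integration operator as in the Tikhonov sketch above) discards all modes with σ_i² ≤ α and inverts the rest exactly.

```python
import numpy as np

def spectral_cutoff(A, y, alpha):
    """Truncated-SVD regularisation: keep only the modes with sigma_i^2 > alpha."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s ** 2 > alpha                 # characteristic function chi_(alpha, inf)(sigma_i^2)
    c = np.zeros_like(s)
    c[keep] = (U.T @ y)[keep] / s[keep]   # (1/sigma_i) <y, u_i> on the retained modes
    return Vt.T @ c

# Example usage with exact data (the reconstruction is then essentially exact):
n = 100
A = np.tril(np.ones((n, n))) / n
x_true = np.sin(np.linspace(0.0, np.pi, n))
print(np.linalg.norm(spectral_cutoff(A, A @ x_true, alpha=1e-6) - x_true))
```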

Now, in the presence of noise, e.g., given data y^δ = y + e, convergence is not so straightforward to prove, as a quick glance at Figure 1.2 may suggest. In particular, one may observe in Figure 1.2 that for a given noise level δ > 0, the error tends to infinity as the parameter α tends to zero. In the presence of noise, however, convergence is proven with respect to δ (rather than α, with an α = α_* such that 0 < α_* < α_max < ∞ usually depending on δ). It is common to define

x_α^δ = R_α y^δ  and  x_α = R_α y,

where the latter, x_α, is really an auxiliary quantity which we consider only in proofs, as it is not possible to compute it in practice, since we tend to have the presence of noise in our data. x_α^δ, on the other hand, is the approximate solution which we actually compute via regularisation. The approximation lies in a neighbourhood of the exact solution x†, the proximity to which is determined by α. The idea is to choose α as small as possible in order to obtain a more accurate approximation, whilst bearing stability of the computation in mind.
In order to understand the relationship between the stability and approximation errors, it is insightful to estimate the total error as

‖x_α^δ − x†‖ ≤ ‖x_α^δ − x_α‖ + ‖x_α − x†‖,    (1.15)

where the first and second terms correspond to the data propagation (a.k.a. stability) error and the approximation error, respectively. Subsequently,

‖x_α^δ − x_α‖ ≤ B(α),  and  ‖x_α − x†‖ ≤ V(α),    (1.16)

where α ↦ B(α) and α ↦ V(α) are typically decreasing and increasing functions, respectively. Thus, one has that for small α, the data propagation error blows up, and for large α, one gets a larger approximation error.

Figure 1.2: In this plot, the total error ‖x_α^δ − x†‖ is represented by the blue line, whilst the propagated data error (‖x_α^δ − x_α‖) and the approximation error (‖x_α − x†‖), represented by the red lines, are increasing for α → 0+ and α → ∞, respectively.

Source Conditions

As well as proving convergence, as in Theorem 3, we are also interested in convergence rates (i.e., the "speed" of convergence). In particular, it is known that the approximate solution may converge arbitrarily slowly to the desired solution in the absence of certain conditions. We give the following proposition from [141]:


Proposition 1. Let range A be non-closed and α = α(δ, y^δ) be a parameter choice rule. Then there does not exist any function f : (0, ∞) → (0, ∞) with lim_{δ→0} f(δ) = 0 for which

‖x_{α(δ,y^δ)}^δ − x†‖ ≤ f(δ)

holds for all y ∈ dom A† with ‖y‖ ≤ 1 and all δ > 0.

In other words, the proposition above states that one cannot obtain uniform convergence rates for ill-posed problems, i.e., the convergence may be arbitrarily slow. We do not give the proof of Proposition 1 here, but instead refer the reader to [35]. Thus, in order to overcome this issue, one usually enforces a so-called source condition on the solution, i.e.,

x† ∈ range(φ(A*A))  ⟺  x† = φ(A*A)ω,  ‖ω‖ ≤ C,    (1.17)

with an index function φ : R⁺ → R⁺ (i.e., φ is continuous, strictly monotonically increasing and satisfies φ(0) = 0, cf. [83]), i.e.,

∫_0^{‖A‖²+} (1/φ²(λ)) d‖E_λ x†‖² < ∞.    (1.18)

Typical index functions include the so-called Hölder type ones:

φ(λ) = λ^μ,  (μ > 0),    (1.19)

where a larger μ indicates a higher degree of smoothness (of the solution) and, for this reason, we may refer to μ as the smoothness index.
For problems in which the singular values decay very rapidly (i.e., for severely ill-posed problems as defined previously), (1.19) may be unnecessarily strong and therefore a more appropriate index function may be

φ(λ) = (log(C/λ))^{-μ},

i.e., a logarithmic type source condition [73].
The following proposition from [110] shows that, under the source condition (1.17) and the condition (1.20) below, we achieve the desired bound on the approximation error:

Proposition 2. Let x† satisfy a source condition (1.17) and suppose that there exists a positive constant C such that

r_α(λ) φ(λ) ≤ C φ(α),    (1.20)

for all α > 0 and λ ∈ (0, ‖A‖²). Then the approximation error may be estimated as

‖x_α − x†‖ ≤ C φ(α),    (1.21)

for all α ∈ (0, α_max).


Proof. Recalling that A*y = A*Ax†, we may write

‖x_α − x†‖² = ‖R_α A x† − x†‖² = ∫_0^∞ |1 − λ g_α(λ)|² d‖E_λ x†‖²
 = ∫_0^∞ |r_α(λ)|² (φ²(λ)/φ²(λ)) d‖E_λ x†‖² ≤ C φ²(α) ∫_0^∞ (1/φ²(λ)) d‖E_λ x†‖² ≤ C' φ²(α),

where we used (1.20) in the penultimate inequality, and the final inequality is a consequence of (1.18), thereby completing the proof.

Note that (1.20) typically does not hold for all φ; for instance, if we consider the Hölder type functions (1.19), then (1.20) may only hold for μ in a finite interval (0, μ₀], where μ₀ would be the so-called qualification index, i.e., the maximum μ for which optimal convergence rates hold, i.e., where saturation occurs (to be exemplified in Chapter 2).

Iterating the Regularisation

After regularising the problem (1.1), one may opt to regularise again and consequently define the so-called second regularisation iterate. In particular, given an "initial guess", which would be the (first) regularised solution x_α^δ, we can consider the equation

A(z + x_α^δ) = y^δ  ⟹  Az = y^δ − A x_α^δ,

and thereby iterate the process of regularisation to yield a new vector, z_α^δ := R_α(y^δ − A x_α^δ). Thus, the second iterate may be defined as

x^{II}_{α,δ} := z_α^δ + x_α^δ = R_α(y^δ − A x_α^δ) + R_α y^δ = 2 R_α y^δ − R_α A R_α y^δ,

i.e.,

x^{II}_{α,δ} = (2R_α − R_α A R_α) y^δ    (1.22)

is the second regularisation iterate, and we adopt the notation x^{II}_α for the noise-free case. Note that it is possible to repeat this process n times, as exemplified by, e.g., (2.9).
It is also useful to define

p_α^δ := y^δ − A x_α^δ  and  p^{II}_{α,δ} := y^δ − A x^{II}_{α,δ},    (1.23)

as the residuals with respect to a regularisation and with respect to the second iterate of an iterated regularisation, respectively. Moreover, p_α and p^{II}_α denote their noise-free variants.
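The second iterate (1.22) and the residuals (1.23) are straightforward to compute once a regularisation operator is available; the sketch below (illustrative only, written for the Tikhonov choice of R_α) produces the quantities that reappear later in the heuristic functionals of Definition 7.

```python
import numpy as np

def tikhonov(A, y, alpha):
    """R_alpha y = (A^T A + alpha I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)

def second_iterate(A, y_delta, alpha):
    x1 = tikhonov(A, y_delta, alpha)            # x_alpha^delta = R_alpha y^delta
    z = tikhonov(A, y_delta - A @ x1, alpha)    # z_alpha^delta = R_alpha p_alpha^delta
    x2 = x1 + z                                 # x^II_{alpha,delta} = (2 R_alpha - R_alpha A R_alpha) y^delta
    p1 = y_delta - A @ x1                       # p_alpha^delta
    p2 = y_delta - A @ x2                       # p^II_{alpha,delta}
    return x1, x2, p1, p2
```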

The aforementioned families of regularisation operators, however, are not sufficient for determining a stable approximation of x†. In particular, which α ∈ (0, α_max) does one opt for? If we choose it too large, then the approximation error increases; but if we choose it too small, then we may not be able to compute it stably in the presence of data error. A parameter choice is needed, i.e., a way to select α = α_* which will ensure a stable yet accurate computation of the regularised solution. To this end, we define a regularisation method as a pair (R_{α_*}, α_*) such that R_{α_*} y^δ → A†y and α_* → 0 as δ → 0. We will expand on this in Section 1.3.3.

1.3.2 Iterative Methods

If we recall the Gaußian normal equation, A*Ax = A*y, then it is not hard to see that one can transform this into the fixed point equation

x = x + A*(y − Ax),

which suggests the following iteration to approximate x†:

x_k = x_{k−1} + A*(y − A x_{k−1}),    (1.24)

with k ∈ N and some initial guess x_0. Note that (1.24) is known as Landweber iteration (cf. [97]) and will be explored further in Chapter 4. Iterative methods may be expressed more generally as

x_k = x_{k−1} + G_k(x_{k−1}, y),    (1.25)

for different choices of G_k (cf. [81] and the references therein). The iteration index k acts as the regularisation parameter for iterative methods and the relationship with α is given by k ∼ 1/α. Thus, if it were not already clear before, it should be clear now that the asymptotics are inverted, in the sense that where α → 0 for continuous methods, an iterative regularisation should be such that x_k → x† as k → ∞. Examples include Landweber iteration, already stated, and also Newton type methods, among others.
In case (1.1) is ill-posed, it is not true that x_k^δ → x† as k → ∞. Instead, one observes a phenomenon known as semi-convergence, in which ‖x_k^δ − x†‖ → ∞ as k → ∞, but for a particular k_* (which we call the stopping index, to be detailed in the following section), one has that x_{k_*}^δ → x† as δ → 0.
Note that we adapt the notation of (1.23) so that

p_k^δ := y^δ − A x_k^δ  and  p_k := y − A x_k    (1.26)

define the residual variables in this case. Moreover, it follows from (1.24) that the second iterates in this case correspond to

x^{II}_{k,δ} = x_{2k}^δ  and  p^{II}_{k,δ} = p_{2k}^δ,    (1.27)

which can easily be verified; i.e., iterating an iterative regularisation, e.g., Landweber iteration, is tantamount to doubling the iteration index.
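The following sketch runs Landweber iteration (1.24) on a synthetic problem (illustrative set-up, not from the thesis) and records the error history; for ill-posed problems the error typically decreases first and grows again, which is the semi-convergence just described. A relaxation weight w with w‖A‖² ≤ 1 is included for numerical stability; this scaling is a standard assumption and is not part of the formula above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
A = np.tril(np.ones((n, n))) / n                       # synthetic ill-conditioned operator
x_true = np.sin(np.linspace(0.0, np.pi, n))
y_delta = A @ x_true + 1e-3 * rng.standard_normal(n)   # noisy data

w = 1.0 / np.linalg.norm(A, 2) ** 2                    # relaxation weight (w * ||A||^2 <= 1)
x = np.zeros(n)
errors = []
for k in range(1, 5001):
    x = x + w * (A.T @ (y_delta - A @ x))              # Landweber step (1.24)
    errors.append(np.linalg.norm(x - x_true))

print("smallest error", min(errors), "attained at k =", int(np.argmin(errors)) + 1,
      "; error at k = 5000:", errors[-1])
```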


1.3.3 Parameter Choice Rules

The optimal parameter would be the one which minimises the total error:

α_opt = argmin_{α∈(0,α_max)} ‖x_α^δ − x†‖.    (1.28)

However, (1.28) is not implementable as x† is unknown.
Similarly, in case one regularises via an iterative method, one can also use a stopping rule which, much like a parameter choice rule, is a methodology for approximating the optimal stopping index k_opt (which minimises the total error with respect to k ∈ N).
Therefore, we are left with the task of approximating α_opt via an implementable parameter choice rule, the resulting parameter of which we shall henceforth denote by α_*. In particular, parameter choice rules may be categorised as follows:

• a-priori: assumes knowledge of δ. As the name indicates, such a parameter may be computed prior to measurement of the data. For instance, one may choose α_* = O(δ^{s(μ)}), or equivalently k_* = O(δ^{-s(μ)}), with an exponent s(μ) that depends on the smoothness index μ of the solution.

• a-posteriori: assumes knowledge of δ and y^δ. A typical example would be Morozov's discrepancy principle (cf. [114, 150]), in which one would compute the parameter as

α_* = α(δ, y^δ) = sup{ α ∈ (0, α_max) | ‖p_α^δ‖ ≤ τδ },    (1.29)

or equivalently k_* = inf{ k ∈ N | ‖p_k^δ‖ ≤ τδ }, with a constant τ ≥ 1 (a small computational sketch of this rule follows the list below).

• heuristic (a.k.a. error-free, data-driven) [45, 83, 84, 87]: only assumes knowledge of y^δ, i.e., α_* = α(y^δ), or equivalently k_* = k(y^δ). As this is the topic of the thesis, we proceed to expand on this particular way of selecting the parameter in the next section.
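As a point of comparison for the heuristic rules that follow, the sketch below implements the discrepancy principle (1.29) for Tikhonov regularisation by scanning a geometric grid of α values from large to small (the grid, τ and solver are illustrative choices, not prescribed by the thesis).

```python
import numpy as np

def discrepancy_principle(A, y_delta, delta, tau=1.1):
    """Return the largest grid value of alpha with ||y^delta - A x_alpha^delta|| <= tau*delta."""
    n = A.shape[1]
    for alpha in np.logspace(2, -12, 200):                 # scan from large to small alpha
        x = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)
        if np.linalg.norm(y_delta - A @ x) <= tau * delta:
            return alpha, x
    return alpha, x                                        # fall back to the smallest grid alpha
```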

In a way, the most "disadvantaged" of the above classes of parameter choice rules are the a-priori rules, since they require knowledge of both the noise level δ and the smoothness index μ. At first glance, the naive reader would assume the heuristic class of rules to be the natural choice, given that they are the least strenuous with regard to the information required to implement them. However, it will become apparent in the subsequent convergence theory (and to some degree in the numerics) that heuristic rules are not without their banes.
Note that in the stochastic case, i.e., where the noise e is considered random, it is common to assume that it has a Gaußian distribution. As such, it is typical in that case to utilise statistical estimators in order to determine the noise level δ (under certain restrictive conditions) and then to combine it with one of the a-posteriori methods mentioned above (cf. [15, 65]).

Figure 1.3: In this plot, the selected parameter α_opt is the so-called oracle choice which minimises the sum of B(α) and V(α), the upper bounds for the data propagation and approximation errors, respectively. The class of heuristic parameter choice rules which we will focus on is based on minimising functionals which try to emulate the aforementioned oracle choice.

In terms of heuristic rules, the most popular in the stochastic case are arguably the cross-validation and generalised cross-validation rules (cf. [106, 107]). We will discuss the GCV rule in a deterministic framework in Section 2.2.3.

Definition 6. For a continuous linear operator A : X → Y between two Hilbert spaces, the family of operators {R_α}, with R_α : Y → X for each α ∈ (0, α_max), is called a regularisation (for A†) if there exists a parameter choice rule α = α_* such that

lim_{δ→0} sup{ ‖R_{α_*} y^δ − A†y‖ : y^δ ∈ Y, ‖y − y^δ‖ ≤ δ } = 0,    (1.30)

for all y ∈ dom A†, where α_* is also such that

lim_{δ→0} sup{ α_* : y^δ ∈ Y, ‖y − y^δ‖ ≤ δ } = 0.    (1.31)

For any y ∈ dom A†, a pair (R_α, α_*) is called a (convergent) regularisation method (for solving (1.1)) whenever (1.30) and (1.31) hold.

Note that (1.30) is often referred to as worst case convergence as, in particular, the method must converge for the supremum over all possible noise.


1.4 Heuristic Parameter Choice Rules

In case one does not have knowledge of the noise level δ, one may select the parameter depending on the measured data alone, i.e., α_* = α(y^δ). Generally speaking, such a parameter is typically selected as the minimiser of some convex functional ψ : (0, α_max) × Y → R ∪ {∞}, i.e.,

α_* = argmin_{α∈(0,α_max)} ψ(α, y^δ),    (1.32)

where ψ acts as a surrogate functional, i.e., it should serve as an estimator for the error (cf. Figure 1.4).
Similarly, a heuristic stopping rule may be defined via

k_* = argmin_{k∈{1,...,k_max}} ψ(k, y^δ),    (1.33)

where ψ : N × Y → R ∪ {∞}.
In this thesis, whenever A : X → Y is a continuous linear operator between Hilbert spaces, the functionals we consider may be represented via their spectral form (cf. Appendix A):

ψ(α, y^δ)² = ∫_0^∞ Φ_α(λ) d‖F_λ y^δ‖²,    (1.34)

with Φ_α : (0, ∞) → R a spectral filter function.
The idea of the ψ-functionals, and consequently of the associated heuristic rules, is that the aforementioned functional should act as a surrogate for the error functional (cf. Figure 1.4), i.e.,

ψ(α, y^δ) ∼ ‖x_α^δ − x†‖.

Note that for a ψ-functional of the form (1.34), we may apply the triangle inequality to estimate the functional by two components:

ψ(α, y^δ) ≤ ψ(α, y − y^δ) + ψ(α, y),    (1.35)

with the first term denoting the "noisy" component and the second term being associated with the exact data.
In particular, it is for this reason that (under certain conditions)

ψ(α, y − y^δ) ∼ ‖x_α^δ − x_α‖,  ψ(α, y) ∼ ‖x_α − x†‖,    (1.36)

i.e., the functional acting on the noise should behave similarly to the data propagation error, and the functional acting on the exact data should analogously behave like the approximation error.
In order to prove convergence of a regularisation method (cf. Definition 6), one should be able to prove convergence with respect to the parameter choice for all possible noise, i.e., for the worst case scenario (cf. (1.30)). A pitfall for heuristic rules is the so-called Bakushinskii veto [4], which is the following:


Figure 1.4: In this plot, we illustrate the principle that the heuristic ψ-functional should act as a surrogate for the error (represented by the black curve), so that minimising the ψ-functional, which only requires knowledge of the data, should yield a parameter which is as close as possible to the optimal one.

Proposition 3 (Bakushinskii's veto). Suppose (1.1) is ill-posed, i.e., A† is not continuous, and let α = α_*. Then we have that

‖x_{α_*}^δ − x†‖ → 0,

as δ → 0 in the worst case sense (1.30) only if α_* depends on δ.

Proof. Suppose that α_* = α(y^δ) does not depend on δ. Then

lim_{δ→0} sup{ ‖R_{α(y^δ)} y^δ − A†y‖ : y^δ ∈ Y, ‖y^δ − y‖ ≤ δ } = 0    (1.37)

would imply that R_{α(y)} y = A†y for all y ∈ dom A†. Thus, if {y_n} ⊂ dom A† is such that y_n → y ∈ dom A†, then this would in turn imply that A†y_n = R_{α(y_n)} y_n → A†y, i.e., that A† is continuous [35].

Contrary to Bakushinskii's veto, however, heuristic parameter choice rules, when implemented, are often successful, usually providing better than satisfactory results (cf. [9]). A possible reason could be the fact that Bakushinskii's veto considers the supremum over all possible noise, i.e., the worst possible noise, which in practical implementations is often not the encountered scenario. Therefore, Kindermann and Neubauer proposed a noise-restricted analysis in which one can prove convergence of the heuristic rules (cf. [83, 87]); i.e., instead of attempting to prove (1.37), one proves

lim_{δ→0} sup{ ‖R_{α_*} y^δ − A†y‖ : y^δ ∈ Y, y − y^δ ∈ N } = 0,

where N ⊂ Y is a restriction on the noise aimed at bypassing the aforementioned veto of Bakushinskii. In particular, with this concept of convergence, Proposition 3 no longer applies.
Prior to the aforementioned solution, Glasko and Kriksin (cf. [45]) also defined

N := { e ∈ Y | ‖x_α^δ − x_α‖ ≤ C ψ(α, y − y^δ) for all α ∈ (0, α_max) },    (1.38)

the so-called auto-regularisation set, which postulates a bound on the data propagation error, in order to prove convergence with respect to the quasi-optimality rule (although, as stated, one may use this same condition to prove convergence in the linear theory with respect to all the mentioned heuristic rules). In particular, we provide a sketch below of how one would utilise this condition in order to prove convergence:

‖x_{α_*}^δ − x†‖ ≤ ‖x_{α_*}^δ − x_{α_*}‖ + ‖x_{α_*} − x†‖ ≤ C ψ(α_*, y − y^δ) + C φ(α_*),

where the first inequality is a mere application of the triangle inequality, and the second inequality follows from the above auto-regularisation condition (1.38) together with the bound (1.21) of the approximation error by the index function.
Then, due to the sub-additivity of the ψ-functionals, one obtains the bound

ψ(α_*, y − y^δ) ≤ ψ(α_*, y^δ) + ψ(α_*, y).    (1.39)

Additionally, the functionals satisfy ψ(α_*, y^δ) → 0 as δ → 0 (cf. Proposition 4) and ψ(α, y) ≤ Cα for all α ∈ (0, α_max). Hence, since it can (and will) be shown that α_* → 0 as δ → 0, it follows that x_{α_*}^δ → x† in the strong (i.e., norm) topology.
One may view (1.38) as quite an abstract condition, however, which does not provide any particular insight or qualitative information regarding the noise.
Prior to defining some concrete examples of ψ-functionals, we first provide some associated "axioms". In particular, following [83], we assume that all ψ-functionals (in the linear Hilbert space setting) satisfy the following assumptions:

ψ(α, y^δ) ≥ 0 for all α ∈ (0, α_max) and y^δ ∈ Y;    (A1)

ψ(α, y^δ) = ψ(α, −y^δ) for all α ∈ (0, α_max) and y^δ ∈ Y;    (A2)

ψ(α, ·) : Y → R ∪ {∞} is continuous for all α ∈ (0, α_max);    (A3)

ψ(·, y) : (0, α_max) → R ∪ {∞} is lower semi-continuous for all y ∈ Y;    (A4)

y ∈ dom A† implies lim_{α→0} ψ(α, y) = 0.    (A5)


We may now give the following useful convergence result from [83]:

Proposition 4. Let α_* be selected by (1.32) and let ψ satisfy (A1)–(A5). Then ψ(α_*, y^δ) → 0 as δ → 0.

Proof. It is immediate that ψ(α_*, y^δ) ≤ ψ(α, y^δ) for all α ∈ (0, α_max). Additionally, by (A3), it follows that

lim sup_{δ→0} ψ(α_*, y^δ) ≤ ψ(α, y),  for all α ∈ (0, α_max).

In light of (A5), it follows that

0 ≤ lim inf_{δ→0} ψ(α_*, y^δ) ≤ lim sup_{δ→0} ψ(α_*, y^δ) ≤ 0,

hence yielding the desired result.

Definition 7. Let g_α be such that r_α ≥ 0. Then, for such a regularisation, we can define four "classical" heuristic parameter choice rules via the following functionals (which also satisfy (A1)–(A5)):

ψ_HD(α, y^δ) := (1/√α) ‖p_α^δ‖ = (1/√α) ‖r_α(AA*) y^δ‖,    (HD)

ψ_HR(α, y^δ) := (1/√α) ⟨p^{II}_{α,δ}, p_α^δ⟩^{1/2} = (1/√α) ‖r_α(AA*)^{3/2} y^δ‖,    (HR)

ψ_L(α, y^δ) := ⟨x_α^δ, x^{II}_{α,δ} − x_α^δ⟩^{1/2} = ‖A* g_α(AA*) r_α(AA*)^{1/2} y^δ‖,    (L)

ψ_QO(α, y^δ) := ‖x^{II}_{α,δ} − x_α^δ‖ = ‖A* g_α(AA*) r_α(AA*) y^δ‖,    (QO)

which define the heuristic discrepancy (HD), Hanke–Raus (HR) (cf. [59]), simple-L (L) (cf. [91]) and quasi-optimality (QO) (cf. [45, 87, 145, 146]) functionals, respectively.
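For linear Tikhonov regularisation, the filter-function forms above make the four functionals cheap to evaluate once an SVD of A is available. The sketch below is an illustration only (it assumes a finite-dimensional A whose left singular vectors span the data space, so that the component of y^δ orthogonal to range A can be ignored); it evaluates ψ_HD, ψ_HR, ψ_L and ψ_QO on a grid of α values, and the heuristic parameter α_* is then the grid minimiser as in (1.32).

```python
import numpy as np

def heuristic_functionals(A, y_delta, alphas):
    """Evaluate the HD, HR, L and QO functionals for Tikhonov regularisation via the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    b = U.T @ y_delta                         # coefficients <y^delta, u_i>
    l = s ** 2                                # eigenvalues of A A^* (and A^* A)
    out = {"HD": [], "HR": [], "L": [], "QO": []}
    for a in alphas:
        g = 1.0 / (l + a)                     # Tikhonov filter g_alpha
        r = a / (l + a)                       # residual filter r_alpha
        out["HD"].append(np.linalg.norm(r * b) / np.sqrt(a))
        out["HR"].append(np.linalg.norm(r ** 1.5 * b) / np.sqrt(a))
        out["L"].append(np.linalg.norm(s * g * np.sqrt(r) * b))
        out["QO"].append(np.linalg.norm(s * g * r * b))
    return {k: np.asarray(v) for k, v in out.items()}

# Example usage:
#   alphas = np.logspace(0, -12, 100)
#   alpha_star = alphas[np.argmin(heuristic_functionals(A, y_delta, alphas)["QO"])]
```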

These would generally be considered the so-called "classical rules". Their stopping rule counterparts are defined analogously by replacing the terms p_α^δ and p^{II}_{α,δ} above by p_k^δ and p_{2k}^δ, respectively; similarly, x_α^δ and x^{II}_{α,δ} are replaced by x_k^δ and x_{2k}^δ, respectively, and likewise for the filter functions. In this thesis, however, we will generally focus slightly more on continuous regularisation methods. Moreover, one should note that the second equalities with the filter functions are only valid in the case of linear regularisation over Hilbert spaces. Thus, the expressions in the first equalities are the more general ones and can be used in a more expanded setting.
The heuristic discrepancy rule was originally proposed as a stopping rule for Landweber iteration in the paper [59], whilst the Hanke–Raus rule was suggested as a parameter choice rule for Tikhonov regularisation. However, one may just as well use the HD rule for continuous regularisation methods such as Tikhonov regularisation, and for this reason it is sometimes, rather confusingly, also called the "Hanke–Raus" rule. The intuition is that since

ψ_HD(α, y) = (1/√α) ‖r_α(AA*) A x†‖ ∼ ‖r_α(A*A) x†‖ = ‖x_α − x†‖

in most cases, it seems reasonable to expect that ψ_HD(α, y^δ) = ‖r_α(AA*) y^δ‖/√α may be used for error estimation (cf. [35]).
The logic behind the quasi-optimality rule is that the minimum of ψ_QO will normally be attained near the "cross-over point" where the propagated data and approximation errors have roughly the same order of magnitude (cf. [35]). This incidentally turns out to be an effective parameter choice rule, as will be revealed in later sections.
Note that we may also "extend" the above to define a class of ratio rules by simply dividing the functionals above by ‖x_α^δ‖, e.g.

ψ_HDR(α, y^δ) := ‖p_α^δ‖ / (√α ‖x_α^δ‖),

which defines the heuristic discrepancy ratio functional, and in similar fashion we use the notation ψ_HRR and ψ_QOR to refer to the Hanke–Raus ratio and quasi-optimality ratio functionals [99]. The ψ_HDR functional defined above is also known as the Brezinski–Rodriguez–Seatzu rule [18]. In the case of stopping rules, the ratio rules are defined in an equivalent manner, replacing α with 1/k, and clearly also x_α^δ and p_α^δ by x_k^δ and p_k^δ, respectively. The rationale behind the ratio rules is that whilst the standard functionals approximate the total error, the ratio functionals similarly approximate the relative error:

Err_rel(α) = ‖x_α^δ − x†‖ / ‖x†‖.

However, we may disappoint the reader by stating now that the ratio rules lie beyond the scope of this thesis.
A further class of heuristic rules comprises those which combine pre-existing functionals, e.g.,

ψ_HME(α, y^δ) := (1/√α) ⟨p^{II}_{α,δ}, p_α^δ⟩ / ‖p^{II}_{α,δ}‖,

which defines the functional for the heuristic monotone error rule [52, 144]. One can observe that this is, in essence, a ratio of Hanke–Raus and heuristic discrepancy type functionals. This is also one of the "heuristifications" of certain a-posteriori parameter choice rules; in this case, of the monotone error rule, which selects the parameter as

α_* = sup{ α ∈ (0, α_max) | ⟨p^{II}_{α,δ}, p_α^δ⟩ / ‖p^{II}_{α,δ}‖ ≤ τδ },


for a suitably chosen τ ≥ 1. Similarly, the HD rule is a heuristification of the discrepancy principle (1.29) mentioned earlier, and the Hanke–Raus rule is a heuristification of the so-called improved discrepancy principle [34, 35, 44]:

α_* = sup{ α ∈ (0, α_max) | ⟨p^{II}_{α,δ}, p_α^δ⟩^{1/2} ≤ τδ },    (1.40)

also known as the Raus–Gfrerer rule, which has the advantage of always being an optimal order method, contrary to the standard discrepancy principle. On the other hand, an a-posteriori rule seemingly based on a heuristic rule is the so-called Lepskii rule, a.k.a. the balancing principle [96, 100, 109],

α_* = sup{ α ∈ (0, α_max) | ‖x^{II}_{α,δ} − x_α^δ‖ ≤ τδ },

which is arguably an "a-posteriori-fication" of the quasi-optimality rule. Another, in fact very popular, heuristic rule is the generalised cross-validation rule [125, 152]. However, that rule is restricted to the finite dimensional case and we shall discuss it in Section 2.2.3.

Graph Based Heuristic Rules  In addition to the ψ-based methods, alternative options for heuristic rules include choosing the parameter α_* = α(y^δ), again based on the data alone, via a graph based method. Arguably the most popular of these methods, and perhaps of all heuristic rules, is the L-curve rule [60]. Other methods include the U-curve [93], the V-curve [40] and the recently proposed Q-curve rule [135].
The L-curve method consists of plotting the graph (log ‖p_α^δ‖, log ‖x_α^δ‖) which, in principle, should produce an "L" shaped curve, allowing one to subsequently select the parameter α_* as the corner of the "L", i.e., the point of maximum curvature; see Figure 1.5. The intuition behind the L-curve is that we would like to choose a parameter such that the residual is small, whilst also maintaining stability of the solution, i.e., ‖x_α^δ‖ should not "blow up". Whilst the logic may seem sound, it has been observed that the rule may sometimes fail and cannot be subjected to the same analysis as the ψ-based methods, e.g., the noise restricted analysis induced by (2.17). Thus, it is difficult to garner an understanding of when the method works and when it does not. Recently, however, a ψ-functional rule based on the L-curve method was developed [91] for which a convergence analysis consistent with the other ψ-based methods is possible; see Section 2.1.1 for details.

Figure 1.5: In this figure, we see a typical example of how an L-curve plot should look. The idea is that the graph of (‖Ax_α^δ − y^δ‖, ‖x_α^δ‖) should exhibit an "L" shape and the parameter is selected as the "corner point".

We will not go into detail on the other methods, since the focus of this thesis is on the ψ-functional based methods, but the V-curve essentially chooses the parameter as the minimiser of the speed of the parameterisation of the L-curve. The Q-curve, on the other hand, consists of plotting the graph of (log⟨p^{II}_{α,δ}, p_α^δ⟩, log ψ_QO(α, y^δ)) and choosing the point of maximum curvature, similarly as we do for the aforementioned L-curve rule. Whilst the Q-curve also exhibits an "L" shaped curve, in spite of its name (which is derived from its incorporation of the QO functional rather than the general shape of its plots), the U-curve, in principle, should actually exhibit a "U" shaped plot. We would plot (α, U(α)), where

U(α) := 1/‖p_α^δ‖ + 1/‖x_α^δ‖,

and we would select the parameter α_* as the one which corresponds to the local maximum on the left-hand side of the graph. Indeed, the "U" shape arises from the observation that U(α) → ∞ both as α → 0 and as α → ∞ (cf. [93]).
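A rudimentary discrete version of the L-curve rule is sketched below (illustrative only; a simple finite-difference estimate of the curvature replaces the more careful corner-detection schemes used in practice): the residual and solution norms are logged over a grid of α values and the grid point of maximum curvature is returned.

```python
import numpy as np

def l_curve_corner(A, y_delta, alphas):
    """Pick alpha at the point of maximum discrete curvature of (log||p||, log||x||)."""
    n = A.shape[1]
    u, v = [], []
    for a in alphas:
        x = np.linalg.solve(A.T @ A + a * np.eye(n), A.T @ y_delta)
        u.append(np.log(np.linalg.norm(y_delta - A @ x)))   # log residual norm
        v.append(np.log(np.linalg.norm(x)))                 # log solution norm
    u, v = np.asarray(u), np.asarray(v)
    du, dv = np.gradient(u), np.gradient(v)
    ddu, ddv = np.gradient(du), np.gradient(dv)
    kappa = np.abs(du * ddv - dv * ddu) / (du ** 2 + dv ** 2) ** 1.5
    return alphas[1 + np.argmax(kappa[1:-1])]               # ignore the two endpoints
```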


Part I

Theory


Chapter 2

Linear Tikhonov Regularisation

In this chapter, we provide a theoretical analysis of Tikhonov regularisation,which the observant reader will recall was introduced in Example 1 of Chap-ter 1. This form of regularisation was introduced by its namesake, Tikhonov(cf. [145, 146]), in its variational form (see (2.4) below) in order to approx-imate the solution of ill-posed Fredholm integral equations. However, sinceits inception, Tikhonov regularisation has been thoroughly analysed via itsspectral representation (see (2.1) below) for linear ill-posed problems (cf. [35]and the references therein). We therefore devote the first section of this chap-ter as an exposition of the aforementioned classical theory with relevance tothe heuristic parameter choice rules. There are then a further two sectionswhich provide an expansion of the classical theory to particular deviations ofthe typical framework for ill-posed problems: when the noise in unboundedin the usual norm and when the operator is perturbed, respectively.As mentioned, this method is sometimes also referred to as Tikhonov-Phillipsregularisation as Phillips [128] suggested it around the same time. Furtherreferences include [49,147] and those therein.A scan of the Table of Contents would reveal that the proceeding chapter isalso on Tikhonov regularisation, but that covers the relatively more recentapplication to so-called convex regularisation problems.

2.1 Classical Theory

Now we delve into the classical convergence theory for Tikhonov regulari-sation based on error estimates. The functional calculus for operators willfeature heavily in the analysis and we therefore refer the unacquainted readerto Appendix A. Otherwise, we proceed.Let A : X → Y be a continuous linear operator between two Hilbert spaces.The idea of Tikhonov regularisation is to shift the spectrum of A∗A awayfrom the singularity (at zero), i.e., we estimate (1.7) by

xδα = (A∗A+ αI)−1A∗yδ, (2.1)

31

Page 34: Linear and Nonlinear Heuristic Regularisation for Ill ...

with α ∈ (0, αmax) (cf. [145,146]).In this case, we recall from Example 1, that the associated filter function isgiven by

gα(λ) =1

λ+ α, (2.2)

and the filter function for its associated residual is given by

rα(λ) =α

λ+ α. (2.3)

The equation (2.1) may also be written in the variational form [35]:

Proposition 5. Let xδα be defined by (2.1). Then

xδα = argminx∈X

‖Ax− yδ‖2 + α‖x‖2, (2.4)

for all α ∈ (0, αmax) and yδ ∈ Y .

Proof. From the optimality condition, it is clear that xδα satisfies

0 = A∗(Axδα − yδ) + αxδα = (A∗A+ αI)xδα − A∗yδ

⇐⇒ (A∗A+ αI)xδα = A∗yδ,

from which the result follows trivially.

In the latter sections, we will see that the form (2.4) lends itself better forgeneralisation than the spectral formulation (2.1) (cf. Chapter 3).Now we are ready to prove the following error estimates which will also beutilised for subsequent results [35]:

Proposition 6. There exist positive constants such that

‖xδα − xα‖ ≤ Cδ√α, ‖xα − x†‖ = O(1), as α→ 0,

‖A(xδα − xα)‖ ≤ Cδ, ‖Axα − y‖ ≤ C√α,

for all y, yδ ∈ Y and α ∈ (0, αmax)

Proof. First we estimate the data propagation error

‖xδα − xα‖2 = ‖(AA∗ + αI)−1A∗(yδ − y)‖2

= ‖A∗(AA∗ + αI)−1(yδ − y)‖2

= ‖(AA∗)12 (AA∗ + αI)−1(yδ − y)‖2

=

∫ ∞0

λ

(λ+ α)2d‖Fλ(yδ − y)‖2

≤ C1

α

∫ ∞0

d‖Fλ(yδ − y)‖2,

32

Page 35: Linear and Nonlinear Heuristic Regularisation for Ill ...

sinceλ

(λ+ α)2=

λ

λ+ α· 1

λ+ α≤ 1

α.

Thus, in this case,

‖xδα − xα‖ ≤ B(α) = Cδ√α.

The approximation error is estimated as

‖xα − x†‖2 = ‖[(A∗A+ αI)−1A∗A− I]x†‖2 = ‖α(A∗A+ αI)−1x†‖2

=

∫ ∞0

α2

(λ+ α)2d‖Eλx†‖2.

In particular, we prove similarly as in [35] and Theorem 3, by observing thatsince the integrand, i.e., the residual function, is bounded from above, wemay apply the dominated convergence theorem identically as before to get

limα→0

∫ ∞0

α2

(λ+ α)2d‖Eλx†‖2 =

∫ ∞0

limα→0

α2

(λ+ α)2d‖Eλx†‖2,

and since the limit tends to zero for all λ > 0 and to 1 for λ = 0, the resultfollows as in the proof of Theorem 3.Now, we estimate the data discrepancy as

‖A(xδα − xα)‖2 = ‖A(A∗A+ αI)−1A∗(yδ − y)‖2

= ‖(AA∗ + αI)−1AA∗(yδ − y)‖2

=

∫ ∞0

λ2

(λ+ α)2d‖Fλ(yδ − y)‖2 ≤ Cδ2.

Finally, we may bound the exact discrepancy in similar fashion

‖Axα − y‖2 = ‖A(xα − x†)‖2 = ‖αA(A∗A+ αI)−1x†‖2

=

∫ ∞0

α2λ

(λ+ α)2d‖Eλx†‖2 ≤ Cα,

thereby completing the proof.

The above estimates cater for a simple proof of the following convergenceresult for Tikhonov regularisation [35]:

Theorem 4. Let α = α(δ) be chosen such that α → 0 and δ/√α → 0 as

δ → 0. Then we have‖xδα − x†‖ → 0,

as δ → 0.

33

Page 36: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. From the estimates in the previous proposition, we can estimate thetotal error as

‖xδα − x†‖ ≤ Cδ√α

+ O(1),

for all α ∈ (0, αmax). Therefore, choosing α = α(δ) such that δ/√α→ 0 and

α→ 0 as δ → 0 yields convergence.

Due to Proposition 1, convergence in Theorem 4 may be arbitrarily slow. Weremind the reader, however, that Proposition 1 (on arbitrarily slow conver-gence) is not just valid for Tikhonov regularisation, but for all regularisationmethods.In any case, to prove convergence rates, we postulate the following Holdertype source condition:

x† ∈ range(A∗A)µ, µ > 0, (2.5)

i.e., there exists an ω such that x† = (A∗A)µω. Thence, we obtain thefollowing estimates [35]:

Proposition 7. Let (2.5) hold. Then there exists constants such that

‖xα − x†‖ ≤ Cαµ, for µ ∈ [0, 1), (2.6)

‖Axα − y‖ ≤ Cαµ+ 12 , for µ ∈

[0,

1

2

), (2.7)

for all α ∈ (0, αmax) and y ∈ Y .

Proof. We have

‖xα − x†‖2 = ‖α(A∗A+ αI)−1x†‖2 = ‖α(A∗A+ αI)−1(A∗A)µω‖2

=

∫ ∞0

α2λ2µ

(λ+ α)2d‖Eλω‖2 ≤ sup

λ∈(0,∞)

(α2λ2µ

(λ+ α)2

)∫ ∞0

d‖Eλω‖2

= (µ− 1)2

(µα

1− µ

)2µ

‖ω‖2 = Cα2µ,

for µ ∈ [0, 1), and

‖Axα − y‖2 = ‖αA(A∗A+ αI)−1(A∗A)µω‖2

=

∫ ∞0

α2λ2µ+1

(λ+ α)2d‖Eλω‖2 ≤ sup

λ∈(0,∞)

(α2λ2µ+1

(λ+ α)2

)∫ ∞0

d‖Eλω‖2

=1

(α + 2µα

1− 2µ

)2µ

(1− 4µ2)‖ω‖2 = Cα2µ+1,

for µ ∈ [0, 12), which yields the desired estimates.

34

Page 37: Linear and Nonlinear Heuristic Regularisation for Ill ...

In light of the above, we are now able to give the convergence rates result [35]:

Corollary 1. Let (2.5) hold with µ ∈ [0, 1). Then, choosing α = α(δ) =

αopt := δ2

2µ+1 yields that

‖xδα − x†‖ = O(δ

2µ2µ+1

), (2.8)

as δ → 0.

Proof. Recall the estimates (1.15) and (1.16). Then it follows from the datapropagation error estimate of Proposition 6 and (2.6), that computing

‖xδα − x†‖ ≤ infα∈(0,αmax)

(C

δ√α

+ Cαµ)

yields the desired estimate, with α = αopt = δ2

2µ+1 .

The estimate (2.8) is known as the optimal order of convergence (cf. [35]).From the estimates above, we may observe that Tikhonov regularisationexhibits the well-known saturation effect in which the convergence rates donot improve for µ ≥ 1 and in particular, why we do not assume a sourcecondition (2.5) with µ ≥ 1. In general, the saturation effect describes thebehaviour of regularisation methods for which (2.8) does not hold for all µ > 0for only up to some finite qualification index µ0. For Tikhonov regularisation,we observe that the qualification is µ0 = 1 (cf. [35]). Note that there alsoexist generalisations of the source condition above, cf. [110,111].One can also define an iterative regularisation scheme, as in Chapter 1and (1.22), with the Tikhonov regularisation operator, known as iteratedTikhonov regularisation (cf. [57]). This is given by the following expression:

xδα,n := (A∗A+ αI)−1(A∗yδ + αxδα,n−1), xδα,0 := 0. (2.9)

It is also possible to express (2.9) in its variational form:

xδα,n = argminx∈X

‖Ax− yδ‖2 + α‖x− xδαn−1‖2.

In this case, one may consider regularisation for α→ 0 with a fixed n ∈ N andthat is indeed usually the case. However, one may also fix an α ∈ (0, αmax)and consider (2.9) as an iterative regularisation method for n→∞ [35]. Therespective filter and residual functions may be written in the form:

gα,n(λ) =(λ+ α)n − αn

λ(λ+ α)n,

rα,n(λ) =

λ+ α

)n,

35

Page 38: Linear and Nonlinear Heuristic Regularisation for Ill ...

which, we note, is a rather convenient form for the residual function.Of particular interest to us (for usage in several parameter choice rules) isthe second Tikhonov iterate, which we denote by (cf. (1.22) in Chapter 1):

xIIα,δ = (A∗A+ αI)−1(A∗yδ + αxδα), (2.10)

which is simple to compute and follows from plugging n = 2 into (2.9). Onemay also expand the expression in (2.10) to get

xIIα,δ = (A∗A+ αI)−1A∗[yδ + (yδ − Axδα)],

where we observe that the iterates are essentially “adding back the noise”;that is, we add yδ − Axδα to the initial data yδ (and then regularise). Notethat the filter function associated with the second Tikhonov iterate is givenby

gIIα (λ) :=λ+ 2α

(λ+ α)2,

and

rIIα (λ) :=

λ+ α

)2

is then the respective filter function for the associated residual.

Remark. The qualification for the second Tikhonov iterate is µ0 = 2 andin general, for iterated Tikhonov regularisation, (1.20) holds for Holder typesource conditions a la (2.5) with qualification µ0 = n.

2.1.1 Heuristic Parameter Choice Rules

For Tikhonov regularisation, we may consider the “classical” ψ-functionalbased heuristic parameter choice rules, defined in Definition 7 of Chapter 1,as the minimisers of (1.34), with

Φj,kα (λ) =

αk−j−1λj

(λ+ α)k. (2.11)

In particular, from the definitions of gα (2.2) and rα (2.3), for Tikhonovregularisation, we have the following: if k = 2 and j = 0, then this definesthe heuristic discrepancy rule; k = 3 and j = 0 defines the Hanke-Rausrule; k = 4 and j = 1 defines the quasi-optimality rule, and all of theaforementioned fall into the so-called R1 family of rules (cf. [133]). Anotherrule, which does not fall into the R1 family, is the simplified L-curve rule,which is defined by taking k = 3 and j = 1. In addition to the aforementionedR1-rules, there is the greater family of R*-rules [54]. We may also expressthis in tabular form for he benefit of the reader:

36

Page 39: Linear and Nonlinear Heuristic Regularisation for Ill ...

k = 2 k = 3 k = 4j = 0 HD HRj = 1 L QO

Or more explicitly:

Φα(λ) =α

(λ+ α)2, (HD)

Φα(λ) =α2

(λ+ α)3, (HR)

Φα(λ) =αλ

(λ+ α)3, (L)

Φα(λ) =α2λ

(λ+ α)4. (QO)

Relatively recent results on the quasi-optimality rule may be found in [7, 8].Note that for Tikhonov regularisation, the simple-L functional may be writ-ten in the following form [91]:

ψL(α, yδ) = −⟨xδα, α

∂αxδα

⟩. (2.12)

Naturally, it follows that the respective ratio rule may be expressed as

ψLR(α, yδ) = −⟨xδα, α

∂∂αxδα⟩

‖xδα‖2, (2.13)

although, as mentioned previously, we will generally focus more on the non-ratio rules. An a-posteriori version of this rule was also proposed in [91],namely,

α∗ = sup

α ∈ (0, αmax) | −α

⟨xδα, α

∂αxδα

⟩≤ τδ

,

for an appropriately chosen τ ≥ 1, although this rule is yet to have beeninvestigated.

The Simple-L Rules

Note that the simple-L and simple-L ratio rules are relatively new devel-opments (cf. [91]) which drew inspiration from the more classical L-curvemethod (cf. [60, 62, 64, 98]) in which one plots the graph of (log(‖Axδα −yδ‖2), log(‖xδα‖2)) and selects the parameter as the maximiser of the curva-ture of the so-called L-graph, i.e., the curve

α 7→(κ(α)χ(α)

),

37

Page 40: Linear and Nonlinear Heuristic Regularisation for Ill ...

where κ(α) := log(‖Axδα − yδ‖2) and χ(α) := log(‖xδα‖2). In essence, thiscorresponds to selecting

α∗ = argmaxα∈(0,αmax)

θ(α),

where θ : (0, αmax)→ R ∪ ∞ denotes the signed curvature defined as [60]

θ(α) :=χ′′(α)κ′(α)− χ′(α)κ′′(α)

(χ′(α)2 + κ′(α)2)32

. (2.14)

In particular, for Tikhonov regularisation, (2.14) may be simplified to anexpression devoid of second derivatives; namely, if we set

%(α) := ‖Axδα − yδ‖2 and υ(α) := ‖xδα‖2, (2.15)

and observing that we can write

υ′(α) =∂

∂α‖(A∗A+ αI)−1A∗yδ‖2 =

∂α

∫ ∞0

1

(λ+ α)2d‖EλA∗yδ‖2

=

∫ ∞0

∂α

λ

(λ+ α)2d‖Fλyδ‖2 = −2

∫ ∞0

λ

(λ+ α)3d‖Fλyδ‖2,

and similarly again,

%′(α) =

∫ ∞0

∂α

α2

(λ+ α)2d‖Fλyδ‖2 = 2

∫ ∞0

αλ

(λ+ α)3d‖Fλyδ‖2,

we have the identity%′(α) = −αυ′(α).

Moreover, we may define the so-called logarithmic derivatives of % and υ by

κ′′(α) =∂

∂α

%′

%=%′′%− (%′)2

%2and χ′′(α) =

∂α

υ′

υ=υ′′υ − (υ′)2

υ2.

Furthermore,

%′′ =∂

∂α(−αυ′) = −υ′ − αυ′′,

and we may use all of the above to rewrite (2.14) as

θ(α) =υ(α)%(α)

|υ′(α)|%(α)υ(α) + αυ′(α)%(α) + α2υ′(α)υ(α)

(%(α)2 + α2υ(α)2)32

, (2.16)

as was observed in [55, 63, 91, 151]. The described L-curve method is whatmay be called a “graphical” method and cannot be analysed in the samemanner as the ψ-based methods. Note that other related methods include

38

Page 41: Linear and Nonlinear Heuristic Regularisation for Ill ...

the so-called V-curve [40], U-curve [93, 94] and more recently, Q-curve [135](see Section 1.4).Indeed, the V-curve selects the parameter similarly to the L-curve rule byminimising the function

θV (α) =

∥∥∥∥(α ∂∂ακ(α)

α ∂∂αχ(α)

)∥∥∥∥ .Preceding the ψL method was Reginka’s rule (cf. [136]) which composed ofminimising the functional:

ψR(α, yδ) = ‖xδα‖τ‖Axδα − yδ‖. (τ > 0)

Indeed, one can also find in [35, Proposition 4.37] the result that α∗ =argminψR(α, yδ) if and only if α∗ = argmax θ(α). However, the choice ofτ in ψR is somewhat of a cumbersome dilemma. Moreover, it was also ob-served in [83] that Reginska’s method is not subject to the same analysis asthe other ψ-based methods (e.g. the HD, HR or QO rules) in the sense thatψR is neither subadditive, nor can it satisfy a noise restriction a la (2.17);thus the motivation for the “new” L functionals, which are also motivatedby the following proposition [91]:

Proposition 8. We have

θ(α) =υ(α)

α|υ′(α)|C1(ζ)− C2(ζ), ζ :=

%(α)

αυ(α),

where C1, C2 are positive constants depending on ζ and satisfy

0 ≤ C1(ζ) ≤ 2

3√

3, and 0 ≤ C2(ζ) ≤ 1√

2,

for all α ∈ (0, αmax).

Proof. The expression (2.16) can easily be rewritten as (2.15) with

c1(ζ) =ζ2

(ζ2 + 1)32

, c2(ζ) =ζ(1 + ζ)

(ζ2 + 1)32

.

By elementary calculus, we may find the maxima for c1 at ζ =√

2 and forc2 at ζ = 1 yielding the upper bounds.

Upon observation of the expression in Proposition 8, and the realisationthat the function α 7→ υ(α)/α|υ′(α)| is the sole contributer to large valuesof θ (recalling that the L-curve method maximises θ), it follows that onemay in effect “simplify” the task of computing α∗ by instead minimising thereciprocal of the aforementioned function: the direct result is the simple-Lratio rule. The simple-L rule is then derived from the observation of theprevious one being a ratio rule.

39

Page 42: Linear and Nonlinear Heuristic Regularisation for Ill ...

Convergence Analysis

Arguably the most important part of this section is the convergence analysisof the mentioned rules. For this, we first prove some more preliminary boundsbefore finally providing theorems which give convergence rates for the totalerror with respect to each heuristic parameter choice rule.First we show that if the parameter α is selected according to a heuristicparameter choice rule, then it may be estimated from above by the corre-sponding ψ-functional:

Proposition 9. Letα∗ = argmin

α∈(0,αmax)

ψ(α, yδ),

with ψ ∈ ψHD, ψHR, ψQO, ψL. Then

α∗ ≤

Cψ2(α, yδ), if ψ = ψHD and ∃ C1, C2 > 0 s.t. ‖yδ‖ ≥ C,

or ψ = ψL and ∃ C s.t. ‖A∗yδ‖ ≥ C,

Cψ(α, yδ), if ψ = ψHR and ∃C s.t. ‖yδ‖ ≥ C,

or ψ = ψQO and ∃ C > 0 s.t. ‖A∗yδ‖ ≥ C,

for all yδ ∈ Y .

Proof. Note that we may write

ψ(α, yδ) =

√α‖(AA∗ + αI)−1yδ‖, if ψ = ψHD,

α‖(AA∗ + αI)−32yδ‖, if ψ = ψHR,

α‖(A∗A+ αI)−2A∗yδ‖, if ψ = ψQO,√α‖(A∗A+ αI)−

32A∗yδ‖, if ψ = ψL.

Now, notice that for s ≥ 0, we have

‖(AA∗ + αI)s(AA∗ + αI)−syδ‖ ≤ ‖(AA∗ + αI)s‖‖(AA∗ + αI)−syδ‖,

i.e.,

‖(AA∗ + αI)−syδ‖ ≥ ‖yδ‖‖(AA∗ + αI)s‖

≥ C

Cs,

where ‖(AA∗ + αI)s‖ ≤ Cs. Then the result subsequently follows for theheuristic discrepancy and Hanke-Raus functionals with s = 1 and s = 3

2,

respectively.The estimate for the quasi-optimality and simple-L functionals follow simi-larly, as

ψQO/L(α, yδ) ≥ αt‖A∗yδ‖

‖(A∗A+ αI)s‖≥ C

Cs,

(s ∈

3

2, 2

, t ∈

1

2, 1

)with ‖(AA∗ + αI)s‖ ≤ Cs.

40

Page 43: Linear and Nonlinear Heuristic Regularisation for Ill ...

Corollary 2. Let α∗ be selected as in Proposition 9. Then α∗ → 0 as δ → 0.

Proof. It follows from Propositions 9 and 4 that

α∗ ≤ ψs(α∗, yδ) ≤ ψs(α, yδ)→ 0, s ∈ 1, 2

as δ → 0.

Noise Restriction Due to the ever ominous Bakushinkii veto (cf. Propo-sition 3), we would like to restrict the set of admissible noise in order tosatisfy the auto-regularisation condition (1.38), thereby proving convergenceof the method. In particular, whilst the former condition is quite abstract,a weaker and incidentally more insightful condition was given; namely theso-called Muckenhoupt condition, which requires that the noise belongs tothe following set:

Np :=

e ∈ Y | αp

∫ ∞α

λ−1 d‖Fλe‖2 ≤ C

∫ α

0

λp−1 d‖Fλe‖2 ∀α ∈ (0, αmax)

.

(2.17)Notice that for (2.17) to hold, there should be sufficiently many high fre-quency components (for the right-hand side to dominate). Therefore, thiscondition tells us that the noise should be sufficiently irregular; i.e., verysmooth perturbations of the noise do not satisfy the Muckenhoupt condi-tion. The observant reader might recall that this is the “irony” we referredto in the description of degrees of ill-posedness in Chapter 1.This yields the following proposition (cf. [83, 87]):

Proposition 10. Let

e = y − yδ ∈

N1, if ψ ∈ ψHD, ψHR,N2, if ψ ∈ ψQO, ψL.

Then‖xδα − xα‖ ≤ ψ(α, y − yδ), (2.18)

for all α ∈ (0, αmax) and y, yδ ∈ Y .

Proof. Recall that the data propagation error may be represented in its spec-tral form by

‖xδα − xα‖2 =

∫ ∞0

λ

(λ+ α)2d‖Fλ(y − yδ)‖2. (2.19)

The idea of the proof of this theorem is to split the above integral intotwo parts: λ ∈ (0, α) and λ ∈ (α,∞) which will allow us to utilise theMuckenhoupt condition (2.17) above to either achieve an appropriate bound

41

Page 44: Linear and Nonlinear Heuristic Regularisation for Ill ...

for the data propagation error from above and then to subsequently connectit with a bound for the ψ-functional acting on the noise from below.Since ∫ ∞

α

λ

(λ+ α)2d‖Fλ(y − yδ)‖2 ≤ C

∫ ∞α

λ−1 d‖Fλ(y − yδ)‖2, (2.20)

and ∫ α

0

λ

(λ+ α)2d‖Fλ(y − yδ)‖2 ≤ C

1

α2

∫ α

0

λ d‖Fλ(y − yδ)‖2 (2.21)

≤ C1

α

∫ α

0

d‖Fλ(y − yδ)‖2, (2.22)

it follows from the above estimates that (2.19) may be bounded as

‖xδα − xα‖2 ≤ C

(1

α2

∫ α

0

λ d‖Fλ(y − yδ)‖2 +

∫ ∞α

λ−1 d‖Fλ(y − yδ)‖2

)≤ C

(1

α

∫ α

0

d‖Fλ(y − yδ)‖2 +

∫ ∞α

λ−1 d‖Fλ(y − yδ)‖2

)Recall the Muckenhoupt condition (2.17), which allows us to bound the sec-ond term of the inequality above with p = 1 and p = 2, respectively, yielding

∫ ∞α

λ−1 d‖Fλ(y − yδ)‖2 ≤ C

1

α

∫ α

0

d‖Fλ(y − yδ)‖2, if p = 1,

1

α2

∫ α

0

λ d‖Fλ(y − yδ)‖2, if p = 2.

(2.23)

In case ψ ∈ ψQO, ψL, we observe, similarly as in [83,87,91], that

ψ2QO(α, y − yδ) ≥

∫ α

0

α2λ

(λ+ α)4d‖FλQ(y − yδ)‖2

≥ C

∫ α

0

λ

(λ+ α)2d‖FλQ(y − yδ)‖2 ≥ C

1

α2

∫ α

0

λ d‖FλQ(y − yδ)‖2,

and

ψ2L(α, y − yδ) ≥

∫ α

0

αλ

(λ+ α)3d‖Fλ(y − yδ)‖2 ≥ C

∫ α

0

λ

α2d‖Fλ(y − yδ)‖2.

so that with (2.23), where p = 2, we achieve the desired estimate.In case ψ ∈ ψHD, ψHR, we follow the example of [83] and find that

ψ2HD(α, y − yδ) ≥

∫ α

0

α

(λ+ α)2d‖Fλ(y − yδ)‖2 ≥ C

α

∫ α

0

d‖Fλ(y − yδ)‖2

42

Page 45: Linear and Nonlinear Heuristic Regularisation for Ill ...

and

ψ2HR(α, y − yδ) ≥

∫ α

0

α2

(λ+ α)3d‖Fλ(y − yδ)‖2 ≥ 1

α

∫ α

0

d‖Fλ(y − yδ)‖2.

Then similarly as for the two previous parameter choice rules, we use (2.23)with p = 1 to complete the proof.

We also bound the ψ-functional acting on the noise from above (cf. [83,87]):

Proposition 11. We have

ψ(α, y − yδ) ≤ C

δ√α, if ψ ∈ ψHD, ψHR

‖xδα − xα‖, if ψ ∈ ψQO, ψL,

for all α ∈ (0, αmax) and y, yδ ∈ Y .

Proof. For ψ = ψHD, one can immediately estimate that

ψ2HD(α, y − yδ) =

1

α

∫ ∞0

α2

(λ+ α)2d‖Fλ(y − yδ)‖2

≤ 1

α

∫ ∞0

d‖Fλ(y − yδ)‖2 ≤ δ2

α.

If ψ = ψHR, then

ψ2HR(α, y − yδ) =

∫ ∞0

α2

(λ+ α)3d‖Fλ(y − yδ)‖2

≤∫ ∞

0

1

λ+ αd‖Fλ(y − yδ)‖2 ≤ C

δ2

α.

For ψ = ψQO, we have

ψ2QO(α, y − yδ) =

∫ ∞0

α2λ

(λ+ α)4d‖Fλ(y − yδ)‖2

=

∫ ∞0

α2

(λ+ α)2· λ

(λ+ α)2d‖Fλ(y − yδ)‖2

≤∫ ∞

0

λ

(λ+ α)2d‖Fλ(y − yδ)‖2 = ‖xδα − xα‖2,

and for ψ = ψL, the estimate follows similarly as for the quasi-optimalityfunctional, since the filter function for the simple-L rule satisfies

Φα(λ) =αλ

(α + λ)3≤ λ

(α + λ)2,

thus completing the proof.

43

Page 46: Linear and Nonlinear Heuristic Regularisation for Ill ...

Now, the next proposition proves that the ψ-functionals may be boundedfrom above by the approximation error or, at least, a similar estimate:

Proposition 12. If ψ ∈ ψHR, ψQO, then we have

ψ(α, y) ≤ C‖xα − x†‖. (2.24)

If ψ ∈ ψHD, ψL and (2.5) holds, then

ψ(α, y) ≤

√∫ ∞0

α

λ+ αd‖Eλx†‖2 ≤ Cαµ,

(µ ≤ 1

2

)for all α ∈ (0, αmax) and y ∈ Y .

Proof. Let ψ = ψHD. Then

ψ2HD(α, y) =

∫ ∞0

αλ

(λ+ α)2d‖Eλx†‖2 =

∫ ∞0

λ

λ+ α

α

λ+ αd‖Eλx†‖2

≤∫ ∞

0

α

λ+ αd‖Eλx†‖2,

from which the result follows thanks to (2.5), courtesy of the fact that λλ+α≤

1 for all λ ≥ 0. Note that the rest of the estimates in this proof also followfrom the aforementioned upper bound.For ψ = ψHR, it easily follows that

ψ2HR(α, y) =

∫ ∞0

α2λ

(λ+ α)3d‖Eλx†‖2 ≤

∫ ∞0

α2

(λ+ α)2

λ

λ+ αd‖Eλx†‖2

≤ ‖xα − x†‖2.

If ψ = ψQO, then it is immediate that

ψ2QO(α, y) =

∫ ∞0

α2λ2

(λ+ α)4d‖Eλx†‖2 =

∫ ∞0

α2

(λ+ α)2

λ2

(λ+ α)2d‖Eλx†‖2

≤ ‖xα − x†‖2.

For ψ = ψL, it follows easily that

ψ2L(α, y) =

∫ ∞0

αλ2

(λ+ α)3d‖Eλx†‖2 ≤

∫ ∞0

α

λ+ αd‖Eλx†‖2,

and this expression is the order of αµ (for µ ≤ 12) which follows similarly via

(1.20), and this completes the proof, as the remaining details are identical asfor the HD rule.

44

Page 47: Linear and Nonlinear Heuristic Regularisation for Ill ...

Remark. We remark that in order to bound the HD functional from aboveby the approximation error, as one can do for the HR and QO rules, thefollowing regularity condition was considered in [83]:∫ ∞

α

λα2

(λ+ α)2d‖Eλx†‖2 ≤ Cα

∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2, (2.25)

for all α ∈ (0, αmax), with which one can prove:

ψ2HD(α, y) =

(∫ α

0

+

∫ ∞α

)αλ

(λ+ α)2d‖Eλx†‖2

≤∫ α

0

α2

(λ+ α)2d‖Eλx†‖2 +

∫ ∞α

α2

(λ+ α)2

λ

αd‖Eλx†‖2

(2.25)

≤∫ α

0

α2

(λ+ α)2d‖Eλx†‖2 + C

∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2

≤ max1, C‖xα − x†‖2.

However, the inequality (2.25) is very restrictive as it is rarely satisfied inpractice, so we will not consider it in general.

In the absence of any source conditions, we now have all the tools to provethat when the parameter is selected according to a heuristic rule, the corre-sponding total error converges as the noise level decays to zero:

Theorem 5. Letα∗ = argmin

α∈(0,αmax)

ψ(α, yδ),

with ψ ∈ ψHD, ψHR, ψQO, ψL and let

y − yδ ∈

N1, if ψ ∈ ψHD, ψHR,N2, if ψ ∈ ψQO, ψL,

and suppose ‖yδ‖ 6= 0 in case ψ ∈ ψHD, ψHR, and ‖A∗yδ‖ 6= 0 in caseψ ∈ ψQO, ψL. Then

‖xδα∗ − x†‖ → 0,

as δ → 0.

Proof. We have

‖xδα∗ − x†‖ ≤ ‖xδα∗ − xα∗‖+ ‖xα∗ − x†‖

(2.18)= O

(ψ(α∗, y − yδ) + ‖xα∗ − x†‖

)(1.39)= O

(ψ(α, yδ) + ψ(α∗, y) + ‖xα∗ − x†‖

),

where we have used the optimality of α∗; namely, that ψ(α∗, yδ) ≤ ψ(α, yδ)

for all α ∈ (0, αmax) (cf. (1.32)).

45

Page 48: Linear and Nonlinear Heuristic Regularisation for Ill ...

If ψ ∈ ψQO, ψL, then it follows from Proposition 12 that

‖xδα∗ − x†‖ = O

(ψQO/L(α, yδ) + ‖xα∗ − x†‖

)= O

(ψQO/L(α, y − yδ) + ψQO/L(α, y) + ‖xα∗ − x†‖

)= O

(‖xδα − xα‖+ ‖xα − x†‖+ φ(α∗)

),

with a function φ satisfying φ(α)→ 0 as α→ 0 (cf. (1.21)), and since α∗ → 0due to Corollary 2, it follows that choosing any α = α(δ) such that α → 0and δ2/α→ 0 as δ → 0 yields the desired result.If ψ = ψHD, it also follows from Propositions 11 and 12, that

‖xδα∗ − x†‖ = O

(δ√α

+ α + α∗

),

and the result follows selecting α as before (cf. Corollary 2).

If we now consider the source condition (2.5), then, in addition to being ableto prove convergence of the error, we may also prove (suboptimal) conver-gence rates. However, before doing so, we first give the following propositionwhich, like Proposition 9, provides an estimate for the ψ-functionals frombelow, this time by the approximation error and in terms of the smoothnessindex µ:

Proposition 13. Let (2.5) hold with µ ∈ [0, 1) and assume ‖A∗Ax†‖ 6= 0.Then

‖xα − x†‖ ≤ C

ψ2µ(α, y), if ψ ∈ ψHD, ψL and µ ≤ 1

2,

ψµ(α, y), if ψ ∈ ψHR, ψQO,

for all α ∈ (0, αmax) and y ∈ Y .

Proof. If ψ = ψL, we can estimate

ψ2L(α, y) =

∫ ∞0

αλ2

(λ+ α)3d‖Eλx†‖2 ≥ α

(‖A‖2 + αmax)3

∫ ∞0

λ2 d‖Eλx†‖2

= C1‖(A∗A)x†‖2α,

and conversely, we have that

‖xα − x†‖ ≤ C2αµ ≤ C2

(1

C1‖(A∗A)x†‖2ψ2

L(α, y)

)µ=

C

‖(A∗A)x†‖2µψ2µ

L (α, y),

If ψ = ψQO, we estimate, analogously,

ψ2QO(α, y) =

∫ ∞0

α2λ2

(λ+ α)4d‖Eλx†‖2 ≥ C‖(A∗A)x†‖2α2,

46

Page 49: Linear and Nonlinear Heuristic Regularisation for Ill ...

by which we similarly arrive at the estimate:

‖xα − x†‖ ≤ Cαµ ≤ CψµQO(α, y).

Note that the proofs for the heuristic discrepancy and Hanke-Raus rule followin exactly the same fashion. The rates for the HD rule match that of theL rule and the Hanke-Raus matches the quasi-optimality rule’s rates in thisinstance due to the order of the α factor in the respective filter functions.

We remark that in the above proposition, we can see that the HD and L rulesboth exhibit the early saturation effect, as the result only holds for µ ≤ 1

2

rather than 1. This also has a “knock-on” effect on the subsequent results,namely, the main convergence rates result of this section, which we providevia the following corollary:

Corollary 3. In addition to the conditions of Theorem 5, let the sourcecondition (2.5) hold with µ ∈ [0, 1). Then

‖xδα∗ − x†‖ =

O(δ

4µ2µ+1

µ), if ψ ∈ ψHD, ψL and µ ≤ 1

2,

O(δ

2µ2µ+1

µ), if ψ ∈ ψHR, ψQO,

as δ → 0.

Proof. Similar as in the proof of the previous theorem, we have

‖xδα∗ − x†‖ = O

(ψ(α, yδ) + ψ(α∗, y) + ‖xα∗ − x†‖

),

which follows from the optimality of α∗, cf. (1.32) and the inequality (1.39).For ψ ∈ ψHD, ψL, by Propositions 11 and 12, we have

‖xδα∗ − x†‖ = O

(δ√α

+ αµ + αµ∗

)with µ ≤ 1

= O(δ

2µ2µ+1 + ψ2µ

HD/L(α, yδ))

= O(δ

4µ2µ+1

µ),

where the second equality follows from the fact that α∗ ≤ ψ2HD/L(α∗, y

δ) with

µ ≤ 12

(cf. Prop. 9).For ψ ∈ ψHR, ψQO, the proofs follow similarly, except that since α∗ ≤ψ(α∗, y

δ) for those rules, we get

‖xδα∗ − x†‖ = O

2µ2µ+1 + ψµHR/QO(α, yδ)

)= O

2µ2µ+1

µ),

thereby completing the proof.

47

Page 50: Linear and Nonlinear Heuristic Regularisation for Ill ...

Remark. Note that the heuristic discrepancy and simple-L rules exhibitdifferent rates to the Hanke-Raus and quasi-optimality rules. Moreover, ifone wanted to prove that, say the HD rule satisfied the sharper bound (2.24),then one would have to require an additional condition, namely (2.25), whichwe have already stated is somewhat restrictive (and thus we do not considerit in general). For certain problems, however, the HD and simple-L rulesare optimal, e.g., whenever µ = 1

2. On the other hand, for µ = 1, the QO

and HR rules yield optimal rates. Thus, for solutions with low smoothness,the HD and L rules in fact offer themselves as the better alternatives. Fornumerical implementations and results of the aforementioned rules, we referthe reader to [9, 53,126].

As stated already, the convergence rates of the previous corollary are subop-timal and in order to prove optimal rates (i.e., rates which match (2.8)), werequire an additional regularity condition [83,87]:

α2

∫ ∞α

λ−2 d‖Eλx†‖2 ≥ C

∫ α

0

d‖Eλx†‖2, (2.26)

for all α ∈ (0, αmax). Note, however, that we also have the weaker condition(cf. [91]): ∫ α

0

d‖Eλx†‖2 ≤ Cα

∫ ∞α

λ−1 d‖Eλx†‖2, (2.27)

which suffices for the HD and simple L-curve rules in the following Lemma 1.Condition (2.27) is the weaker of the above two conditions in the sense that(2.26) implies (2.27).Considering the addition of the latest regularity condition, we now statethe following lemma which allows us to prove that the ψ-functional for theconsidered heuristic parameter choice rules may be bounded from below bythe approximation error [83, 87]:

Lemma 1. For µ ∈ [0, 1), let (2.26) hold for ψ ∈ ψHR, ψQO and (2.27)hold for ψ ∈ ψHD, ψL. Then we have

ψ(α, y) ≥ C‖xα − x†‖,for all y ∈ Y and α ∈ (0, αmax).

Proof. First, we decompose the spectral form of the approximation error asthe sum of two integrals:

‖xα − x†‖2 =

(∫ α

0

+

∫ ∞α

)α2

(λ+ α)2d‖Eλx†‖2.

Now, similarly as in [87], we estimate the second integral of the sum as:∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2 ≥ Cα2

∫ ∞α

λ−2 d‖Eλx†‖2

(2.26)

≥ C

∫ α

0

d‖Eλx†‖2 ≥ C

∫ α

0

α2

(λ+ α)2d‖Eλx†‖2. (2.28)

48

Page 51: Linear and Nonlinear Heuristic Regularisation for Ill ...

Thus, with the above, we obtain an upper bound for the overall approxima-tion error:

‖xα − x†‖2 ≤ C

∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2.

Hence, for all ψ ∈ ψHD, ψHR, ψQO, ψL, it suffices to prove that

ψ2(α, y) ≥∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2.

For ψ = ψQO, we can estimate

ψ2QO(α, y) ≥

∫ ∞α

α2λ2

(λ+ α)2d‖Eλx†‖2 ≥ C

∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2,

and similarly for ψ = ψHR:

ψ2HR(α, y) ≥

∫ ∞α

α2

(λ+ α)3λ d‖Eλx†‖2 ≥

∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2.

Note that for ψ ∈ ψHD, ψL, we may use the weaker condition (2.27). Inparticular, we can bound∫ α

0

α2

(λ+ α)2d‖Eλx†‖2 ≤

∫ α

0

α2

α2d‖Eλx†‖2 ≤

∫ α

0

d‖Eλx†‖2. (2.29)

Thus,

‖xα − x†‖2 ≤∫ α

0

α2

(λ+ α)2d‖Eλx†‖2 + C

∫ ∞α

λ

α

α2

(λ+ α)2d‖Eλx†‖2

≤ C

∫ ∞α

λ

α

α2

(λ+ α)2d‖Eλx†‖2,

where the first inequality follows from the fact that λ/α ≥ 1 for all λ ≥ α,and the second inequality follows from a combination of (2.29) and (2.27).For ψ = ψHD, the proof follows from the observation that

ψ2HD(α, y) ≥

∫ ∞α

αλ

(λ+ α)2d‖Eλx†‖2 ≥

∫ ∞α

α2

(λ+ α)2d‖Eλx†‖2,

whereas for ψ = ψL, we observe that

ψ2L(α, y) ≥

∫ ∞α

αλ2

(λ+ α)3d‖Eλx†‖2 ≥ 1

2

∫ ∞α

α

λd‖Eλx†‖2,

from which the result follows via (2.27).

49

Page 52: Linear and Nonlinear Heuristic Regularisation for Ill ...

Now we state the optimal convergence rates theorem for the heuristic pa-rameter choice rules which, more or less, combines all of the above results inthis section:

Theorem 6. Let the source condition (2.5) hold with µ ∈ [0, 1), with theadditional restriction that µ ≤ 1

2for ψ ∈ ψHD, ψL, and let y − yδ ∈ N1

for ψ = ψHD, ψHR and y − yδ ∈ N2 whenever ψ ∈ ψL, ψQO. Suppose,in addition, that the regularity condition (2.27) holds for ψ ∈ ψHD, ψL andthat (2.26) holds for ψ ∈ ψHR, ψQO. Then

‖xδα∗ − x†‖ = O

2µ2µ+1

),

as δ → 0.

Proof. We first assume that for δ sufficiently small there exists an α ∈(0, αmax) such that

‖xδα − xα‖+ ‖xα − x†‖ = infα∈(0,αmax)

‖xδα − xα‖+ ‖xα − x†‖

.

Assume α ≥ α∗. Then it follows, since α 7→ ‖xα − x†‖ is a monotonicallyincreasing function, that ‖xα∗ − x†‖ ≤ ‖xα − x†‖. Now, we estimate

‖xδα∗ − xα∗‖ = Cψ(α∗, y − yδ) ≤ C(ψ(α, yδ) + ψ(α∗, y)

)≤ C

(ψ(α, y − yδ) + ψ(α, y) + αµ∗

), if ψ ∈ ψHD, ψL, µ ≤

1

2,(

ψ(α, y − yδ) + ψ(α, y) + ‖xα∗ − x†‖), if ψ ∈ ψHR, ψQO.

=

O(

δ√α

+ αµ + αµ∗

), if ψ = ψHD,

O(‖xδα − xα‖+ αµ + αµ∗

), if ψ = ψL,

O(

δ√α

+ ‖xα − x†‖+ ‖xα∗ − x†‖), if ψ = ψHR,

O(‖xδα − xα‖+ ‖xα − x†‖+ ‖xα∗ − x†‖

), if ψ = ψQO.

=

O(

δ√α

+ αµ), if ψ = ψHD,

O(‖xδα − xα‖+ αµ

), if ψ = ψL,

O(

δ√α

+ ‖xα − x†‖), if ψ = ψHR,

O(‖xδα − xα‖+ ‖xα − x†‖

), if ψ = ψQO,

where we have used the estimates for the ψ-functionals from Propositions 11,12, and for α ≥ α∗, we recall ‖xα∗ − x†‖ ≤ ‖xα − x†‖ and αµ∗ ≤ αµ.

50

Page 53: Linear and Nonlinear Heuristic Regularisation for Ill ...

Moreover, continuing with the case that α < α∗:

‖xα∗ − x†‖ ≤ Cψ(α∗, y) ≤ C(ψ(α, yδ) + ψ(α∗, y − yδ)

)

=

O(

δ√α

+ αµ +δ√α∗

), if ψ = ψHD,

O(‖xδα − xα‖+ αµ + ‖xδα∗ − xα∗‖

), if ψ = ψL,

O(

δ√α

+ ‖xα − x†‖+δ√α∗

), if ψ = ψHR,

O(‖xδα − xα‖+ ‖xα − x†‖+ ‖xδα∗ − xα∗‖

), if ψ = ψQO,

=

O(

δ√α

+ αµ), if ψ = ψHD,

O(‖xδα − xα‖+ αµ

), if ψ = ψL,

O(

δ√α

+ ‖xα − x†‖), if ψ = ψHR,

O(‖xδα − xα‖+ ‖xα − x†‖

), if ψ = ψQO,

where we have used the estimate from Lemma 1 which give a bound forthe approximation error by the ψ-functionals , and for α < α∗, since α 7→‖xδα−xα‖ is a monotonically decreasing function, it follows that ‖xδα∗−xα∗‖ ≤‖xδα − xα‖. Thus, for all α ∈ (0, αmax), we have

‖xδα∗ − x†‖ =

O(

δ√α

+ αµ), if ψ = ψHD,

O(‖xδα − xα‖+ αµ

), if ψ = ψL,

O(

δ√α

+ ‖xα − x†‖), if ψ = ψHR,

O(‖xδα − xα‖+ ‖xα − x†‖

), if ψ = ψQO.

= O(δ

2µ2µ+1

), as δ → 0,

which completes the proof.

Remark. One may observe in the proof of the above theorem that the quasi-optimality rule actually satisfies

‖xδα∗ − x†‖ ≤ C inf

α∈(0,αmax)

‖xδα − xα‖+ ‖xα − x†‖

,

which is quite remarkable as the upper bound above is almost the mostoptimal rate that is possible.

We may organise the above four rules into a table in terms of early saturationand required noise restriction as follows:

Early Saturation: µ ≤ 12

No Early Saturation: µ < 1Np with p = 1 HD HRNp with p = 2 L QO

51

Page 54: Linear and Nonlinear Heuristic Regularisation for Ill ...

where the required noise restrictions were demonstrated in Proposition 10and the early saturation effect of the HD and L rules was first observedwhen bounding the approximation error in terms of the smoothness indexµ. Namely, we saw in Proposition 13 that we had to restrict the range ofpossible µ to µ ∈ [0, 1

2). The Hanke-Raus and quasi-optimality rules are thus

a kind of “remedy” to this effect for the heuristic discrepancy and L rules,respectively, as they are closely related. We remark that for a-posterioriparameter choice rules, the discrepancy (1.29) and improved discrepancy(1.40) principles have a similar relationship in that the latter is also the“remedy” for the former (with respect to saturation). Returning to thetopic of heuristic parameter choice rules, we mention that in spite of being“remedies” regarding saturation, the noise restrictions required for the L andquasi-optimality rules are more stringent compared to the somewhat weakerrestriction required for the convergence of the other two rules.

2.2 Weakly Bounded Noise

This section is largely analogous with the paper [89]. In this scenario, wedeviate from the classical theory somewhat and consider the case in whichthe noise is unbounded in the image space, but satisfies another constraintwhich qualifies it to be named as in the title of this section. We describe thisin the following definition:

Definition 8. If the noise e = y − yδ satisfies

τ := ‖(AA∗)pe‖ <∞, with p ∈[0,

1

2

],

then we say that the noise is weakly bounded [29,30]. That is to say that thenoise belongs to the Hilbert space Z, which can be defined as the completionof rangeA with respect to ‖(AA∗)p · ‖ complemented by rangeA⊥ (cf. [89]).

The above noise model may be interpreted as a deterministic treatment of thewhite noise model, which is studied extensively in the stochastic treatmentof ill-posed problems [88].The question of whether one may still regularise, in the determistic sense, inthe presence of this large, i.e., weakly bounded noise can be verified by theproceeding proposition [51]:

Proposition 14. Suppose that e ∈ Z. Then the Tikhonov regularised solu-tion xδα ∈ X is well defined.

Proof. Notice that

‖xδα‖ = ‖A∗(AA∗ + αI)−1yδ‖≤ ‖A∗(AA∗ + αI)−1y‖+ ‖A∗(AA∗ + αI)−1(y − yδ)‖.

52

Page 55: Linear and Nonlinear Heuristic Regularisation for Ill ...

The first term is well defined, thus we proceed to show that the second termis also finite

‖(AA∗)1/2(AA∗ + αI)−1(y − yδ)‖2 =

∫ ∞0

1

(λ+ α)2d‖Fλ(AA∗)1/2(y − yδ)‖2

≤ supλ∈(0,‖A‖2)

1

(λ+ α)2‖(AA∗)1/2(y − yδ)‖2 =

τ 2

α2<∞,

with p = 12, which proves existence, since ‖(A∗A)

12 e‖ ≤ C‖(A∗A)pe‖ for

p ≤ 12.

We would like to replicate the analogous data error estimates as we had incase the noise was bounded in Y . Indeed, the following error estimates arecourtesy of [28,112]:

Proposition 15. There exist constants such that

‖xδα − xα‖ ≤ Cτ

αp+12

and ‖A(xδα − xα)‖ ≤ Cτ

αp, (2.30)

for all y, yδ ∈ Z and α ∈ (0, αmax).

Proof. We have

‖xδα − xα‖2 =

∫ ∞0

λ1−2p

(λ+ α)2λ2p d‖Fλ(yδ − y)‖2

≤ Cp1

α2p+1‖(AA∗)p(yδ − y)‖2 = Cp

τ 2

α2p+1.

Moreover,

‖A(xδα − xα)‖2 =

∫ ∞0

λ2−2p

(λ+ α)2λ2p d‖Fλ(yδ − y)‖2

≤ Cp1

α2p‖(AA∗)p(yδ − y)‖2 = Cp

τ 2

α2p,

where Cp are positive constants depending on p.

The observant reader will notice that the above proposition has omitted ap-proximation error estimates and will realise that as those may be estimatedindependently of the noise, they are identical to the ones we derived in Propo-sition 6. Thus, with the error estimates, one can even prove convergence ofthe Tikhonov regularised solution, even in the weakly bounded noise case:

Theorem 7. Choosing α = α(τ) such that α(τ)→ 0 and τ/α(τ)p+12 → 0 as

τ → 0, we have‖xδα − x†‖ → 0,

as τ → 0.

53

Page 56: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. The estimate for the data propagation error of the previous proposi-tion and the classical estimate for the approximation error allows us to boundthe error as

‖xδα − x†‖ ≤ Cτ

αp+12

+ Cα→ 0, (2.31)

as τ → 0, if we choose α such that τ/αp+12 → 0 and α→ 0 as τ → 0.

In order to prove convergence rates (which, in this instance, will be in termsof τ), we recall from the usual theory that we require a source condition. Tothis end, we assume once more that (2.5) holds [28,30]:

Corollary 4. Let x† satisfy the source condition (2.5) with µ ∈ [0, 1). Then

choosing α = α(τ) = αopt ∼ τ2

2µ+2p+1 , we have

‖xδα − x†‖ = O(τ2µ

2µ+2p+1 ),

as τ → 0.

Proof. If we take the infimum of (2.31) over α, then we get α ∼ τ2

2µ+2p+1 .Indeed, with this a-priori choice of parameter, one gets the convergence rateabove.

What we observe is that the above results are true generalisations of thestandard theory in the sense that setting p = 0 so that rather than e ∈ Z, wehave e ∈ Y , the results are analogous with the standard ones (see Section 2.1).

2.2.1 Modified Parameter Choice Rules

In this setting, the discrepancy is not well defined and for this reason, weintroduce, from [89]

ψMHD(α, yδ) : =1

αq+12

‖(AA∗)q(Axδα − yδ)‖, (2.32)

ψMHR(α, yδ) : =1

αq+12

〈(AA∗)q(AxIIα,δ − yδ), (AA∗)q(Axδα − yδ)〉12 , (2.33)

as the modified heuristic discrepancy and Hanke-Raus rules, respectively,with q ≥ 0, a parameter to be specified. The sharp eyed reader will noticethe lack of a modified quasi-optimality rule; but this rule need not be changedas it lived in the domain space and is therefore well defined in any case. TheL-curve rule has also been omitted, for the same reason, as well as the factthat its analysis was not included in [89] as it was introduced in a laterpaper (cf. [91]), and we also forgo that task of analysing it here in the weakly

54

Page 57: Linear and Nonlinear Heuristic Regularisation for Ill ...

bounded noise case. The linear filter functions (2.11) may be generalised toinclude the modified parameter functionals above; in particular

Φq,mα (λ) :=

αmλ2q

α2q(λ+ α)m+1,

with

q ≥ p and m = 1, if ψ = ψMHD,

q ≥ p and m = 2, if ψ = ψMHR,

q = 12

and m = 3, if ψ = ψQO.

(2.34)

Noise Restriction We may also generalise the Muckenhoupt inequalities,i.e., the set of permissible noise as

Nν :=

e ∈ Z | αν+1

∫ ∞α

λ−1 d‖Fλe‖2 ≤ C

∫ α

0

λν d‖Fλe‖2 ∀ α ∈ (0, αmax)

,

(2.35)where ν = 1 when ψ = ψQO and ν = 2q whenever ψ ∈ ψMHD, ψMHR [2].In the following, we state some examples in which the condition (2.35) holds

Case Study of Noise Restriction Note that in the classical situationof (strongly) bounded noise, it has been verified that (2.35) is satisfied intypical situations [87]. Moreover, for coloured Gaußian noise, (2.35) holdsalmost surely for mildly ill-posed problems [88].Suppose A is compact such that AA∗ has eigenvalues λi with polynomialdecay, and we assume a certain polynomial decay or growth of the noisee = yδ − y with respect to the eigenfunctions of AA∗, denoted by ui:

λi =1

iγ, γ > 0, and |〈yδ − y, ui〉|2 = κ

1

iββ ∈ R, κ > 0. (2.36)

Then

‖yδ − y‖2 = κ

∞∑i=1

1

iβ, τ 2 = ‖(AA∗)p(yδ − y)‖2 = κ

∞∑i=1

1

iβ+2pγ.

If we consider the case of unbounded but weakly bounded noise, i.e., ‖yδ −y‖2 =∞ but τ <∞, then the exponents β, p should thus satisfy

β ≤ 1 and β + 2pγ > 1, thus β ∈ (1− 2pγ, 1].

The inequality in (2.35) can then be written as

καν+1∑

1≤i≤α−1γ

iγ−β = αν+1∑λi≥α

1

λi|〈yδ − y, ui〉|2

≤ C∑λi≤α

λνi |〈yδ − y, ui〉|2 = Cκ∑i≥α−

1

iγν+β.

55

Page 58: Linear and Nonlinear Heuristic Regularisation for Ill ...

Defining N∗ = α−1γ , we have

∑1≤i≤α−

iγ−β ≤∫ N∗

1

xγ−β dx ≤ C

Nγ−β+1∗ if γ − β > −1,

1 if γ − β < −1,

and ∑i≥α−

1

iγν+β∼∫ ∞N∗

1

xγν+βdx ∼

C

Nγν+β−1∗

if γν + β > 1,

∞ if γν + β ≤ 1.

Since α = N−γ∗ , we arrive at the sufficient inequality

N−γ(ν+1)+1+γ−β∗ ≤ CN1−γν−β,

in the case that γ−β > −1 and γν+β > 1. Since the exponents match, thenoise condition is then satisfied. If γν + β ≤ 1, then the inequality is clearlysatisfied because of the divergent right-hand side. Thus, the noise conditionholds for

β < γ + 1.

Roughly speaking, this means that the noise should not be too regular (rela-tive to the smoothing of the operator). In particular, the deterministic modelof white noise, where β = 0 (no decay) satisfies a noise condition if the op-erator is smoothing. Most importantly, the assumption of a noise condition(2.35) is compatible with a weakly bounded noise situation.

Convergence Analysis

The convergence analysis of regularisation methods with standard (non-heuristic) parameter choice rules in the weakly bounded noise setting is wellestablished (cf. [28]). For the analysis of heuristic rules, we follow the exam-ple of [89] and prove the following estimates for the associated functionals:

Lemma 2. Let x† satisfy (2.5) with µ ∈ [0, 1). Then

• If y − yδ ∈ N1, then there exists a positive constant C such that

C‖xδα − xα‖ ≤ ψQO(α, y − yδ) ≤ ‖xδα − xα‖, (2.37)

ψQO(α, y) ≤ ‖xα − x†‖; (2.38)

• if y − yδ ∈ N2q, with q ∈ minp + 1, 12− µ, then there exist positive

constants such that

‖xδα − xα‖ ≤ ψMHD(α, y − yδ) ≤ Cτ

αp+12

, (2.39)

ψMHD(α, y) ≤ Cαµ; (2.40)

56

Page 59: Linear and Nonlinear Heuristic Regularisation for Ill ...

• if y − yδ ∈ N2q and q ≤ minp + 32, 1 − µ, then there exist positive

constants such that

C‖xδα − xα‖ ≤ ψMHR(α, y − yδ) ≤ Cτ

αp+12

, (2.41)

ψMHR(α, y) ≤ Cαµ, (2.42)

for all α ∈ (0, αmax).

Proof. In the following, we utilise the following useful estimate: that fort ≥ 0, there exists a positive constant such that

λt

α + λ≤ C

αmax1−t,0 , (2.43)

for all α, λ ≥ 0.The estimates (2.37),(2.38) do not require the weakly boundedness conditionon the noise and therefore the proofs are nigh on identical to the ones foundin Chapter 2, Propositions 10, 11 and (12).For m > 0 and with (2.43), we obtain

λ2q

α2q

αm

(λ+ α)(m+1)≤ λ2p

α2q−m

2(q−q)m+1

λ+ α

)m+1

≤ Cλ2p 1

α2q−m+max1− 2(q−q)m+1

,0(m+1)= C

λ2p

αmax1+2p,2q−m = Cλ2p

α1+2p,

if q < m+12

+ p. Moreover,

λ2q

α2q

αm

(λ+ α)(m+1)λ1+2µ ≤ 1

α2q−m

1+2µ+2qm+1

λ+ α

)m+1

≤ C1

α2q−m+max1− 1+2µ+2qm+1

,0(m+1)= C

1

αmax−2µ,2q−m ≤ Cα2µ,

if q < m2− µ. Using the spectral representation, the upper estimates (2.39)

(using m = 2) and (2.41) (using m = 3) follow from the first estimate, while(2.40) and (2.42) are obtained similarly by the second one.For the lower bound, we estimate

ψ2MHD(α, y − yδ) =

1

α2q+1

∫ ‖A‖2+

0

λ2q α2

(λ+ α)2d‖Fλ(y − yδ)‖2

≥ C1

α2q+1

∫ α

0

λ2q d‖Fλ(y − yδ)‖2 + C1

α2q−1

∫ ‖A‖2+

α

λ2q−2d‖Fλ(y − yδ)‖2,

(2.44)

57

Page 60: Linear and Nonlinear Heuristic Regularisation for Ill ...

for all α ∈ (0, αmax), since for λ ≤ α, it follows that α/(λ + α) ≥ C and forλ ≥ α, one has α/(λ+ α) ≥ Cα/λ, yielding the estimate above. Conversely,

‖xδα − xα‖2 =

∫ α

0

λ

(λ+ α)2d‖Fλ(y − yδ)‖2

≤ C1

α

∫ α

0

λ

αd‖Fλ(y − yδ)‖2 + C

∫ ‖A‖2+

α

λ−1 d‖Fλ(y − yδ)‖2.

(2.45)

Since 2q − 1 ≤ 0, we observe that the term with∫ α

0in the above inequal-

ity is bounded by the corresponding term in (2.44). Thus, using the noisecondition, the second term can be bounded by the first one of (2.44).For the lower bound of the modified Hanke-Raus functional, we can estimate

λ2q

α2q

α2

(λ+ α)3≥ C

λ2q

α2q+1if λ ≤ α,

λ2q−3

α2q−2if λ ≥ α.

Now, using N2q and (2.45), we can estimate ‖xδα − xα‖ by the part ofψMHR(α, y − yδ) restricted to λ ≤ α. The part for λ ≥ α can then beestimated from below by 0, i.e.,

ψMHR(α, y − yδ) ≥ C1

α2q+1

∫ α

0

λ2q d‖Fλ(y − yδ)‖2

(2.35)

≥ C

∫ ‖A‖2+

α

λ−1 d‖Fλ(y − yδ)‖2,

for all α ∈ (0, αmax).

Note that our parameters satisfy µ ≥ 0 and p ∈ [0, 12], hence, the restrictions

on the parameter q reduce to q ≤ 12− µ or q ≤ 1 − µ for the modified

heuristic discrepancy or Hanke-Raus rules, respectively. Since, by definition,q ≥ p must hold in any case, we have as a restriction to the smoothnessindex that µ ∈ [0, 1

2− p] for ψMHD and µ ∈ [0, 1− p] for ψMHR, respectively.

Only then, there exists a possible choice for q that satisfies the conditionsof the previous lemma. Observe that the interval for µ is smaller for ψMHD,which also illustrates a saturation effect of the discrepancy-based rules whichis well-known in the standard noise case, i.e., p = 0 (see Section 2.1).

Theorem 8. Let x† satisfy the source condition (2.5) and in addition, sup-pose that

y − yδ ∈ N1 and A∗y 6= 0, ψ = ψQO,

(y − yδ) ∈ N2q, µ ∈ [0, 12− p], q ∈ [p, 1

2− µ], (AA∗)qy 6= 0, ψ = ψMHD,

(y − yδ) ∈ N2q, µ ∈ [0, 1− p], q ∈ [p, 1− µ], (AA∗)qy 6= 0, ψ = ψMHR.

58

Page 61: Linear and Nonlinear Heuristic Regularisation for Ill ...

Then

‖xδα∗ − x†‖ =

O(τ

2µ2µ+2p+1

µ), if ψ = ψQO,

O(τ

2µ2µ+2p+1

· 2µ1−2q

), if ψ = ψMHD,

O(τ

2µ2µ+2p+1

· µ1−q

), if ψ = ψMHR,

as τ → 0.

Proof. We treat the different parameter choice rules separately:

• From the definition of α∗ and the triangle inequality, it follows, with

α = τ2

2µ+2p+1 , that

ψQO(α∗, yδ) ≤ ψQO(α, yδ) ≤ ψQO(α, yδ − y) + ψQO(α, y)

≤ ‖xα − x†‖+ ‖xδα − xα‖ ≤ Cαµ + Cτ

αp+12

= O(τ

2µ2µ+2p+1

).

By the triangle inequality, (2.37) and (2.38) of Lemma 24,

‖xδα∗ − x†‖ ≤ ‖xα∗ − x†‖+ ‖xα∗ − xδα∗‖

= O(‖xα∗ − x†‖+ ψQO(α∗, y − yδ)

)≤ O

(‖xα∗ − x†‖+ ψQO(α∗, y

δ) + ψQO(α∗, y))

= O(αµ∗ + τ

2µ2µ+2p+1

).

Note that

ψ2QO(α, yδ) ≥ α2

∫ ‖A‖20

λ

(λ+ ‖A‖2)4d‖Fλyδ‖2

≥ α2 1

(2‖A‖2)4

∫ ‖A‖20

λ d‖Fλyδ‖2

≥ α2 1

(2‖A‖2)4

(‖A∗y‖ − ‖AA∗‖

12−pτ)2

≥ Cα2,

(2.46)

for all α ∈ (0, αmax) and τ sufficiently small. Hence for α = α∗, it

follows that α∗ ≤ Cτ2µ

2µ+2p+1 . Therefore, we may deduce that

‖xδα∗ − x†‖ = O(τ

2µ2µ+2p+1

µ),

as τ → 0.

• Note that from (AA∗)qy 6= 0, we may conclude, as in (2.46), that

α∗ ≤ C

(ψMHD(α, yδ)

112−q

)= O

2µ2µ+2p+1

21−2q

).

59

Page 62: Linear and Nonlinear Heuristic Regularisation for Ill ...

Then it follows, as above, from (2.39) and (2.40), that

‖xδα∗ − x†‖ ≤ ‖xα∗ − x†‖+ ‖xα∗ − xδα∗‖ = O(αµ∗ + ψMHD(α∗, y − yδ))

= O(αµ∗ + αµ +

τ

α12

+p

)= O

2µ2µ+2p+1

2µ1−2q + τ

2µ2µ+2p+1

),

as τ → 0.

• One may similarly verify that if ‖(AA∗)qy‖ ≥ C, then

α∗ ≤ CψMHR(α∗, yδ)

11−q .

Therefore,

‖xδα∗ − x†‖ ≤ ‖xδα∗ − xα∗‖+ ‖xα∗ + x†‖

= O(αµ∗ + ψMHR(α, y) + ψMHR(α, y − yδ)

)= O

(ψMHR(α, yδ)

µ1−q + τ

2µ2µ+2p+1

)= O

2µ2µ+2p+1

µ1−q

).

For the quasi-optimality rule, one may notice that the above convergencerates are optimal for the saturation case µ = 1, but they are only suboptimalfor µ < 1 (similarly as in Section 2.1).Let us further discuss the assumptions in this theorem: for the modifiedheuristic discrepancy rule, the first condition on q is not particularly restric-tive. However, the requirement q ≤ 1

2−µ implies that µ ≤ 1

2−q, which means

that we obtain a saturation at µ = 12− q. This is akin to the bounded noise

case (q = 0), where this method saturates at µ = 12

(see Section 2.1). It iswell known that a similar phenomenon occurs for the non-heuristic analogueof this method, namely the discrepancy principle [35].In contrast to the modified discrepancy rule, we observe that the saturationfor the modified Hanke-Raus rule occurs at µ = 1 − q. Hence, again analo-gous to the bounded noise case (and to the non-heuristic case), the modifiedHanke-Raus method yields convergence rates for a wider range of smoothnessclasses.We may, however, impose an additional condition as before in order to achievean optimal convergence rate. More specifically, since it was independent ofthe noise, we may consider the condition from the standard theory describedbefore; namely, (2.26).

Theorem 9. Let the assumptions of the previous theorem hold and let α∗be selected according to either the quasi-optimality, modified heuristic dis-crepancy, or the modified Hanke-Raus rule. Then, assuming the regularitycondition (2.26), it follows that

‖xδα∗ − x†‖ = O

2µ2µ+2p+1

),

as τ → 0.

60

Page 63: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. The proof for the quasi-optimality rule is analogous to the standardtheory (see Corollary 6), so therefore we omit it. For the modified heuristicdiscrepancy and Hanke-Raus rules, we show that the regularity conditionimplies that ψ(α, y) ≥ C‖xα − x†‖. Recall that

‖xα − x†‖2 =

∫ ‖A‖2+

0

α2

(α + λ)2d‖Eλx†‖2

≤ C

∫ α

0

d‖Eλx†‖2 + Cα2

∫ ‖A‖2+

α

1

λ2d‖Eλx†‖2. (2.47)

For λ ≥ α we have the following estimate for m > 0 with a constant Cq,m:(λ

α

)2qλαm

(λ+ α)m+1≥(λ

λ

)2qλαm

(λ+ λ)m+1= Cq,m

αm

λm≥ Cq,m

αm+1

λm+1.

Thus, for ψ = ψMHD taking m = 1 and for ψ = ψMHR, taking m = 2, weobtain

ψ2(α, y) ≥∫ ‖A‖2+

0

α

)2qλαm

(λ+ α)m+1d‖Eλx†‖2

≥∫ ‖A‖2+

α

α

)2qλαm

(λ+ α)m+1d‖Eλx†‖2

≥ C

∫ ‖A‖2+

α

α2

λ2d‖Eλx†‖2.

By the regularity condition, the first integral in the upper bound in (2.47)can be estimated by the second part which agrees up to a constant withthe lower bound for ψ(α, y) in both cases. In the proof of Theorem 8, theestimate ‖xα∗ −x†‖ ≤ Cαµ∗ can then be replaced by ‖xα∗ −x†‖ ≤ Cψ(α∗, y),which leads to the optimal rate.

2.2.2 Predictive Mean-Square Error

The predictive mean-square error functional (cf. [89]) is given by

ψPMS(α, yδ) := ‖Axδα − y‖.

It is clear that this is not an implementable parameter choice rule as thefunctional to be minimised requires knowledge of the exact data y. Themotivation for studying it, however, becomes apparent when we consider thegeneralised cross-validation rule. Indeed, if we now turn our attention tofinite dimensional ill-conditioned problem

Anx = yn,

61

Page 64: Linear and Nonlinear Heuristic Regularisation for Ill ...

for An : X → Rn, then we can define the generalised cross-validation func-tional by

ψGCV(α, yδn) :=1

ρ(α)‖Anxδα − yδn‖,

withρ(α) :=

α

ntr

(AnA∗n + αIn)−1

. (2.48)

For i.i.d. noise, the expected value of ψ2GCV(α, yδn) − ‖e‖2 estimates the pre-

dicted mean-square error functional squared (cf. [105,152]).The predictive mean-square error functional differs from the previous onesin the sense that it has different upper bounds. In fact, from (2.30), oneimmediately finds that

ψ2PMS(α, yδ) ≤ C

τ 2

α2p+ Cα2µ+1,

for µ ≤ 12. It can easily be seen that for strongly bounded noise ‖yδ−y‖ <∞,

the method fails as it selects α∗ = 0.

The minimum of the upper bound is obtained for α = αopt = O(τ2

2p+2µ+1 ),but the resulting rate is of the order

ψ2PMS(α, yδ) ≤ C

(2µ+1)2p+2µ+1

]2

,

which agrees with the optimal rate for the error in the A-norm, i.e., ‖xδα −x†‖A := ‖A(xδα − x†)‖. Thus, for this method, it is not reasonable to boundthe functional ψPMS by expressions involving ‖xδα−xα‖ or ‖xα−x†‖. Rather,we try to directly relate the selected regularisation parameter α∗ to the op-timal choice αopt.To do so, we need some estimates from below, although in this case, we willneed to introduce a noise condition of a different type and an additionalcondition on the exact solution.

Lemma 3. Suppose that there exists a positive constant C such that yδ−y ∈Z satisfies ∫ ∞

α

d‖FλQ(y − yδ)‖2 ≥ Cτ 2

α2p−ε , (2.49)

for all α ∈ (0, αmax) and ε > 0 small. Then

‖A(xδα − xα)‖ ≥ Cτ

αp−ε2

.

Proof. From (2.49), one can estimate

‖A(xδα − xα)‖2 =

∫ ∞0

λ2

(α + λ)2d‖FλQ(y − yδ)‖2 ≥

∫ ∞α

d‖FλQ(y − yδ)‖2

≥ Cτ 2

α2p−ε .

62

Page 65: Linear and Nonlinear Heuristic Regularisation for Ill ...

Let us exemplify condition (2.49): for the case in (2.36), we have that∫λ≥α

d‖FλQ(y − yδ)‖2 =∑

1≤i≤N∗

1

iβ∼∫ N∗

1

1

xβdx =

CN1−β

∗ if 1− β > 0,

C if 1− β < 0,

with N∗ = 1

α1γ

. This gives that the left-hand side is of the order of α−1−βγ .

For (2.49) to hold true, we require that 1−βγ≥ 2p− ε, which means that

1 + εγ ≥ β + 2pγ.

If we now choose p close to the smallest admissible exponent for the weaklybounded noise condition, i.e. 2pγ = 1 − β + εγ, with ε small, then thecondition holds. In other words, our interpretation of the stated noise con-dition means that ‖(AA∗)p(yδ − y)‖ < ∞ and p is selected as the minimalexponent such that this holds. This noise condition automatically excludesthe (strongly) bounded noise case. The example also shows that the desiredinequality with ε = 0 cannot be achieved.

Theorem 10. Let µ ≤ 12, α∗ be the minimiser of ψPMS(α, yδ), assume that

the noise satisfies (2.49) and that Ax† 6= 0. Then

‖xδα∗ − x†‖ ≤

2µ2µ+2p+1

2µ+12 , if α∗ ≥ αopt,

Cτ2µ

2µ+2p+1−ε 2p+1

(2p−ε)(2µ+2p+1) , if α∗ ≤ αopt.

If additionally for some ε2 > 0,∫ ∞α

λ2µ−1 d‖Eλω‖2 ≥ Cα2µ−1+ε2 , (2.50)

then for the first case we have

‖xδα∗ − x†‖ ≤ Cτ

2µ2µ+2p+1

2µ+12µ+1+ε2 , if α∗ ≥ αopt.

Proof. If α∗ ≥ αopt, it follows from Ax† 6= 0 that

‖Axα − y‖2 ≥ Cα2, (2.51)

and if (2.50) holds, then one even has that

‖A(xα − x†)‖2 ≥∫ ∞α

λ1+2µα2

(α + λ)2d‖Eλω‖2

≥ α2

∫ ∞α

λ2µ−1 d‖Eλω‖2 ≥ Cα2µ+1+ε2 . (2.52)

63

Page 66: Linear and Nonlinear Heuristic Regularisation for Ill ...

Since α 7→ ‖A(xδα − xα)‖2 is a monotonically decreasing function and usingYoung’s inequality [36], we may obtain that

Cαt∗ ≤ ‖A(xδαopt− xαopt)‖2 + ‖Axαopt − y‖2 ≤ C

2µ+12µ+2p+1

]2

,

i.e.,

α∗ ≤ Cτ2µ+1

2µ+2p+12t ,

where t = 2 or t = 2µ+ 1 + ε2 if (2.50) holds.If α∗ ≤ αopt, then we may bound the functional from below as

ψ2PMS(α, yδ) ≥ 1

2‖A(xδα − xα)‖2 − ‖Axα − y‖2,

for all α ∈ (0, αmax), which allows us to obtain

1

2‖A(xδα∗ − xα∗)‖

2 − ‖Axα∗ − y‖2 ≤ ψ2PMS(α∗, y

δ) ≤ ψ2PMS(αopt, y

δ)

≤ 2‖A(xδαopt− xαopt)‖2 + 2‖Axαopt − y‖2 ≤ C

2µ+12µ+2p+1

]2

.

i.e., by Lemma 3,

Cτ 2

α2p−ε∗− Cα2µ+1

∗ ≤ 1

2‖A(xδα∗ − xα∗)‖

2 − ‖Axα∗ − y‖2 ≤ C[τ

2µ+12µ+2p+1

]2

.

Now, from α∗ ≤ αopt, we get

Cτ 2

α2p−ε∗

≤ C[τ

2µ+12µ+2p+1

]2

+ Cα2µ+1∗

≤ C[τ

2µ+12µ+2p+1

]2

+ Cα2µ+1opt ≤ C

2µ+12µ+2p+1

]2

,

i.e.,

α2p−ε∗ ≥ C

2p2µ+2p+1

]2

⇐⇒ α∗ ≥ Cτ2

2µ+2p+1· 2p2p−ε .

Then inserting the respective bounds for α∗ into (1.15) yields the desiredrates.

Condition (2.50) can again be verified as we did for the noise conditionfor some canonical examples. The inequality with ε2 = 0 does not usu-ally hold. The condition can be interpreted as the claim that x† satisfies asource condition with a certain µ but this exponent cannot be increased, i.e.,x† 6∈ range((A∗A)µ+ε). A similar condition was used by Lukas in his analysisof the generalised cross-validation rule [105].The theorem shows that we may obtain almost optimal convergence resultsbut only under rather restrictive conditions. Moreover, the method shows asaturation effect at µ = 1

2comparable to the heuristic discrepancy rule.

64

Page 67: Linear and Nonlinear Heuristic Regularisation for Ill ...

2.2.3 Generalised Cross-Validation

The generalised cross-validation rule was proposed and studied in particularby Wahba [152], and is most popular in a statistical context [125] but lessso for deterministic inverse problems. It is derived from the cross-validationmethod by combining the associated estimates with certain weights. Impor-tantly, it was shown in [152] that the expected value of the generalised cross-validation functional converges to the expected value of the PMS-functionalas the dimension tends to infinity. This is why, in the preceding section, westudied ψPMS in detail.

Proposition 16. Let supn tr(A∗nAn) < ∞; then it follows that the weightρ(α) in ψGCV is monotonically increasing with ρ(0) = 0 and bounded withρ(α) ≤ 1. Furthermore, for α > 0, it follows that ρ(α)→ 1 as the dimensionn→∞.

Proof. Observe that, from the definition of ρ, namely (2.48), we may derive

ρ(α) =α

ntr((A∗nAn + αI)−1) =

1

n

n∑i=1

α

α + λi=

1

n

n∑i=1

1− λiα + λi

= 1− 1

n

n∑i=1

λiα + λi

. (2.53)

Moreover, it is clear that ρ(α) ≤ 1 for all α > 0, since the second term ispositive. Additionally, since

limn→∞

n∑i=1

λiα + λi

=∞∑i=1

λiα + λi

≤ supn

tr(A∗nAn)∞∑i=1

1

α + λi,

is clearly bounded, as per the assumption on the supremum in the statementof the proposition, and since the series is clearly convergent, it follows thattaking the limit 1/n → 0 as n → ∞ yields that the second term of (2.53)converges to 0 as n→∞. Thus, ρ(α)→ 1 as n→∞. The fact that ρ(0) = 0is obvious.

This is also the reason why one has to study the GCV in terms of weaklybounded noise. The limit limn→∞ ψGCV tends pointwise to the residual‖Axδα − yδ‖, which in the bounded noise case does not yield a reasonableparameter choice as then α∗ = 0 is always chosen. Note that in a stochasticcontext, and using the expected value of ψGCV, a convergence analysis hasbeen done by Lukas [105]. In contrast, we analyse the deterministic case.We now consider the ill-conditioned problem

Anx = yn, (2.54)

where we only have noisy data yδn ∈ Rn.

65

Page 68: Linear and Nonlinear Heuristic Regularisation for Ill ...

We impose a discretisation independent source condition; that is,

x† = (A∗nAn)µω, ‖ω‖ ≤ C, 0 < µ ≤ 1,

where C does not depend on the dimension n. Furthermore, let us restatesome definitions for this discrete setting:

δn := ‖yδn − yn‖, τ 2 :=n∑i=1

λ2pi |〈yδn − yn, ui〉|2.

Note that in an asymptotically weakly bounded noise case, we might assumethat τ is bounded independent of n while δn might be unbounded as n tendsto infinity.Moreover, we impose a noise condition of similar type as for the predictivemean-square error, but slightly different:∑

λi≥α

α2

λ2i

|〈yδn − yn, ui〉|2 ≥ Cτ 2

α2p−ε , for all α ∈ I, (2.55)

where C does not depend on n. This is different from the condition statedin [89], as the author of this thesis found that the condition there was falseand it is thus corrected here with arguably a much more restrictive condition(which is nevertheless needed to for the proceeding results of this section).Note that in the discrete case, one must restrict the noise condition to aninterval with I = [αmin, αmax] with αmin > 0.We also state a regularity condition∑

λi≥α

λ2µ−1|〈ω, vi〉|2 ≥ Cα2µ−1+ε2 , for all α ∈ I, (2.56)

where vi denote the eigenfunctions of A∗nAn.In order to deduce convergence rates, we look to bound the functional fromabove as we did for the other functionals in the previous sections:

Lemma 4. For yδn ∈ Rn, there exist positive constants such that

ψGCV(α, yn) ≤ C

ρ(α)Cα2µ+1, µ ≤ 1

2, (2.57)

ψGCV(α, yδn − yn) ≤ C

ρ(α)δ2n, µ ≤ 1

2, (2.58)

ψGCV(α, yδn) ≤ 1

ρ(α)

(Cα2µ+1 + δ2

n

), µ ≤ 1

2. (2.59)

Proof. It is a standard result (cf. [35]) that

‖Anxδα − yδn − (Anxα − yn)‖ ≤ ‖yδn − yn‖ ≤ δn.

Similarly, by the usual source condition, we obtain ‖(Anxα − yn)‖ ≤ Cα2µ+1

for µ ≤ 12

(see Proposition 7). The result follows from the triangle inequality.

66

Page 69: Linear and Nonlinear Heuristic Regularisation for Ill ...

The proceeding results generally follow from the infinite dimensional settingand we similarly obtain the following bounds from below:

Lemma 5. Suppose that α ∈ I and also that (2.55) holds. Then

ψGCV(α, yδn − yn) ≥ 1

ρ(α)

(C

τ 2

α2p−ε

).

Moreover, if ‖Anx†‖ ≥ C0, with an n-independent constant, then there existsan n-independent constant C with

ψGCV(α, yn) ≥ C1

ρ(α)α2.

If (2.56) holds and α ∈ I, then

ψGCV(α, yn) ≥ C1

ρ(α)α2µ+1+ε2 , µ ≤ 1

2.

Proof. We can estimate

ψ2GCV(α, yδn − yn) ≥ 1

ρ2(α)

∑λi≥α

α2

(λi + α)2|〈yδn − yn, ui〉|2

≥ C1

ρ2(α)

∑λi≥α

α2

λ2i

|〈yδn − yn, ui〉|2

(2.55)

≥ C1

ρ2(α)

(τ 2

α2p−ε

).

The remaining two inequalities in the lemma follow analogously to (2.51) and(2.52).

Theorem 11. Let µ ≤ 12, assume α∗ is the minimiser of ψGCV(α, yδn) and

suppose further that α∗ ∈ I such that (2.55) holds. Then

α∗ ≥[

infα≥α∗

(Cα2µ+1 + Cδ2n)

]− 12p−ε

τ2

2p−ε ≥ Cδ− 2

2p−εn τ

22p−ε .

On the other hand

α∗ ≤[

infα≤α∗

1

ρ(α)

(Cα2µ+1 + Cδ2

n

)] 1t

,

with t = 2. If α∗ ∈ I and (2.56) hold, then the above upper bound on α∗ holdswith t = 2µ+ 1 + ε2.

67

Page 70: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. Take an arbitrary α and consider first the case α∗ ≤ α. Following onfrom the previous lemmas and using (2.58), we have

1

ρ(α∗)

(C

τ 2

α2p−ε∗

)≤ ψGCV(α∗, y

δn − yn) ≤ ψGCV(α∗, y

δn) + ψGCV(α∗, yn)

≤ ψGCV(α, yδn) + C1

ρ(α∗)α2µ+1∗ ≤ 1

ρ(α)

(Cα2µ+1 + δ2

n

)+ C

1

ρ(α∗)α2µ+1∗ .

Hence, by the monotonicity of α 7→ α2µ+1 and since ρ is monotonicallyincreasing (cf. Proposition 16), we obtain that(

Cτ 2

α2p−ε∗

)≤ ρ(α∗)

ρ(α)

(Cα2µ+1 + δ2

n

)+ α2µ+1

≤(Cα2µ+1 + δ2

n

)+ α2µ+1

∗ ≤(Cα2µ+1 + δ2

n

).

Hence,

α∗ ≥[

infα≥α∗

(Cα2µ+1 + Cδ2n)

]− 12p−ε

τ2

2p−ε ≥ Cδ− 2

2p−εn τ

22p−ε .

Now, suppose α∗ ≥ α. Then using that α∗ is a minimiser, and with t as inthe statement of the theorem

C

ρ(α∗)αt∗ ≤ ψGCV(α∗, yn) ≤ ψGCV(α∗, y

δn) + ψGCV(α∗, y

δn − yn)

≤ 1

ρ(α)(Cα2µ+1 + Cδ2

n) + C1

ρ(α∗)δ2n

≤ 1

ρ(α)(Cα2µ+1 + Cδ2

n) + C1

ρ(α)δ2n.

Hence, as ρ(α∗) is bounded from above by 1, it follows that

α∗ ≤[

infα≤α∗

1

ρ(α)

(Cα2µ+1 + Cδ2

n

)] 1t

.

Theorem 12. Suppose that µ ≤ 12, α∗ is the minimiser of ψGCV(α, yδn),

where α∗ ∈ I such that (2.55) and (2.56) are satisfied. Suppose further that

one has ρ(δ2

2µ+1n ) ≥ C. Then

‖xδα∗ − x†‖ ≤ δ

2µtn + δn

δn

) 12p−ε

,

with t as in Theorem 11.

68

Page 71: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. Since

‖xδα∗ − x†‖ ≤ Cαµ∗ + C

δn√α∗,

we may take the balancing parameter α = δ2

2µ+1n . From the previous theorem,

it follows that if α∗ ≤ α, then by Theorem 11

α∗ ≥τ

22p−ε

[infα≥α∗(Cα2µ+1 + Cδ2

n)]1

2p−ε≥(τ

δn

) 22p−ε

.

On the other hand, if α∗ ≥ α, and ρ(α) ≥ C, then

α∗ ≤ Cδ2t .

Thus, taking for αµ and δn/√α the worst of these estimates, we obtain the

desired result.

This result establishes convergence rates in the discrete case. However, therequired conditions are somewhat restrictive as we need that the selected α∗has to be in a certain interval (although this is to be expected in a finite-dimensional setting). Note that the term δ2

n in Theorem 11 can be replaced byany reasonable monotonically decreasing upper bound for ψ2

GCV(α, yδn − yn).In particular, if we could conclude that α∗ is in a region where ψ2

GCV(α, yδn−yn) ≤ C τ2

α2p , then we would obtain similar convergence results as for the

predictive mean square error. Moreover, the condition that ρ(δ2

2µ+1n ) > C

restricts the analysis to the weakly bounded noise scenario, in which caseδn → ∞ as n → ∞. The standard “bounded noise” case is ruled out inTheorem 12, because if δn would tend to zero, then this would lead to acontradiction with Proposition 16.In general, however, the performance of the GCV-rule for the regularisationof deterministic inverse problems is subpar compared to other heuristic rules,e.g., those mentioned in the previous sections; cf., e.g., [9, 53]. This is alsoillustrated by the fact that we had to impose stronger conditions for theconvergence results compared to the aforementioned rules.

2.3 Operator Perturbations

We now consider another deviation from the classical theory and attempt toreprove the standard results for this scenario: suppose that, in addition tonoisy data yδ (that is, we return to the usual setting in which ‖y− yδ‖ ≤ δ),one also only has knowledge of a noisy operator

Aη = A+ ∆A,

69

Page 72: Linear and Nonlinear Heuristic Regularisation for Ill ...

such that‖Aη − A‖ ≤ η.

Note that this section is largely based on the paper [51]. We also refer thereader to [103,134,143,150].The specific situation that we consider here, which is often met in practicalsituations, is that we suppose we have knowledge of the operator noise level,i.e., we assume η known, but we do not know the level of the data error δ.The regularised solution in this case defined as

xδα,η = (A∗ηAη + αI)−1A∗ηyδ, (2.60)

and the task is then to determine a parameter α such that

‖xδα,η − x†‖ → 0,

as δ, η → 0.First, we state the following useful lemma from [51] which provides someuseful bounds for the subsequent analysis:

Lemma 6. Let α ∈ (0, αmax) and for s ∈ 0, 12, 1, define

Bη,s :=

(A∗ηAη)

s if s ∈ 0, 1,A∗η if s = 1

2,

Bs :=

(A∗A)s if s ∈ 0, 1,A∗ if s = 1

2.

Let Bη,s and Bs be the operators we get from Bη,s and Bs by changing theroles of the operators Aη with A∗η and A with A∗, respectively. Then fors ∈ 0, 1

2, 1 and t ∈ −1,−3

2,−2, there exist positive constants Cs,t such

that ∥∥(A∗ηAη + αI)qBη,s − (A∗A+ αI)tBs

∥∥ ≤ Cs,tη

α12−s−t

. (2.61)∥∥∥(AηA∗η + αI)tBη,s − (AA∗ + αI)tBs

∥∥∥ ≤ Cs,tη

α12−s−t

. (2.62)

Proof. We prove (2.61), this gives (2.62) changing the roles of the operatorsAη ↔ A∗η and A↔ A∗. We recall the elementary estimates

‖(A∗A+ αI)−1‖ ≤ 1

α,

‖(A∗A+ αI)−1A∗‖ ≤ 1

2√α,

‖(A∗A+ αI)−1A∗A‖ ≤ 1,

(2.63)

which also hold with A and A∗ replaced by Aη and A∗η, respectively. Fors ∈ 0, 1, it follows from some algebraic manipulations, the fact that Bs, Bη.s

70

Page 73: Linear and Nonlinear Heuristic Regularisation for Ill ...

commute with the inverses below, and the previous estimates that

(A∗ηAη + αI)−1Bη,s −Bs(A∗A+ αI)−1

= (A∗ηAη + αI)−1[Bη,s(A

∗A+ αI)− (A∗ηAη + αI)Bs

](A∗A+ αI)−1

= (A∗ηAη + αI)−1[Bη,sA

∗A− A∗ηAηBs

](A∗A+ αI)−1

+ α(A∗ηAη + αI)−1 [Bη,s −Bs] (A∗A+ αI)−1.

In the case s = 0 and Bη,0 = B0 = I, we find

Bη,0A∗A− A∗ηAηB0 = (A∗ − A∗η)A+ A∗η(A− Aη),

which, using (2.63), gives C0,−1 = 1. Similarly, we can prove that C1,−1 = 1.For the case s = 1

2, if Bη,s = A∗η and Bs = A∗, we obtain C 1

2,−1 = 5

4with

minor modifications noting that (A∗A + αI)−1A∗ = A∗(AA∗ + αI)−1. Theother cases of t follow in a similar manner by

(A∗ηAη + αI)tBη,s −Bs(A∗A+ αI)t

= (A∗ηAη + αI)t+1[(A∗ηAη + αI)−1Bη,s −Bs(A

∗A+ αI)−1]

+[(A∗ηAη + αI)t+1 − (A∗A+ αI)t+1

]Bs(A

∗A+ αI)−1,

and by using (2.63) and the result for t = −1. For t = −32, we employ an

additional identity from semigroup operator calculus [92],

(A∗ηAη + αI)−12 − (A∗A+ αI)−

12

=sin(π

2)

π

∫ ∞0

w−12

[(A∗ηAη + (α + w)I)−1 − (A∗A+ (α + w)I)−1

]dw,

which leads to

‖(A∗ηAη + αI)−12 − (A∗A+ αI)−

12‖ ≤ C0,−1η

π

∫ ∞0

1√w(α + w)

32

dw

≤ 2C0,−1

π

η

α,

thereby finishing the proof.

2.3.1 Semi-Heuristic Parameter Choice Rules

In terms of parameter selection, previous work can be found in, e.g., [103,104,134,143]. Note that the aforementioned references generally consider a-posteriori rules; namely, the generalised discrepancy principle which selectsthe parameter as

α∗ = α(δ, η, yδ) = infα ∈ (0, αmax) | ‖Aηxδα,η − yδ‖ = τ

(δ + η‖xδα,η‖

),

(2.64)

71

Page 74: Linear and Nonlinear Heuristic Regularisation for Ill ...

with τ ≥ 1 and also a generalised balancing principle.The fact that prohibits the direct use of a minimisation-based rule with, say,a functional, which especially for this section, we define to be of the form

ψ : (0, αmax)× L(X, Y )× Y → R ∪ ∞(α,Aη, y

δ) 7→ ψ(α,Aη, yδ),

is that we are faced with an additional operator error, which is usually notrandom or irregular and hence it would be unrealistic to assume that for theoperator perturbation an analogous inequality to the Muckenhoupt condition(2.17) holds. The remedy is to employ a modified functional, which uses thenoisy operatorAη, but is designed to emulate a functional for the unperturbedoperator.To guarantee a minimiser and convergence of the regularised solution, werestrict the minimisation to an interval [γ, αmax], where the lower bound γ isselected depending on η (but not on δ):

α∗ := α(η, yδ) := argminα∈[γ,αmax]

ψ(α,Aη, yδ), γ = γ(η) > 0, (2.65)

withψ(α,Aη, y

δ) := ψ(α,Aη, yδ)− J (α,Aη, y

δ, η).

In this way, we combine heuristic rules with an η-based choice. Here, ψis a standard heuristic parameter choice functional and J is the so-calledcompensating functional. Note that the noise restrictions for the convergenceresults of the heuristic parameter choice rules may fail to be satisfied in caseof smooth noise (i.e., when the noise is in the range of the forward operator).Incidentally, operator noise, when it exists, tends to be smooth. Therefore,the purpose of the compensating functional is to subtract this smooth part ofthe noise, i.e., it should behave approximately like ψ(α,A, yδ)−ψ(α,Aη, y

δ).For the compensating functional, we propose two possibilities:

J (α,Aη, yδ, η) = Dη‖xδα,η‖, (SH1)

or

J (α,Aη, yδ, η) = D

η√α, (SH2)

which define the (SH1) and (SH2) variants of the semi-heuristic functionals.As a consequence of Lemma 6, we obtain some useful bounds.

Lemma 7. For any of the functionals ψ ∈ ψHD, ψHR, ψQO, and any α ∈(0, αmax), we have

ψ(α,Aη, Aηx†) ≤ Cs,t

η‖x†‖√α

+ ψ(α,A,Ax†), (2.66)

72

Page 75: Linear and Nonlinear Heuristic Regularisation for Ill ...

ψ(α,Aη, yδ) ≤ δ√

α+ (1 + Cs,t)

η‖x†‖√α

+ ψ(α,A,Ax†), (2.67)

with the constants Cs,t from Lemma 6: s = 12, t = −1 for the heuristic

discrepancy, s = 12, t = −3

2the Hanke-Raus, and s = 1, t = −2 for the

quasi-optimality functionals, respectively.

Proof. The inequality (2.66) follows from (2.61) and (2.62), the inequality(2.67) from (2.66) and from the inequalities

ψ(α,Aη, yδ) ≤ ψ(α,Aη, yδ − y) + ψ(α,Aη, (A− Aη)x†) + ψ(α,Aη, Aηx†)

≤ δ√α

+η‖x†‖√

α+ ψ(α,Aη, Aηx

†).

We remark that the term ψ(α,A,Ax†) converges to 0 as α→ 0; see Proposi-tion 12. Furthermore, if x† additionally satisfies a source condition (2.5), thenthe expression can be bounded by a convergence rate of order α (with someexponent depending on the source condition) that agrees with the standardrate for the approximation error ‖xα − x†‖ (see Proposition 7).

Convergence Analysis

Suppose that α∗ is the selected parameter by the proposed parameter choicerules with the operator noise (2.65). In the following lemma, we show thatfor such a choice of parameter, it follows that α∗ → 0 if all noise (with respectto both the data and the operator) vanishes:

Lemma 8. Let α∗ be selected as in (2.65), i.e., with ψ as in and ψ ∈ψHD, ψHR, ψQO with either (SH1) or (SH2). Suppose there exist positiveconstants (not necessarily equal which we denote universally by C) such that‖yδ‖ ≥ C for ψ ∈ ψHD, ψHR and ‖A∗ηyδ‖ ≥ C for ψ = ψQO.If γ = γ(η) is chosen such that η√

γ→ 0 as η → 0 then

α∗ → 0,

as δ, η → 0.

Proof. Firstly, we recall Proposition 9 of Chapter 2, which proved that onemay estimate the parameter α by the ψ-functionals. In particular, it followsthat

ψ(α,Aη, yδ) ≥

C√α if ψ = ψHD,

Cα if ψ ∈ ψHR, ψQO.(2.68)

By the standard error estimate

‖xδα∗,η‖ ≤‖yδ‖√α∗,

73

Page 76: Linear and Nonlinear Heuristic Regularisation for Ill ...

we find, for the case in which the compensating functional is chosen as in(SH1) using (2.68) and (2.67) with t ∈ 1/2, 1 suited to ψ according to(2.68),

Cαt∗ −Dη‖yδ‖√α∗≤ ψ(α∗, Aη, y

δ) = infα∈[γ,αmax]

ψ(α,Aη, yδ)

≤ infα∈[γ,αmax]

ψ(α,Aη, yδ)

≤ infα∈[γ,αmax]

δ√α

+ (1 + Cp,q)η‖x†‖√

α+ ψ(α,A,Ax†)

≤ inf

α∈[γ,αmax]

δ√α

+ +ψ(α,A,Ax†)

+ (1 + Cp,q)

η‖x†‖√γ. (2.69)

Hence,

Cαt∗ ≤ infα∈[γ,αmax]

δ√α

+ ψ(α,A,Ax†)

+ (C +D)

η√γ.

It is not difficult to verify the same estimate analogously for the case in whichthe compensating functional is chosen according to (SH2).Inserting the (non-optimal) choice α = δ + γ in the infimum, we obtain anupper bound that tends to 0 as δ, γ → 0. By the hypothesis, the last twoterms vanish, thereby proving the desired result.

Remark. If α∗ is the minimiser of ψ(α,Aη, yδ), then this functional is thesame as (SH1) and/or (SH2) with D = 0 and one obtains the same result asabove; namely, that α∗ → 0 as δ, η → 0 provided that the conditions in thelemma are fulfilled.

Now, we can establish an estimate from above for the total error which isderived courtesy of a lower estimate of the parameter choice functional withthe data error, which we recall, due to Bakushinskii’s veto (cf. Proposition 3),necessitates a restriction on the noise. At first we study bounds for thefunctional in (2.65).

Proposition 17. Let α∗ be selected according to (2.65) with ψ as in (SH1).Suppose that for the noisy data yδ, the noise condition is satisfied, with(2.17) with e ∈ N1 for ψ ∈ ψHD, ψHR and e ∈ N2 for ψ = ψQO. Then,for η sufficiently small, we get the following error estimate for all ψ ∈ψHD, ψHR, ψQO:

‖xδα∗,η − x†‖

≤ (1−DηCnc)−1

[Cηδ

α∗+ Cnc inf

α∈[γ,αmax]ψ(α,Aη, yδ) +DCncη‖x†‖

+ Cη√α∗‖x†‖+ Cψ(α∗, A,Ax

†) + α∗‖(A∗A+ α∗I)−1x†‖].

(2.70)

74

Page 77: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. We begin by estimating the terms:

‖xδα∗,η − x†‖ = ‖(A∗ηAη + α∗I)−1A∗ηy

δ − (A∗ηAη + α∗I)−1(A∗ηAη + α∗I)x†‖≤ ‖(A∗ηAη + α∗I)−1

[A∗ηy

δ − A∗ηAηx† − α∗x†]‖

≤ ‖(A∗ηAη + α∗I)−1[A∗η(y

δ − y) + A∗η(A− Aη)x†]‖

+ α∗‖(A∗ηAη + α∗I)−1x†‖

≤ ‖(A∗ηAη + α∗I)−1A∗η(yδ − y)‖+

η

2√α∗‖x†‖+ α∗‖(A∗ηAη + α∗I)−1x†‖.

By (2.61), the last term can be bounded by

α∗‖(A∗ηAη + α∗I)−1x†‖ ≤ α∗‖[(A∗ηAη + α∗I)−1 − (A∗A+ α∗I)−1

]x†‖

+ α∗‖(A∗A+ α∗I)−1x†‖

≤ C0,−1η‖x†‖√α∗

+ α∗‖(A∗A+ α∗I)−1x†‖.

This leaves the remaining term:

‖(A∗ηAη + α∗I)−1A∗η(yδ − y)‖

≤ ‖[(A∗ηAη + α∗I)−1A∗η − (A∗A+ α∗I)−1A∗

](yδ − y)‖

+ ‖(A∗A+ α∗I)−1A∗(yδ − y)‖

≤ 5ηδ

4α∗+ Cncψ(α∗, A, y

δ − y),

where we used the noise condition (2.17) for the last estimate.Utilising the noise condition (2.17) again, and with the operator error esti-mates (2.61), (2.62), for all ψ ∈ ψHD, ψHR, ψQO, we obtain

‖xδα∗,η − x†‖ = ‖(A∗ηAη + α∗I)−1A∗η(y

δ − y)‖

≤ 5ηδ

4α∗+ Cncψ(α∗, Aη, y

δ − y) + CncCp,qδη

α∗

≤ (5 + CncCp,q)ηδ

4α∗+ Cncψ(α∗, Aη, y

δ) + Cncψ(α∗, Aη, y)

≤ (5 + CncCp,q)ηδ

4α∗+ Cncψ(α∗, Aη, y

δ) +DCncη‖xδα∗,η‖

+ Cncψ(α∗, Aη, Ax†)

≤ Cηδ

α∗+ Cnc inf

α∈[γ,αmax]ψ(α,Aη, y

δ) +DCncη‖xδα∗,η − x†‖+DCncη‖x†‖

+ Cncψ(α∗, Aη, (Aη + A− Aη)x†)

≤ Cηδ

α∗+ Cnc inf

α∈[γ,αmax]ψ(α,Aη, y

δ) +DCncη‖xδα∗,η − x†‖+DCncη‖x†‖

+ Cncψ(α∗, Aη, Aηx†) + Cncψ(α∗, Aη, (A− Aη)x†).

75

Page 78: Linear and Nonlinear Heuristic Regularisation for Ill ...

The last terms can be bounded using standard error estimates (cf. Proposi-tion 11) by

ψ(α∗, Aη, (A− Aη)x†) ≤1√α‖(A− Aη)x†‖ =

η‖x†‖√α,

while for the other term we employ (2.61) and (2.66)

ψ(α∗, Aη, Aηx†) ≤ Cp,q

η‖x†‖√α

+ ψ(α∗, A,Ax†).

Hence, for all ψ ∈ ψHD, ψHR, ψQO, we obtain

(1−DηCnc)‖xδα∗,η − x†‖

≤ Cηδ

α∗+ Cnc inf

α∈[γ,αmax]ψ(α,Aη, yδ) +DCncη‖x†‖

+ Cη√α∗‖x†‖+ Cψ(α∗, A,Ax

†) + α∗‖(A∗A+ α∗I)−1x†‖.

The proof is easily adapted to obtain a similar proposition for the alternativechoice of compensating functional as in (SH2):

Proposition 18. Let the assumptions of the Proposition 17 hold. Let α∗be selected according to (2.65) with ψ as in (SH2). Then for η sufficientlysmall, we get

‖xδα∗,η − x†‖

≤ Cηδ

α∗+ Cnc inf

α∈[γ,αmax]ψ(α,Aη, yδ) + CncC

η√α∗

+ Cη√α∗‖x†‖+ Cψ(α∗, A,Ax

†) + α∗‖(A∗A+ α∗I)−1x†‖.

(2.71)

Note that the setting D = 0 in the previous propositions yields upper boundsfor the total errors in the case of employing the unmodified heuristic rules.

Theorem 13. Let α∗ be selected as in (2.65). Suppose that the noise con-dition (2.17) and the conditions of Lemma 8 are satisfied and furthermoresuppose that γ ∈ [0, αmax] satisfies

η

γ≤ C as η → 0,

where C is a constant. Then

‖xδα∗,η − x†‖ → 0,

as δ, η → 0.

76

Page 79: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. Since we have that α∗ ≥ γ, the conditions in the theorem imply thatηδγ→ 0, η√

γ→ 0. The terms with ψ(α∗, A,Ax

†) and α∗‖(A∗A + α∗I)−1x†‖vanish by standard arguments (cf. Proposition 4 and Proposition 6, respec-tively), as α∗ → 0 according to Lemma 8. Finally, infα∈[γ,αmax] ψ(α,Aη, yδ)tends to 0 because of (2.69) and we may take an appropriate choice for α inthe infimum.

Remark. Note that one might use more general functionals than those in(SH1) and (SH2) by replacing η with ηs, s ∈ (0, 1). Still, in this case, similarconvergence results are valid with a slightly adapted choice of γ (dependingon s). However, we observed through some numerical experimentation (seeChapter 5) that s = 1 appeared to be a natural choice, which is fully in linewith our motivation that the compensating term should represent the errorin ψ due to operator perturbations.We further remark that the unmodified heuristic choice (i.e., with D = 0),stipulating the same condition as in the previous theorem, also yields conver-gence as the errors tend to zero. However, as will be observed in Chapter 5,which gives a numerical study related to the methods of this section, themodified rules represent a substantial improvement.Convergence rates were not proven in [51] and also lie beyond the scope ofthis thesis.

77

Page 80: Linear and Nonlinear Heuristic Regularisation for Ill ...

Chapter 3

Convex TikhonovRegularisation

In this chapter, we revert to the original representation of Tikhonov reg-ularisation, namely (2.4) and make use of it to expand our horizons fromlinear ill-posed problems to convex regularisation. This form of regularisa-tion gained popularity in recent decades due to the demand from applicationwhich, for instance, seeks to recover solutions which may be sparse or in thecase of image processing, one may want to retain the edges when denoisingan image. For the aforementioned applications, one may replace the squarequadratic regularisation term in (2.4) with an `1 norm or a total variationseminorm, respectively. This is the flexibility which (2.4) allows us over thespectral form (2.1). The downside is that the analysis is arguably much moredifficult in the absence of spectral theory.We begin this chapter with an initial section which covers the classical theory,mainly from the works of [21, 68, 142]. Then the proceeding sections are onheuristic parameter choice rules with a focus on proving convergence withrespect to an analogous noise restriction to (2.17) from the linear theory.

3.1 Classical Theory

We assume henceforth that A : X → Y is a continuous linear operator withX as a Banach space (and that Y is a Hilbert space). Then the functionalcalculus for self-adjoint operators used previously becomes void. Moreover,the Moore-Penrose generalised inverse may no longer exists, thus we mustredefine our notion of the best approximate solution. One survey in thissetting can be found in [140]. One may also consult with [142] in case onewants to consider Y as a Banach space also, although this is beyond thescope of this thesis. Note that this section is by and large analogous to [90].In this context, it is common to define the best approximate solution as the

78

Page 81: Linear and Nonlinear Heuristic Regularisation for Ill ...

so-called R-minimising solution, i.e.,

x† = argminx∈XAx=y

R(x).

One may still regularise a la Tikhonov, however, if one sticks with the vari-ational form:

xδα ∈ argminα

1

2‖Ax− yδ‖2 + αR(x), (3.1)

where the regularisation term R : X → R ∪ ∞ may be a generalisation ofthe usual square norm, which we shall assume to be convex, proper, coerciveand weak-∗ lower semi-continuous (akin to the aforementioned norm func-tional), which are in fact the standard assumptions for the well-defindness of(3.1) cf. [78, 140, 142]. In contrast with ‖ · ‖2, the R functional need not bedifferentiable.The optimality condition for the Tikhonov functional may be stated as fol-lows:

0 ∈ A∗(Axδα − yδ) + α∂R(xδα), (3.2)

for all α ∈ (0, αmax) and yδ ∈ Y . We may define specific selections of thesubgradient of R at xδα as:

ξδα : = − 1

αA∗(Axδα − yδ) ∈ ∂R(xδα),

and denote by ξα its respective noise-free variant.In this scenario, we also have to consider a different source condition to, e.g.,(2.5). Namely,

∂R(x†) ∈ rangeA∗, (3.3)

i.e., there exists ω such that A∗ω ∈ ∂R(x†) [21,140]. Subsequently, we define

ξ := A∗ω,

to be a subgradient of R at x†.It is useful to define the residual vectors as before:

pδα := yδ − Axδα, pα := y − Axα.

For notational purposes, we also define shorthands for differences of noisyand noise-free variables:

∆y := yδ − y, ∆pα := pδα − pα.

Additionally, in this context, rather than prove convergence in norm, wefollow the examples of e.g., [21,68,142], and prove estimates and convergencewith respect to the Bregman distance, denoted by Dξ(·, ·), cf. Appendix Band in particular, Definition 13. We begin by stating a useful estimate forthe data propagation error:

79

Page 82: Linear and Nonlinear Heuristic Regularisation for Ill ...

Lemma 9. We have the following upper bound for the data propagation error:

Dξα(xδα, xα) ≤ 1

α〈∆y −∆pα,∆pα〉, (3.4)

for all α ∈ (0, αmax) and y, yδ ∈ Y .

Proof. We may estimate

Dξα(xδα, xα) ≤ Dsymξδα,ξα

(xδα, xα) =1

α〈∆pα, A(xδα − xα)〉 =

1

α〈∆pα,∆y −∆pα〉,

where Dsym is the symmetric Bregman distance, cf. Appendix B and Defini-tion B.4, which proves the desired result.

We give the error estimates [21], which, contrary to the previous chapter, arein terms of the Bregman distance:

Proposition 19. Assume that (3.3) holds. Then there exist positive con-stants such that

Dξ(xα, x†) ≤ C‖ω‖2α, ‖Axα − y‖ ≤ C‖ω‖α,

Dξα(xδα, xα) ≤ Cδ2

α, ‖A(xδα − xα)‖ ≤ Cδ,

and

Dξ(xδα, x

†) ≤ C

(δ√α

+ ‖ω‖√α

)2

, ‖Axδα − yδ‖ ≤ δ + C‖ω‖α, (3.5)

for all α ∈ (0, αmax) and y, yδ ∈ Y .

Proof. We have

1

2‖Axα − y‖2 + αR(xα) ≤ 1

2‖Ax† − y‖2 + αR(x†) = αR(x†).

Now, fromDξ(xα, x

†) = R(xα)−R(x†)− 〈ξ, xα − x†〉,it follows that

αR(x†) = αR(xα)− α〈ξ, xα − x†〉 − αDξ(xα, x†).

Thus,

1

2‖pα‖2 + αR(xα) ≤ αR(x†)

⇐⇒ 1

2‖pα‖2 + αR(xα) ≤ αR(xα)− α〈ξ, xα − x†〉 − αDξ(xα, x

†)

⇐⇒ 1

2‖pα‖2 + αDξ(xα, x

†) ≤ −α〈ξ, xα − x†〉,

80

Page 83: Linear and Nonlinear Heuristic Regularisation for Ill ...

i.e.,

1

2‖pα‖2 ≤ −αDξ(xα, x

†)− α〈A∗ω, xα − x†〉 ≤ −α〈ω, pα〉

≤ α|〈ω, pα〉| ≤ α‖ω‖‖pα‖.

Hence, we obtain the discrepancy estimate: ‖pα‖ ≤ 2‖ω‖α. Now, for theapproximation error, we recall the previous estimate

1

2‖pα‖2 + αDξ(xα, x

†) ≤ α〈ξ, xα − x†〉 ≤ α‖ω‖‖pα‖ ≤1

2

(‖ω‖2α2 + ‖pα‖2

).

Now, for the estimates with noise, we have

1

2‖pδα‖2 + αDξα(xδα, xα) ≤ 1

2‖pδα‖2 − α〈 1

αA∗pα, x

δα − xα〉

=1

2‖pδα‖2 − 〈pα, A(xδα − xα)〉

=1

2〈pδα, pδα〉 − 〈A(xδα − xα), pα〉 − 〈y − yδ, A(xδα − xα)〉

= 〈12pδα + Axδα − Axα, pδα〉 − 〈y − yδ, A(xδα − xα)〉

=1

2〈Axα − yδ + Axδα + Axδα − Axα − Axα, Axα − Axδα + Axδα − yδ〉

− 〈y − yδ, A(xδα − xα)〉

=1

2〈Axα − yδ + Axδα + Axδα − Axα − Axα, Axδα − yδ〉

− 1

2〈Axα − yδ + Axδα + Axδα − Axα − Axα, A(xδα − xα)〉

− 〈y − yδ, A(xδα − xα)〉

=1

2‖Axδα − yδ‖2 +

1

2〈A(xδα − xα), Axδα − yδ〉 −

1

2‖A(xδα − xα)‖2

− 1

2〈Axδα − yδ, A(xδα − xα)〉 − 〈y − yδ, A(xδα − xα)〉

=1

2‖pδα‖2 − 1

2‖A(xδα − xα)‖2 − 〈y − yδ, A(xδα − xα)〉,

i.e.,

1

2‖pδα‖2 + αDξα(xδα, xα)

≤ 1

2‖pδα‖2 − 1

2‖A(xδα − xα)‖2 − 〈y − yδ, A(xδα − xα)〉

⇐⇒ 1

2‖A(xδα − xα)‖2 + αDξ(x

δα − xα) ≤ −〈y − yδ, A(xδα − xα)〉.

81

Page 84: Linear and Nonlinear Heuristic Regularisation for Ill ...

From the non-negativity of the Bregman distance, it follows that the aboveinequality implies that

1

2‖A(xδα − xα)‖2 ≤ −〈y − yδ, A(xδα − xα)〉 ≤ ‖y − yδ‖‖A(xδα − xα)‖

≤ δ‖A(xδα − xα)‖.

Thus, the data discrepancy can be estimated as ‖A(xδα − xα)‖ ≤ 2δ. Now,from

1

2‖A(xδα − xα)‖2 + αDξα(xδα, xα) ≤ −〈y − yδ, A(xδα − xα)〉

≤ δ‖A(xδα − xα)‖ ≤ 1

2δ2 +

1

2‖A(xδα − xα)‖2,

we get the estimate for the data propagation error.For the total error and discrepancy estimates, we begin by recalling that

1

2‖Axδα − yδ‖2 + αDξα(xδα, xα) ≤ 1

2‖Axα − y‖2 − α〈ξα, xδα − xα〉,

which implies

‖Axδα − yδ‖2 ≤ ‖Axα − y‖2 + 2〈A∗(Axα − y), xδα − xα〉≤ 4‖ω‖2α2 + 2〈Axα − y, A(xδα − xα)〉≤ 4‖ω‖2α2 + 2‖Axα − y‖‖A(xδα − xα)‖≤ C‖ω‖2α2 + Cδ2

≤ (Cδ + C‖ω‖α)2 ,

thereby yielding the estimate for the total discrepancy. Finally, for the totalerror, we may use the optimality of xδα to estimate

1

2‖Axδα − yδ‖2 + αR(xδα) ≤ δ2

2+ αR(x†),

which is equivalent to

1

2α‖Axδα − yδ‖2 ≤ δ2

2α+R(x†)−R(xδα).

Therefore,

1

2α‖Axδα − yδ‖2 +Dξδα

(xδα, x†)

≤ δ2

2α+R(x†)−R(xδα) +R(xδα)−R(x†)− 〈ξ, xδα − x†〉

≤ −〈A∗ω, xδα − x†〉+δ2

2α,

82

Page 85: Linear and Nonlinear Heuristic Regularisation for Ill ...

which allows us to estimate the total error was

Dξδα(xδα, x

†) ≤ −〈ω,A(xδα − xα + xα − x†)〉+δ2

2α− 1

2α‖Axδα − yδ‖2

≤ ‖ω‖‖A(xδα − xα)‖+ ‖ω‖‖Axα − y‖+δ2

≤ C‖ω‖δ + C‖ω‖2α +δ2

≤ C

(δ√α

+ ‖ω‖√α

)2

,

which completes the proof [78].

Corollary 5. Let a source condition (3.3) hold. Then we have

Dξ(xδα, x

†) ≤ Dξα(xδα, xα) +Dξ(xα, x†) + C‖ω‖δ, (3.6)

for all α ∈ (0, αmax) and y, yδ ∈ Y .

Proof. It follows from (B.3), with xδα = x1, x† = x2 and xα = x3, that

Dξ(xδα, x

†) = Dξα(xδα, xα) +Dξ(xδα, x

†) + 〈ξα − ξ, xδα − xα〉.

Observe that the last term can be estimated as

〈ξα − ξ, xδα − xα〉 = −〈 1αA∗(Axα − y) + A∗ω, xδα − xα〉

= − 1

α〈Axα − y, A(xδα − xα)〉 − 〈ω,A(xδα − xα)〉

≤ ‖A(xδα − xα)‖(

1

α‖Axα − y‖+ ‖ω‖

)≤ Cδ(C‖ω‖+ ‖ω‖) = C‖ω‖δ,

where we used the estimates of the previous proposition.

Theorem 14. Let a source condition (3.3) hold. Then

Dξ(xδα, x

†)→ 0,

as δ → 0.

Proof. We may estimate the error in three parts, as in (3.6), with the firstand second terms corresponding to the data propagation and approximationerrors, respectively. Subsequently, from Proposition 19, we obtain

Dξ(xδα, x

†) ≤ Cδ2

α+ Cα + Cδ. (3.7)

Thus, choosing α such that δ2/α→ 0 and α→ 0 as δ → 0 yields the desiredresult.

83

Page 86: Linear and Nonlinear Heuristic Regularisation for Ill ...

In this setting, we can also generalise iterated Tikohonov regularisation (2.9)to a convex analogue, namely, Bregman iteration, which is defined as

xδα,k ∈ argminx∈X

1

2‖Ax− yδ‖2 + αDξδα,k−1

(x, xδα,k−1), (3.8)

with ξδα,k−1 ∈ ∂R(xδα,k−1) for k ∈ N.For certain parameter choice rules, we are particularly interested in the secondBregman iterate (cf. [17, 124,154]), which we denote by

xIIα,δ ∈ argminx∈X

1

2‖Ax− yδ‖2 + αDξδα

(x, xδα). (3.9)

Note that the second Bregman iterate may be computed by minimising aslightly simpler expression which does not involve the Bregman distance [153]:

Proposition 20. We may compute the second Bregman iterate as

xIIα,δ ∈ argminx∈X

1

2‖Ax− yδ − (yδ − Axδα)‖2 + αR(x).

for all α ∈ (0, αmax).

Proof. Expanding the second term in (3.9), and using the definition of ξδα,we see that xIIα,δ minimises

1

2‖Ax− yδ‖2 + αR(x)− αR(xδα)− α〈ξδα, x− xδα〉

=1

2‖Ax− yδ‖2 − 〈Axδα − yδ, A(x− xδα)〉+ αR(x)− αR(xδα)

=1

2‖Ax− yδ‖2 − 〈Axδα − yδ, Ax− yδ − (Axδα − yδ)〉+ αR(x)− αR(xδα)

=1

2‖Ax− yδ + Axδα − yδ − Axδα + yδ‖2 − 〈Ax− yδ − (Axδα − yδ), Axδα − yδ〉

+ αR(x)− αR(xδα)

=1

2‖Ax− yδ − (yδ − Axδα)‖2 + 〈Ax− yδ + Axδα − yδ, Axδα − yδ〉

+1

2‖Axδα − yδ‖2 − 〈Ax− yδ − (Axδα − yδ), Axδα − yδ〉

+ αR(x)− αR(xδα)

=1

2‖Ax− yδ − (yδ − Axδα)‖2 + αR(x) + 2‖Axδα − yδ‖2 −R(xδα).

Notice that the last two terms do not depend on x; hence, the result follows.

84

Page 87: Linear and Nonlinear Heuristic Regularisation for Ill ...

For the residual vector with the second Bregman iterate, we stick to thenotation:

pIIα,δ := yδ − AxIIα,δ, pIIα := y − AxIIα ,

for noisy and exact data, respectively. Subsequently, we define:

∆pIIα := pIIα,δ − pIIα .

Proposition 21. We have

‖pIIα,δ‖ ≤ ‖pδα‖, and R(xδα) ≤ R(xIIα,δ), (3.10)

for all α ∈ (0, αmax).

Proof. From the optimality of xIIα,δ, it follows that

1

2‖pIIα,δ‖2 + αDξδα

(xIIα,δ, xδα) ≤ 1

2‖pδα‖2 + αDξδα

(xδα, xδα) =

1

2‖pδα‖2,

i.e.,

1

2‖pIIα,δ‖2 ≤ 1

2‖pδα‖2 − αDξδα

(xIIα,δ, xδα) ≤ 1

2‖pδα‖2,

which follows from the non-negativity of the Bregman distance.

Similarly as for the Tikhonov functional (i.e. (3.2)), we can state the opti-mality condition for the Bregman functional in the same manner:

0 ∈ A∗(AxIIα,δ − yδ − (yδ − Axδα)) + α∂R(xIIα,δ).

It follows immediately as a consequence of the above that one can define aspecific selection of the subgradient of R at xIIα,δ as

ξIIα,δ := − 1

αA∗(AxIIα,δ − yδ − (yδ − Axδα)) ∈ ∂R(xδα

II),

and denote by ξIIα its respective noise-free variant.

Proposition 22. The residuals pδα, pIIα,δ may be expressed in terms of a prox-

imal mapping operator, proxJ : Y → Y ,

proxJ = (I + ∂J )−1, (3.11)

with the functional J := αR∗ 1αA∗ in the following form:

pδα := proxJ (yδ), pIIα,δ := proxJ(y + pδα

)− pδα. (3.12)

85

Page 88: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. It follows from the optimality condition for the Tikhonov functionalthat

∂R(xδα) 3 −A∗(Axδα − yδ)/α ⇐⇒ xδα ∈ ∂R∗(−A∗(Axδα − yδ)/α), (3.13)

since x∗ ∈ ∂R(x) ⇐⇒ x ∈ ∂R∗(x∗) for all x ∈ X and x∗ ∈ X∗ (cf. [32]).Furthermore, (3.13) is equivalent to

Axδα − yδ ∈ A∂R∗(−A∗(Axδα − yδ)/α)− yδ

⇐⇒ (−pδα)− A∂R∗(A∗pδα/α) ∈ −yδ.

We can rewrite the above as

(I + A∂R∗(A∗ · /α))(pδα) ∈ yδ ⇐⇒ (I + α∂(R∗(A∗ · /α)))(pδα) ∈ yδ,

due to the identity A(∂R∗(A∗·)) = ∂(R∗(A∗·)), which holds true if R∗ isfinite and continuous at a point in the range of A∗ (cf. [32, Prop. 5.7]), forinstance, at 0 = A∗0. By a result of Rockafellar [137, Thm. 4C, Thm 7A],this follows if R has bounded sublevel sets, which is a consequence of theassumed coercivity. By definition of the proximal mapping, (3.12) follows forpδα and analogously for pIIα,δ.

3.2 Parameter Choice Rules

Similarly as before, we define a ψ-functional based heuristic parameter choicerule by

α∗ ∈ argminα∈(0,αmax)

ψ(α, yδ). (3.14)

In the linear theory, αmax is usually chosen as ‖A‖2. In the convex case, itschoice is irrelevant for the convergence theory. In practice, one might chooseit similarly as in the linear case.In case no α∗ exists in the defined interval, the reader is referred to [83, (2.13),p. 237], which details how one can extend the definition of ψ to overcome thatissue. For simplicity, we persevere with (3.14) and simply assume existence.The conceptual basis for (3.14) in this setting is that the surrogate functionalψ should act as an error estimator of the form ψ ∼ Dξ(x

δα, x

†), so thatits minimiser should, in theory, be a good approximation of the optimalparameter choice. As in the linear theory, the error estimating behaviouris not guaranteed to hold in general (due to the Bakushinskii veto) unlessrestrictions on the noise are postulated cf. (2.17) in Chapter 2. Since theBregman distance in the linear case corresponds to the norm squared, wehenceforth, in this chapter, redefine the ψ-functionals of before such thatthey coincide in the linear case with the squared versions of the ones in,

86

Page 89: Linear and Nonlinear Heuristic Regularisation for Ill ...

e.g., Chapter 1 and 2 (i.e., e.g., ψHD in this chapter corresponds to ψ2HD of

Chapter 2).As we continue to assume that Y is a Hilbert space and the heuristic dis-crepancy (cf. [79, 80, 154]), Hanke-Raus and heuristic monotone error rulesare defined in the Y space, the expressions for their respective functionalsmay remain unchanged with the consideration that xIIα,δ would be the secondBregman iterate rather than the second Tikhonov iterate:

ψHD(α, yδ) :=1

α‖Axδα − yδ‖2, ψHR(α, yδ) :=

1

α〈AxIIα,δ − yδ, Axδα − yδ〉.

However, the quasi-optimality and simple-L rules were defined in theX space,and since we do not prove convergence in the strong topology of X, perhapsthis would not be the appropriate formulation of the aforementioned rulesin this setting. Indeed, we prove convergence with respect to the Bregmandistance, however, its general non-symmetricity opens the possibility for sev-eral ways of defining the quasi-optimality functional, for instance (cf. [86]).Indeed, we define

ψRQO(α, yδ) := Dξδα(xIIα,δ, x

δα), ψLQO(α, yδ) := DξIIα,δ

(xδα, xIIα,δ), (3.15)

ψSQO(α, yδ) := Dsym

ξIIα,δ,ξδα(xIIα,δ, x

δα), (3.16)

as the right, left and symmetric quasi-optimality functionals, respectively.For the simple-L rule, there also exist a plethora of options, one being

ψL(α, yδ) := R(xIIα,δ)−R(xδα), (3.17)

which we will not explore in this section. However, it does make an ap-pearance in Chapter 7, where we test it numerically, and this is thus farthe author’s preferred definition of ψL in this setting. Of course, this doesnot coincide exactly with ψL as defined in (2.12) in Chapter 2, since there,ψL(α, yδ) = 〈xδα, xδα − xIIα,δ〉 = ‖xδα‖2 − 〈xδα, xIIα,δ〉.Note that we omit ψLQO from the following analysis as in preliminary numer-ical tests, it performed very poorly in comparison to the other rules (whichare tested in Chapter 6) and this was also the conclusion of [86].As in the previous settings, it is also possible to define the ratio rules in theconvex setting; for instance, by replacing the ‖xδα‖2 from the expressions inthe linear setting with R(xδα).Similarly as for the heuristic discrepancy and Hanke-Raus rules, the symmet-ric quasi-optimality functional may also be expressed in terms of residuals:

Proposition 23. We have that

ψSQO(α, yδ) =1

α〈pδα − pIIα,δ, pIIα,δ〉,

for all α ∈ (0, αmax) and yδ ∈ Y .

87

Page 90: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. From the symmetric Bregman distance definition (cf. (B.4)), we have

ψSQO(α, yδ) = Dsym

ξIIα,δ,ξδα(xIIα,δ, x

δα) = 〈ξIIα,δ − ξδα, xIIα,δ − xδα〉

=1

α〈A∗(Axδα − yδ)− A∗(AxIIα,δ − yδ + Axδα − yδ), xIIα,δ − xδα〉

=1

α〈A(xδα − xIIα,δ), AxIIα,δ − yδ〉

=1

α〈Axδα − yδ − (AxIIα,δ − yδ), AxIIα,δ − yδ〉,

for all α ∈ (0, αmax) and yδ ∈ Y , which is what we wanted to show.

The proposition below provides some useful estimates:

Proposition 24. For all α ∈ (0, αmax) and yδ ∈ Y , we have

0 ≤ ψRQO(α, yδ) ≤ ψSQO(α, yδ) ≤ ψHR(α, yδ) ≤ ψHD(α, yδ). (3.18)

Moreover, if (3.3) holds, then

ψHD(α, yδ) ≤(

δ√α

+ 2‖ω‖√α

)2

. (3.19)

In particular, with (3.3) and α∗ selected as (3.14) with ψ ∈ ψHD, ψHR, ψSQO, ψRQO,we have that

limδ→0

ψ(α∗, yδ) = 0. (3.20)

Proof. Since

ψSQO(α, yδ) = ψRQO(α, yδ) +DξIIα,δ(xδα, x

IIα,δ),

it follows immediately that ψSQO ≥ ψRQO. Moreover, from Proposition 23and Proposition 22, we get that

ψSQO(α, yδ) =1

α〈pδα − pIIα,δ, pIIα,δ〉 =

1

α〈pδα, pIIα,δ〉 −

1

α‖pIIα,δ‖2 ≤ 1

α〈pδα, pIIα,δ〉

= ψHR(α, yδ)(3.12)=

1

α

⟨proxJ ∗(y

δ + pδα)− proxJ ∗(yδ), yδ + pδα − yδ

⟩(B.7)

≤ 1

α‖ proxJ ∗(y

δ + pδα)− proxJ ∗(yδ)‖‖pδα‖

(3.10)

≤ 1

α‖pδα‖2 = ψHD(α, yδ).

The last estimate (3.19) follows from (3.5). As the respective α∗ are theminimisers, we may estimate ψ(α∗, y

δ) by the minimiser over α of (3.19),which is easily shown to be of the order of δ and thus tends to 0.

88

Page 91: Linear and Nonlinear Heuristic Regularisation for Ill ...

3.2.1 Convergence Analysis

As observed in the previous chapters, for heuristic rules in the linear case, it isoften standard to show convergence of the selected regularisation parameterα∗ as the noise level tends to zero. This is not necessarily true in the convexcase, as we shall see. The next lemma deals with the (exceptional) case inwhich limδ→0 α∗ 6= 0.

Lemma 10. Assume that A : X → Y is compact. Let α∗ be the minimiser ofψ ∈ ψHD, ψHR, ψSQO, ψRQO. In case of ψSQO, ψRQO, we additionally assumethat R is strictly convex. Suppose that limδ→0 α∗ = α > 0. Then

Dξ(xδα∗ , x

†)→ 0,

as δ → 0.

Proof. We show that any subsequence of Dξ(xδα∗ , x

†) has a convergent sub-sequence with limit 0. In case of ψHD, the result follows from [78, proof ofThm. 3.5] even without the compactness assumption of A. From the sameproof, it follows also that xδα∗ is bounded and similarly, we may show thatxIIα∗,δ is bounded as δ → 0. Hence, there exist weakly (or weakly-*) convergentsubsequences with

xδα∗ v, xIIα∗,δ z.

In [78], it was shown that v = xα, i.e., the limit of xδα∗ is the regularisedsolution for exact data and regularisation parameter α. Now using the com-pactness of A, we may find strongly convergent subsequences for the residualsas δ → 0:

pδα∗ → y − Axα =: pα, pIIα∗,δ → y − Az.

From lower semicontinuity, the minimisation property of xIIα∗,δ, and the strong

convergence of pδα∗ , we obtain that for any x ∈ X

1

2‖Az − y − (y − Axα)‖2 + αR(z)

≤ lim infδ→0

1

2‖AxIIα∗,δ − y

δ − (yδ − Axδα∗)‖2 + α∗R(xδα)

≤ lim infδ→0

1

2‖Ax− yδ − (yδ − Axδα∗)‖

2 + α∗R(x)

=1

2‖Ax− y − (y − Axα)‖2 + αR(x).

Hence, z is the minimiser of the functional on the left-hand side and by itsuniqueness, it follows that z = xIIα and thus pIIα∗,δ → y − AxIIα =: pIIα . Fromthe optimality conditions, we furthermore obtain the strong convergence ofξδα∗ → ξα.

89

Page 92: Linear and Nonlinear Heuristic Regularisation for Ill ...

Consider ψHR: it follows from (B.7) that

1

α∗‖pIIα∗,δ‖

2 ≤ 1

α∗〈pδα∗ , p

IIα∗,δ〉 = ψHR(α∗, y

δ),

and by (3.20), we find that pIIα = 0. Since pIIα = proxJ (y+pα)−pα, it followsthat

proxJ (y + pα) = pα = proxJ (y).

As proxJ is bijective, we obtain that pα = 0. From this, the result follows asin [78, Proof of Thm. 3.5].In case of ψSQO, ψRQO, we have, by (3.20), thatDξδα∗

(xIIα∗,δ, xδα∗)→ 0, and from

the lower semicontinuity, and the strong convergence of the subgradient ξδα,it follows that

Dξα(xIIα , xα) = 0.

By the assumed strict convexity of R, the Bregman distance is strictly pos-itive for nonidentical arguments; thus xIIα = xα ⇒ pα = pIIα . Employing theproximal representation, we thus have that

proxJ (y + pα)− pα = pα ⇔ 2pα + ∂J (2pα) = y + pα

⇔ pα + ∂J (2pα) = y = pα + ∂J (pα)⇔ ∂J (2pα) = ∂J (pα).

The strict convexity implies 2pα = pα, hence pα = 0. The results then followsas before from those in [78].

Noise Restriction As discussed, the auto-regularisation condition (1.38)becomes void in this setting due to the dominating term ψ(α, y − yδ) nolonger making sense when the regularisation operator is nonlinear. In order tograsp this, firstly consider that the ψ functionals can no longer be expressedin the spectral form a la (1.34). Thus, there are no linear filter functionsas such to act on the vector y − yδ. Instead, we stipulate an alternativeauto-regularisation condition, as suggested in [90].In the following, we state the main convergence theorem of this section alongwith the analogous auto-regularisation conditions:

Theorem 15. Let A : X → Y be compact, the source condition (3.3) be sat-isfied, α∗ be the minimiser of ψ ∈ ψHD, ψHR, ψSQO, ψRQO, and assume thereexist constants C > 0 such that the respective auto-regularisation condition

〈∆y −∆pα,∆pα〉 ≤ C‖∆pα‖2, (ARC-HD)

〈∆y −∆pα,∆pα〉 ≤ C〈∆pIIα ,∆pα〉, (ARC-HR)

〈∆y −∆pα,∆pα〉 ≤ C〈∆pα −∆pIIα ,∆pIIα 〉, (ARC-SQR)

1

α〈∆y −∆pα,∆pα〉 ≤ CDξδα

(xIIα,δ, xδα) +O(α), (ARC-RQO)

90

Page 93: Linear and Nonlinear Heuristic Regularisation for Ill ...

holds for all α ∈ (0, αmax) and yδ ∈ Y for the heuristic discrepancy, Hanke-Raus, symmetric quasi-optimality and right quasi-optimality rules. If ψ ∈ψSQO, ψRQO, assume in addition that R is strictly convex. Then it followsthat the method converges; i.e.,

Dξ(xδα∗ , x

†)→ 0,

as δ → 0.

Proof. Take an arbitrary subsequence of Dξ(xδα∗ , x

†). We show that it con-tains a convergent subsequence with limit 0, which would prove the state-ment. Since α∗ is bounded, we may consider a subsequence, which, in ad-dition satisfies α∗ → α. In case α∗ → α > 0, the result limδ→0Dξ(x

δα∗ , x

†) = 0follows from Lemma 10 without even needing to invoke the auto-regularisationconditions. Thus, let us assume that α∗ → 0.Then it follows from the estimate (3.6) and Proposition 19 that one onlyneeds to prove convergence of the data propagation error Dξα(xδα, xα), whichwe can immediately estimate via (3.4) and the respective auto-regularisationcondition in Theorem 15.For α∗ minimising the heuristic discrepancy functional, it follows from Propo-sition 19 and (3.20) that limδ→0 ψHD(α∗, y) = 0 and limδ→0 ψHD(α∗, y

δ) = 0.Thus, we may conclude that

‖∆pα∗‖2

α∗≤(‖pδα∗‖√α∗

+‖pα∗‖√α∗

)2

=(√

ψHD(α∗, yδ) +√ψHD(α∗, y)

)2

→ 0,

as δ → 0. Hence, using (3.4) and (ARC-HD) yields that Dξα∗ (xδα∗ , xα∗)→ 0

as δ → 0. For the approximation error, it follows from Proposition 19 thatDξ(xα∗ , x

†) ≤ Cα∗ → 0. Therefore, each term in (3.6) tends to 0 as δ → 0.Let α∗ be the minimiser of the Hanke-Raus functional. Then, as before, weestimate the Bregman distance as (3.4) and from (ARC-HR), deduce that

Dξα∗ (xδα∗ , xα∗) ≤

C

α∗〈∆pIIα∗ ,∆pα∗〉 =

C

α∗〈pIIα∗ ,∆pα∗〉 −

C

α∗〈pIIα∗,δ,∆pα∗〉

=C

α∗〈pIIα∗ , pα∗〉 −

C

α∗〈pIIα∗ , p

δα∗〉+

C

α∗〈pIIα∗,δ, p

δα∗〉 −

C

α∗〈pIIα∗,δ, pα∗〉

≤ CψHR(α, yδ) + CψHR(α∗, y)− C

α∗〈pIIα∗ , p

δα∗〉 −

C

α∗〈pIIα∗,δ, pα∗〉. (3.21)

The last two terms can be estimated via the Cauchy-Schwarz inequality and(3.10), and are bounded from above by Proposition 19

C1

α∗‖pδα∗‖‖pα∗‖ ≤

1

α∗2ωα∗(δ + 2‖ω‖α∗).

Moreover, the last terms decay to zero as δ → 0 and clearly, the first coupleof terms vanish as the noise decays also, since limδ→0 ψHR(α∗, y

δ) = 0 due to

91

Page 94: Linear and Nonlinear Heuristic Regularisation for Ill ...

(3.20) and from (3.18) and Proposition 19, ψHR(α∗, y) ≤ ψHD(α∗, y) → 0 asδ → 0.For α∗ minimising ψSQO, note that from (3.6) and (ARC-SQR), it remainsto estimate

Dξα∗ (xδα∗ , xα∗) ≤

1

α∗〈pδα∗ − p

IIα∗,δ, p

IIα∗,δ〉+

1

α∗〈pα∗ − pIIα∗ , p

IIα∗〉

− 1

α∗〈pα∗ − pIIα∗ , p

IIα∗,δ〉 −

1

α∗〈pδα∗ − p

IIα∗,δ, p

IIα∗〉

≤ ψSQO(α, yδ) + ψSQO(α∗, y)

− 1

α∗〈pα∗ − pIIα∗ , p

IIα∗,δ〉 −

1

α∗〈pδα∗ − p

IIα∗,δ, p

IIα∗〉.

As before, the result follow very similarly by estimating the first two termsfrom above (cf. Proposition 24) and the “remainder” terms via the Cauchy-Schwarz inequality, the triangle inequality, and (3.10) and all of which vanishas the noise decays.We omit the proof for the right quasi-optimality rule as it is analogous tothe above (and in fact, even simpler as the RHS of its associated auto-regularisation condition (ARC-RQO) vanishes due to (3.20)).

Remark. Note that Theorem 15 holds true if the left-hand side of the auto-regularisation conditions, i.e., 〈∆y −∆pα,∆pα〉 is replaced by Dξα(xδα, xα).Moreover, it is easy to see that for (ARC-HD) it is enough to prove

〈∆pα,∆y〉 ≤ C‖∆pα‖2, (3.22)

for some positive constant C.

Remark. Observe that the heuristic discrepancy, the Hanke-Raus, and thesymmetric quasi-optimality rules can all be expressed in terms of the residualsof the Bregman iteration pδα, p

IIα,δ. The similarity of patterns in the formulae

for ψ may provide a hint that such a larger family of rules could be defined inthe convex case as well, similar as in the linear case; see (2.11) in Chapter 2.

The auto-regularisation condition is an implicit condition on the noise. Onemay observe that it resembles the analogous condition of (1.38) in the linearcase. To gain a better understanding and in particular show that they can besatisfied in practical situations, in Section 3.3, we will derive sufficient andmore transparent conditions in the form of Muckenhoupt-type inequalities(2.17) .

3.2.2 Convergence Rates (for the Heuristic Discrep-ancy Rule)

With the aid of the source condition, auto-regularisation condition and anadditional regularity condition, we can even derive rates of convergence. We

92

Page 95: Linear and Nonlinear Heuristic Regularisation for Ill ...

will do so only for the heuristic discrepancy rule, as per the title of thissection, as rates for the other rules lie beyond the scope of this thesis. Theseresults stem from [90]. We start with the following proposition:

Proposition 25. Suppose that ∂R∗ satisfies the following condition:

x→ 0⇒ ξ ∈ ∂R∗(x) 0. (3.23)

Then, for all positive constants D > 0, there exists another constant D1 > 0such that for all yδ ∈ Y with ‖yδ‖ ≥ D, it holds that

α ≤ D1ψHD(α, yδ) ∀α ∈ (0, αmax). (3.24)

Proof. Suppose that the statement (3.24) is not true. Then there would exista constant D > 0 and a sequence of yδk such that ‖yδk‖ ≥ D, and a sequenceof αk with ∥∥∥∥pδkαkαk

∥∥∥∥ =ψHD(αk, y

δk)

αk→ 0 as k →∞.

Define zk :=pδkαk

αk. From its representation as a proximal mapping, we have

that zk satisfiesyδk = αkzk + A∂R∗(A∗zk).

Thus, from zk → 0, the boundedness of αk, A∗zk → 0, and (3.23), we obtain

0 < D ≤ ‖yδk‖2 = αk〈zk, yδk〉+ 〈∂R∗(A∗zk), A∗yδk〉X∗,X∗∗ → 0,

which yields a contradiction; hence the statement of the proposition is true.

We now state the main convergence rates result:

Proposition 26. Let the source condition (3.3) hold, α∗ minimise ψHD andsuppose the auto-regularisation condition (ARC-HD) is satisfied. Assumethat ‖yδ‖ ≥ C and, in addition that (3.23) holds.Then

Dξ(xδα∗ , x

†) = O(δ),

for δ → 0.

Proof. Note that from Proposition 25 and since α∗ is the global minimiser,for any α, we have that α∗ ≤ CψHD(α∗, y

δ) ≤ CψHD(α, yδ). Observe that

93

Page 96: Linear and Nonlinear Heuristic Regularisation for Ill ...

from (3.6), it follows that

Dξ(xδα∗ , x

†) ≤ Dξα∗ (xδα∗ , xα∗) +

‖w‖2

2α∗ + 6‖w‖δ

(ARC-HD)

≤(√

ψHD(α, yδ) +√ψHD(α∗, y)

)2

+ Cδ + Cα∗

≤(

δ√α

+ C√α + C

√α∗

)2

+ Cδ + Cα∗ via Proposition 19

= O

((δ√α

+√α

)2

+δ2

α+ α + δ

)since α∗ ≤ CψHD(α, yδ),

= O (δ) ,

choosing α = α(δ) = δ.

Note that if the source condition (3.3) holds, α∗ is selected as the minimiserof ψ ∈ ψHR, ψSQO, ψRQO and the respective auto-regularisation conditions((ARC-HR),(ARC-SQR), or (ARC-RQO)) are satisfied, and additionally,that for some µ > 0

αµ∗ ≤ Cψ(α∗, yδ), (3.25)

then one may also prove, analogously that

Dξ(xδα, x

†) = O(δ1µ ),

for δ → 0. In the linear theory, the inequality (3.25) corresponds to (12) fromChapter 2, and we recall that under certain additional regularity conditionson x†, e.g., (2.26), optimal convergence rates are obtained (cf. Theorem 6).We note that the condition on ∂R∗, (3.23), holds if R∗ is continuously differ-entiable in 0 with ∇R(0) = 0. This is true, for instance, for R∗(x) = ‖x‖p`pwith 1 < p < ∞. On the other hand we observe that the conclusion inProposition 25 is not satisfied for exact penalisation methods [21] (such as`1-regularisation) as then the residual pα (and hence ψHD) could vanish fornonzero α, which does not concur with the estimate (3.24).

3.3 Diagonal Operator Case Study

This section, as the rest of the chapter, is from the paper [90]. In the followinganalysis, we consider the case of the operator A : X → Y being diagonalbetween spaces of summable sequences; in particular, X = `q(N), with 1 <q < ∞, and Y = `2(N), and the regularisation functional is selected as the`q-norm to the q-th power divided by q. The main objective in this settingis to state sufficient conditions for the auto-regularisation conditions in theform of Muckenhoupt-type inequalities similar to (2.17) and to illustrate theirvalidity for specific instances.

94

Page 97: Linear and Nonlinear Heuristic Regularisation for Ill ...

Let enn∈N be the canonical (i.e., Cartesian) basis for X and also Y , and letλnn∈N be a sequence of real (and for simplicity) positive scalars monoton-ically decaying to 0. Then we define a diagonal operator A : `q(N)→ `2(N),

Aen = λnen. (3.26)

The regularisation functional is chosen as the q-th power of the `q-norm(divided by q):

R :=1

q‖ · ‖q`q with ∂R(x) = |xn|q−1 sgn(xn)n∈N, q ∈ (1,∞).

(3.27)As we assume q > 1, the choice of sgn at 0 is not relevant and we may assumesgn(0) = 0 throughout. In the present situation, the problem decouples andthe components of the regularised solution can be computed independentlyof each other. Thus, for notational purposes, one may omit the sequenceindex n for the components of the regularised solutions and write

xδα =: xδα,nn =: x δαn,xα =: xα,nn =: xαn,

yδ =: yδn,y =: yn,

where xα, x δα , yδ, y ∈ R. As the problem decouples, xα and x δα can be computedby an optimisation problem on R, i.e., the optimality conditions read

x δα + γn|x δα |q−1 sgn(x) =yδ

λn, with γn :=

α

λ2n

,

and similar expressions hold for x IIα,δ := xIIαn,δn . Because the term |x δα |q−1

is homogeneous of degree q − 1, by an appropriate scaling we can furthersimplify expressions: define the components of pδα, pα as

pδα := yδ − λnx δα , pα := y − λnxα,

and we use the expressions y , yδ, ∆y , ∆pα, ∆pIIα to denote the components ofyδ, y, ∆y, ∆pα, ∆pIIα , respectively, where we again omit the sequence indexn in the notation:

yδ = yδn,

y = yn,

∆y : = yn − yδn,∆pα : = pαn − pδαn,

∆pIIα : = pIIαn − pIIα,δ,

95

Page 98: Linear and Nonlinear Heuristic Regularisation for Ill ...

for n ∈ N.Letting

hq(x) := x+ |x|q−1 sgn(x), x ∈ R, ηn := γ1

2−qn =

λ2n

) 12−q

,

Φq(y) := h−1q (y),

we obtain via some simple calculation that

x δα = ηnΦq(yδ

ηnλn), x IIα,δ = ηnΦq

(yδ

ηnλn,+Φq∗(

ηnλn)), (3.28)

pδα = λnηnΦq∗(yδ

ηnλn), pIIα,δ = λnηn

(Φq∗

(yδ

ηnλn+ Φq∗(

ηnλn))− Φq∗(

ηnλn)),

(3.29)

where q∗ is the conjugate index to q. Note that Φq corresponds to a proximaloperator on R. For x > 0 we have

xq−1 ≤ hq(x). (3.30)

Moreover, Φq is monotonically increasing and it is not difficult to verify thatfor any 1 < q <∞,

x 7→ Φq∗(x1

2−q )q∗−2 is increasing for x > 0. (3.31)

We now state useful estimates for the function Φq:

Lemma 11. For 1 < q < ∞, q 6= 2, there exist constants Dp, Dq, and forany τ > 0, a constant Dq,τ , such that for all x1 > 0 and |x2| ≤ x1,

1

1 +DpΦq(x1)q−2≤ Φq(x1)− Φq(x2)

x1 − x2

≤ 1

1 +DpΦq(x1)q−2, (3.32)

1

1 +DpΦq(x1)q−2≤

1

Dq

x2−qq−1

1 if 1 < q < 2,

1

Dq,τ

x2−qq−1

1 if q > 2 and ∀x1 > τ.

(3.33)

Proof. For any z1 > 0, |z2| ≤ z1, we have

hq(z1)− hq(z2)

z1 − z2

= 1 +zq−1

1 − |z2|q−1 sgn(z2)

z1 − z2

= 1 + zq−21 k

(z2

z1

) ≤ 1 + zq−2

1 Dp,

≥ 1 + zq−21 Dp,

where

Dp ≤ k(z) :=1− |z|q−1 sgn(z)

1− z≤ Dp, ∀z ∈ [−1, 1].

96

Page 99: Linear and Nonlinear Heuristic Regularisation for Ill ...

Replacing zi by Φq(xi) yields the lower bound and the first upper bound in(3.32). In case 1 < q < 2, we find that

1

1 +DpΦq(x1)q−2=

Φq(x1)2−q

Φq(x1)2−q +Dp

≤ x2−qq−1

1

Dp

,

where we used that Φq(x1)2−q ≥ 0 in the denominator and the estimate

Φq(x1) ≤ x1q−1

1 that follows from (3.30). Now consider the case q > 2. Then

1

1 +DpΦq(x1)q−2≤ C

2−qq−1τ

Dp

x2−qq−1

1 ∀x1 ≥ τ,

where we used the estimate hq(x) ≤ Cτxq−1 for x ≥ hq(τ). This yields the

result.

3.3.1 Muckenhoupt Conditions

As in the rest of this section, we follow the example of [90] very closely. In casethe forward operator is diagonal, we may specialise the auto-regularisationcondition to Muckenhoupt-type inequalities [83, 87, 88] similar to the linearcase, cf. (2.17) in Chapter 2. If we consider A : X → Y a compact operatorand R = ‖ · ‖2

`2 , then the Muckenhoupt-type conditions take the followingform, with some t ∈ 1, 2: There exist constants C1, C2 such that for all ∆y

∑n:

σ2nα≥C1

|∆y |2 ασ2n

≤ C2

∑n:

σ2nα<C1

|∆y |2(σ2n

α

)t−1

, (3.34)

for all α ∈ (0, αmax), where ∆y = 〈yδ − y, un〉 with un the eigenvectors andσ2n the eigenvalues of AA∗. The integer t is taken as t = 1 for the heuristic

discrepancy and Hanke-Raus rules and t = 2 for the quasi-optimality rules,which is analogous with the theory of Chapter 2, where the Np condition(2.17) held for the HD and HR rules with p = 1 and for the quasi-optimalityrule with p = 2. Here, t plays the role of p. In this case, the linear auto-regularisation conditions hold for the respective rules and one can proveconvergence of the method. The Muckenhoupt inequalities hold in manysituations; see, e.g., [87, 88].In order to realise the insight (3.34) provides, one can observe that the right-hand side of the inequality (3.34) represents the high frequency componentsof the noise. Thus, in order for this upper bound to hold, one can interpretthis as requiring that the noise be sufficiently irregular. This is analogouswith the requirements of Chapter 2. In the case of the diagonal setting above,we have that σn = λn and the definition of ∆y agrees with that in (3.34).

97

Page 100: Linear and Nonlinear Heuristic Regularisation for Ill ...

For later reference, we define the following sequence of positive numbers:

θq,n := λqn max|y |, |yδ|2−q. (3.35)

Then the following theorem provides a sufficient condition for the auto-regularisation condition to hold:

The Heuristic Discrepancy Rule

Theorem 16. Let A be a diagonal operator (3.26) and R the regularisationfunctional in (3.27) with q ∈ (1,∞). Suppose that for some constant C1,there exists a constant C2 such that for all yδ and α ∈ (0, αmax)∑

n:θq,nα≥C1

|∆y |2 α

θq,n≤ C2

∑n:

θq,nα<C1

|∆y |2. (3.36)

Then the auto-regularisation condition (ARC-HD) holds for the heuristic dis-crepancy principle.

Proof. For A being a diagonal operator, the condition to prove, (3.22), maybe rewritten as ∑

n∈N

∆pα∆y ≤ C∑n∈N

|∆pα|2. (3.37)

Let IHD ⊂ N be a set of indices where, for some fixed constants β1, β2, itholds that

n ∈ IHD ⇒ |∆y | ≤ β1|∆pα|, (3.38)

n 6∈ IHD ⇒ |∆pα| ≤ β2|∆y | αθq,n

. (3.39)

Note that we construct this set in the proceeding. We first show that for(3.37), it is sufficient that there exists a constant C2 such that∑

n6∈IHD

|∆y |2 α

θq,n≤ C2

∑n∈IHD

|∆y |2. (3.40)

Indeed, splitting the sum in (3.37) into two parts and using (3.39), (3.38),and (3.40), and noting that ∆pα always has the same sign as ∆y , we obtain∑

n∈N

∆pα∆y =∑n6∈IHD

∆pα∆y +∑n∈IHD

∆pα∆y

≤ β2

∑n6∈IHD

|∆y |2 α

θq,n+ β1

∑n∈IHD

|∆pα|2

≤ β2C2

∑n∈IHD

|∆y |2 + β1

∑n∈IHD

|∆pα|2 ≤ (β2C2β21 + β1)

∑n∈IHD

|∆pα|2.

98

Page 101: Linear and Nonlinear Heuristic Regularisation for Ill ...

The Lipschitz continuity of the proximal mapping |∆pα| ≤ |∆y | now implies(3.37). Note that (3.36) has the form of (3.40) with

IHD := n :θq,nα

< C1.

Thus, it remains to verify that for this index set, there exist constants β1, β2

for which (3.39) and (3.38) hold.

We note that by monotonicity, the ratio∆pα∆y is always positive and invariant

when y , yδ are switched and when y , yδ are replaced by −y , −yδ. Thus,without loss of generality, we may assume that y > 0 and |yδ| ≤ y such thaty = max|y |, |yδ|. Using this convention, noting that

λnηn =( αλq

) 12−q

and the definition of θq,n, (3.35), we have that(y

λnηn

)2−q

=max|y |, |yδ|2−qλqn

α=θq,nα. (3.41)

Thus for n ∈ IHD and by (3.31), we have

Φq∗

(y

λnηn

)q∗−2

≤ Φq∗

(C

12−q1

)q∗−2

.

Using (3.29), Lemma 11 with

x1 =y

λnηn=

max|y |, |yδ|λnηn

and x2 =yδ

λnηn,

we find that for n ∈ IHD

∆pα∆y≥ 1

1 +DpΦq∗

(y

λnηn

)q∗−2 ≥1

1 +DpΦq∗

(C

12−q1

)q∗−2 > 0, (3.42)

which verifies (3.38).In view of (3.39), let n 6∈ IHD, then as θq,n

α≥ C1, we conclude that

yλnηn

≥ C1

2−q1 if q ≤ 2 ( i.e. q∗ > 2). (3.43)

Applying Lemma 11 with Φq∗ , x1 =y

λnηn, we observe that the conditions on

the right-hand side of (3.33) hold true by (3.43). Noting that the exponentsin (3.32) satisfy 2−q∗

q∗−1= q − 2, we obtain the upper bound

∆pα∆y≤ C

(y

λnηn

) 2−q∗q∗−1

= C

(y

λnηn

)q−2

= Cα

θq,n, (3.44)

which verifies (3.39) and thus completes the proof.

99

Page 102: Linear and Nonlinear Heuristic Regularisation for Ill ...

The Hanke-Raus Rule

Contrary to the heuristic discrepancy case, we have to impose a restriction onthe regularisation functional exponent q in order to keep certain expressionspositive:

Lemma 12. If q ≥ 32, then it follows that

∆pIIα ∆pα ≥ 0,

for all α ∈ (0, αmax) and yδ ∈ Y .

Proof. Setting z1 = yδ

ηnλn, z2 = y

ηnλn, noting (3.29) and the identity

Φq∗ (z1 + Φq∗(z1)) = Φq∗ (hq∗(Φq∗(z1)) + Φq∗(z1)) ,

it is enough to verify that the mapping

F : p 7→ Φq∗ (hq∗(p) + p)− p,

is monotonically increasing. As this function is differentiable everywhereexcept at p = 0, it suffices to prove the inequality

0 ≤ F ′(p) =2 + (q∗ − 1)|p|q∗−2

1 + (q∗ − 1)|Φq∗ (hq∗(p) + p) |q∗−2− 1,

for any p ∈ R. Since F is antisymmetric and hence F ′ is symmetric,it is in fact sufficient to prove this inequality for p > 0. Setting r =Φq∗ (hq∗(p) + p) ≥ p, we thus have to show that

2 + (q∗ − 1)|p|q∗−2

1 + (q∗ − 1)|r|q∗−2≥ 1, where hq∗(r) = hq∗(p) + p. (3.45)

Defining the number ζ implicitly by hq∗(ζp) = hq∗(p) + p, (i.e., r = ζp), itfollows that ζ ∈ [1, 2] and that pq

∗−2 = 2−ζζq∗−1−1

. Plugging this formula and

that for r into the inequality (3.45), we obtain that monotonicity holds if

(q∗ − 1)(2− ζ)(ζq

∗−2 − 1)

ζq∗−1 − 1≤ 1, ∀ζ ∈ [1, 2]. (3.46)

Some detailed analysis reveals that this is satisfied for q∗ ≤ 3, which isequivalent to q ≥ 3

2.

The next lemma is needed to estimate a term in (ARC-HR).

Lemma 13. Let n ∈ N be such that

θq,nα≤ C1, (3.47)

with a constant C1 that is sufficiently small. Then there is a constant β1

depending on C1 but independent of n with

∆pα∆y ≤ β1∆pIIα ∆pα.

100

Page 103: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. Define

x1 =y

ηnλn+ Φq∗(

yηnλn

) =y+pαηnλn

, x2 =yδ

ηnλn+ Φq∗(

ηnλn) =

yδ+pδαηnλn

.

From the definition of pα, pδα it follows that sgn(x1) = sgn(y), sgn(x2) =sgn(yδ) and x1, x2 are increasing functions of pα, pδα, respectively. Thus thefollowing ratio

RII :=∆pIIα + ∆pα∆pα + ∆y

=Φq∗(x1)− Φq∗(x2)

x1 − x2

,

is always positive and, moreover, invariant when x1, x2 are switched and,respectively, replaced by −x1,−x2. Thus, we may assume without loss ofgenerality (otherwise we redefine the variables x1, x2) that x1 > 0 and |x2| ≤x1, which is equivalent to y > 0 and |yδ| ≤ y . Applying Lemma 11 yieldsthen

RII =∆pIIα + ∆pα∆pα + ∆y

=Φq∗(x1)− Φq∗(x2)

x1 − x2

≥ 1

1 +DpΦq∗(x1)q∗−2.

It follows from (3.47) and y ≤ y + pα ≤ 2y that

x2−q1 =

(y + pαληn

)2−q

≤(C4

yληn

)2−q

=

(C4θq,nα

)2−q

= C2−q4 C1,

where C4 ∈ 2, 1 depending whether q > 2 or q < 2. In any case, we obtainwith (3.31) that as before,

Φq∗

(y + pαλnηn

)q∗−2

≤ Φq∗

(C4C

12−q1

)q∗−2

,

and hence

RII ≥ 1

1 +DpΦq∗

(C4C

12−q1

)q∗−2 .

Some standard calculus furthermore reveals that

limC1→0

Φq∗

(C4C

12−q1

)q∗−2

= 0.

Thus we may choose C1 sufficiently small such that

DpΦq∗

(C4C

12−q1

)q∗−2

≤ θ < 1, (3.48)

101

Page 104: Linear and Nonlinear Heuristic Regularisation for Ill ...

as then RII ≥ 11+θ

> 12. From this inequality and using Lipschitz continuity

of the residuals, |∆pα| ≤ |∆y |, we find that

∆pIIα ∆pα = RII(|∆pα|2 + ∆y∆pα

)− |∆pα|2

≥ 1

1 + θ∆y∆pα − (1− 1

1 + θ)|∆pα|2

≥ (2

1 + θ− 1)∆y∆pα.

This completes the proof.

Theorem 17. Let A and R be as in Theorem 16 with 32≤ q <∞. Suppose

that there is a sufficiently small constant C1, and a constant C2 such that forall yδ, (3.36) holds. Then the auto-regularisation condition (ARC-HR) holdsfor the Hanke-Raus rule.

Proof. As in the HD-case, we define a set of indices IHR with the property

n ∈ IHR ⇒ ∆pα∆y ≤ β1∆pIIα ∆pα and ∆y ≤ β2∆pα, (3.49)

n 6∈ IHR ⇒ |∆pα| ≤ β3|∆y | αθq,n

. (3.50)

Then, sufficient for (ARC-HR) is that a constant C2 exists with∑n6∈IHR

|∆y |2 α

θq,n≤ C2

∑n∈IHR

|∆y |2. (3.51)

This can be seen as follows:∑n∈N

(∆pα∆y − |∆pα|2

)≤∑n∈IHR

∆pα∆y +∑n6∈IHR

∆pα∆y

≤ β1

∑n∈IHR

∆pIIα ∆pα + β3

∑n6∈IHR

|∆y |2 α

θq,n

≤ β1

∑n∈IHR

∆pIIα ∆pα + β3C2

∑n∈IHR

|∆y |2 ≤ (β1 + β3C2β2β1)∑n∈IHR

∆pIIα ∆pα

≤ (β1 + β3C2β2β1)∑n∈N

∆pIIα ∆pα,

where we used that ∆pIIα ∆pα ≥ 0 in the last step. Hence (ARC-HR) followsfrom (3.51).Note that (3.36) has the form (3.51) with

IHR := n :θq,nα≤ C1,

and C1 sufficiently small. We have already shown that for such indices, ∆y ≤β2∆pα holds and for n 6∈ IHR, (3.50) holds. Moreover, from Lemma 13 it fol-lows that on IHR also ∆pα∆y ≤ β1∆pIIα ∆pα holds. Thus collecting these re-sults yields that (3.36) implies the auto-regularisation condition (ARC-HR).

102

Page 105: Linear and Nonlinear Heuristic Regularisation for Ill ...

The smallness condition on C1 is given by (3.48).

The Symmetric Quasi-Optimality Rule

Similar as for the Hanke-Raus rule, we first have to verify the nonnegativityof certain expressions:

Lemma 14. If q ≥ 32, then

(∆pα −∆pIIα )∆pIIα ≥ 0,

for all α ∈ (0, αmax) and y , yδ ∈ R.

Proof. Recall the mapping F : pα 7→ pIIα defined in the proof of Lemma 12.In order to prove the statement, it is enough to show that

((p1 − F (p1))− (p2 − F (p2)) (F (p1)− F (p2)) ≥ 0 ∀p1, p2.

It is not difficult to see that if F is monotone and Lipschitz continuous, thenthis inequality holds true. Thus, we have to prove that

0 ≤ F ′(p) ≤ 1, ∀p.

As in the proof of Lemma 12, we may employ the variable ζ ∈ [1, 2], wherealso the inequality 0 ≤ F ′(p) was verified for q ≥ 3

2. The additional condition

F ′(p) ≤ 1 leads to ζq∗−2 ≥ 1

2, which holds for any q∗ ≥ 1. This shows the

result.

Theorem 18. Let A and R be as in Theorem 16 with 32≤ q <∞. Suppose

that there are constants C1, C2, C3, with C1 sufficiently small, such that forall yδ and α ∈ (0, αmax)∑n:

θq,nα≥C1

|∆y |2 α

θq,n+

∑n:

θq,nα≤C1∩Ic2

|∆y |2 ≤ C2

∑n:

θq,nα

<C1∩I2

[θq,nα

] 1q−1

|∆y |2,

(3.52)

whereI2 =

n ∈ N : |∆pα −∆y | ≤ C3|∆pα −∆pIIα |

.

Then the auto-regularisation condition (ARC-SQR) holds.

Proof. We define an index set ISOR with the property that

n ∈ ISOR ⇒ |∆pα −∆y | ≤ β1|∆pα −∆pIIα | and |∆pα| ≤ β2|∆pIIα |. (3.53)

Then, for (ARC-SQR) it is sufficient to prove that∑n6∈ISOR

(∆y −∆pα)∆pα ≤ C∑

n∈ISOR

(∆pα −∆pIIα )∆pIIα . (3.54)

103

Page 106: Linear and Nonlinear Heuristic Regularisation for Ill ...

This can be seen as in the previous cases since the sum∑

n∈ISOR(∆y −∆pα)∆pα

can be bounded by∑

n∈ISOR(∆pα −∆pIIα )∆pIIα by the definition of ISOR and

the sum∑

n6∈ISORon the right can be bounded from below by 0 according to

Lemma 14.Now take

ISOR = IHR ∩ I2.

Similar as for the Hanke-Raus rule, the inequality |∆pα| ≤ β2|∆pIIα | holdstrue on IHR with C1 sufficiently small, thus ISOR satisfies the requirements(3.53). Thus, it remains to show that the stated condition (3.52) implies(3.54). The left-hand side in (3.54) can be bounded from above by∑n6∈ISOR

(∆y −∆pα)∆pα ≤∑n/∈IHR

(∆y −∆pα)∆pα +∑

nIHR∩Ic2

(∆y −∆pα)∆pα

≤∑n/∈IHR

|∆y |2 α

θq,n+

∑n∈IHR∩Ic2

|∆y |2,

where we used the estimates (3.39) on the complement of IHR and theLipschitz-continuity of the proximal mapping for the second sum:

(∆y −∆pα)∆pα ≤ |∆y ||∆pα| ≤ |∆y |2.

Thus, the left-hand side of (3.52) serves as an upper bound for the left-handside of (3.54).The sum on the right-hand side of (3.54) can be bounded from below asfollows: the summation index n is in ISQO ⊂ IHR, hence

(∆pα −∆pIIα )∆pIIα ≥ |∆pα −∆y ||∆pα| ≥ β1(∆pα −∆y)∆y

≥ β1|∆y |2(

∆pα∆y− 1

)≥ β1|∆y |2

1

1 +DpΦq∗

(y

λnηn

)q∗−2 − 1

= β1|∆y |2DpΦq∗

((θq,nα

) 12−q)q∗−2

1 +DpΦq∗

((θq,nα

) 12−q)q∗−2

≥ β1|∆y |2

C

1 +DpΦq∗

(C

12−q1

)q∗−2

(θq,nα

) 1q−1

,

where we used (3.41), (3.42), and a bound for z > 0 on ISQO of the form

Φq∗

(z

12−q

)q∗−2

≥ C ′z1q−1 ,

104

Page 107: Linear and Nonlinear Heuristic Regularisation for Ill ...

that can be obtained by similar means as above. Thus, the right-hand sideof (3.52) is a lower bound for the right-hand side of (3.54). Together, (3.52)implies (3.54) and thus the desired auto-regularisation condition.

Remark. The condition in (3.52) has an additional sum over the index setIHR ∩ Ic2 on the left-hand side. It might be possible to prove that this set isempty, e.g., if IHR ⊂ I2. Then the corresponding sum would vanish, and thishappens in the linear case (q = 2). However, we postpone a more detailedanalysis of this issue to the future.We also point out that the Muckenhoupt-type conditions (3.36), and (3.52)(except for the additional sum) agree with the respective ones for the linearcase q = 2 so that they appear, in fact, as natural extensions of the linearconvergence theory.

Case Study of Noise Restrictions

For the cases that the operator ill-posedness, the regularity of the exactsolution and the noise show some typical behaviour, we investigate the re-strictions that the Muckenhoupt-type condition (3.36) impose on the noise.In particular, we would like to point out that the restrictions are completelyrealistic and they are satisfied in paradigmatic situations.Consider a polynomially ill-posed problem, with a given decay of the exactsolution and a polynomial decay of the error:

λn =D1

nβ, |y | = D2

nν, ∆y = δsn

1

nκ,

for ν > β > 0, 0 < κ < ν, and sn ∈ −1, 1.The restrictions κ < ν, ν > β > 0 are natural as the noise is usually lessregular than the exact solution and the exact solution has higher decay ratesthan λn due to regularity. In the linear case, Muckenhoupt-type conditionslead to restrictions on the regularity of the noise, i.e., upper bounds for thedecay rate κ. This is perfectly in line with their interpretation as conditionsfor sufficiently irregular noise.In the following, we write ∼ if the left and right expressions can be estimatedby constants independent of n. (There may be a q-dependence, however).The numbers θq,n that appear in (3.36) now read as

θq,n := max|y |, |yδ|2−qλqn = max1, |yδ|/|y |q−2|y |2−qλqn

∼ 1

nβq+ν(2−q) max1, |1 +snδ

C2

nν−κ|2−q.

We additionally impose the restriction that for sufficiently large n, θq,n → 0monotonically. If 2− q > 0, this is trivially satisfied, while for 2− q < 0, werequire that

βq + κ(2− q) > 0, if 2− q < 0. (3.55)

105

Page 108: Linear and Nonlinear Heuristic Regularisation for Ill ...

Under these assumptions, for any α sufficiently small, we find an n∗ suchthat θq,n = C1α and θq,n ≤ C1α for n ≥ n∗. Expressing α in terms of θq,nyields a sufficient condition for (3.36), as

θq,n∗n∗∑n=1

|∆y |2

θq,n≤ C

∞∑n=n∗+1

|∆y |2 ∼ 1

n∗2κ−1. (3.56)

By the straightforward estimate max1, |1 + snδC2nν−κ| ∼ 1 + δnν−κ, the

inequality (3.56) reduces to

(1 + δn∗ν−κ)2−q

n∗βq+ν(2−q)−2κ

n∗∑n=1

nβq+ν(2−q)−2κ

(1 + δnν−κ)2−q ≤ Cn∗. (3.57)

For any x ≥ 0 and 0 ≤ z ≤ 1, it holds that

1 ≤ 1 + x

1 + zx≤ 1

z.

We use this inequality with z = nn∗

and x = δn∗. Then, we obtain thesufficient conditions

1

n∗βq+ν(2−q)−2κ−(2−q)(ν−κ)

n∗∑n=1

nβq+ν(2−q)−2κ−(2−q)(ν−κ) ≤ Cn∗, 2− q > 0,

1

n∗βq+ν(2−q)−2κ

n∗∑n=1

nβq+ν(2−q)−2κ ≤ Cn∗, 2− q < 0.

These inequalities are satisfied if the exponent for n is strictly larger than−1. This finally leads to the restrictions

κ ≤ β +1

q, q < 2,

κ ≤ q

2β +

2− qq

ν +1

2, q > 2.

Note that for q > 2, we additionally require (3.55).We hope to have the reader convinced that the imposed conditions on thenoise are not too restrictive and, in particular, the set of noise that satis-fies them is nonempty. These conditions provide a hint for which cases themethods may work or fail:In case 3

2≤ q ≤ 2, both the heuristic discrepancy and the Hanke-Raus are

reasonable rules. The conditions on the noise are less restrictive the smallerq is.In case 1 < q < 3

2, our convergence analysis only applies to the heuristic

discrepancy rule, as the nonnegativity condition of the Hanke-Raus rule is

106

Page 109: Linear and Nonlinear Heuristic Regularisation for Ill ...

not guaranteed in this case. It could be said that the heuristic discrepancyrule is the more robust one then.In case q > 2, we observe that the restriction on the noise depends on theregularity of the exact solution. For highly regular exact solutions (ν 1)the noise condition might fail to be satisfied as q becomes very large. Thishappens for both the heuristic discrepancy, and the Hanke-Raus rules.We did not include the quasi-optimality condition in this analysis as it stillrequires further analysis. However, the conditions for it are usually even morerestrictive than for the Hanke-Raus rules and we expect similar problems forthe case q > 2.

107

Page 110: Linear and Nonlinear Heuristic Regularisation for Ill ...

Chapter 4

Iterative Regularisation

In this chapter, we initially revert to treating linear ill-posed problems throughan iterative method, namely Landweber iteration (cf. [97]) which the readermight recall from Section 1.3.2. This presents an alternative approach toTikhonov regularisation which we covered quite thoroughly in the precedingchapters. In the second section of this chapter, we then cover the basics ofLandweber iteration for nonlinear forward operators.

4.1 Landweber Iteration for Linear Opera-

tors

Let A : X → Y be a linear operator mapping between Hilbert spaces. Thenrecall that we may solve the linear problem Ax = y via Landweber iteration(1.24), more specifically defined as the iterative procedure:

xδk = xδk−1 + ωA∗(yδ − Axδk−1), k ∈ N (4.1)

with an initial guess xδ0 := x0, which we take as x0 = 0, and a relaxation (i.e.,step-size) parameter ω ∈ (0, ‖A‖−2]. Note that without loss of generality, wemay assume ‖A‖ < 1 and thence drop the parameter ω.In terms of spectral theory, which we will use for our analysis as we did forTikhonov regularisation, the Landweber iterates may be expressed as

xk =

∫ ∞0

gk(λ) dEλA∗y, (4.2)

with the filter function this time defined as

gk(λ) :=k−1∑j=0

(1− λ)j,

i.e.,

xk =k−1∑j=0

(I − A∗A)jA∗y,

108

Page 111: Linear and Nonlinear Heuristic Regularisation for Ill ...

where the second equality follows from the geometric sum formula and ourassumption that ‖A‖ < 1. Incidentally, the filter function in (4.2) may alsobe expressed as

gk(λ) =1− (1− λ)k

λ, (0 < λ < 1)

and consequently we can write the filter function for the associated residualas

rk(λ) = (1− λ)k. (4.3)

We now state the following proposition from [35, Theorem 6.1, p. 155] whichproves that Landweber iteration is a convergent regularisation method:

Proposition 27. If y ∈ domA†, then xk → x† as k → ∞. If y /∈ domA†,then ‖xk‖ → ∞ as k →∞.

Proof. For y ∈ domA†, we have

xk − x† = rk(A∗A)x† = (I − A∗A)kx†,

which follows from the definition of the residual filter function, cf. (4.3).Moreover, due to our assumption that ‖A‖ < 1, we have that for all λ ∈ (0, 1):

|λgk(λ)| = |λk−1∑j=0

(1− λ)j| = |1− (1− λ)k| ≤ C,

andλgk(λ) = 1− (1− λ)k → 1,

as k →∞ for λ > 0. Hence,

gk(λ)→ 1

λ, (λ > 0)

as k →∞. The rest of the proof then follows a la [35, Theorem 4.1, p.72].

The observant reader will have noticed that the above proposition is merelya particular instance of Theorem 3.Now, in the next proposition, we provide the standard error estimates [35]:

Proposition 28. Assume that x† satisfies the source condition (2.5). Thenwe have

‖xδk − xk‖ ≤ C√kδ, ‖xk − x†‖ ≤ C(µ, ω)(k + 1)−µ,

‖A(xδk − xk)‖ ≤ 2δ, ‖Axk − y‖ ≤ C(µ, ω)(2µ+ k + 1)−µ−1√k,

and consequently

‖xδk − x†‖ ≤ C√kδ + C(µ, ω)(k + 1)−µ,

‖Axδk − yδ‖ ≤ δ + C(µ, ω)(2µ+ k + 1)−µ−1√k,

for all k ∈ N and y, yδ ∈ Y .

109

Page 112: Linear and Nonlinear Heuristic Regularisation for Ill ...

Proof. We begin by following the example of [35, Lemma 6.2, p. 156] by firstestimating the data propagation error as

‖xδk − xk‖ = ‖k−1∑i=0

(I − A∗A)iA∗(yδ − y)‖ ≤ ‖k−1∑i=0

(I − A∗A)iA∗‖‖yδ − y‖.

Moreover, recalling that Rk define our family of regularisation operators,the first term in the product can be estimated as

‖k−1∑j=0

(I − A∗A)iA∗‖2 = ‖Rk‖2 = ‖RkR∗k‖

= ‖k−1∑i=0

(I − A∗A)i(I − (I − A∗A)k)‖

= ‖k−1∑i=0

(I − A∗A)i −k−1∑i=0

(I − A∗A)i(I − A∗A)k‖

≤ ‖k−1∑i=0

(I − A∗A)i‖+ ‖k−1∑i=0

(I − A∗A)i‖‖(I − A∗A)k‖

≤ 2‖k−1∑i=0

(I − A∗A)i‖ ≤ 2k,

which allows us to prove the first estimate. We proceed to estimate theapproximation error [35, Theorem 6.5, p. 159]:

‖xk − x†‖2 = ‖(I − A∗A)kx†‖2 =

∫ 1+

0

(1− λ)k d‖Eλx†‖2

=

∫ 1+

0

λ2µ(1− λ)k d‖Eλω‖2 ≤(

µ

µ+ k

)2µ ∫ 1+

0

d‖Eλω‖2

=

(k + 1)−2µ‖ω‖2, if µ ≤ 1,

µ2µ(k + 1)−2µ‖ω‖2, if µ > 1.

The residual with exact data may be estimated as

‖Axk − y‖2 = ‖(I − AA∗)ky‖2 = ‖A(I − A∗A)kx†‖2

=

∫ 1+

0

λ2µ+1(1− λ)k d‖Eλω‖2

≤(

k

2µ+ 1 + k

)(2µ+ 1

2µ+ 1 + k

)2µ+1

‖ω‖2

= (2µ+ 1)2µ+1(2µ+ k + 1)−2µ−2k‖ω‖2.

110

Page 113: Linear and Nonlinear Heuristic Regularisation for Ill ...

And

‖A(xδk − xk)‖ = ‖Agk(A∗A)A∗(yδ − y)‖ = ‖AA∗gk(AA∗)(yδ − y)‖= ‖(I − (I − AA∗)k)(yδ − y)‖ ≤ δ + ‖(I − AA∗)k(yδ − y)‖ ≤ 2δ.

Finally, the residual with noise can be estimated as follows:

‖Axδk − yδ‖ = ‖(I − AA∗)kyδ‖ = ‖(I − AA∗)k(yδ − y + y)‖≤ ‖(I − AA∗)k(yδ − y)‖+ ‖(I − AA∗)ky‖≤ δ + C(µ, ω)(2µ+ k + 1)−µ−1

√k.

Note that the estimate for the total error also follows from a simple applica-tion of the triangle inequality.

Remark. We observe courtesy of the proposition above that, as opposed toTikhonov regularisation, Landweber iteration does not exhibit Holder typesaturation, i.e., with a Holder type source condition (2.5), Proposition 2 holdsfor all µ > 0. In other words, the qualification index is µ0 =∞ [35]. It shouldbe noted, however, that one may take a more general view of qualification ala [108], i.e., for all 0 ≤ λ ≤ αmax there exists a constant Cλ > 0 such that

inf0≤α≤αmax

|1− λgα(λ)|ρ(α)

≥ Cλ,

then ρ(α) is said to be the maximal qualification. Now, taking α ∼ 1k, [108]

gives the maximal qualification for Landweber iteration as ρκ(k) = e−κk,with a positive constant κ > 0, and consequently with saturation φ(δ) =

δ(log(

)) 12κ . Thus, one concludes that one of the most “powerful” regu-

larisations may not regularise optimally, even for mildly ill-posed problems(cf. [108]).

Remark. Note that in the absence of a source condition, one may still ob-tain estimates and subsequently convergence for the error components withrespect to the exact data. For instance,

‖Axk − y‖ ≤ ‖x1 − x†‖√k,

where x1 is the first Landweber iterate (cf. [35, p. 158]).

In light of the estimates above in the previous proposition, one may observethat the asymptotic behaviour of the function k 7→ ‖xδk−x†‖ is opposite thatof α 7→ ‖xδα − x†‖, where xδα minimises (2.1). That is, the data propagationerror is of the order of

√kδ which is the dominating term for large k and tends

“blows up” (i.e., tends to infinity) as k →∞. The approximation error is ofthe order of (k+ 1)−µ which is larger for small k. Thus, as stated previously,

111

Page 114: Linear and Nonlinear Heuristic Regularisation for Ill ...

the behaviour of the stopping index can be somewhat loosely related to theTikhonov regularisation parameter by α ∼ 1

k.

We now state the general convergence rates theorem for Landweber iteration[35]:

Theorem 19. If x† satisfies a source condition (1.17), then choosing k =

k(δ) = kopt := O(δ−

22µ+1

)yields that

‖xδα − x†‖ = O(δ

2µ2µ+1

),

as δ → 0.

Proof. From previous estimates of Proposition 28, we have

‖xk∗ − x†‖ ≤ ‖xδk − xk‖+ ‖xk − x†‖

= O(√

kδ + (k + 1)−µ),

and the result simply follows by bounding (k + 1)−µ ≤ k−µ, and the choiceof k = kopt.

4.1.1 Heuristic Stopping Rules

We may use a heuristic stopping rule to select k∗ according to (1.33) withthe functionals defined as in Chapter 1:

ψHD(k, yδ) =√k‖pδk‖ =

√k‖rk(AA∗)yδ‖,

ψHR(k, yδ) =√k〈pδ2k, pδk〉

12 =

√k‖r

32 (AA∗)yδ‖,

ψL(k, yδ) = 〈xδk, xδ2k − xδk〉12 = ‖A∗gk(AA∗)r

12k (AA∗)yδ‖,

ψQO(k, yδ) = ‖xδ2k − xδk‖ = ‖A∗gk(AA∗)rk(AA∗)yδ‖,

where xδ2k, pδ2k may be identified with the second iterates as explained in

Chapter 1; see (1.27).Recalling that the heuristic functionals may be expressed in the form

ψ2(k, yδ) =

∫ 1+

0

Φk(λ) d‖Fλyδ‖2,

we can, similarly as for Tikhonov regularisation, write the associated filterfunctions Φk : (0,∞)→ R for the heuristic rules as [83,91]:

Φk(λ) = kr2k(λ) = k(1− λ)2k, (HD)

Φk(λ) = kr3k(λ) = k(1− λ)3k, (HR)

Φk(λ) = λg2k(λ)rk(λ) =

1

λ

(1− (1− λ)k

) [(1− λ)k − (1− λ)2k

], (L)

Φk(λ) = λg2k(λ)r2

k(λ) =1

λ[(1− λ)k − (1− λ)2k]2. (QO)

112

Page 115: Linear and Nonlinear Heuristic Regularisation for Ill ...

The observant reader will note from the above equations that for Landweberiteration, one has that ψHD(k, yδ) ≈ ψHR(k, yδ). This was an observation alsomade in [59] (where both rules were originally introduced).Analogously as Proposition 9 of Chapter 2, we first prove that we can esti-mate the stopping index from above by its associated heuristic functional:

Proposition 29. Letk∗ = argmin

k∈Nψ(k, yδ),

with ψ ∈ ψHD, ψHR, ψQO, ψL and assume there exists a positive constantsuch that ‖yδ‖ ≥ C1 whenever ψ ∈ ψHD, ψHR and ‖A∗yδ‖ ≥ C2 wheneverψ ∈ ψL, ψQO. Then

ψHD(k, yδ) ≥ C√k∗ (1− ‖AA∗‖)k∗ ,

ψHR(k, yδ) ≥ C√k∗ (1− ‖AA∗‖)

32k∗ ,

ψL(k, yδ) ≥ C(1− ‖A∗A‖)12k∗ ,

ψQO(k, yδ) ≥ C(1− ‖A∗A‖)k∗ ,

for all yδ ∈ Y .

Proof. This follows identically as in the proof of Proposition 9. For instance,if ψ = ψHD, then since

‖yδ‖ = ‖(I − AA∗)−k(I − AA∗)kyδ‖ ≤ ‖(I − AA∗)−k‖‖(I − AA∗)kyδ‖,

one has that,

‖(I − AA∗)kyδ‖ ≥ ‖yδ‖‖(I − AA∗)−k‖

≥ C(1− ‖AA∗‖)k,

which follows from the observation that

‖(I − AA∗)−k‖ ≤ ‖(I − AA∗)−1‖k ≤ 1

(1− ‖AA∗‖)k,

where we used the Neumann series estimate ‖(I−AA∗)−1‖ ≤ (1−‖AA∗‖)−1

[139].Thus, combining the above yields

ψHD(k, yδ) =√k‖(I − AA∗)kyδ‖ ≥ C

√k(1− ‖AA∗‖)k,

from which we can derive the estimate k ≤ C ′(1− ‖AA∗‖)−2kψ2HD(k, yδ).

For ψ = ψHR, it follows in exactly the same way, observing that we maywrite ψHR(k, yδ) =

√k‖(I − AA∗) 3

2kyδ‖. For ψ = ψQO, it follows similarly.

113

Page 116: Linear and Nonlinear Heuristic Regularisation for Ill ...

In particular, since we may write

‖xδ2k − xδk‖ =

∥∥∥∥∥2k−1∑j=0

(I − A∗A)jA∗yδ −k−1∑j=0

(I − A∗A)jA∗yδ

∥∥∥∥∥=

∥∥∥∥∥2k−1∑j=k

(I − A∗A)jA∗yδ

∥∥∥∥∥ ,we thus have that

ψ2QO(k, yδ) = ‖xδ2k − xδk‖2 =

2k−1∑j=k

2k−1∑i=k

〈(I − A∗A)jA∗yδ, (I − A∗A)iA∗yδ〉

=2k−1∑j=k

2k−1∑i=k

〈(I − A∗A)i+jA∗yδ, A∗yδ〉

≥2k−1∑j=k

2k−1∑i=k

(1− ‖A∗A‖)i+j‖A∗yδ‖2

=

(1− (1− ‖A∗A‖)k

1− (1− ‖A∗A‖)

)2

(1− ‖A∗A‖)2k‖A∗yδ‖2,

where we have twice used the formula for the geometric sum. Now, since1− ‖A∗A‖ < 1, and

0 ≤ (1− ‖A∗A‖)k ≤ (1− ‖A∗A‖)=⇒ 1− (1− ‖A∗A‖)k ≥ 1− (1− ‖A∗A‖) = ‖A∗A‖,

it follows that1− (1− ‖A∗A‖)k

1− (1− ‖A∗A‖)≥ ‖A

∗A‖‖A∗A‖

= 1.

Hence,ψQO(k, yδ) ≥ (1− ‖A∗A‖)k‖A∗yδ‖,

yielding the desired estimate.

114

Page 117: Linear and Nonlinear Heuristic Regularisation for Ill ...

Finally, for ψ = ψL, it similarly follows that

ψ2L(k, yδ) = 〈xδk, xδ2k − xδk〉 =

⟨k−1∑j=0

(I − A∗A)jA∗yδ,2k−1∑i=k

(I − A∗A)iA∗yδ

≥k−1∑j=0

2k−1∑i=k

(1− ‖A∗A‖)j+i‖A∗yδ‖2

=

(k−1∑j=0

(1− ‖A∗A‖)j)(

2k−1∑i=k

(1− ‖A∗A‖)i)‖A∗yδ‖2

=

(1− (1− ‖A∗A‖)k

1− (1− ‖A∗A‖)

)(1− ‖A∗A‖)k

(k−1∑j=0

(1− ‖A∗A‖)j)‖A∗yδ‖2

=

(1− (1− ‖A∗A‖)k

1− (1− ‖A∗A‖)

)2

(1− ‖A∗A‖)k‖A∗yδ‖2

≥ C(1− ‖A∗A‖)k,

by similar arguments as we used for the ψQO estimate, which completes theproof.

Proposition 30. Let the source condition (2.5) be satisfied. Then for allψ ∈ ψHD, ψHR, ψL, ψQO, there exist positive constants such that

ψ(k, y − yδ) ≤ C√kδ, and ψ(k, y) ≤ Cµ(k + 1)−µ,

for all k ∈ N and y, yδ ∈ Y .

Proof. In case ψ = ψHD, it follows immediately that

ψHD(k, y − yδ) =√k‖(I − AA∗)k(y − yδ)‖ ≤ 2

√kδ,

and

ψHD(k, y) =√k‖Axk − y‖ ≤ C(2µ+ k + 1)−µ−1k

≤ C(k + 1)−µ−1(k + 1)

= C(k + 1)−µ.

For ψ = ψHR, the estimate follows similarly as for the HD rule following theobservation that

ψHR(k, y − yδ) ≤√k‖(I − AA∗)

32k‖‖(y − yδ)‖ ≤ C

√kδ,

and

ψHR(k, y) =√k‖Ax 3

2k − y‖ ≤ C

(2µ+

3

2k + 1

)−µ−1

k

≤ C

(3

2k + 1

)−µ−1

(k + 1) ≤ Cµ

(k +

2

3

)−µ≤ Cµ(k + 1)−µ,

115

Page 118: Linear and Nonlinear Heuristic Regularisation for Ill ...

with Cµ =(

23

)−µ−1.

For ψ = ψQO, note that we may write

ψQO(k, y) = ‖A∗gk(AA∗)rk(AA∗)y‖≤ ‖(AA∗)

12 gk(AA

∗)12‖‖gk(AA∗)

12‖‖rk(AA∗)y‖

≤ C√k‖pk‖ ≤ C(k + 1)−µ,

since |λgk(λ)| ≤ C and |gk(λ)| ≤ k for all λ ∈ (0, 1) and due to similarconsiderations as above. Furthermore,

ψQO(k, y − yδ) = ‖A∗gk(AA∗)rk(AA∗)(y − yδ)‖≤ ‖(AA∗)

12 gk(AA

∗)12‖‖rk(AA∗)‖‖gk(AA∗)

12‖‖y − yδ‖

≤√kδ,

which follows from the previous recollections and that |rk(λ)| ≤ C for all0 < λ < 1.For ψ = ψL, the upper bounds follow similarly as

ψL(k, y) = ‖A∗gk(AA∗)rk(AA∗)12y‖ = ‖A∗Agk(A∗A)rk(A

∗A)12x†‖

≤ ‖A∗Agk(A∗A)‖‖rk(A∗A)12x†‖ ≤ C

(k

2+ 1

)−µ= C · 2µ(k + 2)−µ

≤ Cµ(k + 1)−µ,

and also,

ψL(k, y − yδ) ≤ ‖(AA∗)12 gk(AA

∗)12‖‖rk(AA∗)

12‖‖gk(AA∗)

12‖‖y − yδ‖

≤ C√kδ,

due to similar estimates as before, which completes the proof.

Note that the Muckenhoupt condition (2.17) remains identical with that usedfor Tikhonov regularisation when we consider the relationship α ∼ 1

k; that

is, e ∈ Np if there exists a positive constant such that

1

kp

∫ ∞1k

λ−1 d‖Fλe‖2 ≤ C

∫ 1k

0

λp−1 d‖Fλe‖2, (4.4)

for all k ∈ N.

Proposition 31. Assume that k > 1 and

e = y − yδ ∈

N1, if ψ ∈ ψHD, ψHR,N2, if ψ ∈ ψQO, ψL.

116

Page 119: Linear and Nonlinear Heuristic Regularisation for Ill ...

Then there exists a positive constant such that

‖xδk − xk‖ ≤ Cψ(k, y − yδ),

for all k ∈ N [83, 91].

Proof. First we estimate the data propagation error as

‖xδk − xk‖2 =

(∫ 1k

0

+

∫ 1+

1k

)λg2

k(λ) d‖Fλ(yδ − y)‖2.

We may bound the integrand of the first integral from the estimates:

|λg2k(λ)| ≤ Ck2λ, and |λg2

k(λ)| ≤ C|gk(λ)| ≤ Ck,

and similarly, since λg2k(λ) = λ−1(1−(1−λ)), it follows that the integrand of

the second integral can be bounded from above by λ−1, since 1−(1−λ)k < 1,where both the aforementioned estimates follow in fact for all λ ∈ (0, 1).Thus, we have

‖xδk − xk‖2 ≤ Ck2

∫ 1k

0

λ d‖Fλ(yδ − y)‖2 +

∫ 1+

1k

λ−1 d‖Fλ(yδ − y)‖2, (4.5)

≤ Ck

∫ 1k

0

d‖Fλ(yδ − y)‖2 +

∫ 1+

1k

λ−1 d‖Fλ(yδ − y)‖2. (4.6)

In case p = 2, the second term in (4.5) may be bounded by the first term.Whilst for p = 1, it follows similarly, i.e., the first term in (4.6) may beestimated by the second term.Hence, for each ψ-functional, similarly as for Tikhonov regularisation, itsuffices to bound the ψ-functional acting on the noise y − yδ by the firstterm in either (4.6) or (4.5); that is,

ψ(k, y − yδ) ≥ Ckp∫ 1

k

0

λp−1 d‖Fλ(y − yδ)‖2, (4.7)

where p = 1 for ψ ∈ ψHD, ψHR and p = 2 for ψ ∈ ψQO, ψL as per thestatement of the proposition.For ψ = ψHD, we have

ψ2HD(k, y − yδ) = k‖(I − AA∗)k(y − yδ)‖2 ≥ k

∫ 1k

0

(1− λ)2k d‖Fλ(y − yδ)‖2

≥ Ck

∫ 1k

0

d‖Fλ(y − yδ)‖2(4.4)

≥∫ 1+

1k

λ−1 d‖Fλ(y − yδ)‖2,

117

Page 120: Linear and Nonlinear Heuristic Regularisation for Ill ...

where the second last inequality follows from the fact that for 0 < λ ≤ 1k, we

have

(1− λ)2k ≥[(1− 1

k)k]2

≥ C2, (k > 2) (4.8)

which is what we wanted to show.Similarly for ψ = ψHR, we have

ψHR(k, y − yδ) = k〈(I − AA∗)2k(y − yδ), (I − AA∗)k(y − yδ)〉

≥ k

∫ 1k

0

(1− λ)2k(1− λ)k d‖Fλ(y − yδ)‖2

(4.8)

≥ Ck

∫ 1k

0

d‖Fλ(y − yδ)‖2(4.4)

≥∫ 1+

1k

λ−1 d‖Fλ(y − yδ)‖2,

by the same token.In case ψ = ψQO,

ψQO(k, y − yδ) ≥∫ 1

k

0

λg2k(λ)r2

k(λ) d‖Fλ(y − yδ)‖2

≥ Ck2

∫ 1k

0

λ d‖Fλ(y − yδ)‖2,

which follows from the fact that for 0 < λ ≤ 1k, we have

gk(λ) ≥ k(1− (1− λ)k

)≥ Ck,

since 1− λ ≤ C < 1 for λ > 0, and

rk(λ) = (1− λ)k ≥ C > 0,

which follows from the fact that 1−λ > 0 for all λ < 1, and that yields (4.7),as desired.Finally, for ψ = ψL,

ψL(k, y − yδ) ≥∫ 1

k

0

λg2k(λ)rk(λ) d‖Fλ(y − yδ)‖2

≥ Ck2

∫ 1k

0

λ d‖Fλ(y − yδ)‖2,

which follows similarly as for the quasi-optimality rule above.

We now state the following convergence rates theorem, partly courtesy of [83]:

118

Page 121: Linear and Nonlinear Heuristic Regularisation for Ill ...

Theorem 20. Without loss of generality, assume ‖A‖ < 1 and suppose x†

satisfies (1.17). Let k∗ be chosen as in (1.33) with ψ ∈ ψHD, ψHR, ψQO, ψL.Furthermore, assume that there exist positive constants such that ‖yδ‖ ≥ C1

and ‖A∗yδ‖ ≥ C2 whenever ψ ∈ ψHD, ψHR and ψ ∈ ψQO, ψL, respectively.We also assume that

e = y − yδ ∈

N1, if ψ ∈ ψHD, ψHR,N2, if ψ ∈ ψL, ψQO.

Then

‖xδk∗ − x†‖ =

O((−W

(Cδ

4µ2µ+1

))−µ), ψ ∈ ψHD, ψHR,

O((− log (δ))−µ

), ψ ∈ ψL, ψQO,

as δ → 0, where W is Lambert’s W -function.

Proof. We can approximate

‖xδk∗ − x†‖ = O

(‖xδk∗ − xk∗‖+ ‖xk∗ − x†‖

)= O

(ψ(k∗, y − yδ) + (k∗ + 1)−µ

)= O

(ψ(k, yδ) + ψ(k∗, y) + (k∗ + 1)−µ

)= O

(ψ(k, y − yδ) + ψ(k, y) + ψ(k∗, y) + (k∗ + 1)−µ

). (4.9)

For ψ = ψHD, it follows from Propositions 30 and 31 that

‖xδk∗ − x†‖ = O

(√kδ + (k + 1)−µ + (k∗ + 1)−µ

). (4.10)

Thus, from Proposition 29, we may recall

Ck∗(1− ‖AA∗‖)2k∗ ≤ ψ2HD(k, yδ) ⇐⇒ Cϕ(k∗) ≤ ψ2

HD(k, yδ),

with ϕ(x) := x(1 − ‖AA∗‖)2x. If we assume that ‖AA∗‖ is not sufficientlysmall (which may always be achieved through appropriate scaling), then ϕis a decreasing function for x > 1. Thus,

k∗ ≥ ϕ−1(Cψ2

HD(k, yδ)),

with

ϕ−1(y) =W (2 log(1− ‖AA∗‖)y)

2 log(1− ‖AA∗‖)=|W (2 log(1− ‖AA∗‖)y)||2 log(1− ‖AA∗‖)|

,

since both numerator and denominator are negative, where W representsLambert’s W -function (cf. [26]), which is negative for negative arguments.Hence,

k∗ ≥ O(|W (−Cψ2

HD(k, yδ))|),

119

Page 122: Linear and Nonlinear Heuristic Regularisation for Ill ...

as δ → 0, where we have replaced 2 log(1−‖AA∗‖) by −C < 0, where C is apositive constant, as it does not depend on any variable. Therefore, we canestimate the (k∗ + 1)−µ term in (4.10) by k−µ∗ , and then with the above, get

‖xδk∗ − x†‖ = O

(√kδ + (k + 1)−µ + |W (−Cψ2

HD(k, yδ))|)−µ)

= O(√

kδ + (k + 1)−µ +∣∣∣W (

−C(√kδ + (k + 1)−µ)2

)∣∣∣−µ)= O

(∣∣∣W (−Cδ

4µ2µ+1

)∣∣∣−µ) = O((−W

(−Cδ

4µ2µ+1

))−µ),

as δ → 0, where we have once again used that W is negative for negativearguments and that |w| = −w for any w < 0. The third equality hold sincefor all z ∈ (0, 1), it follows that W (−Cz2) ∈ (−1, 0), which, in turn, impliesthat

z < 1 < |W (−Cz2)|−µ,

for all µ > 0.Note that for ψ = ψHR, since k∗ = O

(W (CψHR(k, yδ))

)due to Proposition 29

with a slightly different constant to what we had for the ψHD case, we getthe same rates as for the HD rule, since the estimates in Proposition 30 aremore or less identical.Similarly, for ψ = ψQO, from (4.9), we have

‖xδk∗ − x†‖ = O

(√kδ + (k + 1)−µ + (k∗ + 1)−µ

). (4.11)

Recalling the estimates from Proposition (29):

C(1− ‖A∗A‖)k∗ ≤ ψQO(k, yδ) ⇐⇒ Cφ(k∗) ≤ ψQO(k, yδ),

with φ(x) := (1 − ‖A∗A‖)x. Since φ−1(y) = log(y)/ log(1 − ‖A∗A‖), wherewe note that the denominator is negative (since 1− ‖A∗A‖ < 1) , we have

k∗ ≥ Clog(ψQO(k, yδ))

log(1− ‖A∗A‖)= C| log(ψQO(k, yδ))|| log(1− ‖A∗A‖)|

.

Therefore, we can estimate the (k∗ + 1)−µ term in (4.11) by k−µ∗ , and thenwith above, get

‖xδk∗ − x†‖ = O

(√kδ + (k + 1)−µ +

∣∣∣log(√kδ + (k + 1)−µ)

∣∣∣−µ)= O

(∣∣∣log(√

kδ + (k + 1)−µ)∣∣∣−µ) = O

(∣∣∣log(δ

2µ2µ+1

)∣∣∣−µ)= O

(|log (δ)|−µ

)= O

((− log(δ))−µ

),

120

Page 123: Linear and Nonlinear Heuristic Regularisation for Ill ...

as δ → 0. Note that the second equality holds by the following logic: for allz ∈ (0, 1), we have that there should exist a positive constant such that

z ≤ C| log(z)|−µ ⇐⇒ z(− log(z))µ ≤ C. (4.12)

Let f(z) := z(− log z)µ be a function. Then it is clear that f → 0 as z → 0,f(1) = 0, f > 0 on (0, 1) and f ∈ C1[(0, 1)]. Moreover, its maximum isattained at some z0 ∈ (0, 1), which satisfies

f ′(z0) = 0.

That is to say,(− log z0)µ − µ(− log z0)µ−1 = 0,

i.e.,− log(z0) = µ ⇐⇒ z0 = e−µ.

Thus, the maximum of f is

f(z0) = z0(− log z0)µ = e−µµµ =(µe

)µ.

Hence, (4.12) holds for z ∈ (0, 1) if and only if(µe

)µ≤ C,

i.e., for

C = Cµ =(µe

)µ.

Finally, since the estimates for ψL are nigh on identical to those of ψQO, theproof is also analogous.

Remark. The above theorem thus exemplifies that “infinite” qualificationleads to slow convergence rates (cf. [83]). In particular, we see that the con-vergence is much lower than the rates for Tikhonov regularisation (cf. Corol-lary 3). Thus, one may conclude that for heuristic rules, saturation is in factbeneficial for faster convergence rates.

In order to get optimal convergence rates, we must once again, as in Chap-ter 2, prove that we may bound the approximation error from above by theψ-functionals, as we did in Lemma 1 in Chapter 2, courtesy of a regularitycondition on the solution. Note, however, that these would be somewhatmore restrictive than, e.g., (2.26) for Tikhonov regularisation [83]. However,this is beyond the scope of this thesis and therefore we end our treatment oflinear ill-posed problems with Landweber iteration here, with the suboptimalconvergence rates theorem above. The aforementioned optimal convergencerates results with the required regularity conditions may be found in [83]. In

121

Page 124: Linear and Nonlinear Heuristic Regularisation for Ill ...

particular, the said regularity condition is of the form: there exists an indexfunction φ : R+ → R+ such that∫ 1

k

0

r2k(λ) d‖Eλx†‖2 ≤ φ

(∫ 1+

1k

r2k(λ) d‖Eλx†‖2

). (4.13)

For further details, we point the reader in the direction of the aforementionedreference.

4.2 Landweber Iteration for Nonlinear Oper-

ators

Now we consider the case in which the forward operator is nonlinear. Notethat, as far as the author’s knowledge is concerned, the theory of heuristicrules in this instance has not been developed whatsoever and indeed, thereare no further ground-breaking results presented here either. Instead, weconjecture some possible heuristic parameter choice rules which are based onthe ones we highlighted earlier for linear ill-posed problems and stated in theforthcoming paper [77].Let X and Y be Hilbert spaces, as in the previous section. Then, whenF : domF ⊂ X → Y is a nonlinear operator (which we also assume iscontinuously Frechet differentiable), i.e., we have

F (x) = y, (4.14)

where we have noisy data yδ satisfying

‖yδ − y‖ ≤ δ, (4.15)

(4.1) may be replaced by

xδk = xδk−1 + ωF ′(xk−1)∗(yδ − F (xδk−1)

), k ∈ N (4.16)

where F ′(xk−1) denotes denotes the Frechet derivative of F at xk−1, with aninitial guess xδ0 = x0 (cf. [35, 58,81]).Note that since the argument of F ′(·)∗ in (4.16) changes at each iterationstep, the sequence xδk is not guaranteed to remain within a certain invari-ant subspace (e.g., in range((A∗A)µ) as in the preceding section); therefore,rather restrictive additional conditions must be imposed in order to deriveconvergence rates (cf. [35]).A typical source condition (cf. [35]) one considers for nonlinear Landweberiteration is

x† − x0 = F ′(x†)ω. (ω ∈ Y ) (4.17)

122

Page 125: Linear and Nonlinear Heuristic Regularisation for Ill ...

A restrictive condition, of which we mentioned there would be before, is theso-called tangential cone condition:

‖F (x)− F (x)− F ′(x)(x− x)‖ ≤ η‖F (x)− F (x)‖, η <1

2, (4.18)

for x, x ∈ B2ρ(x0) ⊂ domF (cf. [35, 81]), which is a condition on the non-linearity of F . The debate on whether this condition is too strong to besatisfied in many practical situations is one which rages on, yet there aresome results which show that it is indeed satisfied in certain realistic exam-ples, e.g., [76, 82].Furthermore, if proving convergence with respect to the discrepancy princi-ple, the parameter τ in (1.29) has to satisfy

τ > 21 + η

1− 2η≥ 2 , (4.19)

where the factor 2 can be slightly improved by an expression depending onη which tends to 1 as η → 0, thereby recovering the optimal bound in thelinear case [56]. If, in addition, the step-size ω is chosen small enough suchthat, locally, one has

ω‖F ′(x)‖2 ≤ 1 , (4.20)

then Landweber iteration combined with the discrepancy principle (1.29)converges to x∗ as the noise level δ goes to 0 [58, 81]. Furthermore, conver-gence to the minimum-norm solution x† can be established given that in asufficiently large neighbourhood, it holds that ker(F ′(x†)) ⊂ ker(F ′(x)). Inorder to prove convergence rates, in addition to source conditions furtherrestrictions on the nonlinearity of F are necessary, see (4.18), cf. [35, 81].Although the tangential cone condition (4.18) holds for a number of differentapplications (see e.g. [82] and the references therein), and despite the fact thatseveral attempts have been made to replace it by more general conditions [85],these can still be difficult to prove for specific applications. Furthermore, evenif the tangential cone condition can be proven, the exact value of η typicallyremains unknown. Since this also renders condition (4.15) impractical, theparameter τ in the discrepancy principle then has to be chosen manually.Since in the linear case the theory implies that τ close to 1 is optimal, popularchoices include τ = 1.1 or τ = 2. These work well in many situations, but arealso known to fail in others (compare with Section 8 below). In any case, thisshows that for practical applications involving nonlinear operators, informed“heuristic” parameter choices remain necessary even if the noise level δ isknown.

4.2.1 Heuristic Parameter Choice Rules

As mentioned, to the best of the author’s knowledge, there is no convergenceanalysis available for heuristic rules in the case of nonlinear Landweber iter-

123

Page 126: Linear and Nonlinear Heuristic Regularisation for Ill ...

ation within the current spectrum of scientific literature. One may, however,postulate that the parameter choice rules represented by the ψ-functionalsof Section 4.1.1 may be applied to the nonlinear Landweber case with theappropriate adjustments. Certainly, in Section 8, we include the results fromthe upcoming paper [77], in which a numerical study of the aforementionedψ-functionals was done, where they were defined as:

ψHD(k, yδ) :=√k‖F (xδk)− yδ‖ ,

ψHR(k, yδ) := k〈yδ − F (xδ2k), yδ − F (xδk)〉 ,

ψQO(k, yδ) := ‖xδ2k − xδk‖ ,ψL(k, yδ) := 〈xδk, xδ2k − xδk〉 ,

(4.21)

with xδk, xδ2k ∈ X defined by (4.16).

124

Page 127: Linear and Nonlinear Heuristic Regularisation for Ill ...

Part II

Numerics

125

Page 128: Linear and Nonlinear Heuristic Regularisation for Ill ...

Foreword In order to illustrate the theory of the preceding sections, weconduct various numerical tests spanning a plethora of different settings fromthe previous chapters. The outline of this part of the thesis is as follows: inChapter 5, we examine the actual performance of the semi-heuristic rules ofSection 2.3 and compare them to standard heuristic rules and also to an a-posteriori rule for Tikhonov regularisation in the context of weakly boundednoise. For a comparison of heuristic rules for Tikhonov regularisation in thestandard setting with the presence of (strongly) bounded noise, we refer thereader to [53]. Note that the numerics and results are from the paper [51].Thereafter, in Chapter 6, we present the results of [90] which demonstratethe effectiveness of certain novel and already known heuristic rules for convexTikhonov regularisation. We include the results of [91] which showcase thenumerical performance of the ψL and ψLR rules against the more classical L-curve rule of Hansen and also the quasi-optimality rule in Chapter 7. Finally,we test some heuristic rules for a couple of examples for nonlinear Landweberiteration in Chapter 8. Note that at the end of each section, we shall providea brief summary detailing the concluding observations.The manner in which the numerics are assessed varies between the chapters.For instance, Chapter 5 is the only one in which we compute the ratio of therelative error with respect to each heuristic rule and the optimal error, andsubsequently plot a heat diagram. We leave the details as ambiguous for nowas they will be explained in the relevant section. In Chapter 6, however, weaway with the aforementioned heat plots and merely compute the total errorfor each noise level and plot alongside exemplary functional plots; similarly inChapter 8. In Chapter 7, there are no plots, but results displayed in tabularform.Typical issues which arise in the numerical implementation of the ψ-functionalbased heuristic rules are most often associated with the mandatory discreti-sation as will be detailed in the proceeding chapters.

126

Page 129: Linear and Nonlinear Heuristic Regularisation for Ill ...

Chapter 5

Semi-Heuristic Rules

In this chapter, we revert to the setting of Section 2.3 and demonstrate thenumerical performance of the semi-heuristic rules discussed there. As in thatsection, all of the results are taken from [51].We pit the various modified functionals ψ against one another, and also thegeneralised discrepancy principle, see, e.g. [103], for comparison’s sake, ona series of test problems. We provide two types of experiments: one withrandom operator noise and the other with a smooth operator perturbation.Note that heuristic rules can fail in the case of smooth errors that do notsatisfy a noise condition. Thus, a smooth operator perturbation is the mostcritical case for heuristic rules, and, as we will observe, the semi-heuristicmethods will prove to be more effective in that case.For each of the proposed rules, we compute the relative error with respectto the selected regularisation parameter α∗ and the error obtained by thetheoretically optimal choice of α by

Errrel :=‖xδα∗,η − x

†‖‖x†‖

and Erropt :=‖xδαopt,η − x

†‖‖x†‖

,

respectively. Furthermore, we denote the ratio of these errors by

Errper :=‖xδα∗,η − x

†‖‖xδαopt,η − x†‖

.

Note that in our simulations, we are afforded the luxury of knowing x†,thereby allowing us to minimise the error and compute αopt and Erropt.For the standard heuristic rules, we search for α ∈ [λmin, ‖A‖2], where λmin

is the minimum eigenvalue of the matrix A∗A. (However, if λmin is below10−14, then we choose αmin = 10−14 to avoid numerical instabilities). Insome cases of large operator noise, the heuristic rules selected α∗ = αmax;thus in this situation, we select the parameter corresponding to the smallestinterior local minimum. For the semi-heuristic rules, however, we restrict oursearch to the interval [γ, αmax], where γ = O(η) as above. Furthermore, in

127

Page 130: Linear and Nonlinear Heuristic Regularisation for Ill ...

each experiment, we have scaled the operator and the exact solution so that‖A‖ = 1 and ‖x†‖ = 1.

5.1 Gaußian Operator Noise Perturbation

5.1.1 Tomography Operator Perturbed by Gaußian Op-erator.

In this experiment, we use the tomo package from Hansen’s RegularisationTools [61] to define the finite-dimensional operator (i.e., matrix) Aη ∈ Rn×n,where Aη = A + C∆A, with A the tomography operator, which is a pene-tration of a two dimensional domain by rays in random directions. We userandom Gaußian distributed operator noise, i.e., ∆A ∈ Rn×n is a matrix withrandom entries. The data noise is defined as δ = C‖ε‖, where ε ∈ Rn is aGaußian distributed noise vector.In the following configuration, we set n = 625 and f = 1, according toHansen’s Tools.We provide a dot plot, namely Figure 5.1, in which we compare the errorErrrel according to the relative error function for each parameter choice ruleand for 100 different realisations of data errors and operator perturbationswith values of δ and η ranging from 1% to 10%. Each asterisk in the plotcorresponds to the relative error, Errrel, for a realisation of operator and datanoise.Note that “SH1” and “SH2” in Figure 5.1 refer to the modified rules withη‖xδα,η‖, cf. (SH1), and η/

√α, cf. (SH2), as compensating functionals, re-

spectively. Recall that the standard rules (in blue) correspond to the semi-heuristic rules with D = 0 and search for a parameter α in the interval[λmin, ‖A‖2]. The last row in the plot is then the dot plot of the relative errorfor the optimal choice of α, namely eopt. In each row, the green circles rep-resent the median of the respective relative errors over the 100 realisations.We see that the semi-heuristic rules present a noticeable improvement for allparameter choice rules, although the discrepancy in performance seems tobe slightly more pronounced for the quasi-optimality and Hanke-Raus rules.One may also observe that the generalised discrepancy principle (cf. (2.64)),marked in the first row of Figure 5.1, is the worst performing.We also compare the difference between the values of Errper with respectto the modified parameter choice rule and its unmodified counterpart, re-spectively as a percentage. For example, for any configuration of data andoperator noise, we would compute the value

ϑ(δ, η) = (Errper − Errper)× 100, (5.1)

where Errper and Errper denote the error ratio for the standard heuristicrule (i.e., D = 0 and αmin = λmin) and the modified rule (SH1) or (SH2),

128

Page 131: Linear and Nonlinear Heuristic Regularisation for Ill ...

respectively. This value is computed for several noise-levels δ and operatorerror levels η. Note that positive values indicate that the semi-heuristic rulesoutperform their heuristic counterparts and vice versa.The plots of Figure 5.4 indicate that the semi-heuristic rules do not nec-essarily offer improvements for small data and operator noise, but exhibitincreased performance for larger noise of both aforementioned varieties. Inparticular, this is more pronounced for the quasi-optimality rule where wemay observe blotches of dark red which indicate significant improvement overthe standard heuristic rule.The standard heuristic rules also performed reasonably well and a possibleexplanation could be the argumentation for using the compensating func-tional was based on the regularity of the operator noise and therefore it isprobable that the irregularity of the operator noise in this scenario did notaid the premise of using the modified rules.

5.2 Smooth Operator Perturbation

5.2.1 Fredholm Integral Operator Perturbed by HeatOperator

To simulate a deterministic, possibly smooth, operator perturbation, we firstconsider the Fredholm integral operator of the first kind perturbed by a heatoperator, which we think is an instance where the noise condition for Aηmight fail and where a semi-heuristic modification is highly advisable.For the implementation, we use the baart and heat packages on Hansen’sRegularisation Tools [61] to define the finite dimensional operator Aη ∈ Rn×n,with n = 400, where Aη = A + C∆A is the superposition of the baart

operator and scaled heat operator, respectively. More precisely, the baart

operator is the discretisation of a Fredholm integral equation of the first kindwith kernel K1 : (s1, t1) 7→ exp(s1 cos t1), where s1 ∈ [0, π/2], t1 ∈ [0, π], andthe heat operator is taken to be the Volterra integral operator with kernelK2 : (s2, t2) 7→ k(s2 − t2), where

k(t) :=t−

32

2√π

exp

(− 1

4t

),

for t ∈ [0, 1]. The exact solution is given by y(s) = 2 sin s/s and the datanoise is defined as before.We proceed similarly as in the previous experiment. In Figure 5.2, we observethat the best performing rule is in fact the semi-heuristic quasi-optimalityrule (SH1). The semi-heuristic variants of the Hanke-Raus and heuristicdiscrepancy rules are also improvements on the original rules, although thisis slightly more pronounced for the semi-heuristic Hanke-Raus rules. As in

129

Page 132: Linear and Nonlinear Heuristic Regularisation for Ill ...

the previous experiment, the generalised discrepancy rule performs worst; al-though, in this scenario, this is even more pronounced as the average relativeerror (which is not visible in the plot) is 4.1627. Note that the exact solu-tion in this experiment is smooth and we hypothesised that the well-knownsuboptimality (i.e., saturation) of the discrepancy principle (see, e.g., [35])was a possible cause. Indeed, when we reran the experiment using a lesssmooth (piecewise constant) solution, the results turned out to be similar asin the previous experiment; thus, failure of the discrepancy rule here is dueto the saturation effect. Note that a-posteriori rules which do not exhibit thesaturation effect include the modified discrepancy principle and monotoneerror rule (cf. [134]).In Figure 5.5, the plots for the heuristic discrepancy and Hanke-Raus rulesdemonstrate that the semi-heuristic rules offer an overall improvement forall ranges of operator and data noise. However, we observe that the semi-heuristic quasi-optimality rules performs slightly worse for small data and op-erator noise, but exhibit much better performance when both the mentionednoises are larger. Additionally, one may also observe that the semi-heuristicHanke-Raus rules perform significantly better than their standard heuristiccounterparts for very large operator noise.

5.2.2 Blur Operator Perturbed by Tomography Oper-ator

In a next experiment, we again simulate a deterministic operator perturba-tion by considering the blur operator from Hansen’s tools and perturbingit by the tomography operator from before. For the blur operator, we setband = 8 and sigma = 0.9, which is modelled by the Gaußian point spreadfunction:

h(x, y) =1

2πσ2exp

(−x

2 + y2

2σ2

).

In Figure 5.3, we observe as before that the semi-heuristic rules exhibit im-provements over their standard counterparts for the heuristic discrepancyand Hanke-Raus rules, although the standard quasi-optimality rule performsquite well and in this case, its semi-heuristic variants do not necessarilypresent a better choice. The generalised discrepancy rule performs similarlyas in the other experiments.In Figure 5.6, it is difficult to draw any meaningful conclusions, althoughit seems that for large operator noise and reasonable data noise, the semi-heuristic discrepancy and Hanke-Raus rules perform better than the standardheuristic rules. Consequently, one may conclude that for many situations,the semi-heuristic rules offer an improvement on their standard heuristiccounterparts.

130

Page 133: Linear and Nonlinear Heuristic Regularisation for Ill ...

Note that in all experiments, the minimiser in the range [λmin, αmax] of thestandard heuristic functionals was occasionally αmax; particularly when theoperator noise was large. We rectified this failure by the interior minimasearch as described above. Had we not rectified this failure, the improvementof the semi-heuristic methods would have been even greater pronounced.

5.3 Summary

The numerical experiments above confirm that the semi-heuristic methodsmay yield an improvement over the standard parameter choice rules in agreat many situations. In case one does not have knowledge of the operatornoise level, then we recommend that one uses the standard heuristic rules asthey also perform quite well in many situations. The numerical experiments,at least in our setup, showed little benefit of additional knowledge of the datanoise level. Incidentally, the optimal choices of D and γ presents room forfurther research.

131

Page 134: Linear and Nonlinear Heuristic Regularisation for Ill ...

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

Relative Error

Optimal

HD

HD SH1

HD SH2

HR

HR SH1

HR SH2

QO

QO SH1

QO SH2

GDP

Relative Error Plot

Figure 5.1: Tomography operator per-turbed by random operator: D = 600 forSH1, D = 0.05 for SH2 and γ = 0.005× η.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

Relative Error

Optimal

HD

HD SH1

HD SH2

HR

HR SH1

HR SH2

QO

QO SH1

QO SH2

GDP

Relative Error Plot

Figure 5.2: Fredholm operator of the firstkind perturbed by heat operator: D = 600for SH1, D = 0.12 for SH2 and γ = 0.07×η.

0 0.1 0.2 0.3 0.4 0.5 0.6

Relative Error

Optimal

HD

HD SH1

HD SH2

HR

HR SH1

HR SH2

QO

QO SH1

QO SH2

GDP

Relative Error Plot

Figure 5.3: Blur operator perturbed by tomography operator: D = 500 forSH1, D = 0.2 for SH2 and γ = 0.01× η.

132

Page 135: Linear and Nonlinear Heuristic Regularisation for Ill ...

Semi-Heuristic Discrep. 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heuristic Discrep. 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Hanke-Raus 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Hanke-Raus 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Quasi-Opt. 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Quasi-Opt. 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

-100 0 100 200 300 400

Performance of Modified Rule vs Unmodified Rule (%)

Figure 5.4: Tomography operator perturbed by random operator: set-upidentical to Figure 5.1. Red indicates that the semi-heuristic rules performbetter than their standard heuristic counterparts and vice versa.

133

Page 136: Linear and Nonlinear Heuristic Regularisation for Ill ...

Semi-Heuristic Discrep. 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heuristic Discrep. 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Hanke-Raus 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Hanke-Raus 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Quasi-Opt. 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Quasi-Opt. 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

0 50 100 150 200

Performance of Modified Rule vs Unmodified Rule (%)

Figure 5.5: Fredholm operator of the first kind perturbed by heat operator:set-up identical to Figure 5.2. Red indicates that the semi-heuristic rulesperform better than their standard heuristic counterparts and vice versa.

134

Page 137: Linear and Nonlinear Heuristic Regularisation for Ill ...

Semi-Heuristic Discrep. 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heuristic Discrep. 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Hanke-Raus 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Hanke-Raus 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Quasi-Opt. 1

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

Semi-Heur. Quasi-Opt. 2

0.17 0.46 1.29 3.59 10

δ (%)

0.17

0.46

1.29

3.59

10

η (

%)

-120 -100 -80 -60 -40 -20 0 20 40

Performance of Modified Rule vs Unmodified Rule (%)

Figure 5.6: Blur operator perturbed by tomography operator: set-up identi-cal to Figure 5.3. Red indicates that the semi-heuristic rules perform betterthan their standard heuristic counterparts and vice versa.

135

Page 138: Linear and Nonlinear Heuristic Regularisation for Ill ...

Chapter 6

Heuristic Rules for ConvexRegularisation

In this chapter, we illustrate and verify the theory of Chapter 3; in particular,the rules based on the functionals ψHD ,ψHR, ψSQO and ψRQO as defined inSection 3.2. We omit ψLQO as this performed far and away the most poorly inpreliminary testing and this was also the observation in [86]. These numericsare from [90] and there, the ψL functional as defined in Section 3.2 was nottested. However, the reader should not despair, as in Chapter 7, one will findnumerical tests for ψL.It should be stressed that the convergence analysis of that chapter providesan understanding of the behaviour of the rules, but there exist further factorsthat influence the actual quality of the results using heuristic rules (e.g., foroptimal-order results (see Theorem 6 in Chapter 2), a regularity conditionon x† is often required (see, e.g., (2.26)) and the value of the constants inthe estimates are important). It should be noted as well that the resultsof Chapter 3 only proved sufficient conditions for convergence which doesnot necessarily mean that the methods fail when the conditions are violated.(For instance, by including δ-dependent estimates, one may find weaker con-vergence conditions that hold for certain ranges of the noise level). Still, apreliminary understanding can be gained from the numerical experiments inthis section.We remind the reader once more that all the results of this chapter are basedon [90]. In all experiments, we consider the discretised space Rn and for theregularisation functional, R = 1

q‖ · ‖q`q with q to be specified and R = TV

defined below. Due to the discretisation, we opt to choose the parameterα∗ ∈ [αmin, αmax]. We also choose to select αmax = ‖A‖2 (apart from for TVregularisation below), and for the more tricky issue of the lower bound, weset αmin = σmin, the smallest singular value of A∗A, (again excluding the TVregularisation case, for which we fix αmin = 10−9). Other methodologies forselecting αmin were suggested in [51,84].

136

Page 139: Linear and Nonlinear Heuristic Regularisation for Ill ...

A difficult issue in the parameter selection procedure is how global minima appearing at the boundary, i.e., α∗ = αmin or α∗ = αmax, should be treated. For linear regularisation methods, usually only the lower bound at αmin is an issue, which can be handled with the above-mentioned techniques. However, for the present convex regularisation case, we observed several instances (in particular for ℓ1 penalties) where the global minimum was at the right boundary αmax, leading to a suboptimal choice of α∗, which is quite an unusual phenomenon (in the linear case). We treated this issue through "brute force" by explicitly excluding boundary minima and selecting α∗ from the set of interior minima, even when the boundary values of ψ would have smaller values. Only when the set of interior minima is empty do we consider boundary minima again.

Additionally, we always rescale the forward operator and exact solution so that ‖A‖ = ‖x†‖ = 1. In each experiment, we compute 10 different realisations of the noise in which the noise level is logarithmically increasing. We also compute the error (the measure for which will differ for the various regularisation schemes) induced by each parameter choice rule, as well as the optimal parameter choice, which will be computed as the minimiser of the respective error functional itself. Moreover, we include plots of the total error at each noise level for all considered scenarios (left-hand side plots), and we also provide a supplementary plot of the parameter choice functionals (right-hand side plots) for an example noise level, which is specified in the error plot via a dotted vertical line. Note that in case we plot the ψ-functionals for δ = 10%, the aforementioned line will not be visible due to it lying on the boundary of the error graph.

For our operators, we use the tomography operator (tomo) from Hansen's Regularisation Tools (cf. [61]) with n = 625 and f = 1. We also define a diagonal matrix A ∈ ℝ^{n×n}, initially with eigenvalues λ_i = C/i^β and subsequently also with λ_i = C exp(−iβ), which simulate mildly and severely ill-posed problems, respectively. In either case, we take an exact solution with components x†_i = C s_i/i^ν and data perturbed by noise e_i = C N(0,1)/i^κ, where the s_i ∈ {−1, 1} are random, and set the parameters as n = 20, β = 4, ν = 2 and κ = 1.
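To make the setup concrete, the following is a minimal numpy sketch of the diagonal test problem with the parameters above, together with one possible implementation of the interior-minimum selection strategy just described. It is an illustration only and not the code used for the experiments in [90]; in particular, reading exp(−iβ) as exp(−βi) in the severely ill-posed case is an assumption of this sketch.

```python
import numpy as np

def diagonal_test_problem(n=20, beta=4, nu=2, kappa=1, C=1.0,
                          severely_ill_posed=False, rng=None):
    """Diagonal operator A, exact solution x† and a noise vector, as in the text."""
    rng = np.random.default_rng() if rng is None else rng
    i = np.arange(1, n + 1, dtype=float)
    if severely_ill_posed:
        lam = C * np.exp(-beta * i)      # assumption: "exp(-i beta)" read as exp(-beta*i)
    else:
        lam = C / i**beta                # mildly ill-posed: polynomial decay
    s = rng.choice([-1.0, 1.0], size=n)
    x_true = C * s / i**nu
    noise = C * rng.standard_normal(n) / i**kappa
    return np.diag(lam), x_true, noise

def select_alpha(alphas, psi):
    """Exclude boundary minima: pick the smallest interior local minimum of psi;
    only if no interior local minimum exists is the global minimiser used."""
    psi = np.asarray(psi, dtype=float)
    interior = [k for k in range(1, len(psi) - 1)
                if psi[k] <= psi[k - 1] and psi[k] <= psi[k + 1]]
    k_star = min(interior, key=lambda k: psi[k]) if interior else int(np.argmin(psi))
    return alphas[k_star]
```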

6.1 ℓ1 Regularisation

A particularly interesting application of convex variational Tikhonov regularisation is the case q = 1, since it is sparsity enforcing. In fact, it is the most sparsity-enforcing regularisation method whilst still remaining a convex minimisation problem. Significant work in the area of sparse regularisation includes [20, 48, 101, 132]. Whilst it does not fit with the Muckenhoupt-type conditions we derived earlier, it is nevertheless an interesting regularisation scheme for the practitioner who would be eager to see the performance of the studied rules. Note that in this case, we minimise the Tikhonov and Bregman functionals using FISTA (cf. [12]). The corresponding proximal mapping operator is the soft-thresholding operator. In this experiment, we use the tomography operator defined above.

The solution x† in our experiment is chosen to be sparse in a custom-built manner. For each parameter choice rule, we compute the error as

Err_{ℓ1}(α∗) = ‖x^δ_{α∗} − x†‖_{ℓ1}.
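For illustration, here is a minimal FISTA sketch (cf. [12]) with the soft-thresholding proximal operator, applied to a generic functional of the form (1/2)‖Ax − y^δ‖² + α‖x‖₁; the particular Tikhonov functional, operator and scaling used in this experiment are only assumed to have this generic form.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau*||.||_1: componentwise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_l1(A, y_delta, alpha, n_iter=500):
    """FISTA for 0.5*||A x - y||^2 + alpha*||x||_1 (a sketch, not the thesis code)."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part
    x_prev = np.zeros(A.shape[1])
    z, t = x_prev.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ z - y_delta)
        x = soft_threshold(z - grad / L, alpha / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = x + ((t - 1.0) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev
```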

We may observe in Figure 6.1 that, for smaller noise levels, the heuristic discrepancy rule appears to be the best performing, while for larger noise levels, the right quasi-optimality rule performs best. The two worst performers are the symmetric QO and HR rules, although the latter bucks this trend for small noise.

In Figure 6.2, we display a plot of the surrogate functionals ψ for a specific example. In particular, the Hanke-Raus, symmetric, and right quasi-optimality functionals exhibit some erroneous negative values which hamper their performance. Indeed, in this example, negative values were frequently observed for the symmetric QO rule, less so for the HR rule, and less still for the right QO rule. This issue will be discussed later on.

6.2 ℓ3/2 Regularisation

An interesting case for the purpose of illustrating our theory is q = 3/2, since the respective proximal mapping operator for computing the minimiser (3.1) has a closed-form solution which is easily computed. In this scenario, we use the diagonal operator defined above, in the foreword of this chapter, with the given parameters, and we compute the error with the Bregman distance; namely,

Err_{ℓq}(α∗) = D_ξ(x^δ_{α∗}, x†),   q ∈ (1, ∞).
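To illustrate why this case is convenient, the sketch below gives the closed-form proximal operator of the componentwise penalty τ|x|^{3/2} (obtained by substituting t = √|x| into the optimality condition) and a Bregman-distance error for R(x) = (2/3)Σᵢ|xᵢ|^{3/2}; the choice of subgradient ξ at x† and the scaling τ are assumptions of this sketch.

```python
import numpy as np

def prox_l32(v, tau):
    """argmin_x 0.5*(x - v)^2 + tau*|x|^(3/2), componentwise.
    With t = sqrt(|x|): t^2 + 1.5*tau*t - |v| = 0, hence the closed form below."""
    t = np.sqrt(9.0 * tau**2 / 16.0 + np.abs(v)) - 3.0 * tau / 4.0
    return np.sign(v) * t**2

def bregman_error_l32(x, x_true):
    """D_xi(x, x_true) for R(x) = (2/3)*sum |x_i|^(3/2), taking xi = R'(x_true)."""
    R = lambda z: (2.0 / 3.0) * np.sum(np.abs(z) ** 1.5)
    xi = np.sign(x_true) * np.sqrt(np.abs(x_true))   # gradient of R at x_true
    return R(x) - R(x_true) - float(np.dot(xi, x - x_true))
```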

We firstly consider the mildly ill-posed case and observe in Figure 6.3 that the Hanke-Raus rule is the best performing one in case the noise level is relatively small, although for mid-range noise levels, the heuristic discrepancy rule performs slightly better, and for larger noise levels still, the quasi-optimality rules match the heuristic discrepancy rule. Note that the quasi-optimality rules appear indistinguishable in this plot, and we remark too that the plots of their respective functionals were very similar (see Figure 6.4).

The relatively poor performance of the quasi-optimality rules may be explained by the observation in Figure 6.4 that the selected minimisers of the quasi-optimality functionals are suboptimal. If the other local (instead of global) minima were selected (e.g., those left of α = 10^{−2}), then the results would be much improved. This is a common phenomenon in many of our experiments involving the diagonal operator with q = 3/2. We observe that the HD and HR functionals oscillate as well, although in Figure 6.4, at least, the "correct" minimisers were chosen.

In the severely ill-posed setting, we notice in Figure 6.5 that the heuristic rules display a mixed performance. Observe in Figure 6.6 that the global minimum of the quasi-optimality functionals appears to be the "wrong" choice, as in the previous experiment. Indeed, the optimal parameter choice almost coincides with a local minimum of the quasi-optimality functionals. We also see in this example plot that the HD and HR rules grossly overestimate the parameter. A similar pattern was observed for the other considered noise levels.

6.3 ℓ3 Regularisation

Based on the Muckenhoupt-type conditions of Chapter 3, we postulated that for q > 2, the parameter choice rules we consider are likely to face mishaps. Consequently, we have elected to run a numerical experiment with q = 3 in order to illustrate what happens in practice. As in the previous experiment, we consider the diagonal operator and compute the error induced by the parameter choice rules as before.

Firstly considering the mildly ill-posed case, we see in Figure 6.7 that all the rules appear to perform very well. In Figure 6.8, we examine an example case in which the heuristic discrepancy rule would have selected α∗ = αmax, being its global minimum, were it not for our method which effectively disqualifies it.

For the severely ill-posed setting, we observe in Figure 6.9 that the rules perform very well in general, with the Hanke-Raus rule being the worst and the quasi-optimality rules being the best of the bunch.

6.4 TV Regularisation

Selecting

R(x) := sup_{φ ∈ C₀^∞(Ω;ℝⁿ), ‖φ‖_∞ ≤ 1} ∫_Ω x(t) div φ(t) dt,

with div denoting the divergence and Ω ⊂ ℝⁿ an open subset, yields total variation (TV) regularisation [138]. For the numerical treatment and for functions on the real line, this is often discretised as R(x) = ‖∆x‖_{ℓ1} with a (e.g., forward) difference operator ∆. For our numerical implementation, we used the FISTA algorithm [12], with the proximal mapping operator for the total variation functional being computed using a fast Newton-type method, courtesy of the code provided by [5, 6].


Note that in this case, we choose αmax such that ‖x^δ_{α∗}‖ ≤ C for a reasonable constant. Moreover, for each parameter choice rule, we compute the error via the so-called "strict metric":

Err_TV(α∗) = |R(x^δ_{α∗}) − R(x†)| + ‖x^δ_{α∗} − x†‖_{ℓ1},
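A minimal one-dimensional illustration of the discretised TV penalty and of the strict-metric error above; the actual experiments use the tomography operator and the Newton-type TV proximal solver of [5, 6], which are not reproduced here.

```python
import numpy as np

def tv_1d(x):
    """Discrete total variation with a forward difference operator (1D, anisotropic)."""
    return float(np.sum(np.abs(np.diff(x))))

def err_tv(x_alpha, x_true):
    """Strict-metric error: |R(x_alpha) - R(x_true)| + ||x_alpha - x_true||_1."""
    return abs(tv_1d(x_alpha) - tv_1d(x_true)) + float(np.sum(np.abs(x_alpha - x_true)))
```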

which was suggested in [86]. In this instance, we consider the tomography operator.

One may observe in Figure 6.11 that the right quasi-optimality rule appears to be the best performing one overall, although for smaller noise levels, the heuristic discrepancy rule seems to be preferable. The Hanke-Raus and symmetric quasi-optimality rules are generally the worst performing, with a tendency to exhibit negative values; see, e.g., Figure 6.12.

In Figure 6.12, we display a plot of the ψ-functionals for one particular example, in which we can see a demonstration of the symmetric quasi-optimality functional's proneness to exhibiting negative values due to numerical errors. In this particular plot, we see that even when the Hanke-Raus functional is non-negative, it overestimates the parameter. Recall that the same phenomenon of negativity was observed for ℓ1 regularisation, cf. Figure 6.2, although on this occasion, the right QO rule is better behaved.

The existence of negative values is a numerical artefact, as all ψ-functionals are, by definition, non-negative. The symmetric QO functional appears to be the most susceptible of them. The negative values occur only for ℓ1 and TV penalties. Hence, we conjecture that this could be related to the lack of strict convexity of the penalty functional R. The negative values are likely caused by numerical cancellation and, perhaps more importantly, approximation errors in the numerical computation of the minimiser of the Tikhonov functional. Indeed, when we calculate the minimiser more accurately, negative values occur less frequently.

6.5 Summary

To summarise the numerical experiments presented above, we remark that the rules worked reasonably well, even in instances contrary to the expectations set by the theory. We observed that, while none of the studied parameter choice rules were completely immune to mishaps, the heuristic discrepancy rule could perhaps be said to be the most robust overall, with the right quasi-optimality rule often presenting itself as the better choice for medium to large noise levels. The symmetric quasi-optimality rule, on the other hand, appears to be less reliable (cf. Sections 6.1 and 6.4) and prone to numerical errors, especially for non-strictly convex penalties. For such cases, one should use other methods instead. The Hanke-Raus rule is not a stellar performer either. An important issue that seems to influence the actual performance is that the surrogate functionals ψ sometimes do not estimate the error precisely enough, for instance, when a "wrong" local minimum is selected. In the linear theory, this issue can be related to the lack of certain regularity conditions for x† (e.g., (2.26) in Chapter 2) being satisfied [83]; however, for the present convex case, no such analysis is available yet, which, also in light of the above experiments, makes it difficult to offer a particular recommendation for a rule.


Figure 6.1: Error plots Err_{ℓ1}(α∗) for different rules and the optimal choice: ℓ1 regularisation, tomo operator.

Figure 6.2: Plot of ψ-functionals: ℓ1 regularisation, δ = 0.02%.

Figure 6.3: Error plots Err_{ℓq}(α∗) for different rules and the optimal choice: ℓ3/2 regularisation, mildly ill-posed.

Figure 6.4: Plot of ψ-functionals: ℓ3/2 regularisation, δ = 0.06%.

Figure 6.5: Error plots Err_{ℓq}(α∗) for different rules and the optimal choice: ℓ3/2 regularisation, severely ill-posed.

Figure 6.6: Plot of ψ-functionals: ℓ3/2 regularisation, severely ill-posed, δ = 1.3%.


Figure 6.7: Error plots Err_{ℓq}(α∗) for different rules and the optimal choice: ℓ3 regularisation, mildly ill-posed.

Figure 6.8: Plot of ψ-functionals: ℓ3 regularisation, δ = 10%.

Figure 6.9: Error plots Err_{ℓq}(α∗) for different rules and the optimal choice: ℓ3 regularisation, severely ill-posed.

Figure 6.10: Plot of ψ-functionals: ℓ3 regularisation, severely ill-posed, δ = 10%.

Figure 6.11: Error plots Err_TV(α∗) for different rules and the optimal choice: TV regularisation, tomo operator.

Figure 6.12: Plot of ψ-functionals: TV regularisation, δ = 0.06%.


Chapter 7

The Simple L-curve Rules for Linear and Convex Tikhonov Regularisation

In this chapter, we essentially present the results of [91], which was the paper that introduced the ψL and ψLR functionals for minimisation-based heuristic regularisation as an alternative to the more commonly known L-curve rule of Hansen [60]. We therefore pit these rules against one another for both linear and convex Tikhonov regularisation, and also include the quasi-optimality rule for reference. Moreover, these rules get their own separate chapter since they are the most recently suggested rules and thus were not compared in the papers preceding [91].

The numerical tests are performed with the following noise levels: 0.01%, 0.1%, 1%, 5%, 10%, 20%, 50%. Here, the first two are classified as "small", the second pair as "medium" and the last triple as "large". For each noise level, 10 experiments were done. Moreover, as stated, the methods in question are: ψL (simple-L), ψLR (simple-L ratio), the QO method, and the original L-curve method defined by maximising the curvature.

A general observation was that whenever the L-curve showed a clear corner, the parameter selected by both ψL and ψLR was very close to that corner, which confirms the idea of those methods being simplifications of the L-curve method (for the derivations, see Section 2.1.1). Note, however, that closeness on the L-curve does not necessarily mean that the selected parameter is close as well, since the parameterisation around the corner becomes "slow".

We compare the four methods, namely the two new simple-L rules, the QO rule, and the original L-curve, according to their total error for the respective selected α and calculate the ratio of the relative error to the optimal error:

Err_per(α∗) = d(x^δ_{α∗}, x†) / d(x^δ_{α_opt}, x†),    (7.1)

as we computed in Section 5, where one would typically compute Err_per with d(x, y) := ‖x − y‖ for the case of linear regularisation.

Table 7.1: Tikhonov Regularisation, Diagonal Operator: Median of Ratio (7.1) of the errors of the rules over 10 runs.

                 simple-L   simple-L rat.   QO      L-curve
s = 2, µ = 0.25
  δ small          1.02        1.02         1.03      9.49
  δ medium         1.01        1.02         1.08      1.78
  δ large          1.79        1.06         1.18      1.15
  δ = 50%          1.97        3.64         1.42      1.46
s = 2, µ = 0.5
  δ small          1.48        1.48         1.01     50.68
  δ medium         1.66        1.72         1.07      3.78
  δ large          1.78        1.59         1.01      2.52
  δ = 50%          3.09        5.07         1.48      1.90
s = 2, µ = 1
  δ small          3.88        3.88         1.07     77.12
  δ medium         2.01        2.01         1.07      7.98
  δ large          1.57        1.66         1.08      2.33
  δ = 50%          2.97        4.07         1.27      1.32

7.1 Linear Tikhonov Regularisation

We begin with classical Tikhonov regularisation, in which case we compute the regularised solution as (2.1).
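For reference, a minimal sketch of such a regularised solution, assuming (2.1) denotes the usual minimiser of the quadratic Tikhonov functional ‖Ax − y^δ‖² + α‖x‖²:

```python
import numpy as np

def tikhonov(A, y_delta, alpha):
    """Minimiser of ||A x - y||^2 + alpha*||x||^2, i.e. (A^T A + alpha I)^(-1) A^T y."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)
```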

7.1.1 Diagonal Operator

At first we consider a diagonal operator A with singular values having polynomial decay: σ_i = i^{−s}, i = 1, ..., n, for some value s, and consider an exact solution also with polynomial decay ⟨x†, v_i⟩ = (−1)^i i^{−p}. The size of the diagonal matrix A ∈ ℝ^{n×n} was chosen as n = 500. Furthermore, we added random noise (coloured Gaußian noise) ⟨e, u_i⟩ = δ i^{−0.6} e_i, where the e_i are standard normally distributed values.

Table 7.1 displays the median of the values of Err_per over 10 experiments with different random noise realisations and for varying smoothness indices µ. The table provides some information about the performance of the rules. Based on additional numbers not presented here, we can state some conclusions:

• The simple-L and simple-L ratio rules outperform the other rules for the small smoothness index µ = 0.25 and small data noise. Except for very large δ, the simple-L ratio is slightly better than the simple-L rule. For very large δ, the simple-L method works but is inferior to QO, while the simple-L ratio method then fails.

• For a high smoothness index, the QO rule outperforms the other rules and is then the method of choice.

• The original L-curve method often fails for small δ. For larger δ, it often works only acceptably. Only in situations when δ is quite large (> 20%) did we find several instances in which it outperforms all the other rules.

A similar experiment was performed for a higher smoothing operator by setting s = 4, with similar conclusions. We note that the theory has indicated that for µ = 0.5, the simple-L curve is order optimal without any additional condition on x†, while for the QO rule this happens at µ = 1 (see Theorem 6 in Chapter 2). One would thus expect that the simple-L rule performs better for µ = 0.5. However, this was not the case (only for µ ≤ 0.25) and the reason is unclear. (We did not do experiments with an x† that does not satisfy the regularity condition (2.27), though.) Still, the result that the simple-L methods perform better for small µ is backed by the numerical results.
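The experimental protocol used throughout (10 noise realisations per level, median of the error ratio (7.1)) can be sketched as follows; all callables are hypothetical placeholders for the problem-specific routines.

```python
import numpy as np

def median_error_ratios(noise_levels, make_data, solve_with_rule, solve_optimal,
                        dist, n_runs=10):
    """Median over n_runs realisations of Err_per = dist(x_rule, x_true)/dist(x_opt, x_true)."""
    medians = {}
    for delta in noise_levels:
        ratios = []
        for _ in range(n_runs):
            y_delta, x_true = make_data(delta)          # noisy data and exact solution
            x_rule = solve_with_rule(y_delta)           # heuristically chosen parameter
            x_opt = solve_optimal(y_delta, x_true)      # optimal (oracle) parameter
            ratios.append(dist(x_rule, x_true) / dist(x_opt, x_true))
        medians[delta] = float(np.median(ratios))
    return medians
```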

7.1.2 Examples from IR Tools

For the next scenario, we consider a rotational blurring operator from IR Tools [42], namely PRblur, which outputs a sparse operator (which we chose as 1024 × 1024), and seek to reconstruct the satellite image solution provided in the package. The data is corrupted with white Gaußian noise which is chosen such that ‖e‖/‖x†‖ yields the relative noise level. Note that the operator and solution are normalised such that ‖A‖ = ‖x†‖ = 1 and our parameter search is restricted to the interval α ∈ [10^{−10}, ‖A‖²]. Similarly to the previous experiment, in Table 7.2, we record the median of the values of Err_per over 10 different experiments with varying random noise realisations.

Table 7.2: Tikhonov Regularisation, Rotational Blur Operator: Median of Ratio (7.1) of the errors of the rules over 10 runs.

              simple-L   simple-L rat.   QO      L-curve
δ small         1.29        1.29        74.04      9.43
δ medium        1.05        1.09         1.19      2.59
δ large         1.02        1.01         1.10      1.02
δ = 50%         1.00       65.21         1.05      1.00

Note that the simple-L rules outperform the other ones in the majority of cases. However, the margin of improvement compared to the QO rule is not large. We observed that for small noise, the QO rule often overestimates the optimal parameter.

All rules performed quite well, and the L-curve method in particular showed noticeable improvement as the noise level increased. The simple-L ratio method failed for 50% noise, however. In [91, Theorem 2.13], a smallness condition was required for the noise in order to prove convergence rates for the simple-L rules. Thus, the numerical results indicate that this does indeed seem to be necessary.

Whilst using the IR Tools package, we encountered a recurring problem, especially for the tomography operator (to be discussed), whereby the error and functional plots did not display the typical shape one would expect. In particular, the stability error did not "blow up" as α → 0. Similar problems have been encountered for the L-curve method, which fails to show a clear corner point. The theoretical explanation for this appears to be that the tomography operator in IR Tools is not sufficiently ill-conditioned, i.e., the singular values do not decay sufficiently fast, which is a point regarding that package which we would like to emphasise. Note that the Muckenhoupt-type condition may not be satisfied in this case, which is a possible cause of failure for the heuristic rules. It is interesting to note that this phenomenon does not occur for the problems from Hansen's Regularisation Tools [61] used in Chapters 5 and 6.

We now consider the tomography operator from the IR Tools package, i.e., PRtomo, with the true solution being a Shepp-Logan phantom. The operator in question is of size 8100 × 1024. In Table 7.3, one can find a record of the median values of Err_per for 10 different realisations of each noise level, where we omit the "small" noise case in this scenario, as the problem is not sufficiently ill-conditioned, so that for small noise levels α∗ = αmin yields the best choice. We also subsequently switch the 1% noise level consideration for 2%. (We revert back to the 1% noise level in the following scenarios.)

Table 7.3: Tikhonov Regularisation, Tomography Operator: Median of Ratio (7.1) of the errors of the rules over 10 runs.

              simple-L   simple-L rat.   QO      L-curve
δ medium        1.36        1.36         4.32      1.68
δ large         1.38        1.39         1.20      2.71
δ = 50%         2.37        1.99         1.63      4.00

In this scenario, the L-curve method actually performed worse for larger noise and most often chose α∗ = αmin. All rules, however, were guilty of quite often underestimating the optimal parameter, which is a result of the lack of sufficiently high ill-posedness mentioned above.


7.2 Convex Tikhonov Regularisation

We now investigate the heuristic rules for convex Tikhonov regularisation, i.e., we consider x^δ_α as the minimiser of the functional (3.1) with a non-quadratic penalty R. Henceforth, the simple-L methods will consist of minimising the functionals as defined in (3.17), i.e.,

ψL(α, y^δ) = R(x^II_{α,δ}) − R(x^δ_α),     ψLR(α, y^δ) := (R(x^II_{α,δ}) − R(x^δ_α)) / R(x^δ_α),

which define the simple-L and simple-L ratio functionals, respectively.

Note that one can also consider a discrete-type functional:

ψL(α_n) = R(x^δ_{α_{n+1}}) − R(x^δ_{α_n}),     α_n = α₀ qⁿ,  q < 1,

as an alternative "convexification" of the simple L-curve method, but the method stated earlier yielded more fruitful results in our experiments and we therefore opted to stick with it.
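A direct transcription of the two functionals in (3.17), assuming routines that return, for a given α, the Tikhonov minimiser x^δ_α and the second Bregman iterate x^II_{α,δ} (the helper names in the usage comment are hypothetical).

```python
def psi_L(R, x_alpha, x_breg2):
    """Simple-L functional: R(x^II_{alpha,delta}) - R(x^delta_alpha)."""
    return R(x_breg2) - R(x_alpha)

def psi_LR(R, x_alpha, x_breg2):
    """Simple-L ratio functional: (R(x^II) - R(x_alpha)) / R(x_alpha)."""
    return (R(x_breg2) - R(x_alpha)) / R(x_alpha)

# Usage sketch over a grid of alphas (tikhonov_minimiser and
# second_bregman_iterate are hypothetical problem-specific solvers):
# values = [psi_L(R, tikhonov_minimiser(a), second_bregman_iterate(a)) for a in alphas]
# alpha_star = alphas[min(range(len(values)), key=values.__getitem__)]
```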

7.2.1 ℓ1 Regularisation

To begin with, we consider R = ‖·‖₁ and the rotational blur operator as before (with the same size as in our earlier configuration), but this time we would like to reconstruct a sparse solution x†, and we therefore opt for the sppattern solution from the IR Tools package, which is a sparse image of geometric shapes. We choose Gaußian white noise as before, corresponding to the respective noise levels. Note that we compute a minimiser via FISTA [12]. In this case, we measure the error with the ℓ1 norm, i.e., we compute Err_per with d(x, y) := ‖x − y‖_{ℓ1}.

In our experiments, we observed that the values of the aforementioned simple-L functionals were particularly small, therefore on occasion yielding negative values due to numerical errors. This problem was easily rectified, however, by taking the absolute value of (3.15), which is theoretically equivalent to the original functional in any case. For the quasi-optimality functional, we opted to use the so-called right quasi-optimality rule, which we have defined in this thesis as the standard quasi-optimality rule (3.15) (cf. [90]). For selecting the parameter according to the L-curve method of Hansen, maximising the curvature via (2.16) is no longer an implementable strategy, as R is now non-smooth. However, it was still possible to compute the corner point due to the discretisation of the problem. In Table 7.4, one may find a record of the results.

Table 7.4: ℓ1 Regularisation, Rotational Blur Operator: Median of Ratio (7.1) of the errors of the rules over 10 runs.

              simple-L   simple-L rat.   QO      L-curve
δ small         2.07        2.07         1.18      2.19
δ medium        4.82        4.82         1.04      4.68
δ large         1.00        1.00         1.00      6.73
δ = 50%         1.08        1.08         1.08     10.37

As mentioned already, the simple-L functionals produced very small values and were therefore somewhat oscillatory, i.e., they were prone to exhibiting multiple local minima. Our algorithm selected the smallest interior minimum, but in some plots we observed that there were larger local minima which would have corresponded to a more accurate estimation of the optimal parameter. It should be noted that for medium noise, the L-curve was quite "hit & miss" and for larger noise, quite unsatisfactory.

7.3 ℓ3/2 Regularisation

Continuing with the theme of convex Tikhonov regularisation, and more specifically ℓ3/2 regularisation, we now consider (3.1) with R = ‖·‖_{3/2}. The considered forward operator A : ℓ^{3/2}(ℕ) → ℓ²(ℕ) is a diagonal operator with polynomially decaying singular values as considered previously, i.e., σ_i = i^{−s}, and we also consider a solution with polynomial decay ⟨x†, v_i⟩ = (−1)^i i^{−p} and add random noise ⟨e, u_i⟩ = δ i^{−0.6} e_i. The size of the operator in question is 625 × 625. Note that in this scenario, we can easily compute the Tikhonov solution and the second Bregman iterate, as the associated proximal mapping operator has a closed-form solution; see Chapter 3 and [90].

A table of results is compiled in Table 7.5 and the following observations are noted:

• Barring the quasi-optimality rule, all methods were generally subpar for small noise for all tested smoothness indices. In general, the quasi-optimality rule would appear to be the best performing overall, at least, although trumped on a few occasions.

• The "sweet spot" for both simple-L methods appears to be medium to large noise. Overall, at least, they appear to perform marginally better for smaller smoothness indices. The original L-curve method performs quite well for larger noise, as has been observed in other experiments, but the margin for error is quite large for smaller noise levels.


Table 7.5: ℓ3/2 Regularisation, Diagonal Operator: Median of Ratio (7.1) of the errors of the rules over 10 runs.

                 simple-L   simple-L rat.   QO      L-curve
s = 2, µ = 0.25
  δ small          6.59        6.59         1.02    471.03
  δ medium         1.95        1.95         1.18     31.22
  δ large          1.10        1.10         1.07      1.11
  δ = 50%          1.13        1.21         1.11      1.22
s = 2, µ = 0.5
  δ small         14.41       14.41         1.00      8.91
  δ medium         2.05        2.05         1.01    115.00
  δ large          1.09        1.09         1.03      1.15
  δ = 50%          4.72        5.61         1.01      1.80
s = 2, µ = 1
  δ small         20.51       20.51         1.46      4.29
  δ medium         1.36        1.36         1.33    107.77
  δ large          1.14        1.14         1.40      1.34
  δ = 50%          7.06        9.28         1.08      1.53

7.4 TV Regularisation

We now suppose that x^δ_α is the minimiser of (3.1) with R = TV, the total variation seminorm. The functional is minimised using FISTA, with the proximal mapping operator for the total variation seminorm being computed by a fast Newton-type method as in [90]. In this case, we compute the error with respect to α via the so-called strict metric

d_strict(x^δ_α, x†) := |R(x^δ_α) − R(x†)| + ‖x^δ_α − x†‖_{ℓ1},

which was suggested in, e.g., [86], and we subsequently record the values of Err_per with d = d_strict, the results of which are provided in Table 7.6. The operator in question is the tomography one arising from PRtomo with the same configuration as before. We add white Gaußian noise, corresponding to the respective noise levels.

We note the following observations: all the functionals were oscillatory, exhibiting local minima which were much more pronounced compared to the linear case. This oscillatory behaviour is often a cause for the selection of a false parameter, cf. the subpar results in Table 7.6. An inspection of this table reveals that the simple-L ratio method appears to be the best performing overall, which we also observed in other experiments involving TV regularisation not recorded here.


Table 7.6: TV Regularisation, Tomography Operator: Median of Ratio (7.1) of the errors of the rules over 10 runs.

              simple-L   simple-L rat.    QO      L-curve
δ small         6.86        6.86        504.24      5.65
δ medium       59.55        6.67         61.85      8.43
δ large         8.17        6.98          8.20      9.73
δ = 50%         2.35        2.47          2.35      9.99

7.5 Summary

To summarise the numerical results presented above, the simple-L methods are near optimal for linear Tikhonov regularisation in case of low smoothness of the exact solution. Moreover, the simple-L rule in particular edges out the simple-L ratio rule, but the margin of difference is small and only apparent for larger noise levels. We also considered convex Tikhonov regularisation, for which the simple-L functionals had to be adapted from their original forms. In any case, they were successfully implemented and produced above-satisfactory results. It is interesting to note, however, that in this setting the simple-L ratio method appeared to present itself as the slightly superior of the two variants, especially for TV regularisation, where it also outperformed the RQO rule in general.


Chapter 8

Heuristic Rules for Nonlinear Landweber Iteration

Note that this chapter is taken entirely from the as yet unpublished preprint [77], of which the author of this thesis is a coauthor, but for which the numerics are largely credited to Simon Hubmer and Ekaterina Sherina. We introduce a couple of test problems on which we evaluate the performance of the heuristic stopping rules described in (4.21). We opt to present "classical" examples from the aforementioned paper, although there one will find a greater expanse of problems including, e.g., integral equations, tomography, and parameter estimation. For the problems we treat here, we provide a short review and describe their precise mathematical setting in Section 8.1 below.

For each of the problems in question, we start from a known solution x†, which defines exact data y. Random noise corresponding to different noise levels δ is subsequently added to y to simulate noisy data y^δ. The step-size ω for Landweber iteration (4.16) is computed via (4.20) based on a numerical estimate of ‖F′(x†)‖. Afterwards, we run Landweber iteration for a predetermined number of iterations kmax, which is chosen manually for each problem via a visual inspection of the error, residual, and heuristic functionals, such that all important points of the experiment are captured for this comparison.

Following each application of Landweber iteration, we compute the values of the heuristic functionals, as well as their corresponding minimisers k∗. For each of the different heuristic rules, we then compute the resulting absolute error

Err(k∗) = ‖x^δ_{k∗} − x†‖,

and, as in the previous chapters, for comparison, we also compute the optimal stopping index

k_opt := argmin_{k∈ℕ} ‖x^δ_k − x†‖,


together with the corresponding optimal absolute error. Furthermore, we also compute the stopping index k_DP determined by the discrepancy principle (1.29), which can also be interpreted as the "first" minimiser of the functional

φ_DP(k) := | ‖F(x^δ_k) − y^δ‖ − τδ |.

As noted in Section 4.2, since the exact value of η in (4.18) is unknown for our test problems, a suitable value for τ has to be chosen manually. Depending on the problem, we use one of the popular choices τ = 1.1 or τ = 2, although, as we are going to see below, these are not necessarily the "optimal" ones. In any case, the corresponding results are useful reference points for the performance of the different heuristic rules.

Let us point out at least two peculiarities that we encounter when applying heuristic rules to nonlinear Landweber regularisation.

• The convergence theory for nonlinear inverse problems is a local one, i.e., one can only prove convergence when the initial guess is sufficiently close to the exact solution x†, and in the case where noise is present, the iteration usually diverges out of the neighbourhood of the solution x† as k → ∞. In particular, it may also happen that x^δ_k "falls" out of the domain of the forward operator. Consequently, the functionals ψ(k, y^δ) in (4.21) may not be defined for very large k. By definition, however, one would have to compute a minimiser over all k, which is then not practically possible. Note that this is in contrast to Tikhonov regularisation (see Chapter 2), where the solution is always well defined for any α. In practice, a remedy is the introduction of an upper limit for the number of iterations up to which the functional ψ(k, y^δ) is computed.

• The second issue concerns the fact that the approximation error (i.e., (1.36)) is quite often poorly estimated by ψ, though only for the first few iterations. For practical purposes, the consequence is that the ψ(k, y^δ) functionals typically exhibit a local minimum for small k. This happens quite often, even in the linear case. However, this local minimum is rarely the global one, which usually appears much later for larger k, although an inexperienced practitioner may be tempted to take this local minimum for the global one in order to save having to compute further iterations. Note that the underlying reason for this local minimum may be explained (at least in the linear case) by the analysis found in [83]. That is, in order to estimate the approximation error reliably (i.e., for (1.36)), some regularity conditions for x† have to be satisfied (see (4.13)). These conditions are more restrictive for Landweber iteration than for Tikhonov regularisation and usually hold for small iteration numbers k, and typically with "bad" constants.

Note that due to the above considerations, we consequently discard the first few iterations (with regard to selecting a global minimiser) in our experiments.
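A minimal sketch of the resulting procedure. The functional used below, ψ(k) = √k ‖F(x_k) − y^δ‖, is only an illustrative stand-in of heuristic-discrepancy type; the rules actually compared in this chapter are the functionals (4.21) from Chapter 4, and the step size ω and the number of discarded initial iterations are assumptions of this sketch.

```python
import numpy as np

def landweber_heuristic(F, dF_adjoint, y_delta, x0, omega, k_max, k_skip=5):
    """Nonlinear Landweber iteration x_{k+1} = x_k + omega*F'(x_k)^*(y^delta - F(x_k)),
    with a heuristic stopping index chosen as the minimiser of psi over k > k_skip."""
    x = np.array(x0, dtype=float)
    iterates, psi = [], []
    for k in range(1, k_max + 1):
        x = x + omega * dF_adjoint(x, y_delta - F(x))
        iterates.append(x.copy())
        psi.append(np.sqrt(k) * np.linalg.norm(y_delta - F(x)))
    offset = int(np.argmin(psi[k_skip:]))
    k_star = k_skip + offset + 1            # iteration index (1-based)
    return iterates[k_star - 1], k_star, np.array(psi)
```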

8.1 Test problems

8.1.1 Nonlinear Hammerstein Operator

A typical nonlinear inverse problem [58, 74, 75, 121–123] which is used for testing, in particular, the behaviour of iterative regularisation methods, is based on so-called nonlinear Hammerstein operators of the form

F : H¹[0,1] → L²[0,1],    F(x)(s) := ∫₀¹ k(s, t) φ(x(t)) dt.

Here, we look at a special instance of this operator, namely

F(x)(s) := ∫₀ˢ x(t)³ dt,

for which it is known (see, for example, [123]) that the tangential cone condition (4.18) holds locally around a solution x†, given that it is bounded away from zero. Furthermore, the Fréchet derivative and its adjoint, which are needed for the implementation of Landweber iteration, are easy to compute explicitly.

In our experiments here, we discretise with n = 128, and the exact solution is x†(s) = 2 + (s − 0.5)/10. As initial guess, we take x₀ = 1 and, for the discrepancy principle, opt for τ = 2. We conduct experiments for noise levels δ ∈ (0.1, 2) with step-size 0.1.

In Figure 8.1, we observe that in general, none of the rules perform particularly well, including the a-posteriori rule, the discrepancy principle. Amongst the heuristic rules, at least, the heuristic discrepancy and Hanke-Raus rules appear to perform the best for lower noise levels, but for noise levels over 1.2% they perform the worst, where the Hanke-Raus rule exhibits this phenomenon to the greatest degree. The quasi-optimality and simple-L rules maintain a consistent performance across the entire tested range of δ.

In Figure 8.2, we see example plots corresponding to the 1% noise level case. In this particular instance, we observe that the quasi-optimality and simple-L rules overestimate the optimal stopping index, whereas the heuristic discrepancy and Hanke-Raus rules fare better.

Figure 8.1: Nonlinear Hammerstein equation: error plots for different parameter choice rules and the optimal choice.

Figure 8.2: Plot of ψ-functionals and k∗: nonlinear Hammerstein equation, δ = 1%.
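For illustration, a simple rectangle-rule discretisation of the Hammerstein operator above and of the adjoint of its derivative; note that the thesis poses F on H¹[0,1], whereas this sketch uses the plain L² adjoint, so it only conveys the structure of the forward map.

```python
import numpy as np

def hammerstein_setup(n=128):
    """Discretisation of F(x)(s) = int_0^s x(t)^3 dt on a uniform grid."""
    h = 1.0 / n

    def F(x):
        return h * np.cumsum(x**3)

    def dF_adjoint(x, w):
        # (F'(x)^* w)(t) = 3*x(t)^2 * int_t^1 w(s) ds   (L^2 adjoint)
        tail = h * np.cumsum(w[::-1])[::-1]
        return 3.0 * x**2 * tail

    return F, dF_adjoint

# Configuration from the text: exact solution and initial guess.
n = 128
s = (np.arange(1, n + 1) - 0.5) / n        # grid midpoints (a discretisation choice)
x_true = 2.0 + (s - 0.5) / 10.0
x0 = np.ones(n)
```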

8.1.2 Auto-Convolution

As an additional test example, we consider the problem of (de-)auto-convolution [22, 38, 47, 131]. Among the many inverse problems based on integral operators, auto-convolution is particularly interesting due to its importance in laser optics [1, 14, 43]. Mathematically, it amounts to solving an operator equation of the form (4.14) with the operator

F : L²[0,1] → L²[0,1],    F(x)(s) := (x ∗ x)(s) := ∫₀¹ x(s − t) x(t) dt,

where the functions in L²[0,1] are interpreted as 1-periodic functions on ℝ. While deriving the Fréchet derivative of F and its adjoint is straightforward, it is not known whether the tangential cone condition (4.18) holds.


However, for small enough noise levels δ, the residual functional is locally convex around the exact solution x†, given that x† has only finitely many non-zero Fourier coefficients.

In this experiment, we take x†(s) = 10 + √2 sin(2πs) as the exact solution, x₀(s) = 10 + (1/4)√2 sin(2πs) as the initial guess, and τ = 1.1 for the discrepancy principle. We conduct experiments for noise levels δ ∈ (0.01, 0.1) with step-size 0.005.
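For illustration, an FFT-based discretisation of the periodic auto-convolution operator and of the adjoint of its Fréchet derivative F′(x)h = 2 x ∗ h; the grid size and the use of the plain L² adjoint are assumptions of this sketch.

```python
import numpy as np

def autoconvolution_setup(n=128):
    """F(x)(s) = int_0^1 x(s - t) x(t) dt for 1-periodic x, via circular convolution."""
    h = 1.0 / n

    def F(x):
        return h * np.real(np.fft.ifft(np.fft.fft(x) ** 2))

    def dF_adjoint(x, w):
        # adjoint of h -> 2*(x * h) is circular correlation with x
        return 2.0 * h * np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))

    return F, dF_adjoint

# Exact solution and initial guess from the text (grid size assumed):
n = 128
s = np.arange(n) / n
x_true = 10.0 + np.sqrt(2.0) * np.sin(2.0 * np.pi * s)
x0 = 10.0 + 0.25 * np.sqrt(2.0) * np.sin(2.0 * np.pi * s)
```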

Figure 8.3: Autoconvolution: error plots for different parameter choice rules and the optimal choice.

Figure 8.4: Plot of ψ-functionals: autoconvolution, δ = 0.5%.

In Figure 8.3, we observe that, barring the Hanke-Raus rule, the heuristic rules outperformed the discrepancy principle, with the heuristic discrepancy rule even matching, or almost matching, the optimal stopping rule in all tested instances. From this plot, it is evident that the Hanke-Raus rule is by far the worst performing.

In Figure 8.4, we provide example plots of the functionals and their associated stopping indices. In this example, we see that the simple-L rule overestimates the optimal stopping index, but since the error does not increase significantly as the iterations progress, it is the Hanke-Raus rule which exhibits the largest error, as it underestimates the optimal stopping index and the error is significantly larger for smaller k.

8.1.3 Summary

The results of this chapter merely present a first step into the realm of applying heuristic rules to nonlinear Landweber iteration. However, it is at least possible to draw some speculative conclusions based on the two experiments presented in the preceding sections. Certainly, in Section 8.1.2 on auto-convolution, the heuristic stopping rules presented themselves as not only viable, but also very effective methods for terminating Landweber iteration. The tests of Section 8.1.1 on the nonlinear Hammerstein problem were less conclusive, as neither the heuristic rules nor the discrepancy principle performed consistently. Finally, we remind the interested reader that further experiments will be conducted in the upcoming paper [77]. The additional tests in that paper, together with subsequent future research, will hopefully help achieve a better understanding of heuristic stopping rules for nonlinear Landweber iteration.


Part III

Future Scope


Chapter 9

Future Scope

Several topics in current contention remain beyond the scope of this thesis and therefore we did not discuss them in the preceding chapters. However, for the interested reader, we provide a brief summary of them, in so far as the author's knowledge permits:

9.1 Convex Heuristic Regularisation

In Section 3.3, we presented the recent work of [90], which showed that, under Muckenhoupt-type noise restrictions, one can prove convergence of certain heuristic rules in the Bregman distance in case the operator is diagonal. Clearly, the restriction to a diagonal operator is severe and does not correspond to the majority of practical situations. Therefore, it would be interesting to see whether restrictions such as (3.36) can be lifted to the case where A is a more general linear operator. One may conjecture that this could perhaps be achieved through projections on the noise vector.

Furthermore, there is as yet little theory for heuristic rules in case A : X → Y is a nonlinear operator and/or Y is a Banach space (as well as X). The development of a noise-restricted analysis in either of those cases would be quite a feat and certainly a project for the future, especially given that the majority of natural phenomena are usually modelled via nonlinear operators.

9.2 Heuristic Blind Kernel Deconvolution

A popular instance of an inverse problem comes in the form of the deconvolution problem [3, 127]. In particular, we consider the ill-posed problem (1.1) with A replaced by the convolution operator A = k∗; that is:

k ∗ f = g, (9.1)


where k, f, g ∈ H¹ and only noisy data g^δ = g + e are available and δ is, as usual, the noise level. Moreover, we assume, also as usual, that the solution does not depend continuously on the data.

This problem is a typical example of when the fields of inverse problems and image processing overlap. It is typical for k to represent a point-spread function that acts on the image f to produce the blurred (and noisy) image g^δ.

9.2.1 Deconvolution

In the standard deconvolution case, one generally assumes knowledge of the kernel k and may therefore determine a regularised solution of the one unknown (i.e., f†) via

f^δ_α ∈ argmin_{f∈H¹} T^δ_α(f),   where   T^δ_α(f) := (1/2)‖k ∗ f − g^δ‖² + α TV(f),    (9.2)

which is a special instance of the Tikhonov functional with regularisation term given by the functional TV : X → ℝ ∪ {∞} and α ∈ (0, αmax).

Note that when k = id, the identity, (9.2) is used for denoising and is commonly referred to in the literature as Rudin-Osher-Fatemi (ROF) regularisation [138].

For working purposes, one may consider ∫‖∇f‖ with a difference operator ∇. This is the so-called isotropic version, and the version with the ℓ1 norm is known as the anisotropic case.

In this case, we may solve the minimisation problem (9.2) with the alternating direction method of multipliers (ADMM) [16, 27, 41, 46]. Note that the augmented Lagrangian [13, 66, 129] for the discretised version of (9.2) is given by

L(f̄, f, 𝐟, λ, ξ) = (1/2)‖k ∗ f̄ − g^δ‖² + α Σ_{i=1}^{n} ‖𝐟_i‖ + (ω₁/2)‖f̄ − f‖² − λᵀ(f̄ − f) + Σ_{i=1}^{n} ( (ω₂/2)‖𝐟_i − D_i f‖² − ξ_iᵀ(𝐟_i − D_i f) ),

where λ ∈ ℝⁿ and ξ ∈ ℝ^{n×2} [19].

Certainly, it would be interesting to see how the heuristic rules perform. In some preliminary testing, which involved deconvolving a noisy image that had been blurred with a Gaußian point spread function, the ψLR rule seemed most promising.

9.2.2 Semi-Blind Deconvolution

In this case, we have two unknowns: namely, we do not know k in addition to the unknown f†. However, the separating factor from (completely) blind deconvolution is that in semi-blind deconvolution we assume knowledge of an approximation of k, namely k^ε satisfying ‖k^ε − k‖ ≤ ε, where ε is the approximation error.

We consider a different functional to (9.2), which was suggested in [19]; namely,

T^{δ,ε}_{α,β}(k, f) = (1/2)‖k ∗ f − g^δ‖² + α(‖f‖² + TV(f)) + β(‖k − k^ε‖² + TV(k)),    (9.3)

from which we may obtain the regularised solutions for the two unknowns:

(k^ε_β, f^δ_α) ∈ argmin_{k,f∈H¹} T^{δ,ε}_{α,β}(k, f).    (9.4)

In [19], a computationally efficient way of computing the minimisers, based on the associated augmented Lagrangian of (9.3), was also proposed:

L(f̄, f, 𝐟, k̄, k, 𝐤, λ, ξ, ζ, µ) = ‖k̄ ∗ f̄ − g^δ‖² + α(‖f‖² + Σ_{i=1}^{n} ‖𝐟_i‖) + β(‖k − k^ε‖² + Σ_{i=1}^{n} ‖𝐤_i‖)
    + (ω₁/2)‖f̄ − f‖² − ⟨λ, f̄ − f⟩ + Σ_{i=1}^{n} ( (ω₂/2)‖𝐟_i − D_i f‖² − ⟨ξ_i, 𝐟_i − D_i f⟩ )
    + (ω₃/2)‖k̄ − k‖² − ⟨ζ, k̄ − k⟩ + Σ_{i=1}^{n} ( (ω₄/2)‖𝐤_i − D_i k‖² − ⟨µ_i, 𝐤_i − D_i k⟩ ),

where λ, ζ ∈ ℝⁿ and ξ, µ ∈ ℝ^{n×2}, together with the ADMM method. In the paper [19], the parameters α and β were chosen by "trial and error". Therefore, an interesting scope for further work would be to develop and implement a heuristic parameter choice rule for selecting (α∗, β∗). Clearly, the parameter choice rules previously discussed would be void in their current form as presented in the previous chapters. There are no doubt many possibilities for how one could proceed, but one idea in contention could be to do it in an iterative manner, i.e., for a fixed initial guess f₀, one would first solve

k^ε_β ∈ argmin_{k∈H¹} (1/2)‖k ∗ f₀ − g^δ‖² + β(‖k − k^ε‖² + TV(k)),

with a heuristic parameter choice rule

β∗ ∈ argmin_{β∈(0,βmax)} ψ(β, g^δ).


Then, one could proceed to minimise

f^δ_{α(β∗)} ∈ argmin_{f∈H¹} (1/2)‖k^ε_{β∗} ∗ f − g^δ‖² + α(β∗)(‖f‖² + TV(f)),

and finally obtain a solution via a heuristic rule

α∗ ∈ argmin_{α∈(0,αmax)} ψ(α(β∗), g^δ),

which would yield f^δ_{α∗}. Of course, there may be more effective alternatives, and they, along with the aforementioned algorithm, should be investigated numerically.
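A sketch of the alternating procedure proposed above; every routine here (solve_kernel, solve_image, psi) is a hypothetical placeholder for the corresponding minimisation or heuristic functional, so the sketch only fixes the order of the steps.

```python
def semi_blind_heuristic(g_delta, k_eps, f0, betas, alphas,
                         solve_kernel, solve_image, psi):
    """Alternating heuristic choice: first beta* for the kernel step (image frozen
    at f0), then alpha* for the image step with the selected kernel k_{beta*}."""
    kernels = {b: solve_kernel(g_delta, k_eps, f0, b) for b in betas}
    beta_star = min(betas, key=lambda b: psi(b, kernels[b], g_delta))

    images = {a: solve_image(g_delta, kernels[beta_star], a) for a in alphas}
    alpha_star = min(alphas, key=lambda a: psi(a, images[a], g_delta))
    return kernels[beta_star], images[alpha_star], beta_star, alpha_star
```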

9.3 Meta-Heuristics

Note that all of the previous section may be considered an application of the greater branch of meta-heuristics, e.g., when one has a regularisation of the form:

T^δ_{α,j}(x) := (1/2)‖Ax − y^δ‖² + Σ_{i=1}^{j} α_i R_i(x),   (j ∈ ℕ),    (9.5)

where we can observe that (9.3) is merely the above with j = 2, R₁ = ‖·‖² + TV and R₂ = ‖· − k^ε‖² + TV. The general principle of meta-heuristic regularisation has been applied (numerically) with some success (cf. [39, 119, 120]), with a typical application involving the theory of reproducing kernel Hilbert spaces (RKHS) in machine learning, and with other examples including, e.g., finance (cf. [72]). Thus, there is much potential for further investigation of this "sub-branch" of heuristic parameter choice rules.


Bibliography

[1] S. W. Anzengruber, S. Burger, B. Hofmann, and G. Steinmeyer, Variational regularization of complex deautoconvolution and phase retrieval in ultrashort laser pulse characterization, Inverse Problems, 32 (2016), pp. 035002, 27.

[2] M. A. Arino and B. Muckenhoupt, Maximal functions on classical Lorentz spaces and Hardy's inequality with weights for nonincreasing functions, Transactions of the American Mathematical Society, 320 (1990), pp. 727–735.

[3] G. Ayers and J. C. Dainty, Iterative blind deconvolution method and its applications, Optics Letters, 13 (1988), pp. 547–549.

[4] A. Bakushinskiy, Remarks on choosing a regularization parameter using quasi-optimality and ratio criterion, USSR Computational Mathematics and Mathematical Physics, 24 (1985), pp. 181–182.

[5] A. Barbero and S. Sra, Fast Newton-type methods for total variation regularization, in Proceedings of the 28th International Conference on Machine Learning (ICML-11), Citeseer, 2011, pp. 313–320.

[6] ———, Modular proximal optimization for multidimensional total-variation regularization, arXiv preprint arXiv:1411.0589, (2014).

[7] F. Bauer and S. Kindermann, The quasi-optimality criterion for classical inverse problems, Inverse Problems, 24 (2008), pp. 035002, 20.

[8] ———, Recent results on the quasi-optimality principle, J. Inverse Ill-Posed Probl., 17 (2009), pp. 5–18.

[9] F. Bauer and M. A. Lukas, Comparing parameter choice methods for regularization of ill-posed problems, Mathematics and Computers in Simulation, 81 (2011), pp. 1795–1841.

[10] H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, Cham, second ed., 2017. With a foreword by Hedy Attouch.


[11] A. Beck, First-order methods in optimization, vol. 25 of MOS-SIAM Series on Optimization, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Optimization Society, Philadelphia, PA, 2017.

[12] A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), pp. 183–202.

[13] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods, Athena Scientific, Belmont, MA, 2014. Originally published by Prentice-Hall, Inc. in 1989. Includes corrections (1997).

[14] S. Birkholz, G. Steinmeyer, S. Koke, D. Gerth, S. Burger, and B. Hofmann, Phase retrieval via regularization in self-diffraction-based spectral interferometry, J. Opt. Soc. Am. B, 32 (2015), pp. 983–992.

[15] N. Bissantz, T. Hohage, A. Munk, and F. Ruymgaart, Convergence rates of general regularization methods for statistical inverse problems and applications, SIAM Journal on Numerical Analysis, 45 (2007), pp. 2610–2636.

[16] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn., 3 (2011), pp. 1–122.

[17] L. M. Bregman, A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming, Z. Vycisl. Mat. i Mat. Fiz., 7 (1967), pp. 620–631.

[18] C. Brezinski, G. Rodriguez, and S. Seatzu, Error estimates for linear systems with applications to regularization, Numerical Algorithms, 49 (2008), pp. 85–104.

[19] A. Buccini, M. Donatelli, and R. Ramlau, A semiblind regularization algorithm for inverse problems with application to image deblurring, SIAM Journal on Scientific Computing, 40 (2018), pp. A452–A483.

[20] M. Burger, J. Flemming, and B. Hofmann, Convergence rates in ℓ1-regularization if the sparsity assumption fails, Inverse Problems, 29 (2013), pp. 025013, 16.


[21] M. Burger and S. Osher, Convergence rates of convex variational regularization, Inverse Problems, 20 (2004), pp. 1411–1421.

[22] S. Burger and B. Hofmann, About a deficit in low-order convergence rates on the example of autoconvolution, Applicable Analysis, 94 (2015), pp. 477–493.

[23] L. Cavalier, Inverse problems in statistics, in Inverse problems and high-dimensional estimation, vol. 203 of Lect. Notes Stat. Proc., Springer, Heidelberg, 2011, pp. 3–96.

[24] G. Chen and M. Teboulle, Convergence analysis of a proximal-like minimization algorithm using Bregman functions, SIAM Journal on Optimization, 3 (1993), pp. 538–543.

[25] D. L. Colton, Analytic theory of partial differential equations, vol. 8 of Monographs and Studies in Mathematics, Pitman (Advanced Publishing Program), Boston, Mass.-London, 1980.

[26] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth, On the Lambert W function, Adv. Comput. Math., 5 (1996), pp. 329–359.

[27] J. Douglas, Jr. and H. H. Rachford, Jr., On the numerical solution of heat conduction problems in two and three space variables, Trans. Amer. Math. Soc., 82 (1956), pp. 421–439.

[28] H. Egger, Regularization of inverse problems with large noise, Journal of Physics: Conference Series, 124 (2008), p. 012022.

[29] P. N. Eggermont, V. LaRiccia, and M. Z. Nashed, Noise Models for Ill-Posed Problems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2015, pp. 1633–1658.

[30] P. P. B. Eggermont, V. N. LaRiccia, and M. Z. Nashed, On weakly bounded noise in ill-posed problems, Inverse Problems, 25 (2009), pp. 115018, 14.

[31] I. Ekeland and R. Temam, Analyse convexe et problèmes variationnels, Dunod; Gauthier-Villars, Paris-Brussels-Montreal, Que., 1974. Collection Études Mathématiques.

[32] ———, Convex analysis and variational problems, vol. 28 of Classics in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, English ed., 1999. Translated from the French.


[33] H. W. Engl, Integralgleichungen, Springer Lehrbuch Mathematik [Springer Mathematics Textbook], Springer-Verlag, Vienna, 1997.

[34] H. W. Engl and H. Gfrerer, A posteriori parameter choice for general regularization methods for solving linear ill-posed problems, Appl. Numer. Math., 4 (1988), pp. 395–417.

[35] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of inverse problems, vol. 375 of Mathematics and its Applications, Kluwer Academic Publishers Group, Dordrecht, 1996.

[36] L. C. Evans, Partial differential equations, vol. 19 of Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, second ed., 2010.

[37] W. Fenchel, On conjugate convex functions, Canad. J. Math., 1 (1949), pp. 73–77.

[38] G. Fleischer and B. Hofmann, On inversion rates for the autoconvolution equation, Inverse Problems, 12 (1996), pp. 419–435.

[39] M. Fornasier, V. Naumova, and S. V. Pereverzyev, Parameter choice strategies for multipenalty regularization, SIAM Journal on Numerical Analysis, 52 (2014), pp. 1770–1794.

[40] G. Frasso and P. H. Eilers, L- and V-curves for optimal smoothing, Statistical Modelling, 15 (2015), pp. 91–111.

[41] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Computers & Mathematics with Applications, 2 (1976), pp. 17–40.

[42] S. Gazzola, P. C. Hansen, and J. G. Nagy, IR Tools: a MATLAB package of iterative regularization methods and large-scale test problems, Numer. Algorithms, 81 (2019), pp. 773–811.

[43] D. Gerth, B. Hofmann, S. Birkholz, S. Koke, and G. Steinmeyer, Regularization of an autoconvolution problem in ultrashort laser pulse characterization, Inverse Problems in Science and Engineering, 22 (2014), pp. 245–266.

[44] H. Gfrerer, An a posteriori parameter choice for ordinary and iterated Tikhonov regularization of ill-posed problems leading to optimal convergence rates, Math. Comp., 49 (1987), pp. 507–522, S5–S12.

[45] V. Glasko and Y. Kriskin, On the quasi-optimality principle for ill-posed problems in Hilbert space, U.S.S.R. Comput. Maths. Math. Phys., 24 (1984), pp. 1–7.


[46] R. Glowinski and A. Marroco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires, ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique, 9 (1975), pp. 41–76.

[47] R. Gorenflo and B. Hofmann, On autoconvolution and regularization, Inverse Problems, 10 (1994), pp. 353–373.

[48] M. Grasmair, M. Haltmeier, and O. Scherzer, Sparse regularization with ℓq penalty term, Inverse Problems, 24 (2008), pp. 055020, 13.

[49] C. W. Groetsch, Inverse problems in the mathematical sciences, Vieweg Mathematics for Scientists and Engineers, Friedr. Vieweg & Sohn, Braunschweig, 1993.

[50] J. Hadamard, Lectures on Cauchy's problem in linear partial differential equations, Dover Publications, New York, 1953.

[51] U. Hamarik, U. Kangro, S. Kindermann, and K. Raik, Semi-heuristic parameter choice rules for Tikhonov regularisation with operator perturbations, Journal of Inverse and Ill-posed Problems, 27 (2019), pp. 117–131.

[52] U. Hamarik, U. Kangro, R. Palm, T. Raus, and U. Tautenhahn, Monotonicity of error of regularized solution and its use for parameter choice, Inverse Probl. Sci. Eng., 22 (2014), pp. 10–30.

[53] U. Hamarik, R. Palm, and T. Raus, Comparison of parameter choices in regularization algorithms in case of different information about noise level, Calcolo, 48 (2011), pp. 47–59.

[54] U. Hamarik, R. Palm, and T. Raus, A family of rules for parameter choice in Tikhonov regularization of ill-posed problems with inexact noise level, Journal of Computational and Applied Mathematics, 236 (2012), pp. 2146–2157.

[55] M. Hanke, Limitations of the L-curve method in ill-posed problems, BIT, 36 (1996), pp. 287–301.

[56] ———, A note on the nonlinear Landweber iteration, Numer. Funct. Anal. Optim., 35 (2014), pp. 1500–1510.

[57] M. Hanke and C. W. Groetsch, Nonstationary iterated Tikhonov regularization, Journal of Optimization Theory and Applications, 98 (1998), pp. 37–53.


[58] M. Hanke, A. Neubauer, and O. Scherzer, A convergence analysis of the Landweber iteration for nonlinear ill-posed problems, Numerische Mathematik, 72 (1995), pp. 21–37.

[59] M. Hanke and T. Raus, A general heuristic for choosing the regularization parameter in ill-posed problems, SIAM Journal on Scientific Computing, 17 (1996), pp. 956–972.

[60] P. C. Hansen, Analysis of discrete ill-posed problems by means of the L-curve, SIAM Review, 34 (1992), pp. 561–580.

[61] ———, Regularization tools: a Matlab package for analysis and solution of discrete ill-posed problems, Numer. Algorithms, 6 (1994), pp. 1–35.

[62] ———, Rank-deficient and discrete ill-posed problems, SIAM Monographs on Mathematical Modeling and Computation, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1998. Numerical aspects of linear inversion.

[63] ———, The L-curve and its use in the numerical treatment of inverse problems, in Computational Inverse Problems in Electrocardiology, ed. P. Johnston, Advances in Computational Bioengineering, WIT Press, 2000, pp. 119–142.

[64] P. C. Hansen and D. P. O'Leary, The use of the L-curve in the regularization of discrete ill-posed problems, SIAM Journal on Scientific Computing, 14 (1993), pp. 1487–1503.

[65] B. Harrach, T. Jahn, and R. Potthast, Beyond the Bakushinskii veto: regularising linear inverse problems without knowing the noise distribution, Numer. Math., 145 (2020), pp. 581–603.

[66] M. R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl., 4 (1969), pp. 303–320.

[67] B. Hofmann, Ill-posedness and regularization of inverse problems—a review of mathematical methods, in The Inverse Problem (Tegernsee, 1994), Res. Meas. Approv., VCH, Weinheim, 1995, pp. 45–66.

[68] B. Hofmann, B. Kaltenbacher, C. Poeschl, and O. Scherzer, A convergence rates result for Tikhonov regularization in Banach spaces with non-smooth operators, Inverse Problems, 23 (2007), p. 987.

[69] B. Hofmann and S. Kindermann, On the degree of ill-posedness for linear problems with non-compact operators, Methods Appl. Anal., 17 (2010), pp. 445–461.


[70] B. Hofmann and O. Scherzer, Factors influencing the ill-posedness of nonlinear problems, Inverse Problems, 10 (1994), pp. 1277–1297.

[71] ———, Local ill-posedness and source conditions of operator equations in Hilbert spaces, Inverse Problems, 14 (1998), pp. 1189–1206.

[72] C. Hofmann, B. Hofmann, and A. Pichler, Simultaneous identification of volatility and interest rate functions—a two-parameter regularization approach, Electron. Trans. Numer. Anal., 51 (2019), pp. 99–117.

[73] T. Hohage, Logarithmic convergence rates of the iteratively regularized Gauss-Newton method for an inverse potential and an inverse scattering problem, Inverse Problems, 13 (1997), p. 1279.

[74] S. Hubmer and R. Ramlau, Convergence analysis of a two-point gradient method for nonlinear ill-posed problems, Inverse Problems, 33 (2017), pp. 095004, 30.

[75] ———, Nesterov's accelerated gradient method for nonlinear ill-posed problems with a locally convex residual functional, Inverse Problems, 34 (2018), pp. 095003, 30.

[76] S. Hubmer, E. Sherina, A. Neubauer, and O. Scherzer, Lamé parameter estimation from static displacement field measurements in the framework of nonlinear inverse problems, SIAM Journal on Imaging Sciences, 11 (2018), pp. 1268–1293.

[77] S. Hubmer, S. Kindermann, K. Raik, and E. Sherina, A numerical comparison of some heuristic stopping rules for nonlinear Landweber iteration, (to appear).

[78] B. Jin and D. A. Lorenz, Heuristic parameter-choice rules for convex variational regularization based on error estimates, SIAM J. Numer. Anal., 48 (2010), pp. 1208–1229.

[79] Q. Jin, Hanke-Raus heuristic rule for variational regularization in Banach spaces, Inverse Problems, 32 (2016), pp. 085008, 19.

[80] ———, On a heuristic stopping rule for the regularization of inverse problems by the augmented Lagrangian method, Numer. Math., 136 (2017), pp. 973–992.

[81] B. Kaltenbacher, A. Neubauer, and O. Scherzer, Iterative regularization methods for nonlinear ill-posed problems, vol. 6 of Radon Series on Computational and Applied Mathematics, Walter de Gruyter GmbH & Co. KG, Berlin, 2008.


[82] B. Kaltenbacher, T. T. N. Nguyen, and O. Scherzer, The tangential cone condition for some coefficient identification model problems in parabolic PDEs, arXiv preprint arXiv:1908.01239, (2019).

[83] S. Kindermann, Convergence analysis of minimization-based noise level-free parameter choice rules for linear ill-posed problems, Electron. Trans. Numer. Anal., 38 (2011), pp. 233–257.

[84] ———, Discretization independent convergence rates for noise level-free parameter choice rules for the regularization of ill-conditioned problems, Electron. Trans. Numer. Anal., 40 (2013), pp. 58–81.

[85] ———, Convergence of the gradient method for ill-posed problems, Inverse Probl. Imaging, 11 (2017), pp. 703–720.

[86] S. Kindermann, L. D. Mutimbu, and E. Resmerita, A numerical study of heuristic parameter choice rules for total variation regularization, Journal of Inverse and Ill-Posed Problems, 22 (2014), pp. 63–94.

[87] S. Kindermann and A. Neubauer, On the convergence of the quasioptimality criterion for (iterated) Tikhonov regularization, Inverse Problems and Imaging, 2 (2008), pp. 291–299.

[88] S. Kindermann, S. Pereverzyev, Jr., and A. Pilipenko, The quasi-optimality criterion in the linear functional strategy, Inverse Problems, 34 (2018), pp. 075001, 24.

[89] S. Kindermann and K. Raik, Heuristic parameter choice rules for Tikhonov regularization with weakly bounded noise, Numerical Functional Analysis and Optimization, 40 (2019), pp. 1373–1394.

[90] ———, Convergence of heuristic parameter choice rules for convex Tikhonov regularization, SIAM Journal on Numerical Analysis, 58 (2020), pp. 1773–1800.

[91] ———, A simplified L-curve method as error estimator, Electron. Trans. Numer. Anal., 53 (2020), pp. 217–238.

[92] M. A. Krasnosel'skii, P. Zabreyko, E. I. Pustylnik, and P. Sobolevski, Integral operators in spaces of summable functions, Journal of Engineering Mathematics, 10 (1976), pp. 190–190.

[93] D. Krawczyk-Stando and M. Rudnicki, Regularization parameter selection in discrete ill-posed problems—the use of the U-curve, International Journal of Applied Mathematics and Computer Science, 17 (2007), pp. 157–164.


[94] D. Krawczyk-Stando and M. Rudnicki, The use of L-curve and U-curve in inverse electromagnetic modelling, in Intelligent Computer Techniques in Applied Electromagnetics, Springer, 2008, pp. 73–82.

[95] E. Kreyszig, Introductory functional analysis with applications, Wiley Classics Library, John Wiley & Sons, Inc., New York, 1989.

[96] Y. A. Kriksin, The choice of regularization parameter for the solution of a linear operator equation, Zh. Vychisl. Mat. i Mat. Fiz., 25 (1985), pp. 1092–1097, 1119–1120.

[97] L. Landweber, An iteration formula for Fredholm integral equations of the first kind, Amer. J. Math., 73 (1951), pp. 615–624.

[98] C. L. Lawson and R. J. Hanson, Solving least squares problems, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1974. Prentice-Hall Series in Automatic Computation.

[99] A. S. Leonov, On the choice of a regularization parameter according to the criteria of quasioptimality and ratio for ill-posed problems of linear algebra with a perturbed operator, Dokl. Akad. Nauk SSSR, 262 (1982), pp. 1069–1072.

[100] O. V. Lepskiĭ, A problem of adaptive estimation in Gaussian white noise, Teoriya Veroyatnostei i ee Primeneniya, 35 (1990), pp. 459–470.

[101] D. A. Lorenz, P. Maass, and P. Q. Muoi, Gradient descent for Tikhonov functionals with sparsity constraints: theory and numerical comparison of step size rules, Electron. Trans. Numer. Anal., 39 (2012), pp. 437–463.

[102] A. K. Louis, Inverse und schlecht gestellte Probleme, Teubner Studienbücher Mathematik [Teubner Mathematical Textbooks], B. G. Teubner, Stuttgart, 1989.

[103] S. Lu, S. V. Pereverzev, Y. Shao, and U. Tautenhahn, On the generalized discrepancy principle for Tikhonov regularization in Hilbert scales, J. Integral Equations Appl., 22 (2010), pp. 483–517.

[104] S. Lu, S. V. Pereverzev, and U. Tautenhahn, Regularized total least squares: computational aspects and error bounds, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 918–941.

[105] M. A. Lukas, Asymptotic optimality of generalized cross-validation for choosing the regularization parameter, Numerische Mathematik, 66 (1993), pp. 41–66.


[106] ———, Robust generalized cross-validation for choosing the regularization parameter, Inverse Problems, 22 (2006), pp. 1883–1902.

[107] ———, Strong robust generalized cross-validation for choosing the regularization parameter, Inverse Problems, 24 (2008), pp. 034006, 16.

[108] P. Mathé, Saturation of regularization methods for linear ill-posed problems in Hilbert spaces, SIAM J. Numer. Anal., 42 (2004), pp. 968–973.

[109] ———, The Lepskiĭ principle revisited, Inverse Problems, 22 (2006), pp. L11–L15.

[110] P. Mathé and S. V. Pereverzev, Geometry of linear ill-posed problems in variable Hilbert scales, Inverse Problems, 19 (2003), p. 789.

[111] ———, The discretized discrepancy principle under general source conditions, Journal of Complexity, 22 (2006), pp. 371–381.

[112] P. Mathé and U. Tautenhahn, Regularization under general noise assumptions, Inverse Problems, 27 (2011), pp. 035016, 15.

[113] J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bull. Soc. Math. France, 93 (1965), pp. 273–299.

[114] V. A. Morozov, On the solution of functional equations by the method of regularization, Soviet Math. Dokl., 7 (1966), pp. 414–417.

[115] ———, Regularization methods for ill-posed problems, CRC Press, Boca Raton, FL, 1993. Translated from the 1987 Russian original.

[116] M. Z. Nashed, Generalized inverses, normal solvability, and iteration for singular operator equations, in Nonlinear Functional Anal. and Appl. (Proc. Advanced Sem., Math. Res. Center, Univ. of Wisconsin, Madison, Wis., 1970), Academic Press, New York, 1971, pp. 311–359.

[117] ———, Aspects of generalized inverses in analysis and regularization, in Generalized inverses and applications (Proc. Sem., Math. Res. Center, Univ. Wisconsin, Madison, Wis., 1973), 1976, pp. 193–244. Publ. Math. Res. Center Univ. Wisconsin, No. 32.

[118] F. Natterer, The mathematics of computerized tomography, vol. 32 of Classics in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001. Reprint of the 1986 original.


[119] V. Naumova and S. V. Pereverzyev, Multi-penalty regularization with a component-wise penalization, Inverse Problems, 29 (2013), pp. 075002, 15.

[120] V. Naumova, S. V. Pereverzyev, and S. Sivananthan, Extrapolation in variable RKHSs with application to the blood glucose reading, Inverse Problems, 27 (2011), pp. 075010, 13.

[121] A. Neubauer, On Landweber iteration for nonlinear ill-posed problems in Hilbert scales, Numer. Math., 85 (2000), pp. 309–328.

[122] ———, Some generalizations for Landweber iteration for nonlinear ill-posed problems in Hilbert scales, Journal of Inverse and Ill-posed Problems, 24 (2016), pp. 393–406.

[123] ———, A new gradient method for ill-posed problems, Numerical Functional Analysis and Optimization, 0 (2017), pp. 1–26.

[124] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, An iterative regularization method for total variation-based image restoration, Multiscale Modeling & Simulation, 4 (2005), pp. 460–489.

[125] F. O'Sullivan, A statistical perspective on ill-posed inverse problems, Statist. Sci., 1 (1986), pp. 502–527. With comments and a rejoinder by the author.

[126] R. Palm, Numerical comparison of regularization algorithms for solving ill-posed problems, PhD thesis, Tartu University Press, 2010.

[127] E. Pantin, J.-L. Starck, F. Murtagh, and K. Egiazarian, Deconvolution and blind deconvolution in astronomy, Blind Image Deconvolution: Theory and Applications, (2007), pp. 277–317.

[128] D. L. Phillips, A technique for the numerical solution of certain integral equations of the first kind, Journal of the ACM (JACM), 9 (1962), pp. 84–97.

[129] M. J. D. Powell, A method for nonlinear constraints in minimization problems, in Optimization (Sympos., Univ. Keele, Keele, 1968), Academic Press, London, 1969, pp. 283–298.

[130] J. Radon, Über die Bestimmung von Funktionen durch ihre Integralwerte längs gewisser Mannigfaltigkeiten, Ber. Verh. Sächs. Akad. Wiss. Leipzig, Math.-Phys. Kl., 69 (1917).

[131] R. Ramlau, TIGRA—an iterative algorithm for regularizing nonlinear ill-posed problems, Inverse Problems, 19 (2003), pp. 433–465.


[132] R. Ramlau and E. Resmerita, Convergence rates for regularization with sparsity constraints, Electron. Trans. Numer. Anal., 37 (2010), pp. 87–104.

[133] T. Raus, An a posteriori choice of the regularization parameter in case of approximately given error bound of data, Tartu Riikl. Ül. Toimetised, (1990), pp. 73–87.

[134] T. Raus and U. Hämarik, On the quasi-optimal rules for the choice of the regularization parameter in case of a noisy operator, Advances in Computational Mathematics, 36 (2012), pp. 221–233.

[135] ———, Q-curve and area rules for choosing heuristic parameter in Tikhonov regularization, arXiv preprint arXiv:1809.02061, (2018).

[136] T. Regińska, A regularization parameter in discrete ill-posed problems, SIAM Journal on Scientific Computing, 17 (1996), pp. 740–749.

[137] R. T. Rockafellar, Level sets and continuity of conjugate convex functions, Trans. Amer. Math. Soc., 123 (1966), pp. 46–63.

[138] L. I. Rudin, S. Osher, and E. Fatemi, Nonlinear total variation based noise removal algorithms, vol. 60, 1992, pp. 259–268. Experimental mathematics: computational issues in nonlinear science (Los Alamos, NM, 1991).

[139] W. Rudin, Functional analysis, International Series in Pure and Applied Mathematics, McGraw-Hill, Inc., New York, second ed., 1991.

[140] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, and F. Lenzen, Variational Methods in Imaging, Applied Mathematical Sciences, Springer New York, 2008.

[141] E. Schock, Arbitrarily slow convergence, uniform convergence and superconvergence of Galerkin-like methods, IMA J. Numer. Anal., 5 (1985), pp. 153–160.

[142] T. Schuster, B. Kaltenbacher, B. Hofmann, and K. S. Kazimierski, Regularization methods in Banach spaces, vol. 10 of Radon Series on Computational and Applied Mathematics, Walter de Gruyter GmbH & Co. KG, Berlin, 2012.

[143] U. Tautenhahn, Regularization of linear ill-posed problems with noisy right hand side and noisy operator, J. Inverse Ill-Posed Probl., 16 (2008), pp. 507–523.


[144] U. Tautenhahn and U. Hämarik, The use of monotonicity for choosing the regularization parameter in ill-posed problems, Inverse Problems, 15 (1999), p. 1487.

[145] A. N. Tikhonov, On the regularization of ill-posed problems, Dokl. Akad. Nauk SSSR, 153 (1963), pp. 49–52.

[146] ———, On the solution of ill-posed problems and the method of regularization, Dokl. Akad. Nauk SSSR, 151 (1963), pp. 501–504.

[147] A. N. Tikhonov and V. Y. Arsenin, Solutions of ill-posed problems, V. H. Winston & Sons, Washington, D.C.; John Wiley & Sons, New York-Toronto, Ont.-London, 1977. Translated from the Russian, preface by translation editor Fritz John, Scripta Series in Mathematics.

[148] A. N. Tikhonov and V. B. Glasko, The approximate solution of Fredholm integral equations of the first kind, USSR Computational Mathematics and Mathematical Physics, 4 (1964), pp. 236–247.

[149] ———, Use of the regularization method in non-linear problems, USSR Computational Mathematics and Mathematical Physics, 5 (1965), pp. 93–107.

[150] G. M. Vaĭnikko and A. Y. Veretennikov, Iteratsionnye protsedury v nekorrektnykh zadachakh, "Nauka", Moscow, 1986.

[151] C. R. Vogel, Non-convergence of the L-curve regularization parameter selection method, Inverse Problems, 12 (1996), p. 535.

[152] G. Wahba, Spline models for observational data, vol. 59 of CBMS-NSF Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990.

[153] W. Yin, Analysis and generalizations of the linearized Bregman method, SIAM Journal on Imaging Sciences, 3 (2010), pp. 856–877.

[154] Z. Zhang and Q. Jin, Heuristic rule for non-stationary iterated Tikhonov regularization in Banach spaces, Inverse Problems, 34 (2018), pp. 115002, 26.


Appendices


Appendix A

Functional Calculus

For the analysis of linear operators acting on Hilbert spaces, the Borel functional calculus for self-adjoint operators proves to be particularly useful, as does the functional calculus for compact operators, more commonly known as the singular value decomposition. We largely draw on material from [35]. Other useful references which detail the construction below are [95, 139].
Before we go into the Borel functional calculus, we first treat the less general, but simpler, functional calculus for compact operators. Let X and Y be Hilbert spaces. In particular, without going into the derivation, we note that we can define a singular system for a compact linear operator K : X → Y as {σ_n; v_n, u_n}, where the σ_n are the singular values of K (that is, the σ_n² are the eigenvalues of the self-adjoint compact operator K*K) and the v_n are the corresponding orthonormal eigenvectors of K*K, so that for all x ∈ dom K, we can write

Kx = \sum_{n=1}^{\infty} \sigma_n \langle x, v_n \rangle u_n,    (A.1)

where the u_n are the orthonormal eigenvectors of KK^* and may be defined by

u_n := \frac{K v_n}{\|K v_n\|}.

Moreover, \{\sigma_n^2; v_n\} defines a singular system for K^*K, so that

K^*K x = \sum_{n=1}^{\infty} \sigma_n^2 \langle x, v_n \rangle v_n,    (A.2)

for all x ∈ dom(K^*K).
For self-adjoint operators (which are not necessarily compact), we have the aforementioned Borel functional calculus. Before we go into more detail, note that (A.2) may be written as

K^*K x = \int_0^{\infty} \lambda \, \mathrm{d}E_\lambda x,    (A.3)

where the λ are the eigenvalues and E_λ denotes the so-called spectral family of K^*K. Note that E_λ : X → X_λ is an orthogonal projector, where

X_\lambda := \operatorname{span}\{ v_n \mid \sigma_n^2 < \lambda,\ n \in \mathbb{N} \} \ (+\ \ker(K^*K) \ \text{if} \ \lambda > 0).

Now, for λ ≤ 0, one has that E_λ = 0. Moreover, as the eigenvectors v_n span \overline{\operatorname{range}}(K^*K), it follows that

X_\lambda = \overline{\operatorname{range}}(K^*K) + \ker(K^*K) = X

for λ > σ_1²; thus E_λ = I whenever λ > σ_1². Much in line with [35], we also remark that for all λ_1 ≤ λ_2 and x ∈ X, one has that ⟨E_{λ_1} x, x⟩ ≤ ⟨E_{λ_2} x, x⟩, i.e., E_λ is "monotonically increasing" with respect to λ. Furthermore, E_λ is piecewise constant with "jumps" at λ = σ_n² (and at λ = 0 if and only if ker K = ker(K^*K) ≠ {0}) of "height"

\sum_{\substack{n \in \mathbb{N} \\ \sigma_n^2 = \lambda}} \langle \cdot, v_n \rangle v_n,

that is, the orthogonal projector onto the span of all eigenvectors corresponding to the eigenvalue σ_n² (of multiplicity possibly greater than 1) is added. We may consequently define

\int_0^{\|K\|^2+} f(\lambda) \, \mathrm{d}E_\lambda x := \sum_{n \in \mathbb{N}} f(\sigma_n^2) \langle x, v_n \rangle v_n,

\int_0^{\|K\|^2+} f(\lambda) \, \mathrm{d}\langle E_\lambda x_1, x_2 \rangle := \sum_{n \in \mathbb{N}} f(\sigma_n^2) \langle x_1, v_n \rangle \langle x_2, v_n \rangle,

\int_0^{\|K\|^2+} f(\lambda) \, \mathrm{d}\|E_\lambda x\|^2 := \sum_{n \in \mathbb{N}} f(\sigma_n^2) |\langle x, v_n \rangle|^2,

for a (piecewise) continuous function f and x_1, x_2 ∈ X, where the upper integration bound "\|K\|^2+" is shorthand for \|K\|^2 + ε = σ_1² + ε for any ε > 0. Thence,

\int_0^{\|K\|^2+} f(\lambda) \, \mathrm{d}E_\lambda x = \sum_{n \in \mathbb{N}} f(\sigma_n^2) \langle x, v_n \rangle v_n = f(K^*K) x.

In particular, in the case f = id, the identity function, we may write

K^*K x = \int_0^{\|K\|^2+} \lambda \, \mathrm{d}E_\lambda x.
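To make the sums above concrete, here is a minimal finite-dimensional sketch in Python/NumPy in which a random matrix K stands in for the compact operator; the helper name f_of_KtK and the Tikhonov-type filter at the end are merely illustrative choices, not part of the theory above.

import numpy as np

def f_of_KtK(K, f, x):
    """Apply f(K^*K) to x via the singular value decomposition:
    f(K^*K)x = sum_n f(sigma_n^2) <x, v_n> v_n."""
    _, s, Vt = np.linalg.svd(K, full_matrices=False)
    coeffs = f(s**2) * (Vt @ x)   # f(sigma_n^2) <x, v_n>
    return Vt.T @ coeffs          # recombine along the eigenvectors v_n

rng = np.random.default_rng(0)
K = rng.standard_normal((5, 4))   # K stands in for the compact operator
x = rng.standard_normal(4)

# Sanity check: f = id reproduces K^T K x, as in the last display above.
print(np.allclose(f_of_KtK(K, lambda lam: lam, x), K.T @ K @ x))

# An illustrative Tikhonov-type filter f(lambda) = 1/(lambda + alpha).
alpha = 0.1
print(f_of_KtK(K, lambda lam: 1.0 / (lam + alpha), x))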

A particularly useful result, which we cite from [35], is the following observation:


Proposition 32. For all continuous functions f, one has that

f(K^*K) K^* = K^* f(KK^*),    (A.4)

where f(KK^*) is defined analogously to f(K^*K), with the spectral family, which we denote by F_λ, defined by

F_\lambda y := \sum_{\substack{n \in \mathbb{N} \\ \sigma_n^2 < \lambda}} \langle y, u_n \rangle u_n \ (+\ I - Q),

with I − Q : Y → ker(KK^*) = range(KK^*)^⊥ an orthogonal projection.

Moreover, for the norm and inner product operations, we have

\langle f(K^*K) x_1, x_2 \rangle = \int f(\lambda) \, \mathrm{d}\langle E_\lambda x_1, x_2 \rangle,    (A.5)

\|f(K^*K) x\|^2 = \int f^2(\lambda) \, \mathrm{d}\|E_\lambda x\|^2,    (A.6)

for all x, x_1, x_2 ∈ X. Furthermore, we have the following useful bound:

\|f(K^*K)\| \le \sup_{\lambda \in (0, \|K\|^2)} |f(\lambda)|.    (A.7)
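Both (A.4) and (A.7) are easy to check numerically in finite dimensions; in the sketch below (Python/NumPy, with the hypothetical helper name apply_f), f(M) is assembled from an eigendecomposition of the symmetric matrix M, and f = √· is chosen purely for illustration.

import numpy as np

def apply_f(M, f):
    """f(M) for a symmetric positive semi-definite matrix M via its eigendecomposition."""
    lam, Q = np.linalg.eigh(M)
    return Q @ np.diag(f(np.clip(lam, 0.0, None))) @ Q.T

rng = np.random.default_rng(1)
K = rng.standard_normal((6, 4))
f = np.sqrt   # any continuous f on the spectrum

# (A.4): f(K^*K) K^* = K^* f(K K^*).
print(np.allclose(apply_f(K.T @ K, f) @ K.T, K.T @ apply_f(K @ K.T, f)))

# (A.7): ||f(K^*K)|| <= sup of |f|; since f = sqrt is increasing, the sup is f(||K||^2).
print(np.linalg.norm(apply_f(K.T @ K, f), 2) <= f(np.linalg.norm(K, 2)**2) + 1e-12)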

Note that (A.4) and (A.7) are also valid for any linear operator A : X → Y (as will subsequently become apparent). For further details on the well-definedness of the aforementioned integrals and also additional properties of the spectral family, we refer to [35].
It further follows from [35, Proposition 2.14, p. 46] that for any self-adjoint operator, say T : X → X, there exists a unique spectral family E_λ such that for all functions f ∈ M, where M denotes the space of all measurable functions with respect to the measure d‖E_λ x‖² for all x ∈ X, we may define the operator

f(T) x = \int f(\lambda) \, \mathrm{d}E_\lambda x, \qquad (x \in \operatorname{dom}(f(T))),    (A.8)

where

\operatorname{dom}(f(T)) := \left\{ x \in X \;\middle|\; \int f^2(\lambda) \, \mathrm{d}\|E_\lambda x\|^2 < \infty \right\}.    (A.9)

Additionally, from [35, Proposition 2.16, p. 46], the following holds for all f, g ∈ M:

1. For x_1 ∈ dom f(T) and x_2 ∈ dom g(T),

\langle f(T) x_1, g(T) x_2 \rangle = \int f(\lambda) g(\lambda) \, \mathrm{d}\langle E_\lambda x_1, x_2 \rangle.


2. If x ∈ dom f(T ), then f(T )x ∈ dom g(T ) if and only if x ∈ dom(fg)(T ).Moreover,

g(T )f(T )x = (gf)(T )x.

Finally, following on from the observation of [35, p. 46], we may replace Kin the expressions (A.3),(A.5) and (A.6) with A, defined as before.


Appendix B

Convex Analysis

For the analysis of Chapter 3 in particular, we require a different set of tools from those of the preceding appendix. In particular, convex analysis and subdifferential calculus are the tools of choice, and perhaps of necessity. We consequently provide this appendix as background for the unacquainted reader and present only the fundamental definitions and results required. Our general sources of reference are [31, 32]. Note that in the Hilbert space setting, a useful reference is also [10]. In finite dimensions, one may even consider [11].
Firstly, the underlying basis of convex analysis is the definition of convex sets and functionals. Let X henceforth be a Banach space, unless stated otherwise. We review these basic concepts courtesy of [32, 140]:

Definition 9. A set U ⊂ X is said to be convex if

λx + (1 − λ)y ∈ U,

for all x, y ∈ U and λ ∈ (0, 1).
On the other hand, a functional R : X → R ∪ {∞} is said to be convex if

R(λx + (1 − λ)y) ≤ λR(x) + (1 − λ)R(y),    (B.1)

for all x, y ∈ X and λ ∈ (0, 1). The functional R is said to be strictly convex if the inequality (B.1) is strict, i.e., holds with "<" whenever x ≠ y ∈ dom R.

Note that reversing the inequality in (B.1) defines a concave functional. Typical examples of convex functionals include the norms ‖·‖_{ℓ^p} : X → R ∪ {∞} with p ≥ 1.
Another useful concept in the analysis of Chapter 3 in particular is that of lower semicontinuity, which is defined as follows [140]:

Definition 10. A functional R : X → R ∪ {∞} is said to be lower semicontinuous if

R(x) ≤ \liminf_{k \to \infty} R(x_k),    (B.2)

whenever x_k → x.


Figure B.1: For a convex function R : R → R ∪ {∞}, the line connecting any two points (x, R(x)) and (y, R(y)) lies above or on the graph.

One may also define upper semicontinuity in much the same way by simply reversing the inequality in (B.2).
The following concept, as will be uncovered, is closely related to convexity and, incidentally, also to lower semicontinuity:

Definition 11. The epigraph of a functional R : X → R ∪ {∞} is defined as

epi R := { (x, λ) ∈ X × R | R(x) ≤ λ }.

In particular, we have the following proposition [32, 140]:

Proposition 33. A functional R : X → R ∪ {∞} is convex if and only if its epigraph epi R is convex, and lower semicontinuous if its epigraph is closed.

This is a particularly useful result since, very often, it is simpler to prove the convexity of a functional's epigraph than to prove that the inequality (B.1) holds.
Much of convex analysis stems from the generalisation of the concept of a derivative to convex functions:

Definition 12. The subdifferential of a functional R : X → R ∪ {∞} at x ∈ X is defined as the set-valued mapping

∂R : X ⇉ X^*,
x ↦ { ξ ∈ X^* | R(x') ≥ R(x) + ⟨ξ, x' − x⟩_{X^*×X} for all x' ∈ X },

for all x ∈ X (cf. [11, 32]).


Figure B.2: For a concave function R : R → R ∪ {∞}, the line connecting any two points (x, R(x)) and (y, R(y)) lies below or on the graph.

Now we provide some standard examples of subdifferentials:

Example 3. We may compute ∂R(x) for various choices of R; a typical example is R = \frac{1}{q}\|\cdot\|_{\ell^q}^q with q ∈ (1, ∞), where one has

\partial\Big(\tfrac{1}{q}\|\cdot\|_{\ell^q}^q\Big)(x) = \big\{ |x_n|^{q-1} \operatorname{sgn}(x_n) \big\}_{n \in \mathbb{N}},

with sgn the set-valued mapping

\operatorname{sgn} : \mathbb{R} \rightrightarrows [-1, 1], \qquad
x \mapsto \begin{cases} -1, & \text{if } x < 0, \\ [-1, 1], & \text{if } x = 0, \\ 1, & \text{if } x > 0. \end{cases}
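A small numerical illustration of Example 3 (a Python/NumPy sketch; the helper name subgrad_lq is chosen here for illustration): for q ∈ (1, ∞) the subdifferential is single-valued, and the subgradient inequality from Definition 12 can be spot-checked at random points.

import numpy as np

def subgrad_lq(x, q):
    """The single-valued subgradient of R = (1/q)||.||_q^q at x for q in (1, inf):
    component-wise |x_n|^(q-1) sgn(x_n) (equal to 0 wherever x_n = 0)."""
    return np.abs(x)**(q - 1) * np.sign(x)

q = 1.5
x = np.array([1.5, -0.3, 0.0, 2.0])
xi = subgrad_lq(x, q)
R = lambda z: np.sum(np.abs(z)**q) / q

# Spot-check the subgradient inequality R(x') >= R(x) + <xi, x' - x> at random points x'.
rng = np.random.default_rng(2)
print(all(R(xp) >= R(x) + xi @ (xp - x) - 1e-12
          for xp in rng.standard_normal((100, x.size))))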

For quantifying or measuring the error in convex regularisation and approximation theory, rather than using the norm, the so-called Bregman distance is often employed [21] and is defined as follows:

Definition 13. For a convex functional R : X → R ∪ {∞}, the Bregman distance D_{ξ_2} : X × X → R ∪ {∞} from x_1 to x_2 with respect to ξ_2 ∈ ∂R(x_2) is given by

D_{ξ_2}(x_1, x_2) := R(x_1) − R(x_2) − \langle ξ_2, x_1 − x_2 \rangle_{X^* \times X},

cf. [17].


Figure B.3: A convex function R : R → R ∪ {∞} which is non-differentiable at the point x_0; its subdifferential at that point, ∂R(x_0), corresponds to the supporting lines marked by the arrows.

Note that the Bregman distance is not really a distance (i.e., a metric), as it does not satisfy the triangle inequality; nor is it in general symmetric. We do, however, have the following useful so-called three-point identity:

D_{ξ_2}(x_1, x_2) = D_{ξ_3}(x_1, x_3) + D_{ξ_2}(x_3, x_2) + \langle ξ_3 − ξ_2, x_1 − x_3 \rangle,    (B.3)

where ξ_2 ∈ ∂R(x_2) and ξ_3 ∈ ∂R(x_3) are subgradients of a functional R : X → R ∪ {∞} at x_2 and x_3, respectively (cf. [24, Lemma 3.1]).
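The three-point identity (B.3) is easy to check numerically; the toy sketch below (Python/NumPy) assumes R(z) = ½‖z‖², for which the only subgradient at z is z itself.

import numpy as np

rng = np.random.default_rng(3)
x1, x2, x3 = rng.standard_normal((3, 5))

# R(z) = 0.5 ||z||^2 is smooth, so its only subgradient at z is z itself.
R = lambda z: 0.5 * np.dot(z, z)
D = lambda a, b: R(a) - R(b) - np.dot(b, a - b)   # Bregman distance with xi_b = b

# Three-point identity (B.3) with xi_2 = x2 and xi_3 = x3.
lhs = D(x1, x2)
rhs = D(x1, x3) + D(x3, x2) + np.dot(x3 - x2, x1 - x3)
print(np.isclose(lhs, rhs))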

One can also define:

Definition 14. The symmetric Bregman distance from x_1 to x_2 in X with respect to ξ_1 ∈ ∂R(x_1) and ξ_2 ∈ ∂R(x_2) is defined as

D^{\mathrm{sym}}_{ξ_1, ξ_2}(x_1, x_2) := \langle ξ_1 − ξ_2, x_1 − x_2 \rangle = D_{ξ_2}(x_1, x_2) + D_{ξ_1}(x_2, x_1).    (B.4)

Clearly, as is hinted by the name, the symmetric Bregman distance, contrary to the usual Bregman distance, is symmetric. It is also trivial from the definition that the symmetric Bregman distance is greater than or equal to the standard Bregman distance.
We will utilise the following important class of monotone operators extensively throughout our analysis [11]:

Definition 15. If X is a Hilbert space, then for a proper, convex and lower semicontinuous functional R : X → R ∪ {∞} and γ > 0, the proximal point mapping with respect to R is defined as

\operatorname{prox}_{\gamma R} : X \to X,
x \mapsto \operatorname*{argmin}_{z \in X} \left( \frac{1}{2\gamma} \|z - x\|_X^2 + R(z) \right).

In particular, we have the following resolvent representation:

\operatorname{prox}_{\gamma R} = (I + \gamma \partial R)^{-1}.    (B.5)
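For instance, for R = ‖·‖_{ℓ¹} the proximal mapping is the familiar soft-thresholding operator; the following sketch (Python/NumPy, illustrative only) computes it and checks the resolvent characterisation (B.5) through the optimality condition x − prox_{γR}(x) ∈ γ∂R(prox_{γR}(x)).

import numpy as np

def prox_l1(x, gamma):
    """Proximal mapping of R = ||.||_1 with parameter gamma (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

x = np.array([1.2, -0.4, 0.05, -3.0])
gamma = 0.5
p = prox_l1(x, gamma)

# Resolvent check (B.5): (x - p)/gamma must be a subgradient of ||.||_1 at p,
# i.e. equal to sign(p_i) where p_i != 0 and lying in [-1, 1] where p_i = 0.
xi = (x - p) / gamma
ok = all(np.isclose(xi[i], np.sign(p[i])) if p[i] != 0 else abs(xi[i]) <= 1.0
         for i in range(x.size))
print(p, ok)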

Definition 16. Let R : X → R ∪ {∞} be proper. Then the Fenchel conjugate (cf. [37]) of R is defined as

R^* : X^* \to \mathbb{R} \cup \{\infty\},
x^* \mapsto \sup_{x \in X} \left( \langle x^*, x \rangle_{X^* \times X} - R(x) \right).

Note also that R^* is always closed and convex [32]. A typical example of a conjugate function: whenever R = \frac{1}{p}\|\cdot\|_{\ell^p}^p, then R^* = \frac{1}{p^*}\|\cdot\|_{\ell^{p^*}}^{p^*} with \frac{1}{p} + \frac{1}{p^*} = 1 and p ≥ 1. The Fenchel conjugate also satisfies the following additional properties [32]:

R^*(0) = -\inf_{x \in X} R(x),

R \le G \iff G^* \le R^*,

(\lambda R)^*(x^*) = \lambda R^*\!\left(\frac{x^*}{\lambda}\right), \quad (\lambda > 0),

(R + \gamma)^* = R^* - \gamma, \quad (\gamma \in \mathbb{R}).
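A crude one-dimensional check of these properties (illustrative only): approximating the supremum over a grid, and assuming R(x) = ½x², for which R*(y) = ½y² is known in closed form, one can test the scaling rule (λR)*(x*) = λR*(x*/λ).

import numpy as np

xs = np.linspace(-50.0, 50.0, 200001)   # crude grid for the supremum
R = lambda x: 0.5 * x**2                # for this R, R*(y) = 0.5 * y^2 in closed form

def conj(fun, y):
    """Grid approximation of the Fenchel conjugate fun*(y) = sup_x (x*y - fun(x))."""
    return np.max(xs * y - fun(xs))

y, lam = 3.0, 2.0
print(np.isclose(conj(R, y), 0.5 * y**2, atol=1e-3))          # R*(y) = y^2/2
print(np.isclose(conj(lambda x: lam * R(x), y),               # (lam R)*(y)
                 lam * conj(R, y / lam), atol=1e-3))          #   = lam R*(y/lam)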

What is the Fenchel conjugate of the Fenchel conjugate? That question is answered as follows [32]:

Definition 17. For R : X → R ∪ {∞}, we can define the biconjugate of R as

R^{**} : X \to \mathbb{R} \cup \{\infty\},
x \mapsto \sup_{x^* \in X^*} \left( \langle x, x^* \rangle - R^*(x^*) \right).

Proposition 34. If R is proper, convex and lower semicontinuous, then R^{**} = R.

Corollary 6. For every R : X → R ∪ {∞}, we have R^{***} = R^*.

An important link between the Fenchel conjugate and the aforementioned proximal point mapping is given by the following proposition, which allows us to represent a vector as the sum of two proximal mappings [113]:


Proposition 35. Let R : X → R ∪ {∞} be proper, convex and lower semicontinuous and let γ > 0. Then we have Moreau's decomposition:

x = \operatorname{prox}_{R}(x) + \operatorname{prox}_{R^*}(x)
  = \operatorname{prox}_{\gamma R}(x) + \gamma \operatorname{prox}_{\gamma^{-1} R^*}\!\left(\frac{x}{\gamma}\right),

for all x ∈ X, where

\operatorname{prox}_{R^*} : X \to X,
x \mapsto \operatorname*{argmin}_{z \in X} \left( \frac{1}{2}\|z - x\|^2 + R^*(z) \right).

The above proposition is most useful when one would like to compute the proximal mapping with respect to the Fenchel conjugate, for instance. Rewriting Moreau's decomposition above, one may subsequently write

\operatorname{prox}_{R^*}(x) = (I - \operatorname{prox}_{R})(x), \quad \text{and} \quad \operatorname{prox}_{\gamma R^*}(x) = x - \gamma \operatorname{prox}_{\frac{1}{\gamma}R}\!\left(\frac{x}{\gamma}\right),    (B.6)

for all x ∈ X and γ > 0.
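As an illustration of (B.6) with γ = 1 (a sketch under the assumption R = ‖·‖_{ℓ¹}, so that prox_{R*} is the projection onto the unit ℓ∞ ball; the helper names are chosen for illustration):

import numpy as np

def prox_l1(x, gamma=1.0):
    """Soft-thresholding: the proximal mapping of R = ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_l1_conj(x):
    """prox of R* (the indicator of the unit l_infinity ball): projection onto [-1, 1]^n."""
    return np.clip(x, -1.0, 1.0)

rng = np.random.default_rng(4)
x = 2.0 * rng.standard_normal(6)

# Moreau's decomposition with gamma = 1: x = prox_R(x) + prox_{R*}(x),
# equivalently the first identity in (B.6).
print(np.allclose(x, prox_l1(x) + prox_l1_conj(x)))
print(np.allclose(prox_l1_conj(x), x - prox_l1(x)))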

We will make use of the firm non-expansivity of the proximal mapping operator (for which we confer [10, 11]):

\langle \operatorname{prox}_J(y_1) - \operatorname{prox}_J(y_2),\, y_1 - y_2 \rangle \ge \|\operatorname{prox}_J(y_1) - \operatorname{prox}_J(y_2)\|^2.    (B.7)
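A quick numerical spot-check of (B.7), again using the soft-thresholding operator as prox_J (illustrative only):

import numpy as np

def prox_l1(x, gamma=1.0):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

rng = np.random.default_rng(5)
y1, y2 = rng.standard_normal((2, 8))
d = prox_l1(y1) - prox_l1(y2)

# Firm non-expansivity (B.7): <prox(y1) - prox(y2), y1 - y2> >= ||prox(y1) - prox(y2)||^2.
print(np.dot(d, y1 - y2) >= np.dot(d, d) - 1e-12)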

We have the following useful identity from [32, Prop. 5.7, p. 27]:

Proposition 36. Suppose there exists a point Ax where R is continuous and finite. Then we have

\partial(R \circ A)(x) = A^* \partial R(Ax),

for all x ∈ X.


Statutory Declaration

I hereby declare that the thesis submitted is my own unaided work, that I have not used other than the sources indicated, and that all direct and indirect sources are acknowledged as references.
This printed thesis is identical with the electronic version submitted.

Linz, July 2020

————————————————
Kemal Raik


Curriculum Vitae

Name: Kemal Raik

Nationality: British

Date of Birth: 20 April 1991

Place of Birth: London, United Kingdom

Education:
1996–2002  Primary School
           Caldecot School, London
           Crawford School, London
           Kilmorie School, London

2002–2006  Secondary School
           Forest Hill School, London

2007–2010  Sixth Form
           Hillsyde Sixth Form, London
           St Joseph's College, London

2010–2014  Bachelor's in Mathematics,
           University of Aberdeen, Scotland
           Bachelor's Thesis: Functional analysis

2015–2017  Master's in Mathematical Sciences,
           Utrecht University, The Netherlands
           Master's Thesis: Inverse Schrödinger scattering for seismic imaging

2017–2020  Doctorate in Engineering Sciences (Industrial Mathematics),
           Johannes Kepler University Linz, Austria
           Doctoral Thesis: Linear and nonlinear heuristic regularisation for ill-posed problems

Positions:
2017–2020  Research Assistant,
           Industrial Mathematics Institute,
           Johannes Kepler University Linz

Scientific Publications:
Uno Hämarik, Urve Kangro, Stefan Kindermann, Kemal Raik.
Semi-heuristic parameter choice rules for Tikhonov regularisation with operator perturbations.
Journal of Inverse and Ill-posed Problems, Vol. 27 (1), pp. 117–131, 2019.

Stefan Kindermann, Kemal Raik.
Heuristic parameter choice rules for Tikhonov regularisation with weakly bounded noise.
Numerical Functional Analysis and Optimization, Vol. 40 (12), pp. 1373–1394, 2019.

Stefan Kindermann, Kemal Raik.
A simplified L-curve method as error estimator.
Electronic Transactions on Numerical Analysis, Vol. 53, pp. 217–238, 2020.

Stefan Kindermann, Kemal Raik.
Convergence of heuristic parameter choice rules for convex Tikhonov regularization.
SIAM Journal on Numerical Analysis, Vol. 58, pp. 1773–1800, 2020.

Simon Hubmer, Stefan Kindermann, Kemal Raik, Ekaterina Sherina.
A numerical comparison of some heuristic stopping rules for nonlinear Landweber iteration.
Forthcoming, 2020.

Scientific Talks at Conferences, Workshops and Universities:
Oct–June 2016  19th Internet Seminar: Infinite Dimensional Analysis,
               Casalmaggiore, Italy


Feb–Mar 2018   Würzburg Winter School,
               "Modern Methods in Nonsmooth Optimization",
               Würzburg, Germany

Oct–June 2018  21st Internet Seminar: Functional Calculus,
               Wuppertal, Germany

Sept 2018      Chemnitz Symposium on Inverse Problems,
               Chemnitz, Germany

June 2019      University of Kent Postgraduate Seminar,
               Canterbury, United Kingdom

July 2019      Applied Inverse Problems (AIP) Conference,
               Grenoble, France

July 2019      International Conference for Industrial and Applied Mathematics (ICIAM),
               Valencia, Spain

Oct 2019       Chemnitz Symposium on Inverse Problems: On Tour,
               Frankfurt, Germany

Special Interests: Powerlifting, Boxing, Football, Chess, Photography.
