
CAMBRIDGE MONOGRAPHS ON APPLIED AND COMPUTATIONAL MATHEMATICS

Series Editors

M. ABLOWITZ, S. DAVIS, J. HINCH, A. ISERLES, J. OCKENDON, P. OLVER

20 Greedy Approximation


The Cambridge Monographs on Applied and Computational Mathematics series reflects the crucial role of mathematical and computational techniques in contemporary science. The series publishes expositions on all aspects of applicable and numerical mathematics, with an emphasis on new developments in this fast-moving area of research.

State-of-the-art methods and algorithms as well as modern mathematical descriptions of physical and mechanical ideas are presented in a manner suited to graduate research students and professionals alike. Sound pedagogical presentation is a prerequisite. It is intended that books in the series will serve to inform a new generation of researchers.

A complete list of books in the series can be found at www.cambridge.org/mathematics. Recent titles include the following:

8. Schwarz–Christoffel mapping, Tobin A. Driscoll & Lloyd N. Trefethen
9. High-order methods for incompressible fluid flow, M. O. Deville, P. F. Fischer & E. H. Mund
10. Practical extrapolation methods, Avram Sidi
11. Generalized Riemann problems in computational fluid dynamics, Matania Ben-Artzi & Joseph Falcovitz
12. Radial basis functions, Martin D. Buhmann
13. Iterative Krylov methods for large linear systems, Henk van der Vorst
14. Simulating Hamiltonian dynamics, Benedict Leimkuhler & Sebastian Reich
15. Collocation methods for Volterra integral and related functional differential equations, Hermann Brunner
16. Topology for computing, Afra J. Zomorodian
17. Scattered data approximation, Holger Wendland
18. Modern computer arithmetic, Richard Brent & Paul Zimmermann
19. Matrix preconditioning techniques and applications, Ke Chen
20. Greedy approximation, Vladimir Temlyakov
21. Spectral methods for time-dependent problems, Jan Hesthaven, Sigal Gottlieb & David Gottlieb
22. The mathematical foundations of mixing, Rob Sturman, Julio M. Ottino & Stephen Wiggins
23. Curve and surface reconstruction, Tamal K. Dey
24. Learning theory, Felipe Cucker & Ding Xuan Zhou
25. Algebraic geometry and statistical learning theory, Sumio Watanabe
26. A practical guide to the invariant calculus, Elizabeth Louise Mansfield


Greedy Approximation

VLADIMIR TEMLYAKOV

University of South Carolina


CAMBRIDGE UNIVERSITY PRESS

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Tokyo, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9781107003378

© Cambridge University Press 2011

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2011

Printed in the United Kingdom at the University Press, Cambridge

A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data
Temlyakov, Vladimir, 1953–

Greedy approximation / Vladimir Temlyakov.
p. cm. – (Cambridge monographs on applied and computational mathematics ; 20)

ISBN 978-1-107-00337-8 (hardback)
1. Approximation theory. I. Title. II. Series.

QA221.T455 2011
518′.5–dc23
2011025053

ISBN 978-1-107-00337-8 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.


Contents

Preface page ix

1 Greedy approximation with regard to bases 1
1.1 Introduction 1
1.2 Schauder bases in Banach spaces 6
1.3 Greedy bases 15
1.4 Quasi-greedy and almost greedy bases 33
1.5 Weak Greedy Algorithms with respect to bases 39
1.6 Thresholding and minimal systems 43
1.7 Greedy approximation with respect to the trigonometric system 47
1.8 Greedy-type bases; direct and inverse theorems 58
1.9 Some further results 63
1.10 Systems L_p-equivalent to the Haar basis 68
1.11 Open problems 76

2 Greedy approximation with respect to dictionaries: Hilbert spaces 77
2.1 Introduction 77
2.2 Convergence 84
2.3 Rate of convergence 89
2.4 Greedy algorithms for systems that are not dictionaries 97
2.5 Greedy approximation with respect to λ-quasi-orthogonal dictionaries 101
2.6 Lebesgue-type inequalities for greedy approximation 111
2.7 Saturation property of greedy-type algorithms 122
2.8 Some further remarks 135
2.9 Open problems 141

3 Entropy 143
3.1 Introduction: definitions and some simple properties 143


3.2 Finite dimensional spaces 144
3.3 Trigonometric polynomials and volume estimates 151
3.4 The function classes 165
3.5 General inequalities 168
3.6 Some further remarks 175
3.7 Open problems 182

4 Approximation in learning theory 183
4.1 Introduction 183
4.2 Some basic concepts of probability theory 189
4.3 Improper function learning; upper estimates 206
4.4 Proper function learning; upper estimates 235
4.5 The lower estimates 253
4.6 Application of greedy algorithms in learning theory 270

5 Approximation in compressed sensing 277
5.1 Introduction 277
5.2 Equivalence of three approximation properties of the compressed sensing matrix 283
5.3 Construction of a good matrix 287
5.4 Dealing with noisy data 294
5.5 First results on exact recovery of sparse signals; the Orthogonal Greedy Algorithm 298
5.6 Exact recovery of sparse signals; the Subspace Pursuit Algorithm 305
5.7 On the size of incoherent systems 314
5.8 Restricted Isometry Property for random matrices 327
5.9 Some further remarks 330
5.10 Open problems 332

6 Greedy approximation with respect to dictionaries: Banach spaces 334
6.1 Introduction 334
6.2 The Weak Chebyshev Greedy Algorithm 340
6.3 Relaxation; co-convex approximation 347
6.4 Free relaxation 350
6.5 Fixed relaxation 354
6.6 Thresholding algorithms 359
6.7 Greedy expansions 363
6.8 Relaxation; X-greedy algorithms 378
6.9 Incoherent dictionaries and exact recovery 381
6.10 Greedy algorithms with approximate evaluations and restricted search 385


6.11 An application of greedy algorithms for the discrepancy estimates 390

6.12 Open problems 404

References 405
Index 415


Preface

From the beginning of time, human beings have been trying to replace complicated with simpler things. From ancient shamans working magic upon clay figures to heal the sick, to Renaissance artists representing God Almighty as a nude painted onto a ceiling, the fundamental problem of representation of the only partially representable continues in contemporary applied mathematics. A generic problem of mathematical and numerical analysis is to represent a given function approximately. It is a classical problem that goes back to the first results on Taylor's and Fourier's expansions of a function.

The first step in solving the representation problem is to choose a representation system. Traditionally, a representation system has natural features such as minimality, orthogonality, simple structure and nice computational characteristics. The most typical representation systems are the trigonometric system $\{e^{ikx}\}$, the algebraic system $\{x^k\}$, the spline system, the wavelet system and their multivariate versions. In general we may speak of a basis $\Psi = \{\psi_k\}_{k=1}^{\infty}$ in a Banach space $X$.

The second step in solving the representation problem is to choose a form of an approximant that is built on the base of the chosen representation system $\Psi$. In a classical way that was used for centuries, an approximant $a_m$ is a polynomial with respect to $\Psi$:

$$
a_m := \sum_{k=1}^{m} c_k \psi_k . \qquad (1)
$$

The complexity of the approximant $a_m$ is characterized by the order $m$ of the polynomial. It is well known in approximation theory that approximation by polynomials is closely related to the smoothness properties of the function being approximated. Approximation of this type is referred to as linear approximation theory because, for a fixed $m$, approximants come from a linear subspace spanned by $\psi_1, \dots, \psi_m$.

It is understood in numerical analysis and approximation theory that in many problems from signal/image processing it is more beneficial to use an m-term approximant with respect to $\Psi$ than a polynomial of order $m$. This means that for $f \in X$ we look for an approximant of the form

$$
a_m(f) := \sum_{k \in \Lambda(f)} c_k \psi_k , \qquad (2)
$$

where $\Lambda(f)$ is a set of $m$ indices which is determined by $f$. The complexity of this approximant is characterized by the cardinality $|\Lambda(f)| = m$ of $\Lambda(f)$. Approximation of this type is referred to as nonlinear approximation theory because, for a fixed $m$, approximants $a_m(f)$ come from different linear subspaces spanned by $\psi_k$, $k \in \Lambda(f)$, which depend on $f$. The cardinality $|\Lambda(f)|$ is a fundamental characteristic of $a_m(f)$, called the sparsity of $a_m(f)$ with respect to $\Psi$. It is now well understood that we need to study nonlinear sparse representations in order to increase significantly our ability to process (compress, denoise, etc.) large data sets. Sparse representations of a function are not only a powerful analytic tool, but they are also utilized in many applications in image/signal processing and numerical computation.
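To see how an approximant of the form (2) can be built in practice, the following sketch (an editorial illustration, not code from the book) assumes for simplicity an orthonormal basis in a finite-dimensional space, so that the coefficients are inner products $c_k = \langle f, \psi_k\rangle$, and keeps the $m$ coefficients that are largest in absolute value. This is the thresholding idea studied in Chapter 1; the data and dimensions below are hypothetical.

```python
import numpy as np

def m_term_greedy_approximant(f, basis, m):
    """Return the m-term approximant of f obtained by keeping the m largest
    coefficients (in absolute value) with respect to an orthonormal basis.

    `basis` is an (n, n) array whose rows are the basis elements psi_k, so
    for an orthonormal system the coefficients are c_k = <f, psi_k>.
    """
    coeffs = basis @ f                          # coefficients c_k
    Lambda = np.argsort(np.abs(coeffs))[-m:]    # index set Lambda(f), |Lambda(f)| = m
    a_m = basis[Lambda].T @ coeffs[Lambda]      # sum over k in Lambda(f) of c_k psi_k
    return a_m, Lambda

# Hypothetical usage: an 8-term approximant of a random vector in R^64
# with respect to the (trivially orthonormal) standard basis.
rng = np.random.default_rng(0)
f = rng.standard_normal(64)
a_m, Lambda = m_term_greedy_approximant(f, np.eye(64), m=8)
print(np.linalg.norm(f - a_m))
```

For an orthonormal basis in a Hilbert space this choice of $\Lambda(f)$ gives the best possible m-term approximation in the norm; how well such thresholding performs for general bases is precisely the subject of greedy and quasi-greedy bases in Chapter 1.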

The third step in solving the representation problem is to choose a method of construction of an approximant of the desired form. Let us begin with the linear theory. For example, the approximation method that picks the polynomial of degree $m$ with respect to $\Psi$ of best approximation of $f$ in $X$ as an approximant is an optimal method of approximation by polynomials of degree $m$. However, such an obvious optimal method of approximation may not be good from the point of view of practical implementation. Standard methods of approximation that are more practical than the above one are linear methods of approximation, in particular partial sums of the corresponding expansion of $f$ with respect to the basis $\Psi$. Many books (see, for example, DeVore and Lorentz (1993)) discuss this classical topic of approximation theory.

An implementation of the third step in the nonlinear setting is not straightforward. It is clear that an analog of the best polynomial approximant of order $m$ is the best m-term approximant. Nonlinearity brings about complications even at this stage. The existence of a best approximant from a finite dimensional subspace is well known; the existence of a best m-term approximant is a difficult problem. We discuss this problem in Chapter 1. Next, what are nonlinear analogs of, say, partial sums? We answer this question in Chapter 1. It turns out that greedy approximants are natural substitutes for the partial sums.

We specify not only a form of an approximant, but also choose a specific method of approximation (for instance, the one that is known to be good in practical implementations). Now, we have a precise mathematical problem of studying the efficiency of our specific method of approximation. We discuss this problem in detail here. It turns out that a convenient and flexible way of measuring the efficiency of a specific approximation method is to prove the corresponding Lebesgue-type inequalities. We would like this method to work for all functions. Therefore, it should converge at least for each $f \in X$. Convergence is a fundamental theoretical problem. In this book we thoroughly discuss the problem of convergence of greedy algorithms.

The fundamental question of nonlinear approximation is how to devise good constructive methods (algorithms) of approximation. This problem has two levels of nonlinearity. The first level of nonlinearity (discussed above) is m-term approximation with regard to bases. In this problem one can use the unique function expansion with regard to a given basis to build an approximant. Nonlinearity comes in when we look for m-term approximants with terms (i.e. basis elements) that are allowed to depend on a given function. We discuss m-term approximation with regard to bases in detail in Chapter 1. On the second level of nonlinearity, we replace a basis by a more general system, which is not necessarily minimal (for example, a redundant system, or dictionary). This setting is much more complicated than the first one (the bases case); however, there is a solid justification of the importance of redundant systems in both theoretical questions and practical applications (see, for example, Donoho (2001), Huber (1985), Schmidt (1906)). In Chapters 2 and 6 we discuss approximation by linear combinations of elements that are taken from a redundant (overcomplete) system of elements. We briefly discuss the question: why do we need redundant systems? Answering this question, we first of all mention three classical redundant systems that are used in different areas of mathematics.

Perhaps the first example of m-term approximation with regard to a redundant dictionary was discussed by Schmidt (1906), who considered the approximation of functions $f(x, y)$ of two variables by bilinear forms $\sum_{i=1}^{m} u_i(x) v_i(y)$ in $L_2([0,1]^2)$. This problem is closely connected with properties of the integral operator $J_f(g) := \int_0^1 f(x, y)\, g(y)\, dy$ with kernel $f(x, y)$.

Another example, which hails from statistics, is the projection pursuit regression problem. In the language of function theory, the problem is to approximate in $L_2(\Omega)$, where $\Omega \subset \mathbb{R}^d$ is a bounded domain, a given function $f \in L_2(\Omega)$ by a sum of ridge functions, i.e. by $\sum_{j=1}^{m} r_j(\omega_j \cdot x)$, $x, \omega_j \in \mathbb{R}^d$, $j = 1, \dots, m$, where $r_j$, $j = 1, \dots, m$, are univariate functions.

The third example is from signal processing. In signal processing the most popular means of approximation are wavelets and the system of Gabor functions $\{g_{a,b}(x - c),\ g_{a,b}(x) := e^{iax} e^{-bx^2},\ a, c \in \mathbb{R},\ b \in \mathbb{R}_+\}$. The Gabor system gives more flexibility in constructing an approximant, but it is a redundant (not minimal) system. It also seems natural (see the discussion in Donoho (2001)) to use redundant systems in modeling analyzing elements for the visual system.

Thus, in order to address the contemporary needs of approximation theory and computational mathematics, a very general model of approximation with regard to a redundant system (dictionary) has been considered in many recent papers. As an example of such a model, we choose a Banach space $X$ with elements as target functions and an arbitrary system $D$ of elements of this space such that the closure of $\mathrm{span}\, D$ coincides with $X$ as an approximating system. We would like to have an algorithm of constructing m-term approximants that adds at each step only one new element from $D$ and keeps elements of $D$ obtained at the previous steps. This requirement is an analog of on-line computation that is very desirable in practical algorithms. Clearly, we are looking for good algorithms which converge for each target function. It is not obvious that such an algorithm exists in a setting at the above level of generality ($X$, $D$ are arbitrary).

The fundamental question is how to construct good methods (algorithms) of approximation. Recent results have established that greedy-type algorithms are suitable methods of nonlinear approximation in both m-term approximation with regard to bases and m-term approximation with regard to redundant systems. It turns out that there is one fundamental principle that allows us to build good algorithms both for arbitrary redundant systems and for very simple well structured bases like the Haar basis. This principle is the use of a greedy step in searching for a new element to be added to a given m-term approximant. By a greedy step, we mean one which maximizes a certain functional determined by information from the previous steps of the algorithm. We obtain different types of greedy algorithms by varying the above-mentioned functional and also by using different ways of constructing (choosing coefficients of the linear combination) the m-term approximant from the already found m elements of the dictionary. In Chapters 2 and 6 we present different greedy-type algorithms, beginning with the very simple and very natural Pure Greedy Algorithm in a Hilbert space and ending with its rather complicated modifications in a Banach space. Different modifications aim to make the corresponding greedy algorithms better suited to practical implementation. We discuss this issue in detail in Chapters 2 and 6.
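To make the greedy step concrete, here is a minimal sketch of the Pure Greedy Algorithm in the simplest possible setting: a finite-dimensional Hilbert space with a finite, normalized dictionary. The code is an editorial illustration, not taken from the book; the dictionary, dimensions and data below are hypothetical.

```python
import numpy as np

def pure_greedy_algorithm(f, dictionary, m):
    """Pure Greedy Algorithm (matching pursuit) in R^n.

    `dictionary` is an array whose rows are the dictionary elements g,
    assumed to be normalized (||g|| = 1).  The greedy step picks the g
    maximizing |<residual, g>|; the algorithm then subtracts the projection
    <residual, g> g from the residual and repeats, for m steps.
    """
    residual = f.astype(float).copy()
    approximant = np.zeros_like(residual)
    for _ in range(m):
        inner = dictionary @ residual          # <residual, g> for every g in D
        k = int(np.argmax(np.abs(inner)))      # greedy step
        step = inner[k] * dictionary[k]        # projection onto the selected element
        approximant += step
        residual -= step
    return approximant, residual

# Hypothetical usage: a redundant dictionary of 200 random unit vectors in R^50.
rng = np.random.default_rng(1)
D = rng.standard_normal((200, 50))
D /= np.linalg.norm(D, axis=1, keepdims=True)
f = rng.standard_normal(50)
a_m, r_m = pure_greedy_algorithm(f, D, m=20)
print(np.linalg.norm(r_m))  # the residual norm never increases with m
```

Here the functional maximized at each step is simply the absolute value of the inner product with the current residual; the weak, Chebyshev and relaxed variants treated in Chapters 2 and 6 modify either this selection rule or the way the coefficients of the m-term combination are chosen.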

It is known that in many numerical problems users are satisfied with a Hilbert space setting and do not consider a more general setting in a Banach space. We now make one remark that justifies our interest in Banach spaces. The first argument is an a-priori argument that the spaces $L_p$ are very natural and should be studied along with the $L_2$ space. The second argument is an a-posteriori argument.

The study of greedy approximation in Banach spaces has revealed that the characteristic of a Banach space $X$ that governs the behavior of greedy approximation is the modulus of smoothness $\rho(u)$ of $X$. It is known that the spaces $L_p$, $2 \le p < \infty$, have moduli of smoothness of the same order: $u^2$. Thus, many results that are known for the Hilbert space $L_2$ and proved using some special structure of a Hilbert space can be generalized to Banach spaces $L_p$, $2 \le p < \infty$. The new proofs use only the geometry of the unit sphere of the space expressed in the form $\rho(u) \le \gamma u^2$.
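For the reader's convenience, the modulus of smoothness referred to here is, in the standard normalization used in this literature (an editorial reminder; the book's own definition in Chapter 6 is authoritative),

```latex
\rho(u) := \sup_{\|x\| = \|y\| = 1}
  \left( \frac{\|x + u y\| + \|x - u y\|}{2} - 1 \right), \qquad u > 0,
```

and $X$ is called uniformly smooth if $\rho(u)/u \to 0$ as $u \to 0$. For $X = L_p$ with $2 \le p < \infty$ the bound $\rho(u) \le \gamma u^2$ holds with a constant $\gamma$ depending only on $p$.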

The theory of greedy approximation is developing rapidly, and results are spread over hundreds of papers by different authors. There are several surveys that discuss greedy approximation (see DeVore (1998), Konyagin and Temlyakov (2002), Temlyakov (2003a), Temlyakov (2006b), Temlyakov (2008b), Wojtaszczyk (2002a)). This is the first book on greedy approximation. This book is an extension of Temlyakov (2008b). The book provides a systematic presentation of fundamental results in greedy approximation. It also contains an introduction to two hot topics in numerical mathematics: learning theory and compressed sensing. This book possesses features of both a survey paper and a textbook. The majority of results are given with proofs. However, some important results with technically involved proofs are presented without proof. We included proofs of the most important and typical results; and we tried to include those proofs which demonstrate different ideas and are based on different techniques. In this sense the book has a feature of a survey: it tries to cover broad material. On the other hand, we limit ourselves to a systematic treatment of a specific topic rather than trying to give an overview of all related topics. In this sense the book is close to a textbook. There are many papers on theoretical and computational aspects of greedy approximation, learning theory and compressed sensing. We have chosen to cover the mathematical foundations of greedy approximation, learning theory and compressed sensing.

The book is addressed to researchers working in numerical mathematics, analysis, functional analysis and statistics. It quickly takes the reader from classical results to the frontier of the unknown, but is written at the level of a graduate course and does not require a broad background in order to understand the topics. Graduate students working in different areas of numerical mathematics and analysis may find it useful to learn not only greedy approximation theory, but also the theoretical foundations of learning theory and compressed sensing. The author taught three graduate courses, Greedy Approximation, Learning Theory and Compressed Sensing, based on the material of the book, at the University of South Carolina. All three courses were very well received by students. The book might be used for designing different graduate courses. It contains a number of important open problems which may assist in uncovering topics for dissertations and research papers.

We use $C$, $C(p, d)$, $C_{p,d}$, etc., to denote various constants, the indices indicating dependence on other parameters. We use the following symbols for brevity. For two non-negative sequences $a = \{a_n\}_{n=1}^{\infty}$ and $b = \{b_n\}_{n=1}^{\infty}$, the relation, or order inequality, $a_n \ll b_n$ means that there is a number $C(a, b)$ such that, for all $n$, we have $a_n \le C(a, b)\, b_n$; and the relation $a_n \asymp b_n$ means that $a_n \ll b_n$ and $b_n \ll a_n$. Other notation is defined in the text.

Acknowledgements  I am grateful to Jessica L. Nelson for help in the preparation of the book.