Chapter 12: Best Approximation

July 20, 2013

Uri M. Ascher and Chen Greif
Department of Computer Science
The University of British Columbia
{ascher,greif}@cs.ubc.ca

Slides for the book
A First Course in Numerical Methods (published by SIAM, 2011)
http://www.ec-securehost.com/SIAM/CS07.html


Best Approximation Goals

Goals of this chapter

• To develop some elegant, classical theory of best approximation using least squares and weighted least squares norms;

• to develop important families of orthogonal polynomials;

• to examine in particular Legendre and Chebyshev polynomials.

Uri Ascher & Chen Greif (UBC Computer Science) A First Course in Numerical Methods July 20, 2013 1 / 1


Best Approximation Outline

Outline

• Best least squares approximation

• Orthogonal basis functions

• Weighted least squares

• Chebyshev polynomials


Best Approximation Motivation

Approximation, not just interpolation

• As in Chapters 10 and 11, we are given a complicated (or implicit) function f(x) which may be evaluated anywhere on the interval [a, b], and we look for a simpler approximation v(x) = ∑_{j=0}^{n} c_j ϕ_j(x), where the ϕ_j(x) are known basis functions with some useful properties.

• Unlike Chapters 10 and 11, we no longer seek to impose v(x_i) − f(x_i) = 0 at any n + 1 specific interpolation points.

• One reason for shunning interpolation: if there is significant noise in the measured values of f, then measuring the approximation error at m + 1 > n + 1 points may yield more plausible or meaningful results. This leads to overdetermined linear systems which are most easily solved using discrete least squares methods; see Section 6.1.

• Here, however, we stick to the continuous, rather than discrete, level and use integrals rather than vectors. Typically we may seek a function v(x), i.e., coefficients c_j, so as to minimize

  ∥v − f∥_2^2 = ∫_a^b (v(x) − f(x))^2 dx.


Aside: function norms

• For all integrable functions g(x) and f(x) on an interval [a, b], a norm ∥ · ∥ is a scalar function satisfying
  • ∥g∥ ≥ 0; ∥g∥ = 0 iff g(x) ≡ 0,
  • ∥αg∥ = |α| ∥g∥ for all scalars α,
  • ∥f + g∥ ≤ ∥f∥ + ∥g∥.
  The set of all functions whose norm is finite forms a function space associated with that particular norm.

• Some popular function norms and corresponding function spaces are:

  L2 : ∥g∥_2 = ( ∫_a^b g(x)^2 dx )^{1/2}   (least squares),

  L1 : ∥g∥_1 = ∫_a^b |g(x)| dx,

  L∞ : ∥g∥_∞ = max_{a ≤ x ≤ b} |g(x)|   (max).

• We have seen the L∞ norm in action when estimating errors in Chapters 10 and 11. Here we concentrate on the L2 norm, occasionally in weighted form.
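These norms are easy to sanity-check numerically. The sketch below (my own illustration, not from the slides) approximates all three norms of g(x) = x on [0, 1] with a fine grid and the trapezoid rule; the exact values are ∥g∥_2 = 1/√3, ∥g∥_1 = 1/2, and ∥g∥_∞ = 1.

```python
import numpy as np

def norms(g, a, b, m=200001):
    """Approximate the L2, L1, and L-infinity norms of g on [a, b],
    using the trapezoid rule on a fine grid for the integrals."""
    x = np.linspace(a, b, m)
    y = g(x)
    trap = lambda v: float(np.sum((v[1:] + v[:-1]) * np.diff(x)) / 2.0)
    return np.sqrt(trap(y ** 2)), trap(np.abs(y)), float(np.max(np.abs(y)))

l2, l1, linf = norms(lambda x: x, 0.0, 1.0)
print(l2, l1, linf)   # approximately 0.5774, 0.5, 1.0
```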


Why continuous approximation?

• Discrete least squares (Chapter 6) is certainly efficiently manageable in many situations and readily applicable. Indeed, in this chapter we don’t directly concentrate on efficient programs or algorithms.

• However, sticking to the continuous allows developing a mathematically elegant theory.

• Moreover, classical families of orthogonal basis functions are developed which prove highly useful in Chapters 13, 15, 16 and others.


Best Approximation Continuous least squares

Best least squares approximation

• We are still approximating a given f(x) on the interval [a, b] using

  v(x) = ∑_{j=0}^{n} c_j ϕ_j(x),

but we are no longer necessarily interpolating.

• Determine the coefficients c_j as those that solve the minimization problem

  min_c ∥f − v∥_2^2 ≡ min_c ∫_a^b ( f(x) − ∑_{j=0}^{n} c_j ϕ_j(x) )^2 dx.

• Taking the first derivative with respect to each c_k in turn and equating it to 0 gives

  ∫_a^b ( f(x) − ∑_{j=0}^{n} c_j ϕ_j(x) ) ϕ_k(x) dx = 0,   k = 0, 1, . . . , n.


Best least squares approximation cont.

• This yields the following construction algorithm:

  1. Calculate

     B̃_{j,k} = ∫_a^b ϕ_j(x) ϕ_k(x) dx,   j, k = 0, 1, . . . , n,

     b̃_j = ∫_a^b f(x) ϕ_j(x) dx,   j = 0, 1, . . . , n.

  2. Solve the linear (n + 1) × (n + 1) system

     B̃ c = b̃

     for the coefficients c = (c_0, c_1, . . . , c_n)^T.

• Example: using a monomial basis ϕ_j(x) = x^j for a polynomial approximation on [0, 1], obtain

  B̃_{j,k} = ∫_0^1 x^{j+k} dx = 1/(j + k + 1),   0 ≤ j, k ≤ n.

Also, b̃_j = ∫_0^1 f(x) x^j dx can be approximated by Chapter 15 techniques.
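For a concrete feel, here is a minimal sketch (my own, with f(x) = eˣ as a hypothetical test function): it builds the matrix B̃ with entries 1/(j + k + 1), approximates the moments b̃_j with the trapezoid rule rather than the Chapter 15 techniques, and solves B̃c = b̃. Printing the condition number of B̃ hints at the stability trouble with this basis.

```python
import numpy as np

def monomial_lsq(f, n, m=4001):
    """Best L2 approximation of f on [0, 1] in the monomial basis x^j:
    B[j, k] = 1/(j + k + 1), b[j] = int_0^1 f(x) x^j dx (trapezoid rule)."""
    j = np.arange(n + 1)
    B = 1.0 / (j[:, None] + j[None, :] + 1.0)
    x = np.linspace(0.0, 1.0, m)
    fx = f(x)
    trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
    b = np.array([trap(fx * x ** k) for k in j])
    return np.linalg.solve(B, b), B

c, B = monomial_lsq(np.exp, 3)          # cubic least squares fit of e^x
xs = np.linspace(0.0, 1.0, 11)
err = float(np.max(np.abs(np.polyval(c[::-1], xs) - np.exp(xs))))
print(err)                              # small: a cubic fits e^x well on [0, 1]
print(np.linalg.cond(B))                # already ~1.5e4 for n = 3, growing fast with n
```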


Best Approximation Orthogonal polynomials

Orthogonal basis functions

• The simple example using monomials leads to stability problems (bad conditioning in B̃).

• Using monomial and other simple-minded bases, one must evaluate many integrals when n is large.

• Idea: construct basis functions such that B̃ is diagonal! This means

  ∫_a^b ϕ_j(x) ϕ_k(x) dx = 0,   j ≠ k.

• Take advantage of the fact that here, unlike interpolation, there are no input abscissae: we can construct families of orthogonal basis functions in advance!


Best least squares using orthogonal basis functions

• The construction algorithm becomes simply:

  1. For j = 0, 1, . . . , n, set d_j = B̃_{j,j} = ∫_a^b ϕ_j(x)^2 dx in advance, and calculate

     b̃_j = ∫_a^b f(x) ϕ_j(x) dx.

  2. The sought coefficients are

     c_j = b̃_j / d_j,   j = 0, 1, . . . , n.

• Next, restrict attention to polynomial best approximation and ask: how does one construct a family of orthogonal polynomials?

• ASIDE: Two square-integrable functions g ∈ L2 and h ∈ L2 are orthogonal to each other if their inner product vanishes, i.e., they satisfy

  ∫_a^b g(x) h(x) dx = 0.


Legendre polynomials

Defined on the interval [−1, 1] by

  ϕ_0(x) = 1,
  ϕ_1(x) = x,
  ϕ_{j+1}(x) = ((2j + 1)/(j + 1)) x ϕ_j(x) − (j/(j + 1)) ϕ_{j−1}(x),   j ≥ 1.

So,

  ϕ_2(x) = (1/2)(3x^2 − 1),
  ϕ_3(x) = (5/6) x (3x^2 − 1) − (2/3) x = (1/2)(5x^3 − 3x).
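The recurrence is easy to evaluate pointwise. This sketch (my own) does so and spot-checks the orthogonality, normalization constant 2/(2j + 1), and calibration ϕ_j(1) = 1 that are stated on the following slide, with the integrals approximated by the trapezoid rule.

```python
import numpy as np

def legendre(j, x):
    """Evaluate the Legendre polynomial phi_j at points x via the recurrence
    phi_{k+1} = ((2k+1) x phi_k - k phi_{k-1}) / (k+1)."""
    p_prev, p = np.ones_like(x), np.asarray(x, dtype=float)
    if j == 0:
        return p_prev
    for k in range(1, j):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

x = np.linspace(-1.0, 1.0, 200001)
trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
print(trap(legendre(2, x) * legendre(3, x)))   # approx 0    (orthogonality)
print(trap(legendre(3, x) ** 2))               # approx 2/7  (= 2/(2j+1) for j = 3)
print(legendre(4, np.array([1.0]))[0])         # 1.0         (phi_j(1) = 1)
```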


Legendre polynomials: properties

• Orthogonality:

  ∫_{−1}^{1} ϕ_j(x) ϕ_k(x) dx = 0 for j ≠ k,   and = 2/(2j + 1) for j = k.

• Calibration: |ϕ_j(x)| ≤ 1 for −1 ≤ x ≤ 1, and ϕ_j(1) = 1.

• Oscillation: ϕ_j(x) has degree j (not less). All its j zeros are simple and lie inside the interval (−1, 1). Hence the polynomial oscillates j times in this interval.


Legendre polynomials: the picture

[Figure: the Legendre polynomials ϕ_j(x) plotted on [−1, 1]; both axes range from −1 to 1.]

You can determine which curve corresponds to which ϕ_j by the number of oscillations.


Example

• Find polynomial best approximations for f(x) = cos(2πx) over [0, 1] using n orthogonal polynomials.

• Using Legendre polynomials (note the interval change from [−1, 1] to [0, 1]),

  b̃_j = ∫_0^1 cos(2πt) ϕ_j(2t − 1) dt,   and c_j = (2j + 1) b̃_j.

[Figure: the approximations p_n(x) for n = 0 or 1, n = 2 or 3, and n = 4, plotted on [0, 1] against the exact f(x) = cos(2πx).]

Here f is in green. We obtain the same approximation for n = 2l and n = 2l + 1 due to the symmetry of f. Note the improvement in error as l increases.
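The computation behind this figure can be sketched as follows (my own code; the integrals b̃_j are approximated by the trapezoid rule on a fine grid rather than computed exactly). It uses the shifted Legendre basis ϕ_j(2t − 1) and the coefficients c_j = (2j + 1) b̃_j from above.

```python
import numpy as np

def legendre(j, x):
    """Legendre polynomial phi_j evaluated via the three-term recurrence."""
    p_prev, p = np.ones_like(x), np.asarray(x, dtype=float)
    if j == 0:
        return p_prev
    for k in range(1, j):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def max_err(n, m=100001):
    """Max grid error of the degree-n best L2 fit of cos(2*pi*t) on [0, 1]."""
    t = np.linspace(0.0, 1.0, m)
    f = np.cos(2 * np.pi * t)
    trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)
    v = np.zeros_like(t)
    for j in range(n + 1):
        phi = legendre(j, 2 * t - 1)            # Legendre shifted to [0, 1]
        v += (2 * j + 1) * trap(f * phi) * phi  # c_j = (2j + 1) * b_j
    return float(np.max(np.abs(v - f)))

print(max_err(2), max_err(4), max_err(6))   # error improves as n grows
```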


Best Approximation Orthogonal trigonometric polynomials

Trigonometric polynomials

• Not all interesting orthogonal basis functions are polynomials.

• Of particular interest are the periodic basis functions

  ϕ_0(x) = 1/√(2π),   ϕ_{2l−1}(x) = (1/√π) sin(lx),   ϕ_{2l}(x) = (1/√π) cos(lx),   l = 1, 2, . . . , n/2.

• These are orthonormal on [−π, π]:

  ∫_{−π}^{π} ϕ_i(x) ϕ_j(x) dx = 0 for i ≠ j,   and = 1 for i = j.

• Lead to the famous Fourier transform (Chapter 13).
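A quick numerical confirmation (my own sketch, not from the slides): assemble the Gram matrix of the first few of these trigonometric basis functions on [−π, π] with the trapezoid rule and compare it with the identity.

```python
import numpy as np

def trig_basis(k, x):
    """phi_0 = 1/sqrt(2 pi); phi_{2l-1} = sin(lx)/sqrt(pi); phi_{2l} = cos(lx)/sqrt(pi)."""
    if k == 0:
        return np.full_like(x, 1.0 / np.sqrt(2 * np.pi))
    l = (k + 1) // 2
    return (np.sin(l * x) if k % 2 else np.cos(l * x)) / np.sqrt(np.pi)

x = np.linspace(-np.pi, np.pi, 20001)
trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
# Gram matrix of the first 7 basis functions; should be (numerically) the identity
G = np.array([[trap(trig_basis(i, x) * trig_basis(j, x)) for j in range(7)]
              for i in range(7)])
print(np.max(np.abs(G - np.eye(7))))   # close to 0: the basis is orthonormal
```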


Best Approximation Weighted least squares

Weighted least squares

• An important generalization of best least squares approximation: with a weight function w(x) that is never negative, has a bounded L2 norm, and may equal 0 only at isolated points, seek coefficients c_j for v(x) = ∑_{j=0}^{n} c_j ϕ_j(x) so as to minimize

  ∫_a^b w(x) (v(x) − f(x))^2 dx.

• The orthogonality condition correspondingly generalizes to

  ∫_a^b w(x) ϕ_j(x) ϕ_k(x) dx = 0,   j ≠ k.

• If this holds, then the best approximation solution is given by c_j = b̃_j / d_j, where d_j = ∫_a^b w(x) ϕ_j(x)^2 dx > 0 and

  b̃_j = ∫_a^b w(x) f(x) ϕ_j(x) dx,   j = 0, 1, . . . , n.

• Important families of orthogonal polynomials may be obtained for good choices of the weight function.


Gram-Schmidt process for constructing orthogonal polynomials

• Given an interval [a, b] and a weight function w(x), this algorithm produces an orthogonal polynomial family via a short three-term recurrence!

• Set

  ϕ_0(x) = 1,
  ϕ_1(x) = x − β_1,
  ϕ_j(x) = (x − β_j) ϕ_{j−1}(x) − γ_j ϕ_{j−2}(x),   j ≥ 2,

where

  β_j = ( ∫_a^b x w(x) [ϕ_{j−1}(x)]^2 dx ) / ( ∫_a^b w(x) [ϕ_{j−1}(x)]^2 dx ),   j ≥ 1,

  γ_j = ( ∫_a^b x w(x) ϕ_{j−1}(x) ϕ_{j−2}(x) dx ) / ( ∫_a^b w(x) [ϕ_{j−2}(x)]^2 dx ),   j ≥ 2.

• Intimately connected also to the derivation of Krylov subspace iterative methods for symmetric linear systems (CG, MINRES); see Section 7.5.
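A sketch of this process in code (my own; all integrals approximated by the trapezoid rule, and w ≡ 1 on [−1, 1] chosen as a test weight, for which the recurrence should reproduce monic multiples of the Legendre polynomials, e.g. ϕ_2 = x² − 1/3):

```python
import numpy as np

def orth_polys(w, a, b, n, m=100001):
    """Construct n+1 orthogonal polynomials (sampled on a grid) for the
    weight w on [a, b] via the three-term Gram-Schmidt recurrence."""
    x = np.linspace(a, b, m)
    trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
    wx = w(x)
    phis = [np.ones_like(x)]                       # phi_0 = 1
    for j in range(1, n + 1):
        beta = trap(x * wx * phis[-1] ** 2) / trap(wx * phis[-1] ** 2)
        phi = (x - beta) * phis[-1]                # (x - beta_j) phi_{j-1}
        if j >= 2:
            gamma = trap(x * wx * phis[-1] * phis[-2]) / trap(wx * phis[-2] ** 2)
            phi -= gamma * phis[-2]                # - gamma_j phi_{j-2}
        phis.append(phi)
    return x, phis

x, phis = orth_polys(lambda x: np.ones_like(x), -1.0, 1.0, 3)
trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)
print(trap(phis[1] * phis[3]))                 # approx 0 (orthogonality)
print(np.max(np.abs(phis[2] - (x ** 2 - 1.0 / 3.0))))  # approx 0: monic Legendre
```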


Best Approximation Chebyshev polynomials

Chebyshev polynomials

• This is a family of orthogonal polynomials on [−1, 1] with respect to the weight function

  w(x) = 1/√(1 − x^2).

• Define

  ϕ_j(x) = T_j(x) = cos(j arccos x) = cos(jθ),   where x = cos(θ).

• So, these polynomials are naturally defined in terms of an angle, θ. Clearly, the larger j, the more oscillatory T_j.

• Easy to verify orthogonality:

  ∫_{−1}^{1} w(x) T_j(x) T_k(x) dx = ∫_{−1}^{1} T_j(x) T_k(x) / √(1 − x^2) dx
                                   = ∫_0^π cos(jθ) cos(kθ) dθ
                                   = 0 for j ≠ k,   and π/2 for j = k > 0.


Chebyshev polynomials: derivation and properties

• These polynomials obey the simple 3-term recursion

  T_0(x) = 1,
  T_1(x) = x,
  T_{j+1}(x) = 2x T_j(x) − T_{j−1}(x),   j ≥ 1,

and obviously satisfy

  |T_j(x)| ≤ 1,   −1 ≤ x ≤ 1.

• The roots (zeros) of T_n are the Chebyshev points

  x_k = cos( (2k − 1)π / (2n) ),   k = 1, . . . , n.

See Section 10.6 for their magic properties.

• The also-important interleaving extremal points (where T_n(x) = ±1) are

  ξ_k = cos(kπ/n),   k = 0, 1, . . . , n.
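These characterizations are consistent with one another and can be cross-checked numerically (my own sketch): evaluate T_j by the recursion, compare against cos(j arccos x), and test the roots and extremal points.

```python
import numpy as np

def cheb(j, x):
    """Evaluate the Chebyshev polynomial T_j at points x via the recursion
    T_{k+1} = 2 x T_k - T_{k-1}."""
    t_prev, t = np.ones_like(x), np.asarray(x, dtype=float)
    if j == 0:
        return t_prev
    for _ in range(j - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

x = np.linspace(-1.0, 1.0, 1001)
print(np.max(np.abs(cheb(6, x) - np.cos(6 * np.arccos(x)))))  # approx 0

n = 7
roots = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))
print(np.max(np.abs(cheb(n, roots))))     # approx 0: the Chebyshev points are zeros of T_n

extrema = np.cos(np.arange(n + 1) * np.pi / n)
print(cheb(n, extrema))                   # alternates between +1 and -1
```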


Chebyshev polynomials: the picture

[Figure: the Chebyshev polynomials T_4(x), T_5(x), and T_6(x) plotted on [−1, 1]; both axes range from −1 to 1.]

You can determine which curve corresponds to which ϕ_j = T_j by the number of oscillations. Note the perfect calibration.
