
An introduction to sparse stochastic processes

Michael Unser and Pouya Tafti

Copyright © 2012 Michael Unser and Pouya Tafti. These materials are protected by copyright under the Attribution-NonCommercial-NoDerivs 3.0 Unported License from Creative Commons.


Summary

Sparse stochastic processes are continuous-domain processes that admit a parsimonious representation in some matched wavelet-like basis. Such models are relevant for image compression, compressed sensing, and, more generally, for the derivation of statistical algorithms for solving ill-posed inverse problems.

This book introduces an extended family of sparse processes that are specified by a generic (non-Gaussian) innovation model or, equivalently, as solutions of linear stochastic differential equations driven by white Lévy noise. It presents the mathematical tools for their characterization. The two leading threads that underlie the exposition are
– the statistical property of infinite divisibility, which induces two distinct types of behavior (Gaussian vs. sparse), to the exclusion of any other;
– the structural link between linear stochastic processes and spline functions, which is exploited to simplify the mathematics.
The last chapter is devoted to the use of these models for the derivation of algorithms that recover sparse signals. This leads to a Bayesian reinterpretation of popular sparsity-promoting processing schemes (such as total-variation denoising, LASSO, and wavelet shrinkage) as MAP estimators for specific types of Lévy processes.

The book, which is mostly self-contained, is targeted to an audience of graduate students and researchers with an interest in signal/image processing, compressed sensing, approximation theory, machine learning, or statistics.


Chapter 1

Introduction

1.1 Sparsity: Occam’s razor of modern signal processing?

The hypotheses of Gaussianity and stationarity play a central role in the standard statistical formulation of signal processing. They fully justify the use of the Fourier transform as the optimal signal representation and naturally lead to the derivation of optimal linear filtering algorithms for a large variety of statistical estimation tasks. This classical view of signal processing is elegant and reassuring, but it is not at the forefront of research anymore.

Starting with the discovery of the wavelet transform in the late 1980s [Dau88, Mal89], researchers in signal processing have progressively moved away from the Fourier transform and have uncovered powerful alternatives. Consequently, they have ceased modeling signals as Gaussian stationary processes and have adopted a more deterministic, approximation-theoretic point of view. The key developments that are presently reshaping the field, and which are central to the theory presented in this monograph, are summarized below.

Novel transforms and dictionaries for the representation of signals: New redundant and non-redundant representations of signals (wavelets, local cosine, curvelets) have emerged during the past two decades and have led to better algorithms for data compression, data processing, and feature extraction. The most prominent example is the wavelet-based JPEG-2000 standard for image compression [CSE00], which outperforms the widely-used JPEG method based on the DCT (discrete cosine transform). Another illustration is wavelet-domain image denoising, which provides a good alternative to more traditional linear filtering [Don95]. The various dictionaries of basis functions that have been proposed so far are tailored to specific types of signals; there does not appear to be one that fits all.

Sparsity as a new paradigm for signal processing: At the origin of this new trend is the key observation that many naturally-occurring signals and images (in particular, the ones that are piecewise-smooth) can be accurately reconstructed from a "sparse" wavelet expansion that involves many fewer terms than the original number of samples [Mal98]. The concept of sparsity has been systematized and extended to other transforms, including redundant representations (a.k.a. frames); it is at the heart of recent developments in signal processing. Sparse signals are easy to compress and to denoise by simple pointwise processing (e.g., shrinkage) in the transformed domain. Sparsity provides an equally-powerful framework for dealing with more difficult, ill-posed signal-reconstruction problems [CW08, BDE09]. Promoting sparse solutions in linear models is also of interest in statistics: a popular regression shrinkage estimator is LASSO, which imposes an upper bound on the \ell_1-norm of the model coefficients [Tib96].
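To make the shrinkage idea concrete, here is a small illustrative sketch (not taken from the book; the signal and threshold are arbitrary choices): soft-thresholding, the proximal operator of the \ell_1-norm that underlies both LASSO and wavelet shrinkage, denoises a sparse coefficient vector by pointwise processing.

```python
import numpy as np

def soft_threshold(x, t):
    # Pointwise soft-thresholding: shrink every coefficient toward zero by t
    # and set the small ones exactly to zero (proximal operator of t*||.||_1).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(0)
coeffs = np.zeros(100)
coeffs[[5, 40, 77]] = [4.0, -3.0, 5.0]            # sparse "transform-domain" signal
noisy = coeffs + 0.3 * rng.standard_normal(100)   # additive Gaussian noise
denoised = soft_threshold(noisy, 1.0)             # most noise coefficients vanish
```

Because the clean signal has only three active coefficients, thresholding above the noise level removes essentially all of the noise while preserving the large coefficients (up to the shrinkage bias t).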


New sampling strategies with fewer measurements: The theory of compressed sensing deals with the problem of the reconstruction of a signal from a minimal, but suitably chosen, set of measurements [Don06, CW08, BDE09]. The strategy there is as follows: among the multitude of solutions that are consistent with the measurements, one should favor the "sparsest" one. In practice, one replaces the underlying \ell_0-norm minimization problem, which is NP-hard, by a convex \ell_1-norm minimization, which is computationally much more tractable. Remarkably, researchers have shown that this simplification does yield the correct solution under suitable conditions (e.g., restricted isometry) [CW08]. Similarly, it has been demonstrated that signals with a finite rate of innovation (the prototypical example being a stream of Dirac impulses with unknown locations and amplitudes) can be recovered from a set of uniform measurements at twice the "innovation rate" [VMB02], rather than twice the bandwidth, as would otherwise be dictated by Shannon's classical sampling theorem.

Superiority of nonlinear signal-reconstruction algorithms: There is increasing empirical evidence that nonlinear variational methods (non-quadratic or sparsity-driven regularization) outperform the classical (linear) algorithms (direct or iterative) that are being used routinely for solving bioimaging reconstruction problems [CBFAB97, FN03]. So far, this has been demonstrated for the problem of image deconvolution and for the reconstruction of non-Cartesian MRI [LDP07]. The considerable research effort in this area has also resulted in the development of novel algorithms (ISTA, FISTA) for solving convex optimization problems that were previously considered out of numerical reach [FN03, DDDM04, BT09].
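The ISTA iteration mentioned above alternates a gradient step on the quadratic data term with a shrinkage step; a bare-bones sketch (with an illustrative random sensing matrix and hand-picked parameters, not a reference implementation) is:

```python
import numpy as np

def ista(A, y, lam, n_iter=500):
    # ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrinkage step
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 100)) / np.sqrt(40)   # random sensing matrix
x_true = np.zeros(100)
x_true[[3, 30, 60]] = [2.0, -1.5, 2.5]             # 3-sparse signal
y = A @ x_true                                     # noiseless measurements
x_hat = ista(A, y, lam=0.01)
```

With only 40 measurements of a length-100 signal, the sparsity-promoting iteration localizes the three active coefficients, which an unregularized least-squares solution cannot do.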

1.2 Sparse stochastic models: The next step beyond Gaussianity

While the recent developments listed above are truly remarkable and have resulted in significant algorithmic advances, the overall picture and understanding is still far from being complete. One limiting factor is that the current formulations of compressed sensing and sparse-signal recovery are fundamentally deterministic. By drawing on the analogy with the classical linear theory of signal processing, where there is an equivalence between quadratic energy-minimization techniques and minimum-mean-square-error (MMSE) estimation under the Gaussian hypothesis, there is a good chance that further progress is achievable by adopting a complementary statistical-modeling point of view.¹ The crucial ingredient that is required to guide such an investigation is a sparse counterpart to the classical family of Gaussian stationary processes (GSP). This monograph focuses on the formulation of such a statistical framework, which may be aptly qualified as the next step after Gaussianity under the functional constraint of linearity.

1. It is instructive to recall the fundamental role of statistical modeling in the development of traditional signal processing. The standard tools of the trade are the Fourier transform, Shannon-type sampling, linear filtering, and quadratic energy-minimization techniques. These methods are widely used in practice: They are powerful, easy to deploy, and mathematically convenient. The important conceptual point is that they are justifiable based on the theory of Gaussian stationary processes (GSP). Specifically, one can invoke the following optimality results:
– The Fourier transform as well as several of its real-valued variants (e.g., DCT) are asymptotically equivalent to the Karhunen-Loève transform (KLT) for the whole class of GSP. This supports the use of sinusoidal transforms for data compression, data processing, and feature extraction. The underlying notion of optimality here is energy compaction, which implies decorrelation. Note that decorrelation is equivalent to independence in the Gaussian case only.
– Optimal filters: Given a series of linear measurements of a signal corrupted by noise, one can readily specify its optimal reconstruction (LMMSE estimator) under the general Gaussian hypothesis. The corresponding algorithm (Wiener filter) is linear and entirely determined by the covariance structure of the signal and noise. There is also a direct connection with variational reconstruction techniques since the Wiener solution can also be formulated as a quadratic energy-minimization problem (Gaussian MAP estimator).
– Optimal sampling/interpolation strategies: While this part of the story is less known, one can also invoke estimation-theoretic arguments to justify a Shannon-type, constant-rate sampling, which ensures a minimum loss of information for a large class of predominantly-lowpass GSP [PM62, Uns93]. This is not totally surprising since the basis functions of the KLT are inherently bandlimited. One can also derive minimum mean-square-error interpolators for GSP in general. The optimal signal-reconstruction algorithm takes the form of a hybrid Wiener filter whose input is discrete (signal samples) and whose output is a continuously-defined signal that can be represented in terms of generalized B-spline basis functions [UB05b].

In light of the elements presented in the introduction, the basic requirements for a comprehensive theory of sparse stochastic processes are as follows:

Backward compatibility: There is a large body of literature and methods based on the modeling of signals as realizations of GSP. We would like the corresponding identification, linear filtering, and reconstruction algorithms to remain applicable, even though they obviously become suboptimal when the Gaussian hypothesis is violated. This calls for an extended formulation that provides the same control of the correlation structure of the signals (second-order moments, Fourier spectrum) as the classical theory does.

Continuous-domain formulation: The proper interpretation of qualifying terms such as "piecewise-smooth", "translation-invariant", "scale-invariant", and "rotation-invariant" calls for continuous-domain models of signals that are compatible with the conventional (finite-dimensional) notion of sparsity. Likewise, if we intend to optimize or possibly redesign the signal-acquisition system, as in generalized sampling and compressed sensing, the very least is to have a model that characterizes the information content prior to sampling.

Predictive power: Among other things, the theory should be able to explain why wavelet representations can outperform the older Fourier-related types of decompositions, including the KLT, which is optimal from the classical perspective of variance concentration.

Ease of use: To have practical usefulness, the framework should allow for the derivation of the (joint) probability distributions of the signal in any transformed domain. This calls for a linear formulation, with the caveat that it needs to accommodate non-Gaussian distributions. In that respect, the best thing beyond Gaussianity is infinite divisibility, which is a general property of random variables that is preserved under arbitrary linear combinations.

Stochastic justification and refinement of current reconstruction algorithms: A convincing argument for adopting a new theory is that it must be compatible with the state of the art, while it also ought to suggest new directions of research. In the present context, it is important to be able to establish the connection with deterministic recovery techniques such as \ell_1-norm minimization.

The good news is that the foundations for such a theory exist and can be traced back to the pioneering work of Paul Lévy, who defined a broad family of "additive" stochastic processes, now called Lévy processes. Brownian motion (a.k.a. the Wiener process) is the only Gaussian member of this family and, as we shall demonstrate, the only representative that does not exhibit any degree of sparsity. The theory that is developed in this monograph constitutes the full linear, multidimensional extension of those ideas, where the essence of Paul Lévy's construction is embodied in the definition of Lévy innovations (or white Lévy noise), which can be interpreted as the derivative of a Lévy process in the sense of distributions (a.k.a. generalized functions). The Lévy innovations are then linearly transformed to generate a whole variety of processes whose spectral characteristics are controlled by a linear mixing operator, while their sparsity is governed by the innovations. The latter can also be viewed as the driving term of some corresponding linear stochastic differential equation (SDE). Another way of describing the extent of this generalization is to consider the representation of a general continuous-domain Gaussian process by a stochastic Wiener integral:

s(x) = \int_{\mathbb{R}} h(x, x') \, dW(x')    (1.1)

where h(x, x') is the kernel (that is, the infinite-dimensional analog of the matrix representation of a transformation in \mathbb{R}^n) of a general, L_2-stable linear operator, and dW is the so-called random Wiener measure, which is such that

W(x) = \int_0^x dW(x')

is the Wiener process. The latter equation constitutes a special case of (1.1) with h(x, x') = 1_{\{x > x' \ge 0\}}, where 1_\Omega denotes the indicator function of the set \Omega. If h(x, x') = h(x - x') is a convolution kernel, then (1.1) defines the whole class of Gaussian stationary processes. The essence of the present formulation is to replace the Wiener measure by a more general non-Gaussian, multidimensional Lévy measure. The catch, however, is that we shall not work with measures but rather with generalized functions and generalized stochastic processes. These are easier to manipulate in the Fourier domain and better suited for specifying general linear transformations. In other words, we shall rewrite (1.1) as

s(x) = \int_{\mathbb{R}} h(x, x') \, w(x') \, dx'    (1.2)

where the entity w (the continuous-domain innovation) needs to be given a proper mathematical interpretation. The main advantage of working with innovations is that they provide a very direct link with the theory of linear systems, which allows for the use of standard engineering notions such as the impulse and frequency responses of a system.

1.3 From splines to stochastic processes, or when Schoenberg meets Lévy

We shall start our journey by making an interesting connection between splines, which are deterministic objects with some inherent sparsity, and Lévy processes, with a special focus on the compound Poisson process, which constitutes the archetype of a sparse stochastic process. The key observation is that both categories of signals, deterministic and random, are ruled by the same differential equation. They can be generated via the proper integration of an "innovation" signal that carries all the necessary information. The fun is that the underlying differential system is only marginally stable, which requires the design of a special anti-derivative operator. We then use the close relationship between splines and wavelets to gain insight into the ability of wavelets to provide sparse representations of such signals. Specifically, we shall see that most non-Gaussian Lévy processes admit a better M-term representation in the Haar wavelet basis than in the classical Karhunen-Loève transform (KLT), which is usually believed to be optimal for data compression. The explanation for this counter-intuitive result is that we are breaking some of the assumptions that are implicit in the proof of optimality of the KLT.

1.3.1 Splines and Legos revisited

Splines constitute a general framework for converting series of data points (or samples) into continuously-defined signals or functions. By extension, they also provide a powerful mechanism for translating tasks that are specified in the continuous domain into efficient numerical algorithms (discretization).


Figure 1.1: Examples of spline signals. (a) Cardinal spline interpolant of degree 0 (piecewise-constant). (b) Cardinal spline interpolant of degree 1 (piecewise-linear). (c) Nonuniform D-spline or compound Poisson process, depending on the interpretation (deterministic vs. stochastic). [Graphics not reproduced.]

The cardinal setting corresponds to the configuration where the sampling grid is on the integers. Given a sequence of sample values f[k], k \in \mathbb{Z}, the basic cardinal interpolation problem is to construct a continuously-defined signal f(x), x \in \mathbb{R}, that satisfies the interpolation condition f(x)|_{x=k} = f[k] for all k \in \mathbb{Z}. Since the general problem is obviously ill-posed, the solution is constrained to live in a suitable reconstruction subspace (e.g., a particular space of cardinal splines) whose degrees of freedom are in one-to-one correspondence with the data points. The most basic concretization of those ideas is the construction of the piecewise-constant interpolant

f_1(x) = \sum_{k \in \mathbb{Z}} f[k] \, \beta^0_+(x - k)    (1.3)

which involves rectangular basis functions (informally described as Legos) that are shifted replicates of the causal² B-spline of degree zero

\beta^0_+(x) = \begin{cases} 1, & \text{for } 0 \le x < 1 \\ 0, & \text{otherwise.} \end{cases}    (1.4)

Observe that the basis functions \{\beta^0_+(x - k)\}_{k \in \mathbb{Z}} are non-overlapping, orthonormal, and that their linear span defines the space of cardinal polynomial splines of degree 0. Moreover, since \beta^0_+(x) takes the value one at the origin and vanishes at all other integers, the expansion coefficients in (1.3) coincide with the original samples of the signal. Equation (1.3) is nothing but a mathematical representation of the sample-and-hold method of interpolation, which yields the type of "Lego-like" signal shown in Fig. 1.1a.

A defining property of piecewise-constant signals is that they exhibit "sparse" first-order derivatives that are zero almost everywhere, except at the points of transition, where differentiation is only meaningful in the sense of distributions.

2. A function f_+(x) is said to be causal if f_+(x) = 0 for all x < 0.


Figure 1.2: Causal polynomial B-splines. (a) Construction of the B-spline of degree 0 starting from the causal Green function of D: \beta^0(x) = 1_+(x) - 1_+(x - 1). (b) B-splines of degree n = 0, ..., 4, which become more bell-shaped (and beautiful) as n increases. [Graphics not reproduced.]

In the case of the cardinal spline specified by (1.3), we have that

D f_1(x) = \sum_{k \in \mathbb{Z}} a_1[k] \, \delta(x - k)    (1.5)

where the weights of the integer-shifted Dirac impulses \delta(\cdot - k) are given by the corresponding jump sizes of the function: a_1[k] = f[k] - f[k-1]. The main point is that the application of the operator D = d/dx uncovers the spline discontinuities (a.k.a. knots), which are located on the integer grid: Its effect is that of a mathematical A-to-D conversion, since the r.h.s. of (1.5) corresponds to the continuous-domain representation of a discrete signal commonly used in the theory of linear systems. In the nomenclature of splines, we say that f_1(x) is a cardinal D-spline³, which is a special case of a general non-uniform D-spline where the knots can be located arbitrarily (cf. Fig. 1.1c).

The next fundamental observation is that the expansion coefficients in (1.5) are obtained via a finite-difference scheme, which is the discrete counterpart of differentiation. To get some further insight, we define the finite-difference operator

D_d f(x) = f(x) - f(x - 1).

The latter turns out to be a smoothed version of the derivative,

D_d f(x) = (\beta^0_+ * D f)(x),

where the smoothing kernel is precisely the B-spline generator for the expansion (1.3). An equivalent manifestation of this property can be found in the relation

\beta^0_+(x) = D_d D^{-1} \delta(x) = D_d 1_+(x)    (1.6)

where the unit step 1_+(x) = 1_{[0,+\infty)}(x) (a.k.a. the Heaviside function) is the causal Green function⁴ of the derivative operator. This formula is illustrated in Fig. 1.2a. Its Fourier-domain counterpart is

\hat{\beta}^0_+(\omega) = \int_{\mathbb{R}} \beta^0_+(x) \, e^{-j\omega x} \, dx = \frac{1 - e^{-j\omega}}{j\omega}    (1.7)

which is recognized as being the ratio of the frequency responses of the operators D_d and D, respectively.
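The smoothed-derivative identity D_d f = (\beta^0_+ * D f) can be checked numerically, e.g. with f = sin; a small verification sketch using elementary quadrature:

```python
import numpy as np

def trapezoid(y, x):
    # Elementary trapezoidal quadrature.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def finite_difference(f, x):
    # D_d f(x) = f(x) - f(x - 1).
    return f(x) - f(x - 1.0)

def smoothed_derivative(df, x, n=20001):
    # (beta0+ * Df)(x): integrate the derivative over the unit interval [x-1, x].
    t = np.linspace(0.0, 1.0, n)
    return trapezoid(df(x - t), t)

# With f = sin (so Df = cos), both sides equal sin(x) - sin(x - 1).
errors = [abs(finite_difference(np.sin, x0) - smoothed_derivative(np.cos, x0))
          for x0 in (0.3, 2.0, -1.7)]
```

The agreement holds at arbitrary (non-integer) points, which is the sense in which the B-spline quantifies the approximation made when replacing D by D_d.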

3. Other brands of splines are defined in the same fashion by replacing the derivative D by some other differential operator, generically denoted by L.

4. We say that \rho(x) is the causal Green function of the shift-invariant operator L if \rho is causal and satisfies L\rho = \delta. This can also be written as L^{-1}\delta = \rho, meaning that \rho is the causal impulse response of the shift-invariant inverse operator L^{-1}.


Thus, the basic Lego component, \beta^0_+, is much more than a mere building block: it is also a kernel that characterizes the approximation that is made when replacing a continuous-domain derivative by its discrete counterpart. This idea (and its generalization to other operators) will prove to be one of the key ingredients in our formulation of sparse stochastic processes.

1.3.2 Higher-degree polynomial splines

A slightly more sophisticated model is to select a piecewise-linear reconstruction, which admits a similar B-spline expansion

f_2(x) = \sum_{k \in \mathbb{Z}} f[k+1] \, \beta^1_+(x - k)    (1.8)

where

\beta^1_+(x) = (\beta^0_+ * \beta^0_+)(x) = \begin{cases} x, & \text{for } 0 \le x < 1 \\ 2 - x, & \text{for } 1 \le x < 2 \\ 0, & \text{otherwise} \end{cases}    (1.9)

is the causal B-spline of degree 1, a triangular function centered at x = 1. Note that the use of a causal generator is compensated by the unit shifting of the coefficients in (1.8), which is equivalent to re-centering the basis functions on the sampling locations. The main advantage of f_2 in (1.8) over f_1 in (1.3) is that the underlying function is now continuous, as illustrated in Fig. 1.1b.

In an analogous manner, one can construct higher-degree spline interpolants that are piecewise-polynomials of degree n by considering B-spline atoms of degree n obtained from the (n+1)-fold convolution of \beta^0_+(x) (cf. Fig. 1.2b). The generic version of such a higher-order spline model is

f_{n+1}(x) = \sum_{k \in \mathbb{Z}} c[k] \, \beta^n_+(x - k)    (1.10)

with

\beta^n_+(x) = (\underbrace{\beta^0_+ * \beta^0_+ * \cdots * \beta^0_+}_{n+1 \text{ factors}})(x).

The catch, though, is that, for n > 1, the expansion coefficients c[k] in (1.10) are not identical to the sample values f[k] anymore. Yet, they are in one-to-one correspondence with them and can be determined efficiently by solving a linear system of equations that has a convenient band-diagonal Toeplitz structure [Uns99].
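For the cubic case (n = 3), this banded system is tridiagonal, since the centered cubic B-spline takes the values 1/6, 2/3, 1/6 on its integer support. A minimal sketch (dense solver and zero boundary handling, purely for illustration):

```python
import numpy as np

def cubic_spline_coeffs(samples):
    # Solve T c = f, where T is the tridiagonal Toeplitz interpolation matrix
    # with stencil (1/6, 2/3, 1/6); boundaries are implicitly zero-padded.
    N = len(samples)
    T = (np.diag(np.full(N, 2 / 3))
         + np.diag(np.full(N - 1, 1 / 6), 1)
         + np.diag(np.full(N - 1, 1 / 6), -1))
    return np.linalg.solve(T, samples)

f = np.array([0.0, 1.0, 4.0, 9.0, 16.0])
c = cubic_spline_coeffs(f)              # coefficients differ from the samples...
recon = (2 / 3) * c.copy()              # ...but reapplying the interpolation
recon[1:] += (1 / 6) * c[:-1]           # stencil recovers the samples.
recon[:-1] += (1 / 6) * c[1:]
```

In practice one exploits the banded structure (or a recursive filter) instead of a dense solve; the point here is only the one-to-one correspondence between c[k] and f[k].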

The higher-order counterparts of relations (1.7) and (1.6) are

\hat{\beta}^n_+(\omega) = \left( \frac{1 - e^{-j\omega}}{j\omega} \right)^{n+1}

and

\beta^n_+(x) = D_d^{n+1} D^{-(n+1)} \delta(x)
             = D_d^{n+1} \frac{x_+^n}{n!}    (1.11)
             = \sum_{k=0}^{n+1} (-1)^k \binom{n+1}{k} \frac{(x-k)_+^n}{n!}


with (x)_+ = \max(0, x). The latter explicit time-domain formula follows from the fact that the impulse response of the (n+1)-fold integrator (or, equivalently, the causal Green function of D^{n+1}) is the one-sided power function D^{-(n+1)} \delta(x) = x_+^n / n!. This elegant formula is due to Schoenberg, the father of splines [Sch46]. He also proved that the polynomial B-spline of degree n is the shortest cardinal D^{n+1}-spline and that its integer translates form a Riesz basis of such polynomial splines. In particular, he showed that the B-spline representation (1.10) is unique and stable, in the sense that

\| f_n \|_{L_2}^2 = \int_{\mathbb{R}} | f_n(x) |^2 \, dx \le \| c \|_{\ell_2}^2 = \sum_{k \in \mathbb{Z}} \big| c[k] \big|^2.

Note that the inequality above becomes an equality for n = 0, since the squared L_2-norm of the corresponding piecewise-constant function is easily converted into a sum. This also follows from Parseval's identity, because the B-spline basis \{\beta^0_+(\cdot - k)\}_{k \in \mathbb{Z}} is orthonormal.

One last feature is that polynomial splines of degree n are inherently smooth, in the sense that they are n-times differentiable everywhere with bounded derivatives; that is, they are Hölder continuous of order n. In the cardinal setting, this follows from the property that

D^n \beta^n_+(x) = D^n D_d^{n+1} D^{-(n+1)} \delta(x) = D_d^n D_d D^{-1} \delta(x) = D_d^n \beta^0_+(x),

which indicates that the nth-order derivative of a B-spline of degree n is piecewise-constant and bounded.

1.3.3 Random splines, innovations, and Lévy processes

To make the link with Lévy processes, we now express the random counterpart of (1.5) as

D s(x) = \sum_n a_n \, \delta(x - x_n) = w(x)    (1.12)

where the locations x_n of the Dirac impulses are uniformly distributed over the real line (Poisson distribution with rate parameter \lambda) and the weights a_n are i.i.d. with amplitude distribution p_A(a). For simplicity, we also assume that p_A is symmetric with finite variance \sigma_a^2 = \int_{\mathbb{R}} a^2 p_A(a) \, da. We shall refer to w as the innovation of the signal s since it contains all the parameters that are necessary for its description. Clearly, s is a signal with a finite rate of innovation, a term that was coined by Vetterli et al. [VMB02].

The idea now is to reconstruct s from its innovation w by integrating (1.12). This requires the specification of some boundary condition to fix the integration constant. Since the constraint in the definition of Lévy processes is s(0) = 0 (with probability one), we first need to find a suitable anti-derivative operator, which we shall denote by D_0^{-1}. In the event when the input function is Lebesgue-integrable, the relevant operator is readily specified as

D_0^{-1} \varphi(x) = \int_{-\infty}^{x} \varphi(\tau) \, d\tau - \int_{-\infty}^{0} \varphi(\tau) \, d\tau = \begin{cases} \int_0^x \varphi(\tau) \, d\tau, & \text{for } x \ge 0 \\ -\int_x^0 \varphi(\tau) \, d\tau, & \text{for } x < 0. \end{cases}

It is the corrected version (subtraction of the proper signal-dependent constant) of the conventional shift-invariant integrator D^{-1}, for which the integral runs from -\infty to x. The Fourier counterpart of this definition is

D_0^{-1} \varphi(x) = \int_{\mathbb{R}} \frac{e^{j\omega x} - 1}{j\omega} \, \hat{\varphi}(\omega) \, \frac{d\omega}{2\pi}


which can be extended, by duality, to a much larger class of generalized functions (cf. Chapter 5). This is feasible because the latter expression is a regularized version of an integral that would otherwise be singular: the division by j\omega is tempered by a proper correction in the numerator, e^{j\omega x} - 1 = j\omega x + O(\omega^2). It is important to note that D_0^{-1} is scale-invariant (in the sense that it commutes with scaling) but, unlike D^{-1}, not shift-invariant. Our reason for selecting D_0^{-1} over D^{-1} is actually more fundamental than just imposing the "right" boundary conditions. It is guided by stability considerations: D_0^{-1} is a valid right inverse of D in the sense that D D_0^{-1} = \mathrm{Id} over a large class of generalized functions, while the use of the shift-invariant inverse D^{-1} is much more constrained. Other than that, both operators share most of their global properties. In particular, since the finite-difference operator has the convenient property of annihilating the constants that are in the null space of D, we see that

\beta^0_+(x) = D_d D_0^{-1} \delta(x) = D_d D^{-1} \delta(x).    (1.13)

Having the proper inverse operator at our disposal, we can apply it to formally solve the stochastic differential equation (1.12). This yields the explicit representation of the sparse stochastic process:
$$
s(x)=\mathrm{D}_0^{-1}w(x)=\sum_{n}a_n\,\mathrm{D}_0^{-1}\{\delta(\cdot-x_n)\}(x)
=\sum_{n}a_n\bigl(\mathbb{1}_+(x-x_n)-\mathbb{1}_+(-x_n)\bigr) \tag{1.14}
$$
where the second term $\mathbb{1}_+(-x_n)$ in the last parenthesis ensures that s(0) = 0. Clearly, the signal defined by (1.14) is piecewise-constant (a random spline of degree 0) and its construction is compatible with the classical definition of a compound Poisson process, which is a special type of Lévy process. A representative example is shown in Fig. 1.1c.
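The construction (1.14) is straightforward to simulate. The following is a minimal numpy sketch (not from the book; the rate, the Gaussian amplitude law, the grid size, and the seed are our own illustrative choices): jump locations on [0, 1] are drawn from a Poisson point process and the step functions are summed.

```python
import numpy as np

# Sketch of (1.14): a compound Poisson path on [0, 1].
# lam (Poisson rate), sigma_a (amplitude std), n_grid, and the seed are
# illustrative parameters, not values prescribed by the text.
rng = np.random.default_rng(0)
lam, sigma_a, n_grid = 30.0, 1.0, 1000

n_jumps = rng.poisson(lam)                 # number of Dirac impulses on [0, 1]
x_n = rng.uniform(0.0, 1.0, n_jumps)       # impulse locations
a_n = rng.normal(0.0, sigma_a, n_jumps)    # i.i.d. amplitudes

x = np.linspace(0.0, 1.0, n_grid)
# s(x) = sum_n a_n (1_+(x - x_n) - 1_+(-x_n)); all x_n are positive here,
# so the correction term 1_+(-x_n) vanishes and s(0) = 0 by construction.
s = (a_n[None, :] * (x[:, None] >= x_n[None, :])).sum(axis=1)
```

Because the path is a finite sum of shifted unit steps, it is piecewise-constant with at most `n_jumps` breakpoints, in agreement with the random-spline interpretation.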

It can be shown that the innovation w specified by (1.12), made of random impulses, is a special type of continuous-domain white noise with the property that
$$
\mathbb{E}\{w(x)w(x')\}=R_w(x-x')=\sigma_0^2\,\delta(x-x') \tag{1.15}
$$
where $\sigma_0^2=\lambda\sigma_a^2$ is the product of the Poisson rate parameter $\lambda$ and the variance $\sigma_a^2$ of the amplitude distribution. More generally, we can determine the correlation form of the innovation, which is given by
$$
\mathbb{E}\{\langle\varphi_1,w\rangle\langle\varphi_2,w\rangle\}=\sigma_0^2\,\langle\varphi_1,\varphi_2\rangle \tag{1.16}
$$
for any real-valued functions $\varphi_1,\varphi_2\in L_2(\mathbb{R})$ with $\langle\varphi_1,\varphi_2\rangle=\int_{\mathbb{R}}\varphi_1(x)\varphi_2(x)\,\mathrm{d}x$.

This suggests that we can apply the same operator-based synthesis to other types of continuous-domain white noise, as illustrated in Fig. 1.3. In doing so, we are able to generate the whole family of Lévy processes. In the case where w is a white Gaussian noise, the resulting signal is a Brownian motion, whose sample paths are continuous (almost surely). A more extreme case arises when w is an alpha-stable noise, which yields a stable Lévy process whose sample path has a few really large jumps and is rougher than a Brownian motion.
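A crude discrete analogue of this synthesis replaces $\mathrm{D}_0^{-1}$ by a cumulative sum of i.i.d. increments. The sketch below (our own illustration; grid size, scalings, and seed are arbitrary choices) produces a Brownian motion from Gaussian increments and a Lévy flight from Cauchy (SαS with α = 1) increments, whose heavy tails create the occasional very large jumps.

```python
import numpy as np

# Discrete counterpart of integrating a white noise: cumulative sums of
# i.i.d. increments. All sizes/scalings below are illustrative.
rng = np.random.default_rng(1)
n = 4096
dt = 1.0 / n

# Gaussian increments of variance dt -> Brownian motion on [0, 1]
brownian = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))
# Cauchy increments -> symmetric-alpha-stable (alpha = 1) Levy flight
levy_flight = np.concatenate(([0.0], np.cumsum(rng.standard_cauchy(n) * dt)))
```

Plotting the two paths side by side reproduces the qualitative contrast of Fig. 1.3: the Cauchy-driven path is dominated by a few large jumps.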

In the classical literature on stochastic processes, Lévy processes are usually defined in terms of their increments, which are i.i.d. and infinitely-divisible random variables (cf. Chapter 7). Here, we shall consider the so-called increment process $u(x)=s(x)-s(x-1)$, which has a number of remarkable properties. The key observation is that u, in its continuous-domain version, is the convolution of a white noise (innovation) with the B-spline kernel $\beta_+^0$. Indeed, the relation (1.13) leads to
$$
u(x)=\mathrm{D_d}s(x)=\mathrm{D_d}\mathrm{D}_0^{-1}w(x)=(\beta_+^0*w)(x). \tag{1.17}
$$


[Figure: three sample paths s(t) obtained by feeding an integrator with a white noise (innovation) w(t): impulsive noise yields a compound Poisson process, Gaussian noise yields a Brownian motion, and SαS (Cauchy) noise yields a Lévy flight.]

Figure 1.3: Synthesis of different brands of Lévy processes by integration of a corresponding continuous-domain white noise. The alpha-stable excitation in the bottom example is such that the increments of the Lévy process have a symmetric Cauchy distribution.

This implies, among other things, that u is stationary, while the original Lévy process s is not (since $\mathrm{D}_0^{-1}$ is not shift-invariant). It also suggests that the samples of the increment process u are independent if they are taken at a distance of 1 or more apart, the limit corresponding to the support of the rectangular convolution kernel $\beta_+^0$. When the autocorrelation function $R_w(y)$ of the driving noise is well-defined and given by (1.15), we can easily determine the autocorrelation of u as
$$
R_u(y)=\mathbb{E}\{u(x)u(x+y)\}=\bigl(\beta_+^0*(\beta_+^0)^{\vee}*R_w\bigr)(y)=\sigma_0^2\,\beta_+^1(y+1) \tag{1.18}
$$
where $(\beta_+^0)^{\vee}(x)=\beta_+^0(-x)$. It is proportional to the autocorrelation of a rectangle, which is a triangular function (a centered B-spline of degree 1).

Of special interest to us are the samples of u on the integer grid, which are characterized for $k\in\mathbb{Z}$ as
$$
u[k]=s(k)-s(k-1)=\langle w,\beta_+^0(\cdot-k)\rangle.
$$
The r.h.s. relation can be used to show that the u[k] are i.i.d. because w is white, stationary, and the supports of the analysis functions $\beta_+^0(\cdot-k)$ are non-overlapping. We shall refer to $\{u[k]\}_{k\in\mathbb{Z}}$ as the discrete innovation of s. Its determination involves the sampling of s at the integers and a discrete differentiation (finite differences), in direct analogy with the generation of the continuous-domain innovation $w(x)=\mathrm{D}s(x)$.
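The decoupling of the discrete innovation is easy to probe empirically. The sketch below (our own illustration; sample size and seed are arbitrary) samples a simulated Brownian motion on the integer grid, forms u[k] by finite differences, and checks that the empirical lag-1 autocorrelation of u is close to zero, as expected for an i.i.d. sequence.

```python
import numpy as np

# Sketch: discrete innovation u[k] = s(k) - s(k-1) of a (simulated)
# Brownian motion. n and the seed are illustrative choices.
rng = np.random.default_rng(2)
n = 20000
s = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, 1.0, n))))  # s(k), k = 0..n

u = np.diff(s)                 # finite difference D_d applied to the samples
u_c = u - u.mean()
# empirical lag-1 autocorrelation; should be ~0 for an i.i.d. sequence
rho1 = float(np.dot(u_c[:-1], u_c[1:]) / np.dot(u_c, u_c))
```

By contrast, the samples s(k) themselves are strongly correlated (a random walk), which is why the finite-difference step is the natural preprocessing for statistical estimation.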

The discrete innovation sequence u[·] will play a fundamental role in signal processing because it constitutes a convenient tool for extracting the statistics and characterizing the samples of a stochastic process. It is probably the best practical way of presenting the information because

1. we never have access to the full signal s(x), which is a continuously-defined entity, and

2. we cannot implement the whitening operator (derivative) exactly, not to mention that the continuous-domain innovation w(x) does not admit an interpretation as an ordinary function of x. For instance, Brownian motion is not differentiable anywhere in the classical sense.

This points to the fact that the continuous-domain innovation model is a theoretical construct. Its primary purpose is to facilitate the determination of the joint probability distributions of any series of linear measurements of a wide class of sparse stochastic processes, including the discrete version of the innovation, which has the property of being maximally decoupled.

[Figure: two multiresolution diagrams displaying the basis functions $\psi_{i,k}$ (a) and $\phi_{i,k}$ (b) at scales i = 0, 1, 2.]

Figure 1.4: Dual pair of multiresolution bases where the first kind of functions (wavelets) are the derivatives of the second (hierarchical basis functions): (a) (unnormalized) Haar wavelet basis. (b) Faber-Schauder basis (a.k.a. Franklin system).

1.3.4 Wavelet analysis of Lévy processes and M-term approximations

Our purpose so far has been to link splines and Lévy processes to the derivative operator D. We shall now exploit this connection in the context of wavelet analysis. To that end, we consider the Haar basis $\{\psi_{i,k}\}_{i\in\mathbb{Z},k\in\mathbb{Z}}$, which is generated by the Haar wavelet
$$
\psi_{\mathrm{Haar}}(x)=\begin{cases}
1, & \text{for } 0\le x<\tfrac12\\
-1, & \text{for } \tfrac12\le x<1\\
0, & \text{otherwise.}
\end{cases} \tag{1.19}
$$
The basis functions, which are orthonormal, are given by
$$
\psi_{i,k}(x)=2^{-i/2}\,\psi_{\mathrm{Haar}}\!\left(\frac{x-2^i k}{2^i}\right) \tag{1.20}
$$
where i and k are the scale (dilation of $\psi_{\mathrm{Haar}}$ by $2^i$) and location (translation of $\psi_{i,0}$ by $2^i k$) indices, respectively. A closely related system is the Faber-Schauder basis $\{\phi_{i,k}(\cdot)\}_{i\in\mathbb{Z},k\in\mathbb{Z}}$, which is made up of B-splines of degree 1 in a wavelet-like configuration (cf. Fig. 1.4). Specifically, the hierarchical triangle basis functions are given by
$$
\phi_{i,k}(x)=\beta_+^1\!\left(\frac{x-2^i k}{2^{i-1}}\right). \tag{1.21}
$$

While these functions are orthogonal within any given scale (because they are non-overlapping), they fail to be so across scales. Yet, they form a Schauder basis, which is a somewhat weaker property than being a Riesz basis of $L_2(\mathbb{R})$.

The fundamental observation for our purpose is that the Haar system can be obtained by differentiating the Faber-Schauder one, up to some amplitude factor. Specifically, we have the relations
$$
\psi_{i,k}=2^{i/2-1}\,\mathrm{D}\phi_{i,k} \tag{1.22}
$$
$$
\mathrm{D}_0^{-1}\psi_{i,k}=2^{i/2-1}\,\phi_{i,k}. \tag{1.23}
$$

Let us now apply (1.22) to the formal determination of the wavelet coefficients of the Lévy process $s=\mathrm{D}_0^{-1}w$. The crucial manipulation, which will be justified rigorously within the framework of generalized stochastic processes (cf. Chapter 3), is $\langle s,\mathrm{D}\phi_{i,k}\rangle=\langle \mathrm{D}^{*}s,\phi_{i,k}\rangle=-\langle w,\phi_{i,k}\rangle$, where we have used the adjoint relation $\mathrm{D}^{*}=-\mathrm{D}$ and the right-inverse property of $\mathrm{D}_0^{-1}$. This allows us to express the wavelet coefficients as
$$
W_{i,k}=\langle s,\psi_{i,k}\rangle=-2^{i/2-1}\,\langle w,\phi_{i,k}\rangle
$$
which, up to some scaling factors, amounts to a Faber-Schauder analysis of the innovation w = Ds. Since the triangle functions $\phi_{i,k}$ are non-overlapping within a given scale and the innovation is independent at every point, we immediately deduce that the corresponding wavelet coefficients are also independent. However, the decoupling is not perfect across scales due to the parent-to-child overlap of the triangle functions. The residual correlation can be determined from the correlation form (1.16) of the noise, according to
$$
\mathbb{E}\{W_{i,k}W_{i',k'}\}=2^{(i+i')/2-2}\,\mathbb{E}\bigl\{\langle w,\phi_{i,k}\rangle\langle w,\phi_{i',k'}\rangle\bigr\}\propto\langle\phi_{i,k},\phi_{i',k'}\rangle.
$$

Since the triangle functions are non-negative, the residual correlation is zero iff. $\phi_{i,k}$ and $\phi_{i',k'}$ are non-overlapping, in which case the wavelet coefficients are independent as well. We can also predict that the wavelet transform of a compound Poisson process will be sparse (i.e., with many vanishing coefficients) because the random Dirac impulses of the innovation will intersect only few Faber-Schauder functions, an effect that becomes more and more pronounced as the scale gets finer. The level of sparsity can therefore be expected to be directly dependent upon $\lambda$ (the density of impulses per unit length).

To quantify this behavior, we applied Haar wavelets to the compression of sampled realizations of Lévy processes and compared the results with those of the "optimal" textbook solution for transform coding. In the case of a Lévy process with finite variance, the Karhunen-Loève transform (KLT) can be determined analytically from the knowledge of the covariance function $\mathbb{E}\{s(x)s(y)\}=C\bigl(|x|+|y|-|x-y|\bigr)$, where C is an appropriate constant. The KLT is also known to converge to the discrete cosine transform (DCT) as the size of the signal increases. The present compression task is to reconstruct a series of 4096-point signals from their M largest transform coefficients, which is the minimum-error selection rule dictated by Parseval's relation. Fig. 1.5 displays the graph of the relative quadratic M-term approximation errors for the three types of Lévy processes shown in Fig. 1.3. We also considered the identity transform as baseline, and the DCT as well, whose results were found to be indistinguishable from those of the KLT. We observe that the KLT performs best in the Gaussian scenario, as expected. It is also slightly better than wavelets at large compression ratios for the compound Poisson process (piecewise-constant signal with Gaussian-distributed jumps). In the latter case, however, the situation changes dramatically as M increases since one is able to reconstruct the signal perfectly from a fraction of the wavelet coefficients, by reason of the sparse behavior explained above. The advantage of wavelets over the KLT/DCT is striking for the Lévy flight (SαS distribution with α = 1). While these findings are surprising at first, they do not contradict the classical theory, which tells us that the KLT has the minimum basis-restriction error for the given class of processes. The twist here is that the selection of the M largest transform coefficients amounts to some adaptive reordering of the basis functions, which is not accounted for in the derivation of the KLT. The other point is that the KLT solution is not defined for the third type of SαS process, whose theoretical covariances are unbounded; this does not prevent us from applying the Gaussian solution/DCT to a finite-length realization whose $\ell_2$-norm is finite (almost surely). This simple experiment with various stochastic models corroborates the results obtained with image compression, where the superiority of wavelets over the DCT (e.g., JPEG2000 vs. JPEG) is well-established.

[Figure: three panels of relative $\ell_2^2$ error vs. M for (a) Brownian motion, (b) compound Poisson with P(a = 0) = 0.9, and (c) Cauchy-Lévy; signal length $N_{\mathrm{signal}}=2^{12}$, averaged over $N_{\mathrm{average}}=1000$ realizations.]

Figure 1.5: Haar wavelets vs. KLT: M-term approximation errors for different brands of Lévy processes. (a) Gaussian (Brownian motion). (b) Compound Poisson with Gaussian jump distribution and $\mathrm{e}^{-\lambda}=0.9$. (c) Alpha-stable (symmetric Cauchy). The results are averages over 1000 realizations.
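A toy version of this M-term experiment can be run in a few lines. The sketch below (our own simplified illustration, not the book's experimental code; the jump probability, signal length, M, and seed are arbitrary) compares keeping the M largest orthonormal-Haar coefficients of a sampled compound-Poisson-like path against keeping its M largest samples, with the relative error computed via Parseval's relation.

```python
import numpy as np

# Toy M-term approximation: Haar coefficients vs. identity (raw samples).
# All parameters below are illustrative stand-ins for the book's setup.
def haar_fwd(v):
    """Full orthonormal Haar decomposition of a length-2^J vector."""
    v = v.astype(float).copy()
    out, n = [], v.size
    while n > 1:
        a = (v[0:n:2] + v[1:n:2]) / np.sqrt(2.0)   # running averages
        d = (v[0:n:2] - v[1:n:2]) / np.sqrt(2.0)   # detail (wavelet) coeffs
        out.append(d)
        v[: n // 2] = a
        n //= 2
    out.append(v[:1])                              # coarsest scaling coeff
    return np.concatenate(out[::-1])

rng = np.random.default_rng(3)
n, M = 4096, 100
jumps = rng.normal(size=n) * (rng.random(n) < 0.01)  # sparse jump train
s = np.cumsum(jumps)                                  # piecewise-constant path

c = haar_fwd(s)
# Parseval: the M-term error equals the energy of the discarded coefficients
rel_err_haar = np.sum(np.sort(np.abs(c))[: n - M] ** 2) / np.sum(s ** 2)
rel_err_id = np.sum(np.sort(np.abs(s))[: n - M] ** 2) / np.sum(s ** 2)
```

For such a piecewise-constant path, only a handful of Haar coefficients per scale are hit by each jump, so `rel_err_haar` is far below `rel_err_id`, which mirrors the sparsity argument in the text.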

1.3.5 Lévy’s wavelet-based synthesis of Brownian motion

We close this introductory chapter by making the connection with a multiresolution scheme that Paul Lévy developed in the 1930s to characterize the properties of Brownian motion. To do so, we adopt a point of view that is the dual of the one in Section 1.3.4: it essentially amounts to interchanging the analysis and synthesis functions. As a first step, we expand the innovation w in the orthonormal Haar basis and obtain
$$
w=\sum_{i\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}Z_{i,k}\,\psi_{i,k}\quad\text{with }Z_{i,k}=\langle w,\psi_{i,k}\rangle.
$$

This is acceptable⁵ under the finite-variance hypothesis on w. Since the Haar basis is orthogonal, the coefficients $Z_{i,k}$ in the above expansion are fully decorrelated, but not necessarily independent, unless the white noise is Gaussian or the corresponding basis functions do not overlap. We then construct the Lévy process $s=\mathrm{D}_0^{-1}w$ by integrating the wavelet expansion of the innovation, which yields
$$
s(x)=\sum_{i\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}Z_{i,k}\,\mathrm{D}_0^{-1}\psi_{i,k}(x)
=\sum_{i\in\mathbb{Z}}\sum_{k\in\mathbb{Z}}2^{i/2-1}Z_{i,k}\,\phi_{i,k}(x). \tag{1.24}
$$

5. The convergence in the sense of distributions is ensured since the wavelet coefficients of a rapidly-decaying test function $\varphi$ are rapidly decaying as well.

The representation (1.24) is of special interest when the noise is Gaussian, in which case the coefficients $Z_{i,k}$ are i.i.d. and follow a standardized Gaussian distribution. The formula then maps into Lévy's recursive mid-point method of synthesizing Brownian motion, which Yves Meyer singles out as the first use of wavelets to be found in the literature. The Faber-Schauder expansion (1.24) stands out as a localized, practical alternative to Wiener's original construction of Brownian motion, which involves a sum of harmonic cosines (a KLT-type expansion).
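Lévy's mid-point scheme can be sketched directly (our own illustration of the recursion, not code from the book; depth and seed are arbitrary): starting from the two endpoints, each refinement inserts mid-points whose conditional mean is the average of the neighbors and whose refinement variance is a quarter of the current spacing, which is exactly what summing one more Faber-Schauder scale in (1.24) accomplishes in the Gaussian case.

```python
import numpy as np

# Sketch of Levy's recursive mid-point synthesis of Brownian motion on [0, 1].
# depth and the seed are illustrative choices.
rng = np.random.default_rng(4)

def levy_midpoint(depth):
    t = np.array([0.0, 1.0])
    b = np.array([0.0, rng.normal()])          # B(0) = 0, B(1) ~ N(0, 1)
    for _ in range(depth):
        dt = t[1] - t[0]                       # current (uniform) spacing
        t_mid = (t[:-1] + t[1:]) / 2.0
        # conditional law of the mid-point given its two neighbors:
        # mean = average of neighbors, variance = dt / 4
        b_mid = (b[:-1] + b[1:]) / 2.0 + rng.normal(0.0, np.sqrt(dt / 4.0), t_mid.size)
        t = np.sort(np.concatenate([t, t_mid]))
        b_new = np.empty(t.size)
        b_new[0::2] = b                        # old points at even indices
        b_new[1::2] = b_mid                    # new mid-points interleaved
        b = b_new
    return t, b

t, b = levy_midpoint(10)
```

After `depth` refinements the path has $2^{\text{depth}}+1$ points and its increments over the finest spacing are i.i.d. Gaussian with variance equal to that spacing, as for a true Brownian motion.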

1.4 Historical notes


Chapter 2

Road map to the monograph

The writing of this monograph was motivated by our desire to formalize and extend the ideas presented in Section 1.3 to a class of differential operators much broader than the derivative D. Concretely, this translates into the investigation of the family of stochastic processes specified by the general innovation model that is summarized in Fig. 2.1. The corresponding generator of random signals (upper part of the diagram) has two fundamental components: (1) a continuous-domain noise excitation w, which may be thought of as being composed of a continuum of i.i.d. random atoms (innovations), and (2) a deterministic mixing procedure (formally described by $\mathrm{L}^{-1}$) which couples the random contributions and imposes the correlation structure of the output. The concise description of the model is Ls = w, where L is the whitening operator. The term "innovation" refers to the fact that w represents the unpredictable part of the process. When the inverse operator $\mathrm{L}^{-1}$ is linear shift-invariant (LSI), the signal generator reduces to a simple convolutional system which is characterized by its impulse response (or, equivalently, its frequency response). Innovation modeling has a long tradition in statistical communication theory and signal processing; it is the basis for the interpretation of a Gaussian stationary process as a filtered version of a white Gaussian noise [Kai70, Pap91].

In the present context, the underlying objects are continuously defined. The innovation model then results from defining a stochastic process (or random field when the index variable r is a vector in $\mathbb{R}^d$) as the solution of a stochastic differential equation (SDE) driven by a particular brand of noise. The nonstandard aspect here is that we are considering the innovation model in its greatest generality, allowing for non-Gaussian inputs and differential systems that are not necessarily stable. We shall argue that these extensions are essential for making this type of modeling compatible with the latest developments in signal processing pertaining to the use of wavelets and sparsity-promoting reconstruction algorithms. Specifically, we shall see that it is possible to generate a wide variety of sparse processes by replacing the traditional Gaussian input by some more general brand of (Lévy) noise, within the limits of mathematical admissibility [UTAKed, UTSed]. We shall also demonstrate that such processes admit a sparse representation in a wavelet basis under the assumption that L is scale-invariant. The difficulty there is that scale-invariant SDEs are inherently unstable (due to the presence of poles at the origin); yet, we shall see that they can still result in a proper specification of fractal-type processes, albeit not within the usual framework of stationary processes [TVDVU09, UT11]. The nontrivial aspect of these generalizations is that they necessitate the resolution of instabilities, in the form of singular integrals. This is required not only at the system level, to allow for non-stationary processes, but also at the stochastic level because the most interesting sparsity patterns are associated with unbounded Lévy measures.


[Figure: block diagram; a white Lévy noise w(r) passes through the shaping filter $\mathrm{L}^{-1}$ (with appropriate boundary conditions) to produce the generalized stochastic process s(r); the whitening operator L maps s back to w.]

Figure 2.1: Innovation model of a generalized stochastic process. The process is generated by application of the (linear) inverse operator $\mathrm{L}^{-1}$ to a continuous-domain white-noise process w. The generation mechanism is general in the sense that it applies to the complete family of Lévy noises, including Gaussian noise as the most basic (non-sparse) excitation. The output process s is stationary iff. $\mathrm{L}^{-1}$ is shift-invariant.

Before proceeding with the statistical characterization of sparse stochastic processes, we shall highlight the central role of the operator L and make a connection with spline theory and the construction of signal-adapted wavelet bases.

2.1 On the implications of the innovation model

To motivate our approach, we start with an informal discussion, leaving the technicalities aside. The stochastic process s in Fig. 2.1 is constructed by applying the (integral) operator $\mathrm{L}^{-1}$ to some continuous-domain white noise w. In most cases of interest, $\mathrm{L}^{-1}$ has an infinitely-supported impulse response which introduces long-range dependencies. If we are aiming at a concise statistical characterization of s, it is essential that we somehow invert this integration process, the ideal being to apply the operator L, which would give back the innovation signal w that is fully decoupled. Unfortunately, this is not feasible in practice because we do not have access to the signal s(r) over the entire domain $r\in\mathbb{R}^d$, but only to its sampled values on a lattice or, more generally, to a series of coefficients in some appropriate basis. Our analysis options are essentially twofold, as described in Sections 2.1.1 and 2.1.2.

2.1.1 Linear combination of sampled values

Given the sampled values s(k), $k\in\mathbb{Z}^d$, the best we can aim at is to implement a discrete version of the operator L, which is denoted by $\mathrm{L_d}$. In effect, $\mathrm{L_d}$ will act on the sampled version of the signal as a digital filter. The corresponding continuous-domain description of its impulse response is
$$
\mathrm{L_d}\delta(r)=\sum_{k\in\mathbb{Z}^d}d[k]\,\delta(r-k)
$$
with some appropriate weights d. To fix ideas, $\mathrm{L_d}$ may correspond to the numerical version of the operator provided by the finite-difference method of approximating derivatives.

The interest is now to characterize the (approximate) decoupling effect of this discrete version of the whitening operator. This is quite feasible when the continuous-domain composition of the operators $\mathrm{L_d}$ and $\mathrm{L}^{-1}$ is shift-invariant with impulse response $\beta_{\mathrm{L}}(r)$, which is assumed to be absolutely integrable (BIBO stability). In that case, one readily finds that
$$
u(r)=\mathrm{L_d}s(r)=(\beta_{\mathrm{L}}*w)(r) \tag{2.1}
$$
where
$$
\beta_{\mathrm{L}}(r)=\mathrm{L_d}\mathrm{L}^{-1}\delta(r). \tag{2.2}
$$

This suggests that the decoupling effect will be the strongest when the convolution kernel $\beta_{\mathrm{L}}$ is the most localized and closest to an impulse¹. We call $\beta_{\mathrm{L}}$ the generalized B-spline associated with the operator L. For a given operator L, the challenge will be to design the most localized kernel $\beta_{\mathrm{L}}$, which is the way of approaching the discretization problem that best matches our statistical objectives. The good news is that this is a standard problem in spline theory, meaning that we can take advantage of the large body of techniques available in this area, even though they have hardly been applied to the stochastic setting so far.
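The simplest instance of (2.1)-(2.2) is L = D in one dimension, for which $\mathrm{L_d}$ is a scaled finite difference and $\beta_{\mathrm{L}}$ is the causal rectangle of width one grid cell. The sketch below (our own illustration; grid size, scaling, and seed are arbitrary) shows that applying the finite-difference filter to an integrated noise recovers the cell-wise noise exactly, which is the discrete face of the decoupling property.

```python
import numpy as np

# Sketch: L = D on a grid of step h. L_d is the finite difference
# d = (1, -1)/h, and beta_L = L_d L^{-1} delta is a rect of width h,
# so L_d applied to the integrated noise returns the per-cell noise.
# n, h, and the seed are illustrative.
rng = np.random.default_rng(5)
n, h = 2048, 1.0 / 2048
w = rng.normal(0.0, 1.0, n)                     # noise values, one per cell
s = np.concatenate(([0.0], np.cumsum(w) * h))   # s(k h): integral of the noise

u = np.diff(s) / h                              # L_d s on the grid
max_dev = float(np.max(np.abs(u - w)))          # should vanish (up to rounding)
```

For higher-order operators, $\beta_{\mathrm{L}}$ widens and u is only approximately decoupled, which is exactly the localization issue raised in the footnote.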

2.1.2 Wavelet analysis

The second option is to analyze the signal s using wavelet-like functions $\{\psi_i(\cdot-r_k)\}$. For that purpose, we assume that we have at our disposal some real-valued "L-compatible" generalized wavelets which, at a given resolution level i, are such that
$$
\psi_i(r)=\mathrm{L}^{*}\phi_i(r). \tag{2.3}
$$
Here, $\mathrm{L}^{*}$ is the adjoint operator of L and $\phi_i$ is some smoothing kernel with good localization properties. The interpretation is that the wavelet transform provides some kind of multiresolution version of the operator L, with the effective width of the kernels $\phi_i$ increasing in direct proportion to the scale; typically, $\phi_i(r)\propto\phi_0(r/2^i)$. Then, the wavelet analysis of the stochastic process s reduces to
$$
\langle s,\psi_i(\cdot-r_0)\rangle=\langle s,\mathrm{L}^{*}\phi_i(\cdot-r_0)\rangle=\langle \mathrm{L}s,\phi_i(\cdot-r_0)\rangle=\langle w,\phi_i(\cdot-r_0)\rangle=(\phi_i^{\vee}*w)(r_0) \tag{2.4}
$$
where $\phi_i^{\vee}(r)=\phi_i(-r)$ is the reversed version of $\phi_i$. The remarkable aspect is that the effect is essentially the same as in (2.1), so that it makes good sense to develop a common framework to analyze white noise.

This is all nice in principle as long as one can construct "L-compatible" wavelet bases. For instance, if L is a pure nth-order derivative operator (or, by extension, a scale-invariant differential operator), then the above reasoning is directly applicable to conventional wavelet bases. Indeed, these are known to behave like multiscale versions of derivatives due to their vanishing-moment property [Mey90, Dau92, Mal09]. In prior work, we have linked this behavior, as well as a number of other fundamental wavelet properties, to the polynomial B-spline convolutional factor that is necessarily present in every wavelet that generates a multiresolution basis of $L_2(\mathbb{R})$ [UB03]. What is not so widely known is that the spline connection extends to a much broader variety of operators, not necessarily scale-invariant, and that it also provides a general recipe for constructing wavelet-like basis functions that are matched to some given operator L. This has been demonstrated in 1D for the entire family of ordinary differential operators [KU06]. The only significant difference with the conventional theory of wavelets is that the smoothing kernels $\phi_i$ are not necessarily rescaled versions of each other.

1. One may be tempted to pretend that $\beta_{\mathrm{L}}$ is a Dirac impulse, which amounts to neglecting all discretization effects. Unfortunately, this is incorrect and most likely to result in false statistical conclusions. In fact, we shall see that the localization deteriorates as the order of the operator increases, inducing higher (Markov) orders of dependencies.


Note that the "L-compatible" property is relatively robust. For instance, if L factors as $\mathrm{L}=\mathrm{L}'\mathrm{L}_0$, then an "L-compatible" wavelet is also $\mathrm{L}_0$-compatible with $\phi_i'=(\mathrm{L}')^{*}\phi_i$. The design challenge in the context of stochastic modeling is thus to come up with a suitable wavelet basis such that $\phi_i$ in (2.3) is most localized (possibly, of compact support).

2.2 Organization of the monograph

The reasoning of Section 2.1 is appealing because of its conceptual simplicity and generality. Yet, the precise formulation of the theory requires some special care because the underlying stochastic objects are infinite-dimensional and possibly highly singular. For instance, we are faced with a major difficulty at the onset because the continuous-domain input of our model (the innovation w) does not admit a conventional interpretation as a function of the domain variable r. This entity can only be probed indirectly by forming scalar products with test functions in accordance with Laurent Schwartz's theory of distributions, so that the use of advanced mathematics is unavoidable.

For the benefit of readers who may not be familiar with some of the concepts used in this monograph, we provide the relevant mathematical background in Chapter 3, which also serves the purpose of introducing the notation. The first part is devoted to the definition of the relevant function spaces, with special emphasis on generalized functions (a.k.a. tempered distributions), which play a central role in our formulation. The second part reviews the classical, finite-dimensional tools of probability theory and shows how some of the concepts (e.g., characteristic function, Bochner's theorem) are extendable to the infinite-dimensional setting within the framework of Gelfand's theory of generalized stochastic processes [GV64].

Chapter 4 is devoted to the mathematical specification of the innovation model. Since the theory gravitates around the notion of Lévy exponents, we start with a systematic investigation of such functions, denoted by $f(\omega)$, which are fundamental to the (classical) study of infinitely-divisible probability laws. In particular, we discuss their canonical representation given by the Lévy-Khinchine formula. In Section 4.4, we make use of the powerful Minlos-Bochner theorem to transfer those representations to the infinite-dimensional setting. The fundamental result for our theory is that the class of admissible continuous-domain innovations for the model in Fig. 2.1 is constrained to the so-called family of white Lévy noises, each brand being uniquely characterized by a Lévy exponent $f(\omega)$. We conclude the chapter with the presentation of mathematical criteria for the existence of solutions of Lévy-driven SDEs (stochastic differential equations) and provide the functional tools for the complete statistical characterization of these processes. Interestingly, the classical Gaussian processes are covered by the formulation (by setting $f(\omega)=-\tfrac{1}{2}|\omega|^2$), but they turn out to be the only non-sparse members of the family.

Besides the random excitation w, the second fundamental component of the innovation model in Fig. 2.1 is the inverse $\mathrm{L}^{-1}$ of the whitening operator L. It must fulfill some continuity/boundedness condition in order to yield a proper solution of the underlying SDE. The construction of such inverses (shaping filters) is the topic of Chapter 5, which presents a systematic catalog of the solutions that are currently available, including recent constructs for scale-invariant/unstable SDEs.

In Chapter 6, we review the tools that are available from the theory of splines in relation to the specification of the analysis kernels in Equations (2.1) and (2.3) above. Remarkably, the techniques are quite generic and applicable for any operator L that admits a proper inverse $\mathrm{L}^{-1}$. This is not too surprising because we have taken advantage of our expert knowledge of splines to engineer/derive the solutions that are presented in Chapter 5. Indeed, by writing a generalized B-spline as $\beta_{\mathrm{L}}=\mathrm{L_d}\mathrm{L}^{-1}\delta$, one can appreciate that the construction of a B-spline for some operator L implicitly provides the solution of two innovation-related problems at once: 1) the formal inversion of the operator L (for solving the SDE), and 2) the proper discretization of L through a finite-difference scheme. The leading thread in our formulation is that these two tasks should not be dissociated; this is achieved formally via the identification of $\beta_{\mathrm{L}}$, which actually results in simplified and streamlined mathematics.

In Chapter 7, we apply our framework to the functional specification of a variety of generalized stochastic processes, including the classical family of Gaussian stationary processes and their sparse counterparts. We also characterize non-stationary processes that are solutions of unstable SDEs. In particular, we describe higher-order extensions of Lévy processes, as well as a whole variety of fractal-type processes.

In Chapter 8, we rely on our functional characterization to obtain a maximally-decoupled representation of sparse stochastic processes by application of the discretized version of the whitening operator or by a suitable wavelet expansion.

While the transformed-domain statistics can be worked out explicitly, our main point in Chapter 9 is to show that the sparsity pattern of the input noise is essentially preserved. Apart from a shaping effect that can be quantified, the resulting PDF remains within the same family of infinitely-divisible laws.

Chapter 10 is devoted to the application of the theory to the general problem of recovering signals from incomplete, noisy measurements, which is highly relevant to signal processing and biomedical imaging. To that end, we develop a general framework for the discretization of linear inverse problems using a suitable set of basis functions (e.g., B-splines or wavelets), which is analogous to the finite-element method for solving PDEs. The central element is the "projection" of the continuous-domain stochastic model of the signal onto the (finite-dimensional) reconstruction space in order to specify the prior statistical distribution of the signal. We then apply Bayes' rule to derive the corresponding signal estimators (MAP or MMSE). We present examples of imaging applications including wavelet-domain signal denoising, deconvolution, and the reconstruction of MRI data.


Chapter 3

Mathematical context and background

In this chapter, we summarize some of the mathematical preliminaries for the remaining chapters. These concern the function spaces used in the book, duality, generalized functions, probability theory, and generalized random processes. Each of these topics is discussed in a separate section.

For the most part, the theory of function spaces and generalized functions can be seen as an infinite-dimensional generalization of linear algebra (function spaces generalize ℝⁿ, and continuous linear operators generalize matrices). Similarly, the theory of generalized random processes involves the generalization of the idea of a finite random vector in ℝⁿ to an element of an infinite-dimensional space of generalized functions.

To give a taste of what is to come, we briefly compare the finite- and infinite-dimensional theories in Tables 3.1 and 3.2. The idea, in a nutshell, is to substitute vectors by (generalized) functions. Formally, this extension amounts to replacing some finite sums (in the finite-dimensional formulation) by integrals. Yet, in order for this to be mathematically sound, one needs to properly define the underlying objects as elements of some infinite-dimensional vector space and to specify the underlying notion(s) of convergence (which is not an issue in ℝⁿ), while ensuring that some basic continuity conditions are met.

The impatient reader who is not directly concerned with those mathematical issues may skip what follows the tables at first reading and consult these sections later as the need arises. Yet, such a reader should be warned that the material on infinite-dimensional probability theory from Subsection 3.4.4 to the end of the chapter is fundamental to our formulation. The mastery of those notions also requires a good understanding of function spaces and generalized functions, which are covered in the first part of the chapter.

3.1 Some classes of function spaces

By the term function we shall mean elements of various function spaces. At a minimum, a function space is a set X along with some criteria for determining, first, whether or not a given “function” φ = φ(r) belongs to X (in mathematical notation, φ ∈ X) and, second, given φ, φ′ ∈ X, whether or not φ and φ′ describe the same object in X (in mathematical notation, φ = φ′). Most often, in addition to these, the space X has some additional structure (see below).

In this book, we shall largely deal with two types of function spaces: complete normed spaces, such as the Lebesgue L_p spaces, and nuclear spaces, such as the Schwartz space S and the space D of compactly supported test functions, as well as their duals S′ and D′, which are spaces of generalized functions. These two categories of spaces (complete-normed


finite-dimensional theory (linear algebra) | infinite-dimensional theory (functional analysis)

Euclidean space ℝ^N, complexification ℂ^N | function spaces such as the Lebesgue space L_p(ℝ^d) and the space of tempered distributions S′(ℝ^d), among others

vector x = (x_1, ..., x_N) in ℝ^N or ℂ^N | function f(r) in S′(ℝ^d), L_p(ℝ^d), etc.

bilinear scalar product ⟨x, y⟩ = Σ_{n=1}^N x_n y_n | ⟨φ, g⟩ = ∫ φ(r) g(r) dr with φ ∈ S(ℝ^d) (test function) and g ∈ S′(ℝ^d) (generalized function), or φ ∈ L_p(ℝ^d) and g ∈ L_q(ℝ^d) with 1/p + 1/q = 1, for instance

equality: x = y ⟺ x_n = y_n for all n ⟺ ⟨u, x⟩ = ⟨u, y⟩ for all u ∈ ℝ^N ⟺ ‖x − y‖_2 = 0 | various notions of equality (depending on the space), such as weak equality of distributions, f = g in S′(ℝ^d) ⟺ ⟨φ, f⟩ = ⟨φ, g⟩ for all φ ∈ S(ℝ^d), and almost-everywhere equality, f = g in L_p(ℝ^d) ⟺ ∫_{ℝ^d} |f(r) − g(r)|^p dr = 0

linear operators ℝ^N → ℝ^M: y = Ax ⟹ y_m = Σ_{n=1}^N a_{mn} x_n | continuous linear operators S(ℝ^d) → S′(ℝ^d): g = Aφ ⟹ g(r) = ∫_{ℝ^d} a(r, s) φ(s) ds for some a ∈ S′(ℝ^d × ℝ^d) (Schwartz' kernel theorem)

transpose: ⟨x, Ay⟩ = ⟨A^T x, y⟩ | adjoint: ⟨φ, Ag⟩ = ⟨A*φ, g⟩

Table 3.1: Comparison of notions of linear algebra with those of functional analysis and the theory of distributions (generalized functions). See Sections 3.1-3.3 for an explanation.

and nuclear) cannot overlap, except in finite dimensions. Since the function spaces that are of interest to us are infinite-dimensional (they do not have a finite vector-space basis), the two categories are mutually exclusive.

The structure of each of the aforementioned spaces has two aspects. First, as a vector space over the real numbers or its complexification, the space has an algebraic structure. Second, with regard to the notions of convergence and the taking of limits, the space has a topological structure. The algebraic structure lends meaning to the idea of a linear operator on the space, while the topological structure gives rise to the concept of a continuous operator or map, as we shall see shortly.

All the spaces considered here have a similar algebraic structure. They are either vector spaces over ℝ, meaning that, for any φ, φ′ in the space and any a ∈ ℝ, the operations of addition φ ↦ φ + φ′ and multiplication by scalars φ ↦ aφ are defined and map the space (denoted henceforth by X) into itself. Or, we may take the complexification of a real vector space


finite-dimensional | infinite-dimensional

random variable X in ℝ^N | generalized stochastic process s in S′

probability measure P_X on ℝ^N | probability measure P_s on S′

P_X(E) = Prob(X ∈ E) = ∫_E p_X(x) dx (p_X is a generalized [i.e., hybrid] pdf) | P_s(E) = Prob(s ∈ E) = ∫_E P_s(dg)

for suitable subsets E ⊂ ℝ^N | for suitable subsets E ⊂ S′

characteristic function P̂_X(ξ) = E{e^{j⟨ξ,X⟩}} = ∫_{ℝ^N} e^{j⟨ξ,x⟩} p_X(x) dx, ξ ∈ ℝ^N | characteristic functional P̂_s(φ) = E{e^{j⟨φ,s⟩}} = ∫_{S′} e^{j⟨φ,g⟩} P_s(dg), φ ∈ S

Table 3.2: Comparison of notions of finite-dimensional statistical calculus with the theory of generalized stochastic processes. See Section 3.4 for an explanation.

X, composed of elements of the form φ = φ_r + jφ_i with φ_r, φ_i ∈ X and j denoting the imaginary unit. The complexification is then a vector space over ℂ. In the remainder of the book, we shall denote a real vector space and its complexification by the same symbol. The distinction, when important, will be clear from the context.

For the spaces with which we are concerned in this book, the topological structure is completely specified by providing a criterion for the convergence of sequences.¹ By this we mean that, for any given sequence (φ_i) in X and any φ ∈ X, we are equipped with the knowledge of whether or not φ is the limit of (φ_i). A topological space is a set X with a topological structure. For normed spaces, the said criterion is given in terms of a norm, while in nuclear spaces it is given in terms of a family of seminorms, as we shall discuss below. But before that, let us first define linear and continuous operators.

An operator is a mapping from one vector space to another; that is, a rule that associates an output function Aφ ∈ Y to each input φ ∈ X.

Definition 1 (Linear operator). An operator A : X → Y, where X and Y are vector spaces, is called linear if, for any φ, φ′ ∈ X and a, b ∈ ℝ (or ℂ),

A(aφ + bφ′) = aAφ + bAφ′. (3.1)

Definition 2 (Continuous operator). Let X, Y be topological spaces. An operator A : X → Y is called sequentially continuous (with respect to the topologies of X and Y) if, for any convergent sequence (φ_i) in X with limit φ ∈ X, the sequence (Aφ_i) converges to Aφ in Y; that is,

lim_i Aφ_i = A(lim_i φ_i).

The above definition of continuity coincides with the stricter topological definition for the spaces we are interested in.

We shall assume that the topological structure of our vector spaces is such that the operations of addition and of multiplication by scalars in ℝ (or ℂ) are continuous. With these compatibility conditions, our object is called a topological vector space.

1. This is in contrast with those topological spaces where one needs to consider generalizations of the notion of a sequence involving partially ordered sets (the so-called nets or filters). Spaces in which a knowledge of sequences suffices are called sequential.


Having defined the two types of structure (algebraic and topological) and their relation with operators in abstract terms, let us now show concretely how the topological structure is defined for some important classes of spaces.

3.1.1 About the notation: mathematics vs. engineering

So far, we have considered a function in abstract terms as an element of a vector space: φ ∈ X. The more conventional view is that of a map φ : ℝ^d → ℝ (or ℂ) that associates a value φ(r) to each point r = (r_1, ..., r_d) ∈ ℝ^d. Following the standard convention in engineering, we shall therefore also use the notation φ(r) [instead of φ(·) or φ] to represent the function, using r as our generic d-dimensional index variable, the norm of which is denoted by |r|² = Σ_{i=1}^d |r_i|². This is to be contrasted with the point values (or samples) of φ, which will be denoted using subscripted index variables; i.e., φ(r_k) stands for the value of φ at r = r_k. Likewise, φ(r − r_0) = φ(· − r_0) refers to the function φ shifted by r_0.

A word of caution is in order here. While the engineering notation has the advantage of being explicit, it can also be felt as being abusive because the point values of φ are not necessarily well defined, especially when the function presents discontinuities, not to mention the case of generalized functions, which do not have a pointwise interpretation. φ(r) should therefore be treated as an alternative notation for φ that reminds us of the domain of the function, and not be interpreted literally.

3.1.2 Normed spaces

A norm on X is a map X → ℝ, usually denoted by φ ↦ ‖φ‖ (with indices used if needed to distinguish between different norms), which fulfills the following properties for all a ∈ ℝ (or ℂ) and φ, φ′ ∈ X:

‖φ‖ ≥ 0 (nonnegativity);

‖aφ‖ = |a| ‖φ‖ (positive homogeneity);

‖φ + φ′‖ ≤ ‖φ‖ + ‖φ′‖ (triangle inequality);

‖φ‖ = 0 implies φ = 0 (separation of points).

By relaxing the last requirement, we obtain a seminorm.

A normed space is a vector space X equipped with a norm.

A sequence (φ_i) in a normed space X is said to converge to φ (in the topology of X), in symbols

lim_i φ_i = φ,

if and only if

lim_i ‖φ − φ_i‖ = 0.

Let (φ_i) be a sequence in X such that, for any ε > 0, there exists an N ∈ ℕ with

‖φ_i − φ_j‖ < ε for all i, j ≥ N.

Such a sequence is called a Cauchy sequence. A normed space X is complete if it does not have any holes, in the sense that, for every Cauchy sequence (φ_i) in X, there exists a φ ∈ X such that lim_i φ_i = φ (in other words, if every Cauchy sequence has a limit in X). A normed space that is not complete can be completed by introducing new points corresponding to the limits of equivalent Cauchy sequences. For example, the real line is the completion of the set of rational numbers with respect to the absolute-value norm.
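The completion of ℚ into ℝ can be made concrete with a small numerical sketch (ours, not the book's; plain Python, with all names our own): Heron's iteration produces a Cauchy sequence of rational numbers whose limit, √2, lies outside ℚ.

```python
from fractions import Fraction

# Heron's iteration x_{k+1} = (x_k + 2/x_k)/2, started from a rational
# point, stays inside Q and produces a Cauchy sequence in the
# absolute-value norm. Its limit, sqrt(2), is irrational: the sequence
# has no limit in Q, but acquires one in the completion R.
def heron_sequence(n_terms):
    x = Fraction(1)
    seq = [x]
    for _ in range(n_terms - 1):
        x = (x + 2 / x) / 2
        seq.append(x)
    return seq

seq = heron_sequence(6)
gaps = [abs(seq[i + 1] - seq[i]) for i in range(len(seq) - 1)]

# successive gaps shrink (Cauchy behavior), checked exactly in rational arithmetic
assert all(gaps[i + 1] < gaps[i] for i in range(len(gaps) - 1))
# the terms approach sqrt(2) even though every term is rational
assert abs(float(seq[-1]) ** 2 - 2) < 1e-12
```

Note that the Cauchy property can be certified inside ℚ with exact arithmetic, while the limit itself only exists in the larger, complete space.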

24

An introduction to Sparse Stochastic Processes Copyright © M. Unser and P. D. Tafti

Page 27: An introduction to sparse stochastic processes · to the pioneering work of Paul Lévy, who defined a broad family of “additive” stochastic processes, now called Lévy processes.

An introduction to Sparse Stochastic Processes Copyright © M. Unser and P. D. Tafti

3.1. Some classes of function spaces

Examples

Important examples of complete normed spaces are the Lebesgue spaces. The Lebesgue spaces L_p(ℝ^d), 1 ≤ p ≤ ∞, are composed of functions whose L_p(ℝ^d) norm, denoted by ‖·‖_p, is finite, where

‖φ‖_{L_p} := ( ∫_{ℝ^d} |φ(r)|^p dr )^{1/p} for 1 ≤ p < ∞,
‖φ‖_{L_∞} := ess sup_{r∈ℝ^d} |φ(r)| for p = ∞,

and where two functions that are equal almost everywhere are considered to be equivalent.

We may also define weighted L_p spaces by replacing the shift-invariant Lebesgue measure dr by a weighted measure w(r) dr in the above definitions. In that case, w(r) is assumed to be a measurable function that is (strictly) positive almost everywhere. In particular, for w(r) = 1 + |r|^α with α > 0, we denote the associated norms by ‖·‖_{p,α} and the corresponding normed spaces by L_{p,α}(ℝ^d). The latter spaces are useful for characterizing the decay of functions at infinity. For example, L_{∞,α}(ℝ^d) is the space of functions that are bounded almost everywhere by a constant multiple of 1/(1 + |r|^α).

Some remarkable inclusion properties of the spaces L_{p,α}(ℝ^d), 1 ≤ p ≤ ∞, α > 0, are:

α > α′ implies L_{p,α}(ℝ^d) ⊂ L_{p,α′}(ℝ^d);
L_{∞, d/p + ε}(ℝ^d) ⊂ L_p(ℝ^d) for any ε > 0.

Finally, we define the space of rapidly decaying functions, R(ℝ^d), as the intersection of all L_{∞,α}(ℝ^d) spaces, α > 0, or, equivalently, as the intersection of all L_{∞,α}(ℝ^d) with α ∈ ℕ. In other words, R(ℝ^d) contains all bounded functions that essentially decay faster than 1/|r|^α at infinity for every α ∈ ℝ⁺. A sequence (f_i) converges in (the topology of) R(ℝ^d) if and only if it converges in all L_{∞,α}(ℝ^d) spaces.

The causal exponential ρ_α(r) = 1_{[0,∞)}(r) e^{αr} with Re(α) < 0, which is central to linear-systems theory, is a prototypical example of a function included in R(ℝ).
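This membership can be probed numerically (our own Python sketch, not part of the original text; sampling a grid is of course only a sanity check, not a proof about the essential supremum):

```python
import math

# Sanity check (ours): the causal exponential rho(r) = e^{a r} for
# r >= 0 (here a = -1), and 0 otherwise, is bounded and decays faster
# than any inverse polynomial, so each weighted supremum
# ||rho||_{inf,alpha} = sup_r (1 + |r|^alpha)|rho(r)| is finite,
# placing rho in every L_{inf,alpha}(R) and hence in R(R).
def rho(r, a=-1.0):
    return math.exp(a * r) if r >= 0 else 0.0

grid = [i * 0.01 for i in range(20001)]        # r in [0, 200]
weighted_sup = {
    alpha: max((1 + r ** alpha) * rho(r) for r in grid)
    for alpha in (1, 4, 16)
}

# every weighted supremum is finite ...
assert all(s < float("inf") for s in weighted_sup.values())
# ... and for alpha = 1, (1 + r)e^{-r} is decreasing, so the sup is at r = 0
assert abs(weighted_sup[1] - 1.0) < 1e-12
```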

3.1.3 Nuclear spaces

Defining nuclear spaces is neither easy nor particularly intuitive. Fortunately, for our purpose in this book, knowing the definition is not necessary. We shall simply assert that certain function spaces are nuclear, in order to use certain results that hold for nuclear spaces (specifically, the Minlos-Bochner theorem; see below). For the sake of completeness, a general definition of nuclear spaces is given at the end of this section, but this definition may safely be skipped without compromising the presentation.

Specifically, it will be sufficient for us to know that the spaces D(ℝ^d) and S(ℝ^d), which we shall shortly define, are nuclear, as are the Cartesian products and powers of nuclear spaces, and their closed subspaces.

To define these spaces, we need to identify their members, as well as the criterion of convergence for sequences in the space.

The space D(ℝ^d)

The space of compactly supported smooth test functions is denoted by D(ℝ^d). It consists of infinitely differentiable functions with compact support in ℝ^d. To define its topology, we provide the following criterion for convergence in D(ℝ^d):


A sequence (φ_i) of functions in D(ℝ^d) is said to converge (in the topology of D(ℝ^d)) if

1. there exists a compact (here, meaning closed and bounded) subset K of ℝ^d such that all φ_i are supported inside K;

2. the sequence (φ_i) converges with respect to all of the seminorms

‖φ‖_{K,n} := sup_{r∈K} |∂^n φ(r)| for all n ∈ ℕ^d.

Here, n = (n_1, ..., n_d) ∈ ℕ^d is what is called a multi-index, and ∂^n is shorthand for the partial derivative ∂^{n_1}_{r_1} ··· ∂^{n_d}_{r_d}. We take advantage of the present opportunity to also introduce two other notations: |n| for Σ_{i=1}^d |n_i|, and r^n for the product Π_{i=1}^d r_i^{n_i}.

The space D(Rd ) is nuclear (for a proof, see for instance [GV64]).

The Schwartz space S(ℝ^d)

The Schwartz space, or space of so-called smooth and rapidly decaying test functions, denoted by S(ℝ^d), consists of infinitely differentiable functions φ on ℝ^d for which all of the seminorms defined below are finite:

‖φ‖_{m,n} = sup_{r∈ℝ^d} |r^m ∂^n φ(r)| for all m, n ∈ ℕ^d.

In other words, S(ℝ^d) is populated by functions that, together with all of their derivatives, decay faster than the inverse of any polynomial at infinity.

The topology of S(ℝ^d) is defined by positing that a sequence (φ_i) converges in S(ℝ^d) if and only if it converges with respect to all of the above seminorms.
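To fix ideas, the prototypical Schwartz function is the Gaussian. The following Python sketch (our own illustration, with closed-form derivatives for n ≤ 2; a grid evaluation, not a proof) tabulates a few of the seminorms above in one dimension:

```python
import math

# Illustration (ours): phi(r) = exp(-r^2/2) is a Schwartz function, so
# all seminorms sup_r |r^m phi^(n)(r)| are finite. The weighted values
# vanish at the ends of a wide grid, consistent with the
# faster-than-polynomial decay of phi and of its derivatives.
phi = lambda r: math.exp(-r * r / 2)
derivs = {
    0: phi,
    1: lambda r: -r * phi(r),             # phi'(r)
    2: lambda r: (r * r - 1) * phi(r),    # phi''(r)
}

grid = [i * 0.01 for i in range(-2000, 2001)]   # r in [-20, 20]
seminorm = {
    (m, n): max(abs(r ** m * derivs[n](r)) for r in grid)
    for m in (0, 3, 8) for n in (0, 1, 2)
}

assert all(s < float("inf") for s in seminorm.values())
assert abs(seminorm[(0, 0)] - 1.0) < 1e-12      # sup |phi| = phi(0) = 1
```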

The Schwartz space has the remarkable property that its complexification is invariant under the Fourier transform. In other words, the Fourier transform, defined by the integral

φ(r) ↦ φ̂(ω) = F{φ}(ω) := ∫_{ℝ^d} e^{−j⟨r,ω⟩} φ(r) dr,

and inverted by

φ̂(ω) ↦ φ(r) = F^{−1}{φ̂}(r) := ∫_{ℝ^d} e^{j⟨r,ω⟩} φ̂(ω) dω/(2π)^d,

is a continuous map from S(ℝ^d) into itself. Our convention here is to use ω ∈ ℝ^d as the generic Fourier-domain index variable.
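The invariance can be checked numerically on the Gaussian, which the Fourier transform maps to another Gaussian (a hedged Python sketch of ours, using a plain midpoint Riemann sum for the integral above rather than any library transform):

```python
import math

# Numerical check (ours) of the Fourier invariance of the Gaussian,
# a Schwartz function: with the convention
# phi_hat(w) = int e^{-j r w} phi(r) dr, the transform of exp(-r^2/2)
# is sqrt(2*pi) * exp(-w^2/2).
def fourier_at(phi, w, lo=-15.0, hi=15.0, n=30000):
    dr = (hi - lo) / n
    acc = 0.0
    for k in range(n):
        r = lo + (k + 0.5) * dr
        acc += math.cos(r * w) * phi(r)   # phi is even, so the imaginary
    return acc * dr                        # (sine) part vanishes

phi = lambda r: math.exp(-r * r / 2)
for w in (0.0, 0.5, 1.0, 2.0):
    expected = math.sqrt(2 * math.pi) * math.exp(-w * w / 2)
    assert abs(fourier_at(phi, w) - expected) < 1e-9
```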

In addition, both S(ℝ^d) and D(ℝ^d) are closed under differentiation of any order and under multiplication by polynomials, and these operations are continuous. Lastly, both spaces are included in R(ℝ^d) and hence in all the Lebesgue spaces L_p(ℝ^d), which do not require any smoothness.

General definition of nuclear spaces*

Defining a nuclear space requires us to define nuclear operators. These are operators that can be approximated, in a certain sense, by operators of finite rank (an operator between vector spaces is of finite rank if its range is finite-dimensional).

We first recall the notation ℓ_p(ℕ), 1 ≤ p < ∞, for the space of p-summable sequences; that is, sequences c = (c_i)_{i∈ℕ} for which

Σ_{i∈ℕ} |c_i|^p


is finite. We also denote by ℓ_∞(ℕ) the space of all bounded sequences.

In a complete normed space Y, let (ψ_i)_{i∈ℕ} be a sequence with bounded norm (i.e., ‖ψ_i‖ ≤ M for some M ∈ ℝ and all i ∈ ℕ). We then denote by M_ψ the linear operator ℓ_1(ℕ) → Y which maps a sequence c = (c_i)_{i∈ℕ} in ℓ_1 to the weighted sum

Σ_{i∈ℕ} c_i ψ_i

in Y (the sum converges in norm by the triangle inequality).

An operator A : X → Y, where X and Y are complete normed spaces, is called nuclear if there exist a continuous linear operator

Ã : X → ℓ_∞ : φ ↦ (a_i(φ)),

an operator

Λ : ℓ_∞ → ℓ_1 : (c_i) ↦ (λ_i c_i)

where Σ_i |λ_i| < ∞, and a bounded sequence ψ = (ψ_i) in Y, such that we can write

A = M_ψ Λ Ã.

This is equivalent to the following decomposition of A into a sum of rank-1 operators:

A : φ ↦ Σ_{i∈ℕ} λ_i a_i(φ) ψ_i.

The continuous linear operator X → Y : φ ↦ λ_i a_i(φ) ψ_i is of rank 1 because it maps X into the one-dimensional subspace of Y spanned by ψ_i; compare (ψ_i) with a basis and (a_i(φ)) with the coefficients of Aφ in this basis.
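In finite dimensions, a decomposition into rank-1 operators with summable weights is exactly what the singular value decomposition provides; the following NumPy sketch (our own analogy, not part of the original text) makes this concrete:

```python
import numpy as np

# Finite-dimensional analogue (ours) of the nuclear decomposition
# A = sum_i lambda_i a_i(phi) psi_i: the SVD writes a matrix as a sum
# of rank-1 operators whose weights (the singular values) are summable,
# trivially so here since the sum is finite.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# psi_i = U[:, i] (a bounded sequence in Y), a_i(phi) = <Vt[i], phi>,
# lambda_i = s[i] with sum_i |lambda_i| finite
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)

# each summand is indeed a rank-1 operator
assert all(np.linalg.matrix_rank(s[i] * np.outer(U[:, i], Vt[i])) == 1
           for i in range(len(s)))
```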

More generally, given an arbitrary topological vector space X and a complete normed space Y, the operator A : X → Y is said to be nuclear if there exist a complete normed space X_1, a nuclear operator A_1 : X_1 → Y, and a continuous operator B : X → X_1, such that

A = A_1 B.

Finally, X is a nuclear space if every continuous linear map X → Y, where Y is a complete normed space, is nuclear.

3.2 Dual spaces and adjoint operators

Given a space X, a functional on X is a map f from X to the scalar field ℝ (or ℂ). In other words, f takes a function φ ∈ X as argument and returns the number f(φ).

When X is a vector space, we may consider linear functionals on it, where linearity has the same meaning as in Definition 1. When f is a linear functional, it is customary to use the notation ⟨φ, f⟩ in place of f(φ).

The set of all linear functionals on X, denoted by X*, can be given the structure of a vector space in the obvious way by the identity

⟨φ, a f + b f_0⟩ = a⟨φ, f⟩ + b⟨φ, f_0⟩,

where φ ∈ X, f, f_0 ∈ X*, and a, b ∈ ℝ (or ℂ) are arbitrary. The resulting vector space X* is called the algebraic dual of X.

The map from X × X* to ℝ (or ℂ) that takes the pair (φ, f) to their so-called scalar product ⟨φ, f⟩ is then bilinear, in the sense that it is linear in each of the arguments φ and f. Note


that the reasoning about linear functionals works both ways, so that we can also switch the order of the pairing. This translates into the formal commutativity rule ⟨f, φ⟩ = ⟨φ, f⟩, with a dual interpretation of the two sides of the equality.

Given vector spaces X, Y with algebraic duals X*, Y* and a linear operator A : X → Y, the adjoint or transpose of A, denoted by A*, is the linear operator Y* → X* defined by

A* f = f ∘ A

for any linear functional f : Y → ℝ (or ℂ) in Y*, where ∘ denotes composition. The motivation behind the above definition is to have the identity

⟨Aφ, f⟩ = ⟨φ, A* f⟩ (3.2)

hold for all φ ∈ X and f ∈ Y*.
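In the finite-dimensional setting of Table 3.1, identity (3.2) reduces to the familiar transpose rule; a minimal NumPy check (our own illustration, with arbitrary sizes):

```python
import numpy as np

# Finite-dimensional version (ours) of the defining identity (3.2) of
# the adjoint: for a linear map A : R^5 -> R^3 and the bilinear scalar
# product <x, y> = sum_n x_n y_n, the adjoint of A is the transpose A^T.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
x = rng.standard_normal(5)   # plays the role of phi in X
f = rng.standard_normal(3)   # plays the role of f in Y*

# <A x, f> = <x, A^T f>
assert np.isclose(np.dot(A @ x, f), np.dot(x, A.T @ f))
```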

If X is a topological vector space, it is of interest to consider the subspace of X* composed of those linear functionals on X that are continuous with respect to the topology of X. This subspace is denoted by X′ and called the topological or continuous dual of X. Note that, unlike X*, the continuous dual generally depends on the topology of X. In other words, the same vector space X with different topologies will generally have different continuous duals.

As a general rule, in this book we shall adopt some standard topologies and work only with the corresponding continuous dual space, which we shall simply call the dual. Also, henceforth, we shall assume the scalar product ⟨·, ·⟩ to be restricted to X × X′. There, the space X may vary but is necessarily paired with its continuous dual.

Following the restrictions of the previous paragraph, we sometimes say that the adjoint of A : X → Y exists, to mean that the algebraic adjoint A* : Y* → X*, when restricted to Y′, maps into X′, so that we can write

⟨Aφ, f⟩ = ⟨φ, A* f⟩,

where the scalar products on the two sides are now restricted to Y × Y′ and X × X′, respectively.

One can define different topologies on X′ by providing various criteria for convergence. The only one we shall need to deal with is the weak-* topology, in which (for a sequential space X) a sequence (f_i) converges to f in X′ if and only if

lim_i ⟨φ, f_i⟩ = ⟨φ, f⟩ for all φ ∈ X.

This is precisely the topology of pointwise convergence at all “points” φ ∈ X.

We shall now mention some examples.

3.2.1 The dual of L_p spaces

The dual of the Lebesgue space L_p(ℝ^d), 1 ≤ p < ∞, can be identified with the space L_{p′}(ℝ^d), where 1 < p′ ≤ ∞ satisfies 1/p + 1/p′ = 1, by defining

⟨φ, f⟩ = ∫_{ℝ^d} φ(r) f(r) dr (3.3)

for φ ∈ L_p(ℝ^d) and f ∈ L_{p′}(ℝ^d). In particular, L_2(ℝ^d), which is the only Hilbert space of the family, is its own dual.


To see that the linear functionals described by the above formula with f ∈ L_{p′} are continuous on L_p, we can rely on Hölder's inequality, which states that

|⟨φ, f⟩| ≤ ∫_{ℝ^d} |φ(r) f(r)| dr ≤ ‖φ‖_{L_p} ‖f‖_{L_{p′}}

for 1 ≤ p, p′ ≤ ∞ with 1/p + 1/p′ = 1. The special case of this inequality for p = 2 yields the Cauchy-Schwarz inequality.
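A discrete analogue of the inequality, with sums over sequences playing the role of the integrals, is easy to verify numerically (our own NumPy sketch, with arbitrary random data):

```python
import numpy as np

# Discrete analogue (ours) of Hoelder's inequality: for sequences,
# |<phi, f>| <= sum |phi_n f_n| <= ||phi||_p * ||f||_q, 1/p + 1/q = 1.
rng = np.random.default_rng(2)
phi = rng.standard_normal(1000)
f = rng.standard_normal(1000)

for p in (1.5, 2.0, 4.0):
    q = p / (p - 1)                       # conjugate exponent
    lhs = np.abs(np.dot(phi, f))
    mid = np.sum(np.abs(phi * f))
    rhs = (np.sum(np.abs(phi) ** p) ** (1 / p)
           * np.sum(np.abs(f) ** q) ** (1 / q))
    assert lhs <= mid <= rhs

# p = q = 2 recovers the Cauchy-Schwarz inequality
assert np.abs(np.dot(phi, f)) <= np.linalg.norm(phi) * np.linalg.norm(f)
```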

3.2.2 The duals of D and S

In this subsection, we give the mathematical definition of the duals of the nuclear spaces D and S. A physical interpretation of these definitions is postponed until the next section.

The dual of D(ℝ^d), denoted by D′(ℝ^d), is the so-named space of distributions over ℝ^d (although we shall use the term distribution more generally to mean any generalized function in the sense of the next section). Ordinary locally integrable functions² (in particular, all L_p functions and all continuous functions) can be identified with elements of D′(ℝ^d) by using (3.3). By this, we mean that any locally integrable function f defines a continuous linear functional on D(ℝ^d) where, for φ ∈ D(ℝ^d), ⟨φ, f⟩ is given by (3.3). However, not all elements of D′(ℝ^d) can be characterized in this way. For instance, the Dirac functional δ, which maps φ ∈ D(ℝ^d) to the value ⟨φ, δ⟩ = φ(0), belongs in D′(ℝ^d) but cannot be written as an integral à la (3.3). Even in this and similar cases, we may sometimes write ∫_{ℝ^d} φ(r) f(r) dr, keeping in mind that the integral is no longer a true (i.e., Lebesgue) integral, but simply an alternative notation for ⟨φ, f⟩.

In similar fashion, the dual of S(ℝ^d), denoted by S′(ℝ^d), is defined and called the space of tempered (or Schwartz) distributions. Since D ⊂ S and any sequence that converges in the topology of D also converges in S, it follows that S′(ℝ^d) is (can be identified with) a smaller space (i.e., a subspace) of D′(ℝ^d). In particular, not every locally integrable function belongs in S′. For example, locally integrable functions of exponential growth have no place in S′, as their scalar product with Schwartz test functions via (3.3) is not in general finite (much less continuous). Once again, S′(ℝ^d) contains objects that are not functions on ℝ^d in the true sense of the word. For example, δ also belongs in S′(ℝ^d).
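Although δ is not an integral functional, it is the limit of ordinary functions in the weak sense; the following Python sketch (ours; the test function and widths are arbitrary choices) approximates δ by narrow normalized Gaussians g_ε and checks that ⟨φ, g_ε⟩ → φ(0):

```python
import math

# The Dirac functional cannot be written as an integral against a true
# function, but for narrow normalized Gaussians g_eps the ordinary
# duality products <phi, g_eps> converge to phi(0). (Our own numerical
# sketch; midpoint Riemann sum over [-1, 1].)
def pairing(phi, eps, lo=-1.0, hi=1.0, n=200000):
    dr = (hi - lo) / n
    total = 0.0
    for k in range(n):
        r = lo + (k + 0.5) * dr
        g = math.exp(-r * r / (2 * eps ** 2)) / (eps * math.sqrt(2 * math.pi))
        total += phi(r) * g * dr
    return total

phi = lambda r: math.cos(r) / (1 + r * r)   # a smooth test function, phi(0) = 1
errors = [abs(pairing(phi, eps) - phi(0)) for eps in (0.1, 0.01, 0.001)]

assert errors[0] > errors[1] > errors[2]    # monotone improvement
assert errors[-1] < 1e-5
```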

3.2.3 Distinction between Hermitian and duality products

We use the notation ⟨f, g⟩_{L_2} = ∫_{ℝ^d} f(r) ḡ(r) dr, where the bar denotes complex conjugation, to represent the usual (Hermitian-symmetric) L_2 inner product. The latter is defined for f, g ∈ L_2(ℝ^d) (the Hilbert space of complex finite-energy functions); it is equivalent to Schwartz' duality product only when the second argument is real-valued (due to the presence of the complex conjugation). The corresponding Hermitian adjoint of an operator A is denoted by A^H; it is defined by ⟨A^H f, g⟩_{L_2} = ⟨f, A g⟩_{L_2}, which implies that A^H coincides with the adjoint A* up to complex conjugation of its argument and output. The distinction between the two types of adjoints is only relevant when considering signal expansions or analyses in terms of complex basis functions.

The classical Fourier transform is defined as

f̂(ω) = F{f}(ω) := ∫_{ℝ^d} f(r) e^{−j⟨r,ω⟩} dr

for any f ∈ L_1(ℝ^d). This definition admits a unique extension, F : L_2(ℝ^d) → L_2(ℝ^d), which is an isometry map (Plancherel's theorem). The fact that the Fourier transform preserves

2. A function on ℝ^d is called locally integrable if its integral over any closed bounded set is finite.


the L_2 norm of a function (up to a normalization factor) is a direct consequence of Parseval's relation

⟨f, g⟩_{L_2} = (1/(2π)^d) ⟨f̂, ĝ⟩_{L_2},

whose duality-product equivalent is ⟨f̂, g⟩ = ⟨f, ĝ⟩.
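A discrete counterpart of Parseval's relation holds for the DFT, with the factor 1/N playing the role of 1/(2π)^d (our own NumPy sanity check, not the book's derivation):

```python
import numpy as np

# Discrete analogue (ours) of Parseval's relation: for the length-N DFT,
# inner products are preserved up to the normalization 1/N, which
# mirrors the factor 1/(2*pi)^d of the continuous-domain formula.
rng = np.random.default_rng(3)
N = 64
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

inner = np.vdot(g, f)                          # sum_n f[n] * conj(g[n])
inner_hat = np.vdot(np.fft.fft(g), np.fft.fft(f))

assert np.isclose(inner, inner_hat / N)
```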

3.3 Generalized functions

3.3.1 Intuition and definition

We begin with some considerations regarding the modeling of physical phenomena. Let us suppose that the object of our study is some physical quantity f that varies in relation to some parameter r ∈ ℝ^d representing space and/or time. We assume that our way of obtaining information about f is by making measurements that are localized in space-time using sensors (φ, ψ, ...). We shall denote the measurement of f procured by φ as ⟨φ, f⟩.³ Let us suppose that our sensors form a vector space, in the sense that, for any two sensors φ, ψ and any two scalars a, b ∈ ℝ (or ℂ), there is a real or virtual sensor aφ + bψ such that

⟨aφ + bψ, f⟩ = a⟨φ, f⟩ + b⟨ψ, f⟩.

In addition, we may reasonably suppose that the phenomenon under observation has some form of continuity, meaning that

lim_i ⟨φ_i, f⟩ = ⟨φ, f⟩,

where (φ_i) is a sequence of sensors that tends to φ in a certain sense. We denote the set of all sensors by X. In the light of the above notions of linear combinations and limits defined in X, the space of sensors then has, mathematically, the structure of a topological vector space.

Given the above properties and the definitions of the previous sections, we conclude that f represents an element of the continuous dual X′ of X. Given that our sensors, as previously noted, are assumed to be localized in ℝ^d, we may model them as compactly supported or rapidly decaying functions on ℝ^d, denoted by the same symbols (φ, ψ, ...), and, in the case where f also corresponds to a function on ℝ^d, relate the observation ⟨φ, f⟩ to the functional forms of φ and f by the identity

⟨φ, f⟩ = ∫_{ℝ^d} φ(r) f(r) dr.

We exclude from consideration those functions f for which the above integral is undefined or infinite for some φ ∈ X.

However, we are not limited to taking f to be a true function of r ∈ ℝ^d. By requiring our sensor or test functions to be smooth, we can permit f to become singular; that is, to depend on the value of φ and/or of its derivatives at isolated points/curves inside ℝ^d. An example of a singular generalized function f, which we have already noted, is the Dirac distribution δ, which measures the value of φ at the single point r = 0 (i.e., ⟨φ, δ⟩ = φ(0)).

Mathematically, we define generalized functions as members of the continuous dual X′ of a nuclear space X of functions, such as D(ℝ^d) or S(ℝ^d).

3. The connection with previous sections should already be apparent from this choice of notation.


3.3.2 Operations on generalized functions

Following (3.2), any continuous linear operator D → D or S → S can be transposed to define a continuous linear operator D′ → D′ or S′ → S′. In particular, since D(ℝ^d) and S(ℝ^d) are closed under differentiation, we can define derivatives of distributions.

First, note that, formally,

⟨∂^n φ, f⟩ = ⟨φ, (∂^n)* f⟩.

Now, using integration by parts in (3.3), for φ, f in D(ℝ^d) or S(ℝ^d), we see that (∂^n)* = (−1)^{|n|} ∂^n. In other words, we can write

⟨φ, ∂^n f⟩ = (−1)^{|n|} ⟨∂^n φ, f⟩. (3.4)

The idea is then to use (3.4) as the defining formula in order to extend the action of the derivative operator ∂^n to any f ∈ D′(ℝ^d) or S′(ℝ^d).
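Formula (3.4) can be illustrated numerically with the Heaviside step H (a locally integrable but non-differentiable function) whose distributional derivative is δ; our own Python sketch, with arbitrary quadrature parameters:

```python
import math

# Numerical illustration (ours) of (3.4) with n = 1 and f = H, the
# Heaviside step: <phi, H'> = -<phi', H> = -int_0^inf phi'(r) dr
# = phi(0), i.e., the distributional derivative of H acts as the
# Dirac distribution delta.
phi = lambda r: math.exp(-r * r / 2)          # a rapidly decaying test function
dphi = lambda r: -r * math.exp(-r * r / 2)    # its classical derivative

# -<phi', H> computed by a midpoint Riemann sum over [0, 15]
n, hi = 150000, 15.0
dr = hi / n
minus_pairing = -sum(dphi((k + 0.5) * dr) for k in range(n)) * dr

assert abs(minus_pairing - phi(0)) < 1e-8     # <phi, delta> = phi(0) = 1
```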

Formulas for scaling, shifting (translation), rotation, and other geometric transformations of distributions are obtained in a similar manner. For instance, the translation by r_0 of a generalized function f is defined via the identity

⟨φ, f(· − r_0)⟩ = ⟨φ(· + r_0), f⟩.

More generally, we give the following definition.

Definition 3. Given operators $\mathrm{U}, \mathrm{U}^{*}: \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}(\mathbb{R}^d)$ that form an adjoint pair on $\mathcal{S}(\mathbb{R}^d) \times \mathcal{S}(\mathbb{R}^d)$, we extend their action to $\mathcal{S}'(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ by defining $\mathrm{U} f$ and $\mathrm{U}^{*} f$ so as to have
$$\langle \varphi, \mathrm{U} f\rangle = \langle \mathrm{U}^{*}\varphi, f\rangle, \qquad \langle \varphi, \mathrm{U}^{*} f\rangle = \langle \mathrm{U}\varphi, f\rangle,$$
for all $f$. A similar definition gives the extension of adjoint pairs $\mathcal{D}(\mathbb{R}^d) \to \mathcal{D}(\mathbb{R}^d)$ to operators $\mathcal{D}'(\mathbb{R}^d) \to \mathcal{D}'(\mathbb{R}^d)$.

Examples of operators $\mathcal{S}(\mathbb{R}^d) \to \mathcal{S}(\mathbb{R}^d)$ that can be extended in the above fashion include derivatives, rotations, scaling, translation, time-reversal, and multiplication by smooth functions of slow growth in the space-time domain. The other fundamental operation is the Fourier transform, which is treated in the next section.

3.3.3 The Fourier transform of generalized functions

We have already noted that the Fourier transform $\mathcal{F}$ is a reversible operator that maps the (complexified) space $\mathcal{S}(\mathbb{R}^d)$ into itself. The additional relevant property is that $\mathcal{F}$ is self-adjoint: $\langle \varphi, \mathcal{F}\psi\rangle = \langle \mathcal{F}\varphi, \psi\rangle$ for all $\varphi, \psi \in \mathcal{S}(\mathbb{R}^d)$. This allows us to specify the generalized Fourier transform of distributions in accordance with the general extension principle in Definition 3.

Definition 4. The generalized Fourier transform of a distribution $f \in \mathcal{S}'(\mathbb{R}^d)$ is the distribution $\hat{f} = \mathcal{F}\{f\} \in \mathcal{S}'(\mathbb{R}^d)$ that satisfies
$$\langle \varphi, \hat{f}\rangle = \langle \hat{\varphi}, f\rangle$$
for all $\varphi \in \mathcal{S}$, where $\hat{\varphi} = \mathcal{F}\{\varphi\}$ is the classical Fourier transform of $\varphi$ given by the integral
$$\hat{\varphi}(\omega) = \int_{\mathbb{R}^d} \mathrm{e}^{-\mathrm{j}\langle r, \omega\rangle}\,\varphi(r)\,\mathrm{d}r.$$

3. MATHEMATICAL CONTEXT AND BACKGROUND

Temporal or spatial domain | Fourier domain

$\hat{f}(r) = \mathcal{F}\{f\}(r)$ | $(2\pi)^d f(-\omega)$
$f^{\vee}(r) = f(-r)$ | $\hat{f}(-\omega) = \hat{f}^{\vee}(\omega)$
$\overline{f(r)}$ | $\overline{\hat{f}(-\omega)}$
$f(\mathrm{A}^{\mathsf T} r)$ | $\frac{1}{|\det \mathrm{A}|}\,\hat{f}(\mathrm{A}^{-1}\omega)$
$f(r - r_0)$ | $\mathrm{e}^{-\mathrm{j}\langle r_0, \omega\rangle}\,\hat{f}(\omega)$
$\mathrm{e}^{\mathrm{j}\langle r, \omega_0\rangle} f(r)$ | $\hat{f}(\omega - \omega_0)$
$\partial^{n} f(r)$ | $(\mathrm{j}\omega)^{n}\,\hat{f}(\omega)$
$r^{n} f(r)$ | $\mathrm{j}^{|n|}\,\partial^{n}\hat{f}(\omega)$
$(g * f)(r)$ | $\hat{g}(\omega)\,\hat{f}(\omega)$
$g(r)\, f(r)$ | $(2\pi)^{-d}(\hat{g} * \hat{f})(\omega)$

Table 3.3: Basic properties of the (generalized) Fourier transform.

For example, since we have
$$\int_{\mathbb{R}^d}\varphi(r)\,\mathrm{d}r = \langle \varphi, 1\rangle = \hat{\varphi}(0) = \langle \hat{\varphi}, \delta\rangle,$$
we conclude that the (generalized) Fourier transform of $\delta$ is the constant function $1$.

The fundamental property of the generalized Fourier transform is that it maps $\mathcal{S}'(\mathbb{R}^d)$ into itself and that it is invertible with $\mathcal{F}^{-1} = \frac{1}{(2\pi)^d}\bar{\mathcal{F}}$, where $\bar{\mathcal{F}}\{f\} = \mathcal{F}\{f^{\vee}\}$. This quasi self-reversibility, also expressed by the first row of Table 3.3, implies that any operation on generalized functions that is admissible in the space/time domain has its counterpart in the Fourier domain, and vice versa. For instance, the multiplication with a smooth function in the Fourier domain corresponds to a convolution in the signal domain. Consequently, the familiar functional identities concerning the classical Fourier transform, such as the formulas for change of variables and differentiation, among others, also hold true for this generalization. These are summarized in Table 3.3.

In addition, the reader can find in Appendix A a table of Fourier transforms of some important singular generalized functions in one and several variables.
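The entries of Table 3.3 are easy to verify numerically for concrete test functions. The following sketch (our own; the Gaussian, the shift $r_0$, and the frequency $\omega$ are arbitrary choices) checks the shift row by evaluating the analysis-convention Fourier integral with the trapezoidal rule:

```python
import numpy as np

# Check of the shift property: F{phi(. - r0)}(w) = exp(-j*r0*w) * F{phi}(w).
def ft(vals, r, w):
    # direct evaluation of phi_hat(w) = int exp(-j*r*w) * phi(r) dr
    integrand = np.exp(-1j * r * w) * vals
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))

r = np.linspace(-20.0, 20.0, 40001)
phi = np.exp(-r**2 / 2)                       # Gaussian test function
r0, w = 1.5, 0.7                              # arbitrary shift and frequency

lhs = ft(np.exp(-(r - r0)**2 / 2), r, w)      # transform of the shifted function
rhs = np.exp(-1j * r0 * w) * ft(phi, r, w)    # modulated transform of the original

err = abs(lhs - rhs)
print(err)   # tiny (quadrature error only)
```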

3.3.4 The kernel theorem

The kernel theorem provides a characterization of continuous operators $\mathcal{X} \to \mathcal{X}'$ (with respect to the nuclear topology on $\mathcal{X}$ and the weak-$*$ topology on $\mathcal{X}'$). We shall state a version of the theorem for $\mathcal{X} = \mathcal{S}(\mathbb{R}^d)$, which is the one we shall use. The version for $\mathcal{D}(\mathbb{R}^d)$ is obtained by replacing the symbol $\mathcal{S}$ with $\mathcal{D}$ everywhere in the statement of the theorem.

Theorem 1 (Schwartz' kernel theorem: first form). Every continuous linear operator $\mathcal{S}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ can be written in the form
$$\varphi(r) \mapsto \int_{\mathbb{R}^d} h(r, s)\,\varphi(s)\,\mathrm{d}s, \qquad (3.5)$$
where $h(\cdot,\cdot)$ is a generalized function in $\mathcal{S}'(\mathbb{R}^d \times \mathbb{R}^d)$.

We can interpret the above formula as some sort of continuous-domain matrix-vector product, where $r, s$ play the role of vector indices. This characterization of continuous linear operators as infinite-dimensional matrix-vector products partly justifies our earlier statement that nuclear spaces “resemble” finite-dimensional spaces in some important ways.

An equivalent statement of the above theorem is as follows.

Theorem 2 (Schwartz's kernel theorem: second form). Every continuous bilinear form $l: \mathcal{S}(\mathbb{R}^d) \times \mathcal{S}(\mathbb{R}^d) \to \mathbb{R}$ (or $\mathbb{C}$) can be written as
$$l(\varphi_1, \varphi_2) = \langle \varphi_1(\cdot)\,\varphi_2(\cdot), h(\cdot,\cdot)\rangle = \int_{\mathbb{R}^d \times \mathbb{R}^d} \varphi_1(r)\,\varphi_2(s)\,h(r, s)\,\mathrm{d}s\,\mathrm{d}r.$$

The connection between the two statements is clarified by relating the continuous bilinear form $l$ to a continuous linear operator $\mathrm{A}: \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ by an identity of the form
$$l(\varphi_1, \varphi_2) = \langle \varphi_1, \mathrm{A}\varphi_2\rangle.$$

3.3.5 Linear shift-invariant (LSI) operators and convolutions

Let $\mathrm{S}_{r_0}$ denote the shift operator $\varphi \mapsto \varphi(\cdot - r_0)$. We call an operator $\mathrm{U}$ shift-invariant if $\mathrm{U}\mathrm{S}_{r_0} = \mathrm{S}_{r_0}\mathrm{U}$ for all $r_0 \in \mathbb{R}^d$.

As a corollary of the kernel theorem, we have the following characterization of linear shift-invariant (LSI) operators $\mathcal{S} \to \mathcal{S}'$ (and a similar characterization for those $\mathcal{D} \to \mathcal{D}'$).

Corollary 1. Every continuous linear shift-invariant operator $\mathcal{S}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ can be written as a convolution
$$\varphi(r) \mapsto (\varphi * h)(r) = \int_{\mathbb{R}^d} \varphi(s)\,h(r - s)\,\mathrm{d}s$$
with some generalized function $h \in \mathcal{S}'(\mathbb{R}^d)$.

Moreover, in this case we have the convolution-multiplication formula
$$\mathcal{F}\{h * \varphi\} = \hat{\varphi}\,\hat{h}. \qquad (3.6)$$

Note that the convolution of a test function and a distribution is in general a distribution. The latter is smooth (and therefore equivalent to an ordinary function), but not necessarily rapidly decaying. However, $\varphi * h$ will once again belong continuously to $\mathcal{S}$ if $\hat{h}$, the Fourier transform of $h$, is a smooth (infinitely differentiable) function with at most polynomial growth at infinity, because the smoothness of $\hat{h}$ translates into rapid decay of $h$ in the spatio-temporal domain, and vice versa. In particular, we note that the condition is met when $h \in \mathcal{R}(\mathbb{R}^d)$ (since $r^{n}h(r) \in L_1(\mathbb{R}^d)$ for any $n \in \mathbb{N}^d$). A classical situation in dimension $d = 1$ where the decay is guaranteed to be exponential is when the Fourier transform of $h$ is a rational transfer function of the form
$$\hat{h}(\omega) = C_0\,\frac{\prod_{m=1}^{M}(\mathrm{j}\omega - z_m)}{\prod_{n=1}^{N}(\mathrm{j}\omega - p_n)}$$
with no purely imaginary pole (i.e., with $\mathrm{Re}(p_n) \neq 0$, $1 \le n \le N$). $^{4}$

4. For $M$ or $N = 0$, we shall take the corresponding product to be equal to $1$.
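For a concrete instance of the rational case (our own sketch, with an assumed single-pole response), one can check numerically that $H(\omega) = 1/(\mathrm{j}\omega + \alpha)$, i.e., one pole at $p = -\alpha$ off the imaginary axis, goes together with the exponentially decaying causal impulse response $h(r) = \mathrm{e}^{-\alpha r}$ for $r \ge 0$:

```python
import numpy as np

# Single-pole filter: H(w) = 1/(jw - p) with p = -alpha < 0 corresponds to the
# causal, exponentially decaying impulse response h(r) = exp(-alpha*r), r >= 0.
alpha, w = 0.8, 1.3                       # arbitrary pole location and frequency
r = np.linspace(0.0, 60.0, 600001)        # truncation error ~ exp(-48), negligible
h = np.exp(-alpha * r)

integrand = np.exp(-1j * w * r) * h       # Fourier integral over the half-line
H_num = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
H_formula = 1.0 / (1j * w + alpha)        # i.e., 1/(jw - p) with p = -alpha

err = abs(H_num - H_formula)
print(err)
```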

Since any sequence that converges in some $L_p$ space, with $1 \le p \le \infty$, also converges in $\mathcal{S}'$, the kernel theorem implies that any continuous linear operator $\mathcal{S}(\mathbb{R}^d) \to L_p(\mathbb{R}^d)$ can be written in the form specified by (3.5).

In defining the convolution of two distributions, some caution should be exercised. To be consistent with the previous definitions, we can view convolutions as continuous linear shift-invariant operators. The convolution of two distributions will then correspond to the composition of two LSI operators. To fix ideas, let us take two distributions $f$ and $h$, with corresponding operators $\mathrm{A}_f$ and $\mathrm{A}_h$. We then wish to identify $f * h$ with the composition $\mathrm{A}_f\mathrm{A}_h$. However, note that, by the kernel theorem, $\mathrm{A}_f$ and $\mathrm{A}_h$ are initially defined $\mathcal{S} \to \mathcal{S}'$. Since the codomain of $\mathrm{A}_h$ (the space $\mathcal{S}'$) does not match the domain of $\mathrm{A}_f$ (the space $\mathcal{S}$), this composition is a priori undefined.

There are two principal situations where we can get around the above limitation. The first is where the range of $\mathrm{A}_h$ is limited to $\mathcal{S} \subset \mathcal{S}'$ (i.e., $\mathrm{A}_h$ maps $\mathcal{S}$ to itself instead of the much larger $\mathcal{S}'$). This is the case for the distributions with a smooth Fourier transform that we discussed previously.

The second situation where we may define the convolution of $f$ and $h$ is when the range of $\mathrm{A}_h$ can be restricted to some space $\mathcal{X}$ (i.e., $\mathrm{A}_h: \mathcal{S} \to \mathcal{X}$), and furthermore, $\mathrm{A}_f$ has a continuous extension to $\mathcal{X}$; that is, we can extend it as $\mathrm{A}_f: \mathcal{X} \to \mathcal{S}'$.

An important example of the second situation is when the distributions in question belong to the spaces $L_p(\mathbb{R}^d)$ and $L_q(\mathbb{R}^d)$ with $1 \le p, q \le \infty$ and $\frac{1}{p} + \frac{1}{q} \ge 1$. In this case, their convolution is well-defined and can be identified with a function in $L_r(\mathbb{R}^d)$, $1 \le r \le \infty$, with
$$\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1.$$

Moreover, for $f \in L_p(\mathbb{R}^d)$ and $h \in L_q(\mathbb{R}^d)$, we have
$$\|f * h\|_{L_r} \le \|f\|_{L_p}\,\|h\|_{L_q}.$$

This result is Young's inequality for convolutions. An important special case of this identity, most useful in derivations, is obtained for $q = 1$ and $p = r$:
$$\|h * f\|_{L_p} \le \|h\|_{L_1}\,\|f\|_{L_p}. \qquad (3.7)$$

The latter formula indicates that $L_p(\mathbb{R}^d)$ spaces are “stable” under convolution with elements of $L_1(\mathbb{R}^d)$ (stable filters).
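A discrete analogue of (3.7) holds verbatim for sequences ($\ell_1 * \ell_p \subset \ell_p$ with the same bound), which makes the inequality easy to test numerically. A small sketch of ours, with arbitrary random data:

```python
import numpy as np

# Discrete check of Young's special case: ||h * f||_p <= ||h||_1 * ||f||_p.
rng = np.random.default_rng(0)
h = rng.standard_normal(50)       # "filter" sequence
f = rng.standard_normal(200)      # "signal" sequence

p = 3.0
lhs = np.sum(np.abs(np.convolve(h, f))**p)**(1 / p)           # ||h * f||_p
rhs = np.sum(np.abs(h)) * np.sum(np.abs(f)**p)**(1 / p)       # ||h||_1 ||f||_p
print(lhs <= rhs)   # True: the discrete inequality is exact
```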

3.3.6 Convolution operators on $L_p(\mathbb{R}^d)$

While the condition $h \in L_1(\mathbb{R}^d)$ in (3.7) is very useful in practice and plays a central role in the classical theory of linear systems, it does not cover the entire range of bounded convolution operators on $L_p(\mathbb{R}^d)$. Here we shall be more precise and characterize the complete class of such operators for the cases $p = 1, 2, +\infty$. In harmonic analysis, these operators are commonly referred to as $L_p$ Fourier multipliers, using (3.6) as the starting point for their definition.

Definition 5 (Fourier multiplier). An operator $\mathrm{T}: L_p(\mathbb{R}^d) \to L_p(\mathbb{R}^d)$ is called an $L_p$ Fourier multiplier if it is continuous on $L_p(\mathbb{R}^d)$ and can be represented as $\mathrm{T} f = \mathcal{F}^{-1}\{\hat{f}\,H\}$. The function $H: \mathbb{R}^d \to \mathbb{C}$ is the frequency response of the underlying filter.

The first observation is that the definition guarantees linearity and shift-invariance. Moreover, since $\mathcal{S}(\mathbb{R}^d) \subset L_p(\mathbb{R}^d) \subset \mathcal{S}'(\mathbb{R}^d)$, the multiplier operator can be written as a

convolution $\mathrm{T} f = h * f$ (see Corollary 1) where $h \in \mathcal{S}'(\mathbb{R}^d)$ is the impulse response of the operator $\mathrm{T}$: $h = \mathcal{F}^{-1}\{H\} = \mathrm{T}\delta$. Conversely, we also have that $H = \hat{h} = \mathcal{F}\{h\}$.

Since we are dealing with a linear operator on a normed vector space, we can rely on the equivalence between continuity (in accordance with Definition 2) and boundedness of the operator.

Definition 6 (Operator norm). The norm of the linear operator $\mathrm{T}: L_p(\mathbb{R}^d) \to L_p(\mathbb{R}^d)$ is given by
$$\|\mathrm{T}\|_{L_p} = \sup_{f \in L_p(\mathbb{R}^d)\setminus\{0\}} \frac{\|\mathrm{T} f\|_{L_p}}{\|f\|_{L_p}}.$$

The operator is said to be bounded if its norm is finite.

In practice, it is often sufficient to work out bounds for the extreme cases (e.g., $p = 1, +\infty$) and to then invoke the Riesz-Thorin interpolation theorem to extend the results to the values of $p$ in between.

Theorem 3 (Riesz-Thorin). Let $\mathrm{T}$ be a linear operator that is bounded on $L_{p_1}(\mathbb{R}^d)$ as well as on $L_{p_2}(\mathbb{R}^d)$ with $1 \le p_1 \le p_2$. Then, $\mathrm{T}$ is also bounded for any $p \in [p_1, p_2]$ in the sense that there exist constants $C_p = \|\mathrm{T}\|_{L_p} < \infty$ such that
$$\|\mathrm{T} f\|_{L_p} \le C_p\,\|f\|_{L_p}$$
for all $f \in L_p(\mathbb{R}^d)$.

The next theorem summarizes the main results that are available on the characterization of convolution operators on $L_p(\mathbb{R}^d)$.

Theorem 4 (Characterization of $L_p$ Fourier multipliers). Let $\mathrm{T}$ be a Fourier-multiplier operator with frequency response $H: \mathbb{R}^d \to \mathbb{C}$. Then, the following statements apply:
1) The operator $\mathrm{T}$ is an $L_1$ Fourier multiplier if and only if $H = \hat{h}$ is the Fourier transform of a finite complex-valued Borel measure.
2) The operator $\mathrm{T}$ is an $L_\infty$ Fourier multiplier if and only if $H = \hat{h}$ is the Fourier transform of a finite complex-valued Borel measure.
3) The operator $\mathrm{T}$ is an $L_2$ Fourier multiplier if and only if $H = \hat{h} \in L_\infty(\mathbb{R}^d)$.
The corresponding operator norms are
$$\|\mathrm{T}\|_{L_1} = \|\mathrm{T}\|_{L_\infty} = \|h\|_{\mathrm{TV}}, \qquad \|\mathrm{T}\|_{L_2} = \|H\|_{L_\infty},$$
where $\|h\|_{\mathrm{TV}}$ is the total variation of the underlying measure. Finally, $\mathrm{T}$ is an $L_p$ Fourier multiplier for the whole range $1 \le p \le +\infty$ if the condition on $H$ (or $h$) in 1) or 2) is met, with
$$\|\mathrm{T}\|_{L_p} \le \|h\|_{\mathrm{TV}}. \qquad (3.8)$$

We note that the above theorem is an extension of (3.7), since being a finite Borel measure is a less restrictive condition than $h \in L_1(\mathbb{R}^d)$. To see this, we invoke Lebesgue's decomposition theorem, which states that a finite measure $\mu$ admits a unique decomposition as
$$\mu = \mu_{\mathrm{ac}} + \mu_{\mathrm{sing}},$$
where $\mu_{\mathrm{ac}}$ is an absolutely-continuous measure and $\mu_{\mathrm{sing}}$ a singular measure whose mass is concentrated on a set whose Lebesgue measure is zero. If $\mu_{\mathrm{sing}} = 0$, then there exists

a unique function $h \in L_1(\mathbb{R}^d)$ (the Radon-Nikodym derivative of $\mu$ with respect to the Lebesgue measure) such that
$$\int_{\mathbb{R}^d}\varphi(r)\,\mu(\mathrm{d}r) = \int_{\mathbb{R}^d}\varphi(r)\,h(r)\,\mathrm{d}r,$$
so that the total variation in Theorem 4 reduces to the $L_1$ norm $\|h\|_{\mathrm{TV}} = \|h\|_{L_1}$. Under those circumstances, there is an equivalence between (3.7) and (3.8).

More generally, when $\mu_{\mathrm{sing}} \neq 0$, we can make the same kind of association between $\mu$ and a generalized function $h$ which is no longer in $L_1(\mathbb{R}^d)$. The typical case is when $\mu_{\mathrm{sing}}$ is a discrete measure, which results in a generalized function $h_{\mathrm{sing}} = \sum_k h_k\,\delta(\cdot - r_k)$ that is a sum of Dirac impulses. The total variation of $h$ is then given by $\|h\|_{\mathrm{TV}} = \|h_{\mathrm{ac}}\|_{L_1} + \sum_k |h_k|$.

Statement 3) in Theorem 4 is a consequence of Parseval's identity. It is consistent with the intuition that a “stable” filter should have a bounded frequency response, as a minimal requirement. The class of convolution kernels that satisfy this condition are sometimes called pseudo-measures. These are more general entities than measures because the Fourier transform of a finite measure is necessarily uniformly continuous in addition to being bounded.
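The bound (3.8) with a nonzero discrete part can be illustrated on sequences (a sketch of ours, with arbitrary data): a filter consisting of a summable part plus a Dirac at the origin is bounded with norm at most its TV norm.

```python
import numpy as np

# A "measure-type" filter: T f = g * f + c * f, i.e., kernel = g + c*delta_0.
# Its TV norm is ||g||_1 + |c|, and ||T f||_p <= TV * ||f||_p.
rng = np.random.default_rng(1)
g = rng.standard_normal(30) / 10      # absolutely continuous (summable) part
c = 2.0                               # weight of the Dirac component
f = rng.standard_normal(500)

Tf = np.convolve(g, f)                # 'full' convolution with g
Tf[:f.size] += c * f                  # the Dirac at the origin acts as identity

p = 2.0
lhs = np.sum(np.abs(Tf)**p)**(1 / p)
tv = np.sum(np.abs(g)) + abs(c)       # TV norm of the combined kernel
rhs = tv * np.sum(np.abs(f)**p)**(1 / p)
print(lhs <= rhs)
```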

The last result in Theorem 4 is obtained by interpolation between Statements 1) and 2) using the Riesz-Thorin theorem. The extent to which the TV condition on $h$ can be relaxed for $p \neq 1, 2, \infty$ is not yet settled and is considered to be a difficult mathematical problem. A limit example of a 1-D convolution operator that is bounded for $1 < p < \infty$ (see Theorem 5 below), but fails to meet the necessary and sufficient TV condition for $p = 1, \infty$, is the Hilbert transform. Its frequency response is $H_{\mathrm{Hilbert}}(\omega) = -\mathrm{j}\,\mathrm{sign}(\omega)$, which is bounded since $|H_{\mathrm{Hilbert}}(\omega)| = 1$ (all-pass filter), but which is not uniformly continuous because of the jump at $\omega = 0$. Its impulse response is the generalized function $h(r) = \frac{1}{\pi r}$, which is not included in $L_1(\mathbb{R})$ for two reasons: the singularity at $r = 0$ and the lack of sufficient decay at infinity.
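The multiplier view of the Hilbert transform is easy to emulate on a periodic grid (our own FFT-based discretization, not from the text): multiplying the DFT spectrum by $-\mathrm{j}\,\mathrm{sign}(\omega)$ should turn a cosine into a sine, a pure phase shift:

```python
import numpy as np

# Discrete Hilbert transform as a Fourier multiplier: spectrum * (-j*sign(w)).
N = 1024
t = np.arange(N) * (2 * np.pi / N)
x = np.cos(5 * t)                         # input: a pure cosine

X = np.fft.fft(x)
w = np.fft.fftfreq(N)                     # only the sign of the frequency matters
Hx = np.fft.ifft(-1j * np.sign(w) * X).real

err = np.max(np.abs(Hx - np.sin(5 * t)))  # expect the corresponding sine
print(err)
```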

The case of the Hilbert transform is covered by Mihlin's multiplier theorem, which provides a sufficient condition on the frequency response of a filter for $L_p$ stability.

Theorem 5 (Mihlin). A Fourier-multiplier operator is bounded in $L_p(\mathbb{R}^d)$ for $1 < p < \infty$ if its frequency response $H: \mathbb{R}^d \to \mathbb{C}$ satisfies the differential estimate
$$\bigl|\omega^{n}\,\partial^{n}H(\omega)\bigr| \le C_n$$
for all $|n| \le \frac{d}{2} + 1$.

Mihlin’s condition, which can absorb some degree of discontinuity at the origin, is easy tocheck in practice. It is stronger than the minimal boundedness requirement for p = 2.

3.4 Probability theory

3.4.1 Probability measures

In this section we give an informal introduction to probability theory. A more precisepresentation can be found in Appendix D.

Probability measures are mathematical constructs that permit us to assign numbers (probabilities) between 0 (almost impossible) and 1 (almost sure) to events. An event is modeled by a subset $A$ of the universal set $\Omega_X$ of all outcomes of a certain experiment $X$, which is assumed to be known. The symbol $\mathscr{P}_X(A)$ then gives the probability that some element of $A$ occurs as the outcome of experiment $X$. Note that, in general, we may assign probabilities only to some subsets of $\Omega_X$. We shall denote the collection of all subsets of $\Omega_X$ for which $\mathscr{P}_X$ is defined as $\mathcal{S}_X$.

The probability measure $\mathscr{P}_X$ then corresponds to a function $\mathcal{S}_X \to [0,1]$. The triple $(\Omega_X, \mathcal{S}_X, \mathscr{P}_X)$ is called a probability space.

Frequently, the collection $\mathcal{S}_X$ contains open and closed sets, as well as their countable unions and intersections, collectively known as Borel sets. In this case we call $\mathscr{P}_X$ a Borel probability measure.

An important application of the notion of probability is in computing the “average” value of some (real- or complex-valued) quantity $f$ that depends on the outcome in $\Omega_X$. This quantity, the computation of which we shall discuss shortly, is called the expected value of $f$, and is denoted as $\mathrm{E}\{f(X)\}$.

An important context for probabilistic computations is when the outcome of $X$ can be encoded as a finite-dimensional numerical sequence, which implies that we can identify $\Omega_X$ with $\mathbb{R}^n$ (or a subset thereof). In this case, within the proper mathematical setting, we can find a (generalized) function $p_X$, called the probability distribution$^5$ or density function (pdf) of $X$, such that
$$\mathscr{P}_X(A) = \int_A p_X(x)\,\mathrm{d}x$$
for suitable subsets $A$ of $\mathbb{R}^n$.$^6$

More generally, the expected value of $f: \Omega_X \to \mathbb{C}$ is here given by
$$\mathrm{E}\{f(X)\} = \int_{\mathbb{R}^n} f(x)\,p_X(x)\,\mathrm{d}x. \qquad (3.9)$$

We say “more generally” because $\mathscr{P}_X(A)$ can be seen as the expected value of the indicator function $\mathbb{1}_A(X)$. Since the integral of a complex-valued $f$ can be written as the sum of its real and imaginary parts, without loss of generality we shall consider only real-valued functions where convenient.
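As a numerical illustration of (3.9) (our own, for a standard Gaussian $X$ in one dimension): the second moment, and the probability of a half-line computed as the expectation of an indicator.

```python
import numpy as np

# E{f(X)} = int f(x) p_X(x) dx for a standard Gaussian pdf p_X.
x = np.linspace(-12.0, 12.0, 200001)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def integrate(y):
    # trapezoidal rule on the grid above
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

second_moment = integrate(x**2 * p)       # E{X^2} = 1 for a standard Gaussian
prob_positive = integrate((x > 0) * p)    # P(X > 0) = E{1_{(0,inf)}(X)} = 1/2
print(second_moment, prob_positive)
```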

When the outcome of the experiment is a vector with infinitely many coordinates (for instance a function $\mathbb{R} \to \mathbb{R}$), it is typically not possible to characterize probabilities with probability distributions. It is nevertheless still possible to define probability measures on subsets of $\Omega_X$, and also to define the integral (average value) of many a function $f: \Omega_X \to \mathbb{R}$. In effect, a definition of the integral of $f$ with respect to the probability measure $\mathscr{P}_X$ is obtained using a limit of “simple” functions (finite weighted sums of indicator functions) that approximate $f$. For this general definition of the integral we use the notation
$$\mathrm{E}\{f(X)\} = \int_{\Omega_X} f(x)\,\mathscr{P}_X(\mathrm{d}x),$$
which we may also use, in addition to (3.9), in the case of a finite-dimensional $\Omega_X$.

In general, given a function $f: \Omega_X \to \Omega_Y$ that defines a new outcome $y \in \Omega_Y$ for every outcome $x \in \Omega_X$ of experiment $X$, one can see the result of applying $f$ to the outcome of $X$ as a new experiment $Y$. The probability of an event $B \subset \Omega_Y$ is the same as the combined probability of all outcomes of $X$ that generate an outcome in $B$. Thus, mathematically,
$$\mathscr{P}_Y(B) = \mathscr{P}_X(f^{-1}(B)) = \mathscr{P}_X \circ f^{-1}(B),$$

5. Probability distributions should not be confused with the distributions in the sense of Schwartz (i.e., generalized functions) that were introduced in Section 3.3. It is important to distinguish the two usages, in part because, as we describe here, in finite dimensions a connection can be made between probability distributions and positive generalized functions.

6. In classical probability theory, pdfs are defined as the Radon-Nikodym derivative of a probability measure with respect to some other measure, typically the Lebesgue measure (as we shall assume). This requires the probability measure to be absolutely continuous with respect to the latter measure. The definition of the generalized pdf given here is more permissive, and also includes measures that are singular with respect to the Lebesgue measure (for instance the Dirac measure of a point, for which the generalized pdf is a Dirac distribution). This generalization relies on identifying measures on the Euclidean space with positive linear functionals.

where the inverse image $f^{-1}(B)$ is defined as
$$f^{-1}(B) = \{x \in \Omega_X : f(x) \in B\}.$$
$\mathscr{P}_Y = \mathscr{P}_X(f^{-1}\,\cdot\,)$ is called the push-forward of $\mathscr{P}_X$ through $f$.
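A small simulation (our own) illustrates the push-forward: for $X$ uniform on $[0,1]$ and $Y = f(X)$ with $f(x) = x^2$, the measure of an interval under $\mathscr{P}_Y$ matches that of its inverse image under $\mathscr{P}_X$.

```python
import numpy as np

# Push-forward check: P_Y((a, b]) = P_X(f^{-1}((a, b])) = sqrt(b) - sqrt(a).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 1_000_000)
Y = X**2                                      # Y = f(X)

a, b = 0.1, 0.4
empirical = np.mean((Y > a) & (Y <= b))       # Monte-Carlo estimate of P_Y
exact = np.sqrt(b) - np.sqrt(a)               # P_X of the inverse image
print(empirical, exact)
```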

3.4.2 Joint probabilities and independence

When two experiments $X$ and $Y$ with probabilities $\mathscr{P}_X$ and $\mathscr{P}_Y$ are considered simultaneously, one can imagine a joint probability space $(\Omega_{X,Y}, \mathcal{S}_{X,Y}, \mathscr{P}_{X,Y})$ that supports both $X$ and $Y$, in the sense that there exist functions $f: \Omega_{X,Y} \to \Omega_X$ and $g: \Omega_{X,Y} \to \Omega_Y$ such that
$$\mathscr{P}_X(A) = \mathscr{P}_{X,Y}(f^{-1}(A)) \quad\text{and}\quad \mathscr{P}_Y(B) = \mathscr{P}_{X,Y}(g^{-1}(B))$$
for all $A \in \mathcal{S}_X$ and $B \in \mathcal{S}_Y$.

The functions $f, g$ above are assumed to be fixed, and the joint event that $A$ occurs for $X$ and $B$ for $Y$ is given by
$$f^{-1}(A) \cap g^{-1}(B).$$

If the outcome of $X$ has no bearing on the outcome of $Y$ and vice versa, then $X$ and $Y$ are said to be independent. In terms of probabilities, this translates into the probability factorization rule
$$\mathscr{P}_{X,Y}(f^{-1}(A) \cap g^{-1}(B)) = \mathscr{P}_X(A)\cdot\mathscr{P}_Y(B) = \mathscr{P}_{X,Y}(f^{-1}(A))\cdot\mathscr{P}_{X,Y}(g^{-1}(B)).$$

The above ideas can be extended to any finite collection of experiments $X_1, \ldots, X_M$ (and even to infinite ones, with appropriate precautions and adaptations).

3.4.3 Characteristic functions in finite dimensions

In finite dimensions, given a probability measure $\mathscr{P}_X$ on $\Omega_X = \mathbb{R}^n$, for any vector $\xi \in \mathbb{R}^n$ we can compute the expected value (integral) of the bounded function $x \mapsto \mathrm{e}^{\mathrm{j}\langle \xi, x\rangle}$. This permits us to define a complex-valued function on $\mathbb{R}^n$ by the formula
$$\hat{p}_X(\xi) = \mathrm{E}\{\mathrm{e}^{\mathrm{j}\langle \xi, X\rangle}\} = \int_{\mathbb{R}^n} \mathrm{e}^{\mathrm{j}\langle \xi, x\rangle}\,p_X(x)\,\mathrm{d}x = \mathcal{F}\{p_X\}(\xi), \qquad (3.10)$$
which corresponds to a slightly different definition of the Fourier transform of the (generalized) probability distribution $p_X$. The convention in probability theory is to define the forward Fourier transform with a positive sign for $\mathrm{j}\langle \xi, x\rangle$, which is the opposite of the convention used in analysis. To avoid confusion, we use the variable $\xi$ as the Fourier variable in probabilistic Fourier transforms [characteristic functions], and $\omega$ in the analytic definition.
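For a standard Gaussian in one dimension, (3.10) gives the classical result $\hat{p}_X(\xi) = \mathrm{e}^{-\xi^2/2}$, which the following sketch (our own quadrature, arbitrary $\xi$) confirms:

```python
import numpy as np

# Characteristic function of a standard Gaussian: E{exp(j*xi*X)} = exp(-xi^2/2).
x = np.linspace(-12.0, 12.0, 200001)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # standard Gaussian pdf

xi = 1.7
integrand = np.exp(1j * xi * x) * p            # note the positive sign convention
char = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))

err = abs(char - np.exp(-xi**2 / 2))
print(err)
```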

One can prove that $\hat{p}_X$, as defined above, is always continuous at $0$ with $\hat{p}_X(0) = 1$, and that it is positive-definite (see Definition 32 in Appendix B).

Remarkably, the converse of the above fact is also true. We record the latter result, which is due to Bochner, together with the former observation, as Theorem 6.

Theorem 6 (Bochner). Let $\hat{p}_X: \mathbb{R}^n \to \mathbb{C}$ be a function that is positive-definite, fulfills $\hat{p}_X(0) = 1$, and is continuous at $0$. Then, there exists a unique Borel probability measure $\mathscr{P}_X$ on $\mathbb{R}^n$ such that
$$\hat{p}_X(\xi) = \int_{\mathbb{R}^n} \mathrm{e}^{\mathrm{j}\langle \xi, x\rangle}\,\mathscr{P}_X(\mathrm{d}x) = \mathrm{E}\{\mathrm{e}^{\mathrm{j}\langle \xi, X\rangle}\}.$$

Conversely, the function specified by (3.10) with $p_X(x) \ge 0$ and $\int_{\mathbb{R}^n} p_X(x)\,\mathrm{d}x = 1$ is positive-definite, uniformly continuous, and such that $|\hat{p}_X(\xi)| \le \hat{p}_X(0) = 1$.

The interesting twist (which is due to Lévy) is that the positive-definiteness of $\hat{p}_X$ together with its continuity at $0$ implies continuity everywhere (as well as boundedness).

Since, by the above theorem, $\hat{p}_X$ uniquely identifies $\mathscr{P}_X$, it is called the characteristic function of the probability measure $\mathscr{P}_X$ (recall that the probability measure $\mathscr{P}_X$ is related to the density $p_X$ by $\mathscr{P}_X(E) = \int_E p_X(x)\,\mathrm{d}x$ for sets $E$ in the $\sigma$-algebra over $\mathbb{R}^n$).

The next theorem characterizes weak convergence of measures on $\mathbb{R}^n$ in terms of their characteristic functions.

Theorem 7 (Lévy's continuity theorem). Let $(\mathscr{P}_{X_i})$ be a sequence of probability measures on $\mathbb{R}^n$ with respective sequence of characteristic functions $(\hat{p}_{X_i})$. If there exists a function $\hat{p}_X$ such that
$$\lim_i \hat{p}_{X_i}(\xi) = \hat{p}_X(\xi)$$
pointwise on $\mathbb{R}^n$, and if, in addition, $\hat{p}_X$ is continuous at $0$, then $\hat{p}_X$ is the characteristic function of a probability measure $\mathscr{P}_X$ on $\mathbb{R}^n$. Moreover, $\mathscr{P}_{X_i}$ converges weakly to $\mathscr{P}_X$, in symbols
$$\mathscr{P}_{X_i} \xrightarrow{w} \mathscr{P}_X,$$
meaning that for any bounded continuous function $f: \mathbb{R}^n \to \mathbb{R}$,
$$\lim_i \mathrm{E}_{X_i}\{f\} = \mathrm{E}_X\{f\}.$$

The converse of the above theorem is also true; namely, if $\mathscr{P}_{X_i} \xrightarrow{w} \mathscr{P}_X$, then $\hat{p}_{X_i}(\xi) \to \hat{p}_X(\xi)$ pointwise.
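A textbook illustration of Theorem 7 (our own, not from the text) is the central limit theorem: for normalized sums of Rademacher variables, the characteristic functions $\cos(\xi/\sqrt{n})^n$ converge pointwise to the Gaussian limit $\mathrm{e}^{-\xi^2/2}$, which is continuous at $0$, so the laws converge weakly to the standard Gaussian.

```python
import numpy as np

# Pointwise convergence of characteristic functions for X_n = sum(W_i)/sqrt(n),
# W_i = +/-1 with probability 1/2 each: cos(xi/sqrt(n))**n -> exp(-xi^2/2).
xi = np.linspace(-3.0, 3.0, 13)
gauss = np.exp(-xi**2 / 2)

errs = []
for n in (10, 100, 10000):
    cf = np.cos(xi / np.sqrt(n))**n          # exact characteristic function of X_n
    errs.append(np.max(np.abs(cf - gauss)))
print(errs)   # shrinking toward 0
```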

3.4.4 Characteristic functionals in infinite dimensions

Given a probability measure $\mathscr{P}_X$ on the continuous dual $\mathcal{X}'$ of some test function space $\mathcal{X}$, one can define an analogue of the finite-dimensional characteristic function, dubbed the characteristic functional of $\mathscr{P}_X$ and denoted as $\widehat{\mathscr{P}}_X$, by means of the identity
$$\widehat{\mathscr{P}}_X(\varphi) = \mathrm{E}\{\mathrm{e}^{\mathrm{j}\langle \varphi, X\rangle}\}. \qquad (3.11)$$

Comparing the above definition with (3.10), one notes that $\mathbb{R}^n$, as the domain of the characteristic function $\hat{p}_X$, is now replaced by the space $\mathcal{X}$ of test functions.

As was the case in finite dimensions, the characteristic functional fulfills two important conditions:

Positive-definiteness: $\widehat{\mathscr{P}}_X$ is positive-definite, in the sense that, for any $N$ and any test functions $\varphi_1, \ldots, \varphi_N$, the $N \times N$ matrix with entries $p_{ij} = \widehat{\mathscr{P}}_X(\varphi_i - \varphi_j)$ is nonnegative definite.

Normalization: $\widehat{\mathscr{P}}_X(0) = 1$.

In view of the finite-dimensional result (Bochner's theorem), it is natural to ask if a condition in terms of continuity can be given also in the infinite-dimensional case, so that any functional $\widehat{\mathscr{P}}_X$ fulfilling this continuity condition in addition to the above two uniquely identifies a probability measure on $\mathcal{X}'$. In the case where $\mathcal{X}$ is a nuclear space (and, in particular, for $\mathcal{X} = \mathcal{S}(\mathbb{R}^d)$ or $\mathcal{D}(\mathbb{R}^d)$, cf. Subsection 3.1.3), such a condition is given by the Minlos-Bochner theorem.

Theorem 8 (Minlos-Bochner). Let $\mathcal{X}$ be a nuclear space and let $\widehat{\mathscr{P}}_X: \mathcal{X} \to \mathbb{C}$ be a functional that is positive-definite in the sense discussed above, fulfills $\widehat{\mathscr{P}}_X(0) = 1$, and is continuous $\mathcal{X} \to \mathbb{C}$. Then, there exists a unique probability measure $\mathscr{P}_X$ on $\mathcal{X}'$ (the continuous dual of $\mathcal{X}$) such that
$$\widehat{\mathscr{P}}_X(\varphi) = \int_{\mathcal{X}'} \mathrm{e}^{\mathrm{j}\langle \varphi, x\rangle}\,\mathscr{P}_X(\mathrm{d}x) = \mathrm{E}\{\mathrm{e}^{\mathrm{j}\langle \varphi, X\rangle}\}.$$

Conversely, the characteristic functional associated with some probability measure $\mathscr{P}_X$ on $\mathcal{X}'$ is positive-definite, continuous over $\mathcal{X}$, and such that $\widehat{\mathscr{P}}_X(0) = 1$.

The practical implication of this result is that one can rely on characteristic functionals to indirectly specify infinite-dimensional measures (most importantly, probabilities of stochastic processes), which are difficult to pin down otherwise. Operationally, the characteristic functional $\widehat{\mathscr{P}}_X(\varphi)$ is nothing but a mathematical rule (e.g., $\widehat{\mathscr{P}}_X(\varphi) = \mathrm{e}^{-\frac{1}{2}\|\varphi\|_2^2}$) that returns a value in $\mathbb{C}$ for any given function $\varphi \in \mathcal{S}$. The truly powerful aspect is that this rule condenses all the information about the statistical distribution of some underlying infinite-dimensional random object $X$. When working with characteristic functionals, we shall see that computing probabilities and deriving various properties of the said processes are all reduced to analytical derivations.
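The Gaussian rule above can be connected to a concrete discretization (our own sketch, not from the text): mimicking white noise on a grid of step $\mathrm{d}r$ by independent $\mathcal{N}(0, \mathrm{d}r)$ increments $W_k$, the exact log-characteristic function of $\sum_k \varphi(r_k)W_k$ is $-\frac{1}{2}\mathrm{d}r\sum_k \varphi(r_k)^2$, which tends to $-\frac{1}{2}\|\varphi\|_2^2$ as the grid is refined:

```python
import numpy as np

# Discrete approximation of the Gaussian characteristic functional at phi:
# exponent = -0.5 * dr * sum(phi(r_k)^2)  -->  -0.5 * ||phi||_2^2.
for m in (100, 1000, 10000):
    r, dr = np.linspace(-8.0, 8.0, m, retstep=True)
    phi = np.exp(-r**2)                       # arbitrary test function
    exponent = -0.5 * dr * np.sum(phi**2)     # exact for the discretized noise
    print(m, exponent)                        # approaches -0.5 * sqrt(pi/2)
```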

3.5 Generalized random processes and fields

In this section, we present an introduction to the theory of generalized random processes, which is concerned with defining probabilities on function spaces, that is, infinite-dimensional vector spaces with some notion of limit and convergence. We have made the point before that the theory of generalized functions is a natural extension of finite-dimensional linear algebra. The same kind of parallel can be drawn between the theory of generalized stochastic processes and conventional probability calculus (which deals with finite-dimensional random vector variables). Therefore, before getting into more detailed explanations, it is instructive to have a look back at Table 3.2, which provides a side-by-side summary of the primary probabilistic concepts that have been introduced so far. The reader is then referred to Table 3.4, which presents a comparison of finite- and infinite-dimensional “innovation models”. To give the basic idea, in finite dimensions, an “innovation” is a vector in $\mathbb{R}^n$ of independent identically distributed (i.i.d.) random variables. An “innovation model” is obtained by transforming such a vector by means of a linear operator (a matrix), which embodies the structure of dependencies of the model. In infinite dimensions, the notion of an i.i.d. vector is replaced by that of a random process with independent values at every point (which we shall call an “innovation process”). The transformation is achieved by applying a continuous linear operator, which constitutes the generalization of a matrix. The characterization of such models is made possible by their characteristic functionals, which, as we saw in the previous section, are the infinite-dimensional equivalents of characteristic functions of random variables.

3.5.1 Generalized random processes as collections of random variables

A generalized stochastic process$^7$ is essentially a randomization of the idea of a generalized function (Section 3.3), in much the same way as an ordinary stochastic process is a randomization of the concept of a function.

At a minimum, the definition of a generalized stochastic process $s$ should permit us to associate probabilistic models with observations made using test functions. In other words,

7. We shall use the terms random/stochastic process and field almost interchangeably. The distinction, ingeneral, lies in the fact that for a random process, the parameter is typically interpreted as time, while for a field,the parameter is typically multi-dimensional and interpreted as spatial or spatio-temporal location.

finite-dimensional | infinite-dimensional

standard Gaussian i.i.d. vector $W = (W_1, \ldots, W_N)$ | standard Gaussian white noise $w$
$\hat{p}_W(\xi) = \mathrm{e}^{-\frac{1}{2}|\xi|^2}$, $\xi \in \mathbb{R}^N$ | $\widehat{\mathscr{P}}_w(\varphi) = \mathrm{e}^{-\frac{1}{2}\|\varphi\|_2^2}$, $\varphi \in \mathcal{S}$
multivariate Gaussian vector $X$ | Gaussian generalized process $s$
$X = \mathrm{A}W$ | $s = \mathrm{A}w$ (for continuous $\mathrm{A}: \mathcal{S}' \to \mathcal{S}'$)
$\hat{p}_X(\xi) = \mathrm{e}^{-\frac{1}{2}|\mathrm{A}^{\mathsf T}\xi|^2}$ | $\widehat{\mathscr{P}}_s(\varphi) = \mathrm{e}^{-\frac{1}{2}\|\mathrm{A}^{*}\varphi\|_2^2}$
general i.i.d. vector $W = (W_1, \ldots, W_N)$ with exponent $f$ | general white noise $w$ with Lévy exponent $f$
$\hat{p}_W(\xi) = \mathrm{e}^{\sum_{i=1}^{N} f(\xi_i)}$ | $\widehat{\mathscr{P}}_w(\varphi) = \mathrm{e}^{\int_{\mathbb{R}^d} f(\varphi(r))\,\mathrm{d}r}$
linear transformation of general i.i.d. random vector $W$ (innovation model) | linear transformation of general white noise $w$ (innovation model)
$X = \mathrm{A}W$ | $s = \mathrm{A}w$
$\hat{p}_X(\xi) = \hat{p}_W(\mathrm{A}^{\mathsf T}\xi)$ | $\widehat{\mathscr{P}}_s(\varphi) = \widehat{\mathscr{P}}_w(\mathrm{A}^{*}\varphi)$

Table 3.4: Comparison of innovation models in finite- and infinite-dimensional settings. See Sections 4.3-4.5 for a detailed explanation.

to any test function $\varphi$ in some suitable test-function space $\mathcal{X}$ is associated a random variable $s(\varphi)$, also often denoted as $\langle\varphi, s\rangle$. This is to be contrasted with an observation $s(t)$ at time $t$, which would be modeled by a random variable in the case of an ordinary stochastic process. We shall denote the probability measure of the random variable $\langle\varphi, s\rangle$ as $\mathscr{P}_{s,\varphi}$. Similarly, to any finite collection of observations $\langle\varphi_i, s\rangle$, $1 \le i \le N$, $N \in \mathbb{N}$, corresponds a joint probability measure $\mathscr{P}_{s,\varphi_1:\varphi_N}$ on $\mathbb{R}^N$ (we shall only consider real-valued processes here, and therefore assume the observations to be real-valued).

Moreover, finite families of observations $\langle\varphi_i, s\rangle$, $1 \le i \le N$, and $\langle\psi_j, s\rangle$, $1 \le j \le M$, need to be consistent or compatible, as explained in Appendix D.4, to ensure that all computations of the probability of an event involving finite observations yield the same value for the probability. In modeling physical phenomena, it is also reasonable to assume some weak form of continuity in the probability of $\langle\varphi, s\rangle$ as a function of $\varphi$.

Mathematically, these requirements are fulfilled by the kind of probabilistic model induced by a cylinder-set probability measure, as discussed in Appendix D.4. In other words, a cylinder-set probability measure provides a consistent probabilistic description for all finite sets of observations of some phenomenon $s$ using test functions $\varphi \in \mathcal{X}$. Furthermore, a cylinder-set probability measure can always be specified via its characteristic functional $\widehat{\mathscr{P}}_s(\varphi) = \mathbb{E}\{e^{j\langle\varphi,s\rangle}\}$, which makes it amenable to analytic computations.

The only conceptual limitation of such a probability model is that, at least a priori, it does not permit us to associate the sample paths of the process with (generalized) functions. Put differently, in this framework, we are not allowed to interpret $s$ as a random entity belonging to the dual $\mathcal{X}'$ of $\mathcal{X}$, since we have not yet defined a proper probability measure on $\mathcal{X}'$. 8 Doing so involves some additional steps.

8. In fact, $\mathcal{X}'$ may very well be too small to support such a description (while the algebraic dual, $\mathcal{X}^*$, can support the measure, by Kolmogorov's extension theorem, but is too large for many practical purposes). An important example is that of white Gaussian noise, which one may conceive of as associating a Gaussian random variable with variance $\|\varphi\|_2^2$ to any test function $\varphi \in L_2$. However, the "energy" of white Gaussian noise is clearly infinite; therefore, it cannot be modeled as a randomly chosen function in $(L_2)' = L_2$.


3.5.2 Generalized random processes as random generalized functions

Fortunately, the above existence and interpretation problem is fully resolved by taking $\mathcal{X}$ to be a nuclear space, thanks to the Minlos-Bochner theorem (Theorem 8). This allows for the extension of the underlying cylinder-set probability measure to a proper (by which here we mean countably additive) probability measure on $\mathcal{X}'$ (the topological dual of $\mathcal{X}$), as sketched out in Appendix D.4.

In this case, the joint probabilities $\mathscr{P}_{s,\varphi_1:\varphi_N}$, $\varphi_1, \ldots, \varphi_N \in \mathcal{X}$, $N \in \mathbb{N}$, corresponding to the random variables $\langle\varphi_i, s\rangle = \langle s, \varphi_i\rangle$ for all possible choices of test functions, collectively define a probability measure $\mathscr{P}_s$ on the infinite-dimensional dual space $\mathcal{X}'$. This means that we can view $s$ as an element drawn randomly from $\mathcal{X}'$ according to the probability law $\mathscr{P}_s$.

In particular, if we take $\mathcal{X}$ to be either $\mathcal{S}(\mathbb{R}^d)$ or $\mathcal{D}(\mathbb{R}^d)$, then our generalized random process/field will have realizations that are distributions in $\mathcal{S}'(\mathbb{R}^d)$ or $\mathcal{D}'(\mathbb{R}^d)$, respectively. We can then also think of $\langle\varphi, s\rangle$ as the measurement of this random object $s$ by means of some sensor (test function) $\varphi$ in $\mathcal{S}$ or $\mathcal{D}$.

Since we shall rely on this fact throughout the book, we reiterate once more that a complete probabilistic characterization of $s$ as a probability measure on the space $\mathcal{X}'$ (dual to some nuclear space $\mathcal{X}$) is provided by its characteristic functional. The truly powerful aspect of the Minlos-Bochner theorem is that the implication goes both ways: any continuous positive-definite functional $\widehat{\mathscr{P}}_s : \mathcal{X} \to \mathbb{C}$ with proper normalization identifies a unique probability measure $\mathscr{P}_s$ on $\mathcal{X}'$. Therefore, to define a generalized random process $s$ with realizations in $\mathcal{X}'$, it suffices to produce a functional $\widehat{\mathscr{P}}_s : \mathcal{X} \to \mathbb{C}$ with the noted properties.

3.5.3 Determination of statistics from the characteristic functional

The characteristic functional of the generalized random process $s$ contains complete information about its probabilistic properties: it can be used to compute all probabilities and to derive or verify the probabilistic properties related to $s$.

Most importantly, it can yield the $N$th-order joint probability density of any set of linear observations of $s$ by a suitable $N$-dimensional inverse Fourier transformation. This follows from a straightforward manipulation in the domain of the characteristic function and is recorded here for further reference.

Proposition 1. Let $Y = (\langle\varphi_1, s\rangle, \ldots, \langle\varphi_N, s\rangle)$ with $\varphi_1, \ldots, \varphi_N \in \mathcal{X}$ be a set of linear measurements of the generalized stochastic process $s$ with characteristic functional $\widehat{\mathscr{P}}_s(\varphi) = \mathbb{E}\{e^{j\langle\varphi,s\rangle}\}$ that is continuous over the function space $\mathcal{X}$. Then,

$$\hat{p}_Y(\xi) = \widehat{\mathscr{P}}_{s,\varphi_1:\varphi_N}(\xi) = \widehat{\mathscr{P}}_s\Big(\sum_{n=1}^{N}\xi_n\varphi_n\Big)$$

and the joint pdf of $Y$ is given by

$$p_Y(y) = \mathcal{F}^{-1}\{\hat{p}_Y\}(y) = \int_{\mathbb{R}^N}\widehat{\mathscr{P}}_s\Big(\sum_{n=1}^{N}\xi_n\varphi_n\Big)\, e^{-j\langle y,\xi\rangle}\,\frac{\mathrm{d}\xi}{(2\pi)^N},$$

where the observation functions $\varphi_n \in \mathcal{X}$ are fixed and $\xi = (\xi_1, \ldots, \xi_N)$ plays the role of the $N$-dimensional Fourier variable.



Proof. The continuity assumption over the function space $\mathcal{X}$ (which need not be nuclear) ensures that the manipulation is legitimate. Starting from the definition of the characteristic function of $Y$, we have

$$\begin{aligned}
\hat{p}_Y(\xi) &= \mathbb{E}\big\{\exp\big(j\langle\xi, y\rangle\big)\big\}\\
&= \mathbb{E}\Big\{\exp\Big(j\sum_{n=1}^{N}\xi_n\langle\varphi_n, s\rangle\Big)\Big\}\\
&= \mathbb{E}\Big\{\exp\Big(j\Big\langle\sum_{n=1}^{N}\xi_n\varphi_n, s\Big\rangle\Big)\Big\} &&\text{(by linearity of the duality product)}\\
&= \widehat{\mathscr{P}}_s\Big(\sum_{n=1}^{N}\xi_n\varphi_n\Big) &&\text{(by definition of } \widehat{\mathscr{P}}_s(\varphi)\text{)}.
\end{aligned}$$

The density $p_Y$ is then obtained by inverse (conjugate) Fourier transformation.
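As a numerical illustration (a sketch with arbitrarily chosen test functions and discretization, not part of the original text), Proposition 1 can be checked by Monte Carlo simulation for the white Gaussian noise of Section 3.5.6, whose characteristic functional is $\widehat{\mathscr{P}}_w(\varphi) = e^{-\frac{1}{2}\|\varphi\|_2^2}$; the duality products $\langle\varphi, w\rangle$ are approximated by Riemann sums:

```python
import numpy as np

rng = np.random.default_rng(0)

# Riemann-sum discretization of the domain [0, 1]
n, dx = 100, 0.01
r = (np.arange(n) + 0.5) * dx

# Two arbitrary test functions
phi1 = np.exp(-((r - 0.3) ** 2) / 0.01)
phi2 = np.cos(2 * np.pi * r)

# Discretized white Gaussian noise: i.i.d. N(0, 1/dx) per cell, so that
# <phi, w> = sum(phi * w) * dx has variance ||phi||_2^2
n_mc = 50_000
w = rng.standard_normal((n_mc, n)) / np.sqrt(dx)
Y = np.stack([(w * phi1).sum(axis=1) * dx, (w * phi2).sum(axis=1) * dx], axis=1)

# Proposition 1 predicts E{exp(j<xi, Y>)} = P_s(xi_1 phi1 + xi_2 phi2)
xi = np.array([0.7, -1.2])
empirical = np.mean(np.exp(1j * (Y @ xi)))
mix = xi[0] * phi1 + xi[1] * phi2
predicted = np.exp(-0.5 * np.sum(mix**2) * dx)   # Gaussian functional
print(abs(empirical - predicted))   # small (Monte Carlo error)
```

The empirical characteristic function of the observation vector and the value of the functional at $\xi_1\varphi_1 + \xi_2\varphi_2$ agree up to Monte Carlo error.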

Similarly, the formalism allows one to retrieve all first- and second-order moments of the generalized stochastic process $s$. To that end, one considers the mean and correlation functionals, defined and computed as

$$M_s(\varphi) := \mathbb{E}\{\langle\varphi, s\rangle\} = (-j)\,\frac{\mathrm{d}}{\mathrm{d}\xi}\,\widehat{\mathscr{P}}_{s,\varphi}(\xi)\Big|_{\xi=0} = (-j)\,\frac{\mathrm{d}}{\mathrm{d}\xi}\,\widehat{\mathscr{P}}_s(\xi\varphi)\Big|_{\xi=0},$$

$$B_s(\varphi_1,\varphi_2) := \mathbb{E}\{\langle\varphi_1, s\rangle\langle\varphi_2, s\rangle\} = (-j)^2\,\frac{\partial^2}{\partial\xi_1\,\partial\xi_2}\,\widehat{\mathscr{P}}_{s,\varphi_1,\varphi_2}(\xi_1,\xi_2)\Big|_{\xi_1,\xi_2=0} = (-j)^2\,\frac{\partial^2}{\partial\xi_1\,\partial\xi_2}\,\widehat{\mathscr{P}}_s(\xi_1\varphi_1+\xi_2\varphi_2)\Big|_{\xi_1,\xi_2=0}.$$

When the space of test functions is nuclear ($\mathcal{X} = \mathcal{S}(\mathbb{R}^d)$ or $\mathcal{D}(\mathbb{R}^d)$) and the above quantities are well defined, we can find generalized functions $m_s$ (the generalized mean) and $c_s$ (the generalized autocorrelation function) such that

$$M_s(\varphi) = \int_{\mathbb{R}^d}\varphi(r)\, m_s(r)\,\mathrm{d}r, \tag{3.12}$$

$$B_s(\varphi_1,\varphi_2) = \int_{\mathbb{R}^d}\int_{\mathbb{R}^d}\varphi_1(r)\,\varphi_2(s)\, c_s(r,s)\,\mathrm{d}r\,\mathrm{d}s. \tag{3.13}$$

The first identity is simply a consequence of $M_s$ being a continuous linear functional on $\mathcal{X}$, while the second is an application of Schwartz' kernel theorem (Theorem 2).
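The differentiation formulas are easy to verify numerically (a sketch of ours, assuming the Gaussian case of Section 3.5.6, where $\widehat{\mathscr{P}}_s(\varphi) = e^{-\frac{1}{2}\|\varphi\|_2^2}$ and hence $B_s(\varphi_1,\varphi_2) = \langle\varphi_1,\varphi_2\rangle$); finite differences stand in for the partial derivatives at the origin:

```python
import numpy as np

# Gaussian white noise: P_s(phi) = exp(-||phi||^2/2), discretized on [0, 1]
n, dx = 200, 0.005
r = np.arange(n) * dx
phi1, phi2 = np.sin(2 * np.pi * r), np.exp(-r)

def P(x1, x2):
    # characteristic function of (<phi1, s>, <phi2, s>), i.e. P_s(x1*phi1 + x2*phi2)
    phi = x1 * phi1 + x2 * phi2
    return np.exp(-0.5 * np.sum(phi**2) * dx)

# B_s(phi1, phi2) = (-j)^2 * d^2/dx1 dx2 of P at the origin;
# a central finite difference approximates the mixed partial derivative
h = 1e-3
B = -(P(h, h) - P(h, -h) - P(-h, h) + P(-h, -h)) / (4 * h**2)
inner = np.sum(phi1 * phi2) * dx           # <phi1, phi2>

print(B, inner)   # both approximate <phi1, phi2>
```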

3.5.4 Operations on generalized stochastic processes

In constructing stochastic models, it is of interest to separate the essential randomness of the models (the "innovation") from their deterministic structure. Our way of approaching this objective is by encoding the random part in a characteristic functional $\widehat{\mathscr{P}}_w$, and the deterministic structure of dependencies in an operator $\mathrm{U}$ (or, equivalently, in its adjoint $\mathrm{U}^*$). In the following paragraphs, we first review the mathematics of this construction before we come back to, and clarify, the said interpretation. The concepts presented here in an abstract form are illustrated and made intuitive in the remainder of the book.

Given a continuous linear operator $\mathrm{U} : \mathcal{X} \to \mathcal{Y}$ with continuous adjoint $\mathrm{U}^* : \mathcal{Y}' \to \mathcal{X}'$, where $\mathcal{X}, \mathcal{Y}$ need not be nuclear, and a functional

$$\widehat{\mathscr{P}}_w : \mathcal{Y} \to \mathbb{C}$$


that satisfies the three conditions of Theorem 8 (continuity, positive-definiteness, and normalization), we obtain a new functional

$$\widehat{\mathscr{P}}_s : \mathcal{X} \to \mathbb{C}$$

fulfilling the same properties by composing $\widehat{\mathscr{P}}_w$ and $\mathrm{U}$ as per

$$\widehat{\mathscr{P}}_s(\varphi) = \widehat{\mathscr{P}}_w(\mathrm{U}\varphi) \quad\text{for all } \varphi \in \mathcal{X}. \tag{3.14}$$

Writing
$$\widehat{\mathscr{P}}_s(\xi\varphi) = \mathbb{E}\{e^{j\xi\langle\varphi,s\rangle}\} = \hat{p}_{\langle\varphi,s\rangle}(\xi)$$
and
$$\widehat{\mathscr{P}}_w(\xi\mathrm{U}\varphi) = \mathbb{E}\{e^{j\xi\langle\mathrm{U}\varphi,w\rangle}\} = \hat{p}_{\langle\mathrm{U}\varphi,w\rangle}(\xi)$$
for generalized processes $s$ and $w$, we deduce that the random variables $\langle\varphi, s\rangle$ and $\langle\mathrm{U}\varphi, w\rangle$ have the same characteristic functions and therefore satisfy

$$\langle\varphi, s\rangle = \langle\mathrm{U}\varphi, w\rangle \quad\text{in probability law}.$$

The manipulation that led to Proposition 1 shows that a similar relation holds, more generally, for any finite collection of observations $\langle\varphi_i, s\rangle$ and $\langle\mathrm{U}\varphi_i, w\rangle$, $1 \le i \le N$, $N \in \mathbb{N}$.

Therefore, symbolically at least, by the definition of the adjoint $\mathrm{U}^* : \mathcal{Y}' \to \mathcal{X}'$ of $\mathrm{U}$, we may write

$$\langle\varphi, s\rangle = \langle\varphi, \mathrm{U}^* w\rangle.$$

This seems to indicate that, in a sense, the random model $s$, which we have defined using (3.14), can be interpreted as the application of $\mathrm{U}^*$ to the original random model $w$. However, things are complicated by the fact that, unless $\mathcal{X}$ and $\mathcal{Y}$ are nuclear spaces, we may not be able to interpret $w$ and $s$ as random elements of $\mathcal{Y}'$ and $\mathcal{X}'$, respectively. Therefore, the application of $\mathrm{U}^* : \mathcal{Y}' \to \mathcal{X}'$ to $w$ should be understood to be merely a formal construction.

On the other hand, by requiring $\mathcal{X}$ to be nuclear and $\mathcal{Y}$ to be either nuclear or completely normed, we see immediately that $\widehat{\mathscr{P}}_s : \mathcal{X} \to \mathbb{C}$ fulfills the requirements of the Minlos-Bochner theorem and thereby defines a generalized random process with realizations in $\mathcal{X}'$.

The previous discussion suggests the following approach to defining generalized random processes: take a continuous positive-definite functional $\widehat{\mathscr{P}}_w : \mathcal{Y} \to \mathbb{C}$ on some (nuclear or completely normed) space $\mathcal{Y}$. Then, for any continuous operator $\mathrm{U}$ defined from a nuclear space $\mathcal{X}$ into $\mathcal{Y}$, the composition

$$\widehat{\mathscr{P}}_s = \widehat{\mathscr{P}}_w(\mathrm{U}\,\cdot)$$

is the characteristic functional of a generalized random process $s$ with realizations in $\mathcal{X}'$.

In subsequent chapters, we shall mostly focus on the situation where $\mathrm{U} = \mathrm{L}^{-1*}$ and $\mathrm{U}^* = \mathrm{L}^{-1}$ for some given (whitening) operator $\mathrm{L}$ that admits a continuous inverse in the suitable topology, the typical choice of spaces being $\mathcal{X} = \mathcal{S}(\mathbb{R}^d)$ and $\mathcal{Y} = L_p(\mathbb{R}^d)$. The underlying hypothesis is that one is able to invert the linear operator $\mathrm{U}$ and to recover $w$ from $s$, which is formally written as $w = \mathrm{L}s$; that is,

$$\langle\varphi, w\rangle = \langle\varphi, \mathrm{L}s\rangle, \quad\text{for all } \varphi \in \mathcal{Y}.$$

The above ideas are summarized in Figure 3.1.


Figure 3.1: Definition of linear transformation of generalized stochastic processes using characteristic functionals ($\widehat{\mathscr{P}}_s(\varphi) = \widehat{\mathscr{P}}_w(\mathrm{U}\varphi)$ with $\mathrm{U} = \mathrm{L}^{-1*}$, so that $s = \mathrm{L}^{-1}w$). In this book, we shall focus on innovation models where $w$ is a white noise process. The operator $\mathrm{L} = \mathrm{U}^{-1*}$ (if it exists) is called the whitening operator of $s$ since $\mathrm{L}s = w$.

3.5.5 Innovation processes

In a certain sense, the most fundamental class of generalized random processes we can use to play the role of $w$ in the construction of Section 3.5.4 are those with independent values at every point in $\mathbb{R}^d$ [GV64, Chap. 4, pp. 273-288]. The reason is that we can then isolate the spatiotemporal dependency of the probabilistic model in the mixing operator ($\mathrm{U}^*$ in Figure 3.1), and attribute randomness to independent contributions (innovations) at geometrically distinct points in the domain. We call such a construction an innovation model.

Let us attempt to make the notion of independence at every point more precise in the context of generalized stochastic processes, where the objects of study are, more accurately, not pointwise observations, but rather observations made through scalar products with test functions. To qualify a generalized process $w$ as having independent values at every point, we therefore require that the random variables $\langle\varphi_1, w\rangle$ and $\langle\varphi_2, w\rangle$ be independent whenever the test functions $\varphi_1$ and $\varphi_2$ have disjoint supports.

Since the joint characteristic function of independent random variables factorizes (is separable), we can formulate the above property in terms of the characteristic functional $\widehat{\mathscr{P}}_w$ of $w$ as

$$\widehat{\mathscr{P}}_w(\varphi_1+\varphi_2) = \widehat{\mathscr{P}}_w(\varphi_1)\,\widehat{\mathscr{P}}_w(\varphi_2).$$

An important class of characteristic functionals fulfilling this requirement are those that can be written in the form

$$\widehat{\mathscr{P}}_w(\varphi) = e^{\int_{\mathbb{R}^d} f(\varphi(r))\,\mathrm{d}r}. \tag{3.15}$$

To have $\widehat{\mathscr{P}}_w(0) = 1$ (normalization), we require that $f(0) = 0$. The requirement of positive-definiteness narrows down the class of admissible functions $f$ much further, practically to those identified by the Lévy-Khinchine formula. This will be the subject of the greater part of our next chapter.
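A small numerical check makes the factorization tangible (our own sketch, with the arbitrarily chosen exponent $f(\xi) = -\log(1+\xi^2)$, a valid Lévy exponent as shown in Chapter 4): for a discretized functional of the form (3.15), test functions with disjoint supports factorize exactly because $f(0) = 0$, while overlapping ones in general do not:

```python
import numpy as np

def char_functional(f, phi, dx):
    # Discretization of P_w(phi) = exp( integral of f(phi(r)) dr ), as in (3.15)
    return np.exp(np.sum(f(phi)) * dx)

f = lambda xi: -np.log(1 + xi**2)      # Levy exponent of a Laplace-type noise

n, dx = 400, 0.01                      # grid on [0, 4)
r = np.arange(n) * dx
phi1 = np.where(r < 1.0, np.sin(np.pi * r), 0.0)           # support in [0, 1]
phi2 = np.where(r >= 2.0, np.exp(-((r - 3.0) ** 2)), 0.0)  # support in [2, 4)

lhs = char_functional(f, phi1 + phi2, dx)
rhs = char_functional(f, phi1, dx) * char_functional(f, phi2, dx)
print(np.isclose(lhs, rhs))    # True: disjoint supports factorize (f(0) = 0)

phi3 = np.ones(n)              # overlaps with phi1
print(np.isclose(char_functional(f, phi1 + phi3, dx),
                 char_functional(f, phi1, dx) * char_functional(f, phi3, dx)))
# False: no factorization for overlapping supports
```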

3.5.6 Example: Filtered white Gaussian noise

In the above framework, we can define white Gaussian noise or innovation on $\mathbb{R}^d$ as a random element of the space of Schwartz generalized functions, $\mathcal{S}'(\mathbb{R}^d)$, whose characteristic functional is given by

$$\widehat{\mathscr{P}}_w(\varphi) = e^{-\frac{1}{2}\|\varphi\|_2^2}.$$

Note that this functional is a special instance of (3.15) with $f(\xi) = -\frac{1}{2}\xi^2$. The Gaussian appellation is justified by observing that, for any $N$ test functions $\varphi_1, \ldots, \varphi_N$, the random variables $\langle\varphi_1, w\rangle, \ldots, \langle\varphi_N, w\rangle$ are jointly Gaussian. Indeed, we can apply Proposition 1 to obtain the joint characteristic function

$$\widehat{\mathscr{P}}_{\varphi_1:\varphi_N}(\xi) = \exp\bigg(-\frac{1}{2}\Big\|\sum_{i=1}^{N}\xi_i\varphi_i\Big\|_2^2\bigg).$$

By taking the inverse Fourier transform of the above expression, we find that the random variables $\langle\varphi_i, w\rangle$, $i = 1, \ldots, N$, have a multivariate Gaussian distribution with mean $0$ and covariance matrix with entries

$$C_{ij} = \langle\varphi_i, \varphi_j\rangle.$$

The independence of $\langle\varphi_1, w\rangle$ and $\langle\varphi_2, w\rangle$ is obvious whenever $\varphi_1$ and $\varphi_2$ have disjoint supports. This justifies calling the process white. 9 In this special case, even mere orthogonality of $\varphi_1$ and $\varphi_2$ is enough for independence, since for $\varphi_1 \perp \varphi_2$ we have $C_{ij} = 0$.

From Formulas (3.12) and (3.13), we also find that $w$ has zero mean and "correlation function" $c_w(r, s) = \delta(r - s)$, which should also be familiar. In fact, this last expression is sometimes used to formally "define" white Gaussian noise.

A filtered white Gaussian noise is obtained by applying a continuous convolution (i.e., LSI) operator $\mathrm{U}^* : \mathcal{S}' \to \mathcal{S}'$ to the Gaussian innovation in the sense described in Section 3.5.4.

Let us denote the convolution kernel of the operator $\mathrm{U} : \mathcal{S} \to \mathcal{S}$ (the adjoint of $\mathrm{U}^*$) by $h$. 10 The convolution kernel of $\mathrm{U}^* : \mathcal{S}' \to \mathcal{S}'$ is then $h^\vee$. Following Section 3.5.4, we find the following characteristic functional for the filtered process $\mathrm{U}^* w = h^\vee * w$:

$$\widehat{\mathscr{P}}_{\mathrm{U}^*w}(\varphi) = e^{-\frac{1}{2}\|h*\varphi\|_2^2}.$$

In turn, it yields the following mean and correlation functions

$$m_{\mathrm{U}^*w}(r) = 0, \qquad c_{\mathrm{U}^*w}(r,s) = \big(h * h^\vee\big)(r - s),$$

as expected.
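These second-order predictions are easy to confirm by simulation (a sketch of ours; the exponential kernel and grid parameters are arbitrary choices). The filtered noise is synthesized by circular convolution, and its empirical autocorrelation is compared with the circular version of $(h * h^\vee)$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, dx, n_mc = 256, 0.1, 5000
t = np.arange(n) * dx
h = np.exp(-t)                         # hypothetical causal smoothing kernel

# Discretized white Gaussian noise (i.i.d. N(0, 1/dx)) and filtered process
w = rng.standard_normal((n_mc, n)) / np.sqrt(dx)
s = np.fft.irfft(np.fft.rfft(h) * np.fft.rfft(w, axis=1), n=n, axis=1) * dx

# Empirical E{s(t) s(t + m dx)}, averaged (circularly) over t and realizations
emp = np.fft.irfft(np.abs(np.fft.rfft(s, axis=1)) ** 2, n=n, axis=1).mean(axis=0) / n
# Prediction: (h * h_check)(m dx), i.e. the (circular) autocorrelation of h
theory = np.fft.irfft(np.abs(np.fft.rfft(h)) ** 2, n=n) * dx

print(np.max(np.abs(emp - theory)))    # small Monte Carlo error
```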

9. Our notion of whiteness in this book goes further than having a white spectrum. By whiteness, we mean that the process is stationary and has truly independent (not merely uncorrelated) values over disjoint sets.

10. Recall that, for the convolution to map back into $\mathcal{S}$, $h$ needs to have a smooth Fourier transform, which implies rapid decay in the temporal or spatial domain. This is the case, in particular, for any rational transfer function that lacks purely imaginary poles.


Chapter 4

Continuous-domain innovation models

The stochastic processes that we wish to characterize are those generated by linear transformation of non-Gaussian white noise. If we were operating in the discrete domain and restricting ourselves to a finite number of dimensions, we would be able to use any sequence of i.i.d. random variables $w_n$ as system input and rely on conventional multivariate statistics to characterize the output. This strongly suggests that the specification of the mixing matrix ($\mathrm{L}^{-1}$) and of the probability density function (pdf) of the innovation is sufficient to obtain a complete description of a linear stochastic process, at least in the discrete setting.

But our goal is more ambitious, since we place ourselves in the context of continuously-defined processes. There, the situation is not quite as straightforward because: 1) we are dealing with infinite-dimensional objects; 2) it is much harder to properly define the notion of continuous-domain white noise; and 3) there are theoretical restrictions on the class of admissible innovations. While this calls for an advanced mathematical machinery, the payoff is that the continuous-domain formalism lends itself better to analytical computations, by virtue of the powerful tools of functional and harmonic analysis. Another benefit is that the non-Gaussian members of the family are necessarily sparse, as a consequence of the theory, which rests upon the powerful characterization and existence theorems of Lévy-Khinchine, Minlos-Bochner, and Gelfand-Vilenkin.

As in the subsequent chapters, we start by providing some intuition in the first section and then proceed with a more formal characterization. Section 4.2 is devoted to an in-depth investigation of Lévy exponents, which are intimately tied to the family of infinitely divisible distributions in the classical (scalar) theory of probability. What is nonstandard here and fundamental to our argumentation is the link that is made between infinite divisibility and sparsity in 4.2.3. In Section 4.3, we apply those results to the Fourier-domain characterization of a multivariate linear model driven by an infinitely divisible noise vector, which primarily serves as preparation for the subsequent infinite-dimensional generalization. In Section 4.4, we extend the formulation to the continuous domain, which results in the proper specification of white Lévy noise $w$ (or non-Gaussian innovations) as a generalized stochastic process (in the sense of Gelfand and Vilenkin) with independent "values" at every point. The fundamental result is that a given brand of noise (or innovations) is uniquely specified by its Lévy exponent $f(\omega)$ via its characteristic form $\widehat{\mathscr{P}}_w(\varphi)$. Finally, in Section 4.5, we characterize the statistical effect of the mixing operator $\mathrm{L}^{-1}$ (general linear model) and provide mathematical conditions on $f$ and $\mathrm{L}$ that ensure that the resulting process $s = \mathrm{L}^{-1}w$ is mathematically well-defined.


Figure 4.1: Examples of canonical, infinitely divisible probability density functions and corresponding observations of a continuous-domain white-noise process through an array of non-overlapping rectangular integration windows: (a) Gaussian distribution (not sparse), (b) Laplace distribution (moderately sparse), (c) compound-Poisson distribution (finite rate of innovation), (d) Cauchy distribution (ultra-sparse = heavy-tailed with unbounded variance).

4.1 Introduction: From Gaussian to sparse probability distributions

Intuitively, a continuous-domain white-noise process is formed by the juxtaposition of a continuum of i.i.d. random contributions. Since these atoms of randomness are infinitesimal, the realizations (a.k.a. sample paths) of such processes are highly singular (discontinuous), meaning that they do not admit a classical interpretation as (random) functions of the index variable $r \in \mathbb{R}^d$. Consequently, the random variables associated with the sample values $w(r_0)$ are undefined. The only concrete way of observing such noises is by probing them through some localized analysis window $\varphi(\cdot - r_0)$ centered around some location $r_0$. This produces a scalar quantity $X = \langle\varphi(\cdot - r_0), w\rangle$, which is a conventional random variable with some pdf $p_X$. Note that $p_X$ does not depend upon the position $r_0$, which reflects the fact that $w$ is stationary. In order to get some sense of the variety of achievable random patterns, we propose to convert the continuous-domain process $w$ into a corresponding i.i.d. sequence $X_k$ (discrete white noise) by selecting a sequence of non-overlapping rectangular windows:

$$X_k = \langle \mathrm{rect}(\cdot - k), w\rangle.$$

The concept is illustrated in Figure 4.1. The main point that will be made clearer in the sequel is that there is a one-to-one correspondence between the pdf of $X_k$ (the so-called canonical pdf $p_{\mathrm{id}}(x)$, the pdf of the observation through the unit rectangular window) and the complete functional description of $w$ via its characteristic form, which we shall investigate in Section 4.4. What is more remarkable (and distinct from the discrete setting) is that this canonical pdf cannot be arbitrary; the theory dictates that it must be part of the family of infinitely divisible (id) laws (see Section 4.2).

The prime example of an id pdf is the Gaussian distribution illustrated in Figure 4.1a. As already mentioned, it is the only non-sparse member of the family. All other distributions either exhibit a mass density at the origin (like the compound-Poisson example in Figure 4.1c with $\mathrm{Prob}(x = 0) = e^{-\lambda} = 0.75$ and Gaussian amplitude distribution), or a slower rate of decay at infinity (fat-tail behavior). The Laplacian of Figure 4.1b results in the mildest possible form of sparsity; indeed, it can be proven that there is a gap between the Gaussian and the other members of the family, in the sense that there is no id distribution with $p(x) = e^{-O(|x|^{1+\epsilon})}$ for $0 < \epsilon < 1$. In other words, a non-Gaussian $p_{\mathrm{id}}(x)$ is constrained to decay like $e^{-\lambda|x|}$ or slower; typically, like $O(1/|x|^r)$ with $r > 1$ (inverse polynomial/algebraic decay). The sparsest example in Figure 4.1 is provided by the Cauchy distribution $p_{\mathrm{Cauchy}}(x) = \frac{1}{\pi(x^2+1)}$, which is part of the symmetric-alpha-stable (S$\alpha$S) family (here, $\alpha = 1$). The S$\alpha$S distributions with $\alpha \in (0,2)$ are notorious for their heavy-tail behavior and for the fact that their moments $\mathbb{E}\{|x|^p\}$ are unbounded for $p \ge \alpha$.
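The four noises of Figure 4.1 are easy to reproduce numerically (a sketch of ours; the parameters, such as $\lambda = -\log 0.75$ to match $\mathrm{Prob}(x = 0) = 0.75$, follow the figure's description). The empirical excess kurtosis gives a crude sparsity index: near zero for the Gaussian, moderate for the Laplace, larger for the compound Poisson, and exploding for the Cauchy:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000                                    # number of windows X_k

gauss = rng.standard_normal(n)
laplace = rng.laplace(scale=1 / np.sqrt(2), size=n)   # unit-variance Laplace
lam = -np.log(0.75)                         # so that Prob(X = 0) = 0.75
counts = rng.poisson(lam, size=n)           # number of impulses per window
amps = rng.standard_normal(counts.sum())    # Gaussian amplitudes
poisson = np.bincount(np.repeat(np.arange(n), counts),
                      weights=amps, minlength=n)      # compound Poisson
cauchy = rng.standard_cauchy(n)

kurts = {}
for name, x in [("Gaussian", gauss), ("Laplace", laplace),
                ("compound Poisson", poisson), ("Cauchy", cauchy)]:
    kurts[name] = np.mean(x**4) / np.mean(x**2) ** 2 - 3   # excess kurtosis
    print(f"{name:17s} excess kurtosis = {kurts[name]:10.1f}")
```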

4.2 Lévy exponents and infinitely divisible distributions

The investigation of sparse stochastic processes requires a solid understanding of the classical notions of Lévy exponents and infinite divisibility, which constitute the pillars of our formulation. This section provides a self-contained presentation of the required mathematical background. It also brings out the link with sparsity.

Definition 5 (Lévy exponent). A continuous, complex-valued function $f : \mathbb{R} \to \mathbb{C}$ such that $f(0) = 0$ is a valid Lévy exponent iff it is conditionally positive-definite of order one; that is,

$$\sum_{m=1}^{N}\sum_{n=1}^{N} f(\omega_m - \omega_n)\,\xi_m\overline{\xi_n} \ge 0$$

under the condition $\sum_{m=1}^{N}\xi_m = 0$, for every possible choice of $\omega_1, \ldots, \omega_N \in \mathbb{R}$, $\xi_1, \ldots, \xi_N \in \mathbb{C}$, and $N \in \mathbb{Z}_+$.

The importance of Lévy exponents in mathematical statistics is that they are tightly linked with the property of infinite divisibility.

Definition 6 (Infinite divisibility). A random variable $X$ with generic pdf $p_{\mathrm{id}}$ is infinitely divisible (id) iff, for any $N \in \mathbb{Z}_+$, there exist i.i.d. random variables $X_1, \ldots, X_N$ such that $X$ has the same distribution as $X_1 + \cdots + X_N$.

The foundation of the theory of such random variables is that their characteristic functions are in one-to-one correspondence with Lévy exponents. While the better-known formulation of this equivalence is provided by the Lévy-Khinchine theorem (Theorem 7), we first express it in functional terms, building upon the work of three giants of harmonic analysis: Lévy, Bochner, and Schoenberg.

Theorem 6 (Lévy-Schoenberg). Let $\hat{p}_{\mathrm{id}}(\omega) = \mathbb{E}\{e^{j\omega X}\} = \int_{\mathbb{R}} e^{j\omega x}\, p_{\mathrm{id}}(x)\,\mathrm{d}x$ be the characteristic function of an infinitely divisible (id) random variable $X$. Then,

$$f(\omega) = \log \hat{p}_{\mathrm{id}}(\omega)$$

is a Lévy exponent in the sense of Definition 5. Conversely, if $f$ is a valid Lévy exponent, then the inverse Fourier integral

$$p_{\mathrm{id}}(x) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{f(\omega)}\, e^{-j\omega x}\,\mathrm{d}\omega$$

yields the pdf of an id random variable.

The proof is given in the supplementary material in Section 4.2.4. As for the latter implication, we observe that the condition $f(0) = 0 \Leftrightarrow \hat{p}_{\mathrm{id}}(0) = 1$ implies that $\int_{\mathbb{R}} p_{\mathrm{id}}(x)\,\mathrm{d}x = 1$, while the positive-definiteness ensures that $p_{\mathrm{id}}(x) \ge 0$, so that it is a valid pdf.
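For a concrete check (our own numerics, assuming the Laplace example used throughout this chapter), take $f(\omega) = -\log(1+\omega^2)$, so that $e^{f(\omega)} = 1/(1+\omega^2)$; the inverse Fourier integral should then return the Laplace pdf $\frac{1}{2}e^{-|x|}$:

```python
import numpy as np

# f(w) = -log(1 + w^2): Levy exponent of the Laplace law, e^{f(w)} = 1/(1 + w^2)
w = np.linspace(-200.0, 200.0, 400001)
dw = w[1] - w[0]
char = 1.0 / (1.0 + w**2)              # = e^{f(w)}

vals = {}
for x in (0.0, 0.5, 2.0):
    # Riemann sum for (1/2pi) * integral of e^{f(w)} e^{-jwx} dw
    vals[x] = np.real(np.sum(char * np.exp(-1j * w * x))) * dw / (2 * np.pi)
    print(x, vals[x], 0.5 * np.exp(-abs(x)))   # numerical vs closed form
```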

4.2.1 Canonical Lévy-Khinchine representation

The second, more explicit statement of the announced equivalence with id distributions capitalizes on the property that Lévy exponents admit a canonical representation in terms of their so-called Lévy density. 1

Definition 7 (Lévy density). A nonnegative function $v : \mathbb{R} \to \mathbb{R}_+$ is called a Lévy density if it satisfies the admissibility condition

$$\int_{\mathbb{R}} \min(a^2, 1)\, v(a)\,\mathrm{d}a < \infty. \tag{4.1}$$

As in the case of a pdf, the density $v$ is implicitly linked to a measure, which allows for the inclusion of isolated Dirac impulses (discrete part of the measure).

Theorem 7 (Lévy-Khinchine). A probability distribution $p_{\mathrm{id}}$ is infinitely divisible (id) if and only if its characteristic function can be written as

$$\hat{p}_{\mathrm{id}}(\omega) = \int_{\mathbb{R}} p_{\mathrm{id}}(x)\, e^{j\omega x}\,\mathrm{d}x = \exp\big(f(\omega)\big) \tag{4.2}$$

with

$$f(\omega) = jb_1'\omega - \frac{b_2\omega^2}{2} + \int_{\mathbb{R}\setminus\{0\}}\big(e^{ja\omega} - 1 - ja\omega\,\mathbb{1}_{|a|<1}(a)\big)\, v(a)\,\mathrm{d}a, \tag{4.3}$$

where $b_1' \in \mathbb{R}$ and $b_2 \in \mathbb{R}_+$ are some arbitrary constants and $v$ is an admissible Lévy density; $\mathbb{1}_{|a|<1}(a)$ is the indicator function of the set $\Omega = \{a \in \mathbb{R} : |a| < 1\}$ (i.e., $\mathbb{1}_\Omega(a) = 1$ if $a \in \Omega$ and $\mathbb{1}_\Omega(a) = 0$ otherwise).

The admissibility condition (4.1) guarantees that the right-hand-side integral in (4.3) is well-defined; this follows from the bounds $|e^{ja\omega} - 1 - ja\omega| < a^2\omega^2$ and $|e^{ja\omega} - 1| < \min(|a\omega|, 2)$. An important aspect of the theory is that this allows for (non-integrable) Lévy densities with a singular behavior around the origin; for instance, $v(a) = O(1/|a|^{2+\epsilon})$ with $\epsilon \in [0,1)$ as $a \to 0$.

The connection with Theorem 6 is that the Lévy-Khinchine expansion (4.3) provides a complete characterization of the conditionally positive-definite functions of order one, as specified in Definition 5. This theme is further developed in Appendix B, which contains the proof of the above statement and also makes interesting links with theoretical results that are fundamental to machine learning and approximation theory.

1. In most mathematical texts, the Lévy-Khinchine decomposition is formulated in terms of a Lévy measure rather than a Lévy density. Even though Lévy measures need not always have a density in the sense of the Radon-Nikodym derivative with respect to the Lebesgue measure (i.e., as an ordinary function), following Bourbaki we may still identify them with positive linear functionals, which we represent notationally as integrals against a "generalized" density: $V(E) = \int_E v(a)\,\mathrm{d}a$ for any set $E$ in the Borel algebra on $\mathbb{R}\setminus\{0\}$.

In Section 4.4, we shall indicate how id distributions (or, equivalently, Lévy exponents) can be used to specify an extended family of continuous-domain white-noise processes. In that context, we shall typically require that $p_{\mathrm{id}}$ has a well-defined first-order absolute moment and/or that it is symmetric with respect to the origin, which leads to the following simplifications of the canonical representation.

Corollary 2. Let $p_{\mathrm{id}}$ be an infinitely divisible pdf whose characteristic function is given by $\hat{p}_{\mathrm{id}}(\omega) = e^{f(\omega)}$. Then, depending on the properties of $p_{\mathrm{id}}$ (or, equivalently, on the Lévy density $v$), the Lévy exponent $f$ admits the following Lévy-Khinchine-type representations:

1. $p_{\mathrm{id}}$ id and symmetric (i.e., $p_{\mathrm{id}}(x) = p_{\mathrm{id}}(-x)$) if and only if

$$f(\omega) = -\frac{b_2\omega^2}{2} + \int_{\mathbb{R}\setminus\{0\}}\big(\cos(a\omega) - 1\big)\, v(a)\,\mathrm{d}a \tag{4.4}$$

with $v(a) = v(-a)$.

2. $p_{\mathrm{id}}$ id with $\int_{\mathbb{R}}|x|\,p_{\mathrm{id}}(x)\,\mathrm{d}x < \infty$ if and only if

$$f(\omega) = jb_1\omega - \frac{b_2\omega^2}{2} + \int_{\mathbb{R}\setminus\{0\}}\big(e^{ja\omega} - 1 - ja\omega\big)\, v(a)\,\mathrm{d}a \tag{4.5}$$

with $\int_{|a|>1}|a|\,v(a)\,\mathrm{d}a < \infty$.

3. $p_{\mathrm{id}}$ id with $\int_{\mathbb{R}\setminus\{0\}}|a|\,v(a)\,\mathrm{d}a < \infty$ if and only if

$$f(\omega) = jb_1''\omega - \frac{b_2\omega^2}{2} + \int_{\mathbb{R}\setminus\{0\}}\big(e^{ja\omega} - 1\big)\, v(a)\,\mathrm{d}a, \tag{4.6}$$

where $b_1 \in \mathbb{R}$, $b_2 \in \mathbb{R}_+$, $b_1'' = b_1 - \int_{\mathbb{R}\setminus\{0\}} a\, v(a)\,\mathrm{d}a$, and $v(a) \ge 0$ is an admissible Lévy density.

These are obtained by direct manipulation of (4.3) with $b_1' = b_1 + \int_{|a|\ge 1} a\, v(a)\,\mathrm{d}a$. Equation (4.4) is valid in all generality, provided that we interpret the integral as a Cauchy principal-value limit to handle potential (symmetric) singularities around the origin. The Lévy-Khinchine formulas (4.4) and (4.5) are fundamental because they give an explicit, constructive characterization of the noise functionals that are central to our formulation. From now on, we rely on Corollary 2 to specify admissible Lévy exponents: the parameters $(b_1, b_2, v)$ will be referred to as the Lévy triplet of $f(\omega)$.
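A simple instance of representation (4.6) is the compound-Poisson case with finite Lévy density $v(a) = \lambda\, p_A(a)$, for which $f(\omega) = \lambda\big(\hat{p}_A(\omega) - 1\big)$ (taking $b_1'' = b_2 = 0$). The sketch below (our own, with Gaussian amplitudes as an arbitrary choice) compares this prediction with the empirical characteristic function of simulated compound-Poisson sums:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n_mc = 2.0, 200_000                   # Poisson rate, Monte Carlo size

# Compound-Poisson samples: X = sum of Poisson(lam) Gaussian amplitudes
counts = rng.poisson(lam, size=n_mc)
amps = rng.standard_normal(counts.sum())
X = np.bincount(np.repeat(np.arange(n_mc), counts),
                weights=amps, minlength=n_mc)

# (4.6) with v(a) = lam * p_A(a) and b_1'' = b_2 = 0:
#   f(w) = lam * (p_A_hat(w) - 1),  with p_A_hat(w) = e^{-w^2/2}
diffs = []
for w0 in (0.5, 1.3):
    empirical = np.mean(np.exp(1j * w0 * X))
    predicted = np.exp(lam * (np.exp(-w0**2 / 2) - 1))
    diffs.append(abs(empirical - predicted))
    print(w0, diffs[-1])                   # small Monte Carlo error
```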

Below is a summary of known criteria for identifying admissible Lévy exponents, some being more operational than others [GV64, pp. 275-282]. These are all consequences of Bochner's theorem, which provides a Fourier-domain equivalence between continuous, positive-definite functions and probability density functions (or positive Borel measures). See Appendix B for an overview and discussion of the functional notion of positive definiteness and corresponding mathematical tools.

Proposition 2. The following statements on $f$ are equivalent:

1. $f(\omega)$ is a continuous, conditionally positive-definite function of order one.

2. $p_{\mathrm{id}}(x) = \mathcal F^{-1}\{\mathrm e^{f(\omega)}\}(x)$ is an infinitely divisible distribution.

3. $f$ admits a Lévy-Kintchine representation as in Theorem 7.

4. Let $p_{X_\tau}(x) = \mathcal F^{-1}\{\mathrm e^{\tau f(\omega)}\}(x)$ for $\tau\ge 0$. Then, $\{p_{X_\tau}\}_{\tau\in\mathbb R^+}$ is a family of valid pdfs; that is, $p_{X_\tau}(x)\ge 0$ and $\int_{\mathbb R} p_{X_\tau}(x)\,\mathrm dx = 1$ for all $\tau\ge 0$.

5. $\hat p_{X_\tau}(\omega) = \mathrm e^{\tau f(\omega)}$ is a continuous, positive-definite function of $\omega\in\mathbb R$ with $\hat p_{X_\tau}(0) = 1$ for any $\tau\in[0,\infty)$.

An introduction to Sparse Stochastic Processes Copyright © M. Unser and P.D. Tafti

4. CONTINUOUS-DOMAIN INNOVATION MODELS

Interestingly, it was Schoenberg (the father of splines) who first established the equivalence between Statements 1) and 5) (see proof of the direct part in Section 4.2.4). The equivalence between 4) and 5) follows from Bochner's theorem (Theorem 3). The fact that 2) implies 4) is a side product of the proof in Appendix 4.2.4, while the converse implication is a direct consequence of 3). Indeed, if $f(\omega)$ has a Lévy-Khinchine representation, then the same is true for $\tau f(\omega)$, which also implies that the whole family of pdfs $\{p_{X_\tau}\}_{\tau\in\mathbb R^+}$ is infinitely divisible. The latter is uniquely specified by $f$ and is therefore in one-to-one correspondence with the canonical id distribution $p_{\mathrm{id}}(x) = p_{X_\tau}(x)\big|_{\tau=1}$. Another important observation is that $\hat p_{X_\tau}(\omega) = \big(\mathrm e^{f(\omega)}\big)^\tau = \big(\hat p_{\mathrm{id}}(\omega)\big)^\tau$, so that $p_{X_\tau}$ in Statement 4) may be interpreted as the $\tau$-fold (possibly fractional) convolution of $p_{\mathrm{id}}$.

In our work, we sometimes need to limit ourselves to some particular subset of Lévy exponents.

Definition 8. A Lévy exponent $f$ with derivative $f'$ is called $p$-admissible if it satisfies the inequality
\[
|f(\omega)| + |\omega|\,|f'(\omega)| \le C\,|\omega|^p \tag{4.7}
\]
for some constant $C > 0$ and $0 < p \le 2$.
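For instance, for the S$\alpha$S exponent $f_\alpha(\omega) = -|\omega|^\alpha$ discussed later in this section, the left-hand side of (4.7) evaluates in closed form to $(1+\alpha)|\omega|^\alpha$, so that $f_\alpha$ is $p$-admissible with $p = \alpha$ and $C = 1 + \alpha$. A minimal check (helper name is ours):

```python
import math

# p-admissibility check (Definition 8) for f_alpha(w) = -|w|^alpha:
# |f(w)| + |w| |f'(w)| = (1 + alpha) |w|^alpha, so C = 1 + alpha works.
def admissibility_ratio(alpha, w):
    f = -abs(w) ** alpha
    fp = -alpha * abs(w) ** (alpha - 1.0) * math.copysign(1.0, w)
    return (abs(f) + abs(w) * abs(fp)) / abs(w) ** alpha

for alpha in (0.5, 1.0, 1.5, 2.0):
    ratios = [admissibility_ratio(alpha, w) for w in (0.01, 0.1, 1.0, 10.0)]
    assert all(abs(r - (1.0 + alpha)) < 1e-9 for r in ratios)
    print(f"alpha={alpha}: ratio constant at {ratios[0]:.3f}")
```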

Proposition 3. The generic Lévy exponents

1. $f_1(\omega) = \int_{\mathbb R\setminus\{0\}} \big(\mathrm e^{\mathrm ja\omega} - 1\big)\, v_1(a)\,\mathrm da$ with $\int_{\mathbb R} |a|\, v_1(a)\,\mathrm da < \infty$,

2. $f_2(\omega) = \int_{\mathbb R\setminus\{0\}} \big(\cos(a\omega) - 1\big)\, v_2(a)\,\mathrm da$ with $\int_{\mathbb R} a^2\, v_2(a)\,\mathrm da < \infty$,

3. $f_3(\omega) = \int_{\mathbb R\setminus\{0\}} \big(\mathrm e^{\mathrm ja\omega} - 1 - \mathrm ja\omega\big)\, v_3(a)\,\mathrm da$ with $\int_{\mathbb R} a^2\, v_3(a)\,\mathrm da < \infty$

are $p$-admissible with $p_1 = 1$, $p_2 = 2$, and $p_3 = 2$, respectively.

Proof. The first result follows from the bounds $|\mathrm e^{\mathrm ja\omega} - 1| \le |a|\,|\omega|$ and $\big|\frac{\mathrm d}{\mathrm d\omega}\mathrm e^{\mathrm ja\omega}\big| \le |a|$. The second is based on $|\cos(a\omega) - 1| \le |a\omega|^2$ and $|\sin(a\omega)| \le |a\omega|$. Specifically,
\begin{align*}
|f_2(\omega)| &\le \int_{\mathbb R} |a\omega|^2\, v_2(a)\,\mathrm da = |\omega|^2 \int_{\mathbb R} a^2\, v_2(a)\,\mathrm da \\
|\omega|\,|f_2'(\omega)| &= |\omega| \left|\int_{\mathbb R\setminus\{0\}} a \sin(a\omega)\, v_2(a)\,\mathrm da\right| \le |\omega| \int_{\mathbb R} |a|\,|a\omega|\, v_2(a)\,\mathrm da = |\omega|^2 \int_{\mathbb R} a^2\, v_2(a)\,\mathrm da.
\end{align*}
As for the third exponent, we also use the inequality $|\mathrm e^{\mathrm ja\omega} - 1 - \mathrm ja\omega| \le |a\omega|^2$, which yields
\begin{align*}
|f_3(\omega)| &\le \int_{\mathbb R\setminus\{0\}} |a\omega|^2\, v_3(a)\,\mathrm da = |\omega|^2 \int_{\mathbb R} a^2\, v_3(a)\,\mathrm da \\
|\omega|\,|f_3'(\omega)| &= |\omega| \left|\int_{\mathbb R\setminus\{0\}} \mathrm ja\,\big(\mathrm e^{\mathrm ja\omega} - 1\big)\, v_3(a)\,\mathrm da\right| \le |\omega| \int_{\mathbb R} |a|\,|a\omega|\, v_3(a)\,\mathrm da = |\omega|^2 \int_{\mathbb R} a^2\, v_3(a)\,\mathrm da,
\end{align*}
which completes the proof.

Since the $p$-admissibility property is preserved under summation, this covers a large portion of the Lévy exponents specified in Corollary 2.


4.2. Lévy exponents and infinitely divisible distributions

Table 4.1: Primary families of symmetric, infinitely divisible distributions. Each entry lists the type, the pdf $p_X(x)$, the variance $\mathrm{Var}(X)$, the characteristic function $\hat p_X(\omega) = \int_{\mathbb R} p_X(x)\,\mathrm e^{\mathrm j\omega x}\,\mathrm dx$, and the Lévy density $v(a)$.

- Gaussian: $p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\mathrm e^{-\frac{x^2}{2\sigma^2}}$; $\mathrm{Var}(X) = \sigma^2$; $\hat p_X(\omega) = \mathrm e^{-\frac{\sigma^2\omega^2}{2}}$; $v(a)$: N/A.

- Compound Poisson ($\lambda\in\mathbb R^+$): $p_X(x) = \mathrm e^{-\lambda}\,\delta(x) + (1-\mathrm e^{-\lambda})\, p_{A,\lambda}(x)$; $\mathrm{Var}(X) = \lambda\int_{\mathbb R} a^2\, p_A(a)\,\mathrm da$; $\hat p_X(\omega) = \exp\big(\lambda\hat p_A(\omega) - \lambda\hat p_A(0)\big) = \exp\big(\lambda\int_{\mathbb R}(\mathrm e^{\mathrm j\omega a} - 1)\, p_A(a)\,\mathrm da\big)$; $v(a) = \lambda\, p_A(a)\in L_1(\mathbb R)$.

- Laplace ($\lambda\in\mathbb R^+$): $p_X(x) = \frac{\lambda}{2}\,\mathrm e^{-\lambda|x|}$; $\mathrm{Var}(X) = \frac{2}{\lambda^2}$; $\hat p_X(\omega) = \frac{\lambda^2}{\lambda^2+\omega^2}$; $v(a) = \frac{\mathrm e^{-\lambda|a|}}{|a|}$.

- Sym Gamma ($r\in\mathbb R^+$): $p_X(x) = \frac{2^{\frac12-r}\,|x|^{r-\frac12}\, K_{\frac12-r}(|x|)}{\Gamma(r)\sqrt\pi}$; $\mathrm{Var}(X) = 2r$; $\hat p_X(\omega) = \left(\frac{1}{1+\omega^2}\right)^r$; $v(a) = \frac{r\,\mathrm e^{-|a|}}{|a|}$.

- Hyperbolic cosine ($\sigma_0\in\mathbb R^+$): $p_X(x) = \frac{1}{2\sigma_0\cosh\left(\frac{\pi x}{2\sigma_0}\right)}$; $\mathrm{Var}(X) = \sigma_0^2$; $\hat p_X(\omega) = \frac{1}{\cosh(\omega\sigma_0)}$; $v(a) = \frac{1}{2a\sinh\frac{\pi a}{2\sigma_0}}$.

- Meixner ($r, s\in\mathbb R^+$): $p_X(x) = \frac{2^{r-2}\left|\Gamma\left(\frac r2 + \mathrm j\frac{x}{2s}\right)\right|^2}{s\pi\,\Gamma(r)}$; $\mathrm{Var}(X) = s^2 r$; $\hat p_X(\omega) = \left(\frac{1}{\cosh(s\omega)}\right)^r$; $v(a) = \frac{r}{2}\,\frac{1}{a\sinh\frac{\pi a}{2s}}$.

- Cauchy ($s\in\mathbb R^+$): $p_X(x) = \frac{1}{\pi}\,\frac{s}{s^2+x^2}$; $\mathrm{Var}(X)$: N/A; $\hat p_X(\omega) = \mathrm e^{-s|\omega|}$; $v(a) = \frac{s}{\pi a^2}$.

- Sym Student ($r\in\mathbb R^+$): $p_X(x) = \frac{1}{B\left(r,\frac12\right)}\left(\frac{1}{x^2+1}\right)^{r+\frac12}$; $\mathrm{Var}(X)$: N/A for $r\le 1$, $\frac{1}{2r-2}$ for $r>1$; $\hat p_X(\omega) = \frac{\sqrt\pi\, 2^{1-r}\,|\omega|^r\, K_{-r}(|\omega|)}{\Gamma\left(r+\frac12\right) B\left(r,\frac12\right)}$; $v(a)$: unknown.

- S$\alpha$S ($\alpha\in(0,2]$, $s\in\mathbb R^+$): $p_X(x) = p_\alpha(x;\alpha,s)$ (no closed form); $\mathrm{Var}(X)$: N/A; $\hat p_X(\omega) = \mathrm e^{-|s\omega|^\alpha}$; $v(a) = \frac{C_{\alpha,s}}{|a|^{1+\alpha}}$.
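The entries of Table 4.1 can be cross-checked by direct numerical integration. The snippet below (helper names ours; a midpoint rule) recomputes the characteristic function of the Laplace row and compares it with the tabulated $\lambda^2/(\lambda^2+\omega^2)$:

```python
import math

def char_fn_sym(pdf, omega, x_max=40.0, n=200_000):
    # \hat p(w) = int_R p(x) e^{jwx} dx = 2 * int_0^inf p(x) cos(wx) dx
    # for a symmetric pdf (midpoint rule).
    h = x_max / n
    return 2.0 * h * sum(
        pdf((k + 0.5) * h) * math.cos((k + 0.5) * h * omega) for k in range(n)
    )

lam = 1.3
laplace_pdf = lambda x: 0.5 * lam * math.exp(-lam * abs(x))

for w in (0.0, 1.0, 2.5):
    table_entry = lam ** 2 / (lam ** 2 + w ** 2)
    print(f"w={w}: numeric={char_fn_sym(laplace_pdf, w):.6f}, "
          f"table={table_entry:.6f}")
```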

Examples: The power law $f_\alpha(\omega) = -|\omega|^\alpha$ with $0 < \alpha \le 2$ is $\alpha$-admissible (i.e., $p$-admissible with $p = \alpha$); it generates the symmetric $\alpha$-stable (S$\alpha$S) id distributions [Fel71]. Note that $f_\alpha(\omega)$ fails to be conditionally positive-definite for $\alpha > 2$, meaning that the inverse Fourier transform of $\mathrm e^{-|\omega|^\alpha}$ exhibits negative values and is therefore not a valid pdf. The upper acceptable limit is $\alpha = 2$ and corresponds to the Gaussian law. More generally, a Lévy exponent that is symmetric and twice-differentiable at the origin (which is equivalent to the variance of the corresponding id distribution being finite) is $p$-admissible with $p = 2$; this follows as a direct consequence of Corollary 2 and Proposition 3.

Another fundamental instance, which generates the complete family of id compound-Poisson distributions, is
\[
f_{\mathrm{Poisson}}(\omega) = \lambda \int_{\mathbb R\setminus\{0\}} \big(\mathrm e^{\mathrm ja\omega} - 1\big)\, p_A(a)\,\mathrm da
\]
where $\lambda > 0$ is the Poisson rate and $p_A(a)\ge 0$ the amplitude pdf with $\int_{\mathbb R} p_A(a)\,\mathrm da = 1$. In general, $f_{\mathrm{Poisson}}(\omega)$ is $p$-admissible with $p = 1$ provided that $E\{|a|\} = \int_{\mathbb R} |a|\, p_A(a)\,\mathrm da < \infty$ (cf. Proposition 3). If, in addition, $p_A$ is symmetric with a bounded variance, then the Poisson range of admissibility extends to $p\in[1,2]$. Further examples of symmetric id distributions are documented in Table 4.1. Their Lévy exponent is simply obtained by taking $f(\omega) = \log \hat p_X(\omega)$.

The relevance of id distributions for signal processing is that any linear combination of independent id random variables is id as well. Indeed, let $X_1$ and $X_2$ be two independent id random variables with Lévy exponents $f_1$ and $f_2$, respectively; then, it is not difficult to show that $a_1 X_1 + a_2 X_2$, where $a_1$ and $a_2$ are arbitrary constants, is id with Lévy exponent $f(\omega) = f_1(a_1\omega) + f_2(a_2\omega)$.



4.2.2 Deciphering the Lévy-Khinchine formula

From a harmonic-analysis perspective, the Lévy-Khinchine representation is closely related to Bochner's theorem, which states that a positive-definite function $g$ can always be expressed as the Fourier transform of a positive finite Borel measure; i.e., $g(\omega) = \int_{\mathbb R} \mathrm e^{\mathrm j\omega a}\, v(a)\,\mathrm da$ with $v(a)\ge 0$ and $g(0) = \int_{\mathbb R} v(a)\,\mathrm da < \infty$. Here, the additional requirement is that $f(0) = 0$ (conditional positive-definiteness of order one), which is enforced by proper subtraction of a linear correction in $\mathrm ja\omega$ within the integral, the latter being partly compensated by the addition of the component $\mathrm j\omega b_1$. The side benefit of this regularization is that it enlarges the set of admissible densities to those satisfying $\int_{\mathbb R} \min(a^2, 1)\, v(a)\,\mathrm da < \infty$, which allows for a singular behavior around the origin. As for the linear and quadratic terms outside the integral, they map into the singular point distribution $b_1\delta'(a) + b_2\delta''(a)$ (weighted derivatives of Dirac impulses) that is concentrated at the origin $a = 0$ and excluded from the (classical) Lebesgue integral. For the complete details, we refer the reader to the second half of Appendix B. The proposed treatment relies on Gelfand and Vilenkin's distributional characterization of conditional positive-definiteness of order $n$ in Theorem 28. Despite the greater generality of the result, we find its proof more enlightening and of a less technical nature than the traditional derivation of the Lévy-Khintchine formula, which is summarized in Section 4.2.4-B for completeness.

From a statistical perspective, the exponent $f$ specified by the Lévy-Khinchine formula is the logarithm of the characteristic function of an id random variable. This means that breaking $f$ into additive subparts is in fact equivalent to factorizing the pdf into convolutional factors. Specifically, let $\hat p_X(\omega) = \mathrm e^{\sum_{n=1}^N f_n(\omega)}$ be the characteristic function of a (compound) id distribution where the $f_n$ are valid Lévy exponents. Then, $\hat p_X(\omega) = \prod_{n=1}^N \hat p_{X_n}(\omega)$ with $\hat p_{X_n}(\omega) = \mathrm e^{f_n(\omega)} = E\{\mathrm e^{\mathrm j\omega X_n}\}$, which translates into the convolution relation
\[
p_X(x) = \big(p_{X_1} * p_{X_2} * \cdots * p_{X_N}\big)(x).
\]
The statistical interpretation is that $X = X_1 + \cdots + X_N$, where the $X_n$ are independent with id pdf $p_{X_n}$. The infinite-divisibility property simply translates into the fact that, for a given $f(\omega)$ and any $N > 0$, $X$ can always be broken down into $N$ independent and identically distributed components with Lévy exponent $f(\omega)/N$. Indeed, it is easy to see from the Lévy representation that the admissibility of $f(\omega)$ implies that $\tau f(\omega)$ is a valid Lévy exponent as well for any $\tau\ge 0$.

To further our understanding of id distributions, it is instructive to characterize the atoms of the Lévy-Khinchine representation. Focusing on the simplest form (4.6), we identify three types of elementary constituents, with the third type being motivated by the decomposition of a (continuous) Lévy density into a weighted "sum" of Dirac impulses: $v(a) = \int_{\mathbb R} v(\tau)\,\delta(a-\tau)\,\mathrm d\tau \approx \sum_n \lambda_n\,\delta(a-\tau_n)$ with $\lambda_n = v(\tau_n)(\tau_n - \tau_{n-1})$:

1. Linear term $f_1(\omega) = \mathrm jb_1\omega$. This corresponds to the (degenerate) pdf of a constant $X_1 = b_1$ with $p_{X_1}(x) = \delta(x - b_1)$.

2. Quadratic term $f_2(\omega) = -\frac{b_2\omega^2}{2}$. As already mentioned, this leads to the centered Gaussian with variance $b_2$ given by
\[
p_{X_2}(x) = \mathcal F^{-1}\{\mathrm e^{-b_2\omega^2/2}\}(x) = \frac{1}{\sqrt{2\pi b_2}} \exp\left(-\frac{x^2}{2b_2}\right).
\]

3. Exponential (or Poisson) term $f_3(\omega) = \lambda\big(\mathrm e^{\mathrm j\tau\omega} - 1\big)$, which is associated with the elementary Lévy triplet $\big(0, 0, \lambda\delta(a-\tau)\big)$. Based on the Taylor-series expansion $\hat p_{X_3}(\omega) = \mathrm e^{\lambda(z-1)} = \mathrm e^{-\lambda}\sum_{m=0}^{+\infty} \frac{(\lambda z)^m}{m!}$ with $z = \mathrm e^{\mathrm j\omega\tau}$, we readily obtain the pdf by (generalized) inverse Fourier transformation:
\[
p_{X_3}(x) = \sum_{m=0}^{\infty} \frac{\mathrm e^{-\lambda}\lambda^m}{m!}\,\delta(x - m\tau).
\]
This formula coincides with the continuous-domain representation of a Poisson distribution (see footnote 2) with Poisson parameter $\lambda$ and gain factor $\tau$; that is, $\mathrm{Prob}(X_3 = \tau m) = \frac{\mathrm e^{-\lambda}\lambda^m}{m!}$.

More generally, when $v(a) = \lambda p(a)$, where $p(a)\ge 0$ is some arbitrary pdf with $\int_{\mathbb R} p(a)\,\mathrm da = 1$, we can make a compound-Poisson interpretation (see footnote 3) with
\[
f_{\mathrm{Poisson}}(\omega) = \lambda\int_{\mathbb R} \big(\mathrm e^{\mathrm ja\omega} - 1\big)\, p(a)\,\mathrm da = \lambda\big(\hat p(\omega) - 1\big),
\]
where $\hat p(\omega) = \int_{\mathbb R} \mathrm e^{\mathrm ja\omega} p(a)\,\mathrm da$ is the characteristic function of $p = p_A$. Using the fact that $\hat p(\omega)$ is bounded, we apply the same type of Taylor-series argument and express the characteristic function as
\[
\mathrm e^{f_{\mathrm{Poisson}}(\omega)} = \mathrm e^{-\lambda}\sum_{m=0}^{\infty} \frac{\big(\lambda\hat p(\omega)\big)^m}{m!} = \hat p_Y(\omega).
\]
Finally, by using the property that $\hat p(\omega)^m$ is the characteristic function of the $m$-fold convolution of $p$, we get the general formula of the compound-Poisson pdf with Poisson parameter $\lambda$ and amplitude distribution $p$ as
\[
p_Y(x) = \mathrm e^{-\lambda}\left(\delta(x) + \frac{\lambda}{1!}\, p(x) + \frac{\lambda^2}{2!}\,(p*p)(x) + \frac{\lambda^3}{3!}\,(p*p*p)(x) + \cdots\right). \tag{4.8}
\]

Thus, in essence, the Lévy-Khinchine formula is a description of the Fourier transform of a distribution that is the convolution of three components: an impulse $\delta(\cdot - b_1)$ (shifting), a Gaussian of variance $b_2$ (smoothing), and a compound-Poisson distribution (spreading). The effect of the first term is a simple re-centering of the pdf around $b_1$. The third, compound-Poisson component is itself obtained via a suitable composition of $m$-fold convolutions of some primary pdf $p$. It is much more concentrated at the origin than the Gaussian, because of the presence of the Dirac distribution with weight $\mathrm e^{-\lambda}$, but also heavier-tailed because of the spreading effect of the $m$-fold convolutions.
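With a Gaussian amplitude $p = \mathcal N(0,1)$, the $m$-fold convolutions in (4.8) are simply $\mathcal N(0, m)$, so the series can be truncated and summed directly. The sketch below (our own truncation choices) verifies that the continuous part of $p_Y$ carries the mass $1 - \mathrm e^{-\lambda}$ that complements the Dirac weight $\mathrm e^{-\lambda}$ at the origin:

```python
import math

lam = 2.0

def pY(x, terms=40):
    # Truncated series (4.8) with standard-Gaussian amplitude p = N(0, 1):
    # the m-fold convolution p^{*m} is N(0, m).  Only the continuous part is
    # returned; the Dirac mass e^{-lam} delta(x) at the origin is excluded.
    s = 0.0
    for m in range(1, terms):
        s += (lam ** m / math.factorial(m)) * \
             math.exp(-x * x / (2.0 * m)) / math.sqrt(2.0 * math.pi * m)
    return math.exp(-lam) * s

h = 0.05
xs = [i * h for i in range(-600, 601)]       # grid on [-30, 30]
grid_mass = sum(pY(x) for x in xs) * h

print("mass of continuous part:", round(grid_mass, 4))   # ~ 1 - e^{-lam}
print("Dirac mass at the origin:", round(math.exp(-lam), 4))
```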

The additional linear correction terms in (4.3) and (4.5) allow for a wider variety of distributions that share the common property of being limits of compound-Poisson distributions.

Proposition 4. Every id distribution is the weak limit of a sequence of compound-Poisson distributions.

Proof. Let $\hat p_{\mathrm{id}}$ be the characteristic function of some id distribution $p_{\mathrm{id}}$ and consider an arbitrary sequence $\tau_n \downarrow 0$. Then,
\[
\hat p_{X_n}(\omega) = \exp\big(\tau_n^{-1}(\hat p_{\mathrm{id}}(\omega)^{\tau_n} - 1)\big)
\]
is the characteristic function of a compound-Poisson distribution with parameter $\lambda = \tau_n^{-1}$ and amplitude distribution $p(x) = \mathcal F^{-1}\{\hat p_{\mathrm{id}}(\omega)^{\tau_n}\}(x)$. Moreover, we have that
\[
\hat p_{X_n}(\omega) = \exp\Big(\tau_n^{-1}\big(\mathrm e^{\tau_n\log\hat p_{\mathrm{id}}(\omega)} - 1\big)\Big) = \exp\big(\log\hat p_{\mathrm{id}}(\omega) + O(\tau_n)\big)
\]
for every $\omega$ as $n\to\infty$. Hence, $\hat p_{X_n}(\omega)\to\exp\big(\log\hat p_{\mathrm{id}}(\omega)\big) = \hat p_{\mathrm{id}}(\omega)$, so that $p_{X_n}$ converges weakly to $p_{\mathrm{id}}$ by Lévy's continuity theorem (Theorem 4).

Footnote 2. The standard form of the discrete Poisson probability model is $\mathrm{Prob}(N = n) = \frac{\mathrm e^{-\lambda}\lambda^n}{n!}$ with $n\in\mathbb N$. It provides the probability of a given number of independent events ($n$) occurring in a fixed space/time interval when the average rate of occurrence is $\lambda$. The Poisson parameter is equal to the expected value of $N$, but also to its variance: $\lambda = E\{N\} = \mathrm{Var}\{N\}$.

Footnote 3. The compound-Poisson probability model has two components: the first is a random variable $N$ that follows a Poisson distribution with parameter $\lambda$, and the second a series of i.i.d. random variables $A_1, A_2, A_3, \dots$ with pdf $p_A$ which are drawn at each trial of $N$. Then, $Y = \sum_{n=1}^N A_n$ is a compound-Poisson random variable with Poisson parameter $\lambda$ and amplitude pdf $p_A$. Its mean and variance are given by $E\{Y\} = \lambda E\{A\}$ and $\mathrm{Var}\{Y\} = \lambda E\{A^2\}$, respectively.

The id pdfs for which $v\notin L_1(\mathbb R)$ are generally smoother than the compound-Poisson ones since they do not display a singularity (Dirac impulse) at the origin, unlike (4.8). Yet, depending on the degree of concentration (or singularity) of $v$ around the origin, they will typically exhibit a peaky behavior around the mean. While this class of distributions is responsible for the additional level of complication in the Lévy-Khinchine formula, as compared to the simpler Poisson version (4.6), we argue that it is highly relevant for applications because of the many possibilities that it offers. Somewhat surprisingly, many families of id distributions with singular Lévy density are more tractable mathematically than their compound-Poisson cousins found in Table 4.1, at least when considering their pdf.

4.2.3 Gaussian vs. sparse categorization

The family of id distributions allows for a range of behaviors that varies between the purely Gaussian and sparse extremes. In the context of Lévy processes, these are often referred to as the diffusive and jump modes. To make our point, we consider two distinct scenarios.

Finite variance case

We first assume that the second moment $m_2 = \int_{\mathbb R} a^2\, v(a)\,\mathrm da$ of the Lévy density is finite, which also implies that $\int_{|a|>1} |a|\, v(a)\,\mathrm da < \infty$ because of the admissibility condition. Hence, the corresponding Lévy-Khinchine representation is (4.5). An interesting non-Poisson example of an infinitely divisible probability law that falls into this category (with non-integrable $v$) is the Laplace density, with Lévy triplet $\big(0, 0, v(a) = \frac{\mathrm e^{-|a|}}{|a|}\big)$ and $p(x) = \frac12\,\mathrm e^{-|x|}$ (see Figure 4.1b). This model is particularly relevant in the context of sparse signal recovery because it provides a tight connection between Lévy processes and total-variation regularization [UT11, Section VI].

Now, if the Lévy density is integrable (i.e., $v\in L_1(\mathbb R)$), we can pull the linear correction out of the Lévy-Khinchine integral and represent $f$ using Expansion (4.6) with $v(a) = \lambda p_A(a)$ and $\int_{\mathbb R} p_A(a)\,\mathrm da = 1$. This implies that we can decompose $X$ into the sum of two independent Gaussian and compound-Poisson random variables. The variances of the Gaussian and Poisson components are $\sigma^2 = b_2$ and $\lambda E\{a^2\}$, respectively. The Poisson component is sparse because its probability density function exhibits the mass distribution $\mathrm e^{-\lambda}\delta(x)$ at the origin shown in Figure 4.1c, meaning that the chances of drawing the value zero are overwhelmingly higher than those of any other value of a continuous amplitude distribution, especially for smaller values of $\lambda > 0$. It is therefore justifiable to use $0 \le \mathrm e^{-\lambda} < 1$ as our Poisson sparsity index.

Infinite variance case

We now turn our attention to the case where the second moment of the Lévy density is unbounded, which we like to label as "super-sparse". To justify this terminology, we invoke the Ramachandran-Wolfe theorem, which states that the $p$th moment $E\{|x|^p\}$ with $p\in\mathbb R^+$ of an infinitely divisible distribution is finite if and only if $\int_{|a|>1} |a|^p\, v(a)\,\mathrm da < \infty$ [Ram69, Wol71]. For $p\ge 2$, the latter is equivalent to $\int_{\mathbb R} |a|^p\, v(a)\,\mathrm da < \infty$ because of the Lévy admissibility condition. It follows that the cases that are not covered by the previous scenario (including the Gaussian + Poisson model) necessarily give rise to distributions whose moments of order $p\ge 2$ are unbounded. The prototypical representatives of such heavy-tailed distributions are the alpha-stable ones (see Figure 4.1d) or, by extension, the broad family of infinitely divisible probability laws that are in their domain of attraction. It has been shown that these distributions precisely fulfill the requirement for $\ell_p$ compressibility [AUMew], which is a stronger form of sparsity than the presence of a probability mass at the origin.

4.2.4 Proofs of Theorems 6 and 7

For completeness, we end this section on Lévy exponents with the proofs of the two key theorems in the theory of infinitely divisible distributions. The Lévy-Schoenberg theorem is central to our formulation because it makes the link between the id property and the fundamental notion of positive definiteness. In the case of the Lévy-Khintchine theorem, we have opted for a sketch of proof that is adapted from the literature. The main intent there is to provide additional insight into the nature of the singularities of the Lévy density and their effect on the form of the exponent.

A. Proof of Theorem 6 (Lévy-Schoenberg)

Let $\hat p_{\mathrm{id}}(\omega) = \int_{\mathbb R} \mathrm e^{\mathrm j\omega x}\, p_{\mathrm{id}}(x)\,\mathrm dx$ be the characteristic function of an id random variable. Then, by definition, $\big(\hat p_{\mathrm{id}}(\omega)\big)^{1/n}$ is a valid characteristic function for any $n\in\mathbb Z^+$. Since the convolution of two pdfs is a pdf, we can also take integer powers, which results in $\big(\hat p_{\mathrm{id}}(\omega)\big)^{m/n}$ being a characteristic function. For any irrational number $\tau > 0$, we can specify a sequence of rational numbers $m/n$ that converges to $\tau$, so that $\big(\hat p_{\mathrm{id}}(\omega)\big)^{m/n} \to \big(\hat p_{\mathrm{id}}(\omega)\big)^\tau$ with the limit function being continuous. This implies that $\hat p_{X_\tau}(\omega) = \big(\hat p_{\mathrm{id}}(\omega)\big)^\tau$ is a characteristic function for any $\tau\ge 0$ by Lévy's continuity theorem (Theorem 4). Moreover, $\hat p_{X_\tau}(\omega) = \big(\hat p_{X_{\tau/s}}(\omega)\big)^s$ must be non-zero for any finite $\tau$, owing to the fact that $\lim_{s\to\infty} \hat p_{X_{\tau/s}}(\omega) = 1$. In particular, $\hat p_{\mathrm{id}}(\omega) = \hat p_{X_\tau}(\omega)\big|_{\tau=1} \ne 0$, so that we can write it as $\hat p_{\mathrm{id}}(\omega) = \mathrm e^{f(\omega)}$, where $f(\omega)$ is continuous with $\mathrm{Re}\big(f(\omega)\big)\le 0$ and $f(0) = 0$. Hence, $\hat p_{X_\tau}(\omega) = \big(\hat p_{\mathrm{id}}(\omega)\big)^\tau = \mathrm e^{\tau f(\omega)} = \int_{\mathbb R} \mathrm e^{\mathrm j\omega x}\, p_{X_\tau}(x)\,\mathrm dx$, where $p_{X_\tau}(x)$ is a valid pdf for any $\tau\in[0,\infty)$, which is Statement 4) in Proposition 2. By Bochner's theorem (Theorem 3), this is equivalent to $\mathrm e^{\tau f(\omega)}$ being positive-definite for any $\tau\ge 0$ with $f(\omega)$ continuous and $f(0) = 0$. The first part of Theorem 6 then follows as a corollary of the next fundamental result, which is due to Schoenberg.

Theorem 8 (Schoenberg correspondence). The function $f(\omega)$ is conditionally positive-definite of order one if and only if $\mathrm e^{\tau f(\omega)}$ is positive-definite for any $\tau > 0$.

Proof. We only give the easy part and refer to [Sch38, Joh66] for the complete details. The property that $\mathrm e^{\tau f(\omega)}$ is positive-definite for every $\tau > 0$ is expressed as
\[
\sum_{m=1}^N \sum_{n=1}^N \xi_m \overline{\xi_n}\, \mathrm e^{\tau f(\omega_m - \omega_n)} \ge 0
\]
for every possible choice of $\omega_1, \dots, \omega_N\in\mathbb R$, $\xi_1, \dots, \xi_N\in\mathbb C$, and $N\in\mathbb Z^+$. In the more restricted setup of Definition 5, where $\sum_{n=1}^N \xi_n = 0$, this can also be restated as
\[
\sum_{m=1}^N \sum_{n=1}^N \xi_m \overline{\xi_n}\, \big(\mathrm e^{\tau f(\omega_m - \omega_n)} - 1\big) \ge 0.
\]
The next step is to take the limit
\[
\lim_{\tau\to 0} \sum_{m=1}^N \sum_{n=1}^N \xi_m \overline{\xi_n}\, \frac{\mathrm e^{\tau f(\omega_m - \omega_n)} - 1}{\tau} = \sum_{m=1}^N \sum_{n=1}^N \xi_m \overline{\xi_n}\, f(\omega_m - \omega_n) \ge 0,
\]
which implies that $f(\omega)$ is conditionally positive-definite of order one.
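The quadratic-form inequality at the heart of this proof can be spot-checked numerically. For the Cauchy exponent $f(\omega) = -|\omega|$ (conditionally positive-definite of order one), the form $\sum_{m,n}\xi_m\xi_n f(\omega_m - \omega_n)$ must be nonnegative for every real weight vector summing to zero. A randomized sketch (setup is our own):

```python
import random

random.seed(2)
f = lambda w: -abs(w)          # Levy exponent of the Cauchy law (alpha = 1)

omegas = [random.uniform(-5.0, 5.0) for _ in range(8)]

def quad_form(xi):
    # sum_{m,n} xi_m xi_n f(omega_m - omega_n) for real weights xi
    return sum(
        xi[m] * xi[n] * f(omegas[m] - omegas[n])
        for m in range(len(xi)) for n in range(len(xi))
    )

# Conditional positive definiteness of order one: the form is >= 0 whenever
# the weights sum to zero (here: random real weights, recentered).
for trial in range(200):
    xi = [random.gauss(0.0, 1.0) for _ in omegas]
    mean = sum(xi) / len(xi)
    xi = [x - mean for x in xi]
    assert quad_form(xi) >= -1e-9
print("200 zero-sum weight vectors: quadratic form always nonnegative")
```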

This also makes the second part of Theorem 6 easy because $\hat p_{\mathrm{id}}(\omega) = \mathrm e^{f(\omega)}$ can be factorized into a product of $N$ identical positive-definite subparts with Lévy exponent $\frac1N f(\omega)$.

B. Sketch of proof of Theorem 7 (Lévy-Khinchine)

We start from the equivalence of the id property and Statement 4) in Proposition 2 established above. This result is restated as
\[
\frac{\mathrm e^{\tau f(\omega)} - 1}{\tau} = \int_{\mathbb R} \big(\mathrm e^{\mathrm j\omega x} - 1\big)\, \frac{p_{X_\tau}(x)}{\tau}\,\mathrm dx,
\]
the limit of which as $\tau\to 0$ exists and is equal to $f(\omega)$. Next, we define the measure $K_\tau(\mathrm dx) = \frac{x^2}{x^2+1}\,\frac{p_{X_\tau}(x)}{\tau}\,\mathrm dx$, which is bounded for all $\tau > 0$ because $\frac{x^2}{x^2+1}\le 1$ and $p_{X_\tau}$ is a valid pdf. We then express the Lévy exponent as
\begin{align*}
f(\omega) &= \lim_{\tau\to 0} \frac{\mathrm e^{\tau f(\omega)} - 1}{\tau} = \lim_{\tau\to 0} \int_{\mathbb R} \big(\mathrm e^{\mathrm j\omega x} - 1\big)\, \frac{p_{X_\tau}(x)}{\tau}\,\mathrm dx \\
&= \lim_{\tau\to 0} \int_{\mathbb R} \left(\mathrm e^{\mathrm j\omega x} - 1 - \frac{\mathrm jx\omega}{1+x^2}\right) \frac{x^2+1}{x^2}\, K_\tau(\mathrm dx) + \mathrm j\omega \lim_{\tau\to 0} a(\tau)
\end{align*}
where
\[
a(\tau) = \int_{\mathbb R} \frac{x}{1+x^2}\, \frac{p_{X_\tau}(x)}{\tau}\,\mathrm dx.
\]
The technical part of the work, which is quite tedious and not included here, is to show that the above integrals are bounded and that the two limits are well defined in the sense that $a(\tau)\to a_0$ and $K_\tau\to K$ (weakly) as $\tau\downarrow 0$, with $K$ being a finite measure. This ultimately yields Khintchine's canonical representation
\[
f(\omega) = \mathrm j\omega a_0 + \int_{\mathbb R} \left(\mathrm e^{\mathrm j\omega x} - 1 - \frac{\mathrm jx\omega}{1+x^2}\right) \frac{x^2+1}{x^2}\, K(\mathrm dx)
\]
where $a_0\in\mathbb R$ and $K$ is some bounded Borel measure. A potential advantage of Khinchine's representation is that the corresponding measure $K$ is not singular. The connection with the standard Lévy-Khinchine formula is $b_2 = K(0^+) - K(0^-)$ and $v(x)\,\mathrm dx = \frac{x^2+1}{x^2}\, K(\mathrm dx)$ for $x\ne 0$. It is also possible to work out a relation between $a_0$ and $b_1$, which depends upon the type of linear compensation in the canonical representation.

The above manipulation shows that the coefficients of the linear and quadratic terms of the Lévy-Khinchine formula (4.3) are primarily due to the non-integrable part of $g(x) = \lim_{\tau\downarrow 0} \frac{p_{X_\tau}(x)}{\tau} = \frac{x^2+1}{x^2}\, k(x)$, where $k(x)\,\mathrm dx = K(\mathrm dx)$.


By convention, the classical Lévy density $v$ is assumed to be zero at the origin, so that it differs from $g$ by a point distribution that is concentrated at the origin. By invoking a basic theorem in distribution theory, stating that a distribution entirely localized at the origin can always be expressed as a linear combination of the Dirac impulse and its derivatives, we can write that $g(x) - v(x) = b_0\delta(x) + b_1'\delta'(x) + b_2\delta''(x)$, where the higher-order derivatives of $\delta$ are excluded because of the admissibility condition.

For the indirect part of the proof, we start from the integral of the Lévy-Khinchine formula and consider the sequence of distributions whose exponent is
\begin{align*}
f_n(\omega) &= \int_{|a|>1/n} \big(\mathrm e^{\mathrm ja\omega} - 1 - \mathrm ja\omega\,\mathbb 1_{|a|<1}(a)\big)\, v(a)\,\mathrm da \\
&= -\mathrm j\omega \underbrace{\int_{1/n<|a|<1} a\, v(a)\,\mathrm da}_{a_n} + \int_{|a|>1/n} \big(\mathrm e^{\mathrm ja\omega} - 1\big)\, v(a)\,\mathrm da.
\end{align*}
Since the leading constant $a_n$ is finite and $\int_{|a|>1/n} v(a)\,\mathrm da < \infty$ for any fixed $n$ (due to the admissibility condition on $v$), this corresponds to the exponent of a shifted compound-Poisson distribution whose characteristic function is $\mathrm e^{f_n(\omega)}$. This allows us to deduce that $\hat p_n(\omega) = \mathrm e^{\mathrm jb_1'\omega - \frac{b_2}{2}\omega^2 + f_n(\omega)}$ is a valid characteristic function for any $n\in\mathbb Z^+$. Finally, we have the convergence of the sequence $\hat p_n(\omega)\to\mathrm e^{f(\omega)}$ as $n\to\infty$, where $f(\omega)$ is given by (4.3). Since $f(\omega)$, and therefore $\mathrm e^{f(\omega)}$, is continuous around $\omega = 0$, we infer that $\mathrm e^{f(\omega)}$ is a valid characteristic function (by Lévy's continuity theorem). The continuity of $f$ is established by bounding the Lévy-Khinchine integral and invoking Lebesgue's dominated-convergence theorem. The id part is obvious.

4.3 Finite-dimensional innovation model

To set the stage for the infinite-dimensional extension to come in Section 4.4, it is instructive to investigate the structure of a purely discrete innovation model whose input is the random vector $U = (U_1, \dots, U_N)$ of i.i.d. infinitely divisible random variables. The generic $N$th-order pdf of the discrete innovation variable $U$ is
\[
p_U(u_1, \dots, u_N) = \prod_{n=1}^N p_{\mathrm{id}}(u_n) \tag{4.9}
\]
where $p_{\mathrm{id}}(x) = \mathcal F^{-1}\{\mathrm e^{f(\omega)}\}(x)$ and $f$ is the Lévy exponent of the underlying id distribution. Since $p_U(u_1, \dots, u_N)$ is separable due to the independence assumption, we can write its characteristic function as the product of individual id factors
\[
\hat p_U(\boldsymbol\omega) = E_U\{\mathrm e^{\mathrm j\langle\boldsymbol\omega, U\rangle}\} = \prod_{n=1}^N \mathrm e^{f(\omega_n)} = \exp\left(\sum_{n=1}^N f(\omega_n)\right) \tag{4.10}
\]

where $\boldsymbol\omega = (\omega_1, \dots, \omega_N)$ is the frequency variable. The $N$-dimensional output signal $X = (X_1, \dots, X_N)$ is then specified as the solution of the matrix-vector innovation equation
\[
U = \mathrm L X
\]
where the $N\times N$ whitening matrix $\mathrm L$ is assumed to be invertible. This implies that $X = \mathrm A U$ is a linear transformation of the excitation noise with $\mathrm A = \mathrm L^{-1}$. Its $N$th-order characteristic function is obtained by a simple (linear) change of variables:
\[
\hat p_X(\boldsymbol\omega) = E_U\{\mathrm e^{\mathrm j\langle\boldsymbol\omega, \mathrm A U\rangle}\} = E_U\{\mathrm e^{\mathrm j\langle \mathrm A^T\boldsymbol\omega, U\rangle}\} = \hat p_U(\mathrm A^T\boldsymbol\omega) = \exp\left(\sum_{n=1}^N f\big([\mathrm A^T\boldsymbol\omega]_n\big)\right). \tag{4.11}
\]
Based on this equation, we can determine any marginal distribution by setting the appropriate frequency variables to zero. For instance, we find that the first-order pdf of $X_n$, the $n$th component of $X$, is given by
\[
p_{X_n}(x) = \mathcal F^{-1}\big\{\mathrm e^{f_n(\omega)}\big\}(x)
\]
where
\[
f_n(\omega) = \sum_{m=1}^N f\big(a_{nm}\omega\big)
\]
with weighting coefficients $a_{nm} = [\mathrm A]_{nm} = [\mathrm L^{-1}]_{nm}$. The key observation here is that $f_n$ is an admissible Lévy exponent, which implies that the underlying distribution is infinitely divisible (by Theorem 6), with the same being true for all the marginals and, by extension, for the distribution of any linear measurement(s) of $X$. While this provides a general mechanism for characterizing the probability law(s) of the discrete signal $X$ within the classical framework of multivariate statistics, it is a priori difficult to perform the required computations (matrix inversion and inverse Fourier transforms) analytically, except in the Gaussian case where the exponent is quadratic. Indeed, in this latter situation, (4.11) simplifies to $\hat p_X(\boldsymbol\omega) = \mathrm e^{-\frac12\|\mathrm A^T\boldsymbol\omega\|_2^2}$, which is the Fourier transform of a multivariate Gaussian distribution with zero mean and covariance matrix $E\{XX^T\} = \mathrm A\mathrm A^T$.

As we shall see in the next two sections, these results are transposable to the infinite-dimensional setting (cf. Table 3.4). While this may look like an unnecessary complication at first sight, the payoff is a theory that lends itself better to an analytical treatment using the powerful tools of harmonic analysis. The essence of the generalization is to replace the frequency variable $\boldsymbol\omega$ by a generic test function $\varphi\in\mathcal S(\mathbb R^d)$, the sums in (4.10) and (4.11) by Lebesgue integrals, and the matrix inverses by Green's functions, which can often be specified explicitly. To make an analogy, it is conceptually and practically easier to formulate a comprehensive (deterministic) theory of linear systems using Fourier analysis and convolution operators than by relying on linear algebra, and the same applies here. At the end of the exercise, it is still possible to come back to a finite-dimensional signal representation by projecting the continuous-domain model onto a suitable set of basis functions, as will be shown in Chapter 10.

4.4 White Lévy noises or innovations

Having gained a solid understanding of Lévy exponents, we can now move to the specification of a corresponding family of continuous-domain white-noise processes to drive the innovation model in Figure 2.1. To that end, we rely on Gelfand's theory of generalized stochastic processes [GV64], which was briefly summarized in Section 3.5. This powerful formalism allows for the complete and remarkably concise description of a generalized stochastic process by its characteristic form. While the latter is not widely used in the standard formulation of stochastic processes, it lends itself quite naturally to the specification of generalized white-noise processes in terms of Lévy exponents, in direct analogy with what we have done before for id distributions.


Definition 9 (White Lévy noise or innovation). A generalized stochastic process $w$ over $\mathcal D'(\mathbb R^d)$ is called a white Lévy noise (or innovation) if its characteristic form is given by
\[
\widehat{\mathscr P}_w(\varphi) = E\{\mathrm e^{\mathrm j\langle\varphi, w\rangle}\} = \exp\left(\int_{\mathbb R^d} f\big(\varphi(\boldsymbol r)\big)\,\mathrm d\boldsymbol r\right) \tag{4.12}
\]
where $f$ is a valid Lévy exponent and $\varphi$ is a generic test function in $\mathcal D(\mathbb R^d)$ (the space of infinitely differentiable functions of compact support).

Equation (4.12) is very similar to (4.2) and its multivariate extension (4.10). The key difference is that the frequency variable is now replaced by the generic test function $\varphi\in\mathcal D(\mathbb R^d)$ (which is a more general, infinite-dimensional entity) and that the sum inside the exponential in (4.10) is substituted by an integral over the domain of $\varphi$. The fundamental point is that $\widehat{\mathscr P}_w(\varphi)$ is a continuous, positive-definite functional on $\mathcal D(\mathbb R^d)$ with the key property that $\widehat{\mathscr P}_w(\varphi_1 + \varphi_2) = \widehat{\mathscr P}_w(\varphi_1)\,\widehat{\mathscr P}_w(\varphi_2)$ whenever $\varphi_1$ and $\varphi_2$ have non-overlapping support (i.e., $\varphi_1(\boldsymbol r)\varphi_2(\boldsymbol r) = 0$). The first part of the statement ensures that these generalized processes are well-defined (by the Minlos-Bochner theorem), while the separability property implies that they take independent values at all points, which partially justifies the "white noise" nomenclature. Remarkably, Gelfand and Vilenkin have shown that there is also a converse implication [GV64, Theorem 6, p. 283]: Equation (4.12) specifies a continuous, positive-definite functional on $\mathcal D$ (and hence defines an admissible white-noise process) if and only if $f$ is a Lévy exponent. This ensures that the Lévy family constitutes the broadest possible class of acceptable white-noise inputs for our innovation model.
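For the Gaussian exponent $f(\omega) = -\omega^2/2$, (4.12) reduces to $\widehat{\mathscr P}_w(\varphi) = \exp\big(-\frac12\|\varphi\|_{L_2}^2\big)$, which can be approximated by discretizing the white noise into independent increments. A Monte-Carlo sketch (discretization and test function are our own; note that $\varphi(r) = \mathrm e^{-r^2}$ is Schwartz-class rather than compactly supported, anticipating Section 4.4.1):

```python
import math, random

random.seed(4)
h = 0.1
grid = [i * h for i in range(-80, 81)]
phi = lambda r: math.exp(-r * r)            # rapidly decaying test function
weights = [phi(r) * math.sqrt(h) for r in grid]

# Discretized Gaussian white noise: independent increments of variance h,
# so <phi, w> ~ sum_k phi(r_k) sqrt(h) Z_k with Z_k ~ N(0, 1).
def sample_inner_product():
    return sum(c * random.gauss(0.0, 1.0) for c in weights)

n_samples = 10_000
mc = sum(math.cos(sample_inner_product()) for _ in range(n_samples)) / n_samples

# Characteristic form (4.12) with f(w) = -w^2/2 gives
# exp(-(1/2) int phi^2), and int exp(-2 r^2) dr = sqrt(pi/2).
exact = math.exp(-0.5 * math.sqrt(math.pi / 2.0))
print(f"Monte Carlo: {mc:.4f}   characteristic form: {exact:.4f}")
```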

4.4.1 Specification of white noise over Schwartz’ space S

In the present work, which relies heavily on convolution operators and Fourier analysis, we find it more convenient to define generalized stochastic processes with respect to test functions in the nuclear space $\mathcal{S}(\mathbb{R}^d)$, rather than the smaller space $\mathcal{D}(\mathbb{R}^d)$ used by Gelfand and Vilenkin. This requires a minimal restriction on the class of admissible Lévy densities in reference to Definition 7 to compensate for the lack of compact support of the functions in $\mathcal{S}(\mathbb{R}^d)$.

Theorem 9. A white Lévy noise specified by (4.12) with $\varphi\in\mathcal{S}(\mathbb{R}^d)$ is a generalized stochastic process over $\mathcal{S}'(\mathbb{R}^d)$ provided that $f(\cdot)$ is characterized by the Lévy–Khintchine formula (4.3) with Lévy triplet $(b_1,b_2,v)$, where the Lévy density $v(a)\ge 0$ satisfies
$$\int_{\mathbb{R}}\min(a^2,|a|^{\epsilon})\,v(a)\,\mathrm{d}a<\infty\quad\text{for some }\epsilon>0. \tag{4.13}$$

Proof. By the Minlos–Bochner theorem (Theorem 5), it suffices to show that $\widehat{\mathscr{P}}_w(\varphi)$ is a continuous, positive-definite functional over $\mathcal{S}(\mathbb{R}^d)$ with $\widehat{\mathscr{P}}_w(0)=1$, where the latter follows trivially from $f(0)=0$. The positive definiteness is a direct consequence of the exponential nature of the characteristic form and the conditional positive definiteness of the Lévy exponent (see Section 4.4.3 and the paragraph following Equation (4.18)).

The only delicate part is to prove continuity in the topology of $\mathcal{S}(\mathbb{R}^d)$. To that end, we consider a sequence of functions $\varphi_n$ that converges to $\varphi$ in $\mathcal{S}(\mathbb{R}^d)$.

First, we recall that $\mathcal{S}(\mathbb{R}^d)\subset L_p(\mathbb{R}^d)$ for all $0<p\le+\infty$. Moreover, convergence in $\mathcal{S}(\mathbb{R}^d)$ implies convergence in all the $L_p$ spaces. Indeed, if we select $k>0$ such that $kp>d$ and $\epsilon_0>0$, we have, for $n$ sufficiently large,
$$|\varphi_n(r)-\varphi(r)|\le\frac{\epsilon_0}{(\|r\|_2+1)^{k}},$$
which implies that
$$\|\varphi_n-\varphi\|_{L_p}^p\le\epsilon_0^p\int\frac{\mathrm{d}r}{(\|r\|_2+1)^{kp}}.$$
Since the right-hand-side integral is convergent and independent of $n$, we conclude that $\lim_{n\to\infty}\|\varphi_n-\varphi\|_{L_p}=0$.


Since continuity is preserved through the composition of continuous maps and the exponential function is continuous, we only need to establish the continuity of $F(\varphi)=\log\widehat{\mathscr{P}}_w(\varphi)$. The continuity of the Gaussian part is obvious since $\int\varphi_n\,\mathrm{d}r\to\int\varphi\,\mathrm{d}r$ and $\|\varphi_n\|_{L_2}\to\|\varphi\|_{L_2}$. We therefore concentrate on the functional
$$G(\varphi)=\int_{\mathbb{R}^d}\int_{\mathbb{R}\setminus\{0\}}\Big(\mathrm{e}^{ja\varphi(r)}-1-ja\varphi(r)\mathbb{1}_{|a|\le1}(a)\Big)\,v(a)\,\mathrm{d}a\,\mathrm{d}r$$

that corresponds to the non-Gaussian part of the Lévy–Khintchine representation of $f$. Next, we write
$$\begin{aligned}
|G(\varphi_n)-G(\varphi)|\le{}&\int_{\mathbb{R}^d}\int_{|a|>1}\Big|\mathrm{e}^{ja\varphi_n(r)}-\mathrm{e}^{ja\varphi(r)}\Big|\,v(a)\,\mathrm{d}a\,\mathrm{d}r\\
&+\int_{\mathbb{R}^d}\int_{0<|a|\le1}\Big|\mathrm{e}^{ja\varphi_n(r)}-\mathrm{e}^{ja\varphi(r)}-ja\big(\varphi_n(r)-\varphi(r)\big)\Big|\,v(a)\,\mathrm{d}a\,\mathrm{d}r\\
={}&(1)+(2).
\end{aligned}$$

To bound the first integral, we use the inequality
$$\Big|\mathrm{e}^{jx}-\mathrm{e}^{jy}\Big|=\Big|\mathrm{e}^{jy}\big(\mathrm{e}^{j(x-y)}-1\big)\Big|\le\min\big(2,|x-y|\big)\le 2\left(\frac{|x-y|}{2}\right)^{\epsilon}$$
under the (non-restrictive) condition that $\epsilon\le1$. This yields
$$(1)\le 2^{1-\epsilon}\int_{\mathbb{R}^d}\int_{|a|>1}|a|^{\epsilon}\,|\varphi_n(r)-\varphi(r)|^{\epsilon}\,v(a)\,\mathrm{d}a\,\mathrm{d}r=2^{1-\epsilon}\left(\int_{|a|>1}|a|^{\epsilon}v(a)\,\mathrm{d}a\right)\|\varphi_n-\varphi\|_{L_\epsilon}^{\epsilon}.$$

As for the second integral, we use the bound
$$\begin{aligned}
\Big|\mathrm{e}^{jx}-\mathrm{e}^{jy}-j(x-y)\Big|&=\Big|\mathrm{e}^{jy}\big(\mathrm{e}^{j(x-y)}-1-j(x-y)\big)+j(x-y)\big(\mathrm{e}^{jy}-1\big)\Big|\\
&\le\Big|\mathrm{e}^{j(x-y)}-1-j(x-y)\Big|+\Big|(x-y)\big(\mathrm{e}^{jy}-1\big)\Big|\\
&\le(x-y)^2+|x-y|\cdot|y|.
\end{aligned}$$
Therefore,
$$\begin{aligned}
(2)&\le\int_{\mathbb{R}^d}\int_{0<|a|\le1}a^2\big(\varphi_n(r)-\varphi(r)\big)^2\,v(a)\,\mathrm{d}a\,\mathrm{d}r+\int_{\mathbb{R}^d}\int_{0<|a|\le1}a^2\,|\varphi_n(r)-\varphi(r)|\cdot|\varphi(r)|\,v(a)\,\mathrm{d}a\,\mathrm{d}r\\
&\le\left(\int_{0<|a|\le1}a^2 v(a)\,\mathrm{d}a\right)\Big(\|\varphi_n-\varphi\|_{L_2}^2+\|\varphi_n-\varphi\|_{L_2}\|\varphi\|_{L_2}\Big).
\end{aligned}$$
Since $\|\varphi_n-\varphi\|_{L_p}\to0$ for all $p>0$ as $\varphi_n$ converges to $\varphi$ in $\mathcal{S}(\mathbb{R}^d)$, we conclude that $\lim_{n\to\infty}|G(\varphi_n)-G(\varphi)|=0$, which proves the continuity of $\widehat{\mathscr{P}}_w(\varphi)$.

Note that (4.13), which will be referred to as Lévy–Schwartz admissibility, is a very slight restriction on the classical condition ($\epsilon=0$) for id laws (see (4.1) in Definition 7). The fact that $\epsilon$ can be chosen arbitrarily small reflects the property that the functions in $\mathcal{S}$ have a faster-than-algebraic decay. From now on, we implicitly assume that the Lévy–Schwartz admissibility condition is met.
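As a quick numerical sanity check (a sketch under stated assumptions, not from the book), one can estimate the Lévy–Schwartz integral in (4.13) on a logarithmic grid. We take as running example the symmetric $\alpha$-stable Lévy density $v(a)=|a|^{-\alpha-1}$, for which the integral is finite for any $0<\epsilon<\alpha$ and can be computed in closed form for comparison:

```python
import numpy as np

def levy_schwartz_integral(v, eps, a_min=1e-8, a_max=1e6, n=400_000):
    """Estimate int_R min(a^2, |a|^eps) v(a) da for a symmetric Levy density v."""
    a = np.logspace(np.log10(a_min), np.log10(a_max), n)
    y = np.minimum(a**2, a**eps) * v(a)
    # trapezoidal rule on the log-spaced grid; factor 2 accounts for a < 0 by symmetry
    return 2.0 * np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(a))

alpha = 1.2
v_stable = lambda a: np.abs(a) ** (-alpha - 1)   # symmetric alpha-stable Levy density

I = levy_schwartz_integral(v_stable, eps=0.5)
# Closed form for this density: 2*(1/(2-alpha) + 1/(alpha-eps))
I_exact = 2.0 * (1.0 / (2.0 - alpha) + 1.0 / (alpha - 0.5))
```

A density with a tail like $1/(|a|\log^2|a|)$ would instead satisfy the classical condition ($\epsilon=0$) but fail (4.13) for every $\epsilon>0$, which is the (mild) restriction being imposed here.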

To exemplify the procedure, we select a quadratic exponent, which is trivially admissible (since $v(a)=0$). This results in
$$\widehat{\mathscr{P}}_{w_{\mathrm{Gauss}}}(\varphi)=\exp\left(-\frac{b_2}{2}\|\varphi\|_{L_2}^2\right),$$
which is the functional that completely characterizes the white Gaussian noise of the classical theory of stationary processes.
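This Gaussian formula lends itself to a minimal Monte Carlo sanity check (our own discretization, a sketch rather than a rigorous construction): on a uniform grid of step $\mathrm{d}x$, i.i.d. samples of variance $b_2/\mathrm{d}x$ make the Riemann sum $\langle\varphi,w\rangle\approx\sum_i w_i\varphi(r_i)\,\mathrm{d}x$ Gaussian with variance $b_2\|\varphi\|_{L_2}^2$, so the empirical characteristic form should match $\exp(-\tfrac{b_2}{2}\|\varphi\|_{L_2}^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
dx = 0.01
r = np.arange(-5.0, 5.0, dx)
phi = np.exp(-r**2)                  # test function (an assumption for illustration)
b2 = 1.0                             # Gaussian part of the Levy triplet

# Discretized white Gaussian noise: i.i.d. N(0, b2/dx) samples per grid point, so
# that <phi, w> = sum_i w_i phi(r_i) dx has variance b2 * ||phi||_{L2}^2.
W = rng.normal(0.0, np.sqrt(b2 / dx), size=(10_000, r.size))
inner = W @ phi * dx                 # Monte Carlo samples of <phi, w>

lhs = np.mean(np.exp(1j * inner))                 # empirical characteristic form at phi
rhs = np.exp(-0.5 * b2 * np.sum(phi**2) * dx)     # exp(-b2/2 ||phi||_{L2}^2)
```

The agreement (up to Monte Carlo error) illustrates that the characteristic form is probed only through inner products with test functions, exactly as in the text.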

4.4.2 Impulsive Poisson noise

We have already alluded to the fact that the continuous-domain white-noise processes $w$ are highly singular and generally too rough to admit an interpretation as conventional functions of the index variable $r\in\mathbb{R}^d$. The realizations (or sample paths) are generalized functions that can only be probed indirectly through their scalar products $\langle\varphi,w\rangle$ with test functions or observation windows, as illustrated in Section 4.1. While the use of such an indirect approach is unavoidable in the mathematical formulation, it is possible to provide an explicit pointwise description of a noise realization in the special case where the Lévy exponent $f$ is associated with a compound-Poisson distribution [UT11]. The corresponding impulsive Poisson noise model is
$$w(r)=\sum_{k\in\mathbb{Z}}A_k\,\delta(r-r_k) \tag{4.14}$$
where the $r_k$ are random point locations in $\mathbb{R}^d$ and the $A_k$ are i.i.d. random variables with pdf $p_A$. The random events are indexed by $k$ (using some arbitrary ordering); they are mutually independent and follow a spatial Poisson distribution. Specifically, if $\Lambda$ is any finite-measure subset of $\mathbb{R}^d$, then the probability of observing $N(\Lambda)=n$ events in $\Lambda$ is
$$\mathrm{Prob}(N(\Lambda)=n)=\mathrm{e}^{-\lambda\,\mathrm{Vol}(\Lambda)}\,\frac{\big(\lambda\,\mathrm{Vol}(\Lambda)\big)^n}{n!}$$
where $\mathrm{Vol}(\Lambda)$ is the measure (or spatial volume) of $\Lambda$. This is to say that the Poisson parameter $\lambda$ represents the average number of random impulses per unit hypervolume. The link with the formal specification of Lévy noise in Definition 9 is as follows.

Theorem 10. The characteristic form of the impulsive Poisson noise specified by (4.14) is
$$\widehat{\mathscr{P}}_{w_{\mathrm{Poisson}}}(\varphi)=\mathbb{E}\{\mathrm{e}^{j\langle\varphi,w\rangle}\}=\exp\left(\int_{\mathbb{R}^d}f_{\mathrm{Poisson}}\big(\varphi(r)\big)\,\mathrm{d}r\right) \tag{4.15}$$
with
$$f_{\mathrm{Poisson}}(\omega)=\lambda\int_{\mathbb{R}}\big(\mathrm{e}^{ja\omega}-1\big)\,p_A(a)\,\mathrm{d}a=\lambda\big(\widehat{p}_A(\omega)-1\big), \tag{4.16}$$
where $\lambda$ is the Poisson density parameter, $p_A$ the amplitude pdf of the Dirac impulses, and $\widehat{p}_A$ the corresponding characteristic function.

Proof. We select an arbitrary test function $\varphi\in\mathcal{D}(\mathbb{R}^d)$ of compact support, with its support included in the centered cube $\Lambda_\varphi=[-c_\varphi,c_\varphi]^d$. We denote by $N_{w,\varphi}$ the number of Poisson points of $w$ in $\Lambda_\varphi$; by definition, it is a Poisson random variable with parameter $\lambda\,\mathrm{Vol}(\Lambda_\varphi)$. The restriction of $w$ to $\Lambda_\varphi$ corresponds to the random sum
$$\sum_{n=1}^{N_{w,\varphi}}a'_n\,\delta(r-r'_n),$$
where we used an appropriate relabeling of the variables $\{(a_k,r_k)\,|\,r_k\in\Lambda_\varphi\}$ in (4.14). Correspondingly, we have $\langle\varphi,w\rangle=\sum_{n=1}^{N_{w,\varphi}}a'_n\varphi(r'_n)$.

By the order-statistics property of Poisson processes, the $r'_n$ are independent and all equivalent in distribution to a random variable $r'$ that is uniform on $\Lambda_\varphi$.

Using the law of total expectation, we expand the characteristic functional of $w$, $\widehat{\mathscr{P}}_w(\varphi)=\mathbb{E}\{\mathrm{e}^{j\langle\varphi,w\rangle}\}$, as
$$\begin{aligned}
\widehat{\mathscr{P}}_w(\varphi)&=\mathbb{E}\Big\{\mathbb{E}\big\{\mathrm{e}^{j\langle\varphi,w\rangle}\,\big|\,N_{w,\varphi}\big\}\Big\}\\
&=\mathbb{E}\left\{\mathbb{E}\left\{\prod_{n=1}^{N_{w,\varphi}}\mathrm{e}^{ja'_n\varphi(r'_n)}\,\Bigg|\,N_{w,\varphi}\right\}\right\}\\
&=\mathbb{E}\left\{\prod_{n=1}^{N_{w,\varphi}}\mathbb{E}\Big\{\mathrm{e}^{ja'\varphi(r')}\Big\}\right\}\quad\text{(by independence)}\\
&=\mathbb{E}\left\{\prod_{n=1}^{N_{w,\varphi}}\mathbb{E}\Big\{\mathbb{E}\big\{\mathrm{e}^{ja\varphi(r')}\,\big|\,a\big\}\Big\}\right\}\quad\text{(total expectation)}\\
&=\mathbb{E}\left\{\prod_{n=1}^{N_{w,\varphi}}\mathbb{E}\left\{\frac{\int_{\Lambda_\varphi}\mathrm{e}^{ja\varphi(r')}\,\mathrm{d}r'}{\mathrm{Vol}(\Lambda_\varphi)}\right\}\right\}\quad\text{(as $r'$ is uniform in $\Lambda_\varphi$)}\\
&=\mathbb{E}\left\{\prod_{n=1}^{N_{w,\varphi}}\frac{\int_{\mathbb{R}}\int_{\Lambda_\varphi}\mathrm{e}^{ja\varphi(r)}\,\mathrm{d}r\,p_A(a)\,\mathrm{d}a}{\mathrm{Vol}(\Lambda_\varphi)}\right\}.
\end{aligned} \tag{4.17}$$

The last expression has the inner expectation expanded in terms of the pdf $p_A(a)$ of the random variable $a$. Defining the auxiliary functional
$$M(\varphi)=\int_{\Lambda_\varphi}\int_{\mathbb{R}}\mathrm{e}^{ja\varphi(r)}\,p_A(a)\,\mathrm{d}a\,\mathrm{d}r,$$
we rewrite (4.17) as
$$\mathbb{E}\left\{\prod_{n=1}^{N_{w,\varphi}}\frac{M(\varphi)}{\mathrm{Vol}(\Lambda_\varphi)}\right\}=\mathbb{E}\left\{\left(\frac{M(\varphi)}{\mathrm{Vol}(\Lambda_\varphi)}\right)^{N_{w,\varphi}}\right\}.$$

Next, we use the fact that $N_{w,\varphi}$ is a Poisson random variable to compute the above expectation directly:
$$\begin{aligned}
\mathbb{E}\left\{\left(\frac{M(\varphi)}{\mathrm{Vol}(\Lambda_\varphi)}\right)^{N_{w,\varphi}}\right\}&=\sum_{n\ge0}\left(\frac{M(\varphi)}{\mathrm{Vol}(\Lambda_\varphi)}\right)^{n}\frac{\mathrm{e}^{-\lambda\mathrm{Vol}(\Lambda_\varphi)}\big(\lambda\mathrm{Vol}(\Lambda_\varphi)\big)^{n}}{n!}\\
&=\mathrm{e}^{-\lambda\mathrm{Vol}(\Lambda_\varphi)}\sum_{n\ge0}\frac{\big(\lambda M(\varphi)\big)^{n}}{n!}\\
&=\mathrm{e}^{-\lambda\mathrm{Vol}(\Lambda_\varphi)}\,\mathrm{e}^{\lambda M(\varphi)}\quad\text{(Taylor)}\\
&=\exp\Big(\lambda\big(M(\varphi)-\mathrm{Vol}(\Lambda_\varphi)\big)\Big).
\end{aligned}$$

We now replace $M(\varphi)$ by its integral equivalent, noting also that $\mathrm{Vol}(\Lambda_\varphi)=\int_{\Lambda_\varphi}\int_{\mathbb{R}}1\times p_A(a)\,\mathrm{d}a\,\mathrm{d}r$, whereupon we obtain the expression
$$\widehat{\mathscr{P}}_w(\varphi)=\exp\left(\lambda\int_{\Lambda_\varphi}\int_{\mathbb{R}}\big(\mathrm{e}^{ja\varphi(r)}-1\big)\,p_A(a)\,\mathrm{d}a\,\mathrm{d}r\right).$$
As $\big(\mathrm{e}^{ja\varphi(r)}-1\big)$ vanishes outside the support of $\varphi$ (and, therefore, outside $\Lambda_\varphi$), we may enlarge the domain of the inner integral to all of $\mathbb{R}^d$, which yields (4.15). Finally, we use the fact that the derived Poisson functional is part of the Lévy family and invoke Theorem 9 to extend the domain of $\widehat{\mathscr{P}}_w$ from $\mathcal{D}(\mathbb{R}^d)$ to $\mathcal{S}(\mathbb{R}^d)$.
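Theorem 10 can be verified empirically (a Monte Carlo sketch with our own illustrative choices: $d=1$, a Gaussian bump as test function, and standard-Gaussian amplitudes, for which $\widehat{p}_A(\omega)=\mathrm{e}^{-\omega^2/2}$): sample $\langle\varphi,w\rangle=\sum_k A_k\varphi(r_k)$ many times and compare the empirical characteristic form with the right-hand side of (4.15)-(4.16).

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T = 5.0, 10.0                        # Poisson rate and observation window [0, T]
r = np.linspace(0.0, T, 2001)
phi_f = lambda x: np.exp(-(x - 5.0)**2)   # test function (numerically supported in [0, T])

# Monte Carlo samples of <phi, w> = sum_k A_k phi(r_k), with N ~ Poisson(lam*T),
# r_k uniform on [0, T], and A_k ~ N(0, 1) (illustrative amplitude pdf p_A).
n_mc = 50_000
N = rng.poisson(lam * T, n_mc)
rk = rng.uniform(0.0, T, N.sum())
Ak = rng.normal(0.0, 1.0, N.sum())
idx = np.repeat(np.arange(n_mc), N)       # which Monte Carlo run each impulse belongs to
inner = np.bincount(idx, weights=Ak * phi_f(rk), minlength=n_mc)

lhs = np.mean(np.exp(1j * inner))         # empirical characteristic form at phi
# (4.15)-(4.16) with p_A_hat(omega) = exp(-omega^2/2) for standard Gaussian amplitudes:
f_vals = lam * (np.exp(-phi_f(r)**2 / 2.0) - 1.0)
rhs = np.exp(np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(r)))
```

By the symmetry of the amplitude law, both sides are real here; the Monte Carlo estimate matches the analytic functional up to sampling error.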

The interest of this result is twofold. First, it gives a concrete meaning to the compound-Poisson scenario in Figure 4.1c, allowing for a description in terms of conventional point processes. Along the same vein, we can propose a physical analogy for the elementary Poisson term $f_3(\omega)=\lambda(\mathrm{e}^{ja_0\omega}-1)$ in Section 4.2.2 with $p_A(a)=\delta(a-a_0)$: the counting of photons impinging on the detectors of a CCD camera, with the photon density being constant over $\mathbb{R}^d$ and the integration time proportional to $\lambda$. The corresponding process is usually termed "photon noise" in optical imaging. Second, the explicit noise model (4.14) suggests a practical mechanism for generating generalized Poisson processes as a weighted sum of shifted Green functions of $\mathrm{L}$, each Dirac impulse being replaced by the response of the inverse operator in the innovation model in Figure 2.1.
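The second point can be made concrete with a minimal one-dimensional sketch (our own choices: $\mathrm{L}=\mathrm{d}/\mathrm{d}t$, whose causal Green function is the unit step $u(t)$, and an illustrative Laplace amplitude law): replacing each Dirac impulse of (4.14) by $u(\cdot-t_k)$ yields a piecewise-constant generalized Poisson (compound-Poisson) process.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, T = 2.0, 10.0
t = np.linspace(0.0, T, 2001)

# Impulsive Poisson innovation: N ~ Poisson(lam*T) impulses A_k delta(. - t_k)
N = rng.poisson(lam * T)
tk = rng.uniform(0.0, T, N)
Ak = rng.laplace(0.0, 1.0, N)   # illustrative sparse (Laplace) amplitude law

# s = L^{-1} w for L = d/dt: each impulse is replaced by the Green function
# u(t - t_k), producing a piecewise-constant compound-Poisson sample path.
s = (Ak[None, :] * (t[:, None] >= tk[None, :])).sum(axis=1)
```

The sample path `s` jumps by $A_k$ at each $t_k$ and is constant in between, which is exactly the pointwise picture that the abstract innovation model $\mathrm{L}s=w$ formalizes.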

4.4.3 Properties of white noise

To emphasize the parallel with the scalar formulation in Section 4.2, we start by introducing the functional counterpart of Definition 5.

Definition 10 (Generalized Lévy exponent). A continuous complex-valued functional $F$ on the nuclear space $\mathcal{S}(\mathbb{R}^d)$ such that $F(0)=0$ and $F(\varphi)=\overline{F(-\varphi)}$ is called a generalized Lévy exponent if it is conditionally positive-definite of order one; i.e.,
$$\sum_{m=1}^{N}\sum_{n=1}^{N}F(\varphi_m-\varphi_n)\,\xi_m\overline{\xi_n}\ge0$$
under the condition $\sum_{n=1}^{N}\xi_n=0$ for every possible choice of $\varphi_1,\dots,\varphi_N\in\mathcal{S}(\mathbb{R}^d)$, $\xi_1,\dots,\xi_N\in\mathbb{C}$, and $N\in\mathbb{Z}^+$.

This definition is motivated by the infinite-dimensional counterpart of Schoenberg's correspondence theorem (Theorem 8) [PR70].

Theorem 11 (Prakasa Rao). Let $F$ be a complex-valued functional on the nuclear space $\mathcal{S}(\mathbb{R}^d)$ such that $F(0)=0$ and $F(\varphi)=\overline{F(-\varphi)}$. Then, the following conditions are equivalent.

1. The functional $F$ is conditionally positive-definite of order one.

2. For every choice of $\varphi_1,\dots,\varphi_N\in\mathcal{S}(\mathbb{R}^d)$, $\xi_1,\dots,\xi_N\in\mathbb{C}$, $\tau>0$, and $N\in\mathbb{Z}^+$,
$$\sum_{m=1}^{N}\sum_{n=1}^{N}\exp\big(\tau F(\varphi_m-\varphi_n)\big)\,\xi_m\overline{\xi_n}\ge0.$$

3. For every choice of $\varphi_1,\dots,\varphi_N\in\mathcal{S}(\mathbb{R}^d)$, $\xi_1,\dots,\xi_N\in\mathbb{C}$, and $N\in\mathbb{Z}^+$,
$$\sum_{m=1}^{N}\sum_{n=1}^{N}\Big(F(\varphi_m-\varphi_n)-F(\varphi_m)-F(-\varphi_n)\Big)\,\xi_m\overline{\xi_n}\ge0.$$

In the white Lévy noise scenario of Definition 9, we have that
$$F(\varphi)=\int_{\mathbb{R}^d}f\big(\varphi(r)\big)\,\mathrm{d}r. \tag{4.18}$$
It then comes as no surprise that the generalized Lévy exponent $F(\varphi)$ inherits the relevant properties of $f$, including conditional positive definiteness, with
$$\sum_{m=1}^{N}\sum_{n=1}^{N}F(\varphi_m-\varphi_n)\,\xi_m\overline{\xi_n}=\int_{\mathbb{R}^d}\underbrace{\sum_{m=1}^{N}\sum_{n=1}^{N}f\big(\varphi_m(r)-\varphi_n(r)\big)\,\xi_m\overline{\xi_n}}_{\ge0}\,\mathrm{d}r\ge0$$
subject to the constraint $\sum_{n=1}^{N}\xi_n=0$.

The simple additive nature of the mapping (4.18) between generalized Lévy exponents and the classical ones translates into the following white-noise properties, which are central to our formulation.


1) Independent atoms and stationarity

Proposition 5. A white Lévy noise is stationary and independent at every point.

Proof. The stationarity property is expressed by $\widehat{\mathscr{P}}_w(\varphi)=\widehat{\mathscr{P}}_w\big(\varphi(\cdot-r_0)\big)$ for all $r_0\in\mathbb{R}^d$. It is established by a simple change of variable in the defining integral. To investigate the independence at every point, we determine the joint characteristic function of the random variables $X_1=\langle\varphi_1,w\rangle$ and $X_2=\langle\varphi_2,w\rangle$, which is given by $\widehat{p}_{X_1,X_2}(\omega_1,\omega_2)=\exp\big(F(\omega_1\varphi_1+\omega_2\varphi_2)\big)$, where $F$ is defined by (4.18). When $\varphi_1$ and $\varphi_2$ have non-overlapping support, we use the fact that $f(0)=0$ and decompose the exponent as
$$F(\omega_1\varphi_1+\omega_2\varphi_2)=F(\omega_1\varphi_1)+F(\omega_2\varphi_2),$$
which implies that
$$\widehat{p}_{X_1,X_2}(\omega_1,\omega_2)=\exp\big(F(\omega_1\varphi_1)\big)\times\exp\big(F(\omega_2\varphi_2)\big)=\widehat{p}_{X_1}(\omega_1)\times\widehat{p}_{X_2}(\omega_2)$$
where $\widehat{p}_{X_i}(\omega)=\exp\big(F(\omega\varphi_i)\big)$, which proves that $X_1$ and $X_2$ are independent. The independence at every point follows from the fact that one can consider contracting, Dirac-like sequences of functions $\varphi_1$ and $\varphi_2$ that are non-overlapping and whose support gets arbitrarily small.

2) Infinite divisibility

Proposition 6. A white Lévy noise is uniquely specified by a canonical id distribution
$$p_{\mathrm{id}}(x)=\int_{\mathbb{R}}\mathrm{e}^{f(\omega)-j\omega x}\,\frac{\mathrm{d}\omega}{2\pi},$$
where $f$ is the defining Lévy exponent in (4.18). The latter corresponds to the pdf of the observation $X=\langle\mathrm{rect}(\cdot-r_0),w\rangle$ through a rectangular window at some arbitrary location $r_0$.

The first part is just a restatement of the functional equivalence between Lévy noises and id distributions on the one hand, and Lévy exponents on the other. As for the second part, we recall that the characteristic function of the variable $X=\langle\varphi,w\rangle$ is given by
$$\mathbb{E}\{\mathrm{e}^{j\omega X}\}=\mathbb{E}\{\mathrm{e}^{j\omega\langle\varphi,w\rangle}\}=\mathbb{E}\{\mathrm{e}^{j\langle\omega\varphi,w\rangle}\}=\widehat{\mathscr{P}}_w(\omega\varphi)=\exp\big(F(\omega\varphi)\big).$$
By choosing $\varphi(r)=\mathrm{rect}(r-r_0)$ with $\mathrm{rect}(r)=1$ for $r\in(-\tfrac12,\tfrac12]^d$ and zero otherwise, we formally resolve the integral $\int_{\mathbb{R}^d}f\big(\omega\,\mathrm{rect}(r)\big)\,\mathrm{d}r=f(\omega)$; this implies that $\widehat{p}_{\mathrm{id}}(\omega)=\exp\big(f(\omega)\big)$, which is the desired result.

Along the same line, we can show that the use of an arbitrary, non-rectangular analysis window does not fundamentally change the situation, in the sense that the pdf remains infinitely divisible.

Proposition 7. The observation $X=\langle\varphi,w\rangle$ of a white Lévy noise with Lévy exponent $f$ through an arbitrary observation window $\varphi$ (not necessarily in $\mathcal{S}(\mathbb{R}^d)$) yields an infinitely divisible random variable $X$ whose characteristic function is $\widehat{p}_X(\omega)=\mathrm{e}^{f_\varphi(\omega)}$ with $f_\varphi(\omega)=\int_{\mathbb{R}^d}f\big(\omega\varphi(r)\big)\,\mathrm{d}r$. The validity of $f_\varphi$ requires some (mild) technical condition on $f$ when $\varphi\in L_p(\mathbb{R}^d)$ is not rapidly decaying.

This property is investigated in full depth in Chapter 9 and exploited for deriving transform-domain statistics. The precise statement of this id result for $\varphi\in L_p(\mathbb{R}^d)$ is given in Theorem 21.

Another manifestation of the id property is that a continuous-domain Lévy noise can always be broken down into an arbitrary number of independent and identically distributed (i.i.d.) components.


Proposition 8. A white Lévy noise $w$ is infinitely divisible in the sense that it can be decomposed as $w=w_1+\cdots+w_N$ for any $N\in\mathbb{Z}^+$, where the $w_n$ are i.i.d. white-noise processes.

This simply follows from the property that the characteristic form of the sum of two independent processes is the product of their individual characteristic forms. Specifically, we can write
$$\widehat{\mathscr{P}}_w(\varphi)=\Big(\widehat{\mathscr{P}}_{w_N}(\varphi)\Big)^{N}$$
where $\widehat{\mathscr{P}}_{w_N}(\varphi)=\exp\big(\int_{\mathbb{R}^d}f\big(\varphi(r)\big)/N\,\mathrm{d}r\big)$ is the characteristic form of a Lévy noise. The justification is that $f(\omega)/N$ is a valid Lévy exponent for any $N\ge1$. In the impulsive-Poisson case, this simply translates into the Poisson density parameter $\lambda$ being divided by $N$.

Interestingly, there is also a converse to the statement in Proposition 8 [PR70]: a generalized process $s$ over $\mathcal{S}'(\mathbb{R}^d)$ is infinitely divisible if and only if its characteristic form can be written as $\widehat{\mathscr{P}}_s(\varphi)=\exp\big(F(\varphi)\big)$, where $F(\varphi)$ is a continuous, conditionally positive-definite functional over $\mathcal{S}(\mathbb{R}^d)$ (or generalized Lévy exponent) as specified in Definition 10 [PR70, Main theorem]. While this general characterization is nice conceptually, it is hardly discriminative since the underlying notion of infinite divisibility applies to all concrete families of generalized stochastic processes that are known to us. In particular, it does not require the "whiteness" property that is fundamental for defining proper innovations.

3) Flat power spectrum

Strictly speaking, the properties of stationarity and independence at every point are not sufficient for specifying "white" noise. There is also the implicit idea of enforcing a flat Fourier spectrum. A simple example that satisfies the first two properties but fails to meet the latter is the weak derivative of a Lévy noise, whose generalized power spectrum (when defined) is not flat but proportional to $|\omega|^2$.

The notion of power spectrum is based on second-order moments and only makes sense when the stochastic process is stationary with a well-defined autocorrelation. In Gelfand's theory, the second-order dependencies are captured by the correlation form $B_w(\varphi_1,\varphi_2)=\mathbb{E}\{\langle\varphi_1,w\rangle\cdot\langle\varphi_2,w\rangle\}$, where it is assumed that the generalized noise process $w$ is real-valued and that its second-order moments are well-defined. The latter second-order requirement is equivalent to imposing that the Lévy exponent $f$ be twice differentiable at the origin or, equivalently, that the canonical id distribution $p_{\mathrm{id}}$ of the process have a finite second-order moment.

Proposition 9. Let $w$ be a (second-order) white Lévy noise with Lévy exponent $f$ such that $f'(0)=0$ (zero-mean assumption) and $\sigma_w^2=-f''(0)<+\infty$ (finite-variance assumption). Then,
$$B_w(\varphi_1,\varphi_2)=\sigma_w^2\,\langle\varphi_1,\varphi_2\rangle. \tag{4.19}$$

Formally, this corresponds to the statement that the autocorrelation of a second-order$^4$ Lévy noise is a Dirac impulse; i.e.,
$$R_w(r)=\mathbb{E}\{w(r_0)w(r_0-r)\}=B_w\big(\delta(\cdot-r_0),\delta(\cdot-r_0-r)\big)=\sigma_w^2\,\delta(r)$$
where $r_0\in\mathbb{R}^d$ can be arbitrary, as a consequence of the stationarity property. This also means that the generalized power spectrum of a second-order white Lévy noise is flat:
$$\Phi_w(\omega)=\mathcal{F}\{R_w(r)\}(\omega)=\sigma_w^2.$$

4. In the statistical literature, a second-order process usually designates a stochastic process whose second-order moments are all well defined. In the case of generalized processes, the property refers to the existence of the correlation form.

We recall that the term "white" is used in reference to white light, whose electromagnetic spectrum is distributed over the visible band in a way that stimulates all color receptors of the eye equally. This is in opposition to "colored" noise, whose spectral content is not equally distributed.

Proof. We have that $B_w(\varphi_1,\varphi_2)=\mathbb{E}\{X_1X_2\}$, where $X_1=\langle\varphi_1,w\rangle$ and $X_2=\langle\varphi_2,w\rangle$ are real-valued random variables with joint characteristic function $\widehat{p}_{X_1,X_2}(\boldsymbol{\omega})=\exp\big(F(\omega_1\varphi_1+\omega_2\varphi_2)\big)$ with $\boldsymbol{\omega}=(\omega_1,\omega_2)$. We then invoke the moment-generating property of the Fourier transform, which translates into
$$B_w(\varphi_1,\varphi_2)=\mathbb{E}\{X_1X_2\}=(-j)^2\,\frac{\partial^2\widehat{p}_{X_1,X_2}(\boldsymbol{\omega})}{\partial\omega_1\partial\omega_2}\Bigg|_{\omega_1=0,\omega_2=0}.$$

By applying the chain rule twice, we obtain
$$\frac{\partial^2\widehat{p}_{X_1,X_2}(\boldsymbol{\omega})}{\partial\omega_1\partial\omega_2}=\mathrm{e}^{f_{X_1,X_2}(\boldsymbol{\omega})}\left(\frac{\partial f_{X_1,X_2}(\boldsymbol{\omega})}{\partial\omega_1}\,\frac{\partial f_{X_1,X_2}(\boldsymbol{\omega})}{\partial\omega_2}+\frac{\partial^2 f_{X_1,X_2}(\boldsymbol{\omega})}{\partial\omega_1\partial\omega_2}\right)$$
where
$$f_{X_1,X_2}(\boldsymbol{\omega})=\log\widehat{p}_{X_1,X_2}(\omega_1,\omega_2)=\int_{\mathbb{R}^d}f\big(\omega_1\varphi_1(r)+\omega_2\varphi_2(r)\big)\,\mathrm{d}r$$

is the cumulant-generating function of $\widehat{p}_{X_1,X_2}$. The required first derivative with respect to $\omega_i$, $i=1,2$, is given by
$$\frac{\partial f_{X_1,X_2}(\boldsymbol{\omega})}{\partial\omega_i}=\int_{\mathbb{R}^d}f'\big(\omega_1\varphi_1(r)+\omega_2\varphi_2(r)\big)\,\varphi_i(r)\,\mathrm{d}r,$$
which, when evaluated at the origin, simplifies to
$$\frac{\partial f_{X_1,X_2}(\boldsymbol{0})}{\partial\omega_i}=f'(0)\int_{\mathbb{R}^d}\varphi_i(r)\,\mathrm{d}r=-j\,\mathbb{E}\{X_i\}. \tag{4.20}$$
Similarly, we get
$$\frac{\partial^2 f_{X_1,X_2}(\boldsymbol{0})}{\partial\omega_1\partial\omega_2}=f''(0)\int_{\mathbb{R}^d}\varphi_1(r)\varphi_2(r)\,\mathrm{d}r.$$

By combining those results and using the property that $f_{X_1,X_2}(\boldsymbol{0})=0$, we conclude that
$$\mathbb{E}\{X_1X_2\}=-f''(0)\,\langle\varphi_1,\varphi_2\rangle-\big(f'(0)\big)^2\langle\varphi_1,1\rangle\langle\varphi_2,1\rangle,$$
which is equivalent to (4.19) under the hypothesis that $f'(0)=0$. It is also clear from (4.20) that this latter condition is equivalent to the zero-mean property of the noise; that is, $\mathbb{E}\{\langle\varphi,w\rangle\}=0$ for all $\varphi\in\mathcal{S}(\mathbb{R}^d)$. Finally, we note that (4.19) is compatible with the more general cumulant formula (9.20) if we set $\boldsymbol{n}=(1,1)$, $n=2$, and $\kappa_2=(-j)^2 f''(0)$.

Since $f(0)=0$ by definition, another way of writing the hypotheses in Proposition 9 is $f(\omega)=-\frac{\sigma_w^2}{2}\omega^2+O(|\omega|^3)$, which expresses an asymptotic equivalence with the symmetric Gaussian scenario (purely quadratic Lévy exponent). This second-order assumption ensures that the noise has zero mean and a finite variance $\sigma_w^2$, so that its correlation form (4.19) is well defined. In the sequel, it is made implicitly whenever we are talking of correlations or power spectra.
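The correlation identity (4.19) is easy to probe numerically (a grid-based Monte Carlo sketch with our own discretization and a Gaussian innovation, chosen purely for convenience): with i.i.d. samples of variance $\sigma_w^2/\mathrm{d}x$ per grid point, the empirical $\mathbb{E}\{\langle\varphi_1,w\rangle\langle\varphi_2,w\rangle\}$ should match $\sigma_w^2\langle\varphi_1,\varphi_2\rangle$.

```python
import numpy as np

rng = np.random.default_rng(4)
dx = 0.01
r = np.arange(-4.0, 4.0, dx)
phi1 = np.exp(-r**2)             # two overlapping test windows (illustrative)
phi2 = np.exp(-(r - 1.0)**2)
sigma2 = 1.0                     # noise variance sigma_w^2

# Discretized zero-mean white noise with variance sigma2/dx per sample:
W = rng.normal(0.0, np.sqrt(sigma2 / dx), size=(10_000, r.size))
X1 = W @ phi1 * dx               # Monte Carlo samples of <phi1, w>
X2 = W @ phi2 * dx               # Monte Carlo samples of <phi2, w>

B_est = np.mean(X1 * X2)                     # empirical correlation form B_w(phi1, phi2)
B_true = sigma2 * np.sum(phi1 * phi2) * dx   # sigma_w^2 <phi1, phi2>
```

The agreement reflects the Dirac autocorrelation: only the overlap of the two windows contributes to the correlation form.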


4) Stochastic counterpart of the Dirac impulse

From an engineering perspective, white noise is often viewed as the stochastic analog of the Dirac distribution $\delta$, whose spectrum is flat in the literal sense (i.e., $\mathcal{F}\{\delta\}=1$). The fundamental difference, of course, is that the generalized function $\delta$ is a deterministic entity. The simplest way of introducing randomness is by considering a shifted and weighted impulse $A\,\delta(\cdot-r_0)$ whose location $r_0$ is uniformly distributed over some compact subset of $\mathbb{R}^d$ and whose amplitude $A$ is a random variable with pdf $p_A$. A richer form of excitation is obtained through the summation of such i.i.d. elementary contributions, which results in the construction of impulsive Poisson noise, as specified by (4.14). Theorem 10 ensures that this explicit way of representing noise is legitimate in the case where the Lévy density $v=\lambda p_A$ is integrable and the Gaussian part absent. We shall now see that this constructive approach can be pushed to the limit for the non-Poisson brands of innovations, including the Gaussian ones.

Proposition 10. A white Lévy noise is the limit of a sequence of impulsive Poisson-noise processes in the sense of the weak convergence of the underlying infinite-dimensional measures.

Proof. The technical part of the proof uses an infinite-dimensional generalization of Lévy's continuity theorem and will be reported elsewhere. The key idea is to consider the following sequence of Lévy exponents
$$f_n(\omega)=n\Big(\mathrm{e}^{\frac{1}{n}f(\omega)}-1\Big)=f(\omega)+O\left(\frac{f^2(\omega)}{n}\right),$$
which are of the compound-Poisson type with $\lambda_n=n$ and $\widehat{p}_{A_n}(\omega)=\mathrm{e}^{\frac{1}{n}f(\omega)}$, and which converge to $f(\omega)$ as $n$ goes to infinity. This suggests forming the corresponding sequence of characteristic functionals
$$\widehat{\mathscr{P}}_{w_n}(\varphi)=\exp\left(\int_{\mathbb{R}^d}n\Big(\mathrm{e}^{\frac{1}{n}f(\varphi(r))}-1\Big)\,\mathrm{d}r\right),$$
which are expected to converge to $\widehat{\mathscr{P}}_w(\varphi)=\exp\big(\int_{\mathbb{R}^d}f\big(\varphi(r)\big)\,\mathrm{d}r\big)$ as $n\to+\infty$. The main point for the argument is that these are all of the impulsive-Poisson type for $n$ fixed (see Theorem 10). The crux of the proof is to control the convergence by specifying some appropriate bound and to verify that some basic equicontinuity conditions are met.

The result is interesting because it gives us some insight into the nature of continuous-domain noise. The limit process involves random Dirac impulses that get denser as $n$ increases. When the variance of the noise $\sigma_w^2=-f''(0)$ is finite, the increase of the average number of impulses per unit volume $\lambda_n=O(n)$ is compensated by a decrease of the variance of the amplitude distribution in inverse proportion: $\mathrm{Var}\{A_n\}=\sigma_w^2/n$. While any of the generalized noise processes in the sequence is as rough as a Dirac impulse, this picture suggests that the degree of singularity of the limit object in the non-Poisson scenario can be potentially reduced due to the accumulation of impulses and the fact that the variance of their amplitude distribution converges to zero. The particular example that we have in mind is the Gaussian white noise, which can be obtained as a limit of compound-Poisson processes with contracting Gaussian amplitude distributions.


4.5 Generalized stochastic processes and linear models

As already mentioned, the class of generalized stochastic processes that are of interest to us are those defined through the generic innovation model $\mathrm{L}s=w$ (a linear stochastic differential equation), where the differential operator $\mathrm{L}$ is shift-invariant and the driving term $w$ is a continuous-domain white Lévy noise. Having made sense of the latter, we can now proceed with the specification of the class of admissible whitening operators $\mathrm{L}$. The key requirement there is that the model be invertible (in the sense of generalized functions), which, by duality, translates into some boundedness constraint on the adjoint operator $\mathrm{L}^{-1*}$. For the time being, we shall limit ourselves to making some general statements about $\mathrm{L}$ and its inverse that ensure existence, while deferring to Chapter 5 for concrete examples of admissible operators.

4.5.1 Innovation models

The interpretation of the above continuous-domain linear model in the sense of generalized functions is
$$\forall\varphi\in\mathcal{S}(\mathbb{R}^d),\quad\langle\varphi,\mathrm{L}s\rangle=\langle\varphi,w\rangle. \tag{4.21}$$
The generalized stochastic process $s=\mathrm{L}^{-1}w$ is generated by solving this equation, which amounts to a linear transformation of the Lévy innovation $w$. Formally, this translates into
$$\forall\varphi\in\mathcal{S}(\mathbb{R}^d),\quad\langle\varphi,s\rangle=\langle\varphi,\mathrm{L}^{-1}w\rangle=\langle\mathrm{L}^{-1*}\varphi,w\rangle \tag{4.22}$$
where $\mathrm{L}^{-1}$ is an appropriate right inverse of $\mathrm{L}$. The above manipulation obviously only makes sense if the action of the adjoint operator $\mathrm{L}^{-1*}$ is well-defined over Schwartz's class $\mathcal{S}(\mathbb{R}^d)$ of test functions: ideally, a continuous mapping from $\mathcal{S}(\mathbb{R}^d)$ into itself or, possibly, into $L_p(\mathbb{R}^d)$ (or some variant) if one imposes suitable restrictions on the Lévy exponent $f$ to maintain continuity.

We like to refer to (4.21) as the analysis statement of the model, and to (4.22), or its shorthand $s=\mathrm{L}^{-1}w$, as the synthesis description. Of course, this will only work properly if we have an exact equivalence, meaning that there is a proper and unique definition of $\mathrm{L}^{-1}$. The latter will need to be made explicit on a case-by-case basis, with the possible help of boundary conditions.

4.5.2 Existence and characterization of the solution

We shall now see that, under suitable conditions on $\mathrm{L}^{-1*}$ (see Theorem 12 below), one can completely specify such processes via their characteristic form and ensure their existence as solutions of (4.21). We recall that the characteristic form of a generalized stochastic process $s$ is defined as
$$\widehat{\mathscr{P}}_s(\varphi)=\mathbb{E}\{\mathrm{e}^{j\langle\varphi,s\rangle}\}=\int_{\mathcal{S}'(\mathbb{R}^d)}\mathrm{e}^{j\langle\varphi,s\rangle}\,\mathscr{P}_s(\mathrm{d}s),$$
where the latter expression involves an abstract infinite-dimensional integral over the space of tempered distributions and provides the connection with the defining measure $\mathscr{P}_s$ on $\mathcal{S}'(\mathbb{R}^d)$. $\widehat{\mathscr{P}}_s$ is a functional $\mathcal{S}(\mathbb{R}^d)\to\mathbb{C}$ that associates the complex number $\widehat{\mathscr{P}}_s(\varphi)$ to each test function $\varphi\in\mathcal{S}(\mathbb{R}^d)$ and is endowed with three fundamental properties: positive definiteness, continuity, and normalization (i.e., $\widehat{\mathscr{P}}_s(0)=1$). It can also be specified using the more concrete formula
$$\widehat{\mathscr{P}}_s(\varphi)=\int_{\mathbb{R}}\mathrm{e}^{jy}\,\mathrm{d}P_{\langle\varphi,s\rangle}(y), \tag{4.23}$$
which involves a classical Stieltjes integral with respect to the probability law $P_{Y=\langle\varphi,s\rangle}=\mathrm{Prob}(Y<y)$, where $Y=\langle\varphi,s\rangle$ is a conventional scalar random variable once $\varphi$ is fixed.

For completeness, we also recall the meaning of the underlying terminology in the contextof a generic (normed or nuclear) space X of test functions.

Definition 11 (Positive-definite functional). A complex-valued functional $G:\mathcal{X}\to\mathbb{C}$ defined over the function space $\mathcal{X}$ is said to be positive-definite if
$$\sum_{m=1}^{N}\sum_{n=1}^{N}G(\varphi_m-\varphi_n)\,\xi_m\overline{\xi_n}\ge0$$
for every possible choice of $\varphi_1,\dots,\varphi_N\in\mathcal{X}$, $\xi_1,\dots,\xi_N\in\mathbb{C}$, and $N\in\mathbb{N}^+$.

Definition 12 (Continuous functional). A functional $G:\mathcal{X}\to\mathbb{R}$ (or $\mathbb{C}$) is said to be continuous (with respect to the topology of the function space $\mathcal{X}$) if, for any convergent sequence $(\varphi_i)$ in $\mathcal{X}$ with limit $\varphi\in\mathcal{X}$, the sequence $G(\varphi_i)$ converges to $G(\varphi)$; that is,
$$\lim_i G(\varphi_i)=G\big(\lim_i\varphi_i\big).$$

An essential element of our formulation is that Schwartz's space of test functions $\mathcal{S}(\mathbb{R}^d)$ is nuclear (see Section 3.1.3), as required by the Minlos–Bochner theorem (Theorem 5). The latter expresses the one-to-one correspondence (in the form of an infinite-dimensional Fourier pair) between the characteristic form $\widehat{\mathscr{P}}_s:\mathcal{S}(\mathbb{R}^d)\to\mathbb{C}$ and the measure $\mathscr{P}_s$ on $\mathcal{S}'(\mathbb{R}^d)$ that uniquely characterizes the generalized process $s$. The truly powerful aspect of the theorem is that it suffices to check that $\widehat{\mathscr{P}}_s$ satisfies the three defining conditions (positive definiteness, continuity over $\mathcal{S}(\mathbb{R}^d)$, and normalization) to prove that it is a valid characteristic form, which then automatically ensures the existence of the process since the corresponding measure over $\mathcal{S}'(\mathbb{R}^d)$ is well defined.

The formulation of our generative model (4.22) in that context is
$$\widehat{\mathscr{P}}_s(\varphi)=\widehat{\mathscr{P}}_{\mathrm{L}^{-1}w}(\varphi)=\widehat{\mathscr{P}}_w(\mathrm{L}^{-1*}\varphi), \tag{4.24}$$
where $\widehat{\mathscr{P}}_w(\varphi)$ is the characteristic form of the innovation process $w$.

Theorem 12 (Generalized innovation model). Let $\mathrm{U}=\mathrm{L}^{-1*}$ be a linear operator that satisfies the two conditions:

1. Left-inverse property: $\mathrm{U}\mathrm{L}^*\varphi=\varphi$ for all $\varphi\in\mathcal{S}(\mathbb{R}^d)$, where $\mathrm{L}^*$ is the adjoint of some given (whitening) operator $\mathrm{L}$;

2. Stability: $\mathrm{U}$ is a continuous linear map from $\mathcal{S}(\mathbb{R}^d)$ into itself.

Then, the generalized stochastic process $s$ that is fully characterized by $\widehat{\mathscr{P}}_s(\varphi)=\exp\big(\int_{\mathbb{R}^d}f\big(\mathrm{L}^{-1*}\varphi(r)\big)\,\mathrm{d}r\big)$ is well-defined and satisfies the innovation model $\mathrm{L}s=w$, where $w$ is a Lévy innovation with exponent $f$.

Moreover, when $f$ is $p$-admissible (cf. Definition 8), the second condition can be substituted by the much weaker $L_p$-stability requirement
$$\|\mathrm{U}\varphi\|_{L_p}<C_2\,\|\varphi\|_{L_p},\quad\forall\varphi\in\mathcal{S}(\mathbb{R}^d). \tag{4.25}$$


An introduction to Sparse Stochastic Processes Copyright © M. Unser and P.D. Tafti


Proof. First, we prove that $s$ is a bona fide generalized stochastic process over $\mathcal{S}'(\mathbb{R}^d)$ by showing that $\widehat{\mathscr{P}}_s(\varphi)$ is a continuous, positive-definite functional on $\mathcal{S}(\mathbb{R}^d)$ such that $\widehat{\mathscr{P}}_s(0) = 1$ (by the Minlos-Bochner theorem).

As for the first condition, we rely on the property that the Lévy noise functional $\widehat{\mathscr{P}}_w(\varphi) = \exp\left(\int_{\mathbb{R}^d} f\big(\varphi(r)\big)\,\mathrm{d}r\right)$ is continuous over $\mathcal{S}(\mathbb{R}^d)$ (Theorem 9). Moreover, when the Lévy exponent $f$ is $p$-admissible, the positive-definiteness and continuity properties extend to $L_p(\mathbb{R}^d)$ by Theorem 20, which will be proven in Chapter 8. By construction, $\widehat{\mathscr{P}}_w$ is a continuous functional on $\mathcal{S}(\mathbb{R}^d)$ (resp., $L_p(\mathbb{R}^d)$). This, together with the assumption that $L^{-1*}$ is a continuous operator on $\mathcal{S}(\mathbb{R}^d)$, implies that the composed functional $\widehat{\mathscr{P}}_s(\varphi) = \widehat{\mathscr{P}}_w(L^{-1*}\varphi)$ is continuous on $\mathcal{S}(\mathbb{R}^d)$. Likewise, when $f$ is $p$-admissible, the $L_p$-stability condition implies that $L^{-1*}$ is a continuous operator $\mathcal{S}(\mathbb{R}^d) \to L_p(\mathbb{R}^d)$, so that the same reasoning applies as well; see the triangular diagram in Figure 3.1 with $\mathcal{X} = \mathcal{S}(\mathbb{R}^d)$ and $\mathcal{Y} = L_p(\mathbb{R}^d)$.

Next, for any given set of functions $\varphi_1, \dots, \varphi_N \in \mathcal{S}(\mathbb{R}^d)$ and coefficients $\xi_1, \dots, \xi_N \in \mathbb{C}$, we have

$\sum_{m=1}^{N}\sum_{n=1}^{N} \widehat{\mathscr{P}}_s(\varphi_m - \varphi_n)\,\xi_m\overline{\xi_n} = \sum_{m=1}^{N}\sum_{n=1}^{N} \widehat{\mathscr{P}}_w\big(L^{-1*}(\varphi_m - \varphi_n)\big)\,\xi_m\overline{\xi_n}$
$\qquad = \sum_{m=1}^{N}\sum_{n=1}^{N} \widehat{\mathscr{P}}_w(L^{-1*}\varphi_m - L^{-1*}\varphi_n)\,\xi_m\overline{\xi_n}$ (by linearity)
$\qquad \ge 0,$ (by the positivity of $\widehat{\mathscr{P}}_w$)

which shows that $\widehat{\mathscr{P}}_s$ is positive definite on $\mathcal{S}(\mathbb{R}^d)$. Finally, $\widehat{\mathscr{P}}_s(0) = \widehat{\mathscr{P}}_w(L^{-1*}0) = \widehat{\mathscr{P}}_w(0) = 1$, which completes the first part of the proof.

The above result and stability conditions ensure that the action of the inverse operator $L^{-1}$ is well-defined over the relevant subset of tempered distributions, which justifies the formal manipulation made in (4.22). Now, if $L^{-1*}$ is a proper left inverse of $L^*$, we have that

$\langle \varphi, w \rangle = \langle \underbrace{L^{-1*}L^*}_{\mathrm{Id}}\varphi, w \rangle = \langle L^*\varphi, \underbrace{L^{-1}w}_{s} \rangle = \langle \varphi, Ls \rangle,$

which proves that the generalized process $s = L^{-1}w$ satisfies (4.21).

The next chapters are devoted to the investigation of specific instances of this model and to making sure that the conditions for existence in Theorem 12 are met. We shall also discuss the conceptual connection with splines and wavelets. This connection is fundamental to our purpose. In Chapter 9, we shall then use the generalized innovation model to show that the primary statistical features of the input noise are essentially transferred to the signal as well as to the transform domain. The main point is that the marginal distributions are all part of the same infinitely divisible family as long as the composition of the mixing procedure ($L^{-1}$) and the signal analysis remains linear. On the other hand, the amount of coupling and level of interdependence will strongly depend on the nature of the transformation.

4.6 Notes

Infinitely divisible distributions were introduced by de Finetti, and their primary properties were established by Kolmogorov, Lévy, and Khintchine in the 1930s [BDR02, SVH03].


We have chosen to name Theorem 6 after Lévy and Schoenberg because it essentially results from the combination of two fundamental theorems in harmonic analysis named after these authors [Sch38]. While their groundwork dates back to the 1930s, it took until the late 1960s to reformulate the Lévy-Khintchine characterization in terms of the (conditional) positive definiteness of the exponent [Joh66].


Chapter 5

Operators and their inverses

In this chapter we review three classes of linear shift-invariant (LSI) operators: convolution operators with stable LSI inverses, operators that are linked with ordinary differential equations, and fractional operators.

The first class, considered in Section 5.2, is composed of the broad family of multidimensional operators whose inverses are stable convolution operators, or filters. Convolution operators play a central role in signal processing. They are easy to characterize mathematically via their impulse response. The corresponding generative model for stochastic processes amounts to LSI filtering of a white noise, which automatically yields stationary processes.

Our second class is the 1-D family of ordinary differential operators with constant coefficients, which is relevant to a wide range of modeling applications. In the "stable" scenario, reviewed in Section 5.3, these operators admit stable LSI inverses on $\mathcal{S}'$ and are therefore included in the previous category. On the other hand, when the differential operators in question have zeros on the imaginary axis (the marginally stable case), they have a non-trivial null space in $\mathcal{S}'$, which consists of (exponential) polynomials. This implies that they are no longer unconditionally invertible on $\mathcal{S}'$ and that we can at best identify left- or right-side inverses, which should additionally fulfill appropriate "boundedness" requirements in order to be usable in the definition of stochastic processes. However, as we shall see in Section 5.4, obtaining an inverse with the required boundedness properties is feasible but requires giving up shift-invariance. As a consequence, stochastic processes defined by these operators are generally non-stationary.

The third class of LSI operators, investigated in Section 5.5, consists of fractional derivatives and/or Laplacians in one and several dimensions. Our focus is on the family of linear operators on $\mathcal{S}$ that are simultaneously homogeneous (scale-invariant up to a scalar coefficient) and invariant under shifts and rotations. These operators are intimately linked to self-similar processes and fractals [BU07, TVDVU09]. Once again, finding a stable inverse operator to be used in the definition of self-similar processes poses a mathematical challenge since the underlying system is inherently unstable. The difficulty is evidenced by the fact that statistical self-similarity is generally not compatible with stationarity, which means that a non-shift-invariant inverse operator needs to be constructed. Here again, a solution may be found by extending the approach used for the previous class of operators. From our first example in Section 5.1, we shall actually see that one is able to reconcile the classical theory of stationary processes with that of self-similar ones by viewing the latter as a limit case of the former.

Before we begin our discussion of operators, let us formalize some notions of invariance.


Definition 15 (Translation invariance). An operator $T$ is shift- (or translation-) invariant if and only if, for any function $\varphi$ in its domain and any $r_0 \in \mathbb{R}^d$,

$T\{\varphi(\cdot - r_0)\}(r) = T\{\varphi\}(r - r_0).$

Definition 16 (Scale invariance). An operator $T$ is scale-invariant (homogeneous) of order $\gamma$ if and only if, for any function $\varphi$ in its domain,

$T\{\varphi\}(r/a) = |a|^{\gamma}\, T\{\varphi(\cdot/a)\}(r),$

where $a \in \mathbb{R}^+$ is the dilation factor.

Definition 17 (Rotation invariance). An operator $T$ is scalarly rotation-invariant if and only if, for any function $\varphi$ in its domain,

$T\{\varphi\}(\mathbf{R}^T r) = T\{\varphi(\mathbf{R}^T \cdot)\}(r),$

where $\mathbf{R}$ is any orthogonal matrix in $\mathbb{R}^{d \times d}$ (by using orthogonal matrices in the definition, we take into account both proper and improper rotations, with respective determinants $1$ and $-1$).

5.1 Introductory example: first-order differential equation

To fix ideas, let us consider the generic first-order differential operator $\mathrm{L} = \mathrm{D} - \alpha \mathrm{Id}$ with $\alpha \in \mathbb{C}$, where $\mathrm{D} = \frac{\mathrm{d}}{\mathrm{d}r}$ and $\mathrm{Id}$ are the derivative and identity operators, respectively. Clearly, $\mathrm{L}$ is LSI, but generally not scale-invariant unless $\alpha = 0$. The corresponding linear system with (deterministic or stochastic) output $s$ and input $w$ is defined by the differential equation $\frac{\mathrm{d}}{\mathrm{d}r}s(r) - \alpha s(r) = w(r)$. Under the classical stability assumption $\mathrm{Re}(\alpha) < 0$ (pole in the left half of the complex plane), its impulse response is given by

$\rho_\alpha(r) = \mathcal{F}^{-1}\left\{\frac{1}{j\omega - \alpha}\right\}(r) = \mathbb{1}_+(r)\,e^{\alpha r}$ (5.1)

and is rapidly decaying. This provides us with an explicit characterization of the inverse operator, which reduces to a simple convolution with a decreasing causal exponential:

$(\mathrm{D} - \alpha \mathrm{Id})^{-1}\varphi = \rho_\alpha * \varphi.$

Likewise, it is easy to see that the corresponding adjoint inverse operator $\mathrm{L}^{-1*}$ is specified by

$(\mathrm{D} - \alpha \mathrm{Id})^{-1*}\varphi = \rho_\alpha^\vee * \varphi,$

where $\rho_\alpha^\vee(r) = \rho_\alpha(-r)$ is the time-reversed version of $\rho_\alpha$. Thanks to its rapid decay, $\rho_\alpha^\vee$ defines a continuous linear translation-invariant map from $\mathcal{S}(\mathbb{R})$ into itself.

This allows us to express the solution of the first-order SDE as a filtered version of the input noise $s_\alpha = \rho_\alpha * w$. It follows that $s_\alpha$ is a stationary process that is completely specified by its characteristic form $\widehat{\mathscr{P}}_{s_\alpha}(\varphi) = \widehat{\mathscr{P}}_w(\rho_\alpha^\vee * \varphi)$, where $\widehat{\mathscr{P}}_w$ is the Lévy white-noise functional of the input (see Section 4.4).
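The stationarity of $s_\alpha = \rho_\alpha * w$ lends itself to a quick numerical sketch. The code below (the Gaussian innovation, step size, and pole value are illustrative choices, not part of the text) filters a discretized white noise through the causal exponential kernel via the equivalent first-order recursion and checks that the empirical variance is time-invariant:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, dt, n = -1.0, 1e-2, 200_000

# Discretized Gaussian white noise (variance 1/dt per sample).
w = rng.standard_normal(n) / np.sqrt(dt)

# Causal exponential filter rho_alpha(r) = 1_+(r) exp(alpha r),
# applied through the equivalent first-order recursion.
a = np.exp(alpha * dt)
s = np.zeros(n)
for k in range(1, n):
    s[k] = a * s[k - 1] + dt * w[k]

# Stationarity check: the empirical variances of the two halves should
# both be close to the theoretical value 1 / (2 |alpha|) = 0.5.
v1, v2 = s[: n // 2].var(), s[n // 2:].var()
print(v1, v2)
```

For a Gaussian innovation, this reproduces the classical Ornstein-Uhlenbeck process, whose stationary variance is $1/(2|\alpha|)$.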

Let us now focus our attention on the limit case $\alpha = 0$, which yields an operator $\mathrm{L} = \mathrm{D}$ that is scale-invariant. Here too, it is possible to specify the LSI inverse (integrator)

$\mathrm{D}^{-1}\varphi(r) = \int_{-\infty}^{r} \varphi(\tau)\,\mathrm{d}\tau = (\mathbb{1}_+ * \varphi)(r),$


Figure 5.1: Comparison of antiderivative operators. (a) Input signal $\varphi(t)$. (b) Result of the shift-invariant integrator $\mathrm{D}^{-1}\varphi(t) = \int_{-\infty}^{t}\varphi(\tau)\,\mathrm{d}\tau$ and its adjoint $\mathrm{D}^{-1*}\varphi(t)$. (c) Result of the scale-invariant integrator $\mathrm{I}_0\varphi(t) = \int_{0}^{t}\varphi(\tau)\,\mathrm{d}\tau$ and its $L_p$-stable adjoint $\mathrm{I}_0^*\varphi(t)$; the former yields a signal that vanishes at the origin, while the latter enforces the decay of the output as $t \to -\infty$ at the cost of a jump discontinuity at the origin.

whose output is well-defined pointwise when $\varphi \in \mathcal{S}(\mathbb{R})$. The less-favorable aspect is that the classical LSI integrator does not fulfill the usual stability requirement due to the non-integrability of its impulse response: $\mathbb{1}_+ \notin L_1(\mathbb{R})$. This implies that $\mathrm{D}^{-1*}\varphi = \mathbb{1}_+^\vee * \varphi$ is generally not in $L_p(\mathbb{R})$. Thus, we are no longer fulfilling the admissibility condition in Theorem 15. The source of the problem is the lack of decay of $\mathrm{D}^{-1*}\varphi(r)$ as $r \to -\infty$ when $\int_{\mathbb{R}} \varphi(\tau)\,\mathrm{d}\tau = \widehat{\varphi}(0) \ne 0$ (see Figure 5.1b). Fortunately, this can be compensated by defining the modified antiderivative operator

$\mathrm{I}_0^*\varphi(r) = \int_{r}^{\infty} \varphi(\tau)\,\mathrm{d}\tau - \mathbb{1}_+(-r)\,\widehat{\varphi}(0) = (\mathbb{1}_+^\vee * \varphi)(r) - \left(\int_{\mathbb{R}} \varphi(\tau)\,\mathrm{d}\tau\right)\mathbb{1}_+^\vee(r)$
$\qquad = \mathrm{D}^{-1*}\varphi(r) - (\mathrm{D}^{-1*}\varphi)(-\infty)\,\mathbb{1}_+^\vee(r)$ (5.2)

which happens to be the only left inverse of $\mathrm{D}^* = -\mathrm{D}$ that is both scale-invariant and $L_p$-stable for any $p > 0$. The adjoint of $\mathrm{I}_0^*$ specifies the adjusted¹ integrator

$\mathrm{I}_0\varphi(r) = \int_0^r \varphi(\tau)\,\mathrm{d}\tau = \mathrm{D}^{-1}\varphi(r) - (\mathrm{D}^{-1}\varphi)(0),$ (5.3)

which is the correct scale-invariant inverse of $\mathrm{D}$ for our formulation, to be applied to elements of $\mathcal{S}'(\mathbb{R})$. It follows that the solution of the corresponding (unstable) SDE can be

1. Any two valid right inverses can only differ by a component (constant) that is in the null space of the operator. The scale-invariant solution is the one that forces the output to vanish at the origin.


expressed as $s = \mathrm{I}_0 w$, which is a well-defined stochastic process as long as the input noise is Lévy $p$-admissible with $p \ge 1$ (by Theorem 15). We note that the price to pay for the stabilization of the solution is to give up on shift-invariance. Indeed, the adjusted integrator is such that it imposes the boundary condition $s(0) = (\mathrm{I}_0 w)(0) = 0$ (see Figure 5.1c), which is incompatible with stationarity, but a necessary condition for self-similarity. The so-constructed processes are fully specified by their characteristic form $\exp\left(\int_{\mathbb{R}} f\big(\mathrm{I}_0^*\varphi(r)\big)\,\mathrm{d}r\right)$, where $f$ is a Lévy exponent. Based on this representation, we can show that these are equivalent to the Lévy processes that are usually defined for $r \ge 0$ only [Sat94]. While this connection with the classical theory of Lévy processes is already remarkable, it turns out that the underlying principle is quite general and applicable to a much broader family of operators, provided that we can ensure $L_p$-stability.

5.2 Shift-invariant inverse operators

The association of LSI operators with convolution integrals will be familiar to most readers. In effect, we saw in Section 3.3.5 that, as a consequence of the Schwartz kernel theorem, every continuous LSI operator $\mathrm{L} : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ corresponds to a convolution

$\mathrm{L} : \varphi \mapsto \mathrm{L}\{\delta\} * \varphi$

with a kernel (impulse response) $\mathrm{L}\{\delta\} \in \mathcal{S}'(\mathbb{R}^d)$.² Moreover, by the convolution-product rule, we may also characterize $\mathrm{L}$ by a multiplication in the Fourier domain:

$\mathrm{L} : \varphi \mapsto \mathcal{F}^{-1}\{\widehat{L}(\omega)\,\widehat{\varphi}(\omega)\}.$

We call $\widehat{L}$ the Fourier multiplier or symbol associated with the operator $\mathrm{L}$.

From the Fourier-domain characterization of $\mathrm{L}$, we see that, if $\widehat{L}$ is smooth and does not grow too fast, then $\mathrm{L}$ maps $\mathcal{S}(\mathbb{R}^d)$ back into $\mathcal{S}(\mathbb{R}^d)$. This is in particular true if $\mathrm{L}\{\delta\}$ (the impulse response) is an ordinary locally integrable function with rapid decay. It is also true if $\mathrm{L}$ is a 1-D linear differential operator with constant coefficients, in which case $\mathrm{L}\{\delta\}$ is a finite sum of derivatives of the Dirac distribution and the corresponding Fourier multiplier $\widehat{L}(\omega)$ is a polynomial in $j\omega$.
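As a minimal numerical illustration of the Fourier-multiplier characterization, the sketch below applies $\mathrm{L} = \mathrm{D}$, whose symbol is $j\omega$, in the discrete Fourier domain (the FFT-based periodization is an approximation that is very accurate for a rapidly decaying $\varphi$; the Gaussian and grid are illustrative choices):

```python
import numpy as np

n, T = 1024, 20.0
r = np.linspace(-T / 2, T / 2, n, endpoint=False)
phi = np.exp(-r ** 2)                      # rapidly decaying test function

# Fourier multiplier of L = D is jw; apply it in the discrete Fourier domain.
w = 2 * np.pi * np.fft.fftfreq(n, d=T / n)
Dphi = np.fft.ifft(1j * w * np.fft.fft(phi)).real

# Compare with the exact derivative of the Gaussian, -2 r exp(-r^2).
err = np.max(np.abs(Dphi - (-2 * r * phi)))
print(err)
```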

For operators with smooth Fourier multipliers that are nowhere zero in $\mathbb{R}^d$ and not decaying (or decaying slowly) at $\infty$, we can define the inverse $\mathrm{L}^{-1} : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}(\mathbb{R}^d)$ of $\mathrm{L}$ by

$\mathrm{L}^{-1} : \varphi \mapsto \mathcal{F}^{-1}\left\{\frac{\widehat{\varphi}(\omega)}{\widehat{L}(\omega)}\right\}.$

This inverse operator is also linear and shift-invariant, and has the convolution kernel

$\rho_L = \mathcal{F}^{-1}\left\{\frac{1}{\widehat{L}(\omega)}\right\},$ (5.4)

which is in effect the Green's function of the operator $\mathrm{L}$. Thus, we may write

$\mathrm{L}^{-1} : \varphi \mapsto \rho_L * \varphi.$

For the cases in which $\widehat{L}(\omega)$ vanishes at some points, its inverse $1/\widehat{L}(\omega)$ is not in general a locally integrable function, but even in the singular case we may still be able to regularize the singularities at the zeros of $\widehat{L}(\omega)$ and obtain a singular "generalized function" whose inverse Fourier transform, per (5.4), once again yields a convolution kernel $\rho_L$ that is a

2. A similar result holds for continuous LSI operators $\mathcal{D} \to \mathcal{D}'$, which will have a convolution kernel in $\mathcal{D}'$.


Green's function of $\mathrm{L}$. The difference is that in this case, for an arbitrary $\varphi \in \mathcal{S}(\mathbb{R}^d)$, $\rho_L * \varphi$ may no longer belong to $\mathcal{S}(\mathbb{R}^d)$.

As in our introductory example, the simplest scenario occurs when the inverse operator $\mathrm{L}^{-1}$ is shift-invariant with an impulse response $\rho_L$ that has sufficient decay for the system to be BIBO-stable (bounded input, bounded output).

Proposition 11. Let $\mathrm{L}^{-1}\varphi(r) = (\rho_L * \varphi)(r) = \int_{\mathbb{R}^d} \rho_L(r')\,\varphi(r - r')\,\mathrm{d}r'$ with $\rho_L \in L_1(\mathbb{R}^d)$ (or, more generally, where $\rho_L$ is a complex-valued Borel measure of bounded variation). Then, $\mathrm{L}^{-1}$ and its adjoint specified by $\mathrm{L}^{-1*}\varphi(r) = (\rho_L^\vee * \varphi)(r) = \int_{\mathbb{R}^d} \rho_L(-r')\,\varphi(r - r')\,\mathrm{d}r'$ are both $L_p$-stable in the sense that

$\|\mathrm{L}^{-1}\varphi\|_{L_p} \le \|\rho_L\|_{L_1}\,\|\varphi\|_{L_p}$
$\|\mathrm{L}^{-1*}\varphi\|_{L_p} \le \|\rho_L\|_{L_1}\,\|\varphi\|_{L_p}$

for all $p \ge 1$.

The result follows from Theorem 4. For the sake of completeness, we shall establish the bound based on the two extreme cases $p = 1$ and $p = +\infty$.

Proof. To obtain the $L_1$ bound, we manipulate the norm of the convolution integral as

$\|\rho_L * \varphi\|_{L_1} = \int_{\mathbb{R}^d} \left| \int_{\mathbb{R}^d} \rho_L(r)\,\varphi(r' - r)\,\mathrm{d}r \right| \mathrm{d}r'$
$\qquad \le \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \big| \rho_L(r)\,\varphi(s) \big|\,\mathrm{d}r\,\mathrm{d}s$ (change of variable $s = r' - r$)
$\qquad = \int_{\mathbb{R}^d} \big|\rho_L(r)\big|\,\mathrm{d}r \int_{\mathbb{R}^d} \big|\varphi(s)\big|\,\mathrm{d}s = \|\rho_L\|_{L_1}\,\|\varphi\|_{L_1},$

where the exchange of integrals is justified by Fubini's theorem. The corresponding $L_\infty$ bound follows from the simple pointwise estimate

$\big|(\rho_L * \varphi)(r)\big| \le \|\varphi\|_{L_\infty} \int_{\mathbb{R}^d} \big|\rho_L(r')\big|\,\mathrm{d}r' = \|\rho_L\|_{L_1}\,\|\varphi\|_{L_\infty}$

for any $r \in \mathbb{R}^d$. The final step is to invoke the Riesz-Thorin theorem (Theorem 3), which yields Young's inequality for $L_p$ functions (see (3.7)), and hence proves the desired result for $1 \le p \le \infty$. The inequality also applies to the adjoint operator since the latter amounts to a convolution with the reversed impulse response $\rho_L^\vee(r) = \rho_L(-r) \in L_1(\mathbb{R}^d)$.

Note that the $L_1$ condition in Proposition 11 is the standard hypothesis that is made in the theory of linear systems to ensure the BIBO stability of an analog filter. It is slightly stronger than the TV condition in Theorem 4, which is necessary and sufficient for both BIBO ($p = \infty$) and $L_1$ stabilities.
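The bound of Proposition 11 is easy to probe numerically. The sketch below (the oscillating-exponential impulse response and the rough input are arbitrary illustrative choices) checks the discretized Young inequality $\|\rho_L * \varphi\|_{L_p} \le \|\rho_L\|_{L_1}\|\varphi\|_{L_p}$ for $p \in \{1, 2, \infty\}$:

```python
import numpy as np

def lp_norm(f, p, dr):
    # Discrete approximation of the L_p norm (p = inf gives the sup norm).
    if np.isinf(p):
        return np.abs(f).max()
    return (np.abs(f) ** p).sum() ** (1.0 / p) * dr ** (1.0 / p)

rng = np.random.default_rng(1)
dr = 1e-2
r = np.arange(-10.0, 10.0, dr)

rho = np.exp(-np.abs(r)) * np.cos(3 * r)             # an L_1 impulse response
phi = rng.standard_normal(r.size) * np.exp(-r ** 2)  # a rough input signal

conv = np.convolve(rho, phi, mode="same") * dr
bound = lp_norm(rho, 1.0, dr)

ok = all(lp_norm(conv, p, dr) <= bound * lp_norm(phi, p, dr)
         for p in (1.0, 2.0, np.inf))
print(ok)
```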

If, in addition, $\rho_L(r)$ decays faster than any polynomial (e.g., is compactly supported or decays exponentially), then we can actually ensure $\mathcal{S}$-continuity so that there is no restriction on the class of corresponding stochastic processes.

Proposition 12. Let $\mathrm{L}^{-1}\varphi(r) = (\rho_L * \varphi)(r)$ with $|\rho_L(r)| \le \frac{C_n}{1 + \|r\|^n}$ for all $n \in \mathbb{Z}^+$ and $r \in \mathbb{R}^d$. Then, $\mathrm{L}^{-1}$ and $\mathrm{L}^{-1*}$ are $\mathcal{S}$-continuous in the sense that $\varphi \in \mathcal{S} \Rightarrow \mathrm{L}^{-1}\varphi, \mathrm{L}^{-1*}\varphi \in \mathcal{S}$, with both operators being bounded.

The key here is that the convolution with $\rho_L$ preserves the rapid decay of the test function $\varphi$. The degree of smoothness of the output is not an issue because, for non-constant functions, the convolution operation commutes with differentiation.


The good news is that the entire class of stable 1-D differential systems with rational transfer functions and poles in the left half of the complex plane falls into the category of Proposition 12. The application of such operators provides us with a convenient mechanism for solving ordinary differential equations, as detailed in the next section.

The $\mathcal{S}$-continuity property is important for our formulation. It also holds for all shift-invariant differential operators whose impulse response is a point distribution; e.g., $(\mathrm{D} - \mathrm{Id})\{\delta\} = \delta' - \delta$. It is preserved under convolution, which justifies the factorization of operators into simpler constituents.

5.3 Stable differential systems in 1-D

The generic form of a linear shift-invariant differential equation in 1-D with (deterministic or random) output $s$ and driving term $w$ is

$\sum_{n=0}^{N} a_n \mathrm{D}^n s = \sum_{m=0}^{M} b_m \mathrm{D}^m w$ (5.5)

where the $a_n$ and $b_m$ are arbitrary complex coefficients with the normalization constraint $a_N = 1$. Equation (5.5) thus covers the general 1-D case of $\mathrm{L}s = w$ where $\mathrm{L}$ is a shift-invariant operator with the rational transfer function

$\widehat{L}(\omega) = \frac{(j\omega)^N + a_{N-1}(j\omega)^{N-1} + \cdots + a_1(j\omega) + a_0}{b_M(j\omega)^M + \cdots + b_1(j\omega) + b_0} = \frac{p_N(j\omega)}{q_M(j\omega)}.$ (5.6)

The poles of the system, which are the roots of the characteristic polynomial $p_N(\zeta) = \zeta^N + a_{N-1}\zeta^{N-1} + \cdots + a_0$ with Laplace variable $\zeta \in \mathbb{C}$, are denoted by $\{\alpha_n\}_{n=1}^N$. In the standard causal-stable scenario where $\mathrm{Re}(\alpha_n) < 0$ for $n = 1, \dots, N$, the solution is obtained as

$s(r) = \mathrm{L}^{-1}w(r) = (\rho_L * w)(r)$

where $\rho_L$ is the causal Green function of $\mathrm{L}$ specified by (5.4).

In practice, the determination of $\rho_L$ is based on the factorization of the transfer function of the system as

$\frac{1}{\widehat{L}(\omega)} = q_M(j\omega) \prod_{n=1}^{N} \frac{1}{j\omega - \alpha_n}$ (5.7)
$\qquad = b_M\,\frac{\prod_{m=1}^{M}(j\omega - \gamma_m)}{\prod_{n=1}^{N}(j\omega - \alpha_n)},$ (5.8)

which is then broken into simple constituents, either by serial composition of first-order factors or by decomposition into simple partial fractions. We are providing the fully factorized form (5.8) of the transfer function to recall the property that a stable $N$th-order system is completely characterized by its poles $\{\alpha_n\}_{n=1}^N$ and zeros $\{\gamma_m\}_{m=1}^M$, up to the proportionality factor $b_M$.

Since the $\mathcal{S}$-continuity property is preserved through the composition of convolution operators, we shall rely on (5.8) to factorize $\mathrm{L}^{-1}$ into elementary operators. To that end, we shall study the effect of simple constituents (first-order differential operators with stable inverses) before considering their composition into higher-order operators. We shall also treat the leading polynomial factor $q_M(j\omega)$ in (5.7) separately because it corresponds to a convolution operator whose impulse response is the point distribution $\sum_{m=0}^{M} b_m \delta^{(m)}$. The latter is $\mathcal{S}$-continuous, irrespective of the choice of the coefficients $b_m$ (or, equivalently, the zeros $\gamma_m$ in (5.8)).


5.3.1 First-order differential operators with stable inverses

The first-order differential operator

$\mathrm{P}_\alpha = \mathrm{D} - \alpha \mathrm{Id}$

corresponds to the convolution kernel (impulse response) $(\delta' - \alpha\delta)$ and Fourier multiplier $(j\omega - \alpha)$. For $\mathrm{Re}(\alpha) \ne 0$ (the stable case in signal-processing parlance), the inverse of the Fourier multiplier, $(j\omega - \alpha)^{-1}$, is non-zero for all $\omega \in \mathbb{R}$, with the well-defined Fourier inverse

$\rho_\alpha(r) = \mathcal{F}^{-1}\left\{\frac{1}{j\omega - \alpha}\right\}(r) = \begin{cases} e^{\alpha r}\,\mathbb{1}_{[0,\infty)}(r) & \text{if } \mathrm{Re}(\alpha) < 0, \\ -e^{\alpha r}\,\mathbb{1}_{(-\infty,0]}(r) & \text{if } \mathrm{Re}(\alpha) > 0. \end{cases}$

Clearly, in either of the causal ($\mathrm{Re}(\alpha) < 0$) or anti-causal ($\mathrm{Re}(\alpha) > 0$) cases, these functions decay rapidly at infinity, and convolutions with them map $\mathcal{S}(\mathbb{R})$ back into itself. Moreover, a convolution with $\rho_\alpha$ inverts $\mathrm{P}_\alpha : \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ both from the left and the right, a reason for which we can write

$\mathrm{P}_\alpha^{-1}\varphi = \rho_\alpha * \varphi$ (5.9)

for $\mathrm{Re}(\alpha) \ne 0$.

Since $\mathrm{P}_\alpha$ and $\mathrm{P}_\alpha^{-1}$, $\mathrm{Re}(\alpha) \ne 0$, are both $\mathcal{S}$-continuous (continuous from $\mathcal{S}(\mathbb{R})$ into $\mathcal{S}(\mathbb{R})$), their action can be transferred to the space $\mathcal{S}'(\mathbb{R})$ of Schwartz distributions by identifying the adjoint operator $\mathrm{P}_\alpha^*$ with $(-\mathrm{P}_{-\alpha})$, in keeping with the identity

$\langle \mathrm{P}_\alpha \varphi, \phi \rangle = -\langle \varphi, \mathrm{P}_{-\alpha}\phi \rangle$

on $\mathcal{S}(\mathbb{R})$. We recall that, in this context, $\langle \varphi, \phi \rangle$ denotes the bilinear form $\int_{\mathbb{R}} \varphi(r)\phi(r)\,\mathrm{d}r$, not the Hermitian product.

5.3.2 Higher-order differential operators with stable inverses

As noted earlier, the preservation of continuity by composition facilitates the study of differential systems by permitting us to decompose higher-order operators into first-order factors. Specifically, let us consider the equivalent factorized representation of the $N$th-order differential equation (5.5) given by

$\mathrm{P}_{\alpha_1} \cdots \mathrm{P}_{\alpha_N}\{s\}(r) = q_M(\mathrm{D})\{w\}(r)$ (5.10)

where $\{\alpha_n\}_{n=1}^N$ are the poles of the system and where $q_M(\mathrm{D}) = \sum_{m=0}^{M} b_m \mathrm{D}^m$ is the $M$th-order differential operator acting on the right-hand side of (5.5). Under the assumption that $\mathrm{Re}(\alpha_n) \ne 0$ (causal or anti-causal stability), we can invert the operators acting on the left-hand side of (5.10), which allows us to express the solution of the differential equation as

$s(r) = \underbrace{\mathrm{P}_{\alpha_N}^{-1} \cdots \mathrm{P}_{\alpha_1}^{-1}\, q_M(\mathrm{D})}_{\mathrm{L}^{-1}}\{w\}(r).$ (5.11)

This translates into the following formulas for the corresponding inverse operator $\mathrm{L}^{-1}$:

$\mathrm{L}^{-1} = \mathrm{P}_{\alpha_N}^{-1} \cdots \mathrm{P}_{\alpha_1}^{-1}\, q_M(\mathrm{D}) = b_M\, \mathrm{P}_{\alpha_N}^{-1} \cdots \mathrm{P}_{\alpha_1}^{-1}\, \mathrm{P}_{\gamma_1} \cdots \mathrm{P}_{\gamma_M},$

which are consistent with (5.7) and (5.8), respectively. These operator-based manipulations are legitimate since all the elementary constituents in (5.10) and (5.11) are $\mathcal{S}$-continuous convolution operators. We also recall that all $L_p$-stable and, a fortiori, $\mathcal{S}$-continuous convolution operators satisfy the properties of commutativity, associativity,


and distributivity, so that the ordering of the factors in (5.10) and (5.11) is immaterial. Interestingly, this divide-and-conquer approach to the problem of inverting a differential operator is also extendable to the unstable scenarios (with $\mathrm{Re}(\alpha_n) = 0$ for some values of $n$), the main difference being that the ordering of the operators becomes important (partial loss of commutativity).
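The commutativity of stable first-order factors can be verified numerically. In the sketch below (the discretization scheme and pole values are illustrative choices), a second-order inverse is realized as a cascade of two causal exponential filters applied in either order:

```python
import numpy as np

def pinv(x, alpha, dt):
    # Causal inverse P_alpha^{-1}: convolution with 1_+(r) exp(alpha r),
    # discretized as a first-order recursion (stable for Re(alpha) < 0).
    a = np.exp(alpha * dt)
    y = np.empty_like(x)
    acc = 0.0
    for k, v in enumerate(x):
        acc = a * acc + dt * v
        y[k] = acc
    return y

rng = np.random.default_rng(2)
x, dt = rng.standard_normal(5000), 1e-2

# Second-order inverse realized as a cascade, in both orderings.
y12 = pinv(pinv(x, -1.0, dt), -2.5, dt)
y21 = pinv(pinv(x, -2.5, dt), -1.0, dt)

gap = np.max(np.abs(y12 - y21))
print(gap)
```

The two orderings agree up to round-off, reflecting the commutativity of the underlying convolution operators.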

5.4 Unstable $N$th-order differential systems

Classically, a differential system is categorized as being unstable when some of its poles are in the right complex half-plane, including the imaginary axis. Mathematically, there is no fundamental reason for excluding the cases $\mathrm{Re}(\alpha_n) > 0$ because one can simply switch to an anti-causal configuration which preserves the exponential decay of the response, as we did in defining these inverses in Section 5.3.1.

The only tricky situation occurs for purely imaginary poles of the form $\alpha_m = j\omega_0$ with $\omega_0 \in \mathbb{R}$, to which we now turn our attention.

5.4.1 First-order differential operators with unstable shift-invariant inverses

Once again, we begin with first-order operators. Unlike the case of $\mathrm{Re}(\alpha) \ne 0$, for purely imaginary $\alpha = j\omega_0$, $\mathrm{P}_{j\omega_0}$ is not a surjective operator from $\mathcal{S}(\mathbb{R})$ to $\mathcal{S}(\mathbb{R})$ (meaning it maps $\mathcal{S}(\mathbb{R})$ into a proper subset of itself, not onto the entire space), because it introduces a frequency-domain zero at $\omega_0$. This implies that we cannot find an operator $\mathrm{U} : \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ that is a right inverse of $\mathrm{P}_{j\omega_0}$. In other words, we cannot fulfill $\mathrm{P}_{j\omega_0}\mathrm{U}\varphi = \varphi$ for all $\varphi \in \mathcal{S}(\mathbb{R})$ subject to the constraint that $\mathrm{U}\varphi \in \mathcal{S}(\mathbb{R})$.

If we now consider $\mathrm{P}_{j\omega_0}$ as an operator on $\mathcal{S}'(\mathbb{R})$ (as the adjoint of $(-\mathrm{P}_{-j\omega_0})$), then this operator is not one-to-one. More precisely, it has a non-trivial null space consisting of multiples of the complex sinusoid $e^{j\omega_0 x}$. Consequently, on $\mathcal{S}'(\mathbb{R})$, $\mathrm{P}_{j\omega_0}$ does not have a left inverse $\mathrm{U}_0 : \mathcal{S}'(\mathbb{R}) \to \mathcal{S}'(\mathbb{R})$ with $\mathrm{U}_0\mathrm{P}_{j\omega_0} f = f$ for all $f \in \mathcal{S}'(\mathbb{R})$.

The main conclusion of concern to us is that $\mathrm{P}_{j\omega_0} : \mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ and its adjoint $\mathrm{P}_{j\omega_0}^* : \mathcal{S}'(\mathbb{R}) \to \mathcal{S}'(\mathbb{R})$ are not invertible in the usual sense of the word (i.e., from both sides). However, we are able to properly invert $\mathrm{P}_{j\omega_0}$ on its image (range), as we discuss now.

Let $\mathcal{S}_{j\omega_0}$ denote the image of $\mathcal{S}(\mathbb{R})$ under $\mathrm{P}_{j\omega_0}$. This is the same as the subspace of $\mathcal{S}(\mathbb{R})$ consisting of functions $\varphi$ for which³

$\int_{-\infty}^{+\infty} e^{-j\omega_0 r}\,\varphi(r)\,\mathrm{d}r = 0.$
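This membership criterion is easy to check numerically: applying $\mathrm{P}_{j\omega_0}$ to an arbitrary test function produces a function whose Fourier transform vanishes at $\omega_0$. In the sketch below (the Gaussian-times-cosine test function, grid, and $\omega_0$ are illustrative choices), the derivative is taken with finite differences:

```python
import numpy as np

dr = 1e-3
r = np.arange(-15.0, 15.0, dr)
w0 = 2.0

phi = np.exp(-r ** 2) * np.cos(r)      # arbitrary rapidly decaying test function
dphi = np.gradient(phi, dr)            # numerical derivative D phi

psi = dphi - 1j * w0 * phi             # psi = P_{j w0} phi

# psi lies in S_{j w0}: int e^{-j w0 r} psi(r) dr = 0,
# up to the discretization error of the derivative.
moment = np.abs(np.sum(np.exp(-1j * w0 * r) * psi) * dr)
print(moment)
```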

In particular, for $j\omega_0 = 0$, we obtain $\mathcal{S}_0$, the space of Schwartz test functions with vanishing 0th-order moment. We may then view $\mathrm{P}_{j\omega_0}$ as an operator $\mathcal{S}(\mathbb{R}) \to \mathcal{S}_{j\omega_0}$, and this operator now has an inverse $\mathrm{P}_{j\omega_0}^{-1}$ from $\mathcal{S}_{j\omega_0} \to \mathcal{S}(\mathbb{R})$ defined by

$\mathrm{P}_{j\omega_0}^{-1}\varphi(r) = (\rho_{j\omega_0} * \varphi)(r),$ (5.12)

where

$\rho_{j\omega_0}(r) = \mathcal{F}^{-1}\left\{\frac{1}{j\omega - j\omega_0}\right\}(r) = \tfrac{1}{2}\,\mathrm{sign}(r)\,e^{j\omega_0 r}.$ (5.13)

3. To see this, note that $(\mathrm{D} - j\omega_0\mathrm{Id})\varphi(r) = e^{j\omega_0 r}\,\mathrm{D}\{e^{-j\omega_0 r}\varphi(r)\}$.


Specifically, this LSI operator satisfies the right- and left-inverse relations

$\mathrm{P}_{j\omega_0} \mathrm{P}_{j\omega_0}^{-1}\varphi = \varphi$ for all $\varphi \in \mathcal{S}_{j\omega_0}$
$\mathrm{P}_{j\omega_0}^{-1} \mathrm{P}_{j\omega_0}\varphi = \varphi$ for all $\varphi \in \mathcal{S}(\mathbb{R})$.

In order to be able to use such inverse operators for defining stochastic processes, we need to extend $\mathrm{P}_{j\omega_0}^{-1}$ to all of $\mathcal{S}(\mathbb{R})$.

Note that unlike the case of $\mathrm{P}_\alpha^{-1}$ with $\mathrm{Re}(\alpha) \ne 0$, here the extension of $\mathrm{P}_{j\omega_0}^{-1}$ to an operator $\mathcal{S}(\mathbb{R}) \to \mathcal{S}'(\mathbb{R})$ is in general not unique. For instance, $\mathrm{P}_{j\omega_0}^{-1}$ may also be specified as

$\mathrm{P}_{j\omega_0,+}^{-1}\varphi(r) = (\rho_{j\omega_0,+} * \varphi)(r),$ (5.14)

with causal impulse response

$\rho_{j\omega_0,+}(r) = e^{j\omega_0 r}\,\mathbb{1}_+(r),$ (5.15)

which defines the same operator on $\mathcal{S}_{j\omega_0}$ but not on $\mathcal{S}(\mathbb{R})$. In fact, we could as well have taken any impulse response of the form $\rho_{j\omega_0}(r) + p_0(r)$, where $p_0(r) = c_0 e^{j\omega_0 r}$ is an oscillatory component that is in the null space of $\mathrm{P}_{j\omega_0}$. By contrast, the $L_p$-continuous inverses that we define below remain the same for all of these extensions. To convey the idea, we shall first consider the extension based on the causal operator $\mathrm{P}_{j\omega_0,+}^{-1}$ defined by (5.14). Its adjoint is denoted by $\mathrm{P}_{j\omega_0,+}^{-1*}$ and amounts to an (anti-causal) convolution with $\rho_{j\omega_0,+}^\vee$.

To solve the stochastic differential equation $\mathrm{P}_{j\omega_0}s = w$, we need to find a left inverse of the adjoint $\mathrm{P}_{j\omega_0}^*$ acting on the space of test functions, which maps $\mathcal{S}(\mathbb{R})$ into the required $L_p$ space (Theorem 15). The problem with the "shift-invariant"⁴ extensions of the inverse defined above is that their image inside $\mathcal{S}'(\mathbb{R})$ is not contained in arbitrary $L_p$ spaces. For this reason, we now introduce a different, "corrected", extension of the inverse of $\mathrm{P}_{j\omega_0}^*$ that maps $\mathcal{S}(\mathbb{R})$ to $\mathcal{R}(\mathbb{R})$ and therefore, a fortiori, also into all $L_p$ spaces. This corrected left inverse, which we shall denote by $\mathrm{I}_{\omega_0}^*$, is constructed as

$\mathrm{I}_{\omega_0}^*\varphi(r) = \mathrm{P}_{j\omega_0,+}^{-1*}\varphi(r) - \Big(\lim_{y \to -\infty} \mathrm{P}_{j\omega_0,+}^{-1*}\varphi(y)\Big)\,\rho_{j\omega_0,+}^\vee(r)$
$\qquad = (\rho_{j\omega_0,+}^\vee * \varphi)(r) - \widehat{\varphi}(-\omega_0)\,\rho_{j\omega_0,+}^\vee(r)$ (5.16)

in direct analogy with (5.2). As in our introductory example, the role of the second term is to remove the tail of $\mathrm{P}_{j\omega_0,+}^{-1*}\varphi(r)$. This ensures that the output decays fast enough to belong to $\mathcal{R}(\mathbb{R})$. It is not difficult to show that the convolutional definition of $\mathrm{I}_{\omega_0}^*$ given by (5.16) does not depend on the specific choice of the impulse response within the class of admissible LSI inverses of $\mathrm{P}_{j\omega_0}$ on $\mathcal{S}_{j\omega_0}$. Then, we may simplify the notation by writing

$\mathrm{I}_{\omega_0}^*\varphi(r) = (\rho_{j\omega_0}^\vee * \varphi)(r) - \widehat{\varphi}(-\omega_0)\,\rho_{j\omega_0}^\vee(r),$ (5.17)

for any Green's function $\rho_{j\omega_0}$ of the operator $\mathrm{P}_{j\omega_0}$. While the left inverse operator $\mathrm{I}_{\omega_0}^*$ fixes the decay, it fails to be a right inverse of $\mathrm{P}_{j\omega_0}^*$ unless $\varphi \in \mathcal{S}_{j\omega_0}$ or, equivalently, $\widehat{\varphi}(-\omega_0) = 0$.

The corresponding right inverse of $\mathrm{P}_{j\omega_0}$ is provided by the adjoint of $\mathrm{I}_{\omega_0}^*$. It is identified via the scalar-product manipulation

$\langle \varphi, \mathrm{I}_{\omega_0}^*\phi \rangle = \langle \varphi, \mathrm{P}_{j\omega_0}^{-1*}\phi \rangle - \widehat{\phi}(-\omega_0)\,\langle \varphi, \rho_{j\omega_0}^\vee \rangle$ (by linearity)
$\qquad = \langle \mathrm{P}_{j\omega_0}^{-1}\varphi, \phi \rangle - \langle e^{j\omega_0 r}, \phi \rangle\,(\mathrm{P}_{j\omega_0}^{-1}\varphi)(0)$ (using (5.12))
$\qquad = \langle \mathrm{P}_{j\omega_0}^{-1}\varphi, \phi \rangle - \langle e^{j\omega_0 r}(\mathrm{P}_{j\omega_0}^{-1}\varphi)(0), \phi \rangle.$

4. These operators are shift-invariant because they are defined by means of convolutions.


Since the above is equal to $\langle \mathrm{I}_{\omega_0}\varphi, \phi \rangle$ by definition, we find that

$\mathrm{I}_{\omega_0}\varphi(r) = \mathrm{P}_{j\omega_0}^{-1}\varphi(r) - e^{j\omega_0 r}\,(\mathrm{P}_{j\omega_0}^{-1}\varphi)(0) = (\rho_{j\omega_0} * \varphi)(r) - e^{j\omega_0 r}\,(\rho_{j\omega_0} * \varphi)(0),$ (5.18)

where Ωj!0 is defined by (5.13). The specificity of I!0 is to impose the boundary conditions(0) = 0 on the output s = I!0', irrespective of the input function '. This is achieved bythe addition of a component that is in the null space of Pj!. This also explains why wemay substitute Ωj!0 in (5.18) by any other Green’s function of Pj!, including the causalone given by (5.15).

In particular, for j!0 = 0 (that is, for Pj!0 = D), we have

I0'(r ) =Zr

°1'(t ) dt °

Z0

°1'(t ) dt =

(

Rr0 '(t ) dt r ∏ 0

°R0

r '(t ) dt r < 0,

while the adjoint is given by

I§0'(r ) =Z1

r'(t ) dt ° (°1,0](r )

Z1

°1'(t ) dt =

(

R1r '(t ) dt r ∏ 0

°Rr°1'(t ) dt r < 0.

These are equivalent to the solution described in Section 5.1 (see Figure 5.1).

The Fourier-domain counterparts of (5.18) and (5.17) are

$$\mathrm{I}_{\omega_0}\varphi(r) = \int_{\mathbb{R}} \hat\varphi(\omega)\,\frac{\mathrm{e}^{j\omega r} - \mathrm{e}^{j\omega_0 r}}{j\omega - j\omega_0}\,\frac{\mathrm{d}\omega}{2\pi}, \qquad (5.19)$$

$$\mathrm{I}^*_{\omega_0}\varphi(r) = \int_{\mathbb{R}} \frac{\hat\varphi(\omega) - \hat\varphi(-\omega_0)}{-j\omega - j\omega_0}\,\mathrm{e}^{j\omega r}\,\frac{\mathrm{d}\omega}{2\pi}, \qquad (5.20)$$

respectively. One can observe that the form of the numerator in both integrals is such that it tempers the singularity of the denominator at $\omega = \omega_0$ (at $\omega = -\omega_0$ in the second case). The relevance of these corrected inverse operators for the construction of stochastic processes is due to the following theorem.

Theorem 16. The operator $\mathrm{I}^*_{\omega_0}$ defined by (5.17) is continuous from $\mathcal{S}(\mathbb{R})$ to $\mathcal{R}(\mathbb{R})$ and extends continuously to a linear operator $\mathcal{R}(\mathbb{R}) \to \mathcal{R}(\mathbb{R})$. It is the dual of the operator $\mathrm{I}_{\omega_0}$ defined by (5.18), in the sense that $\langle \mathrm{I}_{\omega_0}\phi, \varphi\rangle = \langle \phi, \mathrm{I}^*_{\omega_0}\varphi\rangle$. Moreover,

$$\mathrm{I}_{\omega_0}\varphi(0) = 0 \quad \text{(zero boundary condition)}$$
$$\mathrm{I}^*_{\omega_0}(\mathrm{D} - j\omega_0\,\mathrm{Id})^*\varphi = \varphi \quad \text{(left-inverse property)}$$
$$(\mathrm{D} - j\omega_0\,\mathrm{Id})\,\mathrm{I}_{\omega_0}\phi = \phi \quad \text{(right-inverse property)}$$
$$\|\mathrm{I}^*_{\omega_0}\varphi\|_{L_p} \le C_p\,\|\varphi\|_{L_p} \quad (L_p\text{-stability})$$

for all $\varphi, \phi \in \mathcal{S}(\mathbb{R})$.

The first part of the theorem follows from Proposition 13 below, which indicates that $\mathrm{I}^*_{\omega_0}$ preserves rapid decay. The statements in the second part have either already been discussed or, as in the case of the penultimate one, are a direct consequence of the first part. The left-inverse property, for instance, follows from the fact that $(\mathrm{D} - j\omega_0\,\mathrm{Id})^*\varphi \in \mathcal{S}_{-j\omega_0}$, which is the subspace of $\mathcal{S}(\mathbb{R})$ on which all the inverses of $\mathrm{P}^*_{j\omega_0} = -\mathrm{P}_{-j\omega_0}$ are equivalent. The right-inverse property of $\mathrm{I}_{\omega_0}$ is easily verified by applying $\mathrm{P}_{j\omega_0}$ to (5.18).

To qualify the rate of decay of functions, we rely on the $L_{\infty,\alpha}$ norm, defined as

$$\|\varphi\|_{L_{\infty,\alpha}} = \operatorname{ess\,sup}_{r\in\mathbb{R}} \big|\varphi(r)\,(1 + |r|^{\alpha})\big|.$$


Hence, the inclusion $\varphi \in L_{\infty,\alpha}(\mathbb{R})$ is equivalent to

$$|\varphi(r)| \le \frac{\|\varphi\|_{L_{\infty,\alpha}}}{1 + |r|^{\alpha}} \quad \text{a.e.},$$

which is to say that $\varphi$ has an algebraic decay of order $\alpha$. We also recall that $\mathcal{R}(\mathbb{R})$ is the space of rapidly-decaying functions, which is the intersection of all $L_{\infty,\alpha}(\mathbb{R})$ spaces with $\alpha > 0$. The relevant embedding relations are $\mathcal{S}(\mathbb{R}) \subset \mathcal{R}(\mathbb{R}) \subset L_{\infty,\alpha}(\mathbb{R}) \subset L_p(\mathbb{R})$ for any $\alpha > 1$ and $p \ge 1$. Moreover, since $\mathcal{S}(\mathbb{R})$ has the strictest topology in the chain, a sequence that converges in $\mathcal{S}(\mathbb{R})$ is also convergent in $\mathcal{R}(\mathbb{R})$, $L_{\infty,\alpha}(\mathbb{R})$, and $L_p(\mathbb{R})$.

Proposition 13. Let $\mathrm{I}^*_{\omega_0}$ be the linear operator defined by (5.16). Then, for all $\varphi \in L_{\infty,\alpha}(\mathbb{R})$ with $\alpha > 1$, there exists a constant $C$ such that

$$\|\mathrm{I}^*_{\omega_0}\varphi\|_{L_{\infty,\alpha-1}} \le C\,\|\varphi\|_{L_{\infty,\alpha}}.$$

Hence, $\mathrm{I}^*_{\omega_0}$ is a continuous operator from $L_{\infty,\alpha}(\mathbb{R})$ into $L_{\infty,\alpha-1}(\mathbb{R})$ and, by restriction of its domain, from $\mathcal{R}(\mathbb{R}) \to \mathcal{R}(\mathbb{R})$ or $\mathcal{S}(\mathbb{R}) \to \mathcal{R}(\mathbb{R})$.

Proof. For $r < 0$, we rewrite (5.16) as

$$\mathrm{I}^*_{\omega_0}\varphi(r) = (\check\rho_{j\omega_0,+} * \varphi)(r) - \mathrm{e}^{-j\omega_0 r}\,\hat\varphi(-\omega_0) = \int_r^{+\infty} \mathrm{e}^{-j\omega_0 (r-\tau)}\varphi(\tau)\,\mathrm{d}\tau - \mathrm{e}^{-j\omega_0 r}\int_{-\infty}^{\infty} \mathrm{e}^{j\omega_0\tau}\varphi(\tau)\,\mathrm{d}\tau = -\mathrm{e}^{-j\omega_0 r}\int_{-\infty}^{r} \mathrm{e}^{j\omega_0\tau}\varphi(\tau)\,\mathrm{d}\tau.$$

This implies that

$$|\mathrm{I}^*_{\omega_0}\varphi(r)| = \Big|\int_{-\infty}^{r} \mathrm{e}^{j\omega_0\tau}\varphi(\tau)\,\mathrm{d}\tau\Big| \le \int_{-\infty}^{r} |\varphi(\tau)|\,\mathrm{d}\tau \le \int_{-\infty}^{r} \frac{\|\varphi\|_{L_{\infty,\alpha}}}{1+|\tau|^{\alpha}}\,\mathrm{d}\tau \le \Big(\frac{2\alpha}{\alpha-1}\Big)\frac{\|\varphi\|_{L_{\infty,\alpha}}}{1+|r|^{\alpha-1}}$$

for all $r < 0$. For $r > 0$, $\mathrm{I}^*_{\omega_0}\varphi(r) = \int_r^{\infty} \mathrm{e}^{-j\omega_0 (r-\tau)}\varphi(\tau)\,\mathrm{d}\tau$, so that the above upper bound remains valid.

While $\mathrm{I}^*_{\omega_0}$ is continuous over $\mathcal{R}(\mathbb{R})$, it is not shift-invariant. Moreover, it will generally spoil the global smoothness of the functions in $\mathcal{S}(\mathbb{R})$ to which it is applied, owing to the discontinuity at the origin that is introduced by the correction. By contrast, its adjoint $\mathrm{I}_{\omega_0}$ preserves the smoothness of the input but fails to return functions that are rapidly decaying at infinity. This lack of shift-invariance and the slow growth of the output at infinity appear to be the price to pay for being able to solve unstable differential systems.

5.4.2 Higher-order differential operators with unstable shift-invariant inverses

Given that the operators $\mathrm{I}^*_{\omega_0}$, $\omega_0 \in \mathbb{R}$, defined in Section 5.4.1 are continuous $\mathcal{R}(\mathbb{R}) \to \mathcal{R}(\mathbb{R})$, they may be composed to obtain higher-order continuous operators $\mathcal{R}(\mathbb{R}) \to \mathcal{R}(\mathbb{R})$ that serve as left inverses of the corresponding higher-order differential operators in the sense


of the previous section. More precisely, given $(\omega_1, \dots, \omega_K) \in \mathbb{R}^K$, we define the composite integration operator

$$\mathrm{I}_{(\omega_1:\omega_K)} = \mathrm{I}_{\omega_1} \circ \cdots \circ \mathrm{I}_{\omega_K} \qquad (5.21)$$

whose adjoint is given by

$$\mathrm{I}^*_{(\omega_1:\omega_K)} = \big(\mathrm{I}_{\omega_1} \circ \cdots \circ \mathrm{I}_{\omega_K}\big)^* = \mathrm{I}^*_{\omega_K} \circ \cdots \circ \mathrm{I}^*_{\omega_1}. \qquad (5.22)$$

$\mathrm{I}^*_{(\omega_1:\omega_K)}$, which maps $\mathcal{R}(\mathbb{R})$ (and therefore $\mathcal{S}(\mathbb{R}) \subset \mathcal{R}(\mathbb{R})$) continuously into $\mathcal{R}(\mathbb{R})$, is then a left inverse of

$$\mathrm{P}^*_{(j\omega_K:j\omega_1)} = \mathrm{P}^*_{j\omega_1} \circ \cdots \circ \mathrm{P}^*_{j\omega_K},$$

with $\mathrm{P}_\alpha = \mathrm{D} - \alpha\,\mathrm{Id}$ and $\mathrm{P}^*_\alpha = -\mathrm{P}_{-\alpha}$. Conversely, $\mathrm{I}_{(\omega_K:\omega_1)}$ is a right inverse of

$$\mathrm{P}_{(j\omega_1:j\omega_K)} = \mathrm{P}_{j\omega_1} \circ \cdots \circ \mathrm{P}_{j\omega_K}.$$

Putting everything together, with the definitions of Section 5.3.2, we arrive at the following corollary of Theorem 16:

Corollary 3. For $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_M) \in \mathbb{C}^M$ with $\mathrm{Re}(\alpha_n) \neq 0$ and $(\omega_1, \dots, \omega_K) \in \mathbb{R}^K$, the $(M+K)$th-order operator $\mathrm{P}^{-1*}_{(\alpha_M:\alpha_1)}\mathrm{I}^*_{(\omega_K:\omega_1)}$ maps $\mathcal{S}(\mathbb{R})$ continuously into $\mathcal{R}(\mathbb{R})$. It is a left inverse of $\mathrm{P}^*_{(j\omega_1:j\omega_K)}\mathrm{P}^*_{(\alpha_1:\alpha_M)}$ in the sense that

$$\mathrm{P}^{-1*}_{(\alpha_M:\alpha_1)}\,\mathrm{I}^*_{(\omega_K:\omega_1)}\,\mathrm{P}^*_{(j\omega_1:j\omega_K)}\,\mathrm{P}^*_{(\alpha_1:\alpha_M)}\,\varphi = \varphi$$

for all $\varphi \in \mathcal{S}(\mathbb{R})$.

We shall now use this result to solve the differential equation (5.10) in the non-stable scenario. To that end, we order the poles in such a way that the unstable ones come last, with $\alpha_{N-K+m} = j\omega_m$, $1 \le m \le K$, where $K$ is the number of purely imaginary poles. We thus specify the right-inverse operator

$$\mathrm{L}^{-1} = \mathrm{I}_{(\omega_K:\omega_1)}\,\mathrm{P}^{-1}_{(\alpha_{N-K}:\alpha_1)}\,q_M(\mathrm{D}),$$

which we then apply to $w$ to obtain the solution $s = \mathrm{L}^{-1}w$. In effect, by applying $\mathrm{I}_{(\omega_K:\omega_1)}$ last, we are also enforcing the $K$ linear boundary conditions

$$\begin{cases} s(0) = 0 \\ \mathrm{P}_{j\omega_K}\{s\}(0) = 0 \\ \quad\vdots \\ \mathrm{P}_{j\omega_2}\cdots \mathrm{P}_{j\omega_K}\{s\}(0) = 0. \end{cases} \qquad (5.23)$$

To show that $s = \mathrm{L}^{-1}w$ is a consistent solution, we proceed by duality and write

$$\langle \varphi, \mathrm{P}_{(\alpha_1:\alpha_{N-K})}\mathrm{P}_{(j\omega_1:j\omega_K)}\,s\rangle = \langle \varphi, \mathrm{P}_{(\alpha_1:\alpha_{N-K})}\mathrm{P}_{(j\omega_1:j\omega_K)}\,\mathrm{L}^{-1}w\rangle$$
$$= \langle \mathrm{P}^*_{(j\omega_1:j\omega_K)}\mathrm{P}^*_{(\alpha_1:\alpha_{N-K})}\varphi,\ \mathrm{I}_{(\omega_K:\omega_1)}\mathrm{P}^{-1}_{(\alpha_{N-K}:\alpha_1)}q_M(\mathrm{D})\,w\rangle$$
$$= \langle \underbrace{\mathrm{P}^{-1*}_{(\alpha_{N-K}:\alpha_1)}\mathrm{I}^*_{(\omega_K:\omega_1)}\mathrm{P}^*_{(j\omega_1:j\omega_K)}\mathrm{P}^*_{(\alpha_1:\alpha_{N-K})}}_{\mathrm{Id}}\,\varphi,\ q_M(\mathrm{D})\,w\rangle$$
$$= \langle \varphi, q_M(\mathrm{D})\,w\rangle,$$

where we have made use of Corollary 3. This proves that $s$ satisfies the differential equation (5.10) with driving term $w$, subject to the boundary conditions (5.23).


5.4.3 Generalized boundary conditions

In the resolution method presented so far, the inverse operator $\mathrm{I}_{\omega_0}$ was designed to impose a zero boundary condition at the origin. More generally, one may consider inverse operators $\mathrm{I}_{\omega_0,\varphi_0}$ that incorporate conditions of the form

$$\langle \varphi_0, \mathrm{I}_{\omega_0,\varphi_0} w\rangle = 0 \qquad (5.24)$$

on the solution $s = \mathrm{I}_{\omega_0,\varphi_0} w$. This leads to the definition of the right-inverse operator (see [UTAKed, Appendix A])

$$\mathrm{I}_{\omega_0,\varphi_0}\varphi(r) = (\rho_{j\omega_0} * \varphi)(r) - \mathrm{e}^{j\omega_0 r}\,\frac{\langle \rho_{j\omega_0} * \varphi,\ \varphi_0\rangle}{\hat\varphi_0(-\omega_0)}, \qquad (5.25)$$

where $\rho_{j\omega_0}$ is a Green's function of $\mathrm{P}_{j\omega_0}$ and $\varphi_0$ is some given rapidly-decaying function such that $\hat\varphi_0(-\omega_0) \neq 0$. In particular, if we set $\omega_0 = 0$ and $\varphi_0 = \delta$, we recover the scale-invariant integrator $\mathrm{I}_0 = \mathrm{I}_{0,\delta}$ that was used in our introductory example (Section 5.1) to provide the connection with the classical theory of Lévy processes. The Fourier-domain counterpart of (5.25) is

$$\mathrm{I}_{\omega_0,\varphi_0}\varphi(r) = \int_{\mathbb{R}} \hat\varphi(\omega)\,\Bigg(\frac{\mathrm{e}^{j\omega r} - \mathrm{e}^{j\omega_0 r}\,\frac{\hat\varphi_0(-\omega)}{\hat\varphi_0(-\omega_0)}}{j(\omega - \omega_0)}\Bigg)\frac{\mathrm{d}\omega}{2\pi}. \qquad (5.26)$$

The above operator is well-defined pointwise for any $\hat\varphi \in L_1(\mathbb{R})$. Moreover, it is a right inverse of $(\mathrm{D} - j\omega_0\,\mathrm{Id})$ on $\mathcal{S}(\mathbb{R})$ because the regularization in the numerator amounts to a sinusoidal correction that lies in the null space of the operator. The adjoint of $\mathrm{I}_{\omega_0,\varphi_0}$ is specified by the Fourier-domain integral

$$\mathrm{I}^*_{\omega_0,\varphi_0}\varphi(r) = \int_{\mathbb{R}} \Bigg(\frac{\hat\varphi(\omega) - \frac{\hat\varphi(-\omega_0)}{\hat\varphi_0(-\omega_0)}\,\hat\varphi_0(\omega)}{-j(\omega + \omega_0)}\Bigg)\mathrm{e}^{j\omega r}\,\frac{\mathrm{d}\omega}{2\pi}, \qquad (5.27)$$

which is non-singular, thanks to the regularization in the numerator. The beneficial effect of this adjustment is that $\mathrm{I}^*_{\omega_0,\varphi_0}$ is $\mathcal{R}$-continuous and $L_p$-stable, unlike its more conventional shift-invariant counterpart $\mathrm{P}^{-1*}_{j\omega_0}$. The time-domain counterpart of (5.27) is

$$\mathrm{I}^*_{\omega_0,\varphi_0}\varphi(r) = (\check\rho_{j\omega_0} * \varphi)(r) - \frac{\hat\varphi(-\omega_0)}{\hat\varphi_0(-\omega_0)}\,\big(\varphi_0 * \check\rho_{j\omega_0}\big)(r), \qquad (5.28)$$

where $\rho_{j\omega_0}$ is a Green's function of $\mathrm{P}_{j\omega_0}$. This relation is very similar to (5.17), with the notable difference that the second term is convolved with $\varphi_0$. This suggests that we can restore the smoothness of the output by picking a kernel $\varphi_0$ with a sufficient degree of differentiability. In fact, by considering a sequence of such kernels in $\mathcal{S}(\mathbb{R})$ that converges to the Dirac distribution (in the weak sense), we can specify a left-inverse operator that is arbitrarily close to $\mathrm{I}^*_{\omega_0}$ and yet $\mathcal{S}$-continuous.

While the imposition of generalized boundary conditions of the form (5.24) has significant implications for the statistical properties of the signal (non-stationary behavior), it is less of an issue for signal processing because of the use of analysis tools (wavelets, finite-difference operators) that stationarize these processes, in effect filtering out the null-space components, so that the traditional tools of the trade remain applicable. Therefore, to simplify the presentation, we shall only consider boundary conditions at zero in the sequel and work with the operators $\mathrm{I}^*_{\omega_0}$ and $\mathrm{I}_{\omega_0}$.


5.5 Fractional-order operators

5.5.1 Fractional derivatives in one dimension

In one dimension, we consider the general class of all homogeneous and shift-invariant operators and their inverses. These were studied in [UB07, BU07]. To motivate their definition, let us recall that the $n$th-order derivative $\mathrm{D}^n$ corresponds to the Fourier multiplier $(j\omega)^n$. This suggests the following fractional extension, going back to Liouville, whereby the exponent $n$ is replaced by a nonnegative real number $\gamma$:

$$\mathrm{D}^{\gamma}\varphi(r) = \int_{\mathbb{R}} (j\omega)^{\gamma}\,\hat\varphi(\omega)\,\mathrm{e}^{j\omega r}\,\frac{\mathrm{d}\omega}{2\pi}.$$

This definition is further generalized in the next proposition, which gives a complete characterization of scale-invariant convolution operators in 1-D.

Proposition 14 (see [UB07, Proposition 2]). The complete family of 1-D scale-invariant convolution operators of order $\gamma \ge 0$ is given by the fractional derivatives $\partial^{\gamma}_{\tau}$ whose Fourier-based definition is

$$\partial^{\gamma}_{\tau}\varphi(r) = \int_{\mathbb{R}} (j\omega)^{\frac{\gamma}{2}+\tau}\,(-j\omega)^{\frac{\gamma}{2}-\tau}\,\hat\varphi(\omega)\,\mathrm{e}^{j\omega r}\,\frac{\mathrm{d}\omega}{2\pi},$$

where $\hat\varphi(\omega)$ is the 1-D Fourier transform of $\varphi(r)$.

These operators are endowed with a semigroup property: they satisfy the composition rule $\partial^{\gamma}_{\tau}\partial^{\gamma'}_{\tau'} = \partial^{\gamma+\gamma'}_{\tau+\tau'}$ for $\gamma, \gamma' \in \mathbb{R}^+$ and $\tau, \tau' \in \mathbb{R}$. The parameter $\tau$ is a phase factor that yields a progressive transition between the purely causal derivative $\mathrm{D}^{\gamma} = \partial^{\gamma}_{\gamma/2}$ and its anti-causal counterpart $\mathrm{D}^{\gamma *} = \partial^{\gamma}_{-\gamma/2}$, which happens to be the adjoint of the former. We also note that $\partial^{0}_{\tau}$ is equivalent to the fractional Hilbert-transform operator $\mathrm{H}_{\tau}$ investigated in [CU10]. This implies that the fractional derivatives of order $\gamma$, $\partial^{\gamma}_{\tau} = \partial^{\gamma}_{\gamma/2}\partial^{0}_{\tau-\gamma/2} = \mathrm{D}^{\gamma}\mathrm{H}_{\tau-\gamma/2}$, are all related to $\mathrm{D}^{\gamma}$ via a fractional Hilbert transform, which is a unitary operator that essentially acts as a shift operator on the oscillatory part of a wavelet.
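These Fourier-multiplier definitions are straightforward to prototype with the FFT on a periodic grid. The sketch below (our own illustration, not from the book; the function name `frac_deriv`, the grid size, the test signal, and the tolerances are all arbitrary choices) applies $\partial^{\gamma}_{\tau}$ through its multiplier $(j\omega)^{\gamma/2+\tau}(-j\omega)^{\gamma/2-\tau}$ and checks the composition rule as well as the reduction to the ordinary derivative for $(\gamma,\tau) = (1, 1/2)$.

```python
import numpy as np

def frac_deriv(f, gamma, tau, L=2*np.pi):
    """Fractional derivative of order gamma with phase tau, applied to samples
    of an L-periodic signal via the Fourier multiplier
    (jw)^(gamma/2 + tau) * (-jw)^(gamma/2 - tau).
    Assumes gamma > 0 and |tau| <= gamma/2, so the multiplier vanishes at w = 0."""
    n = len(f)
    w = 2*np.pi*np.fft.fftfreq(n, d=L/n)         # discrete frequencies
    m = (1j*w)**(gamma/2 + tau) * (-1j*w)**(gamma/2 - tau)
    m[0] = 0.0                                   # DC term (w = 0)
    return np.fft.ifft(np.fft.fft(f) * m)

x = np.linspace(0, 2*np.pi, 256, endpoint=False)
f = np.sin(3*x) + 0.5*np.cos(5*x)                # smooth periodic test signal

# Semigroup composition rule: d^g_t d^g'_t' = d^(g+g')_(t+t')
lhs = frac_deriv(frac_deriv(f, 0.7, 0.2), 0.9, -0.1)
rhs = frac_deriv(f, 1.6, 0.1)
assert np.max(np.abs(lhs - rhs)) < 1e-10

# tau = gamma/2 is the causal branch; for gamma = 1 it is the ordinary derivative
df = frac_deriv(f, 1.0, 0.5).real
assert np.allclose(df, 3*np.cos(3*x) - 2.5*np.sin(5*x), atol=1e-8)
```

Because the multiplier of the composed operator is the pointwise product of the individual multipliers, the semigroup identity holds to machine precision on the grid.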

The Fourier multiplier associated with $\partial^{\gamma}_{\tau}$ is equal in amplitude to $|\omega|^{\gamma}$. This observation, in combination with Theorem 19, indicates that the $L_p$-stable inverses of fractional derivatives are the ones given by Proposition 15.

Proposition 15. The $L_p$-stable left and right inverses of $\partial^{\gamma}_{\tau}$, applied to test functions and generalized functions, respectively, are given by

$$\partial^{-\gamma *}_{-\tau,p}\varphi(r) = \frac{1}{2\pi}\int_{\mathbb{R}} \frac{\hat\varphi(\omega) - \sum_{k \le \gamma - 1 + \frac{1}{p}} \hat\varphi^{(k)}(0)\,\frac{\omega^k}{k!}}{(-j\omega)^{\frac{\gamma}{2}-\tau}\,(j\omega)^{\frac{\gamma}{2}+\tau}}\,\mathrm{e}^{j\omega r}\,\mathrm{d}\omega$$

$$\partial^{-\gamma}_{-\tau,p}\varphi(r) = \frac{1}{2\pi}\int_{\mathbb{R}} \frac{\mathrm{e}^{j\omega r} - \sum_{k \le \gamma - 1 + \frac{1}{p}} \frac{(j\omega r)^k}{k!}}{(-j\omega)^{\frac{\gamma}{2}-\tau}\,(j\omega)^{\frac{\gamma}{2}+\tau}}\,\hat\varphi(\omega)\,\mathrm{d}\omega,$$

where $\hat\varphi^{(k)}(0) = \frac{\mathrm{d}^k \hat\varphi(\omega)}{\mathrm{d}\omega^k}\Big|_{\omega=0}$.

5.5.2 Fractional Laplacians

The fractional Laplacian of order $\gamma \ge 0$ is defined by the inverse Fourier integral

$$(-\Delta)^{\frac{\gamma}{2}}\varphi(\boldsymbol{r}) = \int_{\mathbb{R}^d} \|\boldsymbol{\omega}\|^{\gamma}\,\hat\varphi(\boldsymbol{\omega})\,\mathrm{e}^{j\langle \boldsymbol{\omega},\boldsymbol{r}\rangle}\,\frac{\mathrm{d}\boldsymbol{\omega}}{(2\pi)^d},$$

where $\hat\varphi(\boldsymbol{\omega})$ is the $d$-dimensional Fourier transform of $\varphi(\boldsymbol{r})$. For $\gamma = 2$, this characterization coincides with the classical definition of the negative Laplacian: $-\Delta = -\sum_i \partial^2_{r_i}$.
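As a quick sanity check (our own periodic FFT discretization, not from the text; the name `frac_laplacian`, the $64\times 64$ grid, and the trigonometric test function are arbitrary choices), one can implement the multiplier $\|\boldsymbol{\omega}\|^{\gamma}$ on a 2-D grid and verify that $\gamma = 2$ reproduces the negative Laplacian on an eigenfunction.

```python
import numpy as np

def frac_laplacian(f, gamma, L=2*np.pi):
    """(-Delta)^(gamma/2) of an L-periodic 2-D signal via the multiplier ||w||^gamma."""
    n0, n1 = f.shape
    w0 = 2*np.pi*np.fft.fftfreq(n0, d=L/n0)
    w1 = 2*np.pi*np.fft.fftfreq(n1, d=L/n1)
    W0, W1 = np.meshgrid(w0, w1, indexing='ij')
    mult = (W0**2 + W1**2)**(gamma/2)            # ||w||^gamma (0 at the origin)
    return np.fft.ifft2(np.fft.fft2(f) * mult).real

x = np.linspace(0, 2*np.pi, 64, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
f = np.sin(2*X)*np.cos(Y)                        # eigenfunction of -Delta, eigenvalue 5

# gamma = 2 recovers the negative Laplacian: -Delta f = (4 + 1) f
assert np.allclose(frac_laplacian(f, 2.0), 5*f, atol=1e-10)
# A fractional order raises the eigenvalue to the power gamma/2
assert np.allclose(frac_laplacian(f, 1.0), np.sqrt(5)*f, atol=1e-10)
```

The test function occupies a single radial frequency shell, so the fractional Laplacian simply scales it by $5^{\gamma/2}$, which makes the check exact up to FFT round-off.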

In slightly more generality, we obtain a complete characterization of homogeneous shift- and rotation-invariant operators and their inverses in terms of convolutions with homogeneous rotation-invariant distributions, as given in Theorem 17. The idea and definitions may be traced back to [Duc77, GS68, Hör80].

Theorem 17 ([Taf11], Corollary 2.ac). Any continuous linear operator $\mathcal{S}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ that is simultaneously shift- and rotation-invariant, and homogeneous or scale-invariant of order $\gamma$ in the sense of Definition 7.5, is a multiple of the operator

$$\mathrm{L}^{\gamma} : \varphi \mapsto \rho_{-\gamma-d} * \varphi = (2\pi)^{\frac{d}{2}}\,\mathcal{F}^{-1}\{\rho_{\gamma}\,\hat\varphi\},$$

where $\rho_{\gamma}$, $\gamma \in \mathbb{C}$, is the distribution defined by

$$\rho_{\gamma}(\boldsymbol{\omega}) = \frac{\|\boldsymbol{\omega}\|^{\gamma}}{2^{\frac{\gamma}{2}}\,\Gamma\big(\frac{\gamma+d}{2}\big)} \qquad (5.29)$$

with the property that

$$\mathcal{F}\{\rho_{\gamma}\}(\boldsymbol{\omega}) = (2\pi)^{\frac{d}{2}}\,\rho_{-\gamma-d}(\boldsymbol{\omega})$$

($\Gamma$ denotes the Gamma function.)

Note that $\mathrm{L}^{\gamma}$ is self-adjoint, with $\mathrm{L}^{\gamma *} = \mathrm{L}^{\gamma}$.

As is clear from the definition of $\rho_{\gamma}$, for $\gamma \neq -d, -d-2, -d-4, \dots$, $\mathrm{L}^{\gamma}$ is simply a renormalized version of the fractional Laplacian introduced earlier. For $\gamma = -d - 2m$, $m = 0, 1, 2, 3, \dots$, where the $\Gamma$ function in the denominator of $\rho_{\gamma}$ has a pole, $\rho_{\gamma}$ is proportional to $(-\Delta)^m\delta$, while the previous definition of the fractional Laplacian without normalization does not define a scale-invariant operator.

Also note that when $\mathrm{Re}(\gamma) > -d$ (respectively, $\mathrm{Re}(\gamma) \le -d$), $\rho_{\gamma}$ (respectively, its Fourier transform) is singular at the origin. This singularity is resolved in the manner described in Appendix A, which is equivalent to the analytic continuation, in the variable $\gamma$, of the formula

$$\Delta\rho_{\gamma} = \gamma\,\rho_{\gamma-2},$$

initially valid for $\mathrm{Re}(\gamma) - 2 > -d$. For the details of the previous two observations, we refer the reader to Appendix A and Tafti [Taf11, Section 2.2].⁵

Finally, it is important to remark that, unlike for integer-order operators, the image of $\mathcal{S}(\mathbb{R}^d)$ under $\mathrm{L}^{\gamma}$ is not contained in $\mathcal{S}(\mathbb{R}^d)$ unless $\gamma$ is a positive even integer. Specifically, while $\mathrm{L}^{\gamma}\varphi$ is always an infinitely differentiable regular function for any $\varphi \in \mathcal{S}$, in the case $\gamma \neq 2m$, $m = 1, 2, \dots$, it may have slow (polynomial) decay or growth. Put more simply, the fractional Laplacian of a Schwartz function is not, in general, a Schwartz function.

5.5.3 $L_p$-stable inverses

From the identity

$$\rho_{\gamma}(\boldsymbol{\omega})\,\rho_{-\gamma}(\boldsymbol{\omega}) = \frac{1}{\Gamma\big(\frac{d+\gamma}{2}\big)\,\Gamma\big(\frac{d-\gamma}{2}\big)} \quad \text{for } \boldsymbol{\omega} \neq 0,$$

we conclude that, up to normalization, $\mathrm{L}^{-\gamma}$ is the inverse of $\mathrm{L}^{\gamma}$ on the space of Schwartz test functions with vanishing moments, in particular for $\mathrm{Re}(\gamma) > -d$.⁶ This inverse for $\mathrm{Re}(\gamma) > -d$ can further be extended to a shift-invariant left inverse of $\mathrm{L}^{\gamma}$ acting on Schwartz functions. However, as was the case in Section 5.4.1, this shift-invariant inverse generally does not map $\mathcal{S}(\mathbb{R}^d)$ into a given $L_p(\mathbb{R}^d)$ space, and is therefore not suitable for defining generalized random fields in $\mathcal{S}'(\mathbb{R}^d)$.

5. The difference in factors of $(2\pi)^{d/2}$ and $(2\pi)^d$ between the formulas given here and in Tafti [Taf11] is due to the different normalizations used in the definition of the Fourier transform.

The problem exposed in the previous paragraph is once again overcome by defining a "corrected" left inverse. However, unlike previously, it is not possible here to have a single left-inverse operator that maps $\mathcal{S}(\mathbb{R}^d)$ into the intersection of all $L_p(\mathbb{R}^d)$ spaces, $p \ge 1$. Instead, we need to define a separate left-inverse operator for each $L_p(\mathbb{R}^d)$ space of interest. Under the constraints of homogeneity and rotation invariance, such "non-shift-invariant" left inverses are identified in the following theorem.

Theorem 18 ([Taf11], Theorem 2.aq and Corollary 2.am). The operator

$$\mathrm{L}^{-\gamma *}_{p} : \varphi \mapsto \rho_{\gamma-d} * \varphi - \sum_{|\boldsymbol{k}| \le \lfloor \mathrm{Re}(\gamma) + \frac{d}{p}\rfloor - d} \frac{\partial^{\boldsymbol{k}}\rho_{\gamma-d}}{\boldsymbol{k}!} \int_{\mathbb{R}^d} \boldsymbol{y}^{\boldsymbol{k}}\,\varphi(\boldsymbol{y})\,\mathrm{d}\boldsymbol{y} \qquad (5.30)$$

with $\mathrm{Re}(\gamma) + \frac{d}{p} \neq 1, 2, 3, \dots$ is rotation-invariant and homogeneous of order $(-\gamma)$ in the sense of Definition 7.5. It maps $\mathcal{S}(\mathbb{R}^d)$ continuously into $L_p(\mathbb{R}^d)$.

The adjoint of $\mathrm{L}^{-\gamma *}_{p}$ is given by

$$\mathrm{L}^{-\gamma}_{p} : \varphi \mapsto \rho_{\gamma-d} * \varphi - \sum_{|\boldsymbol{k}| \le \lfloor \mathrm{Re}(\gamma) + \frac{d}{p}\rfloor - d} \frac{\boldsymbol{r}^{\boldsymbol{k}}\,\partial^{\boldsymbol{k}}\mathrm{L}^{-\gamma}\varphi(0)}{\boldsymbol{k}!}.$$

If we exclude the cases where the denominator of (5.29) has a pole, we may normalize the above operators to find left and right inverses corresponding to the fractional Laplacian $(-\Delta)^{\frac{\gamma}{2}}$. The next theorem gives an equivalent Fourier-domain characterization of these operators.

Theorem 19 (cf. [SU12, Theorem 3.7]). Let $\mathrm{I}^{\gamma *}_{p}$, with $p \ge 1$ and $\gamma > 0$, be the isotropic fractional-integral operator defined by

$$\mathrm{I}^{\gamma *}_{p}\varphi(\boldsymbol{r}) = \int_{\mathbb{R}^d} \frac{\hat\varphi(\boldsymbol{\omega}) - \sum_{|\boldsymbol{k}| \le \lfloor \gamma + \frac{d}{p}\rfloor - d} \partial^{\boldsymbol{k}}\hat\varphi(0)\,\frac{\boldsymbol{\omega}^{\boldsymbol{k}}}{\boldsymbol{k}!}}{\|\boldsymbol{\omega}\|^{\gamma}}\,\mathrm{e}^{j\langle \boldsymbol{\omega},\boldsymbol{r}\rangle}\,\frac{\mathrm{d}\boldsymbol{\omega}}{(2\pi)^d}. \qquad (5.31)$$

Then, under the condition that $\gamma \neq 2, 4, \dots$ and $\gamma + \frac{d}{p} \neq 1, 2, 3, \dots$, $\mathrm{I}^{\gamma *}_{p}$ is the unique left inverse of $(-\Delta)^{\frac{\gamma}{2}}$ that is scale-invariant and $L_p$-stable in the sense of (4.25). The adjoint operator $\mathrm{I}^{\gamma}_{p}$, which is the proper scale-invariant right inverse of $(-\Delta)^{\frac{\gamma}{2}}$, is given by

$$\mathrm{I}^{\gamma}_{p}\varphi(\boldsymbol{r}) = \int_{\mathbb{R}^d} \frac{\mathrm{e}^{j\langle \boldsymbol{\omega},\boldsymbol{r}\rangle} - \sum_{|\boldsymbol{k}| \le \lfloor \gamma + \frac{d}{p}\rfloor - d} j^{|\boldsymbol{k}|}\,\boldsymbol{r}^{\boldsymbol{k}}\,\frac{\boldsymbol{\omega}^{\boldsymbol{k}}}{\boldsymbol{k}!}}{\|\boldsymbol{\omega}\|^{\gamma}}\,\hat\varphi(\boldsymbol{\omega})\,\frac{\mathrm{d}\boldsymbol{\omega}}{(2\pi)^d}. \qquad (5.32)$$

The fractional integral operators $\mathrm{I}^{\gamma}_{p}$ and $\mathrm{I}^{\gamma *}_{p}$ are both scale-invariant of order $(-\gamma)$, but they are not shift-invariant.

6. Here we exclude the cases where the Gamma functions in the denominator have poles, namely where their argument is a nonpositive integer. For details, see [Taf11, 2.ah].


5.6 Discrete convolution operators

We conclude this chapter by providing a few basic results on discrete convolution operators and their inverses. These will turn out to be helpful for establishing the existence of certain spline interpolators that are required for the construction of operator-like wavelet bases in Chapter 6 and for the representation of autocorrelation functions in Chapter 7.

The convention in this book is to use square brackets to index sequences, which distinguishes them from functions of a continuous variable. In other words, $h(\cdot)$ or $h(r)$ stands for a function defined on a continuum, while $h[\cdot]$ or $h[k]$ denotes some discrete sequence. The notation $h[\cdot]$ is often simplified to $h$ when the context is clear.

The discrete convolution between two sequences $h[\cdot]$ and $a[\cdot]$ over $\mathbb{Z}^d$ is defined as

$$(h * a)[\boldsymbol{n}] = \sum_{\boldsymbol{m}\in\mathbb{Z}^d} h[\boldsymbol{m}]\,a[\boldsymbol{n}-\boldsymbol{m}]. \qquad (5.33)$$

This convolution describes how a digital filter with discrete impulse response $h$ acts on some input sequence $a$. If $h \in \ell_1(\mathbb{Z}^d)$, then the map $a[\cdot] \mapsto (h * a)[\cdot]$ is a continuous operator $\ell_p(\mathbb{Z}^d) \to \ell_p(\mathbb{Z}^d)$. This follows from Young's inequality for sequences:

$$\|h * a\|_{\ell_p} \le \|h\|_{\ell_1}\,\|a\|_{\ell_p}, \qquad (5.34)$$

where

$$\|a\|_{\ell_p} = \begin{cases} \big(\sum_{\boldsymbol{n}\in\mathbb{Z}^d} |a[\boldsymbol{n}]|^p\big)^{\frac{1}{p}} & \text{for } 1 \le p < \infty \\ \sup_{\boldsymbol{n}\in\mathbb{Z}^d} |a[\boldsymbol{n}]| & \text{for } p = \infty. \end{cases}$$

Note that the condition $h \in \ell_1(\mathbb{Z}^d)$ is the discrete counterpart of the (more involved) TV condition in Theorem 4. As in the continuous formulation, it is necessary and sufficient for stability in the extreme cases $p = 1, +\infty$. Establishing (5.34) for those two cases is a routine calculation; one can then invoke Theorem 3 (Riesz-Thorin) to interpolate the bound for $1 < p < +\infty$. Another way of understanding the result is by observing that $\ell_1$-stability implies all the other forms because of the embedding $\ell_p(\mathbb{Z}^d) \subset \ell_q(\mathbb{Z}^d)$ for any $1 \le p < q \le \infty$, which goes hand-in-hand with the basic norm inequality

$$\|a\|_{\ell_p} \ge \|a\|_{\ell_q} \ge \|a\|_{\ell_\infty}$$

for all $a \in \ell_p(\mathbb{Z}^d) \subset \ell_q(\mathbb{Z}^d) \subset \ell_\infty(\mathbb{Z}^d)$.
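Young's inequality (5.34) and the norm chain are easy to illustrate numerically. In the sketch below (our own illustration; the helper `lp_norm`, the random filter, and the random signal are arbitrary choices), both inequalities are checked for several values of $p$ in $d = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal(15)          # an arbitrary filter in l1(Z)
a = rng.standard_normal(200)         # an arbitrary input sequence

b = np.convolve(h, a)                # (h * a)[n], full discrete convolution

def lp_norm(x, p):
    """The lp norm of a finite sequence, with p = np.inf for the sup norm."""
    return np.max(np.abs(x)) if np.isinf(p) else np.sum(np.abs(x)**p)**(1/p)

# Young's inequality ||h * a||_p <= ||h||_1 ||a||_p, for several p
for p in (1, 2, 4, np.inf):
    assert lp_norm(b, p) <= lp_norm(h, 1) * lp_norm(a, p) + 1e-12

# Norm chain ||a||_p >= ||a||_q >= ||a||_inf for p < q
assert lp_norm(a, 1) >= lp_norm(a, 2) >= lp_norm(a, np.inf)
```

The small additive slack (`1e-12`) only guards against floating-point round-off; the inequalities themselves are exact.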

In the Fourier domain, (5.33) maps into the multiplication of the discrete Fourier transforms of $h$ and $a$:

$$b[\boldsymbol{n}] = (h * a)[\boldsymbol{n}] \iff B(\boldsymbol{\omega}) = H(\boldsymbol{\omega})\,A(\boldsymbol{\omega}),$$

where we use capitalized letters to denote the discrete-time Fourier transforms of the underlying sequences. Specifically, we have that

$$B(\boldsymbol{\omega}) = \mathcal{F}_d\{b\}(\boldsymbol{\omega}) = \sum_{\boldsymbol{k}\in\mathbb{Z}^d} b[\boldsymbol{k}]\,\mathrm{e}^{-j\langle \boldsymbol{\omega},\boldsymbol{k}\rangle},$$

which is $2\pi$-periodic, while the corresponding inversion formula is

$$b[\boldsymbol{n}] = \mathcal{F}^{-1}_d\{B\}[\boldsymbol{n}] = \int_{[-\pi,\pi]^d} B(\boldsymbol{\omega})\,\mathrm{e}^{j\langle \boldsymbol{\omega},\boldsymbol{n}\rangle}\,\frac{\mathrm{d}\boldsymbol{\omega}}{(2\pi)^d}.$$

The stability condition $h \in \ell_1(\mathbb{Z}^d)$ ensures that the frequency response $H(\boldsymbol{\omega})$ of the digital filter $h$ is bounded and continuous.

The task of specifying discrete inverse filters is greatly facilitated by a theorem, known as Wiener's lemma, which ensures that the inverse convolution operator is $\ell_p$-stable whenever the frequency response of the original filter is non-vanishing.


Theorem 20 (Wiener's lemma). Let $H(\boldsymbol{\omega}) = \sum_{\boldsymbol{k}\in\mathbb{Z}^d} h[\boldsymbol{k}]\,\mathrm{e}^{-j\langle \boldsymbol{\omega},\boldsymbol{k}\rangle}$, with $h \in \ell_1(\mathbb{Z}^d)$, be a stable discrete Fourier multiplier such that $H(\boldsymbol{\omega}) \neq 0$ for $\boldsymbol{\omega} \in [-\pi,\pi]^d$. Then, $G(\boldsymbol{\omega}) = 1/H(\boldsymbol{\omega})$ has the same property, in the sense that it can be written as $1/H(\boldsymbol{\omega}) = \sum_{\boldsymbol{k}\in\mathbb{Z}^d} g[\boldsymbol{k}]\,\mathrm{e}^{-j\langle \boldsymbol{\omega},\boldsymbol{k}\rangle}$ with $g \in \ell_1(\mathbb{Z}^d)$.

The so-defined filter $g$ identifies a stable convolution inverse of $h$ with the property that

$$(g * h)[\boldsymbol{n}] = (h * g)[\boldsymbol{n}] = \delta[\boldsymbol{n}],$$

where

$$\delta[\boldsymbol{n}] = \begin{cases} 1 & \text{for } \boldsymbol{n} = 0 \\ 0 & \text{for } \boldsymbol{n} \in \mathbb{Z}^d \setminus \{0\} \end{cases}$$

is the Kronecker unit impulse.
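For a first-order filter, the inverse promised by Wiener's lemma can be written down in closed form, which makes the statement easy to test numerically. The specific filter $h = \delta + \tfrac{1}{2}\delta[\cdot-1]$ below is our own example, not one from the text; its frequency response $H(\omega) = 1 + \tfrac{1}{2}\mathrm{e}^{-j\omega}$ never vanishes, so a summable inverse must exist.

```python
import numpy as np

# h[k] = delta[k] + 0.5 delta[k-1]; H(w) = 1 + 0.5 e^{-jw} is bounded away
# from zero, so Wiener's lemma guarantees a stable (l1) convolution inverse g.
h = np.array([1.0, 0.5])

# Closed-form inverse: g[k] = (-0.5)^k for k >= 0 (geometric series for 1/H),
# truncated at K taps; the discarded tail has magnitude ~0.5^K.
K = 60
g = (-0.5)**np.arange(K)

d = np.convolve(h, g)                   # should approximate the Kronecker impulse
assert abs(d[0] - 1.0) < 1e-15
assert np.max(np.abs(d[1:])) < 1e-12    # only the last tap is nonzero, ~0.5^K
# g is summable, as Wiener's lemma asserts: ||g||_1 = 1/(1 - 0.5) = 2
assert abs(np.sum(np.abs(g)) - 2.0) < 1e-12
```

The interior taps of `d` cancel exactly because $(-0.5)^k + 0.5\,(-0.5)^{k-1} = 0$; only the truncation tap at index $K$ survives, with negligible magnitude.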

5.7 Notes

The specification of $L_p$-stable convolution operators is a central topic in harmonic analysis [SW71, Gra08]. The basic result in Proposition 11 relies on Young's inequality with $q = r = 1$. The complete class of functions that result in $\mathcal{S}$-continuous convolution kernels is provided by the inverse Fourier transform of the space of smooth slowly-increasing Fourier multipliers, which play a crucial role in the theory of generalized functions [Sch66]. They are an extension of $\mathcal{R}(\mathbb{R}^d)$ in the sense that they also contain point distributions such as $\delta$ and its derivatives.

The operational calculus that is used for solving ordinary differential equations (ODEs) can be traced to Heaviside, who also introduced the symbol $\mathrm{D}$ for the derivative operator. It was initially met with skepticism because Heaviside's exposition lacked rigor. Nowadays, the preferred method for solving ODEs is based on the Laplace transform or the Fourier transform. The operator-based formalism presented in Section 5.3 is a standard application of distribution theory and Green's functions [Zem10].

The main advantage of the framework of generalized stochastic processes is that it greatly facilitates the transposition of deterministic methods to the solution of linear stochastic differential equations (SDEs). The procedure is fairly straightforward in the stable scenario, since all underlying operators are $\mathcal{S}$-continuous. As we saw in Sections 5.4 and 5.5.1, the extension to the unstable regime is possible as well, but mathematically more challenging, which explains why the developments in this area are more recent. The operator-based approach for solving unstable SDEs was initiated in [BU07] in an attempt to link splines and fractals. It was further refined in a series of papers [UTAKed, UTSed, TVDVU09, UT11, SU12].


Chapter 6

Splines and wavelets

A fundamental aspect of our formulation is that the whitening operator L is naturally tied to some underlying B-spline function, which will play a crucial role in the sequel. The spline connection also provides a strong link with wavelets [UB03].

In this chapter, we review the foundations of spline theory and show how one can construct B-spline basis functions and wavelets that are tied to some specific operator L. The chapter starts with a gentle introduction to wavelets that exploits the analogy with Lego blocks. This naturally leads to the formulation of a multiresolution analysis of $L_2(\mathbb{R})$ using piecewise-constant functions and a de visu identification of Haar wavelets. We then proceed in Section 6.2 with a formal definition of our generalized brand of splines, the cardinal L-splines, followed by a detailed discussion of the fundamental notion of a Riesz basis. In Section 6.3, we systematically cover the first-order operators with the construction of exponential B-splines and wavelets, which have the convenient property of being orthogonal. We then address the theory in its full generality and present two generic methods for constructing B-spline basis functions (Section 6.4) and semi-orthogonal wavelets (Section 6.5). The pleasing aspect is that these results apply to the whole class of shift-invariant differential operators L whose null space is finite-dimensional (possibly trivial), which are precisely those that can be safely inverted to specify sparse stochastic processes.

6.1 From Legos to wavelets

It is instructive to get back to our introductory example of piecewise-constant splines in Chapter 1 (§1.3) and to show how these are naturally connected to wavelets. The fundamental idea in wavelet theory is to construct a series of fine-to-coarse approximations of a function $s(r)$ and to exploit the structure of the multiresolution approximation errors, which are orthogonal across scales. Here, we shall consider a series of approximating signals $\{s_i\}_{i\in\mathbb{Z}}$, where $s_i$ is a piecewise-constant spline with knots positioned on $2^i\mathbb{Z}$. These multiresolution splines are represented by their B-spline expansion

$$s_i(r) = \sum_{k\in\mathbb{Z}} c_i[k]\,\phi_{i,k}(r), \qquad (6.1)$$

where the B-spline basis functions (rectangles) are versions of the cardinal ones dilated by a factor of $2^i$:

$$\phi_{i,k}(r) = \beta^0_+\Big(\frac{r - 2^i k}{2^i}\Big) = \begin{cases} 1 & \text{for } r \in \big[2^i k,\, 2^i(k+1)\big) \\ 0 & \text{otherwise.} \end{cases} \qquad (6.2)$$


The variable $i$ is the scale index that specifies the resolution (or knot spacing) $a = 2^i$, while the integer $k$ encodes the spatial location. The B-spline of degree 0, $\phi = \phi_{0,0} = \beta^0_+$, is the scaling function of the representation. Interestingly, it is the identification of a proper scaling function that constitutes the most fundamental step in the construction of a wavelet basis of $L_2(\mathbb{R})$.

Definition 18 (Scaling function). $\phi \in L_2(\mathbb{R})$ is a valid scaling function if and only if it satisfies the following three properties:
– Two-scale relation:
$$\phi(r/2) = \sum_{k\in\mathbb{Z}} h[k]\,\phi(r-k), \qquad (6.3)$$
where the sequence $h \in \ell_1(\mathbb{Z})$ is the so-called refinement mask.
– Partition of unity:
$$\sum_{k\in\mathbb{Z}} \phi(r-k) = 1. \qquad (6.4)$$
– Riesz basis: the set of functions $\{\phi(\cdot-k)\}_{k\in\mathbb{Z}}$ forms a Riesz basis.

In practice, a given brand of (orthogonal) wavelets (e.g., Daubechies or splines) is often summarized by its refinement filter $h$, since the latter uniquely specifies $\phi$, subject to the admissibility constraints (6.4) and $\phi \in L_2(\mathbb{R})$. In the case of the B-spline of degree 0, we have $h[k] = \delta[k] + \delta[k-1]$, where

$$\delta[k] = \begin{cases} 1 & \text{for } k = 0 \\ 0 & \text{otherwise} \end{cases}$$

is the discrete Kronecker impulse. This translates into what we jokingly refer to as the Lego-Duplo relation¹

$$\beta^0_+(r/2) = \beta^0_+(r) + \beta^0_+(r-1). \qquad (6.5)$$

The fact that $\beta^0_+$ satisfies the partition of unity is obvious. Likewise, we already observed in Chapter 1 that $\beta^0_+$ generates an orthogonal system, which is a special case of a Riesz basis.

By considering the rescaled version of such a basis, we specify the subspace of splines at scale $i$ as

$$V_i = \Big\{ s_i(r) = \sum_{k\in\mathbb{Z}} c_i[k]\,\phi_{i,k}(r) : c_i \in \ell_2(\mathbb{Z}) \Big\} \subset L_2(\mathbb{R}),$$

which, in our example, contains all the finite-energy functions that are piecewise-constant on the intervals $\big[2^i k,\, 2^i(k+1)\big)$ with $k \in \mathbb{Z}$. The two-scale relation (6.3) implies that the basis functions at scale $i = 1$ are contained in $V_0$ (the original space of cardinal splines) and, by extension, in $V_i$ for $i \le 0$. This translates into the general inclusion property $V_{i'} \subset V_i$ for any $i' > i$, which is fundamental to the theory. A subtler point is that the closure of $\bigcup_{i\in\mathbb{Z}} V_i$ is equal to $L_2(\mathbb{R})$, which follows from the fact that any finite-energy function can be approximated arbitrarily well by a piecewise-constant spline as the sampling step $2^i$ tends to zero ($i \to -\infty$). The necessary and sufficient condition for this asymptotic convergence is the partition of unity (6.4), which ensures that the representation is complete.

1. Duplos are the larger-scale versions of the Lego building blocks and are more suitable for smaller children to play with. The main point of the analogy with wavelets is that Legos and Duplos are compatible; they can be combined to build more complex shapes. The enabling property is that a Duplo is equivalent to two smaller Legos placed next to each other, as expressed by (6.5).


Figure 6.1: Multiresolution signal analysis using piecewise-constant splines with a dyadic scale progression. Left: multiresolution pyramid ($s_0(x)$, $s_1(x)$, $s_2(x)$, $s_3(x)$). Right: error signals $r_i(x) = s_{i-1}(x) - s_i(x)$ between two successive levels of the pyramid, together with the wavelet $\psi(x)$.

Having set the notation and specified the underlying hierarchy of function spaces, we now proceed with the multiresolution approximation procedure starting from the fine-scale signal $s_0(x)$, as illustrated in Figure 6.1. Given the sequence $c_0[\cdot]$ of fine-scale coefficients, our task is to construct the best spline approximation at scale 1, which is specified by its B-spline coefficients $c_1[\cdot]$ in (6.1) with $i = 1$. It is easy to see that the minimum-error solution (orthogonal projection of $s_0$ onto $V_1$) is obtained by taking the mean of two consecutive samples. The procedure is then repeated at the next-coarser scale and so forth until one reaches the bottom of the pyramid, as shown on the left-hand side of Figure 6.1. The description of this coarsening algorithm is

$$c_i[k] = \tfrac{1}{2} c_{i-1}[2k] + \tfrac{1}{2} c_{i-1}[2k+1] = \big(c_{i-1} * \tilde{h}\big)[2k]. \qquad (6.6)$$

It is run recursively for $i = 1, \dots, i_{\max}$, where $i_{\max}$ denotes the bottom level of the pyramid. The outcome is a multiresolution analysis of our input signal $s_0$.
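As a quick illustration, the coarsening rule (6.6) can be sketched in a few lines of Python (the function names are ours, not the book's):

```python
# Sketch of the coarsening step (6.6): each coarse-scale B-spline
# coefficient is the mean of two consecutive finer-scale coefficients.

def coarsen(c):
    """One application of (6.6): c_i[k] = (c_{i-1}[2k] + c_{i-1}[2k+1]) / 2."""
    return [(c[2 * k] + c[2 * k + 1]) / 2 for k in range(len(c) // 2)]

def pyramid(c0, i_max):
    """Multiresolution analysis: the list of levels [c_0, c_1, ..., c_imax]."""
    levels = [c0]
    for _ in range(i_max):
        levels.append(coarsen(levels[-1]))
    return levels

c0 = [4.0, 2.0, 1.0, 3.0, 0.0, 2.0, 5.0, 7.0]
print(pyramid(c0, 3))
# the coarsest level holds the global mean of the signal
```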

In order to uncover the wavelets, it is enlightening to look at the residual signals $r_i(r) = s_{i-1}(r) - s_i(r) \in V_{i-1}$ on the right of Figure 6.1. While these are splines that live at the same resolution as $s_{i-1}$, they actually have half the apparent degrees of freedom. These error signals exhibit a striking sign-alternation pattern due to the fact that two consecutive samples $(c_{i-1}[2k], c_{i-1}[2k+1])$ are at an equal distance from their mean value $c_i[k]$. This suggests rewriting the residuals more concisely in terms of oscillating basis functions (wavelets) at scale $i$, like

$$r_i(r) = s_{i-1}(r) - s_i(r) = \sum_{k \in \mathbb{Z}} d_i[k]\, \psi_{i,k}(r), \qquad (6.7)$$

where the (non-normalized) Haar wavelets are given by

$$\psi_{i,k}(r) = \psi_{\mathrm{Haar}}\!\left(\frac{r - 2^i k}{2^i}\right)$$

with the Haar wavelet being defined by (1.19). The wavelet coefficients $d_i[\cdot]$ are given by the consecutive half-differences

$$d_i[k] = \tfrac{1}{2} c_{i-1}[2k] - \tfrac{1}{2} c_{i-1}[2k+1] = \big(c_{i-1} * \tilde{g}\big)[2k]. \qquad (6.8)$$

More generally, since the wavelet template at scale $i = 1$ satisfies $\psi_{1,0} \in V_0$, we can write

$$\psi(r/2) = \sum_{k \in \mathbb{Z}} g[k]\, \phi(r - k), \qquad (6.9)$$

which is the wavelet counterpart of the two-scale relation (6.3). In the present example, we have $g[k] = (-1)^k h[k]$, which is a general relation that is characteristic of an orthogonal design. Likewise, in order to gain in generality, we have chosen to express the decomposition algorithms (6.6) and (6.8) in terms of discrete convolution (filtering) and downsampling operations, where the corresponding Haar analysis filters are $\tilde{h}[k] = \tfrac{1}{2} h[-k]$ and $\tilde{g}[k] = \tfrac{1}{2} (-1)^k h[-k]$. The Hilbert-space interpretation of this approximation process is that $r_i \in W_i$, where $W_i$ is the orthogonal complement of $V_i$ in $V_{i-1}$; that is, $V_{i-1} = V_i + W_i$ with $V_i \perp W_i$ (as a consequence of the orthogonal-projection theorem).

Finally, we can close the loop by observing that

$$s_0(r) = s_{i_{\max}}(r) + \sum_{i=1}^{i_{\max}} \underbrace{\big(s_{i-1}(r) - s_i(r)\big)}_{r_i(r)} = \sum_{k \in \mathbb{Z}} c_{i_{\max}}[k]\, \phi_{i_{\max},k}(r) + \sum_{i=1}^{i_{\max}} \sum_{k \in \mathbb{Z}} d_i[k]\, \psi_{i,k}(r), \qquad (6.10)$$

which provides an equivalent, one-to-one representation of the signal in an orthogonal wavelet basis, as illustrated in Figure 6.2. More generally, we can push the argument to the limit and apply the decomposition to any finite-energy function:

$$\forall s \in L_2(\mathbb{R}), \quad s = \sum_{i \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} d_i[k]\, \psi_{i,k}, \qquad (6.11)$$

where $d_i[k] = \langle s, \psi_{i,k} \rangle_{L_2}$ and $\{\psi_{i,k}\}_{(i,k) \in \mathbb{Z}^2}$ is a suitable (bi)orthogonal wavelet basis with the property that $\langle \psi_{i,k}, \psi_{i',k'} \rangle_{L_2} = \delta_{k-k', i-i'}$.

Remarkably, the whole process described above (except the central expressions in (6.6) and (6.8), and the equations explicitly involving $\beta^0_+$) is completely generic and applicable to any other wavelet basis of $L_2(\mathbb{R})$ that is specified in terms of a wavelet filter $g$ and a scaling function $\phi$ (or, equivalently, an admissible refinement filter $h$). The bottom line is that the wavelet decomposition and reconstruction algorithm is fully described by the four digital filters $(h, g, \tilde{h}, \tilde{g})$ that form a perfect-reconstruction filterbank. The Haar transform is associated with the shortest-possible filters. Its less favorable aspects are that the basis functions are discontinuous and that the scale-truncated error decays only like the first power of the sampling step $a = 2^i$ (first order of approximation).
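The decomposition/reconstruction scheme can be illustrated for the Haar case with a minimal one-level analysis/synthesis pair: the averages implement (6.6), the half-differences implement (6.8), and the synthesis step inverts them exactly (a sketch with our own function names, not the book's notation):

```python
# One-level Haar analysis/synthesis pair, sketching the perfect-
# reconstruction filterbank: analysis computes averages (6.6) and
# half-differences (6.8); synthesis inverts them exactly.

def analyze(c):
    avg = [(c[2 * k] + c[2 * k + 1]) / 2 for k in range(len(c) // 2)]  # (6.6)
    det = [(c[2 * k] - c[2 * k + 1]) / 2 for k in range(len(c) // 2)]  # (6.8)
    return avg, det

def synthesize(avg, det):
    c = []
    for a, d in zip(avg, det):
        c.extend([a + d, a - d])  # the two samples are equidistant from their mean
    return c

c0 = [4.0, 2.0, 1.0, 3.0, 0.0, 2.0, 5.0, 7.0]
avg, det = analyze(c0)
assert synthesize(avg, det) == c0  # perfect reconstruction
```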

The fundamental point of our formulation is that the Haar wavelet is matched to the pure derivative operator $\mathrm{D} = \frac{\mathrm{d}}{\mathrm{d}r}$, which goes hand in hand with Lévy processes (see Chapter 1). In that respect, the critical observations relating to spline and wavelet theory are as follows:
– all piecewise-constant functions can be interpreted as D-splines;
– the Haar wavelet acts as a smoothed version of the derivative, in the sense that $\psi_{\mathrm{Haar}} = \mathrm{D}\phi$, where $\phi$ is an appropriate kernel (triangle function);
– the B-spline of degree 0 can be expressed as $\beta^0_+ = \beta_{\mathrm{D}} = \mathrm{D}_{\mathrm{d}} \mathrm{D}^{-1} \delta$, where the finite-difference operator $\mathrm{D}_{\mathrm{d}}$ is the discrete counterpart of $\mathrm{D}$.
We shall now show how these ideas are extendable to a much broader class of differential operators L.


Figure 6.2: Decomposition of a signal into orthogonal scale components $s_0 = s_3 + r_3 + r_2 + r_1$, with $s_3 = \sum_k c_3[k]\,\varphi_{3,k}$ and $r_i = \sum_k d_i[k]\,\psi_{i,k}$. The error signals $r_i = s_{i-1} - s_i$ between two successive signal approximations are expanded using a series of properly scaled wavelets.

6.2 Basic concepts and definitions

6.2.1 Spline-admissible operators

Let $\mathrm{L} : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ be a generic Fourier-multiplier operator with frequency response $\widehat{L}(\omega)$. We further assume that L has a continuous extension $\mathrm{L} : \mathcal{X}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ to some larger space of functions $\mathcal{X}(\mathbb{R}^d)$ with $\mathcal{S}(\mathbb{R}^d) \subset \mathcal{X}(\mathbb{R}^d)$.

The null space of L is denoted by $\mathcal{N}_{\mathrm{L}}$ and defined as

$$\mathcal{N}_{\mathrm{L}} = \{p_0(r) : \mathrm{L}p_0(r) = 0\}.$$

The immediate consequence of L being LSI is that $\mathcal{N}_{\mathrm{L}}$ is shift-invariant as well, in the sense that $p_0(r) \in \mathcal{N}_{\mathrm{L}} \Leftrightarrow p_0(r - r_0) \in \mathcal{N}_{\mathrm{L}}$ for any $r_0 \in \mathbb{R}^d$. We shall use this property to argue that $\mathcal{N}_{\mathrm{L}}$ generally consists of generalized functions whose Fourier transforms are point distributions. In the space domain, they correspond to modulated polynomials, which are linear combinations of exponential monomials of the form $\mathrm{e}^{\mathrm{j}\langle \omega_0, r\rangle} r^{n}$ with $\omega_0 \in \mathbb{R}^d$ and multi-index $n = (n_1, \dots, n_d) \in \mathbb{N}^d$. It actually turns out that the existence of a single such element in $\mathcal{N}_{\mathrm{L}}$ has direct implications on the structure and dimensionality of the underlying function space.

Proposition 16 (Characterization of the null space). If L is LSI and $p_{n}(r) = \mathrm{e}^{\langle z_0, r\rangle} r^{n} \in \mathcal{N}_{\mathrm{L}}$ with $z_0 \in \mathbb{C}^d$, then $\mathcal{N}_{\mathrm{L}}$ does necessarily include all exponential monomials of the form $p_{m}(r) = \mathrm{e}^{\langle z_0, r\rangle} r^{m}$ with $m \le n$. In addition, if $\mathcal{N}_{\mathrm{L}}$ is finite-dimensional, it can only consist of atoms of that particular form.

Proof. The LSI property implies that $p_{n}(r - r_0) \in \mathcal{N}_{\mathrm{L}}$ for any $r_0 \in \mathbb{R}^d$. To make our point about the inclusion of the lower-order exponential polynomials in $\mathcal{N}_{\mathrm{L}}$, we start by expanding the scalar term $(r_i - r_{0,i})^{n_i}$ as

$$(r_i - r_{0,i})^{n_i} = \sum_{m=0}^{n_i} \binom{n_i}{m} r_i^{m} (-1)^{n_i - m} r_{0,i}^{n_i - m} = \sum_{m + k = n_i} \frac{n_i!}{m!\, k!} (-1)^{k}\, r_i^{m} r_{0,i}^{k}.$$

By proceeding in a similar manner with the other monomials and combining the results, we find that

$$(r - r_0)^{n} = \sum_{m + k = n} \frac{n!}{m!\, k!} (-1)^{|k|}\, r^{m} r_0^{k} = \sum_{m \le n} b_{m}(r_0)\, r^{m}$$

with polynomial coefficients $b_{m}(r_0)$ that depend upon the multi-index $m$ and the shift $r_0$. Finally, we note that the exponential factor $\mathrm{e}^{\langle z_0, r\rangle}$ can be shifted by $r_0$ by simple multiplication with a constant (see (6.12) below). These facts taken together establish the structure of the underlying vector space. As for the last statement, we rely on the theory of Lie groups, which tells us that the only finite-dimensional collections of functions that are translation-invariant are made of exponential polynomials. The pure exponentials $\mathrm{e}^{\langle z_0, r\rangle}$ (with $n = 0$) are special in that respect: they are the eigenfunctions of the shift operator, in the sense that

$$\mathrm{e}^{\langle z_0, r - r_0\rangle} = \lambda(r_0)\, \mathrm{e}^{\langle z_0, r\rangle} \qquad (6.12)$$

with the (complex) eigenvalue $\lambda(r_0) = \mathrm{e}^{-\langle z_0, r_0\rangle}$, and hence the only elements that specify shift-invariant subspaces of dimension 1.

Since our formulation relies on the theory of generalized functions, we shall focus on the restriction of $\mathcal{N}_{\mathrm{L}}$ to $\mathcal{S}'(\mathbb{R}^d)$. This rules out the exponential factors $z_0 = \alpha_0 + \mathrm{j}\omega_0$ in Proposition 16 with $\alpha_0 \in \mathbb{R}^d \setminus \{0\}$, for which the Fourier-multiplier operator is not necessarily well-defined. We are then left with null-space atoms of the form $\mathrm{e}^{\mathrm{j}\langle \omega_0, r\rangle} r^{n}$ with $\omega_0 \in \mathbb{R}^d$, which are functions of slow growth.

The next important ingredient is the Green's function $\rho_{\mathrm{L}}$ of the operator L. Its defining property is $\mathrm{L}\rho_{\mathrm{L}} = \delta$, where $\delta$ is the $d$-dimensional Dirac distribution. Since there are many equivalent Green's functions of the form $\rho_{\mathrm{L}} + p_0$, where $p_0 \in \mathcal{N}_{\mathrm{L}}$ is an arbitrary component of the null space, we resolve the ambiguity by defining the (primary) Green's function of L as

$$\rho_{\mathrm{L}}(r) = \mathcal{F}^{-1}\!\left\{\frac{1}{\widehat{L}(\omega)}\right\}(r), \qquad (6.13)$$

with the requirement that $\rho_{\mathrm{L}} \in \mathcal{S}'(\mathbb{R}^d)$ is an ordinary function of slow growth. In other words, we want $\rho_{\mathrm{L}}(r)$ to be defined pointwise for any $r \in \mathbb{R}^d$ and to grow no faster than a polynomial. The existence of the generalized inverse Fourier transform (6.13) imposes some minimal continuity and decay conditions on $1/\widehat{L}(\omega)$ and also puts some restrictions on the number and nature of its singularities (e.g., the zeros of $\widehat{L}(\omega)$).

Definition 19 (Spline admissibility). The Fourier-multiplier operator $\mathrm{L} : \mathcal{X}(\mathbb{R}^d) \to \mathcal{S}'(\mathbb{R}^d)$ with frequency response $\widehat{L}(\omega)$ is called spline-admissible if (6.13) is well-defined and $\rho_{\mathrm{L}}(r)$ is an ordinary function of slow growth.

An important characteristic of spline-admissible operators is the rate of growth of theirfrequency response at infinity.

Definition 20 (Order of a Fourier multiplier). The Fourier multiplier $\widehat{L}(\omega)$ is of (asymptotic) order $\gamma \in \mathbb{R}^+$ if there exist a radius $R \in \mathbb{R}^+$ and a constant $C$ such that

$$C |\omega|^{\gamma} \le |\widehat{L}(\omega)| \qquad (6.14)$$

for all $|\omega| \ge R$, where $\gamma$ is critical in the sense that the condition fails for any larger value.


The order is in direct relation with the degree of smoothness of the Green's function $\rho_{\mathrm{L}}$. In the case of a scale-invariant operator, it also coincides with the scaling order (or degree of homogeneity) of $\widehat{L}(\omega)$. For instance, the fractional-derivative operator $\mathrm{D}^{\gamma}$, which is defined via the Fourier multiplier $(\mathrm{j}\omega)^{\gamma}$, is of order $\gamma$. Its Green's function is given by (see Table A.1 in Appendix A)

$$\rho_{\mathrm{D}^{\gamma}}(r) = \mathcal{F}^{-1}\!\left\{\frac{1}{(\mathrm{j}\omega)^{\gamma}}\right\}(r) = \frac{r_+^{\gamma-1}}{\Gamma(\gamma)}, \qquad (6.15)$$

where $\Gamma$ is Euler's gamma function (see Appendix C.2) and $r_+^{\gamma-1} = \max(0, r)^{\gamma-1}$. Clearly, the latter is a function of slow growth. It has a single singularity at the origin, whose Hölder exponent is $(\gamma - 1)$, and is infinitely differentiable everywhere else. It follows that $\rho_{\mathrm{D}^{\gamma}}$ is uniformly Hölder-continuous of degree $(\gamma - 1)$, which is one less than the order of the operator. On the other hand, the null space of $\mathrm{D}^{\gamma}$ consists of the polynomials of degree $N = \lceil \gamma - 1 \rceil$ since $\frac{\mathrm{d}^n (\mathrm{j}\omega)^{\gamma}}{\mathrm{d}\omega^n} \propto (\mathrm{j}\omega)^{\gamma - n}$ vanishes at the origin up to order $N$ with $(\gamma - 1) \le N < \gamma$ (see the argumentation in Section 6.4.1).
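A minimal numerical sketch of (6.15): for $\gamma = 1$ and $\gamma = 2$ the formula reduces to the unit step and the ramp, the Green's functions of D and D², and differentiating the ramp (numerically) recovers the step. The function name `rho` is ours:

```python
# Sketch of (6.15): rho_{D^gamma}(r) = r_+^(gamma-1) / Gamma(gamma),
# assumed here for gamma > 0.

from math import gamma

def rho(r, g):
    """Green's function of the fractional derivative D^g."""
    return 0.0 if r <= 0 else r ** (g - 1) / gamma(g)

assert rho(2.5, 1) == 1.0          # unit step for gamma = 1
assert rho(2.5, 2) == 2.5          # ramp for gamma = 2
# applying D once (numerically) to rho(., 2) recovers rho(., 1):
h = 1e-6
d = (rho(2.5 + h, 2) - rho(2.5, 2)) / h
assert abs(d - rho(2.5, 1)) < 1e-6
```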

A fundamental result is that all partial differential operators with constant coefficients are spline-admissible. This follows from the Malgrange–Ehrenpreis theorem, which guarantees the existence of their Green's function [Mal, Wag09]. The generic form of such operators is

$$\mathrm{L}_N = \sum_{|n| \le N} a_{n}\, \partial^{n}$$

with $a_{n} \in \mathbb{R}$, where $\partial^{n}$ is the multi-index notation for $\frac{\partial^{n_1 + \cdots + n_d}}{\partial r_1^{n_1} \cdots \partial r_d^{n_d}}$. The corresponding Fourier multiplier is $\widehat{L}_N(\omega) = \sum_{|n| \le N} a_{n}\, \mathrm{j}^{|n|} \omega^{n}$, which is a polynomial of degree $N$. The operator is elliptic if $\widehat{L}_N(\omega)$ vanishes at the origin and nowhere else. More generally, it is called quasi-elliptic of order $\gamma$ if $\widehat{L}_N(\omega)$ fulfills the growth condition in Definition 20. For $d = 1$, it is fairly easy to determine $\rho_{\mathrm{L}}$ using standard Fourier-inversion techniques (see Chapter 5). Moreover, the condition for quasi-ellipticity of order $N$ is automatically satisfied. When moving to higher dimensions, the study of partial differential operators and the determination of their Green's function becomes more challenging because of the absence of a general multidimensional factorization mechanism. Yet, it is possible to treat special cases in full generality, such as the scale-invariant operators (with homogeneous, but not necessarily rotation-invariant, Fourier multipliers) or the class of rotation-invariant operators that are polynomials of the Laplacian $(-\Delta)$.

6.2.2 Splines and operators

The foundation of our formulation is the direct correspondence between a spline-admissible operator L and a particular brand of splines.

Definition 21 (Cardinal L-spline). A function $s(r)$ (possibly of slow growth) is called a cardinal L-spline if and only if

$$\mathrm{L}s(r) = \sum_{k \in \mathbb{Z}^d} a[k]\, \delta(r - k).$$

The location of the Dirac impulses specifies the spline discontinuities (or knots). The term “cardinal” refers to the particular setting where these are located on the Cartesian grid $\mathbb{Z}^d$.

The remarkable aspect in this definition is that the operator L has the role of a mathematical A-to-D converter, since it maps a continuously defined signal $s$ into a discrete sequence $a = (a[k])$. Also note that the weighted sum of Dirac impulses in the r.h.s. of the above equation can be interpreted as the continuous-domain representation of the discrete signal $a$; it is a hybrid-type representation that is commonly used in the theory of linear systems to model ideal sampling (multiplication with a train of Dirac impulses).

The underlying concept of spline is fairly general and it naturally extends to nonuniform grids.

Definition 22 (Nonuniform spline). Let $\{r_k\}_{k \in S}$ be a set of points (not necessarily finite) that specifies a (nonuniform) grid in $\mathbb{R}^d$. Then, a function $s(r)$ (possibly of slow growth) is a nonuniform L-spline with knots $\{r_k\}_{k \in S}$ if and only if

$$\mathrm{L}s(r) = \sum_{k \in S} a_k\, \delta(r - r_k).$$

The direct implication of this definition is that a (nonuniform) L-spline with knots $\{r_k\}$ can generally be expressed as

$$s(r) = p_0(r) + \sum_{k \in S} a_k\, \rho_{\mathrm{L}}(r - r_k), \qquad (6.16)$$

where $\rho_{\mathrm{L}} = \mathrm{L}^{-1}\delta$ is the Green's function of L and $p_0 \in \mathcal{N}_{\mathrm{L}}$ is an appropriate null-space component that is typically selected to fulfill some boundary conditions.
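For the simplest operator L = D, whose causal Green's function is the unit step, the representation (6.16) can be sketched directly: the spline is piecewise constant and jumps by $a_k$ at each knot $r_k$ (illustrative names, not from the book):

```python
# Sketch of (6.16) for L = D: rho_D(r) = 1_{r >= 0}, so a nonuniform
# D-spline is a weighted sum of shifted steps, i.e. a piecewise-constant
# function with jumps a_k at the knots r_k.

def rho_D(r):
    return 1.0 if r >= 0 else 0.0

def spline(r, knots, weights, p0=0.0):
    """s(r) = p0 + sum_k a_k * rho_D(r - r_k), with constant p0 in N_D."""
    return p0 + sum(a * rho_D(r - rk) for rk, a in zip(knots, weights))

knots, weights = [0.0, 1.5, 2.0], [1.0, -3.0, 2.0]
print([spline(r, knots, weights) for r in (-1.0, 0.5, 1.8, 3.0)])
# -> [0.0, 1.0, -2.0, 0.0]: constant between knots, jumping by a_k at r_k
```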

In the case where the grid is uniform, it is usually more convenient to express splines in terms of localized B-spline functions, which are shifted replicates of a simple template $\beta_{\mathrm{L}}$ or some other equivalent generator. An important requirement is that the set of B-spline functions constitutes a Riesz basis.

6.2.3 Riesz bases

To quote Ingrid Daubechies [Dau92], a Riesz basis is the next best thing after an orthogonal basis. The reason for not enforcing orthogonality is to leave more room for other desirable features such as simplicity of the construction, maximum localization of the basis functions (e.g., compact support), and, last but not least, fast computational solutions.

Definition 23 (Riesz basis). A sequence of functions $\{\phi_k(r)\}_{k \in \mathbb{Z}}$ in $L_2(\mathbb{R}^d)$ forms a Riesz basis if and only if there exist two constants $A$ and $B$ such that

$$A \|c\|_{\ell_2} \le \Big\| \sum_{k \in \mathbb{Z}} c_k \phi_k(r) \Big\|_{L_2(\mathbb{R}^d)} \le B \|c\|_{\ell_2}$$

for any sequence $c = (c_k) \in \ell_2$. More generally, the basis is $L_p$-stable if there exist two constants $A_p$ and $B_p$ such that

$$A_p \|c\|_{\ell_p} \le \Big\| \sum_{k \in \mathbb{Z}} c_k \phi_k(r) \Big\|_{L_p(\mathbb{R}^d)} \le B_p \|c\|_{\ell_p}.$$

This definition imposes an equivalence between the $L_2$ (resp. $L_p$) norm of the continuously defined function $s(r) = \sum_{k \in \mathbb{Z}} c_k \phi_k(r)$ and the $\ell_2$ (resp. $\ell_p$) norm of its expansion coefficients $(c_k)$. This ensures that the representation is stable, in the sense that a small perturbation of the expansion coefficients results in a perturbation of comparable magnitude on $s(r)$, and vice versa. Also note that the lower inequality implies that the functions $\{\phi_k\}$ are linearly independent (by setting $s(r) = 0$), which is the defining property of a basis in finite dimensions, but which, on its own, does not ensure stability in infinite dimensions. When $A = B = 1$, we have a perfect norm equivalence, which translates into the basis being orthonormal (Parseval's relation). Finally, we like to point out that the existence of the bounds $A$ and $B$ ensures that the (infinite) Gram matrix is positive-definite, so that it can be readily diagonalized to yield an equivalent orthogonal basis.

In the (multi-)integer shift-invariant case where the basis functions are given by $\phi_k(r) = \phi(r - k)$, $k \in \mathbb{Z}^d$, there is a simpler equivalent reformulation of the Riesz-basis requirement of Definition 23.

Theorem 21. Let $\phi(r) \in L_2(\mathbb{R}^d)$ be a B-spline-like generator whose Fourier transform is denoted by $\widehat{\phi}(\omega)$. Then, $\{\phi(r - k)\}_{k \in \mathbb{Z}^d}$ forms a Riesz basis with Riesz bounds $A$ and $B$ if and only if

$$0 < A^2 \le \sum_{n \in \mathbb{Z}^d} |\widehat{\phi}(\omega + 2\pi n)|^2 \le B^2 < \infty \qquad (6.17)$$

for almost every $\omega$. Moreover, the basis is $L_p$-stable for all $1 \le p \le +\infty$ if, in addition,

$$\sup_{r \in [0,1]^d} \sum_{k \in \mathbb{Z}^d} |\phi(r - k)| < +\infty. \qquad (6.18)$$

Under such condition(s), the induced function space

$$V_\phi = \Big\{ s(r) = \sum_{k \in \mathbb{Z}^d} c[k]\, \phi(r - k) : c \in \ell_p(\mathbb{Z}^d) \Big\}$$

is a closed subspace of $L_p(\mathbb{R}^d)$, including the standard case $p = 2$.

Observe that the central quantity in (6.17) corresponds to the discrete-domain Fourier transform of the Gram sequence $a_\phi[k] = \langle \phi(\cdot - k), \phi \rangle_{L_2}$. Indeed, we have that

$$A_\phi(\mathrm{e}^{\mathrm{j}\omega}) = \sum_{k \in \mathbb{Z}^d} a_\phi[k]\, \mathrm{e}^{-\mathrm{j}\langle \omega, k\rangle} = \sum_{n \in \mathbb{Z}^d} |\widehat{\phi}(\omega + 2\pi n)|^2, \qquad (6.19)$$

where the r.h.s. follows from Poisson's summation formula applied to the sampling at the integers of the autocorrelation function $(\phi^{\vee} * \phi)(r)$. Formula (6.19) is especially advantageous in the case of compactly supported B-splines, for which the autocorrelation is often known explicitly (as a B-spline of twice the order), since it reduces the calculation to a finite sum over the support of the Gram sequence (discrete-domain Fourier transform).
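As a concrete sketch of how (6.19) yields Riesz bounds, consider the linear B-spline (triangle function): its Gram sequence is $a_\phi[0] = 2/3$, $a_\phi[\pm 1] = 1/6$ (the autocorrelation is the cubic B-spline, sampled at the integers), so $A_\phi(\mathrm{e}^{\mathrm{j}\omega}) = 2/3 + \tfrac{1}{3}\cos\omega$ and the bounds follow from its extrema:

```python
# Numerical sketch of (6.19) for the linear B-spline: the Riesz bounds
# are the extrema of A_phi(e^{jw}) = 2/3 + (1/3) cos(w) over one period,
# giving A^2 = 1/3 (at w = pi) and B^2 = 1 (at w = 0).

from math import cos, pi, sqrt

def A_phi(w):
    gram = {0: 2 / 3, 1: 1 / 6, -1: 1 / 6}   # Gram sequence of the triangle
    return sum(a * cos(k * w) for k, a in gram.items())  # real, even sequence

ws = [2 * pi * i / 1000 for i in range(1000)]
vals = [A_phi(w) for w in ws]
A, B = sqrt(min(vals)), sqrt(max(vals))
print(round(A, 4), round(B, 4))   # -> 0.5774 1.0
```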

Theorem 21 is a fundamental result in sampling and approximation theory [Uns00]. It is instructive here to briefly run through the $L_2$ part of the proof, which also serves as a refresher on some of the standard properties of the Fourier transform. In particular, we like to emphasize the interaction between the continuous and discrete aspects of the problem.

Proof. We start by computing the Fourier transform of $s(r) = \sum_{k \in \mathbb{Z}^d} c[k]\, \phi(r - k)$, which gives

$$\mathcal{F}\{s\}(\omega) = \sum_{k \in \mathbb{Z}^d} c[k]\, \mathrm{e}^{-\mathrm{j}\langle \omega, k\rangle}\, \widehat{\phi}(\omega) = C(\mathrm{e}^{\mathrm{j}\omega})\, \widehat{\phi}(\omega) \quad \text{(by linearity and the shift property)},$$

where $C(\mathrm{e}^{\mathrm{j}\omega})$ is recognized as the discrete-domain Fourier transform of $c[\cdot]$. Next, we invoke Parseval's identity and manipulate the Fourier-domain integral as follows:

$$\|s\|^2_{L_2} = \int_{\mathbb{R}^d} |C(\mathrm{e}^{\mathrm{j}\omega})|^2\, |\widehat{\phi}(\omega)|^2\, \frac{\mathrm{d}\omega}{(2\pi)^d} = \sum_{n \in \mathbb{Z}^d} \int_{[0,2\pi]^d} |C(\mathrm{e}^{\mathrm{j}(\omega + 2\pi n)})|^2\, |\widehat{\phi}(\omega + 2\pi n)|^2\, \frac{\mathrm{d}\omega}{(2\pi)^d}$$
$$= \int_{[0,2\pi]^d} |C(\mathrm{e}^{\mathrm{j}\omega})|^2 \sum_{n \in \mathbb{Z}^d} |\widehat{\phi}(\omega + 2\pi n)|^2\, \frac{\mathrm{d}\omega}{(2\pi)^d} = \int_{[0,2\pi]^d} |C(\mathrm{e}^{\mathrm{j}\omega})|^2\, A_\phi(\mathrm{e}^{\mathrm{j}\omega})\, \frac{\mathrm{d}\omega}{(2\pi)^d}.$$

There, we have used the fact that $C(\mathrm{e}^{\mathrm{j}\omega})$ is $2\pi$-periodic, together with the nonnegativity of the integrand, to interchange the summation and the integral (Fubini). This naturally leads to the inequality

$$\inf_{\omega \in [0,2\pi]^d} A_\phi(\mathrm{e}^{\mathrm{j}\omega}) \cdot \|c\|^2_{\ell_2} \le \|s\|^2_{L_2} \le \sup_{\omega \in [0,2\pi]^d} A_\phi(\mathrm{e}^{\mathrm{j}\omega}) \cdot \|c\|^2_{\ell_2},$$

where we are now making use of Parseval's identity for sequences, so that

$$\|c\|^2_{\ell_2} = \int_{[0,2\pi]^d} |C(\mathrm{e}^{\mathrm{j}\omega})|^2\, \frac{\mathrm{d}\omega}{(2\pi)^d}.$$

The final step is to show that these bounds are sharp. This can be accomplished through the choice of some particular (bandlimited) sequence $c[\cdot]$.

Note that the almost-everywhere part of (6.17) can be dropped when $\phi \in L_1(\mathbb{R}^d)$ because the Fourier transform of such a function is continuous (Riemann–Lebesgue lemma).

While the result of Theorem 21 is restricted to the classical $L_p$ spaces, there is no fundamental difficulty in extending it to wider classes of weighted (with negative powers) $L_p$ spaces by imposing some stricter condition than (6.18) on the decay of $\phi$. For instance, if $\phi$ has exponential decay, then the definition of the function space $V_\phi$ can be extended to all sequences $c$ that grow no faster than a polynomial. This happens to be the appropriate framework for sampling generalized stochastic processes, which do not live in the $L_p$ spaces since they are not decaying at infinity.

6.2.4 Admissible wavelets

The other important tool for analyzing stochastic processes is the wavelet transform, whose basis functions must be “tuned” to the object under investigation.

Definition 24. A wavelet function $\psi$ is called L-admissible if it can be expressed as $\psi = \mathrm{L}^{\mathrm{H}}\phi$ with $\phi \in L_1(\mathbb{R}^d)$.

Observe that we are now considering the Hermitian-transpose operator $\mathrm{L}^{\mathrm{H}} = \overline{\mathrm{L}^{*}}$, which is distinct from the adjoint operator $\mathrm{L}^{*}$ when the impulse response has some imaginary component. The reason for this is that the wavelet-analysis step involves a Hermitian inner product $\langle \cdot, \cdot \rangle_{L_2}$ whose definition differs by a complex conjugation from that of the distributional scalar product $\langle \cdot, \cdot \rangle$ used in our formulation of stochastic processes when the second argument is complex-valued; specifically, $\langle f, g \rangle_{L_2} = \langle f, \overline{g} \rangle = \int_{\mathbb{R}^d} f(r)\, \overline{g(r)}\, \mathrm{d}r$.

Figure 6.3: Comparison of basis functions related to the first-order differential operator $\mathrm{P}_\alpha = \mathrm{D} - \alpha\mathrm{I}$ for $\alpha = 0, -1, -2, -4$. (a) Green's functions $\rho_\alpha(r)$. (b) Exponential B-splines $\beta_\alpha(r)$. (c) Augmented spline interpolators $\varphi_{\mathrm{int}}(r)$. (d) Orthonormalized versions of the exponential-spline wavelets $\psi_\alpha(r) = \mathrm{P}^{*}_\alpha \varphi_{\mathrm{int}}(r)$.

The best-matched wavelet is the one for which the wavelet kernel $\phi$ is the most localized; ideally, it has the shortest-possible support, assuming that it is at all possible to construct a compactly supported wavelet basis. The very least is that $\phi$ should be concentrated around the origin and exhibit a sufficient rate of decay; for instance, $|\phi(r)| \le \frac{C}{1 + \|r\|^{\alpha}}$ for some $\alpha > d$.

A direct implication of Definition 24 is that the wavelet $\psi$ will annihilate all the components (e.g., polynomials) that are in the null space of L because $\langle \psi(\cdot - r_0), p_0 \rangle = \langle \phi(\cdot - r_0), \mathrm{L}p_0 \rangle = 0$ for all $p_0 \in \mathcal{N}_{\mathrm{L}}$ and $r_0 \in \mathbb{R}^d$. In conventional wavelet theory, this behavior is achieved by designing “$N$th-derivative-like” wavelets with vanishing moments up to the order $N$.
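This annihilation property is easy to check numerically in the simplest case L = D, where the null space consists of the constants and the Haar wavelet is D-admissible (a sketch with midpoint quadrature; the names are ours):

```python
# Check that a D-admissible wavelet annihilates N_D (the constants).
# psi is the Haar wavelet (psi = D phi with phi the triangle function).

def psi_haar(r):
    if 0 <= r < 0.5:
        return 1.0
    if 0.5 <= r < 1:
        return -1.0
    return 0.0

def inner(f, g, a=-5.0, b=5.0, n=20000):
    """Midpoint-rule approximation of the inner product <f, g>."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n)) * h

p0 = lambda r: 3.0                      # a constant, i.e. an element of N_D
r0 = 1.25                               # arbitrary shift
val = inner(lambda r: psi_haar(r - r0), p0)
print(abs(val) < 1e-9)                  # -> True: the wavelet kills constants
```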

6.3 First-order exponential B-splines and wavelets

Rather than aiming for the highest level of generality right away, we propose to first examine the 1-D first-order scenario in some detail. First-order differential models are important theoretically because they go hand in hand with the Markov property. In that respect, they constitute the next level of generalization just beyond the Lévy processes. Mathematically, the situation is still quite comparable to that of the derivative operator, in the sense that it leads to a nice and self-contained construction of (exponential) B-splines and wavelets. The interesting aspect, though, is that the underlying basis functions are no longer conventional wavelets that are dilated versions of a single prototype: they now fall into the lesser-known category of non-stationary² wavelets.

The (causal) Green's function of our canonical first-order operator $\mathrm{P}_\alpha = (\mathrm{D} - \alpha\mathrm{Id})$ is identical to the impulse response $\rho_\alpha$ of the corresponding differential system, while the (one-dimensional) null space of the operator is given by $\mathcal{N}_\alpha = \{a_0 \mathrm{e}^{\alpha r} : a_0 \in \mathbb{R}\}$. Some examples of such Green's functions are shown in Figure 6.3. The case $\alpha = 0$ (darker/red curve) is the classical one already treated in Section 6.1.

2. In the terminology of wavelets, the term “non-stationary” refers to the property that the shape of the wavelet changes with scale, but not with respect to the location, as the more usual statistical meaning of the term would suggest.

103

An introduction to Sparse Stochastic Processes Copyright © M. Unser and P. D. Tafti

Page 105: An introduction to sparse stochastic processes · to the pioneering work of Paul Lévy, who defined a broad family of “additive” stochastic processes, now called Lévy processes.

An introduction to Sparse Stochastic Processes Copyright © M. Unser and P. D. Tafti

6. SPLINES AND WAVELETS

6.3.1 B-spline construction

The natural discrete approximation of the differential operator $\mathrm{P}_\alpha = (\mathrm{D} - \alpha\mathrm{Id})$ is the first-order weighted-difference operator

$$\Delta_\alpha s(r) = s(r) - \mathrm{e}^{\alpha} s(r - 1). \qquad (6.20)$$

Observe that $\Delta_\alpha$ annihilates the exponentials $a_0 \mathrm{e}^{\alpha r}$, so that its null space includes $\mathcal{N}_\alpha$. The corresponding B-spline is obtained by applying $\Delta_\alpha$ to $\rho_\alpha$, which yields

$$\beta_\alpha(r) = \mathcal{F}^{-1}\!\left\{\frac{1 - \mathrm{e}^{\alpha} \mathrm{e}^{-\mathrm{j}\omega}}{\mathrm{j}\omega - \alpha}\right\}(r) = \begin{cases} \mathrm{e}^{\alpha r}, & \text{for } 0 \le r < 1 \\ 0, & \text{otherwise.} \end{cases} \qquad (6.21)$$

In effect, the localization by $\Delta_\alpha$ results in a “chopped-off” version of the causal Green's function that is restricted to the interval $[0,1)$ (see Figure 6.3b). Importantly, the scheme remains applicable in the unstable scenario $\mathrm{Re}(\alpha) \ge 0$. It always results in a well-defined Fourier transform due to the convenient pole-zero cancellation in the central expression of (6.21). The marginally unstable case $\alpha = 0$ results in the rectangular function shown in red in Figure 6.3, which is the standard basis function for representing piecewise-constant signals. Likewise, $\beta_\alpha$ generates an orthogonal basis for the space of cardinal $\mathrm{P}_\alpha$-splines in accordance with Definition 21. This allows us to specify our prototypical exponential-spline space as $V_0 = \mathrm{span}\{\beta_\alpha(\cdot - k)\}_{k \in \mathbb{Z}}$ with knot spacing $2^0 = 1$.
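A short numerical sketch of the localization (6.20)–(6.21): applying $\Delta_\alpha$ to the causal Green's function chops it off outside $[0,1)$, and $\Delta_\alpha$ indeed annihilates the null-space exponential (illustrative names, one sample value of $\alpha$):

```python
# Sketch of (6.21): the exponential B-spline beta_alpha is the causal
# Green's function e^{alpha r} 1_{r>=0} localized by the weighted
# difference (6.20).

from math import exp

alpha = -2.0

def rho_alpha(r):                  # causal Green's function of D - alpha Id
    return exp(alpha * r) if r >= 0 else 0.0

def beta_alpha(r):                 # Delta_alpha rho_alpha = rho(r) - e^alpha rho(r-1)
    return rho_alpha(r) - exp(alpha) * rho_alpha(r - 1)

# compact support on [0, 1), matching the closed form in (6.21):
assert beta_alpha(-0.5) == 0.0
assert abs(beta_alpha(0.5) - exp(alpha * 0.5)) < 1e-12
assert abs(beta_alpha(1.5)) < 1e-12
# Delta_alpha annihilates the null-space exponential e^{alpha r}:
f = lambda r: exp(alpha * r)
assert abs(f(3.0) - exp(alpha) * f(2.0)) < 1e-12
```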

6.3.2 Interpolator in augmented-order spline space

The second important ingredient is the interpolator for the “augmented-order” spline space generated by the autocorrelation $(\beta_\alpha^{\vee} * \beta_\alpha)(r)$ of the B-spline. Constructing it is especially easy in the first-order case because it involves the simple normalization

$$\varphi_{\mathrm{int},\alpha}(r) = \frac{1}{(\beta_\alpha^{\vee} * \beta_\alpha)(0)}\, (\beta_\alpha^{\vee} * \beta_\alpha)(r). \qquad (6.22)$$

Specifically, $\varphi_{\mathrm{int},\alpha}$ is the unique cardinal $\mathrm{P}_\alpha^{\mathrm{H}}\mathrm{P}_\alpha$-spline function that vanishes at all the integers except at the origin, where it takes the value one (see Figure 6.3c). Its classical use is to provide a sinc-like kernel for the representation of the corresponding family of splines, and also for the reconstruction of spline-related signals, including special brands of stochastic processes, from their integer samples [UB05b]. Another remarkable and lesser-known property is that this function provides the proper smoothing kernel for defining an operator-like wavelet basis.
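For $\alpha = 0$ the construction (6.22) can be checked by hand: the autocorrelation of the box function $\beta_0$ is the triangle function, which already takes the value 1 at the origin and vanishes at all other integers (a minimal sketch, with our own names):

```python
# Sketch of (6.22) for alpha = 0: the autocorrelation of the unit box on
# [0, 1) is the triangle function, which is its own interpolator after
# the normalization by its value at the origin.

def autocorr_box(r):
    """(beta_0^v * beta_0)(r) for the unit box: the triangle max(0, 1 - |r|)."""
    return max(0.0, 1.0 - abs(r))

phi_int = lambda r: autocorr_box(r) / autocorr_box(0.0)  # normalization (6.22)

print([phi_int(float(k)) for k in range(-2, 3)])  # -> [0.0, 0.0, 1.0, 0.0, 0.0]
```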

6.3.3 Differential wavelets

In the generalized spline framework, instead of specifying a hierarchy of multiresolution subspaces of $L_2(\mathbb{R})$ (the space of finite-energy functions) via the dilation of a scaling function, one considers the fine-to-coarse sequence of L-spline spaces

$$V_i = \Big\{ s(r) \in L_2(\mathbb{R}) : \mathrm{L}s(r) = \sum_{k \in \mathbb{Z}} a_i[k]\, \delta(r - 2^i k) \Big\},$$

where the embedding $V_i \supset V_j$ for $i \le j$ is obvious from the (dyadic) hierarchy of spline knots, so that $s_j \in V_j$ implies that $s_j \in V_i$ with an appropriate subset of its coefficients $a_i[k]$ being zero.


We now detail the construction of a wavelet basis at resolution 1 such that $W_1 = \mathrm{span}\{\psi_{1,k}\}_{k \in \mathbb{Z}}$ with $W_1 \perp V_1$ and $V_1 + W_1 = V_0 = \mathrm{span}\{\beta_\alpha(\cdot - k)\}_{k \in \mathbb{Z}}$. The recipe is to take $\psi_{1,k}(r) = \psi_\alpha(r - 1 - 2k)/\|\psi_\alpha\|_{L_2}$, where $\psi_\alpha$ is the mother wavelet given by

$$\psi_\alpha(r) = \mathrm{P}_\alpha^{\mathrm{H}} \varphi_{\mathrm{int},\alpha}(r) \propto \Delta_\alpha^{\mathrm{H}} \beta_\alpha(r).$$

There, $\Delta_\alpha^{\mathrm{H}}$ is the Hermitian adjoint of the finite-difference operator $\Delta_\alpha$. Examples of such exponential-spline wavelets are shown in Figure 6.3d, including the classical Haar wavelet (up to a sign change), which is obtained for $\alpha = 0$ (red curve). The basis functions $\psi_{1,k}$ are shifted versions of $\psi_\alpha$ that are centered at the odd integers and normalized to have a unit norm. Since these wavelets are non-overlapping, they form an orthonormal basis. Moreover, the basis is orthogonal to the coarser spline space $V_1$ as a direct consequence of the interpolating property of $\varphi_{\mathrm{int},\alpha}$ (Proposition 19 in Section 6.5). Finally, based on the fact that $\psi_{1,k} \in V_0$ for all $k \in \mathbb{Z}$, one can show that these wavelets span $W_1$, which translates into

$$W_1 = \Big\{ v(r) = \sum_{k \in \mathbb{Z}} v_1[k]\, \psi_{1,k}(r) : v_1 \in \ell_2(\mathbb{Z}) \Big\}.$$

This method of construction extends to the other wavelet subspaces $W_i$ provided that the interpolating kernel $\varphi_{\mathrm{int},\alpha}$ is substituted by its proper counterpart at resolution $a = 2^{i-1}$ and the sampling grid adjusted accordingly. Ultimately, this results in a wavelet basis of $L_2(\mathbb{R})$ whose members are all $\mathrm{P}_\alpha$-splines, that is, piecewise exponential with parameter $\alpha$, but not dilates of the same prototype unless $\alpha = 0$. Otherwise, the corresponding decomposition is not fundamentally different from a conventional wavelet expansion. The basis functions are equally well localized and the scheme admits the same type of fast reversible filterbank algorithm, albeit with scale-dependent filters [KU06].

6.4 Generalized B-spline basis

The procedure of Section 6.3.1 remains applicable for the broad class of spline-admissible operators (see Definition 19) in one or multiple dimensions. The two ingredients for constructing a generalized B-spline basis are 1) the knowledge of the Green's function $\rho_{\mathrm{L}}$ of the operator L and 2) the availability of a discrete approximation (finite-difference-like) of the operator of the form

$$\mathrm{L}_{\mathrm{d}} s(r) = \sum_{k \in \mathbb{Z}^d} d_{\mathrm{L}}[k]\, s(r - k) \qquad (6.23)$$

with $d_{\mathrm{L}} \in \ell_1(\mathbb{Z}^d)$ that fulfills the null-space matching constraint³

$$\mathrm{L}_{\mathrm{d}} p_0(r) = \mathrm{L} p_0(r) = 0 \quad \text{for all } p_0 \in \mathcal{N}_{\mathrm{L}}. \qquad (6.24)$$

The generalized B-spline associated with the operator L is then given by

$$\beta_{\mathrm{L}}(r) = \mathrm{L}_{\mathrm{d}} \rho_{\mathrm{L}}(r) = \mathcal{F}^{-1}\!\left\{\frac{\sum_{k \in \mathbb{Z}^d} d_{\mathrm{L}}[k]\, \mathrm{e}^{-\mathrm{j}\langle k, \omega\rangle}}{\widehat{L}(\omega)}\right\}(r), \qquad (6.25)$$

where the numerator and denominator in the r.h.s. expression correspond to the frequency responses of $\mathrm{L}_{\mathrm{d}}$ and L, respectively. The null-space matching constraint is especially helpful for the unstable cases where $\rho_{\mathrm{L}} \notin L_1(\mathbb{R}^d)$: it ensures that the zeros of $\widehat{L}(\omega)$ (singularities) are cancelled by some corresponding zeros of $\widehat{L}_{\mathrm{d}}(\omega)$, so that the Fourier transform of $\beta_{\mathrm{L}}$ remains bounded.
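As a sketch of the recipe (6.25) in the time domain, take L = D² (in dimension $d = 1$): the Green's function is the ramp $r_+$ and a matching discrete operator is the second finite difference $d_{\mathrm{L}} = (1, -2, 1)$, whose application to the ramp yields a compactly supported triangle B-spline (names are ours):

```python
# Sketch of (6.25) in the time domain for L = D^2: beta_L = L_d rho_L,
# with rho_L(r) = r_+ (ramp) and d_L = (1, -2, 1) (second difference),
# which annihilates the null space of D^2 (the affine functions).

def rho(r):                  # Green's function of D^2
    return max(r, 0.0)

d_L = {0: 1.0, 1: -2.0, 2: 1.0}

def beta_L(r):               # beta_L = L_d rho_L, cf. (6.25)
    return sum(w * rho(r - k) for k, w in d_L.items())

print([beta_L(r) for r in (-1.0, 0.5, 1.0, 1.5, 3.0)])
# -> [0.0, 0.5, 1.0, 0.5, 0.0]: the triangle B-spline, zero outside [0, 2]
```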

3. We want the null space of $\mathrm{L}_{\mathrm{d}}$ to include $\mathcal{N}_{\mathrm{L}}$ and to remain the smallest possible. In that respect, it is worth noting that the null space of a discrete operator will always be much larger than that of its continuous-domain counterpart. For instance, the derivative operator D suppresses constant signals, while its finite-difference counterpart annihilates all 1-periodic functions, including the constants.


An introduction to Sparse Stochastic Processes Copyright © M. Unser and P. D. Tafti



6. SPLINES AND WAVELETS

Definition 25. The function $\beta_L$ specified by (6.25) is an admissible B-spline for $\mathrm{L}$ if and only if 1) $\beta_L \in L_1(\mathbb{R}^d) \cap L_2(\mathbb{R}^d)$, and 2) it generates a Riesz basis of the space of cardinal L-splines.

In light of Theorem 21, the latter property requires the existence of the two Riesz bounds $A$ and $B$ such that

$$0 < A^2 \le \sum_{n\in\mathbb{Z}^d} |\widehat{\beta}_L(\omega + 2\pi n)|^2 = \sum_{n\in\mathbb{Z}^d} \frac{\big| \sum_{k\in\mathbb{Z}^d} d_L[k]\, e^{-j\langle k,\omega\rangle} \big|^2}{|\widehat{L}(\omega + 2\pi n)|^2} \le B^2. \qquad (6.26)$$

A direct consequence of (6.25) is that

$$\mathrm{L}\beta_L(r) = \sum_{k\in\mathbb{Z}^d} d_L[k]\,\delta(r - k), \qquad (6.27)$$

so that $\beta_L$ is itself a cardinal L-spline in accordance with Definition 21. The bottom line of Definition 25 is that any cardinal L-spline admits a unique representation in the B-spline basis $\{\beta_L(\cdot - k)\}_{k\in\mathbb{Z}^d}$ as

$$s(r) = \sum_{k\in\mathbb{Z}^d} c[k]\,\beta_L(r - k), \qquad (6.28)$$

where the $c[k]$ are the B-spline coefficients of $s$.

While (6.25) provides us with a nice recipe for constructing B-splines, it does not guarantee that the Riesz-basis condition (6.26) is satisfied. This needs to be established on a case-by-case basis. The good news for the present theory of stochastic processes is that B-splines are available for virtually all the operators that have been discussed so far.

6.4.1 B-spline properties

To motivate the use of B-splines, we shall first restrict our attention to the space $V_L$ of cardinal L-splines with finite energy, which is formally defined as

$$V_L = \Big\{ s \in L_2(\mathbb{R}^d) : \mathrm{L}s(r) = \sum_{k\in\mathbb{Z}^d} a[k]\,\delta(r - k) \Big\}. \qquad (6.29)$$

The foundation of spline theory is that there are two complementary ways of representing splines using different types of basis functions: Green's functions versus B-splines. The first representation follows directly from Definition 21 (see also (6.16)) and is given by

$$s(r) = p_0(r) + \sum_{k\in\mathbb{Z}^d} a[k]\,\rho_L(r - k), \qquad (6.30)$$

where $p_0 \in \mathcal{N}_L$ is a suitable element of the null space of $\mathrm{L}$ and where $\rho_L = \mathrm{L}^{-1}\delta$ is the Green's function of the operator. The functions $\rho_L(\cdot - k)$ are nonlocal and very far from being orthogonal. In many cases, they are not even part of $V_L$, which raises fundamental issues concerning the $L_2$ convergence of the infinite sum⁴ in (6.30) and the conditions that must be imposed upon the expansion coefficients $a[\cdot]$. The second type of expansion, the B-spline representation (6.28), does not have such stability problems. This is the primary reason why it is favored by practitioners.

4. Without further assumptions on $\rho_L$ and $a$, (6.30) is only valid in the weak sense of distributions.


Stable representation of cardinal L-splines

The equivalent B-spline specification of the space $V_L$ of cardinal splines is

$$V_L = \Big\{ s(r) = \sum_{k\in\mathbb{Z}^d} c[k]\,\beta_L(r - k) : c[\cdot] \in \ell_2(\mathbb{Z}^d) \Big\},$$

where the generalized B-spline $\beta_L$ satisfies the conditions in Definition 25. The Riesz-basis property ensures that the representation is stable, in the sense that, for all $s \in V_L$, we have that

$$A\,\|c\|_{\ell_2} \le \|s\|_{L_2} \le B\,\|c\|_{\ell_2}. \qquad (6.31)$$

There, $\|c\|_{\ell_2} = \big( \sum_{k\in\mathbb{Z}^d} |c[k]|^2 \big)^{1/2}$ is the $\ell_2$-norm of the B-spline coefficients $c$. The fact that the underlying functions are cardinal L-splines is a simple consequence of the atoms being splines themselves. Moreover, we can easily make the link with (6.30) by using (6.27), which yields

$$\mathrm{L}s(r) = \sum_{k\in\mathbb{Z}^d} c[k]\,\mathrm{L}\beta_L(r - k) = \sum_{k\in\mathbb{Z}^d} \underbrace{(c * d)[k]}_{a[k]}\,\delta(r - k).$$

The less obvious aspect, which is implicit in the definition of the B-spline, is the completeness of the representation, in the sense that the B-spline basis spans the space $V_L$ defined by (6.29). We shall establish this by showing that the B-splines are capable of reproducing $\rho_L$ as well as any component $p_0 \in \mathcal{N}_L$ in the null space of $\mathrm{L}$. The implication is that any function of the form (6.30) admits a unique expansion in a B-spline basis. This is also true when the function is not in $L_2(\mathbb{R}^d)$, in which case the B-spline coefficients $c$ are no longer in $\ell_2(\mathbb{Z}^d)$, due to the discrete-continuous norm equivalence (6.31).

Reproduction of Green’s functions

The reproduction of Green's functions follows from the special form of (6.25). To reveal it, we consider the inverse $\mathrm{L_d^{-1}}$ of the discrete localization operator $\mathrm{L_d}$ specified by (6.23), whose continuous-domain impulse response is written as

$$\mathrm{L_d^{-1}}\delta(r) = \sum_{k\in\mathbb{Z}^d} p[k]\,\delta(r - k) = \mathcal{F}^{-1}\left\{ \frac{1}{\sum_{k\in\mathbb{Z}^d} d_L[k]\, e^{-j\langle k,\omega\rangle}} \right\}(r).$$

The sequence $p$, which can be determined by generalized inverse Fourier transformation, is of slow growth, with the property that $(p * d)[k] = \delta[k]$. The Green's-function reproduction formula is then obtained by applying $\mathrm{L_d^{-1}}$ to the B-spline $\beta_L$ and making use of the left-inverse property of $\mathrm{L_d^{-1}}$. Thus,

$$\mathrm{L_d^{-1}}\beta_L(r) = \mathrm{L_d^{-1}}\mathrm{L_d}\rho_L(r) = \rho_L(r)$$

results in

$$\rho_L(r) = \sum_{k\in\mathbb{Z}^d} p[k]\,\beta_L(r - k). \qquad (6.32)$$

To illustrate the concept, let us get back to our introductory example of Section 6.3.1 with $\mathrm{L} = \mathrm{P}_\alpha = (\mathrm{D} - \alpha\,\mathrm{Id})$ where $\mathrm{Re}(\alpha) < 0$. The frequency response of this first-order operator is

$$\widehat{P}_\alpha(\omega) = j\omega - \alpha,$$


while its Green's function is given by

$$\rho_\alpha(r) = \mathbb{1}_+(r)\,e^{\alpha r} = \mathcal{F}^{-1}\Big\{ \frac{1}{j\omega - \alpha} \Big\}(r).$$

On the discrete side of the picture, we have the finite-difference operator $\Delta_\alpha$ with

$$\widehat{\Delta}_\alpha(\omega) = 1 - e^{\alpha}e^{-j\omega},$$

and its inverse $\Delta_\alpha^{-1}$, whose expansion coefficients are

$$p_\alpha[k] = \mathbb{1}_+[k]\,e^{\alpha k} = \mathcal{F}_d^{-1}\Big\{ \frac{1}{1 - e^{\alpha}e^{-j\omega}} \Big\}[k],$$

where $\mathcal{F}_d^{-1}$ denotes the discrete-domain inverse Fourier transform⁵. The application of (6.32) then yields the exponential-reproduction formula

$$\mathbb{1}_+(r)\,e^{\alpha r} = \sum_{k=0}^{\infty} e^{\alpha k}\,\beta_\alpha(r - k), \qquad (6.33)$$

where $\beta_\alpha$ is the exponential B-spline defined by (6.21). Note that the range of applicability of (6.33) extends to $\mathrm{Re}(\alpha) \le 0$.
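The geometric structure of (6.33) is easy to verify numerically. The sketch below assumes the closed form $\beta_\alpha(r) = e^{\alpha r}\mathbb{1}_{[0,1)}(r)$ for the first-order exponential B-spline of (6.21):

```python
import numpy as np

alpha = -0.7  # Re(alpha) <= 0

def beta_alpha(r):
    """First-order exponential B-spline: exp(alpha * r) on [0, 1)."""
    return np.where((r >= 0) & (r < 1), np.exp(alpha * r), 0.0)

r = np.linspace(0, 10, 1001)
# Right-hand side of (6.33): sum_{k >= 0} exp(alpha*k) * beta_alpha(r - k)
rhs = sum(np.exp(alpha * k) * beta_alpha(r - k) for k in range(12))

# Left-hand side: the causal Green's function 1_+(r) exp(alpha*r)
assert np.allclose(rhs, np.exp(alpha * r))
```

On each interval $[k, k+1)$ a single shifted B-spline is active, so the weighted sum stitches the exponential together piece by piece.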

Reproduction of null-space components

A fundamental property of B-splines is their ability to reproduce the components that are in the null space of their defining operator. In the case of our working example, we can simply extrapolate (6.33) to negative indices, which yields

$$e^{\alpha r} = \sum_{k\in\mathbb{Z}} e^{\alpha k}\,\beta_\alpha(r - k).$$

It turns out that this reproduction property is induced by the null-space matching constraint (6.24) that is imposed upon the localization filter. While the reproduction of exponentials is interesting in its own right, we shall focus here on the important case of polynomials and provide a detailed Fourier-based analysis.

We start by recalling that the general form of a multidimensional polynomial of total degree $N$ is

$$q_N(r) = \sum_{|n| \le N} a_n\, r^n$$

using the multi-index notation with $n = (n_1,\dots,n_d) \in \mathbb{N}^d$, $r^n = r_1^{n_1}\cdots r_d^{n_d}$, and $|n| = n_1 + \cdots + n_d$. The generalized Fourier transform of $q_N \in \mathcal{S}'(\mathbb{R}^d)$ (see Table 3.3 and the entry $r^n f(r)$ with $f(r) = 1$) is given by

$$\widehat{q}_N(\omega) = \sum_{|n| \le N} (2\pi)^d a_n\, j^{|n|}\, \partial^n \delta(\omega),$$

where $\partial^n\delta$ denotes the $n$th partial derivative of the multidimensional Dirac impulse $\delta$. Hence, the Fourier multiplier $\widehat{L}$ will annihilate the polynomials of order $N$ if and only if $\widehat{L}(\omega)\,\partial^n\delta(\omega) = 0$ for all $|n| \le N$. To understand when this condition is met, we expand $\widehat{L}(\omega)\,\partial^n\delta(\omega)$ in terms of $\partial^k\widehat{L}(0)$, $|k| \le |n|$, by using the general product rule for the manipulation of Dirac impulses and their derivatives given by

$$f(r)\,\partial^n\delta(r - r_0) = \sum_{k+l=n} \frac{n!}{k!\,l!}\,(-1)^{|n|+|l|}\,\partial^k f(r_0)\,\partial^l \delta(r - r_0).$$

5. Our definition of the inverse discrete Fourier transform in 1-D is $\mathcal{F}_d^{-1}\big\{ H(e^{j\omega}) \big\}[k] = \frac{1}{2\pi}\int_{-\pi}^{\pi} H(e^{j\omega})\, e^{j\omega k}\,\mathrm{d}\omega$ with $k \in \mathbb{Z}$.


The latter follows from Leibniz' rule for partially differentiating a product of functions,

$$\partial^n(f\varphi) = \sum_{k+l=n} \frac{n!}{k!\,l!}\,\partial^k f\,\partial^l\varphi,$$

and the adjoint relation $\langle \varphi, f\,\partial^n\delta(\cdot - r_0) \rangle = \langle \partial^{n*}(f\varphi), \delta(\cdot - r_0) \rangle$ with $\partial^{n*} = (-1)^{|n|}\partial^n$. This allows us to conclude that the necessary and sufficient condition for the inclusion of the polynomials of order $N$ in the null space of $\mathrm{L}$ is

$$\partial^n\widehat{L}(0) = 0 \quad \text{for all } n \in \mathbb{N}^d \text{ with } |n| \le N, \qquad (6.34)$$

which is equivalent to $\widehat{L}(\omega) = O(\|\omega\|^{N+1})$ around the origin. Note that this behavior is prototypical of scale-invariant operators such as fractional derivatives and Laplacians. The same condition obviously has to be imposed upon the localization filter $\widehat{L}_d$ for the Fourier transform of the B-spline in (6.25) to be nonsingular at the origin. Since $\widehat{L}_d(\omega)$ is $2\pi$-periodic, we have that

$$\partial^n\widehat{L}_d(2\pi k) = 0, \quad k \in \mathbb{Z}^d,\ n \in \mathbb{N}^d \text{ with } |n| \le N. \qquad (6.35)$$

For practical convenience, we shall assume that the B-spline $\beta_L$ is normalized to have a unit integral⁶, so that $\widehat{\beta}_L(0) = 1$. Based on (6.35) and $\widehat{\beta}_L(\omega) = \widehat{L}_d(\omega)/\widehat{L}(\omega)$, we find that

$$\begin{cases} \widehat{\beta}_L(0) = 1, \\ \partial^n\widehat{\beta}_L(2\pi k) = 0, \quad k \in \mathbb{Z}^d\setminus\{0\},\ n \in \mathbb{N}^d \text{ with } |n| \le N, \end{cases} \qquad (6.36)$$

which are the so-called Strang-Fix conditions of order $N$. Recalling that $j^{|n|}\partial^n\widehat{\beta}_L(\omega)$ is the Fourier transform of $r^n\beta_L(r)$ and that periodization in the signal domain corresponds to sampling in the Fourier domain, we finally deduce that

$$\sum_{k\in\mathbb{Z}^d} (r - k)^n\,\beta_L(r - k) = j^{|n|}\,\partial^n\widehat{\beta}_L(0) = C_n, \quad n \in \mathbb{N}^d \text{ with } 0 < |n| \le N, \qquad (6.37)$$

with the implicit assumption that $\beta_L$ has a sufficient order of algebraic decay for the above sums to be convergent. The special case of (6.37) with $n = 0$ reads

$$\sum_{k\in\mathbb{Z}^d} \beta_L(r - k) = 1 \qquad (6.38)$$

and is called the partition of unity. It reflects the fact that $\beta_L$ reproduces the constants. More generally, Condition (6.37) (or (6.36)) is equivalent to the existence of sequences $p_n$ such that

$$r^n = \sum_{k\in\mathbb{Z}^d} p_n[k]\,\beta_L(r - k) \quad \text{for all } |n| \le N, \qquad (6.39)$$

which is a more direct statement of the polynomial-reproduction property. For instance, (6.37) with $n = (1,\dots,1)$ implies that

$$r\,\underbrace{\Big( \sum_{k\in\mathbb{Z}^d} \beta_L(r - k) \Big)}_{p_{(0,\dots,0)} = 1} - \sum_{k\in\mathbb{Z}^d} k\,\beta_L(r - k) = C_{(1,\dots,1)},$$

6. This is always possible thanks to Condition (6.24), which ensures that $\widehat{\beta}_L(0) \ne 0$ due to a proper cancellation of poles and zeros on the right-hand side of (6.25).


from which one deduces that $p_{(1,\dots,1)}[k] = k + C_{(1,\dots,1)}$. The other sequences $p_n$, which are polynomials in $k$, may be determined in a similar fashion by proceeding recursively. Another equivalent way of stating the Strang-Fix conditions of order $N$ is that the sums

$$\sum_{k\in\mathbb{Z}^d} k^n\,\beta_L(r - k) = \sum_{l\in\mathbb{Z}^d} j^{|n|}\,\partial_\omega^n\big( e^{-j\langle\omega, r\rangle}\,\widehat{\beta}_L(-\omega) \big)\Big|_{\omega = 2\pi l}$$

are polynomials with leading term $r^n$ for all $|n| \le N$. The left-hand-side expression follows from Poisson's summation formula⁷ applied to the function $f(x) = x^n\,\beta_L(r - x)$, with $r$ being considered as a constant shift.
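Both the partition of unity (6.38) and the linear reproduction (6.39) can be checked directly on the degree-1 polynomial B-spline (the causal hat function supported on [0, 2], of Strang-Fix order 2, with $p_{(1)}[k] = k + 1$); a numerical sketch:

```python
import numpy as np

def hat(r):
    """Causal degree-1 polynomial B-spline: hat function on [0, 2]."""
    return np.maximum(0.0, 1.0 - np.abs(r - 1.0))

r = np.linspace(-3, 3, 601)
ks = np.arange(-8, 9)

# Partition of unity (6.38): sum_k beta(r - k) = 1
pou = sum(hat(r - k) for k in ks)
assert np.allclose(pou, 1.0)

# Linear reproduction (6.39): sum_k (k + 1) * beta(r - k) = r
lin = sum((k + 1) * hat(r - k) for k in ks)
assert np.allclose(lin, r)
```

The shift by 1 in $p_{(1)}[k] = k + 1$ is exactly the constant $C_{(1)}$ of (6.37), which accounts for the fact that the causal hat is centered at $r = 1$ rather than at the origin.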

Localization

The guiding principle for designing B-splines is to produce basis functions that are maximally localized in $\mathbb{R}^d$. Ideally, B-splines should have the smallest-possible support, which is the property that makes them so useful in applications. When it is not possible to construct compactly supported basis functions, the B-spline should at least be concentrated around the origin and satisfy some decay bound with the tightest possible constants. The primary types of spatial localization, by order of preference, are

1. Compact support: $\beta_L(r) = 0$ for all $r \notin \Omega$, where $\Omega \subset \mathbb{R}^d$ is a convex set with the smallest-possible Lebesgue measure.

2. Exponential decay: $|\beta_L(r - r_0)| \le C\exp(-\alpha\|r\|)$ for some $r_0 \in \mathbb{R}^d$ and the largest-possible $\alpha \in \mathbb{R}^+$.

3. Algebraic decay: $|\beta_L(r - r_0)| \le C\,\frac{1}{1 + \|r\|^\alpha}$ for some $r_0 \in \mathbb{R}^d$ and the largest-possible $\alpha \in \mathbb{R}^+$.

By relying on the classical relations that link spatial decay to the smoothness of the Fourier transform, one can get a good estimate of the spatial decay based on the knowledge of the Fourier transform $\widehat{\beta}_L(\omega) = \widehat{L}_d(\omega)/\widehat{L}(\omega)$ of the B-spline. Since the localization filter $\widehat{L}_d(\omega)$ acts by compensating the (potential) singularities of $\widehat{L}(\omega)$, the guiding principle is that the rate of decay is essentially determined by the degree of differentiability of $\widehat{L}(\omega)$.

Specifically, if $\widehat{\beta}_L(\omega)$ is differentiable up to order $N$, then the B-spline $\beta_L$ is guaranteed to have an algebraic decay of order $N$. To show this, we consider the Fourier-transform pair $r^n\beta_L(r) \leftrightarrow j^{|n|}\partial^n\widehat{\beta}_L$ subject to the constraint that $\partial^n\widehat{\beta}_L \in L_1(\mathbb{R}^d)$ for all $|n| < N$. From the definition of the inverse Fourier integral, it immediately follows that

$$\big| r^n\,\beta_L(r) \big| \le \frac{1}{(2\pi)^d}\,\big\| \partial^n\widehat{\beta}_L \big\|_{L_1},$$

which, when properly combined over all multi-indices $|n| < N$, yields an algebraic-decay estimate with $\alpha = N$.

By pushing the argument to the limit, we see that exponential decay (which is faster than any order of algebraic decay) requires that $\widehat{\beta}_L \in C^\infty(\mathbb{R}^d)$ (infinite order of differentiability), which is only possible if $\widehat{L}(\omega) \in C^\infty(\mathbb{R}^d)$ as well.

The ultimate limit in Fourier-domain regularity is when $\widehat{\beta}_L$ has an analytic extension that is an entire⁸ function. In fact, by the Paley-Wiener theorem (Theorem 22 below), one achieves compact support of $\beta_L$ if and only if $\widehat{\beta}_L(\zeta)$ is an entire function of exponential type. To explain this concept, we focus on the one-dimensional case where the B-spline

7. The standard form of Poisson's summation formula is $\sum_{k\in\mathbb{Z}^d} f(k) = \sum_{l\in\mathbb{Z}^d} \widehat{f}(2\pi l)$. It is valid for any Fourier pair $f, \widehat{f} = \mathcal{F}\{f\} \in L_1(\mathbb{R}^d)$ with sufficient decay for the two sums to be convergent.
8. An entire function is a function that is analytic over the whole complex plane $\mathbb{C}$.


$\beta_L$ is supported in the finite interval $[-A, +A]$. We then consider the holomorphic Fourier (or Fourier-Laplace) transform of the B-spline given by

$$\widehat{\beta}_L(\zeta) = \int_{-A}^{+A} \beta_L(r)\, e^{-\zeta r}\,\mathrm{d}r \qquad (6.40)$$

with $\zeta = \sigma + j\omega \in \mathbb{C}$, which formally amounts to substituting $j\omega$ by $\zeta$ in the expression of the Fourier transform of $\beta_L$. In order to obtain a proper analytic extension, we need to verify that $\widehat{\beta}_L(\zeta)$ satisfies the Cauchy-Riemann equations. We shall do so by applying a dominated-convergence argument. To that end, we construct the exponential bound

$$\big| \widehat{\beta}_L(\zeta) \big| \le e^{A|\zeta|}\int_{-A}^{+A} |\beta_L(r)|\,\mathrm{d}r \le e^{A|\zeta|}\sqrt{\int_{-A}^{+A} 1\,\mathrm{d}r}\,\sqrt{\int_{-A}^{+A} |\beta_L(r)|^2\,\mathrm{d}r} = e^{A|\zeta|}\,\sqrt{2A}\,\|\beta_L\|_{L_2},$$

where we have applied the Cauchy-Schwarz inequality to derive the second inequality. Since $e^{-\zeta r}$ for fixed $r$ is itself an entire function and (6.40) is convergent over the whole complex plane, the conclusion is that $\widehat{\beta}_L(\zeta)$ is entire as well, in addition to being a function of exponential type $A$, as indicated by the bound. The whole strength of the Paley-Wiener theorem is that the implication also works the other way around.

Theorem 22 (Paley-Wiener). Let $f \in L_2(\mathbb{R})$. Then, $f$ is compactly supported in $[-A, A]$ if and only if its Laplace transform

$$F(\zeta) = \int_{\mathbb{R}} f(r)\, e^{-\zeta r}\,\mathrm{d}r$$

is an entire function of exponential type $A$, meaning that there exists a constant $C$ such that

$$|F(\zeta)| \le C\, e^{A|\zeta|}$$

for all $\zeta \in \mathbb{C}$.

The result implies that one can deduce the support of $f$ from its Laplace transform. We can also easily extend the result to the case where the support is not centered around the origin by applying the Paley-Wiener theorem to the autocorrelation function $(f * f^\vee)(r)$. The latter is supported in the interval $[-2A, 2A]$, which is twice the size of the support of $f$, irrespective of its center. This suggests the following expression for the determination of the support of a B-spline:

$$\mathrm{support}(\beta_L) = \limsup_{R\to\infty} \frac{\log\Big( \sup_{|\zeta| \le R} \big| \widehat{\beta}_L(\zeta)\,\widehat{\beta}_L(-\zeta) \big| \Big)}{R}. \qquad (6.41)$$

It returns twice the exponential type of the recentered B-spline, which gives $\mathrm{support}(\beta_L) = 2A$. While this formula is only strictly valid when $\widehat{\beta}_L(\zeta)$ is an entire function, it can otherwise be used as an operational measure of localization when the underlying B-spline is not compactly supported. Interestingly, (6.41) provides a measure that is additive with respect to convolution and proportional to the order $\gamma$. For instance, the support of an (exponential) B-spline associated with an ordinary differential operator of order $N$ is precisely $N$, as a consequence of the factorization property of such B-splines (see Sections 6.4.2 and 6.4.4).


To get some insight into (6.41), let us consider the case of the polynomial B-spline of order 1 (or degree 0) with $\beta_D(r) = \mathbb{1}_{[0,1)}(r)$ and Laplace transform

$$\widehat{\beta}_D(\zeta) = \frac{1 - e^{-\zeta}}{\zeta}.$$

The required product in (6.41) is

$$\widehat{\beta}_D(\zeta)\,\widehat{\beta}_D(-\zeta) = \frac{e^{\zeta} - 2 + e^{-\zeta}}{\zeta^2},$$

which is analytic over the whole complex plane because of the pole-zero cancellation at $\zeta = 0$. For $R$ sufficiently large, we clearly have that

$$\max_{|\zeta| \le R} \big| \widehat{\beta}_D(\zeta)\,\widehat{\beta}_D(-\zeta) \big| = \frac{e^{R} - 2 + e^{-R}}{R^2} \to \frac{e^{R}}{R^2}.$$

By plugging the above expression into (6.41), we finally get

$$\mathrm{support}(\beta_D) = \limsup_{R\to\infty} \frac{R - 2\log R}{R} = 1,$$

which is the desired result. While the above calculation may look like overkill for the determination of the already-known support of $\beta_D$, it becomes quite handy for making predictions for higher-order operators. To illustrate the point, we now consider the B-spline of order $\gamma$ associated with the (possibly fractional) derivative operator $\mathrm{D}^{\gamma}$, whose Fourier-Laplace transform is

$$\widehat{\beta}_{D^{\gamma}}(\zeta) = \left( \frac{1 - e^{-\zeta}}{\zeta} \right)^{\gamma}.$$

We can then essentially replicate the previous manipulation, while moving the order out of the logarithm, to deduce that

$$\mathrm{support}(\beta_{D^{\gamma}}) = \limsup_{R\to\infty} \frac{\gamma R - 2\gamma\log R}{R} = \gamma.$$

This shows that the "support" of the B-spline is equal to its order, with the caveat that the underlying Fourier-Laplace transform $\widehat{\beta}_{D^{\gamma}}(\zeta)$ is only analytic (and entire) when the order $\gamma$ is a positive integer. This points to the fundamental limitation that a B-spline associated with a fractional operator—that is, one for which $\widehat{L}(\zeta)$ is not an entire function—cannot be compactly supported.
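The limit in (6.41) can be probed numerically. The sketch below evaluates the entire product $\widehat{\beta}_D(\zeta)\widehat{\beta}_D(-\zeta) = (e^{\zeta} - 2 + e^{-\zeta})/\zeta^2$ on the real axis, where its maximum modulus over $|\zeta| \le R$ is attained for large $R$, and checks that $\log(\max)/R$ approaches the support value 1:

```python
import numpy as np

def product(z):
    """beta_D-hat(z) * beta_D-hat(-z): entire, thanks to the pole-zero
    cancellation at z = 0 (the value at 0 is the limit 1)."""
    return (np.exp(z) - 2.0 + np.exp(-z)) / z**2

# The modulus of the product peaks on the real axis for large |zeta|,
# so the sup over the disk |zeta| <= R can be evaluated at zeta = R.
for R in (50.0, 200.0, 700.0):
    print(R, np.log(abs(product(R))) / R)   # slowly increases toward 1

assert abs(np.log(abs(product(700.0))) / 700.0 - 1.0) < 0.02
```

The convergence is only logarithmic (the estimate is $1 - 2\log R/R$), which is why a large radius is needed; R is kept below the overflow threshold of double-precision `exp`.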

Smoothness

The smoothness of a B-spline refers to its degree of continuity and/or differentiability. Since a B-spline is a linear combination of shifted Green's functions, its smoothness is the same as that of $\rho_L$.

Smoothness descriptors come in two flavors—Hölder continuity versus Sobolev differentiability—depending on whether the analysis is done in the signal or the Fourier domain. Due to the duality between Fourier decay and order of differentiation, the smoothness of $\beta_L$ may be predicted from the growth of $\widehat{L}(\omega)$ at infinity, without need for the explicit calculation of $\rho_L$. To that end, one considers the Sobolev spaces $W_2^{\alpha}$, which are defined as

$$W_2^{\alpha}(\mathbb{R}^d) = \Big\{ f : \int_{\mathbb{R}^d} (1 + \|\omega\|^2)^{\alpha}\,|\widehat{f}(\omega)|^2\,\mathrm{d}\omega < \infty \Big\}.$$


Since the partial-differentiation operator $\partial^n$ corresponds to a Fourier-domain multiplication by $(j\omega)^n$, the inclusion of $f$ in $W_2^{\alpha}(\mathbb{R}^d)$ requires that its (partial) derivatives be well-defined in the $L_2$ sense up to order $\alpha$. The same is also true for the "Bessel potential" operators $(\mathrm{Id} - \Delta)^{\alpha/2}$ of order $\alpha$ or, alternatively, the fractional Laplacians $(-\Delta)^{\alpha/2}$ with Fourier multiplier $\|\omega\|^{\alpha}$.

Proposition 17. Let $\beta_L$ be an admissible B-spline that is associated with a Fourier multiplier $\widehat{L}(\omega)$ of order $\gamma$. Then, $\beta_L \in W_2^{\alpha}(\mathbb{R}^d)$ for any $\alpha < \gamma - d/2$.

Proof. Because of Parseval's identity, the statement $\beta_L \in W_2^{\alpha}(\mathbb{R}^d)$ is equivalent⁹ to $\beta_L \in L_2(\mathbb{R}^d)$ and $(-\Delta)^{\alpha/2}\beta_L \in L_2(\mathbb{R}^d)$. Since the first inclusion is part of the definition, it is sufficient to check the second. To that end, we recall the stability conditions $\beta_L \in L_1(\mathbb{R}^d)$ and $d \in \ell_1(\mathbb{Z}^d)$, which are implicit in the B-spline construction (6.25). These, together with the order condition (6.14), imply that

$$|\widehat{L}_d(\omega)| \le \|d\|_{\ell_1}, \qquad |\widehat{\beta}_L(\omega)| = \left| \frac{\widehat{L}_d(\omega)}{\widehat{L}(\omega)} \right| \le \min\!\left( \|\beta_L\|_{L_1},\ \frac{C\,\|d\|_{\ell_1}}{\|\omega\|^{\gamma}} \right).$$

This latter bound allows us to control the $L_2$ norm of $(-\Delta)^{\alpha/2}\beta_L$ by splitting the spectral range of integration as

$$\|(-\Delta)^{\alpha/2}\beta_L\|_{L_2}^2 = \int_{\mathbb{R}^d} \|\omega\|^{2\alpha}\,|\widehat{\beta}_L(\omega)|^2\,\frac{\mathrm{d}\omega}{(2\pi)^d} = \int_{\|\omega\| < R} \|\omega\|^{2\alpha}\,|\widehat{\beta}_L(\omega)|^2\,\frac{\mathrm{d}\omega}{(2\pi)^d} + \int_{\|\omega\| > R} \|\omega\|^{2\alpha}\,|\widehat{\beta}_L(\omega)|^2\,\frac{\mathrm{d}\omega}{(2\pi)^d}$$

$$\le \|\beta_L\|_{L_1}^2\,\underbrace{\int_{\|\omega\| < R} \|\omega\|^{2\alpha}\,\frac{\mathrm{d}\omega}{(2\pi)^d}}_{I_1} + C^2\|d\|_{\ell_1}^2\,\underbrace{\int_{\|\omega\| > R} \|\omega\|^{2\alpha - 2\gamma}\,\frac{\mathrm{d}\omega}{(2\pi)^d}}_{I_2}.$$

The first integral $I_1$ is finite due to the boundedness of the domain. As for $I_2$, it is convergent provided that the rate of decay of its argument is faster than $d$, which corresponds to the critical Sobolev exponent $\alpha = \gamma - d/2$.

As the final step of the analysis, we invoke the Sobolev embedding theorems to infer that $\beta_L$ is Hölder-continuous of order $r$ with $r < \alpha - \frac{d}{2} = (\gamma - d)$, which essentially means that $\beta_L$ is differentiable up to order $r$ with bounded derivatives. One should keep in mind, however, that the latter estimate is a lower bound on Hölder continuity, unlike the Sobolev exponent in Proposition 17, which is sharp. For instance, in the case of the 1-D Fourier multiplier $(j\omega)^{\gamma}$, we find that the corresponding (fractional) B-spline—if it exists—should have a Sobolev smoothness of $(\gamma - \frac{1}{2})$ and a Hölder regularity $r < (\gamma - 1)$. Note that the latter is arbitrarily close (but not equal) to the true estimate $r_0 = (\gamma - 1)$ that is readily deduced from the Green's function (6.15).

6.4.2 B-spline factorization

A powerful aspect of spline theory is that it is often possible to exploit the factorization properties of differential operators to recursively generate whole families of B-splines. Specifically, if the operator can be decomposed as $\mathrm{L} = \mathrm{L}_1\mathrm{L}_2$, where the B-splines associated with $\mathrm{L}_1$ and $\mathrm{L}_2$ are already known, then $\beta_L = \beta_{L_1} * \beta_{L_2}$ is the natural choice of B-spline for $\mathrm{L}$.

9. The underlying Fourier multipliers are of comparable size in the sense that there exist two constants $c_1$ and $c_2$ such that $c_1(1 + \|\omega\|^2)^{\alpha} \le 1 + \|\omega\|^{2\alpha} \le c_2(1 + \|\omega\|^2)^{\alpha}$.


Proposition 18. Let $\beta_{L_1}, \beta_{L_2}$ be admissible B-splines for the operators $\mathrm{L}_1$ and $\mathrm{L}_2$, respectively. Then, $\beta_L(r) = (\beta_{L_1} * \beta_{L_2})(r)$ is an admissible B-spline for $\mathrm{L} = \mathrm{L}_1\mathrm{L}_2$ if and only if there exists a constant $A > 0$ such that

$$\sum_{n\in\mathbb{Z}^d} \big| \widehat{\beta}_{L_1}(\omega + 2\pi n)\,\widehat{\beta}_{L_2}(\omega + 2\pi n) \big|^2 \ge A > 0$$

for all $\omega \in [0, 2\pi]^d$. When $\mathrm{L}_1 = \mathrm{L}_2^{\gamma}$ with $\gamma \ge 0$, the auxiliary condition is automatically satisfied.

Proof. Since $\beta_{L_1}, \beta_{L_2} \in L_1(\mathbb{R}^d)$, the same holds true for $\beta_L$ (by Young's inequality). From the Fourier-domain definition (6.25) of the B-splines, we have

$$\widehat{\beta}_{L_i}(\omega) = \frac{\sum_{k\in\mathbb{Z}^d} d_i[k]\, e^{-j\langle k,\omega\rangle}}{\widehat{L}_i(\omega)} = \frac{\widehat{L}_{d,i}(\omega)}{\widehat{L}_i(\omega)},$$

which implies that

$$\beta_L = \mathcal{F}^{-1}\left\{ \frac{\widehat{L}_{d,1}(\omega)\,\widehat{L}_{d,2}(\omega)}{\widehat{L}_1(\omega)\,\widehat{L}_2(\omega)} \right\} = \mathcal{F}^{-1}\left\{ \frac{\widehat{L}_d(\omega)}{\widehat{L}(\omega)} \right\} = \mathrm{L_d}\mathrm{L}^{-1}\delta$$

with $\mathrm{L}^{-1} = \mathrm{L}_2^{-1}\mathrm{L}_1^{-1}$ and $\mathrm{L_d} = \mathrm{L}_{d,1}\mathrm{L}_{d,2}$. This translates into the combined localization operator $\mathrm{L_d}s(r) = \sum_{k\in\mathbb{Z}^d} d_L[k]\,s(r - k)$ with $d_L[k] = (d_1 * d_2)[k]$, which is factorizable by construction. To establish the existence of the upper Riesz bound for $\beta_L$, we perform the manipulation

$$A_{\beta_L}(\omega) = \sum_{n\in\mathbb{Z}^d} |\widehat{\beta}_L(\omega + 2\pi n)|^2 = \sum_{n\in\mathbb{Z}^d} |\widehat{\beta}_{L_1}(\omega + 2\pi n)|^2\,|\widehat{\beta}_{L_2}(\omega + 2\pi n)|^2 \le \Big( \sum_{n\in\mathbb{Z}^d} |\widehat{\beta}_{L_1}(\omega + 2\pi n)|\,|\widehat{\beta}_{L_2}(\omega + 2\pi n)| \Big)^2 \le \sum_{n\in\mathbb{Z}^d} |\widehat{\beta}_{L_1}(\omega + 2\pi n)|^2\,\sum_{n\in\mathbb{Z}^d} |\widehat{\beta}_{L_2}(\omega + 2\pi n)|^2 \le B_1^2\,B_2^2 < +\infty,$$

where the first inequality follows from the norm inequality $\|a\|_{\ell_2} \le \|a\|_{\ell_1}$ and the second from the Cauchy-Schwarz inequality; $B_1$ and $B_2$ are the upper Riesz bounds of $\beta_{L_1}$ and $\beta_{L_2}$, respectively. The additional condition in the proposition takes care of the lower Riesz bound.
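As a concrete instance of these Riesz-bound computations, the sketch below assumes the degree-1 polynomial B-spline, for which the periodized autocorrelation $A_{\beta}(\omega)$ has the classical closed form $(2 + \cos\omega)/3$, giving $A^2 = 1/3$ (at $\omega = \pi$) and $B^2 = 1$ (at $\omega = 0$):

```python
import numpy as np

def hat_ft_sq(w):
    """|beta^1-hat(w)|^2 = (sin(w/2)/(w/2))^4 for the degree-1 B-spline."""
    return np.sinc(w / (2 * np.pi)) ** 4   # np.sinc(x) = sin(pi x)/(pi x)

w = np.linspace(0.0, 2 * np.pi, 181)
# Truncated periodization sum_n |beta-hat(w + 2 pi n)|^2, as in (6.26)
A_beta = sum(hat_ft_sq(w + 2 * np.pi * n) for n in range(-200, 201))

# Closed form (2 + cos w)/3: Riesz bounds A^2 = 1/3 and B^2 = 1
assert np.allclose(A_beta, (2.0 + np.cos(w)) / 3.0, atol=1e-6)
```

Because $|\widehat{\beta}^1(\omega)|^4$ decays like $\omega^{-4}$, the periodization sum converges quickly and the truncation to $|n| \le 200$ is far below the stated tolerance.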

6.4.3 Polynomial B-splines

The factorization property is directly applicable to the construction of the polynomial B-splines (we use the equivalent notation $\beta_+^n = \beta_{\mathrm{D}^{n+1}}$ of Section 1.3.2) via the iterated convolution of a B-spline of degree 0. Specifically,

$$\beta_{\mathrm{D}^{n+1}}(r) = (\beta_{\mathrm{D}} * \beta_{\mathrm{D}^n})(r) = (\underbrace{\beta_{\mathrm{D}} * \cdots * \beta_{\mathrm{D}}}_{n+1})(r)$$

with $\beta_{\mathrm{D}} = \beta_+^0 = \mathbb{1}_{[0,1)}$ and the convention that $\beta_{\mathrm{D}^0} = \beta_{\mathrm{Id}} = \delta$.
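The iterated convolution is easy to emulate on a fine grid (a numerical sketch; the discrete convolution is rescaled by the grid step h so that it approximates its continuous counterpart):

```python
import numpy as np

h = 1e-3                       # grid step
box = np.ones(int(1 / h))      # samples of beta_D = 1_{[0,1)}

beta = box.copy()
for _ in range(2):             # two extra convolutions -> quadratic B-spline
    beta = np.convolve(beta, box) * h

# beta now approximates beta_{D^3} (degree 2): support [0, 3], unit integral
assert np.isclose(beta.sum() * h, 1.0)
# the quadratic B-spline peaks at r = 3/2 with value 3/4
assert np.isclose(beta.max(), 0.75, atol=5e-3)
```

Each convolution extends the support by one unit and raises the polynomial degree (and the smoothness) by one, exactly as predicted by the factorization $\mathrm{D}^{n+1} = \mathrm{D}\,\mathrm{D}^n$.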


6.4.4 Exponential B-splines

More generally, one can consider a generic $N$th-order differential operator of the form $\mathrm{P}_{\boldsymbol{\alpha}} = \mathrm{P}_{\alpha_1}\cdots\mathrm{P}_{\alpha_N}$ with parameter vector $\boldsymbol{\alpha} = (\alpha_1,\dots,\alpha_N) \in \mathbb{C}^N$ and $\mathrm{P}_{\alpha_n} = \mathrm{D} - \alpha_n\,\mathrm{Id}$. The corresponding basis function is an exponential B-spline of order $N$ with parameter vector $\boldsymbol{\alpha}$, which can be decomposed as

$$\beta_{\boldsymbol{\alpha}}(r) = \big( \beta_{\alpha_1} * \beta_{\alpha_2} * \cdots * \beta_{\alpha_N} \big)(r), \qquad (6.42)$$

where $\beta_{\alpha} = \beta_{\mathrm{P}_{\alpha}}$ is the first-order exponential B-spline defined by (6.21). The Fourier-domain counterpart of (6.42) is

$$\widehat{\beta}_{\boldsymbol{\alpha}}(\omega) = \prod_{n=1}^{N} \frac{1 - e^{\alpha_n}e^{-j\omega}}{j\omega - \alpha_n}, \qquad (6.43)$$

which also yields

$$\beta_{\boldsymbol{\alpha}}(r) = \Delta_{\boldsymbol{\alpha}}\,\rho_{\boldsymbol{\alpha}}(r), \qquad (6.44)$$

where $\Delta_{\boldsymbol{\alpha}} = \Delta_{\alpha_1}\cdots\Delta_{\alpha_N}$ (with $\Delta_{\alpha}$ defined by (6.20)) is the corresponding $N$th-order localization operator (weighted differences) and $\rho_{\boldsymbol{\alpha}}$ the causal Green's function of $\mathrm{P}_{\boldsymbol{\alpha}}$. Note that the complex parameters $\alpha_n$, which are the roots of the characteristic polynomial of $\mathrm{P}_{\boldsymbol{\alpha}}$, are the poles of the exponential B-spline, as seen in (6.43). The actual recipe for localization is that each pole is cancelled by a corresponding ($2\pi$-periodic) zero in the numerator.
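The equivalence between the convolution form (6.42) and the localization form (6.44) can be cross-checked numerically for N = 2. The sketch below uses hypothetical distinct parameters $\alpha_1, \alpha_2$ together with the closed forms $\beta_\alpha(r) = e^{\alpha r}\mathbb{1}_{[0,1)}(r)$ and $\rho_{(\alpha_1,\alpha_2)}(r) = \mathbb{1}_+(r)(e^{\alpha_1 r} - e^{\alpha_2 r})/(\alpha_1 - \alpha_2)$:

```python
import numpy as np

a1, a2 = -0.5, -1.2              # hypothetical, distinct parameters
h = 1e-3
r1 = np.arange(0, 1, h)

b1 = np.exp(a1 * r1)             # first-order B-spline beta_{a1} on [0, 1)
b2 = np.exp(a2 * r1)             # first-order B-spline beta_{a2} on [0, 1)
conv = np.convolve(b1, b2) * h   # (6.42): beta_{(a1,a2)}, supported in [0, 2]

def rho(r):
    """Causal Green's function of P_{a1} P_{a2} (distinct roots)."""
    return np.where(r >= 0, (np.exp(a1 * r) - np.exp(a2 * r)) / (a1 - a2), 0.0)

r = np.arange(conv.size) * h
# (6.44): weighted differences Delta_{a1} Delta_{a2} applied to rho
direct = (rho(r) - (np.exp(a1) + np.exp(a2)) * rho(r - 1)
          + np.exp(a1 + a2) * rho(r - 2))

assert np.allclose(conv, direct, atol=1e-2)
```

The agreement (up to the O(h) Riemann-sum error of the discrete convolution) illustrates how the two weighted differences cancel the exponential tails of the Green's function beyond $r = 2$.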

Based on the above equations, one can infer the following properties of the exponential B-splines (see [UB05a] for a complete treatment of the topic):
– They are causal, bounded, and compactly supported in $[0, N]$, simply because all elementary constituents in (6.42) are bounded and supported in $[0, 1)$.
– They are piecewise-exponential with joining points at the integers and a maximal degree of smoothness (spline property). The first part follows from (6.44), using the well-known property that the causal Green's function of an $N$th-order ordinary differential operator is an exponential polynomial restricted to the positive axis. As for the statement about smoothness, the B-splines are Hölder-continuous of order $(N-1)$. In other words, they are differentiable up to order $(N-1)$ with bounded derivatives. This follows from the fact that $\mathrm{P}_{\alpha_n}\beta_{\alpha_n}(r) = \delta(r) - e^{\alpha_n}\delta(r - 1)$, which implies that every additional elementary convolution factor in (6.42) improves the differentiability of the resulting B-spline by one.
– They are the shortest elementary constituents of exponential splines (maximally localized kernels), and they each generate a valid Riesz basis (by integer shifting) of the space of cardinal $\mathrm{P}_{\boldsymbol{\alpha}}$-splines if and only if $\alpha_n - \alpha_m \ne j2\pi k$, $k \in \mathbb{Z}$, for all distinct, purely imaginary poles.
– They reproduce the exponential polynomials that are in the null space of the operator $\mathrm{P}_{\boldsymbol{\alpha}}$, as well as any of its Green's functions $\rho_{\boldsymbol{\alpha}}$, which all happen to be special types of $\mathrm{P}_{\boldsymbol{\alpha}}$-splines (with a minimum number of singularities).
– For $\boldsymbol{\alpha} = (0,\dots,0)$, one recovers Schoenberg's classical polynomial B-splines of degree $(N-1)$ [Sch46, Sch73], as expressed by the notational equivalence

$$\beta_+^n(r) = \beta_{\mathrm{D}^{n+1}}(r) = \beta_{(\underbrace{0,\dots,0}_{n+1})}(r).$$

The system-theoretic interpretation is that the classical polynomial spline of degree $n$ has a pole of multiplicity $(n+1)$ at the origin: it corresponds to an (unstable) linear system that is an $(n+1)$-fold integrator.


There is also a corresponding B-spline calculus whose main operations are
– Convolution, by concatenation of the parameter vectors:

$$(\beta_{\boldsymbol{\alpha}_1} * \beta_{\boldsymbol{\alpha}_2})(r) = \beta_{(\boldsymbol{\alpha}_1 : \boldsymbol{\alpha}_2)}(r).$$

– Mirroring, by sign change:

$$\beta_{\boldsymbol{\alpha}}(-r) = \Big( \prod_{n=1}^{N} e^{\alpha_n} \Big)\,\beta_{-\boldsymbol{\alpha}}(r + N).$$

– Complex conjugation: $\overline{\beta_{\boldsymbol{\alpha}}(r)} = \beta_{\overline{\boldsymbol{\alpha}}}(r)$.
– Modulation, by parameter shifting:

$$e^{j\omega_0 r}\,\beta_{\boldsymbol{\alpha}}(r) = \beta_{\boldsymbol{\alpha} + \mathbf{j}\omega_0}(r)$$

with the convention that $\mathbf{j} = (j,\dots,j)$.

Finally, we would like to point out that exponential B-splines can be computed explicitly on a case-by-case basis using the mathematical software described in [Uns05, Appendix A].
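These calculus rules lend themselves to quick sanity checks. For N = 1, the mirroring relation reads $\beta_\alpha(-r) = e^{\alpha}\beta_{-\alpha}(r + 1)$; the sketch below verifies it on a grid that avoids the knots (where the half-open convention $[0,1)$ makes the two sides differ on a set of measure zero):

```python
import numpy as np

alpha = 0.8

def beta(a, r):
    """First-order exponential B-spline: exp(a * r) on [0, 1)."""
    return np.where((r >= 0) & (r < 1), np.exp(a * r), 0.0)

# Sample off the knots to sidestep the half-open endpoints of [0, 1).
r = np.arange(-2, 2, 0.01) + 0.005

lhs = beta(alpha, -r)                      # mirrored B-spline
rhs = np.exp(alpha) * beta(-alpha, r + 1)  # sign flip, shift by N = 1

assert np.allclose(lhs, rhs)
```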

6.4.5 Fractional B-splines

The fractional splines are an extension of the polynomial splines to all non-integer degrees $\alpha > -1$. The most notable members of this family are the causal fractional B-splines $\beta_+^{\alpha}$, whose basic constituents are piecewise-power functions of degree $\alpha$ [UB00]. These functions are associated with the causal fractional-derivative operator $\mathrm{D}^{\alpha+1}$, whose Fourier-based definition is

$$\mathrm{D}^{\gamma}\varphi(r) = \int_{\mathbb{R}} (j\omega)^{\gamma}\,\widehat{\varphi}(\omega)\, e^{j\omega r}\,\frac{\mathrm{d}\omega}{2\pi}$$

in the sense of generalized functions. The causal Green's function of $\mathrm{D}^{\gamma}$ is the one-sided power function of degree $(\gamma - 1)$ specified by (6.15). One constructs the corresponding B-splines through a localization process similar to the classical one, replacing the finite differences by the fractional differences defined as

$$\Delta_+^{\gamma}\varphi(r) = \int_{\mathbb{R}} (1 - e^{-j\omega})^{\gamma}\,\widehat{\varphi}(\omega)\, e^{j\omega r}\,\frac{\mathrm{d}\omega}{2\pi}. \qquad (6.45)$$

In that respect, it is important to note that $(1 - e^{-j\omega})^{\gamma} = (j\omega)^{\gamma} + O(|\omega|^{\gamma+1})$, which justifies this particular choice. By applying (6.25), we readily obtain the Fourier-domain representation of the fractional B-splines,

$$\widehat{\beta}_+^{\alpha}(\omega) = \left( \frac{1 - e^{-j\omega}}{j\omega} \right)^{\alpha+1}, \qquad (6.46)$$

which can then be inverted to provide the explicit time-domain formula

$$\beta_+^{\alpha}(r) = \frac{\Delta_+^{\alpha+1}\, r_+^{\alpha}}{\Gamma(\alpha+1)} = \sum_{m=0}^{\infty} (-1)^m \binom{\alpha+1}{m} \frac{(r - m)_+^{\alpha}}{\Gamma(\alpha+1)}, \qquad (6.47)$$

where the generalized fractional binomial coefficients are given by

$$\binom{\alpha+1}{m} = \frac{\Gamma(\alpha+2)}{\Gamma(m+1)\,\Gamma(\alpha+2-m)} = \frac{(\alpha+1)!}{m!\,(\alpha+1-m)!}.$$
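Since the terms of (6.47) vanish for $m > r$, the series can be evaluated exactly on any bounded interval. The sketch below (using scipy's `binom` and `gamma`) checks that $\alpha = 1$ recovers the hat function and that the partition of unity still holds for a genuinely fractional degree, up to the slow algebraic tail:

```python
import numpy as np
from scipy.special import binom, gamma

def frac_bspline(alpha, t):
    """Causal fractional B-spline via the series (6.47); the terms with
    m > t vanish, so the truncation at floor(max t) is exact."""
    t = np.atleast_1d(t).astype(float)
    m = np.arange(int(t.max()) + 1)
    terms = ((-1.0) ** m * binom(alpha + 1, m))[None, :] \
            * np.clip(t[:, None] - m[None, :], 0, None) ** alpha
    return terms.sum(axis=1) / gamma(alpha + 1)

# alpha = 1 recovers the hat function (degree-1 polynomial B-spline)
r = np.array([0.25, 0.5, 1.0, 1.5, 1.75])
assert np.allclose(frac_bspline(1.0, r), [0.25, 0.5, 1.0, 0.5, 0.25])

# Partition of unity for alpha = 0.5: the B-spline is not compactly
# supported, so the sum over shifts converges only algebraically.
pts = 0.3 + np.arange(300)      # arguments r0 - k for r0 = 0.3, k <= 0
assert abs(frac_bspline(0.5, pts).sum() - 1.0) < 1e-2
```

The second check makes the lack of compact support tangible: hundreds of shifted B-splines are needed to reach the constant within one percent, in contrast with the two terms that suffice for the hat function.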


What is remarkable with this construction is the way in which the classical B-spline formulas of Section 1.3.2 carry over to the fractional case almost literally, by merely replacing $n$ by $\alpha$. This is especially striking when we compare (6.47) to (1.11), as well as the expanded versions of these formulas given below, which follow from the (generalized) binomial expansion of $(1 - e^{-j\omega})^{\alpha+1}$.

Likewise, it is possible to construct the $(\alpha, \tau)$ extension of these B-splines. They are associated with the operators $\mathrm{L} = \partial_{\tau}^{\alpha+1} \leftrightarrow (j\omega)^{\frac{\alpha+1}{2}+\tau}(-j\omega)^{\frac{\alpha+1}{2}-\tau}$ with $\tau \in \mathbb{R}$ [BU03]. This family covers the entire class of translation- and scale-invariant operators in 1-D (see Proposition 14).

The fractional B-splines share virtually all the properties of the classical B-splines, including the two-scale relation, and can also be used to define fractional wavelet bases with an order $\gamma = \alpha+1$ that varies continuously. They only lack positivity and compact support. Their most notable properties are summarized below.

– Generalization: For $\alpha$ integer, they are equivalent to the classical polynomial splines. The fractional B-splines interpolate the polynomial ones in very much the same way as the gamma function interpolates the factorials.

– Stability: All brands of fractional B-splines satisfy the Riesz-basis condition in Theorem 21.

– Regularity: The fractional splines are $\alpha$-Hölder continuous; their critical Sobolev exponent (degree of differentiability in the $L_2$ sense) is $\alpha + 1/2$ (see Proposition 17).

– Polynomial reproduction: The fractional B-splines reproduce the polynomials of degree $N = \lceil\alpha\rceil$ that are in the null space of the operator $\mathrm{D}^{\alpha+1}$ (see Section 6.2.1).

– Decay: The fractional B-splines decay at least like $|r|^{-\alpha-2}$; the causal ones are compactly supported for $\alpha$ integer.

– Order of approximation: The fractional splines have the non-integer order of approximation $\alpha+1$, a property that is rather unusual in approximation theory.

– Fractional derivatives: Simple formulas are available for obtaining the fractional derivatives of B-splines. In addition, the corresponding fractional spline wavelets essentially behave like fractional-derivative operators.

6.4.6 Additional brands of univariate B-splines

To be complete, we briefly mention some additional types of univariate B-splines that have been investigated systematically in the literature.

- The generalized exponential B-splines of order $N$ that cover the whole class of differential operators with rational transfer functions [Uns05]. These are parameterized by their poles and zeros. Their properties are very similar to those of the exponential B-splines of the previous section, which are included as a special case.

- The Matérn splines of (fractional) order $\gamma$ and parameter $\alpha \in \mathbb{R}^+$ with $\mathrm{L} = (\mathrm{D}+\alpha\,\mathrm{Id})^{\gamma} \longleftrightarrow (\mathrm{j}\omega+\alpha)^{\gamma}$ [RU06]. These constitute the fractionalization of the exponential B-spline with a single pole of multiplicity $N$.

In principle, it is possible to construct even broader families via the convolution of existing components. The difficulty is that it may not always be possible to obtain explicit signal-domain formulas, especially when some of the constituents are fractional.

6.4.7 Multidimensional B-splines

While the construction of B-splines is well understood and covered systematically in 1-D, the task becomes more challenging in multiple dimensions because of the inherent difficulty of imposing compact support. Apart from the easy cases where the operator $\mathrm{L}$ can be decomposed into a succession of 1-D operators (tensor-product B-splines and box splines), the available collection of multidimensional B-splines is much more restricted than in the univariate case. The construction of B-splines is still considered an art where the ultimate goal is to produce the most-localized basis functions. The primary families of multidimensional B-splines that have been investigated so far are

- The polyharmonic B-splines of (fractional) order $\gamma$ with $\mathrm{L} = (-\Delta)^{\frac{\gamma}{2}} \longleftrightarrow \|\omega\|^{\gamma}$ [MN90b, Rab92a, Rab92b, VDVBU05].

- The box splines of multiplicity $N \ge d$ with $\mathrm{L} = \mathrm{D}_{u_1}\cdots\mathrm{D}_{u_N} \longleftrightarrow \prod_{n=1}^{N} \langle \mathrm{j}\omega, u_n\rangle$ with $\|u_n\| = 1$, where $\mathrm{D}_{u_n} = \langle \nabla, u_n\rangle$ is the directional derivative along $u_n$ [dBHR93]. The box splines are compactly supported functions in $L_1(\mathbb{R}^d)$ if and only if the set of orientation vectors $\{u_n\}_{n=1}^{N}$ forms a frame of $\mathbb{R}^d$.

We encourage the reader who finds the present list incomplete to work on expanding it. The good news for the present study is that the polyharmonic B-splines are particularly relevant for image-processing applications because they are associated with the class of operators that are scale- and rotation-invariant. They naturally come into play when considering isotropic fractal-type random fields.

The principal message of this section is that B-splines, no matter the type, are localized functions with an equivalent width that increases in proportion to the order. In general, the fractional brands and the non-separable multidimensional ones are not compactly supported. The important issue of localization and decay is not yet fully resolved in higher dimensions. Also, since $\mathrm{L}_{\mathrm{d}} s = \beta_{\mathrm{L}} * \mathrm{L} s$, it is clear that the search for a "good" B-spline $\beta_{\mathrm{L}}$ is intrinsically related to the problem of finding an accurate numerical approximation $\mathrm{L}_{\mathrm{d}}$ of the differential operator $\mathrm{L}$. Looking at the discretization issue from the B-spline perspective leads to new insights and sometimes to nonconventional solutions. For instance, in the case of the Laplacian $\mathrm{L} = \Delta$, the continuous-domain localization requirement points to the choice of the 2-D discrete operator $\Delta_{\mathrm{d}}$ described by the $3 \times 3$ filter mask
\[
\text{Isotropic discrete Laplacian:}\qquad \frac{1}{6}
\begin{pmatrix}
-1 & -4 & -1\\
-4 & 20 & -4\\
-1 & -4 & -1
\end{pmatrix}
\]

which is not the standard version used in numerical analysis. This particular set of weights produces a much nicer, bell-shaped polyharmonic B-spline than the conventional finite-difference mask, which induces significant directional artifacts, especially when one starts iterating the operator [VDVBU05].
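The near-isotropy of this mask can be probed numerically by comparing its frequency response with that of the conventional 5-point stencil along different directions. The sketch below (illustrative only; NumPy assumed) evaluates both symbols on a small circle $|\omega| = 0.5$ and reports how much each one deviates across directions; the $1/6$-weighted mask is far closer to the rotation-invariant symbol $\|\omega\|^2$.

```python
import numpy as np

# 3x3 masks for the discrete (minus) Laplacian
iso = np.array([[-1, -4, -1],
                [-4, 20, -4],
                [-1, -4, -1]]) / 6.0          # isotropic variant from the text
std = np.array([[ 0, -1,  0],
                [-1,  4, -1],
                [ 0, -1,  0]], dtype=float)   # conventional 5-point stencil

def symbol(mask, w1, w2):
    """Frequency response of a 3x3 filter mask at frequency (w1, w2)."""
    val = 0.0
    for m in (-1, 0, 1):
        for n in (-1, 0, 1):
            val += mask[m + 1, n + 1] * np.exp(-1j * (w1 * m + w2 * n))
    return val.real

# sample both symbols on a circle of radius 0.5 and measure direction dependence
thetas = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
r = 0.5
iso_vals = np.array([symbol(iso, r * np.cos(t), r * np.sin(t)) for t in thetas])
std_vals = np.array([symbol(std, r * np.cos(t), r * np.sin(t)) for t in thetas])
print("directional spread, isotropic mask:", np.ptp(iso_vals))
print("directional spread, 5-point mask  :", np.ptp(std_vals))
```

Both symbols vanish at DC and approximate $\|\omega\|^2 = 0.25$ on this circle, but the directional spread of the isotropic mask is roughly two orders of magnitude smaller than that of the 5-point stencil.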

6.5 Generalized operator-like wavelets

In direct analogy with the first-order scenario in Section 6.3.3, we shall now take advantage of the general B-spline formalism to construct a wavelet basis that is matched to some generic operator L.

6.5.1 Multiresolution analysis of $L_2(\mathbb{R}^d)$

The first step is to lay out a fine-to-coarse sequence of (multidimensional) L-spline spaces in essentially the same way as in our first-order example. Specifically,
\[
V_i = \Big\{ s(r) \in L_2(\mathbb{R}^d) : \mathrm{L} s(r) = \sum_{k\in\mathbb{Z}^d} a_i[k]\,\delta(r - D^i k) \Big\}
\]


where $D$ is a proper dilation matrix with integer entries (e.g., $D = 2\mathrm{I}$ in the standard dyadic configuration). These spline spaces satisfy the general embedding relation $V_i \supset V_j$ for $i \le j$.

The reference space ($i = 0$) is the space of cardinal L-splines, which admits the standard B-spline representation
\[
V_0 = \Big\{ s(r) = \sum_{k\in\mathbb{Z}^d} c[k]\,\beta_{\mathrm{L}}(r-k) : c \in \ell_2(\mathbb{Z}^d) \Big\},
\]
where $\beta_{\mathrm{L}}$ is given by (6.25). Our implicit assumption is that each $V_i$ admits a similar B-spline representation
\[
V_i = \Big\{ s(r) = \sum_{k\in\mathbb{Z}^d} c_i[k]\,\beta_{\mathrm{L},i}(r - D^i k) : c_i \in \ell_2(\mathbb{Z}^d) \Big\},
\]
which involves the multiresolution generators $\beta_{\mathrm{L},i}$ described in the next section.

6.5.2 Multiresolution B-splines and the two-scale relation

In direct analogy with (6.25), the multiresolution B-splines $\beta_{\mathrm{L},i}$ are localized versions of the Green's function $\rho_{\mathrm{L}}$ with respect to the grid $D^i\mathbb{Z}^d$. Specifically, we have that
\[
\beta_{\mathrm{L},i}(r) = \sum_{k\in\mathbb{Z}^d} d_i[k]\,\rho_{\mathrm{L}}(r - D^i k) = \mathrm{L}_{\mathrm{d},i}\,\mathrm{L}^{-1}\delta(r),
\]
where $\mathrm{L}_{\mathrm{d},i}$ is the discretized version of $\mathrm{L}$ on the grid $D^i\mathbb{Z}^d$. The Fourier-domain counterpart of this equation is
\[
\hat{\beta}_{\mathrm{L},i}(\omega) = \frac{\sum_{k\in\mathbb{Z}^d} d_i[k]\,\mathrm{e}^{-\mathrm{j}\langle\omega,\,D^i k\rangle}}{\hat{L}(\omega)}. \qquad (6.48)
\]
The implicit requirement for the multiresolution decomposition scheme to work is that $\beta_{\mathrm{L},i}$ generates a Riesz basis. This needs to be asserted on a case-by-case basis.
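To make the localization $\beta_{\mathrm{L},i} = \mathrm{L}_{\mathrm{d},i}\mathrm{L}^{-1}\delta$ concrete, take $\mathrm{L} = \mathrm{D}$ in 1-D on the dyadic grid $2^i\mathbb{Z}$: the causal Green's function is the unit step, the localization filter is the two-tap finite difference $d_i = (1,-1)$, and the resulting B-spline is the indicator of $[0, 2^i)$. The toy check below (Python/NumPy, purely illustrative) verifies this.

```python
import numpy as np

def green_D(r):
    """Causal Green's function of L = D: the unit step."""
    return (np.asarray(r, dtype=float) >= 0).astype(float)

def bspline_D_scale(r, i):
    """beta_{D,i}(r) = sum_k d_i[k] rho(r - 2^i k) with d_i = (1, -1):
    a finite difference of unit steps, i.e. the indicator of [0, 2^i)."""
    T = 2.0 ** i
    return green_D(r) - green_D(r - T)

r = np.linspace(-1.0, 10.0, 1101)
b2 = bspline_D_scale(r, 2)                        # scale i = 2, support [0, 4)
print(b2[(r > 0.01) & (r < 3.99)].min())          # 1.0 inside the support
print(np.abs(b2[(r < -0.01) | (r > 4.01)]).max()) # 0.0 outside
```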

A particularly favorable situation occurs when the operator $\mathrm{L}$ is scale-invariant with $\hat{L}(a\omega) = |a|^{\gamma}\hat{L}(\omega)$. Let $i' > i$ be two multiresolution levels of the pyramid such that $D^{i'-i} = m\mathrm{I}$, where $m$ is a proportionality constant. It is then possible to relate the B-spline at resolution $i'$ to the one at the finer level $i$ via the simple dilation relation
\[
\beta_{\mathrm{L},i'}(r) \propto \beta_{\mathrm{L},i}(r/m).
\]
This is shown by considering the Fourier transform of $\beta_{\mathrm{L},i}(r/m)$, which is written as
\[
\begin{aligned}
|m|^d\,\hat{\beta}_{\mathrm{L},i}(m\omega) &= |m|^d\,\frac{\sum_{k\in\mathbb{Z}^d} d_i[k]\,\mathrm{e}^{-\mathrm{j}\langle\omega,\,m D^i k\rangle}}{\hat{L}(m\omega)}\\
&= |m|^{d-\gamma}\,\frac{\sum_{k\in\mathbb{Z}^d} d_i[k]\,\mathrm{e}^{-\mathrm{j}\langle\omega,\,D^{i'-i} D^i k\rangle}}{\hat{L}(\omega)}\\
&= |m|^{d-\gamma}\,\frac{\sum_{k\in\mathbb{Z}^d} d_i[k]\,\mathrm{e}^{-\mathrm{j}\langle\omega,\,D^{i'} k\rangle}}{\hat{L}(\omega)},
\end{aligned}
\]
and found to be compatible with the form of $\hat{\beta}_{\mathrm{L},i'}(\omega)$ given by (6.48) by taking $d_{i'}[k] \propto d_i[k]$. The prototypical scenario is the dyadic configuration $D = 2\mathrm{I}$ for which the B-splines at level $i$ are all constructed through the dilation of the single prototype $\beta_{\mathrm{L}} = \beta_{\mathrm{L},0}$, subject to the scale-invariance constraint on $\mathrm{L}$. This happens, for instance, for the classical polynomial splines, which are associated with the Fourier multipliers $(\mathrm{j}\omega)^N$.


A crucial ingredient for the fast wavelet-transform algorithm is the two-scale relation that links the B-spline basis functions at two successive levels of resolution. Specifically, we have that
\[
\beta_{\mathrm{L},i+1}(r) = \sum_{k\in\mathbb{Z}^d} h_i[k]\,\beta_{\mathrm{L},i}(r - D^i k),
\]
where the sequence $h_i$ specifies the scale-dependent refinement filter. The frequency response of $h_i$ is obtained by taking the ratio of the Fourier transforms of the corresponding B-splines as
\[
\hat{h}_i(\omega) = \frac{\hat{\beta}_{\mathrm{L},i+1}(\omega)}{\hat{\beta}_{\mathrm{L},i}(\omega)} \qquad (6.49)
= \frac{\sum_{k\in\mathbb{Z}^d} d_{i+1}[k]\,\mathrm{e}^{-\mathrm{j}\langle\omega,\,D^{i+1}k\rangle}}{\sum_{k\in\mathbb{Z}^d} d_i[k]\,\mathrm{e}^{-\mathrm{j}\langle\omega,\,D^i k\rangle}} \qquad (6.50)
\]
which is $2\pi(D^T)^{-i}$-periodic and hence defines a valid digital filter with respect to the spatial grid $D^i\mathbb{Z}^d$.

To illustrate those relations, we return to our introductory example in Section 6.1: the Haar wavelet transform, which is associated with the Fourier multipliers $\mathrm{j}\omega$ (derivative) and $(1-\mathrm{e}^{-\mathrm{j}\omega})$ (finite-difference operator). The dilation matrix is $D = 2$ and the localization filter is the same at all levels because the underlying derivative operator is scale-invariant. By plugging those entities into (6.48), we obtain the Fourier transform of the corresponding B-spline at resolution $i$ as
\[
\hat{\beta}_{\mathrm{D},i}(\omega) = 2^{-i/2}\,\frac{1-\mathrm{e}^{-\mathrm{j}2^i\omega}}{\mathrm{j}\omega},
\]
where the normalization by $2^{-i/2}$ is included to standardize the norm of the B-splines. The application of (6.49) then yields
\[
\hat{h}_i(\omega) = \frac{1}{\sqrt{2}}\,\frac{1-\mathrm{e}^{-\mathrm{j}2^{i+1}\omega}}{1-\mathrm{e}^{-\mathrm{j}2^i\omega}} = \frac{1}{\sqrt{2}}\left(1+\mathrm{e}^{-\mathrm{j}2^i\omega}\right),
\]
which, up to the normalization by $\sqrt{2}$, is the expected refinement filter with coefficients proportional to $(1,1)$, independently of the scale.
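The final simplification, and the scale-independence of the refinement filter, can be confirmed numerically: the ratio of the two geometric factors collapses to the two-tap filter $(1,1)/\sqrt{2}$ at every scale. A quick check (Python/NumPy, purely illustrative):

```python
import numpy as np

def haar_refinement_ratio(w, i):
    """h_i(w) = (1/sqrt(2)) (1 - exp(-j 2^(i+1) w)) / (1 - exp(-j 2^i w))."""
    return (1.0 - np.exp(-1j * 2.0 ** (i + 1) * w)) / \
           (np.sqrt(2.0) * (1.0 - np.exp(-1j * 2.0 ** i * w)))

# frequencies chosen so that the denominator never vanishes for i = 0..3
w = np.linspace(0.1, 0.7, 200)
for i in range(4):
    two_tap = (1.0 + np.exp(-1j * 2.0 ** i * w)) / np.sqrt(2.0)
    assert np.allclose(haar_refinement_ratio(w, i), two_tap)
print("ratio equals the two-tap filter (1, 1)/sqrt(2) at all tested scales")
```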

6.5.3 Construction of an operator-like wavelet basis

To keep the notation simple, we concentrate on the specification of the wavelet basis at the scale $i = 1$ with $W_1 = \mathrm{span}\{\psi_{1,k}\}_{k\in\mathbb{Z}^d\setminus D\mathbb{Z}^d}$ such that $W_1 \perp V_1$ and $V_0 = V_1 + W_1$, where $V_0 = \mathrm{span}\{\beta_{\mathrm{L}}(\cdot-k)\}_{k\in\mathbb{Z}^d}$ is the space of cardinal L-splines.

The relevant smoothing kernel is the interpolation function $\varphi_{\mathrm{int}} = \varphi_{\mathrm{int},0}$ for the space of cardinal $\mathrm{L}^*\mathrm{L}$-splines, which is generated by $(\beta_{\mathrm{L}}^{\vee} * \beta_{\mathrm{L}})(r)$ (autocorrelation of the generalized B-spline). This interpolator is best described in the Fourier domain using the formula
\[
\varphi_{\mathrm{int}}(r) = \mathcal{F}^{-1}\left\{\frac{|\hat{\beta}_{\mathrm{L}}(\omega)|^2}{\sum_{n\in\mathbb{Z}^d}|\hat{\beta}_{\mathrm{L}}(\omega+2\pi n)|^2}\right\}(r), \qquad (6.51)
\]
where $\hat{\beta}_{\mathrm{L}}$ (resp., $\hat{\beta}_{\mathrm{L}}^{\vee}$) is the Fourier transform of the generalized B-spline $\beta_{\mathrm{L}}$ (resp., $\beta_{\mathrm{L}}^{\vee}$). It satisfies the fundamental interpolation property
\[
\varphi_{\mathrm{int}}(k) = \delta[k] =
\begin{cases}
1, & \text{for } k = 0\\
0, & \text{for } k \in \mathbb{Z}^d\setminus\{0\}.
\end{cases} \qquad (6.52)
\]
The existence of such a function is guaranteed whenever $\beta_{\mathrm{L}} = \beta_{\mathrm{L},0}$ is an admissible B-spline. In particular, the Riesz-basis condition (6.26) implies that the denominator of $\hat{\varphi}_{\mathrm{int}}(\omega)$ in (6.51) is non-vanishing.
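Formula (6.51) can be probed numerically. For the centered polynomial B-spline of degree 3, whose Fourier transform is $\hat{\beta}(\omega) = (\sin(\omega/2)/(\omega/2))^4$, the sketch below (Python/NumPy, purely illustrative; the truncation parameters are ad hoc) builds $\hat{\varphi}_{\mathrm{int}}$ and recovers the interpolation property (6.52) by numerical inverse Fourier transform.

```python
import numpy as np

def bspline_hat(w, n=3):
    """Fourier transform of the centered B-spline of degree n: sinc^(n+1)."""
    return np.sinc(w / (2.0 * np.pi)) ** (n + 1)

def phi_int_sample(k, n=3, n_alias=40, w_max=40.0 * np.pi, n_grid=1 << 16):
    """phi_int(k) = (1/2pi) int phihat_int(w) cos(wk) dw, where
    phihat_int = |bhat|^2 / sum_m |bhat(w + 2 pi m)|^2   (formula (6.51))."""
    w = np.linspace(-w_max, w_max, n_grid)
    num = bspline_hat(w, n) ** 2
    den = sum(bspline_hat(w + 2.0 * np.pi * m, n) ** 2
              for m in range(-n_alias, n_alias + 1))
    phihat = num / den                    # bounded by 1, decays like w^(-2n-2)
    dw = w[1] - w[0]
    return float(np.sum(phihat * np.cos(w * k)) * dw / (2.0 * np.pi))

for k in range(3):
    print(k, phi_int_sample(k))   # approx 1 for k = 0, approx 0 otherwise
```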

The sought-after wavelets are then constructed as $\psi_{1,k}(r) = \psi_{\mathrm{L}}(r-k)/\|\psi_{\mathrm{L}}\|_{L_2}$, where the operator-like mother wavelet $\psi_{\mathrm{L}}$ is given by
\[
\psi_{\mathrm{L}}(r) = \mathrm{L}^*\varphi_{\mathrm{int}}(r), \qquad (6.53)
\]
where $\mathrm{L}^*$ is the adjoint of $\mathrm{L}$ with respect to the Hermitian-symmetric $L_2$ inner product. Also, note that we are removing the functions located on the next-coarser resolution grid $D\mathbb{Z}^d$ associated with $V_1$ (critically sampled configuration).

The proof of the following result is illuminating because it relies heavily on the notion of duality, which is central to our whole argumentation.

Proposition 19. The operator-like wavelet $\psi_{\mathrm{L}} = \mathrm{L}^*\varphi_{\mathrm{int}}$ satisfies the property $\langle s_1, \psi_{\mathrm{L}}(\cdot-k)\rangle_{L_2} = 0$ for all $k \in \mathbb{Z}^d\setminus D\mathbb{Z}^d$ and any spline $s_1 \in V_1$. Moreover, it can be written as $\psi_{\mathrm{L}}(r) = \mathrm{L}^*_{\mathrm{d}}\tilde{\beta}_{\mathrm{L}}(r) = \sum_{k\in\mathbb{Z}^d} d_{\mathrm{L}}[-k]\,\tilde{\beta}_{\mathrm{L}}(r-k)$, where $\{\tilde{\beta}_{\mathrm{L}}(\cdot-k)\}_{k\in\mathbb{Z}^d}$ is the dual basis of $V_0$ such that $\langle \beta_{\mathrm{L}}(\cdot-k), \tilde{\beta}_{\mathrm{L}}(\cdot-k')\rangle_{L_2} = \delta[k-k']$. This implies that $W_1 = \mathrm{span}\{\psi_{1,k}\}_{k\in\mathbb{Z}^d\setminus D\mathbb{Z}^d} \subset V_0$ and $W_1 \perp V_1$.

Proof. We pick an arbitrary spline $s_1 \in V_1$ and perform the inner-product manipulation
\[
\begin{aligned}
\langle s_1, \psi_{\mathrm{L}}(\cdot-k_0)\rangle_{L_2}
&= \langle s_1, \mathrm{L}^*\{\varphi_{\mathrm{int}}(\cdot-k_0)\}\rangle && \text{(by shift-invariance of } \mathrm{L}\text{)}\\
&= \langle \mathrm{L} s_1, \varphi_{\mathrm{int}}(\cdot-k_0)\rangle && \text{(by duality)}\\
&= \Big\langle \sum_{k\in\mathbb{Z}^d} a_1[k]\,\delta(\cdot-Dk),\ \varphi_{\mathrm{int}}(\cdot-k_0)\Big\rangle && \text{(by definition of } V_1\text{)}\\
&= \sum_{k\in\mathbb{Z}^d} a_1[k]\,\varphi_{\mathrm{int}}(Dk - k_0). && \text{(by definition of } \delta\text{)}
\end{aligned}
\]
Due to the interpolation property of $\varphi_{\mathrm{int}}$, the kernel values in the sum vanish whenever $Dk - k_0 \in \mathbb{Z}^d\setminus\{0\}$, which is the case for all $k\in\mathbb{Z}^d$ since $k_0 \notin D\mathbb{Z}^d$; this proves the first part of the statement.

As for the second claim, we consider the Fourier-domain expression of $\psi_{\mathrm{L}}$:
\[
\hat{\psi}_{\mathrm{L}}(\omega) = \overline{\hat{L}(\omega)}\,\hat{\varphi}_{\mathrm{int}}(\omega) = \overline{\hat{L}(\omega)\,\hat{\beta}_{\mathrm{L}}(\omega)}\;\hat{\tilde{\beta}}_{\mathrm{L}}(\omega)
\]
where
\[
\hat{\tilde{\beta}}_{\mathrm{L}}(\omega) = \frac{\hat{\beta}_{\mathrm{L}}(\omega)}{\sum_{n\in\mathbb{Z}^d}|\hat{\beta}_{\mathrm{L}}(\omega+2\pi n)|^2}
\]
is the Fourier transform of the dual B-spline $\tilde{\beta}_{\mathrm{L}}$. The above factorization implies that $\varphi_{\mathrm{int}}(r) = (\tilde{\beta}_{\mathrm{L}} * \beta_{\mathrm{L}}^{\vee})(r)$, which ensures the biorthonormality $\langle \beta_{\mathrm{L}}(\cdot-k), \tilde{\beta}_{\mathrm{L}}(\cdot-k')\rangle_{L_2} = \varphi_{\mathrm{int}}(k-k') = \delta[k-k']$ of the basis functions.

Finally, by replacing $\hat{\beta}_{\mathrm{L}}(\omega)$ by its explicit expression (6.25), we show that $\hat{L}(\omega)\,\hat{\beta}_{\mathrm{L}}(\omega) = \hat{L}_{\mathrm{d}}(\omega)$, where $\hat{L}_{\mathrm{d}}(\omega) = \sum_{k\in\mathbb{Z}^d} d_{\mathrm{L}}[k]\,\mathrm{e}^{-\mathrm{j}\langle k,\omega\rangle}$ is the frequency response of the (discrete) operator $\mathrm{L}_{\mathrm{d}}$. This implies that $\hat{\psi}_{\mathrm{L}}(\omega) = \overline{\hat{L}_{\mathrm{d}}(\omega)}\,\hat{\tilde{\beta}}_{\mathrm{L}}(\omega)$, which is the Fourier equivalent of $\psi_{\mathrm{L}} = \mathrm{L}^*_{\mathrm{d}}\tilde{\beta}_{\mathrm{L}}$.
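As an elementary illustration of (6.53), take $\mathrm{L} = \mathrm{D}$ in 1-D. Then $\mathrm{L}^*\mathrm{L} \longleftrightarrow \omega^2$, the $\mathrm{L}^*\mathrm{L}$-splines are piecewise-linear, and $\varphi_{\mathrm{int}}$ is simply the unit triangle (the linear B-spline is already interpolating). The mother wavelet $\psi_{\mathrm{L}} = \mathrm{D}^*\varphi_{\mathrm{int}} = -\varphi_{\mathrm{int}}'$ is then a two-sided rectangle of amplitude $\mp 1$, i.e., a Haar-type wavelet. A finite-difference check (Python/NumPy, purely illustrative):

```python
import numpy as np

def tri(r):
    """Interpolator phi_int for L = D: the unit triangle (linear B-spline)."""
    return np.maximum(1.0 - np.abs(r), 0.0)

# psi_L = D* phi_int = -d/dr phi_int, evaluated by central differences
r = np.linspace(-2.0, 2.0, 4001)
psi = -np.gradient(tri(r), r)

print(psi[np.argmin(np.abs(r + 0.5))])       # approx -1 on (-1, 0)
print(psi[np.argmin(np.abs(r - 0.5))])       # approx +1 on (0, 1)
print(np.max(np.abs(psi[np.abs(r) > 1.1])))  # 0 outside the support
```

Up to a shift and the normalization of Section 6.1, this is exactly the Haar wavelet, consistent with $\psi_{\mathrm{L}}$ behaving like the (adjoint) operator applied to a smoothing kernel.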


This interpolation-based method of construction is applicable to all the wavelet subspaces $W_i$ and leads to the specification of operator-like Riesz bases of $L_2(\mathbb{R}^d)$ under relatively mild assumptions on $\mathrm{L}$ [KUW]. Specifically, we have that $W_i = \mathrm{span}\{\psi_{i,k}\}_{k\in\mathbb{Z}^d\setminus D\mathbb{Z}^d}$ with
\[
\psi_{i,k}(r) \propto \mathrm{L}^*\varphi_{\mathrm{int},i-1}(r - D^{i-1}k), \qquad (6.54)
\]
where $\varphi_{\mathrm{int},i-1}$ is the $\mathrm{L}^*\mathrm{L}$-spline interpolator on the grid $D^{i-1}\mathbb{Z}^d$. The fact that the interpolator is specified with respect to the grid of the next finer spline space $V_{i-1} = \mathrm{span}\{\beta_{\mathrm{L},i-1}(\cdot - D^{i-1}k)\}_{k\in\mathbb{Z}^d}$ is essential to ensure that $W_i \subset V_{i-1}$. This kernel satisfies the fundamental interpolation property
\[
\varphi_{\mathrm{int},i-1}(D^{i-1}k) = \delta[k], \qquad (6.55)
\]
which results in $W_i$ being orthogonal to $V_i = \mathrm{span}\{\beta_{\mathrm{L},i}(\cdot - D^i k)\}_{k\in\mathbb{Z}^d}$ (the reasoning is the same as in the proof of Proposition 19, which covers the case $i = 1$). For completeness, we also provide the general expression of the Fourier transform of $\varphi_{\mathrm{int},i}$,

\[
\begin{aligned}
\hat{\varphi}_{\mathrm{int},i}(\omega)
&= |\det(D)|^i\,\frac{|\hat{\beta}_{\mathrm{L},i}(\omega)|^2}{\sum_{n\in\mathbb{Z}^d}\big|\hat{\beta}_{\mathrm{L},i}\big(\omega+2\pi(D^T)^{-i}n\big)\big|^2}\\
&= |\det(D)|^i\,\frac{|\hat{L}_{\mathrm{d},i}(\omega)|^2/|\hat{L}(\omega)|^2}{|\hat{L}_{\mathrm{d},i}(\omega)|^2\sum_{n\in\mathbb{Z}^d}\big|\hat{L}\big(\omega+2\pi(D^T)^{-i}n\big)\big|^{-2}}\\
&= |\det(D)|^i\,\frac{|\hat{L}(\omega)|^{-2}}{\sum_{n\in\mathbb{Z}^d}\big|\hat{L}\big(\omega+2\pi(D^T)^{-i}n\big)\big|^{-2}}, \qquad (6.56)
\end{aligned}
\]
which can be used to show that $\mathrm{L}^*\varphi_{\mathrm{int},i}(\cdot-D^i k) \propto \mathrm{L}^*_{\mathrm{d},i}\tilde{\beta}_{\mathrm{L},i}(\cdot-D^i k) \in V_i$ for any $k \in \mathbb{Z}^d$.
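The passage from the first to the last expression of (6.56) relies only on the $2\pi(D^T)^{-i}$-periodicity of $\hat{L}_{\mathrm{d},i}$. This can be checked numerically in 1-D: the sketch below (Python/NumPy, purely illustrative) takes $\mathrm{L} = \mathrm{D}^2$ with $D = 2$ and $i = 1$, so that $\hat{L}(\omega) = (\mathrm{j}\omega)^2$ and $\hat{L}_{\mathrm{d},1}(\omega) = (1-\mathrm{e}^{-\mathrm{j}2\omega})^2$ is $\pi$-periodic, and verifies that the first and last forms agree.

```python
import numpy as np

Lhat = lambda w: (1j * w) ** 2                       # L = D^2  ->  -w^2
Ld1  = lambda w: (1.0 - np.exp(-1j * 2.0 * w)) ** 2  # pi-periodic discrete symbol
bhat = lambda w: Ld1(w) / Lhat(w)                    # beta_{L,1}, as in (6.48)

w = np.linspace(0.1, 1.4, 200)   # keep clear of the zeros of Lhat and Ld1
N = 400                          # truncation of the alias sums
n = np.arange(-N, N + 1)

det_D = 2.0                      # |det(D)|^i with D = 2, i = 1
expr1 = np.empty_like(w)
expr3 = np.empty_like(w)
for idx, wi in enumerate(w):
    shifts = wi + np.pi * n      # w + 2 pi (D^T)^{-1} n = w + pi n
    expr1[idx] = det_D * np.abs(bhat(wi)) ** 2 / np.sum(np.abs(bhat(shifts)) ** 2)
    expr3[idx] = det_D * np.abs(Lhat(wi)) ** (-2) / np.sum(np.abs(Lhat(shifts)) ** (-2))
print(np.allclose(expr1, expr3))   # True: the two forms of (6.56) coincide
```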

While we have seen that this scheme produces an orthonormal basis for the first-order operator $\mathrm{P}_{\alpha}$ in Section 6.3.3, the general procedure only guarantees semi-orthogonality. More precisely, it ensures the orthogonality between the wavelet subspaces $W_i$. If necessary, one can always fix the intra-scale orthogonality a posteriori by forming appropriate linear combinations of wavelets at a given resolution. The resulting orthogonal wavelets will still be L-admissible in the sense of Definition 24. However, for $d > 1$, intra-scale orthogonalization is likely to spoil the simple, convenient structure of the above construction, which uses a single generator per scale irrespective of the number of dimensions. Indeed, the examples of multidimensional orthogonal wavelet transforms that can be found in the literature, either separable or non-separable, systematically involve $M = (\det(D)-1)$ distinct wavelet generators per scale. Moreover, unlike the present operator-like wavelets, they generally do not admit an explicit analytical description.

In summary, wavelets generally behave like differential operators, and it is possible to match them to a given class of stochastic processes. The wavelet transforms that are currently most widely used in applications act as multiscale derivatives or Laplacians. They are therefore best suited for the representation of fractal-type stochastic processes that are defined by scale-invariant SDEs [TVDVU09].

The general theme that emerges is that a signal transform will behave appropriately if it has the ability to suppress the signal components (polynomial or sinusoidal trends) that are in the null space of the whitening operator $\mathrm{L}$. This results in a stationarizing effect that is well documented in the Gaussian context [Fla89, Fla92]. This is the fundamental reason why vanishing moments are so important.


6.6 Bibliographical notes

Section 6.1

Alfred Haar constructed the orthogonal Haar system as part of his Ph.D. thesis, which he defended in 1909 under the supervision of David Hilbert [Haa10]. From then on, the Haar system remained relatively unnoticed until it was revitalized by the discovery of wavelets nearly one century later. Stéphane Mallat set the foundation of the multiresolution theory of wavelets in [Mal89] with the help of Yves Meyer, while Ingrid Daubechies constructed the first orthogonal family of compactly supported wavelets [Dau88]. Many of the early constructions of wavelets are based on splines [Mal89, CW91, UAE92, UAE93]. The connection with splines is actually quite fundamental in the sense that all multiresolution wavelet bases, including the non-spline brands such as Daubechies', necessarily include a B-spline as a convolution factor; the latter is responsible for their primary mathematical properties such as vanishing moments, differentiability, and order of approximation [UB03]. Further information on wavelets can be found in several textbooks [Dau92, Mey90, Mal09].

Section 6.2

Splines constitute a beautiful topic of investigation in their own right, with hundreds of papers specifically devoted to them. The founding father of the field is Schoenberg who, during war time, was asked to develop a computational solution for constructing an analytic function that fits a given set of equidistant noisy data points [Sch88]. He came up with the concept of spline interpolation and proved that polynomial spline functions have a unique expansion in terms of B-splines [Sch46]. While splines can also be specified for nonuniform grids and extended in a variety of ways [dB78, Sch81], the cardinal setting is especially pleasing because it lends itself to systematic treatment with the aid of the Fourier transform [Sch73]. The relation between splines and differential operators was recognized early on and led to the generalization known as L-splines [SV67].

The classical reference on partial differential operators and Fourier multipliers is [Hör80]. A central result of the theory is the Malgrange-Ehrenpreis theorem [Mal, Ehr54], and its extension stating that the convolution with a compactly supported generalized function is invertible [Hör05].

The concept of a Riesz basis is standard in functional analysis and approximation theory [Chr03]. The special case where the basis functions are integer translates of a single generator is treated in [AU94]. See also [Uns00] for a review of such representations in the context of sampling theory.

Section 6.3

The first-order illustrative example is borrowed from [UB05a, Figure 1] for the construction of the exponential B-spline, and from [KU06, Figure 1] for the wavelet part of the story.

Section 6.4

The 1-D theory of cardinal L-splines for ordinary differential operators with constant coefficients is due to Micchelli [Mic76]. In the present context, we are especially concerned with ordinary differential equations, which go hand-in-hand with the extended family of cardinal exponential splines [Uns05]. The properties of the relevant B-splines are investigated in full detail in [UB05a], which constitutes the ground material for Section 6.4. A key property of B-splines is their ability to reproduce polynomials. It is ensured by the Strang-Fix conditions (6.37), which play a central role in approximation theory [dB87, SF71]. While there is no fundamental difficulty in specifying cardinal-spline interpolators in multiple dimensions, it is much harder to construct compactly supported B-splines, except for the special cases of the box splines [DBH82, dBHR93] and exponential box splines [Ron88]. For elliptic operators such as the Laplacian, it is possible to specify exponentially decaying B-splines, with the caveat that the construction is not unique [MN90b, Rab92a, Rab92b]. This calls for some criterion to identify the most-localized solution [VDVBU05]. B-splines, albeit non-compactly supported ones, can also be specified for fractional operators [UB07]. This line of research was initiated by Unser and Blu with the construction of the fractional B-splines [UB00]. As suggested by the name, the (Gaussian) stochastic counterparts of these B-splines are Mandelbrot's fractional Brownian motions [MVN68], as we shall see in Chapters 7 and 8. The association is essentially the same as the connection between the B-spline of degree 0 (rect) and Brownian motion or, by extension, the whole family of Lévy processes (see Section 1.3).

Section 6.5

de Boor et al. were among the first to extend the notion of multiresolution analysis beyond the idea of dilation and to propose a general framework for constructing "non-stationary" wavelets [dBDR93]. Khalidov and Unser proposed a systematic method for constructing wavelet-like basis functions based on exponential splines and proved that these wavelets behave like differential operators [KU06]. The material in Section 6.5 is an extension of those ideas to the case of a generic Fourier-multiplier operator in multiple dimensions; the full technical details can be found in [KUW]. Operator-like wavelets have also been specified within the framework of conventional multiresolution analysis, in particular for the Laplacian and its iterates [VDVBU05, TVDVU09] and for the various brands of 1-D fractional derivatives [VDVFHUB10], which have the common property of being scale-invariant. Finally, we mention that each exponential-spline wavelet has a compactly supported Daubechies counterpart that is orthogonal and operator-like in the sense of having the same vanishing exponential moments [VBU07].
