Page 1

Sparse Sampling: Theory, Algorithms and Applications

Tutorial ICASSP 2009

Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU), Martin Vetterli (EPFL & UCB)

Page 2

Tutorial Outline

Introduction
Part 1: Overview (Martin Vetterli)
– Classical sampling revisited
– Generalizations of classical sampling
– Sparse sampling: FRI, CS
– Diverse applications

Part 2: Sampling Signals with Finite Rate of Innovation (Pina Marziliano)
– Definitions
– Periodic stream of Dirac pulses
– Applications: compression of EEG signals, removal of shot noise in band-limited signals

Part 3: Recovery of Noisy Sparse Signals (Thierry Blu)
– Acquisition in the noisy case
– Denoising strategies
– Cramér-Rao bounds
– Applications: optical coherence tomography

Part 4: Variations and Extensions (Pier-Luigi Dragotti)
– Alternative sampling kernels
– Sampling FRI signals with polynomial reproducing kernels
– Application: image super-resolution

Summary and Conclusions, Q&A

ICASSP 2009

Page 3

Sparse Sampling: Theory, Algorithms and Applications

Part 1

Tutorial ICASSP 2009

Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU), Martin Vetterli (EPFL & UCB)

Page 4

Spring-09 - 2

Acknowledgements

• Sponsors
– NSF Switzerland: thank you!

• Collaborations
– C. Lee, B. Aryan, D. Julian, S. Rangan, Qualcomm
– J. Kusuma, MIT/Schlumberger
– I. Maravic, Deutsche Bank
– P. Prandoni, Quividi

• Discussions and Interactions
– V. Goyal, MIT
– A. Fletcher, UCB
– M. Unser, EPFL

Page 5

Outline

1. Introduction: Shannon and beyond…
2. Signals with finite rate of innovation: shift-invariant spaces and Poisson spikes; sparsity in CT and DT
3. Sampling signals at their rate of innovation: the basic set-up (periodic set of Diracs, sampled at Occam's rate); variations on the theme (Gaussians, splines and Strang-Fix)
4. The noisy case: total least squares and Cadzow
5. Sparse sampling, compressed sensing, and compressive sampling: sparsity, fat matrices, and frames; norms and algorithms
6. Applications and technology: wideband communications, UWB, superresolution; technology transfer
7. Discussion and conclusions

Page 6

The Question!

You are given a class of objects: a class of functions (e.g. bandlimited, BL). You have a sampling device, as usual, to acquire the real world:
• Smoothing kernel or lowpass filter
• Regular, uniform sampling
• That is, the workhorse of sampling!

Obvious question:

When does a minimum number of samples uniquely specify the function?

Page 7

The Question!

About the observation kernel:
Given by nature
• Diffusion equation, Green's function
• Ex: sensor networks
Given by the set-up
• Designed by somebody else, it is out there
• Ex: Hubble telescope
Given by design
• Pick the best kernel
• Ex: engineered systems, but with constraints

About the sampling rate:
Given by the problem
• Ex: sensor networks
Given by design
• Usually, as low as possible
• Ex: digital UWB


Page 8

1. Shannon and Beyond… 1948 Foundation paper

If a function x(t) contains no frequencies higher than W cps, it is completely determined by giving its ordinates at a series of points spaced 1/(2W) seconds apart.[if approx. T long, W wide, 2TW numbers specify the function]

It is a representation theorem:
• {sinc(t − n)}, n ∈ Z, is an orthogonal basis for BL[−π, π]
• x(t) ∈ BL[−π, π] can be written as

  x(t) = Σ_{n∈Z} x(n) sinc(t − n)

Note:
• … slow convergence of the series
• Shannon-BW: BL is sufficient, not necessary
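The representation above is easy to check numerically. A minimal sketch (assuming numpy; the test signal and the truncation window are illustrative choices), which also makes the slow convergence of the series visible:

```python
import numpy as np

def sinc_series(samples, n0, t):
    """Evaluate the truncated Shannon series  x(t) ~ sum_n x(n) sinc(t - n)
    from the samples x(n), n = n0 .. n0 + len(samples) - 1."""
    n = n0 + np.arange(len(samples))
    # np.sinc(u) = sin(pi*u)/(pi*u): exactly the interpolation kernel here
    return samples @ np.sinc(t - n[:, None])

# A signal in BL[-pi, pi] (actual bandwidth pi/2), sampled at the integers
x = lambda t: np.sinc(t / 2)

n0, N = -100, 201                       # truncation window for the series
samples = x(np.arange(n0, n0 + N, dtype=float))
t = np.linspace(-5.0, 5.0, 50)
err = np.max(np.abs(sinc_series(samples, n0, t) - x(t)))
print(err)    # small, but limited by the slowly decaying (~1/n) tail
```

Doubling the truncation window shrinks the error only linearly, which is the slow convergence the note refers to.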


Page 9

Shannon’s Theorem… a bit of History

Whittaker 1915, 1935
Nyquist 1928
Kotelnikov 1933
Raabe 1938
Gabor 1946
Shannon 1948
So… you have never heard of Herbert Raabe?

– Wrote his PhD on the sampling theorem…– Wrong time– Wrong country– Wrong lab…

Page 10

Shannon’s Theorem: Variations on the subspace theme

• Non-uniform sampling
– Kadec 1/4 theorem
• Derivative sampling
– Sample the signal and its derivative… at half the rate
• Stochastic
– Power spectrum is good enough
• Shift-invariant subspaces
– Generalize from sinc to other shift-invariant Riesz bases (orthogonal and biorthogonal)
– Structurally, it is the same thing!
• Multichannel sampling
– Known shift: easy
– Unknown shift: interesting (superresolution imaging)

Page 11

A Sampling Nightmare….

Shannon BL case:

  x(t) = Σ_n x(nT) sinc(t/T − n),  or 1/T degrees of freedom per unit time

• But: a single discontinuity, and no more sampling theorem…

• Are there other signals with finite number of degrees of freedom per unit of time that allow exact sampling results?

Page 12

Examples of non-bandlimited signals

Page 13

1. Shannon and Beyond…

• Is there a sampling theory beyond Shannon?
– Shannon: bandlimitedness is sufficient but not necessary
– Shannon bandwidth: dimension of subspace
– Shift-invariant spaces: popular with wavelets, etc.

• Thus, develop a sampling theory for classes of non-bandlimited but sparse signals!

[Figure: generic continuous-time sparse signal x(t), with weights x0 … xN−1 at locations t0 … tN−1]

Page 14

Related work

Spectral estimation, Fourier analysis, sampling
– Prony 1795
– classic high-resolution spectral estimation (Stoica/Moses)
– uncertainty principle (Donoho and Stark, 1989)
– spectrum-blind sampling (Bresler et al., 1996, 1999)

Error correction coding
– Errors are sparse… Reed-Solomon, Berlekamp-Massey

Randomized algorithms for fast estimation
– Faster Fourier transforms, Gilbert et al., 2005
– Sketches of databases, Indyk et al.

Compressed sensing, compressive sampling
– Gribonval, Nielsen 2003; Donoho, Elad 2003
– Donoho et al., 2006; Candès et al., 2006
– etc.!

Page 15

Outline

1. Introduction
2. Signals with finite rate of innovation
   Poisson spikes
   Rate of innovation ρ
   The set-up and the key questions
3. Sampling signals at their rate of innovation
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive sampling
6. Applications and technology
7. Discussion and conclusions

Page 16

2. Sparsity and Signals with Finite Rate of Innovation

Sparsity:
CT: parametric class, with degrees of freedom (e.g. K Diracs, or 2K degrees of freedom)
DT: N-dimensional vector x, and its l0 norm K = ||x||_0; then K/N indicates sparsity

Compressibility:
• Object expressed in an ortho-basis has a fast-decaying nonlinear approximation (NLA) error
• For ex., decay of ordered wavelet coefficients as O(k^−1)

ρ: rate of innovation, or degrees of freedom per unit of time.
Call C_T the number of degrees of freedom in the interval [−T/2, T/2]; then

  ρ = lim_{T→∞} C_T / T

Note: relation to the information rate in Shannon-48.

Page 17

2. Signals with Finite Rate of Innovation

Ex: Poisson process: a set of Dirac pulses with exponentially distributed spacings |t_k − t_{k+1}|

Innovations: the times {t_k}; rate of innovation = average number of Diracs per second.
Call C_T the number of Diracs in the interval [−T/2, T/2]; then

  ρ = lim_{T→∞} C_T / T

Poisson process: the average spacing is 1/λ, thus ρ = λ.

Weighted Diracs:

  x(t) = Σ_k x_k δ(t − t_k)

with innovations {t_k, x_k}.

Obvious question: is there a sampling theorem at ρ samples per second?
Necessity: obvious! Sufficiency?
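The Poisson rate of innovation is easy to check by simulation. A small sketch (numpy assumed; λ and the observation window are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0                        # average number of Diracs per second
T = 10_000.0                     # observation window [0, T]

# Poisson process: i.i.d. exponential gaps with mean 1/lambda
gaps = rng.exponential(1.0 / lam, size=int(2 * lam * T))
times = np.cumsum(gaps)
C_T = np.count_nonzero(times <= T)    # number of Diracs in the window
rho = C_T / T                         # empirical rate of innovation
print(rho)                            # close to lam
```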

Page 18

2. Signals with Finite Rate of Innovation

The set-up:

  x(t) → [sampling kernel φ] → y(t) → sampling at t = nT → y_n = ⟨x(t), φ(t − nT)⟩

where x(t): signal, φ(t): sampling kernel, y(t): filtered version of x(t), and y_n: samples.

For a sparse input, like a weighted sum of Diracs (in DT and CT):
• When is there a one-to-one map y_n ⇔ x(t)?
• Efficient algorithm?
• Stable reconstruction?
• Robustness to noise?
• Optimality of recovery?

Page 19

Outline

1. Introduction
2. Signals with finite rate of innovation
3. Sampling signals at their rate of innovation
   The basic set-up: periodic set of Diracs, sampled at their rate of innovation
   A representation theorem [VMB:02]
   Toeplitz system
   Annihilating filter and root finding
   Total least squares and SVD
   Variations
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive sampling
6. Applications and technology
7. Discussion and conclusions

Page 20

An example and overview

K Diracs on the interval: 2K degrees of freedom. Periodic case:

  x(t) = Σ_{n∈Z} Σ_{k=0}^{K−1} x_k δ(t − t_k − nτ)

• Key: the Fourier series is a weighted sum of K exponentials

  X_m = (1/τ) Σ_{k=0}^{K−1} x_k e^{−i2πm t_k/τ}

• Result: taking 2K+1 samples from a lowpass version of bandwidth (2K+1) allows one to perfectly recover x(t)

[Figure: a τ-periodic stream of Diracs with weights x_k at locations t_k]

Page 21

A Representation Theorem [VMB:02]

Theorem
Periodic stream of K Diracs, of period τ, with weights {x_k} and locations {t_k}.

Take a sampling kernel φ(t) of bandwidth B, with Bτ an odd integer > 2K (e.g. a Dirichlet kernel). Then the N samples, N > Bτ, T = τ/N,

  y_n = ⟨x(t), φ(t − nT)⟩

are a sufficient characterization of x(t).

Page 22

Note: Linear versus Non-Linear Problem

The problem is non-linear in t_k, and linear in x_k given t_k.
Consider two such streams of K Diracs, of period τ, with weights and locations {x_k, t_k} and {x'_k, t'_k}, respectively. The sum is in general a stream with 2K Diracs.

But, given a set of locations {t_k}, the problem is linear in {x_k}.

The key to the solution: separability of the non-linear from the linear problem.

Note: finding locations is a key task in estimation/retrieval of sparse signals, but also in registration, feature detection, etc.

Page 23

Notes on Proof

The procedure is constructive, and leads to an algorithm:
1. Take 2K+1 samples y_n from the Dirichlet kernel output
2. Compute the DFT to obtain the Fourier series coefficients X_m, m = −K…K
3. Solve a K-by-K Toeplitz system of equations to get the annihilating filter H(z)
4. Find the roots of H(z) by factorization, to get u_k and t_k
5. Solve a K-by-K Vandermonde system of equations to get x_k

The complexity is:
1. Analog-to-digital converter
2. O(K log K)
3. O(K²)
4. O(K³) (can be accelerated)
5. O(K²)

Or, polynomial in K!
Note: for a size-N vector with K Diracs, O(K³) complexity, noiseless case.

Page 24

Generalizations [VMB:02]

For the class of periodic FRI signals, which includes
– Sequences of Diracs
– Non-uniform or free-knot splines
– Piecewise polynomials

there are sampling schemes operating at the rate of innovation, with perfect recovery and polynomial complexity.

• Variations: finite length, 2D, local kernels etc

Page 25

Generalizations [VMB:02]

[Figures: recovery examples with sinc, Gaussian, and spline kernels]

Page 26

Generalizations [DVB:07]

The return of Strang-Fix!

Local, polynomial-complexity reconstruction, for Diracs and piecewise polynomials

Pure discrete-time processing!

Page 27

Generalizations: 2D extensions [MV:06]

Note: true 2D processing!

Page 28

Outline

1. Introduction2. Signals with finite rate of innovation3. Sampling signals at their rate of innovation4. The noisy case

Total least squaresCadzow or model based denoisingUncertainty principles on location and amplitude of DiracsAlgorithmsExamples

5. Sparse sampling, compressed sensing, and compressive sampling6. Applications and technology7. Discussion and conclusions

Page 29

The noisy case…

Acquisition in the noisy case:

where “analog” noise enters before acquisition (e.g. communication noise on a channel) and “digital” noise is due to acquisition (ADC, etc.)

Example: Ultrawide band (UWB) communication….

Page 30

The noisy case: The non-linear estimation problem

Data:

  y_n = Σ_k x_k φ(nT − t_k) + ε_n,  n = 0, …, N − 1,

where T = τ/N, φ(t) is the Dirichlet kernel, and N/(ρτ) is the redundancy.

Estimation: find the parameter set {x_k, t_k} such that

  Σ_n | y_n − Σ_k x_k φ(nT − t_k) |²

is minimized.

This is: a non-linear estimation problem in {t_k}, a linear estimation problem in {x_k} given {t_k}.
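Part 3 treats the noisy case in detail; as a preview, here is a minimal sketch of Cadzow's model-based denoising mentioned in the outline: alternately project a Toeplitz matrix built from the noisy Fourier coefficients onto rank-K matrices and back onto Toeplitz structure. The matrix shape, noise level and iteration count are illustrative choices, not values from the slides:

```python
import numpy as np

def cadzow_denoise(X, K, n_iter=30):
    """Cadzow iteration: alternate a rank-K truncation (SVD) and a
    Toeplitz projection (diagonal averaging) on a matrix of the
    noisy Fourier coefficients X[m], m = -M..M (len(X) = 2M+1)."""
    N = len(X)
    L = N // 2                 # matrix is (N - L) x (L + 1), near square
    X = X.astype(complex)
    for _ in range(n_iter):
        T = np.array([[X[L + i - j] for j in range(L + 1)]
                      for i in range(N - L)])          # Toeplitz build
        U, s, Vh = np.linalg.svd(T, full_matrices=False)
        T = (U[:, :K] * s[:K]) @ Vh[:K]                # best rank-K approx.
        for d in range(-L, N - L):                     # diagonal averaging
            vals = [T[i, i - d] for i in range(N - L) if 0 <= i - d <= L]
            X[L + d] = np.mean(vals)
    return X

# Toy data: 21 Fourier coefficients of K = 2 Diracs, plus noise
rng = np.random.default_rng(1)
tau = 1.0
t_true, x_true = np.array([0.2, 0.55]), np.array([1.0, 1.5])
m = np.arange(-10, 11)
X_clean = (np.exp(-2j * np.pi * np.outer(m, t_true) / tau) @ x_true) / tau
X_noisy = X_clean + 0.05 * (rng.standard_normal(21) + 1j * rng.standard_normal(21))
X_den = cadzow_denoise(X_noisy, K=2)

e_noisy = np.linalg.norm(X_noisy - X_clean)
e_den = np.linalg.norm(X_den - X_clean)
print(e_noisy, e_den)    # denoised coefficients are closer to the clean ones
```

The denoised coefficients can then be fed to the annihilating-filter steps of the noiseless algorithm.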

Page 31

Outline

1. Introduction
2. Signals with finite rate of innovation
3. Sampling signals at their rate of innovation
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive sampling
   Discrete, finite-size set-up: sparse vectors
   Fat matrices and frames
   Unicity and conditioning
   Reconstruction algorithms
   Compressed sensing and compressive sampling
6. Applications and technology
7. Discussion and conclusions

Page 32

Sparsity revisited… Discrete, finite-dimensional

Consider a discrete-time, finite-dimensional set-up.
Model:
– World is discrete, finite-dimensional of size N
– x ∈ R^N, but ||x||_0 = K << N: K-sparse in a basis Φ

Method:
– Take M measurements, where K < M << N
– Measurement matrix F: a fat matrix of size M × N

[Figure: y = Fx, with y the M × 1 measurement vector, F the M × N measurement matrix, and x the N × 1 signal with K non-zeros in the sparsity basis Φ]

Page 33

Geometry of the problem

This is a vastly under-determined system…
– F is a frame matrix, of size M by N, M << N
– Key issue: the (N choose K) submatrices (K out of the N columns) of F
– Consider the set of submatrices {F_k}, k = 1 … (N choose K)
– Calculate the projection of y onto the range of each F_k
– If y lies in the range of F_k, a possible solution
– In general, choose the k minimizing the projection residual ||y − F_k F_k^† y||
– Note: this is hopeless in general ;)

Necessary conditions (for most inputs, or with probability 1):
– M > K
– all F_k must have rank K
– all ranges of F_k must be different
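For toy sizes, the subspace search described above can be run exhaustively. A sketch (numpy assumed; the sizes are toy values, and the (N choose K) loop is exactly why this is hopeless in general):

```python
import numpy as np
from itertools import combinations

def l0_search(F, y, K):
    """Exhaustive l0 recovery: try every (N choose K) column subset F_k
    and keep the one whose range is closest to y (smallest residual)."""
    N = F.shape[1]
    best_r, best_S, best_c = np.inf, None, None
    for S in combinations(range(N), K):
        c, *_ = np.linalg.lstsq(F[:, S], y, rcond=None)   # project y on range(F_k)
        r = np.linalg.norm(y - F[:, S] @ c)
        if r < best_r:
            best_r, best_S, best_c = r, S, c
    x = np.zeros(N)
    x[list(best_S)] = best_c
    return x, best_S, best_r

rng = np.random.default_rng(0)
M, N, K = 5, 12, 2
F = rng.standard_normal((M, N))      # generic fat matrix: submatrices full rank
x_true = np.zeros(N)
x_true[[3, 9]] = [1.0, -2.0]         # K-sparse input
y = F @ x_true
x_hat, S, r = l0_search(F, y, K)
print(S, r)                          # true support, residual ~ 0
```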

Page 34

Geometry of the problem

Easy solution… Vandermonde matrices!
– Each submatrix F_k is of rank K and spans a different subspace
– Particular case: the Fourier (DFT) matrix, or a slice of it

Page 35

The power of random matrices! Donoho, Candes et al

Set-up: x ∈ R^N with ||x||_0 = K << N (K-sparse); F of size M by N, a measurement matrix with random entries (Gaussian, Bernoulli)
– Pick M = O(K log N/K)
– With high probability, this matrix is good!

Condition on the matrix F:
– Uniform uncertainty principle, or restricted isometry property: all K-sparse vectors x satisfy an approximate norm conservation

  (1 − δ) ||x||² ≤ ||Fx||² ≤ (1 + δ) ||x||²

Reconstruction method:
– Solve the linear program

  min ||x||_1 under the constraint y = Fx

Strong result: l1 minimization finds, with high probability, the sparse solution; i.e. the l0 and l1 problems have the same solution.
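The linear program above needs an LP solver; as a dependency-free stand-in, here is a sketch using ISTA (iterative soft-thresholding) for the closely related lasso relaxation, followed by a least-squares debiasing step. The sizes, seed, λ and iteration count are illustrative choices, not part of the slides:

```python
import numpy as np

def ista(F, y, lam=0.01, n_iter=3000):
    """Iterative soft-thresholding for min ||y - Fx||^2/2 + lam*||x||_1,
    a simple stand-in for the slide's l1 linear program."""
    L = np.linalg.norm(F, 2) ** 2            # Lipschitz constant, step 1/L
    x = np.zeros(F.shape[1])
    for _ in range(n_iter):
        z = x + F.T @ (y - F @ x) / L                            # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)    # soft-threshold
    return x

rng = np.random.default_rng(2)
M, N, K = 50, 100, 4
F = rng.standard_normal((M, N)) / np.sqrt(M)   # random Gaussian measurements
supp = rng.choice(N, K, replace=False)
x_true = np.zeros(N)
x_true[supp] = rng.choice([-1.0, 1.0], K)
y = F @ x_true

x_l1 = ista(F, y)
S = np.sort(np.argsort(-np.abs(x_l1))[:K])     # detected support: K largest
x_hat = np.zeros(N)
x_hat[S] = np.linalg.lstsq(F[:, S], y, rcond=None)[0]   # debias on the support
```

With high probability over the random matrix, the detected support is the true one and the debiased estimate is exact.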

Page 36

The power of random matrices!

Notes:
Works for sparse in any basis
– Just replace F by FΘ, where Θ is the sparsity basis
Works for approximately sparse signals
– For ex., fast decay of wavelet coefficients
Works for the noisy case
– LP will work by picking large coefficients (like wavelet denoising)

Price:
– Unstructured matrices (e.g. computing Fx can be expensive)
– Some redundancy (log N factor)

Gain:
– Universality of measurement
– General set-up

Page 37

Relation to compressed sensing: Differences

Sparse sampling of signal innovations:
+ Continuous or discrete, infinite or finite dimensional
+ Lower bounds (CRB), provably optimal reconstruction
+ Close to “real” sampling, deterministic
– Not universal, designer matrices

Compressed sensing:
+ Universal and more general
± Probabilistic, can be complex
– Discrete, redundant

The real game: in the space of frame measurement matrices F
• Best matrix? (constrained Grassmannian manifold)
• Tradeoff for M: 2K, K log N, K log² N
• Complexity (measurement, reconstruction)

Page 38

Outline

1. Introduction
2. Signals with finite rate of innovation
3. Sampling signals at their rate of innovation
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive sampling
6. Applications and technology
   Superresolution
   Joint sparsity estimation
   Wideband communications, ranging, channel estimation
   Acoustic tomography
   A technology transfer story

7. Discussion and conclusions

Page 39

Applications: The proof of the pie is in …

1. Look for sparsity!
– In either domain, in any basis
– Different singularities
– Model is key
– Physics might be at play

2. Choose operating point, algorithm
– Low noise, high noise?
– Model precise or approximate?
– Choice of kernel?
– Algorithmic complexity? Iterative?

3. It is not a black box…
– It is a toolbox
– Handcraft the solution
– No free lunch!

Page 40

Applications: Joint Sparsity Estimation

Two non-sparse signals, but related by a sparse convolution:
– Often encountered in practice
– Distributed sensing
– Room impulse measurement

Question: what is the sampling rate region? Similar to the Slepian-Wolf problem in information theory.

Page 41

Applications: Joint Sparsity Estimation

Sampling rate region:
– Universal (all signals): no gain!
– Almost sure (undecodable signals of measure zero)

[Figure: universal vs almost-sure recovery rate regions, annihilating filter method; Roy, Hormati, Lu et al., ICASSP 2009]

Page 42

Applications: Joint Sparsity Estimation

Ranging, room impulse response, UWB:
– Known signal x1, low-rate acquisition of x2

Page 43

Conclusions and Outlook

Sampling at the rate of innovation:
– Cool!
– Sharp theorems
– Robust algorithms
– Provable optimality over wide SNR ranges

Many actual and potential applications:
– Fit the model (needs some work)
– Apply the “right” algorithm
– Catch the essence!

Still a number of good questions open, from the fundamental to the algorithmic and the applications.

A slogan: sort the wheat from the chaff… at Occam's rate!

Page 44

Publications

Special issue on Compressive Sampling:
– R. Baraniuk, E. Candès, R. Nowak and M. Vetterli (Eds.), IEEE Signal Processing Magazine, March 2008. Ten papers overviewing the theory and practice of sparse sampling, compressed sensing and compressive sampling (CS).

Basic paper:
– M. Vetterli, P. Marziliano and T. Blu, “Sampling Signals with Finite Rate of Innovation,” IEEE Transactions on Signal Processing, June 2002.

Main paper, with comprehensive review:
– T. Blu, P.-L. Dragotti, M. Vetterli, P. Marziliano, and L. Coulot, “Sparse Sampling of Signal Innovations: Theory, Algorithms and Performance Bounds,” IEEE Signal Processing Magazine, Special Issue on Compressive Sampling, March 2008.

Page 45

Publications

For more details:
– P.-L. Dragotti, M. Vetterli and T. Blu, “Sampling Moments and Reconstructing Signals of Finite Rate of Innovation: Shannon Meets Strang-Fix,” IEEE Transactions on Signal Processing, May 2007.
– P. Marziliano, M. Vetterli and T. Blu, “Sampling and Exact Reconstruction of Bandlimited Signals with Shot Noise,” IEEE Transactions on Information Theory, Vol. 52, No. 5, pp. 2230-2233, 2006.
– I. Maravic and M. Vetterli, “Sampling and Reconstruction of Signals with Finite Rate of Innovation in the Presence of Noise,” IEEE Transactions on Signal Processing, Aug. 2005.
– I. Maravic and M. Vetterli, “Exact Sampling Results for Some Classes of Parametric Non-Bandlimited 2-D Signals,” IEEE Transactions on Signal Processing, Vol. 52, No. 1, pp. 175-189, 2004.

Page 46

Thank you for your attention!

http://panorama.epfl.ch

Questions?

Page 47

Sparse Sampling: Theory, Algorithms and Applications

Part 2

Tutorial ICASSP 2009

Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU), Martin Vetterli (EPFL & UCB)

Page 48

Outline

• Mathematical background
– Dirac pulses and applications
– Fourier series, Z-transform
– Toeplitz and Vandermonde matrices
– Classical sampling theorem

• Signals with finite rate of innovation
– Definition and examples

• Periodic stream of Dirac pulses
– Sampling and reconstruction scheme: step by step

• Applications
– Modeling and compression of neonatal EEG seizure signals
– Removal of shot noise in band-limited signals
– Demo

ICASSP 2009 - 2

Page 49

Signal Representations

1822 Jean Baptiste Joseph Fourier: Fourier transform
1872 Ludwig Boltzmann: entropy of a single gas particle
1915 E. T. Whittaker: functions and their expansions in shift-orthonormal spaces
1928 Harry Nyquist: distortionless transmission in band-limited channels
1929 Norbert Wiener: Hermitian polynomials and Fourier analysis
1937 E. U. Condon: fractional Fourier kernel
1948 C. E. Shannon: sampling theorem and theory of communication
1985 The wavelet era
2000 The POST-wavelet era: sparse signal representations / compressive sampling

Page 50

Definition

(Paul) Dirac's delta function (unit impulse):

  ∫ δ(t) φ(t) dt = φ(0)

The delta function is the continuous analogue of the discrete Kronecker delta.

Self-Fourier property of a stream of Diracs (Dirac comb):

  Σ_n δ(t − nT)  ⟷  (2π/T) Σ_k δ(ω − 2πk/T)

[Figure: a Dirac impulse δ(t − t0) at location t0]
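A discrete analogue of the comb's self-Fourier property is easy to verify with the DFT: a comb with spacing P in a length N = PQ window transforms into a comb with spacing Q (and height Q). A sketch assuming numpy; P and Q are arbitrary choices:

```python
import numpy as np

P, Q = 8, 16
N = P * Q
comb = np.zeros(N)
comb[::P] = 1.0                         # Q unit impulses, P samples apart
C = np.fft.fft(comb)
print(np.round(np.abs(C[::Q]), 6))      # height Q at multiples of Q, 0 elsewhere
```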

Page 51

Example: multipath signal propagation

The mathematical model of multipath propagation can be represented using a stream of Diracs:

  y(t) = h(t) * x(t) = Σ_{n=0}^{N−1} λ_n e^{iφ_n} x(t − τ_n)

with the N-path channel

  h(t) = Σ_{n=0}^{N−1} λ_n e^{iφ_n} δ(t − τ_n)

where
  N: number of possible paths
  τ_n: time delay of the n-th path
  λ_n e^{iφ_n}: complex amplitude of the n-th received pulse
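The multipath model can be simulated directly. A discrete-time sketch (numpy assumed; the sampling rate, pulse shape, delays and gains are arbitrary illustrative values):

```python
import numpy as np

# Discrete-time sketch of  y(t) = sum_n lam_n * exp(i*phi_n) * x(t - tau_n)
fs = 1000                                    # samples per second (assumed)
t = np.arange(0, 1.0, 1 / fs)
x = np.sinc(40 * (t - 0.1))                  # transmitted pulse, centered at 0.1 s

delays = np.array([0.05, 0.21, 0.33])        # tau_n, seconds
gains = np.array([1.0, 0.6, 0.3]) * np.exp(1j * np.array([0.0, 1.2, -0.7]))

h = np.zeros(len(t), dtype=complex)          # sparse channel: 3 weighted Diracs
h[np.round(delays * fs).astype(int)] = gains
y = np.convolve(x, h)[: len(t)]              # received signal: shifted pulses
print(np.argmax(np.abs(y)) / fs)             # strongest echo at 0.1 + 0.05 s
```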

Page 52

Example: pulse position modulation (PPM)

[Figures: signal of interest, pulse-amplitude-modulated signal, unmodulated pulse train, pulse-position-modulated signal]

Modulation of a pulse carrier wherein the value of each instantaneous sample of a modulating wave varies the position in time of a pulse relative to its unmodulated time of occurrence.

Page 53

Useful matrices

Defn.: A matrix is Toeplitz if each diagonal has a constant value:

  A_Toeplitz =
  [ a_0    a_{−1}   a_{−2}  …  a_{−K}   ]
  [ a_1    a_0      a_{−1}  …  a_{−K+1} ]
  [ a_2    a_1      a_0     …  a_{−K+2} ]
  [ …                                   ]
  [ a_K    a_{K−1}  a_{K−2} …  a_0      ]

Defn.: A matrix is Vandermonde if each row is an elementwise power of the first:

  A_Vandermonde =
  [ 1          1          …  1          ]
  [ a_1        a_2        …  a_K        ]
  [ a_1²       a_2²       …  a_K²       ]
  [ …                                   ]
  [ a_1^{K−1}  a_2^{K−1}  …  a_K^{K−1}  ]

Note:
• A Vandermonde system of equations admits a unique solution when a_m ≠ a_n for all m ≠ n.
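Both definitions are easy to exercise with numpy: `np.vander` builds the Vandermonde matrix (transposed relative to the row-of-powers layout above), and the uniqueness condition shows up as a singular matrix as soon as two nodes coincide. A sketch with arbitrary nodes:

```python
import numpy as np

a = np.array([0.5, -1.0, 2.0, 3.0])         # distinct nodes a_1..a_K
V = np.vander(a, increasing=True).T         # V[m, k] = a_k ** m, rows are powers
c_true = np.array([1.0, 2.0, -0.5, 0.25])
b = V @ c_true
c = np.linalg.solve(V, b)                   # unique solution: nodes distinct
print(c)

# Repeated nodes violate the condition: the matrix becomes singular
V_bad = np.vander(np.array([0.5, 0.5, 2.0, 3.0]), increasing=True).T
print(np.linalg.det(V_bad))                 # ~ 0: no unique solution
```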

Page 54

Review : Fourier and Z-transform domain

ICASSP 2009 - 8

Time domain Transform domainNon-periodic Fourier Transform (FT or CTFT)

- Periodic Fourier Series (FS or CTFS)

Non-periodic Discrete-time Fourier Transform (DTFT)

Non-periodic Z-transform

- Periodic Discrete Fourier Series (DFS)

- Finite length Discrete Fourier Transform (DFT)

1( ) ( )

2i t

c cx t X e d

( ) ( ) ,i t

c cX x t e dt

2 /0 0

ˆ( ) , [ , ]i tmp m

m

x t x e t t t

0

0

2 /1ˆ ( ) ,

ti mt

m ptx x t e dt m

( ) [ ] , [ , ]i i n

n

X e x n e

12 /

0

[ ] , 0, , 1N

i kn Nk p

n

c x ne k N

1[ ] ( ) ,

2i i nx n X e e d n

12 /

0

1[ ] , 0, , 1

Ni nk N

p kk

x n c e n NN

τ

N

Con

tinuo

us-ti

me

Dis

cret

e-tim

e

N

12 /

0

1[ ] [ ] , 0, , 1

Ni nk N

k

x n X k e n NN

12 /

0

[ ] [ ] , 0, , 1N

i kn N

n

X k x ne k N

( ) [ ] ,n

n

X z x n z z

11[ ] ( ) ,

2n

Cx n X z z dz n

i
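The DFT/IDFT pair in the last row can be checked numerically; `numpy.fft` uses exactly this convention (unnormalized forward sum, 1/N on the inverse):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
x = rng.standard_normal(N)
n = np.arange(N)

# Forward DFT from the definition: X[k] = sum_n x[n] e^{-i 2 pi k n / N}
X = np.array([np.sum(x * np.exp(-2j * np.pi * k * n / N)) for k in range(N)])

# Inverse from the definition: x[n] = (1/N) sum_k X[k] e^{+i 2 pi k n / N}
x_rec = np.array([np.sum(X * np.exp(2j * np.pi * n * m / N)) for m in range(N)]) / N
```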

Page 55:

Thm. 1: Classical sampling theorem

A bandlimited signal x(t) \in BL_{[-\omega_M, \omega_M]} := \{x : X(\omega) = 0,\; |\omega| > \omega_M\} is perfectly recovered from a set of samples x(nT) taken T apart if the sampling rate is greater than or equal to two times the maximum frequency \omega_M:

\omega_s \geq 2\omega_M \;\Leftrightarrow\; T \leq \frac{\pi}{\omega_M}

Page 56: Sparse Sampling : Theory, Algorithms and Applications · Is there a sampling theory beyond Shannon? – Shannon: bandlimitedness is sufficient but not necessary – Shannon bandwidth:

ICASSP 2009 - 10

Sampling and reconstruction of band-limited signalsTime domain

Frequency domain

( )n

t nT

( )ny x nT( )x t sinc ( ) sinc( )nn

x t y t nT

( )s sn

k

( )X

1( ) ( )s s

k

Y X kT

rect ( ) ( )sX TY
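The time-domain reconstruction formula can be checked numerically on a hypothetical sinusoid sampled well below the Nyquist rate (the infinite sum is truncated, hence the loose tolerance):

```python
import numpy as np

T = 1.0                               # sampling period
n = np.arange(-2000, 2001)            # truncated version of the infinite sum
f0 = 0.1                              # signal frequency, below Nyquist 1/(2T)
x = lambda t: np.cos(2 * np.pi * f0 * t)

# x(t) ~= sum_n x(nT) sinc((t - nT)/T), with np.sinc(u) = sin(pi u)/(pi u)
t = 0.37                              # off-grid evaluation point
x_rec = np.sum(x(n * T) * np.sinc((t - n * T) / T))
```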

Page 57:

What about non-band-limited signals?

Sampling through a sinc prefilter yields y_n = y(nT) = \langle x(t), \mathrm{sinc}(t/T - n)\rangle, and sinc interpolation of these samples produces a band-limited signal y(t).

In practice, we force a band-limit, so the reconstruction is not perfect!

Page 58:

Signals with Finite Rate of Innovation

Defn.: The rate of innovation is the number of degrees of freedom per unit of time.

A signal with finite rate of innovation has a parametric representation with finitely many degrees of freedom per unit of time.

Example: Consider x(t) band-limited to [-B/2, B/2]; then
x(t) = \sum_{n\in\mathbb{Z}} x_n\, \mathrm{sinc}(Bt - n)
is exactly defined by the samples \{x_n = \langle x(t), B\,\mathrm{sinc}(Bt - n)\rangle = x(nT)\} with T = 1/B, i.e., 1/T = B degrees of freedom per second.

Page 59:

Examples of signals with FRI

[Figure: three signals plotted against t, a stream of Dirac pulses, a piecewise constant signal, and a piecewise linear signal.]

• Stream of Dirac pulses: 4 Diracs = 4 weights + 4 locations
• Piecewise constant: x^{(1)}(t) = \frac{d}{dt}x(t) is a stream of Diracs
• Piecewise linear: x^{(2)}(t) = \frac{d^2}{dt^2}x(t) is a stream of Diracs

Page 60:

More examples of FRI signals

[Figure: six example waveforms.]

• Streams of Dirac pulses: Poisson processes, action potentials
• Filtered streams of Diracs: ultra-wide band (UWB), neural spikes
• Bilevel signals: PPM, CDMA
• Piecewise constant signals: PAM
• Bandlimited + piecewise linear: ECG signals
• Piecewise polynomials: 1D, 2D

All sparse in some sense!

Page 61:

Thm. 2: Sampling theorem for a periodic stream of Diracs

Consider a \tau-periodic stream of K Diracs with amplitudes x_k located at distinct times t_k \in [0, \tau),
x(t) = \sum_{k'\in\mathbb{Z}} \sum_{k=1}^{K} x_k\, \delta(t - t_k - k'\tau)
with rate of innovation \rho = \frac{2K}{\tau}.

Take a sampling kernel of bandwidth B,
\varphi(t) = \sum_{k'\in\mathbb{Z}} \mathrm{sinc}(B(t - k'\tau)) = \frac{\sin(\pi B t)}{B\tau\,\sin(\pi t/\tau)}
(a Dirichlet kernel), such that B\tau = 2M + 1 \geq 2K. Then the samples
y_n = \langle x(t), \varphi(t - nT)\rangle, \quad n = 1, 2, \ldots, N
with N \geq B\tau and T = \tau/N, are a sufficient characterization of x(t).

Page 62:

Sampling and reconstruction of a periodic stream of Diracs

[Figure: stream of Diracs with weights x_1, \ldots, x_K at locations t_1, \ldots, t_K, filtered by \varphi(t) and uniformly sampled: y_n = y(nT) = \langle x(t), \varphi(t - nT)\rangle.]

• Step 1: Find at least 2K spectral values of x(t)
• Step 2: Find the K annihilating filter coefficients
• Step 3: Find the K Dirac locations t_k
• Step 4: Find the K Dirac weights x_k

Page 63:

\tau-periodic stream of K weighted Dirac pulses

A periodic stream of K weighted Dirac pulses is defined by
x(t) = \sum_{k'\in\mathbb{Z}} \sum_{k=1}^{K} x_k\, \delta(t - t_k - k'\tau) = \sum_{m\in\mathbb{Z}} \hat{x}_m e^{i2\pi m t/\tau}
where
\hat{x}_m = \frac{1}{\tau}\int_0^{\tau} x(t)\, e^{-i2\pi m t/\tau}\,dt = \frac{1}{\tau}\sum_{k=1}^{K} x_k\, e^{-i2\pi m t_k/\tau}, \quad m\in\mathbb{Z}
are the Fourier series / spectral values of x(t).

The rate of innovation is given by the K locations + K weights for each period \tau: \rho = \frac{2K}{\tau}.

[Figure: one period of the Dirac stream with weights x_1, \ldots, x_K at locations t_1, \ldots, t_K.]

Page 64:

Periodic sinc sampling kernel

Consider the Dirichlet kernel, or periodic sinc sampling kernel:
\varphi(t) = \sum_{k'\in\mathbb{Z}} \mathrm{sinc}(B(t - k'\tau)) = \frac{\sin(\pi B t)}{B\tau\,\sin(\pi t/\tau)}
\;\xleftrightarrow{\;\text{Fourier}\;}\;
\hat{\varphi}_m = \begin{cases} 1 & \text{if } -M \leq m \leq M \\ 0 & \text{else} \end{cases}
where B \geq \rho = \frac{2K}{\tau} and B\tau = 2M + 1, i.e., take M \geq K.

[Figure: the kernel \varphi(t) in time, and its flat spectrum \hat{\varphi}_m of height 1 over -M \leq m \leq M.]

Page 65:

Find the N sample values

The N sample values are obtained by uniformly sampling the filtered signal,
y_n = y(nT) = \left.(x * \varphi)(t)\right|_{t = nT}
or by evaluating the inner products
y_n = \langle x(t), \varphi(t - nT)\rangle = \int_0^{\tau} x(t)\, \mathrm{sinc}(B(nT - t))\,dt, \quad n = 1, 2, \ldots, N.

Page 66:

Step 1: Find at least 2K spectral values of x(t)

Since x(t) = \sum_{m\in\mathbb{Z}} \hat{x}_m e^{i2\pi m t/\tau}, substituting into the samples y_n = \langle x(t), \varphi(t - nT)\rangle gives
y_n = \sum_{m\in\mathbb{Z}} \hat{x}_m \langle e^{i2\pi m t/\tau}, \varphi(t - nT)\rangle = \sum_{m=-M}^{M} \hat{x}_m\, e^{i2\pi m nT/\tau}, \quad n = 1, 2, \ldots, N
a system of N equations in the 2M + 1 unknown contiguous spectral values \hat{x}_m, m = -M, \ldots, M (e.g., M = 3, N \geq 2M + 1 = 7).

If N \geq B\tau = 2M + 1 and \tau = NT, the spectral values follow from the DFT of the samples:
\hat{y}_m = \frac{1}{N}\sum_{n=1}^{N} y_n\, e^{-i2\pi m n/N} = \begin{cases} \hat{x}_m & -M \leq m \leq M \\ 0 & \text{for other } m \in [-N/2, N/2] \end{cases}

Page 67:

Step 2: Find the K annihilating filter coefficients
But what is an annihilating filter?

Defn.: A filter h_m is called an annihilating filter for a sequence \hat{x}_m when
(h * \hat{x})_m = \sum_{k} h_k\, \hat{x}_{m-k} = 0 \quad \text{for all } m \in \mathbb{Z}.

Example: Verify that the filter with Z-transform H(z) = 1 - u_1 z^{-1} annihilates the exponential sequence \hat{x}_m = u_1^m:
(h * \hat{x})_m = h_0\, \hat{x}_m + h_1\, \hat{x}_{m-1} = 1\cdot u_1^m + (-u_1)\cdot u_1^{m-1} = 0.
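The two-tap example can be verified directly (u_1 is an arbitrary illustrative complex value here):

```python
import numpy as np

u1 = 0.9 * np.exp(2j * np.pi * 0.3)        # arbitrary complex ratio
m = np.arange(-5, 6)
x_hat = u1 ** m                            # exponential sequence x_m = u1**m

# Filter h = (h0, h1) = (1, -u1):  (h * x)_m = x_m - u1 * x_{m-1}
residual = x_hat[1:] - u1 * x_hat[:-1]     # should vanish for every available m
```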

Page 68:

Step 2: Find the K annihilating filter coefficients (continued)

In general, a filter with coefficients \{h_m\}_{m = 0, 1, \ldots, K} and Z-transform
H(z) = \sum_{m=0}^{K} h_m z^{-m} = \prod_{k=1}^{K} (1 - u_k z^{-1})
annihilates the linear combination of exponentials
\hat{x}_m = \frac{1}{\tau}\sum_{k=1}^{K} x_k u_k^m, \quad m \in \mathbb{Z}, \qquad \text{where } u_k = e^{-i2\pi t_k/\tau}.

Verification:
(h * \hat{x})_m = \sum_{l=0}^{K} h_l\, \hat{x}_{m-l} = \frac{1}{\tau}\sum_{l=0}^{K} h_l \sum_{k=1}^{K} x_k u_k^{m-l} = \frac{1}{\tau}\sum_{k=1}^{K} x_k u_k^m \underbrace{\sum_{l=0}^{K} h_l u_k^{-l}}_{H(u_k)\, =\, 0} = 0.

Page 69:

How do we find the annihilating filter coefficients?

Using the 2K spectral values obtained in Step 1, solve the Toeplitz system of equations
(h * \hat{x})_m = \sum_{k=0}^{K} h_k\, \hat{x}_{m-k} = 0 \quad \text{for } m = 1, \ldots, K
\;\Leftrightarrow\; \sum_{k=1}^{K} h_k\, \hat{x}_{m-k} = -\hat{x}_m \quad \text{for } m = 1, \ldots, K \quad (\text{with } h_0 = 1).

Example: Take K = 3, let h_0 = 1 and m = 1, \ldots, 3, and solve
\begin{bmatrix} \hat{x}_0 & \hat{x}_{-1} & \hat{x}_{-2} \\ \hat{x}_1 & \hat{x}_0 & \hat{x}_{-1} \\ \hat{x}_2 & \hat{x}_1 & \hat{x}_0 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix} = -\begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \hat{x}_3 \end{bmatrix}

We obtain the K annihilating filter coefficients \{h_1, h_2, \ldots, h_K\}.

Page 70:

Step 3: Find the K locations of the Diracs

The roots of the Z-transform of the annihilating filter,
H(z) = \sum_{m=0}^{K} h_m z^{-m} = \prod_{k=1}^{K}(1 - u_k z^{-1}) = 0 \;\Leftrightarrow\; z = u_k = e^{-i2\pi t_k/\tau}, \quad k = 1, \ldots, K,
uniquely define the K locations t_k of the Diracs.

Note:
• Nonlinear problem
• Annihilating filter ~ error locator polynomial, spectral estimation (Prony's method)

Page 71:

Step 4: Find the K weights of the Diracs

From K spectral values \{\hat{x}_m\}_{m = 1, \ldots, K} and K distinct locations \{t_k\}_{k = 1, \ldots, K}, solve the Vandermonde system of K equations in the K unknowns \{x_k\}_{k = 1, \ldots, K}:
\hat{x}_m = \frac{1}{\tau}\sum_{k=1}^{K} x_k u_k^m, \quad m = 1, \ldots, K, \qquad \text{where } u_k = e^{-i2\pi t_k/\tau}.

Example (K = 3):
\frac{1}{\tau}\begin{bmatrix} u_1 & u_2 & u_3 \\ u_1^2 & u_2^2 & u_3^2 \\ u_1^3 & u_2^3 & u_3^3 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \\ \hat{x}_3 \end{bmatrix}

Note:
• Linear problem
• The Vandermonde system admits a unique solution since the locations t_k are distinct.

Page 72:

Notes on the FRI sampling and reconstruction scheme

The complexity is polynomial in K:
• N samples: analog-to-digital converter
• 2K spectral values from the DFT of the samples: O(K log K)
• K annihilating filter coefficients from the Toeplitz system: O(K^2)
• K locations from the roots of the Z-transform of the annihilating filter (nonlinear problem): O(K^3)
• K weights from the Vandermonde system (linear problem): O(K^2)

The key to the solution: separability of the nonlinear and the linear problem.

Note: Finding the locations is a key task in the estimation/retrieval of sparse signals, but also in registration, feature detection, etc.
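The four steps are small enough to sketch end to end. The following is a minimal Prony-style sketch with illustrative values for \tau, t_k, x_k; for simplicity it starts directly from the 2K spectral values rather than from simulated filtered samples:

```python
import numpy as np

tau = 1.0
t_true = np.array([0.12, 0.47, 0.83])    # K distinct Dirac locations (illustrative)
x_true = np.array([1.0, -0.5, 2.0])      # K weights (illustrative)
K = len(t_true)

# Step 1: 2K spectral values  x_hat_m = (1/tau) sum_k x_k u_k^m,  u_k = e^{-i2pi t_k/tau}
m = np.arange(2 * K)
U = np.exp(-2j * np.pi * np.outer(m, t_true) / tau)
x_hat = (U @ x_true) / tau

# Step 2: annihilating filter coefficients (h0 = 1) from the Toeplitz system
A = np.array([[x_hat[m0 - k] for k in range(1, K + 1)] for m0 in range(K, 2 * K)])
h = np.linalg.solve(A, -x_hat[K:2 * K])

# Step 3: locations from the roots of H(z) = 1 + h1 z^-1 + ... + hK z^-K
roots = np.roots(np.concatenate(([1.0 + 0j], h)))
t_est = np.sort(np.mod(-np.angle(roots) * tau / (2 * np.pi), tau))

# Step 4: weights from the Vandermonde system (spectral values m = 0, ..., K-1)
V = np.exp(-2j * np.pi * np.outer(np.arange(K), t_est) / tau)
x_est = np.linalg.solve(V, tau * x_hat[:K]).real
```

The nonlinear part of the problem is confined to the root finding in Step 3; everything else is linear algebra, which is exactly the separability noted above.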

Page 73:

FRI Applications

• Modeling and compression of neonatal EEG seizure signals
• Removing shot noise from bandlimited signals

Page 74:

Neonatal EEG seizure as an FRI signal

[Figure: three plots of amplitude (microvolts) vs. time samples, a section of the neonatal EEG seizure signal, the extracted stream of Dirac pulses, and the non-uniform linear spline approximation of the seizure signal.]

Page 75:

Sampling and reconstruction of a non-uniform linear spline

Sample the spline through the kernel: y_n = y(nT) = \langle x(t), \varphi(nT - t)\rangle.

• Step 1a: Differentiate (R + 1) times in the Fourier domain,
\hat{x}^{(R+1)}_m = \left(\frac{i2\pi m}{\tau}\right)^{R+1}\hat{x}_m, \quad m = -K, \ldots, K,
so that x^{(R+1)}(t) is a stream of Diracs (R = degree of the spline = 1).
• Steps 1-4: Find at least 2K spectral values, the annihilating filter, the K locations and the K weights, as before.
• Step 4a: Integrate R + 1 times,
x(t) = \underbrace{\int\cdots\int}_{R+1} x^{(R+1)}(t)\, dt^{R+1}.

Page 76:

Close-up of the reconstruction of the neonatal EEG seizure

[Figure: amplitude (microvolts) vs. time, comparing the FRI reconstruction, the sinc approximation, and the actual seizure signal.]

Compression ratio: 83%. Error of the FRI method: 17%; error of the sinc method: 37%.

Page 77:

Band-limited signal with additive shot noise

x(t) = x_{BL}(t) + x_D(t)
where x_{BL}(t) \in BL_{[-L, L]} and x_D(t) is a stream of K weighted Diracs.

[Figure: one period \tau of the band-limited signal plus shot noise.]

Page 78:

Band-limited signal with additive shot noise

The degrees of freedom of x(t) = x_{BL}(t) + x_D(t) are (2L + 1) + 2K.

1. From the uniform set of samples, find the Fourier series \{\hat{x}[m]\}_{m \in [-(L+2K),\, L+2K]}.
2. Using the annihilating filter method, determine the stream of Diracs from the 2K Fourier series coefficients outside of the band [-L, L], i.e., \{\hat{x}_D[m] = \hat{x}[m]\}_{m \in [L+1,\, L+2K]}.
3. Determine the band-limited signal by taking the inverse Fourier series of \{\hat{x}_{BL}[m] = \hat{x}[m] - \hat{x}_D[m]\}_{m \in [-L, L]}.

[Figure: spectrum of the band-limited signal plus shot noise, with \hat{x}_{BL}[m] occupying [-L, L] and \hat{x}_D[m] extending out to L + 2K.]

Page 79:

Demo: FRI GUI

Page 80:

Summary Part 2

• Streams of Dirac pulses are very important
• Signals with FRI allow uniform sampling after appropriate smoothing, and perfect reconstruction from the samples
• The methods rely on separating the innovation in a signal from the rest, and on identifying the innovative part solely from the samples
• The annihilating filter method is a key component in isolating the innovative part of the signals
• Applications

Page 81:

Greetings from Sunny Singapore!

Page 82:

Sparse Sampling: Theory, Algorithms and Applications

Part 3

Tutorial ICASSP 2009

Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU), Martin Vetterli (EPFL & UCB)

Page 83:

Recovery of noisy sparse signals
Sparse Sampling: Theory, Algorithms and Applications

Thierry Blu
The Chinese University of Hong Kong
April 2009

Page 84:

Outline

1. Acquisition in the noisy case
2. Denoising strategies
3. Cramer-Rao bounds
4. Applications

Page 85:

FRI with noise

Schematic acquisition of a \tau-periodic FRI signal with noise:

[Diagram: \sum_{k=1}^{K} x_k \delta(t - t_k) \to \oplus (analog noise) \to sampling kernel \varphi(t) \to y(t) \to sampler with period T \to \oplus (digital noise) \to y_n.]

Modelization:
y_n = \sum_{k=1}^{K} x_k\, \varphi(nT - t_k) + \varepsilon_n

NOTE: N measurements y_n for n = 1, 2, \ldots, N.

Page 86:

FRI with noise

Estimation problem: find estimates \tilde{y}_n, \tilde{x}_k and \tilde{t}_k of y_n, x_k and t_k such that
\tilde{y}_n = \sum_{k=1}^{K} \tilde{x}_k\, \varphi(nT - \tilde{t}_k)
and \|y - \tilde{y}\|_{\ell_2} is as small as possible.

Reminder:
• \tau = NT is the periodicity of the filtered signal
• \varphi(t) = \frac{\sin(\pi(2M+1)t/\tau)}{(2M+1)\tau\,\sin(\pi t/\tau)} is the Dirichlet kernel
• M \geq K is some integer characterizing the bandwidth of \varphi(t)

Page 87:

Annihilation

Without noise, this problem can be brought back to
AH = 0
where
• the roots of H(z) (built from H) provide the locations t_k
• A is built using the DFT coefficients of y_n

In the presence of noise, the matrix A is only approximately singular.

Page 88:

Total least-squares

Replace the annihilation equation AH = 0 by
\min_H \|AH\|_2 \quad \text{under the constraint } \|H\|_2 = 1

Solution: perform a Singular Value Decomposition
A = USV^T
and choose the last column of V for H.
• U is unitary, of the same size as A
• S is diagonal (with decreasing coefficients) and of size (K+1) \times (K+1)
• V is unitary and of size (K+1) \times (K+1)
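The SVD recipe can be checked numerically: among all unit vectors, the last right singular vector attains the minimum of \|AH\|_2 (the matrix below is a random stand-in for the annihilation matrix, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 4))     # stand-in for the annihilation matrix

U, S, Vt = np.linalg.svd(A, full_matrices=False)
H = Vt[-1]                           # last column of V = last row of V^T

# ||A H|| equals the smallest singular value; no unit vector does better.
best = np.linalg.norm(A @ H)
```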

Page 89:

Total least-squares

The estimates of the innovations are then obtained as follows:
• \tilde{t}_k: by finding the roots of the polynomial H(z)
• \tilde{x}_k: by least-squares minimization of
\begin{bmatrix}
\varphi(T - \tilde{t}_1) & \varphi(T - \tilde{t}_2) & \cdots & \varphi(T - \tilde{t}_K) \\
\varphi(2T - \tilde{t}_1) & \varphi(2T - \tilde{t}_2) & \cdots & \varphi(2T - \tilde{t}_K) \\
\vdots & \vdots & & \vdots \\
\varphi(NT - \tilde{t}_1) & \varphi(NT - \tilde{t}_2) & \cdots & \varphi(NT - \tilde{t}_K)
\end{bmatrix}
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_K \end{bmatrix}
\approx
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}.

NOTE:
• Related to Pisarenko's method
• Not robust with respect to noise ⇒ need for extra denoising

Page 90:

Cadzow iterated denoising

Without noise, the annihilation property AH = 0 still holds if length(H) = L + 1 is larger than K + 1. We then have the properties:
• A is still of rank K
• A is a band-diagonal (Toeplitz) matrix
• conversely, if A is Toeplitz and has rank K, then the y_n are the samples of an FRI signal

Rank-K "projection" algorithm:
1. Perform the SVD of A: A = USV^T
2. Set to zero the L - K + 1 smallest diagonal elements of S → S'
3. Build A' = US'V^T
4. Find the band-diagonal (Toeplitz) matrix that is closest to A' and go to step 1
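A minimal sketch of the projection algorithm, assuming a plain alternation between rank truncation and averaging along the diagonals (the test sequence below, a sum of two exponentials, is invented for illustration):

```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_project(A):
    """Closest Toeplitz matrix: replace every diagonal by its mean."""
    B = np.empty_like(A)
    rows, cols = A.shape
    for d in range(-(rows - 1), cols):
        mask = np.eye(rows, cols, k=d, dtype=bool)
        B[mask] = A[mask].mean()
    return B

def cadzow(A, K, n_iter=30):
    """Alternate rank-K truncation (via SVD) with the Toeplitz projection."""
    for _ in range(n_iter):
        U, S, Vt = np.linalg.svd(A, full_matrices=False)
        S[K:] = 0                        # keep only the K largest singular values
        A = toeplitz_project((U * S) @ Vt)
    return A

# Invented test sequence: x_m = sum of two exponentials -> rank-2 Toeplitz matrix
m = np.arange(12)
u = np.exp(-2j * np.pi * np.array([0.13, 0.41]))
x = (u[None, :] ** m[:, None]) @ np.array([1.0, 0.7])
A_clean = toeplitz(x[4:], x[4::-1])      # 8 x 5 Toeplitz matrix of rank 2

rng = np.random.default_rng(3)
noise = 0.1 * (rng.standard_normal(A_clean.shape) + 1j * rng.standard_normal(A_clean.shape))
A_denoised = cadzow(A_clean + noise, K=2)
```

The denoised matrix is both (approximately) rank K and exactly Toeplitz after the final projection, so its first column/row can be fed back into the annihilating filter method.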

Page 91:

Cadzow iterated denoising

Essential details:
• Iterations of the projection algorithm are performed until the matrix A is of "effective" rank K
• L is chosen maximal, i.e., L = M

Schematic view of the whole retrieval algorithm:

[Diagram: y_n \to FFT \to \hat{y}_n \to "too noisy?"; if yes, apply Cadzow denoising and test again; if no, apply the annihilating filter method to obtain \tilde{t}_k, then solve a linear system for the weights, yielding \tilde{t}_k and \tilde{x}_k.]

Page 92:

Examples

[Figure: retrieval of an FRI signal with 7 Diracs (left: original and estimated Diracs) from 71 noisy samples at SNR = 5 dB (right: noiseless and noisy samples).]

Page 93:

Examples

[Figure: retrieval of an FRI signal with 100 Diracs (left: original and estimated Diracs) from 1001 noisy samples at SNR = 20 dB (right: signal and noise).]

Page 94:

General uncertainty formula

Consider the noisy FRI retrieval problem
y_n = \sum_{k=1}^{K} x_k\, \varphi(nT - t_k) + \varepsilon_n \quad \text{for } n = 1, 2, \ldots, N
where \varepsilon_n is a zero-mean Gaussian noise with correlation matrix R.

This interpolation/approximation problem is actually a parametric estimation problem of the vector X = [x_1, \ldots, x_K, t_1, \ldots, t_K]^T, based on the vector of noisy measurements Y = [y_1, y_2, \ldots, y_N]^T.

Any estimate \tilde{X} = \Theta(Y) of X is random, and it is possible to evaluate its minimal "uncertainty".

Page 95:

General uncertainty formula

Cramer-Rao lower bound: the covariance of any unbiased estimate of X based on the noisy measurements y_n is lower-bounded using Fisher's matrix,
\mathrm{cov}\{\Theta(Y)\} \geq \left(\Phi^T R^{-1} \Phi\right)^{-1}
where
\Phi = \begin{bmatrix}
\varphi(T - t_1) & \cdots & \varphi(T - t_K) & -x_1\varphi'(T - t_1) & \cdots & -x_K\varphi'(T - t_K) \\
\varphi(2T - t_1) & \cdots & \varphi(2T - t_K) & -x_1\varphi'(2T - t_1) & \cdots & -x_K\varphi'(2T - t_K) \\
\vdots & & \vdots & \vdots & & \vdots \\
\varphi(NT - t_1) & \cdots & \varphi(NT - t_K) & -x_1\varphi'(NT - t_1) & \cdots & -x_K\varphi'(NT - t_K)
\end{bmatrix}.

No periodicity requirement; arbitrary differentiable kernel \varphi(t).

Page 96:

The one-Dirac case

Under periodicity assumptions, the lower-bounding matrix is diagonal.

Minimal location and amplitude uncertainties:
\frac{\Delta t_1}{\tau} \geq \frac{B\tau}{2\pi |x_1| \sqrt{N}} \left(\sum_{|m| \leq \lfloor B\tau/2 \rfloor} \frac{m^2}{\hat{r}_m}\right)^{-1/2}
\qquad
\Delta x_1 \geq \frac{B\tau}{\sqrt{N}} \left(\sum_{|m| \leq \lfloor B\tau/2 \rfloor} \frac{1}{\hat{r}_m}\right)^{-1/2}

Here, r_m = E\{\varepsilon_n \varepsilon_{n-m}\} and \hat{r}_m is its DFT.

NOTE: If the Diracs are sufficiently well separated (typ. 2/N), the one-Dirac formula is a good approximation of the multi-Dirac case.

Page 97:

The one-Dirac case

Digital white noise:
\frac{\Delta t_1}{\tau} \geq \frac{1}{\pi}\sqrt{\frac{3B\tau}{N(B^2\tau^2 - 1)}} \cdot \mathrm{PSNR}^{-1/2}
\quad\text{and}\quad
\frac{\Delta x_1}{|x_1|} \geq \sqrt{\frac{B\tau}{N}} \cdot \mathrm{PSNR}^{-1/2}

Analog (filtered) white noise:
\frac{\Delta t_1}{\tau} \geq \frac{1}{\pi}\sqrt{\frac{3}{B^2\tau^2 - 1}} \cdot \mathrm{PSNR}^{-1/2}
\quad\text{and}\quad
\frac{\Delta x_1}{|x_1|} \geq \mathrm{PSNR}^{-1/2}

In both cases, increasing the bandwidth of the antialiasing filter improves the temporal resolution.

NOTE: uncertainty relation N \cdot \mathrm{PSNR}^{1/2} \cdot \frac{\Delta t_1}{\tau} \geq \frac{\sqrt{3}}{\pi}

Page 98:

Simulations: Quasi-optimality

[Figure: retrieval of the locations of an FRI signal with 2 Diracs from 21 noisy samples. Left: scatterplot of the retrieved locations vs. input SNR (dB), with the Cramér-Rao bounds; right: observed standard deviation of each Dirac location (averaged over 10000 realizations) compared to the Cramér-Rao lower bound.]

⇒ Quasi-optimality of the algorithm.

Page 99:

Simulations: Robustness

[Figure: 290 samples of an FRI signal in −5 dB noise ("observed" SNR: −5.0847 dB). Left: original and estimated Dirac amplitudes vs. time location, and original vs. retrieved positions per Dirac (average distance between original and retrieved positions: 0.15467); right: the 290 noiseless and noisy samples.]

Even at high noise levels, the algorithm is still able to accurately find a substantial proportion of the Diracs.

Page 100:

Optical Coherence Tomography: Principle

Detection of coherent backscattered waves from an object by interference with a low-coherence reference wave. The measurement is performed with a standard Michelson interferometer.

• Axial (depth) resolution: ∝ coherence length of the reference wave
• Transversal resolution: width of the optical beam

NOTE: Very high sensitivity (low SNR), noninvasive, low-depth penetration; biomedical applications (ophthalmology, dentistry, skin).

Axial resolution (with superluminescent diodes: low-cost, compact, easy to use): 10–20 µm. Better resolution → better diagnoses.

OCT is a ranging application!

Page 101:

OCT: Experimental Setup

[Diagram: a low-coherence source (SLD) illuminates a Michelson interferometer; the reference wave ψ_R(t) reflects off a moving mirror, the object wave is ψ_O = h ∗ ψ_R, and the photodetector measures ⟨|ψ_R(t − z/c) + ψ_O(t − z_0/c)|²⟩.]

NOTE: z, z_0 are the optical path lengths of the reference and the object wave.

Page 102:

OCT: Mathematical setting

Measured intensity (z is varied with the moving mirror):
I_{\text{photo}}(z_0 - z) = \left\langle \left| \psi_R\!\left(t - \tfrac{z}{c}\right) + \psi_O\!\left(t - \tfrac{z_0}{c}\right) \right|^2 \right\rangle = \text{const} + \underbrace{2\,\Re\left\{(h * G)\!\left(\tfrac{z - z_0}{c}\right)\right\}}_{\text{OCT signal}}.

G(t) is the temporal coherence function of the reference wave:
G(t' - t) = \langle \psi_R(t'), \psi_R(t) \rangle
Typically, G(t) = A(t)\, e^{2i\pi\nu_0 t} is a bandpass function.

A deconvolution/superresolution problem: retrieve h(t) from the uniform samples of the OCT signal.

Page 103:

OCT: Resolution

Resolution limit of OCT (two interfaces): L_c = c \times \text{FWHM of } A(t).

[Figure: OCT fringe signal vs. acquisition time (ms).]

NOTE: Because the light travels twice (forward then backward) in the object, the actual physical resolution is L_c/2. Moreover, a larger value of the refractive index inside the object further divides the resolution limit.

Page 104:

OCT: Finite Rate of Innovation Signal

A succession of K layers of varying depth, each of constant refractive index:
\hat{h}(\nu) = \sum_{k=1}^{K} a_k\, e^{-2i\pi b_k \nu / c} \quad \text{for all } |\nu - \nu_0| \leq \frac{c}{L_c}

[Figure: layered object with optical paths b_1, b_2, b_3.]

Problem: retrieve the innovations a_k and b_k from the uniform samples of the OCT signal.

NOTE: A physical modelization (1D wave equation) shows that a_k and b_k are related to the depth (×2) and refractive index of the layers.

Page 105: Sparse Sampling : Theory, Algorithms and Applications · Is there a sampling theory beyond Shannon? – Shannon: bandlimitedness is sufficient but not necessary – Shannon bandwidth:

Acquisition in the noisy caseDenoising strategiesCramer-Rao bounds

Applications

OCT: Gaussian kernel hypothesis

We assume that the coherence function has a Gaussian envelope:

G(t) ∝ e−4 ln(2) c2

L2c

t2+2iπν0t

This is a very reasonable hypothesis for SLDs:

[Figure: OCT mirror signal and its difference with a Gabor function, vs. acquisition time (ms)]

Page 106

OCT: A standard FRI retrieval problem

Denote the OCT signal by I(z); then

I(z) = 2ℜ{ (h ∗ G)((z − z₀)/c) } = 2ℜ{ ∑_{k=1}^{K} α_k e^{−(z − β_k)²/(2σ²)} }  (a sum of Gabor functions)

where the complex numbers α_k and β_k are related to a_k and b_k, and where σ = L_c/(2√(2 log 2)).

An FRI problem

Retrieve the amplitudes α_k and “locations” β_k from I(l∆z), where ∆z = sampling step.
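This is again a power-sum problem in disguise. Below is a minimal numerical sketch (not the tutorial's code; σ, the amplitudes, locations, and sample positions are illustrative assumptions, and the data are noiseless): multiplying the samples by e^{n²/(2σ²)} turns the sum of Gaussians into a power-sum series in u_k = e^{β_k/σ²}, which the annihilating filter then resolves.

```python
import numpy as np

sigma = 1.0
alpha = np.array([1.0, 0.6])   # true amplitudes (illustrative)
beta = np.array([0.3, 2.6])    # true locations (illustrative)
K = len(alpha)

n = np.arange(-2, 6)           # integer sample positions (unit sampling step)
I = sum(a * np.exp(-(n - b)**2 / (2 * sigma**2)) for a, b in zip(alpha, beta))

# Exponential weighting: w_n = sum_k c_k u_k^n with u_k = exp(beta_k / sigma^2)
w = I * np.exp(n**2 / (2 * sigma**2))

# Annihilating filter with h_0 = 1, solved in the least-squares sense
A = np.array([[w[m - i] for i in range(1, K + 1)] for m in range(K, len(w))])
h = np.linalg.lstsq(A, -w[K:], rcond=None)[0]

u = np.roots(np.r_[1.0, h])                    # roots give u_k
beta_hat = np.sort(sigma**2 * np.log(u).real)  # locations

# Amplitudes: Vandermonde system on the first K weighted samples
u_hat = np.exp(beta_hat / sigma**2)
V = u_hat[None, :] ** n[:K, None]
c = np.linalg.solve(V, w[:K])
alpha_hat = c * np.exp(beta_hat**2 / (2 * sigma**2))
```

In the noisy case the exponential weighting amplifies noise at the edges of the window, which is why the denoising strategies and Cramér-Rao analysis of this part matter in practice.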

Page 107

OCT: Example of Processing

[Figure: separated Gaussians located at z₀ and z₁, their sum, the sum with added noise, the retrieved Gaussians, and the resynthesized signal]

Page 108

OCT: Simulation example

Simulation example: two interfaces 7 µm apart (1 ms below)

[Figure: three panels at PSNR 40 dB, 30 dB, and 25 dB, each showing the OCT signal, its difference with the model, and the retrieved positions vs. acquisition time (ms)]

Page 109

OCT: Real Data Processing

SLD source of central wavelength 0.814 µm and coherence length 25 µm ⇒ OCT resolution of 12.5 µm.

Depth scan of a 4 µm thick pellicle beamsplitter of optical² depth 6.6 µm ⇒ approximately half the OCT resolution.

Calibration part:

Depth scan of a single interface ⇒ effective coherence function;

High-coherence interferometer ⇒ accurate position of the moving mirror.

² Refractive index 1.65.

Page 110

OCT: Real Data Processing

Example of super-resolution (preliminary results³): data-model PSNR = 31 dB

[Figure: OCT signal, difference with model, and retrieved positions vs. acquisition time (ms)]

The two retrieved interfaces are 17 interference fringes apart ⇒ 17 × 0.814/2 ≈ 6.9 µm.

³ Without correction for the movement of the mirror (HC phase unreliable).

Page 111

Sparse Sampling: Theory, Algorithms and Applications

Part 4

Tutorial ICASSP 2009

Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU), Martin Vetterli (EPFL & UCB)

Page 112

Part 4: Variations and Extensions

• Problem Statement and Motivation

• Alternative Sampling Kernels

• Kernels Reproducing Polynomials

– The family of B-Splines

• Sampling FRI Signals with Polynomial Reproducing Kernels

– Sampling Streams of Diracs

– Sampling Piecewise Polynomial Signals

• Kernels Reproducing Exponentials and Kernels with Rational Fourier Transform

• Estimation of FRI signal at the output of an RC circuit

• Application: Image Super-Resolution

• Summary and Conclusions

Page 113

Problem Statement and Motivation

You are given a class of functions. You have a sampling device, typically, a low-pass filter. Given

the measurements yn = 〈ϕ(t/T − n), x(t)〉, you want to reconstruct x(t).

[Diagram: acquisition device, x(t) filtered by h(t) = ϕ(−t/T), then sampled with period T, giving y_n = 〈x(t), ϕ(t/T − n)〉]

Natural questions:

• When is there a one-to-one mapping between x(t) and yn?

• What signals can be sampled and what kernels ϕ(t) can be used?

• What reconstruction algorithms?

In this part of the tutorial we aim to extend the choices of valid sampling kernels.

Page 114

Problem Statement and Motivation

Digital Cameras:

[Diagram: observed scene → lens → CCD array → samples]

• The low-quality lens blurs the images. Lenses are, to some extent, equivalent to the sampling kernel, and they do not necessarily behave like the sinc function.

Page 115

Problem Statement and Motivation

[Figure: (a) original (2014 × 3039); (b) point spread function in black, vs. pixels (perpendicular)]

The distortion introduced by a lens can be well approximated with splines.

Page 116

Problem Statement and Motivation

A-to-D converters (simplified sample-and-hold converter).

[Diagram: op-amp circuit with input IN, resistor R, capacitor C, and output OUT]

• The electric circuits in many practical devices modify the measured signal. Can we model this distortion with a function that suits our sampling framework?

Page 117

Alternative Sampling Kernels

Possible classes of kernels (we want to be as general as possible):

1. Any kernel ϕ(t) that can reproduce polynomials:

∑_n c_{m,n} ϕ(t − n) = t^m,  m = 0, 1, ..., L,

for a proper choice of coefficients c_{m,n}.

2. Any kernel ϕ(t) that can reproduce exponentials:

∑_n c_{m,n} ϕ(t − n) = e^{α_m t},  α_m = α₀ + mλ and m = 0, 1, ..., L,

for a proper choice of coefficients c_{m,n}.

3. Any kernel with rational Fourier transform:

ϕ̂(ω) = ∏_i (jω − b_i) / ∏_m (jω − α_m),  α_m = α₀ + mλ and m = 0, 1, ..., L.

Page 118

Alternative Sampling Kernels

Class 1 is made of all functions ϕ(t) satisfying the Strang-Fix conditions. This includes any scaling function generating a wavelet with L + 1 vanishing moments (e.g., splines and Daubechies scaling functions).

Strang-Fix conditions:

ϕ̂(0) ≠ 0 and ϕ̂^{(m)}(2nπ) = 0 for n ≠ 0 and m = 0, 1, ..., L,

where ϕ̂(ω) is the Fourier transform of ϕ(t).

[Figure: B-splines β⁰(t), β¹(t), β²(t), and β³(t)]

B-splines satisfy the Strang-Fix conditions since β̂ⁿ(ω) = sinc(ω/2)^{n+1}.

Page 119

B-splines Reproduce Polynomials

[Figure: β¹(t) and its weighted shifts reproducing the constant and the ramp]

β¹(t):  c_{0,n} = (1, 1, 1, 1, 1, 1, 1),  c_{1,n} = (−3, −2, −1, 0, 1, 2, 3)

The linear spline reproduces polynomials up to degree L = 1: ∑_n c_{m,n} β¹(t − n) = t^m, m = 0, 1, for a proper choice of coefficients c_{m,n} (in this example n = −3, −2, ..., 2, 3).

Notice: c_{m,n} = 〈ϕ̃(t − n), t^m〉, where ϕ̃(t) is biorthogonal to ϕ(t): 〈ϕ̃(t), ϕ(t − n)〉 = δ_n.
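The reproduction of 1 and t by the hat function can be checked numerically (a quick sketch; the shift range and evaluation grid are arbitrary choices):

```python
import numpy as np

def b1(t):
    # centered linear B-spline (hat function), support [-1, 1]
    return np.maximum(0.0, 1.0 - np.abs(t))

n = np.arange(-3, 4)             # shifts n = -3, ..., 3, as on the slide
t = np.linspace(-2.5, 2.5, 501)  # stay inside the interval covered by the shifts

p0 = sum(1.0 * b1(t - k) for k in n)       # c_{0,n} = 1
p1 = sum(float(k) * b1(t - k) for k in n)  # c_{1,n} = n

print(np.max(np.abs(p0 - 1.0)), np.max(np.abs(p1 - t)))  # both ~0
```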

Page 120

B-splines Reproduce Polynomials

[Figure: cubic spline reproduction of polynomials of degree 0 to 3]

c_{0,n} = (1, 1, 1, 1, 1, 1, 1),  c_{1,n} = (−3, −2, −1, 0, 1, 2, 3),
c_{2,n} ≈ (8.7, 3.7, 0.7, −0.333, 0.7, 3.7, 8.7),  c_{3,n} ≈ (−24, −6, −0.001, 0, 0.001, 6, 24)

The cubic spline reproduces polynomials up to degree L = 3: ∑_n c_{m,n} β³(t − n) = t^m, m = 0, 1, 2, 3.

Page 121

Basic Set-up: Sampling Streams of Diracs

• Assume that x(t) is a stream of K Diracs on the interval [0, τ[:

x(t) = ∑_{k=0}^{K−1} x_k δ(t − t_k).

The signal has 2K degrees of freedom.

• Assume that ϕ(t) is any function that can reproduce polynomials of maximum degree L ≥ 2K − 1:

∑_n c_{m,n} ϕ(t − n) = t^m,  m = 0, 1, ..., L,

where c_{m,n} = 〈ϕ̃(t − n), t^m〉 and ϕ̃(t) is biorthogonal to ϕ(t).

• We want to retrieve x(t) from the samples y_n = 〈x(t), ϕ(t/T − n)〉, with n = 0, 1, ..., N − 1 and NT = τ.

Page 122

Sampling Streams of Diracs

Assume for simplicity that T = 1 and define s_m = ∑_n c_{m,n} y_n, m = 0, 1, ..., L. We have that

s_m = ∑_{n=0}^{N−1} c_{m,n} y_n
    = 〈x(t), ∑_{n=0}^{N−1} c_{m,n} ϕ(t − n)〉
    = ∫_{−∞}^{∞} x(t) t^m dt
    = ∑_{k=0}^{K−1} x_k t_k^m,  m = 0, 1, ..., L.  (1)

We thus observe s_m = ∑_{k=0}^{K−1} x_k t_k^m, m = 0, 1, ..., L.

Page 123

Sampling Streams of Diracs

[Figure: a single Dirac x₀δ(t − t₀), shown against ∑_n ϕ(t − n) and ∑_n c_{1,n} ϕ(t − n)]

∑_n y_n = 〈x₀δ(t − t₀), ∑_n ϕ(t − n)〉 = ∫_{−∞}^{∞} x₀δ(t − t₀) ∑_n ϕ(t − n) dt = x₀ ∑_n ϕ(t₀ − n) = x₀

∑_n c_{1,n} y_n = 〈x₀δ(t − t₀), ∑_n c_{1,n} ϕ(t − n)〉 = x₀ ∑_n c_{1,n} ϕ(t₀ − n) = x₀t₀

Page 124

Sampling Streams of Diracs

[Figure: two Diracs x₀δ(t − t₀) and x₁δ(t − t₁), weighted by the kernel sums of degree 0 to 3]

s₀ = ∑_n y_n = x₀ + x₁        s₁ = ∑_n c_{1,n} y_n = x₀t₀ + x₁t₁

s₂ = ∑_n c_{2,n} y_n = x₀t₀² + x₁t₁²        s₃ = ∑_n c_{3,n} y_n = x₀t₀³ + x₁t₁³

Page 125

Sampling Streams of Diracs - Toy Examples

Example: T = 1, x(t) = x₀δ(t − t₀) with x₀ = 2 and t₀ = √2. The sampling kernel is a linear spline.

[Figure: the Dirac x₀δ(t − t₀) and the resulting samples y_n]

In this case,

y_n = 〈x(t), ϕ(t − n)〉 = x₀ϕ(t₀ − n) = 0 for n ≠ 1, 2;  = 2(2 − √2) for n = 1;  = 2(√2 − 1) for n = 2,

and

s₀ = ∑_n c_{0,n} y_n = ∑_n y_n = y₁ + y₂ = x₀ = 2,
s₁ = ∑_n c_{1,n} y_n = ∑_n n y_n = y₁ + 2y₂ = x₀t₀ = 2√2.
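The computation above can be reproduced in a few lines (a sketch using the same values):

```python
import numpy as np

def b1(t):
    # centered linear B-spline (hat function)
    return np.maximum(0.0, 1.0 - np.abs(t))

x0, t0 = 2.0, np.sqrt(2.0)
n = np.arange(-6, 7)

y = x0 * b1(t0 - n)     # y_n = x0 * beta^1(t0 - n); only n = 1, 2 are nonzero

s0 = np.sum(y)          # c_{0,n} = 1
s1 = np.sum(n * y)      # c_{1,n} = n

print(s0, s1 / s0)      # x0 and t0: the two moments recover the Dirac
```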

Page 126

The Annihilating Filter Method

The quantity

s_m = ∑_{k=0}^{K−1} x_k t_k^m,  m = 0, 1, ..., L

is a power-sum series. Notice the similarity with the sinc case, where we were finding the quantity

x̂_m = (1/τ) ∑_{k=0}^{K−1} x_k e^{−j2πm t_k/τ} = ∑_{k=0}^{K−1} (x_k/τ) u_k^m,  u_k = e^{−j2πt_k/τ}.

As in the sinc case, we can retrieve the locations t_k and the amplitudes x_k with the annihilating filter method.

Page 127

The Annihilating Filter Method

1. Call h_m the filter with z-transform H(z) = ∏_{k=0}^{K−1} (1 − t_k z^{−1}).

We have that h_m ∗ s_m = ∑_{i=0}^{K} h_i s_{m−i} = 0. This filter is thus called the annihilating filter.

In matrix/vector form, SH = HS = 0, or alternatively, using the fact that h₀ = 1:

[ s_{K−1}  s_{K−2}  ⋯  s₀      ] [ h₁  ]     [ s_K     ]
[ s_K      s_{K−1}  ⋯  s₁      ] [ h₂  ]  = −[ s_{K+1} ]
[ ⋮        ⋮            ⋮      ] [ ⋮   ]     [ ⋮       ]
[ s_{L−1}  s_{L−2}  ⋯  s_{L−K} ] [ h_K ]     [ s_L     ]

Solve the above Toeplitz system to find the coefficients of the annihilating filter.

Page 128

The Annihilating Filter Method

2. Given the coefficients {1, h₁, h₂, ..., h_K}, we get the locations t_k by finding the roots of H(z).

3. Solve the first K equations in s_m = ∑_{k=0}^{K−1} x_k t_k^m to find the amplitudes x_k. In matrix/vector form:

[ 1          1          ⋯  1              ] [ x₀      ]   [ s₀      ]
[ t₀         t₁         ⋯  t_{K−1}        ] [ x₁      ] = [ s₁      ]
[ ⋮          ⋮               ⋮            ] [ ⋮       ]   [ ⋮       ]
[ t₀^{K−1}   t₁^{K−1}   ⋯  t_{K−1}^{K−1}  ] [ x_{K−1} ]   [ s_{K−1} ]   (2)

This is a classic Vandermonde system, with a unique solution for distinct t_k.
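The three steps of the annihilating filter method condense into a short routine (a sketch assuming noiseless real-valued moments and distinct locations; the function name and test values are ours):

```python
import numpy as np

def annihilating_filter_fri(s, K):
    """Recover locations t_k and amplitudes x_k from the power sums
    s_m = sum_k x_k t_k^m, m = 0, ..., L with L >= 2K - 1."""
    s = np.asarray(s, dtype=float)
    L = len(s) - 1
    # Step 1: Toeplitz system with h_0 = 1 (least squares if L > 2K - 1)
    S = np.array([[s[m - i] for i in range(1, K + 1)] for m in range(K, L + 1)])
    h = np.linalg.lstsq(S, -s[K:], rcond=None)[0]
    # Step 2: the locations are the roots of H(z)
    t = np.sort(np.roots(np.r_[1.0, h]).real)
    # Step 3: Vandermonde system on the first K power sums
    V = t[None, :] ** np.arange(K)[:, None]
    x = np.linalg.solve(V, s[:K])
    return t, x

# Example with two Diracs
t_true, x_true = np.array([0.25, 0.7]), np.array([1.5, -0.8])
s = [np.sum(x_true * t_true**m) for m in range(4)]  # L = 3 = 2K - 1
t_hat, x_hat = annihilating_filter_fri(s, K=2)
```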

Page 129

Sampling Streams of Diracs: Numerical Example

[Figure: (a) original signal, (b) sampling kernel β⁷(t), (c) samples, (d) reconstructed signal]

Page 130

Sampling Streams of Diracs: Closely Spaced Diracs

[Figure: (a) original signal with closely spaced Diracs, (b) sampling kernel β⁷(t), (c) samples, (d) reconstructed signal]

Page 131

Sampling Streams of Diracs: Sequential Reconstruction

[Figure: (a) original signal, (b) sampling kernel β⁷(t), (c) samples, (d) sequentially reconstructed signal]

Page 132

Sinc Kernel vs B-Spline Kernel

[Diagram: sinc kernel pipeline: x(t) → acquisition → samples y_n → DFT → Fourier coefficients x̂_m of x(t) → annihilating filter → Vandermonde solve → t₀..t_K, x₀..x_K. B-spline kernel pipeline: x(t) → acquisition → samples y_n → coefficients c_{m,n} → moments s_m of x(t) → annihilating filter → Vandermonde solve → t₀..t_K, x₀..x_K]

Page 133

Sampling Piecewise Constant Signals

Insight: the derivative of a piecewise constant signal is a stream of Diracs. Thus, if we can compute the derivative of a piecewise constant signal, we can sample it.

Given the samples y_n, compute the finite difference z_n = y_{n+1} − y_n. We have that

z_n = y_{n+1} − y_n = 〈x(t), ϕ(t − n − 1) − ϕ(t − n)〉
    = (1/2π) 〈x̂(ω), ϕ̂(ω) e^{−jωn} (e^{−jω} − 1)〉
    = (1/2π) 〈x̂(ω), −jω ϕ̂(ω) e^{−jωn} (1 − e^{−jω})/(jω)〉
    = −〈x(t), (d/dt)[(ϕ ∗ β⁰)(t − n)]〉
    = 〈dx(t)/dt, (ϕ ∗ β⁰)(t − n)〉.

Thus the samples z_n are related to the derivative of x(t).

Notice: only discrete-time processing is required!
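A minimal numerical check of this insight, assuming a causal box kernel β⁰ = 1 on [0, 1) (so that ϕ ∗ β⁰ is the causal hat β¹, whose coefficients for reproducing t are c_{1,n} = n + 1); the amplitude and jump location are illustrative:

```python
import numpy as np

A_jump, t0 = 1.5, 2.3
N = 8
n = np.arange(N)

# y_n = <x, beta^0(. - n)> for the step x(t) = A_jump * u(t - t0):
# the overlap of [n, n + 1] with [t0, infinity)
y = A_jump * np.clip(n + 1 - t0, 0.0, 1.0)

z = np.diff(y)   # z_n = y_{n+1} - y_n: samples of A_jump * delta(t - t0)
                 # through the causal hat beta^1 = beta^0 * beta^0

s0 = np.sum(z)                           # c_{0,n} = 1
s1 = np.sum((np.arange(N - 1) + 1) * z)  # c_{1,n} = n + 1

print(s0, s1 / s0)   # jump amplitude and jump location t0
```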

Page 134

Sampling Piecewise Constant Signals

[Diagram: x(t) → kernel ϕ(t) → sampling with period T → y_n → finite difference z_n = y_n − y_{n−1}; equivalently dx(t)/dt → kernel (ϕ ∗ β⁰)(t) → sampling with period T → z_n]

Page 135

Piecewise Constant Signals. Example

[Figure: (a) original signal, (b) measured samples, (c) finite difference, (d) reconstructed signal]

Page 136

Sampling Diracs vs Sampling PCS

[Diagram: sampling Diracs: x(t) → B-spline kernel → y_n → coefficients c_{m,n} → moments s_m of x(t) → annihilating filter → Vandermonde solve → t₀..t_K, x₀..x_K. Sampling piecewise constant signals: x(t) → B-spline kernel → y_n → digital filter (1 − z) → z_n → coefficients → moments s_m of dx/dt → annihilating filter → Vandermonde solve → t₀..t_K, x₀..x_K]

Page 137

Kernels Reproducing Exponentials

Class 2 includes any function ϕ(t) that can reproduce exponentials:

∑_n c_{m,n} ϕ(t − n) = e^{α_m t},  α_m = α₀ + mλ and m = 0, 1, ..., L.

This includes any composite kernel of the form φ(t) ∗ β_~α(t), where β_~α(t) = β_{α₀}(t) ∗ β_{α₁}(t) ∗ ... ∗ β_{α_L}(t) and β_{α_i}(t) is an exponential spline (E-spline) of first order [UnserB:05]:

β_α(t)  ⟺  β̂_α(ω) = (1 − e^{α−jω})/(jω − α).

Notice:

• α can be complex.

• The E-spline has compact support.

• The E-spline reduces to the classical polynomial spline when α = 0.

Page 138

Kernels Reproducing Exponentials

[Figure: first-order E-spline β_{α₀}(t) and its exponentially weighted shifts reproducing e^{α₀t}]

The E-spline of first order β_{α₀}(t) reproduces the exponential e^{α₀t}:

∑_n c_{0,n} β_{α₀}(t − n) = e^{α₀t}.

In this case, c_{0,n} = e^{α₀n}. In general, c_{m,n} = c_{m,0} e^{α_m n}.
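This reproduction is easy to verify for the first-order causal E-spline (a sketch; exponent and grid are arbitrary choices):

```python
import numpy as np

def e_spline1(t, a):
    # first-order causal E-spline: e^{a t} on [0, 1), zero elsewhere
    return np.where((t >= 0) & (t < 1), np.exp(a * t), 0.0)

alpha0 = 0.3
n = np.arange(-5, 15)
t = np.linspace(0.0, 8.0, 401)

# c_{0,n} = e^{alpha0 n} reproduces e^{alpha0 t} on the covered interval
rec = sum(np.exp(alpha0 * k) * e_spline1(t - k, alpha0) for k in n)
print(np.max(np.abs(rec - np.exp(alpha0 * t))))  # ~0
```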

Page 139

Trigonometric E-Splines

• The exponent α of an E-spline can be complex. This means β_α(t) can be a complex function.

• However, if pairs of exponents are chosen to be complex conjugates, then the spline stays real.

• Example:

β_{α₀+jω₀}(t) ∗ β_{α₀−jω₀}(t) = (sin(ω₀t)/ω₀) e^{α₀t} for 0 ≤ t < 1;  −(sin(ω₀(t − 2))/ω₀) e^{α₀t} for 1 ≤ t < 2;  0 otherwise.

When α₀ = 0 (i.e., purely imaginary exponents), the spline is called a trigonometric spline.

Page 140

Trigonometric E-Splines

[Figure: scaled and shifted E-splines and the reproduced complex exponentials (four panels)]

Here ~α = (−jω₀, jω₀) with ω₀ = 0.2, and ∑_n c_{m,n} β_~α(t − n) = e^{jmω₀t}, m = −1, 1.

Notice: β_~α(t) is a real function, but the coefficients c_{m,n} are complex.

Page 141

Sampling Streams of Diracs with E-splines

• Assume that x(t) is a stream of K Diracs on the interval [0, τ[:

x(t) = ∑_{k=0}^{K−1} x_k δ(t − t_k).

The signal has 2K degrees of freedom.

• Assume that ϕ(t) is any function that can reproduce exponentials of the form

∑_n c_{m,n} ϕ(t − n) = e^{α_m t},  α_m = mλ and m = 0, 1, ..., L,

where L ≥ 2K − 1.

• We want to retrieve x(t) from the samples y_n = 〈x(t), ϕ(t/T − n)〉, with n = 0, 1, ..., N − 1 and NT = τ.

Page 142

Sampling Diracs with E-splines

Assume that x(t) =PK−1

k=0 xkδ(t− tk). We have that

sm =P

n cm,nyn

= 〈x(t),P

n cm,nϕ(t− n)〉

=R∞−∞ x(t)eαmtdt

=PK−1

k=0 xkeαmtk

=PK−1

k=0 xkeλmtk =

PK−1k=0 xku

mk , m = 0, 1, ..., L

and we can again use the annihilating filter method to reconstruct the Diracs. In matrix form

HCY = 0.

Notice: if αm = jmω0 with m = 0, 1, ..., N , then sm = x(mω0) = xm.

In this case C ∼ DFT .
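The resulting procedure differs from the polynomial case only in the final logarithm that maps the roots u_k back to locations (a sketch with illustrative values and noiseless moments):

```python
import numpy as np

lam = 0.5
t_true = np.array([0.2, 1.1])
x_true = np.array([1.0, 0.5])
K = len(t_true)

# Power sums s_m = sum_k x_k u_k^m with u_k = exp(lam * t_k)
u_true = np.exp(lam * t_true)
s = np.array([np.sum(x_true * u_true**m) for m in range(2 * K)])

# Annihilating filter (h_0 = 1), then locations from logs of the roots
S = np.array([[s[m - i] for i in range(1, K + 1)] for m in range(K, 2 * K)])
h = np.linalg.solve(S, -s[K:])
u = np.sort(np.roots(np.r_[1.0, h]).real)
t_hat = np.log(u) / lam

# Amplitudes from the Vandermonde system in u_k
x_hat = np.linalg.solve(u[None, :] ** np.arange(K)[:, None], s[:K])
```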

Page 143

Alternative Sampling Kernels

Class 3 includes any linear differential acquisition device, that is, any device for which the input and output are related by linear differential equations. This includes most electrical, mechanical, or acoustic systems.

A class 3 kernel ϕ̂(ω) = ∏_i (jω − b_i) / ∏_m (jω − α_m) can be converted into the class 2 kernel ∏_i (jω − b_i) β̂_~α(ω) by filtering the samples y_n = 〈x(t), ϕ(t − n)〉 with the FIR filter H(z) = ∏_{m=0}^{L} (1 − e^{α_m} z).

Example: assume ϕ̂(ω) = 1/(jω − α) and y_n = 〈x(t), ϕ(t − n)〉. Then

z_n = h_n ∗ y_n = y_n − e^α y_{n+1} = 〈x(t), ϕ(t − n) − e^α ϕ(t − n − 1)〉
    = (1/2π) 〈x̂(ω), e^{−jωn} (1 − e^{α−jω})/(jω − α)〉 = 〈x(t), β_α(t − n)〉.
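The conversion can be checked on a single Dirac (a sketch; we take α < 0 so that the kernel e^{αt}, t ≥ 0, with Fourier transform 1/(jω − α), is stable, and the values are illustrative):

```python
import numpy as np

alpha, t0 = -0.4, 3.6
n = np.arange(0, 10)

# Samples of x(t) = delta(t - t0) through phi(t) = e^{alpha t}, t >= 0
y = np.where(t0 - n >= 0, np.exp(alpha * (t0 - n)), 0.0)

# FIR filtering: z_n = y_n - e^alpha y_{n+1}
z = y[:-1] - np.exp(alpha) * y[1:]

def e_spline1(t, a):
    # first-order causal E-spline: e^{a t} on [0, 1), zero elsewhere
    return np.where((t >= 0) & (t < 1), np.exp(a * t), 0.0)

# The filtered samples coincide with E-spline samples: z_n = beta_alpha(t0 - n)
print(np.max(np.abs(z - e_spline1(t0 - n[:-1], alpha))))  # ~0
```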

Page 144

Reconstruction of FRI signals at the output of an RC circuit

[Diagram: x(t) → RC circuit (R, C) → uniform sampling (T = 1) → y_n → digital filter → annihilating filter method → amplitudes A_i, instants t_i]

A bit of circuit analysis ;-)

• Input x(t), samples y_n = 〈x(t), ϕ(t − n)〉 with ϕ̂(ω) = α/(α − jω) and α = 1/RC; ϕ(t) leads to the E-spline β_α(t) with α = 1/RC.

• An ideal resonant circuit (LC circuit) leads to E-splines with purely imaginary exponents: α = ±jω₀ with ω₀ = 1/√(LC). With this we can estimate x̂(ω₀).

• A classical RLC circuit leads to E-splines with complex exponents.

• More sophisticated passive or active circuits give higher-order splines.

Page 145

Application: Image Super-Resolution

Super-resolution is a multichannel sampling problem with unknown shifts. Use moments to retrieve the shifts or the geometric transformation between images.

(a) Original (512 × 512)  (b) Low-res. (64 × 64)  (c) Super-res (PSNR = 24.2 dB)

• Forty low-resolution and shifted versions of the original.

• Intuition: The disparity between images has a finite rate of innovation and can be retrieved exactly.

• Accurate registration is achieved by retrieving the continuous moments of the ‘Tiger’ from the samples.

• The registered images are interpolated to achieve super-resolution.

Page 146

Application: Image Super-Resolution

• A sample P_{m,n} in the blurred image is given by

P_{m,n} = 〈I(x, y), ϕ(x/2^J − n, y/2^J − m)〉.

• The scaling function ϕ(x, y) can reproduce polynomials:

∑_n ∑_m c^{(l,j)}_{m,n} ϕ(x − n, y − m) = x^l y^j,  l = 0, 1, ..., N;  j = 0, 1, ..., N.

• Retrieve the exact moments of I(x, y) from P_{m,n}:

τ_{l,j} = ∑_n ∑_m c^{(l,j)}_{m,n} P_{m,n} = 〈I(x, y), ∑_n ∑_m c^{(l,j)}_{m,n} ϕ(x/2^J − n, y/2^J − m)〉 = ∫∫ I(x, y) x^l y^j dx dy.

• Register the low-resolution images using the retrieved moments.
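A minimal registration sketch in the spirit of this slide (not the authors' pipeline; we use smooth synthetic images whose sample barycenters match the continuous first-order moments, so the coefficients c^{(0,0)} = 1, c^{(1,0)}_{m,n} = n, c^{(0,1)}_{m,n} = m suffice):

```python
import numpy as np

def barycenter(P):
    # zeroth- and first-order moments computed directly from the samples
    m_idx, n_idx = np.indices(P.shape)
    t00 = P.sum()
    return (n_idx * P).sum() / t00, (m_idx * P).sum() / t00

def blob(cx, cy, shape=(64, 64)):
    yy, xx = np.indices(shape)
    return np.exp(-((xx - cx)**2 + (yy - cy)**2) / 50.0)

P1 = blob(30.0, 32.0)          # reference low-resolution image
P2 = blob(33.25, 30.5)         # copy shifted by (dx, dy) = (3.25, -1.5)

x1, y1 = barycenter(P1)
x2, y2 = barycenter(P2)
print(x2 - x1, y2 - y1)        # estimated shift, close to (3.25, -1.5)
```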

Page 147

Application: Image Super-Resolution

[Figure: (a) original (2014 × 3039); (b) point spread function, vs. pixels (perpendicular)]

Page 148

Application: Image Super-Resolution

(a) Original (128 × 128)  (b) Super-res (1024 × 1024)

Page 149

Application: Image Super-Resolution

(a) Original (48 × 48)  (b) Super-res (480 × 480)

Page 150

Summary

• We have introduced alternative sampling kernels (e.g., B-splines, E-splines)

• Potential advantages:

– Kernels with compact support ⇒ reconstruction algorithms with low complexity.

– Very close to real devices (e.g., Electric circuits, Digital cameras)

• Promising Application: Image Super-resolution.

• Many other potential applications, such as parallel A-to-D converters.

Page 151

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 6, JUNE 2002 1417

Sampling Signals With Finite Rate of Innovation
Martin Vetterli, Fellow, IEEE, Pina Marziliano, and Thierry Blu, Member, IEEE

Abstract—Consider classes of signals that have a finite number of degrees of freedom per unit of time and call this number the rate of innovation. Examples of signals with a finite rate of innovation include streams of Diracs (e.g., the Poisson process), nonuniform splines, and piecewise polynomials.

Even though these signals are not bandlimited, we show that they can be sampled uniformly at (or above) the rate of innovation using an appropriate kernel and then be perfectly reconstructed. Thus, we prove sampling theorems for classes of signals and kernels that generalize the classic “bandlimited and sinc kernel” case. In particular, we show how to sample and reconstruct periodic and finite-length streams of Diracs, nonuniform splines, and piecewise polynomials using sinc and Gaussian kernels. For infinite-length signals with finite local rate of innovation, we show local sampling and reconstruction based on spline kernels.

The key in all constructions is to identify the innovative part of a signal (e.g., time instants and weights of Diracs) using an annihilating or locator filter: a device well known in spectral analysis and error-correction coding. This leads to standard computational procedures for solving the sampling problem, which we show through experimental results.

Applications of these new sampling results can be found in signal processing, communications systems, and biological systems.

Index Terms—Analog-to-digital conversion, annihilating filters, generalized sampling, nonbandlimited signals, nonuniform splines, piecewise polynomials, Poisson processes, sampling.

I. INTRODUCTION

MOST continuous-time phenomena can only be seen through sampling the continuous-time waveform, and typically, the sampling is uniform. Very often, instead of the waveform itself, one has only access to a smoothed or filtered version of it. This may be due to the physical set up of the measurement or may be by design.

Calling x(t) the original waveform, its filtered version is y(t) = (h̃ ∗ x)(t), where h̃(t) = h(−t) is the convolution kernel. Then, uniform sampling with a sampling interval T leads to samples given by

y(nT) = 〈h(t − nT), x(t)〉.  (1)

Manuscript received May 16, 2001; revised February 11, 2002. The work of P. Marziliano was supported by the Swiss National Science Foundation. The associate editor coordinating the review of this paper and approving it for publication was Dr. Paulo J. S. G. Ferreira.

M. Vetterli is with the LCAV, DSC, Swiss Federal Institute of Technology, Lausanne, Switzerland, and also with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]).

P. Marziliano was with the Laboratory for Audio Visual Communication (LCAV), Swiss Federal Institute of Technology, Lausanne, Switzerland. She is now with Genista Corporation, Genimedia SA, Lausanne, Switzerland (e-mail: [email protected]).

T. Blu is with the Biomedical Imaging Group, DMT, Swiss Federal Institute of Technology, Lausanne, Switzerland (e-mail: [email protected]).

Publisher Item Identifier S 1053-587X(02)04392-1.

Fig. 1. Sampling setup: x(t) is the continuous-time signal; h̃(t) = h(−t) is the smoothing kernel; y(t) is the filtered signal; T is the sampling interval; y_s(t) is the sampled version of y(t); and y(nT), n ∈ ℤ, are the sample values. The box C/D stands for continuous-to-discrete transformation and corresponds to reading out the sample values y(nT) from y_s(t).

The intermediate signal corresponding to an idealized sampling is given by

y_s(t) = ∑_{n∈ℤ} y(nT) δ(t − nT).  (2)

This setup is shown in Fig. 1. When no smoothing kernel is used, we simply have y(t) = x(t), which is equivalent to (1) with h(t) = δ(t). This simple model for having access to the continuous-time world is typical for acquisition devices in many areas of science and technology, including scientific measurements, medical and biological signal processing, and analog-to-digital converters.

The key question is, of course, whether the samples y(nT) are a faithful representation of the original signal x(t). If so, how can we reconstruct x(t) from the y(nT), and if not, what approximation do we get based on the samples y(nT)? This question is at the heart of signal processing, and the dominant result is the well-known sampling theorem of Whittaker et al., which states that if x(t) is bandlimited, that is, X(ω) = 0 for |ω| ≥ ω_m, then samples y(nT) with T = π/ω_m are sufficient to reconstruct x(t) [5], [15], [18]. Calling f_m = ω_m/(2π), which is the bandwidth of x(t) in cycles per second (Hz), we see that 2 f_m samples/s are a sufficient representation of x(t). The reconstruction formula is given by

x(t) = Σ_{n∈ℤ} y(nT) sinc(t/T − n)   (3)

where sinc(t) = sin(πt)/(πt). If x(t) is not bandlimited, convolution with sinc(t/T) (an ideal lowpass filter with frequency support [−π/T, π/T]) allows us to apply sampling and reconstruction of y(t), which is the lowpass approximation of x(t). This restriction of X(ω) to the interval [−π/T, π/T] provides the best approximation in the least squares sense of x(t) in the sinc space [18].
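As a quick numerical check of the reconstruction formula (3), the following sketch interpolates a test signal that is, by construction, a finite combination of shifted sincs, so the truncated interpolation sum is exact (assumed example coefficients; np.sinc implements sin(πx)/(πx), matching the definition above):

```python
import numpy as np

def sinc_interp(samples, grid, t, T=1.0):
    """Reconstruction formula (3): x(t) = sum_n y(nT) sinc(t/T - n)."""
    return sum(y * np.sinc(t / T - k) for k, y in zip(grid, samples))

# A bandlimited test signal built from finitely many shifted sincs, so the
# truncated interpolation sum reproduces it exactly (hypothetical example).
def x(t):
    return 2 * np.sinc(t) - np.sinc(t - 1) + 0.5 * np.sinc(t + 3)

grid = np.arange(-5, 6)   # sample instants nT with T = 1
samples = x(grid)         # x(n) is nonzero only at n = -3, 0, 1
```

Evaluating `sinc_interp(samples, grid, t)` at any t reproduces x(t), illustrating that the samples fully specify the bandlimited signal.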

A possible interpretation of the interpolation formula (3) is the following. Any real bandlimited signal can be seen as having 2 f_m degrees of freedom per unit of time, which is the number of samples per unit of time that specify it. In the present paper, this number of degrees of freedom per unit of time is called the rate

1053-587X/02$17.00 © 2002 IEEE

1418 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 6, JUNE 2002

of innovation of a signal1 and is denoted by ρ. In the bandlimited case above, the rate of innovation is ρ = 2 f_m.

In the sequel, we are interested in signals that have a finite rate of innovation, either on intervals or on average. Take a Poisson process, which generates Diracs with independent and identically distributed (i.i.d.) interarrival times, the distribution being exponential with probability density function λe^{−λt}. The expected interarrival time is given by 1/λ. Thus, the rate of innovation is λ since, on average, λ real numbers per unit of time fully describe the process.

Given a signal with a finite rate of innovation ρ, it seems attractive to sample it at a rate of ρ samples per unit of time. We know this will work for bandlimited signals, but will it work for a larger class of signals? Thus, the natural questions to pursue are the following.

1) What classes of signals of finite rate of innovation can be sampled uniquely, in particular, using uniform sampling?

2) What kernels allow for such sampling schemes?

3) What algorithms allow the reconstruction of the signal based on the samples?

In the present paper, we concentrate on streams of Diracs,

nonuniform splines, and piecewise polynomials. These are signal classes for which we are able to derive exact sampling theorems under certain conditions. The kernels involved are the sinc, the Gaussian, and the spline kernels. The algorithms, although more complex than the standard sinc sampling of bandlimited signals, are still reasonable (structured linear systems) but often also involve root finding.

As will become apparent, some of the techniques we use in this paper are borrowed from spectral estimation [16] as well as error-correction coding [1]. In particular, we make use of the annihilating filter method (see Appendices A and B), which is standard in high-resolution spectral estimation. The usefulness of this method in a signal processing context has been previously pointed out by Ferreira and Vieira [24] and applied to error detection and correction in bandlimited images [3]. In array signal processing, retrieval of time of arrival has been studied, which is related to one of our problems. In that case, a statistical framework using multiple sensors is derived (see, for example, [14] and [25]), and an estimation problem is solved. In our case, we assume deterministic signals and derive exact sampling formulas.

The outline of the paper is as follows. Section II formally defines the signal classes with finite rate of innovation treated in the sequel. Section III considers periodic signals in continuous time and derives sampling theorems for streams of Diracs,2 nonuniform splines, and piecewise polynomials. These types of signals have a finite number of degrees of freedom, and a sampling rate that is sufficiently high to capture these degrees of freedom, together with appropriate sampling kernels, allows perfect reconstruction. Section IV addresses the sampling of finite-length signals having a finite number of degrees of freedom using infinitely supported kernels like the sinc kernel and the Gaussian kernel. Again, if the critical number of

1This is defined in Section II and is different from the rate used in rate-distortion theory [2]. Here, rate corresponds to degrees of freedom that are specified by real numbers. In rate-distortion theory, the rate corresponds to the bits used in a discrete approximation of a continuously valued signal.

2We are using the Dirac distribution, and all of the operations, including derivatives, must be understood in the distribution sense.

samples is taken, we can derive sampling theorems. Section V concentrates on local reconstruction schemes. Given that the local rate of innovation is bounded, local reconstruction is possible, using, for example, spline kernels. In the Appendix, we review the “annihilating filter” method from spectral analysis and error-correction coding, with an extension to multiple zeros. This method is used in several of the proofs in the paper.

II. SIGNALS WITH FINITE RATE OF INNOVATION

In the Introduction, we informally discussed the intuitive notion of signals with finite rate of innovation. Let us introduce more precisely sets of signals having a finite rate of innovation. Consider a known function γ(t) and signals of the form

x(t) = Σ_{n∈ℤ} c_n γ(t − nT)   (4)

of which bandlimited signals [see (3)] are a particular case. Clearly, the rate of innovation is ρ = 1/T. There are many examples of such signals, for example, when γ(t) is a scaling function in a wavelet multiresolution framework [8], [20] or in approximation theory [17].

A more general case appears when we allow arbitrary shifts t_n, or

x(t) = Σ_{n∈ℤ} c_n γ(t − t_n).   (5)

For example, when γ(t) = δ(t), c_n = 1, t_0 = 0, and the interarrival times t_n − t_{n−1} are i.i.d. with exponential density, then we have the Poisson process of rate λ.

Allowing a set of functions {γ_r(t)}, r = 0, …, R, and arbitrary

shifts, we obtain

x(t) = Σ_{n∈ℤ} Σ_{r=0}^{R} c_{nr} γ_r(t − t_n).   (6)

Finally, we will be considering signals that are built using a noninnovative part, for example, a polynomial, and an innovative part such as (5) for nonuniform splines or, more generally, (6) for piecewise polynomials of degree R and arbitrary intervals (in that case, the γ_r(t) are one-sided power functions γ_r(t) = t^r for t ≥ 0 and 0 otherwise).

Assuming the functions {γ_r(t)} are known, it is clear that the

only degrees of freedom in a signal as in (6) are the time instants t_n and the coefficients c_{nr}. Introducing a counting function C_x(t_a, t_b) that counts the number of degrees of freedom in x(t) over the interval [t_a, t_b], we can define the rate of innovation as

ρ = lim_{τ→∞} (1/τ) C_x(−τ/2, τ/2).   (7)

Definition 1: A signal with a finite rate of innovation is a signal whose parametric representation is given in (5) or (6) and whose rate of innovation ρ, as defined in (7), is finite.

If we consider finite-length or periodic signals of length τ, then the number of degrees of freedom is finite, and the rate of innovation is ρ = C_x(0, τ)/τ. Note that in the uniform case given in (4), the instants, being predefined, are not degrees of freedom.

VETTERLI et al.: SAMPLING SIGNALS WITH FINITE RATE OF INNOVATION 1419

One can also define a local rate of innovation with respect to a moving window of size τ. Given a window of size τ, the local rate of innovation at time t is

ρ_τ(t) = (1/τ) C_x(t − τ/2, t + τ/2).   (8)

In this case, one is often interested in the maximal local rate, or

ρ_τ^max = max_{t∈ℝ} ρ_τ(t).   (9)

As τ → ∞, ρ_τ^max tends to ρ. To illustrate the differences between ρ and ρ_τ^max, consider again the Poisson process with expected interarrival time 1/λ. The rate of innovation is given by ρ = λ. However, for any finite τ, there is no bound on ρ_τ^max.

The reason for introducing the rate of innovation ρ is that one hopes to be able to “measure” a signal by taking ρ samples per unit of time and be able to reconstruct it. We know this to be true in the uniform case given in (4). The main contribution of this paper is to show that it is also possible for many cases of interest in the more general settings given by (5) and (6).

III. PERIODIC CONTINUOUS-TIME CASE

In this section, we consider τ-periodic signals that are either made of Diracs or of polynomial pieces. The natural representation of such signals is through the Fourier series

x(t) = Σ_{m∈ℤ} X[m] e^{j2πmt/τ}.   (10)

We show that such signals can be recovered uniquely from their projection onto a subspace of appropriate dimension. The subspace’s dimension equals the number of degrees of freedom of the signal, and we show how to recover the signal from its projection. One choice of the subspace is the lowpass approximation of the signal, which leads to a sampling theorem for periodic streams of Diracs, nonuniform splines, derivatives of Diracs, and piecewise polynomials.

A. Stream of Diracs

Consider a stream of K weighted Diracs periodized with period τ, where the locations t_k ∈ [0, τ) and the weights c_k ∈ ℝ. This signal has 2K degrees of freedom per period (the K locations and the K weights), and thus, the rate of innovation is

ρ = 2K/τ.   (11)

The periodic stream of Diracs can be rewritten, using Poisson’s summation formula, as

x(t) = Σ_{k=0}^{K−1} c_k Σ_{n∈ℤ} δ(t − t_k − nτ) = (1/τ) Σ_{m∈ℤ} Σ_{k=0}^{K−1} c_k e^{j2πm(t−t_k)/τ}.   (12)

The Fourier series coefficients are thus given by

X[m] = (1/τ) Σ_{k=0}^{K−1} c_k e^{−j2πmt_k/τ}   (13)

that is, a linear combination of K complex exponentials. Consider now a filter with finitely many coefficients {a_k}, k = 0, …, K, with z-transform

A(z) = Σ_{k=0}^{K} a_k z^{−k}   (14)

and having zeros at u_k = e^{−j2πt_k/τ}, that is

A(z) = Π_{k=0}^{K−1} (1 − u_k z^{−1}).   (15)

Note that A(z) is the convolution of K elementary filters with coefficients (1, −u_k) and that the convolution of such a filter with the exponential u_k^m is zero:

(1, −u_k) ∗ u_k^m = u_k^m − u_k u_k^{m−1} = 0.   (16)

Therefore, because X[m] is a sum of K exponentials, each being zeroed out by one of the roots of A(z), it follows that

(a ∗ X)[m] = Σ_{k=0}^{K} a_k X[m − k] = 0.   (17)

The filter {a_k} is thus called an annihilating filter since it annihilates K exponentials [16]. A more detailed discussion of annihilating filters is given in Appendix A. In error-correction coding, it is called the error locator polynomial [1]. The periodic time-domain function a(t) is obtained by inversion of the Fourier series or, equivalently, by evaluating A(z) for z = e^{−j2πt/τ}. Thus

a(t) = Π_{k=0}^{K−1} (1 − e^{j2π(t−t_k)/τ})   (18)

that is, a(t) has zeros at the locations t_k. Thus, in the time domain, we have the equivalent of (17) as a(t) x(t) = 0.

Note that A(z) has K + 1 nonzero coefficients a_k, k = 0, …, K, and that A(z) is unique for x(t) given by (12) with c_k ≠ 0 and distinct locations t_k ∈ [0, τ). This follows from the uniqueness of the set of roots in (15). Further, define the sinc sampling kernel of bandwidth B as h_B(t) = sinc(Bt). We are now ready to prove a sampling result for periodic streams of Diracs.
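The annihilation property (16)–(17) is easy to verify numerically. A minimal sketch in Python/NumPy with assumed example locations and weights (np.poly builds the monic polynomial with prescribed roots, i.e., exactly the coefficients a_k of (15)):

```python
import numpy as np

tau = 1.0
t_k = np.array([0.2, 0.55, 0.85])    # example Dirac locations (assumed)
c_k = np.array([1.0, 2.0, -1.0])     # example weights (assumed)
u = np.exp(-2j * np.pi * t_k / tau)  # u_k = exp(-j 2 pi t_k / tau)

# Fourier series coefficients X[m] for m = -8..8, as in (13).
m = np.arange(-8, 9)
X = (c_k[None, :] * u[None, :] ** m[:, None]).sum(axis=1) / tau

# Annihilating filter: coefficients of prod_k (1 - u_k z^{-1}),
# i.e., the monic polynomial with roots u_k.
a = np.poly(u)

# (a * X)[m] vanishes wherever the filter fully overlaps the data.
conv = np.convolve(a, X)
inner = conv[len(a) - 1 : len(X)]    # fully-overlapping part of the convolution
```

Every entry of `inner` is zero up to machine precision, which is statement (17).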

Theorem 1: Consider x(t), a periodic stream of Diracs of period τ, with K Diracs of weights c_k at locations t_k, as in (12). Take as a sampling kernel h_B(t) = sinc(Bt), where the bandwidth B is chosen such that it is greater than or equal to the rate of innovation given by (11), and sample y(t) = (h_B ∗ x)(t) at N uniform locations t = nT, where T = τ/N and N ≥ Bτ. Then, the samples

y(nT) = ⟨h_B(t − nT), x(t)⟩, n = 0, …, N − 1   (19)

are a sufficient characterization of x(t).

Proof: We first show how to obtain the Fourier series coefficients X[m], m = −M, …, M, from the samples y(nT). Based on these Fourier series coefficients, we then show how to obtain the annihilating filter A(z), which leads to the locations of the Diracs. Finally, the weights of the Diracs are found, thus specifying x(t) uniquely.

1) Finding X[m] from y(nT). Using (10) in (19), we have

y(nT) = ⟨h_B(t − nT), x(t)⟩   (20)
      = Σ_{m∈ℤ} X[m] ⟨h_B(t − nT), e^{j2πmt/τ}⟩   (21)
      = Σ_{|m|≤M} X[m] H_B(2πm/τ) e^{j2πmnT/τ}   (22)

where H_B(ω) is the Fourier transform of h_B(t), which retains only the coefficients |m| ≤ M. This system of equations is not invertible for degenerate choices of the sampling locations; in all other cases, the equations are of maximal rank 2M + 1. When 2M + 1 is a divisor of N, this is simply the inverse discrete-time Fourier transform (IDTFT) of the y(nT).

2) Finding the coefficients {a_k} of the filter that annihilates X[m].

We need to solve (17) for a_1, …, a_K, given a_0 = 1 and the X[m], |m| ≤ M. For example, pick K = 2 and a_0 = 1 to illustrate; then (17) for m = 1, 2 is equivalent to

X[1] + a_1 X[0] + a_2 X[−1] = 0
X[2] + a_1 X[1] + a_2 X[0] = 0.   (23)

In general, at critical sampling, we have a K × K Toeplitz system given by

[ X[0]     X[−1]   …  X[−K+1] ] [ a_1 ]      [ X[1] ]
[ X[1]     X[0]    …  X[−K+2] ] [ a_2 ]  = − [ X[2] ]
[   ⋮        ⋮     ⋱     ⋮    ] [  ⋮  ]      [  ⋮   ]
[ X[K−1]   X[K−2]  …  X[0]    ] [ a_K ]      [ X[K] ]   (24)

This is a classic Yule–Walker system [4], which in our case has a unique solution when there are K distinct Diracs in x(t), because there is then a unique annihilating filter.

3) Factoring A(z). Given the coefficients {1, a_1, …, a_K}, we factor the z-transform A(z) into its roots

A(z) = Π_{k=0}^{K−1} (1 − u_k z^{−1})   (25)

where u_k = e^{−j2πt_k/τ}, which leads to the locations t_k.

4) Finding the weights c_k.

Given the locations t_k, we can write K values of X[m] as linear combinations of the exponentials u_k^m following (13). Again, for K = 2

τ X[0] = c_0 + c_1, τ X[1] = c_0 u_0 + c_1 u_1.   (26)

In general, using X[0], …, X[K − 1], the system of equations is given by

[ 1          1          …  1           ] [ c_0     ]      [ X[0]   ]
[ u_0        u_1        …  u_{K−1}     ] [ c_1     ]  = τ [ X[1]   ]
[  ⋮          ⋮         ⋱    ⋮         ] [  ⋮      ]      [  ⋮     ]
[ u_0^{K−1}  u_1^{K−1}  …  u_{K−1}^{K−1} ] [ c_{K−1} ]      [ X[K−1] ]   (27)

which is a Vandermonde system and always has a solution when the u_k’s are distinct.
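The four steps of the proof translate directly into a few lines of linear algebra. A sketch in Python/NumPy with example locations and weights assumed; for simplicity, the X[m] are computed from the closed form (13) rather than from the samples of Step 1:

```python
import numpy as np

tau = 1.0
t_k = np.array([0.12, 0.37, 0.80])   # example Dirac locations in [0, tau)
c_k = np.array([1.0, -0.5, 2.0])     # example weights
K = len(t_k)

# Fourier series coefficients X[m], m = -K..K, as in (13).
m = np.arange(-K, K + 1)
u_true = np.exp(-2j * np.pi * t_k / tau)
X = (c_k[None, :] * u_true[None, :] ** m[:, None]).sum(axis=1) / tau

# Step 2: annihilating filter. With a_0 = 1, equations (17) for m = 1..K
# form the Toeplitz (Yule-Walker) system (24) in a_1..a_K.
T_mat = np.array([[X[K + i - k] for k in range(1, K + 1)]
                  for i in range(1, K + 1)])
a = np.linalg.solve(T_mat, -X[K + 1:2 * K + 1])

# Step 3: the roots u_k = exp(-j 2 pi t_k / tau) of A(z) give the locations.
roots = np.roots(np.concatenate(([1.0], a)))
t_est = np.sort(np.mod(-np.angle(roots) * tau / (2 * np.pi), tau))

# Step 4: Vandermonde system (27) for the weights, using X[0..K-1].
u = np.exp(-2j * np.pi * t_est / tau)
V = u[None, :] ** np.arange(K)[:, None]
c_est = np.real(np.linalg.solve(V, tau * X[K:2 * K]))
```

Both locations and weights are recovered to machine precision, as the theorem predicts for noiseless data.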

An interpretation of the above result is the following. Take x(t), and project it onto the lowpass subspace corresponding to its Fourier series coefficients −M to M. This projection, which is denoted by y(t), is a unique representation of x(t). Since it is lowpass with bandwidth B = (2M + 1)/τ, it can be sampled with a sampling period smaller than 1/B. Note that step 2 in the proof required 2K adjacent coefficients of X[m]. We chose the ones around the origin for simplicity, but any set will do. In particular, if the set is of the form {m_0, …, m_0 + 2K − 1}, then one can use bandpass sampling and recover x(t) as well.

Example 1: Consider a periodic stream of K = 8 weighted Diracs with period τ = 1024 and a sinc sampling kernel, as illustrated in Fig. 2(a) and (b). The lowpass approximation is obtained by filtering the stream of Diracs with the sampling kernel, as illustrated in Fig. 2(c). The reconstruction of x(t) from the samples is exact to machine precision and, thus, is not shown. The annihilating filter is illustrated in Fig. 3, and it can be seen that the locations of the Diracs are exactly the roots of the filter. The third step of the proof of Theorem 1 involves a factorization. This can be avoided by a method that is familiar in the coding literature and known as the Berlekamp–Massey algorithm [1]. In our case, it amounts to a spectral extrapolation procedure.

Corollary 1: Given the annihilating filter coefficients {1, a_1, …, a_K} and the spectral values X[0], …, X[K − 1], one can recover the entire spectrum of x(t) by the following recursion:

X[m] = −Σ_{k=1}^{K} a_k X[m − k], m ≥ K.   (28)

For negative m, use X[−m] = X*[m] since x(t) is real.

Proof: The Toeplitz system specifying the annihilating filter shows that the recursion must be satisfied for all m. This is a Kth-order recursive difference equation. Such a difference equation is uniquely specified by K initial conditions, in our case by X[0], …, X[K − 1], showing that the entire Fourier series is specified by (28).

Fig. 2. Stream of Diracs. (a) Periodic stream of K = 8 weighted Diracs with period τ = 1024. (b) Sinc sampling kernel h_B(t). (c) Lowpass approximation y(t); dashed lines mark the samples y(nT), T = 32, with y(nT) = ⟨h_B(t − nT), x(t)⟩. (d) Fourier series X[m] (dB). (e) Central Fourier series H_B[m], m = −50, …, 50 (dB). (f) Central Fourier series of the sample values Y[m], m = −50, …, 50 (dB).

Fig. 3. Real and imaginary parts of the annihilating filter a(t) = A(z)|_{z = e^{−j2πt/τ}} = Π_{k=0}^{K−1} (1 − e^{j2π(t−t_k)/τ}). The roots of the annihilating filter are exactly the locations of the Diracs.

The above recursion is routinely used in error-correction coding in order to fill in the error spectrum based on the error locator polynomial. In that application, finite fields are involved, and there is no problem with numerical precision. In our case, the real and complex fields are involved, and numerical stability is an issue. More precisely, A(z) has zeros on the unit circle, and the recursion (28) corresponds to convolution with 1/A(z), that is, a filter with poles on the unit circle. Such a filter is marginally stable and will lead to numerical instability. Thus, Corollary 1 is mostly of theoretical interest.

B. Nonuniform Splines

In this section, we consider periodic nonuniform splines of period τ. A signal x(t) is a periodic nonuniform spline of degree R with knots at t_k ∈ [0, τ), k = 0, …, K − 1, if and only if its (R + 1)th derivative is a periodic stream of K weighted Diracs

x^{(R+1)}(t) = Σ_{n∈ℤ} Σ_{k=0}^{K−1} c_k δ(t − t_k − nτ)

where c_k ∈ ℝ. Thus, the rate of innovation is

ρ = 2K/τ.   (29)

Using (12), we can state that the Fourier series coefficients of x^{(R+1)}(t) are X^{(R+1)}[m] = (1/τ) Σ_{k=0}^{K−1} c_k e^{−j2πmt_k/τ}. Differentiating (10) R + 1 times shows that these coefficients are

X^{(R+1)}[m] = (j2πm/τ)^{R+1} X[m].   (30)

This shows that X^{(R+1)}[m] can be annihilated by a filter A(z) of length K + 1. From Theorem 1, we can recover the periodic stream of Diracs x^{(R+1)}(t) from the Fourier series coefficients X^{(R+1)}[m], and thus follows the periodic

nonuniform spline.

Theorem 2: Consider a periodic nonuniform spline x(t) with period τ, containing K pieces of maximum degree R. Take a sinc sampling kernel h_B(t) such that the bandwidth B is greater than or equal to the rate of innovation given by (29), and sample y(t) = (h_B ∗ x)(t) at N uniform locations t = nT, where T = τ/N and N ≥ Bτ. Then, x(t) is uniquely represented by the samples

y(nT) = ⟨h_B(t − nT), x(t)⟩, n = 0, …, N − 1.   (31)

Proof: The proof is similar to the proof of Theorem 1. We determine the Fourier series coefficients X[m], m = −M, …, M, from the samples y(nT) exactly as in Step 1 of the proof of Theorem 1.

Fig. 4. Nonuniform spline. (a) Periodic nonuniform linear spline with K = 4 knots (transitions) and period τ = 1024. (b) Sinc sampling kernel h_B(t). (c) Lowpass approximation y(t); dashed lines mark the samples y(nT), T = 64, with y(nT) = ⟨h_B(t − nT), x(t)⟩. (d) Fourier series X[m] (dB). (e) Central Fourier series H_B[m], m = −50, …, 50 (dB). (f) Central Fourier series Y[m], m = −50, …, 50 (dB). Reconstruction is within machine precision.

Note that the Fourier series coefficients of the (R + 1) times differentiated nonuniform spline are given by (30); therefore, the values X[m], m = −M, …, M, provide the X^{(R+1)}[m]. We substitute X[m] by X^{(R+1)}[m] in Steps 2–4 of the proof of Theorem 1 and thus obtain the stream of Diracs x^{(R+1)}(t), that is, the locations t_k and the weights c_k.

The nonuniform spline (see Fig. 4) is obtained using (30) to get the missing X[m] (that is, |m| > M) and substituting these values in (10).3
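Relation (30) reduces spline recovery to Dirac recovery. A minimal sketch for the simplest case, a degree R = 0 spline (a periodic box) whose derivative is one positive and one negative Dirac at the knots; the knot values are assumed example data, and the Fourier coefficients are computed in closed form rather than from samples:

```python
import numpy as np

tau = 1.0
t0, t1 = 0.25, 0.7   # knots of a periodic box (degree R = 0 spline), assumed
K = 2

# Fourier coefficients of x(t) = 1 on [t0, t1), 0 elsewhere on one period,
# computed analytically; in Theorem 2 they would come from the samples.
m = np.arange(1, 2 * K + 1)
X = (np.exp(-2j * np.pi * m * t0 / tau)
     - np.exp(-2j * np.pi * m * t1 / tau)) / (2j * np.pi * m)

# Relation (30) with R = 0: Fourier coefficients of x'(t), a stream of two
# Diracs with weights +1 at t0 and -1 at t1.
Xd = (2j * np.pi * m / tau) * X

# Annihilating filter on Xd, then roots -> knot locations, as in Theorem 1.
A = np.array([[Xd[i - k] for k in range(1, K + 1)] for i in range(K, 2 * K)])
a = np.linalg.solve(A, -Xd[K:])
roots = np.roots(np.concatenate(([1.0], a)))
knots = np.sort(np.mod(-np.angle(roots) * tau / (2 * np.pi), tau))
```

The recovered knots match t0 and t1 to machine precision, illustrating the substitution of X[m] by X^{(R+1)}[m] in Steps 2–4.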

C. Derivative of Diracs

Derivatives of Diracs are considered in this section to set the grounds for the following section on piecewise polynomial signals. The Dirac delta δ(t) is a distribution whose rth derivative δ^{(r)}(t) has the property

⟨δ^{(r)}(t − t_0), φ(t)⟩ = (−1)^r φ^{(r)}(t_0)

where φ(t) is r times continuously differentiable [9].

Consider a periodic stream of differentiated Diracs

x(t) = Σ_{n∈ℤ} Σ_{k=0}^{K−1} Σ_{r=0}^{R−1} c_{kr} δ^{(r)}(t − t_k − nτ)   (32)

with the usual periodicity conditions, t_k ∈ [0, τ) and c_{kr} ∈ ℝ.

Note that there are K locations and KR weights, which makes at most K(R + 1) degrees of freedom per period; that is, the rate of innovation is

ρ = K(R + 1)/τ.   (33)

3Note that X[0] is obtained directly.

The corresponding Fourier series coefficients are given by

X[m] = (1/τ) Σ_{k=0}^{K−1} Σ_{r=0}^{R−1} c_{kr} (j2πm/τ)^r e^{−j2πmt_k/τ}.   (34)

Let u_k = e^{−j2πt_k/τ}; then, the Fourier series simplifies to

X[m] = (1/τ) Σ_{k=0}^{K−1} ( Σ_{r=0}^{R−1} c_{kr} (j2πm/τ)^r ) u_k^m.   (35)

From Proposition 4 in Appendix A, the filter (1 − u_k z^{−1})^R annihilates the exponential u_k^m multiplied by any polynomial in m of degree smaller than R, and therefore, the filter defined by

A(z) = Π_{k=0}^{K−1} (1 − u_k z^{−1})^R   (36)

with R-fold zeros at the u_k annihilates X[m]. The locations of the differentiated Diracs are obtained by first finding the annihilating filter coefficients a_0, …, a_{KR} and then finding the roots of A(z). The weights c_{kr}, on the other hand, are found by solving the system in (35).

Theorem 3: Consider a periodic stream of differentiated Diracs x(t) with period τ, as in (32). Take as a sampling kernel h_B(t) = sinc(Bt), where the bandwidth B is greater than or equal to the rate of innovation given by (33), and sample y(t) = (h_B ∗ x)(t) at N uniform locations t = nT, where T = τ/N and N ≥ Bτ. Then, the samples

y(nT) = ⟨h_B(t − nT), x(t)⟩, n = 0, …, N − 1   (37)

are a sufficient characterization of x(t).

Fig. 5. Piecewise polynomial. (a) Periodic piecewise linear spline with K = 4 knots (transitions) and period τ = 1024. (b) Sinc sampling kernel h_B(t). (c) Lowpass approximation y(t); dashed lines mark the samples y(nT), T = 32, with y(nT) = ⟨h_B(t − nT), x(t)⟩. (d) Fourier series X[m] (dB). (e) Central Fourier series H_B[m], m = −50, …, 50 (dB). (f) Central Fourier series of the sample values Y[m], m = −50, …, 50 (dB). Reconstruction from the samples is within machine precision.

Proof: We follow the same steps as in the proof of Theorem 1. From the samples y(nT) with n = 0, …, N − 1, we obtain the values X[m], m = −M, …, M, from which we determine the annihilating filter coefficients a_0, …, a_{KR}. In Step 3, the annihilating filter is factored as in (36), from which we obtain the multiple roots u_k and, therefore, the locations t_k. The weights c_{kr} are found by solving the generalized Vandermonde system in (35). Similar to usual Vandermonde matrices, the determinant of the matrix given by the system vanishes only when u_k = u_{k′} for some k ≠ k′. This is not our case; thus, the matrix is nonsingular, and the system admits a unique solution.

D. Piecewise Polynomials

Similar to the definition of periodic nonuniform splines, a signal x(t) is a periodic piecewise polynomial with K pieces, each of maximum degree R, if and only if its (R + 1)th derivative is a stream of differentiated Diracs, that is, x^{(R+1)}(t) is given by (32), with the usual periodicity conditions. The degrees of freedom per period are K from the locations and K(R + 1) from the weights; thus, the rate of innovation is

ρ = K(R + 2)/τ.   (38)

By analogy with nonuniform splines, we have the following.

Theorem 4: Consider a periodic piecewise polynomial x(t) with period τ, containing K pieces of maximum degree R. Take a sinc sampling kernel h_B(t) such that the bandwidth B is greater than or equal to the rate of innovation given by (38), and sample y(t) = (h_B ∗ x)(t) at N uniform locations t = nT, where T = τ/N and N ≥ Bτ. Then, x(t) is uniquely represented by the samples

y(nT) = ⟨h_B(t − nT), x(t)⟩, n = 0, …, N − 1.   (39)

Proof: The proof follows the same steps as the proofs of Theorems 2 and 3.

First, from the samples y(nT), we obtain the Fourier series coefficients X[m], m = −M, …, M, of the periodic piecewise polynomial signal x(t). Then, using (30), we obtain the Fourier series coefficients X^{(R+1)}[m] of the (R + 1) times differentiated signal, which is a stream of differentiated Diracs. Using Theorem 3, we are able to recover the stream of differentiated Diracs from the X^{(R+1)}[m]. Similar to the periodic nonuniform spline, the periodic piecewise polynomial x(t) (see Fig. 5) is obtained using (30) to get the missing X[m] (that is, |m| > M) and substituting these values in (10).

IV. FINITE-LENGTH SIGNALS WITH FINITE

RATE OF INNOVATION

A finite-length signal with finite rate of innovation clearly has a finite number of degrees of freedom. The question of interest is as follows: Given a sampling kernel with infinite support, is there a finite set of samples that uniquely specifies the signal? In the following sections, we sample signals made of a finite number of weighted Diracs with infinite-support sampling kernels such as the sinc and the Gaussian.

A. Sinc Sampling Kernel

Consider a continuous-time signal with a finite number of weighted Diracs

x(t) = Σ_{k=0}^{K−1} c_k δ(t − t_k)   (40)

and the sinc sampling kernel. The sample values are obtained by filtering the signal with a sinc sampling kernel. This is equivalent to taking the inner product between a shifted version of the sinc and the signal, that is, y_n = ⟨h_B(t − n), x(t)⟩, where h_B(t) = sinc(Bt), with bandwidth B. The question that arises is the following: How many of these samples do we need to recover the signal? The signal has 2K degrees of freedom: K from the weights and K from the locations of the Diracs; thus, N ≥ 2K samples will be sufficient to recover the signal. Similar to the previous cases, the reconstruction method requires solving two systems of linear equations: one for the locations of the Diracs and one for the weights of the Diracs. These systems admit solutions if the following conditions are satisfied.

C1] The matrix defined by (45) has full rank.

C2] The matrix defined by (43) has full rank.

Theorem 5: Given a finite stream of K weighted Diracs and a sinc sampling kernel h_B(t), if conditions C1] and C2] are satisfied, then the samples

y_n = ⟨h_B(t − n), x(t)⟩, n = 0, …, N − 1   (41)

are a sufficient representation of the signal.

Proof: The sample values in (41) are equivalent to

y_n = Σ_{k=0}^{K−1} c_k sinc(B(n − t_k)) = Σ_{k=0}^{K−1} c_k sin(πB(n − t_k)) / (πB(n − t_k)).   (42)

Let us define the degree-K Lagrange-type polynomial P(t) = Π_{k=0}^{K−1} (t − t_k). Multiplying both sides of (42) by P(n), we find an expression in terms of interpolating polynomials

(43)

(44)

Since the right-hand side of (43) is a polynomial in the variable n, applying a finite-difference operator of sufficient order makes the left-hand side vanish,4 that is,

4Note that this finite-difference operator plays the same role as the annihilating filter in the previous section.

If we let the unknowns be the coefficients of the polynomials involved, then this annihilating equation is equivalent to

(45)

(46)

where the matrix in (46) has one row per finite-difference equation. The system (46) admits a nontrivial solution when N ≥ 2K and condition C1] holds. Therefore, (45) can be used to find the K unknown coefficients of P(t), which lead to the locations t_k since these are the roots of P(t). Once the locations are determined, the weights of the Diracs are found by solving the system in (44) for c_0, …, c_{K−1}; the system admits a solution from condition C2].

Note that the result does not depend on the bandwidth B. This, of course, holds only in theory since, in practice, the matrix in (46) may be ill-conditioned if B is not chosen appropriately. A natural solution to this conditioning problem is to take more than the critical number of samples and solve (46) using a singular value decomposition (SVD). This is also the method of choice when noise is present in the signal. The matrix in (44), which is used to find the weights of the Diracs, is less sensitive to the value of B and better conditioned on average.
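The oversampling remedy can be sketched on the periodic annihilating system of Section III (assumed example data, with a small perturbation standing in for noise). np.linalg.lstsq computes the least-squares solution via an SVD internally, playing the role of the square solve of the critical case:

```python
import numpy as np

tau, K, M = 1.0, 2, 6
t_k = np.array([0.3, 0.64])          # example locations (assumed)
c_k = np.array([1.0, 0.8])           # example weights (assumed)
u = np.exp(-2j * np.pi * t_k / tau)

m = np.arange(-M, M + 1)             # 2M + 1 = 13 coefficients, more than 2K
X = (c_k[None, :] * u[None, :] ** m[:, None]).sum(axis=1) / tau

rng = np.random.default_rng(0)       # small perturbation standing in for noise
X = X + 1e-4 * (rng.standard_normal(X.size) + 1j * rng.standard_normal(X.size))

# Rectangular annihilating system, solved in the least-squares sense.
A = np.array([[X[i - k] for k in range(1, K + 1)] for i in range(K, 2 * M + 1)])
a = np.linalg.lstsq(A, -X[K:], rcond=None)[0]

roots = np.roots(np.concatenate(([1.0], a)))
t_est = np.sort(np.mod(-np.angle(roots) * tau / (2 * np.pi), tau))
```

With the extra equations, the estimated locations remain accurate despite the perturbation, which is the practical point of oversampling.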

B. Gaussian Sampling Kernel

Consider sampling the same signal as in (40) but, this time, with a Gaussian sampling kernel h_σ(t) = e^{−t²/(2σ²)}. Similar to the sinc sampling kernel, the samples are obtained by filtering the signal with the Gaussian kernel. Since there are 2K unknown variables, we show next that N ≥ 2K samples are sufficient to represent the signal.

Theorem 6: Given a finite stream of K weighted Diracs and a Gaussian sampling kernel h_σ(t) = e^{−t²/(2σ²)}. If N ≥ 2K, then the sample values

y_n = ⟨h_σ(t − nT), x(t)⟩, n = 0, …, N − 1   (47)

are sufficient to reconstruct the signal.

Proof: The sample values are given by

y_n = Σ_{k=0}^{K−1} c_k e^{−(nT − t_k)²/(2σ²)}.   (48)

If we let Y_n = y_n e^{n²T²/(2σ²)}, a_k = c_k e^{−t_k²/(2σ²)}, and u_k = e^{t_k T/σ²}, then (48) is equivalent to

Y_n = Σ_{k=0}^{K−1} a_k u_k^n.   (49)

Note that we reduced the expression to a linear combination of real exponentials u_k^n. Since N ≥ 2K, the annihilating filter method described in Appendix B allows us to determine the u_k and a_k. In addition, note that the Toeplitz system in (68) has real exponential components, and therefore, a solution exists when the number of equations is greater than or equal to the number of unknowns, that is, N ≥ 2K, and the rank of

Fig. 6. (a) Hat spline sampling kernel β_1(t/T), T = 1. (b) Bilevel signal with up to two transitions in an interval [n, n + 1] sampled with a hat sampling kernel.

the system is maximal, which is the case by hypothesis. Furthermore, σ must be carefully chosen; otherwise, the system is ill-conditioned. The locations are then given by

t_k = (σ²/T) ln u_k   (50)

and the weights of the Diracs are simply given by

c_k = a_k e^{t_k²/(2σ²)}.   (51)

Here, unlike in the sinc case, we have an almost local reconstruction because of the exponential decay of the Gaussian sampling kernel, which brings us to the next topic.
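The proof of Theorem 6 is constructive and can be sketched in a few lines. Example locations, weights, σ, and T are assumptions for illustration; the change of variables unwraps the Gaussian into a real-exponential (Prony-type) problem handled by the annihilating filter:

```python
import numpy as np

sigma, T, K = 1.0, 1.0, 2
t_k = np.array([0.6, 2.2])           # example Dirac locations (assumed)
c_k = np.array([1.5, -0.7])          # example weights (assumed)

n = np.arange(2 * K)                 # N = 2K samples
y = (c_k[None, :] * np.exp(-(n[:, None] * T - t_k[None, :]) ** 2
                           / (2 * sigma ** 2))).sum(axis=1)

# Change of variables: Y_n = y_n e^{n^2 T^2/(2 sigma^2)} = sum_k a_k u_k^n,
# with u_k = e^{t_k T / sigma^2} and a_k = c_k e^{-t_k^2 / (2 sigma^2)}.
Y = y * np.exp(n ** 2 * T ** 2 / (2 * sigma ** 2))

# Annihilating-filter (Prony) step on the real exponentials.
A = np.array([[Y[i - k] for k in range(1, K + 1)] for i in range(K, 2 * K)])
h = np.linalg.solve(A, -Y[K:])
u = np.sort(np.real(np.roots(np.concatenate(([1.0], h)))))

t_est = sigma ** 2 * np.log(u) / T                       # locations, cf. (50)
a_est = np.linalg.solve(u[None, :] ** n[:K, None], Y[:K])
c_est = a_est * np.exp(t_est ** 2 / (2 * sigma ** 2))    # weights, cf. (51)
```

Both locations and weights are recovered exactly from N = 2K noiseless samples; as the text warns, σ must be chosen with the spread of the locations in mind to keep the exponentials well-conditioned.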

V. INFINITE-LENGTH BILEVEL SIGNALS WITH FINITE

LOCAL RATE OF INNOVATION

In this section, we consider the dual problem of Section IV, that is, infinite-length signals with a finite local rate of innovation and sampling kernels with compact support [19]. In particular, the B-splines of different degrees d are considered [17]

β_d(t) = (β_0 ∗ β_0 ∗ ⋯ ∗ β_0)(t)   (d + 1 factors)   (52)

where the box spline is defined by β_0(t) = 1 for t ∈ [0, 1] and 0 elsewhere.

We develop local reconstruction algorithms that depend on moving intervals equal to the size of the support of the sampling kernel.5 The advantage of local reconstruction algorithms is that their complexity does not depend on the length of the signal. In this section, we consider bilevel signals, but, similar to the previous sections, the results can be extended to piecewise polynomials.

Let x(t) be an infinite-length continuous-time signal that takes on the two values 0 and 1, with initial condition x(0) = 0 and with a finite local rate of innovation. These are called bilevel signals and are completely represented by their transition values t_k.

Suppose a bilevel signal x(t) is sampled with a box spline β_0(t/T). The sample values obtained are y_n = ⟨β_0(t/T − n), x(t)⟩, which correspond to the area occupied by the signal in the interval [nT, (n + 1)T]. Thus, if there is at most one transition per box, then we can recover the transition from the sample. This leads us to the following proposition.

5The size of the support of β_d(t/T) is equal to (d + 1)T.

Proposition 1: A bilevel signal x(t) with initial condition x(0) = 0 is uniquely determined from the samples y_n = ⟨β_0(t/T − n), x(t)⟩, where β_0(t/T) is the box spline, if and only if there is at most one transition in each interval [nT, (n + 1)T].

Proof: For simplicity, let T = 1, and suppose that x(n) = 0. If there are no transitions in the interval [n, n + 1], then the sample value is y_n = 0. If there is one transition t_k in the interval [n, n + 1], then the sample value is equal to y_n = n + 1 − t_k, from which we uniquely obtain the transition value t_k. To show necessity, suppose there are two transitions t_k, t_{k+1} in the interval [n, n + 1]; then, the sample value is equal to y_n = t_{k+1} − t_k and is not sufficient to determine both transitions. Thus, there must be at most one transition per interval to uniquely define the signal.
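Proposition 1’s reconstruction is purely local, one sample per interval. A minimal sketch with assumed example transition times (T = 1, x(0) = 0); the box samples are computed analytically as areas, and the recovery inverts one sample at a time:

```python
# Assumed example transitions: at most one per unit interval (Proposition 1).
trans = [0.3, 1.7, 2.4, 3.9]

def box_sample(n):
    """y_n = area of the 0/1 signal over [n, n+1] (inner product with the box)."""
    pts = [n] + [t for t in trans if n < t < n + 1] + [n + 1]
    level = sum(t <= n for t in trans) % 2      # signal value just after n
    area = 0.0
    for lo, hi in zip(pts[:-1], pts[1:]):
        area += level * (hi - lo)
        if hi in trans:                         # toggle at each transition
            level ^= 1
    return area

def recover(y):
    """One transition per interval: rising edge at n+1-y_n, falling at n+y_n."""
    level, found = 0, []
    for n, yn in enumerate(y):
        if 0.0 < yn < 1.0:
            found.append(n + 1 - yn if level == 0 else n + yn)
            level ^= 1
    return found
```

Running `recover([box_sample(n) for n in range(4)])` returns the transition times exactly; the algorithm's cost per interval is constant, independent of the signal length, which is the advertised advantage of local reconstruction.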

Now, consider shifting the bilevel signal by an unknown shift; then, there can be two transitions in an interval of length T, and one box function will not be sufficient to recover the transitions. Suppose we double the sampling rate; then, the support of the box sampling kernel covers two sampling intervals, and we have two sample values covering the interval, but these values are identical (compare their areas). Therefore, increasing the sampling rate is still insufficient.

This brings us to consider a sampling kernel not only with a larger support but with added information. For example, the hat spline function β_1(t), equal to t for t ∈ [0, 1], to 2 − t for t ∈ [1, 2], and to 0 elsewhere, leads to the sample values defined by y_n = ⟨β_1(t/T − n), x(t)⟩.

Fig. 6 illustrates that there are two sample values covering the interval from which we can uniquely determine the signal.

Proposition 2: An infinite-length bilevel signal x(t), with initial condition x(0) = 0, is uniquely determined from the samples defined by y_n = ⟨β_1(t/T − n), x(t)⟩, where β_1(t/T) is the hat sampling kernel, if and only if there are at most two transitions in each interval [nT, (n + 2)T].

Proof: Again, for simplicity, let T = 1, and suppose the signal is known up to time n. First, we show sufficiency by showing the existence and uniqueness of a solution. Then, we show necessity by a counterexample.

Similar to the box sampling kernel, the sample values will depend on the configuration of the transitions in the interval [n, n + 2]. If there are at most two transitions in the interval [n, n + 2], then the possible configurations are (0,0), (0,1), (0,2), (1,0), (1,1), (2,0), where the first and

second components indicate the number of transitions in the intervals [n, n + 1] and [n + 1, n + 2], respectively. Furthermore, since the hat sampling kernel is of degree one, we obtain for each configuration a quadratic system of equations in the transition variables

(53)

(54)

which admits a solution in the given interval.

As for uniqueness, if neither sample indicates a transition, then this implies configuration (0,0).

If the first sample indicates no transition and the second does, then the possible configurations are (0,1) and (0,2). By hypothesis, there are at most two transitions in the interval [n, n + 2]; the value of the second sample then decides whether the configuration in the interval is (0,1) or (0,2).

If the samples indicate transitions only in the first subinterval, then this implies configuration (2,0) or (1,0), the two being distinguished by the sample values in the same manner.

Necessity is shown by counterexample. Consider a bilevel signal with three transitions in the interval [0, 2] but with all three in the interval [0, 1]. Then, the quadratic system of equations is

(55)

(56)

which does not admit a unique but an infinite number ofsolutions. Thus, there must be at most 2 transitions in aninterval [0,2].

The pseudo-code for sampling bilevel signals using the box and hat functions is given in full detail in [10], [22]. A stronger condition than the one in Proposition 2 is to require at most two transitions in every (sliding) interval of length 2T, not only in the fixed intervals of the proposition. In that case, we are ensured that on any interval of length 2T there are at most two transitions and, therefore, the reconstruction is unique. Based on Propositions 1 and 2, we conjecture that using splines of degree R, a suitably bounded local rate of innovation ensures unicity of the reconstruction.

VI. CONCLUSION

We considered signals with finite rate of innovation that allow uniform sampling after appropriate smoothing and perfect reconstruction from the samples. For signals like streams of Diracs, nonuniform splines, and piecewise polynomials, we were able to derive exact reconstruction formulas, even though these signals are nonbandlimited.

The methods rely on separating the innovation in a signal from the rest and identifying the innovative part from the samples only. In particular, the annihilating filter method plays a key role in isolating the innovation. Some extensions of these results, like to piecewise bandlimited signals, are presented in [11] and [21], and more details, including a full treatment of the discrete-time case, are available in [10], [22], and [23].

To prove the sampling theorems, we assumed deterministic, noiseless signals. In practice, noise will be present, and this can be dealt with by using oversampling and solving the various systems involved using the singular value decomposition (SVD). Such techniques are standard, for example, in noisy spectral estimation [16]. Initial investigations using these techniques in our sampling problem are promising [13].

It is also of interest to compare our notion of “finite rate of innovation” with the classic Shannon bandwidth [12]. The Shannon bandwidth finds the dimension of the subspace (per unit of time) that allows us to represent the space of signals of interest. For bandpass signals, where Nyquist’s rate is too large, Shannon’s bandwidth is the correct notion, as it is for certain other spread-spectrum signals. For pulse position modulation (PPM) [7], Shannon’s bandwidth is proportional to the number of possible positions, whereas the rate of innovation is fixed per interval.6 Thus, the rate of innovation coincides with our intuition for degrees of freedom.

This discussion also indicates that an obvious application of our results is in communications systems, for example, in ultrawideband (UWB) communication. In such a system, a very narrow pulse is generated, and its position carries the information. In [6], initial results indicate that a decoder can work at a much lower rate than Nyquist’s by using our sampling results. Finally, filtered streams of Diracs, known as shot noise [7], can also be decoded with low sampling rates while still recovering the exact positions of the Diracs. Therefore, we expect the first applications of our sampling results to appear in wideband communications.

Finally, the results presented so far raise a number of questions for further research. What other signals with finite rate of innovation can be sampled and perfectly reconstructed? What tools other than the annihilating filter can be used to identify innovation, and what other types of innovation can be identified? What are the multidimensional equivalents of the above results, and are they computationally feasible? These topics are currently under investigation. Thus, the connection between sampling theory on the one hand, and spectral analysis and error correction coding on the other, is quite fruitful.

APPENDIX A
ANNIHILATING FILTERS

Here, we give a brief overview of annihilating filters (see [16] for more details). First, we have the following definition.

Definition 2: A filter (h_n) is called an annihilating filter of a signal (x_n) when

(h ∗ x)_n = ∑_k h_k x_{n−k} = 0 for all n. (57)

6The number of degrees of freedom per interval is 1 or 2, depending on whether the amplitude is fixed or free.


VETTERLI et al.: SAMPLING SIGNALS WITH FINITE RATE OF INNOVATION 1427

Next, we give annihilating filters for signals that are linear combinations of exponentials.

Proposition 3: The signal x_n = ∑_{k=0}^{K−1} c_k u_k^n, where the u_k are distinct and n ∈ Z, is annihilated by the filter

H(z) = ∑_{k=0}^{K} h_k z^{−k} = ∏_{k=0}^{K−1} (1 − u_k z^{−1}). (58)

Proof: Note that

(h ∗ x)_n = ∑_k h_k x_{n−k} (59)

= ∑_k h_k ∑_{k′} c_{k′} u_{k′}^{n−k} (60)

= ∑_{k′} c_{k′} u_{k′}^n H(u_{k′}) = 0. (61)

Thus, H annihilates x_n.

Next, we show that the signal x_n = n^m u_0^n is annihilated by a filter with an (m + 1)-fold zero at u_0.

Proposition 4: The signal x_n = n^m u_0^n is annihilated by the filter

H(z) = (1 − u_0 z^{−1})^{m+1}. (62)

Proof: Note that

(h ∗ x)_n = ∑_k h_k (n − k)^m u_0^{n−k} (63)

= u_0^n ∑_k h_k (n − k)^m u_0^{−k}. (64)

By differentiating i times the identity H(z) = ∑_k h_k z^{−k} at z = u_0, which is a zero of order m + 1, we easily see that

∑_k h_k k^i u_0^{−k} = 0. (65)

This is true for i = 0, 1, …, m. Thus, ∑_k h_k p(k) u_0^{−k} = 0 for all polynomials p of degree less than or equal to m, in particular, for p(k) = (n − k)^m. Thus, H annihilates x_n.

It follows from Propositions 3 and 4 that the filter H(z) = ∏_k (1 − u_k z^{−1})^{m_k+1} annihilates the signal x_n = ∑_k p_k(n) u_k^n, where each p_k is a polynomial of degree at most m_k.
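As a quick numerical sanity check, the combined filter of Propositions 3 and 4 can be verified directly. The sketch below (with arbitrarily chosen poles and weights, using NumPy) builds a signal with one plain exponential and one exponential carrying a degree-one polynomial weight, and convolves it with the filter having a simple zero and a double zero at the corresponding u_k:

```python
import numpy as np

# Hypothetical check of Propositions 3 and 4: x_n = c1*u1^n + (c2 + c3*n)*u2^n
u1, u2 = 0.9 * np.exp(1j * 0.5), 1.1 * np.exp(-1j * 1.2)
c1, c2, c3 = 2.0, -1.0, 0.7

n = np.arange(12)
x = c1 * u1 ** n + (c2 + c3 * n) * u2 ** n

# H(z) = (1 - u1 z^-1)(1 - u2 z^-1)^2: a simple zero at u1, a double zero at u2.
h = np.poly([u1, u2, u2])

# The filter annihilates x wherever the convolution is fully supported.
residual = np.convolve(h, x)[len(h) - 1 : len(x)]
print(np.max(np.abs(residual)))    # numerically zero
```

The double zero at u2 is what absorbs the polynomial weight c2 + c3·n, exactly as in the proof of Proposition 4.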

APPENDIX B
ANNIHILATING FILTER METHOD

The annihilating filter method consists of finding the values c_k and u_k in

x_n = ∑_{k=0}^{K−1} c_k u_k^n (66)

and is composed of three parts: First, we need to find the annihilating filter, which involves solving a linear system of equations; second, we need to find the roots of the z-transform of the annihilating filter, which is a nonlinear function; third, we must solve another linear system of equations to find the weights.

1) Finding the annihilating filter. The filter coefficients h_k in

H(z) = ∑_{k=0}^{K} h_k z^{−k}

must be such that (57) is satisfied, or

∑_{k=0}^{K} h_k x_{n−k} = 0. (67)

In matrix/vector form, the system in (67) is equivalent to

⎡ x_K     x_{K−1} ···  x_0       ⎤ ⎡ h_0 ⎤   ⎡ 0 ⎤
⎢ x_{K+1} x_K     ···  x_1       ⎥ ⎢ h_1 ⎥ = ⎢ 0 ⎥   (68)
⎢   ⋮       ⋮     ⋱     ⋮        ⎥ ⎢  ⋮  ⎥   ⎢ ⋮ ⎥
⎣ x_{N−1} x_{N−2} ···  x_{N−K−1} ⎦ ⎣ h_K ⎦   ⎣ 0 ⎦

Suppose N values x_n are available. Since there are K + 1 unknown filter coefficients, defined up to a scale factor, we need at least K independent equations, and therefore, N must be greater than or equal to 2K. Define A to be the appropriate submatrix; then, the system admits a nontrivial solution when Rank(A) ≤ K.

In practice, this system is solved using an SVD, where the matrix is decomposed into A = UΣVᵀ. We obtain H as the column of V associated with the zero singular value. The method can be adapted to the noisy case by minimizing ‖AH‖, in which case H is given by the eigenvector associated with the smallest eigenvalue of AᵀA.

2) Finding the u_k. Once the filter coefficients are found, the values u_k are simply the roots of the z-transform of the annihilating filter.

3) Finding the c_k. To determine the weights c_k, it suffices to take K equations in (66) and solve the system for c_0, c_1, …, c_{K−1}. Letting n = 0, 1, …, K − 1, in matrix/vector form we obtain the following Vandermonde system:

⎡ 1          1          ···  1            ⎤ ⎡ c_0     ⎤   ⎡ x_0     ⎤
⎢ u_0        u_1        ···  u_{K−1}      ⎥ ⎢ c_1     ⎥ = ⎢ x_1     ⎥   (69)
⎢  ⋮          ⋮         ⋱     ⋮           ⎥ ⎢  ⋮      ⎥   ⎢  ⋮      ⎥
⎣ u_0^{K−1}  u_1^{K−1}  ···  u_{K−1}^{K−1} ⎦ ⎣ c_{K−1} ⎦   ⎣ x_{K−1} ⎦

which has a unique solution when the u_k are distinct, that is,

u_k ≠ u_{k′} for k ≠ k′. (70)

This concludes the annihilating filter method.
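The three steps above can be sketched end to end. The following is a minimal NumPy illustration (with hypothetical values for K, the u_k, and the c_k), using the SVD-based solution of the noiseless annihilation system:

```python
import numpy as np

# Hypothetical ground truth: K = 3 exponentials x_n = sum_k c_k u_k^n.
K = 3
u_true = np.exp(-2j * np.pi * np.array([0.12, 0.37, 0.81]))
c_true = np.array([1.0, 2.0, -0.5])

N = 2 * K + 2                        # a few more than the 2K values strictly needed
n = np.arange(N)
x = (c_true[None, :] * u_true[None, :] ** n[:, None]).sum(axis=1)

# 1) Annihilating filter: solve the Toeplitz system (68), A h = 0, via SVD.
A = np.array([[x[i - k] for k in range(K + 1)] for i in range(K, N)])
h = np.linalg.svd(A)[2][-1].conj()   # right singular vector of the zero singular value

# 2) Locations: the u_k are the roots of H(z) = sum_k h_k z^{-k}.
u_est = np.roots(h)

# 3) Weights: solve the Vandermonde system (69), x_n = sum_k c_k u_k^n.
V = u_est[None, :] ** n[:, None]
c_est = np.linalg.lstsq(V, x, rcond=None)[0]

print(np.sort(np.angle(u_est)))      # matches np.sort(np.angle(u_true))
```

On noiseless data, the smallest singular value of A is numerically zero and the recovered roots and weights match the ground truth to machine precision.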



ACKNOWLEDGMENT

The authors would like to thank the reviewers for their constructive remarks. Suggestions by Prof. S. Shamai on the Shannon bandwidth and by Prof. E. Telatar and Prof. R. Urbanke on the connection to error correction codes are gratefully acknowledged. A special thank you to Prof. P. J. S. G. Ferreira for his invaluable suggestions.

REFERENCES

[1] R. E. Blahut, Theory and Practice of Error Control Codes. Reading, MA: Addison-Wesley, 1983.

[2] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley-Interscience, 1991.

[3] P. J. S. G. Ferreira and J. M. N. Vieira, “Locating and correcting errors in images,” in Proc. IEEE Int. Conf. Image Process., vol. I, Santa Barbara, CA, Oct. 1997, pp. 691–694.

[4] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1996.

[5] A. J. Jerri, “The Shannon sampling theorem—its various extensions and applications: A tutorial review,” Proc. IEEE, vol. 65, pp. 1565–1597, 1977.

[6] J. Kusuma, A. Ridolfi, and M. Vetterli, “Sampling of communication systems with bandwidth expansion,” presented at the Int. Contr. Conf., 2002.

[7] E. Lee and D. Messerschmitt, Digital Communication. Boston, MA: Kluwer, 1994.

[8] S. Mallat, A Wavelet Tour of Signal Processing. San Diego, CA: Academic, 1998.

[9] J. E. Marsden, Elementary Classical Analysis. New York: Springer-Verlag, 1974.

[10] P. Marziliano, “Sampling innovations,” Ph.D. dissertation, Swiss Fed. Inst. Technol., Audiovis. Commun. Lab., Lausanne, Switzerland, 2001.

[11] P. Marziliano, M. Vetterli, and T. Blu, “A sampling theorem for a class of piecewise bandlimited signals,” submitted for publication.

[12] J. L. Massey, “Toward an information theory of spread-spectrum systems,” in Code Division Multiple Access Communications, S. G. Glisic and P. A. Leppanen, Eds. Boston, MA: Kluwer, 1995, pp. 29–46.

[13] A. Ridolfi, I. Maravic, J. Kusuma, and M. Vetterli, “Sampling signals with finite rate of innovation: The noisy case,” in Proc. ICASSP, 2002.

[14] T. J. Shan, A. M. Bruckstein, and T. Kailath, “Adaptive resolution of overlapping echoes,” in Proc. IEEE Int. Symp. Circuits Syst., 1986.

[15] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, 1948.

[16] P. Stoica and R. Moses, Introduction to Spectral Analysis. Englewood Cliffs, NJ: Prentice-Hall, 2000.

[17] M. Unser, “Splines: A perfect fit for signal and image processing,” IEEE Signal Processing Mag., vol. 16, pp. 22–38, Nov. 1999.

[18] ——, “Sampling—50 years after Shannon,” Proc. IEEE, vol. 88, pp. 569–587, Apr. 2000.

[19] M. Vetterli, “Sampling of piecewise polynomial signals,” Swiss Fed. Inst. Technol., Lausanne, Switzerland, LCAV, 1999.

[20] M. Vetterli and J. Kovačević, Wavelets and Subband Coding. Englewood Cliffs, NJ: Prentice-Hall, 1995.

[21] M. Vetterli, P. Marziliano, and T. Blu, “Sampling discrete-time piecewise bandlimited signals,” in Proc. Sampling Theory and Applications Workshop, Orlando, FL, May 2001, pp. 97–102.

[22] ——, “Sampling signals with finite rate of innovation,” Swiss Fed. Inst. Technol., Lausanne, Switzerland, DSC-017, 2001.

[23] ——, “A sampling theorem for periodic piecewise polynomial signals,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Salt Lake City, UT, May 2001.

[24] J. M. N. Vieira and P. J. S. G. Ferreira, “Interpolation, spectrum analysis, error-control coding, and fault-tolerant computing,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, 1997, pp. 1831–1834.

[25] A. J. Weiss, A. S. Willsky, and B. C. Levy, “Eigenstructure approach for array processing with unknown intensity coefficients,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1613–1617, Oct. 1988.

Martin Vetterli (F’95) received the Dipl. El.-Ing. degree from ETH Zürich (ETHZ), Zürich, Switzerland, in 1981, the M.S. degree from Stanford University, Stanford, CA, in 1982, and the D.Sc. degree from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, in 1986.

He was a Research Assistant at Stanford and EPFL and has worked for Siemens and AT&T Bell Laboratories. In 1986, he joined Columbia University, New York, NY, where he was last an Associate Professor of electrical engineering and co-director of the Image and Advanced Television Laboratory. In 1993, he joined the University of California, Berkeley, where he was a Professor with the Department of Electrical Engineering and Computer Science until 1997. He now holds an Adjunct Professor position at Berkeley. Since 1995, he has been a Professor of communication systems at EPFL, where he chaired the Communications Systems Division from 1996 to 1997 and heads the Audio-Visual Communications Laboratory. Since 2001, he has directed the National Competence Center in Research on mobile information and communication systems (www.terminodes.org). He has held visiting positions at ETHZ (1990) and Stanford (1998). He is on the editorial boards of Annals of Telecommunications, Applied and Computational Harmonic Analysis, and The Journal of Fourier Analysis and Applications. He is the coauthor, with J. Kovačević, of the book Wavelets and Subband Coding (Englewood Cliffs, NJ: Prentice-Hall, 1995). He has published about 85 journal papers on a variety of topics in signal/image processing and communications and holds seven patents. His research interests include sampling, wavelets, multirate signal processing, computational complexity, signal processing for communications, digital video processing, and joint source/channel coding.

Dr. Vetterli is a member of SIAM and was the Area Editor for Speech, Image, Video, and Signal Processing of the IEEE TRANSACTIONS ON COMMUNICATIONS. He received the Best Paper Award from EURASIP in 1984 for his paper on multidimensional subband coding, the Research Prize of the Brown Boveri Corporation (Switzerland) in 1986 for his doctoral thesis, and the IEEE Signal Processing Society’s Senior Awards in 1991 and in 1996 (for papers with D. LeGall and K. Ramchandran, respectively). He won the Swiss National Latsis Prize in 1996, the SPIE Presidential award in 1999, and the IEEE Signal Processing Technical Achievement Award in 2001. He is a member of the Swiss Council on Science and Technology (www.swtr.ch). He was a plenary speaker at various conferences (e.g., 1992 IEEE ICASSP).

Pina Marziliano was born in Montréal, QC, Canada, in 1971. She received the B.Sc. degree in applied mathematics in 1994 and the M.Sc. degree in computer science (operations research) in 1996, both from the Université de Montréal. She received the Ph.D. degree in communication systems from the Swiss Federal Institute of Technology, Lausanne, in 2001.

She is presently doing research on perceptual quality metrics for images and video at Genista Corporation, Genimedia SA, Lausanne. Her research interests include Fourier and wavelet analysis, sampling theory applied to audio-visual communications, and perceptual quality metrics.

Thierry Blu (M’96) was born in Orléans, France, in 1964. He received the Diplôme d’ingénieur degree from École Polytechnique, Paris, France, in 1986 and from Télécom Paris (ENST) in 1988. He received the Ph.D. degree in electrical engineering in 1996 from ENST for a study on iterated rational filterbanks applied to wideband audio coding.

He is currently with the Biomedical Imaging Group at the Swiss Federal Institute of Technology (EPFL), Lausanne, on leave from the France Télécom National Centre for Telecommunication Studies (CNET), Issy-les-Moulineaux, France. His interests include (multi)wavelets, multiresolution analysis, multirate filterbanks, approximation and sampling theory, and psychoacoustics.



Digital Object Identifier 10.1109/MSP.2007.914998

Signal acquisition and reconstruction is at the heart of signal processing, and sampling theorems provide the bridge between the continuous and the discrete-time worlds. The most celebrated and widely used sampling theorem is often attributed to Shannon (and many others, from Whittaker to Kotel’nikov and Nyquist, to name a few) and gives a sufficient condition, namely bandlimitedness, for an exact sampling and interpolation formula. The sampling rate, at twice the maximum frequency present in the signal, is usually called the Nyquist rate. Bandlimitedness, however, is not necessary, as is well known, but only rarely taken advantage of [1]. In this broader, nonbandlimited view, the question is: when can we acquire a signal using a sampling kernel followed by uniform sampling and perfectly reconstruct it?

The Shannon case is a particular example, where any signal from the subspace of bandlimited signals, denoted by BL, can be acquired through sampling and perfectly

[Thierry Blu, Pier-Luigi Dragotti, Martin Vetterli, Pina Marziliano, and Lionel Coulot]

Sparse Sampling of Signal Innovations

[Theory, algorithms, and performance bounds]

1053-5888/08/$25.00 © 2008 IEEE · IEEE SIGNAL PROCESSING MAGAZINE [31] MARCH 2008



interpolated from the samples. Using the sinc kernel, or ideal low-pass filter, nonbandlimited signals will be projected onto the subspace BL. The question is: can we beat Shannon at this game, namely, acquire signals from outside of BL and still perfectly reconstruct? An obvious case is bandpass sampling and variations thereof. Less obvious are sampling schemes taking advantage of some sort of sparsity in the signal, and this is the central theme of this article. That is, instead of generic bandlimited signals, we consider the sampling of classes of nonbandlimited parametric signals. This allows us to circumvent Nyquist and perfectly sample and reconstruct signals using sparse sampling, at a rate characterized by how sparse they are per unit of time. In some sense, we sample at the rate of innovation of the signal by complying with Occam’s razor principle [known as Lex Parcimoniæ or Law of Parsimony: Entia non svnt mvltiplicanda præter necessitatem, or, “Entities should not be multiplied beyond necessity” (from Wikipedia)].

Besides Shannon’s sampling theorem, a second basic result that permeates signal processing is certainly Heisenberg’s uncertainty principle, which suggests that a singular event in the frequency domain will necessarily be widely spread in the time domain. A superficial interpretation might lead one to believe that a perfect frequency localization requires a very long time observation. That this is not necessary is demonstrated by high-resolution spectral analysis methods, which achieve very precise frequency localization using finite observation windows [2], [3]. The way around Heisenberg resides in a parametric approach, where the prior that the signal is a linear combination of sinusoids is put to contribution.

If by now you feel uneasy about slaloming around Nyquist, Shannon, and Heisenberg, do not worry. Estimation of sparse data is a classic problem in signal processing and communications, from estimating sinusoids in noise to locating errors in digital transmissions. Thus, there is a wide variety of available techniques and algorithms. Also, the best possible performance is given by the Cramér-Rao lower bounds for this parametric estimation problem, and one can thus check how close to optimal a solution actually is.

We are thus ready to pose the basic questions of this article. Assume a sparse signal (be it in continuous or discrete time) observed through a sampling device that is a smoothing kernel followed by regular or uniform sampling. What is the minimum sampling rate (as opposed to Nyquist’s rate, which is often infinite in cases of interest) that allows us to recover the signal? What classes of sparse signals are possible? What are good observation kernels, and what are efficient and stable recovery algorithms? How does observation noise influence recovery, and what algorithms will approach optimal performance? How will these new techniques impact practical applications, from inverse problems to wideband communications?

And finally, what is the relationship between the presented methods and classic methods, as well as the recent advances in compressed sensing and sampling?

SIGNALS WITH FINITE RATE OF INNOVATION

Using the sinc kernel (defined as sinc(t) = sin(πt)/(πt)), a signal x(t) bandlimited to [−B/2, B/2] can be expressed as

x(t) = ∑_{k∈Z} x_k sinc(Bt − k), (1)

where x_k = ⟨B sinc(Bt − k), x(t)⟩ = x(k/B), as stated by Shannon in his classic 1948 paper [4]. Alternatively, we can say that x(t) has B degrees of freedom per second, since x(t) is exactly defined by a sequence of real numbers {x_k}_{k∈Z}, spaced T = 1/B seconds apart. It is natural to call this the rate of innovation of the bandlimited process, denoted by ρ, and equal to B.

A generalization of the space of bandlimited signals is the space of shift-invariant signals. Given a basis function ϕ(t) that is orthogonal to its shifts by multiples of T, or ⟨ϕ(t − kT), ϕ(t − k′T)⟩ = δ_{k−k′}, the space of functions obtained by replacing sinc with ϕ in (1) defines a shift-invariant space S. For such functions, the rate of innovation is again equal to ρ = 1/T.

Now, let us turn our attention to a generic sparse source, namely a Poisson process, which is a set of Dirac pulses, ∑_{k∈Z} δ(t − t_k), where t_k − t_{k−1} is exponentially distributed with p.d.f. λe^{−λt}. Here, the innovations are the set of positions {t_k}_{k∈Z}. Thus, the rate of innovation is the average number of Diracs per unit of time: ρ = lim_{T→∞} C_T/T, where C_T is the number of Diracs in the interval [−T/2, T/2]. This parallels the notion of information rate of a source based on the average entropy per unit of time introduced by Shannon in the same 1948 paper. In the Poisson case with decay rate λ, the average delay between two Diracs is 1/λ; thus, the rate of innovation ρ is equal to λ. A generalization involves weighted Diracs, or

x(t) = ∑_{k∈Z} x_k δ(t − t_k).

By similar arguments, ρ = 2λ in this case, since both positions and weights are degrees of freedom. Note that this class of signals is not a subspace, and its estimation is a nonlinear problem.

Now comes the obvious question: is there a sampling theorem for the type of sparse processes just seen? That is, can we acquire such a process by taking about ρ samples per unit of time, and perfectly reconstruct the original process, just as the Shannon sampling procedure does?



The necessary sampling rate is clearly ρ, the rate of innovation. Showing that it is also sufficient can be done in a number of cases of interest. The archetypal sparse signal is the sum of Diracs, observed through a suitable sampling kernel. In this case, sampling theorems at the rate of innovation can be proven. Beyond the question of a representation theorem, we also derive efficient computational procedures, showing the practicality of the approach. Next comes the question of robustness to noise and optimal estimation procedures under these conditions. We propose algorithms to estimate sparse signals in noise that achieve performance close to optimal. This is done by computing Cramér-Rao bounds that indicate the best performance of an unbiased estimation of the innovation parameters. Note that when the signal-to-noise ratio (SNR) is poor, the algorithms are iterative and thus trade computational complexity for estimation performance.

To easily navigate through the article, the reader will find in Table 1 the most frequent notations used in the sequel.

SAMPLING SIGNALS AT THEIR RATE OF INNOVATION

We consider a τ-periodic stream of K Diracs with amplitudes x_k located at times t_k ∈ [0, τ[:

x(t) = ∑_{k=1}^{K} ∑_{k′∈Z} x_k δ(t − t_k − k′τ). (2)

We assume that the signal x(t) is convolved with a sinc window of bandwidth B, where Bτ is an odd integer, and is uniformly sampled with sampling period T = τ/N. (We will use this hypothesis throughout the article to simplify the expressions and because it allows convergence of the τ-periodized sum of sinc kernels.) We therefore want to retrieve the innovations x_k and t_k from the n = 1, 2, …, N measurements

y_n = ⟨x(t), sinc(B(nT − t))⟩ = ∑_{k=1}^{K} x_k ϕ(nT − t_k), (3)

where

ϕ(t) = ∑_{k′∈Z} sinc(B(t − k′τ)) = sin(πBt)/(Bτ sin(πt/τ)) (4)

is the τ-periodic sinc function or Dirichlet kernel. Clearly, x(t) has a rate of innovation ρ = 2K/τ, and we aim to devise a sampling scheme that is able to retrieve the innovations of x(t) by operating at a sampling rate that is as close as possible to ρ.
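The closed form (4) can be checked numerically against its finite Fourier-series expansion. The sketch below (hypothetical τ and B, with Bτ odd) relies on the identity ϕ(t) = (1/(Bτ)) ∑_{|m|≤⌊Bτ/2⌋} e^{j2πmt/τ}:

```python
import numpy as np

# Hypothetical parameters: period tau = 1 and bandwidth B with B*tau odd.
tau, B = 1.0, 7.0
M = int(B * tau) // 2                       # B*tau = 2M + 1

t = np.linspace(0.013, 0.987, 200)          # avoid t = 0 mod tau, where sin(pi*t/tau) = 0
dirichlet = np.sin(np.pi * B * t) / (B * tau * np.sin(np.pi * t / tau))

# Finite Fourier-series form of the same kernel: (1/(B*tau)) * sum_{|m|<=M} e^{j2 pi m t/tau}
m = np.arange(-M, M + 1)
fourier = np.exp(2j * np.pi * np.outer(t, m) / tau).sum(axis=1).real / (B * tau)

print(np.max(np.abs(dirichlet - fourier)))  # agrees to machine precision
```

This finite-bandwidth structure is what makes the DFT of the samples give direct access to the Fourier-series coefficients of x(t), as used below.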

Since x(t) is periodic, we can use the Fourier series to represent it. By expressing the Fourier-series coefficients of x(t), we thus have

x(t) = ∑_{m∈Z} x̂_m e^{j2πmt/τ}, where x̂_m = (1/τ) ∑_{k=1}^{K} x_k e^{−j2πmt_k/τ} = (1/τ) ∑_{k=1}^{K} x_k u_k^m. (5)

We observe that the signal x(t) is completely determined by the knowledge of the K amplitudes x_k and the K locations t_k or, equivalently, u_k = e^{−j2πt_k/τ}. By considering 2K contiguous values of x̂_m in (5), we can build a system of 2K equations in 2K unknowns that is linear in the weights x_k but is highly nonlinear in the locations t_k and therefore cannot be solved using classical linear algebra. Such a system, however, admits a unique solution when the Dirac locations are distinct, which is obtained by using a method known in spectral estimation as Prony’s method [2], [3], [5], [6], and which we choose to call the annihilating filter method for the reason clarified below. Call {h_k}_{k=0,1,…,K} the coefficients of the filter with z-transform

H(z) = ∑_{k=0}^{K} h_k z^{−k} = ∏_{k=1}^{K} (1 − u_k z^{−1}). (6)

That is, the roots of H(z) correspond to the locations u_k = e^{−j2πt_k/τ}. It clearly follows that

(h ∗ x̂)_m = ∑_{k=0}^{K} h_k x̂_{m−k} = ∑_{k=0}^{K} ∑_{k′=1}^{K} (x_{k′}/τ) h_k u_{k′}^{m−k}

= ∑_{k′=1}^{K} (x_{k′}/τ) u_{k′}^m ∑_{k=0}^{K} h_k u_{k′}^{−k} = 0, (7)

since the inner sum is H(u_{k′}) = 0.

The filter h_m is thus called an annihilating filter, since it annihilates the discrete signal x̂_m. The zeros of this filter uniquely define the locations t_k of the Diracs. Since h_0 = 1, the filter coefficients h_m are found from (7) by involving at least 2K consecutive values of x̂_m, leading to a linear system of equations; e.g., if

[TABLE 1] FREQUENTLY USED NOTATIONS.

SYMBOL — MEANING
x(t), τ, x̂_m — τ-periodic finite-rate-of-innovation signal and its Fourier coefficients
K, t_k, x_k, ρ — innovation parameters: x(t) = ∑_{k=1}^{K} x_k δ(t − t_k) for t ∈ [0, τ[, and rate of innovation of the signal: ρ = 2K/τ
ϕ(t), B — “antialiasing” filter, prior to sampling: typically, ϕ(t) = sinc(Bt); note: B × τ is restricted to be an odd integer
y_n, ŷ_m, N, T — (noisy) samples {y_n}_{n=1,2,…,N} of (ϕ ∗ x)(t) at multiples of T = τ/N [see (14)] and their DFT coefficients ŷ_m
A, L — rectangular annihilation matrix with L + 1 columns [see (12)]
H(z), h_k, H — annihilating filter: z-transform, impulse response, and vector representation



we have x̂_m for m = −K, −K + 1, …, K − 1, this system can be written in square Toeplitz matrix form as follows:

⎡ x̂_{−1}   x̂_{−2}   ···  x̂_{−K}   ⎤ ⎡ h_1 ⎤     ⎡ x̂_0     ⎤
⎢ x̂_0      x̂_{−1}   ···  x̂_{−K+1} ⎥ ⎢ h_2 ⎥ = − ⎢ x̂_1     ⎥   (8)
⎢   ⋮         ⋮      ⋱      ⋮      ⎥ ⎢  ⋮  ⎥     ⎢  ⋮      ⎥
⎣ x̂_{K−2}  x̂_{K−3}  ···  x̂_{−1}   ⎦ ⎣ h_K ⎦     ⎣ x̂_{K−1} ⎦

If the x_k do not vanish, this K × K system of equations has a unique solution because any h_m satisfying it is also such that H(u_k) = 0 for k = 1, 2, …, K. Given the filter coefficients h_m, the locations t_k are retrieved from the zeros u_k of the z-transform in (6). The weights x_k are then obtained by considering, for instance, K consecutive Fourier-series coefficients as given in (5). By writing the expression of these K coefficients in vector form, we obtain a Vandermonde system of equations that yields a unique solution for the weights x_k since the u_k are distinct. Notice that we need in total no more than 2K consecutive coefficients x̂_m to solve both the Toeplitz system (8) and the Vandermonde system. This confirms our original intuition that the knowledge of only 2K Fourier-series coefficients is sufficient to retrieve x(t).
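As an illustration, the square Toeplitz system (8) followed by root finding and a Vandermonde solve can be carried out in a few lines of NumPy; the values of K, t_k, and x_k below are hypothetical:

```python
import numpy as np

# Hypothetical ground truth: K = 2 Diracs on a period tau = 1.
tau = 1.0
t_true = np.array([0.18, 0.63])
x_true = np.array([1.0, -2.0])
K = len(t_true)

# Fourier-series coefficients (5): xhat_m = (1/tau) sum_k x_k u_k^m, u_k = exp(-2j pi t_k/tau).
u_true = np.exp(-2j * np.pi * t_true / tau)
m = np.arange(-K, K)                      # 2K contiguous coefficients: m = -K..K-1
xhat = (x_true[None, :] * u_true[None, :] ** m[:, None]).sum(axis=1) / tau
X = dict(zip(m, xhat))

# Square Toeplitz system (8): solve for h_1..h_K with h_0 = 1.
T_mat = np.array([[X[i - j] for j in range(1, K + 1)] for i in range(K)])
rhs = -np.array([X[i] for i in range(K)])
h = np.linalg.solve(T_mat, rhs)

# Locations from the roots of H(z) = 1 + h_1 z^-1 + ... + h_K z^-K.
u_est = np.roots(np.concatenate(([1.0], h)))
t_est = np.sort((-np.angle(u_est) * tau / (2 * np.pi)) % tau)

# Weights from the K x K Vandermonde system built from (5).
V = u_est[None, :] ** np.arange(K)[:, None]
x_est = tau * np.linalg.solve(V, np.array([X[i] for i in range(K)]))

print(t_est)   # close to [0.18, 0.63]
```

Note that exactly 2K coefficients x̂_m (here m = −2, …, 1) are consumed, matching the rate-of-innovation count.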

We are now close to solving our original sampling question; the only remaining issue is to find a way to relate the Fourier-series coefficients x̂_m to the actual measurements y_n. Assume N ≥ Bτ; then, for n = 1, 2, …, N, we have that

y_n = ⟨x(t), sinc(B(nT − t))⟩ = ∑_{|m|≤⌊Bτ/2⌋} T x̂_m e^{j2πmn/N}. (9)

Up to a factor NT = τ, this is simply the inverse discrete Fourier transform (DFT) of a discrete signal bandlimited to [−⌊Bτ/2⌋, ⌊Bτ/2⌋] and which coincides with x̂_m in this bandwidth. As a consequence, the discrete Fourier coefficients of y_n provide Bτ consecutive coefficients of the Fourier series of x(t), according to

ŷ_m = ∑_{n=1}^{N} y_n e^{−j2πmn/N} = { τ x̂_m  if |m| ≤ ⌊Bτ/2⌋
                                     { 0      for other m ∈ [−N/2, N/2]. (10)

Let us now analyze the complete retrieval scheme more precisely and draw some conclusions. First of all, since we need at least 2K consecutive coefficients x̂_m to use the annihilating filter method, this means that Bτ ≥ 2K. Thus, the bandwidth of the sinc kernel, B, is always larger than 2K/τ = ρ, the rate of innovation. However, since Bτ is odd, the minimum number of samples per period is actually one sample larger, N ≥ B_min τ = 2K + 1, which is the next best thing to critical sampling. Moreover, the reconstruction algorithm is fast and does not involve any iterative procedures. Typically, the only step that depends on the number of samples, N, is the computation of the DFT coefficients of the samples y_n, which can of course be implemented in O(N log₂ N) elementary operations using the fast Fourier transform algorithm. All the other steps of the algorithm (in particular, polynomial rooting) depend on K only, i.e., on the rate of innovation ρ.
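The relation (10) can be verified numerically. The sketch below uses the critical case N = Bτ = 2K + 1 with hypothetical Dirac parameters, computes the samples (3) through the Dirichlet kernel (4), and compares their DFT with τ x̂_m:

```python
import numpy as np

# Hypothetical setup: tau = 1, K = 2 Diracs, critical sampling N = B*tau = 2K + 1.
tau, K = 1.0, 2
B = 5.0                                    # B*tau odd
N = 5
T = tau / N
t_k = np.array([0.21, 0.74])
x_k = np.array([1.0, 0.5])

def dirichlet(t):
    # tau-periodic sinc (4); value 1 at t = 0 mod tau by continuity.
    den = B * tau * np.sin(np.pi * t / tau)
    return np.where(np.abs(den) < 1e-12, 1.0,
                    np.sin(np.pi * B * t) / np.where(np.abs(den) < 1e-12, 1.0, den))

# Samples (3): y_n = sum_k x_k * phi(nT - t_k), n = 1..N
n = np.arange(1, N + 1)
y = sum(xk * dirichlet(n * T - tk) for xk, tk in zip(x_k, t_k))

# DFT coefficients and Fourier-series coefficients (5)
M = int(B * tau) // 2
m = np.arange(-M, M + 1)
yhat = (y[None, :] * np.exp(-2j * np.pi * np.outer(m, n) / N)).sum(axis=1)
xhat = (x_k[None, :] * np.exp(-2j * np.pi * np.outer(m, t_k) / tau)).sum(axis=1) / tau

print(np.max(np.abs(yhat - tau * xhat)))   # ~ 0, i.e., yhat_m = tau * xhat_m
```

Feeding these ŷ_m/τ into the annihilating filter method then recovers t_k and x_k exactly in the noiseless case.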

MORE ON ANNIHILATION

A closer look at (7) indicates that any nontrivial filter {h_k}_{k=0,1,…,L}, where L ≥ K, that has u_k = e^{−j2πt_k/τ} as zeros will annihilate the Fourier-series coefficients of x(t). The converse is true: any filter with transfer function H(z) that annihilates the x̂_m is automatically such that H(u_k) = 0 for k = 1, 2, …, K. Taking (10) into account, this means that for such filters

∑_{k=0}^{L} h_k ŷ_{m−k} = 0, for all |m| ≤ ⌊Bτ/2⌋. (11)

These linear equations can be expressed using a matrix formalism: let A be the (2M − L + 1) × (L + 1) Toeplitz matrix

     ⎡ ŷ_{−M+L}    ŷ_{−M+L−1}  ···  ŷ_{−M}   ⎤
A =  ⎢ ŷ_{−M+L+1}  ŷ_{−M+L}    ···  ŷ_{−M+1} ⎥   where M = ⌊Bτ/2⌋,   (12)
     ⎢    ⋮            ⋱        ⋱      ⋮      ⎥
     ⎣ ŷ_M         ŷ_{M−1}     ···  ŷ_{M−L}  ⎦

and H = [h_0, h_1, …, h_L]ᵀ the vector containing the coefficients of the annihilating filter; then (11) is equivalent to

AH = 0, (13)

which can be seen as a rectangular extension of (8). Note that, unlike (6), H is not restricted to satisfy h_0 = 1. Now, if we choose L > K, there are L − K + 1 independent polynomials of degree L with zeros at {u_k}_{k=1,2,…,K}, which means that there are L − K + 1 independent vectors H that satisfy (13). As a consequence, the rank of the matrix A never exceeds K. This provides a simple way to determine K when it is not known a priori: find the smallest L such that the matrix A built according to (12) is singular; then K = L.





The annihilation property (11) satisfied by the DFT coefficients ŷ_m is narrowly linked to the periodized sinc-Dirichlet window used prior to sampling. Importantly, this approach can be generalized to other kernels, such as the (nonperiodized) sinc, the Gaussian window [7], and, more recently, any window that satisfies a Strang-Fix-like condition [8].

FRI SIGNALS WITH NOISE

“Noise,” or more generally model mismatch, is unfortunately omnipresent in data acquisition, making the solution presented in the previous section only ideal. Schematically, perturbations to the finite rate of innovation (FRI) model may arise both in the analog domain, during, e.g., a transmission procedure, and in the digital domain after sampling (see Figure 1); in this respect, quantization is a source of corruption as well. There is then no other option but to increase the sampling rate in order to achieve robustness against noise.

Thus, we consider the signal resulting from the convolution ofthe τ -periodic FRI signal (2) and a sinc-window of bandwidth B,where Bτ is an odd integer. Due to noise corruption, (3) becomes

yn = ∑k=1…K xk ϕ(nT − tk) + εn for n = 1, 2, …, N, (14)

where T = τ/N and ϕ(t) is the Dirichlet kernel (4). Given that the rate of innovation of the signal is ρ, we will consider N > ρτ samples to fight the perturbation εn, making the data redundant by a factor of N/(ρτ). At this point, we do not make specific assumptions on εn (in particular, of a statistical nature). What kind of algorithms can be applied to efficiently exploit this extra redundancy, and what is their performance?
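For experimentation, the noisy sampling model (14) is easy to simulate. The sketch below (NumPy; the kernel normalization ϕ(0) = 1 and all parameter values are our own choices) draws N samples of a two-Dirac FRI signal through the periodic sinc kernel:

```python
import numpy as np

def dirichlet(t, B, tau):
    # tau-periodic sinc (Dirichlet) kernel written as sinc(B t)/sinc(t/tau),
    # so that phi(0) = 1; B*tau is assumed to be an odd integer.
    t = np.asarray(t, dtype=float)
    return np.sinc(B * t) / np.sinc(t / tau)

def fri_samples(x, t, N, B, tau, sigma, rng):
    # Noisy uniform samples y_n = sum_k x_k phi(nT - t_k) + eps_n, T = tau/N.
    n = np.arange(1, N + 1)
    T = tau / N
    clean = (x * dirichlet(n[:, None] * T - t[None, :], B, tau)).sum(axis=1)
    return clean + sigma * rng.standard_normal(N)

rng = np.random.default_rng(0)
tau = 1.0
t_true = np.array([0.231, 0.684])   # hypothetical Dirac locations
x_true = np.array([1.0, 0.7])
N, B = 21, 21.0                     # B*tau = 21, odd; redundancy N/(rho*tau) ≈ 5
y = fri_samples(x_true, t_true, N, B, tau, sigma=0.05, rng=rng)
print(y.shape)                      # (21,)
```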

A related problem was encountered decades ago by researchers in spectral analysis, where the problem of finding sinusoids in noise is classic [9]. Thus we will not try to propose new approaches regarding the algorithms. One of the difficulties is that there is as yet no unanimously agreed optimal algorithm for retrieving sinusoids in noise, although there have been numerous evaluations of the different methods (see, e.g., [10]). For this reason, our choice falls on the simplest approach, the total least-squares approximation (implemented using a singular value decomposition (SVD), an approach initiated by Pisarenko in [11]), possibly enhanced by an initial "denoising" (more exactly, model matching) step provided by what we call Cadzow's iterated algorithm [12]. The full algorithm, depicted in Figure 2, is also detailed in its two main components in "Annihilating Filter: Total Least-Squares Method" and "Cadzow's Iterative Denoising."

By computing the theoretical minimal uncertainties, known as Cramér-Rao bounds, on the innovation parameters, we will see that these algorithms exhibit a quasi-optimal behavior down to noise levels of the order of 5 dB (depending on the number of samples). In particular, these bounds tell us how to choose the bandwidth of the sampling filter.

TOTAL LEAST-SQUARES APPROACH
In the presence of noise, the annihilation equation (13) is not satisfied exactly, yet it is still reasonable to expect that the minimization of the Euclidean norm ‖AH‖2 under the constraint ‖H‖2 = 1 may yield an interesting estimate of H. Of particular interest is the solution for L = K (the annihilating filter of minimal size), because the K zeros of the resulting filter provide a unique estimate of the K locations tk. It is known that this minimization can be solved by performing an SVD of A as

[FIG2] Schematic view of the FRI retrieval algorithm. The data are considered "too noisy" until they satisfy (11) almost exactly. (Flowchart: FFT of the samples yn; if too noisy, apply Cadzow denoising and test again; otherwise run the annihilating filter method to get the tk, then solve a linear system for the xk.)

An algorithm for retrieving the innovations xk and tk from the noisy samples of (14).

1) Compute the N-DFT coefficients of the samples: ym = ∑n=1…N yn e^(−j2πnm/N).
2) Choose L = K and build the rectangular Toeplitz matrix A according to (12).
3) Perform the SVD of A and choose the eigenvector [h0, h1, …, hK]^T corresponding to the smallest eigenvalue, i.e., the annihilating filter coefficients.
4) Compute the roots e^(−j2πtk/τ) of the z-transform H(z) = ∑k=0…K hk z^(−k) and deduce {tk}k=1,…,K.
5) Compute the least mean square solution xk of the N equations {yn − ∑k xk ϕ(nT − tk)}n=1,2,…,N.

When the measurements yn are very noisy, it is necessary to first denoise them by performing a few iterations of Cadzow's algorithm (see "Cadzow's Iterative Denoising") before applying the above procedure.

[FIG1] Block diagram representation of the sampling of an FRI signal, with indications of potential noise perturbations in the analog and in the digital part. (The FRI signal ∑k xk δ(t − tk) passes through the sampling kernel ϕ(t); analog noise enters before sampling of y(t) at rate T, and digital noise after, yielding the samples yn.)

ANNIHILATING FILTER: TOTAL LEAST-SQUARES METHOD



defined by (12), more exactly an eigenvalue decomposition of the matrix A^T A, and choosing for H the eigenvector corresponding to the smallest eigenvalue. More specifically, if A = USV^T, where U is a (Bτ − K) × (K + 1) unitary matrix, S is a (K + 1) × (K + 1) diagonal matrix with decreasing positive elements, and V is a (K + 1) × (K + 1) unitary matrix, then H is the last column of V. Once the tk are retrieved, the xk follow from a least mean square minimization of the difference between the samples yn and the FRI model (14).

This approach, summarized in "Annihilating Filter: Total Least-Squares Method," is closely related to Pisarenko's method [11]. Although its cost is much larger than that of the simple solution presented earlier, it is still essentially linear in N (excluding the cost of the initial DFT).

EXTRA DENOISING: CADZOW
The previous algorithm works quite well for moderate values of the noise, a level that depends on the number of Diracs. However, for a small SNR, the results may become unreliable, and it is advisable to apply a robust procedure that "projects" the noisy samples onto the sampled FRI model of (14). This iterative procedure was already suggested by Tufts and Kumaresan in [13] and analyzed in [12].

As noticed earlier, the noiseless matrix A in (12) is of rank K whenever L ≥ K. The idea thus consists in performing the SVD of A, say A = USV^T, and forcing to zero the L + 1 − K smallest diagonal coefficients of the matrix S to yield S′. The resulting matrix A′ = US′V^T is not Toeplitz anymore, but its best Toeplitz approximation is obtained by averaging the diagonals of A′. This leads to a new "denoised" sequence y′n that matches the noiseless FRI sample model better than the original yn's. A few of these iterations lead to samples that can be expressed almost exactly as bandlimited samples of an FRI signal. Our observation is that this FRI signal is closest to the noiseless one when A is closest to a square matrix, i.e., L = ⌊Bτ/2⌋.
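One Cadzow iteration (rank-K truncation followed by diagonal averaging) can be sketched as follows. The function name and the rank/size choices are ours, and the example merely checks that the iterations move noisy DFT coefficients toward the noiseless ones:

```python
import numpy as np

def cadzow_iteration(ym, K, L):
    # One Cadzow step: rank-K truncation of the Toeplitz matrix built from
    # the sequence ym, then diagonal averaging back to a sequence.
    n_rows = len(ym) - L
    A = np.array([[ym[L + i - l] for l in range(L + 1)]
                  for i in range(n_rows)])
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    s[K:] = 0.0                                  # force rank K
    Ap = (U * s) @ Vh
    out = np.zeros(len(ym), dtype=complex)
    counts = np.zeros(len(ym))
    for i in range(n_rows):                      # average along diagonals
        for l in range(L + 1):
            out[L + i - l] += Ap[i, l]
            counts[L + i - l] += 1
    return out / counts

rng = np.random.default_rng(2)
M, K = 10, 2
m = np.arange(-M, M + 1)
u = np.exp(-2j * np.pi * np.array([0.231, 0.684]))
ym_clean = (np.array([1.0, 0.7]) * u[None, :] ** m[:, None]).sum(axis=1)
noise = 0.3 * (rng.standard_normal(2 * M + 1) + 1j * rng.standard_normal(2 * M + 1))
ym_noisy = ym_clean + noise
ym = ym_noisy.copy()
for _ in range(10):
    ym = cadzow_iteration(ym, K, L=M)            # near-square matrix, L = M
err_before = np.linalg.norm(ym_noisy - ym_clean)
err_after = np.linalg.norm(ym - ym_clean)
print(err_after < err_before)
```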

The computational cost of this algorithm, summarized in "Cadzow's Iterative Denoising," is higher than that of the annihilating filter method, since it requires performing the SVD of a square matrix of large size, typically half the number of samples. However, using modern computers we can expect to perform the SVD of a square matrix with a few hundred columns in less than a second. We show in Figure 3 an example of FRI signal reconstruction with seven Diracs whose 71 samples are buried in noise with 5 dB SNR (redundancy ≈ 5): the

[FIG3] Retrieval of an FRI signal with (a) seven Diracs from (b) 71 noisy (SNR = 5 dB) samples. (Panels: original and estimated signal innovations; noiseless samples; noisy samples.)


Algorithm for "denoising" the samples yn of "Annihilating Filter: Total Least-Squares Method."

1) Compute the N-DFT coefficients of the samples: ym = ∑n=1…N yn e^(−j2πnm/N).
2) Choose an integer L in [K, Bτ/2] and build the rectangular Toeplitz matrix A according to (12).
3) Perform the SVD of A = USV^T, where U is a (2M − L + 1) × (L + 1) unitary matrix, S is a diagonal (L + 1) × (L + 1) matrix, and V is a (L + 1) × (L + 1) unitary matrix.
4) Build the diagonal matrix S′ from S by keeping only the K most significant diagonal elements, and deduce the total least-squares approximation of A by A′ = US′V^T.
5) Build a denoised approximation y′n of yn by averaging the diagonals of the matrix A′.
6) Iterate from step 2 until, e.g., the (K + 1)th largest diagonal element of S is smaller than the Kth largest diagonal element by some prescribed factor.

The number of iterations needed is usually small (fewer than ten). Note that, experimentally, the best choice for L in step 2 is L = M.

CADZOW’S ITERATIVE DENOISING


total computation time is 0.9 s on a PowerMacintosh G5 at 1.8 GHz. Another, more striking, example is shown in Figure 4, where we use 1,001 noisy (SNR = 20 dB) samples to reconstruct 100 Diracs (redundancy ≈ 5): the total computation time is 61 s. Although it is not easy to check on a crowded graph, all the Dirac locations have been retrieved very precisely, while a few amplitudes are wrong. The fact that the Diracs are sufficiently far apart (≥ 2/N) ensures the stability of the retrieval of the Dirac locations.

CRAMÉR-RAO BOUNDS
The sensitivity of the FRI model to noise can be evaluated theoretically by choosing a statistical model for this perturbation. The result is that any unbiased algorithm able to retrieve the innovations of the FRI signal from its noisy samples exhibits a covariance matrix that is lower bounded by the Cramér-Rao bounds (see "Cramér-Rao Lower Bounds"). As can be seen in Figure 5, the retrieval of an FRI signal made of two Diracs is almost optimal for SNR levels above 5 dB, since the uncertainty on these

We are considering noisy real measurements Y = [y1, y2, …, yN] of the form

yn = ∑k=1…K xk ϕ(nT − tk) + εn,

where εn is a zero-mean Gaussian noise of covariance R; usually the noise is assumed to be stationary: [R]n,n′ = rn−n′ where rn = E{εn′+n εn′}. Then any unbiased estimate Θ(Y) of the unknown parameters [x1, x2, …, xK]^T and [t1, t2, …, tK]^T has a covariance matrix that is lower bounded by the inverse of the Fisher information matrix (adaptation of [21, (6)])

cov{Θ} ≥ (Φ^T R^(−1) Φ)^(−1),

where Φ is the N × 2K matrix

Φ = [ ϕ(T − t1)   ⋯  ϕ(T − tK)    −x1ϕ′(T − t1)   ⋯  −xKϕ′(T − tK)
      ϕ(2T − t1)  ⋯  ϕ(2T − tK)   −x1ϕ′(2T − t1)  ⋯  −xKϕ′(2T − tK)
      ⋮               ⋮                ⋮                 ⋮
      ϕ(NT − t1)  ⋯  ϕ(NT − tK)   −x1ϕ′(NT − t1)  ⋯  −xKϕ′(NT − tK) ].

Note that this expression holds quite generally: it does not require ϕ(t) to be periodic or bandlimited, and the noise does not need to be stationary.
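As a sketch of how the bound can be evaluated numerically (white noise R = σ²I assumed for simplicity, ϕ′ replaced by a central difference, and all names and values our own):

```python
import numpy as np

def dirichlet(t, B, tau):
    # tau-periodic sinc kernel, phi(0) = 1 (one common normalization).
    t = np.asarray(t, dtype=float)
    return np.sinc(B * t) / np.sinc(t / tau)

def crb_fri(t, x, N, B, tau, sigma):
    # (Phi^T R^-1 Phi)^-1 for white noise R = sigma^2 I; parameters ordered
    # [x_1..x_K, t_1..t_K]; phi' approximated by a central difference.
    n = np.arange(1, N + 1)
    args = n[:, None] * (tau / N) - t[None, :]
    eps = 1e-6
    phi = dirichlet(args, B, tau)
    dphi = (dirichlet(args + eps, B, tau) - dirichlet(args - eps, B, tau)) / (2 * eps)
    Phi = np.hstack([phi, -x[None, :] * dphi])   # N x 2K
    return np.linalg.inv(Phi.T @ Phi / sigma**2)

crb = crb_fri(t=np.array([0.231, 0.684]), x=np.array([1.0, 0.7]),
              N=21, B=21.0, tau=1.0, sigma=0.1)
# Minimal standard deviations on the two locations t_1, t_2:
print(np.sqrt(np.diag(crb)[2:]))
```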

One-Dirac periodized sinc case: If we make the hypothesis that εn is N-periodic and ϕ(t) is the Dirichlet kernel (4), then the 2 × 2 Fisher matrix becomes diagonal. The minimal uncertainties on the location of one Dirac, Δt1, and on its amplitude, Δx1, are then given by

Δt1/τ ≥ (Bτ / (2π|x1|√N)) · (∑|m|≤⌊Bτ/2⌋ m²/rm)^(−1/2) and Δx1 ≥ (Bτ/√N) · (∑|m|≤⌊Bτ/2⌋ 1/rm)^(−1/2).

[FIG4] Retrieval of an FRI signal with (a) 100 Diracs from (b) 1,001 noisy (SNR = 20 dB) samples.


CRAMÉR-RAO LOWER BOUNDS


locations reaches the (unbiased) theoretical minimum given by the Cramér-Rao bounds. Such a property has already been observed for high-resolution spectral algorithms (notably those using a maximum likelihood approach) by Tufts and Kumaresan [13].

It is particularly instructive to make the explicit computation for signals that have exactly two innovations per period τ and where the samples are corrupted with white Gaussian noise. The results, which involve the same arguments as in [14], are given in "Uncertainty Relation for the One-Dirac Case" and essentially state that the uncertainty on the location of the Dirac is proportional to 1/√(NBτ) when the sampling noise is dominant (white noise case), and to 1/(Bτ) when the transmission noise is dominant (ϕ(t)-filtered white noise). In both cases, it appears that it is better to maximize the bandwidth B of the sinc-kernel in order to minimize the uncertainty on the location of the Dirac. A closer inspection of the white noise case shows that the improved time resolution is obtained at the cost of a loss of amplitude accuracy by a √(Bτ) factor.

When K ≥ 2, the Cramér-Rao formula for one Dirac still holds approximately when the locations are sufficiently far apart. Empirically, if the minimal difference (modulo τ) between two of the Dirac locations is larger than, say, 2/N, then the maximal (Cramér-Rao) uncertainty on the retrieval of these locations is obtained using the formula given in "Uncertainty Relation for the One-Dirac Case."

DISCUSSION

APPLICATIONS
Let us turn to applications of the methods developed so far. The key feature to look for is sparsity, together with a good model of the acquisition process and of the noise present in the system. For a real application, this means a thorough understanding of the setup and of the physics involved (remember that we assume a continuous-time problem, and we do not start from a set of samples or a finite vector).

One main application of the theory presented in this article is ultra-wideband (UWB) communications. This communication method uses pulse position modulation (PPM) with very wideband pulses (up to several gigahertz of bandwidth). Designing a digital receiver using conventional sampling theory would require analog-to-digital conversion (ADC) running at over 5 GHz, which would be very expensive and power intensive. A simple model of a UWB pulse is a Dirac convolved with a wideband, zero-mean pulse. At the receiver, the signal is the convolution of the original pulse with the channel impulse response, which includes many reflections, all of this buried in high levels of noise. Initial work on UWB using an FRI framework was presented in [15]. The technology described in the present article is currently being transferred to Qualcomm Inc.

The other applications that we would like to mention, namely electroencephalography (EEG) and optical coherence tomography (OCT), use kernels other than the Dirichlet window and, as such, require a slight adaptation of what has been presented here.

EEG measurements during neuronal events like epileptic seizures can be modeled reasonably well by an FRI excitation to a Poisson equation, and it turns out that these measurements satisfy an annihilation property [16]. Obviously, accurate localization of the activation loci is important for the surgical treatment of such an impairment.

[FIG5] Retrieval of the locations of an FRI signal with two Diracs from 21 noisy samples. (a) Scatterplot of the retrieved locations against input SNR, compared to the Cramér-Rao bounds; (b) observed standard deviation for each Dirac (averaged over 10,000 realizations) compared to the Cramér-Rao lower bound.




In OCT, the measured signal can be expressed as a convolution between the (low-)coherence function of the sensing laser beam (typically a Gabor function, which satisfies an annihilation property) and an FRI signal whose innovations are the locations of refractive index changes, and their range, within the object imaged [17]. Depending on the noise level and the model adequacy, the annihilation technique makes it possible to reach a resolution that is potentially well below the "physical" resolution implied by the coherence length of the laser beam.

RELATION WITH COMPRESSED SENSING
One may wonder whether the approach described here could be addressed using the compressed sensing tools developed in [18] and [19]. Obviously, FRI signals can be seen as "sparse" in the time domain. However, unlike in the compressed sensing framework, this domain is not discrete: the innovation times may assume arbitrary real values. Yet, assuming that these innovations fall on some discrete grid {θn′}n′=0,1,…,(N′−1) known a priori, one may try to address our FRI interpolation problem as

min over x′0, x′1, …, x′N′−1 of ∑n′=0…N′−1 |x′n′| under the constraint ∑n=1…N |yn − ∑n′=0…N′−1 x′n′ ϕ(nT − θn′)|² ≤ Nσ², (15)

where σ² is an estimate of the noise power.

In the absence of noise, it has been shown that this minimization provides the parameters of the innovation with "overwhelming" probability [19] using O(K log N′) measurements. Yet this method is not as direct as the annihilating filter method, which does not require any iteration. Moreover, the compressed sensing approach does not reach the critical sampling rate, unlike the method proposed here, which almost achieves this goal (2K + 1 samples for 2K innovations). On the other hand, compressed sensing is not limited to uniform measurements of the form (14) and could potentially accommodate arbitrary sampling kernels, not only the few that satisfy an annihilation property. This flexibility is certainly an attractive feature of compressed sensing.
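Under the on-grid assumption, (15) is a convex program; a common way to approximate it is to solve the Lagrangian (LASSO) form by iterative soft thresholding. The sketch below is our own: the grid size, λ, and iteration count are arbitrary choices, and the constrained program (15) is replaced by its penalized relative:

```python
import numpy as np

def dirichlet(t, B, tau):
    # tau-periodic sinc kernel, phi(0) = 1 (one common normalization).
    t = np.asarray(t, dtype=float)
    return np.sinc(B * t) / np.sinc(t / tau)

def ista_lasso(Phi, y, lam, n_iter=3000):
    # Iterative soft thresholding for min_x 0.5||y - Phi x||^2 + lam ||x||_1,
    # a penalized relative of the constrained program (15).
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        g = x + step * Phi.T @ (y - Phi @ x)
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return x

tau, N, B = 1.0, 21, 21.0
Nprime = 64                                   # grid resolution (our choice)
theta = np.arange(Nprime) * tau / Nprime      # candidate locations theta_n'
n = np.arange(N)                              # one period, n = 0..N-1
Phi = dirichlet(n[:, None] / N - theta[None, :], B, tau)
x_true = np.zeros(Nprime)
x_true[15], x_true[44] = 1.0, 0.7             # two on-grid Diracs
y = Phi @ x_true                              # noiseless measurements
x_hat = ista_lasso(Phi, y, lam=0.05)
print(sorted(np.argsort(-np.abs(x_hat))[:2])) # the two dominant grid indices
```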

In the presence of noise, the beneficial contribution of the ℓ1 norm is less obvious, since the quadratic program (15) does not provide an exactly K-sparse solution anymore, although ℓ1/ℓ2 stable recovery of the x′k′ is statistically guaranteed [20]. Moreover, unlike the method we are proposing here, which is able to reach the Cramér-Rao lower bounds (computed in "Cramér-Rao Lower Bounds"), there is no evidence that the ℓ1 strategy shares this optimal behavior. In particular, it is of interest to note that, in practice, the compressed sensing strategy involves random measurement selection, whereas arguments obtained from the Cramér-Rao bounds computation, namely on the optimal bandwidth of the sinc-kernel, indicate that, on the contrary, it might be worth optimizing the sensing matrix.

CONCLUSIONS
Sparse sampling of continuous-time sparse signals has been addressed. In particular, it was shown that sampling at the rate of innovation is possible, in some sense applying Occam's razor to the sampling of sparse signals. The noisy case has been analyzed and solved, with methods proposed that reach the optimal performance given by the Cramér-Rao bounds. Finally, a number of applications have been discussed where sparsity can be taken advantage of. The comprehensive coverage given in this article should lead to further research in sparse sampling, as well as new applications.

We consider the FRI problem of finding [x1, t1] from the N noisy measurements [y1, y2, …, yN]

yn = μn + εn with μn = x1 ϕ(nτ/N − t1),

where ϕ(t) is the τ-periodic, B-bandlimited Dirichlet kernel and εn is a stationary Gaussian noise. Any unbiased algorithm that estimates t1 and x1 will do so up to an error quantified by their standard deviations Δt1 and Δx1, lower bounded by the Cramér-Rao formulas (see "Cramér-Rao Lower Bounds"). Denoting the noise power by σ² and the peak SNR by PSNR = |x1|²/σ², two cases are especially interesting:

• The noise is white, i.e., its power spectral density is constant and equals σ². Then we find

Δt1/τ ≥ (1/π) √(3Bτ / (N(B²τ² − 1))) · PSNR^(−1/2) and Δx1/|x1| ≥ √(Bτ/N) · PSNR^(−1/2).

• The noise is a white noise filtered by ϕ(t); then we find

Δt1/τ ≥ (1/π) √(3 / (B²τ² − 1)) · PSNR^(−1/2) and Δx1/|x1| ≥ PSNR^(−1/2).

In both configurations, we conclude that, in order to minimize the uncertainty on t1, it is better to maximize the bandwidth of the Dirichlet kernel, i.e., choose B such that Bτ = N if N is odd, or such that Bτ = N − 1 if N is even. Since Bτ ≤ N, we always have the following uncertainty relation

N · PSNR^(1/2) · Δt1/τ ≥ √3/π,

involving the number of measurements N, the end noise level, and the uncertainty on the position.


UNCERTAINTY RELATION FOR THE ONE-DIRAC CASE
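As a sanity check, the white-noise location bound from this box can be evaluated for a few bandwidths, verifying along the way that N · PSNR^(1/2) · Δt1/τ ≥ √3/π always holds (the numerical values below are our own illustrative choices):

```python
import math

# White-noise location bound, evaluated at a few (odd) B*tau values
# for fixed N and PSNR.
N = 21
psnr = 10.0 ** (20 / 10)                      # 20 dB peak SNR
for Btau in (3, 11, 21):                      # B*tau odd and at most N
    dt = (1 / math.pi) * math.sqrt(3 * Btau / (N * (Btau ** 2 - 1))) / math.sqrt(psnr)
    # The universal relation N * sqrt(PSNR) * (dt/tau) >= sqrt(3)/pi:
    assert N * math.sqrt(psnr) * dt >= math.sqrt(3) / math.pi - 1e-12
    print(Btau, dt)
```

The bound decreases as Bτ grows, which is the article's argument for maximizing the kernel bandwidth.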




AUTHORS
Thierry Blu ([email protected]) received the "Diplôme d'ingénieur" from École Polytechnique, France, in 1986 and the Ph.D. from Télécom Paris in 1996. His main field of research is approximation/interpolation/sampling theory and wavelets. He was with the Biomedical Imaging Group at the Swiss Federal Institute of Technology, Lausanne, and is now with the Department of Electronic Engineering, the Chinese University of Hong Kong. He received two best paper awards from the IEEE Signal Processing Society (2003 and 2006).

Pier-Luigi Dragotti ([email protected]) is a senior lecturer (associate professor) in the Electrical and Electronic Engineering Department at Imperial College London. He received the laurea degree in electrical engineering from the University Federico II, Naples, Italy, in 1997 and the master's and Ph.D. degrees from the École Polytechnique Fédérale de Lausanne in 1998 and 2002, respectively. He was a visiting student at Stanford University and a summer researcher in the Mathematics of Communications Department at Bell Labs, Lucent Technologies. His research interests include wavelet theory, sampling and approximation theory, image and video processing, and compression. He is an associate editor of IEEE Transactions on Image Processing.

Martin Vetterli ([email protected]) received his engineering degree from Eidgenössische Technische Hochschule Zürich, his M.S. from Stanford University, and his doctorate from École Polytechnique Fédérale de Lausanne (EPFL). He was with Columbia University, New York, and the University of California at Berkeley before joining EPFL. He works on signal processing and communications, e.g., wavelet theory and applications, joint source-channel coding, and sensor networks for environmental monitoring. He received best paper awards from the IEEE Signal Processing Society (1991, 1996, and 2006). He is the coauthor of the book Wavelets and Subband Coding (Prentice-Hall, 1995) and the forthcoming textbook Fourier and Wavelets: Theory, Algorithms and Applications.

Pina Marziliano ([email protected]) received the B.Sc. in applied mathematics and the M.Sc. in computer science from the Université de Montréal, Canada, in 1994 and 1996, respectively. In 2001, she completed the Ph.D. at the École Polytechnique Fédérale de Lausanne, Switzerland. She joined Genimedia in 2002. Since 2003, she has been an assistant professor in the School of Electrical and Electronic Engineering at the Nanyang Technological University in Singapore. She received the IEEE Signal Processing Society 2006 Best Paper Award. Her research interests include sampling theory and applications in communications and biomedical engineering, watermarking, and perceptual quality metrics for multimedia.

Lionel Coulot ([email protected]) received the B.Sc. and M.Sc. degrees in 2005 and 2007, respectively, in communication systems from the Swiss Federal Institute of Technology, Lausanne, where he is currently a Ph.D. candidate. He spent one year on exchange at Carnegie Mellon University and was an intern at Microsoft Research Asia, Beijing, PRC. His research interests include image and signal processing as well as auction and game theory.

REFERENCES
[1] M. Unser, "Sampling—50 years after Shannon," Proc. IEEE, vol. 88, pp. 569–587, Apr. 2000.

[2] S.M. Kay, Modern Spectral Estimation—Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1988.

[3] P. Stoica and R.L. Moses, Introduction to Spectral Analysis. Upper Saddle River, NJ: Prentice-Hall, 1997.

[4] C.E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423 and 623–656, Jul. and Oct. 1948.

[5] R. Prony, "Essai expérimental et analytique," Ann. École Polytechnique, vol. 1, no. 2, p. 24, 1795.

[6] S.M. Kay and S.L. Marple, "Spectrum analysis—A modern perspective," Proc. IEEE, vol. 69, pp. 1380–1419, Nov. 1981.

[7] M. Vetterli, P. Marziliano, and T. Blu, "Sampling signals with finite rate of innovation," IEEE Trans. Signal Processing, vol. 50, pp. 1417–1428, June 2002.

[8] P.-L. Dragotti, M. Vetterli, and T. Blu, "Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang-Fix," IEEE Trans. Signal Processing, vol. 55, pp. 1741–1757, May 2007.

[9] Special Issue on Spectral Estimation, Proc. IEEE, vol. 70, Sept. 1982.

[10] H. Clergeot, S. Tressens, and A. Ouamri, "Performance of high resolution frequencies estimation methods compared to the Cramér-Rao bounds," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1703–1720, Nov. 1989.

[11] V.F. Pisarenko, "The retrieval of harmonics from a covariance function," Geophys. J., vol. 33, pp. 347–366, Sept. 1973.

[12] J.A. Cadzow, "Signal enhancement—A composite property mapping algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 49–62, Jan. 1988.

[13] D.W. Tufts and R. Kumaresan, "Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood," Proc. IEEE, vol. 70, pp. 975–989, Sept. 1982.

[14] P. Stoica and A. Nehorai, "MUSIC, maximum likelihood, and Cramér-Rao bound," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 720–741, May 1989.

[15] I. Maravic, J. Kusuma, and M. Vetterli, "Low-sampling rate UWB channel characterization and synchronization," J. Commun. Netw., vol. 5, no. 4, pp. 319–327, 2003.

[16] D. Kandaswamy, T. Blu, and D. Van De Ville, "Analytic sensing: Reconstructing pointwise sources from boundary Laplace measurements," in Proc. SPIE–Wavelets XII, vol. 6701, San Diego, CA, 2007, pp. 670114-1/67011Y.

[17] T. Blu, H. Bay, and M. Unser, "A new high-resolution processing method for the deconvolution of optical coherence tomography signals," in Proc. ISBI'02, vol. III, Washington, D.C., 2002, pp. 777–780.

[18] D.L. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, pp. 1289–1306, Apr. 2006.

[19] E.J. Candès, J.K. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, pp. 489–509, Feb. 2006.

[20] E.J. Candès, J.K. Romberg, and T. Tao, "Stable signal recovery from incomplete and inaccurate measurements," Comm. Pure Appl. Math., vol. 59, pp. 1207–1223, Mar. 2006.

[21] B. Porat and B. Friedlander, "Computation of the exact information matrix of Gaussian time series with stationary random components," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, pp. 118–130, Feb. 1986.


[SP]