Burst Deblurring: Removing Camera Shake Through Fourier Burst...

Burst Deblurring: Removing Camera ShakeThrough Fourier Burst Accumulation

Mauricio Delbracio Guilermo SapiroECE, Duke University

{mauricio.delbracio, guillermo.sapiro}duke.edu

Abstract

Numerous recent approaches attempt to remove im-age blur due to camera shake, either with one or mul-tiple input images, by explicitly solving an inverse andinherently ill-posed deconvolution problem. If the pho-tographer takes a burst of images, a modality availablein virtually all modern digital cameras, we show thatit is possible to combine them to get a clean sharpversion. This is done without explicitly solving anyblur estimation and subsequent inverse problem. Theproposed algorithm is strikingly simple: it performs aweighted average in the Fourier domain, with weightsdepending on the Fourier spectrum magnitude. Themethod’s rationale is that camera shake has a randomnature and therefore each image in the burst is gener-ally blurred differently. Experiments with real cameradata show that the proposed Fourier Burst Accumula-tion algorithm achieves state-of-the-art results an orderof magnitude faster, with simplicity for on-board imple-mentation on camera phones.

1. Introduction

One of the most challenging experiences in photog-raphy is taking images in low-light environments. Thebasic principle of photography is the accumulation ofphotons in the sensor during a given exposure time. Ingeneral, the more photons reach the surface the betterthe quality of the final image, as the photonic noiseis reduced. However, this basic principle requires thephotographed scene to be static and that there is no rel-ative motion between the camera and the scene. Oth-erwise, the photons will be accumulated in neighbor-ing pixels, generating a loss of sharpness (blur). Thisproblem is significant when shooting with hand-heldcameras, the most popular photography device today,in dim light conditions.

Under reasonable hypotheses, the camera shake can

be modeled mathematically as a convolution,

v = u ? k + n, (1)

where v is the noisy blurred observation, u is the latentsharp image, k is an unknown blurring kernel and n isadditive white noise. For this model to be accurate, thecamera movement has to be essentially a rotation in itsoptical axis with negligible in-plane rotation, e.g., [26].The kernel k results from several blur sources: lightdiffraction due to the finite aperture, out-of-focus, lightintegration in the photo-sensor, and relative motion be-tween the camera and the scene during the exposure.To get enough photons per pixel in a typical low lightscene, the camera needs to capture light for a periodof tens to hundreds of milliseconds. In such a situ-ation (and assuming that the scene is static and theuser/camera has correctly set the focus), the dominantcontribution to the blur kernel is the camera shake —mostly due to hand tremors.

Current cameras can take a burst of images, this be-ing popular also in camera phones. This has been ex-ploited in several approaches for accumulating photonsin the different images and then forming an image withless noise (mimicking a longer exposure time a posteri-ori, see e.g., [2]). However, this principle is disturbed ifthe images in the burst have blur. The classical math-ematical formulation as a multi-image deconvolution,seeks to solve an inverse problem where the unknownsare the multiple blurring operators and the underly-ing sharp image. This procedure, although producesgood results [30], is computationally very expensive,and very sensitive to a good estimation of the blurringkernels. Furthermore, since the inverse problem is ill-posed it relies on priors either or both for the calculusof the blurs and the latent sharp image.

Camera shake originated from hand tremor vibra-tions is essentially random [4, 11, 27]. This impliesthat the movement of the camera in an individual im-age of the burst is independent of the movement inanother one. Thus, the blur in one frame will be differ-

1

ent from the one in another image of the burst. Thereare mainly three sources of tremor: arm (< 5Hz), wrist(5-20Hz), and fingers (20-30Hz) (Fig. 2 in [11]). Thisimplies that the correlation between successive framesis typically low, unless the frames are acquired at avery fast shutter speed (>1/100s), but of course, insuch scenario there will be no camera shake blur.

Our work is built on this basic principle. We presentan algorithm that aggregates a burst of images takingwhat is less blurred of each frame to build an imagethat is sharper and less noisy than all the images in theburst. The algorithm is straightforward to implementand conceptually simple. It takes as input a series ofregistered images and computes a weighted average ofthe Fourier coefficients of the images in the burst. Withthe availability of accurate gyroscope and accelerome-ters in, for example, phone cameras, the registrationcan be obtained “for free,” rendering the whole algo-rithm very efficient for on-board implementation. Wealso completely avoid the explicit computation of theblurring kernel (as commonly done in the literature),which is not only an unimportant hidden variable forthe task at hand, but as mentioned above, still leavesthe ill-posed and computationally very expensive taskof solving the inverse problem.

Evaluation through synthetic and real experimentsshows that the final image quality is significantly im-proved. This is done without explicitly performing de-convolution, which generally introduces artifacts andalso a significant overhead. Comparison to state-of-the-art multi-image deconvolution algorithms shows thatour approach produces similar or better results whilebeing orders of magnitude faster and simpler. The pro-posed algorithm does not assume any prior on the la-tent image; exclusively relying on the randomness ofhand tremor.

The remaining of the paper is organized as follows.Section 2 discusses the related work and the substan-tial differences to what we propose. Section 3 ex-plains how a burst can be combined in the Fourier do-main to recover a sharper image, while in Section 4 wepresent an empirical analysis of the algorithm perfor-mance through the simulation of camera shake kernels.Section 5 details the algorithm implementation and inSection 6 we present results of the proposed aggrega-tion procedure in real data, comparing the algorithmto state-of-the-art multi-image deconvolution methods.We finally close in Section 7.

2. Related Work

Removing camera shake blur is one of the most chal-lenging problems in image processing. Although in thelast decade several image restoration algorithms have

emerged giving outstanding performance, its success isstill very dependent on the scene. Most image deblur-ring algorithms cast the problem as a deconvolutionwith either a known (non-blind) or an unknown blur-ring kernel (blind). See e.g., the review by Kundur andHatzinakos [14], where a discussion of the most classi-cal methods is presented.

Most blind deconvolution algorithms try to estimatethe latent image without any other input than thenoisy blurred image itself. A representative work isthe one by Fergus et al. [9]. This variational methodsparked many competitors seeking to combine naturalimage priors, assumptions on the blurring operator,and complex optimization frameworks, to simultane-ously estimate both the blurring kernel and the sharpimage [13, 17, 20, 24, 28].

Others attempt to first estimate the degradation op-erator and then applying a non-blind deconvolution al-gorithm. For instance, [6] accelerates the kernel estima-tion step by using fast image filters for explicitly detect-ing and restoring strong edges in the latent sharp im-age. Since the blurring kernel has typically a very smallsupport, the kernel estimation problem is better condi-tioned than estimating the kernel and the sharp imagesimultaneously [16, 17]. However, even in non-blind de-blurring, i.e., when the blurring kernels are known, theproblem is generally ill-posed, because the blur intro-duces zeros in the frequency domain. Thereby avoidingexplicit inversion, as here proposed, becomes critical.

Two or more input images can improve the esti-mation of both the underlying image and the blurringkernels. Rav-Acha and Peleg [23] claimed that “Twomotion-blurred images are better than one,” if the di-rection of the blurs are different. In [29] the authorsproposed to capture two photographs: one having ashort exposure time, noisy but sharp, and one with along exposure, blurred but with low noise. The twoacquisitions are complementary, and the sharp one isused to estimate the motion kernel of the blurred one.

The closest to our work are papers on multi-imageblind deconvolution [3, 5, 25, 30, 32]. In [3] the au-thors introduce a prior on the sparsity of the motionblur kernel to constraint the blind deblurring problem.Most of these multi-image algorithms introduce cross-blur penalty functions between image pairs. Howeverthis has the problem of growing combinatorially withthe number of images in the burst. This idea is ex-tended in [30] using a Bayesian framework for couplingall the unknown blurring kernels and the latent imagein a unique prior. Although this prior has numerousgood mathematical properties, its optimization is veryslow. The algorithm produces very good results butit may take several minutes or even hours for a typical

Figure 1: When the camera is set to a burst mode, severalphotographs are captured sequentially. Due to the randomnature of hand tremor, camera shake blur is mostly inde-pendent from one frame to the other. An image consistingof white dots was photographed with a handheld cameraat 1/4” to depict the camera motion kernels. The kernelsare mainly unidimensional regular trajectories that are notcompletely random (perfect random walk) nor uniform.

burst of 8-10 images of several megapixels. The very re-cent work by Park and Levoy [22] relies on an attachedgyroscope, now present in many phones and tablets,to align all the input images and to get an estimationof the blurring kernels. Then, a multi-image non-blinddeconvolution algorithm is applied. By taking a burstof images, the multi-image deconvolution problem be-comes less ill-posed allowing the use of simpler priors.This is explored in [12] where the authors introduce atotal variation prior on the underlying sharp image.

All these papers propose kernel estimation and tosolve an inverse problem of image deconvolution. Themain inconvenience of tackling this problem as a decon-volution, on top of the computational burden, is that ifthe convolution model is not accurate or the kernel isnot accurately estimated, the restored image will con-tain strong artifacts (such as ringing). Our approachis radically different. The idea is to fuse all the imagesin the burst without explicitly estimating the blurringkernels and subsequent inverse problem approach, buttaking the information that is less degraded from eachimage in the burst. In that sense our work is closerto multi-image denoising [2], however we consider thatthe input images may also be blurred.

3. Fourier Burst Accumulation

Camera shake originated from hand tremor vibra-tions has undoubtedly a random nature [4, 11, 27].The independent movement of the photographer handcauses the camera to be pushed randomly and unpre-dictably, generating blurriness in the captured image.Figure 1 shows several photographs taken with a dslrhandheld camera. The photographed scene consists ofa laptop displaying a black image with white dots. Thecaptured picture of the white dots illustrates the traceof the camera movement in the image plane. If thedots are very small —mimicking Dirac masses— their

photographs represent the blurring kernels themselves.The kernels mostly consist of unidimensional regularrandom trajectories. This stochastic behavior will bethe key ingredient in our proposed approach.

Let F denote the Fourier Transform and k theFourier Transform of the kernel k. Images are definedin a regular grid indexed by the 2D position x andthe Fourier domain is indexed by the 2D frequencyζ. Let us assume, without loss of generality, that thekernel k due to camera shake is normalized such that∫k(x)dx = 1. The blurring kernel is nonnegative since

the integration of incoherent light is always nonnega-tive. This implies that the motion blur does not am-plify the Fourier spectrum:

Claim 1. Let k(x) ≥ 0 and∫k(x) = 1. Then, |k(ζ)| ≤

1,∀ζ. (Blurring kernels do not amplify the spectrum.)

Proof.∣∣∣k(ζ)∣∣∣ =

∣∣∣∣∫ k(x)eix·ζdx

∣∣∣∣ ≤ ∫ |k(x)| dx =

∫k(x)dx = 1.

Most modern digital cameras have a burst modewhere the photographer is allowed to sequently takea series of images, one right after the other. Let usassume that the photographer takes a sequence of Mimages of the same scene u,

vi = u ? ki + ni, for i = 1, . . . ,M. (2)

The movement of the camera during any two imagesof the burst will be essentially independent. Thus,the blurring kernels ki will be mostly different for dif-ferent images in the burst. Hence, each Fourier fre-quency of u will be differently attenuated on eachframe of the burst. The idea is to reconstruct an im-age whose Fourier spectrum takes for each frequencythe value having the largest Fourier magnitude in theburst. Since a blurring kernel does not amplify theFourier spectrum (Claim 1), the reconstructed imagepicks what is less attenuated from each image of theburst.

More formally, let p be a non-negative integer, wewill call Fourier Burst Accumulation ( fba) to theFourier weighted averaged image,

up(x) = F−1

(M∑i=1

wi(ζ) · vi(ζ)

)(x), (3)

wi(ζ) =|vi(ζ)|p∑Mj=1 |vj(ζ)|p

,

where vi is the Fourier Transform of the individualburst image vi. The weight wi := wi(ζ) controls thecontribution of the frequency ζ of image vi to the fi-nal reconstruction up. Given ζ, for p > 0, the largerthe value of |vi(ζ)|, the more vi(ζ) contributes to theaverage, reflecting what we discussed above that thestrongest frequency values represent the least attenu-ated u components.

The integer p controls the aggregation of the imagesin the Fourier domain. If p = 0, the restored imageis just the average of the burst (as standard for exam-ple in the case of noise only), while if p → ∞, eachreconstructed frequency takes the maximum value ofthat frequency along the burst. This is stated in thefollowing claim; the proof is straightforward and it istherefore omitted.

Claim 2. Mean/Max aggregation. Suppose that vi(ζ)for i = 1, . . . ,M are such that |vi1(ζ)| = |vi2(ζ)| =. . . = |viq (ζ)| > |viq+1(ζ)| ≥ |viq+2(ζ)| ≥ . . . ≥ |viM (ζ)|and wi(ζ) is given by (3). If p = 0, then wi(ζ) =1M ,∀i (arithmetic mean pooling), while if p→∞, thenwi(ζ) = 1

q for i = i1, . . . , iq and wi(ζ) = 0 otherwise

(maximum pooling).

The Fourier weights only depend on the Fouriermagnitude and hence they are not sensitive to imagemisalignment. However, when doing the average in (3),the images vi have to be correctly aligned to mitigateFourier phase intermingling and get a sharp aggrega-tion. The images in our experiments are aligned us-ing SIFT correspondences and then finding the domi-nant homography between each image in the burst andthe first one (implementation details are given in Sec-tion 5). This pre-alignment step can be done exploitingthe camera gyroscope and accelerometer data.

Dealing with noise. The images in the burst areblurry but also contaminated with noise. In the idealcase where the input images are not contaminated withnoise, (3) is reduced to

wi =|vi|p∑Mj=1 |vj |p

=|ki · u|p∑Mj=1 |kj · u|p

=|ki|p∑Mj=1 |kj |p

, (4)

as long as |u| > 0. This is what we would like tohave: a procedure for weighting more the frequenciesthat are less attenuated by the different camera shakekernels. Since camera shake kernels have typically asmall support, of maximum only a few tenths of pixels,its Fourier spectrum magnitude varies very smoothly.Thus, |vi| can be smoothed out before computing theweights. This helps to remove noise and also to stabi-lize the weights (see Section 5).

(a) Frames crop 1-7 and the Fourier weights wi for p = 11.

p = 0 p = 3 p = 7 p = 11 p = 20 p = 50

(b) Fourier Aggregation results for different p values.

1 2 3 4 5 6 70

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Frame

p=0

p=3

p=7

p=11

p=20

p=50

(c) Weights energy distribution

1 2 3 4 5 6 70

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Frame

p=0

p=3

p=7

p=11

p=20

p=50

(d) Weighted frames energydistribution

Figure 2: Weights distribution of the Fourier Burst Aggre-gation when changing the value of p. As p increases, theweights are concentrated in fewer images and the aggre-gated image becomes sharper but also noisier.

4. Fourier Burst Accumulation Analysis

4.1. Weights Distribution

The value of p balances sharpness and noise reduc-tion. Although, one would always prefer to get a sharpimage, due to noise and the unknown Fourier phaseshift introduced by the aggregation procedure, the re-sultant image would not necessary be better as p→∞.Figure 2 shows an example of the proposed fba for aburst of 7 images, and the amount of contribution ofeach frame to the final aggregation. As p grows, theweights are concentrated in fewer images. The weightsmaps clearly show that different Fourier frequencies arerecovered from different frames. In this example, thehigh frequency content is uniformly taken from all theframes in the burst. This produces a strong noise re-duction behavior, in addition to the sharpening effect.

4.2. Statistical Performance

To show the statistical performance of the Fourierweighted accumulation, we carried out an empiricalanalysis applying the proposed aggregation with dif-ferent values of p. We simulated motion kernels follow-ing [11], where the (expected value) amount of blur iscontrolled by a parameter related to the exposure time.We also controlled the number of images in the burstand the noise level in each frame. The kernels were gen-erated by simulating the random shake of the camera

1/10

1/5

1/2

2/3

Figure 3: Simulated kernels due to hand tremor follow-ing [11]. Each row shows a set of simulated kernels (leftpanel) for different exposures Texp = 1/10, 1/5, 1/2, 2/3, andthe respective Fourier spectrum magnitude (right panel).The parameter Texp controls the amount of expected blur.

from the power spectral density of measured physiolog-ical hand tremor [4]. All the images were aligned bypre-centering the motion kernels before blurring theunderlying sharp image. Figure 3 shows several dif-ferent realizations for different exposure values Texp.Actually, the amount of blur not only depends on theexposure time but also on the focal distance, user ex-pertise, and camera dimensions and mass [1]. However,for simplicity, all these variables were controlled by thesingle parameter Texp.

We computed the empirical mean square error (mse)by randomly sampling hundreds of different motionkernels and Gaussian noise realizations, and then ap-plying the Fourier aggregation procedure. The meansquare error was decomposed into the bias and vari-ance terms, namely mse(up) = bias(up)

2 + var(up), tohelp visualize the behavior of the algorithm.

Figure 4 shows the average algorithm performancewhen changing the acquisition conditions. In general,the larger the value of p the smaller the bias and thelarger the variance. There is a minimum of the meansquare error for p ∈ [7, 30]. This is reasonable sincethere exists a tradeoff between variance reduction andbias. Although both the bias and the variance are af-fected by the noise level, the qualitative performanceof the algorithm remains the same. The bias is notaltered by the number of frames in the burst but thevariance is reduced as more images are used, implying again in the expected mse as more images are used. Onthe other hand, the exposure time mostly affects thebias, being much more significant for larger exposuresas expected.

5. Algorithm Implementation

The burst restoration algorithm is built on threemain blocks: Burst Registration, Fourier Burst Ac-cumulation, and Noise Aware Sharpening as a post-processing. These are described in what follows.

Burst Registration. There are several ways of regis-tering images (see [33] for a survey). In this work, we

0 1 3 7 11 20 30 500

1

2

3

4x 10

−3

p

Bias

s=0.01

s=0.02

s=0.04

s=0.08

s=0.16

0 1 3 7 11 20 30 500

1

2

3

4x 10

−3

p

Variance

0 1 3 7 11 20 30 500

2

4

6x 10

−3

p

MSE

(a) Noise level s

0 1 3 7 11 20 30 500

1

2

3

4

5x 10

−3

p

Bias

M=4

M=9

M=16

M=25

M=36

M=49

M=64

0 1 3 7 11 20 30 500

0.5

1

1.5x 10

−3

p

Variance

0 1 3 7 11 20 30 500

2

4

6x 10

−3

p

MSE

(b) Number of images M

0 1 3 7 11 20 30 500

2

4

6

8x 10

−4

p

Bias

Texp=1/10

Texp=1/5

Texp=1/3

Texp=1/2

Texp=2/3

Texp=1

0 1 3 7 11 20 30 500

1

2

3x 10

−4

p

Variance

0 1 3 7 11 20 30 500

0.5

1x 10

−3

p

MSE

(c) Exposure time Texp

Figure 4: Bias-Variance tradeoff. Average algorithm per-formance with respect to p when changing (a) the amountof noise in the input images s, (b) the number of images inthe burst M , and (c) the exposure time Texp. The rest ofthe parameters are set to M = 16, s = 0.04 and Texp = 1/3,unless other specified. With short exposures, the arithmeticaverage (p = 0) produces the best mse since the images arenot blurred. The bias does not depend on M , but the vari-ance can be significantly reduced by taking more images(light accumulation procedure). Noise affects the bias andthe variance terms (with the exception of p = 0 where thebias is unaffected). The mse plots show the existence ofa minimum for p ∈ [7, 30], indicating that the best is tobalance a perfect average and a max pooling.

use image correspondences to estimate the dominanthomography relating every image of the burst and areference image (the first one in the burst). Althoughrestrictive, the homography assumption is valid if thescene is planar (or far from the camera) or the view-point location is fixed, e.g., the camera only rotatesaround its optical center. Image correspondences arefound using SIFT features [19] and then filtered outthrough the orsa algorithm [21], a variant of the socalled ransac method [10]. Recall that as in prior art,e.g., [22], the registration can be done with the gyro-scope and accelerometer information from the camera.

Fourier Burst Accumulation. Given the registeredimages {vi}Mi=1 we directly compute the correspondingFourier transforms {vi}Mi=1. Since camera shake mo-tion kernels have a small spatial support, their Fourierspectrum magnitudes vary very smoothly. Thus, |vi|can be lowpass filtered before computing the weights,that is, |¯vi| = Gσ|vi|, where Gσ is a Gaussian filter ofstandard deviation σ. The strength of the low pass fil-ter (controlled by the parameter σ) should depend on

the assumed kernel size (the smaller the kernel the moreregular its Fourier spectrum magnitude). In our imple-mentation we set σ = min(mh,mw)/ks, where ks = 50 pix-els and the image size is mh×mw pixels. Although thislow pass filter is important, the results are not too sen-sitive to the value of σ.

The final Fourier burst aggregation is (note that thesmoothing is only applied to the weights calculation)

up = F−1

(M∑i=1

wi · vi

), wi =

|¯vi|p∑Mj=1 |¯vj |p

. (5)

The extension to color images is straightforward.The accumulation is done channel by channel using thesame Fourier weights for all channels. The weights arecomputed by arithmetically averaging the Fourier mag-nitude of the channels before the low pass filtering.

Noise Aware Sharpening. While the results of theFourier burst accumulation are already very good, con-sidering that the process so far has been computa-tionally non-intensive, one can optionally apply a fi-nal sharpening step if resources are still available. Thesharpening must contemplate that the reconstructedimage may have some remaining noise. Thus, we firstapply a denoising algorithm (we used nlbayes [15] 1),then on the filtered image we apply a Gaussian sharp-ening. To avoid removing fine details we finally addback a percentage of what has been removed duringthe denoising step.

Memory and Complexity Analysis. Once the im-ages are registered, the algorithm runs in O(M · m ·logm), where m = mh ×mw is the number of imagepixels and M the number of images in the burst. Theheaviest part of the algorithm is the computation ofM fft, very suitable and popular in vlsi implemen-tations. This is the reason why the method has a verylow complexity. Regarding memory consumption, thealgorithm does not need to access all the images simul-taneously and can proceed in an online fashion. Thiskeeps the memory requirements to only three buffers:one for the current image, one for the current average,and one for the current weights sum.

6. Experimental Results

We captured several handheld bursts with differentnumber of images using a Canon 400D dslr cameraand the back camera of an iPad tablet. The full re-stored images and the details of the camera parame-ters are shown in Figure 5. The photographs contain

1A variant of this is already available on camera phones, sowe stay at the level of potential on-board implementations.

complex structure, texture, noise and saturated pixels,and were acquired under different lighting conditions.All the results were computed using p = 11.

Comparison to multi-image blind deblurring.Since this problem is typically addressed by multi-image blind deconvolution techniques, we selected twostate-of-the-art algorithms for comparison [25, 30].Both algorithms are built on variational formulationsand estimate first the blurring kernels using all theframes in the burst and then do a step of multi-imagenon-blind deconvolution, requiring significant memoryfor normal operation. We used the code provided bythe authors. The algorithms rely on parameters thatwere manually tuned to get the best possible results.We also compare to the simple align and average algo-rithm (which indeed is the particular case p = 0).

Figures 6 and 7 show some crops of the restoredimages by all the algorithms. In addition, we showtwo input images for each burst: the best one in theburst and a typical one in the series. The full sequencesare available in the supplementary material. The pro-posed algorithm obtains similar or better results thanthe one by Zhang et al. [30], at a significantly lowercomputational and memory cost. Since this algorithmexplicitly seeks to deconvolve the sequence, if the con-volution model is not perfectly valid or there is mis-alignment, the restored image will have deconvolutionartifacts. This is clearly observed in the bookshelf se-quence where [30] produces a slightly sharper restora-tion but having ringing artifacts (see Jonquet book).Also, it is hard to read the word “Women” in the spineof the red book. Due to the strong assumed priors,[30] generally leads to very sharp images but it mayalso produce overshooting/ringing in some regions likein the brick wall (parking night).

The proposed method clearly outperforms [25] in allthe sequences. This algorithm introduces strong ar-tifacts that degraded the performance in most of thetested bursts. Tuning the parameters was not trivialsince this algorithm relies on 4 parameters that the au-thors have linked to a single one (named γ). We sweptthe parameter γ to get the best possible performance.

Our approach is conceptually similar to a regularalign and average algorithm, but it produces signifi-cantly sharper images while keeping the noise reductionpower of the average principle. In some cases with nu-merous images in the burst (e.g., see the parking nightsequence), there might already be a relatively sharpimage in the burst (lucky imaging). Our algorithmdoes not need to explicitly detect such “best” frame,and naturally uses the others to denoise the frequenciesnot containing image information but noise.

woods 13 imgsiso 1600, 1/8”Canon 400D

parking night 10 imgsiso 1600, 1/3”Canon 400D

bookshelf 10 imgsiso 100, 1/6”Canon 400D

auvers 12 imgsiso 400, 1/2”, iPad

anthropologie [22] 8 imgsiso 100, 353 ms

tequila [22] 8 imgsiso 100, 177 ms

Figure 5: Restoration of image bursts captured using different cameras. Full images and additional results given in thesupplementary material [8].

Execution time. Once the images are registered,the proposed approach runs in only a few seconds inour Matlab experimental code, while [30] needs severalhours for bursts of 8-10 images. Even if the estimationof the blurring kernels is done in a cropped version (i.e.,200×200 pixels region), the multi-image non-blind de-convolution step is very slow, taking several hours for6-8 megapixel images.

Multi-image non-blind deconvolution. Figure 8shows the algorithm results in two sequences providedin [22]. The algorithm proposed in [22] uses gyroscopeinformation present in new camera devices to registerthe burst and also to have an estimation of the localblurring kernels. Then a more expensive multi-imagenon-blind deconvolution algorithm is applied to recoverthe latent sharp image. Our algorithm produces similarresults without explicitly solving any inverse problemnor using any information about the motion kernels.

7. Conclusions

We presented an algorithm to remove the camerashake blur in an image burst. The algorithm is built onthe idea that each image in the burst is generally differ-ently blurred; this being a consequence of the randomnature of hand tremor. By doing a weighted averagein the Fourier domain, we reconstruct an image com-bining the least attenuated frequencies in each frame.

This algorithm has several advantages. First, itdoes not introduce typical ringing or overshooting ar-tifacts present in most deconvolution algorithms. Thisis avoided by not formulating the deblurring problemas an inverse problem of deconvolution. The algorithmproduces similar or better results than the state-of-the-art multi-image deconvolution while being significantlyfaster and with lower memory footprint. As a futurework, we would like to incorporate a gyroscope registra-tion technique, e.g., [22], to create a real-time systemfor removing camera shake in image bursts. We arealso exploring the use of the framework presented inthis paper for HDR [18, 31] and video deblurring [7],with excellent preliminary results.

Typical Shot Best Shot Align and average

Sroubek &Milanfar [25]

Zhang et al. [30] proposed method

Figure 6: Real data burst deblurring results and compari-son with multi-image blind deconvolution methods (auvers).

Best

Shot

Park

&Levoy

[22]

proposed

method

Figure 8: Restoration results with the data provided in [22](anthropologie and tequila sequences).

Typical Shot Best Shot Align and average Sroubek &Milanfar [25]

Zhang et al. [30] proposed method(no final sharp.)

proposed method

Figure 7: Real data burst deblurring results and comparison to state-of-the-art multi-image blind deconvolution algo-rithms (woods, parking night, bookshelf sequences).

Acknowledgment

This work was partially funded by: ONR, ARO,NSF, NGA, and AFOSR.

References

[1] G. Boracchi and A. Foi. Modeling the performanceof image restoration from motion blur. IEEE Trans.Image Process., 21(8):3502–3517, 2012. 5

[2] T. Buades, Y. Lou, J.-M. Morel, and Z. Tang. A noteon multi-image denoising. In LNLA, 2009. 1, 3

[3] J.-F. Cai, H. Ji, C. Liu, and Z. Shen. Blind motiondeblurring using multiple images. J. Comput. Phys.,228(14):5057–5071, 2009. 2

[4] B. Carignan, J.-F. Daneault, and C. Duval. Quan-tifying the importance of high frequency componentson the amplitude of physiological tremor. Exp. BrainRes., 202(2):299–306, 2010. 1, 3, 5

[5] J. Chen, L. Yuan, C.-K. Tang, and L. Quan. Robustdual motion deblurring. In CVPR, 2008. 2

[6] S. Cho and S. Lee. Fast motion deblurring. ACMTrans. Graph., 28(5):145:1–145:8, 2009. 2

[7] S. Cho, J. Wang, and S. Lee. Vdeo deblurring forhand-held cameras using patch-based synthesis. ACMTrans. Graph., 31(4):64:1–64:9, 2012. 7

[8] M. Delbracio and G. Sapiro. Burst Deblurring: Re-moving Camera Shake Through Fourier Burst Ac-cumulation (Supplementary Material). http://dev.

ipol.im/~mdelbra/fba, 2015. 7

[9] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, andW. T. Freeman. Removing camera shake from a singlephotograph. ACM Trans. Graph., 25(3):787–794, 2006.2

[10] M. A. Fischler and R. C. Bolles. Random sample con-sensus: a paradigm for model fitting with applicationsto image analysis and automated cartography. Comm.ACM, 24(6):381–395, 1981. 5

[11] F. Gavant, L. Alacoque, A. Dupret, and D. David. Aphysiological camera shake model for image stabiliza-tion systems. In IEEE Sensors, 2011. 1, 2, 3, 4, 5

[12] A. Ito, A. C. Sankaranarayanan, A. Veeraraghavan,and R. G. Baraniuk. Blurburst: Removing blur dueto camera shake using multiple images. ACM Trans.Graph., Submitted. 3

[13] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolu-tion using a normalized sparsity measure. In CVPR,2011. 2

[14] D. Kundur and D. Hatzinakos. Blind image deconvo-lution. IEEE Signal Process. Mag., 13(3):43–64, 1996.2

[15] M. Lebrun, A. Buades, and J.-M. Morel. Implementa-tion of the “Non-Local Bayes” Image Denoising Algo-rithm. IPOL, 3:1–42, 2013. 6

[16] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman.Understanding and evaluating blind deconvolution al-gorithms. In CVPR, 2009. 2

[17] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman.Efficient marginal likelihood optimization in blind de-convolution. In CVPR, 2011. 2

[18] M. Levoy. HDR+: Low Light and High DynamicRange photography in the Google Camera App.http://googleresearch.blogspot.fr/2014/10/

hdr-low-light-and-high-dynamic-range.html. 7

[19] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vision, 60:91–110, 2004. 5

[20] T. Michaeli and M. Irani. Blind deblurring using in-ternal patch recurrence. In ECCV, 2014. 2

[21] L. Moisan, P. Moulon, and P. Monasse. AutomaticHomographic Registration of a Pair of Images, withA Contrario Elimination of Outliers. IPOL, 2:56–73,2012. 5

[22] S. H. Park and M. Levoy. Gyro-based multi-imagedeconvolution for removing handshake blur. CVPR,2014. 3, 5, 7

[23] A. Rav-Acha and S. Peleg. Two motion-blurred imagesare better than one. Pattern Recogn. Lett., 26(3):311–317, 2005. 2

[24] Q. Shan, J. Jia, and A. Agarwala. High-quality motiondeblurring from a single image. ACM Trans. Graph.,27(3), 2008. 2

[25] F. Sroubek and P. Milanfar. Robust multichannel blinddeconvolution via fast alternating minimization. IEEETrans. Image Process., 21(4):1687–1700, 2012. 2, 6, 7,8

[26] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform deblurring for shaken images. Int. J. Comput.Vision, 98(2):168–186, 2012. 1

[27] F. Xiao, A. Silverstein, and J. Farrell. Camera-motionand effective spatial resolution. In ICIS, 2006. 1, 3

[28] L. Xu, S. Zheng, and J. Jia. Unnatural l0 sparse rep-resentation for natural image deblurring. In CVPR,2013. 2

[29] L. Yuan, J. Sun, L. Quan, and H.-Y. Shum. Image de-blurring with blurred/noisy image pairs. ACM Trans.Graph., 26(3), 2007. 2

[30] H. Zhang, D. Wipf, and Y. Zhang. Multi-image blinddeblurring using a coupled adaptive sparse prior. InCVPR, 2013. 1, 2, 6, 7, 8

[31] L. Zhang, A. Deshpande, and X. Chen. Denoisingvs. deblurring: Hdr imaging techniques using movingcameras. In CVPR, 2010. 7

[32] X. Zhu, F. Sroubek, and P. Milanfar. Deconvolvingpsfs for a better motion deblurring using multiple im-ages. In ECCV, 2012. 2

[33] B. Zitova and J. Flusser. Image registration meth-ods: a survey. Image Vision Comput., 21(11):977–1000, 2003. 5

http://dev.ipol.im/~mdelbra/fba

http://dev.ipol.im/~mdelbra/fba

http://googleresearch.blogspot.fr/2014/10/hdr-low-light-and-high-dynamic-range.html

http://googleresearch.blogspot.fr/2014/10/hdr-low-light-and-high-dynamic-range.html

Burst Deblurring: Removing Camera Shake Through Fourier Burst...

Documents

Transcript of Burst Deblurring: Removing Camera Shake Through Fourier Burst...