
Mitigating The Effects of Atmospheric Distortion on Video Imagery: A Review

Nantheera Anantrasirichai, Alin Achim, David Bull

October 5, 2011

1 Introduction

Various types of atmospheric distortion can influence the visual quality of a captured video signal. Distortion types include fog or haze, which reduce contrast, and atmospheric turbulence due to temperature variations or airborne contaminants. Atmospheric turbulence is sometimes referred to as scintillation and anisoplanatism. It is a particular problem close to the ground in hot environments and can combine with other effects to detrimental effect in long range surveillance applications. Such obscurations in the lower atmosphere reduce the ability to see objects at a distance. Variations in temperature cause different interference patterns in the refraction of light, leading to unclear, unsharp, wavering images of the objects. Examples of the heat haze effect are found in areas such as hot roads and deserts, as well as in the proximity of aircraft jet engine exhaust streams.

KYLMAR (a GD company) provides a wide range of surveillance cameras that integrate with other sensors, such as thermal imagers. For example, the DCM1200 Medium Range Surveillance Camera offers very high performance with wide-bandwidth zoom capability. The camera provides high-resolution imagery and is primarily intended for human identification at up to 4 km, providing high-contrast, high-clarity imagery [1].

This report reviews the causes of atmospheric turbulence and various advanced solutions (see Figure 1). Existing commercial camera systems and prototypes are also reviewed. The report concludes with recommended approaches to address this problem in the context of the requirements of General Dynamics.

[Figure 1: Image restoration diagram for atmospheric turbulence. Multiple images feed an image selection stage (lucky regions), followed by image alignment and registration (block-based matching, optical flow, non-rigid deformation, DT-CWT method), image fusion (image averaging, pixel-based processing, region-based processing) and image deblurring (blur model, blind deconvolution, super-resolution, DT-CWT method), producing the undistorted image.]


Figure 2: Haze removal using a single image. (a) input haze image. (b) haze-removed image. [7]

2 Atmospheric Turbulence and its Reduction

Since turbulence in the captured images makes it difficult to interpret the information behind the distorted layer, there has been significant research activity aimed at faithfully reconstructing this useful information using various methods. A perfect solution, however, seems impossible since the problem is irreversible, even though the equation representing it can simply be written as Equation 1.

$$I_{obv} = D\, I_{idl} + n \qquad (1)$$

where $I_{obv}$ and $I_{idl}$ are the observed and ideal images respectively. $D$ represents geometric distortion and blur, while $n$ represents noise. Various approaches solve this problem by modelling it as a point spread function (PSF) and then employing deconvolution with an iterative process to estimate $I_{idl}$. For the heat haze case, the PSF is generally unknown, so blind deconvolution is sometimes employed [2–4]. Some techniques, however, correct geometric distortion and blur separately [5, 6].

A second problem that often accompanies turbulence is haze or fog. A single observed image can be enhanced to reduce fog interference, whereas it is accepted that multiple images are required for turbulence reduction. For example, contrast enhancement can be used to remove haze. He et al. [7] applied a dark channel prior to distant areas in the image for single outdoor image haze removal. The dark channel prior is based on the statistics of haze-free outdoor images. They found that, in most local regions which do not cover the sky, pixels often have very low intensity in at least one colour (RGB) channel. The results of this algorithm are shown in Figure 2, revealing significant visual improvement.
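As a concrete illustration, the following Python sketch implements the dark channel prior estimate under common simplifying assumptions: the soft-matting refinement of the transmission map used in the paper is omitted, and the patch size, omega and t0 values are illustrative choices of ours, not values from [7].

```python
# A sketch of the dark channel prior of He et al. [7]; the soft-matting
# refinement is omitted and all parameter values are illustrative.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Per-pixel minimum over the RGB channels, then a local minimum filter."""
    return minimum_filter(img.min(axis=2), size=patch)

def dehaze(img, omega=0.95, t0=0.1, patch=15):
    """img: float RGB image in [0, 1]. Returns an estimate of the haze-free scene."""
    dark = dark_channel(img, patch)
    # Atmospheric light A: mean colour of the brightest 0.1% of dark-channel pixels.
    n = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimate from the dark channel of the normalised image.
    t = np.maximum(1.0 - omega * dark_channel(img / A, patch), t0)
    return np.clip((img - A) / t[..., None] + A, 0.0, 1.0)
```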

Obviously, when using only a single image, it is difficult to remove the visible ripples and waves caused by hot-air turbulence. However, utilising a set of images to construct one enhanced image (reference image) makes more useful information available than a single image can provide. Current multi-frame methods that address the atmospheric turbulence problem are illustrated in Figure 1, involving either all of the functions shown or a subset of them.

The restoration process can be described by two main routes through the diagram. The first (green line) employs an image registration technique with deformation estimation [6, 8–13].


This process tries to align objects temporally to correct for small movements of the camera and temporal variations due to atmospheric refraction. The image fusion block may subsequently be employed in order to combine several aligned images.

The other route (red line) employs image selection and fusion, known as 'lucky region' techniques [14–20]. Regions of the input frames that have the best quality in the temporal direction are selected. These are found using an image quality metric, usually applied in the spatial frequency domain, to extract the best quality (minimally distorted and least blurred) regions, which are then combined in an intelligent way. Recently this method has been improved by applying image alignment to those lucky regions [21]. Both approaches can be refined with a deblurring process (a challenging task, as this blur is space-variant). The registration method is generally time-consuming, while the fusion approach requires a large number of short-exposure frames.

3 Image Selection

As heat haze causes the visible edges of an object to distort over time, multiple images captured with short exposure times are likely to show at least one correct edge. This leads to the technique called 'lucky imaging'. Lucky imaging techniques were originally introduced by Gregory in earth-based astronomical photography to overcome warping and blurring due to the atmosphere [15]. An example in [21] uses a set of images taken at a rate of more than 100 fps; these are ranked using a quality metric and the top N% of images are subsequently combined using a weighted average. The technique has been improved with the use of 'lucky regions' in order to apply it to the heat haze problem in photography taken over a long horizontal path close to the ground, where the distortions are space- and time-variant [16]¹.

A quality metric is normally used to select the contribution of each image to the fusion operation. Traditional metrics use a sharpness measurement, as this determines the amount of detail the image contains. Some examples include the maximal local gradient [22], the maximal local absolute difference (where contrast is also used to average the absolute difference over all edge pixels) [23], and eigenvalue sharpness [11]. Sharpness can also be measured in the frequency domain. The authors in [24] use the ratio between the high and low frequency bands, whilst a bicoherence technique is employed in [16]. In [14] the authors measure local sharpness using a 3 × 3 discrete Laplacian filter.
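A minimal sketch of this style of lucky-frame selection follows, using the mean absolute Laplacian as the sharpness score; the kept fraction and the frame format are our assumptions rather than values from [14] or [21].

```python
# A sketch of lucky-frame selection: rank frames by a Laplacian sharpness
# score and average the sharpest fraction. The keep fraction is an assumption.
import numpy as np
from scipy.ndimage import laplace

def fuse_lucky_frames(frames, keep=0.1):
    """frames: sequence of 2-D float arrays (short-exposure grayscale frames)."""
    scores = [np.abs(laplace(f)).mean() for f in frames]  # mean |Laplacian| = sharpness
    order = np.argsort(scores)[::-1]                      # sharpest frames first
    n = max(1, int(keep * len(frames)))
    return np.mean([frames[i] for i in order[:n]], axis=0)
```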

Alternatively, the quality metric can be developed based on wavefront sensing principles using phase diversity. The technique relies on the through-focus symmetry of the PSF for an unaberrated imaging system. Regions of the scene which are imaged with little or no atmospheric degradation will have the same PSF, whilst regions which are imaged with significant aberration will have different PSFs in the two phase-diverse images. The image quality is then given by Equation 2.

$$Q_k = \frac{2}{\pi}\arctan\left(\frac{\sum_i \left|Z_{k,i} - \frac{1}{2}\left(P_{k,i} + M_{k,i}\right)\right|}{\sum_i \left|P_{k,i} + M_{k,i}\right|}\right) \qquad (2)$$

where $Z_{k,i}$ is the value of the $i$th pixel in the $k$th region, and similarly for the previous frame $P$ and the next frame $M$. $Q$ varies between 0 and 1, with higher values corresponding to better image quality [19].

¹ It has been reported verbally by DSTL that, in 'close to ground' imagery over long distances, as few as 1 in 100,000 frames can be considered lucky.


A distortion compensation algorithm is subsequently applied using an iterative process to improve the reference frame by summing the aligned images. The same research group that produced [19] has developed a real-time (approximately 10 fps) implementation, presented in [20], which applies a phase-diversity approach to an active short-wave infrared BIL (burst-illumination laser, narrow-band active laser) imaging system.
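A direct transcription of Equation 2 in Python might look as follows; the region extraction and the small guard term are our additions, not part of [19].

```python
# Equation 2 transcribed directly; z, p, m hold the pixels of region k in the
# current, previous and next frames. The eps guard is our addition.
import numpy as np

def region_quality(z, p, m, eps=1e-12):
    """Returns Q_k in [0, 1); larger means better according to the text."""
    num = np.abs(z - 0.5 * (p + m)).sum()
    den = np.abs(p + m).sum() + eps   # eps avoids division by zero on blank regions
    return (2.0 / np.pi) * np.arctan(num / den)
```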

4 Image Alignment and Registration

High-speed cameras can be used with short exposure times to freeze moving objects and so minimise distortions associated with motion blur. However, geometric distortion resulting from anisoplanatic tip/tilt wavefront errors will still be present. Straightforward temporal averaging of a number of frames in the sequence can be used. Since atmospheric turbulence can be viewed as being quasi-periodic, averaging a number of frames provides a reference frame that is geometrically improved, but blurred by an unknown PSF of the same size as the pixel motions due to the turbulence. A blind deconvolution algorithm can consequently be used to reduce the blur. This simple approach has been used in a speckle imaging technique [25]. Its main drawbacks are that it requires a large number of frames to overcome the distortion and that the averaging operation makes the deblurring process challenging. Moreover, unlike astronomical imaging, when imaging over a long horizontal path close to the ground, conditions are likely to be anisoplanatic and image dancing will produce local image displacements between consecutive frames rather than global shifts only. Therefore more sophisticated image alignment techniques are required.

4.1 Block-based Matching

Early work employed full-frame alignment, where camera translation and rotation are modelled by orthographic projection [26]. This method does not give impressive results, as the image always combines a variety of depths which have different translations. Also, because isoplanatism only holds locally, image dancing produces local displacements. Hence, a local block-based flow has been proposed to solve this problem. It exploits the sum-of-squared differences (SSD) between each image and the averaged image, and the minimum value is selected from the 5 × 5 neighbourhood around the current pixel [14]. A similar concept, called locally operating Motion-Compensated Averaging (MCA), has been employed. The method utilises Block Matching (BM) in order to identify and re-arrange uniformly displaced image parts. The local MCA procedure has been used in both lucky imaging [21] and registration [9].
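A sketch of this kind of local SSD search, for a single block, is given below; the block size and search radius are illustrative assumptions.

```python
# A sketch of the local SSD block search against the temporally averaged
# reference; block size and search radius are illustrative assumptions.
import numpy as np

def block_match(frame, ref, y, x, block=8, search=5):
    """Returns the (dy, dx) minimising the SSD between the block at (y, x)
    in `frame` and candidate blocks of `ref` within a +/-`search` window."""
    patch = frame[y:y + block, x:x + block]
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > ref.shape[0] or xx + block > ref.shape[1]:
                continue                      # skip candidates falling off the image
            ssd = np.sum((patch - ref[yy:yy + block, xx:xx + block]) ** 2)
            if ssd < best:
                best, best_dy, best_dx = ssd, dy, dx
    return best_dy, best_dx
```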

4.2 Optical Flow

Optical flow is widely exploited in video processing. The Lucas-Kanade and Horn-Schunck methods can be applied to image registration in a coarse-to-fine iterative manner, in such a way that the spatial derivatives are first computed at a coarse scale (a smaller image) in a pyramid. The source image is warped by the computed deformation, and iterative updates are then conducted at successively finer scales [10].
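The following sketch illustrates the coarse-to-fine idea, using OpenCV's pyramidal Farneback estimator as a stand-in for the Lucas-Kanade or Horn-Schunck methods cited above; all parameter values are illustrative.

```python
# A sketch of dense coarse-to-fine registration. OpenCV's pyramidal Farneback
# estimator stands in for Lucas-Kanade/Horn-Schunck; parameters are illustrative.
import cv2
import numpy as np

def warp_to_reference(frame, ref):
    """frame, ref: uint8 grayscale images of equal size."""
    # args: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
    flow = cv2.calcOpticalFlowFarneback(ref, frame, None, 0.5, 4, 21, 3, 5, 1.1, 0)
    h, w = ref.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + flow[..., 0]).astype(np.float32)
    map_y = (ys + flow[..., 1]).astype(np.float32)
    # Sampling `frame` along the flow field pulls it back onto the reference grid.
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
```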


Figure 3: Image reconstruction using 80 frames distorted by real atmospheric turbulence. (a) The observed frame. (b) Lucky region fusion [18]. (c) Image registration [11].

4.3 Registration using Non-rigid Deformation

Image registration spatially transforms one image, referred to as the target or sensed image, to align it with another, referred to as the reference. The term 'non-rigid' refers to the fact that the transformation is capable of local warping. Theoretically the warping process is finished when the iterative process reaches the minimum of a cost function. Assuming an image $I_0$ is warped to an image $I_1$, a general energy formulation can be written as Equation 3.

$$E(\phi) = S(I_0, I_1) + R(\phi) \qquad (3)$$

where $I_0 \circ \phi$ is the warped version of $I_0$, $S$ is a similarity measure between the warped $I_0$ and $I_1$, and $R$ is a regularisation term. The result is found when $E$ is minimised.

In [13], a B-spline based registration algorithm is introduced to estimate the motion field in each observed frame, which can remove geometric deformation with a symmetric constraint ($v_{xy} = -v_{yx}$). Large deformation models, such as diffeomorphisms, are employed in [8]. These were originally used in fluid mechanics and have been applied in medical image processing. The iterative process morphs the image until $\frac{d\phi_t}{dt} = v_t(\phi)$ reaches its minimum, which means that $I_0$ is close to $I_1$. The authors in [8] use the following energy term:

$$v = \arg\min_{v \in V} \frac{1}{2}\int_0^T \left\|v_t\right\|_V^2 \, dt + \frac{C}{2}\left\|I_0 \circ \phi^v_{T,0} - I_T\right\|_2^2 \qquad (4)$$

Image registration is performed for each frame with respect to a reference frame, thereby providing a sequence that is stable and geometrically correct. As the distortion model is unknown, an iterative process must be used so that the corrected frame improves as the differences between frames decrease. To initialise the reference frame, the current frame or the fused image from a set of neighbouring frames (using the techniques from Section 5) may be used, as in the sketch below. A comparison between the results of lucky region fusion and image registration is shown in Figure 3.
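A high-level sketch of that loop is given below; `register_nonrigid` is a hypothetical placeholder for any of the non-rigid methods discussed in this section, not a library call.

```python
# A high-level sketch of the iterative scheme described above. register_nonrigid
# is a hypothetical placeholder for any non-rigid estimator (B-spline,
# diffeomorphic, DT-CWT based); it is not a library call.
import numpy as np

def build_reference(frames, register_nonrigid, n_iter=5):
    """frames: list of 2-D arrays. Returns a stabilised, geometrically improved frame."""
    ref = np.mean(frames, axis=0)          # initialise with a plain temporal average
    for _ in range(n_iter):
        aligned = [register_nonrigid(f, ref) for f in frames]  # warp each frame onto ref
        ref = np.mean(aligned, axis=0)     # re-fuse; differences shrink each pass
    return ref
```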

The limitation of this approach is that non-rigid registration works well only when the geometric distortion can be adjusted by all the control points in the grid. However, imaging through hot-air turbulence contains both large-area distortion (perceived as waving) and small disturbances (perceived as jittering). If non-rigid registration had to be used to compensate for the small disturbances, the number of control points would be huge, making the computation impractical.


4.4 Registration using DT-CWT

The Dual-Tree Complex Wavelet Transform (DT-CWT) introduced by Kingsbury [27] is a form of discrete wavelet transform which generates complex coefficients using a dual tree of wavelet filters to obtain their real and imaginary parts. Although its redundancy factor of two costs extra computation, it provides extra information for analysis while still allowing perfect reconstruction of the signal. The DT-CWT overcomes some of the problems of traditional discrete wavelet transforms in image processing, as it provides near shift-invariance and directional selectivity [28].

Registration of non-rigid bodies using the phase-shift properties of the DT-CWT was proposed in [29]. The algorithm is developed from the ideas of phase-based multidimensional volume registration [30], which is robust to noise and temporal intensity variations. Motion estimation is performed iteratively, firstly by using coarser-level complex coefficients to determine large motion components and then by employing finer-level coefficients to refine the motion field. The motion is described locally by a parametric affine model. The cost function $\varepsilon(\mathbf{x})$ is defined as follows:

$$\varepsilon(\mathbf{x}) = \sum_{d=1}^{N} \left\| \mathbf{c}_d^T(\mathbf{x})\, \bar{K}(\mathbf{x})\, \bar{\mathbf{a}} \right\|^2 \qquad (5)$$

where $N$ is the number of subband directions, $\bar{K}(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x}) & 0 \\ 0 & 1 \end{bmatrix}$ and $\bar{\mathbf{a}} = \begin{bmatrix} \mathbf{a} \\ 1 \end{bmatrix}$. $K(\mathbf{x})\mathbf{a}$ is the affine model equation. The simplest case uses $N = 6$ directions, so the affine parameters are $\mathbf{a} = [a_1 \ldots a_6]^T$ and $K(\mathbf{x}) = \begin{bmatrix} 1 & 0 & x & 0 & y & 0 \\ 0 & 1 & 0 & x & 0 & y \end{bmatrix}$. $\mathbf{c}_d(\mathbf{x})$ is the motion constraint vector defined in Equation 6.

$$\mathbf{c}_d(\mathbf{x}) = \frac{\left|\sum_{k=1}^{4} u_k^* v_k\right|^2}{\sum_{k=1}^{4}\left(|u_k|^3 + |v_k|^3\right) + \eta} \begin{bmatrix} \nabla_{\mathbf{x}}\theta_d \\ -\frac{\partial \theta_d}{\partial t} \end{bmatrix} \qquad (6)$$

where $u_k$ and $v_k$ are the wavelet coefficients in the reference frame and the current frame, respectively, and the subscript $k = 1 \ldots 4$ denotes the 4 neighbouring wavelet coefficients centred at location $\mathbf{x}$ in subband $d$. The term $\nabla_{\mathbf{x}}\theta_d = \left[\frac{\partial\theta_d}{\partial x} \;\; \frac{\partial\theta_d}{\partial y}\right]^T$ represents the phase gradient at $\mathbf{x}$ for subband $d$ in the $x$ and $y$ directions, while $\frac{\partial\theta_d}{\partial t}$ is the phase gradient between the reference frame and the current frame. The small positive constant $\eta$ prevents the denominator from going to zero.
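To illustrate how Equation 5 is minimised, the sketch below solves for the six affine parameters by linear least squares. It assumes the subband phase gradients, temporal phase differences and confidence weights have already been extracted from the DT-CWT coefficients of the two frames (the hard part, not shown), and is not the implementation of [29].

```python
# A sketch of minimising Equation 5 by linear least squares for one region;
# inputs are assumed to be precomputed from DT-CWT coefficients.
import numpy as np

def solve_affine(points, grads, dthetas, weights):
    """points: (P, 2) pixel coordinates (x, y); grads: (P, D, 2) spatial phase
    gradients; dthetas: (P, D) temporal phase differences; weights: (P, D)
    confidence weights (the scalar factor in Equation 6). Returns a, shape (6,)."""
    rows, rhs = [], []
    for (x, y), g, dt, w in zip(points, grads, dthetas, weights):
        K = np.array([[1, 0, x, 0, y, 0],
                      [0, 1, 0, x, 0, y]], dtype=float)  # K(x) from the text
        for d in range(g.shape[0]):
            rows.append(w[d] * (g[d] @ K))  # w * grad(theta)^T K(x): one 1x6 row
            rhs.append(w[d] * dt[d])        # w * dtheta/dt
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return a
```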

5 Image Fusion

One of the most popular frame-based fusion techniques is temporal rank filtering, as it usually gives better results when the data contain outliers [8]. The spatial neighbours around the current pixel are also used to find the average value in [12], with a pre-defined threshold to exclude the outliers. An example result of a temporal median filter applied to 117 frames of a distorted video is shown in Figure 4. It should be noted that the observed frame (a) appears sharper than the filtered frame, but in the video (b) is stable whereas (a) shimmers. In an offline analysis a viewer would clearly prefer (a).
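The frame-based temporal median is trivial to express; a two-line numpy sketch (assuming the frames are already registered and stacked) is:

```python
# The frame-based temporal median in numpy; frames are assumed registered.
import numpy as np

def temporal_median(frames):
    """frames: (T, H, W) stack. The median along time resists outliers."""
    return np.median(np.asarray(frames), axis=0)
```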


Figure 4: Temporal median filtering applied to the stable scene. (a) An observed frame. (b) The corresponding stable frame [31].

5.1 Pixel-based processing

A pixel-based weighted average is widely used for fusion, as it offers better results than the frame-based method. The simplest technique sets a threshold: pixels that exceed the given threshold are included in the set used to construct the average frame [12]. These methods can be improved with a window-based weighted average. In [6] the authors modified the overlap-add (OLA) method introduced in [32] to create a space-variant PSF. The image is divided into overlapping patches and the borders of each patch are damped with a windowing function. Each patch is subsequently processed with its individual FFT filter. It is claimed in [13] that this method cannot remove diffraction-limited blur, since the local deconvolution does not use prior knowledge of the ideal sharp image. They therefore propose a near-diffraction-limited image which is reconstructed using $L \times L$ overlapping patches centred at each pixel to calculate a local sharpness. Subsequently the value that maximises sharpness is restored using temporal kernel regression.
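A sketch of such a windowed overlap-add scheme is shown below. For brevity a single global frequency-domain filter H is applied to every patch, whereas the space-variant methods above would assign each patch its own filter; the patch size and 50% overlap are assumptions, and borders not covered by a full patch are left at zero.

```python
# A sketch of windowed overlap-add filtering with a shared per-patch filter H.
import numpy as np

def overlap_add_filter(img, H, patch=64):
    """img: 2-D float array; H: (patch, patch) frequency-domain filter."""
    step = patch // 2
    win = np.hanning(patch)[:, None] * np.hanning(patch)[None, :]  # damping window
    out = np.zeros_like(img)
    norm = np.zeros_like(img)
    for y in range(0, img.shape[0] - patch + 1, step):
        for x in range(0, img.shape[1] - patch + 1, step):
            block = img[y:y + patch, x:x + patch] * win
            filt = np.real(np.fft.ifft2(np.fft.fft2(block) * H))   # per-patch FFT filter
            out[y:y + patch, x:x + patch] += filt * win
            norm[y:y + patch, x:x + patch] += win ** 2
    return out / np.maximum(norm, 1e-8)   # undo the accumulated window weighting
```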

5.2 Region-based processing

A region-based scheme performs fusion at the feature level. The process begins with image segmentation to produce a set of homogeneous regions. Various properties of these regions can be calculated and used to determine which features from which images are included in the fused image. This has advantages over pixel-based processing, as more intelligent semantic fusion rules can be considered based on actual features in the image, rather than on single or arbitrary groups of pixels. Lewis et al. [33] introduced region-based image fusion using complex wavelets, built on previous work in [34] showing that fusion using complex wavelets outperforms fusion using traditional discrete wavelets. The authors employ an adapted version of combined morphological-spectral unsupervised image segmentation [35] and multiscale watershed segmentation [36] to divide the image into regions. Subsequently a priority is assigned to each region using a simple activity measure which takes the absolute values of the wavelet coefficients. The fused image is then constructed based on this priority map. The fusion result for a multifocus imaging application is illustrated in Figure 5.
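A heavily simplified sketch of the idea follows: given a shared segmentation, each region is taken from whichever aligned input shows more activity. The mean absolute Laplacian here stands in for the wavelet-coefficient activity measure of [33], and the segmentation is assumed to be given.

```python
# A simplified sketch of region-based fusion; Laplacian activity stands in
# for the wavelet activity of [33], and `labels` is a given segmentation.
import numpy as np
from scipy.ndimage import laplace

def region_fuse(img_a, img_b, labels):
    """img_a, img_b: aligned 2-D floats; labels: integer segmentation map."""
    act_a, act_b = np.abs(laplace(img_a)), np.abs(laplace(img_b))
    fused = img_a.copy()
    for r in np.unique(labels):
        mask = labels == r
        if act_b[mask].mean() > act_a[mask].mean():  # higher priority wins the region
            fused[mask] = img_b[mask]
    return fused
```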


Figure 5: Region-based fusion. Left: Input Image A. Middle: Input Image B. Right: Fused Image. [33]

6 Image Deblurring

The final step of image reconstruction is image deblurring or sharpening. Here four key approaches are discussed in detail: deconvolution from a blur model, blind deconvolution, super-resolution techniques, and deblurring using the DT-CWT.

6.1 Deconvolution from Blur Model

In the simplest case it can be assumed that the blur is global and known, and a deconvolution process can therefore be applied straightforwardly. Several blur models for atmospheric turbulence suppression have been pre-defined, based on statistics studied by various researchers [10, 12]. In an astronomical imaging system this information can be gained from a wavefront sensor (WFS), which measures the distortion of the optical wavefront arriving at the telescope aperture. Performance for the case of terrain-generated heat haze can be found in [37]. However, experimental studies in [38] show that the choice of kernel has an insignificant effect on the accuracy of estimation, and therefore preference is given to differentiable kernels with low computational complexity, such as the Gaussian kernel [5]. One specified blur model was proposed by Hufnagel and Stanley [12]:

$$H(u, v) = e^{-\lambda\left(u^2 + v^2\right)^{5/6}} \qquad (7)$$

where $H$ represents the blur in the frequency domain and $\lambda$ is estimated to match the blur present in the image (selected manually in the paper). Models based on the modulation transfer function (MTF) arising from different sources are proposed in [5]. The MTF due to forward scattering by aerosols and the MTF due to turbulence are shown in Equation 8 and Equation 9, respectively:

$$\mathrm{MTF}_a = e^{-\left(f/f_c\right)^2 \tau_S}, \quad f \le f_c \qquad (8)$$

$$\mathrm{MTF}_t = e^{-57.3\, f^{5/3} \lambda^{-1/3} C_n^2\, r} \qquad (9)$$


Figure 6: Deblurring result of the Wiener restoration filter with Hufnagel and Stanley's blur function [12]

$$C_n^2 = \left(79 \times 10^{-6}\, \frac{P}{T^2}\right)^2 C_T^2 \qquad (10)$$

where $f_c$ is the cutoff frequency and $\tau_S$ represents the optical depth. $C_n^2$ and $C_T^2$ are the refractive index structure function and the temperature structure function respectively, and $P$ and $T$ are pressure and temperature respectively.

After obtaining the blur model, Wiener deconvolution can be used [12] (note that Wiener deconvolution is also applied within blind deconvolution under some constraints [2]). The Wiener filter is applied by calculating the 2-D Fast Fourier Transform (FFT) of the input image, multiplying it by the Wiener filter and then taking the inverse FFT of the result, as sketched below. An example result is illustrated in Figure 6. Alternatively, an iterative deconvolution process, such as Richardson-Lucy, is widely employed. This converges to the maximum likelihood solution for Poisson statistics in the data. The method in [5] claims that only 3–5 iterations are required to achieve convergence. However, the ill-posedness of the deconvolution problem makes it impossible to obtain perfect solutions. A spatially-variant solution should therefore be considered.
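A sketch combining Equation 7 with Wiener filtering follows; the blur strength lam and the noise-to-signal constant k are tuning parameters that must be matched to the data, as the text notes.

```python
# A sketch of Wiener deconvolution with the blur model of Equation 7;
# lam and k are illustrative tuning parameters.
import numpy as np

def hufnagel_stanley(shape, lam):
    """H(u, v) = exp(-lam (u^2 + v^2)^(5/6)) on the FFT frequency grid."""
    u = np.fft.fftfreq(shape[0])[:, None]
    v = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-lam * (u ** 2 + v ** 2) ** (5.0 / 6.0))

def wiener_deblur(img, lam=5.0, k=1e-3):
    H = hufnagel_stanley(img.shape, lam)
    W = np.conj(H) / (np.abs(H) ** 2 + k)   # Wiener filter; k absorbs the noise term
    return np.real(np.fft.ifft2(np.fft.fft2(img) * W))
```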

6.2 Blind Deconvolution

Classical deconvolution does not give impressive image restoration results when applied to short-exposure images, whose distortions are not spatially invariant. The method relies on an estimate of the PSF, usually obtained from a theoretical model or an auxiliary measurement. When such data are not available, the problem becomes one of blind deconvolution. Blind deconvolution is generally exploited to achieve image restoration, image enhancement, or both. The goal of restoration is to accomplish an accurate depiction of the scene being imaged, while enhancement aims to create the most visually appealing image (e.g. by removing noise).

Blind deconvolution estimates the PSF from the image (or images) itself and generally operates iteratively [39]. In each iteration the image is improved at the same time as the PSF estimate. Referring to Equation 1, only $I_{obv}$ is known, so $I_{idl}$ must be predicted using a generic measure of the distance between the two. A simple but effective method for solving this inverse problem is least squares. Let $x$ and $y$ be vectors containing $I_{idl}$ and $I_{obv}$ respectively, and let $H$ be a square, block-circulant matrix representing the blurring function. Using the $L_2$ norm, $x$ is estimated by minimising $\|Hx - y\|^2$ with a non-negativity constraint, as in Equation 11; the regularisation term $R$ and its weight $\lambda$ control the smoothness.


Figure 7: Chimney and books sequences. (a)-(b) observed frames. (c) reconstructed frame. [6]

$$\hat{x} = \arg\min_{x \ge 0} \|Hx - y\|^2 + \lambda R \qquad (11)$$

Generally, in the first iteration $x$ is initialised with the first observation $y^{(0)}$. Even though blind deconvolution can theoretically solve problems with unknown prior information, the initial blur function $H$ is important, as it must ensure that the iteration does not converge to a wrong solution. Additionally, an accurate initial blur function $H$ reduces the computation time of the process by requiring fewer iterations. That is, the key to this method is the application of a priori knowledge about the nature of the degradations and the images. The simplest choice is a Gaussian function, which seems to work well with long-exposure images [11], but for short-exposure images the function should be considered locally.

When multiple frames are used, the blur function can be estimated by least squares using all $N$ available observed images $y_n$, $0 < n \le N$ [4]:

$$H^{(i)} = \arg\min_{H \ge 0} \left\|H x^{(i)} - y_n\right\|^2 \qquad (12)$$
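The sketch below illustrates the alternating spirit of Equations 11 and 12, with simple projected gradient steps on the image and the PSF in turn. The flat PSF initialisation, step sizes and iteration count are our assumptions and typically need tuning; this is not the algorithm of [4] or [6].

```python
# A sketch of alternating blind deconvolution: projected gradient steps on x
# and h in turn. All hyperparameters are illustrative assumptions.
import numpy as np
from scipy.signal import fftconvolve

def blind_deconv(y, psf_size=15, n_iter=50, lr_x=0.5, lr_h=0.5):
    x = y.copy()                                           # initialise x with the observation
    h = np.full((psf_size, psf_size), 1.0 / psf_size ** 2) # flat initial PSF
    for _ in range(n_iter):
        r = fftconvolve(x, h, mode='same') - y             # residual Hx - y
        # Gradient w.r.t. x: correlate the residual with the (flipped) kernel.
        x = np.maximum(x - lr_x * fftconvolve(r, h[::-1, ::-1], mode='same'), 0)
        # Gradient w.r.t. h: correlate the residual with x, cropped to the support.
        g = fftconvolve(r, x[::-1, ::-1], mode='same')
        cy, cx = (g.shape[0] - psf_size) // 2, (g.shape[1] - psf_size) // 2
        h = np.maximum(h - lr_h * g[cy:cy + psf_size, cx:cx + psf_size], 0)
        h /= max(h.sum(), 1e-12)                           # keep the PSF non-negative, unit sum
    return x, h
```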

Blind deconvolution has mainly been applied to images where the blur function is space-invariant; however, if artefacts (e.g. edge ringing) occur, they are not isolated events within the image. Hirsch et al. have introduced a space-variant blind deconvolution [6]. This method divides each frame into overlapping patches. Because these patches are small, they can be viewed as isoplanatic regions, i.e. small regions containing space-invariant blur, and each can be processed with its individual FFT filter. The blind deconvolution algorithm thus estimates the PSF for each patch along with the latent image content. The final output image is produced by fusing the deconvolved patches. Example results are shown in Figure 7.

Maximum likelihood estimation is also employed to solve the blind deconvolution problem [3, 39]; however, its computational complexity is generally high. Figure 8 shows the improvement in sharpness after applying blind deconvolution.


Figure 8: Blind deconvolution applied to Figure 3 (c) (left), resulting in a sharper image (right)

6.3 Super-Resolution Methods

Alternative image deblurring techniques have been reported using image super-resolution. Here, super-resolution is defined formally as the removal of blur caused by a diffraction-limited optical system, along with meaningful recovery of the object's spatial-frequency components outside the optical system's passband. Discrete sinc-interpolation is employed in [31, 40] for image upsampling. If several pixels from different frames are to be placed in the same location in the output-enhanced frame, the median of those pixels is computed to avoid the influence of outliers. The refinement process operates iteratively using the discrete Fourier transform (DFT) in order to reduce aliasing effects. Finally the result is down-sampled to the original resolution.
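Discrete sinc-interpolation is equivalent to zero-padding the DFT spectrum; a sketch for integer-factor upsampling of a single image is given below.

```python
# Discrete sinc-interpolation as DFT zero-padding: pad the centred spectrum
# with zeros and invert. Integer upsampling factors only in this sketch.
import numpy as np

def sinc_upsample(img, factor=2):
    H, W = img.shape
    F = np.fft.fftshift(np.fft.fft2(img))
    ph, pw = (factor - 1) * H, (factor - 1) * W
    Fp = np.pad(F, ((ph // 2, ph - ph // 2), (pw // 2, pw - pw // 2)))
    # The factor**2 rescaling compensates for the ifft2 normalisation.
    return np.real(np.fft.ifft2(np.fft.ifftshift(Fp))) * factor ** 2
```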

6.4 Deblurring using the DT-CWT

Wavelet regularisation is an efficient technique for deconvolution, as it provides both deblurring and noise removal. However, conventional wavelet transforms produce artefacts since the transform is not shift-invariant. To avoid these artefacts, the resulting image may be averaged over all possible integer translations, or a shift-invariant transform may be exploited. The complex wavelet approach has therefore been considered, as it achieves translation invariance without excessive computational overhead. In addition, it provides better restoration by separating the image into 6 directional subbands, whilst real separable wavelets allow only two directions. An efficient approach using the DT-CWT is proposed in [41]. The process starts by applying a Wiener filter to the blurred image. The cost function in terms of wavelet coefficients using MAP estimation can be written as:

$$J(w) = \frac{1}{2}\left\|HMw - y\right\|^2 + \frac{1}{2}\, w^T A w \qquad (13)$$

where $W$ and $M$ represent the forward and inverse DT-CWT, and the column vector $w$ contains the wavelet coefficients of the transform of $x$. The authors arrange the entries of $w$ such that $w_{2k-1} = \mathrm{Re}(w_k)$ and $w_{2k} = \mathrm{Im}(w_k)$. $A$ is a diagonal matrix with $A_{ii} = \sigma^2/\mathrm{var}(w_i)$. The deblurred result for the Cameraman image with 40 dB additive white noise is illustrated in Figure 9.

Applying a deblurring algorithm might accentuate certain artefacts, particularly if the distorted video is compressed. Figure 10 shows how blocking artefacts are affected by the deblurring process using the DT-CWT. This problem should be considered in future work.


Figure 9: Deblurring using DT-CWT. (a) Blurred image. (b) Initial image using Wiener filter. (c) Deblurred image after 30 iterations. [41]

Figure 10: Left: the average of registered images. Right: Deblurred image using DT-CWT

7 Moving Object Detection

Reducing heat haze effects in a video requires an additional motion model when objects in the scene are moving themselves, rather than because of camera shake or atmospheric turbulence. In [12], a threshold is set and areas having motion vectors larger than the threshold are defined as true motion. If the absolute difference in intensity between the current point and the corresponding point in the reference frame is larger than a threshold, the current point is ignored in the heat haze reduction process, as it is likely to correspond to true object motion.
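A sketch of such a threshold test using dense optical flow follows; the flow estimator and the threshold value are our assumptions.

```python
# A sketch of the threshold test: flow vectors whose magnitude exceeds a
# threshold are treated as true object motion and masked out of the averaging.
import cv2
import numpy as np

def true_motion_mask(ref, frame, thresh=2.0):
    """ref, frame: uint8 grayscale. Returns a boolean mask of likely real motion."""
    flow = cv2.calcOpticalFlowFarneback(ref, frame, None, 0.5, 3, 15, 3, 5, 1.1, 0)
    mag = np.hypot(flow[..., 0], flow[..., 1])
    return mag > thresh   # True where displacement is too large to be haze ripple
```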

True motion trajectory estimation is proposed in [10]. The authors construct a reference frame based on smoothing trajectory centroids in adaptive temporal windows. An intensity threshold is determined by exploiting the observer's limited ability to distinguish between close grey levels. Motion vector magnitude and motion vector angular variance are exploited in [31, 40]. These two values can distinguish real object motion from the moving edges caused by atmospheric turbulence, as illustrated in Figure 11. Although not applied to turbulence reduction, other methods have been presented in the context of super-resolution processing. This is an unsolved and difficult problem, and a topic for future research.

Complete motion can be written as $m_{all} = m_{true} + m_{distort}$, where $m_{distort}$ is the motion offset caused by atmospheric turbulence. Optical flow may be used to find the approximate motion trajectory $m_{true}$. When these displacements are cancelled to make the motion trajectory smooth, the estimated kernel can be applied across neighbouring frames. A steering kernel regression without explicit motion estimation, proposed in [42], shows promising results and could possibly be used here; an example is shown in Figure 12. The weights provided by the steering kernel function capture the local signal structures, which include both spatial and temporal edges. With this approach, not only can the heat haze turbulence be removed, but the problems due to an unstable camera, panning and zooming can also be solved.


Figure 11: Magnitude-driven mask (left) and angle-driven mask (right) used in [31] to classify true motions and image waving from heat haze.

Figure 12: Steering kernel visualisation examples for (a) the case of one horizontal edge moving up (this creates a tilted plane in a local cubicle); (b)-(c) cross sections and the isosurface of the weights given by the steering kernel function when denoising the sample located at the centre of the data cube of (a). [42]


The authors are aware of further, as yet unpublished, work by Kingsbury; this will be evaluated in this study.

8 Quality Assessment

Image quality assessment measures perceived image degradation (typically compared to an ideal or perfect image). Imaging systems may introduce distortion or artefacts into the signal, so quality assessment is an important issue both for assessing the performance of individual systems and for differentiating between alternative solutions to the same problem. Two approaches to quality assessment exist: subjective and objective. Subjective assessment captures the visual perception of human subjects directly through human trials and provides a benchmark for objective analysis. Objective assessments are mathematical models that approximate the results of subjective quality assessment, but are based on criteria and metrics that can be measured readily and evaluated automatically without the need for human trials.

Objective image quality metrics can be classified according to the availability of an original (distortion-free) image with which the distorted image is to be compared. Most existing approaches are known as full-reference, meaning that a complete reference image is available. In many practical applications a reference image is not available, and a no-reference or 'blind' quality assessment approach is desirable. In a third approach, the reference image is only partially available, in the form of a set of extracted features made available as side information to help evaluate the quality of the distorted image. This is referred to as reduced-reference quality assessment.

8.1 Subjective testing

Subjective quality assessment is based on human perception and evaluated through human trials. The basic method, used for decades, is the Mean Opinion Score (MOS). This provides a numerical indication of the perceived quality of received media after compression/transmission/restoration, from the users' perspective. The assessment evaluates image quality by asking non-expert observers to view the images. Quality is continuously scaled from 0 to 100 (or 10) against the labels Bad, Poor, Fair, Good and Excellent [43]. This approach is obviously direct and meaningful, but resource intensive. Various methodologies exist for conducting subjective trials, including single stimulus, double stimulus and triple stimulus methods [44].

8.2 Objective quality metrics

When reference images or video (undistorted images) are available, the simplest and most widely used full-reference quality metric is the mean squared error (MSE), calculated by averaging the squared intensity differences between distorted and reference image pixels. Higher values of MSE correspond to lower image quality and lower values imply better image quality. A related quantity is the peak signal-to-noise ratio (PSNR), defined as $\mathrm{PSNR} = 10 \log_{10}\left[M^2/\mathrm{MSE}\right]$, where $M$ denotes the maximum possible value of any pixel; for example, for 8-bit monochrome images $M = 255$. The PSNR is measured in decibels (dB), and higher values should correspond to better image quality and vice versa. These distortion-based metrics are straightforward to implement; however, they are mathematical difference models rather than perceptually-based metrics, and examples exist where PSNR does not correlate well with subjective assessment. A number of quality metrics have recently been proposed with the aim of providing better correlation with subjective scores.

Structural information is an important factor in human visual perception. Wang et al. developed an image quality assessment approach based on the structural similarity (SSIM) between a reference and a distorted image [45]. This metric was tested on the JPEG and JPEG2000 image databases and shows significantly improved correlation with subjective mean opinion scores compared with PSNR. A version for video, MSSIM, has also been proposed.
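For reference, both metrics are a few lines in Python; scikit-image's SSIM implementation is used here for brevity, and 8-bit grayscale inputs are assumed.

```python
# PSNR as defined above, plus SSIM via scikit-image; 8-bit inputs assumed.
import numpy as np
from skimage.metrics import structural_similarity

def psnr(ref, img, peak=255.0):
    mse = np.mean((ref.astype(float) - img.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)   # in dB; higher should mean better

def quality(ref, img):
    return psnr(ref, img), structural_similarity(ref, img, data_range=255)
```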

Motion information is also an essential component of human perception. Seshadrinathan et al. proposed a Motion Tuned Spatio-Temporal Quality Assessment (MOVIE) method for natural videos [46]. Test and reference material are decomposed by a family of Gabor filters, and the quality index consists of a spatial quality, based on SSIM, and a temporal quality, computed mainly from motion information. On the VQEG FRTV Phase 1 Database [47], MOVIE provides competitive performance, including higher correlation and fewer outliers compared with PSNR and SSIM. The drawback of MOVIE is its complexity, as a large number of filters is used to generate the subband coefficients.


Mutual information is a quantity that measures the mutual dependence of two sets of data, so it can be used for image quality measurement. In [?], the reference image is modelled as the output of a stochastic 'natural' source that passes through the HVS channel. The information content of the reference image is quantified as the mutual information between the input and output of the HVS channel. The same process is applied to the distorted image, and the two information measures are combined to form a visual information fidelity (VIF) measure that relates visual quality to relative image information.

Recently Zhang and Bull have introduced a novel artefact-based video metric (AVM) [48]. The metric contains three independent measurements, namely blurring estimation, similarity estimation and edge artefact detection. The method is fast and suitable for assessing synthesis-based video coding, and gives performance comparable to SSIM for non-synthesised content.

8.3 No-reference image quality assessment

In the past, a great deal of effort has been made to develop objective image/video quality metrics that incorporate perceptual quality measures by considering Human Visual System (HVS) characteristics. Most of the proposed image quality assessment approaches require the original image as a reference. When a reference is not available, as in the case of heat haze reduction, quality assessment becomes challenging. Methods cited in the literature are usually based on prior knowledge of the distortion characteristics.

Blind image quality measurement was first introduced using blur measurement to assess optics and sensor noise in capture or display devices. Later it was used in video coding and transmission; in the latter case, the metrics were derived from blocking artefacts for DCT-based coding [49] and ringing artefacts for wavelet-based coding [50]. Li introduced an algorithm that aims to blindly measure several distortions present in an image: global blur (based on assumed Gaussian blurring of step edges), additive white and impulse noise (based on local smoothness violation), blocking artefacts (based on simple block boundary detection), and ringing artefacts (based on anisotropic diffusion) [51].

8.4 Task based assessment

Objective metrics are required that reflect the difficulty of performing certain tasks, and these will need to be correlated with subjective assessment. Thus the data sets will need to include objects or scenarios so that appropriate questions can be asked, for example: i) can you recognise one of the faces?, ii) is there a weapon?, and iii) can you read the text? Johnson's criteria are widely used in military surveillance and define the number of pixels required across a target for a 50% probability of an observer performing the task:

(1) Detection (an object is present): 2 +1/-0.5 pixels
(2) Orientation (symmetrical, asymmetric, horizontal or vertical): 2.8 +0.8/-0.4 pixels
(3) Recognition (the type of object can be discerned, e.g. a person vs. a car): 8 +1.6/-0.4 pixels
(4) Identification (a specific object can be discerned, e.g. a woman vs. a man, a specific car): 12.8 +3.2/-2.8 pixels


Figure 13: Principle of heat haze reduction using micro-lens sensor and lucky region method [53]

9 Heat Haze Reduction in Cameras

9.1 DARPA’s SRVS

DARPA has developed a new type of binocular which penetrates heat haze. It exploits the shimmering distortion to magnify distant objects, significantly extending target recognition and identification range. The Super-Resolution Vision System (SRVS) exploits an 'atmospheric turbulence-generated micro-lensing phenomenon', which acts as a lens, sporadically providing a better view of what is going on behind the haze [52].

Digital technology is employed to identify the 'lucky regions' or 'lucky frames' where a clear view appears, and to assemble them into a complete picture, as depicted in Figure 13. However, one disadvantage is that, since the technique relies on combining a number of images, real-time execution is difficult: the best the approximately 1.8 kg, 35 cm prototype achieves is approximately one image per second.

9.2 PENTAX PAIR 02

The PENTAX Imaging Company has released a new zoom lens model, called PAIR 02, which is claimed to be the industry's first device with a heat haze reduction function [54]. An example zoomed image is shown in Figure 14. Note that the image improves about half a second after the heat haze reduction is switched on. The fog reduction system appears to work well. Figure 15 shows four frames of a video sequence with the heat haze reduction function off (left images) and on (right images). The two frames on the right are more stable (fewer ripples) than the two on the left. However, the images are still not very sharp and the number plate is hardly, if at all, improved. Note that there are no reports of its performance with moving cameras or objects.

Figure 14: Pentax Atmospheric Interference Reduction (PAIR) technology. (a) Reduction function off. (b) Fog reduction. (c) Fog reduction and heat haze reduction. (www.pentaxcctvus.com/PairVideo/index.html)

Figure 15: Pentax Atmospheric Interference Reduction (PAIR) technology. (a) Reduction function off. (b) Heat haze reduction on. (www.pentaxcctvus.com/PairVideo/index.html)

References

[1] Kylmar. Advanced electro-optical systems. http://www.kylmar.co.uk/, 2011.

[2] B.L.K. Davey, R.G. Lane, and R.H.Y. Bates. Blind deconvolution of noisy complex-valued image. Optics Communications, 69:353–356, 1989.

[3] Edmund Y. Lam and Joseph W. Goodman. Iterative statistical approach to blind image deconvolution. J. Opt. Soc. Am. A, 17(7):1177–1184, July 2000.

[4] S. Harmeling, M. Hirsch, S. Sra, and B. Scholkopf. Online blind image deconvolution for astronomy. In Proc. of IEEE Conference on Computational Photography, 2009.

[5] B. Tedla, S.D. Cabrera, and N.J. Parks. Analysis and restoration of desert/urban scenes degraded by the atmosphere. In Image Analysis and Interpretation, 2004. 6th IEEE Southwest Symposium on, pages 11–15, March 2004.

[6] M. Hirsch, S. Sra, B. Scholkopf, and S. Harmeling. Efficient filter flow for space-variant multiframe blind deconvolution. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 607–614, June 2010.

[7] Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1956–1963, June 2009.

[8] Jerome Gilles, Tristan Dagobert, and Carlo Franchis. Atmospheric turbulence restoration by diffeomorphic image registration and blind deconvolution. In Proceedings of the 10th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS '08, pages 400–409, Berlin, Heidelberg, 2008. Springer-Verlag.

[9] Claudia S. Huebner. Compensating image degradation due to atmospheric turbulence in anisoplanatic conditions. In Proceedings of Mobile Multimedia/Image Processing, Security, and Applications, 2009.

[10] Dalong Li. Suppressing atmospheric turbulent motion in video through trajectory smoothing. Signal Processing, 89(4):649–655, 2009.

[11] X. Zhu and P. Milanfar. Image reconstruction from videos distorted by atmospheric turbulence. SPIE Electronic Imaging, Conference on Visual Information Processing and Communication, 2010.

[12] J.P. Delport. Scintillation mitigation for long-range surveillance video. Science Real and Relevant Conference, 2010.

[13] X. Zhu and P. Milanfar. Removing atmospheric turbulence. Submitted to IEEE Trans. on Pattern Analysis and Machine Intelligence.

[14] N. Joshi and M.F. Cohen. Seeing Mt. Rainier: Lucky imaging for multi-image denoising, sharpening, and haze removal. In Computational Photography (ICCP), 2010 IEEE International Conference on, pages 1–8, March 2010.

[15] R.L. Gregory. A technique of minimizing the effects of atmospheric disturbance on photographic telescopes. Nature, 203, 1964.

[16] Z. Wen, D. Fraser, and A. Lambert. Bicoherence used to predict lucky regions in turbulence affected surveillance. In Video and Signal Based Surveillance, 2006. AVSS '06. IEEE International Conference on, page 108, November 2006.

[17] S. Woods, J.G. Burnett, and A.M. Scott. Efficient technique for lucky frame selection using phase diversity images. In EMRC DTC Technical Conference, 2007.

[18] M. Aubailly, M. Vorontsov, G. Carhat, and M. Valley. Automated video enhancement from a stream of atmospherically-distorted images: the lucky-region fusion approach. SPIE, 7463, 2009.

[19] S. Woods, P. Kent, and J.G. Burnett. Lucky imaging using phase diversity image quality metric. In EMRC DTC Technical Conference, 2009.

[20] P.J. Kent, S.B. Foulkes, J.G. Burnett, S.C. Woods, and A.J. Turner. Progress towards a real-time active lucky imaging system. In EMRC DTC Technical Conference, 2010.

[21] C.S. Huebner and C. Scheifling. Software-based mitigation of image degradation due to atmospheric turbulence. In SPIE Conference on Optics in Atmospheric Propagation and Adaptive Systems, 2010.

[22] R. Eschbach and W.A. Fuss. Image dependent sharpness enhancement. US patent 5,363,209.

[23] B. Zhang, J.P. Allebach, and Z. Pizlo. An investigation of perceived sharpness and sharpness metrics. SPIE Image Quality and System Performance, 5668:98–110, 2005.

[24] D. Tretter. Apparatus and method for determining the appropriate amount of sharpening for an image. US patent 5,867,606.

[25] M. Roggemann and B. Welsh. Imaging through turbulence. CRC Press, 1996.

[26] Heung-Yeung Shum and Richard Szeliski. Construction of panoramic mosaics with global and local alignment. International Journal of Computer Vision, 2000.


[27] N.G. Kingsbury. The dual-tree complex wavelet transform: a new technique for shift invariance and directional filters. IEEE Digital Signal Processing Workshop, 1998.

[28] I.W. Selesnick, R.G. Baraniuk, and N.C. Kingsbury. The dual-tree complex wavelet transform. Signal Processing Magazine, IEEE, 22(6):123–151, November 2005.

[29] H. Chen and N. Kingsbury. Efficient registration of non-rigid 3-D bodies. Image Processing, IEEE Transactions on, 2011.

[30] M. Hemmendorff, M.T. Andersson, T. Kronander, and H. Knutsson. Phase-based multidimensional volume registration. Medical Imaging, IEEE Transactions on, 21(12):1536–1543, December 2002.

[31] Barak Fishbain, Leonid P. Yaroslavsky, Ianir A. Ideses, Ofer Ben-Zvi, and Alon Shtern. Real time stabilization of long range observation system turbulent video. Proc. SPIE 6496, 2007.

[32] Thomas G. Stockham, Jr. High-speed convolution and correlation. In Proceedings of the April 26-28, 1966, Spring Joint Computer Conference, AFIPS '66 (Spring), pages 229–233, New York, NY, USA, 1966. ACM.

[33] J.J. Lewis, R.J. O'Callaghan, S.G. Nikolov, D.R. Bull, C.N. Canagarajah, and Essa Basaeed. Region-based image fusion using complex wavelets. In Proc. 7th International Conference on Information Fusion, volume 1, pages 555–562, 2004.

[34] Paul Hill, Nishan Canagarajah, and David Bull. Image fusion using complex wavelets. In Proc. 13th British Machine Vision Conference, pages 487–496, 2002.

[35] R.J. O'Callaghan and D.R. Bull. Combined morphological-spectral unsupervised image segmentation. Image Processing, IEEE Transactions on, 14(1):49–62, January 2005.

[36] P. Scheunders and J. Sijbers. Multiscale watershed segmentation of multivalued images. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, volume 3, pages 855–858, 2002.

[37] D. Tofsted, D. Quintis, S. O'Brien, J. Yarbrough, M. Bustillos, and G.T. Vaucher. Test report on the November 2005 NATO RTG-40 active imager land field trials. Army Research Lab, Aberdeen Proving Ground, MD, 2006.

[38] B.W. Silverman. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. London: Chapman and Hall, 1986.

[39] Timothy J. Schulz. Multiframe blind deconvolution of astronomical images. J. Opt. Soc. Am. A, 10(5):1064–1073, May 1993.

[40] Barak Fishbain, Leonid P. Yaroslavsky, and Ianir Ideses. Spatial, temporal, and interchannel image data fusion for long-distance terrestrial observation systems. Advances in Optical Technologies, 2008, 2008.

[41] Yingsong Zhang and Nick Kingsbury. A Bayesian wavelet-based multidimensional deconvolution with sub-band emphasis. In Engineering in Medicine and Biology Society, 2008. EMBS 2008. 30th Annual International Conference of the IEEE, pages 3024–3027, August 2008.

[42] H. Takeda, P. Milanfar, M. Protter, and M. Elad. Super-resolution without explicit subpixel motion estimation. Image Processing, IEEE Transactions on, 18(9):1958–1975, September 2009.

[43] J. Dusek and K. Roubik. Testing of new models of the human visual system for image quality evaluation. In Signal Processing and Its Applications, 2003. Proceedings. Seventh International Symposium on, volume 2, pages 621–622, July 2003.

[44] H. Hoffmann. HDTV - EBU format comparisons at IBC 2006. EBU Technical Review, 2006.

[45] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity. Image Processing, IEEE Transactions on, 13(4):600–612, April 2004.

[46] K. Seshadrinathan and A.C. Bovik. Motion tuned spatio-temporal quality assessment of natural videos. Image Processing, IEEE Transactions on, 19(2):335–350, February 2010.

[47] VQEG. Final report from the Video Quality Experts Group on the validation of objective quality metrics for video quality assessment. Video Quality Experts Group, Tech. Rep., available: http://www.its.bldrdoc.gov/vqeg, 2000.

[48] F. Zhang and D. Bull. A parametric framework for video compression using region-based texture models. Selected Topics in Signal Processing, IEEE Journal of, PP(99):1, 2011.

[49] Zhou Wang, A.C. Bovik, and B.L. Evan. Blind measurement of blocking artifacts in images. In Image Processing, 2000. Proceedings. 2000 International Conference on, volume 3, pages 981–984, 2000.

[50] H.R. Sheikh, A.C. Bovik, and L. Cormack. No-reference quality assessment using natural scene statistics: JPEG2000. Image Processing, IEEE Transactions on, 14(11):1918–1927, November 2005.


[51] Xin Li. Blind image quality assessment. In Image Processing. 2002. Proceedings. 2002 International Conference on, volume 1, pages I-449–I-452, 2002.

[52] Henry S. Kenyon. Sensor technology opens new horizons. http://www.afcea.org/signal, 2008.

[53] Mark Rutherford. New binoculars make the most of mirage. http://news.cnet.com, 2008.

[54] PENTAX. CCTV lens PAIR - Pentax Atmospheric Interference Reduction. http://www.pentaxcctvus.com, 2010.
