
Single Image Super-Resolution Using

Dictionary-Based Local Regression

Sundaresh Ram and Jeffrey J. Rodriguez. Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ, USA.

Email: {ram.jjrodrig}@email.arizona.edu

Abstract-This paper presents a new method of producing a high-resolution image from a single low-resolution image without any external training image sets. We use a dictionary-based regression model for practical image super-resolution using local self-similar example patches within the image. Our method is inspired by the observation that image patches can be well represented as a sparse linear combination of elements from a chosen over-complete dictionary, and that a patch in the high-resolution image has good matches around its corresponding location in the low-resolution image. A first-order approximation of a nonlinear mapping function, learned using the local self-similar example patches, is applied to the low-resolution image patches to obtain the corresponding high-resolution image patches. We show that the proposed algorithm provides improved accuracy compared to the existing single image super-resolution methods by running them on various input images that contain diverse textures, and that are contaminated by noise or other artifacts.

Index Terms-Image restoration, dictionary learning, sparse recovery, image super-resolution, regression.

I. INTRODUCTION

Super-resolution image reconstruction is a very important task in many computer vision and image processing applications. The goal of image super-resolution (SR) is to generate a high-resolution (HR) image from one or more low-resolution (LR) images. Image SR is a widely researched topic, and numerous SR algorithms have been proposed in the literature [1]-[9], [11]-[18]. SR algorithms can be broadly classified into three main categories: interpolation-based algorithms, learning-based algorithms, and reconstruction-based algorithms. Interpolation-based SR algorithms [2], [8], [9], [11] are fast, but the results may lack some of the fine details. In learning-based SR algorithms [4]-[6], [14], detailed textures are elucidated by searching through a training set of LR/HR images. These methods require careful selection of the training images; otherwise, erroneous details may be found. Alternatively, reconstruction-based SR algorithms [1], [3], [12], [15]-[18] apply various smoothness priors and impose the constraint that, when properly downsampled, the HR image should reproduce the original LR image.

The image SR problem is severely ill-posed, since many HR images can produce the same LR image, and thus it has to rely on some strong image priors for robust estimation. The most common image prior is the simple analytical "smoothness" prior, e.g., bicubic interpolation. As an image contains sharp discontinuities, such as edges and corners, using the simple "smoothness" prior for its SR reconstruction will result in ringing, jagged, blurring, and ghosting artifacts. Thus, more sophisticated statistical image priors learned from natural images have been explored [1], [2], [12]. Even though natural images are sparse signals, trying to capture their rich characteristics using only a few parameters is impossible. Further, example-based nonparametric methods [14]-[16], [18] have been used to predict the missing high-frequency component of the HR image, using a universal set of training example LR/HR image patches. But these methods require a large set of training patches, making them computationally inefficient.

978-1-4799-4053-0114/$31.00 ©2014 IEEE

Recently, many SR algorithms have been developed using the fact that images possess a large number of self-similarities, i.e., local image structures tend to reappear within and across different image scales [3], [5], [18], and thus the image SR problem can be regularized based on these examples rather than some external database. In particular, Glasner et al. [5] proposed a framework that uses the self-similar example patches from within and across different image scales to regularize the SR problem. Yang et al. [18] developed an SR method where the SR images are constructed using a learned dictionary formed using image patch pairs extracted by building an image pyramid of the LR image. Freedman et al. [3] extended the example-based SR framework by following a local self-similarity assumption on the example image patches and iteratively upscaling the LR image.

In this paper, we describe a new single image super-resolution method using a dictionary-based local regression approach. Our approach differs from prior work on single-image SR in two aspects: 1) using the in-place self-similarity [17] to construct and train a dictionary from the LR image, and 2) using the trained dictionary to learn a robust first-order approximation of the nonlinear mapping from LR to HR image patches. The HR image patch is reconstructed from the given LR image patch using this learned nonlinear function. We describe our algorithm in detail and present both quantitative and qualitative results comparing it to several recent algorithms.

II. METHODS

We assume that some areas of the input LR image X_0 contain high-frequency content that we can borrow for image SR; i.e., X_0 is an image containing some sharp areas but overall having unsatisfactory pixel resolution. Let X_0 and X denote the LR (input) and HR (output) images, where the output pixel resolution is r times greater. Let Y_0 and Y denote the corresponding low-frequency bands. That is, Y_0 has the same spatial dimension as X_0, but is missing the high-frequency content, and likewise for Y and X. Let x_0 and x denote a × a HR image patches sampled from X_0 and X, respectively, and let y_0 and y denote a × a LR image patches sampled from Y_0 and Y, respectively. Let (i, j) and (p, q) denote coordinates in the 2-D image plane.

Fig. 1. For each patch y of the upsampled low-frequency image Y (Y = bicubic(X_0)), we find its in-place match y_0 from the low-frequency image Y_0, and then perform a first-order regression, x ≈ x_0 + ∇f(y_0)(y − y_0), to estimate the desired patch x for the target image X.

SSIAI 2014

A. Proposed Super-Resolution Algorithm

The LR image is denoted as X_0 ∈ ℝ^(K1×K2), from which we obtain its low-frequency image Y_0 ∈ ℝ^(K1×K2) by Gaussian filtering. We upsample X_0 using bicubic interpolation by a factor of r to get Y ∈ ℝ^(rK1×rK2). Y is used to approximate the low-frequency component of the unknown HR image X ∈ ℝ^(rK1×rK2). We aim to estimate X from the knowledge of X_0, Y_0, and Y.

Fig. 1 is a block-diagram description of the overall SR scheme presented. For each image patch y from the image Y at location (i, j), we find its in-place self-similar example patch y_0 around its corresponding coordinates (i_s, j_s) in the image Y_0, where i_s = ⌊i/r + 0.5⌋ and j_s = ⌊j/r + 0.5⌋. Similarly, we can obtain the image patch x_0 from image X_0, which is an HR version of y_0. The image patch pair {y_0, x_0} constitutes an LR/HR image prior example pair from which we learn a first-order regression model to estimate the HR image patch x for the LR patch y. We repeat the procedure using overlapping patches of image Y, and the final HR image X is generated by aggregating all the HR image patches x obtained. For large upscaling factors, the algorithm is run iteratively, each time with a constant scaling factor r.
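The in-place coordinate mapping above can be sketched in a few lines (a minimal sketch; the function name `in_place_coords` is ours, not from the paper):

```python
import math

def in_place_coords(i: int, j: int, r: float) -> tuple:
    """Map patch coordinates (i, j) in the upsampled image Y to the
    coordinates (i_s, j_s) of the in-place match in Y_0, using
    i_s = floor(i/r + 0.5) and j_s = floor(j/r + 0.5)."""
    return math.floor(i / r + 0.5), math.floor(j / r + 0.5)

# For an upscaling factor r = 2, a patch at (10, 7) in Y maps to (5, 4) in Y_0.
```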

B. Local Regression

The patch-based single image SR problem can be viewed as a regression problem, i.e., finding a nonlinear mapping function f from the LR patch space to the target HR patch space. However, due to the ill-posed nature of the inverse problem at hand, learning this nonlinear mapping function requires good image priors and proper regularization. From Section II-A, the in-place self-similar example patch pair {y_0, x_0} serves as a good prior example pair for inferring the HR version of y. Assuming that the mapping function f is continuously differentiable, we have the following Taylor series expansion:

  x = f(y) = f(y_0 + (y − y_0))
           = f(y_0) + ∇f(y_0)(y − y_0) + O(‖y − y_0‖²)        (1)
           ≈ x_0 + ∇f(y_0)(y − y_0).

Equation (1) is a first-order approximation of the nonlinear mapping function f. Instead of learning the mapping function f, we can learn its gradient ∇f, which should be simpler. We learn the mapping gradient ∇f by building a dictionary using the prior example pair {y_0, x_0}, as detailed in the next section. With the function values learned, given any LR input patch y, we first search for its in-place self-similar example patch pair {y_0, x_0}, then find ∇f(y_0) using the trained dictionary, and then use the first-order approximation to compute the HR image patch x.
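Under the assumptions above, the per-patch reconstruction is a single affine step. A minimal NumPy sketch (names are ours; patches are flattened vectors, and `grad` stands in for the learned mapping gradient ∇f(y_0)):

```python
import numpy as np

def first_order_estimate(x0, grad, y, y0):
    """First-order approximation x ≈ x_0 + ∇f(y_0)(y − y_0), Eq. (1).

    x0   : flattened HR example patch, shape (m,)
    grad : learned mapping gradient at y_0, shape (m, n)
    y    : flattened LR input patch, shape (n,)
    y0   : flattened LR example patch, shape (n,)
    """
    return x0 + grad @ (y - y0)
```

Note that when y equals y_0 the estimate reduces to x_0, as expected from the Taylor expansion.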

Due to the discrete resampling process in downsampling and upsampling, we expect to find multiple approximate in-place examples for y in the 3 × 3 neighborhood of (i_s, j_s), which contains 9 patches. To reduce the regression variance, we perform regression on each of them and combine the results by a weighted average. Given the in-place self-similar example patch pairs {y_0i, x_0i}, i = 1, …, 9, for y, we have

  x = Σ_{i=1}^{9} (x_0i + ∇f(y_0i)(y − y_0i)) w_i,        (2)

where w_i = (1/z) · exp{−‖y − y_0i‖₂² / (2σ²)}, with z the normalization factor.
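The variance-reducing average in (2) can be sketched as follows (a hedged sketch: `x_est_list` would hold the first-order estimates from each of the 9 in-place examples, and `sigma` is the weighting bandwidth discussed in Section III-A):

```python
import numpy as np

def combine_in_place_regressions(y, y0_list, x_est_list, sigma=1.0):
    """Weighted average of per-example regression estimates, Eq. (2).

    y          : flattened LR input patch, shape (n,)
    y0_list    : list of in-place LR example patches y_0i
    x_est_list : list of HR estimates x_0i + ∇f(y_0i)(y − y_0i)
    """
    # Gaussian weights on the LR patch distances, normalized to sum to 1.
    d2 = np.array([np.sum((y - y0) ** 2) for y0 in y0_list])
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum()
    return sum(wi * xi for wi, xi in zip(w, x_est_list))
```

When all examples match y equally well, the weights are uniform; a clearly closer example dominates the average.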

C. Dictionary Learning

The proposed dictionary-based method to learn the mapping gradient ∇f is a modification of the work by Yang et al. [15], [16] to guarantee detail enhancement. Yang et al. [15], [16] developed a method for single image SR based on sparse modeling. This method utilizes an overcomplete dictionary D_h ∈ ℝ^(n×K) built using the HR image, which is an n × K matrix whose K columns represent K "atoms" of size n, where an "atom" is a basis element, i.e., a column of the dictionary. We assume that any patch x ∈ ℝⁿ in the HR image X can be represented as a sparse linear combination of the atoms of D_h as follows:

  x ≈ D_h α,  with ‖α‖₀ ≪ K,  α ∈ ℝ^K.        (3)

A patch y in the observed LR image can be represented using a corresponding LR dictionary D_l with the same sparse coefficient vector α. This is ensured by co-training the dictionary D_h with the HR patches and the dictionary D_l with the corresponding LR patches.

For a given input LR image patch y, we determine the sparse solution vector

  α* = argmin_α ‖G D_l α − G y‖₂² + λ‖α‖₁,        (4)

where G is a feature extraction operator to emphasize high-frequency detail. We use the following set of 1-D filters:

  g₁ = [−1, 0, 1],  g₂ = g₁ᵀ,  g₃ = [1, −2, 1],  g₄ = g₃ᵀ.        (5)


G is obtained as a concatenation of the responses from applying the above 1-D filters to the image. The sparsity of the solution vector α* is controlled by λ. In order to enhance the texture details while suppressing noise and other artifacts, we need to adapt the number of non-zero coefficients in the solution vector α*, as increasing the number of non-zero coefficients enhances the texture details but also enhances the noise and artifacts. We use the standard deviation σ of a patch to indicate the local texture content, and empirically adapted λ as follows:

  λ = 0.5   if σ < 15
      0.1   if 15 ≤ σ ≤ 25
      0.01  otherwise.

These σ thresholds are designed for our 8-bit gray-scale images and can easily be adapted for other image types.
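The feature operator G and the texture-adaptive λ can be sketched together (a sketch under our reading of (5): a first-order filter and a [1, −2, 1] second-order filter, each applied along rows and columns; the thresholds are the paper's 8-bit values, and the function names are ours):

```python
import numpy as np

G_FILTERS = [np.array([-1.0, 0.0, 1.0]),   # g1: first-order derivative
             np.array([1.0, -2.0, 1.0])]   # g3: second-order derivative

def extract_features(img):
    """Concatenate responses of the 1-D filters applied along rows (g1, g3)
    and along columns (their transposes g2, g4), as in Eq. (5)."""
    responses = []
    for g in G_FILTERS:
        responses.append(np.apply_along_axis(np.convolve, 1, img, g, mode='same'))
        responses.append(np.apply_along_axis(np.convolve, 0, img, g, mode='same'))
    return np.concatenate([r.ravel() for r in responses])

def choose_lambda(patch):
    """Adapt the sparsity weight λ to the local texture content (patch std)."""
    s = patch.std()
    if s < 15:
        return 0.5
    if s <= 25:
        return 0.1
    return 0.01
```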

The mapping gradient ∇f for a given y_0 is obtained as ∇f(y_0) = D_h α*.

We make use of a bilateral filter as a degradation operator instead of a Gaussian blurring operator to obtain the image Y_0 from the given LR input image X_0 for dictionary training, as we are interested in enhancing the textures present while suppressing noise and other artifacts. Dictionary training starts by sampling in-place self-similar example image patch pairs {y_0i, x_0i}, i = 1, …, m, from the corresponding LR and HR images. We generate the HR patch vector X_h = {x_01, x_02, …, x_0m}, the LR patch feature vector Y_l = {y_01, y_02, …, y_0m}, and the residue patch vector E = {x_01 − y_01, x_02 − y_02, …, x_0m − y_0m}. We use the residue patch vector E instead of the HR patch vector X_h for training. The residue patch vector is concatenated with the LR patch features, and a concatenated dictionary is defined, following [15], by

  X_c = [ (1/√N) Y_l ; (1/√M) E ],        (6)

where N and M are the dimensions of the LR and HR image patches in vector form. Optimized dictionaries are computed by

  min_{D_c, Z} ‖X_c − D_c Z‖₂² + λ‖Z‖₁   s.t. ‖D_c,i‖₂² ≤ 1,  i = 1, …, K.        (7)

The training process is performed in an iterative manner, alternating between optimizing Z and D_c using the technique in [15].
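The alternation in (7) can be sketched in a few lines. This is a simplified stand-in, not the exact solver of [15]: sparse codes are updated with a few ISTA-style soft-thresholding steps, and the dictionary is updated by least squares with columns rescaled to satisfy the unit-norm constraint:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def train_dictionary(Xc, K=32, lam=0.1, n_iter=20, ista_steps=10, seed=0):
    """Alternate between sparse coding Z and dictionary update Dc for
    min ||Xc − Dc Z||² + lam ||Z||₁  s.t. unit-bounded columns of Dc."""
    rng = np.random.default_rng(seed)
    d, n = Xc.shape
    Dc = rng.standard_normal((d, K))
    Dc /= np.linalg.norm(Dc, axis=0, keepdims=True)
    Z = np.zeros((K, n))
    for _ in range(n_iter):
        # Sparse coding: a few ISTA steps with step size 1/L, L = ||Dc||_2^2.
        L = np.linalg.norm(Dc, 2) ** 2 + 1e-12
        for _ in range(ista_steps):
            Z = soft_threshold(Z - (Dc.T @ (Dc @ Z - Xc)) / L, lam / L)
        # Dictionary update: least squares on the current codes, then
        # rescale any columns whose norm exceeds one.
        Dc = Xc @ np.linalg.pinv(Z)
        norms = np.linalg.norm(Dc, axis=0, keepdims=True)
        Dc /= np.maximum(norms, 1.0)
    return Dc, Z
```

In practice [15] uses more efficient solvers for both subproblems; this sketch only illustrates the alternating structure of the optimization.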

III. EXPERIMENTS AND RESULTS

We evaluate the proposed SR algorithm both quantitatively and qualitatively, on a variety of example images used in the SR literature [17]. We compare our SR algorithm with recent algorithms proposed by Glasner et al. [5], Yang et al. [18], and Freedman et al. [3]. We used open-source implementations of these three SR algorithms available online for comparison, carefully choosing the various parameters within each method for a fair comparison.


TABLE I
PREDICTION RMSE FOR ONE UPSCALING STEP (2×)

Images       Bicubic  Glasner [5]  Yang [18]  Freedman [3]  Ours
Chip           6.03      5.81        5.70        5.85        4.63
Child          7.47      6.74        7.06        6.51        5.92
Peppers        9.11      8.97        9.10        8.72        7.74
House         10.37     10.41       10.16        9.62        8.14
Cameraman     11.61     10.93       11.81       10.64        8.97
Lena          13.31     12.92       12.65       11.97       11.41
Barbara       14.93     14.24       13.92       13.23       12.22
Monarch       16.25     15.71       15.96       15.50       15.42

A. Algorithm Parameter Settings

We chose the image patch size as a = 5 and the iterative scaling factor as r = 2 in all of our experiments. Bicubic interpolation on the input LR image X_0 generates the low-frequency component Y of the target HR image X. A standard deviation of 0.4 is used in the low-pass Gaussian filtering to obtain the low-frequency component Y_0 of the input LR image X_0. For clean images, we use the nearest-neighbor in-place example for regression, whereas in the case of noisy images, we average all 9 in-place example regressions for robust estimation, where σ is the only tuning parameter needed to compute the weight w_i in (2), depending on the noise level. K = 512 atoms are used to train and build the dictionaries D_h and D_l used in the experiments.

B. Quantitative Results

In order to obtain an objective measure of performance for the SR algorithms under comparison, we validated the results of several example images taken from [10] (whose names appear in Table I) using the root mean square error (RMSE). The results of all the algorithms are shown in Table I for one upscaling step (2×). From Table I we observe that SR using simple bicubic interpolation performs the worst due to the assumption of overly smooth image priors. Yang's SR algorithm performs better than bicubic interpolation in terms of RMSE values for the different images. Glasner's and Freedman's SR methods have very similar RMSE values, since both methods are closely related in using local self-similar patches to learn the HR image patches from a single LR image. The proposed SR algorithm has the best RMSE values, as it combines the advantages of in-place example patches and their corresponding local self-similarity learned using the dictionary-based approach.

C. Qualitative Results

Real applications requiring SR rely on three main aspects: image sharpness, image naturalness (affected by visual artifacts), and the speed of the algorithm to super-resolve. We discuss the SR algorithms compared here with respect to these aspects. Fig. 2 shows the SR results of the different approaches on "child" by 4×, "cameraman" by 3×, and on "castle" by 2×. As shown, Glasner's and Freedman's SR algorithms give rise to overly sharp images, resulting in visual artifacts, e.g., ghosting and ringing artifacts around the eyes in "child", and jagged artifacts along the towers in "castle". Also, the details of the camera are smudged in "cameraman" for both algorithms. The results of Yang's SR algorithm are generally a little blurry, and they contain small visible noise-like artifacts across the images upon a closer look. In comparison, our algorithm is able to recover the local texture details as well as sharp edges without sacrificing the naturalness of the images.

Fig. 2. Super-resolution results on "child" (4×), "cameraman" (3×) and "castle" (2×); columns from left to right: Original, Bicubic, Glasner, Yang, Freedman, Ours. Results are better viewed in zoomed mode.

IV. CONCLUSION

In this paper we propose a robust first-order regression model for single-image SR based on local self-similarity within the image. Our approach combines the advantages of learning from in-place examples and learning from local self-similar patches within the same image using a trained dictionary. The in-place examples allow us to learn a local regression function for the otherwise ill-posed mapping from LR to HR image patches. On the other hand, by learning from local self-similar patches elsewhere within the image, the regression model can overcome the problem of an insufficient number of in-place examples. By conducting various experiments and comparing with existing algorithms, we show that our new approach is more accurate and can produce more natural-looking results with sharp details by suppressing the noisy artifacts present within the images.

REFERENCES

[1] S. Dai, M. Han, W. Xu, Y. Wu, Y. Gong, and A. K. Katsaggelos, "SoftCuts: a soft edge smoothness prior for color image super-resolution," IEEE Trans. Image Process., vol. 18, no. 5, pp. 969-981, May 2009.
[2] R. Fattal, "Image upsampling via imposed edge statistics," ACM Transactions on Graphics, vol. 26, no. 3, pp. 95:1-95:8, Jul. 2007.
[3] G. Freedman and R. Fattal, "Image and video upscaling from local self-examples," ACM Transactions on Graphics, vol. 30, no. 2, pp. 12:1-12:11, Apr. 2011.
[4] W. T. Freeman, T. R. Jones, and E. C. Pasztor, "Example-based super-resolution," IEEE Comput. Graph. Appl., vol. 22, no. 2, pp. 56-65, Mar. 2002.
[5] D. Glasner, S. Bagon, and M. Irani, "Super-resolution from a single image," in Proc. IEEE Int. Conf. Computer Vision, pp. 349-356, 2009.
[6] H. He and W.-C. Siu, "Single image super-resolution using Gaussian process regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 449-456, 2011.
[7] K. I. Kim and Y. Kwon, "Single-image super-resolution using sparse regression and natural image prior," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 6, pp. 1127-1133, Jun. 2010.
[8] X. Li and M. T. Orchard, "New edge-directed interpolation," IEEE Trans. Image Process., vol. 10, no. 10, pp. 1521-1527, Oct. 2001.
[9] S. Mallat and G. Yu, "Super-resolution with sparse mixing estimators," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2889-2900, Nov. 2010.
[10] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics," in Proc. Int. Conf. Computer Vision, pp. 416-423, 2001.
[11] Q. Shan, Z. Li, J. Jia, and C.-K. Tang, "Fast image/video upsampling," ACM Transactions on Graphics, vol. 27, no. 5, pp. 153:1-153:8, Dec. 2008.
[12] J. Sun, J. Sun, Z. Xu, and H.-Y. Shum, "Gradient profile prior and its applications in image super-resolution and enhancement," IEEE Trans. Image Process., vol. 20, no. 6, Jun. 2011.
[13] R. Timofte, V. De Smet, and L. Van Gool, "Anchored neighborhood regression for fast example-based super-resolution," in Proc. IEEE Int. Conf. Computer Vision, 2013.
[14] Q. Wang, X. Tang, and H. Shum, "Patch based blind image super resolution," in Proc. IEEE Int. Conf. Computer Vision, pp. 709-716, 2005.
[15] J. Yang, J. Wright, T. S. Huang, and Y. Ma, "Image super-resolution via sparse representation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2861-2873, Nov. 2010.
[16] J. Yang, Z. Wang, Z. Lin, S. Cohen, and T. S. Huang, "Coupled dictionary training for image super-resolution," IEEE Trans. Image Process., vol. 21, no. 8, pp. 3467-3478, Aug. 2012.
[17] J. Yang, Z. Lin, and S. Cohen, "Fast image super-resolution based on in-place example regression," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 1059-1066, 2013.
[18] C.-Y. Yang, J.-B. Huang, and M.-H. Yang, "Exploiting self-similarities for single frame super-resolution," in Proc. Asian Conf. Computer Vision, pp. 497-510, 2010.