IEEE FINAL YEAR PROJECTS 2012 2013 - …elysiumtechnologies.info/IEEE-PROJECT 2012-2013... · IEEE...

Elysium Technologies Private Limited Approved by ISO 9001:2008 and AICTE for SKP Training Singapore | Madurai | Trichy | Coimbatore | Cochin | Kollam | Chennai http://www.elysiumtechnologies.com, [email protected]

IEEE Final Year Projects 2012 |Student Projects | Digital Image Processing Projects

IEEE FINAL YEAR PROJECTS 2012 – 2013

DIGITAL IMAGE PROCESSING

Corporate Office: Madurai

227-230, Church road, Anna nagar, Madurai – 625 020.

0452 – 4390702, 4392702, +9199447933980

[email protected], [email protected]

http://elysiumtechnologies.com

Branch Office: Trichy

15, III Floor, SI Towers, Melapudur main road, Trichy – 620 001.

0431 – 4002234, +919790464324.

[email protected], [email protected].

http://elysiumtechnologies.com

Branch Office: Coimbatore

577/4, DB Road, RS Puram, Opp to KFC, Coimbatore – 641 002.

+919677751577

http://elysiumtechnologies.com, [email protected]

Branch Office: Kollam

Surya Complex, Vendor junction, Kollam – 691 010, Kerala.

0474 – 2723622, +919446505482.

[email protected], http://elysiumtechnologies.com

Branch Office: Cochin

4th Floor, Anjali Complex, near south over bridge, Valanjambalam, Cochin – 682 016, Kerala.

0484 – 6006002, +917736004002.

[email protected], http://elysiumtechnologies.com


EGC 1301

EGC 1302

EGC 1303

DIGITAL IMAGE PROCESSING 2012 - 2013

This paper explores a Bayesian theoretic approach to constructing multiscale complex-phase-order representations. We

formulate the construction of complex-phase-order representations at different structural scales based on the scale-

space theory. Linear and nonlinear deterministic approaches are explored, and a Bayesian theoretic approach is

introduced for constructing representations in such a way that strong structure localization and noise resilience are

achieved. Experiments illustrate its potential for constructing robust multiscale complex-phase-order representations

with well-localized structures across all scales under high-noise situations. An illustrative example of applications of the

proposed approach is presented in the form of multimodal image registration and feature extraction.

In this paper, a coarse-to-fine framework is proposed to register accurately the local regions of interest (ROIs) of images

with independent perspective motions by estimating their deformation parameters. A coarse registration approach

based on control points (CPs) is presented to obtain the initial perspective parameters. This approach exploits two

constraints to solve the problem with a very limited number of CPs. One is named the point-point-line topology

constraint, and the other is named the color and intensity distribution of segment constraint. Both of the constraints

describe the consistency between the reference and sensed images. To obtain a finer registration, we have converted

the perspective deformation into affine deformations in local image patches so that affine refinements can be used

readily. Then, the local affine parameters that have been refined are utilized to recover precise perspective parameters of

a ROI. Moreover, the location and dimension selections of local image patches are discussed by mathematical

demonstrations to avoid the aperture effect. Experiments on simulated data and real-world sequences demonstrate the

accuracy and the robustness of the proposed method. The experimental results of image super-resolution are also

provided, which show a possible practical application of our method.

We consider the problem of decomposing a video sequence into a superposition of (a given number of) moving layers.

For this problem, we propose an energy minimization approach based on the coding cost. Our contributions affect both

the model (what is minimized) and the algorithmic side (how it is minimized). The novelty of the coding-cost model is the

inclusion of a refined model of the image formation process, known as super resolution. This accounts for camera blur

and area averaging arising in a physically plausible image formation process. It allows us to extract sharp high-

resolution layers from the video sequence. The algorithmic framework is based on an alternating minimization scheme

A Bayesian Theoretic Approach to Multiscale Complex-Phase-Order Representations

A Coarse-to-Fine Subpixel Registration Method to Recover Local Perspective Deformation in the Application of Image

A Coding-Cost Framework for Super-Resolution Motion Layer Decomposition



EGC 1304

EGC 1305

EGC 1306

and includes the following innovations. (1) Via a video labeling, we optimize the layer domains. This allows us to regularize the

shapes of the layers and to handle occlusions very elegantly. (2) We present an efficient parallel algorithm for

extracting super-resolved layers based on TV filtering.

Hyperspectral data processing typically demands enormous computational resources in terms of storage, computation,

and input/output throughput, particularly when real-time processing is desired. In this paper, a proof-of-concept study

is conducted on compressive sensing (CS) and unmixing for hyperspectral imaging. Specifically, we investigate a low-

complexity scheme for hyperspectral data compression and reconstruction. In this scheme, compressed hyperspectral

data are acquired directly by a device similar to the single-pixel camera based on the principle of CS. To decode the

compressed data, we propose a numerical procedure to compute directly the unmixed abundance fractions of given

endmembers, completely bypassing high-complexity tasks involving the hyperspectral data cube itself. The reconstruction

model is to minimize the total variation of the abundance fractions subject to a preprocessed fidelity equation with a

significantly reduced size and other side constraints. An augmented Lagrangian-type algorithm is developed to solve

this model. We conduct extensive numerical experiments to demonstrate the feasibility and efficiency of the proposed

approach, using both synthetic data and hardware-measured data. Experimental and computational evidence obtained

from this paper indicates that the proposed scheme has a high potential in real-world applications.

In this paper, we introduce a method to detect co-saliency from an image pair that may have some objects in common.

The co-saliency is modeled as a linear combination of the single-image saliency map (SISM) and the multi-image

saliency map (MISM). The first term is designed to describe the local attention, which is computed by using three

saliency detection techniques available in literature. To compute the MISM, a co-multilayer graph is constructed by

dividing the image pair into a spatial pyramid representation. Each node in the graph is described by two types of

visual descriptors, which are extracted from a representation of some aspects of local appearance, e.g., color and

texture properties. In order to evaluate the similarity between two nodes, we employ a normalized single-pair SimRank

algorithm to compute the similarity score. Experimental evaluation on a number of image pairs demonstrates the good

performance of the proposed method on the co-saliency detection task.
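
The linear combination described in this abstract can be sketched in a few lines. This is only an illustrative sketch: the function name `co_saliency`, the equal weights, and the tiny 2x2 maps are assumptions made here, and the computation of the SISM and MISM themselves (saliency detectors, co-multilayer graph, SimRank) is not reproduced.

```python
def co_saliency(sism, mism, w_single=0.5, w_multi=0.5):
    """Pixel-wise linear combination of two saliency maps.

    sism, mism: lists of rows with identical shape (hypothetical inputs).
    w_single, w_multi: combination weights (assumed values, not from the paper).
    """
    if len(sism) != len(mism) or any(len(a) != len(b) for a, b in zip(sism, mism)):
        raise ValueError("saliency maps must have the same shape")
    return [
        [w_single * s + w_multi * m for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(sism, mism)
    ]


# Hypothetical 2x2 maps with values in [0, 1].
sism = [[0.2, 0.8], [0.4, 0.0]]
mism = [[0.6, 0.4], [0.0, 1.0]]
combined = co_saliency(sism, mism)
```

With equal weights, each output pixel is simply the average of the two input maps at that pixel.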

A Compressive Sensing and Unmixing Scheme for Hyperspectral Data Processing

A Co-Saliency Model of Image Pairs

A Fast Majorize–Minimize Algorithm for the Recovery of Sparse and Low-Rank Matrices



EGC 1307

EGC 1308

EGC 1309

We introduce a novel algorithm to recover sparse and low-rank matrices from noisy and undersampled measurements.

We pose the reconstruction as an optimization problem, where we minimize a linear combination of data consistency

error, nonconvex spectral penalty, and nonconvex sparsity penalty. We majorize the nondifferentiable spectral and

sparsity penalties in the criterion by quadratic expressions to realize an iterative three-step alternating minimization

scheme. Since each of these steps can be evaluated either analytically or using fast schemes, we obtain a

computationally efficient algorithm. We demonstrate the utility of the algorithm in the context of dynamic magnetic

resonance imaging (MRI) reconstruction from sub-Nyquist sampled measurements. The results show a significant

improvement in signal-to-noise ratio and image quality compared with classical dynamic imaging algorithms. We expect

the proposed scheme to be useful in a range of applications including video restoration and multidimensional MRI.

High dynamic range imaging (HDRI) methods in computational photography address situations where the dynamic

range of a scene exceeds what can be captured by an image sensor in a single exposure. HDRI techniques have also

been used to construct radiance maps in measurement applications; unfortunately, the design and evaluation of HDRI

algorithms for use in these applications have received little attention. In this paper, we develop a novel HDRI technique

based on pixel-by-pixel Kalman filtering and evaluate its performance using objective metrics that this paper also

introduces. In the presented experiments, this new technique achieves as much as 9.4-dB improvement in signal-to-

noise ratio and can achieve as much as a 29% improvement in radiometric accuracy over a classic method.
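
To make the idea of pixel-by-pixel Kalman filtering concrete, the sketch below fuses several noisy radiance measurements of one static pixel with a scalar Kalman filter. It is a minimal illustration under assumptions made here: the measurement model (one radiance estimate per exposure with a known noise variance), the function name `kalman_fuse`, and the initial values are hypothetical and are not taken from the paper.

```python
def kalman_fuse(measurements, variances, x0=0.0, p0=1e6):
    """Scalar Kalman filter for a static state (one pixel's radiance).

    measurements: radiance estimates, one per exposure (assumed inputs).
    variances: measurement-noise variance of each estimate (assumed known).
    x0, p0: initial estimate and its variance; a large p0 means "no prior".
    Returns the fused estimate and its variance.
    """
    x, p = x0, p0
    for z, r in zip(measurements, variances):
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # correct the estimate with the innovation
        p = (1.0 - k) * p      # uncertainty shrinks after each measurement
    return x, p


# Two equally reliable measurements of the same pixel.
estimate, variance = kalman_fuse([10.0, 12.0], [1.0, 1.0])
```

With an uninformative prior and equal variances, the fused estimate approaches the plain average of the measurements, and the variance is roughly halved.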

Loss of information in a wavelet domain can occur during storage or transmission when the images are formatted and

stored in terms of wavelet coefficients. This calls for image inpainting in wavelet domains. In this paper, a variational

approach is used to formulate the reconstruction problem. We propose a simple but very efficient iterative scheme to

calculate an optimal solution and prove its convergence. Numerical results are presented to show the performance of

the proposed algorithm.

In this paper, we propose a new psychovisual quality metric of images based on recent developments in brain theory

and neuroscience, particularly the free-energy principle. The perception and understanding of an image is modeled as

an active inference process, in which the brain tries to explain the scene using an internal generative model. The

psychovisual quality is thus closely related to how accurately visual sensory data can be explained by the generative

model, and the upper bound of the discrepancy between the image signal and its best internal description is given by

A Kalman-Filtering Approach to High Dynamic Range Imaging for Measurement Applications

A Primal–Dual Method for Total-Variation-Based Wavelet Domain Inpainting

A Psychovisual Quality Metric in Free-Energy Principle



EGC 1310

EGC 1311

the free energy of the cognition process. Therefore, the perceptual quality of an image can be quantified using the free

energy. Constructively, we develop a reduced-reference free-energy-based distortion metric (FEDM) and a no-reference

free-energy-based quality metric (NFEQM). The FEDM and the NFEQM are nearly invariant to many global systematic

deviations in geometry and illumination that hardly affect visual quality, for which existing image quality metrics wrongly

predict severe quality degradation. Even with very limited or no information on the reference image, the

FEDM and the NFEQM are highly competitive compared with the full-reference SSIM image quality metric on images in

the popular LIVE database. Moreover, FEDM and NFEQM can measure correctly the visual quality of some model-based

image processing algorithms, for which the competing metrics often contradict viewers' opinions.

A new blind authentication method based on the secret sharing technique with a data repair capability for grayscale

document images via the use of the Portable Network Graphics (PNG) image is proposed. An authentication signal is

generated for each block of a grayscale document image, which, together with the binarized block content, is

transformed into several shares using the Shamir secret sharing scheme. The involved parameters are carefully chosen

so that as many shares as possible are generated and embedded into an alpha channel plane. The alpha channel plane

is then combined with the original grayscale image to form a PNG image. During the embedding process, the computed

share values are mapped into a range of alpha channel values near their maximum value of 255 to yield a transparent

stego-image with a disguise effect. In the process of image authentication, an image block is marked as tampered if the

authentication signal computed from the current block content does not match that extracted from the shares embedded

in the alpha channel plane. Data repairing is then applied to each tampered block by a reverse Shamir scheme after

collecting two shares from unmarked blocks. Measures for protecting the security of the data hidden in the alpha

channel are also proposed. Good experimental results prove the effectiveness of the proposed method for real

applications.
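
The Shamir secret sharing scheme mentioned in this abstract can be sketched as follows. This is a generic textbook (k, n) threshold scheme over a prime field, not the paper's exact construction: the prime 257, the function names, and the example secret are assumptions chosen here, and the mapping of shares into alpha channel values is not shown.

```python
import random

PRIME = 257  # hypothetical field size; must exceed every secret and every share index


def make_shares(secret, k, n, prime=PRIME, rng=random):
    """Split `secret` into n shares so that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(prime) for _ in range(k - 1)]

    def evaluate(x):
        return sum(c * pow(x, i, prime) for i, c in enumerate(coeffs)) % prime

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover_secret(shares, prime=PRIME):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


# A (2, 5) scheme: any two shares suffice, matching the "two shares" repair step.
shares = make_shares(123, k=2, n=5)
recovered = recover_secret([shares[1], shares[4]])
```

Any two distinct shares give back the same secret, which is what lets a tampered block be repaired from shares stored elsewhere.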

This paper proposes a new semitransparency-based optical-flow model with a point trajectory (PT) model for particle-

like video. Previous optical-flow models have used constraints ranging from image brightness constancy to image brightness

change models. However, two important issues remain unsolved. The first is how to track/match a

semitransparent object with a very large displacement between frames. Such moving objects with different shapes and

sizes in an outdoor scene move against a complicated background. Second, due to semitransparency, the image

intensity between frames can also violate a previous image brightness-based optical-flow model. Thus, we propose a

two-step optimization for the optical-flow estimation model for a moving semitransparent object, i.e., particle. In the first

step, a rough optical flow between particles is estimated by a new alpha constancy constraint that is based on an image

generation model of semitransparency. In the second step, the optical flow of a particle with a continuous trajectory in a

A Secret-Sharing-Based Method for Authentication of Grayscale Document Images via the Use of the PNG Image with a Data Repair Capability

A Semitransparency-Based Optical-Flow Method with a Point Trajectory Model for Particle-Like Video



EGC 1312

EGC 1313

EGC 1314

definite temporal interval based on a PT model can be refined. Many experiments using various falling-snow and foggy

scenes with multiple moving vehicles show the significant improvement of the optical flow compared with a previous

optical-flow model.

This paper presents an algorithm designed to measure the local perceived sharpness in an image. Our method utilizes

both spectral and spatial properties of the image: For each block, we measure the slope of the magnitude spectrum and

the total spatial variation. These measures are then adjusted to account for visual perception, and then, the adjusted

measures are combined via a weighted geometric mean. The resulting measure, i.e., S3 (spectral and spatial sharpness),

yields a perceived sharpness map in which greater values denote perceptually sharper regions. This map can be

collapsed into a single index, which quantifies the overall perceived sharpness of the whole image. We demonstrate the

utility of the S3 measure for within-image and across-image sharpness prediction, no-reference image quality

assessment of blurred images, and monotonic estimation of the standard deviation of the impulse response used in

Gaussian blurring. We further evaluate the accuracy of S3 in local sharpness estimation by comparing S3 maps to

sharpness maps generated by human subjects. We show that S3 can generate sharpness maps, which are highly

correlated with the human-subject maps.
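
The combination step described above, a weighted geometric mean of a spectral and a spatial measure, can be illustrated with a short sketch. The weight `alpha = 0.5`, the function name `s3_map`, and the sample values are assumptions made here; the computation of the two input measures and their perceptual adjustment is not reproduced.

```python
def s3_map(spectral, spatial, alpha=0.5):
    """Weighted geometric mean of two per-block sharpness maps.

    spectral, spatial: lists of rows with values in [0, 1] (hypothetical inputs).
    alpha: weight of the spectral measure (assumed value).
    """
    return [
        [(s1 ** alpha) * (s2 ** (1.0 - alpha)) for s1, s2 in zip(row1, row2)]
        for row1, row2 in zip(spectral, spatial)
    ]


# A block must score well on both measures to be rated sharp.
sharpness = s3_map([[0.25, 0.0]], [[1.0, 1.0]])
```

Unlike an arithmetic mean, the geometric mean drives the result to zero whenever either measure is zero, so a block is rated sharp only if both measures agree.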

Many facial-analysis approaches rely on robust and accurate automatic facial landmarking to function correctly. In

this paper, we describe a statistical method for automatic facial-landmark localization. Our landmarking relies on a

parsimonious mixture model of Gabor wavelet features, computed in coarse-to-fine fashion and complemented with a

shape prior. We assess the accuracy and the robustness of the proposed approach in extensive cross-database

conditions conducted on four face data sets (Face Recognition Grand Challenge, Cohn-Kanade, Bosphorus, and BioID).

Our method achieves, on average, 99.33% accuracy on the Bosphorus database and 97.62% accuracy on the BioID database,

which improves on the state of the art. We show that the method is not significantly affected by low-resolution

images, small rotations, facial expressions, and natural occlusions such as beards and mustaches. We further test the

quality of the landmarks in a facial expression recognition application and report landmarking-induced improvement

over baseline on two separate databases for video-based expression recognition (Cohn-Kanade and BU-4DFE).

A Spectral and Spatial Measure of Local Perceived Sharpness in Natural Images

A Surface-Based 3-D Dendritic Spine Detection Approach from Confocal Microscopy Images

A Statistical Method for 2-D Facial Landmarking



EGC 1315

EGC 1317

EGC 1316

Determining the relationship between the dendritic spine morphology and its functional properties is a fundamental

challenge in neurobiology research. In particular, how to accurately and automatically analyse meaningful structural

information from a large microscopy image data set is far from being resolved. As pointed out in existing literature,

one remaining challenge in spine detection and segmentation is how to automatically separate touching spines. In this

paper, based on various global and local geometric features of the dendrite structure, we propose a novel approach to

detect and segment neuronal spines, in particular, a breaking-down and stitching-up algorithm to accurately separate

touching spines. Extensive performance comparisons show that our approach is more accurate and robust than two

state-of-the-art spine detection and segmentation algorithms.

This paper proposes a new motion-compensated frame interpolation (MCFI) method. The proposed method utilizes a

symmetric motion estimation (SME) method, which is a new pixel-wise motion estimation method for intermediate frame

interpolation. By using an adaptive search range for the motion estimation, the proposed method can obtain a more

reliable motion vector for each pixel than previous MCFI methods that use a conventional block matching algorithm

(BMA). In addition, we propose a combined method of the SME and BMA to reduce the computation time of the pixel-

wise motion estimation method. The experimental results show that the proposed method outperforms other MCFI

methods in terms of generating objectively and subjectively better interpolated frames.
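
The core idea of symmetric motion estimation, searching for a displacement that matches the previous and next frames symmetrically around a position in the intermediate frame, can be sketched on a 1-D signal. This is a simplified illustration under assumptions made here: the function name, the fixed search range, and the small matching window are hypothetical, and the paper's adaptive search range and its combination with block matching are not reproduced.

```python
def symmetric_motion(prev, nxt, x, search=3, half_win=1):
    """Find v minimizing sum |prev[p - v] - nxt[p + v]| around position x.

    prev, nxt: 1-D intensity lists of equal length (previous and next frame).
    x: position in the intermediate frame to be interpolated.
    search, half_win: search range and window half-size (assumed values).
    """
    n = len(prev)
    best_v, best_cost = 0, float("inf")
    for v in range(-search, search + 1):
        cost, valid = 0.0, True
        for d in range(-half_win, half_win + 1):
            a, b = x + d - v, x + d + v
            if not (0 <= a < n and 0 <= b < n):
                valid = False  # candidate leaves the frame; skip it
                break
            cost += abs(prev[a] - nxt[b])
        if valid and cost < best_cost:
            best_v, best_cost = v, cost
    return best_v


# Textured signal shifted right by 4 samples between the two frames.
prev_frame = [i * i for i in range(16)]
next_frame = [0, 0, 0, 0] + prev_frame[:-4]
half_motion = symmetric_motion(prev_frame, next_frame, x=8)
```

Because the intermediate frame lies halfway between the two inputs, a total shift of 4 samples corresponds to a symmetric displacement of 2 on each side.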

In this paper, a visual attention model is incorporated for efficient saliency detection, and the salient regions are

employed as object seeds for our automatic object segmentation system. In contrast with existing interactive

segmentation approaches that require considerable user interaction, the proposed method does not require it, i.e., the

segmentation task is fulfilled in a fully automatic manner. First, we introduce a novel unified spectral-domain approach

for saliency detection. Our visual attention model originates from a well-known property of the human visual system that

the human visual perception is highly adaptive and sensitive to structural information in images rather than

nonstructural information. Then, based on the saliency map, we propose an iterative self-adaptive segmentation

framework for more accurate object segmentation. Extensive tests on a variety of cluttered natural images show that the

proposed algorithm is an efficient indicator for characterizing the human perception and it can provide satisfying

segmentation performance.

A Symmetric Motion Estimation Method for Motion-Compensated Frame Interpolation

A Uniform Grid Structure to Speed up Example-Based Photometric Stereo

A Unified Spectral-Domain Approach for Saliency Detection and Its Application to Automatic Object Segmentation



EGC 1319

EGC 1318

EGC 1320

In this paper, we describe a data structure and an algorithm to accelerate the table lookup step in example-based

multiimage photometric stereo. In that step, one must find a pixel of a reference object, of known shape and color,

whose appearance under different illumination fields is similar to that of a given scene pixel. This search reduces to

finding the closest match to a given m-vector in a table with a thousand or more m-vectors. Our method is faster than

previously known solutions for this problem but, unlike some of them, is exact, i.e., always yields the best matching

entry in the table, and does not assume point-like sources. Our solution exploits the fact that the table is in fact a fairly

flat 2-D manifold in m-dimensional space so that the search can be efficiently solved with a uniform 2-D grid structure.

The robust tracking of abrupt motion is a challenging task in computer vision due to its large motion uncertainty. While

various particle filters and conventional Markov-chain Monte Carlo (MCMC) methods have been proposed for visual

tracking, these methods often suffer from the well-known local-trap problem or from poor convergence rate. In this

paper, we propose a novel sampling-based tracking scheme for the abrupt motion problem in the Bayesian filtering

framework. To effectively handle the local-trap problem, we first introduce the stochastic approximation Monte Carlo

(SAMC) sampling method into the Bayesian filter tracking framework, in which the filtering distribution is adaptively

estimated as the sampling proceeds, and thus, a good approximation to the target distribution is achieved. In addition,

we propose a new MCMC sampler with intensive adaptation to further improve the sampling efficiency, which combines

a density-grid-based predictive model with the SAMC sampling, to give a proposal adaptation scheme. The proposed

method is effective and computationally efficient in addressing the abrupt motion problem. We compare our approach

with several alternative tracking algorithms, and extensive experimental results are presented to demonstrate the

effectiveness and the efficiency of the proposed method in dealing with various types of abrupt motions.

This paper introduces a class of adaptive Perona-Malik (PM) diffusion, which combines the PM equation with the heat

equation. The PM equation provides a potential algorithm for image segmentation, noise removal, edge detection, and

image enhancement. However, the traditional PM model tends to cause the staircase effect and create new

features in the processed image. Utilizing the edge indicator as a variable exponent, we can adaptively control the

diffusion mode, which alternates between PM diffusion and Gaussian smoothing in accordance with the image feature.

Computer experiments indicate that the present algorithm is very efficient for edge detection and noise removal.
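
For orientation, one explicit step of classical Perona-Malik diffusion on a 1-D signal can be sketched as below. This shows only the traditional PM model with the common edge-stopping function g(d) = 1 / (1 + (d / kappa)^2); the variable-exponent adaptation proposed in the paper is not reproduced, and `kappa`, `dt`, and the sample signal are assumptions chosen here.

```python
def pm_step(u, kappa=0.1, dt=0.2):
    """One explicit Perona-Malik diffusion step on a 1-D signal.

    kappa: contrast threshold separating noise from edges (assumed value).
    dt: time step (assumed value; small enough for a stable 1-D update).
    Zero-flux boundaries are used at both ends.
    """
    def g(d):
        # Edge-stopping function: close to 1 for small differences, close to 0 at edges.
        return 1.0 / (1.0 + (d / kappa) ** 2)

    n = len(u)
    out = list(u)
    for i in range(n):
        right = u[i + 1] - u[i] if i + 1 < n else 0.0
        left = u[i] - u[i - 1] if i > 0 else 0.0
        out[i] = u[i] + dt * (g(right) * right - g(left) * left)
    return out


# A noisy step edge: small ripples on both sides of a jump from 0 to 1.
signal = [0.0, 0.02, 0.0, 1.0, 1.02, 1.0]
smoothed = pm_step(signal)
```

The small ripples are diffused almost like Gaussian smoothing, while the large jump is left nearly untouched because g is close to zero there.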

Adaptive Perona–Malik Model Based on the Variable Exponent for Image Denoising

Abrupt Motion Tracking Via Intensively Adaptive Markov-Chain Monte Carlo Sampling

Algorithms for the Digital Restoration of Torn Films



EGC 1321

EGC 1323

EGC 1322

This paper presents algorithms for the digital restoration of films damaged by tears. As well as causing local image data

loss, a tear results in a noticeable relative shift in the frame between the regions at either side of the tear boundary. This

paper describes a method for delineating the tear boundary and for correcting the displacement. This is achieved using

a graph-cut segmentation framework that can be either automatic or interactive when automatic segmentation is not

possible. Using temporal intensity differences to form the boundary conditions for the segmentation facilitates the

robust division of the frame. The resulting segmentation map is used to calculate and correct the relative displacement

using a global-motion estimation approach based on motion histograms. A high-quality restoration is obtained when a

suitable missing-data treatment algorithm is used to recover any missing pixel intensities.

Power-line-strike accidents are a major safety threat for low-flying aircraft such as helicopters; thus, an automatic warning

system for power lines is highly desirable. In this paper, we propose an algorithm for detecting power lines from radar

videos from an active millimeter-wave sensor. The Hough transform is employed to detect candidate lines. The major

challenge is that the radar videos are very noisy due to ground return. Noise points can fall on the same line, which

produces signal peaks after the Hough transform similar to those of the actual cable lines. To differentiate the cable lines from the

noise lines, we train a Support Vector Machine to perform the classification. We exploit the Bragg pattern, which is due

to the diffraction of electromagnetic wave on the periodic surface of power lines. We propose a set of features to

represent the Bragg pattern for the classifier. We also propose a slice-processing algorithm which supports parallel

processing, and improves the detection of cables in a cluttered background. Lastly, an adaptive algorithm is proposed

to integrate the detection results from individual frames into a reliable video detection decision, in which temporal

correlation of the cable pattern across frames is used to make the detection more robust. Extensive experiments with

real-world data validated the effectiveness of our cable detection algorithm.
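
The candidate-line detection step relies on the Hough transform, which can be sketched with a small voting accumulator. This is a generic illustration under assumptions made here: the function name, the 1-degree angular resolution, and the synthetic points are hypothetical, and the Bragg-pattern features, the SVM classifier, and the slice processing from the paper are not shown.

```python
import math


def hough_best_line(points, theta_step_deg=1):
    """Return (rho, theta_deg, votes) of the strongest line through `points`.

    Lines are parameterized as rho = x*cos(theta) + y*sin(theta).
    rho is rounded to the nearest integer bin (assumed resolution).
    """
    votes = {}
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            theta = math.radians(theta_deg)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            key = (rho, theta_deg)
            votes[key] = votes.get(key, 0) + 1
    (rho, theta_deg), count = max(votes.items(), key=lambda item: item[1])
    return rho, theta_deg, count


# Ten points on the horizontal line y = 5, plus two isolated noise points.
line_points = [(x, 5) for x in range(0, 100, 10)]
noise_points = [(13, 40), (77, 62)]
best = hough_best_line(line_points + noise_points)
```

Every point on the line y = 5 votes for the same bin (rho = 5, theta = 90 degrees), so that bin collects the peak. Collinear noise points would create a similar peak, which is why the abstract adds a classifier on top of this step.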

Speeded-Up Robust Features is a feature extraction algorithm designed for real-time execution, although this is rarely

achievable on low-power hardware such as that in mobile robots. One way to reduce the computation is to discard some

of the scale-space octaves, and previous research has simply discarded the higher octaves. This paper shows that this

approach is not always the most sensible and presents an algorithm for choosing which octaves to discard based on the

properties of the imagery. Results obtained with this best octaves algorithm show that it is able to achieve a significant

reduction in computation without compromising matching performance.

An Algorithm for Power Line Detection and Warning Based on a Millimeter-Wave Radar Video

An Alternating Minimization Algorithm for Binary Image Restoration

An Algorithm for the Contextual Adaption of SURF Octave Selection with Good Matching Performance: Best Octaves



EGC 1324

EGC 1325

The problem we will consider in this paper is binary image restoration. It is, in essence, difficult to solve because of the

combinatorial nature of the problem. To overcome this difficulty, we propose a new minimization model by making use

of a new variable to enforce the image to be binary. Based on the proposed minimization model, we present a fast

alternating minimization algorithm for binary image restoration. We prove the convergence of the proposed alternating

minimization algorithm. Experimental results show that the proposed method is feasible and effective for binary image

restoration.

In the field of machine vision, camera calibration refers to the experimental determination of a set of parameters that

describe the image formation process for a given analytical model of the machine vision system. Researchers working

with low-cost digital cameras and off-the-shelf lenses generally favor camera calibration techniques that do not rely on

specialized optical equipment, modifications to the hardware, or an a priori knowledge of the vision system. Most of the

commonly used calibration techniques are based on the observation of a single 3-D target or multiple planar (2-D)

targets with a large number of control points. This paper presents a novel calibration technique that offers improved

accuracy, robustness, and efficiency over a wide range of lens distortion. This technique operates by minimizing the

error between the reconstructed image points and their experimentally determined counterparts in "distortion free"

space. This facilitates the incorporation of the exact lens distortion model. In addition, expressing spatial orientation in

terms of unit quaternions greatly enhances the proposed calibration solution by formulating a minimally redundant

system of equations that is free of singularities. Extensive performance benchmarking consisting of both computer

simulation and experiments confirmed higher accuracy in calibration regardless of the amount of lens distortion present

in the optics of the camera. This paper also experimentally confirmed that a comprehensive lens distortion model

including higher order radial and tangential distortion terms improves calibration accuracy.
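
To illustrate why unit quaternions give a compact, singularity-free description of orientation, the sketch below converts a quaternion into a 3x3 rotation matrix using the standard formula. It is a generic illustration, not the paper's calibration procedure; the function name and the (w, x, y, z) component order are assumptions made here.

```python
import math


def quaternion_to_rotation(w, x, y, z):
    """Convert a quaternion (w, x, y, z) into a 3x3 rotation matrix.

    The quaternion is normalized first, so any nonzero input is accepted.
    """
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero quaternion does not represent a rotation")
    w, x, y, z = (c / norm for c in (w, x, y, z))
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


# Rotation by 90 degrees about the z axis.
half_angle = math.radians(90) / 2
rotation = quaternion_to_rotation(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
```

Four parameters with a single unit-norm constraint describe the rotation, compared with nine entries and six constraints for a rotation matrix, and no orientation causes the gimbal-lock singularity seen with Euler angles.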

Ab initio protein structure prediction methods first generate large sets of structural conformations as candidates (called

decoys), and then select the most representative decoys through clustering techniques. Classical clustering methods

are inefficient due to the pairwise distance calculation, and thus become infeasible when the number of decoys is large.

In addition, the existing clustering approaches suffer from the arbitrariness in determining a distance threshold for

proteins within a cluster: a small distance threshold leads to many small clusters, while a large distance threshold

results in the merging of several independent clusters into one cluster. In this paper, we propose an efficient clustering

method that quickly estimates cluster centroids and efficiently prunes rotation spaces. The number of clusters is

automatically detected by information distance criteria. A package named ONION, which can be downloaded freely, is

An Efficient Camera Calibration Technique Offering Robustness and Accuracy over a Wide Range of Lens Distortion

An Efficient Selective Perceptual-Based Super-Resolution Estimator



EGC 1326

EGC 1327

EGC 1328

implemented accordingly. Experimental results on benchmark data sets suggest that ONION is 14 times faster than

existing tools, and ONION obtains better selections for 31 targets and worse selections for 19 targets compared to

SPICKER's selections. On an average PC, ONION can cluster 100,000 decoys in around 12 minutes.

In this correspondence, we present an original energy-based model that achieves the edge-histogram specification of a

real input image and thus extends the exact specification method of the image luminance (or gray level) distribution

recently proposed by Coltuc et al. Our edge-histogram specification approach is stated as an optimization problem in

which each edge of a real input image will tend iteratively toward some specified gradient magnitude values given by a

target edge distribution (or a normalized edge histogram possibly estimated from a target image). To this end, a hybrid

optimization scheme combining a global and deterministic conjugate-gradient-based procedure and a local stochastic

search using the Metropolis criterion is proposed herein to find a reliable solution to our energy-based model.

Experimental results are presented, and several applications follow from this procedure.
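
The Metropolis criterion used in the local stochastic search can be sketched as a simple acceptance test. This is a generic illustration: the function name, the temperature parameter, and the sample values are assumptions made here, and the energy function and the conjugate-gradient stage of the paper are not reproduced.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng=random):
    """Decide whether to accept a candidate move.

    delta_energy: energy of the candidate minus energy of the current state.
    temperature: positive scale controlling how often uphill moves pass (assumed).
    """
    if delta_energy <= 0:
        return True  # downhill or neutral moves are always accepted
    # Uphill moves are accepted with probability exp(-delta / T).
    return rng.random() < math.exp(-delta_energy / temperature)


downhill = metropolis_accept(-1.0, temperature=1.0)
steep_uphill = metropolis_accept(1e9, temperature=1.0)
```

Occasionally accepting uphill moves is what lets the local search escape poor local minima that a purely deterministic descent would get stuck in.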

Split networks are commonly used to visualize collections of bipartitions, also called splits, of a finite set. Such

collections arise, for example, in evolutionary studies. Split networks can be viewed as a generalization of phylogenetic

trees and may be generated using the SplitsTree package. Recently, the NeighborNet method for generating split

networks has become rather popular, in part because it is guaranteed to always generate a circular split system, which

can always be displayed by a planar split network. Even so, labels must be placed on the "outside" of the network,

which might be problematic in some applications. To help circumvent this problem, it can be helpful to consider so-

called flat split systems, which can be displayed by planar split networks where labels are allowed on the inside of the

network too. Here, we present a new algorithm that is guaranteed to compute a minimal planar split network displaying a

flat split system in polynomial time, provided the split system is given in a certain format. We will also briefly discuss

two heuristics that could be useful for analyzing phylogeographic data and that allow the computation of flat split

systems in this format in polynomial time.

We propose a novel online learning-based framework for occlusion boundary detection in video sequences. This

approach does not require any prior training and instead "learns" occlusion boundaries by updating a set of weights for

the online learning Hedge algorithm at each frame instance. Whereas previous training-based methods perform well only

on data similar to the trained examples, the proposed method is well suited for any video sequence. We demonstrate the

An Energy-Based Model for the Image Edge-Histogram Specification Problem

An Investigation of Dehazing Effects on Image and Video Coding

An Online Learning Approach to Occlusion Boundary Detection



EGC

1329

EGC

1331

EGC

1330

performance of the proposed detector both for the CMU data set, which includes hand-labeled occlusion boundaries,

and for a novel video sequence. In addition to occlusion boundary detection, the proposed algorithm is capable of

classifying occlusion boundaries by angle and by whether the occluding object is covering or uncovering the

background.
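
The Hedge algorithm referenced in this abstract maintains one weight per expert and shrinks each weight according to the loss that expert incurred. A minimal sketch of a single update step, assuming losses in [0, 1] and a hypothetical discount `beta` (how the paper maps occlusion cues to experts and losses is not specified here):

```python
def hedge_update(weights, losses, beta=0.9):
    """One Hedge step: w_i <- w_i * beta**loss_i, then renormalize to sum to 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    if len(weights) != len(losses):
        raise ValueError("weights and losses must have the same length")
    updated = [w * beta ** loss for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]


# Two experts start equal; the second one incurs the maximum loss on this frame.
weights = hedge_update([0.5, 0.5], [0.0, 1.0], beta=0.5)
print(weights)
```

Repeating the update at every frame lets the better-performing experts dominate without any offline training set.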

To recover a sharp version from a blurred image is a long-standing inverse problem. In this paper, we analyze the

research on this topic both theoretically and experimentally through three paradigms: 1) the deterministic filter; 2)

Bayesian estimation; and 3) the conjunctive deblurring algorithm (CODA), which performs the deterministic filter and

Bayesian estimation in a conjunctive manner. We point out the weaknesses of the deterministic filter and unify the

limitation latent in two kinds of Bayesian estimators. We further explain why the CODA is able to handle quite large blurs

beyond Bayesian estimation. Finally, we propose a novel method to overcome several unreported limitations of the

CODA. Although extensive experiments demonstrate that our method outperforms state-of-the-art methods with a large

margin, some common problems of image deblurring still remain unsolved and should attract further research efforts.

Compressed-sensing methodology typically employs random projections simultaneously with signal acquisition to

accomplish dimensionality reduction within a sensor device. The effect of such random projections on the preservation

of anomalous data is investigated. The popular RX anomaly detector is derived for the case in which global anomalies

are to be identified directly in the random-projection domain, and it is determined via both random simulation and

empirical observation that strongly anomalous vectors are likely to be identifiable by the projection-domain RX detector

even in low-dimensional projections. Finally, a reconstruction procedure for hyperspectral imagery is developed

wherein projection-domain anomaly detection is employed to partition the data set, permitting anomaly and normal pixel

classes to be separately reconstructed in order to improve the representation of the anomaly pixels.
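
To make the projection-domain RX idea concrete, the sketch below projects vectors with a random Gaussian matrix and scores each projected point by its squared Mahalanobis distance to the global mean. It is a simplified illustration under stated assumptions: the projection dimension is fixed at 2 so the covariance can be inverted in closed form, and all function names are hypothetical.

```python
import random


def random_projection(vectors, k, seed=0):
    """Project each vector onto k random Gaussian directions."""
    rng = random.Random(seed)
    n = len(vectors[0])
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    return [[sum(p * x for p, x in zip(row, v)) for row in rows] for v in vectors]


def rx_scores_2d(points):
    """Global RX score of each 2-D point: squared Mahalanobis distance to the mean."""
    m = len(points)
    mx = sum(p[0] for p in points) / m
    my = sum(p[1] for p in points) / m
    sxx = sum((p[0] - mx) ** 2 for p in points) / m
    syy = sum((p[1] - my) ** 2 for p in points) / m
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / m
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    scores = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        scores.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores


# Toy data already in two dimensions: the last point lies far from the rest.
scores = rx_scores_2d([(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)])
print(scores.index(max(scores)))
```

In practice, `random_projection(vectors, k=2)` would supply the 2-D points, and pixels whose score exceeds a chosen threshold would form the anomaly class that is reconstructed separately.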

Learning a satisfactory object detector generally requires sufficient training data to cover the most variations of the

object. In this paper, we show that the performance of an object detector is severely degraded when training examples are

limited. We propose an approach to handle this issue by exploring a set of pretrained auxiliary detectors for other

categories. By mining the global and local relationships between the target object category and auxiliary objects, a

robust detector can be learned with very few training examples. We adopt the deformable part model proposed by

Felzenszwalb and simultaneously explore the root and part filters in the auxiliary object detectors under the guidance of

the few training examples from the target object category. An iterative solution is introduced for such a process. The

Analyzing Image Deblurring Through Three Paradigms

Assemble New Object Detector with Few Examples

Anomaly Detection and Reconstruction from Random Projections



EGC

1332

EGC

1333

EGC

1334

extensive experiments on the PASCAL VOC 2007 challenge data set show the encouraging performance of the new

detector assembled from those related auxiliary detectors.

A new fully automatic object tracking and segmentation framework is proposed. The framework consists of a motion-

based bootstrapping algorithm concurrent to a shape-based active contour. The shape-based active contour uses finite

shape memory that is automatically and continuously built from both the bootstrap process and the active-contour

object tracker. A scheme is proposed to ensure that the finite shape memory is continuously updated but forgets

unnecessary information. Two new ways of automatically extracting shape information from image data given a region

of interest are also proposed. Results demonstrate that the bootstrapping stage provides important motion and shape

information to the object tracker. This information is found to be essential for good (fully automatic) initialization of the

active contour. Further results also demonstrate convergence properties of the content of the finite shape memory and

similar object tracking performance in comparison with an object tracker with unlimited shape memory. Tests with an

active contour using a fixed-shape prior also demonstrate superior performance for the proposed bootstrapped finite-

shape-memory framework and similar performance when compared with a recently proposed active contour that uses

an alternative online learning model.

This paper addresses the automatic image segmentation problem in a region merging style. With an initially

oversegmented image, in which many regions (or superpixels) with homogeneous color are detected, an image

segmentation is performed by iteratively merging the regions according to a statistical test. There are two essential

issues in a region-merging algorithm: order of merging and the stopping criterion. In the proposed algorithm, these two

issues are solved by a novel predicate, which is defined by the sequential probability ratio test and the minimal cost

criterion. Starting from an oversegmented image, neighboring regions are progressively merged if there is evidence

for merging according to this predicate. We show that the merging order follows the principle of dynamic programming.

This formulates the image segmentation as an inference problem, where the final segmentation is established based on

the observed image. We also prove that the produced segmentation satisfies certain global properties. In addition, a

faster algorithm is developed to accelerate the region-merging process, which maintains a nearest neighbor graph in

each iteration. Experiments on real natural images are conducted to demonstrate the performance of the proposed

dynamic region-merging algorithm.
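
The merging predicate above builds on the sequential probability ratio test (SPRT). The following sketch shows Wald's generic SPRT decision loop only; the error rates `alpha` and `beta`, the labels, and the way log-likelihood ratios would be computed from region statistics are assumptions for illustration and are not taken from the paper.

```python
import math


def sprt(log_likelihood_ratios, alpha=0.05, beta=0.05):
    """Wald's SPRT: accumulate log-likelihood ratios until a bound is crossed.

    Returns a (decision, samples_used) pair, where decision is one of
    "accept_h1", "accept_h0", or "undecided".
    """
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    total = 0.0
    used = 0
    for used, llr in enumerate(log_likelihood_ratios, start=1):
        total += llr
        if total >= upper:
            return "accept_h1", used
        if total <= lower:
            return "accept_h0", used
    return "undecided", used


print(sprt([1.0, 1.0, 1.0, 1.0]))
print(sprt([0.1, -0.1]))
```

In a region-merging setting, one hypothesis would presumably correspond to "the two regions belong together" and the other to "they do not", with each new cue contributing one log-likelihood ratio.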

Automatic Bootstrapping and Tracking of Object Contours

Automatic Image Equalization and Contrast Enhancement Using Gaussian Mixture

Modeling

Bayesian Estimation for Optimized Structured Illumination Microscopy



EGC

1335

EGC

1336

Structured illumination microscopy is a recent imaging technique that aims at going beyond the classical optical

resolution by reconstructing high-resolution (HR) images from low-resolution (LR) images acquired through modulation

of the transfer function of the microscope. The classical implementation has a number of drawbacks, such as requiring a

large number of images to be acquired and parameters to be manually set in an ad-hoc manner that have, until now,

hampered its wide dissemination. Here, we present a new framework based on a Bayesian inverse problem formulation

approach that enables the computation of one HR image from a reduced number of LR images and has no specific

constraints on the modulation. Moreover, it permits automatic estimation of the optimal reconstruction

hyperparameters and computation of an uncertainty bound on the estimated values. We demonstrate through numerical

evaluations on simulated data and examples on real microscopy data that our approach represents a decisive advance

for a wider use of HR microscopy through structured illumination.

A hierarchical Bayesian model is considered for decomposing a matrix into low-rank and sparse components,

assuming the observed matrix is a superposition of the two. The matrix is assumed noisy, with unknown and possibly

non-stationary noise statistics. The Bayesian framework infers an approximate representation for the noise statistics

while simultaneously inferring the low-rank and sparse-outlier contributions; the model is robust to a broad range of

noise levels, without having to change model hyperparameter settings. In addition, the Bayesian framework allows

exploitation of additional structure in the matrix. For example, in video applications each row (or column) corresponds

to a video frame, and we introduce a Markov dependency between consecutive rows in the matrix (corresponding to

consecutive frames in the video). The properties of this Markov process are also inferred based on the observed matrix,

while simultaneously denoising and recovering the low-rank and sparse components. We compare the Bayesian model

to a state-of-the-art optimization-based implementation of robust PCA; considering several examples, we demonstrate

competitive performance of the proposed model.

As a newly developed 2-D extension of the wavelet transform using multiscale and directional filter banks, the contourlet

transform can effectively capture the intrinsic geometric structures and smooth contours of a texture image that are the

dominant features for texture classification. In this paper, we propose a novel Bayesian texture classifier based on the

adaptive model-selection learning of Poisson mixtures on the contourlet features of texture images. The adaptive model-

selection learning of Poisson mixtures is carried out by the recently established adaptive gradient Bayesian Ying-Yang

harmony learning algorithm for Poisson mixtures. It is demonstrated by the experiments that our proposed Bayesian

Bayesian Robust Principal Component Analysis

Bayesian Texture Classification Based on Contourlet Transform and BYY Harmony

Learning of Poisson Mixtures



EGC

1337

EGC

1338

EGC

1339

classifier significantly improves the texture classification accuracy in comparison with several current state-of-the-art

texture classification approaches.

It is difficult to directly apply existing binarization approaches to the barcode images captured by mobile devices due to

their low quality. This paper proposes a novel scheme for the binarization of such images. The barcode and

background regions are differentiated by the number of edge pixels in a search window. Unlike existing approaches

that center the pixel to be binarized with a window of fixed size, we propose to shift the window center to the nearest

edge pixel so that the balance of the number of object and background pixels can be achieved. The window size is

adaptive either to the minimum distance to edges or minimum element width in the barcode. The threshold is

calculated using the statistics in the window. Our proposed method has demonstrated its capability in handling the

nonuniform illumination problem and the size variation of objects. Experimental results conducted on 350 images

captured by five mobile phones achieve about 100% of recognition rate in good lighting conditions, and about 95% and

83% in bad lighting conditions. Comparisons made with nine existing binarization methods demonstrate the

advancement of our proposed scheme.

Halftone-dot orientation modulation has recently been proposed as a method for data hiding in printed images.

Extraction of data embedded with halftone orientation modulation is accomplished by computing, from the scanned

hardcopy image, detection statistics that uniquely identify the embedded orientation. From a communications

perspective, this data hiding setup forms an interesting class of channels with dot orientation as input and a vector of

statistics as the output. This paper derives capacity expressions for these channels that allow for numerical evaluation

of the capacity. Results provide significant insight for orientation modulation based print-scan resilient data hiding: the

capacity varies significantly as a function of the image gray level and experimentally observed error free data rates

closely mirror the variation in capacity.

Color constancy algorithms are generally based on the simplifying assumption that the spectral distribution of a light

source is uniform across scenes. However, in reality, this assumption is often violated due to the presence of multiple

light sources. In this paper, we will address more realistic scenarios where the uniform light-source assumption is too

restrictive. First, a methodology is proposed to extend existing algorithms by applying color constancy locally to image

patches, rather than globally to the entire image. After local (patch-based) illuminant estimation, these estimates are

combined into more robust estimations, and a local correction is applied based on a modified diagonal model.

Binarization of Low-Quality Barcode Images Captured by Mobile Phones Using Local

Window of Adaptive Location and Size

Capacity Analysis for Orthogonal Halftone Orientation Modulation Channels

Color Constancy for Multiple Light Sources



EGC

1341

EGC

1340

Quantitative and qualitative experiments on spectral and real images show that the proposed methodology reduces the

influence of two light sources simultaneously present in one scene. If the chromatic difference between these two

illuminants is more than 1°, the proposed framework outperforms algorithms based on the uniform light-source

assumption (with error-reduction up to approximately 30%). Otherwise, when the chromatic difference is less than 1°

and the scene can be considered to contain one (approximately) uniform light source, the performance of the proposed

framework is similar to that of global color constancy methods.
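
As a simplified illustration of patch-based correction, the sketch below estimates the illuminant of one patch with the grey-world assumption and then applies a plain diagonal (von Kries style) correction. Grey-world is only one of the existing algorithms that could be applied locally, and the paper uses a modified diagonal model, so the function names and the exact correction are assumptions.

```python
def grey_world_illuminant(pixels):
    """Estimate a patch illuminant as the unit-norm mean RGB of its pixels."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    norm = sum(m * m for m in means) ** 0.5
    if norm == 0:
        raise ValueError("patch is completely black")
    return [m / norm for m in means]


def diagonal_correct(pixel, illuminant):
    """Scale each channel so the estimated illuminant maps to neutral gray.

    Assumes every illuminant component is nonzero.
    """
    neutral = 1.0 / 3.0 ** 0.5
    return [v * neutral / e for v, e in zip(pixel, illuminant)]


# A reddish patch: after correction the average color becomes achromatic.
patch = [(200, 100, 100), (100, 50, 50)]
illuminant = grey_world_illuminant(patch)
print(diagonal_correct(patch[0], illuminant))
```

Running this per patch, then smoothing or combining neighboring estimates before correcting, gives the general shape of a local method for scenes with more than one light source.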

Head pose and eye location for gaze estimation have been separately studied in numerous works in the literature.

Previous research shows that satisfactory accuracy in head pose and eye location estimation can be achieved in

constrained settings. However, in the presence of nonfrontal faces, eye locators are not adequate to accurately locate the

center of the eyes. On the other hand, head pose estimation techniques are able to deal with these conditions; hence,

they may be suited to enhance the accuracy of eye localization. Therefore, in this paper, a hybrid scheme is proposed to

combine head pose and eye location information to obtain enhanced gaze estimation. To this end, the transformation

matrix obtained from the head pose is used to normalize the eye regions, and in turn, the transformation matrix

generated by the found eye location is used to correct the pose estimation procedure. The scheme is designed to

enhance the accuracy of eye location estimations, particularly in low-resolution videos, to extend the operative range of

the eye locators, and to improve the accuracy of the head pose tracker. These enhanced estimations are then combined

to obtain a novel visual gaze estimation system, which uses both eye location and head information to refine the gaze

estimates. From the experimental results, it can be derived that the proposed unified scheme improves the accuracy of

eye estimations by 16% to 23%. Furthermore, it considerably extends its operating range by more than 15° by

overcoming the problems introduced by extreme head poses. Moreover, the accuracy of the head pose tracker is

improved by 12% to 24%. Finally, the experimentation on the proposed combined gaze estimation system shows that it

is accurate (with a mean error between 2° and 5°) and that it can be used in cases where classic approaches would fail

without imposing restraints on the position of the head.

The aim of this paper is to describe a novel and completely automated technique for carotid artery (CA) recognition, far

(distal) wall segmentation, and intima-media thickness (IMT) measurement, which is a strong clinical tool for risk

assessment for cardiovascular diseases. The architecture of completely automated multiresolution edge snapper

(CAMES) consists of the following two stages: 1) automated CA recognition based on a combination of scale-space and

statistical classification in a multiresolution framework and 2) automated segmentation of lumen-intima (LI) and media-

adventitia (MA) interfaces for the far (distal) wall and IMT measurement. Our database of 365 B-mode longitudinal carotid

images is taken from four different institutions covering different ethnic backgrounds. The ground-truth (GT) database

Combining Head Pose and Eye Location Information for Gaze Estimation

Completely Automated Multiresolution Edge Snapper—A New Technique for an Accurate

Carotid Ultrasound IMT Measurement: Clinical Validation and Benchmarking on a Multi-

Institutional Database



EGC

1343

EGC

1342

was the average manual segmentation from three clinical experts. The mean distance ± standard deviation of CAMES

with respect to GT profiles for LI and MA interfaces were 0.081 ± 0.099 and 0.082 ± 0.197 mm, respectively. The IMT

measurement error between CAMES and GT was 0.078 ± 0.112 mm. CAMES was benchmarked against a previously

developed automated technique based on an integrated approach using feature-based extraction and classifier (CALEX).

Although CAMES underestimated the IMT value, it showed a strong improvement in segmentation errors against

CALEX for LI and MA interfaces by 8% and 42%, respectively. The overall IMT measurement bias for CAMES improved

by 36% against CALEX. Finally, this paper demonstrated that the figure-of-merit of CAMES was 95.8% compared with

87.4% for CALEX. The combination of multiresolution CA recognition and far-wall segmentation led to an automated,

low-complexity, real-time, and accurate technique for carotid IMT measurement. Validation on a multiethnic/multi-

institutional data set demonstrated the robustness of the technique, which can constitute a clinically valid IMT

measurement for assistance in atherosclerosis disease management.

A computational camera uses a combination of optics and processing to produce images that cannot be captured with

traditional cameras. In the last decade, computational imaging has emerged as a vibrant field of research. A wide variety

of computational cameras has been demonstrated to encode more useful visual information in the captured images, as

compared with conventional cameras. In this paper, we survey computational cameras from two perspectives. First, we

present a taxonomy of computational camera designs according to the coding approaches, including object side coding,

pupil plane coding, sensor side coding, illumination coding, camera arrays and clusters, and unconventional imaging

systems. Second, we use the abstract notion of light field representation as a general tool to describe computational

camera designs, where each camera can be formulated as a projection of a high-dimensional light field to a 2-D image

sensor. We show how individual optical devices transform light fields and use these transforms to illustrate how

different computational camera designs (collections of optical devices) capture and encode useful visual information.

A state-of-the-art progressive source encoder is combined with a concatenated block coding mechanism to produce a

robust source transmission system for embedded bit streams. The proposed scheme efficiently trades off the available

total bit budget between information bits and parity bits through efficient information block size adjustment,

concatenated block coding, and random block interleavers. The objective is to create embedded codewords such that,

for a particular information block, the necessary protection is obtained via multiple channel encodings, contrary to the

conventional methods that use a single code rate per information block. This way, a more flexible protection scheme is

obtained. The information block size and concatenated coding rates are judiciously chosen to maximize system

performance, subject to a total bit budget. The set of codes is usually created by puncturing a low-rate mother code so

that a single encoder-decoder pair is used. The proposed scheme is shown to effectively enlarge this code set by

providing more protection levels than is possible using the code rate set directly. At the expense of complexity, average

system performance is shown to be significantly better than that of several known comparison systems, particularly at

Computational Cameras: Convergence of Optics and Processing

Concatenated Block Codes for Unequal Error Protection of Embedded Bit Streams



EGC

1344

EGC

1345

EGC

1346

higher channel bit error rates.

Ink-jet print attributes such as color gamut, grain, and cost are consequences of the materials and printing technology

used and of choices made during color management, color separation, and halftoning operation. Traditionally, color

separation determines what amounts of the available inks to use for each reproducible color, and halftoning deals with

the spatial distribution of inks that also results in the nature of their overprinting. However, using an ink space as a

means of communication between color separation and halftoning gives access only to some of the printed patterns that

a printing system is capable of and, therefore, only to a reduced range of print attributes. Here, a method, i.e., Halftone

Area Neugebauer Separation, is proposed to gain access to all possible printable patterns by specifying relative area

coverages of a printing system's Neugebauer primaries instead of only ink amounts. This results in delivering prints

with more optimal attributes (e.g., using less ink and giving rise to a larger color gamut) than is possible using current

methods.

Wavelets with composite dilations provide a general framework for the construction of waveforms defined not only at

various scales and locations, as traditional wavelets, but also at various orientations and with different scaling factors in

each coordinate. As a result, they are useful to analyze the geometric information that often dominates multidimensional

data much more efficiently than traditional wavelets. The shearlet system, for example, is a particular well-known

realization of this framework, which provides optimally sparse representations of images with edges. In this paper, we

further investigate the constructions derived from this approach to develop critically sampled wavelets with composite

dilations for the purpose of image coding. Not only do we show that many nonredundant directional constructions

recently introduced in the literature can be derived within this setting, but we also introduce new critically sampled

discrete transforms that achieve much better nonlinear approximation rates than traditional discrete wavelet transforms

and outperform the other critically sampled multiscale transforms recently proposed.

The discrete Radon transform (DRT) was defined by Averbuch et al. as an analog of the continuous Radon transform for

discrete data. Both the DRT and its inverse are computable in O(n^2 log n) operations for images of size n × n. In this paper,

we demonstrate the applicability of the inverse DRT for the reconstruction of a 2-D object from its continuous

projections. The DRT and its inverse are shown to model accurately the continuum as the number of samples increases.

Controlling Ink-Jet Print Attributes Via Neugebauer Primary Area Coverages

Critically Sampled Wavelets with Composite Dilations

CT Reconstruction from Parallel and Fan-Beam Projections by a 2-D Discrete Radon

Transform



EGC

1347

EGC

1349

EGC

1348

Numerical results for the reconstruction from parallel projections are presented. We also show that the inverse DRT can

be used for reconstruction from fan-beam projections with equispaced detectors.

This paper presents a novel approach for depth video enhancement. Given a high-resolution color video and its

corresponding low-quality depth video, we improve the quality of the depth video by increasing its resolution and

suppressing noise. For that, a weighted mode filtering method is proposed based on a joint histogram. When the

histogram is generated, the weight based on color similarity between reference and neighboring pixels on the color

image is computed and then used for counting each bin on the joint histogram of the depth map. A final solution is

determined by seeking a global mode on the histogram. We show that the proposed method provides the optimal

solution with respect to L1 norm minimization. For a temporally consistent estimate of the depth video, we extend this

method into temporally neighboring frames. Simple optical flow estimation and patch similarity measure are used for

obtaining the high-quality depth video in an efficient manner. Experimental results show that the proposed method has

outstanding performance and is very efficient, compared with existing methods. We also show that the temporally

consistent enhancement of depth video addresses a flickering problem and improves the accuracy of depth video.
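
The core of weighted mode filtering can be sketched for a single pixel: each neighbor votes for its own depth value in a histogram, with a vote weight that decays as its color departs from the color of the reference pixel, and the output is the histogram mode. This is a reduced version under stated assumptions (grayscale guidance image, integer depth bins, no spatial weight, no spreading of votes across bins); the parameter names are hypothetical.

```python
import math


def weighted_mode_filter(depth, color, y, x, radius=1, sigma_c=10.0, bins=256):
    """Return the filtered depth at (y, x) as the mode of a color-weighted histogram.

    depth : 2-D list of integer depth values in [0, bins)
    color : 2-D list of gray levels used as the guidance image
    """
    height, width = len(depth), len(depth[0])
    histogram = [0.0] * bins
    for j in range(max(0, y - radius), min(height, y + radius + 1)):
        for i in range(max(0, x - radius), min(width, x + radius + 1)):
            diff = color[j][i] - color[y][x]
            weight = math.exp(-(diff * diff) / (2.0 * sigma_c ** 2))
            histogram[depth[j][i]] += weight
    return max(range(bins), key=histogram.__getitem__)


# The center pixel shares its color with the left region, so it takes that depth.
depth = [[10, 10, 200], [10, 10, 200], [10, 200, 200]]
color = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
print(weighted_mode_filter(depth, color, 1, 1))
```

Taking the mode rather than a weighted average avoids blending foreground and background depths at object boundaries, which is the usual motivation for this family of filters.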

Traditionally, subpixel interpolation in stereo-vision systems was designed for the block-matching algorithm. During the

evaluation of different interpolation strategies, a strong correlation was observed between the type of the stereo

algorithm and the subpixel accuracy of the different solutions. Subpixel interpolation should be adapted to each stereo

algorithm to achieve maximum accuracy. In consequence, it is more important to propose methodologies for

interpolation function generation than specific function shapes. We propose two such methodologies based on data

generated by the stereo algorithms. The first proposal uses a histogram to model the environment and applies

histogram equalization to an existing solution adapting it to the data. The second proposal employs synthetic images of

a known environment and applies function fitting to the resulting data. The resulting function matches the algorithm and

the data as closely as possible. An extensive evaluation set is used to validate the findings. Both real and synthetic test

cases were employed in different scenarios. The test results are consistent and show significant improvements

compared with traditional solutions.

Depth Video Enhancement Based on Weighted Mode Filtering

Discretization of Parametrizable Signal Manifolds

Design of Interpolation Functions for Subpixel-Accuracy Stereo-Vision Systems



EGC

1350

EGC

1351

Transformation-invariant analysis of signals often requires the computation of the distance from a test pattern to a

transformation manifold. In particular, the estimation of the distances between a transformed query signal and several

transformation manifolds representing different classes provides essential information for the classification of the

signal. In many applications, the computation of the exact distance to the manifold is costly, whereas an efficient

practical solution is the approximation of the manifold distance with the aid of a manifold grid. In this paper, we consider

a setting with transformation manifolds of known parameterization. We first present an algorithm for the selection of

samples from a single manifold that minimizes the average error in the manifold distance estimation. Then we

propose a method for the joint discretization of multiple manifolds that represent different signal classes, where we

optimize the transformation-invariant classification accuracy yielded by the discrete manifold representation.

Experimental results show that sampling each manifold individually by minimizing the manifold distance estimation

error outperforms baseline sampling solutions with respect to registration and classification accuracy. Performing an

additional joint optimization on all samples improves the classification performance further. Moreover, given a fixed total

number of samples to be selected from all manifolds, an asymmetric distribution of samples to different manifolds

depending on their geometric structures may also increase the classification accuracy in comparison with the equal

distribution of samples.

Tracking low-resolution (LR) targets is a practical yet quite challenging problem in real video analysis applications. Lack

of discriminative details in the visual appearance of the LR target leads to the matching ambiguity, which confronts most

existing tracking methods. Although artificially enhancing the video resolution by super resolution (SR) techniques

before analyzing might be an option, the high demand of computational cost can hardly meet the requirements of the

tracking scenario. This paper presents a novel solution to track LR targets without explicitly performing SR. This new

approach is based on discriminative metric preservation that preserves the data affinity structure in the high-resolution

(HR) feature space for effective and efficient matching of LR images. In addition, we substantialize this new approach in

a solid case study of differential tracking under metric preservation and derive a closed-form solution to motion

estimation for LR video. In addition, this paper extends the basic linear metric preservation method to a more powerful

nonlinear kernel metric preservation method. Such a solution to LR target tracking is discriminative, robust, and

efficient. Extensive experiments validate the entrustments and effectiveness of the proposed approach and demonstrate

the improved performance of the proposed method in tracking LR targets.

For economic reasons, most digital cameras use color filter arrays instead of beam splitters to capture image data. As a

result of this, only one of the required three color samples becomes available at each pixel location and the other two

need to be interpolated. This process is called Color Filter Array (CFA) interpolation or demosaicing. Many demosaicing

Discriminative Metric Preservation for Tracking Low-Resolution Targets

Edge Strength Filter Based Color Filter Array Interpolation



EGC

1352

EGC

1353

EGC

1354

algorithms have been introduced over the years to improve subjective and objective interpolation quality. We propose

an orientation-free edge strength filter and apply it to the demosaicing problem. Edge strength filter output is utilized

both to improve the initial green channel interpolation and to apply the constant color difference rule adaptively. This

simple edge directed method yields visually pleasing results with high CPSNR.
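
The constant color difference rule mentioned above assumes that the difference between two color channels varies slowly, so a missing red value can be estimated from the local green value plus the average red-minus-green difference at nearby red sites. The sketch below shows only this non-adaptive rule; the edge-strength weighting that the paper adds is not modeled, and the function name is hypothetical.

```python
def red_from_color_difference(green_here, neighbor_red, neighbor_green):
    """Estimate missing red as green plus the mean (R - G) of neighboring red sites.

    neighbor_red   : red samples at nearby pixels where red was measured
    neighbor_green : green values (measured or interpolated) at those same pixels
    """
    if not neighbor_red or len(neighbor_red) != len(neighbor_green):
        raise ValueError("need matching, non-empty neighbor lists")
    differences = [r - g for r, g in zip(neighbor_red, neighbor_green)]
    return green_here + sum(differences) / len(differences)


# Both neighbors have R - G = 40, so the estimate is the local green plus 40.
print(red_from_color_difference(120, [140, 150], [100, 110]))
```

An adaptive variant would weight each neighbor's difference, for example giving less influence to neighbors that lie across a strong edge.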

We propose an incremental self-tuning particle filtering (ISPF) framework for visual tracking on the affine group, which

can find the optimal state in a chainlike way with a very small number of particles. Unlike traditional particle filtering,

which only relies on random sampling for state optimization, ISPF incrementally draws particles and utilizes an online-

learned pose estimator (PE) to iteratively tune them to their neighboring best states according to some feedback

appearance-similarity scores. Sampling is terminated if the maximum similarity of all tuned particles satisfies a target-

patch similarity distribution modeled online or if the permitted maximum number of particles is reached. With the help of

the learned PE and some appearance-similarity feedback scores, particles in ISPF become "smart" and can

automatically move toward the correct directions; thus, sparse sampling is possible. The optimal state can be efficiently

found in a step-by-step way in which some particles serve as bridge nodes to help others to reach the optimal state. In

addition to the single-target scenario, the "smart" particle idea is also extended into a multitarget tracking problem.

Experimental results demonstrate that our ISPF can achieve great robustness and very high accuracy with only a very

small number of particles.

We present a novel method to perform an accurate registration of 3-D nonrigid bodies by using phase-shift properties of

the dual-tree complex wavelet transform (DT-CWT). Since the phases of DT-CWT coefficients change

approximately linearly with the amount of feature displacement in the spatial domain, motion can be estimated using the

phase information from these coefficients. The motion estimation is performed iteratively: first by using coarser level

complex coefficients to determine large motion components and then by employing finer level coefficients to refine the

motion field. We use a parametric affine model to describe the motion, where the affine parameters are found locally by

substituting into an optical flow model and by solving the resulting overdetermined set of equations. From the

estimated affine parameters, the motion field between the sensed and the reference data sets can be generated, and the

sensed data set can then be shifted and interpolated spatially to align with the reference data set.

Efficient Object Tracking by Incremental Self-Tuning Particle Filtering on the Affine Group

Efficient Registration of Nonrigid 3-D Bodies

Eye-Tracking Database for a Set of Standard Video Sequences



EGC

1355

EGC

1356

This correspondence describes a publicly available database of eye-tracking data, collected on a set of standard video

sequences that are frequently used in video compression, processing, and transmission simulations. A unique feature

of this database is that it contains eye-tracking data for both the first and second viewings of the sequence. We have

made available the uncompressed video sequences and the raw eye-tracking data for each sequence, along with

different visualizations of the data and a preliminary analysis based on two well-known visual attention models.

In this paper, we propose a fast algorithm that efficiently selects the temporal prediction type for the dyadic hierarchical-

B prediction structure in the H.264/MPEG-4 temporal scalable video coding (SVC). We make use of the strong

correlations in prediction type inheritance to eliminate the superfluous computations for the bi-directional (BI) prediction

in the finer partitions, 16 × 8/8 × 16/8 × 8, by referring to the best temporal prediction type of 16 × 16. In addition, we

carefully examine the relationship in motion bit-rate costs and distortions between the BI and the uni-directional

temporal prediction types. As a result, we construct a set of adaptive thresholds to remove the unnecessary BI

calculations. Moreover, for the block partitions smaller than 8 × 8, either the forward prediction (FW) or the backward

prediction (BW) is skipped based upon the information of their 8 × 8 partitions. Hence, the proposed schemes can

efficiently reduce the extensive computational burden in calculating the BI prediction. As compared to the JSVM 9.11

software, our method reduces the encoding time by 48% to 67% for a large variety of test videos over a wide range of

coding bit-rates and has only a minor coding performance loss.

Difference images quantify changes in the object scene over time. In this paper, we use the feature-specific imaging

paradigm to present methods for estimating a sequence of difference images from a sequence of compressive

measurements of the object scene. Our goal is twofold. First is to design, where possible, the optimal sensing matrix for

taking compressive measurements. In scenarios where such sensing matrices are not tractable, we consider plausible

candidate sensing matrices that either use the available a priori information or are nonadaptive. Second, we develop

closed-form and iterative techniques for estimating the difference images. We specifically look at l2- and l1-based

methods. We show that l2-based techniques can directly estimate the difference image from the measurements without

first reconstructing the object scene. This direct estimation exploits the spatial and temporal correlations between the

object scene at two consecutive time instants. We further develop a method to estimate a generalized difference image

from multiple measurements and use it to estimate the sequence of difference images. For l1-based estimation, we

consider modified forms of the total-variation method and basis pursuit denoising. We also look at a third method that

directly exploits the sparsity of the difference image. We present results to show the efficacy of these techniques and

discuss the advantages of each.
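
The reason a difference image can be estimated without first reconstructing the scene follows from linearity, which the sketch below illustrates. It assumes noise-free measurements and the same sensing matrix at both time instants; the names `measure` and `difference` are hypothetical, and the estimators themselves are not reproduced here.

```python
def measure(sensing_matrix, scene):
    """Compressive measurement y = Phi x for a scene given as a flat list."""
    return [sum(w * x for w, x in zip(row, scene)) for row in sensing_matrix]


def difference(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


# Two consecutive scenes (4 pixels) and a 2 x 4 sensing matrix.
phi = [[1, 0, 1, 0],
       [0, 1, 1, 1]]
scene_t1 = [3, 5, 2, 7]
scene_t2 = [3, 6, 2, 4]

# Differencing the measurements gives the measurement of the difference image.
delta_y = difference(measure(phi, scene_t2), measure(phi, scene_t1))
direct = measure(phi, difference(scene_t2, scene_t1))
```

Under these assumptions `delta_y` and `direct` are identical, so any estimator can be applied to `delta_y` to target the difference image alone.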

Fast Bi-Directional Prediction Selection in H.264/MPEG-4 AVC Temporal Scalable Video

Coding

Feature-Specific Difference Imaging



EGC

1358

EGC

1359

EGC

1357

We consider novel phylogenetic models with rate matrices that arise via the embedding of a progenitor model on a small

number of character states, into a target model on a larger number of character states. Adapting representation-

theoretic results from recent investigations of Markov invariants for the general rate matrix model, we give a prescription

for identifying and counting Markov invariants for such "symmetric embedded" models, and we provide enumerations of

these for the first few cases with a small number of character states. The simplest example is a target model on three

states, constructed from a general 2 state model; the "2 ↪ 3" embedding. We show that for 2 taxa, there

exist two invariants of quadratic degree that can be used to directly infer pairwise distances from observed sequences

under this model. A simple simulation study verifies their theoretical expected values, and suggests that, given the

appropriateness of the model class, they have statistical properties superior to those of the standard (log) Det invariant (which

is of cubic degree for this case).

Synthetic aperture radar (SAR) imaging suffers from image focus degradation in the presence of phase errors in the

received signal due to unknown platform motion or signal propagation delays. We present a new autofocus algorithm,

termed Fourier-domain multichannel autofocus (FMCA) that is derived under a linear algebraic framework, allowing the

SAR image to be focused in a noniterative fashion. Motivated by the multichannel autofocus (MCA) approach, the

proposed autofocus algorithm invokes the assumption of a low-return region, which generally is provided within the

antenna sidelobes. Unlike MCA, FMCA works with the collected polar Fourier data directly and is capable of

accommodating wide-angle monostatic SAR and bistatic SAR scenarios. Most previous SAR autofocus algorithms rely

on the prior assumption that the radar's range of look angles is small so that the phase errors can be modeled as varying

along only one dimension in the collected Fourier data. And, in some cases, implicit assumptions are made regarding

the SAR scene. Performance of such autofocus algorithms degrades if the assumptions are not satisfied. The proposed

algorithm has the advantage that it does not require prior assumptions about the range of look angles, nor

characteristics of the scene.

Fourier-Domain Multichannel Autofocus for Synthetic Aperture Radar

Framelet-Based Blind Motion Deblurring From a Single Image

Markov Invariants for Phylogenetic Rate Matrices Derived from Embedded Submodels



EGC

1361

EGC

1360

How to recover a clear image from a single motion-blurred image has long been a challenging open problem in digital

imaging. In this paper, we focus on how to recover a motion-blurred image due to camera shake. A regularization-based

approach is proposed to remove motion blurring from the image by regularizing the sparsity of both the original image

and the motion-blur kernel under tight wavelet frame systems. Furthermore, an adapted version of the split Bregman

method is proposed to efficiently solve the resulting minimization problem. The experiments on both synthesized

images and real images show that our algorithm can effectively remove complex motion blurring from natural images

without requiring any prior information of the motion-blur kernel.

A single captured image of a real-world scene is usually insufficient to reveal all the details due to under- or over-

exposed regions. To solve this problem, images of the same scene can be first captured under different exposure

settings and then combined into a single image using image fusion techniques. In this paper, we propose a novel

probabilistic model-based fusion technique for multi-exposure images. Unlike previous multi-exposure fusion methods,

our method aims to achieve an optimal balance between two quality measures, i.e., local contrast and color consistency,

while combining the scene details revealed under different exposures. A generalized random walks framework is

proposed to calculate a globally optimal solution subject to the two quality measures by formulating the fusion problem

as probability estimation. Experiments demonstrate that our algorithm generates high-quality images at low

computational cost. Comparisons with a number of other techniques show that our method generates better results in

most cases.

A major problem in imaging applications such as magnetic resonance imaging and synthetic aperture radar is the task

of trying to reconstruct an image with the smallest possible set of Fourier samples, every single one of which has a

potential time and/or power cost. The theory of compressive sensing (CS) points to ways of exploiting inherent sparsity

in such images in order to achieve accurate recovery using sub-Nyquist sampling schemes. Traditional CS approaches

to this problem consist of solving total-variation (TV) minimization programs with Fourier measurement constraints or

other variations thereof. This paper takes a different approach. Since the horizontal and vertical differences of a medical

image are each more sparse or compressible than the corresponding TV image, CS methods will be more successful in

recovering these differences individually. We develop an algorithm called GradientRec that uses a CS algorithm to

recover the horizontal and vertical gradients and then estimates the original image from these gradients. We present two

methods of solving the latter inverse problem, i.e., one based on least-square optimization and the other based on a

generalized Poisson solver. After a thorough derivation of our complete algorithm, we present the results of various

experiments that compare the effectiveness of the proposed method against other leading methods.
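
The final step, estimating an image from its horizontal and vertical gradients, can be sketched as a small least-squares problem. The code below is an illustration only, not the solver used in the paper: it runs Gauss-Seidel sweeps on the least-squares objective and fixes the unknown constant offset with an assumed anchor value for the top-left pixel. The function name and parameters are hypothetical.

```python
def recover_from_gradients(gx, gy, height, width, anchor=0.0, sweeps=500):
    """Least-squares image estimate from forward differences.

    gx[r][c] = u[r][c + 1] - u[r][c]   (height rows, width - 1 columns)
    gy[r][c] = u[r + 1][c] - u[r][c]   (height - 1 rows, width columns)
    Assumes the image has more than one pixel. Gradients determine the
    image only up to a constant, so the result is shifted to make
    u[0][0] equal `anchor`.
    """
    u = [[0.0] * width for _ in range(height)]
    for _ in range(sweeps):
        for r in range(height):
            for c in range(width):
                # Each neighbour predicts u[r][c] through the gradient that
                # links them; the least-squares update is their mean.
                preds = []
                if c + 1 < width:
                    preds.append(u[r][c + 1] - gx[r][c])
                if c > 0:
                    preds.append(u[r][c - 1] + gx[r][c - 1])
                if r + 1 < height:
                    preds.append(u[r + 1][c] - gy[r][c])
                if r > 0:
                    preds.append(u[r - 1][c] + gy[r - 1][c])
                u[r][c] = sum(preds) / len(preds)
    offset = anchor - u[0][0]
    return [[v + offset for v in row] for row in u]
```

When the recovered gradients are noisy and no longer consistent with any single image, the same update still converges to a least-squares compromise, which is why a least-squares or Poisson-type formulation is preferred over integrating along a single path.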

Gradient-Based Image Recovery Methods from Incomplete Fourier Measurements

Generalized Random Walks for Fusion of Multi-Exposure Images



EGC

1362

EGC

1364

EGC

1363

We present nonquadratic Hessian-based regularization methods that can be effectively used for image restoration

problems in a variational framework. Motivated by the great success of the total-variation (TV) functional, we extend it to

also include second-order differential operators. Specifically, we derive second-order regularizers that involve matrix

norms of the Hessian operator. The definition of these functionals is based on an alternative interpretation of TV that

relies on mixed norms of directional derivatives. We show that the resulting regularizers retain some of the most

favorable properties of TV, i.e., convexity, homogeneity, rotation, and translation invariance, while dealing effectively

with the staircase effect. We further develop an efficient minimization scheme for the corresponding objective functions.

The proposed algorithm is of the iteratively reweighted least-square type and results from a majorization-minimization

approach. It relies on a problem-specific preconditioned conjugate gradient method, which makes the overall

minimization scheme very attractive since it can be applied effectively to large images in a reasonable computational

time. We validate the overall proposed regularization framework through deblurring experiments under additive

Gaussian noise on standard and biomedical images.

Computers are developing along with a new trend from the dual-core and quad-core processors to ones with tens or

even hundreds of cores. Multimedia, as one of the most important applications in computers, has an urgent need to

design parallel coding algorithms for compression. Taking intraframe/image coding as a starting point, this paper proposes

a pure line-by-line coding scheme (LBLC) to meet the need. In LBLC, an input image is processed line by line

sequentially, and each line is divided into small fixed-length segments. The compression of all segments from prediction

to entropy coding is completely independent and concurrent at many cores. Results on a general-purpose computer

show that our scheme can get a 13.9 times speedup with 15 cores at the encoder and a 10.3 times speedup at the

decoder. Ideally, such a near-linear relation between speedup and the number of cores can be kept for more than 100 cores. In

addition to the high parallelism, the proposed scheme can perform comparably to or even better than the H.264 high

profile above middle bit rates. At near-lossless coding, it outperforms H.264 by more than 10 dB. At lossless coding, up to

14% bit-rate reduction is observed compared with H.264 lossless coding at the high 4:4:4 profile.
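
The segment-level independence described above can be sketched as follows. The per-segment coder here is a toy stand-in (left-neighbour prediction residuals with no entropy coding), chosen only so that each segment can be encoded and decoded without looking at any other segment; the predictor, the function names, and the use of a thread pool are all assumptions, not the LBLC design.

```python
from concurrent.futures import ThreadPoolExecutor


def split_line(line, segment_length):
    """Divide one image line into fixed-length segments (the last may be shorter)."""
    return [line[i:i + segment_length] for i in range(0, len(line), segment_length)]


def encode_segment(segment):
    """Toy coder: first sample, then residuals against the left neighbour."""
    return [segment[0]] + [b - a for a, b in zip(segment, segment[1:])]


def decode_segment(residuals):
    """Invert encode_segment using only this segment's own data."""
    samples = [residuals[0]]
    for delta in residuals[1:]:
        samples.append(samples[-1] + delta)
    return samples


def encode_line(line, segment_length, workers=4):
    """Encode all segments of a line concurrently; order is preserved."""
    segments = split_line(line, segment_length)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(encode_segment, segments))
```

For scale, the reported 13.9 times speedup on 15 cores corresponds to a parallel efficiency of about 93% (13.9 / 15).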

In this paper, we deal with a problem of separating the effect of reflection from images captured behind glass. The input

consists of multiple polarized images captured from the same view point but with different polarizer angles. The output

is the high quality separation of the reflection layer and the background layer from the images. We formulate this

Hessian-Based Norm Regularization for Image Restoration with Biomedical Applications

High-Quality Reflection Separation Using Polarized Images

Highly Parallel Line-Based Image Coding for Many Cores



EGC

1365

EGC

1366

problem as a constrained optimization problem and propose a framework that allows us to fully exploit the mutually

exclusive image information in our input data. We test our approach on various images and demonstrate that our

approach can generate good reflection separation results.

Histograms have been widely used for feature representation in image and video content analysis. However, due to the

orderless nature of the summarization process, histograms generally lack spatial information. This may degrade their

discrimination capability in visual classification tasks. Although there have been several research attempts to encode

spatial context into histograms, how to extend the encodings to higher order spatial context is still an open problem. In

this paper, we propose a general histogram contextualization method to efficiently encode higher order spatial context.

The method is based on the cooccurrence of local visual homogeneity patterns and hence is able to generate more

discriminative histogram representations while remaining compact and robust. Moreover, we also investigate how to

extend the histogram contextualization to multiple modalities of context. It is shown that the proposed method can be

naturally extended to combine both temporal and spatial context and facilitate video content analysis. In addition, a

method to combine cross-feature context with spatial context via the technique of random forest is also introduced in

this paper. Comprehensive experiments on face image classification and human activity recognition tasks demonstrate

the superiority of the proposed histogram contextualization method compared with the existing encoding methods.

In this paper, we propose a new patch distribution feature (PDF) (i.e., referred to as Gabor-PDF) for human gait

recognition. We represent each gait energy image (GEI) as a set of local augmented Gabor features, which concatenate

the Gabor features extracted from different scales and different orientations together with the X-Y coordinates. We learn

a global Gaussian mixture model (GMM) (i.e., referred to as the universal background model) with the local augmented

Gabor features from all the gallery GEIs; then, each gallery or probe GEI is further expressed as the normalized

parameters of an image-specific GMM adapted from the global GMM. Observing that one video is naturally represented

as a group of GEIs, we also propose a new classification method called locality-constrained group sparse representation

(LGSR) to classify each probe video by minimizing the weighted l1,2 mixed-norm-regularized reconstruction errors with

respect to the gallery videos. In contrast to the standard group sparse representation method that is a special case of

LGSR, the group sparsity and local smooth sparsity constraints are both enforced in LGSR. Our comprehensive

experiments on the benchmark USF HumanID database demonstrate the effectiveness of the newly proposed feature

Gabor-PDF and the new classification method LGSR for human gait recognition. Moreover, LGSR using the new feature

Histogram Contextualization

Human Gait Recognition Using Patch Distribution Feature and Locality-Constrained

Group Sparse Representation



EGC

1368

EGC

1369

EGC

1367

Gabor-PDF achieves the best average Rank-1 and Rank-5 recognition rates on this database among all gait recognition

algorithms proposed to date.

Connections in image processing are an important notion that describes how pixels can be grouped together according

to their spatial relationships and/or their gray-level values. In recent years, several works were devoted to the

development of new theories of connections among which hyperconnection (h-connection) is a very promising notion.

This paper addresses two major issues of this theory. First, we propose a new axiomatic that ensures that every h-

connection generates decompositions that are consistent for image processing and, more precisely, for the design of h-

connected filters. Second, we develop a general framework to represent the decomposition of an image into h-

connections as a tree that corresponds to the generalization of the connected component tree. Such trees are indeed an

efficient and intuitive way to design attribute filters or to perform detection tasks based on qualitative or quantitative

attributes. These theoretical developments are applied to a particular fuzzy h-connection, and we test this new

framework on several classical applications in image processing, i.e., segmentation, connected filtering, and document

image binarization. The experiments confirm the suitability of the proposed approach: It is robust to noise, and it

provides an efficient framework to design selective filters.

We present a novel approach using distributed source coding for image authentication. The key idea is to provide a

Slepian-Wolf encoded quantized image projection as authentication data. This version can be correctly decoded with the

help of an authentic image as side information. Distributed source coding provides the desired robustness against

legitimate variations while detecting illegitimate modification. The decoder incorporating expectation maximization

algorithms can authenticate images which have undergone contrast, brightness, and affine warping adjustments. Our

authentication system also offers tampering localization by using the sum-product algorithm.

This paper proposes a new image deconvolution method using multi-stage convex relaxation, and presents a metric for

perceptual evaluation of deconvolution results. Recent work in image deconvolution addresses the deconvolution

problem via minimization with non-convex regularization. Since all regularization terms in the objective function are non-

convex, this problem can be well modeled and solved by multi-stage convex relaxation. This method, adopted from

machine learning, iteratively refines the convex relaxation formulation using concave duality. The newly proposed

Image Authentication Using Distributed Source Coding

Image Deconvolution with Multi-Stage Convex Relaxation and Its Perceptual Evaluation

Hyperconnections and Hierarchical Representations for Grayscale and Multiband Image

Processing



EGC

1371

EGC

1370

deconvolution method has outstanding performance in noise removal and artifact control. A new metric, transduced

contrast-to-distortion ratio (TCDR), is proposed based on a human vision system (HVS) model that simulates human

responses to visual contrasts. It is sensitive to ringing and boundary artifacts, and very efficient to compute. We

conduct comprehensive perceptual evaluation of image deconvolution using visual signal-to-noise ratio (VSNR) and

TCDR. Experimental results of both synthetic and real data demonstrate that our method indeed improves the visual

quality of deconvolution results with low distortions and artifacts.

The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-

based image interpolation algorithms have been proposed in the literature, in which the objective functions are

optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable

properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To

address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear

regression (RLLR). Starting with the linear regression model, we replace the OLS error norm with the moving least

squares (MLS) error norm, which leads to a robust estimator of local image structure. To keep the solution stable and avoid

overfitting, we incorporate the ℓ2-norm as the estimator complexity penalty. Moreover, motivated by recent progress on

manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both

measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal

probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The

optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem.

Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive

performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation.
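
The closed-form character of a weighted, ℓ2-regularized linear regression can be illustrated with a short sketch. It covers only the weighted error norm and the ℓ2 penalty; the manifold-based smoothness term of the paper is omitted, and the sample weights are passed in rather than derived from an MLS kernel. All names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def weighted_ridge(features, targets, weights, lam):
    """Minimise sum_i w_i (y_i - x_i . beta)^2 + lam * ||beta||^2.

    The minimiser solves (X^T W X + lam I) beta = X^T W y.
    """
    dim = len(features[0])
    gram = [[lam if j == k else 0.0 for k in range(dim)] for j in range(dim)]
    rhs = [0.0] * dim
    for x, y, w in zip(features, targets, weights):
        for j in range(dim):
            rhs[j] += w * x[j] * y
            for k in range(dim):
                gram[j][k] += w * x[j] * x[k]
    return solve(gram, rhs)
```

With `lam = 0` and well-conditioned data this reduces to weighted least squares; a positive `lam` shrinks the coefficients and keeps the linear system well conditioned when the local neighbourhood is small.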

A full-reference image quality assessment (IQA) model by multiscale visual gradient similarity (VGS) is presented. The

VGS model adopts a three-stage approach: First, global contrast registration for each scale is applied. Then, pointwise

comparison is given by multiplying the similarity of gradient direction with the similarity of gradient magnitude. Third,

intrascale pooling is applied, followed by interscale pooling. Several properties of human visual systems on image

gradient have been explored and incorporated into the VGS model. It has been found that Stevens' power law is also

suitable for gradient magnitude. Other factors such as quality uniformity, visual detection threshold of gradient and

visual frequency sensitivity also affect subjective image quality. The optimal values of two parameters of VGS are

trained with existing IQA databases, and good performance of VGS has been verified by cross validation. Experimental

results show that VGS is competitive with state-of-the-art metrics in terms of prediction precision, reliability, simplicity,

and low computational cost.

Image Quality Assessment by Visual Gradient Similarity

Image Interpolation via Regularized Local Linear Regression



EGC

1372

EGC

1374

EGC

1373

We investigate the problem of averaging values on lattices and, in particular, on discrete product lattices. This problem

arises in image processing when several color values given in RGB, HSL, or another coding scheme need to be

combined. We show how the arithmetic mean and the median can be constructed by minimizing appropriate penalties,

and we discuss which of them coincide with the Cartesian product of the standard mean and the median. We apply these

functions in image processing. We present three algorithms for color image reduction based on minimizing penalty

functions on discrete product lattices.
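
The idea of obtaining the mean and the median by minimizing a penalty can be sketched on the discrete chain {0, ..., 255}. The example below covers only the simplest case, where the penalty is applied to each colour channel separately (the Cartesian-product construction); it is not one of the three reduction algorithms of the paper, and the function names are hypothetical.

```python
def penalty_minimiser(values, lattice, penalty):
    """Return the lattice element with the smallest total penalty to `values`."""
    return min(lattice, key=lambda y: sum(penalty(x, y) for x in values))


def squared(x, y):
    """Penalty whose minimiser is the lattice point closest to the arithmetic mean."""
    return (x - y) ** 2


def absolute(x, y):
    """Penalty whose minimiser is a median."""
    return abs(x - y)


def reduce_block(pixels, penalty, levels=range(256)):
    """Collapse a block of colour tuples to one colour, channel by channel."""
    return tuple(
        penalty_minimiser(channel, levels, penalty) for channel in zip(*pixels)
    )


block = [(10, 0, 0), (20, 0, 255), (90, 3, 255)]
mean_colour = reduce_block(block, squared)     # (40, 1, 170)
median_colour = reduce_block(block, absolute)  # (20, 0, 255)
```

The exhaustive search over the lattice is deliberately naive; it makes explicit that the result is always a valid lattice element, which a rounded floating-point average would not guarantee in general.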

Active contour models (ACMs) integrated with various kinds of external force fields to pull the contours to the exact

boundaries have shown their powerful abilities in object segmentation. However, local minimum problems still exist

within these models, particularly the vector field's "equilibrium issues." Different from traditional [...]

field in view of dynamical systems. An interpolated swirling and attracting flow (ISAF) vector field is first generated

[...]. Meanwhile, the periods of limit cycles are determined. Consequently, the

objects' boundaries are represented by integral equations with the corresponding converged states and periods.

Experiments and comparisons with some traditional external force field methods are done to exhibit the superiority of

the proposed method in cases of complex concave boundary segmentation, multiple-object segmentation, and

initialization flexibility. In addition, it is more computationally efficient than traditional ACMs by solving the problem in

some lower dimensional subspace without using level-set methods.

When using motion fields to interpolate between two consecutive images in an image sequence, a major problem is to

handle occlusions and disclusions properly. However, in most cases, one of the two images contains the information that

is either discluded or occluded; if the first image contains the information (i.e., the region will be occluded), forward

interpolation shall be employed, while for information that is contained in the second image (i.e., the region will be

discluded), one should use backward interpolation. Hence, we propose to improve an existing approach for image

sequence interpolation by incorporating an automatic segmentation in the process, which decides in which region of the

image forward or backward interpolation shall be used. Our approach is a combination of the optimal transport

approach to image sequence interpolation and the segmentation by the Chan-Vese approach. We propose to solve the

Image Reduction Using Means on Discrete Product Lattices

Image Sequence Interpolation Based on Optical Flow, Segmentation, and Optimal Control

Image Segmentation Based on the Poincaré Map Method



EGC

1375

EGC

1376

EGC

1377

resulting optimality condition by a segregation loop, combined with a level set approach. We provide examples that

illustrate the performance both in the interpolation error and in the human perception.

Compressed sensing (CS) is a new information sampling theory for acquiring sparse or compressible data with far

fewer measurements than those otherwise required by the Nyquist/Shannon counterpart. This is particularly important

for some imaging applications such as magnetic resonance imaging or in astronomy. However, in the existing CS

formulation, the use of the l2 norm on the residuals is not particularly efficient when the noise is impulsive. This could

lead to an increase in the upper bound of the recovery error. To address this problem, we consider a robust formulation

for CS to suppress outliers in the residuals. We propose an iterative algorithm for solving the robust CS problem that

exploits the power of existing CS solvers. We also show that the upper bound on the recovery error in the case of non-

Gaussian noise is reduced and then demonstrate the efficacy of the method through numerical studies.

In this paper, we propose a method to exploit segmentation information for elastic image registration using a Markov-

random-field (MRF)-based objective function. MRFs are suitable for discrete labeling problems, and the labels are

defined as the joint occurrence of displacement fields (for registration) and segmentation class probability. The data

penalty is a combination of the image intensity (or gradient information) and the mutual dependence of registration and

segmentation information. The smoothness is a function of the interaction between the defined labels. Since both terms

are a function of registration and segmentation labels, the overall objective function captures their mutual dependence.

A multiscale graph-cut approach is used to achieve subpixel registration and reduce the computation time. The user

defines the object to be registered in the floating image, which is rigidly registered before applying our method. We test

our method on synthetic image data sets with known levels of added noise and simulated deformations, and also on

natural and medical images. Compared with other registration methods not using segmentation information, our

proposed method exhibits greater robustness to noise and improved registration accuracy.

Multiple description coding has been receiving attention as a robust transmission framework for multimedia services.

This paper studies the iterative decoding of FEC-based multiple description codes. The proposed decoding algorithms

take advantage of the error detection capability of Reed-Solomon (RS) erasure codes. The information of correctly

decoded RS codewords is exploited to enhance the error correction capability of the Viterbi algorithm at the next

iteration of decoding. In the proposed algorithm, an intradescription interleaver is synergistically combined with the

iterative decoder. The interleaver does not affect the performance of noniterative decoding but greatly enhances the

Improved Image Recovery from Compressed Data Contaminated With Impulsive Noise

Integrating Segmentation Information for Improved MRF-Based Elastic Image

Registration

Iterative Channel Decoding of FEC-Based Multiple-Description Codes



EGC

1378

EGC

1379

performance when the system is iteratively decoded. We also address the optimal allocation of RS parity symbols for

unequal error protection. For the optimal allocation in iterative decoding, we derive mathematical equations from which

the probability distributions of description erasures can be generated in a simple way. The performance of the algorithm

is evaluated over an orthogonal frequency-division multiplexing system. The results show that the performance of the

multiple description codes is significantly enhanced.

In this paper, an iterative narrow-band-based graph cuts (INBBGC) method is proposed to optimize the geodesic active

contours with region forces (GACWRF) model for interactive object segmentation. Based on cut metric on graphs

proposed by Boykov and Kolmogorov, an NBBGC method is devised to compute the local minimization of GAC. An

extension to an iterative manner, namely, INBBGC, is developed for less sensitivity to the initial curve. The INBBGC

method is similar to graph-cuts-based active contour (GCBAC) presented by Xu, and their differences have been

analyzed and discussed. We then integrate the region force into GAC. An improved INBBGC (IINBBGC) method is

proposed to optimize the GACWRF model; it can effectively handle concave regions and the segmentation of complicated

real-world images. Two region force models, the mean model and the probability model, are studied. Therefore, the

GCBAC method can be regarded as the special case of our proposed IINBBGC method without region force. Our

proposed algorithm has also been found to be similar to the Grabcut method when the Gaussian mixture model

region force is adopted, and the band region is extended to the whole image. Thus, our proposed IINBBGC method can

be regarded as narrow-band-based Grabcut method or GCBAC with region force method. We apply our proposed

IINBBGC algorithm on synthetic and real-world images to emphasize its performance, compared with other

segmentation methods, such as GCBAC and Grabcut methods.

The neighbor-embedding (NE) algorithm for single-image super-resolution (SR) reconstruction assumes that the feature

spaces of low-resolution (LR) and high-resolution (HR) patches are locally isometric. However, this is not true for SR

because of one-to-many mappings between LR and HR patches. To overcome or at least to reduce the problem for NE-

based SR reconstruction, we apply a joint learning technique to train two projection matrices simultaneously and to map

the original LR and HR feature spaces onto a unified feature subspace. Subsequently, the k-nearest neighbor selection

of the input LR image patches is conducted in the unified feature subspace to estimate the reconstruction weights. To

handle a large number of samples, joint learning locally exploits a coupled constraint by linking the LR-HR counterparts

together with the K-nearest grouping patch pairs. In order to refine further the initial SR estimate, we impose a global

reconstruction constraint on the SR outcome based on the maximum a posteriori framework. Preliminary experiments

suggest that the proposed algorithm outperforms NE-related baselines.
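
The baseline neighbor-embedding step that this abstract builds on can be sketched as follows. This is a minimal illustration of standard NE reconstruction weights (sum-to-one least squares over the k nearest LR patches), not the proposed joint-learning method: in the proposed method the neighbor search would take place in the learned unified subspace. The function names, the regularization constant, and the toy data are assumptions made for illustration.

```python
import numpy as np

def ne_weights(query: np.ndarray, neighbors: np.ndarray, reg: float = 1e-6) -> np.ndarray:
    """Weights w (summing to 1) that minimize ||query - w @ neighbors||^2."""
    diffs = neighbors - query                      # shape (k, d)
    gram = diffs @ diffs.T                         # local Gram matrix, shape (k, k)
    # Small ridge term (an assumption) keeps the system solvable when k > d.
    gram = gram + reg * (np.trace(gram) + 1e-12) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()

def ne_reconstruct(lr_query, lr_train, hr_train, k=3):
    """Estimate an HR patch by reusing the LR reconstruction weights."""
    dist = np.linalg.norm(lr_train - lr_query, axis=1)
    idx = np.argsort(dist)[:k]                     # k nearest LR training patches
    w = ne_weights(lr_query, lr_train[idx])
    return w @ hr_train[idx]

# Toy data: 2-D "LR patches" and 3-D "HR patches" related by a linear map.
lr_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
hr_train = lr_train @ np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]])
hr_estimate = ne_reconstruct(np.array([0.25, 0.25]), lr_train, hr_train, k=3)
```

The weights are computed in the LR space and applied unchanged to the HR patches, which is exactly where the local-isometry assumption criticized in the abstract enters.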

Iterative Narrowband-Based Graph Cuts Optimization for Geodesic Active Contours with

Region Forces (GACWRF)

Joint Learning for Single-Image Super-Resolution via a Coupled Constraint



EGC

1380

EGC

1382

EGC

1381

Compressive sensing (CS) is an emerging approach for the acquisition of signals having a sparse or compressible

representation in some basis. While the CS literature has mostly focused on problems involving 1-D signals and 2-D

images, many important applications involve multidimensional signals; the construction of sparsifying bases and

measurement systems for such signals is complicated by their higher dimensionality. In this paper, we propose the use

of Kronecker product matrices in CS for two purposes. First, such matrices can act as sparsifying bases that jointly

model the structure present in all of the signal dimensions. Second, such matrices can represent the measurement

protocols used in distributed settings. Our formulation enables the derivation of analytical bounds for the sparse

approximation of multidimensional signals and CS recovery performance, as well as a means of evaluating novel

distributed measurement schemes.
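
The role of the Kronecker product as a joint sparsifying basis can be illustrated with a small numerical check. This is a sketch under stated assumptions: the two per-dimension bases are random orthonormal matrices chosen only for the demonstration, and the variable names are hypothetical. It relies on the identity vec(A Θ Bᵀ) = (B ⊗ A) vec(Θ) for column-major vectorization.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_orthonormal(n: int) -> np.ndarray:
    """Random orthonormal basis (stand-in for e.g. a wavelet or DCT basis)."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

psi_rows = random_orthonormal(4)   # basis along the first signal dimension
psi_cols = random_orthonormal(3)   # basis along the second signal dimension

# Sparse coefficient matrix with only two nonzero entries.
theta = np.zeros((4, 3))
theta[0, 1] = 2.0
theta[3, 2] = -1.0

# Separable synthesis of the 2-D signal.
x = psi_rows @ theta @ psi_cols.T

# The same signal, synthesized by one Kronecker basis acting on vec(theta).
psi_kron = np.kron(psi_cols, psi_rows)
x_vec = psi_kron @ theta.flatten(order="F")
```

Here `x.flatten(order="F")` and `x_vec` agree, and `psi_kron` is itself orthonormal, so the multidimensional signal is as sparse in the single Kronecker basis as it is in the pair of per-dimension bases.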

This paper proposes a low-distortion transform for prediction-error expansion reversible watermarking. The transform

is derived by taking a simple linear predictor and by embedding the expanded prediction error not only into the current

pixel but also into its prediction context. The embedding ensures the minimization of the square error introduced by the

watermarking. The proposed transform introduces less distortion than the classical prediction-error expansion for

complex predictors such as the median edge detector or the gradient-adjusted predictor. Reversible watermarking

algorithms based on the proposed transform are analyzed. Experimental results are provided.
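
For reference, the classical prediction-error expansion that the abstract compares against can be sketched for a single pixel as below. This is not the proposed low-distortion transform, which also spreads the change over the prediction context. The function names are hypothetical, and overflow handling at the ends of the gray-level range is omitted.

```python
from typing import Tuple

def pee_embed(pixel: int, predicted: int, bit: int) -> int:
    """Expand the prediction error and append one payload bit."""
    error = pixel - predicted
    return predicted + 2 * error + bit

def pee_extract(marked: int, predicted: int) -> Tuple[int, int]:
    """Recover the original pixel and the embedded bit."""
    expanded = marked - predicted
    bit = expanded % 2             # Python's % returns 0 or 1 even for negatives
    error = (expanded - bit) // 2
    return predicted + error, bit

marked = pee_embed(pixel=100, predicted=103, bit=1)
restored, bit = pee_extract(marked, predicted=103)
```

Reversibility depends on the decoder being able to form the same prediction as the encoder, which is why the order in which pixels are visited matters in a full scheme.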

In this paper, we propose a low-complexity video coding scheme based upon 2-D singular value decomposition (2-D

SVD), which exploits basic temporal correlation in visual signals without resorting to motion estimation (ME). By

exploring the energy compaction property of 2-D SVD coefficient matrices, high coding efficiency is achieved. The

proposed scheme offers a better compromise between computational complexity and temporal redundancy reduction

than the existing video coding methods. In addition, the problems caused by frame decoding dependence in

hybrid video coding, such as unavailability of random access, are avoided. The comparison of the proposed 2-D SVD

coding scheme with the existing relevant non-ME-based low-complexity codecs shows its advantages and potential in

applications.
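
A minimal sketch of a 2-D SVD over a group of frames is given below. It assumes the common formulation in which shared row and column bases are the leading eigenvectors of the summed row-row and column-column correlation matrices; whether the paper subtracts a mean frame or quantizes the coefficients in a particular way is not stated in the abstract, so neither is modeled here. All function names are hypothetical.

```python
import numpy as np

def two_d_svd_bases(frames: np.ndarray, rank_rows: int, rank_cols: int):
    """Shared bases U (H x rank_rows) and V (W x rank_cols) for a (K, H, W) frame group."""
    row_corr = sum(a @ a.T for a in frames)      # H x H
    col_corr = sum(a.T @ a for a in frames)      # W x W
    _, u = np.linalg.eigh(row_corr)              # eigenvalues in ascending order
    _, v = np.linalg.eigh(col_corr)
    # Reverse to descending order and keep the leading eigenvectors.
    return u[:, ::-1][:, :rank_rows], v[:, ::-1][:, :rank_cols]

def encode(frame: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Coefficient matrix of one frame in the shared bases."""
    return u.T @ frame @ v

def decode(coeff: np.ndarray, u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Approximate frame rebuilt from its coefficient matrix."""
    return u @ coeff @ v.T

rng = np.random.default_rng(1)
group = rng.standard_normal((5, 8, 6))           # five 8 x 6 frames (toy data)
u, v = two_d_svd_bases(group, rank_rows=3, rank_cols=2)
coeff = encode(group[0], u, v)                   # 3 x 2 instead of 8 x 6
```

Because every frame in the group shares U and V, each frame decodes independently of the others, which is consistent with the random-access property mentioned in the abstract.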

Kronecker Compressive Sensing

Low-Complexity Video Coding Based on Two-Dimensional Singular Value Decomposition

Low Distortion Transform for Reversible Watermarking



EGC

1383

EGC

1384

EGC

1385

Hysteresis thresholding is a method that offers enhanced object detection. Due to its recursive nature, it is time

consuming and requires a lot of memory resources. As a result, it is avoided in streaming processors with limited memory.

We propose two versions of a memory-efficient and fast architecture for hysteresis thresholding: a high-accuracy pixel-

based architecture and a faster block-based one at the expense of some loss in the accuracy. Both designs couple

thresholding with connected component analysis and feature extraction in a single pass over the image. Unlike queue-

based techniques, the proposed scheme treats candidate pixels almost as foreground until objects complete; a decision

is then made to keep or discard these pixels. This allows processing on the fly, thus avoiding additional passes for

handling candidate pixels and extracting object features. Moreover, labels are reused so only one row of compact labels

is buffered. Both architectures are implemented in MATLAB and VHDL. Simulation results on a set of real and synthetic

images show that the execution speed can attain an average increase up to for the pixel-based and for the block-based

when compared to state-of-the-art techniques. The memory requirements are also drastically reduced by about 99%.
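
For context, the conventional queue-based hysteresis thresholding that the abstract contrasts with can be sketched as follows. This is the textbook multi-pass formulation, not the proposed single-pass architecture; the function name, the 8-connectivity choice, and the toy image are assumptions.

```python
from collections import deque

import numpy as np

def hysteresis_threshold(image: np.ndarray, low: float, high: float) -> np.ndarray:
    """Keep pixels >= low that are 8-connected to at least one pixel >= high."""
    candidate = image >= low
    output = image >= high                      # strong pixels seed the result
    queue = deque(zip(*np.nonzero(output)))
    rows, cols = image.shape
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and candidate[nr, nc] and not output[nr, nc]:
                    output[nr, nc] = True       # promote a connected candidate
                    queue.append((nr, nc))
    return output

image = np.array([
    [0, 5, 9, 5, 0],
    [0, 0, 0, 0, 0],
    [5, 5, 0, 0, 0],
])
mask = hysteresis_threshold(image, low=4, high=8)
```

In the toy image the two candidates next to the strong pixel are kept, while the isolated pair in the bottom row is discarded. The queue and the full-frame masks are the memory cost that a streaming design has to avoid.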

In compressive sensing (CS), a challenge is to find a space in which the signal is sparse and, hence, faithfully

recoverable. Since many natural signals such as images have locally varying statistics, the sparse space varies in

time/spatial domain. As such, CS recovery should be conducted in locally adaptive signal-dependent spaces to counter

the fact that the CS measurements are global and irrespective of signal structures. On the contrary, existing CS

reconstruction methods use a fixed set of bases (e.g., wavelets, DCT, and gradient spaces) for the entirety of a signal. To

rectify this problem, we propose a new framework for model-guided adaptive recovery of compressive sensing (MARX)

and show how a 2-D piecewise autoregressive model can be integrated into the MARX framework to make CS recovery

adaptive to spatially varying second order statistics of an image. In addition, MARX offers a mechanism of

characterizing and exploiting structured sparsities of natural images, greatly restricting the CS solution space.

Simulation results over a wide range of natural images show that the proposed MARX technique can improve the

reconstruction quality of existing CS methods by 2-7 dB.

In this paper, we propose to represent an image as a local descriptor tensor and use a multilinear supervised

neighborhood embedding (MSNE) for discriminant feature extraction, which can be used for subject or scene

recognition. The contributions of this paper include: (1) a novel feature extraction approach denoted as the histogram of

orientation weighted with a normalized gradient (NHOG) for local region representation, which is robust to large

Memory-Efficient Architecture for Hysteresis Thresholding and Object Feature Extraction

Model-Assisted Adaptive Recovery of Compressed Sensing with Imaging Applications

Multilinear Supervised Neighborhood Embedding of a Local Descriptor Tensor for

Scene/Object Recognition



EGC

1386

EGC

1387

EGC

1387

illumination variation in an image; (2) an image representation framework denoted as the local descriptor tensor, which

can effectively combine a moderate amount of local features together for image representation and be more efficient

than the popular existing bag-of-feature model; and (3) an MSNE analysis algorithm, which can directly deal with the

local descriptor tensor for extracting discriminant and compact features and, at the same time, preserve neighborhood

structure in tensor-feature space for subject/scene recognition. We demonstrate the performance advantages of our

proposed approach over existing techniques on different types of benchmark databases, such as a scene data set (i.e.,

OT8), face data sets (i.e., YALE and PIE), and view-based object data sets (COIL-100 and ETH-80).

This study reviews the state-of-the-art multiobjective optimization (MOO) techniques with metaheuristic

through clustering approaches developed specifically for image segmentation problems. The authors treat image

segmentation as a real-life problem with multiple objectives, thus focusing on MOO methods that allow a trade-off

among multiple objectives. A reasonable solution to a multiobjective (MO) problem is to investigate a set of solutions,

each of which satisfies the objectives at an acceptable level without being dominated by any other solution. The primary

difference of MOO methods from traditional image segmentation is that instead of a single solution, their output is a set

of solutions called Pareto-optimal solutions. This study discusses the evolutionary and non-evolutionary MO clustering

techniques for image segmentation. It diagnoses the requirements and issues for modelling MOO via MO clustering

technique. In addition, the potential challenges and the directions for future research are presented.
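
The notion of a non-dominated (Pareto-optimal) set described above can be sketched in a few lines. The example assumes that every objective is to be minimized, and the candidate scores are made-up values standing in for two clustering objectives; neither comes from the study itself.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better

def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical (objective 1, objective 2) scores of five candidate segmentations.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(candidates)
```

A multiobjective method returns the whole front and leaves the final trade-off to a later selection step, rather than collapsing the objectives into a single score up front.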

Multiple descriptions (MD) coding has been a popular choice for robust data transmission over unreliable network

channels. Lattice vector quantization provides lower computation for efficient data compression. In this paper, a new MD

coinciding lattice vector quantizer (MDCLVQ) is presented. The design of the quantizer is based on coinciding 2-D

hexagonal sublattices. The coinciding sublattices are geometrically similar sublattices, with the same index but

generated by different generator matrices. A novel labeling algorithm based on the hexagonal coinciding sublattices is

also developed. Performance results of the MDCLVQ scheme, together with the new labeling algorithm applied to

standard test images, show improvements of the central and side decoders, as compared with the renowned techniques

for several test images.

Multiobjective clustering with metaheuristic: current trends and methods in image

segmentation

Multiple Descriptions Coinciding Lattice Vector Quantizer for Wavelet Image Coding

Multiscale Semilocal Interpolation with Antialiasing



EGC

8290

EGC

1388

EGC

1389

Aliasing is a common artifact in low-resolution (LR) images generated by a downsampling process. Recovering the

original high-resolution image from its LR counterpart while at the same time removing the aliasing artifacts is a

challenging image interpolation problem. Since a natural image normally contains redundant similar patches, the values

of missing pixels can be available at texture-relevant LR pixels. Based on this, we propose an iterative multiscale

semilocal interpolation method that can effectively address the aliasing problem. The proposed method estimates each

missing pixel from a set of texture-relevant semilocal LR pixels with the texture similarity iteratively measured from a

sequence of patches of varying sizes. Specifically, in each iteration, top texture-relevant LR pixels are used to construct

a data fidelity term in a maximum a posteriori estimation, and a bilateral total variation is used as the regularization term.

Experimental results compared with existing interpolation methods demonstrate that our method can not only

substantially alleviate the aliasing problem but also produce better results across a wide range of scenes both in terms

of quantitative evaluation and subjective visual quality.

Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive,

incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate

dictionary for the data under test and also for image recovery. In the context of compressive sensing, significant

improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal

image expansions. The compressive-measurement projections are also optimized for the learned dictionary.

Additionally, we consider simpler (incomplete) measurements, defined by measuring a subset of image pixels, uniformly

selected at random. Spatial interrelationships within imagery are exploited through use of the Dirichlet and probit stick-

breaking processes. Several example results are presented, with comparisons to other methods in the literature.

There are two main issues that make nonrigid image registration a challenging task. First, voxel intensity similarity may

not be necessarily equivalent to anatomical similarity in the image correspondence searching process. Second, during

the imaging process, some interference such as unexpected rotations of input volumes and monotonic gray-level bias

fields can adversely affect the registration quality. In this paper, a new feature-based nonrigid image registration method

is proposed. The proposed method is based on a new type of image feature, namely, uniform spherical region descriptor

(USRD), as signatures for each voxel. The USRD is rotation and monotonic gray-level transformation invariant and can

be efficiently calculated. The registration process is therefore formulated as a feature matching problem. The USRD

feature is integrated with the Markov random field labeling framework in which an energy function is defined for

registration. The energy function is then optimized by the α-expansion algorithm. The proposed method has been

compared with five state-of-the-art registration approaches on both the simulated and real 3-D databases obtained from

the BrainWeb and Internet Brain Segmentation Repository, respectively. Experimental results demonstrate that the

proposed method can achieve high registration accuracy and reliable robustness behavior.

Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete

Images

Nonrigid Brain MR Image Registration Using Uniform Spherical Region Descriptor



EGC

1391

EGC

1390

Processing images for specific targets on a large scale has to handle various kinds of contents with regular processing

steps. To segment objects in one image, we utilized dual multiScalE Graylevel mOrphological open and close

recoNstructions (SEGON) to build a background (BG) gray-level variation mesh, which can help to identify BG and

object regions. It was developed from a macroscopic perspective on image BG gray levels and implemented using

standard procedures, thus robustly dealing with large-scale database images. The image segmentation capability of

existing methods can be exploited by the BG mesh to improve object segmentation accuracy. To evaluate the

segmentation accuracy, the probability of coherent segmentation labeling, i.e., the normalized probability random index

(PRI), between a computer-segmented image and the hand-labeled one is computed for comparisons. Content-based

image retrieval (CBIR) was carried out to evaluate the object segmentation capability in dealing with large-scale

database images. Retrieval precision-recall (PR) and rank performances, with and without SEGON, were compared. For

multi-instance retrieval with shape feature, AdaBoost was used to select salient common feature elements. For color

features, the histogram intersection between two scalable HSV descriptors was calculated, and the mean feature vector

was used for multi-instance retrieval. The distance measure for color feature can be adapted when both positive and

negative queries are provided. The normalized correlation coefficient of features among query samples was computed to

integrate the similarity ranks of different features in order to perform multi-instance with multifeature query. Experiments

showed that the proposed object segmentation method outperforms others by 21% in the PRI. Performing SEGON-

enabled CBIR on large-scale databases also improves on the PR performance reported elsewhere by up to 42% at a

recall rate of 0.5. The proposed object segmentation method can be extended to extract other image features, and new

feature types can be incorporated into the algorithm to further improve the image retrieval performance.
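
The histogram intersection used for the color features can be sketched as follows. The abstract does not specify the bin layout of the scalable HSV descriptor, so plain one-dimensional histograms stand in for it here, and the function name is hypothetical.

```python
import numpy as np

def histogram_intersection(h1, h2) -> float:
    """Similarity in [0, 1]: sum of bin-wise minima of two normalized histograms."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return float(np.minimum(h1, h2).sum())

# Toy 4-bin histograms that overlap in one of their two occupied bins.
similarity = histogram_intersection([1, 1, 0, 0], [0, 1, 1, 0])
```

Identical histograms score 1 and histograms with no shared bins score 0, so the value can be used directly as a similarity rank.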

The plenoptic function (POF) provides a powerful conceptual tool for describing a number of problems in image/video

processing, vision, and graphics. For example, image-based rendering is shown as sampling and interpolation of the

POF. In such applications, it is important to characterize the bandwidth of the POF. We study a simple but representative

model of the scene where band-limited signals (e.g., texture images) are “painted” on smooth surfaces (e.g., of objects

or walls). We show that, in general, the POF is not band limited unless the surfaces are flat. We then derive simple rules

to estimate the essential bandwidth of the POF for this model. Our analysis reveals that, in addition to the maximum and

minimum depths and the maximum frequency of painted signals, the bandwidth of the POF also depends on the

maximum surface slope. With a unifying formalism based on multidimensional signal processing, we can verify several

key results in POF processing, such as induced filtering in space and depth-corrected interpolation, and quantify the

necessary sampling rates.

On the Bandwidth of the Plenoptic Function

Object Segmentation of Database Images by Dual Multiscale Morphological

Reconstructions and Retrieval Applications



EGC

1393

EGC

1392

EGC

1394

In this paper, we propose a novel outdoor scene image segmentation algorithm based on background recognition and

perceptual organization. We recognize the background objects such as the sky, the ground, and vegetation based on the

color and texture information. For the structurally challenging objects, which usually consist of multiple constituent

parts, we developed a perceptual organization model that can capture the nonaccidental structural relationships among

the constituent parts of the structured objects and, hence, group them together accordingly without depending on a

priori knowledge of the specific objects. Our experimental results show that our proposed method outperformed two

state-of-the-art image segmentation approaches on two challenging outdoor databases (Gould data set and Berkeley

segmentation data set) and achieved accurate segmentation quality on various outdoor natural scene environments.

A power-constrained contrast-enhancement algorithm for emissive displays based on histogram equalization (HE) is

proposed in this paper. We first propose a log-based histogram modification scheme to reduce overstretching artifacts

of the conventional HE technique. Then, we develop a power-consumption model for emissive displays and formulate an

objective function that consists of the histogram-equalizing term and the power term. By minimizing the objective

function based on the convex optimization theory, the proposed algorithm achieves contrast enhancement and power

saving simultaneously. Moreover, we extend the proposed algorithm to enhance video sequences, as well as still

images. Simulation results demonstrate that the proposed algorithm can reduce power consumption significantly while

improving image contrast and perceptual quality.
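
The idea of modifying the histogram with a logarithm before equalization can be sketched as below. This is only an illustration of the first step: the exact log-based modification, the power-consumption model, and the convex objective of the paper are not given in the abstract, so `log1p` is an assumed stand-in and no power term is included.

```python
import numpy as np

def log_modified_equalization(image: np.ndarray, levels: int = 256):
    """Equalize an 8-bit image using a log-compressed histogram."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    modified = np.log1p(hist)                  # damp tall peaks that cause overstretching
    cdf = np.cumsum(modified)
    cdf /= cdf[-1]                             # normalize to [0, 1]
    mapping = np.round((levels - 1) * cdf).astype(np.uint8)
    return mapping[image], mapping             # transformed image and its lookup table

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
enhanced, mapping = log_modified_equalization(image)
```

Compressing the histogram reduces the slope of the mapping around heavily populated gray levels, which is the overstretching effect the abstract refers to.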

This paper presents designs for both bit-parallel (BP) and digit-serial (DS) precision-optimized implementations of the

discrete wavelet transform (DWT), with specific consideration given to the impact of depth (the number of levels of DWT)

on the overall computational accuracy. These methods thus allow customizing the precision of a multilevel DWT to a

given error tolerance requirement and ensuring an energy-minimal implementation, which increases the applicability of

DWT-based algorithms such as JPEG 2000 to energy-constrained platforms and environments. Additionally,

quantization of DWT coefficients to a specific target step size is performed as an inherent part of the DWT computation,

thereby eliminating the need to have a separate downstream quantization step in applications such as JPEG 2000.

Experimental measurements of design performance in terms of area, speed, and power for 90-nm complementary metal-

oxide-semiconductor implementation are presented. Results indicate that while BP designs exhibit inherent speed

advantages, DS designs require significantly fewer hardware resources with increasing precision and DWT level. For a

four-level DWT with medium precision, for example, the BP design is four times faster than the DS design but

occupies twice the area. In addition to the BP and DS designs, a novel flexible DWT processor is presented, which

supports run-time configurable DWT parameters.

Outdoor Scene Image Segmentation Based on Background Recognition and Perceptual

Organization

Power-Constrained Contrast Enhancement for Emissive Displays Based on Histogram

Equalization

Precision-Aware Self-Quantizing Hardware Architectures for the Discrete Wavelet

Transform



EGC

1395

EGC

1397

EGC

1396

We propose a simple preconditioning method for accelerating the solution of edge-preserving image super-resolution

(SR) problems in which a linear shift-invariant point spread function is employed. Our technique involves reordering the

high-resolution (HR) pixels in a similar manner to what is done in preconditioning methods for quadratic SR

formulations. However, due to the edge preserving requirements, the Hessian matrix of the cost function varies during

the minimization process. We develop an efficient update scheme for the preconditioner in order to cope with this

situation. Unlike some other acceleration strategies that round the displacement values between the low-resolution (LR)

images on the HR grid, the proposed method does not sacrifice the optimality of the observation model. In addition, we

describe a technique for preconditioning SR problems involving rational magnification factors. The use of such factors

is motivated in part by the fact that, under certain circumstances, optimal SR zooms are nonintegers. We show that, by

reordering the pixels of the LR images, the structure of the problem to solve is modified in such a way that

preconditioners based on circulant operators can be used.

Transmitting video over wireless is a challenging problem since video may be seriously distorted due to packet errors

caused by wireless channels. The capability of predicting transmission distortion (i.e., video distortion caused by packet

errors) can assist in designing video encoding and transmission schemes that achieve maximum video quality or

minimum end-to-end video distortion. This paper is aimed at deriving formulas for predicting transmission distortion.

The contribution of this paper is twofold. First, we identify the governing law that describes how the transmission

distortion process evolves over time and analytically derive the transmission distortion formula as a closed-form

function of video frame statistics, channel error statistics, and system parameters. Second, we identify, for the first time,

two important properties of transmission distortion. The first property is that the clipping noise, which is produced by

nonlinear clipping, causes decay of propagated error. The second property is that the correlation between motion-vector

concealment error and propagated error is negative and has dominant impact on transmission distortion, compared with

other correlations. Due to these two properties and elegant error/distortion decomposition, our formula provides not

only more accurate prediction but also lower complexity than the existing methods.

The luminance of a natural scene is often of high dynamic range (HDR). In this paper, we propose a new scheme to

handle HDR scenes by integrating locally adaptive scene detail capture and suppressing gradient reversals introduced

Preconditioning for Edge-Preserving Image Super Resolution

Prediction of Transmission Distortion for Wireless Video Communication: Analysis

Probabilistic Exposure Fusion



EGC

1399

EGC

1398

by the local adaptation. The proposed scheme is novel for capturing an HDR scene by using a standard dynamic range

(SDR) device and synthesizing an image suitable for SDR displays. In particular, we use an SDR capture device to record

scene details (i.e., the visible contrasts and the scene gradients) in a series of SDR images with different exposure

levels. Each SDR image responds to a fraction of the HDR and partially records scene details. With the captured SDR

image series, we first calculate the image luminance levels, which maximize the visible contrasts, and then the scene

gradients embedded in these images. Next, we synthesize an SDR image by using a probabilistic model that preserves

the calculated image luminance levels and suppresses reversals in the image luminance gradients. The synthesized

SDR image contains many more scene details than any of the captured SDR images. Moreover, the proposed scheme

also functions as the tone mapping of an HDR image to the SDR image, and it is superior to both global and local tone

mapping operators. This is because global operators fail to preserve visual details when the contrast ratio of a scene is

large, whereas local operators often produce halos in the synthesized SDR image. The proposed scheme does not

require any human interaction or parameter tuning for different scenes. Subjective evaluations have shown that it is

preferred over a number of existing approaches.

This paper proposes an efficient method to estimate the point spread function (PSF) of a blurred image using the spatial

correlation of image gradients. A patch-based image degradation model is proposed for estimating the sample covariance

matrix of the gradient domain natural image. Based on the fact that the gradients of clean natural images are

approximately uncorrelated to each other, we estimated the autocorrelation function of the PSF from the covariance

matrix of gradient domain blurred image using the proposed patch-based image degradation model. The PSF is

computed using a phase retrieval technique to remove the ambiguity introduced by the absence of the phase.

Experimental results show that the proposed method significantly reduces the computational burden in PSF estimation,

compared with existing methods, while giving a comparable blur kernel.

This paper presents a novel algorithm for matching image interest points. Potential interest points are identified by

searching for local peaks in Difference-of-Gaussian (DoG) images. We refine and assign rotation, scale and location for

each key point by using the SIFT algorithm. A pseudo log-polar sampling grid is then applied to properly scaled image

patches around each key point, and a weighted adaptive lifting scheme transform is designed for each ring of the log-

polar grid. The designed adaptive transform for a ring in the reference key point and the general non-adaptive transform

are applied to the corresponding ring in a test key point. A similarity measure is calculated by comparing the

corresponding transform domain coefficients of the adaptive and non-adaptive transforms. We refer to the proposed

versatile system of Rotation and Scale Invariant Matching as RASIM. Our experiments show that the accuracy of RASIM

PSF Estimation via Gradient Domain Correlation

RASIM: A Novel Rotation and Scale Invariant Matching of Local Image Interest Points



EGC

1401

EGC

1400

EGC

1402

is higher than that of SIFT, which is the most widely used interest point matching algorithm in the literature. RASIM is also more

robust to image deformations, while its computation time is comparable to that of SIFT.

The inefficiency of separable wavelets in representing smooth edges has led to a great interest in the study of new 2-D

transformations. The most popular criterion for analyzing these transformations is the approximation power.

Transformations with near-optimal approximation power are useful in many applications such as denoising and

enhancement. However, they are not necessarily good for compression. Therefore, most of the nearly optimal

transformations such as curvelets and contourlets have not found any application in image compression yet. One of the

most promising schemes for image compression is the elegant idea of directional wavelets (DIWs). While these

algorithms outperform the state-of-the-art image coders in practice, our theoretical understanding of them is very

limited. In this paper, we adopt the notion of rate-distortion and calculate the performance of the DIW on a class of edge-

like images. Our theoretical analysis shows that if the edges are not “sharp,” the DIW will compress them more

efficiently than the separable wavelets. It also demonstrates the inefficiency of the quadtree partitioning that is often

used with the DIW. To solve this issue, we propose a new partitioning scheme called megaquad partitioning. Our

simulation results on real-world images confirm the benefits of the proposed partitioning algorithm, promised by our

theoretical analysis.

Rate-Distortion Analysis of Directional Wavelets

Real-Time Affine Global Motion Estimation Using Phase Correlation and its Application

for Digital Image Stabilization



EGC

1403

EGC

1405

EGC

1404

We propose a fast and robust 2D-affine global motion estimation algorithm based on phase-correlation in the Fourier-

Mellin domain and robust least square model fitting of sparse motion vector field and its application for digital image

stabilization. Rotation-scale-translation (RST) approximation of affine parameters is obtained at the coarsest level of the

image pyramid, thus ensuring convergence for a much larger range of motions. Despite working at the coarsest

resolution level, using subpixel-accurate phase correlation provides sufficiently accurate coarse estimates for the

subsequent refinement stage of the algorithm. The refinement stage consists of RANSAC-based robust least-squares

model fitting for a sparse motion vector field, estimated using block-based subpixel-accurate phase correlation at

randomly selected high-activity regions in the finest level of the image pyramid. The resulting algorithm is very robust to outliers

such as foreground objects and flat regions. We investigate the robustness of the proposed method for the digital image

stabilization application. Experimental results show that the proposed algorithm is capable of estimating a larger range of

motions than another phase correlation method and an optical flow algorithm.
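
The core phase-correlation step can be illustrated with a minimal sketch. It assumes NumPy, a pure integer-pixel translation, and circular image boundaries; the function name and the synthetic frame are illustrative only, and the Fourier-Mellin, subpixel, and RANSAC stages described above are not reproduced.

```python
import numpy as np

def phase_correlation_shift(reference, moved):
    """Estimate the integer (row, col) translation that maps `reference` onto `moved`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    cross_power = f_mov * np.conj(f_ref)
    # Keep only the phase; the small constant guards against division by zero.
    cross_power /= np.abs(cross_power) + 1e-12
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, correlation.shape))

# Synthetic test frame (an assumption for illustration): random texture, circularly shifted.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
shifted = np.roll(frame, shift=(5, -3), axis=(0, 1))
estimated_shift = phase_correlation_shift(frame, shifted)
print(estimated_shift)
```

Normalizing the cross-power spectrum discards magnitude information, so the inverse transform concentrates into a single peak whose location encodes the displacement.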

A focus profile having a steeper peak is more resistant to image noise in the autofocus (AF) process of a digital camera.

However, a focus profile of such shape normally has a flatter out-of-focus region on either side of the profile, resulting in

a slow AF process due to the lack of a clue about where the lens should move when it is in such regions. To

address the problem, we provide a statistical analysis of the focus profile and show that a strictly monotonic

transformation of the focus profile preserves the accuracy of the AF. On the basis of this analysis, we propose a new

focus profile representation that transforms the focus profile to the reciprocal domain in which the reciprocal focus

profile is modeled by a polynomial function. This transformation makes the AF mathematically tractable and boosts the

search speed. Experimental results are shown to demonstrate the advantage of the proposed representation.
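
To see why a reciprocal representation can speed up the search, consider a small sketch. It assumes a toy focus profile whose reciprocal is exactly quadratic in lens position; the profile shape, the polynomial degree, and all names are assumptions for illustration and are not taken from the paper.

```python
def peak_from_reciprocal_parabola(positions, focus_values):
    """Return the lens position minimizing a quadratic fitted to 1/focus at three probes."""
    x1, x2, x3 = positions
    r1, r2, r3 = (1.0 / value for value in focus_values)
    # Newton divided differences of the reciprocal samples give r(x) = a*x^2 + b*x + c.
    d1 = (r2 - r1) / (x2 - x1)
    d2 = (r3 - r2) / (x3 - x2)
    a = (d2 - d1) / (x3 - x1)
    b = d1 - a * (x1 + x2)
    # The minimum of the reciprocal is the maximum of the focus profile.
    return -b / (2.0 * a)

def toy_focus(x, best=7.0):
    """Hypothetical focus measure with a sharp peak at `best` and flat tails."""
    return 1.0 / (2.0 * (x - best) ** 2 + 0.5)

probe_positions = (0.0, 3.0, 10.0)
estimate = peak_from_reciprocal_parabola(
    probe_positions, [toy_focus(x) for x in probe_positions])
print(estimate)
```

In this idealized case three probes suffice to locate the in-focus position, even though the first two probes lie in the flat out-of-focus region of the original profile.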

Soft-decision adaptive interpolation (SAI) provides a powerful framework for image interpolation. The robustness of SAI

can be further improved by using weighted least-squares estimation instead of least-squares estimation in both the

parameter estimation and data estimation steps. To address the mismatch issue of "geometric duality" during parameter

estimation, the residuals (prediction errors) are weighted according to the geometric similarity between the pixel of

interest and the residuals. The robustness of data estimation can be improved by modeling the weights of residuals with

the well-known bilateral filter. Experimental results show that there is a 0.25-dB increase in peak signal-to-noise ratio

(PSNR) for a sample set of natural images after the suggested improvements are incorporated into the original SAI. The

proposed algorithm produces the highest quality in terms of PSNR and subjective quality among sophisticated

algorithms in the literature.
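
The benefit of weighting the residuals can be shown with a generic weighted least-squares sketch. It assumes NumPy and hand-picked weights on a toy line-fitting problem; the paper derives its weights from geometric similarity and the bilateral filter, which is not reproduced here.

```python
import numpy as np

def weighted_least_squares(design, targets, weights):
    """Solve min_b sum_i w_i * (y_i - x_i . b)^2 through the weighted normal equations."""
    w = np.asarray(weights, dtype=float)
    xtw = design.T * w  # same as X^T W for a diagonal weight matrix W
    return np.linalg.solve(xtw @ design, xtw @ targets)

# Toy data (an assumption): samples of y = 2x + 1 with one gross outlier.
x = np.arange(6, dtype=float)
design = np.column_stack([x, np.ones_like(x)])
targets = 2.0 * x + 1.0
targets[5] += 50.0

uniform = weighted_least_squares(design, targets, np.ones(6))
downweighted = weighted_least_squares(design, targets, [1, 1, 1, 1, 1, 1e-9])
print(uniform, downweighted)
```

With uniform weights the outlier pulls the fitted slope far from 2, whereas giving it a near-zero weight recovers the underlying line.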

Reciprocal Focus Profile

Robust Soft-Decision Interpolation Using Weighted Least Squares

Robust Through-the-Wall Radar Image Classification Using a Target-Model Alignment

Procedure



EGC

1407

EGC

1406

A through-the-wall radar image (TWRI) bears little resemblance to the equivalent optical image, making it difficult to

interpret. To maximize the intelligence that may be obtained, it is desirable to automate the classification of targets in

the image to support human operators. This paper presents a technique for classifying stationary targets based on the

high-range resolution profile (HRRP) extracted from 3-D TWRIs. The dependence of the image on the target location is

discussed using a system point spread function (PSF) approach. It is shown that the position dependence will cause a

classifier to fail, unless the image to be classified is aligned to a classifier-training location. A target image alignment

technique based on deconvolution of the image with the system PSF is proposed. Comparison of the aligned target

images with measured images shows that the alignment process introduces a normalized mean squared error (NMSE) ≤ 9%.

The HRRPs extracted from aligned target images are classified using a naive Bayesian classifier supported by principal

component analysis. The classifier is tested using a real TWRI of canonical targets behind a concrete wall and shown to

obtain correct classification rates ≥97%.

We study the problem of automatic "reduced-reference" image quality assessment (QA) algorithms from the point of

view of image information change. Such changes are measured between the reference- and natural-image

approximations of the distorted image. Algorithms that measure differences between the entropies of wavelet

coefficients of reference and distorted images, as perceived by humans, are designed. The algorithms differ in the data

on which the entropy difference is calculated and on the amount of information from the reference that is required for

quality computation, ranging from almost full information to almost no information from the reference. A special case of

these is algorithms that require just a single number from the reference for QA. The algorithms are shown to correlate

very well with subjective quality scores, as demonstrated on the Laboratory for Image and Video Engineering Image

Quality Assessment Database and the Tampere Image Database. Performance degradation, as the amount of information

is reduced, is also studied.

This paper addresses the problem of detecting salient areas within natural images. We shall mainly study the problem

under unsupervised setting, i.e., saliency detection without learning from labeled images. A solution of multitask

sparsity pursuit is proposed to integrate multiple types of features for detecting saliency collaboratively. Given an image

described by multiple features, its saliency map is inferred by seeking the consistently sparse elements from the joint

decompositions of multiple-feature matrices into pairs of low-rank and sparse matrices. The inference process is

formulated as a constrained nuclear-norm and ℓ2,1-norm minimization problem, which is convex and can be

solved efficiently with an augmented Lagrange multiplier method. Compared with previous methods, which usually

make use of multiple features by combining the saliency maps obtained from individual features, the proposed method

RRED Indices: Reduced Reference Entropic Differencing for Image Quality Assessment

Saliency Detection by Multitask Sparsity Pursuit



EGC

1409

EGC

1408

EGC

1410

seamlessly integrates multiple features to produce jointly the saliency map with a single inference step and thus

produces more accurate and reliable results. In addition to the unsupervised setting, the proposed method can also be

generalized to incorporate the top-down priors obtained from a supervised environment. Extensive experiments

validate its superiority over other state-of-the-art methods.
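
As an illustration of the ℓ2,1 norm mentioned above, the following sketch computes the norm and its column-wise shrinkage (proximal) operator, a standard building block in augmented Lagrange multiplier solvers. It assumes NumPy and a column-per-region convention; it is not the paper's full multitask algorithm, and the matrix values are made up.

```python
import numpy as np

def l21_norm(matrix):
    """Sum of the Euclidean norms of the columns."""
    return float(np.linalg.norm(matrix, axis=0).sum())

def l21_shrink(matrix, tau):
    """Proximal operator of tau * ||.||_{2,1}: scale each column toward zero."""
    norms = np.linalg.norm(matrix, axis=0)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return matrix * scale

# Hypothetical sparse component: one strong column (salient) and two weak ones.
residual = np.array([[3.0, 0.1, 0.0],
                     [4.0, 0.1, 0.2]])
shrunk = l21_shrink(residual, tau=1.0)
print(shrunk, l21_norm(shrunk))
```

Columns whose norm falls below the threshold are zeroed as a whole, which is why this norm favors solutions where entire columns, rather than scattered entries, are nonzero.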

In this paper, we propose a new sharpness enhancement algorithm for stereo images. Although the stereo image and its

applications are becoming increasingly prevalent, there has been very limited research on specialized image

enhancement solutions for stereo images. Recently, a binocular just-noticeable-difference (BJND) model that describes

the sensitivity of the human visual system to luminance changes in stereo images has been presented. We introduce a

novel application of the BJND model for the sharpness enhancement of stereo images. To this end, an

over-enhancement problem in the sharpness enhancement of stereo images is newly addressed, and an efficient solution for

reducing the over-enhancement is proposed. The solution is found within an optimization framework with additional

constraint terms to suppress the unnecessary increase in luminance values. In addition, the reliability of the BJND

model is taken into account by estimating the accuracy of stereo matching. Experimental results demonstrate that the

proposed algorithm can provide sharpness-enhanced stereo images without producing excessive distortion.

In this paper, we present a postprocessing method to tackle the single-image refocusing-and-defocusing problem. The

proposed method can accomplish the tasks of focus-map estimation and image refocusing and defocusing. Given an

image with a mixture of focused and defocused objects, we first detect the edges and then estimate the focus map

based on the edge blurriness, which is depicted explicitly by a parametric model. The image refocusing problem is

addressed in a blind deconvolution framework, where the image prior is modeled by using both global and local

constraints. In particular, we correct the defocused blurry edges to sharp ones with the aid of the parametric edge model

and then render this cue as a local prior to ensure the sharpness of the refocused image. Experimental results

demonstrate that the proposed method performs well in producing visually plausible images with different focus effects

from a single input.

Smile detection in face images captured in unconstrained real-world scenarios is an interesting problem with many

potential applications. This paper presents an efficient approach to smile detection, in which the intensity differences

between pixels in the grayscale face images are used as features. We adopt AdaBoost to choose and combine weak

classifiers based on intensity differences to form a strong classifier. Experiments show that our approach has similar

Sharpness Enhancement of Stereo Images Using Binocular Just-Noticeable Difference

Single-Image Refocusing and Defocusing

Smile Detection by Boosting Pixel Differences



EGC

1411

EGC

1413

EGC

1412

accuracy to the state-of-the-art method but is significantly faster. Our approach provides 85% accuracy by examining 20

pairs of pixels and 88% accuracy with 100 pairs of pixels. We match the accuracy of the Gabor-feature-based support

vector machine using as few as 350 pairs of pixels.
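
The idea of boosting pixel-difference features can be sketched as follows. This is a minimal AdaBoost with threshold stumps on toy four-pixel "images"; the data, the threshold grid, and the exhaustive search over pixel pairs are assumptions for illustration, not the paper's training setup.

```python
import math
from itertools import combinations

def stump_predict(image, pair, threshold, polarity):
    """Weak classifier: thresholds the intensity difference of one pixel pair."""
    p, q = pair
    return polarity if image[p] - image[q] > threshold else -polarity

def train_adaboost(images, labels, thresholds, rounds):
    n = len(images)
    weights = [1.0 / n] * n
    candidates = [(pair, thr, pol)
                  for pair in combinations(range(len(images[0])), 2)
                  for thr in thresholds
                  for pol in (1, -1)]
    ensemble = []
    for _ in range(rounds):
        def weighted_error(stump):
            return sum(w for img, y, w in zip(images, labels, weights)
                       if stump_predict(img, *stump) != y)
        best = min(candidates, key=weighted_error)
        # Clamp the error so a perfect stump still gets a finite vote.
        err = min(max(weighted_error(best), 1e-10), 1.0 - 1e-10)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, best))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(img, *best))
                   for img, y, w in zip(images, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def strong_predict(ensemble, image):
    score = sum(alpha * stump_predict(image, *stump) for alpha, stump in ensemble)
    return 1 if score > 0 else -1

# Toy flattened grayscale patches; +1 and -1 stand in for smile / non-smile.
images = [[200, 90, 80, 50], [180, 60, 70, 40], [150, 100, 90, 60],
          [50, 80, 90, 200], [40, 70, 60, 180], [60, 90, 100, 150]]
labels = [1, 1, 1, -1, -1, -1]
ensemble = train_adaboost(images, labels, thresholds=[-50, 0, 50], rounds=3)
predictions = [strong_predict(ensemble, img) for img in images]
print(predictions)
```

Each weak classifier needs only one subtraction and one comparison at test time, which is consistent with the speed advantage reported above.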

We present a new class of continuously defined parametric snakes using a special kind of exponential splines as basis

functions. We have enforced our bases to have the shortest possible support subject to some design constraints to

maximize efficiency. While the resulting snakes are versatile enough to provide a good approximation of any closed

curve in the plane, their most important feature is the fact that they admit ellipses within their span. Thus, they can

perfectly generate circular and elliptical shapes. These features are appropriate to delineate cross sections of

cylindrical-like conduits and to outline bloblike objects. We address the implementation details and illustrate the

capabilities of our snake with synthetic and real data.

In the linear mixing model (LMM) for hyperspectral images, all the image spectra lie on a high-dimensional simplex with corners called

endmembers. Given a set of endmembers, the standard calculation of fractional abundances with constrained least squares

typically identifies the spectra as combinations of most, if not all, endmembers. We assume instead that pixels are

combinations of only a few endmembers, yielding abundance vectors that are sparse. We introduce sparse demixing

(SD), a method similar to orthogonal matching pursuit, for calculating these sparse abundances. We

demonstrate that SD outperforms an existing L1 demixing algorithm, which we prove to depend adversely on the angles

between endmembers. We combine SD with dictionary learning methods to automatically calculate endmembers for a

provided set of spectra. Applying it to an airborne visible/infrared imaging spectrometer image of Cuprite, NV, yields

endmembers that compare favorably with signatures from the USGS spectral library.
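
Since the abstract describes sparse demixing as similar to orthogonal matching pursuit, a plain OMP sketch may help fix ideas. It assumes NumPy, unit-norm toy endmembers, and no nonnegativity or sum-to-one constraints; it is ordinary OMP, not the paper's SD algorithm.

```python
import numpy as np

def orthogonal_matching_pursuit(dictionary, signal, sparsity):
    """Greedily select `sparsity` columns of `dictionary` that best explain `signal`."""
    residual = signal.astype(float).copy()
    support = []
    coeffs = np.zeros(0)
    for _ in range(sparsity):
        # Pick the unused column most correlated with the current residual.
        correlations = np.abs(dictionary.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        # Refit all selected columns jointly, then update the residual.
        coeffs, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ coeffs
    abundances = np.zeros(dictionary.shape[1])
    abundances[support] = coeffs
    return abundances

# Hypothetical endmember matrix: 4 spectral bands (rows) x 4 endmembers (columns).
endmembers = np.array([[1.0, 0.0, 0.0, 0.5],
                       [0.0, 1.0, 0.0, 0.5],
                       [0.0, 0.0, 1.0, 0.5],
                       [0.0, 0.0, 0.0, 0.5]])
pixel = 0.7 * endmembers[:, 0] + 0.3 * endmembers[:, 3]
abundances = orthogonal_matching_pursuit(endmembers, pixel, sparsity=2)
print(abundances)
```

The pixel is explained by two endmembers only, and the remaining abundances stay exactly zero, in contrast to a dense least-squares fit over all columns.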

Super-resolution technology provides an effective way to increase image resolution by incorporating additional

information from successive input images or training samples. Various super-resolution algorithms have been proposed

based on different assumptions, and their relative performances can differ in regions of different characteristics within a

single image. Based on this observation, an adaptive algorithm is proposed in this paper to integrate a higher level

image classification task and a lower level super-resolution process, in which we incorporate reconstruction-based

super-resolution algorithms, single-image enhancement, and image/video classification into a single comprehensive

framework. The target high-resolution image plane is divided into adaptive-sized blocks, and different suitable super-

resolution algorithms are automatically selected for the blocks. Then, a deblocking process is applied to reduce block

Snakes with an Ellipse-Reproducing Property

Sparse Demixing of Hyperspectral Images

Spatially Adaptive Block-Based Super-Resolution



EGC

1415

EGC

1414

EGC

1416

edge artifacts. A new benchmark is also utilized to measure the performance of super-resolution algorithms.

Experimental results with real-life videos indicate encouraging improvements with our method.

Data-level fusion is believed to have the potential for enhancing human face recognition. However, due to a number of

challenges, current techniques have failed to achieve its full potential. We propose spatially optimized data/pixel-level

fusion of 3-D shape and texture for face recognition. Fusion functions are objectively optimized to model expression and

illumination variations in linear subspaces for invariant face recognition. Parameters of adjacent functions are

constrained to smoothly vary for effective numerical regularization. In addition to spatial optimization, multiple nonlinear

fusion models are combined to enhance their learning capabilities. Experiments on the FRGC v2 data set show that

spatial optimization, higher order fusion functions, and the combination of multiple such functions systematically

improve performance, which is, for the first time, higher than score-level fusion in a similar experimental setup.

In this paper, we exploit the advantages of tensorial representations and propose several tensor learning models for

regression. The model is based on the canonical/parallel-factor decomposition of tensors of multiple modes and allows

the simultaneous projections of an input tensor to more than one direction along each mode. Two empirical risk

functions are studied, namely, the square loss and ε-insensitive loss functions. The former leads to higher rank tensor

ridge regression (TRR), and the latter leads to higher rank support tensor regression (STR), both formulated using the

Frobenius norm for regularization. We also use the group-sparsity norm for regularization, favoring in that way the low

rank decomposition of the tensorial weight. In that way, we achieve the automatic selection of the rank during the

learning process and obtain the optimal-rank TRR and STR. Experiments conducted for the problems of head-pose,

human-age, and 3-D body-pose estimations using real data from publicly available databases, verified not only the

superiority of tensors over their vector counterparts but also the efficiency of the proposed algorithms.

Text-line extraction in unconstrained handwritten documents remains a challenging problem due to nonuniform

character scale, spatially varying text orientation, and the interference between text lines. In order to address these

problems, we propose a new cost function that considers the interactions between text lines and the curvilinearity of

each text line. Precisely, we achieve this goal by introducing normalized measures for them, which are based on

estimated line spacing. We also present an optimization method that exploits the properties of our cost function.

Experimental results on a database consisting of 853 handwritten Chinese document images have shown that our

method achieves a detection rate of 99.52% and an error rate of 0.32%, which outperforms conventional methods.

Spatially Optimized Data-Level Fusion of Texture and Shape for Face Recognition

Tensor Learning for Regression

Text-Line Extraction in Handwritten Chinese Documents Based on an Energy

Minimization Framework



EGC

1417

EGC

1417

EGC

1416

We present a new supervised learning model designed for the automatic segmentation of the left ventricle (LV) of the

heart in ultrasound images. We address the following problems inherent to supervised learning models: 1) the need for a

large set of training images; 2) robustness to imaging conditions not present in the training data; and 3) complex search

process. The innovations of our approach reside in a formulation that decouples the rigid and nonrigid detections, deep

learning methods that model the appearance of the LV, and efficient derivative-based search algorithms. The

functionality of our approach is evaluated using a data set of diseased cases containing 400 annotated images (from 12

sequences) and another data set of normal cases comprising 80 annotated images (from two sequences), where both

sets present long axis views of the LV. Using several error measures to compute the degree of similarity between the

manual and automatic segmentations, we show that our method not only has high sensitivity and specificity but also

presents variations with respect to a gold standard (computed from the manual annotations of two experts) within

interuser variability on a subset of the diseased cases. We also compare the segmentations produced by our approach

and by two state-of-the-art LV segmentation models on the data set of normal cases, and the results show that our

approach produces segmentations that are comparable to these two approaches using only 20 training images, and that

increasing the training set to 400 images causes our approach to be generally more accurate. Finally, we show that

efficient search methods reduce the complexity of the method by up to tenfold while still producing competitive

segmentations. In the future, we plan to include a dynamical model to improve the performance of the algorithm, to use

semisupervised learning methods to reduce even more the dependence on rich and large training sets, and to design a

shape model less dependent on the training set.

The Segmentation of the Left Ventricle of the Heart from Ultrasound Data Using Deep

Learning Architectures and Derivative-Based Search Methods



EGC

1419

EGC

1418

The problem of view interpolation from a pair of rectified stereo images with inaccurate depth information is addressed.

Errors in geometric information greatly affect the quality of the resulting images since inaccurate geometry causes

miscorrespondences between the input images. A new theory for quantitatively analyzing the effect of depth errors and

providing a principled optimization scheme based on the mean-squared error metric is proposed. The theory clarifies

that, if the probabilistic distribution of the depth errors is given, an optimized view-interpolation scheme that

outperforms conventional linear interpolation can be derived. It also reveals that, under specific conditions, linear

interpolation is acceptable as an approximation of the optimized-interpolation scheme. Furthermore, band limitation

combined with linear interpolation is also analyzed, leading to an optimal cutoff frequency, which achieves better results

than the antialias scheme proposed in previous studies. Experimental results using real scenes are also presented to

confirm this theory.

Theoretical Analysis of View Interpolation with Inaccurate Depth Information

This is SPIRAL-TAP: Sparse Poisson Intensity Reconstruction ALgorithms—Theory and

Practice



EGC

1421

EGC

1420

Observations in many applications consist of counts of discrete events, such as photons hitting a detector, which

cannot be effectively modeled using an additive bounded or Gaussian noise model, and instead require a Poisson noise

model. As a result, accurate reconstruction of a spatially or temporally distributed phenomenon (f*) from Poisson data

(y) cannot be effectively accomplished by minimizing a conventional penalized least-squares objective function. The

problem addressed in this paper is the estimation of f* from y in an inverse problem setting, where the number of

unknowns may potentially be larger than the number of observations and f* admits sparse approximation. The

optimization formulation considered in this paper uses a penalized negative Poisson log-likelihood objective function

with nonnegativity constraints (since Poisson intensities are naturally nonnegative). In particular, the proposed

approach incorporates key ideas of using separable quadratic approximations to the objective function at each iteration

and penalization terms related to ℓ1 norms of coefficient vectors, total variation seminorms, and partition-based

multiscale estimation methods.
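
The objective described above can be written down in a few lines. This sketch assumes a small system matrix (called `system` here, a hypothetical name), an ℓ1 penalty applied directly to the intensity, and strictly positive predicted means; it only evaluates the objective and does not implement the separable quadratic approximation iterations.

```python
import math

def poisson_negative_log_likelihood(system, intensity, counts):
    """Negative Poisson log-likelihood, omitting the log(y!) term that is constant in f."""
    total = 0.0
    for row, y in zip(system, counts):
        mean = sum(a * f for a, f in zip(row, intensity))
        total += mean - y * math.log(mean)
    return total

def penalized_objective(system, intensity, counts, tau):
    """Data term plus tau times the l1 norm of a nonnegative intensity vector."""
    if any(f < 0 for f in intensity):
        raise ValueError("Poisson intensities must be nonnegative")
    data_term = poisson_negative_log_likelihood(system, intensity, counts)
    return data_term + tau * sum(abs(f) for f in intensity)

# Toy problem (an assumption): direct observation of two intensities.
system = [[1.0, 0.0], [0.0, 1.0]]
counts = [4, 9]
at_counts = poisson_negative_log_likelihood(system, [4.0, 9.0], counts)
nearby = [poisson_negative_log_likelihood(system, f, counts)
          for f in ([3.0, 9.0], [5.0, 9.0], [4.0, 10.0])]
print(at_counts, nearby)
```

For direct observations the unpenalized data term is smallest when the intensity equals the observed counts; the penalty weight `tau` then trades that fit against sparsity.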

We address the problem of model-based object recognition. Our aim is to localize and recognize road vehicles from

monocular images or videos in calibrated traffic scenes. A 3-D deformable vehicle model with 12 shape parameters is

set up as prior information, and its pose is determined by three parameters, which are its position on the ground plane

and its orientation about the vertical axis under ground-plane constraints. An efficient local gradient-based method is

proposed to evaluate the fitness between the projection of the vehicle model and image data, which is combined into a

novel evolutionary computing framework to estimate the 12 shape parameters and three pose parameters by iterative

evolution. The recovery of pose parameters achieves vehicle localization, whereas the shape parameters are used for

vehicle recognition. Numerous experiments are conducted in this paper to demonstrate the performance of our

approach. It is shown that the local gradient-based method can evaluate accurately and efficiently the fitness between

the projection of the vehicle model and the image data. The evolutionary computing framework is effective for vehicles

of different types and poses and is robust to all kinds of occlusion.

Image processing methods that utilize characteristics of the human visual system require color spaces with certain

properties to operate effectively. After analyzing different types of perception-based image processing problems, we

present a list of properties that a unified color space should have. Due to contradictory perceptual phenomena and

geometric issues, a color space cannot incorporate all these properties. We therefore identify the most important

properties and focus on creating opponent color spaces without cross contamination between color attributes (i.e.,

lightness, chroma, and hue) and with maximum perceptual uniformity induced by color-difference formulas. Color

lookup tables define simple transformations from an initial color space to the new spaces. We calculate such tables

using multigrid optimization considering the Hung and Berns data of constant perceived hue and the CMC, CIE94, and

CIEDE2000 color-difference formulas. The resulting color spaces exhibit low cross contamination between color

Three-Dimensional Deformable-Model-Based Localization and Recognition of Road

Vehicles

Toward a Unified Color Space for Perception-Based Image Processing



EGC

1423

EGC

1422

EGC

1424

attributes and are only slightly less perceptually uniform than spaces optimized exclusively for perceptual uniformity.

We compare the CIEDE2000-based space with commonly used color spaces in two examples of perception-based image

processing. In both cases, standard methods show improved results if the new space is used. All color-space

transformations and examples are provided as MATLAB code on our website.

Mosaicing is largely dependent on the quality of registration among the constituent input images. Parallax and object

motion present challenges to image registration, leading to artifacts in the result. To reduce the impact of these artifacts,

traditional image mosaicing approaches often impose planar scene constraints or rely on purely rotational camera

motion or dense sampling. However, these requirements are often impractical or fail to address the needs of all

applications. Instead, taking advantage of depth cues and a smooth transition criterion, we achieve significantly

improved mosaicing results for static scenes, coping effectively with nontrivial parallax in the input. We extend this

approach to the synthesis of dynamic video mosaics, incorporating foreground/background segmentation and a

consistent motion perception criterion. Although further additions are required to cope with unconstrained object

motion, our algorithm can synthesize a perceptually convincing output, conveying the same appearance of motion as

seen in the input sequences.

A method for full-reference visual quality assessment based on the 2-D combination of two diverse metrics is described.

The first metric is a measure of structural information loss based on the Fisher information about the position of the

structures in the observed images. The second metric acts as a categorical indicator of the type of distortion that

images underwent. These two metrics constitute the inner state of a virtual cognitive model, viewed as a system whose

output is the automatic visual quality estimate. The use of a 2-D metric fills the intrinsic incompleteness of methods

based on a single metric while providing consistent response across different image impairment factors and blind

distortion classification capability with a modest computational overhead. The high accuracy and robustness of the

method are demonstrated through cross-validation experiments.

We present a framework for image segmentation based on quadratic programming, i.e., by minimization of a quadratic

regularized energy linearly constrained. In particular, we present a new variational derivation of the quadratic Markov

measure field (QMMF) models, which can be understood as a procedure for regularizing model preferences

Toward Dynamic Image Mosaic Generation with Robustness to Parallax

Two-Dimensional Approach to Full-Reference Image Quality Assessment Based on

Positional Structural Information

Variational Viewpoint of the Quadratic Markov Measure Field Models: Theory and

Algorithms



EGC

1425

EGC

1426

(memberships or likelihoods). We also present efficient optimization algorithms. In the QMMFs, the uncertainty in the

computed regularized probability measure field is controlled by penalizing Gini's coefficient, and hence, it affects the

convexity of the quadratic programming problem. The convex case is reduced to the solution of a positive definite linear

system, and for that case, an efficient Gauss-Seidel (GS) scheme is presented. On the other hand, we present an

efficient projected GS with subspace minimization for optimizing the nonconvex case. We demonstrate the proposal's

capabilities by experiments and numerical comparisons with interactive two-class segmentation, as well as the

simultaneous estimation of segmentation and (parametric and nonparametric) generative models. We present

extensions to the original formulation for including color and texture clues, as well as imprecise user scribbles in an

interactive framework.
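
For the convex case, which reduces to a positive definite linear system, a generic Gauss-Seidel iteration looks like the sketch below. The 3x3 system is a made-up example; the actual QMMF system matrix and the projected variant for the nonconvex case are not reproduced.

```python
def gauss_seidel(matrix, rhs, sweeps=100, tol=1e-12):
    """Solve matrix * x = rhs for a symmetric positive definite matrix."""
    n = len(rhs)
    x = [0.0] * n
    for _ in range(sweeps):
        largest_change = 0.0
        for i in range(n):
            # Use the newest values of x as soon as they are available.
            off_diagonal = sum(matrix[i][j] * x[j] for j in range(n) if j != i)
            updated = (rhs[i] - off_diagonal) / matrix[i][i]
            largest_change = max(largest_change, abs(updated - x[i]))
            x[i] = updated
        if largest_change < tol:
            break
    return x

# Hypothetical symmetric positive definite system whose solution is [1, 2, 3].
matrix = [[4.0, 1.0, 0.0],
          [1.0, 3.0, 1.0],
          [0.0, 1.0, 2.0]]
rhs = [6.0, 10.0, 8.0]
solution = gauss_seidel(matrix, rhs)
print(solution)
```

Because each unknown is updated in place from its neighbors, the scheme needs no extra storage and maps naturally onto per-pixel updates in an image lattice.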

There has been considerable recent interest in using wavelets to analyze time series and images that can be regarded

as realizations of certain 1-D and 2-D stochastic processes on a regular lattice. Wavelets give rise to the concept of the

wavelet variance (or wavelet power spectrum), which decomposes the variance of a stochastic process on a scale-by-

scale basis. The wavelet variance has been applied to a variety of time series, and a statistical theory for estimators of

this variance has been developed. While there have been applications of the wavelet variance in the 2-D context (in

particular, in works by Unser in 1995 on wavelet-based texture analysis for images and by Lark and Webster in 2004 on

analysis of soil properties), a formal statistical theory for such analysis has been lacking. In this paper, we develop the

statistical theory by generalizing and extending some of the approaches developed for time series, thus leading to a

large-sample theory for estimators of 2-D wavelet variances. We apply our theory to simulated data from Gaussian

random fields with exponential covariances and from fractional Brownian surfaces. We demonstrate that the wavelet

variance is potentially useful for texture discrimination. We also use our methodology to analyze images of four types of

clouds observed over the southeast Pacific Ocean.
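
The scale-by-scale decomposition of variance can be checked numerically in a simplified setting. The sketch below uses an orthonormal Haar transform of a 1-D series whose length is a power of two; the 2-D estimators and the large-sample theory developed in the paper are not reproduced, and the sample series is made up.

```python
import math

def haar_variance_by_scale(series):
    """Per-level Haar detail energy divided by the series length (finest scale first)."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(value) for value in series]
    contributions = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(b - a) / math.sqrt(2.0) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2.0) for a, b in pairs]
        contributions.append(sum(d * d for d in detail) / n)
    return contributions

series = [2.0, 4.0, 1.0, 7.0, 3.0, 3.0, 8.0, 4.0]
per_scale = haar_variance_by_scale(series)
mean = sum(series) / len(series)
sample_variance = sum((value - mean) ** 2 for value in series) / len(series)
print(per_scale, sum(per_scale), sample_variance)
```

Because the transform is orthonormal, the per-scale contributions add up to the sample variance of the series, which is the property that makes the wavelet variance a scale-wise analysis of variance.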

Radiometric degradation is a common problem in the image acquisition part of many applications. There is much

research carried out in an effort to deblur such images. However, it has been proven that it is not always necessary to go

through a burdensome process of deblurring. To tackle this problem, different blur-invariant descriptors have been

proposed so far, which are either in the spatial domain or based on the properties available in the Fourier domain. In this

paper, wavelet-domain blur invariants are proposed for the first time for discrete 2-D signals. These descriptors, which

are invariant to centrally symmetric blurs, inherit the advantages that this domain provides. It is also proven that the

spatial-domain blur invariants are a special case of the proposed invariants. The performance of these invariants is

demonstrated through experiments.
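
To make the idea of a blur-invariant descriptor concrete, the sketch below checks a classical spatial-domain example: third-order central moments of an image are unchanged by convolution with a centrally symmetric kernel that sums to one. This is not the wavelet-domain construction proposed in the paper; it only illustrates the spatial-domain kind of invariant that the abstract says is covered as a special case. The helper names `convolve2d_full` and `central_moment`, the test image, and the kernel are hypothetical choices.

```python
def convolve2d_full(image, kernel):
    """Full 2-D convolution of two lists of lists (output grows by kernel size - 1)."""
    rows, cols = len(image), len(image[0])
    krows, kcols = len(kernel), len(kernel[0])
    out = [[0.0] * (cols + kcols - 1) for _ in range(rows + krows - 1)]
    for i in range(rows):
        for j in range(cols):
            for u in range(krows):
                for v in range(kcols):
                    out[i + u][j + v] += image[i][j] * kernel[u][v]
    return out


def central_moment(image, p, q):
    """Central moment mu_pq, with p the power of x (columns) and q of y (rows)."""
    total = sum(sum(row) for row in image)
    cx = sum(j * v for row in image for j, v in enumerate(row)) / total
    cy = sum(i * v for i, row in enumerate(image) for v in row) / total
    return sum(
        (j - cx) ** p * (i - cy) ** q * v
        for i, row in enumerate(image)
        for j, v in enumerate(row)
    )


# Hypothetical asymmetric test image and a centrally symmetric, unit-sum blur.
image = [[0.0, 1.0, 0.0, 0.0],
         [0.0, 2.0, 5.0, 0.0],
         [0.0, 0.0, 3.0, 9.0]]
blur = [[w / 16.0 for w in row] for row in ([1, 2, 1], [2, 4, 2], [1, 2, 1])]
blurred = convolve2d_full(image, blur)

mu30_before = central_moment(image, 3, 0)
mu30_after = central_moment(blurred, 3, 0)   # agrees with mu30_before up to rounding
mu20_before = central_moment(image, 2, 0)
mu20_after = central_moment(blurred, 2, 0)   # second-order moments do change
```

Comparing `mu30_before` with `mu30_after` shows the invariance, while the second-order moment grows with the spread of the blur kernel, so it is not usable as a blur-invariant feature on its own.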

Wavelet Variance Analysis for Random Fields on a Regular Lattice

Wavelet-Domain Blur Invariants for Image Analysis



EGC

1427

The number of digital images is increasing rapidly, and organizing these resources effectively has become an important

challenge. As a way to facilitate image categorization and retrieval, automatic image annotation has received much

research attention. Considering that there are a great number of unlabeled images available, it is beneficial to develop an

effective mechanism to leverage unlabeled images for large-scale image annotation. Meanwhile, a single image is

usually associated with multiple labels, which are inherently correlated to each other. A straightforward method of image

annotation is to decompose the problem into multiple independent single-label problems, but this ignores the

underlying correlations among different labels. In this paper, we propose a new inductive algorithm for image annotation

by integrating label correlation mining and visual similarity mining into a joint framework. We first construct a graph

model according to image visual features. A multilabel classifier is then trained by simultaneously uncovering the

shared structure common to different labels and the visual graph embedded label prediction matrix for image

annotation. We show that the globally optimal solution of the proposed framework can be obtained by performing

generalized eigendecomposition. We apply the proposed framework to both web image annotation and personal album

labeling using the NUS-WIDE, MSRA MM 2.0, and Kodak image data sets, and the AUC evaluation metric. Extensive

experiments on large-scale image databases collected from the web and personal albums show that the proposed

algorithm is capable of utilizing both labeled and unlabeled data for image annotation and outperforms other algorithms.
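
The abstract reports results with the AUC evaluation metric. As a minimal sketch of how that metric can be computed for multilabel annotation, the code below uses the pairwise definition of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as one half) and averages it over labels. The function names `auc` and `mean_label_auc`, the per-label averaging, and the toy scores are assumptions; the abstract does not say how the paper aggregates AUC across labels.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_label_auc(score_matrix, label_matrix):
    """Average the per-label AUC; rows are images, columns are labels."""
    num_labels = len(score_matrix[0])
    per_label = [
        auc([row[k] for row in score_matrix], [row[k] for row in label_matrix])
        for k in range(num_labels)
    ]
    return sum(per_label) / num_labels


# Toy example: four images, two labels (hypothetical scores).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.1]]
truth = [[1, 0], [0, 1], [1, 1], [0, 0]]
result = mean_label_auc(scores, truth)
```

In the toy example the first label has an AUC of 0.75 and the second an AUC of 1.0, so `result` is 0.875.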

Web and Personal Image Annotation by Mining Label Correlation with Relaxed Visual

Graph Embedding
