Elysium PRO Titles with Abstracts 2017-18


Biometric systems can identify individuals based on their unique characteristics. A new biometric based on hand

synergies and their neural representations is proposed here. In this paper, ten subjects were asked to perform six

hand grasps that are shared by most common activities of daily living. Their scalp electroencephalographic

(EEG) signals were recorded using 32 scalp electrodes, of which 18 task-relevant electrodes were used in feature

extraction. In our previous work, we found that hand kinematic synergies, or movement primitives, can be a

potential biometric. In this paper, we combined the hand kinematic synergies and their neural representations to

provide a unique signature for an individual as a biometric. Neural representations of hand synergies were

encoded in spectral coherence of optimal EEG electrodes in the motor and parietal areas. An equal error rate of

7.5% was obtained at the system’s best configuration. Also, it was observed that the best performance was

obtained when movement-specific EEG signals in gamma frequencies (30–50 Hz) were used as features. The

implications of these first results, improvements, and their applications in the near future are discussed

ETPL BI -

001

Biometrics Based on Hand Synergies and Their Neural Representations
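
The equal error rate quoted in the entry above (ETPL BI - 001) is the operating point where the false acceptance rate equals the false rejection rate. A minimal sketch of how such a figure can be estimated from match scores follows; the function name, the threshold sweep, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from similarity scores (higher = more likely genuine)."""
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: genuine scores that fall below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance: impostor scores at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical toy scores, only for illustration.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.5, 0.3, 0.2, 0.1, 0.05]
print(equal_error_rate(genuine_scores, impostor_scores))  # 0.2 for this toy data
```

With real data the sweep would run over many more scores, and interpolating between adjacent thresholds gives a smoother estimate.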

Face spoofing detection is commonly formulated as a two-class recognition problem where relevant features of

both positive (real access) and negative samples (spoofing attempts) are utilized to train the system. However,

the diversity of spoofing attacks, any new means of spoofing that attackers may invent (previously unseen by the

system), the problem of imaging sensor interoperability, and other environmental factors, in addition to the small

sample size, make the problem quite challenging. Considering these observations, in this paper, a number of

propositions in the evaluation scenario, problem formulation, and solving are presented. First of all, a new

evaluation protocol to study the effect of occurrence of unseen attack types, where the train and test data are

produced by different means, is proposed. The new evaluation protocol better reflects the realistic conditions in

spoofing attempts where an attacker may come up with new means for spoofing. Inter-database and intra-

database experiments are incorporated into the evaluation scheme to account for the sensor interoperability

problem. Second, a new and more realistic formulation of the spoofing detection problem based on the anomaly

detection concept is proposed where the training data come from the positive class only. The test data, of course,

may come from the positive or negative class. Such a one-class formulation circumvents the need for the

availability of negative training samples, which, in an ideal case, should be representative of all possible

spoofing types. Finally, a thorough evaluation and comparison of 20 different one-class and two-class systems

on the video sequences of three widely employed databases is performed to investigate the merits of the one-

class anomaly detection approaches compared with the common two-class formulations. It is demonstrated that

the anomaly-based formulation is not inferior as compared with the conventional two-class approach

ETPL BI -

002

An Anomaly Detection Approach to Face Spoofing Detection: A New Formulation

and Evaluation Protocol
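
The one-class formulation in the entry above (ETPL BI - 002) trains on real-access samples only and treats anything far from that class as a possible spoof. The sketch below illustrates the idea with a per-feature Gaussian-style score; the scoring rule, the threshold choice, and the two-dimensional toy features are assumptions made for illustration and are not among the systems evaluated in the paper.

```python
from statistics import mean, pstdev


def fit_one_class(real_samples):
    """Estimate per-feature mean and spread from real-access samples only."""
    columns = list(zip(*real_samples))
    means = [mean(c) for c in columns]
    # Guard against zero spread so the score never divides by zero.
    stds = [pstdev(c) or 1e-9 for c in columns]
    return means, stds


def anomaly_score(sample, model):
    """Mean squared z-score: large values lie far from the real-access class."""
    means, stds = model
    return mean(((x - m) / s) ** 2 for x, m, s in zip(sample, means, stds))


def is_spoof(sample, model, threshold):
    """Flag a sample whose anomaly score exceeds the threshold."""
    return anomaly_score(sample, model) > threshold


# Hypothetical features extracted from real-access faces (no spoof samples needed).
real_train = [[0.52, 1.9], [0.48, 2.1], [0.50, 2.0], [0.55, 2.2], [0.45, 1.8]]
model = fit_one_class(real_train)
# Assumed rule: accept everything that scores no worse than the training data.
threshold = max(anomaly_score(s, model) for s in real_train)

print(is_spoof([0.51, 2.05], model, threshold))  # close to the training data
print(is_spoof([0.90, 0.50], model, threshold))  # far from the training data
```

Because no negative samples enter training, an attack type that was never seen before is handled the same way as a known one: it is flagged only if it lies far from the real-access distribution.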


Wristband-placed physical activity monitors, as a convenient means for counting walking steps, assessing

movement, and estimating energy expenditure, are widely used in daily life. There are many consumer-based

wristband monitors on the market, but there is no unified method to compare their performance. In this paper,

we designed a series of experiments testing step counting performance under different walking conditions to

evaluate these wristband activity monitors. Seven popular brands, including Huawei B1, Mi Band, Fitbit Charge,

Polar Loop, Garmin Vivofit2, Misfit Shine, and Jawbone Up, were selected and evaluated with the proposed

experiment method in this paper. These experiments include four parts, which are walking in a field at different

walking speeds with and without arm swing, walking along a specified complex path, walking on a treadmill,

and walking up and down stairs. Experiment results and analysis with nine healthy subjects were reported to

show the step counting performance of these seven monitors

ETPL BI -

003

Evaluation on Step Counting Performance of Wristband Activity Monitors in Daily

Living Environment

Matching heterogeneous iris images in less constrained applications of iris biometrics is becoming a challenging

task. The existing solutions try to reduce the difference between heterogeneous iris images in pixel intensities

or filtered features. In contrast, this paper proposes a code-level approach in heterogeneous iris recognition. The

non-linear relationship between binary feature codes of heterogeneous iris images is modeled by an adapted

Markov network. This model transforms the number of iris templates in the probe into a homogenous iris

template corresponding to the gallery sample. In addition, a weight map on the reliability of binary codes in the

iris template can be derived from the model. The learnt iris template and weight map are jointly used in building

a robust iris matcher against the variations of imaging sensors, capturing distance, and subject conditions.

Extensive experimental results of matching cross-sensor, high-resolution versus low-resolution, and clear versus

blurred iris images demonstrate that the code-level approach can achieve the highest accuracy compared with the

existing pixel-level, feature-level, and score-level solutions

ETPL BI -

004

A Code-Level Approach to Heterogeneous Iris Recognition
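
The entry above (ETPL BI - 004) uses a binary iris template together with a weight map describing how reliable each bit is. One common way to combine the two is a reliability-weighted Hamming distance, sketched below. This is an assumed, generic matcher shown for illustration; the paper's actual matcher and the Markov-network step that produces the template and weights are not reproduced here.

```python
def weighted_hamming(code_a, code_b, weights):
    """Fraction of disagreeing bits, each bit weighted by its reliability."""
    if not (len(code_a) == len(code_b) == len(weights)):
        raise ValueError("codes and weight map must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weight map must contain a positive weight")
    # Only bits that differ contribute, scaled by how much we trust them.
    disagreement = sum(w for a, b, w in zip(code_a, code_b, weights) if a != b)
    return disagreement / total


# Hypothetical 8-bit codes; bits 2 and 5 disagree but are marked unreliable.
probe = [1, 0, 1, 1, 0, 0, 1, 0]
gallery = [1, 0, 0, 1, 0, 1, 1, 0]
weights = [1.0, 1.0, 0.2, 1.0, 1.0, 0.2, 1.0, 1.0]

print(weighted_hamming(probe, gallery, weights))    # about 0.0625
print(weighted_hamming(probe, gallery, [1.0] * 8))  # 0.25 without the weight map
```

Down-weighting unreliable bits lowers the distance between templates of the same eye when the disagreements sit in regions that a blurred or low-resolution capture cannot encode reliably.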


We propose multi-task and multivariate methods for multi-modal recognition based on low-rank and joint sparse

representations. Our formulations can be viewed as generalized versions of multivariate low-rank and sparse

regression, where sparse and low-rank representations across all modalities are imposed. One of our methods

simultaneously couples information within different modalities by enforcing the common low-rank and joint

sparse constraints among multi-modal observations. We also modify our formulations by including an occlusion

term that is assumed to be sparse. The alternating direction method of multipliers is proposed to efficiently solve

the resulting optimization problems. Extensive experiments on three publicly available multi-modal biometrics

and object recognition data sets show that our methods compare favourably with other feature-level fusion

methods

ETPL BI -

005

Low-Rank and Joint Sparse Representations for Multi-Modal Recognition

Body area networks, including smart sensors, are widely reshaping health applications in the new era of smart

cities. To meet increasing security and privacy requirements, physiological signal-based biometric human

identification is gaining tremendous attention. This paper focuses on two major impediments: the signal

processing technique is usually both complicated and data-dependent and the feature engineering is time-

consuming and can fit only specific datasets. To enable a data-independent and highly generalizable signal

processing and feature learning process, a novel wavelet domain multiresolution convolutional neural network

is proposed. Specifically, it allows for blindly selecting a physiological signal segment for identification purpose,

avoiding the complicated signal fiducial characteristics extraction process. To enrich the data representation, the

randomly chosen signal segment is then transformed to the wavelet domain, where multiresolution time-frequency

representation is achieved. An auto-correlation operation is applied to the transformed data to remove the phase

difference as the result of the blind segmentation operation. Afterward, a multiresolution 1-D-convolutional

neural network (1-D-CNN) is introduced to automatically learn the intrinsic hierarchical features from the

wavelet domain raw data without data-dependent and heavy feature engineering, and perform the user

identification task. The effectiveness of the proposed algorithm is thoroughly evaluated on eight

electrocardiogram datasets with diverse behaviors, such as with or without severe heart diseases, and with

different sensor placement methods. Our evaluation is much more extensive than the state-of-the-art works, and

an average identification rate of 93.5% is achieved. The proposed multiresolution 1-D-CNN algorithm can

effectively identify human subjects, even from randomly selected signal segments and without heavy feature

engineering. This paper is expected to demonstrate the feasibility and effectiveness of applying the blind signal

processing and deep learning techniques to biometric human identification, to enable a low algorithm

engineering effort and also a high generalization ability.

ETPL BI -

006

HeartID: A Multiresolution Convolutional Neural Network for ECG-Based

Biometric Human Identification in Smart Health Applications
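
The entry above (ETPL BI - 006) applies auto-correlation to remove the phase difference introduced by blind segmentation. The sketch below shows why that works in the simplest setting: a circular autocorrelation is identical for a signal and any circularly shifted copy of it. The circular (wrap-around) variant and the toy "beat" are assumptions for illustration; the paper applies the operation to wavelet-domain data, and real segments are only approximately periodic.

```python
def circular_autocorrelation(signal):
    """Autocorrelation with wrap-around; unchanged by circular shifts."""
    n = len(signal)
    return [
        sum(signal[i] * signal[(i + lag) % n] for i in range(n))
        for lag in range(n)
    ]


# Hypothetical one-period waveform and the same waveform cut at a different phase.
beat = [0, 1, 4, 1, 0, -1, 0, 0]
shifted = beat[3:] + beat[:3]

print(circular_autocorrelation(beat))
print(circular_autocorrelation(beat) == circular_autocorrelation(shifted))  # True
```

The two segments differ sample by sample, yet their autocorrelations match, so a downstream network sees the same input regardless of where the segment was cut.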


Direction information serves as one of the most important features for palmprint recognition. In the past decade,

many effective direction representation (DR)-based methods have been proposed and achieved promising

recognition performance. However, due to an incomplete understanding of DR, these methods only extract DR

in one direction level and one scale. Hence, they did not fully utilize all potentials of DR. In addition, most

researchers only focused on the DR extraction in spatial coding domain, and rarely considered the methods in

frequency domain. In this paper, we propose a general framework for DR-based method named complete DR

(CDR), which reveals DR in a comprehensive and complete way. Different from traditional methods, CDR

emphasizes the use of direction information with strategies of multi-scale, multi-direction level, multi-region,

as well as feature selection or learning. This way, CDR subsumes previous methods as special cases. Moreover,

thanks to its new insight, CDR can guide the design of new DR-based methods toward better performance.

Motivated this way, we propose a novel palmprint recognition algorithm in frequency domain. First, we extract

CDR using multi-scale modified finite radon transformation. Then, an effective correlation filter, namely, band-

limited phase-only correlation, is explored for pattern matching. To remove feature redundancy, the sequential

forward selection method is used to select a small number of CDR images. Finally, the matching scores obtained

from different selected features are integrated using score-level-fusion. Experiments demonstrate that our

method can achieve better recognition accuracy than the other state-of-the-art methods. More importantly, it has

fast matching speed, making it quite suitable for the large-scale identification applications

ETPL BI -

007

Palmprint Recognition Based on Complete Direction Representation

Swipe fingerprint scanners (sensors) can be distinguished based on their scanner pattern, a sufficiently unique,

persistent, and unalterable intrinsic characteristic even to scanners of the same technology, manufacturer, and

model. We propose a method to extract the scanner pattern from a single image acquired by a widely-used

capacitive swipe fingerprint scanner and compare it with a similarly extracted pattern from another image

acquired by the same or by another scanner. The method is extremely simple and computationally efficient as it

is based on moving-average filtering, yet it is very accurate and achieves an equal error rate below 0.1% for 27

swipe fingerprint scanners of exactly the same model. We also show the receiver operating characteristic for

different decision thresholds of two modes of the method. The method can enhance the security of a biometric

system by detecting an attack on the scanner in which an image containing the fingerprint pattern of the

legitimate user and acquired by the authentic fingerprint scanner has been replaced by another image that may

still contain the fingerprint pattern of the legitimate user but has been acquired by another, unauthentic

fingerprint scanner, i.e., for scanner authentication

ETPL BI -

008

Authentication of Swipe Fingerprint Scanners
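
The entry above (ETPL BI - 008) states only that the scanner pattern is extracted with moving-average filtering and then compared between images. A minimal sketch of one way this could look: subtract a moving average from a line of pixels to keep the high-frequency residual, then correlate the residuals of two images. The window size, the use of Pearson correlation, and the synthetic pixel rows are all assumptions; the paper's actual procedure may differ.

```python
from math import sqrt


def moving_average(row, window=3):
    """Centered moving average; the window shrinks at the borders."""
    half = window // 2
    out = []
    for i in range(len(row)):
        chunk = row[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def residual(row, window=3):
    """High-frequency part left after subtracting the moving average."""
    smooth = moving_average(row, window)
    return [x - s for x, s in zip(row, smooth)]


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


# Hypothetical fixed patterns of two scanners plus smooth image content.
pattern_a = [2, -1, 3, -2, 1, -3, 2, -1]
pattern_b = [-1, 2, -2, 1, 3, -1, -2, 2]
content_1 = [10 + i for i in range(8)]
content_2 = [50 - 2 * i for i in range(8)]

image_a1 = [c + p for c, p in zip(content_1, pattern_a)]  # scanner A
image_a2 = [c + p for c, p in zip(content_2, pattern_a)]  # scanner A, other content
image_b1 = [c + p for c, p in zip(content_1, pattern_b)]  # scanner B

same = correlation(residual(image_a1), residual(image_a2))
different = correlation(residual(image_a1), residual(image_b1))
print(same > different)  # True: residuals from the same scanner agree more
```

A decision threshold on the correlation value then separates "same scanner" from "different scanner", which is where the equal error rate and the receiver operating characteristic mentioned in the abstract come in.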


Markov Random Fields (MRFs) are a popular tool in many computer vision problems and faithfully model a

broad range of local dependencies. However, rooted in the Hammersley-Clifford theorem, they face serious

difficulties in enforcing the global coherence of the solutions without using too high order cliques that reduce

the computational effectiveness of the inference phase. Having this problem in mind, we describe a multi-layered

(hierarchical) architecture for MRFs that is based exclusively on pairwise connections and typically produces

globally coherent solutions, with 1) one layer working at the local (pixel) level, modeling the interactions

between adjacent image patches; and 2) a complementary layer working at the object (hypothesis) level pushing

toward globally consistent solutions. During optimization, both layers interact into an equilibrium state that not

only segments the data, but also classifies it. The proposed MRF architecture is particularly suitable for problems

that deal with biological data (e.g., biometrics), where the reasonability of the solutions can be objectively

measured. As a test case, we considered the problem of hair / facial hair segmentation and labeling, which are soft

biometric labels useful for human recognition in-the-wild. We observed performance levels close to the state-

of-the-art at a much lower computational cost, both in the segmentation and classification (labeling) tasks

ETPL BI -

009

Soft Biometrics: Globally Coherent Solutions for Hair Segmentation and Style

Recognition Based on Hierarchical MRFs

Faces carry a lot of information to distinguish different individuals. In this context, biometrics-based verification

systems play a major role in terms of recognizing (or confirming) an individual identity, relying on physiological

and/or behavioral characteristics among a set of individual biometric traits. In particular, facial recognition is

important because it has a relatively low cost (i.e., it can be carried out using standard cameras) and is one of

the least intrusive biometric modalities available, since it does not require physical contact like fingerprint

recognition or retina scanning

ETPL BI -

010

Facial biometrics and applications


In this paper, a novel joint sparse representation method is proposed for robust face recognition. We embed both

group sparsity and kernelized locality-sensitive constraints into the framework of sparse representation. The

group sparsity constraint is designed to utilize the grouped structure information in the training data. The local

similarity between test and training data is measured in the kernel space instead of the Euclidean space. As a

result, the embedded nonlinear information can be effectively captured, leading to a more discriminative

representation. We show that, by integrating the kernelized locality-sensitive constraint and the group sparsity

constraint, the embedded structure information can be better explored, and significant performance improvement

can be achieved. On the one hand, experiments on the ORL, AR, extended Yale B, and LFW data sets verify

the superiority of our method. On the other hand, experiments on two unconstrained data sets, the LFW and the

IJB-A, show that the utilization of sparsity can improve recognition performance, especially on the data sets

with large pose variation

ETPL BI -

011

Robust Face Recognition with Kernelized Locality-Sensitive Group Sparsity

Representation

A common practice in modern face recognition methods is to specifically align the face area based on the prior

knowledge of human face structure before recognition feature extraction. The face alignment is usually

implemented independently, causing difficulties in the designing of end-to-end face recognition models. We

study the possibility of end-to-end face recognition through alignment learning in which neither prior knowledge

on facial landmarks nor artificially defined geometric transformations are required. Only human identity clues

are used for driving the automatic learning of appropriate geometric transformations for the face recognition

task. Trained purely on publicly available datasets, our model achieves a verification accuracy of 99.33% on the

LFW dataset, which is on par with state-of-the-art single model methods

ETPL BI -

012

Toward End-to-End Face Recognition through Alignment Learning


In this paper, we propose a simultaneous feature and dictionary learning (SFDL) method for image set-based

face recognition, where each training and testing example contains a set of face images, which were captured

from different variations of pose, illumination, expression, resolution, and motion. While a variety of feature

learning and dictionary learning methods have been proposed in recent years and some of them have been

successfully applied to image set-based face recognition, most of them learn features and dictionaries for facial

image sets individually, which may not be powerful enough because some discriminative information for

dictionary learning may be compromised in the feature learning stage if they are applied sequentially, and vice

versa. To address this, we propose a SFDL method to learn discriminative features and dictionaries

simultaneously from raw face pixels so that discriminative information from facial image sets can be jointly

exploited by a one-stage learning procedure. To better exploit the nonlinearity of face samples from different

image sets, we propose a deep SFDL (D-SFDL) method by jointly learning hierarchical non-linear

transformations and class-specific dictionaries to further improve the recognition performance. Extensive

experimental results on five widely used face data sets clearly show that our SFDL and D-SFDL achieve very

competitive or even better

ETPL BI -

013

Simultaneous Feature and Dictionary Learning for Image Set Based Face

Recognition

This paper addresses the problem of face recognition when there are only a few, or even only a single, labeled

example of the face that we wish to recognize. Moreover, these examples are typically corrupted by nuisance

variables, both linear (i.e., additive nuisance variables such as bad lighting, wearing of glasses) and non-linear

(i.e., non-additive pixel-wise nuisance variables such as expression changes). The small number of labeled

examples means that it is hard to remove these nuisance variables between the training and testing faces to obtain

good recognition performance. To address the problem we propose a method called Semi-Supervised Sparse

Representation based Classification (S3RC). This is based on recent work on sparsity where faces are

represented in terms of two dictionaries: a gallery dictionary consisting of one or more examples of each person,

and a variation dictionary representing linear nuisance variables (e.g., different lighting conditions, different

glasses). The main idea is that (i) we use the variation dictionary to characterize the linear nuisance variables

via the sparsity framework, then (ii) prototype face images are estimated as a gallery dictionary via a Gaussian

Mixture Model (GMM), with mixed labeled and unlabeled samples in a semi-supervised manner, to deal with

the non-linear nuisance variations between labeled and unlabeled samples. We have done experiments with

insufficient labeled samples, even when there is only a single labeled sample per person. Our results on the AR,

Multi-PIE, CAS-PEAL, and LFW databases demonstrate that the proposed method is able to deliver

significantly improved performance over existing methods

ETPL BI -

014

Semi-Supervised Sparse Representation Based Classification for Face Recognition

with Insufficient Labeled Samples


Face recognition (FR) via regression analysis-based classification has been widely studied in the past several

years. Most existing regression analysis methods characterize the pixelwise representation error via l1-norm or

l2-norm, which overlook the 2D structure of the error image. Recently, the nuclear norm-based matrix regression

model was proposed to characterize the low-rank structure of the error image. However, the nuclear norm cannot

accurately describe the low-rank structural noise when the incoherence assumptions on the singular values do

not hold, since it overpenalizes several much larger singular values. To address this problem, this paper presents

the robust nuclear norm to characterize the structural error image and then extends it to deal with the mixed

noise. The majorization-minimization (MM) method is applied to derive an iterative scheme for minimization of

the robust nuclear norm optimization problem. Then, an efficient alternating direction method of multipliers

(ADMM) is used to solve the proposed models. We use the weighted nuclear norm as the classification criterion

to obtain the final recognition results. Experiments on several public face databases demonstrate the

effectiveness of our models in handling variations of structural noise (occlusion, illumination, and so on)

and mixed noise

ETPL BI -

015

Robust Nuclear Norm-Based Matrix Regression with Applications to Robust Face

Recognition

Heterogeneous face recognition is an important, yet challenging problem in the face recognition community. It

refers to matching a probe face image to a gallery of face images taken from an alternate imaging modality. The

major challenge of heterogeneous face recognition lies in the great discrepancies between different image

modalities. Conventional face feature descriptors, e.g., local binary patterns, histogram of oriented gradients,

and scale-invariant feature transform, are mostly designed in a handcrafted way and thus generally fail to extract

the common discriminant information from the heterogeneous face images. In this paper, we propose a new

feature descriptor called common encoding model for heterogeneous face recognition, which is able to capture

common discriminant information, such that the large modality gap can be significantly reduced at the feature

extraction stage. Specifically, we turn a face image into an encoded one with the encoding model learned from

the training data, where the difference of the encoded heterogeneous face images of the same person can be

minimized. Based on the encoded face images, we further develop a discriminant matching method to infer the

hidden identity information of the cross-modality face images for enhanced recognition performance. The

effectiveness of the proposed approach is demonstrated (on several public-domain face datasets) in two typical

heterogeneous face recognition scenarios: matching NIR faces to VIS faces and matching sketches to

photographs

ETPL BI -

016

Heterogeneous Face Recognition: A Common Encoding Feature Discriminant

Approach


The extraction of descriptive features from the sequences of faces is a fundamental problem in facial expression

analysis. Facial expressions are represented by psychologists as a combination of elementary movements known

as action units: each movement is localised and its intensity is specified with a score that is small when the

movement is subtle and large when the movement is pronounced. Inspired by this approach, we propose a novel

data-driven feature extraction framework that represents facial expression variations as a linear combination of

localised basis functions, whose coefficients are proportional to movement intensity. We show that the linear

basis functions of the proposed framework can be obtained by training a sparse linear model with Gabor phase

shifts computed from facial videos. The proposed framework addresses generalisation issues that are not tackled

by existing learnt representations, and achieves, with the same learning parameters, state-of-the-art results in

recognising both posed expressions and spontaneous micro-expressions. This performance is confirmed even

when the data used to train the model differ from test data in terms of the intensity of facial movements and

frame rate

ETPL BI -

017

Learning Bases of Activity for Facial Expression Recognition

Face alignment aims at localizing multiple facial landmarks for a given facial image, which usually suffers from

large variances of diverse facial expressions, aspect ratios and partial occlusions, especially when face images

are captured in wild conditions. Conventional face alignment methods extract local features and then directly

concatenate these features for global shape regression. Unlike these methods which cannot explicitly model the

correlation of neighbouring landmarks and motivated by the fact that individual landmarks are usually

correlated, we propose a deep sharable and structural detectors (DSSD) method for face alignment. To achieve

this, we firstly develop a structural feature learning method to explicitly exploit the correlation of neighbouring

landmarks, which learns to cover semantic information to disambiguate the neighbouring landmarks. Moreover,

our model selectively learns a subset of sharable latent tasks across neighbouring landmarks under the paradigm

of the multi-task learning framework, so that the redundant information of the overlapped patches can be

efficiently removed. To better improve the performance, we extend our DSSD to a recurrent DSSD (R-DSSD)

architecture by integrating with the complementary information from multi-scale perspectives. Experimental

results on the widely used benchmark datasets show that our methods achieve very competitive performance

compared to the state of the art

ETPL BI -

018

Learning Deep Sharable and Structural Detectors for Face Alignment


Significant effort has been devoted within the visual tracking community to rapid learning of object properties

on the fly. However, state-of-the-art approaches still often fail in cases such as rapid out-of-plane rotation, when

the appearance changes suddenly. One of the major contributions of this work is a radical rethinking of the

traditional wisdom of modelling 3D motion as appearance change during tracking. Instead, 3D motion is

modelled as 3D motion. This intuitive but previously unexplored approach provides new possibilities in visual

tracking research. Firstly, 3D tracking is more general, as large out-of-plane motion is often fatal for 2D trackers,

but helps 3D trackers to build better models. Secondly, the tracker’s internal model of the object can be used in

many different applications and it could even become the main motivation, with tracking supporting

reconstruction rather than vice versa. This effectively bridges the gap between visual tracking and Structure

from Motion. A new benchmark dataset of sequences with extreme out-of-plane rotation is presented and an

online leader-board offered to stimulate new research in the relatively underdeveloped area of 3D tracking. The

proposed method, provided as a baseline, is capable of successfully tracking these sequences, all of which pose

a considerable challenge to 2D trackers (error reduced by 46%)

ETPL BI -

019

TMAGIC: A Model-free 3D Tracker

As more and more stereo cameras are installed on electronic devices, we are motivated to investigate how to

leverage disparity information for autofocus. The main challenge is that stereo images captured for disparity

estimation are subject to defocus blur unless the lenses of the stereo cameras are at the in-focus position.

Therefore, it is important to investigate how the presence of defocus blur would affect stereo matching and, in

turn, the performance of disparity estimation. In this paper, we give an analytical treatment of this fundamental

issue of disparity-based autofocus by investigating the relation between image sharpness and disparity error. A

statistical approach that treats the disparity estimate as a random variable is developed. Our analysis provides a

theoretical backbone for the empirical observation that, regardless of the initial lens position, disparity-based

autofocus can bring the lens to the hill zone of the focus profile in one movement. The insight gained from the

analysis is useful for the implementation of an autofocus system

ETPL BI -

020

Analysis of Disparity Error for Autofocus


Most existing salient object detection methods compute the saliency for pixels, patches, or superpixels by

contrast. Such fine-grained contrast-based salient object detection methods are stuck with saliency attenuation

of the salient object and saliency overestimation of the background when the image is complicated. To better

compute the saliency for complicated images, we propose a hierarchical contour closure-based holistic salient

object detection method, in which two saliency cues, i.e., closure completeness and closure reliability, are

thoroughly exploited. The former pops out the holistic homogeneous regions bounded by completely closed

outer contours, and the latter highlights the holistic homogeneous regions bounded by averagely highly reliable

outer contours. Accordingly, we propose two computational schemes to compute the corresponding saliency

maps in a hierarchical segmentation space. Finally, we propose a framework to combine the two saliency maps,

obtaining the final saliency map. Experimental results on three publicly available datasets show that even each

single saliency map is able to reach the state-of-the-art performance. Furthermore, our framework, which

combines two saliency maps, outperforms the state of the art. Additionally, we show that the proposed

framework can be easily used to extend existing methods and further improve their performances substantially

ETPL BI -

021

Hierarchical Contour Closure-Based Holistic Salient Object Detection

Word spotting strategies employed in historical handwritten documents face many challenges due to variation

in the writing style and intense degradation. In this paper, a new method that permits effective word spotting in

handwritten documents is presented. It relies upon document-oriented local features, which take into account

information around representative keypoints, as well as a matching process that incorporates spatial context in a

local proximity search without using any training data. Experimental results on four historical handwritten data

sets for two different scenarios (segmentation-based and segmentation-free) using standard evaluation measures

show the improved performance achieved by the proposed methodology

ETPL BI -

022

Unsupervised Word Spotting in Historical Handwritten Document Images using

Document-oriented Local Features


Most existing salient object detection methods compute the saliency for pixels, patches or superpixels by

contrast. Such fine-grained contrast based salient object detection methods are stuck with saliency attenuation

of the salient object and saliency overestimation of the background when the image is complicated. To better

compute the saliency for complicated images, we propose a hierarchical contour closure based holistic salient

object detection method, in which two saliency cues, i.e., closure completeness and closure reliability are

thoroughly exploited. The former pops out the holistic homogeneous regions bounded by completely closed

outer contours, and the latter highlights the holistic homogeneous regions bounded by averagely highly reliable

outer contours. Accordingly, we propose two computational schemes to compute the corresponding saliency

maps in a hierarchical segmentation space. Finally, we propose a framework to combine the two saliency maps,

obtaining the final saliency map. Experimental results on three publicly available datasets show that even each

single saliency map is able to reach the state-of-the-art performance. Furthermore, our framework which

combines two saliency maps outperforms the state of the art. Additionally, we show that the proposed

framework can be easily used to extend existing methods and further improve their performances substantially

ETPL BI -

023

Hierarchical Contour Closure based Holistic Salient Object Detection
