ORIGINAL ARTICLE
Gait recognition and micro-expression recognition based on maximum margin projection with tensor representation
Xianye Ben1,2 • Peng Zhang1,2 • Rui Yan3 • Mingqiang Yang1 • Guodong Ge1
Received: 29 August 2014 / Accepted: 11 August 2015 / Published online: 4 September 2015
© The Natural Computing Applications Forum 2015
Abstract In this paper, we design a novel algorithm called maximum margin projection with tensor representation (MMPTR). This algorithm is able to
recognize gait and micro-expression represented as third-
order tensors. Through maximizing the inter-class Lapla-
cian scatter and minimizing the intra-class Laplacian
scatter, MMPTR can seek a tensor-to-tensor projection that
directly extracts discriminative and geometry-preserving
features from the original tensorial data. We show the
validity of MMPTR through extensive experiments on the
CASIA(B) gait database, TUM GAID gait database, and
CASME micro-expression database. The proposed
MMPTR generally obtains higher accuracy than MPCA, GTDA, and the state-of-the-art DTSA algorithm.
Experimental results included in this paper suggest that
MMPTR is especially effective in such tensorial object
recognition tasks.
Keywords Maximum margin projection with tensor representation (MMPTR) · Dimensionality reduction · Gait recognition · Micro-expression recognition
1 Introduction
In machine learning and statistics, dimensionality reduction
(DR) is the process of reducing the number of random
variables under consideration. DR is commonly defined as
the process of mapping high-dimensional data to a lower-
dimensional embedding [1]. Engel et al. [1] divided the
basic approaches into two classes: projection-based and
manifold learning methods. Projection-based methods are
simply based on linear inner product transformations, while
manifold learning methods can capture certain distance
relationships in a nonlinear data structure along a manifold.
Figure 1 provides a schematic diagram of DR techniques.
If the data are modeled as a graph by the graph theory in
order to optimize and learn the distances in data space, we
can view these methods as graph-based ones. All the projection-based methods can learn the embedding of metric distances. Manifold learning methods, except graph-based ones, can learn the embedding of nonmetric distances. From the data-processing point of view, projection-based methods, except graph-based techniques, deal with linear data; nonlinear data, on the contrary, should be handled by manifold learning methods.
In image processing, most traditional DR algorithms,
such as principal component analysis (PCA), linear dis-
criminant analysis (LDA), multidimensional scaling
(MDS) [2], Isomap [3], locally linear embedding (LLE) [4]
and recent works, such as covariance operator inverse
regression (COIR) [5], collaborative representation-based
projections (CRP) [6], and maximal linear embedding
(MLE) [7], treat an input image or sequence as a vector before embedding. This vectorization seriously destroys the intrinsic tensor structure of high-order data and, at the same time, may exceed the computational capacity of the processing devices.
Corresponding author: Xianye Ben
1 School of Information Science and Engineering, Shandong
University, No. 27, Shanda South Road, Jinan 250100,
People’s Republic of China
2 Key Laboratory of Intelligent Perception and Systems for
High-Dimensional Information, Ministry of Education,
Nanjing University of Science and Technology,
Nanjing 210094, People’s Republic of China
3 Computer Science Department, Rensselaer Polytechnic
Institute, Troy, NY 12180, USA
Neural Comput & Applic (2016) 27:2629–2646
DOI 10.1007/s00521-015-2031-8
To address these problems, He et al. [8] put forward tensor subspace analysis (TSA) to perform second-order tensor dimension reduction with the help of the intrinsic local geometrical structure of the second-order tensor space.
Vasilescu et al. [9] proposed multilinear ICA (MICA)
model to learn the statistically independent components of
multiple factors. Since then, multilinear DR methods that operate directly on tensor samples have been emerging. Raj et al. [10] proposed the fast MICA. Motivated by PCA, Lu
et al. [11] introduced multilinear principal component
analysis (MPCA) to capture most of the original tensorial
input variation. Nonnegative multilinear principal compo-
nent analysis (NMPCA) [12] was proposed for the
dimensionality reduction in the tensors by maximizing the
total tensor scatter while preserving the nonnegativity of
auditory representations. MICA, MPCA, and NMPCA are
unsupervised methods since they learn purely from the
dataset without knowing any class information. Multilinear
discriminant analysis (MLDA) approaches [13] may be
more reliable, since they are supervised ones. Among them,
discriminant analysis with tensor representation (DATER) [14] maximized the ratio of the inter-class scatter to the intra-class scatter; general tensor discriminant analysis (GTDA) [15] maximized the difference between the inter-class scatter and the weighted intra-class scatter; compound rank-k projections for bilinear analysis [16] utilized multiple
rank-k mappings to increase monotonicity, at the same
time preserved the correlations within the matrix; tensor
discriminative locality alignment [17] can preserve the
discriminative locality for classification.
Uncorrelated features are desirable in recognition tasks
since they contain minimum redundancy and ensure inde-
pendence of features. Lu et al. improved MPCA and
MLDA. Uncorrelated multilinear principal component
analysis (UMPCA) [18] can produce uncorrelated features
while capturing most of the variation in the original ten-
sorial input. Instead of uncorrelated features, orthogonal
multilinear discriminant analysis (OMDA) [19] can extract
orthogonal discriminative features.
Some graph-based approaches have been proposed for
multilinear feature extraction and dimensionality reduction
in various pattern classification tasks. To preserve the
structural information of original tensor data, Lu et al. [20]
proposed uncorrelated multilinear geometry-preserving
projections (UMGPP) to obtain uncorrelated projection
directions, and Li et al. [21] proposed discriminant locally
linear embedding/tensorization (DLLE/T) to learn the pro-
jection directions by maximizing the margins between point
pairs on different classes. Tensor locality-preserving pro-
jections (TLPP) [22] exploited the intrinsic local geometric
and topological properties of the manifold. Tensor neigh-
borhood-preserving discriminant projections (TNPDP) [23]
encouraged instances from the same class to be close and
instances from different classes to be far. It considered
locality and discriminative information simultaneously. Lu
et al. [24] proposed a multilinear locality-preserving
canonical correlation analysis (MLPCCA) which sought
multiple sets of pairwise projection bases by maximizing the
correlation of two image sets. Wang et al. [25] introduced
discriminant tensor subspace analysis (DTSA) algorithm by
maximizing the quotient of the between-class scatter and
within-class scatter. Han et al. [26] presented multilinear
supervised neighborhood embedding (MSNE), which
directly dealt with the local descriptor tensor for extracting
discriminant and compact features. Liu et al. [27] proposed a
novel multilinear locality-preserved maximum information
embedding (MLPMIE) algorithm to preserve the local
geometry and to maximize the global discrimination simul-
taneously. Another graph-based approach preserves the
global geometry, for example multilinear isometric embedding (MIE) [28]. In this paper, we concentrate on learning multilinear maximum margin projections under the guidance of locality preservation and discriminant analysis.
Multilinear DR has many applications in both feature extraction and classification. Liu et al. [29] gave the
3D X-ray transform within a multilinear framework and
proposed a multilinear X-ray transform feature represen-
tation. Feng et al. [30] generated multilinear active
appearance model (MAAM) from an incomplete training
tensor with missing values to achieve face recognition
under the viewpoint, illumination, and expression variations.

Fig. 1 Schematic diagram of DR techniques (projection-based, graph-based, and manifold learning methods; metric vs. nonmetric embeddings; linear vs. nonlinear data)

This paper focuses on the maximum margin projection with tensor representation (MMPTR) applied to
gait recognition and micro-expression recognition because
these recognition issues have some common ground: (1)
The beginning and final frames should be labeled when a
single recognized sample is defined; thus, the single sample
can be viewed as a third-order tensor including all the
frames between the beginning and final frames; (2) the
dimensionality reduction approach to these tensorial data is
required to reduce excess redundancy and extract the dis-
criminant feature for the recognition task.
Gait recognition has attracted significant attention
because of its wide range of applications for visual
surveillance in security-sensitive environments, such as
banks, airports, and parking lots [31]. Ben et al. [32] sur-
veyed various methods for gait recognition summarized
from unique ones for anthropometry, spatial temporal,
kinematics, kinetics, and video stream data forms. Some
representative feature expression methods are key frame
[33], time normalization [34], time series [35], outer sil-
houette [36], moments [37], modeling [38], projection
method [39], template energy images [40], fusion [41], and
tensor-based method [15]. However, the aforementioned
gait expression technologies except tensor-based method
and time normalization lost the dynamic information of
gait, which is significant in the gait recognition. Because of
the complicated adjustment process of time normalization,
this paper focuses on the tensor-based gait recognition. The
gait silhouette sequence can be viewed as a third-order
tensor with column, row, and time modes.
Micro-expression is a fast facial movement which usu-
ally lasts for 1/25 to 1/5 s; it reveals a real emotion that
people try to suppress and conceal. Micro-expression is
differentiated from expression by its short duration, so that
one can scarcely notice the micro-expression. Unlike reg-
ular expressions, few can fake a micro-expression because
it is an expression of spontaneous movements. Micro-ex-
pression may include all or a part of the facial muscle
movements of regular expressions. One technique to detect lies is the identification of facial micro-expressions [42], and a wide range of applications, such as assisting judicial departments in settling lawsuits, business negotiation, psychological counseling, and other fields, resort to micro-expressions. Researchers in computer vision have tried to develop micro-expression detection algorithms, but truly effective recognition algorithms are still lacking.
Pfister et al. [43] accurately detected the very short
expressions using a high-speed camera. Only Fu and Wang [25] achieved micro-expression recognition: they viewed the micro-expression sequence as a tensor sample and proposed the DTSA algorithm for distinguishing tense, repression, disgust, and surprise.
Motivated by the discussions above and maximum
margin criterion [44], this paper aims to develop a MMPTR
that extracts locality-preserving and discriminant features.
The criterion of MMPTR is designed to strive for a series of transformation matrices by maximizing the difference between the inter-class Laplacian scatters and the intra-class Laplacian scatters. The solution is iterative, based on the alternating
projection method. Then, two classification methods,
namely direct classification and classification after tensor
vectorization, can be used in the classification. The latter
can be adopted to enhance the recognition performance and
compress the features. The effectiveness of the proposed
method has been strictly evaluated against the
CASIA(B) and TUM GAID gait database for gait recog-
nition, as well as CASME database for micro-expression
recognition.
More specifically, our contributions are as follows. First, we propose a novel criterion for multilinear DR, which maximizes the difference between the inter-class Laplacian scatters and the intra-class Laplacian scatters. Second, the classification after tensor vectorization can further enhance the recognition performance and compress the features. Third, we develop a micro-expression recognizer by representing the micro-expression samples as tensors and extracting locality-preserving and discriminant features from them.
The rest of paper is organized as follows: Sect. 2
introduces tensor algebra briefly. In Sect. 3, we propose
MMPTR with an algorithm derived as an iterative process.
This section also discusses classification of MMPTR fea-
tures, initialization, convergence, termination, connections
to other tensorial subspace methods, and computational
complexity. Section 4 evaluates the effectiveness of
MMPTR in gait recognition and micro-expression recog-
nition tasks by comparing its performance against MPCA,
GTDA, and DTSA. Finally, Sect. 5 draws the conclusions.
2 Tensor algebra
2.1 n-mode unfolding of a third-order tensor
A data point $\mathcal{A}$ in the tensor space $\mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ denotes an $N$th-order tensor, where a tensor is represented in "Euclid Math One" font. $\mathcal{A}$ has $N$ indices $i_n$, $n = 1, \ldots, N$, where $i_n$ addresses the $n$-mode of $\mathcal{A}$. $\mathcal{A}$ can be unfolded along each of its $N$ modes, arranging its rank-1 $n$-mode vectors into a matrix. $A_{(n)} \in \mathbb{R}^{I_n \times (I_1 \times I_2 \times \cdots \times I_{n-1} \times I_{n+1} \times \cdots \times I_N)}$ denotes the $n$-mode unfolding of $\mathcal{A}$. Taking a third-order tensor as an example, Fig. 2(1)–(3) illustrates the 1-mode, 2-mode, and 3-mode unfoldings.
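As a sketch, the n-mode unfolding can be written in a few lines of NumPy. This is an illustration under our own naming, not the authors' code; note that the column ordering of the unfolded matrix differs between conventions, and this sketch uses C-order flattening:

```python
import numpy as np

def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """n-mode unfolding: move axis `mode` to the front and flatten the rest.

    For a tensor of shape (I1, ..., IN), returns a matrix of shape
    (I_mode, product of the remaining dimensions).
    """
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

A = np.arange(24).reshape(2, 3, 4)   # a small third-order tensor
print(unfold(A, 0).shape)  # (2, 12): 1-mode unfolding
print(unfold(A, 1).shape)  # (3, 8):  2-mode unfolding
print(unfold(A, 2).shape)  # (4, 6):  3-mode unfolding
```

Each row of `unfold(A, n)` is one n-mode vector of the tensor, which is exactly the structure the scatter matrices in Sect. 3 operate on.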
2.2 n-mode projection
The projection (also called product) of $\mathcal{A}$ by a matrix $U \in \mathbb{R}^{J_n \times I_n}$ is defined as

$$(\mathcal{A} \times_n U)(i_1, \ldots, i_{n-1}, j_n, i_{n+1}, \ldots, i_N) = \sum_{i_n} \mathcal{A}(i_1, \ldots, i_N) \cdot U(j_n, i_n) \quad (1)$$

Based on multilinear algebra theory, any tensor $\mathcal{B}$ can be expressed as another tensor $\mathcal{A}$ $n$-mode ($n = 1, \ldots, N$) multiplied by matrices $u^{(1)}, u^{(2)}, \ldots, u^{(N)}$:

$$\mathcal{B} = \mathcal{A} \times_1 u^{(1)T} \times_2 u^{(2)T} \times_3 \cdots \times_N u^{(N)T} \quad (2)$$
Figure 3 provides a visual illustration of a tensor multiplied by other matrices. In the 1-mode projection, a third-order tensor $\mathcal{A} \in \mathbb{R}^{4 \times 5 \times 3}$ is projected in the 1-mode vector space by a transformation matrix $u^{(1)} \in \mathbb{R}^{4 \times 2}$, obtaining a new tensor $\mathcal{A} \times_1 u^{(1)T} \in \mathbb{R}^{2 \times 5 \times 3}$; therefore, the length of each 1-mode vector of $\mathcal{A}$ is transformed from 4 to 2. Then, the new tensor is projected by a transformation matrix $u^{(2)} \in \mathbb{R}^{5 \times 2}$ in the 2-mode projection, resulting in $\mathcal{A} \times_1 u^{(1)T} \times_2 u^{(2)T} \in \mathbb{R}^{2 \times 2 \times 3}$; therefore, the length of each 2-mode vector of $\mathcal{A}$ is transformed from 5 to 2. Finally, in the 3-mode projection, $\mathcal{A} \times_1 u^{(1)T} \times_2 u^{(2)T} \times_3 u^{(3)T} \in \mathbb{R}^{2 \times 2 \times 2}$; therefore, the length of each 3-mode vector of $\mathcal{A}$ is transformed from 3 to 2.
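The 4 × 5 × 3 walk-through above can be reproduced with a short NumPy sketch of the n-mode product (illustrative only; `unfold`, `fold`, and `mode_n_product` are our own hypothetical helper names):

```python
import numpy as np

def unfold(tensor, mode):
    """n-mode unfolding: axis `mode` first, remaining axes flattened."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of unfold for a target tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)

def mode_n_product(tensor, U, mode):
    """Compute A x_n U: multiply every n-mode vector of the tensor by U."""
    new_shape = list(tensor.shape)
    new_shape[mode] = U.shape[0]
    return fold(U @ unfold(tensor, mode), mode, new_shape)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5, 3))
u1 = rng.standard_normal((4, 2))
u2 = rng.standard_normal((5, 2))
u3 = rng.standard_normal((3, 2))

# A x_1 u1^T x_2 u2^T x_3 u3^T, as in Eq. (2)
B = mode_n_product(A, u1.T, 0)
B = mode_n_product(B, u2.T, 1)
B = mode_n_product(B, u3.T, 2)
print(B.shape)  # (2, 2, 2)
```

Each successive projection shortens one mode exactly as described in the text: 4 → 2, then 5 → 2, then 3 → 2.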
3 Maximum margin projection with tensor representation
In this section, an MMPTR solution to the problem of
tensor-based dimensionality reduction is introduced,
investigated, and analyzed.
3.1 Algorithm
Maximum margin projection with tensor representation (MMPTR) aims to find a multilinear transformation from the original high-order space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2} \otimes \cdots \otimes \mathbb{R}^{I_N}$ (where $\otimes$ denotes the Kronecker product) to the reduced-dimensional space $\mathbb{R}^{P_1} \otimes \mathbb{R}^{P_2} \otimes \cdots \otimes \mathbb{R}^{P_N}$ (with $P_n < I_n$, for $n = 1, 2, \ldots, N$):

$$\mathcal{X} \rightarrow \mathcal{Y}: \quad \mathcal{Y} = \mathcal{X} \times_1 \tilde{U}^{(1)T} \times_2 \tilde{U}^{(2)T} \times_3 \cdots \times_N \tilde{U}^{(N)T} \quad (3)$$
The input data of MMPTR are a set of training samples $\{\mathcal{X}_m, m = 1, \ldots, M\}$, where $M$ is the total number of samples.

Fig. 2 Visual illustration of n-mode unfolding of a third-order tensor: (1) 1-mode unfolding, (2) 2-mode unfolding, (3) 3-mode unfolding

The criterion of MMPTR is designed to strive for $N$ transformation matrices $\tilde{U}^{(n)} \in \mathbb{R}^{I_n \times P_n}$ that maximize the inter-class Laplacian scatter $\Phi_b^{(n)}$ and meanwhile minimize the intra-class Laplacian scatter $\Phi_w^{(n)}$. Based on this point, we have the following optimizations on the $N$ transformation matrices $\tilde{U}^{(n)}$: $\arg\max_{\tilde{U}^{(n)}} \Phi_b^{(n)}$ and $\arg\min_{\tilde{U}^{(n)}} \Phi_w^{(n)}$. If we define the total Laplacian scatter $\Phi_t^{(n)}$, we have $\Phi_b^{(n)} = \Phi_t^{(n)} - \Phi_w^{(n)}$, in which

$$\tilde{U}_{\Phi^{(n)}} = \tilde{U}^{(n+1)} \otimes \tilde{U}^{(n+2)} \otimes \cdots \otimes \tilde{U}^{(N)} \otimes \tilde{U}^{(1)} \otimes \tilde{U}^{(2)} \otimes \cdots \otimes \tilde{U}^{(n-1)} \quad (6)$$
where the superscript $(n)$ denotes the $n$-mode, $I_{P_n}$ is an identity matrix of size $P_n \times P_n$, $c$ is the number of classes, and $M_i$ $(i = 1, \ldots, c)$ is the number of samples in class $i$. $\bar{X}_{(n)}$ is the total average $n$-mode matrix of all the samples, which can be expressed as

$$\bar{X}_{(n)} = (1/M) \sum_{m=1}^{M} X_{m(n)} \quad (7)$$

where $X_{m(n)}$ is the $n$-mode matrix of sample $m$. $\bar{X}^{(i)}_{(n)}$ is the average $n$-mode matrix of the samples belonging to class $i$, which can be expressed as

$$\bar{X}^{(i)}_{(n)} = (1/M_i) \sum_{j=1}^{M_i} X^{(i)}_{j(n)} \quad (8)$$

where $X^{(i)}_{j(n)}$ is the $n$-mode matrix of sample $j$ from class $i$.
$$L = D - W, \qquad L_w = \mathrm{diag}\!\left(\frac{D^{(1)} - W^{(1)}}{M_1}, \frac{D^{(2)} - W^{(2)}}{M_2}, \ldots, \frac{D^{(c)} - W^{(c)}}{M_c}\right)$$

are two kinds of graph Laplacian matrices used to maximally preserve certain local nonlinear geometry of the tensor data, where $W$ is the Gaussian similarity matrix with entries $w_{ij}$, and $D$ is a diagonal matrix whose entries are the column (or row) sums of $W$, $d_{ii} = \sum_j w_{ij}$.
Fig. 3 Visual illustration of a tensor produced by other matrices

$$\Phi_t^{(n)} = \frac{1}{M} \sum_{m=1}^{M} \left(X_{m(n)} - \bar{X}_{(n)}\right) \tilde{U}_{\Phi^{(n)}} \left(L \otimes I_{P_n}\right) \tilde{U}_{\Phi^{(n)}}^{T} \left(X_{m(n)} - \bar{X}_{(n)}\right)^{T} = \frac{1}{2M^2} \sum_{m=1}^{M} \sum_{i=1}^{M} \left(X_{m(n)} - X_{i(n)}\right) \tilde{U}_{\Phi^{(n)}} \left(L \otimes I_{P_n}\right) \tilde{U}_{\Phi^{(n)}}^{T} \left(X_{m(n)} - X_{i(n)}\right)^{T} \quad (4)$$

$$\Phi_w^{(n)} = \frac{1}{M} \sum_{i=1}^{c} \sum_{j=1}^{M_i} \left(X^{(i)}_{j(n)} - \bar{X}^{(i)}_{(n)}\right) \tilde{U}_{\Phi^{(n)}} \left(L_w \otimes I_{P_n}\right) \tilde{U}_{\Phi^{(n)}}^{T} \left(X^{(i)}_{j(n)} - \bar{X}^{(i)}_{(n)}\right)^{T} = \frac{1}{M} \sum_{i=1}^{c} \frac{1}{2M_i} \sum_{j=1}^{M_i} \sum_{k=1}^{M_i} \left(X^{(i)}_{j(n)} - X^{(i)}_{k(n)}\right) \tilde{U}_{\Phi^{(n)}} \left(L_w \otimes I_{P_n}\right) \tilde{U}_{\Phi^{(n)}}^{T} \left(X^{(i)}_{j(n)} - X^{(i)}_{k(n)}\right)^{T} \quad (5)$$
$$w_{ij} = e^{-\frac{\|\mathcal{X}_i - \mathcal{X}_j\|^2}{2\sigma^2}} \quad (9)$$

where $\sigma$ is a heat kernel parameter. If $\mathcal{X}_i$ and $\mathcal{X}_j$ belong to the same class, the value of $\|\mathcal{X}_i - \mathcal{X}_j\|^2$ is computed; otherwise, $\|\mathcal{X}_i - \mathcal{X}_j\|^2$ is set to $+\infty$.

$W^{(i)}$ is the Gaussian similarity matrix with entries $w^{(i)}_{kl}$ for class $i$. $D^{(i)}$ is a diagonal matrix for class $i$ whose entries are the column (or row) sums of $W^{(i)}$, $d_{kk} = \sum_l w^{(i)}_{kl}$.

$$w^{(i)}_{kl} = e^{-\frac{\|\mathcal{X}_k - \mathcal{X}_l\|^2}{2\sigma^2}} \quad (10)$$
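A small NumPy sketch of the supervised similarity graph of Eqs. (9)–(10) and its Laplacian $L = D - W$ may make the construction concrete. This is an illustration under our own naming, not the authors' implementation; setting the cross-class distance to $+\infty$ simply zeroes the corresponding similarity:

```python
import numpy as np

def gaussian_similarity(samples, labels, sigma):
    """Supervised Gaussian similarity of Eq. (9): the distance between samples
    of different classes is treated as +inf, so their similarity is 0."""
    flat = np.stack([s.ravel() for s in samples])     # vectorize tensors for distances
    d2 = ((flat[:, None, :] - flat[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    same = labels[:, None] == labels[None, :]
    W[~same] = 0.0                                    # exp(-inf) = 0 across classes
    return W

def laplacian(W):
    """Graph Laplacian L = D - W, with D the diagonal degree matrix."""
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4, 4, 3))                 # six third-order tensor samples
y = np.array([0, 0, 0, 1, 1, 1])
W = gaussian_similarity(X, y, sigma=1.0)
L = laplacian(W)
print(np.allclose(L.sum(axis=1), 0))  # rows of a Laplacian sum to zero -> True
```

The within-class Laplacians $D^{(i)} - W^{(i)}$ used in $L_w$ would be built analogously by applying the same two functions to the samples of each class separately.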
We unify $\arg\max_{\tilde{U}^{(n)}} \Phi_b^{(n)}$ and $\arg\min_{\tilde{U}^{(n)}} \Phi_w^{(n)}$ into

$$\left\{\tilde{U}^{(n)}, n = 1, \ldots, N\right\} = \arg\max_{\tilde{U}^{(1)}, \ldots, \tilde{U}^{(N)}} \left(\Phi_b^{(n)} - \Phi_w^{(n)}\right) = \arg\max_{\tilde{U}^{(1)}, \ldots, \tilde{U}^{(N)}} \left(\Phi_t^{(n)} - 2\Phi_w^{(n)}\right) \quad (11)$$

constrained by $\tilde{U}^{(n)T} \tilde{U}^{(n)} = I$.

There is no known solution that optimizes the $N$ transformation matrices simultaneously. An iterative procedure can be utilized to solve Eq. (11). Assuming $\tilde{U}^{(n)}$, $n = 1, 2, \ldots, k-1, k+1, \ldots, N$ are known, we optimize $\tilde{U}^{(k)}$ by combining the eigenvectors associated with the largest $P_k$ eigenvalues of the scatter matrix $\Phi_t^{(n)} - 2\Phi_w^{(n)}$. Then, we iteratively optimize $\tilde{U}^{(k+1)}$ by updating the original $\tilde{U}^{(k)}$ to the latest optimized result and fixing the other $N-2$ transformation matrices, and so forth.
Generally, $P_n$ for $n = 1, \ldots, N$ mainly depends on experience. Let $\lambda^{(n)*}_{i^{(n)}}$ be the $i^{(n)}$th full-projection eigenvalue for the $n$-mode. $P_n$ can be determined by the defined test $Q^{(n)}$ $(n = 1, \ldots, N)$ as follows:

$$\mathrm{test}Q^{(n)} = \frac{\sum_{i^{(n)}=1}^{P_n} \lambda^{(n)*}_{i^{(n)}}}{\sum_{i^{(n)}=1}^{I_n} \lambda^{(n)*}_{i^{(n)}}} \quad (12)$$

where $\sum_{i^{(n)}=1}^{P_n} \lambda^{(n)*}_{i^{(n)}}$ denotes the sum of the largest $P_n$ eigenvalues after the truncation of the $n$-mode eigenvectors beyond the $P_n$th, and $\sum_{i^{(n)}=1}^{I_n} \lambda^{(n)*}_{i^{(n)}}$ denotes the sum of all the eigenvalues before truncation. To simplify the selection, we set $\mathrm{test}Q = \mathrm{test}Q^{(1)} = \mathrm{test}Q^{(2)} = \cdots = \mathrm{test}Q^{(N)}$. Thus, the dimensionality of each mode can be selected by $\mathrm{test}Q^{(n)}$ $(n = 1, \ldots, N)$.

Table 1 summarizes the aforementioned procedure for optimization.
Table 1 Procedure of MMPTR

Input: the training sample set $\{\mathcal{X}_m, m = 1, \ldots, M\}$
Output: the projected training sample set $\{\mathcal{Y}_m = \mathcal{X}_m \times_1 \tilde{U}^{(1)T} \times_2 \tilde{U}^{(2)T} \times_3 \cdots \times_N \tilde{U}^{(N)T}, m = 1, \ldots, M\}$ and the set of transformation matrices $\tilde{U}^{(n)} \in \mathbb{R}^{I_n \times P_n}$

Procedure:
(1) Preprocessing: center the training samples, $\mathcal{X}_m \leftarrow \mathcal{X}_m - \bar{\mathcal{X}}$, $m = 1, \ldots, M$, where $\bar{\mathcal{X}}$ is the mean of all the training samples.
(2) Initialization: discussed in Sect. 3.3.
(3) Optimization: for $k = 1$ to $K$ { for $n = 1$ to $N$ {
    calculate the total Laplacian scatter by Eq. (4);
    calculate the intra-class Laplacian scatter by Eq. (5);
    optimize $\tilde{U}^{(n)}$ by Eq. (11);
  }
  judge convergence: discussed in Sect. 3.3.
}
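The loop structure of Table 1 can be sketched as follows. This is a structural illustration only: for brevity it uses plain total and within-class scatter matrices in place of the Laplacian-weighted scatters of Eqs. (4)–(5), and `mmptr_like`, `unfold`, and `mode_product` are our own hypothetical names, not the authors' code:

```python
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_product(t, U, mode):
    moved = np.tensordot(U, np.moveaxis(t, mode, 0), axes=(1, 0))
    return np.moveaxis(moved, 0, mode)

def mmptr_like(samples, labels, ranks, n_iter=10):
    """Structural sketch of the Table 1 loop: alternate over modes, keeping the
    eigenvectors of the largest eigenvalues of a (total - 2*within) scatter
    difference.  Simplification: plain scatter matrices stand in for the
    Laplacian-weighted scatters of Eqs. (4)-(5)."""
    samples = samples - samples.mean(axis=0)        # (1) preprocessing: centering
    N = samples.ndim - 1                            # number of tensor modes
    Us = [np.ones((samples.shape[n + 1], ranks[n])) for n in range(N)]  # (2) init
    for _ in range(n_iter):                         # (3) iteration cycle
        for n in range(N):                          # mode cycle
            proj = samples                          # project all modes except n
            for m in range(N):
                if m != n:
                    proj = mode_product(proj, Us[m].T, m + 1)
            St = sum(unfold(x, n) @ unfold(x, n).T for x in proj)
            Sw = np.zeros_like(St)
            for c in np.unique(labels):
                cls = proj[labels == c]
                cls = cls - cls.mean(axis=0)
                Sw += sum(unfold(x, n) @ unfold(x, n).T for x in cls)
            vals, vecs = np.linalg.eigh(St - 2 * Sw)
            Us[n] = vecs[:, np.argsort(vals)[::-1][: ranks[n]]]  # top-P_n eigvecs
    return Us

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 5, 6, 4))   # ten third-order samples of size 5 x 6 x 4
y = np.repeat([0, 1], 5)
Us = mmptr_like(X, y, ranks=(2, 3, 2), n_iter=3)
print([U.shape for U in Us])  # [(5, 2), (6, 3), (4, 2)]
```

Because each mode's matrix comes from an eigendecomposition of a symmetric matrix, the orthonormality constraint $\tilde{U}^{(n)T}\tilde{U}^{(n)} = I$ is satisfied automatically.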
3.2 Classification of MMPTR features
To classify a test tensor sample $\mathcal{X}_0$, we can adopt two classification proposals, namely direct classification and classification after tensor vectorization.
3.2.1 Direct classification
The test sample $\mathcal{X}_0$ is projected to $\mathcal{Y}_0 = \mathcal{X}_0 \times_1 \tilde{U}^{(1)T} \times_2 \tilde{U}^{(2)T} \times_3 \cdots \times_N \tilde{U}^{(N)T}$ using MMPTR. Then, we calculate the Euclidean distance $\mathrm{Dis}$ between the test sample $\mathcal{X}_0$ and each candidate training sample, and the test sample is assigned the label of the nearest $\mathcal{X}_m$:

$$\mathrm{Dis}(\mathcal{X}_m, \mathcal{X}_0) = \arg\min_i \mathrm{Dis}(\mathcal{Y}_i, \mathcal{Y}_0) = \left\| \mathcal{X}_m \times_1 \tilde{U}^{(1)T} \times_2 \tilde{U}^{(2)T} \times_3 \cdots \times_N \tilde{U}^{(N)T} - \mathcal{X}_0 \times_1 \tilde{U}^{(1)T} \times_2 \tilde{U}^{(2)T} \times_3 \cdots \times_N \tilde{U}^{(N)T} \right\|_F \quad (13)$$

where $\mathcal{Y}_i$ $(i = 1, \ldots, M)$ is the projected result of the training sample $\mathcal{X}_i$ $(i = 1, \ldots, M)$ obtained by Eq. (3), and $\|\cdot\|_F$ denotes the Frobenius norm.
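A minimal sketch of the direct classification rule of Eq. (13) (illustrative naming; it assumes the transformation matrices `Us` have already been learned):

```python
import numpy as np

def project(x, Us):
    """Multilinear projection of Eq. (3): multiply x by U^(n)T in every mode."""
    for n, U in enumerate(Us):
        x = np.moveaxis(np.tensordot(U.T, np.moveaxis(x, n, 0), axes=(1, 0)), 0, n)
    return x

def classify_direct(x0, train_samples, train_labels, Us):
    """Eq. (13): assign the label of the nearest projected training tensor,
    measured by the Frobenius norm."""
    y0 = project(x0, Us)
    dists = [np.linalg.norm(project(x, Us) - y0) for x in train_samples]
    return train_labels[int(np.argmin(dists))]
```

With identity matrices for `Us`, this reduces to plain nearest-neighbor matching in the original tensor space, which is a convenient sanity check.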
3.2.2 Classification after tensor vectorization
The tensor data processed by MMPTR still consist of a
large number of redundant variables, and the so-called
tensor vectorization can further remove redundancy. The
entries of the new tensor $\mathcal{Y}_m$ are rearranged into a vector $y_m$, ordered according to the class discriminability $C_{p_1 p_2 \cdots p_N}$ in descending order:

$$C_{p_1 p_2 \cdots p_N} = \frac{\sum_{i=1}^{c} N_i \left[\bar{\mathcal{Y}}_i(p_1, p_2, \ldots, p_N) - \bar{\mathcal{Y}}(p_1, p_2, \ldots, p_N)\right]^2}{\sum_{m=1}^{M} \left[\mathcal{Y}_m(p_1, p_2, \ldots, p_N) - \bar{\mathcal{Y}}_{c_m}(p_1, p_2, \ldots, p_N)\right]^2} \quad (14)$$

where $\bar{\mathcal{Y}}_{c_m}$, $\bar{\mathcal{Y}}_i$, and $\bar{\mathcal{Y}}$ are the class mean feature tensor of $\mathcal{X}_m$, the mean feature tensor of class $i$, and the total mean feature tensor in the projected tensor subspace, respectively.
In classification, the class of the test sample is determined by the nearest neighbor classifier using the Euclidean distance $\mathrm{Dis}$ between the vectorization of the test sample $\mathcal{X}_0$ and the vectorization of each candidate training sample:

$$\mathrm{Dis}(\mathcal{X}_m, \mathcal{X}_0) = \arg\min_i \mathrm{Dis}(\mathrm{vec}(\mathcal{Y}_i), \mathrm{vec}(\mathcal{Y}_0)) = \left\| \mathrm{vec}\!\left(\mathcal{X}_m \prod_{n=1}^{N} \times_n \tilde{U}^{(n)T}\right) - \mathrm{vec}\!\left(\mathcal{X}_0 \prod_{n=1}^{N} \times_n \tilde{U}^{(n)T}\right) \right\|_F \quad (15)$$

where $\mathrm{vec}(\cdot)$ denotes tensor vectorization with a preserved optimal dimension that can be determined by experiments.
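The entry ranking of Eq. (14) is essentially a per-entry Fisher-style score: between-class variance over within-class variance, computed independently for every tensor entry. A hedged sketch (our own simplification and naming, with a small epsilon guarding the denominator):

```python
import numpy as np

def discriminability_ranking(Y, labels):
    """Per-entry class discriminability in the spirit of Eq. (14):
    between-class scatter over within-class scatter, computed entrywise,
    returned as flat entry indices sorted in descending order of score."""
    total_mean = Y.mean(axis=0)
    between = np.zeros(Y.shape[1:])
    within = np.zeros(Y.shape[1:])
    for c in np.unique(labels):
        Yc = Y[labels == c]
        between += len(Yc) * (Yc.mean(axis=0) - total_mean) ** 2
        within += ((Yc - Yc.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-12)      # epsilon avoids division by zero
    return np.argsort(score.ravel())[::-1]

def vectorize(Y, order, d):
    """Rearrange each projected tensor into a vector and keep the d most
    discriminative entries."""
    return np.stack([y.ravel()[order[:d]] for y in Y])

rng = np.random.default_rng(3)
Y = 0.1 * rng.standard_normal((6, 2, 2, 2))
labels = np.array([0, 0, 0, 1, 1, 1])
Y[labels == 1, 0, 0, 0] += 5.0              # make one entry highly discriminative
order = discriminability_ranking(Y, labels)
print(order[0])  # 0: the boosted entry ranks first
```

Keeping only the top-$d$ entries is what compresses the features beyond what the multilinear projection alone achieves.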
3.3 Initialization, convergence, and termination
This section discusses MMPTR design issue, such as its
initialization, convergence, and termination conditions.
3.3.1 Initialization
Due to the tensorial nature of the proposed method, solving the projection vector in one mode requires the projection vectors in all the other $(N-1)$ modes. The experimental results of Ref. [18] indicated that initializing each mode's projection vector to the all-ones vector gives stable results. Therefore, this paper employs this uniform initialization.
3.3.2 Convergence
Given each mode's projection vector initialized to the all-ones vector, the alternating projection generates a sequence $\{\tilde{U}^{(n)}_k, k = 1, \ldots, K\}$ by maximizing $g(\tilde{U}^{(1)}_k, \tilde{U}^{(2)}_k, \ldots, \tilde{U}^{(n)}_k) = \Phi_t - 2\Phi_w$. The objective function is nondecreasing, i.e.,

$$g(\tilde{U}^{(1)}_k, \tilde{U}^{(2)}_k, \ldots, \tilde{U}^{(n)}_k) \le g(\tilde{U}^{(1)}_{k+1}, \tilde{U}^{(2)}_k, \ldots, \tilde{U}^{(n)}_k) \le g(\tilde{U}^{(1)}_{k+1}, \tilde{U}^{(2)}_{k+1}, \ldots, \tilde{U}^{(n)}_k) \le \cdots \le g(\tilde{U}^{(1)}_{k+1}, \tilde{U}^{(2)}_{k+1}, \ldots, \tilde{U}^{(n)}_{k+1}).$$

We test the convergence of MMPTR on the CASIA(B) gait database via $\|\tilde{U}^{(n)}_k - \tilde{U}^{(n)}_{k-1}\|_F$, where $\tilde{U}^{(n)}_k$ and $\tilde{U}^{(n)}_{k-1}$ are, respectively, the $n$-mode transformation matrices at the $k$th and $(k-1)$th iterations; the change in $\tilde{U}^{(n)}_k$ between two successive iterations converges to zero for each $n$-mode after eight iterations, as can be seen in Fig. 4.
3.3.3 Termination
The termination criterion is determined by the objective function $\Phi_t - 2\Phi_w$. The iterative optimization procedure terminates if $\|\tilde{U}^{(n)}_k - \tilde{U}^{(n)}_{k-1}\|_F < \eta^{(n)}$, $k = 1, \ldots, K$, where $\eta^{(n)}$ is a small predefined threshold for the $n$-mode. Alternatively, the termination criterion can simply be a maximum number of iterations. In this paper, we set $K = 10$.
Fig. 4 Convergence test
3.4 Connections to other tensorial subspace
methods
In this section, we analyze MMPTR’s relation with MPCA
and GTDA.
3.4.1 Relation to MPCA
MPCA can be thought of as revealing the internal structure
of the data in a way that best explains the variance in the
tensorial data. Its objective is the determination of the $N$ projection matrices $\{\tilde{U}^{(n)}, n = 1, \ldots, N\}$ that maximize the total tensor scatter $\sum_{i=1}^{M} \|\mathcal{Y}_i - \bar{\mathcal{Y}}\|_F^2$, where $\bar{\mathcal{Y}} = (1/M) \sum_{i=1}^{M} \mathcal{Y}_i$. When $w_{ij} = 1$ for all $i$ and $j$, and $w^{(i)}_{kl} = 1$ for all $k$ and $l$, MMPTR reduces to MPCA. Compared with MMPTR, MPCA builds a global graph in which each tensorial data point is connected with all the remaining points. Therefore, MPCA preserves the global structural information of the dataset.
3.4.2 Relation to GTDA
GTDA preserves the discriminative information in the training tensors. The objective of GTDA is the determination of the $N$ projection matrices $\{\tilde{U}^{(n)}, n = 1, \ldots, N\}$ that maximize the differential scatter discriminant criterion $\sum_{i=1}^{c} M_i \|\bar{\mathcal{Y}}_i - \bar{\mathcal{Y}}\|_F^2 - \zeta \sum_{j=1}^{M} \|\mathcal{Y}_j - \bar{\mathcal{Y}}_{i,j}\|_F^2$, where $\zeta$ is the Lagrange multiplier, $\bar{\mathcal{Y}}_i$ is the average tensor of the embedded samples belonging to class $i$, and $\bar{\mathcal{Y}}_{i,j}$ is the average tensor of embedded sample $j$ belonging to class $i$. Supposing (1) $w_{ij} = 1/M_h$ if and only if $i$ and $j$ belong to class $h$, and $w_{ij} = 0$ otherwise, and (2) $w^{(i)}_{kl} = 1/M_i$, then MMPTR reduces to GTDA. When all of the tensorial data points are used as the vertices of the graph, $W$ assigns similar weights to all connections in this graph for GTDA. Therefore, GTDA preserves the global structural information and discriminant information of the dataset.
3.5 Computational complexity
For simplicity, we assume that I1 ¼ I2 ¼ � � � ¼ IN ¼ I and
M is the number of training samples. From the previous
work, we know that its computational complexity mainly
lies in computing tensor projections, and its time com-
plexity is OðKNI3Þ, where K is the loop number to make
the optimization procedure of MMPTR converge.
4 Experimental results
In this section, we first briefly describe the CASIA(B) gait
database, TUM GAID gait database, and CASME micro-
expression database, then explore the performance of the
proposed MMPTR for the direct classification and classi-
fication after tensor vectorization, and lastly compare it
with MPCA, GTDA, and DTSA for gait recognition and
micro-expression recognition.
4.1 Experimental data
The experimental analysis was conducted on the gait
recognition and micro-expression recognition from the
following three public databases.
The first database is the CASIA(B) gait database [45], which includes a total of 124 individuals. There are six normal gait sequences recorded at a resolution of 640 × 480 pixels with a frame rate of 25 fps. The gait period was detected based on the dual-ellipse fitting approach [46]. Each image from the gait sequences was resized to 64 × 64 pixels, and the silhouette was also centered. The sample images of one individual are shown in Fig. 5.
The second database is TUM GAID gait database [47,
48], which contains RGB video, depth, and audio. There
Fig. 5 Sample images of one individual from CASIA(B) database
are 305 people in total, and this gait database is one of the
largest to date. For the recording, the Microsoft Kinect
sensor was used. This sensor provides a video stream, a
depth stream, and four-channel audio. Six normal gait
sequences per person are recorded at a resolution of
640 × 480 pixels at a frame rate of approximately 30 fps
(slightly varying) for both video and depth. The depth
resolution is on the order of 1 cm. For depth acquisition,
the sensor sends beams of infrared light and infers the
depth from reflections on the objects. Therefore, placing
the sensor outside is not possible, since infrared light from
the sun can interfere with the depth sensor. The four-
channel audio is sampled with 24 bit at 16 kHz. The gait
period was detected according to the layered coding of
depth information [49]. To be specific, the grayscale value
of each depth image frame was extracted after the back-
ground subtraction. Then, the grayscale layered processing
of depth image was operated according to the predefined
thresholds, and all the layer information was quantized and coded uniformly. To capture the gait fluctuation, we constructed a
new signal which is the sum of coded pixels points over
time. Finally, the gait period was detected based on the
points with minimum value of the smoothed signal. The
sample images of one individual are shown in Fig. 6.
The third database is CASME database [50], which
contains 195 micro-expressions recorded at a resolution of 640 × 480 or 1280 × 720 pixels at 60 fps. These samples were selected from more than 1500 elicited facial
movements. There is no need to separate single independent emotion sequences any more, because the onset, apex, and offset frames, as well as the marked action units (AUs), of the micro-expression sequences have been coded in the database. The
emotions of disgust, repression, surprise, and tense are
labeled. The sample images of one micro-expression
sequence are shown in Fig. 7.
In the recognition issue, gait recognition and micro-
expression recognition have some common features. For
example, they both can be represented as a tensor sample
with three modes, say row, column, and time, and their
recognition performances will be affected by various
numbers of frames for a sample. For gait, differences in
walking speed lead to the variance in the frame number.
Figure 8 shows that frames No. "1", "2", ..., "6" are known, but frame No. "?" is unknown. It also shows the interpolated images and the difference images between the images interpolated from the two frames adjacent to "?" (such as "1" and "2", or "3", "4", ..., "6") and their true images. From the total
mean (TM) grayscale value of the difference image, we
can see that the estimation error is minimal when using
the two nearest neighbor frames. Therefore, the interme-
diate frames are estimated by their two nearest neighbor
frames. In addition, the beginning and final frames for a
gait tensor are the same as the starting and ending ones
extracted from the original gait period.
For micro-expression, Shen et al. [51] have investigated the effects of expression duration on micro-expression recognition. So, this paper presents the following frame number normalization approach.

Fig. 6 Sample images of one individual from TUM GAID database
Fig. 7 Sample images of one micro-expression sequence from CASME database
Given a sequence $\{S_i \in \mathbb{R}^{m \times n}, i = 1, \ldots, P\}$ and its normalized sequence $\{S'_j \in \mathbb{R}^{m \times n}, j = 1, \ldots, Q\}$, where $P$ is the original number of frames, $Q$ is the normalized number of frames, and $m$ and $n$ denote the rows and columns of the image, respectively, the linear interpolation compression rate can be defined as the ratio of the frame interval after the interpolation to the one before:

$$\mathrm{rate} = (Q - 1)/(P - 1) \quad (16)$$

The normalized first and last frames are stipulated as

$$S'_1 = S_1, \quad S'_Q = S_P \quad (17)$$

If the relationships $(i - 1) \times \mathrm{rate} \le j$ and $i \times \mathrm{rate} > j$ are met for a normalized frame $j$ and an original frame $i$, the interpolation coefficients $a$ and $b$ can be computed as

$$a = \left| (i - 1) \times \mathrm{rate} - (j - 1) \right| / \mathrm{rate} \quad (18)$$

$$b = \left| (i - 2) \times \mathrm{rate} - (j - 1) \right| / \mathrm{rate} \quad (19)$$

Thus, the image of normalized frame $j$ can be expressed as

$$S'_j = a \times S_{i-1} + b \times S_i \quad (20)$$
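The normalization above can be sketched with plain linear interpolation between the two nearest original frames. This is a simplified illustration of the idea (our own naming): it keeps the endpoint convention of Eq. (17) but does not transcribe the exact coefficient expressions of Eqs. (18)–(19):

```python
import numpy as np

def normalize_frames(seq, Q):
    """Frame number normalization by linear interpolation: each normalized
    frame is a weighted blend of its two nearest original frames, and the
    first/last frames are preserved exactly."""
    seq = np.asarray(seq, dtype=float)
    P = len(seq)
    positions = np.linspace(0, P - 1, Q)    # where frame j falls on the original timeline
    lo = np.floor(positions).astype(int)
    hi = np.minimum(lo + 1, P - 1)
    w = positions - lo                       # weight toward the later frame
    return (1 - w)[:, None, None] * seq[lo] + w[:, None, None] * seq[hi]

gait = np.random.default_rng(2).random((30, 64, 64))  # a 30-frame silhouette sequence
norm = normalize_frames(gait, Q=23)
print(norm.shape)  # (23, 64, 64)
```

When $Q = P$, the positions fall exactly on the original frames and the sequence is returned unchanged, which matches the endpoint stipulation of Eq. (17).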
Based on extensive experiments assessing recognition performance under various frame numbers, Figs. 9, 10, and 11 show the optimal results, interpolated into 23 frames, 26 frames, and 64 frames for the sequences corresponding to Figs. 5, 6, and 7, respectively.
4.2 Experiments on the CASIA(B) gait database
All the gait tensor samples are normalized to a size of 64 × 64 × 23 by the proposed frame number normalization approach. Three samples of each individual are selected randomly for training, and the remainder is used for testing. Experiments are conducted to test the average recognition rate (ARR) across 30 random realizations of the training set.
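This evaluation protocol can be sketched as below. It is a hedged illustration with our own function names, using a simple 1-NN classifier on vectorized features rather than the paper's full pipeline:

```python
import numpy as np

def average_recognition_rate(features, labels, n_train=3, n_runs=30, seed=0):
    """ARR: mean 1-NN accuracy over random per-class train/test splits."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    accs = []
    for _ in range(n_runs):
        train_idx, test_idx = [], []
        for c in np.unique(labels):
            idx = rng.permutation(np.flatnonzero(labels == c))
            train_idx.extend(idx[:n_train])   # n_train random samples per class
            test_idx.extend(idx[n_train:])    # the remainder is used for testing
        Xtr, ytr = features[train_idx], labels[train_idx]
        Xte, yte = features[test_idx], labels[test_idx]
        # Nearest-neighbour classification on Euclidean distance.
        d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
        pred = ytr[d.argmin(axis=1)]
        accs.append(float((pred == yte).mean()))
    return float(np.mean(accs))
```

Averaging over many random realizations of the training set reduces the variance introduced by any single split.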
4.2.1 Preprocessing step
Because a large number of interrelated variables exist in the tensor sample sets, we first employ MPCA to reduce the dimensionality while retaining most of the original data variation. Figure 12 shows the ARR and average feature dimension (Dim) used for classification as the testQ for MPCA in each mode ranges from 40 % to 98 %. Results at testQ = 90 % are significantly better than those at other testQs, indicating that MPCA with testQ = 90 % can be followed by the proposed MMPTR. Thus, all the gait tensor samples can be reduced to about 8 % ((19 × 27 × 15)/(64 × 64 × 23) ≈ 8 %) of the original size, i.e., P1 = 19, P2 = 27, and P3 = 15, by MPCA, which is a very important preprocessing step for the CASIA(B) gait database.
4.2.2 Direct classification versus classification after tensor vectorization
For direct classification using Eq. (13), the proposed MMPTR yields a best ARR of 77.7 %. However, the proposed MMPTR following MPCA obtains more encouraging results. The ARR result for the post-tensor-vectorization classification of the proposed MMPTR with various testQs following MPCA for further data reduction
Fig. 8 Example of error discussion
Fig. 9 Optimal interpolated result with 23 frames for CASIA(B) database
Fig. 10 Optimal interpolated result with 26 frames for TUM GAID database
Fig. 11 Optimal interpolated result with 64 frames for CASME database
Fig. 12 ARR and average feature dimension (Dim) by using MPCA on CASIA(B) database
Fig. 13 ARR by using MPCA followed by the proposed MMPTR on CASIA(B) database
is depicted in Fig. 13. At first, a large number of features is useful, while beyond testQ = 92 % the performance varies very slowly as testQ increases. We observe that the best ARR of 94.9 % appears when testQ = 96 %.
4.2.3 Comparison with existing methods
In this subsection, we compare our method with MPCA [11], MPCA + GTDA [15], and MPCA + DTSA [25]. Table 2 presents the ARRs of our MPCA + MMPTR method compared against MPCA, MPCA + GTDA, and MPCA + DTSA, for both direct classification and classification after tensor vectorization on the tensor DR issue. We can see that classification after tensor vectorization improves significantly on direct classification. By reordering and selecting the preserved features, the feature size for classification after tensor vectorization is also decreased considerably compared with direct classification. Since MPCA can be seen as the preprocessing step, the tensor size of MPCA is larger than those of MPCA + GTDA, MPCA + DTSA, and MPCA + MMPTR in direct classification.
Figure 14 shows the ARR versus the number of features used, ranging from 10 to 370, for MPCA, MPCA + GTDA, MPCA + DTSA, and our method in classification after tensor vectorization. As can be seen, our MPCA + MMPTR method achieves an ARR of 94.9 % and outperforms the other three methods at larger dimensions ranging from 205 to 260. MPCA performs second best with more preserved features. MPCA + DTSA is slightly inferior to MPCA but needs fewer preserved features.
Then, we use the rank order statistic to evaluate the proposed method. It is defined as the cumulative probability that the actual class of a test measurement is among its k top matches, where k is called the rank. These performance statistics are reported as cumulative match scores (CMS), which effectively characterize the filtering capability of the features. With the number of preserved features selected as in Table 2, Fig. 15 shows the CMS for ranks up to 10 of our MPCA + MMPTR method, again compared with the aforementioned three methods. As can be seen, our MPCA + MMPTR method outperforms the other three.
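The CMS computation described above can be sketched as follows. This is a simplified, sample-level version with our own function names; the paper's exact protocol may rank over classes rather than individual gallery samples:

```python
import numpy as np

def cumulative_match_scores(dist, gallery_labels, probe_labels, max_rank=10):
    """CMS curve: entry k-1 is the fraction of probes whose true class
    appears among the labels of their k nearest gallery samples."""
    order = np.argsort(dist, axis=1)            # gallery indices, nearest first
    ranked = np.asarray(gallery_labels)[order]  # gallery labels in rank order
    cms = []
    for k in range(1, max_rank + 1):
        hits = [probe_labels[i] in list(ranked[i, :k])
                for i in range(len(probe_labels))]
        cms.append(float(np.mean(hits)))
    return cms
```

The rank-1 score is the ordinary recognition rate, and the curve is non-decreasing in k, as seen in Figs. 15, 19, and 23.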
Finally, we design and conduct an experiment to check the actual running time of each method. The experimental platform is a workstation equipped with 16 GB RAM and a hexa-core 3.47-GHz Intel(R) Xeon(R) CPU. Table 3 shows the results, which indicate that the testing time consumed
Table 2 Recognition results on the CASIA(B) gait database

Method        | Tensor size (direct) | ARR (%) (direct) | ARR (%) (after vectorization) | Number of features preserved
MPCA          | 19 × 27 × 15 | 74.2 | 92.7 | 330
MPCA + GTDA   | 12 × 14 × 3  | 77.7 | 90.6 | 100
MPCA + DTSA   | 15 × 22 × 12 | 68.8 | 92.2 | 30
MPCA + MMPTR  | 18 × 24 × 14 | 77.7 | 94.9 | 245

Bold values indicate the best results
Fig. 14 ARR versus the number of features used on CASIA(B) database
Fig. 15 Recognition performance in terms of rank order statistics on CASIA(B) database
by each method is almost the same. Nonetheless, MPCA + DTSA and MPCA + MMPTR need more training time than MPCA and MPCA + GTDA. The main reason is threefold: first, MPCA combined with other algorithms requires optimizing N more transformation matrices; second, the Laplacian scatters in MMPTR and DTSA take much longer to compute than the non-Laplacian scatters in MPCA and GTDA; third, the testing time hinges on both the projection calculation and the size of the final feature used for classification. We can also observe that, compared with the direct classification proposal, a little more time is spent during training for the
Table 3 Time consumed (s) during the whole training and testing processing on the CASIA(B) gait database

Method        | Training (direct) | Testing (direct) | Training (after vectorization) | Testing (after vectorization)
MPCA          | 34.0   | 1.3 | 41.1   | 0.9
MPCA + GTDA   | 78.3   | 1.2 | 86.3   | 0.7
MPCA + DTSA   | 1410.4 | 1.4 | 1415.9 | 0.6
MPCA + MMPTR  | 1337.6 | 1.5 | 1344.3 | 0.8
Fig. 16 ARR and average feature dimension (Dim) by using MPCA on TUM GAID database
Fig. 17 ARR by using MPCA followed by the proposed MMPTR on TUM GAID database
Fig. 18 ARR versus the number of features used on TUM GAID database
Fig. 19 Recognition performance in terms of rank order statistics on the TUM GAID database
classification after tensor vectorization proposal. This is because extra time is needed to rearrange the tensor into a vector according to class discriminability, which amounts to only a small computational load for the workstation. Although our proposed MPCA + MMPTR method in general trades training speed for precision, its recognition decision time is quite short.
4.3 Experiments on the TUM GAID gait database
In the experiments on this gait database, all gait tensor samples are normalized to 64 × 44 × 26. For each individual, four samples are randomly selected for training and the rest are used for testing. We report the results, including the ARR and average feature dimension (Dim) used for classification, over 30 random splits. We also provide comparisons with MPCA, MPCA + GTDA, and MPCA + DTSA.
4.3.1 Preprocessing step
We use MPCA to reduce the dimensionality. Figure 16 shows the ARR and average feature dimension (Dim) used for classification as the testQ for MPCA in each mode ranges from 40 % to 98 %. We can see that both testQ = 89 % and testQ = 97 % are better than the results at other testQs. It is worth noting that a larger testQ produces a larger feature dimension, which may benefit classification; therefore, MPCA with testQ = 97 % is chosen as the preprocessing step for the proposed MMPTR.
Table 4 Recognition results on the TUM GAID database

Method        | Tensor size (direct) | ARR (%) (direct) | ARR (%) (after vectorization) | Number of features preserved
MPCA          | 41 × 23 × 21 | 68.0 | 93.3 | 80
MPCA + GTDA   | 23 × 15 × 14 | 68.3 | 91.3 | 40
MPCA + DTSA   | 40 × 22 × 21 | 55.3 | 81.3 | 50
MPCA + MMPTR  | 39 × 20 × 20 | 74.3 | 94.3 | 70

Bold values indicate the best results
Table 5 Time consumed (s) during the whole training and testing processing on the TUM GAID database

Method        | Training (direct) | Testing (direct) | Training (after vectorization) | Testing (after vectorization)
MPCA          | 61.2   | 42.2 | 75.5   | 0.8
MPCA + GTDA   | 168.9  | 73.2 | 170.0  | 1.2
MPCA + DTSA   | 1752.6 | 82.5 | 1754.5 | 1.3
MPCA + MMPTR  | 1643.7 | 81.7 | 1645.2 | 1.4
Fig. 20 ARR and average feature dimension (Dim) by using MPCA on CASME database
Fig. 21 ARR by using the proposed MMPTR on CASME database
Thus, all the gait tensor samples can be reduced to P1 = 41, P2 = 23, and P3 = 21 by MPCA. In Fig. 17, we show the ARR results obtained by using MPCA followed by the proposed MMPTR, where the horizontal axis indicates the testQ of MMPTR; the top ARR of 94.3 % appears when testQ = 99 %.
4.3.2 Direct classification versus classification after tensor vectorization
Our MPCA + MMPTR method achieves ARRs of 74.3 % and 94.3 % based on direct classification with 39 × 20 × 20-dimensional features and on classification after tensor vectorization with 70-dimensional features, respectively. From the results, we see that classification after tensor vectorization outperforms direct classification.
4.3.3 Comparison with existing methods
We compare our method with MPCA, MPCA + GTDA, and MPCA + DTSA. In Fig. 18, the ARR with different numbers of features is drawn based on classification after tensor vectorization. We can see that our MPCA + MMPTR method outperforms the other three methods over dimensions ranging from 50 to 400. The ARRs achieved by MPCA with 80-dimensional features and MPCA + GTDA with 40-dimensional features are 93.3 % and 91.3 %, respectively. The ARR of MPCA + DTSA with 50-dimensional features is only 81.3 %, much lower than MPCA. We present the results obtained with direct classification and classification after tensor vectorization in Table 4.
Besides, with the number of preserved features selected as in Table 4, we plot the CMS results for ranks up to 10 of the different methods in Fig. 19. From the results, we can see that our method outperforms MPCA, MPCA + GTDA, and MPCA + DTSA. In addition, the CPU time consumed by each method on the TUM GAID gait database is tested and given in Table 5. The results suggest that the relative time consumption is similar to that on the CASIA(B) gait database; however, the TUM GAID gait database needs more training and testing time than the CASIA(B) one, except for MPCA's testing under the classification after tensor vectorization proposal. The TUM GAID gait database is larger than the CASIA(B) one, and as a result, most methods dealing with it require more time. For MPCA's testing under the classification after tensor vectorization proposal, the final feature used for classification is shorter in the TUM GAID gait database than in the CASIA(B) gait database; therefore, MPCA's testing time is less (Table 5).
Table 6 Recognition results on the CASME database

Method | Tensor size (direct) | ARR (%) (direct) | ARR (%) (after vectorization) | Number of features preserved
MPCA   | 42 × 42 × 43 | 42.5 | 47.6 | 60
GTDA   | 59 × 59 × 60 | 43.3 | 79.3 | 20
DTSA   | 59 × 61 × 53 | 48.6 | 52.0 | 100
MMPTR  | 64 × 64 × 64 | 49.5 | 80.2 | 25

Bold values indicate the best results
Fig. 22 ARR versus the number of features used on the CASME database
Fig. 23 Recognition performance in terms of rank order statistics on the CASME database
4.4 Experiments on the CASME micro-expression
database
All the micro-expression samples are normalized to third-order tensors of size 64 × 64 × 64 in this experiment. For each micro-expression, 15 samples are randomly selected for training and the rest are used for testing. We report the results, including the ARR and average feature dimension (Dim) used for classification, over 30 random splits.
4.4.1 Preprocessing step
In this subsection, we first examine whether MPCA works as a preprocessing step. Figure 20 shows the plots of the ARR and average feature dimension (Dim) versus testQ for MPCA. As can be seen, the performance of MPCA varies with testQ, and the highest ARR is achieved at testQ = 100 %; that is, there is no benefit in taking MPCA as a preprocessing step for the micro-expression task. The reason could be that the detailed and tiny features also play a role in recognizing micro-expressions. Therefore, we compare MMPTR with the three tensor-based algorithms MPCA, GTDA, and DTSA in the subsequent subsections.
4.4.2 Direct classification versus classification after tensor vectorization
The ARR of MMPTR is enhanced from 49.45 % to 80.18 % when direct classification is replaced with classification after tensor vectorization. In Fig. 21, we show the ARR results of the proposed MMPTR based on classification after tensor vectorization, where the horizontal axis indicates the testQ of MMPTR; the top ARR of 80.2 % appears when testQ = 100 %. The feature dimensions of the MMPTR method based on direct classification and on classification after tensor vectorization are 64 × 64 × 64 and 25, respectively. From the results, we can observe that classification after tensor vectorization outperforms direct classification (Table 6).
4.4.3 Comparison with existing methods
We compare our method with MPCA, GTDA, and DTSA. Both ARR and CMS are reported in Figs. 22 and 23, and the tensor size and ARR for direct classification, as well as the number of features preserved for classification after tensor vectorization, are listed in Table 6. Figure 22 demonstrates that, starting from 10 features, MMPTR gives better recognition performance than the other three algorithms. From Fig. 23, we see that MMPTR again outperforms MPCA, GTDA, and DTSA; furthermore, when a list of the top-10 possible identifications of the test sample is produced, MMPTR achieves 98.7 % accuracy. We also compare the CPU time consumption of these four methods; Table 7 lists the results, which are similar to those on the CASIA(B) and TUM GAID gait databases. To the best of our knowledge, the proposed MMPTR is thus the state-of-the-art method for micro-expression recognition in terms of accuracy and speed of identification.
5 Conclusion
This work proposes a novel tensor subspace analysis algorithm, named MMPTR, for gait recognition and micro-expression recognition. By finding N transformation matrices for N-order tensor data through maximizing the inter-class Laplacian scatter while minimizing the intra-class Laplacian scatter, we can extract discriminative and geometry-preserving features for recognition. Our results show that the proposed MMPTR using these features produces better recognition performance than MPCA, GTDA, and DTSA.
Acknowledgments We sincerely thank the Institute of Automation, Chinese Academy of Sciences for granting us permission to use the CASIA(B) gait database; the Institute for Human–Machine Communication, Technische Universität München for granting us permission to use the TUM GAID database; and the Institute of Psychology, Chinese Academy of Sciences for granting us permission to use the CASME database. This project is supported by the Natural Science Foundation of China (Grant Nos. 61201370, 61571275, and 61571274), the Specialized Research Fund
Table 7 Time consumed (s) during the whole training and testing processing on the CASME database

Method | Training (direct) | Testing (direct) | Training (after vectorization) | Testing (after vectorization)
MPCA   | 2.6   | 4.7  | 5.7   | 0.5
GTDA   | 23.0  | 21.5 | 23.2  | 0.9
DTSA   | 302.1 | 20.6 | 304.1 | 1.2
MMPTR  | 203.1 | 22.3 | 204.2 | 0.9
for the Doctoral Program of Higher Education of China (Grant No.
20120131120030), the Independent Innovation Foundation for Post-
doctoral Scientists of Shandong Province (Grant No. 201303100), the
Special Financial Program of China Post-doctoral Science Foundation
(Grant No. 2014T70636), the Key Laboratory of Intelligent Percep-
tion and Systems for High-Dimensional Information, Ministry of
Education (Grant No. 30920140122006), the Shandong Provincial
Natural Science Foundation, China (Grant Nos. ZR2014FM030 and
ZR2013FM32), and the Young Scholars Program of Shandong
University (Grant No. 2015WLJH39).
References
1. Engel D, Hüttenberger L, Hamann B (2011) A survey of dimension reduction methods for high-dimensional data analysis and visualization. In: Visualization of large and unstructured data sets, pp 135–149
2. Andrew RW (1995) Multidimensional scaling by iterative
majorization using radial basis functions. Pattern Recogn
28:753–759
3. Tenenbaum JB, Silva V, Langford JC (2000) A global geometric
framework for nonlinear dimensionality reduction. Science
290(5500):2319–2323
4. Roweis ST, Lawrence KS (2000) Nonlinear dimensionality
reduction by locally linear embedding. Science 290:2323–2326
5. Kim M, Pavlovic V (2011) Central subspace dimensionality
reduction using covariance operators. IEEE Trans Pattern Anal
Mach Intell 33:657–670
6. Yang W, Wang Z, Sun C (2015) A collaborative representation
based projections method for feature extraction. Pattern Recogn
48:20–27
7. Wang R, Shan S, Chen X, Chen J, Gao W (2011) Maximal linear
embedding for dimensionality reduction. IEEE Trans Pattern
Anal Mach Intell 33:1776–1792
8. He XF, Cai D, Niyogi P (2005) Tensor subspace analysis. In:
Advances in neural information proceeding systems 18, Van-
couver, Canada
9. Vasilescu MAO, Terzopoulos D (2005) Multilinear independent
components analysis. IEEE Comput Soc Conf Comput Vision
Pattern Recogn 1:547–553
10. Raj RG, Bovik AC (2010) A fast multilinear ICA algorithm. In:
17th IEEE international conference on image processing (ICIP),
pp 1889–1892
11. Lu H, Plataniotis KN, Venetsanopoulos AN (2008) MPCA:
multilinear principal component analysis of tensor objects. IEEE
Trans Neural Netw 19:18–39
12. Panagakis Y, Kotropoulos C, Arce GR (2010) Non-negative
multilinear principal component analysis of auditory temporal
modulations for music genre classification. IEEE Trans Audio
Speech Lang Process 18:576–588
13. Yan S, Xu D, Yang Q, Zhang L, Tang X, Zhang HJ (2007)
Multilinear discriminant analysis for face recognition. IEEE
Trans Image Process 16:212–220
14. Yan S, Xu D, Yang Q, Zhang L, Tang X, Zhang HJ (2005)
Discriminant analysis with tensor representation. IEEE Comput
Soc Conf Comput Vision Pattern Recogn 1:526–532
15. Tao D, Li X, Wu X, Maybank SJ (2007) General tensor dis-
criminant analysis and gabor features for gait recognition. IEEE
Trans Pattern Anal Mach Intell 29:1700–1715
16. Chang X, Nie F, Wang S, Yang Y, Zhou X, Zhang C (2015)
Compound Rank-k Projections for Bilinear Analysis. IEEE Trans
Neural Netw Learn Syst. doi:10.1109/TNNLS.2015.2441735
17. Zhang L, Zhang L, Tao D, Huang X (2013) Tensor discriminative
locality alignment for hyperspectral image spectral-spatial feature
extraction. IEEE Trans Geosci Remote Sens 51:242–256
18. Lu H, Plataniotis KN, Venetsanopoulos AN (2009) Uncorrelated
multilinear principal component analysis for unsupervised mul-
tilinear subspace learning. IEEE Trans Neural Netw 20:
1820–1836
19. Ben XY, Jiang MY, Yan R et al (2015) Orthogonal multilinear
discriminant analysis and its subblock tensor analysis version.
Optik Int J Light Electron Opt 126:361–367
20. Lu J, Tan YP (2009) Uncorrelated multilinear geometry pre-
serving projections for multimodal biometrics recognition. In:
IEEE international symposium on circuits and systems,
pp 2601–2604
21. Li X, Lin S, Yan S, Xu D (2008) Discriminant locally linear
embedding with high-order tensor data. IEEE Trans Syst Man
Cybern B Cybern 38:342–352
22. Zheng D, Du X, Cui L (2010) Tensor locality preserving pro-
jections for face recognition. In: 2010 IEEE international con-
ference on systems man and cybernetics (SMC), pp 2347–2350
23. Lu J, Tan YP (2008) Enhanced face recognition using tensor
neighborhood preserving discriminant projections. In: 15th IEEE
international conference on image processing, pp 1916–1919
24. Lu J, Wang G, Tan YP (2011) Multilinear locality preserving
canonical correlation analysis for face recognition, information.
In: 8th international conference on communications and signal
processing (ICICS), pp 1–5
25. Wang SJ, Chen HL, Yan WJ, Chen YH, Fu XL (2014) Face
recognition and micro-expression recognition based on discrim-
inant tensor subspace analysis plus extreme learning machine.
Neural Process Lett 39(1):25–43
26. Han XH, Chen YW, Ruan X (2012) Multilinear supervised
neighborhood embedding of a local descriptor tensor for scene/
object recognition. IEEE Trans Image Process 21:1314–1326
27. Liu Y, Liu Y, Chan KCC (2010) Tensor distance based multi-
linear locality-preserved maximum Information embedding.
IEEE Trans Neural Netw 21:1848–1854
28. Liu Y, Liu Y, Chan KCC (2009) Multilinear isometric embedding
for visual pattern analysis. In: IEEE 12th international conference
on computer vision workshops (ICCV Workshops), pp 212–218
29. Liu M, Chowdhury AKR (2010) Multilinear feature extraction
and classification of multi-focal images, with applications in
nematode taxonomy, 2010. In: IEEE conference on computer
vision and pattern recognition (CVPR), pp 2823–2830
30. Feng ZH, Kittler J, Christmas W, Wu XJ, Pfeiffer S (2012)
Automatic face annotation by multilinear AAM with missing
values. In: 2012 21st international conference on pattern recog-
nition (ICPR), pp 2586–2589
31. Hu H (2013) Enhanced gabor feature based classification using a
regularized locally tensor discriminant model for multiview gait
recognition. IEEE Trans Circuits Syst Video Technol 23:
1274–1286
32. Ben XY, Xu S, Wang KJ (2012) Review on pedestrian gait
feature expression and recognition. Pattern Recog Artif Intell
25:71–81
33. Yu S, Wang L, Huang K et al. (2004) Gait analysis for human
identification in frequency domain. In: Proceedings of the 3rd
international conference on image and graphics, pp 282–285
34. Boulgouris NV, Plataniotis KN, Hatzinakos D (2006) Gait
recognition using linear time normalization. Pattern Recogn
39:969–979
35. Chen C, Liang J, Zhu X (2011) Gait recognition based on
improved dynamic bayesian networks. Pattern Recogn
44:988–995
36. Wang L, Hu WM, Tan TN (2003) Gait-Based human identifi-
cation. Chin J Comput 26:353–360
37. Goffredo M, Carter JN, Nixon MS (2008) Front view gait
recognition. In: Proceedings of the 2nd IEEE international con-
ference on biometrics: theory, applications and systems, pp 1–6
38. Urtasun R, Fua P (2004) 3D tracking for gait characterization and recognition. In: Proceedings of the sixth IEEE international conference on automatic face and gesture recognition, pp 17–22
39. Ben XY, Xu S, Wang KJ (2012) Research on gait recognition
based on trace transform. J Jilin Univ (Engineering and Tech-
nology Edition) 42:1–5
40. Toby HWL, Cheung KH, James NKL (2011) Gait flow image: a
silhouette-based gait representation for human identification.
Pattern Recogn 44:973–987
41. Wang KJ, Ben XY, Liu LL et al (2009) Gait recognition using
information fusion of energy. J Huazhong Univ Sci Technol
37:14–17
42. Owayjan M, Kashour A, Haddad NA, Fadel M, Souki GA (2012)
The design and development of a lie detection system using facial
micro-expressions. In: 2012 2nd international conference on
advances in computational tools for engineering applications
(ACTEA), pp 33–38
43. Pfister T, Li X, Zhao G, Pietikainen M (2011) Recognising
spontaneous facial micro-expressions. In: 2011 IEEE interna-
tional conference on computer vision (ICCV), pp 1449–1456
44. Yang W, Wang J, Ren M, Yang J (2009) Feature extraction based
on laplacian bidirectional maximum margin criterion. Pattern
Recogn 42:2327–2334
45. Yu S, Tan D, Tan T (2006) A framework for evaluating the effect
of view angle, clothing and carrying condition on gait recogni-
tion. In: Proceedings of the 18th international conference on
pattern recognition, Hong Kong, China, pp 441–444
46. Ben XY, Meng WX, Yan R (2012) Dual-ellipse fitting approach for robust gait periodicity detection. Neurocomputing 79:173–178
47. Hofmann M, Geiger J, Bachmann S, Schuller B, Rigoll G (2013)
The TUM gait from audio, image and depth (GAID) database:
multimodal recognition of subjects and traits. In: Journal of visual
communication and image representation, special issue on visual
understanding and applications with rgb-d cameras, Elsevier
48. Hofmann M, Bachmann S, Rigoll G (2012) 2.5D gait biometrics
using the depth gradient histogram energy image. In: IEEE fifth
international conference on biometrics: theory, applications and
systems (BTAS 2012), Washington, DC, USA, pp 23–26
49. Zhang HL, Ben XY, Zhang P, Liu TJ (2014) Gait period detec-
tion based on layered coding of depth information. In: The 2014
international conference on mechanical engineering and intelli-
gent systems, Qingdao, China
50. Yan WJ, Wu Q, Liu YJ, Wang SJ, Fu XL (2013) CASME
database: A dataset of spontaneous micro-expressions collected
from neutralized faces. In: 2013 10th IEEE international con-
ference and workshops on automatic face and gesture recognition
(FG), pp 1–7
51. Shen X, Wu Q, Fu X (2012) Effects of the duration of expressions
on the recognition of micro-expressions. J Zhejiang Univ Sci B
13:221–230