Linear Discriminative Image Processing Operator Analysis
Toru Tamaki, Bingzhi Yuan, Kengo Harada, Bisser Raytchev, Kazufumi Kaneda
Most discriminative image processing operators (IPOs)
[Overview diagram. Labels: generating matrices (image processing operators), feature space, LDA classifier, recognition.]
Goal: Find the most discriminative set of image processing operations for LDA.
Motivation: For small sample size problems, many studies increase the number of training samples by synthetically generating new ones. But HOW? Usually in an ad-hoc way… here, discriminatively!
Contribution: Simultaneous estimation of both the LDA feature space and a set of discriminative generating matrices.
Linear IPO + LDA = LDA with increased samples
An increased sample: $x_j = G_j x$, where $x$ is an original sample and $G_j$ is a generating matrix (an image processing operator).

Mean of class $i$ for increased samples:
$$m'_i = \frac{1}{J n_i} \sum_{j=1}^{J} \sum_{x \in X_i} G_j x = \frac{1}{J} \sum_{j=1}^{J} G_j m_i = \bar{G} m_i,$$
where $\bar{G} = \frac{1}{J} \sum_{j=1}^{J} G_j$ is the average of the image processing operators. Mean of all increased samples: $m' = \bar{G} m$.

Scatter matrix of class $i$ for increased samples:
$$S'_i = \bar{G} (S_i - R_i) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_i G_j^T$$

Within-class scatter matrix for increased samples:
$$S'_W = \bar{G} (S_W - R_W) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_W G_j^T$$

Between-class scatter matrix for increased samples:
$$S'_B = \bar{G} S_B \bar{G}^T$$

Here $R_i = \frac{1}{n_i} \sum_{x \in X_i} x x^T$, $R_W = \sum_i^c R_i$, and $S_W$, $S_B$ are the scatter matrices of the original (non-increased) samples.

Covariance matrix for increased samples (with $R_{all}$ the analogue of $R_i$ over all training samples):
$$X' = \bar{G} (X - R_{all}) \bar{G}^T + \frac{1}{J} \sum_{j=1}^{J} G_j R_{all} G_j^T$$

After dimensionality reduction with the PCA projection matrix $P$ and the LDA matrix $A$:
$$\tilde{S}'_i = A^T P^T S'_i P A, \qquad \tilde{S}'_W = A^T P^T S'_W P A, \qquad \tilde{S}'_B = A^T P^T S'_B P A,$$
$$y_j = A^T P^T x_j = A^T P^T G_j x, \qquad \tilde{m}'_i = A^T P^T m'_i = A^T P^T \bar{G} m_i, \qquad \tilde{m}' = A^T P^T m' = A^T P^T \bar{G} m.$$

LDA maximizes the Rayleigh quotient $\operatorname{tr}(\tilde{S}'_B) / \operatorname{tr}(\tilde{S}'_W)$, which leads to the generalized eigenvalue problem of $(P^T S'_W P)^{-1} P^T S'_B P$.
[Pipeline diagram: training samples → scatter matrices $S_W, S_B$ → covariance matrix → PCA projection matrix $P$ (dimensionality reduction) → LDA feature space $A$, via the Rayleigh quotient / generalized eigenvalue problem.]
Given $\{G_j\}$, we do not need to actually generate the increased training samples. But we need more memory to store the generating matrices…
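To make this concrete, here is a minimal NumPy sketch (my own illustration, not the authors' code; toy dimensions and random operators are assumptions) that checks the closed-form scatter matrices above against brute-force generation of the increased samples:

```python
import numpy as np

rng = np.random.default_rng(0)
d, J, c, n = 16, 5, 3, 10                     # dims, #operators, #classes, samples/class
Gs = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(J)]  # toy generating matrices
X = [rng.standard_normal((n, d)) for _ in range(c)]                     # toy class samples

Gbar = sum(Gs) / J                            # average operator: (1/J) sum_j G_j
m_i = [Xi.mean(axis=0) for Xi in X]
R_i = [Xi.T @ Xi / n for Xi in X]             # R_i = (1/n_i) sum_x x x^T
S_i = [(Xi - mi).T @ (Xi - mi) / n for Xi, mi in zip(X, m_i)]  # original class scatter

# Closed form: S'_i = Gbar (S_i - R_i) Gbar^T + (1/J) sum_j G_j R_i G_j^T
Sp_i = [Gbar @ (Si - Ri) @ Gbar.T + sum(G @ Ri @ G.T for G in Gs) / J
        for Si, Ri in zip(S_i, R_i)]

# Brute force: actually generate the J*n increased samples per class
for i, Xi in enumerate(X):
    Z = np.vstack([Xi @ G.T for G in Gs])     # x_j = G_j x for every sample and operator
    mz = Z.mean(axis=0)
    Sz = (Z - mz).T @ (Z - mz) / (J * n)
    assert np.allclose(Sz, Sp_i[i])
print("closed-form scatter matrices match brute-force sample generation")
```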
Analysis of IPO: the spectral decomposition
Definition 1. Let $f(x), g(x) \in L^2(\mathbb{R}^2)$ be complex-valued 2D functions, where $x \in \mathbb{R}^2$. The inner product is defined as
$$(f, g) \equiv \int_{\mathbb{R}^2} f(x) \overline{g(x)}\, dx,$$
where $\bar{g}$ is the complex conjugate of $g$. An operator $G : f \mapsto g$ is linear if it satisfies $G(af + bg) = aG(f) + bG(g)$ for all $a, b \in \mathbb{R}$. $G^*$ is the adjoint operator of $G$ if it satisfies $(Gf, g) = (f, G^* g)$.
Corollary 1. Filtering and geometric transformation operators $G$ are normal operators, which satisfy $G^* G = G G^*$.
$$G = \sum_i \lambda_i P_i$$
A normal operator can be decomposed into projection operators!
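As a sanity check on Corollary 1, a small sketch (assumed illustration, not from the paper): a periodic filtering matrix built from a symmetric Gaussian kernel is normal, and its eigendecomposition gives the projection-operator expansion $G = \sum_i \lambda_i P_i$.

```python
import numpy as np

n = 64
taps = np.exp(-0.5 * (np.arange(-3, 4) / 1.5) ** 2)
taps /= taps.sum()                             # symmetric Gaussian filter kernel

# Circulant filtering matrix: row r applies the kernel around pixel r.
G = np.zeros((n, n))
for r in range(n):
    for k, w in zip(range(-3, 4), taps):
        G[r, (r + k) % n] = w

print("normal:", np.allclose(G @ G.T, G.T @ G))          # G*G = GG* holds

lam, V = np.linalg.eigh(G)                               # real spectrum (G symmetric)
G_rec = sum(l * np.outer(v, v) for l, v in zip(lam, V.T))  # sum_i lambda_i P_i
print("max reconstruction error:", np.abs(G - G_rec).max())
```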
[Figure: (a) $x$, (b) $Gx$, (c) $G^T G x$, (d) $G^T x$; responses of the eigenoperators, $E_1 x, E_2 x, \ldots, E_6 x$.]
[Plots: eigenvalue vs. index spectra of the Hermitian parts (a) $H_{11}, H_{21}$, (b) $H_{12}, H_{22}$, (c) $H_{13}, H_{23}$, with the corresponding eigenprojection images $P_{11i}x, P_{21i}x, P_{12i}x, P_{22i}x, P_{13i}x, P_{23i}x$.]
But is this feasible for a generating matrix? Yes!
Is a filtering matrix Hermitian? $\|G - G^T\| < 10^{-6}$: almost symmetric.
Is a geometric transformation unitary? Its transpose is approximately its inverse.
$$G = H_1 + iH_2, \qquad H_1 = \frac{G + G^T}{2}, \qquad H_2 = \frac{G - G^T}{2i}, \qquad i = \sqrt{-1}$$
Are the eigenvalues complex? Use the Hermitian decomposition above. So, a two-step approximation: an operator → two Hermitian operators (which have real eigenvalues) → weighted eigenprojections:
$$G \simeq \sum_j a_j E_j = \sum_j a_j (H_{1j} + i H_{2j}) \simeq \sum_j a_j \sum_i (\lambda_{1ji} P_{1ji} + i \lambda_{2ji} P_{2ji})$$
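A minimal sketch of the decomposition (my illustration on a generic random matrix, not one of the paper's operators): both Hermitian parts have real spectra, and together they recover $G$ exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((50, 50))        # generic real generating matrix

H1 = (G + G.T) / 2                       # Hermitian part (real symmetric)
H2 = (G - G.T) / 2j                      # also Hermitian: H2 = H2^*
print("H2 Hermitian:", np.allclose(H2, H2.conj().T))

lam1 = np.linalg.eigvalsh(H1)            # real eigenvalues
lam2 = np.linalg.eigvalsh(H2)            # real eigenvalues
print("spectra real:", lam1.dtype, lam2.dtype)

print("G recovered:", np.allclose(G, H1 + 1j * H2))   # G = H1 + i H2
```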
Examples
The real eigenvalues can be small in magnitude, so we can compress the representation. Eigenprojections of the eigenoperators transform images into… wavelets? Eigenoperators transform images into variants.
Q: To reduce the memory cost of generating matrices, can we use a decomposition for operators just like for images?
A: Yes.
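A sketch of how such a compression could work (an assumed illustration, not the paper's exact procedure; the `truncate` helper and toy $G$ are mine): keep only the largest-magnitude eigenpairs of each Hermitian part and measure the approximation error.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
G = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # toy generating matrix

H1 = (G + G.T) / 2
H2 = (G - G.T) / 2j

def truncate(H, m):
    """Keep the m largest-|lambda| eigenpairs: a low-rank spectral approximation."""
    lam, V = np.linalg.eigh(H)
    top = np.argsort(-np.abs(lam))[:m]
    return (V[:, top] * lam[top]) @ V[:, top].conj().T

m = 30                   # m eigenpairs per part (2m vectors total) instead of n*n entries
G_hat = truncate(H1, m) + 1j * truncate(H2, m)
err = np.linalg.norm(G - G_hat) / np.linalg.norm(G)
print(f"relative error with {m} of {n} eigenpairs kept per part: {err:.3f}")
```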
LDA + IPO = LDIPOA: find a set of discriminative IPOs
$$\bar{G}^{(k)} = \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)}, \qquad D = PA$$
$$\tilde{S}'^{(k)}_W = D^T \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} D + \frac{1}{k+1} \sum_{l=0}^{k} D^T G^{(l)} R_W G^{(l)T} D$$
$$\tilde{S}'^{(k)}_B = D^T \bar{G}^{(k)} S_B \bar{G}^{(k)T} D$$
$$X'^{(k)} = \bar{G}^{(k)} (X - R_{all}) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_{all} G^{(l)T}$$
$$S'^{(k)}_W = \bar{G}^{(k)} (S_W - R_W) \bar{G}^{(k)T} + \frac{1}{k+1} \sum_{l=0}^{k} G^{(l)} R_W G^{(l)T}$$
$$S'^{(k)}_B = \bar{G}^{(k)} S_B \bar{G}^{(k)T}$$
Algorithm 1 LDIPOA
1: Compute PCA $P$ and LDA $A$. $G^{(0)} \leftarrow I$.
2: for $k = 1, 2, \ldots$ do
3:   repeat
4:     α step: $\alpha^{(k)} = \arg\max_\alpha E(A, P, \alpha)$
5:     PCA step: compute $P$ with $\alpha^{(k)}$
6:     LDA step: $A = \arg\max_A E(A, P, \alpha^{(k)})$
7:   until $E$ converges
8: end for
[Diagram: the three steps cycle: α step → PCA step → LDA step.]
At each step $k$, estimate a single generating matrix represented as a linear combination of the initial generating matrices:
$$G^{(k)} = \sum_{j}^{J} \alpha_j^{(k)} G_j, \qquad \alpha^{(k)} = (\alpha_1^{(k)}, \alpha_2^{(k)}, \ldots, \alpha_J^{(k)})^T$$
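Below is a heavily simplified, runnable skeleton of this alternating scheme (assumptions throughout: toy data, a naive random-search α step instead of the authors' maximization, explicit sample generation instead of the closed-form scatter matrices, a ridge term for stability, and a single outer step $k$):

```python
import numpy as np

rng = np.random.default_rng(3)
d, J, c, n = 20, 4, 5, 8
Gs = np.stack([np.eye(d) + 0.05 * rng.standard_normal((d, d)) for _ in range(J)])
X = rng.standard_normal((c, n, d)) + 2.0 * rng.standard_normal((c, 1, d))  # c toy classes

def scatters(Z):
    mi = Z.mean(axis=1)                              # class means
    m = Z.reshape(-1, d).mean(axis=0)                # global mean
    SW = sum((Zi - mi_).T @ (Zi - mi_) for Zi, mi_ in zip(Z, mi)) / (c * n)
    SB = sum(np.outer(mi_ - m, mi_ - m) for mi_ in mi) / c
    return SW, SB

def objective(alpha, dim_p=10):
    G = np.tensordot(alpha, Gs, axes=1)              # G(alpha) = sum_j alpha_j G_j
    Z = X @ G.T                                      # "increased" samples G x
    SW, SB = scatters(Z)
    P = np.linalg.eigh(SW + SB)[1][:, -dim_p:]       # PCA step: top eigenvectors
    Sw, Sb = P.T @ SW @ P, P.T @ SB @ P
    # LDA step: top eigenvectors of (P^T S'_W P)^{-1} P^T S'_B P
    w, V = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(dim_p), Sb))
    A = np.real(V[:, np.argsort(-np.real(w))[:c - 1]])
    return np.trace(A.T @ Sb @ A) / np.trace(A.T @ Sw @ A)   # Rayleigh quotient E

alpha = np.ones(J) / J                               # start from the average operator
best = objective(alpha)
for _ in range(200):                                 # naive alpha step
    cand = alpha + 0.1 * rng.standard_normal(J)
    val = objective(cand)
    if val > best:
        alpha, best = cand, val
print("alpha:", np.round(alpha, 3), "objective E:", round(best, 3))
```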
Experiments with FERET dataset
The proposed algorithm iteratively estimates α (the coefficients of the generating matrices), P (PCA), and A (LDA) at the same time.
[Plot: the Rayleigh quotient vs. $k$, the number of estimated generating matrices. Legend: 10 generating matrices increase the dataset 11-fold; 1 generating matrix doubles the dataset; no generating matrices (normal LDA).]
$x_j = G_j x$
Proposition 1. A filtering is defined as
$$Gf(x) = \int G(x, y) f(y)\, dy,$$
where the kernel is symmetric, $G(x, y) = G(y, x)$, and real-valued. $G$ is a Hermitian operator, which satisfies $G^* = G$.

Proposition 2. A geometric (affine) transformation $G$ is defined as
$$Gf(x) = |A|^{1/2} f(Ax + t),$$
where $|A| \neq 0$. $G$ is a unitary operator, which satisfies $G^* G = I$.
Experimental setup:
- Size of images: 32×32
- Size of generating matrices: 1024×1024
- Number of classes: 1001 (fa)
- Training images per class: 1 (fa)
- Test images per class: 1 (fb)
- Eigen-generating matrices: 96
- Initial generating matrices: 567 (3 scalings × 7 rotations × 3 Gaussian blurs × 9 motion blurs)
- Classifier: nearest neighbor
- PCA rates: 80% and 95%, for the eigen-generating matrices (G-PCA) and for the PCA step (LDA-PCA)
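For reference, a sketch of how a generating matrix can be built from an image operation (an assumption about the construction; the poster does not spell it out, and the `gaussian_filter`/`rotate` operators here merely resemble the paper's set): apply the operation to each impulse (basis) image and stack the responses as columns.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

w = 32                                           # 32x32 images -> 1024x1024 matrices

def generating_matrix(op, w):
    """Matrix of a linear image operation: column p is the response to impulse p."""
    G = np.zeros((w * w, w * w))
    for p in range(w * w):
        e = np.zeros((w, w))
        e.flat[p] = 1.0                          # impulse (basis) image
        G[:, p] = op(e).ravel()
    return G

G_blur = generating_matrix(lambda im: gaussian_filter(im, sigma=1.0), w)
G_rot = generating_matrix(lambda im: rotate(im, 5.0, reshape=False, order=1), w)
print(G_blur.shape, "blur asymmetry:", np.linalg.norm(G_blur - G_blur.T))
```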
The objective is maximized in a few steps. A few generating matrices are enough to improve the performance. A bad approximation of the generating matrices does not lead to any improvement…