Outline: Semi-Supervised Learning (SSL), Co-Training with Tri-Class SVMs, Experimental Results, Conclusion
Semi-Supervised Facial Expressions Annotation Using Co-Training with Fast Probabilistic Tri-Class SVMs
Mohamed Farouk Abdel Hady, Martin Schels, Friedhelm Schwenker, Günther Palm
Institute of Neural Information Processing, University of Ulm, Germany
{mohamed.abdel-hady|friedhelm.schwenker|guenther.palm}@uni-ulm.de
September 12, 2010
Semi-Supervised Learning
In many domains, a large amount of training examples is available but unlabeled. The data labeling process is often tedious, expensive and time-consuming because it requires the effort of human experts such as physicians, radiologists, or chemists.

Research directions of SSL:
- Semi-Supervised Clustering
- Semi-Supervised Classification
- Semi-Supervised Regression
- Semi-Supervised Dimensionality Reduction
How can unlabeled data be helpful?
Figure: The unlabeled examples help to place the decision boundary in low-density regions. Using labeled data only, the maximum-margin separating hyperplane is the vertical dashed line. Using both labeled and unlabeled data (dots), the maximum-margin separating hyperplane is the oblique solid line.
Co-Training with Tri-Class SVMs
Figure: Tri-Class Co-Training. For each pair ωk-v-ωh, three classifiers h1, h2, h3 (one per view) are trained on the labeled pool Lkh and applied to the unlabeled pool U'. The confidence of the combined prediction Hkh(xu) over the three views (xu(1), xu(2), xu(3)) is measured; the most confident examples are added, with their predicted labels, to Lkh, and U' is refilled from U.
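The loop in the figure can be sketched as a minimal runnable stand-in. The nearest-centroid per-view classifier and the averaged-posterior confidence measure are illustrative assumptions replacing the actual Tri-Class SVMs, not the paper's method:

```python
import numpy as np

class CentroidView:
    """Toy per-view classifier standing in for one Tri-Class SVM."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # soft posteriors from distances to the class centroids
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        p = np.exp(-d)
        return p / p.sum(axis=1, keepdims=True)

def co_train(views_L, y_L, views_U, n_iter=5, n_select=2):
    """Grow the labeled pool by moving the most confident unlabeled examples."""
    views_L = [v.copy() for v in views_L]
    views_U = [v.copy() for v in views_U]
    y_L = y_L.copy()
    models = [CentroidView().fit(X, y_L) for X in views_L]
    for _ in range(n_iter):
        if len(views_U[0]) == 0:
            break
        # train one classifier per view on the current labeled pool
        models = [CentroidView().fit(X, y_L) for X in views_L]
        # measure confidence by averaging the per-view posteriors
        P = np.mean([m.predict_proba(X) for m, X in zip(models, views_U)], axis=0)
        conf, y_hat = P.max(axis=1), models[0].classes_[P.argmax(axis=1)]
        # select the most confident examples and add them, with their
        # predicted labels, to the labeled pool
        pick = np.argsort(conf)[::-1][:n_select]
        y_L = np.concatenate([y_L, y_hat[pick]])
        for v in range(len(views_L)):
            views_L[v] = np.vstack([views_L[v], views_U[v][pick]])
            views_U[v] = np.delete(views_U[v], pick, axis=0)
    return models, y_L
```

The key property of co-training is visible in the loop: each view's classifier benefits from pseudo-labels whose confidence was judged jointly across all views.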
Bi-Class SVMs
Figure: A bi-class SVM with decision function f_kh(x) = ⟨w, φ(x)⟩, a single bias b, and margins at b ± 1, separating ω_h (y = 1) from ω_k (y = 3).

\[
\min_{w,b,\varepsilon}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n_k+n_h} \varepsilon_i \tag{1}
\]

subject to the constraints

\[
y_i(\langle w, \phi(x_i)\rangle - b) \ge 1 - \varepsilon_i,\quad \varepsilon_i \ge 0,\quad \text{for } i = 1,\dots,n_k+n_h \tag{2}
\]
Tri-Class SVMs
Figure: A Tri-Class SVM with decision function f_kh(x) = ⟨w, φ(x)⟩ and two parallel hyperplanes with biases b1 ≤ b2 and margins at b1 ± 1 and b2 ± 1, separating ω_h (y = 1), the intermediate region (y = 2), and ω_k (y = 3).

\[
\min_{w,b_1,b_2,\varepsilon,\varepsilon^*} \Psi_P = \frac{1}{2}\|w\|^2 + C\Big(\sum_{i=1}^{n_1}\varepsilon_i^1 + \sum_{i=1}^{n_2}\varepsilon_i^{*2} + \sum_{i=1}^{n_2}\varepsilon_i^2 + \sum_{i=1}^{n_3}\varepsilon_i^{*3}\Big) \tag{3}
\]

subject to

\[
\begin{aligned}
&\langle w, \phi(x_i^1)\rangle - b_1 \le -1 + \varepsilon_i^1, &\varepsilon_i^1 \ge 0 \ \text{for } i = 1,\dots,n_1;\\
&\langle w, \phi(x_i^2)\rangle - b_1 \ge 1 - \varepsilon_i^{*2}, &\varepsilon_i^{*2} \ge 0 \ \text{for } i = 1,\dots,n_2;\\
&\langle w, \phi(x_i^2)\rangle - b_2 \le -1 + \varepsilon_i^2, &\varepsilon_i^2 \ge 0 \ \text{for } i = 1,\dots,n_2;\\
&\langle w, \phi(x_i^3)\rangle - b_2 \ge 1 - \varepsilon_i^{*3}, &\varepsilon_i^{*3} \ge 0 \ \text{for } i = 1,\dots,n_3;\\
&b_1 \le b_2
\end{aligned} \tag{4}
\]
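The constraints in Eq. (4) carve the score axis into three regions around the two biases; a minimal sketch of the implied decision rule, using only the region labels y = 1, 2, 3 and assuming b1 ≤ b2:

```python
def tri_class_decision(f, b1, b2):
    """Map a Tri-Class SVM score f = <w, phi(x)> to one of the three regions
    implied by the constraints in Eq. (4), given the biases b1 <= b2."""
    if f < b1:
        return 1   # left of the first hyperplane (class-1 examples: f <= b1 - 1)
    if f > b2:
        return 3   # right of the second hyperplane (class-3 examples: f >= b2 + 1)
    return 2       # intermediate region between the hyperplanes
```

The intermediate region y = 2 is what distinguishes the Tri-Class SVM from a bi-class SVM: examples falling there are treated as belonging to neither of the two classes of the pair.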
Illustrative example for one-v-one Tri-Class SVMs
Figure: A linearly separable dataset with 45 examples from three classes ω1, ω2, ω3: (a) input space; (b) class ω1 against ω2; (c) class ω1 against ω3; (d) class ω2 against ω3.
Probabilistic interpretation for the Tri-Class SVM output
We fit sigmoid functions to the SVM output, where Eq. (6) represents the doubt about whether the input example x belongs to ωk or ωh.
\[
P_{kh}(y=1\mid x) = 1 - \frac{1}{1 + \exp(-(f_{kh}(x) - b_1))} \tag{5}
\]
\[
P_{kh}(y=2\mid x) = \frac{1}{1 + \exp(-(f_{kh}(x) - b_1))}\left(1 - \frac{1}{1 + \exp(-(f_{kh}(x) - b_2))}\right) \tag{6}
\]
\[
P_{kh}(y=3\mid x) = \frac{1}{1 + \exp(-(f_{kh}(x) - b_1))}\cdot\frac{1}{1 + \exp(-(f_{kh}(x) - b_2))} \tag{7}
\]
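Because Eqs. (5)-(7) combine the two sigmoids by complement and product, the three probabilities sum to one by construction; a small sketch:

```python
import math

def tri_class_probs(f, b1, b2):
    """Sigmoid-based probabilities of Eqs. (5)-(7) for Tri-Class SVM score f."""
    s1 = 1.0 / (1.0 + math.exp(-(f - b1)))  # sigmoid at the first hyperplane
    s2 = 1.0 / (1.0 + math.exp(-(f - b2)))  # sigmoid at the second hyperplane
    p1 = 1.0 - s1           # Eq. (5): region y = 1
    p2 = s1 * (1.0 - s2)    # Eq. (6): doubt region y = 2
    p3 = s1 * s2            # Eq. (7): region y = 3
    return p1, p2, p3
```

Checking the algebra: (1 - s1) + s1(1 - s2) + s1*s2 = 1, so no extra normalization is needed.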
Decision Fusion for Ensemble of Probabilistic Tri-Class SVMs
Table: One-against-One Decision Profile of example x
        ω1              ω2              ω3              ω4
ω1      -               P12(y=3|x)      P13(y=3|x)      P14(y=3|x)
ω2      P12(y=1|x)      -               P23(y=3|x)      P24(y=3|x)
ω3      P13(y=1|x)      P23(y=1|x)      -               P34(y=3|x)
ω4      P14(y=1|x)      P24(y=1|x)      P34(y=1|x)      -

Thus the final probabilistic output of the One-against-One ensemble of Tri-Class SVMs is defined as follows, for each k = 1, …, K:

\[
P(y = \omega_k \mid x) = \frac{\sum_{h=1}^{k-1} P_{hk}(y=1\mid x) + \sum_{h=k+1}^{K} P_{kh}(y=3\mid x)}{\sum_{k'=1}^{K}\left(\sum_{h=1}^{k'-1} P_{hk'}(y=1\mid x) + \sum_{h=k'+1}^{K} P_{k'h}(y=3\mid x)\right)} \tag{8}
\]
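Eq. (8) amounts to a vote-and-normalize step over the pairwise outputs: each pairwise classifier contributes its y = 3 probability to the first class of the pair and its y = 1 probability to the second. A sketch, where the dictionary keyed by 0-based (k, h) pairs is an illustrative representation, not from the slides:

```python
import numpy as np

def fuse_one_vs_one(P):
    """Fuse pairwise Tri-Class probabilities into class posteriors per Eq. (8).

    P[(k, h)] maps a pair k < h (0-based) to the triple
    (P_kh(y=1|x), P_kh(y=2|x), P_kh(y=3|x)).
    """
    K = 1 + max(h for _, h in P)
    votes = np.zeros(K)
    for (k, h), (p1, _, p3) in P.items():
        votes[k] += p3   # y = 3 in pair (k, h) supports omega_k
        votes[h] += p1   # y = 1 in pair (k, h) supports omega_h
    return votes / votes.sum()  # denominator of Eq. (8)
```

Note that the doubt probability P_kh(y = 2|x) contributes to neither class, so an uncertain pairwise classifier simply abstains from the vote.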
Facial Expressions Recognition
1. The Cohn-Kanade dataset is a collection of image sequences with emotional content, which is available for research purposes.
2. It contains image sequences recorded at a resolution of 640×480 (sometimes 640×490) pixels with a temporal resolution of 33 frames per second.
3. Every sequence is played by an amateur actor recorded from a frontal view. The sequences always start with a neutral facial expression and end with the full emotion.
Figure: Example expressions: (a) happiness, (b) surprise, (c) disgust, (d) sadness.
Feature Extraction
Figure: Calculation of GMM supervectors, performed for each feature type. Initial step: an orientation-histogram or optical-flow feature extraction algorithm is applied to the training videos, and a GMM universal background model (UBM) is trained with the EM algorithm. MAP adaptation: for each input video, the same features are extracted, the UBM is MAP-adapted to them, and the adapted means are stacked into the supervector μ = [μ1, …, μM]^T, which is passed to SMO for the Tri-Class SVM.
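The MAP-adaptation step in the figure can be sketched as standard mean-only relevance MAP adaptation; the relevance-factor value and the use of scikit-learn's GaussianMixture as the UBM are assumptions for illustration, not details from the slides:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one video's frame features and stack them
    into a supervector (mean-only relevance MAP; relevance is an assumed
    typical value)."""
    gamma = ubm.predict_proba(frames)            # responsibilities, shape (T, M)
    n = gamma.sum(axis=0)                        # soft frame counts per component
    ex = gamma.T @ frames / np.maximum(n[:, None], 1e-10)  # per-component data means
    alpha = (n / (n + relevance))[:, None]       # data-dependent mixing weight
    mu = alpha * ex + (1.0 - alpha) * ubm.means_ # adapted means
    return mu.ravel()                            # supervector [mu_1, ..., mu_M]
```

Components that see many frames (large soft count n) move toward the video's own statistics, while rarely used components stay close to the UBM, which keeps supervectors of different videos comparable.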
Methodology
1. 5 times of 8-fold cross-validation.
2. Each test set has 44 videos (13, 11, 10 and 10 per class, respectively), while each training set consists of 314 videos.
3. 10% of the training examples of each class are used in L (9, 8, 7 and 7, respectively), while the remaining are in U.
4. Three feature vectors (views) for Co-Training: the orientation histogram from the mouth region (V1) and the optical-flow features extracted from the full facial region (V2) and from the mouth region (V3).
5. The supervectors are normalized to have zero mean and unit variance, in order to avoid problems with outliers.
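Step 5 can be sketched as a per-dimension z-score over the supervectors; estimating the statistics on the training set only is an assumption, since the slides do not specify where they are computed:

```python
import numpy as np

def zscore_fit_apply(train_sv, test_sv):
    """Normalize supervectors to zero mean and unit variance per dimension,
    using statistics estimated from the training set only (assumed)."""
    mu = train_sv.mean(axis=0)
    sd = train_sv.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant dimensions
    return (train_sv - mu) / sd, (test_sv - mu) / sd
```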
Figure: Test accuracy (%) of SVM(V1), SVM(V2), SVM(V3) and the ensemble mvEns on each binary task ωk-v-ωh (ω1-v-ω2, ω1-v-ω3, ω1-v-ω4, ω2-v-ω3, ω2-v-ω4, ω3-v-ω4), comparing three settings: 100% labeled data, 20% labeled data only, and 20% labeled data with Co-Training.
Conclusion
- There is an improvement from using unlabeled data when training one-against-one ensembles. Thus a learning framework is introduced that integrates multi-view Co-Training into the one-against-one output-space decomposition process, where Tri-Class SVMs are used as binary classifiers.
- The experiments have shown that Co-Training improves the facial expression recognition system using unlabeled videos, where the visual recognizers are initially trained with a small quantity of labeled videos.
- A probabilistic interpretation of Tri-Class SVM outputs is introduced to measure confidence.
- Since Tri-Class SVMs are retrained several times during the Co-Training iterations in order to benefit from the newly labeled videos, a modified version of the SMO algorithm is introduced for fast learning of Tri-Class SVMs, because it is computationally expensive to use traditional quadratic programming algorithms to solve the Tri-Class SVM optimization problem.
- The GMM supervector approach was applied to extract features from image sequences, which are then used as input for the Tri-Class SVMs. The GMM supervector approach provides a flexible processing scheme for the classification of any type of sequential data.
Thanks for your attention
Questions?