Outline: Semi-Supervised Learning (SSL), Co-Training with Tri-Class SVMs, Experimental Results, Conclusion
Semi-Supervised Facial Expressions Annotation Using Co-Training with Fast Probabilistic Tri-Class SVMs
Mohamed Farouk Abdel Hady, Martin Schels, Friedhelm Schwenker, Günther Palm
Institute of Neural Information Processing, University of Ulm, Germany
{mohamed.abdel-hady|friedhelm.schwenker|guenther.palm}@uni-ulm.de
September 12, 2010
Semi-Supervised Learning
In many domains, a large amount of training examples is available but unlabeled. The data labeling process is often tedious, expensive and time-consuming because it requires the effort of human experts such as physicians, radiologists, or chemists.

Research directions of SSL:
- Semi-Supervised Clustering
- Semi-Supervised Classification
- Semi-Supervised Regression
- Semi-Supervised Dimensionality Reduction
How can unlabeled data be helpful?
Figure: The unlabeled examples help to place the decision boundary in low-density regions. Using labeled data only, the maximum-margin separating hyperplane is the vertical dashed line. Using both labeled and unlabeled data (dots), the maximum-margin separating hyperplane is the oblique solid line.
Co-Training with Tri-Class SVMs
Figure: Tri-Class Co-Training. For each pair ωk-v-ωh, three classifiers h1, h2, h3 (one per view) are trained on the labeled pool Lkh and applied to the unlabeled pool U'. The confidence of the combined prediction Hkh(xu) over the three views (xu(1), xu(2), xu(3)) is measured; the most confident examples are added, with their predicted labels, to Lkh, and U' is refilled from U.
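The loop in the figure can be sketched as a minimal runnable stand-in. The nearest-centroid per-view classifier and the averaged-posterior confidence measure are illustrative assumptions replacing the actual Tri-Class SVMs, not the paper's method:

```python
import numpy as np

class CentroidView:
    """Toy per-view classifier standing in for one Tri-Class SVM."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        # soft posteriors from distances to the class centroids
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        p = np.exp(-d)
        return p / p.sum(axis=1, keepdims=True)

def co_train(views_L, y_L, views_U, n_iter=5, n_select=2):
    """Grow the labeled pool by moving the most confident unlabeled examples."""
    views_L = [v.copy() for v in views_L]
    views_U = [v.copy() for v in views_U]
    y_L = y_L.copy()
    models = [CentroidView().fit(X, y_L) for X in views_L]
    for _ in range(n_iter):
        if len(views_U[0]) == 0:
            break
        # train one classifier per view on the current labeled pool
        models = [CentroidView().fit(X, y_L) for X in views_L]
        # measure confidence by averaging the per-view posteriors
        P = np.mean([m.predict_proba(X) for m, X in zip(models, views_U)], axis=0)
        conf, y_hat = P.max(axis=1), models[0].classes_[P.argmax(axis=1)]
        # select the most confident examples and add them, with their
        # predicted labels, to the labeled pool
        pick = np.argsort(conf)[::-1][:n_select]
        y_L = np.concatenate([y_L, y_hat[pick]])
        for v in range(len(views_L)):
            views_L[v] = np.vstack([views_L[v], views_U[v][pick]])
            views_U[v] = np.delete(views_U[v], pick, axis=0)
    return models, y_L
```

The key property of co-training is visible in the loop: each view's classifier benefits from pseudo-labels whose confidence was judged jointly across all views.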
Bi-Class SVMs
Figure: A bi-class SVM with decision function f_kh(x) = ⟨w, φ(x)⟩, a single bias b, and margins at b ± 1, separating ω_h (y = 1) from ω_k (y = 3).

\[
\min_{w,b,\varepsilon}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n_k+n_h} \varepsilon_i \tag{1}
\]

subject to the constraints

\[
y_i(\langle w, \phi(x_i)\rangle - b) \ge 1 - \varepsilon_i,\quad \varepsilon_i \ge 0,\quad \text{for } i = 1,\dots,n_k+n_h \tag{2}
\]
Tri-Class SVMs
Figure: A Tri-Class SVM with decision function f_kh(x) = ⟨w, φ(x)⟩ and two parallel hyperplanes with biases b1 ≤ b2 and margins at b1 ± 1 and b2 ± 1, separating ω_h (y = 1), the intermediate region (y = 2), and ω_k (y = 3).

\[
\min_{w,b_1,b_2,\varepsilon,\varepsilon^*} \Psi_P = \frac{1}{2}\|w\|^2 + C\Big(\sum_{i=1}^{n_1}\varepsilon_i^1 + \sum_{i=1}^{n_2}\varepsilon_i^{*2} + \sum_{i=1}^{n_2}\varepsilon_i^2 + \sum_{i=1}^{n_3}\varepsilon_i^{*3}\Big) \tag{3}
\]

subject to

\[
\begin{aligned}
&\langle w, \phi(x_i^1)\rangle - b_1 \le -1 + \varepsilon_i^1, &\varepsilon_i^1 \ge 0 \ \text{for } i = 1,\dots,n_1;\\
&\langle w, \phi(x_i^2)\rangle - b_1 \ge 1 - \varepsilon_i^{*2}, &\varepsilon_i^{*2} \ge 0 \ \text{for } i = 1,\dots,n_2;\\
&\langle w, \phi(x_i^2)\rangle - b_2 \le -1 + \varepsilon_i^2, &\varepsilon_i^2 \ge 0 \ \text{for } i = 1,\dots,n_2;\\
&\langle w, \phi(x_i^3)\rangle - b_2 \ge 1 - \varepsilon_i^{*3}, &\varepsilon_i^{*3} \ge 0 \ \text{for } i = 1,\dots,n_3;\\
&b_1 \le b_2
\end{aligned} \tag{4}
\]
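The constraints in Eq. (4) carve the score axis into three regions around the two biases; a minimal sketch of the implied decision rule, using only the region labels y = 1, 2, 3 and assuming b1 ≤ b2:

```python
def tri_class_decision(f, b1, b2):
    """Map a Tri-Class SVM score f = <w, phi(x)> to one of the three regions
    implied by the constraints in Eq. (4), given the biases b1 <= b2."""
    if f < b1:
        return 1   # left of the first hyperplane (class-1 examples: f <= b1 - 1)
    if f > b2:
        return 3   # right of the second hyperplane (class-3 examples: f >= b2 + 1)
    return 2       # intermediate region between the hyperplanes
```

The intermediate region y = 2 is what distinguishes the Tri-Class SVM from a bi-class SVM: examples falling there are treated as belonging to neither of the two classes of the pair.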
Illustrative example for one-v-one Tri-Class SVMs
Figure: A linearly separable dataset with 45 examples from three classes ω1, ω2, ω3: (a) input space; (b) class ω1 against ω2; (c) class ω1 against ω3; (d) class ω2 against ω3.
Probabilistic interpretation for the Tri-Class SVM output
We fit sigmoid functions to the SVM output, where Eq. (6) represents the doubt about whether the input example x belongs to ωk or ωh.
\[
P_{kh}(y=1\mid x) = 1 - \frac{1}{1 + \exp(-(f_{kh}(x) - b_1))} \tag{5}
\]
\[
P_{kh}(y=2\mid x) = \frac{1}{1 + \exp(-(f_{kh}(x) - b_1))}\left(1 - \frac{1}{1 + \exp(-(f_{kh}(x) - b_2))}\right) \tag{6}
\]
\[
P_{kh}(y=3\mid x) = \frac{1}{1 + \exp(-(f_{kh}(x) - b_1))}\cdot\frac{1}{1 + \exp(-(f_{kh}(x) - b_2))} \tag{7}
\]
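Because Eqs. (5)-(7) combine the two sigmoids by complement and product, the three probabilities sum to one by construction; a small sketch:

```python
import math

def tri_class_probs(f, b1, b2):
    """Sigmoid-based probabilities of Eqs. (5)-(7) for Tri-Class SVM score f."""
    s1 = 1.0 / (1.0 + math.exp(-(f - b1)))  # sigmoid at the first hyperplane
    s2 = 1.0 / (1.0 + math.exp(-(f - b2)))  # sigmoid at the second hyperplane
    p1 = 1.0 - s1           # Eq. (5): region y = 1
    p2 = s1 * (1.0 - s2)    # Eq. (6): doubt region y = 2
    p3 = s1 * s2            # Eq. (7): region y = 3
    return p1, p2, p3
```

Checking the algebra: (1 - s1) + s1(1 - s2) + s1*s2 = 1, so no extra normalization is needed.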
Decision Fusion for Ensemble of Probabilistic Tri-Class SVMs
Table: One-against-One Decision Profile of example x
        ω1              ω2              ω3              ω4
ω1      -               P12(y=3|x)      P13(y=3|x)      P14(y=3|x)
ω2      P12(y=1|x)      -               P23(y=3|x)      P24(y=3|x)
ω3      P13(y=1|x)      P23(y=1|x)      -               P34(y=3|x)
ω4      P14(y=1|x)      P24(y=1|x)      P34(y=1|x)      -

Thus the final probabilistic output of the One-against-One ensemble of Tri-Class SVMs is defined as follows, for each k = 1, …, K:

\[
P(y = \omega_k \mid x) = \frac{\sum_{h=1}^{k-1} P_{hk}(y=1\mid x) + \sum_{h=k+1}^{K} P_{kh}(y=3\mid x)}{\sum_{k'=1}^{K}\left(\sum_{h=1}^{k'-1} P_{hk'}(y=1\mid x) + \sum_{h=k'+1}^{K} P_{k'h}(y=3\mid x)\right)} \tag{8}
\]
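Eq. (8) amounts to a vote-and-normalize step over the pairwise outputs: each pairwise classifier contributes its y = 3 probability to the first class of the pair and its y = 1 probability to the second. A sketch, where the dictionary keyed by 0-based (k, h) pairs is an illustrative representation, not from the slides:

```python
import numpy as np

def fuse_one_vs_one(P):
    """Fuse pairwise Tri-Class probabilities into class posteriors per Eq. (8).

    P[(k, h)] maps a pair k < h (0-based) to the triple
    (P_kh(y=1|x), P_kh(y=2|x), P_kh(y=3|x)).
    """
    K = 1 + max(h for _, h in P)
    votes = np.zeros(K)
    for (k, h), (p1, _, p3) in P.items():
        votes[k] += p3   # y = 3 in pair (k, h) supports omega_k
        votes[h] += p1   # y = 1 in pair (k, h) supports omega_h
    return votes / votes.sum()  # denominator of Eq. (8)
```

Note that the doubt probability P_kh(y = 2|x) contributes to neither class, so an uncertain pairwise classifier simply abstains from the vote.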
Facial Expressions Recognition
1. The Cohn-Kanade dataset is a collection of image sequences with emotional content, which is available for research purposes.
2. It contains image sequences recorded at a resolution of 640×480 (sometimes 640×490) pixels with a temporal resolution of 33 frames per second.
3. Every sequence is played by an amateur actor recorded from a frontal view. The sequences always start with a neutral facial expression and end with the full emotion.
Figure: Example expressions: (a) happiness, (b) surprise, (c) disgust, (d) sadness.
Feature Extraction
Figure: Calculation of GMM supervectors, performed for each feature type. Initial step: an orientation-histogram or optical-flow feature extraction algorithm is applied to the training videos, and a GMM universal background model (UBM) is trained with the EM algorithm. MAP adaptation: for each input video, the same features are extracted, the UBM is MAP-adapted to them, and the adapted means are stacked into the supervector μ = [μ1, …, μM]^T, which is passed to SMO for the Tri-Class SVM.
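The MAP-adaptation step in the figure can be sketched as standard mean-only relevance MAP adaptation; the relevance-factor value and the use of scikit-learn's GaussianMixture as the UBM are assumptions for illustration, not details from the slides:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_supervector(ubm, frames, relevance=16.0):
    """MAP-adapt the UBM means to one video's frame features and stack them
    into a supervector (mean-only relevance MAP; relevance is an assumed
    typical value)."""
    gamma = ubm.predict_proba(frames)            # responsibilities, shape (T, M)
    n = gamma.sum(axis=0)                        # soft frame counts per component
    ex = gamma.T @ frames / np.maximum(n[:, None], 1e-10)  # per-component data means
    alpha = (n / (n + relevance))[:, None]       # data-dependent mixing weight
    mu = alpha * ex + (1.0 - alpha) * ubm.means_ # adapted means
    return mu.ravel()                            # supervector [mu_1, ..., mu_M]
```

Components that see many frames (large soft count n) move toward the video's own statistics, while rarely used components stay close to the UBM, which keeps supervectors of different videos comparable.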
Methodology
1. 5 times of 8-fold cross-validation.
2. Each test set has 44 videos (13, 11, 10 and 10 per class, respectively), while each training set consists of 314 videos.
3. 10% of the training examples of each class are used in L (9, 8, 7 and 7, respectively), while the remaining are in U.
4. Three feature vectors (views) for Co-Training: the orientation histogram from the mouth region (V1) and the optical-flow features extracted from the full facial region (V2) and from the mouth region (V3).
5. The supervectors are normalized to have zero mean and unit variance, in order to avoid problems with outliers.
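Step 5 can be sketched as a per-dimension z-score over the supervectors; estimating the statistics on the training set only is an assumption, since the slides do not specify where they are computed:

```python
import numpy as np

def zscore_fit_apply(train_sv, test_sv):
    """Normalize supervectors to zero mean and unit variance per dimension,
    using statistics estimated from the training set only (assumed)."""
    mu = train_sv.mean(axis=0)
    sd = train_sv.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant dimensions
    return (train_sv - mu) / sd, (test_sv - mu) / sd
```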
Figure: Test accuracy (%) of SVM(V1), SVM(V2), SVM(V3) and the ensemble mvEns on each binary task ωk-v-ωh (ω1-v-ω2, ω1-v-ω3, ω1-v-ω4, ω2-v-ω3, ω2-v-ω4, ω3-v-ω4), comparing three settings: 100% labeled data, 20% labeled data only, and 20% labeled data with Co-Training.
Conclusion
- There is an improvement from using unlabeled data when training one-against-one ensembles. Thus a learning framework is introduced that integrates multi-view Co-Training into the one-against-one output-space decomposition process, where Tri-Class SVMs are used as binary classifiers.
- The experiments have shown that Co-Training improves the facial expression recognition system using unlabeled videos, where the visual recognizers are initially trained with a small quantity of labeled videos.
- A probabilistic interpretation of Tri-Class SVM outputs is introduced to measure confidence.
- Since Tri-Class SVMs are retrained several times during the Co-Training iterations in order to benefit from the newly labeled videos, a modified version of the SMO algorithm is introduced for fast learning of Tri-Class SVMs, because it is computationally expensive to use traditional quadratic programming algorithms to solve the Tri-Class SVM optimization problem.
- The GMM supervector approach was applied to extract features from image sequences, which are then used as input for the Tri-Class SVMs. The GMM supervector approach provides a flexible processing scheme for the classification of any type of sequential data.
Thanks for your attention
Questions?