2009, Vol.14 No.5, 393-398
Article ID 1007-1202(2009)05-0393-06
DOI 10.1007/s11859-009-0505-1
Personalized Emotion Space for Video Affective Content Representation
SUN Kai, YU Junqing†, HUANG Yue, HU Xiaoqiang, LIU Qing
College of Computer Science and Technology, Huazhong
University of Science and Technology, Wuhan 430074, Hubei,
China
Abstract: A personalized emotion space is proposed to bridge the "affective gap" in video affective content understanding. In order to unify the discrete and dimensional emotion models, the fuzzy C-means (FCM) clustering algorithm is adopted to divide the emotion space, and a Gaussian mixture model (GMM) is used to determine the membership functions of the typical affective subspaces. At every step of modeling the space, the inputs rely completely on the affective experiences recorded by the audiences. The advantages of the improved V-A (Valence-Arousal) emotion model are its personalization, its ability to define typical affective state areas in the V-A emotion space, and the convenience of explicitly expressing the intensity of each affective state. The experimental results validate the model and show that it can be used as a personalized emotion space for video affective content representation.
Key words: video affective computing; personalized emotion space; video affective content representation; fuzzy C-means clustering (FCM); Gaussian mixture model (GMM)
CLC number: TP 391.4
Received date: 2008-12-07
Foundation item: Supported by the National Natural Science Foundation of
China (60703049); the “Chenguang” Foundation for Young Scientists
(200850731353) and the National Post-doctoral Foundation of China
(20060400847)
Biography: SUN Kai (1977-), male, Ph. D. candidate, research direction:
video affective computing, content based video retrieval. E-mail: [email protected]
† To whom correspondence should be addressed. E-mail: [email protected]
0 Introduction
With the proliferation of digital audio-visual data, the challenge of extracting meaningful content from such data sets has led to research and development in the area of content-based video retrieval (CBVR). Video affective computing is one of the latest research areas in CBVR; it utilizes both affective computing [1] and CBVR theories to understand video affective content [2,3]. Affective content is an important natural component for humans when classifying and retrieving information. Recognizing the video affective content and using it to automatically label significant affective features potentially allows a new modality for users to interact with video content.
To understand video affective content automatically, the primary task is to transform the abstract concepts of emotion into a form that can be handled easily by a computer. Furthermore, the emotional experiences inspired by video content vary from individual to individual, so how to model a personalized space to represent video affective content is one of the biggest challenges. There are many psychological emotion models, and they can be categorized into two classes: ⓐ discrete emotion states [4,5] and ⓑ dimensional continuous emotion spaces [6]. The 2D valence-arousal emotion model [7,8] (V-A model) is a well-known dimensional continuous emotion space that is more precise and general than the discrete emotion states, since it allows a smooth passage from one state to another within an infinite set of values. However, actual affective recognizers usually
use a discrete set of typical emotion states to depict affective experiences. It is therefore desirable to define some typical emotion state areas in the 2D plane of the V-A model, unifying the two main classes of emotion models (discrete and dimensional). Another issue of the V-A model is that it does not allow the intensity of an emotion state to be expressed explicitly. Moreover, it does not cover the personalization issue.
To address the problems mentioned above, a personalized emotion space is presented. The basic idea is to define a set of typical fuzzy emotion subspaces in the V-A emotion space. Each affective state is a point in the V-A plane and is characterized by a fuzzy emotion subspace. The intensity of each state is expressed by the membership function of the fuzzy subspace. By introducing the typical fuzzy emotion subspaces, the proposed emotion space can represent discrete emotion states in the continuous V-A plane. The fuzzy emotion subspaces and their membership functions can be modeled from the personalized emotion coordinates annotated by the audiences, which allows the personalization requirement to be met. The centers, borders, shapes and densities of these subspaces can truthfully reflect the emotional tendencies of the audiences.
1 The Establishment of the Personalized Emotion Space
For the convenience of discussion, the formal descriptions for modeling the emotion space are given as follows.

Let $S = \{e(v,a) \mid (v,a) \in \mathbb{R} \times \mathbb{R},\ -1 \le v \le 1,\ -1 \le a \le 1\}$ be the V-A plane (Fig. 1), where $e(v,a)$ is one of the affective states, and $v$ and $a$ denote the intensities of valence and arousal. $E_i \subseteq S\ (i = 1, \ldots, k)$ are the typical fuzzy emotion subspaces, which can be expressed in fuzzy-set notation as $E_i = \int_{e \in S} E_i(e)/e\ (i = 1, \ldots, k)$, where $E_i(e)$ is the membership function of $E_i$. $T = \{x_1, x_2, \ldots, x_n\}\ (n \in \mathbb{N})$ is the training set of video clips, and the corresponding emotion coordinates annotated by the audiences (i.e., points in the V-A plane) are $S_T = \{e_1, e_2, \ldots, e_n\}$. The coordinate value $(v_i, a_i)$ of $e_i \in S_T$ can be collected through our software tool developed according to the theory of emotional psychology (Fig. 2). The modeling objectives are to define the typical fuzzy emotion subspaces $E_i$ in the V-A plane $S$ and to determine their affective membership functions $E_i(e)$, where $i = 1, \ldots, k$.
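As a concrete illustration of these definitions, the annotated coordinate set $S_T$ can be held as an $n \times 2$ array; a minimal Python/NumPy sketch with hypothetical coordinate values:

```python
import numpy as np

# Hypothetical annotations: each row is one point e_i = (v_i, a_i) of S_T,
# with valence and arousal both constrained to [-1, 1] as in the V-A plane S.
S_T = np.array([
    [ 0.72,  0.55],   # e.g., a clip experienced as joyful
    [-0.60,  0.70],   # e.g., a clip experienced as frightening
    [ 0.35, -0.40],   # e.g., a clip experienced as relaxing
])
assert np.all(np.abs(S_T) <= 1.0)   # every annotated point must lie inside S
```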
Fig. 1 V-A emotion space and the typical emotion subspaces
1.1 FCM-Based Division of the V-A Emotion Space
We use fuzzy C-means (FCM) clustering to define the typical fuzzy emotion subspaces $E_i$ in the V-A plane $S$. FCM, also known as fuzzy ISODATA, is a data clustering algorithm in which each data point belongs to a cluster to a degree specified by a membership grade. Cannon et al proposed this algorithm [9] as an improvement over K-means clustering.
Our task is to use FCM to define the $k$ typical emotion subspaces $E_i\ (i = 1, \ldots, k)$ based on $S_T = \{e_1, e_2, \ldots, e_n\}$. Suppose that the cluster centers of the $k$ subspaces $E_i$ are $c_1, c_2, \ldots, c_k$. We can use a $k \times n$ membership matrix $U$ to express how well every emotion state belongs to the typical subspaces. To accommodate the fuzzy partitioning, $U$ is allowed to have elements with values between 0 and 1; however, normalization stipulates that the summation of the degrees of belongingness for an affective state $e_j \in S_T$ always equals unity:
$$\sum_{i=1}^{k} u_{ij} = 1, \quad \forall j = 1, 2, \ldots, n \tag{1}$$
We can use $U$ to define the cost function (or objective function) $J$ for FCM:
$$J(U, c_1, \ldots, c_k) = \sum_{i=1}^{k} J_i = \sum_{i=1}^{k}\sum_{j=1}^{n} u_{ij}^{m}\, d_{ij}^{2} \tag{2}$$
where $u_{ij}$ is between 0 and 1; $c_i$ is the cluster center of the fuzzy typical affective subspace $E_i$; $d_{ij} = \lVert c_i - e_j \rVert$ is the Euclidean distance between the $i$th cluster center and the $j$th affective state; and $m \in (1, \infty)$ is a weighting exponent.
tion new J as follows:new 1 1 2( , , , , , , , )k n J c c λ λ λ U
7/23/2019 Personalized Emotion Space for Video Affective Content
http://slidepdf.com/reader/full/personalized-emotion-space-for-video-affective-content 3/6
SUN Kai et al : Personalized Emotion Space for Video Affective … 395
2
1 1 1 1
1k n n k
m
ij i j j ij
i j j i
u d uλ = = = =
⎛ ⎞= + −⎜ ⎟
⎝ ⎠∑∑ ∑ ∑ (3)
where jλ , 1, , , j n= are the Lagrange multipliers for the
n constraints in equation (1). By differentiating new ( , J U
1 1 2, , , , , , )
k nc c λ λ λ with respect to all its input argu-
ments, the necessary conditions for equation (2) to reach
its minimum are
$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, e_j}{\sum_{j=1}^{n} u_{ij}^{m}} \tag{4}$$
and
$$u_{ij} = \frac{1}{\sum_{p=1}^{k} \left( d_{ij} / d_{pj} \right)^{2/(m-1)}} \tag{5}$$
The FCM algorithm is simply an iterated procedure through the preceding two necessary conditions. Based on the above discussion, the steps for using FCM to define the typical emotion subspaces $E_i$ and the membership matrix $U$ can be summarized as follows:
Step 1 Initialize the membership matrix $U$ with random values between 0 and 1 such that the constraints in equation (1) are satisfied.
Step 2 Calculate the $k$ centers of the typical emotion subspaces $c_i,\ i = 1, \ldots, k$, using equation (4).
Step 3 Compute the cost function according to equation (2). Stop if it is either below a certain tolerance value or its improvement over the previous iteration is below a certain threshold.
Step 4 Compute a new $U$ using equation (5). Go to Step 2.
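A minimal sketch of this procedure in Python with NumPy follows. The function name `fcm`, the random initialization, and the default tolerance are our own illustrative choices; the update rules are equations (4) and (5):

```python
import numpy as np

def fcm(points, k=6, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy C-means over (n, 2) V-A coordinates; returns centers and U."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Step 1: random k x n membership matrix, columns summing to 1 (eq. (1)).
    U = rng.random((k, n))
    U /= U.sum(axis=0, keepdims=True)
    prev_cost = np.inf
    for _ in range(max_iter):
        Um = U ** m
        # Step 2: cluster centers (eq. (4)).
        centers = (Um @ points) / Um.sum(axis=1, keepdims=True)
        # Euclidean distances d_ij between centers and affective states.
        d = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                     # guard against division by zero
        # Step 3: cost function J (eq. (2)); stop on small value or improvement.
        cost = float((Um * d ** 2).sum())
        if cost < tol or abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
        # Step 4: membership update (eq. (5)), then back to Step 2.
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return centers, U
```

With the settings reported in Section 2.2, a hypothetical call would be `centers, U = fcm(S_T, k=6, m=2.0)`.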
1.2 GMM-Based Membership Functions of Fuzzy Affective Subspaces
Although the typical emotion subspaces $E_i$ can be fuzzily divided in the V-A plane based on FCM, their continuous membership functions $E_i(e)$ still cannot be determined. A fuzzy set is specified by its membership function, so it is very important to determine the membership functions of the subspaces rationally.
The Gaussian mixture model (GMM) is an effective tool for data modeling and pattern classification [10]. GMM assumes that the data under modeling are generated by a probability density distribution that is the weighted sum of a set of Gaussian probability density functions (PDFs). Our study shows that the distribution of the elements' membership degrees in every subspace meets the assumptions of GMM, i.e., GMM can be used to formulate these membership functions.
Suppose the typical emotion subspace is $E_i = \{e_{ij}\}\ (i = 1, \ldots, k;\ j = 1, \ldots, s_i)$, where $\sum_{i=1}^{k} s_i = n$. If the distribution of the elements in $E_i$ is similar to an ellipsoid, we can use a single multivariate Gaussian PDF $g(e; \mu, \Sigma)$ to express the PDF of $E_i$:
$$g(e; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^{d} \lvert \Sigma \rvert}} \exp\left[ -\frac{1}{2} (e - \mu)^{\mathrm{T}} \Sigma^{-1} (e - \mu) \right] \tag{6}$$
where $\mu$ is the center of the PDF and $\Sigma$ is its covariance matrix. These parameters determine the characteristics of the PDF, such as the center, width and direction of the function. However, the distribution of $E_i$ is not a rigorous single multivariate Gaussian distribution. A flexible solution is the weighted sum of a set of Gaussian PDFs, which can be denoted as:
$$p(e) = \sum_{i=1}^{r} \alpha_i\, g(e; \mu_i, \Sigma_i) \tag{7}$$
where $r$ is the number of Gaussian PDFs. The parameters in (7) are $(\alpha_1, \ldots, \alpha_r;\ \mu_1, \ldots, \mu_r;\ \Sigma_1, \ldots, \Sigma_r)$, and $\alpha_1, \alpha_2, \ldots, \alpha_r$ should satisfy the constraint condition $\sum_{i=1}^{r} \alpha_i = 1$. We call $p(e)$ the Gaussian mixture model.
To simplify the discussion, we restrict the covariance matrix of each Gaussian PDF to the spherical form:
$$\Sigma_i = \sigma_i^{2} I = \sigma_i^{2} \begin{pmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{pmatrix}, \quad i = 1, \ldots, r \tag{8}$$
where $I$ is the identity matrix.
In this case, the single Gaussian PDF can be expressed as:
$$g(e; \mu, \sigma^{2}) = (2\pi\sigma^{2})^{-d/2} \exp\left[ -\frac{(e - \mu)^{\mathrm{T}} (e - \mu)}{2\sigma^{2}} \right] \tag{9}$$
Equation (7) can then be rewritten as:
$$p(e) = \sum_{i=1}^{r} \alpha_i\, g(e; \mu_i, \sigma_i^{2}) \tag{10}$$
The parameters in equation (10) are:
$$\theta = \{\alpha_1, \ldots, \alpha_r;\ \mu_1, \ldots, \mu_r;\ \sigma_1^{2}, \ldots, \sigma_r^{2}\}$$
To compute the optimum estimate of $\theta$, we can use maximum likelihood estimation (MLE) to find the maximum of equation (11):
$$J(\theta) = \ln \left[ \prod_{j=1}^{s_i} p(e_j) \right] = \sum_{j=1}^{s_i} \ln p(e_j) = \sum_{j=1}^{s_i} \ln \left[ \sum_{k=1}^{r} \alpha_k\, g(e_j; \mu_k, \sigma_k^{2}) \right] \tag{11}$$
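For reference, the objective of equation (11) can be evaluated directly; a minimal sketch in Python with NumPy, assuming the spherical-covariance parameterization of equations (8)-(10):

```python
import numpy as np

def log_likelihood(points, alpha, mu, var):
    """J(theta) of eq. (11): the summed log mixture density over the samples."""
    d = points.shape[1]
    # Squared distances from every sample to every component center: (s_i, r).
    sq = ((points[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    dens = (2 * np.pi * var) ** (-d / 2) * np.exp(-sq / (2 * var))  # eq. (9)
    return float(np.log((alpha * dens).sum(axis=1)).sum())
```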
Based on the above discussion, the steps for using GMM to determine the membership functions $E_i(e)$ of the fuzzy typical emotion subspaces $E_i$ can be summarized as follows. This is the standard EM iteration for a spherical GMM, where $\beta_j(e)$ denotes the posterior probability that the sample $e$ was generated by the $j$th mixture component.
Step 1 Initialize the parameter vector:
$$\theta = \{\alpha_1, \ldots, \alpha_r;\ \mu_1, \ldots, \mu_r;\ \sigma_1^{2}, \ldots, \sigma_r^{2}\} \tag{12}$$
Step 2 Calculate $\beta_j(e),\ j = 1, 2, \ldots, r$, using $\theta$.
Step 3 Compute the new $\mu_j$ according to
$$\mu_j = \frac{\sum_{k=1}^{s_i} \beta_j(e_k)\, e_k}{\sum_{k=1}^{s_i} \beta_j(e_k)} \tag{13}$$
Step 4 Compute the new $\sigma_j^{2}$ according to
$$\sigma_j^{2} = \frac{1}{d} \cdot \frac{\sum_{k=1}^{s_i} \beta_j(e_k)\, (e_k - \mu_j)^{\mathrm{T}} (e_k - \mu_j)}{\sum_{k=1}^{s_i} \beta_j(e_k)} \tag{14}$$
Step 5 Compute the new $\alpha_j$ according to
$$\alpha_j = \frac{1}{s_i} \sum_{k=1}^{s_i} \beta_j(e_k)$$
Step 6 Let $\theta' = \{\alpha_1, \ldots, \alpha_r;\ \mu_1, \ldots, \mu_r;\ \sigma_1^{2}, \ldots, \sigma_r^{2}\}$ be the updated parameter vector. Stop if $\lVert \theta' - \theta \rVert$ is below a certain tolerance value; else let $\theta = \theta'$ and go to Step 2.
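The six steps can be condensed into the following sketch, again in Python with NumPy. The function name `gmm_em`, the initialization, and the convergence test on the centers are our own illustrative choices; the updates follow equations (13), (14) and the Step 5 formula:

```python
import numpy as np

def gmm_em(points, r=2, max_iter=200, tol=1e-6, seed=0):
    """EM for a spherical GMM per equations (10)-(14); d = 2 in the V-A plane."""
    rng = np.random.default_rng(seed)
    s, d = points.shape
    # Step 1: initialize theta = {alpha, mu, sigma^2} (eq. (12)).
    alpha = np.full(r, 1.0 / r)
    mu = points[rng.choice(s, size=r, replace=False)]
    var = np.full(r, points.var())
    for _ in range(max_iter):
        # Step 2: responsibilities beta_j(e_k), the posterior component probabilities.
        sq = ((points[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (s, r)
        dens = (2 * np.pi * var) ** (-d / 2) * np.exp(-sq / (2 * var))  # eq. (9)
        beta = alpha * dens
        beta /= beta.sum(axis=1, keepdims=True)
        w = beta.sum(axis=0)                      # sum over k of beta_j(e_k)
        # Step 3: new centers (eq. (13)).
        mu_new = (beta.T @ points) / w[:, None]
        # Step 4: new variances (eq. (14)), using the updated centers.
        sq_new = ((points[:, None, :] - mu_new[None, :, :]) ** 2).sum(axis=2)
        var = (beta * sq_new).sum(axis=0) / (d * w)
        # Step 5: new mixture weights.
        alpha = w / s
        # Step 6: stop when the parameters barely move; else iterate again.
        converged = np.abs(mu_new - mu).max() < tol
        mu = mu_new
        if converged:
            break
    return alpha, mu, var
```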
2 Experimental Results and Discussion
2.1 Video Affective Content Database
Video affective computing is a hot but fairly new research topic in CBVR, and it still lacks a standard video affective content database (VACDB) with which to validate our proposed video affective semantic space. We choose movies to create our VACDB because they have rich affective contents. Based on the statistical figures of the IMDB [11], we select 46 typical movies as the source of affective video clips in the VACDB. The total length of these movies is 84 hours 43 minutes 4 seconds. These movies can be classified into 6 genres: 9 animations, 10 actions, 11 dramas, 7 science fictions, 3 horrors and 6 comedies.
The ground truth for the 6 typical affective contents, i.e., joy, tension, fear, relaxation, sadness and neutral, is manually determined within the 46 movies. If a video clip is labeled with the same affective content by at least 6 of 9 researchers, we assign this clip to that affective content. Finally, we select a total of 1 037 video clips from the 46 movies to create the VACDB. The total length of the 1 037 clips is 10 hours 42 minutes 41 seconds. We choose 4 audiences (A1, A2, A3 and A4) of different fields, genders, ages and backgrounds to record the coordinate values of these 1 037 video clips. The coordinate values are recorded with the affective content annotation tool (Fig. 2), which is designed according to the theory of emotional psychology. We do not tell the audiences the emotion labels of these movie clips. The audiences watch every movie clip and record its emotion coordinate based on their affective experiences, i.e., the intensities of their valence and arousal (V-A). After the steps mentioned above, we obtain 4 sets of the 1 037 emotional coordinates of these movie clips, which are used to validate our proposed emotion space.
Fig. 2 Video affective content annotation tool
2.2 Experimental Results
Modeling our emotion space has two steps: ① defining the fuzzy typical affective subspaces in the V-A plane based on FCM and ② determining the affective membership functions of these subspaces based on GMM.
At the first step of our experiment, the inputs of the FCM are the 1 037 emotional coordinates labeled by the audiences, i.e., $S_T = \{e_i\}$, $e_i(v_i, a_i) \in \mathbb{R} \times \mathbb{R}$, $v_i \in [-1, 1]$ and $a_i \in [-1, 1]$, $i = 1, 2, \ldots, 1\,037$. The number of typical affective subspaces (i.e., clusters) is 6. The weighting exponent $m$ of the cost function $J$ is 2. The iteration threshold and the tolerance value in FCM are 100 and $1 \times 10^{-5}$, respectively. Fig. 3 (a), (b), (c) and (d) demonstrate the division results of the V-A plane. The centers, borders, shapes and densities of the 6 typical affective subspaces in Fig. 3 (a), (b), (c) and (d) are different, which shows that our modeling method can characterize the personalized emotion experiences of the audiences.
Fig. 3 The 4 partitioning results of the V-A plane based on FCM. (a), (b), (c) and (d) are plotted according to the coordinate values recorded by A1, A2, A3 and A4, respectively
The second step is to use GMM to formulate the membership functions of the subspaces. Because the performance of GMM is highly related to its number of mixtures, we designed another experiment to find the best number of mixtures in GMM. We partitioned $S_T$ into a design set (DS, 519 samples) and a test set (TS, 518 samples). DS was used for training and TS was used for testing. We found that the training and test recognition rates both reached their highest points when the number of mixtures in GMM was 2. Therefore, we chose 2 mixtures to model the GMM. Fig. 4(a) demonstrates the 6 three-dimensional membership functions determined by GMM in our experiments, and Fig. 4(b) demonstrates the labeled 2D membership functions (the coordinate values for modeling are recorded by A1; the membership functions based on the coordinate values recorded by A2, A3 and A4 are similar). Bringing the 1 037 coordinate values into these 6 membership functions, the average recognition rate reaches 97.8%. The experimental results show that our proposed emotion space can represent video affective content very well.
Fig. 4 GMM-based membership functions of typical
emotion subspaces
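As a sketch of how the recognition rate above could be reproduced (under our assumptions, not necessarily the authors' exact pipeline): fit one 2-mixture spherical GMM per subspace with a routine such as `gmm_em` from Section 1.2, then assign each annotated point to the subspace with the largest membership value. The label list and the commented usage are hypothetical:

```python
import numpy as np

def gmm_density(e, alpha, mu, var):
    """Evaluate p(e) for one subspace's spherical GMM (eq. (10))."""
    d = mu.shape[1]
    sq = ((e[None, :] - mu) ** 2).sum(axis=1)
    return float((alpha * (2 * np.pi * var) ** (-d / 2)
                  * np.exp(-sq / (2 * var))).sum())

def classify(e, models, labels):
    """Assign a V-A point to the affective subspace with the highest membership."""
    scores = [gmm_density(e, *m) for m in models]
    return labels[int(np.argmax(scores))]

labels = ["joy", "tension", "fear", "relaxation", "sadness", "neutral"]
# models = [gmm_em(points_of_subspace_i, r=2) for each subspace i]  # fitted per audience
# classify(np.array([0.7, 0.5]), models, labels)  # hypothetical query point
```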
2.3 Discussion
The proposed personalized emotion space has two prominent characteristics. On the one hand, the emotion space originates from the well-known 2D V-A emotion model, which is a continuous model and can express infinite affective states in the V-A plane. Meanwhile, we define $k$ typical emotion subspaces ($k = 6$ in the experiments) based on fuzzy set theory. By introducing the typical fuzzy emotion subspaces, the proposed emotion space can also represent discrete affective states in the continuous V-A plane. Therefore, our emotion space successfully unifies the discrete and dimensional emotion models of theoretical psychology. On the other hand, at every step of modeling the emotion space, the inputs rely completely on the affective experiences recorded by the audiences. The centers, borders, shapes and densities of these subspaces can truthfully reflect the emotional tendencies of the audiences, which means that our proposed space covers the personalization issues. As more video clips are added, the emotion space will become more and more audience-oriented.
3 Conclusion
Modeling a personalized emotion space and utilizing it to represent and recognize video affective content is one of the most important problems in video affective computing. Filling this theoretical gap is the main objective of this paper. The experimental results demonstrate that the proposed emotion space can be used as an overall solution to this problem.
All of this is the foundation for our future work. To understand video affective content automatically, it is desirable to design a set of video affective features that relate video clips to the emotion coordinate values based on our emotion space.
References
[1] Picard R. Affective Computing [M]. Cambridge: MIT Press, 1997.
[2] Hanjalic A, Xu L Q. Affective Video Content Representation and Modeling [J]. IEEE Transactions on Multimedia, 2005, 7(1): 143-154.
[3] Hanjalic A. Extracting Moods from Pictures and Sounds: Towards Truly Personalized TV [J]. IEEE Signal Processing Magazine, 2006, 23(2): 90-100.
[4] Ekman P. Are There Basic Emotions? [J]. Psychological Review, 1992, 99(3): 550-553.
[5] Ortony A, Clore G L, Collins A. The Cognitive Structure of Emotions [M]. Cambridge: Cambridge University Press, 1988.
[6] Russell J A. The Circumplex Model of Affect [J]. Journal of
Personality and Social Psychology, 1980, 39(6): 1161-1178.
[7] Lang P J, Bradley M M, Cuthbert B N. International Affective Picture System (IAPS): Instruction Manual and Affective Ratings [EB/OL]. [2008-04-15]. http://www.unifesp.br/dpsicobio/adap/instructions.pdf.
[8] Wang L H, Cheong L F. Affective Understanding in Film [J].
IEEE Transactions on Circuits and Systems for Video Tech-
nology, 2006, 16(6): 689-704.
[9] Cannon R L, Dave J V, Bezdek J C. Efficient Implementation of the Fuzzy C-means Clustering Algorithms [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, 8(2): 248-255.
[10] Zhang Z X. Data Clustering and Pattern Recognition [EB/OL]. [2008-04-15]. http://neural.cs.nthu.edu.tw/jang/books/dcpr/doc/08gmm.pdf.
[11] The Internet Movie Database (IMDB) [EB/OL]. [2008-04-15]. http://www.imdb.com/chart/top.