Speech Emotion Recognition and Perception of...
1/27
Introduction | Basic Features | Musical Features | Simulations and Results
Speech Emotion Recognition
and Perception of Music
Mélanie Fernández Pradier
Prof. Dr.-Ing. Bin Yang
Supervisors: Prof. Dr.-Ing. Bin Yang
Dipl.-Ing. Fabian Schmieder
January 27, 2011
Mélanie Fernández Pradier Speech Emotion Recognition and Perception of Music
2/27
Motivation | Aim of the thesis
Motivation: Speech Emotion Recognition and Perception of Music
Emotion Recognition from Speech
Speech ∼ two-channel
linguistic
paralinguistic
Several Applications
support ASR
diagnoses
speech synthesis
entertainment
Music Perception
"language of emotion"
treatment of affective disorders
treatment of speech disorders
same origin of music and speech
3/27
Aim of the thesis: Apply Music Theory to Speech Emotion Recognition
Investigate Speech and Music similarities to derive universal features for Emotions
1 What is the link between music and speech?
2 How are emotions transmitted through music?
3 Can we apply musical knowledge to speech processing?
4/27
General Concepts | Description
1 Introduction
Motivation
Aim of the thesis
2 Basic Features
General Concepts
Description
3 Musical Features
Interval and Triad Features
Based on Music Emotion Recognition
Perceptual Model of Intonation
4 Simulations and Results
Pattern Recognition
5/27
Feature Generation
6/27
Basic Features Description
Local Features
ZCR
MFCC
Energy: total + bands
Pitch
Voiced-unvoiced
VAD
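$\mathrm{ZCR} = \frac{1}{2} \sum_{n=1}^{N} \left| \mathrm{sgn}(x_n) - \mathrm{sgn}(x_{n+1}) \right|$
$\mathrm{Cepstrum} = \left| \mathrm{FFT}\{ \log( |\mathrm{FFT}\{x\}|^2 ) \} \right|^2$
$\mathrm{Energy} = \sum_{n=1}^{N} x_n \cdot x_n^{*}$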
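The three local-feature formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, a naive DFT stands in for the FFT so that the block needs no external library, and the small constant `eps` is added to avoid taking the logarithm of zero.

```python
import cmath
import math


def sgn(v):
    """Sign function: -1, 0 or +1."""
    return (v > 0) - (v < 0)


def zero_crossing_rate(x):
    """ZCR = 1/2 * sum |sgn(x_n) - sgn(x_{n+1})| over all consecutive pairs."""
    return 0.5 * sum(abs(sgn(a) - sgn(b)) for a, b in zip(x, x[1:]))


def energy(x):
    """Energy = sum x_n * conj(x_n); works for real or complex samples."""
    return sum((v * complex(v).conjugate()).real for v in x)


def dft(x):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_cepstrum(x, eps=1e-12):
    """|DFT{log(|DFT{x}|^2)}|^2; eps is an assumption that avoids log(0)."""
    log_power = [math.log(abs(c) ** 2 + eps) for c in dft(x)]
    return [abs(c) ** 2 for c in dft(log_power)]


frame = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5]
print(zero_crossing_rate(frame))  # 4.0
print(energy(frame))              # 4.5
```

In practice these quantities would be computed per analysis frame, which yields one contour per local feature.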
Global Features
Global statistics: min, mean, max,
median, std, iqr...
directly, 1st or 2nd derivative
Energy and pitch plateaux
Combination with logical features
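As a rough illustration of the global features, the sketch below computes the listed statistics on a frame-wise contour and on its first and second differences. The function names, the feature-naming scheme and the use of the population standard deviation are assumptions of this example, not details given on the slide.

```python
import statistics


def derivative(contour):
    """First difference of a frame-wise contour."""
    return [b - a for a, b in zip(contour, contour[1:])]


def global_stats(series):
    """Global statistics named on the slide for one series of values."""
    q1, _, q3 = statistics.quantiles(series, n=4)
    return {
        "min": min(series),
        "mean": statistics.fmean(series),
        "max": max(series),
        "median": statistics.median(series),
        "std": statistics.pstdev(series),
        "iqr": q3 - q1,
    }


def global_features(contour, name):
    """Statistics of the contour itself and of its 1st and 2nd derivative."""
    d1 = derivative(contour)
    d2 = derivative(d1)
    features = {}
    for prefix, series in ((name, contour), (name + "_d1", d1), (name + "_d2", d2)):
        for stat, value in global_stats(series).items():
            features[f"{prefix}_{stat}"] = value
    return features


# Hypothetical pitch contour in Hz, one value per voiced frame.
pitch = [100.0, 110.0, 120.0, 115.0, 105.0]
for key, value in global_features(pitch, "pitch").items():
    print(key, value)
```

The contour needs at least four values here, so that the second derivative still has two points for the quartile computation.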
7/27
Interval and Triad Features | Based on Music Emotion Recognition | Perceptual Model of Intonation
Interval and Triad Features
8/27
Interval Features
Autocorrelation of the circular pitch density function
$r_o(s) = \int_0^L p_o(\mathrm{mod}_L(s + \lambda))\, p_o(\lambda)\, d\lambda$
Intervallic dissonance
$\mathrm{DIS} = \int_0^L d(s)\, r_o(s)\, ds$
where $d(s) \simeq \sqrt{N(s)\, D(s)}$
[Figure: Pitch Histogram; x-axis: circular frequency in ST scale; y-axis: number of pitch samples]
[Figure: 2nd order Autocorrelation; x-axis: circular frequency in ST scale]
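A discrete version of the interval features might look as follows. It is a sketch under several assumptions: the pitch density is approximated by a 12-bin semitone histogram folded onto one octave, the reference frequency is arbitrary, and, since the slide does not define N(s) and D(s), they are taken here to be numerator and denominator of a just-intonation frequency ratio for each semitone step. All names are hypothetical.

```python
import math


def circular_pitch_density(f0_hz, ref_hz=440.0):
    """Normalised histogram of pitch samples folded onto one octave (12 bins)."""
    counts = [0.0] * 12
    for f in f0_hz:
        semitone = 12.0 * math.log2(f / ref_hz)
        counts[round(semitone) % 12] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def circular_autocorrelation(p):
    """Discrete form of r_o(s): sum over lambda of p[(s + lambda) mod L] * p[lambda]."""
    L = len(p)
    return [sum(p[(s + lam) % L] * p[lam] for lam in range(L)) for s in range(L)]


# Just-intonation ratios N/D per semitone step: an assumption of this sketch.
JUST_RATIOS = [(1, 1), (16, 15), (9, 8), (6, 5), (5, 4), (4, 3),
               (45, 32), (3, 2), (8, 5), (5, 3), (9, 5), (15, 8)]


def interval_dissonance(r):
    """Discrete form of DIS: sum over s of d(s) * r_o(s), with d(s) = sqrt(N * D)."""
    weights = [math.sqrt(n * d) for n, d in JUST_RATIOS]
    return sum(w * value for w, value in zip(weights, r))


# Hypothetical F0 samples: half on one pitch, half a perfect fifth above it.
f0 = [220.0, 220.0, 330.0, 330.0]
r = circular_autocorrelation(circular_pitch_density(f0))
print(r)
print(interval_dissonance(r))
```

Because the density is circular, a fifth (7 semitones) also shows up as its inversion, the fourth (5 semitones), in the autocorrelation.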
9/27
Interval Dissonance
[Figure: frequency of tone sensation vs. frequency difference Δf = f2 - f1; labelled regions: beats area, roughness area, smoothness area, one-tone sensation, two-tone sensation, critical bandwidth, limits of discrimination; 10 Hz mark]
[Figure: dissonance per interval; x-axis: intervals m2, M2, m3, M3, P4, 4+/5°, P5, m6, M6, m7, M7; y-axis: dissonance]
10/27
Triad Features
1 Direct computation
2 Extraction of "dominant pitches"
Autocorrelation Triad Features
[Figure: Gaussian Mixture Model; x-axis: semitone scale]
Gaussian Triad Features
11/27
Tension and Modality
12/27
Loudness, Timbre and Rhythm
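Intensity Features
$I(k) = \sum_{n=0}^{N/2} |\mathrm{FFT}_k(n)|$
$D_i(k) = \frac{1}{I(k)} \sum_{n=L_i}^{H_i} |\mathrm{FFT}_k(n)|$
where k refers to the frame
Timbre Features
$\mathrm{FFT}_k \equiv \{x_{k1} \ldots x_{kN}\}$
$\mathrm{sorted} \equiv \{x'_{k1} \ldots x'_{kN}\}$
$\mathrm{Peak}(k) = \log\left\{ \frac{1}{\alpha N} \sum_{i=1}^{\alpha N} x'_{ki} \right\}$
$\mathrm{Valley}(k) = \log\left\{ \frac{1}{\alpha N} \sum_{i=1}^{\alpha N} x'_{k(N-i+1)} \right\}$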
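The intensity and timbre formulas can be illustrated on a single frame as below. The sketch takes a precomputed magnitude spectrum as input; the function names, the band edges, the value of alpha and the descending sort order (so that the first alpha*N values are the largest) are assumptions of this example.

```python
import math


def intensity(mag):
    """I(k): sum of the magnitude spectrum |FFT_k(n)| of frame k."""
    return sum(mag)


def subband_intensity_ratios(mag, bands):
    """D_i(k): share of the frame intensity in each sub-band [L_i, H_i], inclusive."""
    total = intensity(mag)
    return [sum(mag[lo:hi + 1]) / total for lo, hi in bands]


def peak_valley(mag, alpha=0.2):
    """Peak(k), Valley(k): log of the mean of the alpha*N largest / smallest values."""
    ordered = sorted(mag, reverse=True)        # x'_k1 >= ... >= x'_kN (assumed order)
    count = max(1, int(alpha * len(ordered)))  # alpha*N, but at least one value
    peak = math.log(sum(ordered[:count]) / count)
    valley = math.log(sum(ordered[-count:]) / count)  # undefined if these are all zero
    return peak, valley


# Hypothetical magnitude spectrum of one frame and two hypothetical sub-bands.
mag = [1.0, 4.0, 2.0, 8.0, 1.0, 2.0, 1.0, 1.0, 4.0, 1.0]
print(intensity(mag))
print(subband_intensity_ratios(mag, [(0, 4), (5, 9)]))
print(peak_valley(mag))
```

The difference Peak(k) - Valley(k) is often used as a spectral contrast measure; whether the thesis uses it that way is not stated on the slide.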
13/27
Loudness, Timbre and Rhythm
Rhythm Features
1 Compute FFT
2 Extract amplitude envelope
$A_i(n) = \mathrm{FFT}_i(n) \otimes h_w(n)$
3 Apply Canny operator
$O_i(n) = A_i(n) \otimes C(n)$
$C(n) = \frac{n}{\sigma^2}\, e^{-\frac{n^2}{2\sigma^2}}$
We obtain the onset sequence $O_i(n)$
[Figure: Onset Sequence; x-axis: number of samples; y-axis: amplitude]
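The onset extraction steps can be sketched as follows. Several points are assumptions of this illustration rather than details from the slide: the input is read as the magnitude of one sub-band over time, h_w is taken to be a half-Hanning window, the kernel width and sigma are arbitrary, and all function names are hypothetical. With a true convolution and this sign of C(n), a rising envelope gives a negative response, so the sketch flips the sign before half-wave rectification.

```python
import math


def convolve_same(signal, kernel):
    """Linear convolution, cropped to len(signal) with the kernel centred."""
    half = len(kernel) // 2
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            m = n + half - j
            if 0 <= m < len(signal):
                acc += signal[m] * k
        out.append(acc)
    return out


def half_hanning(length):
    """Decaying half of a Hann window, used as smoothing filter h_w (assumed)."""
    return [0.5 + 0.5 * math.cos(math.pi * n / length) for n in range(length)]


def canny_kernel(sigma, half_width):
    """C(n) = n / sigma^2 * exp(-n^2 / (2 sigma^2)) for n in [-half_width, half_width]."""
    return [n / sigma ** 2 * math.exp(-n ** 2 / (2 * sigma ** 2))
            for n in range(-half_width, half_width + 1)]


def onset_sequence(magnitudes, sigma=2.0, half_width=6, smooth_len=4):
    """Envelope A = magnitudes (x) h_w, then O = A (x) C, sign-flipped and rectified."""
    envelope = convolve_same(magnitudes, half_hanning(smooth_len))
    response = convolve_same(envelope, canny_kernel(sigma, half_width))
    return [max(0.0, -r) for r in response]


# Hypothetical sub-band magnitude that switches on halfway through.
onsets = onset_sequence([0.0] * 10 + [1.0] * 10)
print(onsets.index(max(onsets)))
```

Because the smoothing window is centred here, the detected peak can sit a sample or two before the actual step.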
14/27
Loudness, Timbre and Rhythm
Rhythm Features
Strength: average value of the peaks
Regularity: average value of the peaks in the autocorrelation
Speed: ratio of the number of peaks to the time duration
[Figure: Onset Sequence; x-axis: number of samples; y-axis: amplitude]
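The three rhythm descriptors could be computed from an onset sequence roughly as below. The simple local-maximum peak picker, the normalisation of the autocorrelation by its lag-0 value and the function names are assumptions made for this sketch.

```python
def find_peaks(seq, threshold=0.0):
    """Indices of local maxima above `threshold` (end points excluded)."""
    return [n for n in range(1, len(seq) - 1)
            if seq[n] > threshold and seq[n] > seq[n - 1] and seq[n] >= seq[n + 1]]


def autocorrelation(seq):
    """Autocorrelation for non-negative lags, normalised by the lag-0 value."""
    r0 = sum(v * v for v in seq)
    return [sum(seq[n] * seq[n + lag] for n in range(len(seq) - lag)) / r0
            for lag in range(len(seq))]


def rhythm_features(onsets, duration_s):
    """Return (strength, regularity, speed) of an onset sequence."""
    peaks = find_peaks(onsets)
    if not peaks:
        return 0.0, 0.0, 0.0
    strength = sum(onsets[n] for n in peaks) / len(peaks)
    acf = autocorrelation(onsets)
    acf_peaks = find_peaks(acf)
    regularity = sum(acf[n] for n in acf_peaks) / len(acf_peaks) if acf_peaks else 0.0
    speed = len(peaks) / duration_s
    return strength, regularity, speed


# Hypothetical onset sequence with one onset every 4 samples, lasting 3 seconds.
onsets = [0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(rhythm_features(onsets, duration_s=3.0))
```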
15/27
Perceptual Model of Intonation
Perceptual principles
1 Segmentation Effect
2 Glissando Threshold: minimum amount of frequency change
$g_{th} = 0.16 / T^2$ [ST/s²]
3 Differential Glissando Threshold: minimum difference in slope
$dg_{th} = a_2 - a_1 = 20$ [ST/s]
4 Short-term integration in time
[Figure: pitch contour; x-axis: time (s); y-axis: frequency (Hz); curves: F0 estimation, stylization 1, stylization 2]
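The two thresholds can be turned into simple decision rules, as in the sketch below. It assumes that the glissando threshold is compared against the rate of pitch change of a segment in semitones per second, and that the differential threshold applies to the absolute slope difference of two adjacent segments; the function names are hypothetical.

```python
import math


def semitones(f_start_hz, f_end_hz):
    """Pitch distance between two frequencies in semitones (ST)."""
    return 12.0 * math.log2(f_end_hz / f_start_hz)


def glissando_threshold(duration_s):
    """g_th = 0.16 / T^2 for a segment of duration T in seconds."""
    return 0.16 / duration_s ** 2


def is_glissando(f_start_hz, f_end_hz, duration_s):
    """True if the pitch change rate of the segment exceeds the threshold."""
    slope = abs(semitones(f_start_hz, f_end_hz)) / duration_s  # in ST/s (assumed unit)
    return slope > glissando_threshold(duration_s)


def is_slope_change_audible(slope_a1, slope_a2, dg_th=20.0):
    """True if two adjacent segments differ in slope by more than dg_th (ST/s)."""
    return abs(slope_a2 - slope_a1) > dg_th


print(is_glissando(200.0, 250.0, 0.2))      # True
print(is_glissando(200.0, 201.0, 0.05))     # False
print(is_slope_change_audible(5.0, 30.0))   # True
```

A stylization routine would typically replace segments below the glissando threshold by level tones and merge neighbouring segments whose slope difference is not audible.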
16/27
Database - Labels - Features
Database: emoDB (TUB)
10 speakers
708 files
6 emotions
BASIC SET
duration 16
MFCC 91
ZCR 13
harmony 3
energy 58
pitch 33
Total 214
MUSICAL SET
interval 31
autocorr. triad 4
gaussian triad 10
intensity 63
rhythm 15
Total 123
17/27
Strategies for evaluation 9-1 Vs 8-1-1
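The slide only names the two strategies. One plausible reading, given the 10 speakers of the database, is a speaker-independent split: 9 speakers for training and 1 for testing (9-1), or 8 for training, 1 for validation and 1 for testing (8-1-1). The sketch below enumerates such splits; this interpretation, the speaker identifiers and the choice of the validation speaker are all assumptions.

```python
def splits_9_1(speakers):
    """Leave-one-speaker-out: all but one speaker train, the remaining one tests."""
    for test in speakers:
        train = [s for s in speakers if s != test]
        yield train, test


def splits_8_1_1(speakers):
    """One test speaker, the next one (cyclically) for validation, the rest train."""
    for i, test in enumerate(speakers):
        validation = speakers[(i + 1) % len(speakers)]
        train = [s for s in speakers if s not in (test, validation)]
        yield train, validation, test


speakers = list(range(10))  # hypothetical speaker identifiers
for train, validation, test in splits_8_1_1(speakers):
    print(train, validation, test)
```

With a separate validation speaker, the number of selected features can be tuned without touching the test speaker, which is presumably why the 8-1-1 scheme appears in the later result slides.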
18/27
Musical Universals
[Figure: mean normalized amplitude (dB) vs. frequency ratio; marked ratios: 1, 1.2, 1.25, 1.33, 1.4, 1.5, 1.6, 1.67, 1.75, 2; marked intervals: unison, m3, M3, P4, 4+ or 5°, P5, m6, M6, m7, octave]
19/27
Plain Bayes classifier - Evaluation 8-1-1
[Figure: training hit rate (%) vs. number of features; curves: Basic set, Full set]
[Figure: generalization hit rate (%) vs. number of features; curves: Basic set, Full set]
20/27
Nature of selected features
time
MFCC
ZCR
harmony
energy
pitch
interval
auto-correlation triad
gaussian triad
intensity
rhythm
21/27
Comparison plain vs. hierarchical Bayes classifier
[Figure: hierarchical classification tree with decision nodes Activation, Valence and Potency (branches high / low) and leaves happy, angry, afraid, neutral, bored, sad]
Feature set               plain Bayes   hierarchical Bayes
Basic                     76.12         84.22
Basic + Interval + Triad  80.61         85.04
22/27
Multi-dimensional Scaling
[Figure: BASIC; x-axis: 1st principal component; y-axis: 2nd principal component; classes: Happy, Sad, Bored, Angry]
[Figure: FULL; x-axis: 1st principal component; y-axis: 2nd principal component; classes: Happy, Sad, Bored, Angry]
23/27
Happy Vs Angry - Evaluation 8-1-1
[Figure: Happy Vs Angry; x-axis: number of features; y-axis: training hit rate (%); curves: Basic set, Full set]
[Figure: Angry Versus Happy; x-axis: number of features; y-axis: generalization hit rate (%); curves: Basic set, Full set]
24/27
Final Comparison of Musical Features
[Figure: Comparison between different musical feature sets; x-axis: number of features; y-axis: accuracy rate (%); curves: Musical Set, Basic Set, B+Stylization Set, B+Interval+Triad, B+Intensity, B+Rhythm, Full Set]
25/27
Conclusion
Summary
1 Literature review about speech, music and emotions
2 Theoretical background on psychoacoustics
3 Re-implementation of the basic features
4 Implementation of speech processing algorithms
5 Implementation of musical features
(music perception, MER and linguistics)
6 Simulations ⇒ Musical features can help to improve emotion
recognition in speech
26/27
Conclusions
Further research
Environment: natural emotional speech, other languages
Pattern Recognition steps: feature transformation, pitch
extraction, classification...
Improvement of musical features
Dissonance model, Perceptual model of intonation,
Emotionally meaningful moments
Systematization of feature extraction step
"Even monkeys express strong feelings in different tones – anger
and impatience by low, – fear and pain by high notes."
Charles Darwin, Naturalist
27/27
Thank you!
Looking forward to your questions...