Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March...

40
I M C Modeling Audio Data with Speaker Heterogeneity Ludovic Rheault and Sophie Borwein University of Toronto

Transcript of Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March...

Page 1: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

International Methods Colloquium

Modeling Audio Data with SpeakerHeterogeneity

Ludovic Rheault and Sophie BorweinUniversity of Toronto

Page 2: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Objectives of the Project

� �antifying sentiment, activation and specific emotional states(anxiety) in political videos, using three modalities.

� In this talk: TEXT vs AUDIO.

� Methods: deep neural networks; transfer learning.

� Training data: annotated political videos with transcripts.

� Issues: heterogeneity across speakers; coder reliability.

ReferenceA preliminary study introducing this project:Rheault and Borwein (2019).

Looking for political transcripts? Check out lipad.ca.

IMC - Rheault and Borwein - Objectives March 2020 2 / 40

Page 3: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Previous Work: Audio Data

� Political Science:� Dietrich et al. (2019a; 2019b): Pitch as measure of

activation.� Knox and Lucas (2019): HMM model; skepticism in

voice.� Neumann (2019): Phonetics; style-shi�ing.� Hwang et al. (2019): Audio and video; political ads.� ...

� Engineering/Computer Science:� Schuller (2018).� Tzirakis et al. (2017).� ...

IMC - Rheault and Borwein - Objectives March 2020 3 / 40

Page 4: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Representing Emotions

Categorical Approaches� e.g. Ekman’s six basic emotions (fear, anger, sadness, surprise,

happiness, disgust).

� Problem: many of them not commonly observed in elites’speeches.

� Fear vs. anxiety.

Dimensional Approaches� e.g. Russel’s circumplex model of a�ect.

� Sentiment (or valence) and activation (arousal).

IMC - Rheault and Borwein - Theory March 2020 4 / 40

Page 5: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Circumplex Model of A�ect

PositiveNegative

Calm

Activated

Anxious Excited

Depressed Serene

IMC - Rheault and Borwein - Theory March 2020 5 / 40

Page 6: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Our Data

Sources3,635 videos: Canadian House of Commons, US Congress, Debates.

Annotations (Labels)Three binary annotations (graduate students; MTurk workers).

� Sentiment

� Activation

� Anxiety

Current work: Improving coder reliability.

IMC - Rheault and Borwein - Data March 2020 6 / 40

Page 7: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Pipeline for processing videos

IMC - Rheault and Borwein - Data March 2020 7 / 40

Page 8: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Final Video Collection

Upcoming Steps� Public release of video dataset with improved coder reliability.

� Lab subjects with biometric measurements as ground truth.

(with Jonathan Rose and Bazen Teferra, UofT Engineering)

Trade-O�� Crowdsourced annotations of public domain videos:

� Easier to make data public;� Usually low intercoder reliability;

� Human subjects with biometric ground truth:� Higher reliability;� Very di�icult to anonymize audio and video signals.

IMC - Rheault and Borwein - Data March 2020 8 / 40

Page 9: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Methods, Concepts and Definitions

Page 10: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Machine learning in one slide

Social science (inference) Machine learning (prediction)

GLM inverse link function Activation function

E(y) = f (x′β) E(y) = f (x′β)

Preferred objective function

Log-likelihood Cross-entropy

logL =∑n

i=1 log P(yi|xi,β) − logL = −∑n

i=1 log P(yi|xi,β)

Solving algorithm

Newton-Raphson Gradient descent

βt := βt−1 − [H logL]−1∇ logL βt := βt−1 − η∇(− logL)�antities of interest

β;Var(β) y;∑

1(y = y)/n

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 10 / 40

Page 11: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Preliminaries: Neural Networks

Logistic Regression as Neural Network

x1

x2

y

E(y) = f (α+ x′β)

Neural Network with Hidden Layer

x1

x2

h1

h2

y

E(y) = f (2)(α(2) + h′β(2)); hk = f (1)(α(1)k + x′β(1)

k )

Deep Neural Network (DNN)

Multiple hidden layers: E(y) = f (4)(f (3)(f (2)(f (1)(x))))

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 11 / 40

Page 12: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Preliminaries: Convolutional Neural Networks

1D ConvNet (or CNN)

x ∗ β = h

Input sentence K features

Sequ

ence

ofsi

zeT

Filter (size 2)h

x11 x1

2 x13 x1

4

x21 x2

2 x23 x2

4

Is

hypochondria

a

symptom

of

Covid

19

?

β1 β2 β3 β4

β5 β6 β7 β8

h1

∗ =

h1 = β1x11 + β2x1

2 + ...+ β8x24

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 12 / 40

Page 13: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Preliminaries: Convolutional Neural Networks

1D ConvNet

x ∗ β = h

Input sentence K features

Sequ

ence

ofsi

zeT

Filter (size 2)h

x21 x2

2 x23 x2

4

x31 x3

2 x33 x3

4

Is

hypochondria

a

symptom

of

Covid

19

?

β1 β2 β3 β4

β5 β6 β7 β8

h2

∗ =

h2 = β1x21 + β2x2

2 + ...+ β8x34

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 13 / 40

Page 14: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Preliminaries: Convolutional Neural Networks

1D ConvNet

x ∗ β = h

Input sentence K features

Sequ

ence

ofsi

zeT

Filter (size 2)h

x31 x3

2 x33 x3

4

x41 x4

2 x43 x4

4

Is

hypochondria

a

symptom

of

Covid

19

?

β1 β2 β3 β4

β5 β6 β7 β8 h3∗ =

h3 = β1x31 + β2x3

2 + ...+ β8x44

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 14 / 40

Page 15: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Preliminaries: Convolutional Neural Networks

There will be K × L×M trainable β parameters, where L is the chosen number of

filters and M the filter size (or kernel size), plus a filter specific intercept.

K features

Sequ

ence

ofsi

zeT

2 × Filters (size 2)h

x31 x3

2 x33 x3

4

x41 x4

2 x43 x4

4

x61 x6

2 x63 x6

4

x71 x7

2 x73 x7

4

β11 β1

2 β13 β1

4

β15 β1

6 β17 β1

8

β21 β2

2 β23 β2

4

β25 β2

6 β27 β2

8

h31

h62

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 15 / 40

Page 16: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Methods: Current Trends

Audio� Two main approaches: HMMs (e.g. Knox and Lucas 2019) and

deep neural networks (Hinton et al. 2012).� Trends:

� No features: use raw audio signal as input in ConvNets.� Transfer learning (e.g. Audioset, wav2vec, autoencoders).

Text� Transfer learning everywhere:

� Previous years: Word embeddings + DNN as default.� Now: transfer learning using more sophisticated language models

(e.g. ULMFiT, BERT, DistilBERT).

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 16 / 40

Page 17: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Transfer Learning

� Defining the concept of transfer learning is controversial, but in anutshell:

The ProblemSpecific applications usually have limited training data, resulting inpoor predictive accuracy.

The SolutionPre-train a model using a very large dataset, for a di�erent task (e.g.an autoencoder). Use the parameters of this larger model asfeature representations for the target task, or fine-tune themodel for the target task using local data.

IMC - Rheault and Borwein - Methods, Concepts and Definitions March 2020 17 / 40

Page 18: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Text as Data

Page 19: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Transfer Learning (Word Embeddings, Skip-Gram)

afraid (wt )

I (wt−2)

am (wt−1)

of (wt+1)

bats (wt+2)

h

βV×M

The learned feature representation matrix βV×M (V size ofvocabulary, M size of hidden layer) contains information about

semantics not available from a small sample. (Mikolov et al. 2013)

IMC - Rheault and Borwein - Text as Data March 2020 19 / 40

Page 20: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Transfer Learning (Word Embeddings)

� Map words from a new dataset onto pre-trained embeddings:

“I” → [−0.51, 1.29, ..., 1.34]

“am” → [0.76,−2.44, ...,−1.06]

“afraid” → [−0.83,−3.09, ..., 0.86]

“of” → [2.25,−2.16, ...,−0.98]

“Covid-19” → [0, 0, ..., 0]

� Each document is a matrix: sequence of T words with featurelength M.

� Use ConvNets or recurrent neural network (RNNs) to predicttarget annotation (e.g. sentiment) from the sequences.

RNNs: ht = f (α+ (xt)′β + (ht−1)′θ)

IMC - Rheault and Borwein - Text as Data March 2020 20 / 40

Page 21: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Transfer Learning (BERT)

I

am

afraid

of

[MASK]

I

am

afraid

of

bats

Transformers24 layers

βV×M

Bidirectional Encoder Representations from Transformers (BERT) trained on

Wikipedia + BooksCorpus, using two tasks (predicting masked word shown

above) (Devlin et al. 2019).

IMC - Rheault and Borwein - Text as Data March 2020 21 / 40

Page 22: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Transfer Learning (BERT)

� Like word embeddings, BERT can provide pretrained embeddings(first hidden layer, or encoder).

� Deep learning architecture (transformers with a�ention weights)with state-of-the art results on many NLP tasks.

� Two straightforward methods to adapt BERT for a new task:� “Freeze” the parameters, add an output layer on top of BERT (e.g.

logistic or so�max), and fit with local data.� Add the output layer and continue training all parameters with

local data (fine-tuning).

� BERT Large model: 24 transformer layers, hidden layer size of1024.

IMC - Rheault and Borwein - Text as Data March 2020 22 / 40

Page 23: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Results, Text-as-Data Benchmark

BERT model (high-quality annotations only)

Emotion Accuracy (%) Modal (%) PRE (%) AUROC

Sentiment 88.2 56.2 73.0 0.89

Activation 74.6 71.5 11.0 0.65

Anxiety 62.5 52.0 21.9 0.63

� Text classification works well with sentiment, less so foractivation and a specific emotion like anxiety.

� Substantive conclusion the same with other classifiers (e.g. wordembeddings + RNN) and annotation quality.

� “The sentiment is in the transcript, but the arousal is not”(Cochrane et al. 2019).

(Accuracy calculated on a held out sample; we use the same withaudio models for comparison.)

IMC - Rheault and Borwein - Text as Data March 2020 23 / 40

Page 24: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Audio as Data

Page 25: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Audio Data: Raw Signals (Waveform)

A vector of signed integers with a specified bit depth (e.g. 16 bitsranges from -32767 to 32767), usually converted to float:

[0, 0, 0.15, 0.21, ..., 0]

with sampling rate in Hz (integers per second, e.g. 16KHz).

For a great intro on sound, check past IMC presentation fromChristopher Lucas.

IMC - Rheault and Borwein - Audio as Data March 2020 25 / 40

Page 26: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Transfer Learning (wav2vec)

Audio Signal

5 layers of ConvNets Feature representation

Task: predict future samples of the input signal vs “distractor” samples.

Schneider et al. (2019)IMC - Rheault and Borwein - Audio as Data March 2020 26 / 40

Page 27: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Transfer Learning (wav2vec)

� Use input audio samples to predict likelihood of future sample.

� Trained on 1,000 hours of spoken language (LibriSpeech).

� Two di�erent outputs of wav2vec ConvNet blocks can be used asfeature representation of wave inputs (10ms × 512):

0ms–10ms → [0.0, 0.03, ..., 0.05]

10ms–20ms → [0.04, 0.02, ..., 0.14]

20ms–30ms → [0.01, 0.0, ..., 0.16]

30ms–40ms → [0.0, 0.11, ..., 0.06]

· · · → · · ·� Intuition similar to word embeddings/BERT.

IMC - Rheault and Borwein - Audio as Data March 2020 27 / 40

Page 28: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Audio Data: Model I (Schematic Depiction)

Audio Signal

Audio Features(wav2vec)

2L of ConvNets

Output Node

(Anxiety,Activation

or Sentiment)

Two 1D ConvNet layers with 16 and 8 filters (size 3), dropout and pooling.

IMC - Rheault and Borwein - Audio as Data March 2020 28 / 40

Page 29: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Results Part I, Audio Data (ConvNet)

ConvNets with wav2vecEmotion Accuracy (%) Modal (%) PRE (%) AUROC

Activation 80.1 71.5 30.0 0.75

Anxiety 71.7 52.0 41.1 0.72

Sentiment 56.2 56.2 0.0 0.54

� Be�er than text for activation and anxiety, but not impressive.

IMC - Rheault and Borwein - Audio as Data March 2020 29 / 40

Page 30: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Issue: Speaker Heterogeneity

Speaker HeterogeneityEach voice is unique. As with heterogeneity bias in panel dataanalysis, we would like to account for a speaker j’s a�ributes:

E(y) = f (αj + x′β)

� Deep neural networks can learn to distinguish emotional statesfrom speaker-specific a�ributes, but this would require a lot oftraining data.

� Speaker-specific intercepts wouldn’t help for new speakers,unobserved during training stage.

IMC - Rheault and Borwein - Audio as Data March 2020 30 / 40

Page 31: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Speaker Voice Recognition

Voice Encoder for Speaker VerificationA voice encoder to represent each speaker’s individual voicecharacteristics.

0.000

0.025

0.050

0.075

0.100

0.125

0.150

0.175

0.200

0.00

0.05

0.10

0.15

0.20

Used for voice synthesis and voice cloning, e.g. Google’s Tacotron (Wan etal. 2018, Jia et al. 2019).

IMC - Rheault and Borwein - Audio as Data March 2020 31 / 40

Page 32: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Audio Data: Speaker Embeddings (Voice Encoders)

−0.4 −0.2 0.0 0.2 0.4

First principal component

−0.3

−0.2

−0.1

0.0

0.1

0.2

0.3

0.4

Seco

ndpr

inci

palc

ompo

nent

WomenMen

IMC - Rheault and Borwein - Audio as Data March 2020 32 / 40

Page 33: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Audio Data: Model II (Schematic Depiction)

Audio Signal

Audio Features

(wav2vec)

Speaker Embedding

2L of ConvNets

Output Node

Two 1D ConvNet layers with 16 and 8 filters (size 3), dropout and pooling.

IMC - Rheault and Borwein - Audio as Data March 2020 33 / 40

Page 34: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Results Part II, Accounting for Heterogeneity

ConvNet with wav2vec AND speaker embeddings

Emotion Accuracy (%) Modal (%) PRE (%) AUROC

Activation 88.9 71.5 61.0 0.85

Anxiety 78.9 52.0 56.2 0.79

Sentiment 59.7 56.2 8.0 0.58

ConvNet with wav2vec, no speaker embeddings

Emotion Accuracy (%) Modal (%) PRE (%) AUROC

Activation 80.1 71.5 30.0 0.75

Anxiety 71.7 52.0 41.1 0.72

Sentiment 56.2 56.2 0.0 0.54

IMC - Rheault and Borwein - Audio as Data March 2020 34 / 40

Page 35: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Results Part III, Impact of Annotation �ality

ConvNet with wav2vec and speaker embeddings(High quality annotations only)

Emotion Accuracy (%) Modal (%) PRE (%) AUROC

Activation 88.9 71.5 61.0 0.85

Anxiety 78.9 52.0 56.2 0.79

Sentiment 59.7 56.2 8.0 0.58

ConvNet with wav2vec and speaker embeddings(Including low quality annotations)

Emotion Accuracy (%) Modal (%) PRE (%) AUROC

Activation 76.6 62.6 37.4 0.75

Anxiety 75.7 52.0 49.3 0.76

Sentiment 62.5 50.8 23.7 0.63

IMC - Rheault and Borwein - Audio as Data March 2020 35 / 40

Page 36: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Summary: Audio (ConvNet) vs Text (BERT)

Audio Modality

Emotion Accuracy (%) Modal (%) PRE (%) AUROC

Activation 88.9 71.5 61.0 0.85

Anxiety 78.9 52.0 56.2 0.79

Sentiment 59.7 56.2 8.0 0.58

Text Modality

Emotion Accuracy (%) Modal (%) PRE (%) AUROC

Activation 74.6 71.5 11.0 0.65

Anxiety 62.5 52.0 21.9 0.63

Sentiment 88.2 56.2 73.0 0.89

IMC - Rheault and Borwein - Audio as Data March 2020 36 / 40

Page 37: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Full Project: Visual Modality

Input Video Face Recognition(256 × 256 × 3)

InceptionV3

Pooling Top Layer

Output

IMC - Rheault and Borwein - Concluding Thoughts March 2020 37 / 40

Page 38: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

Conclusion

� Text transcripts and audio signals of political speeches o�ercomplementarity:

� Audio be�er at capturing aroused/anxious speakers.� Transcript be�er at capturing sentiment (valence).

� Accounting for speaker heterogeneity ma�ers in small samples.

� �ality of human coding a major issue in speech emotionrecognition.

� Future step: Completion and public release of video collection.

� Future step: Audio vs Text vs Visual.

Feedback welcome!

IMC - Rheault and Borwein - Concluding Thoughts March 2020 38 / 40

Page 39: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

References

� Baltrusaitis, Tadas, Chaitanya Ahuja, and Louis-Philippe Morency. 2018. “MultimodalMachine Learning: A Survey and Taxonomy.” IEEE Transactions on Pa�ern Analysis andMachine Intelligence 41(2): 423–443.

� Cochrane, Christopher, et al. 2019. “The Automated Detection of Emotion in Transcripts ofParliamentary Speech.” APSA 2019.

� Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT:Pre-training of Deep Bidirectional Transformers for Language Understanding.” InProceedings of NAACL-HLT 2019.

� Dietrich, Bryce J, Ryan D Enos, and Maya Sen. 2019a. “Emotional Arousal Predicts Votingon the U.S. Supreme Court.” Political Analysis 27(2): 237–243.

� Dietrich, Bryce J, Ma�hew Hayes, and Diana Z O’Brien. 2019b. “Pitch Perfect: Vocal Pitchand the Emotional Intensity of Congressional Speech.” American Political Science Review113(4): 941–962.

� El Ayadi, Moataz, Mohamed S Kamel, and Fakhri Karray. 2011. “Survey on SpeechEmotion Recognition” Pa�ern Recognition 44(3): 572–587.

� Hinton, Geo�rey, et al. 2012. “Deep Neural Networks for Acoustic Modeling in SpeechRecognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine29(6): 82–97.

� Hwang, June, Kosuke Imai, and Alex Tarr. 2019. “Automated Coding of Political CampaignAdvertisement Videos: An Empirical Validation Study.” PolMeth XXXVI.

IMC - Rheault and Borwein - Concluding Thoughts March 2020 39 / 40

Page 40: Ludovic Rheault and Sophie Borwein University of Toronto · IMC - Rheault and Borwein -Theory March 2020 4 / 40. Circumplex Model of A˙ect Negative Positive Calm Activated Anxious

References (Continued)

� Jia, Ye, et al. 2018. “Transfer Learning from Speaker Verification to MultispeakerText-to-Speech Synthesis.” In Advances in Neural Information Processing Systems. pp.4480–4490.

� Knox, Dean and Christopher Lucas. 2019. “A Dynamic Model of Speech for the SocialSciences.” Available at SSRN: h�ps://ssrn.com/abstract=3490753.

� Mikolov, Tomas, et al. 2013. “E�icient Estimation of Word Representations in VectorSpace.” ICLR 2013.

� Neumann, Markus. 2019. “Hooked With Phonetics: The Strategic Use of Style-Shi�ing inPolitical Rhetoric.” PolMeth XXXVI.

� Russel, James. 1980. “A Circumplex Model of A�ect.” Journal of Personality and SocialPsychology 39(6): 1161-1178.

� Schneider, Ste�en, et al. 2019. “wav2vec: Unsupervised Pre-Training for SpeechRecognition.” Available at arXiv:1904.05862

� Schuller, Björn W. 2018. “Speech Emotion Recognition: Two Decades in a Nutshell,Benchmarks, and Ongoing Trends.” Communications of the ACM 61(5): 90–99.

� Tzirakis, Panagiotis, et al. 2017. “End-to-End Multimodal Emotion Recognition Using DeepNeural Networks.” IEEE Journal of Selected Topics in Signal Processing 11(8): 1301–1309.

� Wan, Li, et al. 2018. “Generalized End-to-End Loss for Speaker Verification.” ICASSP 2018.

IMC - Rheault and Borwein - Concluding Thoughts March 2020 40 / 40