Facial Expression Recognition and Generationshapiro/EE562/notes/... · 2016. 11. 9. · • Eight...
Facial Expression Recognition and Generation
Deepali Aneja Ph.D. student
Computer Science and Engineering University of Washington
Motivation
• Accurate facial expression depiction is critical for storytelling.
• And difficult!
[Figure: three bar charts of recognition rate (%) by perceived expression: Joy, Sadness, Anger, Surprise, Fear, Disgust, Neutral]
We asked three professional animators to make the character appear as surprised as possible. None of the expressions achieved more than 50% recognition in Mechanical Turk testing.
Use human anatomy (FACS, the Facial Action Coding System) to generate expressions
[Figure panels: MPEG-4 (Anger), HapFACS (Anger), HapFACS (Fear), FACSGen (Fear)]
Adobe Character Animator (Geometry + Audio input)
Problem Statement
Given that simple geometric mappings are not sufficient:
• How can we transfer human expressions to stylized characters without losing perceptual information?
• How can we use human expressions to quickly and automatically create expressions for a wide range of characters?
Generate characters from human expressions
Our Approach
• Use deep learning to learn mappings between
  • human expressions and human expressions
  • character expressions and character expressions
  • human expressions and character expressions
• Seven classes of expressions:
  • Joy, Sad, Anger, Disgust, Surprise, Fear and Neutral
• This isn’t just geometry mapping
• It is perceptual modelling of expressions
Part 1: Expression Retrieval
Step 1: Use deep learning to create a perceptual model of human expressions (human feature space, f( ))
Step 2: Learn analogous character model (character feature space, f’( ))
Step 3: Learn the mapping between f’( ) and f( )
Step 4: Retrieve characters using a perceptual model and geometry
Steps
Data Collection
Data Pre-processing
Network Training using Deep Learning
Transfer expressions
Data Collection - Human Database
• CK+: The Extended Cohn-Kanade [REF]: 309 images
• DISFA: Denver Intensity of Spontaneous Facial Actions [REF]: 60,000 images
• KDEF: The Karolinska Directed Emotional Faces [REF]: 4,900 images
• MMI: 10,000 images
• Total of 75K images. We balanced the final number of samples per expression used to train our network, to avoid bias towards any particular expression.
Data Collection - Character Database
• Eight stylized characters
• The animator creates the key poses for each expression
• Key poses are labeled via Mechanical Turk (MT) to populate the database initially
• We only used the expression key poses having 70% MT test agreement among 50 Turkers for the same pose. Interpolating between the key poses resulted in 60,000 images (around 8,000 images per character).
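The agreement filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes "70% agreement" means that the most common label receives at least 70% of the votes for a pose, and the function names and the pose identifiers are hypothetical.

```python
from collections import Counter


def agreed_label(votes, threshold=0.7):
    """Return the majority label if its vote share reaches the threshold, else None.

    Assumption: agreement is the share of votes won by the most common label.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= threshold else None


def filter_key_poses(pose_votes, threshold=0.7):
    """Keep only the key poses whose crowd labels agree strongly enough.

    pose_votes maps a (hypothetical) pose id to the list of labels given by the raters.
    """
    kept = {}
    for pose_id, votes in pose_votes.items():
        label = agreed_label(votes, threshold)
        if label is not None:
            kept[pose_id] = label
    return kept
```

With 50 raters per pose, a pose labeled "surprise" by 40 of them would be kept, while one split 30/20 between two labels would be dropped.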
Data Pre-processing
Extract 49 facial landmarks (IntraFace)
Register faces to an average frontal face via an affine transformation
Face bounding box selection
Re-size to 256x256 pixels for analysis
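The registration step can be illustrated with a least-squares affine fit between detected landmarks and the landmarks of an average frontal face. The sketch below is an assumption about how such a fit might be done, not the authors' code: it solves the normal equations in pure Python, and the helper names (`solve3`, `fit_affine`, `apply_affine`) are made up for the example.

```python
def solve3(a, b):
    """Solve a 3x3 linear system a.x = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine map from src landmarks to dst landmarks.

    Returns two rows (a, b, tx) and (c, d, ty) such that
    x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    """
    rows = [(x, y, 1.0) for x, y in src]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    params = []
    for k in range(2):  # fit the x' and y' coordinates independently
        atb = [sum(r[i] * d[k] for r, d in zip(rows, dst)) for i in range(3)]
        params.append(solve3(ata, atb))
    return params


def apply_affine(params, points):
    """Apply the fitted affine map to a list of (x, y) points."""
    return [tuple(p[0] * x + p[1] * y + p[2] for p in params) for x, y in points]
```

In a full pipeline the same transform would be used to warp the image itself before cropping the face bounding box and resizing; that part is omitted here.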
Registered faces
Disgust (CK+), Joy (DISFA), Anger (KDEF), Surprise (MMI)
Training networks
[Diagram: a human neural network and a stylized character neural network, each with one output per expression class; a mapping finds the correlation between the corresponding expressions]
Network Training using Deep Learning
Data Augmentation
• 5 crops of 227x227: the four corners and the center crop
• Horizontal flip
Training Human model
• 4 CONV layers
• 4 POOL layers
• 2 Fully Connected layers
Training character model
• 3 CONV layers
• 3 POOL layers
• 2 Fully Connected layers
Fine-tuning character model
• N-1 layer features
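The crop-and-flip augmentation can be sketched in a few lines. This is an illustration under assumptions: images are represented as plain nested lists of rows (a real pipeline would use arrays), and each of the five crops is also mirrored, giving ten samples per image, which the slide does not state explicitly.

```python
def crop(image, top, left, size):
    """Return a size x size window of a row-major image (list of rows)."""
    return [row[left:left + size] for row in image[top:top + size]]


def hflip(image):
    """Mirror an image left to right."""
    return [row[::-1] for row in image]


def five_crops_with_flips(image, size=227):
    """Four corner crops plus the center crop, each with its horizontal flip.

    Assumption: flips are applied to every crop (10 outputs per input image).
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    offsets = [
        (0, 0),                                       # top-left
        (0, width - size),                            # top-right
        (height - size, 0),                           # bottom-left
        (height - size, width - size),                # bottom-right
        ((height - size) // 2, (width - size) // 2),  # center
    ]
    crops = [crop(image, top, left, size) for top, left in offsets]
    return crops + [hflip(c) for c in crops]
```

For a 256x256 registered face, this yields ten 227x227 training samples.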
Network Architecture
Human CNN (HCNN) Character CNN (CCNN) Shared CNN (SCNN)
When and How to Fine-tune?
• New dataset is small and similar to the original dataset.
  • Not a good idea to fine-tune the ConvNet (overfitting)
  • Train a linear classifier on the CNN codes.
• New dataset is medium/large and similar to the original dataset.
  • Fine-tune through the full network (our shared CNN)
• New dataset is small but very different from the original dataset.
  • Train the SVM classifier from activations (somewhere earlier in the network)
• New dataset is large and very different from the original dataset.
  • Train from scratch
  • Initialize with weights from a pre-trained model.
Transfer Learning
FC6 features extracted from HCNN
FC6 features extracted from SCNN
Shared human-character feature space
Distance Metrics
• Extracted features from the last fully connected layer of both models (the human expression trained model and the fine-tuned character expression model) and normalized the feature vectors
• To retrieve the stylized character expression that most closely matches the human expression:
  • Jensen-Shannon divergence distance for expression clarity
  • Geometric feature distance for expression refinement
Expression feature vectors: (N-1) layer features
Geometry feature vectors
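Jensen-Shannon divergence
• JS divergence is symmetric and gives a finite value:
$$\mathrm{JSD}(X \parallel Y) = \tfrac{1}{2} D_{KL}(X \parallel M) + \tfrac{1}{2} D_{KL}(Y \parallel M)$$
where $M = \tfrac{1}{2}(X + Y)$ is the average of the two distributions X and Y being compared, and each of the two terms is a KL divergence.
• Kullback-Leibler divergence is given as
$$D_{KL}(X \parallel M) = \sum_i X(i) \log \frac{X(i)}{M(i)}$$
where X and M are discrete probability distributions.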
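A small sketch of the Jensen-Shannon divergence in its standard form, JSD(X, Y) = ½ KL(X, M) + ½ KL(Y, M) with M = ½ (X + Y). It assumes the feature vectors are non-negative (for example, activations after a ReLU) so that they can be normalized to sum to one; the function names are illustrative and the natural logarithm is used.

```python
import math


def normalize(vector):
    """Scale a non-negative vector so that it sums to 1 (a discrete distribution)."""
    total = sum(vector)
    if total <= 0:
        raise ValueError("vector must have a positive sum")
    return [x / total for x in vector]


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum_i p_i * log(p_i / q_i), skipping p_i == 0 terms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(x, y):
    """Jensen-Shannon divergence between two non-negative feature vectors."""
    p, q = normalize(x), normalize(y)
    m = [0.5 * (a + b) for a, b in zip(p, q)]  # mixture distribution M
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Because M is positive wherever either input is positive, the two KL terms never divide by zero, which is why the JS divergence stays finite where a direct KL divergence between X and Y might not. With the natural logarithm its value lies between 0 and ln 2.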
Multiple correct label results
Geometric distance refinement
• Since expressions are mainly controlled by muscles around the mouth, eyes and eyebrows, we focus on features that characterize the shape and location of these parts of the face.
• We use the facial landmarks to extract our geometric features, including the following measurements:
  • left/right eyebrow height
  • left/right eyelid height
  • nose width
  • left mouth corner to mouth center distance
  • right mouth corner to mouth center distance
• We normalize these feature vectors and compute the L2 norm distance between the human geometry vector and the character geometry vectors with the correct expression label. Finally, we re-order the retrieved images within the matched label based on matched geometry.
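The geometric refinement step amounts to re-sorting the label-matched retrievals by L2 distance between geometry vectors. The sketch below is an assumption about one way to do this: it normalizes each vector to unit L2 length (the slides do not say which normalization was used), and the candidate names are placeholders.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rerank_by_geometry(query_geometry, candidates):
    """Re-order candidates by geometric closeness to the query.

    candidates is a list of (name, geometry_vector) pairs that already share
    the matched expression label; the closest geometry comes first.
    """
    query = l2_normalize(query_geometry)
    return sorted(
        candidates,
        key=lambda item: l2_distance(query, l2_normalize(item[1])),
    )
```

Each geometry vector here would hold the measurements listed above (eyebrow heights, eyelid heights, nose width, mouth corner distances) in a fixed order.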
Layers Visualization
Input
Filter – conv1 Features – conv1 Features – conv2 Features – conv3
Prediction label: Surprise
Top match results (Surprise and Joy)
[Figure columns: Query, Character Retrievals]
Expression based Retrieval
Using CCNN
Using HCNN
Evaluation
How close is the retrieved character expression label to the human query expression label?
Retrieval Score
Spearman rank correlation coefficient
Kendall τ test
Expert Comparison
Retrieval Score
• We measured the retrieval performance of our method by calculating the average normalized rank of relevant results (0 is the best score).
• The evaluation score for a query human expression image q was calculated as follows:
$$\mathrm{score}(q) = \frac{1}{N \cdot N_{rel}} \left( \sum_{k=1}^{N_{rel}} R_k - \frac{N_{rel}(N_{rel}+1)}{2} \right)$$
where N is the number of images in the database, N_rel is the number of database images with an expression label relevant to q, and R_k is the rank assigned to the kth relevant image.
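A short sketch of the average normalized rank, using the standard form of that measure: the sum of the ranks of the relevant images, minus the smallest sum those ranks could have, divided by N times N_rel. Treat the exact formula as an assumption consistent with the definitions of N, N_rel and R_k above; the function name is illustrative.

```python
def retrieval_score(relevant_ranks, n_images):
    """Average normalized rank of the relevant results (0 is the best score).

    relevant_ranks: 1-based ranks R_k given to the relevant images.
    n_images: N, the number of images in the database.
    """
    n_rel = len(relevant_ranks)
    if n_rel == 0 or n_images <= 0:
        raise ValueError("need at least one relevant image and a non-empty database")
    # Smallest possible rank sum: relevant images occupy ranks 1..n_rel.
    best_possible = n_rel * (n_rel + 1) / 2
    return (sum(relevant_ranks) - best_possible) / (n_images * n_rel)
```

When every relevant image is ranked ahead of every irrelevant one, the rank sum equals its minimum and the score is 0; the score grows as relevant images are pushed further down the list.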
Average retrieval score for each expression across all characters
Sample expert comparison
[Figure: a query image and three rows of five retrieved images (test1 to test5, Rank 1 to Rank 5), ordered by Expression, by Expert, and by Expression + Geometry]
Rank correlation coefficient
• Spearman rank correlation coefficient (the Pearson correlation coefficient computed on ranks)
  • The closer the value is to 1, the better the two rankings are correlated.
  • The average Spearman correlation coefficient for the 30 validation rank orderings is 0.773 ± 0.336.
  • Rank 1 correlation is 0.934 (the most relevant match).
• Kendall τ test
  • Pairwise error that represents how many pairs are ranked discordantly. The best matching ranks get a τ value of 1.
  • The average Kendall correlation coefficient for the 30 validation rank orderings is 0.706 ± 0.355.
  • Rank 1 correlation is 0.910 (the most relevant match).
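Both rank correlation measures are easy to compute for two orderings of the same items. The sketch below assumes there are no tied ranks, uses the textbook closed form for Spearman's coefficient and the tau-a variant of Kendall's coefficient, and is not taken from the authors' evaluation code.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, comparing an expert ordering `[1, 2, 3, 4, 5]` with a method ordering that swaps only the first two items gives a high but not perfect agreement under both measures, while a fully reversed ordering gives -1.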
Correlation metrics with expert
[Figure: Spearman and Kendall correlation coefficients (y-axis, from -0.4 to 1.2) for each of the 30 validation sets (x-axis)]
Part 2: Generating Character Expressions
Convolutional layer Max pooling layer Fully connected layer
Surprise
Fully Connected Convolutional Neural Network
Generating Character Expressions
Convolutional layer Max pooling layer Fully connected layer
N-1 feature vector
Generating Character Expressions
Convolutional layer Max pooling layer Fully connected layer
N-1 feature vector
Maya parameters
Learn character model parameters
Convolutional Max pooling Fully connected Soft max
N-1 feature vector
Maya parameters
Preliminary Result:
Disgust expression query
Disgust expression Parameter rendering
Applications
• Improve visual storytelling applications:
  • Animated films
  • Gaming
  • Online marketing
  • VR/AR experiences
  • Robotics
• Medically motivated application: teaching children with autism spectrum disorder (ASD) to both recognize and convey expressions using cartoon characters in an interactive environment.
Expression retrieval work to be presented at Asian Conference on Computer Vision (Nov 2016).
Project webpage http://grail.cs.washington.edu/projects/deepexpr/
Questions?