TDLC Workshop, August 2009 – Social Interaction Network Day
FACS: Facial Action Coding System
CERT: Computer Expression Recognition Toolbox
The Facial Action Coding System (Ekman & Friesen, 1978)
How do humans detect facial actions?
• Relative movement of parts of the face (motion based; video)
• Wrinkles and furrows (texture based; photo)
• Shape of parts of the face (texture based)
[Images: neutral face vs. brow raise; fear brow = AU 1+2+4]
Video examples
• Show ELAN clip of TDLC_30_MysteryBox (motion enhances human detection)
• Show ELAN clip of TDLC_24_MysteryBox (interpreting low intensity action)
• Handout contains a list of numerical codes for the most common facial action units.
• Briefly try producing: AU 1+2, 4, 7, 51, 53, 55, 61, 63, 12, 20, 26.
[Images: brow actions – AU 1 alone; AU 4 alone; AU 1+4; AU 1+2; AU 1+2+4; AU 5 alone]
Target Upper Face AUs
Target Lower Face AUs
[Images: AU 10, 12, 14, 20]
Basic Emotions
• Anger
• Disgust
• Fear
• Joy
• Sadness
• Surprise
DFAT Examples
Pain Actions
Real Pain with CERT
Warning
• Do not try to FACS code your own data.
• Real expressions are typically complex combinations of AUs at various intensities.
• Subtle differences not mentioned here.
C.E.R.T.
Please hold while I open the CERT GUI.
Computer Expression Recognition Toolbox (CERT)
[Diagram: CERT architecture – filter bank → feature selection → machine learning (one binary classifier per AU) → FACS outputs (AU 1, AU 2, …, AU 46) → meta labels, dynamics]
Developed at the Machine Perception Laboratory
Face detection
• Compaq dataset: 5,000 face images collected from the web and segmented by hand.
• 8,000 non-face images collected from the web.
• 30 frames/second (160×120 images, 2.1 GHz).
• 90% detection rate at a one-in-a-million false-positive rate on the CMU test set; equivalent to Viola & Jones.
• Source code available at http://mplab.ucsd.edu
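The MPLab detector's source is at the URL above; for a quick experiment, OpenCV's bundled Viola-Jones-style Haar cascade behaves comparably. A minimal sketch, assuming OpenCV is installed and "frame.png" is a stand-in input image:

```python
# Viola-Jones-style face detection with OpenCV's bundled Haar cascade,
# used here as a stand-in for the MPLab detector. File name and
# detection parameters are illustrative assumptions.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.png")               # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# scaleFactor/minNeighbors trade detection rate against false positives.
for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1,
                                             minNeighbors=5,
                                             minSize=(24, 24)):
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```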
Frontal ±10 degrees; Procrustes alignment using 4 points
Automatic registration
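A minimal sketch of the 4-point Procrustes (similarity) alignment step, using Umeyama's closed form; the particular landmarks and template coordinates below are illustrative assumptions, not CERT's actual values:

```python
# 4-point similarity (Procrustes) alignment via Umeyama's closed form:
# find scale s, rotation R, translation t minimizing
# sum_i ||s R p_i + t - q_i||^2.
import numpy as np

def procrustes_align(src, dst):
    """src, dst: (4, 2) arrays of corresponding landmarks."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # no reflections
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / src_c.var(0).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Hypothetical landmarks (e.g., eye and mouth corners) and template:
detected = np.array([[80, 95], [140, 93], [90, 170], [132, 172]], float)
template = np.array([[30, 40], [66, 40], [36, 72], [60, 72]], float)
s, R, t = procrustes_align(detected, template)
aligned = (s * (R @ detected.T)).T + t   # landmarks in template frame
```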
Training data: large datasets (over 10,000 images from over 300 subjects)
Expert FACS coding
Combined posed and spontaneous datasets:
• DFAT (Cohn-Kanade)
• Ekman-Hager
• MMI (Pantic et al.)
• D005, D006, and D007 (Frank et al.)
Training set size: 20,000-subject smile database
[Plot: performance vs. training set size (100 to 10,000 examples), approaching 90%; Whitehill et al.]
POFA+noise
Gabor representation
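A minimal sketch of a Gabor magnitude representation, computed with quadrature-pair filters; the orientation and wavelength grid is a generic assumption, not CERT's published filter-bank configuration:

```python
# Minimal Gabor magnitude representation of a registered face patch.
# The 8-orientation / 3-wavelength grid is an illustrative assumption.
import cv2
import numpy as np

def gabor_features(patch, n_orient=8, wavelengths=(4, 8, 16)):
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            # Quadrature pair (phases 0 and pi/2) gives the magnitude.
            k_re = cv2.getGaborKernel((21, 21), 0.5 * lam, theta, lam, 0.5, 0)
            k_im = cv2.getGaborKernel((21, 21), 0.5 * lam, theta, lam, 0.5,
                                      np.pi / 2)
            re = cv2.filter2D(patch, cv2.CV_32F, k_re)
            im = cv2.filter2D(patch, cv2.CV_32F, k_im)
            feats.append(np.hypot(re, im).ravel())
    return np.concatenate(feats)   # one long feature vector per patch
```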
SVM Classifiers
• 19 AUs with over 100 examples
• Unilaterals (left or right) for 3 AUs
• Fear, distress, smiles, blinks
• Yaw, pitch and roll
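A minimal sketch of the one-binary-classifier-per-AU setup with scikit-learn; the linear kernel and C value are assumptions, and X/au_labels are placeholder names:

```python
# One binary SVM per action unit over Gabor feature vectors; the margin
# (signed distance from the hyperplane) later serves as graded evidence.
import numpy as np
from sklearn.svm import SVC

def train_au_detectors(X, au_labels):
    """X: (n_frames, n_features); au_labels: dict AU id -> binary labels."""
    detectors = {}
    for au, y in au_labels.items():
        detectors[au] = SVC(kernel="linear", C=1.0).fit(X, y)
    return detectors

def au_evidence(detectors, X):
    """Frame-by-frame SVM margins, one stream per AU."""
    return {au: clf.decision_function(X) for au, clf in detectors.items()}
```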
Action Unit Recognition Performance

AU       Name                  Posed  Spont
1        Inner brow raise      .97    .89
2        Outer brow raise      .95    .82
4        Brow lower            .94    .74
5        Upper lid raise       .96    .79
6        Cheek raise           .92    .90
7        Lids tight            .91    .78
9        Nose wrinkle          .99    .87
10       Upper lip raise       .95    .79
12       Lip corner pull       .99    .92
14       Dimpler               .90    .77
15       Lip corner depress    .97    .86
17       Chin raise            .95    .80
18       Lip pucker            .83    .72
20       Lip stretch           .91    .62
23       Lip tighten           .85    .66
24       Lip press             .94    .75
25       Lips part             .96    .72
26       Jaw drop              .88    .71
1, 1+4   Distress brow         .94    .70
1+2+4    Fear brow             .95    .63
Mean                           .93    .77
Performance: area under the ROC (A′) = fraction correct on a 2-alternative forced choice; an unbiased sensitivity measure.
[Plot: ROC curve, hit rate vs. false-alarm rate (0 to 1); A′ = area under the curve; equal-error point marked]
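Since A′ equals 2AFC accuracy, it can be computed directly as the fraction of (positive, negative) pairs the score ranks correctly; a minimal sketch:

```python
# A' computed directly as 2AFC accuracy: the fraction of
# (positive, negative) score pairs ranked correctly, ties counting half.
import numpy as np

def a_prime(scores, labels):
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

scores = np.array([0.9, 0.8, 0.4, 0.35, 0.1])
labels = np.array([1, 1, 0, 1, 0])
print(a_prime(scores, labels))   # 0.833..., identical to the ROC area
```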
The SVM margin (CERT output) predicts human AU intensity label
[Plots: SVM margin vs. frame number for AU 4, AU 7, AU 9, …]
Correlation:
• Varies by subject
• Ranges from r = .34 to r = .93
• Mean r = .63
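A minimal sketch of how such per-subject correlations could be computed from margin time series and frame-aligned human intensity codes (all names are placeholders):

```python
# Per-subject Pearson correlation between the CERT margin time series
# and frame-aligned human intensity codes (e.g., A-E mapped to 1-5).
import numpy as np

def margin_intensity_corr(margins, intensities):
    """Both: dict subject_id -> 1-D arrays aligned frame by frame."""
    return {subj: np.corrcoef(m, intensities[subj])[0, 1]
            for subj, m in margins.items()}

# rs = margin_intensity_corr(margins, intensities)
# np.mean(list(rs.values()))   # the slides report a mean of r = .63
```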
Dynamic classifier output on 3.5 minutes of video of spontaneous behavior. Action unit 12 = zygomatic major (lip corner pull).
[Plot: evidence of AU 12 vs. video frame number, with human-coded episodes labeled A-E]
* Human AU codes (A-E): apex, onset-offset interval, low-to-high intensity
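Episodes like A-E can be recovered from the evidence stream by thresholding; a minimal onset/apex/offset segmentation sketch, where the threshold and smoothing window are assumptions:

```python
# Threshold a smoothed AU-evidence stream into (onset, apex, offset)
# episodes. The zero threshold and 15-frame window are assumptions.
import numpy as np

def find_episodes(evidence, thresh=0.0, smooth=15):
    s = np.convolve(evidence, np.ones(smooth) / smooth, mode="same")
    above = s > thresh
    edges = np.diff(above.astype(int))
    onsets = np.flatnonzero(edges == 1) + 1
    offsets = np.flatnonzero(edges == -1) + 1
    if above[0]:                      # stream starts inside an episode
        onsets = np.r_[0, onsets]
    if above[-1]:                     # stream ends inside an episode
        offsets = np.r_[offsets, len(s)]
    return [(on, on + int(np.argmax(s[on:off])), off)
            for on, off in zip(onsets, offsets)]
```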
Predicting self‐report of emotion
[Plot: dynamics of CERT output]
Surprise Expressions
[Plots: CERT output vs. frame number for Subjects 1-4; legend: * AU 1, o AU 2, AU 5]
Disgust Expressions
[Plots: CERT output vs. frame number for Subjects 1-4; legend: * AU 4, o AU 7, AU 9]
Testing: MLR Weighted Temporal Windows
Classifier for Real vs Fake Pain
[Diagram: 19 facial action channels (AU 1, AU 2, AU 4, …, AU 28), frame by frame → Gabor-like filter applied to a 500-frame window → statistics within episode → SVM (RBF) → real pain / fake pain / no pain]
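A minimal sketch of the episode-statistics stage feeding the RBF SVM; the particular statistics below (means, variability, inter-AU correlations) are illustrative stand-ins for the study's temporally filtered features:

```python
# Episode-level features over the 19 AU channels, then an RBF SVM.
import numpy as np
from sklearn.svm import SVC

def episode_features(au_streams):
    """au_streams: (n_frames, 19) CERT outputs for one episode."""
    corr = np.corrcoef(au_streams.T)        # 19x19 inter-AU correlations
    return np.concatenate([au_streams.mean(0),
                           au_streams.std(0),
                           corr[np.triu_indices(19, k=1)]])

# X = np.stack([episode_features(ep) for ep in episodes])
# y = ...  # 1 = real pain, 0 = faked pain
# clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
```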
Conclusions from pain study
• Automated expression coding identified similar facial actions to previous human coding studies for genuine pain and posed pain.
• Dynamical information, such as the variability of the actions' durations or the correlations between actions, was necessary for good classification performance.
• A learned classifier outperformed naïve humans at discriminating fake from real pain expressions, based on 1 minute of video of each:
  – 52% correct (human) for fake/real
  – 89% correct (CERT) for fake/real
TDLC study
• Judy Reilly, Marni Bartlett, Gwen Littlewort, Linda Phan, Grace Kang, and colleagues.
• 30 typically developing (TD) children aged 4-9 years, each doing a 45-minute session that includes many tasks designed to provoke facial expressions through imitation, natural social interaction, and games.
• Longitudinal study, building a database.
• Rapidly yields huge quantities of CERT output data.
• Need a variety of temporal-dynamics measures to apply to CERT outputs, such as event detection, measures of coordination and temporal integration, block statistics, and correlation distributions (see the sketch below).
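A minimal sketch of two such measures, block statistics and a per-block coordination (correlation) distribution; the block length is an assumption:

```python
# Block statistics and per-block coordination over long CERT streams.
# The 900-frame block (about 30 s at 30 fps) is an assumption.
import numpy as np

def block_stats(au_stream, block=900):
    """Summarize one AU channel as per-block mean, std, and max."""
    n = len(au_stream) // block
    blocks = au_stream[:n * block].reshape(n, block)
    return np.c_[blocks.mean(1), blocks.std(1), blocks.max(1)]

def coordination(a, b, block=900):
    """Distribution of per-block correlations between two AU channels."""
    n = min(len(a), len(b)) // block
    return np.array([np.corrcoef(a[i * block:(i + 1) * block],
                                 b[i * block:(i + 1) * block])[0, 1]
                     for i in range(n)])
```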
Show video: subject 30, mystery box, item 2.
Head direction-activity plot
[Plots: active child vs. calm child]
Finding a needle in a haystack – trajectory of synchronized and sequentially phased actions for one subject. 4 seconds of activity.
Comparing coordination of actions in an older and a younger child during the latency period of a mystery-box task: trajectories of frown and smile during a moment of “recognition”.
Social referencing? Returning gaze to the adult interviewer as puzzlement fades.