Facial Animation


Transcript of Facial Animation

Page 1: Facial Animation

Facial Animation

By: Shahzad Malik

CSC2529 Presentation

March 5, 2003

Page 2: Facial Animation

Motivation

Realistic human facial animation is a challenging problem (many degrees of freedom, complex deformations)

Would like expressive and plausible animations of a photorealistic 3D face

Useful for virtual characters in film and video games

Page 3: Facial Animation

Papers

Three major areas: lip syncing, face modeling, and expression synthesis

Video Rewrite (Bregler – SIGGRAPH 1997)

Making Faces (Guenter – SIGGRAPH 1998)

Expression Cloning (Noh – SIGGRAPH 2001)

Page 4: Facial Animation

Video Rewrite

Generate a new video of an actor mouthing a new utterance by piecing together old footage

Two stages: Analysis and Synthesis

Page 5: Facial Animation

Analysis Stage

Given footage of the subject speaking, extract mouth position and lip shape

Hand label 26 training images:

34 points on the mouth (20 on the outer boundary, 12 on the inner boundary, 1 at the bottom of the upper teeth, 1 at the top of the lower teeth)

20 points on the chin and jaw line

Morph the training set to get 351 images

Page 6: Facial Animation

EigenPoints

Create EigenPoints models using this set

Use the derived EigenPoints model to label features in all frames of the training video

Page 7: Facial Animation

EigenPoints (continued)

Problem: EigenPoints assumes features are undergoing pure translation

Page 8: Facial Animation

Face Warping

Before EigenPoints labeling, warp each image into a reference plane

Use a minimization algorithm to register images:

$$E(M) = \sum_i \left[\, T(\mathbf{x}'_i) - I_F(\mathbf{x}_i) \,\right]^2, \qquad \mathbf{x}' = M\,\mathbf{x}, \quad \mathbf{x} = (x,\, y,\, 1)^T, \quad M = \begin{pmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & m_8 \end{pmatrix}$$

where T is the template image and I_F is the face image being registered
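
A minimal sketch of evaluating this registration error for a candidate warp M. This is a generic sum-of-squares evaluator, not the paper's actual minimizer; it assumes grayscale numpy images and integer sample coordinates taken from the rigid parts of the face:

```python
import numpy as np

def warp_error(template, image, M, points):
    """E(M): sum of squared differences between the template and the
    image sampled at warped coordinates x' = M x (homogeneous)."""
    pts = np.asarray(points, dtype=int)            # (N, 2) integer pixel coords
    x = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords (N, 3)
    xp = (M @ x.T).T
    xp = xp[:, :2] / xp[:, 2:3]                    # perspective divide
    # Nearest-neighbor sampling; a real implementation would interpolate.
    xi = np.clip(np.round(xp).astype(int), 0,
                 np.array(image.shape[::-1]) - 1)
    diff = (template[pts[:, 1], pts[:, 0]].astype(float)
            - image[xi[:, 1], xi[:, 0]].astype(float))
    return float(np.sum(diff ** 2))
```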

Page 9: Facial Animation

Face Warping (continued)

Use rigid parts of the face to estimate warp M

Warp face by M⁻¹

Perform EigenPoints analysis

Back-project features by M onto the face

Page 10: Facial Animation

Audio Analysis

Want to capture the visual dynamics of speech

Phonemes are not enough

Must consider coarticulation: lip shapes for many phonemes are modified based on the phoneme's context (e.g. /T/ in "beet" vs. /T/ in "boot")

Page 11: Facial Animation

Audio Analysis (continued)

Segment speech into triphones

e.g. "teapot" becomes /SIL-T-IY/, /T-IY-P/, /IY-P-AA/, /P-AA-T/, and /AA-T-SIL/ (see the sketch below)

Emphasize the middle of each triphone

This effectively captures forward and backward coarticulation
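
A small Python sketch of the segmentation step, padding the utterance with silence so the "teapot" example above falls out directly:

```python
def to_triphones(phonemes):
    """Split a phoneme sequence into overlapping triphones,
    padding both ends with silence (/SIL/)."""
    padded = ["SIL"] + list(phonemes) + ["SIL"]
    return ["-".join(padded[i:i + 3]) for i in range(len(padded) - 2)]

print(to_triphones(["T", "IY", "P", "AA", "T"]))
# ['SIL-T-IY', 'T-IY-P', 'IY-P-AA', 'P-AA-T', 'AA-T-SIL']
```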

Page 12: Facial Animation

Audio Analysis (continued)

Training footage audio is labeled with phonemes and their timing

Use gender-specific HMMs for segmentation

Convert the transcript into triphones

Page 13: Facial Animation

Synthesis Stage

Given some new speech utterance:

Mark it with phoneme labels

Determine triphones

Find a video example with the desired transition in the database

Compute a matching distance to each triphone:

error = α·Dp + (1 − α)·Ds

Page 14: Facial Animation

Viseme Classes

Cluster phonemes into viseme classes

Use 26 viseme classes (10 consonant, 15 vowel):

(1) /CH/, /JH/, /SH/, /ZH/

(2) /K/, /G/, /N/, /L/

…

(25) /IH/, /AE/, /AH/

(26) /SIL/

Page 15: Facial Animation

Phoneme Context Distance

Dp is the phoneme context distance:

Distance is 0 if the phonemic categories are the same (e.g. /P/ and /P/)

Distance is 1 if the viseme classes are different (e.g. /P/ and /IY/)

Distance is between 0 and 1 if the phonemic classes differ but the viseme class is the same (e.g. /P/ and /B/)

Compute over the entire triphone, weighting the center phoneme most (see the sketch below)
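
A minimal Python sketch of this distance. The viseme table is only partially known from the previous slide, so the entries here, the shared class for /P/ and /B/, the 0.5 same-viseme penalty, and the (0.25, 0.5, 0.25) center-weighting are all assumptions for illustration:

```python
# Partial, assumed viseme table; the paper uses 26 classes (10 consonant, 15 vowel).
VISEME_CLASS = {
    "CH": 1, "JH": 1, "SH": 1, "ZH": 1,
    "K": 2, "G": 2, "N": 2, "L": 2,
    "P": 3, "B": 3,                      # assumed shared bilabial class
    "IH": 25, "AE": 25, "AH": 25,
    "SIL": 26,
}

def phoneme_distance(p, q, same_class_penalty=0.5):
    """Per-phoneme term of Dp (the 0.5 penalty is an assumption)."""
    if p == q:
        return 0.0                       # same phonemic category
    cp, cq = VISEME_CLASS.get(p), VISEME_CLASS.get(q)
    if cp is not None and cp == cq:
        return same_class_penalty        # same viseme, different phoneme
    return 1.0                           # different viseme classes

def context_distance(t1, t2, weights=(0.25, 0.5, 0.25)):
    """Dp over whole triphones t1, t2, weighting the center phoneme most."""
    return sum(w * phoneme_distance(p, q)
               for w, p, q in zip(weights, t1, t2))
```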

Page 16: Facial Animation

Lip Shape Distance

Ds is the distance between lip shapes in overlapping triphones

e.g. for "teapot", the contours for /IY/ and /P/ should match between /T-IY-P/ and /IY-P-AA/

Compute the Euclidean distance between 4-element vectors (lip width, lip height, inner lip height, height of visible teeth)

The solution depends on neighbors in both directions (use dynamic programming); Ds and the combined error are sketched below
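
Ds and the combined matching distance are straightforward to sketch; the α = 0.5 trade-off here is an assumption, not the paper's setting:

```python
import numpy as np

def lip_shape_distance(shape_a, shape_b):
    """Ds: Euclidean distance between 4-element lip vectors
    (lip width, lip height, inner lip height, visible-teeth height)."""
    return float(np.linalg.norm(np.asarray(shape_a) - np.asarray(shape_b)))

def matching_error(dp, ds, alpha=0.5):
    """error = alpha * Dp + (1 - alpha) * Ds."""
    return alpha * dp + (1 - alpha) * ds
```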

Page 17: Facial Animation

Time Alignment of Triphone Videos

Need to combine triphone videos

Choose the portion of the overlapping triphones where the lip shapes are as close as possible

This is already done when computing Ds

Page 18: Facial Animation

Time Alignment to Utterance

Still need to time align with the target audio

Compare the corresponding phoneme transcripts

The start time of the center phoneme in the triphone is aligned with its label in the target transcript

Video is then stretched/compressed to fit the time needed between target phoneme boundaries

Page 19: Facial Animation

Combining Lips and Background

Need to stitch the new mouth movie into the background (original face) sequence

Compute transform M as before

A warping replacement mask defines the mouth and background portions in the final video

[Figures: mouth mask, background mask]

Page 20: Facial Animation

Combining Lips and Background

Mouth shape comes from the triphone image, and is warped using M

Jaw shape is a combination of the background jaw and triphone jaw lines

Near the ears the jaw depends on the background; near the chin it depends on the mouth

Illumination matching is used to avoid seams between mouth and background

Page 21: Facial Animation

Video Rewrite Results

Video: 8 minutes of video, 109 sentences

Training data: front-facing segments of the video, around 1700 triphones

[Video: "Emily" sequences]

Page 22: Facial Animation

Video Rewrite Results

2 minutes of video, 1157 triphones

[Video: JFK sequences]

Page 23: Facial Animation

Video Rewrite

Image-based facial animation system

Driven by audio

Output sequence created from real video

Allows natural facial movements (eye blinks, head motions)

Page 24: Facial Animation

Making Faces

Allows capturing facial expressions in 3D from a video sequence

Provides a 3D model and texture that can be rendered on 3D hardware

Page 25: Facial Animation

Data Capture

Actor's face is digitized using a Cyberware scanner to get a base 3D mesh

Six calibrated video cameras capture the actor's expressions

[Figure: six camera views]

Page 26: Facial Animation

Data Capture

182 dots are glued to the actor's face

Each dot is one of six colors with fluorescent pigment

Dots of the same color are placed as far apart as possible

Dots follow the contours of the face (eyes, lips, nasolabial furrows, etc.)

Page 27: Facial Animation

Dot Labeling

Each dot needs a unique label

Dots will be used to warp the 3D mesh

Also used later for texture generation from the six views

For each frame in each camera:

Classify each pixel as belonging to one of six color categories

Find connected components

Compute the centroid of each component

Page 28: Facial Animation

Dot Labeling (continued)

Need to compute dot correspondences between camera views

Must handle occlusions and false matches

Computing all point correspondences between k cameras and n 2D dots gives $\binom{k}{2}\, n^2$ point correspondences

Page 29: Facial Animation

Dot Labeling (continued)

For each correspondence:

Triangulate a 3D point from the closest intersection of rays cast through the 2D dots (sketched below)

Check whether the back-projection error is above some threshold

All 3D candidates below the threshold are stored
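
A sketch of the triangulation step: the "closest intersection" of two rays is the midpoint of their common perpendicular, a standard least-squares construction. The ray origins and unit directions would come from the calibrated cameras; the back-projection check then keeps only candidates whose reprojection error is small:

```python
import numpy as np

def closest_point_between_rays(o1, d1, o2, d2):
    """Midpoint of the common perpendicular between two rays
    (origin o, unit direction d); returns None for near-parallel rays."""
    b = o2 - o1
    c = np.dot(d1, d2)
    denom = 1.0 - c * c
    if denom < 1e-9:                    # rays nearly parallel
        return None
    # Ray parameters minimizing |(o1 + t1*d1) - (o2 + t2*d2)|^2.
    t1 = (np.dot(b, d1) - np.dot(b, d2) * c) / denom
    t2 = (np.dot(b, d1) * c - np.dot(b, d2)) / denom
    return 0.5 * ((o1 + t1 * d1) + (o2 + t2 * d2))
```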

Page 30: Facial Animation

Dot Labeling (continued)

Project the stored 3D points into a reference view

Keep points that are within 2 pixels of dots in the reference view

These points are potential 3D matches for a given 2D dot

Compute the average as the final 3D position

Assign it to the 2D dot in the reference view

Page 31: Facial Animation

Dot Labeling (continued)

Need to assign consistent labels to 3D dot locations across the entire sequence

Define a reference set of dots D (frame 0)

Let d_j ∈ D be the neutral location for dot j; the position of d_j at frame i is $d_j^i = d_j + v_j^i$

For each reference dot, find the closest 3D dot of the same color within some distance ε

Page 32: Facial Animation

Moving the Dots

Move each matched reference dot to its matched location

For an unmatched reference dot d_k, let n_k be the set of neighboring dots with a match in the current frame i, and take the mean of their offsets:

$$\mathbf{v}_k^i = \frac{1}{|n_k|} \sum_{d_j \in n_k} \mathbf{v}_j^i$$
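
In code this neighbor averaging is essentially one line; a sketch where `neighbor_sets[k]` lists the neighbors of dot k that did match at the current frame and `offsets[j]` holds v_j^i:

```python
import numpy as np

def unmatched_dot_offset(k, offsets, neighbor_sets):
    """v_k^i = (1 / |n_k|) * sum over d_j in n_k of v_j^i."""
    return np.mean([offsets[j] for j in neighbor_sets[k]], axis=0)
```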

Page 33: Facial Animation

Constructing the Mesh

The Cyberware scan has problems:

Fluorescent markers cause bumps on the mesh

No mouth opening

Too many polygons

Bumps are removed manually

Split mouth polygons; add teeth and tongue polygons

Run a mesh simplification algorithm (Hoppe's algorithm: 460k down to 4800 polygons)

Page 34: Facial Animation

Moving the Mesh

Move vertices by a linear combination of the offsets of the nearest dots:

$$\mathbf{p}_j^i = \mathbf{p}_j + \sum_{d_k \in D} \alpha_{jk} \left( \mathbf{d}_k^i - \mathbf{d}_k \right), \qquad \text{where} \quad \sum_k \alpha_{jk} = 1$$
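
A sketch of the per-vertex update, assuming the blend weights for vertex j (one weight per dot, summing to 1) have already been assigned as described on the following slides:

```python
import numpy as np

def displace_vertex(p_j, dots_neutral, dots_frame, alpha_j):
    """p_j^i = p_j + sum_k alpha_jk * (d_k^i - d_k)."""
    offsets = dots_frame - dots_neutral   # (num_dots, 3) per-dot offsets
    return p_j + alpha_j @ offsets        # alpha_j: (num_dots,), sums to 1
```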

Page 35: Facial Animation

Assigning Blend Coefficients

Assign blend coefficients for a grid of 1400 evenly distributed points on the face

Page 36: Facial Animation

Assigning Blend Coefficients

Label each dot, vertex, and grid point as above, below, or neither

Find the 2 closest dots to each grid point p

D_n is the set of dots within 1.8(d1 + d2)/2 of p

Remove dots lying in roughly the same direction

Assign blend values based on distance from p

Page 37: Facial Animation

Assigning Blend Coefficients (continued)

If a dot is not in D_n, then its α is 0

If a dot is in D_n, let $l_i = \dfrac{1}{\|\mathbf{d}_i - \mathbf{p}\|}$; then $\alpha_i = \dfrac{l_i}{\sum_{d_j \in D_n} l_j}$

For vertices, find the closest grid points and copy their blend coefficients (see the sketch below)
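
A sketch of the weight assignment for one grid point, following the slide's inverse-distance rule; the extra step of pruning dots that lie in roughly the same direction is omitted here:

```python
import numpy as np

def blend_coefficients(p, dots, radius_scale=1.8):
    """alpha_i = l_i / sum(l), with l_i = 1 / ||d_i - p|| over dots in D_n,
    where D_n = dots within 1.8 * (d1 + d2) / 2 of p (d1, d2 are the two
    closest dot distances); all other dots get alpha = 0."""
    dists = np.linalg.norm(dots - p, axis=1)
    d1, d2 = np.sort(dists)[:2]
    in_dn = dists <= radius_scale * (d1 + d2) / 2.0
    alpha = np.zeros(len(dots))
    l = 1.0 / np.maximum(dists[in_dn], 1e-9)   # l_i = 1 / ||d_i - p||
    alpha[in_dn] = l / l.sum()                 # normalize so weights sum to 1
    return alpha
```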

Page 38: Facial Animation

Dot Removal

Substitute skin color for the dot colors

First low-pass filter the image

A directional filter prevents color bleeding (black = symmetric, white = directional in the mask)

[Figures: face with dots, low-pass mask]

Page 39: Facial Animation

Dot Removal (continued)

Extract a rectangular patch of dot-free skin

High-pass filter this patch

Register the patch to the center of the dot regions

Blend it with the low-frequency skin

Clamp hue values to a narrow range

Page 40: Facial Animation

Dot Removal (continued)

[Figures: original, low-pass, high-pass, hue-clamped]

Page 41: Facial Animation

Texture Generation

A texture map is generated for each frame:

Project the mesh onto a cylinder

Compute the mesh location (k, β1, β2) for each texel (u, v)

For each camera, transform the mesh into the camera's view

For each texel, get its 3D coordinates on the mesh

Project the 3D point to the camera plane

Get the color at (x, y) and store it as the texel color (see the sketch below)
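
A per-texel sketch of the lookup. `camera.project` returning a pixel (x, y) is an assumed interface, not an API from the paper, and sampling is nearest-neighbor for brevity:

```python
def texel_color(point_3d, camera, frame):
    """Project one texel's 3D mesh point into a camera and sample the frame."""
    x, y = camera.project(point_3d)       # assumed camera interface
    xi, yi = int(round(x)), int(round(y))
    h, w = frame.shape[:2]
    if 0 <= xi < w and 0 <= yi < h:
        return frame[yi, xi]
    return None                           # texel not visible in this camera
```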

Page 42: Facial Animation

Texture Generation (continued)

Compute a texel weight (the dot product between the texel normal on the mesh and the direction to the camera)

Merge the texture maps from all the cameras based on the weight map (sketched below)
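
A sketch of the merge, assuming the per-camera texture maps and per-texel normals/view directions have already been resampled into texture space; back-facing texels get zero weight:

```python
import numpy as np

def merge_textures(textures, normals, view_dirs):
    """Blend per-camera textures with weights proportional to max(0, n . v).

    textures:  (num_cams, H, W, 3)  per-camera texture maps
    normals:   (H, W, 3)            texel normals on the mesh
    view_dirs: (num_cams, H, W, 3)  unit directions from texel to camera
    """
    w = np.einsum('hwc,khwc->khw', normals, view_dirs)
    w = np.clip(w, 0.0, None)                        # back-facing -> weight 0
    w = w / np.maximum(w.sum(axis=0, keepdims=True), 1e-9)
    return np.einsum('khw,khwc->hwc', w, textures)
```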

Page 43: Facial Animation

Results

Page 44: Facial Animation

Making Faces Summary

3D geometry and texture of a face

Data is generated for each frame of video

Shading and highlights are "glued" to the face

Nice results, but not totally automated

Need to repeat the entire process for every new face we want to animate

Page 45: Facial Animation

Expression Cloning

Allows facial expressions to be mapped from one model to another

[Figure: source model animation mapped onto the target model]

Page 46: Facial Animation

Expression Cloning Outline

[Diagram: motion capture data (or any animation mechanism) drives the source model to produce the source animation; dense surface correspondences deform the source model onto the target model; motion transfer carries the vertex displacements to the target, producing the cloned expressions of the target animation]

Page 47: Facial Animation

Source Animation Creation

Use any existing facial animation method

e.g. the "Making Faces" paper described earlier

[Figure: motion capture data applied to the source model yields the source animation]

Page 48: Facial Animation

Dense Surface Correspondence

Manually select 15-35 correspondences

Morph the source model using RBFs (radial basis functions)

Perform a cylindrical projection (cast a ray through each source vertex into the target mesh)

Compute the barycentric coordinates of each intersection

[Figures: initial features, after RBF morph, after projection]

Page 49: Facial Animation

Automatic Feature Selection

Can also automate the initial correspondences

Use basic facts about human head geometry:

Tip of nose = point with the highest Z value

Top of head = point with the highest Y value

Currently use around 15 such heuristic rules (sketched below)
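
The two rules stated on the slide are one-liners; a sketch (the remaining rules are not given here, and the axis conventions are an assumption):

```python
import numpy as np

def heuristic_features(vertices):
    """Pick initial correspondences from mesh geometry
    (assumes +Z toward the viewer and +Y up)."""
    v = np.asarray(vertices)                   # (N, 3) vertex positions
    return {
        "nose_tip": int(np.argmax(v[:, 2])),   # highest Z value
        "head_top": int(np.argmax(v[:, 1])),   # highest Y value
    }
```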

Page 50: Facial Animation

Example Deformations

[Figures: source, deformed source, and target models]

The deformed source closely approximates the target models

Page 51: Facial Animation

Animation with Motion Vectors

Animate by displacing each target vertex by the motion of the corresponding source point

Interpolate barycentric coordinates of target vertices based on source vertices

Need to project the target model onto the source model (the opposite of what we did before)

Page 52: Facial Animation

Motion Vector Transfer

Need to adjust the direction and magnitude of each motion vector

[Figures: source and target models; source motion vector vs. proper target motion vector]

Page 53: Facial Animation

Motion Vector Transfer (continued)

Attach a local coordinate system to each vertex in the source and deformed source:

X-axis = average normal

Y-axis = projection of any adjacent edge onto the plane with the X-axis as its normal

Z-axis = cross product of X and Y (see the sketch below)

[Figure: local (X, Y, Z) frames attached to the source, deformed source, and target models]

Page 54: Facial Animation

Motion Vector Transfer

Compute the transformation between the two coordinate systems

This mapping determines the deformed source model's motion vectors

Page 55: Facial Animation

Example Motion Transfer

[Figures: source and target models with their motion vectors; the adjusted target motions are more horizontal and smaller than the source motions]

Page 56: Facial Animation

Results

[Figures: source expression and cloned targets, showing an angry expression, a distorted mouth, and a big open mouth]

Page 57: Facial Animation

Summary

Expression Cloning can animate new models using a library of existing expressions

Transfers motion vectors from source animations to target models

The process is fast and can be fully automated