Learning deep representation from coarse to fine for face alignment



Zhiwen Shao, Shouhong Ding, Yiru Zhao, Qinchuan Zhang, and Lizhuang Ma

Department of Computer Science and Engineering, Shanghai Jiao Tong University, China

{shaozhiwen, feiben, yiru.zhao, qinchuan.zhang}@sjtu.edu.cn, [email protected]

Problem

Face alignment is the task of locating facial landmarks

Motivation & Challenges

Face alignment has many applications

• Face animation

• Face beautification

• Face preprocessing

There are some challenges

• Large pose, illumination and expression variations

• Partial occlusion

• Low quality

We need an effective method to represent highly complex faces

Ours vs. Others

Conventional methods

• Their results depend heavily on the initial shape

• Our network takes raw faces as input without any initialization

Deep learning methods

• They use cascaded networks or multitask learning

• Our method uses one network and doesn’t require extra facial attributes

Coarse-to-fine Training Algorithm

Detecting dense landmarks is difficult because each face carries so many labels

A few key landmarks coarsely determine the face shape

The given landmarks can be split into a principal subset and an elaborate subset

[Figure: principal subset and elaborate subset of landmarks]

Loss function

A weighting parameter in the loss function controls the relative weight of the principal subset
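The weighting idea can be illustrated with a minimal sketch. The poster's exact formula did not survive, so the squared-error form, the function name `weighted_landmark_loss`, and the parameter name `principal_weight` are all assumptions made for illustration.

```python
def weighted_landmark_loss(pred, target, principal_idx, principal_weight):
    """Squared-error loss over landmark coordinates (assumed form).

    Coordinates whose index is in principal_idx are scaled by
    principal_weight; all other (elaborate) coordinates have weight 1.
    """
    principal = set(principal_idx)
    loss = 0.0
    for i, (p, t) in enumerate(zip(pred, target)):
        w = principal_weight if i in principal else 1.0
        loss += w * (p - t) ** 2
    return 0.5 * loss


# Toy example: four coordinates, the first two belong to the principal subset.
pred = [0.5, 0.5, 0.2, 0.9]
target = [0.4, 0.5, 0.2, 0.7]
loss_equal = weighted_landmark_loss(pred, target, [0, 1], principal_weight=1.0)
loss_coarse = weighted_landmark_loss(pred, target, [0, 1], principal_weight=10.0)
```

With a larger `principal_weight`, errors on the principal coordinates dominate the loss, so training concentrates on the coarse face shape first.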

Predicting the locations of the principal subset lets the network extract the intrinsic facial structure

We further fine-tune the learned model by adjusting the relative weight of the principal subset
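To make the effect of adjusting the weight concrete, the sketch below computes what fraction of the total loss weight falls on the principal subset at each stage. The stage weights `[100.0, 10.0, 1.0]` are hypothetical, as is the assumption that the weight is lowered from stage to stage; the 24 principal outputs come from the network diagram, and the 68 landmarks (n = 136) are the standard 300-W annotation.

```python
def principal_share(weight, n_total, n_principal=24):
    """Fraction of the total loss weight carried by the principal outputs."""
    n_elaborate = n_total - n_principal
    return weight * n_principal / (weight * n_principal + n_elaborate)


n_total = 2 * 68                      # x and y for each of 68 landmarks
stage_weights = [100.0, 10.0, 1.0]    # hypothetical coarse-to-fine schedule
shares = [principal_share(w, n_total) for w in stage_weights]

for w, s in zip(stage_weights, shares):
    print(f"weight {w:6.1f}: principal share of loss weight = {s:.3f}")
```

Under these assumptions the early stage is almost entirely driven by the principal subset, and the final stage treats every coordinate equally.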

Deep convolutional network

[Figure: network architecture. Input 50×50×3; convolutional layers 3×3/1/1; max-pooling layers 2×2/2/0; feature maps 50×50×64, 25×25×64, 25×25×128, 13×13×128, 13×13×192, 7×7×192, 7×7×256, 4×4×256; fully-connected unit of size 256; output split into 24 and n-24 values (principal unit, elaborate unit)]

Algorithm discussions

The input is 50×50×3 for color face patches. n is equal to twice the total number of landmarks
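The sizes in the diagram can be checked with a short sketch. It assumes that "3×3/1/1" and "2×2/2/0" mean kernel/stride/padding and that pooling rounds up (ceil), which is what reproduces the 25 → 13 step. It also uses one convolution per stage for brevity, since a 3×3/1/1 convolution leaves the spatial size unchanged. The landmark counts per benchmark are the datasets' standard annotations, not values taken from the poster.

```python
import math


def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial size after a convolution (3x3/1/1 keeps the size)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2, pad=0):
    """Spatial size after max-pooling, assuming ceil rounding."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


size = 50
sizes = [size]
for _ in range(4):                    # four conv + pool stages
    size = pool_out(conv_out(size))
    sizes.append(size)

# n = 2 * (number of landmarks): one x and one y per landmark.
landmarks = {"Helen": 194, "300-W": 68, "COFW": 29}
output_split = {name: (24, 2 * k - 24) for name, k in landmarks.items()}
```

Here `sizes` traces the spatial resolution from the 50×50 input down to the 4×4 map, and `output_split` gives the (24, n-24) pair for each benchmark.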

Three face alignment benchmarks

• Helen, 300-W, COFW
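Results on benchmarks like these are usually reported as a mean error in percent. The sketch below shows one common definition: the average landmark-to-ground-truth distance divided by a normalizing distance. Normalizing by the distance between the two eyes is an assumption here, since the poster does not state its normalizer, and `mean_error_percent` is a hypothetical helper name.

```python
import math


def mean_error_percent(pred, truth, left_eye, right_eye):
    """Mean point-to-point error, normalized by eye distance, in percent."""
    norm = math.dist(left_eye, right_eye)
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return 100.0 * sum(dists) / (len(dists) * norm)


# Toy example with three landmarks; the first two are the eye centers.
truth = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0)]
pred = [(33.0, 44.0), (70.0, 40.0), (50.0, 65.0)]
error = mean_error_percent(pred, truth, truth[0], truth[1])
```

Dividing by a face-scale distance makes errors comparable across images of different resolution.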

Direct training algorithm (DT)

Coarse-to-fine training algorithm (CFT)

[Table: Comparison of mean errors (%) with other methods]

[Figure: Results of RCPR and CFT on several images from COFW]

[Figure: Results of CFT on several images from Helen and IBUG]

Conclusion

We propose a novel coarse-to-fine algorithm to train a deep convolutional network for facial landmark detection

Our method directly predicts the coordinates of landmarks using a single network without any additional operation, while significantly improving the accuracy of face alignment under severe occlusion

We believe that the proposed algorithm can be applied to other problems that use deep convolutional networks