Learning deep representation from coarse to fine for face alignment

Post on 15-Apr-2017



{shaozhiwen, feiben, yiru.zhao, qinchuan.zhang}@sjtu.edu.cn, ma-lz@cs.sjtu.edu.cn

Zhiwen Shao, Shouhong Ding, Yiru Zhao, Qinchuan Zhang, and Lizhuang Ma

Department of Computer Science and Engineering

Shanghai Jiao Tong University, China

Problem

Face alignment is the task of locating facial landmarks

Motivation & Challenges

Face alignment has many applications

• Face animation

• Face beautification

• Face preprocessing

It also faces several challenges

• Large variations in pose, illumination, and expression

• Partial occlusion

• Low image quality

We need an effective method to represent highly complex faces

Ours vs. Others

Conventional methods

• Their results depend heavily on the initial shape

• Our network takes raw faces as input without any initialization

Deep learning methods

• They use cascaded networks or multitask learning

• Our method uses one network and doesn’t require extra facial attributes

Comparison with other methods

Coarse-to-fine Training Algorithm

Detecting dense landmarks is difficult because each face carries a large number of labels

A few key landmarks coarsely determine the face shape

The given landmarks can be split into a principal subset and an elaborate subset

Principal subset

Elaborate subset

Loss function: a weighting coefficient controls the relative weight of the principal subset

Learning to predict the locations of the principal subset captures the intrinsic facial structure

We further fine-tune the learned model by adjusting the relative weight of the principal subset
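The weighting idea above can be sketched as follows. This is only an illustration, not the authors' exact loss: the squared-error form, the 0.5 factor, the names `weighted_landmark_loss` and `principal_weight`, the example weight values, and the assumption that the principal coordinates occupy the first 24 entries of the output vector are all hypothetical.

```python
def weighted_landmark_loss(pred, target, num_principal_coords=24,
                           principal_weight=1.0):
    """Squared-error loss with an adjustable weight on the principal subset.

    pred, target: flat sequences of n coordinates. The principal subset is
    assumed (hypothetically) to be stored in the first num_principal_coords
    entries and the elaborate subset in the remaining ones.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    principal_term = sum(squared[:num_principal_coords])
    elaborate_term = sum(squared[num_principal_coords:])
    # principal_weight sets how much the principal subset counts
    # relative to the elaborate subset.
    return 0.5 * (principal_weight * principal_term + elaborate_term)


# Hypothetical two-stage use: emphasise the principal subset first,
# then fine-tune with a different relative weight.
pred = [0.0] * 28
target = [1.0] * 28
coarse_loss = weighted_landmark_loss(pred, target, principal_weight=3.0)
fine_loss = weighted_landmark_loss(pred, target, principal_weight=1.0)
print(coarse_loss, fine_loss)
```

Only the weight changes between the two calls; the network and its output layout stay the same, which is what lets one model be trained coarsely and then fine-tuned.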

Deep convolutional network

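[Figure: network architecture. Legend: convolutional layer 3×3/1/1; max-pooling layer 2×2/2/0; fully-connected unit; principal unit; elaborate unit. Sizes labeled in the figure: input 50×50×3; feature maps 50×50×64, 25×25×64, 25×25×128, 13×13×128, 13×13×192, 7×7×192, 7×7×256, 4×4×256; fully-connected 256; outputs 24 (principal) and n-24 (elaborate).]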
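The spatial sizes labeled in the figure (50, 25, 13, 7, 4) can be checked with a little arithmetic. The sketch below assumes that "3×3/1/1" and "2×2/2/0" mean kernel/stride/padding and that pooling rounds its output size up; neither is stated on the poster. The helper names `conv_out` and `pool_out` are hypothetical, and one convolution per stage is used only because such a convolution does not change the spatial size.

```python
import math


def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution (floor rounding)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2, pad=0):
    """Spatial output size of max-pooling, assuming ceil rounding."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


size = 50  # input patches are 50×50
sizes = [size]
for _ in range(4):
    # A 3×3/1/1 convolution keeps the size; a 2×2/2/0 pooling halves it.
    size = pool_out(conv_out(size))
    sizes.append(size)

print(sizes)  # [50, 25, 13, 7, 4]
```

Under these assumptions, four pooling steps take the 50×50 input down to the 4×4 map that feeds the fully-connected units.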


Algorithm discussions

The input is a 50×50×3 color face patch. n equals twice the total number of landmarks (an x and a y coordinate per landmark)
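As a small worked example of the output sizes, the sketch below computes n and splits a prediction vector into the 24 principal outputs and the n-24 elaborate outputs. The function names, the 68-landmark example, and the choice to store the principal coordinates first are assumptions made for illustration.

```python
def output_sizes(num_landmarks, principal_coords=24):
    """Return (n, principal size, elaborate size) for a landmark count."""
    n = 2 * num_landmarks  # one x and one y per landmark
    if n < principal_coords:
        raise ValueError("fewer coordinates than the principal subset needs")
    return n, principal_coords, n - principal_coords


def split_prediction(pred, principal_coords=24):
    """Split a flat prediction into principal and elaborate parts.

    Assumes (hypothetically) that principal coordinates come first.
    """
    return pred[:principal_coords], pred[principal_coords:]


# Example with a hypothetical 68-landmark annotation.
n, n_principal, n_elaborate = output_sizes(68)
print(n, n_principal, n_elaborate)  # 136 24 112

principal, elaborate = split_prediction(list(range(n)))
print(len(principal), len(elaborate))  # 24 112
```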

Three face alignment benchmarks

• Helen, 300-W, COFW

Direct training algorithm (DT)

Coarse-to-fine training algorithm (CFT)

Results of RCPR and CFT on several images from COFW

Results of CFT on several images from Helen and IBUG
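Mean error (%) on benchmarks like these is commonly computed as the average landmark distance normalized by a reference length. The sketch below assumes normalization by the distance between the two eye centers, which the poster does not state; the function name `mean_error_percent` and its arguments are hypothetical.

```python
import math


def mean_error_percent(pred, truth, left_eye, right_eye):
    """Mean point-to-point error as a percentage of inter-ocular distance.

    pred, truth: sequences of (x, y) landmark positions for one face.
    left_eye, right_eye: (x, y) eye centers used for normalization
    (an assumed convention).
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equal length")
    norm = math.dist(left_eye, right_eye)
    if norm == 0:
        raise ValueError("eye centers must be distinct")
    errors = [math.dist(p, t) for p, t in zip(pred, truth)]
    return 100.0 * sum(errors) / (len(errors) * norm)


# Two landmarks: one is off by 5 pixels, the other is exact.
error = mean_error_percent(
    pred=[(0.0, 0.0), (10.0, 0.0)],
    truth=[(3.0, 4.0), (10.0, 0.0)],
    left_eye=(0.0, 0.0),
    right_eye=(50.0, 0.0),
)
print(error)  # 5.0
```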

Comparison of mean errors (%) with other methods

Conclusion

We propose a novel coarse-to-fine algorithm for training a deep convolutional network for facial landmark detection

Our method directly predicts landmark coordinates with a single network and no additional operations, while significantly improving face alignment accuracy under severe occlusion

We believe that the proposed algorithm can be applied to other problems that use deep convolutional networks