
2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), Regina, SK, Canada, May 5-8, 2013

FAST FACIAL EXPRESSION RECOGNITION BASED ON LOCAL BINARY PATTERNS

Rohit Verma and Mohamed-Yahia Dabbagh

Dept. of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada
Emails: [email protected], [email protected]

ABSTRACT

Automatic facial expression analysis is an interesting and challenging problem which impacts important applications in many areas such as human-computer interaction and data-driven animation. Deriving effective facial representative features from face images is a vital step towards successful expression recognition. In this paper, we evaluate facial representation based on statistical local features called Local Binary Patterns (LBP) for facial expression recognition. Simulation results illustrate that LBP features are effective and efficient for facial expression recognition. A real-time implementation of the proposed approach is also demonstrated which can recognize expressions accurately at the rate of 4.8 frames per second.

Index Terms: Facial expression recognition, Local Binary Patterns, Support vector machine, AdaBoost.

1. INTRODUCTION

Recently, automatic facial expression recognition has attracted a lot of interest. Some of its applications include emotion detection and analysis, human-computer interaction, and face animation. Human emotions are expressed to a significant degree through facial expressions, which can be very helpful in interactive systems. Most systems are able to recognize basic prototype emotions such as Happy, Sad, Surprise, Anger, Fear and Disgust. These general expressions are detected using certain variations of the facial features, such as broadening of the mouth, closing of the eyes, or twitching of the nose. In day-to-day life, however, all of these variations rarely occur together. Instead, small gestures such as tightening of the lips in anger, lowering of the lips in sadness, and raising of the eyebrows to greet someone are the more common expressions. Some of the basic cues which reveal the current expression of the face are relative displacements of features (such as opening the mouth), texture changes of the skin, and changes in skin hue (due to blushing). Although a lot of progress has been made so far [1–3], better automatic facial expression recognition systems are required to capture these subtle, complex and variable gestures.

LBP features were originally proposed for texture analysis [4], and have recently been introduced to represent faces in facial image analysis. The most important properties of LBP features are their tolerance against illumination changes and their computational simplicity. LBP has been used along with linear programming (LP) [1] to recognize facial expressions. In [2], LBP features were compared with Gabor-filter features and it was concluded that LBP features perform better for low resolution images. In this paper, a simple but effective set of localized facial features based on local binary patterns is proposed. Various machine learning approaches, including Support Vector Machine (SVM) [5] and Adaptive Boosting (AdaBoost) [6, 7], are also examined for facial expression recognition. One limitation of the existing techniques is that they are slow in extracting the facial features and recognizing the expression; therefore, a real-time implementation of the proposed approach is also presented.

Fig. 1: Facial expression recognition system

2. FAST FACIAL EXPRESSION RECOGNITION

A facial expression recognition system can be divided into three blocks as shown in Fig. 1. Facial emotion recognition involves the important steps of facial feature representation and classifier design. Facial representation derives a set of features which effectively represents the face. These features can be either geometric features or appearance features. Geometric features include the location and shape of facial components; these facial feature points form the feature vector that represents the face geometry. The appearance features generally refer to the skin texture changes on the face, such as wrinkles and furrows. A good classifier design is also important for the correct recognition of expressions. The proposed algorithm performs the following three steps for each face image.

2013 26th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), 978-1-4799-0033-6/13/$31.00 ©2013 IEEE

(a) LBP(4,4) (b) LBP(8,1) (c) LBP(8,4)

Fig. 2: Local Binary Patterns

Face detection: Viola and Jones proposed a real-time object detection approach in [8]. Training is slow but detection is very fast. Detection is efficient because it is based on Haar-like features that encode the existence of oriented contrasts between different regions of the image. A set of these features is used to encode the contrasts exhibited by a human face and their spatial relationships. This approach was therefore used to extract the facial regions from the images.

Feature extraction: Local binary pattern (LBP) [4] is a gray-scale invariant texture description operator which labels the pixels of an image by thresholding the neighborhood of each pixel with the center value and considering the result as a binary pattern. Properties such as gray-scale invariance and the absence of normalization within a neighbor window make it favorable for texture recognition. LBP is a non-parametric method, which means that no underlying assumptions are needed for its computation. The LBP operator is an excellent measure of the spatial structure around a pixel. Various circular neighborhoods can be obtained using bilinear interpolation of the pixel values, thereby allowing any radius and number of pixels in the neighborhood of a center pixel. Fig. 2 shows various possible combinations to form a circularly symmetric neighbor set LBP(P,R), where P is the number of neighboring pixels and R is the radius of operation. If B(i) denotes the binary bit at the neighboring pixel i, I(i) the intensity at that pixel, and I(c) the intensity at the center pixel c of the block, then the LBP transform of the ith neighbor is given by

$$B(i) = \begin{cases} 1, & \text{if } I(i) \ge I(c) \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
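As an illustration of Eq. (1), the following minimal Python sketch computes the LBP(8,1) label of a pixel and the 256-bin histogram of labels. It is not the authors' Matlab implementation. The neighbor ordering (clockwise from the top-left pixel, first neighbor as the most significant bit), the skipping of border pixels, and all function names are assumptions made for this example.

```python
# Offsets (row, col) of the 8 neighbors at radius 1, clockwise from top-left.
# Assumption: the paper does not state the bit ordering; any fixed order works.
NEIGHBORS_8_1 = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                 (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Return the LBP(8,1) label (0..255) of the interior pixel (r, c)."""
    center = image[r][c]
    code = 0
    for dr, dc in NEIGHBORS_8_1:
        # B(i) from Eq. (1): 1 if the neighbor is at least as bright as the center.
        bit = 1 if image[r + dr][c + dc] >= center else 0
        code = (code << 1) | bit
    return code


def lbp_image(image):
    """LBP labels of all interior pixels (border pixels are skipped)."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


def lbp_histogram(labels):
    """256-bin occurrence histogram of LBP labels."""
    hist = [0] * 256
    for row in labels:
        for value in row:
            hist[value] += 1
    return hist


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch, 1, 1))  # bits 1,0,0,0,1,1,1,1 -> 143
```

Because only the sign of the difference to the center pixel is used, adding a constant brightness offset to the whole patch leaves the label unchanged, which is the illumination tolerance mentioned in the introduction.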

Fig. 3 shows an example of the generation of LBP(8,1) features. The operator thresholds the 3×3 neighborhood of each pixel with the center value and stores the result as a binary number. A 256-bin histogram of the LBP labels is then generated. LBP is a very powerful primitive texture descriptor which can identify various types of edges, flat areas, spots, etc., as shown in Fig. 4. A uniform LBP contains at most two bitwise transitions, from either 0 to 1 or 1 to 0. It has been shown that these patterns account for over 90% of all patterns in the LBP(8,1) neighborhood [4].

Fig. 3: LBP(8,1) Operator

Fig. 4: Various edge textures detected using LBP(8,1) (panels: Dark spot, Bright spot, Line end, Edge, Corner)

Fig. 5: Viola-Jones face detection and LBP(8,1) transform

Computing the LBP histogram over a whole image gives only the occurrence frequency of each pattern, without any information on regional variations; it is therefore beneficial to take the face shape into account when localizing these primitive patterns. Most emotions can be effectively represented by variations in the primitive LBP features of the eye and mouth regions. Therefore, in order to generate a simple feature vector, a sample face image from the JAFFE database [9] was divided into regions containing the two eyes and the mouth, as shown in Fig. 5. The LBP features extracted from the two eye sub-regions were concatenated into a feature histogram and then averaged by two. The LBP features extracted from the mouth region were appended to this histogram. The uniform LBP(8,1) operator was used to obtain a 59-bin histogram for each region. Thus the final LBP histogram had a length of 118 (59×2).
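The 59-bin uniform histogram and the 118-element feature vector described above can be sketched in Python as follows. This is a hedged illustration, not the authors' code. It assumes that the 58 uniform 8-bit patterns each get their own bin and all non-uniform patterns share one extra bin, that raw counts are used without normalization, and that the two eye histograms are averaged element-wise into a single 59-bin histogram (an interpretation consistent with the stated length of 118). All names are hypothetical.

```python
def transitions(code, bits=8):
    """Count circular 0->1 and 1->0 transitions in a bit pattern."""
    mask = (1 << bits) - 1
    rotated = ((code << 1) & mask) | (code >> (bits - 1))  # rotate left by one
    return bin(code ^ rotated).count("1")


# Uniform patterns have at most two transitions; there are 58 of them for 8 bits.
UNIFORM_CODES = [c for c in range(256) if transitions(c) <= 2]
BIN_OF = {code: index for index, code in enumerate(UNIFORM_CODES)}
NON_UNIFORM_BIN = len(UNIFORM_CODES)  # shared bin 58 for all other patterns


def uniform_histogram(labels):
    """59-bin histogram of LBP(8,1) labels given as a 2-D list."""
    hist = [0.0] * (NON_UNIFORM_BIN + 1)
    for row in labels:
        for code in row:
            hist[BIN_OF.get(code, NON_UNIFORM_BIN)] += 1
    return hist


def face_feature(left_eye, right_eye, mouth):
    """118-element vector: averaged eye histogram followed by mouth histogram."""
    left = uniform_histogram(left_eye)
    right = uniform_histogram(right_eye)
    eyes = [(a + b) / 2 for a, b in zip(left, right)]  # assumed element-wise mean
    return eyes + uniform_histogram(mouth)


# Tiny made-up label maps standing in for the LBP transform of each region.
feature = face_feature([[0, 255]], [[0, 1]], [[5]])
print(len(UNIFORM_CODES), len(feature))
```

Restricting the histogram to uniform patterns shrinks each region from 256 to 59 bins, which keeps the classifier input small.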

Classifier design: Different machine learning techniques, including SVM and AdaBoost, were examined to recognize expressions. SVM has been used successfully to classify facial expressions [5]. It is a powerful machine learning technique for data classification which tries to find a linear separating hyperplane with the maximal margin to separate data in a higher dimensional space. AdaBoost learns the classification by selecting only those individual features that can best discriminate among classes, training several weak classifiers sequentially.


3. SIMULATION RESULTS

The publicly available Japanese Female Facial Expression Database (JAFFE) [9] was used to test the classifiers. JAFFE consists of 213 images of 10 females showing 7 facial expressions, namely Happy, Sad, Surprise, Anger, Disgust, Fear and Neutral. The image size is 256×256 pixels. Along with the images, the dataset also provides a mean opinion score (MOS) for each facial expression in the images. Fear was excluded according to the suggestion from the JAFFE website, so 181 images were used in the experiments. For training, 121 images were used and the remaining 60 images were used for testing. Ten images were selected randomly for each expression type for the testing dataset.

A unique feature matrix was also developed to represent each expression in terms of numerical values. For example, for a happy face, the maximum weight is assigned to the expression Happy and all other expressions are given zero weight. The derived feature matrix is shown in Table 1. Two types of classifiers were used: SVM and AdaBoost. Both SVM and AdaBoost make binary decisions, so multi-class classification is accomplished using the one-against-rest technique, which trains binary classifiers to discriminate one expression from all others and outputs the class with the largest binary prediction output. The SVM implementation in the publicly available machine learning library SPIDER for Matlab was used in the experiments. Three different kernels were tested for SVM: a linear kernel, a polynomial kernel with degree 1 and an RBF kernel with standard deviation of 211. The confusion matrix of the 6-class recognition using the RBF kernel is shown in Table 2. It can be seen that recognition rates for Neutral, Happy and Surprise are high (90-100%) but Sad, Anger and Disgust have lower recognition accuracy (60-80%).
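The one-against-rest decision rule can be sketched in a few lines of Python. This is a generic illustration only. The linear scorers, their weights and the two-element input are made up for the example, and the sketch uses one scorer per class, whereas the paper does not spell out how its binary outputs are mapped to the Neutral class.

```python
def linear_scorer(weights, bias):
    """Build a hypothetical binary scorer; larger output means 'more likely'."""
    return lambda x: sum(w * v for w, v in zip(weights, x)) + bias


def one_vs_rest_predict(scorers, x):
    """Return the label whose binary classifier gives the largest output."""
    return max(scorers, key=lambda label: scorers[label](x))


# Made-up scorers standing in for trained SVM or AdaBoost classifiers.
scorers = {
    "Happy": linear_scorer([1.0, -0.5], 0.0),
    "Sad": linear_scorer([-1.0, 0.8], 0.1),
    "Surprise": linear_scorer([0.2, 0.2], -0.3),
}

print(one_vs_rest_predict(scorers, [2.0, 1.0]))
print(one_vs_rest_predict(scorers, [0.0, 2.0]))
```

In a real system each scorer would receive the 118-element LBP feature vector rather than a two-element toy input.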

Five AdaBoost classifiers were also trained using the developed feature matrix. For each expression classifier, 100 weak classifiers were used with a learning rate of 1. The confusion matrix of the 6-class recognition using AdaBoost is shown in Table 3. Again, it can be seen that the recognition rates of Neutral, Happy and Surprise are high and those of Sad, Anger and Disgust are relatively poor. The accuracy comparison is reported in Table 4. With LBP features and the linear programming technique, Feng et al. [1] reported a performance of 93.80% on the JAFFE database, but they preprocessed the images using the CSU Face Identification Evaluation System [10] to exclude the non-face area with an elliptical mask. Liao et al. [2] reported a recognition performance of 85.57% on the JAFFE database. Owing to their large input feature sets, however, these methods are not feasible for real-time implementation.
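The overall accuracies in Table 4 can be cross-checked against the confusion matrices in Tables 2 and 3. Because each expression has the same number of test images (ten), the overall accuracy equals the mean of the diagonal entries. The short Python check below uses the values from the tables; the variable and function names are chosen for this example.

```python
LABELS = ["Neutral", "Happy", "Sad", "Surprise", "Anger", "Disgust"]

# Rows are true classes, columns are predicted classes, values in percent.
SVM_RBF = [
    [90, 0, 0, 0, 10, 0],
    [10, 90, 0, 0, 0, 0],
    [20, 0, 60, 0, 10, 10],
    [0, 0, 0, 100, 0, 0],
    [0, 0, 10, 0, 80, 10],
    [0, 0, 10, 0, 10, 80],
]
ADABOOST = [
    [90, 0, 0, 0, 10, 0],
    [0, 100, 0, 0, 0, 0],
    [20, 0, 60, 0, 10, 10],
    [0, 0, 0, 100, 0, 0],
    [0, 0, 10, 0, 80, 10],
    [0, 0, 10, 0, 0, 90],
]


def overall_accuracy(confusion_percent):
    """Mean of the diagonal; valid when every class has equally many samples."""
    diagonal = [confusion_percent[i][i] for i in range(len(confusion_percent))]
    return sum(diagonal) / len(diagonal)


print(round(overall_accuracy(SVM_RBF), 2))   # 83.33
print(round(overall_accuracy(ADABOOST), 2))  # 86.67
```

Both values agree with the SVM (RBF) and AdaBoost rows of Table 4.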

The implementation was done in Matlab on an Intel i3 2.2 GHz CPU. The processing times for training and testing of the various classifiers are reported in Table 5. Training time is the time taken to train a particular classifier on all the training database images. Testing time is the time taken by the trained classifier to predict the result. It can be seen that AdaBoost takes a long time to train its weak classifiers (about 11.7 s), while SVM takes less than a second to train. The testing time of SVM for a single image is very low (10-41 ms), whereas that of AdaBoost is much higher (866 ms). The average time taken by the face detection and feature extraction algorithm per image is approximately 183 ms.

Table 1: Developed feature matrix for training of classifiers

            Happy   Sad   Surprise   Anger   Disgust
Neutral       1      1       1         1        1
Happy         1      0       0         0        0
Sad           0      1       0         0        0
Surprise      0      0       1         0        0
Anger         0      0       0         1        0
Disgust       0      0       0         0        1

Table 2: Confusion matrix (%) of 6-class facial expression recognition using LBP based SVM (RBF)

            Neutral   Happy   Sad   Surprise   Anger   Disgust
Neutral       90        0      0       0        10       0
Happy         10       90      0       0         0       0
Sad           20        0     60       0        10      10
Surprise       0        0      0     100         0       0
Anger          0        0     10       0        80      10
Disgust        0        0     10       0        10      80

Table 3: Confusion matrix (%) of 6-class facial expression recognition using AdaBoost with LBP features

            Neutral   Happy   Sad   Surprise   Anger   Disgust
Neutral       90        0      0       0        10       0
Happy          0      100      0       0         0       0
Sad           20        0     60       0        10      10
Surprise       0        0      0     100         0       0
Anger          0        0     10       0        80      10
Disgust        0        0     10       0         0      90

Table 4: Accuracy comparison between various classifiers using LBP features

                     Recognition accuracy (%)
Feng et al. [1]            93.80
Liao et al. [2]            85.57
SVM (linear)               78.33
SVM (Polynomial)           83.33
SVM (RBF)                  83.33
AdaBoost                   86.67

Table 5: Speed comparison (in ms) between different classifiers using LBP features

                     Total training time (ms)   Testing time per image (ms)
SVM (linear)                  466                         10
SVM (Polynomial)              503                         28
SVM (RBF)                     747                         41
AdaBoost                    11651                        866

(a) Anger (b) Happy (c) Neutral (d) Sad (e) Sad (f) Surprise

Fig. 6: Sample results with database images used for testing the AdaBoost classifier giving correct classification

(a) Disgust misclassified as Anger (b) Neutral misclassified as Happy

Fig. 7: Sample results with database images used for testing the AdaBoost classifier giving incorrect classification

A real-time facial expression recognition system was developed using both SVM and AdaBoost classifiers. The graphical user interface (GUI) consists of two plots. The left side shows a preview of the video and the right side shows a bar plot of the resulting expression based on the classifier prediction values. Expression recognition is done on a frame-by-frame basis. Since SVM testing is very fast, the application was able to process the video feed at an average rate of 4.8 frames per second without any noticeable lag. Using the AdaBoost classifier, we were able to achieve an expression output of 0.9 frames per second. Some of the sample test images that yielded correct and incorrect expression results are shown in Figs. 6 and 7, respectively. As mentioned earlier, the pair Sad and Anger is difficult for a system to recognize accurately owing to similar characteristic features. Some faces are often falsely read as expressing a particular emotion, even if their expression is neutral, because their proportions are naturally similar to those that another face would temporarily assume when emoting. The results demonstrate that LBP-based facial features can be derived very quickly and stored in a compact representation, thereby allowing faster recognition rates.
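As a rough consistency check of the reported frame rates, the per-frame cost can be estimated as the face detection and feature extraction time (about 183 ms) plus the per-image testing time from Table 5. The Python sketch below computes the resulting upper bound on frames per second. It ignores video capture and GUI overhead, and the paper does not state which SVM kernel the live system used, so the figures are estimates rather than reproduced measurements.

```python
FEATURE_MS = 183  # average face detection + LBP feature extraction time per image

# Testing time per image in ms, taken from Table 5.
TEST_MS = {
    "SVM (linear)": 10,
    "SVM (Polynomial)": 28,
    "SVM (RBF)": 41,
    "AdaBoost": 866,
}


def max_fps(classifier_ms, feature_ms=FEATURE_MS):
    """Upper bound on frames per second when frames are processed one by one."""
    return 1000.0 / (feature_ms + classifier_ms)


for name, ms in TEST_MS.items():
    print(f"{name}: {max_fps(ms):.2f} fps")
```

The estimates for the SVM kernels fall between roughly 4.5 and 5.2 frames per second, and the AdaBoost estimate is just under 1 frame per second, in line with the reported 4.8 and 0.9 frames per second.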

4. CONCLUSION

In this paper, a fast facial expression recognition algorithm based on LBP was proposed. For a 6-class (Neutral, Happy, Sad, Surprise, Anger and Disgust) system, a recognition accuracy of 86.67% was achieved. The complexity of the existing techniques was also reduced while maintaining the recognition accuracy, to enable human emotion recognition on real-time video sequences.

5. REFERENCES

[1] X. Feng, M. Pietikäinen, and A. Hadid, “Facial expression recognition based on local binary patterns,” Pattern Recognition and Image Analysis, vol. 17, pp. 592–598, 2007.

[2] S. Liao, W. Fan, A. Chung, and D.-Y. Yeung, “Facial expression recognition using advanced local binary patterns, Tsallis entropies and global appearance features,” in Image Processing, 2006 IEEE International Conference on, Oct. 2006, pp. 665–668.

[3] C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on local binary patterns: A comprehensive study,” Image and Vision Computing, vol. 27, pp. 803–816, 2009.

[4] G. Zhao and M. Pietikäinen, “Dynamic texture recognition using local binary patterns with an application to facial expressions,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 29, no. 6, pp. 915–928, 2007.

[5] M. Valstar, I. Patras, and M. Pantic, “Facial action unit detection using probabilistic actively learned support vector machines on tracked facial point data,” in Computer Vision and Pattern Recognition - Workshops, 2005. CVPR Workshops. IEEE Computer Society Conference on, June 2005, p. 76.

[6] R. E. Schapire and Y. Freund, “Boosting the margin: a new explanation for the effectiveness of voting methods,” The Annals of Statistics, vol. 26, pp. 322–330, 1998.

[7] N. R. Howe, “A closer look at boosted image retrieval,” in ACM Transactions on Multimedia Computing, Communications. Springer-Verlag, 2003, pp. 61–70.

[8] P. Viola and M. J. Jones, “Robust real-time face detection,” International Journal of Computer Vision, vol. 57, pp. 137–154, 2004.

[9] M. Lyons, J. Budynek, and S. Akamatsu, “Automatic classification of single facial images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, no. 12, pp. 1357–1362, Dec. 1999.

[10] D. S. Bolme, J. R. Beveridge, M. Teixeira, and B. A. Draper, “The CSU face identification evaluation system: Its purpose, features and structure,” in International Conference on Vision Systems (ICVS), 2003, pp. 304–311.