FPGA IMPLEMENTATION OF A
NOVEL ROBUST FACIAL EXPRESSION
RECOGNITION ALGORITHM
A THESIS SUBMITTED IN PARTIAL FULFILMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF TECHNOLOGY IN
VLSI DESIGN AND EMBEDDED SYSTEM BY
Yamini Piparsaniyan
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY, ROURKELA
2012-2014
212EC2211
Under the guidance of
Prof. Kamala Kanta Mahapatra
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY, ROURKELA
ODISHA, INDIA-769008
CERTIFICATE
This is to certify that the thesis report entitled “FPGA Implementation Of A
Novel Robust Facial Expression Recognition Algorithm”, is submitted by Ms. Yamini
Piparsaniyan bearing Roll No. 212EC2211, in partial fulfilment of the requirements for
the award of Master of Technology in Electronics and Communication Engineering
with specialization in “VLSI Design and Embedded Systems” during session 2012-14
at National Institute of Technology, Rourkela.
This thesis is an authentic work carried out by her under my supervision and
guidance. To the best of my knowledge, the matter embodied in the thesis has not been
submitted to any other university/institute for the award of any Degree or Diploma.
Place: Rourkela
Date: 22nd May, 2014
Prof. Kamala Kanta Mahapatra
Department of ECE
National Institute of Technology, Rourkela
ABSTRACT
A facial expression recognition system infers a person's state of mind from the emotions
shown on the face, and thus has potential applications in various fields of human-computer
interaction (HCI), such as aiding autistic children, robot control and many more.
This work presents a robust and hardware-efficient algorithm for facial expression
recognition that achieves very high accuracy. Broadly, human facial expressions are
grouped into seven categories: anger, disgust, fear, happy, sad, surprise and the basic
neutral emotion. The process of emotion recognition starts with capturing the image,
detecting the face whose emotion is to be recognized, extracting robust and unique
features that make categorization efficient, and classifying those features into one of
the above-mentioned emotion categories. Face detection is done using the existing
Bayesian discriminating feature method. An algorithm is proposed for facial expression
recognition that integrates a Gabor filter bank for feature extraction, statistical
modelling that uses principal component analysis (PCA) and a conditional density function
to model the features, and an extended Bayes classifier for multi-class classification of
the emotion in a detected face. After the different emotion classes have been trained, the
multi-class classification strategy selects the class with the highest log-likelihood
value. Robust features are extracted using Gabor filters with 8 frequencies and 8
orientations. The FPGA implementation of the extended Bayesian classifier is done with
Xilinx 10.1 on a Virtex-II Pro FPGA, using a CORDIC unit for trigonometric functions.
Facial expression images from the JAFFE database have been used for training as well as
testing. A very high emotion recognition accuracy (96.73 %) has been obtained with the
proposed method.
ACKNOWLEDGEMENT
I would like to express my gratitude to my thesis guide Prof. K. K. Mahapatra for
his guidance, advice and support throughout my thesis work. He inspired, motivated,
encouraged and gave me full freedom to do my work with proper suggestions throughout
my research work. I am grateful to him for his kind and moral support throughout my
academics at National Institute of Technology, Rourkela. It has been a great honour and
pleasure for me to do research under his supervision. I would like to thank him for being
my advisor here at National Institute of Technology, Rourkela.
Next, I want to express my respects to Prof. A.K. Swain, Prof. D.P. Acharya, Prof.
P. K. Tiwari, Prof. N. Islam, Prof. Poonam Singh and Prof. A.K. Sahoo for teaching me and
also showing me how to learn. They have been great sources of inspiration to me and I
thank them from the bottom of my heart. I would like to thank Vijay Kumar Sharma sir
for helping me throughout the project work. I would like to thank all the faculty
members and staff of the Department of Electronics and Communication Engineering,
N.I.T. Rourkela, for their generous help in the completion of this thesis.
I would like to thank all my friends, especially Seshagiri, Neel Kamal,
Kalpesh and Priyanka, who made my stay here at NIT so beautiful and who motivated and
supported me all the time. I also thank Jagannath sir, Tom sir and Sudheendra sir for their
support and help.
I am especially indebted to my parents for their love, sacrifice, and support. They
are my first teachers after I came to this world and have set great examples for me about
how to live, study, and work.
CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Types of Emotion
1.2 Applications of Facial Emotion Recognition System
1.3 Motivation
1.4 Problem Description
1.5 Organization of Thesis
CHAPTER 2 BACKGROUND
2.1 Facial Expression Recognition Process
2.2 Methods Used for Different Modules of Recognition Process
2.2.1 Pre-processing methods
2.2.2 Feature extraction Methods
2.2.3 Classification Methods
2.3 CORDIC and Its Applications
2.3.1 What is CORDIC?
2.3.2 Functional Description of CORDIC
2.3.3 Applications
CHAPTER 3 PROPOSED ALGORITHM FOR FACIAL EXPRESSION RECOGNITION
3.1 Overview
3.2 Pre-processing
3.2.1 Discriminating Feature Analysis
3.2.2 Statistical Modelling
3.2.3 Bayes classifier
3.3 Feature extraction using Gabor Wavelets
3.3.1 Gabor Wavelet Representation
3.3.2 Feature Extraction from Gabor Wavelets
3.4 Principal component analysis
3.5 Extended Bayesian classifier for classification
3.5.1 Modelling
3.5.2 Classification
CHAPTER 4 IMPLEMENTATION
4.1 MATLAB Implementation
4.2 FPGA Implementation of Post Feature Extraction Process
CHAPTER 5 SIMULATION RESULTS & COMPARISON
5.1 Simulation Results
5.2 FPGA Implementation Results
5.3 Comparison
CHAPTER 6 CONCLUSION & FUTURE SCOPE
6.1 Conclusion
6.2 Scope for Future Work
BIBLIOGRAPHY
PUBLICATION
LIST OF FIGURES
Fig. 2.1 Generalized process of face detection and emotion detection
Fig. 2.2 The CORDIC core architecture
Fig. 2.3 2D circular rotation of a vector by an angle in coordinate system
Fig. 3.1 Proposed Method for Facial Expression Recognition System
Fig. 3.2 Process to find feature vector using Discriminating Feature Analysis
Fig. 3.3 Detailed process of Face Detection
Fig. 3.4 (a) Gabor wavelet, (b) an image and (c) convolved image with Gabor wavelet
Fig. 4.1 Block diagram for Dimensionality Reduction process of Classification
Fig. 4.2 Block diagram for mean calculation
Fig. 4.3 Block diagram for Covariance module
Fig. 4.4 Block diagram for Eigen Loop
Fig. 4.5 Block diagram for (cos α, sin α) module using CORDIC
Fig. 4.6 Block diagram for Arctan function using CORDIC
Fig. 4.7 Block diagram for Principal component calculation
Fig. 4.8 Delay path model for FPGA implementation
Fig. 5.1 Example of Variance face class images
Fig. 5.2 Example of Variance non-face class images
Fig. 5.3 Example of images from JAFFE Database showing various emotions
Fig. 5.4 Example of pre-processed images of JAFFE Database
Fig. 5.5 Simulation waveform of FPGA implemented model
Fig. 5.6 Chart comparing performance of proposed method with that of existing methods
LIST OF TABLES
Table 4-1 Parameters for Gabor Wavelets generation
Table 5-1 Accuracy of Face Detection using Bayesian Discriminating Feature Analysis Method
Table 5-2 Percentage of Correct and Erroneous Emotion Recognition With Training Database and Test Images from JAFFE Database
Today, in a world of automation, human-computer interaction (HCI) is a wide
area of research and applications. Human-to-human interpersonal communication can be
categorized into two types: verbal communication and non-verbal communication (facial
expressions, hand gestures and tone of voice). Verbal communication is essential, yet
non-verbal communication plays an important role where an expression says a lot more
than words. HCI can similarly be of these two types, and non-verbal communication is
the backbone of HCI. For HCI, there are two major constraints: first, understanding
facial expressions and, second, understanding hand/arm movements (gestures) [17].
Unfortunately, presently available human-computer interface systems do not take full
advantage of these valuable communicative media and are thus unable to deliver the
advantages of natural interaction to users. HCI can be more useful if computers can
understand the facial expression of the user and then act according to the recognized
emotion.
The first constraint of HCI is facial expression recognition, as emotions are an
unavoidable part of communication, even of verbal communication. In day-to-day life,
humans show many expressions that are the outcome of one's internal or psychological
state in response to events occurring in the surroundings, and these expressions form
emotions. The movements of the main organs of the face, such as opening of the mouth,
widening of the eyes, narrowing or frowning of the eyebrows and cheek movements, play the
main role in forming and expressing emotions. Humans recognise others' emotions solely on
the basis of the appearance and movements of the face organs, so an HCI system has to be
developed that senses the changes and movements of the face organs during communication
and outputs the recognised emotion. For a facial expression recognition system to be
useful, fast and accurate response are the key parameters; failing these, the results
produced may not be useful or may be confusing at times, depending on the application of
the system [1−4].
1.1 Types of Emotion
Humans express a large number of micro facial expressions, such as
affection, anger, angst, annoyance, anguish, anxiety, apathy, awe, arousal, boldness,
boredom, contentment, curiosity, contempt, depression, desire, disappointment, despair,
disgust, dread, embarrassment, envy, ecstasy, euphoria, excitement, fearlessness, fear,
frustration, gratitude, guilt, grief, happiness, hatred, horror, hope, hostility, hysteria, hurt,
indifference, interest, joy, jealousy, loathing, loneliness, lust, love, misery, nervousness,
passion, panic, pity, pride, pleasure, rage, regret, sadness, remorse, satisfaction, shame,
shyness, shock, sorrow, suffering, terror, uneasiness, worry, wonder, zeal, zest [13].
For HCI, it is not possible to detect each and every micro facial expression, so
facial expressions are widely grouped into seven basic categories, namely anger,
disgust, fear, happy, sad, surprise and neutral [1].
1.2 Applications of Facial Emotion Recognition System
Applications of facial emotion recognition can be seen in different HCI areas, such as:
1. To aid autistic children
2. Robot control
3. Driver state surveillance
4. Computerized psychological counselling and therapy
5. In the detection of criminal and antisocial motives
Autistic children may have a disorder known as Asperger's syndrome, an autism
spectrum disorder, due to which they often face problems communicating verbally with
other people [1]. As they struggle to understand verbal communication and to read
people's behaviour and speech, autistic children need to be trained in non-verbal
communication; since facial expressions are an important cue in interaction, they must
learn to read emotions. Thus, for such children, an automatic real-time facial expression
recognition system can be a boon during face-to-face interaction or even in meeting their
daily communication needs.
1.3 Motivation
Facial expressions are identified by the position and movement of face organs such as
the eyes, cheeks and mouth. The representation of a facial expression is therefore very
sensitive to even a slight change in the location or movement of a face organ. The
process of emotion recognition includes feature extraction and classification of the
extracted features. Even a small change in any face organ changes the emotion and also
the extracted features, so the classifier should be efficient enough to track changes in
the features and map them to the correct emotion class. Methods are available for robust
feature extraction based on PCA [1], LDA [14], LBP [15], Haar features [7], and more. For
classification, SVM [16] is broadly used because of its accuracy, but at the cost of
complexity and higher hardware utilization. Another approach is based on the Euclidean
distance, which is less complex but gives poorer results [18]. Thus a facial expression
recognition system suffers either in accuracy or in complexity in the multi-class
classification part when implemented in hardware.
1.4 Problem Description
Thus, the challenge is to design a classifier for multi-class classification that is
less complex, very accurate, feasible for hardware implementation and compatible with
the flow of the facial expression recognition process.
1.5 Organization of Thesis
The thesis is organized as follows:
Chapter 2 describes the basic process of facial expression recognition. The different
methods used in the literature for the modules of the emotion recognition
procedure, such as pre-processing, feature extraction and classification, are
described briefly, with a comparison of the pros and cons of each. An
introduction to CORDIC is also given in this chapter.
Chapter 3 describes the proposed method for a facial expression recognition system,
with the detailed process of face detection based on Bayesian discriminating feature
analysis, feature extraction using Gabor filters, PCA and the proposed Extended
Bayesian method for classification.
Chapter 4 describes the implementation constraints that were taken into
account while implementing the proposed work. The VHDL implementation
process and its modules for the classification process using CORDIC are also
shown.
Chapter 5 shows the results of the facial expression recognition process using the
proposed work and compares them with highly accurate existing models.
Chapter 6 concludes the work with an insight into its future scope.
2.1 Facial Expression Recognition Process
The facial expression recognition process starts with capturing an image of the
person whose emotion is to be recognized. After this, two major processes
follow [5]:
Face Detection
Emotion Detection
Facial expression recognition is a training-based process. First, an image database
is developed for both face detection (with two class types, face-class images and
non-face-class images) and emotion detection (containing images from all seven
emotion classes: anger, disgust, fear, happy, neutral, sad and surprise). Features are
extracted from the training images, and then, on the basis of their properties, the test
image is featured and classified. A generalized process for both face detection and
emotion detection is shown in Fig. 2.1.
Fig. 2.1 Generalized process of face detection and emotion detection
Before images are applied directly to the emotion recognition process, unwanted areas
and noise should be removed from the image so that the extracted features contain only
the required and important values, which makes the features robust. Face detection serves
this purpose: it keeps only the effective part of the face and removes everything else.
After face detection comes the emotion detection part.
[Fig. 2.1 block labels: Training Images, Testing Images, Pre-processing, Feature Extraction, Classification, Recognized Emotion]
Numerous methods in the literature are widely used for pre-processing, feature
extraction and classification. Some of those that give high accuracy are described next
in this section.
2.2 Methods Used for Different Modules of Recognition Process
2.2.1 Pre-processing methods
For a face detection system, pre-processing of an image includes resizing and
sharpening the face and non-face class images. For emotion detection, pre-processing
means finding the effective area of the face that contributes to feature extraction by
removing the background and the portion of the body below the face (if present).
Many methods have been proposed for face detection, as it is the first step in many
applications such as automatic face recognition, human-computer interaction (for
expression recognition and emotional state recognition) and surveillance systems.
Emotion detection, however, is an emerging topic on which less effective work has been
done, especially when real-time hardware implementation is considered, and it includes
face detection as a pre-processing step.
Some of the widely used methods for face detection are described below along with
their pros and cons in view of our requirements.
Skin Color Based Face Detection Algorithm
In [19], Sanjay Kr. Singh et al. present an algorithm for face detection based on
skin colour. Colour is one of the important features of human faces, on the basis of
which the probability of face presence can be estimated. Using human skin colour as a
feature for locating a face in an image has many advantages over other face detection
methods, the most important of which is speed of operation: colour processing is
faster than other kinds of feature processing, and the procedure is made easier by
the fact that colour is orientation invariant under certain lighting conditions. The
colour image under test is converted to three colour space representations, namely RGB,
YCbCr and HSI, and the algorithms based on these three colour spaces are combined
to obtain a new skin-colour-based face detection algorithm.
This method shows poor results, as skin colour features change with varying
lighting conditions and with the movement of objects. Even different cameras produce
different colour ranges for the same person at the same time. Moreover, this method does
not work for gray-scale images.
Boosted Cascade of Simple Feature Method
In [20], Paul Viola and Michael Jones gave a method for rapid object detection.
This method combines three major contributions. The first is an image representation
termed the “integral image”, from which Haar features are computed. These integral image
features are chosen because their calculation is very simple yet very fast. For feature
extraction, the image is divided into sub-windows and Haar features are taken for each
window; this yields a huge number of features over the whole image. So, the second
contribution is to pick only a small number of critical features with the help of
AdaBoost. The third is the classification after feature extraction: in this method,
successively more complex classifiers are “cascaded” to increase the classification
speed, as the cascade concentrates on the promising regions of the image for face
detection.
A cascaded classifier needs to be implemented in multiple layers, which becomes very
complex and costly on an embedded platform.
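The integral image idea described above can be illustrated with a minimal sketch. It is not the implementation of [20] or of this thesis; the function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the choice of a left/right two-rectangle feature are assumptions made only for illustration. The point it shows is that, once the integral image is built, the sum over any rectangle costs four table look-ups.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[r][c] is the sum of img[0:r][0:c]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four look-ups in the integral image."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Hypothetical Haar-like feature: right-half sum minus left-half sum."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8]]
    ii = integral_image(image)
    print(rect_sum(ii, 0, 0, 2, 4))          # sum of all pixels
    print(two_rect_feature(ii, 0, 0, 2, 4))  # right half minus left half
```

Because every rectangle sum has the same cost regardless of its size, the number of features per sub-window can grow very large, which is why a feature selection step such as AdaBoost is needed.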
Bayesian Discriminating Feature Analysis Method
In [7], Chengjun Liu gave this method for face detection, which uses Haar-like
features for feature extraction but, unlike the previously discussed method, uses
“difference images”. Along with the “difference images”, the amplitude projections of the
image and the image pixels themselves are combined to get robust features. For
classification, this method uses statistical modelling, PCA, a conditional density
function and a Bayes classifier. Its ease of implementation and high accuracy led to its
selection for this project. The method is discussed in section 3.2.
2.2.2 Feature extraction Methods
For emotion recognition, the second step is to extract facial features from the face
image and then to classify the features into an emotion. Mainly two types of approaches
are reported for extracting facial features (for both face detection and emotion
detection), namely appearance-based methods and geometry-based methods. In geometry-based
feature extraction, the shape, movement and location of various face organs, such as
frowning of the eyebrows, uplifting of the cheeks and widening of the eyes, are taken
into consideration. This requires an accurate and reliable description of facial organ
movement, which is quite complex to achieve in real-time applications on hardware-based
systems. In appearance-based feature extraction [1], appearance changes in the image are
captured by applying image filters to it.
Geometric feature extraction gives robust features but proves to be very complex
for real-time hardware implementation, while the appearance-based approach is easy to
implement with high accuracy. So, keeping the hardware constraints in mind, we focus on
appearance-based methods for pre-processing (face detection) and emotion detection. Some
of the techniques used in appearance-based methods are:
LDA (Linear Discriminant Analysis)
LBP (Local Binary Pattern)
PCA (Principal Component Analysis)
Haar-like Features method
Filter/ Kernel based methods
LDA Based Method
In [14], A. Djeradi et al. used a feature sub-space known as the “Fisher space” to
project images. This projection reduces the dimensionality of the data matrix while
preserving the unique feature set. Training images are projected onto this Fisher space
for feature extraction, and test images are then projected onto the trained LDA space.
Class information is used to find the set of projection vectors that minimizes the
intra-class scatter while maximizing the inter-class scatter.
The classification accuracy of the LDA-based method is affected by the “small sample
size” (SSS) problem.
LBP Based Method
In [15], Caifeng Shan et al. studied a method for face detection based on local
statistical features of the face, an approach that was earlier used for texture
recognition. In this method, a feature is extracted by considering the 3×3 neighbourhood
of each pixel, thresholding each neighbouring pixel against the centre pixel to form a
binary number for that block, and finally taking a 256-bin histogram as the feature
descriptor.
The small neighbourhood size loses some dominant features when applied to large-scale
images, which limits its effectiveness.
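As a rough illustration of the LBP steps described above (not the code of [15]), the sketch below thresholds the eight neighbours of each interior pixel against the centre and accumulates the resulting 8-bit codes into a 256-bin histogram. The neighbour ordering (clockwise from the top-left) and the `>=` comparison are assumptions; other conventions give a permuted but equivalent code.

```python
# Neighbour offsets (row, column), clockwise from the top-left corner.
# The ordering is an assumption made for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """8-bit LBP code of the interior pixel (r, c) of a grey-scale image."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


if __name__ == "__main__":
    patch = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    print(lbp_code(patch, 1, 1))
    print(sum(lbp_histogram(patch)))
```

The histogram length is fixed at 256 regardless of image size, which is what makes it usable as a feature descriptor.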
PCA Based Method
Similar to the LDA method, the PCA method uses an “eigenspace” for dimensionality
reduction and feature extraction. It uses the eigenvalues and eigenvectors of the
covariance matrix of the input data to calculate the principal components, and the most
significant principal components are then taken as the feature descriptor. This method
loses data in dimensionality reduction, and feature extraction cannot bear data loss, so
it yields less accurate results, although it is easy to implement [1].
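The PCA steps named above (mean removal, covariance, eigenvector, projection) can be sketched as follows. This is only an illustration: it finds the single most significant principal component by power iteration, which is a technique swapped in here for brevity and is not the eigen-solver used in this thesis. All function names are hypothetical.

```python
import math


def mean_center(data):
    """Subtract the per-dimension mean from every sample."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return mean, centered


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centred data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def top_eigenpair(cov, iterations=200):
    """Largest eigenvalue and its eigenvector by power iteration.

    Assumes the start vector is not orthogonal to the dominant eigenvector
    and that the covariance matrix is not all zeros.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, cov_v))
    return eigenvalue, v


def project(centered, v):
    """Project each centred sample onto the principal direction v."""
    return [sum(a * b for a, b in zip(row, v)) for row in centered]


if __name__ == "__main__":
    samples = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
    _, centered = mean_center(samples)
    lam, v = top_eigenpair(covariance(centered))
    print(lam, v, project(centered, v))
```

Keeping only the leading components is where the data loss mentioned above comes from: the variance along the discarded directions is not represented in the feature descriptor.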
Haar-like Features Method
Haar features provide a promising feature extraction method by taking sub-portions
of the image and calculating “integral images” or “difference images”. This method
provides very robust features but with repetition, as the integral or difference
images are linked to the preceding and following windows of the image; the robustness
comes with redundant data. The process is shown in [7] and [18].
Filter/Kernel Based Method
A kernel or filter bank is created with varying parameters, and the image under test
is convolved with the filter bank to map its features onto the kernels. Such an approach
is shown by Vitomir Štruc and Nikola Pavešić in [11], where a Gabor filter bank is built
with varying orientations and frequencies and the image is mapped onto these kernels. It
provides the flexibility to choose the size and robustness of the feature by limiting the
number of scale and orientation variations. This property is exploited in this project
and detailed in section 3.3.1.
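To make the idea of a filter bank with varying frequency and orientation concrete, the following sketch builds the real part of a generic Gabor kernel and a bank of 8 frequencies by 8 orientations. It uses a common textbook form of the Gabor function; the kernel size, `sigma`, `gamma`, `f_max` and the spacing of frequencies by a factor of the square root of 2 are illustrative assumptions and are not the parameters used in [11] or in this thesis.

```python
import math


def gabor_kernel(size, frequency, theta, sigma, gamma=1.0):
    """Real part of a Gabor kernel on a size x size grid (size should be odd).

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*f*x')
    where (x', y') are the coordinates rotated by theta.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * frequency * xr))
        kernel.append(row)
    return kernel


def gabor_bank(size=15, n_freq=8, n_orient=8, f_max=0.25, sigma=4.0):
    """Bank of n_freq * n_orient kernels (all parameter values are assumptions)."""
    bank = []
    for u in range(n_freq):
        frequency = f_max / (math.sqrt(2.0) ** u)
        for v in range(n_orient):
            theta = v * math.pi / n_orient
            bank.append(gabor_kernel(size, frequency, theta, sigma))
    return bank


if __name__ == "__main__":
    bank = gabor_bank()
    print(len(bank), len(bank[0]), len(bank[0][0]))
```

Convolving a face image with every kernel in the bank gives one response map per kernel, so reducing `n_freq` or `n_orient` directly reduces the size of the resulting feature.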
2.2.3 Classification Methods
Support Vector Machine (SVM)
SVMs are supervised learning methods: the model first learns from training data and
then performs pattern recognition on test images. An SVM is a non-probabilistic
binary classifier in which patterns are represented as points in space and the classes
are separated by a boundary. A binary SVM can be extended to multi-class classification,
but with a high degree of complexity, as shown in [16]. The above feature extraction
methods give their best results with SVM classification, but an SVM is complex to
implement on embedded hardware, especially for multi-class classification.
KNN Classifier
The KNN classifier, as shown in [18], is based on the Euclidean distance. The
point-to-point distance is calculated from each training feature to the test feature, and
the smaller the distance, the greater the similarity. This is the most basic and simplest
classification method, but it suffers from inaccuracy because no cross-point measures are
used.
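The distance rule above can be sketched in a few lines. The example below is the 1-nearest-neighbour special case, with hypothetical function names and toy feature vectors; it is not the classifier of [18].

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour(train_features, train_labels, test_feature):
    """Return the label of the training feature closest to the test feature."""
    best = min(range(len(train_features)),
               key=lambda i: euclidean(train_features[i], test_feature))
    return train_labels[best]


if __name__ == "__main__":
    features = [[0.0, 0.0], [10.0, 10.0]]
    labels = ["happy", "sad"]
    print(nearest_neighbour(features, labels, [1.0, 2.0]))
```

The rule needs no training beyond storing the features, which explains its simplicity, but every test image must be compared against all stored features.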
Bayes Classifier
The Bayes classifier considered here is a binary classifier that uses a statistical
model and computes the likelihoods of the test image for each class [7]. Its accuracy
and simplicity of implementation make it useful for face detection, which needs a
binary classifier, so it is detailed in section 3.2.
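A minimal sketch of a likelihood-based binary decision is given below. It assumes, purely for illustration, that each class is modelled by a Gaussian with a diagonal covariance; the model in [7] and in section 3.2 is PCA-based and more elaborate. The function names, the variance floor of 1e-6 and the toy data are all assumptions.

```python
import math


def fit_gaussian(samples):
    """Per-dimension mean and variance of a class (diagonal Gaussian assumption)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    var = [max(sum((s[j] - mean[j]) ** 2 for s in samples) / n, 1e-6)
           for j in range(d)]
    return mean, var


def log_likelihood(x, model):
    """Log of the class-conditional density of x under the diagonal Gaussian."""
    mean, var = model
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def classify(x, face_model, nonface_model, log_prior_ratio=0.0):
    """Choose the class with the larger (prior-weighted) log-likelihood."""
    face_score = log_likelihood(x, face_model) + log_prior_ratio
    return "face" if face_score > log_likelihood(x, nonface_model) else "non-face"


if __name__ == "__main__":
    face = fit_gaussian([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]])
    nonface = fit_gaussian([[5.0, 5.0], [5.5, 4.5], [4.5, 5.5]])
    print(classify([1.0, 1.0], face, nonface))
```

Working with log-likelihoods turns products of densities into sums, which is also convenient for fixed-point hardware.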
2.3 CORDIC and Its Applications
Many DSP and pattern recognition algorithms use elementary functions such as
logarithmic, trigonometric, hyperbolic and exponential functions, division and
multiplication. There are two common ways of implementing these functions: the lookup table
method and polynomial expansion. These methods require a large number of
multiplications/divisions and additions/subtractions. Hardware implementation is done on
embedded platforms using development boards such as Spartan, Virtex, etc. These are bounded
in the number of input-output pins, area, slices and flip-flops, and each
multiplication/division consumes further adders/subtractors, which is not feasible once it
exceeds the available resources.
2.3.1 What is CORDIC?
COordinate Rotation DIgital Computer (CORDIC) is a special-purpose computer for
computing many non-linear and transcendental functions. It was proposed by Volder in 1959
and later generalized by Walther [22]. It is useful when no hardware multiplier is provided
and computation must be done with additions/subtractions only. The CORDIC core
architecture is shown in fig. 2.2; all trigonometric, logarithmic and other non-linear
functions are computed using only shifts, additions and subtractions. The functions that
can be computed using a CORDIC computer include trigonometric, logarithmic, exponential,
hyperbolic, multiplication, division, square root, etc. [22].
Though it initially served the purpose of navigation systems, it later became a
popular tool for implementing digital systems, especially in the areas of digital signal
processing, communications, computer graphics, etc. The simplicity of CORDIC is that it
can compute any of the above-mentioned functions using shifts and additions of the form
$x \pm y \cdot 2^{-i}$. The operating mode and the coordinate system chosen are the two key
factors in computing the desired function with CORDIC. Many signal processing and
communication systems operate CORDIC in the circular coordinate system, in either rotation
or vectoring mode.
Fig. 2.2 The CORDIC core architecture
2.3.2 Functional Description of CORDIC
Main functional blocks of CORDIC are:
Conversion (polar to rectangular and vice versa)
Trigonometric function (Sin and Cos)
Hyperbolic functions ( Sinh and Cosh)
Inverse trigonometric and hyperbolic functions (arctan and arctanh)
Logarithmic functions
Other non-linear functions such as square root, multiplication, division etc.
CORDIC works by rotating the coordinate system through a sequence of prefixed angles
until the residual rotation angle is driven from the defined angle to zero. This process is
shown in fig. 2.3, which depicts circular rotation in a 2D coordinate system. The
trigonometric and other arithmetic functions are executed by solving eq. (2.3.1) under a
predefined set of conditions. In the equations, Xnew and Ynew are the new coordinates of
the vector (Xold, Yold) after rotation by the angle ϴ.
Fig. 2.3 2D circular rotation of a vector by an angle in coordinate system
$$X_{new} = K\,(X_{old}\cos\theta - Y_{old}\sin\theta), \qquad Y_{new} = K\,(Y_{old}\cos\theta + X_{old}\sin\theta)$$ (2.3.1)
K is a constant in the above equations.
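The rotation process can be illustrated with a minimal floating-point sketch in Python. It is not the Xilinx core: the function name `cordic_rotate`, the iteration count and the final gain compensation are assumptions made for illustration, and the multiplications by 2^-i stand in for the right shifts of a fixed-point design. The sketch converges only for angles of roughly |θ| ≤ 1.74 rad, the sum of the prefixed angles.

```python
import math

def cordic_rotate(x, y, theta, iterations=16):
    """Rotate the vector (x, y) by theta radians using CORDIC micro-rotations."""
    # Prefixed angles atan(2^-i); in hardware these come from a small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector; accumulate the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = theta  # residual angle, driven towards zero
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        # Updates of the form x -/+ y * 2^-i (shift and add in fixed point).
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    # Compensate the accumulated gain once at the end.
    return x / gain, y / gain
```

Rotating (1, 0) by 0.5 rad with this sketch returns values close to (cos 0.5, sin 0.5), which is how sine and cosine are obtained in rotation mode.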
Input/Output Data Representation
Fig. 2.2 shows the CORDIC core block, where X_IN and Y_IN are the input data signals,
X_OUT and Y_OUT are the output data signals, and p_in and p_out denote the phase input and
output signals. Clk, ce, ND, ACLR and SCLR are input control signals, and RFD and RDY are
the two output signals. Data signals and phase signals are read in separate formats.
XQN Number Format
This number format is used to represent signed values in 2's complement format.
It is described as XQN => 1 sign bit + X integer bits + N mantissa (fraction) bits.
So if 'w' is the word width, then N = w − (X+1) bits represent the fractional part of
the value. In System Generator fixed-point notation this is written as
Fix(N+X+1)_N.
Data Signal Representation
o Input data signals (signed) should lie in the range from -1 to +1; outside this
range the CORDIC core gives unpredictable results. To represent the maximum
integer value '1', one integer bit is required, with an additional bit for the
sign. Thus input data signals are represented in 1QN format, where
N = word_width – 2. This can also be written as Fix(N+2)_N.
Example: If the word width is 10 bits, a signed data signal is represented in
1Q8 format.
"0011000000" => 00.11000000 => +0.75
"1101000000" => 11.01000000 => −0.75
o When the data signal is set to unsigned fraction, the value ranges over
0 <= X_IN < +2 and the integer part is represented by 1 bit, giving the format
UFix(N+1)_N with N = word_width – 1.
Example:
"0011000000" => 0.011000000 => 0.375
o For unsigned integer, the data signal ranges over 0 <= X_IN < 2**Input Width.
Example:
"0011000000" => 192
Phase Signal Representation
Input phase signals (signed) should lie in the range from −π to +π; outside this
range the CORDIC core gives unpredictable results. To represent the maximum
integer value '3', two integer bits are required, with an additional bit for the sign.
Thus input phase signals are represented in 2QN format, where N = (word_width – 3).
This can also be written as Fix(N+3)_N. Phase is represented in radians in the
CORDIC core.
Example: If the word width is 10 bits, the phase signal is represented in 2Q7
format.
"0011000000" => 001.1000000 => +1.5 radians
"1101000000" => 110.1000000 => −1.5 radians
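The number formats above can be checked with a short Python sketch that decodes a bit string in XQN or unsigned-fraction format. The helper names `decode_xqn` and `decode_unsigned_fraction` are hypothetical and are not part of the CORDIC core; they only reproduce the arithmetic of the examples.

```python
def decode_xqn(bits, integer_bits):
    """Decode a 2's complement XQN word: 1 sign bit + X integer bits + N fraction bits."""
    width = len(bits)
    frac_bits = width - (integer_bits + 1)   # N = w - (X + 1)
    raw = int(bits, 2)
    if bits[0] == "1":                       # sign bit set: value is raw - 2^w
        raw -= 1 << width
    return raw / (1 << frac_bits)

def decode_unsigned_fraction(bits):
    """Decode a UFix(N+1)_N word: 1 integer bit followed by N fraction bits."""
    return int(bits, 2) / (1 << (len(bits) - 1))
```

For a 10-bit word, `decode_xqn(bits, 1)` reads a data signal in 1Q8 format and `decode_xqn(bits, 2)` reads a phase signal in 2Q7 format, so the same bit pattern "0011000000" decodes to 0.75 as data and 1.5 radians as phase.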
2.3.3 Applications
CORDIC is mainly applied in the following areas of development:
Matrix decomposition
Image processing
Digital signal processing
Communications implementations
Computer graphics
Robotics and many more.
Further work builds on this literature review and the processes defined above.
3.1 Overview
The facial expression recognition system has been designed by following the
procedure shown in section 2.1; the flow of the proposed method is shown in fig. 3.1. A
database containing numerous images showing all seven basic facial expressions
(anger, disgust, fear, happy, neutral, sad and surprise), with equal prior probabilities, is
used for training the system. All training images are first pre-processed to find the
effective face area of the image, which is what feature extraction uses. The test image is
pre-processed in the same way before feature extraction and classification. A pre-existing
method, Bayesian discriminating feature analysis [7], is used for pre-processing and is
described in detail in section 3.2. This method is chosen for its simple implementation and
its robust and speedy classification. Here classification is only between two classes: the
face class, which contains the effective face portion with other noise removed, and the
non-face class, which includes images from the rest of the world. For such a two-class
problem, binary Bayesian classification shows the best result.
After pre-processing of the training images, images belonging to the same emotion class
are grouped together, and the features of all images in a group are combined to generate a
class feature that collectively represents that emotion. Since emotions are represented
digitally by very minute details of the facial organs, it is necessary to extract very
robust features, covering various aspects of face appearance, for accurate classification.
To cover the maximum available features, Gabor filters, their wavelets and the generated
features are used for feature extraction, as described in detail in section 3.3 [11]. Gabor
wavelets can be generated at various frequencies and scales; the more frequencies and
scales are used, the more robust the feature, but the higher the hardware cost. So there is
a trade-off between feature robustness and hardware cost, and one can choose according to
priority. Test image features are also extracted using this method.
Fig. 3.1 Proposed Method for Facial Expression Recognition System
As Gabor features are of very high dimensionality, PCA is used to reduce the
computational complexity. The principal components of the feature vector are estimated by
calculating the eigen values and eigen vectors of the feature matrix. Section 3.4 shows how
PCA is used for dimensionality reduction.
The reduced-dimensionality class features are modelled against the test image feature.
Here modelling means estimating the similarity between the class feature and the test image
feature; classification is based on this measure of similarity. For this, the logarithm of
the conditional density function is used, which shows the likelihood of one category
relative to another, as shown in section 3.5.1. Since multi-class classification is required
here, an Extended Bayes classifier is proposed in place of the binary Bayes classifier, as
detailed in section 3.5.2. The Extended Bayes classifier gives a very high accuracy rate and
is far easier to implement on an embedded platform than the support vector machine (SVM),
which is widely used for classification but becomes difficult to implement for multi-class
classification.
3.2 Pre-processing
An image or a portion of an image (window) may belong to the face class or to the
non-face class (rest of the world). To identify a face in an image, windows of varying size
are checked to determine whether each belongs to the face class or the non-face class. For
this, training is done with both face class and non-face class images. For the window under
test, the discriminating feature vector is extracted, the log likelihood of the conditional
density function of this feature is calculated with respect to both the face class and the
non-face class, and the Bayes classification rule is applied to the calculated log
likelihoods to decide whether a face is present.
3.2.1 Discriminating Feature Analysis
Discriminating feature analysis estimates an enhanced yet simple feature by
combining Haar features and amplitude projections, which gives the feature high
discriminating power [7]; this is shown graphically in fig. 3.2. If the image is
$A(i,j) \in \mathbb{R}^{m \times n}$, then the discriminating feature vector of the image, Y,
is calculated by combining:
a) Image A
b) 1D Haar representation of image A
c) Amplitude projection of image A
The 1D Haar representation consists of the horizontal difference image
$A_h(i,j) \in \mathbb{R}^{(m-1) \times n}$ and the vertical difference image
$A_v(i,j) \in \mathbb{R}^{m \times (n-1)}$ of image A(i, j):
$$A_h(i,j) = A(i+1,j) - A(i,j), \quad 1 \le i < m,\ 1 \le j \le n$$ (3.2.1)
$$A_v(i,j) = A(i,j+1) - A(i,j), \quad 1 \le i \le m,\ 1 \le j < n$$ (3.2.2)
Amplitude projections, i.e. row and column projections, denoted Xr and Xc
respectively, give the vertical symmetry and horizontal characteristics of the image.
$$X_r(i) = \sum_{j=1}^{n} A(i,j), \quad 1 \le i \le m,\ X_r \in \mathbb{R}^{m}$$ (3.2.3)
$$X_c(j) = \sum_{i=1}^{m} A(i,j), \quad 1 \le j \le n,\ X_c \in \mathbb{R}^{n}$$ (3.2.4)
A new discriminating feature vector is formed by combining the normalized values of
the vectors X, Xh, Xv, Xr and Xc (X, Xh and Xv being the vector forms of A, Ah and Av), in
order to make the data redundant for ease of calculation. A vector is normalized by
subtracting the mean of its components and then dividing the result by their standard
deviation. If X1, Xh1, Xv1, Xr1 and Xc1 denote the normalized vectors, the discriminating
feature vector $Y_1 \in \mathbb{R}^{N}$ is formed by concatenating them:
$$Y_1 = (X_1, X_{h1}, X_{v1}, X_{r1}, X_{c1})^t$$ (3.2.5)
Y1 is normalized again to form the final discriminating feature vector Y, as in
eq. (3.2.6):
$$Y = \frac{Y_1 - \mu}{\sigma}$$ (3.2.6)
where μ is the mean and σ the standard deviation of the components of Y1. The resulting
discriminating feature vector is $Y \in \mathbb{R}^{N}$ with N = 3×m×n.
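The construction of Y can be sketched in Python with NumPy as follows. This is an illustrative sketch of eqs. (3.2.1) to (3.2.6), not the MATLAB code used in this work; the function names and the row-major flattening of the images are assumptions. A constant image would give zero standard deviation and is not handled here.

```python
import numpy as np

def normalize(v):
    """Subtract the mean of the components and divide by their standard deviation."""
    return (v - v.mean()) / v.std()

def discriminating_feature(A):
    """Build the discriminating feature vector Y of an m x n image A."""
    A = np.asarray(A, dtype=float)
    Ah = A[1:, :] - A[:-1, :]     # horizontal difference image, (m-1) x n, eq. (3.2.1)
    Av = A[:, 1:] - A[:, :-1]     # vertical difference image, m x (n-1), eq. (3.2.2)
    Xr = A.sum(axis=1)            # row projection, length m, eq. (3.2.3)
    Xc = A.sum(axis=0)            # column projection, length n, eq. (3.2.4)
    parts = [A.ravel(), Ah.ravel(), Av.ravel(), Xr, Xc]
    Y1 = np.concatenate([normalize(p) for p in parts])   # eq. (3.2.5)
    return normalize(Y1)                                  # eq. (3.2.6)
```

The length of Y follows from the sizes of the parts: mn + (m−1)n + m(n−1) + m + n = 3mn, so a 16×16 window gives a vector of 768 components.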
3.2.2 Statistical Modelling
Statistical modelling estimates the conditional probability density function of each
class for the test image using the multivariate normal distribution; it is used later for
classification of the feature.
Let w be the feature matrix representing a particular class, M the mean of the class
feature (w) and ∑ its covariance matrix. The multivariate normal distribution of w is then
represented as
$$w \sim N(M, \Sigma)$$ (3.2.7)
Since the multivariate normal distribution is of high dimensionality, an important
property of principal component analysis (PCA), optimal signal reconstruction, is used for
dimensionality reduction to ease computation. Only a part of the original signal is needed
to represent the whole signal, because PCA reconstructs the signal with minimum mean square
error [6]. If the eigen vector matrix and eigen values of the covariance matrix are denoted
by ϕ and λ respectively, and Y is the test image feature vector, then the principal
components of the deviation of the test image from a particular class are calculated by
eq. (3.2.8):
$$Z = \phi^t (Y - M)$$ (3.2.8)
Using this property of PCA, instead of all principal components only the first Mn
(Mn << N) components are used to formulate the conditional density function. The impact of
the remaining principal components is included by adopting the Moghaddam and Pentland
model [21], which replaces the remaining (N − Mn) eigen values by their average:
$$\rho = \frac{1}{N - M_n} \sum_{k=M_n+1}^{N} \lambda_k$$ (3.2.9)
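The logarithm of the conditional density function $p(Y \mid w)$ of class (w), for an image
having discriminating feature vector (Y), is given by
$$\ln[p(Y \mid w)] = -\frac{1}{2}\left\{ \sum_{i=1}^{M_n} \frac{z_i^2}{\lambda_i} + \frac{\|Y - M\|^2 - \sum_{i=1}^{M_n} z_i^2}{\rho} + \ln\left(\prod_{i=1}^{M_n} \lambda_i\right) + (N - M_n)\ln\rho + N\ln(2\pi) \right\}$$ (3.2.10)
where $z_i$ are the components of Z from eq. (3.2.8).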
The discriminating feature vector of each training image of the face class and of the
non-face class is determined, and these are combined to obtain the face class feature wf
and the non-face class feature wn respectively. The feature of the test image window is
estimated by the same process.
With this statistical modelling procedure, the log likelihood of the conditional density
function of the test window is calculated with respect to the face class and the non-face
class, and denoted δf and δn respectively.
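The log likelihood computation can be sketched in Python as below. The sketch assumes the usual form of the Moghaddam and Pentland approximation, in which the first Mn eigen values are used directly and the rest are replaced by their average ρ; the function name and the argument layout (eigen values sorted in descending order, eigen vectors as matching columns) are assumptions for illustration.

```python
import numpy as np

def log_likelihood(Y, M, eigvals, eigvecs, Mn):
    """Approximate ln p(Y | w) using the first Mn principal components.

    eigvals must be sorted in descending order and eigvecs must hold the
    corresponding eigen vectors of the class covariance matrix as columns.
    """
    N = Y.size
    lam = eigvals[:Mn]
    Z = eigvecs[:, :Mn].T @ (Y - M)        # principal components, eq. (3.2.8)
    rho = eigvals[Mn:].mean()              # average of remaining eigen values, eq. (3.2.9)
    # Energy of the deviation that the first Mn components do not capture.
    residual = np.sum((Y - M) ** 2) - np.sum(Z ** 2)
    return -0.5 * (np.sum(Z ** 2 / lam) + residual / rho
                   + np.sum(np.log(lam)) + (N - Mn) * np.log(rho)
                   + N * np.log(2 * np.pi))
```

Calling this once with the face class model and once with the non-face class model gives the two values δf and δn that are compared in the next section. When the discarded eigen values are all equal, the approximation coincides with the exact multivariate normal log density.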
3.2.3 Bayes classifier
A binary Bayes classifier classifies on the basis of the likelihoods of two
classes [10]. Here, in face detection, the likelihood of the test window is calculated
for both the face class and the non-face class, and the window is assigned to the class
with the greater likelihood. Mathematically,
$$Y \in \begin{cases} w_f & \text{if } \delta_f > \delta_n \\ w_n & \text{otherwise} \end{cases}$$ (3.2.11)
If Y belongs to the face class, it is saved and processed for expression recognition;
otherwise the window is slid further and the process is repeated to find a face. The face
detection process is summarized in the flow chart of fig. 3.3.
3.3 Feature extraction using Gabor Wavelets
3.3.1 Gabor Wavelet Representation
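For 2D data, the Gabor wavelet is given by [11],
$$\psi(x, y, f, \theta) = \frac{f^2}{\pi\gamma\eta}\, e^{-\left(\frac{f^2}{\gamma^2}x_t^2 + \frac{f^2}{\eta^2}y_t^2\right)}\, e^{j 2\pi f x_t}$$
$$x_t = x\cos\theta + y\sin\theta, \qquad y_t = -x\sin\theta + y\cos\theta$$ (3.3.1)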
Here, f is the sinusoidal frequency, θ is the wavelet orientation, γ is the spatial width
along the sinusoidal plane wave, η is the spatial width of the wavelet perpendicular to the
wave, and x, y are the coordinates of the pixels. A filter bank is created by varying the
scale (f) and orientation (θ) according to eq. (3.3.2):
$$f_g = \frac{f_{max}}{(\sqrt{2})^{g}}, \qquad \theta_h = \frac{h\pi}{8}, \qquad \psi_{g,h}(x, y) = \psi(x, y, f_g, \theta_h)$$ (3.3.2)
The parameter values in eq. (3.3.2) are generally γ = η = √2 and fmax = 0.25, with
g ∈ {0,…,3} and h ∈ {0,…,7}. In the current work both scale and orientation are taken as
g, h ∈ {0,…,7}, to extract more discriminating features.
Fig. 3.4. (a) Gabor wavelet, (b) an image and (c) convolved image with Gabor wavelet
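A filter bank of this kind can be generated with a short NumPy sketch. The sketch assumes the commonly used form of the Gabor wavelet from [11] with γ = η = √2, fmax = 0.25, frequencies fmax/(√2)^g and orientations hπ/8; the function names and the choice of pixel coordinates centred on the middle of a 39×39 window are assumptions for illustration.

```python
import numpy as np

def gabor_wavelet(f, theta, size=39, gamma=np.sqrt(2), eta=np.sqrt(2)):
    """Complex 2-D Gabor wavelet of frequency f and orientation theta."""
    half = size // 2
    # Pixel coordinates centred on the middle of the window (an assumption).
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xt = x * np.cos(theta) + y * np.sin(theta)
    yt = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-((f ** 2 / gamma ** 2) * xt ** 2 + (f ** 2 / eta ** 2) * yt ** 2))
    carrier = np.exp(1j * 2 * np.pi * f * xt)
    return (f ** 2 / (np.pi * gamma * eta)) * envelope * carrier

def gabor_bank(n_scales=8, n_orient=8, f_max=0.25, size=39):
    """Dictionary mapping (g, h) to the wavelet at scale g and orientation h."""
    bank = {}
    for g in range(n_scales):
        f_g = f_max / (np.sqrt(2) ** g)
        for h in range(n_orient):
            bank[(g, h)] = gabor_wavelet(f_g, h * np.pi / 8, size)
    return bank
```

With 8 scales and 8 orientations the bank holds 64 complex filters of size 39×39; reducing either count shrinks the bank and the resulting feature, which is the trade-off between robustness and hardware cost noted in section 3.1.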
3.3.2 Feature Extraction from Gabor Wavelets
Each component of the 2-D Gabor wavelet bank (at different scales and orientations) is
convolved with the image (A), as given by eq. (3.3.3). The absolute value of the filtered
image is down sampled by a factor of 4 along rows and columns, and the feature vector is
obtained by rearranging the down sampled values.
$$O_{g,h}(x, y) = A(x, y) * \psi_{g,h}(x, y)$$ (3.3.3)
If og,h denotes the feature vector from the filtered image at scale g and orientation h,
the final feature vector F in Rd is formed from these vectors [11] and is normalized to have
zero mean and unit variance. For an m×n image, the dimension of the feature space is
d = m×n×g×h/(d1×d2) (3.3.4)
where 'g' is the number of frequency scales and 'h' is the number of orientations used to
create the Gabor filter bank, and d1 and d2 are the down sampling factors along rows and
columns.
The feature extraction is performed using a Gabor filter bank with 8 scales and 8
orientations. The filter size (the extent of x, y in eq. (3.3.1)) is taken as 39×39 as in
[12], and the size of the final feature vector is given by eq. (3.3.4).
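The extraction steps can be sketched in Python as follows. This is an illustrative sketch rather than the MATLAB implementation: the FFT-based convolution, the plain decimation used for down sampling and the concatenation order of the per-filter vectors are assumptions, and `bank` stands for any mapping from (g, h) to a 2-D filter kernel.

```python
import numpy as np

def conv2_same(image, kernel):
    """2-D linear convolution cropped to the size of the image (FFT based)."""
    m, n = image.shape
    km, kn = kernel.shape
    shape = (m + km - 1, n + kn - 1)
    full = np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape))
    r0, c0 = (km - 1) // 2, (kn - 1) // 2
    return full[r0:r0 + m, c0:c0 + n]

def gabor_features(image, bank, d1=4, d2=4):
    """Filter, take magnitudes, down sample, concatenate and normalize."""
    parts = []
    for key in sorted(bank):
        response = np.abs(conv2_same(image, bank[key]))   # magnitude of eq. (3.3.3)
        parts.append(response[::d1, ::d2].ravel())        # keep every d1-th row, d2-th column
    F = np.concatenate(parts)
    return (F - F.mean()) / F.std()                       # zero mean, unit variance
```

For a 16×16 image, 64 filters and down sampling by 4 in each direction, every filter contributes 4×4 = 16 values, so the feature has 16×64 = 1024 components, in agreement with eq. (3.3.4).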
3.4 Principal component analysis
PCA is used for dimensionality reduction, since it gives a compact, effective and
low-dimensional feature for a given high-dimensional data set. Applied in an N-dimensional
space, PCA aims to find a linear subspace of lower dimensionality D (D << N) while
maintaining almost all the variability of the N-dimensional data. Eqs. (3.4.1)-(3.4.5) show
how to calculate the principal components using eigen values and eigen vectors. The first D
principal components, associated with the D largest eigen values, are taken to represent
the reduced-dimensionality vector [1].
Let Ai be the feature vector of an image after feature extraction using Gabor
wavelets. All feature vectors of images belonging to the same emotion class are grouped
together to form the class feature matrix. Let 'A' represent the class feature matrix; its
principal components are then calculated as follows:
A = [A1, A2, A3, …, An] (3.4.1)
where 'n' is the number of images belonging to the same class. The mean (An,mean) of all 'n'
images is calculated, and the deviation of each image from the mean is calculated and
denoted Ã.
Ãn = An – An,mean (3.4.2)
To find the principal components of a matrix, its eigen values and eigen vectors are
required. The covariance matrix is used for this because its symmetry property halves the
calculation overhead. Let 'cov' denote the covariance matrix of A; it is calculated as:
∑ (
)
(3.4.3)
The covariance matrix 'cov' comes out to be an n×n matrix. Its eigen values and eigen
vectors are found by the Jacobi algorithm, as follows:
Step 1. Start with an identity matrix U and the covariance matrix cov.
Step 2. Find the off-diagonal element with the largest absolute value, covij, then take
the corresponding diagonal elements covii and covjj and use them to calculate the rotation
angle 'α' using eq. (3.4.4),
$$\alpha = \frac{1}{2}\tan^{-1}\left(\frac{2\,cov_{ij}}{cov_{jj} - cov_{ii}}\right)$$ (3.4.4)
Step 3. Form the matrix V as an identity matrix, except for Vii = Vjj = cos α, Vji = −sin α
and Vij = sin α.
Step 4. Compute the matrix products C′′ and U′′, such that C′′ij becomes zero and the other
elements of the matrix change.
C′′ = V′×cov×V and U′′ = U×V (3.4.5)
Step 5. Compare the maximum absolute value of the off-diagonal elements (covij) with the
threshold value. If covij is greater, repeat steps 2 to 5, with C′′ as cov and U′′ as U,
until convergence. Upon convergence the diagonal elements of C′′ contain the eigen values
and U′′ contains the eigen vectors. U′′ and C′′ are both of size n×n.
The eigen vector associated with the highest eigen value gives the first principal
component, and the effectiveness of the components decreases with decreasing eigen values.
If X is data of size (m×n), it is projected onto the eigen space as Z = XU′′, with the
principal components in the order of the columns of U′′.
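Steps 1 to 5 can be sketched in Python with NumPy as follows. The sketch is a floating-point illustration, not the FPGA design: the function name, the default threshold and the iteration limit are assumptions. The rotation angle is computed with the formula that makes C′′ij zero for the matrix V of step 3, namely tan 2α = 2covij/(covjj − covii); `arctan2` is used so that the case covii = covjj needs no special handling.

```python
import numpy as np

def jacobi_eigen(cov, threshold=1e-10, max_rotations=10_000):
    """Eigen values and eigen vectors of a symmetric matrix by Jacobi rotations."""
    C = np.array(cov, dtype=float)
    n = C.shape[0]
    U = np.eye(n)                                        # Step 1
    for _ in range(max_rotations):
        off = np.abs(C - np.diag(np.diag(C)))
        i, j = np.unravel_index(np.argmax(off), off.shape)   # Step 2: largest off-diagonal
        if off[i, j] <= threshold:                       # Step 5: converged
            break
        alpha = 0.5 * np.arctan2(2 * C[i, j], C[j, j] - C[i, i])
        V = np.eye(n)                                    # Step 3
        V[i, i] = V[j, j] = np.cos(alpha)
        V[i, j] = np.sin(alpha)
        V[j, i] = -np.sin(alpha)
        C = V.T @ C @ V                                  # Step 4: zeroes C[i, j]
        U = U @ V
    return np.diag(C), U
```

The returned eigen values are not sorted; to select the first D principal components, the columns of U are reordered by descending eigen value before projecting the data.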
3.5 Extended Bayesian classifier for classification
3.5.1 Modelling
The feature vectors of training images belonging to the same emotion class are estimated
using eqs. (3.3.1)-(3.3.4) and grouped together to form the class feature vector of that
class. If emotions are categorized into 'C' classes, class feature vectors wk are obtained
for each class (k = 1 to C), each giving a covariance matrix of size (d×d), where d is given
in eq. (3.3.4). These features undergo dimensionality reduction by PCA: the eigen values
and eigen vectors of the covariance matrix are calculated and PCA is applied as shown in
section 3.2.2. Similarly, the feature vector (Y) of the test image is estimated using
eqs. (3.3.1)-(3.3.4). The conditional density function and its logarithm using PCA are
shown in eqs. (3.5.1)-(3.5.2) below [7],
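$$p(Y \mid w) = \frac{1}{(2\pi)^{N/2}\,|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2}(Y - M)^t\, \Sigma^{-1} (Y - M) \right\}$$ (3.5.1)
$$L = \ln[p(Y \mid w)] = -\frac{1}{2}\left\{ \sum_{i=1}^{M_n} \frac{z_i^2}{\lambda_i} + \frac{\|Y - M\|^2 - \sum_{i=1}^{M_n} z_i^2}{\rho} + \ln\left(\prod_{i=1}^{M_n} \lambda_i\right) + (N - M_n)\ln\rho + N\ln(2\pi) \right\}$$ (3.5.2)
where M and ∑ are the mean and covariance of the class feature vector 'w', and 'L' is the
likelihood of the test image with respect to class w.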
3.5.2 Classification
If there are 'C' emotion classes, then for each class k (k = 1 to C) the likelihood 'Lk'
of the test image is estimated using eq. (3.5.2). To classify using the estimated
likelihoods Lk (k = 1 to C), the proposed method extends the binary Bayes classifier to a
multi-class classifier (with equal class priors): the class 'k' with the maximum value of
the logarithm of the conditional density function shows the maximum similarity between its
class feature and the test image feature, so the test image is categorized under the k-th
emotion class. Mathematically, as shown in eq. (3.5.3),
$$C_{observed} = \underset{1 \le k \le C}{\arg\max}\ (L_k)$$ (3.5.3)
Eq. (3.5.3) recognizes the emotion class 'Cobserved' as the emotion label of the test
image. The prior probabilities of the emotion classes are equal, as an equal number of
training images has been taken in each class.
The proposed method is implemented and tested in the subsequent sections.
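The decision rule amounts to picking the class with the largest log likelihood. A minimal Python sketch is given below; the function name `classify` and the dictionary of per-class likelihoods are assumptions for illustration, and the numbers in the usage note are made up and are not results from this work.

```python
def classify(log_likelihoods):
    """Return the class label whose log likelihood L_k is largest (equal priors)."""
    return max(log_likelihoods, key=log_likelihoods.get)
```

For example, `classify({"anger": -120.5, "happy": -98.2, "sad": -143.0})` returns `"happy"`. With only two entries (face and non-face) the same rule reduces to the binary Bayes classifier of section 3.2.3.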
4.1 MATLAB Implementation
The proposed method is implemented following the process described in chapter
3. The pre-processing of images is done by the Bayesian discriminating feature analysis
method, with (M = 10) principal components taken into account for modelling, using
eqs. (3.2.1) to (3.2.11) in MATLAB.
Pre-processed images then undergo feature extraction using Gabor wavelets, as per
section 3.3. Table 4-1 shows the parameters used for designing the Gabor wavelets and for
feature extraction using the wavelets. The features are extracted using eqs. (3.3.1) to
(3.3.3), and eq. (3.3.4) gives the size of the feature of an image. With the defined
parameters the size of the feature comes out to be (1024×1), since
d = 16×16×8×8/(4×4) = 1024.
Table 4-1
Parameters for Gabor Wavelets generation
Parameter                                        Value
Gabor wavelets
  No. of frequencies (g)                         8
  No. of orientations (h)                        8
  No. of rows in 2-D Gabor filter                39
  No. of columns in 2-D Gabor filter             39
Gabor features extraction
  Factor of down sampling along rows (d1)        4
  Factor of down sampling along columns (d2)     4
Each class feature is computed by combining the images of that class. Since the
feature is very large, PCA is applied to the class features using eqs. (3.4.1) to (3.4.5)
for dimensionality reduction, and the first 10 principal components (M = 10) are used for
modelling with eqs. (3.5.1) and (3.5.2). The likelihood is calculated for each training
class: anger, disgust, fear, happy, sad, surprise and neutral. The calculated likelihoods
are then compared to find which class shares maximum similarity with the test image, and
the class with the maximum likelihood is taken as the recognized emotion class for the
given test image.
4.2 FPGA Implementation of Post Feature Extraction Process
After pre-processing and feature extraction in MATLAB, the dimensionality reduction
and modelling part is implemented on FPGA using Xilinx ISE 10.1. The computational block
diagram, designed by following the Jacobi algorithm of section 3.4, is shown in fig. 4.1.
Fig. 4.1 Block diagram for Dimensionality Reduction process of Classification
Let 'A' be the feature matrix representing any class and Y the feature of the test
image after Gabor feature extraction. Since the class feature is an accumulation of various
images, the covariance of the class feature is calculated as a measure of the difference
among images. The covariance is a symmetric matrix whose diagonal elements show the
variance of an image across its components, while the off-diagonal elements show the
difference between different images. Taking advantage of this symmetry, only the upper
triangular off-diagonal elements are calculated and the lower half is assigned by copying
them, which saves hardware.
Covariance is calculated using eq. (3.4.3), for which the deviation (A1) of the feature
from its mean (Amean) needs to be calculated. The mean is calculated as shown in fig. 4.2.
Each column represents an image in the class, so the column-wise mean is calculated and
subtracted from its respective column to find the deviation. Since the feature is of size
(1024×1), the division needed for the mean is done with the right shift operation of HDL:
to divide by 1024, the vector is right shifted by 10 bits (1024 = 2^10). The deviation A1
and its transpose A1' are calculated and fed to the Covariance block, fig. 4.3, which
calculates the upper triangular elements by multiplying and adding elements and then
divides using a CORDIC divider.
Fig. 4.2 Block diagram for mean calculation
Fig. 4.3 Block diagram for Covariance module
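The shift-based mean can be mimicked in Python on integer data. This is only a behavioural sketch of the idea, not the HDL of fig. 4.2; the function names are hypothetical. Python's `>>` on integers is an arithmetic shift, so the result is rounded towards minus infinity, as a signed arithmetic right shift would be in hardware.

```python
def column_mean_shift(column):
    """Mean of 1024 integer samples: divide the sum by 1024 with a 10-bit right shift."""
    assert len(column) == 1024        # 1024 = 2**10, so no divider is needed
    return sum(column) >> 10

def deviation(column):
    """Subtract the column mean from every element of the column."""
    mean = column_mean_shift(column)
    return [value - mean for value in column]
```

For the column 0, 1, …, 1023 the exact mean is 511.5 and the shifted result is 511, which shows the truncation that comes with replacing the division by a shift.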
Following the steps of the Jacobi algorithm in section 3.4, the eigen values and eigen
vectors are calculated iteratively from the covariance matrix cov. Along with the
covariance block, an identity matrix U is also formed. Using the Eigen loop block, fig. 4.4,
each iterative value of C2 and U2 is calculated until the threshold is reached. In this
particular implementation the threshold is set to '2'. A mux is used in the top block,
fig. 4.1, to iterate according to the threshold. The 'Max value and indices' block finds the
maximum among the absolute values of the off-diagonal elements of C1, together with its
index and the index of its symmetric counterpart (max_index and symm_max_index) and the
indices of the corresponding diagonal elements (diag_i and diag_j). The (cos α, sin α) block
implements step 3, followed by the matrix V block and the (C2, U2) block of step 4. The
threshold comparator of fig. 4.1 compares max_element with the threshold, which is set to
'2', and generates a flag when the condition is satisfied; this flag works as the select
line of the mux.
Fig. 4.4 Block diagram for Eigen Loop
Fig. 4.5 shows the implementation of the block that generates the cos α and sin α
signals. The angle α is calculated using eq. (3.4.4), which uses the inverse tangent
function. The inverse tangent, cos and sin functions are implemented using the CORDIC IP
core 4.0. The CORDIC cos and sin values are read in the XQN format described in
section 2.3.2.
Fig. 4.5 Block diagram for (cos α, sin α) module using CORDIC
To implement the inverse tangent there is an arctan function available in CORDIC, but
it gives the phase angle correctly only when the inputs (X and Y) lie in the first quadrant
(i.e. both positive); otherwise it gives garbage, as its output phase is limited to
(-pi/4 to +pi/4). So to realize an arctan function suitable for all four quadrants, some
functionality has to be added to the CORDIC block, as shown in fig. 4.6. If phase_out is
the inverse tangent angle when the inputs lie in the first quadrant, the angle for the
other quadrants can be calculated as:
sel => 00 => angle <= phase_out (Ist quadrant)
sel => 10 => angle <= pi - phase_out (IInd quadrant)
sel => 11 => angle <= − pi + phase_out (IIIrd quadrant)
sel => 01 => angle <= − (phase_out) (IVth quadrant)
Fig. 4.6 Block diagram for Arctan function using CORDIC
In fig. 4.6, the (magnitude and sign sel) block stores the magnitudes and signs of the
inputs (X and Y) and generates the sel signal according to the signs. The inverse tangent
is computed on the magnitudes, and the Phase rotation block then adjusts the angle according
to sel, using the mapping above, to give the correct inverse tangent angle.
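The quadrant correction can be checked with a small Python sketch. It assumes that sel is formed from the sign of X followed by the sign of Y, and it uses `math.atan2` on the magnitudes as a stand-in for the first-quadrant CORDIC arctan; both are assumptions for illustration. Quadrants are named in the usual way (III: both inputs negative, IV: X positive and Y negative).

```python
import math

def arctan_four_quadrant(x, y):
    """Four-quadrant arctan built from a first-quadrant-only arctan."""
    sel = (x < 0, y < 0)                        # assumed encoding: (sign of X, sign of Y)
    phase_out = math.atan2(abs(y), abs(x))      # stands in for CORDIC arctan on magnitudes
    if sel == (False, False):
        return phase_out                        # 1st quadrant
    if sel == (True, False):
        return math.pi - phase_out              # 2nd quadrant
    if sel == (True, True):
        return -math.pi + phase_out             # 3rd quadrant
    return -phase_out                           # 4th quadrant
```

For inputs in all four quadrants the result agrees with the library function `math.atan2(y, x)`, which is a convenient reference when verifying the hardware block in simulation.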
After the eigen values and eigen vectors are calculated by the design of fig. 4.1 and
its intermediate blocks, the principal components are calculated using eq. (3.2.8), as
shown in fig. 4.7.
Fig. 4.7 Block diagram for Principal component calculation
Here the first block selects the M principal components corresponding to the M largest
eigen values and outputs the eigen vector matrix Φ. The second block finds the deviated
feature matrix and feeds it to the multiplier. The multiplier multiplies the transpose of
the matrix Φ by the temp matrix signal to generate the principal components Z. These
principal components are then used in eqs. (3.5.1) to (3.5.3) for modelling and
classification.
Delay Path Model for FPGA Implementation
In designing for hardware implementation, one of the most important aspects is
synchronization between the modules of the design. Since latency in an HDL design is
measured in clock cycles, the inputs to a module should arrive at the same time to avoid
mixing data from different calculations. To maintain synchronization, the latencies of the
blocks are calculated beforehand and delays are inserted wherever required, as shown in
fig. 4.8.
In the flow graph, the number on an arrow shows the delay that must be applied to the
input of the succeeding signal or block to keep the output of that block synchronized; in
this manner the full design is synchronized. For example, in the above design the inverse
tangent function has a latency of 14 clock cycles and the (sine, cosine) block a further
latency of 20 clock cycles. Since the calculation of matrix V requires both the output of
the (sine, cosine) block and all the indices, a delay of 34 clock cycles (14 + 20 = 34) is
applied to the index signals to bring them into sync. The whole design is synchronized in
the same way.
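The delay calculation can be expressed as a small Python helper: every path feeding a block is padded up to the latency of the slowest path. The function name and the path labels are hypothetical; only the latencies of 14 and 20 clock cycles are taken from the example above.

```python
def balancing_delays(path_latencies):
    """Clock cycles of delay to add to each input path so that all arrive together."""
    latest = max(path_latencies.values())
    return {name: latest - latency for name, latency in path_latencies.items()}
```

For the matrix V block, `balancing_delays({"sin_cos": 14 + 20, "indices": 0})` gives a delay of 34 cycles for the index signals and none for the (sine, cosine) path.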
5.1 Simulation Results
Both the face detection and emotion detection processes are implemented using the
constraints defined in chapter 4. Since both are supervised learning methods, the system is
first modelled and trained using training images (from both the face class and the non-face
class in the case of face detection, and from all seven emotion classes in the case of
emotion detection); test images are then presented to the system to be classified.
The pre-processing step, i.e. the face detection method, is implemented with (M = 10)
principal components and trained using 2400 face class images and 4500 non-face class
images from the variance face class database and the variance non-face class database
respectively. Since the non-face class may contain any image from the rest of the world
other than a face, a very large number of images belong to this class; for training, only
images with variance similar to the face class are taken into account. Examples of face
class and non-face class variance images are shown in fig. (5.1) and fig. (5.2)
respectively.
Fig. 5.1 Example of Variance face class images
Fig. 5.2 Example of Variance non-face class images
The facial expression recognition algorithm is tested on the JAFFE database, so the
JAFFE database is first pre-processed (face detection) and then used in the emotion
recognition part for both training and testing.
JAFFE Database: JAFFE is an acronym for Japanese Female Facial Expression. The
JAFFE database provides 213 images posed by 10 Japanese females showing all the
seven basic emotions (Anger, Disgust, Fear, Happy, Sad, Surprise and Neutral) [10].
The database was planned and assembled by Michael Lyons, Miyuki Kamachi and
Jiro Gyoba. These images were taken when the subjects were subjected to a sudden
feeling of one of the above emotions. Examples of images from the JAFFE database are
shown below in fig. 5.3.
Fig.5.3 Example of images from JAFFE Database showing various emotions
Images from the JAFFE database have been pre-processed and resized to 16×16 for
training as well as testing. The accuracy of pre-processing is shown in Table 5-1 below,
and fig. 5.4 shows examples of pre-processed images.
Table 5-1
Accuracy of Face Detection using Bayesian Discriminating Feature Analysis Method
Set    Sources           Number of Images    Detected face    Undetected face    Accuracy
Set1   JAFFE Database    213                 205              8                  96.244%
Fig. 5.4 Example of pre-processed images of JAFFE Database
The JAFFE database has 210 frontal face images showing seven emotions posed by
10 Japanese female models. Each emotion class has 30 images. Out of the 30 images in
each class, 20 have been used for training and the remaining 10 for testing the accuracy.
Different training and test sets have been used in three different rounds. The number of
principal components taken is 10 (i.e., M = 10). The performance of the proposed emotion
recognition method is shown in table 5-2, where the rows give the desired emotion and the
columns the recognized emotion. Anger is classified with 100 % accuracy. The lowest
accuracy is obtained for the fear class.
Table 5-2
Percentage of Correct and Erroneous Emotion Recognition with Training Database and Test
Images from JAFFE Database
(rows: desired emotion; columns: recognized emotion; values in %)
Desired     Anger   Disgust   Fear    Happy   Neutral   Sad     Surprise
Anger       100     0         0       0       0         0       0
Disgust     0       96.77     3.22    0       0         0       0
Fear        0       3.22      93.54   0       0         3.22    0
Happy       0       3.22      0       96.77   0         0       0
Neutral     0       0         0       0       96.77     0       3.22
Sad         0       3.22      0       0       0         96.66   0
Surprise    0       0         3.22    0       0         0       96.77
Overall Accuracy: 96.73%
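A table of this kind is a row-normalised confusion matrix. The following Python sketch shows how such percentages can be derived from lists of desired and recognized labels. The toy labels are hypothetical and are not the thesis data, and the helper names `confusion_percent` and `overall_accuracy` are illustrative.

```python
from typing import List, Sequence


def confusion_percent(desired: Sequence[str],
                      recognized: Sequence[str],
                      classes: Sequence[str]) -> List[List[float]]:
    """Row-normalised confusion matrix in percent (rows: desired class)."""
    index = {name: i for i, name in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for d, r in zip(desired, recognized):
        counts[index[d]][index[r]] += 1
    percent = []
    for row in counts:
        n = sum(row)
        percent.append([100.0 * c / n if n else 0.0 for c in row])
    return percent


def overall_accuracy(desired: Sequence[str], recognized: Sequence[str]) -> float:
    """Percentage of samples whose recognized label equals the desired one."""
    correct = sum(d == r for d, r in zip(desired, recognized))
    return 100.0 * correct / len(desired)


# Hypothetical toy labels (not the thesis data): one Fear image is read as Sad.
classes = ["Anger", "Fear", "Sad"]
desired = ["Anger"] * 4 + ["Fear"] * 4 + ["Sad"] * 4
recognized = ["Anger"] * 4 + ["Fear"] * 3 + ["Sad"] + ["Sad"] * 4

table = confusion_percent(desired, recognized, classes)
overall = overall_accuracy(desired, recognized)
for name, row in zip(classes, table):
    print(name, [round(v, 2) for v in row])
print(f"overall accuracy = {overall:.2f}%")
```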
5.2 FPGA Implementation Results
The classification design shown in Fig. 4.1 is implemented and simulated using the
Xilinx ISE 10.1 tool, targeting the Virtex-II Pro family (device XC2VP30, package
FF896). The simulated output waveform is shown in Fig. 5.5.
5.3 Comparison
The performance of the proposed method is compared with the existing highly accurate
facial emotion recognition models listed below; the comparison chart is shown in Fig. 5.6.
The method proposed in [2], based on a fuzzy relational approach, shows 88.2 % overall
accuracy when tested on Set1, containing images of 100 Indian males showing 6
emotions (all except neutral). It shows 92.2 % accuracy when tested on Set2, containing
images of 100 Indian females showing 6 emotions (all except neutral).
The method proposed in [4], based on facial movement analysis, shows
92.92 % accuracy when tested on Set1, containing 180 images from the JAFFE database
showing 6 emotions (all except neutral), and 93.14 % accuracy when tested on Set2,
containing 800 images from the Cohn-Kanade database showing 6 emotions.
The method proposed in [5], based on the local directional number pattern, shows 96.68 %
accuracy when tested on Set1, containing images from the Cohn-Kanade database showing
all seven emotions.
In [6], 94 % accuracy is obtained by using the Hidden Markov Model method for
recognition.
The method in [8], based on facial action units, shows 87.43 % accuracy when tested
on Set1, containing images from the Cohn-Kanade database showing 6 emotions (all except
neutral).
Fig. 5.6 Chart comparing performance of proposed method with that of existing methods
The comparison chart shows that the proposed method of facial expression
recognition has the highest overall accuracy when tested on the JAFFE database. Its
100 % accuracy for the anger class is also the highest among the compared methods.
(Fig. 5.6 axes: accuracy in %, from 60 to 100; categories: Anger, Disgust, Fear, Happy, Sad, Surprise, Neutral, Overall Accuracy)
6.1 Conclusion
A method for facial emotion recognition based on Gabor wavelet features and an
extended Bayesian classifier for multi-class classification is proposed. The proposed
method shows an overall accuracy of 96.73% on the JAFFE database when implemented in
MATLAB, which is higher than that of the existing highly accurate facial emotion
recognition methods considered. The existing method for face detection based on Bayesian
discriminating feature analysis is also implemented successfully, with an accuracy of
98.59% when tested on the JAFFE database. The simple extended Bayesian classifier is
suitable for real-time implementation and is implemented on a Virtex-II Pro FPGA using
the Xilinx ISE 10.1 simulator with the CORDIC IP core. The ease of implementation and
hardware efficiency of the CORDIC IP core make the FPGA implementation area-efficient
and fast. The CORDIC IP core is used extensively for the calculation of logarithm, sine,
cosine, tangent and arctangent functions.
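To illustrate why CORDIC maps well to hardware, the following Python sketch implements the rotation mode of the circular CORDIC algorithm described by Volder [22] for sine and cosine. It is a floating-point software model written for illustration only. It is not the interface or internal structure of the Xilinx CORDIC IP core, and the function name `cordic_sin_cos` and the default of 16 iterations are assumptions.

```python
import math


def cordic_sin_cos(theta: float, iterations: int = 16):
    """Approximate (sin(theta), cos(theta)) with rotation-mode CORDIC.

    theta is in radians and must satisfy |theta| <= pi/2, which lies inside
    the convergence range of the basic algorithm.
    """
    if abs(theta) > math.pi / 2:
        raise ValueError("theta must lie in [-pi/2, pi/2]")

    # Elementary rotation angles atan(2^-i); in hardware these sit in a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start from (1/gain, 0) so the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0   # rotate toward zero residual angle
        # Multiplying by 2^-i is a right shift in a fixed-point datapath.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(math.pi / 6)
print(f"sin = {s:.5f}, cos = {c:.5f}")
```

Each iteration needs only additions, subtractions and shifts, which is the source of the area efficiency noted above. Tangent follows as the ratio of the two outputs, while logarithm uses the hyperbolic variant of the algorithm, which this sketch does not cover.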
6.2 Scope for Future Work
Implementation of the remaining modules (feature extraction and face detection)
in HDL, leading to a full hardware implementation of the algorithm on
FPGA.
Real-time implementation of the proposed algorithm for facial expression
recognition.
Realization and implementation of the second component of HCI, namely gesture
recognition.
ASIC implementation for HCI including both emotion and gesture recognition.
BIBLIOGRAPHY
[1] K.G. Smitha and A.P. Vinod, “Hardware Efficient FPGA Implementation of
Emotion Recognizer for Autistic Children,” IEEE CONECCT, pp.1–4, 2013.
[2] A. Chakraborty and A. Konar, “Emotion Recognition from Facial Expressions and
its Control using Fuzzy Logic,” IEEE Transactions on Systems, Man and
Cybernetics- Part A: Systems and Humans, vol.39(4), pp.726–743, 2009.
[3] T. Senechal, V. Rapp, H. Salam, R. Seguier, K. Bailly and L. Prevost, “Facial
Action Recognition Combining Heterogeneous Features via Multikernel
Learning,” IEEE Transactions on Systems, Man and Cybernetics- Part B:
Cybernetics, vol.42(4), pp.993–1005, 2012.
[4] L. Zhang and D. Tjondronegoro, “Facial Expression Recognition Using Facial
Movement Features,” IEEE Transactions on Affective Computing, vol.2(4),
pp.219–229, 2011.
[5] A.R. Rivera, J.R. Castillo and O. Chae, “Local Directional Number Pattern for
Face Analysis: Face and Expression Recognition,” IEEE Transactions on Image
Processing, vol.22(5), pp.1740–1752, 2013.
[6] M. Lahbiri et al., “Facial Emotion Recognition with the Hidden Markov Model,”
International Conference on Electrical Engineering and Software Applications
(ICEESA), pp.1–6, 2013.
[7] C. Liu, “A Bayesian Discriminating Features Method for Face Detection,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol.25(6), pp.725–
740, 2003.
[8] Y. Li, S. Wang, Y. Zhao and Q. Ji, “Simultaneous Facial Feature Tracking and
Facial Expression Recognition,” IEEE Transactions on Image Processing,
vol.22(7), pp.2559–2573, 2013.
[9] G. Littlewort, M.S. Bartlett, I. Fasel, J. Susskind and J.R. Movellan, “Dynamics of
Facial Expression Extracted Automatically from Video,” Image and Vision
Computing, vol.24, pp.615–625, 2006.
[10] M. Lyons, M. Kamachi and J. Gyoba. (1998, April). The Japanese Female Facial
Expression (JAFFE) Database. [Online]. Available:
http://www.kasrl.org/jaffe.html
[11] V. Struc and N. Pavesic, “Gabor-Based Kernel Partial-Least-Squares
Discrimination Features for Face Recognition,” Informatica, vol.20(1), pp.115–
138, 2009.
[12] M. Haghighat, S. Zonouz and M. Abdel-Mottaleb, “Identification Using
Encrypted Biometrics,” Computer Analysis of Images and Patterns, Springer
Berlin Heidelberg, vol.8048, pp.440–448, 2013.
[13] T. Pfister, M. Pietikainen, X. Li and G. Zhao, “Automatic Recognition
Algorithm for Detecting Facial Expressions,” U.S. Patent 0 300 900, Nov 14,
2013.
[14] F.Z. Chelali, A. Djeradi and R. Djeradi, “Linear Discriminant Analysis for Face
Recognition,” IEEE International Conference on Multimedia Computing and
Systems, pp.1–10, 2009.
[15] C. Shan, S. Gong and P. McOwan, “Facial Expression Recognition Based on Local
Binary Patterns: A Comprehensive Study,” Image and Vision Computing,
vol.27(6), pp.803–816, 2009.
[16] C.-C. Tsai, Y.-Z. Chen and C.-W. Liao, “Interactive Emotion
Recognition Using Support Vector Machine for Human-Robot Interaction,” Proc.
IEEE International Conference on Systems, Man, and Cybernetics, pp.407–412,
2009.
[17] V.I. Pavlovic, R. Sharma and T.S. Huang, “Visual Interpretation of
Hand Gestures for Human-Computer Interaction: A Review,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol.19(7), pp.677–695, July
1997.
[18] L.S. Ng, M.S. Nixon and J.N. Carter, “Texture Classification using Combined
Feature Sets,” IEEE Southwest Symposium on Image Analysis and Interpretation,
pp.103-108, 1998.
[19] S.K. Singh, D.S. Chauhan, M. Vatsa and R. Singh, “A Robust Skin Color Based
Face Detection Algorithm,” Tamkang Journal of Science and Engineering,
vol.6(4), pp.227–234, 2003.
[20] P. Viola and M. Jones, “Rapid Object Detection using a Boosted Cascade of
Simple Features,” IEEE Conference on Computer Vision and Pattern Recognition,
vol.1, pp.511–518, 2001.
[21] B. Moghaddam and A. Pentland, “Probabilistic Visual Learning for Object
Representation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol.
19(7), pp. 696-710, July 1997.
[22] J.E. Volder, “The CORDIC Trigonometric Computing Technique,” IRE
Transactions on Electronic Computers, vol. EC-8, pp.330–334, Sep. 1959.
[23] Xilinx LogiCORE, “CORDIC v3.0,” DS249 product specification, May 21, 2004.