Extended feed forward neural networks with random weights for face recognition



Jing Lu, Jianwei Zhao, Feilong Cao*

Department of Information Sciences and Mathematics, China Jiliang University, Hangzhou 310018, Zhejiang Province, China

* Corresponding author. E-mail address: [email protected] (F. Cao).

Neurocomputing 136 (2014) 96–102. doi:10.1016/j.neucom.2014.01.022

Article info

Article history: Received 19 August 2013; Received in revised form 30 November 2013; Accepted 4 January 2014; Communicated by D. Wang; Available online 5 February 2014.

Keywords: Face recognition; Classifier; Neural network with random weights (NNRW); Matrix data

Abstract

Face recognition has long been a prominent topic in the fields of pattern recognition and computer vision. Generally, images or features are converted into vectors in the process of recognition. This vectorization of an image matrix usually distorts the correlative information among its elements. This paper designs a classifier called the two-dimensional neural network with random weights (2D-NNRW), which takes matrix data as direct input and thus preserves the image matrix structure. Specifically, the proposed classifier employs left and right projecting vectors to replace the usual high-dimensional input weights in the hidden layer, so as to keep the correlative information among the elements, and adopts the idea of the neural network with random weights (NNRW) to learn all the parameters. Experiments on several well-known databases validate that the proposed classifier 2D-NNRW can embody the structural character of the face image and performs well for face recognition.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The strong adaptability, high security, and non-contact smart interaction of face recognition give it great potential in applications such as public security, intelligent access control, and criminal investigation. Thus face recognition has become an increasingly active topic in the fields of pattern recognition and computer vision. A traditional face recognition system generally consists of four steps: face detection, image preprocessing, feature extraction, and classification with some classifier; among these, feature extraction and classification are the core steps.

After decades of development, various effective feature extraction methods and classifiers have emerged in the field of automatic face recognition [1]. For example, classical feature extraction methods include eigenfaces [2], fisherfaces [3], independent component analysis (ICA) [4], laplacianfaces [5], kernel tricks [6,7], and so on. Popular classifiers include the nearest neighbor (NN) classifier [8,9], the support vector machine (SVM) [10,11], the feed-forward neural network (FNN) [12], and so on. Notably, these existing feature extraction and classification methods operate only on vector input. That is, before applying them to face recognition, the matrix data of a face image must be converted into a row or column vector. However, this conversion usually destroys the relationships among the elements of the original matrix data, which may degrade the extracted features and the subsequent classification results.

Recently, many researchers have proposed two-dimensional feature extraction methods that operate on the matrix data directly, e.g. two-dimensional principal component analysis (2DPCA) [13,14] and two-dimensional linear discriminant analysis (2DLDA) [15–17], which have been verified useful for extracting effective information from neighbouring elements as well as for reducing the computational complexity of the extraction. On the other hand, when existing classifiers such as SVM and FNN are used, the extracted feature matrices must be converted into column vectors, so the neighbouring information of the feature matrix is usually destroyed and the recognition rate decreases. Although the NN classifier can classify feature matrices directly, since the distance is the same whether computed in matrix or vector form, its structure is so simple that it usually cannot achieve the expected recognition rate. Therefore, it is meaningful to study classifiers designed for matrix data, i.e. 2D input.

So, to classify matrix data directly and to preserve the matrix or 2D feature structure effectively, we propose a novel classifier, named the two-dimensional neural network with random weights (2D-NNRW), which can also be applied effectively to face recognition.

To construct the classifier, we employ a special kind of feed forward network first introduced in [18], named the neural network with random weights (NNRW). These networks have fast learning speed because their input weights and biases are chosen randomly; meanwhile, they can still achieve good classification performance [18–22]. So the designed classifier can achieve efficient classification.


The proposed model employs a left projecting vector and a right projecting vector to regress each matrix datum to its label for each class, which is inspired by a recent matrix-input-based classifier named multiple rank regression (MRR) [23], and uses the random-weight idea to train the weights. So the proposed classifier can achieve higher accuracy than some vector-based regression methods, and it trains quickly because the projecting vectors and biases of the hidden layer are chosen randomly.

The rest of this paper is organized as follows. In Section 2, we propose the new classifier 2D-NNRW, together with its corresponding algorithm, to maintain the structural properties of matrix data based on NNRW. In Section 3, we first carry out recognition-rate experiments on several well-known face image databases to verify the effectiveness of 2D-NNRW, then analyze the nature of the proposed model, and finally combine the proposed classifier 2D-NNRW with some 2D feature extraction methods to obtain a new face recognition tool and compare it with other methods. Conclusions are highlighted in Section 4.

2. The proposed method

2.1. Notations

First, we give some important notations in Table 1; other notations and their concrete meanings will be explained when they are first used.

Table 1
Notations.

  $m$ — the first dimension of the matrix data
  $n$ — the second dimension of the matrix data
  $N$ — the number of training samples
  $c$ — the number of classes
  $y_i \in \mathbb{R}^d$ — the $i$-th training vector datum
  $X_i \in \mathbb{R}^{m\times n}$ — the $i$-th training matrix datum
  $t_i \in \mathbb{R}^c$ — the label vector of $X_i$ or $y_i$
  $\phi : \mathbb{R}\to\mathbb{R}$ — the activation function
  $L$ — the number of hidden nodes
  $b_j \in \mathbb{R}$ — the bias of the $j$-th hidden node
  $u_j \in \mathbb{R}^m$ — the left projecting vector of the $j$-th hidden node
  $v_j \in \mathbb{R}^n$ — the right projecting vector of the $j$-th hidden node

2.2. A brief review of NNRW

It is well known that an FNN with a single hidden layer can be mathematically modeled as

$$f_L(y) = \sum_{i=1}^{L} \beta_i\, \phi(w_i^\top y + b_i), \qquad (1)$$

where $L$ is the number of hidden nodes, $\phi:\mathbb{R}\to\mathbb{R}$ is the activation function, $y=(y_1,y_2,\dots,y_d)^\top\in\mathbb{R}^d$ is the input vector, $w_i=(w_{i1},w_{i2},\dots,w_{id})^\top\in\mathbb{R}^d$ is the input weight connecting the $i$-th hidden node to the input, $b_i\in\mathbb{R}$ is the bias of the $i$-th hidden node, and $\beta_i$ is the output weight, $i=1,\dots,L$.

According to conventional FNN theory, the hidden layer parameters $w_i$, $b_i$ and the output weights $\beta_i$ ($i=1,\dots,L$) are all freely adjustable. In supervised learning, the hidden layer parameters and the output weights must be trained and tuned properly for the given set of training samples. One famous algorithm for training the weights and biases is error back-propagation (BP), which uses gradient descent to adjust all the weights and biases. However, BP generally converges slowly due to its iterative nature, and it easily falls into local minima.

A fast learning algorithm for FNNs with a single hidden layer, called the neural network with random weights (NNRW), was first proposed in [18] and developed in [19–22]. Its main idea is as follows: for a given set of training samples, choose the input weights and biases randomly, i.e., treat each element of the input weights and biases as a random variable; the output weights can then be calculated using the Moore–Penrose generalized inverse. Actually, the vector of output weights can be expressed as follows (see [21]):

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\| \sum_{j=1}^{L} \beta_j\, \phi({w_j^*}^\top y_i + b_j^*) - t_i \right\|_2^2,$$

where $w_j^*$ and $b_j^*$ are the randomly generated input weights and biases, $j=1,2,\dots,L$, $\beta = [\beta_1, \beta_2, \dots, \beta_L]^\top$, and $\{(y_i, t_i)\}_{i=1}^{N}$ is the set of given training samples and their corresponding true outputs. Rewriting the above model in matrix form, we have

$$\hat{\beta} = \arg\min_{\beta} \|H\beta - T\|_F^2,$$

where

$$H = \begin{pmatrix} \phi({w_1^*}^\top y_1 + b_1^*) & \cdots & \phi({w_L^*}^\top y_1 + b_L^*) \\ \vdots & & \vdots \\ \phi({w_1^*}^\top y_N + b_1^*) & \cdots & \phi({w_L^*}^\top y_N + b_L^*) \end{pmatrix}$$

is the hidden output matrix and

$$T = \begin{pmatrix} t_1^\top \\ \vdots \\ t_N^\top \end{pmatrix}.$$

Then $\hat{\beta} = H^{\dagger} T$, where $H^{\dagger}$ is the Moore–Penrose generalized inverse of $H$.

Remark 2.1. For classification using the NNRW, the objective value of each sample is a label vector. That is, if there are $c$ classes to be classified, we usually set $t_i = (0,\dots,0,1,0,\dots,0)^\top \in \mathbb{R}^c$, where the 1 appears only in the $i$-th position.

Remark 2.2. In the case of recognition, for a new input $z\in\mathbb{R}^d$, the real output of the learned NNRW can be computed as

$$f_L(z) = \sum_{i=1}^{L} \hat{\beta}_i\, \phi({w_i^*}^\top z + b_i^*) \in \mathbb{R}^c.$$

Then we can judge the class that $z$ belongs to by finding the position of the maximum component of $f_L(z)$.
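To make the scheme concrete, here is a minimal NumPy sketch of NNRW training and prediction as reviewed above. The sigmoid activation, the uniform $[-1,1]$ sampling range, and all function and variable names are our illustrative assumptions, not specifications from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_nnrw(Y, T, L, rng=None):
    """Y: (N, d) input vectors; T: (N, c) one-hot labels; L: number of hidden nodes."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = Y.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, L))  # random input weights w_j*
    b = rng.uniform(-1.0, 1.0, size=L)       # random biases b_j*
    H = sigmoid(Y @ W + b)                   # hidden output matrix H
    beta = np.linalg.pinv(H) @ T             # output weights: beta = H^† T
    return W, b, beta

def predict_nnrw(Z, W, b, beta):
    """Return the predicted class index for each row of Z (cf. Remark 2.2)."""
    return np.argmax(sigmoid(Z @ W + b) @ beta, axis=1)
```

Training reduces to one matrix product and one pseudoinverse, which is the source of NNRW's speed advantage over iterative BP training.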

2.3. The proposed 2D-NNRW

As we all know, traditional methods for face recognition often concatenate each image into a row or column vector and then use a vector-based classifier to classify those vectors. However, with the development of image processing techniques, researchers have extended feature extraction methods from 1D (vector) to 2D (matrix), for example 2DPCA and 2DLDA. These feature extraction methods preserve the structural characters of the elements in the original images, so they are more suitable for image classification than other methods [24]. The following problem is then how to classify these matrix-based features. In order to preserve the structural information of the extracted features, it is meaningful to design a classifier for matrix input.

For a given set of matrix features $\{(X_i, t_i)\}_{i=1}^{N}$, where $X_i \in \mathbb{R}^{m\times n}$ and $t_i \in \mathbb{R}^{c}$, we construct the following 2D-FNN with a single hidden layer as an approximator:

$$f_L(X) = \sum_{j=1}^{L} \beta_j\, \phi(u_j^\top X v_j + b_j), \qquad (2)$$

where $X \in \mathbb{R}^{m\times n}$, $u_j \in \mathbb{R}^{m}$, $v_j \in \mathbb{R}^{n}$, $b_j \in \mathbb{R}$, and $\beta_j \in \mathbb{R}^{c}$, $j = 1, 2, \dots, L$.

Remark 2.3. Let $x = \mathrm{Vec}(X_{m\times n})$ and $w_j = \mathrm{Vec}(u_j v_j^\top)$, where $\mathrm{Vec}(\cdot)$ denotes stacking an $m\times n$ matrix into a column vector with $mn$ elements. Then

$$u_j^\top X v_j = \mathrm{Tr}(u_j^\top X v_j) = \mathrm{Tr}(X v_j u_j^\top) = \mathrm{Tr}\big((u_j v_j^\top)^\top X\big) = \big(\mathrm{Vec}(u_j v_j^\top)\big)^\top \mathrm{Vec}(X) = w_j^\top x.$$

So we can see that the 2D-FNN in (2) is equivalent to an FNN as in (1) with $mn$ inputs.
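As a quick numerical sanity check of this identity (our own illustration, not from the paper), one can verify that the bilinear form agrees with the inner product of the vectorized quantities:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
u, v = rng.standard_normal(m), rng.standard_normal(n)
X = rng.standard_normal((m, n))
lhs = u @ X @ v                           # u^T X v
rhs = np.outer(u, v).ravel() @ X.ravel()  # Vec(u v^T)^T Vec(X)
assert np.isclose(lhs, rhs)
```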

Remark 2.4. Although the 2D-FNN is equivalent to an FNN, it does not need to convert a matrix into a column vector. So it preserves the structural information among the elements of the matrix, which is very important for the subsequent classification.

Remark 2.5. For the same image $X$, the 2D-FNN and the FNN have different numbers of parameters to be calculated: only $(m+n+1+c)L$ parameters for the 2D-FNN, versus $(mn+1+c)L$ for the FNN.
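For instance, with $32\times 32$ images, $c=40$ classes, and $L=1000$ hidden nodes (an illustrative configuration, not one reported in the paper), the 2D-FNN needs $(32+32+1+40)\cdot 1000 = 1.05\times 10^5$ parameters, whereas the FNN needs $(1024+1+40)\cdot 1000 \approx 1.07\times 10^6$, roughly ten times as many.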

Remark 2.6. When $X$ is an $m\times 1$ matrix, $u_j$ is an $m\times 1$ column vector and $v_j$ is a $1\times 1$ vector, i.e. a number; the 2D-FNN then reduces to the FNN given in (1).

Now we design a learning algorithm to determine all the weights and biases. Inspired by the idea of NNRW [18], we choose the left and right projecting vectors $u_j^*$ and $v_j^*$ and the biases $b_j^*$ randomly, $j=1,2,\dots,L$; that is, each element of the projecting vectors and biases obeys some distribution on $(0,1)$. The remaining problem is how to determine the output weights $\beta_i$, $i=1,2,\dots,L$. After choosing the projecting vectors and biases, the interpolation problem becomes that of solving the following system of linear equations:

$$G\beta = T + \varepsilon,$$

where

$$G = \begin{pmatrix} \phi({u_1^*}^\top X_1 v_1^* + b_1^*) & \cdots & \phi({u_L^*}^\top X_1 v_L^* + b_L^*) \\ \vdots & & \vdots \\ \phi({u_1^*}^\top X_N v_1^* + b_1^*) & \cdots & \phi({u_L^*}^\top X_N v_L^* + b_L^*) \end{pmatrix} \qquad (3)$$

is the hidden output matrix, $\varepsilon$ is the error of the system, and

$$\beta = \begin{pmatrix} \beta_1^\top \\ \vdots \\ \beta_L^\top \end{pmatrix}, \qquad T = \begin{pmatrix} t_1^\top \\ \vdots \\ t_N^\top \end{pmatrix}.$$

The output weights can then be solved from the following optimization problem:

$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{N} \left\| \sum_{j=1}^{L} \beta_j\, \phi({u_j^*}^\top X_i v_j^* + b_j^*) - t_i \right\|_2^2.$$

With the least squares method, $\hat{\beta} = G^{\dagger} T$, where $G^{\dagger}$ is the Moore–Penrose generalized inverse of $G$. We call the above algorithm the two-dimensional neural network with random weights (2D-NNRW), where 2D indicates that it is designed for matrix-input data. The concrete procedure is given in Algorithm 1 below.

Remark 2.7. The NNRW generates $mn$-dimensional random vectors $w_j$, $j=1,2,\dots,L$, which take arbitrary linear combinations of all the elements of an $m\times n$ image $X_i$. The 2D-NNRW, by contrast, generates $m$-dimensional vectors $u_j$ and $n$-dimensional vectors $v_j$: it first takes an arbitrary linear combination of the rows of the image $X_i$, and then an arbitrary linear combination of the entries of the resulting row vector. Thus, to some extent, it preserves the structure of the original matrix data. Its good effect on classification will be seen in the experiments of Section 3.

Remark 2.8. Since the input weights and biases are generated randomly, a network learned from a single run can be unstable. So we usually take the average over $p$ runs as the final network.

Algorithm 1. 2D-NNRW.

Input: a set of samples $\{(X_i, t_i)\,|\, X_i\in\mathbb{R}^{m\times n},\ t_i\in\mathbb{R}^c,\ i=1,\dots,N\}$, the number $L$ of hidden nodes, and the activation function $\phi$.

Step 1. Randomly generate the left projecting vectors $u_j^*$, right projecting vectors $v_j^*$, and biases $b_j^*$, $j=1,2,\dots,L$.

Step 2. Compute the hidden output matrix $G$ as in (3), whose $(i,j)$ entry is $\phi({u_j^*}^\top X_i v_j^* + b_j^*)$.

Step 3. Calculate the output weights $\hat{\beta} = G^{\dagger} T$.

Output: the determined network $f_L(X) = \sum_{j=1}^{L} \hat{\beta}_j\, \phi({u_j^*}^\top X v_j^* + b_j^*)$.
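A minimal NumPy sketch of Algorithm 1 follows, in the same style as the NNRW sketch above; the sigmoid activation and the uniform sampling range remain our illustrative assumptions. The einsum call evaluates every entry $\phi({u_j^*}^\top X_i v_j^* + b_j^*)$ of $G$ at once.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_2d_nnrw(Xs, T, L, rng=None):
    """Xs: (N, m, n) matrix inputs; T: (N, c) one-hot labels; L: hidden nodes."""
    if rng is None:
        rng = np.random.default_rng(0)
    N, m, n = Xs.shape
    U = rng.uniform(-1.0, 1.0, size=(L, m))  # left projecting vectors u_j* (Step 1)
    V = rng.uniform(-1.0, 1.0, size=(L, n))  # right projecting vectors v_j*
    b = rng.uniform(-1.0, 1.0, size=L)       # biases b_j*
    # Step 2: G[i, j] = phi(u_j*^T X_i v_j* + b_j*), for all i and j at once.
    G = sigmoid(np.einsum('jm,imn,jn->ij', U, Xs, V) + b)
    beta = np.linalg.pinv(G) @ T             # Step 3: beta = G^† T
    return U, V, b, beta

def predict_2d_nnrw(Xs, U, V, b, beta):
    G = sigmoid(np.einsum('jm,imn,jn->ij', U, Xs, V) + b)
    return np.argmax(G @ beta, axis=1)
```

Averaging the recognition rates over several random draws, as Remark 2.8 suggests, amounts to calling train_2d_nnrw with $p$ different seeds.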

3. Performance evaluation

3.1. Databases

The ORL database [25]: The ORL database contains images of 40 distinct subjects, each with 10 different images. For some subjects, the images were taken at different times, under varying lighting conditions, and with different facial expressions or facial details.

The FERET database [26]: The images of the FERET database were collected in a semi-controlled environment. The database contains a total of 14,126 images from 1199 individuals. For some individuals, over 2 years elapsed between their first and last sittings, with some subjects photographed multiple times. Here, we chose 72 subjects with 6 frontal images per person for the experiments.

The Aberdeen database (see http://pics.psych.stir.ac.uk/2D_face_sets.htm): The Aberdeen database includes 687 color faces from Ian Craw at Aberdeen. Each of the 90 individuals has between 1 and 18 images. Almost all are frontal images with variations in lighting.

The Extended YaleB database [27,28]: The Extended YaleB database contains 2414 frontal face images of 38 individuals. The cropped and normalized 192×168 images were captured under various laboratory-controlled lighting conditions. In this paper, we take the images with the most neutral light sources as training data and the images that are not too dark as testing data.

3.2. Recognition rate comparison of the proposed 2D-NNRW with NNRW

In this subsection, we carry out experiments to show that the proposed 2D-NNRW achieves a higher recognition rate than NNRW; in other words, that our method is more suitable for face image recognition.

Fig. 1 shows the recognition rate comparison of the proposed 2D-NNRW with NNRW under different numbers of hidden nodes on the ORL and FERET databases. Here, the final rates are the average values over 20 runs. Meanwhile, in order to reflect the performance of the two classifiers more clearly and truly, we did not carry out feature extraction before classification; that is, we classified the original images. In Fig. 1(a), we used the first 5 images of each subject for training and the remaining images for testing. In Fig. 1(b), we selected 72 subjects having at least 6 frontal images taken at different times, randomly chose 3 images of each subject for training, and used the remaining images for testing.

From Fig. 1, we find that the recognition rate of the proposed 2D-NNRW is clearly superior to that of NNRW. As the number of hidden nodes increases, the recognition rates of both classifiers tend to stabilize; although the gap between the two rates narrows, the advantage of 2D-NNRW persists.

Changing the training images of each subject, we carried out the same experiments as above on the ORL and FERET databases. The experimental results are shown in Tables 2 and 3. Here we used 1000 hidden nodes, and the recognition rates are the average values over 20 runs.

From Tables 2 and 3, we find that the improvement in recognition rate of the proposed method is not accidental.

Fig. 2 shows the recognition rate comparison of the proposed 2D-NNRW with NNRW under different numbers of hidden nodes on the Aberdeen database. Here we chose 60 subjects having at least 4 frontal images, used the two images under nearly neutral illumination of each subject for training, and used the remaining two images for testing. In the figure, 2D means the 2D-NNRW, 1D represents the NNRW, and the number that follows is the number of hidden nodes. In this experiment, besides showing the recognition rates under different numbers of hidden nodes, we also repeated each experiment 10 times and recorded the average recognition rate of 20 runs each time. The purpose is to show that the average recognition rate over 20 runs is relatively stable, so we can take this average as each network's recognition rate. In the remainder of this paper, we use the average recognition rate over 20 runs as the final recognition rate.

As observed from Fig. 2, we can draw the same conclusion as on the ORL and FERET databases: the proposed 2D-NNRW performs better than NNRW for face image classification.

At the end of this subsection, we further compare performance for different numbers of training and testing samples. The results can be found in Fig. 3. Fig. 3(a) shows the recognition rates with the number of training samples ranging from 3 to 7 on the ORL database, and Fig. 3(b) shows the recognition rates with the number of training samples ranging from 2 to 5 on the FERET database. In these two figures, as in Fig. 2, 2D means the 2D-NNRW, 1D represents the NNRW, and the number that follows is the number of hidden nodes. We can see that, with the same number of hidden nodes, 2D-NNRW always performs much better than NNRW. With more hidden nodes (900 in the experiment), both networks show increasing recognition rates as the number of training samples grows; with fewer hidden nodes (500 in the experiment), the recognition rate of NNRW drops dramatically as the number of training samples increases, while 2D-NNRW maintains relatively stable, higher performance.

[Fig. 1. Recognition rate comparison of the proposed 2D-NNRW with NNRW under different numbers of hidden nodes on the ORL and FERET databases. (a) ORL database. (b) FERET database. Axes: number of hidden neurons vs. recognition rate; curves: NNRW test, 2D-NNRW test, NNRW train, 2D-NNRW train.]

Table 2
Recognition rate comparison of the proposed 2D-NNRW with NNRW on the ORL database (%).

Tests     NNRW: Training / Testing    Proposed 2D-NNRW: Training / Testing
Test 1    100 / 89.80                 100 / 91.90
Test 2    100 / 89.35                 100 / 91.23
Test 3    100 / 84.65                 100 / 87.35
Test 4    100 / 87.63                 100 / 89.53

Table 3
Recognition rate comparison of the proposed 2D-NNRW with NNRW on the FERET database (%).

Tests     NNRW: Training / Testing    Proposed 2D-NNRW: Training / Testing
Test 1    100 / 81.00                 100 / 87.06
Test 2    100 / 82.82                 100 / 88.45
Test 3    100 / 80.30                 100 / 86.16
Test 4    100 / 84.19                 100 / 88.31

[Fig. 2. Recognition rate comparison of 2D-NNRW with NNRW under different numbers of hidden nodes and repeated trials on the Aberdeen database. Axes: trial index (1–10) vs. recognition rate.]


3.3. Performance analysis for 2D-NNRW

Since all our experiments showed that 2D-NNRW performs better than NNRW, what is the essential difference between 2D-NNRW and NNRW? Is it really the structural character of the 2D-NNRW that makes the difference? To see this, we analyze the proposed 2D-NNRW further.

From Remark 2.3, we see that the structure of 2D-NNRW is equivalent to that of NNRW; the only difference between them is the generation rule for the input weights. From the previous section, we know that our generation rule for the input weights is more suitable for face image classification.

Now let us study this rule further. In NNRW, the input weights $w_j$ and biases $b_j$ are random vectors obeying some continuous probability distribution, usually the uniform distribution on the interval $[-1,1]$. In the proposed 2D-NNRW, however, the equivalent input weight $w_j$ is obtained as $w_j = \mathrm{Vec}(u_j v_j^\top)$, where $u_j$ and $v_j$ are two random column vectors obeying the uniform distribution on $[-1,1]$. Hence each element of $w_j$ actually obeys the product distribution of two uniform distributions, $j=1,2,\dots,L$. The probability density function of this distribution is

$$p_w(z) = \int_{-\infty}^{+\infty} \frac{1}{|y|}\, p\!\left(\frac{z}{y}\right) p(y)\, dy.$$

When the distribution interval is taken to be $[-1,1]$, we have $p(x) = 1/2$, and

$$p_w(z) = \begin{cases} -\tfrac{1}{2}\ln z, & 0 < z \le 1, \\ -\tfrac{1}{2}\ln(-z), & -1 \le z < 0. \end{cases}$$

Furthermore, $u_j v_j^\top$ is a rank-1 matrix. As mentioned in Section 2, it can preserve the structural character among the matrix elements. Which factor, then, makes the classification performance different: the distribution or the structural character?
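A short Monte-Carlo experiment (ours, not the paper's) confirms this density for the product of two independent uniforms on $[-1,1]$:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.uniform(-1, 1, 10**6) * rng.uniform(-1, 1, 10**6)  # products of two uniforms
hist, edges = np.histogram(z, bins=40, range=(-1, 1), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
# Compare with p_w(z) = -ln|z|/2, away from the singularity at z = 0.
mask = np.abs(mid) > 0.1
print(np.max(np.abs(hist[mask] + 0.5 * np.log(np.abs(mid[mask])))))  # roughly 0.01
```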

Fig. 4 shows the recognition rates of NNRW with input weights obeying the uniform distribution and the $p_w$ distribution (such weight vectors can be obtained by taking all pairwise products of the entries of two uniformly distributed vectors), respectively, together with the recognition rate of 2D-NNRW, under 500 and 900 hidden nodes on the ORL database. The symbol P-500 denotes the recognition rate of NNRW with input weights obeying the $p_w$ distribution and 500 hidden nodes; P-900 is defined similarly.

From Fig. 4, we can see that although the distribution has some impact on the results, the improvement in recognition rate of the proposed 2D-NNRW is not due only to the change of distribution.

Meanwhile, we made the same comparison on the Extended YaleB and FERET databases; the results are shown in Table 4, where Groups 1 and 2 represent the experiments under 500 and 900 hidden nodes, respectively, and NNRW-P represents NNRW with input weights obeying the $p_w$ distribution.

As observed from Table 4, the proposed 2D-NNRW achieves higher recognition rates than the original network. Although the new distribution of input weights also improves the rates of NNRW, the proposed 2D-NNRW achieves the highest rates. So we can say that it is not the distribution alone that makes 2D-NNRW and NNRW different.

Next we further illustrate the performance of 2D-NNRW by changing the distributions in NNRW and 2D-NNRW from the uniform distribution to the standard normal distribution; the results are shown in Table 5. Similar to Table 4, Groups 1 and 2 represent the recognition rates under 500 and 900 hidden nodes, respectively, and NNRW-P represents NNRW with input weights obeying the product distribution of two standard normal distributions.

Table 5 shows that when the input weights in NNRW obey the standard normal distribution, the impact of the product distribution on NNRW is small, and some rates are even lower than those of NNRW. However, the proposed 2D-NNRW still clearly outperforms the others.

[Fig. 3. Recognition rate comparison of the proposed 2D-NNRW with NNRW under different numbers of training samples on the ORL and FERET databases. (a) ORL database. (b) FERET database. Axes: number of training samples vs. recognition rate (%); curves: 1D-900, 2D-900, 1D-500, 2D-500.]

[Fig. 4. Impact from distribution. Axes: trial index (1–10) vs. recognition rate.]


Hence we can say that it is really the structural character of the 2D-NNRW that makes the difference in recognition rate. This also makes us confident that the proposed 2D-NNRW, based on matrix-input data, is more suitable for face image classification.

Finally in this section, we compare the stability of NNRW and the proposed 2D-NNRW. Intuitively, we expect the proposed 2D-NNRW to be more stable than NNRW, since the number of random parameters in 2D-NNRW is $(m+n+1)L$, far less than the $(mn+1)L$ in NNRW; there are thus fewer uncertain factors in 2D-NNRW than in NNRW. The experimental results in Table 6 bear this out. In this experiment, we recorded the standard deviation over 20 runs; Groups 1 and 2 again represent the results of NNRW and 2D-NNRW with 500 and 900 hidden neurons, respectively.

Indeed, Table 6 shows that for all four databases, under the same number of nodes, the standard deviations of 2D-NNRW are always smaller than those of NNRW, in accordance with our analysis.

3.4. Combining a 2D feature extraction method with 2D-NNRW

As we all know, feature extraction and classification are the two key steps in face recognition. In this subsection, we combine 2D-NNRW with a 2D feature extraction method, bi-directional two-dimensional principal component analysis (B2DPCA) [29], to obtain a novel face recognition method; a sketch of this pipeline follows.
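For concreteness, the sketch below shows one way such a pipeline could look. The B2DPCA step is our reading of bi-directional 2DPCA [29] (left and right projections onto the leading eigenvectors of the column- and row-direction scatter matrices); the feature dimensions k_l and k_r, all names, and the call to the train_2d_nnrw sketch from Section 2 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def b2dpca(Xs, k_l, k_r):
    """Xs: (N, m, n) images. Returns (N, k_l, k_r) B2DPCA feature matrices."""
    Xc = Xs - Xs.mean(axis=0)
    S_col = np.mean(Xc @ Xc.transpose(0, 2, 1), axis=0)  # (m, m) column-direction scatter
    S_row = np.mean(Xc.transpose(0, 2, 1) @ Xc, axis=0)  # (n, n) row-direction scatter
    _, Pl = np.linalg.eigh(S_col)  # eigenvectors, ascending eigenvalue order
    _, Pr = np.linalg.eigh(S_row)
    Pl, Pr = Pl[:, -k_l:], Pr[:, -k_r:]  # keep the leading eigenvectors
    return Pl.T @ Xs @ Pr                # project each image from both sides

# Usage: feats = b2dpca(train_images, 20, 20); the resulting matrix features
# can be fed directly to the 2D-NNRW sketch, e.g. train_2d_nnrw(feats, T, L).
```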

Table 7 shows the recognition rate comparison of different classifiers with the same feature extraction tool, B2DPCA, on three databases. That is, we compare the proposed B2DPCA+2D-NNRW with B2DPCA+NN (nearest neighbor) and B2DPCA+SVM. We designed this experiment because NN is also a classifier that can be combined with a 2D feature extraction method directly, and we also want to show that combining a 2D feature extraction method with a 1D classifier is not so favorable. The symbol in parentheses is the metric used in NN; here we chose the best metric for each database.

From Table 7, we can see that the proposed B2DPCA+2D-NNRW always has the highest recognition rate among the three methods. As for B2DPCA+NN and B2DPCA+SVM, only on the ORL database does B2DPCA+SVM perform slightly better than B2DPCA+NN, while on the other two databases B2DPCA+SVM performs much worse than B2DPCA+NN. Therefore, we can say that our new classifier is really effective and has potential practical utility.

4. Conclusions

Since traditional classifiers for face recognition are usually designed for vector data, one must first convert the face images or their features into vectors, which unavoidably destroys the structural correlation among elements and may subsequently hurt recognition performance. Furthermore, although 2D feature extraction methods exist, there are few effective 2D classifiers. In this paper, we designed a new matrix-input-based classifier, 2D-NNRW, for face recognition. In the proposed 2D-NNRW, we use a left projecting vector and a right projecting vector to replace the high-dimensional input weight vector of a single-hidden-layer FNN, so as to preserve the structural information of matrix data, and we learn the weights with the random-weight idea. The recognition rate comparisons showed that the proposed 2D-NNRW really improves recognition performance, and our analysis further showed that it is really the structural character that contributes to this improvement.

Acknowledgments

The research was supported by the National Natural Science Foundation of China (Nos. 61101240, 61272023, and 91330118).

References

[1] F. Camastra, A. Vinciarelli, Automatic face recognition, in: Machine Learning for Audio, Image, and Video Analysis: Theory and Applications, Springer, 2008, pp. 381–411.

[2] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci. 3 (1) (1991) 71–86.

[3] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.

[4] M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis, IEEE Trans. Neural Netw. 13 (6) (2002) 1450–1464.

[5] X. He, S. Yan, Y. Hu, P. Niyogi, H. Zhang, Face recognition using laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. 27 (3) (2005) 328–340.

[6] M.H. Yang, N. Ahuja, D. Kriegman, Face recognition using kernel eigenfaces, in: Proceedings of IEEE International Conference on Image Processing, 2000, pp. 37–40.

Table 4
Recognition rate comparison on the Extended YaleB and FERET databases with the input weights in NNRW obeying the uniform distribution (%).

            Group 1 (500 nodes)           Group 2 (900 nodes)
Database    NNRW    NNRW-P   2D-NNRW      NNRW    NNRW-P   2D-NNRW
YaleB       87.24   91.98    97.00        95.31   96.22    98.65
FERET       72.85   80.12    86.46        87.78   89.72    91.94

Table 5
Recognition rate comparison on different databases with the input weights in NNRW obeying the standard normal distribution (%).

            Group 1 (500 nodes)           Group 2 (900 nodes)
Database    NNRW    NNRW-P   2D-NNRW      NNRW    NNRW-P   2D-NNRW
ORL         59.10   61.05    70.05        80.15   80.80    82.15
YaleB       81.68   82.89    95.68        93.52   92.96    98.32
FERET       64.58   64.98    80.72        85.21   84.54    89.68

Table 6
Standard deviation comparison of the proposed 2D-NNRW with NNRW on different databases.

            Group 1 (500 nodes)      Group 2 (900 nodes)
Database    NNRW     2D-NNRW         NNRW     2D-NNRW
ORL         0.0387   0.0271          0.0189   0.0122
YaleB       0.0196   0.0163          0.0088   0.0061
FERET       0.0320   0.0198          0.0150   0.0099
Aberdeen    0.0296   0.0191          0.0218   0.0128

Table 7
Recognition rate comparison of the classifiers with B2DPCA as the feature extraction tool (%).

Database    NN            SVM     2D-NNRW
ORL         91.50 (L2)    92.00   94.12
FERET       86.11 (L1)    74.07   93.24
Aberdeen    88.33 (L1)    70.00   94.29


[7] M.H. Yang, Kernel eigenfaces vs. kernel fisherfaces: face recognition using kernel methods, in: Proceedings of International Conference on Automatic Face and Gesture Recognition, 2002, pp. 215–220.

[8] B. Poon, M.A. Amin, H. Yan, Performance evaluation and comparison of PCA based human face recognition methods for distorted images, Int. J. Mach. Learn. Cybern. 2 (4) (2011) 245–259.

[9] V.P. Vishwakarma, Illumination normalization using fuzzy filter in DCT domain for face recognition, Int. J. Mach. Learn. Cybern. (2013) 1–18.

[10] G. Guo, S.Z. Li, K. Chan, Face recognition by support vector machines, in: Proceedings of Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000, pp. 196–201.

[11] J. Qin, Z.S. He, A SVM face recognition method based on Gabor-featured key points, in: Proceedings of IEEE International Conference on Machine Learning and Cybernetics, vol. 8, 2005, pp. 5144–5149.

[12] M.J. Er, S. Wu, J. Lu, H.L. Toh, Face recognition with radial basis function (RBF) neural networks, IEEE Trans. Neural Netw. 13 (3) (2002) 697–710.

[13] J. Yang, D. Zhang, A.F. Frangi, J. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (1) (2004) 131–137.

[14] D. Zhang, Z.H. Zhou, (2D)2 PCA: two-directional two-dimensional PCA for efficient face representation and recognition, Neurocomputing 69 (1) (2005) 224–231.

[15] M. Li, B. Yuan, 2D-LDA: a statistical linear discriminant analysis for image matrix, Pattern Recognit. Lett. 26 (5) (2005) 527–532.

[16] P. Sanguansat, W. Asdornwised, S. Jitapunkul, S. Marukatat, Two-dimensional linear discriminant analysis of principle component vectors for face recognition, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, 2006, pp. 345–348.

[17] C. Lu, S.J. An, W.Q. Liu, X.D. Liu, An innovative weighted 2DLDA approach for face recognition, J. Signal Process. Syst. 65 (1) (2011) 81–87.

[18] W.F. Schmidt, M.A. Kraaijveld, R.P.W. Duin, Feed forward neural networks with random weights, in: Proceedings of 11th IAPR International Conference, vol. II, Conference B: Pattern Recognition Methodology and Systems, 1992, pp. 1–4.

[19] B. Igelnik, Y.H. Pao, Additional Perspectives of Feedforward Neural-Nets and the Functional-Link, Technical Report 93-115, Center for Automation and Intelligent Systems, Case Western Reserve University, 1993; also in: Proceedings of IJCNN'93, Nagoya, Japan, October 25–29, 1993, pp. 2284–2287.

[20] Y.H. Pao, G.H. Park, D.J. Sobajic, Learning and generalization characteristics of the random vector functional-link net, Neurocomputing 6 (2) (1994) 163–180.

[21] B. Igelnik, Y.H. Pao, Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Trans. Neural Netw. 6 (6) (1995) 1320–1329.

[22] I.Y. Tyukin, D.V. Prokhorov, Feasibility of random basis function approximators for modeling and control, in: Proceedings of IEEE Conference on Control Applications (CCA) & Intelligent Control (ISIC), 2009, pp. 1391–1396.

[23] C. Hou, F. Nie, D. Yi, Y. Wu, Efficient image classification via multiple rank regression, IEEE Trans. Image Process. 22 (1) (2013) 340–352.

[24] X. Wang, C. Huang, X. Fang, J. Liu, 2DPCA vs. 2DLDA: face recognition using two-dimensional method, in: Proceedings of International Conference on Artificial Intelligence and Computational Intelligence, vol. 2, 2009, pp. 357–360.

[25] F.S. Samaria, A.C. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of Second IEEE Workshop on Applications of Computer Vision, 1994, pp. 138–142.

[26] P.J. Phillips, H. Wechsler, J. Huang, P.J. Rauss, The FERET database and evaluation procedure for face-recognition algorithms, Image Vis. Comput. 16 (5) (1998) 295–306.

[27] A.S. Georghiades, P.N. Belhumeur, D.J. Kriegman, From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell. 23 (6) (2001) 643–660.

[28] K.C. Lee, J. Ho, D.J. Kriegman, Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Pattern Anal. Mach. Intell. 27 (5) (2005) 684–698.

[29] A.A. Mohammed, R. Minhas, Q.M. Jonathan Wu, M.A. Sid-Ahmed, Human face recognition based on multidimensional PCA and extreme learning machine, Pattern Recognit. 44 (10) (2011) 2588–2597.

Jing Lu received the B.Sc. degree in Applied Mathematics from China Jiliang University, China, in 2011. She is currently working towards the M.Sc. degree in Applied Mathematics at China Jiliang University, China. Her research interests include pattern recognition, machine learning, and neural networks.

Jianwei Zhao received the B.Sc. and M.Sc. degrees in Mathematics from Shanxi Normal University, China, in 2000 and 2003, respectively, and the Ph.D. degree in Mathematics from the Chinese Academy of Sciences, China, in 2006. She is a professor in the Department of Mathematics, China Jiliang University. Her current research interests include pattern recognition, machine learning, and neural networks.

Feilong Cao received the B.Sc. and M.Sc. degrees in Applied Mathematics from Ningxia University, China, in 1987 and 1998, respectively. In 2003, he received the Ph.D. degree in Applied Mathematics from Xi'an Jiaotong University, China. He was a Research Fellow with the Center of Basic Sciences, Xi'an Jiaotong University, China, from 2003 to 2004. From 2004 to 2006, he was a Post-Doctoral Research Fellow with the School of Aerospace, Xi'an Jiaotong University, China. From June 2011 to December 2011, and from October 2013 to December 2013, he was a Visiting Professor with the Department of Computer Science, Chonbuk National University, South Korea, and the Department of Computer Sciences and Computer Engineering, La Trobe University, Melbourne, Australia, respectively. He is currently a Professor and the Dean of the College of Sciences, China Jiliang University, China. He has authored or co-authored over 100 scientific papers in refereed journals. His current research interests include pattern recognition, neural networks, and approximation theory.
