IEEE 2013 Sixth International Conference on Contemporary Computing (IC3), Noida, India


TAG: A Two-level framework for user Authentication through hand Gestures

Anand Gupta
Dept. of Computer Engineering, Netaji Subhas Institute of Technology, New Delhi, India
[email protected]

Ashima Arora
Dept. of Information Technology, Netaji Subhas Institute of Technology, New Delhi, India
[email protected]

Bhawna Juneja
Dept. of Information Technology, Netaji Subhas Institute of Technology, New Delhi, India
[email protected]

Abstract—Gesture-based interfaces are gaining attention from researchers worldwide owing to their user-friendly and natural mode of interaction. The direct use of hands as an input method is an attractive way to design Human-Computer interfaces. In scenarios where hand gestures are used to perform manipulation tasks on sensitive data, there is a need to authenticate the user. In this paper, we propose a two-level framework that addresses this need by augmenting existing gesture recognition methodologies with hand biometrics. In the first level, the hand gesture is recognized by extracting Gesture-Specific Features (GSFs) from a video consisting of a gesture password. This is followed by the second level, where the user is authenticated based on User-Specific Features (USFs). The GSFs and USFs together form a Feature Vector (FV) which is unique for every user and is fed into LIBSVM for authentication. Before an unknown or malicious user can modify sensitive data, his presence is detected, he is denied access, and a warning is sent to the registered user. The framework is tested using hand gestures derived from American Sign Language symbols. An overall accuracy rate of 95% demonstrates the potential of TAG for access control through a hand gesture interface.

Keywords—User Authentication; Hand Biometrics; Gesture-Specific Features; User-Specific Features; LIBSVM

I. INTRODUCTION

With the rise in identity thefts, it is important for every organization to ensure that its data reaches only the intended recipient. So, there is a need for security systems that check users for their authenticity at multiple levels before allowing them access. To meet this need, several biometric authentication techniques have been implemented. In this section we discuss their drawbacks and our motivation for developing TAG.

A. Related Work

Biometrics refers to the identification of humans by their characteristics or traits. These are broadly classified into physiological and behavioral characteristics, as mentioned in [1], and have been used for automated recognition based on features of specific body parts and on a person's behavior. Several approaches using physiological characteristics have been implemented in [2][3][4][5]. These characteristics relate to body parts such as the fingerprint, hand geometry, palm print, face, iris, retina, ear and deoxyribonucleic acid (DNA). On the contrary, [6][7] make use of behavioral characteristics, which deal with signature, voice, keystroke and gait. Hand biometrics, as a physiological method, is user-friendly and can be used for authentication and access control. Natural hand gesture recognition is one of the most active research areas in computer vision, as it lets users interact with machines without any extra device. Uhl and Wild [8] focus on single-sensor approaches to multimodal biometric authentication targeting the human hand in multiple-matcher scenarios, thus providing higher security in terms of accuracy and resistance to biometric system attacks. Reference [9] investigates hand biometry as a possible technology in user authentication scenarios, establishes algorithms and demonstrates its applicability for populations of hundreds of people. Ribaric and Fratric [10] present a multi-modal biometric identification system which uses eigenfinger and eigenpalm features of the human hand. The foundation for hand biometrics is laid by hand gesture recognition. Reference [11] puts forward a new approach for recognizing static gestures based on Zernike moments (ZMs) and pseudo-Zernike moments (PZMs). Reference [12] proposes a hierarchical dynamic vision model (HDVM) based on dynamic Bayesian networks (DBNs) for automatically recognizing human hand gestures.

B. Motivation

The following observations about present biometric techniques have motivated us to develop TAG:
1) Existing biometric techniques provide only a single-level check for authenticity.
2) Direct use of hands as input to the system is an appealing method, since it conveys far more information than traditional input devices such as a mouse or keyboard.
3) A user should have the advantage of interacting with the system from a distance, without any physical contact with it.
4) Gesture-based authentication promises a convenient, no-touch and user-friendly environment.

978-1-4799-0192-0/13/$31.00 ©2013 IEEE

C. Contribution

In this paper, we present a novel methodology which grants system access to a user by merging the concepts of gesture recognition and hand biometrics. Our contribution can be summarized as follows:
1) Development of a framework for user authentication employing gesture interpretation and hand biometrics. It includes formation of a Feature Vector (FV) which consists of User-Specific Features (USFs) and Gesture-Specific Features (GSFs).
2) A two-level authentication system which grants authorised access and reduces the possibility of entry by a malicious person.
3) Incorporation of physiological characteristics of the hand, some of which are user-specific (entropy, aspect ratio, orientation of fingers, area, perimeter, bounding box) and the rest gesture-specific (discrete cosine transform, centroid, number of fingertips).
4) Experimental results on 28 different subjects from our self-created dataset.

II. TAG FRAMEWORK

We take into account the following assumptions associated with the framework:
1) All images and videos are taken under specific illumination conditions.
2) ASL symbols depicting the letters A-Z and numerals 0-9 are chosen, as they involve only one hand at a time.

User authentication in the TAG framework proceeds in two stages: Training followed by Testing. The entire framework is depicted in the flowchart of Fig. 1.

A. Training: In this stage, the hand images of every registered user are used to train the system with the help of LIBSVM [13][14].

Algorithms used at the Training stage

a. Skin Region Identification (SRI)

Algorithm 1 SRI
Input: RGB image
Output: Binary skin-segmented image, i.e. bimage
repeat
    [himage, simage, vimage] = get_hsv(frame);  // get the hue, saturation and value planes of every frame
    for every row i
        for every column j
            if himage(i,j) >= 0.01 and himage(i,j) <= 0.50
                bimage(i,j) = 1;
            else
                bimage(i,j) = 0;
            end if
        end for
    end for
until every sample in the dataset is processed

Fig. 1. Flowchart depicting the working of the TAG framework

Description of Algorithm 1
This algorithm finds the skin-pixel region using the device-independent HSV colour model. The range of hue values representing potential skin pixels is experimentally determined to lie between 0.01 and 0.50. Image pixels that lie in this range are classified as skin pixels and the rest as non-skin pixels. The result is a binary image in which pixel values equal to 1 indicate potential skin-colour locations.
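As a concrete illustration, the hue-thresholding step of Algorithm 1 can be sketched in Python. This is a minimal sketch, not the paper's MATLAB implementation: NumPy and a per-pixel loop are used purely for clarity, and the function name `skin_mask` is ours.

```python
import colorsys
import numpy as np

def skin_mask(rgb_image):
    """Binary skin segmentation via the HSV hue channel.

    Pixels whose hue lies in the experimentally determined range
    [0.01, 0.50] (as in Algorithm 1 / SRI) are marked as skin (1),
    all others as non-skin (0).
    """
    h, w, _ = rgb_image.shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for i in range(h):
        for j in range(w):
            r, g, b = rgb_image[i, j] / 255.0
            hue, _, _ = colorsys.rgb_to_hsv(r, g, b)
            if 0.01 <= hue <= 0.50:
                mask[i, j] = 1
    return mask
```

A vectorised HSV conversion (e.g. via an image library) would be used in practice; the loop mirrors the nested `for` structure of the pseudocode.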

b. Gesture-Specific Feature Extraction (GSFE)

Algorithm 2 GSFE
Input: skin_binary_frame for every video frame
Output: f1-f4 (GSF values)
for every skin_binary_frame in the video
    f1 = get_dct2(skin_binary_frame);
    f2 = get_centroidx(skin_binary_frame);
    f3 = get_centroidy(skin_binary_frame);
    f4 = get_fingertip(skin_binary_frame);
end for

Description of Algorithm 2


In this algorithm, the following gesture-specific features are extracted from every video frame:
i. Discrete Cosine Transform of the binary image
ii. x-coordinate of the hand image centroid
iii. y-coordinate of the hand image centroid
iv. Number of fingertips
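A minimal Python sketch of these GSFs follows, under stated assumptions: SciPy's 1-D DCT applied along both axes emulates MATLAB's `dct2` (here summarised by its DC coefficient), the centroid is computed from the skin-pixel coordinates, and fingertip counting (f4) is omitted since it requires contour analysis. The function name is ours.

```python
import numpy as np
from scipy.fftpack import dct

def gesture_specific_features(bimage):
    """Sketch of GSF extraction (f1-f3) from a binary skin frame.

    f1: a 2-D DCT-based descriptor (summarised here by its DC term),
    f2/f3: x- and y-coordinates of the hand-region centroid.
    """
    # 2-D DCT: apply the type-II DCT along both axes (like MATLAB's dct2).
    d = dct(dct(bimage.astype(float), axis=0, norm='ortho'),
            axis=1, norm='ortho')
    f1 = d[0, 0]
    ys, xs = np.nonzero(bimage)   # coordinates of skin pixels
    f2 = xs.mean()                # centroid x
    f3 = ys.mean()                # centroid y
    return f1, f2, f3
```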

c. User-Specific Feature Extraction (USFE)

Algorithm 3 USFE
Input: skin_binary_frame for every video frame
Output: f5-f13 (USF values)
for every skin_binary_frame in the video
    f5 = get_entropy(skin_binary_frame);
    f6 = get_aspect(skin_binary_frame);
    f7 = get_orientation(skin_binary_frame);
    f8 = get_area(skin_binary_frame);
    f9 = get_perimeter(skin_binary_frame);
    f10 = get_boundingbox1(skin_binary_frame);
    f11 = get_boundingbox2(skin_binary_frame);
    f12 = get_boundingbox3(skin_binary_frame);
    f13 = get_boundingbox4(skin_binary_frame);
end for

Description of Algorithm 3
In this algorithm, the following user-specific features are extracted from every video frame:

i. Hand image entropy
ii. Aspect ratio of the hand
iii. Orientation of fingers
iv. Area of the hand
v. Perimeter of the hand
vi. Coordinates of the bounding box outlining the hand

Fig. 2 indicates the sequence of operations being carried out on the input video frames.
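Several of the USFs above can be sketched directly from the binary mask. This is an illustrative sketch with our own function name: entropy is the Shannon entropy of the binary image, the aspect ratio and bounding box come from the extents of the skin pixels, and orientation and perimeter (which need more careful geometry, e.g. regionprops-style routines) are left out.

```python
import numpy as np

def user_specific_features(bimage):
    """Sketch of USF extraction (f5, f6, f8, f10-f13) from a binary mask."""
    ys, xs = np.nonzero(bimage)
    y0, y1 = ys.min(), ys.max()
    x0, x1 = xs.min(), xs.max()
    area = len(xs)                      # f8: skin-pixel count
    height = y1 - y0 + 1
    width = x1 - x0 + 1
    aspect_ratio = width / height       # f6
    bbox = (x0, y0, width, height)      # f10-f13
    # f5: Shannon entropy of the binary image.
    p1 = bimage.mean()
    p0 = 1 - p1
    entropy = -sum(p * np.log2(p) for p in (p0, p1) if p > 0)
    return entropy, aspect_ratio, area, bbox
```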

d. Training using LIBSVM

Algorithm 4 Train_LIBSVM
Input: FV for all registered users
Output: Model file consisting of trained data attributes
for every FV in the database
    run_LIBSVM_algorithm(GSF values);
    run_LIBSVM_algorithm(USF values);
end for

Description of Algorithm 4
The LIBSVM algorithm for multi-class classification takes the feature vectors as input and categorizes the dataset into different classes using a pre-defined kernel.
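The training step could be prototyped today with scikit-learn, whose `SVC` class wraps LIBSVM. The feature vectors and labels below are random and purely illustrative (one hypothetical 13-dimensional FV per frame, labelled by registered-user id); the polynomial kernel is chosen because the paper's experiments found it most accurate.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training set: 13-dimensional FVs (f1..f13) per frame,
# labelled with the registered user's id.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 13))
y_train = np.repeat(np.arange(4), 10)   # 4 registered users, 10 frames each

# scikit-learn's SVC wraps LIBSVM; multi-class handling is built in.
model = SVC(kernel='poly', degree=3)
model.fit(X_train, y_train)

pred = model.predict(X_train[:1])       # classify one frame's FV
```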

B. Testing: The input to this stage comprises a series of RGB frames from the test video just obtained. Testing is subdivided into two levels as follows:

Fig. 2. The gesture password made by a user. (i) The input RGB frame. (ii) The binary image after noise removal. (iii) Finger blob detection, fingertip detection and number of fingers detected.

1) Level-1 Authentication: This level checks for similarity between the GSFs of the test video frames and those in the FV database.

2) Level-2 Authentication: The system reaches this level when the test subject has passed the Level-1 authentication check. At this level, the USFs extracted from the test video frames are matched against the FV database entry corresponding to the gesture observed at Level-1.

Algorithms used at the Testing stage

a. GSF Comparison using LIBSVM

Algorithm 5 GSFComp
Input: FV of the test subject frame, FV dataset
Output: Level-1 access permission
for every test subject frame
    bool flag = test_gsf(FV);
    if flag == TRUE
        display that Level-1 authentication is passed;
    else
        display that the user is unregistered and access is denied;
    end if
end for

Description of Algorithm 5
This algorithm comprises the Level-1 authentication in the TAG framework. The GSFs of the test subject frames are compared with the GSFs stored in the FV database of registered users. If a match is found, LIBSVM identifies the concerned registered user, the test subject is authenticated at the first level based on GSF matching, and the system proceeds


to the next level. On the other hand, if the GSFs do not match any of the features present in the FV database of registered users, LIBSVM labels the test subject as an unregistered user and stops checking any further.

b. USF Comparison using LIBSVM

Algorithm 6 USFComp
Input: FV of the test subject frame, FV dataset
Output: Level-2 access permission
for every test subject frame
    bool flag = test_usf(FV);
    if flag == TRUE
        display that access is granted;
    else
        display that access is denied;
    end if
end for

Description of Algorithm 6
In this algorithm, TAG checks for similarity between the USFs of the test subject frames and those stored in the FV database at the training stage. If a USF match is found for the corresponding GSF values identified at the Level-1 authentication, the test subject is identified as a registered user and granted access. In all other scenarios, he is labeled as a malicious user and denied access. A comprehensive symbol table of the 13 features used, along with their respective types and symbols, can be found in Table I.
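The two testing levels combine into the following decision logic. This is a threshold-based sketch with illustrative names and tolerances; the paper itself delegates both comparisons to LIBSVM classifiers rather than fixed tolerances. Following Table I, the first 4 entries of an FV are GSFs (f1-f4) and the remaining 9 are USFs (f5-f13).

```python
def authenticate(test_fv, registered_db, gsf_tol, usf_tol):
    """Two-level authentication sketch.

    Level 1 matches the gesture (GSFs, fv[:4]); Level 2 matches the
    user (USFs, fv[4:]). Returns the matched user id, or a label
    explaining why access is denied.
    """
    for user, stored_fv in registered_db.items():
        gsf_match = all(abs(t - s) <= gsf_tol
                        for t, s in zip(test_fv[:4], stored_fv[:4]))
        if not gsf_match:
            continue
        # Level 1 passed: the gesture password was reproduced.
        usf_match = all(abs(t - s) <= usf_tol
                        for t, s in zip(test_fv[4:], stored_fv[4:]))
        if usf_match:
            return user            # registered user: grant access
        return "malicious"         # right gesture, wrong hand
    return "unregistered"          # gesture not in the database
```

Note how a "malicious" verdict is only reachable after a Level-1 gesture match, mirroring the framework's warning-on-replication behaviour.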

III. EXPERIMENTS AND RESULTS

The TAG algorithms are implemented on a Dell Inspiron N5110 (Intel® Core™ processor, 2 GB RAM, Windows 7 Home Basic) in MATLAB R2009 with LIBSVM. We have created our own video dataset, comprising series of ASL symbols performed by 28 different subjects, for the experiments. In this dataset, 4 users are chosen to be unregistered (U1-U4), 4 to be malicious users (M1-M4) and the remaining 20 to be registered users (R1-R20), whose records exist in the database. We conduct three experiments on the dataset so constructed:
1) Choosing different classification kernels for SVM training-set classification in order to determine the best kernel.
2) Justifying the importance of choosing GSFs and USFs.
3) Observing TAG performance when the test subject is a registered user, a malicious user or an unknown user.

The training data comprises hand images of R1-R20, and the testing data comprises hand videos of R1-R20, U1-U4 and M1-M4. The findings for each experiment and the conclusions drawn are discussed in detail in the following subsections.

Experiment 1
In this experiment, we wish to know the effect of choosing different kernel functions provided by LIBSVM at the time of classification. As discussed earlier, we employ three kernels: RBF (Radial Basis Function), Polynomial and Sigmoid. The training and testing data are fed into TAG and the classification accuracies are observed for the three kernels. The results are illustrated in the bar graph of Fig. 3.

TABLE I. SYMBOL TABLE FOR USER'S HAND FEATURES

S.No  Feature Name            Symbol  Type
1     DCT                     f1      GSF
2     Centroid x-coordinate   f2      GSF
3     Centroid y-coordinate   f3      GSF
4     Number of fingertips    f4      GSF
5     Entropy                 f5      USF
6     Aspect ratio            f6      USF
7     Orientation of fingers  f7      USF
8     Area                    f8      USF
9     Perimeter               f9      USF
10    Bounding box 1          f10     USF
11    Bounding box 2          f11     USF
12    Bounding box 3          f12     USF
13    Bounding box 4          f13     USF

Discussion
Fig. 3 shows that, for our purpose, the Polynomial kernel gives the best classification accuracy of 95%. The RBF kernel gives an accuracy of 92%, followed by the Sigmoid kernel at 90%. We conclude that the Polynomial kernel is best suited to our dataset.

Experiment 2
In this experiment we wish to observe the importance of using GSFs and USFs for user authentication. We do so by first checking whether the GSF values for a given symbol stay within a fixed range across multiple users. Then, we check whether the USF values for those users differ enough to draw a clear distinction between them. This is crucial in order to validate the choice of GSFs and USFs. We take 3 users, namely A1, A2 and A3, each of whom may be Registered (R), Unregistered (U) or Malicious (M), and choose an ASL symbol for them to make. Fig. 4 shows the hand images of users A1, A2 and A3 making the same ASL symbol.

i. GSF values for different users making the same ASL symbol
The GSF values for these three images are shown in Table II. Looking at every row of values in Table II, it is evident that the GSFs f1-f4 have more or less similar values for all three images, thereby indicating that they may be chosen as gesture-specific features.


Fig. 3. Bar graph depicting TAG accuracy for the 3 LIBSVM kernels

Fig. 4. Hand images of users A1, A2 and A3 making the same ASL symbol

TABLE II. GSF VALUES FOR USERS A1, A2 AND A3 MAKING THE SAME ASL SYMBOL

User/GSF  A1        A2       A3
f1        1077.818  871.254  935.191
f2        424.187   407.413  386.607
f3        321.946   312.48   304.011
f4        2         2        2

ii. USF values for different users making the same ASL symbol
We again refer to the images in Fig. 4. The USF values for these images are shown in Table III. Moving along any row, the values of a feature vary for every user. This implies that these features are independent of the symbol, depend instead on the user, and can therefore be chosen as user-specific features. The above discussions justify the effect of incorporating GSFs and USFs into the TAG framework for the purpose of user authentication. The bar graph in Fig. 5 depicts this impact quantitatively on the classification accuracy for the three kernels mentioned in the previous sub-section. It can be seen that there is an increase in accuracy when both GSFs and USFs are taken into account. An appreciable accuracy of 95% is observed for the Polynomial kernel.

TABLE III. USF VALUES FOR USERS A1, A2 AND A3 MAKING THE SAME ASL SYMBOL

User/USF  A1       A2       A3
f5        7.037    5.989    8.393
f6        1.526    3.519    2.072
f7        103.75   154.062  84.017
f8        63473    73665    44292
f9        1659.9   1870.4   1480.18
f10       216.5    178.5    192.5
f11       158.5    91.5     276.5
f12       410      462      352
f13       322      389      204

Fig. 5. Bar graph depicting the variation in classification accuracy with USFs and GSFs alone for the three LIBSVM kernels.

Experiment 3
In this experiment, we observe the behaviour of TAG when a test subject tries to gain access. Suppose an unauthorized individual X wishes to access the system resources; to do so, he needs to replicate the gesture password of a registered user Y. There can be two possible scenarios:
a. X does not know the correct gesture password.
b. X knows the correct gesture password.
In case a, when X does not know the correct password, the GSFs corresponding to his hand symbols will not match those belonging to Y, already stored in the dataset. X will hence not pass the Level-1 authentication; he will be tagged as an unregistered user and denied entry into the system. In case b, when X knows the correct password, the GSFs corresponding to his hand symbols might match those belonging to Y. But his USFs will not match those of Y, because USFs are inherent to an individual's hand and cannot be replicated. X will not pass the Level-2 authentication check, he will be tagged as a malicious user, and a warning message will be sent to the registered user possessing the gesture password which has been replicated. The critical, confidential data will be hidden from him and he will not be able to modify it. Table IV shows typical normalized FV values for the M, R and U types of users. Every row value is divided by the feature value for the registered (R) user; in this manner, a normalized set of FV values is obtained. Plots of the FVs for M and U can be seen in Fig. 6 and Fig. 7 respectively.
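The normalization behind Table IV is a per-feature division: as the all-ones R column suggests, each user's feature value is divided by the registered user's value, so the registered user maps to ones and any deviation stands out. A one-line sketch (names are illustrative):

```python
def normalise(fv, reference_fv):
    # Divide each feature by the registered user's value, so the
    # registered user maps to all ones and deviations stand out.
    return [v / r for v, r in zip(fv, reference_fv)]
```

For example, a malicious user's entropy far exceeding the registered user's shows up as a normalized value well above 1 (cf. f5 = 13.18 for M in Table IV).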


TABLE IV. TYPICAL FV VALUES FOR M, R AND U TYPES OF USERS

FV   M         R  U
f1   1.001773  1  0.097636
f2   0.868473  1  0.519215
f3   1.009158  1  1.385636
f4   1         1  2
f5   13.17716  1  5.710102489
f6   1.638498  1  0.563380282
f7   1.314752  1  2.221930055
f8   1.354946  1  1.176295016
f9   3.242075  1  1.823487032
f10  0.725425  1  1.037251356
f11  0.881988  1  1.031055901
f12  2.446483  1  2.324159021
f13  2.25      1  2.0125

Discussion
Fig. 6 shows the variation in FV values when M tries to enter the system by replicating the gesture password. It is evident from the graph that the GSFs (f1-f4) for M are very close to the actual values for R. In other words, the malicious user is able to replicate the gesture password and is authenticated at Level-1. However, there is an appreciable difference in the USF values (f5-f13) between R and M. This is caught at the Level-2 authentication, and M is subsequently denied access. On the other hand, in Fig. 7, there is an appreciable difference in the GSF values (f1-f4) between U and R. This difference implies that U will not pass the Level-1 authentication and is denied access at the first level itself.

IV. CONCLUSION AND FUTURE WORK

In this paper, we have presented TAG, a twofold authentication framework for biometric authentication of users, providing a two-level check on authenticity. It is achieved by suitably exploiting hand features unique to every individual while using traditional gesture recognition techniques. The human hand acts as an identifier to distinguish between a registered and a malicious user. By suitably extracting GSFs and USFs in two subsequent levels, an imposter is caught before he can gain access to sensitive data; at the same time, a warning is sent to the registered user to inform him that his identity is under threat. The experimental results demonstrate the power of hand biometrics when used with gesture-based interfaces. The proposed research can be used efficiently as an augmentative tool for security-based identification and authentication. Its applicability to the secure remote transmission of commands through a hand gesture interface is another potential usage. However, the framework may be modified to make the system more effective and useful for automation in future. This can be achieved by creating an exhaustive list of feature vectors, taking into consideration fluctuating illumination conditions.

Fig. 6. Graph depicting typical FV values for M users

Fig. 7. Graph depicting typical FV values for U users

Also, the gesture glossary can be further extended by incorporating user-defined symbols instead of the pre-defined ASL symbols. This will lead to a more user-friendly environment in which users create a password from symbols of their own choice, which may also prove harder for a swindler to crack.

REFERENCES

[1] IEEE International Conference on Information Technology: New Generations, Las Vegas, Nevada, USA, 2007.
[2] "Neural Network Algorithm for Fingerprint Recognition", Asia-Pacific Conference on Information Processing, pp. 182-186, Shenzhen, 2009.
[3] H.B. Kekre, S.D. Thepade, and A. Maloo, International Journal on Recent Trends in Engineering and Technology, vol. 05, no. 01, pp. 185-190, 2011.
[4] D. Bhattacharyya, P. Das, S.K. Bandyopadhyay, and T.H. Kim, "Texture Analysis and Feature Extraction for Biometric Pattern Recognition", International Journal of Database Theory and Application, vol. 1, pp. 53-61, 2007.
[5] M.S. Nixon, J.N. Carter, and A.H. Cummings, "A Novel Ray Analogy for Enrolment of Ear Biometrics", IEEE International Conference on Biometrics: Theory, Applications and Systems, Washington DC, USA, September 2010.
[6] C. Bregler and Y. Konig, "Eigenlips for Robust Speech Recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 669-672, Adelaide, Australia, April 1994.
[7] International Conference on Automatic Face and Gesture Recognition, Washington, DC, USA, May 2002.
[8] A. Uhl and P. Wild, "Personal recognition using single-sensor multimodal hand biometrics", Third International Conference on Image and Signal Processing, pp. 396-404, Berlin, 2008.
[9] E. Yoruk, H. Dutagaci, and B. Sankur, "Hand biometrics", Image and Vision Computing, pp. 483-497, 2006.
[10] S. Ribaric and I. Fratric, "A Biometric Identification System Based on Eigenpalm and Eigenfinger Features", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 11, pp. 1698-1709, November 2005.
[11] C.C. Chang, J.J. Chen, et al., "New Approach for Static Gesture Recognition", Journal of Information Science and Engineering, vol. 22, pp. 1047-1057, 2006.
[12] "Hand Gesture Recognition Using Hierarchical Dynamic Bayesian Networks", International Conference on Machine Learning and Cybernetics, pp. 3247-3253, Kunming, July 2008.
[13] B.E. Boser, I.M. Guyon, and V.N. Vapnik, "A Training Algorithm for Optimal Margin Classifiers", Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, ACM, 1992.
[14] C.C. Chang and C.J. Lin, "LIBSVM: A Library for Support Vector Machines", ACM Transactions on Intelligent Systems and Technology, vol. 2, issue 3, USA, April 2011.