[IEEE 2012 IEEE International Symposium on Multimedia (ISM) - Irvine, CA, USA]

Automatic Actor Recognition for Video Services on Mobile Devices

Lai-Tee Cheok, Sol Yee Heo, Donato Mitrani, Anshuman Tewari

Dallas Technology Lab, Samsung Telecommunications America, Richardson, Texas, USA

{ltcheok,sol.heo,dmitrani,a.tewar}@sta.samsung.com

Abstract—Face recognition is one of the most promising and successful applications of image analysis and understanding. Its applications include biometric identification, gaze estimation, emotion recognition, and human-computer interfaces, among others. A closed system trained to recognize only a predetermined set of faces quickly becomes obsolete. In this paper, we describe a demo that we have developed using face detection and recognition algorithms to recognize actors and actresses in movies. The demo runs on a Samsung tablet and recognizes actors and actresses in the video. We also present our proposed method, which allows users to interact with the system during training while watching video: new faces are tracked and trained into new face classifiers as the video plays, and the face database is updated dynamically.

Keywords-face detection; face recognition; image analysis; image understanding

I. INTRODUCTION

Rapidly advancing technology has made available methodologies that can be employed in face recognition. Face recognition is important in image analysis and understanding, and it supports a wide range of applications such as biometrics, surveillance, gaze estimation, emotion modeling, and visual search. Challenging factors that influence the appearance and recognition of faces include lighting conditions, facial expressions, and partial occlusions. Many approaches have been developed for automatic face recognition, such as cross-correlation [1], Linear Discriminant Analysis [2, 3], Principal Component Analysis [4], neural networks [5], and the Karhunen-Loeve expansion. The accuracy of face recognition also depends on the quality of the training data, which should be representative of the whole data set to be recognized.

One practical and useful application of face detection is actor recognition, in which users can easily and quickly browse scenes by specifying the actors that appear in them. We have developed a face recognition system that runs on a Samsung tablet to recognize the faces of actors and actresses while a video is playing, and that retrieves the video segments and frames containing the specified actor or actress. We also propose a novel training method: users are allowed to interact with the system to train it while watching video. The selected face is then tracked across the entire video sequence, and new faces are trained while the video is playing.

This paper is organized as follows: we present our architecture in Section II and our demo application in Section III. In Section IV, we describe the recognition engine, training data and our proposed technique before concluding the paper in Section V.

II. ARCHITECTURE

Fig. 1 illustrates the architecture of our actor recognition system. We have developed a front-end GUI that allows users to select an actor's name and that displays the video frames where the actor is detected and recognized in the scene.

The application also includes libraries that we have built for capturing user input, extracting video frames, preprocessing these frames into the appropriate format, processing results from the face recognition engine, and displaying the frames containing the actors on the GUI while handling continuous playback of the video. During the training phase, the system reads the faces of the actors and trains the classifiers on features of each face. In the classification phase, for each video frame, any detected face is passed to the classifier. Confidence scores are computed for each face ID (actor), and the video frames with the highest scores for the specified actor are returned.
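The classification phase described above can be sketched as follows. This is a minimal illustration only, not the actual system's code; the function names and the confidence threshold are hypothetical placeholders.

```python
# Sketch of the classification phase: each detected face is scored
# against every known face ID, and frames whose top score matches the
# user-specified actor are collected for display.

def classify_frames(frames, detect_face, score_face, actor_id, threshold=0.5):
    """Return (frame_index, score) pairs for frames in which the
    specified actor is recognized, highest-confidence first.
    detect_face(frame) -> face crop, or None if no face is found
    score_face(face)   -> dict mapping face ID -> confidence score
    """
    hits = []
    for i, frame in enumerate(frames):
        face = detect_face(frame)
        if face is None:
            continue                      # no face in this frame
        scores = score_face(face)         # confidence per face ID
        best_id = max(scores, key=scores.get)
        if best_id == actor_id and scores[best_id] >= threshold:
            hits.append((i, scores[best_id]))
    # frames with the highest scores for the specified actor come first
    return sorted(hits, key=lambda h: h[1], reverse=True)
```

In the real system the detector and classifier run inside the recognition engine; here they are passed in as plain callables so the control flow of "detect, score per face ID, keep best-matching frames" is visible on its own.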

Figure 1. System Architecture

[Fig. 1 flowchart: video frames and initial training data (e.g., from Google, Bing) feed training; face detection → face classifier → known face? → match with specified actor (face ID) → contains actor face? → display on GUI. The user selects faces for training while the video is playing.]

2012 IEEE International Symposium on Multimedia
978-0-7695-4875-3/12 $26.00 © 2012 IEEE
DOI 10.1109/ISM.2012.80

Figure 2. Launching the Actor Recognition Application on Samsung Tablet

III. FACE RECOGNITION APPLICATION ON MOBILE DEVICE

Fig. 2 shows a screenshot of our demo application on a Samsung Galaxy 10.1 tablet running Android version 3.2. The user taps the "face detect" button to launch the application. Once the application is launched, users are presented with the front-end GUI, which shows a list of actors' and actresses' names alongside a live video (movie) playing continuously in the main window. Users interact with the application by selecting an actor's name on the left via the tablet's touch screen, as shown in Fig. 3. The video frames containing the scenes in which the actor appears are displayed at the bottom of the GUI.

IV. FACE RECOGNITION ENGINE

A. Face Recognition Algorithms and Software

The face recognition algorithm runs on a server with a 2.5 GHz Intel Core i5 CPU and 4 GB of RAM, running 32-bit Windows 7. The library crops the face area in the video frame, converts it to grayscale, and resizes it before face recognition.
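The crop, grayscale, and resize steps might look like the following pure-Python sketch. This is not the library's actual code; a real implementation would use an image library such as OpenCV, and the function name, output size, and image representation here are all illustrative assumptions.

```python
def preprocess_face(frame, box, out_w=64, out_h=64):
    """Crop the face region from an RGB frame, convert it to
    grayscale, and resize it with nearest-neighbor sampling.
    frame: list of rows, each row a list of (r, g, b) tuples
    box:   (x, y, w, h) face rectangle from the detector
    Returns an out_h x out_w grid of grayscale values.
    """
    x, y, w, h = box
    # crop the detected face area
    crop = [row[x:x + w] for row in frame[y:y + h]]
    # ITU-R BT.601 luma conversion to grayscale
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in crop]
    # nearest-neighbor resize to the classifier's fixed input size
    return [[gray[i * h // out_h][j * w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]
```

Resizing every face crop to a single fixed size is what lets a classifier trained on one set of images score faces detected at arbitrary scales in the video.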

B. Training Data

We have written software to acquire and process training data automatically from several sources, including Google and Bing. A subset of the data was also collected manually. Table 1 shows performance figures for these sources. The data set match rate is around 85-90% for data from Google and around 80-85% for data from Bing, with corresponding recognition accuracies of 55-60% and 50-55%, respectively. The data set match rate for manually collected data is 100%, with a recognition rate of 85%.
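The two metrics in Table 1 can be computed as simple ratios. The definitions below are our reading of the paper's terms, not formulas given by the authors: match rate is taken as the fraction of collected images that actually depict the intended actor, and recognition accuracy as the fraction of test faces classified correctly.

```python
def match_rate(collected_labels, intended_actor):
    """Fraction of collected training images that actually depict
    the intended actor (assumed meaning of 'data set match rate')."""
    hits = sum(1 for lbl in collected_labels if lbl == intended_actor)
    return hits / len(collected_labels)

def recognition_accuracy(predictions, ground_truth):
    """Fraction of test faces whose predicted face ID matches the
    ground-truth actor label."""
    correct = sum(1 for p, t in zip(predictions, ground_truth) if p == t)
    return correct / len(ground_truth)
```

Under these definitions, the gap between the two columns of Table 1 is intuitive: mislabeled images from automatic web collection lower the match rate, and training on those mislabeled faces in turn drags down recognition accuracy.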

C. Novel Method

We have proposed a way of allowing the user to interact with the system during training while the video is playing. The selected faces are then automatically tracked across the video sequence, new faces from the video are trained as the video plays, and the face database is updated dynamically.
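One way to organize this interactive training loop is sketched below. The tracker, trainer, and selection interfaces are hypothetical placeholders standing in for the system's actual components, which the paper does not detail.

```python
def interactive_training_loop(frames, user_selection, track_face,
                              train_classifier, face_db):
    """Sketch of training during playback: when the user selects a
    face, track it through subsequent frames, train a classifier on
    the tracked crops, and update the face database in place.
    user_selection(frame_idx) -> (face_id, face crop), or None
    track_face(crop, frames)  -> list of face crops across frames
    train_classifier(crops)   -> trained classifier object
    """
    for idx, frame in enumerate(frames):
        selection = user_selection(idx)
        if selection is None:
            continue                     # user did not tag a face here
        face_id, crop = selection
        # follow the selected face through the rest of the video
        tracked = track_face(crop, frames[idx + 1:])
        # train on all crops of this face and update the
        # face database dynamically
        face_db[face_id] = train_classifier([crop] + tracked)
    return face_db
```

The key property this sketch captures is that training data accumulates from the video itself as it plays, so the face database is not limited to the initial web-collected set.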

Figure 3. Face Recognition Demo on Samsung Tablet

TABLE 1. PERFORMANCE OF TRAINING DATA

Source                                  Data Set Match Rate   Recognition Accuracy
Automatic Data Collection from Google   85%-90%               55%-60%
Automatic Data Collection from Bing     80%-85%               50%-55%
Manual Data Collection                  100%                  85%

V. CONCLUSION

We have developed our actor recognition demo on a Samsung tablet. Our application allows users to select an actor's name; the video frames containing the recognized actor's face are then displayed on the tablet. Future work involves improving the robustness and accuracy of the algorithm, and further exploring our novel technique for acquiring and training data during continuous playback of video.

REFERENCES

[1] M. S. U. Sarwar, A. Sharin, M. R. Khan, H. Imtiaz, and S. A. Fattah, "A face recognition scheme based on spatial correlation function," IEEE Region 10 Conference, pp. 671-674, 2010.

[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(7), pp. 711-720, 1997.

[3] M. I. Razzak, M. K. Khan, K. Alghathbar, and R. Yousaf, "Face recognition using layered linear discriminant analysis and small subspace," Proceedings of the IEEE 10th International Conference on Computer and Information Technology, pp. 1407-1412, 2010.

[4] H. M. El-Bakry and M. Hamada, "Fast principal component analysis for face detection using cross-correlation and image decomposition," Proceedings of the International Conference on Neural Networks, pp. 148-153, 2009.

[5] Z. Shi and J. Hu, "Local linear discriminant analysis with composite kernel for face recognition," Proceedings of the International Joint Conference on Neural Networks, pp. 1-5, 2012.
