
The Good, the Bad and the Ugly: Attractive Portraits from Video Sequences

Georgia Albuquerque, Timo Stich, Anita Sellent, Marcus Magnor

Computer Graphics Lab, TU Braunschweig, Mühlenpfordtstr. 23, 38106 Braunschweig, Germany

Email: {georgia,stich,sellent,magnor}@cg.tu-bs.de

Abstract

Taking a portrait picture is frequently an uncomfortable task for the photographed person, and the resulting shots are in many cases unnatural and unsatisfactory. A common solution is to take a large number of pictures and then select the favorite ones. Instead of a manual selection, automatically selecting good portraits from a captured video sequence increases the chances of obtaining a pleasing portrait picture without much effort. In this paper, we present an automatic solution for the classification of face images from video sequences. This not only eases the task of taking good portrait pictures, but also makes the acquisition of portraits more comfortable.

1 Introduction

In recent years, thanks to advances in high-quality digital video, it has become possible to consider individual frames of a video sequence as single photographs of acceptable quality (see Figure 1). We present an application of the work first proposed in [1] that extracts and selects good portraits from video sequences. In [1], a qualitative classification of portraits is presented that automatically classifies a given set of portraits as good or bad. In contrast to the traditional notion of aesthetics in photography, defined as the way people usually characterize beauty in this form of art [2], the presented application can learn the user's taste. It should not only be able to select attractive shots, but also good shots of any subject, even if the subject cannot be considered beautiful in the aesthetic sense.

Extracting the pictures directly from a video sequence has many advantages: it allows the photographed subject to behave more naturally, without the habitual, annoying camera clicks. Furthermore, using video sequences instead of single camera shots can save acquisition time because, over the same time interval, video acquisition yields far more frames than a common photo camera yields shots.

2 Related Work

Recently, a wide range of classification applications has been implemented in industry and embedded in ordinary digital cameras. Face detection algorithms, for example, are included in digital cameras from several manufacturers.

The Image Intelligence [3] technology from Fujifilm detects faces and can be used to automatically set the camera focus. In the same direction, newer digital camera models from Sony come with a smile shutter algorithm that automatically captures a picture when the subject smiles [9].

In face classification research, many published papers focus on biometric recognition [4, 5], i.e., determining to which subject a given face belongs. Even though we use similar techniques, our main interest is to classify pictures into qualitative categories such as good or bad, attractive or unattractive.

In this direction, similar work can be found in the area of affective computing, an application of pattern recognition introduced by Picard [8]. The recognition of facial expressions contributes much to the face classification area; Pantic and Rothkrantz [7] provide an overview of facial expression analysis. Contributions to the qualitative classification of portraits were introduced in [10] with the concept of identifying neutral faces; the main motivation is to increase the accuracy of an arbitrary authentication algorithm by feeding it a neutral face. Moreover, an impressive method for processing portrait images is presented in [6]. In contrast to a pattern recognition approach, this method automatically increases the predicted attractiveness rating of a face image. However, it actually distorts the face image, so beautification comes at the cost of identity loss.

In the area of qualitative classification of face images, Albuquerque et al. [1] propose a qualitative portrait classifier in which a set of portrait pictures is automatically classified as good or bad. In this paper, we present an improvement and an application of this qualitative portrait classifier. Our application is able to select the best portraits directly from a video sequence, reducing acquisition time.

3 Portrait Classification Algorithm

In this section we give a short summary of the portrait classification algorithm at the core of our application; a more detailed description of the method can be found in [1]. The diagram in Figure 2 depicts how the classification algorithm works. As input, the algorithm receives pictures containing a single person photographed in a frontal pose. The images are inspected and the algorithm determines whether



Figure 1: Sample frames extracted from a video sequence and the portraits that were automatically selected by the application.

Figure 2: Classification Algorithm.

the picture is a good shot or not. The whole algorithm can be divided into three main parts. The first stage is the detection module: given a complete portrait picture as input, it first detects the face region in the picture. After the face is selected, the detection module extracts the eye and mouth regions. The detection module employs the AdaBoost-based detector proposed by Viola and Jones [13].

Once the eyes and mouth are detected, the next step is to extract features from these regions. The feature extraction module is based on Principal Component Analysis (PCA) [11], a very common approach to finding robust feature descriptions in images.
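PCA over vectorized region patches can be sketched with plain NumPy as follows; the patch dimensionality and number of components below are illustrative choices, not values from the paper:

```python
import numpy as np

def fit_pca(patches, n_components=20):
    """patches: (n_samples, n_pixels) matrix of vectorized eye/mouth regions.
    Returns (mean, basis), where the rows of basis are the top principal
    components, ordered by decreasing explained variance."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    # SVD of the centered data: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(patch, mean, basis):
    """Project one vectorized patch onto the PCA basis -> feature vector."""
    return basis @ (patch - mean)
```

The resulting low-dimensional feature vectors are what the region classifiers operate on; this is the same eigen-decomposition idea as the eigenfaces approach of Turk and Pentland [11], applied per region rather than to whole faces.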

Finally, the previously extracted features represent the regions (eyes or mouth) during classification. In this step, each of the three regions is classified separately as good or bad using a Support Vector Machine (SVM) [12]. After the classifier is trained with samples of good and bad eyes and mouths, it is able to categorize the individual regions. The labeling criteria for eyes and mouths follow a simple rule: an eye is labeled bad if it is closed or looking to the right or left; otherwise, it is labeled good. A mouth is labeled bad if it is open, moving to speak, or smiling too widely; otherwise, it is labeled good.
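A minimal sketch of such a per-region classifier using scikit-learn is shown below; the kernel choice is our assumption, since the paper does not specify the SVM configuration:

```python
import numpy as np
from sklearn.svm import SVC

def train_region_classifier(good_features, bad_features):
    """Train an SVM [12] that labels a region's PCA feature vector as
    good (1) or bad (0). good_features/bad_features are arrays of
    feature vectors obtained from labeled training regions."""
    X = np.vstack([good_features, bad_features])
    y = np.array([1] * len(good_features) + [0] * len(bad_features))
    clf = SVC(kernel="rbf", gamma="scale")  # kernel is an illustrative choice
    clf.fit(X, y)
    return clf
```

One such classifier would be trained per region type (left eye, right eye, mouth), each with its own labeled training set.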

The final classification of a portrait combines the individual classifications of the eyes and mouth. A portrait is considered good only if all three parts are considered good by the classification module; if any part is considered bad, or fails to be detected by the detection module, the portrait is classified as unattractive.
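This combination rule amounts to a conjunction of the three per-region labels, where a failed detection counts the same as a bad label:

```python
def classify_portrait(left_eye_good, right_eye_good, mouth_good):
    """Combine the three per-region labels: a portrait is good only if
    both eyes and the mouth are detected and classified as good.
    A region that failed detection is passed in as None."""
    parts = (left_eye_good, right_eye_good, mouth_good)
    return all(p is True for p in parts)
```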

4 Video Application

The main purpose of this work is to extract portrait pictures directly from video sequences instead of classifying a set of single shots. The input video contains a single person in a frontal position, but not necessarily looking straight into the video camera.

The application processes the input video frame by frame; each frame is inspected by the portrait classification algorithm, which detects and classifies the eye and mouth regions individually. If the eyes and the mouth are classified as good, the frame is selected and presented to the user in portrait form; otherwise, the frame is discarded.
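The frame-by-frame selection loop can be sketched as a generic filter; here the classifier is injected as a function argument (a stand-in for the full detection + classification pipeline described above):

```python
def select_good_frames(frames, is_good_portrait):
    """Run the portrait classifier over a video frame by frame and keep
    only the frames classified as good. `frames` is any iterable of
    images; `is_good_portrait` is the detection + classification
    pipeline (a callable taking one frame and returning True/False)."""
    selected = []
    for index, frame in enumerate(frames):
        if is_good_portrait(frame):
            selected.append((index, frame))  # keep frame and its position
    return selected
```

In practice, the frames would be read from the MPEG file with something like OpenCV's `cv2.VideoCapture`, and the selected frames would be cropped to portrait form before being shown to the user.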

Additionally, we added position constraints to the detection module described in the previous section. False detections, which occur when regions that do not contain an eye or mouth are detected as one of those objects, were reduced significantly by these simple constraints. Following traditional rules of proportion and dividing the face into six equal-sized rectangles, 2×3 (see Figure 3), we can determine that


Figure 3: Dividing the face into six equal-sized rectangles (two by three).

the eyes should be positioned in the central horizontal division, one on each side, and the mouth should sit roughly at the centre of the lower third. We apply these constraints to the regions originally selected by the detection module; regions that do not satisfy them are discarded. This filtering happens before feature extraction and classification, to avoid a misclassification of the whole face.
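The 2×3 grid constraint can be sketched as a simple geometric check on region centres; the exact tolerances below are our illustrative choices, not values from the paper:

```python
def passes_position_constraints(face_h, face_w, eyes, mouth):
    """Check detected regions against the 2x3 grid heuristic: eye
    centres must lie in the middle horizontal third, one on each side
    of the vertical centre line, and the mouth centre in the lower
    third, roughly centred horizontally. Regions are (x, y, w, h)
    rectangles in face coordinates."""
    def centre(r):
        x, y, w, h = r
        return (x + w / 2.0, y + h / 2.0)

    third = face_h / 3.0
    # Eyes: middle row, one in each horizontal half of the face.
    eye_cx = sorted(centre(e)[0] for e in eyes)
    eyes_ok = (
        len(eyes) == 2
        and all(third <= centre(e)[1] < 2 * third for e in eyes)
        and eye_cx[0] < face_w / 2.0 <= eye_cx[1]
    )
    # Mouth: lower row, within a quarter face-width of the centre line.
    mx, my = centre(mouth)
    mouth_ok = my >= 2 * third and abs(mx - face_w / 2.0) < face_w / 4.0
    return eyes_ok and mouth_ok
```

Candidate regions that fail this check would be dropped before feature extraction, as described above.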

5 Experiments and Results

To evaluate our application, we tested it with common MPEG videos of different people in different situations, i.e., not only controlled environments but also noisy backgrounds. To produce the MPEG videos, we used a high-resolution (1440 × 1080) HDV Canon XH A1 camcorder recording at 25 Hz, mounted on a standard tripod. Each video sequence includes only one person, who moves the head freely; the acquisitions were made on different days under different lighting conditions and backgrounds. The application was able to select good portrait samples despite the picture artefacts that come with the compressed video format. Figure 4 shows the faces from the frames that were classified as good portraits.

The algorithm can effectively detect and classify faces in pictures where each face element, i.e., an eye or the mouth, covers an area of at least 30 × 15 pixels; we define 30 × 15 pixels as the minimal area for a face element. Processing pictures with larger areas neither increases nor decreases the detection rate, as the selected regions are scaled before the feature extraction and classification procedures. The detected faces in our experiments covered a region of about 300 × 300 pixels.
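The size requirement reduces to a simple gate applied to each detected region before it enters the pipeline (the check itself is from the paper; treating it as a standalone function is our framing):

```python
MIN_W, MIN_H = 30, 15  # minimal eye/mouth region size from the experiments

def usable_region(w, h):
    """A detected eye/mouth region enters feature extraction only if it
    covers at least 30 x 15 pixels; smaller regions are too coarse to
    classify reliably. Accepted regions are then rescaled to a fixed
    size before PCA, which is why larger inputs do not change results."""
    return w >= MIN_W and h >= MIN_H
```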

6 Conclusions and Future Work

In this work, we proposed an application of qualitative portrait classification [1] that allows classifying and selecting good portraits directly from video sequences. The implemented application was able to select good portrait pictures from video sequences in MPEG format. Good portraits are defined as frames in which the subject has both eyes open and looks straight into the camera, while the mouth forms a neutral expression or a light smile. During video acquisition, however, the subject does not have to stay in a frontal position the entire time, looking straight at the video camera; the subject may behave naturally, because the application selects only the good portraits.

Our application may also be embedded in ordinary digital cameras. Since a good shot does not depend only on a good mouth pose (as in smile detection [9]), our more complete approach may increase the chances of taking good portraits, because it analyzes not only the mouth but also the eyes of the subject. Once the application detects a good portrait, it may indicate to the photographer the chance for a good shot, or even trigger the camera shutter automatically.

References

[1] Georgia Albuquerque, Timo Stich, and Marcus Magnor. Qualitative portrait classification. In Proc. Vision, Modeling, and Visualization (VMV'07), pages 243–252, November 2007.

[2] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang. Studying aesthetics in photographic images using a computational approach. In ECCV (3), pages 288–301, 2006.

[3] Fujifilm. Image Intelligence. Website. http://www.fujifilm.com/image intelligence/.

[4] Anil K. Jain, Ruud Bolle, and Sharath Pankanti. Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers, Norwell, MA, USA, 1998.

[5] S. Y. Kung, M. W. Mak, and S. H. Lin. Biometric Authentication: A Machine Learning Approach. Prentice Hall, 2004.

[6] Tommer Leyvand, Daniel Cohen-Or, Gideon Dror, and Dani Lischinski. Digital face beautification. In SIGGRAPH '06: ACM SIGGRAPH 2006 Sketches, page 169, New York, NY, USA, 2006. ACM Press.

[7] Maja Pantic and Leon J. M. Rothkrantz. Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1424–1445, 2000.

[8] Rosalind W. Picard. Affective Computing. MIT Press, 1997.

[9] Sony. New digital cameras with smile shutter. Website. http://sonystyle.ca/html/multimedia/DSC/T200 micrositehtml/feature 2.html.

[10] Yingli Tian and Rudolf M. Bolle. Automatic detecting neutral face for face authentication. In AAAI-03 Spring Symposium on Intelligent Multimedia Knowledge Management, 2003.

[11] Matthew Turk and Alex Paul Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.

[12] V. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.

[13] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 609–615, 2001.


Figure 4: Classification result examples: for three individuals, extracted frames that were considered good portraits by the application. The original video sequences from which the frames were extracted were made in a semi-controlled environment; they include only one person, who moves the head freely, and the acquisitions were taken under different lighting conditions and backgrounds.