
The Good, the Bad and the Ugly: Attractive Portraits from Video Sequences

Georgia Albuquerque, Timo Stich, Anita Sellent, Marcus Magnor

Computer Graphics Lab, TU Braunschweig, Mühlenpfordtstr. 23, 38106 Braunschweig, Germany

Email: {georgia,stich,sellent,magnor}@cg.tu-bs.de

Abstract

Taking a portrait picture is frequently an uncomfortable task for the photographed person, and the resulting shots are in many cases unnatural and unsatisfactory. A common solution is to take a large number of pictures and then select the favorite ones. Instead of a manual selection, an automatic selection of good portraits from a captured video sequence can increase the chances of obtaining a pleasing portrait picture without much effort. In this paper, we present an automatic solution for the classification of face images from video sequences. This, in turn, can not only ease the task of taking good portrait pictures, but can also make the acquisition of portraits more comfortable.

1 Introduction

In recent years, thanks to advances in high-quality digital video, it has become possible to consider individual frames of a video sequence as single photographs of acceptable quality (see Figure 1). We present an application of the work first proposed in [1] that extracts and selects good portraits from video sequences. In [1], a qualitative classification of portraits is presented that automatically classifies a given set of portraits as good or bad. In contrast to the traditional notion of aesthetics in photography, defined as the way people usually characterize beauty in this form of art [2], the presented application can learn the user's taste. It shall not only be able to select attractive shots, but to select good shots of any subject, even if the subject cannot be considered beautiful in the aesthetic sense.

There are many advantages to extracting the pictures directly from a video sequence: it allows the photographed subject to behave more naturally, without the habitual, annoying camera clicks. Furthermore, using video sequences instead of single camera shots can save time during acquisition because, over the same time interval, the number of frames obtained by video acquisition is larger than the number of shots from a common photo camera.

2 Related Work

Recently, a wide range of classification applications has been implemented in industry and embedded in ordinary digital cameras. Face detection algorithms, for example, are included in digital cameras from various manufacturers.

The Image Intelligence [3] technology from Fujifilm detects faces and can be used to automatically set the camera focus. In the same direction, newer digital camera models from Sony come with a smile shutter algorithm, which automatically captures a picture when the subject smiles [9].

In face classification research, many published papers focus on biometric recognition [4][5], determining to which subject a given face belongs. Even though we use similar techniques, our main interest is to classify pictures into qualitative categories such as good or bad, attractive or unattractive.

In this direction, similar work can be found in the area of affective computing, an application of pattern recognition introduced by Picard [8]. The recognition of facial expressions contributes substantially to the face classification area; Pantic and Rothkrantz [7] provide an overview of the field of facial expression analysis. Contributions to the qualitative classification of portraits were introduced in [10] with the concept of identifying neutral faces, the main motivation being to increase the accuracy of an arbitrary authentication algorithm by feeding it a neutral face. Moreover, an impressive method for portrait image processing is presented in [6]. In contrast to a pattern recognition approach, that method automatically increases the predicted attractiveness rating of the face image. However, it actually distorts the face image, and beautification comes at the cost of identity loss.

In the area of qualitative classification of face images, Albuquerque et al. [1] propose a qualitative portrait classification in which a set of portrait pictures is automatically classified as good or bad. In this paper, we present an improvement and an application of this qualitative portrait classifier: our application is able to select the best portraits directly from a video sequence, reducing acquisition time.

3 Portrait Classification Algorithm

In this section, we give a short summary of the portrait classification algorithm that is at the core of our application; a more detailed description of the method can be found in [1]. The diagram in Figure 2 depicts how the classification algorithm works. As input, the algorithm receives pictures containing a single person photographed in a frontal pose. The images are inspected, and the algorithm determines whether the picture is a good shot or not. The whole algorithm can be divided into three main parts.

Figure 1: Samples of frames extracted from a video sequence and the portraits that were automatically selected by the application.

Figure 2: Classification algorithm.

The first stage of the algorithm is the detection module. Given a complete portrait picture as input, the detection module first detects the face region in the picture. After the face is found, the detection module extracts the eye and mouth regions. The detection module employs the AdaBoost-based detector proposed by Viola and Jones [13].
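To make this stage concrete, the following is a minimal Python sketch of such a detection module, using OpenCV's pre-trained Haar cascades as a stand-in for the AdaBoost detector of Viola and Jones [13]; the cascade files and detector parameters are illustrative assumptions, not the authors' exact setup.

# Sketch of the detection stage: find the face first, then search for the
# eyes and the mouth inside the face region. OpenCV's stock Haar cascades
# stand in for the AdaBoost detector of [13].
import cv2

FACE = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
EYE = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
MOUTH = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

def detect_regions(frame):
    """Return the grayscale face region plus candidate eye and mouth boxes,
    or None if no single frontal face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = FACE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) != 1:                  # the input is assumed to show one person
        return None
    x, y, w, h = faces[0]
    face = gray[y:y + h, x:x + w]
    eye_boxes = EYE.detectMultiScale(face, scaleFactor=1.1, minNeighbors=5)
    mouth_boxes = MOUTH.detectMultiScale(face, scaleFactor=1.1, minNeighbors=15)
    return face, list(eye_boxes), list(mouth_boxes)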

Once the eyes and mouth are detected, the next step is to extract features from these regions. The feature extraction module is based on Principal Component Analysis (PCA) [11], a very common approach to finding robust feature descriptions in images.
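A feature extractor in this eigenfaces spirit might look as follows; the fixed region size and the number of principal components are assumptions, since the paper only states the 30 × 15 pixel minimum mentioned in Section 5.

# Sketch of PCA-based feature extraction in the spirit of eigenfaces [11].
import cv2
import numpy as np
from sklearn.decomposition import PCA

REGION_SIZE = (15, 30)  # (rows, cols) each region is rescaled to; an assumption

def region_vector(region):
    """Rescale a grayscale region to a fixed size and flatten it to a row vector."""
    return cv2.resize(region, (REGION_SIZE[1], REGION_SIZE[0])).astype(np.float64).ravel()

def fit_pca(train_regions, n_components=20):
    """Learn an eigen-basis from sample regions (component count is an assumption;
    requires at least n_components training samples)."""
    data = np.stack([region_vector(r) for r in train_regions])
    return PCA(n_components=n_components).fit(data)

def extract_feature(pca, region):
    """Project a region onto the learned basis to obtain its feature vector."""
    return pca.transform(region_vector(region)[None, :])[0]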

Finally, the previously extracted features are used to represent the regions (eyes or mouth) during classification. In this step, each of the three regions is classified separately as good or bad. The adopted classification approach is a Support Vector Machine (SVM) [12]. After the classifier is trained with samples of good and bad eyes/mouths, it is able to categorize the individual regions. The labeling criteria for eyes and mouths follow a simple rule: An eye is labeled bad if it is closed or looking to the right or left; otherwise, it is labeled good. A mouth is labeled bad if it is open, if it is moving to speak, or if the smile is too wide; otherwise, it is labeled good.
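A per-region classifier of this kind can be sketched as follows; the RBF kernel and its parameters are assumptions, as the paper does not specify the SVM configuration.

# Sketch of one per-region SVM classifier [12]; separate instances are
# trained for the left eye, the right eye and the mouth. Labels: 1 = good, 0 = bad.
from sklearn.svm import SVC

def train_region_classifier(features, labels):
    """Train a binary SVM on PCA feature vectors of one region type."""
    clf = SVC(kernel="rbf", C=1.0)       # kernel choice is an assumption
    return clf.fit(features, labels)

def region_is_good(clf, feature):
    """True if the SVM classifies the region's feature vector as good."""
    return bool(clf.predict(feature.reshape(1, -1))[0] == 1)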

The final classification of a portrait is a combination of the individual classifications of the eyes and mouth. A portrait is considered good only if all three parts are considered good by the classification module. If any part is considered bad, or fails to be detected by the detection module, the portrait is classified as unattractive.
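Expressed in code, reusing the hypothetical helpers from the sketches above, the combination rule is simply a conjunction:

# A portrait is good only if the left eye, right eye and mouth were all
# detected and each is classified as good; any failure makes it bad.
def classify_portrait(regions, pcas, classifiers):
    """regions/pcas/classifiers are dicts keyed 'left_eye', 'right_eye', 'mouth';
    a missing region (None) counts as a detection failure."""
    for part in ("left_eye", "right_eye", "mouth"):
        region = regions.get(part)
        if region is None:
            return False                 # detection failed -> bad portrait
        feature = extract_feature(pcas[part], region)
        if not region_is_good(classifiers[part], feature):
            return False
    return True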

4 Video Application

The main purpose of this work is to extract portrait pictures directly from video sequences, instead of classifying a set of single shots. The input video contains a single person in a frontal position, but not necessarily looking straight into the video camera.

The application processes the input video frame by frame; each frame is inspected by the portrait classification algorithm. The algorithm detects and classifies the eye and mouth regions individually. If the eyes and the mouth are classified as good, the frame is selected and presented to the user in portrait form. Otherwise, the frame is discarded.
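The main loop can be sketched as follows; detect_and_filter is a hypothetical helper combining the detection module of Section 3 with the position constraints described next (see the sketch after Figure 3), and classify_portrait is the combination rule sketched above.

# Sketch of the frame-by-frame selection loop over an MPEG video.
import cv2

def select_portraits(video_path, pcas, classifiers):
    """Return the list of frames the classifier accepts as good portraits."""
    cap = cv2.VideoCapture(video_path)
    selected = []
    while True:
        ok, frame = cap.read()
        if not ok:                               # end of the video
            break
        regions = detect_and_filter(frame)       # detection + constraints (hypothetical helper)
        if regions is not None and classify_portrait(regions, pcas, classifiers):
            selected.append(frame)
    cap.release()
    return selected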

Additionally, we included some position constraints for the detection module described in the previous section. False detections, which occur when regions that do not contain an eye or mouth are detected as one of those objects, were reduced significantly by the addition of simple constraints. Following traditional rules of proportion, we divide the face into six equal-sized rectangles, 2 × 3 (see Figure 3): the eyes shall be positioned in the central horizontal division, one on each side, and the mouth shall sit roughly at the centre of the lowest horizontal division. We apply these constraints to the regions originally selected by the detection module; regions that do not fit the constraints are discarded. The filtering happens before feature extraction and classification, to avoid a misclassification of the whole face.

Figure 3: Dividing the face into six equal-sized rectangles (2 × 3).
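A sketch of such a constraint filter follows; the exact bands and the tolerance on the mouth position are illustrative assumptions based on the 2 × 3 grid described above.

# Sketch of the 2x3 proportion constraints: eye centres must fall in the
# middle horizontal band, one on each side of the face; the mouth centre
# must lie in the bottom band, near the horizontal centre of the face.
def filter_by_position(face_w, face_h, eye_boxes, mouth_boxes):
    """Boxes are (x, y, w, h) in face coordinates; returns the surviving
    (left_eye, right_eye, mouth) boxes, or None if any part is missing."""
    def centre(box):
        x, y, w, h = box
        return x + w / 2.0, y + h / 2.0

    band = face_h / 3.0
    eyes = [b for b in eye_boxes if band <= centre(b)[1] < 2.0 * band]
    left = [b for b in eyes if centre(b)[0] < face_w / 2.0]
    right = [b for b in eyes if centre(b)[0] >= face_w / 2.0]
    mouths = [b for b in mouth_boxes
              if centre(b)[1] >= 2.0 * band
              and abs(centre(b)[0] - face_w / 2.0) < face_w / 4.0]
    if left and right and mouths:
        return left[0], right[0], mouths[0]
    return None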

5 Experiments and Results

In order to evaluate our application, we tested it with common MPEG videos of different people in different situations, i.e. not only in controlled environments but also with noisy backgrounds. To produce the MPEG videos, we used a high-resolution (1440 × 1080) HDV Canon XH A1 camcorder recording at 25 Hz, mounted on a standard tripod. The video sequences each include only one person, who moves the head freely; the acquisitions were made on different days under different lighting conditions and backgrounds. The application was able to select good portrait samples despite the picture artefacts that come with the compressed video format. Figure 4 shows the faces from the frames that were classified as good portraits.

The algorithm can effectively detect and classify faces in pictures where each face element, i.e. the eyes and mouth, covers an area of at least 30 × 15 pixels; we define 30 × 15 pixels as the minimal area for a face element. Processing pictures with larger areas neither increases nor decreases the detection rate, since the selected regions are scaled before the feature extraction and classification procedures. The detected faces in our experiments covered a region of about 300 × 300 pixels.

6 Conclusions and Future Work

In this work, we proposed an application of qualitative portrait classification [1] that allows classifying and selecting good portraits directly from video sequences. The implemented application was able to select good portrait pictures from video sequences in MPEG format. Good portraits are defined as frames where the subject has both eyes open and looks straight into the camera, while the mouth forms a neutral expression or a light smile. During video acquisition, however, the subject does not have to stay in a frontal position the entire time, looking straight into the video camera; he may behave naturally, because the application selects just the good portraits.

Our application may also be embedded in ordinary digital cameras. Since a good shot does not depend only on a good mouth pose (as in smile detection [9]), our more complete approach may increase the chances of taking good portraits, because it analyzes not only the mouth but also the eyes of the subject. Once the application detects a good portrait, it may indicate to the photographer a chance for a good shot, or even trigger the camera shutter automatically.

References

[1] Georgia Albuquerque, Timo Stich, and Marcus Magnor. Qualitative portrait classification. In Proc. Vision, Modeling, and Visualization (VMV'07), pages 243–252, November 2007.

[2] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Ze Wang. Studying aesthetics in photographic images using a computational approach. In ECCV (3), pages 288–301, 2006.

[3] FujiFilm. Image intelligence. Website.http://www.fujifilm.com/image intelligence/.

[4] Anil K. Jain, Ruud Bolle, and Sharath Pankanti. Biometrics:Personal Identification in Networked Society. Kluwer AcademicPublishers, Norwell, MA, USA, 1998.

[5] S. Y. Kung, M. W. Mak, and S. H. Lin. Biometric Authentication:A Machine Learning Approach. Prentice Hall, 2004.

[6] Tommer Leyvand, Daniel Cohen-Or, Gideon Dror, and Dani Lischinski. Digital face beautification. In SIGGRAPH '06: ACM SIGGRAPH 2006 Sketches, page 169, New York, NY, USA, 2006. ACM Press.

[7] Maja Pantic and Leon J. M. Rothkrantz. Automatic analysis of facial expressions: The state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1424–1445, 2000.

[8] Rosalind W. Picard. Affective Computing. MIT Press, 1997.

[9] Sony. New digital cameras with smile shutter. Website.http://sonystyle.ca/html/multimedia/DSC/T200 micrositehtml/feature 2.html.

[10] Yingli Tian and Ruud M. Bolle. Automatic detecting neutral face for face authentication. In AAAI-03 Spring Symposium on Intelligent Multimedia Knowledge Management, 2003.

[11] Matthew Turk and Alex Paul Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1):71–86, 1991.

[12] V. Vapnik. Statistical Learning Theory. Wiley-Interscience, NewYork, 1998.

[13] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In IEEE Conference on Computer Vision and Pattern Recognition, pages 609–615, 2001.

Figure 4: Classification result examples: for three individuals, the extracted frames that were considered good portraits by the application. The original video sequences from which the frames were extracted were made in a semi-controlled environment; they include only one person, who moves the head freely, and the acquisitions were made under different lighting conditions and backgrounds.