VIP: Finding Important People in Images Clint Solomon Mathialagan Andrew C. Gallagher Dhruv Batra...

VIP: Finding Important People in Images
Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra

CVPR 2015

Outline
• Introduction
• Approach
• Results
• Importance vs Saliency
• Application: Improving Im2Text
• Conclusions

Introduction
• Project: https://computing.ece.vt.edu/~mclint/vip/

• Demo: http://cloudcv.org/vip/

Introduction
• The goal of this paper is to automatically predict the importance of individuals in group photographs.

Introduction
• Who are the most important individuals in these pictures?
• Humans have a remarkable ability to understand social roles and identify important players, even without knowing the identities of the people in the images.

Introduction
• What is importance? It can be judged from the perspective of:
  1. the photographer
  2. the subjects
  3. neutral third-party human observers

• In this work, we rely on the wisdom of the crowd to estimate the “ground-truth” importance of a person in an image.

Introduction
• Applications
  - Im2text
  - Photo cropping algorithms
  - Social networking sites and image search applications
• Contributions
  - We learn a model for predicting the importance of individuals in photos.
  - We collect two importance datasets.
  - We show that we can automatically predict the importance of people with high accuracy, and that incorporating this predicted importance improves applications such as im2text.

Approach
• Framework
  - Dataset: image-level and corpus-level
  - Person features: distance features, scale, sharpness, face pose features, face occlusion
  - Model: $M(p_i, p_j) \approx s_i - s_j$

Approach
• We model importance in two ways:

1. Image-Level Importance: “Given an image, who is the most important individual?”

2. Corpus-Level Importance: “Given multiple images, in which image is a specific person most important?”

Approach(1)-Dataset Collection
• Image-Level Dataset: In this setting, we need a dataset of images containing at least three people with varying levels of importance. Source: Flickr.
• Corpus-Level Dataset: In this setting, we need a dataset that has multiple pictures of the same person, and multiple sets of such photos. Source: a TV series (‘Big Bang Theory’).

Approach(2)-Importance Annotation

Annotation Interfaces used with MTurk

Image-Level Importance Annotation: Hovering over a button (A or B) highlights the person associated with it.

Corpus-Level Importance Annotation: Hovering over a frame shows where the person is located in the frame.

Approach(2)-Importance Annotation
• $(p_i, p_j)$: each annotated pair of faces
• $s_i, s_j$: the relative importance scores (ranging from 0 to +1)
• Note that $s_i$ and $s_j$ are not absolute, as they are not calibrated for comparison to another person, say $p_k$, from another pair.

Approach(2)-Importance Annotation
• The table shows a breakdown of both datasets along the magnitude of differences in importance.

Approach(3)-Importance Model
• The objective is to build a model $M$ that regresses to the difference in ground-truth importance scores: $M(p_i, p_j) \approx s_i - s_j$
• We use a linear model: $M(p_i, p_j) = w^\top \phi(p_i, p_j)$
  - $\phi(p_i, p_j)$: the features extracted for this pair
  - $w$: the regressor weights
• We use ν-Support Vector Regression to learn these weights.

Approach(3)-Importance Model
• We compared two ways of composing these individual face features:
  - Using the difference of features
  - Concatenating the two individual features
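The two compositions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values, the weights, and the function names (`diff_features`, `concat_features`, `linear_model`) are all hypothetical, and the weights are hand-picked rather than learned by ν-SVR.

```python
from typing import List, Sequence


def diff_features(phi_i: Sequence[float], phi_j: Sequence[float]) -> List[float]:
    """Pair features as the element-wise difference of the two face feature vectors."""
    return [a - b for a, b in zip(phi_i, phi_j)]


def concat_features(phi_i: Sequence[float], phi_j: Sequence[float]) -> List[float]:
    """Pair features as the concatenation of the two face feature vectors."""
    return list(phi_i) + list(phi_j)


def linear_model(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score: dot product of weights and pair features (no bias term)."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical per-face features (e.g. distance, scale, sharpness).
phi_i = [0.10, 0.30, 0.60]
phi_j = [0.40, 0.10, 0.20]

# Hypothetical weights; in the talk they are learned, here they are made up.
w_diff = [-1.0, 0.5, 0.5]
# With concatenation, the weights [w, -w] reproduce the difference model,
# so concatenation is at least as expressive for a linear model.
w_cat = w_diff + [-w for w in w_diff]

score_diff = linear_model(w_diff, diff_features(phi_i, phi_j))
score_cat = linear_model(w_cat, concat_features(phi_i, phi_j))
score_swapped = linear_model(w_diff, diff_features(phi_j, phi_i))
```

With the difference composition and no bias term, swapping the two faces flips the sign of the score, which matches a target of the form "score of one person minus score of the other". Concatenation doubles the feature dimension and does not enforce that symmetry.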

Approach(4)-Person Features
• Distance Features
We first scale the image to a size of (1, 1), and compute two distance features:
  1. Distance from center: $d$, the distance from the face to the image center $(0.5, 0.5)$
  2. Weighted distance from center: $d / (\text{the largest dimension of the face box})$
We compute two more features to capture how far a person is from the center of a group:
  1. Normalized distance from centroid
  2. Normalized distance from weighted centroid
The weighted centroid is the weighted average of the center points of the faces. For three faces:
$$x_{cm} = \frac{m_1 x_1 + m_2 x_2 + m_3 x_3}{m_1 + m_2 + m_3}, \qquad y_{cm} = \frac{m_1 y_1 + m_2 y_2 + m_3 y_3}{m_1 + m_2 + m_3}$$
where the weight of a face is $m_i = \dfrac{\text{the area of the head}}{\text{the total area of faces in the image}}$.
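A minimal sketch of the four distance features, under stated assumptions: face boxes are given as `(x, y, w, h)` in an image already scaled to the unit square, and the two "normalized" centroid distances are divided by the largest dimension of the face box (the slide does not say what they are normalized by, so that choice is a guess). All function and key names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h) in unit-square coordinates


def face_center(box: Box) -> Tuple[float, float]:
    """Center point of a face box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def distance_features(boxes: List[Box]) -> List[Dict[str, float]]:
    """Per-face distance features for one image."""
    centers = [face_center(b) for b in boxes]
    areas = [b[2] * b[3] for b in boxes]
    n, total_area = len(boxes), sum(areas)

    # Plain centroid of the face centers.
    cx = sum(c[0] for c in centers) / n
    cy = sum(c[1] for c in centers) / n
    # Area-weighted centroid (weights are face area over total face area).
    wx = sum(a * c[0] for a, c in zip(areas, centers)) / total_area
    wy = sum(a * c[1] for a, c in zip(areas, centers)) / total_area

    feats = []
    for box, (px, py) in zip(boxes, centers):
        largest = max(box[2], box[3])
        d = math.hypot(px - 0.5, py - 0.5)  # distance to image center (0.5, 0.5)
        feats.append({
            "dist_center": d,
            "weighted_dist_center": d / largest,
            # Assumption: normalize by the largest face-box dimension.
            "norm_dist_centroid": math.hypot(px - cx, py - cy) / largest,
            "norm_dist_weighted_centroid": math.hypot(px - wx, py - wy) / largest,
        })
    return feats


# One large centered face and two small faces in opposite corners.
boxes = [(0.4, 0.4, 0.2, 0.2), (0.0, 0.0, 0.1, 0.1), (0.8, 0.8, 0.1, 0.1)]
feats = distance_features(boxes)
```

In this toy layout the large central face has zero distance from the image center, and the weighted centroid is pulled toward it more strongly than the plain centroid is.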

Approach(4)-Person Features
• Scale
• Sharpness
  - Sobel filter
  - Compute the sum of the gradient energy in a face bounding box, normalized by the sum of the gradient energy in all the bounding boxes in the image.
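The sharpness feature can be sketched in plain Python as below. Assumptions, none of which come from the slides: the image is a grayscale grid of numbers, "gradient energy" is taken to be the squared Sobel gradient magnitude, border pixels get zero energy, and boxes are integer `(x, y, w, h)` pixel rectangles. Function names are hypothetical.

```python
from typing import List, Tuple

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_energy(image: List[List[float]]) -> List[List[float]]:
    """Squared Sobel gradient magnitude per pixel (borders left at zero)."""
    h, w = len(image), len(image[0])
    energy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            energy[r][c] = float(gx * gx + gy * gy)
    return energy


def sharpness_features(image: List[List[float]],
                       boxes: List[Tuple[int, int, int, int]]) -> List[float]:
    """Energy inside each box divided by the energy summed over all boxes."""
    energy = gradient_energy(image)
    sums = [sum(energy[r][c] for r in range(y, y + h) for c in range(x, x + w))
            for (x, y, w, h) in boxes]
    total = sum(sums)
    return [s / total if total > 0 else 0.0 for s in sums]


# Toy 6x12 image: a sharp vertical edge in the left half, a flat right half.
row = [0, 0, 0, 10, 10, 10, 10, 10, 10, 10, 10, 10]
image = [list(row) for _ in range(6)]
boxes = [(0, 0, 6, 6), (6, 0, 6, 6)]
sharpness = sharpness_features(image, boxes)
```

Because the feature is normalized over all face boxes in the image, it measures how sharp a face is relative to the other faces rather than in absolute terms.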

Approach(4)-Person Features
• Face Pose Features
DPM face pose features
  - We resize the face bounding box patch from the image to 128×128 pixels.
  - We run the face pose and landmark estimation algorithm of Zhu and Ramanan [28].
  - Our pose feature is the component id returned by this algorithm, which can range from 1 to 13.

[28] X. Zhu and D. Ramanan. Face detection, pose estimation and landmark localization in the wild. In CVPR, 2012.

Approach(4)-Person Features
• Face Pose Features
Aspect ratio
  - While the aspect ratio of a face is typically 1:1, this ratio can differentiate between some head poses.
DPM face pose difference
  - We compute the pose of the person minus the average pose of every other person in the image.
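The pose difference is simple enough to show directly. A minimal sketch, assuming each person's pose is the integer component id (1 to 13) and that an image with a single person gets a difference of zero; the function name and the single-person convention are assumptions.

```python
from typing import List


def pose_difference(poses: List[int]) -> List[float]:
    """Each person's pose id minus the mean pose id of all other people."""
    n = len(poses)
    if n < 2:
        # Assumption: no other people to compare against, so use zero.
        return [0.0] * n
    total = sum(poses)
    return [p - (total - p) / (n - 1) for p in poses]


# Two people share pose component 7, a third has component 1.
poses = [7, 7, 1]
pose_diff = pose_difference(poses)
```

Here the third person stands out with a large negative difference, reflecting a head pose unlike that of the rest of the group.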

Approach(4)-Person Features
• Face Occlusion
DPM face scores
  - We use the scores for each of the 13 components in the face detection model of [28] as a feature.
Face detection success
  - This is a binary feature indicating whether the face detection API [22] we used was successful in detecting the face, or whether it required human annotation. The API achieved a nearly zero false positive rate on our dataset.

[28] X. Zhu and D. Ramanan. Face detection, pose estimation and landmark localization in the wild. In CVPR, 2012.
[22] SkyBiometry. https://www.skybiometry.com/.

Results
• Baselines
  - We compare our proposed approach to three natural baselines: center, scale, and sharpness.
  - We used the method of Harel et al. [10, 12] to produce saliency maps and computed the fraction of saliency intensities inside each face as a measure of its importance.
  - We measure inter-human agreement in a leave-one-human-out manner.

[10] J. Harel. A saliency implementation in matlab. http://www.klab.caltech.edu/~harel/share/gbvs.php.
[12] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. In Advances in NIPS, 2006.
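The saliency baseline can be sketched as follows, assuming a saliency map is already available as a 2D grid of non-negative intensities (the map here is made up, not produced by the method of [10, 12]) and that the fraction is taken over the whole map. The function name and the integer `(x, y, w, h)` box format are assumptions.

```python
from typing import List, Tuple


def saliency_importance(saliency: List[List[float]],
                        boxes: List[Tuple[int, int, int, int]]) -> List[float]:
    """Fraction of the map's total saliency that falls inside each face box."""
    total = sum(sum(row) for row in saliency)
    if total <= 0:
        return [0.0] * len(boxes)
    return [
        sum(saliency[r][c] for r in range(y, y + h) for c in range(x, x + w)) / total
        for (x, y, w, h) in boxes
    ]


# Hypothetical 4x4 saliency map with a bright 2x2 blob in the middle.
saliency_map = [
    [0, 0, 0, 0],
    [0, 4, 4, 0],
    [0, 4, 4, 0],
    [1, 1, 1, 1],
]
face_boxes = [(1, 1, 2, 2), (0, 3, 2, 1)]
saliency_scores = saliency_importance(saliency_map, face_boxes)
```

Ranking faces by these fractions gives the saliency-based ordering that the learned importance model is compared against.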

Results
• Metrics
  - We use mean squared error to measure the performance of our relative importance regressors.
  - In addition, we convert the regressor output into binary classification by thresholding against zero.
  - For each pair of faces $(p_i, p_j)$, we use a weighted classification accuracy measure, where the weight is the ground-truth importance score of the more important of the two, i.e. $\max\{s_i, s_j\}$.
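Both metrics can be sketched in a few lines. This is an illustration with made-up numbers; the function names are hypothetical, and treating a zero output or a zero ground-truth difference as the "not more important" class is an assumption, since the slides do not say how ties are handled.

```python
from typing import List, Tuple


def mean_squared_error(pred: List[float], target: List[float]) -> float:
    """Mean squared error between predicted and ground-truth score differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_accuracy(pred: List[float],
                      score_pairs: List[Tuple[float, float]]) -> float:
    """Classification accuracy after thresholding at zero, weighted by max(s_i, s_j)."""
    correct_weight = 0.0
    total_weight = 0.0
    for m, (s_i, s_j) in zip(pred, score_pairs):
        weight = max(s_i, s_j)
        # Threshold both the prediction and the ground-truth difference at zero.
        if (m > 0) == (s_i - s_j > 0):
            correct_weight += weight
        total_weight += weight
    return correct_weight / total_weight


# Hypothetical ground-truth scores for three pairs and model outputs for them.
score_pairs = [(0.9, 0.1), (0.4, 0.6), (0.7, 0.3)]
pred = [0.5, 0.1, 0.2]
target = [s_i - s_j for s_i, s_j in score_pairs]

mse = mean_squared_error(pred, target)
acc = weighted_accuracy(pred, score_pairs)
```

The weighting means that a mistake on a pair containing a clearly important person costs more than a mistake on a pair of two unimportant people.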

Results
• Image-Level Importance Results
Overall, we achieve an improvement of 3.17% over the best baseline (3.54% relative improvement). The mean squared error for our SVR is 0.1489.
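The absolute and relative improvements are linked through the accuracy of the best baseline, which this transcript does not state. As a rough consistency check, assuming the 3.17% is a difference in accuracy points (the values derived below are an inference from the two quoted percentages, not numbers taken from the slides):

```latex
\[
\text{relative improvement}
  = \frac{a_{\text{ours}} - a_{\text{base}}}{a_{\text{base}}}
\quad\Longrightarrow\quad
a_{\text{base}} \approx \frac{3.17}{0.0354} \approx 89.5\%,
\qquad
a_{\text{ours}} \approx 89.5\% + 3.17\% \approx 92.7\%.
\]
```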

Results
• Image-Level Importance Results
Table 4 shows a breakdown of the accuracies into the three categories of annotations.

Results
• Corpus-Level Importance Results
[Table: corpus-level results compared against the best baseline]

Results
• Corpus-Level Importance Results
Table 6 shows the category breakdown.

Results
• Image-Level and Corpus-Level
Fig. 4 shows some qualitative results for image experiments and corpus experiments.

Results
• Image-Level and Corpus-Level
Table 7 reports results from an ablation study, which shows the impact of the features on the final performance.

Importance vs Saliency
• We measured the correlation between importance and saliency rankings using Kendall’s Tau. The Kendall’s Tau was 0.5256. The most salient face was also the most important person in 52.56% of the cases.
• Fig. 5 shows qualitative examples of individuals who are judged by humans to be salient but not important, important but not salient, both salient and important, and neither.
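For readers unfamiliar with Kendall's Tau, here is a minimal sketch of the tau-a variant (no tie correction) applied to two rankings of the same faces. The rankings are made up, and which variant the authors used is not stated in the slides.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(a: Sequence[float], b: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks of four faces by importance and by saliency.
importance_rank = [1, 2, 3, 4]
saliency_rank = [1, 3, 2, 4]
tau = kendall_tau(importance_rank, saliency_rank)
```

A value of 1 means the two rankings agree on every pair of faces and -1 means they disagree on every pair, so a value around 0.5 indicates related but clearly different orderings.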

Application: Improving Im2Text

[Figure panels: Setup, Prediction, Results]

Conclusions
• We proposed the task of automatically predicting the importance of individuals in group photographs.
• Person importance is correlated with, but not identical to, visual saliency as studied in previous work.
• We showed that our method can successfully predict the importance of people from purely visual cues, and that incorporating predicted importance provides a significant improvement in im2text.

References
• Narrow depth of field: https://goo.gl/EfxN2Q
• Sobel filter: http://goo.gl/BmBCx9
