Omkar M Parkhi, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
A Compact and Discriminative Face Track Descriptor

Recognising and verifying faces in videos
[Figure: two tasks, Recognition and Verification (same / different)]

VF2: a new compact face track descriptor
▶ Discriminative
▶ Useful for different tasks (Recognition, Verification)
▶ Extremely compact
Face track: sequence of face detections in consecutive frames.
[Figure: face track → face track descriptor]

Large scale face retrieval
▶ Example of a typical target dataset
  ▶ 5 years of evening news programs
  ▶ 10,000 hrs of broadcast
  ▶ 20 Million frames
  ▶ 2.1 Million face tracks
▶ Real time performance
http://www.robots.ox.ac.uk/~vgg/research/on-the-fly/
▶ 30 frames per track on average
▶ Typical 4000D descriptor → 1 TB
▶ Our descriptor → 270 MB
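
The storage figures above can be checked with a little arithmetic. The sketch below assumes 4-byte floats, one 4000-D descriptor per frame for the typical pipeline, and 128 bytes per track for the proposed descriptor (1024 bits); these assumptions are not spelled out on this slide.

```python
# Rough storage estimate for the target dataset described on the slide.
# Assumptions (not stated on this slide): float32 values, one 4000-D
# descriptor per frame for the typical pipeline, 128 bytes per track for ours.
NUM_TRACKS = 2_100_000
FRAMES_PER_TRACK = 30
TYPICAL_DIM = 4000
BYTES_PER_FLOAT = 4
OURS_BYTES_PER_TRACK = 128


def typical_storage_bytes():
    """One real-valued descriptor per frame of every track."""
    return NUM_TRACKS * FRAMES_PER_TRACK * TYPICAL_DIM * BYTES_PER_FLOAT


def our_storage_bytes():
    """One compact descriptor per track."""
    return NUM_TRACKS * OURS_BYTES_PER_TRACK


print(f"typical: {typical_storage_bytes() / 1e12:.2f} TB")  # typical: 1.01 TB
print(f"ours: {our_storage_bytes() / 1e6:.0f} MB")          # ours: 269 MB
```

Under these assumptions the two totals land close to the 1 TB and 270 MB quoted on the slide.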

Outline
1. Dense feature computation
2. Fisher Vector encoding
3. Video and jittered pooling
4. Compression by metric learning
5. Binarisation
6. Results

1. Dense feature computation
▶ Input: a face track
  ▶ Aligned or unaligned
  ▶ No facial landmarks required (eyes, nose, etc.)
▶ Output: a set of local features
  ▶ Extracted from all frames
  ▶ Dense RootSIFT at multiple scales
  ▶ 64-D PCA
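
As a rough illustration of the last two bullets, the sketch below applies the RootSIFT transform (L1-normalise, then element-wise square root) and a 64-D PCA to a set of descriptors. Dense SIFT extraction itself is not shown; the random array is a toy stand-in, and the helper names are hypothetical.

```python
import numpy as np


def root_sift(descs, eps=1e-12):
    """L1-normalise each SIFT descriptor, then take the element-wise square root."""
    descs = np.asarray(descs, dtype=np.float64)
    descs = descs / (np.abs(descs).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(descs)


def fit_pca(X, dim=64):
    """Return the mean and the top-`dim` principal directions (rows are samples)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]


def apply_pca(X, mean, components):
    """Project centred samples onto the principal directions."""
    return (X - mean) @ components.T


# Toy stand-in for dense SIFT: 500 random non-negative 128-D descriptors.
rng = np.random.default_rng(0)
sift = rng.random((500, 128))
rs = root_sift(sift)
mean, comps = fit_pca(rs, dim=64)
reduced = apply_pca(rs, mean, comps)
print(reduced.shape)  # (500, 64)
```

After the RootSIFT step each descriptor has unit L2 norm, so Euclidean comparisons of RootSIFT vectors correspond to the Hellinger kernel on the original SIFT histograms.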
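
2. Fisher Vector encoding
[Perronnin et al. ECCV 2012]
[Figure: pipeline with labels Dense SIFT, GMM, Hard Assignment, FV encoding + sqrt-L2 normalisation, first and second order statistics; features $x_i$, Gaussians $(\mu_k, \Sigma_k)$]
$$\phi = \begin{bmatrix} v_1 \\ u_1 \\ v_2 \\ u_2 \\ \vdots \\ v_K \\ u_K \end{bmatrix}$$
$$v_k = \frac{1}{M\sqrt{\pi_k}} \sum_{i=1}^{M} \alpha_k(x_i)\, \frac{x_i - \mu_k}{\sigma_k}$$
$$u_k = \frac{1}{M\sqrt{2\pi_k}} \sum_{i=1}^{M} \alpha_k(x_i) \left[ \left( \frac{x_i - \mu_k}{\sigma_k} \right)^2 - 1 \right]$$
Here $\alpha_k(x_i)$ is the assignment of feature $x_i$ to Gaussian $k$, $\pi_k$ is the mixture weight, and $\sigma_k$ is the per-dimension standard deviation of Gaussian $k$.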
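
The encoding can be sketched in NumPy as follows. This is a minimal sketch assuming a diagonal-covariance GMM whose parameters are already given; the function name, argument names, and the `hard` flag are illustrative, and the GMM here is random rather than trained.

```python
import numpy as np


def fisher_vector(X, weights, means, sigmas, hard=False):
    """Fisher Vector of features X (M, D) under a diagonal GMM.

    weights: (K,) mixture weights; means, sigmas: (K, D).
    Returns a 2*K*D vector ordered v_1, u_1, ..., v_K, u_K.
    """
    M, D = X.shape
    # Normalised differences (x_i - mu_k) / sigma_k, shape (M, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # Log of the weighted Gaussian densities, shape (M, K).
    log_p = (np.log(weights)[None, :]
             - 0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             - 0.5 * D * np.log(2 * np.pi))
    # Soft assignments: posteriors over the K Gaussians (rows sum to 1).
    log_p -= log_p.max(axis=1, keepdims=True)
    alpha = np.exp(log_p)
    alpha /= alpha.sum(axis=1, keepdims=True)
    if hard:
        # Hard assignment: each feature votes only for its best Gaussian.
        winners = alpha.argmax(axis=1)
        alpha = np.zeros_like(alpha)
        alpha[np.arange(M), winners] = 1.0
    # First and second order statistics, each of shape (K, D).
    v = np.einsum('mk,mkd->kd', alpha, diff) / (M * np.sqrt(weights)[:, None])
    u = np.einsum('mk,mkd->kd', alpha, diff ** 2 - 1.0) / (M * np.sqrt(2 * weights)[:, None])
    phi = np.stack([v, u], axis=1).reshape(-1)
    # Signed square root followed by L2 normalisation.
    phi = np.sign(phi) * np.sqrt(np.abs(phi))
    norm = np.linalg.norm(phi)
    return phi / norm if norm > 0 else phi


# Toy example: 50 random 4-D features and a random 3-component GMM.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
K = 3
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, 4))
sigmas = np.ones((K, 4))
phi = fisher_vector(X, weights, means, sigmas)
phi_hard = fisher_vector(X, weights, means, sigmas, hard=True)
print(phi.shape)  # (24,)
```

With `hard=True` each feature contributes to a single Gaussian only, which is one plausible reading of the "Hard Assignment" label in the figure.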

2. Fisher Vector encoding
Gaussian components as part detectors
Spatial (x,y) augmentation:
$$\begin{bmatrix} \dfrac{x}{W} - \dfrac{1}{2} \\[4pt] \dfrac{y}{H} - \dfrac{1}{2} \end{bmatrix}$$
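
A minimal sketch of the augmentation, assuming that x and y are the pixel coordinates of each local feature and that W and H are the width and height of the face image (the slide does not define them explicitly). The function name is hypothetical.

```python
import numpy as np


def augment_with_location(feats, xs, ys, width, height):
    """Append (x/W - 1/2, y/H - 1/2) to each local feature (rows of feats)."""
    feats = np.asarray(feats, dtype=np.float64)
    loc = np.stack([np.asarray(xs) / width - 0.5,
                    np.asarray(ys) / height - 0.5], axis=1)
    return np.hstack([feats, loc])


# Two 64-D features: one at the image centre, one at (x=0, y=H).
feats = np.zeros((2, 64))
aug = augment_with_location(feats, xs=[80, 0], ys=[80, 160], width=160, height=160)
print(aug.shape)    # (2, 66)
print(aug[:, -2:])  # centre -> [0, 0]; (0, H) -> [-0.5, 0.5]
```

Because position becomes part of the feature, each Gaussian can specialise to an appearance at a location, which fits the "part detectors" description on the slide.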

3. Video and jittered pooling
▶ Typically each frame is pooled independently
▶ Complex inference procedures combining multiple descriptors
▶ Large memory footprint
[Sivic et al. CVPR 2009, Everingham et al. IVC 2009, Wolf et al. CVPR 2011]

3. Video and jittered pooling
▶ Single descriptor per track
▶ Smaller memory footprint
▶ Easy to use
▶ Improved performance
[Application to Action Recognition: Oneata, Verbeek, Schmid ICCV 2013]

3. Video and jittered pooling
▶ Data augmentation
▶ Data augmentation without training set increase
▶ Improvement in performance
[Paulin et al. CVPR 2014]
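
The two pooling ideas can be sketched together: local features from every frame of the track, and from jittered copies of each frame, are stacked and encoded once, so the track still yields a single descriptor. The jitter used here (a horizontal flip) is an assumption, and `toy_extract` and `toy_encode` are hypothetical stand-ins for dense feature extraction and Fisher Vector encoding.

```python
import numpy as np


def track_descriptor(frames, extract, encode, jitter=True):
    """Pool local features over all frames (and jittered copies), then encode once."""
    feats = []
    for frame in frames:
        feats.append(extract(frame))
        if jitter:
            # Assumed jitter: horizontal flip of the face image.
            feats.append(extract(frame[:, ::-1]))
    return encode(np.vstack(feats))


# Hypothetical stand-ins so the sketch runs: each image row acts as a
# "local feature" and the "encoder" is a plain mean, not a Fisher Vector.
def toy_extract(image):
    return np.asarray(image, dtype=np.float64)


def toy_encode(features):
    return features.mean(axis=0)


rng = np.random.default_rng(0)
track = [rng.random((8, 5)) for _ in range(30)]  # 30 frames of 8x5 "images"
desc = track_descriptor(track, toy_extract, toy_encode)
desc_plain = track_descriptor(track, toy_extract, toy_encode, jitter=False)
print(desc.shape)  # (5,)
```

The jittered copies only add features to the pool at encoding time, so the number of descriptors (and of training samples) stays the same.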

4. Metric Learning
Learn to discriminate faces
Learnt projection of the Fisher Vector: $z = W x$
$$d_W^2(x, y) = \|W x - W y\|^2$$
Same person $(x, y)$: $d_W^2(x, y) = \|W x - W y\|^2 < b$
Different people $(u, v)$: $d_W^2(u, v) = \|W u - W v\|^2 > b$
[Simonyan, Parkhi, Vedaldi, Zisserman BMVC 2013]
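
A minimal sketch of how the projected distance and threshold would be used at test time. Learning $W$ and $b$ is not shown: the matrix below is random, the threshold is an arbitrary illustrative value, and the dimensions and function names are assumptions.

```python
import numpy as np


def squared_distance(W, x, y):
    """d_W^2(x, y) = ||W x - W y||^2."""
    diff = W @ x - W @ y
    return float(diff @ diff)


def same_person(W, x, y, b):
    """Verification decision: distance below the threshold b means 'same'."""
    return squared_distance(W, x, y) < b


rng = np.random.default_rng(0)
# Random stand-in for the learnt projection (128 x 1000), NOT a trained W.
W = rng.normal(size=(128, 1000)) / np.sqrt(1000)
x = rng.normal(size=1000)
y = x + 0.01 * rng.normal(size=1000)  # near-duplicate of x
u = rng.normal(size=1000)             # unrelated vector
b = 10.0                              # hypothetical threshold

print(same_person(W, x, y, b), same_person(W, x, u, b))
```

Since only $z = W x$ is needed to compare two tracks, the projected vector can be stored in place of the full Fisher Vector, which is where the compression comes from.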

5. Binarisation
▶ Low-dimensional real-valued descriptor → high-dimensional binary
▶ 4x decrease in memory footprint (128D real → 1024D binary)
▶ Fast distance computation
▶ Alternative binarisation methods could be used
Parseval Tight Frame: the binary code is $\mathrm{sign}(U z)$, where $z$ is the $m$-dimensional real-valued descriptor and $U$ is a $q \times m$ matrix formed from columns of a random rotation matrix. The code uses $q$ bits only.
[Jégou et al. ICASSP 2012, Simonyan et al. PAMI 2014]
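
The binarisation step can be sketched as below. The matrix $U$ is built by taking the first $m$ columns of a random $q \times q$ orthogonal matrix obtained from a QR decomposition; this construction, the function names, and the use of packed bytes with Hamming distance are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np


def random_tight_frame(q, m, seed=0):
    """First m columns of a random q x q orthogonal matrix, so U.T @ U = I_m."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(q, q)))
    return Q[:, :m]


def binarise(U, z):
    """sign(U z), stored as packed bits (q / 8 bytes)."""
    bits = (U @ z > 0).astype(np.uint8)
    return np.packbits(bits)


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


U = random_tight_frame(q=1024, m=128)
rng = np.random.default_rng(1)
z1 = rng.normal(size=128)
z2 = z1 + 0.1 * rng.normal(size=128)  # small perturbation of z1
z3 = rng.normal(size=128)             # unrelated descriptor
c1, c2, c3 = binarise(U, z1), binarise(U, z2), binarise(U, z3)
print(c1.nbytes)  # 128
print(hamming(c1, c2), hamming(c1, c3))
```

A 128-D float32 descriptor takes 512 bytes, while the 1024-bit code takes 128 bytes, which matches the 4x figure in the bullet above.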

Face Verification
YouTube Faces Dataset
▶ Face verification in videos
▶ 3,425 videos of 1,595 celebrities
▶ Videos collected from the internet
▶ Wide pose, expression and illumination variation
▶ 10 splits of 600 pairs of videos
▶ Restricted setting: use provided pairs
▶ Unrestricted setting: free to form own pairs
[Figure: example video pairs labelled same / different]
[Wolf, Hassner, Maoz CVPR 2011]

Face Verification
YouTube Faces Dataset
[Bar chart: Error (axis 0 to 18) for six settings: Image Pool (soft assignment FV); Video Pool (soft assignment FV); Video Pool (hard assignment FV); Video Pool + Jittered Pool; Video Pool + Binar. 1024 bit + jitt.; Video Pool + Joint sim. + jitt. Bar values: 12.3, 13.4, 14.2, 16.2, 15, 17.3; the pairing of values with settings is not preserved.]

Face Verification
YouTube Faces Dataset
[Bar chart: Error, axis 0 to 22]
MBGS & SVM-: 21.2
APEM FUSION: 21.4
STFRD & PMML: 19.9
VSOF & OSS (Adaboost): 20
DDML (Combined): 18.5
VF 1024D (binary): 13.4
VF 256D: 12.3
Deep Face (facebook.com): 8.6
Footnote (marks two entries): Requires additional training data.

Weakly supervised face classification
Oxford Buffy Dataset
▶ “Buffy The Vampire Slayer”
▶ Face tracks from 7 episodes of season 5
▶ Both frontal and profile detections
▶ Weak supervision from transcript and subtitles
▶ Multi-class classification for every episode
[Everingham et al. IVC 2009, Sivic et al. CVPR 2009]

Weakly supervised classification
Oxford Buffy Dataset
[Bar chart: Avg. AP (axis 0.79 to 0.86) for five methods: Sivic et al. (HOG RBF MKL); VF (GMMs trained on Buffy); VF (GMMs trained on YTF); VF (GMMs trained on YTF) + Jitt. Pool 1024D; VF (GMMs trained on YTF, 2048b). Bar values: 0.82, 0.86, 0.8, 0.81, 0.812; the pairing of values with methods is not preserved.]

Recap
Very simple yet powerful face track descriptor
▶ Track descriptor in 128 bytes
▶ Face landmarks and alignment not required
▶ One descriptor per track
▶ State of the art/comparable results on multiple tasks
  ▶ YouTube Faces Dataset
  ▶ Oxford Buffy Dataset
▶ Can be trained with a very small amount of data
▶ Extremely easy to compute
▶ Code online soon.
Questions?