Embeddings and Uncertainty · 2019-10-07 · Embeddings and Uncertainty Oct. 4, 2019 Andrew...

Post on 15-Jul-2020

2 views 0 download

Transcript of Embeddings and Uncertainty · 2019-10-07 · Embeddings and Uncertainty Oct. 4, 2019 Andrew...

Embeddings and UncertaintyOct. 4, 2019

Andrew Gallagher agallagher@google.com

Seong Joon Oh, Joseph Roth, Jiyan Pan, Florian Schroff, Kevin Murphy, Mike Mozer, Caroline Pantofaru, Michael Nechyba, Cusuh Ham, Zhou Zhou, Abinash Behera

Andrew Gallagher

longing for certainty ... is in every human mind. But certainty generally is illusion.

Oliver Wendell Holmes

Bertrand Russell

Andrew Gallagher

Would you recognize this person if you needed to?

Andrew Gallagher

What about one of these?

Bulthoff and Edelman, Psychophysical support for a two-dimensional view interpolation theory of object recognition, 1992. (link)

Andrew Gallagher

How about this?

Bulthoff, Object Recognition in Man and Machine (link)

Andrew Gallagher

How about this?

Bulthoff, Object Recognition in Man and Machine (link)

Andrew Gallagher

Introduction

Embeddings

Uncertainty Representations (2)

Discussion

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Introduction

Embeddings

Uncertainty Representations (2)

Discussion

Andrew Gallagher WNYIPW2019 Google

Anchor

Positive

Negative

Anchor

Positive

NegativeLearn

Margin

FaceNet

slide content: F. Schroff and D. Kalenichenko

Encourages a margin between the ||anchor - positive|| and ||anchor - negative||.

F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering https://arxiv.org/abs/1503.03832

Anchor

Positive

Negative

Anchor

Positive

NegativeLearn

Margin

FaceNet

slide content: F. Schroff and D. Kalenichenko

Training

Training

?

Testing

Testing

Andrew Gallagher

Introduction

Embeddings

Uncertainty Representations (1)

Discussion

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Are two images the same class or not?

We typically employ:

Seong Joon Oh, Kevin Murphy, Jiyan Pan, Joseph Roth, Florian Schroff, Andrew Gallagher, Modeling Uncertainty with Hedged Instance Embedding, https://arxiv.org/abs/1810.00319

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

class: ?

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

class: 47?

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

class: ?

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

class: 17?

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

class: ?

?

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

class: ?

?

Andrew Gallagher

Commonly done: Point embedding.

true class: 17

true class: 47

class: ?

?

Andrew Gallagher

Hedged Instance Embeddings

true class: 17

true class: 47

class: ?

Andrew Gallagher

true class: 17

true class: 47

class: ?

Andrew Gallagher

Metric embedding as distributions.Formulate as discriminative task: given pair, predict match.

Introduce embedding as latent variable:

Embedding distributions must be:parameterizablesampleable

Andrew Gallagher

Match probability estimation.

Andrew Gallagher

Computing

Andrew Gallagher

Computing

Andrew Gallagher

Computing

Andrew Gallagher

Computing

Andrew Gallagher

sigmoid matchlikelihood

Computing

Andrew Gallagher

Computing

sigmoid matchlikelihood

Scratch Demo

Andrew Gallagher

End-to-end differentiable with reparametrization trick.

Shared params

Andrew Gallagher

MoG embedding with multiple samples case.

Shared params

Andrew Gallagher

Training objective - Variational Information Bottleneck

Derived from the Variational Information Bottleneck [from Bayesflow team]:

Log likelihood term: Binary cross-entropy loss for match prediction.

KL term: Controls the compression level for z. Unit Gaussian.

Andrew Gallagher

We desire a measure that, given input , one can guess its performance on downstream tasks (e.g., verification, recognition).

Uncertainty measure: Self-mismatch.

Andrew Gallagher

uncertain certainsomewhat uncertain

MoG-1 MoG-1MoG-2

Uncertainty measure: Self-mismatch.We desire a measure that, given input , one can guess its performance on downstream tasks (e.g., verification, recognition).

Andrew Gallagher

Dataset and Experiments

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

N-digit MNIST Dataset [open source!]

Andrew Gallagher

Experimental setup.

Andrew Gallagher

Clean images Corrupt images

2 Digit MNIST → 2 dimension Hedged Instance Embeddings (MoG-1)

Andrew Gallagher

Clean images Corrupt images

3 Digit MNIST → 3 dimension Hedged Instance Embeddings (MoG-1)

Andrew Gallagher

2 Digit → 2 Dim

Hedged Instance Embeddings (MoG-1)

Andrew Gallagher

Task Performance

Andrew Gallagher

Task Performance

Andrew Gallagher

Most certain Least certain

Andrew Gallagher

Uncertainty Measure

Andrew Gallagher

PetNet with Uncertainty

20-dim embedding0.25 MobileNet

High Certainty Samples Low Certainty Samples

Andrew Gallagher

Introduction

Embeddings

Uncertainty Representations (2)

Discussion

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Deep Convolutional Neural Network Features and the Original Image

Andrew Gallagher

Why is L2-Norm Useful?

Andrew Gallagher

Probabilistic Face Embeddings, Shi & Jain, 2019, https://arxiv.org/abs/1904.09658.

Fashion MNIST

MNIST

Datasets

Architecture50

D

10normalization

D=3, normalized, cross-entropy loss

D=4, normalized, triplet loss

Andrew Gallagher

Staying close to home ...

Andrew GallagherFa

ceN

et e

mbe

ddin

g

penu

ltim

ate

embe

ddin

g

feature extraction

norm

aliz

e

L2

sigmoid embedding confidence

Embedding

Andrew GallagherFa

ceN

et e

mbe

ddin

g

penu

ltim

ate

embe

ddin

g

Embedding

feature extraction

norm

aliz

e

L2

sigmoid embedding confidence

Distilled Embedding Confidence

distilled embedding confidence

Andrew Gallagher

Recognizable Not Recognizable0.123 0.0860.1640.2700.3200.3980.6180.630 0.4570.8650.935

Andrew Gallagher

Introduction

Embeddings

Uncertainty Representations (2)

Discussion

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Real-world applications

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Andrew Gallagher

A boring image.

Andrew Gallagher

A corrupted image.

Andrew Gallagher

ApplicationsGallery Selection: Given a potentially large collection of images or a video. Select a subset of images/frames that will most reliably identify the individual.

Computational Efficiency: Prevent running the expensive models on images with high uncertainty.

Tracking: Use as the confidence score.

Offline Face Clustering: Confidence weighted cluster similarity instead of top-N or other heuristic based approaches.

Andrew Gallagher

Open Questions

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Open Questions

How best to quantify and measure uncertainty?

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Open Questions

Tell us when you don’t know.

But only when absolutely necessary.

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Open Questions

How best to compute, index, and match uncertain embeddings (at scale)?

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Open Questions

What network architectures encourage learning uncertainty?

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Open Questions

Open world (epistemic) uncertainty.

Andrew Gallagher WNYIPW2019 Google

Andrew Gallagher

Thank you

Andrew Gallagher WNYIPW2019 Google