Learning visual representations for unfamiliar environments


Page 1

Learning visual representations for unfamiliar environments
Kate Saenko, Brian Kulis, Trevor Darrell
UC Berkeley EECS & ICSI

Page 2

The challenge of large-scale visual interaction

The last decade has proven the superiority of models learned from data over hand-engineered structures!

Page 3

Large-scale learning

• “Unsupervised”: learn models from “found data”; often exploit multiple modalities (text + image)

Example of found data, a handbag product description: “… The Tote is the perfect example of two handbag design principles that ... The lines of this tote are incredibly sleek, but ... The semi buckles that form the handle attachments are ...”

Sources: Wikipedia, Flickr, Google

Page 4

E.g., finding visual senses

Artifact sense: “telephone”

DICTIONARY:

1: (n) telephone, phone, telephone set (electronic equipment that converts sound into electrical signals that can be transmitted over distances and then converts received signals back into sounds)

2: (n) telephone, telephony (transmitting speech at a distance)

[Saenko and Darrell ’09]

Page 5

Large-scale Learning

• “Unsupervised”: learn models from “found data”; often exploit multiple modalities (text + image)

• Supervised: crowdsource labels (e.g., ImageNet)

Page 6

Yet…

• Even the best collection of images from the web and strong machine learning methods can often yield poor classifiers on in-situ data!

• Supervised learning assumption: training distribution == test distribution

• Unsupervised learning assumption: the joint distribution is stationary w.r.t. the online world and the real world

Almost never true!

Page 7

“What You Saw Is Not What You Get”

The models fail due to domain shift: accuracy drops from SVM 54% and NBNN 61% to SVM 20% and NBNN 19% under the shift.

Page 8

Examples of visual domain shifts

• close-up vs. far-away
• amazon.com vs. consumer images (Flickr) vs. CCTV
• digital SLR vs. webcam

Page 9

Examples of domain shift: change in camera, feature type, dimension

digital SLR vs. webcam: SURF features vector-quantized to a 300-word vocabulary vs. SIFT features vector-quantized to a 1000-word vocabulary, i.e., different dimensions

Page 10

Solutions?

• Do nothing (poor performance)
• Collect all types of data (impossible)
• Find out what changed (impractical)
• Learn what changed

Page 11

Prior Work on Domain Adaptation

• Pre-process the data [Daumé ’07]: replicate features to create shared, source-specific, and target-specific versions; re-train the learner on the new features

• SVM-based methods [Yang ’07], [Jiang ’08], [Duan ’09], [Duan ’10]: adapt SVM parameters

• Kernel mean matching [Gretton ’09]: re-weight the training data to match the test data distribution

Page 12

Our paradigm: Transform-based Domain Adaptation

Previous methods’ drawbacks:
• cannot transfer the learned shift to new categories
• cannot handle new features

We can do both by learning domain transformations*

Example: “green” and “blue” domains related by a transformation W

* Saenko, Kulis, Fritz, and Darrell. Adapting visual category models to new domains. ECCV, 2010

Page 13

Limitations of symmetric transforms

The symmetric assumption fails! Saenko et al. (ECCV 2010) used metric learning:
• symmetric transforms
• same features

How do we learn more general shifts?

Page 14

Asymmetric transform (rotation)

Latest approach*: asymmetric transforms
• the metric learning model is no longer applicable
• we propose to learn asymmetric transforms
  – map from target to source
  – handle different dimensions

* Kulis, Saenko, and Darrell. What You Saw Is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms. CVPR, 2011


Page 16

Model Details

• Learn a linear transformation to map points from one domain to another
  – call this transformation W
  – form matrices of the source and target points (notation sketched below)
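
A minimal notation sketch (the matrix names X and Y and the dimension symbols are assumptions for illustration, not fixed by the slides): stack the source points as columns of X and the target points as columns of Y; W then maps a target point into the source feature space.

\[
X = [\,\mathbf{x}_1 \cdots \mathbf{x}_{n_S}\,] \in \mathbb{R}^{d_S \times n_S},
\qquad
Y = [\,\mathbf{y}_1 \cdots \mathbf{y}_{n_T}\,] \in \mathbb{R}^{d_T \times n_T},
\]
\[
W \in \mathbb{R}^{d_S \times d_T},
\qquad
W\mathbf{y} \in \mathbb{R}^{d_S} \ \text{(a target point mapped into the source space)}.
\]

Because W is $d_S \times d_T$ rather than square, the two domains are free to have different feature types and dimensions.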

Page 17

Loss Functions

Choose a point x from the source and y from the target, and consider the inner product x^T W y.

It should be “large” for similar objects and “small” for dissimilar objects.
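
Written out, with u and ℓ as assumed names for the similarity thresholds:

\[
\mathrm{sim}_W(\mathbf{x}, \mathbf{y}) = \mathbf{x}^\top W \mathbf{y},
\]
\[
\mathbf{x}_i^\top W \mathbf{y}_j \ge u \ \text{for similar pairs},
\qquad
\mathbf{x}_i^\top W \mathbf{y}_j \le \ell \ \text{for dissimilar pairs},
\]
which can be enforced softly with hinge penalties such as $\max(0,\, u - \mathbf{x}_i^\top W \mathbf{y}_j)$ and $\max(0,\, \mathbf{x}_i^\top W \mathbf{y}_j - \ell)$.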

Page 18

Loss Functions

• The input to the problem includes a collection of m loss functions

• General assumption: the loss functions depend on the data only through the inner product matrix X^T W Y

Page 19

Regularized Objective Function

• Minimize a linear combination of the sum of the loss functions and a regularizer (written out below)

• We use the squared Frobenius norm as the regularizer, though we are not restricted to this choice
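
Putting the pieces together, the objective has the following shape, with $\lambda$ as an assumed name for the tradeoff weight; note that each loss touches the data only through the inner product matrix, as assumed on the previous slide:

\[
\min_{W} \ \|W\|_F^2 \ + \ \lambda \sum_{i=1}^{m} \ell_i\!\left(X^\top W Y\right).
\]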

Page 20

The Model Has Drawbacks

• A linear transformation may be insufficient

• Cost of optimization grows as the product of the dimensionalities of the source and target data

• What to do?

Page 21

Kernelization

• Main idea: run in kernel space
  – use a non-linear kernel function (e.g., an RBF kernel) to learn non-linear transformations in the input space
  – the resulting optimization is independent of the input dimensionality
  – additional assumption necessary: the regularizer is a spectral function

Page 22

Kernelization

The derivation proceeds in four steps (schematic below):
1. the original transformation learning problem
2. kernel matrices for the source and target
3. the new kernel problem
4. the relationship between the original and new problems at optimality
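
A schematic of the construction, under the spectral-regularizer assumption and with A as an assumed name for the coefficient matrix: for $r(W) = \|W\|_F^2$ the optimal transformation can be expressed in terms of the data matrices, so both learning and evaluation need only kernel matrices.

\[
K_X = X^\top X, \qquad K_Y = Y^\top Y, \qquad W^{*} = X A Y^\top,
\]
\[
\mathrm{sim}_{W^{*}}(\mathbf{x}, \mathbf{y}) = (X^\top \mathbf{x})^\top A \,(Y^\top \mathbf{y}),
\]
so the optimization runs over A, its size is independent of the input dimensionalities, and replacing the inner products with a non-linear kernel (e.g., RBF) yields non-linear transformations in the input space.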

Page 23

Summary of approach

Train time:
1. Multi-domain data
2. Generate constraints, learn W

Test time:
3. Map test points via W
4. Apply to new categories: label a target test point by its nearest neighbors among the source points (see the sketch below)
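
To make the pipeline concrete, here is a minimal sketch in Python/NumPy, assuming hinge-style losses on cross-domain similarity constraints, a squared Frobenius regularizer, and plain gradient descent; all names (learn_W, classify_target, the thresholds u and ell) are illustrative, not the authors' code.

import numpy as np

def learn_W(X_src, Y_tgt, constraints, lam=0.1, lr=1e-3, iters=500,
            u=1.0, ell=-1.0):
    """Learn an asymmetric map W so that x^T W y is large (>= u) for
    similar cross-domain pairs and small (<= ell) for dissimilar ones.

    X_src: (d_src, n_src) source points as columns
    Y_tgt: (d_tgt, n_tgt) target points as columns
    constraints: iterable of (i, j, similar) cross-domain constraints
    """
    d_src, d_tgt = X_src.shape[0], Y_tgt.shape[0]
    W = np.zeros((d_src, d_tgt))
    for _ in range(iters):
        grad = 2.0 * lam * W                    # gradient of lam * ||W||_F^2
        for i, j, similar in constraints:
            x, y = X_src[:, i], Y_tgt[:, j]
            s = x @ W @ y                       # current similarity x^T W y
            if similar and s < u:
                grad -= np.outer(x, y)          # push similarity up toward u
            elif not similar and s > ell:
                grad += np.outer(x, y)          # push similarity down toward ell
        W -= lr * grad
    return W

def classify_target(W, X_src, src_labels, y_query):
    """Label a target test point by its most similar source point under W.
    Because W encodes the domain shift rather than the categories, this
    also works for categories unseen when W was learned."""
    scores = X_src.T @ (W @ y_query)            # similarity to every source point
    return src_labels[int(np.argmax(scores))]

Note that W, not a classifier, is what gets transferred: at test time the same W is applied to points from categories that contributed no constraints at train time.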

Page 24

Multi-domain dataset

Page 25

Experimental Setup

• Utilized a standard bag-of-words model (see the sketch below)
• Also utilized different features in the target domain:
  – SURF vs. SIFT
  – different visual word dictionaries
• Baseline for comparing such data: KCCA

Page 26

Novel-class experiments

• Test the method’s ability to transfer the domain shift to unseen classes

• Train the transform on half of the classes, test on the other half

[Results chart; legend: Our Method (linear), Our Method]

Page 27

Extreme shift example

[Figure: a query from the target, with its nearest neighbors in the source found using the learned transformation vs. using KCCA + kNN]

Page 28

Conclusion

• We should not rely on hand-engineered features any more than we rely on hand-engineered models!

• Learn feature transformations across domains

• Developed a domain adaptation method based on regularized non-linear transforms
  – the asymmetric transform achieves the best results on the more extreme shifts
  – Saenko et al., ECCV 2010, and Kulis et al., CVPR 2011; journal version forthcoming