Transfer Learning Algorithms for Image...
Transcript of Transfer Learning Algorithms for Image...
![Page 1: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/1.jpg)
1
Transfer Learning Algorithms for Image Classification
Ariadna QuattoniMIT, CSAIL
Advisors:Michael CollinsTrevor Darrell
![Page 2: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/2.jpg)
2 / 46
Motivation
We want to be able to build classifiers for thousands of visual categories.
We want to exploit rich and complex feature representations.
Problem:
Goal:
We might only have a few labeled samples per category.
Solution:
Transfer Learning, leverage labeled data from multiple related tasks.
![Page 3: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/3.jpg)
3 / 46
Thesis Contributions
Learn an image representation using supervised data from auxiliary tasks automatically derived from unlabeled images + meta-data.
A transfer learning model based on joint regularization and an efficient optimization algorithm for training jointly sparse classifiers in high dimensional feature spaces.
We study efficient transfer algorithms for image classification whichcan exploit supervised training data from a set of related tasks:
![Page 4: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/4.jpg)
4 / 46
A method for learning image representations from unlabeled images + meta-data
Large dataset of unlabeled images + meta-data
Create auxiliaryproblems
Structure Learning[Ando & Zhang, JMLR 2005] h:F I R→
Visual Representation
)()( xvxf ttkk θ=
Structure Learning:
Task specific parameters Shared Parameters
![Page 5: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/5.jpg)
5 / 46
A method for learning image representations from unlabeled images + meta-data
4 8 16 32 64 128 3560.5
0.55
0.6
0.65
0.7
0.75
# training samples
Average AUCReuters Dataset
Structure LearningBaseline ModelPCA Model
![Page 6: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/6.jpg)
6 / 46
Outline
An overview of transfer learning methods.
A joint sparse approximation model for transfer learning.
Asymmetric transfer experiments.
An efficient training algorithm.
Symmetric transfer image annotation experiments.
![Page 7: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/7.jpg)
7 / 46
Transfer Learning: A brief overview
The goal of transfer learning is to use labeled data from relatedtasks to make learning easier. Two settings:
Asymmetric transfer: Resource: Large amounts of supervised data for a set of related tasks. Goal: Improve performance on a target task for which training data is scarce.
Symmetric transfer:Resource: Small amount of training data for a large number of related tasks. Goal: Improve average performance over all classifiers.
![Page 8: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/8.jpg)
8 / 46
Transfer Learning: A brief overview
Three main approaches:
Learning intermediate latent representations:[Thrun
1996, Baxter 1997, Caruana
1997, Argyriou
2006, Amit
2007]
Learning priors over parameters: [Raina 2006, Lawrence et al. 2004 ]
Learning relevant shared features [Torralba 2004, Obozinsky 2006]
![Page 9: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/9.jpg)
9 / 46
Feature Sharing Framework:
Work with a rich representation:Complex features, high dimensional spaceSome of them will be very discriminative (hopefully)Most will be irrelevant
Related problems may share relevant features.
If we knew the relevant features we could:Learn from fewer examplesBuild more efficient classifiers
We can train classifiers from related problems together using a regularization penalty designed to promote joint sparsity.
![Page 10: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/10.jpg)
10 / 46
Church Airport Grocery Store Flower-Shop
2,1w
2,5w
11w 11w
2,4w
2,3w
2,2w
3,1w
3,5w
3,4w
3,3w
3,2w
4,1w
4,5w
4,4w
4,3w
4,2w
1,1w
1,5w
1,4w
1,3w
1,2w
![Page 11: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/11.jpg)
11 / 46
Church Airport Grocery Store Flower-Shop
2,1w
2,5w
11w 11w
2,4w
2,3w
2,2w
−
3,5w
3,4w
+
3,2w
4,1w
4,5w
4,4w
4,3w
4,2w
1,1w
1,5w
1,4w
1,3w
1,2w
![Page 12: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/12.jpg)
12 / 46
Church Airport Grocery Store Flower-Shop
+
−
11w 11w
2,4w
2,3w
2,2w
−
3,5w
3,4w
+
3,2w
−
+
4,4w
−
4,2w
+
+
1,4w
−
1,2w
![Page 13: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/13.jpg)
13 / 46
Related Formulations of Joint Sparse Approximation
Obozinski et al. [2006] proposed L1-2 joint penalty and developed a blockwise boosting scheme based on Boosted-Lasso.
Torralba et al. [2004] developed a joint boosting algorithm based on theidea of learning additive models for each class that share weak learners.
![Page 14: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/14.jpg)
14 / 46
Our Contribution
Previous approaches to joint sparse approximation have relied on greedy coordinate descent methods.
We propose a simple an efficient global optimization algorithm with guaranteed convergence rates.
A new model and optimization algorithm for training jointly sparse classifiers:
We test our model on real image classification tasks where we observe improvements in both asymmetric and symmetric transfer settings.
Our algorithm can scale to large problems involving hundreds of problems and thousands of examples and features.
![Page 15: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/15.jpg)
15 / 46
Outline
An overview of transfer learning methods.
A joint sparse approximation model for transfer learning.
Asymmetric transfer experiments.
An efficient training algorithm.
Symmetric transfer image annotation experiments.
![Page 16: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/16.jpg)
16 / 46
Notation
Collection
of Tasks
},...,,{ 21 mDDD=D},...,,{ 21 mDDD=D
Joint SparseApproximation
1D2D mD
)},(),....,,{( 11kn
kn
kkk kk
yxyxD =
dℜ∈x }1,1{ −+∈y
mddd
m
m
www
wwwwww
,2,1,
,22,21,2
,12,11,1
L
MOOM
L
L
W
![Page 17: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/17.jpg)
17 / 46
Single Task Sparse Approximation
xwxf ⋅=)(
∑ ∑∈ =
+Dyx
d
jjwQyxfl
),( 1||)),((minarg
w
Consider learning a single sparse linear classifier of the form:
We want a few features with non-zero coefficients
Recent work suggests to use L1 regularization:
Classificationerror
L1
penalizesnon-sparse solutions
Donoho [2004] proved (in a regression setting) that the solution with smallest L1 norm is also the sparsest solution.
![Page 18: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/18.jpg)
18 / 46
Joint Sparse Approximation
∑ ∑= ∈
+m
kmk
Dyxk
QyxflD
k121
),(,...,, ),....,,R()),((
||1minarg www
m21 www
Setting : Joint Sparse Approximation
Average Losson Collection D
Penalizes solutions that
utilize too many features
xxf kk ⋅= w)(
![Page 19: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/19.jpg)
19 / 46
Joint Regularization Penalty
rowszerononW −−=#)R(
mddd
m
m
www
wwwwww
,2,1,
,22,21,2
,12,11,1
L
MOOM
L
L
How do we penalize solutions that use too many features?
Coefficients forfor feature 2
Coefficients forclassifier 2
Would lead to a hard combinatorial problem .
![Page 20: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/20.jpg)
20 / 46
Joint Regularization Penalty
We will use a L1-∞ norm [Tropp 2006]
∑=
=d
iikk
WW1
|)(|max)R(
The combination of the two norms results in a solution where only a few features are used but the features used will contribute in solving many classification problems.
This norm combines:
An L1
norm on the maximum absolute values of the coefficients across tasks promotes sparsity.
Use few features
The L∞
norm on each row promotes non-
sparsity
on each row.Share features
![Page 21: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/21.jpg)
21 / 46
Joint Sparse Approximation
∑ ∑∑= =∈
+m
k
d
iikkk
Dyxk
WQyxflD
k1 1),(
|)(|max)),((||
1minW
Using the L1-∞ norm we can rewrite our objective function as:
For the hinge loss:the optimization problem can be expressed as a linear program.
))(1,0max()),(( xyfyxfl −=
For any convex loss this is a convex objective.
![Page 22: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/22.jpg)
22 / 46
Joint Sparse Approximation
Objective:
∑ ∑ ∑= = =
+m
k
D
j
d
ii
kj
k
k
tQD1
||
1 1],,[ ||
1min εtεW
Linear program formulation (hinge loss):
Max value constraints:
mkfor :1: =
difor :1: =
0≥kjε
iiki twt ≤≤−mkfor :1: =
|:|1: kDjfor =
kj
kjk
kj xfy ε−≥1)(
and
Slack variables constraints:and
![Page 23: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/23.jpg)
23 / 46
Outline
An overview of transfer learning methods.
A joint sparse approximation model for transfer learning.
Asymmetric transfer experiments.
An efficient training algorithm.
Symmetric transfer image annotation experiments.
![Page 24: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/24.jpg)
24 / 46
Setting: Asymmetric TransferSuperBowl Danish CartoonsSharon
Australian
Open Trapped Miners Golden globes
Grammys Figure Skating
Academy Awards
Iraq
Learn a representation using labeled data from 9 topics.
Train a classifier for the 10th
held out topic using the relevantfeatures R
only.
}0|)(|max:{ >= rkk wrRDefine the set of relevant features to be:
Learn the matrix W using our transfer algorithm.
![Page 25: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/25.jpg)
25 / 46
4 20 40 60 80 100 120 1400.52
0.54
0.56
0.58
0.6
0.62
0.64
0.66
0.68
0.7
0.72
Average AUC
# training samples
Asymmetric Transfer
Baseline RepresentationTransfered Representation
Results
![Page 26: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/26.jpg)
26 / 46
Outline
An overview of transfer learning methods.
A joint sparse approximation model for transfer learning.
Asymmetric transfer experiments.
An efficient training algorithm.
Symmetric transfer image annotation experiments.
![Page 27: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/27.jpg)
27 / 46
The LP formulation is feasible for small problems but becomes intractable for larger data-sets with thousands of examples and dimensions.
We might want a more general optimization algorithm that can handle arbitrary convex losses.
Limitations of the LP formulation
The LP formulation can be optimized using standard LP solvers.
![Page 28: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/28.jpg)
28 / 46
L1-∞
Regularization: Constrained Convex Optimization Formulation
∑ ∑= ∈
m
kk
Dyxk
yxflD
k1 ),()),((
||1minarg W
∑=
≤d
iikk
CWts1
|)(|max..
We will use a Projected SubGradient method.Main advantages: simple, scalable, guaranteed convergence rates.
A convex function
Convex constraints
Projected SubGradient methods have been recently proposed:L2 regularization, i.e. SVM [Shalev-Shwartz et al. 2007] L1 regularization [Duchi et al. 2008]
![Page 29: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/29.jpg)
29 / 46
Euclidean Projection into the L1-∞
ball
![Page 30: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/30.jpg)
30 / 46
Characterization of the solution
1
![Page 31: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/31.jpg)
31 / 46
Characterization of the solution
Feature I Feature IIIFeature II Feature VI
1μ
3μ2μ
4μ∑>
−=2,2
2,2μ
μθjA
jA
![Page 32: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/32.jpg)
32 / 46
Mapping to a simpler problemWe can map the projection problem to the following problem which
finds the optimal maximums μ:
![Page 33: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/33.jpg)
33 / 46
Efficient Algorithm for: , in pictures
1μ
2μ
4μ
4 Features, 6 problems, C=14 ∑=
=d
iikkA
1
29|)(|max
3μ
![Page 34: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/34.jpg)
34 / 46
Complexity
The total cost of the algorithm is dominated by a sort of
The total cost is in the order of:
the entries of A.
))log(( dmdmO
![Page 35: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/35.jpg)
35 / 46
Outline
An overview of transfer learning methods.
A joint sparse approximation model for transfer learning.
Asymmetric transfer experiments.
An efficient training algorithm.
Symmetric transfer image annotation experiments.
![Page 36: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/36.jpg)
36 / 46
Synthetic Experiments
Generate a jointly sparse parameter matrix W:
)( ki
tk
ki xwsigny =
For every task we generate pairs: ),( ki
ki yx
where
We compared three different types of regularization(i.e. projections):
L1−∞ projectionL2 projectionL1 projection
![Page 37: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/37.jpg)
37 / 46
Synthetic Experiments
10 20 40 80 160 320 64015
20
25
30
35
40
45
50
# training examples per task
Erro
r
Synthetic Experiments Results: 60 problems 200 features 10% relevant
L2L1L1-LINF
Test ErrorPerformance on predicting
relevant features
10 20 40 80 160 320 64010
20
30
40
50
60
70
80
90
100
# training examples per task
Feature Selection Performance
Precision L1-INFRecall L1Precision L1Recall L1-INF
![Page 38: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/38.jpg)
38 / 46
Dataset: Image Annotation
40 top content wordsRaw image representation: Vocabulary Tree
(Grauman
and Darrell 2005, Nister
and Stewenius
2006)
11000 dimensions
president actress team
![Page 39: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/39.jpg)
39 / 46
Results
The differences are statistically significant
![Page 40: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/40.jpg)
40 / 46
Results
![Page 41: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/41.jpg)
41 / 46
Dataset: Indoor Scene Recognition
67 indoor scenes.
Raw image representation: similarities to a set of unlabeled images.
2000 dimensions.
bakery bar Train station
![Page 42: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/42.jpg)
42 / 46
Results:
![Page 43: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/43.jpg)
43
![Page 44: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/44.jpg)
44 / 46
Summary of Thesis Contributions
A method that learns image representations using unlabeled images + meta-data.
A transfer model based on performing a joint loss minimization over the training sets of related tasks with a shared regularization.
Previous approaches used greedy coordinate descent methods. We propose an efficient global optimization algorithm for learning jointly sparsemodels.
We presented experiments on real image classification tasks for both an asymmetric and symmetric transfer setting.
A tool that makes implementing an L1−∞ penalty as easy and almost as efficient as implementing the standard L1 and L2 penalties.
![Page 45: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/45.jpg)
45 / 46
Future Work
Task Clustering.
Online Optimization.
Generalization properties of L1−∞ regularized models.
Combining feature representations.
![Page 46: Transfer Learning Algorithms for Image Classificationpeople.csail.mit.edu/ariadna/thesis_talk_8pm.pdfAriadna Quattoni MIT, CSAIL. Advisors: Michael Collins. Trevor Darrell. 2 / 46](https://reader033.fdocuments.in/reader033/viewer/2022053023/6056da009c79433e1328b172/html5/thumbnails/46.jpg)
46 / 46
Thanks!