A Survey on Distance Metric Learning (Part 2), Gerry Tesauro, IBM T.J. Watson Research Center
Date posted: 21-Dec-2015
A Survey on Distance Metric Learning (Part 2)
Gerry Tesauro
IBM T.J.Watson Research Center
Acknowledgement
• Lecture material shamelessly adapted from the following sources:
– Kilian Weinberger:
• “Survey on Distance Metric Learning” slides
• IBM summer intern talk slides (Aug. 2006)
– Sam Roweis slides (NIPS 2006 workshop on “Learning to Compare Examples”)
– Yann LeCun talk slides (CVPR 2005, 2006)
Outline – Part 2
Neighbourhood Components Analysis (Goldberger et al.)
Metric Learning by Collapsing Classes (Globerson & Roweis)
Metric Learning for Kernel Regression (Weinberger & Tesauro)
Metric learning for RL basis function construction (Keller et al.)
Similarity learning for image processing (LeCun et al.)
Neighbourhood Components Analysis
(Goldberger et al. 2004)
Distance metric for visualization and kNN
Metric Learning for Kernel Regression
Weinberger & Tesauro, AISTATS 2007
Killing three birds with one stone:
We construct a method for linear dimensionality reduction that generates a meaningful distance metric optimally tuned for distance-based kernel regression
Kernel Regression
• Given training set {(xj , yj), j=1,…,N} where x is a d-dim vector and y is real-valued, estimate value of a test point xi by weighted avg. of samples:
$$\hat{y}_i = \frac{\sum_{j \neq i} y_j k_{ij}}{\sum_{j \neq i} k_{ij}}$$
where kij = kD (xi, xj) is a distance-based kernel function using distance metric D
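The weighted average on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names and the toy data are made up, and `example_kernel` is just one possible distance-based kernel. Each sample is left out of its own estimate, matching the sum over j ≠ i.

```python
import math

def loo_kernel_regression(xs, ys, kernel):
    """Leave-one-out estimates: y_i is predicted from all other samples j != i."""
    estimates = []
    for i, xi in enumerate(xs):
        num = den = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            if j == i:
                continue  # leave sample i out of its own estimate
            k = kernel(xi, xj)
            num += yj * k
            den += k
        estimates.append(num / den)
    return estimates

def example_kernel(a, b):
    # Hypothetical kernel for illustration: exp(-squared Euclidean distance).
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)))

# Toy 1-dim data lying on the line y = x (made up for this sketch).
xs = [[0.0], [1.0], [2.0]]
ys = [0.0, 1.0, 2.0]
y_hat = loo_kernel_regression(xs, ys, example_kernel)
```

The middle point is estimated from two equally distant neighbours, so its estimate is their plain average; the end points are pulled towards their nearest neighbour.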
Choice of Kernel
• Many functional forms for kij can be used in MLKR; our empirical work uses the Gaussian kernel
$$k_{ij} = \exp(-D_{ij}^2 / \sigma^2)$$
where σ is a kernel width parameter (can set σ=1 W.L.O.G. since we learn D)
⇒ softmax regression estimate similar to Roweis’ softmax classifier
$$\hat{y}_i = \frac{\sum_{j \neq i} y_j \exp(-D_{ij}^2)}{\sum_{j \neq i} \exp(-D_{ij}^2)}$$
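With σ = 1 the estimate is a softmax over the negative squared distances. One practical point, sketched below under the assumption that distances can be large: exp(−D²) underflows to zero quickly, so it helps to shift all squared distances by their minimum before exponentiating. The shift cancels in the ratio. The function names and numbers are illustrative only; the caller is assumed to pass distances to the other samples (j ≠ i).

```python
import math

def softmax_weights(sq_dists):
    """Weights w_j = exp(-D_j^2) / sum_l exp(-D_l^2), computed stably."""
    m = min(sq_dists)  # shifting by the smallest squared distance cancels in the ratio
    exps = [math.exp(-(d - m)) for d in sq_dists]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_estimate(sq_dists, ys):
    """Softmax regression estimate: weighted average of the targets ys."""
    return sum(w * y for w, y in zip(softmax_weights(sq_dists), ys))

# Made-up squared distances from one test point to three training points.
# exp(-1000.0) underflows to 0.0, so the unshifted formula would divide by zero.
sq_dists = [1000.0, 1001.0, 1003.0]
ys = [1.0, 2.0, 4.0]
weights = softmax_weights(sq_dists)
y_hat = softmax_estimate(sq_dists, ys)
```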
Distance Metric for Nearest Neighbor Regression
Learn a linear transformation that allows the value of a test point to be estimated from its nearest neighbors
Mahalanobis Metric
Distance function is a pseudo-Mahalanobis metric (generalizes Euclidean distance)
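The formula on this slide did not survive in the transcript. The usual form, assumed here, is D²(xi, xj) = (xi − xj)ᵀ M (xi − xj) with M = AᵀA, which equals the squared Euclidean distance after mapping each point through A. The sketch below uses that assumed form; the matrices and helper names are made up for illustration.

```python
def mat_vec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def sq_distance(A, xi, xj):
    """Squared distance ||A (xi - xj)||^2 = (xi - xj)^T A^T A (xi - xj)."""
    diff = [p - q for p, q in zip(xi, xj)]
    return sum(z * z for z in mat_vec(A, diff))

identity = [[1.0, 0.0], [0.0, 1.0]]   # A = I recovers plain Euclidean distance
stretch = [[2.0, 0.0], [0.0, 0.0]]    # doubles coordinate 1, ignores coordinate 2

d_euclid = sq_distance(identity, [1.0, 2.0], [0.0, 0.0])
d_learned = sq_distance(stretch, [1.0, 2.0], [0.0, 0.0])
d_zero = sq_distance(stretch, [0.0, 5.0], [0.0, 0.0])
```

The last call shows why the metric is only a pseudo-metric: when M = AᵀA is rank-deficient, two distinct points can be at distance zero.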
General Metric Learning Objective
• Find parameterized distance function Dθ that minimizes total leave-one-out cross-validation loss function
$$L = \sum_i (\hat{y}_i - y_i)^2$$
– e.g. params θ = elements Aij of A matrix
• Since we’re solving for A, not M, the optimization is non-convex ⇒ use gradient descent
Gradient Computation
$$\frac{\partial L}{\partial A} = 4A \sum_i (\hat{y}_i - y_i) \sum_{j \neq i} (\hat{y}_i - y_j) \frac{k_{ij}}{\sum_{l \neq i} k_{il}} x_{ij} x_{ij}^T$$
where xij = xi – xj
For fast implementation:
– Don’t sum over all i-j pairs; only go up to ~1000 nearest neighbors for each sample i
– Maintain nearest neighbors in a heap-tree structure; update heap tree every 15 gradient steps
– Ignore sufficiently small values of kij ( < e^{-34} )
– Even better data structures: cover trees, k-d trees
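Differentiating the leave-one-out squared error with the Gaussian kernel (σ = 1) gives a gradient of the form 4A Σi (ŷi − yi) Σj (ŷi − yj) kij xij xijᵀ / Σj kij. The sketch below implements that form in plain Python and checks it against central finite differences. It is an illustration only: it sums over all pairs (none of the nearest-neighbour speedups), and the function names and toy data are made up.

```python
import math

def mlkr_loss_and_grad(A, X, y):
    """Leave-one-out squared error and its gradient w.r.t. A (Gaussian kernel, sigma = 1)."""
    n, d, r = len(X), len(X[0]), len(A)
    Z = [[sum(A[a][b] * x[b] for b in range(d)) for a in range(r)] for x in X]  # Z_i = A x_i
    # k_ij = exp(-||A x_ij||^2); the diagonal is zeroed so sample i never predicts itself
    K = [[0.0 if i == j else math.exp(-sum((p - q) ** 2 for p, q in zip(Z[i], Z[j])))
          for j in range(n)] for i in range(n)]
    y_hat = [sum(K[i][j] * y[j] for j in range(n)) / sum(K[i]) for i in range(n)]
    loss = sum((y_hat[i] - y[i]) ** 2 for i in range(n))
    grad = [[0.0] * d for _ in range(r)]
    for i in range(n):
        norm = sum(K[i])
        for j in range(n):
            w = 4.0 * (y_hat[i] - y[i]) * (y_hat[i] - y[j]) * K[i][j] / norm
            xij = [X[i][b] - X[j][b] for b in range(d)]
            zij = [Z[i][a] - Z[j][a] for a in range(r)]  # A x_ij
            for a in range(r):
                for b in range(d):
                    grad[a][b] += w * zij[a] * xij[b]    # entry (a, b) of A x_ij x_ij^T
    return loss, grad

def numerical_grad(A, X, y, eps=1e-6):
    """Central finite differences, used only to sanity-check the analytic gradient."""
    g = [[0.0] * len(A[0]) for _ in A]
    for a in range(len(A)):
        for b in range(len(A[0])):
            A[a][b] += eps
            up, _ = mlkr_loss_and_grad(A, X, y)
            A[a][b] -= 2 * eps
            down, _ = mlkr_loss_and_grad(A, X, y)
            A[a][b] += eps  # restore the original entry
            g[a][b] = (up - down) / (2 * eps)
    return g

# Made-up toy problem: four 2-dim points with real-valued targets.
X = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.5], [2.0, 1.0]]
y = [0.0, 1.0, 0.5, 2.0]
A = [[0.8, 0.1], [-0.2, 0.6]]
loss, grad = mlkr_loss_and_grad(A, X, y)
check = numerical_grad(A, X, y)
```

A plain gradient-descent step would then be A ← A − η · grad for some small step size η (a hypothetical parameter here); since the objective is non-convex in A, the result depends on the starting point.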
Learned Distance Metric example
[Figure: points within D < 1 under the original Euclidean metric vs. under the learned metric]
“Twin Peaks” test
n=8000
Training:
• we added 3 dimensions with 1000% noise
• we rotated 5 dimensions randomly
[Figure: input variance (noise vs. signal) and output variance (signal vs. noise) on test data]
DimReduction with MLKR
• FG-NET face data: 82 persons, 984 face images w/age
DimReduction with MLKR
Force A to be rectangular
Project onto eigenvectors of A
Allows visualization of data
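A rectangular A (r rows, d columns, r < d) makes the learned metric low-rank, so the same matrix doubles as a projection to r dimensions. The sketch below takes the simplest reading, projecting each input directly through A, which is an assumption about what the slide intends; the matrix and points are made up.

```python
def project(A, x):
    """Map a d-dim input x to r dims using a rectangular r x d matrix A."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def sq_distance(A, xi, xj):
    """Squared learned distance = squared Euclidean distance between projections."""
    zi, zj = project(A, xi), project(A, xj)
    return sum((p - q) ** 2 for p, q in zip(zi, zj))

# Made-up 2 x 4 matrix: keeps two directions of a 4-dim input and drops the rest.
A = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.5, 0.0]]
x1 = [1.0, 2.0, 4.0, 9.0]
x2 = [0.0, 2.0, 2.0, -3.0]
z1, z2 = project(A, x1), project(A, x2)  # 2-dim points, suitable for a scatter plot
d12 = sq_distance(A, x1, x2)
```

The fourth coordinate differs by 12 between the two points but has no effect on the distance, because no row of A reads it.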
PowerManagement data (d=21)
Robot arm results (8,32dim)
regression error
© 2006 IBM Corporation
IBM
Unity Data Center Prototype
Objective: Learn long-range resource value estimates for each application manager
State Variables (~48):
– Arrival rate
– ResponseTime
– QueueLength
– iatVariance
– rtVariance
Action: # of servers allocated by Arbiter
Reward: SLA(Resp. Time)
[Figure: Unity data center prototype. Labels: Resource Arbiter; Application Managers (Trade3, Batch; WebSphere 5.1, DB2); 8 xSeries servers; Value(#srvrs); Value(RT); Demand (HTTP req/sec); SLA; Maximize Total SLA Revenue; 5 sec]
(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)
Power & Performance Management
Objective: Managing systems to multi-discipline objectives: minimize Resp. Time and minimize Power Usage
State Variables (21):
– Power Cap
– Power Usage
– CPU Utilization
– Temperature
– # of requests arrived
– Workload intensity (# Clients)
– Response Time
Action: Power Cap
Reward: SLA(Resp. Time) – Power Usage
(Kephart et al., ICAC 2007)
IBM Regression Results: TEST ERROR
[Figure: test-error chart including MLKR; surviving labels: 14/47, 10/223/5]
Metric Learning for RL basis function construction (Keller et al. ICML 2006)
• RL Dataset of state-action-reward tuples {(si, ai, ri) , i=1,…,N}
Value Iteration
• Define an iterative “bootstrap” calculation:
$$V_{k+1}(s) = \max_a \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V_k(s') \right]$$
• Each round of VI must iterate over all states in the state space
• Try to speed this up using state aggregation (Bertsekas & Castanon, 1989)
• Idea: Use NCA to aggregate states:
– project states into lower-dim rep; keep states with similar Bellman error close together
– use projected states to define a set of basis functions {φi}
– learn linear value function over basis functions: V = Σi θi φi
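The bootstrap update can be sketched for a small tabular problem. This is plain value iteration with no state aggregation or basis functions, and it assumes a discount factor `gamma`; the two-state MDP, the array layout, and the function name are made up for illustration.

```python
def value_iteration(P, R, gamma, sweeps):
    """P[s][a][s2]: transition probability; R[s][a][s2]: reward.
    Returns the value estimates after `sweeps` full passes over the state space."""
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        # Each sweep recomputes every state from the previous sweep's values.
        V = [max(sum(P[s][a][s2] * (R[s][a][s2] + gamma * V[s2]) for s2 in range(n))
                 for a in range(len(P[s])))
             for s in range(n)]
    return V

# Made-up MDP: action 0 stays put, action 1 moves to the other state.
# Staying in state 1 pays 2; moving from state 0 to state 1 pays 1; all else pays 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[[0.0, 0.0], [0.0, 1.0]],
     [[0.0, 2.0], [0.0, 0.0]]]
V = value_iteration(P, R, gamma=0.5, sweeps=60)
```

Every sweep touches every state, which is the cost that aggregation into a smaller set of basis functions tries to reduce.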
Chopra et al. 2005
Similarity metric for image verification.
Problem: Given a pair of face images, decide if they are from the same person.
Too difficult for linear mapping!