A Survey on Distance Metric Learning (Part 2). Gerry Tesauro, IBM T.J. Watson Research Center.

Posted on 21-Dec-2015

1

A Survey on Distance Metric Learning (Part 2)

Gerry Tesauro

IBM T.J.Watson Research Center

2

Acknowledgement

• Lecture material shamelessly adapted from the following sources:
– Kilian Weinberger:
• “Survey on Distance Metric Learning” slides
• IBM summer intern talk slides (Aug. 2006)
– Sam Roweis slides (NIPS 2006 workshop on “Learning to Compare Examples”)
– Yann LeCun talk slides (CVPR 2005, 2006)

3

Outline – Part 2

Neighbourhood Components Analysis (Goldberger et al.), Metric Learning by Collapsing Classes (Globerson & Roweis)

Metric Learning for Kernel Regression (Weinberger & Tesauro)

Metric learning for RL basis function construction (Keller et al.)

Similarity learning for image processing (LeCun et al.)

Neighborhood Component Analysis (Goldberger et al. 2004)

Distance metric for visualization and kNN

Metric Learning for Kernel Regression

Weinberger & Tesauro, AISTATS 2007

Killing three birds with one stone: we construct a method for linear dimensionality reduction that generates a meaningful distance metric, optimally tuned for distance-based kernel regression.

7

Kernel Regression

• Given a training set {(x_j, y_j), j=1,…,N}, where x is a d-dimensional vector and y is real-valued, estimate the value of a test point x_i by a weighted average of the samples:

$\hat{y}_i = \frac{\sum_j k_{ij}\, y_j}{\sum_j k_{ij}}$

where $k_{ij} = k_D(x_i, x_j)$ is a distance-based kernel function using distance metric D
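The weighted-average estimate above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; it assumes the Gaussian kernel introduced on the next slide, with a plain Euclidean distance standing in for the learned metric D:

```python
import numpy as np

def kernel_regression(X_train, y_train, x_query, sigma=1.0):
    """Distance-based kernel regression: predict y at x_query as a
    kernel-weighted average of the training targets."""
    diffs = X_train - x_query                    # x_j - x_query for all j
    d2 = np.einsum('jd,jd->j', diffs, diffs)     # squared Euclidean distances
    k = np.exp(-d2 / sigma**2)                   # Gaussian kernel weights k_ij
    return float(k @ y_train / k.sum())          # weighted average of targets
```

MLKR's contribution is to replace the Euclidean distance in `d2` with a learned Mahalanobis distance, as the following slides develop.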

8

Choice of Kernel

• Many functional forms for k_ij can be used in MLKR; our empirical work uses the Gaussian kernel

$k_{ij} = \exp(-D_{ij}^2 / \sigma^2)$

where σ is a kernel width parameter (can set σ = 1 w.l.o.g. since we learn D)

• The resulting softmax regression estimate is similar to Roweis’ softmax classifier:

$\hat{y}_i = \frac{\sum_j y_j \exp(-D_{ij}^2)}{\sum_j \exp(-D_{ij}^2)}$

Distance Metric for Nearest Neighbor Regression

Learn a linear transformation that allows us to estimate the value of a test point from its nearest neighbors

Mahalanobis Metric

The distance function is a pseudo-Mahalanobis metric (generalizes Euclidean distance), induced by a linear transformation A:

$D(x_i, x_j)^2 = (x_i - x_j)^\top A^\top A\,(x_i - x_j) = \|A(x_i - x_j)\|^2$
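A small sketch of this distance function (illustrative names, not library code): any real matrix A yields a positive semi-definite M = AᵀA, and a rank-deficient A gives only a pseudo-metric, since distinct points can collapse to distance zero.

```python
import numpy as np

def mahalanobis_sq(A, xi, xj):
    """Squared pseudo-Mahalanobis distance D^2 = ||A(xi - xj)||^2,
    i.e. (xi - xj)^T M (xi - xj) with M = A^T A.
    A = I recovers the squared Euclidean distance."""
    v = A @ (xi - xj)
    return float(v @ v)
```

For example, a 1x2 matrix A = [[1, 0]] projects away the second coordinate, so two points differing only in that coordinate get distance zero.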

11

General Metric Learning Objective

• Find a parameterized distance function D_θ that minimizes the total leave-one-out cross-validation loss

$\mathcal{L} = \sum_i (\hat{y}_i - y_i)^2$

– e.g. params θ = elements A_ij of the A matrix

• Since we’re solving for A, not M = AᵀA, the optimization is non-convex ⇒ use gradient descent
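The leave-one-out objective can be written compactly in vectorized NumPy. This is a sketch under the slide's assumptions (Gaussian kernel, σ = 1), not the authors' implementation:

```python
import numpy as np

def mlkr_loss(A, X, y):
    """Leave-one-out loss L = sum_i (yhat_i - y_i)^2, where yhat_i is the
    Gaussian-kernel regression estimate built from all points except i."""
    Z = X @ A.T                                            # transform the data by A
    d2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(-1)  # D_ij^2 = ||A(x_i - x_j)||^2
    K = np.exp(-d2)                                        # Gaussian kernel, sigma = 1
    np.fill_diagonal(K, 0.0)                               # leave one out: k_ii = 0
    yhat = K @ y / K.sum(axis=1)                           # softmax regression estimates
    return float(np.square(yhat - y).sum())
```

Zeroing the kernel diagonal is what makes the loss leave-one-out: each point is predicted from all the others.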

12

Gradient Computation

$\frac{\partial \mathcal{L}}{\partial A} = 4A \sum_i (\hat{y}_i - y_i) \sum_j \bar{k}_{ij}\,(\hat{y}_i - y_j)\, x_{ij} x_{ij}^\top$

where $x_{ij} = x_i - x_j$ and $\bar{k}_{ij} = k_{ij} / \sum_l k_{il}$ are the normalized kernel weights

For fast implementation:
– Don’t sum over all i–j pairs; only go up to ~1000 nearest neighbors for each sample i
– Maintain nearest neighbors in a heap-tree structure; update the heap tree every 15 gradient steps
– Ignore sufficiently small values of k_ij (< e^−34)
– Even better data structures: cover trees, k-d trees

Learned Distance Metric example

“Twin Peaks” test, n = 8000. Training: we added 3 dimensions with 1000% noise, then rotated 5 dimensions randomly.

[Figures: scatter plots of neighborhoods under the original Euclidean D < 1 vs. the learned D < 1; bar charts of input variance (noise vs. signal) and output variance (signal vs. noise) on test data]

DimReduction with MLKR

• FG-NET face data: 82 persons, 984 face images w/ age

DimReduction with MLKR

• Force A to be rectangular
• Project onto eigenvectors of A
• Allows visualization of data
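One plausible reading of the projection step, sketched below: for a square A, project onto the leading eigenvectors of M = AᵀA (the directions the learned metric stretches most). The function name and two-component default are illustrative assumptions, not from the paper:

```python
import numpy as np

def project_for_visualization(X, A, n_components=2):
    """Embed data for plotting via the leading eigenvectors of M = A^T A.
    (A rectangular A of shape (r, d) already embeds X directly as X @ A.T.)"""
    evals, evecs = np.linalg.eigh(A.T @ A)          # eigenvalues in ascending order
    top = evecs[:, np.argsort(evals)[::-1][:n_components]]
    return X @ top                                  # coordinates in the top directions
```

With A = diag(3, 0.1), for instance, the one-component projection keeps the first coordinate (up to sign) and discards the second.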

[Figures: regression error on PowerManagement data (d = 21) and robot arm results (8- and 32-dim)]

© 2006 IBM Corporation

IBM

Unity Data Center Prototype

Objective: Learn long-range resource value estimates for each application manager

State Variables (~48):
– Arrival rate
– ResponseTime
– QueueLength
– iatVariance
– rtVariance

Action: # of servers allocated by Arbiter

Reward: SLA(Resp. Time)

[Architecture diagram: a ResourceArbiter allocates 8 xSeries servers among two Trade3 AppManagers (WebSphere 5.1 + DB2) and a Batch AppManager; each manager reports Value(#srvrs), derived from Value(RT) and demand (HTTP req/sec); SLAs map response time to revenue; the arbiter reallocates every 5 sec to maximize total SLA revenue]

(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)


Power & Performance Management

Objective: Manage systems to multi-discipline objectives: minimize Resp. Time and minimize Power Usage

State Variables (21):

– Power Cap

– Power Usage

– CPU Utilization

– Temperature

– # of requests arrived

– Workload intensity (# Clients)

– Response Time

Action: Power Cap

Reward: SLA(Resp. Time) – Power Usage

(Kephart et al., ICAC 2007)

25

Regression Results: Test Error

[Bar chart: test-error comparison featuring MLKR; labels 14/47, 10/22, 3/5]

27

Metric Learning for RL basis function construction (Keller et al. ICML 2006)

• RL Dataset of state-action-reward tuples {(si, ai, ri) , i=1,…,N}

28

Value Iteration

• Define an iterative “bootstrap” calculation:

$V_{k+1}(s) = \max_a \sum_{s'} P^a_{ss'} \left[ R^a_{ss'} + \gamma V_k(s') \right]$

• Each round of VI must iterate over all states in the state space
• Try to speed this up using state aggregation (Bertsekas & Castanon, 1989)
• Idea: Use NCA to aggregate states:
– project states into a lower-dim representation; keep states with similar Bellman error close together
– use the projected states to define a set of basis functions {φ_k}
– learn a linear value function over the basis functions: V = Σ_i θ_i φ_i
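The bootstrap backup above, in tabular form. This is a generic value-iteration sketch with illustrative array shapes, not Keller et al.'s code:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, n_iters=500):
    """Tabular value iteration.
    P[a, s, s2]: transition probabilities; R[a, s, s2]: rewards.
    Applies V_{k+1}(s) = max_a sum_{s2} P[a,s,s2] * (R[a,s,s2] + gamma*V_k(s2))."""
    V = np.zeros(P.shape[1])
    for _ in range(n_iters):
        Q = (P * (R + gamma * V[None, None, :])).sum(axis=2)  # (A, S) action values
        V = Q.max(axis=0)                                     # greedy backup
    return V
```

The inner sum over successor states s2 is exactly the cost that state aggregation tries to cut: every backup touches every state, which motivates projecting states into a lower-dimensional representation first.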

Chopra et al. 2005: Similarity metric for image verification

Problem: Given a pair of face images, decide if they are from the same person.

Too difficult for a linear mapping!