Page 1

Semidefinite Programming Machines

Thore Graepel and Ralf Herbrich

Microsoft Research Cambridge

Page 2

Overview

Invariant Pattern Recognition

Semidefinite Programming (SDP)

From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)

Experimental Illustration

Future Work

Page 3

Typical Invariances for Images

Translation

Rotation

Shear


Page 5

Toy Features for Handwritten Digits

[Figure: handwritten digit images with toy feature values 0.48, 0.37 and 0.58]

Page 6

Warning: Highly Non-Linear

[Figure: digit data plotted against features φ_1 and φ_2]

Page 7

Warning: Highly Non-Linear

[Figure: data in the (φ_1, φ_2) feature plane; both axes range from about 0.2 to 0.6]

Page 8

Motivation: Classification Learning

[Figure: labelled training examples plotted against features φ_1(x) and φ_2(x)]

Can we learn with infinitely many examples?


Page 10

Motivation: Version Spaces

[Figure: version spaces for the original patterns and for the transformed patterns]

Page 11

Semidefinite Programs (SDPs)

Linear objective function

Positive semidefinite (psd) constraints

Infinitely many linear constraints
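For reference, the standard primal form of an SDP, written in generic notation rather than copied from the slide (whose display is an image):

\[
\min_{x \in \mathbb{R}^n} \; c^{\top} x
\quad \text{s.t.} \quad
F(x) := F_0 + \sum_{j=1}^{n} x_j F_j \succeq 0 ,
\]

where the F_j are fixed symmetric matrices. F(x) ⪰ 0 means zᵀ F(x) z ≥ 0 for every vector z, i.e. one linear constraint on x per direction z, which is where the "infinitely many linear constraints" come from.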

Page 12

SVM as a Quadratic Program

Given: a sample ((x_1, y_1), …, (x_m, y_m)).

SVMs find the weight vector w that maximises the margin on the sample.
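The quadratic program behind this slide, in its usual hard-margin form without a bias term (the slide's own display is an image, so this is the standard textbook statement rather than a transcription):

\[
\min_{w} \; \tfrac{1}{2}\,\|w\|_2^2
\quad \text{s.t.} \quad
y_i \,\langle w, x_i \rangle \ge 1, \qquad i = 1, \dots, m .
\]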

Page 13

SVM as a Semidefinite Program (I)

A (block)-diagonal matrix is psd if and only if all its blocks are psd.

A_j := diag(g_{1,j}, …, g_{i,j}, …, g_{m,j}),    B := diag(1, …, 1)
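How the m margin constraints become a single diagonal psd constraint; a sketch assuming the plausible identification g_{i,j} := y_i [x_i]_j, which the transcript does not spell out:

\[
y_i \langle w, x_i \rangle - 1 \ge 0 \;\; (i = 1, \dots, m)
\quad \Longleftrightarrow \quad
\sum_{j=1}^{n} w_j A_j - B \succeq 0 ,
\]

because a diagonal matrix is psd exactly when every diagonal entry is non-negative.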


Page 15

SVM as a Semidefinite Program (II)

Transform the quadratic into a linear objective

Adds a new (n+1)×(n+1) block to A_j and B

Use Schur's complement lemma
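The standard construction (a sketch; t is an auxiliary scalar variable bounding the squared norm):

\[
\min_{w,\,t} \; t
\quad \text{s.t.} \quad
\begin{pmatrix} I_n & w \\ w^{\top} & t \end{pmatrix} \succeq 0
\;\;\Longleftrightarrow\;\;
t \ge w^{\top} w ,
\]

by Schur's complement lemma; the margin constraints remain as the diagonal blocks from the previous slide, so the whole problem has a linear objective and one block-diagonal psd constraint.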

Page 16

Taylor Approximation of Invariance

Let T(x, θ) be an invariance transformation with parameter θ (e.g., the angle of rotation).

Taylor expansion about θ = 0 gives:

Polynomial approximation to the trajectory.
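A sketch of the r-th order expansion (not taken verbatim from the slide, whose display is an image):

\[
x(\theta) := T(x, \theta)
\;\approx\;
\sum_{k=0}^{r} \frac{\theta^{k}}{k!}
\left. \frac{\partial^{k} T(x, \theta)}{\partial \theta^{k}} \right|_{\theta = 0}
\;=\; \sum_{k=0}^{r} \frac{\theta^{k}}{k!}\, x^{(k)} ,
\]

so each pattern x is replaced by a polynomial trajectory through its derivative patterns x^{(k)}.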

Page 17

Extension to Polynomials

Consider the polynomial trajectory x(θ):

Infinitely many constraints from the training example (x^{(0)}, …, x^{(r)}, y):
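Written out, the constraint family reads (a sketch consistent with the previous slide):

\[
\forall \theta: \quad
y \,\langle w, x(\theta) \rangle - 1
\;=\; y \sum_{k=0}^{r} \frac{\theta^{k}}{k!} \langle w, x^{(k)} \rangle - 1
\;\ge\; 0 ,
\]

one linear constraint on w for every value of θ, hence infinitely many constraints per training example.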

Page 18

Non-Negative Polynomials (I)

Theorem (Nesterov, 2000): If r = 2l, then

1. For every psd matrix P the polynomial p(θ) = θᵀPθ is non-negative everywhere.

2. For every non-negative polynomial p there exists a psd matrix P such that p(θ) = θᵀPθ.

Here θ denotes the vector of monomials (1, θ, θ², …, θ^l)ᵀ, so P is an (l+1)×(l+1) coefficient matrix.

Example:
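A small worked instance (my own illustration; the slide's example is an image): take p(θ) = θ² - 2θ + 2, so l = 1 and the monomial vector is (1, θ)ᵀ.

\[
p(\theta) = \theta^2 - 2\theta + 2
= \begin{pmatrix} 1 & \theta \end{pmatrix}
\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ \theta \end{pmatrix},
\qquad
P = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} \succeq 0 ,
\]

since P has trace 3 > 0 and determinant 1 > 0; and indeed p(θ) = (θ - 1)² + 1 > 0 everywhere.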

Page 19

Non-Negative Polynomials (II)

(1) follows directly from the psd definition.

(2) follows from the sum-of-squares lemma.

Note that (2) states the mere existence:

Polynomial of degree r: r + 1 parameters

Coefficient matrix P: (r+2)(r+4)/8 parameters

For r > 2, we have to introduce another r(r-2)/8 auxiliary variables to find P.
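Where these counts come from, using r = 2l:

\[
P \in \mathbb{R}^{(l+1) \times (l+1)} \text{ symmetric}
\;\Rightarrow\;
\frac{(l+1)(l+2)}{2} = \frac{(r+2)(r+4)}{8} \text{ free entries},
\qquad
\frac{(r+2)(r+4)}{8} - (r+1) = \frac{r(r-2)}{8} .
\]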

Page 20

Semidefinite Programming Machines

Extension of SVMs as (non-trivial) SDP.

A_j := diag(g_{1,j}, …, g_{i,j}, …, g_{m,j})   →   A_j := diag(G_{1,j}, …, G_{i,j}, …, G_{m,j})

B := diag(1, …, 1)   →   B := diag([1 0; 0 0], …, [1 0; 0 0])

(the scalar blocks of the SVM formulation are replaced by matrix blocks, one per training example)


Page 22

Example: Second-Order SDPMs

2nd order Taylor expansion:

Resulting polynomial in θ:

Set of constraint matrices:
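The three displays above are images in the original deck; the following reconstructs the second-order case from the preceding slides, so the notation may differ in detail from the slide:

\[
x(\theta) \approx x + \theta\, x' + \tfrac{\theta^2}{2}\, x'' ,
\qquad
y \langle w, x(\theta) \rangle - 1
= \underbrace{\bigl( y \langle w, x \rangle - 1 \bigr)}_{a}
+ \underbrace{y \langle w, x' \rangle}_{b}\, \theta
+ \underbrace{\tfrac{y}{2} \langle w, x'' \rangle}_{c}\, \theta^2 ,
\]

and by the l = 1 case of the theorem this is non-negative for every θ exactly when

\[
G(w) :=
\begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}
=
\begin{pmatrix}
y \langle w, x \rangle - 1 & \tfrac{y}{2} \langle w, x' \rangle \\
\tfrac{y}{2} \langle w, x' \rangle & \tfrac{y}{2} \langle w, x'' \rangle
\end{pmatrix}
\succeq 0 .
\]

Putting the pieces together, a minimal cvxpy sketch of a second-order SDPM on synthetic data; the data, variable names and solver choice are my own assumptions for illustration, not the authors' code:

import numpy as np
import cvxpy as cp

# Synthetic stand-ins for m training patterns x_i, their first and second
# derivative patterns (w.r.t. the transformation parameter theta) and labels.
m, n = 20, 10
rng = np.random.default_rng(0)
X, dX, ddX = rng.normal(size=(3, m, n))      # x_i, x_i', x_i''
y = rng.choice([-1.0, 1.0], size=m)          # labels in {-1, +1}

w = cp.Variable(n)   # weight vector
t = cp.Variable()    # upper bound on ||w||^2 (linearised objective)

constraints = []
for i in range(m):
    a = y[i] * (X[i] @ w) - 1        # constant coefficient of the margin polynomial
    b = y[i] * (dX[i] @ w)           # linear coefficient
    c = y[i] * (ddX[i] @ w) / 2      # quadratic coefficient
    # a + b*theta + c*theta^2 >= 0 for all theta  <=>  [[a, b/2], [b/2, c]] is psd.
    # Encode this by tying an auxiliary PSD variable to the coefficients.
    G = cp.Variable((2, 2), PSD=True)
    constraints += [G[0, 0] == a, G[0, 1] == b / 2, G[1, 1] == c]

# Schur complement block [[I, w], [w^T, t]] >= 0, i.e. t >= ||w||^2.
M = cp.Variable((n + 1, n + 1), PSD=True)
constraints += [M[:n, :n] == np.eye(n), M[:n, n] == w, M[n, n] == t]

prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve(solver=cp.SCS)
print("status:", prob.status, " ||w||^2 <=", t.value)

Pinning each 2x2 PSD variable to the polynomial coefficients via equality constraints is one of several equivalent ways of stating the linear matrix inequality G_i(w) ⪰ 0 in cvxpy.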


Page 24

Non-Negative on Segment

Given a polynomial p of degree 2l, consider the polynomial

Note that q is a polynomial of degree 4l. If q is positive everywhere, then p is positive everywhere on [-τ, +τ].

[Figure: plot of f(θ)]
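The definition of q above is an image in the original deck; one construction consistent with the stated degree 4l (my reconstruction, so the slide's exact formula may differ) uses the rational map θ ↦ 2θ/(1+θ²), whose range is [-1, 1]:

\[
q(\theta) := \bigl( 1 + \theta^2 \bigr)^{2l}\;
p\!\left( \tau \, \frac{2\theta}{1 + \theta^2} \right) ,
\]

a polynomial of degree 4l; if q is non-negative on all of ℝ, then p is non-negative on [-τ, +τ].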


Page 26

Truly Virtual Support Vectors

Dual complementarity yields expansion:

The truly virtual support vectors are linear combinations of derivatives:
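A sketch of the expansion for the second-order case (the slide's formula is an image; α_i denotes the dual coefficient and θ_i* the transformation parameter recovered from the dual solution):

\[
w = \sum_{i} \alpha_i\, y_i\, x_i(\theta_i^{*}),
\qquad
x_i(\theta_i^{*}) = x_i + \theta_i^{*}\, x_i' + \tfrac{(\theta_i^{*})^2}{2}\, x_i'' ,
\]

so each truly virtual support vector is the training pattern moved along its transformation trajectory to θ_i*, i.e. a linear combination of the pattern and its derivative patterns.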


Page 28

Visualisation: USPS “1” vs. “9”

[Figure: USPS "1" and "9" patterns in the two-feature plane with rotation range τ = 20°, plus a zoomed detail view]

Page 29

Results: Experimental Setup

All 45 USPS classification tasks (1-v-1).

20 training images; 250 test images.

Rotation is applied to all training images with τ = 10°.

All results are averaged over 50 random training sets.

Compared to SVM and virtual SVM.

Page 30

Results: SDPM vs. SVM

[Figure: scatter plot of SDPM error vs. SVM error over the 45 tasks; both axes from 0 to 0.2]

Page 31

Results: SDPM vs. Virtual SVM

[Figure: scatter plot of SDPM error vs. virtual SVM (VSVM) error over the 45 tasks; both axes from 0 to 0.14]


Page 33

Results: Curse of Dimensionality

[Figure: results for 1 transformation parameter vs. 2 transformation parameters]

Page 34

Extensions & Future Work

Multiple parameters θ_1, θ_2, ..., θ_D.

(Efficient) adaptation to kernel space.

Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).

Sparsification by efficiently finding the example x and transformation θ with maximal information (idea of Neil Lawrence).

Expectation propagation for BPMs (idea of Tom Minka).

Page 35

Conclusions & Future Work

Learning from infinitely many examples.

Truly virtual support vectors x_i(θ_i*).

Multiple parameters θ_1, θ_2, ..., θ_D.

(Efficient) adaptation to kernel space.

Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).