
Graph-Based Semi-Supervised Learning

Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux

Université de Montréal

CIAR Workshop - April 26th, 2005


Outline

1 Semi-Supervised Setting

2 Graph Regularization and Label Propagation

3 Transduction vs. Induction

4 Curse of Dimensionality


Semi-Supervised Learning for Dummies

Task of binary classification with labels y_i ∈ {−1, 1}.

Semi-supervised = learn something about the labels using both labeled and unlabeled data:

X = (x_1, x_2, . . . , x_n)

Y_l = (y_1, y_2, . . . , y_l)

n = l + u

Transduction ⇒ estimate Ŷ_u = (ŷ_{l+1}, ŷ_{l+2}, . . . , ŷ_n)

Induction ⇒ learn a function ŷ : x → ŷ(x)


The Classical Two-Moon Problem


Where are Manifolds and Kernels?

What is a good labeling Ŷ = (Ŷ_l, Ŷ_u)?

1 one that is consistent with the given labels: Ŷ_l ≃ Y_l

2 one that is smooth on the manifold where the data lie (manifold / cluster assumption): ŷ_i ≃ ŷ_j when x_i is close to x_j

⇒ Cost function

C(Ŷ) = ∑_{i=1}^{l} (ŷ_i − y_i)² + (μ/2) ∑_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)²

with W_ij = W_X(x_i, x_j) a positive weighting function (e.g. a kernel)
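As an illustration, this cost is straightforward to compute; a minimal numpy sketch (the Gaussian choice of W_X here is an assumption for the example, not prescribed by the slides):

```python
import numpy as np

def gaussian_weights(X, sigma=1.0):
    # W_ij = exp(-||x_i - x_j||^2 / sigma^2): one common choice of kernel
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma**2)

def cost(y_hat, y_l, W, mu):
    # C(Y_hat) = fit to the l given labels + (mu/2) * graph smoothness penalty
    l = len(y_l)
    fit = ((y_hat[:l] - y_l) ** 2).sum()
    smooth = (W * (y_hat[:, None] - y_hat[None, :]) ** 2).sum()
    return fit + 0.5 * mu * smooth
```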


Graphs Would be Cool too!

Nodes = data points; an edge (i, j) is present ⇔ W_ij > 0.


Regularization on Graph

Graph Laplacian: L_ii = ∑_{j≠i} W_ij and L_ij = −W_ij for i ≠ j.

C(Ŷ) = ∑_{i=1}^{l} (ŷ_i − y_i)² + (μ/2) ∑_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)²
     = ‖Ŷ_l − Y_l‖² + μ Ŷ⊤ L Ŷ

C(Ŷ) is minimized when

(S + μL) Ŷ = S Y

(S the diagonal matrix with S_ii = 1 for labeled points and 0 otherwise)
⇒ a linear system with n unknowns and n equations.
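This system can be solved directly; a dense-matrix sketch with numpy (illustrative only — a real implementation would exploit the sparsity of W):

```python
import numpy as np

def graph_regularization(W, y_l, mu):
    # Minimize C(Y_hat) by solving (S + mu*L) Y_hat = S Y,
    # where the first l points are the labeled ones.
    n, l = W.shape[0], len(y_l)
    L = np.diag(W.sum(axis=1)) - W                     # L_ii = sum_{j!=i} W_ij, L_ij = -W_ij
    S = np.diag((np.arange(n) < l).astype(float))      # S_ii = 1 iff i is labeled
    Y = np.concatenate([y_l, np.zeros(n - l)])         # zeros stand in for unknown labels
    return np.linalg.solve(S + mu * L, S @ Y)
```

On a connected graph with at least one labeled point, S + μL is invertible, so the solve is well defined.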


From Matrix Inversion to Label Propagation

The linear system rewrites, for a labeled point,

ŷ_i^{(t+1)} = (∑_j W_ij ŷ_j^{(t)} + (1/μ) y_i) / (∑_j W_ij + 1/μ)

and for an unlabeled point,

ŷ_i^{(t+1)} = (∑_j W_ij ŷ_j^{(t)}) / (∑_j W_ij).

⇒ Jacobi or Gauss-Seidel iteration algorithm.
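The Jacobi variant of this label-propagation iteration can be sketched in a few lines of numpy (a minimal illustration under the same conventions as above, not the authors' implementation):

```python
import numpy as np

def label_propagation(W, y_l, mu, n_iter=200):
    # Jacobi iteration for the fixed-point equations above;
    # the first l points are the labeled ones.
    n, l = W.shape[0], len(y_l)
    y_hat = np.concatenate([y_l, np.zeros(n - l)])
    deg = W.sum(axis=1)                                # sum_j W_ij
    for _ in range(n_iter):
        prop = W @ y_hat                               # sum_j W_ij * y_hat_j
        new = prop / deg                               # update for unlabeled points
        new[:l] = (prop[:l] + y_l / mu) / (deg[:l] + 1.0 / mu)  # labeled points
        y_hat = new
    return y_hat
```

Each sweep costs one multiplication by the (typically sparse) W, which is what makes propagation cheaper than a direct matrix inversion.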


No, I didn’t come up with that last week-end

X. Zhu, Z. Ghahramani and J. Lafferty (2003): Semi-supervised learning using Gaussian fields and harmonic functions

D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, B. Schölkopf (2004): Learning with local and global consistency

M. Belkin, I. Matveeva and P. Niyogi (2004): Regularization and Semi-supervised Learning on Large Graphs


Remember your Physics class?

Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003). Graph ⇔ electric network with resistors between nodes:

R_ij = 1 / W_ij

Ohm's law (potential = label):

ŷ_j − ŷ_i = R_ij I_ij

Kirchhoff's law on an unlabeled node i:

∑_j I_ij = 0

⇒ Same linear system as minimizing C(Ŷ) over Ŷ_u only.


From Transduction to Induction

Solving the linear system ⇒ Ŷ (transduction).

For a new point x, reusing the already computed Ŷ:

C(ŷ(x)) = C(Ŷ) + (μ/2) ∑_{i=1}^{n} W_X(x_i, x) (ŷ_i − ŷ(x))²

⇒ ŷ(x) = (∑_{i=1}^{n} W_X(x_i, x) ŷ_i) / (∑_{i=1}^{n} W_X(x_i, x))

(Induction like Parzen windows, but using the estimated labels Ŷ.)
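The induction formula is just a weighted average of the estimated labels, in the spirit of a Parzen-window predictor; a sketch assuming a Gaussian W_X (that choice, and the function name, are illustrative):

```python
import numpy as np

def induce(x, X, y_hat, sigma=1.0):
    # Predict the label of a new point x as the normalized weighted
    # average of the estimated labels y_hat, with weights W_X(x_i, x).
    w = np.exp(-((X - x) ** 2).sum(axis=1) / sigma**2)
    return (w * y_hat).sum() / w.sum()
```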


Faster Training from Subset

Previous algorithms are at least quadratic in n.

Induction formula ⇒ one could train only on a subset S.

Better: minimize the full cost over Ŷ_S only.

For x_i ∈ R = X \ S:

ŷ_i = (∑_{j∈S} W_ij ŷ_j) / (∑_{j∈S} W_ij)

i.e. Ŷ_R = W̄_RS Ŷ_S (with W̄_RS the row-normalized weights): the cost C(Ŷ) now only depends on Ŷ_S.
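Extending the subset solution to the remaining points is then a single matrix product with row-normalized weights; a small sketch (the function name is hypothetical):

```python
import numpy as np

def extend_to_rest(W_RS, y_S):
    # Labels of the points outside S as normalized weighted averages
    # of the subset labels: Y_R = Wbar_RS @ Y_S.
    Wbar = W_RS / W_RS.sum(axis=1, keepdims=True)  # each row sums to 1
    return Wbar @ y_S
```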


Nice Cost, isn't it?

C(Ŷ_S) = (μ/2) ∑_{i,j∈R} W_ij (ŷ_i − ŷ_j)²        [C_RR]
       + 2 × (μ/2) ∑_{i∈R, j∈S} W_ij (ŷ_i − ŷ_j)²  [C_RS]
       + (μ/2) ∑_{i,j∈S} W_ij (ŷ_i − ŷ_j)²        [C_SS]
       + ∑_{i∈L} (ŷ_i − y_i)²                      [C_L]


Let’s Make it Simpler

Computing C_RR is quadratic in n ⇒ just get rid of it.

Linear system with |S| = m ≪ n unknowns ⇒ much faster.

Still need to do matrix multiplications ⇒ scales as O(m²n).

This slide looked pretty empty with only three points.

It is important to have reading material when nobody understands your accent.


Subset Selection

1 Random: fast, easy, crappy.
Main problem = does not "fill the space" well enough ⇒ bad approximation by the induction formula (some points have no near neighbors in the subset)

2 Heuristic: greedy construction of the subset, starting with the labeled points and iteratively adding the point x_i minimizing

∑_{j∈S} W_ij

i.e. the point farthest from the current subset.
(Additional tricks to eliminate outliers and sample more points near the decision surface)
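The greedy heuristic can be sketched as follows (function name and the bare stopping rule are illustrative; the outlier and decision-surface tricks are omitted):

```python
import numpy as np

def greedy_subset(W, labeled_idx, m):
    # Grow a subset S of size m: start from the labeled points, then
    # repeatedly add the point least connected to the current subset,
    # i.e. the one minimizing sum_{j in S} W_ij.
    S = list(labeled_idx)
    rest = set(range(W.shape[0])) - set(S)
    while len(S) < m and rest:
        i = min(rest, key=lambda j: W[j, S].sum())
        S.append(i)
        rest.remove(i)
    return S
```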


Experimental Results

This is not last-minute research, so we do have results!

Table: Comparative Classification Error (Induction)

% labeled   Method            LETTERS   MNIST   COVTYPE
1%          NoSub             56.0      35.8    47.3
1%          RandSub_subOnly   59.8      29.6    44.8
1%          RandSub           57.4      27.7    75.7
1%          SmartSub          55.8      24.4    45.0
5%          NoSub             27.1      12.8    37.1
5%          RandSub_subOnly   32.1      14.9    35.4
5%          RandSub           29.1      12.6    70.6
5%          SmartSub          28.5      12.3    35.8
10%         NoSub             18.8      9.5     34.7
10%         RandSub_subOnly   22.5      11.4    32.4
10%         RandSub           20.3      9.7     64.7
10%         SmartSub          19.8      9.5     33.4

More comparisons between RandSub and SmartSub on 8 more UCI datasets ⇒ SmartSub always performs better.


Curse of Dimensionality

Labeling function: ŷ(x) = ∑_i ŷ_i W̄_X(x_i, x) (with W̄_X the normalized weights)

Locality: a far point's prediction = that of its nearest neighbor; also, the normal vector ∂ŷ/∂x(x) is approximately in the span of the nearest neighbors of x

Smoothness: ŷ does not vary much within a ball of radius small w.r.t. σ (obtained from the second derivative); also, the form of the cost ⇒ the learned function varies smoothly in regions with no labeled examples

Curse: one needs lots of unlabeled examples to "fill" the region near the decision surface, and lots of labeled examples to account for all "clusters"


“Almost Real-Life” Example

Need unlabeled examples along the sinusoidal decision surface, and labeled examples in each class region.


Conclusion

Simple non-parametric setting ⇒ powerful non-parametric semi-supervised algorithm

Can scale to large datasets thanks to sparsity / subset selection

Interesting links with electric networks / heat diffusion

Limitations of local weights: curse of dimensionality

Makes it possible to be on time for lunch!!!


References

Belkin, M., Matveeva, I., and Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Shawe-Taylor, J. and Singer, Y., editors, COLT'2004. Springer.

Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics.

Doyle, P. G. and Snell, J. L. (1984). Random walks and electric networks. Mathematical Association of America.

Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press.

Zhu, X., Ghahramani, Z., and Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML'2003.