Graph-Based Semi-Supervised Learning
Olivier Delalleau, Yoshua Bengio and Nicolas Le Roux
Université de Montréal
CIAR Workshop - April 26th, 2005
Outline
1 Semi-Supervised Setting
2 Graph Regularization and Label Propagation
3 Transduction vs. Induction
4 Curse of Dimensionality
Semi-Supervised Learning for Dummies
Task of binary classification with labels y_i ∈ {−1, 1}
Semi-supervised = learn something about the labels using both labeled and unlabeled data
X = (x_1, x_2, . . . , x_n)
Yl = (y_1, y_2, . . . , y_l)
n = l + u
Transduction ⇒ Ŷu = (ŷ_{l+1}, ŷ_{l+2}, . . . , ŷ_n)
Induction ⇒ ŷ : x → ŷ(x)
The Classical Two-Moon Problem
Where are Manifolds and Kernels?
What is a good labeling Ŷ = (Ŷl, Ŷu)?
1 one that is consistent with the given labels: Ŷl ≃ Yl
2 one that is smooth on the manifold where the data lie (manifold / cluster assumption):
ŷ_i ≃ ŷ_j when x_i is close to x_j
⇒ Cost function
C(Ŷ) = Σ_{i=1}^{l} (ŷ_i − y_i)² + (µ/2) Σ_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)²
with W_ij = W_X(x_i, x_j) a positive weighting function (e.g. a kernel)
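As a concrete illustration, the cost above can be evaluated in a few lines of Python. The Gaussian kernel, the default µ and σ values, and all function names here are illustrative choices of mine, not part of the slides:

```python
import math

def gaussian_weight(xi, xj, sigma=1.0):
    """One common choice of kernel: W_X(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / sigma ** 2)

def cost(y_hat, y_labeled, X, mu=1.0, sigma=1.0):
    """C(Y_hat) = sum_{i<=l} (y_hat_i - y_i)^2 + (mu/2) sum_{i,j} W_ij (y_hat_i - y_hat_j)^2.

    y_hat     -- candidate labeling of all n points
    y_labeled -- the l given labels (first l points are the labeled ones)
    X         -- all n points, as tuples of coordinates
    """
    l = len(y_labeled)
    fit = sum((y_hat[i] - y_labeled[i]) ** 2 for i in range(l))
    smooth = sum(
        gaussian_weight(X[i], X[j], sigma) * (y_hat[i] - y_hat[j]) ** 2
        for i in range(len(X)) for j in range(len(X))
    )
    return fit + 0.5 * mu * smooth
```

A constant labeling that agrees with the given labels incurs zero cost; any disagreement in either term increases it.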
Graphs Would be Cool too!
Nodes = points, edges (i, j) ⇔ W_ij > 0.
Regularization on Graph
Graph Laplacian: L_ii = Σ_{j≠i} W_ij and L_ij = −W_ij (for i ≠ j).
C(Ŷ) = Σ_{i=1}^{l} (ŷ_i − y_i)² + (µ/2) Σ_{i,j=1}^{n} W_ij (ŷ_i − ŷ_j)²
     = ‖Ŷl − Yl‖² + µ Ŷ⊤LŶ
C(Ŷ) is minimized when
(S + µL) Ŷ = SY
with S the diagonal matrix such that S_ii = 1 for labeled points and 0 otherwise
⇒ a linear system with n unknowns and n equations.
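The identity behind this rewriting, Ŷ⊤LŶ = (1/2) Σ_ij W_ij (ŷ_i − ŷ_j)², is easy to check numerically. A minimal sketch in pure Python (function names are mine):

```python
def laplacian(W):
    """Graph Laplacian of a symmetric weight matrix:
    L_ii = sum_{j != i} W_ij, and L_ij = -W_ij for i != j."""
    n = len(W)
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i][j] for j in range(n) if j != i)
    return L

def quad_form(L, y):
    """y^T L y -- equals (1/2) sum_ij W_ij (y_i - y_j)^2 for the Laplacian above."""
    n = len(y)
    return sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))
```

Each row of the Laplacian sums to zero, which is why constant labelings have zero smoothness cost.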
From Matrix Inversion to Label Propagation
The linear system rewrites, for a labeled point,
ŷ_i^(t+1) = (Σ_j W_ij ŷ_j^(t) + (1/µ) y_i) / (Σ_j W_ij + 1/µ)
and for an unlabeled point
ŷ_i^(t+1) = (Σ_j W_ij ŷ_j^(t)) / (Σ_j W_ij)
⇒ Jacobi or Gauss-Seidel iteration algorithm.
Semi-Supervised Setting Graph Regularization and Label Prop. Transduction / Induction The Curse
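A minimal sketch of the Jacobi-style update above, assuming a dense symmetric weight matrix with zero diagonal and at least one neighbor per point (the function name, defaults, and stopping rule are illustrative choices of mine):

```python
def label_propagation(W, y, labeled, mu=1.0, n_iter=100):
    """Jacobi iteration for the graph-regularization linear system.

    W       -- symmetric weight matrix (list of lists), W[i][i] = 0
    y       -- target labels; only the entries with i in `labeled` are used
    labeled -- set of labeled indices
    """
    n = len(W)
    y_hat = [y[i] if i in labeled else 0.0 for i in range(n)]
    for _ in range(n_iter):
        new = []
        for i in range(n):
            # Unlabeled update: weighted average of the neighbors' labels.
            num = sum(W[i][j] * y_hat[j] for j in range(n))
            den = sum(W[i][j] for j in range(n))
            if i in labeled:
                # Labeled update: add the (1/mu)-weighted pull toward y_i.
                num += y[i] / mu
                den += 1.0 / mu
            new.append(num / den)
        y_hat = new
    return y_hat
```

On a three-node chain with the endpoints labeled +1 and −1 and a small µ, the middle node settles at 0 and the endpoints stay near their given labels.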
No, I didn’t come up with that last week-end
X. Zhu, Z. Ghahramani and J. Lafferty (2003): Semi-supervised learning using Gaussian fields and harmonic functions
D. Zhou, O. Bousquet, T. Navin Lal, J. Weston, B. Schölkopf (2004): Learning with local and global consistency
M. Belkin, I. Matveeva and P. Niyogi (2004): Regularization and Semi-supervised Learning on Large Graphs
Remember your Physics class?
Electric network analogy (Doyle and Snell, 1984; Zhu, Ghahramani and Lafferty, 2003).
Graph ⇔ electric network with resistors between nodes:
R_ij = 1 / W_ij
Ohm's law (potential = label):
ŷ_j − ŷ_i = R_ij I_ij
Kirchhoff's law on an unlabeled node i:
Σ_j I_ij = 0
⇒ Same linear system as minimizing C(Ŷ) over Ŷu only.
From Transduction to Induction
Solving the linear system ⇒ Ŷ (transduction).
For a new point x, with Ŷ already computed:
C(ŷ(x)) = C(Ŷ) + (µ/2) Σ_{i=1}^{n} W_X(x_i, x) (ŷ_i − ŷ(x))²
⇒ ŷ(x) = Σ_{i=1}^{n} W_X(x_i, x) ŷ_i / Σ_{i=1}^{n} W_X(x_i, x)
(Induction like Parzen windows, but using the estimated labels Ŷ.)
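The induction formula is just a weighted average of the estimated labels. A sketch, again with a Gaussian kernel as an illustrative choice (names and defaults are mine):

```python
import math

def gaussian_weight(xi, xj, sigma=1.0):
    """Illustrative kernel choice: exp(-||x_i - x_j||^2 / sigma^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / sigma ** 2)

def predict(x, X, y_hat, sigma=1.0):
    """Induction formula: Parzen-like weighted average of the estimated labels y_hat."""
    w = [gaussian_weight(xi, x, sigma) for xi in X]
    return sum(wi * yi for wi, yi in zip(w, y_hat)) / sum(w)
```

With two training points labeled +1 and −1, a query halfway between them gets exactly 0, and queries at either point recover (almost) the corresponding label.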
Faster Training from Subset
Previous algorithms are at least quadratic in n.
Induction formula ⇒ could train only on a subset S.
Better: minimize the full cost, but over ŶS only.
For x_i ∈ R = X \ S:
ŷ_i = Σ_{j∈S} W_ij ŷ_j / Σ_{j∈S} W_ij
i.e. ŶR = W̄_RS ŶS (with W̄_RS the row-normalized weights): the cost C(Ŷ) now depends only on ŶS.
Nice Cost, isn’t it?
C(ŶS) = (µ/2) Σ_{i,j∈R} W_ij (ŷ_i − ŷ_j)²            [C_RR]
      + 2 · (µ/2) Σ_{i∈R, j∈S} W_ij (ŷ_i − ŷ_j)²     [C_RS]
      + (µ/2) Σ_{i,j∈S} W_ij (ŷ_i − ŷ_j)²            [C_SS]
      + Σ_{i∈L} (ŷ_i − y_i)²                          [C_L]
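The decomposition of the smoothness term can be checked numerically: by symmetry of W, the full double sum splits into the R–R block, the R–S block counted twice, and the S–S block. A small sketch (the function name is mine):

```python
def block_cost(W, y, A, B, mu=1.0):
    """(mu/2) * sum over i in A, j in B of W_ij (y_i - y_j)^2."""
    return 0.5 * mu * sum(W[i][j] * (y[i] - y[j]) ** 2 for i in A for j in B)
```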
Let’s Make it Simpler
Computing C_RR is quadratic in n ⇒ just get rid of it.
Linear system with |S| = m ≪ n unknowns ⇒ much faster
Still need to do matrix multiplications ⇒ scales as O(m²n)
This slide looked pretty empty with only three points
It is important to have reading material when nobody understands your accent
Subset Selection
1 Random: fast, easy, crappy.
Main problem = does not “fill the space” well enough ⇒ bad approximation by the induction formula (some points have no near neighbors in the subset)
2 Heuristic: greedy construction of the subset, starting with the labeled points and iteratively minimizing over i
Σ_{j∈S} W_ij
i.e. choose the point x_i farthest from the current subset.
(Additional tricks to eliminate outliers and sample more points near the decision surface)
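The greedy heuristic can be sketched as follows. This is a bare-bones version under my own naming, with a Gaussian kernel as the weight and without the outlier / decision-surface refinements mentioned on the slide:

```python
import math

def gaussian_weight(xi, xj, sigma=1.0):
    """Illustrative kernel choice: exp(-||x_i - x_j||^2 / sigma^2)."""
    d2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-d2 / sigma ** 2)

def greedy_subset(X, labeled, m, sigma=1.0):
    """Grow S from the labeled points: at each step add the point x_i
    minimizing sum_{j in S} W_ij, i.e. the point farthest from S."""
    S = list(labeled)
    candidates = [i for i in range(len(X)) if i not in S]
    while len(S) < m and candidates:
        best = min(candidates,
                   key=lambda i: sum(gaussian_weight(X[i], X[j], sigma) for j in S))
        S.append(best)
        candidates.remove(best)
    return S
```

Starting from one labeled point in a small cloud, the first point added is the one far away from the cloud, which is exactly the “fill the space” behavior the random choice lacks.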
Experimental Results
This is not last-minute research, so we do have results!

Table: Comparative Classification Error (Induction)

% labeled   Method              LETTERS   MNIST   COVTYPE
1%          NoSub               56.0      35.8    47.3
            RandSub (subOnly)   59.8      29.6    44.8
            RandSub             57.4      27.7    75.7
            SmartSub            55.8      24.4    45.0
5%          NoSub               27.1      12.8    37.1
            RandSub (subOnly)   32.1      14.9    35.4
            RandSub             29.1      12.6    70.6
            SmartSub            28.5      12.3    35.8
10%         NoSub               18.8      9.5     34.7
            RandSub (subOnly)   22.5      11.4    32.4
            RandSub             20.3      9.7     64.7
            SmartSub            19.8      9.5     33.4

More comparisons between RandSub and SmartSub on 8 more UCI datasets ⇒ SmartSub always performs better.
Curse of Dimensionality
Labeling function: ŷ(x) = Σ_i ŷ_i W̄_X(x_i, x)
Locality: far-point prediction = nearest neighbor; also the normal vector ∂ŷ/∂x (x) is approximately in the span of the nearest neighbors of x
Smoothness: the function does not vary much within a ball of radius small w.r.t. σ (obtained from the second derivative); also, the form of the cost ⇒ the learned function varies smoothly in regions with no labeled examples
Curse: one needs lots of unlabeled examples to “fill” the region near the decision surface, and lots of labeled examples to account for all the “clusters”
“Almost Real-Life” Example
Need unlabeled examples along the sinusoidal decision surface, and labeled examples in each class region.
Conclusion
Simple non-parametric setting ⇒ powerful non-parametric semi-supervised algorithm
Can scale to large datasets thanks to sparsity / subset selection
Interesting links with electric networks / heat diffusion
Limitations of local weights: curse of dimensionality
Makes it possible to be on time for lunch!!!
References
Belkin, M., Matveeva, I., and Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Shawe-Taylor, J. and Singer, Y., editors, COLT’2004. Springer.
Delalleau, O., Bengio, Y., and Le Roux, N. (2005). Efficient non-parametric function induction in semi-supervised learning. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics.
Doyle, P. G. and Snell, J. L. (1984). Random walks and electric networks. Mathematical Association of America.
Zhou, D., Bousquet, O., Navin Lal, T., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Thrun, S., Saul, L., and Schölkopf, B., editors, Advances in Neural Information Processing Systems 16, Cambridge, MA. MIT Press.
Zhu, X., Ghahramani, Z., and Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In ICML’2003.