SEMI-SUPERVISED CLASSIFICATION
WITH GRAPH CONVOLUTIONAL
NETWORKS
Thomas N. Kipf, Max Welling
ICLR 2017
Presented by Devansh Shah
Semi-Supervised Learning
Goal: Learn a better prediction rule than one based on labeled data alone
Why bother?
• Unlabeled data is cheap
• Labeled data can be hard to get
  • human annotation is boring
  • labels may require experts
Can Unlabeled data help?
• Assuming each class is a coherent group (e.g. Gaussian)
• With and without unlabeled data: decision boundary shift
Can Unlabeled data help?
“Similar” data points have “similar” labels
Semi-supervised vs transductive learning
• labeled data $(X_l, Y_l) = \{(x_{1:l}, y_{1:l})\}$
• unlabeled data $X_u = \{x_{l+1:n}\}$, available during training
• test data $X_{test} = \{x_{n+1:}\}$, not available during training
Inductive learning is ultimately applied to the test data.
Transductive learning is only concerned with the unlabeled data.
Graph Convolutional Networks
Applications
• Social Networks
• Protein-Protein Interaction
• 3D Meshes
• Clustering
• Scene Graphs
Graph Learning Problem
Inputs:
• Graph $G = (V, E)$
• A feature description $x_i$ for every node $i$, summarized in an $N \times D$ feature matrix $X$ ($N$: number of nodes, $D$: number of input features)
• Adjacency matrix $A$
Outputs:
• Node-level output $Z$ (an $N \times F$ feature matrix, where $F$ is the number of output features per node)
Understanding Graph Neural Networks
Every neural network layer can be written as a non-linear function
$H^{(l+1)} = f(H^{(l)}, A)$ with
• $H^{(0)} = X$
• $H^{(L)} = Z$, where $L$ is the number of layers
Understanding Graph Neural Networks
$f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$ where
• $W^{(l)}$ is the weight matrix for the $l$-th layer
• $\sigma(\cdot)$ is a non-linear activation function such as ReLU
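A minimal NumPy sketch of this simple layer, assuming ReLU as the activation. The toy path graph, the feature values, and the identity weight matrix are illustrative choices, not taken from the slides.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU."""
    return np.maximum(x, 0.0)

def simple_graph_layer(A, H, W):
    """One layer f(H, A) = ReLU(A H W), with no self-loops and no normalization."""
    return relu(A @ H @ W)

# Toy path graph 0 - 1 - 2 (hypothetical example data).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H0 = np.array([[1., 0.],
               [0., 1.],
               [1., 1.]])   # N = 3 nodes, D = 2 input features
W0 = np.eye(2)              # identity weights keep the arithmetic easy to follow

H1 = simple_graph_layer(A, H0, W0)
print(H1)  # each row is the sum of the neighbors' feature rows
```

Node 1 receives the sum of the rows of nodes 0 and 2, while its own features do not appear in its output.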
Understanding Graph Neural Networks
Limitation I:
• Multiplication with A means that, for every node, we sum up
all the feature vectors of all neighboring nodes but not the
node itself
Fix:
• Enforce self-loops in the graph by adding the identity matrix to A
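A small sketch of the effect of self-loops, using a hypothetical 3-node path graph and one scalar feature per node. The values are chosen only so that each node's contribution is easy to spot in the sums.

```python
import numpy as np

# Toy path graph 0 - 1 - 2 (hypothetical example data).
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.array([[1.], [10.], [100.]])

# Without self-loops, node 1 aggregates 1 + 100 and ignores its own 10.
without_self = A @ X

# Adding the identity matrix inserts a self-loop at every node.
A_hat = A + np.eye(A.shape[0])
with_self = A_hat @ X

print(without_self.ravel())
print(with_self.ravel())
```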
Understanding Graph Neural Networks
Limitation II:
• A is typically not normalized and therefore the multiplication
with A will completely change the scale of the feature vectors
Fix:
• Normalize A such that all rows sum to one, i.e. $D^{-1}A$, where $D$ is the diagonal node degree matrix. Multiplying with $D^{-1}A$ now corresponds to taking the average of neighboring node features
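A sketch of row normalization on a toy graph, assuming self-loops have already been added. The helper name `row_normalize` and the guard for isolated nodes are illustrative additions, not part of the slides.

```python
import numpy as np

def row_normalize(A):
    """Return D^{-1} A, where D is the diagonal degree matrix of A."""
    deg = A.sum(axis=1)
    # Guard against isolated nodes (degree 0) to avoid division by zero.
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    return inv_deg[:, None] * A

# Path graph 0 - 1 - 2 with self-loops (hypothetical example data).
A_hat = np.array([[1., 1., 0.],
                  [1., 1., 1.],
                  [0., 1., 1.]])
X = np.array([[1.], [10.], [100.]])

P = row_normalize(A_hat)
averaged = P @ X   # each row is the mean over the node and its neighbors
print(P.sum(axis=1))
print(averaged.ravel())
```

Because every row of `P` sums to one, the output stays on the same scale as the input features regardless of node degree.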
Understanding Graph Neural Networks
Propagation Rule: $f(H^{(l)}, A) = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)})$
• $\hat{A} = A + I$, where $I$ is the identity matrix
• $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$
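The propagation rule can be sketched in NumPy as follows. This is a minimal illustration, assuming ReLU in the hidden layer, random weights, and a toy graph; the function names, layer sizes, and seed are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)              # degrees are >= 1 because of self-loops
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_layer(A_norm, H, W):
    """One GCN layer with ReLU activation."""
    return np.maximum(A_norm @ H @ W, 0.0)

# Toy path graph 0 - 1 - 2 and random inputs (hypothetical example data).
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = rng.standard_normal((3, 4))        # N = 3 nodes, D = 4 input features
W0 = rng.standard_normal((4, 8))       # hidden size 8 is an arbitrary choice
W1 = rng.standard_normal((8, 2))       # F = 2 output features

A_norm = normalize_adjacency(A)
H1 = gcn_layer(A_norm, X, W0)
Z = A_norm @ H1 @ W1                   # last layer kept linear (pre-softmax scores)
print(Z.shape)
```

The normalized adjacency is computed once and reused by every layer, since it depends only on the graph.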
Semi-Supervised Node Classification
Cross-Entropy error over all labeled examples
$$Z = \mathrm{softmax}(H^{(L)})$$
$$\mathrm{Loss} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$$
• $H^{(L)}$ is the output of the last layer
• $\mathcal{Y}_L$ is the set of node indices that have labels
• $F$ is the number of distinct output classes
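A sketch of this masked cross-entropy, assuming one-hot label rows and an index array for the labeled nodes. The helper names and the toy inputs are hypothetical; the log-softmax form is used only for numerical stability.

```python
import numpy as np

def log_softmax(H):
    """Row-wise log-softmax, computed in a numerically stable way."""
    shifted = H - H.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def masked_cross_entropy(H_last, Y_onehot, labeled_idx):
    """Cross-entropy summed over labeled nodes only."""
    log_Z = log_softmax(H_last)
    return -np.sum(Y_onehot[labeled_idx] * log_Z[labeled_idx])

# Four nodes, two classes; only nodes 0 and 2 carry labels (toy example).
H_last = np.zeros((4, 2))              # uniform scores: every Z_lf equals 0.5
Y_onehot = np.array([[1., 0.],
                     [0., 0.],         # unlabeled node, row is never read
                     [0., 1.],
                     [0., 0.]])        # unlabeled node, row is never read
labeled_idx = np.array([0, 2])

loss = masked_cross_entropy(H_last, Y_onehot, labeled_idx)
print(loss)  # two labeled nodes, each contributing ln 2
```

Unlabeled nodes still influence the loss indirectly, because their features are mixed into the labeled nodes' representations during propagation.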
Experiments
Datasets
Experiments
Baselines
• Label Propagation (LP)
• Semi-Supervised embedding (SemiEmb)
• Manifold regularization (ManiReg)
• skip-gram based graph embeddings (DeepWalk)
• Iterative classification algorithm (ICA)
Experiments
Results
Robust Graph Convolutional Networks Against
Adversarial Attacks
Dingyuan Zhu, Ziwei Zhang, Peng Cui, Wenwu Zhu
ACM SIGKDD 2019
Presented by Devansh Shah
Adversarial Attacks on Graphs
RELATED WORK
• Adversarial Attack on Graph Structured Data
• Adversarial Attacks on Neural Networks for Graph Data
Graph adversarial attack
Transductive Node Classification Setting
• A single graph $G_0 = (V_0, E_0)$ is considered in the entire dataset
• A target node $c_i \in V_i$ of graph $G_i$ is associated with a corresponding node label $y_i \in \mathcal{Y}$
• Test nodes (but not their labels) are also observed during training
• $\mathcal{D}^{(tra)} = \{(G_0, c_i, y_i)\}_{i=1}^{N}$
Graph adversarial attack
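Problem Definition
Given:
• A learned classifier $f$
• An instance from the dataset $(G, c, y) \in \mathcal{D}$
The graph adversarial attacker $g(\cdot, \cdot): \mathcal{G} \times \mathcal{D} \to \mathcal{G}$ modifies the graph $G = (V, E)$ into $\tilde{G} = (\tilde{V}, \tilde{E})$ such that
$$\max_{\tilde{G}} \; \mathbb{1}(f(\tilde{G}, c) \neq y)$$
$$\text{s.t.} \quad \tilde{G} = g(f, (G, c, y)), \quad Eq(G, \tilde{G}, c) = 1$$
Here $Eq(\cdot, \cdot, \cdot): \mathcal{G} \times \mathcal{G} \times V \to \{0, 1\}$ is an equivalency indicator that tells whether two graphs $G$ and $\tilde{G}$ are semantically equivalent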
Graph adversarial attack
Robust Graph Convolutional Network (RGCN)
Crux of the paper
• Instead of representing nodes as vectors, they are represented
as Gaussian distributions in each convolutional layer
• When the graph is attacked, the model can automatically
absorb the effects of adversarial changes in the variances of
the Gaussian distributions
• To limit the propagation of adversarial attacks in GCNs, a variance-based attention mechanism is used when performing convolutions
Gaussian-based Graph Convolution Layer
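Latent representation of node $v_i$ in layer $l$:
$$h_i^{(l)} = \mathcal{N}(\mu_i^{(l)}, \mathrm{diag}(\sigma_i^{(l)}))$$
$\mu_i^{(l)} \in \mathbb{R}^{f_l}$ is the mean vector
$\mathrm{diag}(\sigma_i^{(l)}) \in \mathbb{R}^{f_l \times f_l}$ is the diagonal variance matrix
Notation:
$M^{(l)} = [\mu_1^{(l)}, \dots, \mu_N^{(l)}] \in \mathbb{R}^{N \times f_l}$ is the mean matrix
$\mathrm{Cov}^{(l)} = [\sigma_1^{(l)}, \dots, \sigma_N^{(l)}] \in \mathbb{R}^{N \times f_l}$ is the variance matrix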
RGCN
RGCN
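Theorem
If $x_i \sim \mathcal{N}(\mu_i, \mathrm{diag}(\sigma_i))$, $i = 1, \dots, n$, and they are independent, then for any fixed weights $w_i$, we have:
$$\sum_{i=1}^{n} w_i x_i \sim \mathcal{N}\Big(\sum_{i=1}^{n} w_i \mu_i, \; \mathrm{diag}\Big(\sum_{i=1}^{n} w_i^2 \sigma_i\Big)\Big)$$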
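A quick numerical check of the theorem, assuming two independent 2-dimensional Gaussians with diagonal variances. The weights, means, variances, sample count, and seed are illustrative values; the Monte Carlo part only confirms the closed form approximately.

```python
import numpy as np

def combine_gaussians(weights, means, variances):
    """Mean and diagonal variance of sum_i w_i x_i for independent Gaussians x_i."""
    w = np.asarray(weights)[:, None]
    return (w * means).sum(axis=0), (w ** 2 * variances).sum(axis=0)

w = np.array([0.5, 2.0])
mu = np.array([[1.0, 0.0],
               [0.0, 3.0]])            # one row per Gaussian
var = np.array([[4.0, 1.0],
                [1.0, 0.25]])          # diagonal variances, one row per Gaussian

mean_sum, var_sum = combine_gaussians(w, mu, var)

# Monte Carlo check: draw samples of each x_i and form the weighted sum.
rng = np.random.default_rng(0)
samples = mu + np.sqrt(var) * rng.standard_normal((200_000, 2, 2))
weighted = (w[None, :, None] * samples).sum(axis=1)

print(mean_sum, weighted.mean(axis=0))
print(var_sum, weighted.var(axis=0))
```

Note that the weights enter the variance squared, which is why the variance update in RGCN uses a different normalization from the mean update.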
RGCN Node Aggregation
To prevent the propagation of adversarial attacks in GCNs, we propose an attention mechanism that assigns different weights to neighbors based on their variances, since larger variances indicate more uncertainty in the latent representations and a larger probability of having been attacked
$$\alpha_j^{(l)} = \exp(-\gamma \sigma_j^{(l)})$$
Here $\alpha_j^{(l)}$ are the attention weights of node $v_j$ in layer $l$ and $\gamma$ is a hyper-parameter
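A sketch of the attention weights, assuming the variances are stored as an N × f matrix with one row per node. The value of `gamma` and the toy variances are arbitrary illustrations.

```python
import numpy as np

def variance_attention(var, gamma=1.0):
    """Element-wise attention alpha = exp(-gamma * var); gamma = 1.0 is an assumed default."""
    return np.exp(-gamma * var)

# Row 0: a low-variance (confident) node; row 1: a high-variance node.
var = np.array([[0.0, 0.1],
                [2.0, 5.0]])
alpha = variance_attention(var, gamma=1.0)
print(alpha)
```

Since variances are non-negative, every weight lies in (0, 1], and the high-variance node is down-weighted in each feature dimension.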
RGCN Node Aggregation
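$$\mu_i^{(l+1)} = \mathrm{ReLU}\Big(\sum_{j \in ne(i)} \frac{1}{\sqrt{D_{i,i} D_{j,j}}} \big(\mu_j^{(l)} \odot \alpha_j^{(l)}\big) W_\mu^{(l)}\Big)$$
$$\sigma_i^{(l+1)} = \mathrm{ReLU}\Big(\sum_{j \in ne(i)} \frac{1}{D_{i,i} D_{j,j}} \big(\sigma_j^{(l)} \odot \alpha_j^{(l)} \odot \alpha_j^{(l)}\big) W_\sigma^{(l)}\Big)$$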
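The two aggregation rules can be written in matrix form. This is a sketch under stated assumptions: the neighborhood ne(i) is taken to include node i itself (self-loops added), degrees are computed from that augmented adjacency, and all inputs are random toy data. The function name `rgcn_layer` is hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def rgcn_layer(A, M, Var, W_mu, W_sigma, gamma=1.0):
    """One Gaussian-based graph convolution on means M and variances Var."""
    A_hat = A + np.eye(A.shape[0])             # assumption: ne(i) includes i
    d = A_hat.sum(axis=1)
    A_mean = A_hat / np.sqrt(np.outer(d, d))   # entries 1 / sqrt(D_ii D_jj)
    A_var = A_hat / np.outer(d, d)             # entries 1 / (D_ii D_jj)
    alpha = np.exp(-gamma * Var)               # variance-based attention
    M_next = relu(A_mean @ (M * alpha) @ W_mu)
    Var_next = relu(A_var @ (Var * alpha * alpha) @ W_sigma)
    return M_next, Var_next

# Toy path graph 0 - 1 - 2 with random inputs (hypothetical example data).
rng = np.random.default_rng(0)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
M = rng.standard_normal((3, 4))
Var = rng.uniform(0.1, 1.0, size=(3, 4))
W_mu = rng.standard_normal((4, 2))
W_sigma = rng.standard_normal((4, 2))

M_next, Var_next = rgcn_layer(A, M, Var, W_mu, W_sigma, gamma=1.0)
print(M_next.shape, Var_next.shape)
```

The variance path uses squared attention and the squared normalization coefficient, mirroring how weights enter the variance of a weighted sum of independent Gaussians.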
Loss Functions
Considering that the hidden representations of our method are
Gaussian distributions, we first adopt a sampling process in the last
hidden layer
$$z_i \sim \mathcal{N}(\mu_i^{(L)}, \mathrm{diag}(\sigma_i^{(L)}))$$
Next $z_i$ is passed to a softmax function to get the predicted labels:
$$\hat{Y} = \mathrm{softmax}(Z), \quad Z = [z_1, \dots, z_n]$$
$\mathcal{L}_{cls}$ is the cross-entropy loss between the actual labels and the predicted probabilities for the labelled nodes
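A sketch of the sampling step followed by softmax. Drawing the sample as mean plus noise scaled by the standard deviation (the reparameterization form) is an assumption made here for illustration; the slides only state that a sample is drawn. The inputs are toy values.

```python
import numpy as np

def sample_logits(M_last, Var_last, rng):
    """Draw z_i ~ N(mu_i, diag(sigma_i)) as mu + eps * sqrt(sigma), with eps ~ N(0, I)."""
    eps = rng.standard_normal(M_last.shape)
    return M_last + eps * np.sqrt(Var_last)

def softmax(Z):
    """Row-wise softmax, shifted for numerical stability."""
    e = np.exp(Z - Z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
M_last = np.array([[2.0, 0.0],
                   [0.0, 1.0],
                   [0.5, 0.5]])        # last-layer means: 3 nodes, 2 classes
Var_last = np.full((3, 2), 0.1)        # last-layer variances

Z = sample_logits(M_last, Var_last, rng)
Y_hat = softmax(Z)                     # predicted class probabilities per node
print(Y_hat)
```

With zero variance the sample collapses to the mean, so the model then behaves like a deterministic classifier on the mean vectors.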
Loss Functions
To ensure that the learned representations are indeed Gaussian
distributions, we use an explicit regularization to constrain the
latent representations in the first layer as follows
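$$\mathcal{L}_{reg1} = \sum_{i=1}^{n} KL\big(\mathcal{N}(\mu_i, \mathrm{diag}(\sigma_i)) \,\|\, \mathcal{N}(0, I)\big)$$
where $KL(\cdot \| \cdot)$ is the KL-divergence between two distributions
We also impose L2 regularization on parameters of the first layer as follows:
$$\mathcal{L}_{reg2} = \big\|W_\mu^{(0)}\big\|_2^2 + \big\|W_\sigma^{(0)}\big\|_2^2$$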
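Both regularizers have simple closed forms. The sketch below uses the standard closed-form KL divergence between a diagonal Gaussian and N(0, I), treating each sigma entry as a variance, and interprets the squared norm of a weight matrix as the sum of its squared entries (an assumption). Function names and inputs are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(M, Var):
    """Sum over nodes of KL(N(mu_i, diag(sigma_i)) || N(0, I)); Var must be positive."""
    return 0.5 * np.sum(Var + M ** 2 - 1.0 - np.log(Var))

def l2_penalty(W_mu, W_sigma):
    """Squared L2 penalty on the first-layer weights (sum of squared entries)."""
    return np.sum(W_mu ** 2) + np.sum(W_sigma ** 2)

# A standard normal has zero divergence from itself.
kl_zero = kl_to_standard_normal(np.zeros((2, 3)), np.ones((2, 3)))

# Shifting a 1-d mean to 1 with unit variance costs 0.5 * (1 + 1 - 1 - 0) = 0.5.
kl_shift = kl_to_standard_normal(np.array([[1.0]]), np.array([[1.0]]))

reg2 = l2_penalty(np.array([[1.0, 2.0]]), np.array([[3.0]]))
print(kl_zero, kl_shift, reg2)
```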
Loss Functions
$$\mathcal{L} = \mathcal{L}_{cls} + \beta_1 \mathcal{L}_{reg1} + \beta_2 \mathcal{L}_{reg2}$$
where $\beta_1$ and $\beta_2$ are hyper-parameters that control the impact of the different regularizations
Results
Node Classification on Clean Datasets
RGCN slightly outperforms the baseline methods on Pubmed,
while having comparable performance on Cora and Citeseer
Results
Against Non-targeted Adversarial Attacks
Results
Against Targeted Adversarial Attacks
Thank You!