This Talk - Jian Tang's Homepage · Training the Model Tutorial on Graph Representation Learning,...
This Talk
Tutorial on Graph Representation Learning, AAAI 2019 1
§ 1) Node embeddings
  § Map nodes to low-dimensional embeddings.
§ 2) Graph neural networks
  § Deep learning architectures for graph-structured data.
§ 3) Generative graph models
  § Learning to generate realistic graph data.
Part 2: Graph Neural Networks
Embedding Nodes
• Goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.
Embedding Nodes
Goal: $\text{similarity}(u, v) \approx \mathbf{z}_v^\top \mathbf{z}_u$
Need to define!
Two Key Components
§ Encoder maps each node to a low-dimensional vector.
§ Similarity function specifies how relationships in vector space map to relationships in the original network.
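$\text{enc}(v) = \mathbf{z}_v$, where $v$ is a node in the input graph and $\mathbf{z}_v$ is its d-dimensional embedding.
$\text{similarity}(u, v) \approx \mathbf{z}_v^\top \mathbf{z}_u$, where the left-hand side is the similarity of u and v in the original network and the right-hand side is the dot product between node embeddings.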
From “Shallow” to “Deep”
§ So far we have focused on “shallow” encoders, i.e. embedding lookups:
$\mathbf{Z} \in \mathbb{R}^{d \times |V|}$ is the embedding matrix: d is the dimension/size of the embeddings, there is one column per node, and each column is the embedding vector for a specific node.
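A shallow encoder amounts to a column lookup in the embedding matrix. The sketch below is a minimal NumPy illustration of that idea. The toy sizes, the random initialization, and the helper names `enc` and `similarity` are assumptions made for the example and do not come from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

num_nodes = 5   # |V| (assumed toy size)
dim = 3         # d, the embedding dimension (assumed)

# Embedding matrix Z: d rows, one column per node.
Z = rng.normal(size=(dim, num_nodes))

def enc(v: int) -> np.ndarray:
    """Shallow encoder: look up column v of Z."""
    return Z[:, v]

def similarity(u: int, v: int) -> float:
    """Dot-product similarity between the embeddings of u and v."""
    return float(enc(v) @ enc(u))

print(enc(2).shape)   # (3,)
print(Z.size)         # 15 parameters, i.e. d * |V|
```

Every entry of `Z` is a free parameter, so the parameter count grows linearly with the number of nodes.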
From “Shallow” to “Deep”
§ Limitations of shallow encoding:
  § O(|V|) parameters are needed: there is no parameter sharing and every node has its own unique embedding vector.
  § Inherently “transductive”: It is impossible to generate embeddings for nodes that were not seen during training.
  § Does not incorporate node features: Many graphs have features that we can and should leverage.
From “Shallow” to “Deep”
§ We will now discuss “deeper” methods based on graph neural networks.
§ In general, all of these more complex encoders can be combined with the similarity functions from the previous section.
enc(v) = complex function that depends on graph structure.
Outline for this Section
§ We will now discuss “deeper” methods based on graph neural networks.
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Graph Attention Networks
6. Subgraph Embeddings
The Basics: Graph Neural Networks
Based on material from:
• Hamilton et al. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data Engineering Bulletin on Graph Systems.
• Scarselli et al. 2005. The Graph Neural Network Model. IEEE Transactions on Neural Networks.
Setup
§ Assume we have a graph G:
§ V is the vertex set.
§ A is the adjacency matrix (assume binary).
§ $\mathbf{X} \in \mathbb{R}^{m \times |V|}$ is a matrix of node features.
  § Categorical attributes, text, image data
    – E.g., profile information in a social network.
  § Node degrees, clustering coefficients, etc.
  § Indicator vectors (i.e., one-hot encoding of each node)
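To make the setup concrete, the following sketch builds a binary adjacency matrix and two of the feature choices listed above (node degrees and indicator vectors). The edge list is invented for illustration. Storing features one column per node, with m rows for m features, is an assumption of the sketch.

```python
import numpy as np

# Toy undirected graph on nodes A..F; the edge list is made up for illustration.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "E"), ("C", "F")]
index = {name: i for i, name in enumerate(nodes)}

# Binary adjacency matrix A (symmetric because the graph is undirected).
A = np.zeros((len(nodes), len(nodes)), dtype=int)
for u, v in edges:
    A[index[u], index[v]] = 1
    A[index[v], index[u]] = 1

# Two example feature matrices, stored one column per node.
degrees = A.sum(axis=0)
X_degree = degrees[np.newaxis, :].astype(float)   # m = 1: node degree
X_onehot = np.eye(len(nodes))                     # m = |V|: indicator vectors
```

Indicator features make the model transductive again, since a new node has no column in `X_onehot`. Degree or attribute features do not have this limitation.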
Neighborhood Aggregation
§ Key idea: Generate node embeddings based on local neighborhoods.
[Figure: input graph with nodes A to F, and the computation graph built from the target node's local neighborhood]
Neighborhood Aggregation
§ Intuition: Nodes aggregate information from their neighbors using neural networks
Neighborhood Aggregation
§ Intuition: Network neighborhood defines a computation graph.
Every node defines a unique computation graph!
Neighborhood Aggregation
§ Nodes have embeddings at each layer.
§ Model can be arbitrary depth.
§ “layer-0” embedding of node u is its input feature, i.e. $\mathbf{x}_u$.
[Figure: computation graph for the target node, with Layer-0 inputs $\mathbf{x}_A$, $\mathbf{x}_B$, $\mathbf{x}_C$, $\mathbf{x}_E$, $\mathbf{x}_F$ feeding the Layer-1 and Layer-2 embeddings]
Neighborhood “Convolutions”
§ Neighborhood aggregation can be viewed as a center-surround filter.
§ Mathematically related to spectral graph convolutions (see Bronstein et al., 2017)
Neighborhood Aggregation
[Figure: computation graph for the target node, with each aggregation box marked “?”. What’s in the box!?]
§ Key distinctions are in how different approaches aggregate information across the layers.
Neighborhood Aggregation
§ Basic approach: Average neighbor information and apply a neural network.
1) average messages from neighbors
2) apply neural network
The Math
§ Basic approach: Average neighbor messages and apply a neural network.
$$\mathbf{h}_v^0 = \mathbf{x}_v$$
$$\mathbf{h}_v^k = \sigma\left(\mathbf{W}_k \sum_{u \in N(v)} \frac{\mathbf{h}_u^{k-1}}{|N(v)|} + \mathbf{B}_k \mathbf{h}_v^{k-1}\right), \quad \forall k > 0$$
§ Initial “layer 0” embeddings are equal to node features.
§ $\mathbf{h}_v^k$ is the kth layer embedding of v.
§ $\sigma$ is a non-linearity (e.g., ReLU or tanh).
§ The sum term is the average of neighbor’s previous layer embeddings.
§ $\mathbf{h}_v^{k-1}$ is the previous layer embedding of v.
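The update rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. The function name `aggregate_layer`, the adjacency-list format, the ReLU choice, and the toy values are all assumptions. It also assumes every node has at least one neighbor, since the mean over an empty neighborhood is undefined.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def aggregate_layer(H_prev, neighbors, W, B):
    """One layer of mean-neighbor aggregation.

    H_prev:    (d_in, |V|) previous-layer embeddings, one column per node.
    neighbors: neighbors[v] is the list of neighbor indices of node v.
    W, B:      (d_out, d_in) weight matrices for the neighbor average and for self.
    """
    H_next = np.zeros((W.shape[0], H_prev.shape[1]))
    for v, nbrs in enumerate(neighbors):
        mean_nbr = H_prev[:, nbrs].mean(axis=1)   # average of neighbor embeddings
        H_next[:, v] = relu(W @ mean_nbr + B @ H_prev[:, v])
    return H_next

# Toy path graph 0 - 1 - 2 with 2-dimensional features (all values assumed).
neighbors = [[1], [0, 2], [1]]
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])
W = np.eye(2)
B = np.eye(2)

H1 = aggregate_layer(X, neighbors, W, B)   # layer-1 embeddings
```

With identity weights, node 1 ends up with the average of its two neighbors plus its own feature vector, i.e. `[1.5, 1.0]`.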
Training the Model
§ How do we train the model to generate “high-quality” embeddings?
Need to define a loss function on the embeddings, $\mathcal{L}(\mathbf{z}_u)$!
Training the Model
§ After K-layers of neighborhood aggregation, we get output embeddings for each node.
§ We can feed these embeddings into any loss function and run stochastic gradient descent to train the aggregation parameters.
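$$\mathbf{h}_v^0 = \mathbf{x}_v$$
$$\mathbf{h}_v^k = \sigma\left(\mathbf{W}_k \sum_{u \in N(v)} \frac{\mathbf{h}_u^{k-1}}{|N(v)|} + \mathbf{B}_k \mathbf{h}_v^{k-1}\right), \quad \forall k \in \{1, \ldots, K\}$$
$$\mathbf{z}_v = \mathbf{h}_v^K$$
$\mathbf{W}_k$ and $\mathbf{B}_k$ are the trainable matrices (i.e., what we learn).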
Training the Model
§ Train in an unsupervised manner using only the graph structure.
§ Unsupervised loss function can be anything from the last section, e.g., based on
  § Random walks (node2vec, DeepWalk)
  § Graph factorization
  § i.e., train the model so that “similar” nodes have similar embeddings.
Training the Model
§ Alternative: Directly train the model for a supervised task (e.g., node classification):
[Figure: nodes in a graph, e.g., an online social network, each to be classified as “Human or bot?”]
Training the Model
§ Alternative: Directly train the model for a supervised task (e.g., node classification):
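$$\mathcal{L} = -\sum_{v \in V} \left[ y_v \log\left(\sigma(\mathbf{z}_v^\top \theta)\right) + (1 - y_v) \log\left(1 - \sigma(\mathbf{z}_v^\top \theta)\right) \right]$$
§ $\mathbf{z}_v$ is the output node embedding.
§ $\theta$ are the classification weights.
§ $y_v$ is the node class label (e.g., human or bot).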
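A minimal NumPy sketch of this supervised loss follows. It is written as a negative log-likelihood (with a leading minus sign) so that lower values are better. The function name `node_classification_loss` and the toy embeddings, weights, and labels are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def node_classification_loss(Z, theta, y):
    """Binary cross-entropy summed over nodes.

    Z:     (d, |V|) output embeddings, one column per node.
    theta: (d,) classification weights.
    y:     (|V|,) labels in {0, 1}.
    """
    p = sigmoid(Z.T @ theta)   # predicted probability of class 1 for each node
    return float(-np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Toy values (assumed): three nodes with 2-dimensional embeddings.
Z = np.array([[2.0, -2.0, 0.0],
              [0.0,  0.0, 0.0]])
theta = np.array([1.0, 0.0])
y_good = np.array([1, 0, 1])   # labels that agree with the sign of z^T theta
y_bad = np.array([0, 1, 1])    # labels that disagree for the first two nodes

loss_good = node_classification_loss(Z, theta, y_good)
loss_bad = node_classification_loss(Z, theta, y_bad)
```

In a real model the gradient of this loss would flow back through the embeddings into the aggregation parameters. The sketch only evaluates the loss.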
Overview of Model
1) Define a neighborhood aggregation function.
2) Define a loss function on the embeddings, $\mathcal{L}(\mathbf{z}_u)$.
Overview of Model
3) Train on a set of nodes, i.e., a batch of compute graphs
Overview of Model
4) Generate embeddings for nodes as needed
Even for nodes we never trained on!
Inductive Capability
[Figure: input graph with nodes A to F, the compute graph for node A, and the compute graph for node B; both compute graphs use the shared parameters $\mathbf{W}_k$ and $\mathbf{B}_k$]
§ The same aggregation parameters are shared for all nodes.
§ The number of model parameters is sublinear in |V| and we can generalize to unseen nodes!
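The sketch below illustrates why parameter sharing gives inductive capability: the same weight matrices are applied to two graphs of different sizes. The random matrices stand in for trained parameters; no training happens here. The function name `mean_aggregate` and both toy graphs are assumptions.

```python
import numpy as np

def mean_aggregate(H, neighbors, W, B):
    """Mean-neighbor aggregation with ReLU; H has one column per node."""
    out = np.zeros((W.shape[0], H.shape[1]))
    for v, nbrs in enumerate(neighbors):
        out[:, v] = np.maximum(W @ H[:, nbrs].mean(axis=1) + B @ H[:, v], 0.0)
    return out

rng = np.random.default_rng(0)
d_in, d_out = 4, 2
W = rng.normal(size=(d_out, d_in))   # shared by every node
B = rng.normal(size=(d_out, d_in))   # shared by every node

# "Training" graph: a triangle (3 nodes).
train_neighbors = [[1, 2], [0, 2], [0, 1]]
X_train = rng.normal(size=(d_in, 3))

# Unseen graph: a 5-node ring that played no role in creating W and B.
new_neighbors = [[4, 1], [0, 2], [1, 3], [2, 4], [3, 0]]
X_new = rng.normal(size=(d_in, 5))

Z_train = mean_aggregate(X_train, train_neighbors, W, B)
Z_new = mean_aggregate(X_new, new_neighbors, W, B)

num_params = W.size + B.size   # depends only on d_in and d_out, not on |V|
```

Because `W` and `B` have shapes fixed by the feature and embedding dimensions, the same function produces embeddings for any graph whose nodes carry `d_in` features.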
Inductive Capability
Inductive node embeddings generalize to entirely unseen graphs,
e.g., train on a protein interaction graph from model organism A and generate embeddings on newly collected data about organism B.
[Figure: train on one graph, then generalize to a new graph]
Inductive Capability
Many application settings constantly encounter previously unseen nodes,
e.g., Reddit, YouTube, Google Scholar, …
Need to generate new embeddings “on the fly”.
[Figure: train with a snapshot; a new node arrives; generate an embedding for the new node]
Quick Recap
§ Recap: Generate node embeddings by aggregating neighborhood information.
§ Allows for parameter sharing in the encoder.
§ Allows for inductive learning.
§ We saw a basic variant of this idea; now we will cover some state-of-the-art variants from the literature.
Neighborhood Aggregation
[Figure: computation graph for the target node, with each aggregation box marked “?”. What else can we put in the box?]
§ Key distinctions are in how different approaches aggregate messages.
Graph Convolutional Networks
Based on material from:
• Kipf et al., 2017. Semi-Supervised Classification with Graph Convolutional Networks. ICLR.
Graph Convolutional Networks
§ Kipf et al.’s Graph Convolutional Networks (GCNs) are a slight variation on the neighborhood aggregation idea:
$$\mathbf{h}_v^k = \sigma\left(\mathbf{W}_k \sum_{u \in N(v) \cup v} \frac{\mathbf{h}_u^{k-1}}{\sqrt{|N(u)||N(v)|}}\right)$$
Graph Convolutional Networks
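Basic Neighborhood Aggregation
$$\mathbf{h}_v^k = \sigma\left(\mathbf{W}_k \sum_{u \in N(v)} \frac{\mathbf{h}_u^{k-1}}{|N(v)|} + \mathbf{B}_k \mathbf{h}_v^{k-1}\right)$$
VS.
GCN Neighborhood Aggregation
$$\mathbf{h}_v^k = \sigma\left(\mathbf{W}_k \sum_{u \in N(v) \cup v} \frac{\mathbf{h}_u^{k-1}}{\sqrt{|N(u)||N(v)|}}\right)$$
§ same matrix for self and neighbor embeddings
§ per-neighbor normalization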
Graph Convolutional Networks
§ Empirically, they found this configuration to give the best results.
  § More parameter sharing.
  § Down-weights high degree neighbors.
$$\mathbf{h}_v^k = \sigma\left(\mathbf{W}_k \sum_{u \in N(v) \cup v} \frac{\mathbf{h}_u^{k-1}}{\sqrt{|N(u)||N(v)|}}\right)$$
§ Use the same transformation matrix for self and neighbor embeddings.
§ Instead of simple average, normalization varies across neighbors.
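The GCN-style update can be sketched as follows. The code follows the formula as written on the slide, normalizing by the neighbor counts |N(u)| and |N(v)|; the published GCN computes degrees after adding self-loops, which changes the constants slightly. The function name `gcn_layer`, the ReLU choice, and the star-graph example are assumptions.

```python
import numpy as np

def gcn_layer(H_prev, neighbors, W):
    """GCN-style aggregation following the slide's formula.

    Each message from u to v is scaled by 1 / sqrt(|N(u)| * |N(v)|), and the
    node itself is included in the sum with the same weight matrix W.
    """
    deg = np.array([len(nbrs) for nbrs in neighbors], dtype=float)
    H_next = np.zeros((W.shape[0], H_prev.shape[1]))
    for v, nbrs in enumerate(neighbors):
        total = np.zeros(H_prev.shape[0])
        for u in list(nbrs) + [v]:                   # N(v) together with v itself
            total += H_prev[:, u] / np.sqrt(deg[u] * deg[v])
        H_next[:, v] = np.maximum(W @ total, 0.0)    # ReLU non-linearity
    return H_next

# Star graph (toy example): node 0 is connected to nodes 1, 2 and 3.
neighbors = [[1, 2, 3], [0], [0], [0]]
X = np.ones((1, 4))      # one scalar feature per node, all equal to 1
W = np.array([[1.0]])

H1 = gcn_layer(X, neighbors, W)
```

For leaf node 1, the message from the degree-3 hub is scaled by 1/√3 while its own embedding keeps weight 1. This shows the down-weighting of high-degree neighbors.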
Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Graph Attention Networks
6. Subgraph Embeddings
GraphSAGE
Based on material from:
• Hamilton et al., 2017. Inductive Representation Learning on Large Graphs. NIPS.
GraphSAGE Idea
§ So far we have aggregated the neighbor messages by taking their (weighted) average. Can we do better?
GraphSAGE Idea
$$\mathbf{h}_v^k = \sigma\left(\left[\mathbf{W}_k \cdot \text{agg}\left(\{\mathbf{h}_u^{k-1}, \forall u \in N(v)\}\right), \mathbf{B}_k \mathbf{h}_v^{k-1}\right]\right)$$
agg: any differentiable function that maps a set of vectors to a single vector.
GraphSAGE Differences
§ Simple neighborhood aggregation:
$$\mathbf{h}_v^k = \sigma\left(\mathbf{W}_k \sum_{u \in N(v)} \frac{\mathbf{h}_u^{k-1}}{|N(v)|} + \mathbf{B}_k \mathbf{h}_v^{k-1}\right)$$
§ GraphSAGE:
$$\mathbf{h}_v^k = \sigma\left(\left[\mathbf{W}_k \cdot \text{agg}\left(\{\mathbf{h}_u^{k-1}, \forall u \in N(v)\}\right), \mathbf{B}_k \mathbf{h}_v^{k-1}\right]\right)$$
  § generalized aggregation
  § concatenate self embedding and neighbor embedding
GraphSAGE Variants
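§ Mean:
$$\text{agg} = \sum_{u \in N(v)} \frac{\mathbf{h}_u^{k-1}}{|N(v)|}$$
§ Pool: Transform neighbor vectors and apply symmetric vector function.
$$\text{agg} = \gamma\left(\{\mathbf{Q}\mathbf{h}_u^{k-1}, \forall u \in N(v)\}\right)$$
where $\gamma$ is an element-wise mean/max.
§ LSTM: Apply LSTM to random permutation of neighbors.
$$\text{agg} = \text{LSTM}\left([\mathbf{h}_u^{k-1}, \forall u \in \pi(N(v))]\right)$$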
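The mean and pool aggregators, together with the concatenation step, can be sketched as below. The LSTM variant is left out to keep the example short. The function names, the ReLU choice, and the toy values are assumptions, and the pool aggregator here follows the slide's simplified form (a linear transform followed by an element-wise max).

```python
import numpy as np

def agg_mean(H_nbrs):
    """Mean aggregator: average the neighbor columns."""
    return H_nbrs.mean(axis=1)

def agg_pool(H_nbrs, Q):
    """Pool aggregator: transform each neighbor with Q, then element-wise max."""
    return (Q @ H_nbrs).max(axis=1)

def sage_update(h_self, H_nbrs, W, B, agg):
    """Concatenate the transformed neighbor summary and self embedding, then ReLU."""
    combined = np.concatenate([W @ agg(H_nbrs), B @ h_self])
    return np.maximum(combined, 0.0)

# Toy values (assumed): a node with two neighbors and 2-dimensional embeddings.
h_self = np.array([1.0, -1.0])
H_nbrs = np.array([[1.0, 3.0],
                   [2.0, 0.0]])   # one column per neighbor
W = np.eye(2)
B = np.eye(2)
Q = np.eye(2)

h_mean = sage_update(h_self, H_nbrs, W, B, agg_mean)
h_pool = sage_update(h_self, H_nbrs, W, B, lambda H: agg_pool(H, Q))
```

Because of the concatenation, the output has twice the dimension of each transformed part. Both aggregators are symmetric, so reordering the neighbor columns leaves the result unchanged.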
Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Graph Attention Networks
6. Subgraph Embeddings
Gated Graph Neural Networks
Based on material from:
• Li et al., 2016. Gated Graph Sequence Neural Networks. ICLR.
• Gilmer et al., 2017. Neural Message Passing for Quantum Chemistry. ICML.
Neighborhood Aggregation
§ Basic idea: Nodes aggregate “messages” from their neighbors using neural networks
Neighborhood Aggregation
§ GCNs and GraphSAGE are generally only 2-3 layers deep.
Neighborhood Aggregation
§ But what if we want to go deeper?
[Figure: computation graph for the target node unrolled over many layers. 10+ layers!?]
Gated Graph Neural Networks
§ How can we build models with many layers of neighborhood aggregation?
§ Challenges:
  § Overfitting from too many parameters.
  § Vanishing/exploding gradients during backpropagation.
§ Idea: Use techniques from modern recurrent neural networks!
Gated Graph Neural Networks
§ Idea 1: Parameter sharing across layers.
[Figure: computation graph for the target node, with the same neural network used across layers]
Gated Graph Neural Networks
§ Idea 2: Recurrent state update.
[Figure: computation graph for the target node, with an RNN module applied at each aggregation step]
The Math
§ Intuition: Neighborhood aggregation with RNN state update.
1. Get “message” from neighbors at step k:
$$\mathbf{m}_v^k = \mathbf{W} \sum_{u \in N(v)} \mathbf{h}_u^{k-1}$$
The aggregation function does not depend on k.
2. Update node “state” using Gated Recurrent Unit (GRU). New node state depends on the old state and the message from neighbors:
$$\mathbf{h}_v^k = \text{GRU}(\mathbf{h}_v^{k-1}, \mathbf{m}_v^k)$$
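The two steps can be sketched with a hand-written GRU cell in NumPy. This is a minimal illustration under stated assumptions. Biases are omitted, the gate convention is one of several equivalent ones, the weights are random rather than trained, and the names `gru_cell` and `ggnn_forward` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(h, m, p):
    """Minimal GRU update (biases omitted): old state h, incoming message m."""
    z = sigmoid(p["Wz"] @ m + p["Uz"] @ h)               # update gate
    r = sigmoid(p["Wr"] @ m + p["Ur"] @ h)               # reset gate
    h_tilde = np.tanh(p["Wh"] @ m + p["Uh"] @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde

def ggnn_forward(X, neighbors, W, p, num_steps):
    """Run num_steps rounds of message passing with shared W and GRU weights."""
    H = X.copy()
    for _ in range(num_steps):
        # Step 1: m_v = W * (sum of neighbor states); the same W is used at every step.
        M = np.stack([W @ H[:, nbrs].sum(axis=1) for nbrs in neighbors], axis=1)
        # Step 2: recurrent state update for every node.
        H = np.stack([gru_cell(H[:, v], M[:, v], p) for v in range(H.shape[1])], axis=1)
    return H

rng = np.random.default_rng(0)
d = 3
neighbors = [[1], [0, 2], [1, 3], [2]]   # toy path graph 0 - 1 - 2 - 3
X = rng.normal(size=(d, len(neighbors)))
W = rng.normal(size=(d, d)) * 0.1
p = {name: rng.normal(size=(d, d)) * 0.1
     for name in ["Wz", "Uz", "Wr", "Ur", "Wh", "Uh"]}

H20 = ggnn_forward(X, neighbors, W, p, num_steps=20)
```

The number of parameters stays the same whether `num_steps` is 2 or 20, because every step reuses `W` and the GRU weights.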
Gated Graph Neural Networks
§ Can handle models with >20 layers.
§ Most real-world networks have small diameters (e.g., less than 7).
§ Allows for complex information about global graph structure to be propagated to all nodes.
Gated Graph Neural Networks
§ Useful for complex networks representing:
  § Logical formulas.
  § Programs.
Message-Passing Neural Networks
§ Idea: We can generalize the gated graph neural network idea:
1. Get “message” from neighbors at step k:
$$\mathbf{m}_v^k = \sum_{u \in N(v)} M(\mathbf{h}_u^{k-1}, \mathbf{h}_v^{k-1}, \mathbf{e}_{u,v})$$
M is a generic “message” function (e.g., sum or MLP). It can incorporate edge features $\mathbf{e}_{u,v}$.
2. Update node “state”:
$$\mathbf{h}_v^k = U(\mathbf{h}_v^{k-1}, \mathbf{m}_v^k)$$
U is a generic update function (e.g., LSTM or GRU).
Message-Passing Neural Networks
§ This is a general conceptual framework that subsumes most GNNs.
1. Get “message” from neighbors at step k:
$$\mathbf{m}_v^k = \sum_{u \in N(v)} M(\mathbf{h}_u^{k-1}, \mathbf{h}_v^{k-1}, \mathbf{e}_{u,v})$$
2. Update node “state”:
$$\mathbf{h}_v^k = U(\mathbf{h}_v^{k-1}, \mathbf{m}_v^k)$$
• Gilmer et al., 2017. Neural Message Passing for Quantum Chemistry. ICML.
![Page 56: This Talk - Jian Tang's Homepage · Training the Model Tutorial on Graph Representation Learning, AAAI 2019 22 §Train in an unsupervised manner using only the graph structure. §Unsupervised](https://reader036.fdocuments.in/reader036/viewer/2022081621/61259005c49ed829027bd45e/html5/thumbnails/56.jpg)
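The two message-passing steps can be sketched in a few lines of plain Python. This is only an illustration, not the implementation from Gilmer et al.: the function name `message_passing_step`, the toy message and update functions, and the three-node graph are all assumptions made for the example, and the message is assumed to have the same dimension as the node state.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Edge = Tuple[str, str]


def message_passing_step(
    h: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    edge_feat: Dict[Edge, Vector],
    M: Callable[[Vector, Vector, Vector], Vector],
    U: Callable[[Vector, Vector], Vector],
) -> Dict[str, Vector]:
    """One round: m_v = sum_{u in N(v)} M(h_u, h_v, e_uv), then h_v = U(h_v, m_v)."""
    new_h = {}
    for v, h_v in h.items():
        # Step 1: sum the messages coming from the neighbors of v.
        m_v = [0.0] * len(h_v)
        for u in neighbors[v]:
            msg = M(h[u], h_v, edge_feat[(u, v)])
            m_v = [a + b for a, b in zip(m_v, msg)]
        # Step 2: update the state of v from its old state and the message.
        new_h[v] = U(h_v, m_v)
    return new_h


# Toy choices (assumptions): the message is the neighbor state scaled by a
# scalar edge weight; the update averages the old state with the message.
def toy_message(h_u: Vector, h_v: Vector, e_uv: Vector) -> Vector:
    return [e_uv[0] * x for x in h_u]


def toy_update(h_v: Vector, m_v: Vector) -> Vector:
    return [0.5 * (a + b) for a, b in zip(h_v, m_v)]


neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
edge_feat = {(u, v): [1.0] for v, ns in neighbors.items() for u in ns}
h0 = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [2.0, 2.0]}
h1 = message_passing_step(h0, neighbors, edge_feat, toy_message, toy_update)
print(h1)
```

Swapping `toy_message` for an MLP or `toy_update` for a GRU cell changes the model without changing the loop, which is the sense in which the framework is generic.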
Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Graph Attention Networks
6. Subgraph Embeddings
Graph Attention Networks
Based on material from:
• Velickovic et al., 2018. Graph Attention Networks. ICLR.
Neighborhood Attention
§ What if some neighbors are more important than others?
[Figure: input graph with a highlighted target node and its neighborhood computation graph, with "+" marks on some of the neighbors.]
Graph Attention Networks
§ Augment basic graph neural network model with attention.
$$\mathbf{h}_v^k = \sigma\Big(\sum_{u \in N(v) \cup \{v\}} \alpha_{v,u} \mathbf{W}_k \mathbf{h}_u^{k-1}\Big)$$
  σ: non-linearity.
  Sum over all neighbors (and the node itself).
  α_{v,u}: learned attention weights!
Attention weights
§ Various attention models are possible.
§ The original GAT paper uses:
$$\alpha_{v,u} = \frac{\exp\big(\mathrm{LeakyReLU}\big(\mathbf{a}^\top [\mathbf{Q}\mathbf{h}_v, \mathbf{Q}\mathbf{h}_u]\big)\big)}{\sum_{u' \in N(v) \cup \{v\}} \exp\big(\mathrm{LeakyReLU}\big(\mathbf{a}^\top [\mathbf{Q}\mathbf{h}_v, \mathbf{Q}\mathbf{h}_{u'}]\big)\big)}$$
§ Achieved SOTA in 2018 on a number of standard benchmarks.
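As a rough illustration of how the attention weights and the attention-weighted aggregation fit together, here is a minimal single-head sketch in plain Python. It follows the two formulas on the slides (softmax over LeakyReLU scores, then a weighted sum over the neighbors and the node itself), but the helper names, the LeakyReLU slope of 0.2, the use of ReLU as the non-linearity, and the toy graph are assumptions for the example rather than details taken from the slides.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_weights(v: str, h: Dict[str, Vector],
                      neighbors: Dict[str, List[str]],
                      Q: Matrix, a: Vector) -> Dict[str, float]:
    """alpha_{v,u} for u in N(v) and v itself: softmax of LeakyReLU(a^T [Q h_v, Q h_u])."""
    qv = matvec(Q, h[v])
    scores = {}
    for u in neighbors[v] + [v]:
        concat = qv + matvec(Q, h[u])  # list concatenation [Q h_v, Q h_u]
        scores[u] = leaky_relu(sum(ai * ci for ai, ci in zip(a, concat)))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {u: math.exp(s - top) for u, s in scores.items()}
    total = sum(exps.values())
    return {u: e / total for u, e in exps.items()}


def gat_layer(h: Dict[str, Vector], neighbors: Dict[str, List[str]],
              W: Matrix, Q: Matrix, a: Vector) -> Dict[str, Vector]:
    """h_v = sigma(sum over N(v) and v of alpha_{v,u} W h_u), with ReLU as sigma."""
    new_h = {}
    for v in h:
        alpha = attention_weights(v, h, neighbors, Q, a)
        agg = [0.0] * len(W)
        for u, w_vu in alpha.items():
            agg = [x + w_vu * y for x, y in zip(agg, matvec(W, h[u]))]
        new_h[v] = [max(0.0, x) for x in agg]
    return new_h


# Toy inputs (assumptions): identity transforms and hand-picked attention vectors.
I2 = [[1.0, 0.0], [0.0, 1.0]]
h = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [2.0, 2.0]}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
uniform = attention_weights("A", h, neighbors, I2, [0.0, 0.0, 0.0, 0.0])
skewed = attention_weights("A", h, neighbors, I2, [0.0, 0.0, 1.0, 0.0])
out = gat_layer(h, neighbors, I2, I2, [0.0, 0.0, 0.0, 0.0])
```

With an all-zero attention vector every score is equal, so the layer reduces to a plain mean over the neighborhood; a non-zero vector lets some neighbors count more than others.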
Attention in general
§ Various attention mechanisms can be incorporated into the “message” step:
1. Get “message” from neighbors at step k (incorporate attention here):
$$\mathbf{m}_v^k = \sum_{u \in N(v)} M(\mathbf{h}_u^{k-1}, \mathbf{h}_v^{k-1}, \mathbf{e}_{u,v})$$
2. Update node “state”:
$$\mathbf{h}_v^k = U(\mathbf{h}_v^{k-1}, \mathbf{m}_v^k)$$
Recent advances in graph neural nets (not covered in detail here)
§ Generalizations based on spectral convolutions:
§ Geometric Deep Learning (Bronstein et al., 2017)
§ Mixture Model CNNs (Monti et al., 2017)
§ Speed improvements via subsampling:
§ FastGCNs (Chen et al., 2018)
§ Stochastic GCNs (Chen et al., 2017)
§ And much more!!!
So what is SOTA?
§ No consensus…
§ Standard benchmarks ~2017-2018
§ Cora, CiteSeer, PubMed
§ Semi-supervised node classification.
§ Extremely noisy evaluation and basic GNN/GCNs are very strong…
§ Attention, gating, and other modifications have shown improvements in specific settings (e.g., molecule classification, recommender systems).
Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Graph Attention Networks
6. Subgraph Embeddings
(Sub)graph Embeddings
Representation Learning on Networks, snap.stanford.edu/proj/embeddings-www, WWW 2018
Based on material from:
• Duvenaud et al., 2016. Convolutional Networks on Graphs for Learning Molecular Fingerprints. ICML.
• Li et al., 2016. Gated Graph Sequence Neural Networks. ICLR.
• Ying et al., 2018. Hierarchical Graph Representation Learning with Differentiable Pooling. NeurIPS.
(Sub)graph Embeddings
§ So far we have focused on node-level embeddings…
§ But what about subgraph embeddings?
Approach 1
§ Simple idea: Just sum (or average) the node embeddings in the (sub)graph
§ Used by Duvenaud et al., 2016 to classify molecules based on their graph structure.
$$\mathbf{z}_S = \sum_{v \in S} \mathbf{z}_v$$
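The sum (or average) readout is simple enough to write out directly. The sketch below assumes the node embeddings are already available as a dictionary of equal-length lists; the function name `subgraph_embedding` and the toy values are hypothetical.

```python
from typing import Dict, Iterable, List

Vector = List[float]


def subgraph_embedding(z: Dict[str, Vector], S: Iterable[str],
                       average: bool = False) -> Vector:
    """z_S = sum of z_v over v in S (or the mean if average=True)."""
    nodes = list(S)
    dim = len(z[nodes[0]])
    total = [sum(z[v][i] for v in nodes) for i in range(dim)]
    if average:
        return [t / len(nodes) for t in total]
    return total


# Toy node embeddings (assumption: produced earlier by some GNN).
z = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]}
z_sum = subgraph_embedding(z, ["A", "C"])
z_mean = subgraph_embedding(z, ["A", "C"], average=True)
print(z_sum, z_mean)
```

Summing keeps information about subgraph size, while averaging normalizes it away; which is preferable depends on the task.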
Approach 2
§ Idea: Introduce a “virtual node” to represent the subgraph and run a standard graph neural network.
§ Proposed by Li et al., 2016 as a general technique for subgraph embedding.
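One way to picture the virtual-node idea is as a small graph edit followed by ordinary neighborhood aggregation. The sketch below is an assumption-laden illustration, not the construction from Li et al.: the virtual node is connected in both directions to every node in the subgraph, it starts from a zero state, and a single round of mean aggregation stands in for the "standard graph neural network". The names `add_virtual_node` and `mean_aggregate_step` are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def add_virtual_node(neighbors: Dict[str, List[str]], S: List[str],
                     name: str = "_virtual") -> Dict[str, List[str]]:
    """Return a copy of the adjacency lists with a virtual node linked to all of S."""
    aug = {v: list(ns) for v, ns in neighbors.items()}
    aug[name] = list(S)
    for v in S:
        aug[v].append(name)
    return aug


def mean_aggregate_step(h: Dict[str, Vector],
                        neighbors: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Stand-in GNN layer: each node averages its own state with its neighbors' states."""
    new_h = {}
    for v, ns in neighbors.items():
        group = [h[v]] + [h[u] for u in ns]
        dim = len(h[v])
        new_h[v] = [sum(x[i] for x in group) / len(group) for i in range(dim)]
    return new_h


neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
aug = add_virtual_node(neighbors, ["A", "B"])
h0 = {"A": [3.0, 0.0], "B": [0.0, 3.0], "C": [6.0, 6.0], "_virtual": [0.0, 0.0]}
h1 = mean_aggregate_step(h0, aug)
z_subgraph = h1["_virtual"]  # embedding read off the virtual node
print(z_subgraph)
```

After running the network, the state of the virtual node is read off as the subgraph embedding; node C is outside the subgraph, so it does not feed the virtual node directly in this single round.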
Approach 3
§ Idea: Learn how to hierarchically cluster the nodes.
§ First proposed by Ying et al., 2018 and currently SOTA(?).
[Figure: two stacked DiffPool modules followed by graph classification.]
Approach 3
§ Idea: Learn to hierarchically cluster the nodes.
§ Basic overview:
1. Run GNN on graph and get node embeddings.
2. Cluster the node embeddings together to make a “coarsened” graph.
3. Run GNN on “coarsened” graph.
4. Repeat.
§ Different approaches to clustering:
  § Soft clustering via learned softmax weights (Ying et al., 2018)
  § Hard clustering (Cangea et al., 2018 and Gao et al., 2018)
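The soft-clustering step can be sketched as follows. In the DiffPool paper, a row-wise softmax gives an assignment matrix S, and the coarsened features and adjacency are S^T Z and S^T A S; the code below applies those two products in plain Python. The assignment logits would normally come from a second GNN, so the hand-set `logits`, the helper names, and the 4-node path graph are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax_rows(M: Matrix) -> Matrix:
    out = []
    for row in M:
        top = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - top) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def transpose(M: Matrix) -> Matrix:
    return [list(col) for col in zip(*M)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def coarsen(A: Matrix, Z: Matrix, logits: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Soft-cluster n nodes into c clusters: S = softmax(logits), Z' = S^T Z, A' = S^T A S."""
    S = softmax_rows(logits)            # n x c soft assignments
    St = transpose(S)
    Z_coarse = matmul(St, Z)            # c x d cluster embeddings
    A_coarse = matmul(matmul(St, A), S)  # c x c coarsened adjacency
    return S, Z_coarse, A_coarse


# Path graph 0-1-2-3; logits push nodes {0, 1} and {2, 3} into separate clusters.
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]
S, Z_c, A_c = coarsen(A, Z, logits)
print(A_c)
```

Steps 3 and 4 of the overview would run another GNN on `A_c` and `Z_c` and call `coarsen` again, until a single cluster (the graph embedding) remains.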