Representation Learning on Networks
snap.stanford.edu/proj/embeddings-www, WWW 2018

This Talk
§ 1) Node embeddings
§ Map nodes to low-dimensional embeddings.
§ 2) Graph neural networks
§ Deep learning architectures for graph-structured data
§ 3) Applications

Part 2: Graph Neural Networks


Embedding Nodes


• Goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

Embedding Nodes
Goal:

  \mathrm{similarity}(u, v) \approx z_v^\top z_u

Need to define!

Two Key Components
§ Encoder maps each node to a low-dimensional vector:

  \mathrm{ENC}(v) = z_v

  where v is a node in the input graph and z_v is its d-dimensional embedding.
§ Similarity function specifies how relationships in vector space map to relationships in the original network:

  \mathrm{similarity}(u, v) \approx z_v^\top z_u

  i.e., the similarity of u and v in the original network is approximated by the dot product between their node embeddings.

From "Shallow" to "Deep"
§ So far we have focused on "shallow" encoders, i.e., embedding lookups:
§ Z is the embedding matrix, with one column per node; each column is the embedding vector for a specific node, and the number of rows is the dimension/size of the embeddings.
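A minimal numpy sketch of such a shallow encoder (an embedding lookup) together with the dot-product similarity defined above; the graph size, embedding dimension, and node indexing are illustrative assumptions, not part of the original slides.

```python
import numpy as np

num_nodes, d = 5, 8          # |V| nodes, d-dimensional embeddings (illustrative sizes)
rng = np.random.default_rng(0)

# "Shallow" encoder: the only parameters are the embedding matrix Z (one column per node).
Z = rng.normal(size=(d, num_nodes))

def enc(v: int) -> np.ndarray:
    """ENC(v) = z_v: look up the column of Z for node v."""
    return Z[:, v]

def similarity(u: int, v: int) -> float:
    """Embedding-space similarity: the dot product z_v^T z_u."""
    return float(enc(v) @ enc(u))

print(similarity(0, 1))
```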

From "Shallow" to "Deep"
§ Limitations of shallow encoding:
§ O(|V|) parameters are needed: there is no parameter sharing, and every node has its own unique embedding vector.
§ Inherently "transductive": it is impossible to generate embeddings for nodes that were not seen during training.
§ Does not incorporate node features: many graphs have features that we can and should leverage.

From "Shallow" to "Deep"
§ We will now discuss "deeper" methods based on graph neural networks.
§ In general, all of these more complex encoders can be combined with the similarity functions from the previous section.

  ENC(v) = a complex function that depends on graph structure.

Outline for this Section
§ We will now discuss "deeper" methods based on graph neural networks.
1. The Basics
2. Graph Convolutional Networks (GCNs)
3. GraphSAGE
4. Gated Graph Neural Networks
5. Subgraph Embeddings

The Basics: Graph Neural Networks
Based on material from:
• Hamilton et al. 2017. Representation Learning on Graphs: Methods and Applications. IEEE Data Engineering Bulletin on Graph Systems.
• Scarselli et al. 2005. The Graph Neural Network Model. IEEE Transactions on Neural Networks.

Setup
§ Assume we have a graph G:
§ V is the vertex set.
§ A is the adjacency matrix (assume binary).
§ X ∈ ℝ^{m×|V|} is a matrix of node features.
§ Categorical attributes, text, image data
  – E.g., profile information in a social network.
§ Node degrees, clustering coefficients, etc.
§ Indicator vectors (i.e., a one-hot encoding of each node)

Neighborhood Aggregation
§ Key idea: Generate node embeddings based on local neighborhoods.
[Figure: an input graph on nodes A-F and the computation graph for target node A, which aggregates information from its neighbors B, C, and D, which in turn aggregate from their own neighbors.]

Neighborhood Aggregation
§ Intuition: Nodes aggregate information from their neighbors using neural networks.
[Figure: the same input graph and two-layer computation graph for target node A.]

Neighborhood Aggregation
§ Intuition: The network neighborhood defines a computation graph. Every node defines a unique computation graph!

Neighborhood Aggregation
§ Nodes have embeddings at each layer.
§ The model can be of arbitrary depth.
§ The "layer-0" embedding of node u is its input feature vector x_u.
[Figure: the computation graph for node A, with layer-0 inputs (the feature vectors x_A, x_B, x_C, x_E, x_F) at the leaves, aggregated through layer 1 into the layer-2 embedding of A.]


Neighborhood “Convolutions”


§ Neighborhood aggregation can be viewed as a center-surround filter.

§ Mathematically related to spectral graph convolutions (see Bronstein et al., 2017)

Neighborhood Aggregation
§ Key distinctions are in how different approaches aggregate information across the layers.
[Figure: the computation graph for node A, with each aggregation step shown as a box marked "?". What's in the box!?]

Neighborhood Aggregation
§ Basic approach: Average neighbor information and apply a neural network.
[Figure: in the computation graph for node A, each box (1) averages the messages from the neighbors and then (2) applies a neural network.]

The Math
§ Basic approach: Average neighbor messages and apply a neural network.

  h_v^0 = x_v
  h_v^k = \sigma\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1} \right), \quad \forall k > 0

where:
§ h_v^k is the k-th layer embedding of v; the initial "layer-0" embeddings are equal to the node features (h_v^0 = x_v).
§ \sigma is a non-linearity (e.g., ReLU or tanh).
§ The sum is the average of the neighbors' previous-layer embeddings, and h_v^{k-1} is the previous-layer embedding of v itself.
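A minimal numpy sketch of this basic neighborhood-aggregation update for a single node; the toy adjacency list, feature size, and weight initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 8                      # feature and hidden sizes (illustrative)
W_k = rng.normal(size=(d_out, d_in))    # transforms the averaged neighbor embeddings
B_k = rng.normal(size=(d_out, d_in))    # transforms the node's own previous embedding

def aggregate(h_prev: dict, neighbors: list, v) -> np.ndarray:
    """h_v^k = sigma(W_k * mean of neighbors' h^{k-1} + B_k * h_v^{k-1})."""
    neigh_avg = np.mean([h_prev[u] for u in neighbors], axis=0)
    return np.maximum(0.0, W_k @ neigh_avg + B_k @ h_prev[v])   # ReLU non-linearity

# layer-0 embeddings are the node features x_v
h0 = {v: rng.normal(size=d_in) for v in "ABCD"}
print(aggregate(h0, neighbors=["B", "C", "D"], v="A"))
```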

Training the Model
§ How do we train the model to generate "high-quality" embeddings?
§ We need to define a loss function on the embeddings, L(z_u)!

Training the Model
§ After K layers of neighborhood aggregation, we get output embeddings for each node.
§ We can feed these embeddings into any loss function and run stochastic gradient descent to train the aggregation parameters.

  h_v^0 = x_v
  h_v^k = \sigma\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1} \right), \quad \forall k \in \{1, \dots, K\}
  z_v = h_v^K

where W_k and B_k are the trainable matrices (i.e., what we learn).
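A hedged sketch of the full K-layer forward pass over a small graph, producing z_v = h_v^K for every node; the toy graph, sizes, and parameter initialization below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
graph = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B", "E", "F"],
         "D": ["A"], "E": ["C"], "F": ["C"]}               # toy adjacency list
d, K = 4, 2
x = {v: rng.normal(size=d) for v in graph}                 # node features
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(K)]      # per-layer weights W_k
B = [rng.normal(size=(d, d)) * 0.1 for _ in range(K)]      # per-layer weights B_k

h = dict(x)                                                 # h^0 = x
for k in range(K):
    h = {v: np.tanh(W[k] @ np.mean([h[u] for u in graph[v]], axis=0) + B[k] @ h[v])
         for v in graph}
z = h                                                       # z_v = h_v^K
print(z["A"])
```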

Training the Model
§ Train in an unsupervised manner using only the graph structure.
§ The unsupervised loss function can be anything from the last section, e.g., based on:
§ Random walks (node2vec, DeepWalk)
§ Graph factorization
§ i.e., train the model so that "similar" nodes have similar embeddings.

Training the Model
§ Alternative: Directly train the model for a supervised task (e.g., node classification):
[Figure: an online social network in which each node must be classified as "Human or bot?"]

Training the Model
§ Alternative: Directly train the model for a supervised task (e.g., node classification, "human or bot?"):

  L = \sum_{v \in V} y_v \log(\sigma(z_v^\top \theta)) + (1 - y_v) \log(1 - \sigma(z_v^\top \theta))

where z_v is the output node embedding, \theta are the classification weights, and y_v is the node class label.
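A minimal numpy sketch of this supervised objective computed on top of the output embeddings; the embeddings, labels, and classification weights below are random placeholders, and the sum is negated so it can be minimized with gradient descent.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 4, 6
z = rng.normal(size=(n, d))            # output node embeddings z_v (one row per node)
y = rng.integers(0, 2, size=n)         # binary node labels y_v (e.g., human vs. bot)
theta = rng.normal(size=d)             # classification weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

p = sigmoid(z @ theta)                                    # sigma(z_v^T theta)
loss = -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))   # negated log-likelihood
print(loss)
```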

Overview of Model Design
1) Define a neighborhood aggregation function.
2) Define a loss function on the embeddings, L(z_u).


Overview of Model Design


3) Train on a set of nodes, i.e., a batch of compute graphs

Overview of Model Design
4) Generate embeddings for nodes as needed, even for nodes we never trained on!

Inductive Capability
§ The same aggregation parameters (W_k and B_k) are shared for all nodes.
§ The number of model parameters is sublinear in |V|, and we can generalize to unseen nodes!
[Figure: the compute graphs for node A and node B in the input graph, with the parameters W_k and B_k shared across both.]

Inductive Capability
§ Inductive node embeddings generalize to entirely unseen graphs.
§ E.g., train on a protein interaction graph from model organism A and generate embeddings z_u on newly collected data about organism B.
[Figure: train on one graph, then generalize to a new graph.]

Inductive Capability
§ Many application settings constantly encounter previously unseen nodes, e.g., Reddit, YouTube, Google Scholar, ...
§ We need to generate new embeddings "on the fly".
[Figure: train on a snapshot of the graph; when a new node arrives, generate its embedding z_u on the fly.]

Quick Recap
§ Recap: Generate node embeddings by aggregating neighborhood information.
§ Allows for parameter sharing in the encoder.
§ Allows for inductive learning.
§ We saw a basic variant of this idea... now we will cover some state-of-the-art variants from the literature.

Neighborhood Aggregation
§ Key distinctions are in how different approaches aggregate messages.
[Figure: the computation graph for node A, with each aggregation box marked "?". What else can we put in the box?]

Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Subgraph Embeddings

Graph Convolutional Networks
Based on material from:
• Kipf et al., 2017. Semi-Supervised Classification with Graph Convolutional Networks. ICLR.

Graph Convolutional Networks
§ Kipf et al.'s Graph Convolutional Networks (GCNs) are a slight variation on the neighborhood aggregation idea:

  h_v^k = \sigma\left( W_k \sum_{u \in N(v) \cup \{v\}} \frac{h_u^{k-1}}{\sqrt{|N(u)||N(v)|}} \right)
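A minimal numpy sketch of the GCN per-node update with its symmetric normalization and self-loop; the toy adjacency list, sizes, and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
graph = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B"], "D": ["A"]}
d = 4
W_k = rng.normal(size=(d, d)) * 0.1
h_prev = {v: rng.normal(size=d) for v in graph}      # h^{k-1}

def gcn_update(v: str) -> np.ndarray:
    """h_v^k: sum over N(v) and v itself, each term normalized by sqrt(|N(u)||N(v)|)."""
    total = np.zeros(d)
    for u in list(graph[v]) + [v]:                   # include the self term
        total += h_prev[u] / np.sqrt(len(graph[u]) * len(graph[v]))
    return np.tanh(W_k @ total)

print(gcn_update("A"))
```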

Graph Convolutional Networks
§ Basic neighborhood aggregation:

  h_v^k = \sigma\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1} \right)

§ vs. GCN neighborhood aggregation:

  h_v^k = \sigma\left( W_k \sum_{u \in N(v) \cup \{v\}} \frac{h_u^{k-1}}{\sqrt{|N(u)||N(v)|}} \right)

The GCN uses the same matrix for self and neighbor embeddings, and a per-neighbor normalization.

Graph Convolutional Networks

  h_v^k = \sigma\left( W_k \sum_{u \in N(v) \cup \{v\}} \frac{h_u^{k-1}}{\sqrt{|N(u)||N(v)|}} \right)

§ Empirically, they found this configuration to give the best results:
§ More parameter sharing: the same transformation matrix is used for self and neighbor embeddings.
§ Down-weights high-degree neighbors: instead of a simple average, the normalization varies across neighbors.

Batch Implementation
§ Can be efficiently implemented using sparse batch operations:

  H^{(k+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(k)} W_k \right)

  where \tilde{A} = A + I and \tilde{D}_{ii} = \sum_j \tilde{A}_{ij}.

§ O(|E|) time complexity overall.
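A hedged sketch of this batch formulation using scipy sparse matrices; the random adjacency matrix, sizes, weights, and ReLU choice are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(4)
n, d_in, d_out = 6, 4, 8
A = sp.random(n, n, density=0.3, random_state=4, format="csr")
A = ((A + A.T) > 0).astype(float)                 # symmetric, binary adjacency
A_tilde = A + sp.eye(n)                           # add self-loops: A~ = A + I
deg = np.asarray(A_tilde.sum(axis=1)).ravel()     # D~_ii = sum_j A~_ij
D_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt         # D~^{-1/2} A~ D~^{-1/2}

H = rng.normal(size=(n, d_in))                    # H^{(k)}: one row per node
W_k = rng.normal(size=(d_in, d_out)) * 0.1
H_next = np.maximum(0.0, A_hat @ H @ W_k)         # H^{(k+1)} with a ReLU non-linearity
print(H_next.shape)
```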

Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Subgraph Embeddings

GraphSAGE
Based on material from:
• Hamilton et al., 2017. Inductive Representation Learning on Large Graphs. NIPS.

GraphSAGE Idea
§ So far we have aggregated the neighbor messages by taking their (weighted) average; can we do better?
[Figure: the computation graph for node A, with each aggregation box marked "?".]

GraphSAGE Idea

  h_v^k = \sigma\left( \left[ W_k \cdot \mathrm{AGG}\left(\{ h_u^{k-1}, \forall u \in N(v) \}\right),\; B_k h_v^{k-1} \right] \right)

where AGG can be any differentiable function that maps a set of vectors to a single vector.

GraphSAGE Differences
§ Simple neighborhood aggregation:

  h_v^k = \sigma\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1} \right)

§ GraphSAGE:

  h_v^k = \sigma\left( \left[ W_k \cdot \mathrm{AGG}\left(\{ h_u^{k-1}, \forall u \in N(v) \}\right),\; B_k h_v^{k-1} \right] \right)

i.e., a generalized aggregation over the neighbors, concatenated with the node's own (self) embedding.
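A minimal numpy sketch of the GraphSAGE-style update (generalized aggregation plus concatenation with the self embedding); the mean aggregator, toy graph, and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
graph = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
d = 4
W_k = rng.normal(size=(d, d)) * 0.1     # applied to the aggregated neighbor vector
B_k = rng.normal(size=(d, d)) * 0.1     # applied to the node's own previous embedding
h_prev = {v: rng.normal(size=d) for v in graph}

def agg(vectors):
    """Any differentiable set function; a mean is used here as a placeholder."""
    return np.mean(vectors, axis=0)

def graphsage_update(v: str) -> np.ndarray:
    neigh = W_k @ agg([h_prev[u] for u in graph[v]])
    self_part = B_k @ h_prev[v]
    return np.tanh(np.concatenate([neigh, self_part]))   # concatenate, then non-linearity

print(graphsage_update("A").shape)   # 2*d because of the concatenation
```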

GraphSAGE Variants
§ Mean:

  \mathrm{AGG} = \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|}

§ Pool: Transform neighbor vectors and apply a symmetric vector function \gamma (element-wise mean/max):

  \mathrm{AGG} = \gamma\left( \{ Q h_u^{k-1}, \forall u \in N(v) \} \right)

§ LSTM: Apply an LSTM to a random permutation \pi of the neighbors:

  \mathrm{AGG} = \mathrm{LSTM}\left( [ h_u^{k-1}, \forall u \in \pi(N(v)) ] \right)
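A short numpy sketch of the mean and pool aggregators (the LSTM variant needs a recurrent cell and is omitted); the matrix Q, the neighbor embeddings, and the choice of an element-wise max are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
neighbors_h = [rng.normal(size=d) for _ in range(3)]   # {h_u^{k-1} : u in N(v)}

def agg_mean(h_list):
    """Mean aggregator: average of the neighbors' previous-layer embeddings."""
    return np.mean(h_list, axis=0)

Q = rng.normal(size=(d, d)) * 0.1

def agg_pool(h_list):
    """Pool aggregator: transform each neighbor by Q, then take an element-wise max."""
    return np.max([Q @ h for h in h_list], axis=0)

print(agg_mean(neighbors_h), agg_pool(neighbors_h))
```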

Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Subgraph Embeddings

Gated Graph Neural Networks
Based on material from:
• Li et al., 2016. Gated Graph Sequence Neural Networks. ICLR.

Neighborhood Aggregation
§ Basic idea: Nodes aggregate "messages" from their neighbors using neural networks.
[Figure: the input graph and the computation graph for target node A.]

Neighborhood Aggregation
§ GCNs and GraphSAGE are generally only 2-3 layers deep.
[Figure: the two-layer computation graph for node A.]

Neighborhood Aggregation
§ But what if we want to go deeper?
[Figure: a computation graph for node A stacked 10+ layers deep.]

Gated Graph Neural Networks
§ How can we build models with many layers of neighborhood aggregation?
§ Challenges:
§ Overfitting from too many parameters.
§ Vanishing/exploding gradients during backpropagation.
§ Idea: Use techniques from modern recurrent neural networks!

Gated Graph Neural Networks
§ Idea 1: Parameter sharing across layers (the same neural network is used at every layer).
[Figure: a deep computation graph for node A with the same aggregation network reused across layers.]

Gated Graph Neural Networks
§ Idea 2: Recurrent state update.
[Figure: the same deep computation graph, with an RNN module updating each node's state after every aggregation step.]

The Math
§ Intuition: Neighborhood aggregation with an RNN state update.
1. Get the "message" from the neighbors at step k (the aggregation function does not depend on k):

  m_v^k = W \sum_{u \in N(v)} h_u^{k-1}

2. Update the node "state" using a Gated Recurrent Unit (GRU). The new node state depends on the old state and the message from the neighbors:

  h_v^k = \mathrm{GRU}(h_v^{k-1}, m_v^k)
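A hedged numpy sketch of this gated update: a shared message matrix W followed by a hand-rolled GRU cell. The gate parameterization, toy graph, and number of steps are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(7)
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
d = 4
W = rng.normal(size=(d, d)) * 0.1                    # message weights, shared across steps
# GRU parameters, also shared across steps: update gate, reset gate, candidate state
Wz, Uz = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Wr, Ur = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1
Wh, Uh = rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru(h_old, m):
    """GRU cell: new state from the old state h_old and the incoming message m."""
    z = sigmoid(Wz @ m + Uz @ h_old)                 # update gate
    r = sigmoid(Wr @ m + Ur @ h_old)                 # reset gate
    h_cand = np.tanh(Wh @ m + Uh @ (r * h_old))      # candidate state
    return (1 - z) * h_old + z * h_cand

h = {v: rng.normal(size=d) for v in graph}           # initial node states
for k in range(10):                                  # many steps, same parameters each time
    m = {v: W @ np.sum([h[u] for u in graph[v]], axis=0) for v in graph}
    h = {v: gru(h[v], m[v]) for v in graph}
print(h["A"])
```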

Gated Graph Neural Networks
§ Can handle models with more than 20 layers.
§ Most real-world networks have small diameters (e.g., less than 7).
§ Allows complex information about global graph structure to be propagated to all nodes.
[Figure: a deep computation graph with a shared RNN module at every layer.]

Gated Graph Neural Networks
§ Useful for complex networks representing:
§ Logical formulas.
§ Programs.

Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Subgraph Embeddings

Summary so far
§ Key idea: Generate node embeddings based on local neighborhoods.
[Figure: the input graph and the computation graph for target node A.]

Summary so far
§ Graph convolutional networks
§ Average neighborhood information and stack neural networks.
§ GraphSAGE
§ Generalized neighborhood aggregation.
§ Gated Graph Neural Networks
§ Neighborhood aggregation + RNNs.

Recent advances in graph neural nets (not covered in detail here)
§ Attention-based neighborhood aggregation:
§ Graph Attention Networks (Velickovic et al., 2018)
§ GeniePath (Liu et al., 2018)
§ Generalizations based on spectral convolutions:
§ Geometric Deep Learning (Bronstein et al., 2017)
§ Mixture Model CNNs (Monti et al., 2017)
§ Speed improvements via subsampling:
§ FastGCNs (Chen et al., 2018)
§ Stochastic GCNs (Chen et al., 2017)

Outline for this Section
1. The Basics
2. Graph Convolutional Networks
3. GraphSAGE
4. Gated Graph Neural Networks
5. Subgraph Embeddings

Subgraph Embeddings
Based on material from:
• Duvenaud et al. 2016. Convolutional Networks on Graphs for Learning Molecular Fingerprints. ICML.
• Li et al. 2016. Gated Graph Sequence Neural Networks. ICLR.


(Sub)graph Embeddings


§ So far we have focused on node-level embeddings…

(Sub)graph Embeddings
§ But what about subgraph embeddings?

Approach 1
§ Simple idea: Just sum (or average) the node embeddings in the (sub)graph:

  z_S = \sum_{v \in S} z_v

§ Used by Duvenaud et al., 2016 to classify molecules based on their graph structure.
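A minimal numpy sketch of this sum (or mean) readout over the node embeddings of a subgraph; the embeddings and node set are placeholders.

```python
import numpy as np

rng = np.random.default_rng(8)
z = {v: rng.normal(size=4) for v in "ABCDEF"}         # node embeddings z_v
S = ["A", "B", "C"]                                   # nodes in the subgraph

z_S_sum = np.sum([z[v] for v in S], axis=0)           # z_S = sum over v in S of z_v
z_S_mean = np.mean([z[v] for v in S], axis=0)         # or the average, as noted above
print(z_S_sum, z_S_mean)
```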


Approach 2


§ Idea: Introduce a “virtual node” to represent the subgraph and run a standard graph neural network.

§ Proposed by Li et al., 2016 as a general technique for subgraph embedding.

(Sub)graph Embeddings
§ Still an open research area!
§ How to embed (sub)graphs with millions or billions of nodes?
§ How to do the analogue of CNN "pooling" on networks?