Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec:...
Transcript of Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec:...
![Page 1: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/1.jpg)
Structure2Vec: Deep Learning for
Security Analytics over Graphs
Le Song
Principal Engineer, AI Department, Ant Financial
Associate Professor, College of Computing
Georgia Institute of Technology
![Page 2: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/2.jpg)
2
Alibaba Ecosystem
![Page 3: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/3.jpg)
Complex networks are everywhere
3
User
Device
GPS
Account
Phone
Card
Shop
Company
Company
Shop
Device
GPS
Card
WIFI Phone No.
Account
![Page 4: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/4.jpg)
Fake account detection
Fake account can increase system level risk
Need to shut down fake accounts
4
Account – device network
Bad
guy
Good
guy
device
New accounts in a month
millions of nodes and millions of edges.
RecallP
recision
detection comparisonRule Structure2vec
![Page 5: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/5.jpg)
Cash-out detection
Cash-out for credit loans create vicious cycles
Need to detect cash-out transactions
5
Shop Owner
Account
Alipay
Account 1
Normal Bank
Account
Alipay
Account 2
Use $1000
credit
Same
person
Transfer
$800 cash
Transfer
$800 cash
Recall
Pre
cisi
on
Structure2vec
node2vec
+ xgboost
Network of millions of transactions a day
detection comparison
![Page 6: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/6.jpg)
Fundamental challenge
6
(city=Atlanta) AND (age=40)
(neighbor=A) XOR (neighbor=B)
(bought=car) OR (time<3 years)
Explosive combinations!
Manual feature designNetworks
Timing
Various apps
Learn
Representation?
![Page 7: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/7.jpg)
Representation Learning
via Structure2Vec
![Page 8: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/8.jpg)
𝑋6
𝑋1
𝑋2 𝑋3
𝑋4
𝑋5
Structure2Vec embedding (mean field)
𝜇2(0)
𝜇1(0)
𝜇6(0)
𝜇3(0)
𝜇5(0)
𝜇4(0)
Obtain embedding via
iterative update algorithm:
1. Initialize 𝜇𝑖(0)
= 0, ∀ 𝑖
2. Iterate 𝑇 times
𝜇𝑖(𝑡)
← 𝜎 𝑊1𝑋𝑖 + 𝑊2
𝑗∈𝒩 𝑖
𝜇𝑗(𝑡−1)
, ∀𝑖
𝜇1(1)
8
Parameterized
as neural network
[Dai, et al. 2016]
![Page 9: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/9.jpg)
Structure2Vec embedding (mean field)
Obtain embedding via
iterative update algorithm:
1. Initialize 𝜇𝑖(0)
= 0, ∀ 𝑖
2. Iterate 𝑇 times
𝜇𝑖(𝑡)
← 𝜎 𝑊1𝑋𝑖 + 𝑊2
𝑗∈𝒩 𝑖
𝜇𝑗(𝑡−1)
, ∀𝑖
9
Parameterized
as neural network
𝑓 𝑓 , 𝑓
Supervised
Learning
Generative
Models
Reinforcement
Learning
𝜇𝑖(𝑇) 𝜇𝑖
(𝑇) 𝜇𝑘𝑇
𝑘∈𝑆𝜇𝑗(𝑇)
𝑋6
𝑋1
𝑋2 𝑋3
𝑋4
𝑋5
𝜇2(1)
𝜇1(1)
𝜇6(1)
𝜇3(1)
𝜇5(1)
𝜇4(1)
![Page 10: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/10.jpg)
Fake account detection
Fake account can increase system level risk
Need to shut down fake accounts
10
Account – device network
Bad
guy
Good
guy
device
New accounts in a month
roughly millions of nodes and millions of edges.
RecallP
recision
detection comparisonRule Structure2vec
![Page 11: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/11.jpg)
Fake account pattern
11
Fake accountNormal account
Device
Aggregation
Activity
Aggregation
![Page 12: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/12.jpg)
Cross-platform code similarity detection
Key problem is to learn code graph presentation across platforms
12[Xu et al. 2017]
![Page 13: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/13.jpg)
Twin-networks to extract features
Codes from the same function should be similar across platforms
13
Structure2vec
![Page 14: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/14.jpg)
New tools for representation learning over graphs
(city=Atlanta) AND (age=40)
(browser=IE) XOR (system=Linux)
(bought=car) OR (usage<3 years)
Explosive combinations!
Graphical Models
𝑋 ⊥ 𝑌 | 𝑍
Life is great
Deep learning
𝑦 = 𝑓(𝑥)
Help
Manual algorithm design
14
![Page 15: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/15.jpg)
𝑋6
𝑋1
𝑋2 𝑋3
𝑋4
𝑋5
Embed belief propagation
Approximate embedding of
𝑝 𝐻𝑖 𝑥𝑗 ↦ 𝜇𝑖
via fixed point update
1. Initialize 𝜇𝑖𝑗 , ∀ 𝑖, 𝑗
2. Iterate 𝑇 times
3. Aggregate 𝜇𝑖 = 𝑊3 ℓ∈𝒩 𝑖 𝜇ℓ𝑖(𝑇)
, ∀ 𝑖
𝜇𝑖𝑗(𝑡)
← 𝜎 𝑊1𝑋𝑖 + 𝑊2
ℓ∈𝒩 𝑖 \j
𝜇ℓ𝑖(𝑡−1)
, ∀(𝑖, 𝑗)
𝑓 𝑓 , 𝑓
Supervised
Learning
Generative
Models
Reinforcement
Learning 15
Parameterized
as neural network
![Page 16: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/16.jpg)
Dynamic Networks
![Page 17: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/17.jpg)
who will do
what and when?
Dynamic processes over networks
ChristineAliceDavid Jacob
BookTowelShoe
𝑅user
item
≈
matrix factorization
𝑈
𝑉
17
![Page 18: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/18.jpg)
Unroll: time-varying dependency structuretime
𝑡0
𝑡2
𝑡1
𝑡3
Represent
𝑋1 𝑋2 𝑋3
𝑋4
𝑋5
𝐻8 𝐻9
𝐻6 𝐻7
𝐻4
𝐻1
𝐻5
𝐻2 𝐻3
𝑋6
LVM
𝐺 = (𝒱, ℇ)
user/item
raw features
Interaction
time/context
18[Dai, et al. 2016]
![Page 19: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/19.jpg)
Embed filtering/forward message passingtime
𝑡0
𝑡2
𝑡1
𝑡3
Represent
𝑋1 𝑋2 𝑋3
𝑋4
𝑋5
𝐻8 𝐻9
𝐻6 𝐻7
𝐻4
𝐻1
𝐻5
𝐻2 𝐻3
𝑋6
LVM
𝐺 = (𝒱, ℇ)
user/item
raw features
Interaction
time/context
19
![Page 20: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/20.jpg)
Cash-out detection
Cash-out for credit loans create vicious cycles
Need to detect cash-out transactions
20
Shop Owner
Account
Alipay
Account 1
Normal Bank
Account
Alipay
Account 2
Use $1000
credit
Same
person
Transfer
$800 cash
Transfer
$800 cash
Recall
Pre
cisi
on
Structure2vec
node2vec
+ xgboost
Network of millions of transactions a day
detection comparison
![Page 21: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/21.jpg)
GDELT database
Events in news media
subject – relation – object
and time
Total archives span >215 years,
trillion of events
Time-varying dependency structure
Temporal knowledge graph:
What will happen next?
21[Trivedi, et al. 2017]
![Page 22: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/22.jpg)
Enemy’s
friend
is an
enemy
CAIRO CROATIA
MANCHESTER
PROTESTOR
NEW ZEALAND
SOMALIA
TEHRAN
SINGAPORE
<ASSAULT : 06-Jul-2015>
CO
NS
ULT
CO
OP
ER
ATE
<0
6-Ju
n-2
01
5>
<2
9-Ju
n-2
01
5>
(predicted event)
<2
8-M
ay-2
01
5>
FIGH
TFIG
HT
<0
5-M
ay-2
01
5>
22
![Page 23: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/23.jpg)
COLOMBIA OTTAWA
NEW DELHI
BELGIUM
LIBYA
VENEZUELA
UAE
GAUTEMALA
<MATERIAL COOP. : 02-Jul-2015>
CO
NS
ULT
PR
OV
IDE
AID
<2
6-Ju
n-2
01
5>
<0
3-Feb
-20
15
>
(predicted event)
<1
6-M
ay-2
01
5>
FIGH
TTR
AD
E C
OO
P.
<2
8-M
ar-2
01
5>
Friends’
friend
is a friend,
common
enemy
strengthen
the tie
23
![Page 24: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/24.jpg)
Reinforcement Learning
![Page 25: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/25.jpg)
Combinatorial optimization over graphs
NP-hard
problems
Application Optimization problem
Fraudster: virus spreading
Analysts: community discovery
Platforms: resource scheduling
Minimum vertex/set cover
Maximum cut
Traveling salesman
25
![Page 26: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/26.jpg)
0
Greedy algorithm for minimum vertex cover
2 - approximation for minimum vertex cover
Repeat till all edges covered:
• Select uncovered edge with largest total degree
Manually
designed rule.
Can we learn
from data?
1
1
00
0
0 0
0 0
0
NP-hard problems
26
![Page 27: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/27.jpg)
Greedy algorithm as Markov decision process
min𝑥𝑖∈ 0,1
𝑖∈𝓥
𝑥𝑖
𝑠. 𝑡. 𝑥𝑖 + 𝑥𝑗 ≥ 1, ∀ 𝑖, 𝑗 ∈ 𝓔
Repeat:
1. Compute total degree of each
uncovered edge
2. Select both ends of uncovered
edge with largest total degree
Until all edges are covered
Minimum vertex cover: smallest number of nodes to cover all edges
Reward: 𝑟𝑡 = −1
State 𝑆: current selected nodes
Action value function: 𝑄(𝑆, 𝑖)
Greedy policy:
𝑖∗ = 𝑎𝑟𝑔𝑚𝑎𝑥𝑖 𝑄(𝑆, 𝑖)
Update state 𝑆
27
![Page 28: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/28.jpg)
Embedding for state-action value function
pick best
node
Greedy action
𝑖∗ = 𝑎𝑟𝑔𝑚𝑎𝑥𝑖 𝑄(𝑆, 𝑖)
State-action value function 𝑄 𝑆, 𝑖
= 𝜃1𝜎(𝜃2 𝑗∈𝑉 𝜇𝑗 + 𝜃3 𝜇𝑖)
aggregated
embedding
individual
embedding
𝑓 ,
Reinforcement
Learning
1. Problem graph
2.
Model &
Structure2Vec
3. Q-function4. Train
28[Dai et al. 2017]
![Page 29: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/29.jpg)
What algorithm is learned?
Learned algorithm balances between
• degree of the picked node and
• fragmentation of the graph
Structure2Vec Node greedy Edge greedy
29
![Page 30: Structure2Vec: Deep Learning for Security Analytics over Graphs · 2018-01-29 · Structure2Vec: Deep Learning for Security Analytics over Graphs Le Song Principal Engineer, AI Department,](https://reader030.fdocuments.in/reader030/viewer/2022040101/5ec9dfa7dc3e093d60060c2e/html5/thumbnails/30.jpg)
Summary
30
Networks
Timing
Various apps
Structure2Vec
Representation
CNN and RNN are for
images and sequences
Supervised
Learning
Generative
Models
Reinforcement
Learning
We are hiring!