Statistical Relational Learning
STATISTICAL RELATIONAL LEARNING
Joint Work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik
BAYESIAN NETWORKS
[Figure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

P(e) = 0.01
P(b) = 0.01

e  b  P(a)
0  0  0.1
0  1  0.8
1  0  0.6
1  1  0.9
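A minimal sketch of how the numbers on this slide combine, assuming the flattened values read as P(e) = 0.01, P(b) = 0.01 and a four-row table for P(Alarm | e, b). The function names are illustrative, not from the slides.

```python
from itertools import product

# Priors as read from the slide (assumed reading of "e 0.01" and "b 0.01").
P_E = 0.01
P_B = 0.01

# P(Alarm = 1 | e, b), keyed by (e, b), as read from the slide's table.
P_A_GIVEN = {(0, 0): 0.1, (0, 1): 0.8, (1, 0): 0.6, (1, 1): 0.9}


def bernoulli(p, value):
    """Probability that a binary variable with P(1) = p takes `value`."""
    return p if value == 1 else 1.0 - p


def marginal_alarm():
    """Sum out Earthquake and Burglary to obtain P(Alarm = 1)."""
    return sum(
        bernoulli(P_E, e) * bernoulli(P_B, b) * P_A_GIVEN[(e, b)]
        for e, b in product((0, 1), repeat=2)
    )


def posterior_burglary_given_alarm():
    """P(Burglary = 1 | Alarm = 1) by Bayes' rule."""
    joint = sum(bernoulli(P_E, e) * P_B * P_A_GIVEN[(e, 1)] for e in (0, 1))
    return joint / marginal_alarm()
```

Under that reading, the alarm rings with probability of about 0.112, and observing it raises the probability of a burglary from 0.01 to roughly 0.07.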
BAYESIAN NETWORK FOR A CITY
[Figure: the Burglary, Earthquake, Alarm network is repeated once per house (H1, H2, H3, H4, H5). The copies carry the following Calls nodes: Calls(H1), Calls(H3); Calls(H2); Calls(H2), Calls(H4); Calls(H3), Calls(H5); Calls(H4), Calls(H6).]
SHARED VARIABLES
Earthquake(BL)
Burglary(H1) Burglary(H2) Burglary(H3) Burglary(H4)
Alarm(H1) Alarm(H2) Alarm(H3) Alarm(H4)
Calls(H1) Calls(H2) Calls(H3) Calls(H4) Calls(H5)
FIRST ORDER LOGIC
Burglary(house) Earthquake(city)
Alarm(house)
Calls(nhouse)
HouseInCity(house, city)
Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
e  b  P(a)
0  0  0.1
0  1  0.8
1  0  0.6
1  1  0.9
Neighbor(house, nhouse)
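A small sketch of how one first-order rule can stand in for the many per-house networks: grounding it against facts yields one Alarm node per house, all sharing the city's Earthquake node. The facts below are hypothetical. The dependency of Calls(nhouse) on Alarm(house) through Neighbor(house, nhouse) is an assumption suggested by the predicates listed here, since the slide gives no explicit rule for Calls.

```python
# Hypothetical facts; the house and city names are illustrative only.
house_in_city = {("h1", "bl"), ("h2", "bl"), ("h3", "bl")}
neighbor = {("h1", "h2"), ("h2", "h1"), ("h2", "h3"), ("h3", "h2")}


def ground_alarm_rule(house_in_city):
    """Instantiate Alarm(house) :- HouseInCity(house, city),
    Earthquake(city), Burglary(house) for every matching fact.

    Returns a dict mapping each ground Alarm atom to its parent atoms.
    """
    parents = {}
    for house, city in sorted(house_in_city):
        parents[f"Alarm({house})"] = [f"Earthquake({city})", f"Burglary({house})"]
    return parents


def ground_calls(neighbor):
    """Assumed rule: Calls(nhouse) depends on Alarm(house) for each
    Neighbor(house, nhouse) fact."""
    parents = {}
    for house, nhouse in sorted(neighbor):
        parents.setdefault(f"Calls({nhouse})", []).append(f"Alarm({house})")
    return parents
```

With these facts, ground_alarm_rule produces three Alarm atoms that all list Earthquake(bl) as a parent, which mirrors the shared-variable picture.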
LOGIC + PROBABILITY = STATISTICAL RELATIONAL LEARNING MODELS
Logic -> Add Probabilities -> Statistical Relational Learning (SRL)
Probabilities -> Add Relations -> Statistical Relational Learning (SRL)
[Figure: example dependency over the predicates PRating, CRating and Diff.]
ALPHABETIC SOUP
Knowledge-based model construction [Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
PRISM [Sato & Kameya, 1997]
Probabilistic relational models [Friedman et al., 1999]
Bayesian logic programs [Kersting & De Raedt, 2001]
Bayesian logic [Milch et al., 2005]
Markov logic [Richardson & Domingos, 2006]
Relational dependency networks [Neville & Jensen, 2007]
ProbLog [De Raedt et al., 2007]
And many others!
RELATIONAL DATABASE
[Tables, one per line, with their columns]
Prof Level
Prof Course Rating
Course Diff
Student Course Grade
Student IQ Satisfaction
FIRST ORDER LOGIC
Prof(P)
Level(P,L)
Diff(C)
Course(C)
taughtBy(P,C)
ratings(P,C,R)
Student(S)
IQ(S,I)
satis(S,B)
takes(S,C)
grade(S,C,G)
GRAPHICAL MODEL
satisfaction(S, B)
Diff(S, C, D) grades(S, C, G)
avgGrade(S, G) avgDiff(S, D)
P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
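A student takes a variable number of courses, so the conditional distribution above needs fixed-size parents; aggregates such as avgGrade and avgDiff provide them. The sketch below computes both from ground facts. The students, courses and values are made up, and difficulty is stored per course for simplicity, which is an assumption about how the Diff facts are keyed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ground facts; none of these values come from the slides.
grades = [("s1", "c1", 4.0), ("s1", "c2", 3.0), ("s2", "c1", 2.0)]  # grades(S, C, G)
diff = {"c1": 2.0, "c2": 4.0}  # course difficulty, keyed by course (assumed)


def avg_grade(grades):
    """avgGrade(S, G): mean grade over all courses taken by each student."""
    per_student = defaultdict(list)
    for student, _course, grade in grades:
        per_student[student].append(grade)
    return {s: mean(gs) for s, gs in per_student.items()}


def avg_diff(grades, diff):
    """avgDiff(S, D): mean difficulty over all courses taken by each student."""
    per_student = defaultdict(list)
    for student, course, _grade in grades:
        per_student[student].append(diff[course])
    return {s: mean(ds) for s, ds in per_student.items()}
```

Each student then contributes exactly one (avgGrade, avgDiff) pair, no matter how many courses they took.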
RELATIONAL DECISION TREE
[Figure: relational decision tree. Inner nodes, top to bottom: speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician). Each inner node has a yes branch and a no branch; the five leaves are labeled N, N, N, Y, Y.]

Name   Speed  Job         Fine
Bob    120    Teacher     N
Alice  150    Writer      N
John   180    Politician  N
Mary   160    Student     Y
Mike   140    Engineer    Y

Person1  Person2
Alice    John
Mary     Mike
Mary     Alice
Bob      Mike
Bob      Mary
RELATIONAL DECISION TREE
Classifying Alice, one test at a time (same tree and tables as before):
speed(Alice,150), 150>120
job(Alice, politician)
knows(Alice,John)
job(John, politician)
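The walk through the tree can be sketched in a few lines. The data is copied from the two tables on the slide. The placement of the N and Y leaves on the branches is an assumption, chosen because it matches both the five leaf labels (N, N, N, Y, Y) and the Fine column. The function names are illustrative.

```python
# Data copied from the slide's two tables: name -> (speed, job, fine).
PEOPLE = {
    "Bob": (120, "Teacher", "N"),
    "Alice": (150, "Writer", "N"),
    "John": (180, "Politician", "N"),
    "Mary": (160, "Student", "Y"),
    "Mike": (140, "Engineer", "Y"),
}
KNOWS = [("Alice", "John"), ("Mary", "Mike"), ("Mary", "Alice"),
         ("Bob", "Mike"), ("Bob", "Mary")]


def is_politician(name):
    """job(name, politician)"""
    return PEOPLE[name][1].lower() == "politician"


def predict_fine(x):
    """Walk the tree for person x; the leaf placement is an assumption."""
    speed = PEOPLE[x][0]
    if not speed > 120:            # speed(X,S), S>120 fails
        return "N"
    if is_politician(x):           # job(X, politician) succeeds
        return "N"
    acquaintances = [y for (p, y) in KNOWS if p == x]
    if not acquaintances:          # knows(X,Y) has no binding for Y
        return "Y"
    # job(Y, politician): succeeds if some bound Y is a politician
    if any(is_politician(y) for y in acquaintances):
        return "N"
    return "Y"
```

Note that knows(X,Y) introduces a new variable Y, so the test below it is existential: it succeeds as soon as one acquaintance is a politician. For Alice the path is speed 150 > 120, not a politician, knows John, John is a politician.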
RELATIONAL PROBABILITY TREES
Use probabilities on the leaves
Can be used to represent the conditional distributions
Can use regression values on leaves to represent regression functions
[Figure: the same tree (speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician), each with yes/no branches), now with the leaf values 0.1, 0.2, 0.4, 0.8, 0.8.]
STRUCTURE LEARNING PROBLEM
Learn the structure of the conditional distributions
Find the parents and the distribution for the target concept
satisfaction(S, B)
avgGrade(S, G) avgDiff(S, D)
IQ(S, I) level(P, L)
RELATIONAL TREE LEARNING
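[Figure: learning a relational regression tree from the facts and regression values below.]

Predicates: paper(X, Y), student(X), adviser(X)

student(X)
X
x1
x2

paper(X, Y)
X   Y
x1  y1
x1  y2
x3  y1

Examples
X   Δ
x1  0.7
x2  -0.2
x3  -0.9

Tree:
student(X) = T: x1 (0.7), x2 (-0.2); mean 0.25
    paper(X,Y) = T: x1, 0.7
    paper(X,Y) = F: x2, -0.2
student(X) = F: x3, -0.9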
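A hedged sketch of how such a tree could be grown: score each candidate test by the squared error that remains after splitting, and pick the best. The facts and values are an interpretation of the flattened figure. The squared-error criterion is a common choice for regression trees, but the slide does not name the criterion actually used.

```python
# Ground facts and regression targets as read from the slide (an interpretation).
DELTA = {"x1": 0.7, "x2": -0.2, "x3": -0.9}
STUDENT = {"x1", "x2"}                               # student(X)
PAPER = {("x1", "y1"), ("x1", "y2"), ("x3", "y1")}   # paper(X, Y)

TESTS = {
    "student(X)": lambda x: x in STUDENT,
    # paper(X,Y) is true for x if at least one Y exists
    "paper(X,Y)": lambda x: any(a == x for a, _ in PAPER),
}


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def split_score(test, examples):
    """Total squared error after splitting `examples` on `test`."""
    yes = [DELTA[x] for x in examples if test(x)]
    no = [DELTA[x] for x in examples if not test(x)]
    return sse(yes) + sse(no)


def best_test(examples, tests=TESTS):
    """Return the name of the test with the lowest squared error."""
    return min(tests, key=lambda name: split_score(tests[name], examples))
```

On all three examples student(X) scores 0.405 against 1.28 for paper(X,Y), so it becomes the root; within the student(X) = T branch, paper(X,Y) separates x1 from x2 with zero error.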
FUNCTIONAL GRADIENT BOOSTING
Sequentially learn models where each subsequent model corrects the previous model
[Figure: Data - Predictions of the Initial Model = Residues; Induce a new model from the residues and add it; Iterate. Final Model = sum of the initial model and the induced models ψm.]
Natarajan et al MLJ’12
BOOSTING ALGORITHM
For each gradient step m = 1 to M
    For each query predicate, P
        Generate trainset using previous model, F_{m-1}
            For each example, x
                Compute gradient for x
                Add <x, gradient(x)> to trainset
        Learn a regression function, T_{m,p}
        Add T_{m,p} to the model, F_m
    Set F_m as current model
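The loop can be sketched in a propositional setting. This is a simplification, not the relational algorithm: there is a single query predicate, examples are one-dimensional numbers with labels in {0, 1}, and a regression stump stands in for the relational regression tree. The gradient used is I(y = 1) - P(y = 1 | x) with P given by a sigmoid of the summed model, the usual form for a log-likelihood objective. All names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(trainset):
    """Least-squares regression stump on <x, gradient> pairs.

    Assumes 1-D x with at least two distinct values.
    """
    xs = sorted({x for x, _ in trainset})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2.0
        left = [g for x, g in trainset if x <= thr]
        right = [g for x, g in trainset if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((g - lv) ** 2 for g in left) + sum((g - rv) ** 2 for g in right)
        if best is None or err < best[0]:
            best = (err, thr, lv, rv)
    _, thr, lv, rv = best
    return lambda x: lv if x <= thr else rv


def boost(examples, num_steps=10):
    """examples: list of (x, y) with y in {0, 1}. Returns the function psi."""
    model = []                                   # F_0: psi(x) = 0 everywhere

    def psi(x):
        return sum(t(x) for t in model)

    for _ in range(num_steps):                   # for each gradient step m
        # trainset of <x, gradient(x)> computed with the previous model
        trainset = [(x, y - sigmoid(psi(x))) for x, y in examples]
        model.append(fit_stump(trainset))        # add T_m to the model
    return psi
```

Each stump fits what the current model still gets wrong, so the summed model moves the predicted probabilities toward the labels step by step.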
UW-CSE
UW-CSE    AUC-ROC  AUC-PR  Likelihood  Training Time
Boosting  0.96     0.93    0.81        9 s
RDN       0.88     0.78    0.80        1 s
Alchemy   0.53     0.62    0.73        93 hrs

• Predict advisedBy relation
• Given student, professor, courseTA, courseProf, etc. relations
• 5-fold cross validation
http://pages.cs.wisc.edu/~tushar/rdnboost/index.html
CARDIA
Family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial, pulmonary function, etc.
Goal is to identify risk factors in early adulthood that cause serious cardiovascular issues in older adults
Extremely rich dataset with 25 years of information
S. Natarajan, J. Carr
RESULTS
IMITATION LEARNING
Expert agent performs actions (trajectories)
Goal: Learn a policy from these trajectories to suggest actions based on current state
Natarajan et al. IJCAI’11
Gridworld domain Robocup domain
ALZHEIMER'S RESEARCH
AD: progressive neurodegenerative condition resulting in loss of cognitive abilities and memory
MRI: neuroimaging method; visualization of brain anatomy
Humans are not very good at identifying people with AD, especially before cognitive decline
MRI data – major source for distinguishing AD vs CN (Cognitively normal) or MCI vs CN
Natarajan et al. Under review
PROPOSITIONAL MODELS (WITH AAL)
CONCLUSION
Statistical Relational Learning combines first-order logic with probabilistic models
Relational trees used to represent conditional distributions
Boosting trees can be used to efficiently learn structure of SRL models