Statistical Relational Learning

Joint Work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik

BAYESIAN NETWORKS

Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

P(Earthquake) = 0.01
P(Burglary) = 0.01

P(Alarm | Earthquake, Burglary):

e  b  P(a)
0  0  0.1
0  1  0.8
1  0  0.6
1  1  0.9
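
The numbers on this slide are enough to work out how likely the alarm is to sound. The sketch below sums the joint distribution over Earthquake and Burglary. It assumes the two are independent root nodes and that the table columns read (e, b, P(a)); the names `prior` and `marginal_alarm` are illustrative only.

```python
from itertools import product

# Priors and conditional probability table as read from the slide.
p_earthquake = 0.01
p_burglary = 0.01
p_alarm_given = {  # keys are (earthquake, burglary)
    (0, 0): 0.1,
    (0, 1): 0.8,
    (1, 0): 0.6,
    (1, 1): 0.9,
}


def prior(value, p_true):
    """Probability that a binary variable takes `value`."""
    return p_true if value == 1 else 1.0 - p_true


def marginal_alarm():
    """P(Alarm = 1), summing the joint over Earthquake and Burglary."""
    total = 0.0
    for e, b in product((0, 1), repeat=2):
        # Assumption: Earthquake and Burglary are independent roots.
        total += prior(e, p_earthquake) * prior(b, p_burglary) * p_alarm_given[(e, b)]
    return total


print(round(marginal_alarm(), 5))
```

Most of the probability mass comes from the (0, 0) row, because both causes are rare.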

BAYESIAN NETWORK FOR A CITY

[Figure: one copy of the Burglary, Earthquake, Alarm network for each of the houses H1 to H5, each with Calls nodes for nearby houses (Calls(H1) through Calls(H6)).]

SHARED VARIABLES

Earthquake(BL)

Burglary(H1), Burglary(H2), Burglary(H3), Burglary(H4)

Alarm(H1), Alarm(H2), Alarm(H3), Alarm(H4)

Calls(H1), Calls(H2), Calls(H3), Calls(H4), Calls(H5)

FIRST ORDER LOGIC

Predicates: Burglary(house), Earthquake(city), Alarm(house), Calls(nhouse), HouseInCity(house, city), Neighbor(house, nhouse)

Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)

e  b  P(a)
0  0  0.1
0  1  0.8
1  0  0.6
1  1  0.9
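
One way to see what the first-order rule buys is to ground it: each HouseInCity fact yields one Alarm node whose parents are that city's Earthquake and that house's Burglary. The sketch below does this for hypothetical facts (houses H1 to H4 in a city BL, names borrowed from the earlier slides); `ground_alarm_rule` is an illustrative helper, not part of any SRL system.

```python
# Hypothetical facts: four houses in one city.
house_in_city = [("H1", "BL"), ("H2", "BL"), ("H3", "BL"), ("H4", "BL")]


def ground_alarm_rule(facts):
    """Instantiate
    Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
    once per HouseInCity fact, mapping each ground head to its ground parents."""
    grounded = {}
    for house, city in facts:
        head = f"Alarm({house})"
        grounded[head] = [f"Earthquake({city})", f"Burglary({house})"]
    return grounded


for head, parents in ground_alarm_rule(house_in_city).items():
    print(head, "<-", ", ".join(parents))
```

Every ground Alarm node shares the single Earthquake(BL) parent, and all of them can reuse the one conditional probability table attached to the rule.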

LOGIC + PROBABILITY = STATISTICAL RELATIONAL LEARNING MODELS

Logic, add probabilities -> Statistical Relational Learning (SRL)

Probabilities, add relations -> Statistical Relational Learning (SRL)

[Figure: example model relating PRating, CRating, and Diff]

ALPHABET SOUP

Knowledge-based model construction [Wellman et al., 1992]
Stochastic logic programs [Muggleton, 1996]
PRISM [Sato & Kameya, 1997]
Probabilistic relational models [Friedman et al., 1999]
Bayesian logic programs [Kersting & De Raedt, 2001]
Bayesian logic [Milch et al., 2005]
Markov logic [Richardson & Domingos, 2006]
Relational dependency networks [Neville & Jensen, 2007]
ProbLog [De Raedt et al., 2007]

And many others!

RELATIONAL DATABASE

Tables and their columns:

Prof | Level
Prof | Course | Rating
Course | Diff
Student | Course | Grade
Student | IQ | Satisfaction

FIRST ORDER LOGIC

Prof(P)
Level(P,L)
Course(C)
Diff(C)
taughtBy(P,C)
ratings(P,C,R)
Student(S)
IQ(S,I)
satis(S,B)
takes(S,C)
grade(S,C,G)
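
Moving from the relational tables to these predicates is mechanical: each row becomes one ground fact. A minimal sketch with hypothetical rows follows. The slides give only the column names, so the values and the helper `rows_to_facts` are assumptions made for illustration.

```python
# Hypothetical rows; only the column names come from the slides.
level_rows = [("p1", "full")]            # Prof | Level
rating_rows = [("p1", "c1", "high")]     # Prof | Course | Rating
grade_rows = [("s1", "c1", "A")]         # Student | Course | Grade


def rows_to_facts(predicate, rows):
    """Turn each table row into a ground atom such as Level(p1,full)."""
    return [f"{predicate}({','.join(row)})" for row in rows]


facts = (
    rows_to_facts("Level", level_rows)
    + rows_to_facts("ratings", rating_rows)
    + rows_to_facts("grade", grade_rows)
)
print(facts)
```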

GRAPHICAL MODEL

satisfaction(S, B)

Diff(S, C, D), grades(S, C, G)

avgGrade(S, G), avgDiff(S, D)

P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
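
A student takes a varying number of courses, so the model conditions on aggregates rather than on individual grades. The sketch below computes avgGrade and avgDiff per student from ground facts. The data are hypothetical, grades are assumed to be numeric, and difficulty is stored per course for brevity.

```python
from collections import defaultdict

# Hypothetical ground facts: grades(S, C, G) and a per-course difficulty.
grades = [("s1", "c1", 4.0), ("s1", "c2", 3.0), ("s2", "c1", 2.0)]
diff = {"c1": 2.0, "c2": 1.0}


def aggregate(grades, diff):
    """Return {student: (avgGrade, avgDiff)} over the courses each student took."""
    by_student = defaultdict(list)
    for student, course, grade in grades:
        by_student[student].append((grade, diff[course]))
    result = {}
    for student, pairs in by_student.items():
        avg_grade = sum(g for g, _ in pairs) / len(pairs)
        avg_diff = sum(d for _, d in pairs) / len(pairs)
        result[student] = (avg_grade, avg_diff)
    return result


print(aggregate(grades, diff))
```

Each student now has a fixed-size parent set for satisfaction, however many courses were taken.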

RELATIONAL DECISION TREE

Tests, from the root down: speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician). Each test has a yes and a no branch; the leaves are labeled N, N, N, Y, Y.

Name   Speed  Job         Fine
Bob    120    Teacher     N
Alice  150    Writer      N
John   180    Politician  N
Mary   160    Student     Y
Mike   140    Engineer    Y

Person1  Person2
Alice    John
Mary     Mike
Mary     Alice
Bob      Mike
Bob      Mary

RELATIONAL DECISION TREE

Classifying Alice binds the variables one test at a time:

speed(Alice,150), 150>120
job(Alice, politician)
knows(Alice,John)
job(John, politician)
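
The tree can be sketched as nested tests over the two tables. The slides do not preserve which branch leads to which leaf, so the assignment below is an assumption, chosen because it reproduces the Fine column for all five people. The function name `fine` is illustrative.

```python
speed = {"Bob": 120, "Alice": 150, "John": 180, "Mary": 160, "Mike": 140}
job = {
    "Bob": "teacher",
    "Alice": "writer",
    "John": "politician",
    "Mary": "student",
    "Mike": "engineer",
}
knows = [
    ("Alice", "John"),
    ("Mary", "Mike"),
    ("Mary", "Alice"),
    ("Bob", "Mike"),
    ("Bob", "Mary"),
]


def fine(x):
    """Classify person x; the branch-to-leaf mapping is an assumption."""
    if not speed[x] > 120:            # speed(X,S), S>120 fails
        return "N"
    if job[x] == "politician":        # job(X, politician) succeeds
        return "N"
    acquaintances = [y for p, y in knows if p == x]
    if not acquaintances:             # knows(X,Y) has no binding for Y
        return "Y"
    if any(job[y] == "politician" for y in acquaintances):  # job(Y, politician)
        return "N"
    return "Y"


for name in speed:
    print(name, fine(name))
```

The last test is existential: it succeeds if any Y bound by knows(X,Y) is a politician, which is how Alice reaches an N leaf through John.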

RELATIONAL PROBABILITY TREES

Use probabilities on the leaves

Can be used to represent the conditional distributions

Can use regression values on leaves to represent regression functions

Tests, from the root down: speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician). Each test has a yes and a no branch; the leaves hold the probabilities 0.1, 0.2, 0.4, 0.8, 0.8.

STRUCTURE LEARNING PROBLEM

Learn the structure of the conditional distributions

Find the parents and the distribution for the target concept

Target: satisfaction(S, B)

Candidate parents: avgGrade(S, G), avgDiff(S, D), IQ(S, I), level(P, L)

RELATIONAL TREE LEARNING

Examples and regression targets:

X   Δ
x1  0.7
x2  -0.2
x3  -0.9

Facts:

paper(X, Y): (x1, y1), (x1, y2), (x3, y1)
student(X): x1, x2

Candidate tests: paper(X, Y), student(X), adviser(X)

Learned tree:

student(X) = T
    paper(X,Y) = T: 0.7 (covers x1)
    paper(X,Y) = F: -0.2 (covers x2)
student(X) = F: -0.9 (covers x3)

0.25
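
The choice of root test can be sketched as a search for the split with the lowest squared error on the regression targets. The code below uses the facts from this slide. It assumes there are no adviser facts, treats paper(X,Y) as true when some Y exists, and uses squared error as the scoring function; the slides confirm none of these.

```python
targets = {"x1": 0.7, "x2": -0.2, "x3": -0.9}
student = {"x1", "x2"}
paper = {("x1", "y1"), ("x1", "y2"), ("x3", "y1")}
adviser = set()  # assumption: no adviser facts are given

tests = {
    "student(X)": lambda x: x in student,
    "paper(X,Y)": lambda x: any(a == x for a, _ in paper),  # exists Y
    "adviser(X)": lambda x: x in adviser,
}


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty branch)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_error(test):
    """Total squared error after splitting the examples on `test`."""
    yes = [d for x, d in targets.items() if test(x)]
    no = [d for x, d in targets.items() if not test(x)]
    return sse(yes) + sse(no)


best = min(tests, key=lambda name: split_error(tests[name]))
print(best, round(split_error(tests[best]), 3))
```

Under these assumptions student(X) is chosen first, and its true branch (x1 and x2) has mean 0.25 before paper(X,Y) splits it further.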

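FUNCTIONAL GRADIENT BOOSTING

Sequentially learn models where each subsequent model corrects the previous model

[Figure: Data - Predictions = Residues; induce a model from the residues, add it to the current model, and iterate, starting from the Initial Model. Final Model ψm = Initial Model + the models induced at each iteration.]

Natarajan et al., MLJ’12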

BOOSTING ALGORITHM

For each gradient step m = 1 to M
    For each query predicate P
        Generate trainset using previous model, Fm-1
            For each example x
                Compute gradient for x
                Add <x, gradient(x)> to trainset
        Learn a regression function, Tm,p
        Add Tm,p to the model, Fm
    Set Fm as current model
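
The loop above can be sketched in a few lines for a single query predicate. This is a simplified propositional stand-in, not the relational learner from the talk. Examples are tuples of boolean features, the regression function is a one-test tree, and the gradient is taken to be I(y = 1) - P(y = 1 | x), the gradient of the log-likelihood under a sigmoid link. The names `fit_stump` and `boost` and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(rows, targets):
    """One-test regression tree: pick the feature whose split minimizes squared error."""
    best = None
    for f in range(len(rows[0])):
        yes = [t for r, t in zip(rows, targets) if r[f]]
        no = [t for r, t in zip(rows, targets) if not r[f]]
        mean_yes = sum(yes) / len(yes) if yes else 0.0
        mean_no = sum(no) / len(no) if no else 0.0
        err = sum((t - mean_yes) ** 2 for t in yes) + sum((t - mean_no) ** 2 for t in no)
        if best is None or err < best[0]:
            best = (err, f, mean_yes, mean_no)
    _, f, mean_yes, mean_no = best
    return lambda r: mean_yes if r[f] else mean_no


def boost(rows, labels, steps=20):
    trees = []  # the model F_m is the sum of these regression functions

    def score(r):
        return sum(tree(r) for tree in trees)

    for _ in range(steps):
        # Generate the trainset from the previous model F_{m-1}.
        gradients = [y - sigmoid(score(r)) for r, y in zip(rows, labels)]
        # Learn a regression function on <x, gradient(x)> and add it to the model.
        trees.append(fit_stump(rows, gradients))
    return lambda r: sigmoid(score(r))


rows = [(1, 0), (1, 1), (0, 1), (0, 0)]
labels = [1, 1, 0, 0]  # the label follows the first feature
model = boost(rows, labels)
print(round(model((1, 0)), 2), round(model((0, 1)), 2))
```

Each new tree fits what the current model still gets wrong, so the gradients shrink as the predicted probabilities approach the labels.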

UW-CSE

          AUC-ROC  AUC-PR  Likelihood  Training Time
Boosting  0.96     0.93    0.81        9 s
RDN       0.88     0.78    0.80        1 s
Alchemy   0.53     0.62    0.73        93 hrs

• Predict advisedBy relation
• Given student, professor, courseTA, courseProf, etc. relations
• 5-fold cross validation

http://pages.cs.wisc.edu/~tushar/rdnboost/index.html

CARDIA

Family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial, pulmonary function, etc.

Goal is to identify risk factors in early adulthood that cause serious cardiovascular issues in older adults

Extremely rich dataset with 25 years of information

S. Natarajan, J. Carr

RESULTS

IMITATION LEARNING

Expert agent performs actions (trajectories)

Goal: Learn a policy from these trajectories to suggest actions based on current state

Natarajan et al., IJCAI’11

Gridworld domain, Robocup domain
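
The simplest form of this idea is to copy, for each state, the action the expert chose most often. The sketch below does that for hypothetical gridworld trajectories. It is a tabular stand-in for illustration, not the relational boosting approach of the cited work, and `learn_policy` is an assumed name.

```python
from collections import Counter, defaultdict

# Hypothetical expert trajectories: lists of (state, action) pairs.
trajectories = [
    [((0, 0), "right"), ((1, 0), "right"), ((2, 0), "up")],
    [((0, 0), "right"), ((1, 0), "right"), ((2, 0), "up")],
    [((0, 0), "up"), ((0, 1), "right")],
]


def learn_policy(trajectories):
    """Map each observed state to the expert's most frequent action there."""
    counts = defaultdict(Counter)
    for trajectory in trajectories:
        for state, action in trajectory:
            counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


policy = learn_policy(trajectories)
print(policy[(0, 0)])
```

A table like this cannot suggest an action for an unseen state, which is one motivation for learning a policy over relational features instead.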

ALZHEIMER'S RESEARCH

AD: Progressive neurodegenerative condition resulting in loss of cognitive abilities and memory

MRI: neuroimaging method; visualization of brain anatomy

Humans are not very good at identifying people with AD, especially before cognitive decline

MRI data: major source for distinguishing AD vs CN (cognitively normal) or MCI (mild cognitive impairment) vs CN

Natarajan et al., under review

PROPOSITIONAL MODELS (WITH AAL)

CONCLUSION

Statistical Relational Learning combines first-order logic with probabilistic models

Relational trees are used to represent conditional distributions

Boosting trees can be used to efficiently learn the structure of SRL models
