Inference on Relational Models Using Markov Chain Monte Carlo Brian Milch Massachusetts Institute of Technology UAI Tutorial July 19, 2007


Inference on Relational Models Using Markov Chain Monte Carlo

Brian Milch

Massachusetts Institute of Technology

UAI Tutorial

July 19, 2007


Example 1: Bibliographies

Citations:

S. Russel and P. Norvig (1995). Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Prentice Hall.

Russell, Stuart and Norvig, Peter. Articial Intelligence. Prentice-Hall, 1995.

Underlying objects: researchers Stuart Russell and Peter Norvig; publication Artificial Intelligence: A Modern Approach


Example 2: Aircraft Tracking

[Figure: observations at time steps t=1, t=2, t=3, with readings (1.9, 6.1, 2.2), (0.6, 5.9, 3.2), (1.9, 9.0, 2.1), (0.7, 5.1, 3.2), (1.8, 7.4, 2.3), (0.9, 5.8, 3.1)]


Inference on Relational Structures

[Figure: alternative relational structures for the same citation strings, built from researchers (“Russell”, “Norvig”, “Roberts”, “Seuss”, “Shak...”) and titles (“AI: A Mod...”, “Advance...”, “The...”, “If you...”, “Hamlet”, “Tempest”); values shown with the structures: 1.2 x 10^-12, 2.3 x 10^-12, 4.5 x 10^-14, 6.7 x 10^-16, 8.9 x 10^-16, 5.0 x 10^-20]


Markov Chain Monte Carlo (MCMC)

• Markov chain s1, s2, ... over worlds where evidence E is true

• Approximate P(Q|E) as fraction of s1, s2, ... that satisfy query Q

[Figure: regions E and Q in the space of possible worlds]


Outline

• Probabilistic models for relational structures
  – Modeling the number of objects
  – Three mistakes that are easy to make

• Markov chain Monte Carlo (MCMC)
  – Gibbs sampling
  – Metropolis-Hastings
  – MCMC over events

• Case studies
  – Citation matching
  – Multi-target tracking


Simple Example: Clustering

[Figure: wingspan measurements (cm) on an axis from 10 to 100, forming three clusters with $\mu = 22$, $\mu = 49$, $\mu = 80$]


Simple Bayesian Mixture Model

• Number of latent objects is known to be k

• For each latent object i, have parameter: $\mu_i \sim \mathrm{Uniform}[0, 100]$

• For each data point j, have object selector $C_j \sim \mathrm{Uniform}(\{1, \ldots, k\})$ and observable value $X_j \sim \mathrm{Normal}(\mu_{C_j}, 5^2)$
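The generative process above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the tutorial: the function name `sample_mixture`, the 0-based object indices, and the seed are assumptions made for the example.

```python
import random

def sample_mixture(k, n, rng, sigma=5.0):
    """Draw one possible world from the simple Bayesian mixture model."""
    # One parameter per latent object: mu_i ~ Uniform[0, 100].
    mus = [rng.uniform(0.0, 100.0) for _ in range(k)]
    # One selector per data point: C_j ~ Uniform({0, ..., k-1}) (0-based here).
    cs = [rng.randrange(k) for _ in range(n)]
    # One observation per data point: X_j ~ Normal(mu_{C_j}, sigma^2).
    xs = [rng.gauss(mus[c], sigma) for c in cs]
    return mus, cs, xs

rng = random.Random(0)
mus, cs, xs = sample_mixture(k=3, n=10, rng=rng)
print(mus, cs, xs)
```

Inference runs this process in reverse: only `xs` is observed, and the selectors and parameters must be recovered.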


BN for Mixture Model

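[Figure: Bayesian network with parameter nodes $\mu_1, \mu_2, \ldots, \mu_k$, selector nodes $C_1, \ldots, C_n$, and observation nodes $X_1, \ldots, X_n$]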


Context-Specific Dependencies

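[Figure: the same network with selector values $C_1 = 2$, $C_2 = 1$, $C_3 = 2$ shown; parameter nodes $\mu_1, \ldots, \mu_k$, selector nodes $C_1, \ldots, C_n$, observation nodes $X_1, \ldots, X_n$]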


Extensions to Mixture Model

• Random number of latent objects k, with distribution p(k) such as:
  – Uniform({1, …, 100})
  – Geometric(0.1) (unbounded!)
  – Poisson(10) (unbounded!)

• Random distribution $\theta$ for selecting objects
  – $\theta \mid k \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_k)$ (Dirichlet: distribution over probability vectors)
  – Still symmetric: each $\alpha_i = \alpha / k$


Existence versus Observation

• A latent object can exist even if no observations correspond to it
  – Bird species may not be observed yet
  – Aircraft may fly over without yielding any blips

• Two questions:
  – How many objects correspond to observations?
  – How many objects are there in total?

• Observed 3 species, each 100 times: probably no more

• Observed 200 species, each 1 or 2 times: probably more exist


Expecting Additional Objects

• P(ever observe new species | seen r so far) bounded by $P(k > r)$

• So as # species observed $\to \infty$, probability of ever seeing more $\to 0$

• What if we don’t want this?

[Figure: r observed species; will more be observed later?]
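To see how quickly the bound $P(k > r)$ shrinks, the tail can be computed directly. The sketch below assumes the Poisson(10) prior mentioned on the "Extensions to Mixture Model" slide; the helper name `poisson_tail` is made up for this illustration.

```python
import math

def poisson_tail(rate, r):
    """Return P(k > r) for k ~ Poisson(rate)."""
    cdf = sum(math.exp(-rate) * rate**i / math.factorial(i) for i in range(r + 1))
    return 1.0 - cdf

# The bound on ever observing a new species after r have been seen.
for r in (5, 10, 20, 40):
    print(r, poisson_tail(10.0, r))
```

Under this prior the bound is already tiny once a few dozen species have been seen, which is the behaviour the last bullet questions.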


Dirichlet Process Mixtures

• Set $k = \infty$, let $\theta$ be infinite-dimensional probability vector with stick-breaking prior

[Figure: stick broken into pieces $\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \ldots$]

• Another view: Define prior directly on partitions of data points, allowing unbounded number of blocks

• Drawback: Can’t ask about number of unobserved latent objects (always infinite)

[Ferguson 1983; Sethuraman 1994] [tutorials: Jordan 2005; Sudderth 2006]
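The slide names the stick-breaking prior without spelling it out. A common construction (the one usually attributed to Sethuraman 1994) breaks off a Beta(1, α) fraction of the remaining stick at each step; the sketch below assumes that construction, truncates it at a finite number of pieces, and uses the hypothetical names `stick_breaking` and `alpha`.

```python
import random

def stick_breaking(alpha, num_sticks, rng):
    """Return the first num_sticks weights of a stick-breaking draw, plus the leftover."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any object
    for _ in range(num_sticks):
        beta = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    return weights, remaining

weights, leftover = stick_breaking(alpha=2.0, num_sticks=20, rng=random.Random(0))
print(weights[:5], leftover)
```

The leftover mass never reaches exactly zero in the untruncated process, which is why the number of unobserved latent objects is always infinite under this prior.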


Mistake 1: Ignoring Interchangeability

• Which birds are in species S1?

• Latent object indices are interchangeable
  – Posterior on selector variable $C_{B1}$ is uniform
  – Posterior on $\mu_{S1}$ has a peak for each cluster of birds

• Really care about partition of observations

• Partition with r blocks corresponds to k! / (k-r)! instantiations of the $C_j$ variables

[Figure: birds B1 to B5 grouped as the partition {{1, 3}, {2}, {4, 5}}, which corresponds to instantiations (1, 2, 1, 3, 3), (1, 2, 1, 4, 4), (1, 4, 1, 3, 3), (2, 1, 2, 3, 3), …]


Ignoring Interchangeability, Cont’d

• Say k = 4. What’s prior probability that B1, B3 are in one species, B2 in another?

• Multiply probabilities for $C_{B1}$, $C_{B2}$, $C_{B3}$: (1/4) x (1/4) x (1/4)

• Not enough! Partition {{B1, B3}, {B2}} corresponds to 12 instantiations of C’s, so its prior probability is 12 x (1/4)^3 = 3/16

• Partition with r blocks corresponds to ${}_kP_r = k!/(k-r)!$ instantiations

(S1, S2, S1), (S1, S3, S1), (S1, S4, S1), (S2, S1, S2), (S2, S3, S2), (S2, S4, S2),
(S3, S1, S3), (S3, S2, S3), (S3, S4, S3), (S4, S1, S4), (S4, S2, S4), (S4, S3, S4)
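The count of 12 can be checked by brute force. The sketch below enumerates every assignment of three birds to k = 4 species and keeps those that induce the partition {{B1, B3}, {B2}}; the helper names and the 0-based labels are assumptions for this example.

```python
from itertools import product
from math import factorial

def partition_of(assignment):
    """Map a tuple of species labels to the partition of observation indices it induces."""
    blocks = {}
    for obs, label in enumerate(assignment):
        blocks.setdefault(label, []).append(obs)
    return frozenset(frozenset(block) for block in blocks.values())

def count_instantiations(k, target, n):
    """Count assignments of n observations to k labels that induce the target partition."""
    return sum(1 for a in product(range(k), repeat=n) if partition_of(a) == target)

k = 4
target = partition_of((0, 1, 0))           # B1 and B3 together, B2 alone
count = count_instantiations(k, target, 3)
prior = count / k**3                       # each instantiation has probability (1/4)^3
print(count, factorial(k) // factorial(k - 2), prior)
```

The enumeration agrees with k!/(k-r)! for r = 2 blocks, and the prior probability of the partition is the count times the probability of a single instantiation.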


Mistake 2: Underestimating the Bayesian Ockham’s Razor Effect

• Say k = 4. Are B1 and B2 in same species?

• Maximum-likelihood estimation would yield one species with $\mu = 50$ and another with $\mu = 52$

• But Bayesian model trades off likelihood against prior probability of getting those $\mu$ values

[Figure: wingspan axis (cm), 10 to 100, with observations $X_{B1} = 50$ and $X_{B2} = 52$]


Bayesian Ockham’s Razor

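[Figure: wingspan axis (cm), 10 to 100, with observations $X_{B1} = 50$ and $X_{B2} = 52$]

H1: Partition is {{B1, B2}}

$$p(H_1, \text{data}) = \tfrac{1}{4} \int_0^{100} p(\mu_1)\, p(x_1 \mid \mu_1)\, p(x_2 \mid \mu_1)\, d\mu_1 \approx 1.3 \times 10^{-4}$$

H2: Partition is {{B1}, {B2}}

$$p(H_2, \text{data}) = \tfrac{3}{4} \int_0^{100} p(\mu_1)\, p(x_1 \mid \mu_1)\, d\mu_1 \int_0^{100} p(\mu_2)\, p(x_2 \mid \mu_2)\, d\mu_2 \approx 7.5 \times 10^{-5}$$

Each integral in H2 is ≈ 0.01 (the prior density of each $\mu$ on [0, 100] is 0.01).

Don’t use more latent objects than necessary to explain your data

[MacKay 1992]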
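The two joint probabilities can be reproduced numerically. The sketch below assumes the model from the earlier slides (Uniform[0, 100] prior on each μ, observation standard deviation 5, k = 4 so that the prior on "same species" is 1/4) and uses a simple midpoint rule; the function names are hypothetical.

```python
import math

SIGMA = 5.0
LO, HI = 0.0, 100.0
PRIOR_DENSITY = 1.0 / (HI - LO)  # density of Uniform[0, 100]

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

x1, x2 = 50.0, 52.0

# H1: both birds share one parameter mu_1.
joint_h1 = 0.25 * integrate(
    lambda mu: PRIOR_DENSITY * normal_pdf(x1, mu, SIGMA) * normal_pdf(x2, mu, SIGMA),
    LO, HI)

# H2: each bird has its own parameter, so the integral factorizes.
joint_h2 = (0.75
            * integrate(lambda mu: PRIOR_DENSITY * normal_pdf(x1, mu, SIGMA), LO, HI)
            * integrate(lambda mu: PRIOR_DENSITY * normal_pdf(x2, mu, SIGMA), LO, HI))

print(joint_h1, joint_h2)
```

The results land close to the slide's 1.3 x 10^-4 and 7.5 x 10^-5: H1 wins even though H2 has three times the prior probability, because H2 pays the 0.01 prior density twice.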


Mistake 3: Comparing Densities Across Dimensions

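[Figure: wingspan axis (cm), 10 to 100, with observations $X_{B1} = 50$ and $X_{B2} = 52$]

H1: Partition is {{B1, B2}}, $\mu = 51$

$$p(H_1, \text{data}) = \tfrac{1}{4} \cdot 0.01 \cdot N(50; 51, 5^2)\, N(52; 51, 5^2) \approx 1.5 \times 10^{-5}$$

H2: Partition is {{B1}, {B2}}, $\mu_{B1} = 50$, $\mu_{B2} = 52$

$$p(H_2, \text{data}) = \tfrac{3}{4} \cdot 0.01 \cdot N(50; 50, 5^2) \cdot 0.01 \cdot N(52; 52, 5^2) \approx 4.8 \times 10^{-7}$$

H1 wins by greater margin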


What If We Change the Units?

[Figure: wingspan axis (m), 0.1 to 1.0, with observations $X_{B1} = 0.50$ and $X_{B2} = 0.52$]

H1: Partition is {{B1, B2}}, $\mu = 0.51$

$$p(H_1, \text{data}) = \tfrac{1}{4} \cdot 1 \cdot N(0.50; 0.51, 0.05^2)\, N(0.52; 0.51, 0.05^2) \approx 15$$

H2: Partition is {{B1}, {B2}}, $\mu_{B1} = 0.50$, $\mu_{B2} = 0.52$

$$p(H_2, \text{data}) = \tfrac{3}{4} \cdot 1 \cdot N(0.50; 0.50, 0.05^2) \cdot 1 \cdot N(0.52; 0.52, 0.05^2) \approx 48$$

density of Uniform(0, 1) is 1!

Now H2 wins by a landslide
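The unit dependence is easy to reproduce. The sketch below evaluates the same two density peaks in centimetres and in metres; it assumes the prior weights 1/4 and 3/4 for k = 4, and the function name `peak_densities` is made up for this example.

```python
import math

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def peak_densities(x1, x2, sd, prior_density):
    """Joint density at the best parameters under H1 (shared mu) and H2 (one mu each)."""
    mu = (x1 + x2) / 2.0
    h1 = 0.25 * prior_density * normal_pdf(x1, mu, sd) * normal_pdf(x2, mu, sd)
    h2 = 0.75 * (prior_density * normal_pdf(x1, x1, sd)) * (prior_density * normal_pdf(x2, x2, sd))
    return h1, h2

in_cm = peak_densities(50.0, 52.0, 5.0, 0.01)   # Uniform[0, 100] has density 0.01
in_m = peak_densities(0.50, 0.52, 0.05, 1.0)    # Uniform[0, 1] has density 1
print(in_cm, in_m)
```

H2 carries one more continuous parameter than H1, so rescaling the axis by 100 multiplies its density by one extra factor of 100 relative to H1, which is enough to flip the comparison.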


Lesson: Comparing Densities Across Dimensions

• Densities don’t behave like probabilities (e.g., they can be greater than 1)

• Heights of density peaks in spaces of different dimension are not comparable

• Work-arounds:
  – Find most likely partition first, then most likely parameters given that partition
  – Find region in parameter space where most of the posterior probability mass lies


Why Not Exact Inference?

• Number of possible partitions is superexponential in n

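• Variable elimination?
  – Summing out $\mu_i$ couples all the $C_j$’s
  – Summing out $C_j$ couples all the $\mu_i$’s

[Figure: Bayesian network with parameter nodes $\mu_1, \mu_2, \ldots, \mu_k$, selector nodes $C_1, \ldots, C_n$, and observation nodes $X_1, \ldots, X_n$]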


Markov Chain Monte Carlo (MCMC)

• Start in arbitrary state (possible world) s1 satisfying evidence E

• Sample $s_2, s_3, \ldots$ according to transition kernel $T(s_i, s_{i+1})$, yielding Markov chain

• Approximate p(Q | E) by fraction of $s_1, s_2, \ldots, s_L$ that are in Q

[Figure: regions E and Q in the space of possible worlds]


Why a Markov Chain?

• Why use Markov chain rather than sampling independently?
  – Stochastic local search for high-probability s
  – Once we find such s, explore around it


Convergence

• Stationary distribution $\pi$ is such that

$$\pi(s') = \sum_{s} \pi(s)\, T(s, s')$$

• If chain is ergodic (can get to anywhere from anywhere*), then:
  – It has unique stationary distribution $\pi$
  – Fraction of $s_1, s_2, \ldots, s_L$ in Q converges to $\pi(Q)$ as $L \to \infty$

• We’ll design T so $\pi(s) = p(s \mid E)$

* and it’s aperiodic


Gibbs Sampling

• Order non-evidence variables V1,V2,...,Vm

• Given state s, sample from T as follows:
  – Let $s' = s$
  – For i = 1 to m
    • Sample $v_i$ from $p(V_i \mid s'_{-i})$ (conditional for $V_i$ given other vars in $s'$)
    • Let $s' = (s'_{-i}, V_i = v_i)$
  – Return $s'$

• Theorem: stationary distribution is p(s | E)

[Geman & Geman 1984]


Gibbs on Bayesian Network

• Conditional for V depends only on factors that contain v:

$$p(v \mid s_{-V}) \propto p(v \mid s[\mathrm{Pa}(V)]) \prod_{Y \in \mathrm{ch}(V)} p(s[Y] \mid v, s[\mathrm{Pa}(Y)])$$

• So condition on V’s Markov blanket mb(V): parents, children, and co-parents


Gibbs on Bayesian Mixture Model

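• Given current state s:
  – Resample each $\mu_i$ given prior and $\{X_j : C_j = i \text{ in } s\}$
  – Resample each $C_j$ given $X_j$ and $\mu_{1:k}$

[Figure: Bayesian network with nodes $\mu_1, \ldots, \mu_k$, $C_1, \ldots, C_n$, $X_1, \ldots, X_n$, highlighting the context-specific Markov blanket]

[Neal 2000]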
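The two resampling steps can be sketched for the wingspan model as follows. This is an illustrative sketch, not the tutorial's code: it assumes the Uniform[0, 100] prior and standard deviation 5 from the earlier slides, under which the conditional for each $\mu_i$ is a normal centred on the mean of its assigned points and truncated to [0, 100]. All function names are hypothetical.

```python
import math
import random

SIGMA = 5.0          # observation standard deviation from the model
LO, HI = 0.0, 100.0  # support of the Uniform prior on each mu_i

def resample_mu(points, rng):
    """Sample mu_i from its conditional given the points currently assigned to it."""
    if not points:
        return rng.uniform(LO, HI)  # no observations: the conditional is the prior
    mean = sum(points) / len(points)
    sd = SIGMA / math.sqrt(len(points))
    while True:  # rejection step enforces the prior's [LO, HI] support
        mu = rng.gauss(mean, sd)
        if LO <= mu <= HI:
            return mu

def resample_selector(x, mus, rng):
    """Sample C_j with probability proportional to Normal(x; mu_i, SIGMA^2)."""
    weights = [math.exp(-0.5 * ((x - mu) / SIGMA) ** 2) for mu in mus]
    return rng.choices(range(len(mus)), weights=weights)[0]

def gibbs_sweep(xs, mus, cs, rng):
    """One full Gibbs pass over all mu_i and all C_j, updating both lists in place."""
    for i in range(len(mus)):
        mus[i] = resample_mu([x for x, c in zip(xs, cs) if c == i], rng)
    for j, x in enumerate(xs):
        cs[j] = resample_selector(x, mus, rng)

rng = random.Random(0)
xs = [20.0, 22.0, 24.0, 78.0, 80.0, 83.0]
mus = [rng.uniform(LO, HI) for _ in range(2)]
cs = [rng.randrange(2) for _ in xs]
for _ in range(100):
    gibbs_sweep(xs, mus, cs, rng)
print(mus, cs)
```

Note that a species with no assigned points is resampled from the prior, which is the unguided random walk that makes plain Gibbs slow on this model.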


Sampling Given Markov Blanket

• If V is discrete, just iterate over values, normalize, sample from discrete distrib.

• If V is continuous:
  – Simple if child distributions are conjugate to V’s prior: posterior has same form as prior with different parameters
  – In general, even sampling from $p(v \mid s_{-V})$ can be hard

$$p(v \mid s_{-V}) \propto p(v \mid s[\mathrm{Pa}(V)]) \prod_{Y \in \mathrm{ch}(V)} p(s[Y] \mid v, s[\mathrm{Pa}(Y)])$$

[See BUGS software: http://www.mrc-bsu.cam.ac.uk/bugs]


Convergence Can Be Slow

• $C_j$’s won’t change until $\mu_2$ is in right area

• $\mu_2$ does unguided random walk as long as no observations are associated with it
  – Especially bad in high dimensions

[Figure: wingspan axis (cm), 10 to 100; the data should be two clusters, but with $\mu_1 = 20$ and $\mu_2 = 90$, species 2 is far away]


Metropolis-Hastings

• Define $T(s_i, s_{i+1})$ as follows:
  – Sample $s'$ from proposal distribution $q(s' \mid s_i)$
  – Compute acceptance probability

$$\alpha = \min\left(1,\ \frac{p(s' \mid E)\, q(s_i \mid s')}{p(s_i \mid E)\, q(s' \mid s_i)}\right)$$

    (first ratio: relative posterior probabilities; second ratio: backward / forward proposal probabilities)
  – With probability $\alpha$, let $s_{i+1} = s'$; else let $s_{i+1} = s_i$

Can show that p(s | E) is stationary distribution for T

[Metropolis et al. 1953; Hastings 1970]
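The transition above can be sketched generically in Python. The step function works in log space and needs the target only up to a constant; the toy target, the ring-shaped random-walk proposal, and all names (`mh_step`, `log_p`, `log_q`) are assumptions made for illustration.

```python
import math
import random

def mh_step(state, log_p, propose, log_q, rng):
    """One Metropolis-Hastings transition; log_q(a, b) means log q(a | b)."""
    proposal = propose(state, rng)
    log_ratio = (log_p(proposal) - log_p(state)                      # relative posterior
                 + log_q(state, proposal) - log_q(proposal, state))  # backward / forward
    accept_prob = math.exp(min(0.0, log_ratio))
    return proposal if rng.random() < accept_prob else state

# Toy target (illustrative only): states 0..9 with p(s) proportional to s + 1.
def log_p(s):
    return math.log(s + 1)

def propose(s, rng):
    return (s + rng.choice((-1, 1))) % 10  # symmetric random walk on a ring

def log_q(new, old):
    return math.log(0.5)  # each neighbour is proposed with probability 1/2

rng = random.Random(0)
state, hits, steps = 0, 0, 200_000
for _ in range(steps):
    state = mh_step(state, log_p, propose, log_q, rng)
    hits += state >= 5
print(hits / steps)  # estimate of P(s >= 5); the exact value is 40/55
```

Because only differences of `log_p` appear, the normalization constant of the posterior never has to be computed.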


Metropolis-Hastings

• Benefits
  – Proposal distribution can propose big steps involving several variables
  – Only need to compute ratio $p(s' \mid E) / p(s \mid E)$, ignoring normalization factors
  – Don’t need to sample from conditional distribs

• Limitations
  – Proposals must be reversible, else $q(s \mid s') = 0$
  – Need to be able to compute $q(s \mid s') / q(s' \mid s)$


Split-Merge Proposals

• Choose two observations i, j

• If $C_i = C_j = c$, then split cluster c
  – Get unused latent object $c'$
  – For each observation m such that $C_m = c$, change $C_m$ to $c'$ with probability 0.5
  – Propose new values for $\mu_c$, $\mu_{c'}$

• Else merge clusters $c_i$ and $c_j$
  – For each m such that $C_m = c_j$, set $C_m = c_i$
  – Propose new value for $\mu_{c_i}$

[Jain & Neal 2004]
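The assignment part of this proposal can be sketched as below. The sketch follows the simplified description on the slide rather than the full procedure of Jain & Neal; it leaves out the proposal of new μ values and the acceptance ratio, and the function name and integer cluster labels are assumptions for this example.

```python
import random

def propose_split_merge(assign, rng):
    """Propose new cluster labels; assign[m] is the integer label of observation m."""
    i, j = rng.sample(range(len(assign)), 2)  # two distinct observations
    new = list(assign)
    if assign[i] == assign[j]:
        c = assign[i]
        c_new = max(assign) + 1  # an unused label stands in for an unused latent object
        for m, label in enumerate(assign):
            if label == c and rng.random() < 0.5:
                new[m] = c_new   # move this observation to the new cluster
        return new, "split"
    c_i, c_j = assign[i], assign[j]
    for m, label in enumerate(assign):
        if label == c_j:
            new[m] = c_i         # fold cluster c_j into cluster c_i
    return new, "merge"

rng = random.Random(0)
print(propose_split_merge([0, 0, 0, 1, 1], rng))
```

A complete sampler would pair this with proposed parameter values and the Metropolis-Hastings acceptance test, which requires the forward and backward proposal probabilities of the move.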


Split-Merge Example

[Figure: wingspan axis (cm), 10 to 100, with $\mu_1 = 20$ and $\mu_2 = 90$ before the move, and $\mu_2 = 27$ after it]

• Split two birds from species 1

• Resample $\mu_2$ to match these two birds

• Move is likely to be accepted


Mixtures of Kernels

• If $T_1, \ldots, T_m$ all have stationary distribution $\pi$, then so does mixture

$$T(s, s') = \sum_{i=1}^{m} w_i\, T_i(s, s')$$

• Example: Mixture of split-merge and Gibbs moves

• Point: Faster convergence
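The claim about mixtures follows in one line from the stationarity condition on the Convergence slide. The derivation below assumes the weights $w_i$ are non-negative and sum to 1, which the slide does not state explicitly.

```latex
\begin{align*}
\sum_{s} \pi(s)\, T(s, s')
  &= \sum_{s} \pi(s) \sum_{i=1}^{m} w_i\, T_i(s, s') \\
  &= \sum_{i=1}^{m} w_i \sum_{s} \pi(s)\, T_i(s, s') \\
  &= \sum_{i=1}^{m} w_i\, \pi(s') \;=\; \pi(s')
\end{align*}
```

The third step uses the fact that each $T_i$ has $\pi$ as a stationary distribution; the last uses $\sum_i w_i = 1$.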


MCMC States in Split-Merge

• Not complete instantiations!
  – No parameters for unobserved species

• States are partial instantiations of random variables, for example:

  k = 12, $C_{B1}$ = S2, $C_{B2}$ = S8, $\mu_{S2}$ = 31, $\mu_{S8}$ = 84

  – Each state corresponds to an event: set of outcomes satisfying description


MCMC over Events

• Markov chain over events $\sigma$, with stationary distrib. proportional to $p(\sigma)$

• Theorem: Fraction of visited events in Q converges to p(Q|E) if:
  – Each $\sigma$ is either subset of Q or disjoint from Q
  – Events form partition of E

[Figure: evidence region E divided into events, with query region Q]

[Milch & Russell 2006]


Computing Probabilities of Events

• Engine needs to compute $p(\sigma') / p(\sigma_n)$ efficiently (without summations)

• Use instantiations that include all active parents of the variables they instantiate

• Then probability is product of CPDs:

$$p(\sigma) = \prod_{X \in \mathrm{vars}(\sigma)} p_X\big(\sigma(X) \mid \sigma(\mathrm{Pa}(X))\big)$$


States That Are Even More Abstract

• Typical partial instantiation:

  k = 12, $C_{B1}$ = S2, $C_{B2}$ = S8, $\mu_{S2}$ = 31, $\mu_{S8}$ = 84

  – Specifies particular species numbers, even though species are interchangeable

• Let states be abstract partial instantiations:

  $\exists x\ \exists y \neq x$ [k = 12, $C_{B1}$ = x, $C_{B2}$ = y, $\mu_x$ = 31, $\mu_y$ = 84]

• See [Milch & Russell 2006] for conditions under which we can compute probabilities of such events


Representative Applications

• Tracking cars with cameras [Pasula et al. 1999]

• Segmentation in computer vision [Tu & Zhu 2002]

• Citation matching [Pasula et al. 2003]

• Multi-target tracking with radar [Oh et al. 2004]


Citation Matching Model

#Researcher ~ NumResearchersPrior();

Name(r) ~ NamePrior();

#Paper ~ NumPapersPrior();

FirstAuthor(p) ~ Uniform({Researcher r});

Title(p) ~ TitlePrior();

PubCited(c) ~ Uniform({Paper p});

Text(c) ~ NoisyCitationGrammar (Name(FirstAuthor(PubCited(c))), Title(PubCited(c)));

[Pasula et al. 2003; Milch & Russell 2006]


Citation Matching

• Elaboration of generative model shown earlier

• Parameter estimation
  – Priors for names, titles, citation formats learned offline from labeled data
  – String corruption parameters learned with Monte Carlo EM

• Inference
  – MCMC with split-merge proposals
  – Guided by “canopies” of similar citations
  – Accuracy stabilizes after ~20 minutes

[Pasula et al., NIPS 2002]


Citation Matching Results

Four data sets of ~300-500 citations, referring to ~150-300 papers

[Figure: bar chart of error (fraction of clusters not recovered correctly), vertical axis from 0 to 0.25, for the data sets Reinforce, Face, Reason, and Constraint. Methods compared: Phrase Matching [Lawrence et al. 1999]; Generative Model + MCMC [Pasula et al. 2002]; Conditional Random Field [Wellner et al. 2004]]


Cross-Citation Disambiguation

Wauchope, K. Eucalyptus: Integrating Natural Language Input with a Graphical User Interface. NRL Report NRL/FR/5510-94-9711 (1994).

Is "Eucalyptus" part of the title, or is the author named K. Eucalyptus Wauchope?

Kenneth Wauchope (1994). Eucalyptus: Integrating natural language input with a graphical user interface. NRL Report NRL/FR/5510-94-9711, Naval Research Laboratory, Washington, DC, 39pp.

Second citation makes it clear how to parse the first one


Preliminary Experiments: Information Extraction

• P(citation text | title, author names) modeled with simple HMM

• For each paper: recover title, author surnames and given names

• Fraction whose attributes are recovered perfectly in last MCMC state:
  – among papers with one citation: 36.1%
  – among papers with multiple citations: 62.6%

Can use inferred knowledge for disambiguation


Multi-Object Tracking

[Figure: tracks and detections, with labels “False Detection” and “Unobserved Object”]


State Estimation for “Aircraft”

#Aircraft ~ NumAircraftPrior();

State(a, t)
    if t = 0 then ~ InitState()
    else ~ StateTransition(State(a, Pred(t)));

#Blip(Source = a, Time = t) ~ NumDetectionsCPD(State(a, t));

#Blip(Time = t) ~ NumFalseAlarmsPrior();

ApparentPos(r)
    if (Source(r) = null) then ~ FalseAlarmDistrib()
    else ~ ObsCPD(State(Source(r), Time(r)));


Aircraft Entering and Exiting

#Aircraft(EntryTime = t) ~ NumAircraftPrior();

Exits(a, t)
    if InFlight(a, t) then ~ Bernoulli(0.1);

InFlight(a, t)
    if t < EntryTime(a) then = false
    elseif t = EntryTime(a) then = true
    else = (InFlight(a, Pred(t)) & !Exits(a, Pred(t)));

State(a, t)
    if t = EntryTime(a) then ~ InitState()
    elseif InFlight(a, t) then ~ StateTransition(State(a, Pred(t)));

#Blip(Source = a, Time = t)
    if InFlight(a, t) then ~ NumDetectionsCPD(State(a, t));

…plus last two statements from previous slide


MCMC for Aircraft Tracking

• Uses generative model from previous slide (although not with BLOG syntax)

• Examples of Metropolis-Hastings proposals:

[Oh et al., CDC 2004] [Figures by Songhwai Oh]


Aircraft Tracking Results

[Figure panels: Estimation Error; Running Time]

MCMC has smallest error, hardly degrades at all as tracks get dense

MCMC is nearly as fast as greedy algorithm; much faster than MHT

[Oh et al., CDC 2004] [Figures by Songhwai Oh]


Toward General-Purpose Inference

• Currently, each new application requires new code for:
  – Proposing moves
  – Representing MCMC states
  – Computing acceptance probabilities

• Goal:
  – User specifies model and proposal distribution
  – General-purpose code does the rest


General MCMC Engine

Model (in declarative language)
• Define p(s)

Custom proposal distribution (Java class)
• Propose MCMC state $s'$ given $s_n$
• Compute ratio $q(s_n \mid s') / q(s' \mid s_n)$

General-purpose engine (Java code)
• Compute acceptance probability based on model
• Set $s_{n+1}$

MCMC states: partial worlds [Milch & Russell 2006]

Handle arbitrary proposals efficiently using context-specific structure


Summary

• Models for relational structures go beyond standard probabilistic inference settings

• MCMC provides a feasible path for inference

• Open problems
  – More general inference
  – Adaptive MCMC
  – Integrating discriminative methods


References

• Blei, D. M. and Jordan, M. I. (2005) “Variational inference for Dirichlet process mixtures”. J. Bayesian Analysis 1(1):121-144.

• Casella, G. and Robert, C. P. (1996) “Rao-Blackwellisation of sampling schemes”. Biometrika 83(1):81-94.

• Ferguson, T. S. (1983) “Bayesian density estimation by mixtures of normal distributions”. In Rizvi, M. H. et al., eds. Recent Advances in Statistics: Papers in Honor of Herman Chernoff on His Sixtieth Birthday. Academic Press, New York, pages 287-302.

• Geman, S. and Geman, D. (1984) “Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images”. IEEE Trans. on Pattern Analysis and Machine Intelligence 6:721-741.

• Gilks, W. R., Thomas, A. and Spiegelhalter, D. J. (1994) “A language and program for complex Bayesian modelling”. The Statistician 43(1):169-177.

• Gilks, W. R., Richardson, S., and Spiegelhalter, D. J., eds. (1996) Markov Chain Monte Carlo in Practice. Chapman and Hall.

• Green, P. J. (1995) “Reversible jump Markov chain Monte Carlo computation and Bayesian model determination”. Biometrika 82(4):711-732.


• Hastings, W. K. (1970) “Monte Carlo sampling methods using Markov chains and their applications”. Biometrika 57:97-109.

• Jain, S. and Neal, R. M. (2004) “A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model”. J. Computational and Graphical Statistics 13(1):158-182.

• Jordan, M. I. (2005) “Dirichlet processes, Chinese restaurant processes, and all that”. Tutorial at the NIPS Conference, available at http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps

• MacKay, D. J. C. (1992) “Bayesian Interpolation”. Neural Computation 4(3):414-447.

• MacEachern, S. N. (1994) “Estimating normal means with a conjugate style Dirichlet process prior”. Communications in Statistics: Simulation and Computation 23:727-741.

• Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953) “Equations of state calculations by fast computing machines”. J. Chemical Physics 21:1087-1092.

• Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., and Kolobov, A. (2005) “BLOG: Probabilistic Models with Unknown Objects”. In Proc. 19th Int’l Joint Conf. on AI, pages 1352-1359.

• Milch, B. and Russell, S. (2006) “General-purpose MCMC inference over relational structures”. In Proc. 22nd Conf. on Uncertainty in AI, pages 349-358.


• Neal, R. M. (2000) “Markov chain sampling methods for Dirichlet process mixture models”. J. Computational and Graphical Statistics 9:249-265.

• Oh, S., Russell, S. and Sastry, S. (2004) “Markov chain Monte Carlo data association for general multi-target tracking problems”. In Proc. 43rd IEEE Conf. on Decision and Control, pages 734-742.

• Pasula, H., Russell, S. J., Ostland, M., and Ritov, Y. (1999) “Tracking many objects with many sensors”. In Proc. 16th Int’l Joint Conf. on AI, pages 1160-1171.

• Pasula, H., Marthi, B., Milch, B., Russell, S., and Shpitser, I. (2003) “Identity uncertainty and citation matching”. In Advances in Neural Information Processing Systems 15, MIT Press, pages 1401-1408.

• Richardson, S. and Green, P. J. (1997) “On Bayesian analysis of mixtures with an unknown number of components”. J. Royal Statistical Society B 59:731-792.

• Sethuraman, J. (1994) “A constructive definition of Dirichlet priors”. Statistica Sinica 4:639-650.

• Sudderth, E. (2006) “Graphical models for visual object recognition and tracking”. Ph.D. thesis, Dept. of EECS, Massachusetts Institute of Technology, Cambridge, MA.

• Tu, Z. and Zhu, S.-C. (2002) “Image segmentation by data-driven Markov chain Monte Carlo”. IEEE Trans. Pattern Analysis and Machine Intelligence 24(5):657-673.