Dynamic Programming & Hidden Markov Models.


Transcript of Dynamic Programming & Hidden Markov Models.

Page 1: Dynamic Programming & Hidden Markov Models.

Dynamic Programming & Hidden Markov Models.

Alan Yuille

Dept. Statistics UCLA

Page 2: Dynamic Programming & Hidden Markov Models.

Goal of this Talk

• This talk introduces one of the major algorithms: dynamic programming (DP).

• It then describes how DP can be used in conjunction with EM for learning.


Page 3: Dynamic Programming & Hidden Markov Models.

Dynamic Programming

• Dynamic Programming exploits the graphical structure of the probability distribution. It can be applied to any structure without closed loops.

• Consider the two-headed coin example given in Tom Griffiths' talk (Monday).

Page 4: Dynamic Programming & Hidden Markov Models.

Probabilistic Grammars

• By the Markov Condition:

• Hence we can exploit the graphical structure to compute the required sums efficiently:

The structure means that the sum over x2 drops out. We need only sum over x1 and x3: only four operations instead of eight.
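The equation and graph on this slide did not survive transcription. As a hedged sketch consistent with the retained text, assume three binary variables in which x2 has parents x1 and x3, so the Markov Condition gives P(x1, x2, x3) = P(x1) P(x3) P(x2 | x1, x3); summing the joint then needs only the four configurations of (x1, x3), because the normalized conditional over x2 sums to one. All numerical values below are hypothetical.

```python
import itertools

# Hypothetical conditional probability tables for the sketch above.
p_x1 = {0: 0.6, 1: 0.4}
p_x3 = {0: 0.7, 1: 0.3}
p_x2_given = {(x1, x3): {0: 0.5, 1: 0.5} for x1 in (0, 1) for x3 in (0, 1)}
p_x2_given[(1, 1)] = {0: 0.1, 1: 0.9}   # arbitrary illustrative entries

def joint(x1, x2, x3):
    # Markov Condition: P(x1, x2, x3) = P(x1) P(x3) P(x2 | x1, x3).
    return p_x1[x1] * p_x3[x3] * p_x2_given[(x1, x3)][x2]

# Brute force: sum over all 2^3 = 8 joint configurations.
total_naive = sum(joint(x1, x2, x3)
                  for x1, x2, x3 in itertools.product((0, 1), repeat=3))

# Exploiting the structure: sum_{x2} P(x2 | x1, x3) = 1, so the sum over x2
# drops out and only the 4 configurations of (x1, x3) remain.
total_dp = sum(p_x1[x1] * p_x3[x3]
               for x1, x3 in itertools.product((0, 1), repeat=2))

print(total_naive, total_dp)   # both print 1.0
```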

Page 5: Dynamic Programming & Hidden Markov Models.

Dynamic Programming Intuition

• Suppose you wish to travel to Boston from Los Angeles by car.

• To determine the cost of going via Chicago, you only need to calculate the shortest cost from Los Angeles to Chicago and then, independently, the shortest cost from Chicago to Boston.

• Decomposing the route in this way gives an efficient algorithm which is polynomial in the number of nodes and feasible for computation.

Page 6: Dynamic Programming & Hidden Markov Models.

Dynamic Programming Diamond

• Compute the shortest cost from A to B.
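The diamond figure on this slide did not survive transcription. The sketch below assumes a small diamond-shaped graph with hypothetical edge costs and applies the intuition from the previous slide: process the nodes in order, keeping only the cheapest cost found so far for each node.

```python
# Hypothetical diamond-shaped DAG: node -> list of (successor, edge_cost).
edges = {
    "A": [("C1", 3), ("C2", 1)],
    "C1": [("D1", 2), ("D2", 4)],
    "C2": [("D1", 6), ("D2", 2)],
    "D1": [("B", 5)],
    "D2": [("B", 1)],
    "B": [],
}
order = ["A", "C1", "C2", "D1", "D2", "B"]   # a topological order

# DP: the cheapest cost to reach a node is the minimum, over its incoming
# edges, of (cheapest cost to the predecessor + edge cost). Each node is
# processed once and never revisited.
cost = {node: float("inf") for node in order}
cost["A"] = 0.0
for node in order:
    for succ, w in edges[node]:
        cost[succ] = min(cost[succ], cost[node] + w)

print(cost["B"])   # shortest cost from A to B: 0 + 1 + 2 + 1 = 4
```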

Page 7: Dynamic Programming & Hidden Markov Models.

Application to a 1-dim chain.

• Consider a distribution defined on a 1-dim chain.

• Important property: directed and undirected graphs are equivalent (for a 1-dim chain).

• P(A,B) = P(A|B) P(B), or P(A,B) = P(B|A) P(A).

• For these simple graphs with two nodes, you cannot distinguish causation from correlation without intervention (Wu's lecture Friday).

• For this lecture we will use a simple one-dimensional chain and cover directed and undirected models simultaneously. (Translating between directed and undirected is generally possible for graphs without closed loops, but it has subtleties.)

Page 8: Dynamic Programming & Hidden Markov Models.

Probability distribution on 1-D chain
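The formula on this slide did not survive transcription. The standard form, which the following slides appear to assume, writes the distribution as a product of pairwise potentials along the chain:

```latex
P(x_1,\dots,x_N) \;=\; \frac{1}{Z}\prod_{i=1}^{N-1}\psi_i(x_i,x_{i+1}),
\qquad
Z \;=\; \sum_{x_1,\dots,x_N}\,\prod_{i=1}^{N-1}\psi_i(x_i,x_{i+1}).
```

(For the directed form, the potentials are replaced by conditionals, P(x_1) times the product of P(x_{i+1} | x_i), and Z = 1.)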

Page 9: Dynamic Programming & Hidden Markov Models.

1-D Chain.

Page 10: Dynamic Programming & Hidden Markov Models.

1-Dim Chain:

• (Proof by induction).
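The inductive argument on this slide did not survive transcription. A standard reconstruction defines forward messages, each obtained from the previous one by summing out a single variable:

```latex
m_1(x_2) \;=\; \sum_{x_1}\psi_1(x_1,x_2),
\qquad
m_i(x_{i+1}) \;=\; \sum_{x_i}\psi_i(x_i,x_{i+1})\,m_{i-1}(x_i),
\qquad
Z \;=\; \sum_{x_N} m_{N-1}(x_N).
```

Each step sums over one variable with K states, so the total cost is O(N K^2) rather than the K^N cost of summing over all joint configurations.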

Page 11: Dynamic Programming & Hidden Markov Models.

1-Dim Chain

• We can also use DP to compute other properties: e.g. to convert the distribution from undirected form:

• To directed form:
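The conversion formulas on this slide did not survive transcription. A standard reconstruction computes backward messages b_i by DP and uses them to define the conditionals:

```latex
b_N(x_N) = 1,
\qquad
b_i(x_i) \;=\; \sum_{x_{i+1}}\psi_i(x_i,x_{i+1})\,b_{i+1}(x_{i+1}),
\qquad
Z \;=\; \sum_{x_1} b_1(x_1),

P(x_1) \;=\; \frac{b_1(x_1)}{Z},
\qquad
P(x_{i+1}\mid x_i) \;=\; \frac{\psi_i(x_i,x_{i+1})\,b_{i+1}(x_{i+1})}{b_i(x_i)}.
```

Multiplying out P(x_1) times the product of the conditionals telescopes the b_i factors and recovers (1/Z) times the product of the potentials, so the two forms define the same distribution.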

Page 12: Dynamic Programming & Hidden Markov Models.

1-Dim Chain

Page 13: Dynamic Programming & Hidden Markov Models.

Special Case: 1-D Ising Spin Model
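The formulas on this slide did not survive transcription. A minimal sketch, assuming the standard 1-D Ising model with spins x_i in {-1, +1}, P(x) proportional to exp(beta * sum_i x_i x_{i+1}), and free boundaries: the partition function is computed by the same chain recursion (written here as a transfer-matrix product) and checked against brute force.

```python
import itertools
import numpy as np

beta, N = 0.5, 10
spins = (-1, 1)

# DP / transfer matrix: psi(s, t) = exp(beta * s * t) is the pairwise
# potential on each link of the chain.
T = np.array([[np.exp(beta * s * t) for t in spins] for s in spins])
m = np.ones(2)                 # message from the end of the chain
for _ in range(N - 1):         # one matrix-vector product per link: O(N*K^2)
    m = T @ m
Z_dp = m.sum()

# Brute force for comparison: sum over all 2^N spin configurations.
Z_brute = sum(np.exp(beta * sum(x[i] * x[i + 1] for i in range(N - 1)))
              for x in itertools.product(spins, repeat=N))

print(Z_dp, Z_brute)           # the two values agree
```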

Page 14: Dynamic Programming & Hidden Markov Models.

Dynamic Programming Summary

• Dynamic Programming can be applied to perform inference on all graphical models defined on trees. The key insight is that, for trees, we can define an order on the nodes (not necessarily unique) and process the nodes in sequence, never needing to return to a node that has already been processed (see the sketch below).
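A minimal sketch of this node-ordering idea on a small tree with hypothetical random potentials: each node is summed out into a message to its parent, leaf-first, and no node is ever revisited.

```python
import numpy as np

# Hypothetical 5-node tree: 0 is the root; parent[i] is the parent of i.
parent = {1: 0, 2: 0, 3: 1, 4: 1}
K = 3                                      # states per node
rng = np.random.default_rng(0)
psi = {i: rng.random((K, K)) for i in parent}  # psi[i][x_parent, x_i]

# Any leaf-to-root order works; here: leaves 3 and 4, then 1, then 2.
order = [3, 4, 1, 2]
msg = {i: np.ones(K) for i in range(5)}    # accumulated child messages
for i in order:
    # Sum out node i, combining its potential with its children's messages,
    # and pass the result up to its parent.
    msg[parent[i]] *= psi[i] @ msg[i]

Z = msg[0].sum()                           # normalization at the root
print(Z)
```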

Page 15: Dynamic Programming & Hidden Markov Models.

Extensions of Dynamic Programming:

• What to do if you have a graph with closed loops?

• There are a variety of advanced ways to exploit the graphical structure and obtain efficient exact algorithms.

• Prof. Adnan Darwiche (CS, UCLA) is an expert on this topic. There will be an introduction to his SamIam code.

• We can also use approximate methods like belief propagation (BP).

Page 16: Dynamic Programming & Hidden Markov Models.

Junction Trees.

• It is also possible to take a probability distribution defined on a graph with closed loops and reformulate it as a distribution on new nodes without closed loops (Lauritzen and Spiegelhalter 1988).

• This leads to a variety of algorithms generally known as junction trees.

• This is not a universal solution, because the resulting new graphs may have too many nodes to make them practical.

• Google “junction trees” to find nice tutorials.

Page 17: Dynamic Programming & Hidden Markov Models.

Graph Conversion

• Convert the graph by a set of transformations.

Page 18: Dynamic Programming & Hidden Markov Models.

Triangles & Augmented Variables

• From triangles to ordered triangles.

(Figure: original variables contain loops; augmented variables contain no loops.)
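A hedged sketch of the augmented-variable idea on the smallest loopy case, a triangle with hypothetical pairwise potentials: merging x1 and x2 into a single augmented variable z = (x1, x2) removes the loop, after which ordinary DP applies.

```python
import itertools

# Hypothetical pairwise potentials on a 3-cycle: x1 - x2 - x3 - x1.
K = 2
psi12 = {(a, b): 1.0 + a + 2 * b for a in range(K) for b in range(K)}
psi23 = {(b, c): 1.0 + 2 * b + c for b in range(K) for c in range(K)}
psi31 = {(c, a): 1.0 + c + 3 * a for c in range(K) for a in range(K)}

# Brute force over the loopy graph: sum over all (x1, x2, x3).
Z_loop = sum(psi12[a, b] * psi23[b, c] * psi31[c, a]
             for a, b, c in itertools.product(range(K), repeat=3))

# Augmented model: node z = (x1, x2) with unary potential psi12, linked to
# x3 by psi(z, x3) = psi23(x2, x3) * psi31(x3, x1). No closed loops remain,
# so DP applies: sum out x3 into a message to z, then sum over z.
Z_tree = 0.0
for a, b in itertools.product(range(K), repeat=2):       # states of z
    msg = sum(psi23[b, c] * psi31[c, a] for c in range(K))  # sum out x3
    Z_tree += psi12[a, b] * msg

print(Z_loop, Z_tree)   # identical values
```

The price of the merge is that z has K^2 states, which is why junction trees become impractical when the merged cliques grow large (as noted on Page 16).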

Page 19: Dynamic Programming & Hidden Markov Models.

Summary of Dynamic Programming.

• Dynamic Programming can be used to efficiently compute properties of a distribution for graphs defined on trees.

• Directed graphs on trees can be reformulated as undirected graphs on trees, and vice versa.

• DP can be extended to apply to graphs with closed loops by restructuring the graphs (junction trees).

• It is an active research area to determine efficient inference algorithms which exploit the graphical structures of these models.

• Relationship between DP and reinforcement learning (week 2).

• DP and A*. DP and pruning.

Page 20: Dynamic Programming & Hidden Markov Models.

HMMs: Learning and Inference

• So far we have considered inference only.

• This assumes that the model is known.

• How can we learn the model?

• For 1-D models, this uses DP and EM.

Page 21: Dynamic Programming & Hidden Markov Models.

A simple HMM for Coin Tossing

• Two coins, one biased and the other fair, with the coins switched occasionally.

• The observable 0/1 is whether the coin shows heads or tails.

• The hidden state A/B is which coin is used.

• There are unknown transition probabilities between the hidden states A and B, and unknown probabilities for the observations conditioned on the hidden states.

• The learning task is to estimate these probabilities from a sequence of measurements.
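A minimal sketch of this learning task, combining DP (the forward-backward algorithm for the E-step) with EM (the Baum-Welch updates) as described on Page 20. The true parameter values, initial guesses, and sequence length below are all hypothetical, and EM may recover the parameters only up to a relabeling of the hidden states A and B.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden states: 0 = coin A (fair), 1 = coin B (biased). Observations: 0/1.
true_T = np.array([[0.9, 0.1], [0.2, 0.8]])   # transition probabilities
true_E = np.array([[0.5, 0.5], [0.1, 0.9]])   # emission probabilities
pi = np.array([0.5, 0.5])

# Simulate a sequence of coin flips from the true model.
n, s, obs = 2000, 0, []
for _ in range(n):
    obs.append(rng.choice(2, p=true_E[s]))
    s = rng.choice(2, p=true_T[s])
obs = np.array(obs)

# Start EM from arbitrary guesses.
T = np.array([[0.6, 0.4], [0.4, 0.6]])
E = np.array([[0.6, 0.4], [0.3, 0.7]])
for _ in range(50):
    # E-step: scaled forward-backward, i.e. DP along the 1-D chain.
    alpha = np.zeros((n, 2)); beta = np.zeros((n, 2)); c = np.zeros(n)
    alpha[0] = pi * E[:, obs[0]]; c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ T) * E[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = (T @ (E[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                    # P(state_t | all observations)
    xi = np.zeros((2, 2))                   # expected transition counts
    for t in range(n - 1):
        xi += (alpha[t][:, None] * T
               * (E[:, obs[t + 1]] * beta[t + 1])[None, :]) / c[t + 1]
    # M-step: re-estimate parameters from the expected counts.
    T = xi / xi.sum(axis=1, keepdims=True)
    for k in (0, 1):
        E[:, k] = gamma[obs == k].sum(axis=0)
    E /= E.sum(axis=1, keepdims=True)

print(np.round(T, 2))   # compare with true_T
print(np.round(E, 2))   # compare with true_E
```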

Page 22: Dynamic Programming & Hidden Markov Models.

HMM for Speech

Page 23: Dynamic Programming & Hidden Markov Models.

HMM for Speech

Page 24: Dynamic Programming & Hidden Markov Models.

HMM Summary

• HMMs define a class of Markov models with hidden variables. They are used for speech recognition and many other applications.

• Tasks involving HMMs include learning, inference, and model selection.

• These can often be performed by algorithms based on EM and DP.