Graphical Models for chains, trees and grids
Graphical Models for Chains, Trees, and Grids
Gabriel Brostow UCL
Sources
• Book and slides by Simon Prince: “Computer vision: models, learning and inference” (June 2012)
• See more at www.computervisionmodels.com
Part 1: Graphical Models for Chains and Trees
Part 1 Structure
• Chain and tree models
• MAP inference in chain models
• MAP inference in tree models
• Maximum marginals in chain models
• Maximum marginals in tree models
• Models with loops
• Applications
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince
Example Problem: Pictorial Structures
Chain and tree models
• Given a set of measurements and world states, infer the world states from the measurements.
• Problem: if N is large, then the model relating the two will have a very large number of parameters.
• Solution: build sparse models where we only describe subsets of the relations between variables.
Chain and tree models
Chain model: only model connections between a world variable and its 1 preceding and 1 subsequent variables.
Tree model: connections between world variables are organized as a tree (no loops). Disregard the directionality of connections for the directed model.
Assumptions
We’ll assume that:
– World states are discrete
– There is one observed data variable for each world state
– The nth data variable is conditionally independent of all other data variables and world states, given its associated world state
See also: Thad Starner’s work
Gesture Tracking
Directed model for chains (Hidden Markov model)
Compatibility of measurement and world state
Compatibility of world state and previous world state
Undirected model for chains
Compatibility of measurement and world state
Compatibility of world state and previous world state
Equivalence of chain models
Directed:
Undirected:
Equivalence:
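The two factorizations and their equivalence appear only as images in this transcript. As a hedged reconstruction of the standard forms (the symbols φ and ψ follow common usage; the exact notation on the slide may differ):

```latex
% Directed (HMM) form:
\Pr(x_{1\ldots N}, w_{1\ldots N}) =
  \Pr(w_1)\prod_{n=2}^{N}\Pr(w_n \mid w_{n-1})\;
  \prod_{n=1}^{N}\Pr(x_n \mid w_n)

% Undirected form:
\Pr(x_{1\ldots N}, w_{1\ldots N}) =
  \frac{1}{Z}\,\prod_{n=1}^{N}\phi[x_n, w_n]\,
  \prod_{n=2}^{N}\psi[w_n, w_{n-1}]

% Equivalence: take  phi[x_n, w_n] = Pr(x_n | w_n)  and
% psi[w_n, w_{n-1}] = Pr(w_n | w_{n-1}),  with Pr(w_1) absorbed
% into the first potential, so that Z = 1.
```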
Chain model for sign language application
Observations are normally distributed but depend on sign k
World state is categorically distributed, parameters depend on previous world state
MAP inference in chain model
MAP inference:
Substituting in:
Directed model:
MAP inference in chain model
Takes the general form:
Unary term:
Pairwise term:
Dynamic programming
Maximizes functions of the form:
Set up as a cost for traversing a graph – each path from left to right is one possible configuration of world states
Dynamic programming
Algorithm:
1. Work through the graph, computing the minimum possible cost to reach each node
2. When we get to the last column, find the minimum
3. Trace back to see how we got there
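The three steps above can be sketched in code. This is a minimal sketch rather than the book's implementation: the unary costs below are made up for illustration, while the pairwise costs follow the worked example on the next slides (zero cost to stay at the same label, cost 2 to change the label by 1, infinite cost otherwise).

```python
import math

def dp_min_cost(unary, pair_cost):
    """Dynamic programming over a chain. unary[n][k] is the cost of
    assigning label k at position n; pair_cost(a, b) is the pairwise cost."""
    N, K = len(unary), len(unary[0])
    cost = [unary[0][:]]   # minimum cost to reach each label at position 0
    back = []              # back-pointers for the trace-back
    for n in range(1, N):
        row, ptr = [], []
        for k in range(K):
            # Step 1: minimum possible cost to reach node (n, k)
            best = min(range(K), key=lambda j: cost[-1][j] + pair_cost(j, k))
            row.append(cost[-1][best] + pair_cost(best, k) + unary[n][k])
            ptr.append(best)
        cost.append(row)
        back.append(ptr)
    # Step 2: find the minimum over the last column
    k = min(range(K), key=lambda j: cost[-1][j])
    total = cost[-1][k]
    # Step 3: trace back to recover the minimum-cost configuration
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return list(reversed(path)), total

def pair_cost(a, b):
    # 0 to stay, 2 to change label by 1, infinite for a bigger change
    d = abs(a - b)
    return [0.0, 2.0, math.inf][min(d, 2)]

# Hypothetical unary costs (the slide's figure is not reproduced here)
unary = [[2.0, 0.8, 4.3], [1.1, 0.7, 1.3], [5.0, 1.0, 1.2]]
path, total = dp_min_cost(unary, pair_cost)  # path [1, 1, 1], cost 2.5
```

The complexity is O(NK²), since each of the N positions considers K×K label transitions, as opposed to the Kᴺ paths a brute-force search would enumerate.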
Worked example
Unary costs (shown in the figure). Pairwise costs:
• Zero cost to stay at the same label
• Cost of 2 to change label by 1
• Infinite cost for changing by more than one (not shown)
Worked example
Minimum cost to reach first node is just unary cost
Worked example
Minimum cost is the minimum over the two possible routes to get here:
Route 1: 2.0 + 0.0 + 1.1 = 3.1
Route 2: 0.8 + 2.0 + 1.1 = 3.9
Worked example
Minimum cost is the minimum over the two possible routes to get here:
Route 1: 2.0 + 0.0 + 1.1 = 3.1 – this is the minimum; note this down
Route 2: 0.8 + 2.0 + 1.1 = 3.9
Worked example
General rule:
Worked example
Work through the graph, computing the minimum cost to reach each node
Worked example
Keep going until we reach the end of the graph
Worked example
Find the minimum possible cost to reach the final column
Worked example
Trace back the route by which we arrived here – this is the minimum-cost configuration
MAP inference for trees
MAP inference for trees
Worked example
Worked example
Variables 1–4 proceed as for the chain example.
Worked example
At variable n=5 we must consider all pairs of paths into the current node.
Worked example
Variable 6 proceeds as normal. Then we trace back through the variables, splitting at the junction.
Marginal posterior inference
• Start by computing the marginal distribution over the Nth variable
• Then we’ll consider how to compute the other marginal distributions
Computing one marginal distribution
Compute the posterior using Bayes’ rule:
We compute this expression by writing the joint probability:
Computing one marginal distribution
Problem: Computing all K^N states and marginalizing explicitly is intractable.
Solution: Reorder the terms and move the summations to the right.
Computing one marginal distribution
Define a function of variable w1 (the two rightmost terms)
Then compute a function of variable w2 in terms of the previous function
This leads to the recursive relation
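The recursion itself appears only as an image in this transcript. As a hedged reconstruction in the directed (HMM) notation, it is the standard forward recursion:

```latex
f_1[w_1] = \Pr(x_1 \mid w_1)\,\Pr(w_1)

f_n[w_n] = \Pr(x_n \mid w_n)
           \sum_{w_{n-1}} \Pr(w_n \mid w_{n-1})\, f_{n-1}[w_{n-1}]

% The marginal over the Nth variable is then
% Pr(w_N | x_{1..N})  \propto  f_N[w_N],  normalized over w_N.
```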
Computing one marginal distribution
We work our way through the sequence using this recursion. At the end we normalize the result to compute the posterior.
The total number of summations is (N−1)K, as opposed to K^N for the brute-force approach.
Forward-backward algorithm
• We could compute the other N−1 marginal posterior distributions using a similar set of computations
• However, this is inefficient, as much of the computation is duplicated
• The forward-backward algorithm computes all of the marginal posteriors at once
Solution:
• Compute all first terms using a recursion
• Compute all second terms using a recursion
• ... and take products
Forward recursion
Using conditional independence relations
Conditional probability rule
This is the same recursion as before
Backward recursion
Using conditional independence relations
Conditional probability rule
This is another recursion of the form
Forward-backward algorithm
Compute the marginal posterior distribution as a product of two terms:
Forward terms:
Backward terms:
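The forward and backward recursions and their product can be sketched as follows. This is a minimal sketch, assuming the directed (HMM) parameterization; the initial distribution, transition matrix, and emission likelihoods below are made-up numbers for illustration only.

```python
import numpy as np

def forward_backward(initial, transition, emission_lik):
    """Forward-backward on a chain with K discrete labels.
    initial: (K,) prior over the first state.
    transition: (K, K), transition[i, j] = Pr(w_n = j | w_{n-1} = i).
    emission_lik: (N, K), likelihood of observation n under each state."""
    N, K = emission_lik.shape
    alpha = np.zeros((N, K))   # forward terms
    beta = np.zeros((N, K))    # backward terms
    # Forward recursion
    alpha[0] = initial * emission_lik[0]
    for n in range(1, N):
        alpha[n] = emission_lik[n] * (alpha[n - 1] @ transition)
    # Backward recursion
    beta[-1] = 1.0
    for n in range(N - 2, -1, -1):
        beta[n] = transition @ (emission_lik[n + 1] * beta[n + 1])
    # Marginal posterior at each position: product of the two terms, normalized
    posterior = alpha * beta
    return posterior / posterior.sum(axis=1, keepdims=True)

# Hypothetical numbers for illustration only
initial = np.array([0.5, 0.5])
transition = np.array([[0.9, 0.1], [0.2, 0.8]])
emission_lik = np.array([[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]])
marginals = forward_backward(initial, transition, emission_lik)
```

Each row of `marginals` sums to one; for long chains a practical implementation would rescale α and β at each step to avoid underflow.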
Belief propagation
• The forward-backward algorithm is a special case of a more general technique called belief propagation
• The intermediate functions in the forward and backward recursions are considered as messages conveying beliefs about the variables
• We’ll examine the sum-product algorithm
• The sum-product algorithm operates on factor graphs
Factor graphs
• One node for each variable
• One node for each function relating variables
Sum product algorithm
Forward pass
• Distributes evidence through the graph
Backward pass
• Collates the evidence
Both phases involve passing messages between nodes:
• The forward phase can proceed in any order, as long as outgoing messages are not sent until all incoming ones are received
• The backward phase proceeds in reverse order to the forward phase
Sum product algorithm
Three kinds of message:
• Messages from unobserved variables to functions
• Messages from observed variables to functions
• Messages from functions to variables
Sum product algorithm
Message type 1: messages from unobserved variables z to function g
• Take the product of incoming messages
• Interpretation: combining beliefs
Message type 2: messages from observed variables z to function g
• Interpretation: conveys the certain belief that the observed values are true
Sum product algorithm
Message type 3: messages from a function g to variable z
• Takes beliefs from all incoming variables except the recipient and uses function g to compute a belief about the recipient
Computing marginal distributions:
• After the forward and backward passes, we compute the marginal distributions as the product of all incoming messages
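The message definitions themselves appear only as images here. As a hedged reconstruction of the standard sum-product rules (the notation on the slides may differ):

```latex
% Type 1: unobserved variable z to function g
\mu_{z \to g}(z) = \prod_{g' \in \mathrm{ne}(z) \setminus g} \mu_{g' \to z}(z)

% Type 2: observed variable z = z^* to function g
\mu_{z \to g}(z) = \delta(z = z^*)

% Type 3: function g(z, z_1, \ldots, z_M) to variable z
\mu_{g \to z}(z) = \sum_{z_1, \ldots, z_M}
    g(z, z_1, \ldots, z_M) \prod_{m=1}^{M} \mu_{z_m \to g}(z_m)

% Marginal: product of all incoming messages at the variable node
\Pr(z) \propto \prod_{g \in \mathrm{ne}(z)} \mu_{g \to z}(z)
```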
Sum product: forward pass
Message from x1 to g1: By rule 2:
Sum product: forward pass
Message from g1 to w1: By rule 3:
Sum product: forward pass
Message from w1 to g1,2: By rule 1: (product of all incoming messages)
Sum product: forward pass
Message from g1,2 to w2: By rule 3:
Sum product: forward pass
Messages from x2 to g2 and g2 to w2:
Sum product: forward pass
Message from w2 to g2,3:
The same recursion as in the forward-backward algorithm
Sum product: forward pass
Message from w2 to g2,3:
Sum product: backward pass
Message from wN to gN,N-1:
Sum product: backward pass
Message from gN,N-1 to wN-1:
Sum product: backward pass
Message from gn,n-1 to wn-1:
The same recursion as in the forward-backward algorithm
Sum product: collating evidence
• Marginal distribution is the product of all messages at a node
• Proof:
Marginal posterior inference for trees
Apply the sum-product algorithm to the tree-structured graph.
Tree structured graphs
This graph contains loops, but the associated factor graph has the structure of a tree.
We can still use belief propagation.
Learning in chains and trees
Supervised learning (where we know the world states wn) is relatively easy.
Unsupervised learning (where we do not know the world states wn) is more challenging. Use the EM algorithm:
• E-step: compute posterior marginals over states
• M-step: update model parameters
For the chain model (hidden Markov model) this is known as the Baum-Welch algorithm.
Grid-based graphs
Often in vision, we have one observation associated with each pixel in the image grid.
Why not dynamic programming?
When we trace back from the final node, the paths are not guaranteed to converge.
Why not dynamic programming?
Why not dynamic programming?
But:
Approaches to inference for grid-based models
1. Prune the graph. Remove edges until a tree remains.
2. Combine variables. Merge variables to form a compound variable with more states, until what remains is a tree.
Not practical for large grids.
3. Loopy belief propagation. Just apply belief propagation; it is not guaranteed to converge, but in practice it works well.
4. Sampling approaches. Draw samples from the posterior (easier for directed models).
5. Other approaches:
• Tree-reweighted message passing
• Graph cuts
Gesture Tracking
Stereo vision
• Two images taken from slightly different positions
• The matching point in image 2 is on the same scanline as in image 1
• The horizontal offset is called disparity
• Disparity is inversely related to depth
• Goal: infer disparities wm,n at pixel (m,n) from images x(1) and x(2)
Use likelihood:
Stereo vision
Stereo vision
1. Independent pixels
Stereo vision
2. Scanlines as chain model (hidden Markov model)
Stereo vision
3. Pixels organized as tree (from Veksler 2005)
Pictorial Structures
Segmentation
Part 1 Conclusion
• For the special case of chains and trees we can perform MAP inference and compute marginal posteriors efficiently.
• Unfortunately, many vision problems are defined on pixel grids – this requires special methods
Part 2: Graphical Models for Grids
• Stereo vision
Example ApplicaOon
Part 2 Structure
• Denoising problem
• Markov random fields (MRFs)
• Max-flow / min-cut
• Binary MRFs – submodular (exact solution)
• Multi-label MRFs – submodular (exact solution)
• Multi-label MRFs – non-submodular (approximate)
Models for grids
• Consider models with one unknown world state at each pixel in the image, so the model takes the form of a grid.
• The graphical model contains loops, so we cannot use dynamic programming or belief propagation.
• Define probability distributions that favor certain configurations of world states:
– called Markov random fields
– inference uses a set of techniques called graph cuts
Binary Denoising
Before / After: image represented as binary discrete variables. Some proportion of pixels randomly changed polarity.
Multi-label Denoising
Before / After: image represented as discrete variables representing intensity. Some proportion of pixels randomly changed according to a uniform distribution.
Denoising Goal
(Figure: observed data vs. uncorrupted image)
Denoising Goal
(Figure: observed data vs. uncorrupted image)
• Most of the pixels stay the same.
• The observed image is not as smooth as the original.
Now consider a pdf over binary images that encourages smoothness: a Markov random field.
Markov random fields
This is just the typical property of an undirected model. We'll continue the discussion in terms of undirected models.
Markov random fields

Pr(w) = (1/Z) ∏_c φ_c[w_c]

where Z is the normalizing constant (partition function), φ_c is a potential function that returns a positive number, and w_c is a subset of variables (clique).
Markov random fields

Pr(w) = (1/Z) exp( −∑_c ψ_c[w_c] )

where Z is the normalizing constant (partition function), ψ_c is a cost function that returns any number, and w_c is a subset of variables (clique). The relationship to the potential form is φ_c[w_c] = exp( −ψ_c[w_c] ).
Smoothing Example
Smoothing Example
Smooth solutions (e.g. 0000, 1111) have high probability. Z was computed by summing the 16 un-normalized probabilities.
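The calculation above can be reproduced directly: a minimal sketch with made-up potential values (rewarding equal neighbouring labels) for a chain of 4 binary pixels, where Z is the sum of all 16 un-normalized probabilities.

```python
import itertools

# Hypothetical pairwise potential: rewards neighbouring pixels with equal labels.
def phi(wi, wj, theta_same=1.0, theta_diff=0.1):
    return theta_same if wi == wj else theta_diff

# Chain of 4 binary pixels; the un-normalized probability is the product of
# pairwise potentials over the 3 neighbouring pairs.
def unnorm(w):
    return phi(w[0], w[1]) * phi(w[1], w[2]) * phi(w[2], w[3])

states = list(itertools.product([0, 1], repeat=4))
Z = sum(unnorm(w) for w in states)            # sum over all 16 configurations
probs = {w: unnorm(w) / Z for w in states}

# The smooth configurations 0000 and 1111 carry the most probability mass.
best = max(probs, key=probs.get)
```

For a large grid this brute-force sum over all configurations is exactly what becomes intractable.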
Smoothing Example
Samples from a larger grid are mostly smooth. The partition function Z cannot be computed here; it is intractable.
Denoising Goal
(Figure: observed data vs. uncorrupted image)
Denoising Overview
Bayes' rule: Pr(w|x) ∝ Pr(x|w) Pr(w)
• Likelihoods: probability of flipping polarity
• Prior: Markov random field (smoothness)
• MAP inference: graph cuts
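The flip-noise likelihood can be turned into a per-pixel unary cost as its negative log. A minimal sketch, assuming a made-up flip probability rho:

```python
import math

# Hypothetical flip probability: each observed binary pixel x differs from
# the true pixel w with probability rho.
rho = 0.2

def unary_cost(x, w):
    # Negative log-likelihood -log Pr(x | w); this is the unary term fed
    # to the MAP optimizer.
    p = rho if x != w else 1.0 - rho
    return -math.log(p)
```

Agreement between observation and label is cheap; disagreement costs -log(rho).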
Denoising with MRFs
(Figure: observed image x; original image w; MRF prior with pairwise cliques; likelihoods; inference.)
MAP Inference
• Unary terms: compatibility of data with label w
• Pairwise terms: compatibility of neighboring labels
Graph Cuts Overview
Graph cuts are used to optimise this cost function:
• Unary terms: compatibility of data with label w
• Pairwise terms: compatibility of neighboring labels
Three main cases:
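The cost function itself is easy to write down and evaluate; only minimizing it efficiently is hard. A minimal sketch with made-up unary and pairwise (Potts-style) costs on a 2x2 binary image, solving MAP by brute force over the 16 labellings (the problem graph cuts solves in polynomial time):

```python
import itertools

# Hypothetical unary costs for a 2x2 binary image: U[n][k] is the cost of
# assigning label k to pixel n.
U = [[0.2, 1.0], [0.9, 0.1], [0.3, 0.8], [0.7, 0.2]]
# Potts-style pairwise cost: pay lam whenever neighbouring labels differ.
lam = 0.5
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]    # 4-connected 2x2 grid

def cost(w):
    unary = sum(U[n][w[n]] for n in range(4))
    pairwise = sum(lam for (m, n) in edges if w[m] != w[n])
    return unary + pairwise

# Brute-force MAP: exhaustively minimize the cost over all 2^4 labellings.
w_map = min(itertools.product([0, 1], repeat=4), key=cost)
```

Brute force is exponential in the number of pixels; the rest of this part shows how max-flow/min-cut finds the same minimum efficiently.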
Graph Cuts Overview
Graph cuts are used to optimise this cost function:
• Unary terms: compatibility of data with label w
• Pairwise terms: compatibility of neighboring labels
Approach: convert the minimization into the form of a standard CS problem, MAXIMUM FLOW or MINIMUM CUT ON A GRAPH. Polynomial-time methods for solving this problem are known.
Max-Flow Problem
Goal: push as much 'flow' as possible through the directed graph from the source to the sink, without exceeding the (non-negative) capacities c_ij associated with each edge.
Saturated Edges
When we are pushing the maximum amount of flow:
• There must be at least one saturated edge on any path from source to sink (otherwise we could push more flow).
• The set of saturated edges hence separates the source and sink.
Augmenting Paths
Two numbers represent: current flow / total capacity.
Augmenting Paths
Choose any route from source to sink with spare capacity, and push as much flow as you can. One edge (here 6-t) will saturate.
Augmenting Paths
Choose another route, respecting the remaining capacity. This time edge 6-5 saturates.
Augmenting Paths
A third route. Edge 1-4 saturates.
Augmenting Paths
A fourth route. Edge 2-5 saturates.
Augmenting Paths
A fifth route. Edge 2-4 saturates.
Augmenting Paths
There is now no further route from source to sink; there is a saturated edge along every possible route (highlighted arrows).
Augmenting Paths
The saturated edges separate the source from the sink and form the min-cut solution. Nodes either connect to the source or connect to the sink.
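The augmenting-path procedure above can be sketched compactly. This is a minimal Edmonds-Karp variant (BFS augmenting paths with residual edges) on a toy graph with made-up capacities, not the example from the slides:

```python
from collections import deque

# Minimal Edmonds-Karp max-flow. cap[u][v] is the capacity of edge u->v.
def max_flow(cap, s, t):
    # Residual graph: copy capacities, add zero-capacity reverse edges.
    res = {u: dict(vs) for u, vs in cap.items()}
    for u in list(res):
        for v in res[u]:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a source->sink route with spare capacity.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow          # every route has a saturated edge: done
        # Reconstruct the path and push flow equal to its tightest capacity.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= bottleneck   # forward edge loses capacity
            res[v][u] += bottleneck   # reverse residual edge gains it
        flow += bottleneck

# Toy graph (hypothetical capacities).
cap = {'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2}, 'b': {'t': 2}}
```

When BFS can no longer reach the sink, the saturated edges form the minimum cut, as described above.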
Graph Cuts: Binary MRF
Graph cuts are used to optimise this cost function:
• Unary terms: compatibility of data with label w
• Pairwise terms: compatibility of neighboring labels
First work with the binary case (i.e. the true label w is 0 or 1). Constrain the pairwise costs so that they are "zero-diagonal".
Graph Construction
• One node per pixel (here a 3x3 image)
• Edge from source to every pixel node
• Edge from every pixel node to sink
• Reciprocal edges between neighbours
Note that in the minimum cut, EITHER the edge connecting to the source will be cut, OR the edge connecting to the sink, but NOT BOTH (that would be unnecessary). This determines whether we give that pixel label 1 or label 0. There is now a 1-to-1 mapping between possible labellings and possible minimum cuts.
Graph Construction
Now add capacities so that the minimum cut minimizes our cost function. Unary costs U(0), U(1) are attached to the links to source and sink; either one or the other is paid. Pairwise costs between pixel nodes are as shown. Why? Easiest to understand with some worked examples.
Example 1
Example 2
Example 3
Graph Cuts: Binary MRF
Graph cuts are used to optimise this cost function:
• Unary terms: compatibility of data with label w
• Pairwise terms: compatibility of neighboring labels
Summary of approach:
• Associate each possible solution with a minimum cut on a graph
• Set capacities on the graph so the cost of a cut matches the cost function
• Use augmenting paths to find the minimum cut
• This minimizes the cost function and finds the MAP solution
General Pairwise Costs
Modify the graph to:
• Add P(0,0) to edge s-b
– implies that solutions 0,0 and 1,0 also pay this cost
• Subtract P(0,0) from edge b-a
– solution 1,0 has this cost removed again
A similar approach handles P(1,1).
Reparameterization
The max-flow / min-cut algorithms require that all of the capacities are non-negative. However, because we have a subtraction on edge a-b, we cannot guarantee that this will be the case, even if all the original unary and pairwise costs were positive. The solution to this problem is reparameterization: find a new graph where the costs (capacities) are different but the choice of minimum solution is the same (usually just by adding a constant to each solution).
Reparameterization 1
The minimum cut chooses the same links in these two graphs.
Reparameterization 2
The minimum cut chooses the same links in these two graphs.
Submodularity
Subtract a constant β from one edge and add the constant β to another; adding the resulting non-negativity constraints together implies

P(0,0) + P(1,1) ≤ P(0,1) + P(1,0).
Submodularity
If this condition is obeyed, the problem is said to be "submodular" and it can be solved in polynomial time. If it is not obeyed, then the problem is NP-hard. Usually this is not a problem, as we tend to favour smooth solutions.
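The binary submodularity condition is a one-line check. A minimal sketch with made-up cost tables, contrasting a smoothness-favouring (Potts-like) cost with one that favours disagreement:

```python
# P[i][j] is the pairwise cost of labels (i, j) on an edge.
def is_submodular(P):
    # Submodular iff the diagonal (agreeing labels) is no more expensive
    # than the off-diagonal: P(0,0) + P(1,1) <= P(0,1) + P(1,0).
    return P[0][0] + P[1][1] <= P[0][1] + P[1][0]

potts = [[0.0, 1.0], [1.0, 0.0]]   # favours smooth solutions: submodular
anti  = [[1.0, 0.0], [0.0, 1.0]]   # favours disagreement: not submodular
```

Smoothness priors of the kind used for denoising pass this test, which is why the exact graph-cut solution is usually available.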
Denoising Results
(Figure: original image; results with pairwise costs increasing.)
Plan of Talk
• Denoising problem
• Markov random fields (MRFs)
• Max-flow / min-cut
• Binary MRFs – submodular (exact solution)
• Multi-label MRFs – submodular (exact solution)
• Multi-label MRFs – non-submodular (approximate)
Multiple Labels
Construction for two pixels (a and b) and four labels (1, 2, 3, 4). There are 5 nodes for each pixel, and the 4 edges between them carry the unary costs for the 4 labels. One of these edges must be cut in the min-cut solution, and the choice determines which label we assign.
Constraint Edges
The edges with infinite capacity pointing upwards are called constraint edges. They prevent solutions that cut the chain of edges associated with a pixel more than once (which would give an ambiguous labelling).
Multiple Labels
Inter-pixel edges have costs defined in terms of the pairwise costs P(i, j); the superfluous terms are constant for all i, j, where K is the number of labels.
Example Cuts
Must cut links from before the cut on pixel a to after the cut on pixel b.
Pairwise Costs
If pixel a takes label I and pixel b takes label J, we must cut the links from before the cut on pixel a to after the cut on pixel b. The costs were carefully chosen so that the sum of these links gives the appropriate pairwise term.
Reparameterization
Submodularity
We require the remaining inter-pixel links to be positive. By mathematical induction we can get the more general result.
Submodularity
If the problem is not submodular, then it is NP-hard.
Convex vs. Non-convex Costs
• Quadratic: convex, submodular
• Truncated quadratic: not convex, not submodular
• Potts model: not convex, not submodular
What is wrong with convex costs?
• Pay a lower price for many small changes than for one large one
• Result: blurring at large changes in intensity
(Figure: observed noisy image vs. denoised result.)
Plan of Talk
• Denoising problem
• Markov random fields (MRFs)
• Max-flow / min-cut
• Binary MRFs – submodular (exact solution)
• Multi-label MRFs – submodular (exact solution)
• Multi-label MRFs – non-submodular (approximate)
Alpha Expansion Algorithm
• Break the multi-label problem into a series of binary problems.
• At each iteration, pick a label α and expand it (each pixel retains its original label or changes to α).
(Figure: initial labelling; iteration 1 (orange); iteration 2 (yellow); iteration 3 (red).)
Alpha Expansion Ideas
• For every iteration
– For every label
– Expand the label using the optimal graph-cut solution
This is coordinate descent in label space. Each step is optimal, but the overall global optimum is not guaranteed; the result is proved to be within a factor of 2 of the global optimum. Requires that the pairwise costs form a metric.
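The outer loop above can be sketched on a toy problem. This sketch uses made-up unary costs for a 4-pixel chain with 3 labels, and solves each binary keep-or-switch-to-α subproblem by brute force over the 2^4 choices; a real implementation would solve that subproblem with a graph cut.

```python
import itertools

# Hypothetical unary costs U[n][k] for 4 pixels and 3 labels.
U = [[0.1, 0.8, 0.9], [0.7, 0.2, 0.9], [0.8, 0.9, 0.1], [0.8, 0.3, 0.9]]
edges = [(0, 1), (1, 2), (2, 3)]
lam = 0.4  # Potts pairwise cost: a metric, as alpha expansion requires

def energy(w):
    return (sum(U[n][w[n]] for n in range(4))
            + sum(lam for m, n in edges if w[m] != w[n]))

def alpha_expand(w, alpha):
    # Binary subproblem: each pixel keeps its label or switches to alpha.
    # Brute force stands in for the optimal graph-cut step.
    best = w
    for keep in itertools.product([False, True], repeat=4):
        cand = tuple(w[n] if keep[n] else alpha for n in range(4))
        if energy(cand) < energy(best):
            best = cand
    return best

labels = (0, 0, 0, 0)        # initial labelling
for _ in range(3):           # a few sweeps over all labels
    for alpha in range(3):
        labels = alpha_expand(labels, alpha)
```

Each expansion move can only lower the energy, which is why the coordinate descent terminates; on this tiny problem it happens to reach the global minimum.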
Alpha Expansion Construction
Binary graph cut: either cut the link to the source (assigned to α) or the link to the sink (retain the current label). Unary costs are attached to the links between source, sink, and pixel nodes appropriately.
Alpha Expansion Construction
The graph is dynamic. The structure of the inter-pixel links depends on α and the choice of labels. There are four cases.
Alpha Expansion Construction
Case 1: Adjacent pixels both have label α already. The pairwise cost is zero, so no extra edges are needed.
Alpha Expansion Construction
Case 2: Adjacent pixels are α, β. The result is either:
• α, α (no cost and no new edge)
• α, β (cost P(α, β); add new edge)
Alpha Expansion Construction
Case 3: Adjacent pixels are β, β. The result is either:
• β, β (no cost and no new edge)
• α, β (cost P(α, β); add new edge)
• β, α (cost P(β, α); add new edge)
Alpha Expansion Construction
Case 4: Adjacent pixels are β, γ. The result is either:
• β, γ (cost P(β, γ); add new edge)
• α, γ (cost P(α, γ); add new edge)
• β, α (cost P(β, α); add new edge)
• α, α (no cost and no new edge)
Example Cut 1
Example Cut 1 (important!)
Example Cut 2
Example Cut 3
Denoising Results
Conditional Random Fields
Directed Model for Grids
Cannot use graph cuts, as the model contains a three-wise term. Easy to draw samples.
Applications
• Background subtraction
Applications
• Grab cut
Applications
• Shift-map image editing
Applications
• Super-resolution
Applications
• Texture synthesis
Image Quilting
Applications
• Synthesizing faces
Further Resources
• http://www.computervisionmodels.com/
– Code
– Links + readings (for these and other topics)
• Conference papers online: BMVC, CVPR, ECCV, ICCV, etc.
• Jobs mailing lists: Imageworld, Visionlist