Variational Inference for Structured NLP Models

Variational Inference for Structured NLP Models. ACL, August 4, 2013. David Burkett and Dan Klein.

Transcript of Variational Inference for Structured NLP Models

Page 1: Variational Inference for Structured NLP Models

Variational Inference for

Structured NLP Models

ACL, August 4, 2013

David Burkett and Dan Klein

Page 2: Variational Inference for Structured NLP Models

Tutorial Outline

1. Structured Models and Factor Graphs

2. Mean Field

3. Structured Mean Field

4. Belief Propagation

5. Structured Belief Propagation

6. Wrap-Up

Page 3: Variational Inference for Structured NLP Models

Part 1: Structured Models and

Factor Graphs

Page 4: Variational Inference for Structured NLP Models

Structured NLP Models

Inputs (words)

Outputs (POS tags)

Example: Hidden Markov Model

(Sample Application: Part of Speech Tagging)

Goal: Queries from posterior

Page 5: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Hidden Markov Model

Page 6: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Hidden Markov Model

Page 7: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Hidden Markov Model

Page 8: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Hidden Markov Model

Page 9: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Hidden Markov Model

Page 10: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Hidden Markov Model

Page 11: Variational Inference for Structured NLP Models

Factor Graph Notation

Factors

Cliques

Unary Factor

Binary Factor

Variables Yi

Page 12: Variational Inference for Structured NLP Models

Factor Graph Notation

Factors

Cliques

Variables Yi

Page 13: Variational Inference for Structured NLP Models

Factor Graph Notation

Factors

Cliques

Variables have factor (clique) neighbors:

Variables Yi

Factors have variable neighbors:
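Since the factor graph on this slide is purely pictorial, here is a minimal sketch of the bookkeeping the notation implies; the Variable/Factor classes and their fields are illustrative names, not anything from the tutorial code.

```python
class Variable:
    def __init__(self, name, domain):
        self.name = name
        self.domain = list(domain)  # possible values of Y_i
        self.factors = []           # factor (clique) neighbors N(i)

class Factor:
    def __init__(self, variables, table):
        # table maps a tuple of neighbor values to a nonnegative score
        self.variables = list(variables)  # variable neighbors N(f)
        self.table = dict(table)
        for v in self.variables:
            v.factors.append(self)

    def score(self, assignment):
        # assignment: dict mapping each neighbor variable to a value
        return self.table[tuple(assignment[v] for v in self.variables)]

def joint_score(factors, assignment):
    # unnormalized model score: the product of all factor scores
    s = 1.0
    for f in factors:
        s *= f.score(assignment)
    return s
```

A unary factor simply has one entry in `variables`, a binary factor two. Later sketches in this transcript reuse these illustrative classes.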

Page 14: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Conditional Random Field

(Sample Application: Named Entity Recognition)

(Lafferty et al., 2001)

Page 15: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Conditional Random Field

Page 16: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Conditional Random Field

Page 17: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Edge-Factored Dependency Parsing

[Figure: the sentence "the cat ate the rat" with one variable per word pair, each taking a value L, R, or O.]

(McDonald et al., 2005)

Page 18: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Edge-Factored Dependency Parsing

[Figure: the same sentence with one possible assignment of L/R/O values to the word-pair variables.]

Page 19: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Edge-Factored Dependency Parsing

[Figure: a different assignment of L/R/O values to the word-pair variables.]

Page 20: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Edge-Factored Dependency Parsing

[Figure: another assignment of L/R/O values to the word-pair variables.]

Page 21: Variational Inference for Structured NLP Models

Structured NLP Models

Example: Edge-Factored Dependency Parsing

[Figure: the L-labeled edge variables for the example sentence.]

Page 22: Variational Inference for Structured NLP Models

Inference

Input: Factor Graph

Output: Marginals

Page 23: Variational Inference for Structured NLP Models

Inference

Typical NLP Approach: Dynamic Programs!

Examples:

Sequence Models (Forward/Backward)

Phrase Structure Parsing (CKY, Inside/Outside)

Dependency Parsing (Eisner algorithm)

ITG Parsing (Bitext Inside/Outside)

Page 24: Variational Inference for Structured NLP Models

Complex Structured Models

POS Tagging

Named Entity Recognition

Joint

(Sutton et al., 2004)

Page 25: Variational Inference for Structured NLP Models

Complex Structured Models

Dependency Parsing with Second Order Features

(Carreras, 2007)

(McDonald & Pereira, 2006)

Page 26: Variational Inference for Structured NLP Models

Complex Structured Models

Word Alignment

[Figure: word alignment grid between "I saw the cold cat" and "vi el gato frío".]

(Taskar et al., 2005)

Page 27: Variational Inference for Structured NLP Models

Complex Structured Models

Word Alignment

[Figure: the same word alignment grid.]

Page 28: Variational Inference for Structured NLP Models

Variational Inference

Approximate inference techniques that can be applied to any graphical model

This tutorial:

Mean Field: Approximate the joint distribution with a product of marginals

Belief Propagation: Apply tree inference algorithms even if your graph isn’t a tree

Structure: What changes when your factor graph has tractable substructures

Page 29: Variational Inference for Structured NLP Models

Part 2: Mean Field

Page 30: Variational Inference for Structured NLP Models

Mean Field Warmup

Wanted:

Iterated Conditional Modes (Besag, 1986)

Idea: coordinate ascent

Key object: assignments
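A minimal sketch of Iterated Conditional Modes on the illustrative Variable/Factor classes from Part 1: keep a single hard assignment and repeatedly reset each variable to its best value given the rest, i.e. plain coordinate ascent on the joint score.

```python
def icm(variables, init):
    # Coordinate ascent on a hard assignment (Iterated Conditional Modes).
    assignment = dict(init)
    changed = True
    while changed:
        changed = False
        for v in variables:
            old = assignment[v]
            best_val, best_score = old, float("-inf")
            for val in v.domain:
                assignment[v] = val
                # only the factors touching v matter for this choice
                s = 1.0
                for f in v.factors:
                    s *= f.score(assignment)
                if s > best_score:
                    best_val, best_score = val, s
            assignment[v] = best_val
            if best_val != old:
                changed = True
    return assignment
```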

Page 31: Variational Inference for Structured NLP Models

Mean Field Warmup

Wanted:

Page 32: Variational Inference for Structured NLP Models

Mean Field Warmup

Wanted:

Page 33: Variational Inference for Structured NLP Models

Mean Field Warmup

Wanted:

Page 34: Variational Inference for Structured NLP Models

Mean Field Warmup

Wanted:

Page 35: Variational Inference for Structured NLP Models

Mean Field Warmup

Wanted:

Approximate Result:

Page 36: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 37: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 38: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 39: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 40: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 41: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 42: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 43: Variational Inference for Structured NLP Models

Iterated Conditional Modes

Example

Page 44: Variational Inference for Structured NLP Models

Mean Field Intro

Mean Field is coordinate ascent, just like Iterated Conditional Modes, but with soft assignments to each variable!

Page 45: Variational Inference for Structured NLP Models

Mean Field Intro

Wanted:

Idea: coordinate ascent

Key object: (approx) marginals

Page 46: Variational Inference for Structured NLP Models

Mean Field Intro

Page 47: Variational Inference for Structured NLP Models

Mean Field Intro

Page 48: Variational Inference for Structured NLP Models

Mean Field Intro

Wanted:

Page 49: Variational Inference for Structured NLP Models

Mean Field Intro

Wanted:

Page 50: Variational Inference for Structured NLP Models

Mean Field Intro

Wanted:

Page 51: Variational Inference for Structured NLP Models

Mean Field Procedure

Wanted:

Page 52: Variational Inference for Structured NLP Models

Mean Field Procedure

Wanted:

Page 53: Variational Inference for Structured NLP Models

Mean Field Procedure

Wanted:

Page 54: Variational Inference for Structured NLP Models

Mean Field Procedure

Wanted:

Page 55: Variational Inference for Structured NLP Models

Example Results

Page 56: Variational Inference for Structured NLP Models

Mean Field Derivation

Goal:

Approximation:

Constraint:

Objective:

Procedure: Coordinate ascent on each

What’s the update?
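The formulas on this slide did not survive transcription; as a hedged reconstruction, the standard naive mean field setup (and the resulting coordinate-ascent update) is:

```latex
\text{Approximation: } Q(Y) = \prod_i q_i(Y_i), \qquad
\text{Constraint: } \sum_{y_i} q_i(y_i) = 1 \text{ for each } i

\text{Objective: } \min_{Q}\ \mathrm{KL}\bigl(Q(Y)\,\|\,P(Y \mid X)\bigr)
\;\Longleftrightarrow\;
\max_{Q}\ \mathbb{E}_{Q}\!\Bigl[\log \prod_{f} f\bigl(Y_{N(f)}\bigr)\Bigr] + H(Q)

\text{Update: } q_i(y_i) \;\propto\; \exp\Bigl(\sum_{f \in N(i)} \mathbb{E}_{Q_{-i}}\bigl[\log f\bigl(y_i,\, Y_{N(f)\setminus\{i\}}\bigr)\bigr]\Bigr)
```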

Page 57: Variational Inference for Structured NLP Models

Mean Field Update

1)

2)

3-9) Lots of algebra

10)

Page 58: Variational Inference for Structured NLP Models

Approximate Expectations

[Figure: a factor f attached to variable Y_i.]

General:

Page 59: Variational Inference for Structured NLP Models

General Update *

Exponential Family:

Generic:
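Since the slide's formulas are missing, here is a minimal runnable sketch of the generic update above, written against the illustrative Variable/Factor classes from Part 1; it assumes strictly positive factor scores so the logs are defined.

```python
import math
from itertools import product

def mean_field_update(v, q):
    """One coordinate-ascent update of q_v, holding all other q_j fixed.
    q maps each variable to a dict value -> probability."""
    new_q = {}
    for val in v.domain:
        # sum over factors touching v of E_{q_{-v}}[ log f(...) ] with Y_v = val
        expected_log = 0.0
        for f in v.factors:
            others = [u for u in f.variables if u is not v]
            for combo in product(*(u.domain for u in others)):
                weight = 1.0
                for u, uval in zip(others, combo):
                    weight *= q[u][uval]
                assignment = dict(zip(others, combo))
                assignment[v] = val
                expected_log += weight * math.log(f.score(assignment))
        new_q[val] = math.exp(expected_log)
    z = sum(new_q.values())
    return {val: p / z for val, p in new_q.items()}

def mean_field(variables, iterations=50):
    # start from uniform marginals and sweep the variables in order
    q = {v: {val: 1.0 / len(v.domain) for val in v.domain} for v in variables}
    for _ in range(iterations):
        for v in variables:
            q[v] = mean_field_update(v, q)
    return q
```

On the two-variable example on the next slides (uniform unaries, pairwise potentials (2, 5; 2, 1)), this sweep reproduces the updates shown there: q1 becomes (.69, .31), then q2 becomes (.40, .60), and so on.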

Page 60: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: two-variable example. Unary potentials (1, 1) and (1, 1); pairwise potentials (2, 5; 2, 1). Exact joint (.2, .5; .2, .1) with marginals (.7, .3) and (.4, .6). Mean field marginals initialized to (.5, .5) and (.5, .5).]

Page 61: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: same example; the first mean field marginal is updated to (.69, .31), the second is still (.5, .5).]

Page 62: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: same example; mean field marginals (.69, .31) and (.5, .5).]

Page 63: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: same example; the second marginal is updated to (.40, .60).]

Page 64: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: same example; the first marginal is updated to (.73, .27).]

Page 65: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: same example; the second marginal is updated to (.38, .62).]

Page 66: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: converged mean field marginals (.73, .27) and (.38, .62); the implied product joint (.28, .45; .10, .17) vs. the exact joint (.2, .5; .2, .1).]

Page 67: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: a second example with unary potentials (2, 1) and (2, 1) and a uniform pairwise potential; here mean field recovers the exact marginals (.67, .33), (.67, .33) and joint (.44, .22; .22, .11).]

Page 68: Variational Inference for Structured NLP Models

Mean Field Inference Example

[Figure: a third example with uniform unaries and pairwise potentials (9, 1; 1, 5); the exact marginals are (.62, .38), (.62, .38) with joint (.56, .06; .06, .31), but mean field converges to the more peaked (.82, .18), (.82, .18) with implied joint (.67, .15; .15, .03).]

Page 69: Variational Inference for Structured NLP Models

Mean Field Q&A

Are the marginals guaranteed to converge to the right thing, like in sampling? No.

Is the algorithm at least guaranteed to converge to something? Yes.

So it’s just like EM? Yes.

Page 70: Variational Inference for Structured NLP Models

Why Only Local Optima?!

Variables:

Discrete distributions: e.g. P(0,1,0,…,0) = 1

All distributions (all convex combos)

Mean field approximable (can represent all discrete ones, but not all)

Page 71: Variational Inference for Structured NLP Models

Part 3: Structured Mean Field

Page 72: Variational Inference for Structured NLP Models

Mean Field Approximation

Model: Approximate Graph:

Page 73: Variational Inference for Structured NLP Models

Structured Mean Field

Approximation

Model: Approximate Graph:

(Xing et al, 2003)

Page 74: Variational Inference for Structured NLP Models

Structured Mean Field

Approximation

Page 75: Variational Inference for Structured NLP Models

Structured Mean Field

Approximation

Page 76: Variational Inference for Structured NLP Models

Structured Mean Field

Approximation

Page 77: Variational Inference for Structured NLP Models

Computing Structured Updates

??

Page 78: Variational Inference for Structured NLP Models

Computing Structured Updates

Marginal probability of … under …

Computed with forward-backward

Updating … consists of computing all marginals …

Page 79: Variational Inference for Structured NLP Models

Structured Mean Field Notation

Page 80: Variational Inference for Structured NLP Models

Structured Mean Field Notation

Page 81: Variational Inference for Structured NLP Models

Structured Mean Field Notation

Page 82: Variational Inference for Structured NLP Models

Structured Mean Field Notation

Page 83: Variational Inference for Structured NLP Models

Structured Mean Field Notation

Connected Components

Page 84: Variational Inference for Structured NLP Models

Structured Mean Field Notation

Neighbors:

Page 85: Variational Inference for Structured NLP Models

Structured Mean Field Updates

Naïve Mean Field:

Structured Mean Field:

Page 86: Variational Inference for Structured NLP Models

Expected Feature Counts

Page 87: Variational Inference for Structured NLP Models

Component Factorizability *

Example Feature

(pointwise product)

Condition

Generic Condition

Page 88: Variational Inference for Structured NLP Models

Component Factorizability *

(Abridged)

Use conjunctive indicator features

Page 89: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

[Figure: running example: the English sentence "High levels of product and project" paired with the Chinese sentence "产品 项目 水平".]

(Burkett et al, 2010)

Page 90: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Input: Sentences

[Figure: the English/Chinese sentence pair from the running example.]

Page 91: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Output: Trees contain Nodes

[Figure: monolingual parse trees over the two sentences.]

Page 92: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Output: Alignments

[Figure: a word alignment between the two sentences.]

Page 93: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Output: Alignments contain Bispans

[Figure: the alignment drawn as bispans over the sentence pair.]

Page 94: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Output:

[Figure: the sentence pair with both parse trees and the alignment.]

Page 95: Variational Inference for Structured NLP Models

Joint Parsing and Alignment


Variables

Page 96: Variational Inference for Structured NLP Models

Joint Parsing and Alignment


Variables

Page 97: Variational Inference for Structured NLP Models

Joint Parsing and Alignment


Variables

Page 98: Variational Inference for Structured NLP Models

Joint Parsing and Alignment


Factors

Page 99: Variational Inference for Structured NLP Models

Joint Parsing and Alignment


Factors

Page 100: Variational Inference for Structured NLP Models

Joint Parsing and Alignment


Factors

Page 101: Variational Inference for Structured NLP Models

Joint Parsing and Alignment


Factors

Page 102: Variational Inference for Structured NLP Models

Notational Abuse

Structural factors are implicit

Subscript Omission:

Skip Nonexistent Substructures:

Shorthand:

Page 103: Variational Inference for Structured NLP Models

Model Form

Page 104: Variational Inference for Structured NLP Models

Expected Feature Counts

Marginals

Training

Page 105: Variational Inference for Structured NLP Models

Structured Mean Field

Approximation

Page 106: Variational Inference for Structured NLP Models

Approximate Component Scores

Monolingual parser:

Score for

To compute :

Score for

If we knew :

Score for

Page 107: Variational Inference for Structured NLP Models

Expected Feature Counts

Marginals computed with bitext inside-outside

Marginals computed with inside-outside

For fixed …:

Page 108: Variational Inference for Structured NLP Models

Inference Procedure

Initialize:

Page 109: Variational Inference for Structured NLP Models

Inference Procedure

Iterate marginal updates… until convergence!

Page 110: Variational Inference for Structured NLP Models

Approximate Marginals

Page 111: Variational Inference for Structured NLP Models

Decoding

(Minimum Risk)

Page 112: Variational Inference for Structured NLP Models

Structured Mean Field Summary

Split the model into pieces you have dynamic programs for

Substitute expected feature counts for actual counts in cross-component factors

Iterate computing marginals until convergence (sketched below)
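As a concrete illustration of this recipe (not the tutorial's joint parsing-and-alignment code), here is a self-contained sketch of structured mean field for the simplest structured case: two tagging chains, each handled exactly by forward-backward, coupled by a per-position cross-chain factor, as in the factorial POS+NER model from Part 1. All names are illustrative.

```python
import numpy as np

def chain_marginals(unary, trans):
    """Node marginals of a linear-chain model via forward-backward.
    unary: (T, K) positive potentials; trans: (K, K) positive transition potentials.
    (No log-space scaling here, so keep T small; this is just an illustration.)"""
    T, K = unary.shape
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = unary[0]
    for t in range(1, T):
        alpha[t] = unary[t] * (alpha[t - 1] @ trans)
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (unary[t + 1] * beta[t + 1])
    marg = alpha * beta
    return marg / marg.sum(axis=1, keepdims=True)

def structured_mean_field(unary_a, trans_a, unary_b, trans_b, cross, rounds=20):
    """Structured mean field for two chains A and B coupled position-by-position.
    cross[t, a, b] > 0 is the cross-chain potential linking A_t = a and B_t = b.
    Q factors as q_A(A_1..T) * q_B(B_1..T); each block update is exact, computed
    by forward-backward on effective unaries that absorb the other chain's marginals."""
    T, K = unary_a.shape
    log_cross = np.log(cross)
    marg_a = np.full((T, K), 1.0 / K)
    marg_b = np.full((T, K), 1.0 / K)
    for _ in range(rounds):
        # update q_A: fold E_{q_B}[log cross_t(a, B_t)] into A's unary potentials
        eff_a = unary_a * np.exp(np.einsum('tab,tb->ta', log_cross, marg_b))
        marg_a = chain_marginals(eff_a, trans_a)
        # symmetric update for q_B
        eff_b = unary_b * np.exp(np.einsum('tab,ta->tb', log_cross, marg_a))
        marg_b = chain_marginals(eff_b, trans_b)
    return marg_a, marg_b
```

Each block update here is a full dynamic program, which is exactly the "substitute expected feature counts, then rerun the component's DP" pattern described above.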

Page 113: Variational Inference for Structured NLP Models

Structured Mean Field Tips

Try to make sure cross-component features are products of indicators

You don’t have to run all the way to convergence; marginals are usually pretty good after just a few rounds

Recompute marginals for fast components more frequently than for slow ones

e.g. For joint parsing and alignment, the two monolingual tree marginals ( ) were updated until convergence between each update of the ITG marginals ( )

Page 114: Variational Inference for Structured NLP Models

Break Time!

Page 115: Variational Inference for Structured NLP Models

Part 4: Belief Propagation

Page 116: Variational Inference for Structured NLP Models

Belief Propagation

Wanted:

Idea: pretend graph is a tree

Key objects:

Beliefs (marginals)

Messages

Page 117: Variational Inference for Structured NLP Models

Belief Propagation Intro

Assume we have a tree

Page 118: Variational Inference for Structured NLP Models

Belief Propagation Intro

Page 119: Variational Inference for Structured NLP Models

Messages

Variable to Factor

Factor to Variable

Both take the form of a “distribution” over …

Page 120: Variational Inference for Structured NLP Models

Messages General Form

Messages from variables to factors:

Page 121: Variational Inference for Structured NLP Models

Messages General Form

Messages from factors to variables:

Page 122: Variational Inference for Structured NLP Models

Marginal Beliefs
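The message and belief formulas on these three slides are missing from the transcript; the standard sum-product forms they describe are:

```latex
\mu_{i \to f}(y_i) \;=\; \prod_{g \in N(i)\setminus\{f\}} \mu_{g \to i}(y_i)
\qquad
\mu_{f \to i}(y_i) \;=\; \sum_{y_{N(f)\setminus\{i\}}} f\bigl(y_{N(f)}\bigr) \prod_{j \in N(f)\setminus\{i\}} \mu_{j \to f}(y_j)

b_i(y_i) \;\propto\; \prod_{f \in N(i)} \mu_{f \to i}(y_i)
```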

Page 123: Variational Inference for Structured NLP Models

Belief Propagation on Tree-Structured Graphs

If the factor graph has no cycles, BP is exact

Can always order message computations

After one pass, marginal beliefs are correct

Page 124: Variational Inference for Structured NLP Models

“Loopy” Belief Propagation

Problem: we no longer have a tree

Solution: ignore problem

Page 125: Variational Inference for Structured NLP Models

“Loopy” Belief Propagation

Just start passing messages anyway!
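A minimal sketch of loopy BP on the illustrative Variable/Factor classes from Part 1: initialize every message to uniform and keep applying the sum-product updates above for a fixed number of rounds (there is no convergence guarantee, so a round cap is the usual stopping rule).

```python
from itertools import product

def loopy_bp(variables, factors, iterations=25):
    # messages indexed by (sender, receiver); start uniform
    msg = {}
    for f in factors:
        for v in f.variables:
            msg[(v, f)] = {val: 1.0 for val in v.domain}
            msg[(f, v)] = {val: 1.0 for val in v.domain}
    for _ in range(iterations):
        # variable -> factor: product of incoming messages from the other factors
        for v in variables:
            for f in v.factors:
                m = {val: 1.0 for val in v.domain}
                for g in v.factors:
                    if g is not f:
                        for val in v.domain:
                            m[val] *= msg[(g, v)][val]
                z = sum(m.values())
                msg[(v, f)] = {val: x / z for val, x in m.items()}
        # factor -> variable: sum out the other neighbors' assignments
        for f in factors:
            for v in f.variables:
                others = [u for u in f.variables if u is not v]
                m = {val: 0.0 for val in v.domain}
                for val in v.domain:
                    for combo in product(*(u.domain for u in others)):
                        assignment = dict(zip(others, combo))
                        assignment[v] = val
                        w = f.score(assignment)
                        for u, uval in zip(others, combo):
                            w *= msg[(u, f)][uval]
                        m[val] += w
                z = sum(m.values())
                msg[(f, v)] = {val: x / z for val, x in m.items()}
    # beliefs (approximate marginals)
    beliefs = {}
    for v in variables:
        b = {val: 1.0 for val in v.domain}
        for f in v.factors:
            for val in v.domain:
                b[val] *= msg[(f, v)][val]
        z = sum(b.values())
        beliefs[v] = {val: x / z for val, x in b.items()}
    return beliefs
```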

Page 126: Variational Inference for Structured NLP Models

Belief Propagation Q&A

Are the marginals guaranteed to converge to the right thing, like in sampling? No.

Well, is the algorithm at least guaranteed to converge to something, like mean field? No.

Will everything often work out more or less OK in practice? Maybe.

Page 127: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: a loopy graph over four binary variables with pairwise potentials (7, 1; 1, 3), (1, 1; 1, 8), (6, 1; 2, 3), (1, 9; 8, 1). Exact marginals: (.59, .41), (.16, .84), (.34, .66), (.81, .19). BP beliefs are being updated from a uniform (.5, .5) initialization.]

Page 128: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs after the next message update.]

Page 129: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs after further message updates.]

Page 130: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs after further message updates.]

Page 131: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs as a second pass of messages begins.]

Page 132: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs after further message updates.]

Page 133: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs after further message updates.]

Page 134: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs after further message updates.]

Page 135: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. BP beliefs after further message updates.]

Page 136: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: exact marginals vs. the final BP beliefs, alongside the mean field marginals for the same model; the mean field marginals are noticeably more peaked.]

Page 137: Variational Inference for Structured NLP Models

Belief Propagation Example

[Figure: a harder example where some BP beliefs come out on the wrong side of .5 relative to the exact marginals.]

Page 138: Variational Inference for Structured NLP Models

Playing Telephone

Page 139: Variational Inference for Structured NLP Models

Part 5: Belief Propagation with

Structured Factors

Page 140: Variational Inference for Structured NLP Models

Structured Factors

Problem:

Computing factor messages is exponential in arity

Many models we care about have high-arity factors

Solution:

Take advantage of NLP tricks for efficient sums

Examples:

Word Alignment (at-most-one constraints)

Dependency Parsing (tree constraint)

Page 141: Variational Inference for Structured NLP Models

Warm-up Exercise

Page 142: Variational Inference for Structured NLP Models

Warm-up Exercise

Page 143: Variational Inference for Structured NLP Models

Warm-up Exercise

Page 144: Variational Inference for Structured NLP Models

Warm-up Exercise

Page 145: Variational Inference for Structured NLP Models

Warm-up Exercise

Page 146: Variational Inference for Structured NLP Models

Warm-up Exercise

Page 147: Variational Inference for Structured NLP Models

Warm-up Exercise

Page 148: Variational Inference for Structured NLP Models

Warm-up Exercise

Benefits:

Cleans up notation

Saves time multiplying

Enables efficient summing

Page 149: Variational Inference for Structured NLP Models

The Shape of Structured BP

Isolate the combinatorial factors

Figure out how to compute efficient sums

Directly exploiting sparsity

Dynamic programming

Work out the bookkeeping

Or, use a reference!

Page 150: Variational Inference for Structured NLP Models

Word Alignment with BP

(Cromières & Kurohashi, 2009)

(Burkett & Klein, 2012)

Page 151: Variational Inference for Structured NLP Models

Computing Messages from Factors

Exponential in arity of factor

(have to sum over all assignments)

Arity 1

Arity

Arity

Page 152: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Input:

Goal:

Page 153: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

: Assignment to variables where

Page 154: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

: Assignment to variables where

: Special case for all off

Page 155: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Input:

Goal:

Only need to consider

for

Page 156: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Page 157: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Page 158: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Page 159: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Page 160: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Page 161: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Page 162: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

Page 163: Variational Inference for Structured NLP Models

Computing Constraint Factor

Messages

1. Precompute:

2.

3. Partition:

4. Messages:
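Since the formulas in steps 1-4 are missing here, this is a hedged sketch of one standard way to get these messages in linear time for an at-most-one ("zero or one variable on") constraint factor over binary variables; the function and variable names are illustrative.

```python
def at_most_one_messages(incoming):
    """Messages out of an at-most-one constraint factor over binary variables.

    incoming: list of (mu_off, mu_on) variable-to-factor messages, mu_off > 0.
    Returns a list of (mu_off, mu_on) factor-to-variable messages, each
    normalized to sum to 1.  Runs in O(n) instead of O(2^n).
    """
    # 1. Precompute odds ratios o_j = mu_on / mu_off and their total
    odds = [on / off for off, on in incoming]
    total = sum(odds)
    out = []
    for o in odds:
        # after dividing out the product of the other variables' "off" messages:
        #   "on":  every other variable must be off            -> relative score 1
        #   "off": all others off, or exactly one other on     -> 1 + (total - o)
        on_score = 1.0
        off_score = 1.0 + (total - o)
        z = on_score + off_score
        out.append((off_score / z, on_score / z))
    return out
```

The trick is the odds-ratio idea used again later for the tree factor: divide out the product of all the "off" messages so each outgoing message only needs the total of the incoming odds ratios.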

Page 164: Variational Inference for Structured NLP Models

Using BP Marginals

Expected Feature Counts:

Marginal Decoding:
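A small sketch of both uses, written against the beliefs returned by the loopy BP sketch in Part 4. `feature_fn` is a hypothetical interface for features that decompose over single variables (factor-level features would use factor beliefs analogously); it is not anything from the tutorial.

```python
def expected_unary_counts(beliefs, feature_fn):
    """E[f] under the BP beliefs, for features on single variables.
    feature_fn(var, value) -> dict of feature name -> count (hypothetical)."""
    totals = {}
    for var, b in beliefs.items():
        for value, p in b.items():
            for name, count in feature_fn(var, value).items():
                totals[name] = totals.get(name, 0.0) + p * count
    return totals

def marginal_decode(beliefs):
    # pick each variable's highest-belief value independently
    return {var: max(b, key=b.get) for var, b in beliefs.items()}
```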

Page 165: Variational Inference for Structured NLP Models

Dependency Parsing with BP

(Smith & Eisner, 2008)

(Martins et al., 2010)

Page 166: Variational Inference for Structured NLP Models

Dependency Parsing with BP

Arity 1

Arity 2

Arity

Exponential in arity of factor

Page 167: Variational Inference for Structured NLP Models

Messages from the Tree Factor

Input: for all variables

Goal: for all variables

Page 168: Variational Inference for Structured NLP Models

What Do Parsers Do?

Initial state:

Value of an edge ( has parent ):

Value of a tree:

Run inside-outside to compute:

Total score for all trees:

Total score for an edge:

Page 169: Variational Inference for Structured NLP Models

Initializing the Parser

Product over edges in :

or

Product over ALL edges, including …

Problem:

Solution: Use odds ratios

(Klein & Manning, 2002)

Page 170: Variational Inference for Structured NLP Models

Running the Parser

Sums we want:

Page 171: Variational Inference for Structured NLP Models

Computing Tree Factor Messages

1. Precompute:

2. Initialize:

3. Run inside-outside

4. Messages:
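The slide's formulas are missing, so here is a hedged sketch of the bookkeeping in steps 1-4. `edge_marginals` is a hypothetical stand-in for whatever routine computes edge posteriors under a product-of-edge-weights tree distribution (inside-outside for projective trees, or the matrix-tree theorem for non-projective ones); it is not a real API.

```python
import numpy as np

def tree_factor_messages(msg_in, edge_marginals):
    """Messages out of a hard tree-constraint factor over edge indicator variables.

    msg_in: pair of (n, n) arrays (mu_off, mu_on) of incoming variable-to-factor
            messages, one pair per candidate edge, with mu_off > 0.
    edge_marginals: hypothetical routine mapping an (n, n) array of edge weights to
                    edge posteriors under the distribution over trees proportional
                    to the product of the weights of their edges.
    """
    mu_off, mu_on = msg_in
    # 1. Precompute odds ratios so "off" edges contribute weight 1 (Klein & Manning, 2002)
    odds = mu_on / mu_off
    # 2-3. Initialize the parser with the odds ratios and run its dynamic program
    p_edge = edge_marginals(odds)          # posterior that each edge is in the tree
    # 4. Messages: divide the incoming odds back out of the edge posterior odds
    out_on = p_edge / np.clip(odds, 1e-30, None)
    out_off = 1.0 - p_edge
    z = out_on + out_off
    return out_off / z, out_on / z
```

The key identity is that each edge variable's belief must match its posterior under the tree distribution, so the outgoing message's odds are the posterior odds divided by the incoming odds, which is the same odds-ratio trick as on the previous slide.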

Page 172: Variational Inference for Structured NLP Models

Using BP Marginals

Expected Feature Counts:

Minimum Risk Decoding:

1. Initialize:

2. Run parser:

Page 173: Variational Inference for Structured NLP Models

Structured BP Summary

Tricky part is factors whose arity grows with input size

Simplify the problem by focusing on sums of total scores

Exploit problem-specific structure to compute sums efficiently

Use odds ratios to eliminate “default” values that don’t appear in dynamic program sums

Page 174: Variational Inference for Structured NLP Models

Belief Propagation Tips

Don’t compute unary messages multiple times

Store variable beliefs to save time computing variable to factor messages (divide one out)

Update the slowest messages less frequently

You don’t usually need to run to convergence; measure the speed/performance tradeoff

Page 175: Variational Inference for Structured NLP Models

Part 6: Wrap-Up

Page 176: Variational Inference for Structured NLP Models

Mean Field vs Belief Propagation

When to use Mean Field:

Models made up of weakly interacting structures that are individually tractable

Joint models often have this flavor

When to use Belief Propagation:

Models with intersecting factors that are tractable in isolation but interact badly

You often get models like this when adding non-local features to an existing tractable model

Page 177: Variational Inference for Structured NLP Models

Mean Field vs Belief Propagation

Mean Field Advantages

For models where it applies, the coordinate ascent procedure converges quite quickly

Belief Propagation Advantages

More broadly applicable

More freedom to focus on factor graph design when modeling

Advantages of Both

Work pretty well when the real posterior is peaked (like in NLP models!)

Page 178: Variational Inference for Structured NLP Models

Other Variational Techniques

Variational Bayes

Mean Field for models with parametric forms (e.g. Liang et al., 2007; Cohen et al., 2010)

Expectation Propagation

Theoretical generalization of BP

Works kind of like Mean Field in practice; good for product models (e.g. Hall and Klein, 2012)

Convex Relaxation

Optimize a convex approximate objective

Page 179: Variational Inference for Structured NLP Models

Related Techniques

Dual Decomposition

Not probabilistic, but good for finding maxes in similar models (e.g. Koo et al., 2010; DeNero & Macherey, 2011)

Search approximations

E.g. pruning, beam search, reranking

Orthogonal to approximate inference techniques (and often stackable!)

Page 180: Variational Inference for Structured NLP Models

Thank You

Page 181: Variational Inference for Structured NLP Models

Appendix A: Bibliography

Page 182: Variational Inference for Structured NLP Models

References

Conditional Random Fields
John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira (2001). Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In ICML.

Edge-Factored Dependency Parsing
Ryan McDonald, Koby Crammer, and Fernando Pereira (2005). Online Large-Margin Training of Dependency Parsers. In ACL.

Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič (2005). Non-projective Dependency Parsing using Spanning Tree Algorithms. In HLT/EMNLP.

Page 183: Variational Inference for Structured NLP Models

References

Factorial Chain CRF
Charles Sutton, Khashayar Rohanimanesh, and Andrew McCallum (2004). Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data. In ICML.

Second-Order Dependency Parsing
Ryan McDonald and Fernando Pereira (2006). Online Learning of Approximate Dependency Parsing Algorithms. In EACL.

Xavier Carreras (2007). Experiments with a Higher-Order Projective Dependency Parser. In CoNLL Shared Task Session.

Page 184: Variational Inference for Structured NLP Models

References

Max Matching Word Alignment
Ben Taskar, Simon Lacoste-Julien, and Dan Klein (2005). A Discriminative Matching Approach to Word Alignment. In HLT/EMNLP.

Iterated Conditional Modes
Julian Besag (1986). On the Statistical Analysis of Dirty Pictures. Journal of the Royal Statistical Society, Series B, Vol. 48, No. 3, pp. 259-302.

Structured Mean Field
Eric P. Xing, Michael I. Jordan, and Stuart Russell (2003). A Generalized Mean Field Algorithm for Variational Inference in Exponential Families. In UAI.

Page 185: Variational Inference for Structured NLP Models

References

Joint Parsing and Alignment
David Burkett, John Blitzer, and Dan Klein (2010). Joint Parsing and Alignment with Weakly Synchronized Grammars. In NAACL.

Word Alignment with Belief Propagation
Jan Niehues and Stephan Vogel (2008). Discriminative Word Alignment via Alignment Matrix Modelling. In ACL:HLT.

Fabien Cromières and Sadao Kurohashi (2009). An Alignment Algorithm using Belief Propagation and a Structure-Based Distortion Model. In EACL.

David Burkett and Dan Klein (2012). Fast Inference in Phrase Extraction Models with Belief Propagation. In NAACL.

Page 186: Variational Inference for Structured NLP Models

References

Dependency Parsing with Belief Propagation
David A. Smith and Jason Eisner (2008). Dependency Parsing by Belief Propagation. In EMNLP.

André F. T. Martins, Noah A. Smith, Eric P. Xing, Pedro M. Q. Aguiar, and Mário A. T. Figueiredo (2010). Turbo Parsers: Dependency Parsing by Approximate Variational Inference. In EMNLP.

Odds Ratios
Dan Klein and Chris Manning (2002). A Generative Constituent-Context Model for Improved Grammar Induction. In ACL.

Variational Bayes
Percy Liang, Slav Petrov, Michael I. Jordan, and Dan Klein (2007). The Infinite PCFG using Hierarchical Dirichlet Processes. In EMNLP/CoNLL.

Shay B. Cohen, David M. Blei, and Noah A. Smith (2010). Variational Inference for Adaptor Grammars. In NAACL.

Page 187: Variational Inference for Structured NLP Models

References

Expectation Propagation
David Hall and Dan Klein (2012). Training Factored PCFGs with Expectation Propagation. In EMNLP-CoNLL.

Dual Decomposition
Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag (2010). Dual Decomposition for Parsing with Non-Projective Head Automata. In EMNLP.

Alexander M. Rush, David Sontag, Michael Collins, and Tommi Jaakkola (2010). On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing. In EMNLP.

John DeNero and Klaus Macherey (2011). Model-Based Aligner Combination Using Dual Decomposition. In ACL.

Page 188: Variational Inference for Structured NLP Models

Further Reading

Theoretical Background

Martin J. Wainwright and Michael I. Jordan (2008). Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, Vol. 1, No. 1-2, pp. 1-305.

Gentle Introductions

Christopher M. Bishop (2006). Pattern Recognition and Machine Learning. Springer.

David J.C. MacKay (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press.

Page 189: Variational Inference for Structured NLP Models

Further Reading

More Variational Inference for Structured NLP
Zhifei Li, Jason Eisner, and Sanjeev Khudanpur (2009). Variational Decoding for Statistical Machine Translation. In ACL.

Michael Auli and Adam Lopez (2011). A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing. In ACL.

Veselin Stoyanov and Jason Eisner (2012). Minimum-Risk Training of Approximate CRF-Based NLP Systems. In NAACL.

Jason Naradowsky, Sebastian Riedel, and David A. Smith (2012). Improving NLP through Marginalization of Hidden Syntactic Structure. In EMNLP-CoNLL.

Greg Durrett, David Hall, and Dan Klein (2013). Decentralized Entity-Level Modeling for Coreference Resolution. In ACL.

Page 190: Variational Inference for Structured NLP Models

Appendix B: Mean Field Update

Derivation

Page 191: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Model:

Approximate Graph:

Goal:

Page 192: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 193: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 194: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 195: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 196: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 197: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 198: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 199: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 200: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 201: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 202: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 203: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 204: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 205: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 206: Variational Inference for Structured NLP Models

Mean Field Update Derivation

Page 207: Variational Inference for Structured NLP Models

Appendix C: Joint Parsing and

Alignment Component Distributions

Page 208: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Component Distributions

Page 209: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Component Distributions

Page 210: Variational Inference for Structured NLP Models

Joint Parsing and Alignment

Component Distributions

Page 211: Variational Inference for Structured NLP Models

Appendix D: Forward-Backward

as Belief Propagation

Page 212: Variational Inference for Structured NLP Models

Forward-Backward as Belief

Propagation

Page 213: Variational Inference for Structured NLP Models

Forward-Backward as Belief

Propagation

Page 214: Variational Inference for Structured NLP Models

Forward-Backward as Belief

Propagation

Page 215: Variational Inference for Structured NLP Models

Forward-Backward Marginal Beliefs