Conditional Random Fields
Advanced Statistical Methods in NLP
Ling 572
February 9, 2012

Roadmap

Graphical models:
- Modeling independence
- Models revisited
- Generative & discriminative models

Conditional random fields:
- Linear-chain models
- Skip-chain models

Preview

Conditional random fields:
- Undirected graphical model, due to Lafferty, McCallum, and Pereira (2001)
- Discriminative model; supports integration of rich feature sets
- Allows a range of dependency structures: linear-chain, skip-chain, general; can encode long-distance dependencies
- Used in diverse NLP sequence labeling tasks: named entity recognition, coreference resolution, etc.


Graphical Models

A graphical model is a simple, graphical notation for conditional independence: a probabilistic model in which the graph structure denotes conditional independence between random variables.

- Nodes: random variables
- Edges: dependency relations between random variables

Model types: Bayesian Networks, Markov Random Fields

Modeling (In)dependence

Bayesian network:
- Directed acyclic graph (DAG)
- Nodes = random variables
- Arc ~ "directly influences": a conditional dependency
- Arcs = child depends on parent(s)
- No incoming arcs = independent (only an a priori probability)
- Let π(X) denote the parents of X; for each X we need P(X | π(X))

Example I

[Figure: example Bayesian network from Russell & Norvig, AIMA]

Simple Bayesian Network: MCBN1

Structure: A → B, A → C; B, C → D; C → E

Dependencies and the truth tables we need:
- A = only a priori:   P(A)       (2 entries)
- B depends on A:      P(B|A)     (2*2)
- C depends on A:      P(C|A)     (2*2)
- D depends on B, C:   P(D|B,C)   (2*2*2)
- E depends on C:      P(E|C)     (2*2)

Holmes Example (Pearl)

Holmes is worried that his house will be burgled. For the time period of interest, there is a 10^-4 a priori chance of this happening, and Holmes has installed a burglar alarm to try to forestall this event. The alarm is 95% reliable in sounding when a burglary happens, but also has a false positive rate of 1%. Holmes' neighbor, Watson, is 90% sure to call Holmes at his office if the alarm sounds, but he is also a bit of a practical joker and, knowing Holmes' concern, might (30%) call even if the alarm is silent. Holmes' other neighbor Mrs. Gibbons is a well-known lush and often befuddled, but Holmes believes that she is four times more likely to call him if there is an alarm than not.

Holmes Example: Model

There are four binary random variables:
- B: whether Holmes' house has been burgled
- A: whether his alarm sounded
- W: whether Watson called
- G: whether Gibbons called

Structure: B → A; A → W; A → G

Holmes Example: Tables

P(B):
  B=#t    B=#f
  0.0001  0.9999

P(A|B):
  B    A=#t  A=#f
  #t   0.95  0.05
  #f   0.01  0.99

P(W|A):
  A    W=#t  W=#f
  #t   0.90  0.10
  #f   0.30  0.70

P(G|A):
  A    G=#t  G=#f
  #t   0.40  0.60
  #f   0.10  0.90
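The network above supports inference by enumeration. Below is a minimal Python sketch (not from the slides) that encodes these tables and computes P(B=#t | W=#t); the function and variable names are my own.

```python
from itertools import product

# CPTs from the tables above; True = #t, False = #f.
P_B = {True: 0.0001, False: 0.9999}
P_A_given_B = {(True, True): 0.95, (False, True): 0.05,    # keys: (A, B)
               (True, False): 0.01, (False, False): 0.99}
P_W_given_A = {(True, True): 0.90, (False, True): 0.10,    # keys: (W, A)
               (True, False): 0.30, (False, False): 0.70}
P_G_given_A = {(True, True): 0.40, (False, True): 0.60,    # keys: (G, A)
               (True, False): 0.10, (False, False): 0.90}

def joint(b, a, w, g):
    """P(B,A,W,G) = P(B) P(A|B) P(W|A) P(G|A), from the network structure."""
    return P_B[b] * P_A_given_B[(a, b)] * P_W_given_A[(w, a)] * P_G_given_A[(g, a)]

def posterior_burglary_given_watson():
    """P(B=#t | W=#t), summing the joint over the unobserved variables A, G."""
    num = sum(joint(True, a, True, g) for a, g in product([True, False], repeat=2))
    den = sum(joint(b, a, True, g) for b, a, g in product([True, False], repeat=3))
    return num / den

print(posterior_burglary_given_watson())  # ~0.000284
```

A single Watson call barely moves the posterior: his 30% false-alarm rate and the tiny prior keep P(B=#t | W=#t) near 3x10^-4.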

Bayes' Nets: Markov Property

Bayes' nets satisfy the local Markov property: variables are conditionally independent of their non-descendants given their parents.

Simple Bayesian Network: MCBN1

Structure: A → B, A → C; B, C → D; C → E
(A = only a priori; B depends on A; C depends on A; D depends on B, C; E depends on C)

P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)

There exist algorithms for training and inference on Bayesian networks.
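As a sketch of this factorization, the snippet below fills MCBN1's CPTs with made-up values (the slides give none) and checks that the product of conditionals defines a proper joint distribution.

```python
import itertools

# Made-up CPTs for illustration; each variable is binary.
P_A = {True: 0.6, False: 0.4}
P_B = {(True, True): 0.7, (False, True): 0.3,    # keys: (B, A)
       (True, False): 0.2, (False, False): 0.8}
P_C = {(True, True): 0.5, (False, True): 0.5,    # keys: (C, A)
       (True, False): 0.1, (False, False): 0.9}

def p_d(d, b, c):
    """P(D | B, C), with invented P(D=#t | b, c) values."""
    pt = {(True, True): 0.9, (True, False): 0.6,
          (False, True): 0.5, (False, False): 0.1}[(b, c)]
    return pt if d else 1 - pt

def p_e(e, c):
    """P(E | C), with invented P(E=#t | c) values."""
    pt = 0.8 if c else 0.3
    return pt if e else 1 - pt

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B,C) P(E|C)"""
    return P_A[a] * P_B[(b, a)] * P_C[(c, a)] * p_d(d, b, c) * p_e(e, c)

# Sanity check: because each factor is a proper conditional distribution,
# the product sums to 1 over all 2^5 assignments.
total = sum(joint(*vals) for vals in itertools.product([True, False], repeat=5))
print(round(total, 10))  # 1.0
```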

Naïve Bayes Model

Bayes' net: conditional independence of features given the class.

Structure: Y → f1, f2, f3, ..., fk
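The decision rule that falls out of this structure, argmax_y P(y) ∏ P(f|y), can be sketched as follows; the classes, features, and probabilities below are invented for illustration.

```python
import math

# Made-up priors and per-class feature likelihoods (in log space).
log_prior = {"guns": math.log(0.5), "mideast": math.log(0.5)}
log_like = {
    ("rifle", "guns"): math.log(0.08), ("rifle", "mideast"): math.log(0.001),
    ("peace", "guns"): math.log(0.005), ("peace", "mideast"): math.log(0.02),
}

def classify(features):
    """Naive Bayes: argmax_y [ log P(y) + sum_f log P(f|y) ]."""
    scores = {y: lp + sum(log_like[(f, y)] for f in features)
              for y, lp in log_prior.items()}
    return max(scores, key=scores.get)

print(classify(["rifle"]))  # guns
```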

Hidden Markov Model

Bayesian network where:
- yt depends on yt-1
- xt depends on yt

Structure: y1 → y2 → y3 → ... → yk, with each state yt emitting an observation xt
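This structure gives the HMM joint P(y,x) = P(y1)P(x1|y1) ∏t P(yt|yt-1)P(xt|yt). A toy two-state sketch, with all parameters made up:

```python
# Toy HMM over tags N, V; all numbers are invented for illustration.
init = {"N": 0.7, "V": 0.3}
trans = {("N", "N"): 0.3, ("N", "V"): 0.7,   # keys: (prev tag, current tag)
         ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "flies"): 0.1, ("V", "flies"): 0.2,   # keys: (tag, word)
        ("N", "time"): 0.3, ("V", "time"): 0.05}

def joint(tags, words):
    """P(y, x) = P(y1) P(x1|y1) * prod_t P(yt|yt-1) P(xt|yt)."""
    p = init[tags[0]] * emit[(tags[0], words[0])]
    for t in range(1, len(tags)):
        p *= trans[(tags[t - 1], tags[t])] * emit[(tags[t], words[t])]
    return p

print(joint(["N", "V"], ["time", "flies"]))  # 0.7*0.3 * 0.7*0.2 = 0.0294
```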

Generative Models

Both Naïve Bayes and HMMs are generative models.

"We use the term generative model to refer to a directed graphical model in which the outputs topologically precede the inputs, that is, no x in X can be a parent of an output y in Y." (Sutton & McCallum, 2006)

A state y generates an observation (instance) x.

Maximum Entropy and linear-chain Conditional Random Fields (CRFs) are, respectively, their discriminative counterparts.

Markov Random Fields

aka Markov networks

- Graphical representation of a probabilistic model as an undirected graph
- Can represent cyclic dependencies (vs. the DAG of a Bayesian network, which can represent induced dependencies)
- Also satisfy the local Markov property: P(X | all other variables) = P(X | ne(X)), where ne(X) are the neighbors of X

Factorizing MRFs

Many MRFs can be analyzed in terms of cliques.

- Clique: in an undirected graph G(V,E), a clique is a subset of the vertices V such that for every pair of vertices vi, vj in the subset, the edge (vi,vj) is in E
- Maximal clique: a clique that cannot be extended
- Maximum clique: the largest clique in G

Example (due to F. Xia), over a graph with vertices A, B, C, D, E:
- Clique:
- Maximal clique:
- Maximum clique:
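Since the slide leaves the example's edge set (and the answers) blank, the sketch below assumes an edge set over A–E purely for illustration, and finds the maximal cliques by brute force.

```python
from itertools import combinations

# Assumed edge set for illustration only (the slide's edges are not recoverable).
V = {"A", "B", "C", "D", "E"}
E = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C"),
                            ("C", "D"), ("D", "E"), ("C", "E")]}

def is_clique(nodes):
    """Every pair in `nodes` must be connected by an edge."""
    return all(frozenset(p) in E for p in combinations(nodes, 2))

def maximal_cliques():
    """Brute force: cliques that no outside vertex can extend."""
    cliques = [set(s) for r in range(1, len(V) + 1)
               for s in combinations(sorted(V), r) if is_clique(s)]
    return [c for c in cliques
            if not any(is_clique(c | {v}) for v in V - c)]

for c in maximal_cliques():
    print(sorted(c))  # ['A', 'B', 'C'] and ['C', 'D', 'E'] for this edge set
```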

MRFs

Given an undirected graph G(V,E) over random variables X, with cliques cl(G), the distribution factorizes over cliques:

  P(X) = (1/Z) ∏_{c ∈ cl(G)} Φc(Xc)

where Z normalizes over all assignments. (Example due to F. Xia.)

Conditional Random Fields

Definition due to Lafferty et al., 2001:

Let G = (V,E) be a graph such that Y = (Yv), v ∈ V, so that Y is indexed by the vertices of G. Then (X,Y) is a conditional random field in case, when conditioned on X, the random variables Yv obey the Markov property with respect to the graph: p(Yv | X, Yw, w ≠ v) = p(Yv | X, Yw, w ∼ v), where w ∼ v means that w and v are neighbors in G.

A CRF is a Markov Random Field globally conditioned on the observation X, and has the form:

  p(y|x) = (1/Z(x)) ∏_{c ∈ cl(G)} Φc(yc, x)

Linear-Chain CRF

CRFs can have arbitrary graphical structure, but the most common form is the linear chain, which supports sequence modeling for many NLP sequence labeling problems: Named Entity Recognition (NER), coreference, etc.

Similar to combining HMM sequence structure with a MaxEnt model:
- Supports sequence structure like an HMM, but HMMs can't handle rich feature structure
- Supports rich, overlapping features like MaxEnt, but MaxEnt doesn't directly support sequence labeling

Discriminative & Generative

[Figure: generative vs. discriminative model perspectives (Sutton & McCallum)]

Linear-Chain CRFs

Feature functions:

In MaxEnt: f: X × Y → {0,1}
  e.g. fj(x,y) = 1 if x = "rifle" and y = talk.politics.guns, 0 otherwise

In CRFs: f: Y × Y × X × T → R
  e.g. fk(yt, yt-1, x, t) = 1 if yt = V and yt-1 = N and xt = "flies", 0 otherwise
  Frequently an indicator function, for efficiency
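These CRF feature functions are just functions of (yt, yt-1, x, t); a minimal sketch, with invented function names and tag sets:

```python
def f_verb_after_noun_flies(yt, yprev, x, t):
    """Indicator: 1 if yt=V and yt-1=N and xt='flies', 0 otherwise."""
    return 1 if (yt == "V" and yprev == "N" and x[t] == "flies") else 0

def f_capitalized_is_np(yt, yprev, x, t):
    """An overlapping observation feature of the kind a generative HMM
    cannot easily incorporate."""
    return 1 if (yt == "NP" and x[t][0].isupper()) else 0

x = ["time", "flies", "fast"]
print(f_verb_after_noun_flies("V", "N", x, 1))  # 1
print(f_verb_after_noun_flies("N", "N", x, 1))  # 0
```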

Linear-Chain CRFs

The linear-chain CRF has the form

  p(y|x) = (1/Z(x)) exp( Σt Σk λk fk(yt, yt-1, x, t) )

where Z(x) normalizes over all label sequences y.

Linear-Chain CRFs: Training & Decoding

Training: learn the λj
- Approach similar to MaxEnt, e.g. L-BFGS

Decoding: compute the label sequence that optimizes P(y|x)
- Can use HMM-style approaches, e.g. Viterbi
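A minimal Viterbi decoder for an (unnormalized) linear-chain CRF score; `score` stands in for Σk λk fk(yt, yt-1, x, t), and its toy weights are made up.

```python
LABELS = ["N", "V"]

def score(yprev, y, x, t):
    """Stand-in for sum_k lambda_k * f_k(y, yprev, x, t); invented weights."""
    s = 0.0
    if y == "V" and yprev == "N":
        s += 1.5                      # transition feature N -> V
    if y == "N" and x[t] == "time":
        s += 2.0                      # observation feature
    if y == "V" and x[t] == "flies":
        s += 1.0
    return s

def viterbi(x):
    """Best label sequence under the additive CRF score."""
    delta = [{y: score(None, y, x, 0) for y in LABELS}]
    back = []
    for t in range(1, len(x)):
        delta.append({})
        back.append({})
        for y in LABELS:
            best = max(LABELS, key=lambda yp: delta[t - 1][yp] + score(yp, y, x, t))
            back[t - 1][y] = best
            delta[t][y] = delta[t - 1][best] + score(best, y, x, t)
    y = max(LABELS, key=lambda yy: delta[-1][yy])
    seq = [y]
    for b in reversed(back):          # follow backpointers to recover the path
        seq.append(b[seq[-1]])
    return list(reversed(seq))

print(viterbi(["time", "flies"]))  # ['N', 'V']
```

Because the score is additive over positions, the max distributes over the chain exactly as in HMM Viterbi; only the local score differs.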

Skip-chain CRFs

MotivationLong-distance dependencies:

Page 76: Conditional Random Fields Advanced Statistical Methods in NLP Ling 572 February 9, 2012 1.

76

MotivationLong-distance dependencies:

Linear chain CRFs, HMMs, beam search, etcAll make very local Markov assumptions

Preceding label; current data given current labelGood for some tasks

Page 77: Conditional Random Fields Advanced Statistical Methods in NLP Ling 572 February 9, 2012 1.

77

MotivationLong-distance dependencies:

Linear chain CRFs, HMMs, beam search, etcAll make very local Markov assumptions

Preceding label; current data given current labelGood for some tasks

However, longer context can be usefule.g. NER: Repeated capitalized words should get same

tag

Page 78: Conditional Random Fields Advanced Statistical Methods in NLP Ling 572 February 9, 2012 1.

78

MotivationLong-distance dependencies:

Linear chain CRFs, HMMs, beam search, etcAll make local Markov assumptions

Preceding label; current data given current labelGood for some tasks

However, longer context can be usefule.g. NER: Repeated capitalized words should get same

tag

Skip-Chain CRFs

Basic approach: augment the linear-chain CRF model with long-distance 'skip edges', adding evidence from both endpoints.

- Which edges? Identical words, words with the same stem?
- How many edges? Not too many: more edges increase inference cost

Skip-Chain CRF Model

Two clique templates:
- Standard linear-chain template
- Skip-edge template

88

Skip Chain NERNamed Entity Recognition:

Task: start time, end time, speaker, locationIn corpus of seminar announcement emails

Page 89: Conditional Random Fields Advanced Statistical Methods in NLP Ling 572 February 9, 2012 1.

89

Skip Chain NERNamed Entity Recognition:

Task: start time, end time, speaker, locationIn corpus of seminar announcement emails

All approaches:Orthographic, gazeteer, POS features

Within preceding, following 4 word window

Page 90: Conditional Random Fields Advanced Statistical Methods in NLP Ling 572 February 9, 2012 1.

90

Skip Chain NERNamed Entity Recognition:

Task: start time, end time, speaker, locationIn corpus of seminar announcement emails

All approaches:Orthographic, gazeteer, POS features

Within preceding, following 4 word window

Skip chain CRFs: Skip edges between identical capitalized words
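A sketch of how such skip edges might be generated, assuming an edge between every pair of identical capitalized tokens (the exact pairing policy is an assumption):

```python
def skip_edges(tokens):
    """Return (i, j) index pairs, i < j, linking identical capitalized words."""
    edges = []
    positions = {}                       # capitalized word -> earlier indices
    for i, w in enumerate(tokens):
        if w[:1].isupper():
            for j in positions.get(w, []):
                edges.append((j, i))
            positions.setdefault(w, []).append(i)
    return edges

tokens = "Speaker : John Smith . John will talk at 3pm".split()
print(skip_edges(tokens))  # [(2, 5)] -- the two occurrences of 'John'
```

In the model, evidence at both endpoints of each edge then informs both labels, so a confident "speaker" label on one "John" can propagate to the other.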

NER Features

[Table: orthographic, gazetteer, and POS features used in the NER experiments]

Skip-Chain NER Results

[Table: per-field results]

Skip chain improves substantially on 'speaker' recognition, with a slight reduction in accuracy for times.

Summary

Conditional random fields (CRFs):
- Undirected graphical model; compare with Bayesian Networks and Markov Random Fields
- Linear-chain models: HMM sequence structure + MaxEnt feature models
- Skip-chain models: augment with longer-distance dependencies
- Pros: good performance
- Cons: compute intensive

HW #5

HW #5: Beam Search

Apply beam search to MaxEnt sequence decoding.

Task: POS tagging

Given files:
- test data: usual format
- boundary file: sentence lengths
- model file

Comparisons: different topN, topK, beam_width

Tag Context

Following Ratnaparkhi '96, the model uses the previous tag (prevT=tag) and the previous tag bigram (prevTwoTags=tagi-2+tagi-1).

These are NOT in the data file; you compute them on the fly.

Notes:
- Due to sparseness, a bigram may not appear in the model file; skip it.
- These are feature functions: if you have a different candidate tag for the same word, the weights will differ.
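The beam-search loop might be sketched as follows; `tag_probs` stands in for the trained MaxEnt model's P(tag | word, prevT, prevTwoTags), and the toy probabilities are invented.

```python
import math

def beam_search(words, tag_probs, topN=3, topK=5, beam_width=2.0):
    """Keep at most topK paths per position, pruning any path whose log
    probability falls more than beam_width below the current best."""
    paths = [(0.0, [])]                  # (log prob, tags so far)
    for w in words:
        expanded = []
        for lp, tags in paths:
            prev = tags[-1] if tags else "BOS"
            prev2 = tags[-2] if len(tags) > 1 else "BOS"
            # topN candidate tags for this word given this tag history
            cands = sorted(tag_probs(w, prev, prev2).items(),
                           key=lambda kv: -kv[1])[:topN]
            for tag, p in cands:
                expanded.append((lp + math.log(p), tags + [tag]))
        expanded.sort(key=lambda e: -e[0])
        best = expanded[0][0]
        paths = [(lp, t) for lp, t in expanded[:topK] if best - lp <= beam_width]
    return paths[0][1]

# Toy stand-in model: 'time' is probably N, other words probably V.
def toy_probs(word, prev, prev2):
    return {"N": 0.8, "V": 0.2} if word == "time" else {"V": 0.7, "N": 0.3}

print(beam_search(["time", "flies"], toy_probs))  # ['N', 'V']
```

Here `prev` and `prev2` are computed on the fly from each partial path, matching the note above that the tag-context features are not in the data file.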

Uncertainty

Real-world tasks are partially observable, stochastic, and extremely complex.

Probabilities capture "ignorance & laziness":
- Lack of relevant facts and conditions
- Failure to enumerate all conditions and exceptions

Motivation

Uncertainty in medical diagnosis:
- Diseases produce symptoms; in diagnosis, observed symptoms => disease ID
- Uncertainties: symptoms may not occur; symptoms may not be reported; diagnostic tests are not perfect (false positives, false negatives)

How do we estimate confidence?