Dynamic Conditional Random Fields for Labeling and Segmenting Sequences


Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Khashayar Rohanimanesh

Joint work with

Charles Sutton, Andrew McCallum

University of Massachusetts Amherst

Noun Phrase Segmentation (CoNLL-2000, Sang and Buchholz, 2000)

B I I B I I O O O
Rockwell International Corp. 's Tulsa unit said it signed

B I I O B I O B I
a tentative agreement extending its contract with Boeing Co.

O O B I O B B I I
to provide structural parts for Boeing 's 747 jetliners.

Named Entity Recognition

CRICKET - MILLNS SIGNS FOR BOLAND

CAPE TOWN 1996-08-22

South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.

Labels:  Examples:
PER      Yayuk Basuki, Innocent Butare
ORG      3M, KDP, Leicestershire
LOC      Leicestershire, Nirmal Hriday, The Oval
MISC     Java, Basque, 1,000 Lakes Rally

[McCallum & Li, 2003]

Information Extraction

a seminar entitled "Nanorheology of Polymers & Complex Fluids," at [STIME: 4:30 p.m], Monday, Feb. 27, in [LOC: Wean Hall 7500]. The seminar will be given by [SPEAK: Professor Steven Granick]

Seminar Announcements [Peshkin, Pfeffer 2003]

"SNC1, a gene from the yeast Saccharomyces cerevisiae, encodes a homolog of vertebrate synaptic vesicle-associated membrane proteins (VAMPs) or synaptobrevins."
(labeled spans PROTEIN: SNC1; LOCATION: vesicle) → subcellular-localization(SNC1, vesicle)

Biological Abstracts [Skounakis, Craven, Ray 2003]

Simultaneous noun-phrase & part-of-speech tagging

NP:  B I I B I I O O O
POS: N N N O N N V O V
     Rockwell International Corp. 's Tulsa unit said it signed

NP:  B I I O B I O B I
POS: O J N V O N O N N
     a tentative agreement extending its contract with Boeing Co.

Probabilistic Sequence Labeling

Linear-Chain CRFs

[Figure: finite-state view of a linear-chain CRF, with a clique potential Ψ_c(·,·) on each transition and each state-observation pair]

Linear-Chain CRFs

Graphical Model

[Figure: linear-chain CRF as an undirected graphical model over the label sequence y and the observation sequence x, with a potential Ψ(·,·) on each edge]

Training

Um… what's Ψ?

Linear-Chain CRFs

Graphical Model Training

Rewrite each potential as

$$\Psi(\cdot) \;=\; \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$

for some features f_k and weights λ_k. Now solve for the λ_k by convex optimization.

General CRFs

A CRF is an undirected, conditionally-trained graphical model.

Train the weights λ_k by convex optimization to maximize conditional log-likelihood.

Features f_k can be arbitrary, overlapping, domain-specific.
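In symbols (consistent with the linear-chain definition given later in the deck), a general CRF over cliques c has the form:

$$p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \prod_{c} \Psi_c(\mathbf{y}_c, \mathbf{x}_c), \qquad \Psi_c(\cdot) \;=\; \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$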

CRF Training

Train the weights λ_k by convex optimization to maximize conditional log-likelihood.
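Written out, with the Gaussian prior that appears in the gradient slides below, the objective over training pairs (x⁽ⁱ⁾, y⁽ⁱ⁾) is:

$$\mathcal{L}(\Lambda) \;=\; \sum_i \log p\big(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}\big) \;-\; \sum_k \frac{\lambda_k^2}{2\sigma^2}$$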

Optimization Methods

• Generalized Iterative Scaling (GIS)
  – Improved Iterative Scaling
• First-order methods
  – Non-linear conjugate gradient
• Second-order methods
  – Limited-memory quasi-Newton (L-BFGS)
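As a purely illustrative sketch of this recipe, the following trains a toy conditional log-linear model with SciPy's L-BFGS. A single-token classifier stands in for a full chain CRF; the data, prior variance, and features are all invented for the example:

```python
# Purely illustrative: conditional maximum-likelihood training with L-BFGS.
# A single-token log-linear classifier stands in for a full chain CRF, but
# the objective has the same shape: log-likelihood plus a Gaussian prior,
# with gradient = empirical feature counts - expected feature counts.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])  # invented features
y = np.array([0, 1, 1, 0])                                      # invented labels
K, F, sigma2 = 2, X.shape[1], 10.0        # num classes, num features, prior variance

def objective(w_flat):
    W = w_flat.reshape(K, F)
    scores = X @ W.T                                # unnormalized log-potentials
    scores -= scores.max(axis=1, keepdims=True)     # stabilize log-sum-exp
    log_Z = np.log(np.exp(scores).sum(axis=1))
    ll = scores[np.arange(len(y)), y].sum() - log_Z.sum()
    ll -= (w_flat ** 2).sum() / (2 * sigma2)        # Gaussian prior
    probs = np.exp(scores - log_Z[:, None])         # p(k | x_i)
    grad = -probs.T @ X                             # minus expected counts
    for i, yi in enumerate(y):
        grad[yi] += X[i]                            # plus empirical counts
    grad -= W / sigma2                              # prior gradient
    return -ll, -grad.ravel()                       # negate: we minimize

result = minimize(objective, np.zeros(K * F), jac=True, method="L-BFGS-B")
print("learned weights:\n", result.x.reshape(K, F))
```

The "empirical counts minus expected counts, plus a prior term" structure is exactly what reappears in the factorial-CRF gradient slides later in the deck.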

From Generative to Conditional

[Figure column omitted: each model's graphical structure]

Model              Properties
HMMs               Models the observation
MEMMs              Does not model the observation; suffers from the label bias problem
Linear-chain CRFs  Does not model the observation; eliminates the label bias problem

Dynamic CRFs

Simultaneous noun-phrase & part-of-speech tagging

NP:  B I I B I I O O O
POS: N N N O N N V O V
     Rockwell International Corp. 's Tulsa unit said it signed

NP:  B I I O B I O B I
POS: O J N V O N O N N
     a tentative agreement extending its contract with Boeing Co.

Features

• Word identity: "International"
• Capitalization: Xxxxxxx
• Character classes: contains digits
• Character n-gram: …ment
• Lexicon memberships: in list of company names
• WordNet synset: (speak, say, tell)
• …
• Part of speech: Proper Noun
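A hypothetical sketch of feature functions like these (the function name and the tiny company lexicon are invented for illustration, not the authors' feature set):

```python
# Hypothetical sketch of the kinds of binary features the slide lists.
COMPANY_LEXICON = {"rockwell", "boeing"}          # stand-in for a real lexicon

def token_features(tokens, t):
    """Names of the binary features that fire for token t."""
    w = tokens[t]
    feats = {f"word={w.lower()}"}                 # word identity
    if w[0].isupper():
        feats.add("shape=Xxxxxxx")                # capitalization
    if any(c.isdigit() for c in w):
        feats.add("contains-digit")               # character classes
    if w.lower().endswith("ment"):
        feats.add("suffix=...ment")               # character n-gram
    if w.lower() in COMPANY_LEXICON:
        feats.add("in-company-lexicon")           # lexicon membership
    return feats

print(token_features("Rockwell International Corp. said it signed".split(), 0))
```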

Multiple Nested Predictions on the Same Sequence

[Figure: over "Rockwell Int'l Corp. 's Tulsa", a first stage maps word identity (input observation) to part-of-speech (output prediction); a second stage then treats the predicted part-of-speech as an input observation for noun-phrase prediction]

But errors in each stage compound, and uncertainty is not preserved from one stage to the next.

Cascaded Predictions

[Figure: a three-stage cascade over Chinese characters (input observation): word segmentation → part-of-speech → named-entity tag, each stage's output prediction becoming an input observation for the next]

Even more stages here, so compounding of errors is worse.

Joint Prediction: Cross-Product over Labels

[Figure: one chain over Chinese characters (input observation) whose states are the cross-product Segmentation+POS+NE (output prediction)]

2 × 45 × 11 = 990 possible states
O(T × 990²) running time
O(|V| × 990²) parameters
e.g.: state label = (Wordbeg, Noun, Person)

[Figure: factoring into three output chains (segmentation, part-of-speech, named-entity tag) over the same input needs only O(|V| × 990) parameters]

Joint Prediction: Factorial CRF

[Figure: linear-chain CRF vs. factorial CRF structure]

Linear-chain model:

$$p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \Psi_{y}(y_t, y_{t+1})\, \Psi_{xy}(x_t, y_t)$$

where each potential has the log-linear form

$$\Psi(\cdot) \;=\; \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$
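To make the definition concrete, here is a brute-force sketch that enumerates every labeling of a toy two-label, three-position problem and normalizes; the potentials are invented for illustration:

```python
# Brute-force sketch of the linear-chain CRF above on a toy 2-label,
# 3-position problem; potentials invented, exp of a single weighted feature.
import itertools, math

LABELS, T = [0, 1], 3
x = ["a", "b", "a"]                      # toy observation sequence

def psi_y(a, b):                         # transition potential
    return math.exp(0.5 if a == b else -0.5)

def psi_xy(xt, yt):                      # observation potential
    return math.exp(1.0 if (xt == "a") == (yt == 0) else -1.0)

def score(y):                            # unnormalized probability of labeling y
    s = 1.0
    for t in range(T):
        s *= psi_xy(x[t], y[t])
        if t + 1 < T:                    # transition potentials on the T-1 edges
            s *= psi_y(y[t], y[t + 1])
    return s

Z = sum(score(y) for y in itertools.product(LABELS, repeat=T))
for y in itertools.product(LABELS, repeat=T):
    print(y, round(score(y) / Z, 4))     # p(y | x); sums to 1 over all y
```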

Linear-Chain to Factorial CRFs: Model Definition

[Figure: linear-chain CRF (one label chain y over observations x) vs. factorial CRF (three coupled label chains u, v, w over observations x)]

Factorial model, with y = (u, v, w):

$$p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \Psi_u(u_t, u_{t+1})\, \Psi_v(v_t, v_{t+1})\, \Psi_w(w_t, w_{t+1})\, \Psi_{uv}(u_t, v_t)\, \Psi_{vw}(v_t, w_t)\, \Psi_{wx}(w_t, x_t)$$
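A sketch of how the factorial score composes, with toy potentials standing in for learned ones (all names and numbers invented):

```python
# Unnormalized p(u, v, w | x) under the factorial factorization above.
import math

def factorial_score(u, v, w, x, psi_u, psi_v, psi_w, psi_uv, psi_vw, psi_wx):
    T, s = len(x), 1.0
    for t in range(T):
        # cotemporal potentials within each time slice
        s *= psi_uv(u[t], v[t]) * psi_vw(v[t], w[t]) * psi_wx(w[t], x[t])
        if t + 1 < T:
            # per-chain transition potentials
            s *= psi_u(u[t], u[t+1]) * psi_v(v[t], v[t+1]) * psi_w(w[t], w[t+1])
    return s

same = lambda a, b: math.exp(0.3 if a == b else -0.3)   # toy potential
print(factorial_score([0, 1], [1, 1], [0, 0], ["a", "b"],
                      same, same, same, same, same, same))
```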

Linear-Chain to Factorial CRFs: Log-likelihood Training

[Figure: linear-chain CRF (chain y over x) and factorial CRF (chains u, v, w over x)]

Linear-chain:

$$\frac{\partial \mathcal{L}}{\partial \lambda_k} \;=\; \sum_i \sum_t f_k\big(\mathbf{x}^{(i)}, y_t^{(i)}, y_{t+1}^{(i)}\big) \;-\; \sum_i \sum_t \sum_{y_t, y_{t+1}} p\big(y_t, y_{t+1} \mid \mathbf{x}^{(i)}\big)\, f_k\big(\mathbf{x}^{(i)}, y_t, y_{t+1}\big) \;-\; \frac{\lambda_k}{\sigma^2}$$

Factorial (shown for chain u; the v and w chains contribute analogous terms):

$$\frac{\partial \mathcal{L}}{\partial \lambda_k} \;=\; \sum_i \sum_t f_k\big(\mathbf{x}^{(i)}, u_t^{(i)}, u_{t+1}^{(i)}\big) \;-\; \sum_i \sum_t \sum_{u_t, u_{t+1}} p\big(u_t, u_{t+1} \mid \mathbf{x}^{(i)}\big)\, f_k\big(\mathbf{x}^{(i)}, u_t, u_{t+1}\big) \;-\; \frac{\lambda_k}{\sigma^2}$$

In both cases the gradient is empirical feature counts minus expected feature counts, plus the derivative of the Gaussian prior.

Dynamic CRFs

The undirected, conditionally-trained analogue to Dynamic Bayes Nets (DBNs).

[Figure: example DCRF structures, left to right: factorial, higher-order, hierarchical]

Need for Inference

[Figure: linear-chain and factorial CRF structures over labels y and observations x]

• Marginal distributions p(y_t, y_{t+1} | x) — used during training
• Most-likely (Viterbi) labeling argmax_y p(y | x) — used to label a sequence

9,000 training instances × 100 maximizer iterations = 900,000 calls to the inference algorithm!
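For a single chain, these pairwise marginals come from the standard forward-backward recursions. A minimal sketch with invented potentials (not the authors' code):

```python
# Forward-backward on a single chain with invented potentials: computes the
# pairwise marginals p(y_t, y_{t+1} | x) that training needs.
import numpy as np

def pairwise_marginals(trans, obs):
    """trans: (K, K) transition potentials; obs: (T, K) observation potentials."""
    T, K = obs.shape
    alpha, beta = np.zeros((T, K)), np.zeros((T, K))
    alpha[0] = obs[0]
    for t in range(1, T):                 # forward pass
        alpha[t] = obs[t] * (alpha[t - 1] @ trans)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):        # backward pass
        beta[t] = trans @ (obs[t + 1] * beta[t + 1])
    Z = alpha[-1].sum()                   # partition function Z(x)
    return [np.outer(alpha[t], obs[t + 1] * beta[t + 1]) * trans / Z
            for t in range(T - 1)]        # entry (i, j) is p(y_t=i, y_{t+1}=j | x)

trans = np.array([[2.0, 0.5], [0.5, 2.0]])
obs = np.array([[1.0, 0.2], [0.3, 1.5], [1.0, 1.0]])
for t, m in enumerate(pairwise_marginals(trans, obs)):
    print(f"t={t}: sums to {m.sum():.3f}\n{m}")
```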

Inference (Exact): Junction Tree

[Figure: junction tree for the two-chain NP+POS model]
Max-clique: 3 × 45 × 45 = 6,075 assignments

Inference (Exact): Junction Tree

[Figure: junction tree for the three-chain SEG+POS+NER model]
Max-clique: 3 × 45 × 45 × 11 = 66,825 assignments

Inference (Approximate): Loopy Belief Propagation

[Figure: a 2×3 grid of variables v1…v6 exchanging messages m_i(v_j) along every edge in both directions]

[Wainwright, Jaakkola, Willsky 2001]
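A minimal, illustrative sketch of these message updates (toy three-node cycle, so the graph is genuinely loopy; invented potentials, not the authors' implementation):

```python
# Illustrative loopy BP on a toy pairwise model: repeated in-place
# sum-product sweeps over all directed edges, then node beliefs.
import numpy as np

def loopy_bp(node_pot, edges, edge_pot, sweeps=50):
    msgs = {}
    for s, t in edges:                       # one message per direction
        msgs[(s, t)] = np.ones(len(node_pot[t]))
        msgs[(t, s)] = np.ones(len(node_pot[s]))
    for _ in range(sweeps):
        for (s, t) in list(msgs):
            pot = edge_pot[(s, t)] if (s, t) in edge_pot else edge_pot[(t, s)].T
            incoming = node_pot[s].copy()    # messages into s, except from t
            for (u, v), m in msgs.items():
                if v == s and u != t:
                    incoming = incoming * m
            new = incoming @ pot             # sum-product update m_{s->t}
            msgs[(s, t)] = new / new.sum()   # normalize for stability
    beliefs = {}
    for v, pot in node_pot.items():          # belief = local potential x messages in
        b = pot.copy()
        for (u, w), m in msgs.items():
            if w == v:
                b = b * m
        beliefs[v] = b / b.sum()
    return beliefs

pots = {v: np.array([1.0, 2.0]) for v in (1, 2, 3)}
agree = np.array([[3.0, 1.0], [1.0, 3.0]])
print(loopy_bp(pots, [(1, 2), (2, 3), (1, 3)],
               {(1, 2): agree, (2, 3): agree, (1, 3): agree}))
```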

Inference (Approximate): Tree Reparameterization (TRP)

[Figure sequence: a 2×3 grid with nodes 1…6 and edges 12, 14, 23, 25, 36, 45, 56; TRP repeatedly selects a spanning tree, performs exact inference on it, and rewrites the distribution in terms of node pseudomarginals p_s and edge ratios such as p_{12}/(p_1 p_2), p_{14}/(p_1 p_4), p_{23}/(p_2 p_3), p_{25}/(p_2 p_5), p_{36}/(p_3 p_6)]

After each update the distribution keeps the reparameterized form

$$T(\mathbf{y}) \;\propto\; \prod_{s} p_s(y_s) \prod_{(s,t) \in E} \frac{p_{st}(y_s, y_t)}{p_s(y_s)\, p_t(y_t)}$$

[Wainwright, Jaakkola, Willsky 2001]

Experiments: Simultaneous noun-phrase & part-of-speech tagging

• Data from the CoNLL Shared Task 2000 (newswire)
  – Training subsets of various sizes: 223–894 sentences
  – Features include word identity, neighboring words, capitalization, lexicons of parts-of-speech, and company names (1,358,227 feature functions!)

NP:  B I I B I I O O O
POS: N N N O N N V O V
     Rockwell International Corp. 's Tulsa unit said it signed

NP:  B I I O B I O B I
POS: O J N V O N O N N
     a tentative agreement extending its contract with Boeing Co.

Experiments: Simultaneous noun-phrase & part-of-speech tagging

Two experiments:
• Compare exact and approximate inference
• Compare the accuracy of cascaded CRFs and factorial DCRFs

Noun Phrase Accuracy

[Chart: NP accuracy (F1) versus training-set size for the compared models. Reference: with the Brill (1994) POS tagger, F1 for NP on 8,936 sentences: 93.87]

Summary

• Many natural language tasks are solved by chaining errorful subtasks.

• Approach: jointly solve all subtasks in a single graphical model.
  – Learn the dependence between subtasks
  – Allow the higher level to inform the lower level

• Improved joint and POS accuracy over the cascaded model, but NP accuracy was lower.

• Current work: Emphasize one subtask

Maximize Marginal Likelihood (Ongoing work)

[Figure: factorial CRF with NP and POS chains; the POS chain is marginalized out]

$$O(\Lambda) \;=\; \sum_i \log p\big(\mathrm{np}^{(i)} \mid \mathbf{x}^{(i)}\big) \;=\; \sum_i \log \sum_{\mathrm{pos}} p\big(\mathrm{np}^{(i)}, \mathrm{pos} \mid \mathbf{x}^{(i)}\big)$$

$$\frac{\partial O}{\partial \lambda_k} \;=\; \sum_i \sum_{\mathrm{pos}} p\big(\mathrm{pos} \mid \mathrm{np}^{(i)}, \mathbf{x}^{(i)}\big)\, f_k\big(\mathrm{pos}, \mathrm{np}^{(i)}, \mathbf{x}^{(i)}\big) \;-\; \sum_i \sum_{\mathrm{np},\,\mathrm{pos}} p\big(\mathrm{pos}, \mathrm{np} \mid \mathbf{x}^{(i)}\big)\, f_k\big(\mathrm{pos}, \mathrm{np}, \mathbf{x}^{(i)}\big)$$
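A toy numeric check of the objective for one training instance, using an invented joint table p(np, pos | x):

```python
# Toy check of the marginal-likelihood objective: with an invented joint
# table p(np, pos | x) for one instance, marginalize out pos.
import numpy as np

joint = np.array([[0.10, 0.05, 0.05],   # rows: candidate np labelings
                  [0.30, 0.20, 0.10],   # cols: candidate pos labelings
                  [0.05, 0.10, 0.05]])  # entries: p(np, pos | x); sums to 1
np_observed = 1                         # the labeled np sequence for this instance

marginal = joint[np_observed].sum()     # p(np | x) = sum over pos
print("O contribution = log p(np | x) =", np.log(marginal))
# Posterior over pos given the observed np, used in the gradient's first term:
print("p(pos | np, x) =", joint[np_observed] / marginal)
```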

Thank you!

State-of-the-art Performance

• POS tagging: 97% (Brill, 1999)
• NP chunking:
  – 94.38% (Sha and Pereira)
  – 94.39% (?)

Alternatives to Traditional Joint

• Optimize Marginal Likelihood

• Optimize Utility

• Optimize Margin (M3N) [Taskar, Guestrin, Koller 2003]


Undirected Graphical Models

[Figure: two panels, Directed and Undirected, contrasting the two families of graphical models]

Hidden Markov Models

[Figure: HMM drawn as a finite-state machine and as a directed graphical model, with transition probabilities p(y_t | y_{t-1}) and emission probabilities p(x_t | y_t)]

$$p(\mathbf{x}, \mathbf{y}) \;=\; p(y_1)\, p(x_1 \mid y_1) \prod_{t=2}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)$$
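A toy evaluation of this factorization (all probabilities invented):

```python
# Toy evaluation of the HMM joint factorization; all probabilities invented.
import numpy as np

pi = np.array([0.6, 0.4])               # p(y_1)
A = np.array([[0.7, 0.3], [0.2, 0.8]])  # transitions p(y_t | y_{t-1})
B = np.array([[0.9, 0.1], [0.3, 0.7]])  # emissions p(x_t | y_t)

def joint(x, y):
    p = pi[y[0]] * B[y[0], x[0]]
    for t in range(1, len(x)):
        p *= A[y[t - 1], y[t]] * B[y[t], x[t]]
    return p

print(joint(x=[0, 1, 0], y=[0, 0, 1]))  # p(x, y) for one assignment
```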