8/3/2019 Mlas06 Nigam Tie 01
Machine Learning for Information Extraction: An Overview
Kamal Nigam, Google Pittsburgh
With input, slides and suggestions from William Cohen, Andrew McCallum and Ion Muslea
Example: A Problem
Genomics job
Mt. Baker, the school district
Baker Hostetler, the company
Baker, a job opening
Example: A Solution
Job Openings:
Category = Food Services
Keyword = Baker
Location = Continental U.S.
Extracting Job Openings from the Web
Title: Ice Cream Guru
Description: If you dream of cold creamy
Contact:[email protected]
Category: Travel/Hospitality
Function: Food Services
Potential Enabler of Faceted Search
Lots of Structured Information in Text
IE from Research Papers
What is Information Extraction?
Recovering structured data from formatted text
Identifying fields (e.g. named entity recognition)
Understanding relations between fields (e.g. record association)
Normalization and deduplication
Today, focus mostly on field identification and a little on record association
IE Posed as a Machine Learning Task
Training data: documents marked up with ground truth
In contrast to text classification, local features are crucial. Features of:
Contents
Text just before item
Text just after item
Begin/end boundaries
00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun
prefix contents suffix
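The prefix/contents/suffix split above can be sketched in code. This is a minimal illustration, not the tutorial's implementation; the window size, feature names, and helper function are invented:

```python
# Hypothetical sketch: build local features for a candidate span, using
# the span's contents plus the tokens just before (prefix) and just
# after (suffix) it.

def candidate_features(tokens, start, end, window=2):
    """Feature dict for the candidate span tokens[start:end]."""
    feats = {}
    # Contents features: identity of each word inside the span.
    for tok in tokens[start:end]:
        feats["contents=" + tok.lower()] = 1
    # Prefix features: tokens in the window just before the span.
    for tok in tokens[max(0, start - window):start]:
        feats["prefix=" + tok.lower()] = 1
    # Suffix features: tokens in the window just after the span.
    for tok in tokens[end:end + window]:
        feats["suffix=" + tok.lower()] = 1
    # Boundary features: how the span begins and ends.
    feats["begins-capitalized"] = int(tokens[start][0].isupper())
    feats["ends-capitalized"] = int(tokens[end - 1][0].isupper())
    return feats

tokens = "Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun".split()
f = candidate_features(tokens, 8, 10)  # candidate span "Sebastian Thrun"
```

A real system would add many of the dictionary and formatting features listed on the next slides to the same dict.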
Good Features for Information Extraction
Example word features:
identity of word
is in all caps
ends in "-ski"
is part of a noun phrase
is in a list of city names
is under node X in WordNet or Cyc
is in bold font
is in hyperlink anchor
Features of past & future:
last person name was female
next two words are "and Associates"
begins-with-number
begins-with-ordinal
begins-with-punctuation
begins-with-question-word
begins-with-subject
blank
contains-alphanum
contains-bracketed-number
contains-http
contains-non-space
contains-number
contains-pipe
contains-question-mark
contains-question-word
ends-with-question-mark
first-alpha-is-capitalized
indented
indented-1-to-4
indented-5-to-10
more-than-one-third-space
only-punctuation
prev-is-blank
prev-begins-with-ordinal
shorter-than-30
Creativity and Domain Knowledge Required!
Good Features for Information Extraction
Is Capitalized
Is Mixed Caps
Is All Caps
Initial Cap
Contains Digit
All lowercase
Is Initial
Punctuation
Period
Comma
Apostrophe
Dash
Preceded by HTML tag
Character n-gram classifier says string is a person name (80% accurate)
In stopword list (the, of, their, etc.)
In honorific list (Mr, Mrs, Dr, Sen, etc.)
In person suffix list (Jr, Sr, PhD, etc.)
In name particle list (de, la, van, der, etc.)
In Census lastname list; segmented by P(name)
In Census firstname list; segmented by P(name)
In locations lists (states, cities, countries)
In company name list (J. C. Penny)
In list of company suffixes (Inc, & Associates, Foundation)
Word feature lists: lists of job titles, lists of prefixes, lists of suffixes, 350 informative phrases
HTML/Formatting features:
{begin, end, in} x { , , , } x {lengths 1, 2, 3, 4, or longer}
{begin, end} of line
Creativity and Domain Knowledge Required!
IE History
Pre-Web
Mostly news articles
De Jong's FRUMP [1982]: hand-built system to fill Schank-style scripts from news wire
Message Understanding Conference (MUC), DARPA ['87-'95], TIPSTER ['92-'96]
Most early work dominated by hand-built models
E.g. SRI's FASTUS, hand-built FSMs
But by the 1990s, some machine learning: Lehnert, Cardie, Grishman, and then HMMs: Elkan [Leek 97], BBN [Bikel et al 98]
Web
AAAI '94 Spring Symposium on Software Agents
Much discussion of ML applied to the Web: Maes, Mitchell, Etzioni
Tom Mitchell's WebKB, '96: build KBs from the Web
Wrapper induction
Initially hand-built, then ML: [Soderland 96], [Kushmerick 97], ...
Landscape of ML Techniques for IE:
Any of these models can be used to capture words, formatting or both.
Classify Candidates: "Abraham Lincoln was born in Kentucky." Classifier asks: which class?
Sliding Window: "Abraham Lincoln was born in Kentucky." Classifier asks: which class? Try alternate window sizes.
Boundary Models: "Abraham Lincoln was born in Kentucky." Classifier asks: which class? (BEGIN or END)
Finite State Machines: "Abraham Lincoln was born in Kentucky." Most likely state sequence?
Wrapper Induction: "Abraham Lincoln was born in Kentucky." Learn and apply a pattern for a website (e.g. PersonName).
Sliding Windows & Boundary Detection
Information Extraction by Sliding Windows
GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell
School of Computer Science
Carnegie Mellon University
3:30 pm
7500 Wean Hall
Machine learning has evolved from obscurity
in the 1970s into a vibrant and popular
discipline in artificial intelligence
during the 1980s and 1990s. As a result
of its success and growth, machine learning
is evolving into a collection of related
disciplines: inductive concept acquisition,
analytic learning in problem solving (e.g.
analogy, explanation-based learning),
learning theory (e.g. PAC learning),
genetic algorithms, connectionist learning,
hybrid systems, and so on.
CMU UseNet Seminar Announcement
E.g., looking for a seminar location
Information Extraction with Sliding Windows [Freitag 97, 98; Soderland 97; Califf 98]
00 : pm Place : Wean Hall Rm 5409 Speaker : Sebastian Thrun
w_{t-m} ... w_{t-1} [ w_t ... w_{t+n} ] w_{t+n+1} ... w_{t+n+m}
prefix | contents | suffix
Standard supervised learning setting:
Positive instances: candidates with real label
Negative instances: all other candidates
Features based on candidate, prefix and suffix
Special-purpose rule learning systems work well:
courseNumber(X) :-
    tokenLength(X, =, 2),
    every(X, inTitle, false),
    some(X, A, , inTitle, true),
    some(X, B, . tripleton, true)
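The candidate-generation step behind this supervised setting can be sketched as follows; the function name, maximum window length, and example are hypothetical:

```python
# Hypothetical sketch: enumerate every sliding window up to a maximum
# length, labeling the window that matches the annotated field as
# positive and all other windows as negative.

def make_instances(tokens, true_span, max_len=4):
    """Yield (start, end, label) for every candidate window."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            label = 1 if (start, end) == true_span else 0
            yield start, end, label

tokens = "Speaker : Sebastian Thrun".split()
# Ground truth: "Sebastian Thrun" occupies tokens[2:4].
instances = list(make_instances(tokens, true_span=(2, 4)))
positives = [(s, e) for s, e, y in instances if y == 1]
```

Each (start, end) pair would then be turned into a feature vector (candidate, prefix, suffix) and fed to a standard classifier.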
Rule-learning approaches to sliding-window classification: Summary
Representations for classifiers allow restriction of the relationships between tokens, etc.
Representations are carefully chosen subsets of even more powerful representations based on logic programming (ILP and Prolog).
Use of these heavyweight representations is complicated, but seems to pay off in results.
IE by Boundary Detection
GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell
School of Computer Science
Carnegie Mellon University
3:30 pm
7500 Wean Hall
Machine learning has evolved from obscurity
in the 1970s into a vibrant and popular
discipline in artificial intelligence
during the 1980s and 1990s. As a result
of its success and growth, machine learning
is evolving into a collection of related
disciplines: inductive concept acquisition,
analytic learning in problem solving (e.g.
analogy, explanation-based learning),
learning theory (e.g. PAC learning),
genetic algorithms, connectionist learning,
hybrid systems, and so on.
CMU UseNet Seminar Announcement
E.g., looking for a seminar location
BWI: Learning to detect boundaries
Another formulation: learn three probabilistic classifiers:
START(i) = Prob(position i starts a field)
END(j) = Prob(position j ends a field)
LEN(k) = Prob(an extracted field has length k)
Then score a possible extraction (i,j) by START(i) * END(j) * LEN(j-i)
LEN(k) is estimated from a histogram
[Freitag & Kushmerick, AAAI 2000]
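The scoring rule above is easy to sketch, assuming the three classifiers are given as probability tables; the toy numbers below are invented, not from BWI:

```python
# Sketch of BWI-style span scoring: score every (i, j) span by
# START(i) * END(j) * LEN(j - i) and keep the best one.

def best_extraction(n, start_p, end_p, len_p):
    """Return the (i, j) span maximizing START(i)*END(j)*LEN(j-i)."""
    best, best_score = None, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            score = (start_p.get(i, 0.0) * end_p.get(j, 0.0)
                     * len_p.get(j - i, 0.0))
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Toy tables: position 3 likely starts a field, position 5 likely ends
# one, and length-2 fields are common (LEN estimated from a histogram).
start_p = {3: 0.9, 0: 0.1}
end_p = {5: 0.8, 4: 0.2}
len_p = {1: 0.2, 2: 0.7, 3: 0.1}
span, score = best_extraction(6, start_p, end_p, len_p)
```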
BWI: Learning to detect boundaries
BWI uses boosting to find detectors for START and END.
Each weak detector has a BEFORE and AFTER pattern (on tokens before/after position i).
Each pattern is a sequence of tokens and/or wildcards like: anyAlphabeticToken, anyToken, anyUpperCaseLetter, anyNumber, ...
Weak learner for patterns uses greedy search (+ lookahead) to repeatedly extend a pair of empty BEFORE, AFTER patterns.
BWI: Learning to detect boundaries
Field         F1
Person Name   30%
Location      61%
Start Time    98%
Problems with Sliding Windows and Boundary Finders
Decisions in neighboring parts of the input are made independently from each other.
A naive-Bayes sliding window may predict a seminar end time before the seminar start time.
It is possible for two overlapping windows to both be above threshold.
In a boundary-finding system, left boundaries are laid down independently from right boundaries, and their pairing happens as a separate step.
Finite State Machines
Hidden Markov Models
[Figure: finite state model and graphical model views of an HMM, with states s_{t-1}, s_t, s_{t+1} (transitions) emitting observations o_{t-1}, o_t, o_{t+1}]
Parameters: for all states S = {s1, s2, ...}
Start state probabilities: P(s_t)
Transition probabilities: P(s_t | s_{t-1})
Observation (emission) probabilities: P(o_t | s_t)
Training: maximize probability of training observations (w/ prior)
P(s, o) = Prod_{t=1..|o|} P(s_t | s_{t-1}) P(o_t | s_t)
HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, ...
Generates a state sequence and an observation sequence o1 o2 o3 ...; observations are usually a multinomial over an atomic, fixed alphabet.
IE with Hidden Markov Models
Given a sequence of observations:
Yesterday Lawrence Saul spoke this example sentence.
and a trained HMM, find the most likely state sequence (Viterbi):
s* = argmax_s P(s, o)
Any words said to be generated by the designated "person name" state are extracted as a person name:
Person name: Lawrence Saul
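Viterbi decoding for extraction can be sketched with a toy two-state HMM (a background state and a person-name state); all probabilities below are invented for illustration:

```python
import math

# Sketch: Viterbi finds the most likely state sequence for an
# observation sequence; words assigned to the "name" state are then
# read off as the extracted person name.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence for obs under the given HMM."""
    # Log-probabilities; unseen emissions get a tiny floor probability.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s].get(obs[0], 1e-12))
          for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptrs = {}, {}
        for s in states:
            prev, score = max(
                ((p, V[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda x: x[1])
            scores[s] = score + math.log(emit_p[s].get(o, 1e-12))
            ptrs[s] = prev
        V.append(scores)
        back.append(ptrs)
    # Trace back from the best final state.
    state = max(states, key=lambda s: V[-1][s])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.insert(0, state)
    return path

states = ["bg", "name"]
start_p = {"bg": 0.8, "name": 0.2}
trans_p = {"bg": {"bg": 0.7, "name": 0.3},
           "name": {"bg": 0.4, "name": 0.6}}
emit_p = {"bg": {"yesterday": 0.5, "spoke": 0.5},
          "name": {"lawrence": 0.5, "saul": 0.5}}
obs = ["yesterday", "lawrence", "saul", "spoke"]
path = viterbi(obs, states, start_p, trans_p, emit_p)
names = [o for o, s in zip(obs, path) if s == "name"]
```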
Generative Extraction with HMMs
Parameters: {P(s_t | s_{t-1}), P(o_t | s_t)} for all states s_t and words o_t. Parameters define the generative model:
P(s, o) = Prod_{t=1..|o|} P(s_t | s_{t-1}) P(o_t | s_t)
[McCallum, Nigam, Seymore & Rennie 00]
HMM Example: Nymble
[Bikel, et al 97]
Task: Named Entity Extraction
[Figure: HMM with one state per name class (Person, Org, five other name classes) plus Other, connected to start-of-sentence and end-of-sentence states]
Transition probabilities: P(s_t | s_{t-1}, o_{t-1}), with back-off to P(s_t | s_{t-1}), then P(s_t)
Observation probabilities: P(o_t | s_t, s_{t-1}) or P(o_t | s_t, o_{t-1}), with back-off to P(o_t | s_t), then P(o_t)
Train on 450k words of news wire text.
Results:
Case    Language   F1
Mixed   English    93%
Upper   English    91%
Mixed   Spanish    90%
Other examples of HMMs in IE: [Leek 97; Freitag & McCallum 99; Seymore et al. 99]
Regrets from Atomic View of Tokens
Would like richer representation of text: multiple overlapping features, whole chunks of text.
Line, sentence, or paragraph features:
length
is centered in page
percent of non-alphabetics
white-space aligns with next line
containing sentence has two verbs
grammatically contains a question
contains links to authoritative pages
emissions that are uncountable
features at multiple levels of granularity
Example word features:
identity of word
is in all caps
ends in "-ski"
is part of a noun phrase
is in a list of city names
is under node X in WordNet or Cyc
is in bold font
is in hyperlink anchor
Features of past & future:
last person name was female
next two words are "and Associates"
Problems with Richer Representation and a Generative Model
These arbitrary features are not independent:
Overlapping and long-distance dependencies
Multiple levels of granularity (words, characters)
Multiple modalities (words, formatting, layout)
Observations from past and future
HMMs are generative models of the text: P(s, o).
Generative models do not easily handle these non-independent features. Two choices:
Model the dependencies. Each state would have its own Bayes Net. But we are already starved for training data!
Ignore the dependencies. This causes over-counting of evidence (a la naive Bayes). Big problem when combining evidence, as in Viterbi!
Conditional Sequence Models
We would prefer a conditional model: P(s|o) instead of P(s,o):
Can examine features, but not responsible for generating them.
Don't have to explicitly model their dependencies.
Don't waste modeling effort trying to generate what we are given at test time anyway.
If successful, this answers the challenge of integrating the ability to handle many arbitrary features with the full power of finite state automata.
Conditional Markov Models
Generative (traditional HMM):
[Figure: chain of states s_{t-1}, s_t, s_{t+1} with transitions between states, generating observations o_{t-1}, o_t, o_{t+1}]
P(s, o) = Prod_{t=1..|o|} P(s_t | s_{t-1}) P(o_t | s_t)
Conditional:
[Figure: same chain, but each state conditions on its observation rather than generating it]
P(s | o) = Prod_{t=1..|o|} P(s_t | s_{t-1}, o_t)
Standard belief propagation: forward-backward procedure. Viterbi and Baum-Welch follow naturally.
Maximum Entropy Markov Models [McCallum, Freitag & Pereira, 2000]
MaxEnt POS Tagger [Ratnaparkhi, 1996]
SNoW-based Markov Model [Punyakanok & Roth, 2000]
Exponential Form for Next State Function
P(s_t | s_{t-1}, o_t) = P_{s_{t-1}}(s_t | o_t) = (1 / Z(o_t, s_{t-1})) exp( Sum_k lambda_k f_k(o_t, s_t) )
(lambda_k: weight; f_k: feature)
Capture dependency on s_{t-1} with |S| independent functions, P_{s_{t-1}}(s_t | o_t).
Each state contains a next-state classifier that, given the next observation, produces a probability of the next state, P_{s_{t-1}}(s_t | o_t).
Recipe:
- Labeled data is assigned to transitions.
- Train each state's exponential model by maximum entropy.
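A next-state classifier of this exponential (softmax) form can be sketched as follows; the states, features, and weights are toy values, not a trained model:

```python
import math

# Sketch of an MEMM next-state classifier: each previous state owns a
# maximum-entropy model over next states given binary observation
# features, normalized per state (the Z above).

def next_state_probs(prev_state, obs_feats, weights, states):
    """P_{prev}(s | o) = exp(sum_k lambda_k f_k(o, s)) / Z."""
    scores = {}
    for s in states:
        # Sum the weights of the features that fire for this next state.
        scores[s] = sum(weights.get((prev_state, s, f), 0.0)
                        for f in obs_feats)
    z = sum(math.exp(v) for v in scores.values())
    return {s: math.exp(v) / z for s, v in scores.items()}

states = ["bg", "name"]
# Toy weight: seeing a capitalized token pushes bg -> name.
weights = {("bg", "name", "is-capitalized"): 1.5}
p = next_state_probs("bg", ["is-capitalized"], weights, states)
```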
Label Bias Problem
Consider this MEMM, and enough training data to perfectly model it:
Pr(0123|rob) = Pr(1|0,r)/Z1 * Pr(2|1,o)/Z2 * Pr(3|2,b)/Z3 = 0.5 * 1 * 1
Pr(0453|rib) = Pr(4|0,r)/Z1 * Pr(5|4,i)/Z2 * Pr(3|5,b)/Z3 = 0.5 * 1 * 1
Because states 1 and 4 each have only one outgoing transition, Pr(2|1,i) = 1 and Pr(5|4,o) = 1: the middle observation is ignored, so "rib" and "rob" receive identical path probabilities.
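The effect can be checked numerically with a toy version of this MEMM; the transition tables below are invented to mirror the slide's topology:

```python
# Toy demonstration of label bias: each state normalizes its next-state
# distribution locally, so a state with a single successor passes all
# probability mass along regardless of the observation.

def transition(state, obs_char):
    # State 0 saw 'r' begin "rob" and "rib" equally often in training.
    if state == 0:
        return {1: 0.5, 4: 0.5}
    # States 1, 2, 4, 5 each have exactly one successor, so local
    # normalization forces probability 1 for any observation.
    succ = {1: 2, 2: 3, 4: 5, 5: 3}[state]
    return {succ: 1.0}

def path_prob(path, word):
    p = 1.0
    for prev, nxt, ch in zip(path, path[1:], word):
        p *= transition(prev, ch).get(nxt, 0.0)
    return p
```

The path 0-1-2-3 gets probability 0.5 whether the input is "rob" or "rib": the observations after the first step cannot change the outcome.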
From HMMs to MEMMs to CRFs
s = s1, s2, ..., sn    o = o1, o2, ..., on
HMM (a special case of MEMMs and CRFs):
P(s, o) = Prod_{t=1..|o|} P(s_t | s_{t-1}) P(o_t | s_t)
MEMM:
P(s | o) = Prod_{t=1..|o|} P(s_t | s_{t-1}, o_t) = Prod_{t=1..|o|} (1 / Z(s_{t-1}, o_t)) exp( Sum_j lambda_j f_j(s_t, s_{t-1}) + Sum_k mu_k g_k(o_t, s_t) )
CRF [Lafferty, McCallum, Pereira 2001]:
P(s | o) = (1 / Z_o) Prod_{t=1..|o|} exp( Sum_j lambda_j f_j(s_t, s_{t-1}) + Sum_k mu_k g_k(o_t, s_t) )
Conditional Random Fields (CRFs) [Lafferty, McCallum, Pereira 2001]
[Figure: linear chain of states S_t ... S_{t+4}, each connected to the whole observation sequence O = O_t, O_t+1, O_t+2, O_t+3, O_t+4]
Markov on s, conditional dependency on o:
P(s | o) = (1 / Z_o) Prod_{t=1..|o|} exp( Sum_k lambda_k f_k(s_t, s_{t-1}, o, t) )
The Hammersley-Clifford-Besag theorem stipulates that the CRF has this form: an exponential function of the cliques in the graph.
Assuming that the dependency structure of the states is tree-shaped (a linear chain is a trivial tree), inference can be done by dynamic programming in time O(|o| |S|^2), just like HMMs.
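The dynamic program mentioned above can be sketched for the partition function Z_o; the feature functions and weights here are invented toy values:

```python
import math

# Sketch of linear-chain CRF inference: the partition function Z_o is
# computed by a forward-style dynamic program, in O(|o| * |S|^2), just
# like the HMM forward algorithm.

def log_potential(prev_s, s, obs, t, weights):
    """Sum of weighted features for one clique (edge + node)."""
    feats = [("trans", prev_s, s), ("emit", s, obs[t])]
    return sum(weights.get(f, 0.0) for f in feats)

def log_partition(obs, states, weights):
    """log Z_o via the forward recursion."""
    alpha = {s: log_potential(None, s, obs, 0, weights) for s in states}
    for t in range(1, len(obs)):
        alpha = {s: math.log(sum(
                     math.exp(alpha[p] + log_potential(p, s, obs, t, weights))
                     for p in states))
                 for s in states}
    return math.log(sum(math.exp(v) for v in alpha.values()))

states = ["bg", "name"]
weights = {("trans", "bg", "name"): 0.5, ("emit", "name", "Saul"): 2.0}
logZ = log_partition(["Yesterday", "Saul", "spoke"], states, weights)
```

With all weights zero, every potential is 1 and Z_o simply counts the |S|^|o| state sequences, which is a handy sanity check.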
Training CRFs
Maximize the log-likelihood of parameters Lambda = {lambda_k} given training data {(o^(i), s^(i))}:
L(Lambda) = Sum_i log P_Lambda(s^(i) | o^(i)) - Sum_k lambda_k^2 / (2 sigma^2)
Log-likelihood gradient:
dL/dlambda_k = Sum_i C_k(s^(i), o^(i)) - Sum_i Sum_s P_Lambda(s | o^(i)) C_k(s, o^(i)) - lambda_k / sigma^2
where C_k(s, o) = Sum_t f_k(s_t, s_{t-1}, o, t)
i.e. (feature count using correct labels) - (feature count using labels assigned by current parameters) - (smoothing penalty)
Methods:
iterative scaling (quite slow)
conjugate gradient (much faster)
conjugate gradient with preconditioning (super fast)
limited-memory quasi-Newton methods (also super fast)
Complexity comparable to standard Baum-Welch
[Sha & Pereira 2002] & [Malouf 2002]
Sample IE Applications of CRFs
Noun phrase segmentation [Sha & Pereira 03]
Named entity recognition [McCallum & Li 03]
Protein names in bio abstracts [Settles 05]
Addresses in web pages [Culotta et al. 05]
Semantic roles in text [Roth & Yih 05]
RNA structural alignment [Sato & Sakakibara 05]
Examples of Recent CRF Research
Semi-Markov CRFs [Sarawagi & Cohen 05]
Awkwardness of token-level decisions for segments
Segment sequence model alleviates this
Two-level model with sequences of segments, which are sequences of tokens
Stochastic Meta-Descent [Vishwanathan 06]
Stochastic gradient optimization for training
Take gradient step with small batches of examples
Order of magnitude faster than L-BFGS
Same resulting accuracies for extraction
Further Reading about CRFs
Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning. In Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press. 2006.
http://www.cs.umass.edu/~mccallum/papers/crf-tutorial.pdf