Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department...
![Page 1: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/1.jpg)
Information Extraction, Data Mining & Joint Inference
Andrew McCallum
Computer Science Department
University of Massachusetts Amherst
Joint work with Charles Sutton, Aron Culotta, Khashayar Rohanimanesh, Ben Wellner, Karl Schultz, Michael Hay, Michael Wick, David Mimno.
![Page 2: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/2.jpg)
My Research
Building models that mine actionable knowledge
from unstructured text.
![Page 3: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/3.jpg)
Extracting Job Openings from the Web
foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001
Source: www.foodscience.com/jobs_midwest.html
OtherCompanyJobs: foodscience.com-Job1
![Page 4: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/4.jpg)
A Portal for Job Openings
![Page 5: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/5.jpg)
Job Openings: Category = High Tech, Keyword = Java, Location = U.S.
![Page 6: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/6.jpg)
Data Mining the Extracted Job Information
![Page 7: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/7.jpg)
IE from Chinese Documents regarding Weather
Department of Terrestrial System, Chinese Academy of Sciences
200k+ documents, several millennia old
- Qing Dynasty Archives
- memos
- newspaper articles
- diaries
![Page 8: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/8.jpg)
What is “Information Extraction”
As a family of techniques:
Information Extraction = segmentation + classification + association + clustering
October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation.
Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers.
"We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access."
Richard Stallman, founder of the Free Software Foundation, countered saying…
Extracted segments: Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation
![Page 11: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/11.jpg)
What is “Information Extraction”
Information Extraction = segmentation + classification + association + clustering
| NAME | TITLE | ORGANIZATION |
|---|---|---|
| Bill Gates | CEO | Microsoft |
| Bill Veghte | VP | Microsoft |
| Richard Stallman | founder | Free Soft.. |
![Page 12: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/12.jpg)
From Text to Actionable Knowledge
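[Figure: pipeline from text to actionable knowledge. Spider and Filter build the document collection; IE (segment, classify, associate, cluster) turns it into a database; data mining (discover patterns: entity types, links / relations, events) turns the database into actionable knowledge (prediction, outlier detection, decision support).]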
![Page 13: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/13.jpg)
A Natural Language Processing Pipeline
Pragmatics
Anaphora Resolution
Semantic Role Labeling
Entity Recognition
Parsing
Chunking
POS tagging
Errors cascade & accumulate
![Page 14: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/14.jpg)
Unified Natural Language Processing
Pragmatics
Anaphora Resolution
Semantic Role Labeling
Entity Recognition
Parsing
Chunking
POS tagging
Unified, joint inference.
![Page 15: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/15.jpg)
Solution:
[Figure: the same pipeline (Spider, Filter, document collection, IE, database, data mining, actionable knowledge), now with “Uncertainty Info” and “Emerging Patterns” passed between IE and Data Mining.]
![Page 16: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/16.jpg)
Solution: Unified Model
[Figure: the same pipeline (Spider, Filter, document collection, IE, data mining, actionable knowledge), with “Probabilistic Model” in place of “Database”.]
Conditional Random Fields [Lafferty, McCallum, Pereira]
Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]
Discriminatively-trained undirected graphical models
Complex Inference and Learning
Just what we researchers like to sink our teeth into!
![Page 17: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/17.jpg)
Scientific Questions
• What model structures will capture salient dependencies?
• Will joint inference actually improve accuracy?
• How to do inference in these large graphical models?
• How to do parameter estimation efficiently in these models, which are built from multiple large components?
• How to do structure discovery in these models?
![Page 19: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/19.jpg)
Methods of Inference
• Exact
  – Exhaustively explore all interpretations
  – Graphical model has low “tree-width”
• Variational
  – Represent distribution in simpler model that is close
• Monte-Carlo
  – Randomly (but cleverly) sample to explore interpretations
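To make the contrast between exact and Monte-Carlo inference concrete, here is a minimal sketch on a toy model. The four-variable binary chain and its potentials are hypothetical (not from the talk), and Gibbs sampling is just one common Monte-Carlo method; variational inference is not illustrated.

```python
import itertools
import math
import random

N = 4  # hypothetical toy model: a chain of 4 binary variables


def log_score(y):
    """Unnormalized log-probability of one interpretation y (a list/tuple of 0/1)."""
    unary = sum(0.5 * y[i] for i in range(N))                   # each variable mildly prefers 1
    pair = sum(1.0 for i in range(N - 1) if y[i] == y[i + 1])   # neighbours prefer to agree
    return unary + pair


def exact_marginal(i):
    """Exact inference: exhaustively enumerate all 2^N interpretations."""
    total = 0.0
    on = 0.0
    for y in itertools.product([0, 1], repeat=N):
        w = math.exp(log_score(y))
        total += w
        on += w * y[i]
    return on / total


def gibbs_marginal(i, sweeps=20000, seed=0):
    """Monte-Carlo inference: Gibbs sampling, resampling one variable at a time."""
    rng = random.Random(seed)
    y = [0] * N
    count = 0
    for _ in range(sweeps):
        for j in range(N):
            y[j] = 1
            s1 = log_score(y)
            y[j] = 0
            s0 = log_score(y)
            p1 = 1.0 / (1.0 + math.exp(s0 - s1))  # P(y_j = 1 | all other variables)
            y[j] = 1 if rng.random() < p1 else 0
        count += y[i]
    return count / sweeps


print("exact  P(y0 = 1):", exact_marginal(0))
print("sampled P(y0 = 1):", gibbs_marginal(0))
```

Enumeration costs 2^N evaluations, so it only works on tiny models like this one; the sampler only approximates the same marginal but its cost per sweep grows linearly with N.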
![Page 20: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/20.jpg)
Outline
• Examples of IE and Data Mining.
• Motivate Joint Inference
• Brief introduction to Conditional Random Fields
• Joint inference: Information Extraction Examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Sparse BP)
– Probability + First-order Logic, Co-ref on Entities (MCMC)
• Demo: Rexa, a Web portal for researchers
![Page 21: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/21.jpg)
Hidden Markov Models
[Figure: an HMM drawn as a finite state model (transitions between states and emitted observations; generates a state sequence and an observation sequence o1 o2 o3 o4 o5 o6 o7 o8) and as a graphical model (states S_{t-1}, S_t, S_{t+1} over observations O_{t-1}, O_t, O_{t+1}).]

$$P(\vec{s}, \vec{o}) \propto \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1})\, P(o_t \mid s_t)$$

HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, …
![Page 22: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/22.jpg)
IE with Hidden Markov Models
Given a sequence of observations:
Yesterday Ron Parr spoke this example sentence.
and a trained HMM:
[Figure: HMM with states “person name”, “location name”, “background”.]
Find the most likely state sequence: (Viterbi)
Yesterday Ron Parr spoke this example sentence.
Any words said to be generated by the designated “person name” state extract as a person name:
Person name: Ron Parr
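The Viterbi step on this slide can be sketched as follows. The three states come from the slide; every number below (start, transition, and emission scores) and the tiny name list are hypothetical, hand-set values for illustration, where a real system would estimate them from labeled text.

```python
import math

STATES = ["background", "person", "location"]

# Hypothetical parameters, not from the talk.
START = {"background": 0.8, "person": 0.1, "location": 0.1}
TRANS = {
    "background": {"background": 0.8, "person": 0.1, "location": 0.1},
    "person":     {"background": 0.4, "person": 0.5, "location": 0.1},
    "location":   {"background": 0.5, "person": 0.1, "location": 0.4},
}
KNOWN_NAMES = {"Ron", "Parr"}


def emit(state, word):
    """Toy stand-in for P(o_t | s_t); not normalized over a vocabulary."""
    capitalized = word[0].isupper()
    if state == "person":
        return 0.4 if word in KNOWN_NAMES else 0.01
    if state == "location":
        return 0.05 if capitalized else 0.01
    return 0.02 if capitalized else 0.2  # background


def viterbi(words):
    """Most likely state sequence under the toy HMM, computed in log space."""
    delta = {s: math.log(START[s]) + math.log(emit(s, words[0])) for s in STATES}
    backpointers = []
    for word in words[1:]:
        new_delta, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: delta[p] + math.log(TRANS[p][s]))
            new_delta[s] = delta[prev] + math.log(TRANS[prev][s]) + math.log(emit(s, word))
            ptr[s] = prev
        backpointers.append(ptr)
        delta = new_delta
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def extract(words, path, target="person"):
    """Join maximal runs of words labeled with the target state."""
    spans, current = [], []
    for word, state in zip(words, path):
        if state == target:
            current.append(word)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


words = "Yesterday Ron Parr spoke this example sentence.".split()
path = viterbi(words)
print(list(zip(words, path)))
print("Person name:", extract(words, path))
```

With these hand-set numbers, the words assigned to the "person" state are "Ron Parr", matching the slide; different parameters would of course give different extractions.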
![Page 23: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/23.jpg)
We want More than an Atomic View of Words
Would like richer representation of text: many arbitrary, overlapping features of the words.
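[Figure: graphical model over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}; one observation is annotated with the features: is “Wisniewski”, part of noun phrase, ends in “-ski”.]
Example features:
- identity of word
- ends in “-ski”
- is capitalized
- is part of a noun phrase
- is in a list of city names
- is under node X in WordNet
- is in bold font
- is indented
- is in hyperlink anchor
- last person name was female
- next two words are “and Associates”
- …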
![Page 24: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/24.jpg)
Problems with Richer Representation and a Joint Model
These arbitrary features are not independent.
– Multiple levels of granularity (chars, words, phrases)
– Multiple dependent modalities (words, formatting, layout)
– Past & future
Two choices:
Model the dependencies. Each state would have its own Bayes Net. But we are already starved for training data!
Ignore the dependencies. This causes “over-counting” of evidence (à la naïve Bayes). Big problem when combining evidence, as in Viterbi!
[Figure: two graphical models over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}, one for each choice.]
![Page 25: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/25.jpg)
Conditional Sequence Models
• We prefer a model that is trained to maximize a conditional probability rather than joint probability: P(s|o) instead of P(s,o):
– Can examine features, but not responsible for generating them.
– Don’t have to explicitly model their dependencies.
– Don’t “waste modeling effort” trying to generate what we are given at test time anyway.
![Page 26: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/26.jpg)
From HMMs to Conditional Random Fields
[Lafferty, McCallum, Pereira 2001]

$$\vec{s} = s_1, s_2, \ldots, s_n \qquad \vec{o} = o_1, o_2, \ldots, o_n$$

Joint:

$$P(\vec{s}, \vec{o}) = \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1})\, P(o_t \mid s_t)$$

Conditional:

$$P(\vec{s} \mid \vec{o}) = \frac{1}{P(\vec{o})} \prod_{t=1}^{|\vec{o}|} P(s_t \mid s_{t-1})\, P(o_t \mid s_t) = \frac{1}{Z(\vec{o})} \prod_{t=1}^{|\vec{o}|} \Phi_s(s_t, s_{t-1})\, \Phi_o(o_t, s_t)$$

where

$$\Phi_o(t) = \exp\left( \sum_k \lambda_k f_k(s_t, o_t) \right)$$

(A super-special case of Conditional Random Fields.)
Set parameters by maximum likelihood, using optimization method on L.
[Figure: the joint model and the conditional model, each drawn over states S_{t-1}, S_t, S_{t+1} and observations O_{t-1}, O_t, O_{t+1}.]
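One way to see why the HMM is a special case is to write its probabilities as exponentiated weights on indicator features. The indicator features and the analogous form for the transition potential below are assumptions made for illustration; the slide only spells out the observation potential.

```latex
% Assumed indicator features, one per state pair (i, j) and per state-word pair (j, w):
%   f_{ij}(s_t, s_{t-1}) = 1[s_{t-1} = i,\ s_t = j], \qquad f_{jw}(s_t, o_t) = 1[s_t = j,\ o_t = w]
% Choose the weights as HMM log-probabilities:
\lambda_{ij} = \log P(s_t = j \mid s_{t-1} = i), \qquad
\lambda_{jw} = \log P(o_t = w \mid s_t = j)
% Exactly one indicator of each kind is active at each position t, so
\Phi_s(s_t, s_{t-1}) = \exp\Big(\sum_{i,j} \lambda_{ij} f_{ij}(s_t, s_{t-1})\Big) = P(s_t \mid s_{t-1}), \qquad
\Phi_o(o_t, s_t) = \exp\Big(\sum_{j,w} \lambda_{jw} f_{jw}(s_t, o_t)\Big) = P(o_t \mid s_t)
% and the normalizer is the marginal likelihood of the observations:
Z(\vec{o}) = \sum_{\vec{s}} \prod_{t=1}^{|\vec{o}|} \Phi_s(s_t, s_{t-1})\, \Phi_o(o_t, s_t) = P(\vec{o})
```

Letting the weights be arbitrary real numbers, and the features arbitrary functions of the observations, is what moves from this special case to the general conditional model.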
![Page 27: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/27.jpg)
(Linear Chain) Conditional Random Fields
[Lafferty, McCallum, Pereira 2001]
Undirected graphical model, trained to maximize conditional probability of output (sequence) given input (sequence).
[Figure: finite state model and graphical model; FSM states y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3}, . . . over observations x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3}, . . .]
input seq: said Jones a Microsoft VP …
output seq: OTHER PERSON OTHER ORG TITLE …

$$p(y \mid x) = \frac{1}{Z_x} \prod_t \Phi(y_t, y_{t-1}, x, t)$$

where

$$\Phi(y_t, y_{t-1}, x, t) = \exp\left( \sum_k \lambda_k f_k(y_t, y_{t-1}, x, t) \right)$$

Wide-spread interest, positive experimental results in many applications.
- Noun phrase, Named entity [HLT’03], [CoNLL’03]
- Protein structure prediction [ICML’04]
- IE from Bioinformatics text [Bioinformatics ‘04], …
- Asian word segmentation [COLING’04], [ACL’04]
- IE from Research papers [HLT’04]
- Object classification in images [CVPR ‘04]
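The definition of p(y | x) above can be evaluated directly on the slide's example sentence. The sketch below is only an illustration: the feature functions and the weights λ_k are hypothetical hand-set values, and the forward algorithm is used as one standard way to compute Z_x for a linear chain.

```python
import itertools
import math

LABELS = ["OTHER", "PERSON", "ORG", "TITLE"]

# Hypothetical weights lambda_k, keyed by (observation test or previous label, current label).
WEIGHTS = {
    ("lower", "OTHER"): 2.0,
    ("cap", "PERSON"): 1.5,
    ("cap", "ORG"): 1.0,
    ("word=Microsoft", "ORG"): 2.0,
    ("word=VP", "TITLE"): 3.0,
    ("prev=ORG", "TITLE"): 1.0,
}


def active_features(y, y_prev, x, t):
    """Names of the binary features f_k(y_t, y_{t-1}, x, t) that equal 1."""
    word = x[t]
    feats = [("cap" if word[0].isupper() else "lower", y), ("word=" + word, y)]
    if y_prev is not None:
        feats.append(("prev=" + y_prev, y))
    return feats


def log_potential(y, y_prev, x, t):
    """log Phi(y_t, y_{t-1}, x, t) = sum_k lambda_k f_k(y_t, y_{t-1}, x, t)."""
    return sum(WEIGHTS.get(f, 0.0) for f in active_features(y, y_prev, x, t))


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_score(y_seq, x):
    """Unnormalized log-score: sum over t of log Phi."""
    return sum(
        log_potential(y_seq[t], y_seq[t - 1] if t > 0 else None, x, t)
        for t in range(len(x))
    )


def log_partition(x):
    """log Z_x by the forward algorithm: O(len(x) * |LABELS|^2)."""
    alpha = {y: log_potential(y, None, x, 0) for y in LABELS}
    for t in range(1, len(x)):
        alpha = {
            y: logsumexp([alpha[yp] + log_potential(y, yp, x, t) for yp in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))


def log_partition_brute_force(x):
    """log Z_x by enumerating every label sequence; only feasible for tiny inputs."""
    return logsumexp([log_score(seq, x) for seq in itertools.product(LABELS, repeat=len(x))])


def prob(y_seq, x):
    return math.exp(log_score(y_seq, x) - log_partition(x))


x = "said Jones a Microsoft VP".split()
y = ["OTHER", "PERSON", "OTHER", "ORG", "TITLE"]
print("p(y | x) =", prob(y, x))
print("forward vs brute force:", log_partition(x), log_partition_brute_force(x))
```

The brute-force check is there only to show that the forward recursion computes the same normalizer as summing over all label sequences.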
![Page 29: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/29.jpg)
Table Extraction from Government Reports
Cash receipts from marketings of milk during 1995 at $19.9 billion dollars, was
slightly below 1994. Producer returns averaged $12.93 per hundredweight,
$0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds,
1 percent above 1994. Marketings include whole milk sold to plants and dealers
as well as milk sold directly to consumers.
An estimated 1.56 billion pounds of milk were used on farms where produced,
8 percent less than 1994. Calves were fed 78 percent of this milk with the
remainder consumed in producer households.
Milk Cows and Production of Milk and Milkfat:
United States, 1993-95
--------------------------------------------------------------------------------
: : Production of Milk and Milkfat 2/
: Number :-------------------------------------------------------
Year : of : Per Milk Cow : Percentage : Total
:Milk Cows 1/:-------------------: of Fat in All :------------------
: : Milk : Milkfat : Milk Produced : Milk : Milkfat
--------------------------------------------------------------------------------
: 1,000 Head --- Pounds --- Percent Million Pounds
:
1993 : 9,589 15,704 575 3.66 150,582 5,514.4
1994 : 9,500 16,175 592 3.66 153,664 5,623.7
1995 : 9,461 16,451 602 3.66 155,644 5,694.3
--------------------------------------------------------------------------------
1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.
CRF
Labels:
• Non-Table
• Table Title
• Table Header
• Table Data Row
• Table Section Data Row
• Table Footnote
• ... (12 in all)
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
Features:
• Percentage of digit chars
• Percentage of alpha chars
• Indented
• Contains 5+ consecutive spaces
• Whitespace in this line aligns with prev.
• ...
• Conjunctions of all previous features, time offset: {0,0}, {-1,0}, {0,1}, {1,2}.
100+ documents from www.fedstats.gov
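A rough sketch of how the per-line features listed on this slide might be computed. The exact definitions are assumptions: percentages are taken over all characters in the line, and "aligns with prev." is approximated as both lines resuming text at the same column after a gap of two or more spaces. The feature conjunctions and time offsets are not implemented here.

```python
import re


def gap_end_columns(line):
    """Columns where text resumes after a run of 2+ spaces (assumed notion of alignment)."""
    return {m.end() for m in re.finditer(r" {2,}", line)}


def line_features(line, prev_line=""):
    """Features of one line of a plain-text report, loosely following the slide's list."""
    n = max(len(line), 1)  # avoid division by zero on empty lines
    return {
        "pct_digit": sum(c.isdigit() for c in line) / n,
        "pct_alpha": sum(c.isalpha() for c in line) / n,
        "indented": line.startswith((" ", "\t")),
        "has_5_spaces": re.search(r" {5,}", line) is not None,
        "aligns_with_prev": bool(gap_end_columns(line) & gap_end_columns(prev_line)),
    }


# Two hypothetical table lines, built with explicit runs of spaces.
header = "Year :" + " " * 5 + "Cows" + " " * 6 + "Milk"
row = "1993 :" + " " * 5 + "9,589" + " " * 5 + "15,704"
print(line_features(row, prev_line=header))
```

A data row like the one above is mostly digits with wide, aligned gaps, while a prose line is mostly alphabetic with single spaces, which is the kind of contrast these features are meant to expose to the CRF.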
![Page 30: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/30.jpg)
Table Extraction Experimental Results
| | Line labels, percent correct | Table segments, F1 |
|---|---|---|
| HMM | 65 % | 64 % |
| Stateless MaxEnt | 85 % | - |
| CRF | 95 % | 92 % |
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
![Page 31: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/31.jpg)
IE from Research Papers [McCallum et al ‘99]
![Page 32: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/32.jpg)
IE from Research Papers
| Model | Field-level F1 | Source |
|---|---|---|
| Hidden Markov Models (HMMs) | 75.6 | [Seymore, McCallum, Rosenfeld, 1999] |
| Support Vector Machines (SVMs) | 89.7 | [Han, Giles, et al, 2003] |
| Conditional Random Fields (CRFs) | 93.9 | [Peng, McCallum, 2004] |

40% reduction in error (SVMs to CRFs)
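The 40% figure is consistent with the F1 numbers on this slide if "error" is read as 100 minus field-level F1, comparing SVMs to CRFs. That reading is an assumption; the slide does not define it. A quick check:

```python
def relative_error_reduction(f1_old, f1_new):
    """Relative drop in (100 - F1) when moving from the old to the new model."""
    err_old = 100.0 - f1_old
    err_new = 100.0 - f1_new
    return (err_old - err_new) / err_old


# SVMs (89.7) to CRFs (93.9): error goes from 10.3 to 6.1 points.
print(round(100 * relative_error_reduction(89.7, 93.9), 1), "% reduction")
```

Under the same reading, the step from HMMs (75.6) to CRFs is a much larger relative reduction, so the 40% annotation most plausibly refers to the SVM comparison.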
![Page 33: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/33.jpg)
Outline
• The Need for IE and Data Mining.
• Motivate Joint Inference
• Brief introduction to Conditional Random Fields
• Joint inference: Information Extraction Examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Probability + First-order Logic, Co-ref on Entities (MCMC)
– Joint Information Integration (MCMC + Sample Rank)
• Demo: Rexa, a Web portal for researchers
![Page 34: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/34.jpg)
1. Jointly labeling cascaded sequences
Factorial CRFs
Part-of-speech
Noun-phrase boundaries
Named-entity tag
English words
[Sutton, Khashayar, McCallum, ICML 2004]
But errors cascade--must be perfect at every stage to do well.
Joint prediction of part-of-speech and noun-phrase in newswire, matching accuracy with only 50% of the training data.
Inference: Loopy Belief Propagation
![Page 39: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/39.jpg)
2. Jointly labeling distant mentions
Skip-chain CRFs
Senator Joe Green said today … . Green ran for …
…
[Sutton, McCallum, SRL 2004]
Dependency among similar, distant mentions ignored.
14% reduction in error on most repeated field in email seminar announcements.
Inference: Tree reparameterized BP
[Wainwright et al, 2002]
See also [Finkel, et al, 2005]
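To illustrate the graph structure of a skip-chain model, the sketch below builds the usual linear-chain edges plus extra "skip" edges between similar, distant mentions. Connecting identical capitalized tokens is an assumed similarity rule chosen for illustration, and the tokenization of the slide's example is also an assumption.

```python
def chain_edges(tokens):
    """Linear-chain edges between adjacent positions."""
    return [(t, t + 1) for t in range(len(tokens) - 1)]


def skip_edges(tokens):
    """Pairs (i, j), i < j, of positions holding identical capitalized tokens (assumed rule)."""
    seen = {}
    edges = []
    for j, tok in enumerate(tokens):
        if tok[:1].isupper():
            for i in seen.get(tok, []):
                edges.append((i, j))
            seen.setdefault(tok, []).append(j)
    return edges


tokens = ["Senator", "Joe", "Green", "said", "today", "…", ".", "Green", "ran", "for", "…"]
print("chain edges:", chain_edges(tokens))
print("skip edges:", skip_edges(tokens))
```

The skip edge between the two occurrences of "Green" is what lets evidence at one mention influence the label of the other; it also introduces loops, which is why approximate inference such as tree reparameterized BP comes in.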
![Page 42: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/42.jpg)
3. Joint co-reference among all pairs
Affinity Matrix CRF
“Entity resolution”, “Object correspondence”
[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]
[Figure: graph over the mentions Dana Hill, Mr. Hill, Amy Hall, she, Dana.]
~25% reduction in error on co-reference of proper nouns in newswire.
Inference: Correlational clustering graph partitioning
[Bansal, Blum, Chawla, 2002]
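As a toy illustration of co-reference as graph partitioning, the sketch below scores every partition of the five mentions by the sum of pairwise affinities inside each block and keeps the best one. The affinity values are hypothetical, and exhaustive search is used only because the example is tiny; it is not the approximation algorithm of the cited correlation clustering work.

```python
import itertools

MENTIONS = ["Dana Hill", "Mr. Hill", "Amy Hall", "she", "Dana"]

# Hypothetical signed affinities: positive means "probably the same entity".
AFFINITY = {
    frozenset(pair): weight
    for pair, weight in [
        (("Dana Hill", "Mr. Hill"), 2.0),
        (("Dana Hill", "Dana"), 3.0),
        (("Mr. Hill", "Dana"), -0.5),
        (("Amy Hall", "she"), 2.5),
        (("Dana Hill", "Amy Hall"), -3.0),
        (("Mr. Hill", "Amy Hall"), -2.0),
        (("Dana Hill", "she"), 0.5),
        (("Mr. Hill", "she"), -2.0),
        (("Amy Hall", "Dana"), -1.5),
        (("she", "Dana"), 0.5),
    ]
}


def partitions(items):
    """Yield every way to split a list of items into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn, or into a new block of its own.
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part


def score(partition):
    """Sum of affinities over all pairs that share a block."""
    return sum(
        AFFINITY[frozenset((a, b))]
        for block in partition
        for a, b in itertools.combinations(block, 2)
    )


def best_partition(mentions):
    return max(partitions(mentions), key=score)


best = best_partition(MENTIONS)
print(best, score(best))
```

Because the decision is made over the whole partition, a weakly negative pair such as "Mr. Hill" and "Dana" can still end up together when both are strongly tied to "Dana Hill", which is the point of deciding all pairs jointly rather than one pair at a time.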
![Page 43: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/43.jpg)
Coreference Resolution
AKA "record linkage", "database record deduplication", "citation matching", "object correspondence", "identity uncertainty"
Input
News article, with named-entity "mentions" tagged
Today Secretary of State Colin Powell met with . . . he . . . Condoleezza Rice . . . Mr Powell . . . she . . . Powell . . . President Bush . . . Rice . . . Bush . . .
Output
Number of entities, N = 3
#1 Secretary of State Colin Powell, he, Mr. Powell, Powell
#2 Condoleezza Rice, she, Rice
#3 President Bush, Bush
![Page 44: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/44.jpg)
Inside the Traditional Solution
Pair-wise Affinity Metric: are Mention (3) and Mention (4) coreferent, Y/N?
Mention (3): . . . Mr Powell . . .
Mention (4): . . . Powell . . .

| Fires? | Feature | Weight |
|---|---|---|
| N | Two words in common | 29 |
| Y | One word in common | 13 |
| Y | "Normalized" mentions are string identical | 39 |
| Y | Capitalized word in common | 17 |
| Y | > 50% character tri-gram overlap | 19 |
| N | < 25% character tri-gram overlap | -34 |
| Y | In same sentence | 9 |
| Y | Within two sentences | 8 |
| N | Further than 3 sentences apart | -1 |
| Y | "Hobbs Distance" < 3 | 11 |
| N | Number of entities in between two mentions = 0 | 12 |
| N | Number of entities in between two mentions > 4 | -3 |
| Y | Font matches | 1 |
| Y | Default | -19 |

OVERALL SCORE = 98 > threshold = 0
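The overall score on this slide is the sum of the weights of the features that fire (the rows marked Y). A minimal sketch of that arithmetic follows; the names `FEATURES`, `affinity`, and `is_coreferent` are illustrative and not taken from the talk.

```python
# Feature table transcribed from the slide: (fired?, description, weight).
FEATURES = [
    (False, "Two words in common", 29),
    (True, "One word in common", 13),
    (True, '"Normalized" mentions are string identical', 39),
    (True, "Capitalized word in common", 17),
    (True, "> 50% character tri-gram overlap", 19),
    (False, "< 25% character tri-gram overlap", -34),
    (True, "In same sentence", 9),
    (True, "Within two sentences", 8),
    (False, "Further than 3 sentences apart", -1),
    (True, '"Hobbs Distance" < 3', 11),
    (False, "Number of entities in between two mentions = 0", 12),
    (False, "Number of entities in between two mentions > 4", -3),
    (True, "Font matches", 1),
    (True, "Default", -19),
]


def affinity(features):
    """Sum the weights of the features that fire for this mention pair."""
    return sum(weight for fired, _, weight in features if fired)


def is_coreferent(features, threshold=0):
    """Pairwise decision: merge the two mentions if the score exceeds the threshold."""
    return affinity(features) > threshold


print(affinity(FEATURES), is_coreferent(FEATURES))  # 98 True
```

The decision depends only on this one pair of mentions, which is the limitation the following slides address.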
![Page 45: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/45.jpg)
Entity Resolution
[Figure: five nodes, each labeled “mention”: Dana Hill, Mr. Hill, Amy Hall, she, Dana]
![Page 46: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/46.jpg)
Entity Resolution
[Figure: the mentions Dana Hill, Mr. Hill, Amy Hall, she, Dana, with two groups labeled “entity”]
![Page 47: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/47.jpg)
Entity Resolution
[Figure: the mentions Dana Hill, Mr. Hill, Amy Hall, she, Dana, with two groups labeled “entity”]
![Page 48: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/48.jpg)
Entity Resolution
[Figure: the mentions Dana Hill, Mr. Hill, Amy Hall, she, Dana, with three groups labeled “entity”]
![Page 49: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/49.jpg)
The Problem
Independent pairwise affinity with connected components
Pair-wise merging decisions are being made independently from each other.
They should be made jointly.
Affinity measures are noisy and imperfect.
[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana; each edge carries a C or N label]
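To make the problem concrete, here is a small sketch of the traditional pipeline: threshold each pair independently, then take connected components. The affinity values and their assignment to mention pairs are hypothetical (the transcript does not say which edge carries which weight), and `connected_components` is an illustrative helper.

```python
def connected_components(n, edges):
    """Group items 0..n-1 into clusters linked by the given edges (union-find)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


mentions = ["Dana Hill", "Mr. Hill", "Amy Hall", "she", "Dana"]

# Hypothetical pairwise affinities; positive means "merge" when judged alone.
affinities = {
    (0, 1): 23,   # Dana Hill / Mr. Hill
    (0, 4): 17,   # Dana Hill / Dana
    (2, 3): 10,   # Amy Hall / she
    (1, 2): 4,    # Mr. Hill / Amy Hall: weak, noisy positive
    (0, 2): -55,  # Dana Hill / Amy Hall: strong negative, ignored by the merge
}

merge_edges = [pair for pair, w in affinities.items() if w > 0]
clusters = connected_components(len(mentions), merge_edges)
print(clusters)  # [[0, 1, 2, 3, 4]]
```

One weak positive edge is enough to collapse everything into a single entity, even though another pair in that cluster has a strongly negative affinity. Dropping the noisy edge yields two clusters instead.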
![Page 50: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/50.jpg)
CRF for Co-reference
[McCallum & Wellner, 2003, ICML]

$$P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\left( \sum_{i,j} \sum_{l} \lambda_l f_l(x_i, x_j, y_{ij}) + \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \right)$$

Make pair-wise merging decisions jointly by:
– calculating a joint prob.
– including all edge weights
– adding dependence on consistent triangles.
[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana; each edge carries a C or N label]
![Page 51: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/51.jpg)
CRF for Co-reference
[McCallum & Wellner, 2003, ICML]

$$P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\left( \sum_{i,j} \sum_{l} \lambda_l f_l(x_i, x_j, y_{ij}) + \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \right)$$

[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana; each edge carries a C or N label]
![Page 52: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/52.jpg)
CRF for Co-reference
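$$P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\left( \sum_{i,j} \sum_{l} \lambda_l f_l(x_i, x_j, y_{ij}) + \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \right)$$

[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana; each edge carries a C or N label and a signed weight]
Edge terms: +(23) +(10) +(4) +(17) -(-55) -(-44) -(-23) +(11) -(-9) -(-22)
Sum of edge terms: 218 (the slide also carries a −∞ label)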
![Page 53: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/53.jpg)
CRF for Co-reference
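$$P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\left( \sum_{i,j} \sum_{l} \lambda_l f_l(x_i, x_j, y_{ij}) + \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \right)$$

[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana; each edge carries a C or N label and a signed weight]
Edge terms: +(23) +(10) -(4) +(17) -(-55) -(-44) -(-23) +(11) -(-9) -(-22)
Sum of edge terms: 210 (the slide also shows 0 and −∞)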
![Page 54: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/54.jpg)
CRF for Co-reference
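$$P(\vec{y} \mid \vec{x}) = \frac{1}{Z_{\vec{x}}} \exp\left( \sum_{i,j} \sum_{l} \lambda_l f_l(x_i, x_j, y_{ij}) + \sum_{i,j,k} \lambda' f'(y_{ij}, y_{jk}, y_{ik}) \right)$$

[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana; each edge carries a C or N label and a signed weight]
Edge terms: -(23) +(10) +(4) -(17) +(-55) -(-44) -(-23) -(11) +(-9) -(-22)
Sum of edge terms: -12 (the slide also shows 0)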
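The totals on these three slides can be reproduced by adding an edge's weight when the pair is labeled C and subtracting it when the pair is labeled N. The sketch below assumes C means "coreferent" and N means "not coreferent"; the order of the weights follows the transcript, and `labeling_score` is an illustrative name. It covers only the pairwise terms, not the triangle-consistency factor f'.

```python
# The ten edge weights, in the order they appear in the transcript.
WEIGHTS = [23, 10, 4, 17, -55, -44, -23, 11, -9, -22]


def labeling_score(weights, labels):
    """Add w for edges labeled C and subtract w for edges labeled N."""
    return sum(w if label == "C" else -w for w, label in zip(weights, labels))


# Labels read off the signs on each slide (+ means C, - means N).
first = "CCCCNNNCNN"   # first weighted slide
second = "CCNCNNNCNN"  # same, with the weight-4 edge flipped to N
third = "NCCNCNNNCN"   # a different labeling

for labels in (first, second, third):
    print(labeling_score(WEIGHTS, labels))  # 218, then 210, then -12
```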
![Page 55: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/55.jpg)
Inference in these MRFs = Graph Partitioning
[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana with C/N edge labels; edge terms -(23) +(10) +(4) -(17) +(-55) -(-44) -(-23) -(11) +(-9) -(-22)]
[Boykov, Veksler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]
Correlational Clustering [Bansal & Blum], [Demaine]

$$\log P(\vec{y} \mid \vec{x}) \propto \sum_{i,j} \sum_{l} \lambda_l f_l(x_i, x_j, y_{ij}) = \sum_{i,j \text{ within partitions}} w_{ij} \; - \sum_{i,j \text{ across partitions}} w_{ij}$$
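As a rough illustration of the objective above, the sketch below scores a partition as the sum of within-partition weights minus the sum of across-partition weights and finds the best partition by exhaustive enumeration. The three-mention weights are hypothetical, and exhaustive search is only feasible for tiny inputs; it stands in for the approximate graph-partitioning methods cited on the slide.

```python
def partitions(n):
    """Yield every partition of n items as a tuple of cluster ids."""
    def grow(prefix, next_id):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # An item may join any existing cluster or open a new one.
        for cluster in range(next_id + 1):
            yield from grow(prefix + [cluster], max(next_id, cluster + 1))
    yield from grow([], 0)


def partition_score(weights, assignment):
    """Sum of w_ij within partitions minus sum of w_ij across partitions."""
    return sum(
        w if assignment[i] == assignment[j] else -w
        for (i, j), w in weights.items()
    )


def best_partition(n, weights):
    """Exhaustively search for the highest-scoring partition (tiny n only)."""
    return max(partitions(n), key=lambda a: partition_score(weights, a))


# Hypothetical edge weights over three mentions.
W = {(0, 1): 5, (0, 2): -4, (1, 2): -3}
best = best_partition(3, W)
print(best, partition_score(W, best))  # (0, 0, 1) 12
```

Because every candidate is a true partition, transitivity holds by construction and no inconsistent triangle can be produced.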
![Page 56: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/56.jpg)
Co-reference Experimental Results
Proper noun co-reference [McCallum & Wellner, 2003]

DARPA ACE broadcast news transcripts, 117 stories

| Method | Partition F1 | Pair F1 |
|---|---|---|
| Single-link threshold | 16 % | 18 % |
| Best prev match [Morton] | 83 % | 89 % |
| MRFs | 88 % | 92 % |
| Reduction in error | 30 % | 28 % |

DARPA MUC-6 newswire article corpus, 30 stories

| Method | Partition F1 | Pair F1 |
|---|---|---|
| Single-link threshold | 11 % | 7 % |
| Best prev match [Morton] | 70 % | 76 % |
| MRFs | 74 % | 80 % |
| Reduction in error | 13 % | 17 % |
![Page 57: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/57.jpg)
Pairwise Affinity is not Enough
[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana with C/N edge labels; edge terms -(23) +(10) +(4) -(17) +(-55) -(-44) -(-23) -(11) +(-9) -(-22)]
![Page 58: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/58.jpg)
Pairwise Affinity is not Enough
[Figure: mention graph over Dana Hill, Mr. Hill, Amy Hall, she, Dana with C/N edge labels]
![Page 59: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/59.jpg)
Pairwise Affinity is not Enough
[Figure: mention graph over she, she, Amy Hall, she, she with C/N edge labels]
![Page 60: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/60.jpg)
Pairwise Comparisons Not Enough
Examples:
• ∀ mentions are pronouns?
• Entities have multiple attributes (name, email, institution, location); need to measure “compatibility” among them.
• Having 2 “given names” is common, but not 4.
– e.g. Howard M. Dean / Martin, Dean / Howard Martin
• Need to measure size of the clusters of mentions.
• ∃ a pair of lastname strings that differ > 5?
We need to ask ∃, ∀ questions about a set of mentions
We want first-order logic!
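Questions like these are features of a whole cluster rather than of a single pair. A minimal sketch follows, assuming mentions are plain strings, that "differ > 5" means an edit distance greater than 5, and a small hand-written pronoun list; all names are illustrative.

```python
from itertools import combinations

PRONOUNS = {"he", "she", "him", "her", "it", "they", "them"}


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def all_pronouns(cluster):
    """Universal question: is every mention in the cluster a pronoun?"""
    return all(m.lower() in PRONOUNS for m in cluster)


def exists_distant_lastnames(cluster, limit=5):
    """Existential question: do any two last names differ by more than `limit` edits?"""
    lastnames = [m.split()[-1] for m in cluster if m.lower() not in PRONOUNS]
    return any(edit_distance(a, b) > limit for a, b in combinations(lastnames, 2))


print(all_pronouns(["she", "she", "she", "she"]))          # True
print(all_pronouns(["she", "Amy Hall"]))                   # False
print(exists_distant_lastnames(["Dana Hill", "Amy Hall"]))  # False
```

A cluster made up only of pronouns is suspicious no matter how high each pairwise affinity is, which a pairwise model cannot express.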
![Page 61: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/61.jpg)
Outline
• The Need for IE and Data Mining.
• Motivate Joint Inference
• Brief introduction to Conditional Random Fields
• Joint inference: Information Extraction Examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Probability + First-order Logic, Co-ref on Entities (MCMC)
– Joint Information Integration (MCMC + Sample Rank)
• Demo: Rexa, a Web portal for researchers
![Page 62: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/62.jpg)
Pairwise Affinity is not Enough
[Figure: mention graph over she, she, Amy Hall, she, she with C/N edge labels]
![Page 63: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/63.jpg)
Partition Affinity CRF
[Figure: mentions she, she, Amy Hall, she, she]
Ask arbitrary questions about all entities in a partition with first-order logic...
... bringing together LOGIC and PROBABILITY
![Page 64: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/64.jpg)
Partition Affinity CRF
[Figure: mentions she, she, Amy Hall, she, she]
![Page 65: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/65.jpg)
Partition Affinity CRF
[Figure: mentions she, she, Amy Hall, she, she]
![Page 66: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/66.jpg)
Partition Affinity CRF
[Figure: mentions she, she, Amy Hall, she, she]
![Page 67: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/67.jpg)
Partition Affinity CRF
[Figure: mentions she, she, Amy Hall, she, she]
![Page 68: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/68.jpg)
This space complexity is common in probabilistic first-order logic models
![Page 69: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/69.jpg)
“Markov Logic”: First-Order Logic as a Template to Define CRF Parameters
[Richardson & Domingos 2005]
[Paskin & Russell 2002] [Taskar et al 2003]
[Figure: ground Markov network]
Grounding the Markov network requires space $O(n^r)$
n = number of constants
r = highest clause arity
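The $O(n^r)$ bound comes from instantiating each clause once for every ordered tuple of r constants. A small sketch of that count; `count_groundings` is an illustrative helper, and the 1,400 figure is the approximate mention count reported for the dataset later in this talk.

```python
from itertools import product


def count_groundings(constants, arity):
    """Count ground instances of one clause by enumerating all argument tuples."""
    return sum(1 for _ in product(constants, repeat=arity))


# Enumeration agrees with the closed form n ** r on a small case.
print(count_groundings(range(5), 3), 5 ** 3)  # 125 125

# For realistic sizes, use the closed form rather than enumerating.
n_mentions, arity = 1400, 3
print(n_mentions ** arity)  # 2744000000
```

Even a single arity-3 clause over roughly 1,400 mentions yields billions of ground instances, which is why the following slides avoid full grounding.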
![Page 70: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/70.jpg)
How can we perform inference and learning in models that cannot be grounded?
![Page 71: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/71.jpg)
Inference in Weighted First-Order Logic: SAT Solvers
• Weighted SAT solvers [Kautz et al 1997]
– Requires complete grounding of network
• LazySAT [Singla & Domingos 2006]
– Saves memory by only storing clauses that may become unsatisfied
– Initialization still requires time $O(n^r)$ to visit all ground clauses
![Page 72: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/72.jpg)
Inference in Weighted First-Order Logic: MCMC
• Gibbs Sampling
– Difficult to move between high probability configurations by changing single variables
– Although, consider MC-SAT. [Poon & Domingos ‘06]
• An alternative: Metropolis-Hastings sampling [Culotta & McCallum 2006]
– 2 parts: proposal distribution, acceptance distribution.
– Can be extended to partial configurations: only instantiate relevant variables
– Successfully used in BLOG models [Milch et al 2005]
– Key advantage: can design arbitrary “smart” jumps
![Page 73: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/73.jpg)
Don’t represent all alternatives...
[Figure: mentions she, she, Amy Hall, she, she]
![Page 74: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/74.jpg)
Don’t represent all alternatives... just one at a time
[Figure: two configurations of the mentions she, she, Amy Hall, she, she, connected by a Stochastic Jump drawn from the Proposal Distribution]
![Page 75: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/75.jpg)
Metropolis-Hastings: “Jump acceptance probability”
• p(y’)/p(y) : likelihood ratio
– Ratio of P(Y|X)
– Z_X cancels!
• q(y’|y) : proposal distribution
– probability of proposing move y → y’
– ratio makes up for any biases in the proposal distribution
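The standard Metropolis-Hastings acceptance probability combines the two ratios described above: min(1, [p(y’)/p(y)] · [q(y|y’)/q(y’|y)]). The sketch below works with unnormalized log scores, so the normalizer never has to be computed; the function names and the `propose` interface are assumptions made for illustration.

```python
import math
import random


def acceptance_probability(log_score_new, log_score_old, q_forward, q_backward):
    """min(1, p(y')/p(y) * q(y|y')/q(y'|y)), computed from unnormalized log scores.

    Both log scores equal log p plus the same log Z, so Z cancels in the difference.
    q_forward is q(y'|y); q_backward is q(y|y').
    """
    log_ratio = (log_score_new - log_score_old) + math.log(q_backward) - math.log(q_forward)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def mh_step(y, log_score, propose, rng):
    """One jump: propose y', then accept it with the probability above."""
    y_new, q_forward, q_backward = propose(y, rng)
    a = acceptance_probability(log_score(y_new), log_score(y), q_forward, q_backward)
    return y_new if rng.random() < a else y


# A jump to a configuration half as probable, under a symmetric proposal.
print(acceptance_probability(0.0, math.log(2), 0.5, 0.5))
```

Only the current configuration and the proposed one are ever scored, which is what allows the sampler to avoid grounding the full network.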
![Page 76: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/76.jpg)
Learning the Likelihood Ratio
Given a pair of configurations, learn to rank the “better” configuration higher.
![Page 77: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/77.jpg)
Parameter Estimation in Large State Spaces
• Most methods require calculating gradient of log-likelihood, P(y1, y2, y3,... | x1, x2, x3,...)...
• ...which in turn requires “expectations of marginals,” P(y1| x1, x2, x3,...)
• But, getting marginal distributions by sampling can be inefficient due to large sample space.
• Alternative: Perceptron. Approx gradient from difference between true output and model’s predicted best output.
• But, even finding model’s predicted best output is expensive.
• We propose: “Sample Rank” [Culotta, Wick, Hall, McCallum, HLT 2007]
– Learn to rank intermediate solutions: P(y1=1, y2=0, y3=1,... | ...) > P(y1=0, y2=0, y3=1,... | ...)
![Page 78: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/78.jpg)
Ranking vs Classification Training
• Instead of training
– [Powell, Mr. Powell, he] --> YES
– [Powell, Mr. Powell, she] --> NO
• ...Rather...
– [Powell, Mr. Powell, he] > [Powell, Mr. Powell, she]
• In general, higher-ranked example may contain errors
– [Powell, Mr. Powell, George, he] > [Powell, Mr. Powell, George, she]
![Page 79: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/79.jpg)
Ranking Intermediate Solutions: Example
[Figure: a sequence of five configurations; the slide carries an UPDATE marker]
1. (no ∆ values shown)
2. ∆ Model = -23, ∆ Truth = -0.2
3. ∆ Model = 10, ∆ Truth = -0.1
4. ∆ Model = -10, ∆ Truth = -0.1
5. ∆ Model = 3, ∆ Truth = 0.3
• Like Perceptron: Proof of convergence under Marginal Separability
• More constrained than Maximum Likelihood: Parameters must correctly rank incorrect solutions!
![Page 80: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/80.jpg)
Sample Rank Algorithm
1. Proposer: $y^{t+1} = \mathrm{Propose}(x, y^t)$
2. Performance Metric: $F(Y)$
3. Inputs: input sequence $x$ and an initial (random) configuration $y^0$
4. Initialization: set the parameter vector $\alpha = 0$
5. Output: Parameters $\alpha$
6. Score function: $\mathrm{Score}_x^{\alpha}(y) = \alpha \cdot \phi(x, y)$
7. For $t = 1, \ldots, T$ and $i = 0, \ldots, n-1$ do
. Generate a training instance: $y^{t+1} = \mathrm{Propose}(x, y^t)$
. Let $y^+$ and $y^-$ be the best and worst configurations among $y^t$ and $y^{t+1}$ according to the performance metric.
. If $\mathrm{Score}_x^{\alpha}(y^+) < \mathrm{Score}_x^{\alpha}(y^-)$ then $\alpha = \alpha + \phi(x, y^+) - \phi(x, y^-)$
. end if
8. end for
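The steps above can be sketched on a toy labeling task. Everything specific here is an assumption made for illustration: the single-flip proposer, the accuracy metric, the one-dimensional feature function, and two deviations from the slide. The update test uses `<=` rather than strict `<` (with α = 0 every score ties, so a strict test would never fire), and proposals whose metric ties with the current configuration are skipped.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sample_rank(x, y0, propose, metric, phi, steps, rng):
    """Perceptron-style ranking updates over a chain of proposed configurations."""
    alpha = [0.0] * len(phi(x, y0))          # step 4: alpha = 0
    y = y0
    for _ in range(steps):                   # step 7
        y_next = propose(x, y, rng)          # generate a training instance
        if metric(y_next) != metric(y):      # assumption: skip ties in the metric
            if metric(y_next) > metric(y):
                y_plus, y_minus = y_next, y
            else:
                y_plus, y_minus = y, y_next
            f_plus, f_minus = phi(x, y_plus), phi(x, y_minus)
            # assumption: <= instead of the slide's strict <
            if dot(alpha, f_plus) <= dot(alpha, f_minus):
                alpha = [a + p - m for a, p, m in zip(alpha, f_plus, f_minus)]
        y = y_next                           # the chain moves to the proposal
    return alpha


# Toy task (hypothetical): label position i with 1 exactly when x[i] > 0.
x = [2.0, -1.0, 0.5, 3.0]
truth = (1, 0, 1, 1)


def metric(y):
    """Number of positions that agree with the true labeling."""
    return sum(a == b for a, b in zip(y, truth))


def phi(x, y):
    """One feature: sum of x[i] where labeled 1, minus x[i] where labeled 0."""
    return [sum(v if label else -v for v, label in zip(x, y))]


def propose(x, y, rng):
    """Flip the label at one random position."""
    i = rng.randrange(len(y))
    return y[:i] + (1 - y[i],) + y[i + 1:]


start = (0, 1, 0, 0)
alpha = sample_rank(x, start, propose, metric, phi, steps=50, rng=random.Random(0))
print(alpha[0] > 0)  # True
```

Training never needs the normalizer or the model's best output: each update compares only two neighboring configurations.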
![Page 81: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/81.jpg)
Weighted Logics Techniques: Overview
• Metropolis-Hastings (MH) for inference
– Freely bake-in domain knowledge about fruitful jumps; MH safely takes care of its biases.
– Avoid memory and time consumption with massive deterministic constraint factors: build jump functions that simply avoid illegal states.
• “Sample Rank”
– Don’t train by likelihood of completely correct solution...
– ...train to properly rank intermediate configurations: the partition function (normalizer) cancels! ...plus other efficiencies
![Page 82: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/82.jpg)
Partition Affinity CRF Experiments
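B-Cubed F1 Score on ACE 2004 Noun Coreference [Culotta, Wick, Hall, McCallum, 2007]

| | Likelihood-based Training | Rank-based Training |
|---|---|---|
| Partition Affinity | 69.2 | 79.3 |
| Pairwise Affinity | 62.4 | 72.5 |

Rows (Pairwise → Partition Affinity): Better Representation
Columns (Likelihood-based → Rank-based): Better Training
New state-of-the-art

To our knowledge, best previously reported results:

| Year | B-Cubed F1 |
|---|---|
| 1997 | 65% |
| 2002 | 67% |
| 2005 | 68% |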
![Page 83: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/83.jpg)
Outline
• The Need for IE and Data Mining.
• Motivate Joint Inference
• Brief introduction to Conditional Random Fields
• Joint inference: Information Extraction Examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Probability + First-order Logic, Co-ref on Entities (MCMC)
– Joint Information Integration (MCMC + Sample Rank)
• Demo: Rexa, a Web portal for researchers
![Page 84: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/84.jpg)
Information Integration

Database A (Schema A)

| First Name | Last Name | Contact |
|---|---|---|
| J. | Smith | 222-444-1337 |
| J. | Smith | 444 1337 |
| John | Smith | (1) 4321115555 |

Database B (Schema B)

| Name | Phone |
|---|---|
| John Smith | U.S. 222-444-1337 |
| John D. Smith | 444 1337 |
| J Smiht | 432-111-5555 |

Schema Matching
Schema A: First Name, Last Name, Contact
Schema B: Name, Phone

Coreference
John #1 John #2
J. Smith John Smith
J. Smith J Smiht
John Smith
John D. Smith

Normalized DB

| Entity# | Name | Phone |
|---|---|---|
| 523 | John Smith | 222-444-1337 |
| 524 | John D. Smith | 432-111-5555 |
| … | … | … |
![Page 85: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/85.jpg)
Information Integration Steps
1. Schema Matching
2. Coreference
3. Canonicalization
[Figure: schema attributes (First Name, Last Name, Name, Phone, Contact); mentions (J. Smith, John Smith, J. Smith, A. Jones, Amanda); canonical records (John Smith, Amanda Jones, …)]
![Page 86: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/86.jpg)
Problems with a Pipeline
1. Data integration tasks are highly correlated
2. Errors can propagate
![Page 87: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/87.jpg)
Schema Matching First
1. Schema Matching (attributes: First Name, Last Name, Name, Phone, Contact)
Provides Evidence: NEW FEATURES
1. String Identical: F.Name+L.Name==Name
2. Same Area Code: 3-gram in Phone/Contact
3. …
2. Coreference (mentions: J. Smith, John Smith, J. Smith, A. Jones, Amanda)
![Page 88: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/88.jpg)
Coreference First
1. Coreference (mentions: J. Smith, John Smith, J. Smith, A. Jones, Amanda)
Provides Evidence: NEW FEATURES
1. Field values similar across coref’d records
2. Phone+Contact has same value for J. Smith mentions
3. …
2. Schema Matching (attributes: First Name, Last Name, Name, Phone, Contact)
![Page 89: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/89.jpg)
Problems with a Pipeline
1. Data integration tasks are highly correlated
2. Errors can propagate
![Page 90: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/90.jpg)
Hazards of a Pipeline
1. Schema Matching (attributes: Full Name, Company Name, Phone, Contact)

Table A

| Name | Corporation |
|---|---|
| Amanda Jones | J. Smith & Sons |
| J. Smith | IBM |

Table B

| Full Name | Company Name |
|---|---|
| Amanda Jones | Smith & Sons |
| John Smith | IBM |

ERRORS PROPAGATE
2. Coreferent?
![Page 91: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/91.jpg)
Canonicalization
Typically occurs AFTER coreference
Coref mentions: John Smith, J. Smith, J. Smith, J. Smiht, J.S Mith, Jonh smith, John
Canonicalization → Entity 87: John Smith
Desiderata:
• Complete: Contains all information (e.g. first + last)
• Error-free: No typos (e.g. avoid “Smiht”)
• Central: Represents all mentions (not “Mith”)
Access to such features would be very helpful to Coref
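One simple way to approximate the "Central" desideratum is to pick the mention with the smallest total edit distance to all the others. This is only an illustrative heuristic, not the method used in the talk, and `canonical` and `edit_distance` are hypothetical helper names.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def canonical(mentions):
    """Return the mention closest, in total edit distance, to all mentions."""
    return min(mentions, key=lambda m: sum(edit_distance(m, o) for o in mentions))


print(canonical(["John Smith", "John Smith", "J. Smiht"]))  # John Smith

# The mention list from the slide; the result is whichever string is most central.
slide_mentions = ["John Smith", "J. Smith", "J. Smith", "J. Smiht",
                  "J.S Mith", "Jonh smith", "John"]
print(canonical(slide_mentions))
```

Centrality alone need not satisfy the "Complete" desideratum, since an abbreviated form can be the most central string. That gap is one reason to score canonical forms jointly with coreference.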
![Page 92: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/92.jpg)
[Figure: factor graph with a Schema Matching part (attribute clusters such as x4 to x8) and a Coreference and Canonicalization part; y are binary variables and f are factors]

$$P(Y \mid X) = \frac{1}{Z_X} \prod_{y_i \in Y} \psi_w(y_i, x_i) \prod_{y_i, y_j \in Y} \psi_b(y_{ij}, x_{ij})$$

$$\psi(y_i, x_i) = \exp\left( \sum_k \lambda_k f_k(y_i, x_i) \right)$$

• x6 is a set of attributes {phone, contact, telephone}
• x7 is a set of attributes {last name, last name}
• f67 is a factor between x6/x7
• y67 is a binary variable indicating a match (no)
• f7 is a factor over cluster x7
• y7 is a binary variable indicating match (yes)
![Page 93: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/93.jpg)
[Figure: factor graph with a Schema Matching part (attribute clusters such as x4 to x8) and a Coreference and Canonicalization part (mention clusters such as x1, x2, x3); y are binary variables and f are factors]

$$P(Y \mid X) = \frac{1}{Z_X} \prod_{y_i \in Y} \psi_w(y_i, x_i) \prod_{y_i, y_j \in Y} \psi_b(y_{ij}, x_{ij})$$

$$\psi(y_i, x_i) = \exp\left( \sum_k \lambda_k f_k(y_i, x_i) \right)$$

• x1 is a set of mentions {J. Smith, John, John Smith}
• x2 is a set of mentions {Amanda, A. Jones}
• f12 is a factor between x1/x2
• y12 is a binary variable indicating a match (no)
• f1 is a factor over cluster x1
• y1 is a binary variable indicating match (yes)
• Entity/attribute factors omitted for clarity
![Page 94: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/94.jpg)
[Figure: the same factor graph with a Schema Matching part and a Coreference and Canonicalization part, now including a factor f43]

$$P(Y \mid X) = \frac{1}{Z_X} \prod_{y_i \in Y} \psi_w(y_i, x_i) \prod_{y_i, y_j \in Y} \psi_b(y_{ij}, x_{ij})$$

$$\psi(y_i, x_i) = \exp\left( \sum_k \lambda_k f_k(y_i, x_i) \right)$$
![Page 95: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/95.jpg)
Dataset
• Faculty and alumni listings from university websites, plus an IE system
• 9 different schemas
• ~1400 mentions, 294 coreferent
![Page 96: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/96.jpg)
Example Schemas
| DEX IE | Northwestern Fac | UPenn Fac |
|---|---|---|
| First Name | Name | Name |
| Middle Name | Title | First Name |
| Last Name | PhD Alma Mater | Last Name |
| Title | Research Interests | Job+Department |
| Department | Office Address | |
| Company Name | E-mail | |
| Home Phone | | |
| Office Phone | | |
| Fax Number | | |
![Page 97: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/97.jpg)
Schema Matching Features
First order quantifications/aggregations over:
• String identical
• Sub string matches
• TFIDF weighted cosine distance
• All of the above between coreferent mentions only
![Page 98: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/98.jpg)
Systems
• ISO: Each task in isolation
• CASC: Coref -> Schema matching
• CASC: Schema matching -> Coref
• JOINT: Coref + Schema matching
Each system is evaluated with and without joint canonicalization
Our new work
![Page 99: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/99.jpg)
Coreference Results
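| Setting | System | Pair F1 | Pair Prec | Pair Recall | MUC F1 | MUC Prec | MUC Recall |
|---|---|---|---|---|---|---|---|
| No Canon | ISO | 72.7 | 88.9 | 61.5 | 75.0 | 88.9 | 64.9 |
| No Canon | CASC | 64.0 | 66.7 | 61.5 | 65.7 | 66.7 | 64.9 |
| No Canon | JOINT | 76.5 | 89.7 | 66.7 | 78.8 | 89.7 | 70.3 |
| Canon | ISO | 78.3 | 90.0 | 69.2 | 80.6 | 90.0 | 73.0 |
| Canon | CASC | 65.8 | 67.6 | 64.1 | 67.6 | 67.6 | 67.6 |
| Canon | JOINT | 81.7 | 90.6 | 74.4 | 84.1 | 90.6 | 74.4 |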
Note: cascade does worse than ISO
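The F1 columns in these tables are the harmonic mean of precision and recall. A quick check against three of the pair-wise rows above (the function name `f1` is illustrative):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both in percent here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Pair-wise precision and recall from the "No Canon" rows.
print(round(f1(88.9, 61.5), 1))  # ISO   -> 72.7
print(round(f1(66.7, 61.5), 1))  # CASC  -> 64.0
print(round(f1(89.7, 66.7), 1))  # JOINT -> 76.5
```

The cascade's drop comes almost entirely from precision (88.9 to 66.7) at unchanged recall, consistent with errors propagating from the upstream task.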
![Page 100: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/100.jpg)
Schema Matching Results
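| Setting | System | Pair F1 | Pair Prec | Pair Recall | MUC F1 | MUC Prec | MUC Recall |
|---|---|---|---|---|---|---|---|
| No Canon | ISO | 50.9 | 40.9 | 67.5 | 69.2 | 81.8 | 60.0 |
| No Canon | CASC | 50.9 | 40.9 | 67.5 | 69.2 | 81.8 | 60.0 |
| No Canon | JOINT | 68.9 | 100 | 52.5 | 69.6 | 100 | 53.3 |
| Canon | ISO | 50.9 | 40.9 | 67.5 | 69.2 | 81.8 | 60.0 |
| Canon | CASC | 52.3 | 41.8 | 70.0 | 74.1 | 83.3 | 66.7 |
| Canon | JOINT | 71.0 | 100 | 55.0 | 75.0 | 100 | 60.0 |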
Note: cascade not as harmful here
![Page 101: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/101.jpg)
Outline
• The Need for IE and Data Mining.
• Motivate Joint Inference
• Brief introduction to Conditional Random Fields
• Joint inference: Information Extraction Examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Probability + First-order Logic, Co-ref on Entities (MCMC)
– Joint Information Integration (MCMC + Sample Rank)
• Demo: Rexa, a Web portal for researchers
![Page 102: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/102.jpg)
Data Mining Research Literature
• Better understand structure of our own research area.
• Structure helps us learn a new field.
• Aid collaboration
• Map how ideas travel through social networks of researchers.
• Aids for hiring and finding reviewers!
• Measure impact of papers or people.
![Page 103: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/103.jpg)
Our Data
• Over 1.6 million research papers, gathered as part of the Rexa.info portal.
• Cross-linked references / citations.
![Page 104: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/104.jpg)
Previous Systems
![Page 105: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/105.jpg)
![Page 106: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/106.jpg)
ResearchPaper
Cites
Previous Systems
![Page 107: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/107.jpg)
ResearchPaper
Cites
Person
University
Venue
Grant
Groups
Expertise
More Entities and Relations
![Page 121: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/121.jpg)
Topical Transfer
Citation counts from one topic to another.
Map “producers and consumers”
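
A minimal sketch of how such topic-to-topic citation counts might be tallied. It assumes each paper carries a single topic label and that citations are given as (citing, cited) pairs; both are simplifications for illustration, and the paper IDs and the `topical_transfer` name are hypothetical. Transfer is counted from the cited paper's topic (the producer) to the citing paper's topic (the consumer).

```python
from collections import Counter


def topical_transfer(citations, paper_topic):
    """Count citations flowing from each cited topic to each citing topic.

    citations: iterable of (citing_paper, cited_paper) pairs.
    paper_topic: dict mapping paper ID to a single topic label (an assumption).
    Returns a Counter keyed by (producer_topic, consumer_topic).
    """
    counts = Counter()
    for citing, cited in citations:
        producer = paper_topic.get(cited)
        consumer = paper_topic.get(citing)
        if producer is None or consumer is None:
            continue  # skip papers with no topic label
        counts[(producer, consumer)] += 1
    return counts


# Toy data with hypothetical paper IDs.
paper_topic = {
    "p1": "Digital Libraries",
    "p2": "Web Pages",
    "p3": "Web Pages",
    "p4": "Computer Vision",
}
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p3", "p2")]
transfer = topical_transfer(citations, paper_topic)
```

Filtering the result to keys whose producer is a given topic, and whose consumer differs from it, yields a "transfer from this topic to other topics" listing.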
![Page 122: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/122.jpg)
Topical Bibliometric Impact Measures
• Topical Citation Counts
• Topical Impact Factors
• Topical Longevity
• Topical Precedence
• Topical Diversity
• Topical Transfer
[Mann, Mimno, McCallum, 2006]
![Page 123: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/123.jpg)
Topical Transfer
Transfer from Digital Libraries to other topics
| Other topic | Citations | Paper Title |
|---|---|---|
| Web Pages | 31 | Trawling the Web for Emerging Cyber-Communities, Kumar, Raghavan,... 1999. |
| Computer Vision | 14 | On being ‘Undigital’ with digital cameras: extending the dynamic... |
| Video | 12 | Lessons learned from the creation and deployment of a terabyte digital video libr.. |
| Graphs | 12 | Trawling the Web for Emerging Cyber-Communities |
| Web Pages | 11 | WebBase: a repository of Web pages |
![Page 124: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/124.jpg)
Topical Diversity
Papers that had the most influence across many other fields...
![Page 125: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/125.jpg)
Topical Diversity
Entropy of the topic distribution among papers that cite this paper (this topic).
High Diversity
Low Diversity
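
The entropy definition above can be sketched as follows, assuming each citing paper is tagged with one topic label (a simplification for illustration; the labels and the `topical_diversity` name are hypothetical). Entropy is measured in bits.

```python
import math
from collections import Counter


def topical_diversity(citing_topics):
    """Shannon entropy (in bits) of the topics of the citing papers.

    citing_topics: list with one topic label per citing paper (an assumption).
    Returns 0.0 when there are no citing papers.
    """
    counts = Counter(citing_topics)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy += p * math.log2(1 / p)
    return entropy


# Citations spread evenly over four topics: high diversity.
high = topical_diversity(["Graphs", "Video", "Web Pages", "Computer Vision"])
# All citations from a single topic: low diversity.
low = topical_diversity(["Web Pages", "Web Pages", "Web Pages"])
```

A paper cited evenly from many topics scores high, while a paper cited from a single topic scores zero.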
![Page 126: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/126.jpg)
Summary
• Joint inference needed for avoiding cascading errors in information extraction and data mining.
– Most fundamental problem in NLP, data mining, ...
• Can be performed in CRFs
– Cascaded sequences (Factorial CRFs)
– Distant correlations (Skip-chain CRFs)
– Co-reference (Affinity-matrix CRFs)
– Logic + Probability (efficient by MCMC + Sample Rank)
– Information Integration
• Rexa: New research paper search engine, mining the interactions in our community.
![Page 129: Information Extraction, Data Mining & Joint Inference Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles.](https://reader035.fdocuments.in/reader035/viewer/2022062409/56649eca5503460f94bd8a7e/html5/thumbnails/129.jpg)
Outline
• Model / Feature Engineering
– Brief review of IE w/ Conditional Random Fields
– Flexibility to use non-independent features
• Inference
– Entity Resolution with Probability + First-order Logic
– Resolution + Canonicalization + Schema Mapping
– Inference by Metropolis-Hastings
• Parameter Estimation
– Semi-supervised Learning with Label Regularization
– ...with Feature Labeling
– Generalized Expectation criteria