Information Extraction,Conditional Random Fields,and Social Network Analysis
Andrew McCallum
Computer Science DepartmentUniversity of Massachusetts Amherst
Joint work withAron Culotta, Charles Sutton, Ben Wellner, Khashayar Rohanimanesh, Wei Li,
Andres Corrada, Xuerui Wang
Goal:
Mine actionable knowledgefrom unstructured text.
Pages Containing the Phrase“high tech job openings”
Extracting Job Openings from the Web
foodscience.com-Job2
JobTitle: Ice Cream Guru
Employer: foodscience.com
JobCategory: Travel/Hospitality
JobFunction: Food Services
JobLocation: Upper Midwest
Contact Phone: 800-488-2611
DateExtracted: January 8, 2001 Source: www.foodscience.com/jobs_midwest.html
OtherCompanyJobs: foodscience.com-Job1
A Portal for Job Openings
Job
Ope
ning
s:C
ateg
ory
= H
igh
Tech
Key
wor
d =
Java
Lo
catio
n =
U.S
.
Data Mining the Extracted Job Information
IE fromChinese Documents regarding Weather
Department of Terrestrial System, Chinese Academy of Sciences
200k+ documentsseveral millennia old
- Qing Dynasty Archives- memos- newspaper articles- diaries
IE from Cargo Container Ship Manifests
Cargo Tracking Div.US Navy
IE from Research Papers[McCallum et al ‘99]
IE from Research Papers
Mining Research Papers
[Giles et al]
[Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004]
What is “Information Extraction”
Information Extraction = segmentation + classification + clustering + association
As a familyof techniques:October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.
Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.
"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“
Richard Stallman, founder of the FreeSoftware Foundation, countered saying…
Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation
What is “Information Extraction”
Information Extraction = segmentation + classification + association + clustering
As a familyof techniques:October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.
Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.
"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“
Richard Stallman, founder of the FreeSoftware Foundation, countered saying…
Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation
What is “Information Extraction”
Information Extraction = segmentation + classification + association + clustering
As a familyof techniques:October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.
Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.
"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“
Richard Stallman, founder of the FreeSoftware Foundation, countered saying…
Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation
What is “Information Extraction”
Information Extraction = segmentation + classification + association + clustering
As a familyof techniques:October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO BillGates railed against the economic philosophyof open-source software with Orwellianfervor, denouncing its communal licensing asa "cancer" that stifled technologicalinnovation.
Today, Microsoft claims to "love" the open-source concept, by which software code ismade public to encourage improvement anddevelopment by outside programmers. Gateshimself says Microsoft will gladly disclose itscrown jewels--the coveted code behind theWindows operating system--to selectcustomers.
"We can be open source. We love the conceptof shared source," said Bill Veghte, aMicrosoft VP. "That's a super-important shiftfor us in terms of code access.“
Richard Stallman, founder of the FreeSoftware Foundation, countered saying…
Microsoft CorporationCEOBill GatesMicrosoftGatesMicrosoftBill VeghteMicrosoftVPRichard StallmanfounderFree Software Foundation
NAME
TITLE ORGANIZATION
Bill Gates
CEO
Microsoft
Bill Veghte
VPMicrosoft
Richard
Stallman
founder
Free Soft..
*
*
*
*
Larger Context
SegmentClassifyAssociateCluster
Filter
Prediction Outlier detection Decision support
IE
Documentcollection
Database
Discover patterns - entity types - links / relations - events
DataMining
Spider
Actionableknowledge
Outline
• Examples of IE and Data Mining.
• Conditional Random Fields and Feature Induction.• Joint inference: Motivation and examples
– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Iterated Conditional Samples.)
• Two example projects– Email, contact address book, and Social Network Analysis
– Research Paper search and analysis
Hidden Markov Models
S t-1 S t
O t
S t+1
O t +1Ot -1
...
...
Finite state model Graphical model
Parameters: for all states S={s1,s2,…} Start state probabilities: P(st ) Transition probabilities: P(st|st-1 ) Observation (emission) probabilities: P(ot|st )Training: Maximize probability of training observations (w/ prior)
!=
"#||
1
1 )|()|(),(o
t
ttttsoPssPosP
v
vv
HMMs are the standard sequence modeling tool in genomics, music, speech, NLP, …
...transitions
observations
o1 o2 o3 o4 o5 o6 o7 o8
Generates:
State sequenceObservation sequence
Usually a multinomial overatomic, fixed alphabet
IE with Hidden Markov Models
Yesterday Rich Caruana spoke this example sentence.
Yesterday Rich Caruana spoke this example sentence.
Person name: Rich Caruana
Given a sequence of observations:
and a trained HMM:
Find the most likely state sequence: (Viterbi)
Any words said to be generated by the designated “person name”state extract as a person name:
person namelocation namebackground
We want More than an Atomic View of Words
Would like richer representation of text:many arbitrary, overlapping features of the words.
S t-1 S t
O t
S t+1
O t +1Ot -1
identity of wordends in “-ski”is capitalizedis part of a noun phraseis in a list of city namesis under node X in WordNetis in bold fontis indentedis in hyperlink anchorlast person name was femalenext two words are “and Associates”
…
…part of
noun phrase
is “Wisniewski”
ends in “-ski”
Problems with Richer Representationand a Joint Model
These arbitrary features are not independent.– Multiple levels of granularity (chars, words, phrases)– Multiple dependent modalities (words, formatting, layout)– Past & future
Two choices:
Model the dependencies.Each state would have its ownBayes Net. But we are alreadystarved for training data!
Ignore the dependencies.This causes “over-counting” ofevidence (ala naïve Bayes).Big problem when combiningevidence, as in Viterbi!
S t-1 S t
O t
S t+1
O t +1Ot -1
S t-1 S t
O t
S t+1
O t +1Ot -1
Conditional Sequence Models
• We prefer a model that is trained to maximize aconditional probability rather than joint probability:P(s|o) instead of P(s,o):
– Can examine features, but not responsible for generatingthem.
– Don’t have to explicitly model their dependencies.– Don’t “waste modeling effort” trying to generate what we are
given at test time anyway.
Joint
Conditional
St-1 St
Ot
St+1
Ot+1Ot-1
St-1 St
Ot
St+1
Ot+1Ot-1
...
...
...
...
(A super-special case of Conditional Random Fields.)
[Lafferty, McCallum, Pereira 2001]
where
From HMMs to Conditional Random Fields
Set parameters by maximum likelihood, using optimization method on δL.
Conditional Random Fields
St St+1 St+2
O = Ot, Ot+1, Ot+2, Ot+3, Ot+4
St+3 St+4
1. FSM special-case: linear chain among unknowns, parameters tied across time steps.
[Lafferty, McCallum, Pereira 2001]
2. In general: CRFs = "Conditionally-trained Markov Network" arbitrary structure among unknowns
3. Relational Markov Networks [Taskar, Abbeel, Koller 2002]: Parameters tied across hits from SQL-like queries ("clique templates")
Training CRFs
!
Log - likelihood gradient :
"L
"#k
= Ck (v s
( i),v o
(i))i
$ % P{#k }(v s |
v o
(i)) Ck (v s ,
v o
( i))v s
$i
$ %#k
2
Ck (v s ,
v o ) = fk (
t
$v o ,t,st%1,st )
!
Maximize log - likelihood of parameters given training data :
L({"k} |{
v o ,
v s
( i)})
Feature count usingcorrect labels
Feature count usingpredicted labels Smoothing penalty- -
Linear-chain CRFs vs. HMMs
• Comparable computational efficiency for inference
• Features may be arbitrary functions of any or allobservations
– Parameters need not fully specify generation ofobservations; can require less training data
– Easy to incorporate domain knowledge
Main Point #1
Conditional probability sequence models
give great flexibility regarding features used,
and have efficient dynamic-programming-based algorithms for inference.
IE from Research Papers[McCallum et al ‘99]
IE from Research Papers
Field-level F1
Hidden Markov Models (HMMs) 75.6[Seymore, McCallum, Rosenfeld, 1999]
Support Vector Machines (SVMs) 89.7[Han, Giles, et al, 2003]
Conditional Random Fields (CRFs) 93.9[Peng, McCallum, 2004]
Δ error40%
Table Extraction from Government ReportsCash receipts from marketings of milk during 1995 at $19.9 billion dollars, was slightly below 1994. Producer returns averaged $12.93 per hundredweight, $0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds, 1 percent above 1994. Marketings include whole milk sold to plants and dealers as well as milk sold directly to consumers. An estimated 1.56 billion pounds of milk were used on farms where produced, 8 percent less than 1994. Calves were fed 78 percent of this milk with the remainder consumed in producer households. Milk Cows and Production of Milk and Milkfat: United States, 1993-95 -------------------------------------------------------------------------------- : : Production of Milk and Milkfat 2/ : Number :------------------------------------------------------- Year : of : Per Milk Cow : Percentage : Total :Milk Cows 1/:-------------------: of Fat in All :------------------ : : Milk : Milkfat : Milk Produced : Milk : Milkfat -------------------------------------------------------------------------------- : 1,000 Head --- Pounds --- Percent Million Pounds : 1993 : 9,589 15,704 575 3.66 150,582 5,514.4 1994 : 9,500 16,175 592 3.66 153,664 5,623.7 1995 : 9,461 16,451 602 3.66 155,644 5,694.3 --------------------------------------------------------------------------------1/ Average number during year, excluding heifers not yet fresh. 2/ Excludes milk sucked by calves.
Table Extraction from Government Reports
Cash receipts from marketings of milk during 1995 at $19.9 billion dollars,
was
slightly below 1994. Producer returns averaged $12.93 per hundredweight,
$0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds,
1 percent above 1994. Marketings include whole milk sold to plants and
dealers
as well as milk sold directly to consumers.
An estimated 1.56 billion pounds of milk were used on farms where produced,
8 percent less than 1994. Calves were fed 78 percent of this milk with the
remainder consumed in producer households.
Milk Cows and Production of Milk and Milkfat:
United States, 1993-95
--------------------------------------------------------------------------------
: : Production of Milk and Milkfat 2/
: Number :-------------------------------------------------------
Year : of : Per Milk Cow : Percentage : Total
:Milk Cows 1/:-------------------: of Fat in All :------------------
: : Milk : Milkfat : Milk Produced : Milk : Milkfat
--------------------------------------------------------------------------------
: 1,000 Head --- Pounds --- Percent Million Pounds
:
1993 : 9,589 15,704 575 3.66 150,582 5,514.4
1994 : 9,500 16,175 592 3.66 153,664 5,623.7
1995 : 9,461 16,451 602 3.66 155,644 5,694.3
--------------------------------------------------------------------------------
1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.
CRFLabels:• Non-Table• Table Title• Table Header• Table Data Row• Table Section Data Row• Table Footnote• ... (12 in all)
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
Features:• Percentage of digit chars• Percentage of alpha chars• Indented• Contains 5+ consecutive spaces• Whitespace in this line aligns with prev.• ...• Conjunctions of all previous features,
time offset: {0,0}, {-1,0}, {0,1}, {1,2}.
100+ documents from www.fedstats.gov
Table Extraction Experimental Results
Line labels,percent correct
Table segments,F1
95 % 92 %
65 % 64 %
85 % -
HMM
StatelessMaxEnt
CRF w/outconjunctions
CRF
52 % 68 %
[Pinto, McCallum, Wei, Croft, 2003 SIGIR]
Feature Induction for CRFs
1. Begin with knowledge of atomic features,but no features yet in the model.
2. Consider many candidate features,including atomic and conjunctions.
3. Evaluate each candidate feature.4. Add to the model some that are ranked
highest.5. Train the model.
[McCallum, 2003, UAI]
Candidate Feature Evaluation
Common method: Information Gain
!"
#=Ff
fCHfPCHFC )|( )()(),(InfoGain
True optimization criterion: Likelihood of training data
!+! "= LLf f##),(Gain Likelihood
Technical meat is in how to calculate this efficiently for CRFs• Mean field approximation• Emphasize error instances (related to Boosting)• Newton's method to set λ
[McCallum, 2003, UAI]
Named Entity Recognition
CRICKET -MILLNS SIGNS FOR BOLAND
CAPE TOWN 1996-08-22
South African provincial sideBoland said on Thursday theyhad signed Leicestershire fastbowler David Millns on a oneyear contract.Millns, who toured Australiawith England A in 1992, replacesformer England all-rounderPhillip DeFreitas as Boland'soverseas professional.
Labels: Examples:
PER Yayuk BasukiInnocent Butare
ORG 3MKDPCleveland
LOC ClevelandNirmal HridayThe Oval
MISC JavaBasque1,000 Lakes Rally
Automatically Induced Features
Index Feature0 inside-noun-phrase (ot-1)5 stopword (ot)20 capitalized (ot+1)75 word=the (ot)100 in-person-lexicon (ot-1)200 word=in (ot+2)500 word=Republic (ot+1)711 word=RBI (ot) & header=BASEBALL1027 header=CRICKET (ot) & in-English-county-lexicon (ot)1298 company-suffix-word (firstmentiont+2)4040 location (ot) & POS=NNP (ot) & capitalized (ot) & stopword (ot-1)4945 moderately-rare-first-name (ot-1) & very-common-last-name (ot)4474 word=the (ot-2) & word=of (ot)
[McCallum & Li, 2003, CoNLL]
Named Entity Extraction Results
Method F1
HMMs BBN's Identifinder 73%
CRFs w/out Feature Induction 83%
CRFs with Feature Induction 90%based on LikelihoodGain
[McCallum & Li, 2003, CoNLL]
Outline
• Examples of IE and Data Mining.
• Conditional Random Fields and Feature Induction.
• Joint inference: Motivation and examples– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Iterated Conditional Samples.)
• Two example projects– Email, contact address book, and Social Network Analysis
– Research Paper search and analysis
Problem:
Combined in serial juxtaposition,IE and KD are unaware of each others’weaknesses and opportunities.
1) KD begins from a populated DB, unaware ofwhere the data came from, or its inherentuncertainties.
2) IE is unaware of emerging patterns andregularities in the DB.
The accuracy of both suffers, and significant mining
of complex text sources is beyond reach.
Segment
Classify
Associate
Cluster
IE
Document
collection
Database
Discover patterns
- entity types
- links / relations
- events
Knowledge
Discovery
Actionable
knowledge
SegmentClassifyAssociateCluster
Filter
Prediction Outlier detection Decision support
IE
Documentcollection
Database
Discover patterns - entity types - links / relations - events
DataMining
Spider
Actionableknowledge
Uncertainty Info
Emerging Patterns
Solution:
SegmentClassifyAssociateCluster
Filter
Prediction Outlier detection Decision support
IE
Documentcollection
ProbabilisticModel
Discover patterns - entity types - links / relations - events
DataMining
Spider
Actionableknowledge
Solution:
Conditional Random Fields [Lafferty, McCallum, Pereira]
Conditional PRMs [Koller…], [Jensen…], [Geetor…], [Domingos…]
Discriminatively-trained undirected graphical models
Complex Inference and LearningJust what we researchers like to sink our teeth into!
Unified Model
1. Jointly labeling cascaded sequencesFactorial CRFs
Part-of-speech
Noun-phrase boundaries
Named-entity tag
English words
[Sutton, Khashayar, McCallum, ICML 2004]
1. Jointly labeling cascaded sequencesFactorial CRFs
Part-of-speech
Noun-phrase boundaries
Named-entity tag
English words
[Sutton, Khashayar, McCallum, ICML 2004]
1. Jointly labeling cascaded sequencesFactorial CRFs
Part-of-speech
Noun-phrase boundaries
Named-entity tag
English words
[Sutton, Khashayar, McCallum, ICML 2004]
But errors cascade--must be perfect at every stage to do well.
1. Jointly labeling cascaded sequencesFactorial CRFs
Part-of-speech
Noun-phrase boundaries
Named-entity tag
English words
[Sutton, Khashayar, McCallum, ICML 2004]
Joint prediction of part-of-speech and noun-phrase in newswire,matching accuracy with only 50% of the training data.
Inference:Tree reparameterization BP
[Wainwright et al, 2002]
2. Jointly labeling distant mentionsSkip-chain CRFs
Senator Joe Green said today … . Green ran for …
…
[Sutton, McCallum, SRL 2004]
Dependency among similar, distant mentions ignored.
2. Jointly labeling distant mentionsSkip-chain CRFs
Senator Joe Green said today … . Green ran for …
…
[Sutton, McCallum, SRL 2004]
14% reduction in error on most repeated field in email seminar announcements.
Inference:Tree reparameterization BP
[Wainwright et al, 2002]
3. Joint co-reference among all pairsAffinity Matrix CRF
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
45
−99Y/N
Y/N
Y/N
11
[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]
~25% reduction in error onco-reference ofproper nouns in newswire.
Inference:Correlational clusteringgraph partitioning
[Bansal, Blum, Chawla, 2002]
“Entity resolution”“Object correspondence”
Coreference Resolution
Input
AKA "record linkage", "database record deduplication", "entity resolution", "object correspondence", "identity uncertainty"
OutputNews article, with named-entity "mentions" tagged
Number of entities, N = 3
#1 Secretary of State Colin Powell he Mr. Powell Powell
#2 Condoleezza Rice she Rice
#3 President Bush Bush
Today Secretary of State Colin Powellmet with . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . he . . . . . .. . . . . . . . . . . . . Condoleezza Rice . . . . .. . . . Mr Powell . . . . . . . . . .she . . . . . . . . . . . . . . . . . . . . . Powell . . . . . . . . . . . .. . . President Bush . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . Rice . . . . . . . . . . . . . . . . Bush . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . . . .. . . . . . . . . . . . . . . . . . . . . . . . .
Inside the Traditional Solution
Mention (3) Mention (4)
. . . Mr Powell . . . . . . Powell . . .
N Two words in common 29Y One word in common 13Y "Normalized" mentions are string identical 39Y Capitalized word in common 17Y > 50% character tri-gram overlap 19N < 25% character tri-gram overlap -34Y In same sentence 9Y Within two sentences 8N Further than 3 sentences apart -1Y "Hobbs Distance" < 3 11N Number of entities in between two mentions = 0 12N Number of entities in between two mentions > 4 -3Y Font matches 1Y Default -19
OVERALL SCORE = 98 > threshold=0
Pair-wise Affinity Metric
Y/N?
The Problem
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
affinity = 98
affinity = 11
affinity = −104
Pair-wise mergingdecisions are beingmade independentlyfrom each otherY
Y
N
Affinity measures are noisy and imperfect.
They should be madein relational dependencewith each other.
A Markov Random Field for Co-reference
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
45
−30Y/N
Y/N
Y/N
[McCallum & Wellner, 2003, ICML](MRF)
Make pair-wise mergingdecisions in dependentrelation to each other by- calculating a joint prob.- including all edge weights- adding dependence on consistent triangles.
11
!
P(v y |
v x ) =
1
Z v x
exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k
#l
#i, j
#$
% & &
'
( ) )
A Markov Random Field for Co-reference
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
45
−30Y/N
Y/N
Y/N
[McCallum & Wellner, 2003](MRF)
Make pair-wise mergingdecisions in dependentrelation to each other by- calculating a joint prob.- including all edge weights- adding dependence on consistent triangles.
11!"
!
P(v y |
v x ) =
1
Z v x
exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k
#l
#i, j
#$
% & &
'
( ) )
A Markov Random Field for Co-reference
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
Y
N
N
[McCallum & Wellner, 2003]
!
P(v y |
v x ) =
1
Z v x
exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k
#l
#i, j
#$
% & &
'
( ) )
(MRF)
−4
−(45)
−(−30)
+(11)
A Markov Random Field for Co-reference
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
Y
Y
N
[McCallum & Wellner, 2003]
!
P(v y |
v x ) =
1
Z v x
exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k
#l
#i, j
#$
% & &
'
( ) )
(MRF)
−infinity
+(45)
−(−30)
+(11)
A Markov Random Field for Co-reference
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
N
Y
N
[McCallum & Wellner, 2003]
!
P(v y |
v x ) =
1
Z v x
exp "l f l (xi,x j ,yij ) + "' f '(yij ,y jk,yik )i, j,k
#l
#i, j
#$
% & &
'
( ) )
(MRF)
64
+(45)
−(−30)
−(11)
Inference in these MRFs = Graph Partitioning[Boykov, Vekler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
45
11
−30
. . . Condoleezza Rice . . .
−134
10
!
log P(v y |
v x )( )" #l f l (xi,x j ,yij )
l
$i, j
$ = w ij
i, j w/inparitions
$ % w ij
i, j acrossparitions
$
−106
Inference in these MRFs = Graph Partitioning[Boykov, Vekler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . . . . . Condoleezza Rice . . .
= −22
45
11
−30
−134
10
−106
!
log P(v y |
v x )( )" #l f l (xi,x j ,yij )
l
$i, j
$ = w ij
i, j w/inparitions
$ % w ij
i, j acrossparitions
$
Inference in these MRFs = Graph Partitioning[Boykov, Vekler, Zabih, 1999], [Kolmogorov & Zabih, 2002], [Yu, Cross, Shi, 2002]
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . . . . . Condoleezza Rice . . .
!
log P(v y |
v x )( )" #l f l (xi, x j ,yij )
l
$i, j
$ = w ij
i, j w/inparitions
$ + w'iji, j acrossparitions
$ = 314
45
11
−30
−134
10
−106
Co-reference Experimental Results
Proper noun co-reference
DARPA ACE broadcast news transcripts, 117 stories
Partition F1 Pair F1Single-link threshold 16 % 18 %Best prev match [Morton] 83 % 89 %MRFs 88 % 92 %
Δerror=30% Δerror=28%
DARPA MUC-6 newswire article corpus, 30 stories
Partition F1 Pair F1Single-link threshold 11% 7 %Best prev match [Morton] 70 % 76 %MRFs 74 % 80 %
Δerror=13% Δerror=17%
[McCallum & Wellner, 2003]
Joint co-reference among all pairsAffinity Matrix CRF
. . . Mr Powell . . .
. . . Powell . . .
. . . she . . .
45
−99Y/N
Y/N
Y/N
11
[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]
~25% reduction in error onco-reference ofproper nouns in newswire.
Inference:Correlational clusteringgraph partitioning
[Bansal, Blum, Chawla, 2002]
pDatabase
field values
c
4. Joint segmentation and co-reference
o
s
o
s
c
c
s
o
Citation attributes
y y
y
Segmentation
[Wellner, McCallum, Peng, Hay, UAI 2004]Inference:Variant of Iterated Conditional Modes
Co-referencedecisions
Laurel, B. Interface Agents:Metaphors with Character, inThe Art of Human-Computer InterfaceDesign, B. Laurel (ed), Addison-Wesley, 1990.
Brenda Laurel. Interface Agents:Metaphors with Character, inLaurel, The Art of Human-ComputerInterface Design, 355-366, 1990.
[Besag, 1986]
WorldKnowledge
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Extraction from and matching of research paper citations.
see also [Marthi, Milch, Russell, 2003]
Joint IE and Coreference from Research Paper Citations
Textual citation mentions(noisy, with duplicates)
Paper database, with fields,clean, duplicates collapsed
AUTHORS TITLE VENUECowell, Dawid… Probab… SpringerMontemerlo, Thrun…FastSLAM… AAAI…Kjaerulff Approxi… Technic…
4. Joint segmentation and co-reference
Laurel, B. Interface Agents: Metaphors with Character , in
The Art of Human-Computer Interface Design , T. Smith (ed) ,
Addison-Wesley , 1990 .
Brenda Laurel . Interface Agents: Metaphors with Character , in
Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .
Citation Segmentation and Coreference
Laurel, B. Interface Agents: Metaphors with Character , in
The Art of Human-Computer Interface Design , T. Smith (ed) ,
Addison-Wesley , 1990 .
Brenda Laurel . Interface Agents: Metaphors with Character , in
Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .
1) Segment citation fields
Citation Segmentation and Coreference
Laurel, B. Interface Agents: Metaphors with Character , in
The Art of Human-Computer Interface Design , T. Smith (ed) ,
Addison-Wesley , 1990 .
Brenda Laurel . Interface Agents: Metaphors with Character , in
Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .
1) Segment citation fields
2) Resolve coreferent citations
Citation Segmentation and Coreference
Y?N
Laurel, B. Interface Agents: Metaphors with Character , in
The Art of Human-Computer Interface Design , T. Smith (ed) ,
Addison-Wesley , 1990 .
Brenda Laurel . Interface Agents: Metaphors with Character , in
Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .
Citation Segmentation and Coreference
Y?N
93%True Segmentation
91%CRF Segmentation
78%No Segmentation
Citation Co-reference (F1)Segmentation Quality
1) Segment citation fields
2) Resolve coreferent citations
Laurel, B. Interface Agents: Metaphors with Character , in
The Art of Human-Computer Interface Design , T. Smith (ed) ,
Addison-Wesley , 1990 .
Brenda Laurel . Interface Agents: Metaphors with Character , in
Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .
1) Segment citation fields
2) Resolve coreferent citations
3) Form canonical database record
Citation Segmentation and Coreference
AUTHOR = Brenda LaurelTITLE = Interface Agents: Metaphors with CharacterPAGES = 355-366BOOKTITLE = The Art of Human-Computer Interface DesignEDITOR = T. SmithPUBLISHER = Addison-WesleyYEAR = 1990
Y?N
Resolving conflicts
Laurel, B. Interface Agents: Metaphors with Character , in
The Art of Human-Computer Interface Design , T. Smith (ed) ,
Addison-Wesley , 1990 .
Brenda Laurel . Interface Agents: Metaphors with Character , in
Smith , The Art of Human-Computr Interface Design , 355-366 , 1990 .
1) Segment citation fields
2) Resolve coreferent citations
3) Form canonical database record
Citation Segmentation and Coreference
AUTHOR = Brenda LaurelTITLE = Interface Agents: Metaphors with CharacterPAGES = 355-366BOOKTITLE = The Art of Human-Computer Interface DesignEDITOR = T. SmithPUBLISHER = Addison-WesleyYEAR = 1990
Y?N
Perform jointly.
x
s
Observed citation
CRF Segmentation
IE + Coreference Model
J Besag 1986 On the…
AUT AUT YR TITL TITL
x
s
Observed citation
CRF Segmentation
IE + Coreference Model
Citation mention attributes
J Besag 1986 On the…
AUTHOR = “J Besag”YEAR = “1986”TITLE = “On the…”
c
x
s
IE + Coreference Model
c
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
Structure for eachcitation mention
x
s
IE + Coreference Model
c
Binary coreference variablesfor each pair of mentions
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
x
s
IE + Coreference Model
c
y n
n
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
Binary coreference variablesfor each pair of mentions
y n
n
x
s
IE + Coreference Model
c
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
Research paper entityattribute nodes
AUTHOR = “P Smyth”YEAR = “2001”TITLE = “Data Mining…”...
Such a highly connected graph makesexact inference intractable, so…
• Loopy Belief Propagation
v6v5
v3v2v1
v4
m1(v2) m2(v3)
m3(v2)m2(v1) messages passedbetween nodes
Approximate Inference 1
• Loopy Belief Propagation
• Generalized Belief Propagation
v6v5
v3v2v1
v4
m1(v2) m2(v3)
m3(v2)m2(v1)
v6v5
v3v2v1
v4
v9v8v7
messages passedbetween nodes
messagespassed betweenregions
Here, a message is a conditional probability table passed among nodes.But when message size grows exponentially with size of overlap between regions!
Approximate Inference 1
• Iterated Conditional Modes (ICM)
[Besag 1986]
v6v5
v3v2v1
v4
v6i+1 = argmax P(v6
i | v \ v6
i)v6
i
= held constant
Approximate Inference 2
• Iterated Conditional Modes (ICM)
[Besag 1986]
v6v5
v3v2v1
v4
v5j+1 = argmax P(v5
j | v \ v5
j)v5
j
= held constant
Approximate Inference 2
• Iterated Conditional Modes (ICM)
[Besag 1986]
v6v5
v3v2v1
v4
v4k+1 = argmax P(v4
k | v \ v4
k)v4
k
= held constant
Approximate Inference 2
but greedy, and easily falls into local minima.Structured inference scales well here,
• Iterated Conditional Modes (ICM)
[Besag 1986]
• Iterated Conditional Sampling (ICS) (our name) Instead of selecting only argmax, sample of argmaxes of P(v4
k | v \ v4
k)
e.g. an N-best list (the top N values)
v6v5
v3v2v1
v4
v4k+1 = argmax P(v4
k | v \ v4
k)v4
k
= held constant
v6v5
v3v2v1
v4
Approximate Inference 2
Can use “Generalized Version” of this;doing exact inference on a region ofseveral nodes at once.
Here, a “message” grows only linearlywith overlap region size and N!
IE + Coreference Model
Exact inference onthese linear-chain regions
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
From each chainpass an N-best List
into coreference
IE + Coreference Model
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
Approximate inferenceby graph partitioning…
…integrating outuncertaintyin samples
of extraction
Make scale to 1Mcitations with Canopies
[McCallum, Nigam, Ungar 2000]
…Metaphors withCharacter
Laurel, B. Interf aceAgents
…Interface Agents:Metaphors withCharacter
Laurel, B.
…Interf ace Agents:Metaphors withCharacter The
Laurel, B
…TitleName
When calculatingsimilarity with anothercitation, have moreopportunity to findcorrect, matching fields.
1990The Art of HumanComputerInterface Design
Agents: Metaphorswith Character
Laurel, B. Interf ace
1990of HumanComputer Interf aceDesign
Interface Agents:Metaphors withCharacter The Art
Laurel, B.
1990The Art of HumanComputerInterface Design
Agents: Metaphorswith Character
Laurel, B. Interf ace
YearBook TitleTitleName
Inference:Sample = N-best List from CRF Segmentation
y ? n
y n
n
IE + Coreference Model
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
Exact (exhaustive) inferenceover entity attributes
y n
n
IE + Coreference Model
J Besag 1986 On the…Smyth . 2001 Data Mining…
Smyth , P Data mining…
Revisit exact inferenceon IE linear chain,
now conditioned onentity attributes
y n
n
Parameter Estimation
Coref graph edge weightsMAP on individual edges
Separately for different regions
IE Linear-chainExact MAP
Entity attribute potentialsMAP, pseudo-likelihood
In all cases:Climb MAP gradient with
quasi-Newton method
pDatabase
field values
c
4. Joint segmentation and co-reference
o
s
o
s
c
c
s
o
Citation attributes
y y
y
Segmentation
[Wellner, McCallum, Peng, Hay, UAI 2004]
Inference:Variant of Iterated Conditional Modes
Co-referencedecisions
Laurel, B. Interface Agents:Metaphors with Character, inThe Art of Human-Computer InterfaceDesign, B. Laurel (ed), Addison-Wesley, 1990.
Brenda Laurel. Interface Agents:Metaphors with Character, inLaurel, The Art of Human-ComputerInterface Design, 355-366, 1990.
[Besag, 1986]
WorldKnowledge
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Extraction from and matching of research paper citations.
Outline
• Examples of IE and Data Mining.
• Conditional Random Fields and Feature Induction.
• Joint inference: Motivation and examples– Joint Labeling of Cascaded Sequences (Belief Propagation)
– Joint Labeling of Distant Entities (BP by Tree Reparameterization)
– Joint Co-reference Resolution (Graph Partitioning)
– Joint Segmentation and Co-ref (Iterated Conditional Samples.)
• Two example projects– Email, contact address book, and Social Network Analysis
– Research Paper search and analysis
Workplace effectiveness ~ Ability to leverage network of acquaintances
But filling Contacts DB by hand is tedious, and incomplete.
Email Inbox Contacts DB
WWW
Automatically
Managing and UnderstandingConnections of People in our Email World
System Overview
ContactInfo andPersonName
Extraction
PersonName
Extraction
NameCoreference
HomepageRetrieval
SocialNetworkAnalysis
KeywordExtraction
CRFWWW
names
An ExampleTo: “Andrew McCallum” [email protected]
Subject ...
Information extraction, social network,…
KeyWords:
Fernando Pereira, SamRoweis,…
Links:
(413) 545-1323CompanyPhone:
01003Zip:
MAState:
AmherstCity:
140 Governor’s Dr.StreetAddress:
University of MassachusettsCompany:
Associate ProfessorJobTitle:
McCallumLastName:
KachitesMiddleName:
AndrewFirstName:
Search fornew people
Summary of Results
80.7676.3385.7394.50CRF
FieldF1
FieldRecall
FieldPrec
TokenAcc
Machine learningCognitive statesLearning apprenticeArtificial intelligence
Tom Mitchell
Semantic webDescription logicsKnowledgerepresentationOntologies
DeborahMcGuiness
Bayesian networksRelational modelsProbabilistic modelsHidden variables
Daphne Koller
Logic programmingText categorizationData integrationRule learning
William Cohen
KeywordsPerson
Contact info and name extraction performance (25 fields)
Example keywords extracted
1. Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large org’s by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)
2. Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.
Clustering words into topics withLatent Dirichlet Allocation
[Blei, Ng, Jordan 2003]
STORYSTORIES
TELLCHARACTER
CHARACTERSAUTHOR
READTOLD
SETTINGTALESPLOT
TELLINGSHORT
FICTIONACTION
TRUEEVENTSTELLSTALE
NOVEL
MINDWORLDDREAMDREAMS
THOUGHTIMAGINATION
MOMENTTHOUGHTS
OWNREALLIFE
IMAGINESENSE
CONSCIOUSNESSSTRANGEFEELINGWHOLEBEINGMIGHTHOPE
WATERFISHSEA
SWIMSWIMMING
POOLLIKE
SHELLSHARKTANK
SHELLSSHARKSDIVING
DOLPHINSSWAMLONGSEALDIVE
DOLPHINUNDERWATER
DISEASEBACTERIADISEASESGERMSFEVERCAUSE
CAUSEDSPREADVIRUSES
INFECTIONVIRUS
MICROORGANISMSPERSON
INFECTIOUSCOMMONCAUSING
SMALLPOXBODY
INFECTIONSCERTAIN
Example topicsinduced from a large collection of text
FIELDMAGNETICMAGNET
WIRENEEDLE
CURRENTCOIL
POLESIRON
COMPASSLINESCORE
ELECTRICDIRECTION
FORCEMAGNETS
BEMAGNETISM
POLEINDUCED
SCIENCESTUDY
SCIENTISTSSCIENTIFIC
KNOWLEDGEWORK
RESEARCHCHEMISTRY
TECHNOLOGYMANY
MATHEMATICSBIOLOGY
FIELDPHYSICS
LABORATORYSTUDIESWORLD
SCIENTISTSTUDYINGSCIENCES
BALLGAMETEAM
FOOTBALLBASEBALLPLAYERS
PLAYFIELD
PLAYERBASKETBALL
COACHPLAYED
PLAYINGHIT
TENNISTEAMSGAMESSPORTS
BATTERRY
JOBWORKJOBS
CAREEREXPERIENCE
EMPLOYMENTOPPORTUNITIES
WORKINGTRAINING
SKILLSCAREERS
POSITIONSFIND
POSITIONFIELD
OCCUPATIONSREQUIRE
OPPORTUNITYEARNABLE
[Tennenbaum et al]
STORYSTORIES
TELLCHARACTER
CHARACTERSAUTHOR
READTOLD
SETTINGTALESPLOT
TELLINGSHORT
FICTIONACTION
TRUEEVENTSTELLSTALE
NOVEL
MINDWORLDDREAMDREAMS
THOUGHTIMAGINATION
MOMENTTHOUGHTS
OWNREALLIFE
IMAGINESENSE
CONSCIOUSNESSSTRANGEFEELINGWHOLEBEINGMIGHTHOPE
WATERFISHSEA
SWIMSWIMMING
POOLLIKE
SHELLSHARKTANK
SHELLSSHARKSDIVING
DOLPHINSSWAMLONGSEALDIVE
DOLPHINUNDERWATER
DISEASEBACTERIADISEASESGERMSFEVERCAUSE
CAUSEDSPREADVIRUSES
INFECTIONVIRUS
MICROORGANISMSPERSON
INFECTIOUSCOMMONCAUSING
SMALLPOXBODY
INFECTIONSCERTAIN
FIELDMAGNETICMAGNET
WIRENEEDLE
CURRENTCOIL
POLESIRON
COMPASSLINESCORE
ELECTRICDIRECTION
FORCEMAGNETS
BEMAGNETISM
POLEINDUCED
SCIENCESTUDY
SCIENTISTSSCIENTIFIC
KNOWLEDGEWORK
RESEARCHCHEMISTRY
TECHNOLOGYMANY
MATHEMATICSBIOLOGY
FIELDPHYSICS
LABORATORYSTUDIESWORLD
SCIENTISTSTUDYINGSCIENCES
BALLGAMETEAM
FOOTBALLBASEBALLPLAYERS
PLAYFIELD
PLAYERBASKETBALL
COACHPLAYED
PLAYINGHIT
TENNISTEAMSGAMESSPORTS
BATTERRY
JOBWORKJOBS
CAREEREXPERIENCE
EMPLOYMENTOPPORTUNITIES
WORKINGTRAINING
SKILLSCAREERS
POSITIONSFIND
POSITIONFIELD
OCCUPATIONSREQUIRE
OPPORTUNITYEARNABLE
Example topicsinduced from a large collection of text
[Tennenbaum et al]
YALDAM
etnotheratentirichletllocationodel
From LDA to Author-Recipient-Topic(ART)
Inference and Estimation
Gibbs Sampling:- Easy to implement- Reasonably fast
r
Enron Email Corpus
• 250k email messages• 23k people
Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)From: [email protected]: [email protected]: Enron/TransAltaContract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested anelectronic copy of our final draft? Are you OK with this? Ifso, the only version I have is the original draft withoutrevisions.
DP
Debra PerlingiereEnron North America Corp.Legal Department1400 Smith Street, EB 3885Houston, Texas [email protected]
Topics, and prominent sender/receiversdiscovered by ART
Topics, and prominent sender/receiversdiscovered by ART
Beck = “Chief Operations Officer”Dasovich = “Government Relations Executive”Shapiro = “Vice Presidence of Regulatory Affairs”Steffes = “Vice President of Government Affairs”
Comparing Role Discovery
connection strength (A,B) =
distribution overauthored topics
Traditional SNA
distribution overrecipients
distribution overauthored topics
Author-TopicART
Comparing Role Discovery Tracy Geaconne ⇔ Dan McCarty
Traditional SNA Author-TopicART
Similar roles Different rolesDifferent roles
Geaconne = “Secretary”McCarty = “Vice President”
Traditional SNA Author-TopicART
Different roles Very similarNot very similar
Geaconne = “Secretary”Hayslett = “Vice President & CTO”
Comparing Role Discovery Tracy Geaconne ⇔ Rod Hayslett
Traditional SNA Author-TopicART
Different roles Very differentVery similar
Blair = “Gas pipeline logistics”Watson = “Pipeline facilities planning”
Comparing Role Discovery Lynn Blair ⇔ Kimberly Watson
Traditional SNA Author-TopicART
Block structured NotNot
Comparing Group Discovery Enron TransWestern Division
McCallum Email Corpus 2004
• January - October 2004• 23k email messages• 825 people
From: [email protected]: NIPS and ....Date: June 14, 2004 2:27:41 PM EDTTo: [email protected]
There is pertinent stuff on the first yellow folder that iscompleted either travel or other things, so please sign thatfirst folder anyway. Then, here is the reminder of the thingsI'm still waiting for:
NIPS registration receipt.CALO registration receipt.
Thanks,Kate
McCallum Email Blockstructure
Four most prominent topicsin discussions with ____?
Two most prominent topicsin discussions with ____?
Words Prob
love 0.030514
house 0.015402
0.013659
time 0.012351
great 0.011334
hope 0.011043
dinner 0.00959
saturday 0.009154
left 0.009154
ll 0.009009
0.008282
visit 0.008137
evening 0.008137
stay 0.007847
bring 0.007701
weekend 0.007411
road 0.00712
sunday 0.006829
kids 0.006539
flight 0.006539
Words Prob
today 0.051152
tomorrow 0.045393
time 0.041289
ll 0.039145
meeting 0.033877
week 0.025484
talk 0.024626
meet 0.023279
morning 0.022789
monday 0.020767
back 0.019358
call 0.016418
free 0.015621
home 0.013967
won 0.013783
day 0.01311
hope 0.012987
leave 0.012987
office 0.012742
tuesday 0.012558
Topic 37Words Prob Sender Recipient Prob
jean 0.032349 jean mccallum 0.111932
james 0.02684 mccallum jean 0.072785
lab 0.017229 jean jean 0.056024
space 0.016292 mccallum allan 0.054266
ciir 0.015471 jean allan 0.035162
students 0.015354 allan mccallum 0.027778
bruce 0.014885 mccallum croft 0.025668
office 0.014768 mccallum grupen 0.021917
staff 0.013244 mccallum stowell 0.01887
funding 0.012072 mccallum [email protected]
mike 0.011017 grupen mccallum 0.018167
adam 0.010666 gwking mccallum 0.016057
move 0.010666 [email protected] 0.014299
grad 0.01008 mccallum saunders 0.014299
ll 0.009962 mccallum jensen 0.007267
rod 0.009494 mccallum culotta 0.006915
time 0.008556 gauthier mccallum 0.006564
asked 0.008556 grupen grupen 0.006446
student 0.008556 corrada mccallum 0.006212
today 0.008322 mccallum pal 0.005626
Topic 40
Words Prob Sender Recipient Prob
code 0.060565 hough mccallum 0.067076
mallet 0.042015 mccallum hough 0.041032
files 0.029115 mikem mccallum 0.028501
al 0.024201 culotta mccallum 0.026044
file 0.023587 saunders mccallum 0.021376
version 0.022113 mccallum saunders 0.019656
java 0.021499 pereira mccallum 0.017813
test 0.020025 casutton mccallum 0.017199
problem 0.018305 mccallum ronb 0.013514
run 0.015356 mccallum pereira 0.013145
cvs 0.013391 hough melinda.gervasio 0.013022
add 0.012776 mccallum casutton 0.013022
directory 0.012408 fuchun mccallum 0.010811
release 0.012285 mccallum culotta 0.009828
output 0.011916 ronb mccallum 0.009705
bug 0.011179 westy hough 0.009214
source 0.010197 xuerui corrada 0.0086
ps 0.009705 ghuang mccallum 0.008354
log 0.008968 khash mccallum 0.008231
created 0.0086 melinda.gervasiomccallum 0.008108
Pairs with highestrank difference between ART & SNA
5 other professors3 other ML researchers
Role-Author-Recipient-Topic Models
Summary
• Traditional SNA examines links,but not the language content on those links.
• Presented ART, an LDA-like model formessages sent in a social network.
• Future work:– Explicitly model & discover roles and groups– Integrate with coreference and relation extraction– Leverage others’ work on correlations and
topic shifts over time
Main Application Project:
Main Application Project:
ResearchPaper
Cites
Main Application Project:
ResearchPaper
Cites
Person
UniversityVenue
Grant
Groups
Expertise
Summary
• Conditional Random Fields combine the benefits of– Conditional probability models (arbitrary features)– Markov models (for sequences or other relations)
• Success in– Factorial finite state models– Jointly labeling distant entities– Coreference analysis– Segmentation uncertainty aiding coreference
• Current projects– Email, contact management, expert-finding, SNA– Mining the scientific literature
End of Talk
Top Related