TEMPORAL FACT EXTRACTION, DISAMBIGUATION, AND EVOLUTION
Arturas Mazeika and Marc Spaniol
Symposium on Bias and Diversity in IR, European Summer School of Information Retrieval
August 29 - September 2, 2011, Koblenz, Germany

Page 1:

TEMPORAL FACT EXTRACTION, DISAMBIGUATION, AND EVOLUTION

Arturas Mazeika and Marc Spaniol

Symposium on Bias and Diversity in IR

European Summer School of Information Retrieval

August 29 - September 2, 2011, Koblenz, Germany

Page 2:

APPLICATION: ENTITY TIMELINES

• Harvesting NEs

• Extracting time

• Canonicalization

• YAGO ontology

01.09.2011, Temporal Fact Extraction, Disambiguation, and Evolution, SBDIR@ESSIR 2011

Page 3:

OUTLINE

• Querying semantic knowledge bases

• Harvesting facts

– Wikipedia

– Web

– Temporal facts

Page 4:

QUERYING THE SEMANTIC WEB

Page 5:

WHAT’S WORKING? WHAT’S NOT?

QUERYING THE SEMANTIC WEB

Page 6:

Query: politicians who are also scientists?

?x isa politician .
?x isa scientist

Results: Benjamin Franklin, Zbigniew Brzezinski, Angela Merkel, …

http://www.mpi-inf.mpg.de/yago-naga/
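The two triple patterns of the query share the variable ?x, so answering it amounts to a join over matching triples. A minimal sketch of that idea in Python follows; the in-memory triple set, the underscore entity names, and the `match`/`query` helpers are assumptions made for illustration and are not the YAGO-NAGA query engine.

```python
# Toy triple store. The facts mirror the slide's results; the storage format is assumed.
TRIPLES = {
    ("Benjamin_Franklin", "isa", "politician"),
    ("Benjamin_Franklin", "isa", "scientist"),
    ("Angela_Merkel", "isa", "politician"),
    ("Angela_Merkel", "isa", "scientist"),
    ("Max_Planck", "isa", "scientist"),
}

def match(pattern, binding):
    """Yield extended bindings for one (s, p, o) pattern; terms starting with '?' are variables."""
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def query(patterns):
    """Join all patterns: a binding survives only if it satisfies each pattern in turn."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings

results = sorted(b["?x"] for b in query([("?x", "isa", "politician"),
                                         ("?x", "isa", "scientist")]))
print(results)
```

Max_Planck is dropped because only the second pattern matches him.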

QUERYING THE SEMANTIC WEB

Page 7:

[Figure: excerpt of the YAGO knowledge graph, http://www.mpi-inf.mpg.de/yago-naga/]

Nodes shown:
• entities: Max_Planck, Erwin_Planck, Angela Merkel, Kiel, Schleswig-Holstein, Germany, Nobel Prize
• classes: Entity, Person, Scientist, Physicist, Politician, Location, City, State, Country
• dates: Apr 23, 1858; Oct 4, 1947; Oct 23, 1944
• name strings: “Max Planck”, “Max Karl Ernst Ludwig Planck”, “Angela Merkel”, “Angela Dorothea Merkel”

Edge labels shown: instanceOf, subclass, bornOn, diedOn, bornIn, FatherOf, hasWon, citizenOf, locatedIn, means (with weights 0.9 and 0.1)

Accuracy: 95% (Suchanek et al.: WWW'07, Hoffart et al.: WWW'11)

Page 8:

KNOWLEDGE REPRESENTATION

...

• RDF (Resource Description Framework, W3C): subject-property-object (SPO) triples, binary relations; structure, but no (prescriptive) schema

• Relations, frames

• Description logics: OWL, DL-lite

• Higher-order logics, epistemic logics

facts (RDF triples):
1: (JimGray, hasAdvisor, MikeHarrison)
2: (SurajitChaudhuri, hasAdvisor, JeffUllman)
3: (Madonna, marriedTo, GuyRitchie)
4: (NicolasSarkozy, marriedTo, CarlaBruni)

facts about facts:
5: (1, inYear, 1968)
6: (2, inYear, 2006)
7: (3, validFrom, 22-Dec-2000)
8: (3, validUntil, Nov-2008)
9: (4, validFrom, 2-Feb-2008)
10: (2, source, SigmodRecord)
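Facts about facts work by giving every base triple an identifier that other triples can use as their subject. The sketch below shows one way to hold this in Python; the dictionary layout and the `describe` helper are assumptions for illustration, and only a subset of the slide's facts is included.

```python
# Base facts keyed by the identifiers used on the slide.
facts = {
    1: ("JimGray", "hasAdvisor", "MikeHarrison"),
    3: ("Madonna", "marriedTo", "GuyRitchie"),
    4: ("NicolasSarkozy", "marriedTo", "CarlaBruni"),
}

# Facts about facts: the subject is the identifier of another fact.
meta_facts = [
    (1, "inYear", "1968"),
    (3, "validFrom", "22-Dec-2000"),
    (3, "validUntil", "Nov-2008"),
    (4, "validFrom", "2-Feb-2008"),
]

def describe(fact_id):
    """Return the base fact together with all annotations attached to its identifier."""
    annotations = {p: o for (s, p, o) in meta_facts if s == fact_id}
    return facts[fact_id], annotations

print(describe(3))
```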

Page 9:

[Figure: excerpt of the YAGO knowledge graph, repeated from Page 7; http://www.mpi-inf.mpg.de/yago-naga/]

Accuracy: 95% (Suchanek et al.: WWW'07, Hoffart et al.: WWW'11)

Page 10:

WORDNET THESAURUS [MILLER/FELLBAUM 1998]

http://wordnet.princeton.edu/

3 concepts / classes & their synonyms (synsets)

Page 11:

WORDNET THESAURUS [MILLER/FELLBAUM 1998]

subclasses (hyponyms)

superclasses (hypernyms)

Page 12:

MAPPING: WIKIPEDIA → WORDNET [Suchanek: WWW'07, Ponzetto & Strube: AAAI'07]

[Figure: the entity Jim Gray (computer specialist) shown with candidate classes: Computer Scientist; American; Scientist; Sailor, Crewman; Missing Person; Chemist; Artist]

Page 13:

WIKIPEDIA CATEGORIES

Page 14:

MAPPING: WIKIPEDIA → WORDNET [Suchanek: WWW'07]

[Figure: Jim Gray (computer specialist) with his Wikipedia categories and candidate WordNet classes; edge labels: instanceOf, subclassOf; open mappings are marked “?”]

Wikipedia categories: American Computer Scientists; Database Researcher; Fellows of the ACM; People Lost at Sea

Candidate WordNet classes: American; Computer Scientist; Scientist; Database; Sailor, Crewman; Missing Person; Fellow (1), Comrade; Fellow (2), Colleague; Fellow (3) (of Society); Member (1), Fellow; Member (2), Extremity

• name similarity (edit dist., n-gram overlap)?
• context similarity (word/phrase level)?
• machine learning?

Page 15:

MAPPING: WIKIPEDIA → WORDNET [Suchanek: WWW'07, Ponzetto & Strube: AAAI'07]

Analyzing category names with a noun group parser (pre-modifier | head | post-modifier):

• American | Musicians | of Italian Descent
• American Folk | Music | of the 20th Century
• American Indy 500 | Drivers | on Pole Positions

Head word is key, should be in plural for instanceOf

Given: entity e in Wikipedia categories c1, …, ck

Wanted: instanceOf(e,c) and subclassOf(ci,c) for WN class c

Problem: vagueness & ambiguity of names c1, …, ck
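The split into pre-modifier, head and post-modifier can be approximated without a full parser. The sketch below assumes the head is the last word before the first preposition; that rule, the preposition list and the crude plural test are simplifications chosen for illustration, not the noun group parser used in the cited work.

```python
# Assumption: the head is the last word before the first preposition.
PREPOSITIONS = {"of", "in", "on", "at", "by", "from", "for", "with"}

def parse_category(name):
    """Split a category name into (pre_modifier, head, post_modifier)."""
    words = name.split()
    # Index of the first preposition, or the end of the name if there is none.
    cut = next((i for i, w in enumerate(words) if w.lower() in PREPOSITIONS), len(words))
    head = words[cut - 1] if cut > 0 else ""
    return " ".join(words[:max(cut - 1, 0)]), head, " ".join(words[cut:])

def looks_plural(word):
    """Crude plural test (assumption): ends in 's' but not in 'ss'."""
    w = word.lower()
    return w.endswith("s") and not w.endswith("ss")

for name in ["American Musicians of Italian Descent",
             "American Folk Music of the 20th Century",
             "American Indy 500 Drivers on Pole Positions"]:
    pre, head, post = parse_category(name)
    print(pre, "|", head, "|", post, "| plural head:", looks_plural(head))
```

On the three slide examples only "Music" fails the plural test, so that category would not be used for instanceOf.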

Page 16:

MAPPING WIKIPEDIA ENTITIES TO WORDNET CLASSES

Heuristic method:

for each ci do
  if head word w of category name ci is plural {
    1) match w against synsets of WordNet classes
    2) choose best fitting class c and set e ∈ c
    3) expand w by pre-modifier and set ci ⊆ w+ ⊆ c
  }

tuned conservatively: high precision, reduced recall [Suchanek: WWW'07]

• can also derive features this way

• feed into supervised classifier

Given: entity e in Wikipedia categories c1, …, ck

Wanted: instanceOf(e,c) and subclassOf(ci,c) for WN class c

Problem: vagueness & ambiguity of names c1, …, ck
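A rough Python rendering of the heuristic is given below. The miniature synset table, its weights, the singularisation rule and the `map_entity` helper are all hypothetical stand-ins; a real system would query WordNet and use a proper morphological analyser.

```python
# Hypothetical mini-WordNet: class name -> synset words with a prior weight.
# All entries and weights are made up for illustration.
SYNSETS = {
    "scientist": {"scientist": 1.0},
    "fellow_colleague": {"fellow": 0.5, "colleague": 0.5},
    "fellow_of_society": {"fellow": 0.3},
}

def singular(word):
    """Crude singularisation (assumption): strip one trailing 's'."""
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word

def map_entity(entity, categories):
    """categories: list of (pre_modifier, head) pairs taken from the category names."""
    facts = []
    for pre, head in categories:
        if singular(head) == head:      # head word is not plural: skip this category
            continue
        w = singular(head).lower()
        # 1) match w against the synsets, 2) choose the best fitting class c
        scored = [(syn[w], c) for c, syn in SYNSETS.items() if w in syn]
        if not scored:
            continue                    # conservative: no match, no fact
        c = max(scored)[1]
        facts.append(("instanceOf", entity, c))
        # 3) expand w by the pre-modifier: the category becomes a subclass of c
        if pre:
            facts.append(("subclassOf", f"{pre} {head}", c))
    return facts

result = map_entity("Jim_Gray", [("American Computer", "Scientists"),
                                 ("", "Fellows"),
                                 ("Database", "Researcher")])
print(result)
```

With these made-up priors "Fellows" resolves to the colleague sense rather than the society sense, which is the ambiguity problem the slide points to; "Researcher" is skipped because its head is singular.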

Page 17:

[Figure: excerpt of the YAGO knowledge graph, repeated from Page 7; http://www.mpi-inf.mpg.de/yago-naga/]

Accuracy: 95% (Suchanek et al.: WWW'07, Hoffart et al.: WWW'11)

Page 18:

WIKIPEDIA INFOBOXES

harvest by extraction rules:

• regex matching

• type checking

"(?i)IBL\|BEG\s*awards\s*=\s*(.*?)IBL\|END" => "$0 hasWonPrize @WikiLink($1)"
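The rule's left-hand side is an ordinary regular expression, so it can be tried directly in Python. In the sketch below the `IBL|BEG ... IBL|END` markers are assumed to be delimiters that a preprocessing step puts around each infobox attribute, `$0` is read as the article's entity, and `@WikiLink` is imitated by a second regex over `[[...]]` links; the sample string and the `harvest` helper are hypothetical.

```python
import re

# Regex from the slide; the surrounding markers are assumed preprocessing output.
RULE = re.compile(r"(?i)IBL\|BEG\s*awards\s*=\s*(.*?)IBL\|END")
# Stand-in for @WikiLink: the target of a [[target|label]] link.
WIKILINK = re.compile(r"\[\[([^\]|]+)")

def harvest(entity, infobox_text):
    """Return (entity, hasWonPrize, target) for every wiki link in the awards value."""
    facts = []
    for m in RULE.finditer(infobox_text):
        for target in WIKILINK.findall(m.group(1)):
            facts.append((entity, "hasWonPrize", target.strip().replace(" ", "_")))
    return facts

sample = "IBL|BEG awards = [[Nobel Prize in Physics]] (1918) IBL|END"
harvested = harvest("Max_Planck", sample)
print(harvested)
```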

Page 19:

TYPE CHECKING

Use consistency constraints to prune false candidates

Simple type checks: marriedTo(Planck, quantum physics)

ground atoms:
spouse(Hillary,Bill), spouse(Carla,Nicolas), spouse(Cecilia,Nicolas), spouse(Carla,Ben), spouse(Carla,Mick), spouse(Carla,Sofie)
f(Hillary), f(Carla), f(Cecilia), f(Sofie)
m(Bill), m(Nicolas), m(Ben), m(Mick)

FOL rules (restricted):
spouse(x,y) ∧ diff(y,z) ⇒ ¬spouse(x,z)
spouse(x,y) ∧ diff(w,x) ⇒ ¬spouse(w,y)
spouse(x,y) ⇒ f(x)
spouse(x,y) ⇒ m(y)
spouse(x,y) ⇒ (f(x) ∧ m(y)) ∨ (m(x) ∧ f(y))
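The ground atoms on this slide are enough to show how such constraints prune candidates. The sketch below applies one type rule and then flags candidates that clash under the two functionality rules; treating f(x) and m(y) as a hard filter and merely reporting conflicts (rather than resolving them by weights) are simplifying assumptions.

```python
female = {"Hillary", "Carla", "Cecilia", "Sofie"}
male = {"Bill", "Nicolas", "Ben", "Mick"}
candidates = [("Hillary", "Bill"), ("Carla", "Nicolas"), ("Cecilia", "Nicolas"),
              ("Carla", "Ben"), ("Carla", "Mick"), ("Carla", "Sofie")]

def type_ok(x, y):
    """Type rule as used here (assumption): f(x) and m(y) must both hold."""
    return x in female and y in male

# Step 1: type checking removes candidates with ill-typed arguments.
typed = [c for c in candidates if type_ok(*c)]

def conflicts(pairs):
    """Candidates that share a first or second argument with another candidate."""
    bad = set()
    for (x, y) in pairs:
        for (w, z) in pairs:
            if (x, y) != (w, z) and (x == w or y == z):
                bad.add((x, y))
    return bad

# Step 2: functionality constraints mark mutually exclusive candidates.
clashing = sorted(conflicts(typed))
print(typed)
print(clashing)
```

Only spouse(Hillary, Bill) passes both checks untouched; choosing among the clashing candidates is what the later reasoning steps address.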

Page 20:

O. Etzioni, M. Banko, M.J. Cafarella: Machine Reading, AAAI'06

T. Mitchell et al.: Populating the Semantic Web by Macro-Reading Internet Text, ISWC’09

MACHINE READING

Page 21:

PROSPERA: PATTERN-BASED HARVESTING

Facts & fact candidates:
(Hillary, Bill), (Carla, Nicolas), (Angelina, Brad), (Victoria, David), (Yoko, John), (Carla, Benjamin), (Larry, Google), (Kate, Pete)

Patterns:
• X and her husband Y
• X and Y on their honeymoon
• X and Y and their children
• X has been dating with Y
• X loves Y
• …

Pattern-based harvesting is
• good for recall
• noisy, drifting
• not robust enough for high precision
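The fact/pattern duality can be illustrated with a single bootstrapping round: seed facts yield patterns, and patterns yield new fact candidates. The sketch below is a toy under several assumptions (a made-up five-sentence corpus, entity mentions as single capitalised tokens, patterns as the literal text between the two mentions) and does not reproduce PROSPERA's pattern statistics or reasoning.

```python
import re

seeds = {("Hillary", "Bill"), ("Carla", "Nicolas")}
# Hypothetical mini-corpus.
corpus = [
    "Hillary and her husband Bill arrived.",
    "Hillary loves Bill",
    "Carla and her husband Nicolas waved.",
    "Angelina and her husband Brad smiled.",
    "Carla loves Benjamin",
    "Larry loves Google",
]

def learn_patterns(facts, sentences):
    """Collect the text between X and Y for every seed pair found in a sentence."""
    patterns = set()
    for x, y in facts:
        for s in sentences:
            m = re.search(re.escape(x) + r"\s+(.+?)\s+" + re.escape(y) + r"\b", s)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(patterns, sentences):
    """Find (X, Y) pairs of capitalised tokens connected by a learned pattern."""
    found = set()
    for p in patterns:
        rx = re.compile(r"\b([A-Z]\w+)\s+" + re.escape(p) + r"\s+([A-Z]\w+)\b")
        for s in sentences:
            found.update(rx.findall(s))
    return found

patterns = learn_patterns(seeds, corpus)
new_candidates = apply_patterns(patterns, corpus) - seeds
print(sorted(patterns))
print(sorted(new_candidates))
```

The pattern "loves" is learned from a correct seed yet also produces (Larry, Google), a small instance of the drift noted on the slide.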

Page 22:

PROSPERA: REASONING EXAMPLE

• Is “attended secondary school” a goodPattern for bornIn?

• Is “attended secondary school” a goodPattern for attendedSchoolIn?

• Elvis attended secondary school in Memphis.

• Elvis isBornIn Mississippi

• A person cannot be born in two places

• Memphis is not located in Mississippi

• => “attended secondary school” is not a goodPattern for bornIn

• Hermann Einstein attended secondary school in Germany.

• Hermann Einstein attendedSchoolIn Stuttgart

• Stuttgart is located in Germany

• => “attended secondary school” is a goodPattern for attendedSchoolIn

• Weighted MaxSat problem

• Find the best assignment of patterns to relations

• The assignment should maximize the weights of correct facts
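For a handful of variables, weighted MaxSat can be solved by enumerating all truth assignments. The sketch below encodes the Elvis example that way; the variable names, the clause set and every weight are invented for illustration, and real systems use approximate solvers rather than brute force.

```python
from itertools import product

# Boolean variables (names are illustrative):
#   A: "attended secondary school" expresses bornIn
#   B: "attended secondary school" expresses attendedSchoolIn
#   C: bornIn(Elvis, Memphis)
VARS = ["A", "B", "C"]

# Weighted clauses; each clause is a disjunction of (variable, wanted_value).
# All weights are made up.
CLAUSES = [
    (10, [("C", False)]),               # Elvis is born in Mississippi; one birthplace only
    (5,  [("A", False), ("C", True)]),  # A implies C: the pattern occurs with (Elvis, Memphis)
    (1,  [("A", True)]),                # weak prior: the pattern might express bornIn
    (1,  [("B", True)]),                # pattern co-occurs with a known attendedSchoolIn fact
]

def score(assign):
    """Total weight of the clauses satisfied by an assignment."""
    return sum(w for w, lits in CLAUSES if any(assign[v] == val for v, val in lits))

def solve():
    """Enumerate all assignments and return the one with maximal satisfied weight."""
    assignments = (dict(zip(VARS, vals)) for vals in product([False, True], repeat=len(VARS)))
    best = max(assignments, key=score)
    return best, score(best)

best, weight = solve()
print(best, weight)
```

The best assignment rejects the pattern for bornIn and accepts it for attendedSchoolIn, matching the conclusion drawn on the slide.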

Page 23:

PROSPERA: REASONING EXAMPLE

• Formalization

• Predefined rules

• Instantiate NEs

• Find best pattern-relation assignment

Page 24:

PROSPERA: WEB-SCALE EXPERIMENTS

• on ClueWeb'09 corpus (500 million English Web pages)

• with Hadoop cluster of 10x16 cores and 10x48 GB memory

                        PROSPERA                          ReadTheWeb [CMU]
Relation                #Facts   Precision   Prec@1000    #Facts   Precision
AthletePlaysForTeam     14685    82%         100%         456      100%
TeamPlaysAgainstTeam    15170    89%         100%         1068     99%
TeamMate                9666     86%         100%         ---      ---
FacultyAt               4394     96%         100%         ---      ---

www.mpi-inf.mpg.de/yago-naga/prospera/

[N. Nakashole et al.: WSDM'11]

Page 25:

PRAVDA: TEMPORAL KNOWLEDGE AND EVOLUTION

• Relation types
  – Base relations: (entity_i, entity_j)
  – Temporal relations: (entity_i, entity_j)@t_k
  – Type signature: (TYPE_i, TYPE_j), e.g., bornIn(PERSON, LOCATION)

• Input:
  – A set of relations of interest with their type signatures.
    • e.g., playsForClubTemp(PERSON, CLUB), …
  – A small number of labeled positive/negative seed facts for each relation.
    • e.g., (David_Beckham, Real_Madrid)@2007 vs. (David_Beckham, Manchester_U)@2007
  – A large corpus of textual documents.
    • e.g., ClueWeb09, Wikipedia, …

• Output:
  – A set of new facts for each relation.
    • e.g., (Lionel_Messi, FC_Barcelona)@2008, (Michael_Ballack, Bayern_Munich)@2005, (Ronaldo, Real_Madrid)@2004, ...

Y. Wang et al.: CIKM'11

Page 26:

PRAVDA: FRAMEWORK

[Figure: PRAVDA pipeline]

Stages: Candidate Gathering, Pattern Analysis, Graph Construction, Label Propagation

Inputs: corpus + relations of interest; pos./neg. seed facts

Intermediate data: fact candidates, candidate sentences, seed patterns, graph

Output: base & temporal facts

Page 27:

PRAVDA: CANDIDATE GATHERING

• We extract and disambiguate entities from the corpora (sentence level)

• Fact candidates
  – Two entities, whose types are pertinent to a relation of interest, appear in the same sentence
  – For temporal relations, an associated temporal mention must also appear in the same sentence

• Candidate sentences
  – A sentence that contains at least one fact candidate.

“Beckham played for Real and Galaxy.”

(Real_Madrid, LA_Galaxy)

“Beckham joined Real in 2003.”

(David_Beckham, Real_Madrid)@2003

(David_Beckham, Real_Madrid)
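Candidate gathering as described above can be sketched as a type-signature filter over entity pairs in a sentence, plus a check for a temporal mention. In the code below, entity disambiguation is assumed to have happened already (mentions arrive as (entity, type) pairs), the year regex is a stand-in for real temporal tagging, and the `gather` helper is hypothetical.

```python
import re

SIGNATURE = ("PERSON", "CLUB")              # type signature of the relation of interest
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")    # crude temporal mention detector (assumption)

def gather(sentence, mentions):
    """mentions: list of (entity, type) pairs already disambiguated in the sentence."""
    years = YEAR.findall(sentence)
    candidates = []
    for e1, t1 in mentions:
        for e2, t2 in mentions:
            if (t1, t2) == SIGNATURE:
                candidates.append((e1, e2))                      # base fact candidate
                candidates.extend((e1, e2, y) for y in years)    # temporal fact candidates
    return candidates

s1 = gather("Beckham joined Real in 2003.",
            [("David_Beckham", "PERSON"), ("Real_Madrid", "CLUB")])
s2 = gather("Beckham played for Real and Galaxy.",
            [("David_Beckham", "PERSON"), ("Real_Madrid", "CLUB"), ("LA_Galaxy", "CLUB")])
print(s1)
print(s2)
```

Under this signature the pair (Real_Madrid, LA_Galaxy) is never produced, because two CLUB entities do not match (PERSON, CLUB).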

Page 28:

PRAVDA: PATTERN REPRESENTATION

• Pattern is a string between two named entities
• The string is changed and generalized for better IE

• Surface string
  – “finally moved from Real Madrid before his recent joining”

• Compressed surface string
  – Only keep verbs, nouns and prepositions.
  – “move from Real Madrid before join”

• Compressed and lifted surface string
  – Replacing entity mentions by their types
  – “move from CLUB before join”

• n-grams based on compressed and lifted surface string
  – {“move from CLUB”, “from CLUB before”, “CLUB before join”}

• Final representation: (TYPE1, TYPE2, p)
  – The pattern for fact candidate (David_Beckham, LA_Galaxy) is (PERSON, CLUB, {“move from CLUB”, “from CLUB before”, “CLUB before join”})

Example sentence: “Beckham finally moved from Real Madrid before his recent joining LA_Galaxy in 2007.”
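The compress, lift and n-gram steps can be chained in a few lines once the surface string is tokenised. The sketch below assumes lemmatisation and part-of-speech tagging have been done beforehand (the tags are written by hand here) and that entity mentions are marked with a hypothetical `ENTITY` tag; n = 3 matches the slide's example.

```python
KEEP = {"VERB", "NOUN", "ADP"}   # verbs, nouns, prepositions

def pattern_ngrams(tokens, entity_types, n=3):
    """tokens: list of (lemma, tag); entity mentions carry the tag 'ENTITY'."""
    lifted = []
    for lemma, tag in tokens:
        if tag == "ENTITY":
            lifted.append(entity_types[lemma])   # lift the mention to its type
        elif tag in KEEP:
            lifted.append(lemma)                 # compress: drop all other word classes
    return [" ".join(lifted[i:i + n]) for i in range(len(lifted) - n + 1)]

# Hand-tagged surface string between David_Beckham and LA_Galaxy.
tokens = [("finally", "ADV"), ("move", "VERB"), ("from", "ADP"),
          ("Real_Madrid", "ENTITY"), ("before", "ADP"), ("his", "PRON"),
          ("recent", "ADJ"), ("join", "VERB")]
ngrams = pattern_ngrams(tokens, {"Real_Madrid": "CLUB"})
print(("PERSON", "CLUB", ngrams))
```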

Page 29:

PRAVDA: PATTERN ANALYSIS

Assign patterns to relations. The pattern
– must be frequent in positive seeds
– must be infrequent in negative seeds:

conf(p, Ri) = Num_Pos / (Num_Pos + Num_Neg)

playsForClub = {“sign for”:1.0, “score for”:1.0, “stay at”:0.8}
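The confidence formula is a simple ratio and can be checked against the slide's values. The co-occurrence counts in the sketch below are hypothetical, chosen only so that the ratios come out as 1.0, 1.0 and 0.8; returning 0.0 for an unseen pattern is an added convention.

```python
def pattern_confidence(num_pos, num_neg):
    """conf(p, R) = Num_Pos / (Num_Pos + Num_Neg); 0.0 if the pattern was never seen."""
    total = num_pos + num_neg
    return num_pos / total if total else 0.0

# Hypothetical counts of co-occurrence with positive and negative seed facts.
counts = {"sign for": (12, 0), "score for": (7, 0), "stay at": (4, 1)}
plays_for_club = {p: pattern_confidence(*c) for p, c in counts.items()}
print(plays_for_club)
```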

Page 30:

PRAVDA: GRAPH GENERATION

• A weighted undirected graph G(V, E, W)
  – Vertices are either
    • VF (facts): (David_Beckham, Real_Madrid)@2003, or
    • VP (patterns): (PLAYER, CLUB, {“sign for”:1.0, “score for”:1.0})
  – Edge set E
    • Edge type 1: between a fact vertex vf and a pattern vertex vp
      – The edge weight is calculated using the number of sentences which contain the fact of vf and the pattern of vp
    • Edge type 2: between two pattern vertices
      – The edge weight is defined as the similarity of the two patterns, based on:
        » the two patterns' type signatures
        » whether they share the same verb and preposition
        » distance-weighted Jaccard similarity

Page 31:

[Figure: example graph with fact and pattern vertices built from the sentences below; edge weights shown: 0.7, 0.9, 0.3, 0.6, 0.67]

• Beckham's new contract with Real starts from 2003.
• Beckham finally moved to Real in 2003.
• Beckham finally moved to Spain in 2003.
• Rafael's last minute move to Hotspur is the best transfer in 2010.

Edge type 1:
# of sentences containing VF1 and VP1 = 25
W(VF1, VP1) = 1 - e^((-1) * 0.03 * 25) = 0.7

Edge type 2:
VP2 = {“move to”:1}
VP4 = {“minute move”:1, “move to”:2}
Jaccard = {“move to”} / {“minute move”, “move to”} = 2/(1+2) = 0.67
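Both edge weights can be written as small functions. The sketch below reads edge type 1 as 1 - e^(-beta * n) and reproduces the slide's Jaccard arithmetic by taking the larger weight wherever both patterns contain an n-gram; both readings are assumptions, as are the function names. With the printed constants (0.03 and 25 sentences) the exponential form evaluates to about 0.53 rather than the 0.7 shown in the figure, so the slide presumably used a different constant or form, and beta is left as a parameter.

```python
import math

def fact_pattern_weight(num_sentences, beta=0.03):
    """Edge type 1, read as 1 - e^(-beta * n); beta = 0.03 is the constant printed on the slide."""
    return 1.0 - math.exp(-beta * num_sentences)

def weighted_jaccard(p, q):
    """Edge type 2, following the slide's arithmetic: weight of the shared n-grams
    over the weight of all n-grams, using the larger weight where both patterns
    contain an n-gram (this reading is an assumption)."""
    def weight(g):
        return max(p.get(g, 0), q.get(g, 0))
    shared = sum(weight(g) for g in p.keys() & q.keys())
    total = sum(weight(g) for g in p.keys() | q.keys())
    return shared / total if total else 0.0

vp2 = {"move to": 1}
vp4 = {"minute move": 1, "move to": 2}
similarity = weighted_jaccard(vp2, vp4)
print(round(similarity, 2))
print(round(fact_pattern_weight(25), 2))
```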

Page 32:

PRAVDA: LABEL PROPAGATION (NO CONSTRAINTS)

• Labels on vertices indicate a relation and its confidence.
• Each vertex vi gets a vector Yi of (m+1) labels
• Yi^r indicates the initial confidence of vertex vi holding the relation r.
  – Fact vertex: if vi is a seed fact for r, then Yi^r = 1.0.
  – Pattern vertex: if vi is a seed pattern of r, then Yi^r = 1.0.
  – Otherwise: Yi^r = 0.

Y_VF1: (playsForClub: 1.0; joinsClub: 0; leavesClub: 0; none: 0)

Y_VP2: (playsForClub: 0; joinsClub: 1.0; leavesClub: 0; none: 0)

• Labels propagate via edges into Ŷi; objective (minimized over Ŷ):

$$\sum_{l=1}^{m+1}\Bigg[\underbrace{\sum_{i=1}^{n} s_i^l\,\big(Y_i^l-\hat{Y}_i^l\big)^2}_{\text{seed label loss}} \;+\; \underbrace{\mu_1\sum_{i,j=1}^{n} w_{ij}\,\big(\hat{Y}_i^l-\hat{Y}_j^l\big)^2}_{\text{edge loss}} \;+\; \underbrace{\mu_2\sum_{i=1}^{n}\big(\hat{Y}_i^l-r_i^l\big)^2}_{\text{regularization}}\Bigg]$$

Ŷ_VF1: (playsForClub: 0.8; joinsClub: 0.7; leavesClub: 0.1; none: 0.01) → playsForClub
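An objective made of a seed label loss, an edge loss and a regularization term, all quadratic, can be minimized label by label with a simple fixed-point iteration. The sketch below does this for one label on a three-vertex chain (seed fact, pattern, fact candidate). The update rule is derived by setting the gradient of such a quadratic objective to zero for a symmetric weight matrix; the names `mu1`, `mu2`, `s`, `r`, their values and the example graph are assumptions, and this is not presented as the solver used in the cited work.

```python
def propagate(Y, W, s, r, mu1=1.0, mu2=0.01, iterations=200):
    """Jacobi iteration for one label.
    Y: initial confidences, W: symmetric weight matrix with zero diagonal,
    s: seed indicators, r: regularisation targets."""
    n = len(Y)
    Yhat = list(Y)
    for _ in range(iterations):
        new = []
        for i in range(n):
            neigh = sum(W[i][j] * Yhat[j] for j in range(n))   # weighted neighbour scores
            degree = sum(W[i])
            # Stationary point of the quadratic objective with respect to Yhat[i].
            num = s[i] * Y[i] + 2 * mu1 * neigh + mu2 * r[i]
            den = s[i] + 2 * mu1 * degree + mu2
            new.append(num / den)
        Yhat = new
    return Yhat

# Chain: seed fact (0) - pattern (1) - fact candidate (2); edge weights are illustrative.
W = [[0.0, 0.7, 0.0],
     [0.7, 0.0, 0.9],
     [0.0, 0.9, 0.0]]
scores = propagate(Y=[1.0, 0.0, 0.0], W=W, s=[1, 0, 0], r=[0.0, 0.0, 0.0])
print([round(v, 3) for v in scores])
```

Confidence decays with distance from the seed: the pattern vertex scores below the seed fact, and the fact candidate below the pattern.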

Page 33:

PRAVDA: INCORPORATING CONSTRAINTS

• Inclusion constraints (IC)
  – Relation level: joinsClub(David_Beckham, Real_Madrid) → worksForClub(David_Beckham, Real_Madrid)

• Exclusion constraints (EC)
  – Relation level: isSonOf(George_W._Bush, George_H._W._Bush) → NOT isDaughterOf(George_W._Bush, George_H._W._Bush)
  – Entity level: bornIn(Albert_Einstein, Germany) → NOT bornIn(Albert_Einstein, United_States)

[Equation: the label propagation objective of the previous slide (seed label loss, edge loss, regularization) extended by three additional penalty terms, labelled IC; EC, entity level; EC, relation level. The exact form of the added terms is not recoverable from this transcript.]

Page 34:

PRAVDA: EXPERIMENTAL STUDY

• Data set

– 23000 soccer players and celebrities in Wikipedia articles

– 110000 online news articles mentioning persons in the “FIFA 100 list”

– 88000 news articles mentioning persons in the “Forbes 100 list”

• Prominent facts are chosen as seed facts according to their frequency

Page 35:

PRAVDA: BASE FACT EXTRACTION RESULT

100 positive seeds and 10 negative seeds

Page 36:

PRAVDA: TEMPORAL FACT EXTRACTION RESULT

Page 37:

CONCLUSION AND OUTLOOK

Entities & Classes
strong success story, some problems left:
• large taxonomies of classes with individual entities
• long tail calls for new methods
• entity disambiguation remains grand challenge

Relationships
good progress, but many challenges left:
• recall & precision by patterns & reasoning
• efficiency & scalability
• soft rules, hard constraints, richer logics, …
• open-domain discovery of new relation types

Temporal Knowledge
widely open (fertile) research ground:
• uncertain / incomplete temporal scopes of facts
• joint reasoning on ER facts and time scopes

Page 38:

REFERENCES

• F.M. Suchanek, G. Kasneci, G. Weikum: Yago: a core of semantic knowledge. WWW 2007

• J. Hoffart, F.M. Suchanek, K. Berberich, et al.: YAGO2: exploring and querying world knowledge in time, space, context, and many languages. WWW 2011

• F.M. Suchanek et al.: SOFIE: a self-organizing framework for information extraction. WWW 2009

• Y. Wang, M. Zhu, L. Qu, M. Spaniol, G. Weikum: Timely YAGO: harvesting, querying, and visualizing temporal knowledge from Wikipedia. EDBT 2010

• Y. Wang, L. Qu, B. Yang, M. Spaniol, G. Weikum: Harvesting Facts from Textual Web Sources by Constrained Label Propagation. CIKM 2011

• A. Mazeika, T. Tylenda, G. Weikum: Entity Timelines: Visual Analytics and Named Entity Evolution. CIKM 2011
