Page 1:

Semantic Search: from Names and Phrases to Entities and Relations

Gerhard Weikum
Max Planck Institute for Informatics & Saarland University
http://www.mpi-inf.mpg.de/~weikum/

Page 2:

Acknowledgements

Page 3:

Big Picture: Opportunities Now!

[Diagram labels: Very Large Knowledge Bases; KB Population; Info Extraction; Semantic Authoring; Entity Linkage; Disambiguation; Semantic Docs; Web of Data; Web of Users & Contents]

Page 4:

Big Picture: Opportunities Now!

[Diagram as on page 3]

This talk: How do we search this world of knowledge, data, and text (and cope with ambiguity)?

For knowledge harvesting, see talks at College de France and at VLDB School in Kunming

Page 6:

Web of Data: RDF, Tables, Microdata

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png

30 billion SPO triples (RDF) and growing

YAGO:
• 10M entities in 350K classes
• 120M facts for 100 relations
• 100 languages
• 95% accuracy

• 4M entities in 250 classes
• 500M facts for 6000 properties
• live updates

• 25M entities in 2000 topics
• 100M facts for 4000 properties
• powers Google knowledge graph

Ennio_Morricone type composer
Ennio_Morricone type GrammyAwardWinner
composer subclassOf musician
Ennio_Morricone bornIn Rome
Rome locatedIn Italy
Ennio_Morricone created Ecstasy_of_Gold
Ennio_Morricone wroteMusicFor The_Good,_the_Bad_,and_the_Ugly
Sergio_Leone directed The_Good,_the_Bad_,and_the_Ugly
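
As a rough illustration of how such SPO triples can be stored and queried, the following Python sketch keeps a few of the facts listed above in a set and derives implied types through subclassOf. The helper names `match` and `types_of` are hypothetical and not part of any YAGO tooling.

```python
# A few facts copied from the slide; the helper functions are illustrative only.
TRIPLES = {
    ("Ennio_Morricone", "type", "composer"),
    ("Ennio_Morricone", "type", "GrammyAwardWinner"),
    ("composer", "subclassOf", "musician"),
    ("Ennio_Morricone", "bornIn", "Rome"),
    ("Rome", "locatedIn", "Italy"),
    ("Ennio_Morricone", "created", "Ecstasy_of_Gold"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples that agree with the given fields; None is a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def types_of(triples, entity):
    """Direct types of an entity plus all superclasses reachable via subclassOf."""
    found = set()
    frontier = [t[2] for t in match(triples, s=entity, p="type")]
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.add(cls)
            frontier.extend(t[2] for t in match(triples, s=cls, p="subclassOf"))
    return found


print(match(TRIPLES, p="bornIn"))
print(sorted(types_of(TRIPLES, "Ennio_Morricone")))
```

The second call also reports musician, which is not stated directly but follows from composer subclassOf musician.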

Page 7:

Linked RDF Triples on the Web

[Graph figure. Nodes: dbpedia.org/resource/Ennio_Morricone; dbpedia.org/resource/Rome; rdf.freebase.com/ns/en.rome; data.nytimes.com/51688803696189142301; geonames.org/3169070/roma (Coord: N 41° 54' 10'' E 12° 29' 2''); yago/wikicategory:ItalianComposer; yago/wordnet:Artist109812338; yago/wordnet:Actor109765278; imdb.com/name/nm0910607/; imdb.com/title/tt0361748/. Edge labels: owl:sameAs; rdf:type; rdfs:subclassOf; dbpprop:citizenOf; prop:actedIn; prop:composedMusicFor]

500 million links

Page 8:

Embedding (RDF) Microdata in HTML Pages

May 2, 2011

Maestro Morricone will perform on the stage of the Smetana Hall to conduct the Czech National Symphony Orchestra and Choir. The concert will feature both Classical compositions and soundtracks such as the Ecstasy of Gold. In programme two concerts for July 14th and 15th.

<html … May 2, 2011
<div typeof=event:music>
<span id="Maestro_Morricone">Maestro Morricone
<a rel="sameAs" resource="dbpedia/Ennio_Morricone"/></span>
…
<span property="event:location">Smetana Hall</span>
…
<span property="rdf:type" resource="yago:performance">The concert</span> will feature …
<span property="event:date" content="14-07-2011"></span>July 1
</div>

Supported by RDFa and microformats like schema.org

Page 9:

Outline

Opportunities Now

Entity Name Disambiguation

Question Answering

Disambiguation Reloaded

Wrap-Up

Semantic Search Today

Page 10-14:

Semantic Search Today (1)

Page 15:

Semantic Search Today (2)

Select ?x Where {
?x type composer [western movie] .
?x wasBornIn ?y . ?y locatedIn Europe . }

Page 16:

Semantic Search Today (2)

Select ?x Where {
?x type composer .
?x participatedIn ?y . ?y type western_film . }
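
To make the semantics of such a conjunctive triple-pattern query concrete, here is a minimal Python sketch that evaluates the page 16 query by a nested-loop join. The facts in `FACTS` and the helper `solve` are assumptions made for illustration; a real SPARQL engine works very differently under the hood.

```python
# Toy facts, invented for illustration (not taken from a real knowledge base dump).
FACTS = [
    ("Ennio_Morricone", "type", "composer"),
    ("Ennio_Morricone", "participatedIn", "The_Good_the_Bad_and_the_Ugly"),
    ("The_Good_the_Bad_and_the_Ugly", "type", "western_film"),
    ("Hans_Zimmer", "type", "composer"),
    ("Hans_Zimmer", "participatedIn", "Inception"),
    ("Inception", "type", "science_fiction_film"),
]


def is_var(term):
    """Variables are written with a leading question mark, as in the slides."""
    return term.startswith("?")


def solve(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from solve(rest, triples, new)


QUERY = [
    ("?x", "type", "composer"),
    ("?x", "participatedIn", "?y"),
    ("?y", "type", "western_film"),
]
answers = sorted({b["?x"] for b in solve(QUERY, FACTS)})
print(answers)
```

The shared variables ?x and ?y act as join conditions between the three patterns.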

Page 17-19:

Semantic Search Today (3)

Page 20-21:

Semantic Search Today (4)

Key problem in semantic search: diversity and ambiguity of names and phrases!

Page 22:

Outline

Opportunities Now

Entity Name Disambiguation

Question Answering

Disambiguation Reloaded

Wrap-Up

Semantic Search Today

Page 23:

Three Different NLP Problems

Harry fought with you know who. He defeats the dark lord.

Three NLP tasks:

1) named-entity detection: segment & label by HMM or CRF (e.g. Stanford NER tagger)
2) co-reference resolution: link to preceding NP (trained classifier over linguistic features)
3) named-entity disambiguation: map each mention (name) to canonical entity (entry in KB)

Candidate entities: Harry Potter, Dirty Harry, Prince Harry of England, Lord Voldemort, The Who (band)

Page 24:

Named Entity Disambiguation

Sergio talked to Ennio about Eli's role in the Ecstasy scene. This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Mentions (surface names) are mapped to entities (meanings) in the KB:

Sergio means Sergio_Leone
Sergio means Serge_Gainsbourg
Ennio means Ennio_Antonelli
Ennio means Ennio_Morricone
Eli means Eli_(bible)
Eli means ExtremeLightInfrastructure
Eli means Eli_Wallach
Ecstasy means Ecstasy_(drug)
Ecstasy means Ecstasy_of_Gold
trilogy means Star_Wars_Trilogy
trilogy means Lord_of_the_Rings
trilogy means Dollars_Trilogy
… … …

[Figure: mentions linked to candidate entities, with the correct mapping still open (?). Entity nodes: Eli (bible); Eli Wallach; Ecstasy of Gold; Ecstasy (drug); Dollars Trilogy; Lord of the Rings; Star Wars Trilogy; Benny Andersson; Benny Goodman]

Page 25:

Mention-Entity Graph

Sergio talked to Ennio about Eli's role in the Ecstasy scene. This sequence on the graveyard was a highlight in Sergio's trilogy of western films.

Candidate entities: Eli (bible), Eli Wallach, Ecstasy of Gold, Ecstasy (drug), Dollars Trilogy, Lord of the Rings, Star Wars

KB+Stats: weighted undirected graph with two types of nodes

Popularity(m,e):
• freq(e|m)
• length(e)
• #links(e)

Similarity(m,e):
• cos/Dice/KL (context(m), context(e))
• bag-of-words or language model: words, bigrams, phrases
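
A minimal sketch of two of the similarity measures named above (cosine and Dice) over bag-of-words contexts might look as follows. The entity context strings are invented for illustration, and the whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter


def bag(text):
    """Lower-cased bag of words; tokenization by whitespace is a simplification."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def dice(a, b):
    """Dice coefficient over the sets of distinct words."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if (sa or sb) else 0.0


# Context of the mention "Ecstasy" (from the example sentence) and two
# hypothetical entity contexts.
mention_ctx = bag("role in the ecstasy scene on the graveyard in the trilogy of western films")
gold_ctx = bag("soundtrack piece by morricone for the graveyard scene of a western film trilogy")
drug_ctx = bag("psychoactive drug used at parties")

for name, ctx in [("Ecstasy_of_Gold", gold_ctx), ("Ecstasy_(drug)", drug_ctx)]:
    print(name, round(cosine(mention_ctx, ctx), 3), round(dice(mention_ctx, ctx), 3))
```

With these toy contexts, the film-music reading scores higher on both measures because it shares words such as "graveyard" and "trilogy" with the mention context.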

Page 26:

Mention-Entity Graph

Example sentence, candidate entities, and Popularity and Similarity weights as on page 25.

joint mapping

Page 27:

Mention-Entity Graph

Example sentence and candidate entities as on page 25.

KB+Stats: weighted undirected graph with two types of nodes

Popularity(m,e):
• freq(m,e|m)
• length(e)
• #links(e)

Similarity(m,e):
• cos/Dice/KL (context(m), context(e))

Coherence(e,e'):
• dist(types)
• overlap(links)
• overlap(anchor words)

Page 28:

Mention-Entity Graph

Example sentence and candidate entities as on page 25; Popularity, Similarity, and Coherence weights as on page 27.

Coherence via dist(types), with the types of three candidate entities:
• Eli Wallach: American Jews, film actors, artists, Academy Award winners
• Ecstasy of Gold: Metallica songs, Ennio Morricone songs, artifacts, soundtrack music
• Dollars Trilogy: spaghetti westerns, film trilogies, movies, artifacts

Page 29:

Mention-Entity Graph

Example sentence and candidate entities as on page 25; Popularity, Similarity, and Coherence weights as on page 27.

Coherence via overlap(links), with the links of three candidate entities:
• Eli Wallach: http://.../wiki/Dollars_Trilogy, http://.../wiki/The_Good,_the_Bad,_the_Ugly, http://.../wiki/Clint_Eastwood, http://.../wiki/Honorary_Academy_Award
• Ecstasy of Gold: http://.../wiki/The_Good,_the_Bad,_the_Ugly, http://.../wiki/Metallica, http://.../wiki/Bellagio_(casino), http://.../wiki/Ennio_Morricone
• Dollars Trilogy: http://.../wiki/Sergio_Leone, http://.../wiki/The_Good,_the_Bad,_the_Ugly, http://.../wiki/For_a_Few_Dollars_More, http://.../wiki/Ennio_Morricone

Page 30:

Mention-Entity Graph

Example sentence and candidate entities as on page 25; Popularity, Similarity, and Coherence weights as on page 27.

Coherence via overlap(anchor words), with the anchor words of three candidate entities:
• Ecstasy of Gold: Metallica on Morricone tribute, Bellagio water fountain show, Yo-Yo Ma, Ennio Morricone composition
• Eli Wallach: The Magnificent Seven, The Good, the Bad, and the Ugly, Clint Eastwood, University of Texas at Austin
• Dollars Trilogy: For a Few Dollars More, The Good, the Bad, and the Ugly, Man with No Name trilogy, soundtrack by Ennio Morricone

Page 31:

Joint Mapping

• Build mention-entity graph or joint-inference factor graph from knowledge and statistics in KB
• Compute high-likelihood mapping (ML or MAP) or dense subgraph such that: each m is connected to exactly one e (or at most one e)

[Figure: mention-entity graph with numeric edge weights]

Page 32:

Coherence Graph Algorithm

• Compute dense subgraph to maximize min weighted degree among entity nodes such that: each m is connected to exactly one e (or at most one e)
• Greedy approximation: iteratively remove weakest entity and its edges
• Keep alternative solutions, then use local/randomized search
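
The greedy approximation can be sketched roughly as below. This is a simplified illustration, not the published algorithm: it omits keeping alternative solutions and the local/randomized search, and all weights are made-up numbers rather than the values in the slide figure.

```python
def greedy_disambiguate(me_weights, ee_weights):
    """me_weights: {mention: {entity: weight}};
    ee_weights: {frozenset({e1, e2}): weight} for entity-entity coherence."""
    candidates = {m: set(es) for m, es in me_weights.items()}

    def weighted_degree(e, alive):
        # Sum of mention-entity edges plus coherence edges to surviving entities.
        deg = sum(me_weights[m][e] for m, es in candidates.items() if e in es)
        deg += sum(w for pair, w in ee_weights.items() if e in pair and pair <= alive)
        return deg

    while any(len(es) > 1 for es in candidates.values()):
        alive = set().union(*candidates.values())
        # An entity may only be removed if no mention would lose its last candidate.
        removable = [e for e in alive
                     if all(len(es) > 1 for es in candidates.values() if e in es)]
        if not removable:
            break
        weakest = min(removable, key=lambda e: (weighted_degree(e, alive), e))
        for es in candidates.values():
            es.discard(weakest)
    # If several candidates survive, fall back to the strongest mention-entity edge.
    return {m: max(es, key=lambda e: me_weights[m][e]) for m, es in candidates.items()}


# Hypothetical weights: popularity alone favours the wrong readings.
ME = {
    "Eli": {"Eli_(bible)": 50, "Eli_Wallach": 30},
    "Ecstasy": {"Ecstasy_(drug)": 60, "Ecstasy_of_Gold": 20},
    "trilogy": {"Star_Wars_Trilogy": 40, "Dollars_Trilogy": 30},
}
EE = {
    frozenset({"Eli_Wallach", "Ecstasy_of_Gold"}): 80,
    frozenset({"Eli_Wallach", "Dollars_Trilogy"}): 90,
    frozenset({"Ecstasy_of_Gold", "Dollars_Trilogy"}): 100,
}
mapping = greedy_disambiguate(ME, EE)
print(mapping)
```

In this toy graph the coherence edges among the three film-related entities outweigh the higher popularity of the competing candidates, so the weakly connected entities are removed first.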

[Figure: mention-entity graph annotated with numeric weights]

[J. Hoffart et al.: EMNLP'11]

Page 33:

Mention-Entity Popularity Weights

• Collect hyperlink anchor-text / link-target pairs from
  • Wikipedia redirects
  • Wikipedia links between articles
  • Interwiki links between Wikipedia editions
  • Web links pointing to Wikipedia articles
  • …
• Build statistics to estimate P[entity | name]

• Need dictionary with entities' names:
  • full names: Arnold Alois Schwarzenegger, Los Angeles, Microsoft Corp.
  • short names: Arnold, Arnie, Mr. Schwarzenegger, New York, Microsoft, …
  • nicknames & aliases: Terminator, City of Angels, Evil Empire, …
  • acronyms: LA, UCLA, MS, MSFT
  • role names: the Austrian action hero, Californian governor, CEO of MS, …

… plus gender info (useful for resolving pronouns in context): Bill and Melinda met at MS. They fell in love and he kissed her.

[Milne/Witten 2008, Spitkovsky/Chang 2012]
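
Estimating P[entity | name] from anchor-text / link-target pairs amounts to relative-frequency counting. The sketch below assumes the pairs have already been extracted; the example pairs and the function name `build_prior` are hypothetical.

```python
from collections import Counter, defaultdict


def build_prior(anchor_pairs):
    """Estimate P[entity | name] as the relative frequency of link targets per anchor text."""
    counts = defaultdict(Counter)
    for name, entity in anchor_pairs:
        counts[name.lower()][entity] += 1
    return {
        name: {entity: c / sum(targets.values()) for entity, c in targets.items()}
        for name, targets in counts.items()
    }


# Invented anchor-text / link-target pairs, for illustration only.
PAIRS = (
    [("Arnie", "Arnold_Schwarzenegger")] * 3
    + [("Arnie", "Arnold_Palmer")]
    + [("LA", "Los_Angeles")] * 4
    + [("LA", "Louisiana")]
)
prior = build_prior(PAIRS)
print(prior["arnie"])
print(prior["la"])
```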

Page 34:

Mention-Entity Similarity Edges

Precompute characteristic keyphrases q for each entity e: anchor texts or noun phrases in e's page with high PMI:

$$\mathrm{weight}(q, e) = \log \frac{\mathrm{freq}(q, e)}{\mathrm{freq}(q)\,\mathrm{freq}(e)}$$

Match keyphrase q of candidate e in context of mention m (first factor: extent of partial matches; second factor: weight of matched words):

$$\mathrm{score}(q \mid e) \sim \frac{\#\,\text{matching words}}{\text{length of } \mathrm{cover}(q)} \left( \frac{\sum_{w \in \mathrm{cover}(q)} \mathrm{weight}(w \mid e)}{\sum_{w \in q} \mathrm{weight}(w \mid e)} \right)^{1+\gamma}$$

Compute overall similarity of context(m) and candidate e:

$$\mathrm{score}(e \mid m) \sim \sum_{q \,\in\, \mathrm{keyphrases}(e) \text{ in } \mathrm{context}(m)} \mathrm{score}(q)\; \mathrm{dist}(\mathrm{cover}(q), m)$$

Example keyphrase: "Metallica tribute to Ennio Morricone"

The Ecstasy piece was covered by Metallica on the Morricone tribute album.
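
The partial-match idea can be illustrated with a short Python sketch. It assumes that cover(q) is the shortest window of the context containing every keyphrase word that occurs in the context, uses uniform word weights, and picks an exponent of 2; all three are assumptions of this sketch rather than settings confirmed by the slide.

```python
def shortest_cover(context_tokens, phrase_words):
    """Shortest window (start, end) containing every phrase word present in the context."""
    present = set(phrase_words) & set(context_tokens)
    if not present:
        return None
    best = None
    for start, tok in enumerate(context_tokens):
        if tok not in present:
            continue
        seen = set()
        for end in range(start, len(context_tokens)):
            if context_tokens[end] in present:
                seen.add(context_tokens[end])
            if seen == present:
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break
    return best


def keyphrase_score(context_tokens, phrase_words, weight, exponent=2.0):
    """(matching words / cover length) * (matched weight / total weight) ** exponent."""
    cover = shortest_cover(context_tokens, phrase_words)
    if cover is None:
        return 0.0
    start, end = cover
    matched = set(phrase_words) & set(context_tokens[start:end + 1])
    extent = len(matched) / (end - start + 1)
    ratio = sum(weight[w] for w in matched) / sum(weight[w] for w in set(phrase_words))
    return extent * ratio ** exponent


context = "the ecstasy piece was covered by metallica on the morricone tribute album".split()
phrase = "metallica tribute to ennio morricone".split()
uniform = {w: 1.0 for w in phrase}  # placeholder weights, not real PMI values

print(shortest_cover(context, phrase))
print(keyphrase_score(context, phrase, uniform))
```

Three of the five keyphrase words occur in the sentence, inside a window of five tokens, so the keyphrase contributes a partial rather than a full match.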

Page 35:

Entity-Entity Coherence Edges

Precompute overlap of incoming links for entities e1 and e2:

$$\text{mw-coh}(e_1, e_2) \sim 1 - \frac{\log \max(|in(e_1)|, |in(e_2)|) - \log |in(e_1) \cap in(e_2)|}{\log |E| - \log \min(|in(e_1)|, |in(e_2)|)}$$

Alternatively compute overlap of anchor texts for e1 and e2:

$$\text{ngram-coh}(e_1, e_2) \sim \frac{|ngrams(e_1) \cap ngrams(e_2)|}{|ngrams(e_1) \cup ngrams(e_2)|}$$

or overlap of keyphrases, or similarity of bag-of-words, or …

Optionally combine with type distance of e1 and e2 (e.g., Jaccard index for type instances)

For special types of e1 and e2 (locations, people, etc.) use spatial or temporal distance
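
As a small worked example, the link-overlap coherence (in the style of the Milne/Witten relatedness measure) and a Jaccard overlap can be computed as follows. The in-link sets and the total number of entities are invented, and clamping negative values to zero is a choice made in this sketch.

```python
import math


def mw_coherence(in1, in2, num_entities):
    """Link-overlap coherence of two entities given their sets of incoming links.

    Assumes num_entities is larger than either in-link set.
    """
    shared = in1 & in2
    if not shared:
        return 0.0
    big = max(len(in1), len(in2))
    small = min(len(in1), len(in2))
    value = 1 - (math.log(big) - math.log(len(shared))) / (
        math.log(num_entities) - math.log(small)
    )
    return max(0.0, value)


def jaccard(a, b):
    """Jaccard overlap, usable for n-gram sets or type sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


# Hypothetical in-link sets.
in_wallach = {"Dollars_Trilogy", "Clint_Eastwood", "Sergio_Leone", "Magnificent_Seven"}
in_gold = {"Dollars_Trilogy", "Sergio_Leone"}
print(mw_coherence(in_wallach, in_gold, num_entities=16))
print(jaccard(in_wallach, in_gold))
```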

Page 36-37:

AIDA: Accurate Online Disambiguation

http://www.mpi-inf.mpg.de/yago-naga/aida/

Page 38-39:

AIDA: Very Difficult Example

http://www.mpi-inf.mpg.de/yago-naga/aida/

Page 40-41:

AIDA: Accurate Online Disambiguation

http://www.mpi-inf.mpg.de/yago-naga/aida/

Page 42:

Some NED Online Tools

• J. Hoffart et al.: EMNLP 2011, VLDB 2011
  https://d5gate.ag5.mpi-sb.mpg.de/webaida/
• P. Ferragina, U. Scaiella: CIKM 2010
  http://tagme.di.unipi.it/
• R. Isele, C. Bizer: VLDB 2012
  http://spotlight.dbpedia.org/demo/index.html
• Reuters Open Calais
  http://viewer.opencalais.com/
• S. Kulkarni, A. Singh, G. Ramakrishnan, S. Chakrabarti: KDD 2009
  http://www.cse.iitb.ac.in/soumen/doc/CSAW/
• D. Milne, I. Witten: CIKM 2008
  http://wikipedia-miner.cms.waikato.ac.nz/demos/annotate/
• perhaps more

Some use the Stanford NER tagger for detecting mentions:
http://nlp.stanford.edu/software/CRF-NER.shtml

Page 43:

NED: Experimental Evaluation

Benchmark:
• Extended CoNLL 2003 dataset: 1400 newswire articles
• originally annotated with mention markup (NER), now with NED mappings to Yago and Freebase
• difficult texts:
  … Australia beats India … → Australian_Cricket_Team
  … White House talks to Kremlin … → President_of_the_USA
  … EDS made a contract with … → HP_Enterprise_Services

Results:
Best: AIDA method with prior+sim+coh + robustness test
82% precision @100% recall, 87% mean average precision
Comparison to other methods, see paper

J. Hoffart et al.: Robust Disambiguation of Named Entities in Text, EMNLP 2011
http://www.mpi-inf.mpg.de/yago-naga/aida/

Page 44:

Ongoing Research & Remaining Challenges

• More efficient graph algorithms (multicore, etc.)
• Short and difficult texts:
  • tweets, headlines, etc.
  • fictional texts: novels, song lyrics, etc.
  • incoherent texts
• Disambiguation beyond entity names:
  • coreferences: pronouns, paraphrases, etc.
  • common nouns, verbal phrases (general WSD)
• Leverage deep-parsing structures, leverage semantic types
  Example: Page played Kashmir on his Gibson (dependency labels: subj, obj, mod)
• Allow mentions of unknown entities, mapped to null
• Structured Web data: tables and lists

Page 45:

Variants of NED at Web Scale

• How to run this on a big batch of 1 million input texts? Partition inputs across distributed machines, organize dictionary appropriately, …, exploit cross-document contexts
• How to handle Web-scale inputs (100 million pages) restricted to a set of interesting entities? (e.g. tracking politicians and companies)

Tools can map short text onto entities in a few seconds

Page 46:

Outline

Opportunities Now

Entity Name Disambiguation

Question Answering

Disambiguation Reloaded

Wrap-Up

Semantic Search Today

Page 47:

Deep Question Answering

99 cents got me a 4-pack of Ytterlig coasters from this Swedish chain

This town is known as "Sin City" & its downtown is "Glitter Gulch"

William Wilkinson's "An Account of the Principalities of Wallachia and Moldavia" inspired this author's most famous novel

As of 2010, this is the only former Yugoslav republic in the EU

[Figure labels: question classification & decomposition; knowledge back-ends; YAGO]

D. Ferrucci et al.: Building Watson. AI Magazine, Fall 2010.
IBM Journal of R&D 56(3/4), 2012: This is Watson.

Page 48:

Semantic Keyword Search

Need to map (groups of) keywords onto entities & relationships based on name-entity similarities/probabilities

q: composer Rome scores westerns

Candidate meanings:
• composer: composer (creator of music), Media Composer video editor
• Rome: Rome (Italy), Rome (NY), Lazio Roma, AS Roma
• scores: film music, goal in football
• westerns: western movies, western world, Western Digital, Western (airline), Western (NY)

Candidate relationships: … born in …, … plays for …, … used in …, … recorded at …

[Ilyas et al. SIGMOD'10]

Page 49:

Natural Language Questions are Natural

Who composed scores for westerns and is from Rome?

translate question into SPARQL query:
• dependency parsing to decompose question
• mapping of question units onto entities, classes, relations

map results into tabular or visual presentation or speech

Page 50:

From Questions to Queries

NL question:

Who composed scores for westerns and is from Rome?

Dependency parsing exposes structure of question: "triploids" (sub-cues)
• Who composed scores
• scores for westerns
• is from Rome

Page 51:

From Triploids to Triples

Who composed scores for westerns and is from Rome?

Triploids:
• Who composed scores
• scores for westerns
• Who is from Rome

Initial triple patterns:
?x composed scores
scores contributesTo ?y
?y type westernMovie
?x bornIn Rome

With "scores" resolved to a variable ?s of type music:
?x type composer
?x composed ?s
?s type music
?s contributesTo ?y

Page 52:

Pattern Dictionary for Relations
[N. Nakashole et al.: EMNLP 2012]

Problem: cope with language diversity & ambiguity
Example: composed …, wrote …, created …, …

WordNet-style dictionary/taxonomy for relational phrases based on SOL patterns (syntactic-lexical-ontological)

• Relational phrases can be synonymous
  "graduated from", "obtained degree in * from"
  "and $PRP ADJ advisor", "under the supervision of"
• One relational phrase can subsume another
  "wife of", "spouse of"
• Relational phrases are typed
  <person> graduated from <university>
  <singer> released <album>
  <singer> covered <song>
  <book> covered <event>
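
One way to picture such a typed pattern dictionary is as a list of synonym sets, each with a type signature. The sketch below uses a deliberately tiny hand-written dictionary; the data layout and the function names `lookup` and `synonymous` are assumptions for illustration and do not reflect the actual PATTY data format. The wildcard in "obtained degree in * from" is treated as a literal string here.

```python
# Each entry: a type signature (domain, range) and a set of synonymous phrases.
SYNSETS = [
    {"domain": "person", "range": "university",
     "phrases": {"graduated from", "obtained degree in * from"}},
    {"domain": "singer", "range": "song", "phrases": {"covered"}},
    {"domain": "book", "range": "event", "phrases": {"covered"}},
    {"domain": "singer", "range": "album", "phrases": {"released"}},
]


def lookup(phrase, subj_type, obj_type):
    """Indices of synsets whose phrases and type signature match the arguments."""
    return [
        i for i, s in enumerate(SYNSETS)
        if phrase in s["phrases"] and s["domain"] == subj_type and s["range"] == obj_type
    ]


def synonymous(p1, p2):
    """True if some synset contains both phrases."""
    return any(p1 in s["phrases"] and p2 in s["phrases"] for s in SYNSETS)


print(lookup("covered", "singer", "song"))
print(lookup("covered", "book", "event"))
print(synonymous("graduated from", "obtained degree in * from"))
```

The argument types are what separate the two readings of "covered": the same surface phrase resolves to different synsets depending on whether the subject is a singer or a book.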

Page 53:

PATTY: Pattern Taxonomy for Relations
[N. Nakashole et al.: EMNLP 2012, demo at VLDB 2012]

350 000 SOL patterns with 4 million instances
Derived from large data (Wikipedia, NYT, ClueWeb) by scalable sequence mining
accessible at: www.mpi-inf.mpg.de/yago-naga/patty

Page 54:

Disambiguation Mapping for Triploids

Who composed scores for westerns and is from Rome?

[Figure: graph with weighted edges (coherence, similarity, etc.) linking question units q1 to q4 and their phrases to candidate entities (e:), classes (c:), and relations (r:)]

Phrases: Who; composed; composed scores; scores for; westerns; is from; Rome

Candidate semantic items:
• c: person, c: musician, e: WHO
• r: created, r: wroteComposition, r: wroteSoftware
• c: soundtrack, r: soundtrackFor, r: shootsGoalFor
• r: bornIn, r: actedIn
• c: western movie, e: Western Digital
• e: Rome (Italy), e: Lazio Roma

Combinatorial Optimization by ILP (with type constraints etc.)

Page 55:

Relaxing Overconstrained Queries

Select ?p Where {
?p composed ?s . ?s type music . ?s for ?m . ?m type movie .
?p bornIn Rome . }

Select ?p Where {
?p composed ?s . ?s type music . ?s for ?m . ?m type movie [western] .
?p bornIn Rome . }

Select ?p Where {
?p ?rel1 ?s [composed] . ?s type music . ?s ?rel2 ?m . ?m type movie [western] .
?p bornIn Rome . }

with extended SPARQL-FullText: SPOX quad patterns

(S. Elbassuoni et al.: CIKM'10, ESWC'11, SIGIR'12)
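
The step of replacing fixed predicates by relation variables with optional keyword conditions can be sketched as a simple rewrite over quad patterns. The representation (subject, predicate, object, keywords) and the `relax` and `render` helpers are assumptions of this sketch; which predicates to relax, and whether they leave a keyword behind, is supplied by hand here rather than decided automatically.

```python
from itertools import count


def relax(patterns, to_relax):
    """Replace selected predicates by fresh relation variables.

    to_relax maps a predicate to the keyword it leaves behind (or None for no keyword).
    """
    fresh = count(1)
    relaxed = []
    for s, p, o, keywords in patterns:
        if p in to_relax:
            extra = (to_relax[p],) if to_relax[p] else ()
            relaxed.append((s, f"?rel{next(fresh)}", o, keywords + extra))
        else:
            relaxed.append((s, p, o, keywords))
    return relaxed


def render(patterns, select="?p"):
    """Print the quad patterns in the bracketed keyword notation of the slides."""
    parts = []
    for s, p, o, keywords in patterns:
        text = f"{s} {p} {o}"
        if keywords:
            text += " [" + " ".join(keywords) + "]"
        parts.append(text + " .")
    return f"Select {select} Where {{ " + " ".join(parts) + " }"


QUERY = [
    ("?p", "composed", "?s", ()),
    ("?s", "type", "music", ()),
    ("?s", "for", "?m", ()),
    ("?m", "type", "movie", ("western",)),
    ("?p", "bornIn", "Rome", ()),
]
relaxed_query = relax(QUERY, {"composed": "composed", "for": None})
print(render(QUERY))
print(render(relaxed_query))
```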

Page 56:

Preliminary Results (M. Yahya et al.: WWW‘12, EMNLP‘12)

http://www.mpi-inf.mpg.de/yago-naga/deanna/

Page 57:

Outline

Opportunities Now

Entity Name Disambiguation

Question Answering

Disambiguation Reloaded

Wrap-Up

Semantic Search Today

Page 58:

Disambiguation Mapping

Who composed scores for westerns and is from Rome?

[Figure as on page 54: question units q1 to q4 and their phrases linked by weighted edges (coherence, similarity, etc.) to candidate entities (e:), classes (c:), and relations (r:)]

ILP variables:
• Selection: Xi
• Assignment: Yij
• Joint Mapping: Zkl

[M. Yahya et al.: EMNLP'12]

Page 59:

Disambig. Mapping: Objective Function

Who composed scores for westerns and is from Rome?

[Figure as on page 54, with edge weights wij and vkl]

ILP variables: Selection: Xi; Assignment: Yij; Joint Mapping: Zkl

$$\text{maximize} \quad \sum_{i,j} w_{ij} Y_{ij} + \sum_{k,l} v_{kl} Z_{kl} + \ldots$$

subject to:
1) $Y_{ij} \le X_i$ for all i,j
2) $\sum_j Y_{ij} \le 1$ for all i
3) $Z_{kl} \le \sum_{i,j} Y_{ik}$ and $Z_{kl} \le \sum_j Y_{il}$ for all k,l
4) $X_i, Y_{ij}, Z_{kl} \in \{0,1\}$
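
For a handful of phrases, the effect of the objective and of constraints 1 to 4 can be reproduced by brute-force enumeration instead of an ILP solver. The sketch below reads constraint 3 as "a coherence edge between two semantic items counts only if both items are chosen by some phrase", which is an interpretation made for this sketch; all weights are invented, and the later constraints (triploid structure, mutual exclusion, types) are left out.

```python
from itertools import product


def best_mapping(w, v):
    """w: {phrase: {item: similarity weight}}; v: {frozenset({k, l}): coherence weight}.

    Every phrase is assigned at most one item (constraint 2); with non-negative
    weights, Z_kl is set to 1 exactly when both k and l are chosen.
    """
    phrases = list(w)
    choices = [list(w[p]) + [None] for p in phrases]
    best_score, best = float("-inf"), {}
    for combo in product(*choices):
        mapping = {p: item for p, item in zip(phrases, combo) if item is not None}
        chosen = set(mapping.values())
        score = sum(w[p][item] for p, item in mapping.items())
        score += sum(val for pair, val in v.items() if pair <= chosen)
        if score > best_score:
            best_score, best = score, mapping
    return best_score, best


# Hypothetical weights for three phrases of the example question.
W = {
    "composed": {"r:created": 0.5, "r:wroteComposition": 0.4, "r:wroteSoftware": 0.1},
    "westerns": {"c:western movie": 0.4, "e:Western Digital": 0.6},
    "Rome": {"e:Rome (Italy)": 0.7, "e:Lazio Roma": 0.3},
}
V = {
    frozenset({"r:wroteComposition", "c:western movie"}): 0.8,
    frozenset({"r:wroteSoftware", "e:Western Digital"}): 0.5,
    frozenset({"r:created", "c:western movie"}): 0.3,
}
score, mapping = best_mapping(W, V)
print(round(score, 2), mapping)
```

Enumeration grows exponentially with the number of phrases, which is why the talk relies on an ILP optimizer for real questions.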

Page 60:

Disambig. Mapping: Constraints

Who composed scores for westerns and is from Rome?

[Figure as on page 54, with edge weights wij and vkl]

ILP variables: Selection: Xi; Assignment: Yij; Joint Mapping: Zkl; Selection: Qhi

$$\text{maximize} \quad \sum_{i,j} w_{ij} Y_{ij} + \sum_{k,l} v_{kl} Z_{kl} + \ldots$$

subject to:
5) $Q_{hi} = 1 \Rightarrow \sum_g Q_{hg} = 3$ for all h,i
6) $X_i + X_g \le 1$ for all mutually exclusive i,g
7) $Q_{hi} = 1 \Rightarrow \sum_{g,j} Q_{hg} Y_{gj} = 1$ for relation nodes j

Page 61:

Disambig. Mapping: Type Constraints

Who composed scores for westerns and is from Rome?

[Figure as on page 54, with edge weights wij and vkl]

ILP variables: Selection: Xi; Assignment: Yij; Joint Mapping: Zkl; Selection: Qhi

$$\text{maximize} \quad \sum_{i,j} w_{ij} Y_{ij} + \sum_{k,l} v_{kl} Z_{kl} + \ldots$$

subject to:
8) If $Y_{ij} = 1$ and j is a relation node and $Z_{kj} = 1$ and $Z_{jl} = 1$, then domain(j) must be compatible with types(k) and range(j) must be compatible with types(l)

ILP optimizers like Gurobi solve this in 1 or 2 seconds

Page 62:

Outline

Opportunities Now

Entity Name Disambiguation

Question Answering

Disambiguation Reloaded

Wrap-Up

Semantic Search Today

Page 63:

Summary

• Web of Data & Knowledge & Text (RDF + Phrases) Calls for Semantic Search by Entities, Classes & Relations

• Diversity & Ambiguity of Names and Phrases Calls for Disambiguation Mapping

• Strong Story for Entity Name Disambiguation

• Ongoing Work on Relation Phrase Disambiguation

• Cornerstone of Question Answering with Natural Language or Advanced Keywords

Great opportunity towards next-generation search
Challenging problems: robustness, scale, dynamics & transfer

Page 64:

Take-Home Message

Solve "Who composed the Ecstasy and other pieces for westerns?"

can solve semantic search with natural-language disambiguation