
Text Analysis Conference
Knowledge Base Population

2013

Hoa Trang Dang
National Institute of Standards and Technology

Sponsored by:

TAC KBP Goals

• Goal: Populate a knowledge base (KB) with information about entities as found in a collection of source documents, following a specified schema for the KB

• KBP 2009-2011: Focus on augmenting an existing KB. Decompose KBP into two tasks:
  ▫ Entity-Linking: link each given named entity mention to a node in the reference KB (or create a new node)
  ▫ Slot-Filling: learn attributes about target entities from the source documents and add new information about the entity to the reference KB

• KBP 2012: Combine entity-linking and slot-filling to build a KB from scratch -> Cold Start

• KBP 2013:
  ▫ Conversational, informal data (discussion fora)
  ▫ Temporal constraints for Slot Filling (2011 pilot)
  ▫ Sentiment analysis for Slot Filling

TAC KBP 2013 Track Participants

• Track coordinators
  ▫ Hoa Dang (Slot Filler Validation)
  ▫ Jim Mayfield (Entity Linking, Cold Start KBP)
  ▫ Margaret Mitchell (Sentiment Slot Filling)
  ▫ Mihai Surdeanu (English Slot Filling and Temporal Slot Filling)
• LDC linguistic resource providers: Joe Ellis, Jeremy Getman, Justin Mott, Xuansong Li, Kira Griffitt, Stephanie M. Strassel, Jonathan Wright
• Coordinators emeritus: Ralph Grishman, Heng Ji
• Advisor: Boyan Onyshkevych
• 45 teams
  ▫ 14 countries (21 USA, 9 China, 3 Spain, 2 Germany, …)

6 (8) TAC KBP 2013 Tracks

• Entity-Linking
  ▫ English
  ▫ Chinese
  ▫ Spanish
• Slot-Filling (English)
  ▫ Regular
  ▫ Sentiment
  ▫ Temporal
  ▫ Slot Filler Validation Task

• Cold Start (English)

Entity Linking and Slot Filling Tracks

• Goal: Augment a reference knowledge base (KB) with info about query entities (PER, ORG, GPE) as found in a diverse collection of documents

• Reference KB: Oct 2008 Wikipedia snapshot. Each KB node corresponds to a Wikipedia page and contains:
  ▫ Infobox
  ▫ Wiki_text (free text not in infobox)
• English source documents:
  ▫ 1M News docs
  ▫ 1M Web docs
  ▫ 99K Discussion Forum docs (threads)
• Chinese source documents: 2M news, 800K Web
• Spanish source documents: 900K news
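
The node layout and the link-or-create behaviour described on this slide and the goals slide can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the `KBNode` and `ToyLinker` names, the `E0001` / `NIL0001` id formats, and above all the exact-name lookup, which stands in for a real entity-linking model and is not how any participating system works.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class KBNode:
    """One reference-KB node: a Wikipedia page split into infobox and free text."""
    node_id: str                  # hypothetical id format
    name: str
    entity_type: str              # "PER", "ORG", or "GPE"
    infobox: Dict[str, str] = field(default_factory=dict)
    wiki_text: str = ""           # free text not in the infobox


class ToyLinker:
    """Exact-name lookup standing in for a real entity-linking model."""

    def __init__(self, nodes: Iterable[KBNode]):
        self._by_name = {n.name.lower(): n for n in nodes}
        self._nil_ids: Dict[str, str] = {}

    def link(self, mention: str) -> str:
        key = mention.strip().lower()
        node = self._by_name.get(key)
        if node is not None:
            return node.node_id
        # Entity not in the KB: create (or reuse) a new-node id for this name.
        if key not in self._nil_ids:
            self._nil_ids[key] = f"NIL{len(self._nil_ids) + 1:04d}"
        return self._nil_ids[key]
```

A mention whose name matches a node returns that node's id; any other mention receives a fresh new-node id that is reused whenever the same name reappears.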

Entity-Linking Evaluation Results

• English
  ▫ Participants: 26 teams
  ▫ Highest F1: 0.721 (0.730 in 2012)
  ▫ Median F1: 0.583 (0.536 in 2012)
• Chinese
  ▫ Participants: 4 teams
  ▫ Highest F1: 0.622 (0.740 in 2012)
  ▫ Median F1: 0.619 (0.617 in 2012)
• Spanish
  ▫ Participants: 3 teams
  ▫ Highest F1: 0.709 (0.641 in 2012)
  ▫ Median F1: 0.651 (0.612 in 2012)

Regular Slot Filling Evaluation Results

• Participants: 18 teams
• Human F1: 0.685 (0.814 in 2012)
• Highest System F1: 0.373 (0.517 in 2012)
• 2nd Highest System F1: 0.339 (0.296 in 2012)
• Median System F1: 0.150 (0.099 in 2012)
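
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below uses the textbook definition; the slide does not describe how the official scorer handles details such as redundant or inexact fillers, so treat this as an approximation. The function name and the counts in the usage note are made up for illustration.

```python
def precision_recall_f1(num_correct, num_submitted, num_gold):
    """Textbook precision, recall, and F1 from raw counts."""
    # Precision: share of submitted fillers that are correct.
    precision = num_correct / num_submitted if num_submitted else 0.0
    # Recall: share of gold-standard fillers that were found.
    recall = num_correct / num_gold if num_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With illustrative counts of 30 correct fillers out of 60 submitted against 100 gold fillers, precision is 0.5, recall is 0.3, and F1 is 0.375.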

Sentiment Slot Filling Track

• Sentiment analysis for KBP:
  ▫ Holder (PER, ORG, GPE)
  ▫ Target (PER, ORG, GPE)
  ▫ Polarity (positive, negative)
• Implemented as regular slot filling, with a different set of slots:
  ▫ {per,org,gpe}:positive-towards
  ▫ {per,org,gpe}:negative-towards
  ▫ {per,org,gpe}:positive-from
  ▫ {per,org,gpe}:negative-from
• Participants: 3 teams
• Evaluation results:
  ▫ Human F1: 0.727
  ▫ Highest System F1: 0.132
  ▫ Median System F1: 0.014
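
The brace notation above expands to twelve concrete slot names (three entity types, two polarities, two directions). A short sketch of that expansion follows. The `inverse_slot` helper reflects one reading of the slot names, namely that "A positive-towards B" corresponds to "B positive-from A"; that reading and both function names are assumptions, not something stated on the slide.

```python
from itertools import product

ENTITY_TYPES = ("per", "org", "gpe")
DIRECTIONS = ("towards", "from")
POLARITIES = ("positive", "negative")


def sentiment_slots():
    """Expand {per,org,gpe}:{positive,negative}-{towards,from} into slot names."""
    return [f"{etype}:{polarity}-{direction}"
            for etype, direction, polarity
            in product(ENTITY_TYPES, DIRECTIONS, POLARITIES)]


def inverse_slot(slot, filler_type):
    """Slot as seen from the filler's side (assumed reading of the schema)."""
    _, relation = slot.split(":", 1)
    polarity, direction = relation.split("-", 1)
    flipped = "from" if direction == "towards" else "towards"
    return f"{filler_type}:{polarity}-{flipped}"
```

Under that reading, a `per:positive-towards` response whose filler is an organization corresponds to `org:positive-from` when the organization is the query entity.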

Temporal Slot Filling Track

• Find tightest temporal constraints [T1 T2 T3 T4] on a given relation
  ▫ Relation is true for a period beginning between T1 and T2
  ▫ Relation is true for a period ending between T3 and T4
• Participants: 5 teams
• Evaluation results:
  ▫ Human Accuracy: 0.688
  ▫ Highest System Accuracy: 0.331
  ▫ Median System Accuracy: 0.148
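
The four-tuple can be read as two date ranges, one bounding the start of the relation and one bounding its end. The sketch below encodes that reading and checks whether a concrete period is consistent with a constraint. The class name, the `admits` method, and the use of `None` for an unbounded side are assumptions made for illustration; the slide does not specify how the task represents unknown bounds or how accuracy is scored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalConstraint:
    t1: Optional[date] = None  # earliest possible start (None = unbounded)
    t2: Optional[date] = None  # latest possible start
    t3: Optional[date] = None  # earliest possible end
    t4: Optional[date] = None  # latest possible end

    def admits(self, start: date, end: date) -> bool:
        """True if a relation holding over [start, end] satisfies the constraint."""
        if start > end:
            return False
        start_ok = ((self.t1 is None or self.t1 <= start)
                    and (self.t2 is None or start <= self.t2))
        end_ok = ((self.t3 is None or self.t3 <= end)
                  and (self.t4 is None or end <= self.t4))
        return start_ok and end_ok
```

A tighter constraint admits fewer candidate periods, which is why the task asks for the tightest four-tuple the documents support; a constraint with all four bounds left open admits every period.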

Slot Filler Validation Track (SFV)

• Task: Determine whether or not a candidate slot filler is correct

• Objective: improve precision without excessive reduction of recall

• Participants: 5 teams
• Some SFV runs had overwhelmingly positive impact on individual SF runs!
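
The precision/recall trade-off behind the SFV objective can be shown with a toy filter. In this sketch the validator is a simple confidence threshold, which is only a stand-in: the slide does not say how any SFV system decides, and the function names, tuple layout, and data in the usage note are invented for illustration.

```python
def precision_recall(responses, gold):
    """Precision and recall of a list of responses against a gold set."""
    correct = sum(1 for r in responses if r in gold)
    precision = correct / len(responses) if responses else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def filter_run(scored_responses, keep):
    """Apply a validator `keep(response, confidence)` to a slot-filling run."""
    return [resp for resp, conf in scored_responses if keep(resp, conf)]
```

Suppose a run submits four (query, slot, filler) responses, two of which are among three gold fillers: precision is 0.5 and recall is 2/3. If the validator rejects exactly the two wrong responses, precision rises to 1.0 while recall stays at 2/3. A validator that also rejected correct responses would lower recall, which is the reduction the objective warns against.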

Cold Start KBP Track

• Goal: Build a KB from scratch, containing all targeted info about all entities as found in a relatively closed domain corpus of documents

• KB schema: same entity types and slots as regular slot-filling task
• Source document collection:
  ▫ 50K Web pages from small-town publications (from TREC KBA document stream)
• Required capabilities:
  ▫ Entity-linking: Grounding all named entity mentions in docs to KB nodes
  ▫ Slot-filling: Learning attributes about all named entities
• Post-submission evaluation queries traverse KB starting from a single entity node (entity mention):
  ▫ 0-hop: Find all children of Michael Jordan
  ▫ 1-hop: Find date of birth of each of the children of Michael Jordan
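
The 0-hop and 1-hop queries amount to one and two lookups in the submitted KB. A minimal sketch follows, assuming the KB is stored as (subject, slot) to fillers; the `ToyKB` class, the function names, and the placeholder entity ids in the usage note are hypothetical, and the slot names `per:children` and `per:date_of_birth` are assumed to match the schema rather than taken from the slide.

```python
from collections import defaultdict


class ToyKB:
    """Tiny in-memory KB: (subject, slot) -> set of fillers."""

    def __init__(self):
        self._facts = defaultdict(set)

    def add(self, subject, slot, filler):
        self._facts[(subject, slot)].add(filler)

    def fillers(self, subject, slot):
        # Return a copy so callers cannot mutate the KB by accident.
        return set(self._facts.get((subject, slot), ()))


def zero_hop(kb, entity, slot):
    """0-hop: fillers of one slot on the starting entity."""
    return kb.fillers(entity, slot)


def one_hop(kb, entity, first_slot, second_slot):
    """1-hop: for each 0-hop filler, the fillers of a second slot."""
    return {mid: kb.fillers(mid, second_slot)
            for mid in kb.fillers(entity, first_slot)}
```

For example, after adding `("E_parent", "per:children", "E_child_a")` and `("E_child_a", "per:date_of_birth", "1990-01-01")`, `zero_hop(kb, "E_parent", "per:children")` returns the child node and `one_hop(kb, "E_parent", "per:children", "per:date_of_birth")` maps it to its date. A 1-hop answer can only be right if the 0-hop step was right, so errors compound across hops.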

Cold Start Evaluation Results (Preliminary)

• Participants: 3 teams
• 0-hop queries:
  ▫ Highest F1: 0.384 (0.497 in 2012)
• 1-hop queries:
  ▫ Highest F1: 0.145 (0.255 in 2012)
• Combined 0-hop and 1-hop F1:
  ▫ Highest F1: 0.278 (~0.352 in 2012)

TAC KBP Discussion/Planning Sessions

• Monday, November 18 (2:15-3:10pm):
  ▫ English Slot Filling
  ▫ Slot Filler Validation
  ▫ Temporal Slot Filling?
  ▫ +Spanish Slot Filling?
  ▫ +Event identification and argument extraction?
• Tuesday, November 19 (3:00-4:00pm):
  ▫ Cold Start
  ▫ English Entity Linking (as queries in Cold Start framework?)
  ▫ Cross-Lingual Spanish and Chinese Entity Linking + Discussion forum