Overview of the KBP 2012 Slot-Filling Tasks

Hoa Trang Dang (National Institute of Standards and Technology), Javier Artiles (Rakuten Institute of Technology), James Mayfield (Johns Hopkins University), Joe Ellis, Xuansong Li, Kira Griffitt, Stephanie Strassel, Jonathan Wright (Linguistic Data Consortium)

Slot-filling Tasks

• Goal: Augment a reference knowledge base (KB) with info about target entities as found in a diverse collection of documents

• Reference KB: Oct 2008 Wikipedia snapshot. Each KB node corresponds to a Wikipedia page and contains:
▫ Infobox
▫ Wiki_text (free text not in infobox)

• English source documents:
▫ 2.3 M news docs (1.2 M docs in 2011)
▫ 1.5 M Web and other docs (0.5 M docs in 2011)

• [Spanish source documents]

• Diagnostic task: Slot Filler Validation

Slots derived from Wikipedia infobox

Person                              Person (cont.)      Organization
per:alternate_names                 per:member_of       org:alternate_names
per:date_of_birth                   per:employee_of     org:political_religious_affiliation
per:age                             per:religion        org:top_members_employees
per:country_of_birth                per:spouse          org:number_of_employees
per:stateorprovince_of_birth        per:children        org:members
per:city_of_birth                   per:parents         org:member_of
per:date_of_death                   per:siblings        org:subsidiaries
per:country_of_death                per:other_family    org:parents
per:stateorprovince_of_death        per:charges         org:founded_by
per:city_of_death                                       org:date_founded
per:cause_of_death                                      org:date_dissolved
per:countries_of_residence                              org:country_of_headquarters
per:statesorprovinces_of_residence                      org:stateorprovince_of_headquarters
per:cities_of_residence                                 org:city_of_headquarters
per:schools_attended                                    org:shareholders
per:title                                               org:website

Slot-Filling Task Requirements

• Task: given target entity and predefined slots for each entity type (PER, ORG), return all new slot fillers for that entity that can be found in the source documents, and a supporting document for each filler

• Non-redundant
▫ Don’t return a slot filler if it’s already in the KB
▫ Don’t return more than one instance of a slot filler

• Exact boundaries of filler string, as found in supporting document
▫ Text is complete (e.g., “John Doe” rather than “John”)
▫ No extraneous text (e.g., “John Doe” rather than “John Doe’s house”)

• Evaluation based on TREC-QA pooling methodology, which combines:
▫ Candidate slot fillers from non-exhaustive manual search
▫ Candidate slot fillers from fully automatic systems

Answer “key” is incomplete; coverage depends on the number, quality, and diversity of contributing systems.
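The non-redundancy requirement can be illustrated with a minimal Python sketch. It assumes that two fillers count as the same when they match after case-folding and whitespace normalization; that matching rule, the function names, and the sample data are all hypothetical, and in the actual evaluation redundancy is decided by human assessors.

```python
def normalize(text: str) -> str:
    """Case-fold and collapse whitespace (an assumed stand-in for real matching)."""
    return " ".join(text.casefold().split())


def new_fillers(candidates, kb_values):
    """Keep candidates that are neither already in the KB nor repeated.

    candidates: iterable of (filler, docid) pairs, in preference order.
    kb_values:  fillers already present in the KB node for this slot.
    """
    seen = {normalize(v) for v in kb_values}
    kept = []
    for filler, docid in candidates:
        key = normalize(filler)
        if key in seen:
            continue  # already in the KB, or already returned once
        seen.add(key)
        kept.append((filler, docid))
    return kept


# Hypothetical candidates for a single slot of a single target entity.
candidates = [
    ("Jane Doe", "doc-001"),
    ("jane  doe", "doc-007"),  # repeats the first candidate
    ("John Doe", "doc-002"),   # already in the KB
    ("Alex Doe", "doc-003"),
]
print(new_fillers(candidates, kb_values=["John Doe"]))
```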

Differences from KBP 2011

• Offsets provided for target entity mention in query

• Increased number of submissions (up to 5)

• Require normalization of slot fillers that are dates (“yesterday” -> “2012-11-04”)

• Request each proposed slot filler to include
▫ A confidence value
▫ Offsets for justification (usually a sentence)
▫ Offsets for the raw (unnormalized) slot filler in the document

• Move toward more precise justifications
▫ Improved usability (for humans) in end applications
▫ Improved training data for systems

• Offsets and confidence values did not affect official scores
▫ But confidence values were used to rank and truncate extremely lengthy submissions
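Two of the changes above, date normalization and confidence-based truncation, can be sketched in Python. This is an illustration under stated assumptions: the document date (2012-11-05), the small table of relative expressions, the response fields, and the truncation limit are all hypothetical and are not taken from the task definition.

```python
from datetime import date, timedelta

# Hypothetical day offsets for a few relative date expressions.
RELATIVE_DAYS = {"today": 0, "yesterday": -1, "tomorrow": 1}


def normalize_date(raw: str, doc_date: date) -> str:
    """Resolve a relative expression against the document date (ISO format)."""
    offset = RELATIVE_DAYS[raw.strip().lower()]
    return (doc_date + timedelta(days=offset)).isoformat()


def rank_and_truncate(responses, limit):
    """Sort responses by descending confidence and keep the top `limit`."""
    ranked = sorted(responses, key=lambda r: r["confidence"], reverse=True)
    return ranked[:limit]


# Assumed document date: one day after the normalized value on the slide.
print(normalize_date("yesterday", date(2012, 11, 5)))  # 2012-11-04

responses = [
    {"filler": "A", "confidence": 0.2},
    {"filler": "B", "confidence": 0.9},
    {"filler": "C", "confidence": 0.5},
]
print([r["filler"] for r in rank_and_truncate(responses, limit=2)])  # ['B', 'C']
```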

Slot-Filling Evaluation

• Pool responses from submitted runs and from manual search ->
▫ Set of [docid, answer-string] pairs for each target entity and slot

• Assessment:
▫ Each pair judged as one of correct, redundant, inexact, or wrong (credit given only for correct responses)
▫ Correct pairs grouped into equivalence classes (entities); each single-valued slot has at most one equivalence class for a given target entity

• Scoring:
▫ Recall: number of correct equivalence classes returned / number of known equivalence classes
▫ Precision: number of correct equivalence classes returned / number of [docid, answer-string] pairs returned
▫ F1 = (2*P*R)/(P+R)
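The scoring definitions above can be sketched in Python as follows, using the standard harmonic-mean F1, 2PR/(P+R). The data structures, identifiers, and sample judgments are assumptions made for illustration; this is not the official scorer.

```python
def score(returned_pairs, correct_class_of, known_classes):
    """Compute (precision, recall, F1) for one run.

    returned_pairs:   list of (docid, answer_string) pairs returned by the run.
    correct_class_of: dict mapping each pair judged correct to its
                      equivalence-class id; other pairs are absent.
    known_classes:    set of all equivalence-class ids in the answer key.
    """
    # A class is credited once, however many returned pairs fall into it.
    found = {correct_class_of[p] for p in returned_pairs if p in correct_class_of}
    precision = len(found) / len(returned_pairs) if returned_pairs else 0.0
    recall = len(found) / len(known_classes) if known_classes else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical run: four responses, two of them judged correct.
returned = [("d1", "Jane Doe"), ("d2", "Chicago"), ("d3", "Acme"), ("d4", "1970")]
correct = {("d1", "Jane Doe"): "E1", ("d3", "Acme"): "E2"}
known = {"E1", "E2", "E3", "E4", "E5"}

p, r, f = score(returned, correct, known)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.400 F1=0.444
```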

Slot Filling Participants

Team Organization

ADVIS_UIC * University of Illinois at Chicago

GDUFS * Guangdong University of Foreign Studies

IIRG University College Dublin

lsv Saarland University

NLPComp The Hong Kong Polytechnic University

NYU New York University

papelo * NEC Laboratories America

PRIS Beijing University of Posts and Telecommunications

Siel_12 International Institute of Information Technology, Hyderabad

sweat2012 * Chinese Academy of Sciences

TALP_UPC * Technical University of Catalonia, UPC

* first-time slot-filling team

Top 6 KBP 2012 Slot-Filling teams

Top 4 KBP 2012 Slot-Filling teams

Slot-Filling Approaches

• IIRG: (+ling, -ML)
▫ Stanford CoreNLP for POS, NER, parse
▫ Sentence retrieval by exact match with named mention of target entity
▫ Rule-based pattern matching and keyword matching to identify slot fillers

• lsv: (-ling, +ML)
▫ Shallow approach – no parse or coref
▫ Query expansion via Wikipedia redirect links
▫ SVM and Freebase for distant supervision

• NYU: (+ling, +ML)
▫ POS, parse, NER, time expression tagging, coref
▫ Query expansion via small set of handcrafted rules, Wikipedia redirect links
▫ MaxEnt and Freebase for distant supervision
▫ Combination of: hand-coded rules, patterns generated by bootstrapping and then manually reviewed, and classifier trained by distant supervision

• PRIS: (+ling, +ML)
▫ Stanford CoreNLP for POS, NER, SUTime, parse, coref
▫ Query expansion via small set of handcrafted rules, coref’d names
▫ AdaBoost for finding new extraction patterns (word sequence patterns and dependency path patterns)

Distribution of slots in answer key

Slot productivity

2010 (1057)                     2011 (953)                      2012 (1569)
per:title 14%                   per:title 21%                   per:title 14%
org:top_members/employees 12%   org:top_members/employees 12%   org:top_members_employees 11%
per:employee_of 7%              org:alternate_names 10%         per:member_of 6%
org:alternate_names 5%          per:employee_of 7%              per:children 6%
org:subsidiaries 4%             per:member_of 5%                org:alternate_names 6%
per:member_of 4%                per:alternate_names 5%          per:employee_of 4%
per:cities_of_residence 4%      org:subsidiaries 3%             per:cities_of_residence 4%

Slot filler Validation (SFV)

• Goals
▫ Improve precision of full slot-filling systems (without reducing recall)
▫ Allow teams without a full slot-filling system to participate, focus on answer validation rather than document retrieval

• SFV input:
▫ All input to slot-filling task
▫ Submission files from all Slot Filling runs, containing candidate slot fillers
▫ No information about “past performance” of each slot filling system

• SFV output:
▫ Binary classification (Correct / Incorrect) of each candidate slot filler

• Evaluation:
▫ Filter out “Incorrect” slot fillers from each run, and score; compare to score for original run

• Submissions: 1 team (Blender_CUNY)
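The SFV evaluation procedure (filter, rescore, compare) can be sketched in Python. The response ids, validator labels, and answer-key size below are hypothetical, and the sketch assumes each correct response belongs to a distinct equivalence class.

```python
def f1(num_correct, num_returned, num_known):
    """Harmonic mean of precision and recall, 0.0 when undefined."""
    p = num_correct / num_returned if num_returned else 0.0
    r = num_correct / num_known if num_known else 0.0
    return 2 * p * r / (p + r) if (p + r) else 0.0


def apply_validator(run, labels):
    """Drop every response the validator labelled Incorrect (False)."""
    return [resp for resp in run if labels[resp]]


# Hypothetical run of five responses, identified by id.
run = ["r1", "r2", "r3", "r4", "r5"]
assessed_correct = {"r1", "r3"}   # assumed assessor judgments
num_known = 4                     # assumed size of the answer key
labels = {"r1": True, "r2": False, "r3": True, "r4": False, "r5": True}

before = f1(len(assessed_correct & set(run)), len(run), num_known)
filtered = apply_validator(run, labels)
after = f1(len(assessed_correct & set(filtered)), len(filtered), num_known)

print(f"original F1={before:.3f}, filtered F1={after:.3f}")
```

In this made-up case the validator removes two wrong responses and keeps both correct ones, so precision rises while recall is unchanged, which is the outcome the first SFV goal describes.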

Filtering candidate slot fillers

Answer Justification

• Goals
▫ Improve training data for systems – narrow down location of answer patterns
▫ Reduce assessment effort (for correct answers with correct justifications)
▫ Improve usability (for humans) in end applications

• Task guidelines:
▫ For each slot filler, provide start and end offsets for the sentence or clause that provides justification for the relation. For example, for query per:spouse of “Michelle Obama” and the sentence “He is married to Michelle Obama” (“He” referring to Barack Obama mentioned earlier in the document), the filler … should be “Barack Obama”, the offsets for filler must point to “He” and the offsets for justification must point to “He is married to Michelle Obama”.

• Slight mismatch with LDC assessment guidelines (require antecedent of relevant pronouns in justification, otherwise judged as inexact)
▫ Need additional discussion/refinement of guidelines
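The per:spouse example from the task guidelines can be made concrete with a short Python sketch. The first sentence of the sample document, the inclusive end-offset convention, and the field names of the response record are assumptions for illustration; the slides do not specify them.

```python
# Hypothetical document: the first sentence is invented to supply the antecedent.
doc = (
    "Barack Obama spoke in Chicago on Tuesday. "
    "He is married to Michelle Obama."
)


def span(text, substring, start=0):
    """Return (start, end) character offsets of `substring`; end is inclusive (assumed)."""
    begin = text.index(substring, start)
    return begin, begin + len(substring) - 1


justification = span(doc, "He is married to Michelle Obama")
# Search for the raw filler mention inside the justification sentence.
raw_filler = span(doc, "He", justification[0])

response = {
    "slot": "per:spouse",
    "filler": "Barack Obama",            # resolved filler string
    "filler_offsets": raw_filler,         # points at the raw mention "He"
    "justification_offsets": justification,
}
print(response)
print(doc[raw_filler[0]:raw_filler[1] + 1])
print(doc[justification[0]:justification[1] + 1])
```

Under the LDC assessment guidelines mentioned above, a justification span like this one would lack the antecedent of “He”, which is the mismatch the slide points out.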

LDC Data, Annotation, and Assessment