Advanced Information Retrieval
Winter Semester 2007/08, Part 7
Question Answering
Uwe Quasthoff, Universität Leipzig
Institut für Informatik, [email protected]
Based on material by Giuseppe Attardi, Dipartimento di Informatica, Università di Pisa
Question Answering
IR: find documents relevant to a query
– Query: Boolean combination of keywords
QA: find the answer to a question
– Question: expressed in natural language
– Answer: short phrase (< 50 bytes)
TREC-9 Q&A Track
693 fact-based, short-answer questions
– either short (50 B) or long (250 B) answers
~3 GB of newspaper/newswire text (AP, WSJ, SJMN, FT, LAT, FBIS)
Score: MRR (lower-ranked correct answers score less; sketch below)
Resources: top 50 documents (no answer found for 130 questions)
Questions: 186 (Encarta), 314 (seeds from Excite logs), 193 (syntactic variants of 54 originals)
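For concreteness, a minimal sketch of how an MRR score can be computed over ranked answer lists; the 50/250-byte answer format and the official TREC judgment procedure are not modeled here, and the `is_correct` predicate is a placeholder:

```python
def mean_reciprocal_rank(ranked_answers, is_correct):
    """ranked_answers: one list of candidate answers per question, best first.
    is_correct(question_index, answer) -> bool is a placeholder judgment function.
    Only the highest-ranked correct answer contributes: a correct answer at
    rank 2 earns 1/2, at rank 3 earns 1/3, and so on."""
    total = 0.0
    for qi, answers in enumerate(ranked_answers):
        for rank, answer in enumerate(answers, start=1):
            if is_correct(qi, answer):
                total += 1.0 / rank
                break  # lower-ranked correct answers are ignored
    return total / len(ranked_answers)
```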
Commonalities
Approaches:
– question classification
– finding the entailed answer type
– use of WordNet
High-quality document search is helpful (e.g. Queens College)
Sample Questions
Q: Who shot President Abraham Lincoln?
A: John Wilkes Booth
Q: How many lives were lost in the Pan Am crash in Lockerbie?
A: 270
Q: How long does it take to travel from London to Paris through the Channel?
A: three hours 45 minutes
Q: Which Atlantic hurricane had the highest recorded wind speed?
A: Gilbert (200 mph)
Q: Which country has the largest part of the rain forest?
A: Brazil (60%)
Question Types
Class 1 – A: single datum or list of items; C: who, when, where, how (old, much, large)
Class 2 – A: multi-sentence; C: extracted from multiple sentences
Class 3 – A: across several texts; C: comparative/contrastive
Class 4 – A: an analysis of retrieved information; C: synthesized coherently from several retrieved fragments
Class 5 – A: result of reasoning; C: world/domain knowledge and common-sense reasoning
Question subtypes
Class 1.A About subjects, objects, manner, time or location
Class 1.B About properties or attributes
Class 1.C Taxonomic nature
Results (long)
[Bar chart: unofficial MRR scores (scale 0 to 0.8) for long answers across participating systems, including SMU, Queens, Waterloo, IBM, and Pisa.]
Falcon: Architecture
[Architecture diagram with three stages. Question Processing: Collins Parser + NE Extraction, Question Taxonomy, Question Expansion (WordNet), producing the Question Semantic Form, Question Logical Form, and Expected Answer Type. Paragraph Processing: Paragraph Index and Paragraph Filtering, returning Answer Paragraphs. Answer Processing: Collins Parser + NE Extraction, Answer Semantic Form, Answer Logical Form, Abduction Filter, and Coreference Resolution, producing the Answer.]
Question parse
[Parse tree for "Who was the first Russian astronaut to walk in space?" showing POS tags (WP, VBD, DT, JJ, NNP, NN, TO, VB, IN, NN) and phrase-structure nodes (NP, PP, VP, S).]
Question semantic form
[Semantic graph: astronaut – walk – space, with "first" and "Russian" modifying astronaut and the answer type PERSON.]
Question logic form:
first(x) ∧ astronaut(x) ∧ Russian(x) ∧ space(z) ∧ walk(y, z, x) ∧ PERSON(x)
Answer type
Question: What is the size of Argentina?
[Diagram: "size" is mapped via WordNet to "dimension", giving the Expected Answer Type QUANTITY; the question focus is "Argentina".]
Questions about definitions
Special patterns:
– What {is|are} …?
– What is the definition of …?
– Who {is|was|are|were} …?
Answer patterns:
– … {is|are}
– …, {a|an|the}
– … -
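A minimal regex sketch of how such surface patterns could be applied; the exact patterns used by TREC systems are not given in the slides, so these are illustrative approximations:

```python
import re

# Question patterns for definition questions (illustrative, not the exact TREC patterns)
QUESTION_PATTERNS = [
    re.compile(r"^what (?:is|are) (?P<term>.+?)\??$", re.I),
    re.compile(r"^what is the definition of (?P<term>.+?)\??$", re.I),
    re.compile(r"^who (?:is|was|are|were) (?P<term>.+?)\??$", re.I),
]

def answer_patterns(term):
    """Surface patterns that often introduce a definition of `term` in text."""
    t = re.escape(term)
    return [
        re.compile(rf"{t} (?:is|are) (?P<definition>[^.]+)", re.I),     # "X is ..."
        re.compile(rf"{t}, (?:a|an|the) (?P<definition>[^.]+)", re.I),  # "X, a ..."
        re.compile(rf"{t} - (?P<definition>[^.]+)", re.I),              # "X - ..."
    ]

for pattern in QUESTION_PATTERNS:
    match = pattern.match("Who was Galileo?")
    if match:
        print(match.group("term"))  # -> "Galileo"
```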
Question Taxonomy
[Taxonomy diagram: the Question node branches into categories such as Reason, Number, Manner, Location, Organization, Product, Language, Mammal, Reptile, Game, Currency, and Nationality, with finer-grained types including Country, City, Province, Continent, Speed, Degree, Dimension, Rate, Duration, Percentage, and Count.]
Question expansion
Morphological variants
– invented → inventor
Lexical variants
– killer → assassin
– far → distance
Semantic variants
– like → prefer
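A hedged sketch of generating such variants with NLTK's WordNet interface; the slides do not say which tool Falcon used. Derivationally related forms approximate the morphological variants and synonym lemmas the lexical ones; semantic variants such as like → prefer would need additional relations:

```python
from nltk.corpus import wordnet as wn

def expansion_candidates(word):
    """Collect lexical variants (synonym lemmas) and morphological variants
    (derivationally related forms) of a word from WordNet."""
    variants = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            variants.add(lemma.name())                        # e.g. killer -> assassin
            for related in lemma.derivationally_related_forms():
                variants.add(related.name())                  # e.g. invent -> inventor, invention
    variants.discard(word)
    return sorted(variants)

print(expansion_candidates("invent"))
```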
Indexing for Q/A
Alternatives:
– IR techniques
– Parse texts and derive conceptual indexes
Falcon uses paragraph indexing:
– Vector space plus proximity
– Returns weights used for abduction
Abduction to justify answers
Backchaining proofs from questions
Axioms:
– Logical form of the answer
– World knowledge (WordNet)
– Coreference resolution in the answer text
Effectiveness:
– 14% improvement
– Filters 121 erroneous answers (of 692)
– Requires 60% of question-processing time
TREC 13 QA
Several subtasks:
– Factoid questions
– Definition questions
– List questions
– Context questions
LCC still achieved the best performance, but with a different architecture
LCC Block Architecture
[Block diagram. Question Processing (Question Parse, Semantic Transformation, Recognition of Expected Answer Type, Keyword Extraction; supported by WordNet and NER) captures the semantics of the question and selects keywords for passage retrieval. Passage Retrieval (Document Retrieval driven by the keywords) extracts and ranks passages using surface-text techniques. Answer Processing (Answer Extraction, Answer Justification with a Theorem Prover and an Axiomatic Knowledge Base, Answer Reranking; supported by WordNet and NER) extracts and ranks answers using NL techniques.]
Question Processing
Two main tasks:
– Determining the type of the answer
– Extracting keywords from the question and formulating a query
Answer Types
Factoid questions…
– Who, where, when, how many…
– The answers fall into a limited and somewhat predictable set of categories
  • Who questions are going to be answered by…
  • Where questions…
– Generally, systems select answer types from a set of named entities, augmented with other types that are relatively easy to extract
Answer Types
Of course, it isn't that easy…
– Who questions can have organizations as answers
  • Who sells the most hybrid cars?
– Which questions can have people as answers
  • Which president went to war with Mexico?
Answer Type Taxonomy
Contains ~9000 concepts reflecting expected answer types
Merges named entities with the WordNet hierarchy
Answer Type Detection
Most systems use a combination of hand-crafted rules and supervised machine learning to determine the right answer type for a question.
Not worthwhile to do something complex here if it can’t also be done in candidate answer passages.
Keyword Selection
The answer type indicates what the question is looking for:
– It can be mapped to an NE type and used for search in an enhanced index
Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context.
Keyword Extraction
Questions are approximated by sets of unrelated keywords (examples from the TREC QA track):
Q002: What was the monetary value of the Nobel Peace Prize in 1989? → monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? → Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? → Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? → name, managing, director, Apricot, Computer
Keyword Selection Algorithm
1. Select all non-stopwords in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the answer type word
(a sketch of this cascade follows)
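A rough sketch of the heuristic cascade over a pre-annotated question; the token flags (`in_quotes`, `is_complex_nominal`, etc.) are invented names standing in for information from earlier parsing and NE steps, not the LCC implementation:

```python
STOPWORDS = {"the", "a", "an", "of", "in", "on", "is", "was", "what", "who", "how"}

def select_keywords(tokens, max_keywords=6):
    """tokens: list of dicts with 'text', 'pos', 'in_quotes', 'in_named_entity',
    'is_complex_nominal', 'has_adj_modifier', 'is_answer_type_word' flags.
    The eight heuristics are tried in priority order until enough keywords are found."""
    heuristics = [
        lambda t: t["in_quotes"] and t["text"].lower() not in STOPWORDS,  # 1
        lambda t: t["pos"] == "NNP" and t["in_named_entity"],             # 2
        lambda t: t["is_complex_nominal"] and t["has_adj_modifier"],      # 3
        lambda t: t["is_complex_nominal"],                                # 4
        lambda t: t["pos"].startswith("NN") and t["has_adj_modifier"],    # 5
        lambda t: t["pos"].startswith("NN"),                              # 6
        lambda t: t["pos"].startswith("VB"),                              # 7
        lambda t: t["is_answer_type_word"],                               # 8
    ]
    keywords = []
    for rule in heuristics:
        for tok in tokens:
            if rule(tok) and tok["text"] not in keywords:
                keywords.append(tok["text"])
        if len(keywords) >= max_keywords:
            break
    return keywords
```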
Passage Retrieval
Extracts and ranks passages using surface-text techniques
Passage Extraction Loop
Passage Extraction Component
– Extracts passages that contain all selected keywords
– Passage size is dynamic
– Start position is dynamic
Passage quality and keyword adjustment (see the sketch below)
– In the first iteration, use the first six keyword selection heuristics
– If the number of passages is lower than a threshold ⇒ the query is too strict ⇒ drop a keyword
– If the number of passages is higher than a threshold ⇒ the query is too relaxed ⇒ add a keyword
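A minimal sketch of this relax/tighten loop, assuming a `retrieve_passages` function that returns only passages containing all active keywords; the thresholds and iteration limit are placeholders, not the actual LCC parameters:

```python
def passage_extraction_loop(keywords, retrieve_passages,
                            min_hits=20, max_hits=500, max_iters=10):
    """keywords: keywords ordered by the selection heuristics (strongest first).
    retrieve_passages(active) must return passages containing all active keywords."""
    active = list(keywords[:6])          # first iteration: first six heuristics' keywords
    reserve = list(keywords[6:])
    passages = retrieve_passages(active)
    for _ in range(max_iters):
        if len(passages) < min_hits and len(active) > 1:
            active.pop()                       # too strict: drop the weakest keyword
        elif len(passages) > max_hits and reserve:
            active.append(reserve.pop(0))      # too relaxed: add the next keyword
        else:
            break
        passages = retrieve_passages(active)
    return passages, active
```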
Passage Scoring
Passages are scored based on keyword windows
– For example, if a question has the keyword set {k1, k2, k3, k4}, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:
[Diagram: four keyword windows (Window 1 to Window 4) laid over the matched occurrences k1 k2 k3 k2 k1.]
Passage Scoring
Passage ordering is performed using a sort over three scores (sketched below):
– The number of words from the question that are recognized in the same sequence in the window
– The number of words that separate the most distant keywords in the window
– The number of unmatched keywords in the window
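A sketch of computing the three signals for one keyword window; how the actual system weights and combines them is not specified in the slides:

```python
def window_scores(question_keywords, window_tokens):
    """Return the three ordering signals for one keyword window.
    question_keywords: keywords in question order; window_tokens: window in text order."""
    positions = [i for i, tok in enumerate(window_tokens) if tok in question_keywords]
    matched = [window_tokens[i] for i in positions]

    # 1. Question words recognized in the same sequence as in the question
    same_sequence, qi = 0, 0
    for tok in matched:
        if qi < len(question_keywords) and tok == question_keywords[qi]:
            same_sequence += 1
            qi += 1

    # 2. Words separating the most distant matched keywords in the window
    separation = (positions[-1] - positions[0] - 1) if len(positions) > 1 else 0

    # 3. Question keywords not matched in the window
    unmatched = len(set(question_keywords) - set(matched))

    return same_sequence, separation, unmatched
```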
Answer Extraction
Extracts and ranks answers using NL techniques
Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
■ Answer type: Person
■ Text passage: "Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in 'Raiders of the Lost Ark', plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith..."
■ Best candidate answer: Christa McAuliffe
Features for Answer Ranking
– Number of question terms matched in the answer passage
– Number of question terms matched in the same phrase as the candidate answer
– Number of question terms matched in the same sentence as the candidate answer
– Flag set to 1 if the candidate answer is followed by a punctuation sign
– Number of question terms matched, separated from the candidate answer by at most three words and one comma
– Number of terms occurring in the same order in the answer passage as in the question
– Average distance from the candidate answer to the question term matches
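A sketch of a few of these features computed over a tokenized passage, assuming the candidate answer span is supplied by earlier components; the exact feature definitions in the original system may differ:

```python
def answer_ranking_features(question_terms, passage_tokens, cand_start, cand_end):
    """Illustrative subset of the ranking features; cand_start/cand_end delimit
    the candidate answer inside the tokenized passage."""
    qset = set(question_terms)
    match_positions = [i for i, tok in enumerate(passage_tokens) if tok in qset]

    matched_terms = len({passage_tokens[i] for i in match_positions})

    # question terms within three words of the candidate answer
    near_candidate = sum(1 for i in match_positions
                         if cand_start - 3 <= i < cand_start or cand_end < i <= cand_end + 3)

    # 1 if the candidate is immediately followed by punctuation
    followed_by_punct = int(cand_end + 1 < len(passage_tokens)
                            and passage_tokens[cand_end + 1] in {",", ".", ";", ":", "!", "?"})

    # average distance from the candidate to the matched question terms
    center = (cand_start + cand_end) / 2.0
    avg_distance = (sum(abs(i - center) for i in match_positions) / len(match_positions)
                    if match_positions else 0.0)

    return {"matched_terms": matched_terms, "near_candidate": near_candidate,
            "followed_by_punct": followed_by_punct, "avg_distance": avg_distance}
```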
Lexical Chains
Question: When was the internal combustion engine invented?
Answer: The first internal combustion engine was built in 1867.
Lexical chains:
(1) invent:v#1 → HYPERNYM → create_by_mental_act:v#1 → HYPERNYM → create:v#1 → HYPONYM → build:v#1
Question: How many chromosomes does a human zygote have?
Answer: 46 chromosomes lie in the nucleus of every normal human cell.
Lexical chains:
(1) zygote:n#1 → HYPERNYM → cell:n#1 → HAS-PART → nucleus:n#1
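A minimal sketch of searching for such a chain with NLTK's WordNet, following hypernym links only; the HAS-PART step would need meronym relations, and the sense numbers in the demo call are assumptions:

```python
from nltk.corpus import wordnet as wn

def hypernym_chain(src, dst, max_depth=5):
    """Breadth-first search from synset src towards synset dst over hypernym links.
    Returns the chain of synsets, or None if dst is not reached within max_depth."""
    frontier = [(src, [src])]
    for _ in range(max_depth):
        next_frontier = []
        for synset, path in frontier:
            for hyper in synset.hypernyms():
                if hyper == dst:
                    return path + [dst]
                next_frontier.append((hyper, path + [hyper]))
        frontier = next_frontier
    return None

# Sense numbers are illustrative; the biological sense of 'cell' may be a
# different sense number in your WordNet version.
print(hypernym_chain(wn.synset('zygote.n.01'), wn.synset('cell.n.02')))
```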
Theorem Prover
Q: What is the age of the solar system?
QLF: quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3)
Question Axiom: exists x1 x2 x3 (quantity_at(x2) & age_nn(x2) & of_in(x2,x3) & solar_jj(x3) & system_nn(x3))
Answer: The solar system is 4.6 billion years old.
WordNet Gloss: old_jj(x6) ↔ live_vb(e2,x6,x2) & for_in(e2,x1) & relatively_jj(x1) & long_jj(x1) & time_nn(x1) & or_cc(e5,e2,e3) & attain_vb(e3,x6,x2) & specific_jj(x2) & age_nn(x2)
Linguistic Axiom: all x1 (quantity_at(x1) & solar_jj(x1) & system_nn(x1) → of_in(x1,x1))
Proof: ¬quantity_at(x2) | ¬age_nn(x2) | ¬of_in(x2,x3) | ¬solar_jj(x3) | ¬system_nn(x3)
Refutation assigns a value to x2
Is the Web Different?
In TREC (and most commercial applications), retrieval is performed against a smallish closed collection of texts.
The diversity/creativity in how people express themselves necessitates all that work to bring the question and the answer texts together.
But…
The Web is Different
On the Web popular factoids are likely to be expressed in a gazillion different ways.
At least a few of which will likely match the way the question was asked.
So why not just grep (or agrep) the Web using all or pieces of the original question?
AskMSR
Process the question by…
– Forming a search engine query from the original question
– Detecting the answer type
Get some results
Extract answers of the right type based on
– How often they occur
Step 1: Rewrite the questions
Intuition: the user's question is often syntactically quite close to sentences that contain the answer
– Where is the Louvre Museum located?
  • The Louvre Museum is located in Paris
– Who created the character of Scrooge?
  • Charles Dickens created the character of Scrooge.
Query rewriting
Classify the question into seven categories
– Who is/was/are/were…?
– When is/did/will/are/were…?
– Where is/are/were…?
a. Hand-crafted category-specific transformation rules
   e.g.: for where-questions, move 'is' to all possible positions
Look to the right of the query terms for the answer.
"Where is the Louvre Museum located?"
→ "is the Louvre Museum located"
→ "the is Louvre Museum located"
→ "the Louvre is Museum located"
→ "the Louvre Museum is located"
→ "the Louvre Museum located is"
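A sketch of the "move is" rewrite for where-questions; the real AskMSR system uses seven question categories and per-rewrite weights, which are not reproduced here:

```python
def rewrite_where_question(question):
    """Generate declarative rewrites by moving 'is' to every position, as in the
    Louvre example above; other question categories would use their own rules."""
    tokens = question.rstrip("?").split()
    if len(tokens) > 2 and tokens[0].lower() == "where" and tokens[1].lower() == "is":
        rest = tokens[2:]                     # e.g. ['the', 'Louvre', 'Museum', 'located']
        return [" ".join(rest[:i] + ["is"] + rest[i:]) for i in range(len(rest) + 1)]
    return [question]

for rewrite in rewrite_where_question("Where is the Louvre Museum located?"):
    print(rewrite)
```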
Step 2: Query search engine
Send all rewrites to a Web search engine
Retrieve the top N answers (100-200)
For speed, rely only on the search engine's "snippets", not the full text of the actual documents
Step 3: Gathering N-Grams
Enumerate all N-grams (N = 1, 2, 3) in all retrieved snippets
Weight of an n-gram: occurrence count, each occurrence weighted by the "reliability" (weight) of the rewrite rule that fetched the document
– Example: "Who created the character of Scrooge?"
  Dickens 117, Christmas Carol 78, Charles Dickens 75, Disney 72, Carl Banks 54, A Christmas 41, Christmas Carol 45, Uncle 31
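A sketch of the harvesting and weighting step; the rewrite weights in the demo call are made-up illustration values:

```python
from collections import Counter

def collect_ngrams(snippets_with_weights, max_n=3):
    """snippets_with_weights: (snippet_text, rewrite_weight) pairs.
    Every occurrence of an n-gram adds the weight of the rewrite that fetched it."""
    scores = Counter()
    for snippet, weight in snippets_with_weights:
        tokens = snippet.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                scores[" ".join(tokens[i:i + n])] += weight
    return scores

scores = collect_ngrams([("Charles Dickens created Scrooge", 5.0),
                         ("Scrooge appears in A Christmas Carol by Dickens", 1.0)])
print(scores.most_common(5))
```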
Step 4: Filtering N-Grams
Each question type is associated with one or more “data-type filters” = regular expressions for answer types
Boost score of n-grams that match the expected answer type.
Lower score of n-grams that don’t match.
Step 5: Tiling the Answers
[Diagram: candidate n-grams "Dickens" (score 20), "Charles Dickens" (15), and "Mr Charles" (10) are tiled; overlapping n-grams are merged and the old n-grams discarded, yielding "Mr Charles Dickens" with score 45. A sketch of the tiling step follows.]
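A sketch of a greedy tiling step that merges overlapping n-grams and sums their scores; this is a guess at the general idea rather than the published AskMSR algorithm, but it reproduces the example above:

```python
def tile_answers(ngram_scores):
    """ngram_scores: dict of n-gram -> score. Greedily merge n-grams whose word
    sequences overlap, summing their scores and discarding the merged originals."""
    tiled = []
    for text, score in sorted(ngram_scores.items(), key=lambda kv: -kv[1]):
        words, merged = text.split(), False
        for i, (t_words, t_score) in enumerate(tiled):
            for k in range(min(len(words), len(t_words)), 0, -1):
                if t_words[-k:] == words[:k]:        # tile ... overlap ... candidate
                    tiled[i] = (t_words + words[k:], t_score + score)
                    merged = True
                elif words[-k:] == t_words[:k]:      # candidate ... overlap ... tile
                    tiled[i] = (words + t_words[k:], t_score + score)
                    merged = True
                if merged:
                    break
            if merged:
                break
        if not merged:
            tiled.append((words, score))
    return [(" ".join(w), s) for w, s in sorted(tiled, key=lambda x: -x[1])]

print(tile_answers({"Dickens": 20, "Charles Dickens": 15, "Mr Charles": 10}))
# expected: [('Mr Charles Dickens', 45)]
```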
Results
Standard TREC test-bed (TREC 2001): 1M documents, 900 questions
– The technique does OK, not great (would have placed in the top 9 of ~30 participants)
– But with access to the Web… it does much better, and would have come in second on TREC 2001
Harder Questions
Factoid question answering is really pretty silly.
A more interesting task is one where the answers are fluid and depend on the fusion of material from disparate texts over time.
– Who is Condoleezza Rice?
– Who is Mahmoud Abbas?
– Why was Arafat flown to Paris?
Components
Language Processing Tools
– Maximum Entropy classifier
– Sentence Splitter
– Multi-language POS Tagger
– Multi-language NE Tagger
– Conceptual clustering
MaxEntropy: application
Sentence Splitting
Not all punctuation marks are sentence boundaries:
– U.S.A.
– St. Helen
– 3.14
Use features like:
– Capitalization (previous, next word)
– Presence in an abbreviation list
– Prefix/suffix digits
– Prefix/suffix length
Precision: > 95%
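A sketch of the kind of feature vector such a maximum-entropy splitter might see for each period; the feature names and the abbreviation list are illustrative, not the actual feature set:

```python
ABBREVIATIONS = {"u.s.a.", "st.", "mr.", "dr.", "prof."}   # illustrative list only

def boundary_features(prev_word, next_word):
    """Features for deciding whether a period after prev_word ends a sentence
    (so 'U.S.A.', 'St. Helen' and '3.14' are not split)."""
    return {
        "prev_capitalized": prev_word[:1].isupper(),
        "next_capitalized": next_word[:1].isupper(),
        "prev_in_abbreviation_list": prev_word.lower() in ABBREVIATIONS,
        "prev_has_digits": any(c.isdigit() for c in prev_word),
        "next_has_digits": any(c.isdigit() for c in next_word),
        "prev_is_long": len(prev_word.rstrip(".")) > 3,
    }

print(boundary_features("St.", "Helen"))   # abbreviation feature fires -> likely no boundary
```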
Part of Speech Tagging
TreeTagger: a statistical package based on HMMs and decision trees
Trained on manually tagged text
Full language lexicon (with all inflections: 140,000 words for Italian)
Training Corpus
Il         DET:def:*:*:masc:sg    _il
presidente NOM:*:*:*:masc:sg      _presidente
della      PRE:det:*:*:femi:sg    _del
Repubblica NOM:*:*:*:femi:sg      _repubblica
francese   ADJ:*:*:*:femi:sg      _francese
Francois   NPR:*:*:*:*:*          _Francois
Mitterrand NPR:*:*:*:*:*          _Mitterrand
ha         VER:aux:pres:3:*:sg    _avere
proposto   VER:*:pper:*:masc:sg   _proporre
…
Named Entity Tagger
Uses MaxEntropy
NE categories:
– Top level: NAME, ORGANIZATION, LOCATION, QUANTITY, TIME, EVENT, PRODUCT
– Second level: 30-100 subcategories, e.g. for QUANTITY: MONEY, CARDINAL, PERCENT, MEASURE, VOLUME, AGE, WEIGHT, SPEED, TEMPERATURE, etc.
See resources at CoNLL (cnts.uia.ac.be/connl2004)
NE Features
Feature types:
– word-level (e.g. capitalization, digits, etc.)
– punctuation
– POS tag
– category designator (Mr, Av.)
– category suffix (center, museum, street, etc.)
– lowercase intermediate terms (of, de, in)
– presence in controlled dictionaries (locations, people, organizations)
Context: words in positions -1, 0, +1
Sample training document
<TEXT>
Today the <ENAMEX TYPE='ORGANIZATION'>Dow Jones</ENAMEX> industrial average gained <NUMEX TYPE='MONEY'>thirty eight and three quarter points</NUMEX>.
When the first American style burger joint opened in <ENAMEX TYPE='LOCATION'>London</ENAMEX>'s fashionable <ENAMEX TYPE='LOCATION'>Regent street</ENAMEX> some <TIMEX TYPE='DURATION'>twenty years</TIMEX> ago, it was mobbed.
Now it's <ENAMEX TYPE='LOCATION'>Asia</ENAMEX>'s turn.
</TEXT>
<TEXT>
The temperatures hover in the <NUMEX TYPE='MEASURE'>nineties</NUMEX>, the heat index climbs into the <NUMEX TYPE='MEASURE'>hundreds</NUMEX>.
And that's continued bad news for <ENAMEX TYPE='LOCATION'>Florida</ENAMEX> where wildfires have charred nearly <NUMEX TYPE='MEASURE'>three hundred square miles</NUMEX> in the last <TIMEX TYPE='DURATION'>month</TIMEX> and destroyed more than a <NUMEX TYPE='CARDINAL'>hundred</NUMEX> homes.
</TEXT>
Clustering
Classification: assign an item to one among a given set of classes
Clustering: find groupings of similar items (i.e. generate the classes)
Conceptual Clustering of results
Similar to Vivisimo
– Clusters built on the fly rather than from predefined categories (Northern Light)
Generalized suffix tree of snippets
Stemming
Stop words (articulated, essential)
PiQASso: Pisa Question Answering System
“Computers are useless, they can only give answers”
Pablo Picasso
PiQASso Architecture
[Architecture diagram. Indexing: the document collection is processed by a Sentence Splitter and an Indexer. Question analysis: the question is parsed with MiniPar, classified (Question Classification), and query formulation/expansion uses WordNet and WNSense. Answer analysis: retrieved answer paragraphs are parsed with MiniPar and passed through Type Matching, Relation Matching, Answer Scoring, and Popularity Ranking, with an "Answer found?" check before the Answer is returned.]
Linguistic tools
WNSense:
• extracts lexical knowledge from WordNet
• classifies words according to WordNet top-level categories, weighting their senses
• computes distances between words based on is-a links
• suggests word alternatives for query expansion
Example: Theatre
– Categorization: artifact 0.60, communication 0.40
– Synonyms: dramaturgy, theater, house, dramatics
MiniPar [D. Lin]:
• identifies dependency relations between words (e.g. subject, object, modifiers)
• provides POS tagging
• detects semantic types of words (e.g. location, person, organization)
• extensible: we integrated a Maximum Entropy based Named Entity Tagger
[Diagram: dependency parse of "What metal has the highest melting point?" with subj, obj, lex-mod, and mod relations.]
Question Analysis
Example: What metal has the highest melting point?
1. Parsing: the NL question is parsed (dependency relations: subj, obj, lex-mod, mod)
2. Keyword extraction: POS tags are used to select search keywords → metal, highest, melting, point
3. Answer type detection: the expected answer type is determined by applying heuristic rules to the dependency tree → SUBSTANCE
4. Relation extraction: additional relations are inferred and the answer entity is identified → <SUBSTANCE, has, subj>, <point, has, obj>, <melting, point, lex-mod>, <highest, point, mod>
Answer Analysis
Example passage: "Tungsten is a very dense material and has the highest melting point of any metal."
1. Parsing: retrieved paragraphs are parsed
2. Answer type check: paragraphs not containing an entity of the expected type (here SUBSTANCE) are discarded
3. Relation extraction: dependency relations are extracted from the MiniPar output, e.g. <tungsten, material, pred>, <tungsten, has, subj>, <point, has, obj>, …
4. Matching distance: the distance between word relations in the question and in the answer is computed
5. Distance filtering: paragraphs that are too distant are filtered out
6. Popularity ranking: a popularity rank is used to weight distances
→ ANSWER: Tungsten
Match Distance between Question and Answer
Analyze relations between corresponding words, considering:
– the number of matching words in the question and in the answer
– the distance between words, e.g. "moon" matching with "satellite"
– the relation types, e.g. words in the question related by subj while the matching words in the answer are related by pred
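A rough sketch of comparing the question's relation triples with those extracted from a candidate answer; the penalty values are invented, and `word_distance` stands in for the WordNet-based distance that WNSense would provide:

```python
def match_distance(question_relations, answer_relations, word_distance):
    """question_relations / answer_relations: sets of (head, dependent, relation) triples.
    word_distance(w1, w2): semantic distance in [0, 1], e.g. derived from WordNet
    is-a links. Penalty values below are arbitrary illustrations."""
    if not question_relations:
        return 0.0
    total = 0.0
    for q_head, q_dep, q_rel in question_relations:
        best = 1.0                                   # worst case: nothing matches
        for a_head, a_dep, a_rel in answer_relations:
            d = (word_distance(q_head, a_head) + word_distance(q_dep, a_dep)) / 2.0
            if q_rel != a_rel:                       # e.g. subj in question vs pred in answer
                d += 0.2                             # relation-type mismatch penalty
            best = min(best, d)
        total += best
    return total / len(question_relations)
```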
Improving PiQASso
More NLP
NLP techniques largely unsuccessful at information retrieval
– Document retrieval as the primary measure of information retrieval success
  • Document retrieval reduces the need for NLP techniques
– Discourse factors can be ignored
– Query words perform word-sense disambiguation
– Lack of robustness:
  • NLP techniques are typically not as robust as word indexing
How do these technologies help?
Question Analysis:
– The tag of the predicted category is added to the query
Named-Entity Detection:
– The NE categories found in the text are included as tags in the index
Example:
What party is John Kerry in? (ORGANIZATION)
"John Kerry defeated John Edwards in the primaries for the Democratic Party."
Tags: PERSON, ORGANIZATION
NLP Technologies
Coreference Relations:
– The interpretation of a paragraph may depend on the context in which it occurs
Description Extraction:
– Appositive and predicate nominative constructions provide descriptive terms about entities
Represented as annotations associated with words, i.e. words in the same position as the reference
Coreference Relations
How long was Margaret Thatcher the prime minister? (DURATION)
"The truth, which has been added to over each of her 11 1/2 years in power, is that they don't make many like her anymore."
Tags: DURATION
Colocated: her, MARGARET THATCHER
Description Extraction
Identifies the DESCRIPTION category
Allows descriptive terms to be used in term expansion
"Famed architect Frank Gehry…"
Tags: DESCRIPTION, PERSON, LOCATION
"Buildings he designed include the Guggenheim Museum in Bilbao."
Colocation: he, FRANK GEHRY
Who is Frank Gehry? (DESCRIPTION)
What architect designed the Guggenheim Museum in Bilbao? (PERSON)
NLP Technologies
Question Analysis:
– identify the semantic type of the expected answer implicit in the query
Named-Entity Detection:
– determine the semantic type of proper nouns and numeric amounts in the text
Will it work?
Will these semantic relations improve paragraph retrieval?
– Are the implementations robust enough to see a benefit across large document collections and question sets?
– Are there enough questions where these relationships are required to find an answer?
Hopefully yes!
Preprocessing
– Paragraph Detection
– Sentence Detection
– Tokenization
– POS Tagging
– NP-Chunking
Queries to an NE-enhanced index
text matches bush
text matches PERSON:bush
text matches LOCATION:* & PERSON:bin-laden
text matches DURATION:* PERSON:margaret-thatcher prime-minister
Coreference
Task:
– Determine the space of entity extents:
  • Basal noun phrases
    – Named entities consisting of multiple basal noun phrases are treated as a single entity
  • Pre-nominal proper nouns
  • Possessive pronouns
– Determine which extents refer to the same entity in the world
Paragraph Retrieval
Indexing:
– add NE tags for each NE category present in the text
– add coreference relationships
– use syntactically-based categorical relations to create a DESCRIPTION category for term expansion
– use the IXE passage indexer
High Composability
[Class diagram: Collection&lt;DocInfo&gt; (DocInfo: name, date, size) and Collection&lt;PassageDoc&gt; (PassageDoc: text, boundaries); a Cursor interface with next(), specialized by QueryCursor and PassageQueryCursor.]
Tagged Documents
[Class diagram: QueryCursor specialized by QueryCursorWord and QueryCursorTaggedWord.]
select documents where
– text matches bush
– text matches PERSON:bush
– text matches osama & LOCATION:*
Combination
Searching passages on a collection of tagged documents
PassageQueryCursor<Collection<TaggedDoc>>
QueryCursor<Collection>
Paragraph Retrieval
Retrieval:
– Use the question analysis component to predict the answer category and append it to the question
– Evaluate using TREC questions and answer patterns
  • 500 questions
System Overview
[Pipeline diagram. Indexing: Documents processed by Paragraph Splitter, Sentence Splitter, Tokenization, POS Tagger, NE Recognizer, Coreference Resolution, Description Extraction, and the IXE Indexer. Retrieval: Question → Question Analysis → IXE Search → Paragraphs / Paragraphs+.]
Conclusion
QA is a challenging task
It involves state-of-the-art techniques from various fields:
– IR
– NLP
– AI
– Managing large data sets
– Advanced software technologies