Open Domain Question Answering Lide Wu Dept. of Computer Science Fudan University Shanghai 200433...

Post on 01-Apr-2015


Open Domain Question Answering

Lide Wu

Dept. of Computer Science

Fudan University

Shanghai 200433

China

Outline

• What is open domain question answering (ODQA)

• The state of the art of ODQA

• The future of ODQA

• ODQA as a grand challenge in CS/AI/IT

• Summary

What’s QA?

Free text corpus

Question: When did Hawaii become a state?

Answer: August 21, 1959

When did Hawaii become a state?

• AnswerBus Question Answering System - When did Hawaii become a ... Type in your question in English, French, Spanish, German, Italian or Portuguese. Question: When did Hawaii become a state? ... www.answerbus.com/cgi-bin/ answer.cgi?When%2Bdid%2BHawaii%2Bbecome%2Ba%2Bstate%3F - 4k

• uncategorized threads in About Hawaii... How did Hawaii become a state? What is the history of Hawaii?? ... When and why did Hawaii become a state (cause and effect); Safe to live by Mauna Loa? ... www.greenspun.com/bboard/ q-and-a-one-category.tcl?topic=About%20Hawaii&category=uncategorized - 5k

• Is Hawaii Really a State of the Union?... Become a state, or remain a territory? Why was the option of independence not on the ballot? Did Hawaii not have the option to become an independent country in ... www.hawaii-nation.org/statehood.html - 14k

• Hawaii Flag Printout - EnchantedLearning.com... __________________________________________. 3. When did Hawaii become a state of the USA? _______________. Copyright ©2000-2003 EnchantedLearning.com. www.enchantedlearning.com/usa/ flags/hawaii/hawaiiflag.shtml - 3k

• PaleoZoo's Prehistoric Hawaii!... became extinct after rats and mongooses arrived in Hawaii. ... let Nature decide when a species should become extinct. They decided to save the nene, and they did. ... www.geobop.com/paleozoo/World/NA/US/HI/ - 42k

• HAWAII SUPREME COURT DROPS GAY MARRIAGE CASE || Human Rights ...

... It did not bar future cases that seek the benefits, protections and responsibilities that come ... Their ads claimed that Hawaii would become the "homosexual ... www.hrc.org/newsreleases/1999/991210.asp - 16k - 30 Jun 2003

• Maui Trivia by MAUI CHEETAH... ~ Ans: Front Street in Lahaina ***** submitted by: THonings; When did hawaii become a state? ~~ Ans: 1959 ***** submitted ... www.mauigateway.com/~rw/trivia1.htm - 12k

• State Bird of Hawaii Unmasked as Canadian... it should be no surprise that Canada geese did it some ... But in their adopted tropical habitat of Hawaii, the birds "evolved to become more independent of ... news.nationalgeographic.com/news/2002/ 02/0206_020206_canadiangeese.html - 38k

• [PDF] BEFORE ARBITRATOR TAMOTSU TANAKA STATE OF HAWAII In the Matter of ...

File Format: PDF/Adobe Acrobat... training to insure that qualified employees become available. ... did not contravene the provisions of the Collective ... DATED: Honolulu, Hawaii, December 10, 1998. ... www.state.hi.us/hrd/121098.pdf

Comparison to Search Engines

• More natural interface

Natural language question vs Keywords

• More compact answer

Exact answers vs Relevant documents

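The General Solution of QA

Question Analysis Module: produces the query set and the answer type/patterns

Search Engine Module: takes the query set and returns potential segments

Answer Extraction Module: takes the answer type/patterns and the potential segments and returns the answer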

Question Analysis

• Input: Question (When did Hawaii become a state?)

• Output: Answer type/patterns (Date)

Queries (a group of keywords: Hawaii, state, became, …)

• Methods: POS tagging

Named entity tagging

BNP chunking

Syntactic parsing

Semantic tagging

…
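The question-analysis step can be sketched roughly as follows. This is a minimal sketch. It replaces the POS, named-entity, and parsing components listed above with a hypothetical hand-written wh-word rule table (`ANSWER_TYPES`) and builds the keyword query with a hypothetical stop-word list (`STOP_WORDS`).

```python
import re

# Hypothetical rule table: leading wh-phrase -> expected answer type. Rules are checked
# in order, so the longer "how many" must come before any shorter "how ..." rule.
ANSWER_TYPES = [
    ("how many", "NUMBER"),
    ("when", "DATE"),
    ("where", "LOCATION"),
    ("who", "PERSON"),
]

# Hypothetical stop-word list: function words dropped from the keyword query.
STOP_WORDS = {"when", "did", "do", "does", "a", "an", "the", "is", "was", "where",
              "who", "what", "how", "many", "of", "in"}


def answer_type(question: str) -> str:
    """Map the question's leading wh-phrase to an answer type ("OTHER" if none matches)."""
    q = question.strip().lower()
    for prefix, atype in ANSWER_TYPES:
        if q.startswith(prefix):
            return atype
    return "OTHER"


def keyword_query(question: str) -> list[str]:
    """Keep the content words of the question, in order, as a keyword query."""
    words = re.findall(r"[A-Za-z0-9']+", question)
    return [w for w in words if w.lower() not in STOP_WORDS]
```

For "When did Hawaii become a state?" the sketch returns the answer type `DATE` and the keywords `Hawaii`, `become`, `state`. The slide's query uses the inflected form "became", which would need a morphological component that this sketch omits.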

Question Analysis

• Input: Question (When did Hawaii become a state?)

• Output:

Answer type: Date

Patterns: “Hawaii became a state in …”

“In … Hawaii became a state.”

…

Queries (a group of keywords):

“When did Hawaii become a state”

“Hawaii became a state in….”

Hawaii, state, became
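A single reformulation rule is enough to show how patterns are generated. The sketch below assumes one hypothetical regular-expression rule for questions of the form "When did X become Y?". A real system would need many such rules, either hand-written or learned. In the output, "…" marks the slot where the answer is expected.

```python
import re

# One hypothetical reformulation rule for questions of the form "When did X become Y?".
# "…" in the output marks the slot where the answer (a date) is expected.
WHEN_BECOME = re.compile(r"^when did (?P<x>.+?) become (?P<y>.+?)\s*\?$", re.IGNORECASE)


def reformulate(question: str) -> list[str]:
    """Return declarative answer patterns for the question, or [] if no rule applies."""
    m = WHEN_BECOME.match(question.strip())
    if m is None:
        return []
    x, y = m.group("x"), m.group("y")
    return [f"{x} became {y} in …", f"In … {x} became {y}."]
```

Applied to the example question, the rule returns "Hawaii became a state in …" and "In … Hawaii became a state.", which match the patterns listed above. Questions that no rule covers fall back to the keyword query alone.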

Search

• Input: Queries (e.g. “Hawaii became a state in”), i.e. groups of keywords or phrases

• Output: Text segments (snippets) relevant to the answer, such as the ones returned by Google

• Methods: Search engines that retrieve passages
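A minimal sketch makes the interface of this step concrete. It ranks an in-memory list of passages instead of calling a real search engine. The scoring is an arbitrary assumption for illustration: one point per query keyword present, plus a bonus of 2 for each exact phrase match.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased set of alphanumeric words, used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_passages(keywords: list[str], phrases: list[str],
                  passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages with the highest score, where the score is the number of
    query keywords present plus 2 for every query phrase found verbatim (case-insensitive).
    The weight 2 is an arbitrary illustrative choice, not a tuned value."""
    kw = {w.lower() for w in keywords}

    def score(passage: str) -> int:
        overlap = len(kw & tokenize(passage))
        bonus = 2 * sum(1 for ph in phrases if ph.lower() in passage.lower())
        return overlap + bonus

    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(passages, key=score, reverse=True)[:k]
```

With the keywords `Hawaii`, `became`, `state` and the phrase "Hawaii became", a passage such as "Hawaii became the 50th state on Aug. 21, 1959." outranks passages that only mention Hawaii. The phrase bonus is a crude stand-in for the proximity information a passage-level engine uses.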

Answer Extraction

• Input: Question answer type/patterns from question analysis

Snippets returned by search engines

• Output: Answers

• Methods: POS tagging

Named entity tagging

BNP chunking

Syntactic parsing

Semantic tagging

Co-reference resolution

Logic proving/matching

…

Answer Extraction

Question: When did Hawaii become a state?

Answer type: Date

Patterns from question analysis:

“Hawaii became a state ….”

“In … Hawaii became a state.”

………….

Snippets returned by search engines:

“…Hawaii became the 50th state on Aug.21,1959…”

“…Hawaii joined the States in 1959……”

………………
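The snippets above show why extraction cannot rely on the literal patterns alone. "Hawaii became the 50th state on …" does not match "Hawaii became a state …" word for word. The sketch below therefore loosens the pattern to "subject … in/on DATE" and lets the candidates vote. Several parts are assumptions made for illustration:

- the date regular expression, which stands in for a named-entity tagger;
- the "in"/"on" trigger words;
- the rule of preferring the longest form for the best-supported year.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical DATE recognizer standing in for a named-entity tagger. It only knows
# abbreviated month + day + year ("Aug.21,1959", "Aug. 21, 1959") and bare four-digit years.
DATE = r"(?:[A-Z][a-z]{2}\.?\s*\d{1,2},\s*\d{4}|\d{4})"


def extract_dates(subject: str, snippets: list[str]) -> Counter:
    """Count DATE candidates introduced by "in" or "on" after the subject, with no
    period allowed between the subject and the trigger word."""
    pattern = re.compile(re.escape(subject) + r"\b[^.]*?\b(?:in|on)\s+(" + DATE + ")")
    votes = Counter()
    for snippet in snippets:
        for m in pattern.finditer(snippet):
            votes[m.group(1)] += 1
    return votes


def best_answer(votes: Counter) -> Optional[str]:
    """Group candidates by year, pick the year with the most votes, and return its most
    specific (longest) surface form, e.g. a full date rather than a bare year."""
    by_year: dict[str, list[str]] = {}
    for cand in votes:
        by_year.setdefault(cand[-4:], []).append(cand)
    if not by_year:
        return None
    year = max(by_year, key=lambda y: sum(votes[c] for c in by_year[y]))
    return max(by_year[year], key=len)
```

On the two snippets above, the sketch collects `Aug.21,1959` and `1959` and groups both under the year 1959. It then returns the more specific `Aug.21,1959`.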

Key techniques

CL:

• Part-of-speech tagging

• NE tagging

• Semantic tagging

• BNP chunking

• Reference resolution

• Syntactic parsing

IR: Search Engine

AI:

• Pattern matching

• Logic proving

Machine Learning

Key Knowledge

Dictionaries:

• WordNet

• HowNet

• FrameNet

World Knowledge

• Encyclopedia

• Web

The State of the Art: Introduction to the TREC QA Task

• http://trec.nist.gov

• Organized by NIST

• Sponsors: NIST, DARPA, and ARDA

• Started in 1999

• Has the most participants of all the TREC tasks

TREC-QA 2002 participants (35)

• Alicante Univ.,

• BBN,

• CMU-Javelin,

• Chinese Academy of Sciences,

• CL Research,

• Columbia Univ.-Illouz,

• Fudan University,

• IBM T.J. Watson Res. Ctr.-Ittycheriah,

• IBM T.J. Watson Res. Ctr.-Prager,

• InsightSoft-M,

• ITC-irst,

• Language Computer Corporation,

• LIMSI,

• MIT,

• National Univ. of Singapore-Lee,

• National Univ. of Singapore-Hui,

• NTT Communication Science Labs,

• POSTECH,

• Syracuse University,

• The MITRE Corp.,

• Tokyo Univ. of Science,

• Univ. of Amsterdam-Monz,

• Université d’Angers,

• Univ. of Avignon,

• Univ. of Illinois at Urbana-Champaign,

• Univ. of Iowa,

• Univ. of Limerick,

• Univ. of Michigan,

• Univ. of Montreal,

• Univ. of Pisa,

• Univ. of Sheffield,

• Univ. of Southern California/ISI,

• Univ. of Waterloo,

• Univ. of York

Document set

• The document set is the set of documents on the AQUAINT disk set.

• 3 GB

• News

Evaluation

• 500 questions (e.g., When did Hawaii become a state?)

For each question, the answer is judged as one of the following:

• Incorrect (W): the answer string does not contain a correct answer, or the answer is not responsive;

• Unsupported (U): the answer string contains a correct answer, but the document returned does not support that answer;

• Non-exact (X): the answer string contains a correct answer and the document supports that answer, but the string contains more than just the answer (or is missing bits of the answer);

• Correct (R): the answer string consists of exactly a correct answer, and that answer is supported by the document returned.

• Only correct (R) answers receive credit

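Score

Each system returns one answer per question. The 500 answers are ranked from the one the system is most confident about to the one it is least confident about, and the run receives a confidence-weighted score:

$$\text{Score} = \frac{1}{500}\sum_{i=1}^{500}\frac{\#\,\text{correct answers up to the } i\text{th question}}{i}$$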
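The formula can be computed directly from a run's judgments. This is a minimal sketch. It assumes the judgments are given as the single letters W, U, X, R defined above, already ordered by the system's confidence. For the TREC 2002 run, the list would have 500 entries.

```python
def confidence_weighted_score(judgments: list[str]) -> float:
    """Confidence-weighted score over judgments ordered from most to least confident.

    Each judgment is one of "W", "U", "X", "R"; only "R" (correct) counts. For each rank i
    the fraction of the first i answers that are correct is computed, and these fractions
    are averaged over all ranks.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    correct_so_far = 0
    total = 0.0
    for i, judgment in enumerate(judgments, start=1):
        if judgment == "R":
            correct_so_far += 1
        total += correct_so_far / i
    return total / len(judgments)
```

Early ranks weigh more. A run judged R, W, R scores (1 + 1/2 + 2/3)/3 ≈ 0.72, while the same answers ordered W, R, R score only (0 + 1/2 + 2/3)/3 ≈ 0.39. The metric therefore rewards systems that know which of their answers are right.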

Top 15 Groups (2002)

TREC-QA 2003 participants (25)

• Alicante Univ.,

• BBN,

• CMU-Javelin,

• Chinese Academy of Sciences,

• CL Research,

• Fudan University,

• IBM T.J. Watson Res. Ctr.-Ittycheriah,

• IBM T.J. Watson Res. Ctr.-Prager,

• ITC-irst,

• Language Computer Corporation,

• Lexiclone Inc.,

• LIMSI,

• MIT,

• National Univ. of Singapore,

• NTT Communication Science Labs,

• New Mexico State Univ.,

• The MITRE Corp.,

• Univ. of Amsterdam-Monz,

• Univ. of Iowa,

• Univ. of Limerick,

• UPC&UdG,

• Univ. of Pisa,

• Univ. of Sheffield,

• Univ. of Southern California/ISI,

• Univ. of Waterloo,

• Univ. of Wales Bangor

TREC 2004: Question Set

• A series of questions for each of a set of targets

• Number of targets: 50-100

• Each series will contain:

– Several factoid questions

– 0-2 list questions

– A question called “other”

Example question

<target id="1" text="AmeriCorps">
  <qa>
    <q id="1.1" type="FACTOID">When was AmeriCorps founded?</q>
  </qa>
  <qa>
    <q id="1.2" type="FACTOID">How many volunteers work for it?</q>
  </qa>
  <qa>
    <q id="1.3" type="LIST">What activities are its volunteers involved in?</q>
  </qa>
  <qa>
    <q id="1.4" type="OTHER">Other</q>
  </qa>
</target>
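A question series in this format can be read with Python's standard `xml.etree.ElementTree`. This sketch assumes each series is a single `<target>` element laid out as in the example above. How the complete TREC 2004 file wraps multiple targets is not shown here, so a real reader may need one more level of iteration.

```python
import xml.etree.ElementTree as ET

# The example target above, reproduced as a string (whitespace condensed).
SAMPLE = """<target id="1" text="AmeriCorps">
  <qa><q id="1.1" type="FACTOID">When was AmeriCorps founded?</q></qa>
  <qa><q id="1.2" type="FACTOID">How many volunteers work for it?</q></qa>
  <qa><q id="1.3" type="LIST">What activities are its volunteers involved in?</q></qa>
  <qa><q id="1.4" type="OTHER">Other</q></qa>
</target>"""


def read_series(xml_text: str) -> tuple[str, list[tuple[str, str, str]]]:
    """Return the target text and (id, type, question text) for every <q> in the series."""
    target = ET.fromstring(xml_text)
    questions = [(q.get("id"), q.get("type"), (q.text or "").strip())
                 for q in target.iter("q")]
    return target.get("text"), questions
```

`read_series(SAMPLE)` returns the target text `AmeriCorps` and four (id, type, question) tuples. A system can then route FACTOID, LIST, and OTHER questions to different answer strategies while keeping the target available for resolving "it".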

Question Set

• Targets: suggested by mining Microsoft and AOL web search logs

• The assessors created the questions before they did any searching of the document set to find answers to the questions.

The future of ODQA: A roadmap (adapted from the NIST vision paper)

Variations of questions

The simplest questions

• Factual questions: e.g., What is Hawaii’s state flower?

• Void questions: The answer is no longer guaranteed to be present in the text collection, and systems are expected to report the absence of an answer.

• List questions: The answer is scattered across two or more documents.

• Context questions: A group of related questions “within a context”
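For void questions, one simple policy is to answer only when the best candidate is confident enough. This is a minimal sketch. It assumes candidate scores lie in [0, 1] and uses a hypothetical threshold of 0.5. In the TREC QA track, the string "NIL" was used to assert that the collection contains no answer.

```python
NIL = "NIL"  # the string TREC QA runs returned to claim that no answer exists


def answer_or_nil(candidates: dict[str, float], threshold: float = 0.5) -> str:
    """Return the best-scoring candidate answer, or NIL if there is none or its score
    falls below `threshold` (a hypothetical cut-off; scores assumed to lie in [0, 1])."""
    if not candidates:
        return NIL
    best, score = max(candidates.items(), key=lambda item: item[1])
    return best if score >= threshold else NIL
```

The threshold trades off two kinds of error. Raising it makes the system answer NIL more often, which helps on void questions but turns some answerable questions into misses.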

List Questions

The answer is scattered across two or more documents.

Which countries in South America did the Pope visit, and when?

Answer:

• Argentina – 1987 [Document Source 1]

• Colombia – 1986 [Document Source 2]

• Brazil – 1982, 1991 [Document Source 3]
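Assembling such an answer means merging findings from separate documents while keeping track of their sources. This is a minimal sketch. It assumes an extractor has already produced (country, year, source) triples. The triples below are hypothetical and simply mirror the example answer.

```python
from collections import defaultdict

# Hypothetical extractor output: (country, year, source) triples found in separate
# documents. The values mirror the example answer above.
FOUND = [
    ("Argentina", 1987, "Document Source 1"),
    ("Colombia", 1986, "Document Source 2"),
    ("Brazil", 1982, "Document Source 3"),
    ("Brazil", 1991, "Document Source 3"),
]


def merge_list_answer(triples):
    """Merge per-document findings into one entry per country: (sorted years, sorted sources).

    Sets drop duplicate years, so a visit reported by several documents is listed once.
    """
    merged = defaultdict(lambda: (set(), set()))
    for country, year, source in triples:
        years, sources = merged[country]
        years.add(year)
        sources.add(source)
    return {c: (sorted(ys), sorted(ss)) for c, (ys, ss) in merged.items()}
```

Each merged entry maps directly onto one bullet of the answer. For example, `Brazil` becomes years `[1982, 1991]` with source `Document Source 3`. If a second document confirmed the 1991 visit, the year would still appear only once, and both sources would be listed.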

Context Questions

A group of related questions “within a context”

• Context: Topic 168

- Title: Financing AMTRAK

- Description: The role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).

 

• (Q1) Why can't AMTRAK be considered economically viable?

• (Q2) Should it be privatized?

• (Q3) How much larger are the government subsidies to AMTRAK compared to those given to air transportation?
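Questions like (Q2) only make sense once "it" is tied to the context. This is a deliberately naive sketch. It assumes that every third-person pronoun refers to the context's topic, here AMTRAK. Real systems need proper co-reference resolution, so the pronoun list and the substitution rule are illustrative assumptions.

```python
import re

# Deliberately naive, hypothetical rule: every third-person pronoun in a follow-up
# question is assumed to refer to the context topic.
PRONOUNS = re.compile(r"\b(it|its|they|their)\b", re.IGNORECASE)


def resolve_in_context(question: str, topic: str) -> str:
    """Replace pronouns with the topic; possessives ("its", "their") become "<topic>'s"."""
    def substitute(m: re.Match) -> str:
        if m.group(1).lower() in ("its", "their"):
            return f"{topic}'s"
        return topic
    return PRONOUNS.sub(substitute, question)
```

With the topic AMTRAK, (Q2) becomes "Should AMTRAK be privatized?", which a factoid pipeline can then process on its own. A pronoun that refers to something other than the topic would be rewritten wrongly, which is why real systems use co-reference resolution instead.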

Definition/Template Questions

• There are templates for this kind of question

• Example: Who is XXX?

The template consists of:

Address, phone number, fax number, email address, website, …

Education history

Work experience

Contributions

…
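Slots whose values have a recognizable surface form can be filled with simple extractors. This is a minimal sketch. It assumes hypothetical, far-from-complete regular expressions for a few contact slots only. Education history, work experience, and contributions would need real information extraction, and the phone pattern cannot tell a phone number from a fax number.

```python
import re

# Hypothetical slot extractors for the "Who is XXX?" template. Only slots whose values
# have a recognizable surface form are covered; the patterns are illustrative, not complete.
SLOT_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "website": re.compile(r"(?:https?://|www\.)\S+"),
    "phone": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}


def fill_template(text: str) -> dict:
    """Fill each slot with its first match in `text`; unmatched slots stay None."""
    filled = {}
    for slot, pattern in SLOT_PATTERNS.items():
        m = pattern.search(text)
        filled[slot] = m.group(0) if m else None
    return filled
```

Unfilled slots stay `None` rather than being guessed. This lets the answer generator show which parts of the template the collection could not support.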

Questions with ambiguity

The answer will comprise an explanation of the possible ambiguities and a justification of why the answer is right.

Examples

Where is the Taj Mahal?

Answer: If you are interested in the Indian landmark, it is in Agra, India. If instead you want the location of the casino, it is in Atlantic City, NJ, U.S.A. There are also several restaurants named Taj Mahal; a full list is rendered in the following hypertable. If you click on a location, you can find the address.

• The Taj Mahal Indian Cuisine, Mountain View, CA

• The Taj Mahal Restaurant, Dallas, TX

• Taj Mahal, Las Vegas, NV

• Taj Mahal, Springfield, VA

Examples

How did Socrates die?

 

Answer:

He drank poisoned wine.

Anyone drinking or eating something that is poisoned is likely to die.

Summaries as answers

• More complex questions will require the answers to be summaries of the textual information contained in one or several documents.

• The summarization will be driven by the question and may draw on one or multiple documents.

• Moreover, the summary will be presented coherently using text-generation capabilities.
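The simplest form of question-driven summarization is extractive: pick the sentences that best match the question. This is a minimal sketch. It measures relevance by word overlap after removing a small hypothetical stop-word list. It only concatenates the chosen sentences in document order and does none of the text generation anticipated above.

```python
import re

# Hypothetical stop-word list so that overlap is measured on content words only.
STOP = {"what", "is", "the", "of", "a", "an", "and", "in", "to", "how", "are", "this"}


def content_words(text: str) -> set[str]:
    """Lower-cased words of `text`, minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP


def question_driven_summary(question: str, sentences: list[str], max_sentences: int = 2) -> str:
    """Select the sentences sharing the most content words with the question and
    join them in their original order (a stand-in for real summarization)."""
    q = content_words(question)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(q & content_words(sentences[i])),
                    reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

The sentences are returned in document order rather than score order. This is a cheap way to keep the result somewhat coherent without any generation step.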

Examples

• Context-based summary-generating questions:

What is the financial situation of AMTRAK?

• Stand-alone summary-generating questions:

How safe are commercial flights?

• Example-based summary-generating questions:

What other companies are operated with Government aid?

Expert-Level Questions

Questions asked by experts require:

• Collecting sufficient structured and unstructured information for different domains.

• Mining domain knowledge and mastering the relationships between all activities, situations, and facts within a specific domain.

• Reasoning by analogy, comparing, and discovering new relations.

Examples

• (Q1) What are the opinions of the Danes on the Euro?

• (Q2) Why are so many people buying four-wheel-drive cars lately?

• (Q3) How likely is it that the Fed will raise the interest rates at their next meeting?

A General Approach

• Accept complex “Questions” in a form natural to the analyst.

• Translate a “Complex Question” into multiple queries appropriate to the various data sets to be searched.

• Find relevant information in distributed, multimedia, multilingual, multi-agency data sources.

• Analyze, fuse, and summarize information into a coherent “Answer”.

• Provide the (proposed) “Answer” to the analyst in the form they want.

• Provide multimedia visualization and navigation tools.

ODQA as a grand challenge

What makes a good long-range research goal or a grand challenge? (Jim Gray)

• Understandable. The goal should be simple to state.

• Challenging. It should not be obvious how to achieve the goal.

• Useful. If the goal is achieved, the results should be clearly useful to many people.

• Testable. Solutions to the goal should have a simple test so that one can measure progress and one can tell when the goal is achieved.

• Incremental. It is very desirable that the goal has intermediate milestones so that progress can be measured along the way.

QA as a grand challenge

A more demanding task is to take a corpus like the Internet or the Computer Science journals, or Encyclopedia Britannica, and be able to answer summarization questions about it as well as a human expert in that field

-- Jim Gray, Journal of the ACM, Jan. 2003 (J.ACM’s 50th anniversary issue)

QA as a grand challenge

Read a Chapter in a Book and Answer the Questions at the End of the Chapter. Reading and understanding books is a quintessentially human activity. It is the process by which much knowledge transfer occurs from generation to generation.

-- Raj Reddy, Journal of the ACM, Jan. 2003

QA as a grand challenge

• Build a large knowledge base by reading text, reducing knowledge engineering effort by one order of magnitude

• The intent here is to “educate” a knowledge base in the same way that we receive most of our education

-- Edward A. Feigenbaum, Journal of the ACM, Jan. 2003

QA as a grand challenge

Because questions can be devised to query any aspect of text comprehension, the ability to answer questions is the strongest possible demonstration of understanding.

-- Wendy Lehnert

• So ODQA is, in some sense, AI-complete

Conclusion

• Open Domain Question Answering is a grand challenge in CS/AI/IT

• It is understandable, challenging, useful, testable, and incremental.

Thanks