Open Domain Question Answering Lide Wu Dept. of Computer Science Fudan University Shanghai 200433 China

Page 1:

Open Domain Question Answering

Lide Wu

Dept. of Computer Science

Fudan University

Shanghai 200433

China

Page 2:

Outline

• What is open domain question answering (ODQA)

• The state of the art of ODQA

• The future of ODQA

• ODQA as a grand challenge in CS/AI/IT

• Summary

Page 3:

What’s QA?

[Diagram: question → free-text corpus → answer]

Question: When did Hawaii become a state?

Answer: August 21, 1959

Page 4:

When did Hawaii become a state?

• AnswerBus Question Answering System - When did Hawaii become a ... Type in your question in English, French, Spanish, German, Italian or Portuguese. Question: When did Hawaii become a state? ... www.answerbus.com/cgi-bin/answer.cgi?When%2Bdid%2BHawaii%2Bbecome%2Ba%2Bstate%3F - 4k - Cached - Similar pages

• uncategorized threads in About Hawaii ... How did Hawaii become a state? What is the history of Hawaii?? ... When and why did Hawaii become a state (cause and effect); Safe to live by Mauna Loa? ... www.greenspun.com/bboard/q-and-a-one-category.tcl?topic=About%20Hawaii&category=uncategorized - 5k - Cached - Similar pages

• Is Hawaii Really a State of the Union? ... Become a state, or remain a territory? Why was the option of independence not on the ballot? Did Hawaii not have the option to become an independent country in ... www.hawaii-nation.org/statehood.html - 14k - Cached - Similar pages

• Hawaii Flag Printout - EnchantedLearning.com ... __________________________________________. 3. When did Hawaii become a state of the USA? _______________. Copyright ©2000-2003 EnchantedLearning.com. www.enchantedlearning.com/usa/flags/hawaii/hawaiiflag.shtml - 3k - Cached - Similar pages

• PaleoZoo's Prehistoric Hawaii! ... became extinct after rats and mongooses arrived in Hawaii. ... let Nature decide when a species should become extinct. They decided to save the nene, and they did. ... www.geobop.com/paleozoo/World/NA/US/HI/ - 42k - Cached - Similar pages

Page 5:

When did Hawaii become a state?

• HAWAII SUPREME COURT DROPS GAY MARRIAGE CASE || Human Rights ... It did not bar future cases that seek the benefits, protections and responsibilities that come ... Their ads claimed that Hawaii would become the "homosexual ... www.hrc.org/newsreleases/1999/991210.asp - 16k - 30 Jun 2003 - Cached - Similar pages

• Maui Trivia by MAUI CHEETAH ... ~ Ans: Front Street in Lahaina ***** submitted by: THonings; When did hawaii become a state? ~~ Ans: 1959 ***** submitted ... www.mauigateway.com/~rw/trivia1.htm - 12k - Cached - Similar pages

• State Bird of Hawaii Unmasked as Canadian ... it should be no surprise that Canada geese did it some ... But in their adopted tropical habitat of Hawaii, the birds "evolved to become more independent of ... news.nationalgeographic.com/news/2002/02/0206_020206_canadiangeese.html - 38k - Cached - Similar pages

• [PDF] BEFORE ARBITRATOR TAMOTSU TANAKA STATE OF HAWAII In the Matter of ... File Format: PDF/Adobe Acrobat - View as HTML ... training to insure that qualified employees become available. ... did not contravene the provisions of the Collective ... DATED: Honolulu, Hawaii, December 10, 1998. ... www.state.hi.us/hrd/121098.pdf - Similar pages

Page 6:

Page 7:

Comparison to Search Engines

• More natural interface

Natural language question vs Keywords

• More compact answer

Exact answers vs Relevant documents

Page 8:

The General solution of QA

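[Diagram: Question Analysis Model → Search Engine Model → Answer Extraction Model]

• Question Analysis Model to Search Engine Model: query set

• Question Analysis Model to Answer Extraction Model: answer type/patterns

• Search Engine Model to Answer Extraction Model: potential segments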

Page 9:

Question Analysis

• Input: Question (When did Hawaii become a state?)

• Output: Answer type/patterns (Date)

Queries (a group of keywords: Hawaii, state, became, …)

• Methods: POS tagging

Named entity tagging

BNP chunking

Syntactic parsing

Semantic tagging

…

Page 10:

Question Analysis

•Input: Question ( When did Hawaii become a state?)

•Output:

Answer type : Date

Patterns: “Hawaii became a state in….”

“In … Hawaii became a state.”

………….

Queries (A group of key words):

“When did Hawaii become a state”

“Hawaii became a state in….”

Hawaii, state, became
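
The question-analysis step can be sketched in a few lines. This is only a toy illustration, not the system described in the talk: the function name `analyze_question`, the question-word table, and the stopword list are all assumptions made for the example, and a real system would rely on the tagging, chunking, and parsing methods listed on the previous slide. Note that the sketch keeps the surface form "become" rather than normalizing it to "became".

```python
import re

# Hypothetical mapping from question words to expected answer types.
ANSWER_TYPES = {"when": "Date", "where": "Location", "who": "Person"}

# Small illustrative stopword list (an assumption, not from the talk).
STOPWORDS = {"when", "where", "who", "what", "did", "do", "does",
             "is", "a", "an", "the"}


def analyze_question(question):
    """Return (answer_type, keywords) for a simple wh-question."""
    tokens = re.findall(r"[A-Za-z0-9']+", question)
    first = tokens[0].lower() if tokens else ""
    answer_type = ANSWER_TYPES.get(first, "Unknown")
    # Keep the content words as the keyword query.
    keywords = [t for t in tokens if t.lower() not in STOPWORDS]
    return answer_type, keywords


answer_type, keywords = analyze_question("When did Hawaii become a state?")
print(answer_type, keywords)
```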

Page 11:

Search

• Input: Queries (e.g. “Hawaii became a state in”), i.e. groups of keywords or phrases

• Output: Text segments (snippets) relevant to the answer, such as the ones returned by Google

• Methods: Search Engines for passages
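
As a rough stand-in for the search step, the sketch below ranks a handful of passages by keyword overlap. It is an assumption-laden toy, not a real passage search engine: `rank_passages` and the sample passages are hypothetical, and a production system would query an indexed collection instead of scanning a list.

```python
def rank_passages(keywords, passages):
    """Order passages by how many query keywords they contain (most first)."""
    wanted = {k.lower() for k in keywords}

    def overlap(passage):
        # Crude tokenization: split on whitespace and trim punctuation.
        words = {w.strip('.,;:!?"()').lower() for w in passage.split()}
        return len(wanted & words)

    scored = [(overlap(p), p) for p in passages]
    # Drop passages with no keyword at all; sorted() is stable for ties.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [p for score, p in ranked if score > 0]


passages = [
    "Hawaii became the 50th state on Aug. 21, 1959.",
    "The state bird of Hawaii is the nene.",
    "Canada geese migrate south in winter.",
]
print(rank_passages(["Hawaii", "state", "became"], passages))
```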

Page 12:

Answer Extraction

• Input: Question answer type/patterns from question analysis

Snippets returned by search engines

• Output: Answers

• Methods: POS tagging

Named entity tagging

BNP chunking

Syntactic parsing

Semantic tagging

Co-reference resolution

Logic proving/matching

…

Page 13:

Answer Extraction

Question: When did Hawaii become a state?

Answer type: Date

Patterns from question analysis:

“Hawaii became a state ….”

“In … Hawaii became a state.”

………….

Snippets returned by search engines:

“…Hawaii became the 50th state on Aug.21,1959…”

“…Hawaii joined the States in 1959……”

………………
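
For a Date answer type, the extraction step can be approximated with a regular expression over the snippets. This is a minimal sketch under stated assumptions: the date shapes covered (`Aug.21,1959`-style dates and bare four-digit years) and the name `extract_dates` are chosen for illustration, and it stands in for the named-entity tagging and pattern matching that a real system would use.

```python
import re

# Assumed date shapes: "Aug.21,1959", "August 21, 1959", or a bare year.
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
DATE_RE = re.compile(
    rf"{MONTH}\s*\d{{1,2}},\s*\d{{4}}|\b(?:1[0-9]|20)\d{{2}}\b"
)


def extract_dates(snippets):
    """Return every date-like string found in the snippets, in order."""
    found = []
    for snippet in snippets:
        found.extend(DATE_RE.findall(snippet))
    return found


snippets = [
    "...Hawaii became the 50th state on Aug.21,1959...",
    "...Hawaii joined the States in 1959...",
]
print(extract_dates(snippets))
```

Both snippets yield a candidate here, and the more specific one (a full date) would normally be preferred when candidates agree on the year.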

Page 14:

Key techniques

CL:

• Part-of-speech tagging

• NE tagging

• Semantic tagging

• BNP chunking

• Reference resolution

• Syntactic parsing

IR: Search Engine

AI:

• Pattern matching

• Logic proving

Machine Learning

Page 15:

Key Knowledge

Dictionaries

• WordNet

• HowNet

• FrameNet

World Knowledge

• Encyclopedia

• Web

Page 16:

The State of the Art: Introduction to the TREC QA Task

• http://trec.nist.gov

• Organized by NIST

• Sponsors: NIST, DARPA, and ARDA

• Started in 1999

• Has the most participants among the TREC tasks

Page 17:

TREC-QA2002 participants (35)

• Alicante Univ., BBN,

• CMU-Javelin,

• Chinese Academy of Sciences,

• CL Research,

• Columbia Univ.-Illouz,

• Fudan University,

• IBM T.J. Watson Res. Ctr.-Ittycheriah,

• IBM T.J. Watson Res. Ctr.-Prager,

• InsightSoft-M,

• ITC-irst,

•Language Computer Corporation,

•LIMSI,

•MIT,

•National Univ. of Singapore-Lee,

•National Univ. of Singapore-Hui,

• NTT Communication Science Labs,

•POSTECH,

•Syracuse University,

•The MITRE Corp.

•Tokyo Univ. of Science,

•Univ. of Amsterdam –Monz,

•Université d’Angers,

•Univ. of Avignon,

•Univ. of Illinois at Urbana/Champaign,

•Univ. of Iowa,

•Univ. of Limerick,

•Univ. of Michigan,

•Univ. of Montreal,

•Univ. of Pisa,

•Univ. of Sheffield,

•Univ of Southern California/ISI,

•Univ. of Waterloo,

• Univ. of York

Page 18:

Document set

• The document set is the set of documents on

the AQUAINT disk set.

• 3GB

• News

Page 19:

Evaluation

500 questions (e.g. When did Hawaii become a state?)

For each question the answer is evaluated as

• Incorrect (W): the answer-string does not contain a correct answer or the answer is not responsive;

• Unsupported (U): the answer-string contains a correct answer but the document returned does not support that answer;

• Non-exact (X): the answer-string contains a correct answer and the document supports that answer, but the string contains more than just the answer (or is missing bits of the answer);

• Correct (R): the answer-string consists of exactly a correct answer and that answer is supported by the document returned.

• Only correct answers have scores

Page 20:

Score

$$S = \frac{1}{500}\sum_{i=1}^{500}\frac{\#\ \text{of correct answers up to the } i\text{th question}}{i}$$
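
The score can be computed directly from the per-question judgments. The sketch below assumes that questions are listed in order of decreasing system confidence, as in the TREC 2002 confidence-weighted score; the slide gives only the formula, so that ordering and the function name `confidence_weighted_score` are assumptions of this example.

```python
def confidence_weighted_score(judgments):
    """Score a run given per-question labels ("R", "X", "U" or "W").

    The labels are assumed to be sorted with the most confident
    answer first. Only "R" (correct) answers earn credit.
    """
    correct_so_far = 0
    total = 0.0
    for i, label in enumerate(judgments, start=1):
        if label == "R":
            correct_so_far += 1
        # Fraction of correct answers among the first i questions.
        total += correct_so_far / i
    return total / len(judgments) if judgments else 0.0


print(confidence_weighted_score(["R", "W", "R", "X"]))
```

Because each term divides by the rank, a correct answer placed early contributes to more terms than one placed late, so the measure rewards systems that know which of their answers are reliable.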

Page 21:

Top 15 Groups (2002)

Page 22:

TREC-QA2003 participants (25)

• Alicante Univ., BBN,

• CMU-Javelin,

• Chinese Academy of Sciences,

• CL Research,

• Fudan University,

• IBM T.J. Watson Res. Ctr.-Ittycheriah,

• IBM T.J. Watson Res. Ctr.-Prager,

• ITC-irst,

• Language Computer Corporation,

• Lexiclone Inc.,

• LIMSI,

• MIT,

• National Univ. of Singapore,

• NTT Communication Science Labs,

• New Mexico State Univ.

•The MITRE Corp.

•Univ. of Amsterdam –Monz,

•Univ. of Iowa,

•Univ. of Limerick,

• UPC&UdG

•Univ. of Pisa,

•Univ. of Sheffield,

•Univ of Southern California/ISI,

• Univ. of Waterloo,

• Univ. of Wales Bangor

Page 23:

Page 24:

TREC 2004: Question Set

• A series of questions for each of a set of targets

• Number of targets: 50-100

• Each series will contain:

– Several factoid questions

– 0-2 list questions

– A question called “other”

Page 25:

Example question

<target id="1" text="AmeriCorps">
    <qa>
        <q id="1.1" type="FACTOID">
            When was AmeriCorps founded?
        </q>
    </qa>
    <qa>
        <q id="1.2" type="FACTOID">
            How many volunteers work for it?
        </q>
    </qa>
    <qa>
        <q id="1.3" type="LIST">
            What activities are its volunteers involved in?
        </q>
    </qa>
    <qa>
        <q id="1.4" type="OTHER">
            Other
        </q>
    </qa>
</target>
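
A question series in this format can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is illustrative only: the helper name `read_series` is hypothetical, and it assumes a single `<target>` element as the document root, as in the example above.

```python
import xml.etree.ElementTree as ET

SERIES = """
<target id="1" text="AmeriCorps">
  <qa><q id="1.1" type="FACTOID">When was AmeriCorps founded?</q></qa>
  <qa><q id="1.2" type="FACTOID">How many volunteers work for it?</q></qa>
  <qa><q id="1.3" type="LIST">What activities are its volunteers involved in?</q></qa>
  <qa><q id="1.4" type="OTHER">Other</q></qa>
</target>
"""


def read_series(xml_text):
    """Return (target_text, [(question_id, question_type, question_text), ...])."""
    target = ET.fromstring(xml_text.strip())
    questions = [
        (q.get("id"), q.get("type"), (q.text or "").strip())
        for q in target.iter("q")
    ]
    return target.get("text"), questions


target_text, questions = read_series(SERIES)
print(target_text)
for question in questions:
    print(question)
```

Questions such as "How many volunteers work for it?" only make sense relative to the target, so keeping the target text alongside each series is what lets a system resolve "it" to AmeriCorps.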

Page 26:

Question Set

• Targets:

– Suggested by mining Microsoft and AOL web search logs

• The assessors created the questions before they did any searching of the document set to find answers to the questions.

Page 27:

The future of ODQA: A Roadmap (adapted from the NIST vision paper)

Variation of questions

Page 28:

Page 29:

The simplest questions

• Factual questions: What is Hawaii’s state flower?

• Void questions: The answer is no longer guaranteed to be present in the text collection, and the systems are expected to report the absence of an answer.

• List questions: The answer is scattered across two or more documents.

• Context questions: A group of relevant questions “within a context”.

Page 30:

List Questions

The answer is scattered across two or more documents.

What countries from South America did the Pope visit, and when?

Answer:

• Argentina – 1987 [Document Source 1]

• Colombia – 1986 [Document Source 2]

• Brazil – 1982, 1991 [Document Source 3]
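
Answering a list question means fusing partial answers found in separate documents. The sketch below shows one simple way to do that for the example above; the triple format `(country, year, source)` and the function name `merge_list_answers` are assumptions made for illustration, and the step that extracts those triples from text is not shown.

```python
from collections import defaultdict


def merge_list_answers(partial_answers):
    """Combine (country, year, source) triples found in separate documents."""
    merged = defaultdict(lambda: {"years": set(), "sources": set()})
    for country, year, source in partial_answers:
        merged[country]["years"].add(year)
        merged[country]["sources"].add(source)
    # Sort countries, years and sources so the result is stable and readable.
    return {
        country: (sorted(info["years"]), sorted(info["sources"]))
        for country, info in sorted(merged.items())
    }


found = [
    ("Argentina", 1987, "Document Source 1"),
    ("Colombia", 1986, "Document Source 2"),
    ("Brazil", 1982, "Document Source 3"),
    ("Brazil", 1991, "Document Source 3"),
]
for country, (years, sources) in merge_list_answers(found).items():
    print(country, years, sources)
```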

Page 31:

Context Questions

A group of relevant questions “within a context”

• Context: Topic 168

- Title: Financing AMTRAK

- Description: The role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).

 

• (Q1) Why can AMTRAK not be considered economically viable?

• (Q2) Should it be privatized?

• (Q3) How much larger are the government subsidies to AMTRAK as compared to those given to air transportation?

Page 32:

Definition/Template Question

• There are templates for this kind of question

• Example: Who is XXX?

The template consists of

– The address, phone number, fax number, email address, website, …

– The education history

– The working experience

– The contributions

– …
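
Such a template maps naturally onto a record with one slot per item. The sketch below is only one possible representation: the class name `PersonTemplate`, the field names, and the placeholder values are hypothetical, and the slide's list is open-ended, so a real template would have more slots.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonTemplate:
    """Slots for a "Who is XXX?" answer; the field names are illustrative."""
    name: str
    address: Optional[str] = None
    phone: Optional[str] = None
    fax: Optional[str] = None
    email: Optional[str] = None
    website: Optional[str] = None
    education: List[str] = field(default_factory=list)
    work_experience: List[str] = field(default_factory=list)
    contributions: List[str] = field(default_factory=list)

    def missing_slots(self):
        """Names of slots that are still empty, in declaration order."""
        return [k for k, v in vars(self).items() if v is None or v == []]


# Placeholder values only; no real person is described here.
profile = PersonTemplate(name="XXX", email="xxx@example.org")
print(profile.missing_slots())
```

Tracking which slots are still empty gives the system a concrete list of follow-up queries to issue before it presents the answer.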

Page 33:

Question with ambiguity

The answer will comprise an explanation of possible ambiguities and a justification of why the answer is right.

Page 34:

Examples

Where is the Taj Mahal?

Answer: If you are interested in the Indian landmark, it is in Agra, India. If instead you want to find the location of the Casino, it is in Atlantic City, NJ, U.S.A. There are also several restaurants named Taj Mahal. A full list is rendered by the following hypertable. If you click on the location, you may find the address.

• The Taj Mahal Indian Cuisine, Mountain View, CA

• The Taj Mahal Restaurant, Dallas, TX

• Taj Mahal, Las Vegas, NV

• Taj Mahal, Springfield, VA

Page 35:

Examples

How did Socrates die?

 

Answer:

He drank poisoned wine.

Anyone drinking or eating something that is poisoned is likely to die.

Page 36:

Summaries as answer

• More complex questions will require the answers to be summaries of the textual information contained in one or several documents.

• The summarization is going to be driven by the question, drawing on one or multiple documents.

• Moreover, the summary will be presented in a coherent manner using text generation capabilities.

Page 37:

Examples

• Context-based summary-generating questions

What is the financial situation of AMTRAK?

• Stand-alone summary-generating questions

How safe are commercial flights?

• Example-based summary-generating questions

What other companies are operated with Government aid?

Page 38:

Expert-Level Questions

Questions asked by experts require:

• Collecting sufficient structured and unstructured information for different domains.

• Mining domain knowledge and mastering the relationships between all activities, situations and facts within a specific domain.

• Reasoning by analogy, comparing and discovering new relations.

Page 39:

Examples

• (Q1) What are the opinions of the Danes on the Euro?

• (Q2) Why do so many people buy four-wheel-drive cars lately?

• (Q3) How likely is it that the Fed will raise the interest rates at their next meeting?

Page 40:

A General Approach

• Accept complex “Questions” in a form natural to the analyst.

• Translate the “Complex Question” into multiple queries appropriate to the various data sets to be searched.

• Find relevant information in distributed, multimedia, multilingual, multi-agency data sources.

• Analyze, fuse and summarize information into a coherent “Answer”.

• Provide the (proposed) “Answer” to the analyst in the form they want.

• Provide multimedia visualization and navigation tools.

Page 41:

ODQA as a grand challenge

What makes a good long-range research goal or a grand challenge? --- Jim Gray

• Understandable. The goal should be simple to state.

• Challenging. It should not be obvious how to achieve the goal.

• Useful. If the goal is achieved, the results should be clearly useful to many people.

• Testable. Solutions to the goal should have a simple test so that one can measure progress and one can tell when the goal is achieved.

• Incremental. It is very desirable that the goal has intermediate milestones so that progress can be measured along the way.

Page 42:

QA as a grand challenge

A more demanding task is to take a corpus like the Internet or the Computer Science journals, or Encyclopedia Britannica, and be able to answer summarization questions about it as well as a human expert in that field

--- Jim Gray, Journal of the ACM, Jan. 2003 (J. ACM’s 50th Anniversary)

Page 43:

QA as a grand challenge

Read a Chapter in a Book and Answer the Questions at the End of the Chapter. Reading and understanding books is a quintessentially human activity. It is the process by which much knowledge transfer occurs from generation to generation.

--- Raj Reddy, Journal of the ACM, Jan. 2003

Page 44:

QA as a grand challenge

• Build a large knowledge base by reading text, reducing knowledge engineering effort by one order of magnitude

• The intent here is to “educate” a knowledge base in the same way that we receive most of our education

--- Edward A. Feigenbaum, Journal of the ACM, Jan. 2003

Page 45:

QA as a grand challenge

Because questions can be devised to query any aspect of text comprehension, the ability to answer questions is the strongest possible demonstration of understanding.

---Wendy Lehnert

• So ODQA is AI-complete in some sense

Page 46:

Conclusion

• Open Domain Question Answering is a grand challenge in CS/AI/IT

• It is Understandable, Challenging, Useful, Testable, and Incremental.

Page 47:

Thanks