From Question-Answering to Information-Seeking Dialogs

Jerry R. Hobbs

Artificial Intelligence Center

SRI International

Menlo Park, California

(with Douglas Appelt, Chris Culy, David Israel, David Martin, Martin Reddy, Mark Stickel, and Richard Waldinger)

05/15/02 Principal Investigator: Jerry R. Hobbs, SRI International

Key Ideas

1. Logical analysis/decomposition of questions into component questions, using a reasoning engine

2. Bottoming out in a variety of web resources and an information extraction engine

3. Use of component questions to drive subsequent dialogue, for elaboration, revision, and clarification

4. Use of analysis of questions to determine, formulate, and present answers.


Plan of Attack

Inference-Based System:

Inference for Question-Answering -- this year

Inference for Dialog Structure -- next year, but starting design this year

Document retrieval and information extraction for question-answering:

Incorporate as resource in inference-based system -- this year


Composition of Information from Multiple Sources

How far is it from Mascat to Kandahar?

Question Decomposition via Logical Rules

What is the lat/long of Mascat? -- Alexandrian Digital Library Gazetteer

What is the lat/long of Kandahar? -- Alexandrian Digital Library Gazetteer

What is the distance between the two lat/longs? -- Geographical Formula or www.nau.edu/~cvm/latlongdist.html

Resources Attached to Reasoning Process

GEMINI, SNARK
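The composition on this slide can be sketched in a few lines of Python. This is only an illustration, not the SRI implementation: the `GAZETTEER` table stands in for the gazetteer lookup, its coordinates are approximate placeholders, and the haversine formula on a spherical Earth is assumed as one possible "geographical formula".

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed

# Hypothetical stand-in for a gazetteer lookup; coordinates are approximate.
GAZETTEER = {
    "Mascat": (23.59, 58.41),
    "Kandahar": (31.61, 65.71),
}

def lat_long(place):
    """Subquestion: what is the lat/long of `place`?"""
    return GAZETTEER[place]

def lat_long_distance(p1, p2):
    """Subquestion: great-circle distance in km between two lat/longs (haversine)."""
    lat1, lon1 = map(math.radians, p1)
    lat2, lon2 = map(math.radians, p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def distance_between(x, y):
    """Original question, answered by composing the two kinds of subquestion."""
    return lat_long_distance(lat_long(x), lat_long(y))

print(round(distance_between("Mascat", "Kandahar")), "km")
```

Each function corresponds to one component question, so a different resource could be swapped in for either one without touching the other.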


Composition of Information from Multiple Sources

Show me the region 100 km north of the capital of Afghanistan.

Question Decomposition via Logical Rules

What is the capital of Afghanistan? -- CIA Fact Book

What is the lat/long of Kabul? -- Alexandrian Digital Library Gazetteer

What is the lat/long 100 km north? -- Geographical Formula

Show that lat/long -- Terravision

Resources Attached to Reasoning Process
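The "lat/long 100 km north" step is simple arithmetic along a meridian. A minimal sketch, assuming a spherical Earth; the function name `point_north` and the Kabul coordinates are illustrative placeholders, not values from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical Earth is assumed

def point_north(lat_deg, lon_deg, km):
    """Return the lat/long reached by travelling `km` due north along a meridian."""
    new_lat = lat_deg + math.degrees(km / EARTH_RADIUS_KM)
    if new_lat > 90.0:
        raise ValueError("crossing the pole is not handled in this sketch")
    return (new_lat, lon_deg)

kabul = (34.53, 69.17)  # approximate placeholder for the gazetteer answer
print(point_north(*kabul, 100.0))
```

Travelling due north leaves the longitude unchanged, which is why only the latitude is adjusted.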


Combining Time, Space, and Personal Information

Could Mohammed Atta have met with an Iraqi official between 1998 and 2001?

Logical Form

meet(a,b,t) & 1998 <= t <= 2001

Question Decomposition via Logical Rules

at(a,x1,t) & at(b,x2,t) & near(x1,x2) & official(b,Iraq)

go(a,x1,t)

go(b,x2,t)

Resources Attached to Reasoning Process: IE Engine, Geographical Reasoning, Temporal Reasoning
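The decomposition of meet(a,b,t) can be sketched as a search for a common time and nearby places. Everything below is hypothetical: the `Stay` records stand in for what an IE engine might return, the place names are invented, `near` is reduced to a lookup table, and the official(b,Iraq) conjunct is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stay:
    person: str
    place: str
    start: int  # first year at the place
    end: int    # last year at the place

def near(x1, x2, nearby_pairs):
    """Stand-in for geographical reasoning: same place or a listed nearby pair."""
    return x1 == x2 or (x1, x2) in nearby_pairs or (x2, x1) in nearby_pairs

def could_have_met(a, b, stays, nearby_pairs, t_min, t_max):
    """Return (x1, x2, t) witnesses with at(a,x1,t) & at(b,x2,t) & near(x1,x2)."""
    witnesses = []
    for sa in (s for s in stays if s.person == a):
        for sb in (s for s in stays if s.person == b):
            lo = max(sa.start, sb.start, t_min)  # earliest shared year in the window
            hi = min(sa.end, sb.end, t_max)      # latest shared year in the window
            if lo <= hi and near(sa.place, sb.place, nearby_pairs):
                witnesses.append((sa.place, sb.place, lo))
    return witnesses

stays = [
    Stay("a", "CityP", 1999, 2000),
    Stay("b", "CityQ", 2000, 2002),
    Stay("b", "CityR", 1997, 1999),
]
nearby = {("CityP", "CityQ")}
print(could_have_met("a", "b", stays, nearby, 1998, 2001))
```

A non-empty result only shows that a meeting was possible under the recorded stays, which matches the "could ... have met" reading of the question.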


Two Central Systems

GEMINI:
Large unification grammar of English
Under development for more than a decade
Fast parser
Generates logical forms
Used in ATIS and CommandTalk

SNARK:
Large, efficient theorem prover
Under development for more than a decade
Built-in temporal and spatial reasoners
Procedural attachment, including for web resources
Extracts answers from proofs
Strategic controls for speed-up


Linguistic Variation

How far is Mascat from Kandahar?
How far is it from Mascat to Kandahar?
How far is it from Kandahar to Mascat?
How far is it between Mascat and Kandahar?
What is the distance from Mascat to Kandahar?
What is the distance between Mascat and Kandahar?

GEMINI parses and produces logical forms for most TREC-type queries

Use TACITUS and FASTUS lexicons to augment GEMINI lexicon

Unknown word guessing based on "morphology" and immediate context


"Snarkification"

Problem: GEMINI produces logical forms not completely aligned with what SNARK theories need

Current solution: Write simplification code to map from one to the other

Long-term solution: Logical forms that are aligned better


Relating Lexical Predicates to Core Theory Predicates

"... distance ...", "how far ..." --> distance-between

Need to write these axioms for every domain we deal with

Have illustrative examples
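One way to picture such an axiom is as a rewrite from a lexical predicate to its core theory predicate. The sketch below is an assumption-laden illustration: the lexical predicate names `distance` and `how-far` and the tuple encoding of literals are invented here, and a real axiom set would be written in the prover's logic rather than as a Python table.

```python
# Hypothetical lexical predicate names; the slide gives only the phrases
# "... distance ..." and "how far ...".
LEXICAL_TO_CORE = {
    "distance": "distance-between",
    "how-far": "distance-between",
}

def to_core(literal):
    """Rewrite one logical-form literal (pred, *args) onto its core theory predicate."""
    pred, *args = literal
    return (LEXICAL_TO_CORE.get(pred, pred), *args)

print(to_core(("how-far", "?d", "Mascat", "Kandahar")))
print(to_core(("distance", "?d", "Mascat", "Kandahar")))
```

Both phrasings end up as the same core literal, so every later rule only has to mention distance-between.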


Decomposition of Questions

lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2) --> distance-between(d,x,y)

Need axioms relating core theory predicates and predicates from available resources

Have illustrative examples
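Read backwards, the axiom turns one question into three component questions. A minimal sketch of that step, assuming literals are encoded as tuples and unbound variables are written with a leading `?`; the `decompose` helper is illustrative and does no real unification or proof search.

```python
# The rule from the slide: body literals --> head literal.
RULE = {
    "head": ("distance-between", "d", "x", "y"),
    "body": [
        ("lat-long", "l1", "x"),
        ("lat-long", "l2", "y"),
        ("lat-long-distance", "d", "l1", "l2"),
    ],
}

def decompose(goal, rule):
    """Match `goal` against the rule head and return the body as component questions."""
    head = rule["head"]
    if goal[0] != head[0] or len(goal) != len(head):
        return None
    # Bind each head variable to the corresponding goal argument.
    binding = dict(zip(head[1:], goal[1:]))
    # Variables that occur only in the body stay unbound, written as "?name".
    return [(lit[0], *[binding.get(arg, "?" + arg) for arg in lit[1:]])
            for lit in rule["body"]]

for subquestion in decompose(("distance-between", "?d", "Mascat", "Kandahar"), RULE):
    print(subquestion)
```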


Procedural Attachment

Declaration for certain predicates:
There is a procedure for proving it
Which arguments are required before it is called

lat-long(l1,x)

lat-long-distance(d,l1,l2)

When a predicate with those arguments bound is generated in the proof, the procedure is executed.
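The declaration idea can be sketched as a registry that records, for each predicate, a procedure and the argument positions that must be bound before it runs. This is not SNARK's interface: the `attach` decorator, the `?` convention for unbound variables, the choice of the place argument as the required one, and the coordinate table are all assumptions made for illustration.

```python
ATTACHMENTS = {}

def attach(predicate, required):
    """Declare a procedure for `predicate`; `required` lists argument positions that must be bound."""
    def register(proc):
        ATTACHMENTS[predicate] = (required, proc)
        return proc
    return register

def is_bound(arg):
    """Unbound variables are written as strings starting with '?' in this sketch."""
    return not (isinstance(arg, str) and arg.startswith("?"))

@attach("lat-long", required=[1])  # assumed: the place x must be bound
def lat_long(l1, x):
    table = {"Mascat": (23.59, 58.41), "Kandahar": (31.61, 65.71)}  # approximate placeholders
    return (table[x], x)

def try_execute(literal):
    """Run the attached procedure if its required arguments are bound, else return None."""
    pred, *args = literal
    if pred not in ATTACHMENTS:
        return None
    required, proc = ATTACHMENTS[pred]
    if not all(is_bound(args[i]) for i in required):
        return None
    return (pred, *proc(*args))

print(try_execute(("lat-long", "?l1", "Mascat")))  # procedure runs
print(try_execute(("lat-long", "?l1", "?x")))      # deferred: x is not bound yet
```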


Open Agent Architecture

OAA Agent

GEMINI snarkify SNARK

Resources via OAA Agents


Use of SMART + TextPro

Question

Question Decomposition via Logical Rules

Subquestion-1, Subquestion-2, Subquestion-3

Resources Attached to Reasoning Process: SMART + TextPro, Other Resources

SMART + TextPro: One Resource Among Many


Information Extraction Engine as a Resource

SMART: Document retrieval for pre-processing

TextPro: Top of the line information extraction engine

Analyze NL query with GEMINI and SNARK

Run TextPro over documents retrieved by SMART

Retrieve best-match passage

Use TextPro annotations or GEMINI analysis to extract answer from passage


Linking SNARK with TextPro

TextSearch(EntType(?x), Terms(p), Terms(c), WSeq)

& Analyze(WSeq, p(?x,c))

--> p(?x,c)

TextSearch: Call to SMART+TextPro

EntType(?x): Type of questioned constituent

Terms(p), Terms(c): Synonyms and hypernyms of word associated with p or c

WSeq: Answer: Ordered sequence of strings of words

Analyze: Match pieces of answer strings with pieces of query

p(?x,c): Subquery generated by SNARK during analysis of query


Information Extraction Engine as a Resource

SMART: Document retrieval for pre-processing

TextPro: Top of the line information extraction engine

Analyze NL query with GEMINI and SNARK

Run TextPro over documents retrieved by SMART

TextPro returns relevant templates

Agent turns templates into logic for SNARK to use in proof
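The last step, turning templates into logic, might look roughly like the sketch below. The template layout (a dict with a `type` key plus slots) and the slot names are hypothetical; the slides do not show TextPro's actual template format or the exact predicates SNARK expects.

```python
def template_to_literals(template, event_id):
    """Turn one extraction template (a dict) into predicate literals about `event_id`."""
    literals = [(template["type"], event_id)]
    for slot, value in template.items():
        # Skip the type marker and any slot the extractor left unfilled.
        if slot != "type" and value is not None:
            literals.append((slot, event_id, value))
    return literals

# Hypothetical template for a movement event.
template = {"type": "go", "agent": "a", "destination": "x1", "time": 1999, "source": None}
for literal in template_to_literals(template, "e1"):
    print(literal)
```

Unfilled slots are simply omitted, so the prover sees only what the extractor actually found.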


Domain-Specific Patterns

Decide upon domain (e.g., nonproliferation)

Compile list of principal properties and relations of interest

Implement these patterns in TextPro

Implement link between TextPro and SNARK, converting between templates and logic


Temporal Reasoning: Structure

Topology of Time: start, end, before, between

Measures of Duration: for an hour, ...

Clock and Calendar: 3:45pm, Wednesday, June 12

Temporal Aggregates: every other Wednesday

Deictic Time: last year, ...
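The five layers can each be illustrated with Python's standard `datetime` module. This is only a sketch of the distinctions, not the temporal ontology itself; the helper names are invented, and the year 2002 for "Wednesday, June 12" is an assumption taken from the slide date.

```python
from datetime import date, datetime, timedelta

# Topology of time: ordering relations between instants.
def before(t1, t2):
    return t1 < t2

def between(t, start, end):
    return start <= t <= end

# Measure of duration: "for an hour".
ONE_HOUR = timedelta(hours=1)

# Clock and calendar: 3:45pm, Wednesday, June 12 (year assumed to be 2002).
APPOINTMENT = datetime(2002, 6, 12, 15, 45)

def every_other_wednesday(first, count):
    """Temporal aggregate: `count` Wednesdays two weeks apart, starting on or after `first`."""
    offset = (2 - first.weekday()) % 7  # Monday is 0, so Wednesday is 2
    start = first + timedelta(days=offset)
    return [start + timedelta(weeks=2 * i) for i in range(count)]

def last_year(now):
    """Deictic time: the calendar year before the one containing `now`."""
    return (date(now.year - 1, 1, 1), date(now.year - 1, 12, 31))

print(every_other_wednesday(date(2002, 6, 10), 3))
print(last_year(date(2002, 5, 15)))
```

Deictic expressions are the only ones that need the utterance time as an extra argument, which is why `last_year` takes `now`.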


Temporal Reasoning: Goals

Develop temporal ontology (DAML)

Reason about time in SNARK (AQUAINT, DAML)

Link with Temporal Annotation Standards (AQUAINT)

Answer questions with temporal component (AQUAINT)

Nearly complete

In progress


Spatial and Geographical Reasoning: Structure

Topology of Space: Is Albania a part of Europe?

Dimensionality

Measures: How large is North Korea?

Orientation and Shape: What direction is Monterey from SF?

Latitude and Longitude: Alexandrian Digital Library Gazetteer

Political Divisions: CIA World Fact Book, ...


Spatial and Geographical Reasoning: Goals

Develop spatial and geographical ontology (DAML)

Reason about space and geography in SNARK (AQUAINT, DAML)

Attach spatial and geographical resources (AQUAINT)

Answer questions with spatial component (AQUAINT)

Some capability now


Dialog Modeling

Key Idea: System matches user's utterance with one of several active tasks. Understanding dialog is one active task.

Rules of form:

property(situation) --> active(Task1)

including

utter(u,w) --> active(DialogTask)

want(u,Task1) --> active(Task1)

Understanding is matching utterance (conjunction of predications) with an active task or the condition of an inactive task.


Dialog Task Model

understand(a,e,t): hear(a,w) & parse(w,e) & match(e,t)

yes -- Action determined by utterance and task

no -- x unmatched

Ask about x
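The match step and its two outcomes can be sketched as follows. This is a strong simplification under stated assumptions: predications are reduced to bare predicate names, each task is described by the set of names it expects, and the task names are invented for the example.

```python
def understand(predications, active_tasks):
    """Match an utterance (a conjunction of predications) with an active task.

    Returns ("act", task) when every predication matches one task,
    otherwise ("ask", unmatched) for the best partial match.
    """
    utterance = set(predications)
    best_task, best_unmatched = None, utterance
    for task, expected in active_tasks.items():
        unmatched = utterance - expected
        if len(unmatched) < len(best_unmatched):
            best_task, best_unmatched = task, unmatched
    if best_task is not None and not best_unmatched:
        return ("act", best_task)           # yes: action determined by utterance and task
    return ("ask", sorted(best_unmatched))  # no: ask about the unmatched x

# Hypothetical active tasks and the predicates each one expects.
tasks = {
    "FindDistance": {"distance-between", "city"},
    "ShowMap": {"show", "region"},
}
print(understand(["distance-between", "city"], tasks))
print(understand(["distance-between", "river"], tasks))
```

The unmatched predications are exactly what a clarification question would be about.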


Fixed-Domain QA Evaluation

Pick a domain, e.g., nonproliferation

Pick a set of resources, including a corpus of texts, structured databases, web services

Have expert make up 200+ realistic questions, answerable with resources + inference

Divide questions into training and test sets

Give sites one month+ to work on training set

Test on test set
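The split into training and test sets could be done with a seeded shuffle so that all sites receive the same partition. A minimal sketch; the 50/50 default and the function name are assumptions, since the slide does not state a ratio.

```python
import random

def split_questions(questions, test_fraction=0.5, seed=0):
    """Shuffle with a fixed seed and return (training_set, test_set)."""
    shuffled = list(questions)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

questions = [f"question-{i}" for i in range(200)]  # placeholders for the expert's questions
training, test = split_questions(questions, test_fraction=0.25)
print(len(training), len(test))
```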