From Question-Answering to Information-Seeking Dialogs

Jerry R. Hobbs

USC Information Sciences Institute

Marina del Rey, California

(with Chris Culy, Douglas Appelt, David Israel, Peter Jarvis, David Martin, Mark Stickel, and Richard Waldinger of SRI)

October 24, 2002

Key Ideas

1. Logical analysis/decomposition of questions into component questions, using a reasoning engine

2. Bottoming out in variety of web resources and information extraction engine

3. Use of component questions to drive subsequent dialog, for elaboration, revision, and clarification

4. Use of analysis of questions to determine, formulate, and present answers.


Plan of Attack

Inference-Based System:

Inference for Question-Answering -- this year
Inference for Dialog Structure -- beginning now

Incorporate Resources:
Geographical reasoning -- this year
Temporal reasoning -- this summer
Agent and action ontology -- this summer
Document retrieval and information extraction for question-answering -- beginning now


An Information-Seeking Scenario

Top-level question: How safe is the Mascat harbor for refueling US Navy ships?

Question decomposition via logical rules yields component questions:
What recent terrorist incidents in Oman?
Are relations between Oman and US friendly?
How secure is the Mascat harbor?

Resources attached to the reasoning process:
IR + IE engine for searching recent news feeds
A map of the harbor found on the DAML-encoded Semantic Web/Intelink
Asking the analyst (asking the user is one such resource)


Composition of Information from Multiple Sources

Question: How far is it from Mascat to Kandahar?

GEMINI parses the question and SNARK decomposes it via logical rules; each component question is answered by a resource attached to the reasoning process:
What is the lat/long of Mascat? -- Alexandria Digital Library Gazetteer
What is the lat/long of Kandahar? -- Alexandria Digital Library Gazetteer
What is the distance between the two lat/longs? -- geographical formula, or www.nau.edu/~cvm/latlongdist.html
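The "geographical formula" here is presumably the great-circle distance. A minimal Python sketch, with approximate coordinates standing in for what the gazetteer would return:

    from math import radians, sin, cos, asin, sqrt

    def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
        """Haversine great-circle distance between two lat/long points, in km."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * radius_km * asin(sqrt(a))

    # Approximate coordinates; the real system would ask the gazetteer for these.
    muscat = (23.6, 58.4)     # Mascat (Muscat), Oman
    kandahar = (31.6, 65.7)   # Kandahar, Afghanistan
    print(round(great_circle_km(*muscat, *kandahar)))  # about 1140 km with these inputs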


Composition of Information from Multiple Sources

Question: Show me the region 100 km north of the capital of Afghanistan.

Question decomposition via logical rules, with resources attached to the reasoning process:
What is the capital of Afghanistan? -- CIA World Fact Book
What is the lat/long of Kabul? -- Alexandria Digital Library Gazetteer
What is the lat/long 100 km north? -- geographical formula
Show that lat/long -- TerraVision
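For the "100 km north" step, a first-order sketch: moving due north changes only latitude, at roughly 111.2 km per degree (a spheroid model would be more exact). Kabul's coordinates here are approximate assumptions:

    KM_PER_DEG_LAT = 111.2  # mean meridian arc length per degree of latitude

    def point_north(lat, lon, km):
        """Lat/long of the point `km` due north; longitude is unchanged."""
        return lat + km / KM_PER_DEG_LAT, lon

    kabul = (34.5, 69.2)  # approximate; the real system gets this from the gazetteer
    print(point_north(*kabul, 100))  # about (35.4, 69.2)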


Combining Time, Space, and Personal Information

Question: Could Mohammed Atta have met with an Iraqi official between 1998 and 2001?

Logical form: meet(a,b,t) & 1998 <= t <= 2001

Question decomposition via logical rules:
at(a,x1,t) & at(b,x2,t) & near(x1,x2) & official(b,Iraq)
bottoming out in go(a,x1,t) and go(b,x2,t)

Resources attached to the reasoning process: the IE engine (for the go facts), geographical reasoning (for near), and temporal reasoning (for the date constraint).


System Architecture

Query --> (parsing: GEMINI) --> Logical Form --> (decomposition and interpretation: SNARK) --> Proof with Answer

SNARK's reasoning is attached to web resources and other resources.


Two Central Systems

GEMINI:
Large unification grammar of English
Under development for more than a decade
Fast parser
Generates logical forms
Used in ATIS and CommandTalk

SNARK:
Large, efficient theorem prover
Under development for more than a decade
Built-in temporal and spatial reasoners
Procedural attachment, including for web resources
Extracts answers from proofs
Strategic controls for speed-up


Linguistic Variation

How far is Mascat from Kandahar?
How far is it from Mascat to Kandahar?
How far is it from Kandahar to Mascat?
How far is it between Mascat and Kandahar?
What is the distance from Mascat to Kandahar?
What is the distance between Mascat and Kandahar?

GEMINI parses and produces logical forms for most TREC-type queries
Use TACITUS and FASTUS lexicons to augment the GEMINI lexicon
Unknown-word guessing based on "morphology" and immediate context


"Snarkification"

Problem: GEMINI produces logical forms that are not completely aligned with what SNARK's theories need

Current solution: Write simplification code to map from one to the other

Long-term solution: Logical forms that are better aligned


Relating Lexical Predicates to Core Theory Predicates

Lexical predicates such as "... distance ..." and "how far ..." all map to the core theory predicate distance-between.

Need to write these axioms for every domain we deal with
Have illustrative examples


Decomposition of Questions

lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2) --> distance-between(d,x,y)

Need axioms relating core theory predicates and predicates from available resources
Have illustrative examples


Procedural Attachment

Declaration for certain predicates:
There is a procedure for proving it
Which arguments are required to be bound before it is called
lat-long(l1,x)
lat-long-distance(d,l1,l2)

When a predicate with those arguments bound is generated in the proof, the procedure is executed.
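SNARK's actual interface is not shown in the slides, so the following Python sketch only illustrates the idea: each attached predicate declares which argument positions must be bound, and the procedure runs once a literal with those arguments bound appears in the proof. It reuses great_circle_km from the earlier distance sketch.

    # Illustrative only -- not SNARK's real API.
    ATTACHMENTS = {}

    def attach(pred, required, proc):
        """Declare a procedure for proving `pred`, callable once the
        argument positions in `required` are bound."""
        ATTACHMENTS[pred] = (required, proc)

    def try_attachment(pred, args):
        """Run the attached procedure if all required args are bound
        (here, a string starting with '?' stands for an unbound variable)."""
        required, proc = ATTACHMENTS[pred]
        if any(str(args[i]).startswith("?") for i in required):
            return None
        return proc(*args)

    # lat-long(l, x): x (position 1) must be bound before the call.
    attach("lat-long", [1], lambda l, x:
           {"Mascat": (23.6, 58.4), "Kandahar": (31.6, 65.7)}.get(x))
    # lat-long-distance(d, l1, l2): l1 and l2 (positions 1, 2) must be bound.
    attach("lat-long-distance", [1, 2], lambda d, l1, l2: great_circle_km(*l1, *l2))

    l1 = try_attachment("lat-long", ["?l1", "Mascat"])
    l2 = try_attachment("lat-long", ["?l2", "Kandahar"])
    print(try_attachment("lat-long-distance", ["?d", l1, l2]))  # the distance in km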


Open Agent Architecture

GEMINI, snarkify, and SNARK each run as OAA agents; the attached resources are reached via OAA agents as well.


Use of SMART + TextPro

Question decomposition via logical rules splits a question into Subquestion-1, Subquestion-2, and Subquestion-3, each routed to a resource attached to the reasoning process. SMART + TextPro is one resource among many; other resources handle the remaining subquestions.


Information Extraction Engine as a Resource

Document retrieval for pre-processing

TextPro: top-of-the-line information extraction engine; recognizes subject-verb-object and coreference relations

Analyze the NL query with GEMINI and SNARK

Bottom out in a pattern for TextPro to seek

Keyword search on a very large corpus

TextPro runs over the documents retrieved


Linking SNARK with TextPro

TextSearch(EntType(?x), Terms(p), Terms(c), WSeq) & Analyze(WSeq, p(?x,c)) --> p(?x,c)

Reading the rule's pieces:
TextSearch(...) -- the call to TextPro, a subquery generated by SNARK during its analysis of the query
EntType(?x) -- the type of the questioned constituent
Terms(p), Terms(c) -- synonyms and hypernyms of the words associated with p and c
WSeq -- the answer: an ordered sequence of annotated strings of words
Analyze(WSeq, p(?x,c)) -- match pieces of the annotated answer strings with pieces of the query


Three Modes of Operation for TextPro

1. Search for predefined patterns and relations (ACE-style, e.g., the Role and AT relations) and translate the relations into SNARK's logic
Where does the CEO of IBM live?

2. Search for subject-verb-object relations in processed text that match the predicate-argument structure of SNARK's logical expression
"Samuel Palmisano is CEO of IBM."

3. Search for the passage with the highest density of relevant words and an entity of the right type for the answer
"Samuel Palmisano .... CEO .... IBM."

Use coreference links to get the most informative answer


First Mode

TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, Role(?x,Management,IBM,CEO)) --> CEO(?x,IBM)

Analyze produces the entity and relation templates:

Entity1: {Samuel Palmisano, Palmisano, head, he}
Entity2: {IBM, International Business Machines, they}
Relation: Role(Entity1, Entity2, Management, CEO)

<relation TYPE="Role" SUBTYPE="Management">
  <rel_entity_arg ID="Entity1" ARGNUM="1"/>
  <rel_entity_arg ID="Entity2" ARGNUM="2"/>
  <rel_attribute ATTR="POSITION">CEO</rel_attribute>
</relation>

which translate into CEO(Samuel Palmisano, IBM).
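Translating such a template into SNARK's logic might look roughly like the following sketch; the XML follows the slide, but the choice of each entity's first mention as its canonical name is an assumption:

    import xml.etree.ElementTree as ET

    # Coreference classes as on the slide: each entity with its mentions.
    entities = {
        "Entity1": ["Samuel Palmisano", "Palmisano", "head", "he"],
        "Entity2": ["IBM", "International Business Machines", "they"],
    }

    ace_xml = """
    <relation TYPE="Role" SUBTYPE="Management">
      <rel_entity_arg ID="Entity1" ARGNUM="1"/>
      <rel_entity_arg ID="Entity2" ARGNUM="2"/>
      <rel_attribute ATTR="POSITION">CEO</rel_attribute>
    </relation>
    """

    def relation_to_logic(xml_text, entities):
        """Turn an ACE-style relation template into a logical atom,
        naming each argument by its first (canonical) mention."""
        rel = ET.fromstring(xml_text)
        args = sorted(rel.findall("rel_entity_arg"), key=lambda a: a.get("ARGNUM"))
        names = [entities[a.get("ID")][0] for a in args]
        position = rel.find("rel_attribute").text  # "CEO"
        return f"{position}({names[0]},{names[1]})"

    print(relation_to_logic(ace_xml, entities))  # CEO(Samuel Palmisano,IBM)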




Second Mode

TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM)) --> CEO(?x,IBM)

Analyze matches the subject-verb-object annotation

"<subj> Samuel Palmisano </subj> <verb> heads </verb> <obj> IBM </obj>"

yielding CEO(Samuel Palmisano, IBM).




Third Mode

TextSearch(Person, Terms(CEO), Terms(IBM), WSeq) & Analyze(WSeq, CEO(?x,IBM)) --> CEO(?x,IBM)

Analyze finds the passage

"<person> He </person> has recently been rumored to have been appointed Lou Gerstner's successor as <CEOword> CEO </CEOword> of the major computer maker nicknamed <co> Big Blue </co>"

and a coreference link resolves "<person> He </person>" to "<person> Samuel Palmisano </person>", yielding CEO(Samuel Palmisano, IBM).


Domain-Specific Patterns

Decide upon domain (e.g., nonproliferation)

Compile list of principal properties and relations of interest

Implement these patterns in TextPro

Implement link between TextPro and SNARK, converting between templates and logic


Challenges

Cross-document identification of individuals:
Document 1: Osama bin Laden
Document 2: bin Laden
Document 3: Usama bin Laden

Do entities with the same or similar names represent the same individual?

Metonymy:
Text: Beijing approved the UN resolution on Iraq.
Query involves "China", not "Beijing"
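As a toy illustration of the first challenge (a real system would use transliteration models and document context, so this is only a sketch with a hand-coded variant table):

    def name_tokens(name):
        """Lowercased tokens, normalizing known transliteration variants."""
        translit = {"usama": "osama"}   # hand-coded for this example
        return {translit.get(t, t) for t in name.lower().split()}

    def maybe_same(name1, name2):
        """Heuristic: one name's token set is contained in the other's."""
        t1, t2 = name_tokens(name1), name_tokens(name2)
        return t1 <= t2 or t2 <= t1

    docs = ["Osama bin Laden", "bin Laden", "Usama bin Laden"]
    print([maybe_same(docs[0], m) for m in docs[1:]])  # [True, True]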


DAML Search Engine

Teknowledge has developed a DAML search engine. A query is a triple, with each element tagged with its namespace:

pred: capital
arg1: ?x
arg2: Indonesia

It searches the entire (soon to be exponentially growing) Semantic Web, and also handles conjunctive queries: population of capital of Indonesia.

Problem: you have to know logic and RDF to use it.


DAML Search Engine as an AQUAINT Web Resource

The AQUAINT system turns an English question into the logical subquery capital(?x,Indonesia), and a procedural attachment in SNARK sends it to the DAML search engine as the triple

pred: capital
arg1: ?x
arg2: Indonesia

(each element tagged with its namespace), which searches the entire (soon to be exponentially growing) Semantic Web.

Solution: you only have to know English to use it; this makes the entire Semantic Web accessible to AQUAINT users.


Temporal Reasoning: Structure

Topology of Time: start, end, before, between

Measures of Duration: for an hour, ...

Clock and Calendar: 3:45pm, Wednesday, June 12

Temporal Aggregates: every other Wednesday

Deictic Time: last year, ...


Temporal Reasoning: Goals

Develop temporal ontology (DAML)

Reason about time in SNARK (AQUAINT, DAML)

Link with Temporal Annotation Language TimeML (AQUAINT)

Answer questions with temporal component (AQUAINT)

(Some of this work is nearly complete; the rest is in progress.)


Convergence

DAML annotation of temporal information on the Web (DAML-Time)

Annotation of temporal information in text (TimeML)

Most information on the Web is in text, so the two annotation schemes should be intertranslatable.


TimeML Annotation Scheme (An Abstract View)

[Diagram: a timeline relating clock and calendar intervals and instants (2001, Sept 11), durations (6 mos), interval inclusion, before relations, and instantaneous events (warning)]


TimeML Example

The top commander of a Cambodian resistance force said Thursday he has sent a team to recover the remains of a British mine removal expert kidnapped and presumed killed by Khmer Rouge guerrillas two years ago.

[Diagram: the annotated events (resist, command, said, sent, recover, remove, kidnap, presumed, killed, remain) and times (Thursday, now, 2 years) linked by temporal relations]


Vision

Manual DAML temporal annotation of web resources

Manual temporal annotation of large NL corpus

Programs for automatic temporal annotation of NL text

Automatic DAML temporal annotation of web resources


Spatial and Geographical Reasoning: Structure

Topology of Space: Is Albania a part of Europe?

Dimensionality

Measures: How large is North Korea?

Orientation and Shape: What direction is Monterey from SF?

Latitude and Longitude: Alexandrian Digital Library Gazetteer

Political Divisions: CIA World Fact Book, ...


Spatial and Geographical Reasoning: Goals

Develop spatial and geographical ontology (DAML)

Reason about space and geography in SNARK (AQUAINT, DAML)

Attach spatial and geographical resources (AQUAINT)

Answer questions with spatial component (AQUAINT)

(Some capability now.)


Rudimentary Ontology of Agents and Actions

Persons and their properties and relations:
name, alias, (principal) residence
family and friendship relationships
movements and interactions

Actions/events:
types of actions/events
preconditions and effects
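Rendered as data structures, the ontology might look roughly like this; the field names are extrapolations from the bullets above, not the project's actual schema:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Person:
        name: str
        aliases: list = field(default_factory=list)
        residence: Optional[str] = None                     # principal residence
        relationships: list = field(default_factory=list)   # family, friendship
        movements: list = field(default_factory=list)        # e.g. go(a,x1,t) facts

    @dataclass
    class ActionType:
        name: str                                    # type of action/event
        preconditions: list = field(default_factory=list)
        effects: list = field(default_factory=list)

    # A movement event and an action type in the spirit of the earlier logical forms:
    atta = Person(name="Mohammed Atta", movements=["go(a,x1,t)"])
    meet = ActionType(name="meet",
                      preconditions=["at(a,x1,t)", "at(b,x2,t)", "near(x1,x2)"])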


Domain-Dependent Ontologies

Nonproliferation data and task

Construct relevant ontologies


Dialog Modeling: Approaching It Top Down

Key Idea: System matches user's utterance with one of several active tasks. Understanding dialog is one active task.

Rules of form:

property(situation) --> active(Task1)

including

utter(u,w) --> active(DialogTask)
want(u,Task1) --> active(Task1)

Understanding is matching utterance (conjunction of predications) with an active task or the condition of an inactive task.


Dialog Task Model

understand(a,e,t): hear(a,w) & parse(w,e) & match(e,t)

If the match succeeds (yes): the action to take is determined by the utterance and the task.
If it fails (no -- x unmatched): ask about x.
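A schematic control loop for this model (the task representation and parse output are stand-ins, since the slide gives only the logical specification):

    # Each task lists the predicates it can interpret -- a stand-in
    # for the real task representation.
    TASKS = [{"name": "ShowRegion",
              "preds": {"show", "region", "north-of", "capital"}}]

    def understand(predications, tasks):
        """Match an utterance (a conjunction of predications) against the
        active tasks; the slide's yes / 'no -- x unmatched' branch."""
        for task in tasks:
            unmatched = [p for p in predications if p[0] not in task["preds"]]
            if not unmatched:
                return ("act", task["name"])   # action determined by utterance and task
            if len(unmatched) < len(predications):
                return ("ask", unmatched[0])   # partial match: ask about x
        return ("ask", predications[0])        # nothing matched at all

    # "Show me the region 100 km north of the capital of Afghanistan."
    lf = [("show", "r"), ("region", "r"),
          ("north-of", "r", "x"), ("capital", "x", "Afghanistan")]
    print(understand(lf, TASKS))  # ('act', 'ShowRegion')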


Dialog Modeling: Approaching It Bottom Up

identify[x | p(x)] ==> identify[x | p(x) & q(x)]

Clarification: Show me St Petersburg. Florida or Russia?
Refinement: Show me a lake in Israel. Bigger than 100 sq mi.

identify[x | p(x)] ==> identify[x | p1(x)], where p and p1 are related

Further properties: What's the area of the Dead Sea? The depth?
Change of parameter: Show me a lake in Israel. Jordan.
Correction: Show me Bryant, Texas. Bryan.

identify[y | y=f(x)] ==> identify[z | z=g(y)]

Piping: What is the capital of Oman? What's its population?

Challenge: Narrowing in on information need.


Fixed-Domain QA Evaluation: Why?

Who is Colin Powell? What is naproxen?

Broad range of domains ==> shallow processing

Relatively small fixed domain ==> possibility of deeper processing


Fixed-Domain QA Evaluation

Pick a domain, e.g., nonproliferation

Pick a set of resources, including a corpus of texts, structured databases, web services

Pick 3-4 pages of Text in domain (to constrain knowledge)

Have expert make up 200+ realistic questions, answerable with Text + non-NL resources + inference (maybe + explicit NL resources)

Divide questions into training and test sets

Give sites one month+ to work on training set

Test on test set and analyze results


Some Issues

Range of questions from easy to impossible

Form of questions: question templates? let data determine -- maybe 90% manually produced logical forms?

Form of answers: natural language or XML templates?

Isolated questions or sequences related to fixed scenario? Some of each

Community interest: Half a dozen sites might participate if difficulties worked out


Next Steps

Pick several candidate Texts

Researchers and experts generate questions from those Texts