From Question-Answering to Information-Seeking Dialogs
Jerry R. Hobbs
USC Information Sciences Institute
Marina del Rey, California
(with Chris Culy, Douglas Appelt, David Israel, Peter Jarvis, David Martin, Mark Stickel, and Richard Waldinger of SRI)
October 24, 2002
Key Ideas
1. Logical analysis/decomposition of questions into component questions, using a reasoning engine
2. Bottoming out in variety of web resources and information extraction engine
3. Use of component questions to drive subsequent dialog, for elaboration, revision, and clarification
4. Use of analysis of questions to determine, formulate, and present answers
Plan of Attack
Inference-Based System:
Inference for Question-Answering -- this year
Inference for Dialog Structure -- beginning now
Incorporate Resources:
Geographical Reasoning -- this year
Temporal Reasoning -- this summer
Agent and action ontology -- this summer
Document retrieval and information extraction for question-answering -- beginning now
An Information-Seeking Scenario
How safe is the Muscat harbor for refueling US Navy ships?
Component questions (question decomposition via logical rules):
- What recent terrorist incidents in Oman?
- Are relations between Oman and US friendly?
- How secure is the Muscat harbor?
Resources attached to the reasoning process:
- IR + IE engine for searching recent news feeds
- Find map of the harbor from the DAML-encoded Semantic Web / Intelink
- Ask the analyst (the asking user is one such resource)
Composition of Information from Multiple Sources
How far is it from Muscat to Kandahar?
GEMINI parses the question; SNARK decomposes it via logical rules into component questions, with resources attached to the reasoning process:
- What is the lat/long of Muscat? -- Alexandria Digital Library Gazetteer
- What is the lat/long of Kandahar? -- Alexandria Digital Library Gazetteer
- What is the distance between the two lat/longs? -- geographical formula (see the sketch below), or www.nau.edu/~cvm/latlongdist.html
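The "geographical formula" step can be made concrete with a small sketch: great-circle (haversine) distance between two lat/long points. This is only an illustration of the kind of computation such a resource performs, not the project's code; the coordinates are approximate and used only for the example.

  from math import radians, sin, cos, asin, sqrt

  def lat_long_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
      """Great-circle distance in km between two points given in degrees."""
      lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
      a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
      return 2 * radius_km * asin(sqrt(a))

  muscat = (23.61, 58.59)      # approximate lat/long of Muscat
  kandahar = (31.62, 65.72)    # approximate lat/long of Kandahar
  print(round(lat_long_distance(*muscat, *kandahar)), "km")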
Composition of Information from Multiple Sources
Show me the region 100 km north of the capital of Afghanistan.
Component questions (question decomposition via logical rules), with resources attached to the reasoning process:
- What is the capital of Afghanistan? -- CIA Fact Book
- What is the lat/long of Kabul? -- Alexandria Digital Library Gazetteer
- What is the lat/long 100 km north? -- geographical formula (see the sketch below)
- Show that lat/long -- Terravision
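Here too the "geographical formula" step can be sketched directly: moving due north changes only latitude, and one degree of latitude is roughly 111.32 km. A minimal illustration (Kabul's coordinates are approximate and used only for the example):

  KM_PER_DEG_LAT = 111.32      # approximate kilometers per degree of latitude

  def point_north(lat, lon, km):
      """Lat/long of the point `km` kilometers due north of (lat, lon)."""
      return lat + km / KM_PER_DEG_LAT, lon

  kabul = (34.53, 69.17)       # approximate lat/long of Kabul
  print(point_north(*kabul, 100.0))    # roughly (35.43, 69.17)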
Combining Time, Space, and Personal Information
Could Mohammed Atta have met with an Iraqi official between 1998 and 2001?
Logical form:
  meet(a,b,t) & 1998 <= t <= 2001
Question decomposition via logical rules:
  at(a,x1,t) & at(b,x2,t) & near(x1,x2) & official(b,Iraq)
  go(a,x1,t), go(b,x2,t)
Resources attached to the reasoning process: IE engine, geographical reasoning, temporal reasoning.
System Architecture
Query --> GEMINI (parsing) --> Logical Form --> SNARK (decomposition and interpretation, drawing on web resources and other resources) --> Proof with Answer
Two Central Systems
GEMINI: Large unification grammar of English
- Under development for more than a decade
- Fast parser
- Generates logical forms
- Used in ATIS and CommandTalk
SNARK: Large, efficient theorem prover
- Under development for more than a decade
- Built-in temporal and spatial reasoners
- Procedural attachment, including for web resources
- Extracts answers from proofs
- Strategic controls for speed-up
Linguistic Variation
How far is Muscat from Kandahar?
How far is it from Muscat to Kandahar?
How far is it from Kandahar to Muscat?
How far is it between Muscat and Kandahar?
What is the distance from Muscat to Kandahar?
What is the distance between Muscat and Kandahar?
GEMINI parses and produces logical forms for most TREC-type queries.
Use TACITUS and FASTUS lexicons to augment the GEMINI lexicon.
Unknown word guessing based on "morphology" and immediate context.
"Snarkification"
Problem: GEMINI produces logical forms not completely aligned with what the SNARK theories need
Current solution: Write simplification code to map from one to the other (sketched below)
Long-term solution: Logical forms that are better aligned
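A minimal sketch of what such simplification code might look like, assuming (purely for illustration; the deck does not show GEMINI's actual output format) that the parser produces a nested form that must be flattened into the atomic predications SNARK's theories expect:

  def snarkify(lf):
      """Rewrite a nested (operator, args...) logical form into a flat list of atoms."""
      atoms = []
      def walk(term):
          if not isinstance(term, tuple):
              return                        # variables and constants need no flattening
          op, *args = term
          if op in ("wh-question", "and"):  # strip wrappers, flatten conjunctions
              for a in args:
                  walk(a)
          else:
              atoms.append(term)            # an atomic predication: keep as-is
      walk(lf)
      return atoms

  # Hypothetical parser output for "What is the distance between Muscat and Kandahar?"
  gemini_lf = ("wh-question", "?d",
               ("and", ("distance", "?d"), ("between", "?d", "Muscat", "Kandahar")))
  print(snarkify(gemini_lf))
  # [('distance', '?d'), ('between', '?d', 'Muscat', 'Kandahar')]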
Relating Lexical Predicates to Core Theory Predicates
Lexical predicates such as "... distance ..." and "how far ..." map to the core theory predicate distance-between.
Need to write these axioms for every domain we deal with.
Have illustrative examples.
Decomposition of Questions
lat-long(l1,x) & lat-long(l2,y) & lat-long-distance(d,l1,l2) --> distance-between(d,x,y)
Need axioms relating core theory predicates and predicates from available resources (a sketch of the resulting decomposition follows).
Have illustrative examples.
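A minimal sketch of how such an axiom drives decomposition by backward chaining: the rule below is the one on this slide, while the data structures and the much-simplified variable substitution are assumptions for illustration, not SNARK's actual machinery.

  RULES = {
      # consequent                            -> antecedent subgoals
      ("distance-between", "?d", "?x", "?y"): [
          ("lat-long", "?l1", "?x"),
          ("lat-long", "?l2", "?y"),
          ("lat-long-distance", "?d", "?l1", "?l2"),
      ],
  }

  def decompose(goal):
      """Return the subgoals for a goal, substituting the goal's constants
      for the rule's variables (a much-simplified unification)."""
      for head, body in RULES.items():
          if head[0] == goal[0] and len(head) == len(goal):
              bindings = {v: c for v, c in zip(head[1:], goal[1:])
                          if not str(c).startswith("?")}
              return [tuple(bindings.get(t, t) for t in atom) for atom in body]
      return None

  print(decompose(("distance-between", "?d", "Muscat", "Kandahar")))
  # [('lat-long', '?l1', 'Muscat'), ('lat-long', '?l2', 'Kandahar'),
  #  ('lat-long-distance', '?d', '?l1', '?l2')]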
Procedural Attachment
Declaration for certain predicates:
- There is a procedure for proving it
- Which arguments are required to be bound before it is called
Examples: lat-long(l1,x), lat-long-distance(d,l1,l2)
When a predicate with those arguments bound is generated in the proof, the procedure is executed (see the sketch below).
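A minimal sketch of the idea (an illustration, not SNARK's actual mechanism): each attached predicate is declared with its procedure and with the argument positions that must be bound before the procedure may be called; when the prover generates a literal with those positions bound, the procedure runs.

  PROCEDURES = {}

  def attach(pred, required, proc):
      """Declare that `pred` can be proved by `proc` once the argument
      positions in `required` are bound."""
      PROCEDURES[pred] = (required, proc)

  def try_attachment(pred, args):
      """Run the attached procedure if all required arguments are bound,
      i.e. are not variables such as '?l1'."""
      required, proc = PROCEDURES[pred]
      if any(str(args[i]).startswith("?") for i in required):
          return None                      # not ready: a required argument is unbound
      return proc(*[args[i] for i in required])

  # Hypothetical attachments for the two predicates named on this slide.
  attach("lat-long", [1], lambda place: {"Kabul": (34.53, 69.17)}.get(place))
  attach("lat-long-distance", [1, 2], lambda l1, l2: "call the distance formula on l1, l2")

  print(try_attachment("lat-long", ["?l1", "Kabul"]))               # (34.53, 69.17)
  print(try_attachment("lat-long-distance", ["?d", "?l1", "?l2"]))  # None: not yet bound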
Open Agent Architecture
GEMINI, snarkify, and SNARK each run as an OAA agent; resources are attached via OAA agents.
Use of SMART + TextPro
A question is decomposed via logical rules into subquestions (Subquestion-1, Subquestion-2, Subquestion-3), and resources attached to the reasoning process answer them. SMART + TextPro is one resource among many; other resources handle the remaining subquestions.
Information Extraction Engine as a Resource
Document retrieval for pre-processing
TextPro: top-of-the-line information extraction engine; recognizes subject-verb-object and coreference relations
Analyze NL query with GEMINI and SNARK
Bottom out in a pattern for TextPro to seek
Keyword search on very large corpus
TextPro runs over documents retrieved
Linking SNARK with TextPro
TextSearch(EntType(?x), Terms(p), Terms(c), WSeq)
  & Analyze(WSeq, p(?x,c))
  --> p(?x,c)
- TextSearch is the call to TextPro: EntType(?x) is the type of the questioned constituent; Terms(p) and Terms(c) are synonyms and hypernyms of the words associated with p or c; WSeq is the answer, an ordered sequence of annotated strings of words.
- Analyze matches pieces of the annotated answer strings with pieces of the query.
- p(?x,c) is the subquery generated by SNARK during analysis of the query.
Three Modes of Operation for TextPro
1. Search for predefined patterns and relations (ACE-style Role and AT relations) and translate the relations into SNARK's logic
   Where does the CEO of IBM live?
2. Search for subject-verb-object relations in processed text that match the predicate-argument structure of SNARK's logical expression
   "Samuel Palmisano is CEO of IBM."
3. Search for the passage with the highest density of relevant words and an entity of the right type for the answer
   "Samuel Palmisano .... CEO .... IBM."
   Use coreference links to get the most informative answer
First Mode
Rule:
  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq)
    & Analyze(WSeq, Role(?x,Management,IBM,CEO))
    --> CEO(?x,IBM)
TextPro output (ACE-style relation):
  <relation TYPE="Role" SUBTYPE="Management">
    <rel_entity_arg ID="Entity1" ARGNUM="1"/>
    <rel_entity_arg ID="Entity2" ARGNUM="2"/>
    <rel_attribute ATTR="POSITION">CEO</rel_attribute>
  </relation>
  Entity1: {Samuel Palmisano, Palmisano, head, he}
  Entity2: {IBM, International Business Machines, they}
  Relation: Role(Entity1, Entity2, Management, CEO)
Analyze maps this to: CEO(Samuel Palmisano, IBM)
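A minimal sketch of the translation step, assuming the ACE-style markup and the coreference sets shown above; the function and variable names, and the "pick the longest mention" heuristic for the most informative name, are illustrative assumptions, not TextPro's or SNARK's actual interface.

  import xml.etree.ElementTree as ET

  relation_xml = """
  <relation TYPE="Role" SUBTYPE="Management">
    <rel_entity_arg ID="Entity1" ARGNUM="1"/>
    <rel_entity_arg ID="Entity2" ARGNUM="2"/>
    <rel_attribute ATTR="POSITION">CEO</rel_attribute>
  </relation>
  """

  entities = {   # coreference sets, as on the slide
      "Entity1": ["Samuel Palmisano", "Palmisano", "head", "he"],
      "Entity2": ["IBM", "International Business Machines", "they"],
  }

  def relation_to_atom(xml_text, entities):
      """Translate a Role/Management relation with POSITION=CEO into CEO(x,y)."""
      rel = ET.fromstring(xml_text)
      args = {e.get("ARGNUM"): e.get("ID") for e in rel.findall("rel_entity_arg")}
      position = rel.findtext("rel_attribute")              # "CEO"
      person = max(entities[args["1"]], key=len)             # most informative mention
      org = entities[args["2"]][0]                            # canonical short name
      return f"{position}({person},{org})"

  print(relation_to_atom(relation_xml, entities))             # CEO(Samuel Palmisano,IBM)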
Second Mode
Rule:
  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq)
    & Analyze(WSeq, CEO(?x,IBM))
    --> CEO(?x,IBM)
TextPro output (subject-verb-object markup):
  "<subj> Samuel Palmisano </subj> <verb> heads </verb> <obj> IBM </obj>"
Analyze maps this to: CEO(Samuel Palmisano, IBM)
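A minimal sketch of the second mode's Analyze step: read TextPro's subject-verb-object markup and match it against the predicate-argument structure of the subquery CEO(?x,IBM). The regular expression and the verb-to-predicate table are assumptions for illustration only.

  import re

  svo_markup = "<subj> Samuel Palmisano </subj> <verb> heads </verb> <obj> IBM </obj>"

  CEO_VERBS = {"heads", "leads", "runs"}   # verbs assumed to express the CEO relation

  def analyze_svo(markup, org="IBM"):
      """Return CEO(x, org) if the markup's verb expresses the CEO relation."""
      m = re.search(r"<subj>\s*(.*?)\s*</subj>\s*<verb>\s*(.*?)\s*</verb>\s*<obj>\s*(.*?)\s*</obj>",
                    markup)
      if not m:
          return None
      subj, verb, obj = m.groups()
      if verb in CEO_VERBS and obj == org:
          return f"CEO({subj},{obj})"
      return None

  print(analyze_svo(svo_markup))           # CEO(Samuel Palmisano,IBM)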
Third Mode
Rule:
  TextSearch(Person, Terms(CEO), Terms(IBM), WSeq)
    & Analyze(WSeq, CEO(?x,IBM))
    --> CEO(?x,IBM)
TextPro output (passage with the highest density of relevant words):
  "<person> He </person> has recently been rumored to have been appointed Lou Gerstner's successor as <CEOword> CEO </CEOword> of the major computer maker nicknamed <co> Big Blue </co>"
Coreference link: "<person> He </person>" resolves to "<person> Samuel Palmisano </person> ...."
Analyze maps this to: CEO(Samuel Palmisano, IBM)
Domain-Specific Patterns
Decide upon domain (e.g., nonproliferation)
Compile list of principal properties and relations of interest
Implement these patterns in TextPro
Implement link between TextPro and SNARK, converting between templates and logic
Challenges
Cross-document identification of individuals:
  Document 1: Osama bin Laden
  Document 2: bin Laden
  Document 3: Usama bin Laden
Do entities with the same or similar names represent the same individual?
Metonymy:
  Text: Beijing approved the UN resolution on Iraq.
  Query involves "China", not "Beijing".
DAML Search Engine
Teknowledge has developed a DAML search engine. A query is a predicate-argument form, each element tagged with its namespace:
  pred: capital
  arg1: ?x
  arg2: Indonesia
It searches the entire (soon to be exponentially growing) Semantic Web.
Also conjunctive queries: population of capital of Indonesia.
Problem: you have to know logic and RDF to use it.
DAML Search Engine as AQUAINT Web Resource
The AQUAINT system generates the subquery capital(?x,Indonesia); a procedural attachment in SNARK passes it to the DAML search engine as the predicate-argument query
  pred: capital
  arg1: ?x
  arg2: Indonesia
(each element tagged with its namespace), which searches the entire (soon to be exponentially growing) Semantic Web.
Solution: you only have to know English to use it; this makes the entire Semantic Web accessible to AQUAINT users.
Temporal Reasoning: Structure
Topology of Time: start, end, before, between
Measures of Duration: for an hour, ...
Clock and Calendar: 3:45pm, Wednesday, June 12
Temporal Aggregates: every other Wednesday
Deictic Time: last year, ...
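A minimal sketch of the topology-of-time predicates listed above, with intervals represented as (start, end) pairs on a numeric time line; the representation is an assumption for illustration, not the DAML temporal ontology itself.

  def start(t):
      return t[0]

  def end(t):
      return t[1]

  def before(t1, t2):
      """t1 lies entirely before t2."""
      return end(t1) <= start(t2)

  def between(t, t1, t2):
      """t lies after t1 and before t2."""
      return before(t1, t) and before(t, t2)

  # Example (cf. the earlier Atta question): does a 1999 meeting fall between 1998 and 2001?
  year = lambda y: (float(y), float(y) + 1.0)
  print(between(year(1999), year(1998), year(2001)))   # True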
Temporal Reasoning: Goals
- Develop temporal ontology (DAML)
- Reason about time in SNARK (AQUAINT, DAML)
- Link with the temporal annotation language TimeML (AQUAINT)
- Answer questions with a temporal component (AQUAINT)
Status: ranges from nearly complete to in progress.
Convergence
DAML annotation of temporal information on the Web (DAML-Time)
Annotation of temporal information in text (TimeML)
Most information on the Web is in text.
The two annotation schemes should be intertranslatable.
TimeML Annotation Scheme (An Abstract View)
[Timeline diagram: clock and calendar intervals and instants (2001, Sept 11), durations (6 mos), instantaneous events (warning), interval inclusion, and before relations.]
TimeML Example
The top commander of a Cambodian resistance force said Thursday he has sent a team to recover the remains of a British mine removal expert kidnapped and presumed killed by Khmer Rouge guerrillas two years ago.
[Event-time graph: events said, sent, recover, command, resist, remove, kidnap, presumed, killed, remain, anchored to the times Thursday and now and to a duration of 2 years.]
Vision
Manual DAML temporal annotation of web resources
Manual temporal annotation of large NL corpus
Programs for automatic temporal annotation of NL text
Automatic DAML temporal annotation of web resources
Spatial and Geographical Reasoning: Structure
Topology of Space: Is Albania a part of Europe?
Dimensionality
Measures: How large is North Korea?
Orientation and Shape: What direction is Monterey from SF?
Latitude and Longitude: Alexandria Digital Library Gazetteer
Political Divisions: CIA World Fact Book, ...
Spatial and Geographical Reasoning: Goals
- Develop spatial and geographical ontology (DAML)
- Reason about space and geography in SNARK (AQUAINT, DAML)
- Attach spatial and geographical resources (AQUAINT)
- Answer questions with a spatial component (AQUAINT)
Status: some capability now.
Rudimentary Ontology of Agents and Actions
Persons and their properties and relations:
- name, alias, (principal) residence
- family and friendship relationships
- movements and interactions
Actions/events:
- types of actions/events
- preconditions and effects
Domain-Dependent Ontologies
Nonproliferation data and task
Construct relevant ontologies
Dialog Modeling: Approaching It Top Down
Key Idea: System matches user's utterance with one of several active tasks. Understanding dialog is one active task.
Rules of form:
property(situation) --> active(Task1)
including
utter(u,w) --> active(DialogTask)
want(u,Task1) --> active(Task1)
Understanding is matching utterance (conjunction of predications) with an active task or the condition of an inactive task.
Dialog Task Model
understand(a,e,t): hear(a,w) & parse(w,e) & match(e,t)
- If the match succeeds: the action is determined by the utterance and the task.
- If not (some x is unmatched): ask about x.
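A minimal sketch of this dialog task model: the utterance is treated as a set of predications and matched against an active task; anything that fails to match is what the system asks about. The task representation and the example predications are assumptions for illustration only.

  def understand(predications, task):
      """Return (action, unmatched): the task's action if everything matched,
      otherwise the predications to ask the user about."""
      unmatched = predications - task["expected"]
      if not unmatched:
          return task["action"], set()
      return None, unmatched

  # Hypothetical active task and utterance.
  task = {
      "expected": {("lake", "?x"), ("in", "?x", "Israel")},
      "action": "display lakes in Israel",
  }
  utterance = {("lake", "?x"), ("in", "?x", "Israel"), ("deepest", "?x")}

  action, unmatched = understand(utterance, task)
  print(action or f"Ask about: {unmatched}")   # Ask about: {('deepest', '?x')}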
Dialog Modeling: Approaching It Bottom Up
identify[x | p(x)] ==> identify[x | p(x) & q(x)]
- Clarification: Show me St Petersburg. Florida or Russia?
- Refinement: Show me a lake in Israel. Bigger than 100 sq mi.
identify[x | p(x)] ==> identify[x | p1(x)], where p and p1 are related
- Further properties: What's the area of the Dead Sea? The depth?
- Change of parameter: Show me a lake in Israel. Jordan.
- Correction: Show me Bryant, Texas. Bryan.
identify[y | y=f(x)] ==> identify[z | z=g(y)]
- Piping: What is the capital of Oman? What's its population?
Challenge: Narrowing in on the information need.
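A minimal sketch of these bottom-up moves, treating a query as a set of constraints on the entity to be identified; the representation and function names are assumptions for illustration, not the project's dialog machinery.

  def refine(query, new_constraint):
      """identify[x | p(x)] ==> identify[x | p(x) & q(x)]  (clarification, refinement)"""
      return query | {new_constraint}

  def revise(query, old_constraint, new_constraint):
      """identify[x | p(x)] ==> identify[x | p1(x)]  (change of parameter, correction)"""
      return (query - {old_constraint}) | {new_constraint}

  def pipe(previous_answer, new_relation):
      """identify[y | y=f(x)] ==> identify[z | z=g(y)]  (piping)"""
      return {(new_relation, previous_answer)}

  # "Show me a lake in Israel."  "Bigger than 100 sq mi."        (refinement)
  q = refine({("lake", "?x"), ("in", "?x", "Israel")}, ("area-gt", "?x", "100 sq mi"))

  # "Show me a lake in Israel."  "Jordan."                       (change of parameter)
  q2 = revise({("lake", "?x"), ("in", "?x", "Israel")},
              ("in", "?x", "Israel"), ("in", "?x", "Jordan"))

  # "What is the capital of Oman?"  "What's its population?"     (piping)
  q3 = pipe("Muscat", "population")

  print(q, q2, q3, sep="\n")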
Fixed-Domain QA Evaluation: Why?
Who is Colin Powell? What is naproxen?
Broad range of domains ==> shallow processing
Relatively small fixed domain ==> possibility of deeper processing
Fixed-Domain QA Evaluation
Pick a domain, e.g., nonproliferation
Pick a set of resources, including a corpus of texts, structured databases, web services
Pick 3-4 pages of Text in domain (to constrain knowledge)
Have expert make up 200+ realistic questions, answerable with Text + non-NL resources + inference (maybe + explicit NL resources)
Divide questions into training and test sets
Give sites one month+ to work on training set
Test on test set and analyze results
Some Issues
Range of questions from easy to impossible
Form of questions: question templates, or let the data determine? Maybe manually produced logical forms for 90%?
Form of answers: natural language or XML templates?
Isolated questions or sequences related to fixed scenario? Some of each
Community interest: Half a dozen sites might participate if the difficulties are worked out