QUIRK: QUestion Answering = Information Retrieval + Knowledge

Cycorp, IBM. Presenter: Stefano Bertolo (Cycorp)


Page 1: QUIRK: QUestion Answering = Information Retrieval + Knowledge

Cycorp

IBM

Presenter: Stefano Bertolo (Cycorp)

Page 2: Project Goals

Break the answer-by-retrieval bottleneck

Deep (semantic) understanding of queries and answers

Integration of heterogeneous sources

Formalized knowledge to integrate state-of-the-art IR components with state-of-the-art knowledge bases

Page 3: Answer by retrieval

Q: Who was the first president of Zambia?

…Kenneth Kaunda, the first president, kept Zambia within the Commonwealth of Nations…

Page 4: Answer by reasoning

Q: Who sponsored Kai’s attack against Pamina?

…On February 13, Kai detonated the truck in front of Pamina’s HQ…

…On January 25, Kai bought a truckload of fertilizer drawing against account 9999 at MegaBank…

…On January 15, Vitas Bayo deposited $50,000 into account 9999 at MegaBank…
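The chain of reasoning behind this example can be sketched as a small join over extracted facts. This is only an illustration: the tuple layout, the predicate names, and the `candidate_sponsors` helper are assumptions made here, not part of QUIRK or CycL.

```python
# Facts paraphrased from the three snippets above. The tuple layout and
# predicate names are illustrative assumptions, not CycL.
facts = [
    ("detonated", "Kai", "truck", "Pamina HQ"),
    ("bought_with_account", "Kai", "fertilizer", "9999@MegaBank"),
    ("deposited", "Vitas Bayo", 50_000, "9999@MegaBank"),
]


def candidate_sponsors(attacker, facts):
    """Return people who deposited into an account the attacker drew against."""
    accounts = {f[3] for f in facts
                if f[0] == "bought_with_account" and f[1] == attacker}
    return sorted({f[1] for f in facts
                   if f[0] == "deposited" and f[3] in accounts})


print(candidate_sponsors("Kai", facts))
```

No single snippet names a sponsor; the answer only appears once the purchase and the deposit are linked through the shared account.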

Page 9: QUIRK strategy

Use formalized knowledge for:
– Semantic understanding of queries
– Justification of answers

Use formalized knowledge as:
– Format for data normalization
– ‘Glue’ for data integration of:
  • information extracted from unstructured data
  • SQL queries against structured DBs
  • Cyc’s knowledge

Page 10: [Diagram. Components shown: Blackboard; Query Manager; Answer Manager; Inference Agent; IR Agent; Cyc KB; GuruQA (IBM); DB1, DB2, DB-N; preemptive annotations; unstructured documents]

Page 11: [Diagram. Components shown: Query Interpreter; GuruQA Assistant; GuruQA (IBM); Cyc English Generator; Cyc Inference Engine; Answer Manager; Query Refiner; Blackboard. Data labels: Q-Eng, A-Eng, Q-CycL, A-CycL, Q-Guru, A-Guru]

Page 12: Blackboard architecture

Add/remove agents without disrupting existing architecture

Test performance/speed with several combinations of agents

Operate asynchronously.
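A minimal sketch of the blackboard idea follows, assuming agents are plain callables that subscribe to a kind of entry. The `Blackboard` class, the two toy agents, and the placeholder payload strings are hypothetical; only the entry labels (Q-Eng, Q-CycL, A-CycL) come from the slides. Dispatch here is synchronous for brevity, whereas the slide calls for asynchronous operation.

```python
from collections import defaultdict


class Blackboard:
    """Shared store: agents subscribe to entry kinds and may post new entries."""

    def __init__(self):
        self.entries = []
        self.subscribers = defaultdict(list)

    def register(self, kind, agent):
        self.subscribers[kind].append(agent)

    def unregister(self, kind, agent):
        self.subscribers[kind].remove(agent)

    def post(self, kind, payload):
        self.entries.append((kind, payload))
        # Iterate over a copy so agents can be added or removed while running.
        for agent in list(self.subscribers[kind]):
            agent(self, payload)


def query_interpreter(board, question):
    # Placeholder translation; a real agent would produce a CycL query.
    board.post("Q-CycL", f"(cycl-for {question!r})")


def inference_agent(board, cycl_query):
    # Placeholder answer; a real agent would call an inference engine.
    board.post("A-CycL", f"(answer-to {cycl_query})")


board = Blackboard()
board.register("Q-Eng", query_interpreter)
board.register("Q-CycL", inference_agent)
board.post("Q-Eng", "Who opposes the WTO?")
kinds = [kind for kind, _ in board.entries]
print(kinds)
```

Because agents only see the board, removing `inference_agent` or registering a second agent for "Q-CycL" requires no change to the other agents.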

Page 13: Query Interpreter

Q: “Who opposes the WTO?”

(and (isa ?WHO Person)
     (thereExists ?EVENT
       (and (isa ?EVENT ActOfDissent)
            (performedBy ?EVENT ?WHO)
            (maleficiary ?EVENT WorldTradeOrganization))))

Page 14: GuruQA Assistant

CycL query =>

PERSON$ {oppose(s/d) | denounce(s/d) | attack(s/ed)} {the WTO | the World Trade Organization}
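The expansion of one formal query into many surface patterns can be sketched as a cross product of verb forms and object names. The `inflections` and `expand` helpers are assumptions for illustration and handle regular verbs only; they do not reflect how the GuruQA Assistant is actually implemented.

```python
from itertools import product

VERBS = ["oppose", "denounce", "attack"]
OBJECTS = ["the WTO", "the World Trade Organization"]


def inflections(verb):
    """Base, third-person, and past forms (assumption: regular verbs only)."""
    past = verb + "d" if verb.endswith("e") else verb + "ed"
    return [verb, verb + "s", past]


def expand(subject_type, verbs, objects):
    """Build every subject-verb-object search pattern."""
    return [f"{subject_type} {form} {obj}"
            for verb, obj in product(verbs, objects)
            for form in inflections(verb)]


patterns = expand("PERSON$", VERBS, OBJECTS)
for p in patterns:
    print(p)
```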

Page 15: Cyc Inference Engine

CycL Query =>

[(PersonNamedFn “Kai”) JUSTIFICATION-1]

[(PersonNamedFn “Dr. Chen”) JUSTIFICATION-2]

[(PersonNamedFn “Kai”) JUSTIFICATION-N]

Page 16: Cyc Justifications

A?

A from [B and C] (source 6743)

B from source 67430

C from source 78539
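One way to picture a justification is as a small tree in which each fact records its premises and its source. The dictionary layout and the `explain` function below are illustrative assumptions; the facts and source numbers are the ones on the slide.

```python
# Each fact maps to (premises, source); leaf facts have no premises.
justifications = {
    "A": (["B", "C"], "source 6743"),
    "B": ([], "source 67430"),
    "C": ([], "source 78539"),
}


def explain(fact, table, depth=0):
    """Return indented lines tracing a fact back to its sources."""
    premises, source = table[fact]
    indent = "  " * depth
    if premises:
        lines = [f"{indent}{fact} from [{' and '.join(premises)}] ({source})"]
    else:
        lines = [f"{indent}{fact} from {source}"]
    for premise in premises:
        lines.extend(explain(premise, table, depth + 1))
    return lines


print("\n".join(explain("A", justifications)))
```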

Page 17: Sources for Cyc Inference

1.4M+ CycL assertions already in Cyc’s Knowledge Base

Virtual Assertions in DataBases

Unsupervised Textract / CycL annotation of unstructured documents

Page 18: Data Source Integration

Data Normalization

Data Fusion

Page 19: Data Normalization

Interpretation

Search

cat chat Katze gato gatto “felis felis”

cat OR chat OR Katze OR gato OR gatto OR “felis felis”

Page 20: Data Normalization

…Zhang Mei Li was born on January 1, 1927…

Name            DOB
Zhang Mei Li    01-01-1927
…               …

(birthDate (PersonNamedFn “Zhang Mei Li”) (DayFn 01 (MonthFn January (YearFn 1927))))
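The point of the slide is that a sentence and a database row normalize to the same assertion. A rough sketch, assuming a single hand-written regular expression for the sentence and a MM-DD-YYYY reading of the DOB column (the slide's 01-01-1927 does not settle the field order); all function names are hypothetical.

```python
import re
from datetime import date, datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

TEXT_PATTERN = re.compile(
    r"(?P<name>[A-Z][\w ]+?),? was born on "
    r"(?P<month>\w+) (?P<day>\d{1,2}), (?P<year>\d{4})"
)


def birth_assertion(name, d):
    """Render a birth date in the CycL shape shown on the slide."""
    return (f'(birthDate (PersonNamedFn "{name}") '
            f'(DayFn {d.day:02d} (MonthFn {MONTHS[d.month - 1]} '
            f'(YearFn {d.year}))))')


def from_text(sentence):
    m = TEXT_PATTERN.search(sentence)
    d = date(int(m["year"]), MONTHS.index(m["month"]) + 1, int(m["day"]))
    return birth_assertion(m["name"], d)


def from_db_row(name, dob):
    # Assumption: the DOB column is MM-DD-YYYY.
    return birth_assertion(name, datetime.strptime(dob, "%m-%d-%Y").date())


text_form = from_text("…Zhang Mei Li, was born on January 1, 1927…")
db_form = from_db_row("Zhang Mei Li", "01-01-1927")
print(text_form == db_form)
```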

Page 21: Data Normalization

Language-independent representation of:
- entities
- concepts
- relationships

CycL contains 100K+ primitives and can compositionally define infinitely many non-atomic terms.

Page 22: Data Fusion

Dr. Chen lives in Fresno
Zhang Mei Li lives in Oakland
Kai lives in Los Angeles
California is in the Pacific Time Zone

Dr. Chen/Zhang Mei Li/Kai and Dr. Chen/Zhang Mei Li/Kai live in the same time zone
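The conclusion relies on background knowledge the facts never state: each of the three cities is in California. A minimal sketch with that knowledge made explicit; the dictionaries and function names are assumptions for illustration, not Cyc's representation.

```python
from itertools import combinations

# Facts from the sources on the slide.
lives_in = {"Dr. Chen": "Fresno", "Zhang Mei Li": "Oakland", "Kai": "Los Angeles"}
state_time_zone = {"California": "Pacific"}

# Background knowledge that is not stated in any of the sources.
city_in_state = {"Fresno": "California", "Oakland": "California",
                 "Los Angeles": "California"}


def time_zone(person):
    """Chain person -> city -> state -> time zone."""
    return state_time_zone[city_in_state[lives_in[person]]]


def same_time_zone(a, b):
    return time_zone(a) == time_zone(b)


for a, b in combinations(lives_in, 2):
    print(a, "and", b, "share a time zone:", same_time_zone(a, b))
```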

Page 23: Heterogeneous Sources

Q: How old is Dr. Chen’s mother?

…Zhang Mei Li, mother of Pamina’s Dr. Chen…

Name            DOB
Zhang Mei Li    01-01-1927
…               …
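Answering this question takes one fact from text (who the mother is) and one from a database (her date of birth). The sketch below assumes both have already been extracted into dictionaries. The reference date passed to `mothers_age` is an arbitrary choice for illustration, since the slides do not say when the question is asked.

```python
from datetime import date

mother_of = {"Dr. Chen": "Zhang Mei Li"}    # extracted from the text snippet
dob = {"Zhang Mei Li": date(1927, 1, 1)}    # read from the database row


def age_on(birth, today):
    """Whole years elapsed between birth and today."""
    before_birthday = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - before_birthday


def mothers_age(person, today):
    return age_on(dob[mother_of[person]], today)


# The reference date is a hypothetical input.
print(mothers_age("Dr. Chen", date(2002, 6, 1)))
```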

Page 24: Data Fusion

Requires language-independent connections/inferential links among:
- Entities
- Concepts
- Propositions (Facts, Rules)

Cyc’s Ontology
Cyc’s Knowledge Base

Page 25: Consensus Reality

Formalized Knowledge about ‘Consensus Reality’ = inferentially enabled ‘glue’ for Data Fusion

E.g. “Was Kai implicated in the Munich 1972 attack (when he was a toddler of 2)?”

Page 26: DBs as ‘virtual assertions’ stores

(birthDate (PersonNamedFn “Zhang Mei Li”) ?WHEN)

SELECT DOB
FROM PERSONAL_DATA
WHERE NAME = 'Zhang Mei Li'
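A sketch of how a predicate with one open variable might be answered from a relational table, using Python's standard `sqlite3` module and an in-memory database. The `MAPPING` table and the `ask` function are hypothetical; the slides do not describe how QUIRK declares these mappings.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, key column, value column).
MAPPING = {"birthDate": ("PERSONAL_DATA", "NAME", "DOB")}


def ask(conn, predicate, person_name):
    """Return bindings for the open variable of (predicate person ?X)."""
    table, key_col, value_col = MAPPING[predicate]
    # Identifiers come from the trusted mapping; the name is a bound parameter.
    sql = f"SELECT {value_col} FROM {table} WHERE {key_col} = ?"
    return [row[0] for row in conn.execute(sql, (person_name,))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSONAL_DATA (NAME TEXT, DOB TEXT)")
conn.execute("INSERT INTO PERSONAL_DATA VALUES (?, ?)",
             ("Zhang Mei Li", "01-01-1927"))

bindings = ask(conn, "birthDate", "Zhang Mei Li")
print(bindings)
```

The row is never copied into the knowledge base; it is fetched only when a query needs it, which is what makes the assertion "virtual".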

Page 27: Unsupervised Textract / CycL Annotations

IBM Textract relations:

[Cycorp, Inc. : located-in : Austin, TX]

mapped to CycL Assertions:

(objectFoundInLocation Cycorp CityOfAustinTX)
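The mapping step can be sketched as two lookups, one for the relation name and one for each entity. The lookup tables and the `to_cycl` function are assumptions for illustration; the bracketed input follows the layout shown on the slide and is not a documented Textract output format.

```python
# Hypothetical lookup tables; real mappings would be far larger.
RELATION_TO_PREDICATE = {"located-in": "objectFoundInLocation"}
ENTITY_TO_TERM = {"Cycorp, Inc.": "Cycorp", "Austin, TX": "CityOfAustinTX"}


def to_cycl(relation):
    """Turn '[subject : relation : object]' into a CycL-style assertion."""
    subj, rel, obj = (part.strip()
                      for part in relation.strip("[]").split(" : "))
    return (f"({RELATION_TO_PREDICATE[rel]} "
            f"{ENTITY_TO_TERM[subj]} {ENTITY_TO_TERM[obj]})")


assertion = to_cycl("[Cycorp, Inc. : located-in : Austin, TX]")
print(assertion)
```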

Page 28: Augmenting Textract Annotations

Concept Annotation
“Boston” => { CityOfBostonMA, BostonTheBand, … }

Word Sense Disambiguation
“I went to Boston” => CityOfBostonMA

Analysis of nominal compounds
“leather jacket” => (SubcollectionOfWithRelationToTypeFn Jacket mainConstituent Leather)

Page 29: Unsupervised CycL Annotations

IBM’s Nominator and Parsers to extract Named Entities and basic syntactic dependencies (SUBJ-VERB, VERB-OBJ)

Map dependencies to CycL event structures.

Page 30: Cyc-to-English generator

(PersonNamedFn “Dr. Chen”) JUSTIFICATION-N

“Dr. Chen opposes the WTO, because people who demonstrate against organizations oppose them (Cyc KB, assertion 99999) and Dr. Chen demonstrated against the WTO in Seattle (document 12345).”

Page 31: Year 1 Tasks

Get entire system to run robustly with integration of all the IBM and Cycorp components described

Improve question understanding and refinement

Broaden coverage of English-to-CycL mapping, enabling annotation of large collections of documents

Page 32: Year 2 Tasks

Add new agents to the blackboard to represent the user and session context

Improve integration of answers obtained from GuruQA and Cyc

Improve integrated IBM and Cycorp modules for unstructured document annotation