USC INFORMATION SCIENCES INSTITUTE | Yolanda Gil | IKRAFT
IKRAFT: Interactive Knowledge Representation and Acquisition from Text
Yolanda Gil, Varun Ratnakar
www.isi.edu/expect/projects/trellis
trellis.semanticweb.org
USC/Information Sciences Institute
Motivation: How KBs Are Built Today
[Figure: a domain expert and a knowledge engineer build a KB using knowledge acquisition tools; the diagram labels the activities "Read/ask/study/listen…", "…analyze/group/index…", "…structure/relate/fit…", and "…reason/deduce/solve".]
Motivation: The Aftermath of Knowledge Base Development
[Figure: the same diagram as on the slide "Motivation: How KBs Are Built Today" (domain expert, knowledge engineer, knowledge acquisition tools, and the KB), now with a "TRASH" label added.]
Motivation: Capturing the Design of Knowledge Bases
[Figure: representations ranging from informal material to formal definitions such as (defconcept bridge …), with arrows labeled "Richer representations", "More ambiguous", "More versatile", "More formal", "More concrete", and "More introspectible", and a "WWW" label. The levels shown are:]
• Introductory texts, expert hints, explanations, dialogues, comments, examples, exceptions, ...
• Info. extraction templates, dialogue segments and pegs, filled-out forms, high-level connections, ...
• Alternative formalizations (KIF, MELD, RDF, …), alternative views of the same notion (e.g., what is a threat)
• Descriptions augmented with prototypical examples & exceptions, problem-solving steps and substeps, ...
Claims
• Knowledge can be reused at any level of (in)formality
• Knowledge can be extended more easily
  • Additional documents and semi-formal structures are readily available
• Knowledge can be translated and integrated at any level to facilitate interoperability
  • KR languages can be a straitjacket for some kinds of knowledge
• Intelligent systems will provide better justifications
  • Many users want to know where axioms came from before they trust a system’s reasoning
• Content providers will not need to be sophisticated programmers or knowledge engineers
  • It may be easier for end users to organize knowledge than to formalize it
  • Good symbiosis of sophisticated and unsophisticated users
An Example: Building a Knowledge Base from a Textbook (DARPA Rapid Knowledge Formation, RKF)
“…The first step a cell takes in reading out part of its genetic instructions is to copy the required portion of the nucleotide sequence of DNA – the gene – into a nucleotide sequence of RNA. The process is called transcription because the information, though copied into another chemical form, is still written in essentially the same language – the language of nucleotides. Like DNA, RNA is a linear polymer made of four different types of nucleotide subunits linked together by phosphodiester bonds. It differs from DNA chemically in two respects: (1) the nucleotides in RNA are ribonucleotides – that is, they contain the sugar ribose (hence the name ribonucleic acid) rather than deoxyribose; (2) although, like DNA, RNA contains the bases adenine (A), guanine (G), and cytosine (C), it contains uracil (U) instead of the thymine (T) in DNA. Since U, like T, can base-pair by hydrogen-bonding with A, the base-pairing properties described for DNA also apply to RNA…”
(Essential Cell Biology, Alberts et al. 1992)
Protein Synthesis in RKF’s SHAKEN Authored by a Biologist [Chaudri et al 2001]
Step 1: Selecting Relevant Knowledge Fragments
[The same passage from Essential Cell Biology (Alberts et al. 1992) quoted on the previous slide, with the fragments relevant to the knowledge base selected; the selection itself is not preserved in this transcript.]
Step 2: Composing Stylized Knowledge Fragments
- ribose
  - it is a kind of sugar, like deoxyribose
  - it is contained in the nucleotides of RNA
- uracil
  - it is a kind of nucleotide, like adenine and guanine
  - it can base-pair with adenine
- RNA
  - it is a kind of nucleic acid, like DNA
  - it contains uracil instead of thymine
  - it is single-stranded
  - it folds in complex 3-D shapes
  - nucleotides are linked with phosphodiester bonds, like DNA
  - there are many types of RNA
  - RNA is the template for synthesizing protein
  - its nucleotides contain the sugar ribose (DNA has deoxyribose)
- gene
  - subsequence of DNA that can be used as a template to create protein
- protein synthesis
  - non-destructive creation process: RNA and protein created from DNA
  - its speed is regulated by the cell
  - substeps (ordered in sequence):
    1) RNA transcription
       - a DNA fragment (a gene) is copied, just like DNA is copied during DNA synthesis
       - the result is an RNA chain
    2) protein translation
       - RNA is used as a template
Step 3: Creating Knowledge Base Items
…
(defconcept uracil :is-primitive nucleotide
:constraints (:the base-pair adenine))
(defconcept RNA
:is (:and nucleic-acid
(:some contains uracil)))
…
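The step from stylized fragments to KB items can be illustrated with a small Python sketch. It assumes, hypothetically, that a fragment of the form "it is a kind of X" supplies the parent concept and a fragment such as "it can base-pair with Y" supplies a `(:the relation Y)` restriction. The names `StylizedTerm` and `to_defconcept` are illustrative and not part of IKRAFT; the output only mimics the Loom-style syntax shown above.

```python
from dataclasses import dataclass, field


@dataclass
class StylizedTerm:
    """Hypothetical encoding of the stylized statements about one term."""
    name: str
    kind_of: str  # from "it is a kind of X"
    # (relation, filler) pairs, e.g. from "it can base-pair with adenine"
    relations: list[tuple[str, str]] = field(default_factory=list)


def to_defconcept(term: StylizedTerm) -> str:
    """Render a term as a Loom-style primitive concept definition."""
    restrictions = [f"(:the {rel} {filler})" for rel, filler in term.relations]
    text = f"(defconcept {term.name} :is-primitive {term.kind_of}"
    if len(restrictions) == 1:
        text += f"\n  :constraints {restrictions[0]}"
    elif restrictions:
        # Several restrictions are conjoined with :and
        text += f"\n  :constraints (:and {' '.join(restrictions)})"
    return text + ")"


uracil = StylizedTerm("uracil", "nucleotide", [("base-pair", "adenine")])
print(to_defconcept(uracil))
# (defconcept uracil :is-primitive nucleotide
#   :constraints (:the base-pair adenine))
```

This covers only primitive concepts such as uracil; a full definition like the RNA concept, which uses `:is (:and ...)`, would need a richer mapping from the statements.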
IKRAFT: Interactive Knowledge Representation and Acquisition from Text
• The user starts with documents and extracts a small amount of information from them
  • The text contains significant portions useful for context, reference, and recall
• IKRAFT allows users to annotate text with statements expressed in natural language
  • Highlight portions of the original text and annotate them with a statement
  • Statements tend to be stylized
• Statements are parsed, and the system generates a summary of:
  • Objects
  • Events/actions
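A minimal sketch of what such an annotation might record, assuming each statement is tied to a highlighted character span of a source document so it can be traced back later. The document identifiers and the keyword lexicon are hypothetical; the lexicon is a crude stand-in for the statement parser, whose actual workings are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    doc_id: str  # hypothetical identifier for the source document
    text: str


@dataclass(frozen=True)
class Annotation:
    """A user's stylized statement linked to a highlighted span of a source."""
    source: Source
    start: int  # offset where the highlight begins
    end: int    # offset just past the highlight
    statement: str

    def excerpt(self) -> str:
        return self.source.text[self.start:self.end]


# Hypothetical lexicon standing in for a real parser
LEXICON = {"rna": "object", "dna": "object", "uracil": "object",
           "thymine": "object", "copied": "event", "transcription": "event"}


def highlight(source: Source, phrase: str, statement: str) -> Annotation:
    """Annotate the first occurrence of phrase in source with a statement."""
    start = source.text.index(phrase)
    return Annotation(source, start, start + len(phrase), statement)


def summarize(annotations: list[Annotation]) -> dict[str, set[str]]:
    """Collect the known objects and events mentioned in the statements."""
    summary: dict[str, set[str]] = {"object": set(), "event": set()}
    for ann in annotations:
        for token in ann.statement.lower().replace(",", " ").split():
            category = LEXICON.get(token)
            if category:
                summary[category].add(token)
    return summary


rna_src = Source("ecb-rna", "it contains uracil (U) instead of the thymine (T) in DNA")
copy_src = Source("ecb-copy", "copy the required portion of the nucleotide sequence of DNA")
notes = [
    highlight(rna_src, "uracil (U) instead of the thymine (T)",
              "RNA contains uracil instead of thymine"),
    highlight(copy_src, "copy the required portion", "a DNA fragment is copied"),
]
print(notes[0].excerpt())  # uracil (U) instead of the thymine (T)
print(summarize(notes))
```

Keeping offsets rather than copying the excerpt means each statement stays anchored to its original document, which is what lets a later reader go from a statement back to its source.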
IKRAFT: Annotating Manual Information Extraction
IKRAFT: Extracting Statements from Complementary/Contradictory Text Sources
IKRAFT: Documenting Seismic Hazard in Southern California
Seismic Hazard Analysis (SHA) for the Southern California Earthquake Center (SCEC)
DOCKER: Scientist Publishes SHA Models
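User specifies:
• Types of model parameters
• Format of input messages
• Documentation
• Constraints
[Figure: DOCKER architecture for publishing a model. Through a web browser and the DOCKER user interface, the scientist works with Model Specification, Constraint Acquisition, and Wrapper Generation (WSDL, PWL) components, alongside the SCEC ontologies; the published AS97 model is accompanied by its message types, AS97 ontology, constraints, and documentation.]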
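To make "types of model parameters" and "format of input messages" concrete, here is a hypothetical sketch that checks an input message against declared parameter types. The parameter names (`magnitude`, `distance_km`) and the use of Python types in place of ontology classes are assumptions for illustration; they are not taken from the AS97 model or from DOCKER.

```python
from dataclasses import dataclass


@dataclass
class ParameterSpec:
    name: str
    type_: type  # stand-in for a parameter type drawn from a model ontology
    doc: str     # documentation entered by the model developer


def validate_message(params: list[ParameterSpec], message: dict) -> list[str]:
    """List problems in an input message; an empty list means it is well-formed."""
    problems = []
    for p in params:
        if p.name not in message:
            problems.append(f"missing parameter: {p.name}")
        elif not isinstance(message[p.name], p.type_):
            got = type(message[p.name]).__name__
            problems.append(f"{p.name}: expected {p.type_.__name__}, got {got}")
    declared = {p.name for p in params}
    for extra in sorted(set(message) - declared):
        problems.append(f"unexpected parameter: {extra}")
    return problems


# Hypothetical parameters; not taken from the AS97 model
spec = [
    ParameterSpec("magnitude", float, "earthquake magnitude"),
    ParameterSpec("distance_km", float, "distance from site to rupture, in km"),
]
print(validate_message(spec, {"magnitude": 6.5, "distance_km": "10", "depth": 5.0}))
# ['distance_km: expected float, got str', 'unexpected parameter: depth']
```

Returning a list of problems, rather than raising on the first one, lets the interface show every issue with a message at once.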
Documenting the Model with IKRAFT
Documenting Each Constraint
Formalizing Simple Constraints
Documentation of Constraints (Some Are Formalized, Some Are Not)
DOCKER: Engineer Uses SHA Model
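User can:
• Browse through SHA models
• Invoke SHA models
• Get help in selecting an appropriate model
[Figure: DOCKER architecture for using a model. Through a web browser and the DOCKER user interface, the engineer reaches Constraint Reasoning, Model Reasoning, and Pathway Elicitation components, backed by KR&R (PowerLoom), shared ontologies, and the AS97 model with its message types, AS97 ontology, constraints, and documentation.]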
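One simple way to help with selecting an appropriate model is to keep only the models whose formalized constraints all hold for the engineer's inputs. The sketch below assumes constraints are plain Python predicates; the model names and bounds are hypothetical and do not describe real SHA models or DOCKER's PowerLoom-based reasoning.

```python
from dataclasses import dataclass, field
from typing import Callable

Inputs = dict[str, float]


@dataclass
class Model:
    name: str
    # Formalized constraints only: label -> predicate over the inputs
    constraints: dict[str, Callable[[Inputs], bool]] = field(default_factory=dict)

    def violations(self, inputs: Inputs) -> list[str]:
        """Labels of the formalized constraints that fail for these inputs."""
        return [label for label, holds in self.constraints.items() if not holds(inputs)]


def suggest_models(models: list[Model], inputs: Inputs) -> list[str]:
    """Return the names of models whose formalized constraints all hold."""
    return [m.name for m in models if not m.violations(inputs)]


# Hypothetical models and bounds, for illustration only
models = [
    Model("model-A", {"magnitude >= 5": lambda x: x["magnitude"] >= 5.0}),
    Model("model-B", {"distance_km <= 100": lambda x: x["distance_km"] <= 100.0}),
]
inputs = {"magnitude": 4.5, "distance_km": 30.0}
print(suggest_models(models, inputs))  # ['model-B']
print(models[0].violations(inputs))    # ['magnitude >= 5']
```

Constraints that are documented but not formalized cannot be checked this way, so a filter like this can only narrow the choice, not settle it.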
DOCKER Detects Constraint Violations
Should Engineer Override Constraint Specified by Model Developer?
Engineer Brings Up IKRAFT to Find Reasons for the Constraint
Engineer Can Check Additional Model Constraints (Not Formalized)
Constraints Grounded on Model Documentation
Engineer Makes an Informed Decision on Whether to Override the Constraint
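The override scenario can be sketched as a review step that separates violated constraints (shown with their documented rationale and sources), satisfied ones, and constraints that are documented but not formalized and therefore need human judgment. The class and function names are hypothetical, and all constraint texts and thresholds are placeholders, not real model documentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Constraint:
    statement: str  # concise natural-language rationale
    sources: list[str] = field(default_factory=list)  # documentation excerpts it is grounded on
    check: Optional[Callable[[dict], bool]] = None     # None: documented but not formalized


def review(constraints: list[Constraint], inputs: dict) -> dict[str, list[Constraint]]:
    """Sort constraints into violated, satisfied, and unformalized."""
    report: dict[str, list[Constraint]] = {"violated": [], "satisfied": [], "unformalized": []}
    for c in constraints:
        if c.check is None:
            report["unformalized"].append(c)
        elif c.check(inputs):
            report["satisfied"].append(c)
        else:
            report["violated"].append(c)
    return report


def explain(report: dict[str, list[Constraint]]) -> str:
    """Text an engineer could read before deciding whether to override."""
    lines = []
    for c in report["violated"]:
        lines.append(f"VIOLATED: {c.statement}")
        lines.extend(f"  source: {s}" for s in c.sources)
    for c in report["unformalized"]:
        lines.append(f"CHECK MANUALLY: {c.statement}")
    return "\n".join(lines)


# Hypothetical constraints; the texts are placeholders, not real model documentation
constraints = [
    Constraint("Magnitude should be at least 5", ["<documentation excerpt>"],
               lambda x: x["magnitude"] >= 5.0),
    Constraint("Site conditions should match those in the calibration data"),
]
report = review(constraints, {"magnitude": 4.5})
print(explain(report))
# VIOLATED: Magnitude should be at least 5
#   source: <documentation excerpt>
# CHECK MANUALLY: Site conditions should match those in the calibration data
```

The system does not make the override decision itself; it only gathers the rationale and the unchecked constraints so the engineer can judge them.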
Discussion
• Overhead in capturing the rationale?
  • Related to motivation and payoff
  • Here the rationale is captured through a very simple process
• Related work:
  • Documenting design rationale [Shum 96]
  • Methodologies for knowledge base development [Schreiber et al 00]
  • Higher-level languages, e.g., KARL [Fensel et al 98]
Conclusions and Future Work
• IKRAFT helps users document formal expressions
  • Each formal expression is backed up by a concise NL statement that is linked back to one or more sources
• Users can understand the justification for a system’s reasoning (e.g., SHA)
• Future work:
  • NLP techniques to extract terms from users’ concise statements
  • A controlled grammar for formulating statements
  • Other documentation, e.g., tables, forms, exceptions
• High payoff in capturing the rationale of knowledge bases
Speculation: Will the (Semantic) Web End Up Looking Like This?
[Figure: the same representation diagram as on the slide "Motivation: Capturing the Design of Knowledge Bases", ranging from informal material to formal definitions such as (defconcept bridge …), with the same four levels: introductory texts, expert hints, and explanations; information extraction templates, dialogue segments, and filled-out forms; alternative formalizations (KIF, MELD, RDF, …) and alternative views of the same notion; and descriptions augmented with prototypical examples, exceptions, and problem-solving steps.]