A Real-World Knowledge Engineering Application: The NeuroScholar Project
Gully APC Burns
K. M. Research Group, University of Southern California
Structure of the presentation
1. Ideas & Concepts
2. Design
3. Implementation
4. Demonstration
I. Ideas & Concepts
In which we are reminded of what most people think knowledge is, how it is currently used (and
misused) and how we might improve matters.
What does the word ‘Knowledge’ mean?
Main Entry: knowl·edge
Pronunciation: 'nä-lij
Function: noun
Etymology: Middle English knowlege, from knowlechen to acknowledge, irregular from knowen
Date: 14th century
1 obsolete : COGNIZANCE
2 a (1) : the fact or condition of knowing something with familiarity gained through experience or association (2) : acquaintance with or understanding of a science, art, or technique b (1) : the fact or condition of being aware of something (2) : the range of one's information or understanding <answered to the best of my knowledge> c : the circumstance or condition of apprehending truth or fact through reasoning : COGNITION d : the fact or condition of having information or of being learned <a man of unusual knowledge>
3 archaic : SEXUAL INTERCOURSE
4 a : the sum of what is known : the body of truth, information, and principles acquired by mankind b archaic : a branch of learning
[from http://www.m-w.com/]
The published literature
Image taken from U.S. Geological Survey Energy Resource Surveys Program
… is the end-product of research and as such forms the basis for human understanding of the subject
… is very valuable.
… is structured.
… is interpretable.
The published literature
… is large and unwieldy.
… has varying reliability.
… is inconsistent.
… is based on natural language.
… is difficult to automate.
… is terse.
… is qualitative.
… is 2-D.
The published literature
… is a valid target for attack with informatics-based methods. This permits …
(a) increased clarification through formalization
(b) large-scale data-handling capability
(c) analysis of existing data to examine organization
A semantic continuum
[Mike Uschold, Boeing Corp]
Implicit: Shared human consensus
Informal (explicit): Text descriptions
Formal (for humans): Semantics hardwired; used at runtime
Formal (for machines): Semantics processed and used at runtime
Further to the right means:
• Less ambiguity
• More likely to have correct functionality
• Better inter-operation (hopefully)
• Less hardwiring
• More robust to change
• More difficult
The current status of ‘theory’ in Neuroscience
How we would like neuroscientists to think
Where we would like to work
What’s wrong with this picture?
… from a neuroscientist’s point of view …
From Swanson (1998), “Brain Maps, Structure of the Rat Brain”, 2nd edition, Elsevier, Amsterdam.
Number of structures = 500 x 2
Number of Cell Groups per structure = 10
Number of Possible Connections between cell groups = 10,000 x 10,000 = 10^8
Estimated Number of Connections between cell groups = 250,000
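The arithmetic behind these figures can be checked with a few lines of Python. This is only a worked example of the numbers on the slide; the variable names are illustrative and not part of NeuroScholar.

```python
# Figures are taken from the slide; the variable names are illustrative.
structures = 500 * 2                                   # "500 x 2" on the slide
cell_groups_per_structure = 10
cell_groups = structures * cell_groups_per_structure   # 10,000 cell groups

possible_connections = cell_groups * cell_groups       # 10,000 x 10,000 = 10^8
estimated_connections = 250_000

# Fraction of the possible connection matrix that the estimate would fill.
fill_fraction = estimated_connections / possible_connections

print(f"cell groups:          {cell_groups:,}")
print(f"possible connections: {possible_connections:,}")
print(f"fill fraction:        {fill_fraction:.2%}")
```

On these numbers the estimated connections fill about 0.25% of the possible connection matrix, which is still a quarter of a million individual reports to track.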
… it’s even worse than that …
Neuroscience is extremely multidisciplinary
Spatial Scales of Measurement: 10^1 to 10^-9 m
Temporal Scales of Measurement: 70 yrs (2.21 x 10^9 s) to 10^-3 s (not even including evolutionary time!)
Study occurs in a heterogeneous theoretical framework involving:
Anatomy, Physiology, Psychology, Ethology, Biochemistry (Molecular Biology, Genetics, Bioinformatics), Biophysics, Behavioural Ecology, Biology … to name a few…
All of these subjects are specialized; it is hard to link work between disciplines and across levels
… & it’s even worse than that !!!
Neuroanatomical nomenclature is the closest thing that neuroscience has to a standardized framework…
In any given paper, the same name may be used for different structures, or different names may be used for the same structure.
e.g., ‘Globus Pallidus, pars medialis (GPm)’ also called the ‘Entopeduncular Nucleus’ by others.
See the index of Swanson (1998), “Brain Maps, Structure of the Rat Brain”, 2nd edition, Elsevier, Amsterdam, for a list of synonyms according to one source.
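One way to picture the synonym problem is as a lookup table from name variants to a single preferred label. The sketch below is a minimal illustration, not NeuroScholar's implementation: the table, the function names and the choice of which label is "preferred" are all assumptions made for this example.

```python
# Hypothetical synonym table: maps each known variant (lower-cased) to one
# preferred label. Which label counts as "preferred" is an assumption here.
PREFERRED = "Globus Pallidus, pars medialis (GPm)"
SYNONYMS = {
    "globus pallidus, pars medialis (gpm)": PREFERRED,
    "globus pallidus, pars medialis": PREFERRED,
    "gpm": PREFERRED,
    "entopeduncular nucleus": PREFERRED,
}


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace so lookups ignore formatting."""
    return " ".join(name.lower().split())


def preferred_name(name: str) -> str:
    """Return the preferred label, or the tidied input if the name is unknown."""
    return SYNONYMS.get(normalize(name), " ".join(name.split()))


def same_structure(a: str, b: str) -> bool:
    """True when both names resolve to the same preferred label."""
    return preferred_name(a) == preferred_name(b)


print(same_structure("Entopeduncular  Nucleus", "GPm"))
```

A table like this only covers the case of different names for one structure. The opposite case, one name used for different structures, would need the source paper or atlas as part of the lookup key.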
We restrict the problem space to a specific soluble strategy
1. Describe a given phenomenon (e.g., the stress response).
2. Identify which populations of neurons are involved in the phenomenon (i.e., any neurons that turn on, turn off, change their firing, affect the phenomenon if messed with, etc.).
3. Represent how these populations of neurons are interconnected.
4. Represent the dynamic processes of these neurons that underlie the phenomenon.
A Construct: ‘A Knowledge Model’
= A personalized representation of an
individual’s knowledge.
e.g., a review article is an example of a non-computational knowledge model
Another Construct: ‘Knowledge Landscape’
= A map of Knowledge Models (where each KM
is timestamped)
e.g., a list of the best reviews of a given subject over time is an example of a non-computational knowledge landscape
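The two constructs can be sketched as data structures: a knowledge model that carries a timestamp, and a landscape that orders models on one subject over time. The class names, fields and sample entries below are placeholders chosen for illustration, not the project's actual schema or data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class KnowledgeModel:
    """One individual's representation of a subject at a point in time."""
    owner: str
    subject: str
    timestamp: date


@dataclass
class KnowledgeLandscape:
    """A map of timestamped knowledge models."""
    models: List[KnowledgeModel] = field(default_factory=list)

    def add(self, model: KnowledgeModel) -> None:
        self.models.append(model)

    def timeline(self, subject: str) -> List[KnowledgeModel]:
        """Models on one subject, oldest first."""
        matching = [m for m in self.models if m.subject == subject]
        return sorted(matching, key=lambda m: m.timestamp)


# Placeholder entries, invented for this example only.
landscape = KnowledgeLandscape()
landscape.add(KnowledgeModel("reviewer B", "stress response", date(2001, 5, 1)))
landscape.add(KnowledgeModel("reviewer A", "stress response", date(1998, 3, 1)))
landscape.add(KnowledgeModel("reviewer C", "another subject", date(2000, 1, 1)))

for model in landscape.timeline("stress response"):
    print(model.timestamp.isoformat(), model.owner)
```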
II. Design
In which all of these high-falutin’ ideas are put into a logical design and it becomes clear that the
design criteria of the NeuroScholar project distinguish it from pure research in computer
science
Some design requirements
In order of importance:
1. Powerful & enabling to neuroscientists in their everyday work
2. Easy to use! (i.e., free, multi-platform, one-click installation)
3. Knowledge acquisition / data collation is the rate limiting step
4. Open-source for future development as an academic project.
Knowledge Landscapes
[Figure: NeuroScholar screenshot (dummy data). Labels: ‘Knowledge Landscape’, ‘Knowledge Model’, ‘Fragments’, ‘Entities’, ‘Properties’, ‘Relations’, ‘Annotations’, ‘Data Collection’]
Knowledge Models & examples
‘Data Collection’: a set of data fragments, e.g. a publication: Allen GV & DF Cechetto. (1993) J Comp Neurol 330:421-438.
Knowledge Models & examples
‘Fragments’: individual pieces of the literature, e.g. descriptions of experimental results.
“… Moderate to light terminal labeling was present in the parvocellular portions of the paraventricular nucleus, anterior-hypothalamic nucleus, anterior portion of the lateral hypothalamic area (Figs. 2D, 3B), and in the central nucleus of the amygdala (Fig. 2D)….”
From Allen & Cechetto (1993)
Knowledge Models & examples
‘Entities’ / ‘Properties’: abstract data structures that capture the meaning of a set of fragments within the framework of the NeuroScholar system
e.g. neuronPopulation object (knowledge type = description, domain type = tract-tracing experiment)
[Figure labels: injectionSite, labeling, experimentalMethod, brainVolumes]
Knowledge Models & examples
‘Relations’: rules that link two objects together.
[Figure labels: LHA, ZI]
Knowledge Models & examples
‘Summaries’: sets of objects and relations, explicitly selected and prioritized within the system
Knowledge Models & examples
‘Annotations’: human-interpretable text to make the contents of the knowledge base understandable
[Figure labels: neuronPopulation1, neuronPopulation2]
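The components named on these slides (data collections, fragments, entities with properties, relations, annotations) might be laid out roughly as follows. This is a sketch under stated assumptions: every class and field name is invented for illustration, and the relation type is left as "unspecified" because the slide shows LHA and ZI linked without naming the kind of link.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fragment:
    """An individual piece of the literature, e.g. a quoted result."""
    text: str


@dataclass
class DataCollection:
    """A set of fragments, e.g. everything taken from one publication."""
    citation: str
    fragments: List[Fragment] = field(default_factory=list)


@dataclass
class Entity:
    """Abstract structure capturing the meaning of a set of fragments."""
    entity_type: str
    properties: Dict[str, str] = field(default_factory=dict)
    fragments: List[Fragment] = field(default_factory=list)


@dataclass
class Relation:
    """A rule that links two objects together."""
    relation_type: str
    source: Entity
    target: Entity


@dataclass
class Annotation:
    """Human-readable text attached to something in the knowledge base."""
    text: str
    target: object


paper = DataCollection("Allen GV & DF Cechetto. (1993) J Comp Neurol 330:421-438.")
fragment = Fragment("... Moderate to light terminal labeling was present in ...")
paper.fragments.append(fragment)

# Hypothetical entities; the "label" property name is an assumption.
lha = Entity("neuronPopulation", {"label": "LHA"}, [fragment])
zi = Entity("neuronPopulation", {"label": "ZI"})
link = Relation("unspecified", lha, zi)   # relation type not given on the slide
note = Annotation("Relation type still to be decided by the curator.", link)

print(link.source.properties["label"], "->", link.target.properties["label"])
```

Keeping a list of supporting fragments on each entity is one way to preserve the trail from an abstract object back to the text it was derived from.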
[Figure: system architecture, built up over several slides. Labels: Distributed Online Sources of Information; ‘Fragments’; Local Implementation; Users’ Spaces & Models; Centralized Published Knowledge Repository; ‘Pending Review’; P2P sharing; Knowledge Model Comparison]
Knowledge Model Comparison
Given two users A & B, with Knowledge Models K_A & K_B being shared under the P2P model.
We want A to be able to run a program that automatically compares K_B to K_A so that the discrepancies and contradictions between the two models can be understood and reconciled.
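A minimal sketch of such a comparison, assuming each knowledge model can be flattened into a dictionary from (source, relation, target) assertions to an asserted value. That representation, the function name and the sample assertions (regions "X", "Y", "Z") are hypothetical and stand in for whatever the real models contain.

```python
from typing import Dict, List, Tuple

Assertion = Tuple[str, str, str]        # (source, relation, target)
Model = Dict[Assertion, str]            # assertion -> asserted value


def compare_models(ka: Model, kb: Model) -> Dict[str, List[Assertion]]:
    """Split the assertions of two models into four groups."""
    shared = ka.keys() & kb.keys()
    return {
        "only_in_a": sorted(ka.keys() - kb.keys()),
        "only_in_b": sorted(kb.keys() - ka.keys()),
        "agreements": sorted(k for k in shared if ka[k] == kb[k]),
        "contradictions": sorted(k for k in shared if ka[k] != kb[k]),
    }


# Placeholder models for users A and B; region names are invented.
ka = {
    ("X", "projectsTo", "Y"): "present",
    ("X", "projectsTo", "Z"): "absent",
}
kb = {
    ("X", "projectsTo", "Y"): "present",
    ("X", "projectsTo", "Z"): "present",
    ("Y", "projectsTo", "Z"): "present",
}

report = compare_models(ka, kb)
for group, assertions in report.items():
    print(group, assertions)
```

Discrepancies show up as assertions present in only one model; contradictions are assertions both users make but with different values.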
What’s wrong with this picture?
… from a computer scientist’s point of view …
Where is the formal logic?
It’s o.k. if we only export knowledge models to a formal logic-based representation rather than base our entire approach on it.
Knowledge Acquisition is the rate-limiting step!
Knowledge Representation
Knowledge representation is a multidisciplinary subject that applies theories and techniques from three other fields:
1. Logic provides the formal structure and rules of inference.
2. Ontology defines the kinds of things that exist in the application domain.
3. Computation supports the applications that distinguish knowledge representation from pure philosophy…
Sowa (2000), Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA.
Knowledge Representation
… Without logic, a knowledge representation is vague, with no criteria for determining whether statements are redundant or contradictory. Without ontology, the terms and symbols are ill-defined, confused, and confusing. And without computable models, the logic and ontology cannot be implemented in computer programs. Knowledge representation is the application of logic and ontology to the task of constructing computable models for some domain.
Sowa (2000), Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA.
III. Implementation
In which the design issues become concerned with more pressing concerns like: ‘how are we
actually going to build this thing?’
Some implementation choices
Built under a UML-based software engineering paradigm: the View-Primitive-Data-Model framework (‘VPDMf’)
Object Oriented Design: Unified Modeling Language (UML), PerlOO, Java
Relational Databases: MySQL, Informix
Exporting Ontologies (via the VPDMf): XML, RDF, F-logic
Exporting Logic: embedded within typed Relation objects within the OO knowledge model. Use simple method overloading in Java to run Knowledge Model Comparison
VPDMf System Builder
[Figure labels: VPDMf specs (Data Model file & VPDMf XML files); UML-based documentation; DBMS; User Interface Component; Final Working System; Forward Engineering; Reverse Engineering]
Implementation Plan
[Figure, built up over three slides. Labels: Main Database; Plugins; VPDMf Client App; Local Database; Local Apps; Server; Client; Review Database; VPDMf System Builder; VPDMf Admin App]
Demonstration
Large scale organization of NeuroScholar’s schema
• Data management of publication data
• General knowledge management structures: Annotations, Justifications, Judgements
• Experimental data, General histological data
• Neuroanatomical tract tracing data
• Final output of the system: the knowledge model
• Components of the knowledge model specific to neuronal data
• General data constructs used throughout the system
e.g., Views from ‘bibliography’
[UML class diagram]
Excerpt
  bl_x : int32 = 0
  bl_y : int32 = 0
  tr_x : int32 = 0
  tr_y : int32 = 0
  li_x : int32 = 0
  li_y : int32 = 0
  ri_x : int32 = 0
  ri_y : int32 = 0
  pagenumber : int32
  enclose_excerpt()
  enclose_corner_area()
Journal
  JournalTitle : String
  PublisherName : String
  Abbr : String
  ISSN : String
Author
  Affiliation : string(40)
  LastName : string(40)
  Initials : string(10)
Fragment
  fragment_type : object(CV)
Article
  Pages : string(10)
  Volume : int32
  Issue : int32
  Abstract : String
  PMID : int32
  Title : string(255)
  Language : object(CV)
  PubDate : year
  checksum : String
  size : int32
CV (from coreSystem)
  name : String
  context : String
  description : String
Associations:
  Fragment (+fragment, 1) to Excerpt (+excerpts, 1..n)
  Journal (+journal, 1) to Article (0..*)
  Article (+publishedWork, 1..n) to Author (+authorList, 1..n)
  Article (+publication, 1) to Fragment (+fragments, 0..n) <<ordered>>
Views: ViewDefinitionArticle, ViewDefinitionFragment, connected by ViewLink
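The ‘bibliography’ classes could be mirrored in code roughly as below. This is an illustrative Python sketch only; the project itself names Java and PerlOO. Attribute names are adapted from the diagram, several attributes (the li_*/ri_* coordinates, Abstract, checksum and so on) are left out for brevity, and the fragment type and page number in the example are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Journal:
    journal_title: str
    abbr: str = ""
    publisher_name: str = ""
    issn: str = ""


@dataclass
class Author:
    last_name: str
    initials: str
    affiliation: str = ""


@dataclass
class Excerpt:
    """A rectangular area on one page (bottom-left and top-right corners)."""
    page_number: int
    bl_x: int = 0
    bl_y: int = 0
    tr_x: int = 0
    tr_y: int = 0


@dataclass
class Fragment:
    fragment_type: str                 # a controlled-vocabulary (CV) value
    excerpts: List[Excerpt]

    def __post_init__(self) -> None:
        # The diagram gives the multiplicity 1..n for +excerpts.
        if not self.excerpts:
            raise ValueError("a Fragment needs at least one Excerpt")


@dataclass
class Article:
    journal: Journal
    author_list: List[Author]          # ordered, 1..n
    volume: int
    pages: str
    pub_date: int                      # year
    title: str = ""
    pmid: Optional[int] = None
    fragments: List[Fragment] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.author_list:
            raise ValueError("an Article needs at least one Author")


article = Article(
    journal=Journal("Journal of Comparative Neurology", abbr="J Comp Neurol"),
    author_list=[Author("Allen", "GV"), Author("Cechetto", "DF")],
    volume=330,
    pages="421-438",
    pub_date=1993,
)
# Placeholder fragment: the type and page number are illustrative values.
article.fragments.append(Fragment("description", [Excerpt(page_number=421)]))
print(article.journal.abbr, article.volume, article.pages)
```

Checking the 1..n multiplicities in `__post_init__` is one simple way to carry the diagram's constraints into code; a database-backed system would more likely enforce them in the schema.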
Basic Functionality: The ViewStateMachine & Forms
[State diagram]
States: Start, Query, Insert, List, Display, Edit
Transitions (event / action):
  Query( viewType ) / runQuery
  Insert( viewType ) / runInsert
  Execute( viewInstance ) / runExecute
  Select( viewType, viewID ) / runSelect
  Commit( viewInstance ) / runCommit
  ClearInsert( viewType ) / runClearInsert
  Update( viewInstance ) / runUpdate
  Cancel( viewType, viewID ) / runCancel
  Edit( viewInstance ) / runEdit
  Delete( viewType, viewID ) / runDelete
  JumpToStart
  JumpToQuery
  JumpToInsert
  JumpToDisplay( viewType, ViewID )
  BackToQuery( viewInstance )
  BackToList( viewInstance )
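A table-driven state machine is one common way to implement a diagram like this. The sketch below is hedged heavily: the slide text gives the states and events but not which arrow connects which states, so every (state, event) pairing in the table is an assumption made for illustration, and the event arguments (viewType, viewID, viewInstance) are omitted.

```python
# ASSUMPTION: the (state, event) -> next-state pairs below are a guess at the
# diagram's arrows; the slide text lists states and events but not the wiring.
TRANSITIONS = {
    ("Start", "Query"): "Query",
    ("Start", "Insert"): "Insert",
    ("Query", "Execute"): "List",
    ("List", "Select"): "Display",
    ("List", "BackToQuery"): "Query",
    ("Display", "Edit"): "Edit",
    ("Display", "BackToList"): "List",
    ("Display", "Delete"): "Start",
    ("Edit", "Update"): "Display",
    ("Edit", "Cancel"): "Display",
    ("Insert", "Commit"): "Display",
    ("Insert", "ClearInsert"): "Insert",
}


class ViewStateMachine:
    """Tracks the current form state and records the action for each event."""

    def __init__(self) -> None:
        self.state = "Start"
        self.history = []

    def fire(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(
                f"event {event!r} is not allowed in state {self.state!r}"
            )
        # The slide pairs each event with an action named run<Event>.
        self.history.append(f"run{event}")
        self.state = TRANSITIONS[key]
        return self.state


machine = ViewStateMachine()
for event in ["Query", "Execute", "Select", "Edit", "Update"]:
    machine.fire(event)
print(machine.state, machine.history)
```

Keeping the transitions in a single dictionary means the allowed paths through the forms can be changed without touching the class itself.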
Additional Functionality: Specialized Form Controls & Plugins
1. The Article Robot Form Control: uses PubMed to retrieve citation information easily
2. The Fragmenter Plugin: allows delineation of fragments on pdf files
3. The AtlasMapper Plugin: allows delineation of regions on brain maps
IV. Demonstration
In which the truth is finally revealed
Acknowledgements
This work is funded by the National Library of Medicine (RO1-LM07061-01)
Thanks to Arshad Khan, Shahram Ghandehanderazdeh, Cyrus Shahabi, Mark O’Neill, Larry Swanson, Alan Watts, Mihail Bota, Wei Cheng Chen, Shyam Kapadia, Shanshan Song, Ning Zhang, Yi-Shin Chen