A Real-World Knowledge Engineering Application: The NeuroScholar Project

Gully APC Burns

K. M. Research Group, University of Southern California

Structure of the presentation

1. Ideas & Concepts

2. Design

3. Implementation

4. Demonstration

I. Ideas & Concepts

In which we are reminded of what most people think knowledge is, how it is currently used (and misused) and how we might improve matters.

Main Entry: knowl·edge
Pronunciation: 'nä-lij
Function: noun
Etymology: Middle English knowlege, from knowlechen to acknowledge, irregular from knowen
Date: 14th century
1 obsolete : COGNIZANCE
2 a (1) : the fact or condition of knowing something with familiarity gained through experience or association (2) : acquaintance with or understanding of a science, art, or technique
  b (1) : the fact or condition of being aware of something (2) : the range of one's information or understanding <answered to the best of my knowledge>
  c : the circumstance or condition of apprehending truth or fact through reasoning : COGNITION
  d : the fact or condition of having information or of being learned <a man of unusual knowledge>
3 archaic : SEXUAL INTERCOURSE
4 a : the sum of what is known : the body of truth, information, and principles acquired by mankind
  b archaic : a branch of learning

What does the word ‘Knowledge’ mean?

[from http://www.m-w.com/]


The published literature

Image taken from U.S. Geological Survey Energy Resource Surveys Program

… is the end-product of research and as such forms the basis for human understanding of the subject

… is very valuable.

… is structured.

… is interpretable.



… is large and unwieldy.

… has varying reliability.

… is inconsistent.

… is based on natural language.

… is difficult to automate.

… is terse.

… is qualitative.

… is 2-D.



… is a valid target for attack with informatics-based methods. This permits …

(a) increased clarification through formalization

(b) large-scale data-handling capability

(c) analysis of existing data to examine organization

A semantic continuum

[Mike Uschold, Boeing Corp]

Implicit: Shared human consensus

Informal (explicit): Text descriptions

Formal (for humans): Semantics hardwired; used at runtime

Formal (for machines): Semantics processed and used at runtime

Further to the right means:

• Less ambiguity

• More likely to have correct functionality

• Better inter-operation (hopefully)

• Less hardwiring

• More robust to change

• More difficult

The current status of ‘theory’ in Neuroscience

How we would like neuroscientists to think

Where we would like to work

What’s wrong with this picture?

… from a neuroscientist’s point of view …

From Swanson (1998), “Brain Maps, Structure of the Rat Brain”, 2nd edition, Elsevier, Amsterdam.

Number of structures = 500 x 2

Number of Cell Groups per structure = 10

Number of Possible Connections between cell groups = 10,000 x 10,000 = 10^8

Estimated Number of Connections between cell groups = 250,000
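The arithmetic behind these figures can be checked with a short Python sketch. The variable names are illustrative, and reading "500 x 2" as 500 structures on each side of the brain is an assumption, not something the slide states.

```python
# Back-of-the-envelope check of the slide's numbers.
# Assumption: "500 x 2" means 500 structures on each side of the brain.
structures = 500 * 2
groups_per_structure = 10

cell_groups = structures * groups_per_structure      # 10,000 cell groups
possible_connections = cell_groups * cell_groups     # ordered pairs, self-pairs included
estimated_connections = 250_000                      # figure quoted on the slide

# Fraction of all possible connections that are estimated to exist.
density = estimated_connections / possible_connections

print(f"{cell_groups:,} cell groups")
print(f"{possible_connections:.0e} possible connections")
print(f"{density:.2%} of possible connections are estimated to exist")
```

Under these figures only a small fraction of the possible connections is estimated to exist, yet 250,000 connections is still far too many to track by hand.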

… it’s even worse than that …

Neuroscience is extremely multidisciplinary

Spatial Scales of Measurement: 10^1 – 10^-9 m

Temporal Scales of Measurement: 70 yrs (2.21 x 10^9 s) to 10^-3 s (not even including evolutionary time!)

Study occurs in a heterogeneous theoretical framework involving:

Anatomy, Physiology, Psychology, Ethology, Biochemistry (Molecular Biology, Genetics, Bioinformatics), Biophysics, Behavioural Ecology, Biology … to name a few…

All of these subjects are specialized, and it is hard to link work between disciplines and across levels.

… & it’s even worse than that !!!

Neuroanatomical nomenclature is the closest thing that neuroscience has to a standardized framework…

In any given paper, the same name may be used for different structures, or different names may be used for the same structure.

e.g., ‘Globus Pallidus, pars medialis (GPm)’ also called the ‘Entopeduncular Nucleus’ by others.

See the index of Swanson (1998), “Brain Maps, Structure of the Rat Brain”, 2nd edition, Elsevier, Amsterdam, for a list of synonyms according to one source.
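One simple way to cope with synonyms is a lookup table that maps every known name to a single preferred name. The Python sketch below is illustrative only: the table holds just the one example from the slide, the choice of preferred name is arbitrary, and `canonical_name` is a hypothetical helper, not part of NeuroScholar.

```python
# Hypothetical synonym table: every known name maps to one preferred name.
# The only entries are the example given on the slide.
PREFERRED = "Globus Pallidus, pars medialis"
SYNONYMS = {
    "globus pallidus, pars medialis": PREFERRED,
    "gpm": PREFERRED,
    "entopeduncular nucleus": PREFERRED,
}


def canonical_name(name: str) -> str:
    """Return the preferred name for a structure, or the input if unknown."""
    key = " ".join(name.lower().split())   # normalize case and whitespace
    return SYNONYMS.get(key, name)


print(canonical_name("Entopeduncular  Nucleus"))
print(canonical_name("GPm"))
```

A flat table like this handles different names for one structure. It does not handle the opposite case, where one name means different structures in different papers; that would need the source publication as part of the lookup key.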

We restrict the problem space to a specific soluble strategy

1. Describe a given phenomenon (e.g., the stress response).

2. Identify which populations of neurons are involved in the phenomenon (i.e., any neurons that turn on, turn off, change their firing, affect the phenomenon if messed with, etc.).

3. Represent how these populations of neurons are interconnected.

4. Represent the dynamic processes of these neurons that underlie the phenomenon.

A Construct: ‘A Knowledge Model’

= A personalized representation of an individual’s knowledge.

e.g., A review article is an example of a non-computational knowledge model

Another Construct: ‘Knowledge Landscape’

= A map of Knowledge Models (where each KM is timestamped)

e.g., A list of the best reviews of a given subject over time is an example of a non-computational knowledge landscape

II. Design

In which all of these high-falutin’ ideas are put into a logical design and it becomes clear that the design criteria of the NeuroScholar project distinguish it from pure research in computer science

Some design requirements

In order of importance

1. Powerful & enabling to neuroscientists in their everyday work

2. Easy to use! (i.e., free, multi-platform, one-click installation)

3. Knowledge acquisition / data collation is the rate-limiting step

4. Open-source for future development as an academic project.

Knowledge Landscapes

[NeuroScholar screenshot (dummy data)]

Components shown in the diagram: ‘Knowledge Landscape’, ‘Knowledge Model’, ‘Fragments’, ‘Entities’, ‘Properties’, ‘Relations’, ‘Annotations’, ‘Data Collection’

‘Data Collection’

A set of data fragments, e.g. a publication: Allen GV & DF Cechetto. (1993) J Comp Neurol 330:421-438.

Knowledge Models & examples

‘Fragments’

Individual pieces of the literature, e.g. descriptions of experimental results.

“… Moderate to light terminal labeling was present in the parvocellular portions of the paraventricular nucleus, anterior-hypothalamic nucleus, anterior portion of the lateral hypothalamic area (Figs. 2D, 3B), and in the central nucleus of the amygdala (Fig. 2D)….”

From Allen & Cechetto (1993)


‘Entities’

Abstract data structures that capture the meaning of a set of fragments within the framework of the NeuroScholar system

e.g. neuronPopulation object (knowledge type = description; domain type = tract-tracing experiment)

‘Properties’: injectionSite, labeling, experimentalMethod, brainVolumes
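As a rough illustration of how such an entity might look in code, here is a minimal Python sketch. The class layout, field names and `labeled_volumes` helper are assumptions made for illustration and are not NeuroScholar's actual classes. The labeling values are paraphrased from the Allen & Cechetto (1993) excerpt quoted in these slides. The excerpt does not state the injection site or method, so both are left as placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Labeling:
    """One observation of label in a brain volume (hypothetical structure)."""
    brain_volume: str
    description: str


@dataclass
class NeuronPopulation:
    """Illustrative entity with the property names listed on the slide."""
    knowledge_type: str
    domain_type: str
    experimental_method: str
    injection_site: str
    labeling: List[Labeling] = field(default_factory=list)

    def labeled_volumes(self) -> List[str]:
        """Brain volumes in which labeling was reported, in recorded order."""
        return [item.brain_volume for item in self.labeling]


population = NeuronPopulation(
    knowledge_type="description",
    domain_type="tract-tracing experiment",
    experimental_method="(not stated in the excerpt)",   # placeholder
    injection_site="(not stated in the excerpt)",        # placeholder
    labeling=[
        Labeling("anterior-hypothalamic nucleus", "moderate to light terminal labeling"),
        Labeling("central nucleus of the amygdala", "moderate to light terminal labeling"),
    ],
)
print(population.labeled_volumes())
```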


‘Relations’

Rules that link two objects together.

[Diagram labels: LHA, ZI]


‘Summaries’

Sets of objects and relations, explicitly selected and prioritized within the system

[Diagram labels: neuronPopulation1, neuronPopulation2]

‘Annotations’

Human-interpretable text to make the contents of the knowledge base understandable


[Architecture diagram labels: Distributed Online Sources of Information; ‘Fragments’; Local Implementation; Users’ Spaces & Models; Centralized Published Knowledge Repository; ‘Pending Review’; P2P sharing; Knowledge Model Comparison]

Knowledge Model Comparison

Given two users A & B, with Knowledge Models KA & KB being shared under the P2P model.

We want A to be able to run a program that automatically compares KB to KA so that the discrepancies and contradictions between the two models can be understood and reconciled.
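A minimal sketch of such a comparison, in Python, is given below. It assumes a deliberately simplified representation in which a knowledge model is a mapping from a (source, target) pair to a claim about that connection. That representation, the `compare_models` function and all of the data are hypothetical dummy values for illustration. The slides do not describe the actual comparison algorithm.

```python
from typing import Dict, List, Tuple

Connection = Tuple[str, str]             # (source region, target region)
KnowledgeModel = Dict[Connection, str]   # claim about that connection


def compare_models(ka: KnowledgeModel, kb: KnowledgeModel) -> Dict[str, List[Connection]]:
    """Report discrepancies (one-sided claims) and contradictions (conflicting claims)."""
    shared = ka.keys() & kb.keys()
    return {
        "only_in_a": sorted(ka.keys() - kb.keys()),
        "only_in_b": sorted(kb.keys() - ka.keys()),
        "contradictions": sorted(k for k in shared if ka[k] != kb[k]),
    }


# Dummy data: region names and claims are made up for illustration.
ka = {("LHA", "ZI"): "present", ("A", "B"): "present", ("A", "C"): "absent"}
kb = {("LHA", "ZI"): "present", ("A", "B"): "absent", ("B", "C"): "present"}

report = compare_models(ka, kb)
for kind, connections in report.items():
    print(kind, connections)
```

User A could then review each reported item against the underlying fragments before deciding how to reconcile the two models.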

What’s wrong with this picture?

… from a computer scientist’s point of view …

Where is the formal logic?

It’s o.k. if we only export knowledge models to a formal logic-based representation rather than base our entire approach on it.

Knowledge Acquisition is the rate-limiting step!

Knowledge Representation

Knowledge representation is a multidisciplinary subject that applies theories and techniques from three other fields:

1. Logic provides the formal structure and rules of inference.

2. Ontology defines the kinds of things that exist in the application domain.

3. Computation supports the applications that distinguish knowledge representation from pure philosophy…

Sowa (2000), Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA.

Knowledge Representation

… Without logic, a knowledge representation is vague, with no criteria for determining whether statements are redundant or contradictory. Without ontology, the terms and symbols are ill-defined, confused, and confusing. And without computable models, the logic and ontology cannot be implemented in computer programs. Knowledge representation is the application of logic and ontology to the task of constructing computable models for some domain.

Sowa (2000), Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks Cole Publishing Co., Pacific Grove, CA.

III. Implementation

In which the design issues become concerned with more pressing concerns like: ‘how are we actually going to build this thing?’

Some implementation choices

Built under UML-based software engineering paradigm: The View-Primitive-Data-Model framework (‘VPDMf’)

Object Oriented Design: Unified Modeling Language (UML), PerlOO, Java

Relational Databases: MySQL, Informix

Exporting Ontologies (via the VPDMf): XML, RDF, Flogic

Exporting Logic: Embedded within typed Relation objects within the OO knowledge model. Use simple method overloading in Java to run Knowledge Model Comparison

VPDMf System Builder

[Diagram labels: VPDMf specs (Data Model file & VPDMf XML files); UML-based documentation; DBMS; User Interface Component; Final Working System; Forward Engineering; Reverse Engineering]

Implementation Plan

[Diagram labels, shown across three successive slides: MainDatabase; LocalDatabase; ReviewDatabase; Server; Client; VPDMfClientApp; VPDMfAdminApp; VPDMfSystemBuilder; LocalApps; Plugins]

Demonstration

Large scale organization of NeuroScholar’s schema

Data management of publication data

General knowledge management structures

Annotations, Justifications, Judgements

Experimental data, General histological data

Neuroanatomical tract tracing data

Final output of the system: the knowledge model

Components of the knowledge model specific to neuronal data

General data constructs used throughout the system

e.g., Views from ‘bibliography’

[UML class diagram]

Excerpt
  bl_x : int32 = 0
  bl_y : int32 = 0
  tr_x : int32 = 0
  tr_y : int32 = 0
  li_x : int32 = 0
  li_y : int32 = 0
  ri_x : int32 = 0
  ri_y : int32 = 0
  pagenumber : int32
  enclose_excerpt()
  enclose_corner_area()

Journal
  JournalTitle : String
  PublisherName : String
  Abbr : String
  ISSN : String

Author
  Affiliation : string(40)
  LastName : string(40)
  Initials : string(10)

Fragment
  fragment_type : object(CV)

Article
  Pages : string(10)
  Volume : int32
  Issue : int32
  Abstract : String
  PMID : int32
  Title : string(255)
  Language : object(CV)
  PubDate : year
  checksum : String
  size : int32

CV (from coreSystem)
  name : String
  context : String
  description : String

Association ends recoverable from the diagram:
  +excerpts 1..n, +fragment 1
  +journal 1
  +authorList 1..n <<ordered>>, +publishedWork 1..n
  +fragments 0..n, +publication 1

Views: ViewDefinitionArticle, ViewDefinitionFragment, ViewLink (x2)
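To make the bibliography classes more concrete, the following Python sketch mirrors a subset of them as dataclasses. It is only an approximation under stated assumptions: the field subset, the `page_numbers` helper, the fragment type value and the excerpt page numbers are hypothetical. The coordinate fields are stored without interpretation because the slides do not say what they mean. The citation details come from the Allen & Cechetto (1993) reference given earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Excerpt:
    # Coordinate names follow the diagram; their geometric meaning is not
    # stated in the slides, so they are kept as plain integers.
    pagenumber: int
    bl_x: int = 0
    bl_y: int = 0
    tr_x: int = 0
    tr_y: int = 0


@dataclass
class Fragment:
    fragment_type: str                                    # a controlled-vocabulary (CV) term
    excerpts: List[Excerpt] = field(default_factory=list)

    def page_numbers(self) -> List[int]:
        """Hypothetical helper: distinct pages touched by this fragment."""
        return sorted({e.pagenumber for e in self.excerpts})


@dataclass
class Article:
    journal: str
    volume: int
    pages: str
    pub_year: int
    author_list: List[str] = field(default_factory=list)  # order matters
    fragments: List[Fragment] = field(default_factory=list)


article = Article(
    journal="J Comp Neurol", volume=330, pages="421-438", pub_year=1993,
    author_list=["Allen GV", "Cechetto DF"],
)
# Page numbers below are made up for illustration.
frag = Fragment("description", [Excerpt(3), Excerpt(2), Excerpt(3)])
article.fragments.append(frag)
print(frag.page_numbers())
```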

Basic Functionality: The ViewStateMachine & Forms

[State diagram]

States: Start, Query, Insert, List, Display, Edit

Transitions (event / action):

Query( viewType ) / runQuery
Insert( viewType ) / runInsert
Execute( viewInstance ) / runExecute
Select( viewType, viewID ) / runSelect
Commit( viewInstance ) / runCommit
ClearInsert( viewType ) / runClearInsert
Edit( viewInstance ) / runEdit
Update( viewInstance ) / runUpdate
Cancel( viewType, viewID ) / runCancel
Delete( viewType, viewID ) / runDelete
JumpToStart
JumpToQuery
JumpToInsert
JumpToDisplay( viewType, viewID )
BackToQuery( viewInstance )
BackToList( viewInstance )
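A table-driven state machine is one common way to implement this kind of form navigation. The Python sketch below uses the state and event names from the diagram, but the source and target state of each transition cannot be recovered from the slide text, so the transition table is an assumption made purely for illustration. The class shown is not the VPDMf implementation.

```python
# ASSUMED transition table: (current state, event) -> next state.
# The diagram's actual arrows are not recoverable from the slide text.
TRANSITIONS = {
    ("Start", "Query"): "Query",
    ("Start", "Insert"): "Insert",
    ("Query", "Execute"): "List",
    ("List", "Select"): "Display",
    ("List", "BackToQuery"): "Query",
    ("Display", "Edit"): "Edit",
    ("Display", "BackToList"): "List",
    ("Edit", "Update"): "Display",
    ("Edit", "Cancel"): "Display",
    ("Insert", "ClearInsert"): "Insert",
    ("Insert", "Commit"): "Display",
}


class ViewStateMachine:
    """Minimal table-driven state machine (illustrative only)."""

    def __init__(self) -> None:
        self.state = "Start"

    def fire(self, event: str) -> str:
        """Apply an event, or raise ValueError if it is not allowed here."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state


machine = ViewStateMachine()
for event in ["Query", "Execute", "Select", "Edit", "Update"]:
    machine.fire(event)
print(machine.state)
```

Keeping the transitions in a table rather than in branching code makes it easy to check the allowed paths against the diagram and to add new events later.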

Additional Functionality: Specialized Form Controls & Plugins

1. The Article Robot Form Control: Uses PubMed to retrieve citation information easily

2. The Fragmenter Plugin: Allows delineation of fragments on pdf files

3. The AtlasMapper Plugin: Allows delineation of regions on brain maps

IV. Demonstration

In which the truth is finally revealed

Acknowledgements

This work is funded by the National Library of Medicine (RO1-LM07061-01)

Thanks to Arshad Khan, Shahram Ghandehanderazdeh, Cyrus Shahabi, Mark O’Neill, Larry Swanson, Alan Watts, Mihail Bota, Wei Cheng Chen, Shyam Kapadia, Shanshan Song, Ning Zhang, Yi-Shin Chen
