Transcript of Amarnath Gupta University of California San Diego NIF as a Multi-Model Semantic Information System...
- Slide 1
- Amarnath Gupta, University of California San Diego. NIF as a
Multi-Model Semantic Information System. Part 1: Relational, XML,
RDF and OWL models
- Slide 2
- Preamble 1. As we design and extend the NIF system, we recognize
that users will give us data in any form that is convenient for
them: standard data may be stored in a flat file; Web service output
can be in XML; Semantic Web enthusiasts may represent data using
proper RDF. However, regardless of the form in which the data are
represented, the NIF system must treat them (query, index,
relate, ...) in a uniform manner, and it must utilize the
underlying systems to query, access, and index the data.
- Slide 3
- Preamble 2. In this presentation we intend to: explain our
perspective on these different data models; provide a background on
the data models we consider; offer a sense of the semantic character
of these data models; and present our design philosophy on where to
keep them separate and where to transform them into a common model.
- Slide 4
- What is a Data Model? A conceptual data model: a formal
representation of the users'/applications' mental model of data
elements and their relationships that should be put in a database,
manipulated, queried and operated upon. A logical data model: a
formal description of the data model in a logical structure that a
computer can use to perform the queries and other operations. In
many cases, the same conceptual model can be represented by
different logical models. A physical data model: an implementable
version of the data model in terms of data structures, access
structures (e.g., indices) and the set of low-level operations that
a system needs to perform on the data.
- Slide 5
- A Conceptual Model: ORM model (Terry Halpin). [Diagram; labeled
parts: Object, Relationship/Role, Value Constraint, Uniqueness
Constraint, Inter-relationship Constraint, Value Type, n-ary Role]
- Slide 6
- A Logical Data Model. A formal specification of: (1) The structure
of the data. The structure tells us how the data is organized, e.g.,
(123, Purkinje Cell, Cerebellum), (828, Hippocampus, Hilar Cell).
Often the structure of the data, together with some constraints,
represents some semantics. If the data are not structured (like free
text), the techniques for handling them will be different. (2)
Operations on this structure. Every data model is based on some
mathematical principles that define what you can do with the data.
(3) The nature of data values: data domains and data types, and
operations on data values.
- Slide 7
- The Relational Data Model. Table: Neurons
  NeuronID | NeuronName    | BrainRegion   | NeuroTransmitter | Current
  1        | Purkinje Cell | Cerebellum    | Glutamate        | Transient Na+
  2        | Hilar Cell    | Dentate Gyrus | GABA             | Ca2+
  [Diagram labels: relation name (Neurons); attribute name (a column
  header); attribute value (a cell, which cannot be complex); tuple (a
  row).] Attribute domain: all possible values the attribute can take.
  Candidate key: a set of columns that uniquely determines a row. The
  relational model is a set (or bag) of tuples model. Metadata is
  stored in a separate catalog, which is also relational. Constraints
  are first order. All queries are about some combination of:
  selecting rows and columns; combining tables by union, intersection,
  join; computing data or aggregate functions; grouping and sorting. A
  query can return only values. Relation names and attribute names
  cannot serve as variables in a query.
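The query operations listed on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustrative sketch: the table contents are copied from the slide, while the choice of SQLite, the column types, and the variable names are assumptions and not part of the NIF system.

```python
import sqlite3

def build_neuron_db():
    """Create an in-memory database holding the slide's Neurons table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Neurons ("
        " NeuronID INTEGER PRIMARY KEY,"  # candidate key: identifies a row
        " NeuronName TEXT, BrainRegion TEXT,"
        " NeuroTransmitter TEXT, Current TEXT)"
    )
    conn.executemany(
        "INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)",
        [
            (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
            (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
        ],
    )
    return conn

conn = build_neuron_db()

# Selecting rows (WHERE) and columns (the SELECT list).
gaba_neurons = conn.execute(
    "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = ?", ("GABA",)
).fetchall()

# Aggregate function with grouping and sorting.
counts = conn.execute(
    "SELECT BrainRegion, COUNT(*) FROM Neurons "
    "GROUP BY BrainRegion ORDER BY BrainRegion"
).fetchall()

# Metadata sits in a catalog that is itself a relation (sqlite_master in SQLite).
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Note that the table name and column names appear as fixed text in every query: they cannot be bound as variables, which is the limitation the slide points out.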
- Slide 8
- Object Relational Model. Eases some of the problems of the
classical relational model. Data values can be of arbitrary data
types: sets (e.g., multiple currents for a neuron); tuples (e.g.,
references ordered by year); time-series (e.g., raw EEG data);
spatial data (e.g., atlases in CCDB). Each data type can have its
own operations, e.g., find all data points within a neighborhood of
a spatial location. Queries still return only values. Catalog
queries and data queries cannot be mixed in a single query. All
industrial-strength DBMSs use some version of this model. One needs
to be a skilled DB programmer to develop custom applications on this
model.
- Slide 9
- XML (Two Perspectives). Document Community: data = linear text
documents; mark up (annotate) text pieces to describe the context,
structure, and semantics of the marked text. Example text: Oxidative
stress has been proposed to be involved in the pathogenesis of
Parkinson's disease (PD). A plausible source of oxidative stress in
nigral dopaminergic neurons is the redox reactions that specifically
involve dopamine and produce various toxic molecules.
- Slide 10
- XML (Two Perspectives). Database Community: XML as a (most
prominent) example of the semi-structured data model => captures
the whole spectrum from highly structured, regular data to
unstructured data (relational, object-oriented, marked up text, ...).
[XML example from the CARMEN group; the surviving text values are:
"A new annotation file", "true", "0.000001", "Text message for the
event start.", "Text message for the event end."]
- Slide 11
- XML as a Logical Data Model. XML is a tree-structured document.
Nodes: element nodes (children can be ordered; recursive elements,
i.e., parts under parts) and attribute nodes (mandatory or
optional). Edges: sub-element edges, attribute edges, IDRef edges.
Constraints: references; value restrictions, OneOf; cardinality.
Trees are more flexible than tables: any number of nodes can be
added anywhere without breaking the model.
- Slide 12
- XML as a Logical Data Model. XML has its own schema language,
which lets you specify a complex type system. A database is a
collection of XML trees. Storing XML: mostly relational, with some
very clever indexing to encode the hierarchy, tree paths, and order.
Querying XML: elements, attribute names, values and structure can be
queried; multiple trees can be joined by value. Example (XPath) on
http://mousespinal.brain-map.org/imageseries/detail/100002661.xml:
"Find images of the spinal column",
//image[//structurelabel/text()="SPINAL COLUMN"]/ish_image_path, is
a tree query. XQuery and full-text XQuery.
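A tree query of this kind can be sketched with Python's standard xml.etree.ElementTree module, which supports a limited XPath subset. The element names image, structurelabel, and ish_image_path come from the slide; the nesting of the sample document, the root element name, and the file paths are hypothetical, since the real document's layout is not shown in the transcript.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the three inner element names come from the slide.
SAMPLE = """
<imageseries>
  <image>
    <structurelabel>SPINAL COLUMN</structurelabel>
    <ish_image_path>/hypothetical/path/a.jpg</ish_image_path>
  </image>
  <image>
    <structurelabel>DORSAL ROOT GANGLION</structurelabel>
    <ish_image_path>/hypothetical/path/b.jpg</ish_image_path>
  </image>
</imageseries>
"""

def spinal_column_paths(xml_text):
    """Return the image paths of all images labeled SPINAL COLUMN."""
    root = ET.fromstring(xml_text)
    # The predicate keeps <image> elements that have a <structurelabel>
    # child with exactly this text; the final step walks to the path child.
    nodes = root.findall(
        ".//image[structurelabel='SPINAL COLUMN']/ish_image_path"
    )
    return [node.text for node in nodes]

paths = spinal_column_paths(SAMPLE)
```

The query addresses both structure (which element is under which) and values (the label text) in one expression, which a flat relational query cannot do without knowing the nesting depth in advance.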
- Slide 13
- Misusing and Abusing XML. Using XML if your data is relational:
it will result in flat trees that will suffer from complex querying.
Encoding orders and hierarchies that need special parsing, e.g.,
"Apo-Levocarb (carbidopa + levodopa)", "Apo-Levocarb CR
Controlled-Release Tablets (carbidopa + levodopa)". Using implicit
multi-valuedness, e.g., "0.2 -0.3 0.1".
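The cost of implicit multi-valuedness can be illustrated with a small sketch. The values 0.2, -0.3, 0.1 are from the slide; the element names position, x, y, and z are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Implicit multi-valuedness: three numbers hidden inside one text node.
implicit = ET.fromstring("<position>0.2 -0.3 0.1</position>")
# The consumer must know to split on whitespace and what each slot means.
implicit_values = [float(token) for token in implicit.text.split()]

# Explicit alternative: each value is its own element, so a path reaches it.
explicit = ET.fromstring(
    "<position><x>0.2</x><y>-0.3</y><z>0.1</z></position>"
)
y_value = float(explicit.find("y").text)
```

In the implicit form the XML query language can only return the whole string; extracting the second value requires application-side parsing that the schema does not describe.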
- Slide 14
- Expressing Semantics in XML. Adorning elements with namespaces. A
namespace is a unique URI (Uniform Resource Identifier), used to
disambiguate between two elements that happen to share the same
name, and to group elements relating to a common idea together.
Example values: Metalloelastase, NP_304845, RefSeq.
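How a namespace disambiguates two elements with the same local name can be sketched as follows. The values Metalloelastase, NP_304845, and RefSeq are from the slide; the element names and both example.org namespace URIs are hypothetical, because the slide's original markup did not survive.

```python
import xml.etree.ElementTree as ET

# Two different <name> elements, told apart only by their namespace.
DOC = """
<record xmlns:prot="http://example.org/protein"
        xmlns:db="http://example.org/database">
  <prot:name>Metalloelastase</prot:name>
  <db:name>RefSeq</db:name>
  <db:accession>NP_304845</db:accession>
</record>
"""

# Prefix-to-URI map used when querying; the prefixes are local shorthand.
NS = {
    "prot": "http://example.org/protein",
    "db": "http://example.org/database",
}

root = ET.fromstring(DOC)
protein_name = root.find("prot:name", NS).text
database_name = root.find("db:name", NS).text
accession = root.find("db:accession", NS).text
```

The namespace tells a program which vocabulary an element belongs to, but it says nothing about what the element means, which is the gap the following slides address.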
- Slide 15
- The Problem with XML Semantics. Two different XML
representations of the same kind of information may not be easily
unifiable. What did XML not encode?
- Slide 16
- Resource Description Framework (RDF). [Diagram of a reified
rdf:Statement: rdf:subject URI(CNTFR- (label truncated),
rdf:predicate URI(modulates), rdf:object URI(eSNCA-mediated
neurotoxicity); rdf:type edges point to URI(membrane-protein),
URI(protein-mediated toxicity), and rdf:Property.]
- The Basic Constructs of RDF. RDF meta-model basic elements,
defined in the rdf namespace
(http://www.w3.org/1999/02/22-rdf-syntax-ns#), except rdfs:Resource,
which belongs to the RDF Schema namespace. Types (or classes):
rdfs:Resource, everything that can be identified (with a URI);
rdf:Property, a specialization of a resource expressing a binary
relation between two resources; rdf:Statement, a triple with
properties rdf:subject, rdf:predicate, rdf:object. Properties:
rdf:type, the subject is an instance of the category or class
defined by the value; rdf:subject, rdf:predicate, rdf:object relate
the elements of a statement tuple to a resource of type
rdf:Statement.
- Slide 18
- Relational Data vis-à-vis RDF. The node-to-edge ratio is
relatively small in many applications. The number of relationships
need not be fixed at design time. The general tendency is to keep
the number of edge labels small. Graph-based operations can be
performed on RDF; the same operations would require an unspecified
number of joins on relational data.
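The point about graph operations versus joins can be made concrete with a small sketch that stores triples as Python tuples and follows one edge label to any depth. The node names and the predicates connects_to and located_in are hypothetical; no RDF library is used.

```python
from collections import deque

# Hypothetical (subject, predicate, object) triples.
TRIPLES = [
    ("NeuronA", "connects_to", "NeuronB"),
    ("NeuronB", "connects_to", "NeuronC"),
    ("NeuronC", "connects_to", "NeuronD"),
    ("NeuronA", "located_in", "Cerebellum"),
]

def reachable(triples, start, predicate):
    """Return every node reachable from start via edges labeled predicate."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

downstream = reachable(TRIPLES, "NeuronA", "connects_to")
```

Each hop of this traversal corresponds to one self-join of a relational edge table, and the number of hops is not known until the data is examined, so a fixed relational query cannot express it without recursion.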
- Slide 19
- RDF Blank Nodes. RDF allows one to create anonymous objects
whose existence is known but whose details are not. Example: there
exists some neuron to which both NeuronX and NeuronY connect.
- Slide 20
- RDF Schema. Declaration of vocabularies: classes, properties, and
relationships defined by a particular community. Class-related:
rdfs:Class, rdfs:subClassOf. Property-related: rdfs:subPropertyOf.
Relationship of properties to classes: rdfs:domain, rdfs:range.
Provides substructure for inferences based on existing triples. NOT
prescriptive, but descriptive; this is different from XML Schema.
The schema language is an expression of the basic RDF model: it uses
the meta-model constructs (resources, statements, properties);
schemas are legal RDF graphs and can be expressed in RDF/XML syntax.
- Slide 21
- Examples of RDF Inferencing. From this we can infer: (:alice
rdf:type :parent), (:betty rdf:type :parent), (:eve rdf:type
:female-person), (:charles rdf:type :person).
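The premises behind these inferences did not survive in the transcript, so the sketch below invents a set of them to show how rdfs:domain, rdfs:range, and rdfs:subClassOf can produce conclusions of this shape. The property :has-child, the class :mother, and all six input triples are hypothetical; only the form of the conclusions follows the slide. Only three RDFS rules are implemented, not a full RDFS reasoner.

```python
# Hypothetical premises; none of these triples appear in the transcript.
PREMISES = {
    (":has-child", "rdfs:domain", ":parent"),
    (":has-child", "rdfs:range", ":person"),
    (":mother", "rdfs:subClassOf", ":female-person"),
    (":alice", ":has-child", ":betty"),
    (":betty", ":has-child", ":charles"),
    (":eve", "rdf:type", ":mother"),
}

def rdfs_closure(triples):
    """Apply domain, range, and subclass rules until nothing new appears.

    Returns only the newly inferred triples.
    """
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))   # subject gets the domain class
                elif p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))   # object gets the range class
                elif (p == "rdf:type" and p2 == "rdfs:subClassOf"
                      and s2 == o):
                    new.add((s, "rdf:type", o2))   # instance of a superclass
        if new <= known:
            return known - set(triples)
        known |= new

inferred = rdfs_closure(PREMISES)
```

With these premises the result contains the four statements listed on the slide plus (:betty rdf:type :person), since :betty is also the object of a :has-child triple.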
- Slide 22
- RDF as a Logical Data Model. RDF does not distinguish between
different relationships: instance-to-type, instance-to-instance,
type-to-instance. No transitivity inference is possible over, say,
rdf:type. RDF (as well as XML) has lost the notion of the abstract
data type, like spatial object or time; operations on object types
do not mix well with RDF. Constraints like uniqueness or 1-to-1
relationships cannot be expressed. SPARQL, the query language for
RDF, is an edge-only language: it cannot express the // construct of
XML. Blank nodes are treated as variables and are not output in the
results. Parts of the language are undecidable! A problem is
undecidable if it can be proved that there can be no algorithm to
solve it.
- Slide 23
- OWL. Components of an OWL ontology: vocabulary (concepts);
structure (attributes of concepts and hierarchy);
concept-to-concept, concept-to-data, and property-to-property
relationships; logical characteristics of relationships: domain and
range restrictions, properties of relations (symmetry,
transitivity), cardinality of relations. Open world vs. closed
world assumptions: in contrast to most reasoning systems, which
assume that anything absent from the knowledge base is not true.
Need to maintain monotonicity with tolerance for contradictions.
OWL classes: owl:Class is an rdfs:subClassOf rdfs:Class, the class
of all classes.
- Slide 24
- Basic OWL Constructs. Creating OWL classes. disjointWith: neurons
are not glial cells. sameClassAs (equivalence): class "GABAergic
neuron" is exactly the same class as neurons which have GABA as
neurotransmitter. Enumerations (on instances): class "cerebellar
lobules" is Lobule I, Lobule II, ... Boolean set semantics (on
classes). Union (logical disjunction): class "nerve cell" is the
union of neuron and glial cell. Intersection (logical conjunction of
a class with properties): class "hippocampal neurons" is the
conjunction of things of class Neuron that have the property
has-soma-located-in (hippocampus union any class that is part-of
hippocampus). complementOf (logical negation): class "benign tumor"
is the complement of class "malignant tumor".
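The Boolean set semantics of these class constructors can be mimicked with Python sets over a small, closed collection of individuals. This is only an extensional illustration: the individual names are hypothetical, the complement is taken relative to a hypothetical class of tumors, and a real OWL reasoner works under the open world assumption rather than on a fixed list of members.

```python
# Hypothetical individuals standing in for class extensions.
neuron = {"n1", "n2"}
glial_cell = {"g1"}
tumor = {"t1", "t2"}
malignant_tumor = {"t1"}
soma_located_in_hippocampus = {"n1"}  # things having that property value

# disjointWith: the two classes share no member.
neurons_disjoint_from_glia = neuron.isdisjoint(glial_cell)

# Union (logical disjunction).
nerve_cell = neuron | glial_cell

# Intersection of a class with a property restriction (logical conjunction).
hippocampal_neuron = neuron & soma_located_in_hippocampus

# complementOf (logical negation), taken here within the class of tumors.
benign_tumor = tumor - malignant_tumor
```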
- Slide 25
- Properties of OWL Properties. Transitive Property: P(x,y) and
P(y,z) => P(x,z); example: subclassOf. SymmetricProperty: P(x,y)
iff P(y,x); example: is_functionally_related_to. Functional
Property: P(x,y) and P(x,z) => y=z; example: soma_located_in.
inverseOf: P1(x,y) iff P2(y,x); example: regulates,
is_regulated_by. InverseFunctional Property: P(y,x) and P(z,x) =>
y=z; example: is_isoform_of. Cardinality: only 0 or 1 in OWL Lite;
arbitrary non-negative integers in OWL DL and OWL Full.
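The property characteristics on this slide can be read as rules over sets of (x, y) pairs. The sketch below applies each rule directly; the class, gene, and region names in the usage lines are hypothetical, and the functions are illustrative helpers rather than the API of any OWL reasoner.

```python
def transitive_closure(pairs):
    """P(x,y) and P(y,z) => P(x,z), applied until no new pair appears."""
    closure = set(pairs)
    while True:
        extra = {
            (x, z)
            for (x, y) in closure
            for (y2, z) in closure
            if y == y2
        }
        if extra <= closure:
            return closure
        closure |= extra

def symmetric_closure(pairs):
    """P(x,y) iff P(y,x)."""
    return set(pairs) | {(y, x) for (x, y) in pairs}

def inverse(pairs):
    """P1(x,y) iff P2(y,x): derive P2 from P1."""
    return {(y, x) for (x, y) in pairs}

def functional_equalities(pairs):
    """P(x,y) and P(x,z) => y = z: return value pairs inferred to be equal."""
    by_subject = {}
    for x, y in pairs:
        by_subject.setdefault(x, set()).add(y)
    return {
        (a, b)
        for values in by_subject.values()
        for a in values
        for b in values
        if a < b  # report each unordered pair once
    }

subclass_of = transitive_closure(
    {("PurkinjeCell", "Neuron"), ("Neuron", "Cell")}
)
is_regulated_by = inverse({("geneA", "geneB")})  # from regulates(geneA, geneB)
same_region = functional_equalities(
    {("cell1", "regionA"), ("cell1", "regionB")}
)
```

The functional rule does not reject the second value: without a unique-name assumption, OWL concludes that the two values denote the same thing, which is why the helper returns equalities rather than errors.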
- Slide 26
- Instances in OWL. Instances are distinct from classes. In RDF
there is no distinction between classes and instances. OWL DL
restrictions: type separation. A class cannot also be an individual
or property; a property cannot also be an individual or class. Both
are allowed in RDF.
- Slide 27
- A Rough Comparison. RDF and OWL do not represent n-ary roles
cleanly.
- Slide 28
- Querying OWL. There are several languages in the making. SPARQL
engines (e.g., Virtuoso) are often used. Pellet is used for
reasoning tasks: subsumption, consistency. New, more advanced
languages like nSPARQL are coming up. vSPARQL is being developed to
enable views on SPARQL, which will lead to nested SPARQL queries.
Our goal: develop a query processor for these advanced languages,
as part of OntoQuest, our ontological information management
system.
- Slide 29
- Where does NIF stand in this? Not every model is directly
inter-convertible with every other model. NIF is designed to: (1)
work with multiple models; (2) ensure that the modeling capability
and query capability of every model is preserved in its native form:
queries in our system get translated to queries in the native forms
of the databases we federate; (3) express the local semantics of any
data appropriately by augmenting the semantic model of the data,
connecting the data to NIF's ontology, and extending the NIF
ontology in the process; (4) develop a mechanism to create a common
integrated model over these models. This model is an ontological
graph that incorporates object and temporal semantics.
- Slide 30
- Example of an Ontological Extension: representing time and
events. Phenotypes, physiology, ... Instants, intervals, and
periods. Temporal granularity of observation. Events:
multi-temporal observations based on conditions on properties;
modeling states, objects in state, and state transitions; one-only,
repeatable, and time-deictic events; subevents. History of objects,
events, roles: subtype migration, temporal roles and role migration;
progression of disease, symptom or recovery states; repeatability.
Considering TOWL and Temporal ORM.
- Slide 31
- Questions?