Amarnath Gupta University of California San Diego NIF as a Multi-Model Semantic Information System...

  • Slide 1
  • Amarnath Gupta University of California San Diego NIF as a Multi-Model Semantic Information System Part 1: Relational, XML, RDF and OWL models
  • Slide 2
  • Preamble 1: As we design and extend the NIF system, we recognize that users will give us data in any form that is convenient for them:
      • Standard data may be stored in a flat file
      • Web service output can be in XML
      • Semantic Web enthusiasts may represent data using proper RDF
    However, regardless of the form in which data may be represented:
      • The NIF system must treat them (query, index, relate, ...) in a uniform manner
      • The NIF system must utilize the underlying systems to query/access/index data
  • Slide 3
  • Preamble 2: In this presentation we intend to
      • Explain our perspective on these different data models
      • Provide a background on the data models we consider
      • Offer a sense of the semantic character of these data models
      • Present our design philosophy on where to keep them separate and where to transform them into a common model
  • Slide 4
  • What is a Data Model?
      • A conceptual data model: a formal representation of the user's or application's mental model of the data elements and their relationships that should be put in a database, manipulated, queried and operated upon
      • A logical data model: a formal description of the data model in a logical structure that a computer can use to perform the queries and other operations. In many cases, the same conceptual model can be represented by different logical models
      • A physical data model: an implementable version of the data model in terms of data structures, access structures (e.g., indices) and the set of low-level operations that a system needs to perform on the data
  • Slide 5
  • A Conceptual Model: ORM model (Terry Halpin). Diagram labels: Object; Relationship/Role; Value Constraint; Uniqueness Constraint; Inter-relationship Constraint; Value Type; n-ary Role
  • Slide 6
  • A Logical Data Model: a formal specification of
      • The structure of the data
          • The structure tells us how the data is organized, e.g. (123, Purkinje Cell, Cerebellum), (828, Hippocampus, Hilar Cell)
          • Often the structure of the data, together with some constraints, represents some semantics
          • If the data are not structured (like free text), the techniques for handling them will be different
      • Operations on this structure
          • Every data model is based on some mathematical principles that define what you can do with the data
      • The nature of data values
          • Data domains and data types
          • Operations on data values
  • Slide 7
  • The Relational Data Model
      Table: Neurons
      NeuronID | NeuronName    | BrainRegion   | NeuroTransmitter | Current
      1        | Purkinje Cell | Cerebellum    | Glutamate        | Transient Na+
      2        | Hilar Cell    | Dentate Gyrus | GABA             | Ca2+
      Diagram labels: relation name; attribute name; attribute value (cannot be complex); tuple
      • Attribute domain: all possible values the attribute can take
      • Candidate key: a set of columns that uniquely determines a row
      • The relational model is a set (bag) of tuples model
      • Metadata is stored in a separate catalog, which is also relational
      • First-order constraints
      • All queries are about some combination of
          • Selecting rows, columns
          • Combining tables by union, intersection, join
          • Computing data or aggregate functions
          • Grouping and sorting
      • A query can return only values
      • Relation names and attribute names cannot serve as variables in a query
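The query operations listed on this slide can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table contents are copied from the slide; the choice of SQLite and the variable names are assumptions made only for illustration, not part of the NIF system.

```python
import sqlite3

# In-memory database holding the Neurons table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Neurons (
           NeuronID INTEGER PRIMARY KEY,  -- candidate key chosen as primary key
           NeuronName TEXT,
           BrainRegion TEXT,
           NeuroTransmitter TEXT,
           Current TEXT)"""
)
conn.executemany(
    "INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
        (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
    ],
)

# Selection (rows) combined with projection (columns).
gaba_names = [
    row[0]
    for row in conn.execute(
        "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = ?", ("GABA",)
    )
]

# Grouping and aggregation.
counts = dict(
    conn.execute("SELECT BrainRegion, COUNT(*) FROM Neurons GROUP BY BrainRegion")
)

# The catalog is itself relational: SQLite exposes it as the sqlite_master table.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

Note that the catalog query and the data queries are separate statements: the table name `Neurons` appears as a value in the catalog but cannot be used as a variable inside a data query, which is the limitation the slide points out.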
  • Slide 8
  • Object Relational Model: eases some of the problems of the classical relational model
      • Data values can be of arbitrary data types
          • Sets (e.g., multiple currents for a neuron)
          • Tuples (e.g., references ordered by year)
          • Time-series (e.g., raw EEG data)
          • Spatial data (e.g., atlases in CCDB)
      • Each data type can have its own operations
          • Find all data points within a neighborhood of a spatial location
      • Queries still return only values
      • Catalog queries and data queries cannot be mixed in a single query
      • All industrial-strength DBMSs use some version of this model
      • One needs to be a skilled DB programmer to develop custom applications on this model
  • Slide 9
  • XML (Two Perspectives): Document Community
      • data = linear text documents
      • mark up (annotate) text pieces to describe context, structure, semantics of the marked text
      • Example text: Oxidative stress has been proposed to be involved in the pathogenesis of Parkinson's disease (PD). A plausible source of oxidative stress in nigral dopaminergic neurons is the redox reactions that specifically involve dopamine and produce various toxic molecules.
  • Slide 10
  • XML (Two Perspectives): Database Community
      • XML as a (most prominent) example of the semi-structured data model => captures the whole spectrum from highly structured, regular data to unstructured data (relational, object-oriented, marked up text, ...)
      • Example element values (from the CARMEN group): A new annotation file; true; 0.000001; Text message for the event start.; Text message for the event end.
  • Slide 11
  • XML as a Logical Data Model: XML is a tree-structured document
      • Nodes
          • Element nodes: children can be ordered; recursive elements (parts under parts)
          • Attribute nodes: mandatory or optional
      • Edges
          • Sub-element edges
          • Attribute edges
          • IDRef edges
      • Constraints
          • References
          • Value restrictions, OneOf
          • Cardinality
      • Trees are more flexible than tables: any number of nodes can be added anywhere without breaking the model
  • Slide 12
  • XML as a Logical Data Model
      • XML has its own schema language, which lets you specify a complex type system
      • A database is a collection of XML trees
      • Storing XML: mostly relational, with some very clever indexing to encode the hierarchy, tree paths, and order
      • Querying XML
          • Elements, attribute names, values and structure can be queried
          • Multiple trees can be joined by value
      • Example (XPath) on http://mousespinal.brain-map.org/imageseries/detail/100002661.xml: "Find images of the spinal column"
          • //image[//structurelabel/text()="SPINAL COLUMN"]/ish_image_path is a tree query
      • XQuery and full-text XQuery
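A tree query of this kind can be sketched with Python's standard xml.etree.ElementTree module. The element names `image`, `structurelabel` and `ish_image_path` come from the slide's XPath expression; the surrounding document layout and the file paths below are hypothetical, since the actual file is not reproduced in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only the three element names used in the slide's
# XPath expression are taken from the source; everything else is assumed.
doc = """
<imageseries>
  <images>
    <image>
      <structurelabel>SPINAL COLUMN</structurelabel>
      <ish_image_path>/hypothetical/path/a.jpg</ish_image_path>
    </image>
    <image>
      <structurelabel>DORSAL HORN</structurelabel>
      <ish_image_path>/hypothetical/path/b.jpg</ish_image_path>
    </image>
  </images>
</imageseries>
"""

root = ET.fromstring(doc)

# ElementTree supports a subset of XPath. The predicate keeps only those
# <image> elements that have a <structurelabel> child with the given text.
paths = [
    element.text
    for element in root.findall(
        ".//image[structurelabel='SPINAL COLUMN']/ish_image_path"
    )
]
```

The predicate here is relative to each `image` element. In full XPath, a predicate written as `//structurelabel` is evaluated from the document root, so it would accept every `image` as soon as any label anywhere in the document matches; the relative form is usually what is intended.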
  • Slide 13
  • Misusing and Abusing XML
      • Using XML if your data is relational: it will result in flat trees that will suffer from complex querying
      • Encoding orders and hierarchies that need special parsing, e.g. "Apo-Levocarb (carbidopa + levodopa)", "Apo-Levocarb CR Controlled-Release Tablets (carbidopa + levodopa)"
      • Using implicit multi-valuedness, e.g. "0.2 -0.3 0.1" as a single element value
  • Slide 14
  • Expressing Semantics in XML: adorning elements with namespaces
      • A namespace is a unique URI (Uniform Resource Identifier)
      • To disambiguate between two elements that happen to share the same name
      • To group elements relating to a common idea together
      • Example element values: Metalloelastase; NP_304845; RefSeq
  • Slide 15
  • The Problem with XML Semantics: two different XML representations of the same kind of information may not be easily unifiable. What did XML not encode?
  • Slide 16
  • Resource Description Framework (RDF). Diagram labels: rdf:Statement; URI(CNTFR-; URI(modulates); URI(eSNCA-mediated neurotoxicity); URI(membrane-protein); URI(protein-mediated toxicity); rdf:Property; edges labeled rdf:subject, rdf:predicate, rdf:object, rdf:type
  • Slide 17
  • The Basic Constructs of RDF: RDF meta-model basic elements
      • Defined in the rdf namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# (rdfs:Resource is defined in the companion RDF Schema namespace)
      • Types (or classes)
          • rdfs:Resource: everything that can be identified (with a URI)
          • rdf:Property: specialization of a resource expressing a binary relation between two resources
          • rdf:Statement: a triple with properties rdf:subject, rdf:predicate, rdf:object
      • Properties
          • rdf:type: the subject is an instance of the category or class given by the value
          • rdf:subject, rdf:predicate, rdf:object: relate the elements of a statement tuple to a resource of type rdf:Statement
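The relationship between a plain triple and a resource of type rdf:Statement can be illustrated with a small sketch in plain Python. The `Triple` type, the `reify` helper and the `http://example.org/` namespace are hypothetical names introduced here; only the rdf namespace URI and the four rdf terms come from the slide.

```python
from typing import NamedTuple, Set

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/"  # hypothetical namespace for the example resources


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def reify(statement_id: str, triple: Triple) -> Set[Triple]:
    """Describe `triple` as a resource of type rdf:Statement.

    The result is four ordinary triples that use rdf:type, rdf:subject,
    rdf:predicate and rdf:object to point at the parts of the original.
    """
    return {
        Triple(statement_id, RDF + "type", RDF + "Statement"),
        Triple(statement_id, RDF + "subject", triple.subject),
        Triple(statement_id, RDF + "predicate", triple.predicate),
        Triple(statement_id, RDF + "object", triple.obj),
    }


# Hypothetical statement: NeuronX modulates NeuronY.
statement = Triple(EX + "NeuronX", EX + "modulates", EX + "NeuronY")
reified = reify(EX + "stmt1", statement)
```

Because the reified form consists of ordinary triples, further triples can be attached to `EX + "stmt1"`, for example to record who asserted the statement.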
  • Slide 18
  • Relational Data vis-à-vis RDF
      • Node to edge ratio is relatively small in many applications
      • Number of relationships need not be fixed at design time
      • The general tendency is to keep the number of edge labels small
      • Graph-based operations can be performed on RDF; the same operations require an unspecified number of joins in relational data
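The last point can be made concrete with a short sketch: following one edge label to any depth is a single graph traversal, whereas a relational formulation needs one self-join per step, and the number of steps is not known in advance. The function name and the sample triples are hypothetical.

```python
from collections import deque
from typing import Iterable, Set, Tuple


def reachable(
    triples: Iterable[Tuple[str, str, str]], start: str, predicate: str
) -> Set[str]:
    """Return every node reachable from `start` by following `predicate` edges."""
    adjacency = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            adjacency.setdefault(subject, []).append(obj)

    seen: Set[str] = set()
    queue = deque([start])
    while queue:  # breadth-first traversal; depth is not fixed in advance
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical anatomy triples.
triples = [
    ("CA1", "part_of", "Hippocampus"),
    ("Hippocampus", "part_of", "Limbic System"),
    ("Limbic System", "part_of", "Brain"),
    ("CA1", "adjacent_to", "CA2"),
]
containers = reachable(triples, "CA1", "part_of")
```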
  • Slide 19
  • RDF Blank Nodes: RDF allows one to create anonymous objects whose existence is known but whose details are not. Example: there exists some neuron to which both NeuronX and NeuronY connect
  • Slide 20
  • RDF Schema
      • Declaration of vocabularies: classes, properties, and relationships defined by a particular community
          • rdfs:Class, rdfs:subClassOf
          • Property-related: rdfs:subPropertyOf
          • Relationship of properties to classes: rdfs:domain, rdfs:range
      • Provides substructure for inferences based on existing triples
      • NOT prescriptive, but descriptive; this is different from XML Schema
      • The schema language is an expression of the basic RDF model
          • Uses meta-model constructs: resources, statements, properties
          • Schemas are legal RDF graphs and can be expressed in RDF/XML syntax
  • Slide 21
  • Examples of RDF Inferencing From this we can infer (:alice rdf:type parent) (:betty rdf:type parent) (:eve rdf:type female-person) (:charles rdf:type :person)
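The premises behind these conclusions were shown in a figure that is not part of this transcript. The sketch below therefore uses assumed input triples, chosen only so that the domain, range and subclass rules of RDF Schema yield the four conclusions listed on the slide. The property `:has-child`, the class `:mother` and the function name are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply the domain, range and subclass typing rules until nothing new appears."""
    inferred = set(triples)
    while True:
        domains = {(s, o) for s, p, o in inferred if p == DOMAIN}
        ranges = {(s, o) for s, p, o in inferred if p == RANGE}
        subclasses = {(s, o) for s, p, o in inferred if p == SUBCLASS_OF}
        new = set()
        for s, p, o in inferred:
            for prop, cls in domains:
                if p == prop:  # the subject of prop is an instance of its domain
                    new.add((s, RDF_TYPE, cls))
            for prop, cls in ranges:
                if p == prop:  # the object of prop is an instance of its range
                    new.add((o, RDF_TYPE, cls))
            if p == RDF_TYPE:
                for sub, sup in subclasses:
                    if o == sub:  # instances of a subclass belong to the superclass
                        new.add((s, RDF_TYPE, sup))
        if new <= inferred:
            return inferred
        inferred |= new


# Assumed premises (not taken from the slide).
premises = {
    (":has-child", DOMAIN, ":parent"),
    (":has-child", RANGE, ":person"),
    (":mother", SUBCLASS_OF, ":female-person"),
    (":alice", ":has-child", ":betty"),
    (":betty", ":has-child", ":charles"),
    (":eve", RDF_TYPE, ":mother"),
}
closure = rdfs_closure(premises)
```

This covers only three of the RDFS entailment rules; a full RDFS reasoner also handles, for example, rdfs:subPropertyOf and the transitivity of rdfs:subClassOf.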
  • Slide 22
  • RDF as a Logical Data Model
      • RDF does not distinguish between different relationships
          • Instance-to-type
          • Instance-to-instance
          • Type-to-instance
      • No transitivity inference is possible over, say, rdf:type
      • RDF (as well as XML) has lost the notion of the abstract data type, like spatial object or time
          • Operations on object types do not mix well with RDF
      • Constraints like uniqueness and 1-to-1 relationships cannot be expressed
      • SPARQL, the query language for RDF, is
          • An edge-only language: it cannot express the // construct of XML
          • Blank nodes are treated as variables that are not output in the results
          • Parts of the language are undecidable! A problem is undecidable if it can be proved that there can be no algorithm to solve it
  • Slide 23
  • OWL: Components of an OWL Ontology
      • Vocabulary (concepts)
      • Structure (attributes of concepts and hierarchy)
      • Concept-to-concept, concept-to-data, property-to-property relationships
      • Logical characteristics of relationships
          • Domain and range restrictions
          • Properties of relations (symmetry, transitivity)
          • Cardinality of relations
      • Open world vs. closed world assumptions
          • In contrast to most reasoning systems, which assume that anything absent from the knowledge base is not true
          • Need to maintain monotonicity with tolerance for contradictions
      • OWL classes: owl:Class is rdfs:subClassOf rdfs:Class, the class of all classes
  • Slide 24
  • Basic OWL Constructs: Creating OWL Classes
      • disjointWith: Neurons are not glial cells
      • equivalentClass (equivalence): Class "Gabaergic neuron" is exactly the same class as neurons which have GABA as neurotransmitter
      • Enumerations (on instances): Class "Cerebellar lobules" are Lobule I, Lobule II, ...
      • Boolean set semantics (on classes)
          • Union (logical disjunction): Class "nerve cell" is the union of neuron, glial cell
          • Intersection (logical conjunction of class with properties): Class "hippocampal neurons" is the conjunction of things that are of class Neuron and have property (has-soma-located-in) (hippocampus union any class that is (part-of) hippocampus)
          • complementOf (logical negation): Class "benign tumor" is the complement of class "malignant tumor"
  • Slide 25
  • Properties of OWL Properties
      • TransitiveProperty: P(x,y) and P(y,z) => P(x,z). Example: subclassOf
      • SymmetricProperty: P(x,y) iff P(y,x). Example: is_functionally_related_to
      • FunctionalProperty: P(x,y) and P(x,z) => y=z. Example: soma_located_in
      • inverseOf: P1(x,y) iff P2(y,x). Example: regulates <=> is_regulated_by
      • InverseFunctionalProperty: P(y,x) and P(z,x) => y=z. Example: is_isoform_of
      • Cardinality: only 0 or 1 in OWL Lite; arbitrary non-negative integers in OWL DL and OWL Full
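The logical characteristics above can be read as rules over sets of (x, y) pairs. The following sketch, in plain Python with hypothetical function names and sample data, shows what each rule adds. It is an illustration of the rules, not how an OWL reasoner such as Pellet is implemented.

```python
def transitive_closure(pairs):
    """P(x,y) and P(y,z) => P(x,z), applied until no new pair appears."""
    closure = set(pairs)
    while True:
        new = {
            (x, z) for (x, y) in closure for (y2, z) in closure if y == y2
        } - closure
        if not new:
            return closure
        closure |= new


def symmetric_closure(pairs):
    """P(x,y) iff P(y,x)."""
    return set(pairs) | {(y, x) for (x, y) in pairs}


def inverse(pairs):
    """P1(x,y) iff P2(y,x): derive the pairs of the inverse property."""
    return {(y, x) for (x, y) in pairs}


def functional_merge_candidates(pairs):
    """P(x,y) and P(x,z) => y=z.

    Under OWL semantics two values of a functional property are inferred to
    denote the same individual; this helper only collects such value sets.
    """
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    return {x: ys for x, ys in values.items() if len(ys) > 1}


# Hypothetical sample data.
subclass_of = {("Purkinje Cell", "Neuron"), ("Neuron", "Cell")}
regulates = {("GeneA", "GeneB")}
soma_located_in = {("n1", "Cerebellum"), ("n1", "Cb"), ("n2", "Hippocampus")}

all_subclass = transitive_closure(subclass_of)
is_regulated_by = inverse(regulates)
same_individuals = functional_merge_candidates(soma_located_in)
```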
  • Slide 26
  • Instances in OWL
      • Instances are distinct from classes; in RDF there is no distinction between classes and instances
      • OWL DL restrictions: type separation
          • A class cannot also be an individual or a property
          • A property cannot also be an individual or a class
          • Both are allowed in RDF
  • Slide 27
  • A Rough Comparison ~ RDF and OWL do not represent n-ary roles cleanly
  • Slide 28
  • Querying OWL
      • There are several languages in the making
      • SPARQL engines (e.g., Virtuoso) are often used
      • Pellet is used for reasoning tasks
          • Subsumption
          • Consistency
      • New, more advanced languages like nSPARQL are coming up
      • vSPARQL is being developed to enable views on SPARQL, which will lead to nested SPARQL queries
      • Our goal
          • Develop a query processor for these advanced languages
          • Part of OntoQuest, our ontological information management system
  • Slide 29
  • Where does NIF stand in this?
      • Not every model is directly inter-convertible with every other model
      • NIF is designed to
          • Work with multiple models
          • Ensure that the modeling capability and query capability of every model is preserved in its native form: queries in our system get translated to queries in the native forms of the databases we federate
          • Express the local semantics of any data appropriately by augmenting the semantic model of the data, connecting the data to NIF's ontology, and extending the NIF ontology in the process
          • Develop a mechanism to create a common integrated model over these models: this model is an ontological graph that incorporates object and temporal semantics
  • Slide 30
  • Example of an Ontological Extension: representing time and events
      • Phenotypes, physiology,
      • Instants, intervals, and periods
      • Temporal granularity of observation
      • Events
          • Multi-temporal observations based on conditions on properties
          • Modeling states, objects in state, and state transitions
          • One-only, repeatable, and time deictic events
          • Subevents
      • History of objects, events, roles
          • Subtype migration, temporal roles and role migration
          • Progression of disease, symptom or recovery states
          • Repeatability
      • Considering TOWL and Temporal ORM
  • Slide 31
  • Questions?