Observations and Ontologies Achieving semantic interoperability of
environmental and ecological data
Mark Schildhauer1, Shawn Bowers2, Josh Madin3, Matt Jones1
1NCEAS UC Santa Barbara, 2Gonzaga University, 3Macquarie University
http://sonet.ecoinformatics.org
NCEAS-ACEAS Workshop, Brisbane May 2010
Motivation-- Critical questions
Need to answer increasingly complex and critical questions about the environment:
are the world’s fisheries sustainable?
how will climate change impact food production?
are GMO crops safe to introduce to the environment?
is deforestation accelerating climate change?
why are pollinators declining around the world?
will nanotech wastes alter ecosystems?
what are the causes and effects of ocean acidification on reef corals?
can we predict the spread of an invasive species?
are there tipping points in environmental change?
Motivation– Environmental Synthesis
Answering complex, critical environmental questions requires integrating and analyzing many types of data:
Local to large scale, global coverages
Fine-grain, high-resolution
Physical context: land-use/land-cover, geology, soils, atmosphere, hydrology, oceanography
Biotic context: from genes to ecosystems
Socioecology: traditions & customs, economics, governance
Good news-- more and more data
There is a growing deluge of environmental data to assist in these investigations …
Need for ecoinformatics
But…
locating desired information is already quite difficult…
  Culling through irrelevant information (precision)
  Failing to find all useful information (recall)
using the data you find is problematic…
  Interpretation (units, context, methods)
  Merging, transforming for re-use
  Manual, ad-hoc, arduous
… Why?
Environmental Data-- State of Affairs
Environmental data are:
Stewarded/owned by many groups, individuals
Sparsely documented (metadata, data catalog)
Variably accessible via the Internet
Heterogeneous: broad range of relevant topics
The informatics challenge…
Environmental data are highly heterogeneous…
geospatial data-- point, line, polygon, raster
time series/monitoring data
tables, spreadsheets/csv
grids, matrices
normalized DBMS
• Variable structure
• Variable syntax (R, MATLAB, mySQL, .xls)
• Variable semantics (what is “temp”?)
Data Integration
Combining heterogeneous data is necessary for synthesis
Approaches:
Develop consistent data models within and across entire domains– “standardized schema”
“Describe” your data and its contents so that machines can process and integrate– “semantic mediation”
Impractical if not impossible to standardize schemas for all data sets being collected
Use emerging approaches of the Semantic Web [1]
[1] Berners-Lee, Hendler & Lassila 2001. The Semantic Web. Scientific American. http://www.scientificamerican.com/2001/0501issue/0501berners-lee.html
Semantic Data Integration
Metadata standards are a step in the right direction…
Expose data in standard schema for transfer
Dublin Core
ISO 19115 (geospatial metadata)
Darwin Core (biodiversity specimen metadata)
EML (Ecological Metadata Language)
GeoSciML
All have XML implementations for document exchange
Can map one format to another to resolve minor differences
Importance of semantics
Descriptive metadata is insufficient: “semantics” are expressed in natural language
Inconsistent, imprecise, not standardized
The computer can’t “understand”:
  what is being measured
  how measurements relate to one another
  how semantics map to logical structure
Importance of semantics
Efficient, effective integration and subsequent analysis depends on understanding the semantic contextual relationships of each data measurement, as well as the relationships among measurements in a table structure or other data format.
Usually an expert provides this, or a data catalog
How to capture and expose for machine processing? Semantic Mediation!
Semantic Data Integration
Metadata cannot formally express complex constructs:
  Define Specific Leaf Area
  What type of weight measurement is involved in its calculation?
  How is the SLA measurement in column 1 related to the plot ID measurement in column 2?
Metadata cannot provide native reasoning:
  I measured a specimen with a prehensile tail, extrusible tongue, eats insects, has fused toes
  What is it? Can I know anything more about it?
Semantic Data Integration
Ontologies do not have these limitations…
Can express complex constructs:
  SLA is an abbreviation of, and synonym for, the functional trait called Specific Leaf Area, a measurement taken from a leaf, which is a part of a plant
  SLA is a leaf area measurement divided by a dry weight measurement
Can natively reason:
  The specimen has a prehensile tail, extrusible tongue, eats insects, has fused toes
  infer: specimen is a chameleon
  infer: chameleon is a reptile
  infer: specimen has stereoscopic eyes
  infer: specimen may be able to change color
Formal Ontologies and Reasoners
Use W3C standard: Semantic Web http://www.w3.org/standards/semanticweb/
Expose data syntax, schema and semantics through a standardized language that computers can parse and interpret: OWL, the Web Ontology Language
OWL, RDF, XML Reasoners
What is an ontology?
A formal specification of concepts, and the relationships that may exist between those concepts.
How can ontologies help?
Classification and “reasoning”
Data discovery
Integration/merge
  Concept mapping
  Units conversion
  Spatial & temporal scaling
How can ontologies help?
Classification and “reasoning”
  New “facts” derived from ontology
  Potential emergence
ArealDensity requires knowledge of Area and Abundance
If you have Area and Abundance, you might be able to derive ArealDensity
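The ArealDensity example can be sketched in a few lines of Python. This is only an illustration: the rule table `DERIVATION_RULES` and both function names are hypothetical stand-ins for knowledge that would normally live in an ontology and be evaluated by a reasoner.

```python
# Hypothetical rule table: each derivable concept lists the concepts it requires.
# In a real system this knowledge would be stated in an ontology, not a dict.
DERIVATION_RULES = {
    "ArealDensity": {"Area", "Abundance"},
}


def derivable_concepts(available):
    """Return concepts that might be derived from the concepts already present."""
    available = set(available)
    return {
        concept
        for concept, required in DERIVATION_RULES.items()
        if required <= available and concept not in available
    }


def areal_density(abundance, area):
    """Individuals per unit area, e.g. a count per square meter."""
    if area <= 0:
        raise ValueError("area must be positive")
    return abundance / area


columns = ["Site", "Area", "Abundance"]
print(derivable_concepts(columns))             # {'ArealDensity'}
print(areal_density(abundance=30, area=12.0))  # 2.5
```

The point is the first function: once columns are annotated with concepts, a tool can suggest quantities that are not stored in the data set but follow from it.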
How can ontologies help?
Classification and “reasoning”
Data discovery
Integration/merge
Analytical assistance
  Statistical inference
  Data types
  Data transformations
How can ontologies help?
Use OWL-DL (OWL2 RL) W3C Recommendation
Provides complete and consistent reasoning
Standard, free reasoners available: Pellet, FaCT++
Construct and visualize ontologies using free tools: Protégé, SWOOP, OWLIFIER tool (Josh)
How can ontologies help?
Can “define” Objects with equivalence classes
Specifies necessary and sufficient conditions
Reasoner will classify a described Object
has Fur
locomotes Bipedal
native_to Australia
births UndevelopedYoung
has GoodJumpingAbility
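A rough Python sketch of classification by necessary and sufficient conditions, using the property/value pairs listed above. The class name `Kangaroo` and the helper `classify` are assumptions made for illustration; the slide does not name the class. The subset test is a closed-world simplification, whereas an OWL reasoner such as Pellet works under the open-world assumption.

```python
# Hypothetical defined class. The name "Kangaroo" is an illustrative guess;
# the conditions are the property/value pairs from the slide.
DEFINED_CLASSES = {
    "Kangaroo": {
        ("has", "Fur"),
        ("locomotes", "Bipedal"),
        ("native_to", "Australia"),
        ("births", "UndevelopedYoung"),
        ("has", "GoodJumpingAbility"),
    },
}


def classify(described):
    """Return every defined class whose conditions the description satisfies."""
    described = set(described)
    return sorted(
        name
        for name, conditions in DEFINED_CLASSES.items()
        if conditions <= described
    )


specimen = [
    ("has", "Fur"),
    ("has", "GoodJumpingAbility"),
    ("locomotes", "Bipedal"),
    ("native_to", "Australia"),
    ("births", "UndevelopedYoung"),
    ("has", "LongTail"),  # extra facts do not prevent classification
]
print(classify(specimen))      # ['Kangaroo']
print(classify(specimen[:3]))  # []
```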
What do ontologies consist of?
Objects (terms)
  Arrange in class (subsumption) hierarchies
  Can describe objects in terms of properties and relationships to other objects
Relationships
  Specify relationships between Objects
  Can be reflexive, symmetric, transitive (or not)
Beyond SQL…
OWL DL              Symbol  Example
Restrictions:
  someValuesFrom    ∃       hasPart some Leaf
  allValuesFrom     ∀       isPartof only Plant
  hasValue          ∋       hasCountryOfOrigin value Australia
  minCardinality    ≥       hasStoma min 1
  cardinality       =       hasStem exactly 1
  maxCardinality    ≤       hasPetals max 100
Class constructors:
  intersectionOf    ⊓       WoodyBark and RiparianHabitat
  unionOf           ⊔       Tree or Bush
  complementOf      ¬       not Grass
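To make the restriction types concrete, here is a small Python sketch that checks three of them against plain data. It is deliberately simplified and closed-world (only the facts listed count), which is not how an OWL DL reasoner behaves; the `TYPES` table, the individuals, and the function names are all hypothetical.

```python
# Hypothetical example data: which classes each individual belongs to.
TYPES = {
    "leaf1": {"Leaf"},
    "leaf2": {"Leaf"},
    "stem1": {"Stem"},
}


def some_values_from(fillers, cls):
    """'hasPart some Leaf': at least one filler is an instance of cls."""
    return any(cls in TYPES.get(f, set()) for f in fillers)


def all_values_from(fillers, cls):
    """'isPartof only Plant': every filler is an instance of cls."""
    return all(cls in TYPES.get(f, set()) for f in fillers)


def cardinality(fillers, minimum=0, maximum=None):
    """'min 1', 'max 100', 'exactly 1': count distinct fillers."""
    n = len(set(fillers))
    return n >= minimum and (maximum is None or n <= maximum)


plant = {"hasPart": ["leaf1", "leaf2", "stem1"], "hasStem": ["stem1"]}

print(some_values_from(plant["hasPart"], "Leaf"))             # True
print(all_values_from(plant["hasPart"], "Leaf"))              # False
print(cardinality(plant["hasStem"], minimum=1, maximum=1))    # True
```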
Model and define domain science concepts
Lots of domain ontologies emerging http://www.biofoundry.org
How to use these to advance data integration?
Use of Ontologies
Genomics has largely homogeneous data
Ontologies “unify” vocabularies in model organisms (fruit fly, yeast, mouse, Arabidopsis, etc.)
Many ontologies emerging
Are these useful for semantic mediation and data integration?
Nature of scientific data sets
Scientific data often in tables
Tables consist of rows (records) and columns (attributes)
The association of specific columns together (tuple) in a scientific data set is often a non-normalized (materialized) view, with special meaning/use for researcher
Individual cells contain values that are measurements of characteristic of some thing
Semantic annotation
Data set (slide from J. Madin)
• computer doesn’t know that “Ht.” represents a “height” measurement
• computer doesn’t know whether Plot is nested within Site or vice-versa
• computer can’t determine whether the Temp applies to Site or Plot or Species
Observation defined
Observations in scientific data sets typically co-occur with other observations
Ontologies must assist with describing the inter-relationships among observations within and across datasets
Observational Data Model
Observation defined
An observation represents any measurement of some characteristic (attribute) of some real-world entity or phenomenon.
A measurement consists of a realized value of some characteristic of an entity, expressed in some well-specified units (drawn from a measurement standard)
Observations can provide context for other observations (e.g. observations of spatial or temporal information would often provide context for some other observation)
Measurements are taken using some protocol
Another definition for observation
An observation is an act that results in the estimation of the value of a feature property, and involves application of a specified procedure, such as a sensor, instrument, algorithm or process chain. The procedure may be applied in-situ, remotely, or ex-situ with respect to the sampling location… The key idea is that the observation result is an estimate of the value of some property of the feature of interest, and the other observation properties provide context or metadata to support evaluation, interpretation and use of the result.
(OGC Observations and Measurements, 2010-01-05)
Extensible Observation Ontology (OBOE)
A scientific Observation is
Measurement of the Value
of a Characteristic
of some Entity
in a particular Context
using some Protocol
Provides extension points for loading specialized domain ontologies
To generically describe the structure of scientific observation and measurement as would be found in a scientific data set
OBOE - Extensible Observation Ontology
Entities represent real-world objects or concepts that can be measured.
Measurements assign values and units to characteristics of observed entities.
Observations are made about particular entities.
Every measurement has a characteristic, which defines the property of the entity being measured.
Every measurement has a unit.
Observations can provide context for other observations.
Entities, through observations, can be associated with one or more measured characteristics.
A value is typically a cell in a data set.
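The statements above map naturally onto a small data structure. The following Python sketch is one possible reading, not the OBOE ontology itself: the class and field names follow the slide vocabulary, while the example values, units, and the nesting of Plot within Site are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """A value of one characteristic, in a unit, taken with some protocol."""
    characteristic: str
    value: object
    unit: str
    protocol: Optional[str] = None


@dataclass
class Observation:
    """An observation of one entity, with optional context observations."""
    entity: str
    measurements: List[Measurement] = field(default_factory=list)
    context: List["Observation"] = field(default_factory=list)

    def context_entities(self) -> List[str]:
        """Entities of all observations reachable through context links."""
        found = []
        for ctx in self.context:
            found.append(ctx.entity)
            found.extend(ctx.context_entities())
        return found


# Hypothetical example values; each Measurement corresponds to one table cell.
site = Observation("Site", [Measurement("Temperature", 21.5, "Celsius")])
plot = Observation("Plot", [Measurement("Area", 4.0, "SquareMeter")], context=[site])
plant = Observation(
    "Plant",
    [Measurement("Height", 0.35, "Meter", protocol="tape measure")],
    context=[plot],
)
print(plant.context_entities())  # ['Plot', 'Site']
```

Keeping context as explicit links between observations is what lets a tool tell that the temperature applies to the site, not to the plant.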
Extension points
Linking data values to concepts
Extensible Observation Ontology (OBOE)
OBOE provides a high-level abstraction of scientific observations and measurements
Enables data (or metadata) structures to be linked to domain-specific ontology concepts
Can inter-relate values in a tuple
Provides clarification of semantics of data set as a whole, not just “independent” values
OBOE - Units
Standard and customized units and their relationships to one another can easily be loaded into OBOE
OBOE - Semantic units
Measurements can be of one or more characteristics of one or more entities (unit components)
Plant measured in StudyArea
StudyArea is on the Plant
OBOE - Context
Context provides essential semantic detail by linking Observations
OBOE - Context
Experimental design
Spatial & temporal scaling
“Smart” data merge
“Sensible” analysis
Data Integration with OBOE
Observations can be aligned for data integration ...
[Figure: aligned Observation of a Tree with a Measurement of Diameter in Meters; has-precision 0.1, has-value 1.3 and 3.2]
Apply conversions based on alignments, e.g.
- use common Entity and Characteristic concepts
- apply Unit conversions to values
- select lowest precision and apply
OBOE: Aligning Observations
Observations can be aligned for data integration ...
[Figure: two similar observations of trees]
Picea rubens: Observation with a Measurement of Diameter in Meters; has-precision 0.01, has-value 1.25
Abies balsa.: Observation with a Measurement of DBH in Centimeters; has-precision 10, has-value 320
OBOE: Aligning Observations
Observations can be aligned for data integration ...
[Figure: the same two observations with alignment links added]
Picea rubens: Measurement of Diameter in Meters; has-precision 0.01, has-value 1.25
Abies balsa.: Measurement of DBH in Centimeters; has-precision 10, has-value 320
Alignment links shown: isa links to Tree, has-dimension links from both units to Length, and a further isa link
Align entities, characteristics, and standards
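The alignment of the two tree observations can be sketched in Python as follows. The three lookup tables are hypothetical stand-ins for alignments an ontology would supply (entity isa Tree, DBH treated as a Diameter, unit conversion to Meters), and the function names are illustrative. `Decimal` is used so that rounding 1.25 to one decimal place gives 1.3 instead of depending on binary floating-point behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical lookup tables standing in for ontology alignments.
UNIT_TO_METERS = {"Meters": Decimal("1"), "Centimeters": Decimal("0.01")}
CHARACTERISTIC_ALIGNMENT = {"Diameter": "Diameter", "DBH": "Diameter"}
ENTITY_ALIGNMENT = {"Picea rubens": "Tree", "Abies balsa.": "Tree"}


def to_common(entity, characteristic, value, precision, unit):
    """Map one measurement onto shared concepts and convert it to Meters."""
    factor = UNIT_TO_METERS[unit]
    return {
        "entity": ENTITY_ALIGNMENT[entity],
        "characteristic": CHARACTERISTIC_ALIGNMENT[characteristic],
        "value": Decimal(value) * factor,
        "precision": Decimal(precision) * factor,
        "unit": "Meters",
    }


def align(observations):
    """Convert all measurements, then round each to the coarsest precision."""
    converted = [to_common(*obs) for obs in observations]
    coarsest = max(c["precision"] for c in converted)
    for c in converted:
        steps = (c["value"] / coarsest).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
        c["value"] = steps * coarsest
        c["precision"] = coarsest
    return converted


aligned = align([
    ("Picea rubens", "Diameter", "1.25", "0.01", "Meters"),
    ("Abies balsa.", "DBH", "320", "10", "Centimeters"),
])
for row in aligned:
    print(row["entity"], row["characteristic"], float(row["value"]), row["unit"])
```

With these inputs both rows become Tree / Diameter in Meters at a precision of 0.1, with values 1.3 and 3.2.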
Observation Based Structured Query
• Both datasets contain “tree lengths”
• Annotation search for “tree length” would return both datasets
• Structured search allows the search to be limited by the observed entity (e.g. a tree or a tree branch)
• Increase precision and recall
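The difference between the two kinds of search can be illustrated with a toy Python example. The annotation records, dataset names, and both function names are hypothetical; a real implementation would query stored annotations and ontologies instead of an in-memory list.

```python
# Hypothetical annotation records: (dataset, observed entity, characteristic).
ANNOTATIONS = [
    ("dataset_a", "Tree", "Length"),
    ("dataset_b", "TreeBranch", "Length"),
]


def keyword_search(terms):
    """Return datasets whose annotation text contains every search term."""
    terms = [t.lower() for t in terms]
    hits = []
    for dataset, entity, characteristic in ANNOTATIONS:
        text = f"{entity} {characteristic}".lower()
        if all(t in text for t in terms):
            hits.append(dataset)
    return hits


def structured_search(entity, characteristic):
    """Return datasets where the characteristic was measured on that entity."""
    return [
        dataset
        for dataset, ent, char in ANNOTATIONS
        if ent == entity and char == characteristic
    ]


print(keyword_search(["tree", "length"]))   # ['dataset_a', 'dataset_b']
print(structured_search("Tree", "Length"))  # ['dataset_a']
```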
Example: “Sensible” data summarization
Leveraging annotations
Consistency checking
NOT sensible to summarize variables by “downstream” factors; e.g., Precipitation in the StudySite by TaxonomicName
IS sensible to summarize variables by “upstream” factors; e.g., Plant Height by StudySite or by Precipitation
IS sensible to summarize variables by factors in the same Observation; e.g., Plant Height by TaxonomicName or Precipitation by StudySite
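These three rules amount to a check on context links. Below is a minimal Python sketch, assuming a plant observation whose context is a site observation; the two lookup tables and the names `upstream` and `is_sensible` are hypothetical and only mirror the examples on the slide.

```python
# Hypothetical context links: each observation names the observation that
# provides its context (its "upstream" observation), or None.
CONTEXT_OF = {"PlantObs": "SiteObs", "SiteObs": None}

# Which observation each variable belongs to (illustrative assignment).
OBSERVATION_OF = {
    "PlantHeight": "PlantObs",
    "TaxonomicName": "PlantObs",
    "StudySite": "SiteObs",
    "Precipitation": "SiteObs",
}


def upstream(observation):
    """The observation itself plus every observation in its context chain."""
    chain = []
    while observation is not None:
        chain.append(observation)
        observation = CONTEXT_OF[observation]
    return chain


def is_sensible(variable, factor):
    """A summary is sensible if the factor is in the same or an upstream observation."""
    return OBSERVATION_OF[factor] in upstream(OBSERVATION_OF[variable])


print(is_sensible("PlantHeight", "StudySite"))       # True
print(is_sensible("PlantHeight", "TaxonomicName"))   # True
print(is_sensible("Precipitation", "TaxonomicName")) # False
```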
Our Semantic Approach
Method for linking elements of data objects (e.g., columns in a table) to consistent and potentially rich sets of concepts
Semantic Annotations link EML attributes to concepts defined in a Formal Ontology
Store and retrieve annotations and ontologies in Metacat
Semantic Annotation
Links data structures, via metadata, to ontology terms via OBOE
Actively working on materializing data result sets from these ontology-based queries
Investigating expressiveness of annotation language
Annotating to other data stores
KNB metadata catalog
Stores EML (XML) and raw data objects
Extend to store Ontologies, domain and OBOE (OWL-DLs serialized in XML)
Extend to store Annotations (XML)
Jena to facilitate querying ontologies
Pellet to reason (consistency of ontologies; class subsumption)
Need for data interoperability
MANY different “semantic” efforts underway to unify data within earth/biodiversity/environmental disciplines, converging on use of OBSERVATIONAL data construct
SPECIALIZED needs and concerns of different domains may drive semantic technology solutions to be diverse and incompatible
OPPORTUNITY exists for communicating and coordinating among different domains to achieve greater interoperability of emerging semantic technology solutions
BENEFIT is providing cross-disciplinary scientists with more seamless and powerful access to a broad range of relevant data and information
USA NSF’s OCI INTEROP
This NSF crosscutting program supports community efforts to provide for broad interoperability through the development of mechanisms such as robust data and metadata conventions, ontologies, and taxonomies.
Support is provided… for consensus-building activities:
community workshops, web resources such as community interaction sites, and task groups.
… and for providing the expertise necessary to turn the consensus into technical standards with associated implementation tools and resources:
information sciences, software development, and ontology and taxonomy design and implementation.
Objectives of SONet
Broad Objectives
Address semantic interoperability issues in environmental (earth sciences) data [sharing, discovery, integration]
Build a network of practitioners (SONet), including domain scientists, computer scientists, and information managers
Build generic, cross-disciplinary data interoperability solutions
Immediate Goals to Develop
An extensible and open observations data model (“core model”) to unify existing domain-specific approaches
A semantic (ontology) framework for scientific terminology and corresponding domain extensions
Demonstration prototypes using these to address critical data interoperability issues
Prospective observation models…
Project       Domain                    Observational data model
TDWG/OSR      Biodiversity              Meta-model to integrate field observational data with specimen data
VSTO          Atmospheric sciences      Ontologies for interoperations among different meteorological metadata standards
ODM           Hydrology                 CUAHSI’s Observational Data Model for storing diverse hydrological data
SERONTO       Socioecological research  Ontology for integrating socio-ecological data
OGC’s O&M     Geospatial                Observations and Measurements standard for enhancing sensor data interoperability
SEEK’s OBOE   Ecology                   Extensible Observation Ontology for describing data as observations and measurements
Developing a core model
Identify the key observational models in the earth and environmental sciences
Are these various observational models easily reconciled and/or harmonized?
Are there special capabilities and features enabled by some observational approaches?
What services should be developed around these observational models?
Working Groups
Subgroup 1: Core Data Model for Observations
Subgroup 2: Catalog of Common Field Observations
Subgroup 3: Scientist-Oriented Term Organization
Subgroup 4: Demonstration Projects
Subgroup 1
  Collect interoperability requirements
  Define common, unified data model
  Engage tool & data providers, data consumers
Subgroup 2
  Identify and catalog common observation types (semantics)
  Engage data providers and information managers
Subgroup 3
  Define general extension ontologies of scientific terms
  Focus work on outputs of group 2
  Engage range of domain scientists
Subgroup 4
  Define and prototype demonstration projects
  Ensure compatibility of subgroups
• Each group consists of two team leads
• Postdoc funded to work on demonstration projects & help ensure compatibility across subgroups
Core SONet Team
Goals
Identify and resolve commonalities and discrepancies among observational efforts
Define a common core observational model for data
Test with use cases (cross-disciplinary data integration tasks)
Where we are at…
Identifying and resolving commonalities and discrepancies among observational models—O&M (ISO track) and OBOE
Developing best-practices and design patterns for constructing observation-model compliant earth science ontologies, e.g. “measurement type”
Developing cross-disciplinary use cases that exercise data integration capabilities of semantic approach
Where we are at…
SEMTOOLS project
Testing and enhancing semantic mediation
Leveraging SONet observation data model
Building semantic querying and annotation capabilities into Morpho
Use cases include using ontologies for data integration involving ecology at an LTER site, salmon monitoring, and vegetation traits
Future directions…
Enabling semantic annotation onto disparate data resources
Ontologies for analysis
Ontologies for experimental design
Acknowledgments
Thanks to Chad Berkley, Ben Leinfelder, and Huiping Cao for ideas, implementation and slides.
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, 0225676, 0619060, 0722079, 0743429.
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number 0072909), the University of California, and the UC Santa Barbara campus.
The Andrew W. Mellon Foundation.