A Semantic Modelling Approach to Biological Parameter Interoperability



Page 1: A Semantic Modelling Approach to Biological Parameter Interoperability

A Semantic Modelling Approach to Biological Parameter Interoperability

Roy Lowry & Laura Bird
British Oceanographic Data Centre

Pieter Haaring
RIKZ, Rijkswaterstaat, The Netherlands

Ocean Biodiversity Informatics

Page 2: A Semantic Modelling Approach to Biological Parameter Interoperability

Presentation Overview

• The nature of the problem
• Dictionaries and data models
• The starting position
• Manual mapping
• Automation through semantic matching
• From dictionary to semantic model
• Mapping semantic models
• Semantic model applications
• Conclusions and lessons learned

Page 3: A Semantic Modelling Approach to Biological Parameter Interoperability

The Nature of the Problem

• BODC and Rijkswaterstaat both have marine databases holding a wide range of physical, chemical and biological parameters

• Both were to be included in pan-European metadatabases (EDIOS and SEA-SEARCH CDI) using a common discovery vocabulary

• BODC set up the vocabulary and obviously included a mapping to the BODC Parameter Dictionary

• The problem arose of how to provide a similar mapping for Rijkswaterstaat

• If the Rijkswaterstaat data markup vocabulary could be mapped to the BODC Parameter Dictionary then the BODC discovery vocabulary mapping could be used

Page 4: A Semantic Modelling Approach to Biological Parameter Interoperability

Dictionaries and Data Models

• BODC systems have roots in the GF3 model, which means:
  – Data values are linked to a parameter code
  – Parameter code is defined in a Parameter Dictionary
  – The parameter code specifies more than one metadata item for the data value
  – For chemical and biological data ‘more than one’ becomes ‘a lot’

Page 5: A Semantic Modelling Approach to Biological Parameter Interoperability

Dictionaries and Data Models

• Rijkswaterstaat uses data models (DONAR becoming WADI)
  – Measurements are accompanied by attributes containing specific atomic metadata items
  – Each attribute is populated from a controlled vocabulary
  – DONAR constrains attribute term combinations using a ‘parameter dictionary’ concept
  – WADI reduces maintenance overheads by allowing any combination
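The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code `XXXX0001`, the vocabulary terms and the `Measurement` class are invented here and are not taken from the BODC Parameter Dictionary, DONAR or WADI.

```python
from dataclasses import dataclass

# Dictionary approach: one opaque code stands for a whole bundle of metadata.
# The code "XXXX0001" and its description are invented for illustration.
PARAMETER_DICTIONARY = {
    "XXXX0001": "Abundance of Calanus per unit volume of the water column",
}


@dataclass(frozen=True)
class Measurement:
    """Data-model approach: each metadata item is a separate atomic attribute."""
    value: float
    parameter: str
    taxon: str
    compartment: str


# One controlled vocabulary per attribute (hypothetical terms).
VOCABULARIES = {
    "parameter": {"Abundance", "Biomass"},
    "taxon": {"Calanus"},
    "compartment": {"water column", "sediment"},
}


def is_valid(m: Measurement) -> bool:
    """Check every attribute of a measurement against its controlled vocabulary."""
    return all(getattr(m, name) in terms for name, terms in VOCABULARIES.items())


m = Measurement(12.0, "Abundance", "Calanus", "water column")
print(is_valid(m))
```

In the dictionary approach the meaning is only recoverable by reading the description text, whereas in the attribute approach each item can be checked, compared and mapped on its own.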

Page 6: A Semantic Modelling Approach to Biological Parameter Interoperability

The Starting Position

• BODC
  – Parameter Codes defined by two plain-text fields
  – Related semantic information not necessarily in the same field
  – Fields would not concatenate sensibly
  – OK for humans, but not for machines

• Rijkswaterstaat
  – Consistently located semantics
  – Metadata fields that concatenate sensibly in both Dutch and English

Page 7: A Semantic Modelling Approach to Biological Parameter Interoperability

Manual Mapping

• Manual mapping protocol
  – For each entry in the Rijkswaterstaat ‘dictionary’ spreadsheet
  – Look up code with identical meaning using BODC Dictionary search tools (Access Filter by Form)
  – If found
    – Copy BODC code from Access and paste into spreadsheet
  – Else
    – Prepare dictionary update record and submit for QA and load

• Error prone and 500 entries is pushing the limit of human endurance!

Page 8: A Semantic Modelling Approach to Biological Parameter Interoperability

Semantic Matching

• When code lists run into thousands, automation is required

• Rijkswaterstaat developed a semantic matching tool to pull matching terms (preferably one) from the BODC dictionary

• Defeated by the lack of standardisation in the BODC plain-text fields, e.g.
  – Calanus abundance
  – Abundance of Calanus
  – Calanus count
  – Number of Calanus
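A small sketch shows why free-text variation defeats straightforward matching. The `exact_match` function below is a hypothetical stand-in, not the Rijkswaterstaat tool, whose design the slides do not describe. It compares text after normalising case and spacing, and is run against the four phrasings quoted above.

```python
def exact_match(term: str, dictionary: list[str]) -> list[str]:
    """Return dictionary entries equal to the term, ignoring case and extra spaces."""
    key = " ".join(term.lower().split())
    return [d for d in dictionary if " ".join(d.lower().split()) == key]


# The four phrasings from the slide, all describing the same quantity.
bodc_descriptions = [
    "Calanus abundance",
    "Abundance of Calanus",
    "Calanus count",
    "Number of Calanus",
]

hits = exact_match("Abundance of Calanus", bodc_descriptions)
print(hits)  # only one of the four equivalent phrasings is found
```

Three of the four equivalent descriptions are missed because the same semantics are expressed with different words and word order.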

Page 9: A Semantic Modelling Approach to Biological Parameter Interoperability

Dictionary to Semantic Model

• Became apparent that the BODC Dictionary required significant improvement if it was to support mapping automation

• Development strategy was to model the parameter code in the same way DONAR models a measurement

• Semantic model developed to cover all codes in BODC Dictionary

Page 10: A Semantic Modelling Approach to Biological Parameter Interoperability

Dictionary to Semantic Model

• Semantic model developed from DONAR with an increased semantic element count to overcome shoe-horning

• Principle that semantic elements may be combined automatically to produce text descriptions maintained

• Currently implemented as three sub-models

• Element superset will ultimately be created as a single model

Page 11: A Semantic Modelling Approach to Biological Parameter Interoperability

Dictionary to Semantic Model

• Biological sub-model semantic elements
  – Parameter (Abundance, Biomass)
  – Taxon_code (ITIS code)
  – Taxon_name
  – Taxon_subgroup (gender, size, stage)
  – Parameter_compartment_relationship (per unit volume of the, per unit area of the)
  – Compartment (water column, bed, sediment)
  – Sample_preparation
  – Analysis
  – Data_processing

• Needs further refinement e.g. subdivide Taxon_subgroup
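The principle that semantic elements combine automatically into a text description can be sketched as below. The class name, the concatenation order and the taxon code `TAXON-0001` are assumptions made for illustration. The Sample_preparation, Analysis and Data_processing elements are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalParameter:
    """A subset of the biological sub-model elements (hypothetical layout)."""
    parameter: str
    taxon_code: str        # placeholder value, not a real ITIS code
    taxon_name: str
    relationship: str      # Parameter_compartment_relationship
    compartment: str
    taxon_subgroup: str = ""

    def describe(self) -> str:
        """Concatenate the elements into a human-readable description."""
        taxon = self.taxon_name
        if self.taxon_subgroup:
            taxon = f"{taxon} ({self.taxon_subgroup})"
        return f"{self.parameter} of {taxon} {self.relationship} {self.compartment}"


p = BiologicalParameter(
    parameter="Abundance",
    taxon_code="TAXON-0001",
    taxon_name="Calanus",
    relationship="per unit volume of the",
    compartment="water column",
)
print(p.describe())  # Abundance of Calanus per unit volume of the water column
```

Because the description is generated rather than typed, every code built from the same elements reads the same way, which is what the earlier plain-text fields lacked.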

Page 12: A Semantic Modelling Approach to Biological Parameter Interoperability

Mapping Semantic Models

• Two stage process
  – First map the semantic elements
    – DONAR Parameter = BODC Parameter + Parameter_compartment_relationship
    – DONAR Compartment = BODC Compartment
  – Then map vocabularies for mapped elements
    – Surface water = water column

• Relational database designers will recognise this as normalisation
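The two stages can be sketched as a pair of lookup tables. The element correspondences and the "Surface water = water column" pair come from the slide. The DONAR term "Abundance per volume" and the function name `translate` are hypothetical.

```python
# Stage 1: element map. One DONAR element may feed several BODC elements.
ELEMENT_MAP = {
    "Parameter": ("Parameter", "Parameter_compartment_relationship"),
    "Compartment": ("Compartment",),
}

# Stage 2: vocabulary maps, one per mapped element, term by term.
VOCAB_MAP = {
    "Parameter": {
        # Hypothetical DONAR term split into two BODC terms.
        "Abundance per volume": ("Abundance", "per unit volume of the"),
    },
    "Compartment": {
        "Surface water": ("water column",),
    },
}


def translate(donar_record: dict[str, str]) -> dict[str, str]:
    """Translate a DONAR-style record into BODC semantic elements."""
    bodc = {}
    for element, term in donar_record.items():
        targets = ELEMENT_MAP[element]      # which BODC elements receive the value
        values = VOCAB_MAP[element][term]   # which BODC terms the DONAR term maps to
        bodc.update(zip(targets, values))
    return bodc


result = translate({"Parameter": "Abundance per volume", "Compartment": "Surface water"})
print(result)
```

Each vocabulary term is mapped once, rather than once for every dictionary entry in which it appears.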

Page 13: A Semantic Modelling Approach to Biological Parameter Interoperability

Mapping Semantic Models

• Number of ‘look-ups’ required is reduced by an order of magnitude

• Vocabulary elements have simple semantics so automation is possible

• Approximately 90% of the Rijkswaterstaat to BODC mapping accomplished by a single SQL statement

• Straightforward extension of vocabulary maps (different names for same thing) sorted out most of the rest

• Thesauri could help reduce the need for this
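The slides do not show the SQL statement itself, so the following is only a guess at its general shape: a join of the source entries to the target dictionary through vocabulary maps. All table names, column names and codes are invented, and the parameter split described earlier is omitted for brevity. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rws_entry  (rws_code TEXT, parameter TEXT, taxon TEXT, compartment TEXT);
CREATE TABLE bodc_entry (bodc_code TEXT, parameter TEXT, taxon TEXT, compartment TEXT);
CREATE TABLE vocab_map  (element TEXT, rws_term TEXT, bodc_term TEXT);

INSERT INTO rws_entry  VALUES ('RWS-1', 'Abundance', 'Calanus', 'Surface water');
INSERT INTO rws_entry  VALUES ('RWS-2', 'Biomass',   'Calanus', 'Surface water');
INSERT INTO bodc_entry VALUES ('BODC-1', 'Abundance', 'Calanus', 'water column');
INSERT INTO vocab_map  VALUES ('parameter',   'Abundance',     'Abundance');
INSERT INTO vocab_map  VALUES ('parameter',   'Biomass',       'Biomass');
INSERT INTO vocab_map  VALUES ('compartment', 'Surface water', 'water column');
""")

# One statement maps every source entry whose element terms all have a match.
MAPPING_SQL = """
SELECT r.rws_code, b.bodc_code
FROM rws_entry r
JOIN vocab_map pm ON pm.element = 'parameter'   AND pm.rws_term = r.parameter
JOIN vocab_map cm ON cm.element = 'compartment' AND cm.rws_term = r.compartment
LEFT JOIN bodc_entry b
       ON b.parameter   = pm.bodc_term
      AND b.compartment = cm.bodc_term
      AND b.taxon       = r.taxon
ORDER BY r.rws_code
"""

rows = conn.execute(MAPPING_SQL).fetchall()
print(rows)  # [('RWS-1', 'BODC-1'), ('RWS-2', None)]
```

Rows that come back with no target code are the residue: they need either an extended vocabulary map or a new dictionary entry.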

Page 14: A Semantic Modelling Approach to Biological Parameter Interoperability

Mapping Semantic Models

• ‘Hard Core’ problems required manual resolution
  – Unclear or ambiguous semantics in Rijkswaterstaat element vocabularies (residual beta)
  – Problems with Dutch to English translation

• Some mapping errors were detected
  – Caused by homonyms (Branchiura)
  – Emphasises the need for more than just a name for a taxon (reference or ITIS code)
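A homonym check of the kind this lesson suggests can be sketched as follows. Branchiura is used as the example because the name is shared by a crustacean group and an oligochaete worm genus. The codes are placeholders, not real ITIS serial numbers, and `find_homonyms` is a hypothetical helper.

```python
from collections import defaultdict

# (taxon name, taxon code) pairs. Codes are placeholders, not real ITIS codes.
taxa = [
    ("Branchiura", "CODE-A"),  # crustacean group (fish lice)
    ("Branchiura", "CODE-B"),  # oligochaete worm genus
    ("Calanus", "CODE-C"),
]


def find_homonyms(pairs):
    """Return names that resolve to more than one taxon code."""
    codes_by_name = defaultdict(set)
    for name, code in pairs:
        codes_by_name[name].add(code)
    return {
        name: sorted(codes)
        for name, codes in codes_by_name.items()
        if len(codes) > 1
    }


homonyms = find_homonyms(taxa)
print(homonyms)  # {'Branchiura': ['CODE-A', 'CODE-B']}
```

Any name flagged this way cannot be mapped safely on the name alone, so the mapping has to key on the code or a taxonomic reference.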

Page 15: A Semantic Modelling Approach to Biological Parameter Interoperability

Semantic Model Applications

• Semantic modelling is a lowest common denominator approach to metadata

• This is what makes it good for mapping

• The approach also offers the basis for user-controlled data discovery and interoperability
  – User chooses the semantic element subset
  – User data selection interaction based on the subset vocabulary
  – Automated interoperability requires more sophistication (thesauri, ontologies)
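User-controlled discovery might work along the following lines: the user picks which semantic elements matter, is offered the distinct term combinations for that subset, and selects data by choosing one. The records and function names are invented for illustration and do not describe an existing BODC or Rijkswaterstaat interface.

```python
# Hypothetical parameter records expressed as semantic elements.
records = [
    {"Parameter": "Abundance", "Taxon_name": "Calanus", "Compartment": "water column"},
    {"Parameter": "Biomass", "Taxon_name": "Calanus", "Compartment": "water column"},
    {"Parameter": "Abundance", "Taxon_name": "Branchiura", "Compartment": "sediment"},
]


def subset_vocabulary(records, elements):
    """Distinct term combinations for the elements the user chose."""
    return sorted({tuple(r[e] for e in elements) for r in records})


def select(records, elements, choice):
    """Records whose chosen elements match the user's selection."""
    return [r for r in records if tuple(r[e] for e in elements) == choice]


elements = ("Parameter", "Compartment")          # the user's element subset
options = subset_vocabulary(records, elements)   # what the user is offered
selected = select(records, elements, ("Abundance", "water column"))
print(options)
print(selected)
```

A user interested only in parameter and compartment never has to see taxon-level detail, while a taxonomist could choose a different subset over the same records.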

Page 16: A Semantic Modelling Approach to Biological Parameter Interoperability

Conclusions

• Don’t even think about manual mapping of large parameter dictionaries

• 99% of a map is completed in the first 10% of the time

• More standardisation means fewer errors and problems

• Semantic model vocabularies need ontologies and thesauri to achieve their full interoperability potential

Page 17: A Semantic Modelling Approach to Biological Parameter Interoperability

Conclusions

• Semantic modelling works for mappings between dictionaries and data models

• It also has great potential for parameter discovery and interoperability