
ESWC'08 - Tutorial 3 - Transitioning Applications to Ontologies - June, 1st - Tenerife

Transitioning Relational Databases to Ontologies

Farid Cerbah, Dassault Aviation

farid.cerbah@dassault-aviation.fr


Outline

Problem statement
Previous work
The RDBToOnto tool and the RTAXON method
Improving the process through database optimisation
A case study in aircraft maintenance
Extending RDBToOnto
Conclusion


Problem statement

Relational databases are valuable heterogeneous sources for ontology learning

Better accuracy can be expected than when learning from text corpora

Ontology learning from relational databases is not a new research issue

Limitations of existing support:
the problem is often restricted to finding automated ways to import “tables” into ontologies
the derived ontologies have a flat structure that mirrors the source databases


Our contribution

RDBToOnto Platform

Comprehensive software support for learning fine-tuned ontologies

A framework that eases the development of, and experimentation with, transitioning methods

RTAXON Method

Designed to find taxonomies hidden in the data


A motivating example

[Figure: motivating example contrasting typical mappings covered by several methods with mappings specific to RTAXON]


Previous work (1)

RDB -> Ontology transformation

Database reverse engineering

Many transformation rules from this domain are reused for ontology learning

[Behm et al. 1997], [Ramanathan & Hodges 1997], …

Approaches mostly based on an analysis of the RDB schema

Data correlations are considered, but with the restriction "Data ≡ Key Values": key inclusion may express inheritance

Exploiting null-value semantics [Lammari et al. 2007]: partitioning a table on the basis of null values may reveal concept hierarchies; this involves data from non-key attributes
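To make the null-value idea concrete, here is a minimal Java sketch of the partitioning step only. It assumes rows are held in memory as maps from column name to value, with Java null standing for SQL NULL; the class NullPartitioner and the Parts table with Pressure and Voltage columns used below are hypothetical, and each resulting group is merely a candidate sub-concept, not the full method of [Lammari et al. 2007].

```java
import java.util.*;

// Sketch of the partitioning step only: rows are maps from column name to
// value, and a Java null stands for SQL NULL (assumption).
final class NullPartitioner {

    // Groups rows by the set of non-key columns that are NOT null.
    // Each group is a candidate sub-concept of the table's concept.
    static Map<Set<String>, List<Map<String, String>>> partition(
            List<Map<String, String>> rows, List<String> nonKeyColumns) {
        Map<Set<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            Set<String> present = new TreeSet<>();
            for (String column : nonKeyColumns) {
                if (row.get(column) != null) {
                    present.add(column);
                }
            }
            groups.computeIfAbsent(present, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

For example, hypothetical hydraulic parts that have a Pressure value but no Voltage would land in one group and electrical parts in another, hinting at two subclasses of Part.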


Previous work (2): mapping languages and tools

D2RQ: RDB to OWL/RDF mapping, providing ontology-based access to relational databases by rewriting SPARQL queries into SQL

Relational.OWL: a minimal ontology of ‘tables’ and ‘columns’, with a processor that populates this ontology with data from relational databases; can be used to exchange data between databases

Triplify: a plugin for web applications that converts the results of SQL queries into RDF

KAON Reverse: software support for interactively mapping an RDB schema to a predefined ontology

DataMaster: a Protégé plugin for importing table data into ontologies


RDBToOnto

A user-oriented tool with a full-fledged user interface

Supports the whole process, from data access to ontology generation

Includes the RTAXON converter

The process is automated to a large extent, but local constraints can be added interactively to progressively refine the ontologies

Types of local constraints: table and column exclusion, naming patterns for classes and instances, categorisation patterns
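As an illustration of a naming pattern for instances, the Java sketch below assumes a hypothetical template syntax in which {Column Name} is replaced by the row's value for that column. RDBToOnto's actual pattern language is not described in these slides, so NamingPattern and its syntax are assumptions.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical naming pattern for instances: every "{Column Name}" placeholder
// is replaced by the row's value for that column. The syntax is an assumption,
// not RDBToOnto's actual pattern language.
final class NamingPattern {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{([^}]+)\\}");
    private final String template;

    NamingPattern(String template) {
        this.template = template;
    }

    // Builds an instance name from one row, then keeps it identifier-friendly
    // by collapsing every run of non-alphanumeric characters into "_".
    String apply(Map<String, String> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = row.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString().trim().replaceAll("[^A-Za-z0-9]+", "_");
    }
}
```

With the template {Company Name}_{WP Number}, a work package row for Dassault-Aviation with number 34A would be named Dassault_Aviation_34A.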


The RTAXON method

A major improvement over existing methods: the classes derived from the schema are further refined with subclasses found in the content of the relations

Focus on reliable categorisation patterns

Demo

Access Zones (X 516)

A/C   Codes   Description                        Type
F7X   2103    nose cone                          DOOR
F7X   281FL   windshield retainers               PANEL
F7X   300ZZ   umbrella access panel No.1         PANEL
F7X   243DF   servicing compartment floor No.1   FLOOR
F7X   342EZ   rear under pylon fairing           FAIRG

Derived class hierarchy: Access Zone, with subclasses Door, Panel, Fairing and Floor
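The core idea shown in the demo can be sketched in Java: each distinct value of the categorising attribute (Type) becomes a subclass of the class derived from the table, and each row becomes an instance of the matching subclass. This is a simplified sketch, not the RTAXON implementation; CategoryHierarchy and deriveSubclasses are hypothetical names, and the label override map stands in for a local constraint that turns the abbreviation FAIRG into Fairing.

```java
import java.util.*;

// Simplified sketch of categorisation: one subclass per distinct value of the
// categorising attribute, one instance per row. Not the RTAXON implementation.
final class CategoryHierarchy {

    // Returns subclass name -> instance identifiers, sorted by subclass name.
    static Map<String, List<String>> deriveSubclasses(
            List<Map<String, String>> rows, String categorisingColumn,
            String instanceColumn, Map<String, String> labelOverrides) {
        Map<String, List<String>> subclasses = new TreeMap<>();
        for (Map<String, String> row : rows) {
            String raw = row.get(categorisingColumn);
            if (raw == null || raw.isBlank()) {
                continue; // an uncategorised row stays in the parent class
            }
            // A label override plays the role of a hypothetical naming constraint.
            String name = labelOverrides.getOrDefault(raw, capitalise(raw));
            subclasses.computeIfAbsent(name, k -> new ArrayList<>()).add(row.get(instanceColumn));
        }
        return subclasses;
    }

    // "PANEL" -> "Panel"
    private static String capitalise(String s) {
        String lower = s.toLowerCase(Locale.ROOT);
        return Character.toUpperCase(lower.charAt(0)) + lower.substring(1);
    }
}
```

With the five rows of the Access Zones sample, the result has the subclasses Door, Fairing, Floor and Panel, the last one holding the instances 281FL and 300ZZ.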

Two sources are involved in identifying categorising attributes:
attribute names, through lexical clues
redundancy in attribute extensions, using an entropy-based approach to find good profiles

Formal definition of RTAXON

Categorising attribute: in the table above, the Type column
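A rough Java sketch of how the two sources might be combined, under one possible reading: a column is a candidate if its name carries a lexical clue and its values are redundant, measured as the Shannon entropy of the values divided by its maximum, log2 of the number of rows. The clue list, the threshold and the AND combination are all assumptions for illustration, not the formal definition of RTAXON.

```java
import java.util.*;

// Illustrative test for candidate categorising attributes. The clue list,
// the threshold and the AND combination are assumptions, not RTAXON's
// formal definition.
final class CategorisationHeuristics {
    private static final List<String> CLUES = List.of("type", "category", "kind", "class");

    // Shannon entropy, in bits, of the value distribution of one column.
    static double entropy(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / values.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Source 1: a lexical clue in the attribute name.
    static boolean hasLexicalClue(String columnName) {
        String lower = columnName.toLowerCase(Locale.ROOT);
        return CLUES.stream().anyMatch(lower::contains);
    }

    // Source 2: redundancy in the attribute extension, expressed as entropy
    // relative to its maximum log2(n), reached when all n values are distinct.
    static boolean isCandidate(String columnName, List<String> values, double maxEntropyRatio) {
        long distinct = values.stream().distinct().count();
        if (distinct < 2) {
            return false; // a constant (or empty) column cannot split a class
        }
        double ratio = entropy(values) / (Math.log(values.size()) / Math.log(2));
        return hasLexicalClue(columnName) && ratio <= maxEntropyRatio;
    }
}
```

On the Access Zones sample, Type has an entropy ratio of about 0.83 and carries the clue "type", so it passes with a threshold of 0.9; Description, whose values are all distinct (ratio 1.0) and whose name has no clue, does not.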


Optimising the source databases

Another key improvement is the inclusion of a database optimisation step
Many input databases suffer from data duplication problems
Optimisation: eliminate data duplication by processing inclusion dependencies

Before optimisation (data duplication)

Companies (X 105)
Name                Cage_Code (PKEY)
Dassault-Aviation   F0214
Messier-Dowty       F564
Parker              F0086

WorkPackages (X 82)
WP Title                                Company Name        Company Code   WP Number
Wheels, Brakes and Braking              ABS                 B453           35
Landing Gear Emergency Control System   Dassault-Aviation   F0214          34A
Landing Gears                           Messier-Dowty       F564           34
Hydraulic Power                         Parker              F0086          33

After optimisation

Companies (X 106)

WorkPackages (X 82)
WP Title                                Company Code   WP Number
Wheels, Brakes and Braking              B453           35
Landing Gear Emergency Control System   F0214          34A
Landing Gears                           F564           34
Hydraulic Power                         F0086          33

Foreign key relationship: WorkPackages[Company Code] -> Companies[Cage_Code]

Inclusion dependency: WorkPackages[Company Code, Company Name] ⊆ Companies[Cage_Code, Name]
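The check behind this optimisation can be sketched in Java, assuming both tables are loaded in memory as lists of rows: project each table on the dependency's columns, report the WorkPackages tuples with no match in Companies, and, once the dependency holds, drop the duplicated Company Name column so that Company Code alone refers to Companies. InclusionDependency and its methods are hypothetical names; a real tool could just as well run such checks in SQL.

```java
import java.util.*;

// Sketch of checking an inclusion dependency such as
//   WorkPackages[Company Code, Company Name] included in Companies[Cage_Code, Name]
// on in-memory rows, then removing the duplicated column.
final class InclusionDependency {

    // Returns the projected left tuples that have no match on the right side;
    // the dependency holds exactly when this list is empty.
    static List<List<String>> missing(List<Map<String, String>> left, List<String> leftCols,
                                      List<Map<String, String>> right, List<String> rightCols) {
        Set<List<String>> rightTuples = new HashSet<>();
        for (Map<String, String> row : right) {
            rightTuples.add(project(row, rightCols));
        }
        List<List<String>> result = new ArrayList<>();
        for (Map<String, String> row : left) {
            List<String> tuple = project(row, leftCols);
            if (!rightTuples.contains(tuple)) {
                result.add(tuple);
            }
        }
        return result;
    }

    // Optimisation step: copy every row without the redundant column.
    static List<Map<String, String>> dropColumn(List<Map<String, String>> rows, String column) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> copy = new LinkedHashMap<>(row);
            copy.remove(column);
            out.add(copy);
        }
        return out;
    }

    private static List<String> project(Map<String, String> row, List<String> columns) {
        List<String> tuple = new ArrayList<>();
        for (String c : columns) {
            tuple.add(row.get(c));
        }
        return tuple;
    }
}
```

Tuples reported as missing would have to be resolved, for instance by adding the company to Companies, before the name column can safely be dropped.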


Effect of inclusion dependency processing
Inclusion dependencies yield more inter-class relations (i.e. object properties).

[Figure: generated ontology without ID identification vs. with ID identification]
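One way to picture why inclusion dependencies produce object properties: each validated dependency from a referencing table to a referenced table can be turned into a property whose domain and range are the classes derived from those tables. The Java sketch below assumes hypothetical naming rules (naive singularisation of table names and a "has" + range property name); it is not RDBToOnto's actual naming scheme.

```java
import java.util.*;

// Hypothetical description of an object property derived from a validated
// inclusion dependency between the tables behind two classes.
record ObjectPropertyDef(String name, String domain, String range) {}

final class PropertyDerivation {

    // One object property per dependency: domain = class of the referencing
    // table, range = class of the referenced table. Naming rule is assumed.
    static ObjectPropertyDef fromDependency(String referencingTable, String referencedTable) {
        String domain = toClassName(referencingTable);
        String range = toClassName(referencedTable);
        return new ObjectPropertyDef("has" + range, domain, range);
    }

    // Naive singularisation: "Companies" -> "Company", "WorkPackages" -> "WorkPackage".
    static String toClassName(String table) {
        if (table.endsWith("ies")) {
            return table.substring(0, table.length() - 3) + "y";
        }
        if (table.endsWith("s")) {
            return table.substring(0, table.length() - 1);
        }
        return table;
    }
}
```

For the dependency from WorkPackages to Companies, this yields hasCompany with domain WorkPackage and range Company.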


Identification of inclusion dependencies
RDBToOnto includes an editor for interactively defining inclusion dependencies

Automated identification of inclusion dependencies
A data mining approach based on LATINO (see the presentation on ontology learning by Miha Grčar (JSI) in this tutorial)
Dependencies discovered by LATINO are exported to RDBToOnto and can be validated in the ID editor
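To show what a candidate inclusion dependency looks like before validation in the ID editor, here is a naive brute-force Java sketch that compares the distinct-value sets of every pair of columns from different tables. This is explicitly not the LATINO data mining approach: it scales poorly, and the NaiveIndDiscovery class and the "Table.Column" key convention are hypothetical.

```java
import java.util.*;

// Naive brute-force discovery of unary inclusion dependencies, given each
// column's distinct values keyed as "Table.Column". NOT the LATINO approach;
// it only illustrates what a candidate dependency is.
final class NaiveIndDiscovery {

    // Reports "A <= B" whenever every value of column A also occurs in column B
    // (A and B in different tables, A non-empty). Results are sorted.
    static List<String> discover(Map<String, Set<String>> columnValues) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, Set<String>> a : columnValues.entrySet()) {
            for (Map.Entry<String, Set<String>> b : columnValues.entrySet()) {
                String tableA = a.getKey().substring(0, a.getKey().indexOf('.'));
                String tableB = b.getKey().substring(0, b.getKey().indexOf('.'));
                if (tableA.equals(tableB) || a.getValue().isEmpty()) {
                    continue;
                }
                if (b.getValue().containsAll(a.getValue())) {
                    found.add(a.getKey() + " <= " + b.getKey());
                }
            }
        }
        Collections.sort(found);
        return found;
    }
}
```

On a Companies/WorkPackages sample it would report WorkPackages.Company Code <= Companies.Cage_Code and WorkPackages.Company Name <= Companies.Name; a person still has to confirm that such value overlaps reflect real dependencies.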


Mining inclusion dependencies with LATINO


A case study in aircraft maintenance

[Figure: case study components: KCIT (GATE-based annotator), RDBToOnto + LATINO, Radiant, OWLIM]


The ontology acquisition process

The legacy data
LSA database: a heterogeneous relational database that gathers all information related to maintenance activity: required logistic resources, aircraft parts (product tree) and scheduling data
Standards: documents including widely shared conceptual models

The ontology acquisition process
A multi-step transitioning process that favours modular design


Model Bootstrapping + Ontology Normalisation

[Figure: Legacy Data and Reusable Ontologies (MSG-3, SNS/ATA, FOAF) processed by Ontology Learning Tools through Model Bootstrapping and Ontology Normalisation; the ATA ontology is imported; results are stored in the OWLIM/HKS Repository]


The defined RDBToOnto conversion project

75 constraints, mostly naming patterns and inclusion dependencies

Resulting ontology
Ontology model: 115 classes, 334 datatype properties, 54 object properties
Population: 49617 class instances, 51449 object property instances

No constraints were defined for categorisation
All ten hierarchies discovered by RTAXON are relevant
RTAXON behaves well when faced with categorisation conflicts


The generated class hierarchy


Identified object properties


RDBToOnto extension capabilities

RDBToOnto is a user-oriented tool, but it is also a framework
Written in Java, with OWL as the target language (using the Jena 2.5 API)

Two types of components can be added:
database readers, to cover more database formats
converters, to implement new learning methods; new converters can have their own global options, local constraints and GUI


Structure of RDBToOnto

DBReader (implementations: MSAccessReader, DB2Reader)
    Database getDatabase()
    Table ReadData(String name)

Database (passed from the reader to the converter)

RDBToOntoConverter (implementations: RTAXON, BasicConverter)
    OntModel Convert(Database db)
    OntClass CreateClass(TableDef)

Readers and converters can be extended by the users
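The converter extension point can be pictured with the Java sketch below, which mirrors the Convert/CreateClass split as a template method. It deliberately replaces Jena's OntModel and OntClass with plain Java collections so that it runs standalone; ConverterSketch, SimpleTable, SimpleDatabase, UpperCamelConverter and the refine hook are hypothetical and do not reproduce RDBToOnto's real classes or signatures.

```java
import java.util.*;

// Hypothetical, Jena-free stand-ins: an "ontology" here is just a map from
// class name to superclass name, so the sketch runs without any library.
record SimpleTable(String name) {}

record SimpleDatabase(List<SimpleTable> tables) {}

abstract class ConverterSketch {

    // Template method mirroring Convert(Database): one class per table,
    // then a hook where a learning method can refine the result.
    final Map<String, String> convert(SimpleDatabase db) {
        Map<String, String> ontology = new LinkedHashMap<>();
        for (SimpleTable table : db.tables()) {
            ontology.put(createClass(table), "Thing");
        }
        refine(db, ontology);
        return ontology;
    }

    // Mirrors CreateClass(TableDef): default naming just removes spaces.
    String createClass(SimpleTable table) {
        return table.name().replace(" ", "");
    }

    // Extension hook; the default converter adds nothing.
    void refine(SimpleDatabase db, Map<String, String> ontology) {
    }
}

// A user-defined converter that only changes how classes are named.
final class UpperCamelConverter extends ConverterSketch {
    @Override
    String createClass(SimpleTable table) {
        StringBuilder name = new StringBuilder();
        for (String part : table.name().split("[ _]+")) {
            if (part.isEmpty()) {
                continue;
            }
            name.append(Character.toUpperCase(part.charAt(0)))
                .append(part.substring(1).toLowerCase(Locale.ROOT));
        }
        return name.toString();
    }
}
```

A subclass overrides only the step it cares about, here class naming, while inheriting the overall conversion loop; a taxonomy-learning converter would instead override the refine hook.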


The neutral database model

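[Figure: class diagram of the neutral database model, the input to any converter: Database, DBSchema, TableDef, Key (specialised into PrimaryKey and ForeignKey), Attribute (String friendlyNames), Table, Column (Values), linked by associations with * multiplicities]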
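The neutral model can be sketched as plain Java classes, based on one reading of the diagram: a schema part (DBSchema, TableDef, Attribute, and Key with PrimaryKey and ForeignKey) kept separate from a data part (Table, and Column with its values). The associations, the field types (for example friendlyNames as a list of strings) and the referencedTable field of ForeignKey are assumptions; the real model may differ.

```java
import java.util.*;

// One possible reading of the neutral database model; field names, types and
// associations are assumptions made for illustration.
final class Attribute {
    final String name;
    final List<String> friendlyNames = new ArrayList<>();
    Attribute(String name) { this.name = name; }
}

abstract class Key {
    final List<Attribute> attributes;
    Key(List<Attribute> attributes) { this.attributes = attributes; }
}

final class PrimaryKey extends Key {
    PrimaryKey(List<Attribute> attributes) { super(attributes); }
}

final class ForeignKey extends Key {
    final String referencedTable; // assumed: name of the referenced TableDef
    ForeignKey(List<Attribute> attributes, String referencedTable) {
        super(attributes);
        this.referencedTable = referencedTable;
    }
}

// Schema part: table definitions with their attributes and keys.
final class TableDef {
    final String name;
    final List<Attribute> attributes = new ArrayList<>();
    final List<Key> keys = new ArrayList<>();
    TableDef(String name) { this.name = name; }
}

final class DBSchema {
    final List<TableDef> tableDefs = new ArrayList<>();
}

// Data part: tables made of columns holding values (as strings here).
final class Column {
    final String name;
    final List<String> values = new ArrayList<>();
    Column(String name) { this.name = name; }
}

final class Table {
    final String name;
    final List<Column> columns = new ArrayList<>();
    Table(String name) { this.name = name; }
}

// The object handed to a converter: schema plus data.
final class Database {
    final DBSchema schema = new DBSchema();
    final List<Table> tables = new ArrayList<>();

    Optional<TableDef> tableDef(String name) {
        return schema.tableDefs.stream().filter(t -> t.name.equals(name)).findFirst();
    }
}
```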


Conclusion

We presented comprehensive support for transitioning relational databases to ontologies

RDBToOnto and the RTAXON method have been evaluated on substantial databases

RTAXON is just a first step; many extensions can be studied, such as learning two-level hierarchies and automatically generating local constraints (e.g. naming patterns)

More resources are available on the TAO project web site, including:
User Guide and demos
Development Guide, with a fully implemented sample showing how to extend the tool