Unifying Data and Domain Knowledge Using Virtual Views
IBM T.J. Watson Research Center
Lipyeow Lim, Haixun Wang, Min Wang, VLDB 2007
2008. 1. 4
Summarized by Gong GI Hyun, IDS Lab., Seoul National University
Copyright 2008 by CEBT (Center for E-Business Technology)
Background
DBMSs were originally designed for transactional data.
Current DBMSs are not ready to manipulate data in connection with knowledge.
An unending quest
Database or Knowledge-base?
New applications: the Semantic Web, etc.
New extensions are required to bridge the gap between data representation and knowledge representation/inferencing.
IDS Lab. Seminar - 2
A Motivating Example
RDBMS allows us to query wines through attributes ID, Type, Origin, Maker, Price.
Human intelligence operates in a quite different way.
Humans have the ability to combine data with domain knowledge.
A Motivating Example
Query 1
Find wines that originate from the US.
RDB case:
– SELECT W.ID FROM Wine AS W WHERE W.Origin = 'US';
– Result: no rows, because no wine has the literal Origin value 'US'.
Human's answer: Zinfandel
– Zinfandel's Origin, EdnaValley, is located in California, which is in the US.
A Motivating Example
Query 2
Which wine is a red wine?
RDB case:
– SELECT W.ID FROM Wine AS W WHERE W.hasColor = 'red';
– Result: ERROR!
– hasColor is not in the schema of the Wine table.
Human's answer: Zinfandel
– Zinfandel is red.
Both the user and the DBMS must know what hasColor stands for when it appears in a query, and how to derive the value of hasColor for any given wine.
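The two failures above can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The Wine schema and the Zinfandel/EdnaValley values come from the slides; the Maker and Price values are placeholders assumed only for illustration.

```python
import sqlite3

# In-memory database holding only the base Wine table (no domain knowledge).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Wine (ID INTEGER, Type TEXT, Origin TEXT, Maker TEXT, Price REAL)"
)
# Maker and Price are hypothetical placeholder values.
conn.execute("INSERT INTO Wine VALUES (1, 'Zinfandel', 'EdnaValley', 'ExampleMaker', 15.0)")

# Query 1: the DBMS compares strings only, so 'EdnaValley' never matches 'US'.
query1_rows = conn.execute(
    "SELECT W.ID FROM Wine AS W WHERE W.Origin = 'US'"
).fetchall()

# Query 2: hasColor is not a column of Wine, so the statement is rejected.
try:
    conn.execute("SELECT W.ID FROM Wine AS W WHERE W.hasColor = 'red'")
    query2_error = None
except sqlite3.OperationalError as exc:
    query2_error = str(exc)

print(query1_rows)   # empty result for Query 1
print(query2_error)  # error message for Query 2
```

Neither outcome is a defect of the DBMS: the knowledge needed to answer both questions simply lives outside the table.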
Domain Knowledge from OWL Ontology
Wine ontology expressed in the W3C Web Ontology Language (OWL)
Extract class hierarchies, (transitive) properties, and implications from OWL
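As a rough illustration of the extraction step, the sketch below pulls subClassOf edges and transitive properties out of a small OWL/RDF-XML fragment with Python's standard xml.etree module. The fragment is a simplified, hypothetical excerpt written for this example and is not the actual W3C wine ontology file; the function name extract_knowledge is also an assumption.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Simplified, hypothetical fragment in the style of an OWL ontology.
OWL_FRAGMENT = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:ID="Wine"/>
  <owl:Class rdf:ID="WhiteWine">
    <rdfs:subClassOf rdf:resource="#Wine"/>
  </owl:Class>
  <owl:Class rdf:ID="Riesling">
    <rdfs:subClassOf rdf:resource="#WhiteWine"/>
  </owl:Class>
  <owl:ObjectProperty rdf:ID="locatedIn">
    <rdf:type rdf:resource="{OWL}TransitiveProperty"/>
  </owl:ObjectProperty>
</rdf:RDF>
"""


def extract_knowledge(owl_text):
    """Return (subclass edges, transitive property names) found in owl_text."""
    root = ET.fromstring(owl_text.strip())
    subclass_of = []
    for cls in root.iter(f"{{{OWL}}}Class"):
        name = cls.get(f"{{{RDF}}}ID")
        for parent in cls.findall(f"{{{RDFS}}}subClassOf"):
            target = parent.get(f"{{{RDF}}}resource")
            if name and target:
                # "#Wine" refers to the class named "Wine" in the same document.
                subclass_of.append((name, target.lstrip("#")))
    transitive = []
    for prop in root.iter(f"{{{OWL}}}ObjectProperty"):
        for type_elem in prop.findall(f"{{{RDF}}}type"):
            if type_elem.get(f"{{{RDF}}}resource") == OWL + "TransitiveProperty":
                transitive.append(prop.get(f"{{{RDF}}}ID"))
    return subclass_of, transitive


subclass_edges, transitive_props = extract_knowledge(OWL_FRAGMENT)
print(subclass_edges)
print(transitive_props)
```

A real ontology uses many more constructs (restrictions, anonymous classes, entity references), so this only shows the shape of the information being extracted.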
Challenges
How to incorporate domain knowledge (ontology) into an RDBMS?
The relational data model remains ill-suited for semi-structured data.
How to integrate relational data with domain knowledge?
How to query relational data with meaning?
How to process such queries?
Overview of our solution
Create a relational virtual view on top of the data and the domain knowledge.
Data and knowledge can be queried together.
New knowledge can be derived.
The virtual view is an interface through which users can query data, domain knowledge, and derived knowledge in a seamlessly unified manner.
Queries on the virtual view are rewritten into queries on the base tables.
The Virtuality of the View
Users create virtual views over the relational data and the ontology.
Virtual columns (attributes) do not exist in the original data.
Virtual columns are not materialized; their values are inferred from the ontology.
Creating the Virtual View
CREATE VIEW WineView(Id, Type, Origin,
Maker, Price, LocatedIn) AS
SELECT W.*, R.Regions
FROM Wine AS W, RegionKnowledge AS R
WHERE W.Origin = R.region
Queries
Query 1: Find wines that originate from the US.
SELECT ID FROM WineView WHERE 'US' IN LocatedIn;
Query 2: Which wine is a red wine?
SELECT ID FROM WineView WHERE hasColor = 'red';
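To make the semantics of Query 1 concrete, here is a minimal sketch that derives the LocatedIn virtual column by following a transitive locatedIn relation and then evaluates `'US' IN LocatedIn`. It eagerly computes the column only for demonstration, whereas the approach in the slides leaves virtual columns unmaterialized and rewrites the query instead. The EdnaValley and California facts follow the motivating example; the second wine row, its region name, and the helper names are hypothetical.

```python
# Direct locatedIn facts (child region -> parent region).
LOCATED_IN = {"EdnaValley": "California", "California": "US"}


def enclosing_regions(region, parent_of):
    """Follow the transitive locatedIn relation upward from region."""
    result = []
    seen = {region}
    current = region
    # The seen set guards against accidental cycles in the facts.
    while current in parent_of and parent_of[current] not in seen:
        current = parent_of[current]
        seen.add(current)
        result.append(current)
    return result


# Base rows; the second row is a hypothetical wine from an unknown region.
wines = [
    {"ID": 1, "Type": "Zinfandel", "Origin": "EdnaValley"},
    {"ID": 2, "Type": "Riesling", "Origin": "RegionX"},
]

# Demonstration only: attach the derived LocatedIn column to each row.
wine_view = [
    dict(w, LocatedIn=enclosing_regions(w["Origin"], LOCATED_IN)) for w in wines
]

# Query 1: SELECT ID FROM WineView WHERE 'US' IN LocatedIn
us_wine_ids = [row["ID"] for row in wine_view if "US" in row["LocatedIn"]]
print(us_wine_ids)
```

With these facts, the Zinfandel row gains LocatedIn = [California, US], which is exactly the inference a human makes in the motivating example.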
Physical Storage Layer
The ontology is modeled as semi-structured data.
Traditional RDBMSs cannot handle semi-structured data directly.
Hybrid relational-XML DBMS
IBM DB2 9 with pureXML supports XML data.
Ontology Repository
The ontology repository extracts several types of information from the ontology files, including class hierarchies, implication rules, and transitive properties.
Class Hierarchies
subClassOf
isA
Transitive Property
TransitiveProperty
Implication rules
Implication graph
Query Expansion
SELECT V.Id FROM WineView AS V WHERE V.hasColor = 'white';
(Type=WhiteWine) → (hasColor=white)
(Type=Riesling) → (hasColor=white)
SELECT W.Id FROM Wine AS W WHERE W.Type = 'WhiteWine' OR W.Type = 'Riesling';
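The expansion above can be sketched as a small rewriting function: a predicate on a virtual column is replaced by the disjunction of base-column predicates that imply it. This is a simplified sketch that handles a single equality predicate and one level of implication; the function and variable names are assumptions, not taken from the paper.

```python
# Implication rules from the slide: (antecedent) -> (consequent).
IMPLICATIONS = [
    (("Type", "WhiteWine"), ("hasColor", "white")),
    (("Type", "Riesling"), ("hasColor", "white")),
]

# Columns physically stored in the Wine base table.
BASE_COLUMNS = {"Id", "Type", "Origin", "Maker", "Price"}


def rewrite_predicate(column, value, alias="W"):
    """Rewrite column = value into a predicate over base columns only."""
    if column in BASE_COLUMNS:
        return f"{alias}.{column} = '{value}'"
    # Collect base-column predicates that imply the virtual predicate.
    antecedents = [
        a for a, c in IMPLICATIONS if c == (column, value) and a[0] in BASE_COLUMNS
    ]
    if not antecedents:
        return "1 = 0"  # nothing implies the predicate, so no row can qualify
    return " OR ".join(f"{alias}.{col} = '{val}'" for col, val in antecedents)


def rewrite_query(column, value):
    """Build the base-table query for SELECT V.Id FROM WineView AS V WHERE column = value."""
    # String interpolation is for illustration only; real code should bind parameters.
    return f"SELECT W.Id FROM Wine AS W WHERE {rewrite_predicate(column, value)};"


rewritten = rewrite_query("hasColor", "white")
print(rewritten)
# SELECT W.Id FROM Wine AS W WHERE W.Type = 'WhiteWine' OR W.Type = 'Riesling';
```

Because the result refers only to base columns, the DBMS can optimize and execute it like any ordinary query.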
Experiment
Investigate time to rewrite the queries on virtual views.
Measurement: rewriting time averaged over 5 randomly generated data sets.
Optimizations:
Remove dead nodes
Memoization techniques
Pre-computation of predicate rewriting
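One way to picture the memoization idea is a cached traversal of the implication graph: each virtual predicate is resolved to the set of base-table predicates that imply it, and results for shared sub-predicates are reused. This is only a sketch under stated assumptions: the graph is assumed acyclic, the goesWith rule is a hypothetical extra edge added to create a shared node, and functools.lru_cache stands in for whatever caching the authors actually used.

```python
from functools import lru_cache

# Implication graph stored in reverse: consequent -> antecedents that imply it.
IMPLIED_BY = {
    ("hasColor", "white"): (("Type", "WhiteWine"), ("Type", "Riesling")),
    ("goesWith", "fish"): (("hasColor", "white"),),  # hypothetical rule
}

BASE_COLUMNS = frozenset({"Id", "Type", "Origin", "Maker", "Price"})


@lru_cache(maxsize=None)
def base_predicates(pred):
    """Return the base-column predicates that imply pred (assumes an acyclic graph)."""
    column, _value = pred
    if column in BASE_COLUMNS:
        return frozenset({pred})
    result = frozenset()
    for antecedent in IMPLIED_BY.get(pred, ()):
        # Each antecedent is resolved once; later requests hit the cache.
        result |= base_predicates(antecedent)
    return result


first = base_predicates(("goesWith", "fish"))
second = base_predicates(("hasColor", "white"))  # already cached by the first call
print(sorted(first))
print(base_predicates.cache_info())
```

Calling base_predicates for every virtual predicate ahead of time would correspond to the pre-computation item in the list above.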
Experiment
Implication Graph Density
Size of transitive property trees
Related Work
Ontology Tools
OntoEdit: uses a file system to store the ontology.
RStar, KAON: allow the ontology data to be stored in an RDB.
Two limitations of this loosely coupled approach:
DBMS users cannot reference ontology data directly.
Ontology related query processing cannot leverage the query processing and optimization power of a DBMS.
A recent advance in ontology management in DBMSs was introduced by Oracle.
Conclusion
A framework for putting a little semantics into relational SQL systems.
Users register ontologies in the DBMS and link them with relational data by creating virtual views.
Virtual columns in the virtual views are not materialized.
Queries on the virtual columns are rewritten into predicates on base table columns.
Future work: performance issues