29th November 2001
Graphs and Functions: Recurring Themes in Databases
Alex Poulovassilis
Databases
Databases store information of relevance to a group of users e.g.
• employees’ personal details, for a Personnel department
• employees’ income details, for a Payroll department
• details of molecular structure and interaction, for a Drug company
• details of TV broadcasts and ratings, for a TV company
Data models
The information stored in a database is expressed using a data model
The binary relational data model is a very simple data model
In this model, information is represented using entities and binary relationships between them
These can be represented as the nodes and edges of a graph
e.g. here is the schema of a ViewingFigures database:
Data and Schema
The schema of a database defines the type and format of the actual data; it is part of the database’s metadata
The data in the database conforms to the schema.
So a fragment of the ViewingFigures data might be:
The TriStarp Project
The TriStarp research project, led by Prof Peter King from the mid 1980s, aimed to
(1) develop repository technology for binary relational information
(2) develop languages for computing with this kind of information
Mir Derakhshan worked on (1). Carol Small and I worked on (2).
We were supported by CASE studentships from IBM UK Labs, with Prof Geoff Sharman and Norman Winterbottom as our industrial supervisors
Computing with Binary Relational Data
There are two natural candidates for this:
• logic languages - explored by Carol
• functional languages - the topic of my PhD research, resulting in the FDL language (1990)
The Logic Approach
• Find all actors who star in programme P205
stars(P205,x?)
[Diagram: query edge P205 --stars--> x?, matched against schema edge Programme --stars--> Actor]
The Logic Approach
• Find all programmes in which Kevin Bacon stars
stars(p?,'Kevin Bacon')
[Diagram: query edge p? --stars--> Kevin Bacon, matched against schema edge Programme --stars--> Actor]
The Logic Approach
• Find all actors who have starred with Kevin Bacon
stars(p?,'Kevin Bacon'),stars(p?,x?)
[Diagram: query edges p? --stars--> Kevin Bacon and p? --stars--> x?, matched against schema edge Programme --stars--> Actor]
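A minimal Python sketch of how the three logic queries above could be evaluated, assuming the stars relationship is held as a set of (programme, actor) pairs. The sample facts, the Var class, and the match and solve helpers are illustrative assumptions, not part of the original logic language.

```python
# Illustrative facts only; not data from the ViewingFigures database.
STARS = {
    ("P205", "Kevin Bacon"),
    ("P205", "Actor A"),
    ("P310", "Kevin Bacon"),
    ("P310", "Actor B"),
    ("P411", "Actor C"),
}


class Var:
    """A query variable such as x? or p? (hypothetical helper)."""

    def __init__(self, name):
        self.name = name


def match(term, value, binding):
    """Extend binding so that term equals value, or return None on failure."""
    if isinstance(term, Var):
        if term.name in binding:
            return binding if binding[term.name] == value else None
        return {**binding, term.name: value}
    return binding if term == value else None


def solve(goals, facts, binding=None):
    """Yield every variable binding that satisfies all goals (a conjunction)."""
    binding = {} if binding is None else binding
    if not goals:
        yield binding
        return
    (left, right), rest = goals[0], goals[1:]
    for programme, actor in facts:
        b = match(left, programme, binding)
        if b is not None:
            b = match(right, actor, b)
        if b is not None:
            yield from solve(rest, facts, b)


p, x = Var("p"), Var("x")
in_p205 = {b["x"] for b in solve([("P205", x)], STARS)}
bacon_progs = {b["p"] for b in solve([(p, "Kevin Bacon")], STARS)}
co_stars = {b["x"] for b in solve([(p, "Kevin Bacon"), (p, x)], STARS)}
```

In the third query the shared variable p? acts as the join between the two goals. Note that, as written, the answer also contains Kevin Bacon himself.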
The Functional Approach
The functional approach interprets binary relationships as functions, leading to the so-called functional data model
[Diagram: stars maps Programme to Actor; inv_stars maps Actor to Programme]
The Functional Approach
• Find all actors who star in programme P205
stars P205
The Functional Approach
• Find all programmes in which Kevin Bacon stars
inv_stars 'Kevin Bacon'
The Functional Approach
• Find all actors who have starred with Kevin Bacon
[x | p <- inv_stars 'Kevin Bacon'; x <- stars p]
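The same queries in the functional style can be sketched in Python, where a list comprehension plays the role of the comprehension above. This assumes stars and inv_stars are list-valued functions built from one table of pairs; the sample pairs and the two index dictionaries are illustrative, not part of FDL.

```python
from collections import defaultdict

# Illustrative (programme, actor) pairs only.
PAIRS = [
    ("P205", "Kevin Bacon"),
    ("P205", "Actor A"),
    ("P310", "Kevin Bacon"),
    ("P310", "Actor B"),
]

# Index the pairs in both directions so each relationship has an inverse.
_stars = defaultdict(list)
_inv_stars = defaultdict(list)
for programme, actor in PAIRS:
    _stars[programme].append(actor)
    _inv_stars[actor].append(programme)


def stars(programme):
    """Actors who star in the given programme."""
    return list(_stars.get(programme, []))


def inv_stars(actor):
    """Programmes in which the given actor stars."""
    return list(_inv_stars.get(actor, []))


# [x | p <- inv_stars 'Kevin Bacon'; x <- stars p]
co_stars = [x for p in inv_stars("Kevin Bacon") for x in stars(p)]
```

Because the result is a list, an actor who shares several programmes with Kevin Bacon appears once per programme; wrapping the comprehension in set(...) would remove the duplicates.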
More complex queries
Find the most popular programme showing at 10pm on 1st November, 2001:
let maxViewers = max [viewers s | s <- inv_date (1,11,2001);
                                  (start s) <= 2200; (end s) > 2200] in
[of s | s <- inv_viewers maxViewers]
Derived Functions
Find the most popular programme showing at time t on date d:
mostPopular t d =
  let maxViewers = max [viewers s | s <- inv_date d;
                                    (start s) <= t; (end s) > t] in
  [of s | s <- inv_viewers maxViewers]
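A hedged Python sketch of mostPopular, assuming each showing is a record with the attributes used above (of, date, start, end, viewers). The sample showings are invented for illustration. One deliberate difference from the slide: the final step filters the candidate showings rather than calling inv_viewers, so a showing on another date with the same viewing figure cannot slip into the answer.

```python
# Illustrative showings only; times are 24-hour integers as on the slide.
SHOWINGS = {
    "S1": {"of": "P205", "date": (1, 11, 2001), "start": 2100, "end": 2230,
           "viewers": 3_000_000},
    "S2": {"of": "P310", "date": (1, 11, 2001), "start": 2200, "end": 2300,
           "viewers": 4_500_000},
    "S3": {"of": "P411", "date": (1, 11, 2001), "start": 1900, "end": 2000,
           "viewers": 9_000_000},
}


def most_popular(t, d):
    """Programmes with the highest viewing figure among showings on air at time t on date d."""
    on_air = [s for s in SHOWINGS.values()
              if s["date"] == d and s["start"] <= t < s["end"]]
    if not on_air:
        return []
    max_viewers = max(s["viewers"] for s in on_air)
    return [s["of"] for s in on_air if s["viewers"] == max_viewers]
```

For example, most_popular(2200, (1, 11, 2001)) considers S1 and S2 only, since S3 has already finished by 10pm.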
Recursive functions
Find actors linked to Kevin Bacon via any number of edges labelled stars:
linkedTo ['Kevin Bacon']
where:
linkedTo result = let new = [x | y <- result;
                                 p <- inv_stars y;
                                 x <- stars p] in
                  if (subset new result)
                  then result
                  else linkedTo (new U result)
[Diagram: Programme --stars--> Actor, with linkedTo]
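The recursion above can be sketched directly in Python, assuming sets in place of lists so that subset and union are built in. The cast table is invented for illustration.

```python
# Illustrative cast lists only.
CASTS = {
    "P1": {"Kevin Bacon", "A"},
    "P2": {"A", "B"},
    "P3": {"C", "D"},
}


def stars(programme):
    """Actors in a programme."""
    return CASTS.get(programme, set())


def inv_stars(actor):
    """Programmes in which an actor appears."""
    return {p for p, cast in CASTS.items() if actor in cast}


def linked_to(result):
    """Grow result by one stars/inv_stars step until nothing new appears."""
    new = {x for y in result for p in inv_stars(y) for x in stars(p)}
    if new <= result:          # subset new result
        return result
    return linked_to(new | result)
```

Starting from {"Kevin Bacon"}, the result grows to include A and then B, while C and D stay unreachable. On large data a loop would be safer than recursion, since Python limits recursion depth.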
Oracle of Bacon at Virginia (www.cs.virginia.edu/oracle)
Bacon Number   No of People
0              1
1              1479
2              115203
3              285896
4              65055
5              4535
6              534
7              81
8              28
9              1
10             1
Total linkable actors: 472814
Higher-order functions
More generally:
linkedTo s = complete (stars,inv_stars) s
where:
complete (f,inv_f) result = let new = [x | b <- result;
                                           a <- inv_f b;
                                           x <- f a] in
                            if (subset new result)
                            then result
                            else complete (f,inv_f) (new U result)
[Diagram: A --f--> B, with linkedTo]
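A sketch of the higher-order version in Python, assuming the relationship and its inverse are passed in as ordinary functions. It is written as a loop instead of a recursion, which is a swapped-in formulation of the same fixpoint; the author table is invented for illustration.

```python
def complete(f, inv_f, start):
    """Closure of start under 'inv_f then f', for any relationship f."""
    result = set(start)
    while True:
        new = {x for b in result for a in inv_f(b) for x in f(a)}
        if new <= result:      # nothing new: fixpoint reached
            return result
        result |= new


# Illustrative authorship data only.
AUTHORS = {
    "paper1": {"P", "Q"},
    "paper2": {"Q", "R"},
    "paper3": {"S"},
}


def author(paper):
    """People who wrote a paper."""
    return AUTHORS.get(paper, set())


def inv_author(person):
    """Papers a person has written."""
    return {paper for paper, people in AUTHORS.items() if person in people}


linked = complete(author, inv_author, ["P"])
```

The same complete function serves the actor query by passing stars and inv_stars instead; only the pair of functions changes.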
Collaboration Networks
Find all people linked to a person P via the author relationship:
complete (author,inv_author) [P]
[Diagram: author maps Paper to Person; inv_author maps Person to Paper]
Acknowledgements…
If we ask the simpler query
[x | p <- inv_author 'Alexandra Poulovassilis'; x <- author p]
we obtain the people with whom I have co-authored research papers:
J.Bailey, K.Benkerimi, S.Courtenage, P.Demetriades, M.Derakhshan, B.Heydecker, S.Hild, P.J.H.King, M.Levene, N.Lorentzos, P.J.McBrien, P.Newson, E.Nonas, R.Offen, S.Reddi, S.Schwarz, C.Small, E.Tuv, P.T.Wood, L.Xu
Drawbacks of the Binary Relational Model
Despite its elegance, the binary relational model has some drawbacks:
(a) large binary relational schemas can be hard to understand
(b) it is not so natural for representing higher-dimensional relationships
The Hypernode Model
Drawback (a) led to research into nested-graph data models, with Mark Levene
Higher-dimensional relationships
An example of problem (b) is the 3-way relationship between
Distribution companies, Programmes and TV companies
which has to be represented by an entity and 3 binary relationships:
[Diagram: entity Supply linked by three binary relationships to DistrCo, Programme and TVCo]
The PFL Language
This led to the development of a new functional language PFL, with Carol Small, which directly supports higher-dimensional relationships
e.g. the supply relationship is accessed by a single selector function
|supply : (DistrCo,Programme,TVCo) -> [(DistrCo,Programme,TVCo)]
Some examples:
|supply (Any,P205,BBC)
|supply (Any,Any,BBC)
|supply (Any,P205,Any)
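A minimal Python sketch of a selector function with Any wildcards, assuming the supply relationship is stored as a list of 3-tuples. The ANY sentinel, the select helper and the sample tuples are illustrative assumptions, not PFL's implementation.

```python
ANY = object()  # wildcard standing in for PFL's Any

# Illustrative (DistrCo, Programme, TVCo) tuples only.
SUPPLY = [
    ("D1", "P205", "BBC"),
    ("D2", "P310", "BBC"),
    ("D1", "P205", "ITV"),
]


def select(relation, pattern):
    """Tuples of relation that agree with pattern at every non-wildcard position."""
    return [row for row in relation
            if all(want is ANY or want == got
                   for want, got in zip(pattern, row))]


def supply(pattern):
    """Selector for the 3-way supply relationship."""
    return select(SUPPLY, pattern)
```

Usage mirrors the slide: supply((ANY, "P205", "BBC")) finds who distributes P205 to the BBC, and supply((ANY, ANY, "BBC")) lists everything supplied to the BBC.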
Active Databases
Up to now, I have been looking at schema, data and derived database information
In the 1990s a new kind of database information was being explored, namely event-condition-action rules of the form:
on event if condition do action
ECA rules make a database active in that it can automatically execute actions if events occur and conditions hold
Active PFL
In a project during the mid 1990s, we extended PFL with ECA rules (with Swarup Reddi and Carol Small)
For example:
on insert viewers
if [s | (s,n) <- |viewersInc (Any,Any); n < 500000]
do insert [s | (s,n) <- |viewersInc (Any,Any); n < 500000] lowRated
[Diagram: Showing --viewers--> Number]
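A small Python sketch of the rule above, under the assumption that viewersInc denotes the tuples newly inserted into viewers by the triggering event. The ActiveDB class and its method names are hypothetical; they only illustrate the on/if/do pattern.

```python
class ActiveDB:
    """Toy active database: relations are sets, rules fire on insert."""

    def __init__(self):
        self.relations = {"viewers": set(), "lowRated": set()}
        self.rules = []  # (event relation, condition query, action)

    def on_insert(self, relation, condition, action):
        """Register: on insert <relation> if <condition> do <action>."""
        self.rules.append((relation, condition, action))

    def insert(self, relation, rows):
        inc = set(rows) - self.relations[relation]  # the increment
        self.relations[relation] |= inc
        for event_relation, condition, action in self.rules:
            if event_relation == relation:
                matched = condition(inc)
                if matched:             # condition holds if the query is non-empty
                    action(self, matched)


db = ActiveDB()
db.on_insert(
    "viewers",
    lambda inc: [s for (s, n) in inc if n < 500_000],
    lambda d, showings: d.insert("lowRated", showings),
)
db.insert("viewers", [("S1", 120_000), ("S2", 3_000_000)])
```

After the insert, only S1 is recorded in lowRated. The action is itself an insert, so it could trigger further rules, which is why the execution order of rules needs a precise semantics.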
PFL’s ECA rule execution semantics
We specified these in PFL itself, to experiment before implementing:
execSched (db,s) =
  if s = []
  then (db,[])
  else execSched (schedRules (exec (head s,db),s))
schedRules (db,a:s) =
  let (db,pre,suf) =
    fold schedRule (db,[],[]) (triggers a) in
  (db,pre ++ s ++ suf)
schedRule i (db,pre,suf) =
  if (eval (event-condition-query i) db) = {}
  then (db,pre,suf)
  else updateSched (actions i,mode i,db,pre,suf)
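A hedged Python transliteration of these three equations. The slide does not define exec, triggers, eval or updateSched, so the versions here are guesses: actions are single-tuple inserts, a rule is triggered by inserts into its event relation, and Immediate actions go to the front of the schedule while Deferred ones go to the back (consistent with the partial evaluation slides later in the talk). The rule table is invented, and execSched is written as a loop instead of a recursion.

```python
from functools import reduce

# Hypothetical rule table: one rule that flags low-rated showings.
RULES = {
    1: {
        "event": "viewers",
        "query": lambda db: [s for (s, n) in db["viewers"]
                             if n < 500_000 and s not in db["lowRated"]],
        "actions": lambda rows: [("lowRated", s) for s in rows],
        "mode": "Immediate",
    },
}


def exec_action(action, db):
    """exec: apply one insert action, returning a new database."""
    relation, row = action
    return {**db, relation: db[relation] | {row}}


def triggers(action):
    """Rule ids triggered by an action, in priority order."""
    return [i for i, r in sorted(RULES.items()) if r["event"] == action[0]]


def sched_rule(state, i):
    """schedRule: schedule rule i's actions if its query is non-empty."""
    db, pre, suf = state
    rows = RULES[i]["query"](db)
    if not rows:
        return (db, pre, suf)
    new = RULES[i]["actions"](rows)
    if RULES[i]["mode"] == "Immediate":
        return (db, pre + new, suf)
    return (db, pre, suf + new)


def sched_rules(db, schedule):
    """schedRules: drop the executed head, add actions of triggered rules."""
    a, s = schedule[0], schedule[1:]
    db, pre, suf = reduce(sched_rule, triggers(a), (db, [], []))
    return (db, pre + s + suf)


def exec_sched(db, s):
    """execSched: run until the schedule is empty."""
    while s:
        db, s = sched_rules(exec_action(s[0], db), s)
    return (db, [])
```

Running exec_sched on an empty database with two scheduled viewers inserts executes the first insert, then the lowRated insert it triggers, then the second insert.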
Analysing and Optimising ECA rules
Techniques are needed for analysing and optimising the behaviour of ECA rules
In a project that started in the late 1990s, we have been using the functional semantics of ECA rule execution as the basis for developing such techniques (with James Bailey, Simon Courtenage, Pete Newson)
In particular, we have been investigating abstract interpretation and partial evaluation of the rule execution semantics for analysis and optimisation, respectively.
Abstract execution semantics
execSched* (db*,s*) =
  if s* = []
  then (db*,[])
  else execSched* (schedRules* (exec* (head s*,db*),s*))
schedRules* (db*,a*:s*) =
  let (db*,pre*,suf*) =
    fold schedRule* (db*,[],[]) (triggers a*) in
  (db*,pre* ++ s* ++ suf*)
schedRule* i (db*,pre*,suf*) =
  if (eval* (event-condition-query i) db*) = False
  then (db*,pre*,suf*)
  else updateSched (actions i,mode i,db*,pre*,suf*)
Correctness of the Abstract Execution
If for all queries q, abstract databases db*, and abstract actions a*:
• conc (exec* (a*,db*)) is a superset of
  [exec (a,db) | (a,db) <- conc (a*,db*)]
• eval* q db* = False implies that
for all db in conc db*, eval q db = {}
then execSched* is a conservative test for
• rule termination
• rule unreachability
Partial Evaluation of Rule Execution
Produce a specialised equation for schedRules for each kind of rule action
that may appear at the head of the schedule:
schedRules (db,a1:s) =
let (db,pre,suf) =
fold schedRule (db,[],[]) (triggers a1) in
(db,pre ++ s ++ suf)
schedRules (db,a2:s) =
let (db,pre,suf) =
fold schedRule (db,[],[]) (triggers a2) in
(db,pre ++ s ++ suf) . . .
Partial Evaluation of Rule Execution
Suppose action a1 triggers rule 2 and rule 3 (in that order of priority).
Then we can replace triggers a1 above by [2,3] and apply fold
obtaining:
schedRules (db,a1:s) =
let (db,pre,suf) =
schedRule (schedRule (db,[],[]) 2) 3 in
(db,pre ++ s ++ suf)
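The unfolding step can be checked with a small Python sketch. It assumes fold is a left fold (functools.reduce) and uses a stand-in sched_rule that merely records which rule it was asked to consider; the real schedRule would evaluate the rule's query instead.

```python
from functools import reduce


def sched_rule(state, i):
    """Stand-in for schedRule: note that rule i was considered."""
    db, pre, suf = state
    return (db, pre + [f"rule {i}"], suf)


def sched_rules_general(db, triggered):
    """General form: fold schedRule over whatever rules are triggered."""
    return reduce(sched_rule, triggered, (db, [], []))


def sched_rules_a1(db):
    """Specialised form for action a1, whose triggered rules are known to be [2, 3]."""
    return sched_rule(sched_rule((db, [], []), 2), 3)
```

Both forms give the same answer for the list [2, 3], but the specialised one no longer needs the triggers lookup or the fold at run time, which is the point of partial evaluation here.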
Partial Evaluation of Rule Execution
Now we can apply schedRule (assuming rule 2 has Immediate scheduling mode and rule 3 Deferred scheduling mode):
schedRules (db,a1:s) =
  let (db,pre,suf) =
    if (eval (event-condition-query 2) db) = {}
    then if (eval (event-condition-query 3) db) = {}
         then (db,[],[])
         else (db,[],bind (actions 3) db)
    else if (eval (event-condition-query 3) db) = {}
         then (db,bind (actions 2) db,[])
         else (db,bind (actions 2) db,bind (actions 3) db)
  in (db,pre ++ s ++ suf)
Heterogeneous Databases
So far, I have been discussing single databases
However, larger-scale applications may need to integrate information from several databases, possibly supporting different data models
To integrate information stored in such heterogeneous databases it is necessary to form a single, integrated schema
Conflicts may exist between the various source schemas, which must be removed by applying transformations to these schemas
Graphs for Schema Transformation
In work with Peter McBrien, started in the late 1990s, we have developed a general framework for transforming and integrating heterogeneous database schemas
We represent schemas expressed in higher-level data models, such as relational or object-oriented, in terms of a nested-graph data model, thus allowing us to transform between different data models
In our schema transformation framework, new schema constructs are defined using queries over existing constructs
In our framework, schema transformations are reversible, thus allowing query and data translation between schemas:
addClass Series [p | (p,S) <- category]
addClass Doc [p | (p,D) <- category]
addClass Film [p | (p,F) <- category]
addClass Prog [p | (p,c) <- category]
addSubClass Film Prog
addSubClass Doc Prog
addSubClass Series Prog
delRel category [(p,F) | p <- Film] U
                [(p,D) | p <- Doc] U
                [(p,S) | p <- Series]
addNode Series [p | (p,S) <- category]
addNode Doc [p | (p,D) <- category]
addNode Film [p | (p,F) <- category]
addNode Prog [p | (p,c) <- category]
addConstraint subset Film Prog
addConstraint subset Doc Prog
addConstraint subset Series Prog
delEdge category [(p,F) | p <- Film] U [(p,D) | p <- Doc] U [(p,S) | p <- Series]
delNode Programme Prog
delNode Category [F,D,S]
addNode Programme Prog
addNode Category [F,D,S]
addEdge category [(p,F) | p <- Film] U [(p,D) | p <- Doc] U [(p,S) | p <- Series]
delConstraint subset Film Prog
delConstraint subset Doc Prog
delConstraint subset Series Prog
delNode Series [p | (p,S) <- category]
delNode Doc [p | (p,D) <- category]
delNode Film [p | (p,F) <- category]
delNode Prog [p | (p,c) <- category]
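A simplified Python sketch of why carrying a query with every add and delete step makes a transformation reversible. It assumes a database is a dictionary from construct name to extent and omits the Prog node, the subset constraints and the Programme and Category nodes; the apply and reverse helpers and the sample data are illustrative, not the framework's implementation.

```python
def apply(steps, db):
    """Run a sequence of (op, name, query) steps over a copy of db."""
    db = dict(db)
    for op, name, query in steps:
        if op == "add":
            db[name] = query(db)   # extent defined by a query over existing constructs
        else:
            del db[name]           # the query records how to rebuild it
    return db


def reverse(steps):
    """Swap add and del, and run the steps in the opposite order."""
    flip = {"add": "del", "del": "add"}
    return [(flip[op], name, query) for op, name, query in reversed(steps)]


def by_category(code):
    """Query [p | (p,code) <- category]."""
    return lambda db: frozenset(p for p, c in db["category"] if c == code)


def rebuild_category(db):
    """Query that reconstructs category from the three classes."""
    return frozenset({(p, "F") for p in db["Film"]}
                     | {(p, "D") for p in db["Doc"]}
                     | {(p, "S") for p in db["Series"]})


# Illustrative data only.
source = {"category": frozenset({("p1", "F"), ("p2", "D"), ("p3", "S")})}

forward = [
    ("add", "Series", by_category("S")),
    ("add", "Doc", by_category("D")),
    ("add", "Film", by_category("F")),
    ("del", "category", rebuild_category),
]

target = apply(forward, source)
restored = apply(reverse(forward), target)
```

Applying the forward steps and then their reverse returns the original database, because the query attached to the deleted category edge is exactly what the reverse step needs to add it back.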
Query Translation
Given a transformation from a schema S1 to a schema S2, and a query Q on S1, we use the delete transformation steps to substitute for constructs of S1 which are not in S2 e.g. from the previous slide, the query
[title p | p <- Film U Doc]
on the schema with classes Film and Doc translates into
[title p | p <- [p | (p,F) <- category] U
               [p | (p,D) <- category]]
on the schema with the category relationship
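The substitution can be illustrated in Python by writing the query once against an extent lookup and then supplying either stored class extents or the defining queries over category. The titles, the data and the extent-lookup style are assumptions made for this sketch.

```python
# Illustrative data only.
TITLE = {"p1": "Title one", "p2": "Title two", "p3": "Title three"}
CATEGORY = {("p1", "F"), ("p2", "D"), ("p3", "S")}

# Extents as stored on the schema that has Film and Doc classes.
STORED = {"Film": {"p1"}, "Doc": {"p2"}}

# Defining queries taken from the transformation steps.
DEFINITIONS = {
    "Film": lambda: {p for p, c in CATEGORY if c == "F"},
    "Doc": lambda: {p for p, c in CATEGORY if c == "D"},
}


def query(extent):
    """[title p | p <- Film U Doc], written against an extent lookup."""
    return sorted(TITLE[p] for p in extent("Film") | extent("Doc"))


# Evaluate directly on the class schema ...
on_class_schema = query(lambda name: STORED[name])
# ... and on the category schema, substituting each class by its defining query.
on_category_schema = query(lambda name: DEFINITIONS[name]())
```

Both evaluations return the same titles, which is what query translation by substitution is meant to guarantee.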
Functions for Database Integration
In the formal specification of our framework, each schema transformation is a function
t : Database -> Database
where a database consists of schema+data
We are currently implementing our framework within the Automed project
We are planning to handle query language heterogeneity in Automed by translation into/from a functional intermediate query language
Future Research
Extending Automed to also handle materialised views and view updates, leading to a data warehousing approach to data integration
Data warehousing of genomic data (in collaboration with Profs Thornton, Orengo, Barton, and Drs Keller, Martin, Shepherd)
Moving beyond database integration and database dynamics to data integration on the Web and Web dynamics:
• handling XML data sources within Automed
• developing an ECA rule language for XML