A Brief History of DB Theory
description
Transcript of A Brief History of DB Theory
![Page 1: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/1.jpg)
1
A Brief History of DB Theory• Functional dependencies ---
Normalization• More dependencies: multivalued,
general• Universal relations• Acyclic hypergraphs• Logical query languages: Datalog
![Page 2: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/2.jpg)
2
Functional Dependencies• Setting: A relation = table with
column headers (schema ) and a finite number of rows (tuples ).
A B C0 1 20 3 4
![Page 3: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/3.jpg)
3
FD X ->Y
• A statement about which instances (finite sets of tuples) are “legal” for a given relation.
• If two tuples agree in all the attributes of set X , then then must also agree in all attributes of set Y.– Common case: X is the key of the
relation, Y is the other attributes.
![Page 4: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/4.jpg)
4
Issue #1: Inference
• Armstrong’s Axioms– X ->Y if Y is a subset of X– X ->Y implies XZ ->YZ (XZ , etc. =
“union”)– X ->Y and YZ ->W implies XZ ->W
• Closure test (Bernstein): given FD’s and a possible consequent X ->Y , use the FD’s to see what X determines, and then see if Y is contained therein.
![Page 5: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/5.jpg)
5
Issue #2: Redundancy
• Certain combinations of FD’s lead to redundancy and other “anomalies.”
• Example: B ->C is the only FD:A B C0 1 23 1 ?
• The FD lets us predict the value of ?.
![Page 6: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/6.jpg)
6
Issue #3: Normalization
• Eliminate redundancy by splitting schemas.
• A FD X ->Y allows us to split schema XYZ into XY and XZ .
• Without the FD, the decomposition would lack a lossless join : the ability to reconstruct the original XYZ relation from the decomposed relations.
![Page 7: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/7.jpg)
7
How I Met Moshe & Learned Database Theory
Tsichritzis
Bernstein Beeri
Ullman Vardi
![Page 8: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/8.jpg)
8
Why I Like Working With Vardi• Taste, selection of issues.• Best inventor of constructions.• Name comes after mine in
alphabet.
![Page 9: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/9.jpg)
9
Lossless Joins
• When relations are decomposed, do the pieces allow reconstruction of the original?– Only way: join the projected relations.– You always get back what you started
with.– Bad case is when you get more.
![Page 10: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/10.jpg)
10
ExampleA B C0 1 23 1 4
A B A B C B C0 1 0 1 4 1 23 1 3 1 2 1 4
0 1 23 1 4
![Page 11: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/11.jpg)
11
More Kinds of Dependencies• In essence, a dependency is any
predicate that tells whether a given set of tuples is OK for a given schema.
• The language chosen determines what constraints can be expressed and what we can decide about relations.– FD’s are just one, simple example
![Page 12: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/12.jpg)
12
Multivalued Dependencies
• These occur when a relation tries to connect one class of objects to independent sets from two other classes..
• Notation: X ->->Y means:– If two tuples agree on X , then we may
swap the Y components and get two tuples that are also in the relation.
![Page 13: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/13.jpg)
13
Example
• EmpID ->-> PhoneEmpID Addr Phone Project
If: 123 a1 p1 j1123 a2 p2 j2
Then: 123 a1 p2 j1123 a2 p1 j2
![Page 14: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/14.jpg)
14
MVD Origins
• Independent work of Delobel, Fagin, Zaniolo
• Inference of MVD’s + FD’s: Beeri, Fagin, and Howard.
![Page 15: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/15.jpg)
15
Generalized Dependencies• Equality-generating dependencies:
– If tuples with this pattern of equal symbols appear in a relation instance, then certain symbols must also be equal.
– Generalizes FD’s.
• Tuple-generating dependencies:– If tuples with this pattern of equal symbols
appear in a relation instance, then a certain tuple must also appear.
– Generalizes MVD’s.
![Page 16: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/16.jpg)
16
Example: EGD
• FD A ->B in schema ABC :A B Ca b1 c1
(Hypotheses)a b2 c2 b1 = b2
(Conclusion)
![Page 17: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/17.jpg)
17
Example: TGD
• MVD A ->->B in schema ABC :A B Ca b1 c1
(Hypotheses)a b2 c2a b1 c2 (Conclusion)
![Page 18: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/18.jpg)
18
Full Versus Embedded GD’s• Full GD’s have a conclusion with only
symbols that appear in the hypotheses.• Embedded GD’s may have new
symbols in the conclusion.– Existentially quantified.
• Embedded EGD makes no sense, but embedded TGD’s are quite interesting.
![Page 19: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/19.jpg)
19
The Chase for Inferring GD’s• Test whether a GD G follows from given
GD’s.• Start with R = the hypotheses of G .• Apply GD H by mapping the hypotheses
of H to some rows of R .– If so, infer the (mapped) conclusion of H ---
equate two symbols of R or add a tuple to R .
• If you eventually infer the conclusion of G , then G follows, else it does not.
![Page 20: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/20.jpg)
20
Example: If A ->B Then A ->->B• R has tuples (a,b1,c1) and
(a,b2,c2). Does it have (a,b1,c2)?• Apply A ->B .
– It’s hypotheses are the same as the two tuples of R , so surely they map.
– Lets us conclude b1=b2.– Thus, since R has (a,b2,c2), It surely
has (a,b1,c2).
![Page 21: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/21.jpg)
21
Decision Properties of GD’s• If all dependencies are full, no new
symbols ever appear, so the chase must terminate.– Either we prove the desired
conclusion, or R becomes a relation that provides a counterexample --- it satisfies all the given GD’s, but not the one we were trying to infer.
![Page 22: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/22.jpg)
22
Undecidability of Embedded GD’s• First undecidability results involved the
inference of untyped GD’s (symbols may appear in several columns).– Beeri and Vardi– Chandra, Lewis, and Makoswky
• Tough result is undecidability even when GD’s are typed (one column per symbol).– Vardi– Gurevich and Lewis
![Page 23: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/23.jpg)
23
History of GD’s
• Similar ideas were developed independently by several people:– Yannakakis and Papadimitriou– Beeri and Vardi– Fagin– Sadri and Ullman
![Page 24: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/24.jpg)
24
The Universal Relation
• Idea: attributes carry information regardless of how they are placed in relation schemas.
• One of the cool things about Moshe Vardi is the collection he keeps of re-inventions of this universal relation concept.– Typically touted by some HCI type as a
“wonderful new interface to databases.”
![Page 25: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/25.jpg)
25
UR Query Systems
Query
UR
StoredRelations
![Page 26: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/26.jpg)
26
Example• Stored relations: ES(Emp,Sal),
ED(Emp,Dept), DM(Dept,Mgr)• Universal relation: U(Emp,Sal, Dept,Mgr)• UR query: SELECT Mgr WHERE Emp=a• Translated to: SELECT Mgr FROM ED, DM WHERE Emp=a AND ED.Dept=DM.Dept
• U != join(ES, ED, DM); empty ES proves it.
![Page 27: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/27.jpg)
27
Theory of Universal Relations• Some were quite upset by the UR idea.
– Objection: the informal or ad-hoc way queries were translated from UR to stored relations.
• Codd: “Neither the collection of all base relations nor the collection of all views should be cast by the DBMS in the form of a `universal relation’ (in the Stanford University sense) [Vardi, 1988].”
![Page 28: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/28.jpg)
28
Some Early Approaches to the UR• Selected joins of stored relations.• Representative instance = pad stored
relations with nulls in missing attributes; then chase with given dependencies.– Contributions by Vassiliou, Honeyman,
Mendelzon, Sagiv, Yannakakis.
• Maximal objects = union of joins within “acyclic hypergraphs.”– Maier, Ullman
![Page 29: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/29.jpg)
29
Window Functions• Two stage process:
1. If query involves set of attributes X , compute the window function [X ] = some relation with schema X derived from the UR.2. Apply the query to [X ].
• Different UR definitions give different window functions, e.g., join-based, rep.-instance, maximal-object.
• From Maier, Ullman, Vardi [1984].
![Page 30: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/30.jpg)
30
Acyclic Hypergraphs
• Several different applications led to the identification of a class of database schemas (collection of relation schemas) with useful properties.– Sensible connections among stored relations
in maximal-object theory of UR.– Joins of many relations in time polynomial in
the size of the input and output (Yannakakis).
– Etc., etc.
![Page 31: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/31.jpg)
31
Hypergraphs
• Nodes (typically attributes)• (Hyper)edges = sets of any
number of nodes.
A B C D E F
![Page 32: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/32.jpg)
32
GYO Reduction• Graham, Yu-Ozsuyoglu, independently
defined acyclic hypergraphs thusly: • Two transformations:
1. Delete a node that is in only one hyperedge.2. Delete a hyperedge contained in another.
• Hypergraph is acyclic iff the result is a single, empty edge.– Limit is unique for any hypergraph.
![Page 33: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/33.jpg)
33
Example
• GYO steps: delete C; delete D; delete {B}; delete F; delete {B,E}; delete A; delete B; delete E;
A B C D E F
![Page 34: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/34.jpg)
34
History• Acyclic hypergraph idea from Bernstein
and Goodman.– But the definition is rather different.
• Hypergraph version from Fagin, Mendelzon, Beeri, Maier, others.– And don’t forget the GYO folks.
• Fagin defined several related-but-distinct concepts of “acyclic.”– This one is “alpha-acyclic.”
![Page 35: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/35.jpg)
35
Logical Query Languages
• Collections of Horn-clause rules without function symbols behave almost like SQL, but can do recursion.
• Conjunctive queries = single Horn-clause rules w/o function symbols have decidable containment (Chandra and Merlin).
![Page 36: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/36.jpg)
36
Conjunctive Queries
• Example: path(X,Y) :- edge(X,Z) & edge(Z,Y)
• Atoms in the body (hypothesis) refer to stored relations (EDB or “Extensional Database”).
• Atom in head (conclusion) is the result.• Q1 is contained in Q2 if for every
database D, Q1(D ) is a subset of Q2(D ).
![Page 37: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/37.jpg)
37
Containment Mappings
• Mapping from the variables of Q2 to the variables of Q1 that:
1. Turns the head of Q2 into the head of Q1.2. Turns every atom in the body of Q2 into some atom in the body of Q1.
• Containment mapping exists iff Q1 is contained in Q2.
![Page 38: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/38.jpg)
38
Example
Q1: answer(X,Y) :- e(X,Y) & e(Y,X)
Q2: answer(X,Y) :- e(X,W) & e(W,Z) & e(Z,Y)
![Page 39: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/39.jpg)
39
Datalog Programs
• Collection of conjunctive queries.• Some predicates are EDB (stored).• Other predicates are IDB
(intensional database; defined by the CQ’s only).
• One IDB predicate is the answer = least fixedpoint of the CQ’s.
![Page 40: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/40.jpg)
40
Example
path(X,Y) :- edge(X,Y)path(X,Y) :- path(X,Z) & edge(Z,Y)
• Value of path is all pairs of nodes such that there is a path from the first to the second according to EDB predicate edge .
![Page 41: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/41.jpg)
41
Optimization of Datalog Programs• Problem: often the query asks for
only a fraction of the answer.– e.g., find path(0,Y) for fixed node 0.
• Bottom-up methods (“seminaive”) evaluate the whole least-fixedpoint, throw most away.
• Top-down (goal seeking) can get stuck in left-recursion.
![Page 42: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/42.jpg)
42
Linear-Recursion Methods
• Recursion is linear if at most one IDB atom in the body.– Henschen-Naqvi.– Magic-sets for linear (Bancilhon, Maier,
Sagiv, Ullman).– Left- and right-linear special cases
(Naughton).– Conversion of nonlinear to linear
(Ramakrishnan, Sagiv, Ullman, Vardi).
![Page 43: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/43.jpg)
43
Magic Sets• Several similar techniques for rewriting or
executing Datalog programs in a way that avoids generation of useless facts.– Rohmer, Lescoeur, and Kerisit (earliest
exposition).– Beeri and Ramakrishnan (reordering of
atoms).– Sacca and Zaniolo (simplification of rules).– Dietrich and Warren (tabulation).– Vielle (query-subquery).
![Page 44: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/44.jpg)
44
Bounded Recursion
• Sometimes a recursive Datalog program is equivalent to a nonrecursive program.
• Sufficient conditions (Naughton, Ioannidis, Naughton and Sagiv).
• Undecidability (Gaifman, Mairson, Sagiv, and Vardi).
• Decidable cases (Vardi, Cosmodakis, Gaifman, Kanellakis, and Vardi)
![Page 45: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/45.jpg)
45
Containment for Datalog
• Undecidable if one Datalog program is contained in another (Shmueli).
• NP-complete whether a CQ is contained in a Datalog program.
• Triply exponential whether a Datalog program is contained in a CQ (Chaudhuri and Vardi).
![Page 46: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/46.jpg)
46
What Is It All Good For?
• Dependencies and normalization are now familiar to most CS graduates.– Difference between BCNF and 3NF.– Tradeoffs: lossless joins versus maintaining
dependencies.
• Magic-sets, recursion used in IBM’s DB/2.
• Universal-relation interfaces despite Codd.
![Page 47: A Brief History of DB Theory](https://reader035.fdocuments.in/reader035/viewer/2022081515/56812e35550346895d939bde/html5/thumbnails/47.jpg)
47
More “What Is It All Good For?”• Information integration . While
logical query languages have not caught on, logic has been used to specify how legacy databases are combined into a uniform whole.– Tsimmis (Papakonstantinou, Vassalos).– Information Manifold (Levy).– Infomaster (Duschka, Genesereth).