Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is...

33
Lu Chaojun, SJTU 1 Extended Relational Algebra

description

Why Bags? Efficient implementation –e.g. projection, union –Q: How to eliminate duplicates? Some queries use bags –e.g. Aggregate Find the average grades Lu Chaojun, SJTU 3

Transcript of Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is...

Page 1: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Lu Chaojun, SJTU 1

Extended Relational Algebra

Page 2: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Bag Semantics

• A relation (in SQL, at least) is really a bag (or multiset).– It may contain the same tuple more than once– There is no specified order (unlike a list).

• Select, project, and join work for bags as well as sets.– Just work on a tuple-by-tuple basis, and don't

eliminate duplicates.

Lu Chaojun, SJTU 2

Page 3: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Why Bags?

• Efficient implementation– e.g. projection, union– Q: How to eliminate duplicates?

• Some queries use bags– e.g. Aggregate

Find the average grades

Lu Chaojun, SJTU 3

Page 4: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Bag Union

• R S: Sum the times an element appears in the two bags, i.e. if t appears n/m times in R/S, then t appears n+m times in R S.

• Example{1,2,1}{1,2,3} = {1,1,1,2,2,3}.

4Lu Chaojun, SJTU

Page 5: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Bag Intersection

• R S: Take the minimum of the number of occurrences in each bag, i.e. t appears min(n,m) times in R S.

• Example{1,2,1} {1,2,3,3} = {1,2}.

5Lu Chaojun, SJTU

Page 6: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Bag Difference

• R S: Proper-subtract the number of occurrences in the two bags, i.e. t appears max(0, n m) times in R S.

• Example{1,2,1} {1,2,3,3} = {1}.

6Lu Chaojun, SJTU

Page 7: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Other Operators on Bags

• Projection, selection, product, join– No duplicate elimination

7Lu Chaojun, SJTU

Page 8: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Extensions to Relational Model

• Not a part of the formal relational model, but appear in real query languages like SQL.– Modification: insert, delete, update.– Aggregation: count, sum, average– Views– Null values

8Lu Chaojun, SJTU

Page 9: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Extended RA

• Duplicate-elimination operator• Sorting operator• Extended projection • Grouping-and-aggregation operator• Outerjoin operator

9Lu Chaojun, SJTU

Page 10: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Duplicate Elimination

(R) = relation with one copy of each tuple that appears one or more times in R.

10Lu Chaojun, SJTU

Page 11: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Aggregation Operators

• These are not relational operators; rather they summarize a column in some way.

• Five standard operators: Sum, Average, Count, Min, and Max.

11Lu Chaojun, SJTU

Page 12: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Grouping Operator

L(R), where L is a list of elements that are either– Individual (grouping) attributes or– Of the form (A), where is an aggregation

operator and A the attribute to which it is applied.

• Examplesno,AVG(grade)(SC)

12Lu Chaojun, SJTU

Page 13: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Grouping Operator(cont.)

L(R) is computed by:1. Group R according to all the grouping attributes

on list L.2. Within each group, compute (A), for each

element (A) on list L.3. Result is the relation that consists of one tuple for

each group. The components of that tuple are the values associated with each element of L for that group.

13Lu Chaojun, SJTU

Page 14: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Extended Projection

• Allow the columns in the projection to be functions of one or more columns in the argument relation.

• Examplename,2011 age(Student)

14Lu Chaojun, SJTU

Page 15: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Sorting

L(R) = list of tuples of R, ordered according to attributes on list L.

• Note that result type is outside the normal types (set or bag) for relational algebra.– Consequence: cannot be followed by other

relational operators.

15Lu Chaojun, SJTU

Page 16: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Outerjoin

• The normal join can lose information, because a tuple that doesn't join with any from the other relation becomes dangling.

• The null value can be used to pad dangling tuples so they appear in the join.

• Outerjoin operator: o • Variations: theta-outerjoin, left- and right-

outerjoin (pad only dangling tuples from the left (resp., right).

16Lu Chaojun, SJTU

Page 17: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

A Logic for Relations

Datalog

Lu Chaojun, SJTU 17

Page 18: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Introduction

• A query language for relational model may be based on– Algebra: relational algebra– Logic: relational calculus

e.g. Datalog

• More natural for recursive queries

18Lu Chaojun, SJTU

Page 19: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Predicates and Atoms

• RDB vs. Datalog RDB Datalog relation R( ) predicate R( ) attributes(tuples) arguments x schema R(X) (relational)atom R(x) tuple tR R(t) is TRUE

– R(x) is a boolean-valued function if x contains variables; proposition otherwise.

19Lu Chaojun, SJTU

Page 20: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Arithmetic Atoms

• Comparison between two arithmetic expressionsexp1 exp2– Predicate (exp1,exp2)– infinite and unchanging relation

20Lu Chaojun, SJTU

Page 21: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Datalog Rules

• ExampleHappy(sno)S(sno,n,a,d) AND SC(sno,cno,g) AND g>=95 AND C(cno,cn) AND cn=‘Database’

• Rules: Head Body– Head: relational atom– Body: AND of subgoals

Subgoal: atom or NOT atomAtom: P(arg), P is relation name or arithmetic predicate; arg may be

variable or constant : if

Or :-

21Lu Chaojun, SJTU

Page 22: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Datalog Rules (cont.)

• Query: a collection of one or more rules• Result: a relation appearing in rule heads

– Designate the intended answer when there are more than one relation in rule heads

22Lu Chaojun, SJTU

Page 23: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Meaning of Datalog Rules

• Meaning I:– Assign possible values to variables in the rule– If the assignment makes all the subgoals TRUE, then it

forms a tuple of the result relation.• Meaning II:

– Consider consistent assignment of tuples for each nonnegated, relational subgoals. (see safety)

– Then consider the negated, relational subgoals and the arithmetic subgoals, to see if the assignment of values to variables makes them all TRUE. If yes, a tuple is added to the result relation.

23Lu Chaojun, SJTU

Page 24: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Example: Meaning I

S(x,y)R(x,z) AND R(z,y) AND NOT R(x,y) Consider all possible assignments:R: A B 1. x=1, z=2 make R(x,z) TRUE 1 2 y=3 make R(z,y) TRUE 2 3 NOT R(x,y) TRUE thus add (1,3) to S;S: C D 2. x=2, z=3 make R(x,z) TRUE 1 3 no y make R(z,y) TRUE

24Lu Chaojun, SJTU

Page 25: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Example: Meaning II

S(x,y)R(x,z) AND R(z,y) AND NOT R(x,y) Consider consistent assignment of tuples:R: A B 1. t1 for R(x,z), t1 for R(z,y)

t1 1 2 2. t1 for R(x,z), t2 for R(z,y)

t2 2 3 3. t2 for R(x,z), t1 for R(z,y)

4. t2 for R(x,z), t2 for R(z,y)

S: C D 1 3 only case 2 is a consistent assignment

25Lu Chaojun, SJTU

Page 26: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Safety

Every variable in the rule must appear in some nonnegated relational subgoal.

• To make the result a finite relation.• Example: safety violation

1. S(x)R(y) x not in subgoal2. S(x)NOT R(x) x not in nonnegated subgoal3. S(x)R(y) AND x < y x not in relational subgoal

26Lu Chaojun, SJTU

Page 27: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Datalog Program -- Query

• A collection of rules• Predicates/Relations are divided into two classes:

– Extensional Relations/Predicates: stored in DB– Intensional Relations/Predicates: defined by

rules• EDB predicates can’t appear in the head, only in

body; IDB predicates can appear in head, body, or both.

27Lu Chaojun, SJTU

Page 28: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Datalog Rules Applied to Bags

• When there are no negated relational subgoals:– Meaning I for evaluating Datalog rules applies

to bags as well as sets– But for bags, Meaning II is simpler for

evaluating.• When there are negated relational subgoals:

– There is not a clearly defined meaning under the bag model.

28Lu Chaojun, SJTU

Page 29: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

From RA to Datalog

RS I(x) R(x) AND S(x)RS I(x) R(x) I(x) S(x)RS I(x) R(x) AND NOT S(x)A(R) I(a) R(a,b)

29Lu Chaojun, SJTU

Page 30: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

From RA to Datalog(cont.)

F(R) I(x) R(x) AND F

C1 AND C2(R) I(x) R(x) AND C1 AND C2

C1 OR C2(R) I(x) R(x) AND C1

I(x) R(x) AND C2RS I(x,y) R(x) AND S(y)R S I(x,y,z) R(x,y) AND S(y,z)

30Lu Chaojun, SJTU

Page 31: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Multiple Operations in Datalog

• Create IDB predicates for intermediate relations

• ExampleA(x,y,z) R(x,y,z) AND x > 10B(x,y,z) R(x,y,z) AND y = ‘ok’C(x,y,z) A(x,y,z) AND B(x,y,z)D(x,z) C(x,y,z)

31Lu Chaojun, SJTU

Page 32: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Expressive Power of Datalog

• Non-recursive Datalog = RA• Datalog simulates SQL SELECT-FROM-

WHERE without aggregation and grouping• Recursive Datalog is more powerful than

RA and SQL• None is full in expressive power (Turing

completeness)

32Lu Chaojun, SJTU

Page 33: Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

End