Murali Mani Relational Algebra. Murali Mani What is Relational Algebra? Defines operations (data...

28
Murali Mani Relational Algebra

Transcript of Murali Mani Relational Algebra. Murali Mani What is Relational Algebra? Defines operations (data...

Murali Mani

Relational Algebra

Murali Mani

What is Relational Algebra?

Defines operations (data retrieval) for relational model

SQL’s DML (Data Manipulation Language) has data retrieval facilities, which are equivalent to that of relational algebra.

SQL and relational algebra are not for complex operations; they support efficient, easy access of large data sets.

Murali Mani

Basics Relational Algebra is defined on bags, rather than

relations (sets). Bag or multiset allows duplicate values; but order is

not significant. We can write an expression using relational algebra

operators with parentheses Need closure – an operator on bag returns a bag. Relational algebra includes set operators, and other

operators specific to relational model.

Murali Mani

Set Operators Union, Intersection, Difference, cross product Union, Intersection and Difference are

defined only for union compatible relations. Two relations are union compatible if they

have the same set of attributes and the types (domains) of the attributes are the same.

Eg of two relations that are not union compatible: Student (sNumber, sName) Course (cNumber, cName)

Murali Mani

Union: Consider two bags R1 and R2 that are union-

compatible. Suppose a tuple t appears in R1 m times, and in R2 n times. Then in the union, t appears m + n times.

A B

1 2

3 4

1 2

R1

A B

1 2

3 4

5 6

R2 A B

1 2

1 2

1 2

3 4

3 4

5 6

R1 R2

Murali Mani

Intersection: ∩ Consider two bags R1 and R2 that are union-

compatible. Suppose a tuple t appears in R1 m times, and in R2 n times. Then in the intersection, t appears min (m, n) times.

A B

1 2

3 4

1 2

R1

A B

1 2

3 4

5 6

R2

A B

1 2

3 4

R1 ∩ R2

Murali Mani

Difference: - Consider two bags R1 and R2 that are union-

compatible. Suppose a tuple t appears in R1 m times, and in R2 n times. Then in R1 – R2, t appears max (0, m - n) times.

A B

1 2

3 4

1 2

R1

A B

1 2

3 4

5 6

R2

A B

1 2

R1 – R2

Murali Mani

Bag semantics vs Set semantics Union is idempotent for sets:

(R1 R2) R2 = R1 R2 Union is not idempotent for bags. Intersection is idempotent for sets and bags. Difference is idempotent for sets, but not for

bags. For sets and bags, R1 R2 = R1 – (R1 – R2).

Murali Mani

Cross Product (Cartesian Product): X Consider two bags R1 and R2. Suppose a tuple

t1 appears in R1 m times, and a tuple t2 appears in R2 n times. Then in R1 X R2, t1t2 appears mn times.

A B

1 2

1 2

R1

B C

2 3

4 5

4 5

R2 A R1.B R2.B C

1 2 2 3

1 2 2 3

1 2 4 5

1 2 4 5

1 2 4 5

1 2 4 5

R1 X R2

Murali Mani

Basic Relational Operations Select, Project, Join Select: denoted σC (R): selects the subset of

tuples of R that satisfies selection condition C. C can be any boolean expression, its clauses can be combined with AND, OR, NOT.

A B C

1 2 5

3 4 6

1 2 7

1 2 7

R σ(C ≥ 6) (R)

A B C

3 4 6

1 2 7

1 2 7

Murali Mani

Select

Select is commutative: σC2 (σC1 (R)) = σC1 (σC2

(R)) Select is idempotent: σC (σC (R)) = σC (R) We can combine multiple select conditions

into one condition. σC1 (σC2 (… σCn (R)…)) = σC1 AND C2 AND … Cn (R)

Murali Mani

Project: πA1, A2, …, An (R)

Consider relation (bag) R with set of attributes AR. πA1, A2, …, An (R), where A1, A2, …, An AR returns the tuples in R, but only with columns A1, A2, …, An.

A B C

1 2 5

3 4 6

1 2 7

1 2 8

R πA, B (R)

A B

1 2

3 4

1 2

1 2

Murali Mani

Project: Bag Semantics vs Set Semantics

For bags, the cardinality of R = cardinality of πA1, A2, …, An (R).

For sets, cardinality of R ≥ cardinality of πA1,A2,

…, An (R). For sets and bags

project is not commutative project is idempotent

Murali Mani

Natural Join: R ⋈ S Consider relations (bags) R with attributes

AR, and S with attributes AS. Let A = AR ∩ AS. R ⋈ S can be defined as

πAR – A, A, AS - A (σR.A1 = S.A1 AND R.A2 =S.A2 AND … R.An=S.An (R X S))where A = {A1, A2, …, An}The above expression says: select those tuples in R X S that agree in values for each of the A attributes, and project the resulting tuples such that we have only one value for each A attribute.

Murali Mani

Natural Join example

A B

1 2

1 2

R1

B C

2 3

4 5

4 5

R2

A B C

1 2 3

1 2 3

R1 ⋈ R2

Murali Mani

Theta Join: R ⋈C S Theta Join is similar to cartesian product,

except that we can specify any condition C. It is defined as

R ⋈

C S = (σC (R X S))

A B

1 2

1 2

R1

B C

2 3

4 5

4 5

R2

R1 ⋈

R1.B<R2.BR2

A R1.B R2.B C

1 2 4 5

1 2 4 5

1 2 4 5

1 2 4 5

Murali Mani

Outer Join: R ⋈o S Similar to natural join, however, if there is a

tuple in R, that has no “matching” tuple in S, or a tuple in S that has no matching tuple in R, then that tuple also appears, with null values for attributes in S (or R).

A B C

1 2 3

4 5 6

7 8 9

R1

B C D

2 3 10

2 3 11

6 7 12

R2

R1 ⋈o R2

A B C D

1 2 3 10

1 2 3 11

4 5 6 null

7 8 9 null

null 6 7 12

Murali Mani

Left Outer Join: R ⋈oLS

Similar to natural join, however, if there is a tuple in R, that has no “matching” tuple in S, then that tuple also appears, with null values for attributes in S (note: a tuple in S that has no matching tuple in R does not appear).

A B C

1 2 3

4 5 6

7 8 9

R1

B C D

2 3 10

2 3 11

6 7 12

R2

R1 ⋈o

L R2

A B C D

1 2 3 10

1 2 3 11

4 5 6 null

7 8 9 null

Murali Mani

Right Outer Join: R ⋈oRS

Similar to natural join, however, if there is a tuple in S, that has no “matching” tuple in R, then that tuple also appears, with null values for attributes in R (note: a tuple in R that has no matching tuple in S does not appear).

A B C

1 2 3

4 5 6

7 8 9

R1

B C D

2 3 10

2 3 11

6 7 12

R2

R1 ⋈o

R R2

A B C D

1 2 3 10

1 2 3 11

null 6 7 12

Murali Mani

Renaming: ρS(A1, A2, …, An) (R) Rename relation R to S, attributes of R are

renamed to A1, A2, …, An ρS (R) renames relation R to S, keeping the

attributes same.

B C D

2 3 10

2 3 11

6 7 12

R2

X C D

2 3 10

2 3 11

6 7 12

ρS(X, C, D) (R2)

SB C D

2 3 10

2 3 11

6 7 12

ρS (R2)

S

Murali Mani

Example: Introducing new relationsFind the semijoin of 2 relations R, S. Semijoin denoted R ⋉ S is defined as the tuples in R, such that for a tuple t1 in R, if there exists a tuple t2 in S, and t1 and t2 agree in all attributes common to R and S, then t1 appears in the result.

R1 = R S⋈R2 = πAR

(R1)

R ⋉ S = R2 R⋂

Murali Mani

Duplicate Elimination: (R)

Convert a bag to a set.

R

A B

1 2

3 4

1 2

1 2

(R)

A B

1 2

3 4

Murali Mani

Extended Projection: πL (R) Here L can be

An attribute (just like simple projection) An expression x → y, where x and y are names of

attributes, this renames attribute x to y. An expression E → z, where E is any expression

involving attributes, constants, and arithmetic and string operators. This has an attribute called z whose values are given by E.

B C D

2 3 10

2 3 11

6 7 12

R πB→A, C+D→X, C, D (R)

A X C D

2 13 3 10

2 14 3 11

6 19 7 12

Murali Mani

Aggregation operators MIN, MAX, COUNT, SUM, AVG AGGB (R) considers only non-null values of R.

R

A B

1 2

3 4

1 null

1 3

MINB (R)

2

MINB (R)

MAXB (R)

4

MAXB (R)

COUNTB (R)

3

COUNTB (R)

SUMB (R)

9

SUMB (R)

AVGB (R)

3

AVGB (R)

COUNT* (R)

4

COUNT* (R)

Murali Mani

Aggregation Operators MIN, MAX, SUM, AVG must be on any 1

attribute. COUNT can be on any 1 attribute or COUNT* (R)

An aggregation operator returns a bag, not a single value ! But SQL allows treatment as a single value.

A B

3 4

σB=MAXB (R) (R)

Murali Mani

Grouping Operator: GL, AL (R) GL, AL (R) groups all attributes in GL, and

performs the aggregation specified in AL.

title year starName

SW1 77 HF

Matrix 99 KR

6D&7N 93 HF

SW2 79 HF

Speed 94 KR

StarsIn

starName, MIN (year)→year, COUNT(title) →num (StarsIn)

starName year num

HF 77 3

KR 94 2

Murali Mani

Sorting Operator: L (R)

It sorts the tuples in R. If L is list A1, A2, …, An, it first sorts by A1, then by A2, and so on.

Sort is used as a last operator in an expression.

A B C

1 2 5

3 1 6

1 2 7

1 3 8

RA B C

1 2 5

1 2 7

1 3 8

3 1 6

A,B (R)

Murali Mani

Relational Algebra Operators

Set Operators: Union, Intersection, Difference, Cartesian Product

Select, Project Join: Natural Join, Theta Join, (Left/Right)

Outer Join Renaming, Duplicate Elimination Aggregation: MIN, MAX, COUNT, SUM, AVG Grouping, Sorting