Lecture No 16: Relational Algebra I. Asma Ahmad. March 10, 2011. Database Systems.


1

Lecture No 16
Asma Ahmad

Relational Algebra I
March 10, 2011

Database Systems

2

Relational Algebra

Relational Query Languages
Formal Relational Query Languages
Operations in Relational Algebra
Selection
Projection
Cross Product
Renaming
Union, Intersection, Set Difference
Join and its types
Division
Grouping & Aggregate Functions
Examples
Summary

3

Relational Query Languages

Query languages: allow manipulation and retrieval of data from a database.

Relational QLs are simple & powerful.
Strong formal foundation based on logic.
Allows for much optimization.

Query languages != programming languages!
Not intended for complex calculations.
Support easy, efficient access to large data sets.

4

Formal Relational Query Languages

Two mathematical query languages form the basis for “real” languages (e.g., SQL), and for implementation:

1. Relational algebra: more operational, very useful for understanding meanings of queries and for representing execution plans.

2. Relational calculus: lets users describe what they want, rather than how to compute it.

Difference

The relational algebra might suggest these steps to retrieve the phone numbers and names of book stores that supply Some Sample Book:

1. Join books and titles over the BookstoreID.

2. Restrict the result of that join to tuples for the book Some Sample Book.

3. Project the result of that restriction over StoreName and StorePhone.

The relational calculus would formulate the query in a descriptive, declarative way: Get StoreName and StorePhone for supplies such that there exists a title BK with the same BookstoreID value and with a BookTitle value of Some Sample Book.

The relational algebra and the relational calculus are essentially logically equivalent: for any algebraic expression, there is an equivalent expression in the calculus, and vice versa.

5

6

Relational Algebra

Basic Operations:

Selection (σ): choose a subset of rows.
Projection (π): choose a subset of columns.
Cross Product (×): combine two tables.
Union (∪): unique tuples from either table.
Set difference (−): tuples in R1 not in R2.
Renaming (ρ): change names of tables & columns.

Additional Operations (for convenience): intersection, joins (very useful), division, outer joins, aggregate functions, etc.

7

Selection

Format: σ_{selection-condition}(R). Choose tuples that satisfy the selection condition. Result has identical schema as the input.

Example: σ_{Major = 'CS'}(Students)

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS

Selection condition is a Boolean expression including =, ≠, <, ≤, >, ≥, and, or, not.
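A minimal sketch of selection in Python, using the Students table from this slide. Modeling a relation as a list of dicts and the helper name `select` are assumptions made for illustration; they are not part of the lecture.

```python
# A relation is modeled as a list of dicts (an assumption for this sketch).
students = [
    {"SID": 456, "Name": "John", "GPA": 3.4, "Major": "CS"},
    {"SID": 457, "Name": "Carl", "GPA": 3.2, "Major": "CS"},
    {"SID": 678, "Name": "Ken", "GPA": 3.5, "Major": "Math"},
]


def select(relation, condition):
    """Selection: keep the tuples for which condition(tuple) is true."""
    return [row for row in relation if condition(row)]


# Selection with the condition Major = 'CS' on Students.
cs_students = select(students, lambda t: t["Major"] == "CS")
for row in cs_students:
    print(row)
```

The result keeps every attribute of the input, which mirrors the rule that selection does not change the schema.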

8

Projection

Format: π_{attribute-list}(R). Retain only those columns in the attribute-list. Result must eliminate duplicates.

Example: π_{Major}(Students)

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
Major
CS
Math

Operations can be composed.

π_{Name, GPA}(σ_{Major = 'CS'}(Students))
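Projection with duplicate elimination, and its composition with selection, can be sketched in Python as follows. The list-of-dicts representation and the helper names `project` and `select` are assumptions made for illustration.

```python
students = [
    {"SID": 456, "Name": "John", "GPA": 3.4, "Major": "CS"},
    {"SID": 457, "Name": "Carl", "GPA": 3.2, "Major": "CS"},
    {"SID": 678, "Name": "Ken", "GPA": 3.5, "Major": "Math"},
]


def select(relation, condition):
    """Selection: keep the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    """Projection: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:  # duplicate elimination
            result.append(reduced)
    return result


# Projection of Students on Major: two CS rows collapse into one.
majors = project(students, ["Major"])

# Composition: project Name and GPA from the CS students only.
names_gpas = project(select(students, lambda t: t["Major"] == "CS"), ["Name", "GPA"])

print(majors)
print(names_gpas)
```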

9

Cross Product

Format: R1 × R2. Each row of R1 is paired with each row of R2.

Result schema consists of all attributes of R1 followed by all attributes of R2.

Problem: Columns may have identical names. Use notation R.A, or renaming attributes.

Only some rows make sense. Often need a selection to follow.

10

Example of Cross Product

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Awards
SID  Amount  Year
456  1500    1998
678  3000    2000

Students × Awards
SID  Name  GPA  Major  SID  Amount  Year
456  John  3.4  CS     456  1500    1998
456  John  3.4  CS     678  3000    2000
457  Carl  3.2  CS     456  1500    1998
457  Carl  3.2  CS     678  3000    2000
678  Ken   3.5  Math   456  1500    1998
678  Ken   3.5  Math   678  3000    2000
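The cross product of Students and Awards can be sketched in Python as below. Qualifying every attribute as Relation.Attribute is one way to handle the two SID columns; the helper name `cross_product` and the list-of-dicts representation are assumptions for illustration.

```python
students = [
    {"SID": 456, "Name": "John", "GPA": 3.4, "Major": "CS"},
    {"SID": 457, "Name": "Carl", "GPA": 3.2, "Major": "CS"},
    {"SID": 678, "Name": "Ken", "GPA": 3.5, "Major": "Math"},
]
awards = [
    {"SID": 456, "Amount": 1500, "Year": 1998},
    {"SID": 678, "Amount": 3000, "Year": 2000},
]


def cross_product(r1, name1, r2, name2):
    """Pair every tuple of r1 with every tuple of r2.

    Attribute names are qualified as Relation.Attribute so that
    identically named columns (here SID) do not collide.
    """
    return [
        {
            **{f"{name1}.{k}": v for k, v in a.items()},
            **{f"{name2}.{k}": v for k, v in b.items()},
        }
        for a in r1
        for b in r2
    ]


product = cross_product(students, "Students", awards, "Awards")

# Only some rows make sense, so a selection usually follows.
matching = [t for t in product if t["Students.SID"] == t["Awards.SID"]]

print(len(product))
for t in matching:
    print(t)
```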

11

Renaming

Format: ρ_{S}(R) or ρ_{S(A1, A2, …)}(R): change the name of relation R, and names of attributes of R.

Example: ρ_{CS_Students}(σ_{Major = 'CS'}(Students))

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

CS_Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS

12

Union, Intersection, Set Difference

Format: R1 ∪ R2 (R1 ∩ R2, R1 − R2). Return all tuples that belong to either R1 or R2 (to both R1 and R2; to R1 but not to R2).

Requirement: R1 and R2 are union compatible.
With same number of attributes.
Corresponding attributes have same domains.

Schema of result is identical to that of R1. May need renaming.

Duplicates are eliminated.

13

Examples of Set Operations

TAs
SID  Name  GPA   Major
456  John  3.4   CS
457  Carl  3.2   CS
678  Ken   3.5   Math

RAs
SID  Name  GPA   Major
456  John  3.4   CS
223  Bob   2.95  Ed

TAs ∪ RAs
SID  Name  GPA   Major
456  John  3.4   CS
457  Carl  3.2   CS
678  Ken   3.5   Math
223  Bob   2.95  Ed

TAs ∩ RAs
SID  Name  GPA   Major
456  John  3.4   CS

TAs − RAs
SID  Name  GPA   Major
457  Carl  3.2   CS
678  Ken   3.5   Math
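The three set operations on TAs and RAs map directly onto Python sets of tuples, which also eliminate duplicates automatically. In this sketch, comparing Python types per position stands in for the "same domains" check; that stand-in and the helper name `signature` are assumptions for illustration.

```python
# Each relation is a set of tuples with attributes (SID, Name, GPA, Major).
tas = {
    (456, "John", 3.4, "CS"),
    (457, "Carl", 3.2, "CS"),
    (678, "Ken", 3.5, "Math"),
}
ras = {
    (456, "John", 3.4, "CS"),
    (223, "Bob", 2.95, "Ed"),
}


def signature(relation):
    """Collect the (type, type, ...) shape of every tuple in the relation."""
    return {tuple(type(v) for v in row) for row in relation}


# Union compatibility: same number of attributes, same type per position.
union_compatible = signature(tas) == signature(ras)

union = tas | ras          # tuples in either relation
intersection = tas & ras   # tuples in both relations
difference = tas - ras     # tuples in TAs but not in RAs

print(union_compatible)
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```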

14

Joins

Theta Join.
Format: R1 ⋈_{join-condition} R2.
Returns tuples in σ_{join-condition}(R1 × R2).

Equijoin.
Same as Theta Join except the join-condition contains only equalities.

Natural Join.
Same as Equijoin except that equality conditions are on common attributes and duplicate columns are eliminated.

15

Examples of Joins

Theta Join.

Students ⋈_{Students.Age <= Profs.Age} Profs

Students
SID  Name  GPA  Age  Prof
456  John  3.4  29   123
457  Carl  3.2  35   123
678  Ken   3.5  25   154

Profs
PID  Pname  Age  Dept
123  John   35   CS
154  Scott  28   Math

Result
SID  Name  GPA  Age  Prof  PID  Pname  Age  Dept
456  John  3.4  29   123   123  John   35   CS
457  Carl  3.2  35   123   123  John   35   CS
678  Ken   3.5  25   154   123  John   35   CS
678  Ken   3.5  25   154   154  Scott  28   Math

16

Examples of Joins (cont.)

Equijoin.

Students ⋈_{Prof = PID AND Name = Pname} Profs

Result
SID  Name  GPA  Age  Prof  PID  Pname  Age  Dept
456  John  3.4  29   123   123  John   35   CS

Natural Join.

Students ⋈ Profs

Result
SID  Name  GPA  Age  Prof  PID  Pname  Dept
457  Carl  3.2  35   123   123  John   CS
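The three join examples can be reproduced with a short Python sketch. A theta join is written as a selection over the cross product, and the natural join equates every common attribute (here only Age). The helper names and the list-of-dicts representation are assumptions for illustration.

```python
students = [
    {"SID": 456, "Name": "John", "GPA": 3.4, "Age": 29, "Prof": 123},
    {"SID": 457, "Name": "Carl", "GPA": 3.2, "Age": 35, "Prof": 123},
    {"SID": 678, "Name": "Ken", "GPA": 3.5, "Age": 25, "Prof": 154},
]
profs = [
    {"PID": 123, "Pname": "John", "Age": 35, "Dept": "CS"},
    {"PID": 154, "Pname": "Scott", "Age": 28, "Dept": "Math"},
]


def theta_join(r1, r2, condition):
    """Return the (tuple1, tuple2) pairs of the cross product that satisfy condition."""
    return [(a, b) for a in r1 for b in r2 if condition(a, b)]


def natural_join(r1, r2):
    """Equate all common attributes; each common column appears once."""
    result = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                result.append({**a, **b})
    return result


# Theta join on Students.Age <= Profs.Age.
theta = theta_join(students, profs, lambda s, p: s["Age"] <= p["Age"])

# Equijoin: the condition contains only equalities.
equi = theta_join(
    students, profs, lambda s, p: s["Prof"] == p["PID"] and s["Name"] == p["Pname"]
)

# Natural join: Age is the only attribute the two relations share.
natural = natural_join(students, profs)

print([(s["SID"], p["PID"]) for s, p in theta])
print([(s["SID"], p["PID"]) for s, p in equi])
print(natural)
```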

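Relational Algebra Defined: Where is it in DBMS?

Inside the DBMS, a query passes through these forms, in order:
SQL → relational algebra expression → optimized relational algebra expression → query execution plan → executable code

Components involved: parser, query optimizer, code generator.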

18

Dangling Tuples in Join

Usually, only a subset of tuples of each relation will actually participate in a join.

Tuples of a relation not participating in a join are dangling tuples.

How do we keep dangling tuples in the result of a join? (Why do we want to do that?) Use null values to indicate a “no-join” situation.

19

Outer Joins

Left Outer Join.
Format: R1 ⟕ R2. Similar to a natural join but keep all dangling tuples of R1.

Right Outer Join.
Format: R1 ⟖ R2. Similar to a natural join but keep all dangling tuples of R2.

(Full) Outer Join.
Format: R1 ⟗ R2. Similar to a natural join but keep all dangling tuples of both R1 & R2.

20

Examples of Outer Joins

Left Outer Join.

Students ⟕ Awards

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Awards
SID  Amount  Year
456  1500    1998
678  3000    2000

Result
SID  Name  GPA  Major  Amount  Year
456  John  3.4  CS     1500    1998
457  Carl  3.2  CS     Null    Null
678  Ken   3.5  Math   3000    2000

21

Join: An Observation

Some tuples don't contribute to the result; they get lost.

Employee  Department
Brown     A
Jones     B
Smith     B

Department  Head
B           Black
C           White

Result of the join:
Employee  Department  Head
Jones     B           Black
Smith     B           Black

22

Outer Join

An outer join extends those tuples with null values that would get lost by an (inner) join.

The outer join comes in three versions:
left: keeps the tuples of the left argument, extending them with nulls if necessary
right: ... of the right argument ...
full: ... of both arguments ...

23

Left Outer Join

Employee
Employee  Department
Brown     A
Jones     B
Smith     B

Department
Department  Head
B           Black
C           White

Employee ⟕ Department (left outer join)
Employee  Department  Head
Brown     A           null
Jones     B           Black
Smith     B           Black

24

Right Outer Join

Employee
Employee  Department
Brown     A
Jones     B
Smith     B

Department
Department  Head
B           Black
C           White

Employee ⟖ Department (right outer join)
Employee  Department  Head
Jones     B           Black
Smith     B           Black
null      C           White

25

Full Outer Join

Employee
Employee  Department
Brown     A
Jones     B
Smith     B

Department
Department  Head
B           Black
C           White

Employee ⟗ Department (full outer join)
Employee  Department  Head
Brown     A           null
Jones     B           Black
Smith     B           Black
null      C           White
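The three outer join variants on the Employee and Department tables can be sketched with one Python function. The function name `outer_join`, its `keep_left` and `keep_right` flags, and the use of `None` for null are assumptions for illustration; the sketch also assumes both relations are non-empty so that their schemas can be read from the first tuple.

```python
employees = [
    {"Employee": "Brown", "Department": "A"},
    {"Employee": "Jones", "Department": "B"},
    {"Employee": "Smith", "Department": "B"},
]
departments = [
    {"Department": "B", "Head": "Black"},
    {"Department": "C", "Head": "White"},
]


def outer_join(r1, r2, keep_left=False, keep_right=False):
    """Natural join that optionally keeps dangling tuples, padded with None."""
    attrs1, attrs2 = list(r1[0]), list(r2[0])  # assumes non-empty relations
    common = [a for a in attrs1 if a in attrs2]
    schema = attrs1 + [a for a in attrs2 if a not in attrs1]
    result, matched_right = [], set()
    for a in r1:
        partners = [j for j, b in enumerate(r2) if all(a[k] == b[k] for k in common)]
        for j in partners:
            result.append({**a, **r2[j]})
        matched_right.update(partners)
        if keep_left and not partners:
            # Dangling tuple of r1: missing attributes become None (null).
            result.append({attr: a.get(attr) for attr in schema})
    if keep_right:
        for j, b in enumerate(r2):
            if j not in matched_right:
                # Dangling tuple of r2.
                result.append({attr: b.get(attr) for attr in schema})
    return result


inner = outer_join(employees, departments)
left = outer_join(employees, departments, keep_left=True)
right = outer_join(employees, departments, keep_right=True)
full = outer_join(employees, departments, keep_left=True, keep_right=True)

for name, rel in [("inner", inner), ("left", left), ("right", right), ("full", full)]:
    print(name, rel)
```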

26

Division

Format: R1 ÷ R2.

Restriction: Every attribute in R2 is in R1.

For R1(A1, ..., An, B1, ..., Bm) ÷ R2(B1, ..., Bm) and T = π_{A1, ..., An}(R1), return the subset of T, say W, such that every tuple in W × R2 is in R1.

W is the largest subset of T such that (W × R2) ⊆ R1.

27

An Example of Division

Takes ÷ CS_Req

Takes
SID  CNO
456  CS210
456  CS321
456  CS135
457  CS210
457  CS321
532  CS210
678  CS321

CS_Req
CNO
CS210
CS321

Result
SID
456
457

What is the meaning of this expression?

28

Division (Definition)

R ÷ S

Defines a relation over the attributes C that consists of the set of tuples from R that match the combination of every tuple in S.

Expressed using basic operations:
T1 ← π_{C}(R)
T2 ← π_{C}((S × T1) − R)
T ← T1 − T2
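The three-step definition can be checked against the Takes ÷ CS_Req example with Python sets. In this sketch C is the single attribute SID, and the product is built in (SID, CNO) order so that it lines up with the columns of Takes; that ordering and the variable names are assumptions for illustration.

```python
# Takes(SID, CNO) as a set of tuples; CS_Req(CNO) as a set of values.
takes = {
    (456, "CS210"), (456, "CS321"), (456, "CS135"),
    (457, "CS210"), (457, "CS321"),
    (532, "CS210"),
    (678, "CS321"),
}
cs_req = {"CS210", "CS321"}

# T1: project Takes on SID (all candidate students).
t1 = {sid for sid, _ in takes}

# (T1 x CS_Req) - Takes: required (SID, CNO) pairs that are missing from Takes.
missing = {(sid, cno) for sid in t1 for cno in cs_req} - takes

# T2: students who miss at least one required course.
t2 = {sid for sid, _ in missing}

# T = T1 - T2: students who took every course in CS_Req.
result = t1 - t2

print(sorted(t2))
print(sorted(result))
```

Read this way, the expression on the previous slide returns the students who have taken all of the courses listed in CS_Req.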

29

Example - Division

Identify all clients who have viewed all properties with three rooms.

(π_{clientNo, propertyNo}(Viewing)) ÷ (π_{propertyNo}(σ_{rooms = 3}(PropertyForRent)))

30

DIVISION (÷) : QUERIES THAT INCLUDE THE PHRASE “FOR ALL”.

Q. FIND ALL CUSTOMERS WHO HAVE AN ACCOUNT AT ALL BRANCHES LOCATED IN BROOKLYN.

TO OBTAIN ALL BRANCHES IN BROOKLYN:
R = π_{BRANCH-NAME}(σ_{BRANCH-CITY = "BROOKLYN"}(BRANCH))

TO OBTAIN ALL CUSTOMER-NAME, BRANCH-NAME PAIRS FOR WHICH THE CUSTOMER HAS AN ACCOUNT AT A BRANCH:
S = π_{CUSTOMER-NAME, BRANCH-NAME}(DEPOSIT)

NOW TO FIND CUSTOMERS WHO APPEAR IN S WITH EVERY BRANCH NAME IN R:
π_{CUSTOMER-NAME, BRANCH-NAME}(DEPOSIT) ÷ π_{BRANCH-NAME}(σ_{BRANCH-CITY = "BROOKLYN"}(BRANCH))

31

Grouping & Aggregate Functions

Format: group_attributes F aggregate_functions (r)
Partition a relation into groups.
Apply aggregate function to each group.
Output grouping and aggregation values, one tuple per group.

Ex: Major F count(SID), avg(GPA) (Students)

Students
SID  Name  GPA  Major
456  John  3.4  CS
457  Carl  3.2  CS
678  Ken   3.5  Math

Result
Major  count(SID)  avg(GPA)
CS     2           3.3
Math   1           3.5
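The partition, aggregate, and output steps can be sketched in Python as below. The helper names `group_aggregate` and `average`, and the way aggregates are passed as a dict of (attribute, function) pairs, are assumptions for illustration.

```python
from collections import defaultdict

students = [
    {"SID": 456, "Name": "John", "GPA": 3.4, "Major": "CS"},
    {"SID": 457, "Name": "Carl", "GPA": 3.2, "Major": "CS"},
    {"SID": 678, "Name": "Ken", "GPA": 3.5, "Major": "Math"},
]


def average(values):
    return sum(values) / len(values)


def group_aggregate(relation, group_attr, aggregates):
    """Partition by group_attr, then apply each aggregate to each group.

    aggregates maps an output column name to (input attribute, function).
    """
    groups = defaultdict(list)
    for row in relation:
        groups[row[group_attr]].append(row)  # partition into groups

    result = []
    for key, rows in groups.items():
        out = {group_attr: key}
        for name, (attr, func) in aggregates.items():
            out[name] = func([r[attr] for r in rows])
        result.append(out)  # one tuple per group
    return result


result = group_aggregate(
    students,
    "Major",
    {"count(SID)": ("SID", len), "avg(GPA)": ("GPA", average)},
)
for row in result:
    print(row)
```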

Sample Schema for Exercises

Student(ID, name, address, GPA, SAT)

Campus(location, enrollment, rank)

Apply(ID, location, date, major, decision)

Sample Queries

Find names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected.

π_{name, address}(σ_{GPA > 3.7 ∧ decision = 'No' ∧ major = 'CS'}(Student ⋈_{Student.ID = Apply.ID} Apply))

List name and address of all students who didn't apply anywhere.

π_{name, address}(Student ⋈_{Student.ID = Not_Apply.ID} ρ_{Not_Apply}(π_{ID}(Student) − π_{ID}(Apply)))
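Both sample queries can be traced step by step in Python. The three students and two applications below are hypothetical rows invented only to exercise the queries; the attribute names follow the sample schema.

```python
# Hypothetical sample data (not from the lecture).
students = [
    {"ID": 1, "name": "Ann", "address": "1 Oak St", "GPA": 3.9, "SAT": 1500},
    {"ID": 2, "name": "Bob", "address": "2 Elm St", "GPA": 3.5, "SAT": 1400},
    {"ID": 3, "name": "Cy", "address": "3 Fir St", "GPA": 3.8, "SAT": 1450},
]
applications = [
    {"ID": 1, "location": "North", "date": "2011-01-15", "major": "CS", "decision": "No"},
    {"ID": 2, "location": "North", "date": "2011-01-20", "major": "CS", "decision": "No"},
]

# Query 1: join Student and Apply on ID, select, then project.
joined = [{**s, **a} for s in students for a in applications if s["ID"] == a["ID"]]
selected = [
    t for t in joined
    if t["GPA"] > 3.7 and t["decision"] == "No" and t["major"] == "CS"
]
rejected_cs = {(t["name"], t["address"]) for t in selected}

# Query 2: IDs of students minus IDs in Apply, then join back for name and address.
not_apply = {s["ID"] for s in students} - {a["ID"] for a in applications}
never_applied = {(s["name"], s["address"]) for s in students if s["ID"] in not_apply}

print(rejected_cs)
print(never_applied)
```

The second query needs set difference because a selection over Apply can only see students who appear in Apply at least once.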

Another Database Schema

BRANCH-SCHEME (BRANCH-NAME, ASSETS, BRANCH-CITY)
CUSTOMER (CUSTOMER-NAME, STREET, CUSTOMER-CITY)
DEPOSIT (BRANCH-NAME, ACCOUNT-NUMBER, CUSTOMER-NAME, BALANCE)
BORROW-SCHEME = (BRANCH-NAME, LOAN-NUMBER, CUSTOMER-NAME, AMOUNT)
CLIENT (CLIENT-NAME, BANKER-NAME)

SELECT (σ) :

Q. SELECT THOSE TUPLES OF THE BORROW RELATION WHERE BRANCH IS “PERRYRIDGE”.
σ_{BRANCH-NAME = "PERRYRIDGE"}(BORROW)

Q. FIND ALL TUPLES IN WHICH AMOUNT BORROWED IS LESS THAN $1200.
σ_{AMOUNT < 1200}(BORROW)

WE CAN USE THE FOLLOWING OPERATIONS: =, ≠, <, ≤, >, ≥. ALSO, WE DENOTE AND BY ∧, OR BY ∨.

Q. FIND THOSE TUPLES PERTAINING TO LOANS OF MORE THAN $1200 MADE BY THE PERRYRIDGE BRANCH.
σ_{BRANCH-NAME = "PERRYRIDGE" ∧ AMOUNT > 1200}(BORROW)

PROJECT (π) :

Q. SHOW CUSTOMERS AND THEIR BORROWING BRANCH-NAME.
π_{BRANCH-NAME, CUSTOMER-NAME}(BORROW)

Q. FIND ALL THOSE CUSTOMERS WHO HAVE THE SAME NAME AS THEIR PERSONAL BANKER.
π_{CUSTOMER-NAME}(σ_{CUSTOMER-NAME = BANKER-NAME}(CLIENT))

CARTESIAN PRODUCT (×) :

Q. FIND ALL CLIENTS OF BANKER JOHNSON AND THE CITY IN WHICH THEY LIVE.
σ_{BANKER-NAME = "JOHNSON"}(CLIENT × CUSTOMER)

NOW THE CLIENT.CUSTOMER-NAME COLUMN CONTAINS ONLY CUSTOMERS OF BANKER JOHNSON.

TO FIND ALL THE CLIENTS OF BANKER JOHNSON, WE WRITE
σ_{CLIENT.CUSTOMER-NAME = CUSTOMER.CUSTOMER-NAME}(σ_{BANKER-NAME = "JOHNSON"}(CLIENT × CUSTOMER))

SINCE WE WANT ONLY CUSTOMER-NAME AND CUSTOMER-CITY, WE WRITE
π_{CLIENT.CUSTOMER-NAME, CUSTOMER-CITY}(σ_{CLIENT.CUSTOMER-NAME = CUSTOMER.CUSTOMER-NAME}(σ_{BANKER-NAME = "JOHNSON"}(CLIENT × CUSTOMER)))
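The product, selection, selection, projection sequence for the banker Johnson query can be traced in Python. The rows below are hypothetical sample data, and the CLIENT relation is given a CUSTOMER-NAME attribute as the queries on this slide use it; both are assumptions for illustration.

```python
# Hypothetical sample data (not from the lecture).
clients = [
    {"CUSTOMER-NAME": "Hayes", "BANKER-NAME": "Johnson"},
    {"CUSTOMER-NAME": "Turner", "BANKER-NAME": "Johnson"},
    {"CUSTOMER-NAME": "Lindsay", "BANKER-NAME": "Smith"},
]
customers = [
    {"CUSTOMER-NAME": "Hayes", "STREET": "Main", "CUSTOMER-CITY": "Harrison"},
    {"CUSTOMER-NAME": "Turner", "STREET": "Putnam", "CUSTOMER-CITY": "Stamford"},
    {"CUSTOMER-NAME": "Lindsay", "STREET": "Park", "CUSTOMER-CITY": "Pittsfield"},
]

# Step 1: CLIENT x CUSTOMER, kept as (client tuple, customer tuple) pairs.
product = [(c, k) for c in clients for k in customers]

# Step 2: select the pairs whose client is served by banker Johnson.
johnson = [(c, k) for c, k in product if c["BANKER-NAME"] == "Johnson"]

# Step 3: select the pairs in which both tuples describe the same customer.
same_person = [(c, k) for c, k in johnson if c["CUSTOMER-NAME"] == k["CUSTOMER-NAME"]]

# Step 4: project CLIENT.CUSTOMER-NAME and CUSTOMER-CITY.
answer = {(c["CUSTOMER-NAME"], k["CUSTOMER-CITY"]) for c, k in same_person}

print(len(product), len(johnson), len(same_person))
print(sorted(answer))
```

After step 2 each Johnson client is still paired with every customer, which is why the second selection is needed before projecting.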