Query Design


Page 1: Query Design

Query Design

Objectives of the Lecture :

• To learn a strategy for designing queries.

• To learn how to use relational algebra concepts to implement the strategy.

• To learn how to translate the resulting query design into SQL.

Page 2: Query Design

Relational Queries

A query has two parts :

• derive a relation whose contents are the answer to your query;

• retrieve that derived relation.

In SQL :

SELECT ........................................ ;

SELECT is the retrieve part : trivial ! The rest of the statement is the derivation of the relation.

Queries so far have retrieved relations whose derivations were relatively simple. Extremely sophisticated derivations - i.e. complex queries - can be written for relational DBs.

Page 3: Query Design

Query Design Strategy

1. Determine which relations in the DB hold relevant data :

• Which relation holds the Determinant data ?

• Which relation holds the Dependent data ?

2. Use relational algebra concepts to conceptually derive one relation that holds just the determinant & the dependent.

• This relation holds the answer to the query.

• Each algebra operator represents a conceptually natural manipulation of relations.

• Simpler to use than SQL clauses & phrases.

3. Translate design into SQL.

• Most (not all) relational DBMSs use SQL, so translate into SQL in order to execute query.

Page 4: Query Design

Example Database

It is not necessary to know the data values in any relation, or any integrity constraints except attribute data types.

Employee ( EmpNo, EName, Salary, M-S, DeptNo )

Dept ( DeptNo, DName, Budget, MgrNo )

Project ( ProjNo, Start, End )

Alloc ( ProjNo, EmpNo )

Page 5: Query Design

Example 1 : Determinant & Dependent

Query : Get the names & salaries of all married & widowed employees.

Determinant – who/what is it we want to know about ?

Married & widowed employees.

Dependent – what do we want to know about the ‘determinant’ ?

Names & salaries.

Q. In which relation(s) are the determinant & dependent ?

A. Both in relation Employee.

Need to form one relation from Employee with just the determinant & dependent data in.

Do this by ‘reducing’ Employee to just the relevant attributes & tuples. Get rid of the rest.

Page 6: Query Design

Example 1 : Derive One Relation

Employee

Restrict[ M-S = ‘M’ Or M-S = ‘W’ ] : get rid of all tuples except those containing the determinant (& dependent).

Project[ M-S, EName, Salary ] : get rid of all attributes except those containing the determinant & dependent.

Answer !

Page 7: Query Design

Example 1 : Convert to SQL

SELECT M_S, EName, Salary

FROM Employee

WHERE M_S = 'M' OR M_S = 'W' ;

Employee

Restrict[ M-S = ‘M’ Or M-S = ‘W’ ]

Project[ M-S, EName, Salary ]
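A minimal sketch of running the Example 1 query, assuming Python's built-in sqlite3 module stands in for the DBMS. The sample rows are illustrative assumptions (the single employee Bob and all EmpNo and DeptNo values are hypothetical), not data from the lecture.

```python
import sqlite3

# In-memory database with a few illustrative Employee rows (assumed data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(EmpNo TEXT, EName TEXT, Salary INTEGER, M_S TEXT, DeptNo TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("E1", "Jane", 3000, "M", "D1"),
        ("E2", "Ali", 3100, "M", "D1"),
        ("E3", "Sid", 4400, "W", "D2"),
        ("E4", "Bob", 2500, "S", "D2"),  # single: removed by the Restrict step
    ],
)

# Restrict becomes the WHERE clause; Project becomes the SELECT list.
# ORDER BY is added only to make the printed order predictable.
rows = conn.execute(
    "SELECT M_S, EName, Salary FROM Employee "
    "WHERE M_S = 'M' OR M_S = 'W' ORDER BY EmpNo"
).fetchall()
print(rows)  # [('M', 'Jane', 3000), ('M', 'Ali', 3100), ('W', 'Sid', 4400)]
```

Note the plain single quotes around 'M' and 'W' : typographic quotes copied from a slide are not valid SQL string delimiters.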

Page 8: Query Design

Example 1 : Result

M-S     EName     Salary
M       Jane      3000
M       Ali       3100
....    .......   ........
....    .......   ........
....    .......   ........
W       Sid       4400

Relation contains only the determinant & dependent.

Determinant : M-S

Dependent : EName, Salary

Page 9: Query Design

Example 2 : Determinant & Dependent

Query : Get the names, salaries & project numbers of employees who work on projects.

Determinant – who/what is it we want to know about ?

Employees who work on projects.

Dependent – what do we want to know about the ‘determinant’ ?

Names, salaries and project numbers.

Q. In which relation(s) are the determinant & dependent ?

A. Determinant in relation Alloc. Dependent in relations Employee and Alloc.

Need to form one relation from Employee and Alloc with just the determinant & dependent data in.

Do this by ‘merging’ Employee and Alloc together & ‘reducing’ the result to just the required attributes & tuples.

Page 10: Query Design

‘Merging’ Relations

1. ‘Horizontally’ : Join

A relation with attributes ( A, B, C, D ) joined with a relation with attributes ( D, E ) ==> one relation with attributes ( A, B, C, D, E ).

2. ‘Vertically’ : Union

A relation with attributes ( A, B, C ) unioned with another relation with attributes ( A, B, C ) ==> one relation with attributes ( A, B, C ) holding the tuples of both.

Example needs a horizontal merge.
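The two kinds of merge can be tried out with a small sketch, assuming Python's built-in sqlite3 module. The relations R, S, T1 and T2 and their values are hypothetical; they simply mirror the attribute letters used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R  (A TEXT, B TEXT, C TEXT, D TEXT);
    CREATE TABLE S  (D TEXT, E TEXT);
    CREATE TABLE T1 (A TEXT, B TEXT, C TEXT);
    CREATE TABLE T2 (A TEXT, B TEXT, C TEXT);
    INSERT INTO R  VALUES ('a1', 'b1', 'c1', 'd1'), ('a2', 'b2', 'c2', 'd2');
    INSERT INTO S  VALUES ('d1', 'e1'), ('d2', 'e2');
    INSERT INTO T1 VALUES ('a1', 'b1', 'c1'), ('a2', 'b2', 'c2');
    INSERT INTO T2 VALUES ('a3', 'b3', 'c3');
""")

# Horizontal merge: the join on the shared attribute D gives wider tuples.
cur = conn.execute("SELECT * FROM R NATURAL JOIN S ORDER BY A")
join_columns = [d[0] for d in cur.description]
join_rows = cur.fetchall()

# Vertical merge: the union keeps the same attributes and stacks the tuples.
cur = conn.execute("SELECT * FROM T1 UNION SELECT * FROM T2 ORDER BY A")
union_columns = [d[0] for d in cur.description]
union_rows = cur.fetchall()

print(join_columns, len(join_rows))    # ['A', 'B', 'C', 'D', 'E'] 2
print(union_columns, len(union_rows))  # ['A', 'B', 'C'] 3
```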

Page 11: Query Design

Example 2 : Derive One Relation

Employee, Alloc

Join[ EmpNo ] : merge relations ‘horizontally’.

Project[ ProjNo, EName, Salary ] : get rid of all attributes except those containing the determinant & dependent.

Answer !

Page 12: Query Design

Example 2 : Convert to SQL

SELECT ProjNo, EName, Salary

FROM Employee NATURAL JOIN Alloc ;

Employee, Alloc

Join[ EmpNo ]

Project[ ProjNo, EName, Salary ]
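A minimal sketch of the Example 2 query, again assuming Python's built-in sqlite3 module. The rows are illustrative assumptions (the EmpNo values and the unallocated employee Bob are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee
        (EmpNo TEXT, EName TEXT, Salary INTEGER, M_S TEXT, DeptNo TEXT);
    CREATE TABLE Alloc (ProjNo TEXT, EmpNo TEXT);
    INSERT INTO Employee VALUES
        ('E1', 'Joan', 2900, 'M', 'D1'),
        ('E2', 'Uli',  3200, 'S', 'D1'),
        ('E3', 'Ryan', 4400, 'W', 'D2'),
        ('E4', 'Bob',  2500, 'S', 'D2');  -- works on no project
    INSERT INTO Alloc VALUES ('P2', 'E1'), ('P2', 'E2'), ('P4', 'E3');
""")

# Join[ EmpNo ] becomes NATURAL JOIN: EmpNo is the only attribute name
# that Employee and Alloc share. Project becomes the SELECT list.
rows = conn.execute(
    "SELECT ProjNo, EName, Salary "
    "FROM Employee NATURAL JOIN Alloc "
    "ORDER BY ProjNo, EName"
).fetchall()
print(rows)  # [('P2', 'Joan', 2900), ('P2', 'Uli', 3200), ('P4', 'Ryan', 4400)]
```

Bob does not appear in the result : he has no Alloc tuple to join with, which is exactly what ‘employees who work on projects’ requires.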

Page 13: Query Design

Example 2 : Result

ProjNo    EName     Salary
P2        Joan      2900
P2        Uli       3200
......    .......   ........
P4        Ryan      4400
......    .......   ........
......    .......   ........

Relation contains only the determinant & dependent.

Determinant : ProjNo

Dependent : EName, Salary

Page 14: Query Design

Example 3 : Determinant & Dependent

Query : Get the total salary of all married employees.

Determinant – who/what is it we want to know about ?

Married employees.

Dependent – what do we want to know about the ‘determinant’ ?

Salaries : but a calculation is needed to get the total !

Q. In which relation(s) are the determinant & dependent ?

A. Both in relation Employee.

Need to form one relation from Employee with just the determinant & calculated dependent data in.

Do this by ‘reducing’ Employee to just the required tuples, and calculating what is required.

Page 15: Query Design

Calculating Data

1. ‘Horizontally’ : Extend

A relation with attributes ( A, B, C, D ) ==> a relation with attributes ( A, B, C, D, E ), with the same number of tuples and one extra calculated attribute.

2. ‘Vertically’ : GroupBy

A relation with attributes ( A, B, C ) ==> a relation with attributes ( A, C ) and fewer tuples, each calculated from a group of the original tuples.

Example needs a vertical calculation.
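A small sketch of the two kinds of calculation, assuming Python's built-in sqlite3 module. Treating Extend as a calculated column in the SELECT list is an assumption about the translation, and the Annual attribute and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EName TEXT, Salary INTEGER, M_S TEXT);
    INSERT INTO Employee VALUES
        ('Jane', 3000, 'M'), ('Ali', 3100, 'M'), ('Sid', 4400, 'W');
""")

# Horizontal calculation (Extend): every tuple gains one calculated value.
extended = conn.execute(
    "SELECT EName, Salary, Salary * 12 AS Annual FROM Employee ORDER BY EName"
).fetchall()

# Vertical calculation (GroupBy): many tuples collapse into one per group.
grouped = conn.execute(
    "SELECT M_S, SUM(Salary) AS Total FROM Employee GROUP BY M_S ORDER BY M_S"
).fetchall()

print(extended)  # [('Ali', 3100, 37200), ('Jane', 3000, 36000), ('Sid', 4400, 52800)]
print(grouped)   # [('M', 6100), ('W', 4400)]
```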

Page 16: Query Design

Example 3 : Derive One Relation

Employee

Restrict[ M-S = ‘M’ ] : get rid of all tuples except those containing the determinant (& dependent).

GroupBy[ M-S ] With[ Total <-- Bag[Salary] Sum ] : calculate the dependent, & get rid of all other attributes except the determinant.

Answer !

Page 17: Query Design

Example 3 : Convert to SQL

SELECT M_S, SUM(ALL Salary) AS Total

FROM Employee

WHERE M_S = 'M'

GROUP BY M_S ;

Employee

Restrict[ M-S = ‘M’ ]

GroupBy[ M-S ] With[ Total <-- Bag[Salary] Sum ]

Page 18: Query Design

Example 3 : Result

M-S     Total
M       120,000

Relation contains only the determinant & dependent.

Determinant : M-S

Dependent : Total

Page 19: Query Design

Determinants

Always need a determinant and a dependent to design a query.

However, the determinant does not always need to be in the answer. Typically there are 2 cases where this arises :

1. If the determinant is a single value, then it can be left out of the answer, because the query designer knows the determinant to which the dependent data refers, and so doesn’t need it in the answer.

2. If the query designer is not interested in distinguishing between different determining values, then they can be left out, because they don’t matter.

Page 20: Query Design

Revised E.G. 1 : Determinant/Dependent

Query : Get the names & salaries of all married employees (previously : married & widowed).

Determinant – who/what is it we want to know about ?

Married employees (no longer married & widowed).

Dependent – what do we want to know about the ‘determinant’ ?

Names & salaries.

Q. In which relation(s) are the determinant & dependent ?

A. Both in relation Employee.

Need to form one relation from Employee with just the determinant & dependent data in.

Then remove the determinant data as well - it will have just one value, the ‘married’ value.

Page 21: Query Design

Revised E.G. 1 : Required Result

EName     Salary
Jane      3000
Ali       3100
.......   .....
.......   .....

Relation contains only dependent data.

No determinant because the query designer knows that all the dependent values now refer to ‘married’ people. Before, some were married and some were widowed, so they had to be distinguished.

Page 22: Query Design

Revised E.G. 1 : Derive One Relation

Employee

Restrict[ M-S = ‘M’ ] : get rid of all tuples not containing the single ‘married’ determinant value.

Project[ EName, Salary ] : get rid of all attributes except the dependent ones.

Page 23: Query Design

Revised E.G. 1 : Convert to SQL

SELECT EName, Salary

FROM Employee

WHERE M_S = 'M' ;

Employee

Restrict[ M-S = ‘M’ ]

Project[ EName, Salary ]

Page 24: Query Design

Revised E.G. 2 : Determinant/Dependent

Query : Get the names & salaries of employees who work on projects (project numbers are no longer required).

Determinant – who/what is it we want to know about ?

Employees who work on projects.

Dependent – what do we want to know about the ‘determinant’ ?

Names & salaries.

Q. In which relation(s) are the determinant & dependent ?

A. Determinant in relation Alloc. Dependent in relation Employee.

Form one relation from Employee and Alloc, by joining them. The result will contain all data about employees and their projects.

Then ‘reduce’ the result to just the dependent attributes.

Page 25: Query Design

Revised E.G. 2 : Required Result

Relation contains only dependent data.

EName     Salary
Joan      2900
Uli       3200
.......   .....
Ryan      4400
.......   .....
.......   .....

No determinant because the query designer is not interested in the particular projects that employees work on. Before, the individual projects were relevant, so they had to be distinguished.

Page 26: Query Design

Revised E.G. 2 : Derive One Relation

Employee, Alloc

Join[ EmpNo ] : merge relations ‘horizontally’.

Project[ EName, Salary ] : get rid of all attributes except the dependents.

Page 27: Query Design

Revised E.G. 2 : Convert to SQL

SELECT EName, Salary

FROM Employee NATURAL JOIN Alloc ;

Employee, Alloc

Join[ EmpNo ]

Project[ EName, Salary ]
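One point worth checking once the project number is dropped : an algebra Project yields a relation, which holds no duplicate tuples, whereas an SQL SELECT keeps duplicate rows unless DISTINCT is written. The sketch below, assuming Python's built-in sqlite3 module, shows the difference. Joan's allocation to two projects is a hypothetical assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmpNo TEXT, EName TEXT, Salary INTEGER);
    CREATE TABLE Alloc (ProjNo TEXT, EmpNo TEXT);
    INSERT INTO Employee VALUES ('E1', 'Joan', 2900), ('E2', 'Uli', 3200);
    -- Assumed data: Joan works on two projects, Uli on one.
    INSERT INTO Alloc VALUES ('P2', 'E1'), ('P4', 'E1'), ('P2', 'E2');
""")

# The query as written on the slide: one row per allocation.
with_duplicates = conn.execute(
    "SELECT EName, Salary FROM Employee NATURAL JOIN Alloc ORDER BY EName"
).fetchall()

# With DISTINCT, the result is a set of tuples, as in the algebra design.
without_duplicates = conn.execute(
    "SELECT DISTINCT EName, Salary "
    "FROM Employee NATURAL JOIN Alloc ORDER BY EName"
).fetchall()

print(with_duplicates)     # [('Joan', 2900), ('Joan', 2900), ('Uli', 3200)]
print(without_duplicates)  # [('Joan', 2900), ('Uli', 3200)]
```

Whether the repeated rows matter depends on what the query designer wants; DISTINCT is only needed if each employee should appear once.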

Page 28: Query Design

Revised E.G. 3 : Determinant/Dependent

Query : Get the total salary of all married employees.

Determinant : Married employees.

Dependent : Salaries, but totalled.

Q. In which relation(s) are the determinant & dependent ?

A. Both in relation Employee.

Need to form one relation from Employee with just the calculated dependent data in.

Do this by ‘reducing’ Employee to just the required tuples, and calculating what is required.

Don’t actually need to include ‘married’ in the result, since we know the employees referred to.

Page 29: Query Design

Revised E.G. 3 : Required Result

Total
120,000

Relation contains only the calculated dependent.

No determinant because the query designer knows that the dependent result refers to ‘married’ people. Didn’t really need to include the ‘married’ marital-status value before.

Page 30: Query Design

Revised E.G. 3 : Derive One Relation

Employee

Restrict[ M-S = ‘M’ ] : get rid of all tuples except those referring to the ‘married’ determinant.

GroupBy[ ] With[ Total <-- Bag[Salary] Sum ] : calculate the total from the whole result of the restriction, because it only holds data about ‘married’ employees.

Page 31: Query Design

Revised E.G. 3 : Convert to SQL

SELECT SUM(ALL Salary) AS Total

FROM Employee

WHERE M_S = 'M' ;

Employee

Restrict[ M-S = ‘M’ ]

GroupBy[ ] With[ Total <-- Bag[Salary] Sum ] : grouping on nothing ! So there is no GROUP BY phrase in the SQL.
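The two versions of the total-salary query can be compared with a short sketch, assuming Python's built-in sqlite3 module and illustrative rows. SUM(Salary) is written without ALL here, since ALL is the default in standard SQL; the helper name run_both is hypothetical.

```python
import sqlite3

def run_both(employees):
    """Run Example 3 (with GROUP BY) and Revised E.G. 3 (without) on the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EName TEXT, Salary INTEGER, M_S TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", employees)
    grouped = conn.execute(
        "SELECT M_S, SUM(Salary) AS Total FROM Employee "
        "WHERE M_S = 'M' GROUP BY M_S"
    ).fetchall()
    ungrouped = conn.execute(
        "SELECT SUM(Salary) AS Total FROM Employee WHERE M_S = 'M'"
    ).fetchall()
    conn.close()
    return grouped, ungrouped

# Assumed sample data: two married employees and one widowed employee.
with_married = run_both([("Jane", 3000, "M"), ("Ali", 3100, "M"), ("Sid", 4400, "W")])
# Assumed sample data: no married employees at all.
without_married = run_both([("Sid", 4400, "W")])

print(with_married)     # ([('M', 6100)], [(6100,)])
print(without_married)  # ([], [(None,)])
```

The second call shows a difference between the two designs : with GROUP BY, no married employees means no groups and an empty result; without it, SQL still returns one row, holding a null total.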

Page 32: Query Design

Dependents

Always need a determinant and a dependent to design a query.

If there are queries where only the dependent needs to be in the result, are there queries where only the determinant needs to be in the result ?

The answer is, not very often.

The typical case is a query to discover if a determinant exists. Such a query typically requires an answer of yes/no or true/false.

Page 33: Query Design

Example with No Dependent in Answer

Query : Are there any married employees ?

Determinant – who/what is it we want to know about ?

Married employees.

Dependent – what do we want to know about the ‘determinant’ ?

Nothing ! Only whether they exist.

Q. In which relation(s) is the determinant ?

A. Relation Employee : contains data for all employees, including their marital-status.

Need to form one relation from Employee with just the determinant data in.

Do this by ‘reducing’ Employee to just the marital-status attribute with tuples whose marital-status = ‘married’.

Page 34: Query Design

Required Result : First Attempt

Relation(s) contain only the determinant. Dependent is of no interest.

Answer if there are ‘married’ employees :

M-S
M
M
M
M
M

OR

Answer if there are no ‘married’ employees :

M-S
(no tuples)

Page 35: Query Design

Result Really Required

Yes

OR

No

because the previous Yes answer could be very large (!) and the previous No answer could be misleading.

Unfortunately, SQL has no easy way to derive Yes/No answers. So design the query to deliver the previous answer(s).

Page 36: Query Design

Derive One Relation

Employee

Restrict[ M-S = ‘M’ ] : get rid of all tuples except those containing the determinant value ‘married’.

Project[ M-S ] : get rid of all attributes except the determinant attribute, marital-status. Do we need this step ? Only to prevent a Yes answer from being too big.

Answer !

Page 37: Query Design

Convert to SQL

Employee

Restrict[ M-S = ‘M’ ]

Project[ M-S ]

SELECT M_S

FROM Employee

WHERE M_S = 'M' ;
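One way to turn this result into the Yes/No answer really required is to let the calling program test whether the derived relation is empty. The sketch below assumes Python's built-in sqlite3 module; the helper names make_db and any_married and the sample rows are hypothetical.

```python
import sqlite3

def make_db(employees):
    """Create an in-memory Employee table from (EName, M_S) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EName TEXT, M_S TEXT)")
    conn.executemany("INSERT INTO Employee VALUES (?, ?)", employees)
    return conn

def any_married(conn):
    """Answer 'Yes' if the designed query returns at least one tuple."""
    cur = conn.execute("SELECT M_S FROM Employee WHERE M_S = 'M'")
    # fetchone() asks for the first row only, so a large Yes answer
    # is never pulled into the program in full.
    return "Yes" if cur.fetchone() is not None else "No"

db_with = make_db([("Jane", "M"), ("Ali", "M"), ("Sid", "W")])
db_without = make_db([("Sid", "W")])

print(any_married(db_with))     # Yes
print(any_married(db_without))  # No
```

The SQL is unchanged from the design above; only the interpretation of an empty or non-empty result is added outside the query.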

Page 38: Query Design

Conclusion

Strategy :

• Determine which relations hold the determinant & dependent data.

• Use relational algebra to derive one relation that holds both.

• Decide whether determinant or dependent data can be pruned out.

• Convert to SQL.

Further Developments

• Suppose determinant and/or dependent data is itself split over 2 or more relations ? Combine them into one relation using the above approach. Then continue as before.

• May need both horizontal and vertical calculations in one query.

• Even more advanced queries can be built on these foundations. Design them with the same approach, applied recursively.