MATH IN SQL


AGGREGATION OPERATORS

Operators on sets of tuples.

Significant extension of relational algebra.

SUM ([DISTINCT] A): the sum of all (unique) values in attribute A.
AVG ([DISTINCT] A): the average of all (unique) values in attribute A.

SELECT AVG (DISTINCT S.age)
FROM Sailors S
WHERE S.rating = 10;

SELECT AVG (S.age)
FROM Sailors S;
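To make the DISTINCT variants concrete, here is a minimal sketch that runs the slide's queries through Python's built-in sqlite3 module. It assumes SQLite as the engine and reuses the six-sailor instance from the GROUP BY example later in this section; the CREATE TABLE definition is our own assumption.

```python
import sqlite3

# Sailors instance copied from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# The two queries from the slide.
(avg_distinct_r10,) = conn.execute(
    "SELECT AVG (DISTINCT S.age) FROM Sailors S WHERE S.rating = 10"
).fetchone()
(avg_all,) = conn.execute("SELECT AVG (S.age) FROM Sailors S").fetchone()

# DISTINCT matters once values repeat: horatio and rusty are both 35.0.
sum_all, sum_distinct, avg_distinct = conn.execute(
    "SELECT SUM (S.age), SUM (DISTINCT S.age), AVG (DISTINCT S.age) FROM Sailors S"
).fetchone()

print(avg_distinct_r10, avg_all, sum_all, sum_distinct, avg_distinct)
```

With this data the rating-10 sailors are 16.0 and 35.0 years old, so the first query yields 25.5, and SUM (DISTINCT S.age) counts the repeated age 35.0 only once (184.5 instead of 219.5).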


MAX (A): the maximum value in attribute A.
MIN (A): the minimum value in attribute A.

SELECT S.sname
FROM Sailors S
WHERE S.rating = (SELECT MAX (S2.rating)
                  FROM Sailors S2);

SELECT MAX(rating) FROM Sailors;
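A quick check of MAX and MIN, again a sketch assuming SQLite via Python's sqlite3 and the six-sailor instance from the GROUP BY example. The ORDER BY on the nested query is our addition, only to make the output order deterministic.

```python
import sqlite3

# Sailors instance copied from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Nested query from the slide: names of all sailors holding the highest rating.
top_rated = [
    name
    for (name,) in conn.execute(
        "SELECT S.sname FROM Sailors S "
        "WHERE S.rating = (SELECT MAX (S2.rating) FROM Sailors S2) "
        "ORDER BY S.sname"  # added only for a stable output order
    )
]

# Plain aggregates over the whole table.
(max_rating,) = conn.execute("SELECT MAX (rating) FROM Sailors").fetchone()
(min_age,) = conn.execute("SELECT MIN (age) FROM Sailors").fetchone()

print(top_rated, max_rating, min_age)
```

Two sailors share the top rating of 10 here, so the nested query returns both names rather than one.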


COUNT (*): the number of tuples.

SELECT COUNT (*)
FROM Sailors S;


COUNT ([DISTINCT] A): the number of (unique) values in attribute A.

SELECT COUNT (DISTINCT S.rating)
FROM Sailors S
WHERE S.sname = 'Bob';
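The two COUNT forms side by side, as a sketch assuming SQLite through Python's sqlite3 and the six-sailor instance from the GROUP BY example. That instance has no sailor named Bob, which happens to show that COUNT over zero qualifying tuples returns 0 rather than NULL.

```python
import sqlite3

# Sailors instance copied from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# COUNT (*) counts tuples; COUNT (DISTINCT A) counts unique values of A.
(n_tuples,) = conn.execute("SELECT COUNT (*) FROM Sailors S").fetchone()
(n_ratings,) = conn.execute("SELECT COUNT (DISTINCT S.rating) FROM Sailors S").fetchone()

# The slide's query: no tuple qualifies in this instance, so the count is 0.
(n_bob_ratings,) = conn.execute(
    "SELECT COUNT (DISTINCT S.rating) FROM Sailors S WHERE S.sname = 'Bob'"
).fetchone()

print(n_tuples, n_ratings, n_bob_ratings)
```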


Find the name and age of the oldest sailor(s).

The first query looks correct, but is illegal. Thoughts as to why?

The second query is a correct and legal solution.

SELECT S.sname, MAX (S.age)
FROM Sailors S;

SELECT S.sname, S.age
FROM Sailors S
WHERE S.age = (SELECT MAX (S2.age)
               FROM Sailors S2);
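A sketch of the legal query, assuming SQLite via Python's sqlite3 and the six-sailor instance from the GROUP BY example. SQLite happens to accept the first (illegal) query too, as a documented extension that fills bare columns from the row holding the MAX, so it is not a good tool for spotting that error; the sketch only runs the legal form. The extra tied sailor is hypothetical, added to show why the answer is "sailor(s)".

```python
import sqlite3

# Sailors instance copied from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# The correct query from the slide (ORDER BY added for a stable output order).
OLDEST = (
    "SELECT S.sname, S.age FROM Sailors S "
    "WHERE S.age = (SELECT MAX (S2.age) FROM Sailors S2) "
    "ORDER BY S.sname"
)
oldest = conn.execute(OLDEST).fetchall()

# Hypothetical extra sailor with the same maximum age: every tied sailor is returned.
conn.execute("INSERT INTO Sailors VALUES (99, 'hypothetical', 3, 55.5)")
oldest_with_tie = conn.execute(OLDEST).fetchall()

print(oldest, oldest_with_tie)
```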


GROUP BY AND HAVING

So far, we’ve applied aggregation operators to all (qualifying) tuples. Sometimes, we want to apply them to each of several groups of tuples.

Find the age of the youngest sailor for each rating value. Suppose we know that rating values go from 1 to 10; we can write ten (!) queries that look like this:

For i = 1, 2, ... , 10:

SELECT MIN (S.age)
FROM Sailors S
WHERE S.rating = i;

But in general, we don’t know how many rating values exist or what these rating values are. Plus, it’s a waste of time to write so many queries.
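The contrast between "one query per rating" and a single grouped query can be sketched as follows, assuming SQLite via Python's sqlite3 and the six-sailor instance from the GROUP BY example. The Python loop stands in for writing the ten queries by hand.

```python
import sqlite3

# Sailors instance copied from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# The "ten queries" approach: one query per assumed rating value i = 1..10.
per_rating = {}
for i in range(1, 11):
    (min_age,) = conn.execute(
        "SELECT MIN (S.age) FROM Sailors S WHERE S.rating = ?", (i,)
    ).fetchone()
    if min_age is not None:  # MIN over an empty set is NULL (None in Python)
        per_rating[i] = min_age

# The same answer from one GROUP BY query, with no need to know the rating values.
grouped = dict(
    conn.execute("SELECT S.rating, MIN (S.age) FROM Sailors S GROUP BY S.rating").fetchall()
)

print(per_rating == grouped)
```

Most of the ten queries return NULL here, because only ratings 1, 7, 8, and 10 occur; GROUP BY only produces groups that actually exist.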


A group is a set of tuples that have the same value for all attributes in grouping-list.

The target-list contains (i) attribute names and (ii) terms with aggregation operations.

The attribute list (i) must be a subset of grouping-list.

Each answer tuple corresponds to a group, and output attributes must have a single value per group.

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list
HAVING group-qualification

Notice the notation


CONCEPTUAL EVALUATION

Given:

SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1;

Step 1: The cross-product of relation-list is computed. In this instance, it’s only Sailors.


Step 2: Tuples that fail the qualification are discarded, and ‘unnecessary’ attributes are deleted.


Step 3: The remaining tuples are partitioned into groups by the value of the attributes in grouping-list.


Step 4: The group-qualification is then applied to eliminate groups that do not satisfy this condition.


Step 5: One answer tuple is generated per qualifying group by applying the aggregation operator.
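The five steps can be mirrored in plain Python as a sketch of the conceptual model only; real engines do not execute queries this way, and the optimizer is free to pick another plan. It uses the six-sailor instance from the example that follows and cross-checks the result against SQLite through Python's sqlite3. The function name conceptual_eval is hypothetical.

```python
import sqlite3
from collections import defaultdict

# Sailors instance copied from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

def conceptual_eval(sailors):
    """Hypothetical helper mirroring the five conceptual steps for the query above."""
    # Step 1: cross-product of relation-list; with only Sailors it is the table itself.
    rows = list(sailors)
    # Step 2: apply WHERE S.age >= 18 and keep only the needed attributes (rating, age).
    rows = [(rating, age) for (_sid, _sname, rating, age) in rows if age >= 18]
    # Step 3: partition the remaining tuples into groups by rating.
    groups = defaultdict(list)
    for rating, age in rows:
        groups[rating].append(age)
    # Step 4: HAVING COUNT (*) > 1 drops groups with fewer than two tuples.
    groups = {rating: ages for rating, ages in groups.items() if len(ages) > 1}
    # Step 5: one answer tuple per group, computing MIN (S.age) as minage.
    return sorted((rating, min(ages)) for rating, ages in groups.items())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)
sql_answer = conn.execute(
    "SELECT S.rating, MIN (S.age) AS minage FROM Sailors S WHERE S.age >= 18 "
    "GROUP BY S.rating HAVING COUNT (*) > 1 ORDER BY S.rating"
).fetchall()

manual_answer = conceptual_eval(SAILORS)
print(manual_answer, manual_answer == sql_answer)
```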


GROUP BY AND HAVING

Find the age of the youngest sailor with age >= 18, for each rating with at least 2 such sailors.

SELECT S.rating, MIN (S.age)
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT (*) > 1;

Only S.rating and S.age are mentioned in the SELECT, GROUP BY, or HAVING clauses; other attributes are ‘unnecessary’.

The 2nd column of the result is unnamed. What to do?

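Sailors instance:

sid   sname     rating   age
22    dustin    7        45.0
31    lubber    8        55.5
71    zorba     10       16.0
64    horatio   7        35.0
29    brutus    1        33.0
58    rusty     10       35.0

Qualifying tuples, reduced to the necessary attributes and grouped by rating:

rating   age
1        33.0
7        45.0
7        35.0
8        55.5
10       35.0

Answer relation:

rating
7        35.0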
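One answer to the unnamed-column question is an AS alias, as in the conceptual-evaluation query (MIN (S.age) AS minage). A sketch, assuming SQLite via Python's sqlite3: SQLite documents that a result column without an AS clause has an unspecified name, so only the aliased column's name is relied on here.

```python
import sqlite3

# Sailors instance from the table above.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Same query as above, with the aggregate column named via AS.
cur = conn.execute(
    "SELECT S.rating, MIN (S.age) AS minage FROM Sailors S "
    "WHERE S.age >= 18 GROUP BY S.rating HAVING COUNT (*) > 1"
)
column_names = [desc[0] for desc in cur.description]  # first item of each entry is the name
rows = cur.fetchall()

print(column_names, rows)
```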


For each red boat, find the number of reservations for this boat.

SELECT B.bid, COUNT (*) AS scount
FROM Sailors S, Boats B, Reserves R
WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red'
GROUP BY B.bid;

Grouping over a join of three relations.

What do we get if we remove B.color = 'red' from the WHERE clause and add a HAVING clause with this condition?

What if we drop Sailors and the condition involving S.sid?
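A sketch of the three-way join, assuming SQLite via Python's sqlite3. The slides show no Boats or Reserves data, so the boats and reservations below are hypothetical, invented only for this example; the sailors are the instance from the GROUP BY example. Dropping Sailors gives the same counts here because every reservation refers to an existing sailor.

```python
import sqlite3

# Sailors instance from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]
# Hypothetical data, not from the slides.
BOATS = [(101, "boat_a", "blue"), (102, "boat_b", "red"), (103, "boat_c", "green"), (104, "boat_d", "red")]
RESERVES = [(22, 101, "2024-10-10"), (22, 102, "2024-10-10"), (31, 102, "2024-11-10"),
            (58, 102, "2024-11-12"), (64, 104, "2024-09-05"), (31, 103, "2024-11-06")]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL);
    CREATE TABLE Boats (bid INTEGER PRIMARY KEY, bname TEXT, color TEXT);
    CREATE TABLE Reserves (sid INTEGER REFERENCES Sailors(sid),
                           bid INTEGER REFERENCES Boats(bid), day TEXT);
""")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)
conn.executemany("INSERT INTO Boats VALUES (?, ?, ?)", BOATS)
conn.executemany("INSERT INTO Reserves VALUES (?, ?, ?)", RESERVES)

# The slide's query (ORDER BY added for a stable output order).
with_sailors = conn.execute(
    "SELECT B.bid, COUNT (*) AS scount FROM Sailors S, Boats B, Reserves R "
    "WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'red' "
    "GROUP BY B.bid ORDER BY B.bid"
).fetchall()

# Variant without Sailors and the S.sid condition.
without_sailors = conn.execute(
    "SELECT B.bid, COUNT (*) AS scount FROM Boats B, Reserves R "
    "WHERE R.bid = B.bid AND B.color = 'red' "
    "GROUP BY B.bid ORDER BY B.bid"
).fetchall()

print(with_sailors, without_sailors)
```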


Find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors (of any age).

SELECT S.rating, MIN (S.age)
FROM Sailors S
WHERE S.age > 18
GROUP BY S.rating
HAVING 1 < (SELECT COUNT (*)
            FROM Sailors S2
            WHERE S.rating = S2.rating);

This shows that the HAVING clause can also contain a subquery.

What if the HAVING clause is replaced by HAVING COUNT (*) > 1?
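Running both HAVING variants on the six-sailor instance from the earlier example shows the difference. This is a sketch assuming SQLite via Python's sqlite3; ORDER BY is added only for a stable output order.

```python
import sqlite3

# Sailors instance from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Subquery in HAVING: counts ALL sailors with the group's rating, of any age.
with_subquery = conn.execute(
    "SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 "
    "GROUP BY S.rating "
    "HAVING 1 < (SELECT COUNT (*) FROM Sailors S2 WHERE S.rating = S2.rating) "
    "ORDER BY S.rating"
).fetchall()

# COUNT (*) in HAVING: counts only the tuples left in the group after WHERE.
with_count = conn.execute(
    "SELECT S.rating, MIN (S.age) FROM Sailors S WHERE S.age > 18 "
    "GROUP BY S.rating HAVING COUNT (*) > 1 ORDER BY S.rating"
).fetchall()

print(with_subquery, with_count)
```

Rating 10 has two sailors overall (zorba and rusty), but zorba (16.0) fails the WHERE clause, so only the subquery version keeps that group.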


Find those ratings for which the average age is the minimum over all ratings. Aggregation operations cannot be nested!

WRONG:

SELECT S.rating
FROM Sailors S
WHERE S.age = (SELECT MIN (AVG (S2.age))
               FROM Sailors S2);

Correct solution (a derived table's alias such as Temp cannot be used as a table name in a subquery's FROM clause, so Temp is defined with WITH):

WITH Temp AS (SELECT S.rating, AVG (S.age) AS avgage
              FROM Sailors S
              GROUP BY S.rating)
SELECT Temp.rating, Temp.avgage
FROM Temp
WHERE Temp.avgage = (SELECT MIN (T2.avgage)
                     FROM Temp T2);
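A sketch, assuming SQLite via Python's sqlite3 and the six-sailor instance from the earlier example, that checks both points: the nested aggregate is rejected with an error, and the WITH form returns the rating with the smallest average age.

```python
import sqlite3

# Sailors instance from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# The WRONG query: MIN (AVG (...)) nests aggregates, which the engine rejects.
try:
    conn.execute(
        "SELECT S.rating FROM Sailors S "
        "WHERE S.age = (SELECT MIN (AVG (S2.age)) FROM Sailors S2)"
    ).fetchall()
    nested_rejected = False
except sqlite3.Error:
    nested_rejected = True

# Correct form: compute per-rating averages once, then compare against their minimum.
answer = conn.execute(
    "WITH Temp AS (SELECT S.rating, AVG (S.age) AS avgage FROM Sailors S GROUP BY S.rating) "
    "SELECT Temp.rating, Temp.avgage FROM Temp "
    "WHERE Temp.avgage = (SELECT MIN (T2.avgage) FROM Temp T2)"
).fetchall()

print(nested_rejected, answer)
```

The per-rating averages here are 33.0, 40.0, 55.5, and 25.5, so rating 10 is the answer.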

ORDERING & TOP/BOTTOM


ORDER BY

The ORDER BY keyword is used to sort the result set by one or more specified columns.

The ORDER BY keyword sorts the records in ascending order by default.

If you want to sort the records in descending order, use the DESC keyword.
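A short sketch of ascending and descending order, assuming SQLite via Python's sqlite3 and the six-sailor instance from the GROUP BY example. Horatio and rusty are both 35.0, so sname is added as a second sort key; without it, the relative order of tied rows is not guaranteed.

```python
import sqlite3

# Sailors instance from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Ascending is the default; sname breaks ties between equal ages.
ascending = [name for (name,) in conn.execute("SELECT sname FROM Sailors ORDER BY age, sname")]

# DESC applies only to the column it follows; sname stays ascending.
descending = [name for (name,) in conn.execute("SELECT sname FROM Sailors ORDER BY age DESC, sname")]

print(ascending, descending)
```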


TOP/BOTTOM

The TOP clause is used to specify the number of records to return.

The TOP clause can be very useful on large tables with thousands of records, because returning a large number of records can impact performance.

You can ‘sample’ the table using TOP.

Not all database systems support the TOP clause, and those that offer the feature implement it in different ways.


SQL Server:

SELECT TOP number|percent column_name(s)
FROM table_name

Ex: SELECT TOP 5 * FROM Persons

MySQL:

SELECT column_name(s)
FROM table_name
LIMIT number

Ex: SELECT * FROM Persons LIMIT 5


Oracle:

SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number

Ex: SELECT * FROM Persons WHERE ROWNUM <= 5

DB2:

SELECT column_name(s)
FROM table_name
FETCH FIRST number ROWS ONLY

Ex: SELECT * FROM Persons FETCH FIRST 5 ROWS ONLY


Can specify:

A fixed number: SELECT TOP 10 * …
A percent: SELECT TOP 10 PERCENT * …
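SQLite has neither TOP nor PERCENT, so this sketch (Python's sqlite3, six-sailor instance from the GROUP BY example) uses LIMIT for a fixed number and emulates a percentage by counting rows first. The helper top_percent is hypothetical, and rounding partial rows up is our assumption about matching TOP ... PERCENT. ORDER BY decides which rows count as the "top"; without it the returned rows are arbitrary.

```python
import math
import sqlite3

# Sailors instance from the GROUP BY example in these slides.
SAILORS = [
    (22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (71, "zorba", 10, 16.0),
    (64, "horatio", 7, 35.0), (29, "brutus", 1, 33.0), (58, "rusty", 10, 35.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL)")
conn.executemany("INSERT INTO Sailors VALUES (?, ?, ?, ?)", SAILORS)

# Fixed number: the two oldest sailors (sname breaks the 35.0 tie deterministically).
top_2 = [name for (name,) in conn.execute(
    "SELECT sname FROM Sailors ORDER BY age DESC, sname LIMIT 2")]

def top_percent(conn, pct):
    """Hypothetical helper emulating TOP pct PERCENT: count rows, then LIMIT."""
    (total,) = conn.execute("SELECT COUNT (*) FROM Sailors").fetchone()
    n = math.ceil(total * pct / 100)  # assumption: a partial row rounds up
    return [name for (name,) in conn.execute(
        "SELECT sname FROM Sailors ORDER BY age DESC, sname LIMIT ?", (n,))]

top_10_percent = top_percent(conn, 10)  # 10% of 6 rows is 0.6, rounded up to 1
top_50_percent = top_percent(conn, 50)  # 50% of 6 rows is 3

print(top_2, top_10_percent, top_50_percent)
```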


How to return the oldest 5 rentals?

How to return the newest 5 rentals?
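The slides do not show the rental table, so this sketch assumes a hypothetical Rentals(rental_id, rental_date) table with ISO-8601 date strings, which sort chronologically as text. It runs on SQLite via Python's sqlite3, so LIMIT stands in for TOP; in SQL Server the same idea would be SELECT TOP 5 ... ORDER BY rental_date (ASC for oldest, DESC for newest).

```python
import sqlite3

# Hypothetical rentals, inserted out of order so that ORDER BY does the work.
RENTALS = [
    (3, "2024-03-15"), (7, "2024-07-04"), (1, "2024-01-05"), (5, "2024-05-25"),
    (2, "2024-02-10"), (6, "2024-06-30"), (4, "2024-04-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rentals (rental_id INTEGER PRIMARY KEY, rental_date TEXT)")
conn.executemany("INSERT INTO Rentals VALUES (?, ?)", RENTALS)

# Oldest 5: sort ascending by date, keep the first five rows.
oldest_5 = [rid for (rid,) in conn.execute(
    "SELECT rental_id FROM Rentals ORDER BY rental_date ASC LIMIT 5")]

# Newest 5: same query with DESC.
newest_5 = [rid for (rid,) in conn.execute(
    "SELECT rental_id FROM Rentals ORDER BY rental_date DESC LIMIT 5")]

print(oldest_5, newest_5)
```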


How to return the 3rd newest rental?
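Two ways to get the 3rd newest row, sketched on the same hypothetical Rentals(rental_id, rental_date) table in SQLite via Python's sqlite3: skip two rows with OFFSET, or take the newest three and then the oldest of those (the nested "top of a top" pattern, which also works with TOP). Systems that support the SQL:2008 syntax can write OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY instead; SQLite does not. rental_id is added as a tie-breaker in case two rentals share a date.

```python
import sqlite3

# Hypothetical rentals, inserted out of order.
RENTALS = [
    (3, "2024-03-15"), (7, "2024-07-04"), (1, "2024-01-05"), (5, "2024-05-25"),
    (2, "2024-02-10"), (6, "2024-06-30"), (4, "2024-04-20"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Rentals (rental_id INTEGER PRIMARY KEY, rental_date TEXT)")
conn.executemany("INSERT INTO Rentals VALUES (?, ?)", RENTALS)

# Option 1: newest first, skip two rows, keep one.
third_newest = conn.execute(
    "SELECT rental_id, rental_date FROM Rentals "
    "ORDER BY rental_date DESC, rental_id DESC LIMIT 1 OFFSET 2"
).fetchone()

# Option 2: take the newest three, then the oldest among them.
third_newest_nested = conn.execute(
    "SELECT rental_id, rental_date FROM "
    "(SELECT rental_id, rental_date FROM Rentals "
    " ORDER BY rental_date DESC, rental_id DESC LIMIT 3) AS newest_3 "
    "ORDER BY rental_date ASC, rental_id ASC LIMIT 1"
).fetchone()

print(third_newest, third_newest_nested)
```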


SUMMARY

SQL was an important factor in the early acceptance of the relational model; it is more natural than earlier, procedural query languages.

All queries that can be expressed in relational algebra can also be formulated in SQL.

In addition, SQL has significantly more expressive power than relational algebra, in particular through aggregation operations and grouping.

There are many alternative ways to write a query; the query optimizer looks for the most efficient evaluation plan.

In practice, users need to be aware of how queries are optimized and evaluated to get the most efficient results.