Database Programming

Page 1: Database Programming

Sections 5– GROUP BY, HAVING clauses, Rollup & Cube Operations, Grouping Sets, Set Operations

Marge Hohly 2

GROUP BY Clause

Use the GROUP BY clause to divide the rows in a table into groups, then apply group functions to return summary information about each group.

In the example below, the rows are grouped by department_id. The AVG group function is then applied to each group, that is, to each department_id.

SELECT department_id, AVG(salary)
FROM employees
GROUP BY department_id;
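A minimal runnable sketch of the query above, using Python's built-in sqlite3 module as a stand-in for Oracle. The EMPLOYEES rows are hypothetical values invented for illustration, not the data behind the slide's results; the ORDER BY is added because GROUP BY alone does not promise any output order.

```python
import sqlite3

# In-memory test database with a few hypothetical EMPLOYEES rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 10, 4400), (2, 20, 13000), (3, 20, 6000), (4, 50, 5800), (5, 50, 3500), (6, 50, 3100)],
)

# One output row per distinct department_id; AVG is computed within each group.
rows = conn.execute(
    "SELECT department_id, AVG(salary) FROM employees "
    "GROUP BY department_id ORDER BY department_id"
).fetchall()
for department_id, avg_salary in rows:
    print(department_id, avg_salary)
```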

Results of previous query

GROUP BY Clause

When the SELECT clause contains aggregate (group) functions, be sure to include every individual (non-aggregated) column from the SELECT clause in the GROUP BY clause!

SELECT department_id, job_id, SUM(salary)
FROM employees
WHERE hire_date <= '01-JUN-00'
GROUP BY department_id, job_id;

SELECT d.department_id, d.department_name, MIN(e.hire_date) AS "Min Date"
FROM departments d, employees e
WHERE e.department_id = d.department_id
GROUP BY ?????

GROUP BY Rules...

Use the GROUP BY clause to divide the rows in a table into groups, then apply the group functions to return summary information about each group.

If the GROUP BY clause is used, all individual columns in the SELECT clause must also appear in the GROUP BY clause.

Columns in the GROUP BY clause do not have to appear in the SELECT clause.

No column aliases can be used in the GROUP BY clause.

Use the ORDER BY clause to sort the results; GROUP BY by itself does not guarantee that groups are returned in ascending (or any particular) order.

The WHERE clause, if used, cannot contain any group functions; use it to restrict rows before they are divided into groups.

Use the HAVING clause, not the WHERE clause, to restrict groups.
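Several of these rules can be tried on a small in-memory SQLite table with hypothetical data. One caution: SQLite is more permissive than Oracle (for example, it accepts bare non-grouped columns in an aggregate query), so a query that runs in SQLite may still break the Oracle rules above. This sketch sticks to a query that is valid in both: WHERE filters rows first, department_id is grouped on without being selected, and ORDER BY sorts the groups.

```python
import sqlite3

# Hypothetical sample data: (department_id, salary).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(10, 4400), (20, 13000), (20, 6000), (50, 5800), (50, 3500), (50, 2500)],
)

# WHERE removes individual rows *before* grouping (no group functions allowed there);
# department_id is grouped on but not selected; ORDER BY sorts the final groups.
rows = conn.execute(
    """
    SELECT COUNT(*), MAX(salary)
    FROM employees
    WHERE salary > 3000
    GROUP BY department_id
    ORDER BY MAX(salary) DESC
    """
).fetchall()
print(rows)
```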

GROUP BY examples:

1. Show the average graduation rate of the schools in several cities; include only those students who have graduated in the last few years.

SELECT AVG(graduation_rate), city
FROM students
WHERE graduation_date >= '01-JUN-07'
GROUP BY city;

2. Count the number of students in the school, grouped by grade; include all students.

SELECT COUNT(first_name), grade
FROM students
GROUP BY grade;
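One subtlety in example 2: COUNT(first_name) counts only rows where first_name is not NULL, so a student with no recorded first name would be left out, while COUNT(*) counts every row. This sketch (Python sqlite3, hypothetical STUDENTS rows) shows both side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (first_name TEXT, grade INTEGER)")
# Hypothetical rows; one grade-9 student has no first_name recorded.
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Ana", 9), ("Ben", 9), (None, 9), ("Cy", 10)],
)

rows = conn.execute(
    "SELECT grade, COUNT(first_name), COUNT(*) FROM students GROUP BY grade ORDER BY grade"
).fetchall()
print(rows)  # [(9, 2, 3), (10, 1, 1)]: COUNT(first_name) skips the NULL name
```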

GROUP BY Guidelines

Important guidelines to remember when using a GROUP BY clause:

If you include a group function (AVG, SUM, COUNT, MAX, MIN, STDDEV, VARIANCE) in a SELECT clause along with any other individual columns, each individual column must also appear in the GROUP BY clause.

You cannot use a column alias in the GROUP BY clause.

The WHERE clause excludes rows before they are divided into groups.

GROUPS WITHIN GROUPS

Sometimes you need to divide groups into smaller groups. For example, you may want to group all employees by department; then, within each department, group them by job.

This example shows how many employees are doing each job within each department.

SELECT department_id, job_id, COUNT(*)
FROM employees
WHERE department_id > 40
GROUP BY department_id, job_id;

DEPARTMENT_ID  JOB_ID    COUNT(*)
50             ST_MAN    1
50             ST_CLERK  4
60             IT_PROG   3
80             SA_MAN    1
80             SA_REP    2
…              …         …
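The same grouping can be reproduced with Python's sqlite3 module. The rows below are invented for illustration and are not the data behind the output above; ORDER BY is added so the groups print in a predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, job_id TEXT)")
# Hypothetical rows, loosely modelled on the job codes shown above.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(10, "AD_ASST"), (50, "ST_MAN"), (50, "ST_CLERK"), (50, "ST_CLERK"),
     (60, "IT_PROG"), (60, "IT_PROG"), (80, "SA_REP")],
)

# Each distinct (department_id, job_id) pair becomes its own group;
# department 10 is removed by WHERE before any grouping happens.
rows = conn.execute(
    """
    SELECT department_id, job_id, COUNT(*)
    FROM employees
    WHERE department_id > 40
    GROUP BY department_id, job_id
    ORDER BY department_id, job_id
    """
).fetchall()
for row in rows:
    print(row)
```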

Nesting Group Functions

Group functions can be nested to a depth of two when GROUP BY is used.

SELECT MAX(AVG(salary))
FROM employees
GROUP BY department_id;

How many values will be returned by this query? The answer is one: the query finds the average salary for each department and then, from that list, selects the single largest value.
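SQLite, like standard SQL, does not accept Oracle's nested MAX(AVG(salary)) form, so this sketch (Python sqlite3, hypothetical data) gets the same single value with an inline view: the inner query returns one average per department and the outer query picks the largest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, salary REAL)")
# Hypothetical rows: department averages are 4400, 9500 and 4650.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(10, 4400), (20, 13000), (20, 6000), (50, 5800), (50, 3500)],
)

# Inner query: one average per department. Outer query: the single largest of those.
(max_avg,) = conn.execute(
    """
    SELECT MAX(avg_salary)
    FROM (SELECT AVG(salary) AS avg_salary
          FROM employees
          GROUP BY department_id)
    """
).fetchone()
print(max_avg)
```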

The HAVING Clause

HAVING is used to restrict groups. With the HAVING clause, the Oracle Server:
1. Groups the rows.
2. Applies the group function to the group(s).
3. Displays the groups that match the criteria in the HAVING clause.

SELECT department_id, job_id, SUM(salary)
FROM employees
WHERE hire_date <= '01-JUN-00'
GROUP BY department_id, job_id
HAVING department_id > 50;
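Because the HAVING condition in this example tests a grouping column (department_id) rather than a group function, it could just as well be written in WHERE, which discards rows before they are grouped. The sketch below, using Python's sqlite3 module and hypothetical rows (the hire_date filter is left out to keep it short), runs both versions and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, job_id TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(50, "ST_CLERK", 2600), (60, "IT_PROG", 9000), (60, "IT_PROG", 4200), (80, "SA_REP", 8600)],
)

# Group first, then throw away groups whose department_id is not above 50.
group_then_filter = """
    SELECT department_id, job_id, SUM(salary) FROM employees
    GROUP BY department_id, job_id
    HAVING department_id > 50
    ORDER BY department_id"""
# Throw away rows first, then group what is left.
filter_then_group = """
    SELECT department_id, job_id, SUM(salary) FROM employees
    WHERE department_id > 50
    GROUP BY department_id, job_id
    ORDER BY department_id"""

having_rows = conn.execute(group_then_filter).fetchall()
where_rows = conn.execute(filter_then_group).fetchall()
print(having_rows == where_rows, having_rows)
```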

WHERE or HAVING?

The WHERE clause is used to restrict rows.

SELECT department_id, MAX(salary)
FROM employees
WHERE department_id >= 20
GROUP BY department_id;

The HAVING clause is used to restrict groups returned by a GROUP BY clause.

SELECT department_id, MAX(salary)
FROM employees
GROUP BY department_id
HAVING MAX(salary) > 1000;
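A small sketch of the difference, assuming Python's sqlite3 module and a few invented rows: the aggregate condition works in HAVING, while the same condition in WHERE is rejected because WHERE is evaluated before any groups exist. The exact error text comes from SQLite; Oracle reports its own error for the same mistake.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(10, 900), (20, 13000), (20, 6000), (30, 1000)],
)

# Restricting groups: the aggregate condition belongs in HAVING.
having_rows = conn.execute(
    "SELECT department_id, MAX(salary) FROM employees "
    "GROUP BY department_id HAVING MAX(salary) > 1000"
).fetchall()
print(having_rows)

# The same condition in WHERE is rejected, because WHERE runs before groups exist.
try:
    conn.execute(
        "SELECT department_id, MAX(salary) FROM employees "
        "WHERE MAX(salary) > 1000 GROUP BY department_id"
    ).fetchall()
    where_failed = False
except sqlite3.OperationalError as exc:
    where_failed = True
    print("WHERE rejected:", exc)
```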

HAVING

Although the HAVING clause can precede the GROUP BY clause in a SELECT statement, it is recommended that you place each clause in the order shown. The ORDER BY clause (if used) is always last!

SELECT column, group_function
FROM table
WHERE
GROUP BY
HAVING
ORDER BY

GROUP BY Extensions

To extend a GROUP BY query, you can add the ROLLUP, CUBE, and GROUPING SETS extensions.

These extensions make your life a lot easier and your queries more efficient.

ROLLUP

In GROUP BY queries you are quite often required to produce subtotals and totals, and the ROLLUP operation can do that for you.

The action of ROLLUP is straightforward: it creates subtotals that roll up from the most detailed level to a grand total, following a grouping list specified in the ROLLUP clause.

ROLLUP takes as its argument an ordered list of grouping columns. First, it calculates the standard aggregate values specified in the GROUP BY clause. Next, it creates progressively higher-level subtotals, moving from right to left through the list of grouping columns. Finally, it creates a grand total.
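SQLite, used here only as a convenient test database, has no ROLLUP operator, so this sketch emulates GROUP BY ROLLUP (department_id, job_id) by stacking the three grouping levels ROLLUP would produce (department and job, department only, grand total) with UNION ALL. The data is hypothetical; in Oracle the whole query collapses to a single GROUP BY ROLLUP clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, job_id TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(10, "AD_ASST", 4400), (20, "MK_MAN", 13000), (20, "MK_REP", 6000)],
)

# Levels, right to left: (department_id, job_id) -> (department_id) -> grand total.
# NULL marks the columns that have been "rolled up" at each level.
rows = conn.execute(
    """
    SELECT * FROM (
        SELECT department_id, job_id, SUM(salary) AS total
        FROM employees GROUP BY department_id, job_id
        UNION ALL
        SELECT department_id, NULL, SUM(salary)
        FROM employees GROUP BY department_id
        UNION ALL
        SELECT NULL, NULL, SUM(salary)
        FROM employees
    )
    ORDER BY department_id IS NULL, department_id, job_id IS NULL, job_id
    """
).fetchall()
for row in rows:
    print(row)
```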

CUBE

CUBE is an extension to the GROUP BY clause, like ROLLUP. It produces cross-tabulation reports. It can be applied to all aggregate functions, including AVG, SUM, MIN, MAX and COUNT. Columns listed in the GROUP BY clause are cross-referenced to create a superset of groups. The aggregate functions specified in the SELECT list are applied to these groups to create summary values for the additional super-aggregate rows. Every possible combination of the grouping columns is aggregated by CUBE. If you have n columns in the GROUP BY clause, there will be 2^n possible super-aggregate combinations. Mathematically, these combinations form an n-dimensional cube, which is how the operator got its name.
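To make the n + 1 versus 2^n difference concrete, this short Python sketch lists the column combinations (grouping sets) that ROLLUP and CUBE would aggregate over for a given column list. The helper names rollup_sets and cube_sets are hypothetical, and the code only enumerates combinations; it does not run any SQL.

```python
from itertools import combinations

def rollup_sets(columns):
    """Grouping sets produced by ROLLUP(columns): each prefix, longest first, down to ()."""
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]

def cube_sets(columns):
    """Grouping sets produced by CUBE(columns): every subset of the columns."""
    return [combo
            for size in range(len(columns), -1, -1)
            for combo in combinations(columns, size)]

cols = ["department_id", "job_id", "manager_id"]
print(rollup_sets(cols))
print(cube_sets(cols))
print(len(rollup_sets(cols)), len(cube_sets(cols)))  # n + 1 versus 2**n
```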

CUBE (Cont.)

CUBE is typically most suitable in queries that use columns from several independent dimensions (often stored in different tables) rather than columns representing different levels of a single hierarchy.

Imagine, for example, a user querying the Sales table for a company like AMAZON.COM. A commonly requested cross-tabulation might need subtotals for all the combinations of Month, Region and Product.

These are three independent dimensions, and analysis of all possible subtotal combinations is commonplace. In contrast, a cross-tabulation showing all possible combinations of year, month and day would have several values of limited interest, because there is a natural hierarchy in the time dimension. Subtotals such as profit by day of month summed across years would be unnecessary in most analyses. Relatively few users need to ask "What were the total sales for the 16th of each month across the year?"

GROUPING SETS

GROUPING SETS is another extension to the GROUP BY clause, like ROLLUP and CUBE. It is used to specify multiple groupings of data. It effectively gives you multiple GROUP BY clauses in the same SELECT statement, which the basic syntax does not allow.

The point of GROUPING SETS is this: if you want to see data from the EMPLOYEES table grouped by (department_id, job_id, manager_id), but also by (department_id, manager_id) and also by (job_id, manager_id), you would normally have to write three different SELECT statements whose only difference is the GROUP BY clause. For the database, this means retrieving the same data three times, which can be quite a big overhead. Imagine if your company had 3,000,000 employees: you would be asking the database to retrieve 9 million rows instead of just 3 million, quite a big difference.

So GROUPING SETS are much more efficient when writing complex reports.
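The efficiency argument can be illustrated in plain Python: the sketch below computes all three groupings from the text in a single pass over a hypothetical list of employee rows, instead of scanning the data once per grouping. It is a conceptual illustration only, not how Oracle implements GROUPING SETS; in Oracle the request would be written as GROUP BY GROUPING SETS ((department_id, job_id, manager_id), (department_id, manager_id), (job_id, manager_id)).

```python
from collections import defaultdict

# Hypothetical employee rows: (department_id, job_id, manager_id, salary).
employees = [
    (20, "MK_MAN", 100, 13000),
    (20, "MK_REP", 201, 6000),
    (50, "ST_CLERK", 124, 3500),
    (50, "ST_CLERK", 124, 3100),
]
COLUMNS = ("department_id", "job_id", "manager_id")

def grouping_sets_sum(rows, grouping_sets):
    """Sum salary for several groupings while scanning the rows only once."""
    totals = {gs: defaultdict(int) for gs in grouping_sets}
    for row in rows:  # single pass over the data
        values = dict(zip(COLUMNS, row[:3]))
        for gs in grouping_sets:
            key = tuple(values[c] for c in gs)
            totals[gs][key] += row[3]
    return {gs: dict(groups) for gs, groups in totals.items()}

sets = [
    ("department_id", "job_id", "manager_id"),
    ("department_id", "manager_id"),
    ("job_id", "manager_id"),
]
result = grouping_sets_sum(employees, sets)
for gs in sets:
    print(gs, result[gs])
```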

GROUPING FUNCTIONS Continued.

The GROUPING function handles the problem of NULL values in ROLLUP and CUBE output, where a NULL may mark a subtotal row or may simply be a NULL stored in the data. Using a single column from the query as its argument, GROUPING returns 1 when it encounters a NULL value created by a ROLLUP or CUBE operation; that is, if the NULL indicates the row is a subtotal, GROUPING returns 1. Any other type of value, including a stored NULL, returns 0. So the GROUPING function returns 1 for an aggregated (computed) row and 0 for a non-aggregated (returned) row.

The syntax for GROUPING is simply GROUPING(column_name). It is typically used in the SELECT list (Oracle also accepts it in the HAVING and ORDER BY clauses), and it takes only a single column expression as its argument.
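SQLite has neither CUBE/ROLLUP nor a GROUPING function, so this sketch (Python sqlite3, hypothetical data) writes out the detail and subtotal levels by hand and supplies the flag GROUPING(job_id) would return as a literal 0 or 1. It shows why the flag matters: one detail row has a genuinely stored NULL job_id, and only the flag tells it apart from the subtotal row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, job_id TEXT, salary REAL)")
# Hypothetical data; the second row has a genuinely stored NULL job_id.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(10, "AD_ASST", 4400), (10, None, 1000)],
)

# Each level is written out explicitly; the last column plays the role of GROUPING(job_id).
rows = conn.execute(
    """
    SELECT department_id, job_id, SUM(salary) AS total, 0 AS grouping_job
    FROM employees GROUP BY department_id, job_id
    UNION ALL
    SELECT department_id, NULL, SUM(salary), 1
    FROM employees GROUP BY department_id
    """
).fetchall()
for dept, job, total, grouping_job in rows:
    if grouping_job == 1:
        label = "JobID Aggregate row"
    elif job is None:
        label = "stored NULL job_id"
    else:
        label = job
    print(dept, label, total)
```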

SELECT department_id, job_id, SUM(salary),
       GROUPING(department_id) Dept_sub_total,
       DECODE(GROUPING(department_id), 1, 'Dept Aggregate row', department_id) AS DT,
       GROUPING(job_id) job_sub_total,
       DECODE(GROUPING(job_id), 1, 'JobID Aggregate row', job_id) AS JI
FROM employees
WHERE department_id < 50
GROUP BY CUBE (department_id, job_id);

Group By Extensions:

ROLLUP: Used to create subtotals that roll up from the most detailed level to a grand total, following a grouping list specified in the clause.

CUBE: An extension to the GROUP BY clause, like ROLLUP, that produces cross-tabulation reports.

GROUPING SETS: Used to specify multiple groupings of data.

GROUPING FUNCTION: Used to return one value (1) for an aggregated row and another value (0) for a non-aggregated row.

What will be covered:

Define and explain the purpose of Set Operators

Use a set operator to combine multiple queries into a single query

Control the order of rows returned using set operators

Why Set Operators

Set operators are used to combine the results from different SELECT statements into one single result output.

Sometimes you want a single output from more than one table. If you join the tables, you get back only the rows that match, but what if you don't want to do a join, or can't do a join because a join would give the wrong result?

This is where set operators come in. They can return all the rows from both statements, the rows that are in one statement and not the other, or the rows common to both statements.
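A minimal sketch of the four set operators, assuming Python's sqlite3 module and two invented single-column tables. SQLite spells Oracle's MINUS as EXCEPT; the single ORDER BY at the end sorts the combined result rather than either individual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (x INTEGER)")
conn.execute("CREATE TABLE b (x INTEGER)")
# Hypothetical values; table a contains a duplicate 2.
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?)", [(2,), (3,), (4,)])

def combine(op):
    """Combine the two queries with a set operator; ORDER BY sorts the final result."""
    sql = f"SELECT x FROM a {op} SELECT x FROM b ORDER BY x"
    return [row[0] for row in conn.execute(sql)]

results = {op: combine(op) for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT")}
for op, values in results.items():
    print(op, values)
```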

Rules to Remember

There are a few rules to remember when using set operators:

The SELECT statements used in the query must match: they must have the same number of columns, and corresponding columns must be of the same data type group (for example, numeric or character).

The names of the columns need not be identical.

Column names in the output are taken from the column names in the first SELECT statement, so any column aliases should be entered in the first statement as you would want to see them in the finished report.
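The sketch below (Python sqlite3, hypothetical tables) checks two of these rules: the output column names come from the aliases in the first SELECT, and a mismatched column count is rejected. SQLite uses dynamic typing, so unlike Oracle it does not check that the column data types match; only the column-count rule can be demonstrated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (location_id INTEGER, department_name TEXT)")
conn.execute("CREATE TABLE warehouses (location_id INTEGER, warehouse_name TEXT)")
conn.execute("INSERT INTO departments VALUES (1700, 'Administration')")
conn.execute("INSERT INTO warehouses VALUES (1400, 'Main Warehouse')")

# Output column names come from the first SELECT, so aliases go there.
cur = conn.execute(
    "SELECT location_id AS loc, department_name AS name FROM departments "
    "UNION SELECT location_id, warehouse_name FROM warehouses"
)
column_names = [d[0] for d in cur.description]
print(column_names)

# A different number of columns in each SELECT is an error.
try:
    conn.execute("SELECT location_id, department_name FROM departments "
                 "UNION SELECT location_id FROM warehouses")
    mismatch_rejected = False
except sqlite3.OperationalError as exc:
    mismatch_rejected = True
    print("rejected:", exc)
```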

SET Operator Examples

Sometimes, if you are selecting rows from tables that do not have columns in common, you may have to make up columns in order to match the queries. The easiest way to do this is to include one or more NULL values in the select list. Remember to give them suitable aliases and matching datatypes.

For example:
Table A contains a location id and a department name.
Table B contains a location id and a warehouse name.
You can use the TO_CHAR(NULL) function to fill in the missing columns, as shown below.

SELECT location_id, department_name "Department", TO_CHAR(NULL) "Warehouse"
FROM departments
UNION
SELECT location_id, TO_CHAR(NULL) "Department", warehouse_name
FROM warehouses;

SET Operator Examples (Continued)

The keyword NULL can be used to match columns in a SELECT list. One NULL is included for each missing column. Furthermore, NULL is formatted to match the datatype of the column it is standing in for, so TO_CHAR, TO_DATE or TO_NUMBER functions are often used to achieve identical SELECT lists.
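The same placeholder technique can be tried in SQLite with Python's sqlite3 module, using hypothetical rows. SQLite has no TO_CHAR, so this sketch uses CAST(NULL AS TEXT) as the stand-in placeholder; because SQLite is dynamically typed, a plain NULL would also work there, whereas Oracle needs the datatype to match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (location_id INTEGER, department_name TEXT)")
conn.execute("CREATE TABLE warehouses (location_id INTEGER, warehouse_name TEXT)")
conn.execute("INSERT INTO departments VALUES (1700, 'Administration')")
conn.execute("INSERT INTO warehouses VALUES (1400, 'Main Warehouse')")

# Each SELECT supplies a NULL placeholder for the column it does not have.
# CAST(NULL AS TEXT) stands in for Oracle's TO_CHAR(NULL).
rows = conn.execute(
    """
    SELECT location_id, department_name AS "Department", CAST(NULL AS TEXT) AS "Warehouse"
    FROM departments
    UNION
    SELECT location_id, CAST(NULL AS TEXT), warehouse_name
    FROM warehouses
    ORDER BY location_id
    """
).fetchall()
for row in rows:
    print(row)
```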

Key terms

SET operators: used to combine the results of multiple SELECT statements into one single result.

UNION: operator that returns all rows from both queries and eliminates duplicates.

UNION ALL: operator that returns all rows from both queries, including duplicates.

INTERSECT: operator that returns rows common to both queries.

MINUS: operator that returns the rows from the first query that are not returned by the second query.

TO_CHAR(NULL): a placeholder for a column that exists in one query but not the other, used so that the SELECT lists match.
