SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 ·...

37
SQL Workshop Subqueries Doug Shook

Transcript of SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 ·...

Page 1: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

SQL Workshop

Subqueries

Doug Shook

Page 2: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

2

Subqueries A SELECT statement within a SELECT statement

Four ways to code one:– WHERE (*)– HAVING (*)– FROM– SELECT

Syntax is the same– Rare to see GROUP BY or HAVING

Page 3: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

3

SubqueriesA subquery in the WHERE clause SELECT InvoiceNumber, InvoiceDate, InvoiceTotal FROM Invoices WHERE InvoiceTotal > (SELECT AVG(InvoiceTotal) FROM Invoices) ORDER BY InvoiceTotal;

The value returned by the subquery 1879.7413

The result set

Page 4: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

4

Subqueries vs. Joins Advantages of subqueries– Aggregates can be passed to outer query– More intuitive than joins for ad-hoc relationships– Can be easier to code

Advantages of joins– Can include columns from both tables– More intuitive than subqueries for existing

relationships– Performance is generally better

Page 5: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

5

Subqueries vs. Joins What would this look like with a subquery instead?

A query that uses an inner join SELECT InvoiceNumber, InvoiceDate, InvoiceTotal FROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorID WHERE VendorState = 'CA' ORDER BY InvoiceDate;

The result set

( 4 0 r o w s )

Page 6: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

6

Subqueries vs. Joins

Most queries can be written either way

The same query restated with a subquery SELECT InvoiceNumber, InvoiceDate, InvoiceTotal FROM Invoices WHERE VendorID IN (SELECT VendorID FROM Vendors WHERE VendorState = 'CA') ORDER BY InvoiceDate;

The same result set

Page 7: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

7

Subqueries and WHERE

Subquery must return a single column of values here– Why?

Could this be rewritten without the subquery?

The syntax of a WHERE clause that uses an IN phrase with a subquery

WHERE test_expression [NOT] IN (subquery)

A query that returns vendors without invoices SELECT VendorID, VendorName, VendorState FROM Vendors WHERE VendorID NOT IN (SELECT DISTINCT VendorID FROM Invoices);

Page 8: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

8

Subqueries and WHERE

NOT IN can typically be replaced with an outer join.

The query restated without a subquery SELECT Vendors.VendorID, VendorName, VendorState FROM Vendors LEFT JOIN Invoices ON Vendors.VendorID = Invoices.VendorID WHERE Invoices.VendorID IS NULL;

The same result set

(88 rows)

Page 9: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

9

Subqueries and WHERE

What is the subquery returning here?

The syntax of a WHERE clause that uses a comparison operator WHERE expression comparison_operator [SOME|ANY|ALL] (subquery)

A query with a subquery in the WHERE condition SELECT InvoiceNumber, InvoiceDate, InvoiceTotal, InvoiceTotal - PaymentTotal - CreditTotal AS BalanceDue FROM Invoices WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0 AND InvoiceTotal - PaymentTotal - CreditTotal < (SELECT AVG(InvoiceTotal - PaymentTotal - CreditTotal) FROM Invoices WHERE InvoiceTotal - PaymentTotal - CreditTotal > 0) ORDER BY InvoiceTotal DESC;

Page 10: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

10

ALL Can be thought of as a sequence of ANDs:

Special cases:– If no rows are returned, ALL will be true– What if all rows are null?

How the ALL keyword works Condition Equivalent expression x > ALL (1, 2) x > 2 x < ALL (1, 2) x < 1 x = ALL (1, 2) (x = 1) AND (x = 2) x <> ALL (1, 2) (x <> 1) AND (x <> 2)

Page 11: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

11

ALLA query that uses the ALL keyword SELECT VendorName, InvoiceNumber, InvoiceTotal FROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorID WHERE InvoiceTotal > ALL (SELECT InvoiceTotal FROM Invoices WHERE VendorID = 34) ORDER BY VendorName;

The result of the subquery

The result set

(25 rows)

Page 12: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

12

ANY and SOME Can be thought of as a sequence of Ors

Special cases– No rows returned?– All null values?

How the ANY and SOME keywords work Condition Equivalent expression x > ANY (1, 2) x > 1 x < ANY (1, 2) x < 2 x = ANY (1, 2) (x = 1) OR (x = 2) x <> ANY (1, 2) (x <> 1) OR (x <> 2)

Page 13: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

13

ANYA query that uses the ANY keyword SELECT VendorName, InvoiceNumber, InvoiceTotal FROM Vendors JOIN Invoices ON Vendors.VendorID = Invoices.VendorID WHERE InvoiceTotal < ANY (SELECT InvoiceTotal FROM Invoices WHERE VendorID = 115);

The result of the subquery

The result set

(17 rows)

Page 14: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

14

Correlated Subqueries Each subquery we’ve seen so far has been executed exactly once

Correlated subqueries execute once for each row in the outer query– Typically as part of a conditional statement

Potential conflicts– Same column in outer query and subquery?– Same table in outer query and subquery?

Page 15: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

15

Correlated SubqueriesA query that uses a correlated subquery SELECT VendorID, InvoiceNumber, InvoiceTotal FROM Invoices AS Inv_Main WHERE InvoiceTotal > (SELECT AVG(InvoiceTotal) FROM Invoices AS Inv_Sub WHERE Inv_Sub.VendorID = Inv_Main.VendorID) ORDER BY VendorID, InvoiceTotal;

The value returned by the subquery for vendor 95 28.5016

The result set

(36 rows)

Page 16: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

16

EXISTS Tests whether a subquery returns any rows– Does not return the rows, even if they exist– Most often used with correlated subquery

The syntax of a subquery with EXISTS WHERE [NOT] EXISTS (subquery)

A query that returns vendors without invoices SELECT VendorID, VendorName, VendorState FROM Vendors WHERE NOT EXISTS (SELECT * FROM Invoices WHERE Invoices.VendorID = Vendors.VendorID);

The result set

(88 rows)

Page 17: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

17

Subqueries with FROM This type of subquery returns a derived table– Must assign it to an alias– Can be used in the outer query like any other

table

Often used to provide additional summaries to a summary query

Page 18: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

18

Subqueries with FROM

A subquery coded in the FROM clause SELECT Invoices.VendorID, MAX(InvoiceDate) AS LatestInv, AVG(InvoiceTotal) AS AvgInvoice FROM Invoices JOIN (SELECT TOP 5 VendorID, AVG(InvoiceTotal) AS AvgInvoice FROM Invoices GROUP BY VendorID ORDER BY AvgInvoice DESC) AS TopVendor ON Invoices.VendorID = TopVendor.VendorID GROUP BY Invoices.VendorID ORDER BY LatestInv DESC;

Page 19: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

19

Subqueries with FROMThe derived table generated by the subquery

The result set

Page 20: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

20

Subqueries with SELECT Used in place of a column specifier– Must return a single value– Usually correlated– Can often be stated with a join instead• And should be!

Page 21: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

21

Subqueries with SELECT

How could we restate this with a join?

A correlated subquery in the SELECT clause SELECT DISTINCT VendorName, (SELECT MAX(InvoiceDate) FROM Invoices WHERE Invoices.VendorID = Vendors.VendorID) AS LatestInv FROM Vendors ORDER BY LatestInv DESC;

The result set

(122 rows)

Page 22: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

22

Subqueries with SELECTThe same query restated using a join SELECT VendorName, MAX(InvoiceDate) AS LatestInv FROM Vendors LEFT JOIN Invoices ON Vendors.VendorID = Invoices.VendorID GROUP BY VendorName ORDER BY LatestInv DESC;

The same result set

( 1 2 2 r o w s )

Page 23: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

23

Complex QueriesA complex query that uses three subqueries SELECT Summary1.VendorState, Summary1.VendorName, TopInState.SumOfInvoices FROM (SELECT V_Sub.VendorState, V_Sub.VendorName, SUM(I_Sub.InvoiceTotal) AS SumOfInvoices FROM Invoices AS I_Sub JOIN Vendors AS V_Sub ON I_Sub.VendorID = V_Sub.VendorID GROUP BY V_Sub.VendorState, V_Sub.VendorName) AS Summary1 A complex query (continued) JOIN (SELECT Summary2.VendorState, MAX(Summary2.SumOfInvoices) AS SumOfInvoices FROM (SELECT V_Sub.VendorState, V_Sub.VendorName, SUM(I_Sub.InvoiceTotal) AS SumOfInvoices FROM Invoices AS I_Sub JOIN Vendors AS V_Sub ON I_Sub.VendorID = V_Sub.VendorID GROUP BY V_Sub.VendorState, V_Sub.VendorName) AS Summary2 GROUP BY Summary2.VendorState) AS TopInState ON Summary1.VendorState = TopInState.VendorState AND Summary1.SumOfInvoices = TopInState.SumOfInvoices ORDER BY Summary1.VendorState;

Page 24: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

24

Complex Queries

Such queries are useful, but can we simplify this process?

The result set

( 1 0 r o w s )

Page 25: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

25

Complex Queries Procedure– State the problem to be solved– Use pseudocode to outline the query• Identify subqueries• Identify results from subqueries• Include aliases

– Use pseudocode for each subquery– Code the subqueries• Test!

– Combine and test final query

Page 26: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

26

Complex Queries

Pseudocode for the query SELECT Summary1.VendorState, Summary1.VendorName, TopInState.SumOfInvoices FROM (Derived table returning VendorState, VendorName, SumOfInvoices) AS Summary1 JOIN (Derived table returning VendorState, MAX(SumOfInvoices)) AS TopInState ON Summary1.VendorState = TopInState.VendorState AND Summary1.SumOfInvoices = TopInState.SumOfInvoices ORDER BY Summary1.VendorState;

Page 27: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

27

Complex QueriesThe code for the Summary1 and Summary2 subqueries SELECT V_Sub.VendorState, V_Sub.VendorName, SUM(I_Sub.InvoiceTotal) AS SumOfInvoices FROM Invoices AS I_Sub JOIN Vendors AS V_Sub ON I_Sub.VendorID = V_Sub.VendorID GROUP BY V_Sub.VendorState, V_Sub.VendorName;

The result of the Summary1 and Summary2 subqueries

( 3 4 r o w s )

Page 28: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

28

Complex QueriesPseudocode for the TopInState subquery SELECT Summary2.VendorState, MAX(Summary2.SumOfInvoices) FROM (Derived table returning VendorState, VendorName, SumOfInvoices) AS Summary2 GROUP BY Summary2.VendorState;

The result of the TopInState subquery

( 1 0 r o w s )

Page 29: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

29

Common Table Expressions Allows you to code an expression that defines a derived table– This table can be used with the query immediately

following the CTE

Most often used with SELECT

Page 30: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

30

Common Table ExpressionsTwo CTEs and a query that uses them WITH Summary AS ( SELECT VendorState, VendorName, SUM(InvoiceTotal) AS SumOfInvoices FROM Invoices JOIN Vendors ON Invoices.VendorID = Vendors.VendorID GROUP BY VendorState, VendorName ), TopInState AS ( SELECT VendorState, MAX(SumOfInvoices) AS SumOfInvoices FROM Summary GROUP BY VendorState ) SELECT Summary.VendorState, Summary.VendorName, TopInState.SumOfInvoices FROM Summary JOIN TopInState ON Summary.VendorState = TopInState.VendorState AND Summary.SumOfInvoices = TopInState.SumOfInvoices ORDER BY Summary.VendorState;

Page 31: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

31

Exercises Write a SELECT statement that returns the same result set as this SELECT statement. Substitute a subquery in a WHERE clause for the inner join.

SELECT DISTINCT VendorNameFROM Vendors JOIN InvoicesON Vendors.VendorID = Invoices.VendorIDORDER BY VendorName;

Page 32: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

32

Exercises Write a SELECT statement that answers this question: Which invoices have a PaymentTotal that's greater than the average PaymentTotal for all paid invoices? Return the InvoiceNumber and InvoiceTotal for each invoice.

Page 33: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

33

Exercises Write a SELECT statement that answers this question: which invoices have a PaymentTotal that's greater than the median PaymentTotal for all paid invoices? Return the InvoiceNumber and InvoiceTotal for each invoice.

Hint: Start with the solution to the previous problem then use the ALL keyword in the WHERE clause and code "TOP 50 PERCENT PaymentTotal" in the subquery.

Page 34: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

34

Exercises Write a SELECT statement that returns two columns from the GLAccounts table: AccountNo and AccountDescription. The result set should have one row for each account number that has never been used. Use a correlated subquery introduced with the NOT EXISTS operator. Sort the final result set by AccountNo.

Page 35: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

35

Exercises Write a SELECT statement that returns four columns: VendorName, InvoiceID, InvoiceSequence, and InvoiceLineItemAmount for each invoice that has more than one line item in the InvoiceLineItems table.

Hint: Use a subquery that tests for InvoiceSequence > 1.

Page 36: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

36

Exercises Write a SELECT statement that returns a single value that represents the sum of the largest unpaid invoices submitted by each vendor. Use a derived table that reutrns MAX(InvoiceTotal) grouped by VendorID, filtering for invoices with a balance due.

Page 37: SQL Workshop - Washington University in St. Louisdshook/sqlworkshop/Slides/p4.pdf · 2017-03-24 · 4 Subqueries vs. Joins Advantages of subqueries – Aggregates can be passed to

37

Exercises Write a SELECT statement that returns the name, city, and state of each vendor that's located in a unique city and state. In other words, don't include vendors that have a city and state in common with another vendor.

Write a SELECT statement that returns four columns: VendorName, InvoiceNumber, InvoiceDate, and InvoiceTotal. Return one row per vendor, representing the vendor's invoice with the earliest date.