SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL:...

26
SQL: Ordering Results, Duplicate Semantics and NULL Values Spring 2018 School of Computer Science University of Waterloo Databases CS348 (University of Waterloo) SQL: Duplicates and NULLs 1 / 29

Transcript of SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL:...

Page 1: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

SQL: Ordering Results, Duplicate Semanticsand NULL Values

Spring 2018

School of Computer ScienceUniversity of Waterloo

Databases CS348

(University of Waterloo) SQL: Duplicates and NULLs 1 / 29

Page 2: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Ordering Results

No particular ordering on the rows of a table can be assumedwhen queries are written. (This is important!)No particular ordering of rows of an intermediate result in thequery can be assumed either.However, it is possible to order the final result of a query, using theORDER BY clause at the end of the query.

General form:ORDER BY e1 [Dir1], . . . ,ek [Dir k ]

where Dir i is either ASC or DESC; ASC is the default.

(University of Waterloo) SQL: Duplicates and NULLs 3 / 29

Page 3: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Example

List all authors in the database in ascending order of their name:

SQL> select distinct *2 from author3 order by name;

AID NAME--- --------------2 Chomicki, Jan3 Saake, Gunter1 Toman, David

Again, the asc keyword is optional, and is assumed by default.A descending order is obtained with the desc keyword.Minor sorts, minor minor sorts, etc., can be added.

(University of Waterloo) SQL: Duplicates and NULLs 4 / 29

Page 4: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Multisets and Duplicates

SQL has always had a MULTISET/BAG semantics rather than aSET semantics:⇒ SQL tables are multisets of tuples⇒ originally for efficiency reasons

What does “allows duplicates” mean?

partboltboltboltnutnut

⇐⇒part cntbolt 3nut 2

(University of Waterloo) SQL: Duplicates and NULLs 6 / 29

Page 5: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

How does this impact Queries?

Example (Cheap Quantification–Projection)

EMPName Dept DeptBob CS CSSue CS CSFred PMath

{y |∃x .EMP(x ,y)}−−−−−−−−−−−→ PMathBarb Stats StatsJim Stats Stats

⇐⇒

Dept cntCS 2

PMath 1Stats 2

(University of Waterloo) SQL: Duplicates and NULLs 7 / 29

Page 6: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Range-restricted Queries for Multisets

Definition (Range restricted formulas with DISTINCT)

A formula (condition) ϕ is range restricted when, for ϕi that are alsorange restricted, ϕ has the form

R(xi1 , . . . , xik ),ϕ1 ∧ ϕ2 ,ϕ1 ∧ (xi = xj) ({xi , xj} ∩ FV (ϕ1) 6= ∅),∃xi .ϕ1 (xi ∈ FV (ϕ1)),ϕ1 ∨ ϕ2 (FV (ϕ1) = FV (ϕ2)),ϕ1 ∧ ¬ϕ2 (FV (ϕ2) = FV (ϕ1)), orDISTINCT(ϕ).

(University of Waterloo) SQL: Duplicates and NULLs 8 / 29

Page 7: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Duplicates and QueriesIdeas

1 A valuation can appear k times (k > 0) as a query answer

2 The number of duplicates is a function of the numbers of duplicates informulas.

3 DB, θ, k |= ϕ reads “valuation θ appears k times in ϕ’s answer”.(we assume k = 0 for valuations that are NOT ϕ’s answers”.).

Definition (Multiset Semantics for the Relational Calculus)

DB, θ, k |= R(x1, . . . , xk ) if (θ(x1), . . . , θ(xk )) ∈ R k timesDB, θ,m · n |= ϕ ∧ ψ if DB, θ,m |= ϕ and DB, θ, n |= ψDB, θ,m |= ϕ ∧ (xi = xj) if DB, θ,m |= ϕ and θ(xi) = θ(xj)DB, θ,

∑v∈D nv |= ∃x .ϕ if DB, θ[x := v ],nv |= ϕ

DB, θ,m + n |= ϕ ∨ ψ if DB, θ,m |= ϕ and DB, θ, n |= ψDB, θ,m − n |= ϕ ∧ ¬ψ if DB, θ,m |= ϕ,DB, θ, n |= ψ and m > nDB, θ, 1 |= DISTINCT(ϕ) if DB, θ,m |= ϕ

(University of Waterloo) SQL: Duplicates and NULLs 9 / 29

Page 8: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Duplicates and SQL

Allowing duplicates leads to additional syntax.

A duplicate elimination operator⇒ “SELECT DISTINCT x” v.s. “SELECT x” in SELECT-blocks

MULTISET (BAG) operators⇒ equivalents of set operations⇒ but with multiset semantics.

(University of Waterloo) SQL: Duplicates and NULLs 10 / 29

Page 9: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Example

SQL> select r1.publication2 from wrote r1, wrote r23 where r1.publication=r2.publication4 and r1.author<>r2.author;

PUBLICAT--------ChSa98ChSa98ChTo98ChTo98ChTo98aChTo98a

⇒ for publications with n authors we get O(n2) answers!

(University of Waterloo) SQL: Duplicates and NULLs 11 / 29

Page 10: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Bag Operations

Bag union: UNION ALL

⇒ additive union: bag containing all in Q1 and Q2.

Bag difference: EXCEPT ALL

⇒ subtractive difference (monus):⇒ a bag all tuples in Q1 for which

there is no “matching” tuple in Q2.

Bag intersection: INTERSECT ALL

⇒ a bag of all tuples taking themaximal number common to Q1 and Q2

(University of Waterloo) SQL: Duplicates and NULLs 12 / 29

Page 11: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Example

SQL> ( select author2 from wrote, book3 where publication=pubid )4 union all5 ( select author6 from wrote, article7 where publication=pubid );

AUTHOR----------

2312121

. . . a fragment of a more meaningful query (coming later).(University of Waterloo) SQL: Duplicates and NULLs 13 / 29

Page 12: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Summary

SQL covered so far:

1 simple SELECT BLOCK

2 set operations

3 duplicates and multiset operations

4 formulation of complex queries, nesting of queries, and views

5 aggregation

Note that duplicates in subqueries occurring in where clauses willnot change the results computed by the top-level query, but thatthis is not true for subqueries in with or from clauses.

(University of Waterloo) SQL: Duplicates and NULLs 14 / 29

Page 13: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

“Pure” SQL Equivalence, Revisited

Recall how nesting in the WHERE clause is syntactic sugar:

SELECT r.b SELECT r.bFROM r FROM r, (WHERE r.a IN ( SELECT DISTINCT bSELECT b FROM sFROM s ) AS s

) WHERE r.a = s.b

Rewriting does not generally hold if DISTINCT is removed.

(University of Waterloo) SQL: Duplicates and NULLs 15 / 29

Page 14: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

What is a “null” value?

PhoneName Office HomeJoe 1234 3456Sue 1235 ?

Sue doesn’t have home phone (value inapplicable)Sue has home phone, but we don’t know her number

(value unknown)

(University of Waterloo) SQL: Duplicates and NULLs 17 / 29

Page 15: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Value Inapplicable

Essentially poor schema design.

Better design:

Office PhoneName OfficeJoe 1234Sue 1235

Home PhoneName HomeJoe 3456

Queries should behave as if asked over the above decomposition.⇒ (relatively) easy to implement

(University of Waterloo) SQL: Duplicates and NULLs 18 / 29

Page 16: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Value Unknown

Idea

Unknown values can be replaced by any domain value (that satisfiesintegrity constraints).⇒ many possibilities (possible worlds)

PhoneName Office HomeJoe 1234 3456Sue 1235 ?

PhoneName Office HomeJoe 1234 3456Sue 1235 0000

...

PhoneName Office HomeJoe 1234 3456Sue 1235 9999

(University of Waterloo) SQL: Duplicates and NULLs 19 / 29

Page 17: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Value Unknown and Queries

How do we answer queries?

Idea

Answers true in all possible worlds W of an incomplete D.

Certain AnswerQ(D) =

⋂W world of D

Q(W )

⇒ answer common to all possible worlds.

Is this (computationally) feasible?

⇒ NO (NP-hard to undecidable except in trivial cases)

SQL’s solution: a (crude) approximation

(University of Waterloo) SQL: Duplicates and NULLs 20 / 29

Page 18: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

What can we do with NULLs in SQL?

expressionsgeneral rule: a NULL as a parameter to an operationmakes (should make) the result NULL1 + NULL→ NULL, ’foo’||NULL→ NULL, etc.

predicates/comparisonsthree-valued logic (crude approximation of “valueunknown”)

set operationsunique special value for duplicates

aggregate operationsdoesn’t “count” (i.e., “value inapplicable”)

(University of Waterloo) SQL: Duplicates and NULLs 21 / 29

Page 19: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Comparisons Revisited

Idea

Comparisons with a NULL value return UNKNOWN

Example

1 = 1 TRUE1 = NULL UNKNOWN1 = 2 FALSE

Still short of proper logical behaviour:x = 0 ∨ x 6= 0

should be always true (no matter what x is, including NULL!), but. . .

(University of Waterloo) SQL: Duplicates and NULLs 22 / 29

Page 20: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

UNKNOWN and Boolean Connectives

Idea

Boolean operations have to handle UNKNOWN⇒ extended truth tables for Boolean connectives

∧ T U FT T U FU U U FF F F F

∨ T U FT T T TU T U UF T U F

¬T FU UF T

. . . for tuples in which x is assigned the NULL value we get:x = 0 ∨ x 6= 0→ UNKNOWN ∨ UNKNOWN→ UNKNOWN

which is not the same as TRUE.

(University of Waterloo) SQL: Duplicates and NULLs 23 / 29

Page 21: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

UNKNOWN in WHERE ClausesHow is this used in a WHERE clause?

Additional syntax IS TRUE, IS FALSE, and IS UNKNOWN⇒ WHERE <cond> shorthand for WHERE <cond> IS TRUE

Special comparison IS NULL

List all authors for which we don’t know a URL of their home page:

SQL> select aid, name2 from author3 where url IS NULL;

AID NAME---------- -----------

3 Saake, Gunter

(University of Waterloo) SQL: Duplicates and NULLs 24 / 29

Page 22: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Counting NULLS

How do NULLs interact with counting (and aggregates in general)?count(URL) counts only non-NULL URL’s

⇒ count(*) counts “rows”

db2 => select count(*) as RS, count(url) as USdb2 => from author;

RS US---------------3 2

1 record(s) selected.

(University of Waterloo) SQL: Duplicates and NULLs 25 / 29

Page 23: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Outer Join

Idea

Allow “NULL-padded” answers that “fail to satisfy” a conjunct in aconjunction

Extension of syntax for the FROM clause⇒ FROM R <j-type> JOIN S ON C⇒ the <j-type> is one of FULL, LEFT, RIGHT, or INNER

Semantics (for R(x , y), S(y , z), and C = (r .y = s.y)).1 {(x , y , z) : R(x , y) ∧ S(y , z)}2 {(x , y ,NULL) : R(x , y) ∧ ¬(∃z.S(y , z))} for LEFT and FULL

3 {(NULL, y , z) : S(y , z) ∧ ¬(∃x .R(x , y))} for RIGHT and FULL

⇒ syntactic sugar for UNION ALL

(University of Waterloo) SQL: Duplicates and NULLs 26 / 29

Page 24: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Example

db2 => select aid,publicationdb2 => from author left join wrotedb2 => on aid=author;

AID PUBLICATION----------- -----------

1 ChTo981 ChTo98a1 Tom972 ChTo982 ChTo98a2 ChSa983 ChSa985 -

8 record(s) selected.

(University of Waterloo) SQL: Duplicates and NULLs 27 / 29

Page 25: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Counting with OJ

For every author count the number of publications:

db2 => select aid, count(publication) as pubsdb2 => from author left join wrotedb2 => on aid=authordb2 => group by aid;

AID PUBS----------- -----------

1 32 33 15 0

4 record(s) selected.

(University of Waterloo) SQL: Duplicates and NULLs 28 / 29

Page 26: SQL: Ordering Results, Duplicate Semantics and NULL …david/cs348/lect-BADSQL-handout.pdf · SQL: Ordering Results, Duplicate Semantics and NULL Values ... SQL has always had a MULTISET/BAG

Summary

NULLs are necessary evil

⇒ used to account for (small) irregularities in data⇒ should be used sparingly

Can be always avoided

⇒ however some of the solutions may be inefficient

You can’t escape NULLs in practice

⇒ easy fix for blunders in schema design⇒ . . . also due to schema evolution, etc.

(University of Waterloo) SQL: Duplicates and NULLs 29 / 29