SQL: Ordering Results, Duplicate Semantics and NULL Values
Spring 2018
School of Computer Science, University of Waterloo
Databases CS348
(University of Waterloo) SQL: Duplicates and NULLs 1 / 29
Ordering Results
No particular ordering on the rows of a table can be assumed when queries are written. (This is important!)
No particular ordering of rows of an intermediate result in the query can be assumed either.
However, it is possible to order the final result of a query, using the ORDER BY clause at the end of the query.
General form: ORDER BY e1 [Dir1], ..., ek [Dirk]
where each Diri is either ASC or DESC; ASC is the default.
Example
List all authors in the database in ascending order of their name:
SQL> select distinct *
     from author
     order by name;

AID NAME
--- --------------
  2 Chomicki, Jan
  3 Saake, Gunter
  1 Toman, David
Again, the asc keyword is optional, and is assumed by default.
A descending order is obtained with the desc keyword.
Minor sort keys (and minor keys within those, etc.) can be added by listing further expressions after the first.
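The ordering rules above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the database used on the slide, and an in-memory copy of the author table holding only the three rows shown above.

```python
import sqlite3

# Hypothetical in-memory copy of the author table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (aid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO author VALUES (?, ?)",
    [(1, "Toman, David"), (2, "Chomicki, Jan"), (3, "Saake, Gunter")],
)

# Ascending by name; ASC is the default direction.
by_name = conn.execute(
    "SELECT DISTINCT * FROM author ORDER BY name"
).fetchall()

# Descending major sort on name, with aid as a minor sort key.
by_name_desc = conn.execute(
    "SELECT DISTINCT * FROM author ORDER BY name DESC, aid ASC"
).fetchall()

print(by_name)
print(by_name_desc)
```

Without the ORDER BY clause the row order of either result would be unspecified.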
Multisets and Duplicates
SQL has always had a MULTISET/BAG semantics rather than a SET semantics:
⇒ SQL tables are multisets of tuples
⇒ originally for efficiency reasons
What does “allows duplicates” mean?
part
----
bolt
bolt
bolt
nut
nut

⇐⇒

part cnt
---- ---
bolt   3
nut    2
How does this impact Queries?
Example (Cheap Quantification–Projection)
EMP
Name Dept
---- -----
Bob  CS
Sue  CS
Fred PMath
Barb Stats
Jim  Stats

Applying {y | ∃x.EMP(x, y)} gives:

Dept
-----
CS
CS
PMath
Stats
Stats

⇐⇒

Dept  cnt
----- ---
CS      2
PMath   1
Stats   2
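A minimal sketch of this projection, assuming Python's sqlite3 module and an in-memory emp table with the five rows from the slide. The ORDER BY is added only to make the printed output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Bob", "CS"), ("Sue", "CS"), ("Fred", "PMath"),
     ("Barb", "Stats"), ("Jim", "Stats")],
)

# Projecting away the name keeps one output row per input row.
projected = conn.execute("SELECT dept FROM emp ORDER BY dept").fetchall()

# The same bag written as (value, multiplicity) pairs.
counted = conn.execute(
    "SELECT dept, count(*) FROM emp GROUP BY dept ORDER BY dept"
).fetchall()

print(projected)
print(counted)
```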
Range-restricted Queries for Multisets
Definition (Range restricted formulas with DISTINCT)
A formula (condition) ϕ is range restricted when, for ϕi that are also range restricted, ϕ has the form
R(xi1, ..., xik),
ϕ1 ∧ ϕ2,
ϕ1 ∧ (xi = xj)   ({xi, xj} ∩ FV(ϕ1) ≠ ∅),
∃xi.ϕ1   (xi ∈ FV(ϕ1)),
ϕ1 ∨ ϕ2   (FV(ϕ1) = FV(ϕ2)),
ϕ1 ∧ ¬ϕ2   (FV(ϕ2) = FV(ϕ1)), or
DISTINCT(ϕ1).
Duplicates and Queries
Ideas
1 A valuation can appear k times (k > 0) as a query answer
2 The number of duplicates is a function of the numbers of duplicates informulas.
3 DB, θ, k |= ϕ reads “valuation θ appears k times in ϕ’s answer” (we assume k = 0 for valuations that are NOT among ϕ’s answers).
Definition (Multiset Semantics for the Relational Calculus)
DB, θ, k |= R(x1, ..., xk)   if (θ(x1), ..., θ(xk)) ∈ R k times
DB, θ, m · n |= ϕ ∧ ψ   if DB, θ, m |= ϕ and DB, θ, n |= ψ
DB, θ, m |= ϕ ∧ (xi = xj)   if DB, θ, m |= ϕ and θ(xi) = θ(xj)
DB, θ, Σ_{v∈D} nv |= ∃x.ϕ   if DB, θ[x := v], nv |= ϕ
DB, θ, m + n |= ϕ ∨ ψ   if DB, θ, m |= ϕ and DB, θ, n |= ψ
DB, θ, m − n |= ϕ ∧ ¬ψ   if DB, θ, m |= ϕ, DB, θ, n |= ψ and m > n
DB, θ, 1 |= DISTINCT(ϕ)   if DB, θ, m |= ϕ
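The multiplicity rules above can be mimicked with a few lines of Python. This is only an illustrative sketch: a query answer is modelled as a collections.Counter from valuations to counts, and the helper names (val, atom, conj, exists, disj, minus, distinct) are made up for this example.

```python
from collections import Counter

def val(**bindings):
    """A valuation, e.g. val(y="CS"), as a hashable set of (variable, value) pairs."""
    return frozenset(bindings.items())

def atom(rows, variables):
    """R(x1, ..., xk): a valuation occurs as often as its tuple occurs in R."""
    return Counter(frozenset(zip(variables, row)) for row in rows)

def conj(a, b):
    """phi AND psi: multiplicities of compatible valuations multiply (m * n)."""
    out = Counter()
    for va, m in a.items():
        for vb, n in b.items():
            merged = dict(va)
            # Compatible means: both sides agree on every shared variable.
            if all(merged.setdefault(x, v) == v for x, v in vb):
                out[frozenset(merged.items())] += m * n
    return out

def exists(a, x):
    """EXISTS x. phi: drop x and sum the multiplicities over its values."""
    out = Counter()
    for valuation, n in a.items():
        out[frozenset((y, v) for y, v in valuation if y != x)] += n
    return out

def disj(a, b):
    """phi OR psi (same free variables): multiplicities add (m + n)."""
    return a + b

def minus(a, b):
    """phi AND NOT psi: m - n, kept only when m > n."""
    return a - b

def distinct(a):
    """DISTINCT(phi): every answer appears exactly once."""
    return Counter(dict.fromkeys(a, 1))

emp = [("Bob", "CS"), ("Sue", "CS"), ("Fred", "PMath"),
       ("Barb", "Stats"), ("Jim", "Stats")]
dept = exists(atom(emp, ("x", "y")), "x")   # {y | EXISTS x. EMP(x, y)}
print(dept[val(y="CS")], dept[val(y="PMath")], dept[val(y="Stats")])
```

Counter subtraction discards non-positive counts, which is what matches the side condition m > n in the rule for ϕ ∧ ¬ψ.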
Duplicates and SQL
Allowing duplicates leads to additional syntax.
A duplicate elimination operator
⇒ “SELECT DISTINCT x” vs. “SELECT x” in SELECT-blocks

MULTISET (BAG) operators
⇒ equivalents of set operations
⇒ but with multiset semantics.
Example
SQL> select r1.publication
     from wrote r1, wrote r2
     where r1.publication=r2.publication
     and r1.author<>r2.author;

PUBLICAT
--------
ChSa98
ChSa98
ChTo98
ChTo98
ChTo98a
ChTo98a
⇒ for a publication with n authors we get n(n − 1), i.e., O(n²), answers!
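A sketch of this self-join, assuming Python's sqlite3 module. The contents of wrote are an assumption, chosen to agree with the outputs shown elsewhere in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wrote (author INTEGER, publication TEXT)")
# Assumed contents of wrote, consistent with the slide outputs.
conn.executemany(
    "INSERT INTO wrote VALUES (?, ?)",
    [(1, "ChTo98"), (1, "ChTo98a"), (1, "Tom97"),
     (2, "ChTo98"), (2, "ChTo98a"), (2, "ChSa98"),
     (3, "ChSa98")],
)

join_sql = """
    SELECT {select} r1.publication
    FROM wrote r1, wrote r2
    WHERE r1.publication = r2.publication
      AND r1.author <> r2.author
    ORDER BY r1.publication
"""
# Each co-authored publication appears once per ordered pair of authors.
with_dups = conn.execute(join_sql.format(select="")).fetchall()
# DISTINCT collapses the duplicates.
no_dups = conn.execute(join_sql.format(select="DISTINCT")).fetchall()

print(with_dups)
print(no_dups)
```

Each publication here has two authors, so it shows up 2 · 1 = 2 times; Tom97 has a single author and does not appear at all.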
Bag Operations
Bag union: UNION ALL
⇒ additive union: bag containing all in Q1 and Q2.
Bag difference: EXCEPT ALL
⇒ subtractive difference (monus):
⇒ a bag of all tuples in Q1 for which there is no “matching” tuple in Q2.
Bag intersection: INTERSECT ALL
⇒ a bag of all tuples, each taken the maximal number of times common to Q1 and Q2 (i.e., the minimum of its multiplicities in Q1 and Q2)
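The three bag operators correspond closely to the arithmetic of Python's collections.Counter, which is used here as a stand-in for a SQL engine. The bags q1 and q2 are invented for illustration.

```python
from collections import Counter

# Two hypothetical query results, written as value -> multiplicity.
q1 = Counter({"bolt": 3, "nut": 1})
q2 = Counter({"bolt": 1, "nut": 2, "screw": 1})

union_all = q1 + q2       # UNION ALL: multiplicities add
except_all = q1 - q2      # EXCEPT ALL: monus, never below zero
intersect_all = q1 & q2   # INTERSECT ALL: minimum of the multiplicities

print(union_all)
print(except_all)
print(intersect_all)
```

Note that nut disappears from the difference: one copy in q1 is matched by one of the two copies in q2.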
Example
SQL> ( select author
       from wrote, book
       where publication=pubid )
     union all
     ( select author
       from wrote, article
       where publication=pubid );

AUTHOR
----------
         2
         3
         1
         2
         1
         2
         1
... a fragment of a more meaningful query (coming later).
Summary
SQL covered so far:
1 simple SELECT BLOCK
2 set operations
3 duplicates and multiset operations
4 formulation of complex queries, nesting of queries, and views
5 aggregation
Note that duplicates in subqueries occurring in where clauses will not change the results computed by the top-level query, but that this is not true for subqueries in with or from clauses.
“Pure” SQL Equivalence, Revisited
Recall how nesting in the WHERE clause is syntactic sugar:
SELECT r.b
FROM r
WHERE r.a IN (
    SELECT b
    FROM s
)

is equivalent to

SELECT r.b
FROM r, (
    SELECT DISTINCT b
    FROM s
) AS s
WHERE r.a = s.b
The rewriting does not hold in general if DISTINCT is removed.
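A small check of this equivalence, assuming Python's sqlite3 module and two made-up tables r(a, b) and s(b) in which s contains a duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (a INTEGER, b TEXT)")
conn.execute("CREATE TABLE s (b INTEGER)")
conn.executemany("INSERT INTO r VALUES (?, ?)", [(1, "x"), (2, "y")])
conn.executemany("INSERT INTO s VALUES (?)", [(1,), (1,), (3,)])  # 1 is duplicated

# Nested form: duplicates inside the WHERE subquery do not matter.
nested = conn.execute(
    "SELECT r.b FROM r WHERE r.a IN (SELECT b FROM s)"
).fetchall()

# Rewritten form with DISTINCT: same answer.
join_distinct = conn.execute(
    "SELECT r.b FROM r, (SELECT DISTINCT b FROM s) AS s WHERE r.a = s.b"
).fetchall()

# Rewritten form without DISTINCT: duplicates in the FROM subquery leak out.
join_plain = conn.execute(
    "SELECT r.b FROM r, (SELECT b FROM s) AS s WHERE r.a = s.b"
).fetchall()

print(nested, join_distinct, join_plain)
```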
What is a “null” value?
Phone
Name Office Home
---- ------ ----
Joe  1234   3456
Sue  1235   ?

Sue doesn’t have a home phone (value inapplicable)
Sue has a home phone, but we don’t know her number (value unknown)
Value Inapplicable
Essentially poor schema design.
Better design:
Office Phone
Name Office
---- ------
Joe  1234
Sue  1235

Home Phone
Name Home
---- ----
Joe  3456
Queries should behave as if asked over the above decomposition.
⇒ (relatively) easy to implement
Value Unknown
Idea
Unknown values can be replaced by any domain value (that satisfies integrity constraints).
⇒ many possibilities (possible worlds)

Phone
Name Office Home
---- ------ ----
Joe  1234   3456
Sue  1235   ?

→

Phone
Name Office Home
---- ------ ----
Joe  1234   3456
Sue  1235   0000

...

Phone
Name Office Home
---- ------ ----
Joe  1234   3456
Sue  1235   9999
Value Unknown and Queries
How do we answer queries?
Idea
Answers true in all possible worlds W of an incomplete D.
Certain Answer: Q(D) = ⋂_{W world of D} Q(W)
⇒ answer common to all possible worlds.
Is this (computationally) feasible?
⇒ NO (NP-hard to undecidable except in trivial cases)
SQL’s solution: a (crude) approximation
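For a tiny instance the certain answer can be computed by brute force. The sketch below makes several assumptions: None stands for the unknown value, the domain of home numbers is cut down to three made-up values, there are no integrity constraints, and the helper names (worlds, certain) are invented here.

```python
from itertools import product

# Hypothetical, deliberately tiny domain of home phone numbers.
DOMAIN = ["0000", "3456", "9999"]

# The incomplete Phone table; None marks the unknown value.
phone = [("Joe", "1234", "3456"), ("Sue", "1235", None)]

def worlds(table, domain):
    """Yield every table obtained by replacing each None with a domain value."""
    slots = [(i, j) for i, row in enumerate(table)
             for j, v in enumerate(row) if v is None]
    for choice in product(domain, repeat=len(slots)):
        world = [list(row) for row in table]
        for (i, j), v in zip(slots, choice):
            world[i][j] = v
        yield [tuple(row) for row in world]

def certain(query, table, domain):
    """Intersect the query answers over all possible worlds."""
    return set.intersection(*(set(query(w)) for w in worlds(table, domain)))

def home_is_0000(w):
    return {name for name, _, home in w if home == "0000"}

def tautology(w):
    return {name for name, _, home in w if home == "0000" or home != "0000"}

print(certain(home_is_0000, phone, DOMAIN))   # Sue qualifies in one world only
print(certain(tautology, phone, DOMAIN))      # true in every world
```

The number of worlds grows as |domain| raised to the number of unknown values, which hints at why this approach does not scale.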
What can we do with NULLs in SQL?
expressions
    general rule: a NULL as a parameter to an operation makes (should make) the result NULL
    1 + NULL → NULL, 'foo' || NULL → NULL, etc.
predicates/comparisons
    three-valued logic (crude approximation of “value unknown”)
set operations
    unique special value for duplicates
aggregate operations
    doesn’t “count” (i.e., “value inapplicable”)
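Each of the four cases can be observed directly. The sketch assumes Python's sqlite3 module, where a SQL NULL is returned as None; this is SQLite's behaviour, and other systems are assumed, not guaranteed, to agree on these particular queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    """Run a query that returns a single row and return that row."""
    return conn.execute(sql).fetchone()

# Expressions: a NULL operand makes the result NULL.
arith = one("SELECT 1 + NULL")[0]
concat = one("SELECT 'foo' || NULL")[0]

# Comparisons: the result is UNKNOWN, which SQLite reports as NULL.
comparison = one("SELECT 1 = NULL")[0]

# Set operations: NULLs are treated as duplicates of each other.
deduplicated = conn.execute("SELECT NULL UNION SELECT NULL").fetchall()

# Aggregates: NULLs are skipped.
aggregates = one(
    "SELECT count(x), sum(x) "
    "FROM (SELECT 1 AS x UNION ALL SELECT NULL UNION ALL SELECT 2)"
)

print(arith, concat, comparison, deduplicated, aggregates)
```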
Comparisons Revisited
Idea
Comparisons with a NULL value return UNKNOWN
Example
1 = 1      TRUE
1 = NULL   UNKNOWN
1 = 2      FALSE

Still short of proper logical behaviour: x = 0 ∨ x ≠ 0
should be always true (no matter what x is, including NULL!), but. . .
UNKNOWN and Boolean Connectives
Idea
Boolean operations have to handle UNKNOWN
⇒ extended truth tables for Boolean connectives

∧ | T U F
--+------
T | T U F
U | U U F
F | F F F

∨ | T U F
--+------
T | T T T
U | T U U
F | T U F

¬ |
--+--
T | F
U | U
F | T

... for tuples in which x is assigned the NULL value we get:
x = 0 ∨ x ≠ 0 → UNKNOWN ∨ UNKNOWN → UNKNOWN
which is not the same as TRUE.
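The truth tables can be written down as three small functions. This is a sketch, not how any SQL engine implements them: UNKNOWN is modelled as Python's None, and the names and3, or3, not3 and eq3 are invented for the example.

```python
T, U, F = True, None, False   # UNKNOWN is modelled as None

def and3(a, b):
    """Three-valued AND: FALSE dominates, then UNKNOWN."""
    if a is F or b is F:
        return F
    if a is U or b is U:
        return U
    return T

def or3(a, b):
    """Three-valued OR: TRUE dominates, then UNKNOWN."""
    if a is T or b is T:
        return T
    if a is U or b is U:
        return U
    return F

def not3(a):
    """Three-valued NOT: UNKNOWN stays UNKNOWN."""
    return U if a is U else (not a)

def eq3(x, y):
    """Comparison that returns UNKNOWN when either side is NULL (None)."""
    return U if x is None or y is None else x == y

x = None   # x is NULL
tautology = or3(eq3(x, 0), not3(eq3(x, 0)))   # x = 0 OR x <> 0
print(tautology)
```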
UNKNOWN in WHERE Clauses
How is this used in a WHERE clause?
Additional syntax IS TRUE, IS FALSE, and IS UNKNOWN
⇒ WHERE <cond> shorthand for WHERE <cond> IS TRUE
Special comparison IS NULL
List all authors for which we don’t know a URL of their home page:
SQL> select aid, name
     from author
     where url IS NULL;

AID        NAME
---------- -----------
         3 Saake, Gunter
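The following sketch contrasts IS NULL with an ordinary comparison against NULL. It assumes Python's sqlite3 module; the URLs stored for the first two authors are placeholders, not data from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (aid INTEGER, name TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO author VALUES (?, ?, ?)",
    [(1, "Toman, David", "http://example.org/toman"),       # placeholder URL
     (2, "Chomicki, Jan", "http://example.org/chomicki"),   # placeholder URL
     (3, "Saake, Gunter", None)],
)

# IS NULL is a special comparison that returns TRUE or FALSE, never UNKNOWN.
is_null = conn.execute(
    "SELECT aid, name FROM author WHERE url IS NULL"
).fetchall()

# url = NULL is UNKNOWN for every row, so WHERE keeps nothing...
eq_null = conn.execute(
    "SELECT aid, name FROM author WHERE url = NULL"
).fetchall()

# ...and negating UNKNOWN is still UNKNOWN.
not_eq_null = conn.execute(
    "SELECT aid, name FROM author WHERE NOT (url = NULL)"
).fetchall()

print(is_null, eq_null, not_eq_null)
```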
Counting NULLS
How do NULLs interact with counting (and aggregates in general)?
count(URL) counts only non-NULL URLs
⇒ count(*) counts “rows”

db2 => select count(*) as RS, count(url) as US
db2 => from author;

RS          US
----------- -----------
          3           2
1 record(s) selected.
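The same two counts, reproduced as a sketch with Python's sqlite3 module instead of DB2. Only the url column matters here, so the table is reduced to aid and url, and the non-NULL URLs are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (aid INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO author VALUES (?, ?)",
    [(1, "http://example.org/a"), (2, "http://example.org/b"), (3, None)],
)

# count(*) counts rows; count(url) skips rows whose url is NULL.
rs, us = conn.execute("SELECT count(*), count(url) FROM author").fetchone()

# Restricting to the NULL rows makes the difference explicit.
null_rows, null_urls = conn.execute(
    "SELECT count(*), count(url) FROM author WHERE url IS NULL"
).fetchone()

print(rs, us, null_rows, null_urls)
```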
Outer Join
Idea
Allow “NULL-padded” answers that “fail to satisfy” a conjunct in a conjunction

Extension of syntax for the FROM clause
⇒ FROM R <j-type> JOIN S ON C
⇒ the <j-type> is one of FULL, LEFT, RIGHT, or INNER

Semantics (for R(x, y), S(y, z), and C = (r.y = s.y)):
1 {(x, y, z) : R(x, y) ∧ S(y, z)}
2 {(x, y, NULL) : R(x, y) ∧ ¬(∃z.S(y, z))} for LEFT and FULL
3 {(NULL, y, z) : S(y, z) ∧ ¬(∃x.R(x, y))} for RIGHT and FULL
⇒ syntactic sugar for UNION ALL
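The "syntactic sugar" reading can be checked for the LEFT case: parts 1 and 2 of the semantics, glued together with UNION ALL, should give the same rows as LEFT JOIN. The sketch assumes Python's sqlite3 module and two invented tables r(x, y) and s(y, z).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (x INTEGER, y TEXT)")
conn.execute("CREATE TABLE s (y TEXT, z INTEGER)")
conn.executemany("INSERT INTO r VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO s VALUES (?, ?)", [("a", 10), ("a", 11), ("c", 12)])

# The outer join written with the extended FROM syntax.
left_join = conn.execute("""
    SELECT r.x, r.y, s.z
    FROM r LEFT JOIN s ON r.y = s.y
    ORDER BY 1, 2, 3
""").fetchall()

# Part 1 (matching rows) UNION ALL part 2 (NULL-padded rows of r without a match).
desugared = conn.execute("""
    SELECT r.x, r.y, s.z
    FROM r, s
    WHERE r.y = s.y
    UNION ALL
    SELECT r.x, r.y, NULL
    FROM r
    WHERE NOT EXISTS (SELECT 1 FROM s WHERE s.y = r.y)
    ORDER BY 1, 2, 3
""").fetchall()

print(left_join)
print(desugared)
```

The row of s with y = 'c' has no partner in r; it would only appear, padded as (NULL, 'c', 12), under RIGHT or FULL.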
Example
db2 => select aid,publication
db2 => from author left join wrote
db2 => on aid=author;

AID         PUBLICATION
----------- -----------
          1 ChTo98
          1 ChTo98a
          1 Tom97
          2 ChTo98
          2 ChTo98a
          2 ChSa98
          3 ChSa98
          5 -
8 record(s) selected.
Counting with OJ
For every author count the number of publications:
db2 => select aid, count(publication) as pubs
db2 => from author left join wrote
db2 => on aid=author
db2 => group by aid;

AID         PUBS
----------- -----------
          1           3
          2           3
          3           1
          5           0
4 record(s) selected.
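A sketch of the same query using Python's sqlite3 module instead of DB2. The table contents are assumptions chosen to reproduce the output above; the author table is reduced to its aid column, and author 5 has no rows in wrote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (aid INTEGER)")
conn.execute("CREATE TABLE wrote (author INTEGER, publication TEXT)")
conn.executemany("INSERT INTO author VALUES (?)", [(1,), (2,), (3,), (5,)])
conn.executemany(
    "INSERT INTO wrote VALUES (?, ?)",
    [(1, "ChTo98"), (1, "ChTo98a"), (1, "Tom97"),
     (2, "ChTo98"), (2, "ChTo98a"), (2, "ChSa98"),
     (3, "ChSa98")],
)

count_sql = """
    SELECT author.aid, count({arg}) AS pubs
    FROM author LEFT JOIN wrote ON author.aid = wrote.author
    GROUP BY author.aid
    ORDER BY author.aid
"""
# count(publication) skips the NULL padding, so author 5 gets 0.
pubs = conn.execute(count_sql.format(arg="publication")).fetchall()
# count(*) counts the padded row too, so author 5 would wrongly get 1.
rows = conn.execute(count_sql.format(arg="*")).fetchall()

print(pubs)
print(rows)
```

With an inner join instead of the left join, author 5 would be missing from the result altogether.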
Summary
NULLs are a necessary evil
⇒ used to account for (small) irregularities in data
⇒ should be used sparingly

Can always be avoided
⇒ however some of the solutions may be inefficient

You can’t escape NULLs in practice
⇒ easy fix for blunders in schema design
⇒ ... also due to schema evolution, etc.