Chapter 5 Relational Algebra and SQL



Now that we have some idea of how to create and set up a database from a project specification, via an E/R chart, we will learn how to get the “right stuff” out of such a database, which is what we do most of the time.

A database query language is a special-purpose programming language designed to retrieve, and update, information stored in databases.

The Structured Query Language (SQL) is the one that we use most of the time. A very important feature of an SQL statement is that it states only what result is wanted, not how to compute it; the latter is left for the DBMS to figure out. SQL is thus called a declarative language, as opposed to procedural languages.

1

RA, SQL, and MariaDB

SQL is based on a mathematical body of knowledge, relational algebra (RA), which serves as an intermediate language for the DBMS.

When a declarative SQL statement is parsed by a DBMS, it is translated into an RA expression. This expression is then analyzed and optimized by a query optimizer into an equivalent but more efficient form, a query execution plan, which is in turn converted into a piece of executable code.

The mathematical nature of relational algebra is what makes such analysis, optimization, and proof of equivalence possible.

MariaDB is one implementation of SQL, developed incrementally; the most recent version at the time of writing was 10.6.4, released on August 6, 2021.

2

What is relational algebra?

A relational algebraic expression consists of a combination of some eight, or nine, basic operators.

They fall into three groups: two of them, Restrict and Project, work on tables; four, Union, Difference, Intersection, and Cartesian product, work on sets; and two, Join and Division, are derived from the others.

Sometimes renaming, the ninth one, also plays a role, when a name change becomes necessary.

Just as we combine the three basic control structures to come up with a program as we know it, we combine these operators to come up with a data access program.

3

Eight operators illustrated...

4

...and in words

A ∪ B, the union of A and B, returns a relation containing all tuples that appear in either, or both, of the two specified relations, A and B.

A ∩ B, the intersection of A and B, returns a relation containing all tuples that appear in both of the two specified relations, A and B.

A × B, the product of A and B, returns a relation containing all possible pairs (a, b), where a is from A, and b from B.

A − B, the difference of A and B, returns a relation containing all the tuples that appear in A, but not in B.

We also mentioned, during the review, that the first two are commutative (?), but the other two are not.
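The commutativity remark can be checked with a small sketch. It assumes relations are modeled as Python sets of positional tuples, and the sample values are made up for illustration; with positional tuples the product depends on the operand order.

```python
# Relations modeled as sets of tuples (an assumption made for this sketch).
A = {("S1",), ("S2",)}
B = {("S2",), ("S3",)}

union = A | B
intersection = A & B
difference_ab = A - B
difference_ba = B - A
# Product: every tuple of the left operand concatenated with every tuple of the right.
product_ab = {a + b for a in A for b in B}
product_ba = {b + a for b in B for a in A}

print(union == (B | A), intersection == (B & A))           # True True
print(difference_ab == difference_ba, product_ab == product_ba)  # False False
```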

5

σ_C(R), the restriction of a relation R on C, returns a relation containing all the tuples of R that satisfy the condition C.

π_As(R), the projection of a relation R on As, returns a relation containing all the (sub)tuples of R over the attributes As.

A ⋈_C B, the join of A and B in terms of C, returns a relation containing all the pairs (a, b), a ∈ A and b ∈ B, such that (a, b) satisfies the condition C.

The natural join of A and B collects those pairs that agree on the shared attribute(s).

A/C via B, the division of A by C via B, takes two unary relations, A and C, and a binary one, B, as its inputs. As the output, it sends back a relation containing all the tuples of A that are matched, as shown in B, with every tuple in C.

6

Restriction (Select)

We apply this operation to select a subset of tuples satisfying certain Boolean conditions.

For example, if we want to get a list of computer science professors, we use a select operation to get it from the Professor table as follows:

σ_{DeptId='CS'}(Professor)

i.e., “Select all tuples from the Professor relation that satisfy the condition that DeptId='CS'.”

The general syntax is the following:

σ_{selection condition}(relation expression)

7

What could a condition be?

A condition can be a simple one, such as

attribute ⊕ constant,

e.g., “DeptId='CS'”; or

attribute ⊕ attribute,

e.g., “Teaching.ProfId = Professor.Id”, where ⊕ stands for a standard comparator such as '=' or '<'.

It could also be a general logic expression, i.e., an expression formed with logical operators, such as And (∧), Or (∨), and Not (¬).

You must have learned this stuff in earlier courses.

8

How do we get the nastiness?

σ_{selection condition}(relation expression) is just a string. But, when applied to a concrete database, it has a value as its meaning.

Assume that it is applied to a relation instance r of type R. We define the value of the expression σ_{selection condition}(r) to be the collection of all the tuples in r that satisfy the selection condition.

The important thing is that, when applied to a relation, this expression results in another relation.

Question: So what?

This provides the basis for nested (nasty) queries, as a query can be put anywhere a table fits.

9

An example

Given the following Person table,

Id Name Address Hobby

1123 John 123 Main St. Stamps

1123 John 123 Main St. Coin

5556 Mary 7 Lake Dr. Hike

9876 Bart 5 Pine St. Stamps

with the expression σHobby=’Stamps’(Person),

we will get the following table back.

Id Name Address Hobby

1123 John 123 Main St. Stamps

9876 Bart 5 Pine St. Stamps

This latter table (relation) can be used in other

queries... .
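A minimal sketch of the restriction, assuming a relation is modeled as a list of Python dicts. The helper name `restrict` is hypothetical; the rows are those of the Person table above. Because the result is again a relation, it can be fed into another restriction.

```python
Person = [
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Coin"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr.", "Hobby": "Hike"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St.", "Hobby": "Stamps"},
]

def restrict(relation, condition):
    """Return the tuples of `relation` that satisfy the Boolean `condition`."""
    return [t for t in relation if condition(t)]

# sigma_{Hobby='Stamps'}(Person)
stamp_collectors = restrict(Person, lambda t: t["Hobby"] == "Stamps")
# The result is a relation too, so it can be restricted again.
johns = restrict(stamp_collectors, lambda t: t["Name"] == "John")

print([t["Id"] for t in stamp_collectors])
```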

10

Another example

Given a more complicated condition

σStudId!=1111111 And (Semester=‘S2017’ Or Grade=’B’)(Transcript)

for each and every tuple in the table, it will

check if it satisfies the requirement, and throw

that into the result bucket if it does.

Question: What do we want?

The condition part could be further extended,

e.g.,

EmpSalary > (MngrSalary ∗ 2)

And (DeptId + CrsNumber) Like CrsCode

where ‘+’ is for string concatenation, and ‘Like’

is for pattern matching.

11

What do we usually do?

Query: What are “all the courses taught by CS professors”?

Question: How should we do it?

Answer: We always start with the input and

walk towards the output.

Two tables, Teaching and Professor, are involved, and the input is “CS”.

Since those two tables share the professor id, we can find the output by pairing tuples from the two tables that have the same professor id and whose professor is affiliated with Computer Science.

Technically, we can use the following RA expression.

σ_{Professor.DeptId='CS' And Teaching.ProfId=Professor.Id}(Teaching × Professor)

12

Check it out...

Given the following data of the Professor table,

+------+--------------+--------+
| Id   | Name         | DeptId |
+------+--------------+--------+
| 1111 | Jacob        | MG     |
| 2222 | John         | CS     |
| 3333 | David        | EE     |
| 4444 | Mary         | CS     |
+------+--------------+--------+

and that of the Teaching table:

+------+---------+----------+
|ProfId| CrsCode | Semester |
+------+---------+----------+
| 1111 | MGT123  | F1995    |
| 2222 | CS305   | S1996    |
| 2222 | CS315   | F1997    |
| 3333 | EE101   | F1995    |
| 4444 | CS305   | F1995    |
+------+---------+----------+

13

How will it get the result?

1. Get the information by doing a Cartesian

product

Id#   NAME   DeptId  ProfId  CrsCode  Semester
1111  Jacob  MG      1111    MGT123   F1995
1111  Jacob  MG      2222    CS305    S1996
1111  Jacob  MG      2222    CS315    F1997
1111  Jacob  MG      3333    EE101    F1995
1111  Jacob  MG      4444    CS305    F1995
2222  John   CS      1111    MGT123   F1995
2222  John   CS      2222    CS305    S1996
2222  John   CS      2222    CS315    F1997
2222  John   CS      3333    EE101    F1995
2222  John   CS      4444    CS305    F1995
3333  David  EE      1111    MGT123   F1995
3333  David  EE      2222    CS305    S1996
3333  David  EE      2222    CS315    F1997
3333  David  EE      3333    EE101    F1995
3333  David  EE      4444    CS305    F1995
4444  Mary   CS      1111    MGT123   F1995
4444  Mary   CS      2222    CS305    S1996
4444  Mary   CS      2222    CS315    F1997
4444  Mary   CS      3333    EE101    F1995
4444  Mary   CS      4444    CS305    F1995

The first row is “related”, but not what we want; and the second is not related.

We need to get rid of such rows by doing a restriction through the two conditions.

14

Keep those useful...

... taught by CS professors...

Id#   NAME  DeptId  ProfId  CrsCode  Semester
2222  John  CS      1111    MGT123   F1995
2222  John  CS      2222    CS305    S1996
2222  John  CS      2222    CS315    F1997
2222  John  CS      3333    EE101    F1995
2222  John  CS      4444    CS305    F1995
4444  Mary  CS      1111    MGT123   F1995
4444  Mary  CS      2222    CS305    S1996
4444  Mary  CS      2222    CS315    F1997
4444  Mary  CS      3333    EE101    F1995
4444  Mary  CS      4444    CS305    F1995

... and tuples have to be related, with matching Ids.

Id#   NAME  DeptId  ProfId  CrsCode  Semester
2222  John  CS      2222    CS305    S1996
2222  John  CS      2222    CS315    F1997
4444  Mary  CS      4444    CS305    F1995

Question: Is this what we want?

Answer: No. We want names of those courses,

taught by CS professors.

Question: How could we focus on, e.g, CS305,

and get what we really want?

15

The three yards....

1. A Cartesian product will be formed, which contains a (h)uge table of twenty rows and six columns.

2. Only those ten (?) rows, where DeptId has the “CS” value, will be kept.

3. Finally, it keeps three (?) rows, where CS professors 2222 and 4444 taught CS305 and CS315.

4. All these three rows contain six attributes,

and we will use projection to get out the course

code for these courses, as we will see later.

5. We want more..., e.g., focusing on the

course Ids and get their names through the

Course table.
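The row counts in the steps above can be checked with a short sketch. It assumes relations are modeled as lists of positional tuples and uses the Professor and Teaching rows from Page 13; the variable names are made up for illustration.

```python
from itertools import product

# (Id, Name, DeptId) and (ProfId, CrsCode, Semester), as on Page 13.
Professor = [(1111, "Jacob", "MG"), (2222, "John", "CS"),
             (3333, "David", "EE"), (4444, "Mary", "CS")]
Teaching = [(1111, "MGT123", "F1995"), (2222, "CS305", "S1996"),
            (2222, "CS315", "F1997"), (3333, "EE101", "F1995"),
            (4444, "CS305", "F1995")]

# Step 1: Cartesian product; a row is (Id, Name, DeptId, ProfId, CrsCode, Semester).
pairs = [p + t for p, t in product(Professor, Teaching)]
# Step 2: keep the rows whose DeptId is 'CS'.
cs_rows = [r for r in pairs if r[2] == "CS"]
# Step 3: keep the rows where Professor.Id = Teaching.ProfId.
matched = [r for r in cs_rows if r[0] == r[3]]
# Step 4: project on CrsCode; a set drops the duplicate CS305.
course_codes = sorted({r[4] for r in matched})

print(len(pairs), len(cs_rows), len(matched), course_codes)
# prints: 20 10 3 ['CS305', 'CS315']
```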

16

Is there a better way?

The above solution works, but it is bulky.

We always want smaller intermediate tables, to reduce the space, as well as the time, needed to get something done.

Procedurally, we start with the Professor table to get all the professors who work in the CS department, then walk over to the Teaching table to get those tuples whose ProfId matches one of those that we just found.

σ_{Teaching.ProfId=Professor.Id}((σ_{Professor.DeptId='CS'}(Professor)) × Teaching)

Question: Is this one better?

Notice that we have to get the stuff from two tables. Thus, we are really joining tuples from two related tables that agree on the professor id, shared between the two tables.

17

Check it out...

Given the following data of the Professor table,

+------+--------------+--------+
| Id   | Name         | DeptId |
+------+--------------+--------+
| 1111 | Jacob        | MG     |
| 2222 | John         | CS     |
| 3333 | David        | EE     |
| 4444 | Mary         | CS     |
+------+--------------+--------+

1. The restriction

σ_{Professor.DeptId='CS'}(Professor)

will get us the following:

+------+--------------+--------+
| Id   | Name         | DeptId |
+------+--------------+--------+
| 2222 | John         | CS     |
| 4444 | Mary         | CS     |
+------+--------------+--------+

18

2. With the restricted Cartesian product

(σ_{Professor.DeptId='CS'}(Professor)) × Teaching

we get the following smaller intermediate table:

Id#   NAME  DeptId  ProfId  CrsCode  Semester
2222  John  CS      1111    MGT123   F1995
2222  John  CS      2222    CS305    S1996
2222  John  CS      2222    CS315    F1997
2222  John  CS      3333    EE101    F1995
2222  John  CS      4444    CS305    F1995
4444  Mary  CS      1111    MGT123   F1995
4444  Mary  CS      2222    CS305    S1996
4444  Mary  CS      2222    CS315    F1997
4444  Mary  CS      3333    EE101    F1995
4444  Mary  CS      4444    CS305    F1995

3. Finally, the final layer of restriction

σ_{Teaching.ProfId=Professor.Id}((σ_{Professor.DeptId='CS'}(Professor)) × Teaching)

leads to the following result.

Id#   NAME  DeptId  ProfId  CrsCode  Semester
2222  John  CS      2222    CS305    S1996
2222  John  CS      2222    CS315    F1997
4444  Mary  CS      4444    CS305    F1995
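To see the saving, the sketch below restricts Professor first and only then forms the product. It assumes the same list-of-tuples model and the rows from Page 13; the variable names are illustrative only.

```python
Professor = [(1111, "Jacob", "MG"), (2222, "John", "CS"),
             (3333, "David", "EE"), (4444, "Mary", "CS")]
Teaching = [(1111, "MGT123", "F1995"), (2222, "CS305", "S1996"),
            (2222, "CS315", "F1997"), (3333, "EE101", "F1995"),
            (4444, "CS305", "F1995")]

# Inner restriction first: only the CS professors survive.
cs_profs = [p for p in Professor if p[2] == "CS"]
# The product is now formed over 2 professors instead of 4.
small_product = [p + t for p in cs_profs for t in Teaching]
# Outer restriction: Teaching.ProfId = Professor.Id.
result = [r for r in small_product if r[3] == r[0]]

full_product_size = len(Professor) * len(Teaching)
print(full_product_size, len(small_product), len(result))  # prints: 20 10 3
```

The final result is the same three rows as before; only the intermediate table shrinks.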

19

Projection

A table might contain too much stuff, and we don't always want to get back all the information. The projection helps us choose attributes.

Let A denote an attribute of a relation R, and let t be a tuple in r, an instance of R. Then t.A denotes the part of t consisting of the column under A only; e.g., if t is a tuple of the Professor table, then t.Id refers to the Id value of this tuple.

In general, we have the following notation:

π_{attribute list}(relation)

When applied to a relation r of type R, where A1, ···, An are attributes of R, π_{A1,···,An}(r), the projection of r on the list, returns the collection of tuples t.[A1, ···, An], where t is a tuple of r.

20

An example

Given the following Person table,

Id Name Address Hobby

1123 John 123 Main St. Stamps

1123 John 123 Main St. Coin

5556 Mary 7 Lake Dr. Hike

9876 Bart 5 Pine St. Stamps

with the expression πName,Hobby(Person), we

will get the following table back

Name Hobby

John Stamps

John Coin

Mary Hike

Bart Stamps
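A minimal sketch of projection, assuming the list-of-dicts model; `project` is a hypothetical helper name. Since a relation is a set, duplicate sub-tuples are dropped, which becomes visible when projecting on Id and Name.

```python
Person = [
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main St.", "Hobby": "Coin"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr.", "Hobby": "Hike"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St.", "Hobby": "Stamps"},
]

def project(relation, attributes):
    """Keep only `attributes` of each tuple, dropping duplicate sub-tuples."""
    seen, result = set(), []
    for t in relation:
        sub = tuple(t[a] for a in attributes)
        if sub not in seen:
            seen.add(sub)
            result.append(dict(zip(attributes, sub)))
    return result

name_hobby = project(Person, ["Name", "Hobby"])  # four distinct rows
id_name = project(Person, ["Id", "Name"])        # John appears only once

print(len(name_hobby), len(id_name))  # prints: 4 3
```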

21

Embedded (nested) expression

The gist of DB programming is that RA op-

erations can be combined. For example, given

the following table

Id Name Address Hobby

1123 John 123 Main St. Stamps

1123 John 123 Main St. Coin

5556 Mary 7 Lake Dr. Hike

9876 Bart 5 Pine St. Stamps

and πId,Name(σHobby=’Stamps’ Or Hobby=’Coins’(Person)),

we will get the following table back

Id Name

1123 John

9876 Bart

Notice the order of operation is “inside-out”,

just like in an arithmetic expression.

22

Related to SQL

Given πId,Name(σHobby=’Stamps’ Or Hobby=’Coins’(Person)),

we immediately have the following SQL query:

Select Distinct Id, Name
From Person
Where Hobby='Stamps' or Hobby='Coins';

Question: What will the above get?

MariaDB [zshen]> Select Distinct Id, Name
    -> From Person
    -> Where Hobby='Stamps' or Hobby='Coins';

+------+------+

| Id | Name |

+------+------+

| 1123 | John |

| 9876 | Bart |

+------+------+

2 rows in set (0.00 sec)
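The query can also be tried without a MariaDB server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption: SQLite is a different engine, and the column types and the added Order By (used only to make the output order predictable) are not from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the slides only show the data.
conn.execute("CREATE TABLE Person (Id INTEGER, Name TEXT, Address TEXT, Hobby TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?)",
    [
        (1123, "John", "123 Main St.", "Stamps"),
        (1123, "John", "123 Main St.", "Coin"),
        (5556, "Mary", "7 Lake Dr.", "Hike"),
        (9876, "Bart", "5 Pine St.", "Stamps"),
    ],
)

rows = conn.execute(
    "SELECT DISTINCT Id, Name FROM Person "
    "WHERE Hobby = 'Stamps' OR Hobby = 'Coins' ORDER BY Id"
).fetchall()
print(rows)  # prints: [(1123, 'John'), (9876, 'Bart')]
```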

23

deja vu

Query: What are “all the courses taught by

CS professors”?

The following RA expression finds out the course

numbers of the courses taught by CS profes-

sors (Cf. Page 12).

πCrsCode(σProfessor.DeptId=’CS’ And ProfId=Professor.Id

(Teaching× Professor))

Question: What are those courses, namely,

names?

Since course names are in the Course table, we get the stuff out of Teaching × Professor × Course, where the tuples agree on both the professor Id and the course code.

π_{CrsName}(σ_{Course.CrsCode=Teaching.CrsCode}(π_{CrsCode}(σ_{Professor.DeptId='CS' And ProfId=Professor.Id}(Teaching × Professor)) × Course))

24

How would it work out?

a. Courses that a CS person teaches (Cf. Page 15 or 19).

Id#   NAME  DeptId  ProfId  CrsCode  Semester
2222  John  CS      2222    CS305    S1996
2222  John  CS      2222    CS315    F1997
4444  Mary  CS      4444    CS305    F1995

b. Get the course code:

CrsCode
CS305
CS315

c. Join with Course with matching CrsCode:

CrsCode  DeptId  CrsName       Descr
CS305    CS      Database      On the road to high-paying job
CS315    CS      Trans. Proc.  Recover from your worst crashes

d. Get the names by making a projection on

the course names.

CrsName

Database

Trans. Proc.

25

Related to SQL

Given the following RA query

π_{CrsName}(σ_{Course.CrsCode=Teaching.CrsCode And Professor.DeptId='CS' And Teaching.ProfId=Professor.Id}(Teaching × Professor × Course))

We immediately (?) get the following

Select Distinct CrsName
From Course C, Teaching T, Professor P
Where P.DeptId="CS" And T.CrsCode=C.CrsCode
And T.ProfId=P.Id;

With the current instance, this query brings back the following:

MariaDB [register]> Select Distinct CrsName
    -> From Course C, Teaching T, Professor P
    -> Where P.DeptId="CS" And T.CrsCode=C.CrsCode
    -> And T.ProfId=P.Id;
+---------------------+
| CrsName             |
+---------------------+
| Database Systems.   |
| Transaction Process |
+---------------------+
2 rows in set (0.00 sec)

26

Is there a better way?

A better way might be the following (Cf. Page 17):

π_{CrsName}(σ_{Teaching.CrsCode=Course.CrsCode}(π_{CrsCode}(σ_{Teaching.ProfId=Professor.Id}((σ_{Professor.DeptId='CS'}(Professor)) × Teaching)) × Course))

Procedurally, we start with the Professor table to get all the Ids of those professors who work in the CS department, then walk over to the Teaching table to get the course Ids of those courses taught by CS professors. Finally, we walk over to the Course table with those course Ids to get the names of those courses.

Question: Why is this one better?

Answer: All the intermediate tables will con-

tain minimum information that we need to con-

tinue.

Assignment: Use the current instance (Unit

4, Page 23-24) to verify the answer.

27

Set operations

Since relations (tables) are sets, the set operations are pretty straightforward. You must have played with them in either Finite Math, MA for CS, Math Reasoning, or Discrete Math.

Again, given two sets A and B, their union, intersection, and difference are represented as A ∪ B, A ∩ B, and A − B, respectively. Notice that although the first two are symmetric, the difference is not, i.e., A − B could be different from B − A.

Given two relations r and s, we immediately obtain r ∪ s, r ∩ s, and r − s as the collections of tuples that are in either r or s; in both r and s; and in r but not in s, respectively. Thus, the results are all sets, as well.

28

Union compatible

To be meaningful in database manipulation, when we apply set operators, both relations must have the same attributes, i.e., be union compatible.

π_{CrsCode,Semester}(σ_{Grade='C'}(Transcript)) − π_{CrsCode,Semester}(σ_{CrsCode='MAT123'}(Transcript))

What are the courses, except MAT123, and when were they offered, in which at least one student got a 'C'?

π_{CrsCode,Semester}(σ_{Grade='C'}(Transcript)) ∪ π_{CrsCode,Semester}(σ_{CrsCode='MAT123'}(Transcript))

When did we offer MAT123 in the past, and what are the other courses, and when were they offered, in which at least one student got a 'C'?

π_{CrsCode,Semester}(σ_{Grade='C'}(Transcript)) ∩ π_{CrsCode,Semester}(σ_{CrsCode='MAT123'}(Transcript))

When did we offer MAT123 such that at least one student got a 'C'?
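The three expressions can be tried on a toy instance. The Transcript rows below are hypothetical, made up for this sketch, not the instance used in class. Both operands are projected on (CrsCode, Semester), which is what makes them union compatible.

```python
# Hypothetical rows: (StudId, CrsCode, Semester, Grade).
Transcript = [
    (1, "CS305", "F1995", "C"),
    (2, "MAT123", "S1996", "C"),
    (3, "MAT123", "F1997", "B"),
    (4, "CS315", "F1997", "A"),
]

# pi_{CrsCode,Semester}(sigma_{Grade='C'}(Transcript))
got_c = {(crs, sem) for (stud, crs, sem, grade) in Transcript if grade == "C"}
# pi_{CrsCode,Semester}(sigma_{CrsCode='MAT123'}(Transcript))
mat123 = {(crs, sem) for (stud, crs, sem, grade) in Transcript if crs == "MAT123"}

difference = got_c - mat123
union = got_c | mat123
intersection = got_c & mat123

print(sorted(difference), sorted(union), sorted(intersection))
```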

29

What does it do?

We can check them out with the following

data:

30

Assignment: Evaluate the above queries.

31

Related to SQL

Query: When did we offer MAT123, or some other course for which at least one student got a 'C'?

We have the following RA expression:

π_{CrsCode,Semester}(σ_{Grade='C'}(Transcript)) ∪ π_{CrsCode,Semester}(σ_{CrsCode='MAT123'}(Transcript))

The MariaDB query is immediate:

MariaDB [registration]> (Select CrsCode, Semester
    -> From Transcript Where Grade='C')
    -> Union
    -> (Select CrsCode, Semester
    -> From Teaching Where CrsCode='MAT123');
+---------+----------+
| CrsCode | Semester |
+---------+----------+
| CS305   | F1995    |
| CS315   | F1997    |
| MAT123  | F1997    |
| MAT123  | S1996    |
+---------+----------+

Notice that neither intersection nor difference is supported by Version 5.5.56 of MariaDB; both became available in Version 10.3.0.

32

Cartesian product

Given two relations r and s that share no common attribute names, r × s consists of the set of all tuples (a, b), a ∈ r and b ∈ s.

For example, Let r and s be the following,

S#

S1

S2

P#

P1

P2

Then, the result of r × s is the following,

S# P#

S1 P1

S1 P2

S2 P1

S2 P2

33

What happens...

when r and s do share common attribute names?

For example, T1(A, B)×T2(B, C). If we do noth-

ing, by the very definition, we will end up with

a table T3(A, B, B, C), where the two B’s have

the same name, but potentially different val-

ues. This is not allowed by the relational data

model. (Still remember data atomicity?)

What we do is thus to rename such attributes. This ninth operator, not a basic one, can take the following syntax:

expression[A1, ···, An],

where A1, ···, An are the new names for the attributes of the original relational expression, in the corresponding positions.

Let’s check out an example:

34

Mix up the profs and students...

(π_{Id,Name}(Student) × π_{Id,DeptId}(Professor))[Student.Id, Name, Professor.Id, DeptId]
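A rough sketch of why the renaming is needed, assuming the list-of-dicts model. The Student row and the helper `rename` are hypothetical. Unlike the positional renaming in the expression above, this sketch renames before forming the product, because a Python dict cannot hold two keys both called Id.

```python
# Hypothetical projections pi_{Id,Name}(Student) and pi_{Id,DeptId}(Professor).
Student = [{"Id": 111111111, "Name": "Jane Doe"}]
Professor = [{"Id": 2222, "DeptId": "CS"}, {"Id": 4444, "DeptId": "CS"}]

def rename(relation, mapping):
    """Rename attributes according to `mapping`; other attributes keep their names."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in relation]

students = rename(Student, {"Id": "Student.Id"})
professors = rename(Professor, {"Id": "Professor.Id"})

# With distinct names, the product keeps both Id columns apart.
mixed = [{**s, **p} for s in students for p in professors]
print(sorted(mixed[0]))
```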

35

Join

An RDB is often a collection of small tables. Thus, a query often involves multiple tables, which is when we use Join.

A bit more formally, given two relation schemas, R and S, their join is denoted as

R ⋈_{join condition} S,

where the join condition determines which pairs of tuples are combined.

Let A1, ···, An and B1, ···, Bn be two lists of attributes of R and S, respectively, and let ⊕1, ···, ⊕n be standard comparators such as '=', '<', etc. Then a general join is R × S, with the following restriction:

(R.A1 ⊕1 S.B1) And ··· And (R.An ⊕n S.Bn).

A join is thus not a basic operation, but one derived from product, restriction, and projection.
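The definition can be sketched in a few lines, assuming the list-of-dicts model. The helper `theta_join` and the relations R and S are hypothetical, chosen only to show a comparator other than '='.

```python
def theta_join(r, s, condition):
    """General join: the product of r and s, restricted by `condition`."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]

# Hypothetical relations whose attribute names do not clash.
R = [{"A1": 1}, {"A1": 5}]
S = [{"B1": 3}, {"B1": 7}]

less_than = theta_join(R, S, lambda a, b: a["A1"] < b["B1"])
equal = theta_join(R, S, lambda a, b: a["A1"] == b["B1"])

print(less_than)  # the pairs (1, 3), (1, 7), and (5, 7)
print(equal)      # no pair has equal values, so the result is empty
```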

36

Natural join

When all the comparators used in a join are '=', and the attributes compared are those shared by the two tables, we call this special case a natural join.

For example, consider two tables, Dept

DEPT#  DNAME        BUDGET
D1     Marketing    10M
D2     Development  12M
D3     Research     5M

and Emp

EMP#  ENAME  DEPT#  SALARY
E1    Lopez  D1     40K
E2    John   D1     42K
E3    Bob    D2     30K
E4    Jay    D2     35K

37

Their natural join over DEPT#, a commonly shared attribute, is the following:

DEPT#  DNAME        BUDGET  EMP#  ENAME  SALARY
D1     Marketing    10M     E1    Lopez  40K
D1     Marketing    10M     E2    John   42K
D2     Development  12M     E3    Bob    30K
D2     Development  12M     E4    Jay    35K

We notice the following two things about this table:

1. Those two tables are related through a commonly shared column, i.e., DEPT#. We will discuss the extreme cases later, on Page 55, when nothing, or everything, is shared.

2. When being joined, every row in the first table is concatenated with each row of the second table, as long as they are related, i.e., sharing the same DEPT# value.

For example, since no row in the second table has a DEPT# value of 'D3', no such row is contained in the joined table.

38

More specifically...

1. A Cartesian product of the two tables will be formed:

D#  DNAME     BUDGET  EMP#  ENAME  D#  SALARY
D1  Market    10M     E1    Lopez  D1  40K
D1  Market    10M     E2    John   D1  42K
D1  Market    10M     E3    Bob    D2  30K
D1  Market    10M     E4    Jay    D2  35K
D2  Develop   12M     E1    Lopez  D1  40K
D2  Develop   12M     E2    John   D1  42K
D2  Develop   12M     E3    Bob    D2  30K
D2  Develop   12M     E4    Jay    D2  35K
D3  Research  5M      E1    Lopez  D1  40K
D3  Research  5M      E2    John   D1  42K
D3  Research  5M      E3    Bob    D2  30K
D3  Research  5M      E4    Jay    D2  35K

2. All rows that have different D# values will be deleted, since they are not related. Thus,

D#  DNAME    BUDGET  EMP#  ENAME  D#  SALARY
D1  Market   10M     E1    Lopez  D1  40K
D1  Market   10M     E2    John   D1  42K
D2  Develop  12M     E3    Bob    D2  30K
D2  Develop  12M     E4    Jay    D2  35K

39

3. Finally, the duplicated D# column is deleted, since it would be redundant.

Technically, we make a projection of the resulting table on all but one redundant attribute, D# in this case.

D#  DNAME    BUDGET  EMP#  ENAME  SALARY
D1  Market   10M     E1    Lopez  40K
D1  Market   10M     E2    John   42K
D2  Develop  12M     E3    Bob    30K
D2  Develop  12M     E4    Jay    35K
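The three steps can be folded into one small function. This is a sketch under the list-of-dicts assumption, with a hypothetical helper name `natural_join`; it uses the Dept and Emp rows shown on Page 37.

```python
Dept = [
    {"DEPT#": "D1", "DNAME": "Marketing", "BUDGET": "10M"},
    {"DEPT#": "D2", "DNAME": "Development", "BUDGET": "12M"},
    {"DEPT#": "D3", "DNAME": "Research", "BUDGET": "5M"},
]
Emp = [
    {"EMP#": "E1", "ENAME": "Lopez", "DEPT#": "D1", "SALARY": "40K"},
    {"EMP#": "E2", "ENAME": "John", "DEPT#": "D1", "SALARY": "42K"},
    {"EMP#": "E3", "ENAME": "Bob", "DEPT#": "D2", "SALARY": "30K"},
    {"EMP#": "E4", "ENAME": "Jay", "DEPT#": "D2", "SALARY": "35K"},
]

def natural_join(r, s):
    """Product, then equality on the shared attributes, keeping one copy of each."""
    result = []
    for a in r:                      # step 1: every pair (a, b) of the product
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):   # step 2: related rows only
                result.append({**a, **b})           # step 3: shared column kept once
    return result

joined = natural_join(Dept, Emp)
print(len(joined), sorted(joined[0]))
```

Since no employee works in D3, the Research row finds no partner and drops out.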

After the normalization process, to be discussed

in the next Chapter, an RDB almost always

consists of a bunch of simple and small tables.

On the other hand, a general query needs in-

formation from several tables, when the join

operation is applied to collect information from

related tables.

Let’s look at a few applications of this useful,

but challenging, operation.

40

An example

Query: Who taught a course in the fall semester

of 1995?

We want the names of the professors, not all

of them, but the ones who taught in F1995.

The general plan is always how to get the out-

put, based on the input.

One way to proceed is to start with the input,

’F1995’, and find out where that input sits,

Teaching (taught). This can be obtained via a

restriction.

σSemester=‘F1995’(Teaching)

With the current instance of the Teaching ta-

ble, we get the following:

ProfId     CrsCode  Semester
555666777  CS305    F1995
121232343  EE101    F1995

41

What do we really want?

The above information does show the profes-

sors who taught in F1995, but only their num-

bers, not their names.

Question: Where is the beef?

Look at the other tables, we find out that their

names can be found in the Professor table,

related to the Teaching table via the professor’s

Ids.

We can thus get a) the ids of those professors,

then b) their names via a join.

It is easy to get the Ids via a projection.

πProfId(σSemester=‘F1995’(Teaching))

ProfId
555666777
121232343

42

Go over with join

With the Id’s in hand, we can connect the two

tables via a join, as follows:

Professor ⋈_{P.Id=T.ProfId} (σ_{Semester='F1995'}(Teaching))

Since join only keeps those rows sharing the same Id values, we get the following (only the Professor columns are shown):

Id         Name         DeptId
555666777  Mary Doe     CS
121232343  David Jones  EE

43

The final kick

Since we only want the names, we have to do

another projection on the Name attribute.

π_{Name}(Professor ⋈_{Id=ProfId} (σ_{Semester='F1995'}(Teaching)))

This gets us the following:

Name
Mary Doe
David Jones

Question: What have we done?

Answer: The restriction applied on Teaching

finds out all the information about who taught

what in F1995, including the professor Id.

To get their names, we have to match up the

selected tuples with the tuples in the Professor

table with a natural join on their Id.

Finally, since we only want the names, we make

a projection on the Name.
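The restriction, join, and projection just described can be traced step by step. The sketch assumes the list-of-tuples model and uses the rows shown on Pages 41 to 43; one extra Teaching row is hypothetical and marked as such.

```python
# (ProfId, CrsCode, Semester)
Teaching = [
    (555666777, "CS305", "F1995"),
    (121232343, "EE101", "F1995"),
    (555666777, "CS315", "S1997"),  # hypothetical row, so the restriction removes something
]
# (Id, Name, DeptId)
Professor = [
    (555666777, "Mary Doe", "CS"),
    (121232343, "David Jones", "EE"),
]

# Restriction: sigma_{Semester='F1995'}(Teaching)
f1995 = [t for t in Teaching if t[2] == "F1995"]
# Join on Professor.Id = Teaching.ProfId
joined = [p + t for p in Professor for t in f1995 if p[0] == t[0]]
# Projection on Name, without duplicates
names = sorted({row[1] for row in joined})

print(names)  # prints: ['David Jones', 'Mary Doe']
```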

44

Related to SQL

Given the RA expression

π_{Name}(Professor ⋈_{Id=ProfId} σ_{Semester='F1995'}(Teaching)),

we immediately have the following SQL query:

MariaDB [registration]> Select P.Name
    -> From Teaching T, Professor P
    -> Where T.Semester='F1995' And P.Id=T.ProfId;

Notice that we use T, P as shortcuts for Teaching and Professor, and we also use a condition, P.Id=T.ProfId, to explicitly enforce the join.

We will get the following result for this query:

+-------------+
| Name        |
+-------------+
| David Jones |
| Mary Doe    |
+-------------+

Assignment: You have to check all these queries

with you-know-what.

45

Another example

Query: Who taught what in the fall semester of 1995?

π_{CrsName,Name}((Professor ⋈_{P.Id=T.ProfId} (σ_{Semester='F1995'}(Teaching))) ⋈_{C.CrsCode=T.CrsCode} Course)

Question: What is going on?

Answer: The restriction finds all the informa-

tion from the Teaching table about who taught

what in F1995, including the professor Id and

course Id.

To get the names of the professors and those

of the courses, we have to match up the se-

lected tuples with the tuples in the Professor

table with a natural join. We similarly find out

the names of those courses.

Finally, since we only want to get the names,

we make a projection on the respective names.

46

Related to SQL

Given

π_{CrsName,Name}((Professor ⋈_{P.Id=T.ProfId} (σ_{Semester='F1995'}(Teaching))) ⋈_{C.CrsCode=T.CrsCode} Course)

we immediately have the following SQL query:

MariaDB [registration]> Select P.Name, C.CrsName
    -> From Teaching T, Professor P, Course C
    -> Where T.Semester='F1995' And P.Id=T.ProfId
    -> And T.CrsCode=C.CrsCode;

Notice again that we use additional conditions to explicitly enforce the two join operations.

We will get the following result for this query:

+-------------+---------------------+
| Name        | CrsName             |
+-------------+---------------------+
| Mary Doe    | Database Systems.   |
| David Jones | Electronic Circuits |
| Ann White   | Algebra             |
+-------------+---------------------+

47

Join in MariaDB

MariaDB actually implements a Join in the form of

A Join B On (join condition)

Thus, we can have the following alternative SQL expression, which gets us the same answer.

MariaDB [registration]> Select distinct Name, CrsName
    -> From Professor Join Teaching
    -> on (Professor.Id=Teaching.ProfId)
    -> Join Course
    -> on (Teaching.CrsCode=Course.Crscode)
    -> Where Teaching.Semester='F1995';

It sends back the following:

+-------------+---------------------+
| Name        | CrsName             |
+-------------+---------------------+
| Mary Doe    | Database Systems.   |
| David Jones | Electronic Circuits |
| Ann White   | Algebra             |
+-------------+---------------------+
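That the two formulations agree can be checked with a small experiment. The sketch uses Python's sqlite3 module as a stand-in for MariaDB, which is an assumption, and a reduced, hypothetical instance: only the two professors whose Ids appear on the earlier slides, and a Course table cut down to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced schema and data, assumed for this sketch.
conn.executescript("""
    CREATE TABLE Professor (Id INTEGER, Name TEXT, DeptId TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
    CREATE TABLE Course (CrsCode TEXT, CrsName TEXT);
    INSERT INTO Professor VALUES (555666777, 'Mary Doe', 'CS');
    INSERT INTO Professor VALUES (121232343, 'David Jones', 'EE');
    INSERT INTO Teaching VALUES (555666777, 'CS305', 'F1995');
    INSERT INTO Teaching VALUES (121232343, 'EE101', 'F1995');
    INSERT INTO Course VALUES ('CS305', 'Database Systems');
    INSERT INTO Course VALUES ('EE101', 'Electronic Circuits');
""")

# Join enforced through conditions in the Where clause.
implicit_rows = set(conn.execute(
    "SELECT DISTINCT P.Name, C.CrsName "
    "FROM Teaching T, Professor P, Course C "
    "WHERE T.Semester = 'F1995' AND P.Id = T.ProfId AND T.CrsCode = C.CrsCode"
))
# Join written with the Join ... On syntax.
explicit_rows = set(conn.execute(
    "SELECT DISTINCT Name, CrsName "
    "FROM Professor JOIN Teaching ON (Professor.Id = Teaching.ProfId) "
    "JOIN Course ON (Teaching.CrsCode = Course.CrsCode) "
    "WHERE Teaching.Semester = 'F1995'"
))

print(implicit_rows == explicit_rows)  # prints: True
```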

48

Why natural join?

Such equality based join is indeed natural, since

it reflects a good design principle: the same

stuff should be related, and nothing else.

For example, different pieces of information about the same course should be related, while the information about different courses has nothing to do with each other. Such an attribute is often given the same name in the different tables that collect different information.

The condition in a natural join actually equates all the related attributes in the relations being joined. Moreover, as we already discussed, since these attributes really mean the same thing, it keeps only one of them, while an equi-join keeps both.

A natural join is defined as follows:

π_{attributes}(σ_{equation of the shared attributes}(R × S))

49

Who taught whom?

A student is paired with a professor when the student took a course taught by that professor in the same semester. Below is the RA expression:

π_{StudId,ProfId}(Transcript ⋈_C Teaching)

We have the following with MariaDB, where condition C is made explicit:

MariaDB [registration]> Select T.StudId, H.ProfId
    -> From Transcript T, Teaching H
    -> Where T.CrsCode=H.CrsCode
    -> and T.Semester=H.Semester;
+-----------+-----------+
| StudId    | ProfId    |
+-----------+-----------+
| 666666666 | 9406321   |
| 987654321 | 9406321   |
| 23456789  | 101202303 |
| 123454321 | 101202303 |
| 23456789  | 121232343 |
| 666666666 | 121232343 |
| 123454321 | 555666777 |
| 987654321 | 555666777 |
| 111111111 | 783432188 |
| 111111111 | 900120450 |
| 666666666 | 900120450 |
+-----------+-----------+

Question: Who are those people?

50

Let’s find them out...

Find out where their names sit, then join those tables.

MariaDB [registration]> Select P.Name As Professor, S.Name As Student
    -> From Transcript T, Teaching H, Professor P, Student S
    -> Where T.CrsCode=H.CrsCode and T.Semester=H.Semester
    -> and P.Id=H.ProfId and T.StudId=S.Id;

Below is the answer:

+--------------+---------------+
| Professor    | Student       |
+--------------+---------------+
| John Smyth   | Homer Simpson |
| David Jones  | Homer Simpson |
| Ann White    | Jane Doe      |
| Adrian Jones | Jane Doe      |
| Mary Doe     | Joe Blow      |
| John Smyth   | Joe Blow      |
| David Jones  | Jesoph Public |
| Ann White    | Jesoph Public |
| Jacob Taylor | Jesoph Public |
| Mary Doe     | Bart Simpson  |
| Jacob Taylor | Bart Simpson  |
+--------------+---------------+

Notice As, the renaming operator.

51

Yet another example

Query: Who took at least two courses?

There are several ways of doing this. We will

start with the following one:

π_{StudId}(σ_{CrsCode≠CrsCode2}(Transcript ⋈ Transcript[StudId, CrsCode2, Semester2, Grade2]))

Question: Why don’t we rename StudId?

Answer: We use StudId to connect all the

courses taken by the same student, since each

student has his/her unique Id.

Question: Are you sure this stuff works?

Check it out...

52

Let’s find it out...

Given the following Transcript table,

SId   CrsC    G  Sem
1111  CS2370  C  F2017
1111  CS3600  A  F2018
2222  CS2370  B  F2017

(Transcript ⋈ Transcript[SId, CrsC2, G2, Sem2])

will give us the following:

SId   CrsC    G  Sem    CrsC2   G2  Sem2
1111  CS2370  C  F2017  CS2370  C   F2017
1111  CS2370  C  F2017  CS3600  A   F2018
1111  CS3600  A  F2018  CS2370  C   F2017
1111  CS3600  A  F2018  CS3600  A   F2018
2222  CS2370  B  F2017  CS2370  B   F2017

σ_{CrsC≠CrsC2}(Transcript ⋈ Transcript[SId, CrsC2, G2, Sem2])

will give us

SId   CrsC    G  Sem    CrsC2   G2  Sem2
1111  CS2370  C  F2017  CS3600  A   F2018
1111  CS3600  A  F2018  CS2370  C   F2017

And the whole thing gives us

SId

1111
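The same evaluation can be replayed in a few lines, assuming the list-of-tuples model and the three Transcript rows above. The second copy of the table keeps SId and contributes its remaining columns as CrsC2, G2, and Sem2.

```python
# (SId, CrsC, G, Sem), the three rows shown above.
Transcript = [
    ("1111", "CS2370", "C", "F2017"),
    ("1111", "CS3600", "A", "F2018"),
    ("2222", "CS2370", "B", "F2017"),
]

# Natural join of Transcript with its renamed copy: match on SId only.
# A joined row is (SId, CrsC, G, Sem, CrsC2, G2, Sem2).
joined = [t1 + t2[1:] for t1 in Transcript for t2 in Transcript if t1[0] == t2[0]]
# Restriction: CrsC differs from CrsC2.
different = [r for r in joined if r[1] != r[4]]
# Projection on SId, without duplicates.
result = sorted({r[0] for r in different})

print(len(joined), len(different), result)  # prints: 5 2 ['1111']
```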

53

Related to SQL

Given

π_{StudId}(σ_{CrsCode≠CrsCode2}(Transcript ⋈ Transcript[StudId, CrsCode2, Semester2, Grade2]))

we can have the following SQL query:

MariaDB [registration]> Select distinct T1.StudId
    -> From Transcript T1, Transcript T2
    -> Where T1.StudId=T2.StudId And T1.CrsCode <> T2.CrsCode;

Notice that we use different names to get two separate copies, T1 and T2, of the same table.

We will get the following result for this query, based on our instance:

+-----------+
| StudId    |
+-----------+
| 23456789  |
| 111111111 |
| 123454321 |
| 666666666 |
| 987654321 |
+-----------+

Assignment: Find out who they are....

54

Something special

Question: What is the natural join of R and

S when R and S share the same attributes?

Answer: By definition, once we construct the

product of R and S, and apply the equality

restriction on the shared attributes, only the

identical pairs, i.e., those belonging to both,

will stay. But, we will keep only one copy of

those identical pairs.

Hence, in this case, we have R ∩ S.

Question: What is the natural join of R and

S when they have no attribute in common?

Answer: In this case, when we apply the equal-

ity restriction on the shared attributes, nothing

will be kicked out, since nothing is shared. In

the projection step, we also project out no at-

tributes since no attribute is duplicated.

Then, in this case, the whole product stays.

55

An example

To construct the join of S1 and S2

S1:

A   B
a1  b1
a1  b2

S2:

A   B
a1  b1
a2  b1

1. Construct S1 × S2 :

A   B   A   B
a1  b1  a1  b1
a1  b1  a2  b1
a1  b2  a1  b1
a1  b2  a2  b1

2. Apply the join condition: S1.A = S2.A and

S1.B = S2.B

A B A B

a1 b1 a1 b1

3. Remove redundancy.

A B

a1 b1

We end up with S1 ∩ S2.

56

Another example

To construct the join of S1 and S3

S1:

A   B
a1  b1
a1  b2

S3:

C   D
c1  d1
c2  d1

1. Construct S1 × S3 :

A   B   C   D
a1  b1  c1  d1
a1  b1  c2  d1
a1  b2  c1  d1
a1  b2  c2  d1

2. Apply the join condition. Since nothing is

shared, none is removed.

3. Remove redundancy. Again, since no dupli-

cate attributes exist, no attribute is projected

away.

We end up with S1 × S3.
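The three steps used in the last two examples (product, equality on the shared attributes, removal of the duplicated attributes) can be sketched in a few lines of Python. This is only an illustration: relations are modelled as lists of dicts, and the function name natural_join is made up for this sketch.

```python
from itertools import product

def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])                    # attributes common to both
    result = []
    for t1, t2 in product(r, s):                      # step 1: Cartesian product
        if all(t1[a] == t2[a] for a in shared):       # step 2: equality on shared attributes
            merged = {**t1, **t2}                     # step 3: one copy of each shared attribute
            if merged not in result:                  # a relation is a set: no duplicate tuples
                result.append(merged)
    return result

S1 = [{"A": "a1", "B": "b1"}, {"A": "a1", "B": "b2"}]
S2 = [{"A": "a1", "B": "b1"}, {"A": "a2", "B": "b1"}]
S3 = [{"C": "c1", "D": "d1"}, {"C": "c2", "D": "d1"}]

same_attrs = natural_join(S1, S2)   # all attributes shared: behaves like S1 ∩ S2
no_shared = natural_join(S1, S3)    # nothing shared: behaves like S1 × S3
print(same_attrs)
print(len(no_shared))
```

When nothing is shared, the equality test in step 2 is vacuously true, which is why the whole product stays.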

By the way, I just did 5.6 for you.

57

Division

This might be the most complex operation. It is used in scenarios such as: who has taught every course offered by the CS department, who has taken every course offered by a particular professor, or who supplies every red part?

Division in RA is OK, but much more challenging in SQL.

This operator takes two unary relations, A and B, and one binary relation, C, as its inputs. As the output, it returns the elements of A that match every element in B, as shown in C.

In the teaching everything case, A and B refer

to ProfId and CrsCode of all the CS courses,

respectively; and C is πProfId,CrsCode(Teaching).

58

Worth how many words?

Let’s check them out with the following data,

and will get {1,2,3,4} divide by s per r =

{2,3} :

59

A final example

Query: Who have taken all the courses taught

by Professor John Smyth?

We know that we can use the division operator,

by finding out the three tables, A, B and C.

A = πStudIdTranscript, i.e., “those who have

taken courses.”

It is easy to get C, “Who have taken what”:

πStudId, CrsCode(Transcript).

To find out B, we have to find out the code of

those courses taught by Prof. Smyth, i.e.,

πCrsCode(σProfId=(πId(σName=‘John Smyth’(Professor)))(Teaching)).

We will see later on how to use MariaDB to do division, which is much more intimidating.

60

Now the SQL part

SQL is the most widely used DB programming language. MariaDB is an incomplete implementation of it; its current version is 10.6.4, as of October 6, 2021.

We can submit individual SQL query statements directly to a DBMS through a terminal, as we have been doing.

But, in practice, to provide the users with a better UX, we almost always embed them in a program written in, e.g., PHP, that submits a collection of SQL statements, with, e.g., an HTML-based UI, to a DBMS at run time and processes the returned results.

We discuss the former case in this chapter, and

talk about the embedding case with a front

end, in a later chapter, Unit 8.

61

To kick off

Query: Who are the professors working in the

EE department?

MariaDB [register]> Select P.Name
    -> From Professor P
    -> Where P.DeptId='EE';

+-------------+

| Name |

+-------------+

| David Jones |

+-------------+

1 row in set (0.00 sec)

As we saw earlier, in the above, we use a tuple

variable, P, which ranges over the tuples of the

Professor relation.

It is not necessary here, but quite useful when

we have to deal with several tables with iden-

tical attribute names as we saw earlier.

62

The evaluation process

The basic algorithm for evaluating such an

SQL statement is as follows:

1. The From part is evaluated to produce a

Cartesian product of all the tables mentioned.

2. The Where part is evaluated to apply a restriction on the product, where we keep only those rows that “make the cut”.

3. Finally, the Select part is evaluated to apply a projection that selects the wanted attributes from the leftover rows of the previous step.

Thus, the previous query is nothing but

πName(σDeptId=‘EE’(Professor)).

63

A multi-table example

Considering two tables, Dept

DeptId  DName        Budget
D1      Marketing    10
D2      Development  12
D3      Research     5

and Emp

EmpId  EName  DeptId  Salary
E1     Lopez  D1      40000
E2     John   D1      42000
E3     Bob    D2      30000
E4     Jay    D2      35000

and the query “Who makes less than 40 grand, and where do they work?”

Select E.EName,D.DName,E.Salary

From Dept D, Emp E

Where D.DeptId=E.DeptId and E.SALARY<40000;

64

The evaluation process

1. The Cartesian product of the two tables,

Dept and Emp, as mentioned in the From part, is

constructed as follows:

DeptId  DName     Budget  EmpId  EName  DeptId  Salary
D1      Market    10M     E1     Lopez  D1      40K
D1      Market    10M     E2     John   D1      42K
D1      Market    10M     E3     Bob    D2      30K
D1      Market    10M     E4     Jay    D2      35K
D2      Develop   12M     E1     Lopez  D1      40K
D2      Develop   12M     E2     John   D1      42K
D2      Develop   12M     E3     Bob    D2      30K
D2      Develop   12M     E4     Jay    D2      35K
D3      Research  5M      E1     Lopez  D1      40K
D3      Research  5M      E2     John   D1      42K
D3      Research  5M      E3     Bob    D2      30K
D3      Research  5M      E4     Jay    D2      35K

2. Then, the Where part is evaluated, so that

only those tuples satisfying the condition

D.DeptId=E.DeptId and E.Salary<40000

are kept.

65

DeptId  DName    Budget  EmpId  EName  DeptId  Salary
D2      Develop  12M     E3     Bob    D2      30K
D2      Develop  12M     E4     Jay    D2      35K

3. Finally, the Select part is evaluated, which

keeps only those attributes as mentioned in the

target list.

EName  DName        Salary
Bob    Development  30K
Jay    Development  35K

This is indeed the result of this query when

being applied to this table instance.

MariaDB [Strange]> Select E.EName,D.DName,E.Salary
    -> From Dept D, Emp E
    -> Where D.DeptId=E.DeptId and E.SALARY<40000;

+-------+-------------+--------+
| EName | DName       | Salary |
+-------+-------------+--------+
| Bob   | Development | 30000  |
| Jay   | Development | 35000  |
+-------+-------------+--------+
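The three-step evaluation (From, then Where, then Select) can be mimicked directly in Python on the Dept and Emp instances. This is only a sketch of the conceptual algorithm; a real DBMS would normally not materialize the whole product, and the variable names are made up here.

```python
from itertools import product

# The Dept and Emp instances, as lists of tuples.
dept = [("D1", "Marketing", 10), ("D2", "Development", 12), ("D3", "Research", 5)]
emp = [
    ("E1", "Lopez", "D1", 40000),
    ("E2", "John", "D1", 42000),
    ("E3", "Bob", "D2", 30000),
    ("E4", "Jay", "D2", 35000),
]

# Step 1 (From): Cartesian product of the two tables.
cartesian = list(product(dept, emp))

# Step 2 (Where): keep rows with D.DeptId = E.DeptId and E.Salary < 40000.
kept = [(d, e) for d, e in cartesian if d[0] == e[2] and e[3] < 40000]

# Step 3 (Select): project onto EName, DName, Salary.
answer = [(e[1], d[1], e[3]) for d, e in kept]

print(len(cartesian))
print(answer)
```

The product has 3 × 4 rows, of which only the two Development rows below 40000 survive the restriction.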

66

Query with a join

Query: Who taught in Fall 1995?

MariaDB [register]> Select P.Name
    -> From Professor P, Teaching T
    -> Where P.Id=T.ProfId
    -> And T.Semester='F1995';

+-------------+

| Name |

+-------------+

| David Jones |

| Mary Doe |

| Ann White |

+-------------+

The evaluation of this query really follows the

three step process as we discussed earlier.

This one is nothing but, in terms of relational

algebra:

πName(Professor ⋈Id=ProfId (σSemester=‘F1995’(Teaching))).

67

SQL and RA

Given an SQL query

Select TargetList

From Rel1 V1, ..., Reln Vn

Where Condition

its RA expression is essentially the following:

πTargetList(σCondition(Rel1 × · · · × Reln)),

where we do need to convert the Condition into

its RA equivalent.

We talked earlier about the optimization gain of such a conversion. On the other hand, RA will often give us clues as to how to come up with an SQL query, particularly for the tough ones, as we will see with numerous examples later on.

68

An example

Query: Who taught what in Fall 1995? (The stuff on Page 46.)

Its RA expression is something like the following:

πName,CrsName(Professor ⋈ (σSemester=‘F1995’(Teaching)) ⋈ Course),

with appropriate join conditions.

This can be immediately turned into an SQL

query, as follows:

MariaDB [registration]> Select P.Name, C.CrsName
    -> From Professor P, Teaching T, Course C
    -> Where T.Semester='F1995' And
    -> P.Id=T.Profid And T.CrsCode=C.CrsCode;

The answer should be the following:

+-------------+---------------------+
| Name        | CrsName             |
+-------------+---------------------+
| Mary Doe    | Database Systems.   |
| David Jones | Electronic Circuits |
| Ann White   | Algebra             |
+-------------+---------------------+

69

Self-join queries

We once mentioned that, to compose a Carte-

sian product, we sometimes have to rename

identically named attributes.

In particular, we had the following RA expres-

sion (Page 52) to get all students who took at

least two different courses:

πStudId(σCrsCode≠CrsCode2(Transcript ⋈ Transcript[StudId, CrsCode2, Semester2, Grade2]))

Here we keep the table name, but rename the

attributes.

70

Its SQL cousin

To do it in SQL, we have the following, where we rename the table instead.

MariaDB [registration]> Select Distinct T1.StudId
    -> From Transcript T1, Transcript T2
    -> Where T1.CrsCode<>T2.CrsCode
    -> And T1.StudId=T2.StudId;

It would give us the following:

+-----------+
| StudId    |
+-----------+
| 23456789  |
| 111111111 |
| 123454321 |
| 666666666 |
| 987654321 |
+-----------+

Question: How do we verify the above result?

Question: Why do we need “Distinct”?

71

We want distinct results

If we don’t use “distinct” we would get the

following:

MariaDB [registration]> Select T1.StudId
    -> From Transcript T1, Transcript T2
    -> Where T1.CrsCode<>T2.CrsCode
    -> And T1.StudId=T2.StudId;

+-----------+
| StudId    |
+-----------+
| 23456789  |
| 23456789  |
| 111111111 |
| 111111111 |
| 111111111 |
| 111111111 |
| 111111111 |
| 111111111 |
| 123454321 |
| 123454321 |
| 123454321 |
| 123454321 |
| 123454321 |
| 123454321 |
| 666666666 |
| 666666666 |
| 666666666 |
| 666666666 |
| 666666666 |
| 666666666 |
| 987654321 |
| 987654321 |
+-----------+
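To see where the repeated rows come from, here is a small sketch that replays the query with and without Distinct. It makes two assumptions: SQLite stands in for MariaDB, and Transcript is reduced to the (StudId, CrsCode) pairs listed elsewhere in this chapter (Semester and Grade are left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Transcript: only (StudId, CrsCode); Semester and Grade are omitted.
conn.execute("CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT)")
conn.executemany(
    "INSERT INTO Transcript VALUES (?, ?)",
    [
        (23456789, "CS305"), (23456789, "EE101"),
        (111111111, "EE101"), (111111111, "MAT123"), (111111111, "MGT123"),
        (123454321, "CS305"), (123454321, "CS315"), (123454321, "MAT123"),
        (666666666, "EE101"), (666666666, "MAT123"), (666666666, "MGT123"),
        (987654321, "CS305"), (987654321, "MGT123"),
    ],
)

self_join = (
    "FROM Transcript T1, Transcript T2 "
    "WHERE T1.CrsCode <> T2.CrsCode AND T1.StudId = T2.StudId"
)
with_duplicates = conn.execute("SELECT T1.StudId " + self_join).fetchall()
without_duplicates = conn.execute("SELECT DISTINCT T1.StudId " + self_join).fetchall()

print(len(with_duplicates))     # one row per ordered pair of different courses
print(len(without_duplicates))  # one row per student
```

A student with n courses contributes n(n-1) ordered pairs, so a student with three courses shows up six times and a student with two courses shows up twice.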

72

Who are these students?

It should be clear that we just need to add in

another layer of Join with Student to find out

their names.

MariaDB [registration]> Select Distinct S.Name
    -> From Transcript T1, Transcript T2, Student S
    -> Where T1.CrsCode<>T2.CrsCode
    -> And T1.StudId=T2.StudId
    -> And S.Id=T1.StudId;

+---------------+

| Name |

+---------------+

| Homer Simpson |

| Jane Doe |

| Joe Blow |

| Jesoph Public |

| Bart Simpson |

+---------------+

5 rows in set (0.00 sec)

73

RA ≠ SQL

The mathematical RA is to get a set, while

the practical SQL is to get a multiset.

To return a true relation with no duplicates, the evaluator has to do another scan to take out all the duplicates, which can be arranged by adding another operator, Distinct:

MariaDB [registration]> Select Distinct
    -> T.ProfId, T.CrsCode
    -> From Teaching T;

+-----------+---------+
| ProfId    | CrsCode |
+-----------+---------+
| 9406321   | MGT123  |
| 101202303 | CS305   |
| 101202303 | CS315   |
| 121232343 | EE101   |
| 555666777 | CS305   |
| 783432188 | MGT123  |
| 900120450 | MAT123  |
+-----------+---------+

74

Making comments

We sometimes want to make comments to

make queries more readable. Given the fol-

lowing code

# An example of Select distinct

Select distinct

T.ProfId, T.CrsCode

From Teaching T;

MariaDB will give you the following back:

MariaDB [register]> # An example of Select distinct
MariaDB [register]> Select distinct
    -> T.ProfId, T.CrsCode
    -> From Teaching T;

+-----------+---------+
| ProfId    | CrsCode |
+-----------+---------+
| 9406321   | MGT123  |
| 101202303 | CS305   |
| 101202303 | CS315   |
| 121232343 | EE101   |
| 555666777 | CS305   |
| 783432188 | MGT123  |
| 900120450 | MAT123  |
+-----------+---------+

75

What is in Where?

We have so far only seen simple conditions in

the Where part.

SQL provides some common operators for such a purpose, such as ‘=’, ‘<’, ‘<>’ (the same as ‘!=’, or even ‘Not =’), etc.

In general, any Boolean expression will do. For

example, the following query

Select E.Id

From Employee E, Employee M

Where E.BossSSn=M.SSN And E.Salary>2*M.Salary

And E.LastName=‘Mc’||E.FirstName

should return all employees who make more than twice what their boss does, and whose last name is ‘Mc’ concatenated with their first name, such as “Donald McDonald”.

76

What about Select?

The Select part can also come with a few spe-

cial features.

If you do want to get everything from the From

part, you put an ‘*’ in the Select.

MariaDB [registration]> Select * From Professor;

+-----------+--------------+--------+-----+--------+
| Id        | Name         | DeptId | Age | Salary |
+-----------+--------------+--------+-----+--------+
| 9406321   | Jacob Taylor | MG     | 45  | 30000  |
| 101202303 | John Smyth   | CS     | 32  | 40000  |
| 121232343 | David Jones  | EE     | 56  | 25000  |
| 555666777 | Mary Doe     | CS     | 67  | 40000  |
| 783432188 | Adrian Jones | MG     | 55  | 30000  |
| 864297351 | Qi Chen      | MA     | 34  | 35000  |
| 900120450 | Ann White    | MA     | 38  | 50000  |
+-----------+--------------+--------+-----+--------+

77

Expression in Select

SQL permits expressions in the target list, as

well as new headings through the renaming

mechanism via “As”.

The following computes, for each professor, the salary per year of age with MariaDB.

MariaDB [registration]> Select Name, Age, Salary,
    -> Round(Salary/Age, 2) As SalaryByAge
    -> From Professor;

+--------------+-----+--------+-------------+
| Name         | Age | Salary | SalaryByAge |
+--------------+-----+--------+-------------+
| Jacob Taylor | 45  | 30000  | 666.67      |
| John Smyth   | 32  | 40000  | 1250.00     |
| David Jones  | 56  | 25000  | 446.43      |
| Mary Doe     | 67  | 40000  | 597.01      |
| Adrian Jones | 55  | 30000  | 545.45      |
| Qi Chen      | 34  | 35000  | 1029.41     |
| Ann White    | 38  | 50000  | 1315.79     |
+--------------+-----+--------+-------------+

Notice ROUND(X,D) rounds the argument X to D

decimal places.

Check out the course page for more.

78

What does Not mean?

Any condition can be negated. For example,

NOT (T1.CrsCode=T2.CrsCode)

It could be even nested. For example

NOT (E.BossSSN=M.SSN And E.Salary>2*M.Salary

And NOT (E.LastName='Mc'||E.FirstName))

Question: What does the last piece mean?

Answer: It means that if somebody's salary is more than twice that of his boss, then his last name is 'Mc' together with his first name, since

¬(A ∧ ¬B) ≡ ¬A ∨ B ≡ A → B.
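If you would rather check this chain of equivalences than take it on faith, a brute-force truth table does the job. The sketch below is just that: it enumerates the four assignments of A and B and compares the three forms, with the implication written out as an explicit table.

```python
from itertools import product

# Truth table of the material implication A -> B, written out explicitly.
IMPLIES = {
    (False, False): True,
    (False, True): True,
    (True, False): False,
    (True, True): True,
}

checks = []
for a, b in product([False, True], repeat=2):
    negated_form = not (a and not b)     # ¬(A ∧ ¬B)
    disjunction = (not a) or b           # ¬A ∨ B
    checks.append(negated_form == disjunction == IMPLIES[(a, b)])

all_equivalent = all(checks)
print(all_equivalent)
```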

Labwork: Let’s take care of Labwork 3.1.

79

Set operations

With SQL, we can use set operators as defined

in RA, i.e., union, intersection and difference.

Query: Who are those professors working ei-

ther in CS or EE departments?

MariaDB [registration]> Select P.Name
    -> From Professor P
    -> Where P.DeptId='CS'
    -> Union
    -> Select P.Name From Professor P
    -> Where P.DeptId='EE';

+-------------+
| Name        |
+-------------+
| John Smyth  |
| Mary Doe    |
| David Jones |
+-------------+

Check out Example 5.15 in Sec. 3.2 of the MariaDB notes for many more details.

80

An equivalent form

Recall the following:

A ∪ B ≡ {x : x ∈ A ∨ x ∈ B},

the previous query is thus equivalent to the

following:

MariaDB [registration]> Select Distinct P.Name
    -> From Professor P
    -> Where (P.DeptId='CS' Or P.DeptId='EE');

+-------------+
| Name        |
+-------------+
| John Smyth  |
| David Jones |
| Mary Doe    |
+-------------+

Question: Which one to use?

Answer: It is largely a personal preference for “Union”, but we don’t have a choice for the other two, “Intersect” and “Except”, when we use MariaDB before 10.3.0, as neither is available.

81

This or that...

Query: Who are those professors either affil-

iated with the Computer Science department

or have ever taught a CS course?

MariaDB [registration]> Select P.Name
    -> From Professor P, Teaching T
    -> Where P.Id=T.ProfId And T.CrsCode like 'CS%'
    -> Union
    -> Select P.Name From Professor P
    -> Where P.DeptId='CS';

+------------+
| Name       |
+------------+
| Mary Doe   |
| John Smyth |
+------------+

Alternatively, we can also use “or” to do the same:

MariaDB [registration]> Select Distinct P.Name
    -> From Professor P, Teaching T
    -> Where (P.Id=T.ProfId) And (T.CrsCode like 'CS%')
    -> Or (P.DeptId='CS');

Notice that the condition is (A ∧ B) ∨ C.

82

This and that...

Query: Who are those students who took

both CS315 and CS305?

We might want to do the following:

MariaDB [registration]> Select S.Name
    -> From Student S, Transcript T
    -> Where S.Id=T.StudId And T.CrsCode='CS305'
    -> Intersect
    -> Select S.Name
    -> From Student S, Transcript T
    -> Where S.Id=T.StudId And T.CrsCode='CS315';

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'Intersect

It is because our version, MariaDB Ver. 5.5.68, does not support “Intersect”, while 10.3.0 does.

Question: Is there a way out?

83

What could we do?

We can use logical operators. Does the following one work?

MariaDB [registration]> Select S.Name
    -> From Student S, Transcript T
    -> Where S.Id=T.StudId And T.CrsCode='CS315'
    -> And T.CrsCode='CS305';

Empty set (0.00 sec)

Question: Is it really empty?

Answer: Apparently not. With the current instance, 123454321 (Joe Blow) took both.

Question: Why is it incorrect?

The CrsCode box of any tuple contains only one value. So, no CrsCode box of any tuple of Transcript may contain both ‘CS305’ and ‘CS315’.

Do you still remember data atomicity?

84

What should we do?

Look for evidence in two tuples in the Transcript

table for two different courses.

MariaDB [registration]> Select S.Name
    -> From Student S, Transcript T1, Transcript T2
    -> Where S.Id=T1.StudId And T1.CrsCode='CS315'
    -> And S.Id=T2.StudId And T2.CrsCode='CS305';

+----------+
| Name     |
+----------+
| Joe Blow |
+----------+

Question: What is going on?

We came up with two copies of transcript ta-

ble, T1 and T2, where for the same student S,

we look for her record of taking ‘CS315’ in T1

and ‘CS305’ in T2.

Question: How do we make sure it is the same

student?

Join conditions via Student.Id....
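The contrast between one tuple variable and two can be replayed in a small sketch. Assumptions: SQLite stands in for MariaDB (it happens to accept Intersect, so all three forms can be compared), and Transcript is reduced to the (StudId, CrsCode) pairs of our instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Id INTEGER, Name TEXT)")
# Simplified Transcript: only (StudId, CrsCode); Semester and Grade are omitted.
conn.execute("CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT)")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [
    (111111111, "Jane Doe"), (666666666, "Jesoph Public"),
    (111223344, "Mary Smith"), (987654321, "Bart Simpson"),
    (23456789, "Homer Simpson"), (123454321, "Joe Blow"),
])
conn.executemany("INSERT INTO Transcript VALUES (?, ?)", [
    (23456789, "CS305"), (23456789, "EE101"),
    (111111111, "EE101"), (111111111, "MAT123"), (111111111, "MGT123"),
    (123454321, "CS305"), (123454321, "CS315"), (123454321, "MAT123"),
    (666666666, "EE101"), (666666666, "MAT123"), (666666666, "MGT123"),
    (987654321, "CS305"), (987654321, "MGT123"),
])

# One tuple variable: no single tuple carries two course codes.
one_copy = conn.execute(
    "SELECT S.Name FROM Student S, Transcript T "
    "WHERE S.Id = T.StudId AND T.CrsCode = 'CS315' AND T.CrsCode = 'CS305'"
).fetchall()

# Two tuple variables: the evidence is looked up in two different tuples.
two_copies = conn.execute(
    "SELECT S.Name FROM Student S, Transcript T1, Transcript T2 "
    "WHERE S.Id = T1.StudId AND T1.CrsCode = 'CS315' "
    "AND S.Id = T2.StudId AND T2.CrsCode = 'CS305'"
).fetchall()

# The set-operator form, for comparison.
via_intersect = conn.execute(
    "SELECT S.Name FROM Student S, Transcript T "
    "WHERE S.Id = T.StudId AND T.CrsCode = 'CS305' "
    "INTERSECT "
    "SELECT S.Name FROM Student S, Transcript T "
    "WHERE S.Id = T.StudId AND T.CrsCode = 'CS315'"
).fetchall()

print(one_copy, two_copies, via_intersect)
```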

85

Who can take CS 3600?

The prerequisites for CS 3600 are “CS 2370

and (MA 2250 or MA 2200)”.

If you use something that supports all the operations:

(Select S.Name
 From Student S, Transcript T
 Where T.CrsCode="CS2370" And S.Id=T.StudId)
Intersect
((Select S.Name
  From Student S, Transcript T
  Where T.CrsCode="MA2200" And S.Id=T.StudId)
 Union
 (Select S.Name
  From Student S, Transcript T
  Where T.CrsCode="MA2250" And S.Id=T.StudId))

Otherwise,

Select S.Name
From Student S, Transcript T1, Transcript T2
Where T1.CrsCode="CS2370" And
      (T2.CrsCode="MA2200" Or T2.CrsCode="MA2250")
      And T1.StudId=T2.StudId
      And S.Id=T1.StudId;

Notice again that the condition is A ∧ (B ∨ C).

86

This but not that...

Query: Who are those professors who are not

affiliated with Computer Science department,

but taught a CS course?

(Select P.Name From Professor P, Teaching T
 Where P.Id=T.ProfId And T.CrsCode like 'CS%')
Except
(Select P.Name From Professor P
 Where P.DeptId='CS')

Again, this is not supported with MariaDB 5.5.68, but it is supported from Version 10.3.0 on.

Question: Is there any alternative?

Recall that A \ B = A ∩ B̄ = A ∩ (¬B)

= {x | x ∈ A} ∩ {x | x ∉ B} = {x | x ∈ A ∧ x ∉ B}.

Hence, the following query should do the trick.

MariaDB [registration]> Select P.Name
    -> From Professor P, Teaching T
    -> Where P.Id=T.ProfId And T.CrsCode like 'CS%'
    -> And P.DeptId!='CS';

Empty set (0.00 sec)

87

Is it really? Yes!

MariaDB [registration]> Select * From Teaching;

+-----------+---------+----------+
| ProfId    | CrsCode | Semester |
+-----------+---------+----------+
| 9406123   | MGT123  | F1995    |
| 9406321   | MGT123  | F1994    |
| 101202303 | CS305   | S1996    |
| 101202303 | CS315   | F1997    |
| 121232343 | EE101   | F1995    |
| 121232343 | EE101   | S1991    |
| 555666777 | CS305   | F1995    |
| 783432188 | MGT123  | F1997    |
| 900120450 | MAT123  | F1997    |
| 900120450 | MAT123  | S1996    |
+-----------+---------+----------+

MariaDB [registration]> Select * From Professor;

+-----------+--------------+--------+-----+--------+
| Id        | Name         | DeptId | Age | Salary |
+-----------+--------------+--------+-----+--------+
| 9406321   | Jacob Taylor | MG     | 45  | 30000  |
| 101202303 | John Smyth   | CS     | 32  | 40000  |
| 121232343 | David Jones  | EE     | 56  | 25000  |
| 555666777 | Mary Doe     | CS     | 67  | 40000  |
| 783432188 | Adrian Jones | MG     | 55  | 30000  |
| 864297351 | Qi Chen      | MA     | 34  | 35000  |
| 900120450 | Ann White    | MA     | 38  | 50000  |
+-----------+--------------+--------+-----+--------+
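With the two tables in front of us, the set-difference form and the "And ... !=" form can be compared side by side. The sketch below assumes SQLite in place of MariaDB (SQLite accepts Except) and keeps only the Id, Name and DeptId columns of Professor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Professor without Age and Salary, which this query does not need.
conn.execute("CREATE TABLE Professor (Id INTEGER, Name TEXT, DeptId TEXT)")
conn.execute("CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT)")
conn.executemany("INSERT INTO Professor VALUES (?, ?, ?)", [
    (9406321, "Jacob Taylor", "MG"), (101202303, "John Smyth", "CS"),
    (121232343, "David Jones", "EE"), (555666777, "Mary Doe", "CS"),
    (783432188, "Adrian Jones", "MG"), (864297351, "Qi Chen", "MA"),
    (900120450, "Ann White", "MA"),
])
conn.executemany("INSERT INTO Teaching VALUES (?, ?, ?)", [
    (9406123, "MGT123", "F1995"), (9406321, "MGT123", "F1994"),
    (101202303, "CS305", "S1996"), (101202303, "CS315", "F1997"),
    (121232343, "EE101", "F1995"), (121232343, "EE101", "S1991"),
    (555666777, "CS305", "F1995"), (783432188, "MGT123", "F1997"),
    (900120450, "MAT123", "F1997"), (900120450, "MAT123", "S1996"),
])

taught_cs = (
    "SELECT P.Name FROM Professor P, Teaching T "
    "WHERE P.Id = T.ProfId AND T.CrsCode LIKE 'CS%'"
)

# A: everybody who taught a CS course.
cs_teachers = conn.execute(
    "SELECT DISTINCT Name FROM (" + taught_cs + ") ORDER BY Name"
).fetchall()

# A \ B with the set operator.
via_except = conn.execute(
    taught_cs + " EXCEPT SELECT P.Name FROM Professor P WHERE P.DeptId = 'CS'"
).fetchall()

# A \ B as "x in A and x not in B".
via_condition = conn.execute(taught_cs + " AND P.DeptId != 'CS'").fetchall()

print(cs_teachers, via_except, via_condition)
```

Both CS courses were taught by CS professors only, which is why the difference is empty either way.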

88

Is it in?

With SQL, we can also test whether something

is a member of a finite set, a basic relation in

set theory.

Query: Who are those professors who work

either in CS or EE department?

Select P.Name From Professor P

Where P.DeptId In {’CS’, ’EE’}

In MariaDB, it looks like the following:

MariaDB [registration]> Select P.Name
    -> From Professor P
    -> Where P.DeptId In ('CS', 'EE');

+-------------+
| Name        |
+-------------+
| John Smyth  |
| David Jones |
| Mary Doe    |
+-------------+

Labwork: Let’s take care of Labwork 3.2 next.

89

Nested (nasty) queries

Way back, on Page 41, we addressed the following:

Query: Who taught in Fall 1995?

MariaDB [registration]> Select P.Name
    -> From Professor P, Teaching T
    -> Where P.Id=T.ProfId
    -> And T.Semester='F1995';

+-------------+

| Name |

+-------------+

| David Jones |

| Mary Doe |

| Ann White |

+-------------+

This one involves a join, for which we first have to work out a Cartesian product, an expensive operation.

Question: Is there a better way?

90

Let’s get nesty ...

We can also do it in two steps: a) find out

the ids of those who make the cut from the

Teaching table, then, b) using those ids to find

out their names in the Professor instance.

MariaDB [register]> Select P.Name From Professor P
    -> Where P.Id IN
    -> (Select T.ProfId From Teaching T
    -> Where T.Semester='F1995');

+-------------+
| Name        |
+-------------+
| David Jones |
| Mary Doe    |
| Ann White   |
+-------------+
3 rows in set (0.00 sec)

Question: Do you like this latter approach

better?

The fact that SQL statements can be nested makes it much more powerful, but potentially complex and tough to work with.

91

Why going nesty?

In the previous example, it might be more nat-

ural (?) to come up with the nested version.

But, a much more important reason to use

nested query is that it increases SQL’s expres-

sive power in the sense that lots of things can’t

be done without this feature.

Query: Who did not take any course?

MariaDB [registration]> Select S.Name
    -> From Student S
    -> Where S.Id Not In
    -> # All students who take some course
    -> (Select T.StudId
    -> From Transcript T);

+------------+

| Name |

+------------+

| Mary Smith |

+------------+

Question: Did Mary take anything?

92

Let’s find it out...

MariaDB [registration]> Select Distinct C.CrsName
    -> From Transcript T, Student S, Course C
    -> Where S.Name="Mary Smith" And S.Id=T.StudId
    -> And T.CrsCode=C.CrsCode;

Empty set (0.01 sec)

MariaDB [registration]> Select id, Name From Student;

+-----------+---------------+
| id        | Name          |
+-----------+---------------+
| 111111111 | Jane Doe      |
| 666666666 | Jesoph Public |
| 111223344 | Mary Smith    |
| 987654321 | Bart Simpson  |
| 23456789  | Homer Simpson |
| 123454321 | Joe Blow      |
+-----------+---------------+

MariaDB [registration]> Select Distinct StudId, CrsCode
    -> From Transcript;

+-----------+---------+
| StudId    | CrsCode |
+-----------+---------+
| 23456789  | CS305   |
| 23456789  | EE101   |
| 111111111 | EE101   |
| 111111111 | MAT123  |
| 111111111 | MGT123  |
| 123454321 | CS305   |
| 123454321 | CS315   |
| 123454321 | MAT123  |
| 666666666 | EE101   |
| 666666666 | MAT123  |
| 666666666 | MGT123  |
| 987654321 | CS305   |
| 987654321 | MGT123  |
+-----------+---------+

93

An alternative

The following certainly works as well: We just

collect all these students for whom no tran-

script record exists.

MariaDB [register]> select S.Name

-> From Student S

-> Where not exists

-> (Select * from Transcript T

-> Where T.StudId=S.Id);

+------------+

| Name |

+------------+

| Mary Smith |

+------------+

1 row in set (0.01 sec)

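Notice that Exists is different from In: Exists asks whether the subquery returns anything at all (“Someone is in R 207”), while In asks whether a particular value is among the results (“John is in R 207”).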

94

The nasty and nesty division

Query: Who were taught by all the CS professors?

Let’s start by finding out all the students who were not taught by at least one CS professor.

MariaDB [registration]> Select Distinct S.Name
    -> From Student S,
    -> # All CS Professors
    -> (Select P.Id From Professor P
    -> Where P.DeptId='CS') As CSP
    -> Where CSP.Id Not In
    -> # Is this CS professor NOT among those
    -> # who taught S? If this is the case,
    -> # we have found the evidence, so we
    -> # put S's Name into the output bucket.
    ->
    -> # All those who taught S
    -> (Select T.ProfId
    -> From Teaching T, Transcript R
    -> Where T.CrsCode=R.CrsCode And
    -> T.Semester=R.Semester And
    -> S.Id=R.StudId);

It is a three-layer query.

95

What do we get?

If you apply the above query to the database

instance we have created in Lab 5, we get the

following:

+---------------+

| Name |

+---------------+

| Jane Doe |

| Jesoph Public |

| Mary Smith |

| Bart Simpson |

| Homer Simpson |

+---------------+

Each and every one of them is not taught by

at least one CS faculty.

Question: Should we trust Dr. Shen?

96

Absolutely not!

Question: Why is Jane in, but Joe out?

With our instance, Table CSP leads to the fol-

lowing two Ids for Computer Science profes-

sors:

MariaDB [registration]> Select P.Id
    -> From Professor P
    -> Where P.DeptId='CS';

+-----------+

| Id |

+-----------+

| 101202303 |

| 555666777 |

+-----------+

Question: Why should Jane be included in

the output bucket?

97

Knowing that Jane’s Id is ‘111111111’, we find that the ProfIds of all the professors who have taught her are the following:

MariaDB [registration]> Select T.ProfId
    -> From Teaching T, Transcript R
    -> Where T.CrsCode=R.CrsCode And
    -> T.Semester=R.Semester And
    -> R.StudId='111111111';

+-----------+
| ProfId    |
+-----------+
| 900120450 |
| 783432188 |
+-----------+

Now, the query on Page 95 checks whether any Id contained in Table CSP, i.e., the ProfId of a Computer Science professor, is not in the above ProfId table.

If so, it would mean that at least one Computer Science professor did not teach her, thus Jane should belong to the output bucket of this query.

The very first one, “101202303”, is not in. That’s why Jane is included in the output bucket.

98

How about Joe?

Joe’s Id is ‘123454321’, and the ProfIds of all the professors who have taught him are the following:

MariaDB [registration]> Select T.ProfId
    -> From Teaching T, Transcript R
    -> Where T.CrsCode=R.CrsCode And
    -> T.Semester=R.Semester And
    -> R.StudId='123454321';

+-----------+
| ProfId    |
+-----------+
| 555666777 |
| 101202303 |
+-----------+

Again, the query on Page 95 checks whether at least one CSP professor did not teach Joe.

The Not In test fails for both CSP instances, since both taught him. That’s why Joe Blow is not included in the output bucket.

Thus, Dr. Shen might be correct... in this case.

99

Let’s dig a bit deeper....

Mathematically speaking, what we have got is

the following:

B = {s | ∃f (CSP(f) ∧ ¬Teaching(f, s))}.

Question: What is its complement (Cf. Page 87)?

S \ B = {s | ¬∃f (CSP(f) ∧ ¬Teaching(f, s))}

= {s | ∀f ¬(CSP(f) ∧ ¬Teaching(f, s))}

= {s | ∀f (¬CSP(f) ∨ Teaching(f, s))}   (by De Morgan)

= {s | ∀f (CSP(f) → Teaching(f, s))}.   (by Page 79)

Therefore, the complement of B collects all

the students whom every CS faculty has taught.

Question: How do we get the complement in

SQL?

Answer: Not in. Remember “Who does not

supply any red part?” (Cf. Query 5 in lab 9)

100

Let’s play it out...

The following digs out all the students whom

every CS faculty has taught, in four layers.

MariaDB [registration]> Select Name From Student
    -> Where Id Not In (
    -> # Below is the previous query on Page 95
    -> Select Distinct S.Id
    -> From Student S,
    -> # All CS Professors
    -> (Select P.Id From Professor P
    -> Where P.DeptId='CS') As CSP
    -> Where CSP.Id Not In
    -> # Professors who have taught S
    -> (Select T.ProfId
    -> From Teaching T, Transcript R
    -> Where T.CrsCode=R.CrsCode And
    -> T.Semester=R.Semester And
    -> S.Id=R.StudId));

+----------+
| Name     |
+----------+
| Joe Blow |
+----------+
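The double negation can be replayed outside MariaDB as well. The sketch below rests on several assumptions: SQLite stands in for MariaDB, only the CS rows of Teaching and Transcript are loaded (the other rows cannot change the answer, since only CS professors are tested), and only three of the professors are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (Id INTEGER, Name TEXT);
CREATE TABLE Professor (Id INTEGER, DeptId TEXT);
CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT);
CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Semester TEXT);
""")
conn.executemany("INSERT INTO Student VALUES (?, ?)", [
    (111111111, "Jane Doe"), (666666666, "Jesoph Public"),
    (111223344, "Mary Smith"), (987654321, "Bart Simpson"),
    (23456789, "Homer Simpson"), (123454321, "Joe Blow"),
])
# The two CS professors plus one non-CS professor (subset of the instance).
conn.executemany("INSERT INTO Professor VALUES (?, ?)", [
    (101202303, "CS"), (555666777, "CS"), (121232343, "EE"),
])
# CS rows only (an assumption made to keep the sketch short).
conn.executemany("INSERT INTO Teaching VALUES (?, ?, ?)", [
    (555666777, "CS305", "F1995"), (101202303, "CS305", "S1996"),
    (101202303, "CS315", "F1997"),
])
conn.executemany("INSERT INTO Transcript VALUES (?, ?, ?)", [
    (23456789, "CS305", "S1996"), (123454321, "CS305", "F1995"),
    (123454321, "CS315", "F1997"), (987654321, "CS305", "F1995"),
])

# B: students NOT taught by at least one CS professor.
not_by_all = """
SELECT DISTINCT S.Id
FROM Student S,
     (SELECT P.Id FROM Professor P WHERE P.DeptId = 'CS') AS CSP
WHERE CSP.Id NOT IN (
    SELECT T.ProfId
    FROM Teaching T, Transcript R
    WHERE T.CrsCode = R.CrsCode
      AND T.Semester = R.Semester
      AND S.Id = R.StudId)
"""
b_ids = conn.execute(not_by_all).fetchall()

# Complement of B: students taught by every CS professor.
taught_by_all = conn.execute(
    "SELECT Name FROM Student WHERE Id NOT IN (" + not_by_all + ")"
).fetchall()

print(len(b_ids), taught_by_all)
```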

Question: Is this result correct?

Answer: Let’s find it out...

101

The whole nine yards...

Question: Who took a CS course?

MariaDB [registration]> Select * From Transcript
    -> Where CrsCode like 'CS%';

+-----------+---------+----------+-------+
| StudId    | CrsCode | Semester | Grade |
+-----------+---------+----------+-------+
| 23456789  | CS305   | S1996    | A     |
| 123454321 | CS305   | F1995    | A     |
| 123454321 | CS315   | F1997    | A     |
| 987654321 | CS305   | F1995    | C     |
+-----------+---------+----------+-------+

Question: Who taught a CS course?

MariaDB [registration]> Select * From Teaching
    -> Where CrsCode like 'CS%';

+-----------+---------+----------+
| ProfId    | CrsCode | Semester |
+-----------+---------+----------+
| 555666777 | CS305   | F1995    |
| 101202303 | CS305   | S1996    |
| 101202303 | CS315   | F1997    |
+-----------+---------+----------+

102

Question: Who are those CS professors?

MariaDB [registration]> Select Id From Professor
    -> Where DeptId='CS';

+-----------+
| Id        |
+-----------+
| 101202303 |
| 555666777 |
+-----------+

Thus, only one student was taught by both of the two CS professors, with the id being 123454321.

Question: Who is this student?

MariaDB [registration]> Select Name From Student
    -> Where Id=123454321;

+----------+
| Name     |
+----------+
| Joe Blow |
+----------+

Thus, the result of this four-layer query seems to be correct.

103

Could we do it with RA?

What we have got is the collection of students who have been taught by all the CS faculty.

If we do it with Division, we would find out the following:

Result = A Divide B Via C,

where

A = πStudId(Transcript),

B = πId(σDeptId=‘CS’(Professor)),

C = πStudId, ProfId(Transcript ⋈CrsCode,Semester Teaching).

The above MariaDB code implements this division, although it, and SQL in general, does not provide a direct hit on division.
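Outside SQL, division itself is a one-liner. The sketch below is only an illustration: the function name divide is made up, A is restricted to the students appearing in the CS transcript rows listed earlier (an assumption to keep it short), and C holds the (StudId, ProfId) pairs obtained by matching those rows with the CS rows of Teaching on CrsCode and Semester.

```python
def divide(a, b, c):
    """Return the elements x of a such that (x, y) is in c for every y in b."""
    return {x for x in a if all((x, y) in c for y in b)}

# A: students with a CS transcript row; B: the two CS professors.
A = {23456789, 123454321, 987654321}
B = {101202303, 555666777}

# C: who was taught by whom, from Transcript joined with Teaching
# on CrsCode and Semester (CS rows only).
C = {
    (23456789, 101202303),    # CS305, S1996
    (123454321, 555666777),   # CS305, F1995
    (123454321, 101202303),   # CS315, F1997
    (987654321, 555666777),   # CS305, F1995
}

result = divide(A, B, C)
print(result)
```

The `all(...)` in the comprehension is exactly the universal quantifier that SQL makes us simulate with two layers of Not In.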

Question: Do we have to use four layers of

nesting? Is there a simpler way to do it?

104

Quantified predicates

Beginning with SQL 1999, we can also have

a support of limited quantification, with the

following basic format

For All relation (condition)

For Some relation (condition)

The first returns true if, for all the tuples in relation, condition is true; while the second returns true if, for at least one tuple in relation, condition is true.

For example, the following tries to make sure

that every professor is teaching at least one

course. Remember participation constraint?

For All Professor

(Id In (Select T.ProfId From Teaching T))

105

The exists operator

It is often necessary to check if a nested subquery actually returns anything.

Query: Who never took any computer science course?

One way to do it is to find all CS courses a student has taken and return those students for which this list is empty. Thus,

MariaDB [registration]> Select S.Name From Student S
    -> Where Not Exists (
    -> # All CS courses taken by S.Id
    -> Select T.CrsCode
    -> From Transcript T
    -> Where T.CrsCode like 'CS%'
    -> And T.StudId=S.Id);

+---------------+
| Name          |
+---------------+
| Jane Doe      |
| Mary Smith    |
| Jesoph Public |
+---------------+

Question: Can we use Not In in place of Not Exists? Why not?

Answer: It does not fit as a drop-in replacement: Not In tests a value on its left against the subquery result, while Not Exists only asks whether that result is empty.

106

The All operator

Query: Who has the highest GPA among all

the students?

MariaDB [registration]> Select S.Name, S.Id

-> From Student S

-> Where S.GPA >= All (Select S.GPA

-> From Student S

-> );

+--------------+-----------+

| Name | Id |

+--------------+-----------+

| Bart Simpson | 987654321 |

+--------------+-----------+

1 row in set (0.00 sec)

Here the comparison with All returns true whenever its left argument is at least as high as the GPA of every student.

107

Is he the one?

Let’s check it out....

MariaDB [registration]> Select Name, GPA

-> From Student;

+---------------+-----+

| Name | GPA |

+---------------+-----+

| Homer Simpson | 3.3 |

| Jane Doe | 3.4 |

| Mary Smith | 0 |

| Joe Blow | 3.2 |

| Jesoph Public | 3.3 |

| Bart Simpson | 3.6 |

+---------------+-----+

6 rows in set (0.00 sec)

So, the universal quantifier is supported in the MariaDB version currently installed on turing, 5.5.68, as well!

108

An alternative solution

MariaDB also supports an aggregation opera-

tor, MAX, which can be used as follows.

MariaDB [registration]> Select S.Name, S.Id

-> From Student S

-> Where S.GPA >= (Select Max(S1.GPA)

-> From Student S1);

+--------------+-----------+

| Name | Id |

+--------------+-----------+

| Bart Simpson | 987654321 |

+--------------+-----------+

1 row in set (0.03 sec)

We will see how to embed this in a script later on.

Labwork: It is about time to do Labwork 3.3.

109

Aggregation

As we have already seen, it is often necessary to calculate the average, the maximum, etc., of the values of certain attributes. Aggregation also helps to simplify some queries (cf. the one on the previous page).

Thus, SQL provides a collection of such aggregate operators.

Query: What is the average age of our seniors?

MariaDB [register]> Select ROUND(avg(S.Age), 1)
    -> From Student S
    -> Where S.Status='Senior';
+----------------------+
| ROUND(avg(S.Age), 1) |
+----------------------+
|                 21.5 |
+----------------------+
1 row in set (0.02 sec)

110

Is it correct?

MariaDB [register]> Select S.Age
    -> From Student S
    -> Where S.Status='Senior';
+-----+
| Age |
+-----+
|  21 |
|  22 |
+-----+

We have two seniors, with their average age being 43/2=21.5. Yes!

Question: Can we make it look even “better”?

Answer: Yeah, give it another name...

MariaDB [registration]> Select ROUND(avg(S.Age), 1)
    -> As AverageAge
    -> From Student S
    -> Where S.Status='Senior';
+------------+
| AverageAge |
+------------+
|       21.5 |
+------------+
1 row in set (0.00 sec)

111

Another example

Query: Who is the youngest math professor?

# The youngest professor in Math department
MariaDB [registration]> Select P.Name,P.Age
    -> From Professor P
    -> Where P.DeptId='MA' And
    -> P.Age=(Select Min(P1.Age)
    -> From Professor P1
    -> Where P1.DeptId='MA');
+---------+-----+
| Name    | Age |
+---------+-----+
| Qi Chen |  34 |
+---------+-----+

Check it out...

MariaDB [registration]> Select P.Name, P.Age
    -> From Professor P
    -> Where P.DeptId='MA';
+-----------+-----+
| Name      | Age |
+-----------+-----+
| Qi Chen   |  34 |
| Ann White |  38 |
+-----------+-----+

Question: Do we have to use P1?

Answer: No. Check it out... Why?

112

More aggregation examples

Query: How many professors are there in the

Mathematics department?

MariaDB [registration]> Select Count(P.Name)
    -> From Professor P
    -> Where P.DeptId='MA';
+---------------+
| Count(P.Name) |
+---------------+
|             2 |
+---------------+

Question: Could some of them share names?

MariaDB [registration]> Select Count(Distinct P.Name)
    -> From Professor P
    -> Where P.DeptId='MA';
+------------------------+
| Count(Distinct P.Name) |
+------------------------+
|                      2 |
+------------------------+

113

Thou shalt not do this

You cannot mix an aggregated quantity and an ordinary attribute, as in the following:

Select count(*), S.Id From Student S

Here count returns a single value for the whole set of tuples, while S.Id yields one value for every tuple. The two do not fit together in one result row, so you have to quit....

Query: Which junior(s) achieved the highest

GPA among all the students?

MariaDB [registration]> Select S.Name, S.Id

-> From Student S

-> Where S.GPA >= (Select Max(S1.GPA)

-> From Student S1)

-> And S.Status='junior';

Empty set (0.00 sec)

114

Aggregation and grouping

Although we know how to count professors for

one department, what happens if we need this

information for all the departments? This is

where the “Group by” clause is used.

This clause will group rows of a table that

agree on values of a specified subset of at-

tributes.

Question: What does it mean? An example ...

115

... might help

Query: How many professors are there in each

department, and what are their average ages?

MariaDB [registration]> select * from Professor;
+-----------+--------------+--------+-----+--------+
| Id        | Name         | DeptId | Age | Salary |
+-----------+--------------+--------+-----+--------+
|   9406321 | Jacob Taylor | MG     |  45 |  30000 |
| 101202303 | John Smyth   | CS     |  32 |  40000 |
| 121232343 | David Jones  | EE     |  56 |  25000 |
| 555666777 | Mary Doe     | CS     |  67 |  40000 |
| 783432188 | Adrian Jones | MG     |  55 |  30000 |
| 864297351 | Qi Chen      | MA     |  34 |  35000 |
| 900120450 | Ann White    | MA     |  38 |  50000 |
+-----------+--------------+--------+-----+--------+
7 rows in set (0.00 sec)

We can see, e.g., there are two professors in

Computer Science, and their average age is

(32+67)/2, i.e., 49.5.

What we need to do is to group together all

such rows agreeing on DeptId, and then apply

various operations.

116

How to do it SQLly?

We simply group all the tuples by their DeptId, then apply such aggregate operators as count and avg.

MariaDB [registration]> Select P.DeptId,
    -> count(P.Name) As DeptSize,
    -> ROUND(Avg(P.Age), 1) As AvgAge
    -> From Professor P
    -> Group By P.DeptId;
+--------+----------+--------+
| DeptId | DeptSize | AvgAge |
+--------+----------+--------+
| CS     |        2 |   49.5 |
| EE     |        1 |   56.0 |
| MA     |        2 |   36.0 |
| MG     |        2 |   50.0 |
+--------+----------+--------+
4 rows in set (0.00 sec)

The key issue here is that each column in the resulting table is either named in the Group By clause, or is the result of applying an aggregate operator to the tuples of each group.
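To see what "grouping rows that agree on DeptId" amounts to, the sketch below does the grouping by hand with a Python dictionary and checks it against the Group By query. It assumes Python's built-in sqlite3 module as a stand-in for MariaDB; the rows are those of the Professor table shown above, and the variable names are illustrative only.

```python
import sqlite3
from collections import defaultdict

PROFESSORS = [
    (9406321, "Jacob Taylor", "MG", 45, 30000),
    (101202303, "John Smyth", "CS", 32, 40000),
    (121232343, "David Jones", "EE", 56, 25000),
    (555666777, "Mary Doe", "CS", 67, 40000),
    (783432188, "Adrian Jones", "MG", 55, 30000),
    (864297351, "Qi Chen", "MA", 34, 35000),
    (900120450, "Ann White", "MA", 38, 50000),
]

# By hand: one bucket of ages per DeptId, then aggregate each bucket.
ages_by_dept = defaultdict(list)
for _id, _name, dept, age, _salary in PROFESSORS:
    ages_by_dept[dept].append(age)
by_hand = [(dept, len(ages), round(sum(ages) / len(ages), 1))
           for dept, ages in sorted(ages_by_dept.items())]

# The same thing, stated declaratively.
conn = sqlite3.connect(":memory:")
conn.execute("Create Table Professor "
             "(Id Integer, Name Text, DeptId Text, Age Integer, Salary Integer)")
conn.executemany("Insert Into Professor Values (?, ?, ?, ?, ?)", PROFESSORS)
by_sql = conn.execute("""
    Select P.DeptId, Count(P.Name) As DeptSize, Round(Avg(P.Age), 1) As AvgAge
    From Professor P
    Group By P.DeptId
    Order By P.DeptId
""").fetchall()

print(by_hand)
print(by_sql)
```

Each output row corresponds to one bucket: the grouping column identifies the bucket, and every other column is an aggregate over it.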

117

Another example

Query: How many courses does each student

take, and what are their average grades?

Select T.StudId, Count(*) As NumCourses,

Avg(T.Grade) As CrsAvg

From Transcript T

Group By T.StudId

The result could be the following:

StudId NumCourses CrsAvg
6666   3          3.33
9876   2          2.50
1234   3          3.33
0234   2          3.50

For each student, it calculates the number of

courses, together with the average grade that

she has achieved in those courses.

118

The Having clause

Similar to “Where”, the “Having” clause is used together with “Group By” to indicate which groups should be included in the final result.

Whenever a group is generated, this condition is applied to it first. If the group does not meet the cut, it will not be included.

MariaDB [registration]> Select P.DeptId,
    -> Count(P.Name) As DeptSize,
    -> ROUND(Avg(P.Age), 1) As AvgAge
    -> From Professor P
    -> Group By P.DeptId
    -> Having count(*) > 1;
+--------+----------+--------+
| DeptId | DeptSize | AvgAge |
+--------+----------+--------+
| CS     |        2 |   49.5 |
| MA     |        2 |   36.0 |
| MG     |        2 |   50.0 |
+--------+----------+--------+
3 rows in set (0.00 sec)

EE is out, because it has only one professor.

119

More examples

Query: Who achieves an average grade of more than 3.5 at the end of this academic year?

Select T.StudId, Count(*) As NumCrs,
       Avg(T.Grade) As CrsAvg
From Transcript T
Where T.Semester In ('F2021', 'S2022')
Group By T.StudId
Having Avg(T.Grade)>3.5

This can also be done without the Having clause in two steps:

Select Stats.StudId, Stats.CrsAvg
From (Select T.StudId, Avg(T.Grade) As CrsAvg
      From Transcript T
      Where T.Semester In ('F2021', 'S2022')
      Group By T.StudId) As Stats
Where Stats.CrsAvg>3.5
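The equivalence of the two forms can be checked on data we do have. Since the Transcript grades are not listed in these notes, the sketch below applies the same pattern to professor ages instead (departments whose average age exceeds 40). That substitution, the threshold, and the use of Python's built-in sqlite3 module in place of MariaDB are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Professor (Name Text, DeptId Text, Age Integer)")
conn.executemany("Insert Into Professor Values (?, ?, ?)", [
    ("Jacob Taylor", "MG", 45), ("John Smyth", "CS", 32),
    ("David Jones", "EE", 56), ("Mary Doe", "CS", 67),
    ("Adrian Jones", "MG", 55), ("Qi Chen", "MA", 34),
    ("Ann White", "MA", 38),
])

# One step: filter the groups with Having.
with_having = conn.execute("""
    Select P.DeptId, Avg(P.Age) As AvgAge
    From Professor P
    Group By P.DeptId
    Having Avg(P.Age) > 40
    Order By P.DeptId
""").fetchall()

# Two steps: build the grouped table first, then filter it with Where.
two_steps = conn.execute("""
    Select Stats.DeptId, Stats.AvgAge
    From (Select P.DeptId, Avg(P.Age) As AvgAge
          From Professor P
          Group By P.DeptId) As Stats
    Where Stats.AvgAge > 40
    Order By Stats.DeptId
""").fetchall()

print(with_having)
print(two_steps)
```

In the two-step form the aggregate has become an ordinary column of the derived table Stats, which is why a plain Where suffices there.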

120

Put them into order

We sometimes want to line up all the rows in

the result, using the Order by clause. Thus, if

we add the following

Order by CrsAvg

at the end of the previous query, that list will

be sorted by their average GPA.

If we include instead the following

Order by CrsAvg, StudId

then this list will be sorted by the average GPA,

and with those sharing the same GPA, sorted

by their student ID.

121

deja vu

Query: Who took at least two courses?

We once did it using join as follows (Cf. Pages 52-54):

MariaDB [registration]> Select distinct T1.StudId
    -> From Transcript T1, Transcript T2
    -> Where T1.StudId=T2.StudId
    -> And T1.CrsCode <> T2.CrsCode;
+-----------+
| StudId    |
+-----------+
|  23456789 |
| 111111111 |
| 123454321 |
| 666666666 |
| 987654321 |
+-----------+

Question: What to do if we want students who have taken three, four, or five courses?

It turns out that the aggregation operators do

a better job: pick up all those students such

that the number of courses that they have

taken is at least 2.

Question: How to do it?

122

Begin with the very beginning...

The following one finds out the StudId and the

courses they have taken.

MariaDB [registration]> Select StudId, CrsCode

-> From Transcript;

+-----------+---------+
| StudId    | CrsCode |
+-----------+---------+
|  23456789 | CS305   |
|  23456789 | EE101   |
| 111111111 | EE101   |
| 111111111 | MAT123  |
| 111111111 | MGT123  |
| 123454321 | CS305   |
| 123454321 | CS315   |
| 123454321 | MAT123  |
| 666666666 | EE101   |
| 666666666 | MAT123  |
| 666666666 | MGT123  |
| 987654321 | CS305   |
| 987654321 | MGT123  |
+-----------+---------+

Question: Is this what we want? No.

123

Question: What do we really want?

Answer: We want to know the number of the

courses they have taken.

Question: How to get it SQLly?

MariaDB [registration]> Select StudId,

-> Count(CrsCode) as Num

-> From Transcript

-> Group by StudId;

This code gives us the following:

+-----------+-----+

| StudId | Num |

+-----------+-----+

| 23456789 | 2 |

| 111111111 | 3 |

| 123454321 | 3 |

| 666666666 | 3 |

| 987654321 | 2 |

+-----------+-----+

124

Then what?

We cut out those who took fewer than two courses to get the final answer.

MariaDB [registration]> Select StudId,
    -> Count(CrsCode) as Num
    -> From Transcript
    -> Group by StudId
    -> Having Num>=2
    -> Order by StudId;
+-----------+-----+
| StudId    | Num |
+-----------+-----+
|  23456789 |   2 |
| 111111111 |   3 |
| 123454321 |   3 |
| 666666666 |   3 |
| 987654321 |   2 |
+-----------+-----+

In the above, the Order by clause sorts the rows by StudId.

Question: Joe Pecci: “Is that it?”

Answer: No!

125

Who are those kids?

Find them out by joining the last table with

Student.

MariaDB [registration]> Select S.Name
    -> From (Select StudId, count(CrsCode) as Num
    -> From Transcript
    -> Group by StudId
    -> #Taking at least 2
    -> Having Num>=2
    -> Order by StudId) T, Student S
    -> Where T.StudId=S.Id
    -> Order by S.Name;
+---------------+
| Name          |
+---------------+
| Bart Simpson  |
| Homer Simpson |
| Jane Doe      |
| Jesoph Public |
| Joe Blow      |
+---------------+

126

How about at least three courses?

We simply change 2 to 3 in the above query:

MariaDB [registration]> Select S.Name
    -> From (Select StudId, count(CrsCode) as Num
    -> From Transcript
    -> Group by StudId
    -> #Taking at least 3
    -> Having Num>=3
    -> Order by StudId) T, Student S
    -> Where T.StudId=S.Id
    -> Order by S.Name;

+---------------+

| Name |

+---------------+

| Jane Doe |

| Jesoph Public |

| Joe Blow |

+---------------+

Question: How to find those kids who have taken at least 25 courses?
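With aggregation the threshold is just a number in the Having condition, so it can be passed in as a parameter. The sketch below assumes Python's built-in sqlite3 module as a stand-in for MariaDB; the rows are the (StudId, CrsCode) pairs of the Transcript table listed earlier, and took_at_least is a hypothetical helper name.

```python
import sqlite3

TRANSCRIPT = [
    (23456789, "CS305"), (23456789, "EE101"),
    (111111111, "EE101"), (111111111, "MAT123"), (111111111, "MGT123"),
    (123454321, "CS305"), (123454321, "CS315"), (123454321, "MAT123"),
    (666666666, "EE101"), (666666666, "MAT123"), (666666666, "MGT123"),
    (987654321, "CS305"), (987654321, "MGT123"),
]

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Transcript (StudId Integer, CrsCode Text)")
conn.executemany("Insert Into Transcript Values (?, ?)", TRANSCRIPT)


def took_at_least(k):
    """Return the StudIds of students with at least k courses on record."""
    rows = conn.execute("""
        Select StudId
        From Transcript
        Group By StudId
        Having Count(CrsCode) >= ?
        Order By StudId
    """, (k,)).fetchall()
    return [row[0] for row in rows]


print(took_at_least(2))
print(took_at_least(3))
print(took_at_least(25))
```

The self-join version would need one more copy of Transcript for every extra course; here only the number changes.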

127

One step a time?

Join is expensive. There are certainly other

ways... e.g.,

MariaDB [registration]> Select S.Name
    -> From (Select StudId, count(CrsCode) as Num
    -> From Transcript
    -> Group by StudId
    -> #Taking at least 2
    -> Having Num>=2
    -> Order by StudId) T, Student S
    -> Where T.StudId=S.Id
    -> Order by S.Name;
+---------------+
| Name          |
+---------------+
| Bart Simpson  |
| Homer Simpson |
| Jane Doe      |
| Jesoph Public |
| Joe Blow      |
+---------------+

Again, instead of first building a large intermediate table as the product of the full Transcript and Student tables, the above groups and filters Transcript first, so that only the few surviving rows are joined with Student, thus cutting down both space and time.

128

The whole ninety feet...

1. The From part will be evaluated first to

produce a Cartesian product of all the tables

mentioned there.

2. The Where clause will be evaluated to process each row of the above product table individually to see if it makes the cut, and throw out those that don’t.

3. The Group By clause will then be evaluated

to split the previously cleaned table into groups

where each group consists of those tuples that

agree on the specified attributes as mentioned

in the group clause.

4. The Having clause will then be evaluated

to cut out those groups that don’t satisfy the

Having condition.

129

5. The Select part will be evaluated. It takes the leftover groups, evaluates the aggregate functions in the target list for each group, retains those columns that are listed in the Select part, and generates one result row for each group.

6. Finally, the rows are sorted as specified by Order by.

130

A final example

Consider two tables, Dept

DEPTId DNAME       BUDGET
D1     Marketing   10M
D2     Development 12M
D3     Research    5M

and Emp

EMPId ENAME DEPTId SALARY
E1    Lopez D1     40K
E2    John  D1     42K
E3    Bob   D2     30K
E4    Jay   D2     35K

and the query

Select D.DName, Avg(E.Salary) As SalaryAvg
From Dept D, Emp E
Where D.DName In ('Development','Research')
And D.DeptId=E.DeptId
Group By D.DeptId
Order by D.DName;

131

How is it evaluated?

1. The Cartesian product of the two tables,

Dept and Emp, as mentioned in the From part, is

constructed as follows:

DId DNAME    BUDGET EMPId ENAME DId SALARY
D1  Market   10M    E1    Lopez D1  40K
D1  Market   10M    E2    John  D1  42K
D1  Market   10M    E3    Bob   D2  30K
D1  Market   10M    E4    Jay   D2  35K
D2  Develop  12M    E1    Lopez D1  40K
D2  Develop  12M    E2    John  D1  42K
D2  Develop  12M    E3    Bob   D2  30K
D2  Develop  12M    E4    Jay   D2  35K
D3  Research 5M     E1    Lopez D1  40K
D3  Research 5M     E2    John  D1  42K
D3  Research 5M     E3    Bob   D2  30K
D3  Research 5M     E4    Jay   D2  35K

2. Then, the Where part is evaluated, to get those tuples satisfying the condition

D.DName In ('Development','Research') And D.DeptId=E.DeptId

DId DNAME   BUDGET EMPId ENAME DId SALARY
D2  Develop 12M    E3    Bob   D2  30K
D2  Develop 12M    E4    Jay   D2  35K

132

3. The Group by clause is evaluated to group

the above tuples into groups with the same

DId’s:

DId DNAME   BUDGET EMPId ENAME DId SALARY
D2  Develop 12M    E3    Bob   D2  30K
D2  Develop 12M    E4    Jay   D2  35K

4. There is no Having clause, so all groups stay.

5. The Select part is done, giving the following table of Stats:

DNAME   SalaryAvg
Develop 32.5K

6. Since there is only one row, the result stays

the same under Order by.
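The six steps can be replayed literally on plain Python lists. This is only a conceptual sketch of the evaluation order described above, not how a DBMS actually executes the query (the optimizer is free to pick a cheaper plan). Budgets and salaries are written out in full, and all variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

DEPT = [("D1", "Marketing", 10_000_000),
        ("D2", "Development", 12_000_000),
        ("D3", "Research", 5_000_000)]
EMP = [("E1", "Lopez", "D1", 40_000), ("E2", "John", "D1", 42_000),
       ("E3", "Bob", "D2", 30_000), ("E4", "Jay", "D2", 35_000)]

# 1. From: Cartesian product. Each row is
#    (DeptId, DName, Budget, EmpId, EName, DeptId, Salary).
product_rows = [d + e for d, e in product(DEPT, EMP)]

# 2. Where: keep the rows satisfying the condition.
kept_rows = [r for r in product_rows
             if r[1] in ("Development", "Research") and r[0] == r[5]]

# 3. Group By D.DeptId.
groups = defaultdict(list)
for r in kept_rows:
    groups[r[0]].append(r)

# 4. Having: none, so all groups stay.

# 5. Select: one result row (DName, Avg(Salary)) per group.
result = [(rows[0][1], sum(r[6] for r in rows) / len(rows))
          for rows in groups.values()]

# 6. Order by D.DName.
result.sort(key=lambda row: row[0])

print(len(product_rows), len(kept_rows))
print(result)
```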

133

This is what MariaDB says...

MariaDB [Strange]> Select D.DName,

-> ROUND(Avg(E.Salary), 2)

-> As SalaryAvg

-> From Dept D, Emp E

-> Where D.DName In ('Development',

-> 'Research')

-> And D.DeptId=E.DeptId

-> Group By D.DeptId

-> Order by D.DName;

+-------------+-----------+

| DName | SalaryAvg |

+-------------+-----------+

| Development | 32500.00 |

+-------------+-----------+

1 row in set (0.00 sec)

Labwork: Let’s get it over with Labwork 3.4.

134

Views in SQL

A view is simply a virtual table. We can create

views and then use them in SQL queries. For

example,

MariaDB [register]> Create View AvgDeptAge(Dept, AvgAge) As
    -> Select P.DeptId, Avg(P.Age)
    -> From Professor P
    -> Group By P.DeptId;
Query OK, 0 rows affected (0.01 sec)

MariaDB [registration]> Select Dept, ROUND(AvgAge)
    -> From AvgDeptAge;
+------+---------------+
| Dept | ROUND(AvgAge) |
+------+---------------+
| CS   |            50 |
| EE   |            56 |
| MA   |            36 |
| MG   |            50 |
+------+---------------+
4 rows in set (0.00 sec)

If you get bored with it, use the following to

remove it.

MariaDB [register]> drop view AvgDeptAge;

Query OK, 0 rows affected (0.00 sec)

135

Then what?

We can then use this view, like a subroutine (method), in further programming, e.g., to find out the department with the minimum average age:

MariaDB [register]> Select A.Dept, ROUND(A.AvgAge)
    -> From AvgDeptAge A
    -> Where A.AvgAge=(
    -> Select Min(A.AvgAge)
    -> From AvgDeptAge A);
+------+-------------------+
| Dept | ROUND(A.AvgAge)   |
+------+-------------------+
| MA   |                36 |
+------+-------------------+

Thus, a view is similar to a pre-defined method

(function) as we have been using a lot in Java

(Python) or any other programming languages,

which cuts away unnecessary details.
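A small sketch of what "virtual" means: the view stores no rows of its own, so a query against it always reflects the current contents of Professor. It assumes Python's built-in sqlite3 module as a stand-in for MariaDB, names the view columns with As aliases rather than a column list, and adds one hypothetical row ('New Hire') to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("Create Table Professor (Name Text, DeptId Text, Age Integer)")
conn.executemany("Insert Into Professor Values (?, ?, ?)", [
    ("Jacob Taylor", "MG", 45), ("John Smyth", "CS", 32),
    ("David Jones", "EE", 56), ("Mary Doe", "CS", 67),
    ("Adrian Jones", "MG", 55), ("Qi Chen", "MA", 34),
    ("Ann White", "MA", 38),
])
conn.execute("""
    Create View AvgDeptAge As
    Select P.DeptId As Dept, Avg(P.Age) As AvgAge
    From Professor P
    Group By P.DeptId
""")

# Use the view like any other table.
youngest_dept = conn.execute("""
    Select A.Dept, Round(A.AvgAge)
    From AvgDeptAge A
    Where A.AvgAge = (Select Min(A1.AvgAge) From AvgDeptAge A1)
""").fetchall()

# Hypothetical new row: the view picks it up without being redefined.
conn.execute("Insert Into Professor Values ('New Hire', 'EE', 20)")
ee_avg_after = conn.execute(
    "Select AvgAge From AvgDeptAge Where Dept = 'EE'").fetchone()[0]

print(youngest_dept)
print(ee_avg_after)
```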

Question: Did Adam talk about it earlier?

We will go through some view based program-

ming in Lab 4 this Friday to wrap up DB pro-

gramming.

136

What else?

There are a few reasons why a view is desirable.

1. It provides automatic security for hidden data, e.g., with the PresidentList view, a user cannot access students’ Ids, since the view does not contain any.

Create View PresidentList As
Select S.Name, Count(*) As NumCrs,
       Avg(T.Grade) As CrsAvg
From Student S, Transcript T
Where S.Id=T.StudId
And T.Semester In ('F2021', 'S2022')
Group By T.StudId, S.Name
Having Avg(T.Grade)>= 3.7;

2. It allows a user to pay attention to what s/he is interested in, and ignore the rest.

3. It makes MariaDB programming much more expressive and powerful, as you will see through Lab 4.

4. It provides logical data independence...?

Incidentally, views were not implemented in MySQL until version 5.0.

137

Logical data independence

As discussed in an earlier chapter, logical data independence refers to the immunity of users and user programs to changes in the logical structure of the database. Views provide the means to achieve such immunity in two aspects: growth and restructuring.

When a database grows to get in more infor-

mation, its structure must grow accordingly: it

may have to include new attributes, and even

new tables.

We already revised the Student table in Lab 11,

and will go through a Professor restructuring

example in Lab 12, as described in Section 4

in the MariaDB notes.

At least in principle, neither of these changes should have any effect on existing users or user programs at all. Otherwise, we are in deep trouble.

138

What trouble?

From time to time, it is necessary to restruc-

ture the database such that the overall content

remains the same, but the structure of infor-

mation changes. For example, at some point,

we wish to replace the original Student table by

the following two tables:

Create table StudentBasic (
    Id Integer,
    Name Char(20) Not null,
    Address Char(50),
    Primary Key (Id));

Create table StudentStatus (
    Id Integer,
    Status Char(10) default 'freshman',
    Primary Key (Id));

Now, all the application programs based on the Student table can no longer be used...

139

What to do?

We just create a view as follows:

Create View Student As
Select B.Id, B.Name, B.Address, S.Status
From StudentBasic B, StudentStatus S
Where B.Id=S.Id

When this change takes place, application programs that previously referred to the original Student table will now refer to the Student view, thus nothing changes externally from a user’s perspective, although the table structure has changed at the conceptual level (Still remember this stuff?).

All the applications that we have developed over the years can still be used, through this view.
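The restructuring can be tried end to end. In the sketch below, which assumes Python's built-in sqlite3 module as a stand-in for MariaDB, the two new tables are created, a Student view is defined over them (selecting B.Id, the key column of StudentBasic), and a query written against the old Student table still runs unchanged. The rows and the "old" query are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table StudentBasic (
        Id Integer Primary Key,
        Name Char(20) Not Null,
        Address Char(50));
    Create Table StudentStatus (
        Id Integer Primary Key,
        Status Char(10) Default 'freshman');

    -- Hypothetical rows for illustration.
    Insert Into StudentBasic Values (1, 'Ann', '1 Main St'),
                                    (2, 'Bob', '2 Oak Ave');
    Insert Into StudentStatus Values (1, 'senior');
    Insert Into StudentStatus (Id) Values (2);  -- Status takes its default

    Create View Student As
        Select B.Id, B.Name, B.Address, S.Status
        From StudentBasic B, StudentStatus S
        Where B.Id = S.Id;
""")

# A query written for the original Student table, left untouched.
old_query = "Select Name From Student Where Status = 'senior'"
seniors = conn.execute(old_query).fetchall()
bob_status = conn.execute(
    "Select Status From Student Where Id = 2").fetchone()[0]

print(seniors)
print(bob_status)
```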

Question: When is the Final?

Answer: Four weeks down the road... on December 15, 2021.

140

Two principles

Views really serve two rather different purposes:

1) To a user who defines the view, it is really

just a shorthand for a “subroutine”. 2) To

other external users, it should look and behave

exactly like a table.

The Interchangeability Principle: There is no

distinction between tables and views from an

external perspective.

The Database Relativity Principle: As far as

the information equivalence is concerned, the

choice of which database is the real one is ar-

bitrary, as well.

We only care about the content (what is it?),

but not the format (how is it kept?).

141

Modify the database

We have discussed mostly how to get informa-

tion out of a database via queries.

In reality, a database changes its data all the

time. We can either add more rows, take some

out, or modify the existing rows, as we have

been doing throughout this semester.

When we have to insert a large quantity of

rows, there is a much easier way to do it.

In Section 4 of the MariaDB notes, you can

find out how to easily fill an easyClass table.

Sometimes, we just want to have the information, but not physically keep it.

In Lab 12, the final lab for individuals, you will

dig out information of hard(er) classes, with a

hardClass view.

142

Update tables

It is pretty easy to update an existing row in a

table if you can identify it.

MariaDB [register]> Select Grade From Transcript
    -> Where StudId='666666666'
    -> And CrsCode='EE101';
+-------+
| Grade |
+-------+
| B     |
+-------+

MariaDB [register]> Update Transcript
    -> Set Grade='A'
    -> Where StudId='666666666'
    -> And CrsCode='EE101';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

MariaDB [register]> Select Grade From Transcript
    -> Where StudId='666666666'
    -> And CrsCode='EE101';
+-------+
| Grade |
+-------+
| A     |
+-------+

143

Another example

Suppose that, instead of firing those tough professors who failed more than half of a class, we want to move them over to administration positions. We can do the following, where hardClass is a view that you will create in Lab 12.

Update Professor
Set DeptId='Adm'
Where Id In
    (Select T.ProfId
     From Teaching T, hardClass H
     Where T.CrsCode=H.CrsCode
     And T.Semester=H.Semester
     And H.FailRate>0.5)

144

All politics are local....

The following does not work with the current

version, 5.5.68, of MariaDB.

MariaDB [registration]> Update Professor P1
    -> Set P1.Salary = P1.Salary*1.1
    -> Where P1.Id in
    -> (Select P.Id
    -> From Professor P, Teaching T
    -> Where P.age < 40 and P.Id = T.ProfId
    -> and T.CrsCode ='MAT123'
    -> and (T.Semester = 'S1997'
    -> or T.Semester = 'F1997'));
ERROR 1093 (HY000): You can't specify target table 'P1' for update in FROM clause

I found the following disclaimer in §13.2.11, UPDATE Syntax, of the MySQL 5.7 Reference Manual: “Currently, you cannot update a table and select from the same table in a subquery.”

But, it is mentioned that, since MariaDB 10.3.2, UPDATE statements may have the same source and target.
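A commonly used workaround on older versions is to wrap the sub-query in a derived table, so that the Ids to update are collected into a temporary result before the update touches Professor. The sketch below shows that shape of the statement. It runs on Python's built-in sqlite3 module, which never had this restriction, so it only checks that the rewritten statement updates the right rows; it has not been tested against MariaDB 5.5.68. The two professors come from the Professor table shown earlier, while the Teaching rows and the alias Tmp are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    Create Table Professor (Id Integer, Name Text, Age Integer, Salary Real);
    Create Table Teaching (ProfId Integer, CrsCode Text, Semester Text);
    Insert Into Professor Values (864297351, 'Qi Chen', 34, 35000),
                                 (900120450, 'Ann White', 38, 50000);
    -- Hypothetical teaching assignments.
    Insert Into Teaching Values (864297351, 'MAT123', 'S1997'),
                                (900120450, 'MAT124', 'F1997');
""")

# The inner Select is wrapped in a derived table (Tmp), so the outer
# Update no longer selects from Professor directly in its sub-query.
conn.execute("""
    Update Professor
    Set Salary = Salary * 1.1
    Where Id In (Select Tmp.Id
                 From (Select P.Id
                       From Professor P, Teaching T
                       Where P.Age < 40 And P.Id = T.ProfId
                         And T.CrsCode = 'MAT123'
                         And T.Semester In ('S1997', 'F1997')) As Tmp)
""")

salaries = dict(conn.execute("Select Name, Salary From Professor").fetchall())
print(salaries)
```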

145

Update on views

It is natural to allow the programmers to up-

date them as well. But, it is tough to do....

1. Assume we have a simple projective view

on the Transcript, with only three attributes,

CrsCode, StudId and Semester. If we add a row

into this view, which is further put into the

Transcript table, then the Grade piece is miss-

ing in that row in the table. This can be filled

with a null value if it is permitted by the as-

sociated ICs. Otherwise, it would be rejected.

You would not know why.

2. Assume that we have another view CSProf over the Professor table, generated with a restriction of DeptId='CS'. Assume that we now add a row (1212, 'Paul Schemit', 'EE') through this view, and hence into the table.

When we later query this view, we will not be able to get the row back even though we just added it in, since it fails the restriction DeptId='CS'. (Remember durability?)

146

3. Moreover, the impact of a view update can be ambiguous. This could lead to serious consequences. For example, given the following view

Create View ProfDept (PrName, DeName) AS

Select P.Name, D.Name

From Professor P, Dept D

Where P.DeptId = D.DeptId

If we delete a row (’Smyth’,’CS’) from the

view, we could either delete the row for ‘Smyth’

from Professor, or the row for ‘CS’ from Dept,

or set the value for DeptId for ‘Smyth’ in Professor

to null.

Question: What should the DBMS do?

147

A little summary

View update is not always doable. Much work has been done in this regard, but no consensus has emerged. SQL has thus taken a simple-minded approach by accepting only a very limited class of views for update, called updatable views, which must satisfy the following conditions:

1. Exactly one table can be included in the From part.

2. Neither aggregates, a Group By clause, a Having clause, nor set operators are allowed.

3. Nested sub-queries in the Where part can’t refer to the table mentioned in the From part.

4. Neither expressions nor the Distinct keyword are allowed in the Select part.

148

An example

The following is an updatable view:

Create View CanTeach(Professor, Course) As
Select T.ProfId, T.CrsCode
From Teaching T

Assume we want to delete a pair (0940, MGT123) from the view. Then all the rows in the Teaching table with this ProfId and CrsCode, whatever their semesters, must be deleted.

Labwork: Let’s wrap this up with Lab 12 on Labwork 4, due at 9 p.m., Friday, November 5, 2021.

We have learned a lot, haven’t we? It is time to do something....

Based on these programming labs, teams should get together and come up with queries for your projects.

Project III is due by 9 p.m., Wednesday, November 10, 2021.

149